How should CSV2JSON behave when generating objects #755

6a6d74 · Oct 14, 2015

In response to comments from timeless (see ISSUE #679) the CSV2JSON conversion process for Generating Objects has been updated. Regarding how the information from multiple cells in the row currently being processed that refer to the same subject i is combined, the final step (no. 2.4) now says:

If name N occurs more than once within object Si, the name-value pairs from each occurrence of name N must be compacted to form a single name-value pair with name N and whose value is an array containing all values from each of those name-value pairs. Where values from the contributing name-value pairs are of type array, the value of the resulting compacted name-value pair is an array of arrays.

This situation occurs when multiple columns in the table use the same property URL annotation.
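As a rough illustration of the step quoted above, here is a minimal Python sketch of the array-of-arrays compaction. The function name `compact_pairs` and the sample row are hypothetical and are not taken from the CSV2JSON document; the sketch only assumes that each cell contributes one name-value pair.

```python
from typing import Any


def compact_pairs(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    """Compact repeated names as in the quoted step 2.4 (array-of-arrays variant)."""
    grouped: dict[str, list[Any]] = {}
    for name, value in pairs:
        grouped.setdefault(name, []).append(value)
    # A name seen once keeps its value unchanged. A repeated name gets an
    # array of the contributing values, so array values become an array of arrays.
    return {n: vs[0] if len(vs) == 1 else vs for n, vs in grouped.items()}


# Hypothetical row: two columns share the name "author" and each cell holds a list.
row = [("title", "A paper"), ("author", ["Ann", "Bob"]), ("author", ["Cy", "Di"])]
result = compact_pairs(row)
print(result)
```

With this reading, the two `author` cells stay separate inside the outer array rather than being merged.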

For CSV2RDF, we think the process is clear:

if the two contributing cells have content that is unordered then the property URL (used as predicate in each triple) is repeated ... effectively the cell's information is flattened
if the contributing cells contain ordered lists of things (like authors for a journal paper), then the RDF will treat the ordered list as an RDF list. The integrity of each ordered list is maintained. There is no attempt to merge lists.
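The two CSV2RDF cases above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the normative algorithm: the helper `cell_triples`, the `ex:` names, and the blank node labels are all hypothetical, and triples are modelled as simple tuples.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def cell_triples(subject, predicate, values, ordered, bnodes):
    """Return triples for one cell's values (illustrative only)."""
    if not ordered:
        # Unordered: repeat the predicate once per value.
        return [(subject, predicate, v) for v in values]
    if not values:
        return [(subject, predicate, RDF + "nil")]
    # Ordered: build one rdf:first/rdf:rest chain per cell; lists are never merged.
    nodes = [f"_:b{next(bnodes)}" for _ in values]
    triples = [(subject, predicate, nodes[0])]
    for i, (node, value) in enumerate(zip(nodes, values)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", value))
        triples.append((node, RDF + "rest", rest))
    return triples


bnodes = count()
# Two cells using the same property URL, first unordered, then ordered.
unordered = (cell_triples("ex:s", "ex:p", ["a", "b"], False, bnodes)
             + cell_triples("ex:s", "ex:p", ["c"], False, bnodes))
ordered = (cell_triples("ex:s", "ex:p", ["a", "b"], True, bnodes)
           + cell_triples("ex:s", "ex:p", ["c"], True, bnodes))
```

In the unordered case the result is three flat triples with the same predicate; in the ordered case the subject points at two separate list heads, one per cell.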

For CSV2JSON, the behaviour is a little different - in JSON, arrays are always ordered. Currently, the doc says that the arrays from the contributing cells should be kept separate; forming an array of arrays. A better behaviour for users is likely to be providing a single flattened array.
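For comparison, a minimal sketch of the flattened alternative might look like this. The name `compact_flat` and the sample data are hypothetical, and treating a single occurrence as "keep the value as is" is an assumption made for the illustration.

```python
def compact_flat(pairs):
    """Merge repeated names into one flat array (the flattened alternative)."""
    grouped = {}
    for name, value in pairs:
        grouped.setdefault(name, []).append(value)
    result = {}
    for name, values in grouped.items():
        if len(values) == 1:
            # A name that occurs once keeps its value unchanged.
            result[name] = values[0]
        else:
            # A repeated name: splice array values in, append scalars.
            flat = []
            for v in values:
                flat.extend(v if isinstance(v, list) else [v])
            result[name] = flat
    return result


# Hypothetical row: two columns share the name "author".
flat_result = compact_flat([("author", ["Ann", "Bob"]), ("author", ["Cy", "Di"])])
print(flat_result)
```

Here the values from both cells end up in one array, in column order, and the boundary between the contributing cells is lost.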

This needs to be considered further.

Whatever the outcome, we need a test case to validate this edge case.

iherman · Oct 15, 2015

On 14 Oct 2015, at 17:01 , Jeremy Tandy notifications@github.com wrote:

In response to comments from timeless (see ISSUE #679) the CSV2JSON conversion process for Generating Objects (https://w3c.github.io/csvw/csv2json/#gen-subj-cells) has been updated. Regarding how the information from multiple cells in the row currently being processed that refer to the same subject i is combined, the final step (no. 2.4) now says:

If name N occurs more than once within object Si, the name-value pairs from each occurrence of name N must be compacted to form a single name-value pair with name N and whose value is an array containing all values from each of those name-value pairs. Where values from the contributing name-value pairs are of type array, the value of the resulting compacted name-value pair is an array of arrays.

This situation occurs when multiple columns in the table use the same property URL annotation.

For CSV2RDF, we think the process is clear:

if the two contributing cells have content that is unordered then the property URL (used as predicate in each triple) is repeated ... effectively the cell's information is flattened
if the contributing cells contain ordered lists of things (like authors for a journal paper), then the RDF will treat the ordered list as an RDF list. The integrity of each ordered list is maintained. There is no attempt to merge lists.
For CSV2JSON, the behaviour is a little different - in JSON, arrays are always ordered. Currently, the doc says that the arrays from the contributing cells should be kept separate; forming an array of arrays. A better behaviour for users is likely to be providing a single flattened array.

I am not sure. There may be good reasons for the user to keep those arrays separate, and I am always a bit afraid of trying to guess the user's intention. I would think we should leave it as is...

This needs to be considered further.

Whatever the outcome, we need a test case to validate this edge case.

+1

gkellogg · Oct 17, 2015

I created some tests for this, and in the process looked at my implementation further. I disagree with @6a6d74's proposed resolution and think that, for JSON, values should be merged into a single array. This is a fairly artificial example (multiple columns with the same propertyUrl and multiple values per column), but it is, IMHO, the simplest implementation and a more natural consequence. It would be odd if adding a single column to a CSV were to turn values that were simply in an array into an array of arrays. The purpose of using the same propertyUrl is to merge values from different columns (again, IMO), so having the values be merged together makes the most sense to me.

6a6d74 · Oct 17, 2015

@gkellogg - thanks for progressing this. I'm neutral for either approach.
If we get consensus I will change the doc to support flattening into a
single array rather than creating an array of arrays. Waiting for more +1s
...

gkellogg · Oct 17, 2015

@JeniT's implementation should inform this too.

iherman · Oct 18, 2015

@gkellogg,

I created some tests for this, and in the process looked at my implementation further. I disagree with @6a6d74's proposed resolution and think that, for JSON, values should be merged into a single array. This is a fairly artificial example (multiple columns with the same propertyUrl and multiple values per column), but it is IMHO, the simplest implementation and a more natural consequence. It would be odd if adding a single column to a CSV were to make values which were simply in an array, to now be an array of arrays. The purpose of using the same propertyUrl is to merge values from different columns, (again, IMO), so having the values be merged together makes the most sense to me.

My apologies if I appear to be defending the process orthodoxy here, but I guess it is my role... The specification extract that @6a6d74 quoted seems to be pretty much unequivocal: the result is an array of arrays. What you propose means making a technical change to the document in a CR. Not because there is a problem with implementations (that is obviously not the case) but for other reasons. Although the changes are minimal, it is still a red flag that we have to justify more seriously than just "would be odd", otherwise we can get into a push-back situation. Is it worth it? I do not think so, hence my proposal to leave it as is.

gkellogg · Oct 18, 2015

That extract is Jeremy's proposed change. I believe my interpretation is consistent with the published version:

If name N occurs more than once within object Si, the name-value pairs from each occurrence of name N must be compacted to form a single name-value pair with name N and whose value is an array containing all values from each of those name-value pairs.

iherman · Oct 18, 2015

Oops, my apologies! I misread and did not check.

I agree with you.


JeniT · Oct 19, 2015

I agree with generating an array of individual values (not an array of arrays) when converting to JSON.

6a6d74 · Oct 21, 2015

I will amend the CSV2JSON doc to support the recommendation of @gkellogg / @JeniT and add a note in the change-set appendix.

6a6d74 · Oct 21, 2015

Closed. See PR #760

gkellogg added a commit that referenced this issue Oct 17, 2015

gkellogg Tests for multiple columns using same propertyUrl and multiple ordered/unordered values. For #755.
13c884c

gkellogg referenced this issue Oct 17, 2015
Merged
Tests for multiple columns using same propertyUrl and multiple ordere… #757

6a6d74 referenced this issue Oct 21, 2015
Merged
amended text wrt combining lists of values #760

6a6d74 closed this Oct 21, 2015
