Metadata Vocabulary for Tabular Data
W3C Working Draft
- This version:
- https://www.w3.org/TR/2015/WD-tabular-metadata-20150108/
- Latest published version:
- https://www.w3.org/TR/tabular-metadata/
- Latest editor's draft:
- https://w3c.github.io/csvw/metadata/
- Previous version:
- https://www.w3.org/TR/2014/WD-tabular-metadata-20140710/
- Editors:
- Rufus Pollock, Open Knowledge
- Jeni Tennison, Open Data Institute
- Repository:
- We are on Github
- File a bug
- Changes:
- Diff to previous version
- Commit history
Copyright © 2015 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
Abstract
Validation, conversion, display and search of tabular data on the web requires additional metadata that describes how the data should be interpreted. This document defines a vocabulary for metadata that annotates tabular data. This can be used to provide metadata at various levels, from collections of data from CSV documents and how they relate to each other down to individual cells within a table.
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
The CSV on the Web Working Group was chartered to produce a Recommendation "Access methods for CSV Metadata" as well as Recommendations for "Metadata vocabulary for CSV data" and "Mapping mechanism to transforming CSV into various Formats (e.g., RDF, JSON, or XML)". This document aims to primarily satisfy the second of those Recommendations.
This document was published by the CSV on the Web Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-csv-wg@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 August 2014 W3C Process Document.
Table of Contents
- 1. Introduction
- 2. Annotating Tables
- 3. Metadata Format
- 4. Processing Tables
- A. Acknowledgements
- B. IANA Considerations
- C. Security Considerations
- D. JSON-LD Context
- E. References
1. Introduction
Interpreting tabular data that is available on the web, particularly as CSV, usually requires additional metadata. As an example, say that the following CSV file were available at https://example.org/tree-ops.csv:
GID,On Street,Species,Trim Cycle,Inventory Date
1,ADDISON AV,Celtis australis,Large Tree Routine Prune,10/18/2010
2,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
3,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
A human consumer of this data might be able to figure out the meaning of the different columns, particularly if there were some additional human-readable documentation made available. Automated processors would have a much harder time; realistically they would be limited to displaying the information in a table. Making available machine-readable metadata helps with the interpretation of the tabular data. For example, say that the following metadata file were available at https://example.org/tree-ops.csv-metadata.json:
{
  "@id": "tree-ops.csv",
  "@context": ["https://www.w3.org/ns/csvw", {"@language": "en"}],
  "dc:title": "Tree Operations",
  "dc:keywords": ["tree", "street", "maintenance"],
  "dc:publisher": [{
    "sch:name": "Example Municipality",
    "sch:web": "https://example.org"
  }],
  "dc:license": "https://opendefinition.org/licenses/cc-by/",
  "dc:modified": "2010-12-31",
  "schema": {
    "columns": [{
      "name": "GID",
      "title": [
        "GID",
        "Generic Identifier"
      ],
      "dc:description": "An identifier for the operation on a tree.",
      "datatype": "string",
      "required": true
    }, {
      "name": "on-street",
      "title": "On Street",
      "dc:description": "The street that the tree is on.",
      "datatype": "string"
    }, {
      "name": "species",
      "title": "Species",
      "dc:description": "The species of the tree.",
      "datatype": "string"
    }, {
      "name": "trim-cycle",
      "title": "Trim Cycle",
      "dc:description": "The operation performed on the tree.",
      "datatype": "string"
    }, {
      "name": "inventory-date",
      "title": "Inventory Date",
      "dc:description": "The date of the operation that was performed.",
      "datatype": "date",
      "format": "M/D/YYYY"
    }],
    "primaryKey": "GID"
  }
}
Given the location of the CSV file, this metadata document can be located by appending -metadata.json to the URL (as described in Model for Tabular Data and Metadata on the Web). It provides information for different types of applications:
- Viewers can use the indicated metadata to provide a more user-friendly or human-readable view of the CSV file, which might include displaying it in a table or as graphs or charts.
- Data entry tools can use the metadata to prompt people to supply information that is added to a CSV file.
- Validators can check that the labels of the columns in the metadata file match those in the CSV file, that the values in the columns are of the right type and in the right format, and that values in the GID column are all present and unique.
- Converters can use the metadata to map the CSV data into other formats such as JSON, RDF and XML, or into databases or statistical applications, in intelligent ways.
- Data Aggregators can use the indicated metadata, such as descriptions, titles, modification dates and licences, to enable more intelligent retrieval of relevant data on the web.
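As a rough illustration of the lookup rule and the validator checks listed above, the following Python sketch derives the metadata location from a CSV URL and checks column labels and primary key values. The function names are hypothetical, the metadata is a trimmed copy of the introductory example, and the sketch assumes a single header row and a primaryKey that names one column.

```python
import csv
import io

CSV_TEXT = """GID,On Street,Species,Trim Cycle,Inventory Date
1,ADDISON AV,Celtis australis,Large Tree Routine Prune,10/18/2010
2,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
3,EMERSON ST,Liquidambar styraciflua,Large Tree Routine Prune,6/2/2010
"""

# Trimmed copy of the example metadata: only names, titles and the key.
METADATA = {
    "schema": {
        "columns": [
            {"name": "GID", "title": ["GID", "Generic Identifier"]},
            {"name": "on-street", "title": "On Street"},
            {"name": "species", "title": "Species"},
            {"name": "trim-cycle", "title": "Trim Cycle"},
            {"name": "inventory-date", "title": "Inventory Date"},
        ],
        "primaryKey": "GID",
    }
}


def metadata_url(csv_url):
    """Default metadata location: append "-metadata.json" to the CSV URL."""
    return csv_url + "-metadata.json"


def check_table(csv_text, metadata):
    """Return a list of problems found; an empty list means none were found."""
    columns = metadata["schema"]["columns"]
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]  # assumption: exactly one header row
    problems = []
    # Each header label must be one of the titles given for that column.
    for label, column in zip(header, columns):
        titles = column["title"]
        if isinstance(titles, str):
            titles = [titles]
        if label not in titles:
            problems.append(f"unexpected column label: {label}")
    # Primary key values must be present and unique.
    key = metadata["schema"]["primaryKey"]  # assumption: a single column name
    index = [column["name"] for column in columns].index(key)
    seen = set()
    for number, row in enumerate(data, start=1):
        value = row[index]
        if value == "" or value in seen:
            problems.append(f"row {number}: missing or duplicate {key}")
        seen.add(value)
    return problems


print(metadata_url("https://example.org/tree-ops.csv"))
print(check_table(CSV_TEXT, METADATA))
```

Datatype and format checks are left out of this sketch because they depend on definitions given later in the specification.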
The Model for Tabular Data and Metadata on the Web specification defines an Annotated Tabular Data Model in which tables, columns, rows and cells can be annotated with properties and values, and a Grouped Tabular Data Model in which a group of tables is annotated. That specification also describes how to locate metadata about a given CSV file.
This document defines the format and structure of metadata documents, and how these are interpreted to create an Annotated Tabular Data Model. It also defines how to validate tabular data based on some of these annotations. This metadata can be expressed as an RDF graph. However, all applications that conform to this specification (including validators and applications that read or convert tabular data) MUST read the JSON-based format described in this document.
Metadata documents are [JSON-LD] documents; however, the aim is for the documents to be usable without any extra processing. To be valid, a metadata document MUST use a JSON-LD Context, either explicitly via the @context entry, or through the use of an HTTP Link header (see Interpreting JSON as JSON-LD in [JSON-LD]). The default location for this context is https://www.w3.org/ns/csvw. CSVW-aware processors SHOULD assume a context at this location if one is not provided with the metadata document. We invite comments on the utility of this approach: is it useful for CSV metadata to be interpretable as JSON-LD?
Should JSON-LD keywords be aliased? The sense is to alias @id as url and not alias the others. We invite comments on the utility of this approach.
2. Annotating Tables
The metadata defined in this specification is used to annotate an existing annotated table or group of tables, as defined in [tabular-data-model]. Annotated tables form the basis for all further processing, such as validating or displaying the tables. All compliant applications MUST create annotated tables based on the algorithm defined here.
Metadata documents contain descriptions of groups of tables, tables, columns, rows, cells and regions which are used to create annotations on a tabular data model. There are two types of description objects:
- descriptions that appear within a schema, and that may apply across multiple tabular data files — these are used to describe the general structure of a tabular data file
- descriptions of particular groups of tables, tables, columns, rows, cells and regions within a single tabular data file — these are used for notes or flags on particular data
The description objects themselves contain a number of properties. These are:
- properties that are used to identify the table or column that the annotations should appear on; these match up to properties on those objects in the core tabular data model defined in [tabular-data-model] and do not form additional annotations
- properties that map directly to annotations that appear directly on the group of tables, table, column, row or cell whose description they appear on, such as the name of a column or the dc:provenance of a table
- properties that are specified on the description of a group of tables, table or column to provide a default value for the equivalent annotation on each of the cells that appear in that table or column
For example, in the column description
{
  "name": "inventory-date",
  "title": "Inventory Date",
  "dc:description": "The date of the operation that was performed.",
  "datatype": "date",
  "format": "M/D/YYYY"
}
the properties name, title and dc:description are direct annotations that become name, title and dc:description properties on the column in the data model. The datatype and format properties are inherited properties that become datatype and format properties on the cells within the column.
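The split described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the set of inherited properties is limited to the two named in the example, whereas the full list is the one the specification defines in its section on inherited properties.

```python
# Assumption: only the two inherited properties from the example are listed;
# the complete list is defined elsewhere in the specification.
INHERITED = {"datatype", "format"}

COLUMN = {
    "name": "inventory-date",
    "title": "Inventory Date",
    "dc:description": "The date of the operation that was performed.",
    "datatype": "date",
    "format": "M/D/YYYY",
}


def split_column_description(description):
    """Separate direct annotations (for the column) from inherited ones (for cells)."""
    direct = {k: v for k, v in description.items() if k not in INHERITED}
    inherited = {k: v for k, v in description.items() if k in INHERITED}
    return direct, inherited


def annotate_cells(description, values):
    """Attach the inherited properties to every cell value in the column."""
    _, inherited = split_column_description(description)
    return [{"value": value, **inherited} for value in values]


direct, inherited = split_column_description(COLUMN)
print(sorted(direct))
print(annotate_cells(COLUMN, ["10/18/2010", "6/2/2010"]))
```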
2.1 Direct Annotations
Direct annotations are properties on the description object for a given table, column, row or cell which map directly to properties on the described table, column, row or cell. The name of the annotation is the same as the name of the property on the description object. The value of the annotation is the same as the value of the property on the description object.
2.2 Inherited Properties
A cell may be assigned annotations based on properties on the description objects for the group of tables, table, column or row that it appears in. These properties are known as inherited properties and are listed in section 3.10 Inherited Properties. To ascertain a value for these annotations, an application MUST identify the relevant property in the descriptions of the table or column.
Applications MUST raise an error if the value of a property in a table description is not compatible with the value of that property on the group of tables. Applications MUST raise an error if the value of a property in a column description is not compatible with the value of that property on the table. Applications MUST raise an error if the value of a property on a cell is not compatible with the values of that property on the column that the cell is associated with.
A value for a cell, column or table is compatible with a value on a column, table or group of tables if they are the same value or if the first value is a sub-value of the second value. The definitions of individual inherited properties indicate what values count as sub-values of others.
3. Metadata Format
This section defines a set of properties and permitted values for annotating tabular data, and how these annotations should be interpreted by applications.
A metadata document is a JSON document which holds an object at the top level. This object is a description object of either a table group or a single table. A description object is a JSON object that describes a component of the tabular data model (a table group, a table, a column, a row or a cell) and has one or more properties that are mapped into properties on that component.
3.1 Property Syntax
There are different types of properties on description objects:
- array properties
-
These hold an array of one or more objects, which are usually description objects.
For example, the resources property is an array property. A table group description might contain:
"resources": [{
  "@id": "https://example.org/countries.csv",
  "schema": "https://example.org/countries.json"
}, {
  "@id": "https://example.org/country_slice.csv",
  "schema": "https://example.org/country_slice.json"
}]
in which case the resources property has a value that is an array of two table description objects.
- link properties
-
These hold one or more references to other resources by URL. Their values may be:
- strings: resolved as URLs against the base URL
- arrays: lists of strings which are resolved as URLs against the base URL
For example, the dc:hasVersion property is a link property. A table description might contain:
Example 4
"dc:hasVersion": "example-2014-01-03.csv"
in which case the dc:hasVersion property on the table would have a single value, a link to example-2014-01-03.csv. Alternatively, the metadata document might contain:
Example 5
"dc:hasVersion": [
  "example-2014-01-03.csv",
  "example-2014-01-17.csv",
  "example-2014-01-25.csv"
]
in which case the dc:hasVersion property on the table would be an array of three values, links to other versions of the table.
- URI template properties
-
A URI template property contains a [URI-TEMPLATE] which can be used to generate a URI. These URI templates are expanded in the context of each row by combining the template with a set of variables with values. The variables that are set are:
- _row
- _row is set to the row number of the row that is currently being processed
Where does row numbering begin?
- column names
- a variable is set for each column within the schema; the name of the variable is the percent-encoded name of the column and the value is the canonical representation of the value of the cell in that column in the row that is currently being processed
For example, the urlTemplate property holds a URI template that is used to generate a URL identifier for each row, which might look like:
Example 6
"urlTemplate": "https://example.org/example.csv#row={_row}"
The identifiers that are generated for the rows would then look like https://example.org/example.csv#row=1, https://example.org/example.csv#row=2 and so on.
Alternatively, with the CSV and metadata in section 1. Introduction, the urlTemplate might look like:
Example 7
"urlTemplate": "https://example.org/tree/{on%2Dstreet}/{GID}"
This would generate URIs such as https://example.org/tree/ADDISON%20AV/1 and https://example.org/tree/EMERSON%20ST/2.
Once the URI has been generated, it is resolved against the location of the resource (e.g. the CSV file) to create an absolute URI. For example, given a urlTemplate within a schema such as:
"urlTemplate": "#row={_row}"
and given a CSV file at https://example.com/temp.csv, the URL for the first row will be https://example.com/temp.csv#row=1.
- column reference properties
-
These hold one or more references to other column description objects. The referenced description object must have a name property. Column reference properties can then reference column description objects through values that are:
- strings: these MUST match the name on a column description object within the metadata document
- arrays: lists of strings as above
For example, the primaryKey property is a column reference property on the schema. It has to hold references to columns defined elsewhere in the schema, and the descriptions of those columns must have name properties. It can hold a single reference, like this:
Example 8
"schema": {
  "columns": [{
    "name": "GID"
  }, ... ],
  "primaryKey": "GID"
}
or it can contain an array of references, like this:
Example 9
"schema": {
  "columns": [{
    "name": "givenName"
  }, {
    "name": "familyName"
  }, ... ],
  "primaryKey": [ "givenName", "familyName" ]
}
- object properties
-
These hold one or more objects or references to objects by URL. Their values may be:
- strings: resolved as URLs against the base URL
- objects: interpreted as structured objects
- arrays: lists of strings and/or objects, interpreted as URLs or structured objects
Object properties are often used when the values can be or should be values within controlled vocabularies, or structured information which may be held elsewhere. For example, the dc:creator of a table should be an object property. It could be provided as a URL that indicates the creator, like this:
Example 10
"dc:creator": "https://ons.gov.uk"
or a structured object, like this:
Example 11
"dc:creator": {
  "sch:name": "Office of National Statistics",
  "sch:url": "https://ons.gov.uk",
  "sch:email": "info@ons.gsi.gov.uk"
}
or an array of URLs, like this:
Example 12
"dc:creator": [
  "https://ons.gov.uk",
  "https://www.gov.uk/government/organisations/department-for-transport"
]
or an array of structured objects:
Example 13
"dc:creator": [{
  "sch:name": "Office of National Statistics",
  "sch:url": "https://ons.gov.uk",
  "sch:email": "info@ons.gsi.gov.uk"
}, {
  "sch:name": "Department for Transport",
  "sch:url": "https://www.gov.uk/government/organisations/department-for-transport"
}]
or an array that mixes URLs and objects:
Example 14
"dc:creator": [{
  "sch:name": "Office of National Statistics",
  "sch:url": "https://ons.gov.uk",
  "sch:email": "info@ons.gsi.gov.uk"
}, "https://www.gov.uk/government/organisations/department-for-transport" ]
- natural language properties
-
These hold natural language strings. Their values may be:
- strings: interpreted as natural language strings in the default language
- arrays: interpreted as alternative natural language strings in the default language
- objects whose properties MUST be language codes as defined by [BCP47] and whose values are either strings or arrays, providing natural language strings in that language
Natural language properties are used for things like descriptions and titles. For example, the title property provides a natural language label for a column. If it's a plain string like this:
Example 15
"title": "Project title"
then that string is assumed to be in the language provided through the @language property of the nearest @context (or have no assumed language, if there is no such property). Multiple alternative values can be given in an array:
Example 16
"title": [ "Project title", "Project" ]
It's also possible to provide multiple values in different languages, using an object structure. For example:
Example 17
"title": {
  "en": "Project title",
  "fr": "Titre du projet"
}
and within such an object, the values of the properties can themselves be arrays:
Example 18
"title": {
  "en": [ "Project title", "Project" ],
  "fr": "Titre du projet"
}
We invite comment on whether it would be useful to enable some markup in natural language strings, for example by stating that they are interpreted as HTML or Markdown.
- atomic properties
-
These hold atomic values. Their values may be:
- numbers: interpreted as integers or doubles
- booleans: interpreted as booleans (true or false)
- strings: interpreted as defined by the property
- arrays: lists of numbers, booleans or strings
Note: JSON does not have date or time types. Where a property takes a date as a value, this MUST be a string in the format YYYY-MM-DD.
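The expansion of URI template properties described above can be sketched in Python. This is a minimal illustration rather than a full [URI-TEMPLATE] implementation: it handles only simple {variable} expressions, the function names are hypothetical, row numbering is assumed to start at 1 as in the examples, and variables with no value are assumed to expand to an empty string.

```python
import re
from urllib.parse import quote, urljoin


def variable_name(column_name):
    """Percent-encode a column name so it can be used as a template variable.

    Assumption: every byte other than ASCII letters, digits and "_" is
    encoded, which turns "on-street" into "on%2Dstreet" as in Example 7.
    """
    encoded = []
    for byte in column_name.encode("utf-8"):
        char = chr(byte)
        if byte < 128 and (char.isalnum() or char == "_"):
            encoded.append(char)
        else:
            encoded.append("%{:02X}".format(byte))
    return "".join(encoded)


def expand(template, row_number, cells):
    """Expand simple {variable} expressions for one row.

    `cells` maps column names to the string values of the row's cells.
    """
    variables = {"_row": str(row_number)}
    for name, value in cells.items():
        variables[variable_name(name)] = value

    def substitute(match):
        # Percent-encode everything except unreserved characters.
        return quote(variables.get(match.group(1), ""), safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, template)


def row_url(template, resource_url, row_number, cells):
    """Expand the template, then resolve it against the resource location."""
    return urljoin(resource_url, expand(template, row_number, cells))


row = {"GID": "1", "on-street": "ADDISON AV"}
print(expand("https://example.org/tree/{on%2Dstreet}/{GID}", 1, row))
print(row_url("#row={_row}", "https://example.com/temp.csv", 1, row))
```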
3.2 Top-Level Properties
The top-level object (whether it is a table group description or a table description) MAY have a @context property. This holds an object that provides metadata for interpreting other properties, namely:
@language
-
indicates the default language for the values of properties in the metadata document; if present, its value MUST be a language code [BCP47] which is the default language for the values of other properties in the metadata document
Note: the @language property of the @context object, which gives the default language used within the metadata file, is distinct from the language property on a description object, which gives the language used in the data within a group of tables, table or column.
@base
-
indicates the base URL against which other URLs within the description are resolved; if present, its value MUST be a URL which is resolved against the location of the metadata document to provide the base URL for other URLs in the metadata document; if unspecified, the base URL used for interpreting relative URLs within the metadata document is the location of the metadata document itself
Note: the @base property of the @context object provides the base URL used for URLs within the metadata document, not the URLs that appear within the group of tables or table it describes.
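The way @base determines the base URL can be sketched as follows. This is an illustrative Python fragment with hypothetical function names; it assumes @context may be either an object or an array that contains one, since the introductory example uses the array form.

```python
from urllib.parse import urljoin


def base_url(metadata, metadata_location):
    """Return the base URL for resolving URLs inside a metadata document."""
    context = metadata.get("@context")
    # Assumption: @context may be an object, or an array mixing strings and objects.
    entries = context if isinstance(context, list) else [context]
    for entry in entries:
        if isinstance(entry, dict) and "@base" in entry:
            # @base is itself resolved against the metadata document's location.
            return urljoin(metadata_location, entry["@base"])
    # No @base: fall back to the location of the metadata document.
    return metadata_location


def resolve(metadata, metadata_location, url):
    """Resolve a URL found in the metadata document against the base URL."""
    return urljoin(base_url(metadata, metadata_location), url)


location = "https://example.org/meta/tree-ops.csv-metadata.json"
with_base = {"@context": ["https://www.w3.org/ns/csvw", {"@base": "data/"}]}
print(resolve(with_base, location, "tree-ops.csv"))
print(resolve({}, location, "tree-ops.csv"))
```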
3.2.1 Importing Metadata
The top-level object (whether it is a table group description or a table description) MAY also have an import property. This is a link property which references one or more other metadata files to be imported into the original metadata file.
If the import property contains an array, imports are carried out in sequence: the first metadata file referenced is imported into the original metadata file; the second is imported into the result and so on. If a referenced metadata file has already been imported (or was the original metadata file) it is ignored.
If the top-level object of any of the metadata files is a table description, it is treated as if it were a table group description containing a single table description (i.e. having a single resource property whose value is the same as the original table description).
An imported description object B is imported into an original description object A by merging each property of B into A. If the property from B does not exist on A, it is simply added to A. If A does have the property, the way the values are merged depends on the type of the property, as follows:
- If the property is an array property, the way in which values are merged depends on the property; see the relevant property for this definition.
- If the property is a link property, then if the property only accepts single values, the value from A overrides that from B, otherwise the result is an array of links: those from A followed by those from B that were not already a value in A.
- If the property is a URI template property, the value from A overrides that from B.
- If the property is a column reference property, the value from A overrides that from B.
- If the property is an object property, then if the property only accepts single objects:
- if the value of the property in A is a string or the value from B is a string then the value from A overrides that from B
- otherwise (if both values are objects) the objects are merged as described here
- If the property is a natural language property, the result is an object whose properties are language codes and where the values of those properties are arrays. The suitable language code for the values is either explicit within the existing value or determined through the default language in the metadata document; if it can't be determined the language code und should be used. The arrays should provide the values from A followed by those from B that were not already a value in A.
- If the property is an atomic property, then if the property only accepts single values, the value from A overrides that from B; otherwise the result is an array of values: those from A followed by those from B that were not already a value in A.
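Two of the merge rules above, for multi-valued link or atomic properties and for natural language properties, can be sketched in Python. The function names and the way the default language is passed in are assumptions made for this illustration only.

```python
def merge_multi_valued(a, b):
    """Values from A, followed by those from B that were not already in A."""
    a_values = a if isinstance(a, list) else [a]
    b_values = b if isinstance(b, list) else [b]
    return a_values + [value for value in b_values if value not in a_values]


def to_language_map(value, default_language=None):
    """Normalise a natural language value to {language code: [strings]}."""
    code = default_language or "und"  # "und" when no language can be determined
    if isinstance(value, str):
        return {code: [value]}
    if isinstance(value, list):
        return {code: list(value)}
    return {
        lang: [strings] if isinstance(strings, str) else list(strings)
        for lang, strings in value.items()
    }


def merge_natural_language(a, b, lang_a=None, lang_b=None):
    """Merge natural language value B into A, keeping A's values first."""
    merged = to_language_map(a, lang_a)
    for lang, strings in to_language_map(b, lang_b).items():
        existing = merged.setdefault(lang, [])
        for string in strings:
            if string not in existing:
                existing.append(string)
    return merged


print(merge_natural_language(
    "Project title",
    {"en": ["Project title", "Project"], "fr": "Titre du projet"},
    lang_a="en",
))
```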
If the type of the property cannot be determined, because it is not defined in this specification (ie because it is an extension property), the type of the property is determined based on its values in A and B, as follows, and merged accordingly:
- If the value of the property in A and the value of the property in B are both objects, they are treated as if the property is an object property that only accepts single objects.
- If one of the values is an array and the other is an object, they are treated as if the property is an object property that accepts arrays.
- If the value of the property in A and the value of the property in B are atomic values, they are treated as if the property is an atomic property that only accepts single values.
- If one of the values is an array and the other is an atomic value, they are treated as if the property is an atomic property that accepts arrays.
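The classification of extension properties listed above can be sketched in Python. The function name and the returned labels are hypothetical; combinations that the list does not cover (for example two arrays, or an object and an atomic value) are returned as None here rather than guessed at.

```python
def is_atomic(value):
    """Strings, numbers and booleans count as atomic values."""
    return isinstance(value, (str, int, float, bool))


def infer_kind(a, b):
    """Classify an extension property from its values in A and B."""
    values = (a, b)
    if all(isinstance(value, dict) for value in values):
        return "single object"
    if any(isinstance(value, list) for value in values):
        other = b if isinstance(a, list) else a
        if isinstance(other, dict):
            return "object array"
        if is_atomic(other):
            return "atomic array"
    if all(is_atomic(value) for value in values):
        return "single atomic"
    return None  # combination not covered by the rules above


print(infer_kind({"sch:name": "A"}, {"sch:name": "B"}))
print(infer_kind(["x"], "y"))
```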
3.3 Common Properties
Descriptions of groups of tables, tables, schemas, columns, rows and cells MAY contain any properties whose names are either absolute URLs or prefixed names. For example, a table description may contain dc:description, dcat:keyword or schema:copyrightHolder properties to provide a description, keywords or the name of the copyright holder, as defined in Dublin Core Terms, DCAT or schema.org.
The same prefixes are pre-defined as for [rdfa-core] within the RDFa 1.1 Initial Context and MUST NOT be overridden. Properties from other vocabularies MUST be defined using full URLs.
Forbidding the declaration of new prefixes ensures consistent processing between JSON-LD-aware and non-JSON-LD-aware processors.
3.4 Table Groups
A table group description is a JSON object that describes a group of tables.
3.4.1 Required Properties
resources
-
An array property of table descriptions for the tables in the group. When an array of table descriptions B is imported into an original array of table descriptions A, each table description within B is combined into the original array A by:
- if there is a table description with the same @id in A, the table description from B is imported into the matching table description in A
- otherwise, the table description from B is appended to the array of table descriptions A
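The rule for combining two arrays of table descriptions can be sketched in Python. The names are hypothetical, the @id values are assumed to be already resolved to absolute URLs, and import_missing is only a stand-in for the per-property import rules: it copies properties that A lacks and otherwise leaves A's values in place.

```python
def import_missing(target, source):
    """Stand-in for importing one description into another (assumption)."""
    for key, value in source.items():
        target.setdefault(key, value)


def merge_resources(a, b, import_description=import_missing):
    """Combine table descriptions from B into the original array A."""
    merged = [dict(table) for table in a]
    by_id = {table["@id"]: table for table in merged}
    for table in b:
        match = by_id.get(table["@id"])
        if match is not None:
            # Same @id: import B's description into the matching one from A.
            import_description(match, table)
        else:
            # No match: append B's description to the array.
            copy = dict(table)
            merged.append(copy)
            by_id[copy["@id"]] = copy
    return merged


a = [{"@id": "https://example.org/countries.csv", "dc:title": "Countries"}]
b = [
    {"@id": "https://example.org/countries.csv", "dc:modified": "2014-01-03"},
    {"@id": "https://example.org/country_slice.csv"},
]
print(merge_resources(a, b))
```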
3.4.2 Optional Properties
The description of a group of tables MAY also contain:
schema
- An object property that provides a single schema description as described in section 3.8 Schemas, for all the tables in the group. This may be provided as an embedded object within the JSON metadata or as a URL reference to a separate JSON schema document.
table-direction
-
An atomic property that MUST have a single string value that is one of "rtl", "ltr" or "default". Indicates whether the tables in the group should be displayed with the first column on the right, on the left, or based on the first character in the table that has a specific direction. See section 4.1.1 Bidirectional Tables for more details.
This should be a defined controlled vocabulary in JSON-LD, so that the values map on to URIs in the RDF version rather than strings. We invite comment on how to configure the JSON-LD context to enable these values to be interpreted in this way.
dialect
-
An object property that provides a single dialect description. If provided, dialect provides hints to processors about how to parse the referenced files to create tabular data models for the tables in the group. This may be provided as an embedded object or as a URL reference. See section 3.6 Dialect Descriptions for more details.
templates
-
An array property of template specifications that provide mechanisms to transform the tabular data into other formats. See section 3.7 Template Specifications for more details. When an array of template specifications B is imported into an original array of template specifications A, each template specification within B is combined into the original array A by:
- if there is a template specification with the same targetFormat and templateFormat in A, the template specification from B is imported into the matching template specification in A
- otherwise, the template specification from B is appended to the array of template specifications A
@type
-
If included, @type MUST be set to "TableGroup". Publishers MAY include this to provide additional information to JSON-LD based toolchains.
The description MAY contain any common properties as defined in section 3.3 Common Properties to provide extra metadata about the set of tables as a whole.
The description MAY contain any of the properties defined in section 2.2 Inherited Properties to describe cells within the tables.
This issue relates to the use of type vs datatype as a column property. (This issue seems moot now that neither are included.)
3.5 Tables
A table description is a JSON object that describes a table within a CSV file.
A CSV file might not be the same as the table that it contains. For example, a given CSV file might contain two tables (in different regions of the CSV file), or might contain a table that isn't positioned at the top left of the CSV file. We invite comment about whether we should assume that pre-processing is used to extract tables where there isn't a 1:1 correspondence between CSV file and table, or not.
3.5.1 Required Properties
@id
-
This link property gives the single URL of the CSV file that the table is held in, relative to the location of the metadata document.
3.5.2 Optional Properties
The description of a table MAY also contain:
schema
- An object property that provides a single schema description as described in section 3.8 Schemas. This may be provided as an embedded object within the JSON metadata or as a URL reference to a separate JSON schema document.
notes
-
An object property that provides an array of objects representing annotations. This specification does not place any constraints on the structure of these objects.
Note: The Web Annotation Working Group is developing a vocabulary for expressing annotations. In future versions of this specification, we anticipate referencing that vocabulary.
Should there be column or level notes as well?
The Annotation Model can indeed become very complex.
table-direction
- As defined for table groups.
templates
- As defined for table groups.
dialect
- As defined for table groups.
@type
-
If included, @type MUST be set to "Table". Publishers MAY include this to provide additional information to JSON-LD based toolchains.
We invite comment on whether we should include properties that help in checking the integrity of the file: datapackage includes bytes and hash. We could reuse the Subresource Integrity work here.
The description MAY contain any common properties as defined in section 3.3 Common Properties to provide extra metadata about the table as a whole.
The description MAY contain any of the properties defined in section 2.2 Inherited Properties to describe cells within the table.
3.6 Dialect Descriptions
Much of the tabular data that is published on the web is messy, and CSV parsers frequently need to be configured in order to correctly read in CSV. A dialect description provides hints to parsers about how to parse the file linked to from the @id property. It can have any of the following properties, which relate to the flags described in Section 5 Parsing Tabular Data within [tabular-data-model]:
encoding
- An atomic property that sets the encoding flag to the single provided string value, which MUST be a defined [encoding].
lineTerminator
- An atomic property that sets the line terminator flag to the single provided string value.
quoteChar
- An atomic property that sets the quote character flag to the single provided value, which MUST be a single character.
doubleQuote
-
A single boolean atomic property that, if true, sets the escape character flag to ". If false, to \.
skipRows
- An atomic property that sets the skip rows flag to the single provided numeric value, which MUST be a non-negative integer.
commentPrefix
- An atomic property that sets the comment prefix flag to the single provided value, which MUST be a single character string.
header
-
A single boolean atomic property that, if true, sets the header row count flag to 1, and if false to 0, unless headerRowCount is provided, in which case the value provided for the header property is ignored.
headerRowCount
- An atomic property that sets the header row count flag to the single provided value, which MUST be a non-negative integer.
delimiter
- An atomic property that sets the delimiter flag to the single provided value, which MUST be a single character string.
skipColumns
- An atomic property that sets the skip columns flag to the single provided numeric value, which MUST be a non-negative integer.
headerColumnCount
- An atomic property that sets the header column count flag to the single provided value, which MUST be a non-negative integer.
skipBlankRows
- An atomic property that sets the skip blank rows flag to the single provided boolean value.
skipInitialSpace
- A single boolean atomic property that, if true, sets the trim flag to "start". If false, to false. If the trim property is provided, the skipInitialSpace property is ignored.
trim
- A single atomic property that, if the boolean true, sets the trim flag to true and if the boolean false to false. If the value provided is a string, sets the trim flag to the provided value, which MUST be one of "true", "false", "start" or "end".
@type
- If included, @type MUST be set to "Dialect". Publishers MAY include this to provide additional information to JSON-LD based toolchains.
The default dialect description for CSV files is:
{
  "encoding": "utf-8",
  "lineTerminator": "\r\n",
  "quoteChar": "\"",
  "doubleQuote": true,
  "skipRows": 0,
  "header": true,
  "headerRowCount": 1,
  "delimiter": ",",
  "skipColumns": 0,
  "headerColumnCount": 0,
  "skipBlankRows": false,
  "skipInitialSpace": false,
  "trim": false
}
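The precedence rules above (headerRowCount over header, trim over skipInitialSpace) can be illustrated with a minimal Python sketch. The function name dialect_flags and the flag names used as dictionary keys are assumptions made for this illustration; they are not defined by this specification, and only a subset of the flags is covered.

```python
def dialect_flags(dialect):
    """Derive a subset of parser flags from a dialect description.

    `dialect` is the dialect description as a dict parsed from JSON,
    containing only the properties the publisher actually provided.
    """
    flags = {
        "encoding": dialect.get("encoding", "utf-8"),
        "line terminator": dialect.get("lineTerminator", "\r\n"),
        "quote character": dialect.get("quoteChar", '"'),
        "delimiter": dialect.get("delimiter", ","),
        "skip rows": dialect.get("skipRows", 0),
    }

    # doubleQuote: true selects " as the escape character, false selects \
    flags["escape character"] = '"' if dialect.get("doubleQuote", True) else "\\"

    # headerRowCount, when provided, takes precedence over header
    if "headerRowCount" in dialect:
        flags["header row count"] = dialect["headerRowCount"]
    else:
        flags["header row count"] = 1 if dialect.get("header", True) else 0

    # trim, when provided, takes precedence over skipInitialSpace
    if "trim" in dialect:
        trim = dialect["trim"]
        if not isinstance(trim, bool) and trim not in ("true", "false", "start", "end"):
            raise ValueError("invalid trim value: %r" % (trim,))
        flags["trim"] = trim
    else:
        flags["trim"] = "start" if dialect.get("skipInitialSpace", False) else False

    return flags
```

For example, under these assumptions dialect_flags({"header": False, "headerRowCount": 2}) gives a header row count of 2, because the header property is ignored when headerRowCount is present.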
3.7 Template Specifications
A template specification is a definition of how tabular data can be transformed into another format. It has the following properties:
3.7.1 Required Properties
Template specifications MUST have the following properties:
targetFormat
- A URL for the format that will be created through the transformation. If one has been defined, this should be a URL for a media type, in the form https://www.iana.org/assignments/media-types/media-type such as https://www.iana.org/assignments/media-types/text/calendar. Otherwise, it can be any URL that describes the target format.
Note: The targetFormat URL is intended as an informative identifier for the target format, and applications are not expected to access the URL.
templateFormat
- A URL for the format that is used by the template. If one has been defined, this should be a URL for a media type, in the form https://www.iana.org/assignments/media-types/media-type such as https://www.iana.org/assignments/media-types/application/javascript. Otherwise, it can be any URL that describes the template format.
Note: The templateFormat URL is intended as an informative identifier for the template format, and applications are not expected to access the URL. The template formats that an application supports are implementation defined.
3.7.2 Optional Properties
Template specifications MAY have the following properties:
title
- A natural language property that describes the format that will be generated from the transformation. This is useful if the target format is a generic format (such as application/json) and the transformation is creating a specific profile of that format.
source
- A single string atomic property that provides, if included, the format to which the tabular data should be transformed prior to the transformation using the template. If the value is "json", the tabular data should first be transformed to JSON based on the simple mapping defined in Generating JSON from Tabular Data on the Web. If the value is "rdf", it should similarly first be transformed to RDF based on the simple mapping defined in Generating RDF from Tabular Data on the Web. If the source property is missing or null then the source of the transformation is the annotated tabular data model.
@type
- If included, @type MUST be set to "Template". Publishers MAY include this to provide additional information to JSON-LD based toolchains.
The template specification MAY contain any common properties as defined in section 3.3 Common Properties to provide extra metadata about the transformation.
3.7.3 Example
The following template specification will enable a processor that supports it to generate an iCalendar document using a Mustache template based on the JSON created from the simple mapping to JSON.
{
  "title": "iCalendar",
  "targetFormat": "https://www.iana.org/assignments/media-types/text/calendar",
  "templateFormat": "https://mustache.github.io/",
  "source": "json"
}
3.8 Schemas
A schema is a definition of a tabular format that may be common to multiple tables. For example, multiple tables from different sources may have the same columns and be designed such that they can be aggregated together.
A schema description is a JSON object that encodes the information about a schema. All the properties of a schema description are optional.
columns
-
An array property of column descriptions as described in section 3.9 Columns. These are matched to columns in tables that use the schema by position: the first column description in the array applies to the first column in the table, the second to the second and so on.
The name properties of the column descriptions MUST be unique within a given table description.
When an array of column descriptions B is imported into an original array of column descriptions A, each column description within B is combined into the original array A by:
- if there is a column description at the same index within A and that column description has the same name, the column description from B is imported into the matching column description in A
- otherwise, the column description is ignored
primaryKey
-
A column reference property that holds either a single reference to a column description object or an array of references.
Validators MUST check that each row has a unique combination of cells in the indicated columns. For example, if primaryKey is set to ["familyName", "givenName"] then every row must have a unique value for the combination of the familyName and givenName columns.
Composite primary keys and foreign key references.
foreignKeys
-
An array property of foreign key definitions that define how the values from specified columns within this table link to rows within this table or other tables. A foreign key definition is a JSON object with the properties:
columns
- A column reference property that holds either a single reference to a column description object within this schema, or an array of references.
reference
- An object with the properties:
resource
- A link property holding a URL that is the identifier for a specific resource that is being referenced. If this is present then schema MUST NOT be present. The metadata document MUST contain a description of the resource.
schema
- A link property holding a URL that is the identifier for a schema that is being referenced. If this is present then resource MUST NOT be present. The metadata document that forms the basis of processing MUST contain a description of a resource that uses the referenced schema, and there MUST NOT be more than one such resource.
columns
- A column reference property that holds either a single reference to a column description object within the referenced resource or schema, or an array of references.
Note: It is not required for the resource or schema referenced from a foreignKeys property to have a similarly defined primaryKey.
When an array of foreign key definitions B is imported into an original array of foreign key definitions A, each foreign key definition within B which does not appear within A is appended to the original array A.
The cross reference between files should be limited to files from one publisher - else they are just web links with no guarantee of whether the target of the link exists which 'foreign key' might imply.
urlTemplate
- A URI template property that MAY be used to create a unique identifier for each row when mapping data to other formats.
@type
- If included, @type MUST be set to "Schema". Publishers MAY include this to provide additional information to JSON-LD based toolchains.
The description MAY contain any common properties as defined in section 3.3 Common Properties to provide extra metadata about the schema as a whole.
The description MAY contain any of the inherited properties defined for cells in section 2.2 Inherited Properties.
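The primaryKey check described above can be sketched in Python as follows. This is an illustration only: the function name check_primary_key and the representation of rows as dictionaries keyed by column name are assumptions, not part of this specification.

```python
def check_primary_key(rows, primary_key):
    """Return (row number, key) pairs for rows that repeat an earlier key.

    `rows` is a list of dicts keyed by column name; `primary_key` is a
    single column name or a list of column names, as in the primaryKey
    property. Row numbers start at 1.
    """
    columns = [primary_key] if isinstance(primary_key, str) else list(primary_key)
    seen = set()
    duplicates = []
    for number, row in enumerate(rows, start=1):
        # The key is the combination of cell values in the indicated columns
        key = tuple(row[column] for column in columns)
        if key in seen:
            duplicates.append((number, key))
        else:
            seen.add(key)
    return duplicates
```

With primaryKey set to ["familyName", "givenName"], two rows that share a familyName are acceptable under this sketch as long as their givenName values differ.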
3.8.1 Examples
3.8.1.1 Foreign Key Reference Between Resources
A list of countries is published at https://example.org/countries.csv with the structure:
countryCode,latitude,longitude,name
AD,42.546245,1.601554,Andorra
AE,23.424076,53.847818,"United Arab Emirates"
AF,33.93911,67.709953,Afghanistan
Another file contains information about the population in some countries each year, at https://example.com/country_slice.csv with the structure:
countryRef,year,population
AF,1960,9616353
AF,1961,9799379
AF,1962,9989846
The following metadata for the group of tables links the two together by defining a foreignKeys property:
{
  "@context": "https://www.w3.org/ns/csvw",
  "resources": [{
    "@id": "https://example.org/countries.csv",
    "schema": {
      "columns": [
        { "name": "countryCode", "datatype": "string" },
        { "name": "latitude", "datatype": "number" },
        { "name": "longitude", "datatype": "number" },
        { "name": "name", "datatype": "string" }
      ],
      "urlTemplate": "https://example.org/countries.csv{#countryCode}",
      "primaryKey": "countryCode"
    }
  }, {
    "@id": "https://example.com/country_slice.csv",
    "schema": {
      "columns": [
        { "name": "countryRef", "datatype": "string" },
        { "name": "year", "datatype": "gYear" },
        { "name": "population", "datatype": "integer" }
      ],
      "foreignKeys": [{
        "columns": "countryRef",
        "reference": {
          "resource": "https://example.org/countries.csv",
          "columns": "countryCode"
        }
      }]
    }
  }]
}
When the population data in country_slice.csv is processed (displayed or mapped into another format), a link can be made from the content of the countryRef column based on the urlTemplate for countries.csv. For example, if the countryRef column (the value of columns in the foreignKeys object) in country_slice.csv contains the value UK then the processor will use that value to populate the countryCode variable (the value of reference.columns in the foreignKeys object) when interpreting the urlTemplate for countries.csv, and create the URL https://example.org/countries.csv#UK. The processor does not need to retrieve https://example.org/countries.csv or check that the value UK appears within the countryCode column to create this link: it is created purely based on the urlTemplate in the description of the referenced resource.
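The link creation described above can be sketched in Python. The helper expand_url_template is hypothetical and handles only two forms of URI template expression, {name} and {#name}; a conforming processor would need a complete URI template implementation.

```python
import re
from urllib.parse import quote


def expand_url_template(template, values):
    """Expand {name} and {#name} expressions in a URI template.

    Only simple and fragment expansion of single variables are handled;
    this is an illustrative subset, not a full URI template processor.
    """
    def replace(match):
        operator, name = match.group(1), match.group(2)
        value = values.get(name)
        if value is None:
            return ""  # undefined variables expand to nothing
        if operator == "#":
            # Fragment expansion: prefix with # and keep reserved characters
            return "#" + quote(str(value), safe=":/?#[]@!$&'()*+,;=-._~")
        # Simple expansion: percent-encode everything except unreserved characters
        return quote(str(value), safe="-._~")

    return re.sub(r"\{(#?)([A-Za-z0-9_]+)\}", replace, template)


# The countryRef value populates the countryCode variable of the
# urlTemplate on the referenced resource.
link = expand_url_template(
    "https://example.org/countries.csv{#countryCode}",
    {"countryCode": "UK"},
)
```

Note that the sketch never fetches countries.csv: the link is computed from the template and the cell value alone.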
3.8.1.2 Foreign Key Reference Between Schemas
When publishing information about public sector roles and salaries, as in Use Case 4, the UK government requires departments to publish two files which are interlinked. The first lists senior grades (simplified here) eg at HEFCE_organogram_senior_data_31032011.csv:
Post Unique Reference, Name,Grade, Job Title,Reports to Senior Post
90115, Steve Egan,SCS1A,Deputy Chief Executive, 90334
90250, David Sweeney,SCS1A, Director, 90334
90284, Heather Fry,SCS1A, Director, 90334
90334,Sir Alan Langlands, SCS4, Chief Executive, xx
The second provides information about the number of junior positions that report to those individuals (simplified here) eg at HEFCE_organogram_junior_data_31032011.csv:
Reporting Senior Post,Grade,Payscale Minimum (£),Payscale Maximum (£),Generic Job Title,Number of Posts in FTE, Profession
90284, 4, 17426, 20002, Administrator, 2,Operational Delivery
90284, 5, 19546, 22478, Administrator, 1,Operational Delivery
90115, 4, 17426, 20002, Administrator, 8.67,Operational Delivery
90115, 5, 19546, 22478, Administrator, 0.5,Operational Delivery
The schemas are reused by multiple departments and for multiple pairs of files. The schemas are therefore defined in separate files, and they need to define links between the schemas which are then picked up as applying between tables that use those schemas.
The metadata file for the particular publication of the files above is:
{
  "@context": "https://www.w3.org/ns/csvw",
  "resources": [{
    "@id": "HEFCE_organogram_senior_data_31032011.csv",
    "schema": "https://example.org/schema/senior-roles.json"
  }, {
    "@id": "HEFCE_organogram_junior_data_31032011.csv",
    "schema": "https://example.org/schema/junior-roles.json"
  }]
}
The schema for the senior role CSV (at https://example.org/schema/senior-roles.json) is as follows; it includes a foreign key reference to itself:
{
  "@context": "https://www.w3.org/ns/csvw",
  "@id": "https://example.org/schema/senior-roles.json",
  "columns": [
    { "name": "ref", "title": "Post Unique Reference" },
    { "name": "name", "title": "Name" },
    { "name": "grade", "title": "Grade" },
    { "name": "job", "title": "Job Title" },
    { "name": "reportsTo", "title": "Reports to Senior Post" }
  ],
  "primaryKey": "ref",
  "urlTemplate": "#post-{ref}",
  "foreignKeys": [{
    "columns": "reportsTo",
    "reference": {
      "schema": "https://example.org/schema/senior-roles.json",
      "columns": "ref"
    }
  }]
}
The schema for the junior role CSV (at https://example.org/schema/junior-roles.json) is as follows; it includes a foreign key reference to the senior roles schema:
{
  "@context": "https://www.w3.org/ns/csvw",
  "@id": "https://example.org/schema/junior-roles.json",
  "columns": [
    { "name": "reportsTo", "title": "Reporting Senior Post" },
    ...
  ],
  "foreignKeys": [{
    "columns": "reportsTo",
    "reference": {
      "schema": "https://example.org/schema/senior-roles.json",
      "columns": "ref"
    }
  }]
}
In the first line of HEFCE_organogram_junior_data_31032011.csv, the reportsTo (Reporting Senior Post) column contains the value 90284. When creating a link from that column, the urlTemplate defined within the schema at https://example.org/schema/senior-roles.json is used to generate a URL by expanding the variable reference for ref based on the value from the reportsTo column. This gives the relative URL #post-90284 which is then resolved against the base URL of the resource that uses the senior-roles.json schema within the original metadata file, namely HEFCE_organogram_senior_data_31032011.csv.
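The resolution step can be illustrated with Python's standard urljoin function. The example metadata file uses relative @id values and does not state where it is published, so the location METADATA_URL below is purely hypothetical, as is the function name link_for.

```python
from urllib.parse import urljoin

# Hypothetical location of the metadata file (not given in the example above)
METADATA_URL = "https://example.org/metadata.json"


def link_for(ref_value, resource_id, metadata_url=METADATA_URL):
    """Build the link for a reportsTo value in the organogram example."""
    # Resolve the relative @id of the senior roles resource against the metadata file
    base = urljoin(metadata_url, resource_id)
    # Expand the urlTemplate "#post-{ref}" from the senior roles schema
    relative = "#post-{ref}".replace("{ref}", str(ref_value))
    # Resolve the relative URL against the base URL of that resource
    return urljoin(base, relative)


link = link_for(90284, "HEFCE_organogram_senior_data_31032011.csv")
```

Under the assumed metadata location, the fragment-only reference keeps the path of the senior roles CSV and only adds the fragment identifier.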
3.9 Columns
A column description is a simple JSON object that describes a single column. The description provides additional human-readable documentation for a column, as well as additional information that may be used to validate the cells within the column, create a user interface for data entry, or inform conversion into other formats.
Should there be a way to suppress columns?
3.9.1 Required Properties
name
-
An atomic property that gives a single canonical name for the column. This MUST be a string. Conversion specifications MUST use this property as the basis for the names of properties/elements/attributes in the results of conversions.
For ease of reference within URI template properties, column names SHOULD consist only of alphanumeric characters or underscores ([a-zA-Z0-9_]+). Names beginning with _ are reserved by this specification and MUST NOT be used.
What to do with conversion if no column name is given?
We invite comment on what the syntactic limitations should be on column names to make them most useful when used as the basis of conversion into other formats, bearing in mind that different target languages such as JSON, RDF and XML have different syntactic limitations and common naming conventions.
During validation, if there is no title property and the column already has a title annotation then a validator MUST issue a warning if the existing title annotation does not match the name specified in the column description.
3.9.2 Optional Properties
title
-
A natural language property that provides possible alternative names for the column. The possible column titles are defined as:
- if the value of title is a string, that string
- if the value of title is an array, the strings in that array
- if the value of title is an object, the string or strings that are the value of the property of that object whose name is the column language
where the column language is the value of the language property on the column description, or (if there is no such language), the value of the language property on the table description.
If the column already has a title annotation (because a header row has been included in the original CSV file) then a validator MUST issue a warning if the existing title annotation is not the same as any of the possible column titles.
The facility to specify multiple potential titles for a column is important when the same column description is used for multiple CSVs, through a mechanism yet to be defined by this specification.
required
- A boolean atomic property taking a single value which indicates whether every cell within the column must have a non-null value.
predicateUrl
- An atomic property that holds one or more URIs that MAY be used as URIs for predicates if the table is mapped to another format.
@type
- If included, @type MUST be set to "Column". Publishers MAY include this to provide additional information to JSON-LD based toolchains.
The description MAY contain any common properties as defined in section 3.3 Common Properties to provide extra metadata about the column as a whole, such as a full description.
The description MAY contain any of the inherited properties defined for cells in section 2.2 Inherited Properties.
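The rules for possible column titles, and the two warnings that depend on them, can be sketched in Python. The function names possible_titles and needs_title_warning are hypothetical, and column descriptions are assumed to be dicts parsed from JSON.

```python
def possible_titles(title, column_language=None):
    """Return the possible column titles for a title property value."""
    if isinstance(title, str):
        return [title]
    if isinstance(title, list):
        return list(title)
    if isinstance(title, dict):
        # Use the string or strings listed under the column language
        value = title.get(column_language, [])
        return [value] if isinstance(value, str) else list(value)
    return []


def needs_title_warning(existing_title, column, table_language=None):
    """Decide whether a validator should warn about an existing title annotation."""
    if existing_title is None:
        return False  # no header row, so nothing to compare
    if "title" not in column:
        # Without a title property the annotation is compared with the name
        return existing_title != column["name"]
    # The column language falls back to the language of the table description
    language = column.get("language", table_language)
    return existing_title not in possible_titles(column["title"], language)
```

For instance, a column described as {"name": "grade", "title": ["Grade", "Pay Grade"]} would not trigger a warning under this sketch when the header cell reads either "Grade" or "Pay Grade".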
3.10 Inherited Properties
Cell descriptions may override inherited properties, as described in section 2. Annotating Tables. It is good practice to define these properties on columns, so that all cells within a given column are handled in the same way, or on tables if appropriate. These properties are:
null
-
An atomic property giving the string or strings used for null values. If not specified, the default for this is the empty string.
language
-
An atomic property giving a single string language code as defined by [BCP47]. Indicates the language of the value within the cell.
text-direction
- An atomic property that MUST have a single string value that is one of "rtl" or "ltr" (the default). Indicates whether the text within cells should be displayed by default as left-to-right or right-to-left text. See section 4.1.1 Bidirectional Tables for more details.
separator
- An atomic property that MUST have a single string value that is the character used to separate items in the string value of the cell. If null or unspecified, the cell does not contain a list. Otherwise, applications MUST split the string value of the cell on the specified separator character and parse each of the resulting strings separately. The cell's value will then be a list. Conversion specifications MUST use the separator to determine the conversion of a cell into the target format. See 3.12 Parsing cells for more details.
default
- An atomic property holding a single string that provides a default string value for the cell in cases where the original string value is a null value. This default value MAY be used when converting the table into other formats.
format
- An atomic property that contains a single string that is the definition of the format of the cell, used when parsing the cell as described in 3.12 Parsing cells.
datatype
- An atomic property that contains a single string that is the main datatype of the values of the cell. If the cell contains a list (ie separator is specified and not null) then this is the datatype of each value within the list. Conversion specifications MUST use the datatype of the value to determine the conversion of a cell into the target format. See 3.11 Datatypes for more details.
length
-
An atomic property that contains a single integer that is the exact length of the value of the cell. See section 3.11.1 Length Constraints for details.
minLength
-
An atomic property that contains a single integer that is the minimum length of the value of the cell. See section 3.11.1 Length Constraints for details.
maxLength
-
An atomic property that contains a single integer that is the maximum length of the value of the cell. See section 3.11.1 Length Constraints for details.
minimum
- An atomic property that contains a single number that is the minimum value for the cell (inclusive); equivalent to minInclusive. See section 3.11.2 Value Constraints for details.
maximum
- An atomic property that contains a single number that is the maximum value for the cell (inclusive); equivalent to maxInclusive. See section 3.11.2 Value Constraints for details.
minInclusive
-
An atomic property that contains a single number that is the minimum value for the cell (inclusive). See section 3.11.2 Value Constraints for details.
maxInclusive
-
An atomic property that contains a single number that is the maximum value for the cell (inclusive). See section 3.11.2 Value Constraints for details.
minExclusive
-
An atomic property that contains a single number that is the minimum value for the cell (exclusive). See section 3.11.2 Value Constraints for details.
maxExclusive
-
An atomic property that contains a single number that is the maximum value for the cell (exclusive). See section 3.11.2 Value Constraints for details.
3.11 Datatypes
Cells within tables may be annotated with a datatype which indicates the type of the value obtained by parsing the value of the cell. The format expected in the cell is determined by the format annotation, if there is one, or uses a default format determined by the type.
The possible datatypes are:
- the datatypes defined in [xmlschema-2] with the exception of those that rely on XML mechanisms for definition, namely:
  - anySimpleType
  - string; a sub-value of anySimpleType
  - normalizedString; a sub-value of string
  - token; a sub-value of normalizedString
  - language; a sub-value of token
  - Name; a sub-value of token
  - NCName; a sub-value of Name
  - boolean; a sub-value of anySimpleType
  - decimal; a sub-value of anySimpleType
  - integer; a sub-value of decimal
  - nonPositiveInteger; a sub-value of integer
  - negativeInteger; a sub-value of nonPositiveInteger
  - long; a sub-value of integer
  - int; a sub-value of long
  - short; a sub-value of int
  - byte; a sub-value of short
  - nonNegativeInteger; a sub-value of integer
  - unsignedLong; a sub-value of nonNegativeInteger
  - unsignedInt; a sub-value of unsignedLong
  - unsignedShort; a sub-value of unsignedInt
  - unsignedByte; a sub-value of unsignedShort
  - positiveInteger; a sub-value of nonNegativeInteger
  - float; a sub-value of anySimpleType
  - double; a sub-value of anySimpleType
  - duration; a sub-value of anySimpleType
  - dateTime; a sub-value of anySimpleType
  - time; a sub-value of anySimpleType
  - date; a sub-value of anySimpleType
  - gYearMonth; a sub-value of anySimpleType
  - gYear; a sub-value of anySimpleType
  - gMonthDay; a sub-value of anySimpleType
  - gDay; a sub-value of anySimpleType
  - gMonth; a sub-value of anySimpleType
  - hexBinary; a sub-value of anySimpleType
  - base64Binary; a sub-value of anySimpleType
  - anyURI; a sub-value of anySimpleType
- the datatype number which is exactly equivalent to double
- the datatype binary which is exactly equivalent to base64Binary
- the datatype datetime which is exactly equivalent to dateTime
- the datatype any which is exactly equivalent to anySimpleType
- the datatype xml which indicates the cell contains an XML fragment
- the datatype html which indicates the cell contains an HTML fragment
- the datatype json which indicates the cell contains serialized JSON
3.11.1 Length Constraints
The length, minLength and maxLength properties indicate the exact, minimum and maximum lengths of the values of cells.
Applications MUST raise an error if both length and minLength are specified and they do not have the same value. Similarly, applications MUST raise an error if both length and maxLength are specified and they do not have the same value. Applications MUST raise an error if length, maxLength or minLength are specified and the cell value is not a list (ie separator is not specified), a string or one of its subtypes, or a binary value.
The length of a value of a cell is determined as follows:
- if the cell is null its length is zero
- if the value is a list, its length is the number of items in the list
- if the value is a string or one of its subtypes, its length is the number of characters in the value
- if the value is of a binary type, its length is the number of bytes in the binary value
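The length rules above can be sketched in Python. The names cell_length and check_length are hypothetical; parsed cell values are assumed to be None for null, a list for cells with a separator, str for string types and bytes for binary types. Python's len counts Unicode code points for strings, which is taken here as the "number of characters".

```python
def cell_length(value):
    """Return the length of a parsed cell value under the rules above."""
    if value is None:
        return 0                      # null cells have length zero
    if isinstance(value, list):
        return len(value)             # number of items in the list
    if isinstance(value, str):
        return len(value)             # number of characters
    if isinstance(value, (bytes, bytearray)):
        return len(value)             # number of bytes
    raise TypeError("length constraints do not apply to %s" % type(value).__name__)


def check_length(value, length=None, min_length=None, max_length=None):
    """Return the names of the length constraints that the value violates."""
    # length must agree with minLength and maxLength when both are specified
    if length is not None and min_length is not None and length != min_length:
        raise ValueError("length and minLength differ")
    if length is not None and max_length is not None and length != max_length:
        raise ValueError("length and maxLength differ")

    actual = cell_length(value)
    violated = []
    if length is not None and actual != length:
        violated.append("length")
    if min_length is not None and actual < min_length:
        violated.append("minLength")
    if max_length is not None and actual > max_length:
        violated.append("maxLength")
    return violated
```

A cell parsed with a separator into ["a", "b", "c"] has length 3 in this sketch, regardless of how long the individual items are.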
3.11.2 Value Constraints
The minimum, maximum, minInclusive, maxInclusive, minExclusive and maxExclusive properties indicate limits on the values of cells. These apply to numeric and date/time types. The minimum property is equivalent to the minInclusive property and the maximum property is equivalent to the maxInclusive property.
Validation against these properties is as defined in [xmlschema-2].
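As an informal illustration, the sketch below treats minimum and maximum as aliases and compares values using Python's own ordering for numbers and dates. This ordering is an assumption standing in for the comparison rules of [xmlschema-2], and the function name violated_constraints is hypothetical.

```python
def violated_constraints(value, constraints):
    """Return the names of the value constraints that `value` violates.

    `constraints` is a dict that may contain minimum, maximum,
    minInclusive, maxInclusive, minExclusive and maxExclusive.
    """
    limits = dict(constraints)
    # minimum and maximum are equivalent to minInclusive and maxInclusive
    if "minimum" in limits:
        limits.setdefault("minInclusive", limits.pop("minimum"))
    if "maximum" in limits:
        limits.setdefault("maxInclusive", limits.pop("maximum"))

    checks = {
        "minInclusive": lambda v, limit: v >= limit,
        "maxInclusive": lambda v, limit: v <= limit,
        "minExclusive": lambda v, limit: v > limit,
        "maxExclusive": lambda v, limit: v < limit,
    }
    return [
        name for name, satisfied in checks.items()
        if name in limits and not satisfied(value, limits[name])
    ]
```

Because of the aliasing, a violation of minimum is reported under the name minInclusive in this sketch.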
3.12 Parsing cells
Unlike many other data formats, tabular data is designed to be read by humans. For that reason, it's common for data to be represented within tabular data in a human-readable way. The separator and format properties indicate the format used to represent data within the table. This is used:
- by validators to check that the data in the table is in the expected format
- by converters to parse the values before mapping them into values in the target of the conversion
- when displaying data, to map it into formats that are meaningful for those viewing the data (as opposed to those publishing it)
- when inputting data, to turn entered values into representations in a consistent format
The process of parsing the string value of a cell into a single value or a list of values is as follows:
What should be the mapping of an empty cell?
- unless the datatype is string or anySimpleType or any, strip leading and trailing whitespace from the value
- if the value is the same as the null value, then the value is null
- if the separator property is not null, create a list of values by splitting the string at the character specified by the separator property
- validate the value(s) against the format, if one is specified, as described below; raise an error if any of the values do not match the specified format
- parse the value(s) using the format, as described below
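The first three steps of this process can be sketched in Python. The function name parse_cell and its parameters are assumptions for illustration, and the format-based validation and parsing of the last two steps are deliberately left out.

```python
def parse_cell(string_value, datatype="string", null="", separator=None):
    """Apply whitespace stripping, null detection and list splitting to a cell.

    `null` may be a single string or a list of strings, mirroring the
    null property. Returns None, a string, or a list of strings.
    """
    value = string_value

    # Step 1: strip whitespace unless the datatype is a plain string type
    if datatype not in ("string", "anySimpleType", "any"):
        value = value.strip()

    # Step 2: a value equal to (one of) the null value(s) becomes null
    null_values = [null] if isinstance(null, str) else list(null)
    if value in null_values:
        return None

    # Step 3: split into a list when a separator is specified
    if separator is not None:
        return value.split(separator)

    return value
```

Under these assumptions, parse_cell("  42 ", datatype="integer") yields the string "42", ready to be validated and parsed against the format, while the same input with datatype "string" keeps its surrounding whitespace.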
3.12.1 Formats for strings
If the datatype is a string type, the format property provides a regular expression for the string values, in the syntax defined by [ECMASCRIPT].
We invite comment about which reference to use for regular expression syntax. Other possibilities are to use that defined by XML Schema or XPath.
3.12.2 Formats for numeric types
It is not uncommon for numbers within tabular data to be formatted for human consumption, which may involve using commas for decimal points, grouping digits in the number using commas, or adding currency symbols or percent signs to the number.
If the datatype is a numeric type, the format property indicates the expected format for that number. Validators MUST check that the numbers in the column adhere to the specified format. Converters MUST use the format property to parse the number when mapping it into a suitable type in the target language of the conversion.
When the datatype is a numeric type, the format property's value MUST be a number format as specified in [xslt-21].
We invite comment on the best format to specify how to parse numbers.
Register of recognised date-time picture string formats.
3.12.3 Formats for booleans
Boolean values may be represented in many ways aside from the standard 1 and 0 or true and false.
If the datatype is boolean, the format property provides the true and false values expected, separated by |. For example if format is Y|N then cells must hold either Y or N with Y meaning true and N meaning false.
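A minimal Python sketch of this rule follows. The function name parse_boolean is hypothetical, and accepting true, false, 1 and 0 when no format is given is an assumption based on the standard representations mentioned above.

```python
def parse_boolean(string_value, fmt=None):
    """Parse a boolean cell using a format such as "Y|N"."""
    if fmt is None:
        # Assumed defaults: the standard representations mentioned above
        true_values, false_values = ("true", "1"), ("false", "0")
    else:
        # The format lists the true value, then the false value, separated by |
        true_value, false_value = fmt.split("|", 1)
        true_values, false_values = (true_value,), (false_value,)

    if string_value in true_values:
        return True
    if string_value in false_values:
        return False
    raise ValueError("%r does not match boolean format %r" % (string_value, fmt))
```

With the format Y|N, a cell holding "true" raises an error in this sketch, because only Y and N are expected once a format is specified.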
3.12.4 Formats for dates and times
Dates and times are commonly represented in tabular data in formats other than those defined in [xmlschema-2].
If the datatype is a date or time type, the format property indicates the expected format for that date or time. Validators MUST check that the dates or times in the column adhere to the specified format. Converters MUST use the format property to parse the date or time when mapping it into a suitable type in the target language of the conversion.
When the datatype is a date or time type, the format property's value MUST be a date/time format as specified in [xslt-21].
We invite comment on which format to use when parsing dates and times.
3.12.5 Formats for durations
We invite comment on whether there are standard formats to use when parsing durations.
4. Processing Tables
This section describes how particular types of applications should use the metadata supplied about a CSV file when they process that CSV file.
4.1 Displaying Tables
We intend to include other sections here about:
- displaying metadata about groups of tables, tables, columns, rows, cells and regions
- what headings to use for columns when displaying tabular content
- how to format values in cells
Much of this is likely to be non-normative. We invite comment on whether it's useful to provide this kind of guidance.
4.1.1 Bidirectional Tables
There are two levels of bidirectionality to consider when displaying tables: the directionality of the table (ie whether the columns should be arranged left-to-right or right-to-left) and the directionality of the content of individual cells.
The table-direction property provides information about the desired display of the table. If table-direction=ltr then the first column SHOULD be displayed on the left and the last column on the right. If table-direction=rtl then the first column SHOULD be displayed on the right and the last column on the left.
If table-direction=default then tables SHOULD be displayed with attention to the bidirectionality of the content of the file. Specifically, the values of the cells in the table should be scanned breadth first: from the first cell in the first column through to the last cell in the first column, down to the last cell in the last column. If the first character in the table with a strong type as defined in [UNICODE-BIDI] indicates a RTL directionality, the table should be displayed with the first column on the right and the last column on the left. Otherwise, the table should be displayed with the first column on the left and the last column on the right. Characters such as whitespace, quotes, commas and numbers do not have a strong type, and therefore are skipped when identifying the character that determines the directionality of the table.
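The scan for table-direction=default can be sketched in Python using the standard unicodedata module. The function name table_direction and the representation of the table as a list of columns are assumptions for illustration.

```python
import unicodedata


def table_direction(columns):
    """Determine display direction from the first strongly typed character.

    `columns` is a list of columns, each a list of cell strings, so the
    scan runs down the first column before moving on to the next one.
    """
    for column in columns:
        for cell in column:
            for char in cell:
                bidi_class = unicodedata.bidirectional(char)
                if bidi_class in ("R", "AL"):
                    return "rtl"   # strong right-to-left (eg Hebrew, Arabic)
                if bidi_class == "L":
                    return "ltr"   # strong left-to-right
                # Digits, whitespace and punctuation have no strong type: skip
    return "ltr"                   # no strong character found
```

A table whose first column holds only numbers is therefore decided by the first letter found in a later column.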
Implementations SHOULD enable user preferences to override the indicated metadata about the directionality of the table.
Once the directionality of the table has been determined, each cell within the table should be considered as a separate paragraph, as defined by the UBA in [UNICODE-BIDI]. The default directionality for the cell is determined by looking at the text-direction property, which is an inherited property.
Thus, as defined by the UBA, if a cell contains no characters with a strong type (if it's a number or date for example) then the way the cell is displayed should be determined by the text-direction property of the cell. However, when the cell contains characters with a strong type (such as letters) then they MUST be displayed according to the Unicode Bidirectional Algorithm as described in [UNICODE-BIDI].
4.2 Validating Tables
We intend to detail how to validate groups of tabular data files against metadata. This would be normative: compliant validators would have to report the errors and warnings that we define. We invite comment on whether this is a useful thing to specify.
4.3 Converting Tables
Conversions of tabular data to other formats operate over an annotated table constructed as defined in section 2. Annotating Tables. The mechanics of these conversions to other formats are defined in other specifications.
Conversion specifications MUST define a default mapping from an annotated table that lacks any annotations (ie that is equivalent to an un-annotated table).
Conversion specifications MUST use either the name or the predicateUrl of a column as the basis for naming machine-readable fields in the target format, such as the name of the equivalent element or attribute in XML, property in JSON or property URI in RDF.
Conversion specifications MAY use any of the properties defined in this specification to adjust the mapping of an annotated table into another format.
Conversion specifications MAY define additional properties, not defined in this specification, which are specifically used when converting to the target format of the conversion. For example, a conversion to XML might specify an element-or-attribute property on columns that determines whether a particular column is represented through an element or an attribute in the data.
Conversion specifications SHOULD define format-specific properties that specify external processing steps, to give more control to people defining conversions. If such properties are defined, the conversion specification MUST specify at what point in the processing this external processing takes place, and what it operates on. Examples might be:
- the URL of an XSLT file that is used to process XML after it is generated
- a string containing a SPARQL CONSTRUCT pattern that is executed on RDF after it is generated
- properties that contain definitions of JavaScript callback functions that are used when processing particular columns or individual rows
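The requirement to state when external processing runs, and on what, can be pictured with a small sketch. Everything here is hypothetical: the `convert` function and its `row_callback` and `post_process` parameters stand in for whatever format-specific properties a conversion specification might define, and the two stages shown are just one possible arrangement.

```python
def convert(rows, field_names, row_callback=None, post_process=None):
    """Convert rows to a list of objects with two optional external steps.

    row_callback runs once per row, on the object generated for that row.
    post_process runs once, on the complete list of generated objects.
    """
    objects = []
    for row in rows:
        obj = dict(zip(field_names, row))
        if row_callback is not None:
            obj = row_callback(obj)  # stage 1: per-row processing
        objects.append(obj)
    if post_process is not None:
        objects = post_process(objects)  # stage 2: whole-output processing
    return objects
```

The per-row hook corresponds loosely to the callback example in the list above, and the whole-output hook to the XSLT and SPARQL CONSTRUCT examples, which run after the output has been generated.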
A. Acknowledgements
This document is largely a copy of content from the Data Package specification and the JSON Table Schema, which are maintained as part of Data Protocols. Particular contributors to that work are Rufus Pollock, Paul Fitzpatrick, Andrew Berkeley, Francis Irving, Benoit Chesneau, Leigh Dodds, Martin Keegan, and Gunnlaugur Thor Briem.
B. IANA Considerations
B.1 Registration of application/csvm+json
We intend to include a registration for a new media type, namely application/csvm+json. We invite comment about how to indicate that this is consistent with application/ld+json, or whether we should just use application/json or application/ld+json and not create a specific media type for the metadata files defined in this document.
C. Security Considerations
TODO: General CSV security considerations.
D. JSON-LD Context
The JSON-LD context, located at https://www.w3.org/ns/csvw.jsonld, is used with metadata documents.
{
"@context": {
"id": "@id",
"type": "@type",
"dc:title": {
"@container": "@language"
},
"dc:description": {
"@container": "@language"
},
"rdfs:comment": {
"@container": "@language"
},
"rdfs:domain": {
"@type": "@id"
},
"rdfs:label": {
"@container": "@language"
},
"rdfs:range": {
"@type": "@id"
},
"rdfs:subClassOf": {
"@type": "@id"
},
"rdfs:subPropertyOf": {
"@type": "@id"
},
"owl:equivalentClass": {
"@type": "@vocab"
},
"owl:equivalentProperty": {
"@type": "@vocab"
},
"owl:oneOf": {
"@container": "@list",
"@type": "@vocab"
},
"owl:imports": {
"@type": "@id"
},
"owl:versionInfo": {
"@type": "xsd:string",
"@language": null
},
"owl:inverseOf": {
"@type": "@vocab"
},
"owl:unionOf": {
"@type": "@vocab",
"@container": "@list"
},
"rdfs_classes": {
"@reverse": "rdfs:isDefinedBy",
"@type": "@id"
},
"rdfs_properties": {
"@reverse": "rdfs:isDefinedBy",
"@type": "@id"
},
"rdfs_datatypes": {
"@reverse": "rdfs:isDefinedBy",
"@type": "@id"
},
"rdfs_instances": {
"@reverse": "rdfs:isDefinedBy",
"@type": "@id"
},
"cc": "https://creativecommons.org/ns#",
"csvw": "https://www.w3.org/ns/csvw#",
"ctag": "https://commontag.org/ns#",
"dc": "https://purl.org/dc/terms/",
"dc11": "https://purl.org/dc/elements/1.1/",
"dcat": "https://www.w3.org/ns/dcat#",
"dcterms": "https://purl.org/dc/terms/",
"foaf": "https://xmlns.com/foaf/0.1/",
"gr": "https://purl.org/goodrelations/v1#",
"grddl": "https://www.w3.org/2003/g/data-view#",
"ical": "https://www.w3.org/2002/12/cal/icaltzd#",
"ma": "https://www.w3.org/ns/ma-ont#",
"og": "https://ogp.me/ns#",
"org": "https://www.w3.org/ns/org#",
"owl": "https://www.w3.org/2002/07/owl#",
"prov": "https://www.w3.org/ns/prov#",
"qb": "https://purl.org/linked-data/cube#",
"rdf": "https://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfa": "https://www.w3.org/ns/rdfa#",
"rdfs": "https://www.w3.org/2000/01/rdf-schema#",
"rev": "https://purl.org/stuff/rev#",
"rif": "https://www.w3.org/2007/rif#",
"rr": "https://www.w3.org/ns/r2rml#",
"schema": {
"@id": "csvw:schema",
"@type": "@id"
},
"sd": "https://www.w3.org/ns/sparql-service-description#",
"sioc": "https://rdfs.org/sioc/ns#",
"skos": "https://www.w3.org/2004/02/skos/core#",
"skosxl": "https://www.w3.org/2008/05/skos-xl#",
"v": "https://rdf.data-vocabulary.org/#",
"vcard": "https://www.w3.org/2006/vcard/ns#",
"void": "https://rdfs.org/ns/void#",
"wdr": "https://www.w3.org/2007/05/powder#",
"wrds": "https://www.w3.org/2007/05/powder-s#",
"xhv": "https://www.w3.org/1999/xhtml/vocab#",
"xml": "rdf:XMLLiteral",
"xsd": "https://www.w3.org/2001/XMLSchema#",
"any": "xsd:anySimpleType",
"binary": "xsd:base64Binary",
"datetime": "xsd:dateTime",
"describedby": "wrds:describedby",
"html": "rdf:HTML",
"license": "xhv:license",
"maximum": "csvw:maxInclusive",
"minimum": "csvw:minInclusive",
"number": "xsd:double",
"role": "xhv:role",
"Column": "csvw:Column",
"Dialect": "csvw:Dialect",
"Direction": "csvw:Direction",
"Schema": "csvw:Schema",
"Table": "csvw:Table",
"TableGroup": "csvw:TableGroup",
"Template": "csvw:Template",
"columns": {
"@id": "csvw:columns",
"@type": "@id",
"@container": "@list"
},
"commentPrefix": {
"@id": "csvw:commentPrefix"
},
"datatype": {
"@id": "csvw:datatype"
},
"default": {
"@id": "csvw:default"
},
"delimiter": {
"@id": "csvw:delimiter"
},
"dialect": {
"@id": "csvw:dialect",
"@type": "@id"
},
"doubleQuote": {
"@id": "csvw:doubleQuote",
"@type": "xsd:boolean"
},
"encoding": {
"@id": "csvw:encoding"
},
"foreignKeys": {
"@id": "csvw:foreignKeys"
},
"format": {
"@id": "csvw:format"
},
"header": {
"@id": "csvw:header",
"@type": "xsd:boolean"
},
"headerColumnCount": {
"@id": "csvw:headerColumnCount",
"@type": "xsd:nonNegativeInteger"
},
"headerRowCount": {
"@id": "csvw:headerRowCount",
"@type": "xsd:nonNegativeInteger"
},
"language": {
"@id": "csvw:language"
},
"length": {
"@id": "csvw:length",
"@type": "xsd:nonNegativeInteger"
},
"lineTerminator": {
"@id": "csvw:lineTerminator"
},
"maxExclusive": {
"@id": "csvw:maxExclusive"
},
"maxInclusive": {
"@id": "csvw:maxInclusive"
},
"maxLength": {
"@id": "csvw:maxLength",
"@type": "xsd:nonNegativeInteger"
},
"minExclusive": {
"@id": "csvw:minExclusive"
},
"minInclusive": {
"@id": "csvw:minInclusive"
},
"minLength": {
"@id": "csvw:minLength",
"@type": "xsd:nonNegativeInteger"
},
"name": {
"@id": "csvw:name"
},
"notes": {
"@id": "csvw:notes"
},
"null": {
"@id": "csvw:null"
},
"predicateUrl": {
"@id": "csvw:predicateUrl",
"@type": "xsd:anyURI"
},
"primaryKey": {
"@id": "csvw:primaryKey"
},
"quoteChar": {
"@id": "csvw:quoteChar"
},
"required": {
"@id": "csvw:required",
"@type": "xsd:boolean"
},
"resources": {
"@id": "csvw:resources",
"@type": "@id",
"@container": "@set"
},
"row": {
"@id": "csvw:row",
"@container": "@set"
},
"separator": {
"@id": "csvw:separator"
},
"skipBlankRows": {
"@id": "csvw:skipBlankRows",
"@type": "xsd:boolean"
},
"skipColumns": {
"@id": "csvw:skipColumns",
"@type": "xsd:nonNegativeInteger"
},
"skipInitialSpace": {
"@id": "csvw:skipInitialSpace",
"@type": "xsd:boolean"
},
"skipRows": {
"@id": "csvw:skipRows",
"@type": "xsd:nonNegativeInteger"
},
"source": {
"@id": "csvw:source"
},
"table": {
"@id": "csvw:table",
"@type": "@id",
"@container": "@set"
},
"table-direction": {
"@id": "csvw:table-direction",
"@type": "@vocab"
},
"targetFormat": {
"@id": "csvw:targetFormat"
},
"templateFormat": {
"@id": "csvw:templateFormat"
},
"templates": {
"@id": "csvw:templates",
"@type": "@id"
},
"text-direction": {
"@id": "csvw:text-direction",
"@type": "@vocab"
},
"title": {
"@id": "csvw:title",
"@container": "@language"
},
"trim": {
"@id": "csvw:trim",
"@type": "xsd:boolean"
},
"uriTemplate": {
"@id": "csvw:uriTemplate"
},
"json": "csvw:json"
},
"@id": "https://www.w3.org/ns/csvw#",
"@type": "owl:Ontology",
"dc:title": {
"en": "Metadata Vocabulary for Tabular Data"
},
"dc:description": {
"en": "Validation, conversion, display and search of tabular data on the web\n requires additional metadata that describes how the data should be\n interpreted. This document defines a vocabulary for metadata that\n annotates tabular data. This can be used to provide metadata at various\n levels, from collections of data from CSV documents and how they relate\n to each other down to individual cells within a table."
},
"rdfs_classes": [
{
"@id": "csvw:Column",
"@type": "rdfs:Class",
"rdfs:label": {
"en": "Column Description"
},
"rdfs:comment": {
"en": "A Column Description describes a single column."
}
},
{
"@id": "csvw:Dialect",
"@type": "rdfs:Class",
"rdfs:label": {
"en": "Dialect Description"
},
"rdfs:comment": {
"en": "A Dialect Description provides hints to parsers about how to parse a linked file."
}
},
{
"@id": "csvw:Direction",
"@type": "rdfs:Class",
"rdfs:label": {
"en": "Direction"
},
"rdfs:comment": {
"en": "The class of table/text directions."
}
},
{
"@id": "csvw:Schema",
"@type": "rdfs:Class",
"rdfs:label": {
"en": "Schema"
},
"rdfs:comment": {
"en": "A Schema is a definition of a tabular format that may be common to multiple tables."
}
},
{
"@id": "csvw:Table",
"@type": "rdfs:Class",
"rdfs:label": {
"en": "Table Description"
},
"rdfs:comment": {
"en": "A table description is a JSON object that describes a table within a CSV file."
}
},
{
"@id": "csvw:TableGroup",
"@type": "rdfs:Class",
"rdfs:label": {
"en": "Table Group Description"
},
"rdfs:comment": {
"en": "A Table Group Description describes a group of Tables."
}
},
{
"@id": "csvw:Template",
"@type": "rdfs:Class",
"rdfs:label": {
"en": "Template Specification"
},
"rdfs:comment": {
"en": "A Template Specification is a definition of how tabular data can be transformed into another format."
}
}
],
"rdfs_properties": [
{
"@id": "csvw:columns",
"@type": "rdf:Property",
"rdfs:label": {
"en": "columns"
},
"rdfs:comment": {
"en": "An array of Column Descriptions."
},
"rdfs:domain": "csvw:Schema",
"rdfs:range": "csvw:Column"
},
{
"@id": "csvw:commentPrefix",
"@type": "rdf:Property",
"rdfs:label": {
"en": "comment prefix"
},
"rdfs:comment": {
"en": "A character that, when it appears at the beginning of a skipped row, indicates a comment that should be associated as a comment annotation to the table. The default is \"#\"."
},
"rdfs:domain": "csvw:Dialect"
},
{
"@id": "csvw:datatype",
"@type": "rdf:Property",
"rdfs:label": {
"en": "datatype"
},
"rdfs:comment": {
"en": "The main datatype of the values of the cell. If the cell contains a list (ie separator is specified and not null) then this is the datatype of each value within the list."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:default",
"@type": "rdf:Property",
"rdfs:label": {
"en": "default"
},
"rdfs:comment": {
"en": "An atomic property holding a single string that provides a default string value for the cell in cases where the original string value is a null value. This default value may be used when converting the table into other formats."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:delimiter",
"@type": "rdf:Property",
"rdfs:label": {
"en": "delimiter"
},
"rdfs:comment": {
"en": "The separator between cells. The default is \",\"."
},
"rdfs:domain": "csvw:Dialect"
},
{
"@id": "csvw:dialect",
"@type": "rdf:Property",
"rdfs:label": {
"en": "dialect"
},
"rdfs:comment": {
"en": "Provides hints to processors about how to parse the referenced files to create tabular data models for an individual table, or all the tables in a group."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table"
]
},
"rdfs:range": "csvw:Dialect"
},
{
"@id": "csvw:doubleQuote",
"@type": "rdf:Property",
"rdfs:label": {
"en": "double quote"
},
"rdfs:comment": {
"en": "If true, sets the escape character flag to \". If false, to \\\\."
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:boolean"
},
{
"@id": "csvw:encoding",
"@type": "rdf:Property",
"rdfs:label": {
"en": "encoding"
},
"rdfs:comment": {
"en": "The character encoding for the file, one of the encodings listed in [encoding]. The default is utf-8."
},
"rdfs:domain": "csvw:Dialect"
},
{
"@id": "csvw:foreignKeys",
"@type": "rdf:Property",
"rdfs:label": {
"en": "foreign keys"
},
"rdfs:comment": {
"en": "An array of foreign key definitions that define how the values from specified columns within this table link to rows within this table or other tables."
},
"rdfs:domain": "csvw:Schema"
},
{
"@id": "csvw:format",
"@type": "rdf:Property",
"rdfs:label": {
"en": "format"
},
"rdfs:comment": {
"en": "A definition of the format of the cell, used when parsing the cell."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:header",
"@type": "rdf:Property",
"rdfs:label": {
"en": "header"
},
"rdfs:comment": {
"en": ""
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:boolean"
},
{
"@id": "csvw:headerColumnCount",
"@type": "rdf:Property",
"rdfs:label": {
"en": "header column count"
},
"rdfs:comment": {
"en": "The number of header columns (following the skipped columns) in each row. The default is 0.\n"
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:nonNegativeInteger"
},
{
"@id": "csvw:headerRowCount",
"@type": "rdf:Property",
"rdfs:label": {
"en": "header row count"
},
"rdfs:comment": {
"en": "The number of header rows (following the skipped rows) in the file. The default is 1."
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:nonNegativeInteger"
},
{
"@id": "csvw:language",
"@type": "rdf:Property",
"rdfs:label": {
"en": "language"
},
"rdfs:comment": {
"en": "A language code as defined by [BCP47]. Indicates the language of the value within the cell."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:length",
"@type": "rdf:Property",
"rdfs:label": {
"en": "length"
},
"rdfs:comment": {
"en": "The exact length of the value of the cell."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
},
"rdfs:range": "xsd:nonNegativeInteger"
},
{
"@id": "csvw:lineTerminator",
"@type": "rdf:Property",
"rdfs:label": {
"en": "line terminator"
},
"rdfs:comment": {
"en": "The character that is used at the end of a row. The default is CRLF."
},
"rdfs:domain": "csvw:Dialect"
},
{
"@id": "csvw:maxExclusive",
"@type": "rdf:Property",
"rdfs:label": {
"en": "max exclusive"
},
"rdfs:comment": {
"en": "The maximum value for the cell (exclusive)."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:maxInclusive",
"@type": "rdf:Property",
"rdfs:label": {
"en": "max inclusive"
},
"rdfs:comment": {
"en": "The maximum value for the cell (inclusive). "
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:maxLength",
"@type": "rdf:Property",
"rdfs:label": {
"en": "max length"
},
"rdfs:comment": {
"en": "The maximum length of the value of the cell."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
},
"rdfs:range": "xsd:nonNegativeInteger"
},
{
"@id": "csvw:minExclusive",
"@type": "rdf:Property",
"rdfs:label": {
"en": "min exclusive"
},
"rdfs:comment": {
"en": "The minimum value for the cell (exclusive)."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:minInclusive",
"@type": "rdf:Property",
"rdfs:label": {
"en": "min inclusive"
},
"rdfs:comment": {
"en": "The minimum value for the cell (inclusive)."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:minLength",
"@type": "rdf:Property",
"rdfs:label": {
"en": "min length"
},
"rdfs:comment": {
"en": "The minimum length of the value of the cell."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
},
"rdfs:range": "xsd:nonNegativeInteger"
},
{
"@id": "csvw:name",
"@type": "rdf:Property",
"rdfs:label": {
"en": "name"
},
"rdfs:comment": {
"en": "An atomic property that gives a canonical name for the column. This must be a string. Conversion specifications must use this property as the basis for the names of properties/elements/attributes in the results of conversions."
},
"rdfs:domain": "csvw:Column"
},
{
"@id": "csvw:notes",
"@type": "rdf:Property",
"rdfs:label": {
"en": "notes"
},
"rdfs:comment": {
"en": "An array of objects representing annotations. This specification does not place any constraints on the structure of these objects."
},
"rdfs:domain": "csvw:Table"
},
{
"@id": "csvw:null",
"@type": "rdf:Property",
"rdfs:label": {
"en": "null"
},
"rdfs:comment": {
"en": "The string used for null values. If not specified, the default for this is the empty string."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:predicateUrl",
"@type": "rdf:Property",
"rdfs:label": {
"en": "predicate URL"
},
"rdfs:comment": {
"en": "An atomic property that holds one or more URIs that may be used as URIs for predicates if the table is mapped to another format."
},
"rdfs:domain": "csvw:Column",
"rdfs:range": "xsd:anyURI"
},
{
"@id": "csvw:primaryKey",
"@type": "rdf:Property",
"rdfs:label": {
"en": "primary key"
},
"rdfs:comment": {
"en": "A column reference property that holds either a single reference to a column description object or an array of references."
},
"rdfs:domain": "csvw:Schema"
},
{
"@id": "csvw:quoteChar",
"@type": "rdf:Property",
"rdfs:label": {
"en": "quote char"
},
"rdfs:comment": {
"en": "The character that is used around escaped cells."
},
"rdfs:domain": "csvw:Dialect"
},
{
"@id": "csvw:required",
"@type": "rdf:Property",
"rdfs:label": {
"en": "required"
},
"rdfs:comment": {
"en": "A boolean value which indicates whether every cell within the column must have a non-null value."
},
"rdfs:domain": "csvw:Column",
"rdfs:range": "xsd:boolean"
},
{
"@id": "csvw:resources",
"@type": "rdf:Property",
"rdfs:label": {
"en": "resources"
},
"rdfs:comment": {
"en": "An array of table descriptions for the tables in the group."
},
"rdfs:domain": "csvw:TableGroup",
"rdfs:range": "csvw:Table"
},
{
"@id": "csvw:row",
"@type": "rdf:Property",
"rdfs:label": {
"en": "row"
},
"rdfs:comment": {
"en": "Relates a Table to each Row output."
},
"rdfs:subPropertyOf": "rdfs:member",
"rdfs:domain": "csvw:Table"
},
{
"@id": "csvw:schema",
"@type": "rdf:Property",
"rdfs:label": {
"en": "schema"
},
"rdfs:comment": {
"en": "An object property that provides a schema description for an individual table, or all the tables in a group."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table"
]
},
"rdfs:range": "csvw:Schema"
},
{
"@id": "csvw:separator",
"@type": "rdf:Property",
"rdfs:label": {
"en": "separator"
},
"rdfs:comment": {
"en": "The character used to separate items in the string value of the cell."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
}
},
{
"@id": "csvw:skipBlankRows",
"@type": "rdf:Property",
"rdfs:label": {
"en": "skip blank rows"
},
"rdfs:comment": {
"en": "Indicates whether to ignore wholly empty rows (ie rows in which all the cells are empty). The default is false."
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:boolean"
},
{
"@id": "csvw:skipColumns",
"@type": "rdf:Property",
"rdfs:label": {
"en": "skip columns"
},
"rdfs:comment": {
"en": "The number of columns to skip at the beginning of each row, before any header columns. The default is 0."
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:nonNegativeInteger"
},
{
"@id": "csvw:skipInitialSpace",
"@type": "rdf:Property",
"rdfs:label": {
"en": "skip initial space"
},
"rdfs:comment": {
"en": "If true, sets the trim flag to \"start\". If false, to false."
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:boolean"
},
{
"@id": "csvw:skipRows",
"@type": "rdf:Property",
"rdfs:label": {
"en": "skip rows"
},
"rdfs:comment": {
"en": "The number of rows to skip at the beginning of the file, before a header row or tabular data."
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:nonNegativeInteger"
},
{
"@id": "csvw:source",
"@type": "rdf:Property",
"rdfs:label": {
"en": "source"
},
"rdfs:comment": {
"en": "The format to which the tabular data should be transformed prior to the transformation using the template. If the value is \"json\", the tabular data should first be transformed to JSON based on the simple mapping defined in Generating JSON from Tabular Data on the Web. If the value is \"rdf\", it should similarly first be transformed to RDF based on the simple mapping defined in Generating RDF from Tabular Data on the Web."
},
"rdfs:domain": "csvw:Template"
},
{
"@id": "csvw:table",
"@type": "rdf:Property",
"rdfs:label": {
"en": "table"
},
"rdfs:comment": {
"en": "Relates a Table Group to annotated tables. (Note, this is different from csvw:resources, which relates metadata, rather than resulting annotated table descriptions.)"
},
"rdfs:subPropertyOf": "rdfs:member",
"rdfs:domain": "csvw:TableGroup",
"rdfs:range": "csvw:Table"
},
{
"@id": "csvw:table-direction",
"@type": "rdf:Property",
"rdfs:label": {
"en": "table direction"
},
"rdfs:comment": {
"en": "One of csvw:rtl, csvw:ltr or csvw:default. Indicates whether the tables in the group should be displayed with the first column on the right, on the left, or based on the first character in the table that has a specific direction."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table"
]
},
"rdfs:range": "csvw:Direction"
},
{
"@id": "csvw:targetFormat",
"@type": "rdf:Property",
"rdfs:label": {
"en": "target format"
},
"rdfs:comment": {
"en": "A URL for the format that will be created through the transformation. If one has been defined, this should be a URL for a media type, in the form https://www.iana.org/assignments/media-types/media-type such as https://www.iana.org/assignments/media-types/text/calendar. Otherwise, it can be any URL that describes the target format."
},
"rdfs:domain": "csvw:Template"
},
{
"@id": "csvw:templateFormat",
"@type": "rdf:Property",
"rdfs:label": {
"en": "template format"
},
"rdfs:comment": {
"en": "A URL for the format that is used by the template. If one has been defined, this should be a URL for a media type, in the form https://www.iana.org/assignments/media-types/media-type such as https://www.iana.org/assignments/media-types/application/javascript. Otherwise, it can be any URL that describes the template format."
},
"rdfs:domain": "csvw:Template"
},
{
"@id": "csvw:templates",
"@type": "rdf:Property",
"rdfs:label": {
"en": "templates"
},
"rdfs:comment": {
"en": "An array of template specifications that provide mechanisms to transform the tabular data into other formats. "
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table"
]
},
"rdfs:range": "csvw:Template"
},
{
"@id": "csvw:text-direction",
"@type": "rdf:Property",
"rdfs:label": {
"en": "text direction"
},
"rdfs:comment": {
"en": "One of csvw:rtl or csvw:ltr. Indicates whether the text within cells should be displayed by default as left-to-right or right-to-left text. "
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:TableGroup",
"csvw:Table",
"csvw:Schema",
"csvw:Column"
]
},
"rdfs:range": "csvw:Direction"
},
{
"@id": "csvw:title",
"@type": "rdf:Property",
"rdfs:label": {
"en": "title"
},
"rdfs:comment": {
"en": "For a Template: A natural language property that describes the format that will be generated from the transformation. This is useful if the target format is a generic format (such as application/json) and the transformation is creating a specific profile of that format.\n\nFor a Column: A natural language property that provides possible alternative names for the column."
},
"rdfs:domain": {
"owl:unionOf": [
"csvw:Template",
"csvw:Column"
]
}
},
{
"@id": "csvw:trim",
"@type": "rdf:Property",
"rdfs:label": {
"en": "trim"
},
"rdfs:comment": {
"en": "Indicates whether to trim whitespace around cells; may be true, false, start or end. The default is false."
},
"rdfs:domain": "csvw:Dialect",
"rdfs:range": "xsd:boolean"
},
{
"@id": "csvw:uriTemplate",
"@type": "rdf:Property",
"rdfs:label": {
"en": "uri template"
},
"rdfs:comment": {
"en": "A URI template property that may be used to create a unique identifier for each row when mapping data to other formats."
},
"rdfs:domain": "csvw:Schema"
}
],
"rdfs_datatypes": [
{
"@id": "csvw:json",
"@type": "rdfs:Datatype",
"rdfs:label": {
"en": "json"
},
"rdfs:comment": {
"en": "A literal containing JSON."
},
"rdfs:subClassOf": "rdfs:Literal"
}
],
"rdfs_instances": [
{
"@id": "csvw:ltr",
"@type": "Direction",
"rdfs:label": {
"en": "left to right"
},
"rdfs:comment": {
"en": "Indicates text should be processed left to right."
}
},
{
"@id": "csvw:rtl",
"@type": "Direction",
"rdfs:label": {
"en": "right to left"
},
"rdfs:comment": {
"en": "Indicates text should be processed right to left."
}
}
]
}
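To show how the terms in this context relate to full IRIs, the sketch below performs a simplified, single-step expansion of a term or compact IRI against a small excerpt of the context. It is not a JSON-LD processor and omits most of the expansion algorithm in [JSON-LD]; the function name `expand_term` and the `CONTEXT_EXCERPT` variable are hypothetical, and the excerpt copies only a handful of entries from the context above.

```python
# A few entries copied from the context above, for illustration only.
CONTEXT_EXCERPT = {
    "csvw": "https://www.w3.org/ns/csvw#",
    "xsd": "https://www.w3.org/2001/XMLSchema#",
    "Column": "csvw:Column",
    "number": "xsd:double",
    "columns": {"@id": "csvw:columns", "@type": "@id", "@container": "@list"},
}


def expand_term(term, context):
    """Expand a term or compact IRI to a full IRI using prefix definitions.

    Only one level of prefix lookup is performed; keywords, absolute
    IRIs and unknown terms are returned unchanged.
    """
    value = context.get(term, term)
    if isinstance(value, dict):
        # Expanded term definition: the IRI is under "@id".
        value = value.get("@id", term)
    prefix, separator, suffix = value.partition(":")
    mapping = context.get(prefix)
    if separator and isinstance(mapping, str) and not suffix.startswith("//"):
        return mapping + suffix
    return value
```

With this excerpt, the term `columns` resolves through `csvw:columns` to an IRI in the csvw namespace, and `number` resolves through `xsd:double` to an IRI in the XML Schema namespace.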
E. References
E.1 Normative references
- [BCP47]
- A. Phillips; M. Davis. Tags for Identifying Languages. September 2009. IETF Best Current Practice. URL: https://tools.ietf.org/html/bcp47
- [ECMASCRIPT]
- Allen Wirfs-Brock. ECMA-262 ECMAScript Language Specification, Edition 6. Draft. URL: https://people.mozilla.org/~jorendorff/es6-draft.html
- [UNICODE-BIDI]
- Mark Davis; Aharon Lanin; Andrew Glass. TR9, Unicode Bidirectional Algorithm. Report. URL: https://unicode.org/reports/tr9/
- [URI-TEMPLATE]
- Joe Gregorio; Roy T. Fielding; Marc Hadley; Mark Nottingham; David Orchard. URI Template. March 2012. RFC 6570. URL: https://www.rfc-editor.org/rfc/rfc6570.txt
- [encoding]
- Anne van Kesteren; Joshua Bell; Addison Phillips. Encoding. 16 September 2014. W3C Candidate Recommendation. URL: https://www.w3.org/TR/encoding/
- [tabular-data-model]
- Jeni Tennison; Gregg Kellogg. Model for Tabular Data and Metadata on the Web. W3C Working Draft. URL: https://www.w3.org/TR/2015/WD-tabular-data-model-20150108/
- [xmlschema-2]
- Paul V. Biron; Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. 28 October 2004. W3C Recommendation. URL: https://www.w3.org/TR/xmlschema-2/
- [xslt-21]
- Michael Kay. XSL Transformations (XSLT) Version 3.0. 2 October 2014. W3C Last Call Working Draft. URL: https://www.w3.org/TR/xslt-30/
E.2 Informative references
- [JSON-LD]
- Manu Sporny; Gregg Kellogg; Markus Lanthaler. JSON-LD 1.0. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/json-ld/
- [rdfa-core]
- Ben Adida; Mark Birbeck; Shane McCarron; Ivan Herman et al. RDFa Core 1.1 - Third Edition. 16 December 2014. W3C Proposed Edited Recommendation. URL: https://www.w3.org/TR/rdfa-core/