TODO in the end
This is an unofficial proposal.
Introduction
The Concise Binary Object Representation (CBOR) [[RFC7049]] is a data format for constrained environments. It is designed to be processable with small code and to produce small messages. Its underlying data model is an extension of the JSON data model [[RFC8259]]. This makes it a candidate format for transporting JSON-LD documents in constrained contexts such as the Web of Things. It also allows JSON-LD contexts to be used to add a layer of semantics to existing CBOR data.
More specifically, this note specifies how to convert between the CBOR data format and the JSON-LD internal representation (a straightforward abstraction of the JSON data model). Since CBOR emphasizes small messages, we also propose a number of techniques for reducing message size.
To understand this note, you must be familiar with both the JSON format [[RFC8259]] and the CBOR format [[RFC7049]]. You must also understand the basics of JSON-LD 1.1 [[JSON-LD11]].
Serializing JSON-LD to CBOR
Although the CBOR data model is based on the JSON data model, the CBOR specification does not define a standard conversion between the two, partly because the CBOR model is an extension of the JSON model. In this section, we provide a complete specification of such a conversion, following the guidelines of §4.2 of RFC7049 and taking into account the particularities of JSON-LD.
- Null and boolean values are serialized with major type 7, and additional type value 22 for `null`, 20 for `false` and 21 for `true`.
- Numbers without a fractional part (integer numbers) are serialized as CBOR integers (major types 0 and 1), choosing the shortest form.
- Numbers with a fractional part are serialized as CBOR floating-point values (major type 7, and additional type value from 25 to 27). Preferably, the shortest exact floating-point representation is used; for instance, 1.5 is represented in a 16-bit floating-point value (not all implementations will be capable of efficiently finding the minimum form, though).
- Strings are serialized as CBOR UTF-8 text strings (major type 3). Note that, unlike JSON, CBOR does not use escaping in strings, so any escape sequences present in the JSON source must first be replaced by the characters they denote.
- Arrays are serialized as CBOR arrays (major type 4), and their items are serialized by applying these rules recursively.
- JSON objects, which are internally represented as maps in JSON-LD, are generally serialized as CBOR maps (major type 5). All keys are strings and are serialized as above; all values are serialized by applying these rules recursively. There are, however, a few exceptions, described below, which take into account the specific meaning of some objects in JSON-LD:
  - If the object is a value object with a `@type` of `xsd:integer` (TODO full IRI), and if its `@value` is a valid decimal representation of an integer, then the object is serialized as a CBOR integer (major type 0 or 1), choosing the shortest form.
  - If the object is a value object with a `@type` of `xsd:base64Binary` (TODO full IRI), and if its `@value` complies with the lexical space of `xsd:base64Binary` (i.e. it is a valid base64 string [[RFC3548]]), then the object is serialized as a CBOR byte string (major type 2) containing the decoded value.
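The serialization rules above can be illustrated with the following minimal Python sketch. It is a hand-written encoder for illustration, not a reference implementation, and all function names are hypothetical. It assumes that a value object's `@type` may appear either in the compact `xsd:` form or as the full XML Schema IRI (the full IRI is still a TODO above), and it applies the two exceptions only to maps whose entries are exactly `@type` and `@value`, so that entries such as `@index` are not silently dropped. Integers that do not fit in 64 bits are out of scope: a plain integer raises `OverflowError`, while an oversized `xsd:integer` value object is kept as an ordinary map.

```python
import base64
import re
import struct

# Assumption: value-object types may be written either in the compact
# "xsd:" form used by this note or as full XML Schema IRIs.
XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER_TYPES = {"xsd:integer", XSD + "integer"}
BASE64_TYPES = {"xsd:base64Binary", XSD + "base64Binary"}
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")  # lexical space of xsd:integer


def _head(major: int, arg: int) -> bytes:
    """Initial byte plus argument, using the shortest possible form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise OverflowError("argument does not fit in 64 bits")


def _int(n: int) -> bytes:
    # Major type 0 encodes n >= 0; major type 1 encodes -1 - n for n < 0.
    return _head(0, n) if n >= 0 else _head(1, -1 - n)


def _fits_64(n: int) -> bool:
    return -(2**64) <= n < 2**64


def _float(x: float) -> bytes:
    # Try half (25), then single (26) precision; keep the first exact one.
    for info, fmt in ((25, ">e"), (26, ">f")):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:  # too large for this precision
            continue
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([0xE0 | info]) + packed
    return b"\xfb" + struct.pack(">d", x)  # double precision (27)


def _value_object(obj: dict):
    """Return the CBOR encoding of a special value object, or None."""
    if (set(obj) != {"@type", "@value"}
            or not isinstance(obj["@type"], str)
            or not isinstance(obj["@value"], str)):
        return None
    typ, lexical = obj["@type"], obj["@value"]
    if typ in INTEGER_TYPES and INTEGER_LEXICAL.fullmatch(lexical):
        n = int(lexical)
        return _int(n) if _fits_64(n) else None
    if typ in BASE64_TYPES:
        try:
            raw = base64.b64decode(lexical, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            return None
        return _head(2, len(raw)) + raw
    return None


def encode(value) -> bytes:
    """Serialize a JSON-LD internal representation value to CBOR."""
    if value is None:
        return b"\xf6"
    if value is False:
        return b"\xf4"
    if value is True:
        return b"\xf5"
    if isinstance(value, int):
        return _int(value)
    if isinstance(value, float):
        # Numbers without a fractional part become CBOR integers.
        if value.is_integer() and _fits_64(int(value)):
            return _int(int(value))
        return _float(value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        special = _value_object(value)
        if special is not None:
            return special
        entries = b"".join(encode(k) + encode(v) for k, v in value.items())
        return _head(5, len(value)) + entries
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

For instance, `encode({"a": [1, 1.5]})` returns the bytes `a1 61 61 82 01 f9 3e 00`: 1.5 fits exactly in a 16-bit float, so the shortest form is used. No escape handling is needed here because Python strings (e.g. those produced by `json.loads`) are already unescaped.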
Parsing CBOR to JSON-LD
This section describes how to parse CBOR data into the JSON-LD internal representation, following the guidelines of §4.1 of RFC7049. It is the inverse of the process described in Serializing JSON-LD to CBOR.
- A data item with major type 0 or 1 (unsigned or negative integer, respectively) is parsed as the equivalent number. If the magnitude of this number exceeds what the implementation supports, a parser must instead produce a map with two entries:
  - a `@type` entry whose value is `xsd:integer` (TODO full IRI),
  - a `@value` entry whose value is the decimal representation of the integer, as a string.
- A data item with major type 2 (byte string) is parsed as a map with two entries:
  - a `@type` entry whose value is `xsd:base64Binary` (TODO full IRI),
  - a `@value` entry whose value is the base64 encoding of the byte string, complying with the constraints on the lexical space of `xsd:base64Binary`.
- A data item with major type 3 (text string) is parsed as the equivalent string.
- A data item with major type 4 (array) is parsed as an array, whose items are parsed by recursively applying these rules to the items of the CBOR array.
- A data item with major type 5 (map) is parsed as a map, whose keys and values are parsed by recursively applying these rules to the entries of the CBOR map. An entry whose key is not a text string cannot be represented in the resulting map, so the parser must raise an error.
- A data item with major type 7 (simple values and floating-point numbers) is parsed as:
  - the boolean value `false` if the additional type value is 20,
  - the boolean value `true` if the additional type value is 21,
  - `null` if the additional type value is 22,
  - if the additional type value is 25, 26 or 27 (floating-point number):
    - if the value is finite, the data item is parsed as the corresponding number;
    - otherwise the data item represents either NaN, positive infinity or negative infinity, and is parsed as a map with two entries:
      - a `@type` entry whose value is `xsd:double` (TODO full IRI),
      - a `@value` entry whose value is the string `NaN`, `INF` or `-INF`, respectively.
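Conversely, the parsing rules can be sketched as a small recursive decoder in Python. Again, this is an illustration under stated assumptions rather than a normative algorithm, and all names are hypothetical. The `max_int` parameter stands in for "the size supported by the implementation", with an assumed default of `2**53 - 1` (the largest integer a JavaScript number represents exactly). Cases the rules above do not cover, namely tags (major type 6), indefinite-length items and simple values other than 20, 21 and 22, simply raise an error here. Value objects use the compact `xsd:` names because the full IRIs are still a TODO.

```python
import base64
import math
import struct


class CBORDecodeError(ValueError):
    """Raised when CBOR input cannot be mapped to the JSON-LD representation."""


# Floating-point additional type values: struct format and size in bytes.
_FLOATS = {25: (">e", 2), 26: (">f", 4), 27: (">d", 8)}


def decode(data: bytes, max_int: int = 2**53 - 1):
    """Parse one CBOR data item into the JSON-LD internal representation.

    max_int is a hypothetical stand-in for "the size supported by the
    implementation"; larger integers become xsd:integer value objects.
    """
    value, end = _item(data, 0, max_int)
    if end != len(data):
        raise CBORDecodeError("trailing bytes after the data item")
    return value


def _take(data: bytes, pos: int, size: int) -> bytes:
    if pos + size > len(data):
        raise CBORDecodeError("unexpected end of data")
    return data[pos:pos + size]


def _item(data: bytes, pos: int, max_int: int):
    initial = _take(data, pos, 1)[0]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if major == 7:
        return _simple_or_float(data, pos, info)
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        arg = int.from_bytes(_take(data, pos, size), "big")
        pos += size
    else:  # assumption: indefinite lengths are out of scope for this sketch
        raise CBORDecodeError("indefinite or reserved length")
    if major in (0, 1):
        n = arg if major == 0 else -1 - arg
        if abs(n) > max_int:
            return {"@type": "xsd:integer", "@value": str(n)}, pos
        return n, pos
    if major == 2:
        raw = _take(data, pos, arg)
        b64 = base64.b64encode(raw).decode("ascii")
        return {"@type": "xsd:base64Binary", "@value": b64}, pos + arg
    if major == 3:
        return _take(data, pos, arg).decode("utf-8"), pos + arg
    if major == 4:
        items = []
        for _ in range(arg):
            item, pos = _item(data, pos, max_int)
            items.append(item)
        return items, pos
    if major == 5:
        result = {}
        for _ in range(arg):
            key, pos = _item(data, pos, max_int)
            if not isinstance(key, str):
                raise CBORDecodeError("map key is not a text string")
            value, pos = _item(data, pos, max_int)
            result[key] = value
        return result, pos
    raise CBORDecodeError("tags (major type 6) are not handled")  # assumption


def _simple_or_float(data: bytes, pos: int, info: int):
    simple = {20: False, 21: True, 22: None}
    if info in simple:
        return simple[info], pos
    if info not in _FLOATS:
        raise CBORDecodeError(f"unsupported simple value {info}")  # assumption
    fmt, size = _FLOATS[info]
    (x,) = struct.unpack(fmt, _take(data, pos, size))
    pos += size
    if math.isfinite(x):
        return x, pos
    lexical = "NaN" if math.isnan(x) else ("INF" if x > 0 else "-INF")
    return {"@type": "xsd:double", "@value": lexical}, pos
```

With this sketch, `decode(bytes.fromhex("f97c00"))` yields `{"@type": "xsd:double", "@value": "INF"}`, while a map with an integer key such as `a1 01 02` raises `CBORDecodeError`.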
CBOR-specific optimization of JSON-LD data
TODO
JSON-LD semantic tag
TODO propose numeric aliases for keywords, and possibly often used IRIs?
Designing contexts for CBOR
TODO do not alias keywords, do not use type coercion, define small prefixes, more...
The `rdf:CBOR` datatype
The intent of this datatype is to make it possible to convey CBOR data as RDF literals. It is defined as a subset of the `xsd:base64Binary` datatype.
The `rdf:CBOR` datatype is defined as follows:
- The IRI denoting this datatype is `http://www.w3.org/1999/02/22-rdf-syntax-ns#CBOR`.
- The lexical space is the subset of the lexical space of `xsd:base64Binary` for which the lexical mapping produces well-formed CBOR data [[RFC7049]].
- The value space is the set of finite-length sequences of binary octets that are well-formed CBOR data [[RFC7049]].
- The lexical-to-value mapping and the canonical mapping are those defined for the `xsd:base64Binary` datatype [[XSD]].
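Checking membership in the lexical space of `rdf:CBOR` amounts to decoding the base64 text and checking that the resulting octets are well-formed CBOR. The Python sketch below does this with hypothetical helper names. It assumes that "well-formed CBOR data" means exactly one data item with no trailing bytes. It uses strict base64 decoding, which may reject some lexical forms containing spaces that `xsd:base64Binary` allows. It checks only well-formedness (structure), not validity, so, for instance, invalid UTF-8 inside a text string is not detected.

```python
import base64


def _skip_item(data: bytes, pos: int) -> int:
    """Return the offset just after one well-formed data item at pos.

    Raises ValueError if the bytes at pos are not well-formed CBOR.
    """
    if pos >= len(data):
        raise ValueError("unexpected end of data")
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info == 31:
        return _skip_indefinite(data, pos, major)
    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        if pos + size > len(data):
            raise ValueError("truncated argument")
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("reserved additional information value")
    if major in (2, 3):  # byte or text string: skip arg bytes
        if pos + arg > len(data):
            raise ValueError("truncated string")
        return pos + arg
    if major in (4, 5):  # array or map: skip arg items (2 * arg for maps)
        for _ in range(arg if major == 4 else 2 * arg):
            pos = _skip_item(data, pos)
        return pos
    if major == 6:  # tag: followed by exactly one data item
        return _skip_item(data, pos)
    return pos  # major types 0, 1 and 7 carry no further content


def _skip_indefinite(data: bytes, pos: int, major: int) -> int:
    if major not in (2, 3, 4, 5):
        raise ValueError("unexpected break or indefinite length")
    count = 0
    while True:
        if pos >= len(data):
            raise ValueError("missing break stop code")
        if data[pos] == 0xFF:  # break stop code ends the item
            if major == 5 and count % 2:
                raise ValueError("map with a key but no value")
            return pos + 1
        # Indefinite strings may only contain definite chunks of the same type.
        if major in (2, 3) and (data[pos] >> 5 != major or data[pos] & 0x1F == 31):
            raise ValueError("invalid chunk in indefinite-length string")
        pos = _skip_item(data, pos)
        count += 1


def is_well_formed_cbor(data: bytes) -> bool:
    # Assumption: exactly one data item, with no trailing bytes.
    try:
        return _skip_item(data, 0) == len(data)
    except ValueError:
        return False


def is_rdf_cbor_lexical(lexical: str) -> bool:
    """Check membership in the lexical space of rdf:CBOR (sketch)."""
    try:
        raw = base64.b64decode(lexical, validate=True)
    except ValueError:  # includes binascii.Error
        return False
    return is_well_formed_cbor(raw)
```

Note that `a1 01 02` (a map with an integer key) is well-formed CBOR, and its base64 form is therefore in the lexical space of `rdf:CBOR`, even though the parsing rules above reject it when converting to JSON-LD.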