BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
BSON can be compared to binary interchange formats, like Protocol Buffers. BSON is more "schema-less" than Protocol Buffers, which can give it an advantage in flexibility but also a slight disadvantage in space efficiency (BSON has overhead for field names within the serialized data).
BSON was designed to have the following three characteristics:
- Lightweight
  Keeping spatial overhead to a minimum is important for any data representation format, especially when used over the network.
- Traversable
  BSON is designed to be traversed easily. This is a vital property in its role as the primary data representation for MongoDB.
- Efficient
  Encoding data to BSON and decoding from BSON can be performed very quickly in most languages due to the use of C data types.
Specification
Version 1.0
BSON is a binary format in which zero or more key/value pairs are stored as a single entity. We call this entity a document.
The following grammar specifies version 1.0 of the BSON standard. We've written the grammar using a pseudo-BNF syntax. Valid BSON data is represented by the document non-terminal.
Basic Types
The following basic types are used as terminals in the rest of the grammar. Each type must be serialized in little-endian format.
| byte | 1 byte (8 bits) |
| int32 | 4 bytes (32-bit signed integer) |
| int64 | 8 bytes (64-bit signed integer) |
| double | 8 bytes (64-bit IEEE 754 floating point) |
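As a rough illustration of the little-endian rule, the sketch below packs one value of each basic type with Python's standard `struct` module. The variable names and sample values are ours, chosen only for the example.

```python
import struct

# "<" selects little-endian byte order with standard sizes and no padding.
byte_value = struct.pack("<B", 1)        # byte:   1 byte
int32_value = struct.pack("<i", 1986)    # int32:  4 bytes, signed
int64_value = struct.pack("<q", -1)      # int64:  8 bytes, signed
double_value = struct.pack("<d", 5.05)   # double: 8 bytes, IEEE 754

# 1986 is 0x07C2, so the least significant byte (0xC2) is written first.
print(int32_value.hex())
```

Reading the values back uses the same format characters with `struct.unpack`.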
Non-terminals
The following specifies the rest of the BSON grammar. Note that quoted strings represent terminals, and should be interpreted with C semantics (e.g. "\x01" represents the byte 0000 0001). Also note that we use the * operator as shorthand for repetition (e.g. ("\x01"*2) is "\x01\x01"). When used as a unary operator, * means that the repetition can occur 0 or more times.
| document | ::= | int32 e_list "\x00" | BSON Document |
| e_list | ::= | element e_list | Sequence of elements |
| | \| | "" | |
| element | ::= | "\x01" e_name double | Floating point |
| | \| | "\x02" e_name string | UTF-8 string |
| | \| | "\x03" e_name document | Embedded document |
| | \| | "\x04" e_name document | Array |
| | \| | "\x05" e_name binary | Binary data |
| | \| | "\x06" e_name | Undefined (deprecated) |
| | \| | "\x07" e_name (byte*12) | ObjectId |
| | \| | "\x08" e_name "\x00" | Boolean "false" |
| | \| | "\x08" e_name "\x01" | Boolean "true" |
| | \| | "\x09" e_name int64 | UTC datetime |
| | \| | "\x0A" e_name | Null value |
| | \| | "\x0B" e_name cstring cstring | Regular expression |
| | \| | "\x0C" e_name string (byte*12) | DBPointer (deprecated) |
| | \| | "\x0D" e_name string | JavaScript code |
| | \| | "\x0E" e_name string | Symbol |
| | \| | "\x0F" e_name code_w_s | JavaScript code w/ scope |
| | \| | "\x10" e_name int32 | 32-bit integer |
| | \| | "\x11" e_name int64 | Timestamp |
| | \| | "\x12" e_name int64 | 64-bit integer |
| | \| | "\xFF" e_name | Min key |
| | \| | "\x7F" e_name | Max key |
| e_name | ::= | cstring | Key name |
| string | ::= | int32 (byte*) "\x00" | String |
| cstring | ::= | (byte*) "\x00" | CString |
| binary | ::= | int32 subtype (byte*) | Binary |
| subtype | ::= | "\x00" | Binary / Generic |
| | \| | "\x01" | Function |
| | \| | "\x02" | Binary (Old) |
| | \| | "\x03" | UUID |
| | \| | "\x05" | MD5 |
| | \| | "\x80" | User defined |
| code_w_s | ::= | int32 string document | Code w/ scope |
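To make the grammar more concrete, here is a minimal encoder sketch in Python covering only a handful of element types (double, string, embedded document, array, boolean, null, and the two integer types). The function names are ours, not part of any BSON library. The sketch also relies on length conventions that the grammar table does not spell out: the int32 of a document is taken to be the total size of the document in bytes, the int32 of a string counts the bytes plus the trailing "\x00", and array keys are the decimal indexes "0", "1", and so on.

```python
import struct

def encode_cstring(text):
    # cstring ::= (byte*) "\x00", so the bytes themselves cannot contain "\x00".
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("a cstring cannot contain a zero byte")
    return raw + b"\x00"

def encode_string(text):
    # string ::= int32 (byte*) "\x00"
    # Assumption: the int32 counts the bytes plus the trailing "\x00".
    raw = text.encode("utf-8")
    return struct.pack("<i", len(raw) + 1) + raw + b"\x00"

def encode_element(name, value):
    key = encode_cstring(name)
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)
    if isinstance(value, str):
        return b"\x02" + key + encode_string(value)
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    if isinstance(value, list):
        # Assumption: an array is a document keyed by "0", "1", "2", ...
        indexed = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + key + encode_document(indexed)
    if value is None:
        return b"\x0A" + key
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)
        return b"\x12" + key + struct.pack("<q", value)
    raise TypeError("unsupported type in this sketch: %r" % type(value))

def encode_document(mapping):
    # document ::= int32 e_list "\x00"
    # Assumption: the int32 is the size of the whole document, itself included.
    e_list = b"".join(encode_element(k, v) for k, v in mapping.items())
    return struct.pack("<i", 4 + len(e_list) + 1) + e_list + b"\x00"

print(encode_document({"hello": "world"}))
```

Under these assumptions an empty document encodes to five bytes: the int32 length followed by the terminating "\x00".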
Examples
The following are some example documents (in JavaScript / Python style syntax) and their corresponding BSON representations.
| {"hello": "world"} | → | "\x16\x00\x00\x00\x02hello\x00 |
| {"BSON": ["awesome", 5.05, 1986]} | → | "1\x00\x00\x00\x04BSON\x00&\x00 |
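The byte strings above show only the opening bytes of each document. As a sanity check on those bytes, the sketch below assembles the second example by hand. The helper names `element` and `document` are ours, and the sketch assumes that each document's int32 holds its total size in bytes and that the array is stored as a document keyed by "0", "1", "2".

```python
import struct

def element(type_byte, name, payload):
    # element ::= type byte, e_name (a cstring), then the type-specific payload
    return type_byte + name.encode("utf-8") + b"\x00" + payload

def document(e_list):
    # document ::= int32 e_list "\x00", with the int32 holding the total size
    return struct.pack("<i", 4 + len(e_list) + 1) + e_list + b"\x00"

awesome = b"awesome"
array_body = (
    # "\x02": string, whose int32 counts the bytes plus the trailing "\x00"
    element(b"\x02", "0", struct.pack("<i", len(awesome) + 1) + awesome + b"\x00")
    # "\x01": double
    + element(b"\x01", "1", struct.pack("<d", 5.05))
    # "\x10": int32
    + element(b"\x10", "2", struct.pack("<i", 1986))
)
array_document = document(array_body)
encoded = document(element(b"\x04", "BSON", array_document))

# 38 is 0x26, the "&" in the example; 49 is 0x31, the leading "1".
print(len(array_document), len(encoded))
```

The two printed lengths account for the otherwise puzzling "&" and "1" characters in the example: they are simply the low bytes of the inner and outer length prefixes.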
Implementations
Implementations of the BSON specification exist for many different languages / environments. Some implementations are currently embedded within MongoDB drivers, since MongoDB was the first large project to make use of BSON. Over time those libraries will be made more stand-alone, but they should be usable independently of MongoDB in their current state.
BSON Libraries
Projects Using BSON
- MongoDB, the document-oriented database, uses BSON as both the network and on-disk representation of documents.
If you know of other BSON implementations or projects using BSON, please add them.
FAQ
What is the point of BSON when it is no smaller than JSON in many cases?
BSON is designed to be efficient in space, but in many cases is not much more efficient than JSON. In some cases BSON uses even more space than JSON. The reason for this is another of the BSON design goals: traversability. BSON adds some "extra" information to documents, like length prefixes, that make it easy and fast to traverse.
BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.
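To illustrate how length prefixes help traversal, the following sketch lists the top-level keys of a document while skipping every value without decoding it. It is a simplified illustration, not a complete parser: the names `FIXED_SIZES`, `value_size`, and `top_level_keys` are ours, only some element types are handled, and the length conventions (a document's int32 is its total size, a string's int32 counts its bytes plus the terminator, a binary's int32 counts only the payload bytes) are assumptions not stated in the grammar table.

```python
import struct

# Payload sizes for element types whose values have a fixed width.
FIXED_SIZES = {
    0x01: 8,   # double
    0x06: 0,   # undefined
    0x07: 12,  # ObjectId
    0x08: 1,   # boolean
    0x09: 8,   # UTC datetime
    0x0A: 0,   # null
    0x10: 4,   # int32
    0x11: 8,   # timestamp
    0x12: 8,   # int64
    0x7F: 0,   # max key
    0xFF: 0,   # min key
}

def value_size(data, type_byte, offset):
    """Return how many bytes the value starting at offset occupies."""
    if type_byte in FIXED_SIZES:
        return FIXED_SIZES[type_byte]
    (length,) = struct.unpack_from("<i", data, offset)
    if type_byte in (0x03, 0x04):        # document, array: prefix is the total size
        return length
    if type_byte in (0x02, 0x0D, 0x0E):  # string-based types: prefix plus contents
        return 4 + length
    if type_byte == 0x05:                # binary: prefix, subtype byte, payload
        return 4 + 1 + length
    raise ValueError("element type 0x%02X is not handled in this sketch" % type_byte)

def top_level_keys(data):
    (total,) = struct.unpack_from("<i", data, 0)
    offset, end = 4, total - 1           # the last byte is the "\x00" terminator
    keys = []
    while offset < end:
        type_byte = data[offset]
        name_end = data.index(b"\x00", offset + 1)   # e_name is a cstring
        keys.append(data[offset + 1:name_end].decode("utf-8"))
        # Jump over the value using its size; nested content is never parsed.
        offset = name_end + 1 + value_size(data, type_byte, name_end + 1)
    return keys

sample = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
print(top_level_keys(sample))
```

An equivalent scan over JSON text would have to tokenize every nested value just to find where the next key begins, which is the cost the extra length bytes avoid.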
Where can I get more help/information?
The best place to ask questions about BSON is on the BSON mailing list.
How can I contribute or make fixes to this site?
The best way to contribute to this site is to fork the project and send us a pull request.