A TOML parser for Zig.
This parser should be spec compliant.
- Parse all spec-compliant TOML documents.
- WIP: parses all valid TOML files, but also parses some invalid ones, see Spec Compliancy
- Use Zig readers and writers
- Populate structs
- Populate dynamic values
- TOML builder/write stream
- Stringify entire structs and tables
These features are yet to be implemented, and are actively being worked on, in order of severity:
- Check files for invalid control sequences and characters
- Fix parsing issues related to keys and tables being re-defined
- Check integer literals against the spec (leading zeroes are currently allowed)
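As a rough illustration of the integer rule in the last item: the TOML spec forbids leading zeroes in decimal literals, but allows them after the 0x, 0o and 0b prefixes. The helper below is a hypothetical sketch of that check and is not part of Microwave; the name `hasForbiddenLeadingZero` is made up for this example.

```zig
const std = @import("std");

/// Hypothetical helper: returns true when a numeric literal starts with a
/// zero that is followed by more digits (or an underscore), e.g. "012".
fn hasForbiddenLeadingZero(literal: []const u8) bool {
    var digits = literal;
    // An optional sign may precede the digits.
    if (digits.len > 0 and (digits[0] == '+' or digits[0] == '-')) {
        digits = digits[1..];
    }
    // A lone "0" is fine, so only longer literals starting with '0' matter.
    if (digits.len < 2 or digits[0] != '0') return false;
    return switch (digits[1]) {
        // "01", "0_1": a zero followed by further digits is rejected.
        '0'...'9', '_' => true,
        // "0x..", "0o..", "0b..", "0.5", "0e1" and similar are left alone.
        else => false,
    };
}
```

Because the check only looks at the first characters after the sign, it also covers the integer part of a float such as 03.14.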
Microwave has 5 sets of APIs:
- Parser API - for parsing an entire TOML file into a tree-like structure
- Populate API - for mapping a TOML file into a given struct
- Stringify API - for writing TOML files from a tree or given struct
- Write Stream API - for building TOML files safely with debug assertions
- Tokeniser/Scanner API - for iterating through a TOML file for each significant token
Microwave allows you to parse an entire TOML file from either a slice or reader into a tree-like structure that can be traversed, inspected or modified manually.
const document = try microwave.parse.fromSlice(allocator, toml_text);
defer document.deinit(); // all pointers will be freed using the created internal arena
// use document.root_table
const document = try microwave.parse.fromReader(allocator, file.reader());
defer document.deinit();
// use document.root_table
If you would like to personally own all of the pointers without creating an arena for them, use the *Owned variation of the functions. These return a parse.Value.Table directly, representing the root table of the TOML file.
The best way to free the resulting root table is to use parse.deinitTable.
var owned_tree = try microwave.parse.fromSliceOwned(allocator, toml_text); // or .fromReaderOwned
defer microwave.parse.deinitTable(allocator, &owned_tree);
// use owned_tree
pub const Value = union(enum) {
pub const Table = struct {
keys: std.StringArrayHashMapUnmanaged(Value),
};
pub const Array = std.ArrayListUnmanaged(Value);
pub const ArrayOfTables = std.ArrayListUnmanaged(Table);
pub const DateTime = struct {
date: ?[]const u8 = null,
time: ?[]const u8 = null,
offset: ?[]const u8 = null,
pub fn dupe(self: DateTime, allocator: std.mem.Allocator) !DateTime;
pub fn deinit(self: DateTime, allocator: std.mem.Allocator) void;
};
none: void,
table: Table,
array: Array,
array_of_tables: ArrayOfTables,
string: []const u8,
integer: i64,
float: f64,
boolean: bool,
date_time: DateTime,
pub fn dupeRecursive(self: Value, allocator: std.mem.Allocator) !Value;
pub fn deinitRecursive(self: *Value, allocator: std.mem.Allocator) void;
};
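Walking the tree by hand comes down to a switch over the union's tags. The sketch below counts every string in a tree; it declares a trimmed local mirror of the Value union (date_time omitted) so that it compiles without the library, and the function names are made up for this example. With Microwave itself you would presumably use microwave.parse.Value in its place.

```zig
const std = @import("std");

// Trimmed local mirror of the `Value` union, for illustration only.
const Value = union(enum) {
    pub const Table = struct {
        keys: std.StringArrayHashMapUnmanaged(Value),
    };
    pub const Array = std.ArrayListUnmanaged(Value);
    pub const ArrayOfTables = std.ArrayListUnmanaged(Table);

    none: void,
    table: Table,
    array: Array,
    array_of_tables: ArrayOfTables,
    string: []const u8,
    integer: i64,
    float: f64,
    boolean: bool,
};

/// Counts every string value reachable from `value`, including nested ones.
fn countStrings(value: Value) usize {
    var total: usize = 0;
    switch (value) {
        .string => total += 1,
        .table => |table| total += countStringsInTable(table),
        .array => |array| {
            for (array.items) |item| total += countStrings(item);
        },
        .array_of_tables => |tables| {
            for (tables.items) |table| total += countStringsInTable(table);
        },
        // Scalars other than strings contribute nothing.
        .none, .integer, .float, .boolean => {},
    }
    return total;
}

fn countStringsInTable(table: Value.Table) usize {
    var total: usize = 0;
    // `values()` exposes the map's values as a slice, in insertion order.
    for (table.keys.values()) |child| total += countStrings(child);
    return total;
}
```

The same shape works for any other traversal, such as collecting keys or validating value types.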
It's often helpful to map a TOML file directly onto a Zig struct, for example for config files. Microwave lets you do this using the Populate(T) API:
const Dog = struct {
pub const Friend = struct {
name: []const u8,
};
name: []const u8,
cross_breeds: []const []const u8,
age: i64,
friends: []Friend,
vet_info: microwave.parse.Value.Table,
};
const dog = try microwave.Populate(Dog).createFromSlice(allocator, toml_text); // or .createFromReader
defer dog.deinit();
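For reference, a document along the following lines should line up with the Dog struct field by field. This is only an illustrative sketch: the values are invented, and the text is written as a Zig multiline string so it can be passed straight in as toml_text.

```zig
const std = @import("std");

// Hypothetical TOML input for the `Dog` struct; all values are made up.
const toml_text =
    \\name = "Barney"
    \\cross_breeds = ["Labrador", "Poodle"]
    \\age = 16
    \\
    \\# Each [[friends]] header adds one element to the array of tables.
    \\[[friends]]
    \\name = "Rex"
    \\
    \\[[friends]]
    \\name = "Bella"
    \\
    \\# `vet_info` is a free-form table, so its keys are not fixed by the struct.
    \\[vet_info]
    \\clinic = "Example Vets"
    \\last_visit = 2025-04-19
;
```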
Since TOML only supports a subset of the types that are available in Zig, your destination struct must consist of the following types:
TOML Type | Zig Type | Examples |
---|---|---|
String | []const u8 | "Barney" |
Float | f64 | 5.0e+2 |
Integer | i64, f64 | 16 |
Boolean | bool | true, false |
Date/Time | parse.Value.DateTime | 2025-04-19T00:43:00.500+05:00 |
Specific Table | struct { ... } | { name = "Barney", age = 16 } |
Array of Tables | []struct {} | [[pet]] |
Inline Array | []T | ["Hello", "Bonjour", "Hola"] |
Any Table | parse.Value.Table | Any TOML table |
Any Value | parse.Value | Any TOML value |
You can also specify an option of different types using unions. For example:
const Animal = union(enum) {
dog: struct {
name: []const u8,
breed: []const u8,
},
cat: struct {
name: []const u8,
number_of_colours: usize,
},
};
const animal = try microwave.Populate(Animal).createFromSlice(allocator, toml_text);
defer animal.deinit();
If a field is entirely optional and may not exist, use the Zig optional indicator on the type, for example:
const Person = struct {
name: []const u8,
age: i64,
salary: f64,
job: ?[]const u8, // can be missing from the TOML file
};
const person = try microwave.Populate(Person).createFromSlice(allocator, toml_text);
defer person.deinit();
Like the parser API, you might want to own the pointers yourself rather than delegate them to an arena. In that case, use the *Owned variations of the functions, which return the value directly.
You can free the data in the returned value however you want, but if you're using a stack-based allocator like an arena or fixed buffer allocator, then it's best to use Populate(T).deinitRecursive.
var dog = try microwave.Populate(Dog).createFromSliceOwned(allocator, toml_text);
defer microwave.Populate(Dog).deinitRecursive(allocator, &dog);
Instead of making Microwave create the value to populate, you can provide it with a pointer to an existing one to populate using the into* functions:
var dog: Dog = undefined;
try microwave.Populate(Dog).intoFromSliceOwned(allocator, &dog); // or .intoFromReaderOwned
defer microwave.Populate(Dog).deinitRecursive(allocator, &dog);
Microwave can try its best to serialise a given struct value or parse.Value.Table into a writer:
try microwave.stringify.write(allocator, dog, file.writer());
try microwave.stringify.writeTable(allocator, root_table, file.writer());
Note: There's no need to de-init anything; the allocator is only used for temporary allocations.
You can build a TOML file manually, with safety assertions that the file is well-formed, using the write stream API:
var stream: microwave.write_stream.Stream(@TypeOf(file.writer()), .{
.newlines = .lf,
.unicode_full_escape_strings = false,
.format_float_options = .{
.mode = .scientific,
.precision = null,
},
.date_time_separator = .t,
}) = .{
.underlying_writer = file.writer(),
.allocator = allocator,
};
defer stream.deinit();
You can use the following functions on the write_stream.Stream struct to build your TOML file:
pub fn beginDeepKeyPair(self: *Stream, key_parts: []const []const u8) !void;
pub fn beginKeyPair(self: *Stream, key_name: []const u8) !void;
pub fn writeString(self: *Stream, string: []const u8) !void;
pub fn writeInteger(self: *Stream, integer: i64) !void;
pub fn writeFloat(self: *Stream, float: f64) !void;
pub fn writeBoolean(self: *Stream, boolean: bool) !void;
pub fn writeDateTime(self: *Stream, date_time: parse.Value.DateTime) !void;
pub fn beginArray(self: *Stream) !void;
pub fn arrayLine(self: *Stream) !void;
pub fn endArray(self: *Stream) !void;
pub fn beginInlineTable(self: *Stream) !void;
pub fn endInlineTable(self: *Stream) !void;
pub fn writeDeepTable(self: *Stream, key_parts: []const []const u8) !void;
pub fn writeTable(self: *Stream, key_name: []const u8) !void;
pub fn writeDeepManyTable(self: *Stream, key_parts: []const []const u8) !void;
pub fn writeManyTable(self: *Stream, key_name: []const u8) !void;
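A short sketch of how these calls might be combined is shown below. It uses only the signatures listed above, but the call order is an assumption drawn from the function names (a key is begun, then exactly one value is written; a table header applies to the pairs after it), as is the module name `microwave`. The helper `writeDog` is hypothetical.

```zig
const std = @import("std");
const microwave = @import("microwave"); // assumed module name in build.zig

/// Hypothetical helper that writes a small document through the write stream.
fn writeDog(allocator: std.mem.Allocator, writer: anytype) !void {
    var stream: microwave.write_stream.Stream(@TypeOf(writer), .{
        .newlines = .lf,
        .unicode_full_escape_strings = false,
        .format_float_options = .{
            .mode = .scientific,
            .precision = null,
        },
        .date_time_separator = .t,
    }) = .{
        .underlying_writer = writer,
        .allocator = allocator,
    };
    defer stream.deinit();

    // Root-level pairs: begin the key, then write one value (assumed order).
    try stream.beginKeyPair("name");
    try stream.writeString("Barney");

    try stream.beginKeyPair("age");
    try stream.writeInteger(16);

    // An array value, assumed to be opened and closed around its elements.
    try stream.beginKeyPair("cross_breeds");
    try stream.beginArray();
    try stream.writeString("Labrador");
    try stream.writeString("Poodle");
    try stream.endArray();

    // A table header, assumed to apply to the pairs that follow it.
    try stream.writeTable("vet_info");
    try stream.beginKeyPair("registered");
    try stream.writeBoolean(true);
}
```

Since the stream carries debug assertions, calling these functions in an order that would produce a malformed file is expected to trip an assertion in debug builds rather than write bad output silently.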
As a low level API, Microwave also provides the ability to scan through a file and iterate through individual tokens.
Only basic state checks are done at this stage, and you have to manage that state yourself. The scanner doesn't guarantee a well-formed TOML file; most of those checks are done in the parsing stage.
If you have access to the entire slice of the TOML file, you can initialise the scanner directly:
var scanner: microwave.Scanner = .{ .buffer = slice };
while (try scanner.next()) |token| {
// token.kind, token.range.start, token.range.end
// modify state with scanner.setState(state)
}
The default scanner may return any of the following errors:
pub const Error = error{ UnexpectedEndOfBuffer, UnexpectedByte };
You can also tokenise the TOML file using a reader:
var scanner = microwave.Scanner.bufferedReaderScanner(file.reader());
// use scanner.next() in the same way
The buffered reader scanner may return any of the following errors:
pub const Error = error{ UnexpectedEndOfBuffer, UnexpectedByte, BufferTooSmall };
A TOML file can be tokenised differently depending on what kind of entities need to be read. The scanner API doesn't manage this for you, but with your own reading logic you can update the state of the scanner using the scanner.setState function:
while (try scanner.next()) |token| {
if (token.kind == .table_start) {
scanner.setState(.table_key);
}
if (token.kind == .table_end) {
scanner.setState(.root);
}
}
The valid states are listed below:
State Name | Enum Value | Description |
---|---|---|
Root | .root | Either ordinary newline-separated keys, or [table] and [[many table]] structures |
Table Key | .table_key | The keys inside [ ... ] and [[ ... ]] |
Inline Key | .inline_key | Delimiter-separated inline table keys |
Value | .value | An ordinary value literal, array or inline table opening token |
Array Container | .array_container | Same as .value, but can process array close tokens and value delimiters |
The default state is .root.
When encountering an error, you can use scanner.cursor() to get the file offset that it occurred at.
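A raw offset is not very readable in an error message, so it can help to convert it into a line and column. The helper below is a hypothetical, library-independent sketch; `Location` and `locate` are names invented for this example, and it assumes you still have the full source text available as a slice.

```zig
const std = @import("std");

const Location = struct { line: usize, column: usize };

/// Hypothetical helper: converts a byte offset into a 1-based line and column.
/// Columns count bytes, not Unicode code points.
fn locate(source: []const u8, offset: usize) Location {
    // Clamp so that an offset at or past the end of the file stays in bounds.
    const end = @min(offset, source.len);
    var location: Location = .{ .line = 1, .column = 1 };
    for (source[0..end]) |byte| {
        if (byte == '\n') {
            location.line += 1;
            location.column = 1;
        } else {
            location.column += 1;
        }
    }
    return location;
}
```

With the slice-based scanner, something like locate(slice, scanner.cursor()) inside a catch block would give a position to print alongside the error name.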
If you encounter error.BufferTooSmall while using the buffered reader scanner, you can increase the size of the buffer for your project by instantiating Scanner.BufferedReaderScanner directly:
var scanner: microwave.Scanner.BufferedReaderScanner(8192, @TypeOf(file.reader())) = .{
.reader = file.reader(),
};
To access the contents of a token, you can use the scanner.tokenContents function:
while (try scanner.next()) |token| {
if (token.kind == .string) {
std.log.info("Found string! {s}", .{ scanner.tokenContents(token) });
}
}
Note: For the buffered reader scanner, previous token contents may be invalidated at any point while iterating.
Check out my other project, dishwasher, for parsing XML files.
Not sure.
See the tests folder to check Microwave against the various official TOML test cases.
All failed tests are false positives, which means Microwave can read all valid TOML files, but also accepts many invalid ones.
- fail: invalid/control/bare-cr.toml
- fail: invalid/control/comment-cr.toml
- fail: invalid/control/comment-del.toml
- fail: invalid/control/comment-ff.toml
- fail: invalid/control/comment-lf.toml
- fail: invalid/control/comment-null.toml
- fail: invalid/control/comment-us.toml
- fail: invalid/control/multi-cr.toml
- fail: invalid/control/multi-del.toml
- fail: invalid/control/multi-lf.toml
- fail: invalid/control/multi-null.toml
- fail: invalid/control/multi-us.toml
- fail: invalid/control/rawmulti-cr.toml
- fail: invalid/control/rawmulti-del.toml
- fail: invalid/control/rawmulti-lf.toml
- fail: invalid/control/rawmulti-null.toml
- fail: invalid/control/rawmulti-us.toml
- fail: invalid/encoding/bad-codepoint.toml
- fail: invalid/encoding/bad-utf8-in-comment.toml
- fail: invalid/encoding/bad-utf8-in-multiline-literal.toml
- fail: invalid/encoding/bad-utf8-in-string-literal.toml
- fail: invalid/float/leading-zero.toml
- fail: invalid/float/leading-zero-neg.toml
- fail: invalid/float/leading-zero-plus.toml
- fail: invalid/inline-table/duplicate-key-3.toml
- fail: invalid/inline-table/overwrite-02.toml
- fail: invalid/inline-table/overwrite-05.toml
- fail: invalid/inline-table/overwrite-08.toml
- fail: invalid/integer/leading-zero-1.toml
- fail: invalid/integer/leading-zero-2.toml
- fail: invalid/integer/leading-zero-3.toml
- fail: invalid/integer/leading-zero-sign-1.toml
- fail: invalid/integer/leading-zero-sign-2.toml
- fail: invalid/integer/leading-zero-sign-3.toml
- fail: invalid/spec/inline-table-2-0.toml
- fail: invalid/spec/table-9-0.toml
- fail: invalid/spec/table-9-1.toml
- fail: invalid/table/append-with-dotted-keys-1.toml
- fail: invalid/table/append-with-dotted-keys-2.toml
- fail: invalid/table/duplicate.toml
- fail: invalid/table/duplicate-key-dotted-table.toml
- fail: invalid/table/duplicate-key-dotted-table2.toml
- fail: invalid/table/redefine-2.toml
- fail: invalid/table/redefine-3.toml
- fail: invalid/table/super-twice.toml
passing: 512/557
All Microwave code is under the MIT license.