3 Digging Deeper: JSON Parser
In this tutorial we will build a JSON parser using a tokenizer. SwiftParsec provides a type, GenericTokenParser, that helps us build tokenizers. The JSON format is defined in RFC 4627. The parser will generate a data structure built from an enum representing the different JSON values. Here is the complete code (you can skip to the explanation if you feel dazed and confused):
import SwiftParsec

public enum JSONValue {
    case JString(String)
    case JNumber(Double)
    case JBool(Bool)
    case JNull
    case JArray([JSONValue])
    case JObject([String: JSONValue])
    case Error

    public static let parser: GenericParser<String, (), JSONValue> = {
        let json = LanguageDefinition<()>.json
        let lexer = GenericTokenParser(languageDefinition: json)

        let symbol = lexer.symbol
        let stringLiteral = lexer.stringLiteral

        let jstring = JSONValue.JString <^> stringLiteral
        let jnumber = JSONValue.JNumber <^>
            (lexer.float.attempt <|> lexer.integerAsFloat)

        let trueValue = symbol("true") *> GenericParser(result: true)
        let falseValue = symbol("false") *> GenericParser(result: false)
        let jbool = JSONValue.JBool <^> (trueValue <|> falseValue)

        let jnull = symbol("null") *> GenericParser(result: JSONValue.JNull)

        var jarray: GenericParser<String, (), JSONValue>!
        var jobject: GenericParser<String, (), JSONValue>!

        GenericParser.recursive { (jvalue: GenericParser<String, (), JSONValue>) in
            let jarrayValues = lexer.commaSeparated(jvalue)
            jarray = JSONValue.JArray <^> lexer.brackets(jarrayValues)

            let nameValue: GenericParser<String, (), (String, JSONValue)> =
                stringLiteral >>- { name in
                    symbol(":") *> jvalue.map { value in (name, value) }
                }

            let dictionary: GenericParser<String, (), [String: JSONValue]> =
                (symbol(",") *> nameValue).manyAccumulator { (assoc, var dict) in
                    let (name, value) = assoc
                    dict[name] = value
                    return dict
                }

            let jobjectDict: GenericParser<String, (), [String: JSONValue]> =
                nameValue >>- { assoc in
                    dictionary >>- { (var dict) in
                        let (name, value) = assoc
                        dict[name] = value
                        return GenericParser(result: dict)
                    }
                }

            let jobjectValues = jobjectDict <|> GenericParser(result: [:])
            jobject = JSONValue.JObject <^> lexer.braces(jobjectValues)

            return jstring <|> jnumber <|> jbool <|> jnull <|> jarray <|> jobject
        }

        return lexer.whiteSpace *> (jobject <|> jarray)
    }()

    public init(data: String) throws {
        try self = JSONValue.parser.run(sourceName: "", input: data)
    }

    public var string: String? {
        guard case .JString(let str) = self else { return nil }
        return str
    }

    public var double: Double? {
        guard case .JNumber(let dbl) = self else { return nil }
        return dbl
    }

    public var bool: Bool? {
        guard case .JBool(let b) = self else { return nil }
        return b
    }

    public var isNull: Bool {
        if case .JNull = self { return true }
        return false
    }

    public subscript(name: String) -> JSONValue {
        guard case .JObject(let dict) = self,
            let value = dict[name] else { return .Error }
        return value
    }

    public subscript(index: Int) -> JSONValue {
        guard case .JArray(let arr) = self where
            index >= 0 && index < arr.count else { return .Error }
        return arr[index]
    }
}
The following is a line-by-line explanation of the parser.
let json = LanguageDefinition<()>.json
let lexer = GenericTokenParser(languageDefinition: json)
We start by selecting the JSON language definition that will be used to parameterize the tokenizer. SwiftParsec provides other language definitions and an empty definition. The empty definition is used as the basis for all other definitions.
let symbol = lexer.symbol
let stringLiteral = lexer.stringLiteral
symbol parses symbols and skips any trailing white space. stringLiteral applies to strings and takes care of any escaped characters.
let jstring = JSONValue.JString <^> stringLiteral
This line of code builds a parser that will parse a string literal and return a value of JSONValue.JString associated with the parsed string. The operator '<^>' is a synonym for the map function of the GenericParser type. We could have achieved the same goal with stringLiteral.map { JSONValue.JString($0) }.
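To make the equivalence between '<^>' and map concrete, here is a minimal, self-contained sketch. The Parser type, the quoted parser, the Value enum and the operator precedence are hypothetical stand-ins written for this illustration; they are not SwiftParsec's GenericParser or its stringLiteral, and the toy string parser does not handle escaped characters.

```swift
// Hypothetical toy parser, for illustration only (not SwiftParsec's GenericParser).
struct Parser<A> {
    // Returns the parsed value and the remaining input, or nil on failure.
    let run: (Substring) -> (A, Substring)?

    // Transform the result of a successful parse; input handling is unchanged.
    func map<B>(_ transform: @escaping (A) -> B) -> Parser<B> {
        Parser<B> { input in
            guard let (value, rest) = self.run(input) else { return nil }
            return (transform(value), rest)
        }
    }
}

// Operator form of map, declared here with an assumed precedence.
infix operator <^>: AdditionPrecedence
func <^> <A, B>(transform: @escaping (A) -> B, parser: Parser<A>) -> Parser<B> {
    parser.map(transform)
}

// Toy string literal: a double quote, any characters, a closing double quote.
let quoted = Parser<String> { input in
    guard input.first == "\"" else { return nil }
    let body = input.dropFirst()
    guard let end = body.firstIndex(of: "\"") else { return nil }
    return (String(body[..<end]), body[body.index(after: end)...])
}

enum Value: Equatable { case text(String) }

// Both spellings build the same parser.
let viaOperator = Value.text <^> quoted
let viaMap = quoted.map { Value.text($0) }

if let (value, rest) = viaOperator.run("\"hello\" rest") {
    print(value, "|", rest)
}
```

In both spellings the enum case is used as a plain function from String to Value, which is what JSONValue.JString does in the tutorial's code.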
let jnumber = JSONValue.JNumber <^>
    (lexer.float.attempt <|> lexer.integerAsFloat)
The representation of numbers in JSON contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part. We comply with this requirement by using two parsers provided by GenericTokenParser: float and integerAsFloat. float parses the string representation of a floating point number and converts the result to a double. integerAsFloat parses the string representation of an integer and converts the result to a double. We combine these two parsers with the '<|>' operator and map the result by applying the JSONValue.JNumber initializer.
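The code wraps float in attempt before combining it with '<|>'. In Parsec-style libraries the choice operator normally tries its second alternative only when the first one failed without consuming input, and attempt makes a failure look as if nothing had been consumed. The sketch below illustrates that idea under the assumption that SwiftParsec follows the same rule. Every name in it (Reply, Parser, choice, toyFloat, toyInteger) is hypothetical, and the toy number parsers ignore signs and exponents.

```swift
// Hypothetical toy types for illustration only; not SwiftParsec's API.
enum Reply<A> {
    case ok(A, Substring)       // parsed value and remaining input
    case fail(consumed: Bool)   // did the parser consume input before failing?
}

struct Parser<A> {
    let run: (Substring) -> Reply<A>

    // Report any failure as "nothing consumed" so an alternative may be tried.
    var attempt: Parser<A> {
        Parser<A> { input in
            let reply = self.run(input)
            if case .fail = reply { return .fail(consumed: false) }
            return reply
        }
    }
}

// Predictive choice: try `second` only if `first` failed without consuming input.
func choice<A>(_ first: Parser<A>, _ second: Parser<A>) -> Parser<A> {
    Parser<A> { input in
        let reply = first.run(input)
        if case .fail(consumed: false) = reply { return second.run(input) }
        return reply
    }
}

// Split the input into its leading ASCII digits and the remainder.
func takeDigits(_ input: Substring) -> (Substring, Substring) {
    let digits = input.prefix(while: { ("0"..."9").contains($0) })
    return (digits, input[digits.endIndex...])
}

// Toy float: digits, a dot, digits. Fails after consuming when the dot is missing.
let toyFloat = Parser<Double> { input in
    let (whole, afterWhole) = takeDigits(input)
    guard !whole.isEmpty else { return .fail(consumed: false) }
    guard afterWhole.first == "." else { return .fail(consumed: true) }
    let (fraction, rest) = takeDigits(afterWhole.dropFirst())
    guard !fraction.isEmpty, let value = Double("\(whole).\(fraction)") else {
        return .fail(consumed: true)
    }
    return .ok(value, rest)
}

// Toy integer, converted to Double.
let toyInteger = Parser<Double> { input in
    let (digits, rest) = takeDigits(input)
    guard let value = Double(digits) else { return .fail(consumed: false) }
    return .ok(value, rest)
}

// Extract the value of a successful reply, if any.
func parsedValue<A>(_ reply: Reply<A>) -> A? {
    if case let .ok(value, _) = reply { return value }
    return nil
}

let withoutAttempt = choice(toyFloat, toyInteger)
let withAttempt = choice(toyFloat.attempt, toyInteger)
```

With these toy definitions, withoutAttempt rejects the input "42" because toyFloat fails after consuming the digits, whereas withAttempt falls back to toyInteger and yields 42.0.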
let trueValue = symbol("true") *> GenericParser(result: true)
let falseValue = symbol("false") *> GenericParser(result: false)
let jbool = JSONValue.JBool <^> (trueValue <|> falseValue)
This parser applies to Bool values and is constructed in a similar way to the previous one. Two parsers, one applying to "true" and another one applying to "false", are combined and mapped using JSONValue.JBool.
let jnull = symbol("null") *> GenericParser(result: JSONValue.JNull)
This is a simple one. It parses the string "null" and returns a JSONValue.JNull as result.
var jarray: GenericParser<String, (), JSONValue>!
var jobject: GenericParser<String, (), JSONValue>!
Objects and arrays are trickier because they can contain all the possible JSON values, including themselves. That is why we will have to create recursive parsers. For now we only define the variables for these two parsers and we will use them later.
GenericParser.recursive { (jvalue: GenericParser<String, (), JSONValue>) in
The GenericParser.recursive method can be used to build recursive parsers. It takes as a parameter a function that will be passed a placeholder parser. This placeholder is initialized with the result returned by the function.
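One way to picture the placeholder mechanism is the following self-contained sketch. It is not SwiftParsec's implementation: the Parser type, the Slot class and this recursive function are hypothetical, and the sketch ignores memory management (the slot and the parser reference each other).

```swift
// Hypothetical toy parser, for illustration only (not SwiftParsec's GenericParser).
struct Parser<A> {
    let run: (Substring) -> (A, Substring)?
}

// Mutable slot that the placeholder reads from when it is finally run.
final class Slot<A> {
    var parser: Parser<A>?
}

// Sketch of a recursive combinator: hand the builder a placeholder, then
// point the placeholder at whatever parser the builder returned.
func recursive<A>(_ build: (Parser<A>) -> Parser<A>) -> Parser<A> {
    let slot = Slot<A>()
    let placeholder = Parser<A> { input in
        // Safe because the slot is filled before any parser is run.
        slot.parser!.run(input)
    }
    let result = build(placeholder)
    slot.parser = result
    return result
}

// Grammar: nested := "[" nested "]" | empty. The result is the nesting depth.
let nested: Parser<Int> = recursive { inner in
    Parser<Int> { input in
        guard input.first == "[" else { return (0, input) }
        guard let (depth, rest) = inner.run(input.dropFirst()),
              rest.first == "]" else { return nil }
        return (depth + 1, rest.dropFirst())
    }
}

if let (depth, _) = nested.run("[[[]]]") {
    print("depth:", depth)
}
```

The builder never runs the placeholder while it is constructing the parser; it only stores it. By the time any input is parsed, the placeholder already forwards to the finished parser, which is what lets jvalue appear inside the definitions of jarray and jobject.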
let jarrayValues = lexer.commaSeparated(jvalue)
jarray = JSONValue.JArray <^> lexer.brackets(jarrayValues)
Our first recursive parser will parse JSON arrays. We first use the commaSeparated parser to parse zero or more occurrences of JSONValue separated by commas. Any trailing white space after each comma is skipped. Then we use the brackets parser to enclose jarrayValues in brackets.
let nameValue: GenericParser<String, (), (String, JSONValue)> =
    stringLiteral >>- { name in
        symbol(":") *> jvalue.map { value in (name, value) }
    }
JSON objects are a bit more complicated to parse. We start by defining a parser that will take care of name/value pairs. stringLiteral parses the name, then we use the '>>-' operator (pronounced 'bind') to combine it with another parser. '>>-' is an infix operator for GenericParser.flatMap. It takes a parser on its left side and a function returning a parser on its right side. The function receives the result of the left parser as a parameter value. Our unnamed closure returns a parser applying to a colon followed by any JSON value mapped to a tuple containing the name and the value. In short, the nameValue parser will parse a name/value pair and return a tuple containing the name and the value as result.
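The sequencing performed by '>>-' can be sketched with a toy parser type. Everything below is hypothetical and written only for this illustration: the Parser struct, the operator precedence, and the simplified name, number and colon parsers (no quotes, no white space handling). It is not SwiftParsec's flatMap.

```swift
// Hypothetical toy parser, for illustration only (not SwiftParsec's GenericParser).
struct Parser<A> {
    let run: (Substring) -> (A, Substring)?

    func map<B>(_ transform: @escaping (A) -> B) -> Parser<B> {
        Parser<B> { input in
            guard let (value, rest) = self.run(input) else { return nil }
            return (transform(value), rest)
        }
    }

    // Run this parser, then use its result to choose the parser to run next.
    func flatMap<B>(_ next: @escaping (A) -> Parser<B>) -> Parser<B> {
        Parser<B> { input in
            guard let (value, rest) = self.run(input) else { return nil }
            return next(value).run(rest)
        }
    }
}

// Operator form of flatMap, declared here with an assumed precedence.
infix operator >>-: AdditionPrecedence
func >>- <A, B>(parser: Parser<A>, next: @escaping (A) -> Parser<B>) -> Parser<B> {
    parser.flatMap(next)
}

// Toy primitives: a run of letters, a run of ASCII digits, and a single colon.
let name = Parser<String> { input in
    let letters = input.prefix(while: { $0.isLetter })
    guard !letters.isEmpty else { return nil }
    return (String(letters), input[letters.endIndex...])
}
let number = Parser<Int> { input in
    let digits = input.prefix(while: { ("0"..."9").contains($0) })
    guard let value = Int(digits) else { return nil }
    return (value, input[digits.endIndex...])
}
let colon = Parser<Void> { input in
    guard input.first == ":" else { return nil }
    return ((), input.dropFirst())
}

// Same shape as nameValue: parse the name, then a colon, then the value.
let pair: Parser<(String, Int)> = name >>- { key in
    colon >>- { _ in number.map { value in (key, value) } }
}

if let (parsed, rest) = pair.run("width:800,") {
    print(parsed, "|", rest)
}
```

The closure on the right of '>>-' can refer to key because it only runs after the left parser has succeeded, which is how the name ends up in the resulting tuple.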
let dictionary: GenericParser<String, (), [String: JSONValue]> =
    (symbol(",") *> nameValue).manyAccumulator { (assoc, var dict) in
        let (name, value) = assoc
        dict[name] = value
        return dict
    }
Here we build a parser that will parse all the name/value pairs of a JSON object except the first pair. The manyAccumulator combinator is passed a function that processes the result of the combined parser. This function has two parameters: the first is the result of the current parse and the second is the accumulated result of the previous parses, as returned by the function itself.
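The accumulation can be pictured with the toy sketch below. It is an assumption-laden illustration, not SwiftParsec's manyAccumulator: the Parser type is hypothetical, the explicit initial value is an invention of this sketch, and commaPair parses a simplified ",name=number" form instead of JSON members. Note also that current Swift versions no longer accept var in a parameter list, so the sketch copies the accumulator into a local variable instead.

```swift
// Hypothetical toy parser, for illustration only (not SwiftParsec's GenericParser).
struct Parser<A> {
    let run: (Substring) -> (A, Substring)?

    // Apply the parser zero or more times. Each result is passed to `accumulate`
    // together with the value accumulated so far. The explicit `initial`
    // argument is an assumption of this sketch.
    func manyAccumulator<Acc>(
        _ initial: Acc,
        _ accumulate: @escaping (A, Acc) -> Acc
    ) -> Parser<Acc> {
        Parser<Acc> { input in
            var accumulated = initial
            var rest = input
            while let (value, next) = self.run(rest) {
                // Stop if nothing was consumed, to avoid looping forever.
                guard next.startIndex != rest.startIndex else { break }
                accumulated = accumulate(value, accumulated)
                rest = next
            }
            return (accumulated, rest)
        }
    }
}

// Toy member parser: a comma, a run of letters, "=", then ASCII digits.
let commaPair = Parser<(String, Int)> { input in
    guard input.first == "," else { return nil }
    let afterComma = input.dropFirst()
    let name = afterComma.prefix(while: { $0.isLetter })
    let afterName = afterComma[name.endIndex...]
    guard !name.isEmpty, afterName.first == "=" else { return nil }
    let afterEquals = afterName.dropFirst()
    let digits = afterEquals.prefix(while: { ("0"..."9").contains($0) })
    guard let number = Int(digits) else { return nil }
    return ((String(name), number), afterEquals[digits.endIndex...])
}

let members = commaPair.manyAccumulator([String: Int]()) { pair, dict in
    var dict = dict            // closure parameters are constants, so copy first
    dict[pair.0] = pair.1
    return dict
}

if let (dict, rest) = members.run(",a=1,b=2}") {
    print(dict, "|", rest)
}
```

Because the combinator accepts zero occurrences, it succeeds with an empty dictionary when no further pair follows, which is why the first pair has to be handled separately.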
let jobjectDict: GenericParser<String, (), [String: JSONValue]> =
    nameValue >>- { assoc in
        dictionary >>- { (var dict) in
            let (name, value) = assoc
            dict[name] = value
            return GenericParser(result: dict)
        }
    }
This parser will apply to a name/value pair followed by zero or more name/value pairs.
let jobjectValues = jobjectDict <|> GenericParser(result: [:])
jobject = JSONValue.JObject <^> lexer.braces(jobjectValues)
The JSON specification states that an object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). But jobjectDict parses one or more name/value pairs. We fix this problem by combining it with a parser that always succeeds and returns an empty JSON object.
return jstring <|> jnumber <|> jbool <|> jnull <|> jarray <|> jobject
Finally, the closure passed to the recursive function returns a parser that parses all the possible JSON values. This is the parser assigned to the jvalue placeholder.
return lexer.whiteSpace *> (jobject <|> jarray)
Another requirement of the specification is that a JSON text is a serialized object or array. That is why we parse either an object or an array after having skipped any white space.
Given the following JSON text:
{
    "Image": {
        "Width": 800,
        "Height": 600,
        "Title": "View from 15th Floor",
        "Thumbnail": {
            "Url": "https://www.example.com/image/481989943",
            "Height": 125,
            "Width": "100"
        },
        "IDs": [116, 943, 234, 38793]
    }
}
The parser will output:
JObject([
    "Image": JSONValue.JObject([
        "Title": JSONValue.JString("View from 15th Floor"),
        "Height": JSONValue.JNumber(600.0),
        "Thumbnail": JSONValue.JObject([
            "Width": JSONValue.JString("100"),
            "Height": JSONValue.JNumber(125.0),
            "Url": JSONValue.JString("https://www.example.com/image/481989943")
        ]),
        "Width": JSONValue.JNumber(800.0),
        "IDs": JSONValue.JArray([
            JSONValue.JNumber(116.0),
            JSONValue.JNumber(943.0),
            JSONValue.JNumber(234.0),
            JSONValue.JNumber(38793.0)
        ])
    ])
])
And to retrieve individual values you could do something like:
if let thumbnailHeight = result["Image"]["Thumbnail"]["Height"].double {
    print(thumbnailHeight)
}
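The chained subscripts work even when part of the path is missing, because a failed lookup yields the Error case and subscripting Error yields Error again. The sketch below shows this with a trimmed, hand-written copy of the enum in current Swift syntax; the value is built by hand here, whereas in the tutorial it would come from the parser.

```swift
// Trimmed, hand-written copy of the enum for illustration only.
enum JSONValue {
    case JNumber(Double)
    case JObject([String: JSONValue])
    case Error

    var double: Double? {
        guard case .JNumber(let dbl) = self else { return nil }
        return dbl
    }

    subscript(name: String) -> JSONValue {
        guard case .JObject(let dict) = self,
              let value = dict[name] else { return .Error }
        return value
    }
}

// Built by hand; stands in for the parser's output.
let result = JSONValue.JObject([
    "Image": .JObject([
        "Thumbnail": .JObject(["Height": .JNumber(125)])
    ])
])

// Every step of the path exists, so the number is found.
let height = result["Image"]["Thumbnail"]["Height"].double

// "Preview" does not exist: the lookup yields .Error, the next subscript
// yields .Error again, and .double finally returns nil.
let missing = result["Image"]["Preview"]["Height"].double

print(height as Any, missing as Any)
```

The trade-off of this design is that the caller learns only that the lookup failed, not which path component was missing.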
Of course, you can add much more functionality to the data structure representing the JSON text. For example: better error reporting when a path does not exist, properties to retrieve values as NSURL, Int, Float, etc.
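As a sketch of what such additions could look like, the extension below adds two accessors to a trimmed, hand-written copy of the enum. The property names int and url are hypothetical, and the sketch uses Foundation's URL value type where the text mentions NSURL.

```swift
import Foundation

// Trimmed, hand-written copy of the enum for illustration only.
enum JSONValue {
    case JString(String)
    case JNumber(Double)
}

extension JSONValue {
    // Hypothetical accessor: the number as an Int, only if it has no
    // fractional part and fits in Int.
    var int: Int? {
        guard case .JNumber(let dbl) = self else { return nil }
        return Int(exactly: dbl)
    }

    // Hypothetical accessor: the string as a URL, if Foundation accepts it.
    var url: URL? {
        guard case .JString(let str) = self else { return nil }
        return URL(string: str)
    }
}

let height = JSONValue.JNumber(125)
let link = JSONValue.JString("https://www.example.com/image/481989943")

print(height.int as Any, link.url?.host as Any)
```

Using Int(exactly:) means a value such as 1.5 is reported as nil and not silently truncated; whether that is the right behaviour depends on the application.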
SwiftParsec provides many more parsers than what we have seen. It also provides an expression parser and a library for parsing permutation phrases. You are encouraged to browse the code and the unit tests to understand how to use them.