Custom JSON encoder and decoder
The Python module json allows you to work with the JSON data format.
In previous articles,
I've written about doing custom JSON encoding of arbitrary Python objects
and custom JSON decoding into arbitrary Python objects.
In this article I want to define a system that makes it easy to extend the JSON format, so that we can encode more Python objects into JSON and back.
My goal is to define a mechanism through which you can easily define small, atomic encoders and decoders, and to have them all operate together.
Extending the JSON format
I think it will be easier to understand what I want to achieve if I show you what I want the end product to look like.
Suppose that you want to extend the JSON format so that you can also encode and decode complex numbers and Python range objects.
How do you achieve that?
When you are done with the article, you will be able to define something like this:
class ComplexAndRangeEncoder(...):
    def encode_complex(self, c):
        return {"real": c.real, "imag": c.imag}

    def encode_range(self, r):
        return {"start": r.start, "stop": r.stop, "step": r.step}

class ComplexAndRangeDecoder(...):
    def decode_complex(self, obj):
        return complex(obj["real"], obj["imag"])

    def decode_range(self, obj):
        return range(obj["start"], obj["stop"], obj["step"])
Then, you will be able to use these two classes as the cls argument to the json methods, enabling you to encode complex numbers and ranges to JSON and then decode them back.
The point, here, is that I want to make it as easy as possible to extend the JSON standard, simply by providing the encoders and the decoders for each new type you want to be able to handle.
Automatically recognising non-standard JSON
The main issue I have to deal with is defining the mechanism that will allow the custom JSON decoder to recognise that certain JSON objects should actually be parsed into something else. For example, in the previous article about custom JSON decoding, I showed how to convert the following JSON:
{
    "real": 1.0,
    "imag": 2.0
}
into the Python complex number \(1 + 2i\):
(1+2j)
However, suppose that we actually have the following Python dictionary:
dict_looks_like_complex = {
    "real": 1.0,
    "imag": 2.0,
}
If we convert this dictionary to JSON, we get a string:
'{"real": 1.0, "imag": 2.0}'
Now, if we use our custom decoder, we will get a complex number back instead of the original dictionary! Why? Because complex numbers and some dictionaries have the same JSON representations.
In mathematical terms, we say that the JSON encoding is not injective.
After all, I can find two objects obj1 and obj2 such that obj1 != obj2 and yet json.dumps(obj1) == json.dumps(obj2).
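You don't even need custom types to see this. As a small sketch that only uses the standard json module, here are two pairs of distinct Python objects that end up with the same JSON (the variable names are just for illustration):

```python
import json

# A tuple and a list are different Python objects...
assert (1, 2) != [1, 2]
# ...but both are encoded as the same JSON array.
tuple_json = json.dumps((1, 2))
list_json = json.dumps([1, 2])
print(tuple_json, list_json)  # [1, 2] [1, 2]

# JSON object keys are always strings, so integer keys get converted.
int_keys = {1: "a"}
str_keys = {"1": "a"}
assert int_keys != str_keys
int_keys_json = json.dumps(int_keys)
str_keys_json = json.dumps(str_keys)
print(int_keys_json, str_keys_json)  # {"1": "a"} {"1": "a"}
```

So, the standard encoding already loses some information, even before we add our own types to the mix.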
We have two options here:
- assume these collisions aren't troublesome and decide to not worry about them; or
- try to tweak the JSON encoding so that it becomes injective (or, at least, “as injective as possible”).
I will go with the second option.
Adapting the JSON format to be injective
The strategy that I will implement revolves around using (JSON) dictionaries to encode our new arbitrary types, together with a special key that disambiguates between the non-standard encodings and native Python dictionaries that were unlucky enough to look like something else.
Special key
Let us say that the special key will be something like "__extended_json_type__".
Thus, whenever we encode a non-standard object into JSON, we have to annotate the resulting dictionary with that key.
The value of the key will indicate the type of the original object.
For example, here is what a complex number could look like:
>>> json.dumps(1+2j)
"""{
    "__extended_json_type__": "complex",
    "real": 1.0,
    "imag": 2.0
}"""
As another example, a range could look like this (the values are integers, because that is what range works with):
>>> json.dumps(range(1, 10, 3))
"""{
    "__extended_json_type__": "range",
    "start": 1,
    "stop": 10,
    "step": 3
}"""
By providing the key "__extended_json_type__", the decoder will know this was not a native Python dictionary and will be able to reconstruct the intended objects.
Except...
Native dictionaries with the special key
Suppose that, for some annoying reason, you have the following Python dictionary:
dict_ = {
    "__extended_json_type__": "complex",
    "real": 1.0,
    "imag": 2.0
}
If you encode this to JSON, you will end up with the exact same JSON as the one we got for the complex number (1+2j)...
So, are we back at square one?
I don't think so. The special key will help the decoder know what type of object it should build out of the non-standard JSON. The special key also makes it less likely for collisions to happen, although it does not get rid of them entirely.
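To make the trade-off concrete, here is a minimal sketch of the idea that uses plain functions instead of the classes I will define below. The names SPECIAL_KEY, encode_extended, decode_extended, and roundtrip are made up for this illustration, and the sketch only knows about complex numbers:

```python
import json

SPECIAL_KEY = "__extended_json_type__"

def encode_extended(obj):
    """Hypothetical `default` hook that tags complex numbers."""
    if isinstance(obj, complex):
        return {SPECIAL_KEY: "complex", "real": obj.real, "imag": obj.imag}
    raise TypeError(f"Cannot encode objects of type {type(obj).__name__}")

def decode_extended(obj):
    """Hypothetical `object_hook` that only rebuilds tagged dictionaries."""
    if obj.get(SPECIAL_KEY) == "complex":
        return complex(obj["real"], obj["imag"])
    return obj

def roundtrip(value):
    """Encode to JSON and decode straight back."""
    as_json = json.dumps(value, default=encode_extended)
    return json.loads(as_json, object_hook=decode_extended)

# A real complex number survives the round trip.
print(roundtrip(1 + 2j))  # (1+2j)

# A dictionary that merely looks like a complex number is left alone.
lookalike = {"real": 1.0, "imag": 2.0}
print(roundtrip(lookalike))  # {'real': 1.0, 'imag': 2.0}

# A dictionary that already contains the special key still collides.
unlucky = {SPECIAL_KEY: "complex", "real": 1.0, "imag": 2.0}
print(roundtrip(unlucky))  # (1+2j)
```

The first two cases are the ones the special key fixes. The last one is the collision that remains, and it only happens if somebody uses the special key in their own data.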
Implementing the encoding mechanism
At this point, I already have a pretty clear picture of what I have to do, and how, so let me show you the code and walk you through it.
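import json

class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        name = type(obj).__name__
        try:
            # Look for a method called encode_<type name>.
            encoder = getattr(self, f"encode_{name}")
        except AttributeError:
            # No encoder for this type: defer to the parent, which raises TypeError.
            return super().default(obj)
        else:
            encoded = encoder(obj)
            # Tag the result so that the decoder can recognise it.
            encoded["__extended_json_type__"] = name
            return encoded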
This isn't a lot of code, but it is not your typical for loop, so you may need to read the code twice to get what it is doing.
Let me give you a hand:
- The first thing to understand is that we are overriding the method .default from json.JSONEncoder, because that is what you have to do in order to implement custom JSON encoding of Python objects.
- We start by getting the name of the type of the object we want to encode. You can read about the dunder attribute __name__ in a Pydon't I wrote. To make this explanation simpler, suppose obj = complex(1, 2). Then, name = "complex".
- We look for a method whose name starts with encode_ and that is then followed by the name of the type at hand. In our example, we look for a method called encode_complex.
- If the method doesn't exist, the call to getattr will raise an AttributeError, which we catch. At this point, the encoder has no idea how to encode the object of the given type, so we call the method .default of the parent class, because that is what the json documentation says we should do.
- If the method exists, we enter the else and we get to use the encoder to encode the object we have.
- After encoding, we tag the encoding with the special key and we return it.
This is what the code is doing. If something isn't clear, feel free to ask for further clarifications!
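If you want to see those steps in action, here is a small sketch. It repeats the class so that it runs on its own, and the subclass ComplexEncoder is a name I made up for this example:

```python
import json

class ExtendedEncoder(json.JSONEncoder):
    def default(self, obj):
        name = type(obj).__name__
        try:
            encoder = getattr(self, f"encode_{name}")
        except AttributeError:
            return super().default(obj)
        else:
            encoded = encoder(obj)
            encoded["__extended_json_type__"] = name
            return encoded

class ComplexEncoder(ExtendedEncoder):
    """Hypothetical subclass that only knows how to encode complex numbers."""

    def encode_complex(self, c):
        return {"real": c.real, "imag": c.imag}

# type(1 + 2j).__name__ is "complex", so encode_complex is found and used.
encoded = json.dumps(1 + 2j, cls=ComplexEncoder)
print(encoded)
# {"real": 1.0, "imag": 2.0, "__extended_json_type__": "complex"}

# There is no method encode_set, so the parent's .default raises TypeError.
try:
    json.dumps({1, 2, 3}, cls=ComplexEncoder)
except TypeError:
    set_failed = True
else:
    set_failed = False
print(set_failed)  # True
```

Notice that the special key shows up last in the output, because it is added after the specific encoder has built its dictionary. The order of the keys makes no difference to the decoder.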
The decoding mechanism follows a similar approach, as I will show you next.
Implementing the decoding mechanism
Here is the code for the decoding mechanism:
import json

class ExtendedDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        kwargs["object_hook"] = self.object_hook
        super().__init__(**kwargs)

    def object_hook(self, obj):
        try:
            name = obj["__extended_json_type__"]
            decoder = getattr(self, f"decode_{name}")
        except (KeyError, AttributeError):
            return obj
        else:
            return decoder(obj)
We subclass json.JSONDecoder and, in the method __init__, we set the parameter object_hook to the object hook that we define.
This is the object hook responsible for parsing non-standard JSON back into the original Python objects.
The object hook, itself, just undoes what the encoder does:
- We try to get the name of the original type and use it to find a method that is named accordingly. For example, for a complex number, we would want a method decode_complex.
- If there is no type name information or if we can't find the appropriate decode method, then we weren't supposed to be doing any fancy decoding and just return the original object.
- Otherwise, we use the decoder to decode the object we have.
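The three cases above can be sketched as follows. Again, the class is repeated so that the snippet runs on its own, and the subclass ComplexDecoder and the type name "fraction" are only made up for the example:

```python
import json

class ExtendedDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        kwargs["object_hook"] = self.object_hook
        super().__init__(**kwargs)

    def object_hook(self, obj):
        try:
            name = obj["__extended_json_type__"]
            decoder = getattr(self, f"decode_{name}")
        except (KeyError, AttributeError):
            return obj
        else:
            return decoder(obj)

class ComplexDecoder(ExtendedDecoder):
    """Hypothetical subclass that only knows how to decode complex numbers."""

    def decode_complex(self, obj):
        return complex(obj["real"], obj["imag"])

# Tagged with a type we know about: decode_complex is used.
tagged = '{"__extended_json_type__": "complex", "real": 1.0, "imag": 2.0}'
from_tagged = json.loads(tagged, cls=ComplexDecoder)
print(from_tagged)  # (1+2j)

# No special key (KeyError): the dictionary is returned as is.
untagged = '{"real": 1.0, "imag": 2.0}'
from_untagged = json.loads(untagged, cls=ComplexDecoder)
print(from_untagged)  # {'real': 1.0, 'imag': 2.0}

# Special key with a type we have no decoder for (AttributeError):
# the dictionary is also returned as is, special key included.
unknown = '{"__extended_json_type__": "fraction", "num": 1, "den": 2}'
from_unknown = json.loads(unknown, cls=ComplexDecoder)
print(from_unknown)
# {'__extended_json_type__': 'fraction', 'num': 1, 'den': 2}
```

Returning the dictionary unchanged in the last case is a design choice of this mechanism: unknown type names pass through silently instead of raising an error.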
Now that we have defined the encoding and decoding mechanisms, we can extend the JSON standard with, for example, complex numbers and Python range objects.
Extending JSON with complex numbers and range objects
Assuming you have the definitions of ExtendedEncoder and ExtendedDecoder, this is how you could extend JSON to support complex numbers and range objects:
import json

class ExtendedEncoder(json.JSONEncoder):
    ...

class ExtendedDecoder(json.JSONDecoder):
    ...

class MyEncoder(ExtendedEncoder):
    def encode_complex(self, c):
        return {"real": c.real, "imag": c.imag}

    def encode_range(self, r):
        return {"start": r.start, "stop": r.stop, "step": r.step}

class MyDecoder(ExtendedDecoder):
    def decode_complex(self, obj):
        return complex(obj["real"], obj["imag"])

    def decode_range(self, obj):
        return range(obj["start"], obj["stop"], obj["step"])
Then, you can use the custom encoder to encode some complex numbers and range objects:
my_data = {
    "hey": complex(1, 2),
    "there": range(1, 10, 3),
    73: False,
}
json_data = json.dumps(my_data, cls=MyEncoder)
Obviously, you can also go back to retrieve the original data:
decoded = json.loads(json_data, cls=MyDecoder)
print(decoded)
# {'hey': (1+2j), 'there': range(1, 10, 3), '73': False}
# The integer key 73 comes back as the string '73' because JSON object keys are always strings.
And that's it!
Conclusion
The classes ExtendedEncoder and ExtendedDecoder provide a convenient way of extending the JSON standard:
- subclassing ExtendedEncoder lets you define JSON encodings for non-standard Python objects; and
- subclassing ExtendedDecoder lets you define the way in which the JSON is decoded back into the original objects.
I went through all this trouble because I needed this for another project of mine, so I'll package this up and open source it! Stay tuned!
That's it for now! Stay tuned and I'll see you around!