struct endianness in Python
TIL the Python standard library struct module defaults to interpreting binary strings using the endianness of your machine.
Which means that this code:
import struct

def decode_matchinfo(buf):
    # buf is a bytestring of unsigned integers, each 4 bytes long
    return struct.unpack("I" * (len(buf) // 4), buf)
Behaves differently on big-endian vs. little-endian systems.
I found this out thanks to this bug report against my sqlite-fts4 library.
My decode_matchinfo() function runs against a binary data structure returned by SQLite - more details on that in Exploring search relevance algorithms with SQLite.
SQLite doesn't change the binary format depending on the endianness of the system, which means that my function here works correctly on little-endian but does the wrong thing on big-endian systems:
Update: I was entirely wrong about this. SQLite DOES change the format based on the endianness of the system. My bug fix was incorrect - see this issue comment for details.
On little-endian systems:
>>> buf = b'\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00'
>>> decode_matchinfo(buf)
(1, 2, 2, 2)
But on big-endian systems:
>>> buf = b'\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00'
>>> decode_matchinfo(buf)
(16777216, 33554432, 33554432, 33554432)
The fix is to add a first character to that format string specifying the endianness that should be used. See Byte Order, Size, and Alignment in the Python documentation.
>>> struct.unpack("<IIII", buf)
(1, 2, 2, 2)
>>> struct.unpack(">IIII", buf)
(16777216, 33554432, 33554432, 33554432)
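As a rough illustration of the other prefix characters covered in that documentation section, the sketch below decodes the same four bytes with each one. The variable names `raw`, `decoded` and `native` are just for this example.

```python
import struct

raw = b"\x01\x00\x00\x00"

# Explicit prefixes give the same answer on every machine.
# "!" is network order, which is big-endian.
decoded = {prefix: struct.unpack(prefix + "I", raw)[0] for prefix in ("<", ">", "!")}
print(decoded)  # {'<': 1, '>': 16777216, '!': 16777216}

# "=" follows the host's byte order, so this result depends on the machine:
# 1 on little-endian, 16777216 on big-endian.
native = struct.unpack("=I", raw)[0]
print(native)
```

Only `<`, `>` and `!` pin the byte order down. `=` and the default `@` both track whatever machine the code happens to run on.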
So the fix for my bug was to rewrite the function to look like this:
def decode_matchinfo(buf):
    # buf is a bytestring of unsigned integers, each 4 bytes long
    return struct.unpack("<" + ("I" * (len(buf) // 4)), buf)
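Given the update above, which says SQLite does write this data in the machine's own byte order, a variant that makes native ordering explicit could look like the sketch below. This is a minimal sketch, not the code that ended up in sqlite-fts4: the name `decode_matchinfo_native` is hypothetical, and it assumes the buffer was produced on the same machine that decodes it.

```python
import struct


def decode_matchinfo_native(buf):
    # Hypothetical variant, not the library's actual code.
    # "=" means native byte order with standard sizes, so "I" is always 4 bytes.
    return struct.unpack("=" + ("I" * (len(buf) // 4)), buf)


# Round trip: pack in native order, then decode in native order.
buf = struct.pack("=4I", 1, 2, 2, 2)
print(decode_matchinfo_native(buf))  # (1, 2, 2, 2)
```

Unlike the unprefixed format string, `=` also fixes the size of `I` at 4 bytes instead of relying on the platform's C `unsigned int`.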
Bonus: How to tell which endianness your system has
Turns out Python can tell you if you are big-endian or little-endian like this:
>>> from sys import byteorder
>>> byteorder
'little'
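To tie this back to struct, here is a small sketch that checks the unprefixed format really does follow `sys.byteorder`. It assumes a C `unsigned int` is 4 bytes on your platform, which holds on mainstream systems but is not guaranteed everywhere.

```python
import struct
import sys

# Pick the explicit prefix that matches what sys.byteorder reports.
prefix = "<" if sys.byteorder == "little" else ">"

native = struct.pack("I", 1)             # no prefix: native order, size and alignment
explicit = struct.pack(prefix + "I", 1)  # explicit order, standard 4-byte size

print(sys.byteorder, native == explicit)
```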
Created 2022-07-28T08:48:00-07:00, updated 2022-07-30T11:07:51-07:00