Using io.BufferedReader to peek against a non-peekable stream
When building the --sniff option for sqlite-utils insert (which attempts to detect the correct CSV delimiter and quote character by looking at the first 2048 bytes of a CSV file) I needed to peek ahead in an incoming stream of data.
I use Click, and Click can automatically handle both files and standard input. The problem is that peeking ahead in a file is easy (you can call .read() and then .seek(0), or use the .peek() method directly) but peeking ahead in standard input is not: anything you consume from it is gone, and you cannot rewind to it later.
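To make the difference concrete, here is a minimal sketch. The OneWayStream class is a hypothetical stand-in for a pipe such as standard input, not part of Click or sqlite-utils; it is readable but does not support seeking.

```python
import io


class OneWayStream(io.RawIOBase):
    """Hypothetical stand-in for a pipe: readable, but not seekable."""

    def __init__(self, data):
        super().__init__()
        self._source = io.BytesIO(data)

    def readable(self):
        return True

    def readinto(self, buffer):
        # Copy the next chunk of source data into the caller's buffer
        chunk = self._source.read(len(buffer))
        buffer[: len(chunk)] = chunk
        return len(chunk)


data = b"id,name\n1,Cleo\n2,Pancakes\n"

# Seekable file-like object: read ahead, then rewind
seekable = io.BytesIO(data)
head = seekable.read(7)
seekable.seek(0)
after_rewind = seekable.read()  # the full content is still available

# Non-seekable stream: whatever was read cannot be read again
pipe = OneWayStream(data)
pipe_head = pipe.read(7)
try:
    pipe.seek(0)
    rewound = True
except io.UnsupportedOperation:
    rewound = False
remaining = pipe.read()  # only the bytes after the first 7
```

After the failed seek, the first 7 bytes are missing from everything read from the one-way stream afterwards, which is the problem the rest of this note works around.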
Since my code works by passing a file-like object to the csv.reader() function I needed a way to read the first 2048 bytes but then reset the stream ready for that function to consume it.
I figured out how to do that using the io.BufferedReader class. Here's the pattern:
import io
import sys
import csv
# Get a file-like object in binary mode
fp = open("myfile.csv", "rb")
# Or, to read from standard input instead (need to use .buffer here):
# fp = sys.stdin.buffer
# Wrap it in a buffered reader with a 4096 byte buffer
buffered = io.BufferedReader(fp, buffer_size=4096)
# Wrap THAT in a text io wrapper that can decode to unicode
decoded = io.TextIOWrapper(buffered, encoding="utf-8")
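# Now I can look at the first 2048 bytes without consuming them
# (peek() may return more or fewer bytes than requested, so slice the result)
first_bytes = buffered.peek(2048)[:2048]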
# But I can still pass the "decoded" object to csv.reader
reader = csv.reader(decoded)
for row in reader:
    print(row)
My implementation is in this commit.
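As a rough end-to-end illustration of the same idea, the sketch below peeks at a real operating system pipe (which is not seekable), guesses the dialect, and then parses every row including the peeked ones. The sniff_and_read helper is hypothetical, and using csv.Sniffer with a restricted delimiter list is an assumption made for this example rather than a description of the linked commit.

```python
import csv
import io
import os


def sniff_and_read(fp, sample_size=2048):
    """Hypothetical helper: guess the CSV dialect from a peeked sample, then parse all rows."""
    # Buffer is twice the sample size, matching the 4096 / 2048 ratio used above
    buffered = io.BufferedReader(fp, buffer_size=sample_size * 2)
    decoded = io.TextIOWrapper(buffered, encoding="utf-8", newline="")
    # peek() does not advance the stream; it may return more or fewer
    # bytes than requested, so slice to the sample size
    sample = buffered.peek(sample_size)[:sample_size]
    # errors="ignore" in case the slice cuts a multi-byte character in half
    text_sample = sample.decode("utf-8", errors="ignore")
    dialect = csv.Sniffer().sniff(text_sample, delimiters=",;\t|")
    # The reader still sees the stream from the very first byte
    return dialect, list(csv.reader(decoded, dialect))


# Simulate piped input with an OS-level pipe
read_fd, write_fd = os.pipe()
os.write(write_fd, b"id;name\n1;Cleo\n2;Pancakes\n")
os.close(write_fd)
pipe = open(read_fd, "rb", buffering=0)
pipe_is_seekable = pipe.seekable()

dialect, rows = sniff_and_read(pipe)
```

The buffer has to be at least as large as the sample you want to inspect, because peek() can only return what fits in the buffer.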
Created 2021-02-15T11:17:28-08:00