Efficiently copying a file
TLDR: Use shutil.copyfileobj(fsrc, fdst)
I'm writing a Datasette plugin that handles an uploaded file, borrowing the Starlette mechanism for handling file uploads, documented here.
Starlette uploads result in a SpooledTemporaryFile file-like object. These look very much like a file, with one frustrating limitation: they don't have a defined, stable path on disk.
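A minimal sketch of that limitation, using the standard library's tempfile.SpooledTemporaryFile directly rather than a real Starlette upload. The max_size value and the payload are arbitrary choices for illustration. While the data still fits in memory, the object has no filename to hand to a path-based copy function:

```python
import tempfile

# max_size=1024 is an arbitrary threshold chosen for this illustration.
with tempfile.SpooledTemporaryFile(max_size=1024) as spooled:
    spooled.write(b"hello")
    # The payload is below max_size, so it is held in memory and
    # there is no path on disk to copy from.
    in_memory_name = spooled.name
    # It still behaves like a file: rewind and read the bytes back.
    spooled.seek(0)
    contents = spooled.read()
```

Because the object still supports read() and seek(), any copy routine that works on file-like objects rather than paths can handle it.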
I assumed this meant you couldn't easily copy one, so I came up with this recipe, based on code I spotted in the BáiZé framework:
from shutil import COPY_BUFSIZE

with open(new_filepath, "wb+") as target_file:
    source_file.seek(0)
    source_read = source_file.read
    target_write = target_file.write
    while True:
        buf = source_read(COPY_BUFSIZE)
        if not buf:
            break
        target_write(buf)
COPY_BUFSIZE is defined by Python here; it handles the difference in ideal buffer size between Windows and other operating systems:
COPY_BUFSIZE = 1024 * 1024 if _WINDOWS else 64 * 1024
But then I sat down to write this TIL and stumbled across shutil.copyfileobj(fsrc, fdst) in the standard library, which implements the exact same pattern!
def copyfileobj(fsrc, fdst, length=0):
    """copy data from file-like object fsrc to file-like object fdst"""
    # Localize variable access to minimize overhead.
    if not length:
        length = COPY_BUFSIZE
    fsrc_read = fsrc.read
    fdst_write = fdst.write
    while True:
        buf = fsrc_read(length)
        if not buf:
            break
        fdst_write(buf)
So you should use that instead.
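Here is a rough sketch of what that could look like for an upload. The save_upload helper and the file names are hypothetical, and a SpooledTemporaryFile stands in for the object Starlette would hand you. One thing to keep in mind: copyfileobj copies from the current position of the source, so the sketch rewinds with seek(0) first, just like the hand-rolled recipe did.

```python
import shutil
import tempfile
from pathlib import Path


def save_upload(source_file, new_filepath):
    """Copy a file-like object to new_filepath in chunks (hypothetical helper)."""
    # Rewind first: copyfileobj starts from the current file position,
    # and an upload that was just written or read is positioned at its end.
    source_file.seek(0)
    with open(new_filepath, "wb") as target_file:
        shutil.copyfileobj(source_file, target_file)


with tempfile.TemporaryDirectory() as tmpdir:
    destination = Path(tmpdir) / "upload.bin"  # illustrative file name
    # Stand-in for an uploaded file; max_size is an arbitrary choice.
    with tempfile.SpooledTemporaryFile(max_size=1024) as upload:
        upload.write(b"x" * 5000)  # larger than max_size
        save_upload(upload, destination)
    copied_size = destination.stat().st_size
```

The optional third argument to copyfileobj sets the chunk size; leaving it out falls back to COPY_BUFSIZE, as the source above shows.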
Created 2022-05-13T17:22:54-07:00