fastDigest is a Rust-powered Python extension module that provides a lightning-fast implementation of the t-digest data structure and algorithm, offering a lightweight suite of online statistics for streaming and distributed data.
- Online statistics: Compute highly accurate estimates of quantiles, the CDF, and derived quantities such as the (trimmed) mean.
- Updating: Update a t-digest incrementally with streaming data or batches of large datasets.
- Merging: Merge many t-digests into one, enabling parallel compute operations such as map-reduce.
- Serialization: Use the to_dict/from_dict methods or the pickle module for serialization.
- Easy API: The fastDigest API is designed to be intuitive and to keep high overlap with popular libraries.
- Blazing fast: Thanks to its Rust backbone, this module is up to hundreds of times faster than other Python implementations.
Compiled wheels are available on PyPI. Simply install via pip:
pip install fastdigest
To build and install fastDigest from source, you will need Rust and maturin.
- Install the Rust toolchain → see https://rustup.rs
- Install maturin via pip:
pip install maturin
- git clone or download and extract this repository, open a terminal in its root directory, then build and install the package:
maturin build --release
pip install target/wheels/fastdigest-0.9.2-<platform-tag>.whl
The following examples are intended to give you a quick start. See the API reference for the full documentation.
Simply call TDigest() to create a new instance, or use TDigest.from_values to directly create a digest of any sequence of numbers:
from fastdigest import TDigest
digest = TDigest()
digest = TDigest.from_values([2.71, 3.14, 1.42])
Estimate the value at the rank q using quantile(q):
digest = TDigest.from_values(range(1001))
print("99th percentile:", digest.quantile(0.99))
Or the inverse: use cdf to find the rank (cumulative probability) of a given value:
print("cdf(990) =", digest.cdf(990))
Compute the arithmetic mean, or the trimmed_mean between two quantiles:
data = list(range(11)) # numbers 0-10
data[-1] = 100_000 # extreme outlier
digest = TDigest.from_values(data)
print(f" Mean: {digest.mean()}")
print(f"Trimmed mean: {digest.trimmed_mean(0.1, 0.9)}")
Use batch_update to merge a sequence of many values at once, or update to add one value at a time:
digest = TDigest()
digest.batch_update([0, 1, 2])
digest.update(3)
Note: These methods are not the same. They are optimized for different use cases, and there can be significant performance differences.
Use the + operator to create a new instance from two TDigests, or += to merge in-place:
digest1 = TDigest.from_values(range(20))
digest2 = TDigest.from_values(range(20, 51))
digest3 = TDigest.from_values(range(51, 101))
digest1 += digest2
merged_new = digest1 + digest3
The merge_all function offers an easy way to merge an iterable of many TDigests:
from fastdigest import TDigest, merge_all
digests = [TDigest.from_values(range(i, i+10)) for i in range(0, 100, 10)]
merged = merge_all(digests)
Obtain a dictionary representation by calling to_dict() and load it into a new instance with TDigest.from_dict:
from fastdigest import TDigest
import json
digest = TDigest.from_values(range(101))
td_dict = digest.to_dict()
print(json.dumps(td_dict, indent=2))
restored = TDigest.from_dict(td_dict)
The fastDigest API is designed to be backward compatible with the tdigest Python library. Migrating is as simple as changing your import statement.
Dicts created by tdigest can also natively be used by fastDigest.
- Task: Construct a digest of 1,000,000 uniformly distributed random values and estimate their median (average of 10 consecutive runs).
- Test environment: Python 3.12.12, MacBook Pro (M4 Pro), macOS 15.7.2 Sequoia
| Library | Time (ms) | Relative speed |
|---|---|---|
| tdigest | 9,773 | 1x |
| pytdigest | 54 | 180x |
| fastdigest | 20 | 480x |
If you want to try it yourself, install fastDigest (and optionally tdigest and/or pytdigest) and run:
python benchmark.py
fastDigest is licensed under the MIT License. See the LICENSE file for details.
Credit goes to Ted Dunning for inventing the t-digest. Special thanks to Andy Lok and Paul Meng for creating the tdigests and tdigest Rust libraries, respectively, as well as to all PyO3 contributors.