Density is a free, open-source compression library written in pure Rust (formerly C99).
It is focused on high-speed compression at the best ratio possible. Density's algorithms currently sit on the Pareto frontier of compression speed vs. ratio (cf. here for an independent benchmark).
Density features a simple API to enable quick integration in any project.
Why is it so fast?
One of density's biggest assets is its work unit: unlike most compression algorithms, it is not a single byte but a group of 4 bytes. Where other libraries consume one byte of data per algorithmic step, density consumes 4 bytes.
That's why density's algorithms were designed from scratch: they cater to 4-byte work units while still providing interesting compression ratios.
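As a quick illustration of the idea (a sketch, not density's actual code), consuming an input buffer as 4-byte work units in Rust might look like this:

```rust
// Sketch: viewing an input buffer as a stream of 4-byte work units.
// Illustrative only; this is not density's internal representation.
fn work_units(input: &[u8]) -> impl Iterator<Item = u32> + '_ {
    input
        .chunks_exact(4) // any trailing 1-3 bytes would be handled separately
        .map(|c| u32::from_le_bytes(c.try_into().unwrap()))
}
```

Each step of the algorithm then operates on a full `u32` held in a register, rather than on a single byte.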
Speed pedigree traits:
- 4-byte work units
- heavy use of registers as opposed to memory
- minimal branching, avoided entirely where possible
- low-memory data structures to favor processor Lx cache storage
- library-wide inlining
A "blowup protection" is provided, dramatically increasing the processing speed of incompressible input data. The aim is to never exceed the original data size, even for incompressible inputs.
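The general idea behind blowup protection can be sketched as a fallback wrapper (an illustration of the technique only; density's real container format differs): if the compressed candidate would be larger than the input, store the input verbatim behind a one-byte marker.

```rust
// Sketch of a "blowup protection" fallback (illustrative; density's actual
// format differs). A leading marker byte records which path was taken.
fn compress_with_protection(input: &[u8], compress: impl Fn(&[u8]) -> Vec<u8>) -> Vec<u8> {
    let candidate = compress(input);
    if candidate.len() < input.len() {
        let mut out = vec![1u8]; // 1 = payload is compressed
        out.extend_from_slice(&candidate);
        out
    } else {
        let mut out = vec![0u8]; // 0 = payload stored verbatim
        out.extend_from_slice(input); // worst case: input size + 1 marker byte
        out
    }
}
```

In this toy version the worst-case overhead is the single marker byte; a real container amortizes such bookkeeping across larger blocks.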
Benchmarks
Quick benchmark
Density features an integrated single-core in-memory benchmark.
Just run `cargo bench` to assess the performance of the library on your platform, as well as that of lz4 and snappy, using the dickens file from the renowned Silesia corpus.
It is also possible to run the benchmark with your own files, using the following command: `FILE=... cargo bench` (replace `...` with a file path).
Popular compression test files include the Silesia corpus files and enwik8.
Other benchmarks featuring density (non-exhaustive list):
- squash is an abstraction layer for compression algorithms, and has an extremely exhaustive set of benchmark results, including density's, available here.
- lzbench is an in-memory benchmark of open-source LZ77/LZSS/LZMA compressors.
Build
Density can be built on Rust-compatible platforms. First, use rustup to install Rust.
a) Get the source code:
git clone https://github.com/g1mv/density.git
cd density
Density's algorithms are general-purpose, very fast algorithms. They empirically exhibit strong performance (Pareto frontier of speed vs. ratio) on voluminous datasets and text-based datasets, and are less performant on very small datasets. Their 4-byte work unit approach makes them "slower learners" than Lempel-Ziv-based algorithms, albeit much faster theoretically.
| Algorithm | Speed rank | Ratio rank | Dictionary unit(s) | Prediction unit(s) | Sig. size (bytes) |
|-----------|------------|------------|--------------------|--------------------|-------------------|
| chameleon | 1st        | 3rd        | 1                  | 0                  | 8                 |
| cheetah   | 2nd        | 2nd        | 2                  | 1                  | 8                 |
| lion      | 3rd        | 1st        | 2                  | 5                  | 6                 |
Chameleon is a dictionary-lookup-based compression algorithm. It is designed for absolute speed (on the order of GB/s) for both compression and decompression.
Cheetah, developed with input from Piotr Tarsa, is derived from chameleon and uses swapped dual dictionary lookups with a single prediction unit.
Lion is derived from chameleon/cheetah. It uses different data structures, dual dictionary lookups, and 5 prediction units, giving it a compression-ratio advantage on moderately to highly compressible data.
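To make the dictionary-lookup idea concrete, here is a heavily simplified chameleon-style encoder. This is a sketch under assumed constants and an assumed hash function, not density's actual implementation, and it omits blowup protection and trailing-byte handling: each 4-byte unit either hits the dictionary (a signature bit is set and a 2-byte key is emitted) or misses (the 4 literal bytes are emitted and the dictionary slot is updated).

```rust
// Heavily simplified chameleon-style dictionary coder (illustrative only).
// The constants and hash function are assumptions, not density's real values.
const DICT_BITS: u32 = 16;

fn hash(unit: u32) -> usize {
    ((unit.wrapping_mul(0x9E37_79B1)) >> (32 - DICT_BITS)) as usize
}

fn chameleon_like_compress(input: &[u8]) -> Vec<u8> {
    let mut dict = vec![0u32; 1 << DICT_BITS];
    let mut out = Vec::new();
    let (mut sig, mut sig_bits) = (0u64, 0u32); // 8-byte signature: 1 bit/unit
    let mut sig_pos = out.len();
    out.extend_from_slice(&[0u8; 8]); // reserve room for the first signature

    for chunk in input.chunks_exact(4) {
        let unit = u32::from_le_bytes(chunk.try_into().unwrap());
        let slot = hash(unit);
        if dict[slot] == unit {
            sig |= 1 << sig_bits; // hit: set signature bit, emit a 2-byte key
            out.extend_from_slice(&(slot as u16).to_le_bytes());
        } else {
            dict[slot] = unit; // miss: emit the 4 literal bytes, learn the unit
            out.extend_from_slice(&unit.to_le_bytes());
        }
        sig_bits += 1;
        if sig_bits == 64 {
            out[sig_pos..sig_pos + 8].copy_from_slice(&sig.to_le_bytes());
            sig = 0;
            sig_bits = 0;
            sig_pos = out.len();
            out.extend_from_slice(&[0u8; 8]); // reserve the next signature slot
        }
    }
    out[sig_pos..sig_pos + 8].copy_from_slice(&sig.to_le_bytes());
    out
}
```

A matching decompressor would read each signature, then consume 2 or 4 bytes per unit depending on the corresponding bit. Notice how the hot loop is branch-light and register-friendly, in line with the speed traits listed above.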
Quick start
Using density in your Rust code is a straightforward process.
Include the required dependency in the Cargo.toml file:
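The original snippet is not included here; a typical dependency declaration would look like the following sketch, where the crate name and version are assumptions to be checked against crates.io:

```toml
[dependencies]
# Replace "*" with the latest published version of the crate
density = "*"
```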