This is an implementation of parallel gzip. It works by splitting the input
into chunks (32 MB each by default, configurable), compressing each chunk
independently, and concatenating the results. The output can be read and
decompressed by the usual sequential gzip implementation.
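The chunk-and-concatenate scheme described above can be sketched in Python. This is a hedged illustration, not this project's code; the real implementation, its threading model, and its chunk-size handling may differ:

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 32 * 1024 * 1024  # 32 MB, the default mentioned above

def parallel_gzip(data: bytes, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress `data` as independently gzipped chunks, concatenated."""
    # Split the input into fixed-size chunks.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Compress each chunk independently. zlib releases the GIL during
    # compression, so threads give real parallelism here.
    with ThreadPoolExecutor() as pool:
        members = pool.map(gzip.compress, chunks)
    # Concatenated gzip members form a stream that ordinary gzip tools
    # decompress as one file.
    return b"".join(members)
```

A standard decompressor reads the concatenated members back as a single stream, e.g. `gzip.decompress(parallel_gzip(data)) == data`.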
The motivation is to speed up transfers of large amounts of data across a fast
network through ssh. The ssh throughput is limited by its compression and
encryption routines, which are single-threaded. This tool allows turning
compression off in ssh and using multiple cores to compress the data instead.
Since decompression is much faster, parallel decompression is not needed.
Limitations
There are certain limitations:
- The compressed representation differs slightly from that of the usual
  sequential gzip. Technically, the output is multiple concatenated gzip
  streams, but decompression tools commonly accept that.
- Because the chunks are compressed independently, the compression ratio is
  likely to be slightly worse.
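Both effects can be observed directly with Python's gzip module. The sketch below exaggerates the ratio penalty by using tiny chunks on repetitive data: each gzip member starts with an empty dictionary, so repetition across chunk boundaries cannot be back-referenced, and each member adds its own header and trailer overhead.

```python
import gzip

data = b"the quick brown fox jumps over the lazy dog\n" * 10000

# Whole-input compression: one gzip member, full back-reference window.
whole = gzip.compress(data)

# Chunked compression: many independent members concatenated together.
chunk_size = 1024
chunked = b"".join(
    gzip.compress(data[i:i + chunk_size])
    for i in range(0, len(data), chunk_size)
)

# Both decompress to the same bytes, but the chunked stream is larger.
assert gzip.decompress(whole) == data
assert gzip.decompress(chunked) == data
assert len(chunked) > len(whole)
```

With a realistic 32 MB chunk size the per-member overhead is negligible and only cross-chunk repetition is lost, so the penalty is usually small.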
Contribution
Unless you explicitly state otherwise, any contribution intentionally
submitted for inclusion in the work by you, as defined in the Apache-2.0
license, shall be dual licensed as above, without any additional terms
or conditions.