You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
HyperLogLog - an algorithm for approximating the number of distinct elements
An improved version of HyperLogLog for the count-distinct problem, approximating the number of distinct elements in a multiset. This implementation offers enhanced performance, flexibility, and simplicity while maintaining accuracy.
Sparse representation for lower cardinalities (like HyperLogLog++)
LogLog-Beta for dynamic bias correction across all cardinalities
8-bit registers for convenience and simplified implementation
Order-independent insertions and merging for consistent results regardless of data input order
Removal of tailcut method for a more straightforward approach
Flexible precision allowing for 2^4 to 2^18 registers
This implementation is now more straightforward, efficient, and flexible, while remaining backwards compatible with previous versions. It provides a balance between precision, memory usage, speed, and ease of use.
Precision and Memory Usage
This implementation allows for creating HyperLogLog sketches with arbitrary precision between 2^4 and 2^18 registers. The memory usage scales with the number of registers:
Minimum (2^4 registers): 16 bytes
Default (2^14 registers): 16 KB
Maximum (2^18 registers): 256 KB
Users can choose the precision that best fits their use case, balancing memory usage against estimation accuracy.
Note
A big thank you to Prof. Shigang Chen and his team at the University of Florida who are actively conducting research around "Big Network Data".
Contributing
Kindly check our contributing guide on how to propose bugfixes and improvements, and submitting pull requests to the project