You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A monotone minimal perfect hash function (MMPHF) maps a set S of n input keys to the first n integers without collisions.
At the same time, it respects the natural order of the input universe.
In other words, it maps each input key to its rank.
MMPHFs have many applications in databases and space-efficient data structures.
LeMonHash (Learned Monotone Minimal Perfect Hashing) is a novel MMPHF that learns about regularities in the input data
to achieve significant space and performance improvements.
It uses the PGM-Index to calculate a learned rank estimate for each key
and then solves collisions between these estimates using the retrieval data structure BuRR.
Compared to competitors that are mostly based on tree-like data structures, LeMonHash is a lot more flat and therefore faster to query.
LeMonHash dominates most competitors in terms of construction throughput, query throughput, and space consumption -- simultaneously.
We also give a variant for variable-length strings that achieves significantly faster queries than competitors.
Usage
Requirements:
GCC 11 or later
boost
Clone the repository (as a submodule) and add the following to your CMakeLists.txt.