You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail, check the wikipedia article. Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, ..., k-1). This may or may not give you a good distribution, it all depends on the data.
Performance of the Bloom filter depends on a number of variables:
MRI/C implementation which creates an in-memory filter which can be saved and reloaded from disk.
require'bloomfilter-rb'bf=BloomFilter::Native.new(:size=>100,:hashes=>2,:seed=>1,:bucket=>3,:raise=>false)bf.insert("test")bf.include?("test")# => truebf.include?("blah")# => falsebf.delete("test")bf.include?("test")# => false# Hash with a bloom filter!bf["test2"]="bar"bf["test2"]# => truebf["test3"]# => falsebf.stats# => Number of filter bits (m): 10# => Number of filter elements (n): 2# => Number of filter hashes (k) : 2# => Predicted false positive rate = 10.87%
Redis-backed setbit/getbit bloom filter
Uses getbit/setbit on Redis strings - efficient, fast, can be shared by multiple/concurrent processes.
1.0% error rate for 1M items, 10 bits/item: 2.5 mb
1.0% error rate for 150M items, 10 bits per item: 358.52 mb
0.1% error rate for 150M items, 15 bits per item: 537.33 mb
Redis-backed counting bloom filter with TTLs
Uses regular Redis get/set counters to implement a counting filter with optional TTL expiry. Because each "bit" requires its own key in Redis, you do incur a much larger memory overhead.