CARVIEW |
igrigorik / bloomfilter
- Source
- Commits
- Network (5)
- Issues (0)
- Graphs
-
Branch:
master
click here to add a description
click here to add a homepage
Pledgie Donations
Once activated, we'll place the following badge in your repository's detail box:
Counting Bloom Filter implemented in Ruby: C version, Redis version (with TTL's) — Read more
name | age | message | |
---|---|---|---|
![]() |
.gitignore | Mon Dec 29 09:54:53 -0800 2008 | adding a test task and a test [tenderlove] |
![]() |
README.rdoc | Wed Jan 06 09:55:55 -0800 2010 | correct link in the readme [Ilya Grigorik] |
![]() |
Rakefile | Sat Jan 02 18:04:37 -0800 2010 | abstract the interface; migrate to gemcutter [Ilya Grigorik] |
![]() |
VERSION | Tue Jan 05 21:40:53 -0800 2010 | Version bump to 1.2.0 [Ilya Grigorik] |
![]() |
examples/ | Tue Jan 05 21:40:11 -0800 2010 | update readme with redis examples [Ilya Grigorik] |
![]() |
ext/ | Sat Jan 02 20:33:58 -0800 2010 | redis backend [Ilya Grigorik] |
![]() |
lib/ | Sat Jan 02 20:33:58 -0800 2010 | redis backend [Ilya Grigorik] |
![]() |
spec/ | Sat Jan 02 20:33:58 -0800 2010 | redis backend [Ilya Grigorik] |
BloomFilter
Counting Bloom Filter implemented in Ruby.
Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail: en.wikipedia.org/wiki/Bloom_filter
Performance of the Bloom filter depends on a number of variables:
- size of the bit array
- size of the counter bucket
- number of hash functions
To figure out the values for these parameters, refer to:
To learn about applications and reasons for the time based bloom filters, refer to:
Implementation
Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, …, k-1). This may or may not give you a good distribution, it all depends on the data.
Example
require 'bloomfilter' bf = BloomFilter.new(:size => 100, :hashes => 2, :seed => 1, :bucket => 3, :raise => false) bf.insert("test") bf.include?("test") => true bf.include?("test2") => false bf.delete("test") bf.include?("test") => false # Hash with a bloom filter! bf["test2"] = "bar" bf["test2"] => "bar" bf["test3"] => nil bf.stats Number of filter bits (m): 10 Number of filter elements (n): 2 Number of filter hashes (k) : 2 Predicted false positive rate = 10.87%
Redis-backed counting Bloom Filter with TTL’s
bf = BloomFilter.new(:type => :redis, :ttl => 2, :server => {:host => 'localhost'}) bf.insert('test') bf.include?('test') => true sleep(2) bf.include?('test') => false
Credits
Tatsuya Mori <valdzone@gmail.com> (Original C implementation: vald.x0.com/sb/)