Bloom Filter

From GM-RKB
Jump to navigation Jump to search

A Bloom Filter is a space-efficient probabilistic set membership test classifier that uses a Bloom filter data structure (based on a bit vector and hash functions).



References

2017

2013

  • http://en.wikipedia.org/wiki/Bloom_filter
    • A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not; i.e. a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter). The more elements that are added to the set, the larger the probability of false positives.

      Bloom proposed the technique for applications where the amount of source data would require an impracticably large hash area in memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, of which 90% could be hyphenated by following simple rules but all the remaining 50,000 words required expensive disk access to retrieve their specific patterns. With unlimited core memory, an error-free hash could be used to eliminate all the unnecessary disk access. But if core memory was insufficient, a smaller hash area could be used to eliminate most of the unnecessary access. For example, a hash area only 15% of the error-free size would still eliminate 85% of the disk accesses (Template:Harvtxt).

      More generally, fewer than 10 bits per element are required for a 1% false positive probability, independent of the size or number of elements in the set (Template:Harvtxt).


2006

2004


1970