Bloom Filters
Benoit Donnet
November 30th, 2006
1
2
Context
• Introduced in 1970 ([Bloom])
- Set membership problem
- Trade-off between space and computing complexity
- Lossy summary technique
• Historical usage
- Spell checking ([McIlroy])
- Database ([Bratbergsengen])
3
Content
• Bloom filters
• Extensions
• Networking applications
• Conclusion
• References
4
Bloom filters
5
Construction
6
Membership Query
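The construction and membership query steps can be illustrated with a minimal sketch. It assumes a vector of m bits and k hash functions; deriving the k hash functions by salting SHA-256 with an index is an assumption made for this example, not something specified in the slides, and the class name `BloomFilter` is hypothetical.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a vector of m bits and k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # all bits start at 0

    def _positions(self, item):
        # Assumption: the k hash functions are simulated by salting
        # SHA-256 with the index i of the hash function.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item):
        # Construction: set the k bits selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Membership query: positive only if all k bits are set.
        # A positive answer may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
bf.add("192.0.2.1")
print("192.0.2.1" in bf)
```

An element that was inserted is always reported as present; an element that was never inserted is usually, but not always, reported as absent.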
7
False Positive
• A Bloom filter can suffer from false positives
- The filter returns a positive answer for some elements that do not belong to A
• Can we evaluate a priori the impact of false positives on a Bloom filter?
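One common way to answer this question is the standard approximation of the false positive probability, (1 - e^(-kn/m))^k, which assumes k independent, uniform hash functions, m bits, and n inserted elements. The sketch below computes it, along with the usual choice k = (m/n) ln 2 that minimizes it. The function names are hypothetical and chosen for this example.

```python
import math


def false_positive_rate(m, n, k):
    """Approximate false positive probability for m bits, n elements, k hashes.

    Assumes independent, uniformly distributed hash functions.
    """
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    """Number of hash functions minimizing the approximation above."""
    return max(1, round((m / n) * math.log(2)))


m, n = 1000, 100
k = optimal_k(m, n)
print(k, false_positive_rate(m, n, k))
```

With m and n fixed, evaluating the function for several values of k shows the trade-off: too few hash functions give weak filtering, too many fill the vector with ones.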
8
False Positive (2)
9
False Positive (3)
10
False Positive (4)
11
Extensions
12
Content
• Compressed Bloom filters ([Mitzenmacher])
• Counting Bloom filters ([Fan et al.])
• Dynamic Bloom filters ([Guo et al.])
• Retouched Bloom filters ([Donnet et al.])
13
Compressed BF
• A Bloom filter can be used as a message exchanged by networked monitors
• New performance metric
- Bandwidth
• Transmission size can be affected by compression
• Compressed Bloom filters ([Mitzenmacher])
14
Compressed BF (2)
• Positive aspects:
- Quantity of bits exchanged reduced
- False positive rate reduced
- Amount of computation per query reduced
• Cost:
- Internal memory increased
- Compression/decompression process
15
Compressed BF (3)
16
Counting BF
• The subset A is changing over time
- Insertion
- Deletion
• How to perform deletion?
- Counting Bloom filters ([Fan et al.])
17
Counting BF (2)
18
Counting BF (3)
• Which size for the counter?
- 4 bits per counter are OK for most of the applications
• What happens in case of an overflow?
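A counting Bloom filter can be sketched by replacing each bit with a small counter. The example below uses 4-bit counters as suggested above. Its overflow policy (a counter that reaches 15 saturates and is never decremented again) is one possible choice and an assumption of this sketch, as are the class name and the salted SHA-256 hashing.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter with 4-bit saturating counters."""

    MAX_COUNT = 15  # largest value a 4-bit counter can hold

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for pos in self._positions(item):
            # Assumed overflow policy: stop counting at MAX_COUNT.
            if self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] += 1

    def remove(self, item):
        # Only call this for elements that were previously inserted.
        for pos in self._positions(item):
            # A saturated counter is left untouched, because its true
            # value is unknown once it has overflowed.
            if 0 < self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=256, k=3)
cbf.add("url-1")
cbf.remove("url-1")
print("url-1" in cbf)
```

Leaving saturated counters in place avoids introducing false negatives, at the price of bits that can never be cleared.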
19
Dynamic BF
• Statement:
- During the execution of the application, |A| can exceed its original size n
• Consequence:
- The false positive rate is not maintained anymore
• Solution?
- Dynamic Bloom Filters ([Guo et al.])
20
Dynamic BF (2)
• It uses a matrix of s Bloom filters
- Each Bloom filter uses m bits and k hash functions
• It starts with s equal to 1.
• A new Bloom filter (i.e., a new row in the matrix) is created when needed.
21
Dynamic BF (3)
• How to insert an element?
- Check for an active Bloom filter
- If there is no active Bloom filter, create a new one
- Add the element to the Bloom filter
• How to query an element?
- If all s Bloom filters return false, the element does not belong to the DBF.
- If at least one Bloom filter returns true, the element probably belongs to the DBF.
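The insertion and query procedures above can be sketched as follows. The sketch assumes that a Bloom filter is "active" while it holds fewer than a fixed number of elements (the hypothetical `capacity` parameter), and it simulates the k hash functions by salting SHA-256; neither detail comes from the slides.

```python
import hashlib


class DynamicBloomFilter:
    """Illustrative dynamic Bloom filter: a growing list of s Bloom filters."""

    def __init__(self, m, k, capacity):
        self.m = m
        self.k = k
        self.capacity = capacity  # assumed limit of elements per filter
        self.filters = [[0] * m]  # starts with s equal to 1
        self.counts = [0]         # elements inserted into each filter

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        # If the last filter is full, there is no active filter:
        # create a new one (a new row in the matrix).
        if self.counts[-1] >= self.capacity:
            self.filters.append([0] * self.m)
            self.counts.append(0)
        for pos in self._positions(item):
            self.filters[-1][pos] = 1
        self.counts[-1] += 1

    def __contains__(self, item):
        positions = self._positions(item)
        # True if at least one of the s filters returns true.
        return any(all(bits[p] for p in positions) for bits in self.filters)


dbf = DynamicBloomFilter(m=64, k=3, capacity=2)
for address in ["a", "b", "c"]:
    dbf.add(address)
print(len(dbf.filters), "a" in dbf, "c" in dbf)
```

Because a query consults every row, the overall false positive rate grows with s, even though each individual filter stays within its design load.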
22
Dynamic BF (4)
23
Retouched BF
• Statement:
- Some false positives might be more troublesome than others
- Some applications might tolerate a small level of false negatives
• Question:
- Can we trade off false positives against false negatives?
• Solution?
- Retouched Bloom filters ([Donnet et al. 06])
24
Retouched BF (2)
25
Retouched BF (3)
• What if we randomly reset s bits in the vector?
- Eliminates the same proportion of false positives as the proportion of false negatives generated
- Randomized bit clearing
• The process of removing selected false positives is called selective clearing
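Randomized bit clearing can be sketched as a function that resets s randomly chosen bits of the vector. Choosing only among bits currently set to 1 is an assumption of this sketch, and the function name and `seed` parameter are hypothetical.

```python
import random


def randomized_bit_clearing(bits, s, seed=None):
    """Return a copy of the bit vector with s randomly chosen set bits reset.

    Assumption: bits to clear are drawn only from positions holding a 1.
    """
    rng = random.Random(seed)
    set_positions = [i for i, bit in enumerate(bits) if bit]
    cleared = list(bits)  # leave the original vector untouched
    for pos in rng.sample(set_positions, min(s, len(set_positions))):
        cleared[pos] = 0
    return cleared


vector = [1, 0, 1, 1, 0, 1]
print(randomized_bit_clearing(vector, s=2, seed=1))
```

Every cleared bit may remove some false positives, but it also turns every inserted element that hashed to that bit into a false negative. Selective clearing differs in that it picks the bits to reset based on the false positives to be removed.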
26
Retouched BF (4)
27
Networking Applications
28
Distributed Caching
• Proxies cooperate to exchange cache information
• Instead of sharing URL lists, proxies broadcast Bloom filters ([Fan et al.])
• A Bloom filter represents a proxy’s cache content
29
Multicast
• A router maintains, for each multicast address, a list of associated interfaces/connections
• Replace the list by a Bloom filter ([Grönvall])
• Parallelization possible
• Deletion of an address can be achieved with a counting Bloom filter
30
Measurement
• Topology discovery at the IP interface level
• Traceroute monitors exchange information about what was previously discovered
- Doubletree
• This shared information can be encoded as a Bloom filter
- Communication cost reduction ([Donnet et al. 05])
31
Conclusion
• A Bloom filter
- solves the set membership problem
- can generate false positives
• Extensions to the standard Bloom filter were presented
• A few networking applications were discussed
32
References
[Bloom]: Space/Time Trade-Offs in Hash Coding with Allowable Errors. In Communications of the ACM. vol. 13, n° 7.
[McIlroy]: Development of a Spelling List. In Transactions on Communications. vol. 30, n° 1.
[Mitzenmacher]: Compressed Bloom Filters. In Transactions on Networking. vol. 10, n° 5.
[Fan et al.]: Summary Cache: a Scalable Wide-Area Web Cache Sharing Protocol. In Transactions on Networking. vol. 8, n° 3.
[Guo et al.]: Theory and Network Applications of Dynamic Bloom Filters. In Proc. INFOCOM 2006.
33
References (2)
[Bratbergsengen]: Hashing Methods and Relational Algebra Operations. In Proc. ICVLD 1984.
[Bruck et al.]: Weighted Bloom Filters. In Proc. ISIT 2006.
[Donnet et al. 06]: Retouched Bloom Filters: Allowing Networked Applications to Trade-Off Selected False Positives Against False Negatives. In Proc. CoNEXT 2006.
[Grönvall]: Scalable Multicast Forwarding. In Proc. ACM SIGCOMM 2001. Student Workshop.
[Donnet et al. 05]: Improved Algorithms for Network Topology Discovery. In Proc. PAM 2005.