33
Bloom Filters Benoit Donnet November 30th, 2006 1

Bloom Filters

  • Upload
    kyria

  • View
    44

  • Download
    2

Embed Size (px)

DESCRIPTION

Bloom Filters. Benoit Donnet November 30th, 2006. 1. Context. Introduced in 1970 ([bloom]) Set membership problem Trade-off between space and computing complexity Lossy summary technique Historical usage Spell checking ([McIlroy]) Database ([Bratbergsengen]). Content. Bloom filters - PowerPoint PPT Presentation

Citation preview

Page 1: Bloom Filters

Bloom Filters

Benoit DonnetNovember 30th, 2006

1

Page 2: Bloom Filters

2

Context

• Introduced in 1970 ([bloom])

- Set membership problem- Trade-off between space and

computing complexity- Lossy summary technique

• Historical usage- Spell checking ([McIlroy])

- Database ([Bratbergsengen])

Page 3: Bloom Filters

3

Content

• Bloom filters• Extensions• Networking applications• Conclusion• References

Page 4: Bloom Filters

4

Bloom filters

Page 5: Bloom Filters

5

Construction

Page 6: Bloom Filters

6

Membership Query

Page 7: Bloom Filters

7

False Positive

• A Bloom filter can suffer of false positives

- The filter returns a positive answer for some elements that do not belong to A

• Can we evaluate a priori the impact of false positives on a Bloom filter?

Page 8: Bloom Filters

8

False Positive (2)

Page 9: Bloom Filters

9

False Positive (3)

Page 10: Bloom Filters

10

False Positive (4)

Page 11: Bloom Filters

11

Extensions

Page 12: Bloom Filters

12

Content

• Compressed Bloom filters ([mitzenmacher])

• Counting Bloom filters ([Fan et al.])

• Dynamic Bloom filters ([Guo et al.])

• Retouched Bloom filters ([Donnet et al.])

Page 13: Bloom Filters

13

Compressed BF

• A Bloom filter can be used a message exchanged by networked monitors

• New performance metric- Bandwidth

• Transmission size can be affected by compression

• Compressed Bloom filters ([mitzenmacher])

Page 14: Bloom Filters

14

Compressed BF (2)

• Positive aspects:- Quantity of bit exchanged reduced- False positive rate reduced- Amount of computation per query

reduced• Cost:

- Internal memory increased- Compression/decompression process

Page 15: Bloom Filters

15

Compressed BF (3)

Page 16: Bloom Filters

16

Counting BF

• The subset A is changing over time- Insertion- Deletion

• How to perform deletion?- Couting Bloom filters ([Fan et al.])

Page 17: Bloom Filters

17

Counting BF (2)

Page 18: Bloom Filters

18

Counting BF (3)

• Which size for the counter?- 4 bits per counter are OK for most

of the applications• What happens in case of an

overflow?

Page 19: Bloom Filters

19

Dynamic BF

• Statement:- During the execution of the

application, |A| can exceed its orignal size n

• Consequence:- The false positive rate is not

maintained anymore• Solution?

- Dynamic Bloom Filters ([Guo et al.])

Page 20: Bloom Filters

20

Dynamic BF (2)

• It uses a matrix of s Bloom filters- Each Bloom filter uses m bits and k

hash functions• It starts with s equals to 1.• A new Bloom filter (i.e., a new row in

the matrix) is created when needed.

Page 21: Bloom Filters

21

Dynamic BF (3)• How to insert an element?

- Check for an active Bloom filter- If there is no active Bloom filter,

create a new one- Add the element to the Bloom filter

• How to query an element?• If all s Bloom filters return false, the

element does not belong the DBF.• If, at least, one Bloom filter returns

true, the element probably belongs to the DBF.

Page 22: Bloom Filters

22

Dynamic BF (4)

Page 23: Bloom Filters

23

Retouched BF• Statement:

- Some false positives might be more troublesome than others

- Some applications might tolerate a small level of false negatives

• Question:- Can we trade-off the false positives

against false negatives?• Solution?

- Retouched Bloom filters ([Donnet et al. 06])

Page 24: Bloom Filters

24

Retouched BF (2)

Page 25: Bloom Filters

25

Retouched BF (3)

• Quid if we randomly reset s bits in the vector?

- Eliminates the same proportion of false positives as the proportion of false negatives generated

- Randomized bit clearing.• The process of removing selected false

positives is called selective clearing

Page 26: Bloom Filters

26

Retouched BF (4)

Page 27: Bloom Filters

27

Networking Applications

Page 28: Bloom Filters

28

Distributed Caching

•Proxies cooperate to exchange cache information

•Instead of sharing URLs list, proxies broadcast Bloom filters ([Fan et al.])

•A Bloom filter represents a proxy’s cache content

Page 29: Bloom Filters

29

Multicast

•A router maintains, for each multicast address, a list of associated interfaces/connections

•Replace the list by a Bloom filter ([Grönvall])

•Parallelization possible

•Deletion of an address can be achieved with a counting Bloom filter

Page 30: Bloom Filters

30

Measurement

• Topology discovery at the IP interface level

• Traceroute monitors exchange information about what was previously discovered

- Doubletree• This information shared can be encoded

as a Bloom filter- Communication cost reduction ([Donnet et al.

05])

Page 31: Bloom Filters

31

Conclusion

•A Bloom filter

•solves the set membership problem

•can generate false positives

•Extensions to standard Bloom filter were presented

•A few networking applications were discussed

Page 32: Bloom Filters

32

References

[Bloom]: Space/Time Trade-Offs in Hash Coding with Allowable Errors. In Communications of the ACM. vol. 13, n°7.

[McIlroy]: Development of a Spelling List. In Transactions on Communications. vol. 30, n° 1.

[Mitzenmacher]: Compressed Bloom Filters. In Transactions on Networking. vol. 10, n° 5.

[Fan et al.]: Summary Cache: a Scalable Wide-Area Web Cache Sharing Protocol. In Transactions on Networking. vol. 8, n°3.

[Guo et al.]: Theory and Network Applications of Dynamic Bloom Filters. In Proc. INFOCOM 2006.

Page 33: Bloom Filters

33

References (2)

[Bratbergsengen]: Hashing Methods and Relational Algebra Operations. In Proc. ICVLD 1984.

[Bruck et al.]: Weighted Bloom Filters. In Proc. ISIT 2006.

[Donnet et al. 06]: Retouched Bloom Filters: Allowing Networked Applications to Trade-Off Selected False Positives Against False Negatives. In Proc. CoNEXT 2006.

[Grönvall]: Scalable Multicast Forwarding. In Proc. ACM SIGCOMM 2001. Student Workshop.

[Donnet et al. 05]: Improved Algorithms for Network Topology Discovery. In Proc. PAM 2005.