Improving Bloom Filter Configuration for Lazy Transactional Memory
Mark Jeffrey and J. Gregory Steffan, ECE, University of Toronto
November 10, 2011




Parallel Programming is Hard

T1: Rd(a), Rd(b), Wr(a)
T2: Rd(a), Wr(c), Rd(a)
T3: Rd(x), Rd(a)

Tools offload some of the burden of managing data accesses:
– Memory Race Replay
– Atomicity Violation Survival
– Transactional Memory
– Speculative Optimizations

Many of these tools use Bloom filters


Bloom Filter

• Bit-vector-based data structure [1970]
  – offers fast set operations
  – in exchange for some imprecision
• Recently used to compare memory accesses
• With unconventional practices: intersection (&)

We show that these new practices are inefficient (in theory and empirically)!


Bloom Filters in Concurrency Tools

System     Year  Application
Bulk       2006  Hardware TM
BulkSC     2007  Memory Consistency
HARD       2007  Race Detection
DeLorean   2008  Deterministic Race Replay
SoftSig    2008  Code Analysis/Optimization/Debug
RingSTM    2008  Software TM
SigRace    2009  Race Detection
ColorSafe  2010  Atomicity Violation
InvalSTM   2010  Software TM
AdapSig    2010  Software TM
SvS        2011  Auto-protection of shared state

Our proposals will improve parallelism!


Tracking Address-Set Conflicts


Address-Sets


Read Set:
• memory locations read
• R_T1 = {a, b}

Write Set:
• memory locations written
• W_T1 = {a}


Burden: Address-Set Conflicts


Conflicts
– address accesses are dependent
– independence → parallelism!
– address conflicts → no parallelism

Conflict detection requires:
– read- and write-set comparison
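As a concrete illustration of read- and write-set comparison, the sketch below derives R and W for the example transactions T1, T2, and T3 and flags a pair as conflicting when one transaction writes an address the other reads or writes. That rule is the usual TM notion of a conflict and is assumed here; the `(op, addr)` trace encoding and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding of the example traces as (operation, address) pairs.
TRACES = {
    "T1": [("Rd", "a"), ("Rd", "b"), ("Wr", "a")],
    "T2": [("Rd", "a"), ("Wr", "c"), ("Rd", "a")],
    "T3": [("Rd", "x"), ("Rd", "a")],
}

def address_sets(trace):
    """Return (read set, write set) for one transaction's trace."""
    reads = {addr for op, addr in trace if op == "Rd"}
    writes = {addr for op, addr in trace if op == "Wr"}
    return reads, writes

def conflict(trace_i, trace_j):
    """True if one transaction writes an address the other reads or writes."""
    r_i, w_i = address_sets(trace_i)
    r_j, w_j = address_sets(trace_j)
    return bool(w_i & (r_j | w_j)) or bool(w_j & r_i)

for ti, tj in combinations(TRACES, 2):
    status = "conflict" if conflict(TRACES[ti], TRACES[tj]) else "independent"
    print(ti, tj, status)
# T1 T2 conflict
# T1 T3 conflict
# T2 T3 independent
```

T2 and T3 share address a only through reads, so under this rule they are independent and could run in parallel, while T1's write to a makes it dependent on both.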


Lazy Conflict Detection

• Detect conflicts at the end of a transaction
• Test address-sets for null-intersections

T1        T2
Wr(b)     –
–         Rd(a)
Rd(a)     –
Rd(c)     –
–         Rd(b)

$R_1 = \{a, c\}$, $W_1 = \{b\}$
Test: $W_1 \cap R_2 \stackrel{?}{=} \emptyset$
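The commit-time test can be sketched with exact Python sets, before any Bloom filter is involved. The sketch assumes a committer-wins policy: the committing transaction proceeds, and any still-running transaction whose read set intersects its write set must abort. The function name `commit` and the dictionary layout are hypothetical.

```python
def commit(write_set, active_read_sets):
    """Lazy conflict detection at the end of a transaction (sketch).

    write_set: addresses written by the committing transaction.
    active_read_sets: {name: read set} for transactions still running.
    Returns the names whose read set does NOT have a null intersection
    with write_set, i.e. the transactions that must abort (assumed policy).
    """
    return [name for name, reads in active_read_sets.items() if write_set & reads]

W1 = {"b"}            # T1 wrote b (and read a and c)
R2 = {"a", "b"}       # T2 has read a and then b
print(commit(W1, {"T2": R2}))  # ['T2']: W1 ∩ R2 = {'b'} is not empty
```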


Bloom Filters (BF)


Bloom Filter Background

• A Bloom filter is a compact set representation
  – a bit vector much smaller than the address space

Insert an address, x: the hash h(x) selects the bit to set in BF(S)


Query for an address, y: check bit h(y), i.e. $y \stackrel{?}{\in} \mathrm{BF}(S)$ → {Yes, No}
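A minimal single-hash Bloom filter matching the insert and query steps above. The bit vector is a Python integer, and h() is assumed to be BLAKE2b reduced modulo m; both are illustrative choices, not the configuration used in the talk.

```python
import hashlib

class BloomFilter:
    """Single-hash Bloom filter over an m-bit vector (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.bits = 0  # the bit vector, stored as a Python int

    def _h(self, addr):
        # Hypothetical h(): BLAKE2b digest of the address, reduced mod m.
        digest = hashlib.blake2b(str(addr).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.m

    def insert(self, addr):
        self.bits |= 1 << self._h(addr)

    def query(self, addr):
        # False is a definite "No"; True means addr may be in S.
        return bool((self.bits >> self._h(addr)) & 1)

bf = BloomFilter(m=64)
for addr in (0x1000, 0x1004, 0x2000):
    bf.insert(addr)
print(bf.query(0x1004))  # True: inserted addresses are always reported
```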


Bloom Filter False Positives (FPs)

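• Encode a large address space into a bit vector
  – the response to a query is actually No or Maybe
• False positives: when "maybe" is wrong
  – e.g., is y in BF(S)? If only x was inserted but x and y map to the same bit, the answer is "maybe" even though y is not in S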
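To make a false positive concrete, the sketch uses a deliberately tiny 8-bit filter and a hypothetical toy hash (address modulo 8), so the collision between x and y can be checked by hand.

```python
M = 8  # deliberately tiny filter so a collision is easy to see

def h(addr):
    # Hypothetical toy hash: address modulo M (real filters use better hashes).
    return addr % M

bits = 0
x = 3
bits |= 1 << h(x)  # insert x: sets bit 3

def query(addr):
    return bool((bits >> h(addr)) & 1)  # True means "maybe", False means "no"

print(query(11))  # True: 11 was never inserted, but 11 % 8 == 3 (false positive)
print(query(4))   # False: bit 4 is clear, so 4 is definitely not in the set
```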


Partitioned Bloom Filter

Insert an address, x:
– k hash functions h1(), h2(), …, hk() encode k bit indices to set, one per partition
– x is added to BF(S)


Query for an address, y:
– h1(), h2(), …, hk() select one bit per partition: $y \stackrel{?}{\in} \mathrm{BF}(S)$ → {Maybe, No}

The probability of false positives is well understood.
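A sketch of the partitioned filter above, together with the textbook false-positive estimate for a partitioned filter holding n addresses, $(1 - (1 - k/m)^n)^k$. The BLAKE2b-based hash family, the requirement that k divides m, and all sizes are assumptions for illustration.

```python
import hashlib

class PartitionedBloomFilter:
    """Sketch: m bits split into k partitions of m // k bits; h_i indexes only partition i."""

    def __init__(self, m, k):
        if m % k:
            raise ValueError("this sketch assumes k divides m")
        self.k = k
        self.part_bits = m // k
        self.parts = [0] * k  # one int-backed bit vector per partition

    def _h(self, i, addr):
        # Hypothetical i-th hash function: BLAKE2b of "i:addr", reduced into partition i.
        digest = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.part_bits

    def insert(self, addr):
        for i in range(self.k):
            self.parts[i] |= 1 << self._h(i, addr)

    def query(self, addr):
        # "Maybe" only if addr's bit is set in every one of the k partitions.
        hit = all((self.parts[i] >> self._h(i, addr)) & 1 for i in range(self.k))
        return "Maybe" if hit else "No"

def fp_probability(m, k, n):
    """Textbook false-positive estimate for a partitioned filter holding n addresses."""
    return (1 - (1 - k / m) ** n) ** k

bf = PartitionedBloomFilter(m=1024, k=4)
for addr in range(0x1000, 0x1000 + 400, 4):  # 100 word-aligned addresses
    bf.insert(addr)
print(bf.query(0x1010), fp_probability(1024, 4, 100))
```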


Unconventional Bloom Filter Null-Intersection Tests


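Bloom Filter Null-Intersection Tests

Two existing approaches:
1. build a Queue of Queries (QoQ)
2. combine the queried addresses into a separate Bloom filter
   – replace many queries with 1 intersection!

[Figure: addresses a1 … a5 tested one query at a time (QoQ) vs. with a single filter intersection]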
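The two approaches can be contrasted in a short sketch using unpartitioned filters with k hash functions. QoQ needs B's addresses themselves (the queue), while intersection needs only BF(B). Every bit a queried address tests is also set in BF(B), so a QoQ "maybe" always implies an intersection "maybe", but not vice versa. The hash family, sizes, and function names are assumptions.

```python
import hashlib

M, K = 256, 4  # hypothetical filter size (bits) and number of hash functions

def indices(addr):
    """K hypothetical hash functions, each indexing the full M-bit vector."""
    return [
        int.from_bytes(hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest(), "little") % M
        for i in range(K)
    ]

def build(addrs):
    bits = 0
    for a in addrs:
        for idx in indices(a):
            bits |= 1 << idx
    return bits

def query(bits, addr):
    return all((bits >> idx) & 1 for idx in indices(addr))

def overlap_qoq(bf_a, queue_b):
    # Approach 1: queue of queries; test each queued address of B against BF(A).
    return any(query(bf_a, b) for b in queue_b)

def overlap_intersect(bf_a, bf_b):
    # Approach 2: one bitwise AND replaces all the queries.
    return (bf_a & bf_b) != 0

A = [0x1000 + 4 * i for i in range(5)]
B = [0x2000 + 4 * i for i in range(5)]  # disjoint from A
bf_a, bf_b = build(A), build(B)
print(overlap_qoq(bf_a, B), overlap_intersect(bf_a, bf_b))
```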


Partitioned BF Intersection

Do two sets share any elements? $S_1 \cap S_2 \stackrel{?}{=} \emptyset$
AND the filters partition by partition (… & …) → {Disjoint, Maybe Overlap}
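The slide shows partitions being ANDed pairwise but not the decision rule. The sketch below assumes "maybe overlap" requires a non-zero AND in every partition: a shared address sets the same bit in each partition of both filters, so a single empty partition already proves the sets disjoint. Sizes, the hash family, and names are hypothetical.

```python
import hashlib

M, K = 256, 4  # hypothetical total bits and number of partitions
P = M // K     # bits per partition

def h(i, addr):
    """Hypothetical i-th hash function, indexing only partition i."""
    digest = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % P

def build(addrs):
    parts = [0] * K  # parts[i] is partition i, stored as an int bit vector
    for a in addrs:
        for i in range(K):
            parts[i] |= 1 << h(i, a)
    return parts

def intersect(bf1, bf2):
    # A shared address sets the same bit in every partition of both filters,
    # so a single all-zero partition AND already proves the sets disjoint.
    if all(p1 & p2 for p1, p2 in zip(bf1, bf2)):
        return "Maybe Overlap"
    return "Disjoint"

S1 = [0x1000 + 4 * i for i in range(8)]
S2 = [0x3000, S1[3]]                    # shares one address with S1
print(intersect(build(S1), build(S2)))  # Maybe Overlap
```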


Unpartitioned BF Intersection

$S_1 \cap S_2 \stackrel{?}{=} \emptyset$: AND the two bit vectors (… & …) → {Disjoint, Maybe Overlap}
Any asserted bit in the result indicates a (possible) set overlap


Imprecision in BF Intersection

• The Bloom filter was intended for fast querying
• Recent systems use the filter for intersection
  – imprecision can produce False Set-Overlaps (FSOs)
  – we are the first to study Bloom filter FSOs
  – our goal is to understand and improve Bloom filter intersection


Important Questions

When using BFs for testing null-intersection:
1. How do BF intersection and QoQ compare?
   – theoretical study [SPAA ’11]
2. Can we compromise?
   – new Bloom filter design
3. Does theory work in practice?
   – empirical study


Bloom Filters for Null-Intersection Tests

1. How do BF intersection and QoQ compare?



Definitions

• A, B: disjoint address-access sets
• m: number of bits in the filter
• k: number of partitions (hash functions h1(), h2(), …, hk())


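Probability of FSO [SPAA ’11]

• Unpartitioned BF Intersection
  $p_{\mathrm{Unpart}} = 1 - \left(1 - \frac{1}{m}\right)^{k^2 |A||B|}$
• Partitioned BF Intersection
  $p_{\mathrm{Part}} = \left(1 - \left(1 - \frac{k}{m}\right)^{|A||B|}\right)^{k}$
• Queue of BF Queries (each $b_i \in B$ queried against BF(A))
  $p_{\mathrm{QoQ}} = 1 - \left(1 - \left(1 - \left(1 - \frac{k}{m}\right)^{|A|}\right)^{k}\right)^{|B|}$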
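The three expressions can be evaluated directly. A minimal sketch follows; the parameter values are hypothetical, and the formulas rest on the usual Bloom filter assumption of independent, uniformly distributed hash values.

```python
def p_unpart(m, k, a, b):
    """FSO probability, unpartitioned intersection: 1 - (1 - 1/m)^(k^2 |A||B|)."""
    return 1 - (1 - 1 / m) ** (k * k * a * b)

def p_part(m, k, a, b):
    """FSO probability, partitioned intersection: (1 - (1 - k/m)^(|A||B|))^k."""
    return (1 - (1 - k / m) ** (a * b)) ** k

def p_qoq(m, k, a, b):
    """FSO probability, |B| queries into a partitioned BF(A)."""
    return 1 - (1 - (1 - (1 - k / m) ** a) ** k) ** b

m, k, size_a, size_b = 1024, 4, 10, 10  # hypothetical configuration
for name, p in (("Unpart", p_unpart), ("Part", p_part), ("QoQ", p_qoq)):
    print(f"{name:>6}: {p(m, k, size_a, size_b):.3g}")
```

With k = 1 the two intersection formulas coincide, which is why the comparison that follows is stated for k > 1.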


Comparing FSOs [SPAA ’11]

For any length m and k > 1 hash functions,
$p_{\mathrm{QoQ}} < p_{\mathrm{Partitioned}} < p_{\mathrm{Unpartitioned}}$

• Queue of Queries gives the fewest false conflicts
• Partitioned intersection improves on unpartitioned intersection
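To check this ordering without relying on closed forms, a small Monte Carlo sketch can count false set-overlaps for two disjoint sets. It models each address's hash values as independent uniform draws (an idealized-hash assumption) and reuses the same draws for QoQ and partitioned intersection, so every QoQ false positive is also a partitioned-intersection FSO by construction. All parameters are hypothetical; this is a sanity check, not the paper's evaluation.

```python
import random

def simulate(m=256, k=4, size_a=8, size_b=8, trials=2000, seed=1):
    """Estimate false set-overlap rates for two disjoint address sets A and B."""
    rng = random.Random(seed)
    part_bits = m // k
    counts = {"QoQ": 0, "Part": 0, "Unpart": 0}
    for _ in range(trials):
        # Partitioned layout: row[i] is an address's bit index in partition i.
        a_rows = [[rng.randrange(part_bits) for _ in range(k)] for _ in range(size_a)]
        b_rows = [[rng.randrange(part_bits) for _ in range(k)] for _ in range(size_b)]
        a_parts = [{row[i] for row in a_rows} for i in range(k)]  # set bits of BF(A)
        b_parts = [{row[i] for row in b_rows} for i in range(k)]  # set bits of BF(B)
        # QoQ: some address of B finds all k of its bits set in BF(A).
        if any(all(row[i] in a_parts[i] for i in range(k)) for row in b_rows):
            counts["QoQ"] += 1
        # Partitioned intersection: every partition's AND is non-empty.
        if all(a_parts[i] & b_parts[i] for i in range(k)):
            counts["Part"] += 1
        # Unpartitioned intersection: k bits per address anywhere in m; any shared bit.
        a_bits = {rng.randrange(m) for _ in range(k * size_a)}
        b_bits = {rng.randrange(m) for _ in range(k * size_b)}
        if a_bits & b_bits:
            counts["Unpart"] += 1
    return {name: c / trials for name, c in counts.items()}

rates = simulate()
print(rates)
```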


Bloom Filters for Null-Intersection Tests

2. Can we compromise? A new Bloom filter design


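Batch-of-Bloom-filters (BoB)

Insert an address, x:
– a pre-hash h_pre(x) selects one of b sub-filters
– h1, …, hk then set x's bits in that sub-filter, adding x to BoB(S)

$S = S_1 \cup S_2 \cup \cdots \cup S_b$
$\mathrm{BoB}(S) = \{\mathrm{BF}(S_1), \mathrm{BF}(S_2), \ldots, \mathrm{BF}(S_b)\}$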
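A sketch of BoB insertion and lookup: h_pre routes each address to one of b sub-filters, so only that sub-filter is touched. The slide shows h1 … hk per sub-filter but not whether sub-filters are partitioned; this sketch assumes they are. The class name, hash family, and sizes are hypothetical.

```python
import hashlib

def _hash(tag, addr, size):
    """Hypothetical hash family keyed by a tag string (BLAKE2b, reduced mod size)."""
    digest = hashlib.blake2b(f"{tag}:{addr}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size

class BatchOfBloomFilters:
    """Sketch of a BoB: h_pre picks one of b sub-filters; each sub-filter is
    assumed to be a partitioned Bloom filter with k partitions of part_bits bits."""

    def __init__(self, b, k, part_bits):
        self.b, self.k, self.part_bits = b, k, part_bits
        # filters[j][i] is partition i of BF(S_j).
        self.filters = [[0] * k for _ in range(b)]

    def _pre(self, addr):
        return _hash("pre", addr, self.b)  # h_pre(x)

    def insert(self, addr):
        sub = self.filters[self._pre(addr)]
        for i in range(self.k):
            sub[i] |= 1 << _hash(f"h{i}", addr, self.part_bits)

    def query(self, addr):
        # Only the sub-filter chosen by h_pre is consulted.
        sub = self.filters[self._pre(addr)]
        return all((sub[i] >> _hash(f"h{i}", addr, self.part_bits)) & 1 for i in range(self.k))

bob = BatchOfBloomFilters(b=4, k=2, part_bits=32)
addrs = [0x1000 + 4 * i for i in range(16)]
for a in addrs:
    bob.insert(a)
print(all(bob.query(a) for a in addrs))  # True: no false negatives
```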


BoB Intersection

$S_1 \cap S_2 \stackrel{?}{=} \emptyset$: AND each pair of corresponding sub-filters (& …) → {Disjoint, Maybe Overlap}

BoB: a compromise between QoQ and intersection
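Because h_pre sends a shared address to the same sub-filter index in both batches, only corresponding sub-filters need to be intersected. This sketch assumes each sub-filter is partitioned and applies the every-partition rule to each pair; with a single sub-filter it reduces to one partitioned intersection. One intuition for the compromise is that an address can only produce a false overlap against addresses that share its sub-filter. Sizes, the hash family, and names are hypothetical.

```python
import hashlib

B, K, PART_BITS = 4, 2, 32  # hypothetical: 4 sub-filters, each 2 partitions of 32 bits

def _hash(tag, addr, size):
    """Hypothetical hash family keyed by a tag string (BLAKE2b, reduced mod size)."""
    digest = hashlib.blake2b(f"{tag}:{addr}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size

def build_bob(addrs):
    """Return BoB(S) as B sub-filters; h_pre splits S into S_1, ..., S_B."""
    bob = [[0] * K for _ in range(B)]
    for a in addrs:
        sub = bob[_hash("pre", a, B)]
        for i in range(K):
            sub[i] |= 1 << _hash(f"h{i}", a, PART_BITS)
    return bob

def bob_intersect(bob1, bob2):
    # A shared address lands in the same sub-filter index in both batches, so only
    # corresponding sub-filters are ANDed, each with the every-partition rule.
    for sub1, sub2 in zip(bob1, bob2):
        if all(p1 & p2 for p1, p2 in zip(sub1, sub2)):
            return "Maybe Overlap"
    return "Disjoint"

S1 = [0x1000 + 4 * i for i in range(8)]
S2 = [0x2000 + 4 * i for i in range(8)]
print(bob_intersect(build_bob(S1), build_bob(S2)))
print(bob_intersect(build_bob(S1), build_bob(S2 + [S1[5]])))  # Maybe Overlap
```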


Bloom Filters for Null-Intersection Tests

3. Does theory work in practice?


Methodology

• Augment RingSTM [Spear et al. SPAA ’08] with alternate BF configurations
  – RingSTM's baseline: unpartitioned Bloom filter intersection
• Stress the BF configurations using the STAMP benchmark suite
• 8-core Intel Xeon with the SSE2 ISA
  – 32-bit Linux 2.6.32-5-686


Performance Results: Labyrinth

[Figure: execution time and aborts per BF configuration; lower is better]

• 21% speedup
• QoQ, BoB, and partitioned intersection outperform the baseline


Performance Results: Kmeans-low

[Figure: execution time and aborts per BF configuration; lower is better]

• >25% slowdown
• Querying overhead counteracts the reduction in aborts


Conclusion

Conflict detection often applies Bloom filters
– for fast set operations: $y \in S$ and $S_1 \cap S_2$
– unconventionally, using BFs for null-intersection tests

Our recommendations (from theory and practice):
1. strongly consider querying before intersection
2. in hardware, consider intersecting BoBs
3. build adaptive systems for application behaviors

Thank you!