Upload
roland-wood
View
214
Download
0
Embed Size (px)
Citation preview
Parallelizing Data Race Detection
Benjamin WesterFacebook
David Devecsery, Peter Chen, Jason Flinn, Satish NarayanasamyUniversity of Michigan
Data Races
Benjamin Wester - ASPLOS '13 2
TIM
E
t1 t2
lock(l)x = 1unlock(l)
x = 3lock(l)x = 2unlock(l)
Race Detection
Benjamin Wester - ASPLOS '13
Normal run time
TIM
E With race detector
3
Instrumentation+
Analysis
Detection is slow!8x–200x•Managed vs. binary•Granularity•Completeness•Debug info gatheredMatches app concurrency
Race Detection
Benjamin Wester - ASPLOS '13
Normal run time
TIM
E With race detector
4
Parallel race detector
Detection is slow!8x–200x•Managed vs. binary•Granularity•Completeness•Debug info gatheredMatches app concurrency
How to use parallel hardware?
• Scale program?– Performance tied to app
• Parallel algorithm?– Lots of fine-grained dependencies
• Uniparallelism– Converts execution into parallel pipeline– Works for instrumentation
Benjamin Wester - ASPLOS '13 5
[Wallace CGO’07,Nightingale ASPLOS’08]
Uniparallelism
Benjamin Wester - ASPLOS '13
TIM
E
6
Uniparallelism
Benjamin Wester - ASPLOS '13
TIM
E
7
Epoch-parallelExecution
Epoch-sequentialExecution
Uniparallelism
• Scales well:Epochs are independent execution units
• More efficient:Synchronization
Benjamin Wester - ASPLOS '13 8
TIM
E
Add Detector…
• Scales well:Epochs are independent execution units
• More efficient:Synchronization& Lock elision
Race detection depends on state!
Benjamin Wester - ASPLOS '13 9
TIM
E Here!Here!
Architecture
Benjamin Wester - ASPLOS '13 10
Epoch-sequentialEpoch-sequential Epoch-parallelEpoch-parallel CommitCommit
Moving Work
Identify meaningful fast subset of analysis state– Low frequency, lightweight, self-contained– Run in Epoch-Sequential phase– Predicted and usable in parallel phase
Symbolic execution– Replace missing state with symbols– Defer some computation to commit phase– Commit: replace symbols with concrete values
Benjamin Wester - ASPLOS '13 11
Architecture
Benjamin Wester - ASPLOS '13 12
Epoch-sequentialEpoch-sequential Epoch-parallelEpoch-parallel CommitCommit
Instrument and predictfast subset of analysis state Perform deferred work
on concrete values
Full instrumentationSymbolic execution
Fast
Slow
State: Vector clocks – e.g. ⟨0, 1, 0⟩ – for each:– Thread– Lock– Variable x 2: last read(s), last write
Example computation:Check(read_x ≤ thread_i ); write_x = thread_i
Transitive reductionUse knowledge of happens-before relationship
Parallel FastTrack
Benjamin Wester - ASPLOS '13 13
Fast
Slow
Parallel Eraser
State:– Set of locks held by thread (lockset)– Lockset for each variable– Position in state machine for each variable
Example computation:If state_x == SHARED then ls_x = ls_x ∩ {L, M}
Lockset factorizationFactor out common behavior
Benjamin Wester - ASPLOS '13 14
Parallel Performance
EfficiencyScalability
2 worker threads as baseline8-CPU PlatformBenchmarks:
SPLASH-2 (water, lu, ocean, fft, radix) Parallel app: pbzip2
Benjamin Wester - ASPLOS '13 15
Efficiency
Benjamin Wester - ASPLOS '13 16
Exactly 2 cores
Efficiency
Benjamin Wester - ASPLOS '13 17
Median 13% faster
Median 13% faster
Median 8% faster
Median 8% faster
Exactly 2 cores
Scaling Parallel FastTrack
Benjamin Wester - ASPLOS '13 18
2 Worker Threads
Scaling Parallel FastTrack
Benjamin Wester - ASPLOS '13 19
Median 4.4x with 8 coresMedian 4.4x with 8 cores
2 Worker Threads
Scaling Parallel Eraser
Benjamin Wester - ASPLOS '13 20
Median 3.3x with 8 coresMedian 3.3x with 8 cores
2 Worker Threads
Conclusion
3-Phase Uniparallel architecture– Parallel FastTrack– Parallel Eraser
Parallel algorithms:– Are more efficient– Scale better
Benjamin Wester - ASPLOS '13 21
Questions?
Benjamin Wester - ASPLOS '13 22