
A Case for an Interleaving Constrained Shared-Memory Multi-Processor

Jie Yu and Satish Narayanasamy

University of Michigan

Why is Parallel Programming Hard?

• Is single-threaded programming relatively easy?
  – Verification is NP-hard
  – BUT, properties such as a function's pre/post-conditions and loop invariants are verifiable in polynomial time

• Parallel programming is harder
  – Verifying properties for even small code regions is NP-hard
  – Reason: an unbounded number of legal thread interleavings is exposed to the parallel runtime
  – It is impractical to test or verify properties for all legal interleavings

Too Much Freedom Given to the Parallel Runtime?

[Figure: the set of legal thread interleavings, divided into]
• Tested correct interleavings
• Incorrect interleavings found during testing, eliminated by adding synchronization constraints
• Untested interleavings: the cause of concurrency bugs

Solution: Limit Freedom

• Programmer tests as many legal interleavings as practically possible
• Interleaving constraints from correct test runs are encoded in the program binary
• Runtime system avoids untested interleavings, i.e., avoids corner cases

Result of Constraining Interleavings

• A majority of the concurrency bugs are avoidable
  – Data races, atomicity violations, and also order violations
• Performance overhead is low
  – Untested interleavings in well-tested programs are likely to manifest rarely
  – Processor support helps reduce the cost of enforcing interleaving constraints

Challenges

• How to encode tested interleavings in a program's binary?
  – Predecessor Set (PSet) interleaving constraints
• How to efficiently enforce interleaving constraints at runtime?
  – Detect violations of PSet constraints using processor support
  – Avoid violations by stalling or by using rollback-and-re-execution support

Encoding Tested Interleavings

• Interleaving constraints from test runs
  – Too specific to a test input: performance loss for a different input
  – Too generic: might allow untested interleavings
• Predecessor Set (PSet)
  – PSet(m) is defined for each static memory operation m
  – pred ∈ PSet(m) if m is immediately and remotely (i.e., from a different thread) memory dependent on pred in at least one tested execution
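The PSet definition above can be sketched as a single pass over a recorded test trace. This is a minimal C++ sketch under stated assumptions, not the authors' tool: the Access trace format and the learn_psets function are hypothetical, and the exact dependence rule (a read depends on the last write to its location; a write depends on the reads since the last write, or on the last write if there were none) is an interpretation chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One dynamic memory access recorded in a test run (hypothetical trace format).
struct Access {
    int thread;           // id of the executing thread
    std::string instr;    // static memory operation, e.g. "W1" or "R1"
    std::uintptr_t addr;  // memory location accessed
    bool is_write;
};

// PSet(m) for every static memory operation m, keyed by its name.
using PSetMap = std::map<std::string, std::set<std::string>>;

// Per-location history: the last write and the reads performed since it.
struct LocationHistory {
    bool written = false;
    std::pair<int, std::string> last_write;                // (thread, instr)
    std::vector<std::pair<int, std::string>> reads_since;  // (thread, instr)
};

// Merges the PSet pairs observed in one tested execution into `psets`.
// Assumed dependence rule: a read depends on the last write to its location;
// a write depends on the reads since the last write, or on the last write if
// there were none. Only remote (cross-thread) dependences are recorded.
void learn_psets(const std::vector<Access>& trace, PSetMap& psets) {
    std::map<std::uintptr_t, LocationHistory> history;
    for (const Access& a : trace) {
        std::set<std::string>& pset = psets[a.instr];  // empty PSet if first seen
        LocationHistory& h = history[a.addr];
        if (!a.is_write) {
            if (h.written && h.last_write.first != a.thread)
                pset.insert(h.last_write.second);
            h.reads_since.emplace_back(a.thread, a.instr);
            continue;
        }
        if (!h.reads_since.empty()) {
            for (const auto& [thread, instr] : h.reads_since)
                if (thread != a.thread) pset.insert(instr);
        } else if (h.written && h.last_write.first != a.thread) {
            pset.insert(h.last_write.second);
        }
        h.written = true;
        h.last_write = {a.thread, a.instr};
        h.reads_since.clear();
    }
}
```

Fed the two correct MySQL interleavings from Case Study 1 (R1, W1, W2 and then W1, W2, R1, all on log_status), this accumulates PSet(W1) = {R1}, PSet(W2) = {} and PSet(R1) = {W2}, matching the PSets shown for that example. Because PSets only grow, later test runs can only relax the constraints.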

A Test Run

[Figure: one tested execution of Threads 1, 2, and 3 with memory operations W1, W2, W3 and R1 to R4, each annotated with its PSet]

PSet(W1) = {}, PSet(R1) = {}, PSet(R2) = {W1}, PSet(R3) = {W1}, PSet(R4) = {}, PSet(W2) = {R3, R4}, PSet(W3) = {W2}

Enforcing Tested Interleavings

• Processor support for detecting and avoiding PSet constraint violations
• Detecting PSet constraint violations
  – For each memory location, track its last accessor
  – Cache extension: piggyback the last accessor on cache coherence replies
  – The processor performs the PSet membership test by executing additional micro-ops
• Overcoming a PSet constraint violation
  – Stall
  – Re-execute using checkpoint-and-rollback support, e.g. SafetyNet, ReVive
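The detection step reduces to a lookup on each remote dependence. The following C++ sketch is illustrative only: it assumes the binary stores PSets as a sorted table of (m, pred) instruction-address pairs and that the coherence reply delivers the thread and PC of the immediately preceding conflicting access. PSetTable, LastAccessor, and is_violation are hypothetical names; the mechanism described above runs as processor micro-ops, not as library code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical encoding: the binary carries a table of PSet pairs
// (pc of memory operation m, pc of an allowed predecessor of m).
using PSetPair = std::pair<std::uint64_t, std::uint64_t>;

class PSetTable {
public:
    explicit PSetTable(std::vector<PSetPair> pairs) : pairs_(std::move(pairs)) {
        std::sort(pairs_.begin(), pairs_.end());  // enables binary search
    }

    // PSet membership test: is `pred` in PSet(m)?
    bool contains(std::uint64_t m, std::uint64_t pred) const {
        return std::binary_search(pairs_.begin(), pairs_.end(), PSetPair{m, pred});
    }

private:
    std::vector<PSetPair> pairs_;
};

// Assumed metadata piggybacked on a coherence reply: the thread and pc of the
// immediately preceding conflicting access to the requested location.
struct LastAccessor {
    int thread;
    std::uint64_t pc;
};

// Check run when the memory operation at `pc`, executed by `thread`, receives
// `last`. A local predecessor never violates (PSets record only remote
// dependences); a remote one must appear in PSet(pc).
bool is_violation(const PSetTable& table, int thread, std::uint64_t pc,
                  const LastAccessor& last) {
    return last.thread != thread && !table.contains(pc, last.pc);
}
```

With PSet(R1) = {W2} from the MySQL case study, a remote predecessor W1 arriving at R1 is flagged, while W2 passes. Only remote predecessors trigger the lookup, which is consistent with most instructions having empty PSets.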

Two Case Studies

• Case Study 1: An atomicity violation bug in MySQL
  – Avoided using stall
• Case Study 2: An order violation bug in Mozilla
  – Neither a data race nor an atomicity violation
  – Avoided using rollback and re-execution


An Atomicity Violation Bug in MySQL

Thread 1 (sql/log.cc):
    MYSQL_LOG::new_file() {
      …
      close();    // W1: …log_status = LOG_CLOSED;…
      open(…);    // W2: …log_status = LOG_OPEN;…
      …
    }

Thread 2 (sql/sql_insert.cc):
    mysql_insert(…) {
      …
      if (log_status != LOG_CLOSED) {    // R1
        // write into a log file
      }
      …
    }

Correct Interleaving #1: “frequent”, therefore likely to be tested

  Thread 2, R1: log_status != LOG_CLOSED ?
  Thread 1, W1: log_status = LOG_CLOSED
  Thread 1, W2: log_status = LOG_OPEN

PSet(W1) = {R1}, PSet(W2) = {}, PSet(R1) = {}

Correct Interleaving #2: “frequent”, therefore likely to be tested

  Thread 1, W1: log_status = LOG_CLOSED
  Thread 1, W2: log_status = LOG_OPEN
  Thread 2, R1: log_status != LOG_CLOSED ?

PSet(W1) = {R1}, PSet(W2) = {}, PSet(R1) = {W2}
(PSets accumulate across test runs: PSet(R1) grows from {} to {W2}.)

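Incorrect Interleaving: rare, and therefore likely to be untested

  Thread 1, W1: log_status = LOG_CLOSED
  Thread 2, R1: log_status != LOG_CLOSED ?
  Thread 1, W2: log_status = LOG_OPEN

Constraint violation at R1: W1 ∉ PSet(R1) = {W2}
Stalling R1 until W2 executes satisfies the constraint, since W2 ∈ PSet(R1)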
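Stall-based avoidance for this bug can be simulated in software. In this C++ sketch the names Op, Location, and run_with_stalls are hypothetical, all operations touch the single shared variable log_status, and the fallback (proceed anyway if every ready operation would violate) is an assumption made only so the model cannot deadlock. The scheduler proposes the untested order W1, R1, W2; stalling R1 produces W1, W2, R1, which is the tested Correct Interleaving #2.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: every operation accesses the one shared variable
// (log_status), so a single location history is enough.
struct Op {
    int thread;
    std::string instr;  // static memory operation, e.g. "W1"
    bool is_write;
};
using PSetMap = std::map<std::string, std::set<std::string>>;
using Pred = std::pair<int, std::string>;  // (thread, instr)

struct Location {
    bool written = false;
    Pred writer;                // last write
    std::vector<Pred> readers;  // reads since the last write

    // Immediate predecessors of `op` (assumed rule): the reads since the last
    // write for a write that follows reads, otherwise the last write.
    std::vector<Pred> preds(const Op& op) const {
        if (op.is_write && !readers.empty()) return readers;
        if (written) return {writer};
        return {};
    }

    void apply(const Op& op) {
        if (op.is_write) {
            written = true;
            writer = {op.thread, op.instr};
            readers.clear();
        } else {
            readers.emplace_back(op.thread, op.instr);
        }
    }
};

// True if some remote immediate predecessor of `op` is missing from PSet(op).
bool violates(const Location& loc, const Op& op, const PSetMap& psets) {
    auto it = psets.find(op.instr);
    for (const auto& [thread, instr] : loc.preds(op))
        if (thread != op.thread && (it == psets.end() || it->second.count(instr) == 0))
            return true;
    return false;
}

// Runs per-thread operation queues. At each step the scheduler prefers the
// thread listed in `preferred`; if that thread's next operation would violate
// its PSet, the thread stalls and another thread runs instead. Assumption: if
// every ready operation would violate, the first one proceeds anyway so the
// model cannot deadlock. Returns the order in which operations executed.
std::vector<std::string> run_with_stalls(std::map<int, std::deque<Op>> threads,
                                         const std::vector<int>& preferred,
                                         const PSetMap& psets) {
    Location loc;
    std::vector<std::string> executed;
    for (std::size_t step = 0;; ++step) {
        std::vector<int> candidates;
        if (step < preferred.size()) candidates.push_back(preferred[step]);
        for (const auto& entry : threads) candidates.push_back(entry.first);

        int chosen = -1, fallback = -1;
        for (int tid : candidates) {
            const std::deque<Op>& queue = threads[tid];
            if (queue.empty()) continue;
            if (fallback == -1) fallback = tid;
            if (!violates(loc, queue.front(), psets)) { chosen = tid; break; }
        }
        if (fallback == -1) break;            // every thread has finished
        if (chosen == -1) chosen = fallback;  // no safe choice (see assumption)

        Op op = threads[chosen].front();
        threads[chosen].pop_front();
        loc.apply(op);
        executed.push_back(op.instr);
    }
    return executed;
}
```

Stalling works here because the missing predecessor W2 has not executed yet; delaying R1 lets the tested dependence W2 → R1 form instead of the untested W1 → R1.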


Correct Test Run

Thread 1 (TimerThread.cpp):
    TimerThread::Run() {
      ...
      Lock(lock);
      mProcessing = TRUE;
      while (mProcessing) {
        ...
        mWaiting = TRUE;        // W
        Wait(cond, lock);
        mWaiting = FALSE;
      }
      Unlock(lock);
      ...
    }

Thread 2 (TimerThread.cpp):
    TimerThread::Shutdown() {
      ...
      Lock(lock);
      mProcessing = FALSE;
      if (mWaiting)             // R
        Notify(cond, lock);
      Unlock(lock);
      ...
      mThread->Join();
      return NS_OK;
    }

W (mWaiting = TRUE) executes before R (if (mWaiting) ?):
PSet(W) = {}, PSet(R) = {W}

Avoiding the Order Violation

Untested interleaving (same TimerThread.cpp code as in the correct test run):

  Thread 2, R: if (mWaiting) ?   (reads mWaiting before it is set, so Shutdown() skips Notify())
  Thread 1, W: mWaiting = TRUE

Constraint violation at W: R ∉ PSet(W) = {}
R has already executed, so stalling W cannot restore the tested order: roll back and re-execute so that R runs after W.
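The rollback case can be modeled in the same style. This C++ sketch is an assumption-laden illustration, not the hardware mechanism: Op, violating_thread, Result, and run_with_rollback are hypothetical names; there is one shared location (mWaiting); each thread's checkpoint is taken when the thread starts; and other threads that consumed rolled-back values are not undone (no cascading rollback). Real checkpoint schemes such as SafetyNet or ReVive work at the hardware level rather than on an operation log.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: every operation accesses the one shared variable
// (mWaiting), and each thread's checkpoint is taken when the thread starts.
struct Op {
    int thread;
    std::string instr;  // static memory operation, e.g. "W" or "R"
    bool is_write;
};
using PSetMap = std::map<std::string, std::set<std::string>>;

// Returns the thread of a remote immediate predecessor of `op` that is missing
// from PSet(op), or -1 if the constraint holds. Predecessors (assumed rule):
// the reads since the last write for a write that follows reads, otherwise
// the last write. The location history is rebuilt from the committed log.
int violating_thread(const std::vector<Op>& log, const Op& op, const PSetMap& psets) {
    const Op* last_write = nullptr;
    std::vector<const Op*> reads_since;
    for (const Op& e : log) {
        if (e.is_write) { last_write = &e; reads_since.clear(); }
        else reads_since.push_back(&e);
    }
    std::vector<const Op*> preds;
    if (op.is_write && !reads_since.empty()) preds = reads_since;
    else if (last_write != nullptr) preds.push_back(last_write);

    auto it = psets.find(op.instr);
    for (const Op* p : preds)
        if (p->thread != op.thread && (it == psets.end() || it->second.count(p->instr) == 0))
            return p->thread;
    return -1;
}

struct Result {
    std::vector<std::string> order;  // final committed execution order
    int rollbacks = 0;
};

// Runs per-thread queues, preferring the thread listed in `preferred` at each
// step. On a violation the predecessor has already executed, so that thread is
// rolled back to its checkpoint: its committed operations are discarded and
// re-queued to run later. Rollbacks are capped so the model always terminates.
Result run_with_rollback(std::map<int, std::deque<Op>> threads,
                         const std::vector<int>& preferred, const PSetMap& psets) {
    constexpr int kMaxRollbacks = 16;
    Result result;
    std::vector<Op> log;  // committed operations in execution order
    for (std::size_t step = 0;; ++step) {
        int tid = -1;
        if (step < preferred.size() && !threads[preferred[step]].empty())
            tid = preferred[step];
        for (auto it = threads.begin(); tid == -1 && it != threads.end(); ++it)
            if (!it->second.empty()) tid = it->first;
        if (tid == -1) break;  // every thread has finished

        Op op = threads[tid].front();
        threads[tid].pop_front();
        int victim = violating_thread(log, op, psets);
        if (victim != -1 && result.rollbacks < kMaxRollbacks) {
            std::vector<Op> kept;
            std::deque<Op> redo;
            for (const Op& e : log) {
                if (e.thread == victim) redo.push_back(e);
                else kept.push_back(e);
            }
            std::deque<Op>& queue = threads[victim];
            queue.insert(queue.begin(), redo.begin(), redo.end());
            log = std::move(kept);
            ++result.rollbacks;
        }
        log.push_back(op);
    }
    for (const Op& e : log) result.order.push_back(e.instr);
    return result;
}
```

With the PSets from the correct run, the proposed order R, W commits as W, R after one rollback of Thread 2. Unlike the MySQL case, the offending predecessor has already executed, which is why re-execution rather than stalling is required.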

Methodology

• Pin-based analysis
• 17 documented bugs analyzed
  – MySQL, Apache, Mozilla, pbzip, aget, pfscan
  – Plus Parsec and Splash for the performance study
• Applications tested using regression test suites when available, otherwise random test input

PSet Constraints from Test Runs

• Concurrent workload
  – MySQL: run the regression test suite in parallel with OSDB
  – FFT, pbzip2: random test input

Bug Avoidance Capability

• 17 bugs from MySQL, Apache, Mozilla, pbzip, aget, pfscan
• 15/17 bugs avoided by enforcing PSet constraints
  – Including a bug that is neither a data race nor an atomicity violation bug
• 2/17 false negatives
  – A multi-variable atomicity violation
  – A context-sensitive deadlock bug
• 6 bugs are avoided using the stalling mechanism; the others require the rollback mechanism

PSet Violations in Bug-Free Executions

• 2 PSet constraint violations in MySQL were not avoided
  – In MySQL, bmove512 unrolls a loop 128 times

PSet Sizes of Instructions

• Over 95% of the instructions have PSets of size zero
• Less than 2% of static memory instructions have a PSet of size greater than two

Summary

• Multi-threaded programming is hard
  – The existing shared-memory programming model exposes too many legal interleavings to the runtime
  – Most interleavings remain untested in production code
• Interleaving constrained shared-memory multiprocessor
  – Avoids untested (rare) interleavings to avoid concurrency bugs
• Predecessor Set interleaving constraints
  – 15/17 concurrency bugs are avoidable
  – Acceptable performance and space overhead

Thanks

• Q & A

Memory Space Overhead

Program        App. Size   # PSet Pairs   Overhead w.r.t. App.
Pbzip2         39 KB       201            2.16%
Aget           90 KB       365            1.69%
Pfscan         17 KB       295            7.34%
Apache         2435 KB     4119           0.69%
MySQL          4284 KB     6604           0.64%
FFT            24 KB       158            2.74%
FMM            73 KB       1764           10.13%
LU             24 KB       244            4.31%
Radix          21 KB       255            5.00%
Blackscholes   54 KB       41             0.32%
Canneal        59 KB       752            5.24%

Space overhead: in the worst case (FMM), a 10% code size increase
