DETECTION OF POTENTIAL DEADLOCKS AND DATARACES ROZA GHAMARI Bogazici University March 2009


Outline

- Introduction
- Synchronization Methods
- Detection of Potential Deadlocks
  - Due to Locks
  - Due to Semaphores
  - Due to Condition Variables
- Detection of Dataraces
  - Overview
  - Formal Definitions
  - The Lockset Algorithm
  - Model Checking Algorithm
  - Static Datarace Detection Using Lockset Information
  - Prototype Implementation
- References


Introduction

Common synchronization errors in multithreaded programs:
- Data races
- Atomicity violations
- Deadlocks

These errors are scheduling dependent and manifest only on rare executions, so algorithms that detect potential errors are valuable.


Introduction (Cont.)

Deadlock: some threads are blocked, each trying to acquire a lock held by another thread.

thread1 {
    lock(A);
    lock(B);
    unlock(B);
    unlock(A);
}

thread2 {
    lock(B);
    lock(A);
    unlock(A);
    unlock(B);
}


Introduction (Cont.)

Data race: two threads concurrently access a shared variable, and at least one of the accesses is a write.

TicketPurchase(NumOfTickets) {
    if (NumOfTickets <= FreeTickets)
        FreeTickets -= NumOfTickets
    else
        Print "Full";
}

Interleaving of Thread I, running TicketPurchase(2), and Thread II, running TicketPurchase(4):

{FreeTickets = 4}
1. Thread I:  if (NumOfTickets <= FreeTickets)    (2 <= 4 holds)
2. Thread II: if (NumOfTickets <= FreeTickets)    (4 <= 4 holds)
3. Thread I:  FreeTickets -= NumOfTickets
4. Thread II: FreeTickets -= NumOfTickets
{FreeTickets = -2}
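The interleaving can be replayed deterministically to see why the final count goes negative. The following is a minimal sketch, not code from the original slides: the class `TicketRaceReplay` and its method names are hypothetical, and the `synchronized` method is just one common way to make the check and the update a single atomic step.

```java
class TicketRaceReplay {
    static int freeTickets;

    /**
     * Replays the racy schedule by hand: both threads evaluate the check
     * before either of them performs the update.
     */
    static int replayRacyInterleaving() {
        freeTickets = 4;
        boolean threadIPasses = 2 <= freeTickets;   // Thread I checks
        boolean threadIIPasses = 4 <= freeTickets;  // Thread II checks
        if (threadIPasses) {
            freeTickets -= 2;                       // Thread I updates
        }
        if (threadIIPasses) {
            freeTickets -= 4;                       // Thread II updates
        }
        return freeTickets;
    }

    /**
     * Assumed fix: holding one lock across the check and the update
     * prevents another thread from interleaving between them.
     */
    static synchronized boolean ticketPurchase(int numOfTickets) {
        if (numOfTickets <= freeTickets) {
            freeTickets -= numOfTickets;
            return true;
        }
        System.out.println("Full");
        return false;
    }
}
```

With the synchronized version, a purchase of 2 followed by a purchase of 4 on 4 free tickets rejects the second request instead of overselling.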


Introduction (Cont.)

Definitions
- A trace tr is a sequence of events in a given execution.
- A feasible permutation of a trace is a trace that is consistent with the original order of events from each thread and with the constraints imposed by synchronization events.
  - Constraint imposed by locks: no lock is held by multiple threads at the same time.
  - Constraints imposed by other synchronization mechanisms: happens-before orderings.


Synchronization Mechanisms

- Locks: lock acquire and lock release operations
- Semaphores: up and down operations
- Condition variables: wait and notify/notifyAll operations


Detection of Potential Deadlocks

Due to Locks

- Event e is called a blocking event if e is an acquire of a lock l by thread t, and l is held by another thread when e occurs.
- Run-time detection of potential deadlocks in programs that use locks: the GoodLock algorithm (by Havelund)
  - Detects only potential deadlocks involving two threads
  - Later generalized to multiple threads for programs that use block-structured locking (e.g. Java)
  - Later extended and optimized to handle non-block-structured locking


Detection of Potential Deadlocks (Cont.)

Due to Locks (Cont.)

An execution trace has potential for deadlock due to a synchronization mechanism if some feasible permutation of the trace, restricted to operations on the specified synchronization mechanism and operations on threads, deadlocks.

A program has potential for deadlock due to locks ignoring gate locks (PDL-IGL) if there exist distinct threads t_0, ..., t_{m-1} and locks l_0, ..., l_{m-1} in the given trace tr such that, for all i = 0, ..., m-1, t_i holds lock l_i while acquiring lock l_{(i+1) mod m}.

This condition ignores the effect of gate locks.
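The PDL-IGL condition can be checked directly on the recorded "holds l while acquiring l'" observations. The sketch below is an illustration under stated assumptions, not the GoodLock algorithm or the lock-graph search from the cited work: `LockOrderChecker`, `Observation`, and the method names are hypothetical, threads and locks are identified by strings, and the search is a plain exhaustive chain extension that is only suitable for small traces.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LockOrderChecker {
    /** One observation: `thread` held lock `held` while acquiring lock `acquired`. */
    static final class Observation {
        final String thread;
        final String held;
        final String acquired;

        Observation(String thread, String held, String acquired) {
            this.thread = thread;
            this.held = held;
            this.acquired = acquired;
        }
    }

    private final List<Observation> observations = new ArrayList<>();

    void record(String thread, String held, String acquired) {
        if (!held.equals(acquired)) {            // ignore re-entrant acquires
            observations.add(new Observation(thread, held, acquired));
        }
    }

    /** True if some chain of observations from distinct threads closes into a cycle. */
    boolean hasPotentialDeadlock() {
        for (Observation start : observations) {
            Set<String> usedThreads = new HashSet<>();
            usedThreads.add(start.thread);
            if (extend(start, start.acquired, usedThreads)) {
                return true;
            }
        }
        return false;
    }

    // Extends the chain from the lock acquired last; stops when it returns to the
    // lock held in the first observation. Each thread is used at most once.
    private boolean extend(Observation start, String current, Set<String> usedThreads) {
        if (current.equals(start.held)) {
            return true;
        }
        for (Observation o : observations) {
            if (o.held.equals(current) && !usedThreads.contains(o.thread)) {
                usedThreads.add(o.thread);
                if (extend(start, o.acquired, usedThreads)) {
                    return true;
                }
                usedThreads.remove(o.thread);
            }
        }
        return false;
    }
}
```

For the two-thread example from the introduction, recording `("thread1", "A", "B")` and `("thread2", "B", "A")` yields a cycle, whereas the same two observations from a single thread do not. Like PDL-IGL itself, the sketch ignores gate locks.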


Detection of Potential Deadlocks (Cont.)

Due to Locks (Cont.)

- A run-time lock tree represents the nested pattern in which locks are acquired and released by a thread.
- To handle general locking, keep track of which locks are held by a thread when it acquires another lock:
  - The root has one child for each lock acquired by the thread.
  - A node labeled l' has a child labeled l iff the thread acquired l while holding l'.

Example trace:
Thread 1: acquire(l4); acquire(l3); acquire(l1); release(l4); release(l1); release(l3);

[Figure: run-time lock tree for Thread1, with nodes labeled L4, L3, L3, L1, L1]


Detection of Potential Deadlocks (Cont.)

Due to Locks (Cont.)

Run-time lock graph G(V, E)
- V: all the nodes of all the run-time lock trees
- E: the set of directed edges
  1. Tree edges: the directed (from parent to child) edges in each of the run-time lock trees
  2. Inter edges: bidirectional edges between nodes that are labeled with the same lock and that are in different run-time lock trees


Detection of Potential Deadlocks (Cont.)

Due to Locks (Cont.)

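A valid cycle
- does not contain consecutive inter edges, and
- contains the nodes from each thread as at most one consecutive subsequence.

[Figure: run-time lock trees. Thread1: L1, L2, L3, L4. Thread2: L2, L3. Thread3: L4, L1. Thread4 (shown as a second Thread3 in the source): L4, L3.]

Example cycles, writing each node as its lock followed by its thread:
- L1T1 -> L2T1 -> L2T2 -> L3T2 -> L3T1 -> L4T1 -> L4T3 -> L1T3 -> L1T1: invalid (the nodes of Thread1 form more than one consecutive subsequence)
- L3T1 -> L3T2 -> L3T4 -> L3T1: invalid (consecutive inter edges)
- L3T1 -> L4T1 -> L4T4 -> L3T4 -> L3T1: valid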


Detection of Potential Deadlocks (Cont.)

Due to Locks (Cont.)

- Modified depth-first search algorithm
  - Traverses only valid paths, because it extends the current path (on the search stack) only with edges satisfying both criteria for validity.
  - A node all of whose neighbors have been explored may be explored multiple times (along incoming inter edges).
- PDL-IGL holds iff the run-time lock graph G contains a valid cycle.


Detection of Potential Deadlocks (Cont.)

Due to Locks (Cont.)

- The algorithm does not consider gate locks, so it produces false alarms whenever some common lock acquired by at least two threads prevents the deadlock.
- Checking the Potential for Deadlocks from Locks (PDL) condition
  - Requires checking whether the intersection of every set of locks for e1 with every set of locks for e2 is non-empty.
  - This makes the algorithm more expensive.


Detection of Potential Deadlocks (Cont.)

Due to Semaphores

An execution trace has potential for deadlock due to semaphores if some feasible permutation of the trace, restricted to operations on semaphores and operations on threads, deadlocks.

Semaphores are used in two ways:
- Mutual exclusion: analyzed exactly as if they were locks, with down treated as acquire and up treated as release
- Condition synchronization: analyzed over all feasible permutations allowed by the ordering constraints, tracking the values of the semaphore


Detection of Potential Deadlocks (Cont.)

Due to Semaphores (Cont.)

- Happens-before is a partial order on the events in an execution.
- If event e1 happens-before event e2, then e1 must occur before e2 in all feasible permutations of the trace.
- succ(e) is the event immediately following e on the same thread.

[Figure: threads t1 and t2 with up and down operations on semaphores, showing events o and m and the event succ(o)]


Detection of Potential Deadlocks (Cont.)

Due to Semaphores (Cont.)

Cigarette Smokers Problem

Initially, tobacco = 0, paper = 0, matches = 0, order = 1

smoker 1
---------
while (1) {
    tobacco.down()
    paper.down()
    order.up()
}

smoker 2
---------
while (1) {
    paper.down()
    matches.down()
    order.up()
}

smoker 3
---------
while (1) {
    matches.down()
    tobacco.down()
    order.up()
}

agent
---------
while (1) {
    order.down()
    up on one of tobacco, paper, matches at random
    up on one of the three at random, but not the one chosen above
}


Detection of Potential Deadlocks (Cont.)

Due to Semaphores (Cont.)

- Partial order for a permutation of the Cigarette Smokers problem
- Happens-before ordering: no deadlock

Initial values: order o = 1, tobacco t = 0, paper p = 0, matches m = 0

[Figure: happens-before graph over the up and down operations of Agent, Smoker1, and Smoker2 on semaphores o, t, p, and m]


Detection of Potential Deadlocks (Cont.)

Due to Semaphores (Cont.)

A permutation of the problem that has potential for deadlock

Initial values: order o = 1, tobacco t = 0, paper p = 0, matches m = 0

[Figure: permutation over threads t1, t2, and t3 with operations on semaphores o, t, p, and m]


Detection of Potential Deadlocks (Cont.)

Due to Lost Notifies

Lost notifies can leave threads blocked in programs that use condition variables.

An execution trace tr has potential for lost notify if it contains a notify or notifyAll event e such that there is a feasible permutation of tr in which e wakes up fewer threads than it does in tr.


Detection of Potential Deadlocks (Cont.)

Due to Lost Notifies (Cont.)

For each notify or notifyAll event en and each thread t woken by en, there is a potential for lost notify if t's corresponding wait event ew does not happen-before en.

[Figure: one thread's events acquire, wait, release, acquire; the other thread's events acquire, notify, release]
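The check above reduces to a reachability question in the happens-before relation. The following is a minimal sketch under assumptions: events are identified by strings, the happens-before edges are supplied by the caller, and `LostNotifyChecker` with its method names is hypothetical rather than part of the cited tool.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LostNotifyChecker {
    // Direct happens-before edges: event -> events that must come later.
    private final Map<String, Set<String>> successors = new HashMap<>();

    void addHappensBefore(String earlier, String later) {
        successors.computeIfAbsent(earlier, k -> new HashSet<>()).add(later);
    }

    /** True if `to` is reachable from `from` through one or more happens-before edges. */
    boolean happensBefore(String from, String to) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        stack.push(from);
        while (!stack.isEmpty()) {
            String event = stack.pop();
            for (String next : successors.getOrDefault(event, Collections.<String>emptySet())) {
                if (next.equals(to)) {
                    return true;
                }
                if (visited.add(next)) {
                    stack.push(next);
                }
            }
        }
        return false;
    }

    /** The wait may miss the notify unless the wait is ordered before it. */
    boolean hasPotentialLostNotify(String waitEvent, String notifyEvent) {
        return !happensBefore(waitEvent, notifyEvent);
    }
}
```

If no chain of ordering constraints forces the wait before the notify, some feasible permutation can run the notify first, which is the situation the slide flags.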


Detection of Potential Deadlocks (Cont.)

Due to Lost Notifies (Cont.)

Happens-Before Ordering due to Lost Notifies

For each notify or notifyAll event en and each wait event ew that is notified by en, en happens-before succ(ew), where succ(e) is the event immediately after e on the same thread.

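[Figure: one thread's events acquire, wait, release, acquire; the other thread's events acquire, notify, release; with the happens-before edge from notify to succ(wait)]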


Detection of Potential Deadlocks (Cont.)

Due to Locks, Condition Variables, and Semaphores

1. Determining semaphores used for mutual exclusion

2. Considering all ordering and lock constraints

3. Checking each feasible permutation of the trace for deadlock (a naive algorithm)


Detection of Potential Deadlocks (Cont.)

Due to Locks, Condition Variables, and Semaphores (Cont.)

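Example

// sem is a semaphore shared by the threads; compute() stands for arbitrary work.
public synchronized void doWait(Object ob) {
    compute();
    try {
        synchronized (ob) { ob.wait(); }
    } catch (InterruptedException e) {
        // interruption is ignored in this example
    }
}

public void doNotify(Object ob) {
    sem.down();
    synchronized (ob) { ob.notify(); }
}

public synchronized void doCompute() {
    compute();
    sem.up();
}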


Detection of Potential Deadlocks (Cont.)

Due to Locks, Condition Variables, and Semaphores (Cont.)

Trace without deadlocks

[Figure: trace of threads t1, t2, and t3 showing acq and rel events on locks a and b, wait and notify, and two operations on semaphore s]


Detection of Potential Deadlocks (Cont.)

Due to Locks, Condition Variables, and Semaphores (Cont.)

Happens-before ordering for the trace

[Figure: the same trace of threads t1, t2, and t3, with happens-before edges among the acq and rel events on locks a and b, wait and notify, and the operations on semaphore s]


Detection of Potential Deadlocks (Cont.)

Due to Locks, Condition Variables, and Semaphores (Cont.)

Feasible permutation with deadlock

[Figure: permutation over threads t1, t2, and t3 showing the events acq(a), acq(b), wait, an operation on semaphore s, and acq(a)]


Detection of Potential Deadlocks (Cont.)

Due to Locks, Condition Variables, and Semaphores (Cont.)

Feasible permutation with lost notifies

[Figure: permutation over threads t1, t2, and t3 showing acq and rel events on locks a and b, wait and notify, and two operations on semaphore s]


Datarace Detection

- Static datarace detection tools
  - Examples: RacerX [Engler and Ashcraft], TVLA [Sagiv et al.]
  - Check whether a program is datarace free
  - Not applicable to large and complicated programs without producing spurious dataraces


Datarace Detection (Cont.)

- Dynamic datarace detection tools
  - Examples: techniques based on Lamport's happens-before partial order (Djit), lock-based techniques (Lockset)
  - More precise than static tools, but still produce spurious dataraces
  - Report errors only for dataraces in the current interleaving


Overview

- Lockset tool: based on the assumption that well-behaved programs preserve a locking discipline
  - Discipline: for every shared memory location there exists a lock such that all accesses to this location are guarded by this lock
- Strength: predicts dataraces in rare execution paths rather than only finding errors in the current execution
- Weaknesses:
  - A violation of the locking discipline does not guarantee the existence of a datarace
  - Cannot provide a witness for a datarace

Overview (Cont.)

- Model checking: a technique for verifying finite state machines
  - Searches exhaustively for dataraces and reveals even those that occur in rarely executed paths
  - Limited applicability, because the large number of thread interleavings in realistic multithreaded programs causes state space explosion


Overview (Cont.)

Hybrid solution: combine model checking and Lockset

1. Run the Lockset algorithm to produce violations of the locking discipline together with the executed trace.
2. Use a model checker to construct a witness trace sharing a prefix with the actual trace executed by Lockset.


Formal Definitions

- Σ0: the set of a program's initial states
- σ, σ´: global program states
- ac: an action
- (σ, ac, σ´) ∈ R: the multithreaded program can step from σ to σ´ by performing the action ac
- A trace π is a program trace if the first state in π is in Σ0


Formal Definitions (Cont.)

A datarace in a multithreaded program occurs if there exists a reachable global state σ and two access events, a1 and a2, performed by different threads, such that the following conditions are met:

1. a1 and a2 access the same memory location m
2. at least one of a1 or a2 is a write operation
3. at least one of a1 or a2 is not a protected access event
4. a1 and a2 are enabled at σ
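The four conditions translate into a simple predicate over a pair of access events. This is a minimal sketch for illustration only: the class `DataraceCondition`, the `Access` fields, and the idea of passing "enabled at σ" as a precomputed flag are assumptions, since deciding enabledness requires the program model.

```java
class DataraceCondition {
    /** A monitored access event; all field names are illustrative. */
    static final class Access {
        final String thread;
        final String location;
        final boolean isWrite;
        final boolean isProtected;
        final boolean enabledAtState;   // assumed to be computed elsewhere for the state sigma

        Access(String thread, String location, boolean isWrite,
               boolean isProtected, boolean enabledAtState) {
            this.thread = thread;
            this.location = location;
            this.isWrite = isWrite;
            this.isProtected = isProtected;
            this.enabledAtState = enabledAtState;
        }
    }

    /** Checks the definition for one pair of accesses at one global state. */
    static boolean isDatarace(Access a1, Access a2) {
        return !a1.thread.equals(a2.thread)              // performed by different threads
            && a1.location.equals(a2.location)           // 1. same memory location
            && (a1.isWrite || a2.isWrite)                // 2. at least one write
            && (!a1.isProtected || !a2.isProtected)      // 3. at least one unprotected access
            && a1.enabledAtState && a2.enabledAtState;   // 4. both enabled at sigma
    }
}
```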


The Lockset Algorithm

- Checks whether a program adheres to the locking discipline by monitoring all reads and writes as the program executes
- Infers the protection relation from the execution history
- Set C(m) of candidate locks for m: a lock l is in C(m) (at a certain point in time) if, during the computation up to this point, every thread that accessed m was holding l at the moment of access


The Lockset Algorithm (Cont.)

Information available on every monitored access event a:
- The program counter of each thread
- m_a, the shared memory location accessed
- t_a, the thread that performs a
- τ_a, the type of access a (Read or Write)
- ψ_a, whether access a is protected (True or False)
- locks_a, the locks that t_a holds when a is being executed
- The global program state σ_a, which includes the values of local and global variables as well as the content of the heap


The Lockset Algorithm (Cont.)

Pseudo code

Initialization:
    For each shared memory location m:
        C(m) = Ω    (the set of all locks)

Monitor:
    On access event a:
        C(m_a) = C(m_a) ∩ locks_a
        if C(m_a) = Ø then display a warning
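The pseudo code maps onto a small monitor class. The sketch below is one possible rendering, not the IBM tool: `LocksetMonitor` and its method names are hypothetical, locations and locks are identified by strings, and Ω is represented implicitly by the absence of an entry, so the first access initializes C(m) to the locks held at that access (Ω ∩ locks_a = locks_a).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LocksetMonitor {
    // C(m) for every location accessed so far; a missing key stands for the full set of locks.
    private final Map<String, Set<String>> candidates = new HashMap<>();

    /**
     * Refines C(m_a) with the locks held at this access.
     * Returns true if C(m_a) became empty, i.e. a warning should be displayed.
     */
    boolean onAccess(String location, Set<String> locksHeld) {
        Set<String> c = candidates.get(location);
        if (c == null) {
            c = new HashSet<>(locksHeld);   // first access: all-locks set intersected with locks_a
            candidates.put(location, c);
        } else {
            c.retainAll(locksHeld);         // C(m_a) = C(m_a) intersect locks_a
        }
        return c.isEmpty();
    }

    /** Current candidate set; only meaningful after the first access to the location. */
    Set<String> candidateLocks(String location) {
        Set<String> c = candidates.get(location);
        return c == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(c);
    }
}
```

For example, accesses to `x` under locks {l1, l2}, then {l1}, then {l2} shrink C(x) to {l1, l2}, {l1}, and finally the empty set, at which point the third call reports a violation of the locking discipline.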


Model Checking Algorithm

- Performs a breadth-first search starting from the initial states of the model (M.Σ0) until a bug is found
- Defines two auxiliary sets:
  - Seen contains all the states that were visited in the computation so far
  - Frontier stores the states that were discovered in the last forward step
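The role of the two sets can be illustrated with a generic breadth-first reachability search. This is a sketch under assumptions and not the pseudo code of the cited model checker: `BfsModelChecker`, the `successors` function, and the `isBug` predicate are hypothetical stand-ins for the model's transition relation and the property being checked.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

class BfsModelChecker {
    /**
     * Explores the state space level by level from the initial states.
     * Returns a buggy state if one is reachable, otherwise an empty Optional.
     */
    static <S> Optional<S> findBug(Set<S> initialStates,
                                   Function<S, Set<S>> successors,
                                   Predicate<S> isBug) {
        Set<S> seen = new HashSet<>(initialStates);       // all states visited so far
        Set<S> frontier = new HashSet<>(initialStates);   // states found in the last step
        while (!frontier.isEmpty()) {
            for (S state : frontier) {
                if (isBug.test(state)) {
                    return Optional.of(state);
                }
            }
            Set<S> next = new HashSet<>();
            for (S state : frontier) {
                for (S succ : successors.apply(state)) {
                    if (seen.add(succ)) {                 // add() is false for already-seen states
                        next.add(succ);
                    }
                }
            }
            frontier = next;                              // one forward step completed
        }
        return Optional.empty();
    }
}
```

Because every state enters Seen at most once, the search terminates on any finite model; the state space explosion mentioned in the overview shows up as the size of Seen.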


Model Checking Algorithm (Cont.)

Pseudo code


Static Datarace Detection Using Lockset Information

- Phase 1: finding a prefix for a witness, using a backward scan on the access events gathered by Lockset, starting from the violation location
- Phase 2: constructing witnesses using a model checker
  - 2.1: Constructing a model
  - 2.2: Using a model checker


Prototype Implementation

- A prototype tool based on IBM tools
- Operates in 6 stages
- Lockset: the IBM Watson tool
- Wolf: IBM Haifa's software model checker


References

R. Agarwal and S. D. Stoller, “Run-Time Detection of Potential Deadlocks for Programs with Locks, Semaphores, and Condition Variables,” Proc. Workshop on Parallel and Distributed Systems: Testing and Debugging (PADTAD), 2006.

O. Shacham, M. Sagiv, and A. Schuster, “Scaling Model Checking of Dataraces Using Dynamic Information,” Proc. 10th ACM Symp. Principles and Practice of Parallel Programming (PPoPP), pp. 107-118, 2005.