On-the-Fly Data-Race Detection in Multithreaded Programs
Prepared by Eli Pozniansky under Supervision of Prof. Assaf Schuster
2
Table of Contents
What is a Data-Race?
Why are Data-Races Undesired?
How Can Data-Races be Prevented?
Can Data-Races be Easily Detected?
Feasible and Apparent Data-Races
Complexity of Data-Race Detection:
  NP and Co-NP
  Program Execution Model & Ordering Relations
  Complexity of Computing Ordering Relations
  Proof of NP/Co-NP Hardness
3
Table of Contents – Cont.

So How Can Data-Races be Detected?
  Lamport's Happens-Before Approximation
Approaches to Detection of Apparent Data-Races:
  Static Methods
  Dynamic Methods:
    Post-Mortem Methods
    On-The-Fly Methods
4
Table of Contents – Cont.

Closer Look at Dynamic Methods: DJIT+
  Local Time Frames
  Vector Time Frames
  Logging Mechanism
  Data-Race Detection Using Vector Time Frames
  Which Accesses to Check?
  Which Time Frames to Check?
  Access History & Algorithm
  Coherency
  Results
5
Table of Contents – Cont.

Lockset
  Locking Discipline
  The Basic Algorithm & Explanation
  Which Accesses to Check?
  Improving the Locking Discipline:
    Initialization
    Read-Sharing
    Barriers
  False Alarms
  Results
Combining DJIT+ and Lockset
Summary
References
6
What is a Data Race?
Concurrent accesses to a shared location by two or more threads, where at least one is for writing
Example (variable X is global and shared):
Thread 1        Thread 2
X=1             T=Y
Z=2             T=X

Usually indicative of a bug!
7
Why are Data-Races Undesired?

Programs with data-races:
  usually demonstrate unexpected and even non-deterministic behavior;
  the outcome might depend on the specific execution order (a.k.a. the threads' interleaving);
  re-executing may not always produce the same results, or expose the same data-races.
Thus, programs with data-races are hard to debug, and it is hard to write correct programs in their presence.
8
Why are Data Races Undesired? – Example

Machine code for 'X++':
  reg ← X
  incr reg
  X ← reg

First interleaving:
  Thread 1          Thread 2
  1. reg1 ← X
  2. incr reg1
  3. X ← reg1
                    4. reg2 ← X
                    5. incr reg2
                    6. X ← reg2

Second interleaving:
  Thread 1          Thread 2
  1. reg1 ← X
  2. incr reg1
                    3. reg2 ← X
                    4. incr reg2
                    5. X ← reg2
  6. X ← reg1

At the beginning: X=0. At the end: X=1 or X=2? It depends on the scheduling order.
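The two interleavings above can be replayed deterministically. The following is a minimal sketch (not part of the original slides; the step names and trace format are illustrative) that simulates the three-instruction machine code for 'X++' under both schedules:

```python
# Replaying the two interleavings of `X++` from the slide above.

def run(schedule):
    """Execute `X++` once per thread, in the order given by `schedule`.
    `schedule` lists (thread_id, step) pairs; steps are load/incr/store."""
    X = 0
    reg = {1: None, 2: None}          # one private register per thread
    for tid, step in schedule:
        if step == "load":            # reg <- X
            reg[tid] = X
        elif step == "incr":          # incr reg
            reg[tid] += 1
        elif step == "store":         # X <- reg
            X = reg[tid]
    return X

# First interleaving: thread 1 finishes before thread 2 starts.
serial = [(1, "load"), (1, "incr"), (1, "store"),
          (2, "load"), (2, "incr"), (2, "store")]

# Second interleaving: both threads load X=0 before either stores.
racy = [(1, "load"), (1, "incr"), (2, "load"),
        (2, "incr"), (2, "store"), (1, "store")]

print(run(serial))  # 2
print(run(racy))    # 1 -- one increment is lost
```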
9
Execution Order
Each thread has a different execution speed, and the speed may change over time.
For an external observer of the time axis, instructions appear in execution order.
Any such order is legal.
The execution order of a single thread is called its program order.
10
How Can Data Races be Prevented?

Explicit synchronization between threads: locks, critical sections, barriers, mutexes, semaphores, monitors, events, etc.

Thread 1          Thread 2
Lock( m )
X++
Unlock( m )
                  Lock( m )
                  T=X
                  Unlock( m )
11
Synchronization – "Bad" Bank Account Example

Thread 1                      Thread 2
Deposit( amount ) {           Withdraw( amount ) {
  balance += amount;            if (balance < amount)
}                                 print( "Error" );
                                else
                                  balance -= amount;
                              }

'Deposit' and 'Withdraw' are not "atomic"!!!
What is the final balance after a series of concurrent deposits and withdrawals?
12
Synchronization – "Good" Bank Account Example

Thread 1                      Thread 2
Deposit( amount ) {           Withdraw( amount ) {
  Lock( m );                    Lock( m );
  balance += amount;            if (balance < amount)
  Unlock( m );                    print( "Error" );
}                               else
                                  balance -= amount;
                                Unlock( m );
                              }

The locked regions are critical sections. Since critical sections can never execute concurrently, this version exhibits no data-races.
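The "good" version above can be sketched in Python, where a `with lock:` block plays the role of Lock(m)/Unlock(m). The `Account` class and the thread counts are illustrative, not from the slides:

```python
import threading

class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.m = threading.Lock()     # plays the role of lock m

    def deposit(self, amount):
        with self.m:                  # Lock(m) ... Unlock(m)
            self.balance += amount

    def withdraw(self, amount):
        with self.m:
            if self.balance < amount:
                return False          # the "Error" branch in the slide
            self.balance -= amount
            return True

acct = Account()
threads = [threading.Thread(target=acct.deposit, args=(10,)) for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
print(acct.balance)  # 1000 -- no lost updates, regardless of scheduling
```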
13
Is This Enough?
Theoretically – YES. Practically – NO.
What if the programmer accidentally forgets to place the correct synchronization?
How can all such data-race bugs be detected in a large program?
How can redundant synchronization be eliminated?
14
Can Data Races be Easily Detected? – No!

The problem of deciding whether a given program contains potential data races (called feasible data races) is NP-hard [Netzer & Miller 1990]:
  the input size = # of instructions performed;
  even for only 2 threads;
  even with no loops/recursion.
There are lots of execution orders: (#threads)^(thread_length · #threads).
In addition, all possible inputs should be tested, and side effects of the detection code itself can eliminate all data races:

Thread 1          Thread 2
a
lock( m )
...
unlock( m )
                  lock( m )
                  b
                  unlock( m )
15
Feasible Data-Races
Feasible data-races are based on the possible behavior of the program (i.e., the semantics of the program's computation).
They are the actual (!) data-races that can possibly happen in some program execution.
Detecting them requires a full analysis of the program's semantics, to determine whether the execution could have allowed two accesses to the same shared variable to execute concurrently.
16
Apparent Data Races
Apparent data-races are approximations of the feasible data-races.
They are based only on the behavior of the program's explicit synchronization (and not on the program's semantics).
They are important, since data-races are usually the result of improper synchronization.
They are easier to locate, but less accurate.
They exist iff at least one feasible data-race exists.
Exhaustively locating all apparent data-races is still NP-hard (and, in fact, undecidable).
17
Apparent Data-Races Cont.
Accesses a and b to the same shared variable in some execution are ordered if there is a chain of corresponding explicit synchronization events between them.
a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so.

Example (initially: grades = oldDatabase; updated = false):

Thread T.A.                     Thread Lecturer
grades := newDatabase;
updated := true;                while (updated == false);
                                X := grades.gradeOf(lecturersSon);
18
Feasible vs. Apparent
[Initially F=false]
Thread 1          Thread 2
X++               if (F==true)
F=true              X--

Race 1 – the pair of accesses to 'F'; race 2 – the pair of accesses to 'X'.
Apparent data-races in the execution above: 1 & 2.
Feasible data-races: 1 only!!! No feasible execution exists in which 'X--' is performed before 'X++' (given that 'F' is false at the start).
Protecting 'F' alone will therefore protect 'X' as well.
19
Feasible vs. Apparent
[Initially F=false]
Thread 1          Thread 2
X++               Lock( m )
Lock( m )         T = F
F=true            Unlock( m )
Unlock( m )       if (T==true)
                    X--

No feasible or apparent data-races exist under any execution order!!!
'F' is protected by a lock, and 'X++' and 'X--' are always ordered and properly synchronized: either there is a sync' chain of Unlock(m)-Lock(m) between 'X++' and 'X--', or only 'X++' executes.
20
Complexity of Data-Race Detection

Exactly locating the feasible data-races is an NP-hard problem.
The apparent races, which are simpler to locate, must be detected for debugging.
Apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution.
The problem of exhaustively locating all apparent data-races is still NP-hard.
21
Reminder: NP and Co-NP
NP is a set of problems for which:
  no polynomial-time solution is known;
  an exponential-time solution exists.
A problem is NP-hard if there is a polynomial reduction from every problem in NP to this problem.
A problem is NP-complete if it is NP-hard and it resides in NP.
Intuitively – for a 'yes'/'no' problem in NP, we can either answer 'yes' and stop, or never stop (at least not within polynomial time).
22
Reminder: NP and Co-NP Cont.
The set of Co-NP problems is complementary to the set of NP problems.
A problem is Co-NP-hard if there is a polynomial reduction from every problem in Co-NP to it; intuitively, we can only answer 'no'.
A problem that is both in NP and in Co-NP is not necessarily in P, though such problems are believed to be easier than NP-complete ones.
The problem of checking whether a boolean formula is satisfiable is NP-complete: answer 'yes' if a satisfying assignment for the variables was found.
The complementary problem – checking that the formula is not satisfiable – is Co-NP-complete.
23
Why is Data-Race Detection NP-Hard?

Question: How can we know that, in a program P, two accesses a and b to the same shared variable are concurrent?
Answer: We must check all execution orders of P and see.
If we discover an execution order in which a and b are concurrent, we can report a data-race and stop.
Otherwise, we should continue checking.
24
Program Execution Model
Consider a class of multi-threaded programs that synchronize by counting semaphores.
Program execution is described by a collection of events and two relations over the events.
Synchronization event – an instance of some synchronization operation (e.g., signal, wait).
Computation event – an instance of a group of statements in the same thread, none of which is a synchronization operation (e.g., x=x+1).
25
Program Execution Model – Events' Relations

Temporal ordering relation – a T→ b means that a completes before b begins (i.e., the last action of a can affect the first action of b).
Shared-data dependence relation – a D→ b means that a accesses a shared variable that b later accesses, and at least one of the two accesses modifies the variable. It indicates when one event causally affects another.
26
Program Execution Model – Program Execution

Program execution P – a triple <E,T→,D→>, where E is a finite set of events, and T→ and D→ are the above relations, satisfying the following axioms:
  A1: T→ is an irreflexive partial order (a T↛ a).
  A2: If a T→ b T↮ c T→ d, then a T→ d.
  A3: If a D→ b, then b T↛ a.
Notes:
  ↛ is shorthand for ¬(a→b); ↮ is shorthand for ¬(a→b)⋀¬(b→a).
  Notice that A1 and A2 imply the transitivity of T→.
27
Program Execution Model – Feasible Program Execution

A feasible program execution for P is an execution of the program that performs exactly the same events as P, but may exhibit a different temporal ordering.
Definition: P'=<E',T'→,D'→> is a feasible program execution for P=<E,T→,D→> (i.e., it potentially occurred) if:
  F1: E'=E (exactly the same events), and
  F2: P' satisfies the axioms A1–A3 of the model, and
  F3: a D→ b ⇒ a D'→ b (the same data dependencies).
Note: Any execution with the same shared-data dependencies as P will execute exactly the same events as P.
28
Program Execution Model – Ordering Relations

Given a program execution P=<E,T→,D→> and the set F(P) of feasible program executions for P, the following relations are defined. They summarize the temporal orderings present in the feasible program executions.

                     Must-Have                                     Could-Have
Happened-Before      a MHB→ b ⇔ ∀<E,T→,D→>∈F(P): a T→ b           a CHB→ b ⇔ ∃<E,T→,D→>∈F(P): a T→ b
Concurrent-With      a MCW↔ b ⇔ ∀<E,T→,D→>∈F(P): a T↮ b           a CCW↔ b ⇔ ∃<E,T→,D→>∈F(P): a T↮ b
Ordered-With         a MOW↔ b ⇔ ∀<E,T→,D→>∈F(P): ¬(a T↮ b)        a COW↔ b ⇔ ∃<E,T→,D→>∈F(P): ¬(a T↮ b)
29
Program Execution Model – Ordering Relations – Explanation

The must-have relations describe orderings that are guaranteed to be present in all feasible program executions in F(P).
The could-have relations describe orderings that could potentially occur in at least one of the feasible program executions in F(P).
The happened-before relations show events that execute in a specific order.
The concurrent-with relations show events that execute concurrently.
The ordered-with relations show events that execute in either order but not concurrently.
30
Complexity of Computing Ordering Relations
The problem of computing any of the must-have ordering relations (MHB, MCW, MOW) is Co-NP-hard.
The problem of computing any of the could-have relations (CHB, CCW, COW) is NP-hard.
Theorem 1: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a MHB→ b, a MCW↔ b or a MOW↔ b (any of the must-have orderings) is Co-NP-hard.
31
Proof of Theorem 1 – Notes

The proof is a reduction from 3CNFSAT, such that a given boolean formula is not satisfiable iff a MHB→ b for two events, a and b, defined in the reduction.
The problem of checking whether a 3CNFSAT formula is not satisfiable is Co-NP-complete.
The presented proof is only for the must-have-happened-before (MHB) relation; proofs for the other relations are analogous.
The proof can also be extended to programs that use binary semaphores, event-style synchronization and other synchronization primitives (and even a single counting semaphore).
32
Proof of Theorem 1 – 3CNFSAT

An instance of 3CNFSAT is given by:
  a set of n variables, V={X1,X2,…,Xn};
  a boolean formula B consisting of a conjunction of m clauses, B=C1⋀C2⋀…⋀Cm;
  each clause Cj=(L1⋁L2⋁L3) is a disjunction of three literals;
  each literal Lk is a variable from V or its negation: Lk=Xi or Lk=¬Xi.
Example:
  B=(X1⋁X2⋁¬X3)⋀(¬X2⋁¬X5⋁X6)⋀(X1⋁X4⋁¬X5)
33
Proof of Theorem 1 – Idea of the Proof
Given an instance of 3CNFSAT formula, B, we construct a program consisting of 3n+3m+2 threads which use 3n+m+1 semaphores (assumed to be initialized to 0).
The execution of this program simulates a nondeterministic evaluation of B.
Semaphores are used to represent the truth values of each variable and clause.
The execution exhibits certain orderings iff B is not satisfiable.
34
Proof of Theorem 1 – The Construction per Variable

For each variable Xi, the following three threads are constructed:

wait( Ai )            wait( Ai )                signal( Ai )
signal( Xi )          signal( not-Xi )          wait( Pass2 )
..                    ..                        signal( Ai )
signal( Xi )          signal( not-Xi )

".." indicates as many signal(Xi) (or signal(not-Xi)) operations as the number of occurrences of the literal Xi (or ¬Xi) in the formula B.
35
Proof of Theorem 1 – The Construction per Variable

The semaphores Xi and not-Xi are used to represent the truth value of variable Xi.
Signaling the semaphore Xi (or not-Xi) represents the assignment of True (or False) to variable Xi.
The assignment is accomplished by allowing either signal(Xi) or signal(not-Xi) to proceed, but not both (due to the competing wait(Ai) operations in the two leftmost threads).
36
Proof of Theorem 1 – The Construction per Clause

For each clause Cj, the following three threads are constructed:

wait( L1 )        wait( L2 )        wait( L3 )
signal( Cj )      signal( Cj )      signal( Cj )

L1, L2 and L3 are the semaphores corresponding to the literals in clause Cj (i.e., Xi or not-Xi).
The semaphore Cj represents the truth value of clause Cj. It is signaled iff the truth assignment to the variables causes clause Cj to evaluate to True.
37
Proof of Theorem 1 – Explanation of the Construction

The first 3n threads operate in two phases:
  The first pass is a non-deterministic guessing phase, in which:
    each variable used in the boolean formula B is assigned a unique truth value;
    only one of the Xi and not-Xi semaphores is signaled.
  The second pass (which begins after the semaphore Pass2 is signaled) is used to ensure that the program doesn't deadlock:
    the semaphore operations that were not allowed to execute during the first pass are allowed to proceed.
38
Proof of Theorem 1 – The Final Construction

Two additional threads are created:

a: skip                     wait( C1 )
signal( Pass2 )             ..
..                          wait( Cm )
signal( Pass2 )             b: skip

There are n 'signal(Pass2)' operations – one for each variable.
There are m 'wait(Cj)' operations – one for each clause.
39
Proof of Theorem 1 – Putting It All Together

Event b is reached only after the semaphore Cj, for each clause j, has been signaled.
The program contains no conditional statements or shared variables, so every execution of the program executes the same events and exhibits the same shared-data dependencies (i.e., none).
Claim: For any execution a MHB→ b iff B is not satisfiable.
40
Proof of Theorem 1 – Proving the "if" Part

Assume that B is not satisfiable.
Then there is always some clause, Cj, that is not satisfied by the truth values guessed during the first pass. Thus, no signal(Cj) operation is performed during the first pass.
Event b can't execute until this signal(Cj) operation is performed, which can only happen during the second pass.
The second pass doesn't occur until after event a executes, so event a must precede event b.
Therefore, a MHB→ b.
41
Proof of Theorem 1 – Proving the "only if" Part

Assume that a MHB→ b. This means that there is no execution in which b either precedes a or executes concurrently with a.
Assume, by way of contradiction, that B is satisfiable.
Then some truth assignment that satisfies all of the clauses can be guessed during the first pass.
Event b can then execute before event a, contradicting the assumption.
Therefore, B is not satisfiable.
42
Complexity of Computing Ordering Relations – Cont.
Since a MHB→ b iff B is not satisfiable, the problem of deciding a MHB→ b is Co-NP-hard.
By similar reductions, programs can be constructed such that the non-satisfiability of B can be determined from the MCW or MOW relations. The problem of deciding these relations is therefore also Co-NP-hard.
Theorem 2: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a CHB→ b, a CCW↔ b or a COW↔ b (any of the could-have orderings) is NP-hard.
Proof by similar reductions …
43
Complexity of Race Detection – Conditions, Loops and Input

The presented model is too simplistic:
  What if "if" and "while" statements are used?
  What if input from the user is allowed?

Thread 1                        Thread 2
Y = ReadFromInput( );
while ( Y < 0 )
  Print( Y );
X++;    [1]
                                X++;    [2]

If Y≥0, there is a data-race between [1] and [2]. Otherwise a race is not possible, since [1] is never reached.
44
Complexity of Race Detection – "NP-Harder"?

The proof above does not use conditional statements, loops or input from outside.
With them, the problem of data-race detection becomes much, much harder than deciding an NP-complete problem: intuitively, there is not even an exponential solution, since it is not known whether the program will ever stop.
Thus, in the general case, the problem is undecidable.
45
So How Can Data-Races be Detected? – Approximations

Deciding whether a CHB→ b or a CCW↔ b would reveal the feasible data-races.
Since this is an intractable problem, the temporal ordering relation T→ is approximated instead, and apparent data-races are located.
Recall that apparent data-races exist if and only if at least one feasible race exists.
Yet, it remains a hard problem to locate all apparent data-races.
46
Approximation Example – Lamport’s Happens-Before
The happens-before partial order, denoted a hb→ b, is defined over the access events (reads, writes, releases and acquires) that happen in a specific execution, as follows:
  Program order: a and b are events performed by the same thread, with a preceding b.
  Release and acquire: a is a release of some sync' object S and b is a corresponding acquire of S.
  Transitivity: a hb→ c and c hb→ b.
Shared accesses a and b are concurrent, a hb↮ b, if neither a hb→ b nor b hb→ a holds.

Example: if Thread 1 performs access a and later unlock(L), and Thread 2 performs the corresponding lock(L) and later access b, then a hb→ b.
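The three rules above can be turned into a small reachability check: program-order edges, release-to-acquire edges, and transitive closure. This is a hedged sketch with an assumed trace format, not an algorithm from the talk:

```python
# Deriving Lamport's happens-before from a recorded trace of events.

def happens_before(trace):
    """trace: list of (thread, op, obj) events in observed temporal order.
    Returns an oracle hb(i, j) over event indices."""
    n = len(trace)
    edges = [[] for _ in range(n)]
    last_in_thread, last_release = {}, {}
    for i, (tid, op, obj) in enumerate(trace):
        if tid in last_in_thread:                    # program-order edge
            edges[last_in_thread[tid]].append(i)
        last_in_thread[tid] = i
        if op == "acquire" and obj in last_release:  # release -> acquire edge
            edges[last_release[obj]].append(i)
        if op == "release":
            last_release[obj] = i

    def hb(i, j):                                    # transitivity = reachability
        stack, seen = [i], set()
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt == j:
                    return True
                if nxt not in seen:
                    seen.add(nxt); stack.append(nxt)
        return False
    return hb

trace = [("T1", "access", "a"), ("T1", "release", "L"),
         ("T2", "acquire", "L"), ("T2", "access", "b")]
hb = happens_before(trace)
print(hb(0, 3))  # True: a hb-> b through the unlock(L)/lock(L) chain
```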
47
Approaches to Detection of Apparent Data-Races – Static

There are two main approaches to the detection of apparent data-races (sometimes a combination of both is used).

Static methods perform a compile-time analysis of the code.
  – Too conservative:
    they can't know or understand the semantics of the program;
    they result in excessive false alarms that hide the real data-races.
  + They test the program globally:
    they see the whole code of the tested program;
    they can warn about all possible errors in all possible executions.
48
Approaches to Detection of Apparent Data-Races – Dynamic

Dynamic methods use a tracing mechanism to detect whether a particular execution actually exhibited data-races.
  + They detect only those apparent data-races that actually occur during a feasible execution.
  – They test the program locally:
    they consider only one specific execution path of the program each time.
Post-mortem methods – after the execution terminates, analyze the trace of the run and warn about possible data-races that were found.
On-the-fly methods – buffer partial trace information in memory, analyze it, and detect races as they occur.
49
Approaches to Detection of Apparent Data-Races

No "silver bullet" exists.
Accuracy is of great importance (especially in large programs).
There is always a tradeoff between the number of false negatives (undetected races) and false positives (false alarms).
The space and time overheads imposed by the techniques are significant as well.
50
Closer Look at Dynamic Methods

We show two dynamic methods for on-the-fly detection of apparent data-races in multi-threaded programs that use locks and barriers:
  DJIT+ – based on Lamport's happens-before partial order relation and Mattern's virtual time (vector clocks). Implemented in the Millipede and MultiRace systems.
  Lockset – based on a locking discipline and lockset refinement. Implemented in the Eraser tool and the MultiRace system.
51
DJIT+ – Description

Detects the apparent data-races in a program execution when they actually occur.
Based on the happens-before partial order.
Can announce data-races race-by-race: after the cause of a race is verified, the search for further races can proceed.
The main disadvantage of the technique is that it is highly dependent on the scheduling order.
52
DJIT+ – Local Time Frames (LTF)

The execution of each thread is split into a sequence of time frames.
A new time frame starts on each release (unlock/barrier).
For every access there is a time stamp – a vector built from the LTFs of all threads at the moment of the access.

Thread               LTF
x = 1                1
lock( L1 )
z = 2                1
lock( L2 )
y = 3                1
unlock( L2 )
z = 4                2
barrier( B )
x = 5                3
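The LTF rule above ("a new frame starts on each release") can be sketched as follows; the operation names are illustrative, and this is not the DJIT+ implementation:

```python
# Computing the local time frame (LTF) of each operation in one thread.

def local_time_frames(ops):
    """ops: list of operations performed by one thread, in program order.
    Returns the LTF in effect at each operation."""
    ltf, frames = 1, []
    for op in ops:
        frames.append(ltf)
        if op in ("unlock", "barrier"):   # a release starts a new frame
            ltf += 1
    return frames

# The thread from the slide: frames of the accesses are 1, 1, 1, 2, 3.
ops = ["x=1", "lock", "z=2", "lock", "y=3", "unlock", "z=4", "barrier", "x=5"]
print(local_time_frames(ops))
# [1, 1, 1, 1, 1, 1, 2, 2, 3]
```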
53
DJIT+ – Local Time Frames

Claim 1: Let a in thread ta and b in thread tb be two accesses, where a occurs at time frame Ta, and the release in ta corresponding to the latest acquire in tb that precedes b occurs at time frame Tsync in ta. Then a hb→ b iff Ta < Tsync.
54
DJIT+ – Local Time Frames

Proof:
– If Ta < Tsync, then (a hb→ release); and since (release hb→ acquire) and (acquire hb→ b), we get (a hb→ b).
– If (a hb→ b), then since a and b are in distinct threads, by definition there exists a pair of corresponding release and acquire such that (a hb→ release) and (acquire hb→ b). It follows that Ta < Trelease ≤ Tsync.
55
DJIT+ – Vector Time Frames (VTF)

A vector stt[.] for each thread t:
  vector size = maxthreads (the maximum number of threads that may execute);
  thread ID = thread index;
  stt[t] is the LTF of t – it tracks the number of releases actually made by t;
  stt[u] stores the latest LTF of thread u known to t.
If u is an acquirer of t's release, then u's vector is updated:
  for k = 0 to maxthreads-1:
    stu[k] = max( stu[k], stt[k] )
56
DJIT+ – Vector Time Frames

In this way, the vector of u is notified of:
  the latest time frame of t;
  the latest time frames of the other threads, according to the knowledge of t.
Note that a thread can learn about a release performed by another thread through "gossip", when this information is transferred through a chain of corresponding release-acquire pairs.
57
DJIT+ – Vector Time Frames – Example

Thread 1            Thread 2            Thread 3
(1 1 1)             (1 1 1)             (1 1 1)
write X
release( m1 )
(2 1 1)
read Z              acquire( m1 )
                    (2 1 1)
                    read Y
                    release( m2 )
                    (2 2 1)
                    write X
                                        acquire( m2 )
                                        (2 2 1)
                                        write X
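The example above can be replayed with a small sketch of the VTF update rules: a release bumps the releaser's own entry and publishes its vector, and an acquire takes the element-wise maximum. The class and method names are assumptions, not the DJIT+ implementation:

```python
# A toy model of DJIT+ vector time frames for `nthreads` threads.

class VectorTF:
    def __init__(self, nthreads):
        self.st = [[1] * nthreads for _ in range(nthreads)]
        self.last_release = {}            # sync object -> published vector

    def release(self, t, obj):
        self.st[t][t] += 1                          # t enters a new local time frame
        self.last_release[obj] = list(self.st[t])   # publish t's knowledge

    def acquire(self, u, obj):
        rel = self.last_release.get(obj)
        if rel:                                     # element-wise max ("gossip")
            self.st[u] = [max(a, b) for a, b in zip(self.st[u], rel)]

v = VectorTF(3)
v.release(0, "m1")      # thread 1 releases m1 -> st[0] = (2 1 1)
v.acquire(1, "m1")      # thread 2 learns thread 1's frame -> (2 1 1)
v.release(1, "m2")      # thread 2 releases m2 -> (2 2 1)
v.acquire(2, "m2")      # thread 3 hears about both releases via gossip
print(v.st[2])          # [2, 2, 1], as on the slide
```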
58
DJIT+ Vector Time Frames
Claim 2: Let a and b be two accesses in respective threads ta and tb, which happened during respective local time frames Ta and Tb. Let f denote the value of sttb[ta] at the time when b occurs. Then a hb→ b iff Ta < f.
(Diagram: a's local time frame in ta is transferred to tb through a chain of corresponding release-acquire pairs, via an intermediate thread tc.)
59
DJIT+ Vector Time Frames
Proof:
– If (a hb→ b), then since a and b are in distinct threads, there exists a chain of releases and corresponding acquires, with the first release in ta and the last acquire in tb, such that (a hb→ first release) and (first release hb→ last acquire). The information on ta's local time frame is transferred through that chain, reaches tb and is stored in sttb[ta] (= f). Thus it follows that Ta < Tfirst release ≤ f.
– If Ta < f, then there is a sequence of corresponding release-acquire pairs which transfers the local time frame from ta to tb, finally resulting in tb "hearing" that ta entered a time frame later than Ta. This same sequence can be used to transitively apply the hb→ relation from a to b.
60
DJIT+ Logging Mechanism
We assume the existence of some logging mechanism, which is:
  capable of logging all the accesses to all shared locations as they occur;
  logging accesses 'atomically' (there are no data-races on the accesses to the log itself);
  in agreement with the happens-before partial order: if a hb→ b, then a is logged prior to b.
It also follows that if a and b are accesses to the same shared location v and a is logged prior to b, then b hb↛ a.
61
DJIT+ – Data-Race Detection Using VTF
Theorem 1: Let a and b be two accesses to the same shared variable in respective threads ta and tb during respective local time frames Ta and Tb. Suppose that at least one of a or b is a write. Assume that a was logged and tested for races prior to b. Then a and b form a data-race iff at the time when b is logged it holds that sttb[ta] ≤ Ta.
62
DJIT+ – Data-Race Detection Using VTF

Proof:
– If sttb[ta] ≤ Ta then, by Claim 2, a hb→ b does not hold. Since b is only now being logged, it cannot hold that b hb→ a. Thus a and b are concurrent, and form a data race (since at least one of them is a write).
– If a and b form a data race, then a hb→ b does not hold. Thus, by Claim 2, sttb[ta] ≤ Ta.
63
DJIT+ – Data-Race Detection Predicate

P(a,b) ≜ ( a.type = write ⋁ b.type = write ) ⋀ ( a.time_frame ≥ stb.thread_id[a.thread_id] )

P gets two accesses, a and b, such that:
  a and b are in different threads;
  a and b access the same shared location;
  a was logged and tested earlier;
  b is currently being logged.
P returns TRUE iff a and b form a data race.
Obviously, evaluating P on every pair of accesses is very expensive.
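The predicate P can be transcribed almost literally; the dictionary-based access records below are an illustrative encoding, not the actual logging format:

```python
# The DJIT+ race predicate P, over two logged accesses a and b.

def P(a, b, st):
    """a, b: dicts with keys type, thread_id, time_frame; b is the access
    currently being logged. st[t][u] is thread t's knowledge of thread u's
    latest local time frame (the VTF vectors)."""
    return ((a["type"] == "write" or b["type"] == "write")
            and a["time_frame"] >= st[b["thread_id"]][a["thread_id"]])

# Unsynchronized write/read, both in frame 1 of their threads: a race.
st = [[1, 1], [1, 1]]
a = {"type": "write", "thread_id": 0, "time_frame": 1}
b = {"type": "read",  "thread_id": 1, "time_frame": 1}
print(P(a, b, st))          # True

# If thread 1 acquired a lock released by thread 0 after a, st[1][0] is 2:
st_synced = [[2, 1], [2, 1]]
print(P(a, b, st_synced))   # False: a hb-> b, so no race
```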
64
DJIT+ Which Accesses to Check?
We have assumed that there is a logging mechanism, which records all accesses.
Logging all accesses in all threads and testing the predicate P for each pair of them will impose a great overhead on the system.
Actually some of the accesses can be discarded.
65
DJIT+ – Which Accesses to Check?

Claim 3: Consider an access a in thread ta during time frame Ta, and accesses b and c in thread tb=tc during time frame Tb=Tc. Assume that c precedes b in the program order. If a and b are concurrent, then a and c are concurrent as well.
66
DJIT+ Which Accesses to Check?
Proof:
– Let fb and fc denote the respective values of sttb[ta] when b and c happen. Since sttb[ta] is monotonically increasing and c precedes b, we know that fb ≥ fc. Since a hb→ b does not hold, we know by Claim 2 that Ta ≥ fb. Thus Ta ≥ fc, and again by Claim 2 we get that a hb→ c is false.
– Let fa denote the value of stta[tb] when a happens. Since b hb→ a does not hold, we know by Claim 2 that Tb ≥ fa. Since Tb = Tc, we get Tc ≥ fa. Thus, by Claim 2, c hb→ a is false.
67
DJIT+ – Which Accesses to Check? – Example

Thread 1              Thread 2
lock( L )
write X
read X
unlock( L )
                      read X        ← data-race (DR)
lock( L )
write X
unlock( L )
                      lock( L )
                      read X
                      write X
                      write X
                      unlock( L )

Accesses b and c were previously logged in thread t1 in the same time frame, with access b preceding access c in the program order; access a is currently logged in thread t2. If a and b are synchronized, then a and c are synchronized as well.

It is therefore sufficient to log and test only the first read access and the first write access to every variable in each time frame! The remaining accesses in the frame (marked "no logging" on the slide) need not be logged.
68
DJIT+ – Which Time Frames to Check?

Assume that in thread ta an access a is currently being logged, while in thread tb we previously logged a write b in time frame Tb and another, later write c in time frame Tc, so that Tb < Tc.
69
DJIT+ – Which Time Frames to Check?

Claim 4: If a forms a data-race with b, then it certainly forms a data-race with c.
Proof: Easy, since Tc > Tb ≥ stta[tb].
Either pair, a-b or a-c, can be reported as the apparent data-race.
Conversely, if there is no data-race between a and c, then there is also no data-race between a and b. Therefore, the a-b pair need not be checked at all.
70
DJIT+ – Which Time Frames to Check?

For a current read access to a shared variable v, it is enough to check it against the last time frame, in each of the other threads, in which that thread wrote to v.
For a current write access to v, it is enough to check it against the last time frame, in each of the other threads, in which that thread read from v, and the last time frame in which that thread wrote to v.
DJIT+ – Access History & Algorithm

Each variable v holds, for each of the threads:
  the last time frame in which that thread read from v;
  the last time frame in which that thread wrote to v.

v:  [ w-tf1  w-tf2  ...  w-tfn ]   – time frames of the recent writes to v, one per thread
    [ r-tf1  r-tf2  ...  r-tfn ]   – time frames of the recent reads from v, one per thread

On each first read and first write to v in a time frame, the thread updates the access history of v with its LTF.
If the access to v is a read, the thread checks all recent writes to v by other threads.
If the access is a write, the thread checks all recent reads, as well as all recent writes, to v by other threads.
To support a weak memory model, the access history must be kept atomic and coherent.
72
DJIT+ Coherency
In fact, the presented algorithm relies only on the coherency assumption for the access history.
Coherency means that:
  for each variable v there is a global order Rv on all accesses to v, agreed among all threads;
  reads always return the most recently written value.
Hence, the algorithm described above is correct also for weakly ordered systems. E.g., the data-race-free-1 memory model only requires that, in the total absence of data-races, the program executes as if it were sequentially consistent.

Thread 1              Thread 2
write v1, 1           read v2, 2
write v2, 2           read v1, 0

This history is coherent, but not sequentially consistent.
73
DJIT+ Results
The DJIT+ algorithm was implemented in several academic systems – Millipede and MultiRace.
  + No false alarms.
  + No missed races in the given feasible execution.
  – Very sensitive to differences in the threads' scheduling:
    it should be applied each time the program executes (and not only in debug mode);
    it requires an enormous number of runs, and yet cannot prove that the tested program is race-free.
74
Lockset – Locking Discipline

Lockset detects violations of a locking discipline.
A locking discipline is a programming policy that ensures the total absence of data races.
A common and simple locking discipline requires that every shared location be consistently protected by the same lock on each access.
The main drawback is a possibly excessive number of false alarms.
75
Lockset – What is the Difference?

Thread 1              Thread 2
Z = Z + 1;  [1]
Lock( m );
V = V + 1;
Unlock( m );
                      Lock( m );
                      V = V + 1;
                      Unlock( m );
                      Z = Z + 1;  [2]

[1] hb→ [2], yet there is a feasible data-race on Z under a different scheduling.

Thread 1              Thread 2
Z = Z + 1;  [1]
Lock( m );
Flag = true;
Unlock( m );
                      Lock( m );
                      T = Flag;
                      Unlock( m );
                      if ( T == true )
                        Z = Z + 1;  [2]

There is no locking discipline on Z, yet [1] and [2] are ordered under all possible schedulings.
76
Lockset – The Basic Algorithm

C(v) – the set of all locks that have consistently protected v in the execution so far.
locks_held(t) – the set of all locks currently acquired by thread t.
The algorithm:
  for each v, initialize C(v) to the set of all possible locks;
  on each access to v by thread t:
    lhv ← locks_held(t)
    if this is a read, then lhv ← lhv ∪ {readers_lock}
    C(v) ← C(v) ∩ lhv
    if C(v) = ∅, issue a warning
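The refinement loop above can be sketched directly; the lock names and the ALL_LOCKS universe below are illustrative, not from Eraser or MultiRace:

```python
# Basic Lockset refinement with the fake readers_lock.

READERS_LOCK = "readers_lock"
ALL_LOCKS = {"L1", "L2", READERS_LOCK}    # the universe of candidate locks

class Lockset:
    def __init__(self):
        self.C = {}                       # variable -> candidate lock set C(v)

    def access(self, v, locks_held, is_read):
        lhv = set(locks_held)
        if is_read:                       # concurrent reads are not races
            lhv.add(READERS_LOCK)
        self.C[v] = self.C.get(v, set(ALL_LOCKS)) & lhv   # refinement step
        return bool(self.C[v])            # False => issue a warning

ls = Lockset()
print(ls.access("v", {"L1"}, is_read=True))    # True:  C(v) = {L1, readers_lock}
print(ls.access("v", {"L2"}, is_read=False))   # False: C(v) = {} -- warning
```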
77
Lockset – Explanation

This process is called lockset refinement. It ensures that any lock that consistently protected v is contained in C(v).
A lock m is in C(v) if, in the execution up to that point, every thread that has accessed v was holding m at the moment of access.
If some lock m consistently protects v, it will remain in C(v) until the termination of the program.
The addition of the fake readers_lock ensures that concurrent reads are not interpreted as data races.
The first write to v permanently removes readers_lock from C(v).
78
Lockset – Example

Thread 1          Thread 2          lhv            C(v)
                                    { }            {L1, L2, RL}
lock( L1 )
read v                              {L1, RL}       {L1, RL}
unlock( L1 )                        { }
                  lock( L2 )
                  write v           {L2}           { }  ← Warning: the locking discipline for v is violated!!!
                  unlock( L2 )      { }

RL = readers_lock; it prevents multiple reads from generating false alarms.
79
Extended Lockset – Which Accesses to Check?

Consider two accesses, a and b, to v:
  both in the same thread;
  both in the same time frame;
  access a precedes access b.
Then Locksa(v) ⊆ Locksb(v), where Locksu(v) is the set of real locks held by the thread during access u to v.

Thread                Locksu(v)
unlock
…
lock( L1 )
write x   [1]         {L1}
write x   [2]         {L1} = {L1}
lock( L2 )
write x   [3]         {L1,L2} ⊇ {L1}
unlock( L2 )
unlock( L1 )

Accesses [1], [2] and [3] are all in the same time frame.
80
Extended Lockset Which Accesses to Check?
It follows that:
1) [C(v) ∩ Locks_a(v)] ⊆ [C(v) ∩ Locks_b(v)]
2) If C(v) ∩ Locks_a(v) ≠ ∅, then C(v) ∩ Locks_b(v) ≠ ∅
Only the first access in each time frame needs to be logged and checked!
The addition of readers_lock forces us to check both the first read and the first write in each time frame.
Lockset needs the same logging mechanism as DJIT+!
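The containment argument can be checked mechanically; a toy sketch with illustrative lock sets:

```python
# Within one time frame a thread only acquires locks, so a later access b
# holds a superset of the locks held at an earlier access a; hence the
# intersection with C(v) can only become empty at the frame's first access.
C_v     = {"L1", "L3"}         # current candidate set C(v) (illustrative)
locks_a = {"L1"}               # real locks at the frame's first access a
locks_b = {"L1", "L2"}         # real locks at a later access b in the frame

assert locks_a <= locks_b                    # Locks_a(v) ⊆ Locks_b(v)
assert (C_v & locks_a) <= (C_v & locks_b)    # property (1)
assert not (C_v & locks_a) or (C_v & locks_b)  # property (2)
```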
81
Extended Lockset
Improving Locking Discipline

The locking discipline described above is too strict. Common programming practices violate the discipline yet are free of data races:
- Initialization: shared variables are usually initialized without holding any locks.
- Read-shared data: some shared variables are written during initialization only and are read-only thereafter.
- Barriers: threads can synchronize through barriers, which the notion of a locking discipline does not cover. For data-race-free programs that use only barriers, the basic Lockset will report a false alarm on every pair of accesses from different threads.
82
Extended Lockset
Initialization

When initializing newly allocated data there is no need to lock it, since other threads cannot yet hold a reference to it.
Unfortunately, there is no easy way to know when initialization is complete.
Therefore, a shared variable is considered initialized when it is first accessed by a second thread.
As long as a variable is accessed by a single thread, reads and writes do not update C(v).
83
Extended Lockset
Read-Shared Data

There is no need to protect a variable that is initialized once and read-only thereafter.
To support unlocked read-sharing, the fake readers_lock was added.
Still, an additional mechanism is needed so that initialization does not permanently remove readers_lock from C(v).
Note: the fake lock does not prevent threads from executing the reads concurrently.
84
Extended Lockset
Supporting Barriers

A barrier is a global synchronization primitive (unlike locks, which synchronize only pairs of threads): to pass the barrier, all threads must first reach it, and only then may they continue.
Observations:
- Reaching a barrier ≅ starting a new execution.
- There are no races between accesses from different sides of a barrier.
Idea: restart Lockset detection each time the barrier is reached by all threads.
85
Extended Lockset
Supporting Barriers

Variable v is considered initialized when:
- it is first accessed by a second thread, or
- the thread that first accessed v reaches a barrier.
[State transition diagram, employed for each variable. States: Virgin, Initializing, Shared, Empty, Clean, Exclusive. Transitions are triggered by reads/writes (by the first, the same, a new, or any thread), by barriers reached by all threads, and by whether C(v) is empty.]
86
Extended Lockset
States Explanation

Virgin – the variable is new and has not yet been referenced by any thread.
Initializing – the variable is being initialized by a single thread. C(v) is not updated in this state.
Shared – the data is accessed by more than one thread. C(v) is updated on each access.
Empty – C(v) has become empty. A data-race warning is announced only the first time this state is reached.
Clean – the barrier was reached by all threads. C(v) is reinitialized to the set of all possible locks.
Exclusive – similar to the Initializing state: after reaching the barrier, the variable is accessed by only one thread, but it is assumed to be already initialized. Thus C(v) is updated on each access, and a data race is announced only if another thread accesses v while C(v) is empty.
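The six states can be coded as a small per-variable state machine. The sketch below is my reading of the diagram and the explanations above (illustrative names; some transition details, such as the exact barrier edges, are approximated):

```python
# Approximate sketch of the extended Lockset state machine (not the
# MultiRace code); one VarState instance per shared variable.
VIRGIN, INITIALIZING, SHARED, EMPTY, CLEAN, EXCLUSIVE = range(6)

class VarState:
    def __init__(self, all_locks):
        self.all_locks = set(all_locks)
        self.state = VIRGIN
        self.C = set(all_locks)    # candidate set C(v)
        self.owner = None          # thread (re)initializing v

    def barrier(self):
        # Barrier reached by all threads: restart detection.
        self.state = CLEAN
        self.C = set(self.all_locks)
        self.owner = None

    def access(self, tid, locks_held):
        """Returns True iff a data-race warning should be announced."""
        lh = set(locks_held)
        if self.state == VIRGIN:                 # first access ever
            self.state, self.owner = INITIALIZING, tid
            return False
        if self.state == INITIALIZING:           # C(v) not updated here
            if tid == self.owner:
                return False
            self.state = SHARED                  # second thread: initialized
            self.C &= lh
        elif self.state == SHARED:
            self.C &= lh
        elif self.state == CLEAN:                # first access after barrier
            self.state, self.owner = EXCLUSIVE, tid
            self.C &= lh
            return False
        elif self.state == EXCLUSIVE:
            self.C &= lh
            if tid == self.owner:                # warn only on another thread
                return False
            self.state = SHARED
        elif self.state == EMPTY:                # warning already announced
            return False
        if not self.C:
            self.state = EMPTY
            return True
        return False
```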
87
The refined algorithm will still produce a false alarm in the following simple case:
Thread 1                               Thread 2                               C(v)
Lock( m1 ); v = v + 1; Unlock( m1 );                                          {m1}
                                       Lock( m1 ); Lock( m2 );
                                       v = v + 1;
                                       Unlock( m2 ); Unlock( m1 );            {m1}
Lock( m2 ); v = v + 1; Unlock( m2 );                                          { }

All three accesses are pairwise ordered (the two in Thread 1 by program order, and each of them with Thread 2's access via a common lock), yet C(v) ends up empty.
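The drained candidate set can be reproduced mechanically; a minimal sketch of the refinement over the three accesses shown above:

```python
# Reproducing the false alarm: the three lock sets correspond to the
# three (pairwise ordered) accesses to v, yet C(v) drains to empty.
C_v = {"m1", "m2"}                    # C(v) before the accesses
for locks_held in ({"m1"},            # Thread 1: Lock(m1); v = v + 1
                   {"m1", "m2"},      # Thread 2: holds both m1 and m2
                   {"m2"}):           # Thread 1: Lock(m2); v = v + 1
    C_v &= locks_held                 # lockset refinement
assert not C_v                        # empty -> Lockset warns, but no race
```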
Lockset
Still False Alarms
88
Lockset
Additional False Alarms

Additional possible false alarms:
- A queue that implicitly protects its elements by accessing the queue only through locked head and tail fields.
- A thread that passes arguments to a worker thread. Since the main thread and the worker thread never access the arguments concurrently, they do not use any locks to serialize their accesses.
- Privately implemented locks, which do not communicate with Lockset.
- True data races that do not affect the correctness of the program (benign races), for example:

if ( f == 0 )
    lock( m );
    if ( f == 0 )
        f = 1;
    unlock( m );
89
Lockset Results
The basic Lockset was implemented in a full-scale testing tool, Eraser, which is used in industry (not "on paper only").
The extended Lockset was implemented in the MultiRace academic system.
Less sensitive to differences in thread scheduling.
Detects a superset of all apparently raced locations in an execution of a program; possible races are rarely missed.
Our extension for barriers can be used to check programs that employ only barriers and no locks.
Still lots of false alarms.
Still dependent on scheduling.
Cannot prove the tested program race-free.
Combining DJIT+ and Lockset

[Venn diagram of locations in program P; feasible races are a subset of apparent races (F ⊆ A ⊆ S), while D and L are what the two detectors report for one execution E:]
S – all shared locations in program P
F – all feasibly raced locations in program P
A – all apparently raced locations in program P
D – raced locations detected by DJIT+ in execution E of P
L – violations detected by Lockset in execution E of P
Lockset can detect suspected races in more execution orders.
DJIT+ can filter out the spurious warnings reported by Lockset.
Every completed data race is also a locking-discipline violation.
For many types of programs L tends to cover A – we detect both a subset and a superset of all raced locations!
The number of checks performed by DJIT+ can be reduced with the help of Lockset: as long as C(v) is not empty, DJIT+ need not check v for races.
The implementation overhead comes mainly from the access-logging mechanism, which can be shared by both algorithms.
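A minimal sketch of the combination idea, not the MultiRace implementation: Lockset refines C(v) on every access, and a DJIT+-style vector-clock check is consulted only when C(v) becomes empty, filtering out spurious warnings. The clock handling is heavily simplified (one clock tick per access, only the most recent access remembered per variable):

```python
# Sketch: Lockset as a cheap filter, vector clocks as the confirming check.
class Combined:
    def __init__(self, n_threads, all_locks):
        self.vc = [[0] * n_threads for _ in range(n_threads)]  # per-thread clocks
        self.C = {}            # v -> candidate lockset C(v)
        self.last = {}         # v -> (tid, clock snapshot of last access)
        self.all_locks = set(all_locks)

    def sync(self, frm, to):
        # e.g. an unlock by `frm` later acquired by `to`: `to` absorbs its clock
        self.vc[to] = [max(a, b) for a, b in zip(self.vc[to], self.vc[frm])]

    def access(self, v, tid, locks_held):
        self.vc[tid][tid] += 1                    # advance local clock (simplified)
        C = self.C.setdefault(v, set(self.all_locks))
        C &= set(locks_held)                      # Lockset refinement
        self.C[v] = C
        race = False
        if not C and v in self.last:              # Lockset suspects a race...
            prev_tid, prev_clock = self.last[v]
            # ...report only if the previous access does NOT happen-before
            # this one (DJIT+-style ordering filter).
            race = prev_tid != tid and prev_clock[prev_tid] > self.vc[tid][prev_tid]
        self.last[v] = (tid, list(self.vc[tid]))
        return race
```

With synchronization between the two accesses the Lockset alarm is filtered out; without it, the suspect is reported.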
91
Dynamic Data-Race Detection
Summary

The solutions are not universal: not all located apparent data races are feasible.
A large number of runs is still required, to cover as many execution paths as possible.
We still cannot prove a program to be data-race free.
Since slowdowns can be high, satisfactory testing can take months.
Different (or new) types of synchronization might require different detection techniques.
Inserting detection code into a program can perturb the threads' interleaving so that races disappear (Lockset is less sensitive to this).
Maybe combine with some static analysis? Maybe better approximations can be found...?
92
The End
93
References
S. Adve and M. D. Hill. A Unified Formalization of Four Shared-Memory Models. Technical Report, University of Wisconsin, Sept. 1992.
A. Itzkovitz, A. Schuster, and O. Zeev-Ben-Mordechai. Towards Integration of Data Race Detection in DSM Systems. In The Journal of Parallel and Distributed Computing (JPDC), 59(2): pp. 180-203, Nov. 1999.
L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. In Communications of the ACM, 21(7): pp. 558-565, Jul. 1978.
F. Mattern. Virtual Time and Global States of Distributed Systems. In Parallel & Distributed Algorithms, pp. 215-226, 1989.
94
References
Cont.
R. H. B. Netzer and B. P. Miller. What Are Race Conditions? Some Issues and Formalizations. In ACM Letters on Programming Languages and Systems, 1(1): pp. 74-88, Mar. 1992.
R. H. B. Netzer and B. P. Miller. On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions. In 1990 International Conference on Parallel Processing, 2: pp. 93-97, Aug. 1990.
R. H. B. Netzer and B. P. Miller. Detecting Data Races in Parallel Program Executions. In Advances in Languages and Compilers for Parallel Processing, MIT Press, 1991, pp. 109-129.
95
References
Cont.
E. Pozniansky. Efficient On-The-Fly Data Race Detection in Multithreaded C++ Programs. Research Thesis, May 2003.
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T.E. Anderson. Eraser: A Dynamic Data Race Detector for Multithreaded Programs. In ACM Transactions on Computer Systems, 15(4): pp. 391-411, 1997.
O. Zeev-Ben-Mordehai. Efficient Integration of On-The-Fly Data Race Detection in Distributed Shared Memory and Symmetric Multiprocessor Environments. Research Thesis, May 2001.