DATA RACE DETECTION: PRELIMINARIES AND SURVEY. Based on Nels E. Beckman's paper: A Survey of Methods for Preventing Race Conditions. Dolev Felman

1


2

WHAT IS A RACE CONDITION?

By Henzinger's definition, "a race occurs when two threads can access (read or write) a data variable simultaneously, and at least one of the two accesses is a write."

There is no synchronization to separate the accesses. For example, to execute X = X + 1 the process needs to:

Retrieve the value of X

Add 1 to this value

Store the result back to X
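The three steps above can be replayed by hand to see how an update gets lost. This is a minimal sketch, not taken from the slides: `run_schedule` and the thread names "A" and "B" are hypothetical, and the two threads are simulated step by step rather than run by a real scheduler, so the result is reproducible.

```python
def run_schedule(schedule):
    """Replay X = X + 1 per thread, one micro-step per schedule entry."""
    shared = {"X": 0}
    register = {}   # each thread's private copy of X
    step = {}       # next micro-step per thread: 0=load, 1=add, 2=store
    for tid in schedule:
        s = step.get(tid, 0)
        if s == 0:
            register[tid] = shared["X"]      # retrieve the value of X
        elif s == 1:
            register[tid] += 1               # add 1 to this value
        elif s == 2:
            shared["X"] = register[tid]      # store the result back to X
        step[tid] = s + 1
    return shared["X"]

# One thread finishes before the other starts: both increments survive.
serial = run_schedule(["A", "A", "A", "B", "B", "B"])
# The steps interleave: both threads load 0, so one increment is lost.
interleaved = run_schedule(["A", "B", "A", "B", "A", "B"])
print(serial, interleaved)
```

Both schedules execute the same six steps; only the order differs, which is why the bug depends on timing.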

3

WHY ARE DATA RACES A PROBLEM?

Data races are among the most common and hardest-to-debug types of bugs in concurrent systems. Why?

A data race is a non-deterministic bug.

It corrupts the variable's data, but the error won't crash the system immediately.

Programmers have a hard time reasoning about the correctness of the code.

4

SO HOW ARE WE GOING TO DEAL WITH IT?

To help diagnose such bugs we will see 4 techniques:

Race-Free Type Systems

Dynamic and Hybrid Race Detectors

Model-Checking

Flow-Based Race Analysis

5

HOW ARE WE GOING TO EXAMINE AND COMPARE THESE TECHNIQUES?

Ease of use

Annotations

Expressiveness

Scalability

Soundness

Precision

6

Race-Free Type Systems

Dynamic and Hybrid Race Detectors

Model-Checking

Flow-Based Race Analysis

7

RACE-FREE TYPE SYSTEMS

Language-based mechanisms for eliminating race conditions from software.

We will focus on type systems. Why?

Most languages today possess type systems.

A language's type system is a fundamental part of the language itself.

It helps eliminate wrong behaviors during compilation.

8

RACE-FREE TYPE SYSTEMS PROS

Offer us code that is race-free by its very construction.

Strong assurance: soundness.

Lack of any performance penalty that might be associated with instrumenting code.

9

RACE-FREE TYPE SYSTEMS CONS

do the languages based upon race-free type systems allow us the same expressiveness that we are used to in more popular languages?

are there additional annotation burdens associated with these new type systems?

10


The idea at the heart of these type systems is that shared data must be protected with a lock in order to prevent race conditions.

Part of a variable's type is the name of the lock that must be used to protect it.

That is why we can then check that every access to the shared data is enclosed in code that acquires that specific lock.

11


The files and classes that make up a software system must be separately compilable.

We use an effects clause: a part of a function's interface that acts as a summary of that function's effects.

In this case, those effects are of the sort, "my caller must possess lock X".

12

THIS CODE COMES FROM AN EXAMPLE STACK CLASS.

    class TNode<thisOwner,TOwner> {
        T<TOwner> value guarded_by thisOwner;
        TNode<thisOwner,TOwner> next guarded_by thisOwner;
        T<TOwner> pop() requires (thisOwner) {
            return this.value;
        }
    }

    class T<thisOwner> {
        int x guarded_by thisOwner = 0;
    }

Source: http://www3.cs.stonybrook.edu/~stoller/papers/PPoPP-2005-types.pdf

Callouts on the slide: TOwner; effects clause (the requires clause); lock type name (the owner named after guarded_by).

13

RACE-FREE TYPE SYSTEMS ALGORITHM

1. Add all locks listed in the effects clause to the current set of held locks and begin to step through method statements.

2. When a locking statement is encountered add that particular lock to the set of held locks.

3. When encountering a variable dereference, look up that variable’s type (which contains the lock that must be used to protect it) and verify that that lock exists in the set of held locks. (If not, signal an error.)
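The three steps above can be sketched as a small checker. This is only an illustration under stated assumptions, not the type checker of any of the surveyed systems: the statement encoding (`"lock"`, `"unlock"`, `"access"` tuples), the function name `check_method`, and the handling of `"unlock"` are all hypothetical choices made for the example.

```python
def check_method(requires, body, guarded_by):
    """Walk a method body and report accesses made without the guarding lock.

    requires   -- locks named in the method's effects clause
    body       -- list of (kind, name) statements (hypothetical encoding)
    guarded_by -- maps a variable to the lock named in its type
    """
    held = set(requires)                  # step 1: start from the effects clause
    errors = []
    for kind, name in body:
        if kind == "lock":                # step 2: locking statement
            held.add(name)
        elif kind == "unlock":            # assumption: the lock leaves the held set
            held.discard(name)
        elif kind == "access":            # step 3: variable dereference
            lock = guarded_by.get(name)   # None means the variable is unguarded
            if lock is not None and lock not in held:
                errors.append(f"{name} accessed without holding {lock}")
    return errors

guarded_by = {"value": "thisOwner", "next": "thisOwner"}
# pop() declares requires(thisOwner), so reading this.value is accepted.
ok = check_method(["thisOwner"], [("access", "value")], guarded_by)
# Without the effects clause the same access is rejected.
bad = check_method([], [("access", "value")], guarded_by)
print(ok, bad)
```

Because the check only needs the callee's effects clause, each method can be checked on its own, which is what makes separate compilation possible.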

14

HOW THESE TYPE SYSTEMS TREAT SHARED FIELDS AND VARIABLES

A naive type system might assume that every variable could potentially be shared.

This would require unnecessary locking. For that reason:

The type system must be informed that a variable is thread-local.

Some systems also add a readonly type.

15

FOR CONCLUSION

Pros:

All of these type systems are sound, in that they will not miss any potential race conditions.

They offer us code that is race-free.

Cons:

Expressiveness: they limit the programs that can be written to those that protect shared data with locks only.

Annotations: they impose an additional burden on the programmer.

16



Model-Checking


17

DYNAMIC AND HYBRID RACE DETECTORS

It is more desirable to find software defects before run-time using static analysis tools.

However, static analysis tools are forced to be conservative and produce more false positives.

Dynamic analysis tools can use very intimate knowledge about the runtime behavior in order to increase precision.

18

DYNAMIC AND HYBRID RACE DETECTORS

There are two major techniques in use:

Lockset

Happens-before

Modern dynamic race detection tools use both of these techniques at the same time.

19

LOCKSET

The assumption is that race conditions occur because of shared variables that are mistakenly not protected by an appropriate lock.

1. ∀ shared location v, keep C(v), the set of candidate locks, initially set to contain all locks.

2. ∀ accesses to v, set C(v) = C(v) ∩ (set of currently held locks) (lock refinement step).

3. If C(v) = ∅, show a data race warning.

https://people.cs.umass.edu/~emery/classes/cmpsci691w-spring2006/scribe/lecture8-scribe.pdf
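The lock refinement steps above can be sketched in a few lines. This is a simplified illustration, not the code of any particular tool: the class name `Lockset`, its method names, and the thread and lock names are hypothetical, and refinements that real detectors add (for example an initialization phase or read-sharing) are left out.

```python
class Lockset:
    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.held = {}         # thread -> set of locks it currently holds
        self.candidates = {}   # variable v -> candidate set C(v)
        self.warnings = []

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread, var):
        # Step 1: C(v) starts out as the set of all locks.
        c = self.candidates.get(var, self.all_locks)
        # Step 2: refine C(v) with the locks held at this access.
        c = c & frozenset(self.held.get(thread, ()))
        self.candidates[var] = c
        # Step 3: an empty candidate set means no lock protects v consistently.
        if not c:
            self.warnings.append(var)

d = Lockset(["L1", "L2"])
d.acquire("T1", "L1"); d.access("T1", "X"); d.access("T1", "Y"); d.release("T1", "L1")
d.acquire("T2", "L2"); d.access("T2", "X"); d.release("T2", "L2")   # X: {L1} ∩ {L2} = ∅
d.acquire("T2", "L1"); d.access("T2", "Y"); d.release("T2", "L1")   # Y stays {L1}
print(d.warnings)
```

Note that the warning for X does not depend on the two accesses actually overlapping in time, which is why lockset reports races that a particular schedule would hide.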

20

LOCKSET ANALYSIS

Let's imagine we are watching this program execute:

    lock(L1)
    lock(L2)
    X++;
    Y = 5;
    unlock(L1)
    unlock(L2)

21

LOCKSET ANALYSIS

Whenever a lock is acquired, add it to the set of "held locks."

    lock(L1)
    lock(L2)
    X++;
    Y = 5;

Held Locks: L1, L2

22

LOCKSET ANALYSIS

Remove locks when they are released.

    lock(L1)
    lock(L2)
    X++;
    Y = 5;
    unlock(L2)

Held Locks: L1

23

LOCKSET ANALYSIS

In the beginning, set each shared variable's "candidate set" to be all locks.

    ...
    int X
    ...

Candidate Set X: L1, L2

24

LOCKSET ANALYSIS

When that variable is accessed, take the intersection of the candidate set and the set of currently held locks.

    ...
    if (X == 0) {
    ...

Candidate Set X: {L1, L2} ∩ Held Locks: {L1}

25

LOCKSET ANALYSIS

If the intersection is empty, flag a potential race condition.

    ...
    X++;
    ...

Candidate Set X: {L1} ∩ Held Locks: {} is empty. Race!!

26

LOCKSET PROS AND CONS

Cons: poor precision. Lockset gives false positives, for example because locks aren't the only method for synchronization and aren't always necessary.

Pros: finds more races than happens-before based tools.

27

HAPPENS-BEFORE

Out of two sequential events in the same thread, the earlier one is said to have happened before the later one.

An unlock operation in one thread is said to have happened before a lock operation in a different thread if those operations referred to the same lock.

Detection depends on scheduler-controlled interleaving of events to elicit races, hence a high false negative rate.

https://people.cs.umass.edu/~emery/classes/cmpsci691w-spring2006/scribe/lecture8-scribe.pdf

28

HAPPENS-BEFORE

A pair of events (ei, ej) are related if communication between processes allows information to be transmitted from ei to ej:

If ei and ej are events in the same thread, and ei comes before ej, then i -> j.

If ei is the sending of message g and ej is the reception of g, then i -> j:

ei = SND(g, t1) ∧ ej = RCV(g, t2) ⇒ i -> j

http://www.cs.columbia.edu/~junfeng/10fa-e6998/papers/hybrid.pdf

29

HAPPENS-BEFORE

When a thread t1 writes to a shared memory location, we generate a fresh message g and follow the MEM(m, WRITE, t1) with a SND(g, t1). Each time a thread t2 subsequently reads or writes m, we generate an event RCV(g, t2).

After a thread t1 releases a lock, we generate SND(g, t1), and the next thread t2 to acquire the lock first generates RCV(g, t2).


30

HAPPENS-BEFORE

We say that a potential race has occurred if we observe two distinct events ei and ej that access the same memory location, where at least one event is a write, and neither i happens-before j nor j happens-before i.

IsPotentialHBRace(i, j) = ei = MEM(mi, ai, ti) ∧ ej = MEM(mj, aj, tj) ∧ ti ≠ tj ∧ mi = mj ∧ (ai = WRITE ∨ aj = WRITE) ∧ ¬(i -> j) ∧ ¬(j -> i)


31

Thread A

X++

lock(L);

y++

unlock(L);

Thread B

lock(L);

y++

unlock(L);

X++

32

Thread A

X++

lock(L);

y++

unlock(L);

Thread B

lock(L);

y++

unlock(L);

X++

Race: neither i happens-before j nor j happens-before i

33

HOW DO WE IMPLEMENT THE IDEA

The happens-before relation can be computed using vector clocks.

Each thread maintains a vector clock indexed by thread IDs.

We also assign a vector clock to each message (shared memory, lock), which captures the vector clock state of the sending thread at the time the message was sent.


34

Each event has a vector clock. For thread i with clock C:

At each "local computation" and "send" event: C[i]++.

When an event sends a message, the vector clock value V is attached to the message.

At each "receive" event: C = max(C, V), then C[i]++.

Event X happens before event Y if VC_X(i) <= VC_Y(i) for every index i (and the two clocks are not equal).
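The vector-clock rules above can be tried on the two-thread X/y example from the earlier slides. The following is a minimal sketch under simplifying assumptions, not the detector from the cited papers: only lock release/acquire is treated as a message, every access is treated as a write, and the names `detect_hb_races`, `thread_a`, and `thread_b` are hypothetical.

```python
def detect_hb_races(threads, trace):
    """trace: list of (thread, op, target), op in {"acquire", "release", "write"}."""
    index = {t: i for i, t in enumerate(threads)}
    clock = {t: [0] * len(threads) for t in threads}
    lock_clock = {}   # lock -> clock attached at its last release (the "message")
    accesses = {}     # variable -> list of (thread, clock snapshot)
    races = []
    for thread, op, target in trace:
        clock[thread][index[thread]] += 1                      # C[i]++
        if op == "acquire" and target in lock_clock:           # "receive"
            clock[thread] = [max(a, b) for a, b in
                             zip(clock[thread], lock_clock[target])]
        elif op == "release":                                  # "send"
            lock_clock[target] = list(clock[thread])
        elif op == "write":
            now = list(clock[thread])
            for other, then in accesses.get(target, []):
                ordered = all(a <= b for a, b in zip(then, now))
                if other != thread and not ordered:
                    races.append((target, other, thread))
            accesses.setdefault(target, []).append((thread, now))
    return races

thread_a = [("A", "write", "X"), ("A", "acquire", "L"),
            ("A", "write", "y"), ("A", "release", "L")]
thread_b = [("B", "acquire", "L"), ("B", "write", "y"),
            ("B", "release", "L"), ("B", "write", "X")]

a_first = detect_hb_races(["A", "B"], thread_a + thread_b)
b_first = detect_hb_races(["A", "B"], thread_b + thread_a)
print(a_first, b_first)
```

When thread A runs first, its unlock happens before B's lock, so the two X++ accesses appear ordered and nothing is reported; when B runs first, the accesses to X are unordered and the race is found. This is the scheduler dependence that gives happens-before detection its false negatives.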

35

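[Figure: vector-clock timeline for three threads A, B, and C, each starting at 000. Clock values shown on the events: 001, 011, 021, 121, 221, 031. The events stamped 221 and 031 are marked "DATA RACE!" because their clocks are incomparable: component A: 0 < 2, component B: 3 > 2.]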

HYBRID USING BOTH TECHNIQUES

Maintaining vector clocks for every shared memory location and every lock is too expensive in practice.

Detection depends on scheduler-controlled interleaving of events to elicit races - hence a high false negative rate.

Therefore a hybrid race detector combines lockset-based detection with a limited form of happens-before detection.


36

37

USING BOTH TECHNIQUES

A shared variable starts off under the lockset analysis, which is in general cheaper to perform and catches more potential races. If, however, a race condition is detected for that object, the detector switches to a happens-before analysis for subsequent memory accesses in order to verify the validity of that race condition.

We move from a more sound form to a more complete form after race conditions are detected.

38

FOR CONCLUSION

Since there is no guarantee that even a real race condition will reoccur, this only allows the tools to better rank which race conditions are most likely to be bugs.

The results themselves are neither sound nor complete.

The general problem with dynamic detectors of any kind is that you have to actually run your program for them to do any good.

39



Using Model-Checking


40

USING MODEL-CHECKING TO DETECT RACE CONDITIONS

Idea: explore every possible execution path for all possible variable values in order to determine if certain undesirable behaviors might occur.

The naive way to extend model-checking to concurrent programs would be to encode all possible thread interleavings into the model itself.

This would lead to a combinatorial explosion, as all possible control paths would be considered for each of the possible thread interleavings.

41

USING MODEL-CHECKING TO DETECT RACE CONDITIONS

Therefore, the primary challenge of using model-checking as a tool is to find a model of the system that can be explored in a reasonable amount of time.

We discuss the concepts of state-less search and persistent sets as ways to reduce the overall search space.

42

STATE-LESS SEARCH

A form of model-checking in which there is no "memory" of the states that have already been visited: it doesn't keep a history of the states that were visited in the past.

these are non thread-local operations.

43

PERSISTENT SETS / PARTIAL-ORDER

Persistent sets are sets of transitions leaving a state that are independent from every sequence of transitions leaving that state which are not in the persistent set.

The independence of a transition T from a sequence of transitions S implies that the resulting system state will be the same no matter whether T or S is taken first.

Both interleavings are identical and therefore do not need to be explored separately.

Persistent sets can be used as a justification for not exploring all possible interleavings of state transitions.

44

Naive stateless model checking: No. of explored executions = (4+4)!/(4!)^2 = 70

No. of threads = n

No. of steps executed by each thread = k

No. of executions = (nk)! / (k!)^n

https://www.cs.uoregon.edu/research/summerschool/summer06/lectures/Qadeer061.pdf

    int g = 0;

    T1                  T2
    int x = 0;          int y = 0;
    x++; g++;           y++; g++;
    x++; g++;           y++; g++;

45

An access to x by T1 is invisible to T2.

[Diagram: branches labeled "T1: x++" and "T2"; one branch is marked "Unnecessary to explore this transition".]

An access to y by T2 is invisible to T1.

[Diagram: branches labeled "T1" and "T2: y++"; one branch is marked "Unnecessary to explore this transition".]

https://www.cs.uoregon.edu/research/summerschool/summer06/lectures/Qadeer061.pdf

46

    int g = 0;

    T1                  T2
    int x = 0;          int y = 0;
    x++; g++;           y++; g++;
    x++; g++;           y++; g++;

Without partial-order reduction: No. of explored executions = (4+4)!/(4!)^2 = 70

With partial-order reduction: No. of explored executions = (2+2)!/(2!)^2 = 6

https://www.cs.uoregon.edu/research/summerschool/summer06/lectures/Qadeer061.pdf
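The two counts can be reproduced from the formula (nk)! / (k!)^n and cross-checked by brute force. This is a small sketch for the arithmetic only; the function names are hypothetical, and the brute-force count simply enumerates every order in which the remaining steps can be scheduled.

```python
from math import factorial

def count_by_formula(n, k):
    """Number of interleavings of n threads with k steps each: (nk)! / (k!)^n."""
    return factorial(n * k) // factorial(k) ** n

def count_by_enumeration(remaining):
    """Count schedules by picking, at every point, any thread with steps left."""
    if not any(remaining):
        return 1   # all threads finished: one complete execution
    total = 0
    for i, steps in enumerate(remaining):
        if steps:
            total += count_by_enumeration(
                remaining[:i] + (steps - 1,) + remaining[i + 1:])
    return total

naive = count_by_formula(2, 4)      # every statement is a scheduling point
reduced = count_by_formula(2, 2)    # only the two g++ steps per thread remain visible
print(naive, reduced, count_by_enumeration((4, 4)), count_by_enumeration((2, 2)))
```

Treating the thread-local accesses to x and y as invisible halves the number of scheduling points per thread, which is where the drop from 70 to 6 comes from.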

47

There are a couple of algorithms that use the model:

Stoller - uses the lockset algorithm on the state machine.

Henzinger - searches the state space for a state where, among multiple outgoing transitions, there exist reads and writes by different threads to the same variable.

48

STOLLER

Stoller uses the lockset algorithm as previously discussed in section 3, since the model is essentially simulating all possible executions of a program. False positives will occur if a strict locking discipline is not observed. On the bright side, his model includes an initialization phase where variables are as yet unshared and therefore are not required to be protected by a lock.

49

HENZINGER

The system has one abstract reachability graph (ARG) for the “main” thread which keeps track of local and global variables. The context represents every other thread in the system

Each state in the context has a counter that keeps track of the number of threads in each abstract state

local variables are not tracked on this graph

50

1. Initially the context is empty and there are no predicates.

2. Using the current context and predicates, construct an ARG from the combination of the control flow automaton and the current context.

3. Stop when an error state is found. An error state is one from which the same variable can be read along one transition and written along another.

51

4. The system determines if the path taken was actually feasible. If it was, an error is signaled; if not, new predicates are inferred.

5. If an error was signaled in the previous step, we must now guarantee that the ACFA (abstract control flow automaton) was in fact a sound approximation. If it was, we have a genuine example of a race condition. If not, then we refine the context, yielding a more accurate approximation, and then rerun the process.

52

PROS

The technique is sound as long as it terminates.

seems to be much more precise than other techniques we have explored here.

instead of detecting violations of the locking discipline that can be used to prevent race conditions here we are actually detecting the race conditions themselves.

53

CONS

The System as described only works for basic data types

The system could not tell if two pointers referenced the same object, and therefore could not tell if a race was occurring or not

54



Using Model-Checking to Detect Race Conditions


55

FLOW-BASED RACE ANALYSIS STATIC ANALYSIS

RacerX seems to be the best prepared to be run in an actual software development environment. RacerX involves five phases:

1. Retargeting it to system-specific locking functions

2. Extracting a control flow graph from the system

3. Analysis

4. Ranking errors

5. Inspection

http://web.stanford.edu/~engler/racerx-sosp03.pdf

56

1. Retargeting to system-specific locking functions

Users supply a table specifying the functions used to acquire/release locks and disable/enable interrupts.

Users may optionally specify that a function is single-threaded, multi-threaded, or an interrupt handler.

2. Extracting a control flow graph from the system

The tool extracts a CFG from the system and stores it in a file.

The CFG contains all function calls, uses of global variables, uses of parameter pointer variables, and optionally uses of all local variables and concurrency operations.

The CFG includes the symbolic information for these objects, such as their names, types, whether an access is a read or a write, whether a variable is a parameter or not, whether a function or variable is static or not, the line number, etc.


57

3. Analysis

The tool reads the emitted CFG and constructs a linked whole-system CFG. It then traverses the whole-system CFG, checking for data races.

The traversal is DFS and flow-sensitive, and it tracks the set of locks held at any point.

At each program statement, the race checker is passed the current lockset.

4. Ranking errors

Compute ranking information for the error messages. Ranking sorts error messages based on two features: the likelihood of being a false positive, and the difficulty of inspection.

5. Inspection

Present the ranked error messages to users.
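The analysis phase (a depth-first, flow-sensitive traversal that carries the current lockset) can be sketched on a toy control flow graph. This is a heavily simplified illustration and not RacerX's implementation: the CFG encoding, the node names, and the rule "warn on any access made with an empty lockset" are assumptions made for the example, and ranking is omitted.

```python
def find_unprotected_accesses(cfg, entry):
    """cfg maps node -> (op, arg, successors); op in lock/unlock/access/other."""
    report = set()
    visited = set()                    # (node, lockset) pairs already explored
    stack = [(entry, frozenset())]     # explicit stack gives a depth-first order
    while stack:
        node, held = stack.pop()
        if (node, held) in visited:
            continue
        visited.add((node, held))
        op, arg, successors = cfg[node]
        if op == "lock":
            held = held | {arg}
        elif op == "unlock":
            held = held - {arg}
        elif op == "access" and not held:
            report.add((node, arg))    # shared variable touched with no lock held
        for nxt in successors:
            stack.append((nxt, held))  # the lockset flows along each edge
    return sorted(report)

# Toy CFG (hypothetical): one branch takes lock L before the access, the other does not.
cfg = {
    "n0": ("branch", None, ["n1", "n2"]),
    "n1": ("lock", "L", ["n3"]),
    "n2": ("nop", None, ["n3"]),
    "n3": ("access", "counter", ["n4"]),
    "n4": ("unlock", "L", []),
}
report = find_unprotected_accesses(cfg, "n0")
print(report)
```

Keying the visited set on the pair (node, lockset) is what makes the sketch flow-sensitive: node n3 is examined once per distinct lockset that can reach it, so the unlocked path through n2 is not hidden by the locked path through n1.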


58

Ease of Use, Annotations: None, or a constant number that gives immediate precision improvements.

Expression: Non-lock-based idioms are 'hard-coded' by heuristics.

Scalability: More than any other. Run on Linux, FreeBSD, and a large commercial system.

59

Soundness: Not sound in a few specific ways. Ability to detect some false negatives.

Precision: Fewer false positives than traditional lockset tools.

60

OTHER FLOW-BASED TOOLS

Some rely on alias analysis:

Seem to be fundamentally hard problems to solve.

Still many false positives.

May not scale.

Some rely on programmer annotations to distinguish all the hard cases:

May impose a programmer burden.

61

LET'S COMPARE ALL THE TECHNIQUES

62

ANNOTATIONS

Race-Free Type Systems: Annotations are a major limiting factor.

Dynamic Tools: Unnecessary.

Model-Checking: Unnecessary.

Flow-Based Analysis: Necessary in some form or another.

63

EXPRESSION

Race-Free Type Systems: Limited to strict locking discipline.

Dynamic Tools: Thanks to the combination of lockset and happens-before, relative freedom.

Model-Checking: Can allow great expression (depends on technology).

Flow-Based Analysis: Expression can be traded for soundness or annotations.

64

SCALABILITY

Race-Free Type Systems: Scalability limited by annotations.

Dynamic Tools: Work just as well on large applications as on small. However, there is a memory and performance overhead associated with dynamic detectors.

Model-Checking: Not extremely scalable. Depends highly on the number of processes.

Flow-Based Analysis: Has shown the best scalability.

65

SOUNDNESS

Race-Free Type Systems: Sound.

Dynamic Tools: Fundamentally unsound, but lockset will catch most possible races in an execution.

Model-Checking: Also sound. May not terminate.

Flow-Based Analysis: Different techniques trade soundness for precision.

66

PRECISION

Race-Free Type Systems: Low precision. Produce numerous false positives.

Dynamic Tools: Better precision.

Model-Checking: Can be very high. Not complete, but due to the undecidability of determining reachability, it may never terminate.

Flow-Based Analysis: High precision using an engineering approach.

67

AND
