
Scalable and Precise Dynamic Datarace Detection for Structured Parallelism

Presented by Yunming Zhang, Rice University

Paper by Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, Eran Yahav


Background

• Problems with prior dynamic datarace detectors

• detection is sequential

• large space overhead per monitored memory location

• costly read-write checks on every access


Goals

• Design a datarace detector

• works in parallel

• constant space per monitored memory location

• precise and sound

• no false positives for a given input

• if one or more races exist, at least one will be reported


Structured Parallelism

• Express a wide range of parallel programs succinctly with a few parallel constructs

• Exploit structured parallelism for better performance through scheduling

• Provide guarantee of deadlock-freedom


Structured Parallelism

• Cilk

• spawn/sync model

• requires child/children to synchronize with the parent

• Habanero Java

• async/finish model

• requires descendants to synchronize with some ancestor


Structured Parallelism: Cilk example

ReducerList<Node *> output_list;

void walk(Node *x) {
  if (x) {
    if (has_property(x))
      output_list.push_back(x);
    cilk_spawn walk(x->left);
    walk(x->right);
    cilk_sync;
  }
}

int main() {
  Node *node = /* initialize node */;
  walk(node);
}

[Figure: example binary tree with labeled nodes, traversed in parallel by walk()]


Structured Parallelism: Habanero Java example

ReducerList<Node *> output_list;

void walk(Node *x) {
  if (x) {
    if (has_property(x))
      output_list.push_back(x);
    async walk(x->left);
    async walk(x->right);
  }
}

int main() {
  Node *node = /* initialize node */;
  finish { walk(node); }
}

[Figure: the same example tree, traversed with async/finish]


Race Detector Components

• Data structure to determine whether two tasks may execute in parallel

• Cilk uses SP bags

• This paper: Dynamic Program Structure Tree

• Data structure that keeps track of accesses to the same memory location that may result in a data race

• Cilk records one read and one write



Dynamic Program Structure Tree (DPST)

• Node types

• async

• finish

• step: a maximal sequence of statements without any async or finish operation

• Edges between parent and child tasks

• Children ordered from left to right

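To make the node structure concrete, here is a minimal sketch of a DPST node in Java. The class, field, and enum names are illustrative choices, not identifiers from the paper's implementation; a parent pointer, a depth, and a left-to-right sibling index are enough for the LCA-based query used later.

    // Minimal DPST node sketch (illustrative names, not the paper's code).
    enum NodeType { ASYNC, FINISH, STEP }

    final class DpstNode {
        final NodeType type;
        final DpstNode parent;       // null only for the root finish node
        final int depth;             // root has depth 0; used when walking up to an LCA
        final int siblingIndex;      // position among the parent's children, left to right
        private int nextChildIndex;  // counter handed out to this node's own children

        DpstNode(NodeType type, DpstNode parent) {
            this.type = type;
            this.parent = parent;
            this.depth = (parent == null) ? 0 : parent.depth + 1;
            this.siblingIndex = (parent == null) ? 0 : parent.nextChildIndex++;
        }
    }

In this sketch the child counter is unsynchronized under the assumption that all children of a given node are inserted by the single task executing that node's scope, so sibling indices reflect the left-to-right order in which children are added.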

Dynamic Program Structure Tree

finish {
  async {
    int i = 0;
    int j = 1;
    int k = i * j;
    ...
  }
}

[Figure: DPST for this snippet, showing a finish node, an async node, and a step node]


Dynamic Program Structure Tree

finish { //F1
  step1
  async { //A1
    step2
    async { //A2
      step3
    }
    step4
  }
  step5
  async { //A3
    step6
  }
} //finish F1

The DPST is built incrementally as execution proceeds: each finish, async, and step adds a node as a child of the node for its enclosing scope, with children ordered left to right in execution order. For the code above the final DPST is:

F1
 ├─ S1 (step1)
 ├─ A1
 │   ├─ S2 (step2)
 │   ├─ A2
 │   │   └─ S3 (step3)
 │   └─ S4 (step4)
 ├─ S5 (step5)
 └─ A3
     └─ S6 (step6)
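The following sketch shows how the tree above could be built as the program runs, assuming hypothetical instrumentation hooks invoked at each finish, async, and step boundary. The hook names are invented for illustration; the paper instruments the Habanero Java runtime rather than exposing such an API.

    // Hypothetical instrumentation hooks that build the DPST as the program executes;
    // each task passes along the DPST node of its innermost enclosing async/finish scope.
    final class DpstBuilder {
        static DpstNode enterFinish(DpstNode scope) {   // at 'finish {'
            return new DpstNode(NodeType.FINISH, scope);
        }
        static DpstNode spawnAsync(DpstNode scope) {    // at 'async', executed by the parent task
            return new DpstNode(NodeType.ASYNC, scope); // the child task uses this node as its scope
        }
        static DpstNode newStep(DpstNode scope) {       // at the start of each maximal step
            return new DpstNode(NodeType.STEP, scope);
        }
    }

    // e.g., inside a driver method: building the tree shown above,
    // in the order the nodes appear during execution.
    DpstNode f1 = DpstBuilder.enterFinish(null);
    DpstNode s1 = DpstBuilder.newStep(f1);      // step1
    DpstNode a1 = DpstBuilder.spawnAsync(f1);
    DpstNode s2 = DpstBuilder.newStep(a1);      // step2
    DpstNode a2 = DpstBuilder.spawnAsync(a1);
    DpstNode s3 = DpstBuilder.newStep(a2);      // step3
    DpstNode s4 = DpstBuilder.newStep(a1);      // step4
    DpstNode s5 = DpstBuilder.newStep(f1);      // step5
    DpstNode a3 = DpstBuilder.spawnAsync(f1);
    DpstNode s6 = DpstBuilder.newStep(a3);      // step6

Because each child of a given node is inserted by the one task executing that node's scope, insertions by different tasks never target the same parent in this sketch, which is what makes it plausible to construct the DPST in parallel without synchronization (a point the "Keys to Performance" slide returns to).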

Dynamic May Happen in Parallel (DMHP)

• Need to determine, using the DPST, whether two tasks can execute in parallel

• Theorem 1. Consider two leaf nodes (steps) S1 and S2 in a DPST, where S1 ≠ S2 and S1 is to the left of S2. S1 and S2 can execute in parallel if and only if the ancestor of S1 that is the child of LCA(S1, S2) is an async node.

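A direct reading of Theorem 1 gives the following check, sketched in Java against the DpstNode sketch above (the class and method names are mine, not the paper's). The sibling index is used to decide which of the two steps is "to the left" at the LCA.

    // DMHP query per Theorem 1, against the DpstNode sketch above.
    // Find the LCA of the two steps; of the two children of the LCA on the two paths,
    // the one inserted earlier (smaller siblingIndex) belongs to the step that is to
    // the left; the steps may run in parallel iff that child is an async node.
    final class Dmhp {
        static boolean mayHappenInParallel(DpstNode s1, DpstNode s2) {
            if (s1 == s2) return false;
            DpstNode a = s1, b = s2;
            DpstNode childA = a, childB = b;
            while (a.depth > b.depth) { childA = a; a = a.parent; }
            while (b.depth > a.depth) { childB = b; b = b.parent; }
            while (a != b) {                 // climb in lockstep until the LCA is reached
                childA = a; childB = b;
                a = a.parent; b = b.parent;
            }
            // a (== b) is LCA(s1, s2); childA and childB are its children on each path.
            DpstNode leftChild = (childA.siblingIndex < childB.siblingIndex) ? childA : childB;
            return leftChild.type == NodeType.ASYNC;
        }
    }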

Can happen in parallel: S3 and S4

• LCA(S3, S4) = A1

• The ancestor of S3 that is a child of LCA(S3, S4) is A2, an async node

• By Theorem 1, S3 and S4 can execute in parallel



Cannot happen in parallel: S2 and S3

• LCA(S2, S3) = A1

• The ancestor of S2 that is a child of LCA(S2, S3) is S2 itself, a step node (not an async node)

• By Theorem 1, S2 and S3 cannot execute in parallel


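As a quick check, applying the mayHappenInParallel sketch to the nodes built in the earlier construction snippet reproduces both cases above.

    // Continuing the construction snippet: s2, s3, s4 are the DPST nodes for step2..step4.
    boolean p1 = Dmhp.mayHappenInParallel(s3, s4);  // true: S3's ancestor under LCA A1 is A2, an async
    boolean p2 = Dmhp.mayHappenInParallel(s2, s3);  // false: S2's ancestor under LCA A1 is S2 itself, a step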

Race Detector Components (recap)

• The DPST and the DMHP query answer the first question: whether two steps may execute in parallel

• The second component is the data structure that keeps track of accesses to the same memory location that may result in a data race

• Cilk records one read and one write per location; this paper keeps an access summary, described next


Access Summary

• The detector maintains an access summary in “shadow memory”

• for each data word m, an access summary is maintained in S(m)

• S(m) contains

• w: a reference to a step that wrote m

• r1: a reference to a step that read m

• r2: a reference to another step that read m

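A minimal shadow-memory entry matching this description could look like the following; the class and field names are mine, and a real implementation would keep one such entry per monitored word (for example, in a table indexed by address). The fields are left plain because the paper relaxes atomicity on summary updates rather than locking them (see "Keys to Performance" below).

    // One shadow-memory entry S(m) per monitored word m (sketch; illustrative names).
    final class AccessSummary {
        DpstNode w;    // a step that wrote m
        DpstNode r1;   // a step that read m
        DpstNode r2;   // another step that read m
        // Invariant maintained by the read rule: the subtree under LCA(r1, r2)
        // contains every step that has read m so far.
    }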

Access Summary

• The last write is straightforward: a single reference suffices

• Among the reads of a memory location, (a1, a2, ..., ak), it is only necessary to store two reads

• two reads ai, aj such that the subtree under LCA(ai, aj) includes all past reads

• any future access (in particular, a write) that may run in parallel with some prior read will run in parallel with at least one of ai or aj


Race Detection Algorithm

[The algorithm pseudocode shown on these slides is not reproduced in this transcript]

• On each access, the current step is checked against the accesses recorded in the summary using DMHP

• Read-update rule: if replacing one of the recorded reads with my step yields a higher LCA in the DPST, then do so

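Putting the pieces together, here is a hedged sketch of the per-access checks in Java, using the DpstNode, Dmhp, and AccessSummary sketches above. The method names and the exact update order are my own reading of the slides, not the paper's pseudocode; the read path implements the replacement rule quoted above, and the updates are shown without synchronization only to keep the sketch short.

    // Sketch of the per-access race checks against the access summary
    // (illustrative; not the paper's exact pseudocode).
    final class RaceChecker {

        // Called when step s writes monitored word m.
        static void onWrite(AccessSummary m, DpstNode s) {
            if (m.w  != null && Dmhp.mayHappenInParallel(m.w,  s)) report("write-write race");
            if (m.r1 != null && Dmhp.mayHappenInParallel(m.r1, s)) report("read-write race");
            if (m.r2 != null && Dmhp.mayHappenInParallel(m.r2, s)) report("read-write race");
            m.w = s;                         // remember a step that wrote m
        }

        // Called when step s reads monitored word m.
        static void onRead(AccessSummary m, DpstNode s) {
            if (m.w != null && Dmhp.mayHappenInParallel(m.w, s)) report("write-read race");
            if (m.r1 == null) { m.r1 = s; return; }
            if (m.r2 == null) { m.r2 = s; return; }
            // "If replacing one of the reads with my step yields a higher LCA in DPST, then do so."
            DpstNode lca12 = lca(m.r1, m.r2);
            if (lca(m.r1, s).depth < lca12.depth) {
                m.r2 = s;                    // keeping r1 and s gives a higher (shallower) LCA
            } else if (lca(m.r2, s).depth < lca12.depth) {
                m.r1 = s;                    // keeping r2 and s gives a higher LCA
            }
            // otherwise s already lies inside the subtree under LCA(r1, r2)
        }

        private static DpstNode lca(DpstNode a, DpstNode b) {
            while (a.depth > b.depth) a = a.parent;
            while (b.depth > a.depth) b = b.parent;
            while (a != b) { a = a.parent; b = b.parent; }
            return a;
        }

        private static void report(String kind) {
            System.err.println("Potential datarace: " + kind);
        }
    }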

Why two reads

int[] x = new int[2];
x[0] = 0;
finish { //F1
  async { //A1
    async { //A2
      ... = x[0]; //S1
    }
    ... = x[0]; //S2
  }
  async { //A3
    ... = x[0]; //S3
  }
} //F1

[Figure: DPST for this example: F1 has children A1 and A3; A1 has children A2 and S2; A2 has child S1; A3 has child S3]

• After S1 and S2 execute, the access summary for x[0] holds M.r1 = S1 and M.r2 = S2; their LCA is A1

• When S3 executes, LCA(S3, S1) and LCA(S3, S2) are both F1, which is higher in the DPST than LCA(S1, S2) = A1

• Applying the rule "if replacing one of the reads with my step yields a higher LCA in DPST, then do so", the summary is updated to M.r2 = S3

• The subtree under LCA(r1, r2) = F1 now contains all three reads, so any later write that races with any of them also races with r1 or r2
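Running the read path of the RaceChecker sketch on this example reproduces the update described above. The DPST nodes are built with the hypothetical hooks from earlier; the initial write of x[0] before the finish (which cannot race) is omitted for brevity.

    // DPST for this example: F1 -> {A1, A3}; A1 -> {A2, S2}; A2 -> {S1}; A3 -> {S3}.
    DpstNode f1 = DpstBuilder.enterFinish(null);
    DpstNode a1 = DpstBuilder.spawnAsync(f1);
    DpstNode a2 = DpstBuilder.spawnAsync(a1);
    DpstNode rs1 = DpstBuilder.newStep(a2);   // S1: ... = x[0]
    DpstNode rs2 = DpstBuilder.newStep(a1);   // S2: ... = x[0]
    DpstNode a3 = DpstBuilder.spawnAsync(f1);
    DpstNode rs3 = DpstBuilder.newStep(a3);   // S3: ... = x[0]

    AccessSummary x0 = new AccessSummary();   // shadow entry for x[0]
    RaceChecker.onRead(x0, rs1);              // x0.r1 = S1
    RaceChecker.onRead(x0, rs2);              // x0.r2 = S2; LCA(r1, r2) = A1
    RaceChecker.onRead(x0, rs3);              // LCA(S1, S3) = F1 is higher than A1, so x0.r2 = S3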

Keys to Performance

• Construct the DPST in parallel

• Relax atomicity when updating the access summary


Space and Time Complexity

• Definitions:

• T1: sequential execution time

• v: total number of variables

• n: number of nodes in the DPST

• T∞: execution time on an unbounded number of processors (the critical-path length)

• Time Complexity

• O(T1 × T∞)

• Space Complexity

• O(v + n)


Precise and Sound

• Precise

• if the algorithm reports a data race on a memory location M during an execution of a program P on a given input, then there is at least one execution of P on that input in which a data race on M occurs (no false positives)

• Sound

• if the algorithm does not report any data race during an execution of a program P on a given input, then no execution of P on that input contains a data race (no false negatives for that input)


Performance Evaluation

[The evaluation charts shown on these slides are not reproduced in this transcript]

Conclusions

• Scalable

• parallel data race detection

• trades some parallelism for an efficient DMHP mechanism; the results show the trade-off is worth it

• Modest memory overhead

• stores only three access references (one write, two reads) per memory location

• in contrast, Eraser maintains large lock sets and FastTrack uses vector clocks

• Precise and sound
