Page 1: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Scalable and Precise Dynamic Datarace Detection for Structured Parallelism

Presented by Yunming Zhang, Rice University

Paper by Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, Eran Yahav

Tuesday, October 22, 13

Page 2: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Background

• Problems with existing dynamic datarace detectors

• they run sequentially

• large space overhead

• costly read-write checks


Page 3: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Goals

• Design a datarace detector

• works in parallel

• constant space per monitored memory location

• precise and sound

• no false positives for a given input

• if one or more races exist, at least one will be reported


Page 4: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Structured Parallelism

• Express a wide range of parallel programs succinctly with a few parallel constructs

• Exploit structured parallelism for better performance through scheduling

• Provide guarantee of deadlock-freedom


Page 5: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Structured Parallelism

• Cilk

• spawn/sync model

• requires child/children to synchronize with the parent

• Habanero Java

• async/finish model

• requires descendants to synchronize with some ancestor


Page 6: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Structured Parallelism

    int main() {
        Node *node = /* initialize node */;
        walk(node);
    }

    ReducerList<Node *> output_list;

    void walk(Node *x) {
        if (x) {
            if (has_property(x))
                output_list.push_back(x);
            cilk_spawn walk(x->left);
            walk(x->right);
            cilk_sync;
        }
    }

[Figure: tree diagram with nodes numbered 1 to 9 and labels a, b, c, d]


Page 7: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Structured Parallelism

    int main() {
        Node *node = /* initialize node */;
        finish {
            walk(node);
        }
    }

    ReducerList<Node *> output_list;

    void walk(Node *x) {
        if (x) {
            if (has_property(x))
                output_list.push_back(x);
            async walk(x->left);
            async walk(x->right);
        }
    }

[Figure: tree diagram with nodes numbered 1 to 9 and label a]


Page 8: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Race Detector Components

• Data structure to determine whether two tasks may execute in parallel

• Cilk uses SP bags

• This paper: Dynamic Program Structure Tree

• Data structure that keeps track of accesses to the same memory location that may result in data race

• Cilk records one read and one write



Page 10: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Dynamic Program Structure Tree (DPST)

• Node types

• async

• finish

• step: a maximal sequence of statements without any async or finish operation

• Edges between parent and child tasks

• Children ordered from left to right
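The node types and left-to-right child ordering listed above can be sketched as a small tree class. This is an illustrative Python sketch only, not the paper's implementation: the names `Node`, `add_child`, and `render`, and the tiny example program, are assumptions introduced here.

```python
class Node:
    """One DPST node. `kind` is 'finish', 'async', or 'step' (names assumed)."""

    def __init__(self, kind, name, parent=None):
        self.kind = kind
        self.name = name
        self.parent = parent
        self.children = []  # kept in left-to-right creation order
        self.depth = 0 if parent is None else parent.depth + 1

    def add_child(self, kind, name):
        """Append a new child on the right, preserving left-to-right order."""
        child = Node(kind, name, parent=self)
        self.children.append(child)
        return child


def render(node, indent=0):
    """Return the tree as indented lines, children listed left to right."""
    lines = ["  " * indent + f"{node.name} ({node.kind})"]
    for child in node.children:
        lines.extend(render(child, indent + 1))
    return lines


# Hypothetical program: finish { stepA; async { stepB }; stepC }
root = Node("finish", "F")
root.add_child("step", "stepA")
task = root.add_child("async", "A")
task.add_child("step", "stepB")
root.add_child("step", "stepC")
print("\n".join(render(root)))
```

Because a node is only ever appended as the rightmost child of its parent, existing nodes never move, which is what makes it plausible to build the tree while the program runs.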


Page 11: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Dynamic Program Structure Tree

    finish {
        async {
            int i = 0;
            int j = 1;
            int k = i * j;
            ...
        }
    }

[Figure: DPST for this program, with a legend for step, async, and finish nodes]


Page 12: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Dynamic Program Structure Tree

Running example (slides 12 to 20 build the DPST one node at a time):

    finish { // F1
        step1
        async { // A1
            step2
            async { // A2
                step3
            }
            step4
        }
        step5
        async { // A3
            step6
        }
    } // finish F1

Nodes appear in this order:

• F1 (root)

• S1, child of F1

• A1, child of F1

• S2, child of A1

• A2, child of A1

• S3, child of A2

• S4, child of A1, and S5, child of F1

• A3, child of F1

• S6, child of A3

Final DPST:

    F1
    ├─ S1
    ├─ A1
    │  ├─ S2
    │  ├─ A2
    │  │  └─ S3
    │  └─ S4
    ├─ S5
    └─ A3
       └─ S6

Page 21: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Dynamic May Happen in Parallel (DMHP)

• Need to determine if two tasks can execute in parallel using the DPST

• Theorem 1. Consider two leaf nodes (steps) S1 and S2 in a DPST, where S1 ≠ S2 and S1 is to the left of S2. S1 and S2 can execute in parallel if and only if the ancestor of S1 that is the child of LCA(S1,S2) is an async node.
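Theorem 1 translates directly into a walk up the tree. The sketch below is a minimal Python illustration under stated assumptions: the `Node` class, its `depth` and `index` fields, and the function name `dmhp` are introduced here and are not taken from the paper. It rebuilds the DPST of the running example and checks two pairs of steps.

```python
class Node:
    """DPST node; `index` is the position among siblings, left to right."""

    def __init__(self, kind, name, parent=None):
        self.kind, self.name, self.parent = kind, name, parent
        self.depth = 0 if parent is None else parent.depth + 1
        self.children = []
        self.index = 0
        if parent is not None:
            self.index = len(parent.children)
            parent.children.append(self)


def dmhp(s1, s2):
    """True if steps s1 and s2 may happen in parallel (Theorem 1)."""
    if s1 is s2:
        return False
    a, b = s1, s2
    # Bring both nodes to the same depth.
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    if a is b:  # one node is an ancestor of the other
        return False
    # Climb together until a and b are both children of the LCA.
    while a.parent is not b.parent:
        a, b = a.parent, b.parent
    # The child of the LCA that lies further left decides the answer.
    left = a if a.index < b.index else b
    return left.kind == "async"


# DPST of the running example.
f1 = Node("finish", "F1")
s1 = Node("step", "S1", f1)
a1 = Node("async", "A1", f1)
s2 = Node("step", "S2", a1)
a2 = Node("async", "A2", a1)
s3 = Node("step", "S3", a2)
s4 = Node("step", "S4", a1)
s5 = Node("step", "S5", f1)
a3 = Node("async", "A3", f1)
s6 = Node("step", "S6", a3)

print(dmhp(s3, s4))  # True: LCA is A1, left child above S3 is async A2
print(dmhp(s2, s3))  # False: LCA is A1, left child is the step S2 itself
```

The cost of one query in this sketch is proportional to the depth of the two steps in the tree, since it only follows parent pointers.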


Page 22: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Can happen in parallel

[Figure: DPST of the running example, shown next to the example program]

• Lowest common ancestor: LCA(S3, S4) = A1

• Ancestor of S3 that is a child of LCA(S3, S4): A2

• A2 is an async node, so S3 and S4 can execute in parallel

Page 24: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Cannot happen in parallel

[Figure: DPST of the running example, shown next to the example program]

• Lowest common ancestor: LCA(S2, S3) = A1

• Ancestor of S2 that is a child of LCA(S2, S3): S2 itself

• S2 is a step node, not an async node, so S2 and S3 cannot execute in parallel

Page 26: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Race Detector Components

• Data structure to determine whether two tasks may execute in parallel

• Cilk uses SP bags

• This paper: Dynamic Program Structure Tree

• Data structure that keeps track of accesses to the same memory location that may result in data race

• Cilk records one read and one write


Page 27: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Access Summary

• Maintains access summary in “shadow memory”

• for each data word m, an access summary is maintained in S(m)

• S(m) contains

• w: a reference to a step that wrote m

• r1: a reference to a step that read m

• r2: a reference to another step that read m
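The access summary can be pictured as a fixed-size record per monitored location. The following Python sketch is illustrative only: the class names `AccessSummary` and `ShadowMemory`, the use of a dictionary as shadow memory, and the use of plain strings as step references are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessSummary:
    """S(m): three step references, however many accesses have occurred."""
    w: Optional[str] = None   # a step that wrote m
    r1: Optional[str] = None  # a step that read m
    r2: Optional[str] = None  # another step that read m


class ShadowMemory:
    """Maps each monitored location to its access summary (dictionary assumed)."""

    def __init__(self):
        self._summaries: Dict[object, AccessSummary] = {}

    def summary(self, location):
        # Create an empty summary the first time a location is touched.
        return self._summaries.setdefault(location, AccessSummary())


shadow = ShadowMemory()
shadow.summary(("x", 0)).w = "S0"
shadow.summary(("x", 0)).r1 = "S1"
print(shadow.summary(("x", 0)))
```

The point of the fixed three fields is that the space per location does not grow with the number of tasks or accesses.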


Page 28: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Access Summary

• Storing the last write is straightforward

• Among the reads a1, a2, ..., ak to a memory location, it is only necessary to store two reads

• ai, aj, such that the subtree under LCA(ai, aj) includes all past reads

• any future read an (n > k) that is in parallel with any prior read will be in parallel with at least one of ai or aj


Page 29: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Race Detection Algorithm


If replacing one of the stored reads with the current step yields a higher LCA in the DPST, then do so
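One way to realize this update rule, together with the checks against the stored write, is sketched below in Python. This is a hedged illustration, not the paper's algorithm listing: the names `Node`, `Summary`, `lca_and_children`, `dmhp`, `read_check`, and `write_check` are assumptions, as are the handling of an empty `r2` slot and the choice to always record the latest write. The demo program mirrors the three-reader example from the "Why two reads" slides, wrapped in an assumed implicit root finish.

```python
class Node:
    def __init__(self, kind, name, parent=None):
        self.kind, self.name, self.parent = kind, name, parent
        self.depth = 0 if parent is None else parent.depth + 1
        self.children = []
        self.index = 0
        if parent is not None:
            self.index = len(parent.children)
            parent.children.append(self)


def lca_and_children(a, b):
    """For two distinct steps, return (LCA, LCA's child above a, child above b)."""
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a.parent is not b.parent:
        a, b = a.parent, b.parent
    return a.parent, a, b


def dmhp(a, b):
    if a is None or b is None or a is b:
        return False
    _, ca, cb = lca_and_children(a, b)
    left = ca if ca.index < cb.index else cb
    return left.kind == "async"


class Summary:
    def __init__(self):
        self.w = self.r1 = self.r2 = None


def write_check(m, s, races):
    for prev, kind in ((m.r1, "read-write"), (m.r2, "read-write"), (m.w, "write-write")):
        if dmhp(prev, s):
            races.append((kind, prev.name, s.name))
    m.w = s  # assumption: always remember the latest writer


def read_check(m, s, races):
    if dmhp(m.w, s):
        races.append(("write-read", m.w.name, s.name))
    p1, p2 = dmhp(m.r1, s), dmhp(m.r2, s)
    if not p1 and not p2:
        m.r1, m.r2 = s, None  # s is ordered with the stored readers
    elif p1 and m.r2 is None:
        m.r2 = s              # assumption: fill the empty second slot
    elif p1 and p2:
        lca12 = lca_and_children(m.r1, m.r2)[0]
        lca1s = lca_and_children(m.r1, s)[0]
        lca2s = lca_and_children(m.r2, s)[0]
        # "Higher" means closer to the root, i.e. smaller depth.
        if lca1s.depth < lca12.depth or lca2s.depth < lca12.depth:
            m.r2 = s


root = Node("finish", "main")       # assumed implicit root finish
s0 = Node("step", "S0", root)       # x[0] = 0
f1 = Node("finish", "F1", root)
a1 = Node("async", "A1", f1)
a2 = Node("async", "A2", a1)
s1 = Node("step", "S1", a2)
s2 = Node("step", "S2", a1)
a3 = Node("async", "A3", f1)
s3 = Node("step", "S3", a3)

m, races = Summary(), []
write_check(m, s0, races)
for step in (s1, s2, s3):
    read_check(m, step, races)
print(m.r1.name, m.r2.name, races)  # S1 S3 []
```

After the three reads the summary holds S1 and S3, whose LCA is F1, so the subtree under that LCA covers every reader seen so far.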


Page 31: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Why two reads

    int[] x = new int[2];
    x[0] = 0;
    finish { // F1
        async { // A1
            async { // A2
                ... = x[0]; // S1
            }
            ... = x[0]; // S2
        }
        async { // A3
            ... = x[0]; // S3
        }
    } // F1

DPST:

    F1
    ├─ A1
    │  ├─ A2
    │  │  └─ S1
    │  └─ S2
    └─ A3
       └─ S3

• After the reads in S1 and S2: M.r1 = S1, M.r2 = S2

• S3 then reads x[0], so the current step is S = S3

• Rule: if replacing one of the stored reads with the current step yields a higher LCA in the DPST, then do so

• LCA(S1, S2) = A1, while LCA(S3, S1) = LCA(S3, S2) = F1

• LCA(S3, S1) >DPST LCA(S1, S2) and LCA(S3, S2) >DPST LCA(S1, S2), so M.r2 = S3

• Result: M.r1 = S1, M.r2 = S3

Page 37: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Keys to Performance

• Construct the DPST in parallel

• Relax Atomicity when updating access summary


Page 38: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Space and Time Complexity

• Definitions:

• T1: sequential execution time

• T∞: critical path length (execution time on an unbounded number of processors)

• v: total number of variables

• n: number of nodes in the DPST

• Time Complexity

• O(T1 · T∞)

• Space Complexity

• O(v + n)


Page 39: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Precise and Sound

• Precise

• if the algorithm reports a data race on a memory location M during an execution of a program P with an input, then there exists at least one execution of P with that input in which a data race on M occurs

• Sound

• if the algorithm does not report a data race on a memory location M during an execution of a program P with an input, then no execution of P with that input has a data race on M


Page 40: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Performance Evaluation


Page 44: Scalable precise-dynamic-datarace-detection-for-structured-parallelism

Conclusions

• Scalable

• parallel data race detection

• gives up the most efficient DMHP mechanism in exchange for parallelism, and results show the trade-off is worth it

• Modest memory overhead

• stores 3 access-history entries per memory location

• by contrast, Eraser maintains large lock sets and FastTrack uses vector clocks

• Precise and sound
