Scalable and Precise Dynamic Datarace Detection for Structured Parallelism
Presented by Yunming ZhangRice University
Paper by Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin Vechev, Eran Yahav
Tuesday, October 22, 13
Background
• Problems with existing dynamic datarace detectors:
• they run sequentially
• large space overhead
• costly read-write checks
Goals
• Design a datarace detector
• works in parallel
• constant space per monitored memory location
• precise and sound
• no false positives for a given input
• if one or more races exist, at least one will be reported
Structured Parallelism
• Express a wide range of parallel programs succinctly with a few parallel constructs
• Exploit structured parallelism for better performance through scheduling
• Provide guarantee of deadlock-freedom
Structured Parallelism
• Cilk
• spawn/sync model
• requires children to synchronize with the parent
• Habanero Java
• async/finish model
• requires descendants to synchronize with some ancestor
Structured Parallelism

ReducerList<Node *> output_list;

void walk(Node *x) {
  if (x) {
    if (has_property(x))
      output_list.push_back(x);
    cilk_spawn walk(x->left);
    walk(x->right);
    cilk_sync;
  }
}

int main() {
  Node *node = /* initialize node */;
  walk(node);
}

[Figure: binary tree with internal nodes a–d; numbers 1–9 mark the traversal order]
Structured Parallelism

ReducerList<Node *> output_list;

void walk(Node *x) {
  if (x) {
    if (has_property(x))
      output_list.push_back(x);
    async walk(x->left);
    async walk(x->right);
  }
}

int main() {
  Node *node = /* initialize node */;
  finish { walk(node); }
}

[Figure: the same tree traversal]
Race Detector Components
• Data structure to determine whether two tasks may execute in parallel
• Cilk uses SP bags
• This paper: Dynamic Program Structure Tree
• Data structure that keeps track of accesses to the same memory location that may result in data race
• Cilk records one read and one write
Dynamic Program Structure Tree (DPST)
• Node types
• async
• finish
• step: a maximal sequence of statements without any async or finish operation
• Edges connect parent and child tasks
• Children are ordered from left to right
Dynamic Program Structure Tree

finish {
  async {
    int i = 0;
    int j = 1;
    int k = i*j;
    ...
  }
}

[Figure: DPST with a finish node, an async node, and a step node]
Dynamic Program Structure Tree

finish { //F1
  step1
  async { //A1
    step2
    async { //A2
      step3
    }
    step4
  }
  step5
  async { //A3
    step6
  }
} //finish F1

[DPST so far: a single root node F1]
Dynamic Program Structure Tree

The DPST grows incrementally as each step and async in the program above executes: first S1 is added under F1, then A1 with S2, then A2 with S3, then S4, S5, and finally A3 with S6. Final tree: F1's children, left to right, are S1, A1, S5, A3; A1's children are S2, A2, S4; A2's child is S3; A3's child is S6.
Dynamic May Happen in Parallel (DMHP)
• Need to determine, using the DPST, whether two tasks can execute in parallel
• Theorem 1. Consider two leaf nodes (steps) S1 and S2 in a DPST, where S1 ≠ S2 and S1 is to the left of S2. S1 and S2 can execute in parallel if and only if the ancestor of S1 that is the child of LCA(S1, S2) is an async node.
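Theorem 1 translates directly into a tree walk. The sketch below is a minimal, hypothetical Python model of a DPST; the node class and function names are illustrative, not the paper's implementation.

```python
# Minimal DPST model: each node records its kind and parent (hypothetical sketch).
class Node:
    def __init__(self, kind, parent=None):
        self.kind = kind          # "finish", "async", or "step"
        self.parent = parent

def dmhp(s1, s2):
    """Theorem 1: steps s1 and s2 (s1 to the left of s2) may execute in
    parallel iff s1's ancestor that is a child of LCA(s1, s2) is async."""
    # collect s2's ancestors, then climb from s1 until we reach the LCA
    up2 = set()
    n = s2
    while n is not None:
        up2.add(n)
        n = n.parent
    child, n = None, s1
    while n not in up2:
        child, n = n, n.parent
    # n is now LCA(s1, s2); child is s1's ancestor just below it
    return child is not None and child.kind == "async"
```

On the running example, dmhp(S3, S4) is true because the child of LCA(S3, S4) = A1 on S3's path is A2, an async node, while dmhp(S2, S3) is false because the corresponding child is S2 itself, a step.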
Can happen in parallel

[Figure: the example DPST]

Lowest Common Ancestor LCA(S3, S4): A1
Ancestor of S3 that is a child of LCA(S3, S4): A2, an async node ⇒ S3 and S4 can execute in parallel
Cannot happen in parallel

[Figure: the example DPST]

Lowest Common Ancestor LCA(S2, S3): A1
Ancestor of S2 that is a child of LCA(S2, S3): S2 itself, a step node ⇒ S2 and S3 cannot execute in parallel
Race Detector Components (recap)
Access Summary
• The detector maintains an access summary in “shadow memory”
• for each data word m, an access summary S(m) is maintained
• S(m) contains
• w: a reference to a step that wrote m
• r1: a reference to a step that read m
• r2: a reference to another step that read m
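In code, the shadow memory can be pictured as a map from each monitored word to one constant-size entry. A minimal Python sketch follows; the field names match the slide, but the map-based layout is an assumption, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class AccessSummary:
    """S(m): constant-size shadow entry for one data word m."""
    w: Optional[object] = None    # a step that wrote m
    r1: Optional[object] = None   # a step that read m
    r2: Optional[object] = None   # another step that read m

# shadow memory: one fixed-size entry per monitored word
shadow: Dict[int, AccessSummary] = {}

def S(addr: int) -> AccessSummary:
    """Fetch (or lazily create) the summary for address addr."""
    return shadow.setdefault(addr, AccessSummary())
```

Because the entry holds exactly three references no matter how many accesses occur, space per monitored location stays constant, which is the stated design goal.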
Access Summary
• Storing the last write is straightforward
• Among the reads (a1, a2, ..., ak) to a memory location, it is only necessary to store two
• reads ai, aj such that the subtree under LCA(ai, aj) includes all past reads
• any future read an that is in parallel with any prior read will be in parallel with at least one of ai or aj
Race Detection Algorithm
If replacing one of the reads with the current step yields a higher LCA in the DPST, then do so.
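This update rule can be sketched end to end. The following hypothetical Python sketch combines a toy DPST, a DMHP check, and read/write handlers over the three-reference summary; all names are illustrative and this is not the paper's code.

```python
# Toy DPST plus access-summary updates (hypothetical sketch).
class Node:
    def __init__(self, kind, parent=None):
        self.kind, self.parent = kind, parent

def _chain(n):
    """Path from n to the root, inclusive."""
    path = []
    while n is not None:
        path.append(n)
        n = n.parent
    return path

def lca(a, b):
    up = set(_chain(b))
    while a not in up:
        a = a.parent
    return a

def depth(n):
    return len(_chain(n))

def dmhp(s1, s2):
    """May-happen-in-parallel per Theorem 1 (s1 recorded before s2)."""
    l = lca(s1, s2)
    child = s1
    while child is not l and child.parent is not l:
        child = child.parent
    return child is not l and child.kind == "async"

class Summary:
    """Per-location shadow entry: one writer, two readers."""
    def __init__(self):
        self.w = self.r1 = self.r2 = None

def on_write(m, step):
    """Returns True iff this write races with a recorded access."""
    race = any(prev is not None and dmhp(prev, step)
               for prev in (m.w, m.r1, m.r2))
    m.w = step
    return race

def on_read(m, step):
    """Returns True iff this read races with the recorded write, then
    applies the rule: keep the pair of reads with the highest LCA."""
    race = m.w is not None and dmhp(m.w, step)
    if m.r1 is None:
        m.r1 = step
    elif m.r2 is None:
        m.r2 = step
    else:
        cur = depth(lca(m.r1, m.r2))      # higher LCA = smaller depth
        if depth(lca(step, m.r1)) < cur:
            m.r2 = step
        elif depth(lca(step, m.r2)) < cur:
            m.r1 = step
    return race
```

On the "Why two reads" example that follows, after the reads in S1 and S2 the summary holds (S1, S2); the read in S3 raises the LCA from A1 to F1, so S3 replaces the second read, matching the slides.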
Why two reads

int[] x = new int[2];
x[0] = 0;
finish { //F1
  async { //A1
    async { //A2
      ... = x[0]; //S1
    }
    ... = x[0]; //S2
  }
  async { //A3
    ... = x[0]; //S3
  }
} //F1

[DPST: F1's children are A1 and A3; A1's children are A2 and S2; A2's child is S1; A3's child is S3]
Why two reads

After the reads in S1 and S2: M.r1 = S1, M.r2 = S2
Why two reads

If replacing one of the reads with the current step yields a higher LCA in the DPST, then do so.
Summary state: M.r1 = S1, M.r2 = S2; current step S = S3
Why two reads

LCA(S3, S1) is higher in the DPST than LCA(S1, S2), and LCA(S3, S2) is also higher than LCA(S1, S2), so S3 replaces one of the recorded reads: M.r1 = S1, M.r2 = S3
Keys to Performance
• Construct the DPST in parallel
• Relax atomicity when updating the access summary
Space and Time Complexity
• Definitions:
• T1: sequential execution time
• T∞: critical-path length (execution time with unbounded processors)
• v: total number of variables
• n: number of nodes in the DPST
• Time Complexity
• O(T1 * T∞)
• Space Complexity
• O(v + n)
Precise and Sound
• Precise
• if the algorithm reports a data race on a memory location M during an execution of a program P on a given input, then there exists at least one execution of P on that input in which that data race occurs
• Sound
• if the algorithm does not report a data race during an execution of P on a given input, then no execution of P on that input contains a data race
Performance Evaluation

[Evaluation figures omitted in this transcript]
Conclusions
• Scalable
• parallel data race detection
• trades some parallelism for an efficient DMHP mechanism; results show the tradeoff is worth it
• Modest memory overhead
• stores only three access references per memory location
• by contrast, Eraser maintains large lock sets and FastTrack uses vector clocks
• Precise and sound