INSTITUTE OF COMPUTING
TECHNOLOGY
An Adaptive Task Creation Strategy for Work-Stealing Scheduling
Lei Wang, Huimin Cui, Yuelu Duan, Fang Lu, Xiaobing Feng, Pen-Chung Yew
ICT, Chinese Academy of Sciences, China; University of Minnesota, U.S.A.
Forecast
Adaptive task granularity
fine-grained parallelism
tasks
Multi-cores
An adaptive task creation strategy
Work-stealing
Outline
An adaptive task creation strategy
A new data attribute -- taskprivate
Evaluations
Conclusions
Background
Cilk, Cilk++, X10, OpenMP 3.0, TBB, TPL …
Parallel programming languages and libraries that support task-level parallelism
Programmer: divides work into tasks instead of threads
Runtime system: maps and schedules tasks onto physical threads
Key technique: work-stealing scheduling
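To make the work-stealing idea concrete, here is a minimal single-threaded sketch in C of the per-worker deque that such schedulers are commonly built around: the owner pushes and pops at the bottom, while a thief takes the oldest task from the top. The names (ws_deque, ws_push, ws_pop, ws_steal) and the fixed capacity are assumptions made for illustration; this is not the Cilk runtime's data structure, and it has none of the synchronization a real scheduler needs.

```c
#include <assert.h>
#include <stddef.h>

#define DEQUE_CAP 64  /* illustrative fixed capacity, no wrap-around */

/* Hypothetical per-worker task deque; tasks are plain ints here. */
typedef struct {
    int items[DEQUE_CAP];
    size_t top;     /* index of the oldest task; thieves steal here */
    size_t bottom;  /* one past the newest task; the owner works here */
} ws_deque;

void ws_init(ws_deque *d)
{
    assert(d != NULL);
    d->top = 0;
    d->bottom = 0;
}

/* Owner: add a newly created task at the bottom. Returns 0 if full. */
int ws_push(ws_deque *d, int task)
{
    if (d->bottom == DEQUE_CAP)
        return 0;
    d->items[d->bottom++] = task;
    return 1;
}

/* Owner: take the most recently pushed task. Returns 0 if empty. */
int ws_pop(ws_deque *d, int *task)
{
    if (d->bottom == d->top)
        return 0;
    *task = d->items[--d->bottom];
    return 1;
}

/* Thief: take the oldest task from the other end. Returns 0 if empty. */
int ws_steal(ws_deque *d, int *task)
{
    if (d->bottom == d->top)
        return 0;
    *task = d->items[d->top++];
    return 1;
}
```

After pushing tasks 1, 2, 3, a steal returns 1 (the oldest, usually the largest piece of remaining work), while the owner's pops return 3 and then 2.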
Granularity
too fine: scheduling overhead dominates
too coarse: lose potential parallelism, cause starvation
[Figure labels: cut-off = 3, cut-off = 1]
An unbalanced computation tree
P0 – red, P1 – blue, P2 – green, P3 – yellow.
A cut-off strategy
P0 – red, P1 – blue, P2 – green, P3 – yellow
Load imbalance
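A cut-off strategy can be sketched in a few lines of C. The example below is an illustration only, not code from the talk: it walks the recursion tree of a Fibonacci computation and counts how many spawn points would become real tasks (depth below the cut-off) and how many would run as plain function calls. The names fib_cutoff and spawn_stats are hypothetical.

```c
#include <assert.h>

/* Counters for how each spawn point was executed (illustrative only). */
typedef struct {
    long tasks;  /* spawns turned into real, stealable tasks */
    long calls;  /* spawns executed as plain function calls */
} spawn_stats;

/* Fibonacci with a depth-based cut-off: above the cut-off depth every
 * child would be spawned as a task; at or below it, children are
 * ordinary recursive calls. Here everything runs sequentially and we
 * only count what a runtime would have created. */
long fib_cutoff(int n, int depth, int cutoff, spawn_stats *st)
{
    assert(st != 0 && cutoff >= 0);
    if (n < 2)
        return n;
    if (depth < cutoff)
        st->tasks += 2;  /* both children become tasks */
    else
        st->calls += 2;  /* both children are function calls */
    return fib_cutoff(n - 1, depth + 1, cutoff, st)
         + fib_cutoff(n - 2, depth + 1, cutoff, st);
}
```

For fib(10), a cut-off of 1 yields only 2 tasks and a cut-off of 3 yields 14, out of 176 spawn points in total. With so few tasks, and subtrees of unequal size, some workers can run out of work early, which is the load imbalance noted above.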
An adaptive task creation strategy -- AdaptiveTC
A special task
P0 – red, P1 – blue, P2 – green, P3 – yellow
AdaptiveTC
When executing a spawn statement, AdaptiveTC creates one of:
  a task
  a function call (a fake task)
  a special task
Adaptively switching between tasks and fake tasks to get better performance
  Cut-off
  A special task
[Diagram: transitions among the task, the fake task, and the special task (a task to a fake task, a fake task to a task)]
Keeping idle threads busy, good load balancing, improving performance
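The three outcomes of a spawn can be pictured as a small decision function. The sketch below is a guess at the shape of such a policy, not the rule AdaptiveTC actually uses: the slides name a cut-off and a special task but do not spell out the conditions. The names choose_spawn and thief_waiting, and the rule itself, are assumptions made for illustration.

```c
#include <assert.h>

/* The three ways a spawn statement can be executed, as named above. */
typedef enum {
    SPAWN_TASK,         /* a real, stealable task */
    SPAWN_FAKE_TASK,    /* a plain function call */
    SPAWN_SPECIAL_TASK  /* a task created while running fake tasks */
} spawn_kind;

/* Hypothetical policy (an assumption, not the published algorithm):
 *  - above the cut-off depth, create real tasks;
 *  - below it, run fake tasks to avoid task-creation overhead;
 *  - but if some worker is idle, create a special task so that the
 *    idle worker has something to steal. */
spawn_kind choose_spawn(int depth, int cutoff, int thief_waiting)
{
    assert(depth >= 0 && cutoff >= 0);
    if (depth < cutoff)
        return SPAWN_TASK;
    if (thief_waiting)
        return SPAWN_SPECIAL_TASK;
    return SPAWN_FAKE_TASK;
}
```

The point of the third branch is that a fixed cut-off alone can leave threads idle, whereas switching back to task creation on demand keeps them busy.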
N-queen problem
Which Cilk programs are correct?

(1)
cilk int nqueens(int depth, int n, char x[])
{
    ...
    tmpx = (char *)malloc(n * sizeof(char));
    memcpy(tmpx, x, n * sizeof(char));
    sn += spawn nqueens(depth + 1, n, tmpx);
    free(tmpx);
    ...
    sync;
    return sn;
}

(2)
cilk int nqueens(int depth, int n, char x[])
{
    …
    tmpx = (char *)malloc(n * sizeof(char));
    memcpy(tmpx, x, n * sizeof(char));
    sn += spawn nqueens(depth + 1, n, tmpx);
    ...
    sync;
    free(x);
    return sn;
}

(3)
cilk int nqueens(int depth, int n, char x[])
{
    …
    tmpx = Cilk_alloca(n * sizeof(char));
    memcpy(tmpx, x, n * sizeof(char));
    sn += spawn nqueens(depth + 1, n, tmpx);
    …
    sync;
    return sn;
}
A new data attribute -- taskprivate
Workspace copying
  Not easy to program
  Overhead is high
taskprivate
  Introduced for workspace variables

An AdaptiveTC program for nqueens
cilk int nqueens(int depth, int n, char x[])
    taskprivate: (x[]) (n * sizeof(char));
{
    int sn = 0;
    if (depth >= n) {
        sn++;
        return sn;
    }
    for (j = 0; j < n; j++) {
        if (place(depth, j, x)) {
            x[depth] = j;
            sn += spawn nqueens(depth + 1, n, x);
        }
    }
    sync;
    return sn;
}

In a fake task (a function call)
    x[depth] = j;
    sn += nqueens(depth + 1, n, x);

In a task
    x[depth] = j;
    tmpx = Cilk_alloca(n * sizeof(char));
    memcpy(tmpx, x, n * sizeof(char));
    sn += nqueens(depth + 1, n, tmpx);
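The difference between the two expansions can be tried out in plain, sequential C. The sketch below is an illustration, not the code AdaptiveTC generates: the copy flag selects between the "task" path, which copies the workspace before recursing, and the "fake task" path, which reuses x in place. The place helper is a hypothetical stand-in for the one used on the slide, and malloc/free stand in for Cilk_alloca.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: may a queen go in row `depth`, column `col`,
 * given the columns x[0..depth-1] already chosen for earlier rows? */
int place(int depth, int col, const char *x)
{
    for (int i = 0; i < depth; i++) {
        int d = x[i] - col;
        if (d == 0 || d == depth - i || d == i - depth)
            return 0;  /* same column or same diagonal */
    }
    return 1;
}

/* Counts n-queens solutions. copy != 0 mimics the "task" path
 * (private copy of the workspace); copy == 0 mimics the "fake task"
 * path (the workspace is shared with the caller). */
int nqueens(int depth, int n, char *x, int copy)
{
    int sn = 0;
    if (depth >= n)
        return 1;
    for (int j = 0; j < n; j++) {
        if (!place(depth, j, x))
            continue;
        x[depth] = (char)j;
        if (copy) {
            char *tmpx = malloc((size_t)n);
            assert(tmpx != NULL);
            memcpy(tmpx, x, (size_t)n);
            sn += nqueens(depth + 1, n, tmpx, copy);
            free(tmpx);  /* safe here: the child has already returned */
        } else {
            sn += nqueens(depth + 1, n, x, copy);
        }
    }
    return sn;
}
```

Both paths return the same count (92 for n = 8). Sharing is safe in the sequential case because a child only reads entries below its own depth; the copy is needed only when the child may run concurrently with the parent's next loop iteration, which is why skipping it in fake tasks saves work.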
Test system, test cases
8 cores
  2-processor quad-core Intel Xeon E5520 (2.26 GHz, 8 GB memory)
8 test cases
  6 are backtracking search programs.
  2 are divide-and-conquer programs.
Compared systems
  Cilk-5.4.6, Tascell (PPoPP'09), AdaptiveTC
  gcc -O3
Test case 1 -- performance: Nqueen-array(16)
[Figure: speedup vs. number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC]
Execution time (seconds)
System         1 thread   8 threads
C              61         61
Cilk           198        24.57
Cilk-SYNCHED   184        22.41
Tascell        85         14.24
AdaptiveTC     66         8.27
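The timings above translate into speedups with one division each. A minimal sketch in C follows; the helper name speedup is hypothetical, and the choice of baseline is an assumption, since the slide does not state which baseline its plot uses.

```c
#include <assert.h>

/* Speedup of a run relative to a baseline run: how many times faster
 * `seconds` is than `baseline_seconds`. Hypothetical helper. */
double speedup(double baseline_seconds, double seconds)
{
    assert(baseline_seconds > 0.0 && seconds > 0.0);
    return baseline_seconds / seconds;
}
```

With the serial C time as the baseline, AdaptiveTC on 8 threads gives speedup(61, 8.27), about 7.4, and Cilk gives speedup(61, 24.57), about 2.5. With Cilk's 8-thread time as the baseline, AdaptiveTC gives speedup(24.57, 8.27), about 3.0.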
Test case 1 -- analysis
Breakdown of overhead
[Figure: stacked bars (0% to 100%) per system; legend: working, taskprivate variable]
System         Overhead
Tascell        28.7%
Cilk           69.2%
Cilk-SYNCHED   67%
AdaptiveTC     7.9%
The usage of cores with 8 threads
System       Busy    Idle
Tascell      83.3%   16.7%
Cilk         99.9%   0.1%
AdaptiveTC   99.0%   1.0%
Load balanced
Test case 2 -- performance: Nqueen-compute(16)
[Figure: speedup vs. number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC]
Execution time (seconds)
System         1 thread   8 threads
C              554        554
Cilk           669        85
Cilk-SYNCHED   661        88
Tascell        627        114
AdaptiveTC     612        77
Test case 2 -- analysis
Breakdown of overhead
[Figure: stacked bars (0% to 100%) per system; legend: working, taskprivate variable, deque/nested function]
System         Overhead
Tascell        11.7%
Cilk           17.2%
Cilk-SYNCHED   16.2%
AdaptiveTC     9.5%
The usage of cores with 8 threads
System       Busy    Idle
Tascell      79.2%   20.8%
Cilk         99.9%   0.1%
AdaptiveTC   99.1%   0.9%
Load balanced
Experimental results
[Figures: speedup vs. number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC; panels: Sudoku (input_balance tree), Knight's tour(6*6), Strimko, Pentomino(13)]
Experimental results (cont'd)
[Figures: speedup vs. number of threads (1 to 8) for Cilk, Tascell, and AdaptiveTC; panels: Comp(60000), Fib(45)]
Figure: Speedup with 8 threads, baseline is Cilk's execution time
[Bar chart over Nqueen_array(16), Nqueen_compute(16), Strimko, Knight's Tour(6*6), Sudoku (balance_tree), Pentomino(13), Fib(45), Comp(60000), and Average, for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC]
System         Speedup
Cilk           1
Cilk-SYNCHED   1.07
Tascell        1.5
AdaptiveTC     2.24
Conclusions -- AdaptiveTC
An adaptive task creation strategy controls the task granularity.
  Reducing the system overhead
  Achieving good load balancing
A new data attribute, taskprivate, is introduced for workspace variables.
  Improving programmability
  Reducing the cost of workspace copying with an adaptive task creation strategy
Thanks!