Data Parallel Dense and Sparse Linear Algebra using Global Arrays
Serge G. Petiton and Nahid Emad
November 1st, 2018
PARIS-SACLAY, FRANCE
Outline
• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
• Data parallel sparse linear algebra
• Conclusion
Introduction
• The "true" data parallelism is back: the cores no longer share a memory
• NoC (network on chip): distributed and parallel computing, even on a chip
• Runtime systems and communication layers will not be able to optimize everything
• The algorithms have to be data parallel, and communications or stencils have to be "compiled" when possible
Global arrays and data parallelism
• A global array is a "data parallel" variable
• We assume that each element of a global array is stored only in one private memory, apart from the other elements (hypothesis)
• Reduction (prefix) operations on global arrays: associative operations
• Global arrays: let "epc" be the "elements per core" ratio = number of elements of the global array on a physical core (the former "virtual processor ratio" and "virtual geometry")
• Spread "data parallel operation" along a given dimension of a given array
• Scan data parallel operations?
• Spread_with_add, Reduce_with_add along a given dimension (see the sketch after this list)
• Neighbor communications and stencils
• Send/get one_to_one, one_to_all, one_to_many, …
• We have to map those "elements" onto the processors (depending on the communications and the data parallel algorithm)
• We have to align "global arrays"
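A minimal C/MPI sketch of two of these primitives, assuming one element per core (epc = 1) on a 2-D core grid; the deck gives no code here, so the layout and variable names are illustrative, and MPI stands in for the communications an XMP compiler would generate:

    #include <mpi.h>

    /* Sketch: one element of a global array per core (epc = 1),
       mapped on a 2-D core grid built with MPI.  A "spread" along a
       dimension is a broadcast over the matching sub-communicator,
       and "reduce_with_add" is an all-reduce over it. */
    int main(int argc, char **argv) {
        MPI_Comm grid, row_comm, col_comm;
        int dims[2] = {0, 0}, periods[2] = {0, 0}, coords[2];
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Dims_create(nprocs, 2, dims);
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
        MPI_Cart_coords(grid, rank, 2, coords);

        /* Sub-communicators for the two dimensions of the grid. */
        int along_dim2[2] = {0, 1};   /* the cores of one row    */
        int along_dim1[2] = {1, 0};   /* the cores of one column */
        MPI_Cart_sub(grid, along_dim2, &row_comm);
        MPI_Cart_sub(grid, along_dim1, &col_comm);

        /* The local element of the global array. */
        double elem = (double)(coords[0] * dims[1] + coords[1]);

        /* Reduce_with_add along a row: every core of the row gets the
           sum of the row's elements (an associative reduction). */
        double row_sum;
        MPI_Allreduce(&elem, &row_sum, 1, MPI_DOUBLE, MPI_SUM, row_comm);

        /* Spread along dimension 1: the core in row 0 broadcasts its
           element to its whole column of cores. */
        double spread_val = elem;
        MPI_Bcast(&spread_val, 1, MPI_DOUBLE, 0, col_comm);

        MPI_Finalize();
        return 0;
    }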
Outline
• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
  – Matrix-vector multiplication: A(Ax)
  – Gauss elimination
  – Back substitution
  – Gauss-Jordan inversion
  – QR method
• Data parallel sparse linear algebra
• Conclusion
Data parallel A(Ax)
[Figure: the 6×6 matrix A = (a(i,j)) mapped one element per core on a 6×6 core grid; the vector x = (x1, …, x6) is replicated on every row of cores, with x(j) on column j.]
Data parallel multiplication
[Figure: each core (i,j) holds its local product.]

T(i,j) = a(i,j) * x(j)
Second step
Summation (reduction operation): the local products T(i,j) = a(i,j) * x(j) are summed with addition along each row of the core grid.

[Figure: the same 6×6 grid of T(i,j), now reduced along each row.]
A(Ax) or A(Ax+x)+x
[Figure: after the reduction, every core of row i holds w(i) = (Ax)(i); w is replicated along each row.]
A(Ax) or A(Ax+x)+x
w1w1 w1w1 w1w1
w2w2 w2w2 w2w2
w3w3 w3w3 w3w3
w4w4 w4w4 w4w4
w5w5 w5w5 w5w5
w6w6 w6w6 w6w6
Ax = wis now colummapped
XMP workshopNovembre 1, 2018 11
A(Ax) or A(Ax+x)+x
w1w1 w1w1 w1w1
w2w2 w2w2 w2w2
w3w3 w3w3 w3w3
w4w4 w4w4 w4w4
w5w5 w5w5 w5w5
w6w6 w6w6 w6w6
Ax = wis now colummapped
XMP workshopNovembre 1, 2018 12
Spread operation
[Figure: the spread lays w out along the other dimension, so that w(j) is available on every core of column j for the second product A(w).]
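Putting the three steps together, a hedged C/MPI sketch of A(Ax) with one matrix element per core; the deck's own XMP formulation is not preserved, so the process layout and the way the spread is realized (a broadcast from the "diagonal" core of each column) are assumptions:

    #include <math.h>
    #include <mpi.h>

    /* Data parallel A(Ax) with one a(i,j) per core on an n x n grid.
       Run with n*n processes. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int n = (int)(sqrt((double)size) + 0.5);
        if (n * n != size) MPI_Abort(MPI_COMM_WORLD, 1);
        int i = rank / n, j = rank % n;               /* grid coordinates */

        MPI_Comm row_comm, col_comm;
        MPI_Comm_split(MPI_COMM_WORLD, i, j, &row_comm);  /* row i    */
        MPI_Comm_split(MPI_COMM_WORLD, j, i, &col_comm);  /* column j */

        double a = 1.0 / (i + j + 1);   /* local element a(i,j)         */
        double x = (double)(j + 1);     /* x(j), replicated on column j */

        /* Step 1: local products T(i,j) = a(i,j) * x(j). */
        double t = a * x;

        /* Step 2: reduction with addition along each row; every core
           of row i now holds w(i) = sum_j a(i,j) x(j). */
        double w;
        MPI_Allreduce(&t, &w, 1, MPI_DOUBLE, MPI_SUM, row_comm);

        /* Step 3, the spread: core (i,j) needs w(j), which the core of
           column j sitting in row j already holds; it broadcasts along
           the column (ranks in col_comm equal the row index). */
        MPI_Bcast(&w, 1, MPI_DOUBLE, j, col_comm);

        /* Second product: the same two steps give A(Ax). */
        double t2 = a * w, w2;
        MPI_Allreduce(&t2, &w2, 1, MPI_DOUBLE, MPI_SUM, row_comm);

        MPI_Finalize();
        return 0;
    }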
Data parallel Gauss Elimination
[Figures: the successive elimination steps, each using a spread along dimension 1.]
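The step figures are lost in this rendering; below is a serial C sketch (no pivoting, assumed loop bounds, not the deck's code) of the elimination they illustrate, with comments marking where the data parallel version spreads the pivot row and the multipliers over the core grid:

    /* Serial sketch of Gauss elimination on Ax = b (no pivoting).
       Comments mark the communications of the data parallel version,
       where A is mapped element by element on a 2-D core grid. */
    void gauss_eliminate(int n, double a[n][n], double b[n]) {
        for (int k = 0; k < n; k++) {
            /* Spread 1: the pivot row a[k][k..n-1] (and b[k]) is
               spread along dimension 1 to every row of cores. */
            for (int i = k + 1; i < n; i++) {
                /* Spread 2: the multiplier m(i) = a[i][k] / a[k][k]
                   is spread along the other dimension, one per row. */
                double m = a[i][k] / a[k][k];
                /* Triadic (*,+) update: all the cores owning an
                   element of the trailing submatrix fire at once. */
                for (int j = k; j < n; j++)
                    a[i][j] -= m * a[k][j];
                b[i] -= m * b[k];
            }
        }
    }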
Spread number optimization
[Figures: reorganizing the steps reduces the number of spreads along dimension 1.]
XMP: [the XMP code shown on this slide is not preserved in this rendering]
(non) Data parallel back substitution
We have already computed x8, x7, and x6; next we compute x5.
[Figures: the remaining steps; x5 is computed, one unknown at a time.]
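As the title says, back substitution is (mostly) not data parallel: the outer loop carries a true dependence, and only the inner sum over the already-computed unknowns can be done as a parallel reduction. A serial C sketch of the standard formulation (not the deck's code), for the upper triangular system produced by the elimination above:

    /* Back substitution for the upper triangular system Ux = b.
       The k loop is inherently sequential: x[k] depends on every
       x[j] with j > k.  Only the inner sum is a parallel reduction. */
    void back_substitute(int n, double u[n][n], double b[n], double x[n]) {
        for (int k = n - 1; k >= 0; k--) {
            double s = 0.0;
            for (int j = k + 1; j < n; j++)   /* reducible in parallel */
                s += u[k][j] * x[j];
            x[k] = (b[k] - s) / u[k][k];
        }
    }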
Data parallel Gauss-Jordan method
Spread number already optimized
[Figures: the successive Gauss-Jordan steps, each using a spread along dimension 1.]
Triadic (*,+) data parallel operation
XMP: [the XMP code shown on this slide is not preserved in this rendering]
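As a stand-in for the lost XMP code, a serial C sketch of the triadic (*, +) operation, assuming it is the rank-1 multiply-add used by the elimination updates (every core owning an element executes one multiply and one add at the same time):

    /* Triadic (*,+) data parallel operation: one multiply and one add
       per element, here the rank-1 update a(i,j) -= m(i) * p(j) with
       m and p already spread along the two dimensions of the grid. */
    void triadic_update(int n, double a[n][n],
                        const double m[n], const double p[n]) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)   /* fully data parallel */
                a[i][j] -= m[i] * p[j];
    }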
(non) data parallel QR method
Outline
• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
• Data parallel sparse linear algebra
  – Sparse matrix-vector multiplication (iterative/restarted methods)
• Conclusion
[Figure: a sparse matrix stored with three non-zeros per compressed column; the compressed columns pick out values such as x3, x7, x8.]

Each compressed column j is multiplied by x(j):

T[1:3, 1:8] = A[1:3, 1:8] * X[1:3, 1:8]

(X holds x(j) replicated down compressed column j.) We also need to store the row index of each non-zero element: the ELLPACK format.
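A hedged C sketch of this compressed-column product; the 3-by-8 shape matches the slide's array expression, and the array names (acc, row) are invented for illustration:

    #define C 3   /* non-zeros per compressed column */
    #define N 8   /* number of columns               */

    /* Compressed-column product: every compressed column j is
       multiplied by x(j); row[k][j] remembers the original row of
       acc[k][j], as the ELLPACK-style format requires. */
    void sparse_col_multiply(const double acc[C][N], const int row[C][N],
                             const double x[N], double t[C][N]) {
        (void)row;  /* needed later, when t is summed back by rows */
        for (int k = 0; k < C; k++)
            for (int j = 0; j < N; j++)          /* fully data parallel */
                t[k][j] = acc[k][j] * x[j];      /* T = A * X           */
    }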
For the reduction/spread with addition, row compression is best. We then also have to store the column index of each non-zero element.
ELLPACK format
[Figure: the ELLPACK arrays, with the column index stored next to each non-zero element.]
jc: the position of a non-zero element within its compressed row (it is the jc-th non-zero of that row); likewise ic within the compressed column.

Sparse General Pattern (SGP): you need to store (a(i,j), i, j, ic, jc) for each non-zero element, where Acr denotes the non-zero elements of A.

We may use these parameters to change from a column compression to a row compression.
The change of compression maps (ic, j) to (i, jc), or (ic, j) to (jc, i), so as to keep a C-by-N global array.
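A minimal C sketch of the SGP bookkeeping; the struct layout and function are illustrative assumptions, and on a distributed machine the assignment in the loop would be a general send/get one_to_one communication rather than a local store:

    /* Sparse General Pattern (SGP): each non-zero element carries its
       value and four coordinates: (i, j) in A, ic its rank in the
       compressed column, jc its rank in the compressed row. */
    typedef struct {
        double a;      /* non-zero value a(i,j)                   */
        int i, j;      /* row and column in the original matrix   */
        int ic, jc;    /* positions in the two compressed layouts */
    } sgp_elem;

    /* Change from column compression, indexed by (ic, j), to row
       compression, indexed by (i, jc): a pure permutation of the
       elements.  row_comp must be zero-padded beforehand. */
    void sgp_col_to_row(int nnz, const sgp_elem e[nnz],
                        int c, int n, double row_comp[n][c]) {
        for (int k = 0; k < nnz; k++)
            row_comp[e[k].i][e[k].jc] = e[k].a;
    }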
[Figures: the redistribution from the column-compressed to the row-compressed layout, followed by the row-wise additions that complete the product.]
Conclusion
• Global arrays allow data parallel algorithms
• Sparse matrix linear algebra asks for new formats when cores/processors don't share any memory
• The SGP format was the most efficient on past data parallel machines with such properties
• Algorithms have to be developed using XMP and experimented with on the new machines
Recommended reading:
- William Ferng, Serge Petiton, Kesheng Wu, and Yousef Saad. Basic Sparse Matrix Computations on Massively Parallel Computers. In Parallel Processing for Scientific Computing, David Keyes et al., editors, SIAM, 1993.
- Serge Petiton and Nahid Emad. A Data Parallel Scientific Computing Introduction. In The Data Parallel Programming Model, LNCS 1132, pp. 45-64, Springer-Verlag, 1996.