
Data Parallel Dense and Sparse Linear Algebra using Global Arrays

Serge G. Petiton and Nahid Emad

November 1st, 2018

XMP Workshop, PARIS-SACLAY, FRANCE

Outline

• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
• Data parallel sparse linear algebra
• Conclusion


Introduction


The “true” data parallelism is back: the cores no longer all share a memory.

NoC (network on chip): distributed and parallel computing, even on a chip.

Runtime systems and communication layers will not be able to optimize everything.

The algorithms have to be data parallel, and communications, or stencils, have to be “compiled” whenever possible.

Outline

• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
• Data parallel sparse linear algebra
• Conclusion


Global arrays and data parallelism

• A global array is a “data parallel” variable.
• We assume that each element of a global array is stored in one private memory only, with no other element of the array (hypothesis).
• Reduction (prefix) operations on global arrays: associative operations.
• Let “epc” be the “elements per core” ratio: the number of elements of the global array on a physical core (the former “virtual processor ratio” and “virtual geometry”); see the sketch after this list.
• Spread: a data parallel operation along a given dimension of a given array.
• Scan data parallel operations?
• Spread_with_add and Reduce_with_add along a given dimension.
• Neighbor communications and stencils.
• Send/get: one_to_one, one_to_all, one_to_many, ….
• We have to map those “elements” onto the processors (depending on the communications and on the data parallel algorithm).
• We have to align “global arrays”.
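A minimal XcalableMP-style sketch of these notions, with assumed sizes (N = 1024 elements block-distributed over 4 cores, so epc = 256): directive syntax follows the XMP specification, but this is an illustration, not code from the talk.

/* A global array block-distributed so each element lives in exactly one
   private memory; the sum is an associative reduction with add. */
#include <stdio.h>
#define N 1024

#pragma xmp nodes p(4)
#pragma xmp template t(0:N-1)
#pragma xmp distribute t(block) onto p

double g[N];                      /* the "data parallel" variable */
#pragma xmp align g[i] with t(i)

int main(void) {
  double sum = 0.0;
#pragma xmp loop on t(i) reduction(+:sum)
  for (int i = 0; i < N; i++) {   /* each core touches only its epc = N/4 elements */
    g[i] = 1.0;
    sum += g[i];
  }
#pragma xmp task on p(1)
  printf("sum = %f\n", sum);      /* sum == N after the reduction */
  return 0;
}

Ignoring the directives yields the sequential semantics, which is the usual XMP property; the distribution and the node count here are assumptions.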

Outline

• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
  – Matrix-vector multiplication: A(Ax)
  – Gauss elimination
  – Back substitution
  – Gauss-Jordan inversion
  – QR method
• Data parallel sparse linear algebra
• Conclusion


Data parallel A(Ax)

[Figure: the 6 × 6 global array A = (a_{i,j}) with a copy of the vector x = (x_1, …, x_6) spread on every row, so that x_j is aligned with column j of A.]

Data parallel multiplication

[Figure: the 6 × 6 global array T, computed by one data parallel elementwise multiplication.]

T_{i,j} = a_{i,j} x_j

Second step

[Figure: the same 6 × 6 array T; the second step is a summation (reduction operation with add) along each row.]

T_{i,j} = a_{i,j} x_j, then w_i = Σ_j T_{i,j}
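Putting the two steps together, a plain C sketch of the data parallel matrix-vector product (N = 6 as in the figures; the function name is ours):

/* w = A x in two data parallel steps: one elementwise multiplication
   T[i][j] = A[i][j] * x[j] (x already spread on every row), then one
   reduction with add along dimension 2 to form w[i]. */
#define N 6
void matvec(const double A[N][N], const double x[N], double w[N]) {
  double T[N][N];
  for (int i = 0; i < N; i++)        /* step 1: fully data parallel */
    for (int j = 0; j < N; j++)
      T[i][j] = A[i][j] * x[j];
  for (int i = 0; i < N; i++) {      /* step 2: reduce_with_add     */
    w[i] = 0.0;
    for (int j = 0; j < N; j++)
      w[i] += T[i][j];
  }
}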

A(Ax) or A(Ax+x)+x

[Figure: after the reduction, row i of the array holds w_i in every column, i.e. each column contains a full copy of w.]

Ax = w is now column mapped.


Spread operation

[Figure: the spread operation turns the column copy of w = (w_1, …, w_6) into a copy of w on every row, aligning w_j with column j for the second product.]
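A plain C sketch of this spread (names are ours): the column copy of w becomes a copy of w on every row, ready for the second product in A(Ax).

/* Spread along dimension 1: every row receives a full copy of w, so
   that w[j] is aligned with column j of A for the next multiplication. */
#define N 6
void spread_dim1(const double w[N], double wrep[N][N]) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      wrep[i][j] = w[j];            /* broadcast down each column */
}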


Data parallel Gauss Elimination

[Figures: successive Gauss elimination steps; at each step the pivot row is spread along dimension 1 before the data parallel update of the trailing submatrix.]
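A sequential C sketch of the pattern behind these figures, under the usual assumption of non-zero pivots (no pivoting): at step k the pivot row and the multipliers are spread, then the trailing submatrix is updated by one data parallel operation.

/* Gauss elimination, data parallel view: the two inner loops over i and j
   form one triadic (*,+) update that can run on all cores at once. */
#define N 8
void gauss_eliminate(double A[N][N]) {
  for (int k = 0; k < N - 1; k++) {
    double pivrow[N], mult[N];
    for (int j = k; j < N; j++)          /* spread the pivot row   */
      pivrow[j] = A[k][j];
    for (int i = k + 1; i < N; i++)      /* spread the multipliers */
      mult[i] = A[i][k] / A[k][k];
    for (int i = k + 1; i < N; i++)      /* data parallel update   */
      for (int j = k; j < N; j++)
        A[i][j] -= mult[i] * pivrow[j];
  }
}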

Spread number optimization


XMP:

[Figure: the XMP source for the data parallel Gauss elimination.]
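Since the listing itself is lost here, the following is a hypothetical XcalableMP reconstruction of one elimination step; the directives follow the XMP specification, but the names, the distribution choice, and the structure are all assumptions, not the code shown at the workshop.

/* Rows of A are block-distributed; gmove replicates the pivot row, and
   the xmp loop runs the update only on the cores owning each row. */
#define N 8
#pragma xmp nodes p(4)
#pragma xmp template t(0:N-1)
#pragma xmp distribute t(block) onto p

double A[N][N];
#pragma xmp align A[i][*] with t(i)

void elimination_step(int k) {
  double pivrow[N];                 /* replicated on every node */
#pragma xmp gmove
  pivrow[0:N] = A[k][0:N];          /* broadcast the pivot row  */
#pragma xmp loop on t(i)
  for (int i = k + 1; i < N; i++) {
    double m = A[i][k] / pivrow[k];
    for (int j = k; j < N; j++)
      A[i][j] -= m * pivrow[j];
  }
}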

(non) data parallel back substitution


We have already computed x_8, x_7 and x_6; we then compute x_5.

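Why this is “(non) data parallel”, in a short C sketch (U is the upper triangular matrix produced by the elimination; N = 8 as in the figures):

/* Back substitution: x[i] needs every x[j] with j > i, so the outer loop
   is inherently sequential (x8, then x7, x6, x5, ...); only the inner dot
   product can be done as a data parallel reduce_with_add. */
#define N 8
void back_substitute(const double U[N][N], const double b[N], double x[N]) {
  for (int i = N - 1; i >= 0; i--) {
    double s = b[i];
    for (int j = i + 1; j < N; j++)   /* data parallel reduction */
      s -= U[i][j] * x[j];
    x[i] = s / U[i][i];
  }
}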

Data parallel Gauss-Jordan method


Spread number already optimized


Triadic (*,+) data parallel operation

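A plain C sketch of this triadic (*,+) operation as used by the Gauss-Jordan update; one multiply and one add per element, applied to the whole array at once (names are ours):

/* Triadic (*,+) data parallel operation: every row of A except the
   pivot row k is updated by one multiplication and one addition. */
#define N 8
void triadic_update(double A[N][N], const double mult[N],
                    const double pivrow[N], int k) {
  for (int i = 0; i < N; i++)
    if (i != k)                        /* Gauss-Jordan updates all other rows */
      for (int j = 0; j < N; j++)
        A[i][j] -= mult[i] * pivrow[j];
}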

XMP:

[Figure: the XMP source for the triadic update.]

(non) data parallel QR method

Outline

• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
• Data parallel sparse linear algebra
  – Sparse matrix-vector multiplication (iterative/restarted methods)
• Conclusion


[Figure: a sparse matrix compressed by columns; each compressed column is aligned with the vector element it multiplies (x_3, x_7, x_8, …).]

Each compressed column j is multiplied by x_j:

T[1:3,1:8] = A[1:3,1:8] * X[1:3,1:8]

We also need to store the row of each non-zero element: the ELLPACK format.
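A C sketch of this column-compressed product, with C = 3 non-zeros kept per compressed column and N = 8 columns as on the slide; the array names, the square shape, and the zero padding are assumptions.

/* ELLPACK by columns: Aval[c][j] is the c-th stored non-zero of column j
   and Arow[c][j] its row index; short columns are padded with zero values
   (and any valid row index), so padded products contribute nothing. */
#define C 3
#define N 8
void spmv_col(const double Aval[C][N], const int Arow[C][N],
              const double x[N], double w[N]) {
  double T[C][N];
  for (int c = 0; c < C; c++)          /* data parallel: column j times x[j] */
    for (int j = 0; j < N; j++)
      T[c][j] = Aval[c][j] * x[j];
  for (int i = 0; i < N; i++)
    w[i] = 0.0;
  for (int c = 0; c < C; c++)          /* scatter-add using the row indices  */
    for (int j = 0; j < N; j++)
      w[Arow[c][j]] += T[c][j];
}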


For the reduction/spread with addition, the best choice is row compression.

[Figure: the non-zeros of each compressed row summed with add.]

We also have to store the column of each non-zero element.
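The row-compressed counterpart in the same C sketch style: the summation becomes a reduction with add along the compressed dimension, which is the data parallel-friendly direction (again, names and padding are assumptions).

/* ELLPACK by rows: Aval[i][c] is the c-th stored non-zero of row i and
   Acol[i][c] its column index; padded entries carry a zero value. */
#define C 3
#define N 8
void spmv_row(const double Aval[N][C], const int Acol[N][C],
              const double x[N], double w[N]) {
  for (int i = 0; i < N; i++) {
    double s = 0.0;
    for (int c = 0; c < C; c++)        /* reduce_with_add along dimension 2 */
      s += Aval[i][c] * x[Acol[i][c]];
    w[i] = s;
  }
}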


ELLPACK format

[Figure: ELLPACK storage, with the column index kept for each non-zero value.]

jc: the non-zero is the jc-th element of its compressed row.

Sparse General Pattern (SGP): for each non-zero you need to have (a_{i,j}, i, j, ic, jc).

Acr = the non-zero elements of A.

We may use these parameters to change from a column compression to a row one.

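One SGP entry written as a C struct, following the tuple on the slide; the field names are ours.

/* Sparse General Pattern: for each non-zero of A keep its value, its
   coordinates (i, j), and its ranks ic and jc in the compressed column
   and compressed row, so either compression can be rebuilt. */
typedef struct {
  double a;     /* the non-zero value a(i,j)         */
  int    i, j;  /* row and column in the full matrix */
  int    ic;    /* rank within its compressed column */
  int    jc;    /* rank within its compressed row    */
} sgp_entry;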


From (ic, j) to (i, jc), or from (ic, j) to (jc, i), to keep a C-by-N global array; see the sketch below.
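A C sketch of that permutation, assuming for simplicity exactly C non-zeros per row and per column so that every slot is valid: the entry stored at (ic, j) in the column-compressed array moves to (jc, i) in the row-compressed one, and both stay C-by-N global arrays.

/* Column compression (value + row index i + rank jc) to row compression
   (value + column index j): a pure data parallel permutation. */
#define C 3
#define N 8
void col_to_row(const double Aval[C][N], const int Ai[C][N],
                const int Ajc[C][N],
                double Bval[C][N], int Bj[C][N]) {
  for (int ic = 0; ic < C; ic++)
    for (int j = 0; j < N; j++) {
      int i  = Ai[ic][j];             /* row of this non-zero           */
      int jc = Ajc[ic][j];            /* its rank in the compressed row */
      Bval[jc][i] = Aval[ic][j];      /* (ic, j) -> (jc, i)             */
      Bj[jc][i]   = j;                /* keep j for the product step    */
    }
}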



Outline

• Introduction
• Global arrays and data parallelism
• Data parallel dense linear algebra
• Data parallel sparse linear algebra
• Conclusion


Conclusion

• Global arrays allow data parallel algorithms.
• Sparse matrix linear algebra asks for new formats when cores/processors do not share any memory.
• The SGP format was the most efficient on past data parallel machines with such properties.
• Algorithms have to be developed using XMP and experimented with on new machines.


- William Ferng, Serge Petiton, Kesheng Wu, and Yousef Saad. Basic Sparse Matrix Computations on Massively Parallel Computers. In Parallel Processing for Scientific Computing, David Keyes et al., editors, SIAM, 1993.

- Serge Petiton and Nahid Emad. A Data Parallel Scientific Computing Introduction. In The Data Parallel Programming Model, LNCS 1132, pp. 45-64, Springer-Verlag, 1996.
