23
MATRIX MULTIPLICATION 4 th week

MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

Embed Size (px)

Citation preview

Page 1: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

MATRIX MULTIPLICATION4th week

Page 2: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION4th week

References Sequential matrix multiplication Algorithms for processor arrays

– Matrix multiplication on 2-D mesh SIMD model– Matrix multiplication on hypercube SIMD model

Matrix multiplication on UMA multiprocessors Matrix multiplication on multicomputers

Page 3: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-3- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

REFERENCES

Parallel Computing: Theory and Practice– Michael J. Quinn, McGraw-Hill

Chapter 7

Page 4: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-4- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

SEQUENTIAL MATRIX MULTIPLICATION

Global a[0..l-1,0..m-1], b[0..m-1][0..n-1], {Matrices to be multiplied} c[0..l-1,0..n-1], {Product matrix} t, {Accumulates dot product}

i, j, k;Begin

for i:=0 to l-1 do for j:=0 to n-1 do t:=0;

for k:=0to m-1 do t:=t+a[i][k]*b[k][j];

endfor k;c[i][j]:=k;

endfor j; endfor i; End.

Page 5: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-5- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

ALGORITHMS FOR PROCESSOR ARRAYS

Matrix multiplication on 2-D mesh SIMD model

Matrix multiplication on Hypercube SIMD model

Page 6: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-6- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL

Gentleman(1978) has shown that multiplication of to n*n matrices on the 2-D mesh SIMD model requires 0(n) routing steps

We will consider a multiplication algorithm on a 2-D mesh SIMD model with wraparound connections

Page 7: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-7- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL (cont’d)

For simplicity, we suppose that– Size of the mesh is n*n– Size of each matrix (A and B) is n*n– Each processor Pi,j in the mesh (located at row

i,column j) contains ai,j and bi,j

At the end of the algorithm, Pi,j will hold the element ci,j of the product matrix

Page 8: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-8- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL (cont’d)

Major phases

(a) Initial distribution of matrices A and B

(b) Staggering all A’s elements in row i to the left by i positions and all B’s elements in col j upwards by i positions

a0,0b

0,0

a0,1b

0,1

a0,2b

0,2

a0,3

b0,3

a1,0b

1,0

a1,1b

1,1

a1,2b

1,2

a1,3

b1,3

a2,0b

2,0

a2,1b

2,1

a2,2b

2,2

a2,3

b2,3

a3,0b

3,0

a3,1b

3,1

a3,2b

3,2

a3,3

b3,3

a0,0b

0,0

a0,1b

1,1

a0,2b

2,2

a0,3

b3,3

a1,1b

1,0

a1,2b

2,1

a1,3b

3,2

a2,2b

2,0

a2,3b

3,1

a3,3b

3,0

a3,2a3,1a3,0

a2,1a2,0

a1,0

b2,3b1,2b0,1

b1,3b0,2

b0,3

Page 9: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-9- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL (cont’d)

(c) Distribution of 2 matrices A and B after staggering in a 2-D mesh with wrapparound connection

a0,0b

0,0

a0,1b

1,1

a0,2b

2,2

a0,3

b3,3

a1,1b

1,0

a1,2b

2,1

a1,3b

3,2

a1,0

b0,3

a2,2b

2,0

a2,3b

3,1

a2,0b

0,2

a2,1

b1,3

a3,3b

3,0

a3,0

b0,1

a3,1

b1,2

a3,2

b2,3

(b) Staggering all A’s elements in row i to the left by i positions and all B’s elements in col j upwards by i positions

a0,0b

0,0

a0,1b

1,1

a0,2b

2,2

a0,3

b3,3

a1,1b

1,0

a1,2b

2,1

a1,3b

3,2

a2,2b

2,0

a2,3b

3,1

a3,3b

3,0

a3,2a3,1a3,0

a2,1a2,0

a1,0

b2,3b1,2b0,1

b1,3b0,2

b0,3 Each processor P(i,j) has a pair of elements to multiply ai,k and bk,j

Page 10: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-10- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL (cont’d)

The rest steps of the algorithm from the viewpoint of processor P(1,2)

a1,3

b3,2

b2,2

b0,2

b1,2

a1,1 a1,2 a1,0

(a) First scalar multiplication step

a1,0

b0,2

b3,2

b1,2

b2,2

a1,2 a1,3 a1,1

(b) Second scalar multiplication step after elements of A are cycled to the left and elements of B are cycled upwards

Page 11: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-11- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL (cont’d)

a1,1

b1,2

b0,2

b2,2

b3,2

a1,3 a1,0 a1,2

(c) Third scalar multiplication step after second cycle step

(d) Third scalar multiplication step after second cycle step. At this point processor P(1,2) has computed the dot product c1,2

a1,2

b2,2

b1,2

b3,2

b0,2

a1,0 a1,1 a1,3

Page 12: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-12- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL (cont’d)

Detailed AlgorithmGlobal n, {Dimension of matrices}

k ;Local a, b, c;Begin

for k:=1 to n-1 do forall P(i,j) where 1 ≤ i,j < n do

if i ≥ k then a:= fromleft(a);

if j ≥ k then b:=fromdown(b); end forall; endfor k; forall P(i,j) where 0 ≤ i,j < n do

c:= a*b; end forall; for k:=1 to n-1 do forall P(i,j) where 0 ≤ i,j < n do

a:= fromleft(a); b:=fromdown(b);c:= c + a*b;

end forall; endfor k;End.

Stagger 2 matrices

a[0..n-1,0..n-1] and b[0..n-1,0..n-1]

Compute dot product

Page 13: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-13- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ON 2-D MESH SIMD MODEL (cont’d)

Can we implement the abovementioned algorithm on a 2-D mesh SIMD model without wrapparound connection?

Page 14: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-14- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ALGORITHM FOR MULTIPROCESSSORS

Design strategy 5– If load balancing is not a problem, maximize grain

size Grain size: the amount of work performed between processor

interactions

Things to be considered– Parallelizing the most outer loop of the sequential

algorithm is a good choice since the attained grain size (0(n3/p)) is the biggest

– Resolving memory contention as much as possible

Page 15: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-15- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ALGORITHM FOR UMA MULTIPROCESSSORS

Algorithm using p processorsGlobal n, {Dimension of matrices}

a[0..n-1,0..n-1], b[0..n-1,0..n-1]; {Two input matrices} c[0..n-1,0..n-1]; {Product matrix}

Local i,j,k,t;Begin forall Pm where 1 ≤ m< ≤ p do

for i:=m to n step p do for j:= 1 to n to

t:=0;for k:=1 to n do t:=t+a[i,k]*b[km,j];

endfor j; c[i][j]:=t; endfor i; end forall;End.

Page 16: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-16- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ALGORITHM FOR NUMA MULTIPROCESSSORS

Things to be considered– Try to resolve memory contention as much as

possible– Increase the locality of memory references to

reduce memory access time Design strategy 6

– Reduce average memory latency time by increasing locality

The block matrix multiplication algorithm is a reasonable choice in this situation – Section 7.3, p.187, Parallel Computing: Theory and Practice

Page 17: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-17- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

MATRIX MULTIPLICATION ALGORITHM FOR MULTICOMPUTERS

We will study 2 algorithms on multicomputers– Row-Column-Oriented Algorithm (in class)– Block-Oriented Algorithm (to be read at home)

Page 18: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-18- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

ROW-COLUMN-ORIENTED ALGORITHM

The processes are organized as a ring– Step 1: Initially, each process is given 1 row of the

matrix A and 1 column of the matrix B– Step 2: Each process uses vector multiplication to

get 1 element of the product matrix C.– Step 3: After a process has used its column of

matrix B, it fetches the next column of B from its successor in the ring

– Step 4: If all rows of B have already been processed, quit. Otherwise, go to step 2

Page 19: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-19- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

ROW-COLUMN-ORIENTED ALGORITHM (cont’d)

Why do we have to organize processes as a ring and make them use B’s rows in turn?

Design strategy 7:– Eliminate contention for shared resourcesby

changing the order of data access

Page 20: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-20- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

ROW-COLUMN-ORIENTED ALGORITHM (cont’d)

A B C

A B C A B C

A B C

Example: Use 4 processes to mutliply two matrices A4*4 and B4*4

1st iteration

Page 21: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-21- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

ROW-COLUMN-ORIENTED ALGORITHM (cont’d)

A B C

A B C A B C

A B C

Example: Use 4 processes to mutliply two matrices A4*4 and B4*4

2nd iteration

Page 22: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-22- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

ROW-COLUMN-ORIENTED ALGORITHM (cont’d)

A B C

A B C A B C

A B C

Example: Use 4 processes to mutliply two matrices A4*4 and B4*4

3rd iteration

Page 23: MATRIX MULTIPLICATION 4 th week. -2- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM MATRIX MULTIPLICATION 4 th week References Sequential matrix

-23- Khoa Coâng Ngheä Thoâng Tin – Ñaïi Hoïc Baùch Khoa Tp.HCM

ROW-COLUMN-ORIENTED ALGORITHM (cont’d)

A B C

A B C A B C

A B C

Example: Use 4 processes to mutliply two matrices A4*4 and B4*4

4th iteration (the last)