8/11/2019 Chapter 7-Matrix Multiplication from the book Parallel Computing by Michael J. Quinn
MATRIX MULTIPLICATION
(Part b)
By: Shahrzad Abedi
Professor: Dr. Haj Seyed Javadi
MATRIX MULTIPLICATION
[Diagram: SIMD vs. MIMD; MIMD machines divide into Multiprocessors and Multicomputers]
Chapter 7: Matrix Multiplication , Parallel Computing :Theory and Practice, Michael J. Quinn 2
Matrix Multiplication Algorithms for Multiprocessors
[Figure: a matrix partitioned among processes p1–p4, by rows and by columns]
Matrix Multiplication Algorithm for a UMA Multiprocessor
[Figure: the rows of the matrix assigned to processes p1–p4]
Matrix Multiplication Algorithm for a UMA Multiprocessor
[Figure: C = A × B, with the rows of C and A split between processes p1 and p2]
Example: n = 8, p = 2, so n/p = 4.
Each process must read n/p rows of A, and must read every element of B n/p times.
Matrix Multiplication Algorithms for Multiprocessors
Question: which loop should be made parallel in the sequential matrix multiplication algorithm?
Grain size: the amount of work performed between processor interactions.
Ratio of computation time to communication time: computation time / communication time.
Sequential Matrix Multiplication Algorithm
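The sequential algorithm this slide refers to is the standard triple loop. A minimal Python sketch (the function name and list-of-lists representation are illustrative choices, not from the slides):

```python
def matmul(A, B):
    # Classic sequential algorithm: for each element C[i][j],
    # accumulate the inner product of row i of A and column j of B.
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```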
Matrix Multiplication Algorithms for Multiprocessors
Design strategy: if load balancing is not a problem, maximize grain size.
Question: which loop should be made parallel, i, j, or k?
The k loop has a data dependency (successive iterations accumulate into the same element of C), so it cannot be parallelized.
If j is parallelized: grain size = O(n³/(np)) = O(n²/p).
If i is parallelized: grain size = O(n³/p).
Matrix Multiplication Algorithm for a UMA Multiprocessor: Parallelizing the i Loop
Matrix Multiplication Algorithm for a UMA Multiprocessor
Each process computes n/p rows of C, at Θ(n²) per row: n/p × Θ(n²) = Θ(n³/p).
Synchronization overhead: Θ(p).
Complexity: Θ(n³/p + p).
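A sketch of the parallelized i loop, using Python threads as a stand-in for the p processes on the slide (an illustrative assumption, not Quinn's code); the final join corresponds to the Θ(p) synchronization overhead:

```python
from threading import Thread

def matmul_uma(A, B, p=2):
    # Parallelize the i loop: each of p workers computes n/p rows of C.
    n = len(A)
    C = [[0] * n for _ in range(n)]

    def worker(lo, hi):
        for i in range(lo, hi):
            for j in range(n):
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))

    chunk = n // p                      # assumes p divides n
    threads = [Thread(target=worker, args=(r * chunk, (r + 1) * chunk))
               for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                        # barrier: Theta(p) sync overhead
    return C
```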
Matrix Multiplication in Loosely Coupled Multiprocessors
Some matrix elements may be much easier to access than others.
It is important to keep as many memory references local as possible.
In the previous UMA algorithm, every process must access n/p rows of matrix A and every element of B n/p times.
Only a single addition and a single multiplication occur for every element of B fetched. This is not a good ratio! Implementing this algorithm on NUMA multiprocessors yields poor speedup!
Matrix Multiplication in Loosely Coupled Multiprocessors
Another method must be found to partition the problem.
An attractive method: block matrix multiplication.
Block Matrix Multiplication
A and B are both n × n matrices, with n = 2k.
A and B can each be thought of as a conglomerate of 4 smaller matrices, each of size k × k.
Given this partitioning of A and B into blocks, C is defined as follows:
Ci,j = Ai,1 B1,j + Ai,2 B2,j  (for i, j in {1, 2})
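The block-wise definition above can be sketched in Python; the helper name and the explicit 2 × 2 block layout are illustrative assumptions:

```python
def block_matmul(A, B, k):
    # n x n matrices with n = 2k, viewed as 2 x 2 grids of k x k blocks:
    # C block (I, J) = sum over K of (A block (I, K)) * (B block (K, J)).
    n = len(A)
    C = [[0] * n for _ in range(n)]

    def mul_add(ci, cj, ai, aj, bi, bj):
        # C block at (ci, cj) += A block at (ai, aj) * B block at (bi, bj)
        for i in range(k):
            for j in range(k):
                for t in range(k):
                    C[ci + i][cj + j] += A[ai + i][aj + t] * B[bi + t][bj + j]

    for I in (0, k):            # block row of C
        for J in (0, k):        # block column of C
            for K in (0, k):    # inner block index
                mul_add(I, J, I, K, K, J)
    return C
```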
Block Matrix Multiplication
For example, if there are four processes, then matrix multiplication is done by dividing A and B into blocks of size k × k.
Block Matrix Multiplication
STEP 1: compute Ci,j = Ai,1 B1,j
[Figure: the blocks of A and B assigned to processes P1–P4 in step 1]
Block Matrix Multiplication
STEP 2: compute Ci,j = Ci,j + Ai,2 B2,j
[Figure: the blocks of A and B assigned to processes P1–P4 in step 2]
Block Matrix Multiplication
Each block multiplication requires 2k² memory fetches, k³ additions, and k³ multiplications.
The number of arithmetic operations per memory access has therefore risen from 2, in the previous algorithm, to 2k³/2k² = k.
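A small numeric check of this ratio (the function name is an illustrative choice):

```python
def ops_per_access(k):
    # One k x k block multiplication: k**3 additions + k**3 multiplications
    # against 2*k**2 memory fetches -> k operations per memory access.
    return (k**3 + k**3) / (2 * k**2)
```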
Matrix Multiplication Algorithm for NUMA Multiprocessors
Try to resolve memory contention as much as possible.
Increase the locality of memory references to reduce memory access time.
Design strategy: reduce average memory latency by increasing locality.
Algorithms for Multicomputers: Row-Column Oriented Algorithm
Partition matrix A into rows and B into columns (n is a power of 2, and we are executing the algorithm on an n-processor hypercube).
One imaginable parallelization: parallelize the outer loop (i), so all parallel processes access column 0 of B, then column 1 of B, and so on.
This results in a sequence of broadcast steps, each taking Θ(log n) time on an n-processor hypercube (refer to Chapter 6, p. 170). On a multiprocessor, such excessive contention for the same memory bank is called a hot spot.
Row-Column Oriented Algorithm
Design strategy: eliminate contention for shared resources by changing the temporal order of data accesses.
New solution for a multicomputer: change the order in which the algorithm computes the elements of each row of C.
Processes are organized as a ring. After each process has used its current column block of B, it fetches the next column block of B from its successor on the ring.
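A sequential Python simulation of the ring scheme, assuming p divides n; the names and data layout are illustrative, not Quinn's pseudocode:

```python
def ring_matmul(A, B, p):
    # Process r owns rows r*w .. (r+1)*w - 1 of A and starts with
    # column block r of B; after each iteration every process takes
    # the column block currently held by its successor on the ring.
    n = len(A)
    w = n // p                      # assumes p divides n
    C = [[0] * n for _ in range(n)]
    held = [list(range(r * w, (r + 1) * w)) for r in range(p)]
    for _ in range(p):
        for r in range(p):          # all processes work "in parallel"
            for i in range(r * w, (r + 1) * w):
                for j in held[r]:
                    C[i][j] = sum(A[i][k] * B[k][j] for k in range(n))
        held = held[1:] + held[:1]  # fetch next block from successor
    return C
```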
Row-Column Oriented Algorithm
We embed a ring in a hypercube with dilation 1 using Gray codes, so each message can be sent in time Θ(1).
[Figure: a ring of eight nodes, numbered 0–7, embedded in a 3-dimensional hypercube]
Row-Column Oriented Algorithm
Example: use 4 processes to multiply two 4 × 4 matrices, A and B.
[Figures, four slides: the contents of processes P1–P4 after each of the four ring iterations]
Row-Column Oriented Algorithm
Generalizing the algorithm: multiplying l × m and m × n matrices on p processors, where p
Row-Column Oriented Algorithm
Total communication time. The standard assumption: sending and receiving a message takes the message latency plus the message transmission time times the number of values sent.
λ: message latency
β: message transmission time (per value)
Every iteration has communication time 2(λ + βm(n/p)).
Over p iterations, the total communication time is 2p(λ + βmn/p) = 2(pλ + βmn).
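Assuming the cost model above (λ latency, β per-value transmission time), the totals can be checked numerically; the function is an illustrative sketch:

```python
def ring_comm_time(p, m, n, lam, beta):
    # Each iteration a process sends and receives one m x (n/p) column
    # block of B: 2 * (lam + beta * m * (n/p)).  There are p iterations,
    # so the total is p * 2 * (lam + beta*m*n/p) = 2*p*lam + 2*beta*m*n.
    per_iteration = 2 * (lam + beta * m * (n // p))
    return p * per_iteration
```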
Algorithms for Multicomputers: Block-Oriented Algorithm
We want to maximize the number of multiplications performed per iteration.
Multiply an l × m matrix A by an m × n matrix B (l, m, and n are integer multiples of √p), where p is an even power of 2.
Processors are organized as a two-dimensional mesh with wraparound connections.
Give each processor a subsection of A and a subsection of B.
Block-Oriented Algorithm
The new matrix multiplication algorithm is a corollary of two results shown earlier:
1. Block matrix multiplication is performed analogously to scalar matrix multiplication: each occurrence of a scalar multiplication is replaced by a matrix multiplication.
2. The algorithm previously used on a 2-dimensional mesh of processors with a staggering technique: the same staggering technique is used to position the blocks of A and B, so that every processor multiplies two submatrices in every iteration.
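A sequential Python simulation of this staggering scheme, as a Cannon-style sketch under the stated assumptions (a q × q mesh with q = √p, block shifts of A left and B up each iteration); all names and the block layout are illustrative:

```python
def cannon_matmul(A, B, q):
    # Phase 1 staggers A block (I, J) left by I and B block (I, J) up
    # by J; phase 2 is q multiply-and-shift iterations.
    n = len(A)
    b = n // q                  # block size; assumes q divides n

    def block(M, I, J):
        return [[M[I * b + i][J * b + j] for j in range(b)] for i in range(b)]

    Ab = [[block(A, I, J) for J in range(q)] for I in range(q)]
    Bb = [[block(B, I, J) for J in range(q)] for I in range(q)]
    # Phase 1: staggering.
    Ab = [[Ab[I][(J + I) % q] for J in range(q)] for I in range(q)]
    Bb = [[Bb[(I + J) % q][J] for J in range(q)] for I in range(q)]
    C = [[0] * n for _ in range(n)]
    # Phase 2: every processor multiplies its two resident blocks,
    # then A blocks shift left and B blocks shift up.
    for _ in range(q):
        for I in range(q):
            for J in range(q):
                for i in range(b):
                    for j in range(b):
                        C[I * b + i][J * b + j] += sum(
                            Ab[I][J][i][t] * Bb[I][J][t][j] for t in range(b))
        Ab = [[Ab[I][(J + 1) % q] for J in range(q)] for I in range(q)]
        Bb = [[Bb[(I + 1) % q][J] for J in range(q)] for I in range(q)]
    return C
```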
Block-Oriented Algorithm
Phase 1: staggering the block submatrices of matrix A is done in both directions, left and right.
Block-Oriented Algorithm
Phase 1: staggering the block submatrices of matrix B is done in both directions, up and down.
Block-Oriented Algorithm
[Figure: the positions of the block submatrices of A and B after the staggering phase]
Block-Oriented Algorithm
From the point of view of the processor computing C1,2:
C1,2 = A1,0 B0,2 + A1,1 B1,2 + A1,2 B2,2 + A1,3 B3,2
[Figure panels (1) and (2): the first two multiply-and-shift iterations]
Block-Oriented Algorithm
[Figure panels (3) and (4): the last two multiply-and-shift iterations]
Block-Oriented Algorithm
There are √p iterations in which every processor sends and receives a portion of matrices A and B.
Number of computation steps: Θ(lmn/p).
The staggering and unstaggering phases take √p steps instead of the p − 1 steps in Gentleman's algorithm. How?
Total communication time per iteration for transferring an A block and a B block:
2(λ + β(lm + mn)/p)
The Two Multicomputer Algorithms
Both the block-oriented algorithm and the row-column algorithm have the same number of computation steps: Θ(lmn/p).
When does the second (block-oriented) algorithm require less communication time?
Assume that we are multiplying two n × n matrices, where n is an integer multiple of p.
The Two Multicomputer Algorithms
Thus the block-oriented algorithm is uniformly superior to the row-column algorithm when the number of processors is an even power of 2 greater than or equal to 16.
Questions?