
CS 484

Dense Matrix Algorithms

There are two types of matrices: dense (full) and sparse.

We will consider matrices that are dense and square.

Mapping Matrices

How do we partition a matrix for parallel processing?

There are two basic ways: striped partitioning and block partitioning.

Striped Partitioning

[Figure: an 8 x 8 matrix divided among processors P0-P3. Left, block striping: each processor gets contiguous rows (P0 gets rows 0-1, P1 rows 2-3, ...). Right, cyclic striping: rows are dealt round-robin (P0, P1, P2, P3, P0, P1, P2, P3).]

Block Partitioning

[Figure: left, block checkerboard: the matrix is split into contiguous 2D blocks, one per processor. Right, cyclic checkerboard: the processor grid pattern repeats cyclically across the matrix.]

Block vs. Striped Partitioning

Scalability? Striping is limited to n processors; checkerboarding scales to n x n processors.

Complexity? Striping is easy; block partitioning can introduce more dependencies.
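To make the four mappings concrete, here is a minimal Python sketch (an illustration, not from the slides; the function names and the n = 8, p = 4 example are assumptions) of which processor owns which row or block:

import math

# Ownership maps for the four partitionings, assuming p processors
# (p a perfect square for the checkerboard schemes) and an n x n matrix.

def block_striped_owner(row, n, p):
    """Block striping: each processor gets n/p contiguous rows."""
    return row // (n // p)

def cyclic_striped_owner(row, n, p):
    """Cyclic striping: rows are dealt out round-robin."""
    return row % p

def block_checkerboard_owner(i, j, n, p):
    """Block checkerboard: one contiguous (n/sqrt(p)) x (n/sqrt(p)) block each."""
    q = int(math.isqrt(p))          # sqrt(p) processors per dimension
    return (i // (n // q)) * q + (j // (n // q))

def cyclic_checkerboard_owner(i, j, n, p):
    """Cyclic checkerboard: the q x q processor grid tiles the matrix."""
    q = int(math.isqrt(p))
    return (i % q) * q + (j % q)

if __name__ == "__main__":
    n, p = 8, 4
    print([block_striped_owner(r, n, p) for r in range(n)])   # [0, 0, 1, 1, 2, 2, 3, 3]
    print([cyclic_striped_owner(r, n, p) for r in range(n)])  # [0, 1, 2, 3, 0, 1, 2, 3]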

Matrix Multiplication

One Dimensional Decomposition

Each processor "owns" one strip of the matrices (the shaded portion in the original figure). To compute its strip of the result, each processor requires all of A.

T = (P - 1)(t_s + t_w N^2 / P)
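As a sketch of why the 1D scheme needs all of A, the following serial Python simulation (an illustration; the N = 8, P = 4 sizes are assumptions) computes each processor's column strip of C = A * B from its strip of B plus the entire A:

import numpy as np

# Simulate 1D (columnwise) decomposition: processor p owns a column strip
# of B and C, but needs the entire A matrix to form its strip of C = A @ B.
N, P = 8, 4
A = np.random.rand(N, N)
B = np.random.rand(N, N)

strips = []
for p in range(P):
    B_local = B[:, p * N // P : (p + 1) * N // P]  # owned column strip of B
    strips.append(A @ B_local)                     # requires ALL of A

C = np.hstack(strips)
assert np.allclose(C, A @ B)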

Two Dimensional Decomposition

Requires less data per processor, and the algorithm can be performed stepwise.

Fox's algorithm:
Broadcast an A sub-matrix to the other processors in its row.
Compute.
Rotate the B sub-matrix upwards.

Algorithm:
  set B' = B_local
  for j = 0 to sqrt(P) - 1
    in each row i, the [(i + j) mod sqrt(P)]th task broadcasts A' = A_local to the other tasks in the row
    accumulate A' * B'
    send B' to upward neighbor
  done
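A serial Python simulation of Fox's algorithm may help; this is a sketch assuming a q x q grid of blocks held in ordinary arrays (the name fox_multiply is hypothetical), with the row broadcast and upward rotation done in memory:

import numpy as np

# Serial simulation of Fox's algorithm on a q x q process grid.
def fox_multiply(A, B, q):
    n = A.shape[0]
    s = n // q                                   # block size
    blk = lambda M, i, j: M[i*s:(i+1)*s, j*s:(j+1)*s]
    C = np.zeros_like(A)
    Bp = [[blk(B, i, j).copy() for j in range(q)] for i in range(q)]
    for step in range(q):
        for i in range(q):
            k = (i + step) % q                   # task (i, k) "broadcasts" its A block
            Ab = blk(A, i, k)
            for j in range(q):
                C[i*s:(i+1)*s, j*s:(j+1)*s] += Ab @ Bp[i][j]
        # Rotate B blocks upward: row i receives row i+1's blocks.
        Bp = Bp[1:] + Bp[:1]
    return C

A = np.random.rand(8, 8); B = np.random.rand(8, 8)
assert np.allclose(fox_multiply(A, B, 4), A @ B)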

T = (sqrt(P) - 1)(1 + log sqrt(P))(t_s + t_w N^2 / P)

Cannon’s Algorithm

Broadcasting a submatrix to all who need it is costly.Suggestion: Shift both submatrices

T = 2(sqrt(P) - 1)(t_s + t_w N^2 / P)
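A corresponding serial sketch of Cannon's algorithm (again an in-memory simulation; names and sizes are assumptions), with the initial alignment and the per-step left/up shifts done with np.roll:

import numpy as np

# Serial simulation of Cannon's algorithm on a q x q process grid.
def cannon_multiply(A, B, q):
    n = A.shape[0]
    s = n // q
    # View the matrices as q x q grids of s x s blocks: Ab[i, j] is block A_ij.
    Ab = A.reshape(q, s, q, s).swapaxes(1, 2).copy()
    Bb = B.reshape(q, s, q, s).swapaxes(1, 2).copy()
    # Alignment: block A_ij cycles left i positions, block B_ij cycles up j positions.
    for i in range(q):
        Ab[i] = np.roll(Ab[i], -i, axis=0)
    for j in range(q):
        Bb[:, j] = np.roll(Bb[:, j], -j, axis=0)
    C = np.zeros((q, q, s, s))
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                C[i, j] += Ab[i, j] @ Bb[i, j]
        Ab = np.roll(Ab, -1, axis=1)   # every A block shifts one step left
        Bb = np.roll(Bb, -1, axis=0)   # every B block shifts one step up
    return C.swapaxes(1, 2).reshape(n, n)

A = np.random.rand(8, 8); B = np.random.rand(8, 8)
assert np.allclose(cannon_multiply(A, B, 4), A @ B)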

Blocks Need to Be Aligned

[Figure: a 4 x 4 process grid, each process holding blocks Aij and Bij for 0 <= i, j <= 3. Each triangle represents a matrix block; only same-color triangles should be multiplied.]

Rearrange Blocks

[Figure: the same 4 x 4 grid of blocks A00..A33 and B00..B33 after alignment. Block Aij cycles left i positions; block Bij cycles up j positions.]

Consider Process P1,2

[Figure, steps 1-4: after alignment, process P1,2 holds A13 and B32. Step 1 multiplies A13 * B32; as the A blocks shift left and the B blocks shift up, step 2 multiplies A10 * B02, step 3 multiplies A11 * B12, and step 4 multiplies A12 * B22, accumulating C12.]

Complexity Analysis

The algorithm has sqrt(p) iterations.

During each iteration, a process multiplies two (n / sqrt(p)) x (n / sqrt(p)) matrices: Theta(n^3 / p^(3/2)) work per iteration.

Computational complexity: Theta(n^3 / p).

During each iteration, a process sends and receives two blocks of size (n / sqrt(p)) x (n / sqrt(p)).

Communication complexity: Theta(n^2 / sqrt(p)).

Divide and Conquer

A = | App Apq |      B = | Bpp Bpq |
    | Aqp Aqq |          | Bqp Bqq |

P0 = App * Bpp    P1 = Apq * Bqp
P2 = App * Bpq    P3 = Apq * Bqq
P4 = Aqp * Bpp    P5 = Aqq * Bqp
P6 = Aqp * Bpq    P7 = Aqq * Bqq

C = A x B = | P0 + P1   P2 + P3 |
            | P4 + P5   P6 + P7 |
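A minimal recursive sketch of this divide-and-conquer scheme (assuming n is a power of two; in a real parallel version each of the eight sub-products could be assigned to a different processor):

import numpy as np

# Recursive divide-and-conquer block multiply.
def dac_multiply(A, B):
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    App, Apq = A[:h, :h], A[:h, h:]
    Aqp, Aqq = A[h:, :h], A[h:, h:]
    Bpp, Bpq = B[:h, :h], B[:h, h:]
    Bqp, Bqq = B[h:, :h], B[h:, h:]
    # The eight sub-products P0..P7, combined pairwise into the quadrants of C.
    C = np.empty_like(A)
    C[:h, :h] = dac_multiply(App, Bpp) + dac_multiply(Apq, Bqp)  # P0 + P1
    C[:h, h:] = dac_multiply(App, Bpq) + dac_multiply(Apq, Bqq)  # P2 + P3
    C[h:, :h] = dac_multiply(Aqp, Bpp) + dac_multiply(Aqq, Bqp)  # P4 + P5
    C[h:, h:] = dac_multiply(Aqp, Bpq) + dac_multiply(Aqq, Bqq)  # P6 + P7
    return C

A = np.random.rand(8, 8); B = np.random.rand(8, 8)
assert np.allclose(dac_multiply(A, B), A @ B)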

Systems of Linear Equations

A linear equation in n variables has the form

A set of linear equations is called a system. A vector x is a solution of the system iff it satisfies every equation in the system. Many scientific and engineering problems take this form.

a0x0 + a1x1 + … + an-1xn-1 = b

Solving Systems of Equations

Many such systems are large, with thousands of equations and unknowns:

a0,0x0 + a0,1x1 + … + a0,n-1xn-1 = b0
a1,0x0 + a1,1x1 + … + a1,n-1xn-1 = b1
...
an-1,0x0 + an-1,1x1 + … + an-1,n-1xn-1 = bn-1

Solving Systems of Equations

A linear system of equations can be represented in matrix form

| a0,0   a0,1   …  a0,n-1   |  | x0   |   | b0   |
| a1,0   a1,1   …  a1,n-1   |  | x1   | = | b1   |
| ...                       |  | ...  |   | ...  |
| an-1,0 an-1,1 … an-1,n-1  |  | xn-1 |   | bn-1 |

Ax = b

Solving Systems of Equations

Solving a system of linear equations is done in two steps:
Reduce the system to upper-triangular form.
Use back-substitution to find the solution.

These steps are performed on the system in matrix form (Gaussian elimination, etc.).

Solving Systems of Equations

Reduce the system to upper-triangular form, then use back-substitution:

| a0,0  a0,1  …  a0,n-1   |  | x0   |   | b0   |
| 0     a1,1  …  a1,n-1   |  | x1   | = | b1   |
| ...                     |  | ...  |   | ...  |
| 0     0     …  an-1,n-1 |  | xn-1 |   | bn-1 |

Reducing the System

Gaussian elimination systematically eliminates variable x[k] from equations k+1 to n-1 by reducing its coefficients to zero.

This is done by subtracting an appropriate multiple of the kth equation from each of equations k+1 to n-1.

Procedure GaussianElimination(A, b, y)
  for k = 0 to n-1
    /* Division step */
    for j = k+1 to n-1
      A[k,j] = A[k,j] / A[k,k]
    endfor
    y[k] = b[k] / A[k,k]
    A[k,k] = 1
    /* Elimination step */
    for i = k+1 to n-1
      for j = k+1 to n-1
        A[i,j] = A[i,j] - A[i,k] * A[k,j]
      endfor
      b[i] = b[i] - A[i,k] * y[k]
      A[i,k] = 0
    endfor
  endfor
end
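A runnable Python version of the same procedure, plus the back-substitution step, may be useful; this is a sketch assuming no pivoting is needed (A[k,k] is never zero):

import numpy as np

def gaussian_eliminate(A, b):
    """Reduce Ax = b to unit upper-triangular form Ux = y (no pivoting)."""
    A, y = A.astype(float).copy(), b.astype(float).copy()
    n = len(y)
    for k in range(n):
        # Division step: scale row k so the pivot becomes 1.
        A[k, k+1:] /= A[k, k]
        y[k] /= A[k, k]
        A[k, k] = 1.0
        # Elimination step: zero out column k below the pivot.
        for i in range(k + 1, n):
            y[i] -= A[i, k] * y[k]
            A[i, k+1:] -= A[i, k] * A[k, k+1:]
            A[i, k] = 0.0
    return A, y

def back_substitute(U, y):
    """Solve the unit upper-triangular system Ux = y, bottom row first."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = y[i] - U[i, i+1:] @ x[i+1:]
    return x

A = np.array([[4.0, 2.0, 1.0], [2.0, 5.0, 2.0], [1.0, 2.0, 6.0]])
b = np.array([7.0, 9.0, 9.0])
U, y = gaussian_eliminate(A, b)
assert np.allclose(back_substitute(U, y), np.linalg.solve(A, b))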

Parallelizing Gaussian Elim.

Use domain decomposition: rowwise striping.

The division step requires no communication. The elimination step requires a one-to-all broadcast for each equation. No agglomeration; initially map one row to each processor.

Communication Analysis

Consider the algorithm step by step. The division step requires no communication. The elimination step requires a one-to-all broadcast: broadcast only to the other active processors, and broadcast only the active elements.

The final computation requires no communication.

Communication Analysis

One-to-all broadcast: log2 q communications, where q = n - k - 1 is the number of active processors.

Message size: with q active processors, q elements are required.

T = (t_s + t_w q) log2 q

Computation Analysis

Division step: q divisions.

Elimination step: q multiplications and q subtractions.

Assuming equal time per operation --> 3q operations.

Computation Analysis

In each step, the active processor set is reduced by one, resulting in:

CompTime = sum_{k=0}^{n-1} 3(n - k - 1) = 3(0 + 1 + … + (n - 1)) = 3n(n - 1) / 2

Can we do better?

The previous version is synchronous, and parallelism is reduced at each step. Pipeline the algorithm, and run the resulting algorithm on a linear array of processors. Communication is nearest-neighbor. This results in O(n) steps of O(n) operations.

Pipelined Gaussian Elim.

Basic assumption: a processor does not need to wait until all processors have received a value before it proceeds.

Algorithm (a sketch follows below):
If processor p has data for other processors, send the data to processor p+1.
If processor p can do some computation using the data it has, do it.
Otherwise, wait to receive data from processor p-1.
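A minimal sketch of this pipeline (an illustration only), simulating the linear array with one thread per "processor" and FIFO queues as the nearest-neighbor links; it assumes one row per processor and no pivoting:

import numpy as np
from queue import Queue
from threading import Thread

def pipelined_gaussian_elimination(A, b):
    """Each 'processor' owns one row; pivot rows flow down the linear array."""
    n = len(b)
    inbox = [Queue() for _ in range(n)]        # inbox[p]: link from processor p-1

    def processor(p):
        row, rhs = A[p].astype(float), float(b[p])
        for _ in range(p):                     # receive pivot rows 0..p-1 in order
            k, prow, prhs = inbox[p].get()
            if p + 1 < n:                      # forward immediately: the pipeline
                inbox[p + 1].put((k, prow, prhs))
            rhs -= row[k] * prhs               # eliminate x[k] from the owned row
            row = row - row[k] * prow
        row, rhs = row / row[p], rhs / row[p]  # division step (pivot -> 1)
        if p + 1 < n:
            inbox[p + 1].put((p, row, rhs))
        A[p], b[p] = row, rhs

    threads = [Thread(target=processor, args=(p,)) for p in range(n)]
    for t in threads: t.start()
    for t in threads: t.join()

A = np.array([[4.0, 2.0, 1.0], [2.0, 5.0, 2.0], [1.0, 2.0, 6.0]])
b = np.array([7.0, 9.0, 9.0])
pipelined_gaussian_elimination(A, b)
assert np.allclose(A, np.triu(A)) and A[0, 0] == 1.0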

Conclusion

Using a striped partitioning method, it is natural to pipeline the Gaussian elimination algorithm to achieve the best performance. Pipelined algorithms work best on a linear array of processors, or on anything that can be linearly mapped onto one.

Would it be better to block partition? How would it affect the algorithm?

Row Ordering

When dealing with a sparse matrix, operations can sometimes cause a zero entry in the matrix to become non-zero (fill-in).

Nested Dissection Ordering

Complete these slides using notes in the black binder.
