
Numerical Linear Algebra

By: David McQuilling; Jesus Caban; Deng Li

Solving Linear Equations

Ax = b:

[ 2  2 ] [ u ]   [  8 ]
[ 4  9 ] [ v ] = [ 21 ]

  A        x       b

i.e. the system of equations 2u + 2v = 8 and 4u + 9v = 21.

Gaussian Elimination

[ A | b ] = [ 2  2 |  8 ]
            [ 4  9 | 21 ]

Start with the augmented matrix representation of Ax = b, formed by appending the solution vector b to the matrix A as an extra column.

Now perform Gaussian elimination by row reductions to transform the matrix [A | b] into an upper triangular matrix that contains the simplified equations.

Find the first multiplier by dividing the first entry in the second row by the first entry in the first row. This lets you transform the first entry in the second row to a zero by a row reduction, thus reducing the system of equations.

Gaussian Elimination (cont.)

[ 2  2 |  8 ]      [ 2  2 | 8 ]
[ 4  9 | 21 ]  ->  [ 0  5 | 5 ]

Row 2 - 2·Row 1:  [4  9  21] - 2·[2  2  8] = [0  5  5]

Multiply the first row by the multiplier (4 / 2 = 2) and then subtract that from the second row to obtain the new reduced equation.
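The elimination step above can be sketched in a few lines of Python, a minimal illustration using plain lists and the numbers from this slide:

```python
# Forward elimination on the augmented matrix [A | b] from the slides.
Ab = [[2.0, 2.0, 8.0],
      [4.0, 9.0, 21.0]]

# Multiplier for row 2: the entry below the pivot divided by the pivot (4 / 2 = 2).
m = Ab[1][0] / Ab[0][0]

# Row2 := Row2 - m * Row1, which zeroes the entry below the pivot.
Ab[1] = [Ab[1][j] - m * Ab[0][j] for j in range(3)]

print(Ab[1])  # [0.0, 5.0, 5.0]
```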

A = LU

L = [ 1  0 ]    U = [ 2  2 ]    c = [ 8 ]
    [ 2  1 ]        [ 0  5 ]        [ 5 ]

Ax = L(Ux) = Lc = b

The matrix resulting from the Gaussian elimination steps is actually U concatenated with the new right-hand side c.

L is obtained from the multipliers used in the Gaussian elimination steps.

Solving Ux = c by back substitution

[ 2  2 ] [ u ]   [ 8 ]
[ 0  5 ] [ v ] = [ 5 ]

  U        x       c

5v = 5, so v = 1.
2u + 2v = 8, so 2u + 2·1 = 8, 2u = 6, u = 3.

x = [ u ] = [ 3 ]
    [ v ]   [ 1 ]
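The back substitution can be sketched in Python (plain lists; the loop solves from the last equation upward, where each row only involves unknowns that are already known):

```python
# Back substitution on U x = c, with U upper triangular (values from the slides).
U = [[2.0, 2.0],
     [0.0, 5.0]]
c = [8.0, 5.0]

n = len(U)
x = [0.0] * n
# Solve from the last equation up: row i only involves x[i+1..n-1], already found.
for i in range(n - 1, -1, -1):
    s = sum(U[i][j] * x[j] for j in range(i + 1, n))
    x[i] = (c[i] - s) / U[i][i]

print(x)  # [3.0, 1.0]
```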

Solution to Ax = b

The solution vector of Ux = c is also the solution to the original problem Ax = b:

[ 2  2 ] [ 3 ]   [  8 ]
[ 4  9 ] [ 1 ] = [ 21 ]

  A        x        b

Gaussian algorithm to find LU

After this algorithm completes, the diagonal and upper triangle of A contain U, while the strict lower triangle contains the multipliers that make up L. This algorithm runs in O(n³) time.
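A minimal sketch of that in-place algorithm in Python (plain lists, no pivoting, so it assumes every pivot is nonzero; checked against the 2×2 example from the earlier slides):

```python
def lu_inplace(A):
    """Gaussian elimination storing U on and above the diagonal of A and
    the multipliers (L's strict lower triangle) below it. O(n^3) time."""
    n = len(A)
    for k in range(n):                       # for each pivot column
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / A[k][k]      # multiplier l_ik, stored in place
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A

A = [[2.0, 2.0], [4.0, 9.0]]
lu_inplace(A)
# Upper triangle holds U = [[2, 2], [0, 5]]; below the diagonal sits l21 = 2.
print(A)  # [[2.0, 2.0], [2.0, 5.0]]
```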

Gram-Schmidt Factorization

Yields the factorization A = QR. Starts with n linearly independent vectors in A and constructs n orthonormal vectors.

A vector x is orthonormal if xTx = 1, i.e. ||x|| = 1. Any nonzero vector is made orthonormal by making it a unit vector in the same direction, i.e. with length 1: dividing a vector x by its length ||x|| creates a unit vector in the direction of x.

a = [ 2 ]    ||a|| = sqrt(2² + 1² + 2²) = sqrt(9) = 3
    [ 1 ]
    [ 2 ]    a / ||a|| = [ 2/3 ]
                         [ 1/3 ]
                         [ 2/3 ]

Composition of Q and R

Q is the matrix with n orthonormal column vectors, which were obtained from the n linearly independent vectors of A.

Orthonormal columns imply that QTQ = I

R is obtained by multiplying both sides of A = QR by QT.

This equation becomes QTA = QTQR = IR = R.

R becomes an upper triangular matrix because later q's are orthogonal to earlier a's, i.e. their dot products are zero. This creates zeros in the entries below the main diagonal.

Obtaining Q from A

Start with the first column vector from A and use it as the first vector q1 for Q (it must be made a unit vector before being added to Q).

To obtain the second vector q2, subtract from the second vector of A, a2, its projection along the previous qi vectors. The projection of a vector y onto a vector x is defined as (xTy / xTx)·x.

Once all qi have been found this way, divide each by its length so that they are unit vectors. These are the orthonormal vectors that make up the columns of Q.

Obtaining R from QTA

Once Q has been found, simply multiply A by the transpose of Q to find R.

Overall, the Gram-Schmidt algorithm is O(mn²) for an m×n matrix A.
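The procedure can be sketched as classical Gram-Schmidt in Python. A minimal illustration, not a library routine: matrices are represented here as lists of column vectors (an assumption of this sketch), and the second column [1, 1, 0] is a made-up companion to the slide's vector [2, 1, 2]:

```python
import math

def gram_schmidt_qr(A_cols):
    """Classical Gram-Schmidt on a list of column vectors.
    Returns Q (orthonormal columns) and R (upper triangular) with A = QR."""
    n = len(A_cols)          # number of columns
    m = len(A_cols[0])       # length of each column
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = list(A_cols[j])
        for i in range(len(Q)):
            # r_ij = q_i . a_j, the component of a_j along an earlier q_i
            R[i][j] = sum(Q[i][k] * A_cols[j][k] for k in range(m))
            v = [v[k] - R[i][j] * Q[i][k] for k in range(m)]
        R[j][j] = math.sqrt(sum(t * t for t in v))   # length of what remains
        Q.append([t / R[j][j] for t in v])           # normalize to a unit vector
    return Q, R

# Columns: the slide's vector [2, 1, 2] plus a second independent column.
Q, R = gram_schmidt_qr([[2.0, 1.0, 2.0], [1.0, 1.0, 0.0]])
print(Q[0])  # [2/3, 1/3, 2/3], the unit vector from the earlier slide
```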

Eigenvalues and Eigenvectors

Eigenvectors are special vectors that, when multiplied by a matrix A, do not change direction, only magnitude, i.e. Ax = λx.

Eigenvalues are the corresponding scalar multipliers of the eigenvectors, i.e. the λ's.

Solving for eigenvalues of A

Solving for the eigenvalues involves solving det(A - λI) = 0, where det(A) is the determinant of the matrix A.

For a 2×2 matrix the characteristic equation is quadratic; for larger matrices it increases in degree, so let's solve a 2×2.

Some interesting properties of eigenvalues: their product equals det(A), and their sum equals the trace of A, which is the sum of the entries on the main diagonal of A.

Solving det(A-λI)=0

A = [ 3  2 ]      A - λI = [ 3-λ   2  ]
    [ 1  2 ]               [  1   2-λ ]

det(A - λI) = (3 - λ)(2 - λ) - 2·1 = λ² - 5λ + 4

λ² - 5λ + 4 = 0  =>  (λ - 4)(λ - 1) = 0  =>  λ₁ = 4, λ₂ = 1

Solving for the eigenvectors

Once you have the eigenvalues, λ’s, you can solve for the eigenvectors by solving (A-λI)x=0 for each λi

In our example we have two λ’s, so we need to solve for two eigenvectors.

Now solve (A-λI)x = 0 for λ=4

(A - 4I)x = 0:

[ -1   2 ] [ x1 ]   [ 0 ]
[  1  -2 ] [ x2 ] = [ 0 ]

-x1 + 2x2 = 0 and x1 - 2x2 = 0  =>  x1 = 2x2

x = [ 2 ]
    [ 1 ]

Solve (A - λI)x = 0 for λ = 1

(A - I)x = 0:

[ 2  2 ] [ x1 ]   [ 0 ]
[ 1  1 ] [ x2 ] = [ 0 ]

2x1 + 2x2 = 0 and x1 + x2 = 0  =>  x1 = -x2

x = [  1 ]
    [ -1 ]
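The worked example can be checked numerically. The sketch below uses the 2×2 characteristic polynomial λ² - (trace)λ + det = 0 directly; matvec is a small helper defined here for illustration:

```python
# Check the worked example: A = [[3, 2], [1, 2]] has eigenvalues 4 and 1.
A = [[3.0, 2.0], [1.0, 2.0]]

# Characteristic polynomial of a 2x2 matrix: lambda^2 - trace*lambda + det = 0.
trace = A[0][0] + A[1][1]                        # 5 (sum of eigenvalues)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # 4 (product of eigenvalues)
disc = (trace * trace - 4 * det) ** 0.5
lam1, lam2 = (trace + disc) / 2, (trace - disc) / 2
print(lam1, lam2)  # 4.0 1.0

def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

# A x = lambda x for the eigenvectors found by hand.
print(matvec(A, [2.0, 1.0]))   # [8.0, 4.0] = 4 * [2, 1]
print(matvec(A, [1.0, -1.0]))  # [1.0, -1.0] = 1 * [1, -1]
```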

Positive definite matrices

Positive definite: all eigenvalues are positive; all pivots are positive; xTAx > 0 for every x except x = 0.

Applications: Ellipse: find the axes of a tilted ellipse

ax² + 2bxy + cy² = 1

Cholesky Factorization

Cholesky factorization factors an N×N symmetric, positive-definite matrix into the product of a lower triangular matrix and its transpose:

A = LLT
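A minimal sketch of the Cholesky algorithm in Python (plain lists, no blocking; the 2×2 test matrix is a made-up example, not from the slides):

```python
import math

def cholesky(A):
    """Factor a symmetric positive-definite matrix as A = L L^T,
    with L lower triangular. Raises ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)         # diagonal entry
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

L = cholesky([[4.0, 2.0], [2.0, 5.0]])
print(L)  # [[2.0, 0.0], [1.0, 2.0]]
```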

Least Squares

It often happens that Ax = b has no solution: too many equations (more equations than unknowns).

e = b - Ax

When the length of e is as small as possible, x is a least-squares solution.

Example: Find the closest straight line to three points (0,6), (1,0), and (2,0). Line: b = C + Dt. We are asking for two numbers C and D that satisfy three equations. No straight line goes through those points, so choose the one that minimizes the error.
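The line-fitting example above can be sketched via the normal equations ATAx = ATb; solving the resulting 2×2 system by Cramer's rule is a choice of this sketch, not of the slides:

```python
# Least-squares fit of the line b = C + D t to (0,6), (1,0), (2,0).
# A has columns [1, 1, 1] and [0, 1, 2]; we solve A^T A [C, D]^T = A^T b.
ts = [0.0, 1.0, 2.0]
bs = [6.0, 0.0, 0.0]

# Entries of the 2x2 matrix A^T A and the right-hand side A^T b.
n = len(ts)
st = sum(ts); stt = sum(t * t for t in ts)
sb = sum(bs); stb = sum(t * b for t, b in zip(ts, bs))

# Solve [[n, st], [st, stt]] [C, D]^T = [sb, stb]^T by Cramer's rule.
det = n * stt - st * st
C = (sb * stt - st * stb) / det
D = (n * stb - st * sb) / det
print(C, D)  # 5.0 -3.0, i.e. the closest line is b = 5 - 3t
```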

Homework

Make an example for A = LU, and show me L and U.

Matrix Multiplication

Matrix multiplication is commonly used in graph theory, numerical algorithms, computer graphics, and signal processing.

Multiplication of large matrices requires a lot of computation time, as its complexity is O(n³), where n is the dimension of the matrix.

Most parallel matrix multiplication algorithms use matrix decomposition based on the number of processors available

Matrix Multiplication

[ 1  2  3  4 ]   [  1   2   3 ]   [  70   80   90 ]
[ 5  6  7  8 ] × [  4   5   6 ] = [ 158  184  210 ]
                 [  7   8   9 ]
                 [ 10  11  12 ]

Example (c11): the first row of the left matrix times the first column of the right matrix:

c11 = 1·1 + 2·4 + 3·7 + 4·10 = 70

Sequential Matrix Multiplication

The product C of the two matrices A and B is defined by

c_ij = Σ (k = 0 to n-1) a_ik · b_kj

where a_ij, b_ij, and c_ij are the elements in the ith row and jth column of the matrices A, B, and C respectively.
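The definition translates directly into the classic triple loop; the example reuses the 2×4 by 4×3 product from the earlier slide:

```python
def matmul(A, B):
    """Sequential matrix product: c_ij = sum_k a_ik * b_kj.
    O(n^3) for square n x n matrices."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(matmul(A, B))  # [[70.0, 80.0, 90.0], [158.0, 184.0, 210.0]]
```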

Strassen’s Algorithm

Strassen's algorithm is a sequential algorithm that reduced the complexity of matrix multiplication to O(n^2.81).

In this algorithm, an n×n matrix is divided into four n/2 × n/2 sub-matrices, and the multiplication is done recursively by multiplying n/2 × n/2 matrices.
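A minimal recursive sketch of Strassen's seven-product scheme in Python (it assumes n is a power of two and, for clarity, recurses all the way down to 1×1 blocks rather than switching to the classical method below a cutoff as a practical implementation would):

```python
def strassen(A, B):
    """Strassen multiplication for n x n matrices with n a power of two.
    Seven half-size products instead of eight gives O(n^log2(7)) ~ O(n^2.81)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    def quad(M):  # split M into four h x h quadrants
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    # Strassen's seven recursive products.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Combine the products into the four quadrants of C.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return [C11[i] + C12[i] for i in range(h)] + \
           [C21[i] + C22[i] for i in range(h)]

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```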

Parallel Matrix Partitioning

Parallel computers can be used to process large matrices.

In order to process a matrix in parallel, it is necessary to partition the matrix so that the different partitions can be mapped to different processors.

Partitions: block row-wise/column-wise striping; cyclic row-wise/column-wise striping; block checkerboard striping; cyclic checkerboard striping.

Striped Partitioning

Matrix is divided into groups of complete rows or columns, and each processor is assigned one such group.

Striped Partitioning

Block-Striped: each processor is assigned contiguous rows or columns.

Cyclic-Striped: rows or columns are sequentially assigned to processors in a wraparound manner.
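The two assignment rules can be sketched as index maps (a toy illustration; the names block_striped and cyclic_striped are made up here):

```python
def block_striped(n_rows, p):
    """Block striping: each processor gets a contiguous chunk of rows."""
    per = (n_rows + p - 1) // p  # ceiling division so all rows are covered
    return {proc: list(range(proc * per, min((proc + 1) * per, n_rows)))
            for proc in range(p)}

def cyclic_striped(n_rows, p):
    """Cyclic striping: rows are dealt out to processors in a wraparound manner."""
    return {proc: list(range(proc, n_rows, p)) for proc in range(p)}

print(block_striped(8, 4))   # {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
print(cyclic_striped(8, 4))  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```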

Block Striping (Application examples: (a) is the case of a car front-crash simulation, (b) is the case of a car side-crash simulation.)

From: http://www.iitk.ac.in/cc/param/tpday21.html

Checkerboard Partition

The matrix is divided into smaller square or rectangular blocks or sub-matrices that are distributed among processors

Checkerboard Partitioning

Block-Checkerboard: the partitioning splits both the rows and the columns of the matrix, so no processor is assigned any complete row or column.

Cyclic-Checkerboard: the blocks are assigned to processors in a wraparound manner.

Checkerboard Partitioning

From: http://www.iitk.ac.in/cc/param/tpday21.html

Parallel Matrix-Vector Product

Algorithm: for each processor i: broadcast X(i), then compute A(i)·X.

A(i) refers to the n/p × n block row that processor i owns. The algorithm uses the formula

Y(i) = Y(i) + A(i)·X = Y(i) + Σ_j A(i,j)·X(j)
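A serial sketch of the block-row product (each outer loop iteration stands in for one processor's independent work; in real parallel code x would be broadcast to all processors first):

```python
# Block-row matrix-vector product: processor i would own block row A(i)
# (an n/p x n slice of A) and compute its slice of y = A x independently.
def matvec_block_rows(A, x, p):
    n = len(A)
    rows_per = n // p  # assume p divides n for simplicity
    y = [0.0] * n
    for proc in range(p):  # each iteration = one processor's work
        for i in range(proc * rows_per, (proc + 1) * rows_per):
            y[i] = sum(A[i][j] * x[j] for j in range(n))
    return y

A = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [0, 0, 0, 4]]
print(matvec_block_rows(A, [1.0, 1.0, 1.0, 1.0], 2))  # [1.0, 2.0, 3.0, 4.0]
```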

Parallel Matrix Multiplication

Algorithms: systolic algorithm; Cannon's algorithm; Fox and Otto's algorithm; PUMMA (Parallel Universal Matrix Multiplication); SUMMA (Scalable Universal Matrix Multiplication); DIMMA (Distribution Independent Matrix Multiplication).

Systolic Communication

A communication design in which data exchange and communication occur between nearest neighbors.

From: http://www-unix.mcs.anl.gov

Matrix Decomposition

To implement the matrix multiplication, the A and B matrices are decomposed into several sub-matrices.

Four methods of matrix decomposition and distribution: one-dimensional decomposition; two-dimensional square decomposition; general decomposition; scattered decomposition.

One-dimensional Decomposition

The matrix is horizontally decomposed. The ith processor holds the ith Asub and Bsub and communicates them to its two neighbor processors, i.e., to the (i-1)th and (i+1)th processors.

The first and last processors communicate with each other, as in a ring topology.

p0 p1 p2 p3 p4 p5 p6 p7

Two-Dimensional Decomposition

Square decomposition: the matrix is decomposed onto a two-dimensional square processor template.

General decomposition: general two-dimensional decomposition allows configurations such as 2×3, 4×3, etc.

Two-Dimensional Decomposition

Scattered decomposition: the matrix is divided into several sets of blocks. Each set of blocks contains as many elements as the number of processors, and every element in a set of blocks is scattered according to the two-dimensional processor templates.

Applications

Computer Graphics: in computer graphics all the transformations are represented as matrices, and a sequence of transformations as a matrix multiplication.

Transformations: rotation, translation, scaling, shearing.

X Rotation (image © Pixar)

2D/3D Mesh

The Laplacian matrix L(G) of the graph G.

If you take the eigenvalues and eigenvectors of L(G), you can get interesting properties, such as the vibration modes of the graph.

From: http://www.cc.gatech.edu/

Computer Vision

In computer vision we can calibrate a camera to determine the relationship between what appears on the image (or retinal) plane and where it is located in the 3D world, by representing the camera parameters as matrices and vectors.

Software for Numerical Linear Algebra

LAPACK

LAPACK: Routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.

Link: http://www.netlib.org/lapack/

LINPACK

LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems. The package solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular, and tri-diagonal square.

Link: http://www.netlib.org/linpack/

ATLAS

ATLAS (Automatically Tuned Linear Algebra Software) provides portable performance and highly optimized linear algebra kernels for arbitrary cache-based architectures.

Link: https://sourceforge.net/projects/math-atlas/

MathWorks

MATLAB: provides engineers, scientists, mathematicians, and educators with an environment for technical computing applications.

NAG

NAG Numerical Library: solves complex problems in areas such as research, engineering, life and earth sciences, financial analysis, and data mining.

NMath Matrix

NMath Matrix is an advanced matrix manipulation library with full-featured structured sparse matrix classes, including triangular, symmetric, Hermitian, banded, tridiagonal, symmetric banded, and Hermitian banded.

Nastran

The program grew out of a NASA-sponsored project to develop a unified approach to computerized structural analysis, and became known as NASTRAN (NASA Structural Analysis Program).

NX Nastran: http://www.ugs.com/index.shtml
MSC Nastran: http://www.mscsoftware.com/

Homework

Collect more information about LAPACK, LINPACK, ATLAS, MathWorks, NAG, NMath Matrix, and Nastran from their websites, then choose the one you find most interesting.