
Introduction to Simulation - Lecture 21

Thanks to Deepak Ramaswamy, Michal Rewienski, and Karen Veroy

Boundary Value Problems - Solving 3-D Finite-Difference problems

Jacob White

Outline

• Reminder about FEM and FD
  – 1-D example
• Finite-difference matrices in 1, 2 and 3 D
  – Gaussian elimination costs
• Krylov method
  – Communication lower bound
  – Preconditioners based on improving communication

1-D Example: Normalized 1-D Equation

Heat flow:

$$\frac{\partial}{\partial x}\left(\kappa\,\frac{\partial T(x)}{\partial x}\right) = -h_{s}(x)$$

Normalized Poisson equation:

$$-\frac{\partial^{2}u(x)}{\partial x^{2}} = f(x),\qquad\text{i.e.}\qquad -u_{xx}(x) = f(x)$$

FD Matrix Properties: Finite Differences, 1-D Poisson

Grid points x_0, x_1, x_2, ..., x_n, x_{n+1}; discretizing -u_xx = f with the three-point stencil gives

$$-\frac{\hat u_{j+1}-2\hat u_{j}+\hat u_{j-1}}{\Delta x^{2}} = f(x_{j})$$

or, in matrix form,

$$\frac{1}{\Delta x^{2}}
\begin{bmatrix}
2 & -1 & & & 0\\
-1 & 2 & -1 & &\\
 & \ddots & \ddots & \ddots &\\
 & & -1 & 2 & -1\\
0 & & & -1 & 2
\end{bmatrix}
\begin{bmatrix}\hat u_{1}\\ \vdots\\ \hat u_{n}\end{bmatrix}
=
\begin{bmatrix}f(x_{1})\\ \vdots\\ f(x_{n})\end{bmatrix}$$

with the matrix (including the 1/Δx² factor) denoted A.

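Not part of the original slides: a minimal numpy sketch of this 1-D discretization, assuming f(x) = 1 on (0, 1) with u(0) = u(1) = 0. The assembled matrix is exactly the tridiagonal (1/Δx²)·tridiag(-1, 2, -1) shown above; the helper name poisson_1d is illustrative.

```python
import numpy as np

def poisson_1d(n, f):
    """Assemble and solve -u_xx = f on (0,1), u(0)=u(1)=0,
    with the standard 3-point finite-difference stencil."""
    dx = 1.0 / (n + 1)                       # n interior points x_1 .. x_n
    x = np.linspace(dx, 1.0 - dx, n)
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / dx**2
    u = np.linalg.solve(A, f(x))             # dense solve, fine for small n
    return x, u

x, u = poisson_1d(50, lambda x: np.ones_like(x))
# exact solution of -u'' = 1 with zero BCs is x(1-x)/2; error is near machine precision
print(np.max(np.abs(u - x * (1 - x) / 2)))
```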

Using Basis Functions: Residual Equation

Partial differential equation form:

$$-\frac{\partial^{2}u}{\partial x^{2}} = f,\qquad u(0)=0,\; u(1)=0$$

Basis function representation:

$$u(x)\;\approx\;u_{h}(x)=\sum_{i=1}^{n}\omega_{i}\,\varphi_{i}(x)\qquad(\varphi_{i}\text{ are the basis functions})$$

Plug the basis function representation into the equation to form the residual:

$$R(x)=\sum_{i=1}^{n}\omega_{i}\,\frac{d^{2}\varphi_{i}(x)}{dx^{2}}+f(x)$$

Using Basis Functions: Basis Weights, Galerkin Scheme

Force the residual to be "orthogonal" to the basis functions:

$$\int_{0}^{1}\varphi_{l}(x)\,R(x)\,dx = 0,\qquad l\in\{1,\dots,n\}$$

which generates n equations in the n unknowns ω_i:

$$\int_{0}^{1}\varphi_{l}(x)\left[\sum_{i=1}^{n}\omega_{i}\,\frac{d^{2}\varphi_{i}(x)}{dx^{2}}+f(x)\right]dx = 0,\qquad l\in\{1,\dots,n\}$$

Using Basis Functions: Basis Weights, Galerkin with Integration by Parts

Integrating by parts (the boundary terms vanish because φ_l(0) = φ_l(1) = 0) leaves only first derivatives of the basis functions:

$$\sum_{i=1}^{n}\omega_{i}\int_{0}^{1}\frac{d\varphi_{l}(x)}{dx}\,\frac{d\varphi_{i}(x)}{dx}\,dx
\;-\;\int_{0}^{1}\varphi_{l}(x)\,f(x)\,dx = 0,\qquad l\in\{1,\dots,n\}$$
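For concreteness, a sketch of this Galerkin system under the assumption of piecewise-linear "hat" basis functions on a uniform grid (the slides do not fix a basis). With hats, the first-derivative integrals above reduce to a tridiagonal stiffness matrix, and the load integral is approximated here with a simple one-point (lumped) quadrature; the helper name galerkin_hat_1d is illustrative.

```python
import numpy as np

def galerkin_hat_1d(n, f):
    """Galerkin weights for -u'' = f, u(0)=u(1)=0, using hat functions:
    sum_i w_i * int(phi_l' phi_i') dx = int(phi_l f) dx for l = 1..n."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    # int(phi_l' phi_i') dx for uniform hats: 2/h on the diagonal, -1/h off it
    K = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h
    b = h * f(x)                       # lumped approximation of int(phi_l f) dx
    return x, np.linalg.solve(K, b)

x, w = galerkin_hat_1d(50, lambda x: np.ones_like(x))
print(np.max(np.abs(w - x * (1 - x) / 2)))   # hat weights are the nodal values of u_h
```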

Structural Analysis of Automobiles
• Equations
  – Force-displacement relationships for mechanical elements (plates, beams, shells) and sum of forces = 0.
  – Partial differential equations of continuum mechanics.

Drag Force Analysis of Aircraft
• Equations
  – The Navier-Stokes partial differential equations.

Engine Thermal Analysis
• Equations
  – The Poisson partial differential equation.


FD Matrix Properties: Discretized Poisson, 2-D Discretized Problem

On an m × m grid the unknowns are numbered x_1, x_2, ..., x_m along the first line, x_{m+1}, ..., x_{2m} along the second, and so on. The five-point stencil at node j is

$$-\underbrace{\frac{u_{j+1}-2u_{j}+u_{j-1}}{\Delta x^{2}}}_{\approx\, u_{xx}}
\;-\;\underbrace{\frac{u_{j+m}-2u_{j}+u_{j-m}}{\Delta y^{2}}}_{\approx\, u_{yy}}
\;=\; f(x_{j})$$

FD Matrix Properties: Matrix Nonzeros, 5 × 5 Example (2-D Discretized Problem)

[Figure: nonzero pattern of the 2-D finite-difference matrix for a 5 × 5 grid.]
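A sketch of one common way to build this 2-D five-point matrix, as a Kronecker sum of 1-D second-difference matrices using scipy.sparse (an assumption of this sketch, not the construction shown in the lecture). It assumes a uniform m-by-m interior grid with Δx = Δy and reproduces the five-nonzeros-per-interior-row pattern in the figure.

```python
import scipy.sparse as sp

def poisson_2d(m):
    """Five-point 2-D Poisson matrix on an m-by-m interior grid (Dirichlet BCs),
    built as a Kronecker sum of 1-D second-difference matrices."""
    dx = 1.0 / (m + 1)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m)) / dx**2
    I = sp.identity(m)
    return sp.kron(I, T) + sp.kron(T, I)     # m^2-by-m^2, 5 nonzeros per interior row

A = poisson_2d(5)                            # the 5x5-grid example above
print(A.shape, A.nnz)
```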

FD Matrix Properties: Discretized Poisson, 3-D Discretization

On an m × m × m grid, node j couples to x_{j±1} in x, x_{j±m} in y and x_{j±m²} in z. The seven-point stencil is

$$-\underbrace{\frac{\hat u_{j+1}-2\hat u_{j}+\hat u_{j-1}}{\Delta x^{2}}}_{\approx\, u_{xx}}
\;-\;\underbrace{\frac{\hat u_{j+m}-2\hat u_{j}+\hat u_{j-m}}{\Delta y^{2}}}_{\approx\, u_{yy}}
\;-\;\underbrace{\frac{\hat u_{j+m^{2}}-2\hat u_{j}+\hat u_{j-m^{2}}}{\Delta z^{2}}}_{\approx\, u_{zz}}
\;=\; f(x_{j})$$

FD Matrix Properties: Matrix Nonzeros, m = 4 Example (3-D Discretization)

[Figure: nonzero pattern of the 3-D finite-difference matrix for a 4 × 4 × 4 grid.]

FD Matrix Properties: Summary, Numerical Properties

Assuming uniform discretization, the diagonal entry is 2/Δ² in 1-D, 4/Δ² in 2-D and 6/Δ² in 3-D, so

$$|A_{ii}| \ge \sum_{j\ne i}|A_{ij}|$$

• The matrix is symmetric positive definite.
• The matrix is irreducibly diagonally dominant: each row is either strictly diagonally dominant or path-connected to a strictly diagonally dominant row.

FD Matrix Properties: Summary, Structural Properties

• Matrix size: 1-D → m × m, 2-D → m² × m², 3-D → m³ × m³.
• Nonzeros per row: 1-D → 3, 2-D → 5, 3-D → 7.
• Bandwidth: 1-D → A_ij = 0 for |i − j| > 1; 2-D → A_ij = 0 for |i − j| > m; 3-D → A_ij = 0 for |i − j| > m².
• Matrices in 3-D are LARGE: a 100 × 100 × 100 grid gives a 1 million × 1 million matrix.
• Matrices are very sparse and banded.

GE Basics: Picture, Triangularizing

$$A=\begin{bmatrix}A_{11}&A_{12}&A_{13}&A_{14}\\A_{21}&A_{22}&A_{23}&A_{24}\\A_{31}&A_{32}&A_{33}&A_{34}\\A_{41}&A_{42}&A_{43}&A_{44}\end{bmatrix}$$

[Figure: the entries below the diagonal are zeroed column by column (first A_21, A_31, A_41, then A_32, A_42, then A_43), leaving an upper-triangular matrix.]

GE Basics: Algorithm, Triangularizing

Form n − 1 reciprocals 1/A_ii (the pivots) and

$$\sum_{i=1}^{n-1}(n-i)=\frac{n(n-1)}{2}$$

multipliers A_ji / A_ii.

For i = 1 to n−1 {                 "for each row" (pivot)
    For j = i+1 to n {             "for each row below the pivot"
        For k = i+1 to n {         "for each element beyond the pivot"
            A_jk ⇐ A_jk − (A_ji / A_ii) A_ik
        }
    }
}

This performs

$$\sum_{i=1}^{n-1}(n-i)^{2}\approx\frac{n^{3}}{3}$$

multiply-adds.
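A direct, unoptimized Python transcription of the triple loop above (dense storage, no row exchanges); it assumes the pivots A_ii stay nonzero, which holds for the diagonally dominant finite-difference matrices in this lecture.

```python
import numpy as np

def triangularize(A):
    """Gaussian-elimination triangularization, following the slide's triple loop."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for i in range(n - 1):                 # "for each row" (pivot)
        for j in range(i + 1, n):          # "for each row below the pivot"
            mult = A[j, i] / A[i, i]       # multiplier A_ji / A_ii
            for k in range(i + 1, n):      # "for each element beyond the pivot"
                A[j, k] -= mult * A[i, k]
            A[j, i] = 0.0                  # the eliminated entry
    return A                               # upper-triangular factor

print(triangularize(np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])))
```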

Complexity of GE

1-D: O(n³) = O(m³)  ← 100-point grid: 10⁶ ops
2-D: O(n³) = O(m⁶)  ← 100 × 100 grid: 10¹² ops
3-D: O(n³) = O(m⁹)  ← 100 × 100 × 100 grid: 10¹⁸ ops

For 2-D and 3-D problems we need a faster solver!

Banded GE: Algorithm, Triangularizing

With bandwidth b, the entries A_ji and A_ik are zero outside the band, so the inner loops only run over the nonzeros:

For i = 1 to n−1 {
    For j = i+1 to i+b−1 {           "for each row below the pivot, inside the band"
        For k = i+1 to i+b−1 {       "for each element beyond the pivot, inside the band"
            A_jk ⇐ A_jk − (A_ji / A_ii) A_ik
        }
    }
}

This performs

$$\sum_{i=1}^{n-1}\bigl(\min(b-1,\,n-i)\bigr)^{2} \approx O(b^{2}n)$$

multiply-adds.
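The banded variant is the same loop with both inner loops stopped at the edge of the band. This sketch assumes A_ij = 0 for |i − j| ≥ b, so only rows and columns up to i + b − 1 need updating, giving roughly b²·n multiply-adds.

```python
import numpy as np

def triangularize_banded(A, b):
    """Banded Gaussian-elimination triangularization: the slide's loop with the
    inner loops restricted to the band (A_ij = 0 for |i - j| >= b)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for i in range(n - 1):
        for j in range(i + 1, min(i + b, n)):        # rows inside the band
            mult = A[j, i] / A[i, i]
            for k in range(i + 1, min(i + b, n)):    # columns inside the band
                A[j, k] -= mult * A[i, k]
            A[j, i] = 0.0
    return A
```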

Complexity of Banded GE

1-D: O(b²n) = O(m)   ← 100-point grid: 10² ops
2-D: O(b²n) = O(m⁴)  ← 100 × 100 grid: 10⁸ ops
3-D: O(b²n) = O(m⁷)  ← 100 × 100 × 100 grid: 10¹⁴ ops

For 3-D problems we still need a faster solver!

The World According to Krylov

Preconditioning: start with Ax = b and form PAx = Pb.

Determine the Krylov subspace: with r⁰ = Pb − PAx⁰,

$$\text{Krylov subspace} \equiv \operatorname{span}\{\,r^{0},\, PAr^{0},\, \dots,\,(PA)^{k}r^{0}\,\}$$

Select the solution from the Krylov subspace:

$$x^{k+1} = x^{0} + y^{k},\qquad y^{k}\in \operatorname{span}\{\,r^{0},\, PAr^{0},\, \dots,\,(PA)^{k}r^{0}\,\}$$

GCR picks the residual-minimizing y^k.
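A minimal GCR sketch for the preconditioned system PAx = Pb, with the preconditioner supplied as a function P_inv that applies P to a vector (identity by default). The function name gcr and its interface are illustrative, not the lecture's code; it keeps all search directions, orthogonalizes their images under PA, and picks the step that minimizes the residual norm.

```python
import numpy as np

def gcr(A, b, P_inv=None, tol=1e-8, max_iters=200):
    """Generalized Conjugate Residual iteration for P A x = P b (sketch)."""
    if P_inv is None:
        P_inv = lambda v: v
    x = np.zeros(len(b))
    r = P_inv(b)                           # r^0 = Pb - PAx^0 with x^0 = 0
    r0_norm = np.linalg.norm(r)
    ps, aps = [], []                       # search directions p_j and PA p_j
    for k in range(max_iters):
        if np.linalg.norm(r) <= tol * r0_norm:
            break
        p = r.copy()
        ap = P_inv(A @ p)
        for pj, apj in zip(ps, aps):       # orthogonalize PA p against earlier PA p_j
            beta = np.dot(ap, apj)
            p = p - beta * pj
            ap = ap - beta * apj
        nrm = np.linalg.norm(ap)
        p, ap = p / nrm, ap / nrm
        alpha = np.dot(r, ap)              # residual-minimizing step along p
        x = x + alpha * p
        r = r - alpha * ap
        ps.append(p)
        aps.append(ap)
    return x, k
```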

Krylov Methods: Diagonal Preconditioners

Let A = D + A_nd (diagonal plus non-diagonal part) and apply GCR to

$$D^{-1}A\,x = (I + D^{-1}A_{nd})\,x = D^{-1}b$$

• The inverse of a diagonal is cheap to compute.
• It usually improves convergence.
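With the gcr sketch above (a hypothetical helper, not the lecture's code), diagonal preconditioning amounts to one divide per entry:

```python
import numpy as np

# 1-D finite-difference matrix (as in poisson_1d above) and a diagonal preconditioner
n = 50
dx = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / dx**2
b = np.ones(n)
d = np.diag(A)                               # D = diag(A); applying D^{-1} is one divide
x, iters = gcr(A, b, P_inv=lambda v: v / d)
print(iters, np.linalg.norm(A @ x - b))
```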

Krylov Methods: Convergence Analysis, Optimality of the GCR Polynomial

GCR optimality property:

$$\|r^{k+1}\| \le \|\tilde\wp_{k+1}(PA)\,r^{0}\|,\qquad \tilde\wp_{k+1}\text{ any }(k{+}1)\text{st-order polynomial with } \tilde\wp_{k+1}(0)=1$$

Therefore any polynomial that satisfies the constraints can be used to get an upper bound on ‖r^{k+1}‖ / ‖r⁰‖.

Keep ℘_k(λ_i) as small as possible: easier if the eigenvalues are clustered!

[Figure: residual polynomial for the heat-conducting-bar matrix with no loss to air (n = 10).]

The World According to Krylov: Krylov Vectors for the Diagonally Preconditioned A

For the 1-D discretized PDE,

$$D^{-1}A=\begin{bmatrix}
1 & -0.5 & & & 0\\
-0.5 & 1 & -0.5 & &\\
 & \ddots & \ddots & \ddots &\\
 & & -0.5 & 1 & -0.5\\
0 & & & -0.5 & 1
\end{bmatrix}$$

With b = r⁰ = e₁ (shown here for 7 unknowns), each multiplication by D⁻¹A moves the nonzero front forward by one entry:

b = r⁰:           1      0      0     0     0     0     0
D⁻¹A r⁰:          1     -0.5    0     0     0     0     0
(D⁻¹A)² r⁰:       1.25  -1      0.25  0     0     0     0

x_exact (1 digit): 0.9  -0.7   0.6  -0.5   0.4  -0.3   0.1

The World According to Krylov: Communication Lower Bound

For the same 1-D problem with m grid points and r⁰ = e₁, the Krylov vector (D⁻¹A)^k r⁰ becomes nonzero in the m-th entry only after roughly k = m iterations. Since the GCR iterate is built from these vectors,

$$x^{k}_{m} = x^{0}_{m} + \sum_{j=1}^{k}\alpha_{j}\left[(D^{-1}A)^{\,j-1}r^{0}\right]_{m},$$

at least about m iterations are needed before x^k_m can differ from x⁰_m.
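A few lines (not from the slides) that make the bound visible: powering the diagonally preconditioned 1-D operator on r⁰ = e₁ advances the nonzero pattern by exactly one grid point per multiplication.

```python
import numpy as np

m = 7
DinvA = np.eye(m) - 0.5 * (np.eye(m, k=1) + np.eye(m, k=-1))   # 1 on diag, -0.5 off
r = np.zeros(m)
r[0] = 1.0                                                     # r^0 = e_1
for k in range(4):
    print(k, np.flatnonzero(np.abs(r) > 0))    # grid points reached after k multiplies
    r = DinvA @ r
```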

The World According to Krylov: Krylov Vectors, Two-Dimensional Case

For an m × m grid (unknowns numbered 1 to m²): if r⁰ is nonzero only at one corner of the grid, the value must propagate across the grid one neighbor per iteration, so it takes ≈ 2m = O(m) iterations before x^k at the opposite corner (entry m²) can become nonzero.

The World According to Krylov: Convergence for GCR, Eigenanalysis

Recall that the eigenvalues of the diagonally preconditioned 1-D operator D⁻¹A (the tridiagonal matrix with 1 on the diagonal and -0.5 on the off-diagonals shown above) are

$$\lambda_{k} = 1-\cos\frac{k\pi}{m+1},\qquad k=1,\dots,m.$$

The World According to Krylov: Convergence for GCR, Eigenanalysis

The GCR residual satisfies

$$\frac{\|r^{k}\|}{\|r^{0}\|} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \le \gamma \quad(\text{convergence tolerance}),$$

where, for D⁻¹A,

$$\kappa \equiv \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{1-\cos\frac{m\pi}{m+1}}{1-\cos\frac{\pi}{m+1}}.$$

The number of iterations for GCR to achieve convergence is therefore

$$k = \frac{\log(\gamma/2)}{\log\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}} \cong O(m)\quad\text{as } m\to\infty.$$

GCR achieves the communication lower bound O(m)!
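A short numerical check (an illustration, assuming the standard 2((√κ − 1)/(√κ + 1))^k bound quoted above): the condition number grows like m², but the predicted iteration count grows only linearly in m.

```python
import numpy as np

for m in (10, 20, 40, 80):
    k = np.arange(1, m + 1)
    lam = 1.0 - np.cos(k * np.pi / (m + 1))     # eigenvalues of D^{-1} A
    kappa = lam.max() / lam.min()
    rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    iters = np.log(1e-6 / 2) / np.log(rho)      # smallest k with 2 * rho^k <= 1e-6
    print(m, round(kappa), round(iters))        # kappa ~ m^2, iterations ~ m
```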

The World According to Krylov: Work for Banded Gaussian Elimination, Sparse GE and GCR

Dimension   Banded GE   Sparse GE   GCR
    1        O(m)        O(m)       O(m²)
    2        O(m⁴)       O(m³)      O(m³)
    3        O(m⁷)       O(m⁶)      O(m⁴)

GCR is faster than banded GE in 2 and 3 dimensions. It could be faster still: the 3-D matrix has only m³ nonzeros. We must defeat the communication lower bound!

The World According to Krylov: How to Get Faster-Converging GCR

GCR already achieves the communication lower bound for a diagonally preconditioned A, so preconditioning is the only hope. The preconditioner must accelerate communication: multiplying by PA must move values by more than one grid point.

Preconditioning Approaches: Gauss-Seidel Preconditioning, Physical View

[Figure: one Gauss-Seidel sweep over the 1-D grid. Node 1 is updated using u_2^(old), node 2 using the just-computed u_1^(new), and so on down to û_n, so each iteration of Gauss-Seidel relaxation moves data across the grid.]

Preconditioning Approaches: Gauss-Seidel Preconditioning, Krylov Vectors

For the 1-D discretized PDE (x = nonzero):

b = r⁰:              1  0  0  0  0  0  0
(D+L)⁻¹A r⁰:         x  x  x  x  x  x  x
((D+L)⁻¹A)² r⁰:      x  x  x  x  x  x  x

x_exact (1 digit):   0.9  -0.7  0.6  -0.5  0.4  -0.3  0.1

b = r⁰:              0  0  0  0  0  0  1
(D+L)⁻¹A r⁰:         0  0  0  0  0  x  x
((D+L)⁻¹A)² r⁰:      0  0  0  0  x  x  x

Gauss-Seidel communicates quickly in only one direction.

Preconditioning Approaches: Gauss-Seidel Preconditioning, Symmetric Gauss-Seidel

[Figure: a forward sweep updates u_1^(new), u_2^(new), ..., û_n^(new) left to right; a backward sweep then updates û_{n-1}^(newer), ..., u_1^(newer) right to left.]

This symmetric Gauss-Seidel preconditioner communicates in both directions.

Preconditioning Approaches: Gauss-Seidel Preconditioning, Symmetric Gauss-Seidel

Derivation of the SGS iteration equations, with A = L + D + U:

Forward sweep (half step):   $(D+L)\,x^{k+1/2} + U x^{k} = b$
Backward sweep (half step):  $(D+U)\,x^{k+1} + L x^{k+1/2} = b$

$$\Rightarrow\; x^{k+1} = (D+U)^{-1}L(D+L)^{-1}U\,x^{k} + (D+U)^{-1}b - (D+U)^{-1}L(D+L)^{-1}b$$

$$\Rightarrow\; x^{k+1} = x^{k} - (D+U)^{-1}D(D+L)^{-1}A\,x^{k} + (D+U)^{-1}D(D+L)^{-1}b$$
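A sketch of applying this symmetric Gauss-Seidel preconditioner, M⁻¹ = (D+U)⁻¹ D (D+L)⁻¹, as two triangular solves (dense storage and scipy for clarity; the helper name sgs_apply is illustrative). It can be passed as the P_inv argument of the gcr sketch above.

```python
import numpy as np
from scipy.linalg import solve_triangular

def sgs_apply(A, v):
    """Apply the SGS preconditioner from the derivation above:
    M^{-1} v = (D+U)^{-1} D (D+L)^{-1} v, where A = L + D + U."""
    D = np.diag(np.diag(A))
    DL = np.tril(A)                                  # D + L
    DU = np.triu(A)                                  # D + U
    y = solve_triangular(DL, v, lower=True)          # forward sweep
    return solve_triangular(DU, D @ y, lower=False)  # backward sweep

# e.g.  x, iters = gcr(A, b, P_inv=lambda v: sgs_apply(A, v))
```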

Preconditioning Approaches: Block Diagonal Preconditioners, Line Schemes

[Figure: the m × m grid (unknowns 1 to m²) partitioned into lines, and the corresponding matrix with one tridiagonal block per line.]

Tridiagonal matrices factor quickly.
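A sketch of applying a line (block-diagonal) preconditioner for the 2-D problem: one independent tridiagonal solve per grid line, dropping the couplings between lines. It assumes the unknowns are ordered line by line with Δx = Δy, and the helper name line_precond_apply is illustrative.

```python
import numpy as np
from scipy.linalg import solve_banded

def line_precond_apply(v, m, dx):
    """Solve the tridiagonal (4, -1, -1)/dx^2 system along each of the m grid lines;
    v holds the m*m unknowns ordered line by line."""
    ab = np.zeros((3, m))                 # banded storage: super-, main and sub-diagonal
    ab[0, 1:] = -1.0 / dx**2
    ab[1, :] = 4.0 / dx**2
    ab[2, :-1] = -1.0 / dx**2
    out = np.empty_like(v, dtype=float)
    for line in range(m):                 # each tridiagonal solve costs O(m)
        seg = slice(line * m, (line + 1) * m)
        out[seg] = solve_banded((1, 1), ab, v[seg])
    return out

# e.g.  x, iters = gcr(A, b, P_inv=lambda v: line_precond_apply(v, m, 1.0/(m+1)))
```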

Preconditioning Approaches: Block Diagonal Preconditioners, Line Schemes

Problem: line preconditioners communicate rapidly in only one direction.
Solution: do lines first in x, then in y. The preconditioner is then two tridiagonal solves, with a variable reordering in between.

Preconditioning Approaches: Block Diagonal Preconditioners, Domain Decomposition

Approach: break the domain into small blocks, each with the same number of grid points.
The trade-off: fewer blocks means faster convergence, but more costly iterates.

Preconditioning Approaches: Block Diagonal Preconditioners, Domain Decomposition

For an m × m grid of points partitioned into l × l blocks (block index 1, 2, ..., (m/l)²):

Block cost: factoring (m/l)² grids of size l × l with sparse GE: O(m²l).
GCR iterations: the communication bound gives O(m/l) iterations.
The algorithm is O(m³), which suggests insensitivity to l.

Do you have to refactor every GCR iteration?

Preconditioning Approaches: Seidelized Block Diagonal Preconditioners, Line Schemes

[Figure: the m × m grid (unknowns 1 to m²) and the corresponding matrix nonzero pattern for the Seidelized line scheme.]

Preconditioning Approaches: Overlapping Domain Preconditioners, Line-Based Schemes

[Figure: the m × m grid and the corresponding matrix with overlapping line blocks.]

Bigger systems to solve, but convergence can be faster on tougher problems (not just Poisson).

Preconditioning Approaches: Incomplete Factorization Schemes, Outline

• Reminder about Gaussian elimination computational steps
• Fill-in for sparse matrices: greatly increases factorization cost
• Fill-in in a 2-D grid
• The incomplete factorization idea

Sparse Matrices: Fill-In, Example

Matrix nonzero structure:      Matrix after one GE step:

X X X                          X X X
X X 0                          0 X F
X 0 X                          0 F X

X = nonzero, F = fill-in created by the elimination.

Sparse Matrices: Fill-In, Second Example

X X X X
X X 0 0
0 X X 0
X 0 0 X

Fill-ins propagate: fill-ins created in step 1 result in further fill-ins in step 2.

Sparse Matrices: Fill-In, Pattern of a Filled-In Matrix

[Figure: nonzero pattern of a filled-in matrix: two very sparse regions and a dense region.]

Sparse Matrices: Fill-In

[Figure: nonzero pattern of an unfactored random matrix.]

[Figure: nonzero pattern of the factored random matrix.]

Factoring 2-D Finite-Difference matrices

Generated Fill-in Makes Factorization Expensive



Preconditioning Approaches: Incomplete Factorization Schemes, Key Idea

THROW AWAY FILL-INS!

• Throw away all fill-ins.
• Throw away fill-ins produced by other fill-ins.
• Throw away fill-ins produced by fill-ins of other fill-ins, etc.
• Or: throw away only fill-ins with small values.
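A sketch of the simplest option above, "throw away all fill-ins" (commonly known as ILU(0)): run the same elimination loop as before, but apply updates only where A already has a nonzero, so the factors keep A's sparsity pattern. Dense arrays are used for clarity; the helper name ilu0 is illustrative.

```python
import numpy as np

def ilu0(A):
    """Incomplete LU with zero fill: Gaussian elimination restricted to the
    nonzero pattern of A; all fill-ins are discarded."""
    LU = A.astype(float).copy()
    n = LU.shape[0]
    pattern = A != 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if not pattern[j, i]:
                continue
            LU[j, i] /= LU[i, i]                   # multiplier stored in place of L
            for k in range(i + 1, n):
                if pattern[j, k]:                  # update only existing nonzeros
                    LU[j, k] -= LU[j, i] * LU[i, k]
    return LU    # strictly lower part = L (unit diagonal implied), upper part = U
```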

Summary

• 3-D BVP examples
  – Aerodynamics, continuum mechanics, heat flow
• Finite-difference matrices in 1, 2 and 3 D
  – Gaussian elimination costs
• Krylov method
  – Communication lower bound
  – Preconditioners based on improving communication
