A New Asynchronous Solver for Banded Linear Systems
URI Applied Math Seminar, November 12, 2015

Michael Jandron & Anthony Ruffa, Naval Undersea Warfare Center, Newport, RI
Raymond Roberts, NUWC, Newport, RI
Michael Warnock, NUWC, Newport, RI
Eric Blake, NUWC, Newport, RI
James Baglama, University of Rhode Island, Kingston, RI

Approved for Public Release


Motivation

• Large sparse problems take a while to solve (days, months, years)
  – Direct methods are still useful
  – In FEA, substructuring, Schur complement, and multi-frontal methods are common and rely on a Gaussian elimination backbone, which is difficult to parallelize
  – Always looking for ways to increase levels of parallelization and decrease the communication bound
• Looking for new techniques to complement these tried-and-true methods

(Image source: simulia.com)

Outline

Part I: Modified Forward Substitution
• Tridiagonal solver
  – Limitations and what it's good for
• Pentadiagonal solver
• General banded solver
  – Theoretical speedup predictions
  – Development
  – Numerical implementation
  – Numerical benchmark against MKL PARDISO
• A method to do forward and backward substitution
  – Numerical benchmark against MKL DGTSV & DPTSV
• Summary

Part II: Modified Block LU
• Mechanics of approach
• Examples: Pentadiagonal to FEA
• Where we're headed

Part I: Modified Forward Substitution

Method for Tridiagonal Systems

Given a tridiagonal linear system, augment an unknown to the system [1-3].

[1] Ruffa, A., "A Solution Approach for Lower Hessenberg Linear Systems," ISRN Applied Mathematics (2011)
[2] Jandron, M., Ruffa, A., Baglama, J., "An Asynchronous Direct Solver for Banded Linear Systems," Numerical Algorithms (2015, Submitted)
[3] Ruffa, A., Jandron, M., Toni, B., "Parallelized Solution of Banded Linear Systems," STEAM-H Springer Series Contribution (2015, Submitted)

Method for Tridiagonal Systems

• Split into two tasks, (1) and (2)
• Principle of superposition applies
• The last equation supplies the constraint
• Final vectorized superposition
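The two-task idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tmfs`, the array layout (`a`, `d`, `c` for the sub-, main, and superdiagonal), and the choice of 0 and 1 as the arbitrary starting values are all hypothetical. The entry `c[n-1]` plays the role of the coefficient of the augmented unknown and only needs to be nonzero.

```python
import numpy as np

def tmfs(a, d, c, b):
    """Sketch of a tridiagonal modified forward substitution.

    a, d, c: sub-, main, and superdiagonal, each of length n (a[0] is unused;
             c[n-1] multiplies the augmented unknown and must be nonzero).
    b:       right-hand side of length n.
    """
    n = len(d)

    def sweep(rhs, x_first):
        # n equations, n + 1 unknowns: pick the first unknown, then march forward.
        x = np.zeros(n + 1)
        x[0] = x_first
        x[1] = (rhs[0] - d[0] * x[0]) / c[0]
        for i in range(1, n):
            x[i + 1] = (rhs[i] - a[i] * x[i - 1] - d[i] * x[i]) / c[i]
        return x

    x_part = sweep(b, 0.0)            # task 1: particular solution
    x_hom = sweep(np.zeros(n), 1.0)   # task 2: homogeneous solution
    # Constraint: the augmented unknown must vanish in the superposition.
    alpha = -x_part[n] / x_hom[n]
    return (x_part + alpha * x_hom)[:n]
```

The two calls to `sweep` share no data, so they could run on separate threads; only the final scalar `alpha` couples them.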


System Details for Tridiagonal Systems (Task 1)

• Underdetermined system: solution to within a constant
• Choose one unknown arbitrarily and solve for the remaining unknowns

System Details for Tridiagonal Systems (Task 2)

• Underdetermined system: solution to within a constant
• Choose one unknown arbitrarily and solve for the remaining unknowns

Limitations of Modified Forward Sub

[Figure: error (b-Ax) and solution (x) versus unknown (k), k = 0 to 100, comparing Backslash and MFS for two test cases. In the first case the error axis is scaled by 10^-14; in the second the error axis spans roughly -1 to 0.5.]

Alternate methods?

Solution Options

Option 1: A modified forward substitution scheme
– Fast, but can be unreliable in some cases without a form of pivoting or precision control

Option 2: Using the pseudoinverse
– General, but can be slower and memory intensive
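One way to read Option 2 is sketched below, assuming the augmented system is formed by appending a single extra column to the tridiagonal matrix. The helper name `solve_via_pseudoinverse` and the use of a dense SVD for the null space are illustrative assumptions, not the authors' code.

```python
import numpy as np

def solve_via_pseudoinverse(A, b):
    """Sketch: particular solution from the pseudoinverse, homogeneous
    solution from the null space, combined so the extra unknown vanishes."""
    n = A.shape[0]
    # Augment one unknown: extra column with a unit entry in the last equation.
    extra = np.zeros((n, 1))
    extra[-1, 0] = 1.0
    A_aug = np.hstack([A, extra])              # n equations, n + 1 unknowns
    x_part = np.linalg.pinv(A_aug) @ b         # minimum-norm particular solution
    # The last right singular vector spans the one-dimensional null space.
    _, _, vt = np.linalg.svd(A_aug)
    z = vt[-1]
    alpha = -x_part[n] / z[n]                  # force the augmented unknown to zero
    return (x_part + alpha * z)[:n]
```

Unlike forward substitution, this route does not march through the equations, so it is not exposed to the same error growth; the price is dense factorizations, in line with the "slower and memory intensive" remark.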

Method for Pentadiagonal Systems

Given a pentadiagonal linear system, add two variables.

Method for Pentadiagonal Systems

Split into three tasks, (1), (2), and (3)

Method for Pentadiagonal Systems

• Principle of superposition applies
• The last two equations give a constraint linear system

How does it work for general banded systems?

Extension to Banded Systems

• Independent linear systems
• Extra variables
• All related through superposition
• Constraint matrix
• Final solution through superposition
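The banded extension can be sketched as follows. This is a hedged illustration that assumes a dense NumPy matrix with `q` superdiagonals and a nonzero outermost superdiagonal; instead of carrying explicit extra variables it uses the equivalent form in which the last `q` equations become the constraint system. The name `bmfs` and the variable names are hypothetical.

```python
import numpy as np

def bmfs(A, b, q):
    """Sketch of a banded modified forward substitution.

    A: dense (n, n) array with q superdiagonals and A[i, i+q] != 0.
    Column 0 of X is the particular solution; columns 1..q are homogeneous
    solutions started from unit values of the first q unknowns.
    """
    n = A.shape[0]
    X = np.zeros((n, q + 1))
    X[:q, 1:] = np.eye(q)          # arbitrary starting values for each task
    R = np.zeros((n, q + 1))
    R[:, 0] = b                    # only the particular task sees the RHS
    for i in range(n - q):
        # Equation i determines unknown i + q in every task at once.
        X[i + q] = (R[i] - A[i, :i + q] @ X[:i + q]) / A[i, i + q]
    # The last q equations form the q x q constraint system for the weights.
    tail = slice(n - q, n)
    C = A[tail] @ X[:, 1:]
    w = np.linalg.solve(C, b[tail] - A[tail] @ X[:, 0])
    return X[:, 0] + X[:, 1:] @ w  # final superposition
```

Each column of `X` depends only on its own starting values, which is what makes the `q + 1` sweeps independent; they are vectorized here purely for brevity.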

Numerical Implementation

• Request solution
• Broadcast to each available core
• Begin asynchronous forward substitution as it arrives
• Send extra variables back as they are formed
• Master thread: once all extra variables come back, tackle constraint matrix using any dense solver
• Level 1 superposition to get final solution

Even the constraint matrix can be split up if desired
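The workflow above can be mimicked with Python's standard thread pool. This is only a structural sketch: the function names are hypothetical, the matrix is dense for clarity, and CPython threads will not reproduce the speedups of the FORTRAN/OpenMP implementation. It shows the master collecting sweeps in whatever order they finish and then solving the small constraint system.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import numpy as np

def forward_sweep(A, rhs, start, q):
    """One independent task: march forward from q chosen starting values."""
    n = A.shape[0]
    x = np.zeros(n)
    x[:q] = start
    for i in range(n - q):
        x[i + q] = (rhs[i] - A[i, :i + q] @ x[:i + q]) / A[i, i + q]
    return x

def solve_async(A, b, q, workers=4):
    n = A.shape[0]
    # Task 0 is the particular solution; tasks 1..q are homogeneous.
    tasks = [(b, np.zeros(q))] + [(np.zeros(n), np.eye(q)[j]) for j in range(q)]
    sweeps = [None] * (q + 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(forward_sweep, A, rhs, s, q): k
                   for k, (rhs, s) in enumerate(tasks)}
        for fut in as_completed(futures):      # results arrive in any order
            sweeps[futures[fut]] = fut.result()
    x_part, H = sweeps[0], np.column_stack(sweeps[1:])
    tail = slice(n - q, n)
    # Master step: dense q x q constraint solve, then superposition.
    w = np.linalg.solve(A[tail] @ H, b[tail] - A[tail] @ x_part)
    return x_part + H @ w
```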

Banded Systems Expected Speedups

Legend: speedup (X), number of superdiagonals (q), number of subdiagonals, number of unknowns (n)

[Cost table comparing Seq. LU, Seq. BMFS, and q-core BMFS across banded Gaussian elimination, forward/backward substitution, dense constraint matrix solve, and superposition; one pair of entries is marked "Same cost". The cost expressions did not survive extraction.]

• Tridiagonal should be ~2x faster than sequential LU
• Pentadiagonal should be ~8x faster than sequential LU
• Heptadiagonal should be ~18x faster than sequential LU

Banded Systems Expected Speedups

Anticipated speedup over sequential LU using various numbers of cores

[Figure: speedup (X) versus number of superdiagonals (q) for 1-core, 8-core, and q-core BMFS; two panels, n = 1,000,000 and n = 1,000,000,000.]

• 1 core is 0.5X
• 8-core is 4X

For the same number of cores, LU (e.g. multi-frontal) must scale to these levels in order to match speed

Banded Systems Expected Speedups

We know optimal locations for max speedup over sequential LU

For the same number of cores, LU (e.g. multi-frontal) must scale to these levels in order to match speed

[Figure legend: speedup, number of superdiagonals, number of subdiagonals, number of unknowns.]

Numerical Benchmarks

• Tests dependence without exponential growth
• For simplicity let's just consider symmetric cases

Implementation
• FORTRAN 90
• OpenMP with 8 cores
• PARDISO 5.0.0 [1-3] solver using 8 cores

[1] M. Luisier, O. Schenk, et al., "Fast Methods for Computing Selected Elements of the Green's Function in Massively Parallel Nanoelectronic Device Simulations," Euro-Par 2013, LNCS 8097, F. Wolf, B. Mohr, and D. an Mey (Eds.), Springer-Verlag Berlin Heidelberg, pp. 533–544, 2013.
[2] O. Schenk, M. Bollhoefer, and R. Roemer, "On Large-Scale Diagonalization Techniques for the Anderson Model of Localization," SIAM Review 50 (2008), pp. 91–112. Featured SIGEST paper, selected "on the basis of its exceptional interest to the entire SIAM community".
[3] O. Schenk, A. Waechter, and M. Hagemann, "Matching-based Preprocessing Algorithms to the Solution of Saddle-Point Problems in Large-Scale Nonconvex Interior-Point Optimization," Journal of Computational Optimization and Applications, Vol. 36, No. 2-3, pp. 321–341, April 2007.

Numerical Results with 8 Cores

Wall time was less than PARDISO in certain cases without even scaling

[Figure: FORTRAN OpenMP speedup results, PARDISO 8 cores versus BMFS 8 cores, plotted against number of superdiagonals and number of unknowns. Curves: BMFS 8-core; BMFS 8-core q x q solve; BMFS q-core scaled; PARDISO 8-core.]

Numerical Results with 8 Cores

Increased error likely due to round-off and errors inherent in the constraint matrix solve

[Figure: FORTRAN OpenMP results, PARDISO 8 cores versus BMFS 8 cores, in four panels: n = 100,000; n = 500,000; n = 1,000,000; n = 5,000,000. Curves: BMFS 8-core; BMFS 8-core q x q solve; BMFS q-core scaled; PARDISO 8-core.]

Speedup over PARDISO Solver

[Figure: speedup versus number of superdiagonals and number of unknowns, in two panels.]
• From actual wall times: 8-core BMFS vs. 8-core PARDISO
• By scaling BMFS to q cores (not the q x q solve part) vs. 8-core PARDISO

A Method to Split the Tridiagonal System

• Split at an interior equation; consider separate unknowns in the top half and in the lower half
• Split into four independent tasks, (1) through (4)
  – A modified forward substitution process for (1) and (2)
  – A modified backward substitution process for (3) and (4)

A Method to Split the Tridiagonal System

• Still built on superposition principle
• The equation at the split forms the constraint
• Goal is to determine the weights of these two solutions, each obtained from its defining system directly or by modified fwd/back sub.
• Final superposition
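One possible reading of the four-task split is sketched below. It is an assumption-laden illustration rather than the TMFBS algorithm itself: the split index `m`, the helper names, and the choice to close the system with two leftover equations (rows `m` and `m + 1`) are all hypothetical choices made so that the sketch is self-consistent.

```python
import numpy as np

def forward_half(A, rhs, m, x_first):
    """Equations 0..m-1 give unknowns 1..m, marching down from x[0]."""
    x = np.zeros(A.shape[0])
    x[0] = x_first
    for i in range(m):
        x[i + 1] = (rhs[i] - A[i, :i + 1] @ x[:i + 1]) / A[i, i + 1]
    return x

def backward_half(A, rhs, m, x_last):
    """Equations n-1..m+2 give unknowns n-2..m+1, marching up from x[n-1]."""
    n = A.shape[0]
    x = np.zeros(n)
    x[n - 1] = x_last
    for i in range(n - 1, m + 1, -1):
        x[i - 1] = (rhs[i] - A[i, i:] @ x[i:]) / A[i, i - 1]
    return x

def tmfbs(A, b, m):
    zero = np.zeros(len(b))
    top_p = forward_half(A, b, m, 0.0)      # task 1
    top_h = forward_half(A, zero, m, 1.0)   # task 2
    bot_p = backward_half(A, b, m, 0.0)     # task 3
    bot_h = backward_half(A, zero, m, 1.0)  # task 4
    rows = [m, m + 1]                       # equations not used by any sweep
    C = np.column_stack([A[rows] @ top_h, A[rows] @ bot_h])
    w = np.linalg.solve(C, b[rows] - A[rows] @ (top_p + bot_p))
    return top_p + bot_p + w[0] * top_h + w[1] * bot_h
```

Each sweep covers roughly half of the unknowns, which is where a further factor of about two over the two-task scheme would come from when all four run concurrently.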

Tridiagonal Numerical Experiments

• TMFS uses 2 parallel threads
• TMFBS uses 4 parallel threads
• 1.6X and 3.2X faster than MKL
  – DPTSV (sym, pos-def solver, LDL^T)
  – DGTSV (Gauss elim with PP)
• Slightly more error in Euclidean norm using fwd/back sub.
• Parallelization efficiency near optimal (value of unity)

Can incorporate this into banded algorithm for increased speed

Summary

• Developed a direct solver that can skip the Gaussian elimination process while solving banded linear systems [1,2]
• Built on a superposition principle
• Fastest for banded systems without exponential growth
  – Observed speedup of over 20x relative to 8-thread PARDISO when using 8 threads
  – 1.6X faster than sequential MKL DPTSV when using 2 threads
• Can handle exponential growth by incorporating nullspace and pseudoinverse calculations, but this becomes slower
• Splitting the system saw a near-ideal 2x speed increase for large systems
  – 3.2X faster than sequential MKL DPTSV when using 4 threads
• Future work involves:
  – Distributed memory/MPI/GPU computing
  – Further partitioning?
  – Extension to arbitrary bandwidth?

This leads into Part II

[1] Jandron, M., Ruffa, A., Baglama, J., "An Asynchronous Direct Solver for Banded Linear Systems," Numerical Algorithms (Submitted)
[2] Ruffa, A., Jandron, M., Toni, B., "Parallelized Solution of Banded Linear Systems," STEAM-H Springer Series Contribution (Submitted)

Part II: Modified Block LU Approach

A Method to Split the Tridiagonal System

The forward and backward substitution approach works for tridiagonal systems when each solution involves only one RHS term. (Solutions can be performed in parallel without any communication between processors.) However, when there is more than one superdiagonal, there are complications…

[Figure: matrix diagram marking the row corresponding to the single RHS term, with the backward substitution and forward substitution regions labeled.]

A Method to Split the Pentadiagonal System

The forward and backward substitution approach applied to a pentadiagonal system leads to three remaining rows and three unconstrained RHS terms. There are several ways to remove the additional two RHS terms, but they are all complicated…

[Figure: matrix diagram marking the three remaining rows and RHS terms, with the backward substitution and forward substitution regions labeled.]

Example: The Beam Vibration Problem

• Single RHS term: oscillatory response
• Three remaining RHS terms: evanescent response

Modular/Block Solution

Introduce two RHS terms so that the solution is computed only between those equations. This suppresses the exponential terms: the introduced RHS terms can be made close enough to allow the sinusoidal terms to dominate. However, a method is needed to remove the fictitious RHS terms…

[Figure: matrix diagram. A fictitious RHS term is introduced at each of two rows; the solution is computed only between those rows and is zero in the regions outside them.]

Modular/Block Solution

• The beam vibration problem (solved via finite differences) leads to a Toeplitz system having row structure [1, -4, 6+γ, -4, 1].
• Consider a system having 3001 nodes and a single nonzero RHS term in equation 2001. Fictitious nonzero RHS terms are introduced into equations 1751 & 2151, leading to
  – Equation 1751: x_1749 - 4 x_1750 + (6+γ) x_1751 - 4 x_1752 + x_1753 = b_1751
  – Equation 2151: x_2149 - 4 x_2150 + (6+γ) x_2151 - 4 x_2152 + x_2153 = b_2151
• Introducing the nonzero b_1751 term allows us to set x_1753 = b_1751 and then set x_i = 0 for 1 ≤ i ≤ 1752. In the same way, we can set x_2149 = b_2151 and then set x_i = 0 for 2150 ≤ i ≤ 3001.

Modular/Block Solution

This figure shows the modular solution for the beam problem, with RHS terms introduced in equations 1751 & 2151. The oscillatory solution component outside nodes 1753 & 2149 is exactly zero, with only an evanescent solution component. We filed an invention disclosure to apply this approach towards vibration suppression…

Modular Solution Using Backslash (1)

A key insight: implement the modular solution with existing solvers, e.g., MATLAB "backslash." This makes the approach general so that it can be used for any banded or block banded system.

[Figure: matrix diagram with the label "Compute the solution here only" and, at two locations, "Introduce fictitious RHS terms corresponding to these terms".]

Example: Beam Vibration Problem

n = 300; p = 101; q = 200
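A confined solution of this kind can be reproduced with any dense or banded solver. The sketch below uses NumPy in place of MATLAB backslash and assumes that p and q denote the first and last equation (1-based) of the block in which the solution is computed. The value of `gamma` and the location of the single load are hypothetical, since the slides do not state them.

```python
import numpy as np

def beam_matrix(n, gamma):
    """Toeplitz matrix with row structure [1, -4, 6+gamma, -4, 1]."""
    A = np.zeros((n, n))
    for offset, value in zip((-2, -1, 0, 1, 2), (1.0, -4.0, 6.0 + gamma, -4.0, 1.0)):
        A += np.diag(np.full(n - abs(offset), value), k=offset)
    return A

def confined_solution(A, b, lo, hi):
    """Solve only inside rows/columns lo..hi-1 and leave x zero elsewhere.

    Returns x and the RHS vector A @ x that this confined x actually satisfies;
    it differs from b only by fictitious terms just outside the block.
    """
    x = np.zeros(A.shape[0])
    x[lo:hi] = np.linalg.solve(A[lo:hi, lo:hi], b[lo:hi])
    return x, A @ x

n, p, q = 300, 101, 200      # sizes quoted on the slide (1-based p and q assumed)
gamma = -0.01                # hypothetical value, chosen only for illustration
A = beam_matrix(n, gamma)
b = np.zeros(n)
b[150] = 1.0                 # hypothetical single load inside the block
x, rhs = confined_solution(A, b, p - 1, q)
# Rows outside the block where a fictitious RHS term appears (0-based).
outside_rows = np.r_[np.flatnonzero(rhs[:p - 1]), q + np.flatnonzero(rhs[q:])]
```

Because the matrix is pentadiagonal, `outside_rows` can only contain the two rows immediately above the block and the two rows immediately below it.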

Confined Solution

[Figure: plot of the confined solution.]

RHS Vector

[Figure: plot of the RHS vector.]

Modular Solution Using Backslash (2)

We can solve the upper and lower systems in parallel. Each is a solution to the system. There are four overlapping RHS terms. We compute four solutions and then develop weights to superimpose them to get the specified RHS vector.

[Figure: matrix diagram showing the "upper" solution and the "lower" solution, with labels "Introduce RHS terms to upper solution corresponding to these terms" and "Introduce RHS terms to lower solution corresponding to these terms".]

Solving Two Smaller Systems in Parallel

[Figure: matrix diagram showing the "upper" solution and the "lower" solution, each with the additional terms associated with it.]

Confined Solution + RHS Vector

[Figure: four panels: upper solution, RHS vector for upper solution, lower solution, RHS vector for lower solution.]

Formal Idea of Modified Block LU

• Efficient implementation performs LU decomposition on each of the two blocks
• Then computes the solution for each right-hand side in parallel
• Matrix multiplications are also performed in parallel

Formal Idea of Modified Block LU

• Constraint matrix easy to solve using Schur complement/static condensation
• Weighted superposition
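A two-block version of this idea can be sketched as follows. It is an illustration under assumptions, not the authors' formulation: `k` (size of the upper block) and `q` (half bandwidth) are hypothetical parameter names, both diagonal blocks are assumed nonsingular, and `np.linalg.solve` with several right-hand sides stands in for an explicit LU factorization of each block.

```python
import numpy as np

def modified_block_lu(A, b, k, q):
    """Two-block sketch: independent block solves plus a small constraint solve."""
    n = A.shape[0]
    up, lo = slice(0, k), slice(k, n)
    # Interface rows: last q rows of the upper block, first q of the lower block.
    iface = np.arange(k - q, k + q)
    E = np.zeros((n, 2 * q))
    E[iface, np.arange(2 * q)] = 1.0            # unit RHS terms at the interface
    X = np.zeros((n, 2 * q + 1))
    # Confined solutions: each block sees only its own rows and columns.
    X[up, 0] = np.linalg.solve(A[up, up], b[up])
    X[lo, 0] = np.linalg.solve(A[lo, lo], b[lo])
    X[up, 1:q + 1] = np.linalg.solve(A[up, up], E[up, :q])
    X[lo, q + 1:] = np.linalg.solve(A[lo, lo], E[lo, q:])
    # RHS vectors actually satisfied by each confined solution.
    R = A @ X
    # Constraint matrix (2q x 2q): choose weights so the interface rows match b.
    w = np.linalg.solve(R[iface, 1:], b[iface] - R[iface, 0])
    return X[:, 0] + X[:, 1:] @ w               # weighted superposition
```

For a pentadiagonal matrix (q = 2) this uses four unit right-hand sides, in line with the "four overlapping RHS terms" mentioned earlier; the upper-block and lower-block solves never touch each other's data and could run concurrently.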

Pentadiagonal Case

Key: the number of solutions needed depends only on the number of subdiagonals and the number of superdiagonals

[Figure: solution (x) and error (b-Ax) versus unknown (k), k = 0 to 20, comparing Backslash and Modified Block LU; the error axis is scaled by 10^-11.]

Demonstration with Notional FEA Model

• Notional FEA model: 2261 linear 3D shell elements, 2280 nodes
• Original system 13,476 x 13,476, reordered to half bandwidth q = 124
• Constraint matrix 250 x 250
• 250 parallel solutions; optimal with 250 cores

[Figure: error (b-Ax) versus unknown (k), k = 0 to 15000, comparing Backslash and Modified Block LU; the error axis is scaled by 10^-14.]

Pentadiagonal Case with 3 x 3 Grid

[Figure: error (b-Ax) and solution (x) versus unknown (k), k = 0 to 30, comparing Backslash and Modified Block LU; the error axis is scaled by 10^-10.]

3D Structured Mesh Case with 3 x 3 Grid

[Figure: solution (x) and error (b-Ax) versus unknown (k), k = 0 to 2500, comparing Backslash and Modified Block LU; the solution axis is scaled by 10^-8 and the error axis by 10^-12.]

3D Tetrahedral Mesh Case with 3 x 3 Grid

[Figure: error (b-Ax) and solution (x) versus unknown (k), k = 0 to 2500, comparing Backslash and Modified Block LU; the error axis is scaled by 10^-14 and the solution axis by 10^-11.]

Conclusions

• Developing a general Block LU solver built on a superposition principle
• When implemented in parallel, the cost should be less than existing methods
• Future work:
  – Thorough cost analysis
  – GPU/MPI implementation
  – Extension to arbitrary partitioning
  – Comparison against banded solver SPIKE as well as sparse solvers PARDISO, MUMPS
• End goal is to develop a competitive direct solver for banded systems with an eye on FEA applications

BACKUP


Algorithm for Tridiagonal Modified Forward Substitution (TMFS)


Algorithm for Tridiagonal Modified Forward and Backward Substitution (TMFBS)

Can be implemented recursively

• The number of parallel tasks required depends on the half bandwidth of the system and the number of levels
• Gray regions denote unreferenced regions at that particular level

[Figure: matrix partitioning at the first, second, and third levels.]