State of the art distributed parallel computational techniques in industrial finite element analysis

Dr. Louis Komzsik
Siemens PLM Software, USA

Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering (PARENG 2011)
Ajaccio, France, April 12-15, 2011



Scope of presentation

• Introduction to industrial analysis
• Geometric domain decomposition
• Distributed computational solutions
• Parallel computational kernels
• Application case studies
• Conclusions and future work


Industrial complexity – constantly increasing

• Engine block: 1,000,000 elements
• Car: 30,000 parts
• Jet engine: 10,000 parts
• Factory: 10,000 machines


Computer hardware – constantly changing

• Cray Computer: $15 million, O(1) gigaflops, 1000 sold
• Multi-core CPU: $150, O(100) gigaflops, 100 million sold


Lifecycle simulations

Designer view

Analyst view


Multidisciplinary solutions

Designer view

Analyst view


High performance requirements

The constrained stiffness matrix of an analysis problem:

• Number of rows: 35,734,709
• Nonzero terms: 1,384,305,995
• Nonzero terms in sparse factor matrix: 43,827,004,000
• Memory used during factorization: 1,080,732,000 (4 byte) words
• Actual elapsed time of sparse factorization on a single high performance processor: 335 minutes
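As a rough reading of these figures, the arithmetic below derives the average number of terms per row, the fill-in ratio of the factor, and the factorization memory in bytes. It is a back-of-the-envelope sketch: only the five input numbers come from the slide, and the variable names are illustrative.

```python
# Figures quoted on the slide (the only inputs taken from the source).
rows = 35_734_709
nonzeros_matrix = 1_384_305_995
nonzeros_factor = 43_827_004_000
memory_words = 1_080_732_000      # 4-byte words
elapsed_minutes = 335

# Derived quantities (illustrative names, not from the source).
terms_per_row = nonzeros_matrix / rows            # average matrix terms per row
fill_ratio = nonzeros_factor / nonzeros_matrix    # factor terms per matrix term
memory_bytes = memory_words * 4                   # 4 bytes per word
memory_gib = memory_bytes / 2**30
elapsed_hours = elapsed_minutes / 60

print(f"average terms per row : {terms_per_row:.1f}")
print(f"fill-in ratio         : {fill_ratio:.1f}")
print(f"factorization memory  : {memory_gib:.2f} GiB")
print(f"elapsed time          : {elapsed_hours:.1f} hours")
```

The fill-in ratio is the quantity that reordering tries to keep small, and it is why the factor, not the matrix itself, dominates the cost.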

Geometric domain decomposition

Single level geometric domain decomposition

• Subdivide large geometry domains into a limited number of partitions
• Computations in the geometry partitions are dependent
• Minimize the boundary size of each partition with respect to its interior
• Minimize the total boundary size, as communication is needed

[Figure: partitions assigned to Proc 1, Proc 2, ..., Proc k]
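To make the boundary-size criterion concrete, the sketch below counts cut edges and boundary vertices for two ways of splitting the same small grid between two processors. The 4 × 4 grid, the function names and both splits are hypothetical examples, not taken from the presentation.

```python
from itertools import product

def grid_edges(n):
    """Edges of an n x n grid graph; vertices are (row, col) pairs."""
    edges = []
    for r, c in product(range(n), repeat=2):
        if c + 1 < n:
            edges.append(((r, c), (r, c + 1)))
        if r + 1 < n:
            edges.append(((r, c), (r + 1, c)))
    return edges

def boundary_stats(edges, part):
    """Return (cut edges, boundary vertices) for a vertex -> partition map."""
    cut = [(u, v) for u, v in edges if part[u] != part[v]]
    boundary = {w for edge in cut for w in edge}
    return len(cut), len(boundary)

n = 4
edges = grid_edges(n)
vertices = list(product(range(n), repeat=2))

# Two ways to split the same grid between two processors.
halves = {v: 0 if v[1] < n // 2 else 1 for v in vertices}   # left / right blocks
stripes = {v: v[1] % 2 for v in vertices}                   # alternating columns

print("halves :", boundary_stats(edges, halves))
print("stripes:", boundary_stats(edges, stripes))
```

Both splits give each processor eight vertices, but the compact halves touch far fewer boundary vertices, so they need less communication.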


Multi-level geometry domain decomposition

Single level

• Subdivide large geometry domains into a limited number of partitions
• Subdivide the partitions into sub-partitions and dynamically reduce them to their collectors
• Assemble the multilevel substructures to obtain the engineering solution
• The total number of substructures may exceed the number of processors


Finite element problem domain decomposition

Based on model or matrices

Graph        Matrix           FE model
Vertices     Diagonal terms   Node points
Edges        Off-diagonals    Elements
Undirected   Symmetric        Linear


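Graphs and matrices

Finite element model and its stiffness matrix

[Figure: Membrane Element 1 and Membrane Element 2 on node points 1 to 5, with the 5 × 5 stiffness matrix K whose terms are multiples of k]

Graph model and its Laplacian matrix

[Figure: graph on vertices 1 to 5]

$$
L = \begin{bmatrix}
 2 & -1 &  0 & -1 &  0 \\
-1 &  2 &  0 & -1 &  0 \\
 0 &  0 &  2 & -1 & -1 \\
-1 & -1 & -1 &  4 & -1 \\
 0 &  0 & -1 & -1 &  2
\end{bmatrix}
$$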
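The correspondence between a graph and its Laplacian can be sketched in a few lines: each edge contributes -1 to two off-diagonal terms and +1 to two diagonal terms. The edge list below is read off the off-diagonal terms of the Laplacian on this slide, which is an assumption about the figure; the function name is illustrative.

```python
def laplacian(n, edges):
    """Graph Laplacian: vertex degree on the diagonal, -1 for each edge."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:            # vertices are numbered from 1, as on the slide
        L[i - 1][j - 1] -= 1
        L[j - 1][i - 1] -= 1
        L[i - 1][i - 1] += 1
        L[j - 1][j - 1] += 1
    return L

# Edge list inferred from the off-diagonal terms of the slide's Laplacian.
edges = [(1, 2), (1, 4), (2, 4), (3, 4), (3, 5), (4, 5)]
L = laplacian(5, edges)

for row in L:
    print(row)
```

Every row sums to zero, which is why the constant vector is always an eigenvector with eigenvalue 0.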


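Partitioning technology

Spectral bisection method

$$
L u_2 = \lambda_2 u_2: \quad
\begin{bmatrix}
 2 & -1 &  0 & -1 &  0 \\
-1 &  2 &  0 & -1 &  0 \\
 0 &  0 &  2 & -1 & -1 \\
-1 & -1 & -1 &  4 & -1 \\
 0 &  0 & -1 & -1 &  2
\end{bmatrix}
\begin{bmatrix} 1/2 \\ 1/2 \\ -1/2 \\ 0 \\ -1/2 \end{bmatrix}
= 1 \cdot
\begin{bmatrix} 1/2 \\ 1/2 \\ -1/2 \\ 0 \\ -1/2 \end{bmatrix}
$$

Vertex cut result

[Figure: graph on vertices 1 to 5 and its vertex cut]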
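The spectral bisection step can be reproduced numerically. The sketch below approximates the second eigenpair of the five-vertex Laplacian by power iteration on a shifted matrix, then splits the vertices by the sign of the eigenvector. Power iteration is a stand-in chosen to keep the example dependency-free; the presentation does not say which eigensolver is used, and the names and tolerance are assumptions.

```python
import math

L = [[ 2, -1,  0, -1,  0],
     [-1,  2,  0, -1,  0],
     [ 0,  0,  2, -1, -1],
     [-1, -1, -1,  4, -1],
     [ 0,  0, -1, -1,  2]]

def fiedler(L, iterations=200):
    """Approximate (lambda_2, u_2) by power iteration on (c*I - L)."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))   # Gershgorin bound on the largest eigenvalue
    u = [float(i) for i in range(n)]         # arbitrary start vector
    for _ in range(iterations):
        mean = sum(u) / n
        u = [x - mean for x in u]            # remove the constant (lambda = 0) mode
        u = [c * u[i] - sum(L[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    lam = sum(u[i] * L[i][j] * u[j] for i in range(n) for j in range(n))
    return lam, u

lam, u = fiedler(L)
tol = 1e-8
separator = [i + 1 for i, x in enumerate(u) if abs(x) <= tol]
side_a = [i + 1 for i, x in enumerate(u) if x < -tol]
side_b = [i + 1 for i, x in enumerate(u) if x > tol]

print("lambda_2 :", round(lam, 6))
print("separator:", separator)
print("sides    :", side_a, side_b)
```

Vertices with a zero component belong to neither side and form the vertex cut; the overall sign of the eigenvector is arbitrary, so only the grouping is meaningful.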


Recursive graph partitioning

Coarsening, partitioning and refining phases

[Figure: a graph on vertices 1 to 9 passes through the coarsening, partitioning and refining phases, yielding Partition 1 and Partition 2]
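The coarsening phase can be illustrated with one step of greedy matching: matched vertex pairs are merged into a single coarse vertex, and the coarse graph keeps an edge wherever the merged groups were connected. This is a minimal sketch of a common multilevel scheme, not necessarily the one used in the presentation; the path graph and all names are hypothetical.

```python
def coarsen(adj):
    """One coarsening step: merge matched vertex pairs into coarse vertices.

    adj maps each vertex to the set of its neighbours.
    Returns (coarse_of, coarse_adj).
    """
    coarse_of = {}
    next_id = 0
    for v in sorted(adj):
        if v in coarse_of:
            continue
        coarse_of[v] = next_id
        # Greedy matching: pair v with its first still-unmatched neighbour.
        for w in sorted(adj[v]):
            if w not in coarse_of:
                coarse_of[w] = next_id
                break
        next_id += 1
    coarse_adj = {c: set() for c in range(next_id)}
    for v, neighbours in adj.items():
        for w in neighbours:
            if coarse_of[v] != coarse_of[w]:
                coarse_adj[coarse_of[v]].add(coarse_of[w])
    return coarse_of, coarse_adj

# Hypothetical example graph: the path 1-2-3-4-5-6 (not the graph in the figure).
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
coarse_of, coarse_adj = coarsen(adj)

print(coarse_of)
print(coarse_adj)
```

Repeating the step shrinks the graph until it is cheap to partition; the partition is then projected back and refined level by level.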

Distributed computational solutions


Distributed memory parallel architecture

• Cluster of high performance workstations
• Distributed memory workstation
• Dedicated I/O devices
• High level parallelism
• Feasible number of nodes: 16-1024


Recursive matrix partitioning

[Figure: the geometric problem (graph on vertices 1 to 9) and its partitioning hierarchy (components 1 to 7)]


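Distributed normal modes analysis

Physical problem

$$
(K - \lambda M)\,\Phi = 0
$$

Partitioned form

$$
\begin{bmatrix}
K^{oo}_{1} - \lambda M^{oo}_{1} & & K^{ot}_{1,3} - \lambda M^{ot}_{1,3} & & & & \\
& K^{oo}_{2} - \lambda M^{oo}_{2} & K^{ot}_{2,3} - \lambda M^{ot}_{2,3} & & & & \\
& & K^{tt}_{3} - \lambda M^{tt}_{3} & & & & K^{tt}_{3,7} - \lambda M^{tt}_{3,7} \\
& & & K^{oo}_{4} - \lambda M^{oo}_{4} & & K^{ot}_{4,6} - \lambda M^{ot}_{4,6} & \\
& & & & K^{oo}_{5} - \lambda M^{oo}_{5} & K^{ot}_{5,6} - \lambda M^{ot}_{5,6} & \\
& & & & & K^{tt}_{6} - \lambda M^{tt}_{6} & K^{tt}_{6,7} - \lambda M^{tt}_{6,7} \\
\text{sym.} & & & & & & K^{tt}_{7} - \lambda M^{tt}_{7}
\end{bmatrix}
\begin{bmatrix}
\varphi^{o}_{1} \\ \varphi^{o}_{2} \\ \varphi^{t}_{3} \\ \varphi^{o}_{4} \\ \varphi^{o}_{5} \\ \varphi^{t}_{6} \\ \varphi^{t}_{7}
\end{bmatrix}
= 0
$$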
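The block structure of the partitioned form follows directly from the partitioning hierarchy: each component has a diagonal block and one coupling block to its collector. The sketch below regenerates that pattern from a parent map. The hierarchy is read off the block subscripts (1 and 2 collected by 3, 4 and 5 by 6, 3 and 6 by 7), and reading `o` as an interior component and `t` as a collector (boundary) component is an assumption; the function names are illustrative.

```python
# Partitioning hierarchy inferred from the block subscripts of the partitioned form.
parent = {1: 3, 2: 3, 3: 7, 4: 6, 5: 6, 6: 7, 7: None}

def block_pattern(parent):
    """Nonzero (row, column) blocks in the upper triangle of the partitioned matrix."""
    blocks = []
    for i in sorted(parent):
        blocks.append((i, i))                  # K_i - lambda * M_i
        if parent[i] is not None:
            blocks.append((i, parent[i]))      # coupling to the collector
    return blocks

def kind(i, parent):
    """'t' for a collector (it has children), 'o' for a leaf component."""
    return "t" if i in parent.values() else "o"

for i, j in block_pattern(parent):
    print(f"block ({i},{j}): superscript {kind(i, parent)}{kind(j, parent)}")
```

Leaf components never couple to each other directly, which is what allows them to be processed on separate processors.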


Phase 1

Start: Processor 1, Processor 2, Processor 3, Processor 4

Communicate


Phase 2

Start: Processors 1-2, Processors 3-4

Communicate


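Phase 3

Start: Processors 1-2-3-4

Solve reduced order problem

$$
(\tilde{K} - \lambda \tilde{M})\,\tilde{\Phi} = 0
$$

Recover physical solution

[Equation: the reduced solution \tilde{\Phi} and the generalized coordinates q are mapped back to the physical components \varphi^{o}_{1}, \varphi^{o}_{2}, \varphi^{t}_{3}, \varphi^{o}_{4}, \varphi^{o}_{5}, \varphi^{t}_{6}, \varphi^{t}_{7} of \Phi]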
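The three phases correspond to the levels of the partitioning hierarchy, processed from the leaves up. The sketch below derives that schedule from a parent map and lists which processors cooperate on each component. The hierarchy and the assignment of leaf components 1, 2, 4, 5 to processors 1 to 4 are assumptions chosen to match the phase slides; all names are illustrative.

```python
# Assumed hierarchy: 1 and 2 collected by 3, 4 and 5 by 6, 3 and 6 by 7.
parent = {1: 3, 2: 3, 3: 7, 4: 6, 5: 6, 6: 7, 7: None}
children = {i: sorted(c for c, p in parent.items() if p == i) for i in parent}
leaf_owner = {1: 1, 2: 2, 4: 3, 5: 4}   # assumed leaf component -> processor map

def height(i):
    """Leaves have height 1; a collector sits one level above its children."""
    return 1 + max((height(c) for c in children[i]), default=0)

def owners(i):
    """Processors cooperating on component i: the owners of the leaves below it."""
    if not children[i]:
        return [leaf_owner[i]]
    return sorted(p for c in children[i] for p in owners(c))

schedule = {}
for i in sorted(parent):
    schedule.setdefault(height(i), []).append((i, owners(i)))

for phase in sorted(schedule):
    print(f"Phase {phase}:", schedule[phase])
```

Between phases, the processors that shared a collector exchange their reduced contributions, which is the communication step shown on the slides.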

Parallel computational kernels


Shared memory parallel architecture

• Multi-core processors
• Shared cache
• Shared memory
• Low level parallelism
• Feasible number of cores: 2-16


Sparse factorization

Matrix connectivity Reordering

Elimination tree Factorization
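The elimination tree records, for each column of the factor, the row of its first below-diagonal nonzero term. A minimal sketch of the classic construction with path compression (commonly attributed to Liu) follows. The 5 × 5 pattern, the input layout and the function name are assumptions for illustration, not taken from the presentation.

```python
def elimination_tree(n, upper):
    """Parent array of the elimination tree of a symmetric sparse matrix.

    upper[j] lists the rows i < j with a nonzero term in column j.
    Roots get parent -1. `ancestor` provides path compression.
    """
    parent = [-1] * n
    ancestor = [-1] * n
    for j in range(n):
        for i in upper[j]:
            while i != -1 and i < j:
                next_i = ancestor[i]
                ancestor[i] = j          # compress the path towards column j
                if next_i == -1:
                    parent[i] = j        # i had no ancestor yet: j becomes its parent
                i = next_i
    return parent

# Hypothetical 5 x 5 pattern (0-based): two triangles sharing vertex 3.
upper = {0: [], 1: [0], 2: [], 3: [0, 1, 2], 4: [2, 3]}
parent = elimination_tree(5, upper)
print(parent)
```

Columns in different subtrees of the tree do not depend on each other, so they can be factorized in parallel.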


Multifrontal factorization

Sparsity pattern

Frontal steps

Front amalgamation


Supernodal approach

• Symbolic reordering
• Consecutive columns
• Same sparsity pattern
• Cache fitting size
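Supernode detection can be sketched as a single pass over the column patterns of the factor: consecutive columns are grouped while they share the same below-diagonal pattern. The factor pattern, the function name and the `max_width` cap (a stand-in for a cache-fitting limit) are hypothetical; the presentation does not give its grouping rule.

```python
def supernodes(struct, max_width=None):
    """Group consecutive columns of the factor that share a sparsity pattern.

    struct[j] is the set of row indices below the diagonal in column j.
    Column j joins the supernode of column j-1 when
    struct[j-1] == {j} | struct[j]; max_width optionally caps the group size.
    """
    groups = [[0]]
    for j in range(1, len(struct)):
        same_pattern = struct[j - 1] == {j} | struct[j]
        fits = max_width is None or len(groups[-1]) < max_width
        if same_pattern and fits:
            groups[-1].append(j)
        else:
            groups.append([j])
    return groups

# Hypothetical factor pattern of a 5 x 5 matrix (0-based column indices).
struct = [{1, 3}, {3}, {3, 4}, {4}, set()]

print(supernodes(struct))
print(supernodes(struct, max_width=2))
```

Treating each group as one dense block lets the factorization use dense matrix kernels instead of column-by-column sparse updates.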


Matrix update

Panel selection

Downstream columns

Different sparsity pattern

BLAS 2.5 operation

Application case studies


High performance workstation cluster

111 IBM P575 nodes with 1.9 GHz
4 dual-core POWER5 CPUs per node

3.5 Terabyte aggregate memory
100 Terabyte total disk space

IBM High Performance Switch (HPS)
8 GB/sec bidirectional bandwidth

AIX OS Version 5.3
Parallel Environment (PE) V4.2


Trimmed car body application

Shell element model

• 1.3 M grid points
• 1.2 M shell elements
• 7.9 M degrees of freedom

Normal modes analysis

• Frequency: 0 – 300 Hz
• ~1000 normal modes
• 512 partitions


Shortening solution time

Number of DMP processes   Speed up
Serial                    1.0
1                         4.0
2                         7.8
4                         29.3
8                         49.2
16                        77.5
32                        96.5
64                        104.1
128                       105.9
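The chart values can be turned into parallel efficiency figures. The sketch below measures speed-up relative to the 1-process distributed run, so that the algorithmic gain over the serial run is factored out. Pairing each chart value with a process count in reading order is an assumption, as are the function names.

```python
# Speed-up values from the chart, relative to the serial run.
# The pairing with process counts (in reading order) is an assumption.
speedup = {1: 4.0, 2: 7.8, 4: 29.3, 8: 49.2, 16: 77.5, 32: 96.5, 64: 104.1, 128: 105.9}

def relative_speedup(p):
    """Speed-up of the p-process run over the 1-process distributed run."""
    return speedup[p] / speedup[1]

def efficiency(p):
    """Parallel efficiency: relative speed-up divided by the process count."""
    return relative_speedup(p) / p

for p in speedup:
    print(f"{p:4d} processes: relative speed-up {relative_speedup(p):6.2f}, "
          f"efficiency {efficiency(p):5.2f}")
```

Read this way, the gain flattens beyond 32 processes: doubling from 64 to 128 processes adds very little.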


Increased fidelity of analysis

Frequency range (Hz)   Solution time (normalized)   Number of modes (normalized)
0 - 100                1.00                         1.00
0 - 200                1.08                         2.41
0 - 300                1.21                         4.67
0 - 400                1.34                         7.44
0 - 500                1.55                         10.93
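One way to read the fidelity chart is as modes obtained per unit of solution time. The short calculation below assumes that the slowly growing series is the normalized solution time and the fast growing series is the normalized number of modes; that assignment, like the variable names, is an interpretation of the chart.

```python
frequency_ranges = ["0 - 100", "0 - 200", "0 - 300", "0 - 400", "0 - 500"]
solution_time = [1.00, 1.08, 1.21, 1.34, 1.55]      # normalized, assumed series
number_of_modes = [1.00, 2.41, 4.67, 7.44, 10.93]   # normalized, assumed series

# Modes computed per unit of solution time, relative to the 0 - 100 Hz run.
modes_per_time = [m / t for m, t in zip(number_of_modes, solution_time)]

for label, ratio in zip(frequency_ranges, modes_per_time):
    print(f"{label:>8} Hz: {ratio:.2f} x modes per unit of solution time")
```

Under that reading, widening the range to 0 - 500 Hz yields about eleven times the modes for roughly one and a half times the solution time.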


Distributed memory workstation

HP Proliant DL320G5 server

64 dual core (1.85 GHz) Xeon CPUs

50GB local SATA disks per node

4 GB memory per node

GigE interconnect with HP MPI

Suse Linux Version 10.3


Automotive engine application

Solid element model

• 3.6 M grid points
• 2.3 M tetrahedral elements
• 10.8 M degrees of freedom

Normal modes analysis

• Frequency: 0 – 10,000 Hz
• ~250 normal modes
• 256 partitions


Shortening solution time

Number of DMP processes   Speed up
Serial                    1.00
1                         4.00
2                         7.11
4                         12.47
8                         17.15
16                        25.78
32                        34.58
64                        49.27
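A complementary view of these values is the Karp-Flatt metric, which estimates the fraction of the run that behaves as if it were serial. The sketch applies it to speed-ups measured relative to the 1-process distributed run. The metric is a standard diagnostic brought in here for illustration, not something the presentation uses, and the pairing of values with process counts is an assumption.

```python
# Speed-up values from the chart, relative to the serial run (assumed pairing).
speedup = {1: 4.00, 2: 7.11, 4: 12.47, 8: 17.15, 16: 25.78, 32: 34.58, 64: 49.27}

def karp_flatt(p):
    """Experimentally determined serial fraction for p > 1 processes."""
    s = speedup[p] / speedup[1]          # speed-up relative to the 1-process run
    return (1 / s - 1 / p) / (1 - 1 / p)

for p in speedup:
    if p > 1:
        print(f"{p:3d} processes: serial fraction estimate {karp_flatt(p):.3f}")
```

A falling estimate as the process count grows would suggest that overhead, rather than a fixed serial part, limits the speed-up on this cluster.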


Increased fidelity of analysis

Frequency range (Hz)   Solution time (normalized)   Number of modes (normalized)
0 - 10,000             1.00                         1.00
0 - 20,000             1.25                         2.95
0 - 30,000             1.28                         5.61
0 - 40,000             1.32                         8.79
0 - 50,000             1.34                         12.57

Conclusions and future work


Conclusions

• Geometric domain decomposition technologies provide the basis for distributed solutions on modern hardware
• Recursive computational solutions can support a wide range of engineering analyses with practically acceptable accuracy
• The handling of the local matrix operations with multi-core processors contributes to the overall performance gain
• The performance advantages of distributed computational solutions are significant and tremendously accelerate the engineering work


Future work

• Extending the distributed finite element technology to a grid computing environment
• Overcoming the lack of a node-to-node communication mechanism with a high-speed network
• Minimizing the need for a high-bandwidth connection between the local nodes and storage devices
• Synchronizing the completion of components of similar computational complexity in a non-homogeneous grid environment


Thank you for your attention!

www.siemens.com

www.siemens.com/plm

www.siemens.com/plm/nxnastran

Siemens and the Siemens logo are registered trademarks of Siemens AG. NX is a registered trademark of Siemens PLM Software Inc. in the United States and in other countries.

NASTRAN is a registered trademark of the National Aeronautics and Space Administration.

SpaceShip One pictures by courtesy and permission of Quartus Engineering Inc.