1 Louis Komzsik PARENG- 2011
State of the art distributed parallel computational techniques in industrial finite element analysis
Second Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering
Dr. Louis Komzsik
Siemens PLM Software, USA
Ajaccio, France
April 12-15, 2011

Scope of presentation
Introduction to industrial analysis
Geometric domain decomposition
Distributed computational solutions
Parallel computational kernels
Application case studies
Conclusions and future work

Industrial complexity – constantly increasing
Engine block: 1,000,000 elements
Car: 30,000 parts
Jet engine: 10,000 parts
Factory: 10,000 machines

Computer hardware – constantly changing

              Cray computer     Multi-core CPU
Price         $15 million       $150
Performance   O(1) gigaflops    O(100) gigaflops
Units sold    1000              100 million

Lifecycle simulations
Designer view
Analyst view

Multidisciplinary solutions
Designer view
Analyst view

High performance requirements
The constrained stiffness matrix of an analysis problem
• Number of rows: 35,734,709
• Nonzero terms: 1,384,305,995
• Nonzero terms in sparse factor matrix: 43,827,004,000
• Memory used during factorization: 1,080,732,000 (4 byte) words
• Actual elapsed time of sparse factorization on a single high performance processor: 335 minutes
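
To put the figures above in proportion, the following short sketch derives a few ratios from them. Only the five input numbers come from the slide; the derived quantities (average nonzeros per row, fill-in ratio, memory in gigabytes) are illustrative arithmetic and were not reported by the author.

```python
# Figures quoted on the slide
rows = 35_734_709
nnz_matrix = 1_384_305_995
nnz_factor = 43_827_004_000
memory_words = 1_080_732_000   # 4-byte words
elapsed_minutes = 335

# Derived quantities (not on the slide)
avg_nnz_per_row = nnz_matrix / rows        # average number of stored terms per row
fill_ratio = nnz_factor / nnz_matrix       # growth of nonzeros caused by factorization
memory_gb = memory_words * 4 / 1e9         # decimal gigabytes
elapsed_hours = elapsed_minutes / 60

print(f"average nonzeros per row: {avg_nnz_per_row:.2f}")
print(f"fill-in ratio:            {fill_ratio:.2f}")
print(f"factorization memory:     {memory_gb:.2f} GB")
print(f"elapsed time:             {elapsed_hours:.2f} hours")
```

The factor holds roughly thirty times more nonzero terms than the matrix itself, which is why the factorization dominates the cost of the analysis.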

Geometric domain decomposition

Single level geometric domain decomposition
• Subdivide large geometry domains into a limited number of partitions
• Computations in the geometry partitions are dependent
• Minimize the boundary size of each partition with respect to its interior
• Minimize the total boundary size, because the boundaries require communication
[Figure: partitions assigned to Proc 1, Proc 2, ..., Proc k]
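
The effect of partition shape on the boundary size can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not the author's partitioner: it uses a hypothetical 8 x 8 grid of nodes, splits it into four partitions in two different ways, and counts the grid edges that cross partition boundaries as a stand-in for communication volume.

```python
def cut_edges(n, part_of):
    """Count edges of an n x n grid graph whose end points lie in different partitions."""
    cut = 0
    for i in range(n):
        for j in range(n):
            if i + 1 < n and part_of(i, j) != part_of(i + 1, j):
                cut += 1
            if j + 1 < n and part_of(i, j) != part_of(i, j + 1):
                cut += 1
    return cut


def strips(i, j):
    """Four strips of 2 x 8 nodes each."""
    return i // 2


def blocks(i, j):
    """Four square blocks of 4 x 4 nodes each."""
    return (i // 4, j // 4)


strip_cut = cut_edges(8, strips)
block_cut = cut_edges(8, blocks)
print("edges cut by strips:", strip_cut)
print("edges cut by blocks:", block_cut)
```

Both layouts give each partition 16 nodes, yet the compact blocks cut fewer edges than the elongated strips, which is the reason for minimizing the boundary relative to the interior.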

Multi-level geometry domain decomposition
• Single level: subdivide large geometry domains into a limited number of partitions
• Subdivide the partitions into sub-partitions and dynamically reduce them to their collectors
• Assemble the multilevel substructures to obtain the engineering solution
• The total number of substructures may exceed the number of processors

Finite element problem domain decomposition
Based on model or matrices

Graph        Matrix            FE model
Vertices     Diagonal terms    Node points
Edges        Off-diagonals     Elements
Undirected   Symmetric         Linear
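
Graphs and matrices
Finite element model and its stiffness matrix
[Figure: finite element model with node points 1 to 5, Membrane Element 1 and Membrane Element 2]
\[
K =
\begin{bmatrix}
k & k & 0 & k & 0 \\
k & k & 0 & k & 0 \\
0 & 0 & k & k & k \\
k & k & k & k & k \\
0 & 0 & k & k & k
\end{bmatrix}
=
\begin{bmatrix}
 2 & -3 &  0 &  1 &  0 \\
-3 &  6 &  0 & -3 &  0 \\
 0 &  0 &  2 & -3 &  1 \\
 1 & -3 & -3 &  8 & -3 \\
 0 &  0 &  1 & -3 &  2
\end{bmatrix}
\]
Graph model and its Laplacian matrix
[Figure: graph with vertices 1 to 5]
\[
L =
\begin{bmatrix}
 2 & -1 &  0 & -1 &  0 \\
-1 &  2 &  0 & -1 &  0 \\
 0 &  0 &  2 & -1 & -1 \\
-1 & -1 & -1 &  4 & -1 \\
 0 &  0 & -1 & -1 &  2
\end{bmatrix}
\]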
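
The correspondence between elements, graph edges and matrix terms can be sketched in a few lines. The element connectivity used below, triangles (1, 2, 4) and (3, 4, 5), is an assumption inferred from the nonzero pattern of the Laplacian matrix on the slide; the variable names are illustrative.

```python
import numpy as np

# Assumption: the two membrane elements connect node points (1, 2, 4) and (3, 4, 5)
elements = [(1, 2, 4), (3, 4, 5)]
n = 5

# Every pair of node points sharing an element becomes one undirected graph edge
edges = set()
for elem in elements:
    for i, a in enumerate(elem):
        for b in elem[i + 1:]:
            edges.add((min(a, b), max(a, b)))

# Laplacian: vertex degree on the diagonal, -1 for every edge off the diagonal
L = np.zeros((n, n), dtype=int)
for a, b in edges:
    L[a - 1, b - 1] = L[b - 1, a - 1] = -1
    L[a - 1, a - 1] += 1
    L[b - 1, b - 1] += 1

print(L)
```

The nonzero pattern of the result is also the nonzero pattern of the assembled stiffness matrix, which is why the partitioning can work on either the model or the matrices.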
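
Partitioning technology
Spectral bisection method
[Figure: graph with vertices 1 to 5]
\[
L u_2 = \lambda_2 u_2:
\quad
\begin{bmatrix}
 2 & -1 &  0 & -1 &  0 \\
-1 &  2 &  0 & -1 &  0 \\
 0 &  0 &  2 & -1 & -1 \\
-1 & -1 & -1 &  4 & -1 \\
 0 &  0 & -1 & -1 &  2
\end{bmatrix}
\begin{bmatrix} 1/2 \\ 1/2 \\ -1/2 \\ 0 \\ -1/2 \end{bmatrix}
= 1 \cdot
\begin{bmatrix} 1/2 \\ 1/2 \\ -1/2 \\ 0 \\ -1/2 \end{bmatrix}
\]
Vertex cut result
[Figure: the graph separated into two partitions by the vertex cut]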
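
The spectral bisection step can be reproduced numerically. This is a minimal sketch using NumPy's symmetric eigensolver on the 5 x 5 Laplacian of the slide; the tolerance `tol` and the rule of treating a zero component as the separator vertex are assumptions made for the illustration, not details given in the presentation.

```python
import numpy as np

# Laplacian matrix of the five-vertex graph from the slide
L = np.array([
    [ 2, -1,  0, -1,  0],
    [-1,  2,  0, -1,  0],
    [ 0,  0,  2, -1, -1],
    [-1, -1, -1,  4, -1],
    [ 0,  0, -1, -1,  2],
], dtype=float)

# eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(L)
lam2 = eigvals[1]       # second smallest eigenvalue
u2 = eigvecs[:, 1]      # corresponding eigenvector, unit length, sign arbitrary

tol = 1e-9              # assumption: components below tol count as zero
separator = [i + 1 for i, x in enumerate(u2) if abs(x) <= tol]
side_a = [i + 1 for i, x in enumerate(u2) if x > tol]
side_b = [i + 1 for i, x in enumerate(u2) if x < -tol]

print("lambda_2 =", round(lam2, 6))
print("separator vertices:", separator)
print("partitions:", side_a, side_b)
```

The sign of each eigenvector component assigns its vertex to one side; the overall sign of the eigenvector is arbitrary, so the two partitions may be printed in either order.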

Recursive graph partitioning
Coarsening, partitioning and refining phases
[Figure: a nine-vertex graph taken through the coarsening, partitioning and refining phases, ending in Partition 1 and Partition 2]
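
A single coarsening pass can be sketched as follows. The slide does not state which matching rule is used, so this example assumes a simple greedy matching in which each unmatched vertex is merged with its first unmatched neighbor; the function name `coarsen` and the six-vertex path graph are hypothetical.

```python
def coarsen(n, edges):
    """One coarsening pass: merge each unmatched vertex with an unmatched neighbor.

    Returns the fine-to-coarse vertex map and the sorted edge list of the coarse graph.
    """
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    coarse_of = {}
    next_id = 0
    for v in range(n):
        if v in coarse_of:
            continue
        coarse_of[v] = next_id
        for w in adj[v]:
            if w not in coarse_of:
                coarse_of[w] = next_id   # collapse the edge (v, w)
                break
        next_id += 1

    coarse_edges = {
        tuple(sorted((coarse_of[a], coarse_of[b])))
        for a, b in edges
        if coarse_of[a] != coarse_of[b]
    }
    return coarse_of, sorted(coarse_edges)


# Hypothetical example: a path graph 0-1-2-3-4-5
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
mapping, coarse_edges = coarsen(6, path_edges)
print(mapping)
print(coarse_edges)
```

In a multilevel scheme this pass is repeated until the graph is small enough to partition directly; the partition is then projected back through the stored maps and refined at each level.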

Distributed computational solutions

Distributed memory parallel architecture
• Cluster of high performance workstations
• Distributed memory workstation
• Dedicated I/O devices
• High level parallelism
• Feasible number of nodes: 16-1024

Recursive matrix partitioning
[Figure: geometric problem and the corresponding partitioning hierarchy, with partitions numbered 1 to 7]
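
Distributed normal modes analysis
Physical problem
\[
(K - \lambda M)\Phi = 0
\]
Partitioned form
\[
\begin{bmatrix}
K_{oo}^{1} - \lambda M_{oo}^{1} & & K_{ot}^{1,3} - \lambda M_{ot}^{1,3} & & & & \\
& K_{oo}^{2} - \lambda M_{oo}^{2} & K_{ot}^{2,3} - \lambda M_{ot}^{2,3} & & & & \\
& & K_{tt}^{3} - \lambda M_{tt}^{3} & & & & K_{tt}^{3,7} - \lambda M_{tt}^{3,7} \\
& & & K_{oo}^{4} - \lambda M_{oo}^{4} & & K_{ot}^{4,6} - \lambda M_{ot}^{4,6} & \\
& & & & K_{oo}^{5} - \lambda M_{oo}^{5} & K_{ot}^{5,6} - \lambda M_{ot}^{5,6} & \\
& & & & & K_{tt}^{6} - \lambda M_{tt}^{6} & K_{tt}^{6,7} - \lambda M_{tt}^{6,7} \\
\text{sym.} & & & & & & K_{tt}^{7} - \lambda M_{tt}^{7}
\end{bmatrix}
\begin{bmatrix}
\varphi_{o}^{1} \\ \varphi_{o}^{2} \\ \varphi_{t}^{3} \\ \varphi_{o}^{4} \\ \varphi_{o}^{5} \\ \varphi_{t}^{6} \\ \varphi_{t}^{7}
\end{bmatrix}
= 0
\]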
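
The block structure of the partitioned form follows directly from the partitioning hierarchy. The sketch below assumes the hierarchy implied by the coupling superscripts of the partitioned form (1,3), (2,3), (3,7), (4,6), (5,6) and (6,7), stores it as a hypothetical child-to-parent map, and prints which blocks of the 7 x 7 block matrix are populated.

```python
# Assumption: child -> collector map read off the coupling block superscripts
parent = {1: 3, 2: 3, 3: 7, 4: 6, 5: 6, 6: 7}
n = 7

pattern = [["." for _ in range(n)] for _ in range(n)]
for j in range(n):
    pattern[j][j] = "D"                       # diagonal block of partition j + 1
for child, par in parent.items():
    pattern[child - 1][par - 1] = "C"         # coupling block, upper triangle
    pattern[par - 1][child - 1] = "C"         # its symmetric counterpart

rows = ["".join(r) for r in pattern]
upper_blocks = sum(
    1 for i in range(n) for j in range(i, n) if pattern[i][j] != "."
)

for r in rows:
    print(r)
print("populated blocks in the upper triangle:", upper_blocks)
```

Each partition couples only to its collector, so the leaf partitions 1, 2, 4 and 5 share no blocks with each other and can be processed independently.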

Phase 1
Start: Processor 1, Processor 2, Processor 3, Processor 4
Communicate

Phase 2
Start: Processors 1-2, Processors 3-4
Communicate
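
Phase 3
Start: Processors 1-2-3-4
Solve reduced order problem
\[
(\tilde{K} - \lambda \tilde{M})\tilde{\Phi} = 0
\]
Recover physical solution
\[
\tilde{\Phi} =
\begin{bmatrix}
\varphi_{q}^{1} \\ \varphi_{q}^{2} \\ \varphi_{q}^{3} \\ \varphi_{q}^{4} \\ \varphi_{q}^{5} \\ \varphi_{q}^{6} \\ \varphi_{q}^{7}
\end{bmatrix}
\rightarrow
\begin{bmatrix}
\tilde{\varphi}_{o}^{1} \\ \tilde{\varphi}_{o}^{2} \\ \tilde{\varphi}_{t}^{3} \\ \tilde{\varphi}_{o}^{4} \\ \tilde{\varphi}_{o}^{5} \\ \tilde{\varphi}_{t}^{6} \\ \tilde{\varphi}_{t}^{7}
\end{bmatrix}
\rightarrow
\Phi =
\begin{bmatrix}
\varphi_{o}^{1} \\ \varphi_{o}^{2} \\ \varphi_{t}^{3} \\ \varphi_{o}^{4} \\ \varphi_{o}^{5} \\ \varphi_{t}^{6} \\ \varphi_{t}^{7}
\end{bmatrix}
\]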
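
The three phases follow a pairwise merging pattern: processors first work alone, then in pairs, then all together. A minimal sketch of that schedule is given below; the function name `reduction_phases` is hypothetical, and the pairing of neighboring groups is an assumption based on the processor groupings shown in the three phases.

```python
def reduction_phases(num_procs):
    """Return the processor groups that are active in each phase of a pairwise merge."""
    groups = [[p] for p in range(1, num_procs + 1)]
    phases = [groups]
    while len(groups) > 1:
        # Merge neighboring groups two at a time; an odd group is carried over as is
        groups = [sum(groups[i:i + 2], []) for i in range(0, len(groups), 2)]
        phases.append(groups)
    return phases


phases = reduction_phases(4)
for number, groups in enumerate(phases, start=1):
    print(f"Phase {number}: {groups}")
```

With four processors this gives the three phases of the slides; with P processors that are a power of two the schedule has log2(P) + 1 phases, and communication is only needed between consecutive phases.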

Parallel computational kernels

Shared memory parallel architecture
• Multi-core processors
• Shared cache
• Shared memory
• Low level parallelism
• Feasible number of cores: 2-16

Sparse factorization
[Figure panels: matrix connectivity, reordering, elimination tree, factorization]
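
The link between matrix connectivity, reordering and the elimination tree can be shown on a small pattern. The sketch below is a textbook symbolic factorization, not the author's implementation: it assumes a symmetric matrix given as a 0-based edge list, and the function name `elimination_tree` and the two orderings of a five-vertex graph are illustrative.

```python
def elimination_tree(n, edges):
    """Symbolic Cholesky factorization of a symmetric sparsity pattern.

    struct[j] holds the row indices i > j of the nonzeros in column j of the factor;
    parent[j] is the smallest of them, or None when column j is a root of the tree.
    """
    struct = [set() for _ in range(n)]
    for a, b in edges:
        struct[min(a, b)].add(max(a, b))

    parent = [None] * n
    for j in range(n):
        if struct[j]:
            p = min(struct[j])
            parent[j] = p
            # Eliminating column j passes its remaining rows on to the parent column
            struct[p] |= struct[j] - {p}
    return parent, struct


# Two triangles sharing one vertex, with the shared vertex numbered fourth (index 3)
good_order = [(0, 1), (0, 3), (1, 3), (2, 3), (2, 4), (3, 4)]
# The same graph with the shared vertex numbered first (index 0)
poor_order = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)]

parent_good, struct_good = elimination_tree(5, good_order)
parent_poor, struct_poor = elimination_tree(5, poor_order)
nnz_good = sum(len(s) for s in struct_good)
nnz_poor = sum(len(s) for s in struct_poor)

print("parents:", parent_good, "off-diagonal factor terms:", nnz_good)
print("parents:", parent_poor, "off-diagonal factor terms:", nnz_poor)
```

Both orderings describe the same matrix, but eliminating the shared vertex first creates fill-in and turns the tree into a chain, while the other ordering keeps two independent branches that can be factorized in parallel.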
Multifrontal factorization
Sparsity pattern
Frontal steps
Front amalgamation
Symbolic reordering
Consecutive columns
Same sparsity pattern
Cache fitting size
Supernodal approach
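
The three criteria listed above (consecutive columns, same sparsity pattern, cache fitting size) can be turned into a small grouping routine. This is a sketch under assumptions: the column structures are given as sets of row indices below the diagonal, the cache limit is modeled as a maximum number of columns `max_size`, and the function name `supernodes` is hypothetical.

```python
def supernodes(struct, max_size):
    """Group consecutive factor columns that share a sparsity pattern.

    struct[j] is the set of row indices i > j of nonzeros in column j of the factor.
    Column j joins the group of column j - 1 when struct[j - 1] equals struct[j]
    plus row j itself, and the group still has fewer than max_size columns.
    """
    if not struct:
        return []
    groups = [[0]]
    for j in range(1, len(struct)):
        same_pattern = struct[j - 1] == (struct[j] | {j})
        if same_pattern and len(groups[-1]) < max_size:
            groups[-1].append(j)
        else:
            groups.append([j])
    return groups


# Hypothetical column structures of a 5 x 5 factor
struct = [{1, 3}, {3}, {3, 4}, {4}, set()]

wide = supernodes(struct, max_size=4)
narrow = supernodes(struct, max_size=2)
print(wide)
print(narrow)
```

Columns inside one group can be stored as a dense block and updated with dense matrix kernels; the size limit keeps that block small enough to stay in cache.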
Matrix update
Panel selection
Downstream columns
Different sparsity pattern
BLAS 2.5 operation

Application case studies

High performance workstation cluster
111 IBM P575 nodes with four 1.9 GHz dual-core POWER5 CPUs per node
3.5 Terabyte aggregate memory
100 Terabyte total disk space
IBM High Performance Switch (HPS)
8 GB/sec bidirectional bandwidth
AIX OS Version 5.3
Parallel Environment (PE) V4.2

Trimmed car body application
Shell element model
• 1.3 M grid points
• 1.2 M shell elements
• 7.9 M degrees of freedom
Normal modes analysis
• Frequency: 0 – 300 Hz
• ~1000 normal modes
• 512 partitions

Shortening solution time

Number of DMP processes   Speed up
Serial                    1.0
1                         4.0
2                         7.8
4                         29.3
8                         49.2
16                        77.5
32                        96.5
64                        104.1
128                       105.9
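
The speed-up values of this chart can be examined further with a little arithmetic. The sketch below assumes that the nine values read from the chart belong to the nine axis labels in order (Serial, 1, 2, ..., 128); the speed-up relative to the one-process DMP run and the resulting parallel efficiency are derived here and are not quantities reported on the slide.

```python
# Assumption: chart values paired with the axis labels in order of appearance
processes = [1, 2, 4, 8, 16, 32, 64, 128]
speedup_vs_serial = [4.0, 7.8, 29.3, 49.2, 77.5, 96.5, 104.1, 105.9]

# Speed-up relative to the distributed run on a single process
relative = [s / speedup_vs_serial[0] for s in speedup_vs_serial]
# Parallel efficiency: relative speed-up divided by the number of processes
efficiency = [r / p for r, p in zip(relative, processes)]

for p, r, e in zip(processes, relative, efficiency):
    print(f"{p:4d} processes: relative speed-up {r:6.2f}, efficiency {e:5.2f}")
```

The curve flattens beyond 32 processes: doubling from 64 to 128 processes raises the speed-up over the serial run only from 104.1 to 105.9.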

Increased fidelity of analysis

Frequency range (Hz)   Solution time (normalized)   Number of modes (normalized)
0 - 100                1.00                         1.00
0 - 200                1.08                         2.41
0 - 300                1.21                         4.67
0 - 400                1.34                         7.44
0 - 500                1.55                         10.93

Distributed memory workstation
HP ProLiant DL320 G5 server
64 dual-core (1.85 GHz) Xeon CPUs
50 GB local SATA disks per node
4 GB memory per node
GigE interconnect with HP MPI
SUSE Linux Version 10.3

Automotive engine application
Solid element model
• 3.6 M grid points
• 2.3 M tetrahedral elements
• 10.8 M degrees of freedom
Normal modes analysis
• Frequency: 0 – 10,000 Hz
• ~250 normal modes
• 256 partitions

Shortening solution time

Number of DMP processes   Speed up
Serial                    1.00
1                         4.00
2                         7.11
4                         12.47
8                         17.15
16                        25.78
32                        34.58
64                        49.27

Increased fidelity of analysis

Frequency range (Hz)   Solution time (normalized)   Number of modes (normalized)
0 - 10,000             1.00                         1.00
0 - 20,000             1.25                         2.95
0 - 30,000             1.28                         5.61
0 - 40,000             1.32                         8.79
0 - 50,000             1.34                         12.57

Conclusions and future work

Conclusions
• Geometric domain decomposition technologies provide the basis for distributed solutions on modern hardware
• Recursive computational solutions can support a wide range of engineering analyses with practically acceptable accuracy
• The handling of the local matrix operations with multi-core processors contributes to the overall performance gain
• The performance advantages of distributed computational solutions are significant and tremendously accelerate the engineering work

Future work
• Extending the distributed finite element technology to a grid computing environment
• Overcoming the lack of a node-to-node communication mechanism with a high-speed network
• Minimizing the need for a high-bandwidth connection between the local nodes and storage devices
• Synchronizing the completion of components of similar computational complexity in a non-homogeneous grid environment
Thank you for your attention!
www.siemens.com
www.siemens.com/plm
www.siemens.com/plm/nxnastran
Siemens and the Siemens logo are registered trademarks of Siemens AG. NX is a registered trademark of Siemens PLM Software Inc. in the United States and in other countries.
NASTRAN is a registered trademark of the National Aeronautics and Space Administration.
SpaceShip One pictures by courtesy and permission of Quartus Engineering Inc.