Upload
hoangmien
View
221
Download
3
Embed Size (px)
Citation preview
1
SIERRA -- A Computational Framework forEngineering Mechanics Applications
http://www.esc.sandia.gov/sierra.html
James R. Stewart [email protected]. Carter Edwards [email protected]
Engineering Sciences CenterSandia National Laboratories
Albuquerque, NM
2
Outline
• SIERRA concepts and overview
• Basic framework services
• Advanced framework services
3
DOE / ASCI
• Developed for– Department of Energy (DOE)– Accelerated Strategic Computing Initiative (ASCI)
• Challenges– Coupled multiphysics: solid+fluid+thermal+chem.– Large unstructured meshes: 100,000,000s elements– Massively parallel computing: 1,000s of processors– Advanced algorithms: mesh erosion, h-adaptivity,
multilevel, dynamic load balancing
4
SIERRA Concept
Applications share a single framework whichprovides common capabilities
• Simplify utilization of ASCI supercomputers• Consolidate common capabilities• Eliminate redundant development and maintenance• Encourage architecturally similar applications
Application developers work in a commonsoftware development environment
• Uniform access to ASCI resources• Utilize a common source code repository• Coordinate development efforts• Consolidate the set of software development toolstools• Share software development processes
5
SIERRA Contributors
• Framework– Noel Belcourt– Kevin Copps– Carter Edwards– Jonathan Rath– Greg Sjaardema– Jim Stewart– Alan Williams
• Tools– Kathy Aragon– Dorothy Brethauer– Christi Forsythe– Mark Hamilton– Erik Illescas
+ many Application Developers!
6
Current stand-alone codes
VIPAR Parachute performance code, vortexmethod coupled with transient dynamics
PRONTO Transient dynamicsLagrangian solid mechanics
JAS Quasi-static solid mechanics
COYOTE Thermal mechanics with chemistry
GOMA Incompressible fluid mechanics with free surface
SALINAS Structural dynamics
SACCARA Compressible fluid mechanics
DAKOTA Design optimization
VIPAR Parachute performance code, vortexmethod coupled with transient dynamics
PRONTO Transient dynamicsLagrangian solid mechanics
JAS Quasi-static solid mechanics
COYOTE Thermal mechanics with chemistry
GOMA Incompressible fluid mechanics with free surface
SALINAS Structural dynamics
SACCARA Compressible fluid mechanics
DAKOTA Design optimization
Migration of SandiaApplication Codes
SIERRA
Current and future SIERRA-based codes
FUEGO/ SYRINXPREMO
(SACCARA)
SALINAS
KRINO
ARIA (GOMA)
CALORE(COYOTE)
ADAGIO(JAS)
PRESTO(PRONTO)
ANDANTE
VIPAR
DAKOTA
7
Outline
• SIERRA concepts and overview
• Basic framework services
• Advanced framework services
8
Basic Framework Services
User Input ParsingBulk Mesh Data I/O
Mechanics Mgmt
Field Mgmt Mesh Mgmt
Parallel Communications
SIERRAKernel
Master Element I/FLinear Solver I/F
9
Mesh Management
• Unstructured mesh– Arbitrary mesh object connections– Mix element topologies (hex, tet, quad, …)
• Fully distributed mesh data structure• Dynamic creation/deletion of mesh objects• Can define mesh subsets
– Define by part, material type, boundary, constraint, …– Define unions and intersections of subsets
10
Field Management
• Application defined fields (a.k.a. variables)– Text name– Type (int, real, vector, full tensor, symmetric tensor, …)– Aggregate types (e.g. collection of material variables)– Optionally associated with a master element
(interpolation field, integration field)
• Fields are defined on mesh objects• Associated with a mesh subset
– ( field , mesh-object ∈ subset ) → allocated value– ( field , mesh-object ∉ subset ) → NO value
11
Mesh/Field Management
data100 data200
data300
data430
data410
data420
Collection of mesh objects
data442
data441
data444data443
300 400
Finite Element Mesh
410
430 440
420
441 442
443 444
100 200
300
Contiguous workset array
data444data441data430data410data200data300data100data443data420data442
12
Mesh/Field Management
Workset arrays are:
• Delivered to the mechanicsalgorithm in Fortran array order
• Sized to minimize cache misses
data444data441data430data410data200data300data100data443data420data442
13
Finite Element Services
• SIERRA provides a master element interface• Elements are implemented and shared by
application developers• Current (partial) list of master elements (each
with fully integrated, uniform gradient, and controlsurface/volume versions)– 8-node hexahedron– 4-node tetrahedron– 4-node quadrilateral (in 2D and 3D, including shell)– 3-node triangle (in 3D, including shell)– 6-node wedge– Others, and more on the way…
14
Mechanics Management
• A mechanics module consists of algorithmsand supporting data
– Associate mesh subsets with mechanics modules
– Declare fields to support mechanics modules
– Uses zero-to-many master elements
• Mechanics modules are supplied byapplication
– Examples: solvers, BC’s, external forces, etc
– Nest mechanics modules inside other modules
15
Mechanics Module Hierarchy
Domain
Procedure (time step control)
Region A(single step of physics A)
Mechanics
Mesh and Fields
Region B(single step of physics B)
Mechanics
Mesh and Fields
Transfer
16
Parallel Communication Services
• Simple scalar reductions (e.g. global dotproduct)
• Global assembly– Update/sum field values on subdomain boundary
• Inter-processor operations– Communicate fields between processors (MPI)
• Multiple domain decompositions– Redistribute mesh between decompositions– SIERRA uses the Zoltan (SNL) partitioning library
17
Finite Element Interface (FEI)
Prometheus Trilinos
Spooles
PETSc
Others
Linear Solver InterfaceElement, Boundary, and Constraint Contributions
Solution Values
Application Mechanics
Linear Solver Interface
18
User Input Parsing
Application’sMechanics
ParameterValues
User
UserInput File
CommandSpecifications’HTML Pages
GenerateDocumentation
CommandSpecifications’XML Database
QuerySpecifications
SIERRAParser
ParseCommands
CommandRegistration
19
Scaling Tests: 3 Applications
Adagio Scaling for 2K Elements/Processor Relative to 32 Processors on ASCI-Red
0.1
1
10
1 10 100 1000 10000
Number of Processors
Sca
ling
of
No
nlin
ear
Iter
atio
n
ASCI-RedASCI-Blue-MtnASCI-Blue-Pac
Presto Scaling for 2K Elements/Processor Relative to 32 Processors on ASCI-Red
0.1
1
10
1 10 100 1000 10000
Number of Processors
Sca
ling ASCI-Red
ASCI-Blue-MtnASCI-Blue-Pac
Calore Scaling for 10K Elements/Processor Relative to 32 Processors on ASCI-Red
0.1
1
10
1 10 100 1000 10000
Number of Processors
Sca
ling ASCI-Red
ASCI-Blue-MtnASCI-Blue-Pac
• Presto: explicit dynamics• Adagio: quasi-statics• Calore: thermal conduction
and enclosure radiation
20
Outline
• SIERRA concepts and overview
• Basic framework services
• Advanced framework services
21
Basic Framework Services
User Input ParsingBulk Mesh Data I/O
Mechanics Mgmt
Field Mgmt Mesh Mgmt
Parallel Communications
SIERRAKernel
Master Element I/FLinear Solver I/F
22
Advanced Framework Services
Mechanics Mgmt
Field Mgmt Mesh Mgmt
Parallel Communications
H-AdaptivityField Transfers
Element DeathLoad Balancing
SIERRAKernel
User Input ParsingBulk Mesh Data I/O
Master Element I/FLinear Solver I/F
23
Key Concept:Multiple Domain Decompositions
Original Finite Element Mesh
• Primary Decomposition - graph based,best suited for element computations
• Secondary Decomposition – geometry based,best suited for search algorithms.
• Mechanics algorithm chooses an efficient domain decomposition
• Mechanics algorithms are independent of a specific decomposition
Four Processor Example
24
Dynamic Load Balancing:Surface Example
(b)Primary Decomposition
Topology BasedAll of surface is on P0
P0
P1
(c)Secondary Decomposition
Geometry BasedDistributed surface on P0 & P1
P0 P1
(a)Original
Undecomposed Model
Surface
Load BalancingDecomposition
Primary Surface• nodes• faces• edges
NOT balanced forsurface algorithm
Primary Surface• nodes• faces• edges
NOT balanced forsurface algorithm
Secondary Surface• nodes• faces• edges
Balanced forsurface algorithm
Secondary Surface• nodes• faces• edges
Balanced forsurface algorithm
Two copies of the same surface (nodes, edges, faces) using exactly the same amount of memory. However, they are distributed over theprocessors differently for each view.
25
Key Technology:Parallel Communication Specification
“CommSpec”• Relation: Source-Mesh → Destination-Mesh
– { (MeshObject,SrcProc) → (MeshObject,DestProc) }
• Symmetric: Relation = converse(Relation)– Shared mesh objects on inter-processor boundaries
• Nonsymmetric− Arbitrary on and off processor mesh object connectivity
− Inter-mesh transfers, load balancing, periodic boundaryconditions, contact algorithms, …
26
Determine common geometricdecomposition for rendezvous
Parallel Transfer Algorithm
• Transfer nodal field from source to destination• Mesh decompositions not geometrically aligned
⇒ must “geometrically rendezvous” the meshes
DestinationMesh
SourceMesh
Mesh Data(CommSpec B)
Field & Mesh Data(CommSpec A)
SourceRendezvous
Mesh
DestinationRendezvous
MeshOn-processor search(CommSpec C)
On-processor interpolation
Interp. Field Data(CommSpec BT)
Search Results(CommSpec AToC)
Interpolated Results(CommSpec BToCToA)
27
Coupled Calore/Fuego(Thermal/Flow) Pipe Flow Problem
Calore• Mesh both pipe and fluid• Transfer fluid temperature
to Fuego
Fuego• Different mesh for fluid• Transfer fluid velocity to
Calore
28
H-Adaptivity: Overview
• Goal: Selectively refine elements toachieve solution accuracy more efficiently
Application SIERRA Either
Solve Physics
YesProceed to
Next Timestep
Estimate SolutionError Distribution
Stopping CriterionSatisfied?
NoMark Elements
(Adaptive Strategy)Resolve
Markers (2:1)
RestrictVariables
Global Mesh UpdateProlong
Variables
29
H-AdaptivityGlobal Mesh Update
Global Mesh UpdateRestrictVariables
ProlongVariables
Unrefine Mesh Rebalance Load Refine Mesh
Rebalancing is done when the mesh is “smallest”!
30
• Elements are hierarchically split into childelements
• Parent elements are retained in the data structure• Hanging node constraints are handled by the
application code• 2:1 refinement ratio is globally enforced
R
R = Refine element 2:1 refinement ratio enforced
R
Adaptive Mesh Refinement
P0
P1
31
• Parent elements are restored by merging theirchildren
• Child elements are then deleted from the datastructure
Adaptive Mesh Unrefinement
U = Unrefine element
U UUU
32
• All children must be marked for unrefinement tooccur:
Adaptive Mesh Unrefinement
U = Unrefine element
UUU
No unrefinement!
U = Unrefine element
U U
UNo unrefinement!
U
• Unrefinement will not break the 2:1 refinement ratio:
33
Dynamic Load Rebalancing
• Option 1: Rebalance “genesis” mesh only– Parents and children are not allowed to split
Genesis Mesh
P0
P1
Mesh after 3 refinements: - P0: 3 elements - P1: 10 elements
P0
P1
34
Dynamic Load Rebalancing
• Option 2: Allow parent-child splitting amongprocessors
Genesis Mesh
P0
P1
Mesh after 3 refinements: - P0: 6 elements - P1: 7 elements
P0
P1
Tradeoff: More communication required,but the mesh is more evenly balanced
35
H-Adaptivity Example:Electron Beam Rastering
• 3D Si wafer at roomtemperature
• Beam is modeledby time-dependentheat flux
• Heating is verylocalized
• a posteriori errorindicator: H1
seminorm oftemperature
36
• Temperature in K
• Two refinementsper time step
• Dynamic loadrebalancing
Electron Beam Rastering
Solution after8 time steps
37
•Temperature is in K
•Two refinementsperformed each timestep
Electron beam rastering (cont)
Solution after32 time steps
38
Wafer Slice and Rotation
39
Electron Beam Rastering8 Proc Dynamic Load Rebalancing
Movie
40
Summary
• SIERRA provides scalable framework servicesneeded by mechanics application codes
• Application code developers focus on mechanics, notcomputer science
• Software frameworks such as SIERRA will becomeincreasingly more common [e.g, Trellis (RensselaerPolytechnic Institute), Deal.II and UG (Univ of Heidelberg),Others… ]
• “Supporting Infrastructures” mini-symposium at 5th
World Congress on Computational Mechanics,Vienna, July 2002
41
EXTRA SLIDES
42
Element Refinement Template
• Defines topology and connectivity of children• Refinement must maintain validity of mesh• 2D examples:
43
Bulk Mesh Data Input/Output
• Services– Parallel IO for mesh topology and field values– Restart
• Simple application interface, specify:– What files for input/output– Which mesh subsets and fields– When to output
• Transparent access to multiple file formats– ExodusII (SNL), DMF (ASCI Tri-Lab), …