Parallel Mesh Refinement with Optimal Load Balancing
Jean-Francois Remacle, Joseph E. Flaherty and Mark S. Shephard
Scientific Computation Research Center
Scope of the presentation
• The Discontinuous Galerkin Method (DGM)
  – Discontinuous finite elements
  – Spatial discretization
  – Time discretization
  – DG for general conservation laws
• Adaptive parallel software
  – Adaptivity
  – Parallel Algorithm Oriented Mesh Datastructure (PAOMD)
The DGM for Conservation Laws
• Find $u_h \in V_h$ such that the conservation law $u_t + \nabla\cdot\vec F(u) = r(u)$ holds weakly on every element $\Omega_e$
• Weighted residuals + integration by parts: for all test functions $w \in V_h$,
  $\int_{\Omega_e} w\,\partial_t u_h\,d\Omega - \int_{\Omega_e} \nabla w\cdot\vec F(u_h)\,d\Omega + \oint_{\partial\Omega_e} w\,\vec F(u_h)\cdot\vec n\,ds = \int_{\Omega_e} w\,r(u_h)\,d\Omega$
• Spatial discretization: $u_h$ is expanded in an element-wise polynomial basis with no continuity imposed across element interfaces
The DGM for Conservation Laws
• Numerical flux: on $\partial\Omega_e$ the trace of $\vec F(u_h)\cdot\vec n$ is double-valued, so it is replaced by a one-valued numerical flux $\hat F(u_h^-, u_h^+)$
• Choices for numerical fluxes (see the sketch below)
  – Lax-Friedrichs
  – Roe linearization with entropy fix
  – Exact 1D Riemann solution (more expensive)
• Monotonicity not guaranteed
  – Higher-order limiters
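To make the first option concrete, here is a minimal Python sketch of the (local) Lax-Friedrichs flux; the function and parameter names are ours, not the authors' code.

```python
def lax_friedrichs_flux(u_minus, u_plus, flux, max_wave_speed):
    """Local Lax-Friedrichs numerical flux at an element interface.

    u_minus, u_plus : solution traces on either side of the interface
    flux            : the physical flux function F(u)
    max_wave_speed  : a bound on |F'(u)| over the two states
    """
    # Average of the one-sided fluxes plus a dissipation term
    # proportional to the jump; the dissipation enforces stability.
    return 0.5 * (flux(u_minus) + flux(u_plus)) \
        - 0.5 * max_wave_speed * (u_plus - u_minus)

# Example with Burgers' equation, F(u) = u^2 / 2:
f = lambda u: 0.5 * u ** 2
print(lax_friedrichs_flux(1.0, 0.0, f, max_wave_speed=1.0))  # -> 0.75
```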
Higher order equations
• Discontinuous approximations need regularization for gradients
• Example: a conservation law with viscous terms,
  $u_t + \nabla\cdot\vec F(u) - \frac{1}{Re}\,\nabla\cdot\vec F_v(u,\nabla u) = 0$
• Rewritten as a first-order system with the auxiliary variable $\vec s$:
  $\vec s - \nabla u = 0, \qquad u_t + \nabla\cdot\vec F(u) - \frac{1}{Re}\,\nabla\cdot\vec F_v(u,\vec s) = 0$
Computing higher order derivatives
• How to compute $\nabla u$ when $u$ is not even $C^0$?
• Stable gradients: find $w$ such that $\nabla w = \nabla u$
• Or weakly: $\int_{\Omega_e} \nabla w \cdot \nabla v \, d\Omega = \int_{\Omega_e} \nabla u \cdot \nabla v \, d\Omega$ for all test functions $v$
Computing higher order derivatives
– Solution of the weak problem is w = u: since the weak derivatives are equal, the fields are equal
– If instead we choose a constrained space for w with no average jumps on interfaces, i.e. with $\int_{\Gamma} [w]\,ds = 0$ on every interface $\Gamma$, the weak problem yields a stable, regularized gradient $\nabla w$ of $u$ (a minimal 1D sketch follows)
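The element-mean version of this construction is easy to write down. Below is a minimal 1D Python sketch, assuming a Bassi-Rebay-type recipe in which interface averages {u} replace the discontinuous traces; the function name and the restriction to the element mean are ours, not the paper's.

```python
import numpy as np

def mean_weak_gradient(u_left, u_right, h):
    """Element-mean weak gradient of a discontinuous 1D field.

    u_left[i], u_right[i] : traces of u at the left/right end of element i
    h                     : uniform element size

    Testing the weak definition of the gradient with v = 1 kills the
    volume term and leaves only the interface values; averaging the two
    traces at each interface is what regularizes the jumps.
    """
    n = len(u_left)
    s = np.zeros(n)
    for i in range(n):
        # Interface averages; fall back to one-sided traces at the boundary.
        ul = u_left[i] if i == 0 else 0.5 * (u_right[i - 1] + u_left[i])
        ur = u_right[i] if i == n - 1 else 0.5 * (u_left[i + 1] + u_right[i])
        s[i] = (ur - ul) / h
    return s
```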
Time discretization
• Explicit time stepping
  – Efficient in cases such as shock tracking
• Method of lines may be too restrictive due to
  – Mesh adaptation (shock tracking)
  – Real geometries (small features)
• Local time stepping: use the local CFL condition (a sketch follows this list)
  – The key is the implementation
  – Important issues in parallel
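A minimal sketch of one way to organize local time stepping, assuming power-of-two step classes and forward Euler updates; `residual` stands for the spatially discretized right-hand side, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def advance_goal_step(u, residual, dt_local, dt_min, n_levels):
    """Advance all elements by one goal step of 2**(n_levels-1) * dt_min.

    dt_local : the stable CFL time step of each element
    dt_min   : the smallest admissible step; class k advances with
               2**k * dt_min, so cheap (large-dt) elements are not
               forced down to the global CFL limit.
    """
    # Assign each element to a power-of-two step class.
    klass = np.floor(np.log2(dt_local / dt_min)).astype(int)
    klass = np.clip(klass, 0, n_levels - 1)

    n_sub = 2 ** (n_levels - 1)      # sub-steps of size dt_min per goal step
    for sub in range(n_sub):
        for k in range(n_levels):
            if sub % (2 ** k) == 0:  # class k fires on every 2**k-th sub-step
                active = klass == k
                # Forward Euler, only to keep the sketch short.
                u[active] += (2 ** k) * dt_min * residual(u)[active]
    return u
```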
Parallel Issues
• Good practice in parallel
  – Balance the load between processors
  – Minimize the ratio of communication to computation
  – Interleave communication and computation
• Local time stepping
  – Elementary load depends on the local CFL condition
  – Not the most critical issue
Parallel Issues
• Example: the load is balanced when
  – Proc 0: 2000 (1dt) + 1000 (2dt)
  – Proc 1: 3000 (1dt) + 500 (2dt)
  – Total load: 4000 dt on each processor
• If we synchronize after every sub-time step
  – Proc 0 waits 1000 dt at the first sync
  – Proc 1 waits 1000 dt at the second sync
  – Maximum parallel speedup = 4/3
Parallel Issues
• Solution (a minimal sketch follows)
  – Synchronize only after the goal time step
  – Non-blocking sends and receives after each sub-time step
  – Inter-processor faces store the whole history
  – Some elements may be "retarded" (lag behind their neighbors)
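A minimal mpi4py sketch of this strategy; the buffer layout and the helper name `exchange_faces` are assumptions, not the paper's code. The point is that each sub-step completes point-to-point exchanges with neighbors only, and a global synchronization happens once per goal step.

```python
from mpi4py import MPI

def exchange_faces(comm, send_bufs, recv_bufs, neighbors):
    """Post non-blocking exchanges of partition-boundary face data.

    send_bufs/recv_bufs : contiguous numpy arrays keyed by neighbor rank
    """
    reqs = []
    for rank in neighbors:
        reqs.append(comm.Isend(send_bufs[rank], dest=rank, tag=0))
        reqs.append(comm.Irecv(recv_bufs[rank], source=rank, tag=0))
    return reqs

# Inside the time loop (schematic):
#   comm = MPI.COMM_WORLD
#   for sub in range(n_sub_steps):
#       reqs = exchange_faces(comm, send_bufs, recv_bufs, neighbors)
#       update_interior_elements(u)      # overlap computation with comm
#       MPI.Request.Waitall(reqs)        # neighbor exchanges only
#       update_boundary_elements(u)
#   comm.Barrier()                       # global sync only at the goal step
```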
Objectives of PAOMD
• Distributed mesh
  – Partition boundaries treated like model boundaries
  – On each processor: a serial mesh
• Services (a hypothetical interface sketch follows)
  – Rounds of communication
  – Parallel adaptivity
  – Dynamic load balancing
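To make the service list concrete, here is a hypothetical Python sketch of the kind of interface a PAOMD-style distributed mesh could expose; the class and method names are illustrative only and do not reproduce the actual PAOMD API.

```python
class DistributedMesh:
    """Illustrative interface for a distributed mesh datastructure."""

    def owned_elements(self):
        """Iterate over the serial mesh stored on this processor."""
        ...

    def partition_boundary_faces(self):
        """Faces shared with other processors, treated like model boundaries."""
        ...

    def communicate(self, data):
        """One round of communication: exchange `data` attached to
        partition-boundary entities with the neighboring processors."""
        ...

    def adapt(self, size_field):
        """Parallel refinement/coarsening driven by a size field."""
        ...

    def rebalance(self):
        """Dynamic load balancing: migrate elements between processors."""
        ...
```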
Higher order equations
• Navier-Stokes
  – von Kármán vortices
  – Re = 200
• Numerics
  – p = 3
  – no limiting
  – filtering
• In parallel
  – 64 processors of the PSC Alpha cluster: $10^6$ to $2.0 \times 10^7$ dofs
  – 128 processors of Blue Horizon: $10^8$ dofs
Large scale computations
Muzzle brake problem
• Process
  – Input: ProE CAD file
  – MeshSim: mesh generation
  – Add a surface mesh for force computations
  – Choice of parameters
• Orders of magnitude
  – 1 day (single processor, no adaptation, with local time stepping)
  – will need ~100 processors for adaptive computations
Force computations
• Conservation law: $u_t + \nabla\cdot\vec F(u) = r(u)$
• Integral of fluxes: the force on a surface $S$ is obtained by integrating the flux through it, $\vec f = \oint_S \vec F(u)\cdot\vec n\,ds$
• Numerical issues
  – Geometric search (locating surface quadrature points in the volume mesh; a minimal sketch follows)
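A minimal Python sketch of the flux-integration step, assuming the force is obtained by quadrature of the momentum flux over the added surface mesh; `momentum_flux` and `locate` are hypothetical callables, the latter standing for the geometric search mentioned above.

```python
import numpy as np

def surface_force(quad_points, quad_weights, normals, momentum_flux, locate):
    """Integrate the momentum flux over an embedded surface mesh.

    quad_points/quad_weights/normals : quadrature rule on the surface mesh
    momentum_flux(elem, x) : 3x3 flux tensor evaluated from the DG solution
                             of the volume element returned by `locate`
    locate(x)              : geometric search for the volume element
                             containing the surface point x
    """
    force = np.zeros(3)
    for x, w, n in zip(quad_points, quad_weights, normals):
        elem = locate(x)                          # the geometric-search step
        force += w * (momentum_flux(elem, x) @ n) # F(u) . n
    return force
```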