
Geometry Based Parallel Mesh Generation and Adaptation

Saurabh Tendulkar, Mark Beall, Rocco Nastasia

Introduction

• Q: Why parallel mesh generation/adaptation?
• A: It enables large-scale parallel adaptive simulations:
  – Billions of elements on tens or hundreds of thousands of processors.
  – If the mesh is serial, scaling is not possible.
  – Eliminate I/O wherever possible; I/O is very slow.
  – Seamless simulations that scale well, without bottlenecks.

Topics

• Partitioned mesh
• Parallel mesh generation
  – Surface meshing.
  – Volume meshing.
• Parallel adaptation
  – Mesh modifications.
  – Predictive load balancing.
  – Anisotropic size fields.
  – Boundary layer adaptation.
• Distributed parallel geometry
• File-free adaptive analysis
• Multithreaded mesh generation/adaptation

Partitioned Mesh

• The mesh is distributed among the available processors.
• Each processor holds one part of the mesh.
• Entities on part boundaries:
  – Are replicated on each part that shares them.
  – Know about their copies on other parts (see the sketch below).
• The mesh is classified on the model: each mesh entity knows the geometric model entity it resolves.
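A minimal sketch of what a part-boundary entity has to carry, with illustrative types that are not the Simmetrix API:

// Sketch of a mesh entity that knows its remote copies; illustrative only.
#include <map>

using PartId = int;        // rank of another part holding a copy
using RemoteHandle = long; // opaque handle to the copy on that part

struct MeshEntity {
  int id;                                 // local identifier
  int modelTag;                           // model entity this is classified on
  std::map<PartId, RemoteHandle> remotes; // copies on other parts; empty if interior

  bool onPartBoundary() const { return !remotes.empty(); }
};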

Partitioned Mesh

• A partitioned mesh allows:
  – Communication at part boundaries:
    • Operations are performed independently on each part.
    • Data is communicated so the mesh stays in sync.
  – Mesh migration:
    • Migrate individual entities, or groups of entities, from part to part.
    • Localize a given neighborhood around entities.
  – Partitioning:
    • Load balance.
    • Parallel graph partitioner: ParMetis, or user defined.
    • In a volume mesh, regions are the graph nodes and shared faces are the graph edges (sketched below).
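For the region/face graph just described, a hedged sketch of the distributed CSR arrays handed to ParMETIS; building the arrays from the mesh is omitted, and the call shown is the real ParMETIS_V3_PartKway entry point:

// One graph vertex per mesh region, one graph edge per face shared by two
// regions; ParMETIS computes the new part assignment. Illustrative sketch.
#include <parmetis.h>
#include <vector>

void partitionRegions(MPI_Comm comm, idx_t nparts,
                      std::vector<idx_t>& vtxdist, // region distribution over ranks
                      std::vector<idx_t>& xadj,    // CSR offsets into adjncy
                      std::vector<idx_t>& adjncy,  // regions adjacent across faces
                      std::vector<idx_t>& part)    // output: target part per region
{
  idx_t wgtflag = 0, numflag = 0, ncon = 1, edgecut = 0;
  idx_t options[3] = {0, 0, 0};
  std::vector<real_t> tpwgts(nparts, real_t(1.0) / nparts); // equal part targets
  real_t ubvec = 1.05; // 5% allowed imbalance
  ParMETIS_V3_PartKway(vtxdist.data(), xadj.data(), adjncy.data(),
                       nullptr, nullptr, // no vertex/edge weights in this sketch
                       &wgtflag, &numflag, &ncon, &nparts,
                       tpwgts.data(), &ubvec, options, &edgecut,
                       part.data(), &comm);
}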

Parallel Mesh Generation

• Surface meshing:
  – Fully automatic.
  – Model faces are decomposed among the processes; the decomposition is determined automatically.
  – Load balance is not guaranteed, but it scales well in practice because there are typically far more faces than processors (illustrated below).
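To make the "more faces than processors" point concrete, here is the simplest possible face decomposition; the real assignment is determined automatically by the mesher, so round-robin is only an illustration:

// Illustrative static decomposition: each rank surface-meshes the model
// faces assigned to it, independently of the other ranks.
#include <vector>

std::vector<int> facesForRank(int numModelFaces, int rank, int numRanks) {
  std::vector<int> mine;
  for (int f = rank; f < numModelFaces; f += numRanks)
    mine.push_back(f); // with many more faces than ranks, load roughly evens out
  return mine;
}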

Parallel Mesh Generation

• Volume meshing:
  – Fully automatic.
  – Octree-based spatial decomposition for load balance (sketched below).
  – Mesh local areas first (away from part boundaries).
  – Hierarchical repartitioning to localize and mesh the areas between part boundaries.
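A minimal sketch of the octree decomposition idea: split a box into eight octants and recurse wherever the estimated meshing work is too large. The work estimate here is just the box volume, a hypothetical stand-in; a real estimate would integrate the mesh size field:

// Octree-based spatial decomposition sketch; illustrative, not the real code.
#include <array>
#include <vector>

struct Box { double lo[3], hi[3]; };

// Hypothetical work estimate: box volume stands in for predicted element count.
double estimateWork(const Box& b) {
  double v = 1.0;
  for (int d = 0; d < 3; ++d) v *= b.hi[d] - b.lo[d];
  return v;
}

std::array<Box, 8> splitOctants(const Box& b) {
  std::array<Box, 8> kids{};
  double mid[3] = {(b.lo[0] + b.hi[0]) / 2, (b.lo[1] + b.hi[1]) / 2,
                   (b.lo[2] + b.hi[2]) / 2};
  for (int i = 0; i < 8; ++i)
    for (int d = 0; d < 3; ++d) {
      bool hiHalf = (i >> d) & 1;
      kids[i].lo[d] = hiHalf ? mid[d] : b.lo[d];
      kids[i].hi[d] = hiHalf ? b.hi[d] : mid[d];
    }
  return kids;
}

// Recurse until each leaf holds a balanced share of the estimated work.
void decompose(const Box& b, double maxWork, std::vector<Box>& leaves) {
  if (estimateWork(b) <= maxWork) { leaves.push_back(b); return; }
  for (const Box& kid : splitOctants(b)) decompose(kid, maxWork, leaves);
}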

Distributed Volume Meshing

• Scaling:
  – A hard problem to scale well, because the amount of work is not known in advance.
  – A speedup of half the number of processes is considered good.
  – The focus is on generation: meshes of over 300M elements are generated in 10 minutes.
  – Generation time is much smaller than the I/O time for such meshes, and the I/O is not needed at all, which reduces the overall time.

[Figure: volume meshing scaling up to 12 processors]
[Figure: volume meshing scaling up to 64 processors]

Parallel Mesh Generation

[Figure: 1/8 of a 180M element mesh generated on 64 processors]

Parallel Mesh Adaptation

• Error estimation specifies a new mesh size field:
  – Mesh size is given at the vertices.
• Adaptation based on this size field involves:
  – Refinement (splits) and coarsening (collapses); see the sketch below.
  – Optimization to improve element shape (swaps, etc.).
• Maintain fidelity to the geometry (snapping):
  – Vertex motion.
  – Local modifications.
  – Cavity remeshing.
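A hedged sketch of how a size field drives the choice between splitting and collapsing; the 1.4/0.7 thresholds are illustrative assumptions, not values from the talk:

// Compare each edge against the desired size interpolated from the size
// field at its vertices, then choose the modification. Illustrative only.
struct Edge { double length; double desiredSize; };

enum class Action { Split, Collapse, Keep };

Action classify(const Edge& e) {
  double ratio = e.length / e.desiredSize;
  if (ratio > 1.4) return Action::Split;    // too long for the size field: refine
  if (ratio < 0.7) return Action::Collapse; // too short: coarsen
  return Action::Keep;                      // acceptable; swaps may still improve shape
}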

Parallel Mesh Adaptation

• Modifications in parallel (at part boundaries):
  – Refinement:
    • Split in parallel, independently on each part.
    • Communicate the new data to keep the copies in sync.
  – Coarsening, optimization, snapping:
    • Localize the mesh, then modify (pseudocode below).
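The two patterns above as compilable pseudocode; every function is a hypothetical placeholder stubbed out for the sketch:

#include <vector>

struct CavityOp {};                       // a collapse/swap/snap spanning parts
void splitMarkedEdgesLocally() {}         // each part splits independently
void exchangeNewEntityIds() {}            // sync new entities with remote copies
std::vector<CavityOp> nonLocalCavityOps() { return {}; }
void migrateCavityToOnePart(CavityOp&) {} // localize the neighborhood on one part
void applyModification(CavityOp&) {}      // perform the operation locally

void adaptAtPartBoundaries() {
  // Refinement: splits run in parallel on every part, including part-boundary
  // edges; the new entities are then communicated to stay in sync.
  splitMarkedEdgesLocally();
  exchangeNewEntityIds();

  // Coarsening, optimization, snapping: localize the mesh first, then modify.
  for (CavityOp op : nonLocalCavityOps()) {
    migrateCavityToOnePart(op);
    applyModification(op);
  }
}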

Parallel Mesh Adaptation

• Predictive load balancing:
  – The new size field may lead to heavy refinement on one part and coarsening on another.
  – The current partitioning cannot be kept: both the memory and the work load could become unbalanced.
  – Use the new size field to set weights on the regions (a simple estimate is sketched below).
  – Do weighted repartitioning before the modifications.
  – After the modifications, balance load/memory into a partitioning suitable for the next analysis step.
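A hedged sketch of one way such weights can be formed; the cubic estimate is a standard back-of-the-envelope count, assumed here rather than taken from the talk:

// In 3D, a region of current size h adapted toward desired size hNew becomes
// roughly (h/hNew)^3 elements; that prediction serves as its partition weight.
#include <algorithm>

double predictedWeight(double currentSize, double desiredSize) {
  double r = currentSize / desiredSize;
  return std::max(1.0, r * r * r); // coarsened regions still cost at least one element
}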

Anisotropic Adaptation

• Anisotropic size field:
  – Ellipsoidal sizes at the vertices.
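In the usual metric formulation of anisotropic adaptation (a standard convention, assumed here rather than stated on the slide), the ellipsoidal size at a vertex is a symmetric positive-definite tensor M, and adaptation drives every edge vector e toward unit length measured in that metric:

  L_M(e) = sqrt(e^T M e),  with  M = R diag(1/h1^2, 1/h2^2, 1/h3^2) R^T,

where h1, h2, h3 are the desired sizes along the ellipsoid axes given by the rotation R.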

[Figure: transonic flow over the ONERA M6 wing]

Boundary Layer Adaptation

• Boundary layer mesh:
  – Semi-structured mesh.
  – Models high gradients normal to the surface, e.g. at no-slip walls in CFD.
  – Specified by first layer height (t0), number of layers (n), and total height (T) or gradation factor (g); the parameters are tied together by the geometric sum below.
• Adaptation must maintain the layer structure.
• Parallel BL adaptation is under development; a serial version is available.
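The tie between the four parameters is the standard geometric-growth relation (assumed here; the slide names the parameters but not the formula): layer i has height t_i = t0·g^i, so

  T = t0 + t0·g + ... + t0·g^(n-1) = t0 (g^n − 1)/(g − 1)  for g ≠ 1,

which is why specifying t0, n, and either T or g fixes the whole stack.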

Boundary Layer Adaptation

• Normal and in-plane adaptation are handled separately.
• In-plane:
  – The size field is the same as for the unstructured mesh.
  – Mesh modifications propagate through the stack.
  – Stacks are kept together in parallel.
• Normal:
  – The user specifies t0, n, and T or g at the vertices.
  – Shrink/expand the BL, or change the number of layers (sketched below).
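A minimal sketch of the normal resizing, using the geometric-sum relation above; the function name is illustrative:

// If n and g are kept but the BL is shrunk/expanded to a new total height T,
// the first-layer height follows from T = t0*(g^n - 1)/(g - 1).
#include <cmath>

double firstLayerHeight(double T, int n, double g) {
  if (g == 1.0) return T / n;                    // uniform layers
  return T * (g - 1.0) / (std::pow(g, n) - 1.0); // t0 = T(g-1)/(g^n-1)
}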

Boundary Layer Adaptation

[Figure: BL adaptation, pipe manifold example]
[Figure: pipe adaptation, close-up of a corner]
[Figure: normal BL adaptation]

File-free Adaptive Analysis

• Direct interface between Simmetrix and solver codes:
  – All required data is in memory; no I/O.
  – In-place solution transfer during adaptation with FieldSim.
• In progress:
  – RPI/Colorado's PHASTA CFD code.
  – NASA's FUN3D CFD code.

[Flowchart of the FUN3D (left) / RPI (right) adaptive loop: FUN3D runs solver iterations; if the iterations are done, end; otherwise error estimation runs, and if the error is acceptable, end; otherwise MeshSim adapts the mesh with in-place solution transfer via FieldSim and control returns to FUN3D.]
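The loop in the flowchart, as a stubbed sketch; every function name is a hypothetical placeholder, not the FUN3D or Simmetrix API:

void runSolverIterations() {}             // FUN3D advances the solution in memory
bool errorIsAcceptable() { return true; } // error estimation on the in-memory fields
void adaptMeshInMemory() {}               // MeshSim adapts; FieldSim transfers the
                                          // solution in place, so nothing is written

void adaptiveAnalysisLoop(int maxCycles) {
  for (int cycle = 0; cycle < maxCycles; ++cycle) {
    runSolverIterations();
    if (errorIsAcceptable()) break; // error OK: done
    adaptMeshInMemory();            // otherwise adapt and resume the solve
  }
}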

Distributed Model Geometry

• Requiring the entire model on each process imposes memory and I/O (or communication) overhead.
• Truly required is only the geometry that the local mesh is classified on.
• Partitioned model representation:
  – Similar to the partitioned mesh.
  – Model entities migrate between processors.
  – Enough data is maintained to properly hook up with adjacent entities.
• Driven by mesh migration (ordering sketched below):
  – The geometry required by the receiving process is migrated first.
  – Then the mesh is migrated, so that it can be classified on arrival.
  – Geometry no longer required on the sending process (because its mesh migrated away) is deleted.
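The three-step ordering above, as a stubbed sketch with hypothetical names:

struct MigrationPlan {};                          // which entities go to which part
void sendNeededGeometry(const MigrationPlan&) {}  // 1. model entities travel first
void migrateMeshEntities(const MigrationPlan&) {} // 2. then the mesh itself
void pruneUnreferencedGeometry() {}               // 3. sender drops unused geometry

void migrate(const MigrationPlan& plan) {
  sendNeededGeometry(plan);     // receiver can classify the mesh when it arrives
  migrateMeshEntities(plan);
  pruneUnreferencedGeometry();  // geometry whose mesh migrated away is deleted
}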

Distributed Model Geometry

[Figure: partitioned mesh and model, with the model geometry shown in grey]

• Substantial memory savings for models with a large number of model entities.
• Parts entirely in the interior require no geometry at all!

Multithreaded Meshing

• Utilize multicore machines to get results faster.
• Limit the critical sections where threads need exclusive access.
• Initial goal: a speedup of 1.5 on 2 threads and 2 on 4 threads.
• Hybrid distributed + multithreaded for modern parallel clusters:
  – For example, MPI (calls funneled through a master thread) + pthreads, as sketched below.
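A minimal sketch of the funneled hybrid setup; MPI_Init_thread and MPI_THREAD_FUNNELED are the real MPI API, while the meshing threads themselves are elided:

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  // FUNNELED: the process is multithreaded, but only the thread that called
  // MPI_Init_thread (the master) ever makes MPI calls.
  int provided = 0;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
  if (provided < MPI_THREAD_FUNNELED)
    std::fprintf(stderr, "MPI library lacks funneled thread support\n");

  // ... spawn worker threads that mesh concurrently; master handles all MPI ...

  MPI_Finalize();
  return 0;
}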

Concluding Remarks

• Parallel mesh generation and adaptation enable large-scale parallel adaptive analyses.
• A parallel model representation lets the geometry scale as well.
• Work in progress:
  – Better scaling at high processor counts, using RPI's CCNI supercomputer.
  – Better multithreaded scaling (2, 4, 8... threads).
  – Parallel adaptation in the boundary layer.
  – File-free adaptive analyses.
  – Hybrid MPI + thread model.
