
Page 1:

ParFUM: A Parallel Framework for Unstructured Meshes

Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty
Charm++ Workshop 2008

Page 2:

What is ParFUM?

• A framework for writing parallel finite element codes

• Takes care of difficult tasks involved in parallelizing a serial code

• Provides advanced mesh operations such as mesh adaptivity and dynamic load balancing

• Constantly evolving to support application needs (for example, cohesive elements and collision detection)

• Based on Charm++ and AMPI. Supports C, C++, and Fortran

Page 3:

Making Parallel Finite Element Codes Easier

A simple serial finite element code:

Create mesh → Perform finite element computations → Extract results

Page 4:

Making Parallel Finite Element Codes Easier

A simple parallel finite element code:

Create mesh → Partition mesh → Distribute mesh data and create “ghost” layers → Perform finite element computations (with synchronization and load balancing) → Extract results

Page 5:

Making Parallel Finite Element Codes Easier

Of these steps, ParFUM handles partitioning, distribution, ghost layer creation, synchronization, and load balancing automatically, letting the developer concentrate on science and engineering.

Page 6:

The Structure of a ParFUM Program

[Diagram: a ParFUM program consists of an Init routine and a Driver routine. Init performs mesh partitioning and distribution using Multiphase Shared Arrays (MSA); Driver contains the user's MPI code and runs on the Charm++ runtime system.]
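To make this structure concrete, here is a minimal sketch of a ParFUM program in C. It is a hedged illustration, not taken from the slides: the call names follow the ParFUM/FEM framework API as documented in the Charm++ manuals (init, driver, FEM_Mesh_default_read, FEM_Mesh_get_length, FEM_Create_simple_field, FEM_Update_field), but the header name, exact signatures, and the solver body are assumptions to verify against your ParFUM version.

#include <stdlib.h>
#include "ParFUM.h"   /* header name is an assumption; some versions use fem.h */

/* init() runs once, before partitioning: read the serial mesh and register it
   with ParFUM so it can be partitioned and distributed. */
void init(void) {
  /* ... read nodes and element connectivity, hand them to ParFUM ... */
}

/* driver() runs on every virtual processor, each with its own mesh partition. */
void driver(void) {
  int mesh   = FEM_Mesh_default_read();                 /* this VP's partitioned mesh */
  int nNodes = FEM_Mesh_get_length(mesh, FEM_NODE);     /* number of local nodes */

  double *temp = (double *)malloc(nNodes * sizeof(double)); /* one unknown per node */
  int fid = FEM_Create_simple_field(FEM_DOUBLE, 1);     /* field: one double per node */

  for (int step = 0; step < 100; step++) {
    /* ... local finite element computation on this partition ... */
    FEM_Update_field(fid, temp);  /* synchronize values of nodes shared between partitions */
  }
  free(temp);
}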

Page 7:

The Big Picture

[Diagram: the ParFUM software stack. System view: the Charm++ run-time system, with AMPI, Multiphase Shared Arrays, the load balancing framework, and communication optimizations layered beneath ParFUM. User view: ParFUM services used by the user's solver, including partitioning, ghost layer generation, adjacency generation, bulk and incremental adaptivity, collision detection, contact, solution transfer, IFEM, and pTopS.]

Page 8:

Integrating Multiple Programming Models

[Diagram: the same ParFUM stack, annotated with the programming model each layer provides: Multiphase Shared Arrays offer global shared memory, AMPI offers message passing, and Charm++ is message driven.]

Page 9:

ParFUM and AMPI

• Application code is written in AMPI, an implementation of MPI on top of the Charm RTS.

• AMPI processes (virtual processors, or VPs) are not tied to a physical processor: they can migrate, and there may be many of them per physical processor

• This makes it easier to port existing MPI codes and eases ParFUM's learning curve (see the sketch below)
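For illustration (a hedged sketch, not from the slides): an AMPI application is ordinary MPI code, and virtualization is chosen at launch time by asking for more virtual processors than physical ones, typically with a command like ./charmrun ./pgm +p4 +vp16. The program name and counts are illustrative; check the AMPI manual for the exact launch options in your version.

#include "mpi.h"

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, nvp;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this virtual processor */
  MPI_Comm_size(MPI_COMM_WORLD, &nvp);    /* total VPs, e.g. 16 when launched with +vp16 */

  /* ... this VP's share of the finite element work, using ordinary MPI calls ... */

  MPI_Finalize();
  return 0;
}

Because each VP is a migratable user-level thread rather than an operating system process, the same code can run with many VPs per physical processor.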

Page 10:

Virtualization Tradeoffs

Advantages:

• Allows adaptive overlap of communication and computation

• More granular load balancing

• Improved cache performance

• More flexibility

Disadvantages:

• More communication

• Worse ratio of remote data to local data

• Imposes some thread overhead

• High virtualization requires many elements per node

Page 11:

Virtualization Performance Impact

For this dynamic fracture code, virtualization provides a substantial benefit

Page 12:

Parallel Mesh Adaptivity

• Efficient parallel adaptivity is critical for many unstructured mesh codes

• ParFUM provides two implementations of common operations:

• incremental (2D triangle meshes): each individual operation leaves the mesh consistent. Relatively slow, and places limitations on ghost layers (see the sketch following this list)

• bulk (2D triangle and 3D tetrahedral meshes): many operations are performed at once, with ghosts and adjacencies updated at the end. Lower cost and no restrictions on ghost layers (ongoing work)
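A rough calling pattern for the incremental mode is sketched below. FEM_ADAPT_Init and FEM_ADAPT_Refine follow my recollection of the ParFUM adaptivity interface and should be treated as assumptions, as should the helpers compute_timestep and needs_refinement, which stand in for the user's solver and error indicator.

#include "ParFUM.h"   /* header name is an assumption */

void compute_timestep(int mesh);   /* hypothetical: the user's solver on this partition */
int  needs_refinement(int mesh);   /* hypothetical: user-defined refinement indicator */

void driver(void) {
  int mesh = FEM_Mesh_default_read();
  FEM_ADAPT_Init(mesh);            /* assumed call: prepare adjacencies for adaptivity */

  for (int step = 0; step < 100; step++) {
    compute_timestep(mesh);
    if (needs_refinement(mesh)) {
      /* incremental mode: each primitive (edge bisect, flip, contract) leaves the
         distributed mesh, its adjacencies, and its ghost layers consistent */
      FEM_ADAPT_Refine(0, 0, 1.5, NULL);   /* assumed signature: quality metric, method,
                                              refinement factor, optional per-element sizes */
    }
  }
}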

Page 13:

Higher Level Adaptivity

• Operations like propagating edge bisection are composed from edge bisect, flip, and contraction primitives

• Which is better, bulk or incremental? It depends on the amount and frequency of adaptivity

[Figure: propagating edge bisection]

Page 14:

Load Balance, Adaptivity, and Virtualization

Serious load imbalance: areas near the fracture are much more computationally expensive

Page 15:

Load Balance, Adaptivity, and Virtualization

We can change the VP mapping to better distribute the computationally expensive parts of the mesh

Page 16:

Load Balance, Adaptivity, and Virtualization

Assigning VPs using a greedy load balancer further improves utilization
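For reference, a hedged sketch of how load balancing is typically driven from an AMPI/ParFUM code: the strategy (for example GreedyLB, one of the standard Charm++ balancers) is chosen on the command line, and the code marks points where VPs are allowed to migrate. The program name, counts, and period below are illustrative, and MPI_Migrate() is an AMPI extension whose exact name may differ across versions.

#include "mpi.h"

void compute_timestep(void);   /* hypothetical: this VP's solver work for one step */

/* Launch, for example:  ./charmrun ./myfem +p16 +vp256 +balancer GreedyLB  */
void timeloop(int nSteps, int lbPeriod) {
  for (int step = 0; step < nSteps; step++) {
    compute_timestep();
    if (step > 0 && step % lbPeriod == 0)
      MPI_Migrate();           /* AMPI extension: let the runtime move VPs to balance load */
  }
}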

Page 17:

Spacetime Meshing

• Parallelization of Spacetime Discontinuous Galerkin (SDG) algorithm [Haber]

• Adaptive in both space and time, uses incremental adaptivity

• Asynchronous code, no global barriers

Page 18:

PTops

• Structural dynamics code for graded materials [Paulino]

• Based on Tops, a serial framework featuring an efficient topological mesh representation

Page 19:

PTops Strong Scaling

400,000 elements on Abe, no virtualization

Page 20:

PTops and CUDA

• ParFUM-Tops interface has CUDA support

• Our implementation runs ~10x faster on a single node using CUDA

• Usefulness is currently limited by the lack of double precision support and by the lack of access to clusters that combine GPUs with high-quality interconnects

Page 21:

Ongoing Work

• On-demand insertion of cohesive elements (truly extrinsic cohesives) in PTops for dynamic fracture simulations

• Efficient, scalable implementation of bulk edge flip and edge contraction

• Contact: use the Charm++ collision detection library to detect when domain fragments come into contact

Page 22:

ParFUM: A Parallel Framework for Unstructured Meshes

Aaron Becker, Sayantan Chakravorty, Isaac Dooley, Terry Wilmarth
Parallel Programming Lab
Charm++ Workshop 2008