
1

Nils Thürey, Thomas Pohl, Ulrich Rüde

Institute for System Simulation

University of Erlangen-Nürnberg

Parallelization Techniques for LBM Free Surface Flows using MPI and OpenMP

2

Overview

• Introduction

• OpenMP Parallelization

• MPI Parallelization

• Validation Experiment

• Conclusions

3

Overview

• Introduction

• OpenMP Parallelization

• MPI Parallelization

• Validation Experiment

• Conclusions

4

Introduction

Simulations with Free Surface Flows

• Applications, e.g.:
  – Computer graphics: special effects
  – Engineering: metal foam simulations

• Lattice Boltzmann method (minimal collision sketch below):
  – D3Q19 lattice
  – BGK collision with Smagorinsky turbulence model
  – Grid compression to reduce memory requirements
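To make the method bullets above concrete, here is a minimal sketch of a single-cell D3Q19 BGK collision. It is written for this summary under the standard D3Q19 conventions, not taken from the presented solver; in particular the relaxation time tau is a plain constant here, whereas the Smagorinsky model on the slide would additionally adapt tau per cell from the local non-equilibrium stress.

    /* Minimal single-cell D3Q19 BGK collision (illustrative sketch, not the
     * original solver): standard D3Q19 weights, constant relaxation time tau,
     * Smagorinsky correction of tau omitted. */
    #include <stdio.h>

    #define Q 19

    /* D3Q19 discrete velocities (rest, 6 axis, 12 diagonal) and their weights */
    static const int c[Q][3] = {
        { 0, 0, 0},
        { 1, 0, 0}, {-1, 0, 0}, { 0, 1, 0}, { 0,-1, 0}, { 0, 0, 1}, { 0, 0,-1},
        { 1, 1, 0}, {-1,-1, 0}, { 1,-1, 0}, {-1, 1, 0},
        { 1, 0, 1}, {-1, 0,-1}, { 1, 0,-1}, {-1, 0, 1},
        { 0, 1, 1}, { 0,-1,-1}, { 0, 1,-1}, { 0,-1, 1}
    };
    static const double w[Q] = {
        1.0/3.0,
        1.0/18.0, 1.0/18.0, 1.0/18.0, 1.0/18.0, 1.0/18.0, 1.0/18.0,
        1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0,
        1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0,
        1.0/36.0, 1.0/36.0, 1.0/36.0, 1.0/36.0
    };

    /* Relax the distributions f of one cell towards the second-order
     * equilibrium with relaxation time tau (BGK collision). */
    static void collide_cell(double f[Q], double tau)
    {
        double rho = 0.0, ux = 0.0, uy = 0.0, uz = 0.0;
        for (int i = 0; i < Q; ++i) {
            rho += f[i];
            ux  += f[i] * c[i][0];
            uy  += f[i] * c[i][1];
            uz  += f[i] * c[i][2];
        }
        ux /= rho; uy /= rho; uz /= rho;

        double usq = ux*ux + uy*uy + uz*uz;
        for (int i = 0; i < Q; ++i) {
            double cu  = c[i][0]*ux + c[i][1]*uy + c[i][2]*uz;
            double feq = w[i] * rho * (1.0 + 3.0*cu + 4.5*cu*cu - 1.5*usq);
            f[i] -= (f[i] - feq) / tau;
        }
    }

    int main(void)
    {
        double f[Q];
        for (int i = 0; i < Q; ++i)   /* start from the rest-state equilibrium */
            f[i] = w[i];
        collide_cell(f, 0.6);
        printf("f[0] after one collision: %f\n", f[0]);
        return 0;
    }

A full solver would add the streaming step, boundary handling, and the free surface treatment of the next slide.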

5

Free Surface Flows with LBM

• Similar to Volume-of-Fluid (mass exchange sketch below):
  – Track fill fraction for each cell
  – Compute mass transfer
  – Interface with a closed layer of cells

• Extension for adaptive grids:
  – Coarse grids for large fluid volumes
  – Adapt to movement of surface

Details in, e.g.: Lattice Boltzmann Model for Free Surface Flow for Modeling Foaming; C. Körner, M. Thies, T. Hofmann, N. Thürey and U. Rüde; J. Stat. Phys. 121, 2005
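The fill-fraction and mass-transfer bullets correspond to a per-direction mass exchange between neighbouring cells. The sketch below follows the cited Körner et al. 2005 formulation as I read it: fluid neighbours exchange the full streamed difference, interface neighbours weight it with the average fill fraction, gas neighbours exchange nothing. The Cell struct, its field names and the toy values in main are placeholders, not the original data layout.

    #include <stdio.h>

    #define Q 19

    enum { FLUID, INTERFACE, GAS };   /* cell types of the free surface model */

    typedef struct {
        double f[Q];   /* particle distributions */
        double m;      /* cell mass */
        double eps;    /* fill fraction, m / rho */
        int    type;
    } Cell;

    /* Mass gained by an interface cell from one neighbour along lattice
     * direction i; inv is the opposite direction (the one streaming back). */
    static double delta_mass(const Cell *cell, const Cell *nb, int i, int inv)
    {
        double dm = nb->f[inv] - cell->f[i];            /* incoming minus outgoing */
        if (nb->type == FLUID)
            return dm;                                  /* full exchange */
        if (nb->type == INTERFACE)
            return 0.5 * (cell->eps + nb->eps) * dm;    /* weighted by fill fractions */
        return 0.0;                                     /* no exchange with gas */
    }

    int main(void)
    {
        Cell a = { .type = INTERFACE, .eps = 0.4 };
        Cell b = { .type = INTERFACE, .eps = 0.8 };
        a.f[1] = 0.05;                 /* toy values for one direction pair */
        b.f[2] = 0.07;
        printf("dm = %f\n", delta_mass(&a, &b, 1, 2));
        return 0;
    }

Summing such contributions over all directions and neighbours each time step updates the cell mass m, and the resulting fill fraction eps drives the conversion of interface cells to fluid or gas.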

6

Free Surface Flow Example

7

Example with Moving Objects

8

Overview

• Introduction

• OpenMP Parallelization

• MPI Parallelization

• Validation Experiment

• Conclusions

9

OpenMP Parallelization

• OpenMP for shared memory architectures

• Partition along y-axis, synchronize layers (see the sweep sketch below)
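A minimal sketch of the y-partitioned sweep, assuming a plain two-grid update (the interaction with grid compression follows on the next slide); Grid and lbm_update_plane are placeholders for the real data structures and the stream-collide kernel, not the original code.

    #include <stdio.h>
    #include <omp.h>

    typedef struct { int nx, ny, nz; double *cells; } Grid;   /* placeholder grid */

    /* Placeholder for the real stream-collide update of one y-plane. */
    static void lbm_update_plane(const Grid *src, Grid *dst, int y)
    {
        (void)src; (void)dst; (void)y;
    }

    /* One time step, partitioned along the y-axis: each thread processes a
     * contiguous block of y-planes; the implicit barrier at the end of the
     * parallel for synchronizes the layer boundaries before the grids are
     * swapped for the next step. */
    static void lbm_sweep(const Grid *src, Grid *dst)
    {
        #pragma omp parallel for schedule(static)
        for (int y = 1; y < src->ny - 1; ++y)
            lbm_update_plane(src, dst, y);
    }

    int main(void)
    {
        Grid a = { 64, 64, 64, NULL };
        Grid b = a;
        lbm_sweep(&a, &b);
        printf("threads: %d\n", omp_get_max_threads());
        return 0;
    }

Compile with -fopenmp (GCC) or the equivalent flag of your compiler.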

10

OpenMP and Grid Compression

• Problem: the in-place updates of grid compression create dependencies between neighbouring thread partitions

• Use a boundary layer: write offset of (0,0,+2) instead of (+1,+1,+1), as illustrated below

• Slightly increased memory requirements
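A toy illustration of the write offset, under my reading of grid compression: source and destination share one array, and the update of cell (x,y,z) is written to a location shifted by a fixed offset, so only a small boundary layer of extra memory is needed instead of a second grid. With the (0,0,+2) offset the writes never leave the writer's y-slab, which is what makes the y-partitioned OpenMP sweep safe. The sizes, the sweep order and the copy "update" below are placeholders, not the original implementation.

    #include <stdio.h>

    #define NX 8
    #define NY 8
    #define NZ 8

    /* Write offset: two extra z-planes for the (0,0,+2) variant; the serial
     * (+1,+1,+1) variant would instead pad every direction by one plane. */
    #define OFF_X 0
    #define OFF_Y 0
    #define OFF_Z 2

    static double grid[NX + OFF_X][NY + OFF_Y][NZ + OFF_Z];

    static double *dst_cell(int x, int y, int z)
    {
        /* no shift in x or y: a thread owning a block of y-planes never
         * writes into another thread's y-planes */
        return &grid[x + OFF_X][y + OFF_Y][z + OFF_Z];
    }

    int main(void)
    {
        /* sweep against the write offset so a source cell is only overwritten
         * after every cell that still needs to read it has been processed
         * (here the "update" is just a copy; the real kernel streams from the
         * 18 neighbours and collides) */
        for (int z = NZ - 1; z >= 0; --z)
            for (int y = 0; y < NY; ++y)
                for (int x = 0; x < NX; ++x)
                    *dst_cell(x, y, z) = grid[x][y][z];
        printf("swept %d cells\n", NX * NY * NZ);
        return 0;
    }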

11

OpenMP Performance

• Measurements with 2/4-way Opterons:

12

Overview

• Introduction

• OpenMP Parallelization

• MPI Parallelization

• Validation Experiment

• Conclusions

13

MPI Parallelization

• MPI for distributed memory architectures

• Partition along x-axis

• Transfer boundary layer over the network (exchange sketch below)

• No coarsening of boundary layer allowed
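A hedged sketch of the boundary-layer transfer for the x-partition, assuming each rank stores its slab of x-planes with one ghost plane on each side; the buffer layout, plane_size and the function names are placeholders chosen for this summary, not the original communication code.

    #include <mpi.h>
    #include <stdlib.h>

    /* Exchange the outermost owned x-planes with the left/right neighbour
     * ranks and receive their planes into the local ghost layers.
     * slab layout: [left ghost | owned plane 1 .. nx_local | right ghost] */
    static void exchange_ghost_planes(double *slab, int nx_local, int plane_size,
                                      int left, int right, MPI_Comm comm)
    {
        double *left_ghost  = slab;
        double *first_owned = slab + plane_size;
        double *last_owned  = slab + (size_t)nx_local * plane_size;
        double *right_ghost = slab + (size_t)(nx_local + 1) * plane_size;

        /* send to the right, receive from the left */
        MPI_Sendrecv(last_owned,  plane_size, MPI_DOUBLE, right, 0,
                     left_ghost,  plane_size, MPI_DOUBLE, left,  0,
                     comm, MPI_STATUS_IGNORE);
        /* send to the left, receive from the right */
        MPI_Sendrecv(first_owned, plane_size, MPI_DOUBLE, left,  1,
                     right_ghost, plane_size, MPI_DOUBLE, right, 1,
                     comm, MPI_STATUS_IGNORE);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int nx_local = 16, plane_size = 64 * 64 * 19;    /* toy slab dimensions */
        double *slab = calloc((size_t)(nx_local + 2) * plane_size, sizeof *slab);
        int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        exchange_ghost_planes(slab, nx_local, plane_size, left, right, MPI_COMM_WORLD);

        free(slab);
        MPI_Finalize();
        return 0;
    }

MPI_PROC_NULL turns the exchange at the outer domain boundaries into a no-op; in a real solver the transferred planes would hold the full cell state (distributions, mass, flags) rather than plain doubles.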

14

MPI Performance

• Measurements on 4-way Opterons with Infiniband interconnect:

15

MPI Performance with Adaptive Grids

• Same Opteron cluster, with and without adaptive coarsening:

16

Example Simulation

• Resolution: 880*880*336; 260M cells, 6.5M active on average

17

Overview

• Introduction

• OpenMP Parallelization

• MPI Parallelization

• Validation Experiment

• Conclusions

18

Numerical Experiment: Single Rising Bubble

• Implementation only using MPI

• Validation for metal foams or, e.g., bubble reactors

• Comparison to 2D Level-Set Volume of Fluid method

• Modified parameter: surface tension

19

Numerical Experiment: Single Rising Bubble

20

Parallel MPI Performance

21

Overview

• Introduction

• OpenMP Parallelization

• MPI Parallelization

• Validation Experiment

• Conclusions

22

Conclusions

• High performance by combining OpenMP and MPI

• Grid compression requires modifications

• Adaptive coarsening is problematic

23

24

• Unused slides:

25

Cell LBM Simulations

• Goal: …

• Available Cell systems:
  – Blades
  – Playstation 3

26

Cell Architecture

27

Cell Performance Measurements

• Implementation issues (see the memory-layout sketch after the table):
  – As much work as possible on the SPUs
  – SIMD vectorization, also of bounce-back
  – Memory layout must support alignment restrictions for DMAs and SIMD

MLSUPS        1 node   full blade   PS3
1 SPU/CPU       40         64        41
2 SPUs/CPU      78        111        78
3 SPUs/CPU      97        143        85
4 SPUs/CPU      98        164        85
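Regarding the alignment bullet above, the following is a generic sketch of an aligned structure-of-arrays layout of the kind such restrictions suggest; posix_memalign, the 128-byte alignment value and the SoAGrid type are illustrative choices for this summary, not the original Cell code.

    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define Q 19

    typedef struct {
        size_t  ncells;
        double *f[Q];   /* one contiguous, aligned array per lattice direction */
    } SoAGrid;

    /* Allocate each direction's array with 128-byte alignment, which covers
     * the 16-byte alignment that SIMD loads and Cell DMA transfers need and
     * matches the 128-byte cache lines the DMA engine prefers. */
    static int soa_alloc(SoAGrid *g, size_t ncells)
    {
        g->ncells = ncells;
        for (int i = 0; i < Q; ++i)
            if (posix_memalign((void **)&g->f[i], 128, ncells * sizeof(double)))
                return -1;
        return 0;
    }

    int main(void)
    {
        SoAGrid g;
        if (soa_alloc(&g, 64 * 64 * 64) != 0)
            return 1;
        printf("f[0] aligned to 128 bytes: %s\n",
               (uintptr_t)g.f[0] % 128 == 0 ? "yes" : "no");
        for (int i = 0; i < Q; ++i)
            free(g.f[i]);
        return 0;
    }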

28

Performance

(Chart: free surface LBM code vs. standard LBM code)

• Performance is poor on a single node! Conditionals per cell: 2.9 for the standard LBM, 51 for the free surface LBM

• Pentium 4: almost no degradation (~10%)

• SR 8000: enormous degradation (pseudo-vector processing, relies on predictable jumps)

29

Numerical Experiment: Single Rising Bubble (unused)
