Parallelization of 2D Lid-Driven Cavity Flow

Asif Salahuddin, Ahmad Sharif, Jens Kehne

04 December 2008

OBJECTIVES



Our objectives

• Numerical simulation of fluid dynamics using the Lattice-Boltzmann method

• Parallelize the code using MPI
  – Study speedup and scalability

• Allow large problem sizes to run in reasonable time
  – Allow them to run at all, for that matter (memory requirements)


CONCEPT


The Lattice-Boltzmann method

• The Lattice-Boltzmann equation:

$$f_i(x + e_i, t + 1) - f_i(x, t) = -\frac{1}{\tau}\left[f_i(x, t) - f_i^{(0)}(x, t)\right]$$

• Velocity directions: [figure: lattice velocity directions e_i]

Top-down vs. bottom-up

• Top-down: Partial differential equations (Navier-Stokes) → Discretization → Difference equations (conserved quantities?)

• Bottom-up: Discrete model (LGCA or LBM) → Multi-scale analysis → Partial differential equations (Navier-Stokes)


Fluid nodes

• The entire problem is represented as a grid of fluid nodes
  – Fluid nodes hold velocities towards all neighbors

• New grid state computed for discrete time steps


Wall bounceback

• The fluid domain is surrounded by walls

• On each timestep, the direction of links hitting a wall is reversed

• Walls may be moving
  – A moving wall changes the momentum of the fluid close to it
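The bounceback rule can be sketched as follows. This assumes a D2Q9 lattice and the commonly used momentum correction for a moving wall, f_opp = f_i - 6 w_i rho (e_i · u_wall). The slides do not say which correction the authors used, and `bounce_back`, `wall_links`, and `OPP` are hypothetical names.

```python
# D2Q9 directions, weights, and the index of each opposite direction (assumed).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
OPP = [0, 3, 4, 1, 2, 7, 8, 5, 6]


def bounce_back(node, wall_links, rho=1.0, u_wall=(0.0, 0.0)):
    """Return {direction: population} reflected back into the fluid node.

    node[i] is the population leaving in direction E[i]; wall_links lists
    the directions whose link ends in a wall.
    """
    back = {}
    for i in wall_links:
        ex, ey = E[i]
        # Momentum exchanged with a moving wall (zero for a wall at rest).
        gain = 6.0 * W[i] * rho * (ex * u_wall[0] + ey * u_wall[1])
        back[OPP[i]] = node[i] - gain
    return back
```

For a node directly below a lid moving in the +x direction, the links 2, 5, and 6 hit the wall, and the reflected populations carry extra momentum in +x, which is what drives the cavity flow.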


IMPLEMENTATION


Domain decomposition

• Each processor processes part of the grid

• Ghost nodes
  – Represent border nodes of the neighbors

• Border nodes
  – Updated by neighbors

• Inner nodes
  – We can update these alone
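One way to picture the three node types is a local array with a one-node ghost layer around the nodes a process owns. The sketch below assumes that layout and assumes the process has neighbors on all four sides. `classify` and `count_nodes` are illustrative names, not taken from the authors' code.

```python
def classify(i, j, nx, ny):
    """Classify index (i, j) of a local (nx + 2) x (ny + 2) array.

    The process owns the nx x ny nodes in the middle; the outermost
    layer mirrors the border nodes of the neighboring processes.
    """
    if i in (0, nx + 1) or j in (0, ny + 1):
        return "ghost"
    if i in (1, nx) or j in (1, ny):
        return "border"
    return "inner"


def count_nodes(nx, ny):
    """Count how many nodes of each type a process holds."""
    counts = {"ghost": 0, "border": 0, "inner": 0}
    for i in range(nx + 2):
        for j in range(ny + 2):
            counts[classify(i, j, nx, ny)] += 1
    return counts
```

For a 10 x 10 local grid this gives 64 inner nodes, 36 border nodes, and 44 ghost nodes.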


Automatic decomposition

• Factorize and merge
  – Factorize x and y dimension and #procs
  – Divide x and y by prime factors of #procs

• Goal: Try to keep each processor's grid as square as possible
  – Best relation between inner and border nodes
  – Minimizes communication
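A possible reading of the factorize-and-merge idea is a greedy loop: take the prime factors of #procs from largest to smallest and use each one to cut whichever side of the current block is longer. This is a sketch under that assumption, not the authors' actual algorithm; `prime_factors` and `decompose` are hypothetical names.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def decompose(nx, ny, nprocs):
    """Return (px, py, block_x, block_y) for an nx x ny grid on nprocs."""
    px = py = 1
    bx, by = nx, ny
    for p in sorted(prime_factors(nprocs), reverse=True):
        # Prefer cutting the longer side so blocks stay close to square.
        cut_x = bx % p == 0 and (bx >= by or by % p != 0)
        if cut_x:
            bx //= p
            px *= p
        elif by % p == 0:
            by //= p
            py *= p
        else:
            raise ValueError("grid cannot be divided evenly by factor %d" % p)
    return px, py, bx, by
```

With a 30 x 20 grid and 6 processes, `decompose(30, 20, 6)` yields a 3 x 2 process grid with 10 x 10 blocks.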


Automatic decomposition - demo

#CPUs: 6 = 2 * 3
X-axis: 30 = 2 * 3 * 5
Y-axis: 20 = 2 * 2 * 5

[Figure: 30 x 20 grid divided among 6 CPUs into 10 x 10 subgrids]
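To see why the near-square split is preferable for this example, the sketch below lists every process grid that divides the 30 x 20 domain evenly and counts the border nodes per block. It assumes every block edge touches a neighbor, so physical walls are ignored. `layouts` and `border_nodes` are illustrative names.

```python
def border_nodes(bx, by):
    """Number of nodes on the rim of a bx x by block."""
    if bx <= 2 or by <= 2:
        return bx * by
    return bx * by - (bx - 2) * (by - 2)


def layouts(nx, ny, nprocs):
    """All (px, py) process grids that divide an nx x ny domain evenly."""
    result = []
    for px in range(1, nprocs + 1):
        if nprocs % px:
            continue
        py = nprocs // px
        if nx % px == 0 and ny % py == 0:
            result.append((px, py))
    return result


def compare(nx, ny, nprocs):
    """Map each feasible layout to the border-node count of one block."""
    return {(px, py): border_nodes(nx // px, ny // py)
            for px, py in layouts(nx, ny, nprocs)}
```

For 30 x 20 on 6 processes, the 3 x 2 layout (10 x 10 blocks) has 36 border nodes per block, while the 6 x 1 layout (5 x 20 blocks) has 46 for the same 100 nodes.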

Optimizations

• Overlapping wall bounceback and communication
  – About 5% speedup

• Overlapping inner node computation with communication
  – Massive slowdown!
  – Probably due to cache effects

• Making use of regular communication pattern
  – Slower (we have no idea why!)


EXPERIMENTAL RESULTS


Experimental setup

• Lonestar Linux cluster @ University of Texas
  – Part of the TeraGrid project

• 1300 compute nodes

• 2 Intel Xeon 2.66 GHz dual-core CPUs per node
  – 42.6 GFLOPS/node

• 8 GB RAM/node

• Linux kernel 2.6, 64 bit

• InfiniBand interconnect, fat tree topology
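The per-node peak figure is consistent with the following arithmetic, assuming each core completes 4 double-precision floating-point operations per cycle (an assumption about this CPU generation that the slides do not state):

$$2 \text{ CPUs} \times 2 \text{ cores} \times 2.66 \text{ GHz} \times 4 \text{ FLOP/cycle} = 42.56 \approx 42.6 \text{ GFLOPS}$$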


Actual speedup


Relation to expected speedup


QUESTIONS

