47
CUDA Fluid simulation Lattice Boltzmann Models Cellular Automata

Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

CUDA

Fluid simulation

Lattice Boltzmann Models

Cellular Automata

Page 2: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Please excuse my layout of slides for the remaining part of the talk!

Page 3: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Fluid Simulation

� Navier Stokes equations for incompressible fluids

� Well known technique from computer graphics

� Stable fluids, Jos Stam, Siggraph ’99

� Can take arbitrary large time steps

� Finite difference approximations resultsin grid computations similar to LBM

Page 4: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

SDK fluids

DEMO

Page 5: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

SDK fluids

Not 100% identical to the following method…

Page 6: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Navier-Stokes (again)

∂u∂t

= −(u ⋅∇)u − 1ρ

∇p − ν∇2u + f

∇ ⋅ u = 0

Page 7: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Projection operator

� Define a projection operator PPPP that projects a

vector field w onto divergence-freecomponent u.

� PPPP w = PPPP u + PPPP (∇∇∇∇p)� Since by definition PPPP w =PPPP u = u then PPPP (∇∇∇∇p)=0

� And then

Page 8: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

� Break it down [Stam 2000]:

� Add forces

� Advect

� Diffuse

� Solve for pressure

� Subtract pressure gradient

Algorithm

Page 9: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Algorithm

From: Stam J. Stable Fluids. Siggraph 1999.

Page 10: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Algorithm

� Break it down [Stam 2000]:

� Add forces:

Source: Mark Harris / GPGPU tutorial@Siggraph 04

∂u∂ t

= − (u ⋅ ∇ )u − 1ρ

∇ p − ν ∇ 2 u + f

External Force

Page 11: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Add Forces

w1= u(x,t) + f (x,t)∆t

Source: Mark Harris / GPGPU tutorial@Siggraph 04

Explicit Euler integration

Page 12: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

� Break it down [Stam 2000]:

� Add forces:

� Advect:

Algorithm

Source: Mark Harris / GPGPU tutorial@Siggraph 04

∂u∂ t

= − (u ⋅ ∇ )u − 1ρ

∇ p − ν ∇ 2 u + f

Advection

Page 13: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Advection

� Advection:

� quantities in a fluid are carried along by its velocity

� Want velocity at position x at new time t + ∆t

� Follow velocity field back in time from x : (x - w1∆t)

� Like tracing particles!

� Simple in a fragment program

u(x, t+∆t)

u(x’ , t)

Path of fluid

Trace back in time

w2(x) = w

1(x − w

1∆t)

Source: Mark Harris / GPGPU tutorial@Siggraph 04

Page 14: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Algorithm

� Break it down [Stam 2000]:

� Add forces:

� Advect:

� Diffuse:

Source: Mark Harris / GPGPU tutorial@Siggraph 04

∂ u∂ t

= − ( u ⋅ ∇ ) u − 1ρ

∇ p − ν ∇ 2 u + f

Diffusion (viscosity)

Page 15: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

� Implicit

� Stable for large timesteps

Numerical integration

),(),()(

),(),(),(

),(),(),(

2

)(

22

22

22

22222

3

txwhtxwhI

htxwhtxwhtxw

htxwh

txwhtxw

dt

dw

=+∇−

⇒+∇+=+

⇒+∇=−+=

43421xw

νν

ν

Page 16: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Viscous Diffusion

� Solution by Jacobi iteration

I − ν∆t∇2( )w3

= w2

Source: Mark Harris / GPGPU tutorial@Siggraph 04

Page 17: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

� Break it down [Stam 2000]:

� Add forces:

� Advect:

� Diffuse:

� Solve for pressure:

Algorithm

∇2 p = ∇ ⋅ w

3

Source: Mark Harris / GPGPU tutorial@Siggraph 04

Page 18: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Poisson-Pressure Solution

� Poisson Equation

� Jacobi, Gauss-Seidel, Multigrid, etc.

� Jacobi easy on GPU, the rest are trickier

∇2 p = ∇ ⋅ w

3

Source: Mark Harris / GPGPU tutorial@Siggraph 04

Page 19: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

� Break it down [Stam 2000]:

� Add forces:

� Advect:

� Diffuse:

� Solve for pressure:

� Subtract pressure gradient:

Algorithm

u(x,t + ∆t) = w3

− ∇p

Source: Mark Harris / GPGPU tutorial@Siggraph 04

Page 20: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Subtract Pressure Gradient

� Last computation of the time step

� u is now a divergence-free velocity field

u(x,t + ∆t) = w3

− ∇p

Source: Mark Harris / GPGPU tutorial@Siggraph 04

Page 21: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Implementation

Cg code.

But we “convert” to Cuda on the fly…

Page 22: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Pseudocode of timestep

Page 23: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Textures

Page 24: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Advection

w2(x) = w

1(x − w

1∆t)

Page 25: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Diffusion

I − ν∆t∇2( )w3

= w2

Page 26: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Divergence

Page 27: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Gradient subtraction

Page 28: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Partial differential equations

� Finite difference discretizations lead to “lattice formulations”

� Like the Jacobi iteration program

� Similar in implementation to cellular automata

Page 29: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Lattice Boltzmann Models

Speculations…

Page 30: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Implementing Lattice Boltzmann Computationon Graphics Hardware

Wei Li, Xiaoming Wei, and Arie Kaufman.

The Visual Computer 19(7-8) 2003.

Hardware-near paper

Extrapolate…

Page 31: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

movie

Page 32: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

� The LBM consists of a regular grid and a set of packet distribution values.

� Each packet distribution fqi corresponds to a velocity direction vector eqi shooting from a node to its neighbor.

LBM

Page 33: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

� Every step is easy to program� Initially read fi from global memory but

store in shared memory

� Iterate and write out densities, velocities…

� Write out only for visualization or external boundary update

LBM in Cuda

Page 34: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Two papers from the same authors!

Page 35: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Speedup

Page 36: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Cellular automata

Page 37: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Informal definition

Cellular automaton

� A regular grid of cells, each in one of a finite number of states.

� The grid can be in any finite number of dimensions.

� Time is also discrete

� The state of a cell at time t is a function of the states of a finite number of cells (called its neighborhood) at time t − 1.

Adapted from wikipedia

Page 38: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Rich behaviour from simple functions

� Example, exclusive or

Courtesy Prof. Bastien Chopard

Page 39: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

CA in CUDA: format

� If the number of possible states per cell corresponds to 232

� integer

� No bool type on GPUs (at present)

� Chars available

� Represent multiple cell per primitive for best performance

Page 40: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Exclusive OR

Implementation considerations

Page 41: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

CA in CUDA: global memory

� Read from and write to global memory after each iteration

� Simple and easy

� Inefficient with respect to memory bandwidth

� Kernel

� Read neighbours’ states

� Compute four-way exclusive or

� Write result at node

Page 42: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

CA in CUDA: textures

� Read states from a texture and write to global memory after each iteration

� Cache hits reduces memory bandwidth

� BUT

� Currently no write-to-texture support

� Write to global memory and copy to CUDA array

� For simple kernels the cop outweighs the cache gain

Page 43: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

CA in CUDA: shared memory

� Read states from global memory once into shared memory and write to global memory after each iteration� Reduces bandwidth

� Kernel� Read current cell state into shared memory

� Block border cells read also border cell state� Watch out for bank conflicts!

� Synchronize threads

� Compute four-way exclusive or from shared mem.

� Write result at node (from register)

Page 44: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

In my experience…

� For 256x256 grid

� Shared memory version ~5 times faster as global memory version in a specific 2D registration problem

Page 45: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Cuda profiler

Page 46: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

Iterate in kernel

� If you are only interested in the result after ‘n’ iterations

� Iterate in kernel

� to remove CPU overhead

� Only border cells need to read/write global memory in each iteration

� Communication between blocks

� Rest of shared mem. is already known

Page 47: Fluid simulation Lattice Boltzmann Models Cellular Automatacui.unige.ch/~chopard/GPGPU/4-LBM-CA.pdf · Block border cells read also border cell state Watch out for bank conflicts!

That was it

Thank you for coming