Project Dissertation

Real-Time Physically-Based Fluid Simulation on the GPU

Valentin Hinov

Declaration of Originality and Permission to Copy

Author: Valentin Hinov

Title: Real-Time Physically-Based Fluid Simulation on the GPU

Degree: BSc (Hons) Computer Games Technology

Year: 2014

(i) I certify that the above mentioned project is my original work.

(ii) I agree that this dissertation may be reproduced, stored or transmitted, in any form and by any means without the written consent of the undersigned.

Signature: .................................................................

Date: .................................................................


Abstract

Physically-based fluid simulation has long been reserved for the realm of offline rendering. Continued improvements in the parallel computational power of graphics cards are bringing the opportunity to simulate these phenomena in real-time. This project aims to prove that, with certain simplifications and optimisations, fluid simulation can be used in demanding applications such as games.

A framework is created for this project to present methods for calculating and rendering fire and smoke using the parallel processing power of the graphics card through the DirectX 11 Compute Shader APIs. The suggested approach takes into consideration the importance of maintaining performance in a real-time application. Various LOD (Level Of Detail) and performance optimisation methods used in games are adopted and modified for this purpose.

The most important variable for smooth gameplay is the frames-per-second (FPS) that an application maintains. By keeping a constant measure of it, the framework provides a means to monitor the stability and effectiveness of the implementation.

The results of this project show that proper adoption of LOD techniques, such as frame skipping, can greatly reduce processing overhead. In addition, the use of instancing techniques allows multiple fluids to be rendered at the cost of simulating just one. This, together with smart texture management, helps keep the memory and processing footprint low. In conclusion, these techniques combined provide an optimised solution for using physically-based fire and smoke in a real-time setting, which maintains both accuracy and visual quality. Measurements show that simulating 3 differently sized fluid domains - 64x128x64, 40x80x40, 30x60x30 - maintains an average frame rate of over 800 FPS on a high tier graphics card, while still managing a comfortable 50 FPS on a low tier one.

Keywords: fluid simulation, performance, DirectX 11, Compute Shader


Preface

I would like to take this opportunity to extend my gratitude for the support and help I have received from my supervisor, Dr David MacTaggart, and my module tutor, Dr Henry Fortuna. I would also like to thank Alex Dunn, who provided me with valuable advice and constructive criticism.

I am also immensely grateful for the help and patience provided by Tsvetelina Dacheva and the support of my parents during the long production hours on this project.

- Valentin Hinov


Contents

Abstract

Preface

List of Figures

List of Tables

1 Introduction
    1.1 Project Aim
    1.2 Dissertation Structure

2 Background
    2.1 Mathematics of Fluid Flow
        2.1.1 Modelling the Simulation Space
    2.2 State of Fluids in Games
    2.3 LOD and Performance Overview

3 Literature Review
    3.1 Early work on real-time solvers
    3.2 The GPU advantage
    3.3 3D Fluid Rendering
        3.3.1 Particle Systems
        3.3.2 Volume Ray Casting

4 Methodology
    4.1 Introduction and Goals
        4.1.1 Framework Architecture
        4.1.2 Methodology Structure
    4.2 Fluid Domain Representation
    4.3 Setting up a Simulation
        4.3.1 Choosing a Grid Size
        4.3.2 Setup Optimisations
    4.4 Running a Simulation
        4.4.1 Simulation Steps
        4.4.2 Runtime Modifications
        4.4.3 Choosing an Update Rate
        4.4.4 Frame Skipping
    4.5 Rendering Fluids
        4.5.1 Render Parameters
        4.5.2 Fluid Instancing

5 Results and Discussion
    5.1 Testing Setup
        5.1.1 Hardware
        5.1.2 Scene
    5.2 Visual Results
        5.2.1 Modifying Parameters
    5.3 Memory & Performance Results
        5.3.1 Memory Use
        5.3.2 Performance

6 Conclusion and Future Work
    6.1 Future Work

Appendix A Test GPUs Specifications

Appendix B CD Contents

References

Bibliography

List of Figures

1.1 Computational performance of Navier-Stokes equations on new NVidia GPUs

3.1 Advection step moving smoke density along a velocity field. As shown in Stam’s solver from 2003 (Stam, 2003)

3.2 Smoke being pushed and moving around by a gargoyle in "Hellgate: London"

4.1 Different grids, same render scale. Domain size from left to right: 16x32x16, 32x64x32, 64x128x64

4.2 Left: Using MacCormack for density and reaction only; Right: Using MacCormack for all fields

4.3 Different vorticity confinement strengths. Strength factors from left to right: 0, 0.5, 1.0

4.4 Users have the freedom to edit fluid control settings at runtime to observe their effects. Reaction values are not used for smoke simulation

4.5 Render settings modify the look of a fluid without changing its physical properties

4.6 Different sample rates of a 64x128x64 fluid from afar. Left: 32 samples; Right: 128 samples

4.7 Different sample rates of a 64x128x64 fluid from up close. Left to right: 32, 64, 128 samples

5.1 Looking at the entire final scene from a distance with all fluids in view

5.2 Smoke and fire simulation in the application

5.3 Left: Fast decaying fire, producing a lot of smoke; Mid: Strong fire, burning with nearly no smoke; Right: Average strength fire, producing blue smoke

5.4 Benchmark results on notebook computer using a NVidia GT 640M LE GPU

5.5 Benchmark results on gaming PC using an AMD Radeon R9 290X GPU

A.1 Technical Specifications of both graphics cards used for testing. The bandwidth and clock speeds are the key factors for performance

List of Tables

4.1 Texture Formats and their uses

5.1 Hardware used for testing

5.2 Video memory used for simulations of different resolution

Chapter 1

Introduction

Fluid simulation has been a hot topic in computer graphics, especially in the last decade, in which the dramatic increase in computational power has affected not only the CPU (Central Processing Unit) but also the much more parallel-focused GPU (Graphics Processing Unit) (Gupta, 2011).

For virtual environments in games, a correct portrayal of natural phenomena, such as smoke and fire, aids greatly in immersing the player in the world and making it appear believable. However, realistic rendering and simulation of these fluids requires a considerable amount of resources - both from a processing and from a memory standpoint. In fact, when an extreme degree of accuracy is needed - for example, how a ship's design will handle at sea - high-performance computing (HPC) centres are used and the calculations often take several months to complete.

Computer games, by their very nature, are all about interacting in real-time with a virtual world. As CPUs and graphics cards have become more powerful, expectations of how fast real-time is and how good the worlds look have increased. Depending on the game, the accepted frame rate varies between 30 and 60 FPS. Drops below 30 become instantly obvious, as the world seems to experience "hiccups" and the action slows down. In fast-paced, twitch-based experiences, such as first-person shooters or real-time strategies, maintaining a frame rate of 60 is often a requirement for smooth gameplay.

The challenge of integrating realistic fluid simulation in a virtual world, while adhering to these requirements, is the main motivation behind this project. Let's start by defining what a fluid is. A fluid is any substance that flows - meaning it can take the shape of its container. This includes liquids, such as water, and gases, such as air. Smoke can also be described as a fluid, although it is more accurate to say that it is composed of tiny particulates suspended in a gas (Gourlay, 2012). Fire is the chemical process of combustion, leading to the release of heat and light. As it decays, smoke forms as a by-product.

In graphics a fluid can be modelled as a grid system of cells, each containing the properties of the fluid at that location. The most important of these are velocity - the speed and direction of the flow at a cell location, and density - the amount of material that position contains. Every update step, the equation of motion is applied to each cell and the quantities of the properties it contains change. It follows that, depending on the grid size and quality of the simulation, this traversal and update can be quite an expensive operation.

This is where the GPU advantage comes in - splitting up a task into a lot of smaller parallel-running jobs is exactly what the hardware excels at. In fact, early 2D GPU fluid dynamics experiments saw a performance increase of up to six times compared to a CPU implementation (Harris, 2004). As GPGPU (General-purpose computing on graphics processing units) has advanced with technologies such as CUDA and DirectCompute (MSDN, 2010), speeds of up to ten times faster are becoming a reality (NVidia, 2013).

Figure 1.1: Computational performance of Navier-Stokes equations on new NVidia GPUs


Simply moving all calculations to the graphics card does not solve the problem fully. What needs to be considered is that in a proper real-time interactive application, the GPU will be engaged with many other activities - such as rendering polygons, doing lighting calculations and others - meaning that fluid simulation and rendering cannot be the only task occupying resources.

1.1 Project Aim

The objective of this project is to investigate the simulation of physically-based fluids by taking advantage of modern GPU hardware with the aim of answering the following question:

How can the parallel processing advantage of modern graphics cards be used for simulating physically-based fluids, and how can this approach be adapted for real-time use?

The main consideration during this investigation will be what simplifications can be made when simulating - both when setting up and during runtime - in order to obtain a result which is both graphically impressive and computationally efficient. The project hopes to achieve the following:

• Derive an effective way of utilising the GPU for solving the equations of fluid motion in 3D.

• Discover what level of detail methods and performance optimisations can be applied in order to use fewer system resources.

• Draw conclusions and recommendations for further research into this area.

Over the course of this undertaking, an experimental framework will be developed to showcase the research discoveries. It will also be used to gather quantitative data in order to draw appropriate conclusions as to the effectiveness of the provided solutions.


1.2 Dissertation Structure

Chapter 2 gives additional background on the main topics of discussion: the mathematical formulas describing fluid motion and the current state of fluid representation in games. It ends with a review of how level of detail is used in games to increase performance and how existing techniques can be adapted to fluid simulation.

Chapter 3 presents past research in the area of fluid simulation - starting from work on early real-time solvers and moving on to research into using the GPU. It also discusses past work on ways of rendering fluids as well as research into integrating level of detail in fluid simulation.

Chapter 4 describes this project's implementation of a physically-based incompressible Navier-Stokes solver. It also discusses what optimisations are used for setting up and running a simulation. It ends with how fluid rendering is handled.

Chapter 5 analyses the results and data collected from the experimental framework and explores their implications.

Chapter 6 concludes this dissertation and makes recommendations for future work that could be undertaken.


Chapter 2

Background

2.1 Mathematics of Fluid Flow

For understanding of the mechanics behind fluid dynamics, knowledge of differential vector operations is expected. Gradient $\nabla f$, divergence $\nabla \cdot \vec{v}$, curl $\nabla \times \vec{v}$, directional derivative $\vec{v} \cdot \nabla f$ and the Laplacian $\nabla^2$ are all used in the Navier-Stokes Equations which are described below.

$$\frac{\partial \vec{u}}{\partial t} = -(\vec{u} \cdot \nabla)\vec{u} - \frac{1}{\rho}\nabla p + \nu \nabla^2 \vec{u} + \vec{F} \qquad (2.1)$$

$$\nabla \cdot \vec{u} = 0 \qquad (2.2)$$

Where $\vec{u}$ is the velocity of the fluid; $p$ is the pressure; $\rho$ is the density; $\nu$ controls the viscosity of the fluid and $\vec{F}$ encapsulates all external forces acting on it.

Equation (2.1) is known as the momentum equation of fluid flow. It is derived from Newton’s 2nd law of motion, which means it describes the acceleration of the fluid due to the forces acting on it. From left to right, the terms on the right-hand side represent advection, pressure, diffusion and external forces. When dealing with complex media it is common to make simplifying assumptions so as to model the problem more easily. Thus, the fluid is assumed to be incompressible and homogeneous. Equation (2.2), the continuity equation, enforces the incompressibility assumption by ensuring that the fluid always has zero divergence, meaning that the volume of the fluid remains constant in time.


The Navier-Stokes equations are commonly used because they precisely describe the evolution of a velocity field over time given its current state and other forces (Stam, 2003). The key task of a fluid solver is to compute a numerical approximation of $\vec{u}$. This velocity field later controls the visual phenomena of the fluid - smoke density or fire reaction values, for example.

2.1.1 Modelling the Simulation Space

Fluids are typically modelled in one of two ways - as a field or as a particle system. These are referred to as the Eulerian and Lagrangian viewpoints, respectively. The first considers the fluid as a region of points - each containing properties like velocity and density. These values change with time, but the points containing them stay fixed in space. The Lagrangian viewpoint takes the more conventional approach of modelling the continuum as a set of particles. Each particle, in addition to carrying with it the properties of the fluid, has a position component. The easiest way to visualise this is to think of the particles as molecules of fluid that move in time. Eulerian fluids are most commonly represented as an arrangement of voxels and Lagrangian ones as classic particle systems. A more in-depth description of these viewpoints can be found in (Bridson, 2008).
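To make the distinction concrete, the two viewpoints can be sketched as data layouts in C++. This is only an illustration under simplifying assumptions: the names Vec3, EulerianGrid, Particle and advanceParticles are hypothetical and do not come from the project framework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x = 0.0f, y = 0.0f, z = 0.0f; };

// Eulerian viewpoint: samples live at fixed grid points; only their values change.
struct EulerianGrid {
    std::size_t nx, ny, nz;
    std::vector<Vec3> velocity;   // one velocity per cell
    std::vector<float> density;   // one density per cell

    EulerianGrid(std::size_t x, std::size_t y, std::size_t z)
        : nx(x), ny(y), nz(z), velocity(x * y * z), density(x * y * z, 0.0f) {}

    // Map a 3D cell coordinate to its slot in the flat arrays (x varies fastest).
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        assert(x < nx && y < ny && z < nz);
        return x + nx * (y + ny * z);
    }
};

// Lagrangian viewpoint: every particle carries its properties and its own position.
struct Particle {
    Vec3 position;
    Vec3 velocity;
    float density = 0.0f;
};

// Move each particle along its velocity for one time step (explicit Euler).
inline void advanceParticles(std::vector<Particle>& particles, float dt) {
    for (Particle& p : particles) {
        p.position.x += p.velocity.x * dt;
        p.position.y += p.velocity.y * dt;
        p.position.z += p.velocity.z * dt;
    }
}
```

In the grid layout the cost of an update scales with the number of cells regardless of where the fluid is, whereas in the particle layout it scales with the number of particles.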

2.2 State of Fluids in Games

Fluid simulation in games covers a wide range of phenomena - the most important of which are water, smoke and fire. As this project mainly deals with the latter two, they will be the focus of discussion. For a more detailed look at the state of water in games, please refer to (Barrett, 2012).

3D games have to do an impressive amount of work to provide an immersive experience. During each update call, physics, pathfinding, lighting, rendering and other calculations have to be computed. It is no surprise that developers look to simplify effects whenever they can. Particle systems have for the longest time been used to model smoke. The particles are just 2D textured sprites, which always face the virtual camera. Fire is often rendered the same way with the addition of static animations.

With improvements in lighting and particle control, the look of the effects does visibly improve, but at its core the simulation is not based on physical properties and is instead determined by design tools.


An additional negative side to this representation is that it makes proper interaction with the fluids very difficult to achieve. In contrast, as is discussed later, in a suitably defined Navier-Stokes solver boundary conditions are part of the simulation process and can be used as a means of accurate interaction with the system.

2.3 LOD and Performance Overview

The need to render many different graphical effects on a system with limited resources has been a widely explored challenge. Various techniques are often employed to save processing time and system memory while still maintaining good graphical quality. It is worthwhile to study some of these methods with a view to how they might be adopted for fluid simulation.

A common performance boost when rendering 3D polygon mesh objects involves reducing the number of polygons they are made out of (Valve, 2012). There are a variety of different ways to accomplish this - either with pre-made low-poly versions of the mesh or via a procedural method at runtime, often using GPU shaders. A system is then set up to intelligently swap or blend between different versions based on various parameters, such as the ones mentioned above. The end result is less graphics bandwidth used and fewer computations made. If done properly, the player never notices it.

Level of detail has other uses than just object rendering. It also has a place in complex calculations such as rigid body dynamics. When simulating collisions between bodies, for example, if the player is not looking at the objects in question, a simplified, less realistic calculation can take place. Whatever the player is looking at needs to behave consistently, but objects outside this direct area are less important and approximations can be used.

A common occurrence in games is the need to render the same object many times. The cost of creating copies of the required resources, like vertex buffers and textures, can quickly add up. Instead, the same graphical data is used to render multiple copies of the object where required (Carucci, 2005). Since transferring triangle data from the CPU to the GPU and submitting state changes is a relatively slow operation, instancing is a method that frees up valuable CPU processing time. Batching as many draw calls together as possible is an often advised method of optimising game renderers.


Chapter 3

Literature Review

Investigations into the physics behind fluid simulation date back to the 18th and 19th centuries, when the mathematicians Euler, followed by Navier and Stokes, developed the basics of analytical solutions to fluid flows. With the start of the computer processing era came the possibility to calculate solutions to these equations numerically. Far from the idea of real-time applications, however, early research into the topic focused on engineering applications, striving for accuracy and not factoring in the time taken (Hess and Smith, 1967).

3.1 Early work on real-time solvers

In the late 90s and early 2000s the prospect of real-time simulations started to be actively discussed in research fields. Up to this point the majority of the investigation had been into offline graphical solvers. Also, the majority of numerical solvers by that point used explicit techniques, which suffer from instability unless a small simulation time step is provided. It was Jos Stam who, at the 1999 SIGGRAPH conference, proposed an implicit Navier-Stokes solver that was stable under larger time steps and was fast enough for results to be viewed instantly (Stam, 1999). The importance of this paper stems from the fact that the method it put forward was designed to be used in real-time. Not only that, but this approach allows for boundary conditions to be dynamic and, as such, opens the door to interactivity with the fluid. For game applications this is key. The resultant technique is very successful in simulating gaseous-type fluids and will influence this study.


Figure 3.1: Advection step moving smoke density along a velocity field. As shown in Stam’s solver from 2003 (Stam, 2003)

Stam’s initial proposal has downsides, though. Namely, it suffers from "numerical dissipation" (also known as numerical diffusion or smoothing). As described by Bai and Turk (2005), this is due to the averaging operations performed when interpolating values in the numerical solvers of the differential equations, and Stam’s method experiences it because of the lower-order accuracy of its advection routine. This not only tends to smooth out interesting features, like vortices in the fluid, but also makes the fluid appear too viscous.
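The smoothing effect can be reproduced with a very small experiment. The following C++ sketch is not Stam's code: it assumes a 1D field, a constant velocity measured in cells per unit time and clamped borders, and the function names are hypothetical. It performs one semi-Lagrangian step by tracing each cell backwards and linearly interpolating the old field at the departure point.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolate a 1D field at fractional position x, clamped to the grid.
inline float sampleLinear(const std::vector<float>& f, float x) {
    assert(!f.empty());
    const float maxX = static_cast<float>(f.size() - 1);
    x = std::fmin(std::fmax(x, 0.0f), maxX);
    const std::size_t i0 = static_cast<std::size_t>(x);
    const std::size_t i1 = (i0 + 1 < f.size()) ? i0 + 1 : i0;
    const float t = x - static_cast<float>(i0);
    return (1.0f - t) * f[i0] + t * f[i1];  // the averaging that causes smoothing
}

// One semi-Lagrangian step: trace each cell back along the constant velocity u
// and take the interpolated value of the old field at that departure point.
inline std::vector<float> advectSemiLagrangian(const std::vector<float>& q,
                                               float u, float dt) {
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        out[i] = sampleLinear(q, static_cast<float>(i) - u * dt);
    }
    return out;
}
```

Advecting the field {0, 0, 1, 0, 0} by half a cell (u = 0.5, dt = 1) yields {0, 0, 0.5, 0.5, 0}: the total amount is preserved, but the peak has halved and spread over two cells, which is the dissipation described above.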

With this problem in mind, Fedkiw et al. (2001) presented a seminal paper in the 2001 SIGGRAPH proceedings. In "Visual Simulation of Smoke" the incompressible Euler equations are used as the fluid solver on a staggered grid arrangement. They are combined with a new method called "vorticity confinement" (Steinhoff and Underhill, 1994), which injects back the energy lost due to numerical dissipation, effectively balancing out the simulation. The result is that, even on a fairly coarse grid, the aforementioned interesting features, such as swirling vortices in the smoke field, are preserved and the overall lifespan of the smoke is improved. Like Stam’s proposed method, this one is stable for large time steps and allows for dynamic boundaries. The downside of this procedure is that it introduces an extra computational step in the algorithm. The step itself is not overly expensive and greatly enhances the simulation’s look, so it will be considered for this research.
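As a rough illustration of the extra step, the sketch below computes a confinement force of the commonly quoted form f = ε h (N × ω), where ω is the vorticity and N is the normalised gradient of its magnitude. It is a simplified 2D version under stated assumptions: unit cell size (h = 1), a collocated rather than staggered grid, interior cells only, and hypothetical names (Velocity2D, vorticity, confinementForce) that are not taken from the paper or from this project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D velocity field on a uniform grid with cell size 1.
struct Velocity2D {
    int nx, ny;
    std::vector<float> u, v;  // x and y velocity components per cell

    Velocity2D(int w, int h)
        : nx(w), ny(h),
          u(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0.0f),
          v(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0.0f) {}

    std::size_t idx(int x, int y) const {
        assert(x >= 0 && x < nx && y >= 0 && y < ny);
        return static_cast<std::size_t>(x) +
               static_cast<std::size_t>(nx) * static_cast<std::size_t>(y);
    }
};

// Scalar vorticity (z component of the curl) by central differences.
// Valid for cells at least one cell away from the border.
inline float vorticity(const Velocity2D& f, int x, int y) {
    const float dvdx = 0.5f * (f.v[f.idx(x + 1, y)] - f.v[f.idx(x - 1, y)]);
    const float dudy = 0.5f * (f.u[f.idx(x, y + 1)] - f.u[f.idx(x, y - 1)]);
    return dvdx - dudy;
}

struct Force2D { float fx, fy; };

// Confinement force eps * (N x w) for a cell at least two cells from the border.
inline Force2D confinementForce(const Velocity2D& f, int x, int y, float eps) {
    // Gradient of the vorticity magnitude.
    const float gx = 0.5f * (std::fabs(vorticity(f, x + 1, y)) -
                             std::fabs(vorticity(f, x - 1, y)));
    const float gy = 0.5f * (std::fabs(vorticity(f, x, y + 1)) -
                             std::fabs(vorticity(f, x, y - 1)));
    const float len = std::sqrt(gx * gx + gy * gy);
    if (len < 1e-6f) return {0.0f, 0.0f};  // no gradient, nothing to amplify
    const float w = vorticity(f, x, y);
    // In 2D, N x (0, 0, w) = (Ny * w, -Nx * w, 0).
    return {eps * (gy / len) * w, -eps * (gx / len) * w};
}
```

The force is zero wherever the vorticity magnitude is uniform, so only regions where rotation is concentrated get a push, and eps plays the role of the strength factor that controls how much detail is re-injected.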


3.2 The GPU advantage

At the SIGGRAPH 2003 conference the power of the GPU was the topic of discussion. Krüger and Westermann demonstrated that the parallelism of graphics processors can be used as a matrix solver and to handle finite difference equations for PDE approximations (Krüger and Westermann, 2003). On an ATI 9800 card an interactive visualized 2D Navier-Stokes solution ran at 9 FPS (frames per second) on a 1024x1024 grid. In contrast, the CPU solvers provided by (Fedkiw et al., 2001) need more than a second per frame on a similarly sized domain. The advantage of the GPU became obvious and fluid dynamics research reflected that.

In 2004, as part of the GPU Gems book (Fernando et al., 2004), Harris wrote a chapter entitled "Fast Fluid Dynamics on the GPU" (Harris, 2004). He described a method, based on Stam’s "Stable Fluids" technique, that offloaded all equation of motion calculations to the graphics card and produced a very fast interactive 2D Navier-Stokes solver. He successfully demonstrated how the grid data can be translated into textures and how pixel shaders, which run simultaneously on each pixel every render call, can be used to calculate the simulation. To solve the Poisson-pressure equation he used a Jacobi iteration scheme which, compared to the Krüger and Westermann conjugate gradient and multigrid solvers, converges more slowly, but is simple to implement and makes easy use of parallel calculations. Harris details how his approach can be easily extended to allow for arbitrary boundaries (indeed, it is just the addition of a texture that contains them to each shader). This GPU Gems chapter provides a straightforward introduction to using shaders as a fluid solver and will influence the early investigation of this study. Harris also describes a means to extend the domain into 3D by "layering" 2D textures, but as current technology allows for easy use of 3D textures, it will not be considered.
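The appeal of Jacobi iteration for parallel hardware is easier to see in code. The following CPU sketch in C++ is an assumption-laden illustration rather than Harris' shader: it uses a 2D grid with unit spacing, keeps the pressure at zero on the border cells, and the function name jacobiPressure is hypothetical. Each new value depends only on values from the previous iteration, which is what allows every cell to be updated independently.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Run a fixed number of Jacobi iterations for laplacian(p) = div on an
// nx-by-ny grid with unit spacing. Border cells keep p = 0 (an assumption).
inline std::vector<float> jacobiPressure(const std::vector<float>& div,
                                         int nx, int ny, int iterations) {
    assert(nx >= 3 && ny >= 3);
    assert(div.size() == static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny));
    std::vector<float> p(div.size(), 0.0f);
    std::vector<float> next(div.size(), 0.0f);
    auto at = [nx](int x, int y) {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(nx) +
               static_cast<std::size_t>(x);
    };
    for (int it = 0; it < iterations; ++it) {
        for (int y = 1; y < ny - 1; ++y) {
            for (int x = 1; x < nx - 1; ++x) {
                // Average of the four neighbours from the previous iteration,
                // corrected by the local divergence.
                next[at(x, y)] = 0.25f * (p[at(x - 1, y)] + p[at(x + 1, y)] +
                                          p[at(x, y - 1)] + p[at(x, y + 1)] -
                                          div[at(x, y)]);
            }
        }
        std::swap(p, next);  // the freshly written buffer becomes the input
    }
    return p;
}
```

Because no cell reads a value written in the same iteration, the two inner loops can be replaced by one shader invocation per cell, at the price of needing many iterations to converge.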

As graphics hardware saw staggering growth, in 2007 Harris’ work was built upon in GPU Gems 3 (Nguyen et al., 2007). In the chapter "Real-Time Simulation and Rendering of 3D Fluids" (Crane et al., 2007) the authors extend GPU simulation of real-time fluid dynamics into the 3D domain. Their example program successfully simulated either fire, smoke or water in a 70x70x100 grid. Additionally, using the powerful Direct3D 10 support for 3D textures and the brand new geometry shader functionality, their method allowed for any 3D object to be voxelised and used as a dynamic boundary for the simulation. Results of this can be seen in the game "Hellgate: London" (Studios, 2007), which utilises this procedure.


Figure 3.2: Smoke being pushed and moving around by a gargoyle in "Hellgate: London"

In the 2004 GPU Gems chapter, Harris uses a semi-Lagrangian backward advection step that is based upon the one used by Stam (Stam, 1999) and as such suffers from numerical smoothing. Crane et al. address this issue by utilising a MacCormack scheme, which is a higher-order accuracy advection solver, in addition to vorticity confinement. While this introduces two intermediate semi-Lagrangian steps in the advection process and is not unconditionally stable, it allows for better visual fidelity of the final result without increasing the grid resolution. This saves memory bandwidth at the expense of more computation, but in the chapter’s words "math is cheap compared to bandwidth". As this project will be looking into 3D smoke and fire simulation by utilising graphics hardware, this work will be used for reference.
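A minimal sketch of the idea, in the same spirit as the chapter but not taken from it, is given below in C++. It assumes a 1D field and a constant velocity in cells per unit time, and all function names are hypothetical. A semi-Lagrangian step predicts the new field, a second step with reversed velocity estimates the error, half of that error is added back, and the result is clamped to the values that contributed to the prediction so that no new extrema appear.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear interpolation of a 1D field at fractional position x, clamped to the grid.
inline float sampleLinear(const std::vector<float>& f, float x) {
    const float maxX = static_cast<float>(f.size() - 1);
    x = std::fmin(std::fmax(x, 0.0f), maxX);
    const std::size_t i0 = static_cast<std::size_t>(x);
    const std::size_t i1 = (i0 + 1 < f.size()) ? i0 + 1 : i0;
    const float t = x - static_cast<float>(i0);
    return (1.0f - t) * f[i0] + t * f[i1];
}

// Plain semi-Lagrangian step with constant velocity u.
inline std::vector<float> advectSemiLagrangian(const std::vector<float>& q,
                                               float u, float dt) {
    std::vector<float> out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i)
        out[i] = sampleLinear(q, static_cast<float>(i) - u * dt);
    return out;
}

// MacCormack step built from two semi-Lagrangian steps plus a clamp.
inline std::vector<float> advectMacCormack(const std::vector<float>& q,
                                           float u, float dt) {
    assert(!q.empty());
    const std::vector<float> fwd = advectSemiLagrangian(q, u, dt);      // predictor
    const std::vector<float> back = advectSemiLagrangian(fwd, -u, dt);  // reversed step
    std::vector<float> out(q.size());
    const float maxX = static_cast<float>(q.size() - 1);
    for (std::size_t i = 0; i < q.size(); ++i) {
        const float corrected = fwd[i] + 0.5f * (q[i] - back[i]);
        // Clamp to the two old samples that the predictor interpolated between.
        const float x = std::fmin(std::fmax(static_cast<float>(i) - u * dt, 0.0f), maxX);
        const std::size_t i0 = static_cast<std::size_t>(x);
        const std::size_t i1 = (i0 + 1 < q.size()) ? i0 + 1 : i0;
        const float lo = std::fmin(q[i0], q[i1]);
        const float hi = std::fmax(q[i0], q[i1]);
        out[i] = std::fmin(std::fmax(corrected, lo), hi);
    }
    return out;
}
```

For the field {0, 0, 1, 0, 0} moved by half a cell, the plain step gives a peak of 0.5 while this version gives {0, 0, 0.75, 0.375, 0}: the feature stays sharper on the same grid, at the cost of the extra advection pass and the clamp.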

3.3 3D Fluid Rendering

Graphics cards are optimised for rendering polygons and especially triangles. This must be taken into account when it comes to displaying volumetric data, such as smoke, as there is no native way of rendering volumes.

3.3.1 Particle Systems

In almost all early real-time fluid solvers (Stam, 2003) and many modern ones (Gourlay, 2012; McGuire, 2006), the approach is to use particles. This has the initial advantage of using an already established system that is common in games and other graphical applications. Lagrangian or semi-Lagrangian schemes are used to represent the domain. In the example from Gourlay, there are two types of particles. The first are called vortex particles (or vortons). They are used to represent the flow field and are free to move anywhere. The second type are just regular particles, used for visualisation of the effect. They change their colour and opacity state, depending on the vortons around them.


Particle systems are a natural fit for a Lagrangian scheme and have the advantage that the simulation space can be global, instead of being constrained to a fixed grid size. The disadvantage when using particles to visualise the fluid is the CPU and GPU memory and processing overhead of storing and updating all of them. The finer the detail level required, the more particles need to be used. Another downside is that an unconstrained, dynamic simulation space is difficult to implement when using the GPU.

3.3.2 Volume Ray Casting

The other main rendering technique, which has gained more traction in GPU fluid solvers, is called ray-marching (or volume ray casting). This is the approach used by (Crane et al., 2007) and (Zhou et al., 2007). This method works by considering the fluid as a box, made up of many voxels, which contain the fluid properties. When rendering, rays are traced from the point of view to the volume. The rays are then "marched" through the domain with a predefined sample rate, accumulating colour based on what the volume contains - smoke density, for example. This ends either when enough density to get a fully saturated colour has been collected, or when the ray exits the volume. Usually, to get a decent visual result, a step size equal to half a voxel is used when marching. Results of using volume ray casting can be seen in Figure 3.2.
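The accumulation loop for a single ray can be sketched as follows. This C++ fragment is a simplified illustration, not the shader used by the cited works: it assumes the density samples along the ray have already been fetched, tracks opacity only rather than full colour, and the names MarchResult, marchRay, absorption and saturation are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct MarchResult {
    float opacity;        // accumulated opacity in [0, 1]
    std::size_t samples;  // number of samples actually taken
};

// March front to back through density samples taken along one ray.
// 'absorption' scales how much opacity one unit of density contributes.
inline MarchResult marchRay(const std::vector<float>& densityAlongRay,
                            float absorption, float saturation = 0.99f) {
    assert(absorption >= 0.0f);
    MarchResult result{0.0f, 0};
    for (float density : densityAlongRay) {
        const float alpha = std::fmin(std::fmax(density * absorption, 0.0f), 1.0f);
        // Front-to-back compositing: later samples only fill what is still transparent.
        result.opacity += (1.0f - result.opacity) * alpha;
        ++result.samples;
        if (result.opacity >= saturation) break;  // saturated, stop early
    }
    return result;  // loop end without break corresponds to the ray leaving the volume
}
```

With samples {0.5, 0.5, 0.5} and an absorption of 1 the opacity grows as 0.5, 0.75, 0.875 and all three samples are used, whereas a first sample of 1.0 saturates the ray immediately and the remaining samples are skipped.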

There are certain problems present in ray-marching. As discussed by (Crane et al., 2007), banding is a visible artifact that appears if the sample step is too big or the grid resolution is too small. It is most prevalent when looking at the fluid up close. There are certain ways to compensate for this - by using a smaller sample step, or taking an extra sample at each step. These both come at an additional computational cost, though.

Ray-marching fits very well with the Eulerian and semi-Lagrangian schemes as it considers the simulation domain as a fixed grid with changing properties. It is also a technique that is inherently parallel and can naturally be implemented using GPU pixel shaders. As this work will focus entirely on leveraging the graphics card, volume ray casting will be the rendering method considered.


Chapter 4

Methodology

4.1 Introduction and Goals

Upon completing the research into this topic and revising the major project aims, implementation goals are set. To recap - the main objective of this project is to provide an effective way of performing fluid simulation on the GPU and adapt it to be used at runtime. With this in mind, the framework created is tasked to fulfil the following:

• Support simulation of both fire and smoke.

• Compute and render at least 2 different fluids at the same time.

• Maintain a frame rate of at least 30 FPS on low-to-mid tier graphics cards and 60 FPS on high-end ones.

• Showcase improved application performance by using one or more LOD techniques.

4.1.1 Framework Architecture

The framework developed for this project targets the Windows 7+ operating systems. It uses Direct3D for rendering and is implemented in C++. For graphics card operations it makes use of the powerful DirectCompute API along with HLSL (High-Level Shading Language) for writing compute and pixel shaders.


4.1.2 Methodology Structure

The methodology starts by describing how the fluid domain is represented. Then, it covers the setup of a simulation and the optimisations that it makes use of. What follows is a description of the process of running a fluid simulation and the techniques that are used to control it at runtime. The methodology concludes with how fluids are rendered. Please note that the terms fluid simulation and fluid object will be used interchangeably.

4.2 Fluid Domain Representation

In order to solve the fluid equations of motion numerically, the domain must be discretized into computational points that the solver works with. As discussed in the background chapter, an Eulerian representation models a domain as a grid of points that contain the properties of the fluid. This is the approach this framework utilises. The main reason is that 3D Eulerian grids can be logically mapped to voxels in 3D textures, which contain the data the GPU requires. The following is part of a function which creates a 3D texture in which each texel can hold four 16-bit floating point numbers:

D3D11_TEXTURE3D_DESC textureDesc;

textureDesc.Width = SizeX;

textureDesc.Height = SizeY;

textureDesc.Depth = SizeZ;

textureDesc.Format = DXGI_FORMAT_R16G16B16A16_FLOAT;

Where SizeX, SizeY and SizeZ vary depending on the size of the required domain bounds. The format can also be changed. Each fluid uses a number of equally sized textures to represent its state and properties. These are then stored on and used by the graphics card. The equations of motion are calculated by running computational kernels (implemented by shader programs) over the textures.


4.3 Setting up a Simulation

The first necessary requirement when creating a new fluid object is to determine its grid dimensions. Textures of this size are then created for the various fluid properties. Each texture is then used to create a ShaderParams structure, outlined below:

struct ShaderParams {

CComPtr<ID3D11ShaderResourceView> mSRV;

CComPtr<ID3D11UnorderedAccessView> mUAV;

};

Generally, a ShaderResourceView (SRV) is used as an input to a shader program, as it can only be read from, and an UnorderedAccessView (UAV) is used as an output, as it can be written to. A more detailed description of Direct3D resource interfaces can be found in (MSDN, 2014). For all fluid properties, except divergence, vorticity and obstacles, 2 textures and ShaderParams structures are created. This is due to the need to keep track of the fluid state at the previous time step in order to evaluate the new one.
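The need for two textures per property can be illustrated with a CPU-side sketch of this kind of double ("ping-pong") buffering. It is not the framework's code: the PingPong type and the StepAddOne function are hypothetical, and plain vectors stand in for the textures and their SRV/UAV pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for a double-buffered fluid property.
// In the framework each buffer would be a 3D texture exposed through an
// SRV (read) and a UAV (write); here plain vectors play both roles.
struct PingPong {
    std::vector<float> buffers[2];
    int readIndex = 0;

    explicit PingPong(std::size_t cellCount) {
        assert(cellCount > 0);
        buffers[0].assign(cellCount, 0.0f);
        buffers[1].assign(cellCount, 0.0f);
    }

    // State at the previous time step (read only).
    const std::vector<float>& Read() const { return buffers[readIndex]; }
    // Storage for the state being computed (write only).
    std::vector<float>& Write() { return buffers[1 - readIndex]; }
    // After a step, the freshly written buffer becomes the one that is read.
    void Swap() { readIndex = 1 - readIndex; }
};

// Illustrative step: every new cell value is the previous value plus one.
inline void StepAddOne(PingPong& field) {
    const std::vector<float>& previous = field.Read();
    std::vector<float>& next = field.Write();
    for (std::size_t i = 0; i < previous.size(); ++i) {
        next[i] = previous[i] + 1.0f;
    }
    field.Swap();
}
```

Because the step never reads from the buffer it writes to, each cell can be evaluated independently, which is the property the GPU kernels rely on.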

The type of the fluid is also decided during this step. The framework allows for two kinds - fire and smoke. While simulating both is nearly identical, fire simulation requires 2 extra textures to keep track of the fire reaction values (this determines the intensity of the fire at each cell and is used when rendering it).

At the end of the setup, static boundary conditions are initialised. As fluids are modelled as being in a box domain, boundary conditions are modelled as a single-cell wide obstacle along each wall of the box and are stored in an 8-bit texture.

4.3.1 Choosing a Grid Size

Grid size has the biggest effect on how fast a simulation step is processed. A 32x64x32 domain, for example, will be evaluated more than 3 times faster than a 64x128x64 one. This is both because there are fewer cells to process and because smaller textures need less memory bandwidth.

The render size of the fluid is independent of its simulation domain. This means that both a high and a low-resolution grid can be rendered with the same size. Upscaling the render size of a coarse grid can lead to visible artifacts, as there will be fewer cells to sample for a good appearance. Similarly, rendering a fine grid at a smaller scale is potentially wasteful, as the increased detail is harder to spot.

Figure 4.1: Different grids, same render scale. Domain size from left to right: 16x32x16, 32x64x32, 64x128x64

When setting up a simulation, it is best to first determine the render scale required and use that to choose the appropriate grid resolution in order to achieve the detail needed. It is worth noting that dynamically resizing textures at runtime is not feasible, so grid sizes stay fixed throughout execution.

4.3.2 Setup Optimisations

Memory is a key factor during setup, since, depending on the fluid, a simulation can use up to 15 textures (due to double buffering) to keep all the required data for its state. These all have to be stored in memory and bound to and unbound from the graphics pipeline every frame. Using many fluid objects risks bottlenecking the GPU and starving it of memory.

Texture Formats

Direct3D offers an expansive range of texture formats that can be used on the GPU. Choosing the correct format, based on the number of components needed and their size, helps reduce the video memory used and keeps texture bandwidth low, increasing performance. This is the first place for potential optimisations. When constructing fluid object textures, the format chosen is the smallest one that can contain the data. The DXGI_FORMAT_R16G16B16A16_FLOAT format is used for textures that hold fluid velocity and vorticity, since it is the smallest 16-bit floating point format that provides at least 3 components per cell. Density and pressure, on the other hand, only need 1 component per grid cell, which means they can be created with the DXGI_FORMAT_R16_FLOAT format. This uses a quarter of the memory. It is worth mentioning that using 16 bits per float is in itself an optimisation over using 32-bit floats and, as shown in other research (Crane et al., 2007), the visual degradation due to the reduced precision is hardly discernible. Below is a table of all formats used and the properties they are used for.

Table 4.1: Texture formats and their uses

Texture Format                    Fluid Property
DXGI_FORMAT_R16G16B16A16_FLOAT    Velocity, Vorticity
DXGI_FORMAT_R16_FLOAT             Density, Temperature, Reaction, Divergence, Pressure
DXGI_FORMAT_R8_SINT               Obstacles
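The effect of the format choice on memory can be estimated with a little arithmetic. The sketch below assumes raw texel storage only (no driver padding or alignment), so its numbers are lower bounds rather than the measured figures reported later; the TexelFormat enum and TextureBytes function are illustrative names, not part of the framework.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per texel for the three formats in Table 4.1.
enum class TexelFormat : std::uint32_t {
    RGBA16Float = 8,  // DXGI_FORMAT_R16G16B16A16_FLOAT: 4 components x 16 bits
    R16Float    = 2,  // DXGI_FORMAT_R16_FLOAT: 1 component x 16 bits
    R8SInt      = 1   // DXGI_FORMAT_R8_SINT: 1 component x 8 bits
};

// Raw storage of one 3D texture in bytes (assumption: no padding or alignment).
inline std::uint64_t TextureBytes(std::uint32_t sizeX, std::uint32_t sizeY,
                                  std::uint32_t sizeZ, TexelFormat format) {
    assert(sizeX > 0 && sizeY > 0 && sizeZ > 0);
    return static_cast<std::uint64_t>(sizeX) * sizeY * sizeZ *
           static_cast<std::uint32_t>(format);
}
```

For a 64x128x64 grid this gives 4 MiB for a four-component texture and 1 MiB for a single-component one, which is the factor of four mentioned above.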

Texture Sharing

There are several resources that need to be created for each fluid object that are unique to it. These are the velocity, density, temperature, vorticity, obstacle and reaction (for fire simulation) textures. They are unique, as they must be maintained throughout program execution. On the other hand, the textures for velocity divergence and fluid pressure are used only temporarily by each solver.

To take advantage of this, when a fluid simulation is constructed it first checks if an instance of these common resources has been created for its grid size. If so, it uses them; if not, it constructs them and makes them available for further sharing. With this in mind, when using many fluid objects it is advantageous to build them with the same size.
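One possible shape for such a lookup is a cache keyed by grid size. The sketch below is an assumption about how this could be organised, not the framework's implementation: SharedScratch and ScratchCache are hypothetical names and vectors stand in for the divergence and pressure textures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the temporary divergence and pressure textures.
struct SharedScratch {
    std::vector<float> divergence;
    std::vector<float> pressure;
};

class ScratchCache {
public:
    // Returns the scratch resources for a grid size, creating them on first use.
    std::shared_ptr<SharedScratch> Acquire(int sizeX, int sizeY, int sizeZ) {
        assert(sizeX > 0 && sizeY > 0 && sizeZ > 0);
        const auto key = std::make_tuple(sizeX, sizeY, sizeZ);
        const auto found = mCache.find(key);
        if (found != mCache.end()) {
            // Reuse the resources if some simulation still holds them.
            if (std::shared_ptr<SharedScratch> existing = found->second.lock()) {
                return existing;
            }
        }
        auto created = std::make_shared<SharedScratch>();
        const std::size_t cells = static_cast<std::size_t>(sizeX) * sizeY * sizeZ;
        created->divergence.assign(cells, 0.0f);
        created->pressure.assign(cells, 0.0f);
        mCache[key] = created;  // stored weakly, so unused sizes are released
        return created;
    }

private:
    std::map<std::tuple<int, int, int>, std::weak_ptr<SharedScratch>> mCache;
};
```

Sharing in this way is only safe because the simulations are processed one after another and the contents of these textures do not need to survive between solver runs.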


4.4 Running a Simulation

Once the 3D scene has been initialised, the main application loop begins. In it, each fluid simulation is updated based on the numerical equation of motion solver. For this implementation, fluids are modelled as both incompressible and inviscid. Thus, the equations of motion become:

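\[ \frac{\partial \vec{u}}{\partial t} = -(\vec{u} \cdot \nabla)\vec{u} - \frac{1}{\rho}\nabla p + \vec{F} \tag{4.1} \]

\[ \nabla \cdot \vec{u} = 0 \tag{4.2} \]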

The calculation process involves solving each part of these equations in order, using the result from each as the input to the next. The function that solves this every update is outlined below:

void Fluid3DSolver::Process(ID3D11DeviceContext *context) {

// Set the obstacle texture - it is constant throughout the execution step

context->CSSetShaderResources(4, 1,

&(mFluidResources.obstacleSP.mSRV.p));

// Set all the constant buffers to the context

SetShaderBuffers(context);

//Advect temperature, density and reaction against velocity

AdvectProperties();

// Advect velocity against itself

AdvectVelocity();

// Determine how the temperature of the fluid changes the velocity

ComputeBuoyancy();

// Add a constant amount of density and temperature back into the system

RefreshConstantImpulse();

// If there are any extra forces - add them here

ApplyExtraForces();

// Inject vorticity back into the system

ComputeVorticityConfinement();

// Subtract the pressure gradient from the velocity field. This computes divergence free velocity.

ComputeProjection();

}


Firstly, the function binds the obstacle texture to the graphics pipeline. This is because nearly all compute shader programs query this texture and there is no need to constantly rebind it. Afterwards, the SetShaderBuffers function copies the required fluid computational parameters - which control aspects of calculation - from standard C++ structs into GPU constant buffers.

4.4.1 Simulation Steps

The rest of the functions solve the equations of motion. They all have underlying similarities: binding required SRVs as inputs and UAVs as outputs to their respective shader programs. Each is calculated using a numerical method that estimates its value. Below are all the steps outlined in order of execution.

Advection

Advection is what happens when the velocity field of the fluid transports other quantities, including itself, along the flow. This is described by the term $(\vec{u} \cdot \nabla)\vec{u}$. There are two methods that the framework uses to calculate advection.

The first, simpler one, is the trace-back implicit routine (Stam, 1999). It uses a semi-Lagrangian scheme to calculate the new quantity of a fluid property at a position by tracing back the trajectory to its former cell and copying the quantity. The advantage of this advection technique is that it is unconditionally stable for any time steps and velocities.

\[ p(\vec{x}, t + \Delta t) = \xi \times p(\vec{x} - \vec{u}(\vec{x}, t)\,\Delta t,\ t) - \mu \tag{4.3} \]

Here $p(\vec{x}, t + \Delta t)$ is the quantity at the new time step. ξ is a user-defined dissipation term. It is in the range ξ ∈ [0,1] and artificially controls how fast the quantity being advected dissipates: a value of 1 means no dissipation and lower values make the quantity disappear faster. µ is the decay constant. It is used only for fire simulation and controls how fast the fire reaction dies out. When it is used, the end result is clamped so that it does not go below 0.
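To make the trace-back step concrete, the following is a minimal CPU sketch of equation 4.3 on a 1D grid. It is a simplification of the 3D compute shader: positions are measured in cells, out-of-range samples are clamped to the grid instead of being handled through the obstacle texture, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear interpolation of a 1D field at a fractional cell position.
// Positions outside the grid are clamped (a simplification of this sketch).
inline float SampleLinear(const std::vector<float>& field, float pos) {
    assert(!field.empty());
    const float maxPos = static_cast<float>(field.size() - 1);
    pos = std::min(std::max(pos, 0.0f), maxPos);
    const std::size_t i0 = static_cast<std::size_t>(std::floor(pos));
    const std::size_t i1 = std::min(i0 + 1, field.size() - 1);
    const float t = pos - static_cast<float>(i0);
    return (1.0f - t) * field[i0] + t * field[i1];
}

// One semi-Lagrangian step following equation 4.3: trace back along the
// velocity, sample the old field, scale by dissipation and subtract decay.
inline std::vector<float> AdvectSemiLagrangian(const std::vector<float>& quantity,
                                               const std::vector<float>& velocity,
                                               float dt, float dissipation,
                                               float decay) {
    assert(quantity.size() == velocity.size());
    std::vector<float> result(quantity.size());
    for (std::size_t i = 0; i < quantity.size(); ++i) {
        const float source = static_cast<float>(i) - velocity[i] * dt;
        const float value = dissipation * SampleLinear(quantity, source) - decay;
        // Only clamp at zero when decay is in use, as described in the text.
        result[i] = (decay > 0.0f) ? std::max(value, 0.0f) : value;
    }
    return result;
}
```

With a uniform velocity of one cell per time step, a spike in the field simply moves one cell downstream, which is an easy way to sanity-check the routine.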

The second advection routine used is the one proposed in (Crane et al., 2007) - the MacCormack scheme. It works by first performing two semi-Lagrangian steps, one by tracing forward and one by tracing back. Using those values, it performs a higher-order accuracy calculation, which leads to less numerical diffusion than the previous routine.

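\[ \begin{aligned} \hat{\varphi}^{n+1} &= A(\varphi^{n}) \\ \hat{\varphi}^{n} &= A^{R}(\hat{\varphi}^{n+1}) \\ \varphi^{n+1} &= \xi \times \left( \hat{\varphi}^{n+1} + \tfrac{1}{2}\left(\varphi^{n} - \hat{\varphi}^{n}\right) \right) - \mu \end{aligned} \tag{4.4} \]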

Here, $\varphi^{n}$ is the property being advected, $\hat{\varphi}^{n+1}$ and $\hat{\varphi}^{n}$ are the two intermediate results, and $\varphi^{n+1}$ is the final property at the new time step. $A$ performs the advection routine 4.3 on the passed quantity and $A^{R}$ indicates that it is performed in reverse (meaning with a negative time step value). Again, ξ is the dissipation factor and µ is the decay constant. When doing MacCormack advection, no artificial dissipation or decay is applied in the first two steps. Since this advection routine is not unconditionally stable, the final result is clamped within the minimum and maximum values of the surrounding grid cells.
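The three stages of equation 4.4 can be sketched on the CPU for a 1D grid as follows. This is an illustration under stated assumptions rather than the project's shader: the clamp uses the two cells that bracket the traced-back position, dissipation and decay are applied after the clamp, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Field = std::vector<float>;

// Clamp a fractional cell position to the valid range of the grid.
inline float ClampToGrid(const Field& f, float pos) {
    return std::min(std::max(pos, 0.0f), static_cast<float>(f.size() - 1));
}

// Linearly interpolated sample at a fractional cell position.
inline float Sample(const Field& f, float pos) {
    pos = ClampToGrid(f, pos);
    const std::size_t i0 = static_cast<std::size_t>(std::floor(pos));
    const std::size_t i1 = std::min(i0 + 1, f.size() - 1);
    const float t = pos - static_cast<float>(i0);
    return (1.0f - t) * f[i0] + t * f[i1];
}

// Plain semi-Lagrangian pass: equation 4.3 without dissipation or decay.
inline Field Advect(const Field& q, const Field& u, float dt) {
    Field out(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        out[i] = Sample(q, static_cast<float>(i) - u[i] * dt);
    }
    return out;
}

// MacCormack step following equation 4.4.
inline Field AdvectMacCormack(const Field& phi, const Field& u, float dt,
                              float dissipation, float decay) {
    assert(!phi.empty() && phi.size() == u.size());
    const Field phiHatNext = Advect(phi, u, dt);          // A(phi^n)
    const Field phiHatPrev = Advect(phiHatNext, u, -dt);  // A^R(phiHat^{n+1})
    Field out(phi.size());
    for (std::size_t i = 0; i < phi.size(); ++i) {
        float value = phiHatNext[i] + 0.5f * (phi[i] - phiHatPrev[i]);
        // Limit the corrected value to the range of the cells around the
        // traced-back position (assumed neighbourhood for this 1D sketch).
        const float source = ClampToGrid(phi, static_cast<float>(i) - u[i] * dt);
        const std::size_t i0 = static_cast<std::size_t>(std::floor(source));
        const std::size_t i1 = std::min(i0 + 1, phi.size() - 1);
        const float lo = std::min(phi[i0], phi[i1]);
        const float hi = std::max(phi[i0], phi[i1]);
        value = std::min(std::max(value, lo), hi);
        out[i] = dissipation * value - decay;
    }
    return out;
}
```

The clamp is what keeps the scheme from overshooting near sharp edges, at the price of falling back to first-order behaviour there.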

While the MacCormack scheme gives improved detail, it requires two additional textures to hold the intermediate results and incurs the computational cost of calculating them. The memory cost can be offset by using texture sharing, mentioned previously, between simulations for these temporary values. The computational cost is dealt with by using MacCormack advection only for the density and reaction properties and the standard routine for the temperature and velocity fields.

Figure 4.2: Left: Using MacCormack for density and reaction only; Right: Using MacCormack for all fields


As can be seen, due to the chaotic nature of both fire and smoke, the extra detail gained by using the more expensive advection routine on all fields is hardly discernible. In fact, the only difference can be seen at the beginning of a simulation, as the MacCormack one advects slightly faster.

Buoyancy

Buoyancy is what causes hot air to rise and cool air to fall. In the simulation it is used to modify the velocity field at each grid cell based on the temperature and density values at that cell, the density weight and buoyancy factors, and the ambient temperature of the environment. It is one of the external forces $\vec{F}$ in equation 4.1.

\[ \vec{f}_{buoyancy} = \left( (T - T_{amb})\,\phi - (\rho \times \kappa) \right) \vec{v}_{up}\,\Delta t \tag{4.5} \]

Where T and ρ represent the temperature and density at the current grid location respectively. $T_{amb}$ is the ambient temperature of the fluid - if not used, it can be left at 0. The buoyancy factor of the density field is ϕ - it controls how buoyant the smoke is, meaning how quickly it rises with the hot air. κ is the smoke weight - a higher value will exert a stronger force on the velocity field and will make it die out faster. The result is multiplied by the global up vector $\vec{v}_{up}$ and the numerical integration time step Δt. The resultant force is then applied to the velocity value at that cell location.
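Equation 4.5 is evaluated independently per cell, so it translates almost directly into code. The sketch below is a CPU illustration with hypothetical names (Vec3, BuoyancyForce) and assumes the up vector is of unit length.

```cpp
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// Buoyancy force for a single cell, following equation 4.5.
// 'buoyancy' corresponds to the buoyancy factor and 'smokeWeight' to kappa.
// Assumption: 'up' is a unit vector.
inline Vec3 BuoyancyForce(float temperature, float ambientTemperature,
                          float density, float buoyancy, float smokeWeight,
                          Vec3 up, float dt) {
    assert(dt > 0.0f);
    const float magnitude =
        ((temperature - ambientTemperature) * buoyancy - density * smokeWeight) * dt;
    return Vec3{up.x * magnitude, up.y * magnitude, up.z * magnitude};
}
```

A hot cell with little smoke yields a positive magnitude and is pushed upwards, while a cool, dense cell yields a negative one and sinks.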

Constant Impulse and External Forces

All fluid objects have been designed to have new quantities added every simulation step. For smoke, a constant amount of temperature and density is injected from the bottom of the domain. The addition of the first helps the system maintain velocity and the second keeps a steady stream of smoke that is the final visible result.

With fire simulation the addition of temperature remains the same. Extra reaction is also injected into the system along with the temperature. This is analogous to adding fuel to a fire. Afterwards, an extinguishment test is performed on the grid. This samples reaction values and determines if they are below an extinguishment threshold - if so, smoke is formed based on a reaction constant.

The ApplyExtraForces method is mainly reserved for future use. This is where forces such as wind can be added to introduce more chaotic behaviour into the system. User interaction with fluids can also be accomplished using this function. Any quantity used by the simulation can be added in this step.

Vorticity Confinement

Even when using a higher-order advection routine, the solver still suffers from numerical dissipation. Vorticity confinement (Fedkiw et al., 2001) tries to offset this by calculating the local vorticity

\[ \vec{\omega} = \nabla \times \vec{u} \tag{4.6} \]

and injecting it back into the velocity field. Calculating this is the first step of the process. Afterwards a normalized vorticity location vector is retrieved using:

\[ \vec{\eta} = \nabla |\vec{\omega}|, \qquad \vec{N} = \frac{\vec{\eta}}{|\vec{\eta}|} \tag{4.7} \]

In both equations, vector operations are estimated using finite difference methods. The final confinement force is then calculated by:

\[ \vec{f}_{conf} = \varepsilon\,(\vec{N} \times \vec{\omega})\,\Delta t \tag{4.8} \]

In this equation ε > 0 is called the strength factor and controls the amount of small scale detail that is introduced back into the velocity field. In this project implementation it is clamped to the range ε ∈ [0,1]. This force is then added to the existing flow.
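The two passes of vorticity confinement can be illustrated with a 2D CPU sketch, where the vorticity reduces to a scalar along the z axis. This is a simplification of the 3D shaders under stated assumptions: unit cell spacing, central differences, and hypothetical names (VelocityGrid, Curl, ConfinementForce).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 {
    float x, y;
};

// 2D velocity grid stored row by row; a stand-in for the 3D velocity texture.
struct VelocityGrid {
    int width;
    int height;
    std::vector<Vec2> cells;

    VelocityGrid(int w, int h)
        : width(w), height(h),
          cells(static_cast<std::size_t>(w) * static_cast<std::size_t>(h),
                Vec2{0.0f, 0.0f}) {
        assert(w > 0 && h > 0);
    }
    Vec2& At(int x, int y) {
        return cells[static_cast<std::size_t>(y) * width + x];
    }
    const Vec2& At(int x, int y) const {
        return cells[static_cast<std::size_t>(y) * width + x];
    }
};

// Scalar curl (equation 4.6 in 2D) by central differences, unit spacing.
inline float Curl(const VelocityGrid& g, int x, int y) {
    const float dvdx = 0.5f * (g.At(x + 1, y).y - g.At(x - 1, y).y);
    const float dudy = 0.5f * (g.At(x, y + 1).x - g.At(x, y - 1).x);
    return dvdx - dudy;
}

// Confinement force (equations 4.7 and 4.8) for a cell at least two cells
// away from the grid border, so that all finite differences stay in range.
inline Vec2 ConfinementForce(const VelocityGrid& g, int x, int y,
                             float strength, float dt) {
    assert(x >= 2 && y >= 2 && x < g.width - 2 && y < g.height - 2);
    const float etaX = 0.5f * (std::fabs(Curl(g, x + 1, y)) - std::fabs(Curl(g, x - 1, y)));
    const float etaY = 0.5f * (std::fabs(Curl(g, x, y + 1)) - std::fabs(Curl(g, x, y - 1)));
    const float length = std::sqrt(etaX * etaX + etaY * etaY);
    if (length < 1e-6f) {
        return Vec2{0.0f, 0.0f};  // uniform vorticity: nothing to confine
    }
    const float nx = etaX / length;
    const float ny = etaY / length;
    const float w = Curl(g, x, y);
    // With omega along z, N x omega reduces to (N.y * w, -N.x * w).
    return Vec2{strength * ny * w * dt, -strength * nx * w * dt};
}
```

A rigidly rotating field has the same vorticity everywhere, so its confinement force is zero; the force only appears where the vorticity magnitude varies across the grid.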

Vorticity confinement requires the addition of 1 extra texture per fluid and 2 relatively cheap shader program operations per update step. The technique proves vital to the proper appearance of both smoke and fire and more than makes up for its cost in visual quality.


Figure 4.3: Different vorticity confinement strengths. Strength factors from left to right: 0, 0.5, 1.0

As can be seen, a suitable strength factor lies between 0.5 and 1.0. The simulations in the project application use values in that range, with fire ones tending to be higher.

Projection

Up to this point a velocity field $\vec{w}$ has been calculated, but it does not adhere to the continuity equation 2.2 as it is divergent. Therefore, the final step in each simulation update is to calculate a divergence-free flow field. (Harris, 2004) explains that the Helmholtz-Hodge Decomposition Theorem can be used to correct the velocity by subtracting the gradient of the pressure field:

\[ \vec{u} = \vec{w} - \nabla p \tag{4.9} \]

To compute the pressure field the following Poisson-pressure equation can be used:

\[ \nabla^{2} p = \nabla \cdot \vec{w} \tag{4.10} \]

These two equations are logically broken down into 3 operations. The first calculates the divergence of the velocity field $\nabla \cdot \vec{w}$ and stores it in a texture. Again, vector operations are estimated using finite differences.


The second step solves the Poisson-pressure equation using a common method called a Jacobi iteration solver. It is a technique that converges relatively slowly to a solution but has the advantage of being cheap to run using GPU kernels (Harris, 2004). This project uses an average of 10 to 15 Jacobi iterations for both fire and smoke. A higher number will provide better looking, more accurate results, but the computational cost rises quite steeply. As shown by (Crane et al., 2007), higher iteration counts do not lead to noticeably better render quality.
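The trade-off between iteration count and accuracy can be seen in a small CPU sketch of Jacobi iteration. It solves the 1D analogue of equation 4.10 with unit cell spacing, where each cell is updated as p[i] = (p[i-1] + p[i+1] - d[i]) / 2. Treating the pressure outside the grid as zero is an assumption of this sketch and differs from the obstacle handling used in the project; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Jacobi iterations for the 1D Poisson equation p[i-1] - 2p[i] + p[i+1] = d[i].
// Assumption: pressure outside the grid is zero.
inline std::vector<float> JacobiPressure(const std::vector<float>& divergence,
                                         int iterations) {
    assert(iterations >= 0);
    const std::size_t n = divergence.size();
    std::vector<float> p(n, 0.0f);
    std::vector<float> next(n, 0.0f);
    for (int k = 0; k < iterations; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
            const float left = (i > 0) ? p[i - 1] : 0.0f;
            const float right = (i + 1 < n) ? p[i + 1] : 0.0f;
            next[i] = 0.5f * (left + right - divergence[i]);
        }
        p.swap(next);  // ping-pong between the two pressure buffers
    }
    return p;
}

// Largest absolute error with which p satisfies the discrete equation.
inline float Residual(const std::vector<float>& p,
                      const std::vector<float>& divergence) {
    assert(p.size() == divergence.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const float left = (i > 0) ? p[i - 1] : 0.0f;
        const float right = (i + 1 < p.size()) ? p[i + 1] : 0.0f;
        worst = std::max(worst, std::fabs(left - 2.0f * p[i] + right - divergence[i]));
    }
    return worst;
}
```

Each iteration only reads the previous pressure buffer and writes the next one, which is why the method maps well to GPU kernels despite its slow convergence.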

The final step is a straightforward subtraction of the resultant pressure gradient from the divergent flow field. The result is stored in $\vec{u}$, which becomes the new velocity field.

Boundary Interaction

As mentioned in section 4.3, all fluids have a single voxel wide obstacle texture on the box edges that acts as the boundary for the system. Cells in this texture either have the value of 1 if there is an obstacle at the location, or 0 if there is none. All computational steps have access to this texture and use it differently.

Its most important function is to enforce the free-slip boundary condition, which states that a fluid cannot flow into or out of a solid, but can freely flow along its surface. This is mainly done in the projection step, where if an obstacle is detected, the velocity component of that cell is taken as 0. When performing Jacobi iterations and sampling adjacent cells, if an obstacle is present, the pressure component of that cell is not used - this is the approach utilised by (Crane et al., 2007).

Obstacles are similarly used in the computation of vorticity confinement and advection - forcing the velocity vector to be 0 if inside a boundary.

4.4.2 Runtime Modifications

Since so many variables control the appearance and structure of a fluid object, it is useful to make as many of them as possible editable at runtime. These are all kept in a C++ struct called FluidSettings and, along with the domain size, are used when constructing a fluid. These parameters are then transferred to the GPU in various buffers during the SetShaderBuffers function from 4.4.


At runtime, nearly all of the control parameters can be edited from a user interface window. This window appears when the user clicks on a fluid object with the mouse.

Figure 4.4: Users have the freedom to edit fluid control settings at runtime to observe their effects. Reaction values are not used for smoke simulation

When a parameter is edited, a method is called on the respective FluidCalculator for that object.

void Fluid3DCalculator::SetFluidSettings(const FluidSettings

&fluidSettings) {

// Update buffers if needed

int dirtyFlags = GetUpdateDirtyFlags(fluidSettings);

this->fluidSettings = fluidSettings;

if (dirtyFlags & BufferDirtyFlags::General) {

UpdateGeneralBuffer();

} else if ...

}

It first checks which settings have been changed and sets the necessary update dirty flags. Using the dirty flag pattern allows only the constant buffers that have changed to be updated, instead of all of them. Updating a buffer involves copying its contents from GPU to system memory, changing them and then copying them back to the GPU, so it should not be overused, as advised by (McDonald, 2012). Dirty flags assist with this.
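A minimal, self-contained sketch of the dirty flag pattern is shown below. The settings fields, flag names and the SettingsUploader class are hypothetical and only a small subset of what a real FluidSettings struct would hold; counting uploads stands in for the actual constant buffer updates. Each flag is tested independently here, which is a choice of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of fluid settings, grouped by the buffer they feed.
struct Settings {
    float timeStep = 0.1f;           // general buffer
    float densityBuoyancy = 1.0f;    // buoyancy buffer
    float densityWeight = 0.05f;     // buoyancy buffer
    float vorticityStrength = 0.8f;  // confinement buffer
};

// One bit per constant buffer (names are assumptions of this sketch).
enum DirtyFlags : std::uint32_t {
    DirtyNone = 0,
    DirtyGeneral = 1u << 0,
    DirtyBuoyancy = 1u << 1,
    DirtyConfinement = 1u << 2
};

// Compares old and new settings and reports which buffers are out of date.
inline std::uint32_t GetDirtyFlags(const Settings& oldS, const Settings& newS) {
    std::uint32_t flags = DirtyNone;
    if (oldS.timeStep != newS.timeStep) flags |= DirtyGeneral;
    if (oldS.densityBuoyancy != newS.densityBuoyancy ||
        oldS.densityWeight != newS.densityWeight) flags |= DirtyBuoyancy;
    if (oldS.vorticityStrength != newS.vorticityStrength) flags |= DirtyConfinement;
    return flags;
}

class SettingsUploader {
public:
    // Applies new settings and returns how many buffers had to be rewritten.
    int Apply(const Settings& next) {
        assert(next.timeStep > 0.0f);
        const std::uint32_t flags = GetDirtyFlags(mCurrent, next);
        mCurrent = next;
        int uploads = 0;
        if (flags & DirtyGeneral) ++uploads;      // stand-in for a buffer update
        if (flags & DirtyBuoyancy) ++uploads;
        if (flags & DirtyConfinement) ++uploads;
        mTotalUploads += uploads;
        return uploads;
    }
    int TotalUploads() const { return mTotalUploads; }

private:
    Settings mCurrent;
    int mTotalUploads = 0;
};
```

Applying unchanged settings costs nothing, and changing one slider in the user interface touches only the buffer that slider belongs to.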

In a real game environment a player would not have such access to fluid settings, but this is immensely useful as a level of detail or game design tool, as it allows for fine-tuning of just how the simulation plays out.

4.4.3 Choosing an Update Rate

When updating objects each frame, games tend to use the difference between the time at the new frame and the time at the previous frame. This is referred to as the delta time. Since this can vary with frame rate, sensitive calculations such as game physics tend to use a "fixed" integration time step value that is independent of delta time.

This approach is used here - the value is, by default, 1/30, meaning 30 fluid updates per second. Note that this value controls how often the "process" method of a fluid is called, not the ∆t value for the calculation formulas - that is defined separately for each fluid. The advantage of calling "process" at a fixed rate is that it keeps fluid movement consistent. If it were updated at a variable rate, each fluid would slow down or speed up, leading to a distorted look.

A rate of 30 updates a second was chosen since it is fast enough for each fluid to develop with reasonable speed, while still maintaining decent performance. It can be changed at runtime, although if the update rate increases above the processing capability of the hardware, the application slows down as it cannot keep up with the required number of updates. Rates of around 30 to 50 a second are common choices, although higher ones are certainly achievable on better hardware.
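A common way to drive updates at a fixed rate from a variable frame time is an accumulator. The sketch below shows that general pattern; whether the framework uses exactly this mechanism is an assumption, and FixedRateTimer is a hypothetical name.

```cpp
#include <cassert>

// Accumulates frame delta time and reports how many fixed-rate updates are due.
class FixedRateTimer {
public:
    explicit FixedRateTimer(double updatesPerSecond)
        : mInterval(1.0 / updatesPerSecond) {
        assert(updatesPerSecond > 0.0);
    }

    // Call once per frame with the frame's delta time in seconds.
    int Advance(double deltaSeconds) {
        assert(deltaSeconds >= 0.0);
        mAccumulated += deltaSeconds;
        int due = 0;
        while (mAccumulated >= mInterval) {
            mAccumulated -= mInterval;
            ++due;
        }
        return due;
    }

private:
    double mInterval;
    double mAccumulated = 0.0;
};
```

Each frame, the caller would run the process step once per update reported as due. If a frame takes longer than the interval, several updates are due at once, which is the situation in which the application starts to fall behind on slower hardware.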


4.4.4 Frame Skipping

Even with the many simplifications, memory cutbacks and processing optimisations used, updating a reasonably-sized fluid object every frame is a demanding operation. This is where an LOD technique called frame skipping comes into use. Its premise is quite simple - instead of updating a fluid simulation every frame, do it every few frames. It is inspired by the approximation techniques used by game physics simulations and has previously been adopted for fluids (Tangvald, 2007). Below is the implementation as used in the project.

void Update() {

// Process once every (framesToSkip + 1) frames

bool canUpdate = framesSinceLastProcess >= framesToSkip;

if (canUpdate) {

fluidCalculator->Process();

framesSinceLastProcess = 0;

}

else {

++framesSinceLastProcess;

}

}

Although a very simple LOD method, frame skipping frees up substantial computing power, especially when using many fluid objects. Its downside is that its effects are quickly spotted. Even skipping one frame per process step means that the simulation updates half as often. Therefore, this technique is only used on fluids which are not in the current view frustum. Even then, it starts being used only after the simulation has had a few seconds to develop first. Afterwards, no difference in behaviour can be noticed when looking away and then back at a fluid object, since the behaviour is inherently chaotic.

The number of frames to skip can be changed at runtime. It should be noted that the performance gained by skipping additional frames is not linear - most of it is gained when skipping 2 or 3 frames and it peaks at 5.


4.5 Rendering Fluids

Rendering the final result is done via the ray-marching technique previously discussed (Zhou et al., 2007). It was chosen due to it being straightforward to implement in a standard pixel shader program and for its ability to give good visual results.

A fluid in 3D space is represented by an object called a VolumeRenderer, which at its core is a simple cube - it has position, rotation and scale components - all the required properties for rendering in 3D space. When an instance of a volume renderer is constructed it needs to know what type of fluid it will render - smoke or fire. If the type is smoke, it can be given a reference to a 3D texture (in the form of an SRV) of smoke density values that it then uses for drawing. If rendering fire, it can also be passed a reference to a 3D texture of fire reaction values in addition to density. This creation data is important, as there are different pixel shaders used when rendering each type.

4.5.1 Render Parameters

There are certain parameters that affect the render result of a fluid simulation which can be modified at runtime.

Figure 4.5: Render settings modify the look of a fluid without changing its physical properties

Number of Samples

The number of samples is the sample rate described in section 3.3.2. It has a direct effect on the quality of the produced result. A higher rate will sample more density and reaction values, thus producing a more accurate average colour. It also means that more time will be spent in the pixel shader, which directly affects performance.

In practice, sample rate matters only when the fluid takes up a significant amount of screen space. This is due to the fact that a pixel shader is only run on the visible pixels on the screen that an object occupies.

Figure 4.6: Different sample rates of a 64x128x64 fluid from afar. Left: 32 samples; Right: 128 samples

As can be seen, from afar the difference in quality is hardly discernible, although the step size difference is substantial. The performance of both is nearly identical - since the fluid occupies fewer pixels of the screen, the extra time spent in the shader program is insignificant.

Figure 4.7: Different sample rates of a 64x128x64 fluid from up close. Left to right: 32, 64, 128 samples

This is the same flame as the one in the previous figure. When viewing from a closer distance, the quality of using a higher number of samples can be seen more clearly (this is more defined when seeing the fluid moving). This is due to the lower step value leading to a smaller range of colours used to represent the fluid. A visible improvement is seen when increasing the sample rate from 32 to 64, but only a very slight one when going from 64 to 128. This is because more samples cannot make up for the grid size of a fluid. Even for a relatively big domain, like the one in the figures, there will be little visual gain when using more than 100 samples per ray.

There are significant performance implications when using a higher sample rate with the fluid in full view, since the fluid takes up a large part of the screen. Depending on the view distance, rendering with 32 samples could be nearly twice as fast as rendering with 128. It is therefore best to decide upon a sample rate that would give a good visual result, yet still compute fast.
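The relationship between sample count, step size and accuracy can be illustrated with a CPU sketch of marching a single ray. It uses a generic absorption-only model and is not the framework's pixel shader: MarchOpacity is a hypothetical function, and the density is supplied as a callback instead of a 3D texture lookup.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Marches one ray of the given length through a density field and returns
// the accumulated opacity in [0, 1]. The cost grows linearly with sampleCount.
inline float MarchOpacity(const std::function<float(float)>& densityAt,
                          float rayLength, int sampleCount, float absorption) {
    assert(sampleCount > 0 && rayLength > 0.0f);
    const float stepSize = rayLength / static_cast<float>(sampleCount);
    float transmittance = 1.0f;
    for (int i = 0; i < sampleCount; ++i) {
        // Sample at the middle of each step along the ray.
        const float t = (static_cast<float>(i) + 0.5f) * stepSize;
        transmittance *= std::exp(-absorption * densityAt(t) * stepSize);
    }
    return 1.0f - transmittance;
}
```

For a uniform density the result does not depend on the number of samples, while for a thin feature a coarse step can badly over- or underestimate it. This mirrors the observation above that extra samples only pay off when there is fine detail to resolve on screen.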

Colour and Absorption

Changing the Smoke Color property alters the colour appearance of smoke for both types of simulations.

Smoke and fire Absorption control how much to saturate the resultant colour when sampling density and reaction values respectively. A higher value will mimic thick smoke or flames, while a lower one will produce a weaker looking flame or lighter smoke.

4.5.2 Fluid Instancing

A key goal throughout the development of this project is to separate the concept of fluid motion calculation from fluid rendering. A Fluid3DCalculator object does not know about a VolumeRenderer and vice versa. The former is responsible for setting up and running the equations of motion on a set of 3D grids while the latter will render suitable 3D textures passed to it.

Given this separation, it is straightforward to implement a form of instancing for fluids. This means that one fluid instance can be drawn multiple times by different volume renderers. Since the cost of rendering is trivial compared to the cost of simulating, this allows for a scene to seemingly contain many fire and smoke effects, while only computing a small number.


Volume renderers using data from the same fluid instance will display identical results. To make them visibly dissimilar, each can be given different render parameters. A combination of colour and absorption can be used to achieve non-identical looking fluids. The final scene as seen on page 33 is made out of 2 unique smoke simulations - one of which has 3 instances, and 1 unique fire simulation that has 2 instances.

Instancing and Frame Skipping

Frame skipping is used when a fluid simulation is not in view. Instancing means that the same fluid simulation can be in more than one place. To deal with this, before activating frame skipping, all volume renderers that use a particular fluid object are tested for visibility. If even one is in view, frame skipping will not occur.
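The interaction between instancing and frame skipping can be sketched as follows. The types are hypothetical stand-ins (a real visibility test would check each volume renderer against the view frustum), and the sketch adopts the convention that a hidden simulation is processed once every framesToSkip + 1 frames.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the result of a volume renderer's visibility test.
struct RendererInstance {
    bool inView;
};

struct SkippingSimulation {
    int framesToSkip = 0;
    int framesSinceLastProcess = 0;
    int processCount = 0;  // stands in for calls to the solver's process step

    // Called once per frame; returns true if the simulation was processed.
    bool Update(const std::vector<RendererInstance>& instances) {
        assert(framesToSkip >= 0);
        // Frame skipping is only allowed when no instance is visible.
        const bool anyVisible =
            std::any_of(instances.begin(), instances.end(),
                        [](const RendererInstance& r) { return r.inView; });
        const bool due = framesSinceLastProcess >= framesToSkip;
        if (anyVisible || due) {
            ++processCount;
            framesSinceLastProcess = 0;
            return true;
        }
        ++framesSinceLastProcess;
        return false;
    }
};
```

With framesToSkip set to 2, a fully hidden simulation is processed on every third frame, while a single visible instance is enough to restore an update on every frame.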


Chapter 5

Results and Discussion

The previous chapter covered the implementation details of calculating the fluid equations of motion and rendering the result. It also discussed various optimisation methods used to make the process as performant as possible. This chapter will examine the results of the implementation to determine its effectiveness. This will involve scrutinizing both the visual results of the simulation and its performance.

5.1 Testing Setup

5.1.1 Hardware

The application was tested and benchmarked on two different systems. The first is a mid-tier laptop and the second is a high-end "gaming" PC.

Table 5.1: Hardware used for testing

        Laptop                                   PC
CPU     Intel Core i7-3632QM @ 2.20GHz           i5 3570K @ 4.5GHz
RAM     8 GB, DDR3                               12 GB, DDR3
GPU     NVIDIA GeForce GT 640M LE, 2 GB DDR3     ATI R9 280X, 3GB GDDR5
OS      Microsoft Windows 7 64-bit               Microsoft Windows 8.1 64-bit

The important difference between the two setups is the graphics card. The NVIDIA, being a mobile low-power series, has around 2.5 times fewer clock cycles and 11 times less memory bandwidth compared to the ATI one. Detailed specifications on both GPUs can be found in appendix A.


Quantitative results are obtained during application runtime. There is an in-game frame counter to report on FPS. It displays current, minimum, maximum and average frames per second achieved and is used as a benchmark for performance.

5.1.2 Scene

The test scene has been set up to fulfil the application requirements. There are 3 different fluids computed at the same time - 2 smoke and 1 fire effects. There are 6 volume renderers visualising the results of those simulations.

Figure 5.1: Looking at the entire final scene from a distance with all fluids in view

The user is free to control the camera, click on fluids and change or observe their parameters. There is also a "scene fly-through" mode, which performs a looping predefined movement through the scene. This mode features both up-close and distance views of the various fluids in the scene.


5.2 Visual Results

Real-world phenomena, such as smoke and fire, come with an inherent randomness and subtle features that computer graphics do not have the power to precisely mimic. With certain simplifications and smart uses of technology, though, the results obtained in this project go some way towards bridging that gap.

Figure 5.2: Smoke and fire simulation in the application

5.2.1 Modifying Parameters

Since the application allows the freedom to modify both fluid and render settings, it is very easy to produce different looking simulations.

Figure 5.3: Left: Fast decaying fire, producing a lot of smoke; Mid: Strong fire, burning with nearly no smoke; Right: Average strength fire, producing blue smoke


5.3 Memory & Performance Results

The main goal of this project is to prove that the parallel power of graphics cards has reached a threshold that would allow for real-time physically-based fluid simulation. For this reason memory and frame times are both recurring topics of discussion throughout this project.

5.3.1 Memory Use

In section 4.3.2 the various optimisations that are performed during the setup of a fluid object were discussed. By querying the GPU, it can be seen how much video memory fluids of different types and domain sizes use. Below is a table with several of these results with increasing grid resolution. These do not include video memory for rendering.

Table 5.2: Video memory used for simulations of different resolution

Grid Size Smoke Memory Fire Memory Shared Memory

16x16x16 0.1 MB 0.11 MB 0.06 MB

32x32x32 1.8 MB 1.9 MB 0.8 MB

64x64x64 13.8 MB 14.8 MB 5.5 MB

128x128x128 110 MB 118 MB 44 MB

Smoke Memory is the video memory required per unique smoke effect and Fire Memory is the memory required per unique fire effect. Shared Memory is how much of that total can be shared with other simulations.

As can be seen, the memory required to store all of the textures that contain the fluid properties rises cubically with grid resolution, so doubling each dimension multiplies the cost by roughly eight. By utilising texture sharing, some of this memory cost is offset when using more than one fluid of the same size. Even so, using sizes bigger than 128³ is infeasible, both because of the memory cost and because the processing time quickly rises. A good option is to use a higher resolution in only one or two dimensions, while using a smaller one in another.

Alternatively, grids in the range of 30³ to 50³ are ideal for modelling average sized uniform domains. Their memory cost comes to around 1 to 1.5 times that of high quality PNG images, which are often used as textures in games. Both test GPUs have at least 2 GB of video memory, so this is a small cost to pay.

Finally, instancing allows for having many fire and smoke effects without paying the memory cost for creating each one. Its benefits are measured in the number of instances that use a single fluid object. Considering also that the cost of a volume renderer is insignificant compared to that of a simulation, instancing should, where appropriate, be preferred to creating a new fluid effect.

5.3.2 Performance

To recap, the final scene features 1 smoke simulation of grid size 64x128x64, another smoke one of grid size 30x60x30 and 1 fire of size 40x80x40. There are a total of 6 volume renderers displaying the results of these simulations. Each simulation does 10 Jacobi solver iterations and uses a sample rate of 64 when ray-marching.

This scene was benchmarked on both test machines several times with increasing simulation update rates. Benchmarking involves running the scene in "fly-through" mode for a period of 5 minutes and noting down the minimum, maximum and average frame rates achieved.
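The three reported figures can be derived from a log of per-frame durations. The following Python sketch shows one way to do so; the function name and the `frame_times` list are hypothetical, and it is an assumption that the average is taken as total frames over total time rather than as a mean of instantaneous FPS values.

```python
def summarise_frame_times(frame_times):
    """Return (min_fps, max_fps, avg_fps) from frame durations in seconds."""
    if not frame_times:
        raise ValueError("no frames were recorded")
    if any(t <= 0 for t in frame_times):
        raise ValueError("frame durations must be positive")

    min_fps = 1.0 / max(frame_times)  # the slowest frame gives the minimum FPS
    max_fps = 1.0 / min(frame_times)  # the fastest frame gives the maximum FPS
    # Total frames over total time, so long frames are weighted by duration.
    avg_fps = len(frame_times) / sum(frame_times)
    return min_fps, max_fps, avg_fps


# Example: two 10 ms frames and one 20 ms frame.
print(summarise_frame_times([0.01, 0.02, 0.01]))
```

Averaging over total time keeps a handful of very fast frames, such as those rendered while no simulation is in view, from dominating the result.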

Figure 5.4: Benchmark results on notebook computer using an NVidia GT 640M LE GPU

The substantial difference between the maximum frame rate and the minimum and average frame rates is immediately noticeable. This is due to the use of frame skipping when some or all simulations are not in view, which frees up GPU resources. The minimum frame rate occurs when all fluid objects are in view and one or more are viewed up close, which increases render time. The majority of time in the fly-through mode is spent with all or 2 out of 3 simulations in view from a distance. This is what the average FPS captures.

The benchmark results show that going above 30 updates/sec is not feasible on this setup, since the frame rate quickly starts dropping. As mentioned previously, if the update rate demands more clock cycles and texture bandwidth than are available, the program slows down.
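The interaction between update rate and frame skipping can be illustrated with a small scheduler. This is a minimal sketch under stated assumptions, not the framework's actual code: the class `SimulationScheduler` and its method are hypothetical, visibility is passed in as a plain boolean, and at most one simulation step is allowed per rendered frame.

```python
class SimulationScheduler:
    """Decides, once per rendered frame, whether a simulation is stepped."""

    def __init__(self, updates_per_sec):
        if updates_per_sec <= 0:
            raise ValueError("updates_per_sec must be positive")
        self.interval = 1.0 / updates_per_sec
        self.accumulator = 0.0

    def should_update(self, frame_dt, in_view):
        if not in_view:
            # Frame skipping: an unseen simulation costs nothing, and no
            # backlog is kept that would cause a burst of catch-up steps.
            self.accumulator = 0.0
            return False

        self.accumulator += frame_dt
        if self.accumulator < self.interval:
            return False

        # Keep at most one interval of leftover time so that slow frames
        # cannot build up an ever-growing backlog.
        self.accumulator = min(self.accumulator - self.interval, self.interval)
        return True


# Rendering at 128 FPS with a target of 32 updates/sec for one second.
scheduler = SimulationScheduler(32)
steps = sum(scheduler.should_update(1.0 / 128, True) for _ in range(128))
print(steps)
```

When a frame takes longer than the update interval, this scheduler steps the simulation every frame, so the requested update rate can no longer be met and the frame rate becomes the limiting factor.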

Figure 5.5: Benchmark results on gaming PC using an AMD Radeon R9 290X GPU

This graph displays the significance that increased memory bandwidth and clock speed have on performance. The AMD R9 290X only begins to show a decreased frame rate when doing over 150 updates/sec. Up until then, it consistently keeps an average of above 800 FPS. Only around the 200 updates/sec mark do the simulations start reaching the system limits.

In reality, though, there is no reason to use an update rate of more than 30-40 updates/sec when that power can instead be spent on computing and rendering more fluid objects. These results show the potential that the new generation of GPUs has for handling such computationally intensive tasks.

Chapter 6

Conclusion and Future Work

This project had the goal of investigating fluid simulation with the aim of answering the following question:

How can the parallel processing advantage of modern graphics cards be used for simulating physically-based fluids, and how can this approach be adapted for real-time use?

The particular goals were:

• Derive an effective way of utilising the GPU for solving the equations of fluid motion in 3D.

• Discover what level of detail methods and performance optimisations can be applied in order to use fewer system resources.

This research has demonstrated that the equations of fluid motion can be solved in real-time with reasonable frame rates on the GPU. The implementation provided with the project offers an optimised and memory-efficient solution for numerically solving and rendering fire and smoke with satisfactory results.

The performance tests in Chapter 5 clearly show that the newest generation of graphics cards is more than capable of updating and rendering many simulations at once. The tests also showed that low-to-mid tier cards can hold their own when dealing with a few reasonably sized fluid domains at an average update rate.

6.1 Future Work

This project covers how to efficiently implement a fluid solver and render the results. For a topic as broad as fluid simulation, there is certainly more research that could be done.

One area that can certainly be further investigated is implementing interactions with a fluid. The external forces part of the motion equations can be used to provide a form of user control of the system. Crane et al. (2007) implement a form of object voxelisation using a geometry shader to allow arbitrary 3D models to be used as obstacles in the simulation. This technique could be extended and improved to take into account different objects going into and out of a fluid domain, disturbing it based on their velocity and shape.

When there are only a few sources of constant input into a fluid domain, large parts of the 3D grid are left empty but still take up computational time. A better way to handle updating a fluid would be to split up each grid into chunks and determine whether each chunk contains fluid properties. Then, only the chunks that do would be updated. This technique has the potential to allow for much faster processing of bigger fluid domains.
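The chunking idea can be sketched on the CPU as follows. This is only an illustration of the proposal, written in Python for clarity rather than as a compute shader; the function `active_chunks`, the nested-list grid layout and the `threshold` parameter are all assumptions introduced for the example.

```python
def active_chunks(density, chunk, threshold=0.0):
    """Return the set of (cz, cy, cx) chunk coordinates that contain fluid.

    density   -- nested lists indexed as density[z][y][x]
    chunk     -- edge length of a cubic chunk, in cells
    threshold -- a cell counts as occupied when its value exceeds this
    """
    if chunk <= 0:
        raise ValueError("chunk size must be positive")
    active = set()
    for z, plane in enumerate(density):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > threshold:
                    active.add((z // chunk, y // chunk, x // chunk))
    return active


# An 8x8x8 grid split into 4x4x4 chunks (8 chunks in total).
size, chunk = 8, 4
grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
grid[1][2][3] = 1.0   # a source near one corner
grid[7][7][7] = 0.5   # some fluid in the opposite corner

to_update = active_chunks(grid, chunk)
print(sorted(to_update))  # only these chunks would be stepped
```

A practical version would also need to mark the neighbours of active chunks, since advection can carry fluid across a chunk boundary within a single step.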

To further increase visual quality, smoke rendering could take light sources into account, and each fluid could cast dynamic shadows. Additionally, a fire itself could be made a light source. This would be achieved by first creating a number of lights per fire simulation and then advecting their positions via the velocity field and controlling their brightness via the reaction field.
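A possible shape for the light advection step is sketched below. It is a rough illustration under several assumptions that are not part of the project: fields are stored as nested lists indexed `[z][y][x]`, sampling uses the nearest cell rather than trilinear filtering, positions are in grid units, and a single forward Euler step is used. The names `sample` and `step_light` are hypothetical.

```python
def sample(field, pos):
    """Nearest-cell lookup in a field indexed [z][y][x], clamped to the grid."""
    x, y, z = pos
    nz, ny, nx = len(field), len(field[0]), len(field[0][0])
    ix = min(max(int(round(x)), 0), nx - 1)
    iy = min(max(int(round(y)), 0), ny - 1)
    iz = min(max(int(round(z)), 0), nz - 1)
    return field[iz][iy][ix]


def step_light(pos, velocity, reaction, dt):
    """Move a light with the flow and read its brightness from the reaction field.

    Returns the new position and the new brightness.
    """
    vx, vy, vz = sample(velocity, pos)
    x, y, z = pos
    new_pos = (x + vx * dt, y + vy * dt, z + vz * dt)
    return new_pos, sample(reaction, new_pos)


# A 4x4x4 domain with a constant upward flow and a reaction value that
# depends only on height.
n = 4
velocity = [[[(0.0, 1.0, 0.0)] * n for _ in range(n)] for _ in range(n)]
reaction = [[[y * 0.25] * n for y in range(n)] for _ in range(n)]

pos, brightness = step_light((1.0, 1.0, 1.0), velocity, reaction, dt=1.0)
print(pos, brightness)
```

In a real implementation the lights would probably need to be respawned near the fire source once their reaction value falls to zero, otherwise they would drift out of the flame.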

Appendix A

Test GPUs Specifications

Figure A.1: Technical specifications of both graphics cards used for testing. The bandwidth and clock speeds are the key factors for performance.

Appendix B

CD Contents

The attached CD contains the following directory structure:

\Application Contains the final application executable.

\Dissertation Contains an electronic copy of this dissertation document.

\Instructions Contains instructions for the operation of the application.

\Media Contains images and video of the final application.

\Project Contains the full source code and assets for the application.

\Proposal Contains an electronic copy of the original project proposal.

References

Bai, Y. and Turk, G. 2005. Reducing numerical dissipation in fluid simulation. Georgia Institute of Technology. Available from: http://tinyurl.com/pcy4exs. 3.1

Barrett, J. 2012. Real-time animation and rendering of ocean waves. [Online]. 2.2

Bridson, R. 2008. Fluid Simulation for Computer Graphics. CRC Press. 2.1.1

Carucci, F. 2005. Inside Geometry Instancing. Addison-Wesley Professional. Available from: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter03.html. 2.3

Crane, K., Llamas, I., and Tariq, S. 2007. Real-Time Simulation and Rendering of 3D Fluids. Addison-Wesley Professional. Available from: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch30.html. 3.2, 3.3.2, 4.3.2, 4.4.1, 4.4.1, 4.4.1, 6.1

Fedkiw, R., Stam, J., and Jensen, H. W. 2001. Visual simulation of smoke. In: SIGGRAPH 2001 Conference. 3.1, 3.2, 4.4.1

Fernando, R. et al. 2004. GPU Gems: Programming Techniques, Tips and Tricks for Real-Time Graphics. Addison Wesley. Available from: http://http.developer.nvidia.com/GPUGems. 3.2

Gourlay, M. 2012. Fluid simulation for video games. Intel Developer Zone. Available from: http://software.intel.com/en-us/articles/fluid-simulation-for-video-games-part-3. 1, 3.3.1

Gupta, S. 2011. GPU supercomputers show exponential growth in top 500 list. [Online]. Available from: http://blogs.nvidia.com/blog/2011/11/14/gpu-supercomputers-show-exponential-growth-in-top500-list/. 1

Harris, M. 2004. Fast Fluid Dynamics Simulation on the GPU. Addison Wesley. chap. 38. Available from: http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html. 1, 3.2, 4.4.1, 4.4.1

Hess, J. and Smith, A. 1967. Calculation of potential flow around arbitrary bodies. In: Progress in Aerospace Sciences. 3

Krüger, J. and Westermann, R. 2003. Linear algebra operators for GPU implementation of numerical algorithms. In: SIGGRAPH 2003 Conference. Available from: http://tinyurl.com/ozb5xpy. 3.2

McDonald, J. 2012. Don’t throw it all away: Efficient buffer management. In: Game Developer Conference. Available from: https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/gdc12/Efficient_Buffer_Management_McDonald.pdf. 4.4.2

McGuire, M. 2006. A real-time, controllable simulator for plausible smoke. Brown University. Available from: http://graphics.cs.williams.edu/papers/SmokeSimBrown06/smoke-simulation-brown06.pdf. 3.3.1

MSDN. 2010. Compute shader overview. [Online]. Available from: http://tinyurl.com/plpw97t. 1

MSDN. 2014. Resource interfaces. [Online]. Available from: http://tinyurl.com/mwledo4. 4.3

Nguyen, H. et al. 2007. GPU Gems 3. Addison-Wesley Professional. Available from: https://developer.nvidia.com/content/gpu-gems-3. 3.2

NVidia. 2013. NVidia Computational Fluid Dynamics Page. [Online]. Available from: http://www.nvidia.com/object/computational_fluid_dynamics.html. 1

Stam, J. 1999. Stable fluids. In: SIGGRAPH 1999 Conference. Available from: http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/ns.pdf. 3.1, 3.2, 4.4.1

Stam, J. 2003. Real-time fluid dynamics for games. In: Game Developer Conference. Available from: http://www.dgp.toronto.edu/people/stam/reality/Research/pdf/GDC03.pdf. (document), 2.1, 3.1, 3.3.1

Steinhoff, J. and Underhill, D. 1994. Modification of the Euler equations for “vorticity confinement”: Application to the computation of interacting vortex rings. Physics of Fluids. 3.1

Hellgate: London. 2007. DVD-ROM. 3.2

Tangvald, L. 2007. Implementing LOD for physically-based real-time fire rendering. [Online]. 4.4.4

Valve, S. 2012. Level of detail. Valve Developer Portal. Available from: https://developer.valvesoftware.com/wiki/Level_of_detail. 2.3

Zhou, K. et al. 2007. Real-time smoke rendering using compensated ray marching. Microsoft Research. Available from: http://research.microsoft.com/pubs/70503/tr-2007-142.pdf. 3.3.2, 4.5

Bibliography

Acheson, D. 1990. Elementary Fluid Dynamics. Clarendon Press.

Rideout, P. 2011. 3D Eulerian grid. Available from: http://prideout.net/blog/?p=66.

Selle, A. et al. 2007. An unconditionally stable MacCormack method. Available from: http://tinyurl.com/nm4novl.
