Volumetric Lighting for Many Lights in Lords of the Fallen

Benjamin Glatzel, Engine/Graphics Programmer

Deck13 Interactive GmbH

Who are we?

• One of Germany’s leading game studios

• Currently working on “Lords of the Fallen” in cooperation with CI Games

• We’re using our own proprietary multi-platform technology called “Fledge”

• We’ve shipped numerous titles primarily on PC but also on Xbox 360, iOS and PS3 (maybe you know Jack Keane, Ankh, Venetica, Blood Knights or Tiger and Chicken)

Lords of the Fallen

• Lords of the Fallen is a challenging Action-RPG for PC, Xbox One and PlayStation 4

• Will be released fall 2014

• For an in-depth view into the rendering guts of Fledge, visit Philip’s talk tomorrow

Who am I?

• Engine/Graphics Programmer for 2 years

• Mainly responsible for the GNM/PS4 version of “Fledge”

• Apart from that I'm behind everything related to physics, our software rasterisation based culling system, our IK system, …

Introduction

Light Scattering

Lightwaves

Participating media

Motivation

• Simple light shafts as a screen space post-processing effect [1] sure are shiny, but…

Light shafts as a post-processing effect

Motivation

• Billboards can be neat, but…

“Billboard volumetrics”

Motivation

• We wanted something more dynamic and flexible that could be tightly integrated into our lighting system

• It should work with a lot of small to medium sized light sources

• Our artists tend to place a whole lot of lights

• Thus a negligible performance penalty on all supported platforms was critical

State of the Art

Deep Down

Killzone 4

Crysis 3

State of the Art

• Many recent implementations seem to be based on the work of Toth et al. [2]:

• Ray marching in light view space while evaluating the shadow map

• Often combined with a special sampling approach to reduce the workload per fragment

• Many other approaches/optimisations have popped up in recent years: Epipolar sampling [3], sampling planes shaded in light space [4], …

Our Approach

• Loosely based on “Real-time Volumetric Lighting in Participating Media” (Toth et al. [2])

• Straightforward ray marching

• Usage of “Interleaved Sampling” to reduce the overall sample count needed per fragment

• Utilises low-resolution rendering to reduce the fragment workload even further

Our Approach

• Works with multiple lights and light types

• Custom bilateral blurring and depth-aware up-sampling to work around the obvious artefacts

• Various tweaks and optimisations per light type

• Completely implemented using good old pixel and vertex shaders - no compute

Basic Algorithm

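Radiative Transport Equation [2]

Ray equation, where $\vec{\omega}$ is the direction of the ray:

$$\vec{x}(s) = \vec{x}_0 + \vec{\omega} s$$

Change of the radiance $L(\vec{x}(s), \vec{\omega})$ along the ray:

$$\frac{dL(\vec{x}(s), \vec{\omega})}{ds} = -\tau L(\vec{x}(s), \vec{\omega}) + \tau a \int_{\Omega'} L(\vec{x}(s), \vec{\omega}') P(\vec{\omega}', \vec{\omega}) \, d\omega'$$

• $\tau$: Probability of collision

• $a$: Scattering probability after collision

• $P(\vec{\omega}', \vec{\omega})$: Phase function

Integrated along the ray:

$$L(\vec{x}(s), \vec{\omega}) = e^{-\tau s} L(\vec{x}_0, \vec{\omega}) + \int_0^s L_i(\vec{x}(l), \vec{\omega}) e^{-\tau (s - l)} \, dl$$

Approximated by a finite sum:

$$L(\vec{x}(s), \vec{\omega}) \approx L(\vec{x}_0, \vec{\omega}) e^{-\tau s} + \sum_{n=0}^{N} L_i(\vec{x}(l_n), \vec{\omega}) e^{-\tau (s - l_n)} \Delta l$$

Ignore multiple scattering. The in-scattering term becomes:

$$L_i(\vec{x}, \vec{\omega}) = \tau a \frac{\Phi}{4 \pi d^2} v(\vec{x}) e^{-\tau d} P(\vec{\omega}_l, \vec{\omega})$$

• $s$: Total ray marching distance

• $d$: Distance to the light source

• $l$: Traveled distance on the ray

• $\Delta l$: Step size

• $v(\vec{x})$: Visibility function

• $\Phi$: Source power of the light

• $\vec{\omega}_l$: Direction from the ray position to the light source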
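The finite sum above can be checked on the CPU. The following is a minimal Python sketch, not the shader from the talk: the function names are hypothetical, the samples are placed at segment midpoints (an assumption, the slide only states equidistant steps), and the in-scattering term is held constant so that the result can be compared against the closed form of the integral.

```python
import math

def march_in_scattering(in_scatter, s, tau, num_samples):
    """Riemann-sum approximation of the in-scattering integral along a ray.

    in_scatter(l) returns L_i at the travelled distance l, s is the total
    ray marching distance and tau the collision probability (extinction).
    """
    step = s / num_samples
    total = 0.0
    for n in range(num_samples):
        l = (n + 0.5) * step  # midpoint of the n-th segment (assumption)
        total += in_scatter(l) * math.exp(-tau * (s - l)) * step
    return total

def analytic_constant(li, s, tau):
    """Closed form of the same integral when L_i is constant along the ray."""
    return li * (1.0 - math.exp(-tau * s)) / tau

if __name__ == "__main__":
    exact = analytic_constant(2.0, 100.0, 0.05)
    for count in (16, 128):
        approx = march_in_scattering(lambda l: 2.0, 100.0, 0.05, count)
        print(count, abs(approx - exact))
```

The error shrinks as the sample count grows, which is the motivation for the 128 samples used in the straightforward version of the algorithm.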

Basic Algorithm

• Let’s start with a simple fullscreen pass for a directional light

• Start the ray marching on the position of the current fragment in light space

• Evaluate and accumulate the in-scattering term for each of the n samples and march in equidistant steps towards the position of the viewer

#define NUM_SAMPLES 128
#define NUM_SAMPLES_RCP 0.0078125

FRAGMENT_OUT ps_main(VERTEX_OUTPUT f_in)
{
    // Fallback if we can't find a tighter limit
    float raymarchDistanceLimit = 999999.0 ;

    [...]

    // Reduce noisiness by truncating the starting position
    float raymarchDistance = trunc ( clamp ( length ( cameraPositionLightVS . xyz - positionLightVS . xyz ) ,
        0.0, raymarchDistanceLimit ) ) ;

    // Calculate the size of each step
    float stepSize = raymarchDistance * NUM_SAMPLES_RCP ;
    float3 rayPositionLightVS = positionLightVS . xyz ;

    // The total light contribution accumulated along the ray
    float3 VLI = 0.0 ;

    // ... start the actual ray marching
    [loop] for ( float l = raymarchDistance; l > stepSize ; l -= stepSize )
    {
        executeRaymarching(...) ;
    }

    f_out . color . rgb = light_color_diffuse . rgb * VLI ;
    return f_out ;
}

#define TAU 0.0001
#define PHI 10000000.0

#define PI_RCP 0.31830988618379067153776752674503

void executeRaymarching(...)
{
    rayPositionLightVS . xyz += stepSize * invViewDirLightVS . xyz ;

    [...]

    // Fetch whether the current position on the ray is visible from the light's perspective - or not
    float3 shadowTerm = getShadowTerm ( shadowMapSampler, shadowMapSamplerState, rayPositionLightSS . xyz ) . xxx ;

    // Distance to the current position on the ray in light view-space
    float d = length ( rayPositionLightVS . xyz ) ;
    float dRcp = rcp ( d ) ;

    // Calculate the final light contribution for the sample on the ray...
    // (PHI is the source power, PHI * 0.25 * PI_RCP * dRcp * dRcp = PHI / (4 * pi * d^2))
    float3 intens = TAU * ( shadowTerm * ( PHI * 0.25 * PI_RCP ) * dRcp * dRcp ) * exp ( -d * TAU ) * exp ( -l * TAU ) * stepSize ;

    // ... and add it to the total contribution of the ray
    VLI += intens ;
}

From One to Many

• Render the back faces of the light volume for each volumetric light (depth test/write disabled)

• Start the ray marching on the fragment of the light geometry instead of the scene geometry

• If the light volume intersects the scene geometry, the starting position gets clamped to the closest fragment position relative to the viewer

From One to Many

• Calculate the in-scattering term as depicted before

• In addition to that evaluate the attenuation function for each given light type and “modulate” it with the in-scattering term

• March the ray in light view and in world space in parallel - less costly than transforming between spaces for each step

• Accumulate the volumetric lighting contribution for each visible light to an accumulation buffer using additive blending

From One to Many

• Constrain the taken samples to the area inside the light volume to increase the precision

• For box and point lights we simply clamp the total ray marching distance to the attenuation ranges of the lights

• In the case of spotlights we actually calculate the intersection points between the current ray and the light volume and calculate the range in-between
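Constraining the samples to the light volume boils down to finding the interval of the ray that lies inside it. The sketch below illustrates the idea in Python with a sphere, which is an assumption made for brevity: the talk only states that intersection points are calculated for spotlights and that point and box lights are clamped to their attenuation range. All function names are hypothetical.

```python
import math

def ray_sphere_interval(origin, direction, center, radius):
    """Return (t_near, t_far) of the ray segment inside the sphere, or None.

    direction is expected to be normalised; t is measured along the ray.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * e for d, e in zip(direction, oc))
    c = sum(e * e for e in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None  # the ray misses the volume
    root = math.sqrt(disc)
    t_near, t_far = -b - root, -b + root
    if t_far < 0.0:
        return None  # the volume lies behind the ray origin
    return max(t_near, 0.0), t_far

def clamp_march_range(interval, scene_depth):
    """Clip the interval against the closest scene fragment along the ray."""
    if interval is None:
        return None
    t_near, t_far = interval
    t_far = min(t_far, scene_depth)
    if t_far <= t_near:
        return None  # the volume is fully occluded by scene geometry
    return t_near, t_far

if __name__ == "__main__":
    hit = ray_sphere_interval((0, 0, 0), (0, 0, 1), (0, 0, 10), 2.0)
    print(hit, clamp_march_range(hit, 9.0))
```

Marching only between the two returned distances spends every sample inside the light, so the same sample count yields a finer step size.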

Much slow

Wow

So sample

How to Make it Fast

• Everything I told you so far needs far too many samples to achieve visually pleasing results

• 128+ samples per fragment for each light rendered to a full resolution target does not sound like the ideal solution

How to Make it Fast

• We ended up rendering all volumetrics to a half or quarter resolution target

• We use an additional depth-aware up-sampling pass to hide this fact - often referred to as “Nearest Depth Up-Sampling” [5]
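The selection logic of nearest depth up-sampling can be sketched on the CPU for a single full-resolution pixel. This Python sketch uses hypothetical names and scalar colours; averaging the four low-resolution samples stands in for the hardware bilinear fetch, which is an assumption made to keep the example self-contained.

```python
def nearest_depth_upsample(low_res_colors, low_res_depths, full_res_depth,
                           depth_threshold=1e-4):
    """Pick the up-sampled value for one full-resolution pixel.

    low_res_colors / low_res_depths hold the four surrounding
    low-resolution samples, full_res_depth is the depth of the target pixel.
    """
    diffs = [abs(d - full_res_depth) for d in low_res_depths]
    if all(diff < depth_threshold for diff in diffs):
        # No depth discontinuity nearby: a smooth filtered value is safe.
        return sum(low_res_colors) / len(low_res_colors)
    # Otherwise take the sample whose depth matches the pixel best.
    nearest = min(range(len(diffs)), key=diffs.__getitem__)
    return low_res_colors[nearest]

if __name__ == "__main__":
    colors = [1.0, 2.0, 3.0, 4.0]
    print(nearest_depth_upsample(colors, [0.5, 0.5, 0.5, 0.5], 0.5))
    print(nearest_depth_upsample(colors, [0.5, 0.5, 0.9, 0.9], 0.9))
```

On flat regions the smooth value hides the low resolution, while at edges the depth match keeps the volumetrics from bleeding onto foreground geometry.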

Without depth-aware up-sampling

With depth-aware up-sampling

How to Make it Fast

• Only using half-resolution rendering will not suffice to make it fast enough for multiple light sources on the screen

• We can “abuse” the fact that the in-scattered light value at a given fragment position is either equal or at least close to one or more of the surrounding values

How to Make it Fast

• We spread the evaluation of the in-scattering term from a single pixel to multiple pixels

• We ended up using 8x8 pixel tiles, where each pixel of a tile evaluates 16 samples

• This makes a total of 8x8x16 = 1024 potential samples

• Each pixel of one tile evaluates a different region of the ray

How to Make it Fast

• Assign a unique index i ∊ [0..64) to each pixel of the tile - the indices repeat for each tile

• Reduce the total ray marching distance by one step

• Offset the ray marching starting position for each pixel of the tile according to i:

$$\mathrm{ray}_\Delta = i \cdot \frac{\mathrm{stepSize}}{64}$$

• Randomising the indices trades the obvious repetitive sampling pattern for some less noticeable noise

#define INTERLEAVED_GRID_SIZE 8
#define INTERLEAVED_GRID_SIZE_SQR 64
#define INTERLEAVED_GRID_SIZE_SQR_RCP 0.015625

[...]

    // Calculate the offsets on the ray according to the interleaved sampling pattern
    float2 interleavedPos = fmod ( f_in . position . xy, INTERLEAVED_GRID_SIZE ) ;

#if defined (USE_RANDOM_RAY_SAMPLES)
    float index = ( interleavedPos . y * INTERLEAVED_GRID_SIZE + interleavedPos . x ) ;
    // light_volumetric_random_ray_samples contains the values 0..63 in a randomized order
    // The indices are packed to float4s => { (0,1,2,3), (4,5,6,7), ... }
    float rayStartOffset = light_volumetric_random_ray_samples [ index * 0.25 ] [ fmod ( index, 4.0 ) ] * ( stepSize * INTERLEAVED_GRID_SIZE_SQR_RCP ) ;
#else
    float rayStartOffset = ( interleavedPos . y * INTERLEAVED_GRID_SIZE + interleavedPos . x ) * ( stepSize * INTERLEAVED_GRID_SIZE_SQR_RCP ) ;
#endif // USE_RANDOM_RAY_SAMPLES

    float3 rayPositionLightVS = rayStartOffset * invViewDirLightVS . xyz + positionLightVS . xyz ;

[...]
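To see why 8x8 pixels with 16 samples each behave like 1024 samples, the per-pixel offsets can be enumerated on the CPU. The following Python sketch uses hypothetical names and the non-randomised index order; it places the samples in the half-open range [0, marchDistance), so it does not need the "reduce by one step" adjustment mentioned on the slide.

```python
GRID = 8                 # tile edge length in pixels
SAMPLES_PER_PIXEL = 16   # samples evaluated by every pixel of a tile

def ray_start_offset(px, py, step_size):
    """Offset of the first sample for the pixel at (px, py)."""
    index = (py % GRID) * GRID + (px % GRID)  # i in [0..64), repeats per tile
    return index * step_size / (GRID * GRID)

def sample_distances(px, py, march_distance):
    """Distances along the ray at which one pixel takes its samples."""
    step_size = march_distance / SAMPLES_PER_PIXEL
    start = ray_start_offset(px, py, step_size)
    return [start + n * step_size for n in range(SAMPLES_PER_PIXEL)]

if __name__ == "__main__":
    tile = [d for y in range(GRID) for x in range(GRID)
            for d in sample_distances(x, y, 1024.0)]
    print(len(set(tile)), min(tile), max(tile))
```

Taken together, the pixels of one tile sample the ray at 1024 distinct, evenly spaced distances; the later blur pass is what shares those results between neighbouring pixels.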

Accumulation buffer before the gather pass

How to Make it Fast

• To achieve the final results we use an additional blur pass before the up-sampling pass

• We use a simple bilateral blur filter to avoid bleeding over the edges of any geometry inside or behind the volumetrics
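The effect of the depth-aware weighting can be illustrated with a one-dimensional Python sketch. The names, the short kernel and the clamp-to-edge addressing are assumptions made for this example; the weight itself follows the same Gaussian-times-depth-falloff form as the gather shader in the bonus slides.

```python
import math

def bilateral_blur_1d(values, depths, spatial_weights, depth_falloff):
    """Blur values along one axis while respecting depth discontinuities.

    spatial_weights[k] is the Gaussian weight at distance k from the centre.
    """
    half = len(spatial_weights) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        weight_sum = 0.0
        for r in range(-half, half + 1):
            j = min(max(i + r, 0), len(values) - 1)  # clamp to the border
            diff = depth_falloff * abs(depths[j] - depths[i])
            w = math.exp(-diff * diff) * spatial_weights[abs(r)]
            acc += w * values[j]
            weight_sum += w
        out.append(acc / weight_sum)
    return out

if __name__ == "__main__":
    values = [1.0] * 4 + [0.0] * 4
    depths = [0.1] * 4 + [0.9] * 4
    kernel = [0.4, 0.2, 0.1]
    print(bilateral_blur_1d(values, depths, kernel, 1000.0))  # edge preserved
    print(bilateral_blur_1d(values, depths, kernel, 0.0))     # plain blur
```

With a falloff of zero the filter degenerates into an ordinary Gaussian blur and the values bleed across the depth edge.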

Accumulation buffer after the gather pass

Non-bilateral blur

Bilateral blur

Accumulation Pass: Render light geometry for each volumetric and execute ray marching

Gather Pass: Apply horizontal and vertical bilateral Gaussian blur

Upscale Pass: Apply depth-aware up-sampling

Composite Pass: Add final up-scaled buffer to the scene

Render targets: R11G11B10 at 1/2 resolution, R11G11B10 at native resolution

Output: Final Scene

Extending the System

2D projector texture (gobo/cookie)

3D noise texture

IES profiles

Top down perspective

Isotropic scattering

Anisotropic scattering (Henyey-Greenstein phase function)

$$p(\theta) = \frac{1 - g^2}{(1 + g^2 + 2 g \cos\theta)^{1.5}}$$

Anisotropic scattering (Schlick phase function)

$$p(\theta) = \frac{1 - k^2}{(1 + k \cos\theta)^2}, \quad k \approx 1.55 g - 0.55 g^3$$
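Both phase functions are cheap to compare numerically. This Python sketch evaluates them exactly as written on the slides, without a normalisation factor; the function names are hypothetical.

```python
import math

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function in the form shown on the slide."""
    return (1.0 - g * g) / (1.0 + g * g + 2.0 * g * cos_theta) ** 1.5

def schlick(cos_theta, g):
    """Schlick's approximation, with k derived from the anisotropy g."""
    k = 1.55 * g - 0.55 * g ** 3
    return (1.0 - k * k) / (1.0 + k * cos_theta) ** 2

if __name__ == "__main__":
    for degrees in (0, 90, 180):
        c = math.cos(math.radians(degrees))
        print(degrees, henyey_greenstein(c, 0.5), schlick(c, 0.5))
```

For g = 0 both functions return 1 for every angle, which is the isotropic case. Schlick's form avoids the fractional power, which is why it is a common choice in shaders.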

Without temporal re-projection

With temporal re-projection

Performance

Pass            PC (GTX 700 Series GPU)   PS4/GNM
Accumulation*   0.362 ms                  0.161 ms
Gather          0.223 ms                  0.375 ms
Upscale         0.127 ms                  0.321 ms
Total           0.712 ms                  0.857 ms

*measured using a half resolution render target

Results

No volumetrics

Volumetrics active

“Faked” multiple scattering

Thanks for listening! :) Questions?

Contact

• Benjamin Glatzel

• @begla

• http://www.deck13.com

References

• [1] Volumetric Light Scattering as a Post-Process - http://http.developer.nvidia.com/GPUGems3/gpugems3_ch13.html

• [2] Real-time Volumetric Lighting in Participating Media - http://sirkan.iit.bme.hu/~szirmay/lightshaft.pdf

• [3] Epipolar Sampling for Shadows and Crepuscular Rays in Participating Media with Single Scattering - http://www.sfb716.uni-stuttgart.de/uploads/tx_vispublications/espmss10.pdf

• [4] Light Shafts - Rendering Shadows in Participating Media - http://developer.amd.com/wordpress/media/2012/10/Mitchell_LightShafts.pdf

• [5] Fast Rendering of Opacity Mapped Particles using DirectX 11 Tessellation and Mixed Resolutions - https://developer.nvidia.com/sites/default/files/akamai/gamedev/files/sdk/11/OpacityMappingSDKWhitePaper.pdf

Bonus Slides

½-Resolution accumulation buffer

¼-Resolution accumulation buffer

static const float gauss_filter_weights[] = {
    0.14446445, 0.13543542, 0.11153505, 0.08055309,
    0.05087564, 0.02798160, 0.01332457, 0.00545096
} ;

#define NUM_SAMPLES_HALF 7
#define BLUR_DEPTH_FALLOFF 1000.0

float4 gatherGauss ( in float2 blurDirection , in float2 uv )
{
    [...]

    [unroll]
    for ( REAL r = -NUM_SAMPLES_HALF; r <= NUM_SAMPLES_HALF; ++r )
    {
        uvOffset = r * blurDirection * rendertarget_size . zw ;
        kernelSample = SAMPLE ( inputSampler, uv + uvOffset ) . rgba ;
        kernelDepth = getLinearDepth ( depthSampler, depthSamplerState, uv + uvOffset ) ;

        // Simple depth-aware filtering
        depthDiff = abs ( kernelDepth - centerDepth ) ;
        r2 = BLUR_DEPTH_FALLOFF * depthDiff ;
        g = exp ( -r2*r2 ) ;
        weight = g * gauss_filter_weights [ abs ( r ) ] ;

        accumResult += weight * kernelSample . rgb ;

        accumWeights += weight ;
    }

    return float4 ( accumResult . rgb / accumWeights , 1.0 ) ;
}

float4 ps_gather_horz ( VERTEX_OUTPUT f_in ) : SV_Target
{
    return gatherGauss ( float2 ( 1.0, 0.0 ), f_in . uv0 ) ;
}

[...]

float4 ps_upsample ( VERTEX_OUTPUT f_in ) : SV_Target
{
    [...]

    // Better choose something relative to the far clip distance here
    const float upsampleDepthThreshold = 0.0001 ;

    float minDepthDiff = 1.0 ;
    uint nearestDepthIndex = 0 ;

    float currentDepthDiff = abs ( sampleDownsampledDepth[0] - fullResDepth ) ;
    bool rejectSample = currentDepthDiff < upsampleDepthThreshold ;

    [branch]
    if ( currentDepthDiff < minDepthDiff )
    {
        minDepthDiff = currentDepthDiff ;
        nearestDepthIndex = 0 ;
    }

    currentDepthDiff = abs ( sampleDownsampledDepth[1] - fullResDepth ) ;
    rejectSample = rejectSample && currentDepthDiff < upsampleDepthThreshold ;

    [branch]
    if ( currentDepthDiff < minDepthDiff )
    {
        minDepthDiff = currentDepthDiff ;
        nearestDepthIndex = 1 ;
    }

    // Repeat this for the remaining 2 samples
    [...]

    // Avoid blocky artefacts using edge detection
    if (rejectSample)
        return float4 ( SAMPLE ( inputSampler, f_in . uv0 ) . rgb, 1.0 ) ;

    return float4 ( sampleR[nearestDepthIndex], sampleG[nearestDepthIndex], sampleB[nearestDepthIndex], 1.0 ) ;
}
