Battle-Tested Deferred Rendering on PS3, Xbox 360 and PC


Tibor Klajnscek, Technical Director, ZootFly

Overview
- The G-Buffer
- Rendering pipeline
- Lighting details
- Anti-aliasing
- HDR
- Platform-specific issues

G-Buffer
- We use a full deferred shading approach
- A single, heavily #ifdef-ed material shader writes the G-Buffer
- 3 RTs on consoles + native depth
- 32-bit (8888) RTs, 16 bytes/pixel
- Using DX9 on PC, so 4 RTs, since we have to write depth as well

G-Buffer shader
- Supports all standard stuff (skinning, parallax, reflection...)
- Detail texture (UV offset and normal bend)
- Overlay texture (own UV set)
- Rim light
- Self-illumination from texture
- Vertex shader wind
- Per-polygon billboarding

G-Buffer layout

Accumulation buffer needed for forward lighting (e.g. lightmaps, self-illumination, rim light, fog)

DOF amount calculated here to avoid extra depth reads in post process stage

RT 0: RGB = accumulation, A = DOF amount
RT 1: RGB = color, A = specular exponent
RT 2: R = normal phi, G = normal theta, B = translucency, A = specular amount
RT 3: linear depth, encoded into 8-bit channels (PC only)

(Figures: G-Buffer visualizations of color, normal, depth (exaggerated), specular amount and specular exponent)

G-Buffer normals
- Hemispheric normals looked bad
  - Projection lets you see negative normals
  - Shading can swim because of this
- Straight RGB 888 world-space was good, but needed an extra channel
- Stored in spherical coordinates
  - Two 8-bit channels, just 16 bits
  - Looks better than the other two
  - Conversion cost can be quite high
  - A lookup texture can be a win here
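A minimal sketch of the spherical encoding, written in Python so it can be checked on the CPU. The talk does not give its exact mapping, so this assumes z is the polar axis and maps phi and theta linearly onto the two 8-bit channels (R and G of RT 2); `encode_normal_spherical` and `decode_normal_spherical` are hypothetical names. In a shader, the trig in the decode is the part a lookup texture could replace.

```python
import math

def encode_normal_spherical(n):
    """Pack a unit normal (x, y, z) into two 8-bit values (phi, theta).
    Assumption: z is the polar axis and both angles are mapped linearly."""
    x, y, z = n
    phi = math.atan2(y, x)                        # azimuth in [-pi, pi]
    theta = math.acos(max(-1.0, min(1.0, z)))     # polar angle in [0, pi]
    r = round((phi / math.pi * 0.5 + 0.5) * 255)  # remap phi to [0, 255]
    g = round(theta / math.pi * 255)              # remap theta to [0, 255]
    return r, g

def decode_normal_spherical(r, g):
    """Inverse of encode_normal_spherical (up to 8-bit quantization)."""
    phi = (r / 255.0 * 2.0 - 1.0) * math.pi
    theta = g / 255.0 * math.pi
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))
```

Quantization limits the angular error to roughly half a step in each angle (about 0.7 degrees for phi at the equator), which is well below what shading usually reveals.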

G-Buffer position
- On PC, store linear Z as an RGB-encoded float
- On consoles, use the main Z-buffer
  - Undo the projection in the light shader
- World-space position
  - Interpolated camera-to-far-plane vector * linear Z + camera position
- Google “reconstruct position from depth”
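A sketch of both halves, assuming linear Z is view depth divided by the far-plane distance and is packed as 24-bit fixed point across three 8-bit channels. That packing is one common choice, not necessarily the one used here (the slides only say "RGB encoded float"), and the function names are hypothetical.

```python
def encode_depth_rgb(linear_z):
    """Split linear depth in [0, 1] into three 8-bit channels (24-bit fixed point)."""
    v = int(max(0.0, min(1.0, linear_z)) * 16777215.0 + 0.5)  # 2^24 - 1 levels
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF

def decode_depth_rgb(r, g, b):
    """Reassemble the 24-bit value and map it back to [0, 1]."""
    return (r * 65536 + g * 256 + b) / 16777215.0

def reconstruct_position(camera_pos, far_vec, linear_z):
    """World position = camera position + camera-to-far-plane vector * linear Z.
    far_vec is the per-pixel vector interpolated from the far-plane corners;
    linear_z = 1.0 lands exactly on the far plane."""
    return tuple(c + f * linear_z for c, f in zip(camera_pos, far_vec))
```

Because the vector already points at the far plane through this pixel, reconstruction is a single multiply-add per component, with no matrix inverse in the light shader.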

The Pipeline

1. Opaque & Alpha Test

- Lays down the initial G-Buffer and Z
- Fills the accumulation buffer with ambient, IBL and self-illumination
- Z-prepass was not a win for us
  - We render sorted by material first and still get good early Z
- Just in case you forgot: make sure to render alpha test last
- OUT: Z, accum, color, normal

2. Decals

- Alpha test, alpha blend, multiply, additive
- Can write all RTs except depth
- Change normals & color before lighting
- Can't change specular
  - Output alpha is used for blending
  - But specular is in the alpha channel
- OUT: accum, color, normal

(Figures: color buffer without and with decals)

3. Background

- Vanilla sky box (optional)
- Any geometry labeled as background by the artists
- Simple shader, no lighting; up to the artists to make it look good
- Far 10% of the Z range reserved for this pass
- OUT: Z, accum

4. Lighting

- Explained in detail in a moment
- Most of the work happens here
- We support all standard light types plus a few custom additions: Ambient, Point, Spot, Volume, Directional, Ortho
- OUT: accum

5. Transparencies

- Alpha geometry & particles, sorted
- Forward shader
- Lighting only via 3rd-order SH
  - Lighting computed for the center of the object
  - SH coefficients are efficient to calculate in jobs
- Artists hand-tweak cases where it doesn't look right
  - Split the mesh into more chunks
  - Tweak mesh/vertex colors

Lighting details

(Figures: lighting before and after, and lighting overdraw with 33 lights in view, all pretty large)

Three color light

- Artists specify three diffuse colors
  - Front color (N•L)
  - Mid color: 1-abs(N•L)
  - Back color (-N•L)
- Wrap-around(-ish)
  - Back = black
  - Mid = 0.5 * front
  - Almost correct
- Fewer lights needed: FASTER!
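A sketch of the diffuse weights, using the same terms as the lighting shader excerpt but without shadow, SSS or specular; `three_color_diffuse` is a hypothetical name. With back = black and mid = 0.5 * front, the three terms collapse to front * (0.5 + 0.5 * N•L) on both sides of the terminator, which is a linear wrap-around term. The check below confirms that numerically.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def three_color_diffuse(n_dot_l, front, mid, back):
    """Diffuse part of the three-color light for one pixel.
    Colors are (r, g, b) tuples; n_dot_l is dot(Normal, ToLight)."""
    w_front = saturate(n_dot_l)            # lit side
    w_mid = saturate(1.0 - abs(n_dot_l))   # peaks at the terminator
    w_back = saturate(-n_dot_l)            # side facing away from the light
    return tuple(w_front * f + w_mid * m + w_back * b
                 for f, m, b in zip(front, mid, back))
```

Giving artists separate mid and back colors lets one light also stand in for bounce or fill light, which is where the "fewer lights" saving comes from.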

(Figure: three color light example)

Sub-surface scattering / Translucency
- Just the front color bleeding through to the back
- We're not actually doing proper scattering...
- But it looks really cool on leaves and other thin surfaces
- Also helps noses, earlobes, etc.
- You also get shadows from behind!

(Figures: without SSS, with SSS (note the shadows), and the SSS mask in the G-Buffer)

Projected texture

- Every light can project a texture
  - It's just multiplied in at the end
  - Cube texture for point lights
- Had issues with MIP LOD calculation on Z discontinuities
  - Only solution was to manually override the LOD (tex2Dlod)
  - Select the LOD based on screen-space size, but be aggressive
  - Tweak the selection until it looks OK

Lighting shader code

float lightdot = dot( Normal , ToLight );
// Fake sub-surface scattering: front color bleeding through to the back
float3 SSSColor = FrontColor * SSSAmount;
SSSColor *= 0.3 + shadow * 0.7;
float3 Result;
// Three-color diffuse; in shadow the lit side falls back to the back color
Result = saturate( lightdot ) * lerp( BackColor , FrontColor , shadow );
Result += saturate( 1 - abs( lightdot ) ) * MidColor;
Result += saturate( -lightdot ) * ( BackColor + SSSColor );
Result *= PixelMaterialColor;
Result += SpecularBlinn( Normal , HalfVec ) * SpecColor * shadow;
// Projected texture is just multiplied in at the end
Result *= ProjectedMaskTexture;

Excerpted just the relevant bits...

Light filters/groups

- We have no filtering
- Could use IDs, but didn't
  - The shader would run on tons of pixels that would get rejected in the end
  - Needs an extra channel in the G-Buffer
- Artists use custom watertight meshes, grouped under the light in Maya, to contain lights

Multiplicative lights

All our lights can be set to use multiply as blend mode

Useful for adding in dark spots without many lights

Also helps if you need to add a dark spot in a hurry before shipping

(Figures: multiplicative lights, before and after)

Ambient light

- Box shape with a nice fade
- It's basically an SH light probe
  - Group a bunch of point, spot and directional lights under it in Maya
  - Plus a standard ambient term
  - They all get baked into 3rd-order SH
  - Just look up with the pixel normal
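A sketch of such a probe for one color channel. It assumes 3rd-order SH means the usual 9 real coefficients (bands 0 to 2) and uses the clamped-cosine band weights from Ramamoorthi and Hanrahan's irradiance environment map work. Directional lights are pre-convolved so that evaluation approximates `intensity * max(0, N•L)`. Adding point and spot lights as directional lights from the box center after attenuation is an assumption, not something the talk specifies, and `SHProbe` is a hypothetical class.

```python
import math

# Real SH basis constants for bands 0..2 (9 coefficients, "3rd order").
C0, C1, C2, C3, C4 = 0.282095, 0.488603, 1.092548, 0.315392, 0.546274

def sh_basis(d):
    """Evaluate the 9 real SH basis functions for a unit direction d = (x, y, z)."""
    x, y, z = d
    return [C0,
            C1 * y, C1 * z, C1 * x,
            C2 * x * y, C2 * y * z, C3 * (3.0 * z * z - 1.0), C2 * x * z,
            C4 * (x * x - y * y)]

# Clamped-cosine convolution weight per coefficient: pi, 2pi/3 (x3), pi/4 (x5).
BAND_WEIGHT = [math.pi] + [2.0 * math.pi / 3.0] * 3 + [math.pi / 4.0] * 5

class SHProbe:
    """One color channel of an ambient box; use three for RGB."""

    def __init__(self):
        self.coeffs = [0.0] * 9

    def add_ambient(self, intensity):
        # Constant term: evaluates to `intensity` for every normal.
        self.coeffs[0] += intensity / C0

    def add_directional(self, to_light, intensity):
        # Project the light direction and pre-apply the cosine lobe.
        for i, y in enumerate(sh_basis(to_light)):
            self.coeffs[i] += intensity * y * BAND_WEIGHT[i]

    def evaluate(self, normal):
        # The per-pixel "lookup with the pixel normal": a 9-term dot product.
        return sum(c * y for c, y in zip(self.coeffs, sh_basis(normal)))
```

A single directional light evaluates to about 1.06 when facing it and about 0.06 when facing away, instead of 1 and 0. That softness is the usual behavior of a 9-coefficient approximation and is generally acceptable for ambient lighting.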

Directional light

- Cascaded shadow map
- Cascades rendered as boxes
- Final non-shadow pass is a fullscreen quad
  - Quad at the far plane to stencil-mask out sky/background
- Projector texture is tiled and animated: cheap, fake cloud shadows!

Early stencil rejection

- Without it we’d run at about 4 fps, so I can’t stress its importance enough!
- Very simple to set up, but easy to break too
- Very fast rendering
- Cuts down light rendering time tremendously

Early stencil rejection

- All lights are rendered as geometry
  - Sphere for point, cone for spot, etc.
  - 50-100 polys
- Use the same geometry for the stencil mask unless the artist supplies a mesh
- We use a standard Z-fail approach
  - Yes, we should be using Z-pass to get early Z in the masking pass
  - But this pass was always fast, so we chose to fix other stuff first

Early stencil rejection

Mask pass (no pixel shader):
- TwoSidedStencilMode = true
- StencilFunc = Always
- StencilZFail = Invert
- CCW_StencilFunc = Always
- CCW_StencilZFail = Invert
- StencilWriteMask = 1
- SCull/HiZ = Equal to 1

Light shader pass:
- StencilFunc = Equal
- StencilRef = 1

This works well with SCull (PS3)
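The mask pass can be modelled per pixel. Every light-volume face covering the pixel inverts stencil bit 0 when it fails the depth test, so the bit ends up set exactly when the visible surface lies inside the volume. This sketch assumes a LESS depth test and depths that grow away from the camera; the function names are hypothetical.

```python
def zfail_invert_stencil(scene_depth, face_depths):
    """Stencil bit 0 after the two-sided Z-fail/Invert mask pass for one pixel.
    face_depths: depths of the light-volume faces covering the pixel (a front face
    clipped away by the near plane is simply absent from the list)."""
    stencil = 0
    for face_depth in face_depths:
        if face_depth >= scene_depth:   # face is behind the scene: depth test fails
            stencil ^= 1                # StencilZFail = Invert, StencilWriteMask = 1
    return stencil

def light_pass_runs(stencil):
    """Light shader pass: StencilFunc = Equal, StencilRef = 1."""
    return stencil == 1
```

The camera-inside case is why Z-fail is a popular default: it still works when the front faces are clipped away. It does rely on the back faces not being clipped by the far plane.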

(Figures: early stencil example #1, simple case with light geometry; example #2, custom geometry)

Directional light stencil

- Every cascade must only light pixels untouched by previous cascades
- Cascade overlap is unpredictable when FOV & settings change
- Came up with a way to always keep the stencil test at EQUAL to 1
  - Plays nice with SCull
  - Every cascade is rendered into stencil twice, but it's still plenty fast

Directional light stencil
Slide sequence (legend: green = stencil == 1, red = stencil > 1; every pass uses Z fail: Invert):
1. Mask cascade #1 and do lighting (write mask 00000001)
2. Clear cascade #1 (write mask 00000010)
3. Mask cascade #2 and do lighting (write mask 00000001)
4. Clear cascade #2 (write mask 00000100)
5. Mask cascade #3 and do lighting (write mask 00000001)
6. Clear cascade #3 (write mask 00001000)
7. Mask far plane and do final pass (write mask 00000001)
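A CPU simulation of this sequence, useful for convincing yourself that no pixel is lit twice and that every light pass can keep the same EQUAL 1 test. Each volume pass is modelled as flipping the write-masked bits of the pixels inside that volume, which is what two-sided Z-fail/Invert does for a closed volume. The pixel names and volume sets are made up for the example.

```python
def simulate_directional_stencil(pixels, num_cascades):
    """pixels maps a pixel name to the set of volumes it lies inside: cascade indices
    0..num_cascades-1, plus "scene" if it is in front of the far-plane quad (not sky).
    Returns a dict mapping each pixel to the pass that lit it (or None)."""
    assert num_cascades <= 7  # bit 0 plus one "clear" bit per cascade must fit in 8 bits
    stencil = {p: 0 for p in pixels}
    lit_by = {p: None for p in pixels}

    def volume_pass(volume, write_mask):
        # Z-fail/Invert over a closed volume flips the masked bits of pixels inside it.
        for p, inside in pixels.items():
            if volume in inside:
                stencil[p] ^= write_mask

    def light_pass(label):
        # StencilFunc = Equal, StencilRef = 1: the only compare SCull ever sees.
        for p in pixels:
            if stencil[p] == 1:
                assert lit_by[p] is None, "a pixel was lit twice"
                lit_by[p] = label

    for c in range(num_cascades):
        volume_pass(c, 0b1)           # mask cascade c
        light_pass(c)                 # shadowed lighting for cascade c
        volume_pass(c, 1 << (c + 1))  # "clear": tag these pixels as done
    volume_pass("scene", 0b1)         # far-plane quad masks out sky/background
    light_pass("final")               # final non-shadow pass
    return lit_by
```

A pixel lit by a cascade picks up that cascade's clear bit, so its stencil value can never return to exactly 1, no matter how later cascades toggle bit 0. With three cascades the clear bits are the 00000010, 00000100 and 00001000 masks listed above.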

Antialiasing overview

- Render the G-buffer into a 2xMSAA RT
- Perform lighting for each sample
  - Needs render target access at sample, not pixel, level
  - Effectively supersamples lighting
  - Can be expensive
- There's a ton of hacky methods that might work for you

Antialiasing hack #1

- Distribute shadow sampling between MSAA samples
- Suggested by the Guerrilla guys in their excellent presentation
- We use it and it works great. Just do it.

Antialiasing hack #2

- Render lighting at pixel resolution, but with 2xMSAA (per-sample stencil tests)
- Light both samples in the shader and output the averaged result
- Saves output bandwidth compared with supersampled rendering
- Stencil testing causes artifacts

Antialiasing hack #2

- Edges on stencil discontinuities still darken on resolve
  - Averaged in the shader already
  - But then one sample is rejected on stencil fail
  - Can be fixed by sampling stencil in the shader as well, but it may not be trivial (e.g. Xbox 360 with float depth)
- Not a whole lot of benefit from this alone, but it allows for more optimizations in the shader

Antialiasing hack #3

- Use one position for both samples
- Manually loop just part of the shader
- Shadows, light falloff and projected textures all break on edges
- Was too visible for us (lots of lights, shadows and projected textures everywhere)
  - Caused borders around characters
- Might work for you

Antialiasing hack #4

- Pre-resolve the color buffer to avoid two lookups
- Wrong lighting on edges, since the color bleeds between background and foreground
- Can work if your scenes are uniform enough
  - Gray & brown are popular lately
  - Can work for outdoors

Antialiasing on PC

- Works properly with DX10.1
- We only support DX9, so tough luck
- Super-sampling was too slow
  - PC resolution unpredictable
  - But HW is becoming really fast
- Using centroid sampling to do hacky approximate AA (google it)
  - Couldn't afford the extra geometry pass
- Edge-detection AA + filtering
  - Slow and looks crappy

Antialiasing on PC

- Some success with jittered 2X AA
  - Apply a sub-pixel offset to the projection
  - Alternate between frames
  - Always show a 50-50 blend
  - Visible feedback if the framerate is low
  - Can use temporal re-projection to fix it somewhat, or enable it only if the framerate is high enough (60+)
- Left it out in the end; it was untested, so we played it safe
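A sketch of the projection jitter and the 50-50 blend. It assumes a 4x4 matrix stored as a list of rows and applied as clip = proj * v, plus a hypothetical two-frame pattern of quarter-pixel diagonal offsets. Adding a multiple of the w row to the x and y rows shifts NDC by exactly that amount, for perspective and orthographic projections alike.

```python
# Hypothetical two-frame jitter pattern, in pixels.
JITTER_PIXELS = [(-0.25, -0.25), (0.25, 0.25)]

def jitter_projection(proj, frame_index, width, height):
    """Copy of a 4x4 projection (list of rows, clip = proj * v) whose NDC output is
    shifted by this frame's sub-pixel offset."""
    jx, jy = JITTER_PIXELS[frame_index % 2]
    dx, dy = 2.0 * jx / width, 2.0 * jy / height   # one pixel spans 2/size in NDC
    m = [list(row) for row in proj]
    for j in range(4):
        # clip.x += dx * clip.w (likewise y), so after the divide NDC moves by (dx, dy)
        m[0][j] += dx * proj[3][j]
        m[1][j] += dy * proj[3][j]
    return m

def project_to_ndc(m, v):
    """Transform a homogeneous point and return its NDC x and y."""
    clip = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
    return clip[0] / clip[3], clip[1] / clip[3]

def blend_frames(current, previous):
    """The 50-50 blend of the last two jittered frames that is always shown."""
    return [0.5 * (c + p) for c, p in zip(current, previous)]
```

Since the blend always mixes the two most recent frames, any motion between them shows up directly in the output, which is why the approach only looks good at high framerates or with re-projection.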

Antialiasing on PS3

- Confessions first... We render at 1120x576 2xMSAA
  - AA was added late in the project
  - Lights & textures were already in
  - Couldn't afford 35+ MB buffers
  - Fillrate was also an issue in certain cases
- Preferred the image quality over 1280x720 and no MSAA
  - We have lots of thin steel bars

Antialiasing on PS3

- Alias the same memory as:
  - 1120x576 2xMSAA
  - 2240x576 non-AA
- Do ping-pong post between the left and right half of the 2240x576 RT
- Our render targets:
  - 2x 1280x720 front/back buffer
  - 3x 1120x576x2 RTs
  - 1x 1120x576x2 depth
- Total memory: 26.7 MB

Antialiasing on PS3

1. Activate the 1120 2xMSAA MRT
2. Render the G-buffer
3. Switch to the 2240 no-AA RT
   - PS3 has a nice MSAA layout for this
   - Reload ZCull!
4. Render lighting as usual, lighting each sample as a pixel
5. Switch back to the 1120 2xMSAA RT
   - Don’t forget to reload ZCull!
6. Render transparencies with MSAA
7. Quincunx resolve at the end
   - Resolve into the same memory!
   - To the left part of the 2240 no-AA texture
   - Didn’t cause any artifacts for us

Antialiasing on Xbox 360

- Confessions again... We render at 1120x576 2xMSAA
- Same reasons as PS3, but fillrate was less of an issue
- Lighting can render without tiling
  - Without this we'd have to cache all shadow maps
  - Not really an option with a bunch of shadow-casting lights

Antialiasing on Xbox 360

- Our render targets:
  - 2x 1120x576x2 accum./FB RT
  - 1x 1120x576x2 color RT
  - 1x 1120x576x2 normal RT
  - 1x 1120x576x2 depth RT
- 720p frame buffers in the same memory as the accum. buffers
  - Alternate between frames
  - Can’t do this on PS3 due to tiled memory limitations
- Total memory: 24.6 MB
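The quoted totals for both consoles can be reproduced from the render target lists, assuming 4 bytes per sample for every surface and "MB" meaning 1024 x 1024 bytes. The last layout is a hypothetical full 1280x720 2xMSAA version of the PS3 setup; its total is consistent with the "35+ MB" figure mentioned earlier.

```python
MIB = 1024 * 1024
BYTES_PER_SAMPLE = 4  # 32-bit (8888) color, or 32-bit depth/stencil

def total_mib(surfaces):
    """surfaces: list of (count, width, height, samples_per_pixel)."""
    total_bytes = sum(count * w * h * samples * BYTES_PER_SAMPLE
                      for count, w, h, samples in surfaces)
    return total_bytes / MIB

ps3 = [(2, 1280, 720, 1),   # front/back buffer
       (3, 1120, 576, 2),   # G-buffer RTs (2xMSAA)
       (1, 1120, 576, 2)]   # depth (2xMSAA)
xbox360 = [(2, 1120, 576, 2),   # accum./FB (720p frame buffers alias this memory)
           (1, 1120, 576, 2),   # color
           (1, 1120, 576, 2),   # normal
           (1, 1120, 576, 2)]   # depth
# Hypothetical: the PS3 setup at full 1280x720 with 2xMSAA
ps3_720p_msaa = [(2, 1280, 720, 1), (4, 1280, 720, 2)]

for name, layout in [("PS3", ps3), ("Xbox 360", xbox360), ("PS3 @ 720p 2xMSAA", ps3_720p_msaa)]:
    print(f"{name}: {total_mib(layout):.2f} MB")
```

This prints roughly 26.7, 24.6 and 35.2 MB. The Xbox 360 saves its two 720p frame buffers by aliasing them with the accumulation buffers.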

Antialiasing on Xbox 360

1. Activate the 1120 2xMSAA MRT
2. Render the G-buffer
3. Resolve both samples of all RTs
4. Activate the 2240 no-AA RT
5. Restore depth & accumulation to EDRAM as 2x width with a custom shader (emulating PS3’s layout)
6. Render lighting as usual, lighting each sample as a pixel
7. Resolve to a 2240 no-AA texture
8. Using a custom shader, average the samples into a 1120 no-AA EDRAM surface
9. Render alpha, particles and the rest without MSAA
10. Resolve into the left part of the 2240 no-AA texture
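A sketch of the memory trick both consoles rely on. It assumes, as aliasing a 1120x576 2xMSAA target as 2240x576 implies, that the two samples of each pixel sit side by side. Lighting then treats every sample as an ordinary pixel, and a plain box average brings the image back to 1120 wide, as in step 8 above; the PS3's Quincunx resolve uses a different filter. The function names are hypothetical.

```python
def wide_coord(x, y, sample):
    """Texel in the 2x-wide non-AA view that holds sample `sample` of pixel (x, y)."""
    return 2 * x + sample, y

def light_wide(wide, light_fn):
    """'Lighting each sample as a pixel': run the per-texel shader over the wide image."""
    return [[light_fn(v) for v in row] for row in wide]

def average_samples(wide):
    """Box-average each horizontal sample pair back into one pixel."""
    return [[0.5 * (row[2 * x] + row[2 * x + 1]) for x in range(len(row) // 2)]
            for row in wide]
```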

Antialiasing future work

- We should really only be doing lighting twice for edge pixels
  - Huge potential speed boost
  - Didn’t research further at the time since it was fast enough
  - Must find a way to make it play well with SCull/Hi-Stencil without breaking our stencil masking
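The edge classification itself is cheap; below is one possible test, with hypothetical thresholds and a hypothetical function name. Lighting would run once for non-edge pixels and twice only where this returns True. The hard part named above, keeping it compatible with SCull/Hi-Stencil and the existing stencil masking, is not addressed here.

```python
def needs_per_sample_lighting(depth0, depth1, normal0, normal1,
                              rel_depth_eps=0.01, min_normal_cos=0.95):
    """Treat a 2xMSAA pixel as an edge if its two samples differ noticeably in
    linear depth (relative tolerance) or in normal direction."""
    if abs(depth0 - depth1) > rel_depth_eps * max(depth0, depth1):
        return True
    cos_angle = sum(a * b for a, b in zip(normal0, normal1))
    return cos_angle < min_normal_cos
```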

HDR

- All our buffers are 8:8:8:8
- We use Valve-style HDR with histogram analysis
- The HDR multiplier is passed into all shaders that write to the accumulation buffer
  - Output color is multiplied by it before output

HDR

- Not really correct, but hey, it looks convincing
- In the current project, exposure is limited to the 0.5 – 2.0 range since HDR was added mid-project
- Tried larger exposure ranges and it still looked cool
- Light blending fails if exposure is really low and the light contribution is below 1/255
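A small model of that failure. It assumes round-to-nearest conversion into the 8-bit accumulation buffer after each additive blend; that is the usual float-to-UNORM rule, though blend precision is hardware-specific. With rounding, a contribution below half an 8-bit step is lost outright. Ten dim lights that should add up to a visible value vanish when each one, scaled by a low HDR multiplier, falls under that threshold.

```python
def to_unorm8(x):
    """Convert to an 8-bit UNORM value: clamp, scale, round to nearest."""
    return int(max(0.0, min(1.0, x)) * 255.0 + 0.5)

def blend_additive(lights, hdr_multiplier):
    """Additively blend lights into an 8-bit accumulation buffer, one light per pass.
    Each shader output is scaled by the HDR multiplier, then the sum is requantized."""
    stored = 0
    for contribution in lights:
        stored = to_unorm8(stored / 255.0 + contribution * hdr_multiplier)
    return stored
```

Writing the same total in a single pass would keep it, which is why the problem only appears with many overlapping dim lights at low exposure.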

(Figures: the same scene at exposure 0.5, 1.0, 2.5 and 5)

Post-processing

- This is one of the best things about deferred rendering
- For each pixel you have access to:
  - Color
  - Normal
  - Position / depth
  - Final lit result
- You can do pretty much any post process you want with this

Post-processing

- But it's very easy to absolutely devastate performance on both consoles, so be careful
- Cram as much as you can into a single shader to avoid re-reading data
- Check the end of the slides for our post-processing method

Platform specific issues

There are times when you just want to...
- ...burn your PC!
- ...smash your Xbox 360!
- ...make a grill out of your PS3!

We had those moments ourselves. Unfortunately, dev kits cost too much... so we had no choice but to solve the issues. So here’s what we learned.

PS3 Performance Killers

- If you don’t set up MRT properly, your performance will be SLOW
- The memory tiler makes reuse hard sometimes (pitch must match)
- ZCull needs reloading to work
- SCull hates any changes
- Make sure you read all the Sony docs on the subject; it’s already been covered a lot

PS3 SCull horrors

- SCull is very, very touchy
- Changing the SCull compare value kills it for the frame (at least for us)
- Best to just bind it once and leave it alone forever
- All lights just use EQUAL to 1 as the stencil pass criterion
- Must clear stencil after every light
- WARNING: this also applies to GeForce 6 & 7 series PC parts

PS3 improvements

- We’re still doing all rendering on RSX
- If you’re cross-platform, you’ll likely wind up with spare SPU time
- Moving post-processing to SPUs is an easy way to free up the RSX
- You can even do parts or all of the shading with SPUs, but that’s a bit more involved
- Remember: SPUs are FAST!

Xbox 360 EDRAM

- VERY fast and generally awesome
- But can be quite inflexible at times
- Once you start running low, you’re pretty much out of luck
- But it’s mostly forgiven since it’s really fast
- Plan your EDRAM use, otherwise you’ll be in a world of pain...

Xbox 360 EDRAM

- When rendering shadow maps, the accumulation buffer is evicted from EDRAM
  - Restored for each shadow-casting light, but fillrate was better than on PS3, so we could afford this
- Higher resolutions don't scale linearly
  - Start requiring 3 tiles in the G-buffer pass (much slower)
  - 2 tiles for lighting (not good)
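A back-of-the-envelope tile count. It assumes 10 MB of EDRAM, 4 bytes per sample, a G-buffer pass that binds three RTs plus depth and a lighting pass that binds accumulation plus depth. It ignores the hardware's tile alignment rules, so treat it as an estimate. With 1280x720 as the "higher resolution", it reproduces both the 3-tile and 2-tile figures above and the earlier remark that lighting at 1120x576 renders without tiling. `edram_tiles` is a hypothetical name.

```python
EDRAM_BYTES = 10 * 1024 * 1024   # Xbox 360 EDRAM: 10 MB

def edram_tiles(width, height, samples, num_surfaces, bytes_per_sample=4):
    """Rough tile count: render-target footprint divided by EDRAM size, rounded up."""
    total = width * height * samples * bytes_per_sample * num_surfaces
    return -(-total // EDRAM_BYTES)   # integer ceiling division
```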

Xbox 360 gamma

- Started paying attention too late
- Had to undo the 360 gamma correction to get a proper image
  - All our textures and lighting were done, so there was no other way
- Artifacts not really noticeable by the end user
- Might just keep it like this, since the image is consistent across all platforms

Final thoughts

- Deferred rendering is cool and practical
- Enables really large light counts
- MSAA is not an issue
- Some of the best-looking games use some variant of deferred rendering
- It’s my opinion that it makes cross-platform development easier

QUESTIONS?

E-mail: tibor@zootfly.com (feel free to send spam, I already get lots)

Slides available soon on www.zootfly.com

Stuff that didn’t make it into the talk, but is still cool

Our post process

(Diagram, flattened in this copy. Stages at 100%, 50% and 25% resolution: accumulation buffer and Z/Pos buffer as inputs; hi-pass & SSAO at ½ res; downsample to ¼ res; Z/Pos buffer downsample to ¼ res; two horizontal + vertical blur chains; COMBINE.)

Z-Downsampling

Use any applicable MSAA hacks when downsampling Z

Quarter res Z is needed for low res particle rendering anyway

Huge bandwidth savings when sampling from lower res texture

SSAO

- Calculated at 50% resolution, but blurred at 25%?!
- It’s much more stable this way
  - Higher frequency input
- We also tried blurring at 50% res, but there was no visual difference except the framerate drop

Depth-of-field

- We always apply DOF to geometry very close (<1m) to the camera
- Hides low-res textures this way, and just looks cool
- Very simple, just four parameters:
  - Near plane distance
  - Near plane fade
  - Far plane distance
  - Far plane fade
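A sketch of how those four parameters could turn linear view depth into the DOF amount that the G-buffer pass writes into accumulation alpha. The exact falloff is an assumption: linear ramps, fully blurred at `near_dist - near_fade` and closer, and again at `far_dist + far_fade` and beyond. `dof_amount` is a hypothetical name.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def dof_amount(view_depth, near_dist, near_fade, far_dist, far_fade):
    """Blur amount in [0, 1] from linear view depth (meters): sharp between
    near_dist and far_dist, ramping to fully blurred over each fade distance."""
    near = saturate((near_dist - view_depth) / near_fade)
    far = saturate((view_depth - far_dist) / far_fade)
    return max(near, far)
```

Computing this once per pixel in the G-buffer pass is what lets the combine step blend toward the blurred image without another depth read.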

Combine

- Final step, munges it all together
- Color correction before output
  - Apply levels filter
  - Apply curves filter
- Controllable saturation of base and bloom images
- On consoles, upscale to 1280x720

Combine code
// DOF: blend toward the blurred image by the per-pixel DOF amount
Out = lerp( Accum, BlurredAccum, doffac );
// Ambient occlusion
Out *= AmbientOcclusionFactor;
// Bloom with adjustable saturation & intensity (screen-style blend)
Out = ApplySat( Out, BaseSat );
Bloom = ApplySat( Bloom, BloomSat ) * BloomIntensity;
Out *= ( 1 - saturate( Bloom ) );
Out += Bloom;
// Levels filter
Out = saturate( ( Out + LevelsInAdd ) * LevelsInMul );
Out = pow( Out, LevelsGamma );
// Curves: per-channel 1D lookup
Out.r = tex1D( CurvesSampler, Out.r ).r;
Out.g = tex1D( CurvesSampler, Out.g ).g;
Out.b = tex1D( CurvesSampler, Out.b ).b;
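A CPU port of the combine step for one pixel, which can be handy for tuning parameters offline. It assumes that the value added back after `Out *= (1 - saturate(Bloom))` is the bloom image itself, which makes the bloom step a screen-style blend. `ApplySat` is not shown in the talk, so `apply_sat` here is a hypothetical luminance/color lerp with Rec. 709 weights, and the curve lookup ignores texel-center offsets.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def lerp(a, b, t):
    return a + (b - a) * t

def apply_sat(rgb, sat):
    """Hypothetical ApplySat: lerp from luminance toward the color (sat = 1 keeps it)."""
    lum = 0.2126 * rgb[0] + 0.7152 * rgb[1] + 0.0722 * rgb[2]
    return [lerp(lum, c, sat) for c in rgb]

def curve_lookup(lut, x):
    """Linearly interpolate a 1D curve table, standing in for tex1D."""
    pos = saturate(x) * (len(lut) - 1)
    i = min(int(pos), len(lut) - 2)
    return lerp(lut[i], lut[i + 1], pos - i)

def combine(accum, blurred_accum, doffac, ao, bloom, p):
    """One pixel of the combine pass; p holds the tweakable parameters."""
    out = [lerp(a, b, doffac) for a, b in zip(accum, blurred_accum)]      # DOF
    out = [c * ao for c in out]                                            # ambient occlusion
    out = apply_sat(out, p["base_sat"])
    bloom = [c * p["bloom_intensity"] for c in apply_sat(bloom, p["bloom_sat"])]
    out = [o * (1.0 - saturate(b)) + b for o, b in zip(out, bloom)]       # screen-style bloom
    out = [saturate((c + p["levels_add"]) * p["levels_mul"]) for c in out]  # levels
    out = [c ** p["levels_gamma"] for c in out]
    return [curve_lookup(lut, c) for lut, c in zip(p["curves"], out)]     # per-channel curves
```

With bloom B in [0, 1], `out * (1 - B) + B` equals `1 - (1 - out) * (1 - B)`, so bright bloom can never push a channel past white before the levels step clamps it.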

Our material system

Material textures (channel layout; format):
- Color: RGB = color, A = alpha (optional); DXT1 / 5
- Normal: R = self-illumination, G = N.Y, B = N.Z, A = N.X; DXT5
- Mask: R = reflection, G = spec amount, B = spec exponent, A = height (optional); DXT1 / 5
- Env cube: RGB = color (additive), A = unused; DXT1
- Overlay: RGB = color (Photoshop overlay blend mode), A = unused; DXT1
- Detail: R = U offset, G = V offset, B = normal bend X, A = normal bend Y; 8888

- All textures are optional (#ifdefs)
- Detail:
  - Tiled, uses a multiplied primary UV set
  - Offsets the UVs for texture lookups
  - Then bends the normal

DXT red & blue channels suck (only 5 bits in the 5:6:5 endpoints), but they're good enough for what they contain
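Two small helpers for the material table: the standard Photoshop overlay formula used by the overlay texture, and unpacking the normal texture's channel layout. N.X lives in DXT5 alpha, which is compressed separately from the 5:6:5 color endpoints and keeps more precision. The function names are hypothetical.

```python
def overlay(base, blend):
    """Photoshop 'Overlay' for one channel in [0, 1]: multiply in the darks,
    screen in the lights, decided by the base value."""
    if base < 0.5:
        return 2.0 * base * blend
    return 1.0 - 2.0 * (1.0 - base) * (1.0 - blend)

def decode_material_normal(texel):
    """Unpack the normal texture: R = self-illum, G = N.Y, B = N.Z, A = N.X.
    texel is (r, g, b, a) in [0, 1]; returns (self_illum, unit normal)."""
    r, g, b, a = texel
    n = (a * 2.0 - 1.0, g * 2.0 - 1.0, b * 2.0 - 1.0)
    length = sum(c * c for c in n) ** 0.5
    return r, tuple(c / length for c in n)
```

A mid-gray overlay texel leaves the base color unchanged, so artists can paint variation around 0.5 without shifting the overall tone.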

Material settings
- Bunch of checkboxes and sliders
- Less is faster, more is better
- No custom artist shaders
  - Allows the programmer to optimize
