Floored
3D Visualization & Virtual Reality for Real Estate
Introduction
- Hi, my name is Nick Brancaccio
- I work on graphics at Floored
- Floored does real time 3D architectural visualization on the web
Introduction
- Demo
- Check out floored.com for more
Challenges
Architectural
- Clean, light-filled aesthetic
- Can't hide tech art deficiencies with grungy textures
Challenges
Interior Spaces
- Many secondary light sources rather than single key light
- Direct light fairly high frequency (directionally and spatially)
- Sunlight does not dominate many of our scenes
- Especially in NYC
Challenges
Real world material representation
- Important for communicating quality, mood, feel
- Materials have comparable real-life counterparts
- Customers are comparing to high-quality offline rendering
Challenges
webGL
- Limited OpenGL ES API
- Variable browser support
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell, V-Ray, Arnold, etc.
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets: spaces, furniture, lighting,
materials
- PBS supports reusability across projects
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren-Nayar [Oren 94]
Standard Material Parameterization
Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of
specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets: not well supported
Challenges Storage
- Reading depth from the render buffer: getting better
Challenges Storage
- Texture float support: quite good
Challenges Storage
- Texture half float support: getting better
Challenges Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
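To see why this works, here is a quick JavaScript sketch of the same arithmetic (function names are illustrative, not from the deck's shaders). Math.fround emulates 32-bit float rounding at every step, and the round trip stays lossless because the largest packed value, 255·65536 + 255·256 + 255 = 16777215, is exactly 2^24 − 1.

```javascript
// Pack three 8-bit integers into a single float, constrained to 32-bit
// float arithmetic via Math.fround (mirroring GLSL highp float).
function pack888(r, g, b) {
  const SHIFT_LEFT_16 = 256 * 256;
  const SHIFT_LEFT_8 = 256;
  return Math.fround(r * SHIFT_LEFT_16 + (g * SHIFT_LEFT_8 + b));
}

// Unpack by shifting right with divisions; every intermediate value is a
// power-of-two scale of an integer <= 2^24, so no rounding occurs.
function unpack888(packed) {
  const r = Math.floor(Math.fround(packed / (256 * 256)));
  const temp = Math.floor(Math.fround(packed / 256));
  const g = Math.fround(-r * 256 + temp);
  const b = Math.fround(-temp * 256 + packed);
  return [r, g, b];
}
```

An exhaustive loop over all 2^24 packed values also passes, which is exactly what the 4096 x 4096 GPU unit test described below verifies in-shader.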
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
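A sketch of the octahedral mapping in JavaScript (helper names are mine, not Floored's shader code): the unit sphere is projected onto an octahedron, the lower hemisphere is folded into the outer triangles, and the result is remapped to the full 0 to 1 domain.

```javascript
function signNotZero(v) {
  return v >= 0 ? 1.0 : -1.0;
}

// Encode a unit normal [x, y, z] into two values in [0, 1].
function octEncode(n) {
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1;
  let y = n[1] / l1;
  if (n[2] < 0) {
    // Fold the lower hemisphere over the diagonals.
    const fx = (1 - Math.abs(y)) * signNotZero(x);
    const fy = (1 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

// Decode two values in [0, 1] back to a unit normal.
function octDecode(e) {
  let x = e[0] * 2 - 1;
  let y = e[1] * 2 - 1;
  let z = 1 - Math.abs(x) - Math.abs(y);
  if (z < 0) {
    const fx = (1 - Math.abs(y)) * signNotZero(x);
    const fy = (1 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

The only transcendental work is the final normalize on decode, which is what makes this encoding attractive compared to spherical coordinates.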
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
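The RGB to YCoCg transform itself is a cheap linear change of basis; a JavaScript sketch (my helper names) of the forward and inverse transforms behind the checkerboarded storage:

```javascript
// RGB -> YCoCg: Y is a weighted average (luminance-like); Co and Cg are
// signed chroma offsets in [-0.5, 0.5] for RGB inputs in [0, 1].
function rgbToYcocg([r, g, b]) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,  // Y
    r * 0.5 - b * 0.5,              // Co
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg
  ];
}

// YCoCg -> RGB: exact inverse, only adds and subtracts.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Because neutral inputs have zero chroma, storing only one chroma component per pixel costs nothing on grey surfaces, where the eye is most sensitive.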
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float: 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float: 64bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float: 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate: probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
  // quantized pixel velocity. -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
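The 10/14 split packs for the same reason the 8-8-8 case does. A hypothetical JavaScript model of uint10_14_to_uint24 (names mirror the slides, not actual engine code) stays exact under 32-bit float arithmetic because the maximum packed value, 1023·2^14 + 16383 = 16777215, is again 2^24 − 1.

```javascript
// Pack a 10-bit value (0..1023) and a 14-bit value (0..16383) into one
// float, emulating GLSL 32-bit float arithmetic with Math.fround.
function pack10_14(hi10, lo14) {
  const SHIFT_LEFT_14 = 16384; // 2^14
  return Math.fround(hi10 * SHIFT_LEFT_14 + lo14);
}

function unpack10_14(packed) {
  const hi = Math.floor(Math.fround(packed / 16384));
  const lo = Math.fround(-hi * 16384 + packed);
  return [hi, lo];
}
```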
Packing Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
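The sign trick can be modeled in a couple of lines of JavaScript (illustrative names, with metallic carried as ±1.0 as in the shader above), which makes clear why depth-only consumers get away with a single abs():

```javascript
// Encode: fold the metallic flag (+1.0 metallic, -1.0 non-metallic)
// into the sign of the stored view-space depth.
function encodeDepthMetallic(depth, metallic /* +1 or -1 */) {
  return depth * metallic;
}

// Decode: one abs() recovers depth, one sign() recovers the flag.
function decodeDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: Math.sign(w) };
}
```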
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of
  // screen space for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
- Sample G-Buffer Cross Neighborhood
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Decode G-Buffer Cross Neighborhood Color YC
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! (four detail crops, each compared as:)
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation: the same
- Chroma calculation: inverted, approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar
arithmetic: we save an ADD in the 2nd component. Not to mention we are now
operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
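The reason the YC form works at all: Schlick's approximation is affine in the reflection coefficient (F = F0 + (1 − F0)·p), and YCoCg is a linear transform of RGB, so evaluating directly on (Y, chroma) agrees exactly with evaluating in RGB and converting afterwards. A quick JavaScript check (helper names are mine; only Co is shown, the same identity holds for Cg):

```javascript
function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

// Y behaves like an RGB channel; chroma inverts: c0 * (1 - p),
// because the chroma basis weights sum to zero.
function fresnelSchlickYC(vDotH, [y0, c0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0, c0 * -p + c0];
}

// Reduced RGB -> (Y, Co) transform for the comparison.
function rgbToYCo([r, g, b]) {
  return [r * 0.25 + g * 0.5 + b * 0.25, r * 0.5 - b * 0.5];
}
```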
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
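A direct JavaScript port of reconstructChromaHDR (handy for testing the filter on the CPU; names match the GLSL above, with a SENSITIVITY of 25.0 assumed) makes the behavior easy to see: equal-luminance neighbors simply average their chroma, and an all-black neighborhood falls through to zero.

```javascript
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const luminance = [a1[0], a2[0], a3[0], a4[0]];
  const chroma = [a1[1], a2[1], a3[1], a4[1]];
  const SENSITIVITY = 25.0;

  // Weight neighbors by closeness in luminance; zero out black samples.
  const weight = luminance.map((l) =>
    Math.pow(2.0, -SENSITIVITY * Math.abs(l - center[0])) * (l >= 1e-5 ? 1.0 : 0.0)
  );
  const totalWeight = weight.reduce((acc, w) => acc + w, 0.0);

  // Guard the case where all weights are 0.
  if (totalWeight <= 1e-5) return [0.0, 0.0];
  const blended = chroma.reduce((acc, c, i) => acc + c * weight[i], 0.0);
  return [center[1], blended / totalWeight];
}
```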
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks: Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Introduction- Hi my name is Nick Brancaccio
- I work on graphics at Floored
- Floored does real time 3D architectural visualization on the web
Introduction- Demo
- Check out flooredcom for more
Challenges
Architectural
- Clean light-filled aesthetic
- Canrsquot hide tech art deficiencies with grungy textures
Challenges
Challenges
Interior Spaces
- Many secondary light sources rather than single key light
- Direct light fairly high frequency (directionally and spatially)
- Sunlight does not dominate many of our scenes
- Especially in NYC
Challenges
Real world material representation
- Important for communicating quality mood feel
- Comparable real-life counterparts
- Customers are comparing to high-quality offline rendering
Challenges
Challenges
webGL
- Limited OpenGL ES API
- Variable browser support
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]
Physically Based Shading
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell VRay Arnold etc
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets spaces furniture lighting
materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry (Shadow Masking) Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren-Nayar [Oren 94]
Standard Material Parameterization - Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of
specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend (add) outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend (add) outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support [WebGLStats]
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading depth from the render buffer: getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
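The 2^24 limit is easy to demonstrate in the browser's own numeric environment: JavaScript's Math.fround rounds a double to the nearest 32-bit float, so we can watch integer precision run out (an illustrative sketch, not part of the original deck):

```javascript
// 32-bit floats have a 24-bit significand: every integer up to 2^24 is
// exact, and above it the step size doubles.
const f32 = Math.fround;

console.log(f32(16777215) === 16777215); // true: 2^24 - 1 is exact
console.log(f32(16777216) === 16777216); // true: 2^24 is exact
console.log(f32(16777217) === 16777216); // true: 2^24 + 1 collapses to 2^24
```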
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}
float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}
vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
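The same shift-by-multiply arithmetic is easy to sanity check outside the shader. This is an illustrative JavaScript port of the GLSL helpers above (names mirror the shader; it is not the production code):

```javascript
// Pack three 8-bit integers into one float using multiplies as left shifts,
// mirroring the GLSL uint8_8_8_to_uint24 / uint24_to_uint8_8_8 pair.
function uint8_8_8_to_uint24(x, y, z) {
  return x * 65536 + y * 256 + z; // x << 16 | y << 8 | z, without bit ops
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);  // raw >> 16
  const temp = Math.floor(raw / 256); // raw >> 8
  const y = temp - x * 256;           // mask via subtract, not AND
  const z = raw - temp * 256;
  return [x, y, z];
}

const packed = uint8_8_8_to_uint24(12, 34, 56);
console.log(packed);                      // 795192
console.log(uint24_to_uint8_8_8(packed)); // [ 12, 34, 56 ]
```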
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL; not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
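The pure-math half of this can also be brute-forced on the CPU: 2^24 values is small enough to loop over in JavaScript. An illustrative harness (it redefines the same pack helpers as above for self-containment; it is not Floored's test code):

```javascript
// CPU-side exhaustive check of the uint24 round trip. The GPU version
// instead renders one ID per pixel of a 4k x 4k target; here we just loop.
function uint8_8_8_to_uint24([x, y, z]) {
  return x * 65536 + y * 256 + z;
}
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}

let failures = 0;
for (let id = 0; id < 16777216; ++id) {
  if (uint8_8_8_to_uint24(uint24_to_uint8_8_8(id)) !== id) ++failures;
}
console.log(failures); // 0
```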
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL; not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL; not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
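For reference, a JavaScript sketch of the octahedral mapping surveyed in [Cigolle 14]: fold the sphere onto a unit square, giving two values in [0, 1] for the G-buffer. Function names here are my own, mirroring what the shader's octohedronEncode / octohedronDecode helpers would do:

```javascript
// sign() that never returns 0, so the fold is well defined on the axes.
const signNotZero = (v) => (v >= 0 ? 1 : -1);

function octEncode([x, y, z]) {
  // Project onto the octahedron |x| + |y| + |z| = 1.
  const invL1 = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let px = x * invL1;
  let py = y * invL1;
  if (z < 0) {
    // Fold the lower hemisphere across the diagonals.
    const fx = (1 - Math.abs(py)) * signNotZero(px);
    const fy = (1 - Math.abs(px)) * signNotZero(py);
    px = fx;
    py = fy;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5]; // remap [-1, 1] to [0, 1]
}

function octDecode([u, v]) {
  const ex = u * 2 - 1;
  const ey = v * 2 - 1;
  const z = 1 - Math.abs(ex) - Math.abs(ey);
  let x = ex;
  let y = ey;
  if (z < 0) {
    // Undo the fold for the lower hemisphere.
    x = (1 - Math.abs(ey)) * signNotZero(ex);
    y = (1 - Math.abs(ex)) * signNotZero(ey);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len]; // renormalize
}
```

Round-tripping any unit vector through octEncode and octDecode recovers it up to quantization, which is what makes the scheme attractive for 12-14 bits per component.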
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
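The YCoCg basis itself is just a cheap linear transform. A sketch, assuming the standard [Mavridis 12] convention (the shader's rgbToYcocg / YcocgToRgb helpers would follow the same math):

```javascript
// RGB <-> YCoCg as a pair of cheap linear transforms.
function rgbToYcocg([r, g, b]) {
  return [
     r * 0.25 + g * 0.5 + b * 0.25, // Y: luminance
     r * 0.5            - b * 0.5,  // Co: orange chroma
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg: green chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(rgbToYcocg([1, 1, 1])); // [ 1, 0, 0 ]: white is pure luminance
```

The G-buffer stores Y for every pixel and alternates Co / Cg per pixel in a checkerboard; the missing chroma component is reconstructed from neighbors at decode time.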
G-Buffer PackingFormat
G-Buffer Format
- R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
- G: VelocityX 10 bits | NormalX 14 bits
- B: VelocityY 10 bits | NormalY 14 bits
- A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
- R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
- G: NormalX 12 bits | NormalY 12 bits
- B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
- R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
- G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
- B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
- R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
- G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
- B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
- R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
- G: VelocityX 10 bits | NormalX 14 bits
- B: VelocityY 10 bits | NormalY 14 bits
- A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
vec4 res;
// Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution)
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);
// Early out if sampling infinity.
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
  res.velocity = vec2(1.41521356); // sqrt(2) + 1e-3
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! Four detail crops, each comparing:
- RGB Lighting 100% | YC Lighting 100%
- RGB Lighting 25% | YC Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
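Because the YCoCg transform is linear and Schlick's term is affine in the reflection coefficient, the YC form is exactly the RGB form seen through the transform. A quick JavaScript sketch to confirm (helper names here are mine, assuming the standard YCoCg convention):

```javascript
// Sanity check: Schlick evaluated per-channel in RGB, then transformed to
// YCoCg, matches the YC form evaluated directly on the YCoCg coefficient.
function rgbToYcocg([r, g, b]) {
  return [
     r * 0.25 + g * 0.5 + b * 0.25, // Y
     r * 0.5            - b * 0.5,  // Co
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg
  ];
}
function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1 - vDotH, 5);
  return f0.map((c) => (1 - c) * power + c);
}
function fresnelSchlickYC(vDotH, [y, c]) {
  const power = Math.pow(1 - vDotH, 5);
  return [(1 - y) * power + y, c * -power + c];
}

const f0 = [0.95, 0.64, 0.54]; // copper-ish specular color
const viaRgb = rgbToYcocg(fresnelSchlickRGB(0.3, f0));
const direct = fresnelSchlickYC(0.3, rgbToYcocg(f0).slice(0, 2));
// viaRgb[0] matches direct[0] (Y) and viaRgb[1] matches direct[1] (Co),
// up to floating point error: the Y row sums to 1 and the Co row sums to 0.
```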
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
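The same weighting scheme, sketched scalar-style in JavaScript to show the reconstruction logic (illustrative; the function and parameter names are mine, not the shader's):

```javascript
// Luminance-similarity chroma reconstruction: neighbors whose luminance
// matches the center pixel dominate the weighted average, so chroma does
// not bleed across strong luminance edges.
function reconstructChroma(centerLuma, neighbors) {
  // neighbors: [luma, chroma] pairs from the cross neighborhood.
  const SENSITIVITY = 25.0;
  let totalWeight = 0;
  let chromaSum = 0;
  for (const [luma, chroma] of neighbors) {
    const similarity = Math.pow(2, -SENSITIVITY * Math.abs(luma - centerLuma));
    // Guard the case where the sample is black (e.g. at infinity).
    const weight = luma >= 1e-5 ? similarity : 0;
    totalWeight += weight;
    chromaSum += chroma * weight;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? chromaSum / totalWeight : 0;
}
```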
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
pastasfuture
Resources
[WebGLStats] WebGL Stats, http://webglstats.com, 2014
[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production, Naty Hoffman, SIGGRAPH 2010, http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf
[Lagarde 11] Feeding a Physically-Based Shading Model, Sébastien Lagarde, 2011, http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/
[Burley 12] Physically-Based Shading at Disney, Brent Burley, 2012, http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf
[Karis 13] Real Shading in Unreal Engine 4, Brian Karis, 2013, http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final, Aras Pranckevičius, 2009, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014, http://jcgt.org/published/0003/02/01/
[Mavridis 12] The Compact YCoCg Frame Buffer, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012, http://jcgt.org/published/0001/01/02/
[Waveren 07] Real-Time YCoCg-DXT Compression, J.M.P. van Waveren, Ignacio Castaño, 2007, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf
[Geldreich 04] Deferred Lighting and Shading, Rich Geldreich, Matt Pritchard, John Brooks, 2004, https://sites.google.com/site/richgel99/home
[Hoffman 09] Deferred Lighting Approaches, Naty Hoffman, 2009, http://www.realtimerendering.com/blog/deferred-lighting-approaches/
Resources
[Shishkovtsov 05] Deferred Shading in STALKER, Oles Shishkovtsov, 2005, http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt
[Mittring 09] A Bit More Deferred - CryEngine 3, Martin Mittring, 2009, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3
[Sousa 13] The Rendering Technologies of Crysis 3, Tiago Sousa, 2013, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3
[Pranckevičius 13] Physically Based Shading in Unity, Aras Pranckevičius, Game Developers Conference 2013, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf
[Olsson 11] Tiled Shading, Ola Olsson, Ulf Assarsson, 2011, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading
Resources
[Billeter 12] Clustered Deferred and Forward Shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
[Yang 09] Amortized Supersampling, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf
[Herzog 10] Spatio-Temporal Upsampling on the GPU, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf
[Wronski 14] Temporal Supersampling and Antialiasing, Bart Wronski, 2014, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/
[Karis 14] High Quality Temporal Supersampling, Brian Karis, 2014, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
[Heitz 14] Understanding the Shadow Masking Function, Eric Heitz, 2014, http://jcgt.org/published/0003/02/03/paper.pdf
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, Christophe Schlick, 1994, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, Sébastien Lagarde, 2012, http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/
[Oren 94] Generalization of Lambert's Reflectance Model, Michael Oren, Shree K. Nayar, 1994, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf
Introduction- Demo
- Check out flooredcom for more
Challenges
Architectural
- Clean light-filled aesthetic
- Canrsquot hide tech art deficiencies with grungy textures
Challenges
Challenges
Interior Spaces
- Many secondary light sources rather than single key light
- Direct light fairly high frequency (directionally and spatially)
- Sunlight does not dominate many of our scenes
- Especially in NYC
Challenges
Real world material representation
- Important for communicating quality mood feel
- Comparable real-life counterparts
- Customers are comparing to high-quality offline rendering
Challenges
Challenges
webGL
- Limited OpenGL ES API
- Variable browser support
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]
Physically Based Shading
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell VRay Arnold etc
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets spaces furniture lighting
materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
- vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
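The same reprojection delta, sketched outside the shader (clip-space positions as [x, y, w] triples; a hypothetical helper for illustration):

```javascript
// Per-pixel screen space velocity: perspective-divide each frame's clip-space
// position, then difference the current and previous results.
function screenSpaceVelocity(clipNow, clipOld) {
  return [
    clipNow[0] / clipNow[2] - clipOld[0] / clipOld[2],
    clipNow[1] / clipNow[2] - clipOld[1] / clipOld[2],
  ];
}
```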
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
? texture2D(material_uColorMap, colorUV).rgb
: colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading depth from the renderbuffer: getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
return floor(raw * 255.0);
}
float uint8_8_8_to_uint24(const in vec3 raw) {
const float SHIFT_LEFT_16 = 256.0 * 256.0;
const float SHIFT_LEFT_8 = 256.0;
return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}
vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
const float SHIFT_RIGHT_8 = 1.0 / 256.0;
const float SHIFT_LEFT_8 = 256.0;
vec3 res;
res.x = floor(raw * SHIFT_RIGHT_16);
float temp = floor(raw * SHIFT_RIGHT_8);
res.y = -res.x * SHIFT_LEFT_8 + temp;
res.z = -temp * SHIFT_LEFT_8 + raw;
return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
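The same arithmetic is easy to sanity-check outside the shader; a plain JavaScript mirror of the encode/decode pair above (function names are ad hoc):

```javascript
// Pack three 8-bit integers into one float. All values stay below 2^24, where
// integer arithmetic is exact in highp GLSL floats (and trivially in JS doubles).
function uint888ToUint24(x, y, z) {
  return x * 256.0 * 256.0 + y * 256.0 + z; // shift left = multiply
}

// Unpack: shift right = divide + floor; mask = subtract the high parts.
function uint24ToUint888(packed) {
  const x = Math.floor(packed / (256.0 * 256.0));
  const temp = Math.floor(packed / 256.0);
  return [x, temp - x * 256.0, packed - temp * 256.0];
}
```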
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
// Covers the range of all uint24 with a 4k x 4k canvas.
// Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
// Encode, decode, and compare.
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
if (expectedDecoded == expected) {
// Packing successful.
gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
} else {
// Packing failed.
gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
// Covers the range of all uint24 with a 4k x 4k canvas.
// Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
// Covers the range of all uint24 with a 4k x 4k canvas.
// Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
vec3 encoded = texture2D(encodedSampler, vUV).xyz;
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
if (decoded == expected) {
// Packing successful.
gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
} else {
// Packing failed.
gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
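A compact sketch of the octahedral mapping (after [Cigolle 14]; assumes a normalized input vector, JavaScript for illustration):

```javascript
// Octahedral normal encode: map the unit sphere to the unit square [0,1]^2.
function octEncode(n) {
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1;
  let y = n[1] / l1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const ox = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const oy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = ox; y = oy;
  }
  // Remap from [-1,1] to [0,1] so the full storage domain is used.
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

function octDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  let z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const ox = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const oy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = ox; y = oy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```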
Emission
- Don't pack emission. Forward render it instead
- Avoid another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches and textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
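The YCoCg transform pair referenced above is cheap in both directions; a sketch (coefficients per the usual RGB-to-YCoCg definition used by [Mavridis 12]):

```javascript
// Forward transform: Y is luminance, Co/Cg are chroma components.
function rgbToYcocg(r, g, b) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Inverse transform: exact (the matrix inverts with adds and subtracts only).
function ycocgToRgb(y, co, cg) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```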
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. It could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
vec4 res;
// Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
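The sign-bit trick is easy to see in isolation (a sketch; assumes metallic is stored as +1.0 / -1.0 and depth is strictly positive for valid surfaces):

```javascript
// One float carries view-space depth (magnitude) and the metallic flag (sign).
function packDepthMetallic(depth, isMetallic) {
  return depth * (isMetallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```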
Packing Challenges
- Must balance packing efficiency against the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler,
const in vec2 uv,
const in vec2 gBufferResolution,
const in vec2 inverseGBufferResolution) {
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);
// Early out if sampling infinity.
if (res.depth <= 0.0) {
res.color = vec3(0.0);
return res;
}
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
// When velocity is out of representable range, throw it outside of screen space for culling in future passes.
// sqrt(2) + 1e-3
res.velocity = vec2(1.41521356);
} else {
res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
(Enhance: four detail crops comparing RGB Lighting 100%, YC Lighting 100%, RGB Lighting 25%, and YC Lighting 25%)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 of chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
float power = pow(1.0 - vDotH, 5.0);
return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
float power = pow(1.0 - vDotH, 5.0);
return vec2(
(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
);
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
return vec2(
(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
);
}
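One way to convince yourself the YC form is consistent: Schlick is linear in F0 and the `power` term is achromatic, so transforming F0 into luminance/chroma and applying the YC form matches transforming the RGB result. A JavaScript check (function names ad hoc, Y and Co components only):

```javascript
// RGB Schlick, applied per channel.
function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * power + c);
}

// YC Schlick: luminance as usual, chroma scaled toward zero at grazing angles.
function fresnelSchlickYC(vDotH, f0yc) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - f0yc[0]) * power + f0yc[0],
    f0yc[1] * -power + f0yc[1],
  ];
}

// Y and Co rows of the YCoCg transform.
function rgbToYco(rgb) {
  return [
    0.25 * rgb[0] + 0.5 * rgb[1] + 0.25 * rgb[2],
    0.5 * rgb[0] - 0.5 * rgb[2],
  ];
}
```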
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
vec4 lumaDelta = abs(luminance - vec4(center.x));
const float SENSITIVITY = 25.0;
vec4 weight = exp2(-SENSITIVITY * lumaDelta);
// Guard the case where a sample is black.
weight *= step(1e-5, luminance);
float totalWeight = weight.x + weight.y + weight.z + weight.w;
// Guard the case where all weights are 0.
return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
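The weighting logic ports directly to the CPU for testing; a sketch mirroring the function above (neighbors passed as [luma, chroma] pairs):

```javascript
// Luminance-weighted chroma reconstruction: neighbors whose luminance is close
// to the center's contribute more; black samples are zeroed out entirely.
function reconstructChromaHDR(center, neighbors, sensitivity = 25.0) {
  let totalWeight = 0.0;
  let chroma = 0.0;
  for (const [luma, c] of neighbors) {
    let weight = Math.pow(2, -sensitivity * Math.abs(luma - center[0]));
    if (luma < 1e-5) weight = 0.0; // guard black samples
    totalWeight += weight;
    chroma += c * weight;
  }
  // Guard the case where all weights are zero.
  return totalWeight > 1e-5 ? [center[1], chroma / totalWeight] : [0.0, 0.0];
}
```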
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Challenges
Architectural
- Clean light-filled aesthetic
- Canrsquot hide tech art deficiencies with grungy textures
Challenges
Challenges
Interior Spaces
- Many secondary light sources rather than single key light
- Direct light fairly high frequency (directionally and spatially)
- Sunlight does not dominate many of our scenes
- Especially in NYC
Challenges
Real world material representation
- Important for communicating quality mood feel
- Comparable real-life counterparts
- Customers are comparing to high-quality offline rendering
Challenges
Challenges
webGL
- Limited OpenGL ES API
- Variable browser support
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]
Physically Based Shading
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell VRay Arnold etc
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets spaces furniture lighting
materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
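The encode / decode pair ports directly to JS for experimentation. An illustrative sketch, not from the deck: scalar arguments replace the GLSL vec3, and Math.fround emulates 32-bit float storage.

```javascript
// Shift left by 16 / 8 bits with multiplies; the sum stays below 2^24,
// so Math.fround (32-bit float storage) preserves it exactly.
function uint8_8_8_to_uint24(r, g, b) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return Math.fround(r * SHIFT_LEFT_16 + (g * SHIFT_LEFT_8 + b));
}

// Shift right with divides + floor, then subtract off the higher bytes.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / (256.0 * 256.0));
  const temp = Math.floor(raw / 256.0);
  return [x, -x * 256.0 + temp, -temp * 256.0 + raw];
}

const packed = uint8_8_8_to_uint24(12, 34, 56);
console.log(packed); // 795192
console.log(uint24_to_uint8_8_8(packed)); // [ 12, 34, 56 ]
```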
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
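The same exhaustive strategy can be run CPU-side as a plain JS loop: a 4096 x 4096 target is exactly 2^24 pixels. A sketch of the idea, not the slides' shader:

```javascript
// Round-trip one uint24 ID through the pack / unpack arithmetic and
// check it survives 32-bit float storage (Math.fround).
function roundTrips(id) {
  // unpack to three bytes
  const x = Math.floor(id / 65536.0);
  const temp = Math.floor(id / 256.0);
  const y = -x * 256.0 + temp;
  const z = -temp * 256.0 + id;
  // repack and compare
  const repacked = Math.fround(x * 65536.0 + y * 256.0 + z);
  return repacked === id;
}

// Exhaustively walk every uint24 value, like the 4k x 4k GPU test.
let failures = 0;
for (let id = 0; id < 16777216; ++id) {
  if (!roundTrips(id)) ++failures;
}
console.log(failures); // 0
```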
Packing Unit Test: Single Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, Decode, and Compare
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing Successful!
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed!
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing Successful!
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed!
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
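For reference, octahedral encode / decode in the [Cigolle 14] style can be ported to JS. This is an illustrative sketch following the paper, not code from the deck; signNotZero and the hemisphere fold are assumptions matching the standard formulation.

```javascript
// GLSL sign() returns 0 at 0; the octahedral fold needs +/-1 only.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octEncode(n) {
  // Project the unit normal onto the octahedron |x|+|y|+|z| = 1, drop z.
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1;
  let y = n[1] / l1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere across the diagonals.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx;
    y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5]; // remap -1..1 to the full 0..1 domain
}

function octDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Undo the fold for lower-hemisphere normals.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx;
    y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```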
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
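The YCoCg transform underneath this is a small linear change of basis. A hedged JS sketch of the standard matrix (the checkerboard interlacing of Co/Cg is omitted here):

```javascript
// Standard RGB -> YCoCg transform; rows are the Y, Co, and Cg weights.
function rgbToYcocg(r, g, b) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y:  luminance-like term
     0.5  * r            - 0.5 * b, // Co: orange vs blue chroma, in -0.5..0.5
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg: green vs purple chroma, in -0.5..0.5
  ];
}

// Exact inverse: only adds and subtracts.
function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}

// Chroma living in -0.5..0.5 is why a bias is added before packing
// the components into unsigned bytes.
```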
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float: 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float: 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float: 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
- RGB Half-float: 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float: 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
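The sign() trick above can be sketched in JS. An illustrative adaptation, not the deck's code: the deck stores metallic as a +/-1 float multiplier, here it is a bool.

```javascript
// View-space depth is always positive, so its sign bit is free to carry
// the metallic flag. Depth 0 is reserved for "infinity" (sky).
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

// Decode is cheap: abs() recovers depth, the sign recovers the flag.
function unpackDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: w > 0.0 };
}
```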
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
    gBufferComponents res;

    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
(four detail-crop comparison slides, each with the same panel layout)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminous Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
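Why this is safe to do: Schlick's term is affine in the reflection coefficient, and the Y / chroma components are linear combinations of RGB (with chroma weights summing to zero), so evaluating Fresnel directly in YC space matches converting the RGB result. An illustrative JS check, not from the deck:

```javascript
// RGB Schlick Fresnel, per channel.
function fresnelSchlick(vDotH, r0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return r0.map((c) => (1.0 - c) * power + c);
}

// YC Schlick Fresnel, as in the slide above.
function fresnelSchlickYC(vDotH, r0YC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - r0YC[0]) * power + r0YC[0],
    r0YC[1] * -power + r0YC[1],
  ];
}

// Y and Co rows of the RGB -> YCoCg transform.
const rgbToYco = ([r, g, b]) => [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];

const r0 = [0.95, 0.64, 0.54]; // assumed example: a copper-ish specular color
const viaRgb = rgbToYco(fresnelSchlick(0.3, r0));
const direct = fresnelSchlickYC(0.3, rgbToYco(r0));
// viaRgb and direct agree to floating point precision
```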
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
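The reconstruction ports to JS for experimentation. An illustrative sketch: each sample is a [luminance, chroma] pair mirroring the GLSL vec2s, and the guard logic follows the shader above.

```javascript
// Reconstruct a missing chroma sample by weighting neighbor chroma
// according to how close each neighbor's luminance is to the center's.
function reconstructChromaHDR(center, samples) {
  const SENSITIVITY = 25.0;
  let weighted = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of samples) {
    // exp2 falloff on luminance difference, like the GLSL version
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard: black samples carry no chroma info
    weighted += chroma * w;
    totalWeight += w;
  }
  // guard: if every neighbor was rejected, fall back to zero
  return totalWeight > 1e-5 ? [center[1], weighted / totalWeight] : [0.0, 0.0];
}
```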
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks!
Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions?
nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Challenges
Challenges
Interior Spaces
- Many secondary light sources rather than single key light
- Direct light fairly high frequency (directionally and spatially)
- Sunlight does not dominate many of our scenes
- Especially in NYC
Challenges
Real world material representation
- Important for communicating quality mood feel
- Comparable real-life counterparts
- Customers are comparing to high-quality offline rendering
Challenges
Challenges
webGL
- Limited OpenGL ES API
- Variable browser support
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]
Physically Based Shading
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell VRay Arnold etc
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets spaces furniture lighting
materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
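The same round-trip check can be prototyped on the CPU before wiring up render-target passes; a minimal JavaScript sketch, with helper names mirroring the shader functions (multiplies and divides stand in for bit shifts, exactly as in the GLSL):

```javascript
// Shift left 16 / shift left 8 via multiplies, as in the shader.
function uint8_8_8_to_uint24(v) {
  return v[0] * 65536.0 + v[1] * 256.0 + v[2];
}

// Shift right via divide + floor, recovering the three bytes.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}

// Exhaustively round-trip pixel IDs; returns the first failing ID, or -1 if all pass.
function testPackingDomain(limit) {
  for (let id = 0; id < limit; ++id) {
    if (uint8_8_8_to_uint24(uint24_to_uint8_8_8(id)) !== id) return id;
  }
  return -1;
}
```

Running testPackingDomain with a limit of 2^24 covers the full uint24 domain, the same range the 4096 x 4096 render target exercises on the GPU.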
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
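A JavaScript sketch of the octahedral mapping following [Cigolle 14] (the function names here are our own, not the deck's): project the unit vector onto the octahedron, fold the lower hemisphere over the diagonals, and bias into 0..1.

```javascript
// Encode a unit vector into the 0..1 octahedral square.
function octEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1, y = n[1] * invL1;
  if (n[2] < 0.0) { // fold the lower hemisphere over the diagonals
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5]; // bias -1..1 into 0..1
}

// Decode back to a unit vector.
function octDecode(e) {
  let x = e[0] * 2.0 - 1.0, y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) { // unfold the lower hemisphere
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

The round trip is lossless in continuous math; quantizing each component to 14 bits, as in the format below, is where the discretization happens.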
Emission
- Don't pack emission. Forward render it.
- Avoids another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
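The RGB to YCoCg transform is a cheap linear change of basis; a JavaScript sketch of the pair (matching the rgbToYcocg / YcocgToRgb helpers the packing code uses, for inputs in 0..1 with Co/Cg landing in roughly -0.5..0.5):

```javascript
// Forward transform: Y is luminance-like, Co/Cg are chroma offsets.
function rgbToYcocg(r, g, b) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Inverse transform back to RGB.
function ycocgToRgb(y, co, cg) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```

Because the transform is exactly invertible, the only loss comes from checkerboarding away one chroma component per pixel, not from the basis change itself.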
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- e.g. material type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64 bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump / 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128 bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range into the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
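checkerboardInterlace itself is not shown in the deck; one plausible implementation, sketched in JavaScript under the assumption that it simply selects Co or Cg from pixel parity (the function name and signature here are hypothetical):

```javascript
// Hypothetical sketch: store Co on "even" pixels and Cg on "odd" ones, so
// each pixel carries luminance plus one alternating chroma component.
function checkerboardInterlace(co, cg, pixelX, pixelY) {
  return ((pixelX + pixelY) % 2 === 0) ? co : cg;
}
```

The decode side's getCheckerboard would then evaluate the same parity so the swizzle knows which chroma basis a given pixel stored.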
Packing: Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
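uint10_14_to_uint24 follows the same multiply-as-shift pattern as the 8/8/8 pack; a JavaScript sketch of the pair (names mirror the shader helpers):

```javascript
// Pack a 10-bit value and a 14-bit value: shift left 14 via multiply by 2^14.
function uint10_14_to_uint24(v) {
  return v[0] * 16384.0 + v[1];
}

// Unpack: shift right 14 via divide + floor, then recover the low 14 bits.
function uint24_to_uint10_14(raw) {
  const hi = Math.floor(raw / 16384.0);
  return [hi, raw - hi * 16384.0];
}
```

The worst case, 1023 in the high field and 16383 in the low field, packs to 2^24 - 1, still exactly representable in a 32-bit float.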
Packing: Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting into an RGB Float render target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer: RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look.
Enhance!
RGB Lighting 100% | YC Lighting 100%
YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100%
YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100%
YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100%
YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YC|YC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
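The same weighting scheme restated in JavaScript, which makes the two guards easy to exercise; each sample is a [luma, chroma] pair and SENSITIVITY mirrors the shader constant:

```javascript
// Reconstruct the missing chroma from cross neighbors, weighting each by
// luminance similarity to the center pixel (exp2 falloff, as in the shader).
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard the case where a sample is black
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are zero.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

When all four neighbors share the center's luminance, the result is simply their average chroma; when all neighbors are black, both components fall back to zero.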
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks
Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions?
nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump (16-bit float) is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity, and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
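The uint8_8_8_to_uint24 / uint24_to_uint8_8_8 helpers used above rely only on float multiplies, divides, and floor(), since WebGL GLSL has no bitwise operators. A JS mirror of that arithmetic (illustrative; same shift-by-multiply structure as the shader code):

```javascript
// Pack three 8-bit ints into one float <= 2^24 using only float arithmetic.
function uint8_8_8_to_uint24([x, y, z]) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return x * SHIFT_LEFT_16 + (y * SHIFT_LEFT_8 + z);
}

// Shift right via multiply-by-reciprocal and floor; recover each byte.
function uint24_to_uint8_8_8(raw) {
  const SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const SHIFT_RIGHT_8 = 1.0 / 256.0;
  const SHIFT_LEFT_8 = 256.0;
  const x = Math.floor(raw * SHIFT_RIGHT_16);
  const temp = Math.floor(raw * SHIFT_RIGHT_8);
  const y = -x * SHIFT_LEFT_8 + temp;
  const z = -temp * SHIFT_LEFT_8 + raw;
  return [x, y, z];
}
```

Every integer up to 2^24 is exact in a 32-bit float, which is what makes the single-pass and two-pass unit tests earlier in the deck exhaustive over a 4k x 4k target.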
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
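One way to alternate the pattern is to flip the checkerboard parity with the frame index, so a given pixel stores Co on even frames and Cg on odd frames (illustrative JS; the getCheckerboard in the shaders above takes uv and resolution instead, and the frame flip is an assumption about how the alternation could be wired up):

```javascript
// Checkerboard parity for a pixel: 1.0 on "even" cells, 0.0 on "odd" cells.
// Adding the frame index flips the whole pattern every frame, temporally
// dithering the chroma subsampling.
function getCheckerboard(px, py, frameIndex) {
  return (px + py + frameIndex) % 2 === 0 ? 1.0 : 0.0;
}
```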
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD
and ADD from the skipped 3rd component.
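Because the YCoCg transform is linear, running Schlick per RGB channel and then converting to YCoCg gives exactly the YC-space result; the lerp toward white in RGB becomes a lerp toward (Y=1, C=0). A quick numeric check (illustrative JS, not the shader source):

```javascript
// Per-channel RGB Schlick, then a linear RGB -> (Y, Co) conversion, should
// match the YC-space Schlick applied directly to the converted coefficient.
function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, f0yc) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - f0yc[0]) * power + f0yc[0], // luminance -> 1 at grazing
    f0yc[1] * -power + f0yc[1],        // chroma -> 0 at grazing
  ];
}

// Y and Co rows of the (linear) RGB -> YCoCg transform
function rgbToYCo([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}
```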
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
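The same weighting is easy to sanity check on the CPU (a JS transliteration for illustration; the GLSL above vectorizes this over a vec4):

```javascript
// Weight each neighbor's chroma by how close its luminance is to the
// center pixel's luminance; black samples (e.g. infinity) are excluded.
function reconstructChromaHDR(center, neighbors) {
  // center: [Y, C]; neighbors: four [Y, C] cross samples
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [lum, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(lum - center[0]));
    w *= lum >= 1e-5 ? 1.0 : 0.0; // guard samples at black / infinity
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

With neighbors of identical luminance the reconstruction degenerates to a plain average of their chroma, which is the expected behavior on flat regions.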
Thanks for listening!
Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
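The velocity quantization round trip can be checked in plain JavaScript (a sketch; the step count and function names are our assumptions, not Floored's exact values). Note the 0.5 factor in encode: the input is clip-space velocity spanning -1..1, while the decoded value is a UV-space offset spanning 0..1:

```javascript
// Assumed sub-pixel quantization: 4 steps per pixel.
const SUB_PIXEL_PRECISION_STEPS = 4.0;

// screenVelocity is in the -1..1 clip-space range; resolution is in pixels.
// Returns a biased 0..1023 integer that fits in 10 bits.
function quantizeVelocity(screenVelocity, resolution) {
  let v = screenVelocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  v = Math.floor(Math.min(Math.max(v, -512.0), 511.0));
  return v + 512.0;
}

// Returns a UV-space (0..1 range) offset, or Infinity for out-of-range samples,
// mirroring the "throw it outside of screenspace" path in the decode shader.
function dequantizeVelocity(quantized, resolution) {
  const v = quantized - 512.0;
  if (Math.abs(v) > 510.0) return Number.POSITIVE_INFINITY;
  return v * (1.0 / resolution) * (1.0 / SUB_PIXEL_PRECISION_STEPS);
}
```

With 4 steps per pixel, the round-trip error stays under a quarter pixel, and the clamp edges survive as an explicit out-of-range marker.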
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! (four detail crops, each comparing: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 of chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
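The YC form is exact, not an approximation: the YCoCg luminance weights sum to 1 and the chroma weights sum to 0, so Schlick's lerp commutes with the color-space transform. A quick JavaScript check (assuming the standard YCoCg forward transform; function names are ours):

```javascript
// Standard YCoCg forward transform: Y = r/4 + g/2 + b/4, Co = r/2 - b/2, Cg = -r/4 + g/2 - b/4.
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b, -0.25 * r + 0.5 * g - 0.25 * b];
}

// Schlick's approximation evaluated per RGB channel.
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

// Schlick's approximation in [Y, C] form: luminance lerps toward 1, chroma toward 0.
function fresnelSchlickYC(vDotH, [fy, fc]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - fy) * p + fy, fc * -p + fc];
}
```

Only one chroma basis is lit per pixel in the checkerboarded layout, which is why a vec2 of [Y, C] suffices in the shader.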
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
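A JavaScript port of the same reconstruction, handy for testing the weighting behavior outside the shader (a sketch; the scalar loop replaces the vec4 arithmetic):

```javascript
// center and a1..a4 are [luminance, chroma] pairs; the neighbors carry the
// chroma basis that the center pixel is missing.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    // Weight each neighbor by luminance similarity to the center sample.
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where a sample is black.
    w *= luma >= 1e-5 ? 1.0 : 0.0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

In a flat region all weights are equal and the result is a plain average; across a strong luminance edge the dissimilar neighbors are suppressed exponentially.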
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know!
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com @pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry / Shadow Masking Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material Parameterization
Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give the color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}
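The same conditional mapping as a JavaScript helper (a sketch; the function name is ours). Metals take their color as specular reflectance with no diffuse albedo, while non-metals get a fixed 4% specular:

```javascript
// Map the (color, metallic) material parameterization to the
// (albedo, specularColor) pair the BRDF consumes.
function resolveMaterialColor(color, metallic) {
  return metallic
    ? { albedo: [0.0, 0.0, 0.0], specularColor: color }
    : { albedo: color, specularColor: [0.04, 0.04, 0.04] };
}
```

This is the coupling that lets one vec3 plus one bit replace separate albedo and specular color parameters.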
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video: non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay the cost of the worst case:
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS is small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to the g-buffer
- For each light
- For each pixel inside the light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend add outgoing radiance to the render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0) ?
  texture2D(material_uColorMap, colorUV).rgb :
  colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out...
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges: Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets: not well supported
Challenges: Storage
- Reading from render buffer depth: getting better
Challenges: Storage
- Texture float support: quite good
Challenges: Storage
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data!
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
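Since the packed value ultimately lives in a 32-bit float, `Math.fround` (which rounds a JS double to float32) can demonstrate the 2^24 limit, and plain arithmetic stands in for the missing bitwise operators. A small sketch (variable names are ours):

```javascript
// float32 represents every integer up to 2^24 exactly; above that, step size grows.
const LIMIT = Math.pow(2, 24); // 16777216

// Math.fround rounds a JS double to the nearest 32-bit float value.
const exactBelow = Math.fround(LIMIT - 1) === LIMIT - 1; // no collision below 2^24
const collidesAbove = Math.fround(LIMIT + 1) === LIMIT;  // 2^24 + 1 collides with 2^24

// Shifts without bitwise operators, as in the shaders:
const shiftedLeft8 = 3 * 256;                // equivalent to 3 << 8
const shiftedRight8 = Math.floor(777 / 256); // equivalent to 777 >> 8
```

This is why a uint24 (three 8-bit fields) is the largest payload that survives a float32 channel intact.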
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
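A direct JavaScript translation of the pack / unpack pair makes the arithmetic easy to sanity-check on the CPU (function names are ours):

```javascript
// Pack three 8-bit integers (0..255 each) into one uint24-valued float.
function uint8x3ToUint24([x, y, z]) {
  return x * 65536 + y * 256 + z; // shifts left via multiplies
}

// Recover the three bytes using floor-divides as right shifts.
function uint24ToUint8x3(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  const y = -x * 256 + temp;   // subtract the high byte's contribution
  const z = -temp * 256 + raw; // subtract the high and mid bytes' contribution
  return [x, y, z];
}
```

Looping this over all 2^24 inputs is exactly the exhaustive test the next slides run on the GPU with a 4k x 4k target.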
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- The single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties:
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
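A common octahedral encode / decode pair in JavaScript, following [Cigolle 14] (a sketch, not Floored's exact shader code). The unit normal is projected onto the octahedron by its L1 norm; the lower hemisphere is folded over the diagonals:

```javascript
// sign() that never returns 0, so folds at the axes stay invertible.
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

// n must be normalized; returns 2D coordinates in the 0..1 domain.
function octEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let u = x / l1, v = y / l1;
  if (z < 0.0) { // fold the lower hemisphere over the diagonals
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0, v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) { // unfold the lower hemisphere
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```

The mapping is a bijection up to the final normalize, which is what makes the 14-bit quantization in the G-Buffer behave uniformly.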
Emission
- Don't pack emission. Forward render it instead
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
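The RGB to YCoCg transform pair in JavaScript (the standard transform, assumed to match the deck's rgbToYcocg). Y lands in 0..1 while Co / Cg land in -0.5..0.5, which is exactly why the G-Buffer applies a chroma bias before packing:

```javascript
// Forward: Y = r/4 + g/2 + b/4, Co = r/2 - b/2, Cg = -r/4 + g/2 - b/4.
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b, -0.25 * r + 0.5 * g - 0.25 * b];
}

// Inverse: r = (Y - Cg) + Co, g = Y + Cg, b = (Y - Cg) - Co.
function ycocgToRgb([y, co, cg]) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```

The inverse needs only adds and subtracts, which keeps the per-light decode cheap.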
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float: 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float: 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving. RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float: 64bpp
- Half-float target is more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
- RGB Half-float: 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float: 128bpp
- Let's take a look at the packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
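The getCheckerboard helper itself is not shown in the deck; a plausible minimal version keyed on integer pixel parity (the name, signature, and the +1/-1 convention are assumptions here) would be:

```javascript
// Hypothetical checkerboard parity: decides whether a pixel stores Co or Cg.
// Returns 1.0 on "even" pixels and -1.0 on "odd" pixels.
function getCheckerboard(pixelX, pixelY) {
  return (pixelX + pixelY) % 2 === 0 ? 1.0 : -1.0;
}
```

Neighboring pixels always land on opposite parities, which is what guarantees the cross neighborhood above contains the missing chroma component.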
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later in the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Enhance!
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Enhance!
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Enhance!
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted; it approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
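A quick host-side check (a JavaScript mirror of the math, not the shader itself) confirms the YC form agrees exactly with transforming the RGB Schlick result, because white has zero chroma in YCoCg:

```javascript
// Mirror of the two fresnel variants, evaluated on plain arrays.
function fresnelSchlickRgb(vDotH, r0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return r0.map((c) => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, [y0, c0]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * power + y0, c0 * -power + c0];
}

// YCoCg luminance and Co chroma of an RGB triple [Mavridis 12].
function luma([r, g, b]) { return r * 0.25 + g * 0.5 + b * 0.25; }
function chromaCo([r, , b]) { return r * 0.5 - b * 0.5; }
```

Because the YCoCg transform is linear and the chroma of vec3(1.0) is zero, the two evaluation orders match term for term; the copper-like reflection coefficient below is just an illustrative input.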
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
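The same weighting is easy to sanity-check outside the shader. A JavaScript mirror operating on [Y, C] pairs (a sketch of the logic above, not production code):

```javascript
// Mirror of reconstructChromaHDR: weight each neighbor's chroma by luminance
// similarity to the center, guarding black samples and zero total weight.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chroma = 0.0;
  for (const [y, c] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(y - center[0]));
    if (y < 1e-5) w = 0.0; // guard: sample is black / at infinity
    totalWeight += w;
    chroma += c * w;
  }
  return totalWeight > 1e-5 ? [center[1], chroma / totalWeight] : [0.0, 0.0];
}
```

With equal-luminance neighbors the result is a plain average of their chroma, and a black neighbor contributes nothing; when every neighbor is rejected the function falls back to zero chroma.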
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]
Physically Based Shading
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell, VRay, Arnold, etc.
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets: spaces, furniture, lighting, materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry / Shadow-Masking Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren-Nayar [Oren 94]
Standard Material Parameterization- Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of
specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (skin, foliage, snow)
- Anisotropic Gloss (brushed metal, hair, fabrics)
- Layered Materials (clear coat)
- Partially Metallic / Filtered Hybrid Materials (car paints, sci-fi materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
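As a sanity check outside the shader, the same pack / unpack arithmetic can be mocked up on the CPU. A small Python sketch (function names are illustrative, not from the deck; numpy.float32 stands in for a 32-bit shader float) demonstrates both the multiply-as-shift trick and the 2^24 precision ceiling:

```python
import numpy as np

def pack_uint8_8_8(r, g, b):
    # "Shift left" with multiplies: r << 16, g << 8, b, all in float math.
    f = np.float32
    return f(r) * f(65536.0) + f(g) * f(256.0) + f(b)

def unpack_uint8_8_8(packed):
    # "Shift right" with divisions; divides by powers of two are exact in float.
    f = np.float32
    x = np.floor(packed / f(65536.0))
    temp = np.floor(packed / f(256.0))
    y = temp - x * f(256.0)
    z = packed - temp * f(256.0)
    return int(x), int(y), int(z)

# Round-trip a value.
assert unpack_uint8_8_8(pack_uint8_8_8(12, 34, 56)) == (12, 34, 56)

# A 32-bit float holds every integer up to 2^24 exactly...
assert float(np.float32(2**24 - 1)) == float(2**24 - 1)
# ...but 2^24 + 1 is not representable: it rounds back to 2^24.
assert float(np.float32(2**24 + 1)) == float(2**24)
```

The largest packed value, 255 << 16 | 255 << 8 | 255 = 16777215 = 2^24 - 1, sits exactly at the float32 ceiling, which is why three 8-bit fields is the practical limit.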
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
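For illustration, a minimal CPU-side sketch of octahedral encode / decode in the spirit of [Cigolle 14] (function names are hypothetical, not from the talk): project the unit normal onto the octahedron |x|+|y|+|z| = 1, fold the lower hemisphere over, and remap into the 0..1 square.

```python
import numpy as np

def _sign_not_zero(v):
    # sign() variant that maps 0 to +1, so folds at the octahedron seams stay valid.
    return np.where(v >= 0.0, 1.0, -1.0)

def octahedron_encode(n):
    # n: unit-length 3-vector -> 2 values in the 0..1 storage domain.
    n = np.asarray(n, dtype=np.float64)
    n = n / np.abs(n).sum()          # project onto the octahedron
    xy = n[:2].copy()
    if n[2] < 0.0:                   # fold the lower hemisphere over
        xy = (1.0 - np.abs(n[[1, 0]])) * _sign_not_zero(n[:2])
    return xy * 0.5 + 0.5            # remap -1..1 into 0..1

def octahedron_decode(e):
    e = np.asarray(e, dtype=np.float64) * 2.0 - 1.0
    z = 1.0 - np.abs(e[0]) - np.abs(e[1])
    xy = e.copy()
    if z < 0.0:                      # unfold the lower hemisphere
        xy = (1.0 - np.abs(e[[1, 0]])) * _sign_not_zero(e)
    v = np.array([xy[0], xy[1], z])
    return v / np.linalg.norm(v)

n = np.array([1.0, -2.0, 3.0])
n /= np.linalg.norm(n)
assert np.allclose(octahedron_decode(octahedron_encode(n)), n)
```

The round trip is exact up to floating point; the discretization error in the G-Buffer comes only from quantizing the two encoded values.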
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
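For reference, one common YCoCg variant (this is the standard transform, not code from the deck) and its exact inverse; because the inverse is pure adds and subtracts of dyadic-weighted terms, the round trip is lossless:

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance-like term
    co =  0.5  * r            - 0.5 * b   # orange chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green chroma
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above.
    return y + co - cg, y + cg, y - co - cg

# White maps to full luminance, zero chroma.
assert rgb_to_ycocg(1.0, 1.0, 1.0) == (1.0, 0.0, 0.0)
# Dyadic inputs round-trip bit-exactly.
assert ycocg_to_rgb(*rgb_to_ycocg(0.5, 0.25, 0.75)) == (0.5, 0.25, 0.75)
```

Since Co and Cg live in -0.5..0.5 while storage is 0..1, this is also where the CHROMA_BIAS used later in the packing code comes from.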
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
vec4 res;

// Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution)
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  res.velocity = vec2(1.41521356); // sqrt(2) + 1e-3
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
  gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
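Why this works: Y is a weighted sum of the RGB channels (weights summing to one) and each chroma is a zero-sum combination, so applying Schlick per RGB channel and then converting to YCoCg gives exactly the luminance and chroma rules used here. A small Python check (the YCoCg matrix is the standard transform and the names are illustrative, not code from the talk):

```python
def schlick_rgb(v_dot_h, r0):
    # Schlick's approximation applied per RGB channel.
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in r0)

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y: weights sum to 1
            0.5 * r - 0.5 * b,               # Co: weights sum to 0
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg: weights sum to 0

def schlick_ycocg(v_dot_h, r0_ycocg):
    # Luminance: standard Schlick. Chroma: inverted, goes to zero at grazing.
    p = (1.0 - v_dot_h) ** 5.0
    y, co, cg = r0_ycocg
    return ((1.0 - y) * p + y, co * -p + co, cg * -p + cg)

r0 = (0.9, 0.6, 0.2)  # gold-ish F0, illustrative
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    via_rgb = rgb_to_ycocg(schlick_rgb(v_dot_h, r0))
    via_yc = schlick_ycocg(v_dot_h, rgb_to_ycocg(r0))
    assert all(abs(a - b) < 1e-12 for a, b in zip(via_rgb, via_yc))
```

At grazing (vDotH = 0, p = 1) the chroma terms cancel to zero and luminance goes to 1, matching the physically expected white fresnel edge.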
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
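Mirrored on the CPU, the same weighting scheme is easy to reason about; a Python sketch of the GLSL above (names and the exact sensitivity constant are illustrative) shows that neighbors whose luminance matches the center dominate the reconstructed chroma:

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, known_chroma). neighbors: four (luma, other_chroma) cross samples.
    weights = []
    for luma, _chroma in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        if luma < 1e-5:   # guard the case where a sample is black
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:     # guard the case where all weights are 0
        return (0.0, 0.0)
    recon = sum(w * c for w, (_l, c) in zip(weights, neighbors)) / total
    return (center[1], recon)

# Three luminance-matched neighbors outvote one wildly bright mismatched sample.
out = reconstruct_chroma_hdr((0.5, 0.2), [(0.5, 0.4), (0.5, 0.4), (0.01, 9.0), (0.5, 0.4)])
assert out[0] == 0.2 and abs(out[1] - 0.4) < 0.05
```

The exponential falloff is what keeps chroma from bleeding across strong luminance edges, which is exactly where the checkerboard artifacts would otherwise appear.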
Thanks for listening
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Approach
- Physically Based Shading
- Deferred Rendering
- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]
Physically Based Shading
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell VRay Arnold etc
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets spaces furniture lighting
materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
Physically Based Shading
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell, V-Ray, Arnold, etc.
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets: spaces, furniture, lighting, materials
- PBS supports reusability across projects
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry (Shadow-Masking) Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren-Nayar [Oren 94]
Standard Material Parameterization: Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend / Add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs. texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges: Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets: not well supported
Challenges: Storage
- Reading from render buffer depth: getting better
Challenges: Storage
- Texture float support: quite good
Challenges: Storage
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
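The 2^24 limit is easy to check on the CPU. A small JavaScript sketch (an illustration, not from the deck) using Math.fround to round to float32:

```javascript
// float32 represents every integer up to 2^24 exactly; one past that,
// rounding collapses neighboring integers and packed values collide.
const f32 = Math.fround; // rounds a JS double to the nearest float32

console.log(f32(16777215) === 16777215); // 2^24 - 1: exact
console.log(f32(16777216) === 16777216); // 2^24: exact
console.log(f32(16777217) === 16777216); // 2^24 + 1: collides with 2^24
```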
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}
float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}
vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
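The same pack / unpack arithmetic can be exercised on the CPU. A JavaScript port of the two helpers above (a sketch for testing; the function names mirror the GLSL, the rest is plain JS):

```javascript
// Shift left with multiplies: x occupies bits 16-23, y bits 8-15, z bits 0-7.
function uint8_8_8_to_uint24([x, y, z]) {
  return x * 65536 + y * 256 + z;
}

// Shift right with divides, then subtract off the higher-order bytes.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, -x * 256 + temp, -temp * 256 + raw];
}

// Round-trip a few byte triples, including the 16777215 boundary.
for (const triple of [[0, 0, 0], [1, 2, 3], [255, 255, 255], [0, 255, 0]]) {
  const out = uint24_to_uint8_8_8(uint8_8_8_to_uint24(triple));
  console.assert(out.every((v, i) => v === triple[i]), "round-trip failed");
}
```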
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
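For reference, the octahedral mapping fits in a few lines. A CPU-side JavaScript sketch of the [Cigolle 14] encoding (an illustration, not Floored's exact shader code):

```javascript
// Project the unit normal onto the octahedron (L1 normalize), then fold the
// lower hemisphere over the diagonals so the result fits a 2D 0..1 square.
function octEncode([x, y, z]) {
  const invL1 = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1, v = y * invL1;
  if (z < 0) {
    const fu = (1 - Math.abs(v)) * (u >= 0 ? 1 : -1);
    const fv = (1 - Math.abs(u)) * (v >= 0 ? 1 : -1);
    u = fu; v = fv;
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // uses the full 0..1 domain
}

// Inverse: unfold the lower hemisphere, then renormalize to the sphere.
function octDecode([u, v]) {
  let x = u * 2 - 1, y = v * 2 - 1;
  const z = 1 - Math.abs(x) - Math.abs(y);
  if (z < 0) {
    const t = x;
    x = (1 - Math.abs(y)) * (t >= 0 ? 1 : -1);
    y = (1 - Math.abs(t)) * (y >= 0 ? 1 : -1);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Without quantization the round trip is exact up to float error; the 14-bit quantization in the G-buffer is what introduces the discretization.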
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
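The YCoCg basis itself is just a linear transform. A CPU-side JavaScript sketch using the standard YCoCg matrix (an illustration, not Floored's exact shader helpers):

```javascript
// RGB -> YCoCg: Y is a luminance approximation, Co/Cg are chroma axes.
function rgbToYcocg([r, g, b]) {
  return [
     r / 4 + g / 2 + b / 4, // Y: luminance
     r / 2 - b / 2,         // Co: orange-blue chroma
    -r / 4 + g / 2 - b / 4, // Cg: green-purple chroma
  ];
}

// Exact inverse of the transform above.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note that white maps to Y = 1 with zero chroma, which is why dropping chroma precision is far less visible than dropping luminance precision.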
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL; could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where a mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
vec4 res;
// Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
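The velocity quantization can be mirrored on the CPU. A JavaScript sketch with assumed example values for the resolution and sub-pixel step count (these constants are illustrative, not Floored's actual settings):

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4; // assumed example value
const RESOLUTION = 1920;             // pixels along the axis being quantized

// -1..1 screen-space velocity -> integer 0..1023
// (512 = zero motion; 0 and 1023 clamp at the representable limits)
function quantizeVelocity(v) {
  let q = v * RESOLUTION * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512), 511));
  return q + 512;
}

// Inverse of the mapping above (exact only up to the quantization step).
function dequantizeVelocity(q) {
  return (q - 512) / (RESOLUTION * SUB_PIXEL_PRECISION_STEPS * 0.5);
}
```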
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract bool as sign()
res.w = components.depth * components.metallic;
return res;
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
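The sign trick is trivial to sanity-check outside the shader. A JavaScript sketch (depth assumed strictly positive, since 0 is reserved for infinity):

```javascript
// Metallic rides in the sign bit of depth: negative depth means non-metallic.
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1 : -1);
}

// Mirrors the shader's abs() / sign() decode.
function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0 };
}
```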
Packing Challenges
- Must balance packing efficiency with cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);
// Early out if sampling infinity
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}
- Decode Depth
Decode G-Buffer: RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer: RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
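Why this is legitimate: Schlick's formula is affine in the reflectance per channel, and YCoCg is a linear change of basis, so evaluating fresnel directly on (Y, chroma) matches transforming the RGB result afterwards. A JavaScript check (rgbToYcocg is the standard YCoCg matrix; the reflectance value is an arbitrary example):

```javascript
const rgbToYcocg = ([r, g, b]) =>
  [r / 4 + g / 2 + b / 4, r / 2 - b / 2, -r / 4 + g / 2 - b / 4];

function fresnelSchlickRGB(vDotH, r0) {
  const power = Math.pow(1 - vDotH, 5);
  return r0.map((c) => (1 - c) * power + c);
}

function fresnelSchlickYC(vDotH, [y0, c0]) {
  const power = Math.pow(1 - vDotH, 5);
  return [(1 - y0) * power + y0, c0 * -power + c0];
}

const r0 = [0.9, 0.6, 0.2]; // example reflectance
const vDotH = 0.3;
const viaRGB = rgbToYcocg(fresnelSchlickRGB(vDotH, r0)); // [Y, Co, Cg]
const [y0, co0] = rgbToYcocg(r0);
const viaYC = fresnelSchlickYC(vDotH, [y0, co0]);        // [Y, Co]
console.assert(Math.abs(viaYC[0] - viaRGB[0]) < 1e-12);
console.assert(Math.abs(viaYC[1] - viaRGB[1]) < 1e-12);
```

The chroma component sees no contribution from the white (1,1,1) grazing term, because white has zero chroma; that is exactly the "inverted" chroma behavior noted above.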
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
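A CPU-side JavaScript port of the function above is handy for unit testing the weighting logic (a direct translation of the GLSL; the 25.0 sensitivity is as decoded from the slide, an assumption where the transcript is ambiguous):

```javascript
// center: [luma, chroma]; neighbors: four [luma, chroma] cross samples.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0, weightedChroma = 0;
  for (const [luma, chroma] of neighbors) {
    // Weight neighbors by luminance similarity to the center sample
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0; // guard the case where a sample is black
    totalWeight += w;
    weightedChroma += chroma * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], weightedChroma / totalWeight] : [0, 0];
}
```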
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Physically Based Shading
- Scalable Quality
- Architectural visualization industry has embraced PBS in offline
rendering for quite some time
- Maxwell VRay Arnold etc
- High Standards
- Vocabulary of PBS connects real time and offline disciplines
- Offline can more readily consume real time assets
- Real time can more readily consume offline assets
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets spaces furniture lighting
materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend-add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend-add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs. texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready! Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading depth from the renderbuffer: getting better
Challenges Storage
- Texture float support: quite good
Challenges Storage
- Texture half float support: getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all of our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 exactly
- Step size increases for integers > 2^24
- Range: 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 exactly
- Step size increases for integers > 2^11
- Range: 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
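The same shift-by-multiply arithmetic is easy to sanity-check outside the shader. This JS port of the pack / unpack pair round-trips every 8-8-8 triple exactly, since JS numbers cover the uint24 range without rounding:

```javascript
// Pack three 8-bit integers into one float-representable uint24
// using multiplies as left shifts (no bitwise operators, as in GLSL).
function uint8_8_8_to_uint24(r, g, b) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return r * SHIFT_LEFT_16 + (g * SHIFT_LEFT_8 + b);
}

// Unpack via divides as right shifts, mirroring the GLSL decode.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / (256.0 * 256.0));
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}
```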
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, which is not
  // enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- A single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, which is not
  // enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, which is not
  // enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
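A sketch of the octahedral encode / decode pair from [Cigolle 14], ported to JS (helper names ours). The encode projects the unit sphere onto an octahedron and unfolds it into a 2D square in [-1, 1]^2; the decode reverses the fold and renormalizes:

```javascript
// sign() that never returns 0, as required by the octahedral fold.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octEncode(n) {
  // Project onto the octahedron (L1 normalization).
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x, y];
}

function octDecode(e) {
  let x = e[0];
  let y = e[1];
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Before quantization the round trip is exact, which is why the only error in the G-Buffer normal comes from the 14-bit (or 12-bit, 9-bit) discretization step.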
Emission
- Don't pack emission. Forward render it instead
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. It is not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- The human perceptual system is sensitive to luminance shifts
- The human perceptual system is fairly insensitive to chroma shifts
- Color swatches and textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
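The RGB to YCoCg transform is exactly invertible, so subsampling Co/Cg is the only lossy step in the checkerboarded scheme. A JS sketch of the pair:

```javascript
// RGB <-> YCoCg: Y is luminance-like; Co/Cg are chroma offsets.
function rgbToYcocg([r, g, b]) {
  return [
     r * 0.25 + g * 0.5 + b * 0.25, // Y
     r * 0.5            - b * 0.5,  // Co
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Grayscale values land at zero chroma, which is why the checkerboard interlacing is invisible on neutral surfaces.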
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit
- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: NormalX 12 bits, NormalY 12 bits
B: Depth 31 bits, Metallic 1 bit
- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL, and could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits, ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits
A: Depth 15 bits, Metallic 1 bit
- RGBA Half-float, 64 bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate: probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit
- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctahedron = octahedronEncode(components.normal);
  vec2 normalOctahedronQuantized;
  normalOctahedronQuantized.x = normalizedFloat_to_uint14(normalOctahedron.x);
  normalOctahedronQuantized.y = normalizedFloat_to_uint14(normalOctahedron.y);

  // Takes in -1.0 to 1.0 screen space velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctahedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctahedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
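The sign trick is cheap to mirror on the CPU side. A JS sketch (helper names ours; assumes view-space depth is strictly positive, as in the early-out check used at decode time):

```javascript
// Metallic is a boolean, so store it in the sign of depth:
// negate depth for non-metallic surfaces, recover both with abs() / sign-test.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```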
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctahedron;
normalOctahedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctahedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octahedronDecode(normalOctahedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! (Four detail crops, each comparing RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, and RGB Lighting 25%)
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
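The two power terms track each other closely over the visible range, which is why the swap is safe. A quick JS check of the fit (a sketch, not from the deck):

```javascript
// Schlick's original power term...
function schlickPower(vDotH) {
  return Math.pow(1.0 - vDotH, 5.0);
}

// ...and Lagarde's spherical gaussian fit, which replaces pow with exp2.
function sphericalGaussianPower(vDotH) {
  return Math.pow(2.0, (-5.55473 * vDotH - 6.98316) * vDotH);
}
```

Across vDotH in [0, 1] the two stay within about 0.005 of each other, well below what a 10-bit light accumulation target can resolve.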
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
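A JS port of the same reconstruction (same constants), taking the center [luma, chroma] sample and its four cross neighbors. Neighbors whose luminance matches the center contribute the most chroma weight:

```javascript
// Weight each neighbor's chroma by luminance similarity to the center;
// exp2 falloff, with black samples and zero totals guarded as in the GLSL.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let weight = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where the sample is black
    weight *= luma >= 1e-5 ? 1.0 : 0.0;
    totalWeight += weight;
    chromaSum += chroma * weight;
  }
  // Guard the case where all weights are ~0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

When all four neighbors share the center's luminance, the result is simply the average of their chroma, so flat regions reconstruct exactly.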
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/10.1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Physically Based Shading
- Authoring cost is high but so is reusability
- Floored has a variety of art assets spaces furniture lighting
materials
- PBS supports reusability across projects
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encode
float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
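Because this is plain float arithmetic, the pack / unpack pair can be mirrored on the CPU for quick checks. A minimal Python sketch (illustrative only; names follow the GLSL above, with the vec3 replaced by separate arguments):

```python
import math

def normalized_float_to_uint8(raw):
    # Quantize a 0..1 float to 0..255, as in the GLSL encode.
    return math.floor(raw * 255.0)

def uint8_8_8_to_uint24(x, y, z):
    # "Shift left" with multiplies: x << 16 | y << 8 | z, in float-friendly form.
    return x * 256.0 * 256.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    # "Shift right" with divides; recover each byte by subtracting
    # the higher bytes back out, exactly as the GLSL decode does.
    x = math.floor(raw / (256.0 * 256.0))
    temp = math.floor(raw / 256.0)
    y = temp - x * 256.0
    z = raw - temp * 256.0
    return x, y, z
```

Every value in 0 to 2^24 - 1 survives the round trip, because doubles (and highp 32-bit floats) represent those integers exactly.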
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
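The same ID round-trip idea can be prototyped on the CPU before writing the GPU passes. A NumPy sketch that keeps all arithmetic in float32 to approximate highp GPU floats (an assumption: it only samples the uint24 domain for speed, whereas the GPU version covers all 2^24 IDs in one 4k x 4k pass):

```python
import numpy as np

def uint24_to_uint8_8_8(raw):
    # float32 throughout, approximating highp GPU float arithmetic.
    x = np.floor(raw / np.float32(65536.0))
    temp = np.floor(raw / np.float32(256.0))
    return x, temp - x * np.float32(256.0), raw - temp * np.float32(256.0)

def uint8_8_8_to_uint24(x, y, z):
    return x * np.float32(65536.0) + y * np.float32(256.0) + z

# Stand-ins for per-pixel IDs: domain edges plus a random interior sample.
rng = np.random.default_rng(0)
ids = np.unique(np.concatenate([
    np.arange(0, 4096, dtype=np.float32),
    np.float32(2**24 - 1) - np.arange(0, 4096, dtype=np.float32),
    rng.integers(0, 2**24, 100_000).astype(np.float32),
]))
x, y, z = uint24_to_uint8_8_8(ids)
assert np.array_equal(uint8_8_8_to_uint24(x, y, z), ids)  # every ID survives
```

Divisions and multiplies by powers of two are exact in binary floating point, which is why the round trip holds for the whole domain.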
Packing Unit Test Single Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
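The octahedral mapping is compact enough to sketch outside the shader. A Python mirror of the encode / decode pair after [Cigolle 14] (illustrative only: it omits the quantization our G-Buffer applies, and sign_not_zero is a helper name of our own, matching the GLSL-style sign() that must never return 0 on the fold seams):

```python
import numpy as np

def sign_not_zero(v):
    # Like GLSL sign(), but returns +1.0 for 0.0 so the fold stays invertible.
    return np.where(v >= 0.0, 1.0, -1.0)

def octahedron_encode(n):
    # n: unit-length vec3. Project onto the octahedron |x|+|y|+|z| = 1,
    # fold the lower hemisphere over the diagonals, remap -1..1 to 0..1.
    p = n / np.abs(n).sum()
    if p[2] < 0.0:
        p[:2] = (1.0 - np.abs(p[[1, 0]])) * sign_not_zero(p[:2])
    return p[:2] * 0.5 + 0.5

def octahedron_decode(e):
    # e: vec2 in 0..1. Inverse of the transform above.
    e = e * 2.0 - 1.0
    n = np.array([e[0], e[1], 1.0 - np.abs(e[0]) - np.abs(e[1])])
    if n[2] < 0.0:
        n[:2] = (1.0 - np.abs(n[[1, 0]])) * sign_not_zero(n[:2])
    return n / np.linalg.norm(n)
```

Without quantization the round trip is exact up to floating point error; the discretization error only appears once the two components are snapped to their bit budget.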
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
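The transform itself is only a handful of multiplies and adds, and its inverse is adds and subtracts alone. A Python sketch of the RGB to YCoCg pair (illustrative; the shader versions used later are named rgbToYcocg / YcocgToRgb):

```python
def rgb_to_ycocg(r, g, b):
    # Y is the luminance-like component; Co/Cg span -0.5..0.5 for 0..1 RGB.
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r            - 0.5  * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact algebraic inverse: no multiplies at all.
    return y + co - cg, y + cg, y - co - cg
```

Each pixel then stores Y plus only one of (Co, Cg), alternating in a checkerboard; the other chroma component is reconstructed from neighbors at decode time.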
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;
    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
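The sign trick is worth spelling out. Our GLSL multiplies depth by a metallic flag of +1 or -1; a tiny Python sketch of the same idea, with the flag as a bool (illustrative, not the shader code):

```python
def encode_depth_metallic(depth_view_space, metallic):
    # View-space depth is strictly positive, so its sign bit is free storage:
    # metals keep positive depth, non-metals get negated depth.
    # (Exactly 0.0 stays reserved to mean "sampled infinity".)
    return depth_view_space if metallic else -depth_view_space

def decode_depth_metallic(w):
    # Returns (depth, is_metallic).
    return abs(w), w > 0.0
```

Decoding depth is a single abs(), which is what makes the fast depth-only decode path for AO-style shaders so cheap.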
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }
- Decode Depth
Decode G-Buffer RGB Lighting
    res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;
    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
        // sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }
- Decode Velocity
Decode G-Buffer RGB Lighting
    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
    // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100%  YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
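The claimed grazing-angle behavior (luminance lerps to 1.0 while chroma lerps to 0.0, so grazing highlights go white) is easy to verify numerically. A scalar Python mirror of the YC Fresnel (illustrative, not the shader code):

```python
def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    # f0_y / f0_c: luminance and one chroma component of the
    # reflection coefficient at normal incidence.
    power = (1.0 - v_dot_h) ** 5.0
    y = (1.0 - f0_y) * power + f0_y   # lerps from f0_y toward 1.0
    c = f0_c * -power + f0_c          # lerps from f0_c toward 0.0
    return y, c
```

At v_dot_h = 1 (head-on) it returns (f0_y, f0_c) unchanged; at v_dot_h = 0 (grazing) it returns (1.0, 0.0), the achromatic limit.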
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
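The same weighting logic, mirrored in Python for reference (an illustrative port of the GLSL above; the vec4 lanes become a loop over the four cross samples):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, known_chroma) for this pixel.
    # neighbors: four (luma, other_chroma) cross-pattern samples.
    # Each neighbor is weighted by luminance similarity: exp2 falls off
    # fast as the luma delta grows, so chroma does not bleed across edges.
    acc, total = 0.0, 0.0
    for luma, chroma in neighbors:
        weight = 2.0 ** (-sensitivity * abs(luma - center[0]))
        if luma < 1e-5:          # guard: sample is black / at infinity
            weight = 0.0
        acc += chroma * weight
        total += weight
    if total <= 1e-5:            # guard: all weights are zero
        return (0.0, 0.0)
    return (center[1], acc / total)
```

With four equal-luminance neighbors this degenerates to a plain average of their chroma, which is the expected flat-region behavior.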
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
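The octahedral mapping is simple enough to model on the CPU. Below is a minimal JavaScript sketch of the encode / decode pair (the fold follows [Cigolle 14]; the names mirror the shader helpers, which the slides spell octohedronEncode / octohedronDecode):

```javascript
// Octahedral normal encoding: map a unit vector to [0,1]^2 and back.
// The lower hemisphere is folded over the diagonals ([Cigolle 14]).
function octahedronEncode(n) {
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1, y = n[1] / l1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  // Remap from [-1,1] to [0,1] so the full texture domain is used
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

function octahedronDecode(e) {
  let x = e[0] * 2.0 - 1.0, y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

The round trip is exact up to floating point error; the GPU version additionally quantizes each component to 14 bits.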
Emission
- Don't pack emission; forward render it
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer
  - Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
  - Human perceptual system is sensitive to luminance shifts
  - Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
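The YCoCg transform itself is just a cheap linear change of basis. A JavaScript sketch using the standard YCoCg coefficients (for RGB in 0 to 1, Co and Cg land in -0.5 to 0.5, which is why the packed chroma later gets a 0.5 bias):

```javascript
// RGB -> YCoCg: Y is luminance; Co / Cg are chroma axes.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Exact inverse: three adds, no multiplies.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```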
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
  - i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL; could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64bpp
- Half-float target is more challenging
- Probably not practical; depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
  // quantized pixel velocity; -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing: Depth and Metallic
  // Pack depth and metallic together:
  // if not metallic, negate depth; extract the bool later as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
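The uint10_14_to_uint24 helpers used above aren't shown on the slides, but presumably follow the same shift-by-multiply scheme as the 8/8/8 packer. A JavaScript model of that assumption (exact, since every integer below 2^24 is representable in a 32-bit float):

```javascript
// Pack a 10-bit value into the high bits and a 14-bit value into the
// low bits of one float-representable 24-bit integer.
const SHIFT_14 = 16384.0; // 2^14

function uint10_14_to_uint24([hi10, lo14]) {
  return hi10 * SHIFT_14 + lo14;
}

function uint24_to_uint10_14(raw) {
  const hi10 = Math.floor(raw / SHIFT_14);
  return [hi10, raw - hi10 * SHIFT_14];
}
```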
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
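As an aside (this is not from the slides): a common way to size a point light's sphere proxy is to solve the decay curve for the distance at which the light's contribution drops below a visible cutoff. A sketch assuming inverse-square decay; the cutoff parameter is hypothetical:

```javascript
// Radius beyond which an inverse-square light contributes less than
// `cutoff` luminance; pixels outside the proxy sphere can be skipped.
function pointLightProxyRadius(luminousIntensity, cutoff) {
  return Math.sqrt(luminousIntensity / cutoff);
}
```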
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes: sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout

  // Color is stored in a non-linear space (sRGB -> YCoCg) to distribute
  // precision perceptually; return it as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
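The getCheckerboard / checkerboardInterlace helpers are not spelled out on the slides. A plausible CPU-side model, assuming the parity of the integer pixel coordinate selects which chroma plane a pixel stores (the shader versions derive the pixel coordinate from uv and the resolution):

```javascript
// +1 on even-parity pixels (store Co), -1 on odd-parity pixels (store Cg).
function getCheckerboard(px, py) {
  return ((px + py) % 2 === 0) ? 1.0 : -1.0;
}

function checkerboardInterlace(co, cg, px, py) {
  return getCheckerboard(px, py) > 0.0 ? co : cg;
}
```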
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
  [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model; we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
  - Keep fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
    - Luminance calculation is the same
    - Chroma calculation is inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings, where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth / stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
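The same weighting math is easy to sanity check on the CPU. A JavaScript model of the reconstruction above (same SENSITIVITY constant and the same two guards; neighbor samples are [luminance, chroma] pairs):

```javascript
// Luminance-similarity weighted average of neighbor chroma samples.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0, chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard: sample is black (e.g. at infinity)
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard: all weights are ~0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```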
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul, our very own talent scout: josh@floored.com
Thanks: Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats, http://webglstats.com, 2014
[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production, Naty Hoffman, Siggraph 2010,
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf
[Lagarde 11] Feeding a Physically-Based Shading Model, Sébastien Lagarde, 2011,
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/
[Burley 12] Physically-Based Shading at Disney, Brent Burley, 2012,
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf
[Karis 13] Real Shading in Unreal Engine 4, Brian Karis, 2013,
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final, Aras Pranckevičius, 2009,
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014,
http://jcgt.org/published/0003/02/01/
[Mavridis 12] The Compact YCoCg Frame Buffer, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012,
http://jcgt.org/published/0001/01/02/
[Waveren 07] Real-Time YCoCg-DXT Compression, J.M.P. van Waveren, Ignacio Castaño, 2007,
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf
[Geldreich 04] Deferred Lighting and Shading, Rich Geldreich, Matt Pritchard, John Brooks, 2004,
https://sites.google.com/site/richgel99/home
[Hoffman 09] Deferred Lighting Approaches, Naty Hoffman, 2009,
http://www.realtimerendering.com/blog/deferred-lighting-approaches/
Resources
[Shishkovtsov 05] Deferred Shading in STALKER, Oles Shishkovtsov, 2005,
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009,
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt
[Mittring 09] A Bit More Deferred - CryEngine 3, Martin Mittring, 2009,
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3
[Sousa 13] The Rendering Technologies of Crysis 3, Tiago Sousa, 2013,
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3
[Pranckevičius 13] Physically Based Shading in Unity, Aras Pranckevičius, Game Developers Conference 2013,
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf
[Olsson 11] Tiled Shading, Ola Olsson, Ulf Assarsson, 2011,
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading
Resources
[Billeter 12] Clustered Deferred and Forward Shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012,
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
[Yang 09] Amortized Supersampling, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009,
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf
[Herzog 10] Spatio-Temporal Upsampling on the GPU, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010,
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf
[Wronski 14] Temporal Supersampling and Antialiasing, Bart Wronski, 2014,
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/
[Karis 14] High Quality Temporal Supersampling, Brian Karis, 2014,
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007,
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs, Eric Heitz, 2014,
http://jcgt.org/published/0003/02/03/paper.pdf
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, Christophe Schlick, 1994,
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, Sébastien Lagarde, 2012,
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/
[Oren 94] Generalization of Lambert's Reflectance Model, Michael Oren, Shree K. Nayar, 1994,
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf
Physically Based Shading
Physically Based Shading
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
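The same quantize / dequantize round trip can be exercised on the CPU. The slides never give SUB_PIXEL_PRECISION_STEPS, so the value below is an assumed one, and the scalar sentinel stands in for the deck's vec2; note the decode appears to return a UV-space delta (the 0.5 in the encode halves the -1..1 NDC range):

```javascript
// Assumed sub-pixel resolution; the real constant is not shown in the slides.
const SUB_PIXEL_PRECISION_STEPS = 4.0;

// NDC-space velocity (-1..1) -> 0..1023 integer, mirroring encodeGBuffer.
function quantizeVelocity(v, resolution) {
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0;
}

// 0..1023 integer -> UV-space velocity, mirroring decodeGBuffer. Values at the
// clamp edges decode to an out-of-screen sentinel for culling in later passes.
function dequantizeVelocity(q, resolution) {
  const centered = q - 512.0;
  if (Math.abs(centered) > 510.0) return Math.SQRT2 + 1e-3; // sentinel
  return centered / (resolution * SUB_PIXEL_PRECISION_STEPS);
}
```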
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct the missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
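The rgbToYcocg / YcocgToRgb pair used by encode and decode is just a pair of linear transforms, so it is easy to verify in isolation. A JavaScript sketch of the standard YCoCg matrices (illustrative, not Floored's exact helpers):

```javascript
// Forward transform: luminance Y plus two chroma axes Co and Cg.
function rgbToYcocg(r, g, b) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,  // Y:  luminance
    r * 0.5 - b * 0.5,              // Co: orange-blue chroma
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg: green-purple chroma
  ];
}

// Inverse transform back to RGB.
function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}

// Round trip is exact for dyadic inputs like these.
const [y, co, cg] = rgbToYcocg(0.75, 0.5, 0.25);
const [r, g, b] = ycocgToRgb(y, co, cg);
console.log(r, g, b); // 0.75 0.5 0.25
```

For inputs in 0..1, Co and Cg land in the -0.5..0.5 range, which is exactly why encodeGBuffer adds CHROMA_BIAS before storing the interlaced chroma.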
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
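Because Schlick's approximation is affine in the reflection coefficient and YCoCg is a linear transform, the YC evaluation above must agree with evaluating in RGB and then transforming the result. A quick JavaScript consistency check (the specular color and vDotH are arbitrary test values, not from the slides):

```javascript
// RGB Schlick, applied per component.
function fresnelSchlick(vDotH, r) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return r.map((c) => (1.0 - c) * power + c);
}

// YC Schlick: luminance as usual, chroma inverted (fades to 0 as power -> 1).
function fresnelSchlickYC(vDotH, [ry, rc]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - ry) * power + ry, rc * -power + rc];
}

// Y and Co rows of the YCoCg transform.
function luminanceY(r, g, b) { return r * 0.25 + g * 0.5 + b * 0.25; }
function chromaCo(r, g, b) { return r * 0.5 - b * 0.5; }

const rgb = [0.9, 0.6, 0.3]; // arbitrary warm specular color
const vDotH = 0.4;
const viaRgb = fresnelSchlick(vDotH, rgb);
const [fy, fco] = fresnelSchlickYC(vDotH, [luminanceY(...rgb), chromaCo(...rgb)]);
console.assert(Math.abs(fy - luminanceY(...viaRgb)) < 1e-12, "Y mismatch");
console.assert(Math.abs(fco - chromaCo(...viaRgb)) < 1e-12, "Co mismatch");
```

The Y row works out because its weights sum to 1, and the chroma row because its weights sum to 0, which is what turns the `(1 - r) * power + r` form into the inverted `r * -power + r` form.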
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
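A CPU-side sketch of the same weighting is handy for experimenting with the SENSITIVITY constant; the [Y, C] sample pairs below are hypothetical radiance values, not captures from the renderer:

```javascript
// Reconstruct the missing chroma of a center pixel from four neighbor samples,
// each given as a [luminance, chroma] pair, weighted by luminance similarity.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let weighted = 0.0;
  let totalWeight = 0.0;
  for (const [y, c] of neighbors) {
    // Down-weight neighbors whose luminance differs from the center pixel.
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(y - center[0]));
    // Zero out black samples entirely (mirrors step(1e-5, luminance)).
    w *= y >= 1e-5 ? 1.0 : 0.0;
    weighted += c * w;
    totalWeight += w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], weighted / totalWeight] : [0.0, 0.0];
}

// Neighbors whose luminance matches the center dominate the result;
// the mismatched and black samples contribute almost nothing.
const out = reconstructChromaHDR(
  [1.0, 0.2],
  [[1.0, 0.3], [1.0, 0.3], [0.1, -0.5], [0.0, 0.9]]
);
```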
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sebastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Physically Based Shading
Physically Based Shading
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren-Nayar [Oren 94]
Standard Material Parameterization - Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to G-Buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend: add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec3 vPositionScreenSpace;
varying vec3 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- and after skipping some tangential details...
G-Buffer Storage
Challenges: Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets: not well supported
Challenges: Storage
- Reading from render buffer depth: getting better
Challenges: Storage
- Texture float support: quite good
Challenges: Storage
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack three 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
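Since the packing is plain floating point arithmetic, the same pair can be sanity-checked host-side. A minimal JavaScript mirror of the GLSL above (function names mirror the shader; this is an illustrative sketch, not the deck's actual code):

```javascript
// Pack three 8-bit integers into a single float; exact because every
// integer below 2^24 is representable in a 32-bit float.
function uint888ToUint24(bytes) {
  const [x, y, z] = bytes;
  return x * 65536 + y * 256 + z; // shift left via multiply
}

// Invert with floored divides -- no bitwise operators, as in GLSL.
function uint24ToUint888(packed) {
  const x = Math.floor(packed / 65536);
  const temp = Math.floor(packed / 256);
  return [x, temp - x * 256, packed - temp * 256];
}
```

A quick roundtrip over the byte domain is the host-side analog of the exhaustive GPU test described next.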
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
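The octahedral mapping [Cigolle 14] projects the unit sphere onto an octahedron and unfolds it into the unit square. A host-side JavaScript sketch of the encode / decode pair (helper names are ours, for illustration):

```javascript
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

// Map a unit vector to the [-1, 1]^2 octahedral domain.
function octEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z); // project onto octahedron
  let u = x / l1, v = y / l1;
  if (z < 0.0) { // fold the lower hemisphere over the diagonals
    const fu = (1.0 - Math.abs(v)) * signNotZero(u);
    const fv = (1.0 - Math.abs(u)) * signNotZero(v);
    u = fu; v = fv;
  }
  return [u, v];
}

// Invert: unfold, then renormalize back onto the sphere.
function octDecode([u, v]) {
  let x = u, y = v;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    x = (1.0 - Math.abs(v)) * signNotZero(u);
    y = (1.0 - Math.abs(u)) * signNotZero(v);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Each 14-bit channel in the format below then just quantizes one of the two encoded coordinates.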
Emission
- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
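The YCoCg transform used by [Mavridis 12] is a cheap linear change of basis with an exact add/subtract inverse. A JavaScript sketch (assuming the usual Y = (R + 2G + B) / 4 weighting; helper names are ours):

```javascript
// RGB -> YCoCg: Y is luminance-like; Co/Cg are chroma offsets in [-0.5, 0.5].
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b  // Cg
  ];
}

// Exact inverse: only adds and subtracts.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note that neutral greys carry zero chroma, which is one reason subsampling Co/Cg is so forgiving for our clean, light-filled interiors.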
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. material type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
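The sign trick is easy to mirror host-side: with metallic stored as ±1, a single multiply packs, and abs() / sign() unpack. A small JavaScript illustration (names are ours, not the deck's code):

```javascript
// depth > 0 in view space; metallic is +1.0 (metal) or -1.0 (dielectric).
function packDepthMetallic(depth, metallic) {
  return depth * metallic; // sign bit carries the flag, magnitude carries depth
}

function unpackDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: w >= 0.0 ? 1.0 : -1.0 };
}
```

Zero is effectively reserved: the decode path treats depth <= 0.0 as "sampling infinity", which is why the early-out in decodeGBuffer below works.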
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting - RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting - YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we now operate on a vec2, saving a MADD and an ADD from the skipped 3rd component
YC Lighting - Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry Shadow Masking Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren-Nayar [Oren 94]
Standard Material Parameterization
- Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  // Metals: no diffuse reflectance; color drives specular reflectance
  albedo = vec3(0.0);
  specularColor = color;
} else {
  // Non-metals: color drives diffuse; specular locked to dielectric 0.04
  albedo = color;
  specularColor = vec3(0.04);
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to G-Buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16,777,215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2,048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
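Since the packing math is plain arithmetic, it can be sanity checked off the GPU. Below is a JavaScript sketch (not from the deck; names only loosely mirror the GLSL above) of the same encode / decode arithmetic:

```javascript
// Pack three 8-bit integers into one 24-bit integer. All values stay
// below 2^24, so a 32-bit float holds them exactly.
function uint8x3ToUint24(x, y, z) {
  return x * 65536 + y * 256 + z; // shift left via multiplies
}

// Inverse: shift right via divide + floor, mask via subtraction.
function uint24ToUint8x3(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  const y = temp - x * 256;
  const z = raw - temp * 256;
  return [x, y, z];
}

// Spot check a spread of the 2^24 domain; throws on any mismatch.
for (let i = 0; i < 1 << 24; i += 997) {
  const [x, y, z] = uint24ToUint8x3(i);
  if (uint8x3ToUint24(x, y, z) !== i) {
    throw new Error("round trip failed at " + i);
  }
}
```

The full 0 to 16,777,215 domain can be covered the same way, which is exactly what the 4096 x 4096 GPU unit test does.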
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
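For reference, a JavaScript sketch of octahedral encode / decode following [Cigolle 14] (illustrative helper names, not Floored's shader code):

```javascript
// Octahedral normal encoding [Cigolle 14]: project the unit sphere onto
// an octahedron, fold the lower hemisphere over the diagonals, and remap
// -1..1 to the 0..1 square.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) { // fold the lower hemisphere
    const u0 = u, v0 = v;
    u = (1.0 - Math.abs(v0)) * signNotZero(u0);
    v = (1.0 - Math.abs(u0)) * signNotZero(v0);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) { // unfold the lower hemisphere
    const u0 = u, v0 = v;
    u = (1.0 - Math.abs(v0)) * signNotZero(u0);
    v = (1.0 - Math.abs(u0)) * signNotZero(v0);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```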
Emission
- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
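The YCoCg transform pair is cheap in either direction. A JavaScript sketch (illustrative; the shader versions operate on vec3s) of the forward and inverse transforms:

```javascript
// RGB -> YCoCg: Y is a luminance-like term; Co / Cg are chroma offsets
// that land in -0.5..0.5 for RGB in 0..1 (hence the chroma bias applied
// when the packed value is stored).
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Exact inverse: adds and subtracts only.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```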
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float: 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float: 64 bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float: 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where mediump / 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128 bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together
  // If not metallic, negate depth. Extract the bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
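The depth / metallic trick in isolation, as a JavaScript sketch (illustrative, not the shader itself): with metallic stored as ±1, the pack is one multiply and the unpack is abs() and sign():

```javascript
// Store the metallic flag in the sign of view-space depth.
// depth is assumed > 0; metallic is +1.0 (metal) or -1.0 (non-metal).
function packDepthMetallic(depth, metallic) {
  return depth * metallic;
}

function unpackDepthMetallic(packed) {
  return {
    depth: Math.abs(packed),     // cheap decode for AO / ray march passes
    metallic: Math.sign(packed), // +1 or -1
  };
}
```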
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD
and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
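Because YCoCg is a linear transform of RGB and Schlick's approximation is linear in the reflection coefficient, the YC form can be verified against the RGB form directly: luminance of white is 1, giving the usual (1 - Y0) * power + Y0, while chroma of white is 0, giving C0 - C0 * power. A JavaScript check (illustrative, not from the deck):

```javascript
// Luminance and Co chroma of an RGB triple (YCoCg weights).
const lumaOf = ([r, g, b]) => 0.25 * r + 0.5 * g + 0.25 * b;
const coOf = ([r, , b]) => 0.5 * r - 0.5 * b;

function fresnelSchlickRgb(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, [y0, c0]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * power + y0, c0 * -power + c0];
}

// Evaluate both paths for an arbitrary F0 and half-angle cosine.
const f0 = [0.95, 0.64, 0.54]; // illustrative metal-like F0
const vDotH = 0.3;
const rgb = fresnelSchlickRgb(vDotH, f0);
const yc = fresnelSchlickYC(vDotH, [lumaOf(f0), coOf(f0)]);
// lumaOf(rgb) and coOf(rgb) match yc up to floating point error
```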
YC Lighting
- Write YC to RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
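A CPU-side transliteration of the reconstruction filter (JavaScript, for illustration; neighbor samples are [luminance, chroma] pairs) can help when tuning SENSITIVITY:

```javascript
// Luminance-weighted chroma reconstruction, mirroring the GLSL version.
// center is [Y, C]; neighbors is an array of four [Y, C] cross samples.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Weight by luminance similarity; zero out black (infinity) samples.
    let weight = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) weight = 0.0;
    chromaSum += chroma * weight;
    totalWeight += weight;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```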
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Material Parameterization
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function: GGX [Walter 07]
- G Geometry Shadow Masking Function: Height-Correlated Smith [Heitz 14]
- F Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material Parameterization
- Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Hybrid Materials (Car paints, Sci-Fi materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend: add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ... and after skipping some tangential details ...
G-Buffer Storage
Challenges Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
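These precision bounds are easy to verify on the CPU; a quick JavaScript check (not from the deck) using Math.fround, which rounds a double to the nearest representable float32:

```javascript
// Math.fround rounds a JS double to the nearest representable 32-bit float,
// so it can emulate what a highp float in a shader will preserve.
const MAX_EXACT = Math.pow(2, 24); // 16777216

// Every integer up to 2^24 survives the float32 round trip.
const exactBelow = Math.fround(MAX_EXACT - 1) === MAX_EXACT - 1;

// Past 2^24 the step size doubles: 2^24 + 1 collapses back to 2^24.
const collapsedAbove = Math.fround(MAX_EXACT + 1) === MAX_EXACT;
```

This is exactly why a single 32-bit float channel can hold three full bytes, but no more.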
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND, OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
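As a sanity check, the same shift-by-multiply arithmetic can be mirrored in JavaScript on the CPU before porting to GLSL; the helper names below are ours, not from the deck:

```javascript
// Pack three 8-bit integers into one 24-bit integer, float-arithmetic style:
// shift left via multiplies, exactly as the GLSL version does.
function uint888ToUint24(x, y, z) {
  return x * 256.0 * 256.0 + y * 256.0 + z;
}

// Unpack by shifting right with divides and flooring, mirroring the shader.
function uint24ToUint888(raw) {
  const x = Math.floor(raw / (256.0 * 256.0));
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}

const packed = uint888ToUint24(12, 200, 255); // 837887, safely below 2^24
const unpacked = uint24ToUint888(packed);     // [12, 200, 255]
```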
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
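For reference, a CPU sketch of the octahedral encode / decode in JavaScript, following the construction in [Cigolle 14] (helper names are ours):

```javascript
// Octahedral normal encoding: project the unit sphere onto the octahedron
// |x| + |y| + |z| = 1, fold the lower hemisphere, and bias into [0, 1].
function octEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx;
    y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

// Decode: undo the bias, reconstruct z, unfold the lower hemisphere, renormalize.
function octDecode(e) {
  const x = e[0] * 2.0 - 1.0;
  const y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  let n = [x, y, z];
  if (z < 0.0) {
    n = [
      (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0),
      (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0),
      z,
    ];
  }
  const len = Math.hypot(n[0], n[1], n[2]);
  return [n[0] / len, n[1] / len, n[2] / len];
}
```

Without quantization the round trip is exact up to floating point error, which is what makes the discretized 14-bit (or 12-bit) storage in the formats below viable.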
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
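The YCoCg transform itself is a cheap linear change of basis; a JavaScript sketch of the forward and inverse transforms (function names are ours):

```javascript
// Forward transform: Y is a luminance weighting, Co and Cg are chroma offsets.
function rgbToYcocg(r, g, b) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Inverse transform back to RGB; exact, since the basis change is linear.
function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Checkerboarding then stores Y everywhere and alternates which of Co / Cg accompanies it per pixel, halving chroma storage.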
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
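A JavaScript sketch of this quantization scheme; SUB_PIXEL_PRECISION_STEPS and the 1920-pixel resolution used in the test are illustrative values, not from the deck:

```javascript
// Velocity quantization sketch. Screen-space velocity arrives in the -1..1
// NDC range; scale into sub-pixel steps, clamp to a signed 10-bit range,
// and bias so the stored value is a non-negative integer (0..1023).
const SUB_PIXEL_PRECISION_STEPS = 4.0;

function quantizeVelocity(v, resolution) {
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0; // bias into 0..1023
}

function dequantizeVelocity(q, resolution) {
  return (q - 512.0) / (resolution * SUB_PIXEL_PRECISION_STEPS * 0.5);
}
```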
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;
return res;
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
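The sign-bit trick is easy to model on the CPU; a minimal JavaScript sketch (helper names are ours):

```javascript
// Depth and metallic share one channel: the metallic flag lives in the sign
// bit, so decode is abs() for depth and sign() for the flag.
function packDepthMetallic(depth, metallic) {
  // depth is assumed strictly positive; metallic is +1.0 or -1.0.
  return depth * metallic;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: Math.sign(packed) };
}
```

This is why the early-out in the decode pass can treat depth <= 0.0 as "sampled infinity": a valid surface always has a strictly non-zero magnitude.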
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer: RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! (detail crops: RGB Lighting 100%, YC Lighting 100%, RGB Lighting 25%, YC Lighting 25%)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
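Because Schlick's Fresnel is a lerp toward white, and white has zero chroma, the YC form agrees exactly with evaluating RGB Schlick per channel and then converting. A JavaScript check of that equivalence (helper names are ours):

```javascript
// Per-channel RGB Schlick Fresnel.
function schlickRgb(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

// YC variant from the slides: luminance behaves like scalar Schlick,
// chroma decays toward zero as the power term approaches one.
function schlickYC(vDotH, f0yc) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - f0yc[0]) * p + f0yc[0], f0yc[1] * -p + f0yc[1]];
}

// Luminance and Co chroma rows of the RGB -> YCoCg transform.
function rgbToYco([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}
```

Since the transform is linear and the white term contributes no chroma, the two evaluation orders match to floating point precision.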
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
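A CPU port of this reconstruction (a JavaScript sketch, mirroring the GLSL above) makes the weighting behavior easy to test:

```javascript
// Chroma reconstruction sketch: each neighbor is [luminance, chroma]; weight
// neighbors by how close their luminance is to the center's, with exp2 falloff.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard the case where the sample is black
    totalWeight += w;
    chromaSum += w * chroma;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

On a flat-luminance neighborhood all weights are equal and the result is a plain average; a black neighbor (a sample at infinity) contributes nothing.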
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sebastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS kept small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to G-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend / Add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs. texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth: getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
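That precision limit is easy to verify on the CPU side. A small JavaScript check (JS numbers are doubles, so Math.fround emulates storage in a 32-bit float; illustrative only, not from the deck):

```javascript
// A 32-bit float has a 24-bit significand, so every integer up to 2^24
// survives a round trip through single precision.
const MAX_EXACT = Math.pow(2, 24); // 16777216

console.log(Math.fround(MAX_EXACT - 1) === MAX_EXACT - 1); // true: still exact

// Past the limit the step size doubles, so 2^24 + 1 collapses onto 2^24.
console.log(Math.fround(MAX_EXACT + 1) === MAX_EXACT); // true: precision lost
```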
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND, OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
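Since the packing math is pure multiply/divide/floor arithmetic, it can be prototyped and debugged outside the shader. A JavaScript port of the two helpers above (names mirrored from the GLSL; a sketch for experimentation rather than the production path):

```javascript
// Pack three 8-bit integers into one float-representable uint24,
// using only multiplies, divides, and floor -- exactly like the GLSL.
function uint8_8_8_to_uint24(x, y, z) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return x * SHIFT_LEFT_16 + (y * SHIFT_LEFT_8 + z);
}

function uint24_to_uint8_8_8(raw) {
  const SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const SHIFT_RIGHT_8 = 1.0 / 256.0;
  const SHIFT_LEFT_8 = 256.0;
  const x = Math.floor(raw * SHIFT_RIGHT_16);
  const temp = Math.floor(raw * SHIFT_RIGHT_8);
  const y = -x * SHIFT_LEFT_8 + temp;
  const z = -temp * SHIFT_LEFT_8 + raw;
  return [x, y, z];
}

// Round trip a sample value.
console.log(uint24_to_uint8_8_8(uint8_8_8_to_uint24(17, 128, 255))); // [ 17, 128, 255 ]
```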
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
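For reference, the octahedral mapping can be sketched in a dozen lines. This is a generic implementation in the spirit of [Cigolle 14], written in JavaScript for illustration; it is not Floored's shader code:

```javascript
// Treat sign(0) as +1 so the fold is well defined on the axes.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

// Encode a unit vector as a 2D point in the full [0, 1]^2 domain.
function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```

A round trip through octEncode / octDecode recovers the input direction up to floating point error; it is the quantization of u and v to a fixed bit budget that introduces the discretization.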
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
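The transform pair itself is cheap and exactly invertible at full precision; the loss comes only from the subsampling and quantization. A JavaScript sketch of the standard RGB ↔ YCoCg matrices, for illustration:

```javascript
// Forward transform: luminance Y plus two chroma axes Co (orange-cyan)
// and Cg (green-purple).
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5  * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Inverse transform: exact, no information loss at full precision.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(ycocgToRgb(rgbToYcocg([0.9, 0.5, 0.1]))); // ≈ [0.9, 0.5, 0.1]
```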
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
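The sign trick is easy to see in isolation. A plain JavaScript mock of that encode / decode (illustrative; assumes view-space depth is strictly positive and metallic arrives as +1 or -1, as in the slides):

```javascript
// Encode: fold the metallic flag into the sign bit of depth.
// Assumes depth > 0 and metallic is +1 (metal) or -1 (non-metal).
function encodeDepthMetallic(depth, metallic) {
  return depth * metallic;
}

// Decode: magnitude is depth, sign is the metallic flag.
function decodeDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: Math.sign(packed) };
}

const enc = encodeDepthMetallic(4.2, -1); // a non-metallic surface
console.log(decodeDepthMetallic(enc)); // { depth: 4.2, metallic: -1 }
```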
Packing Challenges
- Must balance packing efficiency with cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
    gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Enhance: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Enhance: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Enhance: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted, approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
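A quick numeric check of the endpoint behavior described above, using a JavaScript port of the YC fresnel (illustrative, not the shader itself):

```javascript
// YC Schlick fresnel: luminance lerps toward 1 at grazing angles,
// chroma lerps toward 0 (i.e. the response goes to white).
function fresnelSchlickYC(vDotH, [f0y, f0c]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - f0y) * power + f0y, // luminance component
    f0c * -power + f0c,        // chroma component
  ];
}

const f0 = [0.5, 0.25]; // some YC reflection coefficient
console.log(fresnelSchlickYC(1.0, f0)); // normal incidence -> [ 0.5, 0.25 ]
console.log(fresnelSchlickYC(0.0, f0)); // grazing -> [ 1, 0 ], i.e. white
```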
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where the sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
httpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Standard Material Parameterization
Full Artist Control
- Albedo
- Specular Color
- Alpha
- Emission
- Gloss
- Normal
Physically Coupled
- Metallic
- Color
- Alpha
- Emission
- Gloss
- Normal
Microfacet BRDF
- Microfacet Specular
- D Normal Distribution Function GGX [Walter 07]
- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]
- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren Nayar [Oren 94]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic)
albedo = color
specularColor = vec3(004)
else
albedo = vec3(00)
specularColor = color
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Less knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Less textures better download times
- What control did we lose
- Video of non-metallic materials sweeping through physically plausible range of
specular colors
- 002 to 005 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
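A JavaScript port of the pack / unpack pair makes the round trip easy to sanity-check on the CPU. Same arithmetic as the GLSL above; `Math.fround` stands in for 32-bit float storage:

```javascript
// Pack three 8-bit integers into one float-representable 24-bit integer,
// mirroring the GLSL uint8_8_8_to_uint24 / uint24_to_uint8_8_8 pair.
function pack888(r, g, b) {
  return Math.fround(r * 65536 + g * 256 + b);
}

function unpack888(packed) {
  const r = Math.floor(packed / 65536);
  const temp = Math.floor(packed / 256);
  const g = -r * 256 + temp;
  const b = -temp * 256 + packed;
  return [r, g, b];
}

// Round trip a few representative byte triples.
for (const triple of [[0, 0, 0], [255, 255, 255], [1, 128, 37]]) {
  const [r, g, b] = unpack888(pack888(...triple));
  console.log(r === triple[0] && g === triple[1] && b === triple[2]); // true
}
```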
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
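The 4096 x 4096 target is no accident; a one-line check shows it covers the uint24 domain exactly:

```javascript
// One pixel per uint24 value: a 4k x 4k target has exactly 2^24 pixels,
// so pixelCoord.y * width + pixelCoord.x enumerates every packable ID once.
const width = 4096, height = 4096;
const pixelCount = width * height;
console.log(pixelCount === 2 ** 24); // true
console.log(pixelCount - 1 === 16777215); // true: largest ID is the largest uint24
```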
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- The single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transforms the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
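As a sketch of the idea, here is a JavaScript transcription of the octahedral mapping described in [Cigolle 14] (helper names are mine, not the deck's shader code):

```javascript
// Octahedral normal encoding: project the unit normal onto the octahedron
// |x| + |y| + |z| = 1, then fold the lower hemisphere into the unit square.
function octEncode([x, y, z]) {
  const invL1 = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let px = x * invL1, py = y * invL1;
  if (z < 0) {
    const fx = (1 - Math.abs(py)) * Math.sign(px || 1);
    const fy = (1 - Math.abs(px)) * Math.sign(py || 1);
    px = fx; py = fy;
  }
  // Map from [-1, 1] to the full [0, 1] domain.
  return [px * 0.5 + 0.5, py * 0.5 + 0.5];
}

function octDecode([u, v]) {
  let px = u * 2 - 1, py = v * 2 - 1;
  let z = 1 - Math.abs(px) - Math.abs(py);
  if (z < 0) {
    const fx = (1 - Math.abs(py)) * Math.sign(px || 1);
    const fy = (1 - Math.abs(px)) * Math.sign(py || 1);
    px = fx; py = fy;
  }
  const len = Math.hypot(px, py, z);
  return [px / len, py / len, z / len];
}

// Round trip a lower-hemisphere normal through the fold.
console.log(octDecode(octEncode([0.6, 0, -0.8])));
```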
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- The human perceptual system is sensitive to luminance shifts
- The human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
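For reference, the YCoCg transform pair is just two linear maps; a JavaScript sketch (not the deck's exact shader helpers) shows the round trip is lossless before any chroma subsampling:

```javascript
// RGB -> YCoCg: Y is a luma proxy, Co/Cg are orange / green chroma axes.
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,
    0.5 * r - 0.5 * b,
    -0.25 * r + 0.5 * g - 0.25 * b,
  ];
}

// YCoCg -> RGB: the exact inverse of the matrix above.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(ycocgToRgb(rgbToYcocg([0.2, 0.7, 0.4])));
```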
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL, so it could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64 bpp
- A half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
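The sign trick is cheap to emulate on the CPU too. A JavaScript sketch, with metallic stored as the sign of an always-positive depth, matching the multiply above:

```javascript
// Fold a 1-bit metallic flag into the sign of the (positive) view space depth.
const packDepthMetallic = (depth, metallic) => depth * (metallic ? 1 : -1);

function unpackDepthMetallic(packed) {
  // abs() recovers depth; the sign recovers the flag, as in the shader decode.
  return { depth: Math.abs(packed), metallic: Math.sign(packed) > 0 };
}

console.log(unpackDepthMetallic(packDepthMetallic(12.5, false))); // { depth: 12.5, metallic: false }
```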
Packing Challenges
- Must balance packing efficiency against the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of the representable range, throw it outside of
    // screen space for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
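Putting the encode side (quantize to -512..511 pixel steps) next to this decode, the round trip can be sanity-checked in JavaScript. The constants here are illustrative assumptions, not values from the deck:

```javascript
// Quantize screen space velocity (-1..1 per axis) to biased integer steps,
// then decode back to a per-pixel UV offset, mirroring the shader pair.
const SUB_PIXEL_PRECISION_STEPS = 4; // assumed value for illustration
const RESOLUTION = [1920, 1080]; // assumed render target size

function quantizeVelocity(v) {
  return v.map((c, i) => {
    const steps = c * RESOLUTION[i] * SUB_PIXEL_PRECISION_STEPS * 0.5;
    return Math.floor(Math.max(-512, Math.min(511, steps))) + 512; // bias into 0..1023
  });
}

function decodeVelocity(q) {
  return q.map((c, i) => (c - 512) / (RESOLUTION[i] * SUB_PIXEL_PRECISION_STEPS));
}

// The decoded value is the NDC delta halved into UV units.
console.log(decodeVelocity(quantizeVelocity([0.1, -0.125]))); // [0.05, -0.0625]
```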
Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color is stored in a non-linear space to distribute precision perceptually

  // Color is stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
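Because YCoCg is a linear transform of RGB and Schlick's formula is affine in the reflection coefficient, evaluating in YC should agree exactly with evaluating in RGB and converting afterwards. A quick JavaScript cross-check (helper names are mine):

```javascript
// Schlick Fresnel evaluated per RGB channel.
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map((c) => (1 - c) * p + c);
}

// The YC variant from the slides: standard Schlick on luminance,
// chroma simply scaled toward zero by (1 - power).
function fresnelSchlickYc(vDotH, [y, c]) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - y) * p + y, c * -p + c];
}

const rgbToYcocg = ([r, g, b]) => [
  0.25 * r + 0.5 * g + 0.25 * b,
  0.5 * r - 0.5 * b,
  -0.25 * r + 0.5 * g - 0.25 * b,
];

// Compare luminance and one chroma axis (Co) for a gold-ish F0 at a mid angle.
const f0 = [1.0, 0.71, 0.29];
const [y0, co0] = rgbToYcocg(f0);
const direct = fresnelSchlickYc(0.5, [y0, co0]);
const viaRgb = rgbToYcocg(fresnelSchlickRgb(0.5, f0));
console.log(Math.abs(direct[0] - viaRgb[0]) < 1e-12); // true: luminance matches
console.log(Math.abs(direct[1] - viaRgb[1]) < 1e-12); // true: chroma matches
```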
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
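A JavaScript transcription makes the weighting behavior easy to probe: neighbors whose luminance matches the center dominate the reconstructed chroma. This is a sketch mirroring the GLSL above, not the shipped code:

```javascript
// Luminance-similarity weighted chroma reconstruction.
// center and a1..a4 are [luminance, chroma] pairs.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0;
  let chromaSum = 0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    // Down-weight neighbors whose luminance differs from the center,
    // and zero out black samples entirely.
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    w *= luma >= 1e-5 ? 1 : 0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0, 0];
}

// Neighbors at the center's luminance win; the mismatched one barely counts.
const [, cg] = reconstructChromaHDR([0.5, 0.1], [0.5, 0.3], [0.5, 0.3], [0.5, 0.3], [5.0, -0.9]);
console.log(Math.abs(cg - 0.3) < 1e-3); // true
```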
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/10.1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Microfacet BRDF
- Microfacet Specular
- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry / Shadow Masking Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
- Qualitative Oren-Nayar [Oren 94]
Standard Material Parameterization
- Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give the color parameter conditional meaning [Burley 12][Karis 13]

if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}
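In JavaScript terms, the mapping from the [Burley 12] / [Karis 13] metallic workflow looks roughly like this (a sketch; 0.04 is the usual dielectric reflectance default):

```javascript
// Derive albedo and specular color from one color plus a metallic flag.
// Metals have no diffuse albedo; dielectrics share a common ~4% specular.
function deriveMaterial(color, metallic) {
  return metallic
    ? { albedo: [0, 0, 0], specularColor: color }
    : { albedo: color, specularColor: [0.04, 0.04, 0.04] };
}

console.log(deriveMaterial([1.0, 0.71, 0.29], true).albedo); // [ 0, 0, 0 ]
console.log(deriveMaterial([0.5, 0.5, 0.5], false).specularColor); // [ 0.04, 0.04, 0.04 ]
```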
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of
specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * BRDF * projected area
- Remap outgoing radiance to the perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay the cost of the worst case:
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * BRDF * projected area
- MAX_NUM_LIGHTS is small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to the G-Buffer
- For each light
- For each pixel inside the light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * BRDF * projected area
- Blend / add outgoing radiance to the render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read the G-Buffer for each light source
- Heavy on write bandwidth
- Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In the vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In the fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
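The same reprojection delta can be sketched on the CPU. A JavaScript illustration, where clip positions are [x, y, z, w] after the current and previous model-view-projection transforms:

```javascript
// NDC-space velocity: perspective-divide current and previous clip positions
// and subtract, exactly as the fragment shader does.
function screenSpaceVelocity(clipNow, clipOld) {
  const ndc = ([x, y, , w]) => [x / w, y / w];
  const [nx, ny] = ndc(clipNow);
  const [ox, oy] = ndc(clipOld);
  return [nx - ox, ny - oy];
}

// A point that moved from NDC (0, 0) to (0.25, 0.1):
console.log(screenSpaceVelocity([0.5, 0.2, 0, 2], [0, 0, 0, 2])); // [ 0.25, 0.1 ]
```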
Read Material Data
- Rely on dynamic branching for swatch vs. texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
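The velocity quantization above can be sketched on the CPU as well. A JavaScript illustration; SUB_PIXEL_PRECISION_STEPS = 4 is an assumed value, not from the deck:

```javascript
// Screen-space velocity in the -1..1 domain -> biased integer in 0..1023,
// with SUB_PIXEL_PRECISION_STEPS fractional steps per pixel of motion.
const SUB_PIXEL_PRECISION_STEPS = 4; // assumed value for illustration

function quantizeVelocity(v, resolution) {
  // -1..1 screen units -> signed sub-pixel steps, clamped to the 10-bit range
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512), 511));
  return q + 512; // bias into 0..1023 for packing
}

function dequantizeVelocity(q, resolution) {
  return (q - 512) / (resolution * SUB_PIXEL_PRECISION_STEPS * 0.5);
}
```

Velocities that fit the range survive the round trip; anything faster saturates at the "infinity" endpoints, which the decode side treats specially.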
Packing: Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float render target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

- Color stored in non-linear space to distribute precision perceptually
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! (four detail crops, each comparing: RGB Lighting 100% | YC Lighting 100% | RGB Lighting 25% | YC Lighting 25%)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
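Because Schlick's approximation is affine in the reflection coefficient and YCoCg is a linear transform, evaluating fresnel directly in YC space must agree with converting the RGB result afterwards. A quick JavaScript check of that claim; names here are ours, for illustration:

```javascript
// Schlick fresnel per RGB channel
function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map((c) => (1 - c) * p + c);
}

// The YC form from the slide: Y lerps toward 1, chroma decays toward 0
function fresnelSchlickYC(vDotH, [fy, fc]) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - fy) * p + fy, fc * -p + fc];
}

// Y and Co rows of the YCoCg transform
const rgbToYco = ([r, g, b]) => [r * 0.25 + g * 0.5 + b * 0.25, r * 0.5 - b * 0.5];
```

The key observation: Y(1,1,1) = 1 and Co(1,1,1) = 0, so the "1" that the RGB fresnel lerps toward becomes 1 in luminance and 0 in chroma, exactly the two expressions in fresnelSchlickYC.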
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
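The same filter, sketched in JavaScript for clarity; it takes an array of [luma, chroma] neighbors rather than four vec2 arguments, and the names are ours:

```javascript
// Luminance-weighted chroma reconstruction: neighbors whose luminance matches
// the center pixel dominate the estimate of the missing chroma component.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let weightSum = 0;
  let chromaSum = 0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    w *= luma >= 1e-5 ? 1 : 0; // guard black / infinity samples
    weightSum += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0
  return weightSum > 1e-5 ? [center[1], chromaSum / weightSum] : [0, 0];
}
```

A neighbor at infinity (luminance forced to 0 by the earlier guard) contributes nothing, so edges against the sky do not bleed chroma.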
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks: Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions?
nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats. http://webglstats.com, 2014
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Standard Material Parameterization
- Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give the color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay the cost of the worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to G-Buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend / Add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
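The fragment-shader math amounts to a perspective divide of the current and previous clip-space positions, then a difference. A CPU-side JavaScript sketch of the same arithmetic, for illustration:

```javascript
// clipNow / clipOld are [x, y, z, w] positions after the current and previous
// frame's model-view-projection transforms. Returns NDC-space velocity.
function screenSpaceVelocity(clipNow, clipOld) {
  const now = [clipNow[0] / clipNow[3], clipNow[1] / clipNow[3]];
  const old = [clipOld[0] / clipOld[3], clipOld[1] / clipOld[3]];
  return [now[0] - old[0], now[1] - old[1]];
}
```

The divide has to happen per fragment (after interpolation), which is why the vertex shader passes both un-divided positions down as varyings.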
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges: Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets not well supported
Challenges: Storage
- Reading from render buffer depth getting better
Challenges: Storage
- Texture float support quite good
Challenges: Storage
- Texture half float support getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
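The 2^24 limit is easy to demonstrate from JavaScript, using Math.fround to emulate 32-bit float rounding:

```javascript
// An integer survives storage in a 32-bit float iff rounding to float32
// leaves it unchanged. Past 2^24 the significand runs out of bits and
// consecutive integers start colliding.
const isExactInFloat32 = (n) => Math.fround(n) === n;

console.log(isExactInFloat32(16777215)); // 2^24 - 1: representable
console.log(isExactInFloat32(16777216)); // 2^24: representable
console.log(isExactInFloat32(16777217)); // 2^24 + 1: rounds back to 16777216
```

This is exactly why a 4096 x 4096 render target (2^24 pixels) can exhaustively cover the uint24 domain in the unit tests below.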
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
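The same encode / decode arithmetic ports directly to JavaScript, with Math.fround standing in for the GPU's 32-bit float; a strided sweep of the uint24 domain makes a handy CPU-side sanity check:

```javascript
// Pack three 8-bit integers into one float-representable uint24 and back,
// using only the multiply / divide / floor arithmetic available in GLSL.
function uint8_8_8_to_uint24([x, y, z]) {
  // x, y, z are integers in 0..255; result is an integer in 0..2^24-1,
  // which float32 holds exactly (hence the fround)
  return Math.fround(x * 65536 + y * 256 + z);
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}
```

Sweeping the domain with a stride (plus the endpoints) catches collision and precision mistakes without iterating all 16M values.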
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- The single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes:
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
  gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg; returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look. Enhance!
[Four detail crops, each compared as: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%]
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel:
- Luminance calculation stays the same
- Chroma calculation is inverted; approaches zero at perpendicular (vDotH near 0)
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
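This works because Schlick's formula is linear in the reflection coefficient and RGB to YCoCg is a linear color transform, so the two commute. A quick JavaScript check (helper names here are illustrative, not from the deck):

```javascript
// Y and Co rows of the YCoCg transform
function rgbToYCo(rgb) {
  return [0.25 * rgb[0] + 0.5 * rgb[1] + 0.25 * rgb[2],  // Y (luminance)
          0.5 * rgb[0] - 0.5 * rgb[2]];                  // Co (chroma)
}

function fresnelSchlickRGB(vDotH, r0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return r0.map(c => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, r0YC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - r0YC[0]) * power + r0YC[0], // luminance: same form as RGB
          r0YC[1] * -power + r0YC[1]];       // chroma: fades to 0 at grazing
}
```

Evaluating in YC space and converting the RGB result to YC agree to floating point precision.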
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting- Write YC to RG components of render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
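The same weighting scheme reads as follows in JavaScript, for intuition; this is a sketch, not the production shader. Each sample is a [luminance, chroma] pair:

```javascript
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let weightedChroma = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Weight each neighbor by luminance similarity to the center sample
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where the sample is black (e.g. at infinity)
    if (luma < 1e-5) w = 0.0;
    weightedChroma += chroma * w;
    totalWeight += w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5
    ? [center[1], weightedChroma / totalWeight]
    : [0.0, 0.0];
}
```

Neighbors whose luminance matches the center dominate the reconstructed chroma; black samples contribute nothing.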
Thanks for listening!
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know!
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Standard Material Parameterization
Time to shamelessly steal from Real-Time Rendering [Möller 08]
Standard Material Parameterization
- Give the color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of
specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay the cost of the worst case:
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS kept small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend / Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges: Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets: not well supported
Challenges: Storage
- Reading from render buffer depth: getting better
Challenges: Storage
- Texture float support: quite good
Challenges: Storage
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data!
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
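These precision limits are easy to confirm from JavaScript, where Math.fround rounds a number to the nearest 32-bit float:

```javascript
// The integer-precision cliff at 2^24 for 32-bit floats.
const maxExact = Math.fround(16777215);   // 2^24 - 1: exactly representable
const stillExact = Math.fround(16777216); // 2^24: exactly representable
const collided = Math.fround(16777217);   // 2^24 + 1: rounds back to 2^24
```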
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}
float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}
vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
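The pack / unpack pair ports directly to JavaScript for a quick sanity check; doubles represent integers below 2^24 exactly, so the arithmetic mirrors highp float behavior:

```javascript
// Shift left via multiplies: x, y, z are 8-bit integers (0..255).
function uint8_8_8ToUint24(x, y, z) {
  return x * 65536.0 + y * 256.0 + z;
}

// Shift right via divides plus floor, then peel off each byte.
function uint24ToUint8_8_8(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}
```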
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test:
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties:
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
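In the style of [Cigolle 14], the mapping looks roughly like this; a JavaScript sketch of the standard octahedral encode / decode, not Floored's exact shader code:

```javascript
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

// Project the unit vector onto the octahedron (L1 normalize), fold the
// lower hemisphere over the diagonals, then remap -1..1 to 0..1.
function octEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

// Invert the fold, then renormalize back onto the sphere.
function octDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Quantization error only enters once the two encoded components are discretized (to 14 bits each in the G-Buffer format below); the mapping itself round trips exactly.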
Emission
- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
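The transform itself is tiny; here it is in JavaScript, matching the YCoCg definition used by [Mavridis 12]. For inputs in [0, 1], Y lands in [0, 1] and Co / Cg in [-0.5, 0.5]:

```javascript
function rgbToYcocg(r, g, b) {
  return [ 0.25 * r + 0.5 * g + 0.25 * b,   // Y  (luminance)
           0.5  * r            - 0.5  * b,  // Co (orange chroma)
          -0.25 * r + 0.5 * g - 0.25 * b];  // Cg (green chroma)
}

function ycocgToRgb(y, co, cg) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```

The inverse needs only adds and subtracts, which is part of why YCoCg is attractive for a per-pixel decode.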
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64bpp
- Half-float target is more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
vec4 res;
// Interlace chroma, and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity, and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
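The velocity quantization reads roughly as follows in JavaScript. The deck does not give the value of SUB_PIXEL_PRECISION_STEPS, so 4 is an assumed value here, and the decode scale is inferred from the encode:

```javascript
// 10-bit biased quantization per axis. Velocity is in NDC (-1 to 1),
// resolutionAxis in pixels. STEPS = 4 is an assumption, not from the deck.
const STEPS = 4.0;

function quantizeVelocity(v, resolutionAxis) {
  let q = v * resolutionAxis * STEPS * 0.5;             // NDC -> sub-pixel steps
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0)); // clamp to 10-bit range
  return q + 512.0;                                     // bias into 0..1023
}

function dequantizeVelocity(q, resolutionAxis) {
  const centered = q - 512.0;
  if (Math.abs(centered) > 510.0) {
    // Out of representable range: sqrt(2) + 1e-3, outside screen space for culling
    return 1.41521356;
  }
  return centered * 2.0 / (STEPS * resolutionAxis);     // back to NDC
}
```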
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Standard Material Parameterization
- Give color parameter conditional meaning [Burley 12] [Karis 13]
if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of
specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
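The velocity math above is easy to sanity-check offline. A small NumPy sketch of the same reprojection (the matrices and point are made up purely for illustration):

```python
import numpy as np

def ndc_xy(mvp, position):
    # Transform a model-space position to clip space, then perspective-divide to NDC xy
    clip = mvp @ np.append(position, 1.0)
    return clip[:2] / clip[3]

# Hypothetical current and previous model-view-projection matrices:
# the object slides 0.1 NDC units along +x between frames
mvp_old = np.eye(4)
mvp_new = np.eye(4)
mvp_new[0, 3] = 0.1

p = np.array([0.2, 0.3, 0.0])
velocity = ndc_xy(mvp_new, p) - ndc_xy(mvp_old, p)  # screen space velocity in NDC units
```

The subtraction happens after the divide by w, exactly as in the fragment shader above.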
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges Storage
- In vanilla WebGL the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
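The encode / decode pair above is easy to sanity-check on the CPU. A Python sketch of the same float arithmetic, emulating float32 with NumPy (function names mirror the GLSL):

```python
import numpy as np

def uint8_8_8_to_uint24(r, g, b):
    # Shift left via multiplies: (r << 16) + (g << 8) + b, but in float math
    return np.float32(r) * np.float32(65536.0) + np.float32(g) * np.float32(256.0) + np.float32(b)

def uint24_to_uint8_8_8(packed):
    # Shift right via divisions by powers of two (exact in binary floating point),
    # then peel off each byte with floor and subtraction
    packed = np.float32(packed)
    x = np.floor(packed / np.float32(65536.0))
    temp = np.floor(packed / np.float32(256.0))
    y = temp - x * np.float32(256.0)
    z = packed - temp * np.float32(256.0)
    return int(x), int(y), int(z)
```

Every intermediate stays at or below 2^24, so float32 arithmetic never rounds here: dividing by a power of two only shifts the exponent.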
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
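The same two pass idea can be prototyped on the CPU before touching the GPU. A NumPy sketch (our own vectorized port, sampling the uint24 domain rather than rendering a 4k target) that also emulates the 8-bit normalized texture storage in between the passes:

```python
import numpy as np

rng = np.random.default_rng(0)
ids = rng.integers(0, 1 << 24, size=100_000).astype(np.float32)

# "Pass 1": split each uint24 ID into three 8-bit channels and store them
# normalized to 0..1, the way an RGBA8 texture would hold them
hi = np.floor(ids / np.float32(65536.0))
md = np.floor(ids / np.float32(256.0)) - hi * np.float32(256.0)
lo = ids - np.floor(ids / np.float32(256.0)) * np.float32(256.0)
texels = np.stack([hi, md, lo]).astype(np.float32) / np.float32(255.0)

# "Pass 2": read back, de-normalize with rounding, and reassemble the uint24
channels = np.floor(texels * np.float32(255.0) + np.float32(0.5))
decoded = channels[0] * np.float32(65536.0) + channels[1] * np.float32(256.0) + channels[2]

all_match = bool(np.array_equal(decoded, ids))
```

The `+ 0.5` before the floor is what absorbs the tiny rounding introduced by the divide / multiply by 255, which is not a power of two.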
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
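A reference implementation of the octahedral mapping is short. Here is a NumPy sketch following [Cigolle 14] (a CPU illustration, not our shader code):

```python
import numpy as np

def sign_not_zero(v):
    # Like GLSL sign(), but maps 0 to +1 so folded points stay on the octahedron
    return np.where(v >= 0.0, 1.0, -1.0)

def oct_encode(n):
    # Project the unit vector onto the octahedron |x|+|y|+|z| = 1,
    # then fold the lower hemisphere over; the result lands in [-1, 1]^2
    p = n[:2] / np.sum(np.abs(n))
    if n[2] < 0.0:
        p = (1.0 - np.abs(p[::-1])) * sign_not_zero(p)
    return p

def oct_decode(p):
    # Invert the fold, rebuild z, and renormalize
    z = 1.0 - np.abs(p[0]) - np.abs(p[1])
    xy = p if z >= 0.0 else (1.0 - np.abs(p[::-1])) * sign_not_zero(p)
    n = np.array([xy[0], xy[1], z])
    return n / np.linalg.norm(n)
```

A final `p * 0.5 + 0.5` remaps the encoding to the 0 to 1 domain mentioned above before quantization.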
Emission
- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
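The RGB to YCoCg transform referenced here is a cheap linear change of basis. A NumPy sketch using the standard coefficients (as in [Mavridis 12]; not necessarily our exact shader constants):

```python
import numpy as np

# Rows: Y (luminance), Co (orange-cyan chroma), Cg (green-purple chroma)
RGB_TO_YCOCG = np.array([
    [ 0.25, 0.5,  0.25],
    [ 0.5,  0.0, -0.5 ],
    [-0.25, 0.5, -0.25],
])

def rgb_to_ycocg(rgb):
    return RGB_TO_YCOCG @ np.asarray(rgb, dtype=np.float64)

def ycocg_to_rgb(ycocg):
    # The inverse needs only adds and subtracts
    y, co, cg = ycocg
    return np.array([y + co - cg, y + cg, y - co - cg])
```

For rgb in 0..1, Y stays in 0..1 while both chroma components lie in -0.5..0.5, which is why a bias is applied before storing chroma in an unsigned channel.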
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float 128bpp
- Sign bits of R, G and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float 64bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
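The sign trick is trivial to model on the CPU. A Python sketch of the depth / metallic channel, where metallic is ±1.0 to mirror the sign() extraction in the decode (names ours):

```python
def pack_depth_metallic(depth, metallic):
    # depth > 0.0; metallic is +1.0 (metal) or -1.0 (non-metal).
    # The flag rides for free in the float's sign bit.
    return depth * metallic

def unpack_depth_metallic(packed):
    # abs() recovers depth, the sign recovers the flag;
    # 0.0 stays reserved for "sampled infinity"
    depth = abs(packed)
    metallic = 1.0 if packed > 0.0 else -1.0
    return depth, metallic
```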
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
    gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg; returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model; we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD
and an ADD from the skipped 3rd component
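Because Schlick's formula is affine in the reflection coefficient and the YCoCg transform is linear, evaluating Fresnel directly on (Y, chroma) matches transforming the RGB result afterwards. A quick NumPy check (constants and function names are ours, for illustration):

```python
import numpy as np

RGB_TO_YCOCG = np.array([[0.25, 0.5, 0.25], [0.5, 0.0, -0.5], [-0.25, 0.5, -0.25]])

def fresnel_schlick_rgb(v_dot_h, f0):
    power = (1.0 - v_dot_h) ** 5.0
    return (1.0 - f0) * power + f0

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # Luminance behaves exactly like an RGB channel (basis weights sum to 1);
    # chroma weights sum to 0, so chroma simply decays to zero at grazing
    power = (1.0 - v_dot_h) ** 5.0
    y = (1.0 - f0_yc[0]) * power + f0_yc[0]
    c = f0_yc[1] * -power + f0_yc[1]
    return np.array([y, c])

f0_rgb = np.array([1.0, 0.71, 0.29])   # gold-ish F0, purely for illustration
f0_yc = (RGB_TO_YCOCG @ f0_rgb)[:2]    # keep Y and Co (one checkerboard phase)
```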
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
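A CPU port of the same weighting is handy when tuning the sensitivity constant offline. A Python sketch (the default sensitivity value is our reconstruction of the slide's constant):

```python
import numpy as np

def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, chroma) at this pixel; neighbors: 4 x 2 array of (luma, chroma)
    luminance = neighbors[:, 0]
    chroma = neighbors[:, 1]
    luma_delta = np.abs(luminance - center[0])
    weight = np.exp2(-sensitivity * luma_delta)
    weight = weight * (luminance >= 1e-5)    # guard black samples
    total_weight = weight.sum()
    if total_weight <= 1e-5:                 # guard the all-zero-weight case
        return np.zeros(2)
    return np.array([center[1], np.dot(chroma, weight) / total_weight])
```

In a flat region every neighbor gets weight 1 and the reconstruction collapses to a plain average of the opposite-phase chroma.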
Thanks for listening
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van
Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Standard Material Parameterization
- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
Standard Material Parameterization
- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)
Deferred Rendering
Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion
Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend add outgoing radiance to render target
Deferred Pipeline Cons
- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- and after skipping some tangential details...
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
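This 2^24 limit is easy to sanity check on the CPU. A small sketch (assuming Node or browser JavaScript; `Math.fround` rounds a double through 32-bit float precision):

```javascript
// A 32-bit float has a 24-bit significand: every integer up to 2^24 is exact,
// and above that the representable step size grows to 2.
const f32 = Math.fround; // rounds a JS double through float32 precision
console.log(f32(16777215)); // 16777215 (2^24 - 1, exact)
console.log(f32(16777216)); // 16777216 (2^24, exact)
console.log(f32(16777217)); // 16777216 (2^24 + 1 is not representable)
```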
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
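The same shift-by-multiply arithmetic can be mirrored on the CPU to check the round trip. A JavaScript sketch (function names mirror the GLSL helpers above; arrays stand in for vec3):

```javascript
// Pack three 8-bit integers into one 24-bit integer using only float math.
function uint8_8_8_to_uint24(raw) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return raw[0] * SHIFT_LEFT_16 + (raw[1] * SHIFT_LEFT_8 + raw[2]);
}

// Invert: shift right with divides, strip the high parts with negated multiplies.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / (256.0 * 256.0));
  const temp = Math.floor(raw / 256.0);
  return [x, -x * 256.0 + temp, -temp * 256.0 + raw];
}

const packed = uint8_8_8_to_uint24([201, 57, 3]);
console.log(uint24_to_uint8_8_8(packed)); // [201, 57, 3]
```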
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
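The same sweep can be approximated on the CPU before touching the GPU. A JavaScript sketch that strides through the uint24 domain (sampled rather than fully exhaustive, purely for test speed) and rounds the packed value through float32 with `Math.fround` to mimic float texture storage:

```javascript
function pack(x, y, z) {
  // Round through float32 to mimic storing the packed value in a float texture.
  return Math.fround(x * 65536.0 + y * 256.0 + z);
}
function unpack(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}

let failures = 0;
// Stride through the uint24 domain of pixel IDs.
for (let id = 0; id <= 16777215; id += 977) {
  const x = (id >> 16) & 255, y = (id >> 8) & 255, z = id & 255;
  const [ux, uy, uz] = unpack(pack(x, y, z));
  if (ux !== x || uy !== y || uz !== z) ++failures;
}
console.log(failures); // 0: every sampled ID survives the round trip
```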
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
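For illustration, the octahedral mapping of [Cigolle 14] can be sketched on the CPU. This is an assumed JS port (with sign(0) treated as +1 to keep the fold invertible), not Floored's shader code:

```javascript
// Octahedral normal encode/decode: project the unit sphere onto an octahedron,
// then unfold it into a 2D square.
const sgn = v => (v >= 0 ? 1 : -1); // sign(0) = 1 keeps the mapping invertible

function octEncode([x, y, z]) {
  const s = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let px = x / s, py = y / s;
  if (z < 0) { // fold the lower hemisphere over the diagonals
    [px, py] = [(1 - Math.abs(py)) * sgn(px), (1 - Math.abs(px)) * sgn(py)];
  }
  return [px, py]; // in [-1, 1]^2; remap to [0, 1] for storage
}

function octDecode([px, py]) {
  let z = 1 - Math.abs(px) - Math.abs(py);
  let x = px, y = py;
  if (z < 0) { // unfold the lower hemisphere
    [x, y] = [(1 - Math.abs(py)) * sgn(px), (1 - Math.abs(px)) * sgn(py)];
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```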
Emission
- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
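For reference, one common RGB <-> YCoCg transform pair looks like the following (a sketch; the deck's rgbToYcocg / YcocgToRgb helpers may differ in scaling or bias):

```javascript
// Forward transform: Y is a luma average, Co/Cg are signed chroma differences.
function rgbToYcocg([r, g, b]) {
  return [r / 4 + g / 2 + b / 4,    // Y  (luma)
          r / 2 - b / 2,            // Co (orange chroma)
          -r / 4 + g / 2 - b / 4];  // Cg (green chroma)
}

// Exact inverse: adds and subtracts only, no divides.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(rgbToYcocg([0.5, 0.5, 0.5])); // gray: [0.5, 0, 0] — chroma is zero
```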
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
vec4 res;

// Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
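The sign trick can be shown in isolation. A JS sketch (assuming, as on the slides, that metallic is stored as +1.0 / -1.0 and that depth > 0, with 0 reserved for infinity):

```javascript
// Metallic rides in the sign bit of depth: metallic surfaces store +depth,
// non-metallic surfaces store -depth. A single multiply packs, abs()/sign() unpack.
function packDepthMetallic(depth, metallic /* +1.0 or -1.0 */) {
  return depth * metallic;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: Math.sign(packed) };
}

console.log(unpackDepthMetallic(packDepthMetallic(3.5, -1.0))); // { depth: 3.5, metallic: -1 }
```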
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution)
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
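The grazing behavior described above (luminance approaches 1, chroma approaches 0) can be checked with a CPU sketch of the YC Schlick term:

```javascript
// YC Schlick: luminance lerps toward 1 at grazing angles, chroma decays to 0.
function fresnelSchlickYC(vDotH, f0) { // f0 = [luminance, chroma]
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - f0[0]) * power + f0[0], f0[1] * -power + f0[1]];
}

console.log(fresnelSchlickYC(1.0, [0.04, 0.01])); // head-on: [0.04, 0.01], F0 unchanged
console.log(fresnelSchlickYC(0.0, [0.04, 0.01])); // grazing: approximately [1, 0]
```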
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
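A CPU port of the same weighting scheme (a hypothetical JS sketch using the same constants) shows how luminance similarity drives the reconstruction:

```javascript
// Weight each neighbor's chroma by how close its luminance is to the center's.
function reconstructChromaHDR(center, neighbors) { // center = [Y, C], neighbors = [[Y, C], ...]
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0, weighted = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard black samples
    totalWeight += w;
    weighted += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], weighted / totalWeight] : [0.0, 0.0];
}

// A neighbor whose luminance matches the center dominates the reconstruction:
console.log(reconstructChromaHDR([0.5, 0.1], [[0.5, 0.2], [4.0, 0.9], [0.0, 0.9], [4.0, 0.9]]));
// approximately [0.1, 0.2]
```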
Thanks for listening!
Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
httpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER Clear Sky - a Showcase for Direct3D 10.0/10.1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Tiled Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Standard Material Parameterization
- Our standard material does not support
- Translucency (Skin Foliage Snow)
- Anisotropic Gloss (Brushed Metal Hair Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready! Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges: Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets not well supported
Challenges: Storage
- Reading from render buffer depth getting better
Challenges: Storage
- Texture float support quite good
Challenges: Storage
- Texture half float support getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- Range: 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- Range: 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
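The precision limits above can be checked from the host side. A minimal sketch using JavaScript's `Math.fround`, which rounds to 32-bit float precision and so emulates what a float texture channel does to stored integers:

```javascript
// Emulate storing integers in a 32-bit float, as a float texture channel would.
const asFloat32 = (x) => Math.fround(x);

// Every integer up to 2^24 = 16777216 survives the round trip exactly.
console.log(asFloat32(16777215) === 16777215); // true

// Above 2^24 the step size doubles, so 2^24 + 1 is not representable:
// it rounds to the nearest representable float32, which is 2^24.
console.log(asFloat32(16777217) === 16777216); // true
```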
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
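The same shift arithmetic ports directly to the CPU for sanity checking. A hypothetical JavaScript mirror of the uint8_8_8 pack / unpack pair (function names are mine, not from the deck):

```javascript
// Pack three 8-bit integers (0..255) into a single number holding a uint24,
// mirroring the GLSL uint8_8_8_to_uint24 / uint24_to_uint8_8_8 pair:
// multiplies shift left, divisions plus floor() shift right.
function packUint888(x, y, z) {
  return x * 65536 + y * 256 + z;
}

function unpackUint888(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}

console.log(unpackUint888(packUint888(12, 34, 56))); // [12, 34, 56]
console.log(packUint888(255, 255, 255));             // 16777215, the max uint24
```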
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
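For reference, a minimal JavaScript sketch of octahedral encode / decode in the spirit of [Cigolle 14]; this is not Floored's production code, and sign(0) edge cases are ignored:

```javascript
// Octahedral mapping: project the unit vector onto the octahedron |x|+|y|+|z|=1,
// fold the lower hemisphere onto the upper one, remap to the 0..1 domain.
function octEncode([x, y, z]) {
  const invL1 = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1, v = y * invL1;
  if (z < 0) {
    const u0 = u, v0 = v;
    u = (1 - Math.abs(v0)) * Math.sign(u0);
    v = (1 - Math.abs(u0)) * Math.sign(v0);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode([eu, ev]) {
  let u = eu * 2 - 1, v = ev * 2 - 1;
  const z = 1 - Math.abs(u) - Math.abs(v);
  if (z < 0) {
    const u0 = u, v0 = v;
    u = (1 - Math.abs(v0)) * Math.sign(u0);
    v = (1 - Math.abs(u0)) * Math.sign(v0);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len]; // back to a unit normal
}
```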
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
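The YCoCg basis itself is a cheap linear transform. A small JavaScript sketch of the forward and inverse transforms, written out as an assumption of what the deck's rgbToYcocg / YcocgToRgb helpers look like:

```javascript
// RGB -> YCoCg: Y is a luma-like term; Co / Cg are chroma offsets
// in -0.5..0.5 for RGB inputs in 0..1.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b  // Cg
  ];
}

// Exact inverse: only adds and subtracts.
function ycocgToRgb([y, co, cg]) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```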
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space
  // for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer: RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 specular color
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance
RGB Lighting 100%
YC Lighting 100%
YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100%
YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100%
YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100%
YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted, approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar
arithmetic: we save an ADD in the 2nd component. Not to mention we are now
operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
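The inverted chroma term is exact, not an approximation: Co and Cg are linear combinations of RGB whose coefficients sum to zero, so the constant `power` term drops out of the chroma channels and only `rc * (1 - power)` survives. A numeric check of that claim in JavaScript (the RGB-to-YCoCg transform is written out here as an assumption):

```javascript
// RGB -> YCoCg; chroma rows have coefficients summing to zero.
const rgbToYcocg = ([r, g, b]) => [
   0.25 * r + 0.5 * g + 0.25 * b,
   0.5  * r            - 0.5 * b,
  -0.25 * r + 0.5 * g - 0.25 * b
];

// Per-channel RGB Schlick fresnel.
function fresnelSchlickRgb(vDotH, rc) {
  const power = Math.pow(1 - vDotH, 5);
  return rc.map((c) => (1 - c) * power + c);
}

// YC form: luminance as in RGB, chroma scaled by (1 - power).
function fresnelSchlickYC(vDotH, [y, c]) {
  const power = Math.pow(1 - vDotH, 5);
  return [(1 - y) * power + y, c * -power + c];
}

const rc = [0.95, 0.64, 0.54]; // copper-ish specular color
const [y, co] = rgbToYcocg(rc);
const viaRgb = rgbToYcocg(fresnelSchlickRgb(0.3, rc));
const viaYC = fresnelSchlickYC(0.3, [y, co]);
// viaRgb[0] matches viaYC[0] (Y) and viaRgb[1] matches viaYC[1] (Co)
```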
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
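A host-side JavaScript port of the same weighting logic, convenient for unit testing the reconstruction off the GPU (names and the scalar loop are hypothetical mirrors of the GLSL above):

```javascript
// Reconstruct the missing chroma component from 4 cross neighbors, weighting
// each neighbor by exp2(-sensitivity * |lumaNeighbor - lumaCenter|).
function reconstructChromaHDR(center, neighbors, sensitivity = 25.0) {
  let chromaSum = 0;
  let totalWeight = 0;
  for (const [luma, chroma] of neighbors) {
    const lumaDelta = Math.abs(luma - center[0]);
    // Guard the case where a sample is black.
    const weight = luma >= 1e-5 ? Math.pow(2, -sensitivity * lumaDelta) : 0;
    chromaSum += chroma * weight;
    totalWeight += weight;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0, 0];
}

// Neighbors whose luminance matches the center dominate; the outlier
// at luminance 0.01 contributes almost nothing to the estimate.
const chroma = reconstructChromaHDR(
  [0.5, 0.1],
  [[0.5, 0.2], [0.5, 0.2], [0.01, 0.9], [0.5, 0.2]]
);
```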
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01 Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02 Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
Deferred Rendering
Forward Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance brdf projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma Color Space Conversion
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)
- outgoing radiance += incoming radiance brdf projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance brdf projected area
- Blend Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode/decode
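For reference, the octahedral mapping fits in a few lines. A JavaScript sketch in the spirit of [Cigolle 14] (the deck's actual octohedronEncode/octohedronDecode are GLSL; this port is purely illustrative):

```javascript
// Octahedral normal encoding: project the unit sphere onto an octahedron and
// unfold it into the unit square. signNotZero avoids Math.sign(0) === 0.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octahedronEncode(n) { // n: unit-length [x, y, z]
  const inv = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let u = n[0] * inv;
  let v = n[1] * inv;
  if (n[2] < 0.0) { // fold the lower hemisphere over the diagonals
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // remap [-1, 1] to the full [0, 1] domain
}

function octahedronDecode(e) {
  const u = e[0] * 2.0 - 1.0;
  const v = e[1] * 2.0 - 1.0;
  let x = u, y = v;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) { // unfold the lower hemisphere
    x = (1.0 - Math.abs(v)) * signNotZero(u);
    y = (1.0 - Math.abs(u)) * signNotZero(v);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```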
Emission
- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer, not many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
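The RGB <-> YCoCg transform itself is just adds and halvings, and it inverts exactly in floating point. A JavaScript sketch for illustration (the engine's rgbToYcocg lives in GLSL):

```javascript
// RGB -> YCoCg: Y is a luma approximation; Co/Cg are orange/green chroma axes.
function rgbToYcocg([r, g, b]) {
  return [
     r * 0.25 + g * 0.5 + b * 0.25, // Y
     r * 0.5 - b * 0.5,             // Co
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg
  ];
}

// Exact inverse: the transform is lossless before any quantization.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

For inputs in [0, 1], Co and Cg land in [-0.5, 0.5], which is why the packing code later biases chroma by roughly 0.5 before quantizing it into 8 bits.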
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL and could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;
  // Interlace chroma and bias the -0.5 to 0.5 chroma range into the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract bool as sign()
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done!
- Depth is the cheapest to encode/decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
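The sign trick is easy to see in isolation. A JavaScript sketch, assuming metallic is carried as +/-1.0 (which is what sign() on decode implies):

```javascript
// Store the metallic flag in the sign of depth: metallic = 1.0 for metals,
// -1.0 for dielectrics. Depth 0.0 remains the "sampling infinity" sentinel.
function packDepthMetallic(depth, metallic) {
  return depth * metallic;
}

// Decode mirrors the shader: abs() recovers depth, sign() recovers the flag.
function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: Math.sign(packed) };
}
```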
Packing Challenges
- Must balance packing efficiency with the cost of encoding/decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer: RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model; we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
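The endpoint behavior is easy to sanity check numerically. A JavaScript port of the YC Schlick variant, for illustration only (here the YC reflection coefficient is passed as two scalars rather than a vec2):

```javascript
// Schlick's fresnel evaluated in YC space. Luminance behaves like standard
// Schlick (rises to 1.0 at grazing angles); chroma is inverted and decays to
// 0.0, since grazing reflection desaturates toward white.
function fresnelSchlickYC(vDotH, rcY, rcC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - rcY) * power + rcY, // luminance: standard Schlick
    rcC * -power + rcC,        // chroma: equivalent to rcC * (1 - power)
  ];
}
```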
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
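The same weighting scheme, ported to JavaScript for illustration (neighbors are [luminance, chroma] pairs, and SENSITIVITY matches the shader constant):

```javascript
// Reconstruct the missing chroma component of the center pixel from its four
// cross neighbors, weighting each neighbor by luminance similarity.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chroma = 0.0;
  for (const [lum, chr] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(lum - center[0]));
    if (lum < 1e-5) w = 0.0; // guard the case where a sample is black
    totalWeight += w;
    chroma += chr * w;
  }
  // Guard the case where all weights are zero
  if (totalWeight <= 1e-5) return [0.0, 0.0];
  return [center[1], chroma / totalWeight];
}
```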
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/10.1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance (four detail-crop comparison slides): RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
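Because the RGB-to-YCoCg transform is linear and its luminance weights sum to one, evaluating Schlick per RGB channel and then converting to YC gives the same result as the YC form above. A quick sanity check outside the shader (a Python sketch with illustrative names and an illustrative copper-ish F0; not part of the original deck):

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg transform; luminance weights sum to 1.
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def fresnel_schlick_rgb(v_dot_h, rc):
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in rc)

def fresnel_schlick_yc(v_dot_h, rc_y, rc_c):
    p = (1.0 - v_dot_h) ** 5
    # Luminance unchanged; chroma inverted, approaching zero at perpendicular.
    return (1.0 - rc_y) * p + rc_y, rc_c * -p + rc_c

rc = (0.95, 0.64, 0.54)  # illustrative copper-like reflection coefficient
rc_y, rc_co, _ = rgb_to_ycocg(*rc)
for v_dot_h in (0.0, 0.25, 0.5, 1.0):
    y_ref, co_ref, _ = rgb_to_ycocg(*fresnel_schlick_rgb(v_dot_h, rc))
    y, co = fresnel_schlick_yc(v_dot_h, rc_y, rc_co)
    assert abs(y - y_ref) < 1e-9 and abs(co - co_ref) < 1e-9
```

The same argument explains why the Co and Cg components can share one formula: both are zero-sum linear combinations of RGB.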
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
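The weighting scheme is easy to prototype on the CPU. A Python sketch of the same reconstruction (illustrative, not the shipping shader): each input is a (luminance, chroma) pair, and neighbors whose luminance matches the center dominate the estimate.

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs; in the checkerboard
    # layout the cross neighbors carry the center's missing chroma component.
    neighbors = (a1, a2, a3, a4)
    weights = []
    for lum, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        if lum < 1e-5:  # guard the case where a sample is black
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:  # guard the case where all weights are 0
        return (0.0, 0.0)
    missing = sum(c * w for (_, c), w in zip(neighbors, weights)) / total
    return (center[1], missing)

# A bright outlier (luminance 8.0) and a black sample are effectively ignored;
# the two neighbors matching the center's luminance supply the missing chroma.
result = reconstruct_chroma_hdr((1.0, 0.2), (1.0, 0.5), (1.0, 0.5), (8.0, -0.9), (0.0, 0.3))
```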
Thanks for listening!
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats. http://webglstats.com, 2014
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, Siggraph 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf
[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/
[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf
[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/
[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/
[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf
[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home
[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches
[Shishkovtsov 05] Deferred Shading in STALKER. Oles Shishkovtsov, 2005. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt
[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3
[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3
[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf
[Olsson 11] Tiled Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading
[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf
[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf
[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/
[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
[Heitz 14] Understanding the Masking-Shadowing Function. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/
[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf
Forward Pipeline Cons- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend / Add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
- In vertex shader:
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
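The same shift-by-multiply arithmetic can be validated off-GPU by forcing every intermediate through IEEE 754 single precision. A Python sketch of the encode / decode round trip (helper names are illustrative, not from the deck):

```python
import math
import struct

def to_f32(x):
    # Round-trip through IEEE 754 single precision to mimic a highp GPU float.
    return struct.unpack('f', struct.pack('f', x))[0]

def uint8_8_8_to_uint24(r, g, b):
    # r * 2^16 + g * 2^8 + b, as in the GLSL encode; exact because the
    # result never exceeds 2^24 - 1.
    return to_f32(r * 65536.0 + g * 256.0 + b)

def uint24_to_uint8_8_8(raw):
    # Shift right with divisions, mask with multiply-subtract, as in GLSL.
    x = math.floor(raw / 65536.0)
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (int(x), int(y), int(z))

# Spot-check the domain, including the extremes of the uint24 range.
for rgb in ((0, 0, 0), (1, 2, 3), (127, 200, 9), (255, 255, 255)):
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(*rgb)) == rgb
```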
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
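The octahedral mapping is compact enough to sketch in a few lines. A Python version for reference (illustrative; the deck's GLSL uses equivalent helpers such as octohedronEncode / octohedronDecode):

```python
import math

def _sign(v):
    # sign() with sign(0) = 1, as the octahedral mapping requires.
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n):
    # Normalized 3D vector -> point in [0, 1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)  # project onto the octahedron
    x, y = x / s, y / s
    if z < 0.0:  # fold the lower hemisphere over the diagonals
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def oct_decode(u, v):
    x, y = u * 2.0 - 1.0, v * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:  # unfold the lower hemisphere
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

# Round trip is exact (up to normalization) for unit vectors in either hemisphere.
for n in ((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.6, -0.8, 0.0), (0.36, 0.48, -0.8)):
    m = oct_decode(*oct_encode(n))
    assert sum(a * b for a, b in zip(n, m)) > 0.9999
```

Quantizing the two encoded components (to 14 bits in the format below) is what introduces the actual discretization error.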
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
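The YCoCg transform pair is exactly invertible, which is what makes it safe as a packing basis: dropping one chroma component per pixel is the only lossy step. A minimal Python sketch of the transform (the deck's rgbToYcocg / YcocgToRgb helpers presumably implement the same arithmetic):

```python
def rgb_to_ycocg(r, g, b):
    # Luminance plus two chroma components; Co and Cg lie in [-0.5, 0.5]
    # for inputs in [0, 1], hence the bias before 8-bit storage.
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    r = y + co - cg
    g = y + cg
    b = y - co - cg
    return r, g, b

# Round trip is lossless up to float rounding.
for rgb in ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.9, 0.2, 0.1)):
    back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
    assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, back))
```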
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
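The velocity quantization can be mirrored on the CPU. A Python sketch, assuming a value of 4 for SUB_PIXEL_PRECISION_STEPS (the constant's actual value is not given in the deck); note the decode returns UV-space (0 to 1) velocity, i.e. half the NDC-space input, matching the decode shown earlier:

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed; not specified in the deck

def quantize_velocity(velocity_ndc, resolution):
    # Screen space -1..1 velocity -> 0..1023 integer steps; the extremes
    # (-512 and 511 before the bias) are reserved for out-of-range values.
    vq = [v * r * SUB_PIXEL_PRECISION_STEPS * 0.5 for v, r in zip(velocity_ndc, resolution)]
    return [math.floor(min(max(v, -512.0), 511.0)) + 512.0 for v in vq]

def dequantize_velocity(vq, resolution):
    v = [q - 512.0 for q in vq]
    if max(abs(v[0]), abs(v[1])) > 510.0:
        return None  # out of representable range; cull in later passes
    return [c / (r * SUB_PIXEL_PRECISION_STEPS) for c, r in zip(v, resolution)]

q = quantize_velocity((0.1, -0.2), (1920.0, 1080.0))
uv_velocity = dequantize_velocity(q, (1920.0, 1080.0))
# uv_velocity ≈ (0.05, -0.1): half the NDC input, to sub-pixel precision
```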
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done!
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Deferred Pipeline Overview- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to G-Buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend: add outgoing radiance to render target
Deferred Pipeline Cons- Heavy on read bandwidth
- Read G-Buffer for each light source
- Heavy on write bandwidth
- Blend: add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials
G-Buffer
G-Buffer- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace;
varying vec3 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges Storage
- In vanilla WebGL the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
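This float behavior is easy to sanity check outside the shader; a quick JavaScript sketch using Math.fround, which rounds a double to float32 precision:

```javascript
// Math.fround rounds a JS double to the nearest 32-bit float, mimicking
// the precision of a highp float in a shader.
const f32 = Math.fround;

// Every integer up to 2^24 = 16777216 is exactly representable...
console.log(f32(16777215) === 16777215); // true
console.log(f32(16777216) === 16777216); // true

// ...but 2^24 + 1 falls between representable values and snaps back down.
console.log(f32(16777217) === 16777216); // true

// The same rule holds for 16-bit half floats at 2^11 = 2048, which is why
// a half float target only leaves ~11 bits of integer headroom per channel.
```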
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}
float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}
vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
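As a sanity check, the same shift-by-multiply arithmetic ports directly to JavaScript; a sketch (function names mirror the GLSL above):

```javascript
// Pack three 8-bit integers into one 24-bit integer that a 32-bit float
// holds exactly, using only multiplies and adds (no bitwise operators).
function uint8_8_8_to_uint24([x, y, z]) {
  return x * 65536 + y * 256 + z; // (x << 16) | (y << 8) | z, sans bit ops
}

// Invert the packing with floored divisions: the GLSL-style "shift right".
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, -x * 256 + temp, -temp * 256 + raw];
}

// Round-trip a color: every 0..255 triple must survive unchanged.
console.log(uint24_to_uint8_8_8(uint8_8_8_to_uint24([12, 200, 255]))); // [ 12, 200, 255 ]
```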
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
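A CPU analogue of this exhaustive test can be sketched in JavaScript, emulating float32 rounding with Math.fround; stride sampling keeps it fast (a full stride-1 walk over all of uint24 also passes, it just takes a few seconds):

```javascript
// Emulate the GPU test on the CPU: Math.fround rounds every intermediate
// result to float32, approximating highp shader arithmetic.
const f32 = Math.fround;

function pack(x, y, z) {
  // uint8_8_8_to_uint24 with float32 rounding at each step
  return f32(f32(x * 65536) + f32(y * 256 + z));
}

function unpack(raw) {
  const x = Math.floor(f32(raw / 65536));
  const temp = Math.floor(f32(raw / 256));
  return [x, f32(-x * 256 + temp), f32(-temp * 256 + raw)];
}

// Walk the uint24 domain with a prime stride and round-trip each ID.
let failures = 0;
for (let id = 0; id < 16777216; id += 1009) {
  const x = Math.floor(id / 65536);
  const temp = Math.floor(id / 256);
  const y = temp - x * 256;
  const z = id - temp * 256;
  const [ux, uy, uz] = unpack(pack(x, y, z));
  if (ux !== x || uy !== y || uz !== z) failures++;
}
console.log(failures); // 0
```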
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to and read from textures between the pack and unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
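A minimal JavaScript sketch of the octahedral mapping described in [Cigolle 14]; the helper signNotZero and the exact fold details are assumptions on my part, not code from the talk:

```javascript
// sign() that never returns 0, so folded points land on the correct edge.
const signNotZero = (v) => (v >= 0 ? 1 : -1);

// Encode a unit vector as 2D octahedral coordinates in [-1, 1]^2.
function octahedronEncode([x, y, z]) {
  const invL1 = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let px = x * invL1;
  let py = y * invL1;
  if (z < 0) {
    // Fold the lower hemisphere over the diagonals
    [px, py] = [
      (1 - Math.abs(py)) * signNotZero(px),
      (1 - Math.abs(px)) * signNotZero(py),
    ];
  }
  return [px, py];
}

// Decode 2D octahedral coordinates back to a unit vector.
function octahedronDecode([px, py]) {
  let x = px;
  let y = py;
  const z = 1 - Math.abs(px) - Math.abs(py);
  if (z < 0) {
    [x, y] = [
      (1 - Math.abs(y)) * signNotZero(x),
      (1 - Math.abs(x)) * signNotZero(y),
    ];
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}

console.log(octahedronDecode(octahedronEncode([0, 0, -1]))); // [ 0, 0, -1 ]
```

Remap [px, py] from [-1, 1] to [0, 1] before quantizing into the 14-bit G-Buffer fields used below.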
Emission
- Don't pack emission: forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer; not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches and textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
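The RGB-to-YCoCg transform itself is a cheap linear change of basis; a JavaScript sketch of the matrix presumably behind the talk's rgbToYcocg / YcocgToRgb helpers:

```javascript
// Forward transform: Y is a luminance-like average; Co and Cg are chroma
// offsets in [-0.5, 0.5] for RGB inputs in [0, 1].
function rgbToYcocg([r, g, b]) {
  return [
    r / 4 + g / 2 + b / 4,  // Y
    r / 2 - b / 2,          // Co
    -r / 4 + g / 2 - b / 4, // Cg
  ];
}

// Inverse transform; the coefficients are powers of two, so the round
// trip is exact in floating point for dyadic inputs like the one below.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(ycocgToRgb(rgbToYcocg([0.25, 0.5, 0.75]))); // [ 0.25, 0.5, 0.75 ]
```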
G-Buffer PackingFormat
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. material type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL and could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate: probably too discretized
- Maybe useful on mobile where a mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;
  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together
  // If not metallic, negate depth; extract the bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
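The sign trick for metallic is easy to verify in isolation; a JavaScript sketch (assuming metallic arrives as a bool and is carried in the float's sign, matching the abs()/sign() decode shown later):

```javascript
// Encode: a non-metallic surface negates depth, so the flag costs only
// the sign bit of the float; depth itself must be strictly positive.
const packDepthMetallic = (depth, metallic) => depth * (metallic ? 1 : -1);

// Decode: depth comes back via abs(), the flag via the sign.
const unpackDepth = (w) => Math.abs(w);
const unpackMetallic = (w) => w > 0; // sign(w) > 0.0 in GLSL

const w = packDepthMetallic(12.5, false);
console.log(unpackDepth(w), unpackMetallic(w)); // 12.5 false
```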
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored as sRGB -> YCoCg; returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
[Four detail-shot comparison slides, each showing: Enhance / RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%]
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component, not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
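Because the YCoCg transform is linear and pure white maps to luminance 1 with zero chroma, the YC Fresnel can be checked numerically against the RGB version; a JavaScript sketch (the copper-like F0 value is illustrative):

```javascript
// RGB Schlick: F = (1 - F0) * p + F0 per channel, with p = (1 - vDotH)^5.
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map((c) => (1 - c) * p + c);
}

// YC Schlick from the slide: luminance as in RGB, chroma decaying to zero.
function fresnelSchlickYc(vDotH, [fY, fC]) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - fY) * p + fY, fC * -p + fC];
}

// Luminance and one chroma axis (Co) of the YCoCg basis.
const luma = ([r, g, b]) => r / 4 + g / 2 + b / 4;
const co = ([r, g, b]) => r / 2 - b / 2;

// Transforming F0 first and running YC Schlick must agree with running
// RGB Schlick and then transforming, since the basis is linear and white
// has zero chroma.
const f0 = [0.95, 0.64, 0.54]; // illustrative copper-like reflectance
const vDotH = 0.3;
const direct = fresnelSchlickYc(vDotH, [luma(f0), co(f0)]);
const viaRgb = fresnelSchlickRgb(vDotH, f0);
console.log(Math.abs(direct[0] - luma(viaRgb)) < 1e-12); // true
console.log(Math.abs(direct[1] - co(viaRgb)) < 1e-12); // true
```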
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
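The weighting logic ports cleanly to JavaScript for testing; a simplified scalar sketch (it returns just the reconstructed chroma, rather than the vec2 the GLSL returns):

```javascript
// center and each neighbor are [luminance, chroma] pairs from the
// cross-shaped neighborhood; returns the reconstructed chroma component.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let weighted = 0;
  let totalWeight = 0;
  for (const [luminance, chroma] of neighbors) {
    // Weight falls off exponentially with luminance difference...
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luminance - center[0]));
    // ...and black samples (e.g. at infinity) are excluded, as step() does
    if (luminance < 1e-5) w = 0;
    weighted += chroma * w;
    totalWeight += w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? weighted / totalWeight : 0;
}

// Equal-luminance neighbors contribute equally: the result is their mean.
console.log(reconstructChromaHDR(
  [0.5, 0.125],
  [[0.5, 0.25], [0.5, 0.5], [0.5, 0.25], [0.5, 0.5]]
)); // 0.375
```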
Thanks for listening!
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/10.1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Deferred Pipeline Cons
- Heavy on read bandwidth
  - Read G-Buffer for each light source
- Heavy on write bandwidth
  - Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
  - Challenging to support non-standard materials
G-Buffer
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs. texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out...
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
  - Step size increases at integers > 2^24
  - 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
  - Step size increases at integers > 2^11
  - 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
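These precision limits are easy to sanity check outside the shader by rounding through IEEE-754 single precision. A quick Python sketch (not part of the deck; the helper name is mine):

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python double to the nearest IEEE-754 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Every integer up to 2^24 survives the round trip exactly...
assert to_f32(2**24 - 1) == 2**24 - 1
assert to_f32(2**24) == 2**24

# ...but past 2^24 the step size doubles, so odd integers collapse.
assert to_f32(2**24 + 1) == 2**24
```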
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
  - Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
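The same shift-by-multiply arithmetic can be exercised on the CPU. A small Python mirror of the two GLSL functions above (a sketch for illustration; only floor, multiply, and subtract are used, as on the GPU):

```python
import math

def uint8_8_8_to_uint24(rgb):
    """Pack three 8-bit values into one 24-bit integer, float-arithmetic style."""
    r, g, b = rgb
    return r * 65536.0 + g * 256.0 + b        # r << 16 | g << 8 | b

def uint24_to_uint8_8_8(raw):
    """Recover the three 8-bit values using only floor, multiply, subtract."""
    x = math.floor(raw / 65536.0)             # raw >> 16
    temp = math.floor(raw / 256.0)            # raw >> 8
    y = temp - x * 256                        # low 8 bits of (raw >> 8)
    z = raw - temp * 256.0                    # low 8 bits of raw
    return (x, y, z)

# Round trip a few sample triples.
for rgb in [(0, 0, 0), (255, 255, 255), (12, 200, 7), (255, 0, 255)]:
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(rgb)) == rgb
```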
Unit Testing
Unit Testing
- Important to unit test packing functions
  - Easy to miss collisions
  - Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
  - WebGL has no support for readPixels on floating point textures
  - Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
  - Assign each pixel a unique integer ID
  - Pack the ID
  - Unpack the ID
  - Compare the unpacked ID to the pixel ID
  - Write a success / fail color
Packing Unit Test: Single Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
  - Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
  - Pass 1: Pack data, render to texture
  - Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
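For reference, the octahedral mapping of [Cigolle 14] is only a handful of operations in each direction. A Python sketch of the encode / decode pair (my own port of the published math, not the deck's GLSL):

```python
import math

def _sign_not_zero(v):
    # sign() that treats 0 as positive, as required by the octahedral fold.
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    """Unit vector -> 2D point in [0, 1]^2 via the octahedral mapping."""
    x, y, z = n
    # Project onto the octahedron |x| + |y| + |z| = 1.
    s = abs(x) + abs(y) + abs(z)
    x, y = x / s, y / s
    # Fold the lower hemisphere across the diagonals.
    if z < 0.0:
        x, y = ((1.0 - abs(y)) * _sign_not_zero(x),
                (1.0 - abs(x)) * _sign_not_zero(y))
    # Remap from [-1, 1] to the full [0, 1] storage domain.
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def octahedron_decode(e):
    """Inverse mapping: [0, 1]^2 -> unit vector."""
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = ((1.0 - abs(v)) * _sign_not_zero(u),
                (1.0 - abs(u)) * _sign_not_zero(v))
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)

# Round trip a normal on each hemisphere.
for n in [(0.2673, 0.5345, 0.8018), (0.2673, -0.5345, -0.8018)]:
    d = octahedron_decode(octahedron_encode(n))
    assert all(abs(a - b) < 1e-4 for a, b in zip(n, d))
```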
Emission
- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
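The YCoCg transform itself is just a cheap linear change of basis. A Python sketch of the forward and inverse transforms (the checkerboard interlacing is a separate step, not shown here):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance-like term, in [0, 1]
    co =  0.5  * r - 0.5 * b              # "orange" chroma, in [-0.5, 0.5]
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # "green" chroma, in [-0.5, 0.5]
    return (y, co, cg)

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)

# The transform is exactly invertible (up to float rounding).
rgb = (0.2, 0.4, 0.6)
back = ycocg_to_rgb(rgb_to_ycocg(rgb))
assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, back))
```

The signed chroma range is why the packing code later adds a CHROMA_BIAS before storing chroma in an unsigned channel.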
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
  - i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where a mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;
    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing: Depth and Metallic
    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
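The sign trick in that final component can be sketched in isolation: metallic is stored as +/-1, so a single multiply packs it, and abs() / sign() unpack it. A Python sketch (my own helper names; depth 0 remains reserved for infinity, as in the decode path later):

```python
def pack_depth_metallic(depth, metallic):
    # depth > 0.0; metallic is 1.0 (metal) or -1.0 (non-metal).
    return depth * metallic

def unpack_depth_metallic(packed):
    metallic = 1.0 if packed > 0.0 else -1.0   # sign()
    return abs(packed), metallic               # (depth, metallic flag)

assert unpack_depth_metallic(pack_depth_metallic(3.5, -1.0)) == (3.5, -1.0)
assert unpack_depth_metallic(pack_depth_metallic(3.5, 1.0)) == (3.5, 1.0)
```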
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }
- Decode Depth
Decode G-Buffer RGB Lighting
    res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;
    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
        // sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }
- Decode Velocity
Decode G-Buffer RGB Lighting
    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct the missing chroma sample based on luminance similarity
    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on the subsampled checkerboard layout
- Color is stored in non-linear space to distribute precision perceptually
    // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
  - [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
    - Luminance calculation the same
    - Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth / stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
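A CPU-side port of this reconstruction is handy for exercising the two guard paths. The Python sketch below is my own; it assumes the slide's garbled SENSITIVITY constant reads 25.0 and represents each sample as a (luminance, chroma) pair:

```python
def reconstruct_chroma_hdr(center, samples, sensitivity=25.0):
    """center: (Y, C) of this pixel; samples: four (Y, C) cross neighbors.
    Returns the center's own chroma plus a luminance-weighted blend of the
    neighbors' opposite-basis chroma."""
    weights = []
    for y, _c in samples:
        w = 2.0 ** (-sensitivity * abs(y - center[0]))
        if y < 1e-5:          # guard the case where a sample is black
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:         # guard the case where all weights are 0
        return (0.0, 0.0)
    other = sum(w * c for w, (_y, c) in zip(weights, samples)) / total
    return (center[1], other)

# Neighbors whose luminance matches the center contribute equally...
y_c = reconstruct_chroma_hdr((0.5, 0.1), [(0.5, 0.2)] * 4)
assert y_c[0] == 0.1 and abs(y_c[1] - 0.2) < 1e-12
# ...and an all-black neighborhood falls back to zero chroma.
assert reconstruct_chroma_hdr((0.5, 0.1), [(0.0, 0.3)] * 4) == (0.0, 0.0)
```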
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul, our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Tomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
G-Buffer
G-Buffer- Parameters What data do we need to execute shading
- Rasterization How do we access these parameters
- Storage How do we store these parameters
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
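The YCoCg transform itself is a cheap linear change of basis. A JavaScript sketch of the standard form (our illustration; the shader's rgbToYcocg helper is assumed to match it):

```javascript
// RGB <-> YCoCg: Y is luma-like, Co / Cg are chroma.
// For RGB inputs in [0, 1], Y lands in [0, 1] and Co / Cg in [-0.5, 0.5],
// which is why the packing code below adds a chroma bias.
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

The checkerboard layout then stores Y for every pixel but only one of Co / Cg per pixel, alternating between neighbors.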
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float: 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float: 64 bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float: 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where a mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128 bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
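The velocity quantization above can be sketched per axis in JavaScript. This is our illustration: the SUB_PIXEL_PRECISION_STEPS value is an assumption (the deck never states it), and the decoded result is a UV-space delta as in the decode pass later:

```javascript
// Quantize one axis of screen-space velocity (NDC delta, -1..1) into a
// 10-bit integer [0, 1023]; 0 and 1023 mark out-of-range / infinity.
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed value for illustration

function quantizeVelocity(vNdc, resolution) {
  // NDC spans 2 units, so * 0.5 converts to pixels before sub-pixel scaling
  let v = vNdc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  v = Math.floor(Math.min(Math.max(v, -512.0), 511.0));
  return v + 512.0; // bias into the unsigned 10-bit range
}

function dequantizeVelocity(q, inverseResolution) {
  // Returns a UV-space (0..1) delta, matching the decode pass
  return (q - 512.0) * inverseResolution / SUB_PIXEL_PRECISION_STEPS;
}
```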
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}
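The sign-bit trick for depth and metallic is easy to mirror in JavaScript (our sketch; here metallic is a boolean rather than the shader's ±1.0 float, and depth 0.0 is reserved for the infinity sentinel used by the decode pass):

```javascript
// Store a 1-bit flag in the sign of a float: negate depth when not metallic.
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(packed) {
  // abs() recovers depth; the sign recovers the flag, like sign() in GLSL
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```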
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance (four detail-crop comparison slides): RGB Lighting 100% | YC Lighting 100% | RGB Lighting 25% | YC Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
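Because YCoCg is a linear transform, the YC Fresnel is exactly the RGB Schlick result expressed in the new basis: luminance interpolates toward 1 at grazing angles while chroma decays toward 0. A quick JavaScript check of that equivalence (helper names are ours):

```javascript
// RGB Schlick Fresnel, per channel: F = (1 - F0) * (1 - vDotH)^5 + F0
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

// YC variant: luminance as above; chroma goes to zero at grazing angles
function fresnelSchlickYC(vDotH, [fy, fc]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - fy) * p + fy, fc * -p + fc];
}

// Y and Co rows of the RGB -> YCoCg transform, for comparing the two paths
const toYco = ([r, g, b]) => [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
```

Evaluating Schlick in RGB and then transforming matches evaluating directly in YC, because the transform of white (1, 1, 1) has luminance 1 and zero chroma.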
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once (YCYC)
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
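A JavaScript port of the reconstruction above, useful for sanity-checking the weighting off the GPU (our sketch; the shader's four cross samples become an array here):

```javascript
// Weight each neighbor's chroma by how close its luminance is to the
// center pixel's luminance; guard black samples and all-zero weights.
function reconstructChromaHDR(center, neighbors, sensitivity = 25.0) {
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [lum, chroma] of neighbors) {
    let w = Math.pow(2.0, -sensitivity * Math.abs(lum - center[0]));
    if (lum < 1e-5) w = 0.0; // guard the case where a sample is black
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```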
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks: Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0) ?
    texture2D(material_uColorMap, colorUV).rgb :
    colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...
G-Buffer Storage
Challenges: Storage
- In vanilla webGL the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets not well supported
Challenges: Storage
- Reading from render buffer depth getting better
Challenges: Storage
- Texture float support quite good
Challenges: Storage
- Texture half float support getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
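The 2^24 limit is easy to demonstrate in JavaScript, since Math.fround rounds a double to the nearest 32-bit float:

```javascript
// float32 has a 24-bit significand: every integer up to 2^24 = 16777216 is
// exact, after which the step size doubles and odd integers start colliding.
const exact = Math.fround(16777215);    // 2^24 - 1: still representable
const collided = Math.fround(16777217); // 2^24 + 1: rounds to 2^24
```

This is exactly why three 8-bit values (24 bits total) are the most we can pack losslessly into one 32-bit float channel.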
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
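The same arithmetic can be mirrored in JavaScript to check the round trip off the GPU (our sketch; multiplies and divides stand in for the shifts, exactly as in the GLSL above):

```javascript
// Pack three 8-bit integers into one float-representable uint24:
// x << 16 | y << 8 | z, expressed with float arithmetic only.
function uint888ToUint24([x, y, z]) {
  return x * 65536.0 + (y * 256.0 + z);
}

function uint24ToUint888(raw) {
  const x = Math.floor(raw / 65536.0);       // shift right 16
  const temp = Math.floor(raw / 256.0);      // shift right 8
  return [x, -x * 256.0 + temp, -temp * 256.0 + raw];
}
```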
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011

Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007

Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
G-Buffer Parameters
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec3 vPositionScreenSpace;
varying vec3 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
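The float32 precision limits above are easy to verify outside a shader. A small JavaScript check (JavaScript numbers are float64, so `Math.fround` is used to round values through float32):

```javascript
// JavaScript numbers are float64; Math.fround rounds a value through
// float32, which lets us probe the integer limits described above.
const f32 = Math.fround;

// Every integer up to 2^24 - 1 = 16777215 survives a float32 round trip.
const exact = f32(16777215) === 16777215;

// Above 2^24 the float32 step size becomes 2, so 16777217 collapses to 16777216.
const collapsed = f32(16777217) === 16777216;

console.log(exact, collapsed); // true true
```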
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND, OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
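The GLSL helpers above port directly to JavaScript, which is handy for sanity-checking the shift arithmetic on the CPU before debugging it in a shader. A sketch (names mirror the shader functions; inputs are assumed to be in-range integers):

```javascript
// CPU-side port of the GLSL packing helpers, for quick verification.
// x, y, z are integers in [0, 255]; the packed result fits exactly in a
// float32 because it never exceeds 2^24 - 1.
function uint8_8_8_to_uint24(x, y, z) {
  return x * 65536 + y * 256 + z; // shift left via multiplies
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);  // shift right 16
  const temp = Math.floor(raw / 256); // shift right 8
  const y = temp - x * 256;           // subtract out the high byte
  const z = raw - temp * 256;         // subtract out the high two bytes
  return [x, y, z];
}

// Round trip a few edge cases:
console.log(uint24_to_uint8_8_8(uint8_8_8_to_uint24(255, 255, 255))); // [255, 255, 255]
console.log(uint24_to_uint8_8_8(uint8_8_8_to_uint24(1, 2, 3)));       // [1, 2, 3]
```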
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
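As a reference for the idea (not the deck's exact shader code), here is a minimal JavaScript sketch of octahedral encode / decode in the style of [Cigolle 14]; the names are illustrative:

```javascript
// Octahedral normal encode / decode sketch after [Cigolle 14].
// Maps a unit vector to two values in [0, 1] and back.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octahedronEncode([x, y, z]) {
  // Project onto the octahedron (L1 normalization)
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals
    const fu = (1.0 - Math.abs(v)) * signNotZero(u);
    const fv = (1.0 - Math.abs(u)) * signNotZero(v);
    u = fu; v = fv;
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // remap to the full 0..1 domain
}

function octahedronDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  let z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    // Unfold the lower hemisphere
    const fu = (1.0 - Math.abs(v)) * signNotZero(u);
    const fv = (1.0 - Math.abs(u)) * signNotZero(v);
    u = fu; v = fv;
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len]; // renormalize
}
```

Quantizing the two encoded values (e.g. to 14 bits each, as in the G-Buffer format later in the deck) gives a reasonably uniform discretization of the sphere.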
Emission
- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
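The YCoCg basis referenced above is a cheap, exactly invertible transform. A JavaScript sketch of the standard forward and inverse transforms (the shader's rgbToYcocg / YcocgToRgb helpers are assumed to match this form):

```javascript
// Standard RGB <-> YCoCg transform, the basis used by [Mavridis 12].
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y:  luminance
    0.5 * r - 0.5 * b,              // Co: orange vs. blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg: green vs. purple chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

// Chroma is zero for grays, which is why subsampling it is perceptually cheap:
console.log(rgbToYcocg([0.5, 0.5, 0.5])); // [0.5, 0, 0]
```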
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (sign bit)
G: NormalX 9 bits (sign bit) | Gloss 3 bits
B: NormalY 9 bits (sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (sign bit) | Gloss 3 bits
B: NormalY 9 bits (sign bit) | Gloss 3 bits

- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
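The sign trick above is simple enough to sketch in JavaScript; here metallic is assumed to be stored as +1 / -1, with depth strictly positive for rendered surfaces (0.0 reserved for infinity):

```javascript
// Depth / metallic sign-bit packing sketch. Assumes metallic is +1 or -1
// and depth > 0 for any rendered surface (0.0 is reserved for infinity).
function packDepthMetallic(depth, metallic) {
  return depth * metallic; // non-metallic (-1) negates depth
}

// Decode is nearly free, which is what makes depth-only consumers
// (AO, screen space ray marching) cheap:
const decodeDepth = (w) => Math.abs(w);
const decodeMetallic = (w) => (w >= 0.0 ? 1.0 : -1.0); // sign() in GLSL

const w = packDepthMetallic(3.5, -1.0);
console.log(decodeDepth(w), decodeMetallic(w)); // 3.5 -1
```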
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {

  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
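The final sRgbToRgb step above undoes the perceptual encoding. A JavaScript sketch of the standard sRGB-to-linear conversion (the shader may use a cheaper approximation such as pow(c, 2.2) instead):

```javascript
// Standard sRGB -> linear conversion for one channel. The piecewise form
// is the sRGB spec's exact transfer function; a shader might approximate
// it with pow(c, 2.2).
function sRgbToLinear(c) {
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

console.log(sRgbToLinear(0.0)); // 0
console.log(sRgbToLinear(1.0)); // 1
```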
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013

Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009

Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Lit Scene
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- Range: 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- Range: 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
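The representability claim above can be checked directly by round-tripping integers through IEEE 754 single precision — a quick sketch in Python, using `struct` to emulate a 32-bit float:

```python
import struct

def to_f32(x):
    """Round-trip a Python double through IEEE 754 single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Every integer up to 2^24 = 16777216 is exactly representable...
assert to_f32(16777215.0) == 16777215.0
assert to_f32(16777216.0) == 16777216.0

# ...but 2^24 + 1 rounds to the nearest representable float, which is 2^24.
assert to_f32(16777217.0) == 16777216.0
```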
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
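The shift-by-multiply arithmetic above can be sanity-checked off-GPU. A small Python port of the two functions (Python floats are doubles, which is safe here since every intermediate stays below 2^24):

```python
import math

def uint8_8_8_to_uint24(r, g, b):
    # Shift left with multiplies: the float equivalent of r << 16 | g << 8 | b.
    return r * 65536.0 + g * 256.0 + b

def uint24_to_uint8_8_8(raw):
    # Shift right with divides + floor, then subtract off the higher fields.
    x = math.floor(raw / 65536.0)
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return x, y, z

# Round trip a sampling of the 8-bit domain.
for r in (0, 1, 127, 255):
    for g in (0, 3, 200, 255):
        for b in (0, 5, 254, 255):
            assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(r, g, b)) == (r, g, b)
```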
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
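The deck references `octohedronEncode` / `octohedronDecode` later without showing them. A reference sketch of the octahedral mapping described in [Cigolle 14], written here in Python (the GLSL helpers are assumed to be a direct transliteration of this; the function names are mine):

```python
import math

def _sign(v):
    # A sign() that never returns 0, so folding preserves the octant.
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(x, y, z):
    """Unit vector -> 2D point in [0, 1]^2."""
    s = abs(x) + abs(y) + abs(z)          # project onto the octahedron |x|+|y|+|z| = 1
    px, py = x / s, y / s
    if z < 0.0:                           # fold the lower hemisphere over the outer triangles
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    return px * 0.5 + 0.5, py * 0.5 + 0.5

def octahedron_decode(u, v):
    """2D point in [0, 1]^2 -> unit vector."""
    px, py = u * 2.0 - 1.0, v * 2.0 - 1.0
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:                           # unfold the lower hemisphere
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    length = math.sqrt(px * px + py * py + z * z)
    return px / length, py / length, z / length

# Round trip: decode(encode(n)) should recover the original unit normal.
n = (1.0 / math.sqrt(14.0), 2.0 / math.sqrt(14.0), -3.0 / math.sqrt(14.0))
d = octahedron_decode(*octahedron_encode(*n))
assert sum(a * b for a, b in zip(n, d)) > 0.999999
```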
Emission
- Don't pack emission; forward render it
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
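For reference, the YCoCg transform pair is exactly invertible in float arithmetic, and its chroma components land in [-0.5, 0.5] for RGB inputs in [0, 1] — which is where the CHROMA_BIAS in the packing code comes from. A Python sketch (the GLSL `rgbToYcocg` / `YcocgToRgb` helpers used later are assumed to implement this same matrix):

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance
    co =  0.5  * r            - 0.5  * b  # orange chroma, in [-0.5, 0.5]
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green chroma, in [-0.5, 0.5]
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the matrix above.
    return y + co - cg, y + cg, y - co - cg

# Round trip a few colors.
for rgb in ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.25, 0.5, 0.75)):
    out = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
    assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, out))
```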
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
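The `uint10_14_to_uint24` helper used above isn't shown in the deck; it presumably follows the same pattern as `uint8_8_8_to_uint24`, with a 14-bit shift instead of two 8-bit shifts. A hypothetical Python sketch (the split — 10-bit velocity in the high bits, 14-bit normal component in the low bits — matches how the decode pass consumes it):

```python
import math

SHIFT_LEFT_14 = 16384.0        # 2^14
SHIFT_RIGHT_14 = 1.0 / 16384.0

def uint10_14_to_uint24(a10, b14):
    # a10 in [0, 1023] (velocity), b14 in [0, 16383] (normal component).
    # Max packed value: 1023 * 16384 + 16383 = 2^24 - 1, still exact in a 32-bit float.
    return a10 * SHIFT_LEFT_14 + b14

def uint24_to_uint10_14(raw):
    a10 = math.floor(raw * SHIFT_RIGHT_14)
    return a10, raw - a10 * SHIFT_LEFT_14

# Round trip the corners of both ranges.
for a in (0, 1, 511, 1023):
    for b in (0, 1, 8191, 16383):
        assert uint24_to_uint10_14(uint10_14_to_uint24(a, b)) == (a, b)
```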
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look

Enhance! Four detail crops follow, each comparing:
RGB Lighting 100% | YC Lighting 100%
RGB Lighting 25% | YC Lighting 25%
Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
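The two `power` terms are numerically close. A quick Python check of the spherical gaussian form against the pow() form over the full vDotH range (constants as in [Lagarde 12]; the 0.02 error bound is my own sanity threshold, not a figure from the talk):

```python
def fresnel_power_pow(v_dot_h):
    # pow(1.0 - vDotH, 5.0)
    return (1.0 - v_dot_h) ** 5.0

def fresnel_power_sg(v_dot_h):
    # exp2((-5.55473 * vDotH - 6.98316) * vDotH)
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Maximum absolute error over [0, 1] stays small.
max_err = max(abs(fresnel_power_pow(i / 256.0) - fresnel_power_sg(i / 256.0))
              for i in range(257))
assert max_err < 0.02
```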
YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
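Ported to Python for illustration, with both guards exercised. Samples are (luminance, chroma) pairs; `step` mirrors GLSL semantics, and the default `sensitivity=25.0` reads the slide's stripped decimal as 25.0 (an assumption):

```python
def step(edge, x):
    # GLSL step(): 0.0 if x < edge, else 1.0.
    return 0.0 if x < edge else 1.0

def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    """center, a1..a4 are (luminance, chroma); returns (center chroma, reconstructed chroma)."""
    neighbors = (a1, a2, a3, a4)
    weights = []
    for luma, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        w *= step(1e-5, luma)          # guard the case where a sample is black
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:                  # guard the case where all weights are 0
        return (0.0, 0.0)
    reconstructed = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], reconstructed)

# Neighbors whose luminance matches the center dominate; black samples are ignored.
out = reconstruct_chroma_hdr((0.5, 0.1), (0.5, 0.3), (0.5, 0.3), (0.0, 9.9), (0.0, 9.9))
assert out[0] == 0.1 and abs(out[1] - 0.3) < 1e-9
```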
Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources

[WebGLStats] WebGL Stats.
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering.
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production.
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model.
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney.
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4.
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final.
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors.
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer.
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression.
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading.
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches.
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources

[Shishkovtsov 05] Deferred Shading in STALKER.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1.
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3.
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3.
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity.
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading.
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources

[Billeter 12] Clustered Deferred and Forward Shading.
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling.
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU.
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing.
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling.
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces.
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs.
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering.
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel.
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model.
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer Color
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
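A CPU-side stand-in for this test can be sketched in JavaScript (an assumption-laden analogue of the GPU pass: `Math.fround` emulates the float texture write, and it checks the top of the domain exhaustively plus a random sample instead of looping all 2^24 IDs):

```javascript
// Give each "pixel" a unique uint24 ID, pack it, store it through a
// simulated 32-bit float (Math.fround), unpack, and compare.
function roundTrips(id) {
  const hi = Math.floor(id / 65536);
  const mid = Math.floor(id / 256) % 256;
  const lo = id % 256;
  const packed = Math.fround(hi * 65536 + mid * 256 + lo); // "texture write"
  const x = Math.floor(packed / 65536);
  const t = Math.floor(packed / 256);
  return x * 65536 + (t - x * 256) * 256 + (packed - t * 256) === id;
}

let allPass = true;
for (let id = 16777215 - 4096; id <= 16777215; id++) {
  allPass = allPass && roundTrips(id); // precision is tightest near 2^24
}
for (let i = 0; i < 10000; i++) {
  allPass = allPass && roundTrips(Math.floor(Math.random() * 16777216));
}
```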
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
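The octahedral mapping is compact enough to sketch host-side; this follows the standard encode / decode from [Cigolle 14] (JavaScript, illustrative names, quantization step omitted):

```javascript
// Octahedral normal encoding: project the unit sphere onto an
// octahedron, fold the lower hemisphere over the upper one, then remap
// [-1,1] -> [0,1] so the full 0 to 1 domain is used. Assumes n is unit.
const sign = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octEncode([x, y, z]) {
  const d = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let px = x / d;
  let py = y / d;
  if (z < 0.0) {
    const fx = (1.0 - Math.abs(py)) * sign(px); // fold lower hemisphere
    const fy = (1.0 - Math.abs(px)) * sign(py);
    px = fx;
    py = fy;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5];
}

function octDecode([u, v]) {
  let px = u * 2.0 - 1.0;
  let py = v * 2.0 - 1.0;
  const pz = 1.0 - Math.abs(px) - Math.abs(py);
  if (pz < 0.0) {
    const fx = (1.0 - Math.abs(py)) * sign(px); // unfold
    const fy = (1.0 - Math.abs(px)) * sign(py);
    px = fx;
    py = fy;
  }
  const len = Math.hypot(px, py, pz);
  return [px / len, py / len, pz / len];
}
```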
Emission
- Don't pack emission. Forward render.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
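For reference, RGB to YCoCg and back is a small linear transform; a host-side sketch (illustrative, using the commonly published YCoCg definition):

```javascript
// RGB <-> YCoCg, the transform behind the compact YCoCg buffer work.
// Luma Y stays in [0,1]; chroma Co/Cg land in [-0.5, 0.5], which is
// why the packed G-Buffer biases chroma by +0.5 before quantizing.
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

White maps to Y = 1 with zero chroma, so a linear interpolation toward white in RGB is a luma interpolation plus a chroma decay in this basis.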
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float 128bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
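The depth / metallic trick is small enough to show in isolation; a host-side sketch (the boolean metallic here is illustrative; the GLSL multiplies by a signed metallic value instead):

```javascript
// Metallic rides in the sign of depth: view-space depth is strictly
// positive for any rendered surface, and 0.0 is reserved for infinity
// (nothing rendered), so the sign bit is otherwise unused.
const packDepthMetallic = (depth, metallic) => (metallic ? depth : -depth);
const unpackDepth = (w) => Math.abs(w);
const unpackMetallic = (w) => (w > 0.0 ? 1.0 : 0.0);
```

Decode is a single abs() for depth, which is why depth is the cheapest component to get back out.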
Packing Challenges
- Must balance packing efficiency with cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? diffuseYcocg.yz : diffuseYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component, and we are now operating on a vec2, saving us a MADD and ADD from the skipped 3rd component.
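Because Schlick's formula is affine in the reflection coefficient and the RGB to YCoCg transform is linear, the YC form above is exactly Schlick evaluated in RGB and then transformed; a quick host-side check (JavaScript, helper names illustrative, restated so the check is self-contained):

```javascript
// Verify the YC-space fresnel matches RGB fresnel converted to YCoCg.
const rgbToYcocg = ([r, g, b]) => [
  0.25 * r + 0.5 * g + 0.25 * b,  // Y
  0.5 * r - 0.5 * b,              // Co
  -0.25 * r + 0.5 * g - 0.25 * b, // Cg
];

const fresnelSchlickRGB = (vDotH, f0) => {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
};

// Luma interpolates toward 1 (white); chroma decays toward 0.
const fresnelSchlickYC = (vDotH, [y0, c0]) => {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0, c0 * -p + c0];
};
```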
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
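A host-side port of the reconstruction above (JavaScript; the four cross-neighborhood samples are passed as [luma, chroma] pairs, names illustrative):

```javascript
// Neighbors whose luminance is close to the center pixel's get
// exponentially more weight; black samples and the all-rejected case
// are guarded as in the GLSL.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard the case where sample is black
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```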
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Tiled Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
G-Buffer Metallic
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
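The fragment-shader math reduces to a perspective divide and a subtract; a host-side sketch (clip-space positions as plain [x, y, z, w] arrays, illustrative):

```javascript
// Velocity is the difference of the two NDC positions: perspective
// divide each clip-space position, then subtract the xy components.
function screenSpaceVelocity(posClip, posClipOld) {
  return [
    posClip[0] / posClip[3] - posClipOld[0] / posClipOld[3],
    posClip[1] / posClip[3] - posClipOld[1] / posClipOld[3],
  ];
}
```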
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
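The quantize / dequantize round trip can be sketched in plain JavaScript. The value of SUB_PIXEL_PRECISION_STEPS and the final rescale factor are assumptions for illustration; the deck does not state the constant:

```javascript
// Quantize screen-space velocity (NDC units, -1..1 per axis) into the
// 10-bit biased 0..1023 range used by the G-Buffer, then dequantize.
// SUB_PIXEL_PRECISION_STEPS = 4 assumed (2 fractional bits of sub-pixel
// precision); values at the range edges act as an "infinity" sentinel.
const SUB_PIXEL_PRECISION_STEPS = 4.0;

function quantizeVelocity(v, resolution) {
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0; // bias -512..511 into 0..1023 for storage
}

function dequantizeVelocity(q, resolution) {
  const centered = q - 512.0;
  if (Math.abs(centered) > 510.0) return Number.POSITIVE_INFINITY; // out of range sentinel
  // undo the 0.5 * resolution * steps scale applied at encode time
  return centered * (1.0 / resolution) * (1.0 / SUB_PIXEL_PRECISION_STEPS) * 2.0;
}
```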
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
      gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100%
RGB Lighting 25% | YC Lighting 25%
(four detail-crop comparison slides)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from the Luminous Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
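The claimed behavior is easy to check with a direct JavaScript port of the YC Fresnel term, with Y and chroma passed as separate scalars for clarity:

```javascript
// YC Schlick Fresnel: luminance lerps toward 1.0 at grazing angles
// (highlight whitens), chroma lerps toward 0.0; head-on (vDotH = 1)
// both return the base reflection coefficient unchanged.
function fresnelSchlickYC(vDotH, rcY, rcC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - rcY) * power + rcY, // luminance component
    rcC * -power + rcC,        // chroma component: rcC * (1 - power)
  ];
}
```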
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
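Ported to JavaScript for illustration (same constants; [luma, chroma] pairs stand in for the vec2 arguments):

```javascript
// Luminance-similarity chroma reconstruction: each cross neighbor's chroma
// is weighted by exp2(-SENSITIVITY * |lumaNeighbor - lumaCenter|), black
// samples are zero-weighted, and the all-zero-weight case returns black.
// Returns [storedChroma, reconstructedChroma].
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let weightedChroma = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard black samples
    weightedChroma += chroma * w;
    totalWeight += w;
  }
  if (totalWeight <= 1e-5) return [0.0, 0.0]; // guard all-zero weights
  return [center[1], weightedChroma / totalWeight];
}
```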
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
G-Buffer Gloss
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
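The fragment-shader math above amounts to the following JavaScript sketch, with clip-space positions passed as [x, y, z, w] arrays:

```javascript
// Screen-space velocity: perspective-divide the current and previous
// frame's clip-space positions, then subtract the resulting NDC xy.
// The difference is how far the surface point moved on screen between
// frames, in NDC units.
function screenSpaceVelocity(posClip, posClipOld) {
  return [
    posClip[0] / posClip[3] - posClipOld[0] / posClipOld[3],
    posClip[1] / posClip[3] - posClipOld[1] / posClipOld[3],
  ];
}
```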
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
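The 2^24 limit is easy to verify from JavaScript, which can round doubles to 32-bit float precision with Math.fround:

```javascript
// A 32-bit float has a 24-bit significand (23 explicit bits + 1 implicit),
// so every integer up to 2^24 is exactly representable. Above that the
// step size grows to 2 and consecutive integers start to collide.
const exact =
  Math.fround(16777215) === 16777215 &&  // 2^24 - 1: exact
  Math.fround(16777216) === 16777216;    // 2^24: exact
const collides =
  Math.fround(16777217) === 16777216;    // 2^24 + 1 rounds down: collision
```

This is exactly why the G-Buffer packing below never lets a packed value exceed 24 bits per float channel.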
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
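The same base-256 shift arithmetic can be exercised directly in JavaScript (function names mirror the GLSL above):

```javascript
// Pack three 8-bit integers into one float via base-256 arithmetic.
// The packed value never exceeds 2^24 - 1, so every intermediate
// integer is exactly representable in a 32-bit float.
function uint8_8_8_to_uint24(x, y, z) {
  return x * 65536 + y * 256 + z; // x << 16 | y << 8 | z, float-style
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);     // raw >> 16
  const temp = Math.floor(raw / 256);    // raw >> 8
  const y = temp - x * 256;              // middle byte
  const z = raw - temp * 256;            // low byte
  return [x, y, z];
}
```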
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
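A JavaScript sketch of the octahedral mapping, following the usual formulation from [Cigolle 14] (the function names mirror the deck's octohedronEncode / octohedronDecode, but these bodies are illustrative assumptions):

```javascript
// Octahedral normal encoding: project the unit sphere onto an octahedron,
// unfold the lower hemisphere outward, and remap to the [0,1]^2 square.
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

function octohedronEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let px = n[0] * invL1, py = n[1] * invL1;
  if (n[2] < 0.0) { // fold the lower hemisphere over the diagonals
    const fx = (1.0 - Math.abs(py)) * signNotZero(px);
    const fy = (1.0 - Math.abs(px)) * signNotZero(py);
    px = fx; py = fy;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5]; // -1..1 -> 0..1
}

function octohedronDecode(e) {
  const ex = e[0] * 2.0 - 1.0, ey = e[1] * 2.0 - 1.0;
  let x = ex, y = ey, z = 1.0 - Math.abs(ex) - Math.abs(ey);
  if (z < 0.0) { // unfold the lower hemisphere
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```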
Emission
- Don't pack emission. Forward render it instead
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
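The YCoCg transform pair can be sketched with its standard coefficients (the deck's rgbToYcocg / YcocgToRgb helpers are assumed to match):

```javascript
// RGB <-> YCoCg: Y is luma; Co/Cg are chroma, landing in -0.5..0.5
// for RGB inputs in 0..1 (hence the 0.5 bias applied before storage).
function rgbToYcocg([r, g, b]) {
  return [
     r * 0.25 + g * 0.5 + b * 0.25,  // Y
     r * 0.5            - b * 0.5,   // Co
    -r * 0.25 + g * 0.5 - b * 0.25,  // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg]; // exact inverse
}
```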
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float 128bpp
- Sign bits of R, G and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
G-Buffer Depth
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out…
- …and after skipping some tangential details…
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
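The 2^24 limit is easy to see outside of a shader. A quick JavaScript sketch (not from the original deck), using Math.fround to round to float32 precision:

```javascript
// Float32 holds every integer up to 2^24 exactly; above that, odd integers
// start rounding to their even neighbors.
const f32 = Math.fround;

console.log(f32(16777215)); // 2^24 - 1: exact
console.log(f32(16777216)); // 2^24: exact
console.log(f32(16777217)); // 2^24 + 1: rounds back to 16777216
```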
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND, OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
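The same shift-by-multiply arithmetic is easy to sanity check on the CPU. A JavaScript sketch mirroring the GLSL above (plain floats, no bitwise operators):

```javascript
// Pack three 8-bit integers into one float-representable 24-bit integer,
// using only multiplies, divides, and floor.
function uint8_8_8_to_uint24([x, y, z]) {
  return x * 65536 + y * 256 + z; // shift left via multiply
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536); // shift right via divide
  const temp = Math.floor(raw / 256);
  return [x, -x * 256 + temp, -temp * 256 + raw];
}

const packed = uint8_8_8_to_uint24([12, 34, 56]);
console.log(packed);                      // 795192
console.log(uint24_to_uint8_8_8(packed)); // [ 12, 34, 56 ]
```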
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
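The same exhaustive sweep can also run on the CPU in a few lines — a JavaScript sketch (not from the deck) covering the full uint24 domain with the pack/unpack arithmetic inlined:

```javascript
// Exhaustively verify the uint8_8_8 <-> uint24 round trip for every ID.
let failures = 0;
for (let id = 0; id < 16777216; id++) {
  const x = Math.floor(id / 65536);
  const temp = Math.floor(id / 256);
  const y = -x * 256 + temp;
  const z = -temp * 256 + id;
  if (x * 65536 + y * 256 + z !== id) failures++;
}
console.log(failures); // 0
```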
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
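For reference, a JavaScript sketch of octahedral mapping after [Cigolle 14] — not Floored's exact shader code, just the transform the bullets describe (unit vector to two values in the 0..1 domain and back):

```javascript
// Octahedral normal encode/decode sketch (after [Cigolle 14]).
const snz = (a) => (a >= 0 ? 1 : -1); // sign that never returns 0

function octEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z); // project onto octahedron
  let u = x / l1, v = y / l1;
  if (z < 0) { // fold the lower hemisphere over the diagonals
    [u, v] = [(1 - Math.abs(v)) * snz(u), (1 - Math.abs(u)) * snz(v)];
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // remap -1..1 to 0..1
}

function octDecode([eu, ev]) {
  let x = eu * 2 - 1, y = ev * 2 - 1;
  const z = 1 - Math.abs(x) - Math.abs(y);
  if (z < 0) { // unfold the lower hemisphere
    [x, y] = [(1 - Math.abs(y)) * snz(x), (1 - Math.abs(x)) * snz(y)];
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Round-tripping a lower-hemisphere normal like [0.6, 0, -0.8] recovers it up to float rounding, which is what makes this attractive next to quantization.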
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
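The YCoCg basis itself is a cheap linear transform. A JavaScript sketch (plain float math, no quantization — per [Mavridis 12]):

```javascript
// RGB <-> YCoCg. Y is luminance-like; Co/Cg are chroma offsets in -0.5..0.5.
function rgbToYcocg([r, g, b]) {
  return [r * 0.25 + g * 0.5 + b * 0.25,
          r * 0.5 - b * 0.5,
          -r * 0.25 + g * 0.5 - b * 0.25];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(ycocgToRgb(rgbToYcocg([1, 0, 0]))); // [ 1, 0, 0 ]
```

Note that grays land on zero chroma, which is exactly why throwing away half the chroma signal is tolerable.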
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving. RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
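The uint10_14 helpers are referenced but not shown on the slides. A JavaScript sketch of what they plausibly look like — an assumption that the 10-bit value sits in the high bits, following the same shift-by-multiply pattern as uint8_8_8:

```javascript
// Assumed uint10_14 pack/unpack: 10-bit value in the high bits, 14-bit in
// the low bits of a 24-bit integer stored in a float. 16384 = 2^14.
function uint10_14_to_uint24([hi10, lo14]) {
  return hi10 * 16384 + lo14;
}

function uint24_to_uint10_14(raw) {
  const hi10 = Math.floor(raw / 16384);
  return [hi10, raw - hi10 * 16384];
}

// e.g. zero velocity (biased to 512) and some quantized normal component
const packed = uint10_14_to_uint24([512, 9000]);
console.log(uint24_to_uint10_14(packed)); // [ 512, 9000 ]
```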
Packing Depth and Metallic

// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
res.w = components.depth * components.metallic;
return res;

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity
Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
    gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
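The slides don't show sRgbToRgb itself; it may well be a cheaper pow(x, 2.2) approximation. For reference, a JavaScript sketch of the standard sRGB-to-linear transfer function:

```javascript
// Standard sRGB -> linear transfer function (an assumption: the deck's
// sRgbToRgb helper is not shown and may use an approximation instead).
function sRgbToLinear(c) {
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

console.log(sRgbToLinear(0));   // 0
console.log(sRgbToLinear(1));   // 1
console.log(sRgbToLinear(0.5)); // ~0.214
```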
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
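The chroma form works because YCoCg is a linear transform whose luma row sums to 1 and whose chroma rows sum to 0, so Schlick evaluated in YC matches converting Schlick's RGB result. A quick numeric check in JavaScript (the F0 value is an arbitrary example, not from the deck):

```javascript
// Verify: YCoCg of Schlick-in-RGB equals Schlick-in-YC.
function schlickRGB(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map((c) => (1 - c) * p + c);
}

function rgbToYcocg([r, g, b]) {
  return [r * 0.25 + g * 0.5 + b * 0.25, r * 0.5 - b * 0.5, -r * 0.25 + g * 0.5 - b * 0.25];
}

function schlickYC(vDotH, [y, co]) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - y) * p + y, co * -p + co]; // luma as usual; chroma inverted
}

const f0 = [0.95, 0.64, 0.54]; // arbitrary example reflectance
const viaRGB = rgbToYcocg(schlickRGB(0.3, f0));
const viaYC = schlickYC(0.3, rgbToYcocg(f0));
console.log(Math.abs(viaRGB[0] - viaYC[0]) < 1e-12); // true
console.log(Math.abs(viaRGB[1] - viaYC[1]) < 1e-12); // true
```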
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
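The same weighting is easy to experiment with on the CPU. A JavaScript sketch (not from the deck) where each cross neighbor is a [luminance, chroma] pair and closer luminance earns a higher weight:

```javascript
// Luminance-weighted chroma reconstruction, mirroring the GLSL above.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let weighted = 0, totalWeight = 0;
  for (const [lum, chroma] of neighbors) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(lum - center[0]));
    if (lum < 1e-5) w = 0; // guard black samples
    weighted += chroma * w;
    totalWeight += w;
  }
  return totalWeight > 1e-5 ? [center[1], weighted / totalWeight] : [0, 0];
}

// Neighbors matching the center's luminance simply average:
console.log(reconstructChromaHDR([0.5, 0.25],
  [[0.5, 0.75], [0.5, 0.25], [0.5, 0.75], [0.5, 0.25]])); // [ 0.25, 0.5 ]
```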
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer Normal
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs. texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out...
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges: Storage
- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets: not well supported
Challenges: Storage
- Reading from render buffer depth: getting better
Challenges: Storage
- Texture float support: quite good
Challenges: Storage
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- Range: 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- Range: 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
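These precision limits are easy to confirm outside the shader; a quick JavaScript check (Math.fround rounds to the nearest 32-bit float):

```javascript
// A 32-bit float has a 24-bit significand, so every integer up to 2^24
// is exactly representable; 2^24 + 1 is the first integer that rounds away.
function representableInFloat32(n) {
  return Math.fround(n) === n;
}

console.log(representableInFloat32(16777215)); // 2^24 - 1 -> true
console.log(representableInFloat32(16777216)); // 2^24     -> true
console.log(representableInFloat32(16777217)); // 2^24 + 1 -> false
```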
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
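The same arithmetic ports directly to JavaScript for host-side sanity checks (a sketch; the function names mirror the shader helpers, they are not a Floored API):

```javascript
// Pack three 8-bit integers into one float-representable 24-bit integer.
function uint8_8_8_to_uint24(x, y, z) {
  return x * 65536 + y * 256 + z; // shift left by multiplying
}

// Invert the pack: shift right by dividing, isolate bytes by subtraction.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}

console.log(uint24_to_uint8_8_8(uint8_8_8_to_uint24(12, 34, 56))); // [12, 34, 56]
```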
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
Emission
- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
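The YCoCg basis itself is a cheap linear transform; a sketch of the commonly used form (which may differ in detail from Floored's rgbToYcocg):

```javascript
// Forward transform: Y is a weighted luminance, Co/Cg are signed chroma axes.
function rgbToYcocg(r, g, b) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b  // Cg
  ];
}

// Exact inverse: three adds / subtracts per pixel.
function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

White maps to Y = 1 with zero chroma, which is part of why dropping chroma precision is visually forgiving.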
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float: 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float: 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float: 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
- RGB Half-float: 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float: 128 bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
    // quantized pixel velocity. -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;
    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
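The velocity quantization step can be illustrated in JavaScript (a sketch; SUB_PIXEL_PRECISION_STEPS is an assumed tuning constant, not a value given in the deck):

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed: quarter-pixel precision

// NDC-space velocity -> signed sub-pixel steps, clamped and biased to 10 bits.
function quantizeVelocity(vNdc, resolution) {
  let q = vNdc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.max(-512, Math.min(511, q)));
  return q + 512; // 0..1023 fits in 10 bits; the extremes mean "out of range"
}

console.log(quantizeVelocity(0.0, 1920)); // 512: zero motion sits mid-range
```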
Packing Depth and Metallic
    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
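The sign-bit trick is simple enough to demonstrate in one place (a JavaScript sketch of the idea, not Floored's code):

```javascript
// Metallic is a boolean and view-space depth is strictly positive,
// so the sign of the packed value can carry the flag for free.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

const unpackDepth = (w) => Math.abs(w);
const unpackMetallic = (w) => w > 0;

console.log(unpackDepth(packDepthMetallic(3.5, false)));    // 3.5
console.log(unpackMetallic(packDepthMetallic(3.5, false))); // false
```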
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float render target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }
- Decode Depth
Decode G-Buffer: RGB Lighting
    res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;
    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of
        // screen space for culling in future passes: sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }
- Decode Velocity
Decode G-Buffer: RGB Lighting
    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
    // Account for samples at infinity by setting their luminance and chroma to 0
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
    // Color is stored in non-linear space to distribute precision perceptually:
    // sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look. Enhance!
(Four detail crops follow, each comparing RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, and RGB Lighting 25%.)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
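Because the YCoCg transform is linear and Schlick's term interpolates linearly toward white, evaluating Fresnel in YC agrees with converting the RGB result; a quick JavaScript check (a sketch; helper names are mine, not the deck's):

```javascript
function fresnelSchlickRgb(vDotH, rc) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return rc.map((c) => (1.0 - c) * p + c);
}

function fresnelSchlickYC(vDotH, [y, c]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * p + y, c * -p + c];
}

// Y and one chroma axis (Co here) of an RGB triple: the subsampled pair.
function rgbToYc([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}

const rc = [0.04, 0.05, 0.06];
const viaRgb = rgbToYc(fresnelSchlickRgb(0.3, rc)); // Fresnel in RGB, then convert
const viaYC = fresnelSchlickYC(0.3, rgbToYc(rc));   // convert, then Fresnel in YC
console.log(Math.abs(viaRgb[0] - viaYC[0]) < 1e-12); // true
console.log(Math.abs(viaRgb[1] - viaYC[1]) < 1e-12); // true
```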
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
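A direct JavaScript port makes the weighting behavior easy to test (a sketch mirroring the GLSL above):

```javascript
// center and neighbors are [luminance, chroma] pairs.
function reconstructChromaHDR(center, ...neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0, chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Weight by luminance similarity, zeroing out black samples
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    w *= luma >= 1e-5 ? 1.0 : 0.0; // step(1e-5, luminance)
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}

// Neighbors with identical luminance contribute equally: chroma ≈ 0.2
console.log(reconstructChromaHDR([0.5, 0.1], [0.5, 0.2], [0.5, 0.2], [0.5, 0.2], [0.5, 0.2]));
```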
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
G-Buffer Velocity
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader
- In fragment shader
varying vec3 vPositionScreenSpace
varying vec3 vPositionScreenSpaceOld
vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)
gl_Position = vPositionScreenSpace
vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew
- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor gt 00)
texture2D(material_uColorMap colorUV)rgb
colorSwatch
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: NormalX 12 bits, NormalY 12 bits
B: Depth 31 bits, Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL and could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits, ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits
A: Depth 15 bits, Metallic 1 bit
- RGBA Half-float, 64bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract the bool later with sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
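The sign-bit trick for depth and metallic can be illustrated on the CPU; a minimal JavaScript sketch (boolean metallic here instead of the deck's ±1.0 float, names hypothetical):

```javascript
// Pack metallic into the sign of view-space depth: decode recovers
// metallic with the sign and depth with abs(). Depth 0 is reserved
// for "nothing rendered" / infinity, so valid depths are > 0.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```

This is why the lighting pass can early-out on `depth <= 0.0` and still read metallic from the very same channel.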
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space
    // for culling in future passes: sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color is stored in a non-linear space to distribute precision perceptually

  // Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%
Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%
Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%
Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
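The endpoint behaviour is easy to check numerically: at vDotH = 1 the function returns the base reflection coefficient, and at grazing angles the luminance goes to 1 while chroma goes to 0 (a white highlight). A JavaScript sketch of the same arithmetic:

```javascript
// YC Schlick fresnel: luminance lerps toward 1 at grazing angles while
// chroma decays toward 0, mirroring the RGB version's behaviour.
function fresnelSchlickYC(vDotH, fYC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - fYC[0]) * power + fYC[0],  // luminance
    fYC[1] * -power + fYC[1],         // chroma
  ];
}
```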
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
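A JavaScript port of the same weighting scheme makes its behaviour easy to verify: neighbours whose luminance matches the centre dominate the reconstructed chroma, while dissimilar or black neighbours are effectively ignored (names and loop structure are illustrative):

```javascript
// Luminance-weighted chroma reconstruction. Each neighbour is
// [luminance, chroma]; weight falls off exponentially with luminance
// difference from the centre sample.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0;  // guard black samples
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```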
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats, http://webglstats.com, 2014
[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production, Naty Hoffman, Siggraph 2010, http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf
[Lagarde 11] Feeding a Physically-Based Shading Model, Sébastien Lagarde, 2011, https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/
[Burley 12] Physically-Based Shading at Disney, Brent Burley, 2012, http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf
[Karis 13] Real Shading in Unreal Engine 4, Brian Karis, 2013, http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
[Pranckevičius 09] Encoding Floats to RGBA - The Final, Aras Pranckevičius, 2009, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014, http://jcgt.org/published/0003/02/01/
[Mavridis 12] The Compact YCoCg Frame Buffer, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012, http://jcgt.org/published/0001/01/02/
[Waveren 07] Real-Time YCoCg-DXT Compression, J.M.P. van Waveren, Ignacio Castaño, 2007, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf
[Geldreich 04] Deferred Lighting and Shading, Rich Geldreich, Matt Pritchard, John Brooks, 2004, https://sites.google.com/site/richgel99/home
[Hoffman 09] Deferred Lighting Approaches, Naty Hoffman, 2009, http://www.realtimerendering.com/blog/deferred-lighting-approaches/
[Shishkovtsov 05] Deferred Shading in STALKER, Oles Shishkovtsov, 2005, http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt
[Mittring 09] A Bit More Deferred - CryEngine 3, Martin Mittring, 2009, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3
[Sousa 13] The Rendering Technologies of Crysis 3, Tiago Sousa, 2013, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3
[Pranckevičius 13] Physically Based Shading in Unity, Aras Pranckevičius, Game Developers Conference, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf
[Olsson 11] Tiled Shading, Ola Olsson, Ulf Assarsson, 2011, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading
[Billeter 12] Clustered Deferred and Forward Shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
[Yang 09] Amortized Supersampling, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf
[Herzog 10] Spatio-Temporal Upsampling on the GPU, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf
[Wronski 14] Temporal Supersampling and Antialiasing, Bart Wronski, 2014, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/
[Karis 14] High Quality Temporal Supersampling, Brian Karis, 2014, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx
[Walter 07] Microfacet Models for Refraction through Rough Surfaces, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
[Heitz 14] Understanding the Masking-Shadowing Function, Eric Heitz, 2014, http://jcgt.org/published/0003/02/03/paper.pdf
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, Christophe Schlick, 1994, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, Sébastien Lagarde, 2012, https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/
[Oren 94] Generalization of Lambert's Reflectance Model, Michael Oren, Shree K. Nayar, 1994, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf
G-Buffer Rasterization
Screen Space Velocity
- Compute per pixel screen space velocity for temporal reprojection
- In the vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In the fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:
G-Buffer Storage
Challenges: Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges: Storage
- Multiple render targets: not well supported
Challenges: Storage
- Reading from renderbuffer depth: getting better
Challenges: Storage
- Texture float support: quite good
Challenges: Storage
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
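Because JavaScript numbers are doubles (which, like highp floats, represent every integer up to 2^24 exactly), the same multiply / floor arithmetic can be validated on the CPU; a sketch mirroring the GLSL above (function names are illustrative):

```javascript
// Pack three 8-bit values into one 24-bit integer using only
// multiplies, floors, and adds -- no bitwise operators, exactly as
// the GLSL must do it.
function uint8x3ToUint24([x, y, z]) {
  return x * 65536 + y * 256 + z;  // shift left by multiply
}

function uint24ToUint8x3(packed) {
  const x = Math.floor(packed / 65536);    // shift right by divide
  const temp = Math.floor(packed / 256);
  return [x, temp - x * 256, packed - temp * 256];
}
```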
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- The single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:
varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;
vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;
- In fragment shader:
vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
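The reprojection math can be mirrored on the CPU for sanity checks. A minimal JavaScript sketch; the function name is ours, and inputs are clip-space positions after the two matrix multiplies above:

```javascript
// Screen-space velocity: difference of current and previous NDC positions.
// clipPos / clipPosOld are [x, y, z, w] vectors after multiplication by the
// current and previous model-view-projection matrices.
function screenSpaceVelocity(clipPos, clipPosOld) {
  return [
    clipPos[0] / clipPos[3] - clipPosOld[0] / clipPosOld[3],
    clipPos[1] / clipPos[3] - clipPosOld[1] / clipPosOld[3],
  ];
}
```

A static point yields zero velocity; a point that moved between frames yields the NDC-space delta.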
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
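The 2^24 limit can be checked directly on the CPU. A minimal JavaScript sketch, using Math.fround (which rounds a JS double to the nearest 32-bit float):

```javascript
// 32-bit float integer precision: exact up to 2^24, then the step size grows.
const LIMIT = Math.pow(2, 24); // 16777216

// 2^24 - 1 is still exactly representable as a 32-bit float:
const below = Math.fround(LIMIT - 1); // 16777215
// 2^24 + 1 is NOT representable; it rounds back down to 2^24:
const above = Math.fround(LIMIT + 1); // 16777216
```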
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND, OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed especially decode
Packing Example Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}
float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}
vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
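These routines are plain float arithmetic, so they can be mirrored host-side for quick verification. A sketch in JavaScript; the camelCase names are ours, not the shader's:

```javascript
// CPU-side mirror of the GLSL pack/unpack routines, e.g. for unit testing.
function normalizedFloatToUint8(raw) {
  // raw is a normalized float in [0, 1]
  return Math.floor(raw * 255.0);
}

function uint8ToNormalizedFloat(raw) {
  return raw / 255.0;
}

function uint888ToUint24(r, g, b) {
  // Shift left by multiplying: (r << 16) | (g << 8) | b, in float arithmetic.
  return r * 256.0 * 256.0 + g * 256.0 + b;
}

function uint24ToUint888(raw) {
  // Shift right by dividing, then peel off each byte.
  const x = Math.floor(raw / (256.0 * 256.0));
  const temp = Math.floor(raw / 256.0);
  const y = temp - x * 256.0;
  const z = raw - temp * 256.0;
  return [x, y, z];
}
```

For example, packing (0, 127, 255) gives 32767, and unpacking 32767 recovers (0, 127, 255).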
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
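For reference, a host-side sketch of octahedral encode / decode in the spirit of [Cigolle 14]; the function names and array-based vectors are our assumptions, not Floored's code:

```javascript
// Octahedral normal encoding: project a unit vector onto the octahedron
// (L1 normalization), fold the lower hemisphere, map to [0, 1]^2.
function octEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let u = n[0] * invL1;
  let v = n[1] * invL1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere over the diagonals
    const uo = u, vo = v;
    u = (1.0 - Math.abs(vo)) * (uo >= 0.0 ? 1.0 : -1.0);
    v = (1.0 - Math.abs(uo)) * (vo >= 0.0 ? 1.0 : -1.0);
  }
  // Map -1..1 to 0..1 to use the full unsigned domain
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode(e) {
  let u = e[0] * 2.0 - 1.0;
  let v = e[1] * 2.0 - 1.0;
  let z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    // Unfold the lower hemisphere
    const uo = u, vo = v;
    u = (1.0 - Math.abs(vo)) * (uo >= 0.0 ? 1.0 : -1.0);
    v = (1.0 - Math.abs(uo)) * (vo >= 0.0 ? 1.0 : -1.0);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```

Round-tripping a unit normal through encode / decode recovers it up to quantization, which is what makes the 12-to-14-bit quantized storage below workable.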
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
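A host-side sketch of the transform pair and the checkerboard selection; the helper names and pixel-parity checkerboard are our assumptions ([Mavridis 12] describes the technique):

```javascript
// RGB <-> YCoCg transform used for chroma subsampling.
// Y lands in [0, 1]; Co and Cg land in [-0.5, 0.5].
function rgbToYcocg(r, g, b) {
  return [
     r * 0.25 + g * 0.5 + b * 0.25, // Y
     r * 0.5            - b * 0.5,  // Co
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg
  ];
}

function ycocgToRgb(y, co, cg) {
  // Exact inverse of the matrix above
  return [y + co - cg, y + cg, y - co - cg];
}

// Checkerboard selector: even-parity pixels store Co, odd-parity store Cg.
function checkerboardInterlace(co, cg, pixelX, pixelY) {
  return (pixelX + pixelY) % 2 === 0 ? co : cg;
}
```

The transform is lossless in float precision; the savings come from each pixel writing only one of the two chroma components.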
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float 128bpp
- Sign bits of R, G and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
- RGB Half-float 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;
  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
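The sign() trick is easy to mirror host-side. A sketch, assuming depth is strictly positive and metallic is a boolean (names are ours):

```javascript
// Pack a boolean metallic flag into the sign of a positive view-space depth.
function packDepthMetallic(depth, metallic) {
  // metallic true -> +depth, false -> -depth
  return depth * (metallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(packed) {
  // abs() recovers depth; the sign recovers the flag
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```

This is why the decode path below can use abs() for depth and sign() for metallic without any extra channel.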
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
    gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored in sRGB->YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance
RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Enhance
RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD
and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
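A host-side mirror of this weighting logic is handy for testing. A JavaScript sketch; we flatten the four neighbor arguments into an array, and the names are ours:

```javascript
// Luminance-similarity weighted chroma reconstruction.
// center and each neighbor are [luminance, chroma] pairs.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let weightedChroma = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Weight falls off exponentially with luminance difference
    let weight = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where a sample is black
    if (luma < 1e-5) weight = 0.0;
    weightedChroma += chroma * weight;
    totalWeight += weight;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5
    ? [center[1], weightedChroma / totalWeight]
    : [0.0, 0.0];
}
```

With neighbors of equal luminance the result is the mean neighbor chroma; with all-black neighbors the guard returns zero.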
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering:
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van
Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Read Material Data
- Rely on dynamic branching for swatch vs texture sampling
vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;
Encode
gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;
- Our data is ready. Now we just need to write it out
- …and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading depth from the renderbuffer: support getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- Range: 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- Range: 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
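The shift-by-multiply arithmetic can also be sanity-checked outside the shader. A minimal JavaScript mirror of the pack / unpack pair (helper names here are illustrative, not from the production code; JS doubles hold integers exactly well past 2^24, so the math carries over):

```javascript
// Pack three 8-bit integers into one float-representable uint24, and back.
// Shifts become multiplies / divides, exactly as in the GLSL above.
function uint8x3ToUint24(x, y, z) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return x * SHIFT_LEFT_16 + (y * SHIFT_LEFT_8 + z);
}

function uint24ToUint8x3(raw) {
  const SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const SHIFT_RIGHT_8 = 1.0 / 256.0;
  const SHIFT_LEFT_8 = 256.0;
  const x = Math.floor(raw * SHIFT_RIGHT_16);   // high byte
  const temp = Math.floor(raw * SHIFT_RIGHT_8); // high two bytes
  const y = -x * SHIFT_LEFT_8 + temp;           // middle byte
  const z = -temp * SHIFT_LEFT_8 + raw;         // low byte
  return [x, y, z];
}
```

`uint24ToUint8x3(uint8x3ToUint24(12, 34, 56))` returns `[12, 34, 56]`.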
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
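The same check can be approximated on the CPU: `Math.fround` rounds every intermediate to 32-bit float, emulating highp shader arithmetic without a GL context. A sketch with hypothetical names, striding through the uint24 domain rather than covering all 16.7M values:

```javascript
// CPU analog of the GPU packing test. Each intermediate is rounded to
// float32 via Math.fround to emulate highp GLSL arithmetic.
const f = Math.fround;

function packUint24(x, y, z) {
  return f(f(x * f(256.0 * 256.0)) + f(f(y * 256.0) + z));
}

function unpackUint24(raw) {
  const x = Math.floor(f(raw / f(256.0 * 256.0)));
  const temp = Math.floor(f(raw / 256.0));
  const y = f(f(-x * 256.0) + temp);
  const z = f(f(-temp * 256.0) + raw);
  return [x, y, z];
}

let failures = 0;
// Strided sweep of the 0..2^24-1 domain.
for (let id = 0; id < 16777216; id += 997) {
  const x = (id >> 16) & 255, y = (id >> 8) & 255, z = id & 255;
  const [ux, uy, uz] = unpackUint24(packUint24(x, y, z));
  if (ux !== x || uy !== y || uz !== z) failures++;
}
```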
Unit Testing
- A single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
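For reference, the octahedral mapping is only a handful of operations. A JavaScript sketch of the encode / decode pair following [Cigolle 14] (names are illustrative; the production version lives in GLSL):

```javascript
// Octahedral normal encoding: project the unit sphere onto an octahedron,
// unfold it into a square, and remap to [0, 1]^2.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) { // fold the lower hemisphere across the diagonals
    const [ou, ov] = [u, v];
    u = (1.0 - Math.abs(ov)) * signNotZero(ou);
    v = (1.0 - Math.abs(ou)) * signNotZero(ov);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // -1..1 -> 0..1
}

function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) { // unfold
    const [ou, ov] = [u, v];
    u = (1.0 - Math.abs(ov)) * signNotZero(ou);
    v = (1.0 - Math.abs(ou)) * signNotZero(ov);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len]; // renormalize
}
```

The round trip is exact up to floating point error, for both hemispheres.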
Emission
- Don't pack emission. Forward render it
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- The human perceptual system is sensitive to luminance shifts
- The human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher-quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
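The RGB to YCoCg transform itself is cheap and exactly invertible. A JavaScript sketch (the shader equivalents would be `rgbToYcocg` / `YcocgToRgb` style helpers):

```javascript
// RGB <-> YCoCg: Y is luminance-like; Co / Cg are the two chroma axes,
// each in the -0.5 to 0.5 range for inputs in 0 to 1.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

`ycocgToRgb(rgbToYcocg(c))` reproduces `c` exactly (up to floating point error), which is why only the subsampled chroma, not the luminance, loses information.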
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit
- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- e.g. material type
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: NormalX 12 bits, NormalY 12 bits
B: Depth 31 bits, Metallic 1 bit
- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL, and could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits, ColorC 5 bits (sign bit)
G: NormalX 9 bits (sign bit), Gloss 3 bits
B: NormalY 9 bits (sign bit), Gloss 3 bits
A: Depth 15 bits, Metallic 1 bit
- RGBA Half-float, 64 bpp
- A half-float target is more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit
G: NormalX 9 bits (sign bit), Gloss 3 bits
B: NormalY 9 bits (sign bit), Gloss 3 bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit
- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
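The sign trick decodes trivially. A JavaScript sketch of the depth / metallic share (illustrative names; metallic here is a bool mapped to ±1.0, matching the encode above):

```javascript
// Metallic rides in the sign of the depth channel; depth is the magnitude.
function packDepthMetallic(depth, isMetallic) {
  return depth * (isMetallic ? 1.0 : -1.0); // negate depth when not metallic
}

function unpackDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: w > 0.0 };
}
```

A zero value still reads as depth 0.0, which the decode pass treats as "sampling infinity" and early-outs on.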
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
  gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct light only
- No anti-aliasing
- No temporal techniques
- G-Buffer color component: YCoCg checkerboard interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look…
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 of chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD
and an ADD from the skipped 3rd component
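The inverted chroma form falls straight out of linearity: luma weights sum to 1, so Schlick keeps its usual shape, while chroma weights sum to 0, collapsing it to C0 * (1 - power). A quick JavaScript check (illustrative helpers, using the YCoCg luma and Co rows):

```javascript
// Per-channel RGB Schlick Fresnel.
function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * power + c);
}

// YC Schlick: luma keeps the usual form, chroma inverts to c0 * (1 - power).
function fresnelSchlickYC(vDotH, [y0, c0]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * power + y0, c0 * -power + c0];
}

// Luma and Co rows of the RGB -> YCoCg transform.
const luma = ([r, g, b]) => 0.25 * r + 0.5 * g + 0.25 * b;
const co = ([r, g, b]) => 0.5 * r - 0.5 * b;
```

Evaluating Schlick per RGB channel and then projecting to Y / Co gives the same result as evaluating the YC form directly on the projected reflection coefficient.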
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
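For clarity, the same reconstruction in JavaScript (an illustrative port; neighbors are the [luma, chroma] pairs from the cross pattern):

```javascript
// Weight each neighbor's chroma by luminance similarity to the center
// sample; black samples and an all-zero weight sum are guarded, as in
// the GLSL version.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0; // falloff of the luma-similarity weighting
  let totalWeight = 0.0;
  let weightedChroma = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    w *= luma >= 1e-5 ? 1.0 : 0.0; // guard: black sample
    totalWeight += w;
    weightedChroma += chroma * w;
  }
  // Guard: all weights zero.
  return totalWeight > 1e-5
    ? [center[1], weightedChroma / totalWeight]
    : [0.0, 0.0];
}
```

With neighbors of equal luminance the reconstruction degenerates to a plain average of their chroma, which is the expected behavior inside flat regions.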
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
pastasfuture
Resources
[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
httpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER Clear Sky - a Showcase for Direct3D 10.0/10.1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Encode
gBufferComponents buffer
buffermetallic = metallic
buffercolor = color
buffergloss = gloss
buffernormal = normalCameraSpace
bufferdepth = depthViewSpace
buffervelocity = velocity
- our data is ready Now we just need to write it out
- and after skipping some tangential details
G-Buffer Storage
Challenges Storage
- In vanilla webGL largest pixel storage we can write to is a single RGBA
unsigned byte texture This isnrsquot going to cut it
- What extensions can we pull in
- Poll webglstatscom for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
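Since the deck describes the octahedral mapping only in bullets, here is a minimal JavaScript sketch of the encode / decode pair. Function names are illustrative; the production versions are the GLSL `octohedronEncode` / `octohedronDecode` helpers referenced later in the packing code.

```javascript
// Octahedral normal encoding [Cigolle 14], sketched in JavaScript.
// Encode: project the unit normal onto the octahedron |x|+|y|+|z| = 1,
// fold the lower hemisphere over the diagonals, then remap [-1, 1] -> [0, 1].
function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) {
    // Fold the lower hemisphere across the diagonals.
    const fu = (1.0 - Math.abs(v)) * (u >= 0.0 ? 1.0 : -1.0);
    const fv = (1.0 - Math.abs(u)) * (v >= 0.0 ? 1.0 : -1.0);
    u = fu; v = fv;
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // uses the full 0..1 domain
}

// Decode: invert the remap and the fold, then renormalize.
function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  let z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    const fu = (1.0 - Math.abs(v)) * (u >= 0.0 ? 1.0 : -1.0);
    const fv = (1.0 - Math.abs(u)) * (v >= 0.0 ? 1.0 : -1.0);
    u = fu; v = fv;
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```

Both branches are cheap arithmetic, which is why the encode / decode cost stays low even at G-Buffer read frequency.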
Emission
- Don't pack emission. Forward render it.
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
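The RGB to YCoCg transform itself is tiny; a JavaScript sketch for reference (helper names are illustrative, mirroring the GLSL `rgbToYcocg` / `YcocgToRgb` used later in the packing code). Y is luma in [0, 1]; Co / Cg are chroma in [-0.5, 0.5], which is why the packed chroma channel needs a half-range bias before quantization.

```javascript
// Forward transform: luma + two chroma axes.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y  (luma)
     0.5  * r            - 0.5 * b, // Co (orange-blue chroma)
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg (green-purple chroma)
  ];
}

// Inverse transform: exact, just adds and subtracts.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```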
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit
- RGBA Float: 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: NormalX 12 bits, NormalY 12 bits
B: Depth 31 bits, Metallic 1 bit
- RGB Float: 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 bits, ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits
A: Depth 15 bits, Metallic 1 bit
- RGBA Half-float: 64 bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits
- RGB Half-float: 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit
- RGBA Float: 128 bpp
- Let's take a look at the packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
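The depth / metallic sign trick is easy to mirror on the CPU; a minimal JavaScript sketch (function names are illustrative, not from the Floored codebase):

```javascript
// Store metallic as the sign of depth: +depth = metallic, -depth = dielectric.
// Decode is then just abs() for depth and a sign test for the flag,
// which keeps the depth-only fast path (AO, ray marching) a single abs().
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: w > 0.0 };
}
```

Depth <= 0.0 stays reserved for "nothing rendered here" (infinity), matching the early-out in the decode shader.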
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct the missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on the subsampled checkerboard layout
  // Color is stored in a non-linear space to distribute precision perceptually
  // (sRGB -> YCoCg). Return it as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance! (4 detail-crop comparison slides)
Panels: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats. http://webglstats.com, 2014
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time YCoCg-DXT Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sebastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer Storage
Challenges: Storage
- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support
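A minimal sketch of what that extension probing can look like at startup. The extension strings are the standard WebGL 1 identifiers; the function and field names are illustrative, and `gl` is assumed to be a `WebGLRenderingContext` (e.g. from `canvas.getContext("webgl")`):

```javascript
// Probe the extensions the G-Buffer strategy depends on.
// getExtension() returns null when unsupported, so coerce to booleans.
function queryGBufferSupport(gl) {
  return {
    textureFloat: gl.getExtension("OES_texture_float") !== null,
    textureHalfFloat: gl.getExtension("OES_texture_half_float") !== null,
    depthTexture: gl.getExtension("WEBGL_depth_texture") !== null,
    drawBuffers: gl.getExtension("WEBGL_draw_buffers") !== null, // MRT
  };
}
```

The result can then select between the packed single-target formats discussed later, rather than assuming MRT is available.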
Challenges: Storage
- Multiple render targets: not well supported
Challenges: Storage
- Reading from render buffer depth: getting better
Challenges: Storage
- Texture float support: quite good
Challenges: Storage
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
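The 2^24 limit is easy to demonstrate from JavaScript, where `Math.fround` emulates a round trip through 32-bit float storage (the constant name is illustrative):

```javascript
// Every integer up to 2^24 survives storage in a 32-bit float exactly;
// above that, the mantissa runs out of bits and odd integers collapse
// onto even neighbours.
const MAX_EXACT_UINT_IN_F32 = Math.pow(2, 24); // 16777216

function survivesF32(n) {
  // Math.fround rounds a JS double to the nearest 32-bit float value.
  return Math.fround(n) === n;
}
```

This is exactly why the exhaustive GPU unit test later uses a 4096 x 4096 target: 4096 * 4096 = 2^24 pixels covers the whole representable domain.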
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
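The same shift-by-multiply arithmetic can be mirrored in JavaScript for CPU-side sanity checks before writing the GPU unit tests that follow (function names are illustrative):

```javascript
// Pack three 8-bit integers into one 24-bit integer using only
// floating point arithmetic, mirroring the GLSL above. All intermediate
// values stay below 2^24, so 32-bit floats and doubles agree exactly.
function uint888ToUint24([r, g, b]) {
  return r * 65536 + g * 256 + b; // shift left via multiplies
}

function uint24ToUint888(packed) {
  const x = Math.floor(packed / 65536);  // shift right via divisions
  const temp = Math.floor(packed / 256);
  return [x, temp - x * 256, packed - temp * 256];
}
```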
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later in the pipeline?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% YC Lighting 25%
RGB Lighting 25%
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
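For illustration, a minimal JavaScript sketch of a checkerboard selector along the lines of the getCheckerboard() helper used by the decode shader; the signature and the frameParity parameter are hypothetical reconstructions, not Floored's source. Flipping frameParity each frame is one way to alternate the pattern temporally, as the bullet above suggests.

```javascript
// Hypothetical checkerboard selector: returns 0 or 1 in an alternating
// 1x1-pixel pattern; frameParity flips the whole pattern per frame.
function getCheckerboard(u, v, width, height, frameParity = 0) {
  const px = Math.floor(u * width);  // uv -> integer pixel coordinate
  const py = Math.floor(v * height);
  return (px + py + frameParity) % 2;
}
```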
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we now operate on a vec2, saving a MADD
and an ADD from the skipped 3rd component
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
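As a quick sanity check on that substitution, the two power terms can be compared numerically from JavaScript (function names here are ours, for illustration):

```javascript
// Schlick's pow-based power term vs the spherical gaussian exp2 form
function schlickPower(vDotH) {
  return Math.pow(1.0 - vDotH, 5.0);
}

function schlickPowerSG(vDotH) {
  // same constants as the GLSL above
  return Math.pow(2.0, (-5.55473 * vDotH - 6.98316) * vDotH);
}

// print the absolute difference across the domain; it stays small
for (const v of [0.0, 0.2, 0.5, 0.8, 1.0]) {
  console.log(v, Math.abs(schlickPower(v) - schlickPowerSG(v)).toFixed(4));
}
```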
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
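A JavaScript mirror of reconstructChromaHDR can be handy for checking the weighting logic off-GPU. This is a sketch under our own conventions: each sample is a [luminance, chroma] pair, and the sample list replaces the four explicit GLSL arguments.

```javascript
// CPU-side mirror of the luminance-weighted chroma reconstruction.
// center is [luma, chroma]; samples is an array of [luma, chroma] pairs.
function reconstructChromaHDR(center, samples) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of samples) {
    // weight falls off exponentially as luminance diverges from the center
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard black samples
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // guard the all-zero-weight case; returns [known chroma, reconstructed chroma]
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```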
Thanks for listening
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
[Pranckevičius 09] Encoding Floats to RGBA - The final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence,
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P.
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
Challenges Storage
- In vanilla webGL the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support
Challenges Storage
- Multiple render targets not well supported
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
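The 2^24 limit is easy to verify from JavaScript, since Math.fround rounds a number to float32 precision (the helper name below is ours):

```javascript
// A number survives the round trip through float32 only if it is
// exactly representable in a 24-bit mantissa.
function isExactInFloat32(n) {
  return Math.fround(n) === n;
}

console.log(isExactInFloat32(16777215)); // 2^24 - 1: true
console.log(isExactInFloat32(16777216)); // 2^24: true
console.log(isExactInFloat32(16777217)); // 2^24 + 1: false, rounds away
```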
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND, OR operator simulation through multiplies, mods and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
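The same shift-by-multiply arithmetic can be mirrored in JavaScript to generate CPU-side reference values (function names here are ours, not the shader's):

```javascript
// Pack three 8-bit integers into one number below 2^24
function uint888ToUint24(r, g, b) {
  return r * 65536.0 + g * 256.0 + b; // shift left via multiply
}

// Recover the three bytes: shift right via divide + floor
function uint24ToUint888(packed) {
  const r = Math.floor(packed / 65536.0); // shift right 16
  const temp = Math.floor(packed / 256.0); // shift right 8
  const g = temp - r * 256.0;
  const b = packed - temp * 256.0;
  return [r, g, b];
}
```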
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to and read from textures in between the pack and unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
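A sketch of octahedral encode and decode in plain JavaScript, following [Cigolle 14]; function names are illustrative, and no quantization is applied here:

```javascript
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

// Unit normal -> 2D octahedral coordinates in the full 0..1 domain
function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let px = x * invL1, py = y * invL1;
  if (z < 0.0) { // fold the lower hemisphere over the diagonals
    const tx = (1.0 - Math.abs(py)) * signNotZero(px);
    const ty = (1.0 - Math.abs(px)) * signNotZero(py);
    px = tx; py = ty;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5];
}

// 2D octahedral coordinates -> unit normal
function octDecode([u, v]) {
  let x = u * 2.0 - 1.0, y = v * 2.0 - 1.0;
  let z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) { // unfold the lower hemisphere
    const tx = (1.0 - Math.abs(y)) * signNotZero(x);
    const ty = (1.0 - Math.abs(x)) * signNotZero(y);
    x = tx; y = ty;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```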
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
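For reference, the RGB to YCoCg transform pair is cheap enough to sketch inline; this assumes the usual convention with Y in 0..1 and Co/Cg in -0.5..0.5 for RGB inputs in 0..1:

```javascript
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y: luminance
     0.5  * r            - 0.5 * b, // Co: orange-blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg: green-purple chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  // exact inverse of the matrix above
  return [y + co - cg, y + cg, y - co - cg];
}
```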
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float 128bpp
- Sign bits of R, G and B are available for use as flags
- ie Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving. RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where a mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
vec4 res;
// Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
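The velocity quantization round trip can be sketched in JavaScript. This is our own self-consistent reconstruction, not Floored's source: SUB_PIXEL_PRECISION_STEPS = 4 is an assumed value, and the dequantize helper is ours.

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed sub-pixel subdivisions

// NDC -1..1 velocity -> biased integer in 0..1023 (10 bits)
function quantizeVelocity(vNdc, resolution) {
  let q = vNdc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0)); // clamp to signed 10-bit range
  return q + 512.0; // bias into 0..1023 for packing
}

// Inverse of the above; out-of-range values stand in for "infinity"
function dequantizeVelocity(q, resolution) {
  return (q - 512.0) / (resolution * SUB_PIXEL_PRECISION_STEPS * 0.5);
}
```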
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
res.w = components.depth * components.metallic;
return res;
- Phew, we're done
- Depth is the cheapest to encode decode
- Can write a fast depth decode function for ray marching screen space
sampling shaders such as AO
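The sign() trick above amounts to the following, sketched in JavaScript with metallic as a boolean (helper names are ours):

```javascript
// Keep depth in the magnitude; the metallic flag rides in the sign bit.
// Non-metallic surfaces store a negated depth.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(w) {
  // mirrors res.depth = abs(w); res.metallic = sign(w) in the decode shader
  return { depth: Math.abs(w), metallic: w > 0.0 };
}
```

Note that a depth of exactly 0 is reserved: the decoder treats it as "sampling infinity" and early-outs.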
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution)
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);
// Early out if sampling infinity
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
Challenges: Storage
- Multiple render targets not well supported
- Reading from render buffer depth: getting better
- Texture float support: quite good
- Texture half float support: getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
  - Step size increases at integers > 2^24
  - 0 to 16,777,215
- 16-bit half float can represent every integer up to 2^11 precisely
  - Step size increases at integers > 2^11
  - 0 to 2,048
- Example: pack 3 8-bit integer values into a 32-bit float
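These precision cliffs are easy to verify on the CPU. A quick numpy sketch (Python here rather than GLSL, simply so it runs outside the browser) checks both the 32-bit and 16-bit limits:

```python
import numpy as np

# float32 has a 24-bit significand (23 stored bits + implicit leading 1),
# so every integer in [0, 2^24] is represented exactly.
MAX_EXACT_F32 = 2.0**24  # 16777216

# 16777215 and 16777216 are both exact...
assert np.float32(16777215) == 16777215
assert np.float32(16777216) == 16777216
# ...but the step size doubles past 2^24: 16777217 is unrepresentable
# and rounds back down to 16777216.
assert np.float32(16777217) == np.float32(16777216)

# Same story for half floats with their 11-bit significand: exact to 2^11.
assert np.float16(2047) == 2047
assert np.float16(2049) == np.float16(2048)
```

This is exactly why the uint24 packing below tops out at 16,777,215: one more bit and the packed value would start landing between representable floats.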
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
  - Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
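The shift-by-multiply arithmetic ports directly to the CPU, which is handy for checking the packing logic outside a shader. A Python mirror of the two GLSL functions above (names snake_cased; a sketch for testing, not Floored's production code):

```python
import math

def normalized_float_to_uint8(raw):
    # Mirrors GLSL normalizedFloat_to_uint8: map 0.0-1.0 to an integer 0-255.
    return math.floor(raw * 255.0)

def uint8_8_8_to_uint24(r, g, b):
    # "Shift left" with multiplies: equivalent to (r << 16) | (g << 8) | b.
    return r * 65536.0 + g * 256.0 + b

def uint24_to_uint8_8_8(raw):
    # "Shift right" with divides and floor; subtract to isolate each byte.
    x = math.floor(raw / 65536.0)
    temp = math.floor(raw / 256.0)
    y = temp - x * 256.0
    z = raw - temp * 256.0
    return x, y, z

# Round trip the edges of the domain and a few interior values.
for packed in (0.0, 255.0, 256.0, 65535.0, 65536.0, 123456.0, 16777215.0):
    x, y, z = uint24_to_uint8_8_8(packed)
    assert 0 <= x <= 255 and 0 <= y <= 255 and 0 <= z <= 255
    assert uint8_8_8_to_uint24(x, y, z) == packed
```

The decode deliberately avoids mod(): the subtract-after-floor form matches the GLSL above and keeps the instruction count down.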
Unit Testing
- Important to unit test packing functions
  - Easy to miss collisions
  - Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
  - WebGL has no support for readPixels on floating point textures
  - Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
  - Assign each pixel a unique integer ID
  - Pack the ID
  - Unpack the ID
  - Compare the unpacked ID to the pixel ID
  - Write a success / fail color
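The same exhaustive test is easy to prototype on the CPU before writing the shader version. A vectorized numpy sketch that sweeps the uint24 domain through float32 arithmetic (strided to keep it fast; the GPU version covers every value), mirroring the pack / unpack / compare of the 4096 x 4096 test:

```python
import numpy as np

# One float32 "pixel ID" per tested value, like the 4k x 4k render target.
# Build IDs as exact integers first, then convert (all values < 2^24 convert exactly).
ids = np.arange(0, 2**24, 97, dtype=np.int64).astype(np.float32)
ids = np.concatenate([ids, np.array([2**24 - 1], dtype=np.float32)])

# Unpack: shift right with divides, isolate bytes with subtracts.
x = np.floor(ids / 65536.0)
t = np.floor(ids / 256.0)
y = t - x * 256.0
z = ids - t * 256.0

# Repack: shift left with multiplies.
repacked = x * 65536.0 + y * 256.0 + z

# Compare unpacked-then-repacked IDs against the originals.
failures = int(np.count_nonzero(repacked != ids))
assert failures == 0
```

This catches collisions and precision bugs in the arithmetic itself; the GPU passes below additionally cover what the texture write / read does to the values.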
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
  - Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
  - Pass 1: Pack data, render to texture
  - Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
  - Normal
  - Emission
  - Color
  - Gloss
  - Metallic
  - Depth
  - Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
  - Transform the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
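For reference, the octahedral mapping is compact enough to sketch in a few lines. A Python illustration of the [Cigolle 14] encode / decode (not Floored's shader code), including the uint14 quantization the G-Buffer format below applies:

```python
import numpy as np

def oct_encode(n):
    # Project the unit sphere onto the octahedron |x|+|y|+|z| = 1,
    # fold the lower hemisphere into the upper, remap [-1,1] -> [0,1].
    n = np.asarray(n, dtype=np.float64)
    n = n / np.abs(n).sum()
    xy = n[:2]
    if n[2] < 0.0:
        xy = (1.0 - np.abs(xy[::-1])) * np.where(xy >= 0.0, 1.0, -1.0)
    return xy * 0.5 + 0.5

def oct_decode(e):
    e = np.asarray(e, dtype=np.float64) * 2.0 - 1.0
    z = 1.0 - np.abs(e).sum()
    xy = e
    if z < 0.0:
        xy = (1.0 - np.abs(e[::-1])) * np.where(e >= 0.0, 1.0, -1.0)
    v = np.array([xy[0], xy[1], z])
    return v / np.linalg.norm(v)

def quantize(e, bits=14):
    # Snap to the uint14 grid used by the 128bpp G-Buffer format.
    scale = float(2**bits - 1)
    return np.round(np.asarray(e) * scale) / scale

# Round trip a lower-hemisphere normal through encode / quantize / decode.
n = np.array([0.3, -0.5, -0.8])
n /= np.linalg.norm(n)
assert np.dot(oct_decode(quantize(oct_encode(n))), n) > 0.9999
```

At 14 bits per component the angular error is far below anything visible in shading, which is why the format can afford to spend its spare bits on velocity instead.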
Emission
- Don't pack emission: forward render it
  - Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
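The RGB to YCoCg transform used by [Mavridis 12] is linear and exactly invertible, which is what makes the checkerboard trick cheap. A Python sketch of the transform plus the interlacing idea (the `checkerboard_interlace` helper is an illustration of the layout, not Floored's GLSL):

```python
import numpy as np

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance
    co =  0.5  * r            - 0.5  * b  # orange chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green chroma
    return np.array([y, co, cg])

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return np.array([y + co - cg, y + cg, y - co - cg])

# Round trip is exact up to float error: the transform is linear and invertible.
rgb = np.array([0.7, 0.2, 0.5])
assert np.allclose(ycocg_to_rgb(rgb_to_ycocg(rgb)), rgb)

def checkerboard_interlace(ycocg, px, py):
    # Even pixels store (Y, Co), odd pixels store (Y, Cg); the missing
    # chroma component is reconstructed from neighbors at decode time.
    return ycocg[1] if (px + py) % 2 == 0 else ycocg[2]
```

Note the chroma components live in [-0.5, 0.5] for inputs in [0, 1], which is why the packing code below adds a CHROMA_BIAS before quantizing to a uint8.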
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128bpp
- Sign bits of R, G, and B are available for use as flags
  - i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float: 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL; could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float: 64bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float: 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128bpp
- Let's take a look at the packing code for this format
Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
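The depth-sign trick is worth a tiny CPU sketch: metallic is stored as the sign of depth and recovered with abs() / sign(), costing zero extra bits. A Python illustration (assumes depth is strictly positive; depth <= 0.0 is reserved for "infinity" by the decode's early-out, so the metallic bit is never lost):

```python
import math

def pack_depth_metallic(depth, metallic):
    # depth > 0.0; metallic is +1.0 (metal) or -1.0 (dielectric).
    return depth * metallic

def unpack_depth(packed):
    # Cheapest field in the whole G-Buffer to decode: a single abs().
    return abs(packed)

def unpack_metallic(packed):
    # GLSL sign() analogue for nonzero input.
    return math.copysign(1.0, packed)

p = pack_depth_metallic(12.5, -1.0)
assert unpack_depth(p) == 12.5
assert unpack_metallic(p) == -1.0
```

This is why the slide calls depth the cheapest decode: AO and other ray marching passes can fetch the texel and take abs(w) without touching the integer unpacking at all.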
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light

Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float render target
  - Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
- Many resources
  - [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
  - Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 reflectance
  - Keep fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice
YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look.
Enhance!
RGB Lighting 100% | YC Lighting 100%
YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame
Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation is the same
  - Chroma calculation is inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
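Because YCoCg is a linear transform and Schlick's curve interpolates toward white, which is (1, 0, 0) in YCoCg, the YC form above is exactly the RGB Schlick curve expressed in the new basis, not an additional approximation. A Python check of that equivalence (the gold-ish F0 value is just an example input):

```python
import numpy as np

# RGB -> YCoCg as a matrix, so linearity is explicit.
M = np.array([[ 0.25, 0.5,  0.25],
              [ 0.5,  0.0, -0.5 ],
              [-0.25, 0.5, -0.25]])

def fresnel_rgb(v_dot_h, f0_rgb):
    p = (1.0 - v_dot_h) ** 5.0
    return (1.0 - f0_rgb) * p + f0_rgb

def fresnel_ycocg(v_dot_h, f0_ycocg):
    p = (1.0 - v_dot_h) ** 5.0
    y0, co0, cg0 = f0_ycocg
    # Luminance keeps Schlick's form; chroma decays toward zero at grazing,
    # because the interpolation target (white) has zero chroma.
    return np.array([(1.0 - y0) * p + y0, co0 * -p + co0, cg0 * -p + cg0])

f0 = np.array([1.00, 0.71, 0.29])  # gold-ish specular color, example only
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    lhs = M @ fresnel_rgb(v_dot_h, f0)      # Schlick in RGB, then transform
    rhs = fresnel_ycocg(v_dot_h, M @ f0)    # transform F0, then Schlick in YC
    assert np.allclose(lhs, rhs)
```

The same argument covers the spherical gaussian variant below: only the `power` term changes, and linearity does the rest.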
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth / stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
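A CPU port of the function above makes its weighting behavior easy to poke at. A numpy sketch (step() becomes a boolean mask; each argument is a (luminance, chroma) pair, with the neighbors carrying the chroma basis the center pixel is missing):

```python
import numpy as np

def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    luminance = np.array([a1[0], a2[0], a3[0], a4[0]])
    chroma = np.array([a1[1], a2[1], a3[1], a4[1]])
    luma_delta = np.abs(luminance - center[0])
    # Neighbors whose luminance matches the center dominate the estimate.
    weight = np.exp2(-sensitivity * luma_delta)
    # Guard the case where a sample is black (step() in the GLSL).
    weight *= (luminance >= 1e-5)
    total = weight.sum()
    # Guard the case where all weights are zero.
    if total <= 1e-5:
        return np.array([0.0, 0.0])
    return np.array([center[1], np.dot(chroma, weight) / total])

# Two luminance-matched neighbors (chroma 0.1) win over a bright outlier
# and a black sample, so the reconstructed chroma lands near 0.1.
out = reconstruct_chroma_hdr((0.5, 0.2), (0.5, 0.1), (0.5, 0.1), (5.0, 0.9), (0.0, 0.9))
assert abs(out[1] - 0.1) < 0.01
```

The exponential falloff is what keeps the filter stable on HDR radiance: a neighbor one luminance unit away contributes 2^-25 of a matched neighbor's weight, so chroma never bleeds across bright light boundaries.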
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul, our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions?
nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
Challenges Storage
- Reading from render buffer depth getting better
Challenges Storage
- Texture float support quite good
Challenges Storage
- Texture half float support getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64 bpp
- Half-float target is more challenging
- Probably not practical; depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where a mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together
  // If not metallic, negate depth; extract the bool later as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
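The filtering pitfall is easy to demonstrate outside the shader. A JS sketch using the same 8-8-8 scheme (helper names are ours): averaging two packed values, as a bilinear sampler would, corrupts the middle byte.

```javascript
// Why packed pixels cannot be hardware filtered: the sampler averages the
// packed float, which is not the same as packing the averaged channels.
function pack([x, y, z]) { return x * 65536.0 + y * 256.0 + z; }
function unpack(raw) {
  const x = Math.floor(raw / 65536.0);
  const t = Math.floor(raw / 256.0);
  return [x, t - x * 256.0, raw - t * 256.0];
}

const a = [255, 0, 0]; // two neighboring texels
const b = [0, 0, 255];
const filtered = unpack((pack(a) + pack(b)) / 2.0); // what bilinear would produce
// Desired: [127.5, 0, 127.5]. Actual: the middle byte picks up garbage
// carried across from the blend of the high byte.
```

The same carry-across-fields problem is why alpha blending of packed decals breaks, and why MSAA resolves (which average samples) are off the table.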
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored in sRGB -> YCoCg; returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model; we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance (four detail-crop comparison slides, each showing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted; approaches zero at perpendicular (grazing) incidence
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
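The correctness of the YC form can be checked numerically: Y's basis weights sum to 1 and each chroma's weights sum to 0, so evaluating Schlick per YCoCg channel matches transforming the RGB result. A JS sketch (rgbToYcocg is the standard transform; the naming is ours):

```javascript
// Check: Schlick's approximation evaluated in YCoCg equals the RGB result
// transformed, because the basis change is linear.
function rgbToYcocg([r, g, b]) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,  // Y (weights sum to 1)
    r * 0.5 - b * 0.5,              // Co (weights sum to 0)
    -r * 0.25 + g * 0.5 - b * 0.25  // Cg (weights sum to 0)
  ];
}

function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

function fresnelSchlickYC(vDotH, [y0, c0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - y0) * p + y0, // luminance: same form as the RGB version
    -c0 * p + c0         // chroma: decays toward zero as p -> 1 (grazing)
  ];
}
```

At vDotH = 0, power is 1, so luminance goes to 1 and chroma to 0: grazing reflection is white, as expected.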
YC Lighting
- Works fine with the spherical Gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
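For testing outside the shader, the same reconstruction can be sketched in JS (a port of the function above for illustration; samples are [luminance, chroma] pairs):

```javascript
// Luminance-weighted chroma reconstruction: neighbors whose luminance is
// close to the center pixel's contribute more of their chroma.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where the sample is black
    w *= luma >= 1e-5 ? 1.0 : 0.0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

With three matching-luminance neighbors and one black one, the black sample is ignored and the reconstructed chroma is the neighbors' value.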
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER Clear Sky - a Showcase for Direct3D 10.0/10.1
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Challenges Storage
- Texture float support is quite good
Challenges Storage
- Texture half float support is getting better
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
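These precision claims are easy to verify. JS numbers are doubles, so Math.fround can emulate storing a value into a 32-bit float:

```javascript
// 32-bit float: every integer up to 2^24 = 16777216 is representable
// exactly; beyond that, consecutive integers start to collide onto the
// same representable value.
const last = Math.fround(16777215); // 2^24 - 1: exact
const edge = Math.fround(16777216); // 2^24: exact
const past = Math.fround(16777217); // 2^24 + 1: rounds back to 2^24
console.log(last, edge, past); // 16777215 16777216 16777216
```

This is why the packed uint24 payload must stay below 2^24: one more bit and distinct packed values would start aliasing.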
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
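The encode / decode pair round-trips every uint24. A JS sanity check with Math.fround emulating float32 storage between the two halves (helper names are ours):

```javascript
// Same 8-8-8-in-24-bit scheme as the GLSL above, shifting via multiply and
// divide; Math.fround stands in for the render target's float32 storage.
function uint8x3ToUint24(x, y, z) {
  return Math.fround(x * 65536.0 + (y * 256.0 + z)); // shift left 16 / 8
}

function uint24ToUint8x3(raw) {
  const x = Math.floor(raw / 65536.0);   // shift right 16
  const temp = Math.floor(raw / 256.0);  // shift right 8
  return [x, temp - x * 256.0, raw - temp * 256.0];
}
```

Every byte triple maps to an integer below 2^24, so the float32 round trip is lossless; this is exactly what the unit tests below verify exhaustively on the GPU.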
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- The single pass verifies our packing functions are mathematically correct
- Pass 1: pack data, unpack data, compare to the expected value
- In practice we will write to / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: pack data, render to texture
- Pass 2: read texture, unpack data, compare to the expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: read texture, unpack data, compare to the expected value
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Challenges: Storage
- Texture half float support is getting better
Challenges: Encode / Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
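The 2^24 limit is easy to check from JavaScript (WebGL's host language): `Math.fround` rounds a number to the nearest representable 32-bit float. A quick sketch, not from the deck:

```javascript
// Math.fround rounds to the nearest representable 32-bit float.
// Every integer up to 2^24 = 16777216 survives unchanged...
console.log(Math.fround(16777215) === 16777215); // true
console.log(Math.fround(16777216) === 16777216); // true

// ...but above 2^24 the float step size becomes 2, so odd integers
// collapse to an even neighbor and two packed values would collide.
console.log(Math.fround(16777217) === 16777217); // false
```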
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}
float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}
vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
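The same shift-by-multiply arithmetic can be exercised host-side; the JavaScript below just mirrors the GLSL helpers above as a sketch for intuition:

```javascript
// Pack three 8-bit integers into one float-representable uint24 using
// only multiplies, divides, and floor (no bitwise ops), mirroring the
// GLSL uint8_8_8_to_uint24 / uint24_to_uint8_8_8 above.
function uint8_8_8_to_uint24([x, y, z]) {
  return x * 65536.0 + y * 256.0 + z; // shift left via multiply
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536.0); // shift right via divide
  const temp = Math.floor(raw / 256.0);
  const y = -x * 256.0 + temp;         // mask off high bits via subtract
  const z = -temp * 256.0 + raw;
  return [x, y, z];
}

console.log(uint24_to_uint8_8_8(uint8_8_8_to_uint24([12, 34, 56]))); // round-trips to [12, 34, 56]
```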
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    // Encode, Decode, and Compare
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
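A host-side sketch of the octahedral mapping (after [Cigolle 14]); the function names are illustrative, not the deck's GLSL:

```javascript
// Octahedral normal encoding: project the unit vector onto the
// octahedron |x|+|y|+|z| = 1, fold the lower hemisphere over, and
// remap from [-1, 1]^2 to [0, 1]^2.
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

function octEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let px = x / l1, py = y / l1;
  if (z < 0.0) { // fold the lower hemisphere
    const tx = (1.0 - Math.abs(py)) * signNotZero(px);
    const ty = (1.0 - Math.abs(px)) * signNotZero(py);
    px = tx; py = ty;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5];
}

function octDecode([u, v]) {
  let px = u * 2.0 - 1.0, py = v * 2.0 - 1.0;
  let pz = 1.0 - Math.abs(px) - Math.abs(py);
  if (pz < 0.0) { // unfold the lower hemisphere
    const tx = (1.0 - Math.abs(py)) * signNotZero(px);
    const ty = (1.0 - Math.abs(px)) * signNotZero(py);
    px = tx; py = ty;
  }
  const len = Math.hypot(px, py, pz);
  return [px / len, py / len, pz / len];
}
```

Without quantization the round trip is exact up to float math; the 14-bit quantization used in the G-Buffer only adds a small angular error.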
Emission
- Don't pack emission: forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
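For reference, the YCoCg transform pair in JavaScript (coefficients as in [Mavridis 12]; function names illustrative):

```javascript
// RGB -> YCoCg: Y is a luminance-like term, Co/Cg are chroma offsets
// in [-0.5, 0.5] for RGB in [0, 1].
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Exact inverse: only adds and subtracts.
function ycocgToRgb([y, co, cg]) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```

Dropping one of Co/Cg per pixel (the checkerboard) costs little perceptually because both are chroma terms.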
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target is more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;
    // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
res.w = components.depth * components.metallic;
return res;
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
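The sign trick is easy to see host-side; a sketch (helper names hypothetical) where metallic rides the sign bit of a strictly positive depth:

```javascript
// Pack: store metallic as +1 / -1 in the sign of depth. Depth must be
// > 0; a packed value <= 0 is reserved to mean "sampling infinity".
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

// Unpack: abs() recovers depth, the sign recovers the flag.
function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```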
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);
    // Early out if sampling infinity
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }
- Decode Depth
Decode G-Buffer: RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer: RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct the missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
[Four "Enhance" detail shots, each comparing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%]
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
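The alternation itself is just a parity flip; a sketch (hypothetical helper, the deck's getCheckerboard() is not shown) of which chroma basis a pixel carries:

```javascript
// 0 -> pixel stores Co, 1 -> pixel stores Cg. Adding the frame index
// mod 2 flips the whole pattern each frame, so a temporal pass sees
// both chroma components for every pixel across two frames.
function checkerboard(px, py, frame = 0) {
  return (px + py + frame) % 2;
}
```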
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
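The same filter is easy to prototype host-side; a JavaScript port whose structure mirrors the GLSL above:

```javascript
// Reconstruct the missing chroma of `center` ([Y, knownChroma]) from
// four cross neighbors, each [Y, chroma]: neighbors close in luminance
// get exponentially more weight, so chroma does not bleed across
// strong lighting edges.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let sum = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Guard the case where a sample is black (e.g. zeroed at infinity)
    const w = luma >= 1e-5
      ? Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]))
      : 0.0;
    sum += chroma * w;
    totalWeight += w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], sum / totalWeight] : [0.0, 0.0];
}
```

When all four neighbors share the center's luminance the result is a plain average of their chroma, which is the expected behavior in flat regions.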
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats. http://webglstats.com, 2014
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Challenges Encode Decode
- Texture float looks like our best option
- Can we store all our G-Buffer data into a single floating point texture
- Pack the data
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
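The same shift-by-multiply arithmetic can be mirrored CPU-side for quick experimentation. A Python sketch, with function names following the GLSL above; floor() becomes int() since all values here are non-negative:

```python
def uint8_8_8_to_uint24(r, g, b):
    # Shift left by multiplying: the float equivalent of r << 16 | g << 8 | b.
    return r * 65536.0 + g * 256.0 + b

def uint24_to_uint8_8_8(raw):
    # Shift right by dividing, then peel each byte off with floor().
    x = float(int(raw / 65536.0))
    temp = float(int(raw / 256.0))
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return x, y, z

packed = uint8_8_8_to_uint24(18.0, 52.0, 86.0)
assert packed == 1193046.0
assert uint24_to_uint8_8_8(packed) == (18.0, 52.0, 86.0)
```

Since the packed value never exceeds 2^24 - 1, every intermediate stays exactly representable in a 32-bit float, which is what makes the round trip lossless.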
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
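A CPU-side sketch of the octahedral mapping, following [Cigolle 14]. The deck's own octohedronEncode / octohedronDecode are not shown, so this is an illustrative implementation of the same idea, not Floored's exact code:

```python
import math

def _sign(v):
    # sign() that treats 0 as positive, matching common octahedral encoders.
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # Project the unit vector onto an octahedron, then unfold it into [0,1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y = x / s, y / s
    if z < 0.0:  # fold the lower hemisphere out into the corners
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def octahedron_decode(e):
    x, y = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:  # undo the lower-hemisphere fold
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

n = (0.6, -0.48, 0.64)  # unit length: 0.36 + 0.2304 + 0.4096 = 1
decoded = octahedron_decode(octahedron_encode(n))
assert all(abs(a - b) < 1e-6 for a, b in zip(n, decoded))
```

Both directions are a handful of adds, multiplies, and abs() calls, which is what makes this encoding attractive under WebGL's limited instruction budget.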
Emission
- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
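The YCoCg basis change itself is just a few adds and halvings. A Python sketch of the forward and inverse transforms; the checkerboard interlacing then simply keeps Y plus one of Co / Cg per pixel, alternating each pixel:

```python
def rgb_to_ycocg(r, g, b):
    # Y carries luminance; Co / Cg carry chroma in [-0.5, 0.5] for inputs in [0, 1].
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above.
    return (y + co - cg, y + cg, y - co - cg)

rgb = (0.25, 0.5, 0.75)
back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, back))
```

Because the coefficients are powers of two, the transform is lossless in float arithmetic and cheap in a shader.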
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
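The sign trick above can be sketched in a few lines of Python. Here metallic is a bool rather than the ±1.0 float the shader multiplies by, and depth 0.0 stays reserved for the infinity early-out:

```python
def pack_depth_metallic(depth, metallic):
    # depth > 0. Metallic rides in the sign bit: +depth if metallic, -depth if not.
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    # abs() recovers depth; the sign recovers the metallic flag.
    return abs(w), w > 0.0

assert unpack_depth_metallic(pack_depth_metallic(42.5, True)) == (42.5, True)
assert unpack_depth_metallic(pack_depth_metallic(42.5, False)) == (42.5, False)
```

Because decode is a single abs(), shaders that only need depth (AO, ray marching) pay almost nothing.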
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting into an RGB Float render target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep Fresnel inside the integral for the nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
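The claimed behavior is easy to check numerically: at vDotH = 1 the base reflectance passes through untouched, and at grazing angles luminance saturates to 1 while chroma falls to 0 (grazing reflection is white). A Python sketch of the YC Schlick term:

```python
def fresnel_schlick_yc(v_dot_h, r0_y, r0_c):
    # Schlick's approximation evaluated directly in YC space:
    # luminance interpolates toward 1.0 at grazing angles,
    # chroma interpolates toward 0.0 (the inverted calculation from the slide).
    power = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - r0_y) * power + r0_y, r0_c * -power + r0_c)

# Head-on: returns the base reflectance unchanged.
assert fresnel_schlick_yc(1.0, 0.5, 0.2) == (0.5, 0.2)
# Grazing: luminance saturates to 1, chroma falls to 0.
assert fresnel_schlick_yc(0.0, 0.5, 0.2) == (1.0, 0.0)
```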
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
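The same weighting scheme can be sketched in Python. The sensitivity constant is the tunable value from the shader; neighbors are the four cross-pattern (luminance, chroma) pairs:

```python
def reconstruct_chroma_hdr(center_y, neighbors, sensitivity=25.0):
    # Weight each neighbor by how close its luminance is to the center pixel's,
    # so chroma does not bleed across strong luminance edges.
    weights = []
    for y, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(y - center_y))
        if y < 1e-5:  # guard black samples
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:  # guard the case where all weights are 0
        return 0.0
    return sum(w * c for w, (_, c) in zip(weights, neighbors)) / total

# Neighbors whose luminance matches the center contribute; black samples are ignored.
result = reconstruct_chroma_hdr(1.0, [(1.0, 0.2), (1.0, 0.4), (0.0, 9.9), (0.0, 9.9)])
assert abs(result - 0.3) < 1e-9
```

The exponential falloff means a neighbor across a hard luminance edge contributes essentially nothing, which is what keeps chroma from smearing across silhouettes.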
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions?
nick@floored.com
@pastasfuture
Resources

[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
Integer Packing
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers gt 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers gt 2^11
- 0 to 2048
- Example pack 3 8-bit integer values into 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies right with divisions
- AND OR operator simulation though multiples mods and adds
- Impractical for general single bit manipulation
- Must be high speed especially decode
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer / RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer / RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer / RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer / RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace
  // for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer / RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer / RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer / RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer / RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer / RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer / RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
  - [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered:
  - Direct Light Only
  - No Anti-Aliasing
  - No Temporal Techniques
  - G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look...
Enhance! (4 detail-shot comparison slides, each showing): RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation stays the same
  - Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]:
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
- YC Schlick's Approximation of Fresnel:
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
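Why does this work? Schlick's formula is affine in the reflection coefficient, and YCoCg is a linear transform of RGB whose luma weights sum to one and whose chroma weights sum to zero, so luma behaves like an RGB channel while chroma simply decays toward zero at grazing angles. A Python check of that equivalence (one common YCoCg variant is assumed for the coefficients):

```python
def rgb_to_ycocg(r, g, b):
    # Luma weights sum to 1; chroma weights sum to 0.
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def fresnel_schlick_rgb(v_dot_h, r0):
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in r0)

def fresnel_schlick_yc(v_dot_h, y0, c0):
    p = (1.0 - v_dot_h) ** 5.0
    # Luma: same form as an RGB channel. Chroma: c0 * (1 - p).
    return (1.0 - y0) * p + y0, c0 * -p + c0
```

Evaluating in RGB and then converting matches evaluating directly in YC, which is what justifies running the BRDF in the subsampled space.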
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too:
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth / stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data:
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
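A direct Python port of the weighting logic (a sketch for intuition, not Floored's shipping code) makes the behavior concrete: neighbors whose luminance matches the center dominate the chroma average, and black samples are excluded:

```python
SENSITIVITY = 25.0

def reconstruct_chroma_hdr(center, neighbors):
    """center and each of the 4 neighbors are (luma, chroma) pairs from the cross pattern."""
    weights = [2.0 ** (-SENSITIVITY * abs(l - center[0])) for l, _ in neighbors]
    # Guard the case where a sample is black (mirrors step(1e-5, luminance)).
    weights = [w if l >= 1e-5 else 0.0 for w, (l, _) in zip(weights, neighbors)]
    total = sum(weights)
    if total <= 1e-5:
        # Guard the case where all weights are 0.
        return (0.0, 0.0)
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], recon)
```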
Thanks for listening!
Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
  - Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats, http://webglstats.com, 2014
[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production, http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model, https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney, http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4, http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer, http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading, https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches, http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER, http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function, http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Integer Packing
- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
  - Step size increases at integers > 2^24
  - 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
  - Step size increases at integers > 2^11
  - 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
  - Impractical for general single-bit manipulation
- Must be high speed, especially decode
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
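The same shift-by-multiply arithmetic works in any IEEE float implementation, which makes it easy to spot-check the round trip off the GPU. A Python mirror of the pair above:

```python
def uint8_8_8_to_uint24(x, y, z):
    # 'Shift left' with multiplies: the float analog of (x << 16) | (y << 8) | z.
    return x * 65536.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    # 'Shift right' with divide + floor; exact because floats hold integers up to 2^24.
    x = raw // 65536.0
    temp = raw // 256.0
    y = temp - x * 256.0
    z = raw - temp * 256.0
    return x, y, z
```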
Unit Testing
Unit Testing
- Important to unit test packing functions
  - Easy to miss collisions
  - Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
  - WebGL has no support for readPixels on floating point textures
  - Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
  - Assign each pixel a unique integer ID
  - Pack the ID
  - Unpack the ID
  - Compare the unpacked ID to the pixel ID
  - Write a success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
  - Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
  - Pass 1: Pack data, render to texture
  - Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties:
  - Normal
  - Emission
  - Color
  - Gloss
  - Metallic
  - Depth
  - Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
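For reference, a compact octahedral encode/decode pair in Python, following the construction in [Cigolle 14] (these helper names are illustrative, not the GLSL functions used elsewhere in the deck):

```python
def _sign(v):
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n):
    """Unit vector -> point in [0,1]^2: project onto the octahedron, fold the lower hemisphere."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s
    if z < 0.0:
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    return (u * 0.5 + 0.5, v * 0.5 + 0.5)

def oct_decode(e):
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Undo the lower-hemisphere fold.
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    inv_len = (u * u + v * v + z * z) ** -0.5
    return (u * inv_len, v * inv_len, z * inv_len)
```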
Emission
- Don't pack emission. Forward render it.
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
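For concreteness, one common YCoCg transform pair (a sketch; the exact coefficients Floored uses aren't shown in the slides, so treat this variant as an assumption):

```python
def rgb_to_ycocg(r, g, b):
    y = 0.25 * r + 0.5 * g + 0.25 * b    # luma
    co = 0.5 * r - 0.5 * b               # orange / blue chroma axis
    cg = -0.25 * r + 0.5 * g - 0.25 * b  # green / purple chroma axis
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    tmp = y - cg
    return tmp + co, y + cg, tmp - co
```

Greys land at exactly zero chroma, which is part of what makes the chroma channels cheap to subsample.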
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
  - i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target is more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;
  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
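checkerboardInterlace picks Co or Cg per pixel so that neighboring pixels carry complementary chroma. The GLSL helper itself isn't shown in the slides; a possible Python sketch of the layout (pixel-coordinate parity is an assumption):

```python
def checkerboard(px, py):
    """1.0 on 'even' pixels, 0.0 on 'odd' pixels of a 2D checkerboard."""
    return 1.0 if (px + py) % 2 == 0 else 0.0

def checkerboard_interlace(co, cg, px, py):
    """Store Co on even pixels, Cg on odd pixels; decode swizzles on the same parity."""
    return co if checkerboard(px, py) > 0.0 else cg
```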
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Integer Packing
- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode
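Since GLSL ES 1.0 exposes none of these operators, the tricks above can be sketched in plain JavaScript (a hypothetical illustration; the helper names are mine, not from the deck):

```javascript
// Shift left = multiply by a power of two.
function shiftLeft(value, bits) {
  return value * Math.pow(2.0, bits);
}
// Shift right = divide by a power of two, then floor to drop the low bits.
function shiftRight(value, bits) {
  return Math.floor(value / Math.pow(2.0, bits));
}
// OR of two non-overlapping bit fields is just an add.
function packFields(hi, lo, loBits) {
  return shiftLeft(hi, loBits) + lo;
}
// AND with a low-bit mask falls out of mod().
function maskLowBits(value, bits) {
  return value % Math.pow(2.0, bits);
}
```

This works for whole fields, but as the slide notes, isolating an arbitrary single bit this way takes several of these operations, which is why it is impractical in general.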
Packing Example: Encode
float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
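A CPU-side port of the encode / decode pair above is handy for checking the arithmetic outside the shader (a sketch that mirrors the GLSL names; it is not part of the deck's codebase):

```javascript
// Pack three 8-bit fields into one float-representable 24-bit integer.
function uint8_8_8_to_uint24(r, g, b) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return r * SHIFT_LEFT_16 + (g * SHIFT_LEFT_8 + b);
}

// Recover the three fields with the divide-and-floor shifts.
function uint24_to_uint8_8_8(raw) {
  const SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const SHIFT_RIGHT_8 = 1.0 / 256.0;
  const SHIFT_LEFT_8 = 256.0;
  const x = Math.floor(raw * SHIFT_RIGHT_16);
  const temp = Math.floor(raw * SHIFT_RIGHT_8);
  const y = -x * SHIFT_LEFT_8 + temp;       // subtract the high field
  const z = -temp * SHIFT_LEFT_8 + raw;     // subtract the high two fields
  return [x, y, z];
}
```

All 24-bit integers are exactly representable in 32-bit (and 64-bit) floats, so the round trip is lossless as long as every intermediate stays an integer.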
Unit Testing

Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
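The same exhaustive-domain idea runs fine on the CPU as a node sketch (strided here for speed; the compact pack / unpack helpers are re-derived for this illustration, not taken from the deck):

```javascript
// Pack three 8-bit fields into a 24-bit integer and back, as plain math.
function pack(r, g, b) { return r * 65536.0 + g * 256.0 + b; }
function unpack(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}

// Walk pixel IDs through pack -> unpack and compare, like the GPU test.
// Set stride = 1 for the full 2^24 sweep; 4097 samples varied bit patterns.
let failures = 0;
for (let id = 0; id < 16777216; id += 4097) {
  const [r, g, b] = unpack(id);
  if (pack(r, g, b) !== id) failures += 1;
}
```

Unlike the GPU version, this does not exercise texture write / read precision, so it complements rather than replaces the two pass test.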
G-Buffer Packing: Compression

Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
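A minimal sketch of octahedral encode / decode in JavaScript, following [Cigolle 14] (the sign-handling details are my reading of the survey, not code from the deck):

```javascript
// sign() that never returns 0, so the fold is well-defined on the axes.
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

// n: normalized [x, y, z] -> [u, v] in 0..1.
function octahedronEncode(n) {
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1;
  let y = n[1] / l1;
  if (n[2] < 0.0) { // fold the lower hemisphere over the diagonals
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5]; // remap -1..1 to 0..1
}

// [u, v] in 0..1 -> normalized [x, y, z].
function octahedronDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) { // unfold the lower hemisphere
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

The round trip is exact up to floating point rounding; the discretization error only appears once u and v are quantized to the 14-bit fields used later.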
Emission
- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
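The YCoCg transform and its inverse are only a handful of adds and halvings; a JavaScript sketch (the non-scaled variant; the deck's rgbToYcocg may use different range conventions):

```javascript
// RGB -> YCoCg. Y is in 0..1, Co and Cg are in -0.5..0.5 for RGB in 0..1.
function rgbToYcocg(r, g, b) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y: luminance
     0.5  * r           - 0.5  * b, // Co: orange-blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg: green-magenta chroma
  ];
}

// YCoCg -> RGB, the exact inverse of the above.
function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Because the chroma range is signed, the shader code later adds a CHROMA_BIAS before storing it in an unsigned 8-bit field.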
G-Buffer Packing: Format

G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- e.g. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128 bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
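The velocity quantization step can be sketched on the CPU; SUB_PIXEL_PRECISION_STEPS = 4 is an assumed value here, and treating the dequantized result as a UV-space delta is my reading of the decode path, not stated in the deck:

```javascript
// Assumed sub-pixel precision; the deck does not give the constant's value.
const SUB_PIXEL_PRECISION_STEPS = 4.0;

// NDC velocity (-1..1 across the screen) -> biased 10-bit field (0..1023).
function quantizeVelocity(vNdc, resolution) {
  // NDC span is 2, so * 0.5 converts to pixels, then to sub-pixel steps.
  let v = vNdc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  v = Math.floor(Math.min(Math.max(v, -512.0), 511.0)); // saturate the field
  return v + 512.0; // bias into the unsigned range
}

// Biased field -> UV-space velocity (0..1 across the screen).
function dequantizeVelocity(q, inverseResolution) {
  return (q - 512.0) * inverseResolution / SUB_PIXEL_PRECISION_STEPS;
}
```

Values past the clamp saturate at the sentinel endpoints, which is why the decode path later treats magnitudes near 511 as "out of representable range".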
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout

  // Color is stored in non-linear space to distribute precision perceptually:
  // sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass

Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts

Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg, Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look. Enhance!
(Detail crops compare RGB Lighting at 100%, YC Lighting at 100%, RGB Lighting at 25%, and YC Lighting at 25%.)
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- The Color uniform becomes a vec2 of chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
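Because the RGB-to-YCoCg transform is linear and white maps to (Y = 1, Co = 0, Cg = 0), the YC form above follows directly from transforming the RGB Schlick result. A JavaScript check of that equivalence (helper names are mine, for illustration):

```javascript
// Standard RGB Schlick Fresnel, applied per channel.
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * p + c);
}

// YC variant: luminance lerps toward white (Y = 1), chroma lerps toward 0.
function fresnelSchlickYC(vDotH, f0y, f0c) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - f0y) * p + f0y, f0c * -p + f0c];
}

// Non-scaled YCoCg transform for the comparison.
function rgbToYcocg(r, g, b) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b, -0.25 * r + 0.5 * g - 0.25 * b];
}
```

Since Schlick's term is a lerp between F0 and white, any linear color transform commutes with it, which is exactly what makes the chroma form valid.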
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
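A CPU port of reconstructChromaHDR is convenient for experimenting with the SENSITIVITY constant outside the shader (a sketch; a loop replaces the vec4 arithmetic, and samples are [luminance, chroma] pairs from the cross neighborhood):

```javascript
// center: [Y, C] of the pixel being reconstructed; a1..a4: neighbor [Y, C].
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const neighbors = [a1, a2, a3, a4];
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let weightedChroma = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Weight falls off exponentially with luminance difference.
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard the case where a sample is black
    totalWeight += w;
    weightedChroma += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], weightedChroma / totalWeight] : [0.0, 0.0];
}
```

With a flat-luminance neighborhood this degenerates to a plain average of the neighbor chroma, which is the behavior you want in smooth regions; raising SENSITIVITY makes the filter respect chroma edges more aggressively.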
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)
return floor(raw 2550)
float uint8_8_8_to_uint24(const in vec3 raw)
const float SHIFT_LEFT_16 = 2560 2560
const float SHIFT_LEFT_8 = 2560
return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)
vec3 color888
color888r = normalizedFloat_to_uint8(colorr)
color888g = normalizedFloat_to_uint8(colorg)
color888b = normalizedFloat_to_uint8(colorb)
float colorPacked = uint8_8_8_to_uint24(color888)
Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)
const float SHIFT_RIGHT_16 = 10 (2560 2560)
const float SHIFT_RIGHT_8 = 10 2560
const float SHIFT_LEFT_8 = 2560
vec3 res
resx = floor(raw SHIFT_RIGHT_16)
float temp = floor(raw SHIFT_RIGHT_8)
resy = -resx SHIFT_LEFT_8 + temp
resz = -temp SHIFT_LEFT_8 + raw
return res
vec3 color888 = uint24_to_uint8_8_8(colorPacked)
vec3 color
colorr = uint8_to_normalizedFloat(color888r)
colorg = uint8_to_normalizedFloat(color888g)
colorb = uint8_to_normalizedFloat(color888b)
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
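The YCoCg basis referenced above is a linear, exactly invertible transform. A quick sketch (helper names are mine, not from the slides):

```javascript
// RGB <-> YCoCg, the perceptual basis used for chroma subsampling.
// Y carries luminance; Co / Cg carry chroma and can be stored at lower frequency.
function rgbToYcocg([r, g, b]) {
  return [
     r * 0.25 + g * 0.5 + b * 0.25, // Y
     r * 0.5            - b * 0.5,  // Co
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg
  ];
}
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note that any gray input lands entirely in Y with zero chroma, which is exactly why the chroma components tolerate subsampling.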
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;
  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
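Numerically, the velocity quantization round-trips like this. This is a sketch: the slide does not give SUB_PIXEL_PRECISION_STEPS, so 4 (quarter-pixel steps) is an assumed value for illustration.

```javascript
// Quantize screen-space (-1..1 NDC) velocity into a biased 10-bit integer,
// then dequantize back to UV-space (0..1) velocity. STEPS = 4 is an assumption.
const STEPS = 4.0;
function quantizeVelocity(vNdc, resolution) {
  let q = vNdc * resolution * STEPS * 0.5;          // NDC -> sub-pixel steps
  q = Math.floor(Math.min(511.0, Math.max(-512.0, q)));
  return q + 512.0;                                 // bias into 0..1023 (10 bits)
}
function dequantizeVelocity(q, resolution) {
  // The 0.5 folded into the encode makes this a UV-space (0..1) velocity.
  return (q - 512.0) * (1.0 / resolution) * (1.0 / STEPS);
}
```

Out-of-range velocities saturate at the biased endpoints 0 and 1023, which is why the decode path treats those extremes as "infinity".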
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
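The sign() trick for the depth/metallic channel, in plain JavaScript (a sketch; the slides store metallic as a ±1 float, so a multiply does the negation):

```javascript
// Pack a positive depth and a boolean metallic flag into one signed value.
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0); // sign bit carries the flag
}
function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```

A packed value of exactly zero carries no flag, but that case is already reserved: the decode path treats depth <= 0 as "sampling infinity" and early-outs.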
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
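Because Schlick's formula is affine in the reflection coefficient and the RGB-to-YCoCg transform is linear, evaluating Fresnel directly in YC space matches transforming the RGB result afterwards. A quick numeric check of that claim (a sketch; the JS helper names are mine):

```javascript
// Verify: YCoCg(fresnelRGB(f0)) matches fresnelYC(YCoCg(f0)) in Y and Co.
function rgbToYcocg([r, g, b]) {
  return [r * 0.25 + g * 0.5 + b * 0.25, r * 0.5 - b * 0.5, -r * 0.25 + g * 0.5 - b * 0.25];
}
function fresnelSchlick(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * power + c);
}
function fresnelSchlickYC(vDotH, [y, c]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * power + y, c * -power + c];
}
```

At vDotH = 1 both reduce to f0; toward grazing angles luminance approaches 1 while chroma falls to 0, which is the "chroma calculation inverted, approaches zero" behavior noted above.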
YC Lighting
- Works fine with the spherical Gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
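The same weighting logic, ported to JavaScript for illustration (my port, reading the slide's sensitivity constant as 25.0):

```javascript
// Luminance-weighted chroma reconstruction from a cross neighborhood.
// center, a1..a4 are [luminance, chroma] pairs; returns
// [stored center chroma, reconstructed missing chroma].
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0, chromaSum = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    // Weight neighbors by luminance similarity; black samples get zero weight.
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    w *= luma >= 1e-5 ? 1.0 : 0.0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

Neighbors whose luminance matches the center dominate the average, so chroma does not bleed across strong luminance edges.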
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions?
nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
Packing Example: Decode
vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}
vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
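The same split is easy to check on the CPU. Below is a JavaScript mirror of the decode above plus the matching encode (a sketch; a uint24 fits exactly in a float's 24-bit mantissa, which is why the float arithmetic is lossless):

```javascript
// uint24 -> three 8-bit bytes, mirroring the GLSL shifts above.
function uint24ToBytes(raw) {
  const hi = Math.floor(raw / 65536.0);  // raw * SHIFT_RIGHT_16
  const temp = Math.floor(raw / 256.0);  // raw * SHIFT_RIGHT_8
  return [hi, temp - hi * 256.0, raw - temp * 256.0];
}
// Three 8-bit bytes -> uint24.
function bytesToUint24([hi, mid, lo]) {
  return hi * 65536.0 + mid * 256.0 + lo;
}
```

The divisions play the role of right shifts and the floor() discards the lower bits, exactly as in the shader.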
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient)
{
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
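The YC-space Schlick evaluation is exact, not an approximation: RGB-to-YCoCg is a linear transform, and white (the value fresnel approaches at grazing angles) maps to zero chroma. A quick Python sketch (standing in for the GLSL; the helper names here are ours) that checks the YC result against transforming the RGB result:

```python
def rgb_to_ycocg(rgb):
    # Standard YCoCg forward transform; white (1,1,1) maps to (1, 0, 0)
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, r0):
    # Per-channel Schlick: lerp from r0 toward white as the angle grazes
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in r0)

def fresnel_schlick_yc(v_dot_h, yc):
    # Mirrors fresnelSchlickYC: luminance as in RGB, chroma decays toward zero
    p = (1.0 - v_dot_h) ** 5.0
    y, c = yc
    return ((1.0 - y) * p + y, c * -p + c)

r0 = (0.2, 0.5, 0.8)
y0, co0, cg0 = rgb_to_ycocg(r0)
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    expected = rgb_to_ycocg(fresnel_schlick_rgb(v_dot_h, r0))
    got_y, got_co = fresnel_schlick_yc(v_dot_h, (y0, co0))
    assert abs(expected[0] - got_y) < 1e-12
    assert abs(expected[1] - got_co) < 1e-12
```

The same holds for the Cg component, since both chroma axes send white to zero.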
YC Lighting- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
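For reference, a direct CPU-side port of the function above (Python standing in for GLSL), handy for sanity-checking the two guards off the GPU:

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and neighbors are (luminance, chroma) pairs
    samples = (a1, a2, a3, a4)
    # Weight neighbors by luminance similarity; black samples carry no chroma signal
    weights = [0.0 if lum < 1e-5 else 2.0 ** (-sensitivity * abs(lum - center[0]))
               for lum, _ in samples]
    total = sum(weights)
    if total <= 1e-5:
        # Guard the case where all weights are 0
        return (0.0, 0.0)
    chroma = sum(w * s[1] for w, s in zip(weights, samples)) / total
    return (center[1], chroma)
```

With equal-luminance neighbors the reconstruction degenerates to a plain average of their chroma, which is the expected behavior on flat surfaces.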
Thanks for listening!
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know!
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
Unit Testing
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
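The shaders below lean on helpers like uint24_to_uint8_8_8 and sample_to_uint8_8_8 that the slides never show. A CPU-side model of what such helpers have to do (byte order and normalization are our assumptions, not Floored's code), applying the same round-trip idea to a subsample of the domain:

```python
def uint24_to_uint8_8_8(x):
    # Split a 24-bit integer ID into three bytes (low byte first; an assumed convention)
    return (x & 0xFF, (x >> 8) & 0xFF, (x >> 16) & 0xFF)

def uint8_8_8_to_uint24(b):
    return b[0] | (b[1] << 8) | (b[2] << 16)

def uint8_8_8_to_sample(b):
    # Bytes -> normalized channel values, as stored in an 8-bit-per-channel target
    return tuple(c / 255.0 for c in b)

def sample_to_uint8_8_8(s):
    return tuple(int(round(c * 255.0)) for c in s)

# Round-trip a subsample of the 2^24 domain (the GPU test covers all of it)
for i in range(0, 1 << 24, 65521):  # prime stride to hit varied byte patterns
    assert uint8_8_8_to_uint24(sample_to_uint8_8_8(uint8_8_8_to_sample(uint24_to_uint8_8_8(i)))) == i
```

The round() in sample_to_uint8_8_8 is what absorbs the 1-ulp float error that a naive truncation would turn into a collision.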
Packing Unit Test: Single Pass
void main()
{
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass
void main()
{
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main()
{
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
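A CPU-side sketch of the octahedral mapping from [Cigolle 14] (our port, not Floored's GLSL): project the unit vector onto the octahedron, fold the lower hemisphere over the diagonals, and remap to the 0..1 square.

```python
import math

def oct_encode(n):
    # Unit vector (x, y, z) -> point in the [0,1]^2 square
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)  # project onto the octahedron |x|+|y|+|z| = 1
    u, v = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals
        u, v = ((1.0 - abs(v)) * math.copysign(1.0, u),
                (1.0 - abs(u)) * math.copysign(1.0, v))
    return (u * 0.5 + 0.5, v * 0.5 + 0.5)

def oct_decode(e):
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Undo the lower-hemisphere fold
        u, v = ((1.0 - abs(v)) * math.copysign(1.0, u),
                (1.0 - abs(u)) * math.copysign(1.0, v))
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)
```

Without quantization the round trip is exact up to float error; with 14-bit quantization (as in the format below) the error stays reasonably uniform over the sphere, which is the point of the encoding.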
Emission
- Don't pack emission. Forward render it.
- Avoids another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
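The YCoCg transform pair used here is cheap and exactly invertible for binary-fraction inputs. A Python sketch of the math (in the shader this is just a few MADDs):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    y = 0.25 * r + 0.5 * g + 0.25 * b    # luminance
    co = 0.5 * r - 0.5 * b               # chroma orange
    cg = -0.25 * r + 0.5 * g - 0.25 * b  # chroma green
    return (y, co, cg)

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)
```

Grays map to zero chroma in this basis, which is why dropping every other chroma sample in a checkerboard costs so little on typical interiors.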
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
{
  vec4 res;
  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together
  // If not metallic, negate depth. Extract the bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
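The sign trick in isolation, as a Python sketch (the convention that metallic is carried as ±1.0 and that a stored 0.0 means "nothing rendered / infinity" follows the decode shown later):

```python
def pack_depth_metallic(depth, metallic):
    # depth > 0.0; the metallic flag rides in the sign (+depth = metallic, -depth = dielectric)
    return depth if metallic else -depth

def unpack_depth_metallic(packed):
    # A stored 0.0 decodes to depth 0.0: the "sampled infinity" early-out case
    return (abs(packed), packed > 0.0)
```

This is why the decode can branch on `res.depth <= 0.0` before touching any other channel: the cheap abs()/sign() pair recovers both values from one float.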
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution)
{
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
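The velocity path round-trips like this (a Python sketch; SUB_PIXEL_PRECISION_STEPS is never given in the deck, so 4.0 quarter-pixel steps is our assumption — and note the encode's 0.5 means the decode lands in UV-space units rather than NDC):

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed quarter-pixel steps

def quantize_velocity(velocity_ndc, resolution):
    # -1..1 screen-space velocity -> 0..1023 integer per axis (10 bits);
    # the extremes are reserved for "out of representable range"
    out = []
    for v, r in zip(velocity_ndc, resolution):
        q = v * r * SUB_PIXEL_PRECISION_STEPS * 0.5
        out.append(math.floor(min(max(q, -512.0), 511.0)) + 512.0)
    return tuple(out)

def dequantize_velocity(q, inverse_resolution):
    vx, vy = q[0] - 512.0, q[1] - 512.0
    if max(abs(vx), abs(vy)) > 510.0:
        # Out of range: push outside screen space so later passes cull it
        return (1.41521356, 1.41521356)  # sqrt(2) + 1e-3
    return (vx * inverse_resolution[0] / SUB_PIXEL_PRECISION_STEPS,
            vy * inverse_resolution[1] / SUB_PIXEL_PRECISION_STEPS)
```

The decoded value is half the NDC velocity, i.e. a UV-space (0..1 across the screen) displacement, which is the convenient unit for reprojection fetches.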
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct the missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
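The checkerboard plumbing (checkerboardInterlace / getCheckerboard) is assumed by both the encode and the decode but never shown. One consistent way to implement the pair, as a Python sketch (which parity stores Co is our convention, not Floored's):

```python
def checkerboard(px, py):
    # Alternates 0/1 per integer pixel coordinate
    return (px + py) % 2

def interlace_chroma(co, cg, px, py):
    # Encode: keep Co on one parity, Cg on the other
    return co if checkerboard(px, py) == 0 else cg

def deinterlace_chroma(stored, reconstructed, px, py):
    # Decode: order the (stored, neighbor-reconstructed) pair back into (Co, Cg);
    # mirrors the shader's colorYcocg.yz vs .zy swizzle on offsetDirection
    if checkerboard(px, py) == 0:
        return (stored, reconstructed)
    return (reconstructed, stored)
```

All four cross neighbors of a pixel have the opposite parity, which is exactly why the reconstruction filter can always find the missing chroma basis next door.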
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 0.04 specular reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Unit Testing
- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the gpu
- WebGL has no support for readPixels on floating point textures
- Requires packing
Unit Testing
- 2^24 not a very large number
- Can exhaustively test entire domain with a 4096 x 4096 render target
- Assign pixel unique integer ID
- pack ID
- unpack ID
- Compare unpacked ID to pixel ID
- Write success fail color
Packing Unit Test Single Passvoid main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
Encode Decode and Compare
vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))
if (expectedDecoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
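The YCoCg transform pair itself is a small linear change of basis; a Python sketch with the standard coefficients (helper names are illustrative):

```python
def rgb_to_ycocg(r, g, b):
    """Forward YCoCg transform. For RGB in [0,1]: Y in [0,1], Co/Cg in [-0.5, 0.5]."""
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance
    co =  0.5  * r            - 0.5  * b  # orange chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green chroma
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    """Exact inverse of rgb_to_ycocg."""
    return y + co - cg, y + cg, y - co - cg
```

The signed Co / Cg range is why the packing code later adds a chroma bias before writing to an unsigned channel.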
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 Bits, ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;
  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
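The depth / metallic sign trick is easy to model on the CPU; a hedged Python sketch (assumes depth is strictly positive, with 0 reserved for "infinity" as in the decode path):

```python
def pack_depth_metallic(depth, metallic):
    """Store a boolean in the sign of a positive depth value.

    depth must be > 0 (0 is reserved to mean 'sampling infinity').
    """
    assert depth > 0.0
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    # abs() recovers depth cheaply; the sign recovers the flag
    return abs(w), w > 0.0
```

This is why the depth decode is so cheap: a single `abs()` is all a ray marching or AO shader needs.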
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float render target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
    gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD
and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
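The same reconstruction weighting is easy to prototype on the CPU; a Python sketch of the function above (illustrative, scalar loop instead of vec4 arithmetic):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    """center = (Y, C); neighbors = four (Y, C) cross samples.

    Weights each neighbor's chroma by luminance similarity to the center,
    mirroring the GLSL reconstructChromaHDR above.
    """
    center_y, center_c = center
    total_weight = 0.0
    chroma_acc = 0.0
    for y, c in neighbors:
        weight = 2.0 ** (-sensitivity * abs(y - center_y))
        weight *= 0.0 if y < 1e-5 else 1.0   # guard: ignore black / infinity samples
        chroma_acc += c * weight
        total_weight += weight
    if total_weight <= 1e-5:                 # guard: all weights zero
        return center_c, 0.0
    return center_c, chroma_acc / total_weight
```

Neighbors with matching luminance dominate the average, so chroma only bleeds across edges where luminance also matches.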
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van
Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence,
Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P.
Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Unit Testing
- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
Packing Unit Test: Single Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL. Not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 values with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, which is not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two-pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 values with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, which is not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 values with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, which is not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
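The uint24 packing helpers under test are not listed in the deck. As a rough CPU-side model of what they might do (the names mirror the GLSL, but the byte order and implementation here are assumptions), the same round trip can be checked in JavaScript:

```javascript
// Hypothetical model of the deck's packing helpers: split a uint24 across
// three bytes, then map each byte to the [0, 1] range a color channel stores.
function uint24ToUint888(v) {
  return [Math.floor(v / 65536), Math.floor(v / 256) % 256, v % 256];
}
function uint888ToUint24(b) {
  return b[0] * 65536 + b[1] * 256 + b[2];
}
function uint888ToSample(b) {
  return b.map(x => x / 255.0);          // what gl_FragColor.rgb stores
}
function sampleToUint888(s) {
  return s.map(x => Math.round(x * 255.0));
}

// Round trip, mirroring the single-pass unit test above
const v = 11259375;                      // 0xABCDEF
const roundTrip = uint888ToUint24(sampleToUint888(uint888ToSample(uint24ToUint888(v))));
```

The division by 255 (not 256) matches how an 8-bit normalized channel is reconstructed, which is why the round trip survives quantization.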
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
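A sketch of the octahedral mapping in the spirit of [Cigolle 14] (JavaScript for illustration; the shader versions and their exact names are not shown in the deck):

```javascript
// Octahedral normal encoding: project the unit normal onto the octahedron
// |x| + |y| + |z| = 1, fold the lower hemisphere over the diagonals, and
// remap to [0, 1] so the full quantization domain is used.
function signNonZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

function octahedronEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere across the diagonals
    const fx = (1.0 - Math.abs(y)) * signNonZero(x);
    const fy = (1.0 - Math.abs(x)) * signNonZero(y);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];  // remap [-1, 1] -> [0, 1]
}

function octahedronDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    const fx = (1.0 - Math.abs(y)) * signNonZero(x);
    const fy = (1.0 - Math.abs(x)) * signNonZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];     // renormalize off the octahedron
}
```

Without quantization the mapping round-trips exactly up to floating point error; the 12- or 14-bit quantization in the G-Buffer formats below is where the discretization enters.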
Emission
- Don't pack emission; forward render it instead
- Avoids another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer; it is not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- The human perceptual system is sensitive to luminance shifts
- The human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
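The RGB to YCoCg transform referenced above ([Mavridis 12]) is linear and exactly invertible; a JavaScript port for illustration (names mirror the deck's GLSL helpers):

```javascript
// RGB <-> YCoCg: Y concentrates luminance, Co/Cg carry chroma, so the
// chroma pair can be stored at half frequency with little visible loss.
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,   // Y:  luminance
    0.5 * r - 0.5 * b,               // Co: orange/blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b,  // Cg: green/purple chroma
  ];
}
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

const back = ycocgToRgb(rgbToYcocg([0.2, 0.4, 0.6]));  // round-trips exactly
```

Note that white maps to Y = 1 with zero chroma, which is what makes the chroma-subsampled lighting tricks later in the deck work out.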
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
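Why 24 packed bits fit safely in a 32-bit float channel: IEEE-754 single precision represents every integer up to 2^24 exactly, so a uint24 per channel survives storage, while anything wider starts to round. A quick check (illustrative; the renderer itself never does this on the CPU):

```javascript
// Float32Array forces values through IEEE-754 single precision, modelling
// what a 32-bit float render target channel can hold exactly.
const f32 = new Float32Array(1);

f32[0] = 16777215;                 // 2^24 - 1 survives the round trip
const ok24 = f32[0] === 16777215;

f32[0] = 16777217;                 // 2^24 + 1 rounds to the nearest even
const lost25 = f32[0] !== 16777217;
```

This is also why the sign bit is "free" as a flag: the packed payload only ever occupies the non-negative integer range.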
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64 bpp
- A half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract the bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
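The checkerboard helpers used by encodeGBuffer are not listed in the deck. A hypothetical model (the real GLSL works on uv and resolution; here integer pixel coordinates are used, and the even-parity-stores-Co convention is an assumption):

```javascript
// Hypothetical checkerboard interlace: pixels of even coordinate parity
// carry the Co chroma component, odd-parity pixels carry Cg. The decode
// pass later uses the same parity to know which component it is holding.
function getCheckerboard(px, py) {
  return (px + py) % 2 === 0 ? 1.0 : -1.0;
}
function checkerboardInterlace(co, cg, px, py) {
  return getCheckerboard(px, py) > 0.0 ? co : cg;
}

const stored = [
  checkerboardInterlace(0.1, 0.9, 2, 4),  // even parity -> Co
  checkerboardInterlace(0.1, 0.9, 2, 3),  // odd parity  -> Cg
];
```

Each pixel therefore stores luminance plus exactly one chroma component; the cross-neighborhood reconstruction in the decode pass recovers the other.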
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
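Why packed pixels can't be hardware filtered: bilinear filtering averages the raw channel values, so carries cross the packed field boundaries. A small illustration with the uint8_8_8-in-uint24 layout (model values chosen to force a carry; not from the deck):

```javascript
// Averaging two packed uint24 values is not the same as averaging each
// 8-bit field: a carry from the middle field corrupts its neighbors.
const pack = (a, b, c) => a * 65536 + b * 256 + c;
const unpack = v => [Math.floor(v / 65536), Math.floor(v / 256) % 256, v % 256];

const pA = pack(10, 255, 0);
const pB = pack(11, 0, 0);
const filtered = unpack((pA + pB) / 2);  // what a 50/50 bilinear tap yields
// A true per-field blend would be ~[10.5, 127.5, 0]; the packed average
// instead decodes to garbage in the low fields.
```

The same argument rules out alpha blending of packed decals and MSAA resolves, since both are weighted averages of the raw stored values.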
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg, Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance (detail crops, repeated for four regions of the scene):
RGB Lighting 100% | YC Lighting 100%
RGB Lighting 25% | YC Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 of chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- The luminance calculation stays the same
- The chroma calculation is inverted: chroma approaches zero at perpendicular, where Fresnel reflectance goes to white
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
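Because the YCoCg transform is linear and white carries zero chroma, evaluating Schlick directly in YC space is exactly equivalent to transforming the RGB result. A quick JavaScript sanity check (helper names and the sample F0 are ours, for illustration; the chroma component stands for whichever of Co/Cg the checkerboard stores):

```javascript
// Check: Schlick evaluated in YCoCg space equals the YCoCg transform of
// Schlick evaluated in RGB space (linearity, plus white has zero chroma).
const rgbToYcocg = ([r, g, b]) => [
  0.25 * r + 0.5 * g + 0.25 * b,   // Y
  0.5 * r - 0.5 * b,               // Co
  -0.25 * r + 0.5 * g - 0.25 * b,  // Cg
];
const schlickRgb = (vDotH, f0) =>
  f0.map(c => (1.0 - c) * Math.pow(1.0 - vDotH, 5.0) + c);
const schlickYC = (vDotH, [y0, c0]) => {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * power + y0, c0 * -power + c0];
};

const f0 = [0.95, 0.64, 0.54];           // a copper-like F0, for example
const [y0, co0] = rgbToYcocg(f0);
const direct = schlickYC(0.3, [y0, co0]);
const viaRgb = rgbToYcocg(schlickRgb(0.3, f0));
```

This is also why the chroma term is "inverted": white (the grazing limit) has Y = 1 but zero chroma, so the chroma component fades out as the luminance saturates.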
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
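The weighting is easy to sanity check on the CPU. A JavaScript port of the function above (illustrative, not the shipping shader; the SENSITIVITY constant follows our reading of the slide):

```javascript
// Port of reconstructChromaHDR: neighbors whose luminance is close to the
// center pixel dominate the reconstructed chroma; black samples and
// all-zero weights are guarded exactly as in the GLSL.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let blended = 0.0;
  for (const [luma, chroma] of neighbors) {
    let weight = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) weight = 0.0;       // guard black samples
    totalWeight += weight;
    blended += chroma * weight;
  }
  return totalWeight > 1e-5 ? [center[1], blended / totalWeight] : [0.0, 0.0];
}

// Three neighbors share the center's luminance; the dimmer outlier
// (luma 0.3, chroma 0.9) gets weight 2^-5 and barely contributes.
const out = reconstructChromaHDR([0.5, 0.2],
  [[0.5, 0.1], [0.5, 0.3], [0.5, 0.1], [0.3, 0.9]]);
```

The exponential falloff is what keeps chroma from bleeding across strong luminance edges, which is exactly where the checkerboard artifacts would otherwise show.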
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
Unit Testing
- Single pass verifies our packing functions are mathematically correct
- Pass 1 Pack data upack data compare to expected value
- In practice we will write read from textures in between pack unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1 Pack data render to texture
- Pass 2 Read texture unpack data compare to expected value
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))
- Pass 1 Pack data render to texture
Packing Unit Test Two Pass
void main()
Covers the range of all uint24 with a 4k x 4k canvas
Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target
vec2 pixelCoord = floor(vUV pass_uViewportResolution)
float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx
vec3 encoded = texture2D(encodedSampler vUV)xyz
float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
if (decoded == expected)
Packing Successful
gl_FragColor = vec4(00 10 00 10)
else
Packing Failed
gl_FragColor = vec4(10 00 00 10)
- Pass 2 Read texture unpack data compare to expected value
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate: probably too discretized
- Maybe useful on mobile, where mediump (16-bit float) is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128 bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
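The velocity quantization above can be checked end to end on the CPU. A hedged Python model follows; the deck does not state the value of SUB_PIXEL_PRECISION_STEPS, so 4.0 (quarter-pixel steps) is an assumption, and note the decoded value comes back in UV space (half the NDC span).

```python
import math

# Model of the 10-bit velocity quantization above.
# SUB_PIXEL_PRECISION_STEPS = 4.0 is an assumed value for illustration.
SUB_PIXEL_PRECISION_STEPS = 4.0

def quantize_velocity(v_ndc, resolution):
    # v_ndc in [-1, 1] screen space; * 0.5 maps the 2-unit NDC span to pixels,
    # then scale to signed sub-pixel steps and clamp to the 10-bit range.
    q = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    q = math.floor(min(511.0, max(-512.0, q)))
    return q + 512.0  # biased into [0, 1023]: 10 bits

def dequantize_velocity(q, inverse_resolution):
    # Returns UV-space velocity (the stored * 0.5 NDC scale is kept).
    return (q - 512.0) * inverse_resolution / SUB_PIXEL_PRECISION_STEPS
```

With these constants, the round-trip error is bounded by one sub-pixel step, i.e. 1 / (resolution * SUB_PIXEL_PRECISION_STEPS) in UV space.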
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
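The sign-bit trick above is tiny but worth seeing in isolation. A minimal sketch, assuming metallic is a +1.0 / -1.0 multiplier and depth is strictly positive (depth <= 0.0 is reserved for the "sampling infinity" early-out in the decoder):

```python
# Sketch of packing a boolean into the sign of a positive depth value.

def pack_depth_metallic(depth, metallic_flag):
    # depth must be > 0.0; zero is reserved to mean "infinity / no surface".
    assert depth > 0.0
    return depth if metallic_flag else -depth

def unpack_depth_metallic(packed):
    # abs() recovers depth; the sign recovers the flag, mirroring sign() in GLSL.
    return abs(packed), packed > 0.0
```

This is why the format tables list "Depth 31 Bits | Metallic 1 Bit": the flag costs only the sign bit.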
Packing Challenges
- Must balance packing efficiency with cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
      gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminous Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and ADD from the skipped 3rd component
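The endpoint behavior claimed above (luminance follows standard Schlick, chroma is inverted and falls to zero at grazing) is easy to verify numerically. A small Python check of the same formula:

```python
# Numeric check of the YC Fresnel behavior described above.

def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    power = (1.0 - v_dot_h) ** 5.0
    y = (1.0 - f0_y) * power + f0_y          # standard Schlick on luminance
    c = f0_c * -power + f0_c                 # == f0_c * (1 - power): inverted
    return y, c

# Head-on (vDotH = 1): returns the base reflectance (f0_y, f0_c).
# Grazing (vDotH = 0): luminance -> 1.0 (white), chroma -> 0.0.
```

Chroma vanishing at grazing matches the physical intuition that Fresnel reflection becomes achromatic at the silhouette.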
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
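For offline sanity-checking, the same weighting logic ports directly to Python. This is a sketch of the function above, with the luminance-similarity weights and both guards preserved (the 25.0 sensitivity constant is as reconstructed from the slide):

```python
# Python port of the luminance-weighted chroma reconstruction above.

def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luminance, known chroma); neighbors: four (luminance, chroma) pairs.
    acc = 0.0
    total = 0.0
    for y, c in neighbors:
        w = 2.0 ** (-sensitivity * abs(y - center[0]))
        if y < 1e-5:            # guard the case where a sample is black
            w = 0.0
        acc += c * w
        total += w
    if total > 1e-5:            # guard the case where all weights are 0
        return (center[1], acc / total)
    return (0.0, 0.0)
```

Neighbors whose luminance matches the center dominate the weighted average, so chroma does not bleed across strong luminance edges.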
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence,
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P.
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sebastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}
- Pass 1: Pack data, render to texture
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Packing Unit Test: Two Pass
void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}
- Pass 2: Read texture, unpack data, compare to expected value
G-Buffer Packing: Compression
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties:
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
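The octahedral mapping itself isn't shown in the deck; below is a host-side JavaScript sketch of the [Cigolle 14] encode / decode. Function names mirror the shader listings later in the deck, but this is an assumed reference implementation, not Floored's production code.

```javascript
// Octahedral normal encoding [Cigolle 14]: project the unit sphere onto an
// octahedron, unfold it onto a square, and remap to the 0..1 domain.
const sgn = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octahedronEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let px = x * invL1;
  let py = y * invL1;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const fx = (1.0 - Math.abs(py)) * sgn(px);
    const fy = (1.0 - Math.abs(px)) * sgn(py);
    px = fx;
    py = fy;
  }
  // Map -1..1 to the full 0..1 domain, ready for quantization.
  return [px * 0.5 + 0.5, py * 0.5 + 0.5];
}

function octahedronDecode([u, v]) {
  let x = u * 2.0 - 1.0;
  let y = v * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const fx = (1.0 - Math.abs(y)) * sgn(x);
    const fy = (1.0 - Math.abs(x)) * sgn(y);
    x = fx;
    y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

The round trip is lossless in float math; the error in the G-Buffer comes only from the 14-bit quantization applied afterwards.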
Emission
- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer
- Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- The human perceptual system is sensitive to luminance shifts
- The human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
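For reference, the RGB to YCoCg transform is a pair of cheap linear maps. A JavaScript sketch follows; the deck's shader helpers (`rgbToYcocg`, `YcocgToRgb`) presumably implement the same math, though their exact scale conventions are an assumption here.

```javascript
// RGB -> YCoCg: Y is luminance, Co / Cg are the orange and green chroma axes
// (in -0.5..0.5 for inputs in 0..1). The transform is exactly invertible.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Grayscale inputs land on the Y axis with zero chroma, which is exactly why chroma can be stored at half rate with little visible loss.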
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float, 128bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
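The `sample_to_uint8_8_8` / `uint8_8_8_to_uint24` helpers aren't shown in the deck. A plausible host-side JavaScript model, assuming three 0..1 samples quantized to 8 bits each and packed into one integer that still fits a float32's 24-bit significand:

```javascript
// Quantize three 0..1 samples to 8-bit integers (0..255 each).
function sampleToUint8_8_8([x, y, z]) {
  return [Math.round(x * 255), Math.round(y * 255), Math.round(z * 255)];
}

// Pack three 8-bit fields into one uint24 (high, mid, low).
// 2^24 - 1 = 16777215 is still exactly representable in a float32.
function uint8_8_8ToUint24([a, b, c]) {
  return a * 65536 + b * 256 + c;
}

function uint24ToUint8_8_8(u) {
  const a = Math.floor(u / 65536);
  const b = Math.floor((u - a * 65536) / 256);
  return [a, b, u % 256];
}

// Back to 0..1 samples.
function uint8_8_8ToSample([a, b, c]) {
  return [a / 255, b / 255, c / 255];
}
```

This is what makes a single float channel hold Y, chroma, and gloss at once in the 128bpp format above.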
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
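Similarly, `uint10_14_to_uint24` isn't shown. A hypothetical JavaScript model, assuming the 10-bit velocity field occupies the high bits:

```javascript
// Pack a 10-bit field (0..1023) and a 14-bit field (0..16383) into one uint24.
// 2^10 * 2^14 = 2^24, so the result stays within float32's exact-integer range.
function uint10_14ToUint24([hi10, lo14]) {
  return hi10 * 16384 + lo14;
}

function uint24ToUint10_14(u) {
  return [Math.floor(u / 16384), u % 16384];
}
```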
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency against the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct the missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
  // Color is stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
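The alternation idea can be sketched with a simple parity function. The deck's `getCheckerboard` works off UV and resolution in the shader; this illustrative JavaScript version takes integer pixel coordinates plus an assumed frame index, which is the piece that flips the pattern each frame.

```javascript
// Parity decides which chroma basis (Co or Cg) a pixel stores this frame.
// Offsetting by the frame index flips the checkerboard every frame, so a
// temporal pass sees both chroma bases at every pixel across two frames.
function getCheckerboardParity(px, py, frameIndex) {
  return (px + py + frameIndex) & 1; // 0 -> store Co, 1 -> store Cg
}
```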
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
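The weighting behavior is easy to sanity-check outside the shader. A direct JavaScript port of reconstructChromaHDR (an illustrative test harness, with samples passed as [luminance, chroma] pairs):

```javascript
// Port of reconstructChromaHDR: estimate the missing chroma component at the
// center pixel from its cross neighborhood, trusting neighbors whose
// luminance matches the center's.
function reconstructChromaHDR(center, samples) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of samples) {
    // Down-weight neighbors whose luminance differs from the center pixel.
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where a sample is black (e.g. at infinity).
    w *= luma >= 1e-5 ? 1.0 : 0.0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

With a flat-luminance neighborhood it returns the plain average of the neighbors' chroma; with an all-black neighborhood it falls back to zero rather than dividing by zero.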
Thanks for listening!
Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer PackingCompression
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
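Because YCoCg is a linear transform of RGB, and white (1, 1, 1) has zero chroma, the YC form of Schlick's approximation is algebraically exact rather than a further approximation. A small Python sketch (helper names are ours, not from the talk) checks this numerically:

```python
def rgb_to_ycocg(r, g, b):
    # Forward YCoCg transform (linear, no offset)
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y  (luminance)
            0.5 * r - 0.5 * b,               # Co (orange chroma)
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg (green chroma)

def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick Fresnel: (1 - F0) * (1 - vDotH)^5 + F0
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_y, f0_chroma):
    # Luminance keeps the RGB form; chroma decays toward 0 (white) at grazing
    p = (1.0 - v_dot_h) ** 5
    return ((1.0 - f0_y) * p + f0_y, f0_chroma * -p + f0_chroma)

# Evaluate Fresnel in RGB then transform, and directly in YC: same result
f0 = (1.0, 0.71, 0.29)  # gold-ish F0, for illustration
y0, co0, cg0 = rgb_to_ycocg(*f0)
y_ref, co_ref, cg_ref = rgb_to_ycocg(*fresnel_schlick_rgb(0.3, f0))
y_yc, co_yc = fresnel_schlick_yc(0.3, y0, co0)
_, cg_yc = fresnel_schlick_yc(0.3, y0, cg0)
assert abs(y_ref - y_yc) < 1e-12
assert abs(co_ref - co_yc) < 1e-12 and abs(cg_ref - cg_yc) < 1e-12
```

The same identity is why the chroma term needs no `(1.0 - x)` factor: the white grazing response contributes luminance only.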
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass? Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
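For reference, the same reconstruction logic ported to Python (illustrative only; the GLSL above is the shipped form):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    """center: (luma, chroma) of the pixel being reconstructed.
    neighbors: four (luma, chroma) cross-neighborhood samples.
    Returns (known chroma, reconstructed chroma)."""
    total_weight = 0.0
    weighted_chroma = 0.0
    for luma, chroma in neighbors:
        # Weight each neighbor by luminance similarity to the center sample
        weight = 2.0 ** (-sensitivity * abs(luma - center[0]))
        # Guard the case where a sample is black (e.g. at infinity)
        if luma < 1e-5:
            weight = 0.0
        total_weight += weight
        weighted_chroma += weight * chroma
    # Guard the case where all weights are 0
    if total_weight > 1e-5:
        return (center[1], weighted_chroma / total_weight)
    return (0.0, 0.0)
```

In a flat region all four weights are equal and the reconstruction is exact; across a strong luminance edge the dissimilar neighbors are suppressed, which is what keeps chroma from bleeding.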
Thanks for listening!
Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Compression
- What surface properties can we compress to make packing easier?
- Surface Properties:
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
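A minimal Python sketch of the octahedral mapping described above (following [Cigolle 14]; function names are ours):

```python
def oct_encode(n):
    # Project the unit normal onto the octahedron |x|+|y|+|z| = 1, fold the
    # lower hemisphere over, then remap from -1..1 to the full 0..1 domain
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        x, y = ((1.0 - abs(y)) * (1.0 if x >= 0.0 else -1.0),
                (1.0 - abs(x)) * (1.0 if y >= 0.0 else -1.0))
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def oct_decode(e):
    # Inverse: unfold the lower hemisphere and renormalize
    x, y = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        x, y = ((1.0 - abs(y)) * (1.0 if x >= 0.0 else -1.0),
                (1.0 - abs(x)) * (1.0 if y >= 0.0 else -1.0))
    l = (x * x + y * y + z * z) ** 0.5
    return (x / l, y / l, z / l)
```

Quantizing each encoded component to 14 bits, as in the format below, then decoding reconstructs the normal to well under a tenth of a degree in the typical case.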
Emission
- Don't pack emission. Forward render it.
- Avoids another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
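The YCoCg basis change is cheap and exactly invertible. A Python sketch of the transform pair and the checkerboard selection (our naming, for illustration):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y  (luminance)
            0.5 * r - 0.5 * b,               # Co (orange chroma)
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg (green chroma)

def ycocg_to_rgb(ycocg):
    # Exact inverse: only adds and subtracts
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)

def checkerboard_chroma(co, cg, px, py):
    # Store only one chroma component per pixel, alternating in a
    # checkerboard so each 2x2 quad carries both Co and Cg
    return co if (px + py) % 2 == 0 else cg
```

Each pixel keeps full-rate luminance plus one chroma component; the other is reconstructed from neighbors, which is the subsampling the decode path above has to undo.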
G-Buffer Packing: Format
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128bpp
- Sign bits of R, G and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit
- RGB Float: 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit
- RGBA Half-float: 64bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
- RGB Half-float: 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit
- RGBA Float: 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
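The uint8_8_8_to_uint24 step works because a 24-bit integer fits exactly in a float32 mantissa, so the packed value survives a float G-Buffer channel bit-for-bit. A Python sketch of the idea (helper names mirror the slide's, Python-ified):

```python
import struct

def sample_to_uint8(x):
    # Quantize a 0..1 sample to 8 bits
    return int(round(min(max(x, 0.0), 1.0) * 255.0))

def uint8_8_8_to_uint24(a, b, c):
    # Three 8-bit fields packed into one 24-bit integer
    return (a << 16) | (b << 8) | c

def uint24_to_uint8_8_8(v):
    return ((v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF)

def survives_float32(v):
    # True if the value round-trips through a 32-bit float unchanged
    return struct.unpack('<f', struct.pack('<f', float(v)))[0] == float(v)
```

The last helper is the whole justification: every integer up to 2^24 is exactly representable in float32, so the three quantized fields come back out of the render target untouched.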
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
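The sign-bit trick can be sketched outside GLSL too (hypothetical helper names; metallic is a bool here rather than the slide's ±1 float):

```python
def pack_depth_metallic(depth, metallic):
    # Store metallic in the sign: positive depth = metallic, negative = dielectric.
    # Depth 0 is reserved to mean "sampling infinity" in the decode early-out.
    assert depth > 0.0
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    # abs() recovers depth; the sign recovers the flag
    return abs(w), w > 0.0
```

This is why the decode side can read depth with a single abs() and metallic with a single sign(), making depth the cheapest field in the whole format.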
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float render target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer: RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
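Putting the encode side (earlier) and this decode side of the velocity path together, a Python sketch (the SUB_PIXEL_PRECISION_STEPS value is an assumption; the talk does not state it):

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed quarter-pixel precision

def quantize_velocity(v_ndc, resolution):
    # NDC-space velocity: -1..1 spans the screen, so * 0.5 * resolution = pixels.
    # Output is biased sub-pixel steps in 0..1023; the -512/511 ends mean "infinity".
    out = []
    for v, res in zip(v_ndc, resolution):
        x = v * res * SUB_PIXEL_PRECISION_STEPS * 0.5
        out.append(math.floor(min(max(x, -512.0), 511.0)) + 512.0)
    return out

def dequantize_velocity(q, inverse_resolution):
    v = [c - 512.0 for c in q]
    if max(abs(v[0]), abs(v[1])) > 510.0:
        # Out of representable range: push outside screen space (sqrt(2) + 1e-3)
        # so later passes cull it
        return [1.41521356, 1.41521356]
    # Sub-pixel steps -> pixels -> UV-space velocity
    return [c / SUB_PIXEL_PRECISION_STEPS * inv for c, inv in zip(v, inverse_resolution)]
```

Note the asymmetry is intentional: the encoder consumes NDC-space velocity while the decoder emits UV-space velocity, which is what a screen-space reprojection fetch wants.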
Decode G-Buffer: RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Compression
- What surface properties can we compress to make packing easier
- Surface Properties
- Normal
- Emission
- Color
- Gloss
- Metallic
- Depth
- Velocity
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode decode
Emission
- Donrsquot pack emission Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to light accumulation buffer
Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures elevator switches clocks computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to perceptual basis YUV YCrCb YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity, and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
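octohedronEncode maps the unit normal onto an octahedron and unfolds it into a 2D square, which is what makes two 14-bit fields sufficient. A hedged Python sketch of the standard encode/decode pair from [Cigolle 14] (illustrative, not Floored's exact code):

```python
import math

def sign_not_zero(v):
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n):
    # Project the unit normal onto the octahedron |x| + |y| + |z| = 1.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals.
        x, y = ((1.0 - abs(y)) * sign_not_zero(x),
                (1.0 - abs(x)) * sign_not_zero(y))
    # Remap from [-1, 1] to [0, 1] to use the full unsigned domain.
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def oct_decode(e):
    x, y = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        x, y = ((1.0 - abs(y)) * sign_not_zero(x),
                (1.0 - abs(x)) * sign_not_zero(y))
    l = math.sqrt(x * x + y * y + z * z)
    return (x / l, y / l, z / l)

n = (0.6, 0.0, 0.8)
assert all(abs(a - b) < 1e-6 for a, b in zip(oct_decode(oct_encode(n)), n))
```

Quantizing the two encoded components to 14 bits each is then a straight scale-and-floor, matching normalizedFloat_to_uint14 above.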
Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
  sampling shaders, such as AO
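The depth/metallic trick costs one multiply on encode and a sign test on decode, and depth keeps its full magnitude precision. A Python sketch (metallic represented as ±1.0, matching the sign() extraction described above):

```python
def pack_depth_metallic(depth, metallic):
    # metallic is +1.0 (metal) or -1.0 (dielectric): the flag rides in
    # the float's sign bit, leaving the magnitude free for depth.
    return depth * metallic

def unpack_depth_metallic(w):
    # abs() recovers depth; the sign recovers the flag. w == 0.0 is
    # reserved for "sampling infinity" in the decode path.
    return abs(w), (1.0 if w > 0.0 else -1.0)

assert unpack_depth_metallic(pack_depth_metallic(12.5, -1.0)) == (12.5, -1.0)
assert unpack_depth_metallic(pack_depth_metallic(12.5, 1.0)) == (12.5, 1.0)
```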
Packing Challenges
- Must balance packing efficiency with cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer / RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer / RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer / RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer / RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes. sqrt(2) + 1e-3:
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
Decode G-Buffer / RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer / RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
  our G-Buffer in RGB space
Decode G-Buffer / RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer / RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer / RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer / RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
  component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels, due to their constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar
  arithmetic: we save an ADD in the 2nd component. Not to mention we are now
  operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
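The YC form is exact, not a further approximation: Schlick's formula is affine in F0 and RGB-to-YCoCg is linear, so transforming F0 and evaluating in YC space matches transforming the RGB result. A Python check (standard YCoCg weights assumed; the f0 value is illustrative):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y
            0.5 * r - 0.5 * b,               # Co
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg

def fresnel_schlick_rgb(v_dot_h, f0):
    power = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * power + c for c in f0)

def fresnel_schlick_yc(v_dot_h, y0, c0):
    power = (1.0 - v_dot_h) ** 5.0
    # Luminance behaves like an RGB channel; chroma decays toward zero
    # as Fresnel saturates to white at grazing angles.
    return ((1.0 - y0) * power + y0, c0 * -power + c0)

f0 = (0.95, 0.64, 0.54)  # copper-ish F0, illustrative only
y0, co0, cg0 = rgb_to_ycocg(f0)
ref = rgb_to_ycocg(fresnel_schlick_rgb(0.3, f0))
y, co = fresnel_schlick_yc(0.3, y0, co0)
assert abs(y - ref[0]) < 1e-12 and abs(co - ref[1]) < 1e-12
```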
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
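A direct Python transcription of reconstructChromaHDR, to make the weighting concrete: a neighbor only contributes its chroma when its luminance is close to the center's (the SENSITIVITY constant and the guards follow the GLSL above):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    # Each argument is a (luminance, chroma) pair.
    neighbors = (a1, a2, a3, a4)
    SENSITIVITY = 25.0
    weights = [2.0 ** (-SENSITIVITY * abs(l - center[0])) for l, _ in neighbors]
    # Guard the case where a sample is black (e.g. zeroed-out infinity).
    weights = [w if l >= 1e-5 else 0.0 for w, (l, _) in zip(weights, neighbors)]
    total = sum(weights)
    if total <= 1e-5:
        # Guard the case where all weights are 0.
        return (0.0, 0.0)
    other = sum(c * w for (_, c), w in zip(neighbors, weights)) / total
    # The stored chroma passes through; the missing one is the weighted average.
    return (center[1], other)

# Similar-luminance neighbors dominate; the bright outlier and the black
# sample are effectively ignored.
yc = reconstruct_chroma_hdr((1.0, 0.2), (1.0, 0.4), (10.0, -0.9), (1.0, 0.4), (0.0, 0.5))
assert yc[0] == 0.2 and abs(yc[1] - 0.4) < 1e-6
```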
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Normal Compression
- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D Basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
Emission
- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
  Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
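For reference, the YCoCg transform pair the scheme relies on: a luminance-weighted Y plus two difference chromas in [-0.5, 0.5], which is what the 0.5 CHROMA_BIAS in the packing code re-centers into [0, 1]. A Python sketch using the common normalized weights:

```python
def rgb_to_ycocg(r, g, b):
    # Y in [0, 1]; Co and Cg in [-0.5, 0.5] for RGB in [0, 1].
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the linear transform above.
    return y + co - cg, y + cg, y - co - cg

rgb = (0.25, 0.5, 0.75)
back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, back))
```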
G-Buffer PackingFormat
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
Emission
- Don't pack emission. Forward render.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer
  - Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general
Color Compression
- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
  - Color data encoded in checkerboarded YCoCg space [Mavridis 12]
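The transform pair and the interlace are easy to sketch outside the shader. A minimal JavaScript sketch (function names mirror the slides; the pixel-coordinate checkerboard test is illustrative — the shaders derive it from uv and resolution):

```javascript
// RGB <-> YCoCg: a linear transform, so it commutes with linear lighting math
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y:  luminance
    0.5 * r - 0.5 * b,              // Co: orange/blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b  // Cg: green/purple chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

// Keep Y everywhere; store Co on "even" pixels and Cg on "odd" ones [Mavridis 12]
function checkerboardInterlace([co, cg], px, py) {
  return ((px + py) & 1) === 0 ? co : cg;
}
```

White maps to Y = 1 with zero chroma, which is why chroma errors hide well in bright, desaturated interiors.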
G-Buffer Packing: Format

G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
  - i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half Float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
- RGB Half Float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128bpp
- Let's take a look at the packing code for this format
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
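The pack / unpack helpers used above (uint8_8_8_to_uint24 and friends) reduce to integer arithmetic that stays within the 24 bits a float32 mantissa represents exactly. A JavaScript sketch of that idea — not Floored's actual helpers, just the arithmetic they imply:

```javascript
// Three 8-bit ints packed into one 24-bit integer. Values below 2^24 are
// exactly representable in a float32 mantissa, so the packed number
// survives a round trip through an RGBA float channel.
function uint8_8_8_to_uint24([a, b, c]) {
  return a * 65536 + b * 256 + c; // a << 16 | b << 8 | c
}
function uint24_to_uint8_8_8(v) {
  return [Math.floor(v / 65536), Math.floor(v / 256) % 256, v % 256];
}

// The 10 / 14 split used for velocity + normal works the same way
function uint10_14_to_uint24([hi, lo]) {
  return hi * 16384 + lo; // hi << 14 | lo
}
function uint24_to_uint10_14(v) {
  return [Math.floor(v / 16384), v % 16384];
}
```

GLSL ES 1.00 has no integer bit operations, which is why the shader versions are written as multiplies, floors, and mods like this.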
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light

Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float render target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
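octohedronEncode / octohedronDecode follow the octahedral unit-vector mapping surveyed in [Cigolle 14]. A JavaScript sketch of the standard construction (my own port, assuming the usual fold-over variant — not Floored's exact code):

```javascript
// Octahedral unit-vector encoding, after [Cigolle 14].
// signNotZero avoids Math.sign(0) === 0 breaking axis-aligned directions.
const signNotZero = (v) => (v >= 0 ? 1 : -1);

// Unit vector -> [0,1]^2
function octahedronEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z); // project onto the octahedron
  let u = x / l1;
  let v = y / l1;
  if (z < 0) { // fold the lower hemisphere over the diagonals
    const [ou, ov] = [u, v];
    u = (1 - Math.abs(ov)) * signNotZero(ou);
    v = (1 - Math.abs(ou)) * signNotZero(ov);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // -1..1 -> 0..1
}

// [0,1]^2 -> unit vector
function octahedronDecode([eu, ev]) {
  let u = eu * 2 - 1;
  let v = ev * 2 - 1;
  const z = 1 - Math.abs(u) - Math.abs(v);
  if (z < 0) { // undo the fold
    const [ou, ov] = [u, v];
    u = (1 - Math.abs(ov)) * signNotZero(ou);
    v = (1 - Math.abs(ou)) * signNotZero(ov);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```

The appeal for a packed G-Buffer is that the two encoded components quantize uniformly, which is exactly what normalizedFloat_to_uint14 above relies on.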
Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
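The quantization scheme can be sketched in isolation. This JavaScript version works in pixel units (omitting the resolution factors the shader carries) and assumes a hypothetical step count of 4; it shows the bias into the 10-bit field and the out-of-range sentinel:

```javascript
// Assumed sub-pixel precision: 4 steps per pixel. The real constant is
// whatever SUB_PIXEL_PRECISION_STEPS is in the shaders.
const SUB_PIXEL_PRECISION_STEPS = 4;

// Pixel velocity -> 10-bit field, biased into 0..1023.
// The clamp reserves the extremes as out-of-range sentinels.
function quantizeVelocity(pixels) {
  const q = Math.floor(Math.min(Math.max(pixels * SUB_PIXEL_PRECISION_STEPS, -512), 511));
  return q + 512;
}

// 10-bit field -> pixel velocity, or Infinity when a sentinel was stored
function dequantizeVelocity(q) {
  const v = q - 512;
  if (Math.abs(v) > 510) return Infinity;
  return v / SUB_PIXEL_PRECISION_STEPS;
}
```

At 4 steps per pixel the representable range is roughly ±128 pixels, which bounds how fast motion can be before it is culled as "infinite".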
Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass

Light Pre-pass
- Many resources:
  - [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04
  - Keep Fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice
YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts

Results
- All results are rendered:
  - Direct Light Only
  - No Anti-Aliasing
  - No Temporal Techniques
  - G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look

Enhance: four detail crops, each comparing RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame
Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation the same
  - Chroma calculation inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
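Because YCoCg is a linear transform and Schlick's formula is affine in the reflection coefficient, the YC evaluation is exactly the RGB evaluation seen through the transform, not an approximation. A small JavaScript check (the luma / co helpers are the first two rows of the YCoCg matrix, included here for illustration):

```javascript
// Schlick's approximation per RGB channel
function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map((c) => (1 - c) * p + c);
}

// Same evaluation on a (luminance, chroma) pair: luminance blends toward
// white (1.0) while chroma decays toward zero, since white has no chroma.
function fresnelSchlickYC(vDotH, [y, c]) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - y) * p + y, c * -p + c];
}

// Luma and Co rows of the RGB -> YCoCg transform
const luma = ([r, g, b]) => 0.25 * r + 0.5 * g + 0.25 * b;
const co = ([r, g, b]) => 0.5 * r - 0.5 * b;
```

The same argument covers the Cg component, which is why lighting a single interlaced chroma channel is sound.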
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth / stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where the sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
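For reference, a direct JavaScript port of the reconstruction above (my port, not Floored's code) makes the weighting behavior easy to poke at: equal-luminance neighbors average their chroma, and black samples drop out entirely:

```javascript
// Each sample is [luminance, chroma]. Returns [storedChroma, reconstructedChroma].
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0;
  let weightedChroma = 0;
  for (const [lum, chroma] of [a1, a2, a3, a4]) {
    // Weight neighbors by luminance similarity to the center sample
    let w = Math.pow(2, -SENSITIVITY * Math.abs(lum - center[0]));
    if (lum < 1e-5) w = 0; // guard the case where the sample is black
    totalWeight += w;
    weightedChroma += chroma * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], weightedChroma / totalWeight] : [0, 0];
}
```

The exp2 falloff means a neighbor one luminance unit away contributes 2^-25 of a matched neighbor's weight, so edges in HDR radiance barely bleed chroma.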
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
  - Our very own talent scout: josh@floored.com

Thanks, Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
  http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
  Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
  http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
  https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
  http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
  http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
  http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
  http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
  http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
  http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
  https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
  http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
  http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
  http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
  http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
  http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
  http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
  http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
  http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
  http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
  https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
  http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
  https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
  http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
  http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
  http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
  https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
  http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
G-Buffer Packing
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit
- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
WebGL, so it could be an RGBA Float texture under the hood
G-Buffer Format
R: ColorY 7 bits | ColorC 5 bits (sign bit)
G: NormalX 9 bits (sign bit) | Gloss 3 bits
B: NormalY 9 bits (sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit
- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here
G-Buffer Format
R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (sign bit) | Gloss 3 bits
B: NormalY 9 bits (sign bit) | Gloss 3 bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit
- RGBA Float, 128 bpp
- Let's take a look at packing code for this format
Packing: Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
Packing: Normal and Velocity
  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
Packing: Depth and Metallic
  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
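The packing tricks above are easy to sanity check offline. Below is a hypothetical Python sketch, not the production GLSL; the helper names simply mirror the slides':

```python
def sample_to_uint8_8_8(v):
    # Quantize three values in [0.0, 1.0] to 8 bits each.
    return [int(round(max(0.0, min(1.0, c)) * 255.0)) for c in v]

def uint8_8_8_to_uint24(v):
    # A 24-bit integer fits exactly in a float32 mantissa, so the packed
    # value survives storage in one component of a float render target.
    return v[0] * 65536 + v[1] * 256 + v[2]

def uint24_to_uint8_8_8(x):
    return [x // 65536, (x // 256) % 256, x % 256]

def pack_depth_metallic(depth, metallic):
    # If not metallic, negate depth; decode recovers the flag from the sign.
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    return abs(w), w > 0.0
```

The round trip is exact because no step loses bits: three 8-bit fields occupy 24 of the 24 available mantissa bits.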
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer: RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer: RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
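The octahedron encode / decode used here follows [Cigolle 14]. A hedged Python sketch of the round trip, for illustration only (the production GLSL differs in syntax but not in math):

```python
import math

def octahedron_encode(n):
    # Project a unit vector onto the octahedron |x|+|y|+|z| = 1, fold the
    # lower hemisphere over the diagonals, then remap to [0, 1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y = x / s, y / s
    if z < 0.0:
        x, y = ((1.0 - abs(y)) * math.copysign(1.0, x),
                (1.0 - abs(x)) * math.copysign(1.0, y))
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def octahedron_decode(e):
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = ((1.0 - abs(v)) * math.copysign(1.0, u),
                (1.0 - abs(u)) * math.copysign(1.0, v))
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)
```

Before the 14-bit quantization step, the round trip is exact to floating point precision; quantization then costs a bounded angular error across the whole sphere, which is why it beats storing raw xyz.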
Decode G-Buffer: RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
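The velocity quantization round trip can be sketched in Python. This is illustrative only: the slides never state SUB_PIXEL_PRECISION_STEPS, so 4 steps per pixel is an assumed value, and the decoded result is expressed in UV units (0..1 across the screen, i.e. half the NDC value):

```python
import math

# Assumed constant for illustration; the real value is not given in the slides.
SUB_PIXEL_PRECISION_STEPS = 4.0

def encode_velocity(v_ndc, resolution):
    # Map -1..1 NDC velocity to quantized pixel velocity; -512 and 511
    # are both reserved to mean "out of representable range".
    q = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    q = math.floor(min(max(q, -512.0), 511.0))
    return q + 512.0

def decode_velocity(stored, inverse_resolution):
    q = stored - 512.0
    if abs(q) > 510.0:
        return None  # sentinel: caller culls this pixel in later passes
    # Result is in UV units, with sub-pixel quantization error.
    return q * inverse_resolution / SUB_PIXEL_PRECISION_STEPS
```

Small velocities survive the trip to within one quantization step; large ones hit the clamp and decode to the sentinel.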
Decode G-Buffer: RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer: RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
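The interlace / swizzle logic above can be sketched in Python. The parity convention (which checkerboard cells store Co and which store Cg) is an assumption here, since the slides don't specify it:

```python
def get_checkerboard(px, py):
    # Assumed convention: 1.0 on even-parity cells (store Co), 0.0 on
    # odd-parity cells (store Cg).
    return 1.0 if (px + py) % 2 == 0 else 0.0

def checkerboard_interlace(co, cg, px, py):
    # Each pixel keeps full luminance but only one chroma component.
    return co if get_checkerboard(px, py) > 0.0 else cg

def assemble_chroma(stored, reconstructed, px, py):
    # Swizzle so the output is always ordered (Co, Cg), regardless of
    # which component this pixel physically stored.
    if get_checkerboard(px, py) > 0.0:
        return (stored, reconstructed)
    return (reconstructed, stored)
```

This is exactly the `offsetDirection > 0.0 ? .yz : .zy` swizzle from the decode path, spelled out.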
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance (four detail-crop slides): RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
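The RGB-to-YCoCg transform underlying all of this is a small linear map; a Python sketch for reference (illustrative, matching the standard YCoCg definition rather than any specific Floored code):

```python
def rgb_to_ycocg(rgb):
    # Linear transform; white maps to (1, 0, 0), so greys carry zero chroma.
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,    # Y  (luminance)
            0.5 * r - 0.5 * b,                # Co (orange vs blue)
            -0.25 * r + 0.5 * g - 0.25 * b)   # Cg (green vs purple)

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)
```

Linearity is the property the whole scheme leans on: lighting sums and scales radiance, and a linear color transform commutes with sums and scales.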
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
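The YC form is not just cheaper, it is exact: YCoCg is linear and white has zero chroma, so the achromatic `(1 - R0) * power` term contributes only to luminance, while chroma simply decays toward zero at grazing angles. A Python check of that claim (illustrative, not production code):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, r0):
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in r0)

def fresnel_schlick_yc(v_dot_h, r0_yc):
    # Luminance: usual Schlick lerp toward white.
    # Chroma: white contributes nothing, so chroma just scales by (1 - p).
    p = (1.0 - v_dot_h) ** 5.0
    y, c = r0_yc
    return ((1.0 - y) * p + y, c * -p + c)
```

Evaluating Schlick in RGB and then converting to YCoCg gives the same Y and Co as evaluating directly in YC space, to floating point precision.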
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where the sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
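The same reconstruction, ported to Python for illustration (the SENSITIVITY value is read from the slide as 25.0 and may differ in production):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (Y, C) for this pixel; neighbors: four (Y, C) cross samples.
    # Neighbors whose luminance resembles ours get exponentially more say
    # in the borrowed chroma; black samples are excluded entirely.
    weights = [2.0 ** (-sensitivity * abs(y - center[0]))
               * (1.0 if y >= 1e-5 else 0.0)
               for y, _ in neighbors]
    total = sum(weights)
    if total <= 1e-5:
        return (0.0, 0.0)  # all neighbors black: no chroma to borrow
    borrowed = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], borrowed)
```

With neighbors of identical luminance the result degenerates to a plain average of their chroma, which is the behavior you want in flat regions; sharp luminance edges suppress chroma bleeding across them.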
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks: Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01 Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02 Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Sign Bits of R G and B are available for use as flags
- ie Material Type
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100%
YC Lighting 100% / YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% / YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% / YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% / YC Lighting 25%
RGB Lighting 25%
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminance Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero as vDotH approaches perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
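A quick numeric sanity check, written here as a Python sketch with hypothetical helper names: because YCoCg is a linear transform whose luma weights sum to 1 and whose chroma weights sum to 0, evaluating Schlick in RGB and then converting to YCoCg gives exactly the same answer as the YC form above.

```python
def rgb_to_ycocg(rgb):
    # Standard YCoCg transform of an RGB triple.
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: (1 - F0) * p + F0
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # Luminance: same form as RGB. Chroma: inverted, decays to 0 as p -> 1.
    p = (1.0 - v_dot_h) ** 5
    y, c = f0_yc
    return ((1.0 - y) * p + y, c * -p + c)
```

The chroma weights cancelling the `+ p` term is what makes the inverted `-power` form drop out for free.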
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
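As a rough check on the substitution (a Python sketch, not the deck's code): the exp2 form from [Lagarde 12] tracks pow(1 - vDotH, 5) closely across the whole valid range.

```python
def schlick_power_pow(v_dot_h):
    # Reference Schlick power term.
    return (1.0 - v_dot_h) ** 5

def schlick_power_sg(v_dot_h):
    # Spherical gaussian approximation: replaces pow with a single exp2.
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)
```

The appeal on GPUs is that exp2 is typically a single instruction, whereas pow expands to a log2/multiply/exp2 sequence.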
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
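A sketch of what such a bilateral weight might look like, in Python. The sensitivity constants and the exact combination of similarity terms here are assumptions for illustration, not the deck's implementation:

```python
def bilateral_weight(luma_delta, depth_delta, normal_dot,
                     luma_sensitivity=25.0, depth_sensitivity=100.0):
    # Each similarity term falls off as the center and neighbor diverge.
    w_luma = 2.0 ** (-luma_sensitivity * abs(luma_delta))
    w_depth = 2.0 ** (-depth_sensitivity * abs(depth_delta))
    w_normal = max(0.0, normal_dot) ** 8  # penalize disagreeing normals
    return w_luma * w_depth * w_normal
```

Multiplying the terms means any one strong dissimilarity (a depth edge, a hard normal crease, a luminance boundary) is enough to reject a neighbor's chroma.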
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
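The same reconstruction logic ported to Python as a reference sketch (behavior mirrors the GLSL above, including both guards; helper and parameter names are mine):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center and each of the 4 neighbors are (luminance, chroma) pairs.
    weights = []
    for luma, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        if luma < 1e-5:  # guard: ignore black samples
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:  # guard: all weights zero
        return (0.0, 0.0)
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], recon)
```

Neighbors whose luminance matches the center dominate the weighted average, so chroma does not bleed across strong luminance edges.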
Thanks for listening
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG NormalX 12 Bits NormalY 12 Bits
- RGB Float 96bpp
- Throw out velocity discretize normals a bit more
- In practice not reliable bandwidth saving RGB Float is deprecated in
webGL Could be RGBA Float texture under the hood
B Depth 31 Bits Metallic 1 Bit
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
G-Buffer Format
R ColorY 7 Bits ColorC 5 Bits (sign bit)
G NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
A Depth 15 Bits Metallic 1 Bit
- RGBA Half-float 64 bpp
- Half-float target more challenging
- Probably not practical Depth precision is the real killer here
G-Buffer Format
R ColorY 7 Bits ColorC 4 Bits Metallic 1
BitG NormalX 9 Bits (sign bit) Gloss 3 Bits
B NormalY 9 Bits (sign bit) Gloss 3 Bits
- RGB Half-float 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate Probably too discretized
- Maybe useful on mobile where mediump 16-bit float preferable
G-Buffer Format
R ColorY 8 Bits ColorC 8 Bits Gloss 8
BitsG VelocityX 10 Bits NormalX 14 Bits
B VelocityY 10 Bits NormalY 14 Bits
A Depth 31 Bits Metallic 1 Bit
- RGBA Float 128bpp
- Letrsquos take a look at packing code for this format
Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)
vec4 res
Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range
vec3 colorYcocg = rgbToYcocg(componentscolor)
vec2 colorYc
colorYcx = colorYcocgx
colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)
const float CHROMA_BIAS = 05 2560 2550
colorYcy += CHROMA_BIAS
resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))
Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)
vec2 normalOctohedronQuantized
normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)
normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)
takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity
-512 and 511 both represent infinity
vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05
velocityQuantized = floor(clamp(velocityQuantized -5120 5110))
velocityQuantized += 5120
resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))
resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))
Packing Depth and Metallic
Pack depth and metallic together
If not metallic negate depth Extract bool as sign()
resw = componentsdepth componentsmetallic
return res
- Phew wersquore done
- Depth is the cheapest to encode decode
- Can write fast depth decode function for ray marching screen space
sampling shaders such as AO
Packing Challenges
- Must balance packing efficiency with cost of encoding decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar float
- Color vec3
- Decay Exponent float
- Gobo sampler2D
- HotspotLengthScreenSpace float
- Luminous Intensity float
- Position vec3
- TextureAssignedGobo float
- ViewProjectionMatrix mat4
- ViewMatrix mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
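The YcocgToRgb helper above isn't shown in the slides; as a rough CPU-side sketch (Python rather than the deck's GLSL, using the YCoCg transform from [Mavridis 12]; function names are illustrative, not the production helpers):

```python
# Sketch of the YCoCg <-> RGB transform assumed by the decode above.
def rgb_to_ycocg(r, g, b):
    y = 0.25 * r + 0.5 * g + 0.25 * b    # luminance
    co = 0.5 * r - 0.5 * b               # orange-cyan chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b  # green-magenta chroma
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # exact inverse of the transform above
    return y + co - cg, y + cg, y - co - cg
```

Y stays in 0..1 while Co/Cg are signed, which is why the G-Buffer biases chroma by CHROMA_BIAS before storage.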
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for our microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100%
YC Lighting 100%, YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100%, YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100%, YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100%, YC Lighting 25%
RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
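The per-frame alternation can be sketched as flipping the parity a getCheckerboard-style function uses (hypothetical Python; the deck does not show its implementation):

```python
def get_checkerboard(x, y, frame=0):
    # +1.0 where this pixel stores Co, -1.0 where it stores Cg.
    # Adding the frame index flips the pattern every frame, so a temporal
    # pass sees both chroma components at each pixel over two frames.
    return 1.0 if (x + y + frame) % 2 == 0 else -1.0
```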
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
float power = pow(1.0 - vDotH, 5.0);
return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
float power = pow(1.0 - vDotH, 5.0);
return vec2(
(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
);
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
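Why per-component Schlick is valid in YCoCg at all: Y is an affine combination of R, G, B whose weights sum to 1, while the chroma weights sum to 0, so converting the RGB Fresnel result to YCoCg reproduces exactly the YC forms above. A quick numeric check (illustrative Python, not part of the deck):

```python
def schlick(f0, v_dot_h):
    # per-channel RGB Schlick
    p = (1.0 - v_dot_h) ** 5.0
    return (1.0 - f0) * p + f0

def schlick_yc(f0_yc, v_dot_h):
    # luminance/chroma Schlick as in fresnelSchlickYC
    p = (1.0 - v_dot_h) ** 5.0
    y0, c0 = f0_yc
    return (1.0 - y0) * p + y0, c0 * -p + c0

def rgb_to_yco(r, g, b):
    # luminance and Co rows of the YCoCg transform
    return 0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b
```

Evaluating a gold-ish F0 both ways at the same vDotH gives matching Y and Co, which is the whole trick.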
YC Lighting
- Works fine with the spherical Gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
return vec2(
(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
);
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
vec4 lumaDelta = abs(luminance - vec4(center.x));
const float SENSITIVITY = 25.0;
vec4 weight = exp2(-SENSITIVITY * lumaDelta);
// Guard the case where sample is black
weight *= step(1e-5, luminance);
float totalWeight = weight.x + weight.y + weight.z + weight.w;
// Guard the case where all weights are 0
return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
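For reference, the same reconstruction logic ported to plain Python (a sketch for clarity; the shader operates on vec4 lanes rather than a list of neighbors):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center and each neighbor are (luminance, chroma) pairs from the
    # cross neighborhood; weight neighbors by luminance similarity.
    weights = []
    for y, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(y - center[0]))
        if y < 1e-5:          # guard the case where a sample is black
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:         # guard the case where all weights are 0
        return (0.0, 0.0)
    chroma = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], chroma)
```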
Thanks for listening
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
http://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
G-Buffer Format
R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable
G-Buffer Format
R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit
- RGBA Float, 128 bpp
- Let's take a look at packing code for this format
Packing Color and Gloss
vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
vec4 res;
// Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
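The sample_to_uint8_8_8 / uint8_8_8_to_uint24 pair relies on a 24-bit integer fitting exactly in a float32 mantissa. A CPU sketch of that packing (hypothetical Python mirror of the GLSL helpers, assuming simple 8-bit quantization):

```python
def sample_to_uint8_8_8(x, y, z):
    # quantize three 0..1 floats to 8 bits each
    return round(x * 255.0), round(y * 255.0), round(z * 255.0)

def uint8_8_8_to_uint24(a, b, c):
    # integers below 2^24 are exactly representable in a float32 mantissa,
    # so this value survives storage in a float render target
    return float((a << 16) | (b << 8) | c)

def uint24_to_uint8_8_8(v):
    v = int(v)
    return (v >> 16) & 255, (v >> 8) & 255, v & 255
```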
Packing Normal and Velocity
vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen space -1.0 to 1.0 velocity, and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
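The velocity path round-trips like this on the CPU (illustrative sketch; SUB_PIXEL_PRECISION_STEPS = 4 is an assumed value, the slides do not state it):

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0          # assumption: quarter-pixel steps
INVERSE_SUB_PIXEL_PRECISION_STEPS = 1.0 / SUB_PIXEL_PRECISION_STEPS

def quantize_velocity(v_ndc, resolution):
    # -1..1 screen-space velocity -> 0..1023 biased sub-pixel steps
    v = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    v = math.floor(min(max(v, -512.0), 511.0))
    return v + 512.0

def dequantize_velocity(q, inverse_resolution):
    # back to UV-space (0..1 across the screen) velocity, as in the decode pass
    return (q - 512.0) * inverse_resolution * INVERSE_SUB_PIXEL_PRECISION_STEPS
```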
Packing Depth and Metallic
// Pack depth and metallic together
// If not metallic, negate depth. Extract bool as sign()
res.w = components.depth * components.metallic;
return res;
}
- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
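The sign trick in isolation (Python sketch; here metallic is treated as a boolean that becomes the sign of the stored value):

```python
def pack_depth_metallic(depth, metallic_flag):
    # depth is strictly positive; the metallic flag becomes the sign
    return depth if metallic_flag else -depth

def unpack_depth_metallic(w):
    # abs() recovers depth, the sign recovers the flag
    return abs(w), w > 0.0
```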
Packing Challenges
- Must balance packing efficiency with cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler,
const in vec2 uv,
const in vec2 gBufferResolution,
const in vec2 inverseGBufferResolution) {
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);
// Early out if sampling infinity
if (res.depth <= 0.0) {
res.color = vec3(0.0);
return res;
}
- Decode Depth
Decode G-Buffer RGB Lighting
res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
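octohedronEncode / octohedronDecode follow the standard octahedral unit-vector mapping surveyed in [Cigolle 14]; a CPU sketch (Python, names assumed, before the 14-bit quantization step):

```python
import math

def _sign(v):
    # GLSL-style sign that treats 0 as positive, as the mapping requires
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # unit vector -> point on the octahedron, folded into [0,1]^2
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y = x / s, y / s
    if z < 0.0:  # fold the lower hemisphere over the diagonals
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return x * 0.5 + 0.5, y * 0.5 + 0.5

def octahedron_decode(u, v):
    x, y = u * 2.0 - 1.0, v * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:  # unfold the lower hemisphere
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    length = math.sqrt(x * x + y * y + z * z)
    return x / length, y / length, z / length
```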
Decode G-Buffer RGB Lighting
res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
// When velocity is out of representable range, throw it outside of screenspace for culling in future passes
// sqrt(2) + 1e-3
res.velocity = vec2(1.41521356);
} else {
res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
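The color-packing arithmetic is easy to sanity-check outside the shader. This Python sketch mirrors the GLSL above; the exact checkerboard rule inside `checkerboardInterlace` is an assumption (even pixels keep Co, odd keep Cg), not Floored's implementation:

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg forward transform: Y lands in [0, 1], Co/Cg in [-0.5, 0.5].
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def checkerboard_interlace(co, cg, px, py):
    # Assumed checker rule: even pixels store Co, odd pixels store Cg,
    # so each pixel keeps luma plus exactly one chroma channel.
    return co if (px + py) % 2 == 0 else cg

# Bias maps the signed chroma range into [0, 1] so that zero chroma
# quantizes exactly to uint8 value 128.
CHROMA_BIAS = 0.5 * 256.0 / 255.0

y, co, cg = rgb_to_ycocg(1.0, 0.0, 0.0)              # pure red
stored = checkerboard_interlace(co, cg, 10, 21) + CHROMA_BIAS  # odd pixel keeps Cg
```

Note that white maps to zero chroma, which is what makes the later "samples at infinity" guard (zeroed luma and chroma) safe.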
Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen-space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
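The velocity quantization can be round-tripped on the CPU to verify precision. A hedged Python sketch: `SUB_PIXEL_PRECISION_STEPS = 4` is an illustrative guess, and the decode scale folds in a factor of 2 to undo the encode-side 0.5, which is an assumption about what `INVERSE_SUB_PIXEL_PRECISION_STEPS` contains:

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed: quarter-pixel velocity precision

def encode_velocity(vx, vy, width, height):
    # Screen-space [-1, 1] velocity -> quantized pixel velocity biased to [0, 1023].
    # -512 and 511 (0 and 1023 after the bias) both mean "out of representable range".
    q = []
    for v, r in ((vx, width), (vy, height)):
        c = v * r * SUB_PIXEL_PRECISION_STEPS * 0.5
        q.append(math.floor(min(max(c, -512.0), 511.0)) + 512.0)
    return q

def decode_velocity(q, width, height):
    v = [c - 512.0 for c in q]
    if max(abs(v[0]), abs(v[1])) > 510.0:
        # Out of range: push outside screen space (sqrt(2) + 1e-3) for later culling.
        return [1.41521356, 1.41521356]
    # Inverse of the encode scale; the 2.0 undoes the encode-side * 0.5.
    return [c * 2.0 / (r * SUB_PIXEL_PRECISION_STEPS) for c, r in zip(v, (width, height))]
```

Round-tripping a small velocity recovers it to within one quantization step; a large velocity clamps into the out-of-range marker.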
Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen-space sampling shaders such as AO
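The depth/metallic sign trick can be mirrored on the CPU. A minimal sketch, assuming depth is strictly positive for valid surfaces (zero is reserved for "sampling infinity") and treating the metallic boolean as the sign:

```python
def pack_depth_metallic(depth, metallic_flag):
    # Store the metallic boolean in the sign bit: negate depth when not metallic.
    return depth if metallic_flag else -depth

def unpack_depth_metallic(packed):
    # abs() recovers depth; the sign recovers the boolean, as sign() does in GLSL.
    return abs(packed), packed > 0.0

d, m = unpack_depth_metallic(pack_depth_metallic(7.25, False))
```

This is why the decoder below can both early-out on `abs(w) <= 0.0` and read metallic from `sign(w)` with no extra storage.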
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light

Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB float render target
- Half float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space
    // for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass

Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process

Artifacts
Results
- All results are rendered:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance! (RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)
Enhance! (RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)
Enhance! (RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)
Enhance! (RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular
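Why the chroma term inverts: YCoCg is a linear transform, and white has zero chroma, so transforming Schlick's interpolation toward white term by term gives, with p = (1 - vDotH)^5:

```latex
F_{\mathrm{RGB}} = (1 - p)\,F_0 + p\,\mathbf{1}
\;\xrightarrow{\;T_{\mathrm{YCoCg}}\;}\;
\begin{cases}
F_Y = (1 - p)\,Y_0 + p & \text{(same shape as RGB: rises toward 1 at grazing)}\\
F_C = (1 - p)\,C_0 = -p\,C_0 + C_0 & \text{(inverted: chroma decays to 0 at grazing)}
\end{cases}
```

This is exactly the `reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y` form in the YC fresnel code.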
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
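The spherical gaussian substitution is easy to check numerically; this Python snippet compares the exp2-based form against the reference pow form over the valid vDotH range:

```python
def schlick_power(v_dot_h):
    # Reference Schlick falloff term: (1 - vDotH)^5.
    return (1.0 - v_dot_h) ** 5.0

def spherical_gaussian_power(v_dot_h):
    # [Lagarde 12]: 2^((-5.55473 x - 6.98316) x) approximates (1 - x)^5.
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

max_error = max(
    abs(schlick_power(i / 100.0) - spherical_gaussian_power(i / 100.0))
    for i in range(101)
)
```

The maximum absolute error stays well under 0.01 across the range, which is why swapping pow for exp2 (typically cheaper on GPUs) is safe here.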
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
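The same weighting scheme, ported to Python to make the two guards explicit. This is an illustration of the function above, not additional engine code; the sensitivity constant is read here as 25.0:

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, chroma) at the current pixel.
    # neighbors: four (luma, chroma) cross samples carrying the *other* chroma channel.
    luma_c = center[0]
    weights = []
    for luma, _chroma in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - luma_c))  # closer luminance -> higher weight
        if luma < 1e-5:                                  # guard: black / infinity sample
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:                                    # guard: every neighbor rejected
        return (0.0, 0.0)
    recon = sum(w * c for w, (_l, c) in zip(weights, neighbors)) / total
    return (center[1], recon)
```

In a uniform region every neighbor gets full weight and the missing chroma is recovered exactly; when all neighbors are black the function falls back to zero chroma rather than dividing by zero.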
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Packing Depth and Metallic
// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;
- Phew, we're done!
- Depth is the cheapest component to encode / decode
- Can write a fast depth decode function for ray marching / screen-space
sampling shaders such as AO
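The sign-packing trick above is easy to sanity-check off-GPU. A minimal Python sketch (function names are illustrative, not from the deck): the metallic flag rides in the sign of depth, and the decode mirrors the shader's `abs()` / `sign()` pair.

```python
def pack_depth_metallic(depth, metallic):
    # Store the metallic flag in the sign of depth:
    # positive depth = metallic, negative depth = dielectric.
    assert depth > 0.0  # depth 0.0 is reserved for "sampling infinity"
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    # Mirrors the shader decode: depth = abs(w), metallic from the sign.
    return abs(w), w > 0.0
```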
Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting to an RGB Float Render Target
- Half Float where supported
Light Uniforms- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }
- Decode Depth
Decode G-Buffer RGB Lighting
  res.metallic = sign(encodedGBuffer.w);
- Decode Metallic
Decode G-Buffer RGB Lighting
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);
- Decode Normal
Decode G-Buffer RGB Lighting
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }
- Decode Velocity
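A small Python model of the velocity path above may help: the decode shown implies each axis is stored biased by 512 with a sentinel outside ±510. The encode function and the `SUB_PIXEL_PRECISION_STEPS` value here are assumptions for illustration, not taken from the deck.

```python
SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed quantization: 4 subpixel steps per pixel

def encode_velocity_axis(velocity_pixels):
    # Assumed inverse of the decode: quantize to subpixel steps, bias by 512.
    return velocity_pixels * SUB_PIXEL_PRECISION_STEPS + 512.0

def decode_velocity_axis(stored, inverse_resolution):
    v = stored - 512.0
    if abs(v) > 510.0:
        # Out of representable range: push outside screen space (sqrt(2) + 1e-3).
        return 1.41521356
    # Result is velocity in UV units (pixels * inverse_resolution).
    return v * inverse_resolution * (1.0 / SUB_PIXEL_PRECISION_STEPS)
```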
Decode G-Buffer RGB Lighting
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;
- Decode Gloss
Decode G-Buffer RGB Lighting
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
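For reference, the standard RGB ↔ YCoCg transform the G-Buffer relies on is linear and exactly invertible; a minimal Python version (the deck's actual helpers are GLSL):

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg forward transform.
    y  =  r / 4.0 + g / 2.0 + b / 4.0
    co =  r / 2.0           - b / 2.0
    cg = -r / 4.0 + g / 2.0 - b / 4.0
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above.
    return y + co - cg, y + cg, y - co - cg
```

Note that white maps to (1, 0, 0): all chroma lives in Co/Cg, which is what makes subsampling one chroma component per pixel viable.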
Decode G-Buffer RGB Lighting
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction further down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keeping Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
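The nDotV objection above is easy to demonstrate numerically: summing per-light Fresnel at each light's vDotH is not the same as one Fresnel factor at nDotV applied to the summed radiance. A Python sketch with arbitrary example values (all numbers below are hypothetical):

```python
def fresnel_schlick(cos_theta, f0):
    # Scalar Schlick Fresnel approximation.
    return (1.0 - f0) * (1.0 - cos_theta) ** 5 + f0

f0 = 0.04                            # dielectric reflection coefficient
lights = [(0.95, 1.0), (0.30, 2.0)]  # hypothetical (vDotH, incident radiance) pairs
n_dot_v = 0.7                        # hypothetical view angle

# Correct: evaluate Fresnel per light at vDotH, inside the sum.
inside = sum(fresnel_schlick(vdh, f0) * radiance for vdh, radiance in lights)
# Light pre-pass: one Fresnel factor at nDotV, pulled outside the sum.
outside = fresnel_schlick(n_dot_v, f0) * sum(radiance for _, radiance in lights)
```

With a grazing half-vector on one light, `inside` and `outside` disagree badly, which is why the deck keeps Fresnel inside the integral despite the two-pass cost.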
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting Rendered at 100%
YC Lighting Rendered at 100%
RGB Lighting Rendered at 25%
YC Lighting Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance: RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
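A checkerboard selector of the kind `getCheckerboard()` implies can be sketched as pixel-coordinate parity; adding the frame index flips the pattern each frame so temporal filtering sees both chroma layouts. The exact GLSL helper in the deck may differ; this is an illustrative model.

```python
def checkerboard(x, y, frame=0):
    # +1.0 / -1.0 by pixel parity; offsetting by the frame index
    # alternates the pattern every frame.
    return 1.0 if (x + y + frame) % 2 == 0 else -1.0
```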
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminous Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation inverted: approaches zero at perpendicular
YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD
and an ADD from the skipped 3rd component.
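Because the RGB→YCoCg transform is linear and maps white to (1, 0, 0), the YC Schlick form above is exactly the transform of the per-channel RGB form, not an approximation. A quick Python check (helper names are illustrative; only the Y and Co components are compared, matching the vec2 the shader keeps):

```python
def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel RGB Schlick.
    p = (1.0 - v_dot_h) ** 5
    return [(1.0 - c) * p + c for c in f0]

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # Luminance/chroma form from the slide.
    p = (1.0 - v_dot_h) ** 5
    return [(1.0 - f0_yc[0]) * p + f0_yc[0],   # luminance: same shape as RGB form
            f0_yc[1] * -p + f0_yc[1]]          # chroma: inverted, zero at grazing

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return [r / 4 + g / 2 + b / 4, r / 2 - b / 2, -r / 4 + g / 2 - b / 4]

f0_rgb = [0.9, 0.6, 0.2]   # hypothetical colored reflection coefficient
v_dot_h = 0.3
expected = rgb_to_ycocg(fresnel_schlick_rgb(v_dot_h, f0_rgb))
got = fresnel_schlick_yc(v_dot_h, rgb_to_ycocg(f0_rgb)[:2])
```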
YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting- Simple luminance-based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
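A direct Python port of reconstructChromaHDR above (scalars standing in for the vec2/vec4 lanes) is handy for sanity-checking the weighting behavior off-GPU:

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    # Each argument is a (luminance, chroma) pair; mirrors the GLSL above.
    samples = [a1, a2, a3, a4]
    SENSITIVITY = 25.0
    # Weight neighbors by luminance similarity to the center sample.
    weights = [2.0 ** (-SENSITIVITY * abs(s[0] - center[0])) for s in samples]
    # Guard the case where a sample is black.
    weights = [w if s[0] >= 1e-5 else 0.0 for w, s in zip(weights, samples)]
    total = sum(weights)
    # Guard the case where all weights are 0.
    if total > 1e-5:
        return (center[1], sum(w * s[1] for w, s in zip(weights, samples)) / total)
    return (0.0, 0.0)
```

With neighbors of equal luminance the reconstruction is a plain average of their chroma; with all-black neighbors it falls back to zero, exactly as the GLSL guards intend.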
Thanks for listening
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Direct Light
Accumulation Buffer
- Accumulate opaque surface direct lighting into an RGB float render target
- Half float where supported
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
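The metallic flag hitches a ride in the sign of the stored depth, so one G-Buffer channel carries both. A minimal Python sketch of that pack/unpack; the deck only shows `abs()` and `sign()` on the decode side, so the packing function and the choice of positive-means-metallic are assumptions here:

```python
def pack_depth_metallic(depth: float, metallic: bool) -> float:
    """Pack view depth plus a 1-bit metallic flag into one float.

    depth must be > 0: a stored value of 0.0 is reserved for samples
    at infinity (the decoder early-outs when abs(w) <= 0.0).
    Treating positive as metallic is an illustrative assumption.
    """
    assert depth > 0.0
    return depth if metallic else -depth


def unpack_depth_metallic(w: float):
    depth = abs(w)              # res.depth = abs(encodedGBuffer.w)
    if depth <= 0.0:
        return None             # early out: sampling infinity
    metallic = w > 0.0          # res.metallic = sign(encodedGBuffer.w)
    return depth, metallic
```

The sign bit is otherwise wasted on a strictly positive depth, which is why the flag is free.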
Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
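`octohedronDecode` maps a 2D octahedral coordinate back to a unit vector, per [Cigolle 14]. The deck only shows the decode call, so here is a hedged Python sketch of the standard encode/decode pair (function names are illustrative, and quantization to 14 bits is omitted):

```python
import math

def sign_nz(v: float) -> float:
    # GLSL-style sign that treats 0 as +1, as commonly used in octahedral encoding.
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # Project the unit vector onto the octahedron, then fold the lower hemisphere.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        x, y = (1.0 - abs(y)) * sign_nz(x), (1.0 - abs(x)) * sign_nz(y)
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)   # remap to [0, 1] for storage

def octahedron_decode(e):
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = (1.0 - abs(v)) * sign_nz(u), (1.0 - abs(u)) * sign_nz(v)
    l = math.sqrt(u * u + v * v + z * z)
    return (u / l, v / l, z / l)
```

Without quantization the round trip is lossless up to floating-point error; the 14-bit storage above trades that for G-Buffer space.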
Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
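End to end, the velocity path stores a biased, sub-pixel-quantized offset per axis in 10 bits, with everything beyond ±510 steps treated as a sentinel. A Python sketch of the pair; the deck does not give the value of `INVERSE_SUB_PIXEL_PRECISION_STEPS`, so the quarter-pixel step size below is an assumption:

```python
import math

SUB_PIXEL_STEPS = 4.0                 # assumed quantization: quarter-pixel steps
OUT_OF_RANGE = math.sqrt(2.0) + 1e-3  # sentinel: farther than any on-screen UV delta


def encode_velocity_axis(v_pixels: float) -> int:
    # 10 bits per axis: bias by 512, clamp to [0, 1023].
    q = round(v_pixels * SUB_PIXEL_STEPS) + 512
    return max(0, min(1023, q))


def decode_velocity(qx: int, qy: int, inv_res: float):
    vx, vy = qx - 512.0, qy - 512.0
    if max(abs(vx), abs(vy)) > 510.0:
        # Out of representable range: throw the vector outside of
        # screenspace so later passes can cull it.
        return (OUT_OF_RANGE, OUT_OF_RANGE)
    # inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS
    scale = inv_res / SUB_PIXEL_STEPS
    return (vx * scale, vy * scale)
```

The sentinel of sqrt(2) + 1e-3 is just past the longest possible UV-space motion vector (the screen diagonal), so a single comparison can reject it later.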
Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
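For reference, this is the YCoCg transform behind the encoding (as in [Mavridis 12]): Y lands in [0, 1] while Co and Cg land in [-0.5, 0.5], which is why a bias near 0.5 re-centers chroma for unsigned 8-bit storage. A small Python sketch:

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance in [0, 1]
    co =  0.50 * r            - 0.50 * b  # chroma in [-0.5, 0.5]
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # chroma in [-0.5, 0.5]
    return y, co, cg


def ycocg_to_rgb(y, co, cg):
    # Exact inverse: only adds and subtracts.
    return y + co - cg, y + cg, y - co - cg


# Bias chosen so chroma 0 lands exactly on an 8-bit code point (128/255).
CHROMA_BIAS = 0.5 * 256.0 / 255.0
```

The transform is cheap in both directions (the inverse needs no multiplies), which is part of why it suits a per-pixel G-Buffer encode/decode.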
Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy,
      gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later in the pipe?
Light Pre-pass

Light Pre-pass
- Many resources:
[Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg, Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look.
Enhance! (four detail-shot slides, each comparing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
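The checkerboard assignment is just a parity test, and folding the frame index into that parity is one way to alternate the pattern each frame. A hypothetical sketch (the deck's `getCheckerboard` takes uv and resolution; integer pixel coordinates are used here for clarity):

```python
def checkerboard(px: int, py: int, frame: int = 0) -> float:
    """Returns +1.0 where a pixel stores Co and -1.0 where it stores Cg
    (which chroma goes where is an arbitrary convention). Adding the frame
    index flips the pattern every frame, so temporal passes see both
    chroma components at every pixel across two frames."""
    return 1.0 if (px + py + frame) % 2 == 0 else -1.0
```

With the per-frame flip, a temporal resolve can average a pixel's reconstructed chroma against last frame's directly sampled one, hiding most boundary artifacts.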
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel:
- Luminance calculation: the same
- Chroma calculation: inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component, not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
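Why this works: YCoCg is a linear transform, and Schlick's approximation is a lerp from F0 toward white, whose YCoCg representation is (Y = 1, chroma = 0). So luminance lerps toward 1 exactly as in RGB, and chroma simply decays toward 0. A quick Python check of that equivalence (the gold-ish F0 is an arbitrary example value):

```python
def rgb_to_ycocg(r, g, b):
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)


def fresnel_schlick_rgb(v_dot_h, f0):
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in f0)


def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    p = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - f0_y) * p + f0_y,  # luminance lerps toward white (Y = 1)
            f0_c * -p + f0_c)         # chroma decays toward grey (C = 0)


f0 = (1.0, 0.71, 0.29)  # example gold-ish specular color
y0, co0, cg0 = rgb_to_ycocg(*f0)
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    # Path A: evaluate in RGB, then convert the result to YCoCg.
    y_ref, co_ref, cg_ref = rgb_to_ycocg(*fresnel_schlick_rgb(v_dot_h, f0))
    # Path B: evaluate each YCoCg component directly.
    y, co = fresnel_schlick_yc(v_dot_h, y0, co0)
    _, cg = fresnel_schlick_yc(v_dot_h, y0, cg0)
    assert abs(y - y_ref) < 1e-12
    assert abs(co - co_ref) < 1e-12
    assert abs(cg - cg_ref) < 1e-12
```

The same argument applies to any BRDF term that is affine in the light/specular color, which is what makes the whole pipeline safe to run in YC.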
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter:
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);

  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
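A straight Python port of the reconstruction above makes the weighting easy to poke at on the CPU (a sketch for illustration, not Floored's shipping code):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    """center and a1..a4 are (luminance, chroma) pairs from the cross
    neighborhood; returns the (known, reconstructed) chroma pair for the
    center pixel."""
    SENSITIVITY = 25.0
    neighbors = (a1, a2, a3, a4)
    weights = []
    for luma, _ in neighbors:
        w = 2.0 ** (-SENSITIVITY * abs(luma - center[0]))  # exp2 falloff on luma delta
        w *= 1.0 if luma >= 1e-5 else 0.0                  # guard: black samples contribute nothing
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:                                      # guard: all weights are 0
        return (0.0, 0.0)
    blended = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], blended)
```

With neighbors of matching luminance the result is a plain average of their chroma, while a black neighbor is excluded entirely by the step guard.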
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4
Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);
  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth
Decode G-Buffer: RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
Decode G-Buffer: RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
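The octahedral mapping behind octohedronDecode follows [Cigolle 14]: project the unit normal onto the octahedron |x| + |y| + |z| = 1, then fold the lower hemisphere over so the result lands in a square that packs into two quantized components. A CPU-side sketch of the pair (the GLSL helper names are Floored's; this Python restatement is an illustration, not their exact code):

```python
import math

def octahedron_encode(n):
    # Project a unit vector onto the octahedron, then fold the lower
    # hemisphere over the diagonals; result lies in [-1, 1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    return px, py

def octahedron_decode(e):
    # Inverse of the fold above; returns a normalized 3-vector.
    px, py = e
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    length = math.sqrt(px * px + py * py + z * z)
    return (px / length, py / length, z / length)
```

Before quantization the round trip is exact up to float rounding, which is what makes the format attractive for G-Buffer normals.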
Decode G-Buffer: RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of
  // screenspace for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity
Decode G-Buffer: RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss
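uint24_to_uint8_8_8 unpacks three 8-bit fields from a 24-bit integer carried in a single float channel; 24 bits matters because integers up to 2^24 are exactly representable in a float32 mantissa. The slides don't show the bit layout, so the field order in this minimal sketch is an assumption:

```python
def pack_uint8_8_8(a, b, c):
    # Pack three 8-bit fields into one 24-bit integer (field order assumed).
    assert all(0 <= v < 256 for v in (a, b, c))
    return (a << 16) | (b << 8) | c

def unpack_uint24(v):
    # Inverse of pack_uint8_8_8: recover the three 8-bit fields.
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF
```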
Decode G-Buffer: RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
    gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
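YcocgToRgb above is the standard YCoCg transform [Mavridis 12]: Y is a luminance term, Co and Cg are orange and green chroma axes, and the inverse is exact. A plain numeric restatement (helper names in the GLSL are Floored's; this sketch only shows the math):

```python
def rgb_to_ycocg(r, g, b):
    # Y: luminance; Co/Cg: orange and green chroma axes.
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of rgb_to_ycocg.
    return (y + co - cg, y + cg, y - co - cg)
```

Note that white maps to (1, 0, 0) and any gray has zero chroma, which is what makes chroma cheap to subsample.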
Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later in the pipe?
Light Pre-pass

Light Pre-pass
- Many resources:
  [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting

YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process
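The checkerboard layout stores Y at every pixel but only one of Co/Cg per pixel, alternating by pixel parity, so color storage and lighting bandwidth are halved. A small sketch of the interleave (the parity convention here is illustrative, not Floored's exact layout):

```python
def interleave_chroma(pixels):
    # pixels: 2D list of (y, co, cg) triples. Keep Y everywhere; keep Co on
    # even-parity pixels and Cg on odd-parity pixels, like a checkerboard.
    out = []
    for j, row in enumerate(pixels):
        out_row = []
        for i, (y, co, cg) in enumerate(row):
            out_row.append((y, co if (i + j) % 2 == 0 else cg))
        out.append(out_row)
    return out
```

Each pixel's missing chroma component must later be reconstructed from its four cross neighbors, which all carry the other component.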
Artifacts

Results
- All results are rendered:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg, Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
Implementation

YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
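The two-component form works because the YCoCg transform is linear and white maps to (Y, Co, Cg) = (1, 0, 0): Schlick's lerp toward white therefore lerps luminance toward 1 while each chroma axis simply decays by (1 - power). A quick numeric check of that equivalence (the F0 constants are illustrative, not from the deck):

```python
def rgb_to_ycocg(r, g, b):
    # Standard linear YCoCg transform; white maps to (1, 0, 0).
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: lerp from F0 toward white at grazing angles.
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_y, f0_chroma):
    # YC form: luminance lerps toward 1, chroma decays toward 0.
    p = (1.0 - v_dot_h) ** 5
    return (1.0 - f0_y) * p + f0_y, f0_chroma * -p + f0_chroma
```

Because the chroma update is the same expression for Co and Cg, the YC shader only needs whichever component the checkerboard pixel carries.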
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
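The spherical gaussian form swaps pow for exp2, which is cheaper on many GPUs, and stays close to (1 - vDotH)^5 over the whole range. A quick accuracy check (the tolerance is chosen empirically for this sketch):

```python
def schlick_power(v_dot_h):
    # Exact Schlick grazing-angle term.
    return (1.0 - v_dot_h) ** 5

def spherical_gaussian_power(v_dot_h):
    # exp2-based approximation of (1 - vDotH)^5 [Lagarde 12].
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Sample the absolute error on a uniform grid over [0, 1].
max_error = max(abs(schlick_power(i / 100.0) - spherical_gaussian_power(i / 100.0))
                for i in range(101))
```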
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2,
    const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
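The weighting behavior is easy to sanity-check off-GPU. A direct Python restatement of reconstructChromaHDR (structure mirrors the GLSL above; equal-luminance neighbors reduce to a plain average, and the black-sample guard matches step(1e-5, luminance)):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs; a1..a4 are the
    # cross-neighborhood samples carrying the other chroma component.
    neighbors = (a1, a2, a3, a4)
    luminance = [n[0] for n in neighbors]
    chroma = [n[1] for n in neighbors]
    # Weight neighbors by luminance similarity to the center pixel.
    weight = [2.0 ** (-sensitivity * abs(l - center[0])) for l in luminance]
    # Guard the case where a sample is black.
    weight = [w if l >= 1e-5 else 0.0 for w, l in zip(weight, luminance)]
    total = sum(weight)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    return (center[1], sum(c * w for c, w in zip(chroma, weight)) / total)
```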
Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know!
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013

Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009

Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011

Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007

Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
Decode G-Buffer RGB Lighting
gBufferComponents decodeGBuffer(
const in sampler2D gBufferSampler
const in vec2 uv
const in vec2 gBufferResolution
const in vec2 inverseGBufferResolution)
gBufferComponents res
vec4 encodedGBuffer = texture2D(gBufferSampler uv)
resdepth = abs(encodedGBufferw)
Early out if sampling infinity
if (resdepth lt= 00)
rescolor = vec3(00)
return res
- Decode Depth
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
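For reference, the YCoCg transform these Y/chroma channels come from (as in [Mavridis 12]): Co and Cg live in [-0.5, 0.5], which is why an unsigned storage channel needs the bias above. A JS sketch under that assumption (function names are mine):

```javascript
// RGB -> YCoCg: Y in [0, 1], Co/Cg in [-0.5, 0.5].
function rgbToYcocg([r, g, b]) {
  return [r / 4 + g / 2 + b / 4,  // Y  (luminance-like)
          r / 2 - b / 2,          // Co (orange chroma)
          -r / 4 + g / 2 - b / 4]; // Cg (green chroma)
}

// YCoCg -> RGB, the exact inverse of the transform above.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```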
Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC
Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
                                           gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}
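The checkerboard bookkeeping in the decode above can be sketched in isolation. The parity convention below is an assumption for illustration (the real getCheckerboard may use a different rule), but the swizzle logic mirrors the shader line exactly:

```javascript
// Assumed parity rule: pixels where (x + y) is even store (Y, Co);
// odd pixels store (Y, Cg). Returns +/-1 like the shader's getCheckerboard.
function getCheckerboard(x, y) {
  return (x + y) % 2 === 0 ? 1.0 : -1.0;
}

// After the missing chroma component is reconstructed from neighbors, swizzle
// so the output is always ordered (Co, Cg), as in:
//   colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;
function orderChroma(stored, reconstructed, offsetDirection) {
  return offsetDirection > 0.0 ? [stored, reconstructed]
                               : [reconstructed, stored];
}
```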
Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass

Light Pre-pass
- Many resources: [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
  - Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04
  - Keep Fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice
YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered:
  - Direct Light Only
  - No Anti-Aliasing
  - No Temporal Techniques
  - G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame
Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminance Intensity uniform
  - Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation is the same
  - Chroma calculation is inverted; approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
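Because both the YCoCg transform and Schlick's formula are linear in the reflection coefficient, the YC evaluation reproduces the RGB result exactly. A quick JS check of that equivalence (helper names are mine; the copper-ish F0 is just sample data):

```javascript
// RGB Schlick: F = (1 - F0) * (1 - vDotH)^5 + F0, per channel.
function fresnelSchlick(vDotH, r0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return r0.map(c => (1.0 - c) * p + c);
}

// YC Schlick: luminance term unchanged, chroma term inverted
// (goes to zero as the Fresnel response whitens toward grazing).
function fresnelSchlickYC(vDotH, [y0, co0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0, co0 * -p + co0];
}

// Luminance (Y) and one chroma component (Co) of an RGB triple, YCoCg weights.
function yCo([r, g, b]) {
  return [r / 4 + g / 2 + b / 4, r / 2 - b / 2];
}
```

Evaluating fresnelSchlickYC(v, yCo(F0)) matches yCo(fresnelSchlick(v, F0)) to floating-point precision, which is why the chroma term can simply be damped rather than recomputed per channel.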
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth/stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2,
                          const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
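A direct JS port of that reconstruction is handy for sanity-checking the weighting on the CPU. The port below is mine (the GLSL above is the source of truth); center and neighbors are [luminance, chroma] pairs:

```javascript
// Reconstruct the missing chroma component from a cross neighborhood,
// weighting each neighbor by its luminance similarity to the center sample.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [lum, chr] of [a1, a2, a3, a4]) {
    // exp2 falloff on luminance difference
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(lum - center[0]));
    // Guard the case where the sample is black: step(1e-5, luminance)
    w *= lum >= 1e-5 ? 1.0 : 0.0;
    totalWeight += w;
    chromaSum += chr * w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

In a flat-luminance region all four neighbors weigh equally, so the reconstructed component is just their chroma average; across a hard luminance edge the far-side samples are damped exponentially.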
Thanks for listening

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture
Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Decode G-Buffer RGB Lighting
resmetallic = sign(encodedGBufferw)
- Decode Metallic
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Decode G-Buffer RGB Lighting
vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))
vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))
vec2 normalOctohedron
normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)
normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)
resnormal = octohedronDecode(normalOctohedron)
- Decode Normal
Decode G-Buffer RGB Lighting
resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)
resvelocity -= 5120
if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)
When velocity is out of representable range throw it outside of screenspace for culling in future passes
sqrt(2) + 1e-3
resvelocity = vec2(141521356)
else
resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS
- Decode Velocity
Decode G-Buffer RGB Lighting
vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))
resgloss = colorGlossDataz
- Decode Gloss
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy
gBufferSampleYc0y -= CHROMA_BIAS
gBufferSampleYc1y -= CHROMA_BIAS
gBufferSampleYc2y -= CHROMA_BIAS
gBufferSampleYc3y -= CHROMA_BIAS
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))
vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))
vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))
vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0w)
float gBufferSampleDepth1 = abs(gBufferSample1w)
float gBufferSampleDepth2 = abs(gBufferSample2w)
float gBufferSampleDepth3 = abs(gBufferSample3w)
- Decode G-Buffer Cross Neighborhood Depth
Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)
gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)
gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)
gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)
- Guard Against Chroma Samples at Infinity
Decode G-Buffer RGB Lighting
colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2
gBufferSampleYc3)
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv gBufferResolution)
colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting
rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))
return res
Decode G-Buffer RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma
component
- Can we defer reconstruction later down the pipe
Light Pre-pass
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls fresnel out of the integral with nDotV approximation
- Bad for microfacet model We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 004
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
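The two `power` terms track each other closely, which is why the substitution is safe; a small sketch (plain Python, the error bound is an empirical observation, not a guarantee from the slides):

```python
def schlick_power(v_dot_h):
    # (1 - vDotH)^5, the exact Schlick falloff
    return (1.0 - v_dot_h) ** 5.0

def spherical_gaussian_power(v_dot_h):
    # exp2((-5.55473 * vDotH - 6.98316) * vDotH) from [Lagarde 12]
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Sample the whole vDotH range and record the worst absolute deviation
max_err = max(abs(schlick_power(i / 256.0) - spherical_gaussian_power(i / 256.0))
              for i in range(257))
```

Both forms agree exactly at vDotH = 0 (grazing), and the deviation stays small across the range, so swapping the approximation into the YC Fresnel changes nothing structurally.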
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT transparency composite
- Anti-aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
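The filter is easy to sanity check outside the shader. A direct Python port (helper name mine) shows neighbors whose luminance matches the center dominating the reconstructed chroma, while black samples and luminance outliers are effectively ignored:

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    # Port of the GLSL reconstructChromaHDR; each sample is (luminance, chroma).
    SENSITIVITY = 25.0
    samples = (a1, a2, a3, a4)
    # Weight each neighbor by luminance similarity to the center sample
    weights = [2.0 ** (-SENSITIVITY * abs(lum - center[0])) for lum, _ in samples]
    # Guard the case where a sample is black (step(1e-5, luminance))
    weights = [w if lum >= 1e-5 else 0.0 for w, (lum, _) in zip(weights, samples)]
    total = sum(weights)
    # Guard the case where all weights are 0
    if total <= 1e-5:
        return (0.0, 0.0)
    blended = sum(c * w for (_, c), w in zip(samples, weights)) / total
    return (center[1], blended)

# Two luminance-matched neighbors carry chroma 0.2; the dark outlier and the
# black sample contribute (almost) nothing:
result = reconstruct_chroma_hdr((1.0, 0.05), (1.0, 0.2), (1.0, 0.2), (0.1, -0.4), (0.0, 0.0))
```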
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Decode G-Buffer, RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity
Decode G-Buffer, RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss
Decode G-Buffer, RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
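A note on that constant: 0.5 * 256/255 equals 128/255, which (as I read it) centers zero chroma exactly on 8-bit code 128, so neutral greys survive quantization without a chroma shift. A small round-trip sketch with hypothetical encode/decode helpers:

```python
# Hypothetical round trip for the 8-bit chroma channel; the stored value is the
# signed chroma plus CHROMA_BIAS, quantized to a byte as code = value * 255.
CHROMA_BIAS = 0.5 * 256.0 / 255.0  # = 128/255

def encode_chroma_8bit(c):
    # c is signed chroma in [-0.5, 0.5]; returns the stored byte code
    return round((c + CHROMA_BIAS) * 255.0)

def decode_chroma_8bit(code):
    # Mirrors the shader: normalize the byte, then subtract the bias
    return code / 255.0 - CHROMA_BIAS
```

With a plain 0.5 bias, zero chroma would land between codes 127 and 128; the 256/255 factor snaps it onto a representable code.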
Decode G-Buffer, RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood
Decode G-Buffer, RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity
Decode G-Buffer, RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? diffuseYcocg.yz : diffuseYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
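As a toy illustration of the checkerboard layout and that final swizzle (plain Python; function names are mine, not Floored's actual helpers): each G-Buffer pixel stores luminance plus one chroma component, alternating Co/Cg in a checkerboard, and the decode step orders the stored and reconstructed components back into (Co, Cg):

```python
def get_checkerboard(x, y):
    # Sign convention assumed for getCheckerboard(): positive where the pixel
    # stored Co, negative where it stored Cg.
    return 1.0 if (x + y) % 2 == 0 else -1.0

def order_chroma(x, y, stored, reconstructed):
    # The decode swizzle: place (stored, reconstructed) into (Co, Cg) order
    # depending on which component this pixel actually carries.
    if get_checkerboard(x, y) > 0.0:
        return (stored, reconstructed)   # pixel stored Co
    return (reconstructed, stored)       # pixel stored Cg
```

Adjacent pixels always carry opposite components, which is why a cross neighborhood is guaranteed to supply the missing one.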
Decode G-Buffer, RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction later down the pipe?
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
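The nDotV problem can be made concrete with a scalar toy example (plain Python, illustrative numbers): summing per-light Fresnel evaluated at each light's nDotH is not the same as applying one Fresnel factor at nDotV to the accumulated radiance, and a light with a grazing half-vector is badly underweighted by the latter.

```python
def fresnel_schlick(cos_theta, f0):
    # Scalar Schlick Fresnel [Schlick 94]
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5.0

F0 = 0.04                 # the non-metal constant the slides mention
radiance = [1.0, 0.5]     # two lights (illustrative values)
n_dot_h = [1.0, 0.2]      # per-light half-vector cosines
n_dot_v = 0.6             # single view cosine

# Fresnel kept inside the integral: evaluated per light at nDotH
inside = sum(L * fresnel_schlick(c, F0) for L, c in zip(radiance, n_dot_h))
# Fresnel pulled outside: one factor at nDotV applied in the resolve pass
outside = fresnel_schlick(n_dot_v, F0) * sum(radiance)
```

With these numbers the two results differ by more than a factor of two, which is the quality argument for paying the cost of running through all lights twice.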
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process
Artifacts
Results
- All results are rendered
- Direct light only
- No anti-aliasing
- No temporal techniques
- G-Buffer color component: YCoCg checkerboard interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
Enhance! RGB Lighting 100%
YC Lighting 100% | YC Lighting 25%
RGB Lighting 25%
YC Lighting
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions not pixel count
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Decode G-Buffer RGB Lighting
const float CHROMA_BIAS = 05 2560 2550
vec3 colorYcocg
colorYcocgx = colorGlossDatax
colorYcocgy = colorGlossDatay - CHROMA_BIAS
- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space
Decode G-Buffer: RGB Lighting
vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;
- Decode G-Buffer Cross Neighborhood Color YC
vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));
- Sample G-Buffer Cross Neighborhood
Decode G-Buffer: RGB Lighting
float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);
- Decode G-Buffer Cross Neighborhood Depth
// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);
- Guard Against Chroma Samples at Infinity
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? diffuseYcocg.yz : diffuseYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
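To make the decode path above easier to follow, here is a minimal JavaScript sketch of the storage scheme it assumes, in the spirit of [Mavridis 12]: every pixel keeps full luminance plus one chroma component, alternating Co/Cg in a checkerboard. This is illustrative only; the function names (`rgbToYcocg`, `encodePixel`, etc.) are not from the Floored codebase.

```javascript
// RGB <-> YCoCg transform (exact and invertible). Y is luminance; Co/Cg are chroma.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b,  // Y
     0.5  * r            - 0.5 * b,  // Co
    -0.25 * r + 0.5 * g - 0.25 * b   // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

// Checkerboard selector: "even" pixels store Co, "odd" pixels store Cg.
function getCheckerboard(x, y) {
  return (x + y) % 2 === 0;
}

// Encode stores two channels (Y, C) per pixel instead of three.
function encodePixel(x, y, rgb) {
  const [Y, co, cg] = rgbToYcocg(rgb);
  return [Y, getCheckerboard(x, y) ? co : cg];
}

// Decode borrows the missing chroma component from a neighbor; for a locally
// flat color the neighbor's stored chroma is exact.
function decodePixel(x, y, [Y, c], neighborC) {
  const co = getCheckerboard(x, y) ? c : neighborC;
  const cg = getCheckerboard(x, y) ? neighborC : c;
  return ycocgToRgb([Y, co, cg]);
}
```

The `colorYcocg.yz` checkerboard swizzle in the shader above plays the same role as `decodePixel`'s select between the stored and the reconstructed component.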
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 specular color
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look. Enhance!
[Four detail-crop comparison slides, each showing RGB Lighting 100% / YC Lighting 100% / RGB Lighting 25% / YC Lighting 25%]
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the luminance intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
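One point worth making explicit: because Schlick's approximation is affine in the reflection coefficient and RGB→YCoCg is a linear transform, evaluating Schlick directly in YC space introduces no extra approximation; it equals the YCoCg transform of the RGB result. A quick JavaScript sanity check (illustrative CPU code, not shader source):

```javascript
// RGB Schlick Fresnel, componentwise: F = (1 - R0) * p + R0.
function fresnelSchlickRGB(vDotH, r0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return r0.map(c => (1.0 - c) * p + c);
}

// YC Schlick Fresnel as on the slide above.
function fresnelSchlickYC(vDotH, [y0, c0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0,  // luminance: approaches 1 at grazing
          c0 * -p + c0];        // chroma: approaches 0 at grazing
}

// Y and Co components of the RGB->YCoCg transform (enough for the check).
function rgbToYco(r, g, b) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}
```

Evaluating `fresnelSchlickYC` on a transformed reflection coefficient and transforming the RGB result agree to floating-point precision.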
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
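For testing the behavior on the CPU, a direct JavaScript port of the function (assuming the restored constants, i.e. SENSITIVITY = 25.0 and a 1e-5 threshold), with `neighbors` given as an array of [luminance, chroma] pairs:

```javascript
// Weight each neighbor's chroma by how close its luminance is to the center
// sample's luminance; black samples (e.g. at infinity) and an all-zero weight
// sum are guarded, matching the GLSL version above.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chroma = 0.0;
  for (const [lum, chr] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(lum - center[0]));
    if (lum < 1e-5) w = 0.0;        // guard: sample is black
    totalWeight += w;
    chroma += chr * w;
  }
  // guard: all weights zero -> return black
  return totalWeight > 1e-5 ? [center[1], chroma / totalWeight] : [0.0, 0.0];
}
```

Note how sharply the weight falls off: a neighbor whose luminance differs by 1.0 contributes a factor of 2^-25, so chroma effectively never bleeds across strong lighting edges.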
Thanks for listening!
Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats. http://webglstats.com, 2014.
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008.
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf. Naty Hoffman, SIGGRAPH 2010.
[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/. Sébastien Lagarde, 2011.
[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf. Brent Burley, 2012.
[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf. Brian Karis, 2013.
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/. Aras Pranckevičius, 2009.
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014.
[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012.
[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf. J.M.P. van Waveren, Ignacio Castaño, 2007.
[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home. Rich Geldreich, Matt Pritchard, John Brooks, 2004.
[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/. Naty Hoffman, 2009.
Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html. Oles Shishkovtsov, 2005.
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009.
[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3. Martin Mittring, 2009.
[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3. Tiago Sousa, 2013.
[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf. Aras Pranckevičius, Game Developers Conference 2013.
[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading. Ola Olsson, Ulf Assarsson, 2011.
Resources
[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012.
[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009.
[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010.
[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/. Bart Wronski, 2014.
[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx. Brian Karis, 2014.
[Walter 07] Microfacet Models for Refraction through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007.
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs. http://jcgt.org/published/0003/02/03/paper.pdf. Eric Heitz, 2014.
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf. Christophe Schlick, 1994.
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/. Sébastien Lagarde, 2012.
[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf. Michael Oren, Shree K. Nayar, 1994.
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Decode G-Buffer: RGB Lighting
colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);
- Reconstruct missing chroma sample based on luminance similarity
float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? diffuseYcocg.yz : diffuseYcocg.zy;
- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually
// Color stored as sRGB->YCoCg; returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later in the pipe?
Light Pre-pass
Light Pre-pass
- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to their constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
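The nDotV problem in the bullets above can be shown numerically: pulling Schlick's Fresnel out of the per-light sum and evaluating it once with nDotV diverges from evaluating it inside the sum with each light's half-vector cosine. A scalar Python sketch (the two-light cosines are made-up illustrative values, not from the deck):

```python
def schlick(cos_theta, f0):
    # Schlick's approximation of Fresnel for one scalar channel
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5.0

f0 = 0.04                 # constant reflectance of common dielectrics
n_dot_h = [0.95, 0.30]    # per-light half-vector cosines (one glancing light)
n_dot_v = 0.80            # fixed by the view ray, shared by all lights
radiance = [1.0, 1.0]     # unit incoming radiance per light

# Keep Fresnel inside the sum: uses each light's own cosine
inside = sum(L * schlick(c, f0) for L, c in zip(radiance, n_dot_h))

# Light pre-pass style: accumulate first, modulate once with nDotV
outside = schlick(n_dot_v, f0) * sum(radiance)

assert inside > outside   # the glancing light's Fresnel boost is lost
```

The glancing light contributes a strong Fresnel term that the single nDotV modulation cannot recover, which is why the deck keeps Fresnel inside the integral at the cost of a second pass over the lights.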
YC Lighting
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process
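For orientation, the "YC space" here comes from the linear YCoCg transform. A minimal Python sketch of the conversion pair (function names are illustrative; the deck's GLSL uses helpers like YcocgToRgb):

```python
def rgb_to_ycocg(r, g, b):
    # Luminance Y plus orange/green chroma; white maps to (1, 0, 0)
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return (y, co, cg)

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of rgb_to_ycocg
    return (y + co - cg, y + cg, y - co - cg)

# Round-trip sanity check on an arbitrary color
rgb = (0.2, 0.7, 0.4)
back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, back))
```

Chroma subsampling stores Y at every pixel but only one of (Co, Cg) per pixel in a checkerboard; the reconstruction post process later recovers the missing component.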
Artifacts
Results
- All results are rendered with:
- Direct light only
- No anti-aliasing
- No temporal techniques
- G-Buffer color component: YCoCg, checkerboard interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting: Rendered at 100%
YC Lighting: Rendered at 100%
RGB Lighting: Rendered at 25%
YC Lighting: Rendered at 25%
Let's take a closer look
Enhance!
[Four detail-shot comparison grids, each labeled: RGB Lighting 100% | YC Lighting 100% | RGB Lighting 25% | YC Lighting 25%]
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
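The last bullet's per-frame alternation can be as simple as XORing pixel parity with frame parity, so each pixel alternates which chroma component it stores over time (a sketch; the deck's getCheckerboard presumably implements the spatial part of this):

```python
def checkerboard(x, y, frame=0):
    # Returns 0 or 1; adjacent pixels differ, and the pattern flips each frame
    return (x + y + frame) & 1

assert checkerboard(10, 20) != checkerboard(11, 20)        # spatial alternation
assert checkerboard(10, 20, 0) != checkerboard(10, 20, 1)  # temporal alternation
```

Combined with temporal reprojection, a pixel then sees both Co and Cg across two frames, which is what mitigates the chroma-boundary artifacts.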
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the luminance intensity uniform
- Color becomes a vec2 of chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular (vDotH -> 0, where Fresnel goes to white)
YC Lighting - RGB Schlick's approximation of Fresnel [Schlick 94]:
vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting - YC Schlick's approximation of Fresnel:
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and operating on a vec2 saves a further MADD and ADD from the skipped 3rd component.
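A note on correctness of the rewrite above: because Schlick's Fresnel is a per-channel lerp toward white, and YCoCg is a linear transform that maps white to (1, 0, 0), converting fresnelSchlick's RGB output into YCoCg reproduces the YC expressions exactly (luminance lerps toward 1, chroma scales toward 0). A quick numeric check, sketched in Python (the conversion helper and test values are assumptions, not from the deck):

```python
def rgb_to_ycocg(r, g, b):
    # Linear YCoCg transform; white (1, 1, 1) maps to (1, 0, 0)
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: lerp from f0 toward white as v_dot_h -> 0
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, y0, chroma0):
    # Mirrors fresnelSchlickYC: luminance lerps toward 1, chroma decays to 0
    p = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - y0) * p + y0, chroma0 * -p + chroma0)

f0 = (0.95, 0.64, 0.54)           # a copper-like specular color (illustrative)
y0, co0, cg0 = rgb_to_ycocg(*f0)  # the same reflectance expressed in YCoCg

for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    y, co, cg = rgb_to_ycocg(*fresnel_schlick_rgb(v_dot_h, f0))
    assert abs(y - fresnel_schlick_yc(v_dot_h, y0, co0)[0]) < 1e-12
    assert abs(co - fresnel_schlick_yc(v_dot_h, y0, co0)[1]) < 1e-12
    assert abs(cg - fresnel_schlick_yc(v_dot_h, y0, cg0)[1]) < 1e-12
```

This is why the luminance row of fresnelSchlickYC keeps the RGB form while the chroma row flips sign: white light has zero chroma.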
YC Lighting - Works fine with the spherical Gaussian approximation [Lagarde 12] too:
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting - Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting - Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write-bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] and clustered [Billeter 12] approaches
- Future work
YC Lighting - Reconstruct the missing chroma component in a post process
- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT transparency composite
- Anti-aliasing
- Tonemapping
YC Lighting - Simple luminance-based chroma reconstruction function for radiance data:
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
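As a readable cross-check of the two guards in the listing above, here is a direct Python port (illustrative only; the shipping code is GLSL):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs; a1..a4 are the four
    # checkerboard neighbors carrying the other chroma component
    samples = (a1, a2, a3, a4)
    # Weight neighbors by luminance similarity (exp2(-SENSITIVITY * delta))
    weights = [2.0 ** (-sensitivity * abs(lum - center[0])) for lum, _ in samples]
    # Guard the case where a sample is black (step(1e-5, luminance))
    weights = [w if lum >= 1e-5 else 0.0 for w, (lum, _) in zip(weights, samples)]
    total = sum(weights)
    if total <= 1e-5:  # guard the case where all weights are 0
        return (0.0, 0.0)
    blended = sum(w * chroma for w, (_, chroma) in zip(weights, samples)) / total
    return (center[1], blended)

# Similar luminance everywhere: the neighbors' chroma passes straight through
assert reconstruct_chroma_hdr((1.0, 0.2), (1.0, 0.5), (1.0, 0.5), (1.0, 0.5), (1.0, 0.5)) == (0.2, 0.5)
# All-black neighborhood: every weight is zeroed, so we fall back to zero chroma
assert reconstruct_chroma_hdr((0.0, 0.2), (0.0, 0.5), (0.0, 0.5), (0.0, 0.5), (0.0, 0.5)) == (0.0, 0.0)
```

Without the black-sample guard, an HDR pixel next to unlit geometry would pull chroma from samples whose luminance carries no color information.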
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for microfacet models: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice
YC Lighting
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process
Artifacts
Results
- All results are rendered with:
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg, Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance!
RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Enhance!
RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Enhance!
RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Enhance!
RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
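The checkerboard-interlaced storage mentioned above (from [Mavridis 12]) can be sketched in a few lines. This is a hedged illustration, not the deck's shader code: `rgb_to_ycocg` and `pack_pixel` are hypothetical helper names, and the real packing happens in the G-Buffer pass on the GPU.

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg transform: a linear, invertible remap of RGB
    y  = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def pack_pixel(x, y_px, rgb, frame=0):
    """Store luminance plus ONE chroma component per pixel.

    Even checkerboard cells keep Co, odd cells keep Cg; offsetting the
    parity by the frame index alternates the pattern each frame, which
    is what lets temporal techniques hide the reconstruction error."""
    y, co, cg = rgb_to_ycocg(*rgb)
    keep_co = (x + y_px + frame) % 2 == 0
    return (y, co if keep_co else cg)
```

Each pixel thus carries two channels instead of three; the dropped chroma component is the one the bilateral post process later reconstructs from neighbors.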
Implementation
YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the luminance intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
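Concretely, the light-color split described above might look like the following sketch. This is an assumption about the uniform layout, not the deck's actual code: `light_uniforms` is a hypothetical helper, and the YCoCg transform is the standard one.

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg transform of an RGB triple
    y  = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def light_uniforms(color_rgb, intensity):
    """Fold the color's luminance into the scalar intensity uniform
    (the 'already have Y' bullet) and pass chroma as a vec2."""
    y, co, cg = rgb_to_ycocg(*color_rgb)
    return intensity * y, (co, cg)
```

A white light then contributes pure luminance and zero chroma, so only the Y lane of the lighting math does any work for it.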
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
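The YC form is not an approximation on top of Schlick: because YCoCg is a linear transform of RGB and the grazing-angle target (1, 1, 1) maps to Y = 1, Co = Cg = 0, lerping per YCoCg component is exactly the RGB lerp seen through the transform. A small Python sketch (hypothetical helper names) checking that claim numerically:

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y
            0.5 * r - 0.5 * b,               # Co
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg

def fresnel_schlick_rgb(v_dot_h, f0):
    # Standard Schlick: lerp from F0 toward white at grazing angles
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    # Luminance lerps toward 1; chroma lerps toward 0
    p = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - f0_y) * p + f0_y, f0_c * -p + f0_c)
```

Evaluating Schlick in RGB and then converting gives the same Y and Co as evaluating directly in YC, up to floating point rounding.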
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
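As a quick numeric sanity check on those constants (my own sketch, not part of the deck): the exp2-based fit from [Lagarde 12] tracks pow(1 - vDotH, 5) closely across the whole [0, 1] range, which is why it can be swapped in without visible change.

```python
def schlick_power(v_dot_h):
    # Reference: the pow term from Schlick's approximation
    return (1.0 - v_dot_h) ** 5.0

def sg_power(v_dot_h):
    # Spherical gaussian fit: a single exp2 instead of a pow
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Maximum absolute error over a dense sampling of [0, 1]
max_err = max(abs(schlick_power(i / 1000.0) - sg_power(i / 1000.0))
              for i in range(1001))
```

The fit is exact at vDotH = 0 (both terms equal 1) and stays within well under a percent elsewhere, far below what survives 8-bit output.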
YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
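For reference, here is a line-for-line Python transcription of that filter. Two hedges: the extraction dropped the decimal point in the sensitivity constant, so 25.0 is an assumption, and each neighbor is modeled as a (luminance, chroma) pair matching the GLSL vec2 samples.

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    """Bilateral chroma reconstruction: weight the four neighbors by
    luminance similarity, mirroring the GLSL return of
    vec2(center.y, reconstructed chroma)."""
    luminance = [a[0] for a in (a1, a2, a3, a4)]
    chroma = [a[1] for a in (a1, a2, a3, a4)]
    SENSITIVITY = 25.0  # assumed; the slide's decimal point was lost
    weight = [2.0 ** (-SENSITIVITY * abs(l - center[0])) for l in luminance]
    # Guard the case where a sample is black (step(1e-5, luminance))
    weight = [w if l >= 1e-5 else 0.0 for w, l in zip(weight, luminance)]
    total = sum(weight)
    if total <= 1e-5:
        # Guard the case where all weights are 0
        return (0.0, 0.0)
    return (center[1], sum(c * w for c, w in zip(chroma, weight)) / total)
```

With equal-luminance neighbors the filter degenerates to a plain average of their chroma, and black neighbors are excluded entirely, which keeps fireflies from bleeding chroma into their surroundings.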
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks
Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009
Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Results
- All results are rendered with:
- Direct light only
- No anti-aliasing
- No temporal techniques
- G-Buffer color component: YCoCg, checkerboard interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count
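The G-Buffer setting above stores color as YCoCg with the chroma checkerboard interlaced: every pixel carries luminance, plus only one of the two chroma components, alternating in a checkerboard. A minimal CPU-side Python sketch of that encoding (helper names are ours, for illustration only, not from the renderer):

```python
def rgb_to_ycocg(r, g, b):
    # Y in [0, 1] for RGB in [0, 1]; Co and Cg in [-0.5, 0.5]
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r            - 0.5  * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above
    return y + co - cg, y + cg, y - co - cg

def interlaced_chroma(px, py, co, cg):
    # Checkerboard interlacing: even-parity pixels store Co, odd-parity store Cg
    return co if (px + py) % 2 == 0 else cg
```

The transform is exactly invertible, so the only information lost is the neighbor's chroma component, which the reconstruction pass later estimates.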
RGB Lighting, Rendered at 100%
YC Lighting, Rendered at 100%
RGB Lighting, Rendered at 25%
YC Lighting, Rendered at 25%
Let's take a closer look
Enhance! (detail crop) - RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! (detail crop) - RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! (detail crop) - RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Enhance! (detail crop) - RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
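Alternating the checkerboard each frame just means flipping which diagonal stores Co, so that over two frames a temporal pass has seen the true value of both chroma components at every pixel. A tiny sketch of the selection logic (hypothetical helper, not from the renderer):

```python
def stores_co(px, py, frame):
    # Flip the checkerboard parity every frame: a pixel that stored Co
    # last frame stores Cg this frame, and vice versa.
    return (px + py + frame) % 2 == 0
```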
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the luminance intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted; approaches zero at perpendicular
YC Lighting - RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient)
{
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting - YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and since we now operate on a vec2, we also save a MADD and an ADD from the skipped 3rd component.
YC Lighting - Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
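Why is it legal to run Schlick's formula per YC component at all? The RGB-to-YCoCg transform is linear and Schlick is affine in the reflection coefficient, so evaluating Fresnel in RGB and converting afterward must agree with evaluating directly in YCoCg; white has Y = 1 and chroma = 0, which is exactly where the inverted chroma term comes from. A quick Python check of that equivalence (our sketch, using the standard YCoCg weights; f0 is an arbitrary test color):

```python
def schlick(v_dot_h, f0):
    # Per-channel RGB Schlick Fresnel
    p = (1.0 - v_dot_h) ** 5
    return [(1.0 - c) * p + c for c in f0]

def to_ycocg(rgb):
    r, g, b = rgb
    return [0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b]

def schlick_yc(v_dot_h, y, chroma):
    # Luminance term unchanged; chroma term inverted (white has chroma 0)
    p = (1.0 - v_dot_h) ** 5
    return [(1.0 - y) * p + y, chroma * -p + chroma]

f0 = [0.95, 0.64, 0.54]                 # arbitrary specular color
y0, co0, cg0 = to_ycocg(f0)
direct = to_ycocg(schlick(0.7, f0))     # Fresnel in RGB, then convert
in_yc = schlick_yc(0.7, y0, co0)        # Fresnel directly in YC (a Co pixel)
```

The same identity holds for the Cg component, so each checkerboard pixel can evaluate its own single chroma channel.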
YC Lighting - Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting - Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting - Reconstruct the missing chroma component in a post process
- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT transparency composite
- Anti-aliasing
- Tonemapping
YC Lighting - Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
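To make the two guards easy to sanity-check off-GPU, here is a direct Python port of the shader above (our port, for testing only, not production code). Each sample is a (luminance, chroma) pair; black neighbors contribute nothing, and an all-zero weight sum falls through to zero:

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs.
    # Returns (center chroma, reconstructed neighbor chroma).
    neighbors = [a1, a2, a3, a4]
    weights = []
    for lum, _ in neighbors:
        # Weight falls off exponentially with luminance difference
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        # Guard: a black sample carries no reliable chroma
        w *= 1.0 if lum >= 1e-5 else 0.0
        weights.append(w)
    total = sum(weights)
    # Guard: all weights zero
    if total <= 1e-5:
        return (0.0, 0.0)
    blended = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], blended)
```

With equally bright neighbors the result is a plain average of their chroma; a strong luminance edge pushes the weights toward the matching side, which is what hides chroma bleeding at boundaries.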
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats. http://webglstats.com, 2014
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, SIGGRAPH 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf
[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/
[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf
[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, JCGT 2014. http://jcgt.org/published/0003/02/01/
[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/
[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf
[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home
[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/
[Shishkovtsov 05] Deferred Shading in STALKER. Oles Shishkovtsov, GPU Gems 2, 2005. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt
[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3
[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3
[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf
[Olsson 11] Tiled Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading
[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf
[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf
[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/
[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx
[Walter 07] Microfacet Models for Refraction through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
[Heitz 14] Understanding the Masking-Shadowing Function. Eric Heitz, JCGT 2014. http://jcgt.org/published/0003/02/03/paper.pdf
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/
[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, SIGGRAPH 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf
RGB Lighting Rendered at 100
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
YC Lighting Rendered at 100
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
RGB Lighting Rendered at 25
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlick's Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
YC Lighting- Works fine with the spherical gaussian approximation [Lagarde 12] too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
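Why the luminance component keeps the RGB form while chroma inverts: both the YCoCg transform and Schlick's formula are linear in the reflection coefficient, and white (the grazing-angle limit) has Y = 1 and zero chroma. A Python check of that equivalence (helper names and the sample F0 are mine):

```python
def schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: F = (1 - F0) * p + F0
    p = (1.0 - v_dot_h) ** 5.0
    return [(1.0 - c) * p + c for c in f0]

def schlick_yc(v_dot_h, y0, c0):
    p = (1.0 - v_dot_h) ** 5.0
    # Luminance behaves like an RGB channel (white has Y = 1);
    # chroma is inverted: it fades to zero toward grazing angles.
    return (1.0 - y0) * p + y0, c0 * -p + c0

f0 = (0.9, 0.7, 0.3)                       # arbitrary specular color
y0 = f0[0] / 4 + f0[1] / 2 + f0[2] / 4     # YCoCg luminance of F0
co0 = f0[0] / 2 - f0[2] / 2                # one chroma component of F0
for v_dot_h in (0.0, 0.3, 1.0):
    r, g, b = schlick_rgb(v_dot_h, f0)
    y, co = schlick_yc(v_dot_h, y0, co0)
    # Evaluating in YC space matches converting the RGB result.
    assert abs(y - (r / 4 + g / 2 + b / 4)) < 1e-9
    assert abs(co - (r / 2 - b / 2)) < 1e-9
```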
YC Lighting- Write YC to RG components of render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
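The YCYC packing idea can be sketched as a pairing step: two horizontally adjacent (Y, C) pixels share one RGBA texel, halving write bandwidth. This is a hypothetical illustration of the layout only (the deck leaves the technique as future work; the helper name is mine):

```python
def pack_ycyc(pixels):
    """Pack pairs of adjacent (Y, C) pixels into RGBA texels:
    texel = (Y0, C0, Y1, C1), so one lighting evaluation can
    write two pixels of the compact YC buffer at once."""
    assert len(pixels) % 2 == 0, "expects an even row width"
    return [(p0[0], p0[1], p1[0], p1[1])
            for p0, p1 in zip(pixels[0::2], pixels[1::2])]

# Two (Y, C) pixels collapse into one RGBA texel.
print(pack_ycyc([(0.5, 0.1), (0.6, 0.2)]))
```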
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance-based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
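A CPU-side Python sketch of the same reconstruction logic (mirroring the GLSL rather than any shipped code; it reads the deck's SENSITIVITY constant as 25.0):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    """center and a1..a4 are (luminance, chroma) pairs; the neighbors
    carry the chroma component the center pixel lacks. Neighbors are
    weighted by luminance similarity, so chroma does not bleed across
    strong luminance edges."""
    luminance = [a[0] for a in (a1, a2, a3, a4)]
    chroma = [a[1] for a in (a1, a2, a3, a4)]
    weights = [2.0 ** (-sensitivity * abs(l - center[0])) for l in luminance]
    # Guard the case where a sample is black (GLSL: step(1e-5, luminance)).
    weights = [w if l >= 1e-5 else 0.0 for w, l in zip(weights, luminance)]
    total = sum(weights)
    if total <= 1e-5:  # guard the case where all weights are 0
        return (0.0, 0.0)
    missing = sum(c * w for c, w in zip(chroma, weights)) / total
    return (center[1], missing)

# In a flat region, the missing chroma is simply the neighbors' value.
print(reconstruct_chroma_hdr((0.5, 0.1), (0.5, 0.2), (0.5, 0.2), (0.5, 0.2), (0.5, 0.2)))
```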
Thanks for listening
Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
Resources
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
YC Lighting Rendered at 25
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Letrsquos take a closer look
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
Enhance (comparison screenshots): RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%
Results
- Chroma artifacts incurred from YC lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
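The alternating checkerboard can be sketched as a tiny CPU-side helper, in the spirit of the compact YCoCg frame buffer [Mavridis 12]. This is our illustration, not code from the talk: each pixel stores Y plus one chroma component, and flipping the pattern with frame parity exposes both Co and Cg at every pixel across two frames.

```c
/* Which chroma component a pixel stores in the chroma-subsampled
   YC buffer. Returns 1 for (Y, Co) and 0 for (Y, Cg). Adjacent
   pixels store complementary components, and adding the frame
   parity flips the checkerboard each frame so temporal filtering
   sees both components over two frames.
   (Illustrative helper; names are ours, not from the talk.) */
int storesCo(int x, int y, int frame) {
    return (x + y + frame) & 1;
}
```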
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from luminance intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
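For reference, the RGB to YCoCg transform the bullets above rely on, sketched CPU-side in C. The coefficients are the standard YCoCg ones; the function names are ours:

```c
/* RGB -> YCoCg: Y is a weighted luminance (weights sum to 1),
   Co/Cg are orange and green chroma axes; grey inputs map to
   zero chroma. The inverse below reconstructs RGB exactly. */
void rgbToYCoCg(const double rgb[3], double ycocg[3]) {
    ycocg[0] =  0.25 * rgb[0] + 0.5 * rgb[1] + 0.25 * rgb[2]; /* Y  */
    ycocg[1] =  0.5  * rgb[0]                - 0.5  * rgb[2]; /* Co */
    ycocg[2] = -0.25 * rgb[0] + 0.5 * rgb[1] - 0.25 * rgb[2]; /* Cg */
}

void yCoCgToRgb(const double ycocg[3], double rgb[3]) {
    double tmp = ycocg[0] - ycocg[2];  /* Y - Cg */
    rgb[0] = tmp + ycocg[1];           /* Y - Cg + Co */
    rgb[1] = ycocg[0] + ycocg[2];      /* Y + Cg      */
    rgb[2] = tmp - ycocg[1];           /* Y - Cg - Co */
}
```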
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]:

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel:

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too:

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
- Could write to an RGBA target and light 2 pixels at once: YC|YC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data:

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
EnhanceRGB Lighting 100
YC Lighting 100 YC Lighting 25
RGB Lighting 25
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van
Waveren Ignacio Castantildeo 2007
[Geldreich 04] Deferred Lighting and Shading
httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004
[Hoffman 09] Deferred Lighting Approaches
httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009
Resources[Shishkovtsov 05] Deferred Shading in STALKER
httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005
[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001
httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game
Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009
[Sousa 13] The Rendering Technologies of Crysis 3
httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013
[Pranckevičius 13] Physically Based Shading in Unity
httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013
[Olsson 11] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011
Resources[Billeter 12] Clustered Deferred and Forward Shading
httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012
[Yang 09] Amortized Supersampling
httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence
Hugues Hoppe 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P
Seidel 2010
[Wronski 14] Temporal Supersampling and Antialiasing
httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014
[Karis 14] High Quality Temporal Supersampling
httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007
Resources[Heitz 14] Understanding the Shadow Masking Function
httpjcgtorgpublished00030203paperpdf Eric Heitz 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel
httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012
[Oren 94] Generalization of Lambertrsquos Reflectance Model
httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994
Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
Implementation
YC Lighting- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg Space
- Access light color in YCoCg Space
- Already have Y from Luminance Intensity Uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg Space
- Schlickrsquos Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted approaches zero at perpendicular
YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]
vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)
float power = pow(10 - vDotH 50)
return (10 - reflectionCoefficient) power + reflectionCoefficient
YC Lighting- YC Schlickrsquos Approximation of Fresnel
vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = pow(10 - vDotH 50)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an
ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD
and ADD from the skipped 3rd component
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Canrsquot conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches
- Future work
YC Lighting- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass Plenty of candidates
- OIT Transparency Composite
- Anti-Aliasing
Tonemapping
YC Lighting- Simple luminance based chroma reconstruction function for radiance data
vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)
vec4 luminance = vec4(a1x a2x a3x a4x)
vec4 chroma = vec4(a1y a2y a3y a4y)
vec4 lumaDelta = abs(luminance - vec4(centerx))
const float SENSITIVITY = 250
vec4 weight = exp2(-SENSITIVITY lumaDelta)
Guard the case where sample is black
weight = step(1e-5 luminance)
float totalWeight = weightx + weighty + weightz + weightw
Guard the case where all weights are 0
return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)
Thanks for listening
Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know
- Contact Josh Paul
- Our very own talent scout joshflooredcom
Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars
Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei
Questionsnickflooredcom
pastasfuture
Resources[WebGLStats] WebGL Stats
httpwebglstatscom 2014
[Moumlller 08] Real-Time Rendering
Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011
[Burley 12] Physically-Based Shading at Disney
httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012
[Karis 13] Real Shading in Unreal Engine 4
httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013
Resources[Pranckevičius 09] Encoding Floats to RGBA - The final
httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014
Implementation
YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from luminance intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular
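For reference, the YCoCg transform this section relies on is linear. A minimal sketch with the standard coefficients (the deck itself does not list them, so treat this as illustrative):

```javascript
// RGB <-> YCoCg, the standard linear transform (illustrative; the deck
// does not list coefficients). Y is luminance; Co/Cg are the two
// chroma axes -- the "vec2 chroma" in the slides.
function rgbToYCoCg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Exact inverse of the transform above.
function yCoCgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Grays map to chroma (0, 0), which is why a chroma-subsampled buffer degrades gracefully on the clean, light-filled interiors described earlier.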
YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
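Because Schlick's approximation is linear in the reflection coefficient and YCoCg is a linear transform of RGB, evaluating Fresnel directly on YCoCg-space F0 matches evaluating in RGB and converting afterward. A numeric sketch of that check (standard YCoCg coefficients assumed; both chroma channels are shown here even though the G-buffer stores only one per pixel):

```javascript
// Standard YCoCg coefficients (illustrative, not quoted from the deck).
const rgbToYCoCg = ([r, g, b]) => [
  0.25 * r + 0.5 * g + 0.25 * b,
  0.5 * r - 0.5 * b,
  -0.25 * r + 0.5 * g - 0.25 * b,
];

// Per-channel Schlick in RGB, as in the slide's fresnelSchlick.
function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * power + c);
}

// YCoCg-space variant: luminance keeps Schlick's full form; chroma
// drops the constant "power" term, decaying to zero (achromatic
// white) at grazing angles.
function fresnelSchlickYCoCg(vDotH, [y, co, cg]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * power + y, co * -power + co, cg * -power + cg];
}
```

The equivalence holds because the chroma rows of the transform sum to zero, so the additive white term of Schlick contributes only to Y.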
YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
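One hypothetical way the YCYC packing could be addressed (a sketch under assumed conventions, not from the deck: texel t's RG half holds pixel 2t, its BA half holds pixel 2t+1):

```javascript
// Hypothetical addressing for a YCYC-packed RGBA target: each texel
// stores two horizontally adjacent YC pixels, so one fragment's write
// lights two pixels. componentOffset 0 -> RG half, 2 -> BA half.
function ycycAddress(pixelX) {
  return { texelX: pixelX >> 1, componentOffset: (pixelX & 1) * 2 };
}

// Pack a scanline of [y, c] pairs into half as many RGBA texels.
function packScanlineYCYC(ycPairs) {
  const texels = [];
  for (let i = 0; i + 1 < ycPairs.length; i += 2) {
    const [y0, c0] = ycPairs[i];
    const [y1, c1] = ycPairs[i + 1];
    texels.push([y0, c0, y1, c1]);
  }
  return texels;
}
```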
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
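For intuition, a JavaScript port of the reconstruction above, with each sample as a [luminance, chroma] pair (SENSITIVITY is read as 25.0 here; the slide's constant is garbled, so treat it as approximate):

```javascript
// Port of the slide's reconstructChromaHDR. a1..a4 are the 4 neighbors
// carrying the chroma channel the center pixel is missing.
function reconstructChromaHDR(center, neighbors /* [[luma, chroma] x4] */) {
  const SENSITIVITY = 25.0; // assumed value; slide text is garbled
  let totalWeight = 0.0;
  let weightedChroma = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Edge-aware weight: neighbors with similar luminance dominate
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard: black samples carry no chroma info
    totalWeight += w;
    weightedChroma += w * chroma;
  }
  // Guard: all weights zero -> return black
  return totalWeight > 1e-5
    ? [center[1], weightedChroma / totalWeight]
    : [0.0, 0.0];
}
```

Neighbors across a luminance edge are exponentially down-weighted, so chroma does not bleed across silhouettes.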
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994
YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too
vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)
float power = exp2((-555473 vDotH - 698316) vDotH)
return vec2(
(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx
reflectionCoefficientYCy -power + reflectionCoefficientYCy
)
YC Lighting- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data
YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth/stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work
YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
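To make the weighting behavior concrete, here is a hedged Python port of the function above (a scalar loop in place of vec4 math; the 25.0 sensitivity constant is assumed from the slide, where the decimal point was lost in export):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    """Port of the GLSL reconstructChromaHDR above.

    center and each neighbor are (luma, chroma) pairs. Neighbors vote on
    the center pixel's missing chroma, weighted by luma similarity so
    chroma does not bleed across high-contrast edges.
    """
    total_weight = 0.0
    chroma_sum = 0.0
    for luma, chroma in neighbors:
        weight = 2.0 ** (-sensitivity * abs(luma - center[0]))
        if luma < 1e-5:      # guard: black samples carry no chroma information
            weight = 0.0
        chroma_sum += weight * chroma
        total_weight += weight
    if total_weight > 1e-5:  # guard: all weights zero
        return (center[1], chroma_sum / total_weight)
    return (0.0, 0.0)
```

A neighbor across a bright edge contributes almost nothing: with a center luma of 0.5, a neighbor at luma 5.0 gets weight 2^(-25 * 4.5), so the reconstructed chroma is effectively the average of the similar-luma neighbors.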
Thanks for listening!
Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
  - Our very own talent scout: josh@floored.com
Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions? nick@floored.com
@pastasfuture
Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014
[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/10.1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, Hans-Peter Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994
Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com
Thanks, Floored Engineering!
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei
Questions?
nick@floored.com
@pastasfuture
Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J. M. P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994