Technology Behind AMD’s “Leo Demo”
Jay McKee, MTS Engineer, AMD
Why Forward Rendering?
● Complex materials
● Multiple light types
● Supports hardware anti-aliasing
● Efficient memory usage
● Supports transparency
● BUT, previously could not support a large number of lights
Forward+ Rendering
● Modified forward renderer.
  ● Add a compute shader for light culling.
  ● Modify the main light loop.
● Lighting and shading are done in the same place, so all information is preserved.
Forward+ Rendering (continued)
● No limits on parameters for lights and materials
  ● Omni
  ● Spot
  ● Cinematic (arbitrary falloffs, barndoor)
  ● BRDF per material instance
● Simple design: concentrate on rendering, not engine maintenance.
Important DX11 features
● Compute shaders
● UAV support
Compute Shaders
● In the Leo demo we use two compute shaders:
  ● One for culling lights.
  ● Another for spawning Virtual Point Lights (VPLs) for indirect lighting.
● Culling 3,072 lights takes 1.7 ms on a high-end GPU.
UAVs
● Array(s) of scene light information.
● Array of u32 light indices for storing start/end lights per tile.
● Array of material instance data.
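As a rough illustration of how the per-tile arrays above might fit together, here is a minimal CPU-side sketch in Python. The function name `pack_tile_lists` and the sample data are assumptions made for illustration; the demo builds these buffers on the GPU.

```python
def pack_tile_lists(per_tile):
    """Flatten per-tile light lists into (starts, ends, indices) arrays.

    starts[t] and ends[t] delimit tile t's slice of the flat index array.
    """
    starts, ends, indices = [], [], []
    for lights in per_tile:
        starts.append(len(indices))
        indices.extend(lights)
        ends.append(len(indices))
    return starts, ends, indices


# Hypothetical data: three tiles; the middle tile is touched by no light.
per_tile = [[0, 2], [], [1, 2, 3]]
starts, ends, indices = pack_tile_lists(per_tile)

# Looking up the lights of tile 2 is a single slice of the flat array.
tile2_lights = indices[starts[2]:ends[2]]
```

Keeping one flat index array with a start/end pair per tile means a tile with many lights costs no more bookkeeping than an empty one.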
Algorithm summary
● Depth Pre-Pass
● Light Culling
  ● Screen divided into tiles. Launch compute shader per tile.
  ● Light info such as position, radius, direction, length passed to light culling compute shader.
  ● Light culling shader projects light bounds to screen-space tiles. Uses scene depth from z pre-pass for z testing against light volumes.
  ● Outputs to UAVs describing the per-tile light list start/end, along with a large UAV holding a u32 array of light indices.
  ● Output UAVs are passed to main light shaders for looping through lights per pixel.
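The culling step above can be sketched on the CPU as follows. This is a simplified Python illustration, not the demo's shader: it assumes each light has already been projected to a screen-space circle with a view-space depth interval, whereas the actual shader tests light volumes against the tile frustum. All names and values are hypothetical.

```python
def circle_overlaps_rect(cx, cy, r, x0, y0, x1, y1):
    # Clamp the circle centre to the rectangle, then compare the
    # squared distance to that closest point against the squared radius.
    nx = min(max(cx, x0), x1)
    ny = min(max(cy, y0), y1)
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def cull_tile(lights, tile_x, tile_y, tile_size, min_z, max_z):
    """Return indices of lights overlapping one tile in XY and in depth.

    Each light is a tuple (x, y, radius, z_near, z_far).
    """
    x0, y0 = tile_x * tile_size, tile_y * tile_size
    x1, y1 = x0 + tile_size, y0 + tile_size
    visible = []
    for i, (lx, ly, radius, z_near, z_far) in enumerate(lights):
        in_xy = circle_overlaps_rect(lx, ly, radius, x0, y0, x1, y1)
        in_z = z_near <= max_z and z_far >= min_z
        if in_xy and in_z:
            visible.append(i)
    return visible


lights = [
    (4, 4, 2, 1.0, 3.0),      # inside tile (0, 0), depth range overlaps
    (100, 100, 5, 1.0, 3.0),  # far away from tile (0, 0) on screen
    (10, 4, 3, 50.0, 60.0),   # overlaps in XY but lies behind the tile's depth range
]
tile_lights = cull_tile(lights, 0, 0, 8, min_z=0.5, max_z=10.0)
```

The third light shows why the depth pre-pass matters here: without the min/max depth test it would be added to the tile's list even though it cannot reach any visible surface in that tile.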
Algorithm summary continued
● Render scene materials
  ● Base light accumulation function
    ● Use screen x, y location to determine tileID
    ● From tileID, get light start and end indices
    ● From start index to end index, loop
    ● Entry is index into light array.
    ● Accumulate light hitting pixel
    ● Returns total direct and indirect light hitting pixel.
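The accumulation steps above can be sketched in Python as follows. The linear falloff and the light tuple layout `(x, y, radius, intensity)` are assumptions chosen to keep the example small; the demo's lights and BRDFs are far richer.

```python
def accumulate_light(px, py, tile_size, tiles_x, starts, ends, indices, lights):
    """Sum the contribution of every light listed for the pixel's tile."""
    # Screen position -> tile ID.
    tile_id = (px // tile_size) + (py // tile_size) * tiles_x
    total = 0.0
    # Walk the tile's slice of the flat light index list.
    for k in range(starts[tile_id], ends[tile_id]):
        lx, ly, radius, intensity = lights[indices[k]]
        dist = ((px - lx) ** 2 + (py - ly) ** 2) ** 0.5
        # Hypothetical linear falloff, zero at the light's radius.
        total += intensity * max(0.0, 1.0 - dist / radius)
    return total


# Two 8x8 tiles side by side, one light per tile (made-up data).
lights = [(4, 4, 8, 1.0), (12, 4, 4, 2.0)]
starts, ends, indices = [0, 1], [1, 2], [0, 1]

left = accumulate_light(4, 4, 8, 2, starts, ends, indices, lights)
right = accumulate_light(12, 6, 8, 2, starts, ends, indices, lights)
```

Each pixel only visits the lights in its own tile's list, which is what keeps the per-pixel loop short even with thousands of lights in the scene.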
Algorithm summary continued
● Material shader
  ● Decides what to do with total incoming light
  ● Passed into material’s BRDF, for example
  ● Uses light accumulation building blocks
● Env. lighting, base light accumulation, BRDF, etc. are put together for final pixel color.
Light Culling Shader Details (1/3)
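// 1. prepare: build the tile frustum and find the tile's depth range
float4 frustum[4];
float minZ, maxZ;
{
    ConstructFrustum( frustum );
    // each thread reduces the depth samples it reads
    minZ = thread_REDUCE(MIN, depth );
    maxZ = thread_REDUCE(MAX, depth );
    // reduce across threads; the result is kept in LDS
    ldsMinZ = SIMD_REDUCE(MIN, minZ );
    ldsMaxZ = SIMD_REDUCE(MAX, maxZ );
    // every thread picks up the tile-wide bounds
    minZ = ldsMinZ;
    maxZ = ldsMaxZ;
}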
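For readers who want to check the depth-bounds step outside a shader, here is a sequential Python sketch of what the reduction computes for one tile. The depth values and the tile size of 2 are made up for the example.

```python
def tile_depth_bounds(depth, tile_x, tile_y, tile_size):
    """Return (min_z, max_z) over the depth samples of one tile.

    depth is a list of rows; this is the sequential equivalent of the
    per-thread and per-group MIN/MAX reductions in the shader.
    """
    rows = depth[tile_y * tile_size:(tile_y + 1) * tile_size]
    samples = [
        d
        for row in rows
        for d in row[tile_x * tile_size:(tile_x + 1) * tile_size]
    ]
    return min(samples), max(samples)


# A 4x2 depth buffer split into two 2x2 tiles (hypothetical values).
depth = [
    [0.1, 0.2, 0.9, 0.8],
    [0.3, 0.4, 0.7, 0.6],
]
left_bounds = tile_depth_bounds(depth, 0, 0, 2)
right_bounds = tile_depth_bounds(depth, 1, 0, 2)
```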
Light Culling Shader Details (2/3)
__local u32 ldsNLights = 0;
__local u32 ldsLightBuffer[MAX];
// 2. overlap check, accumulate in LDS
for(int i=threadIdx; i<nLights; i+=WG_SIZE)
{
    Light light = fetchAndTransform( lightBuffer[ i ] );
    if( overlaps( light, frustum ) && overlaps ( light, minZ, maxZ ) )
    {
        AtomicAppend( ldsLightBuffer, i );
    }
}
Light Culling Shader Details (3/3)
// 3. export to global
__local u32 ldsOffset;
if( threadIdx == 0 )
{
    // thread 0 reserves space for this tile's lights and records the range
    ldsOffset = AtomAdd( ldsNLights );
    globalLightStart[tileIdx] = ldsOffset;
    globalLightEnd[tileIdx] = ldsOffset + ldsNLights;
}
// all threads copy the tile's light indices from LDS to the global buffer
for(int i=threadIdx; i< ldsNLights; i+=WG_SIZE)
{
    int dstIdx = ldsOffset + i;
    globalLightIndexBuffer[dstIdx] = ldsLightBuffer[i];
}
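The export step can be simulated sequentially in Python to show why tiles may finish in any order. In this sketch the variable `counter` stands in for a global atomic counter, which is an assumption about what `AtomAdd` updates; the tile lists, processing order, and capacity are made up.

```python
def export_tiles(tile_lists, order, capacity):
    """Write each tile's light list into one shared index buffer.

    Tiles are processed in the given order; each reserves a contiguous
    range by reading the counter and then advancing it.
    """
    counter = 0  # stand-in for the global atomic counter
    starts = [0] * len(tile_lists)
    ends = [0] * len(tile_lists)
    index_buffer = [0] * capacity
    for tile in order:
        tile_lights = tile_lists[tile]
        offset = counter              # value returned by the atomic add
        counter += len(tile_lights)   # counter after the atomic add
        starts[tile] = offset
        ends[tile] = offset + len(tile_lights)
        index_buffer[offset:offset + len(tile_lights)] = tile_lights
    return starts, ends, index_buffer, counter


tile_lists = [[5, 7], [1], [2, 3, 4]]
starts, ends, index_buffer, used = export_tiles(tile_lists, order=[2, 0, 1], capacity=8)
```

Because every tile records its own start and end, the order in which tiles reserve their ranges does not affect what the pixel shader reads back.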
// BaseLighting.inc
// THIS INC FILE IS ALL THE COMMON LIGHTING CODE
StructuredBuffer<float4> LightParams : register(u0);
StructuredBuffer<uint> LowerBoundLights : register(u1);
StructuredBuffer<uint> UpperBoundLights : register(u2);
StructuredBuffer<int2> LightIndexBuffer : register(u3);

// Maps a pixel's screen position to the index of the tile that contains it.
uint GetTileIndex(float2 screenPos)
{
    float tileRes = (float)m_tileRes;
    // number of tiles per row, rounded up
    uint numCellsX = (m_width + m_tileRes - 1)/m_tileRes;
    uint tileIdx = floor(screenPos.x/tileRes)+floor(screenPos.y/tileRes)*numCellsX;
    return tileIdx;
}
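As a worked check of the tile index arithmetic in GetTileIndex, here is the same calculation in Python. The screen width of 1280 and tile resolution of 8 are assumed values for the example; the slides do not state the demo's tile size.

```python
def get_tile_index(x, y, width, tile_res):
    """Mirror of GetTileIndex: row-major tile index for a screen position."""
    # Round up so a partial tile at the right edge still counts.
    num_cells_x = (width + tile_res - 1) // tile_res
    return int(x // tile_res) + int(y // tile_res) * num_cells_x


# With width 1280 and 8-pixel tiles there are 160 tiles per row, so the
# pixel at (17.5, 9.5) lies in column 2 of row 1.
idx = get_tile_index(17.5, 9.5, width=1280, tile_res=8)

# A width that is not a multiple of the tile size gets one extra column.
cells_for_1283 = (1283 + 8 - 1) // 8
```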
Light Accumulation Pseudo-code
Light Accumulation (2):
StartHLSL BaseLightLoopBegin // THIS IS A MACRO, INCLUDED IN MATERIAL SHADERS
    uint tileIdx = GetTileIndex( pixelScreenPos );
    uint startIdx = LowerBoundLights[tileIdx];
    uint endIdx = UpperBoundLights[tileIdx];

    [loop]
    for ( uint lightListIdx = startIdx; lightListIdx < endIdx; lightListIdx++ ) {
        int lightIdx = LightIndexBuffer[lightListIdx];

        // Set common light parameters
        float ndotl = max(0, dot(normal, lightVec));

        float3 directLight = 0;
        float3 indirectLight = 0;
Light Accumulation (3):
        if( lightIdx >= numDirectLightsThisFrame ) {
            CalculateIndirectLight(lightIdx , indirectLight);
        } else {
            if( IsConeLight( lightIdx ) ) { // <<== Can add more light types here
                CalculateDirectSpotlight(lightIdx , directLight);
            } else {
                CalculateDirectSpherelight(lightIdx , directLight);
            }
        }

        float3 incomingLight = (directLight + indirectLight)*ndotl;
        float shadowTerm = CalcShadow();
EndHLSL

StartHLSL BaseLightLoopEnd
    }
EndHLSL
Material Shader Template:
#include "BaseLighting.inc"

float4 PS ( PSInput i ) : SV_TARGET
{
    float3 totalDiffuse = 0;
    float3 totalSpec = GetEnvLighting();

    $include BaseLightLoopBegin

    // unique material code goes here!! Light accumulation on the pixel for a given light
    // we have total incoming light and direct/indirect light components as well as material params and shadow term
    // use these building blocks to integrate lighting terms
    totalDiffuse += GetDiffuse(incomingLight);
    totalSpec += CalcPhong(incomingLight);

    $include BaseLightLoopEnd

    float3 finalColor = totalDiffuse + totalSpec;
    return float4( finalColor, 1 );
}
Debug Mode Demo
Benchmark
3k dynamic lights
Compute-based Deferred vs. Forward+
[Bar chart: time (ms), axis from 0 to 20, split into Prepass, Light processing, and Final shading, for Forward+(L), Forward+(H), Deferred(L), and Deferred(H)]
Takahiro Harada, Jay McKee, Jason C. Yang, Forward+: Bringing Deferred Lighting to the Next Level, Eurographics Short Paper (2012)
Depth Pre-Pass Critical
● Pixel overdraw cripples this technique, so a depth pre-pass is required.
● The depth pre-pass is a good opportunity to use MRT to generate other full-screen data needed for post-fx and other render fx (optional).
Other important points
● The Xbox 360 has good bandwidth, so given the limitations of forward rendering, deferred makes a lot of sense there.
● However, ALU computation is growing at a faster rate than bandwidth, so it is more and more feasible to just do the calculations than to read/write so much data.
● Dynamic branching penalties are not nearly as bad as before. As an optimization, the compute shader can sort by light type, for example, to minimize penalties.
● All that "light management" CPU-side code that decides which lights hit each object for setting constant registers can be ditched!
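The sort-by-light-type optimization mentioned above could look like the following Python sketch. The slides do not say how the demo implements it, so this is only an illustration; the type names and sample data are hypothetical.

```python
def sort_tile_by_type(tile_indices, light_types):
    """Order one tile's light indices so lights of the same type are adjacent.

    sorted() is stable, so lights of the same type keep their relative order.
    """
    return sorted(tile_indices, key=lambda i: light_types[i])


# light_types[i] is the type of light i (made-up data).
light_types = ["spot", "omni", "spot", "omni"]
grouped = sort_tile_by_type([0, 1, 2, 3], light_types)
```

With the list grouped this way, neighbouring iterations of the per-pixel light loop take the same branch more often.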
Summary
● Modified forward renderer that handles scenes with 1000s of lights.
● Hardware anti-aliasing (MSAA) is “automatic”.
● Bandwidth friendly.
● Makes the most of the GPU's ALU power (which is growing faster than bandwidth).
Thanks!
Contact: [email protected]@[email protected]
Leo Demo website: http://developer.amd.com/samples/demos/pages/AMDRadeonHD7900SeriesGraphicsReal-TimeDemos.aspx
Eurographics 2012: 'Forward+: Bringing Deferred Lighting to the Next Level'