Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Modern Graphics Engine DesignModern Graphics Engine DesignSim Dietrich
NVIDIA [email protected]
OverviewModern Engine Features
Modern Engine Challenges
Scene ManagementCulling & Batching
Geometry ManagementCollision Structures
Shader Systems
Example Engine Design
Modern Graphics Engine Features
High polygon count for added visual complexityNot just to make things ‘smoother’
Some form of bump mapping for more surface detailFrom single-light dot3To general diffuse / specular / aniso per-pixel lighting
Some form of shadowsFrom simple blobby discs under charactersTo full shadow map or shadow volume for each light
More Features
Particle system for splashes, sparks, etc
Decal System for blood, scorch marks, etc.
Performance & Visual ScalabilityGame should look good on the newer cards
1280x1024 x 4X AA + 4X AnisoGame should look ‘ok’ on the older cards
800x600 x 2X AAAnd run well on both at the appropriate resolution and Anti-Aliasing settings
Challenges
To get achieve visually rich scenes, there must be several visually interesting objects
To get acceptable frame rates, the number of draw calls in a frame should be low
< 500 per frame for good frame ratesThis is a CPU limitation
The API & Driver must do a little work every time you make a render call to draw somethingMany calls doing a little CPU work add up to a lot of CPU work
Challenges
ComplexityModern engines are able to lose some complexity compared to engines a few years ago
Software TransformSoftware Rasterization
But, there are plenty of new things to worry aboutHigh poly-count worlds in low memoryRealistic characters & animationShader ManagementHigh Framerates
Scene Mangement
There are about 5 different game engine sections that need access to the geometry in the scene
CullingRenderingCollisionDecalsAI
Scene ManagementCulling – View Frustum Culling
Also from light’s point of view for some shadow approaches
RenderingMay need to render from multiple points of view for radar, shadows, etc.
CollisionMay be simplified version of the rendered geometry
DecalsIf these are done by re-rendering scene triangles, need per-triangle collision
AIThe computer needs some spatial awarenessFor path-finding, tactical understanding, etc.
Scene Management - Culling
Goal : To quickly identify groups of triangles that can be culled out efficiently
Typically inside a bounding volumeBSP Leaf, Sphere, Bounding Box
There is a tradeoff between culling efficiency and CPU efficiency :
The ultimate culling efficiency would cull each triangle individuallyThe ultimate CPU efficiency would draw the entire world in one draw call
The trick is to group most of your scene in large, easy-to-cull chunks
Scene Management - Culling
In this scene, a world section is broken into a grid with ~300 triangle cells
Highlighted area represents one such 3D Cell
Probably too few tris for CPU batch efficiency
Scene Management - Culling
Make bounding boxes too small, and clipping creates many extra triangles & vertices
Make bounding boxes too large, and you end up sending down too much off-screen geometry
Can also create per-material AABoxes
Instanced GeometryStore a Axis-Aligned Bounding box, AA Cylinder or Sphere for each Instance for cullingDon’t cull individual bone groups except for very expensive and close-to-camera characters
Scene Management - Culling
Particle SystemsStore a bounding volume for each group of particlesCull entire group as a unitAlso try to draw as a unit for efficiencyIf particles don’t affect gameplay, can also avoid calculating off-screen systems
Scene Management - Decals
There are several popular approaches for creating decals for bullet holes, scorch marks, blood drops, etc.One approach renders little pieces of geometry to represent the bullet hole, etc.
Upside : Low fillrate for small decals
Downside : Needs to be clipped so that it doesn’t hang over a corner
Downside : May Z fight with geometry, need bias
Scene Management - Decals
Another method uses texture mapping to apply the decal by re-rendering scene polygons with the texture applied
Upside : No need to clip to corners
Upside : No depth fighting if you use the exact same geometry as used for rendering
Downside : Large polygons cost fillrate for many decals
Scene Management - Decals
Either approach requires finding the exact triangles the decal touches
Either for clipping the geometry decals to the scene geometryOr re-rendering them with the decal texture
Therefore, the engine must support being able to quickly find a group of nearby polygons on which to apply the decal
This has implications for the collision system…
Scene Management - Decals
Highlighted area indicates triangles possibly covered by decal shadow
Amount of extra fillrate burned is more for less-tessellated geometry
So, more vertices can save fillrate on decals
Scene Management - Collision
Efficient collision with world & mesh data is a challenge in a modern engine
Many more polygons for required visual richness
Standard BSP approaches won’t cut itOnly for very simple walls & floors will a leaf-based BSP sufficeThe more polygons in the scene, the greater the penalty for splits
Scene Management - CollisionOne of the main problems with a standard BSP or KD-Tree ( axis aligned BSP ) is the depth of the tree
Consider every time you follow a pointer, you can assume a CPU cache miss
The deeper your tree, the more cache misses you will takeCache misses can be more expensive than intersection testsTherefore, shallower tree types will perform better on high-polygon-count scenes
Two ways to make a tree shallow :High Branching Factor – QuadTree, Octree
More children per nodeStore multiple items in one Leaf
Scene Management - Collision
This KD-Tree or BSP has 2 levels, the leftmost root and the rightmost children
The Quadtree only also needs 2 levels for this scene
Scene Management - Collision
Of course, if the scene is arranged differently, the KD tree or BSP tree can cope better by adjusting where the split planes go.Standard Quadtrees and Octrees don’t do this, so require more levels. Variations with rectilinear cells, as on the right, can cope better
Scene Management - Collision
An alternative is the Axis-Aligned Bounding Box Tree
Good example in Game Programming Gems 2Also Short Tutorial on FlipCode
This Tree contains a hierarchy of AA Bounding Boxes which contain all of the geometry
The AABox Tree is not meant to represent empty space like a grid, but instead to just tightly contain the triangles
Scene Management - Collision
The basic idea is to somehow divide the # of triangles in a node in half at each step, but without clipping them to the Split Plane
Root NodeDashed line is Split Plane
Scene Management - Collision
The triangle centroid is compared to the ‘Split Plane’ This way each triangle only lives in one nodeNo clipping to increase polycounts
Important for collision more than for rendering
Left Child in BlueRight Child in RedDots are Triangle Centroids
Scene Management - Collision
Each node in tree contains a Axis-Aligned Bounding box, and two childrenEach child may be a Node or a LeafLeafs contain the triangle data or triangle idsCan create tree down to individual triangle level
Requires compression of nodes & bounding boxes to avoid too much memory – see GPG2
Alternatively create down to a small # of triangles per leaf, like 8 – 20
All triangles in leaf will be nearby in memoryArgues against storing tri ids, and just vertex indices
Scene Management - Collision
The ‘Split Plane’ must be intelligently chosen to create a nicely balanced treeOne approach is to create the AABB tree top-down
Create a parent node and find the AABoxcontaining all trianglesSplit the node somehow into two childrenEach child gets some of the trianglesEach child’s AABox may overlap its siblingRecurse into each child until
The # of triangles is small enoughOr the volume of the AABox is small enough
How To A Node Split into 2 Children?
A good approach is to pick the largest axis of the AAbox containing all triangles in the parent nodeThen sort the triangles by their centroid with respect to the AABox’s largest axis
A tall AABox would have its triangles sorted by centroid.y
Now you can go through exactly half the triangles in the sorted list and give them to the left child, and the assign the rest to the right child
This gives a median distribution, which guarantees a O(log n) search time
Scene Management - Collision
Modern engines will increasingly use non-splitting, looser trees with larger numbers of triangles per leaf
Looser trees, like AABB Trees, and loose Octreesdon’t split, so they don’t increase collision polysneedlessly
A dozen or so triangles per leaf reduce cache misses and amortize the memory cost of the bounding box and node information
Scene Management - AI
The AI also needs some view of the sceneBut it should be probably be separate from the rendering & collision views of the worldShould probably a simpler, more symbolic view of the world than a collision structure
The AI will use raycasting and other spatial queries, so this should be fast
Line Of SightEnemies In Range
Scene Management - Rendering
Question : What is the most expensive render state change?
Scene Management - Rendering
Question : What is the most expensive render state change?
Answer : The one that caused you to make more draw calls.
Scene Management - Rendering
Question : What is the most expensive render state change?
Answer : The one that caused you to make more draw calls.
In general, sort by the item with the most useful coherency.
Scene Management - Rendering
You can use the GPU to reduce the number of draw calls needed for your scene.
Use per-vertex data to encode shading parametersReduces need to set vertex shader constantsReduces need to switch vertex shaders
Example : Indexed Palette Skinning
Idea : Apply per-vertex index to other things like lighting, occlusion, etc.
Scene Management - Rendering
Use textures to encode shading parametersReduces need to set pixel shader constantsReduces need to switch pixel shaders
Example : Put gloss into the alpha of your normal map, instead of setting it via SetPixelShaderConstant()
Idea : Encode 4 light occlusion terms into a lightmap, and draw all 4 shadowed lights in one pass
Lighting and Shadows
Your choices for Lighting and Shadowing will largely dictate the look and speed of your gameHow dynamic is your lighting?
Totally Static Precompute per-vertex or lightmaps
Partially Dynamic Lights can change color & intensity, but can’t moveBuild per-light occlusion term into vertex or texture
Totally DynamicPerform plenty of CPU raycasting for shadowingUse GPU-assisted shadows like shadow maps or shadow volumes
Lighting Tradeoffs
Low
Low
LowPixel Cost
Too many raycasts for CPU
More lights cost during updates
Any # of lights and shadows free
Comment
LowProhibitiveDynamic Lightmapsw/ Shadows
LowHigh, at least when light changes
Dynamic Lightmaps
LowLow, if using texture pages
Static Lightmaps
Vertex CostCPU CostTechnique
Lighting Tradeoffs
High
Low
Medium
Pixel Cost
Limited to 3 lights per surface
Lights Can only change color
Limits # of lights to 4 or so
Comment
LowHigh, for batch count and silhouettes
Stencil Shadows on CPU
MediumLowPer-Vertex Occlusion
LowLow, if using texture pages
Occlusion Maps
Vertex Cost
CPU CostTechnique
Lighting Tradeoffs
Low
Medium
High
Pixel Cost
Only infinite lights & no animation
Aliasing Artifacts
Limited to 3 lights per surface
Comment
MediumLowPRT via Spherical Harmonics
MediumLowDepth Shadow Maps
Very HighMedium for batch size
Stencil Shadows on GPU
Vertex Cost
CPU CostTechnique
Shader Management
There are two main ways to handle shaders, depending on your type of game
Open-Ended - Artist-driven from within the level editor
Highly FlexibleUse HLSL / .FX files to manage complexitySomewhat Complex to support many shader types
Use Annotations to identify shader parametersCan create explosion of shaders if not carefulSwitching shaders often is not good for reducing draw calls
Shader Management
Unified Shader Model – Driven from the Engine Capabilites and/or Game Needs
Fewer, more specific, optimized shaders Practical to do C++ coding to set up shaders
Can still use .fx files, but not needed as muchShaders are from a more limited set of choicesGood for higher framerates by limiting maximum # of draw calls due to shader changesMust build shader parameters into geometry & textures to get the speed benefit
Questions So Far?
Test Engine Overview
Test Engine - OverviewTop level of the scene is a 3d Grid of 16x16x16 meter cells
Triangles are clipped to the gridEach Cell has a Vertex and an Index BufferAABBTree of collision triangles matching the tessellation of the rendering trianglesAlso a vector of material records
Contains Index Buffer range for trianglesContains AABox for triangles with material
List of Moving EntitiesContains AABoxContains reference to mesh data for rendering only
Test Engine - Overview
Advantage to breaking the world into large cellsEfficient for cullingCan share same VB and IB without going over 65K vertex or triangle limits
Can use 16-bit indices for IBCan use 16-bit indices for AABB tree
Can compress AABB boxes in tree to 16 or 8 byte per axis and still have good precisionCan more quickly reject moving entities in other cellsCan restrict lighting to only 7 lights per tile
Many Features, Few Draw Calls
An entire world cell is drawn in one draw callUp to 7 LightsDiffuse & Specular Bump MappingSoft shadowsGloss-Mapped, Color Shifted SpecularMasked EmissiveWater or Fog Depth stored in Dest Alpha
Fog, Mist and Water are a partial alpha passBlend in fog layer colors based on dest alpha
Shadows & Deep Water
Shadows for world geometry are pre-calculated for up to 7 lights
Per-Light occlusion terms are stored in diffuse.rgba and specular.rgb per vertex
The vertex shader calculates the 7 L vectorsScales the L vectors down by the occlusion & attenuation termsPerforms per-vertex N.LAdds up scaled L vectors
Test Engine - Lighting
Averaged L Bump Mapping
The Pixel Shader uses this averaged L vector to perform bump mapping – ‘Averaged L Bump Mapping’
Bump mapping is nice, but not worth it to do for many lights
This way, if lights change intensity or turn on and off, the bumps respond to the most intense lights
The bump mapping still corresponds to the scene’s lighting, but no need to do up to 7 rendering passes for 7 lights
Averaged L Vectors for “Averaged L Bump Mapping”
Metallic Specular