One of the key challenges of modern 3D game rendering engines powering the next generation of console games is to minimize resources spent on assets that do not actually contribute to the user experience. More specifically, determining which surfaces are hidden behind (occluded by) other surfaces can be a very hard problem to solve in real time, but will typically yield significant performance gains. Real-time occlusion culling typically requires either a vast amount of manual labor or a computationally intensive pre-processing step. In this talk, I will show how the occluder generation step can actually be considered embarrassingly parallel, and distributed across multiple nodes accordingly. I will also discuss how this model can be further improved.
Embarrassingly Parallel Computation for Visibility
Jasin Bushnaief
Umbra Software
Who are we?
• The only occlusion culling middleware company in the world
• Founded in 2006
• Based in Helsinki
• 12 people
• Customers: Bungie (Halo), Guerrilla (Killzone), Remedy (Alan Wake), Bioware (Mass Effect), CD Projekt (Witcher), ArenaNet (Guild Wars) and many more
We’re going to talk about
• The past
  – Brief introduction to occlusion culling
  – Traditional methods of visibility computation
• The present
  – Umbra’s visibility computation algorithm
  – How it can be distributed
• The future
  – Challenges of modern games and engines
SO, WHAT’S OCCLUSION CULLING ANYWAY?
The Past:
Graphics in games
• Game development process:
  – Artists create content
  – Engine runtime renders it
• Rendering
  – Content consists of objects
  – Which consist of triangles
  – Which get rendered by the GPU
• Our business: rendering optimization
Occlusion culling explained
• ”Culling is the process of removing breeding animals from a group based on specific criteria.” (Wikipedia)
• Hidden surface removal: ”Which surfaces do not contribute to the final rendered image on the screen?”
• Some popular HSR methods:
  – Frustum culling
  – Backface culling
  – Occlusion culling
Occlusion culling explained
• Occlusion culling: ”Which surfaces are blocked (occluded) by other surfaces?”
• Depth buffering is one way to do OC
  – Very accurate (i.e. pixel level)
  – Ubiquitous on hardware, easy problem to solve
  – Occurs very late in the pipeline
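The per-pixel nature of depth buffering can be illustrated with a minimal software sketch. The function name, the fragment tuple layout and the "smaller depth is closer" convention are assumptions made for illustration; real GPUs do this in fixed-function hardware.

```python
import math

def rasterize_with_depth_test(fragments, width, height):
    """Keep, for every pixel, the color of the closest fragment.

    fragments: iterable of (x, y, depth, color); smaller depth = closer
    (an assumed convention for this sketch).
    """
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        # The depth test: a fragment survives only if it is closer
        # than whatever was already written to this pixel.
        if z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = c
    return color

# Two fragments compete for pixel (0, 0); the closer "crate" wins.
fragments = [(0, 0, 5.0, "wall"), (0, 0, 2.0, "crate"), (1, 0, 3.0, "wall")]
image = rasterize_with_depth_test(fragments, width=2, height=1)
```

Note that the occluded "wall" fragment was still generated and tested before being rejected, which is why this approach counts as occurring late in the pipeline.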
Occlusion culling explained
• Higher-level methods complement depth-buffering nicely
• These cull entire objects, groups of objects or entire sections of the scene
  – Not easy!
• The earlier, the better
Occlusion culling
Only the objects visible to the camera are rendered
”Traditional” way to do OC
• Preprocess:
  – Divide scene into cells
  – Compute visibility between cells
    • Results in a visibility matrix (PVS)
• Runtime:
  – Locate the camera
  – Do a lookup into the PVS matrix
Simple example
Split scene into cells
(Figure: scene split into four cells A, B, C and D)
Compute visibility (sampling)
    A  B  C  D
A   1  1  1  0
B   1  1  0  1
C   1  0  1  1
D   0  1  1  1
(The matrix is filled in one row at a time across the slides: first A, then B, C and D.)
Runtime PVS culling
    A  B  C  D
A   1  1  1  0
B   1  1  0  1
C   1  0  1  1
D   0  1  1  1
(Figure: cells A, B, C and D)
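The runtime side of PVS culling can be sketched as follows, using the matrix from the example. The 2x2 cell layout, the coordinates and the function names are assumptions for illustration only; the slides do not specify how cells are placed.

```python
CELLS = ["A", "B", "C", "D"]

# Visibility matrix from the example: PVS[row][i] == 1 means
# cell CELLS[i] is potentially visible from cell `row`.
PVS = {
    "A": [1, 1, 1, 0],
    "B": [1, 1, 0, 1],
    "C": [1, 0, 1, 1],
    "D": [0, 1, 1, 1],
}

# Hypothetical layout: (xmin, ymin, xmax, ymax) per cell on a 2x2 grid.
CELL_BOUNDS = {
    "A": (0, 1, 1, 2),
    "B": (1, 1, 2, 2),
    "C": (0, 0, 1, 1),
    "D": (1, 0, 2, 1),
}

def locate_cell(camera_pos):
    """Step 1: find the cell that contains the camera."""
    x, y = camera_pos
    for name, (xmin, ymin, xmax, ymax) in CELL_BOUNDS.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return name
    return None

def visible_cells(camera_cell):
    """Step 2: look up that cell's row in the PVS matrix."""
    return [c for c, flag in zip(CELLS, PVS[camera_cell]) if flag]

camera_cell = locate_cell((0.5, 0.5))   # falls inside cell C
to_render = visible_cells(camera_cell)  # cells A, C and D
```

The lookup itself is trivial; all of the cost sits in the preprocess that produced the matrix.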
Problem?
• Solving visibility between cells is very difficult
  – E.g. solving analytically is actually O(n^4)
• Global operation by nature
• Doesn’t play well with dynamic scenes
  – Worst case: a change in one cell requires recomputation of the entire matrix
UMBRA DOES IT BETTER
The Present
Welcome to the 2010s
• Modern game worlds are huge
• So it’d be cool if you didn’t need the entire scene in memory, ever
• It’d be even cooler if the heavy lifting could be distributed. Or sent to the Cloud™
• Buildings collapse. Things change.
The Umbra approach
• Don’t actually compute visibility for the entire scene
• Instead, process geometry to create a data structure that solves visibility at runtime
• Portal culling in the runtime
Data generation
• Data = portal graph
• Generate local graphs individually for reasonably-sized geometry chunks (tiles), in parallel
• Combine the results into a global portal graph that can be quickly traversed
• Solve visibility quickly in the runtime using this graph
Will this work?
• Portal generation
  – Is very hard, but possible to do automatically
  – Only local geometry needed
  → Pretty much an embarrassingly parallel problem
• Runtime
  – Not as simple as a PVS lookup, but still quite fast
Simple example revisited
Split geometry into tiles
(Figure: geometry split into Tile 0, Tile 1, Tile 2 and Tile 3)
Dispatch tiles to worker nodes
(Figure: Tile 0, Tile 1, Tile 2 and Tile 3 dispatched to worker nodes)
Generate portals
Combine portal graph
Runtime query: traverse portals
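A runtime portal query can be thought of as a graph traversal that starts in the camera's cell and only crosses portals the camera can see through. The sketch below is a deliberately simplified illustration, not Umbra's algorithm: the graph layout, the names and the `portal_is_visible` callback are all assumptions, and a real implementation would narrow the view volume at each portal instead of using a yes/no test.

```python
from collections import deque

def traverse_portals(graph, start_cell, portal_is_visible):
    """Breadth-first walk over a portal graph.

    graph: dict mapping cell -> list of (portal_id, neighbor_cell).
    portal_is_visible: hypothetical callback deciding whether the
    camera can see through a given portal.
    """
    visible = {start_cell}
    queue = deque([start_cell])
    while queue:
        cell = queue.popleft()
        for portal, neighbor in graph.get(cell, []):
            if neighbor not in visible and portal_is_visible(portal):
                visible.add(neighbor)
                queue.append(neighbor)
    return visible

# Four tiles in a row, linked by portals p01, p12 and p23.
graph = {
    "tile0": [("p01", "tile1")],
    "tile1": [("p01", "tile0"), ("p12", "tile2")],
    "tile2": [("p12", "tile1"), ("p23", "tile3")],
    "tile3": [("p23", "tile2")],
}
blocked = {"p23"}  # assume the camera cannot see through this portal
visible = traverse_portals(graph, "tile0", lambda p: p not in blocked)
```

Everything behind the blocked portal, here tile3, is never visited and therefore never rendered.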
What did we do here?
• Essentially a map-reduce
  – Split scene into distributable tiles
  – Generate local portal graph for each tile
  – Combine results, link global portal graph
(Diagram: Scene → Map → Tile 0, Tile 1, ..., Tile n → Portals 0, Portals 1, ..., Portals n → Reduce → Global portal graph → Query (runtime) → Visible objects)
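The map-reduce structure described above can be sketched in a few lines. Everything here is a toy stand-in: the tile representation (an id plus the set of sides with an opening), `generate_local_portals` and `combine` are hypothetical names, and a thread pool stands in for the worker nodes of a real distributed setup.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_local_portals(tile):
    """Map step (hypothetical): needs only the tile's own geometry."""
    tile_id, open_sides = tile
    return tile_id, sorted(open_sides)

def combine(local_results):
    """Reduce step: link neighboring tiles whose facing sides are open."""
    by_id = dict(local_results)
    graph = {tile_id: [] for tile_id in by_id}
    for tile_id, sides in by_id.items():
        right = tile_id + 1
        if "east" in sides and "west" in by_id.get(right, []):
            graph[tile_id].append(right)
            graph[right].append(tile_id)
    return graph

# Four tiles in a row; tile 2 has no opening towards tile 3.
tiles = [(0, {"east"}), (1, {"west", "east"}), (2, {"west"}), (3, {"west"})]

# Each tile is processed independently, so the map step parallelizes
# trivially; pool.map preserves the input order of the tiles.
with ThreadPoolExecutor(max_workers=4) as pool:
    local_results = list(pool.map(generate_local_portals, tiles))

global_graph = combine(local_results)
```

Because the map step never looks outside its own tile, no communication between workers is needed until the combine step, which is what makes the problem embarrassingly parallel.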
THE NEXT GENERATION
The Future
Turns out...
• Even the initial ”map” is too much for large game worlds
• A global graph of a vast world is too expensive in the runtime
• You need to support multiple versions of some chunks for dynamic content
  – Quite a combinatorial problem
→ Next-gen games require an even better solution!
So we did something like this
(Diagram: Tile 0, Tile 1, Tile 2, Tile 3, ..., Tile n → Portals 0, Portals 1, Portals 2, Portals 3, ..., Portals n; in the runtime: Combine → Graph A → Query → Visible objects, and Combine → Graph B → Query → Visible objects)
Got rid of ”map”
Split up ”reduce”, moved to runtime
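One way to picture a "reduce" that has been split up and moved to the runtime is to keep the per-tile portal data separate and link only the tiles a given query needs. The sketch below assumes exactly that; the function name, the data layout and the choice of which tiles go into Graph A and Graph B are hypothetical, since the slides do not say how tiles are selected.

```python
def combine_at_runtime(tile_portals, active_tiles):
    """Link only the requested tiles into a small graph (hypothetical).

    tile_portals: dict mapping tile id -> ids of tiles its portals lead to.
    active_tiles: the tiles currently loaded for this query.
    """
    active = set(active_tiles)
    return {
        tile: [n for n in tile_portals[tile] if n in active]
        for tile in active_tiles
    }

# Per-tile portal data, generated offline and stored separately.
tile_portals = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Two independent runtime graphs, each built from a subset of tiles.
graph_a = combine_at_runtime(tile_portals, [0, 1])
graph_b = combine_at_runtime(tile_portals, [2, 3])
```

Under this assumption no global graph ever has to exist in memory, and swapping in a different version of one tile only affects the small graphs that include it.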
Questions?