Realtime Rendering
Karljohan Lundin Palmerius
About Me● Karljohan Lundin Palmerius● Ph.D. in Scientific Visualization● Research on multimodal visualization
Course Contents● Realtime Rendering
– goals, conditions, concepts, algorithms● Parallel Processing
– architectures and programming● Graphics Hardware
– mobile graphics, shading languages● Self study course
Course ContentsRealtime Rendering KJ
RtR cont. KJ
Parallel Architectures MC
Parallel Programming MC
GPU Shaders JJ
Mobile Graphics AH
GPU Shaders cont. JJ
Summary or Buffert KJ
Course Contents● Laborations – deadline 071019
– Parallel processing graphics– GLSL shader programming– Hand in report, code and executable– Individual or in pair
● Essay or Project – deadline 071019– Project/essay suggestion/topic by 070921– Individual or in pair
Lecture Contents● What is Realtime rendering● Achieve speed and flexibility
– concepts, theories and algorithms– shadows, culling– basic cheating, advanced cheating
Realtime Rendering● Computer graphics with a timeconstraint
– “Realtime” is relative– Computer games, interactive visualization– Limited resources, time limits and other demands
● Concepts:– do as much as you can!– don't do more than you can!
Realtime Rendering
Data
VisualizationProcesses
Interaction
Issue 1● Do as much as you can!
Do as much as you can!● Acceleration
– software acceleration● code optimization● parallelization
– hardware acceleration● dedicated hardware● hardware optimized graphics approach
Software Acceleration● Code optimization
– algorithm complexity– inner loop optimization– caching / cache optimization
Code Optimization● Algorithm complexity
– Simple operation many times● Keep down the number of times – ordo notation● c.f. radiosity vs gouraud shading
– Algorithm optimization
O N2O N
Code Optimization● Inner loop optimization
– Simple operation many times● Keep the cost low
– Lowlevel optimization● LUT or replace cost intensive functions
– pow, arctan, sincos – Taylor Polynomials● CPU specific routines● Stall minimization● Manual assembly coding
Code Optimization● caching / cache optimization
– Fit as much as possibleas close as possible to CPU
● memory coherency / block data● L1 ~ 16 kB L2 ~ 1 MB
RAM ~ 1 GB HD ~ 1 TB– advanced HPC
CPU
L1 Cache
L2 Cache
RAM
L3 Cache
Hard Drive
Parallelization● Independent operation in parallel
– trivial● unsychronized read● screen space multiplex
– nontrivial● synchronized write● object space multiplex
● Workload balancing
Hardware Acceleration● Hardware optimized algorithms
– simple operations in parallel pipelines– polygon rasterization approach
● Specialized hardware– GPU, PPU– Hardware interface
● Driver● API: OpenGL, Direct3D, PhysX
Graphics HardwareVertex Positions Vertex Connections
Vertex Operations
Rasterization
Textures
Primitives
Fragment Operations
Buffer
Primitives
Blending
Image Data
Hardware Acceleration● General Purpose Hardware
– Multicore x86● 2 x Quad core = 8 virtual CPUs● TILE64 = 64 virtual CPUs
– FieldProgrammable Gate Array(FPGA)
– Cell Broadband Engine● 1 st Power Processing Element● 8 st Synergistic Processing Elements
Hardware Acceleration● General Purpose Hardware
– Hardware Interface● Compiler / compiler extensions● Optimization tools
Hardware Acceleration● Caveats
– Software distribution● unknown hardware support● resulting in disabled functions● driver may simulate features
– Hard to realize● design algorithms● implement in program
Fast Algorithms● Fast lighting and shadows
Shadowing Algorithms● Global illumination (radiosity)
– really, really slow● Raytracing
– very slow● Hardware accelerated
– Shadow Volume, Shadow MappingShadow Texture, Etc...
Global Illumination
Scene
Connectivity Radiosity Light Maps
O N2 O N2 logN
Lighting
Rendering
ON
Global Illumination● Hierarchical radiosity
– group patches far away– group patches in early iterations
● (Quasi) MonteCarlo radiosity● Transillumination planes
– intermediate transspatial radiosity map– can be hardware accelerated
Raytracing● for each buffer row
for each row pixelfor each object
check for intersection
.=O W⋅H⋅N
Raytracing● Speedup
– imagespace (trivial) parallelization– collision search speedup– frametoframe coherency
● Close to interactive with current hardware
Projected Planar Shadows● Only planar surfaces
● Project object and colour black
x⋅n=d
p'=xL−dn⋅xL
n⋅p−xLp−xL
Shadow Volumes● For each light source
– polygon shadow volume● Multipass render
1) render all in shadow2) render shadow polygons
– to mask buffer3) render lighting masked
.=O N!
Shadow Volumesgenerate shadow polygons
render scene with ambient and emission lighting
for each front face shadow polygon
if Zbuffer test passesthen increment stencil buffer value
for each shadow polygon back face
if Zbuffer test passesthen decrement stencil buffer value
render lit scene with stencil buffer mask
Shadow Volumes● Shadow volume estimation
– computationally expensive– dependent on scene complexity
Shadow Mapping● Completely imagespace algorithm!
1) Render scene from light source > depth map2) Render scene from viewpoint (use map)– check each fragment against depth buffer from light
.=ON
Shadow Mappingfor each light
render scene to light's depth buffer
for each fragment
transform fragment into each light's buffer space
if fragment depth < light buffer depththen render fragment lit by lightelse render fragment shadowed
Shadow Mapping● Precision limit artifacts
– depth buffer precision is limited=> self shadowing
– solution● substract bias from Zvalue● glPolygonOffset
Shadow Mapping● Resolution mismatch artefacts
– depth fragment size in image space
– solutions● percentage closer filtering● perspective shadow maps
d=d s
r sr i
coscos
r s
r i
Shadow Mapping
Shadow Texture● Hack!
– texture shadow onto object
glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);
Issue 2● Don't do more than you can!
Don't do more than you can!● Framerate maintenance
– Remove unnecessary work– Flexibility
● adjust rendering to fit conditions● graphics hardware, memory, CPU
– Degrading● use graceful degradation● adaptive or predetermined degradation
Prepare Your Data● Pre processing
– prepare and organize● hierarchical scenes● data structures
– when?● compiletime● initializationtime● runtime
Design
Compilation
Initialization
Runtime
Culling● Remove objects that will not affect the scene
– behind the viewer (near/far plane)– outside view frustum– behind another object
Culling● Bounding volumes
– cull groups with bounding spaces outside the view– bounding
● sphere● axisaligned bounding box (AABB)● oriented bounding box (OBB)
– should be easy to work with– should have small void space
Bounding Volumes
Bounding Volume Culling
Scenegraph● Hierarchical rendering● Groups● Local transformation
– wheel relative car– car relative earth– earth relative sun
Earth transform
Car transform
Wheel transforms
Wheel
Hierarchical Bounding Volumes
Scene Partitioning● Rooms are very nice
– limited contents– occluding walls– static relation to other rooms
● Intelligent door placement– automatic closing– cellsandportals
Scene Partitioning
Scene Partitioning● Scene splitting benefits
– intelligent culling● visibility lists● realtime visibility tests
– scene preloading● objects and textures
– detail adjustments● physics simulation● artificial intelligence
Automatic Partitioning● Binary Space Partitioning (BSP)
– recursive space division– results in hierarchy with convex leaves– results in “BSP tree”
● Breaking the Walls (BW)– two pass partitioning– supports outdoor and dynamic scenes
BSP Algorithm
BW Algorithm● First pass
– traverse edges (walls)● e.g. counter clockwise
– local greedy algorithm● choose shortest path● adjacent wall or portal
BW Algorithm● Second pass
– merge cells with long portals– split over estimated cells
● cells that can be split● split at concave corners
Partitioning Algorithms
Scene Partitioning● Alternative partitioning
– merge BSP cells by visibility lists– regular grid blocks– quadtree
Cheating
Cheating● Don't use more effort than necessary
– psychophysical aspects (perception)● speed, contrast, (eccentricity)● fog, (depthoffield)
– projected size
Cheating
Cheating● Object impact
– speed, contrast, projected size– use bounding volume projected size– analyze prerendered image of object
● Simplifications– polygonal resolution– texturing– shaders
Object Resolution● Levelofdetail (LoD)
– use as low resolution as possible– select resolution at rendertime
Textures● Replace geometrical details with texture
– clothes, faces and ground
Textures● Imagebased rendering
– Use photographs of things● Lighting● Material and colours● Textures (microstructure)
Fragment Shaders● Hardware accelerated details
– each fragment individually estimated– realtime updated texturing– bumpiness, details, etc.– relief texturing
Sprites and Impostors● Sprite
– often billboard – polygon facing the camera– image/photograph/video instead of object– commonly plants, grass, people
● Impostors– replace geometry with image
● prerendered / dynamically generated
Adaptive Degradation● Adjust degradation at runtime
– Predetermine degradation● time buffer● priority queue
– depending on distance, size, velocity● buy quality for each object
– Dynamic degradation● adjust quality if framerate is high or low● hysteresis
LevelofDetails
LevelofDetails● Adaptive cheating
– select model resolution at runtime– usually distance specific
● Issues– generating detail levels– error estimation / level selection– pop / popup artifacts– resolution mismatch
Pop Artifacts● Too large of a change
– object morphing – change in shape– creeping texture– popping attracts eye
● Remidy– gradual change is invisible– change at sub visible size (resolution)
Resolution Mismatch● Part of different resolution don't match
– Texture resolution change
– Polygon patches
MIPmap● Texture resolution issues
– Aliasing at distance – moiré patterns– Unnecessarily high resolution– Too low resolution
● Solution– Multum In Parvo (MIP map)– Multiple texture resolutions– Hardware support in OpenGL
MIPmap
MIPmap
Clipmap● Clipped MIPmap (SGI)
– Full texture at low resolution– Only partial texture at high resolution
LoD Polygonal Objects● Multiple resolution versions
– Generate multiple resolutions– Select right resolution
Decimation
Decimation
Decimation
Switch
Decimation
Decimation● Vertex clustering
– Cluster vertices using regular grid● Replace cluster with single vertex● Update faces accordingly
– Poor result● low quality● no triangle count specification● no error measure
75 ~28%
Decimation● Remove vertices to simplify model
– Vertex removal, edge contraction
– Remove vertex/edge until● exceeding error threshold● reaching triangle count
108 ~20%
Quadric Error Metric● “Quadric Decimation”
– Edge / nonedge contraction– Associate vertex with error quadric Q
– Use heap with vertices and costs
v =vTQv
Q1,2=Q1Q2
cost=v 'T Q1Q2 v '
v '=Q1,2−1 0,0,0,1
T
Quadric Error Metric● Initial Quadric Error
– approximate plane distance square error
v = ∑p∈planes v
pT v 2
= ∑p∈planes v
vT p pT v
= ∑p∈planes v
vT p pT v
=vT
∑p∈planes v
p pT
v
⇒Q= ∑p∈planes v
p pT
Adaptive Decimation● What if different parts are at different distance?
– Parts of same object requires different resolution– MIPmapping for textures– Adaptive decimation for polygonal data
● e.g. ground meshes
Adaptive Decimation● Alternatives
– Patches– Adaptive decimation
● ROAM● SOAR
● Criteria– distance, frustum culling,
decimation error, lineofsight
ROAM● Realtime Optimally Adapting Meshes
– Triangle bintree– Timedependent priority– Priority queues
● Incremental changes● Split● Merge
ROAM● Forced split
ROAM
ROAM● Error Metrics
– Nested wedges – hierarchical error bound– Screen distortions – project wedge on screen– Check lineofsight
● intersection with wedge – modify triangle priority– View frustum, backface detail reduction,
specular highlights, silhouette edges,atmospheric obscurance, object positioning
SOAR● Stateless, Onepass Adaptive Refinement
– Longest edge intersection, 4k mesh, ...– Multiresolution mesh forms DAG of vertices
● children belong to higher resolution level● nested error terms ensure active parent of active vertex
SOAR● Pros
– Simple to implement– No forced split– High memory coherency
● Cons– No frametoframe coherency
The end