Karljohan Lundin Palmerius - weber.itn.liu.se

Preview:

Citation preview

Real­time Rendering

Karljohan Lundin Palmerius

About Me● Karljohan Lundin Palmerius● Ph.D. in Scientific Visualization● Research on multi­modal visualization

Course Contents● Real­time Rendering

– goals, conditions, concepts, algorithms● Parallel Processing

– architectures and programming● Graphics Hardware

– mobile graphics, shading languages● Self study course

Course ContentsReal­time Rendering KJ

RtR cont. KJ

Parallel Architectures MC

Parallel Programming MC

GPU Shaders JJ

Mobile Graphics AH

GPU Shaders cont. JJ

Summary or Buffert KJ

Course Contents● Laborations – deadline 07­10­19

– Parallel processing graphics– GLSL shader programming– Hand in report, code and executable– Individual or in pair

● Essay or Project – deadline 07­10­19– Project/essay suggestion/topic by 07­09­21– Individual or in pair

Lecture Contents● What is Real­time rendering● Achieve speed and flexibility

– concepts, theories and algorithms– shadows, culling– basic cheating, advanced cheating

Real­time Rendering● Computer graphics with a time­constraint

– “Real­time” is relative– Computer games, interactive visualization– Limited resources, time limits and other demands

● Concepts:– do as much as you can!– don't do more than you can!

Real­time Rendering

Data

VisualizationProcesses

Interaction

Issue 1● Do as much as you can!

Do as much as you can!● Acceleration

– software acceleration● code optimization● parallelization

– hardware acceleration● dedicated hardware● hardware optimized graphics approach

Software Acceleration● Code optimization

– algorithm complexity– inner loop optimization– caching / cache optimization

Code Optimization● Algorithm complexity

– Simple operation many times● Keep down the number of times – ordo notation● c.f. radiosity vs gouraud shading

– Algorithm optimization

O N2O N

Code Optimization● Inner loop optimization

– Simple operation many times● Keep the cost low

– Low­level optimization● LUT or replace cost intensive functions

– pow, arctan, sincos – Taylor Polynomials● CPU specific routines● Stall minimization● Manual assembly coding

Code Optimization● caching / cache optimization

– Fit as much as possibleas close as possible to CPU

● memory coherency / block data● L1 ~ 16 kB L2 ~ 1 MB

RAM ~ 1 GB HD ~ 1 TB– advanced HPC

CPU

L1 Cache

L2 Cache

RAM

L3 Cache

Hard Drive

Parallelization● Independent operation in parallel

– trivial● unsychronized read● screen space multiplex

– non­trivial● synchronized write● object space multiplex

● Work­load balancing

Hardware Acceleration● Hardware optimized algorithms

– simple operations in parallel pipelines– polygon rasterization approach

● Specialized hardware– GPU, PPU– Hardware interface

● Driver● API: OpenGL, Direct3D, PhysX

Graphics HardwareVertex Positions Vertex Connections

Vertex Operations

Rasterization

Textures

Primitives

Fragment Operations

Buffer

Primitives

Blending

Image Data

Hardware Acceleration● General Purpose Hardware

– Multi­core x86● 2 x Quad core = 8 virtual CPUs● TILE64 = 64 virtual CPUs

– Field­Programmable Gate Array(FPGA)

– Cell Broadband Engine● 1 st Power Processing Element● 8 st Synergistic Processing Elements

Hardware Acceleration● General Purpose Hardware

– Hardware Interface● Compiler / compiler extensions● Optimization tools

Hardware Acceleration● Caveats

– Software distribution● unknown hardware support● resulting in disabled functions● driver may simulate features

– Hard to realize● design algorithms● implement in program

Fast Algorithms● Fast lighting and shadows

Shadowing Algorithms● Global illumination (radiosity)

– really, really slow● Raytracing

– very slow● Hardware accelerated

– Shadow Volume, Shadow MappingShadow Texture, Etc...

Global Illumination

Scene

Connectivity Radiosity Light Maps

O N2 O N2 logN

Lighting

Rendering

ON

Global Illumination● Hierarchical radiosity

– group patches far away– group patches in early iterations

● (Quasi) Monte­Carlo radiosity● Trans­illumination planes

– intermediate transspatial radiosity map– can be hardware accelerated

Raytracing● for each buffer row

for each row pixelfor each object

check for intersection

.=O W⋅H⋅N

Raytracing● Speed­up

– image­space (trivial) parallelization– collision search speed­up– frame­to­frame coherency

● Close to interactive with current hardware

Projected Planar Shadows● Only planar surfaces

● Project object and colour black

x⋅n=d

p'=xL−dn⋅xL

n⋅p−xLp−xL

Shadow Volumes● For each light source

–  polygon shadow volume● Multi­pass render

1) render all in shadow2) render shadow polygons

– to mask buffer3) render lighting masked

.=O N!

Shadow Volumesgenerate shadow polygons

render scene with ambient and emission lighting

for each front face shadow polygon

if Z­buffer test passesthen increment stencil buffer value

for each shadow polygon back face

if Z­buffer test passesthen decrement stencil buffer value

render lit scene with stencil buffer mask

Shadow Volumes● Shadow volume estimation

– computationally expensive– dependent on scene complexity

Shadow Mapping● Completely image­space algorithm!

1) Render scene from light source ­> depth map2) Render scene from viewpoint (use map)– check each fragment against depth buffer from light

.=ON

Shadow Mappingfor each light

render scene to light's depth buffer

for each fragment

transform fragment into each light's buffer space

if fragment depth < light buffer depththen render fragment lit by lightelse render fragment shadowed

Shadow Mapping● Precision limit artifacts

– depth buffer precision is limited=> self shadowing

– solution● substract bias from Z­value● glPolygonOffset

Shadow Mapping● Resolution mismatch artefacts

– depth fragment size in image space

– solutions● percentage closer filtering● perspective shadow maps

d=d s

r sr i

coscos

r s

r i

Shadow Mapping

Shadow Texture● Hack!

– texture shadow onto object

glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

Issue 2● Don't do more than you can!

Don't do more than you can!● Frame­rate maintenance

– Remove unnecessary work– Flexibility

● adjust rendering to fit conditions● graphics hardware, memory, CPU

– Degrading● use graceful degradation● adaptive or pre­determined degradation

Prepare Your Data● Pre processing

– prepare and organize● hierarchical scenes● data structures

– when?● compile­time● initialization­time● run­time

Design

Compilation

Initialization

Run­time

Culling● Remove objects that will not affect the scene

– behind the viewer (near/far plane)– outside view frustum– behind another object

Culling● Bounding volumes

– cull groups with bounding spaces outside the view– bounding

● sphere● axis­aligned bounding box (AABB)● oriented bounding box (OBB)

– should be easy to work with– should have small void space

Bounding Volumes

Bounding Volume Culling

Scene­graph● Hierarchical rendering● Groups● Local transformation

– wheel relative car– car relative earth– earth relative sun

Earth transform

Car transform

Wheel transforms

Wheel

Hierarchical Bounding Volumes

Scene Partitioning● Rooms are very nice

– limited contents– occluding walls– static relation to other rooms

● Intelligent door placement– automatic closing– cells­and­portals

Scene Partitioning

Scene Partitioning● Scene splitting benefits

– intelligent culling● visibility lists● real­time visibility tests

– scene pre­loading● objects and textures

– detail adjustments● physics simulation● artificial intelligence

Automatic Partitioning● Binary Space Partitioning (BSP)

– recursive space division– results in hierarchy with convex leaves– results in “BSP tree”

● Breaking the Walls (BW)– two pass partitioning– supports outdoor and dynamic scenes

BSP Algorithm

BW Algorithm● First pass

– traverse edges (walls)● e.g. counter clockwise

– local greedy algorithm● choose shortest path● adjacent wall or portal

BW Algorithm● Second pass

– merge cells with long portals– split over estimated cells

● cells that can be split● split at concave corners

Partitioning Algorithms

Scene Partitioning● Alternative partitioning

– merge BSP cells by visibility lists– regular grid blocks– quad­tree

Cheating

Cheating● Don't use more effort than necessary

– psychophysical aspects (perception)● speed, contrast, (eccentricity)● fog, (depth­of­field)

– projected size

Cheating

Cheating● Object impact

– speed, contrast, projected size– use bounding volume projected size– analyze pre­rendered image of object

● Simplifications– polygonal resolution– texturing– shaders

Object Resolution● Level­of­detail (LoD)

– use as low resolution as possible– select resolution at render­time

Textures● Replace geometrical details with texture

– clothes, faces and ground

Textures● Image­based rendering

– Use photographs of things● Lighting● Material and colours● Textures (microstructure)

Fragment Shaders● Hardware accelerated details

– each fragment individually estimated– real­time updated texturing– bumpiness, details, etc.– relief texturing

Sprites and Impostors● Sprite

– often billboard – polygon facing the camera– image/photograph/video instead of object– commonly plants, grass, people

● Impostors– replace geometry with image

● pre­rendered / dynamically generated

Adaptive Degradation● Adjust degradation at run­time

– Pre­determine degradation● time buffer● priority queue

– depending on distance, size, velocity● buy quality for each object

– Dynamic degradation● adjust quality if frame­rate is high or low● hysteresis

Level­of­Details

Level­of­Details● Adaptive cheating

– select model resolution at run­time– usually distance specific

● Issues– generating detail levels– error estimation / level selection– pop / pop­up artifacts– resolution mis­match

Pop Artifacts● Too large of a change

– object morphing – change in shape– creeping texture– popping attracts eye

● Remidy– gradual change is invisible– change at sub visible size (resolution)

Resolution Mis­match● Part of different resolution don't match

– Texture resolution change

– Polygon patches

MIP­map● Texture resolution issues

– Aliasing at distance – moiré patterns– Unnecessarily high resolution– Too low resolution

● Solution– Multum In Parvo (MIP map)– Multiple texture resolutions– Hardware support in OpenGL

MIP­map

MIP­map

Clip­map● Clipped MIP­map (SGI)

– Full texture at low resolution– Only partial texture at high resolution

LoD Polygonal Objects● Multiple resolution versions

– Generate multiple resolutions– Select right resolution

Decimation

Decimation

Decimation

Switch

Decimation

Decimation● Vertex clustering

– Cluster vertices using regular grid● Replace cluster with single vertex● Update faces accordingly

– Poor result● low quality● no triangle count specification● no error measure

75 ~28%

Decimation● Remove vertices to simplify model

– Vertex removal, edge contraction

– Remove vertex/edge until● exceeding error threshold● reaching triangle count

108 ~20%

Quadric Error Metric● “Quadric Decimation”

– Edge / non­edge contraction– Associate vertex with error quadric Q

– Use heap with vertices and costs

v =vTQv

Q1,2=Q1Q2

cost=v 'T Q1Q2 v '

v '=Q1,2−1 0,0,0,1

T

Quadric Error Metric● Initial Quadric Error

– approximate plane distance square error

v = ∑p∈planes v

pT v 2

= ∑p∈planes v

vT p pT v

= ∑p∈planes v

vT p pT v

=vT

∑p∈planes v

p pT

v

⇒Q= ∑p∈planes v

p pT

Adaptive Decimation● What if different parts are at different distance?

– Parts of same object requires different resolution– MIP­mapping for textures– Adaptive decimation for polygonal data

● e.g. ground meshes

Adaptive Decimation● Alternatives

– Patches– Adaptive decimation

● ROAM● SOAR

● Criteria– distance, frustum culling,

decimation error, line­of­sight

ROAM● Real­time Optimally Adapting Meshes

– Triangle bin­tree– Time­dependent priority– Priority queues

● Incremental changes● Split● Merge

ROAM● Forced split

ROAM

ROAM● Error Metrics

– Nested wedges – hierarchical error bound– Screen distortions – project wedge on screen– Check line­of­sight

● intersection with wedge – modify triangle priority– View frustum, backface detail reduction,

specular highlights, silhouette edges,atmospheric obscurance, object positioning

SOAR● Stateless, One­pass Adaptive Refinement

– Longest edge intersection, 4­k mesh, ...– Multiresolution mesh forms DAG of vertices

● children belong to higher resolution level● nested error terms ensure active parent of active vertex

SOAR● Pros

– Simple to implement– No forced split– High memory coherency

● Cons– No frame­to­frame coherency

The end

Recommended