Enhancing locality in ray tracing algorithms
612 presentations by
Vidhyashankar Venkataraman
Biswanath Panda
Introduction
Two-lecture series
We will be discussing two methods to preserve locality by processing data groups that are likely to be accessed at the same time
Today: a locality-aware algorithm in ray tracing
What is Computer Graphics (CG)?
Generating images (lots of cool ones in this talk!)
Deals with:
Geometric modeling: the math and physics
Rendering: from model to images
Animation: time-dependent behavior of objects
Applications in games, real-world simulations, CAD
Rendering an image
Produce an image of the scene on the image plane
Three parts:
Geometry modeling
Illumination of objects
Surface complexity: texture
1) Modeling geometry
Regular objects are easy to represent, e.g. a sphere as (R, x, y, z)
Complicated objects are represented by a 'mesh' of polygons
Millions of primitives for a single scene
2) Illumination modeling: shading
Lighting of objects (shading)
Light energy is absorbed, reflected or transmitted
The degree varies with the nature of each object
Expressed separately for R, G, B
Various aspects to think of: diffuse and specular lighting, refraction, shadows
Mathematical models available
Global illumination
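As an illustration of one of the "mathematical models available", here is a minimal sketch of per-light shading with a Lambert diffuse term plus a Phong specular term, computed separately for R, G and B. The choice of the Phong model and the names `phong_shade`, `kd`, `ks` and `shininess` are assumptions for illustration; the slides do not name a specific model.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_shade(normal, to_light, to_eye, kd, ks, shininess, light_rgb):
    """(R, G, B) reflected toward the eye from one light (Lambert diffuse + Phong specular)."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_eye)
    n_dot_l = dot(n, l)
    if n_dot_l <= 0.0:
        return (0.0, 0.0, 0.0)  # light is behind the surface
    # Mirror the light direction about the normal: r = 2(n.l)n - l
    r = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
    spec = max(dot(r, v), 0.0) ** shininess
    # Each color channel has its own diffuse (kd) and specular (ks) coefficient
    return tuple(light * (kd_c * n_dot_l + ks_c * spec)
                 for light, kd_c, ks_c in zip(light_rgb, kd, ks))
```

With the light and eye both straight above the surface, the diffuse and specular terms are both at their maximum.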
3) Surface complexity: texture
To represent surface roughness
The 'jaggedness'
Texture map: a 2-D image mapped onto a 3-D surface
Can add geometric detail that is difficult to model with polygons
Also used to represent complicated surfaces
E.g. marble; reflection of a scene on a complex polished surface
Storage complexity? 100s of KB to 100s of MB!
More pictures…
Rendering an image
The process of converting a 3-D scene to an actual image: projection of the 3-D objects onto an image plane
Global illumination: more realistic
Various methods available: ray tracing, scan-line conversion
Ray tracing
Introduced in 1980 by Turner Whitted
First global illumination algorithm
Insight: to find the color of each pixel, trace backwards
Trace rays from the eye (through each pixel) into the scene
Rays intersect objects and get reflected or transmitted
Handles shadows, reflection, refraction
Algorithm in pictures
[Figures: no intersection; single intersection with an object; the intersected object could be directly illuminated]
Algorithm in pictures
[Figures: shadow region; reflection]
Algorithm in pictures
[Figures: refraction (transmission of rays); multiple reflections]
In short…
Shoot a ray from the eye through a pixel into the scene
Obtain the intersection point, if any
Spawn new rays from that point: towards the light sources (shadow rays) and along the reflected and refracted directions
The color of the pixel is the sum of the light energies carried back along all of them (called the radiance)
The secondary rays also spawn new rays: performed recursively
The algorithm in text
For each pixel (x, y) in the image, generate the corresponding ray in 3-D
Image(x, y) := TraceRay(ray)
TraceRay(ray):
  1) Compute the nearest surface-ray intersection
  2) If none is found, return the background color
  3) Compute direct illumination from each light source
  4) Compute illumination arriving from the reflected direction
  5) Compute illumination arriving from the refracted direction
  6) Combine all illuminations
  7) Return the resulting color
Step 3 involves testing the visibility of each source by shooting a shadow ray towards it
Steps 4 and 5 involve recursive calls to TraceRay using the corresponding rays
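The TraceRay outline above can be made concrete with a small sketch. It assumes a scene made only of spheres, point lights, Lambert shading and a simple reflectivity blend. Refraction (step 5) is omitted to keep it short. The names `Sphere`, `trace_ray`, `render` and `MAX_DEPTH` are illustrative and not taken from the slides.

```python
import math
from dataclasses import dataclass

# Small vector helpers on 3-tuples
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

@dataclass
class Sphere:
    center: tuple
    radius: float
    color: tuple          # diffuse RGB
    reflectivity: float   # 0 = matte, 1 = perfect mirror

BACKGROUND = (0.0, 0.0, 0.0)
MAX_DEPTH = 4   # curbs the depth of the ray tree
EPS = 1e-6      # ignore hits this close to the ray origin (self-intersection)

def intersect(origin, direction, sphere):
    """Nearest t > EPS with origin + t*direction on the sphere; direction is unit length."""
    oc = sub(origin, sphere.center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - sphere.radius ** 2)
    if disc < 0:
        return None
    for t in (-b - math.sqrt(disc), -b + math.sqrt(disc)):
        if t > EPS:
            return t
    return None

def nearest_hit(origin, direction, scene):
    """Step 1: brute-force test against every primitive, keeping the closest."""
    best = None
    for s in scene:
        t = intersect(origin, direction, s)
        if t is not None and (best is None or t < best[0]):
            best = (t, s)
    return best

def trace_ray(origin, direction, scene, lights, depth=0):
    hit = nearest_hit(origin, direction, scene)
    if hit is None:
        return BACKGROUND                               # step 2
    t, s = hit
    p = add(origin, scale(direction, t))
    n = normalize(sub(p, s.center))
    color = (0.0, 0.0, 0.0)
    for light_pos, light_rgb in lights:                 # step 3: direct light
        to_light = sub(light_pos, p)
        dist = math.sqrt(dot(to_light, to_light))
        l = scale(to_light, 1.0 / dist)
        blocker = nearest_hit(p, l, scene)              # shadow ray
        if blocker is not None and blocker[0] < dist:
            continue                                    # light source not visible
        k = max(dot(n, l), 0.0)
        color = add(color, tuple(c * lc * k for c, lc in zip(s.color, light_rgb)))
    if s.reflectivity > 0 and depth < MAX_DEPTH:        # step 4: recursive call
        r = sub(direction, scale(n, 2 * dot(direction, n)))
        reflected = trace_ray(p, r, scene, lights, depth + 1)
        # step 6: blend direct and reflected illumination
        color = add(scale(color, 1 - s.reflectivity), scale(reflected, s.reflectivity))
    return color                                        # step 7

def render(width, height, scene, lights):
    """Eye at the origin looking down -z through an image plane at z = -1."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            px = (x + 0.5) / width * 2 - 1
            py = 1 - (y + 0.5) / height * 2
            row.append(trace_ray((0.0, 0.0, 0.0), normalize((px, py, -1.0)), scene, lights))
        image.append(row)
    return image
```

Note that this is the standard depth-first order: each pixel's whole ray tree is finished before the next pixel starts, and every ray may touch any part of the scene.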
The ray tree
Recursive calls represented as a tree
RT: backward tracing
[Figure: first ray-traced image]
Surface-ray intersection
The most important part: finding the closest intersection
Surface primitives: polygons, spheres, cubes
Too expensive to test every surface primitive in the scene: moving GBs of geometry in and out of memory!
Optimizations:
Curb the depth of the tree
Faster and fewer intersection calculations
Bound each object by a volume of some regular shape (sphere / cube)
Spatial subdivision (discussed in next slide)
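The bounding-volume optimization can be sketched as a cheap ray-sphere test run before any per-primitive work. This assumes a loose bounding sphere (centroid plus maximum distance). The names `bounding_sphere`, `ray_hits_sphere` and `candidate_objects` are hypothetical.

```python
import math

def ray_hits_sphere(origin, direction, center, radius):
    """True if the ray origin + t*direction (t >= 0) touches the sphere."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = sum(d * d for d in direction)
    b = sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - a * c        # discriminant of a*t^2 + 2*b*t + c = 0
    if disc < 0:
        return False
    # The farther root must lie in front of the origin
    return (-b + math.sqrt(disc)) / a >= 0

def bounding_sphere(points):
    """Loose bounding sphere: centroid plus the largest distance to it."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = max(math.dist(center, p) for p in points)
    return center, radius

def candidate_objects(origin, direction, objects):
    """Names of objects whose bounding sphere the ray touches; only these
    need the expensive per-primitive tests. (A real renderer would
    precompute the spheres instead of rebuilding them per ray.)"""
    result = []
    for name, points in objects.items():
        center, radius = bounding_sphere(points)
        if ray_hits_sphere(origin, direction, center, radius):
            result.append(name)
    return result
```

A ray that misses the bounding sphere skips every primitive inside it, which is where the savings come from.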
Optimizations: 'voxel' subdivision
[Figures: uniform subdivision; adaptive subdivision (octree)]
A voxel is a 3-D sub-region of a scene
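With uniform subdivision, a ray only needs to test the primitives in the voxels it actually crosses. Those voxels can be enumerated in order with an incremental grid walk, here the 3-D DDA of Amanatides and Woo. The slides do not say which traversal is used, and this sketch covers only the uniform grid, not the octree. The function name and parameters are hypothetical, and the ray origin is assumed to lie inside the grid.

```python
import math

def grid_voxels_along_ray(origin, direction, grid_min, voxel_size, dims, max_steps=1000):
    """Yield (i, j, k) of each uniform-grid voxel the ray passes through, front to back.
    Assumes origin is inside the grid and direction is non-zero."""
    idx = [int((origin[a] - grid_min[a]) // voxel_size) for a in range(3)]
    step, t_max, t_delta = [], [], []
    for a in range(3):
        d = direction[a]
        if d > 0:
            step.append(1)
            boundary = grid_min[a] + (idx[a] + 1) * voxel_size
            t_max.append((boundary - origin[a]) / d)   # t at the next boundary on this axis
            t_delta.append(voxel_size / d)             # t to cross one whole voxel
        elif d < 0:
            step.append(-1)
            boundary = grid_min[a] + idx[a] * voxel_size
            t_max.append((boundary - origin[a]) / d)
            t_delta.append(-voxel_size / d)
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    for _ in range(max_steps):
        if not all(0 <= idx[a] < dims[a] for a in range(3)):
            return                                     # left the grid
        yield tuple(idx)
        a = min(range(3), key=lambda axis: t_max[axis])  # axis whose boundary comes first
        idx[a] += step[a]
        t_max[a] += t_delta[a]
```

Only the primitives stored in the yielded voxels are tested, and the walk can stop at the first voxel that contains a hit.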
Issues in rendering
Pros and cons of RT
Pros:
Almost accurate lighting if the tree is sufficiently deep
Simple algorithm
Cons:
The standard (depth-first, pixel-by-pixel) traversal does not access scene data coherently, which can lead to a large number of page faults and slow rendering
Other rendering algorithm
Scan-line based: can render complex scenes
Inaccurate illumination: very unrealistic
Much faster than RT
Advent of GPUs
Processors exclusively for CG: faster rendering
Parallelism and pipelining
Aggressive prefetching from memory
Examples of scan conversion
[Figures: poor lighting; more use of texture maps]
A more memory-coherent RT algorithm could improve things
Enough of intro…
612 in CG!
Enhance locality in RT to avoid memory issues
Take this! An image with 10 million primitives and 400 MB of geometry involved 2 GB of I/O and took 5 hours to render with RT!
First paper in the two-lecture series: Pharr et al. (SIGGRAPH '97)
Lazy creation of texture and geometry to manage scene complexity: caching in main memory
Increase locality of reference by dynamically reordering the rendering computation
Essential ideas
Statically reorder geometry into voxels of triangles
Remember voxels? Uniform 3-D cubes enclosing some geometry
Maintain a geometry cache
Texture data is prefiltered and cached
Application-level caching
Process one group of rays after another (from a queue)
Rays are partitioned into coherent groups
Calculate illumination wherever rays intersect, possibly spawn new ones and queue them
Terminate when all rays are finished
Block diagram of system
Scheduling of rays: reordering
Goal: process rays in a particular order so as to:
minimize cache misses (here, page faults)
advance the computation towards completion
Each queued ray must be independent of the result or state of other rays
Take advantage of the structure of the illumination computation
Decompose computation
Illumination computation at point x in direction $\omega_r$ has the form:
$$L_o(x, \omega_r) = L_e(x, \omega_r) + \sum_i W(x, \omega_i, \omega_r, \Theta_i)\, L_i(x, \omega_i)$$
where
$L_o$ = outgoing radiance
$L_e$ = emitted radiance
$L_i$ = incoming radiance arriving at x from direction $\omega_i$
$\Theta_i$ = angle between $\omega_i$ and the surface normal at x
W is a factor that depends on the material at x and on whether there is reflection or refraction
Each incoming radiance $L_i(x, \omega_i)$ is itself the outgoing radiance at the next point hit along $\omega_i$, so a light source's contribution to a pixel is the product of the W's along the path: we can successively multiply the W's as we go down the tree!
Decompose computation
Each ray is associated with a weight and a source pixel location
A spawned ray's weight is multiplied by the weight of its parent ray
If a ray hits a light source, its weight is multiplied by the emitted radiance and the result is added to the source pixel
[Figure: ray tree with accumulated weights W1, W1·W2, W3, W3·W4, W3·W5]
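The weight bookkeeping above can be sketched with a symbolic scene graph. The graph is hypothetical: each surface lists `(W, next_target)` pairs and each light has an emitted radiance, surfaces emit nothing, and the graph is assumed acyclic so the queue empties. Rays come out of a plain FIFO queue in any order, yet the pixel receives the same value as the recursive, depth-first evaluation of the sum. That is what makes reordering possible. `Ray`, `render_weighted` and `radiance` are illustrative names.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Ray:
    pixel: tuple    # source pixel (x, y) the ray contributes to
    weight: float   # product of the W factors from the eye down to this ray
    target: str     # hypothetical: name of the surface or light this ray hits

def render_weighted(primary_rays, scene):
    """scene (hypothetical): name -> ("light", emitted_radiance)
                          or ("surface", [(W, next_target), ...]).
    Rays are processed from a FIFO queue; no ray needs its parent's result."""
    image = {}
    queue = deque(primary_rays)
    while queue:
        ray = queue.popleft()
        kind, data = scene[ray.target]
        if kind == "light":
            # Hit a light: add weight * emitted radiance to the source pixel
            image[ray.pixel] = image.get(ray.pixel, 0.0) + ray.weight * data
        else:
            for w, nxt in data:
                # Spawned ray carries the parent's weight times this bounce's W
                queue.append(Ray(ray.pixel, ray.weight * w, nxt))
    return image

def radiance(target, scene):
    """Recursive reference: the same sum computed depth-first, as in TraceRay."""
    kind, data = scene[target]
    if kind == "light":
        return data
    return sum(w * radiance(nxt, scene) for w, nxt in data)
```

For example, if surface A reaches the light directly with W = 0.25 and via surface B with W = 0.5 then 0.5, and the light emits 2.0, both functions give 0.25·2 + 0.5·0.5·2 = 1.0.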
Ray grouping
Closely spaced rays are likely to intersect closely spaced geometry primitives
The scene is uniformly divided into another grid of voxels: the scheduling grid
Each voxel has the following state:
a queue of the rays passing through it
the geometry voxels overlapping it
The scheduler chooses the voxel with the highest ratio of benefit to cost
For each ray in its queue, test for intersection within the voxel:
If there is one, calculate illumination and spawn new rays
Else, queue the ray in the next voxel along its path
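The scheduling-grid loop above might look roughly like this sketch. Ray paths, the hit test and the scoring function are passed in as hypothetical callables (`path_of`, `hits`, `score`) so that only the queueing logic is shown. Shading and spawning at a hit are reduced to recording the voxel where the hit occurred.

```python
from collections import defaultdict, deque

def schedule_rays(initial, path_of, hits, score):
    """initial: dict voxel -> list of ray ids queued there at the start
    path_of(ray): voxels the ray crosses, in order (hypothetical helper)
    hits(ray, voxel): True if the ray hits geometry inside voxel (hypothetical)
    score(voxel, queue): benefit/cost ratio used to pick the next voxel
    Returns dict ray -> voxel where it hit, or None if it left the scene."""
    queues = defaultdict(deque)
    for voxel, rays in initial.items():
        queues[voxel].extend(rays)
    result = {}
    while True:
        ready = [v for v, q in queues.items() if q]
        if not ready:
            return result                                # all rays finished
        # Scheduler: the non-empty voxel with the highest benefit/cost ratio
        voxel = max(ready, key=lambda v: score(v, queues[v]))
        queue = queues[voxel]
        while queue:
            ray = queue.popleft()
            if hits(ray, voxel):
                result[ray] = voxel   # a real system would shade here and queue spawned rays
                continue
            path = path_of(ray)
            pos = path.index(voxel)
            if pos + 1 < len(path):
                queues[path[pos + 1]].append(ray)        # no hit: queue in the next voxel
            else:
                result[ray] = None                       # left the scene without a hit
```

Because every ray in a voxel's queue is tested against the same geometry before moving on, that geometry is used many times while it is resident in memory.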
The algorithm
Issues: size of scheduling voxel
Scheduling voxel: small enough for the overlapping geometry voxels to fit into memory
Non-uniform geometry: can use adaptive subdivision (octree)
Avoid geometry cache misses (page faults):
Schedule voxels that have all their geometry in cache
Defer processing rays that don't have their geometry in cache
This leaves lots of waiting rays: have a ray cache as well
Issues: voxel scheduling
Choose the voxel with the highest ratio of benefit to cost
Cost:
How much of the overlapping geometry is not in cache
Difficult to estimate a priori with lazy loading
Cost drops a lot (by 90%) if all the geometry is in cache
Benefit:
How much it advances the computation towards completion
Number of rays? Their weights?
The weighted sum?
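One possible scoring function following these cost and benefit ideas is sketched below. It is an assumption, not the paper's exact formula. Benefit is the weighted sum of queued rays. Cost counts bytes of overlapping geometry not in cache at full price, and cached geometry at a reduced factor (0.1, echoing the "about 90% cheaper" figure). All names are hypothetical.

```python
import math

def voxel_score(ray_weights, overlapping, in_cache, size_of, cached_cost_factor=0.1):
    """Hypothetical benefit/cost ratio for one scheduling voxel.
    ray_weights: weights of the rays queued in the voxel
    overlapping: ids of the geometry voxels overlapping it
    in_cache: set of geometry-voxel ids already in memory
    size_of: dict geometry id -> size in bytes
    cached_cost_factor: assumed relative cost of geometry already in cache"""
    benefit = sum(ray_weights)  # one of the options listed: the weighted sum
    missing = sum(size_of[g] for g in overlapping if g not in in_cache)
    resident = sum(size_of[g] for g in overlapping if g in in_cache)
    cost = missing + cached_cost_factor * resident
    if cost == 0:
        # No overlapping geometry: rays just pass through at no cost
        return math.inf if benefit > 0 else 0.0
    return benefit / cost
```

With identical rays, a voxel whose geometry is fully cached scores ten times higher than one whose geometry must be loaded, so the scheduler naturally works on what is already in memory.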
Scene cache
Geometry represented as a mesh of triangles
Even spheres and cubes!
Easy to subdivide into voxels
Only one kind of intersection test
Storage of geometry:
Triangle meshes stored as voxels on disk
Tessellated patches also stored as triangles
Procedurally generated geometry
Texture-based data stored as extra geometry
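Because everything is converted to triangles, a single ray-triangle routine suffices. Below is a sketch using the Möller-Trumbore algorithm, a standard choice; the slides do not say which triangle test the system uses.

```python
def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore ray/triangle test. Returns the distance t > eps, or None."""
    def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def cross(a, b): return (a[1] * b[2] - a[2] * b[1],
                             a[2] * b[0] - a[0] * b[2],
                             a[0] * b[1] - a[1] * b[0])
    def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None              # ray is parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv          # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None   # only hits in front of the origin count
```

A ray straight down the -z axis hits a triangle lying in the plane z = -5 at t = 5.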
Scene cache
Size of geometry cache in main memory:
Make the volume of each voxel roughly equal to the size of a block
A few thousand triangles per voxel
Each voxel divided into sub-voxels for ray intersection acceleration
Voxels may not occupy the same amount of memory, so special allocation routines were written to avoid fragmentation
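A geometry cache of voxels under a fixed memory budget can be sketched as follows. Least-recently-used eviction is an assumption here, since the slides do not name a replacement policy. The loader `load_voxel` is a hypothetical stand-in for reading a voxel's triangles from disk, and the fragmentation-aware allocator mentioned above is not modeled.

```python
from collections import OrderedDict

class GeometryCache:
    """Byte-budgeted cache of geometry voxels with LRU eviction (assumed policy)."""

    def __init__(self, capacity_bytes, load_voxel):
        self.capacity = capacity_bytes
        self.load_voxel = load_voxel   # hypothetical: voxel_id -> (triangles, size_bytes)
        self.entries = OrderedDict()   # voxel_id -> (triangles, size_bytes), oldest first
        self.used = 0
        self.misses = 0

    def get(self, voxel_id):
        if voxel_id in self.entries:
            self.entries.move_to_end(voxel_id)   # mark as most recently used
            return self.entries[voxel_id][0]
        self.misses += 1
        triangles, size = self.load_voxel(voxel_id)
        # Evict least recently used voxels until the new one fits
        # (a voxel larger than the whole budget is still admitted)
        while self.entries and self.used + size > self.capacity:
            _, (_, old_size) = self.entries.popitem(last=False)
            self.used -= old_size
        self.entries[voxel_id] = (triangles, size)
        self.used += size
        return triangles

    def contains(self, voxel_id):
        return voxel_id in self.entries
```

The `contains` check is what a cache-aware scheduler would consult to decide whether a scheduling voxel's geometry is already resident.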
Texture cache
Similar to the one proposed earlier by Peachey
Texture data is prefiltered into a set of multi-resolution images, called mip-maps
The image is chosen depending on the resolution at which the texture is viewed
The shading calculation of a pixel makes a small number of accesses to some local part of the texture
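Mip-map prefiltering and level selection can be sketched in a few lines. This assumes a square, power-of-two, single-channel texture and a plain 2×2 box filter. The level is chosen from how many texels one screen pixel spans along an axis. The tiling and caching of the levels is not shown, and the function names are hypothetical.

```python
import math

def build_mipmaps(image):
    """image: 2^n x 2^n list of lists of floats (single channel).
    Returns [level0, level1, ...]; each level halves the resolution
    by averaging 2x2 blocks (a simple box prefilter)."""
    levels = [image]
    while len(levels[-1]) > 1:
        src = levels[-1]
        half = len(src) // 2
        levels.append([[(src[2 * y][2 * x] + src[2 * y][2 * x + 1] +
                         src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4.0
                        for x in range(half)] for y in range(half)])
    return levels

def choose_level(texels_per_pixel, num_levels):
    """If one pixel spans k texels along an axis, level log2(k) has
    roughly one texel per pixel; clamp to the available levels."""
    if texels_per_pixel <= 1.0:
        return 0
    return min(int(round(math.log2(texels_per_pixel))), num_levels - 1)
```

Since a pixel then reads a few neighboring texels from one level, its accesses stay within a small local part of the texture.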
Block diagram
Results
Experiments performed on a 190 MHz MIPS R10000 processor with 1 GB of memory
I/O buffering disabled to make the memory constraints stricter
Scenes occupy between 431 MB and 1.9 GB
Rendered scenes: tree by lake
Maximum of 3.3 million triangles for the tree
Terrain and lake used displacement mapping: many more triangles
Total of 9.6 million primitives: 440 MB needed
677 × 288 resolution
Rendered scenes: office building
Very complex scene with dense occlusion
The office building has two floors with four offices
46.4 million primitives with 1.9 GB of memory
Lit by sunlight and some lights in the ceiling
672 × 384 resolution
Rendered scenes: cathedral
Base of 11K triangles; with displacement map: 5.1 million primitives! A total of 431 MB
576 × 864 resolution
1495 texture maps of 116 MB!
Simple light source
Caching but no reordering
Unlimited cache size, but with lazy loading
Both memory and running time costs decrease
22% memory use reduction in the Cathedral case (data that is never accessed is never loaded)
Only 18% of the total scene accessed in the indoor case
Obvious result!
Caching but no reordering
Performance of geometry caching when DFS ray tracing is used
Limited cache size
Performance decrease not very significant
Scheduling & reordering
Rendering the lake scene
A cache size of 10% of the maximum gives an orders-of-magnitude performance gain
Ray cache of 100K rays (6% of the total number of rays)
80% of scene memory
Scheduling and reordering
Lake scene rendering
Without reordering and with a 325 MB geometry cache: 2.1 GB of I/O!
With reordering and a 50 MB cache: 938 MB in total
Accessed 15-20 times
Average access = 8 times
Conclusions
Enhance locality in RT through caching and reordering
Gives orders-of-magnitude performance gains
The algorithm performs well! The ideas are not very seminal, but the work is!
Future work: experiments could be redone on the IBM Cell processor to confirm the bottlenecks
Designed for the PlayStation 3
4.6 GHz, with specialized SIMD coprocessors…
Next lecture: a static method to perform data grouping