
Foundations of Parallel Computing Project Report

Ray Tracing

Advait Bhatwadekar, Siddhant Reddy

December 5, 2018

1 Introduction

When rendering a computer-generated 3-D environment, it is often, if not always, required that the environment look realistic to the human eye. To achieve this goal it makes sense to simulate the behaviour of light, since light is what our eyes detect. Ray tracing is a technique for approximating the behaviour of light. As we will see, like most computer graphics techniques, ray tracing is parallelizable; in fact, it is classified as an embarrassingly parallel problem.

2 Algorithm

1. First, we define a 3-D environment as a set of graphics primitives (triangles and circles).

2. Then we define the position of the camera (the person looking at the scene) and the camera's center of focus, i.e. the point the camera is looking directly at.

3. We perceive our 3-D environment in 2-D, which means that to display the 3-D scene we have to project it onto a 2-D plane. This plane is called the projection plane. Implementation-wise, the projection plane is a 2-D array placed at a certain distance from the camera (the focal length). The focal length, along with the dimensions of the plane, determines the viewing angle and how much of the environment is visible.

4. Every cell in the projection plane has a location in the virtual environment. The location of a cell and the camera location determine the ray direction. The ray is our approximation of light.

5. We then check which objects in the environment the ray intersects, and whether the intersection point is illuminated by a light source. Then we set the color of the cell through which the ray passed to the color of the object, based on its material properties.


6. Repeat steps 4 and 5 for all cells in the projection plane.

7. The colors set for the cells of the 2-D projection plane are then converted to an image, which is an approximation of how the scene would look in the real world.
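The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the scene (a single sphere), the light position, the helper names and the Lambert shading term are all assumptions made for the example, and the shadow test of step 5 is reduced to the shading term because the scene holds only one object.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def hit_sphere(origin, direction, center, radius):
    """Nearest positive ray parameter t, or None on a miss (unit direction)."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-6 else None

def render(width, height, focal_length=1.0):
    camera = (0.0, 0.0, 0.0)                  # step 2: camera position
    center, radius = (0.0, 0.0, -3.0), 1.0    # step 1: hypothetical one-sphere scene
    light = (5.0, 5.0, 0.0)                   # hypothetical point light
    image = []
    for row in range(height):                 # step 6: visit every cell
        line = []
        for col in range(width):
            # Steps 3 and 4: place the cell on the plane z = -focal_length
            # and aim a ray from the camera through it.
            x = (col + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (row + 0.5) / height * 2.0
            direction = normalize(sub((x, y, -focal_length), camera))
            t = hit_sphere(camera, direction, center, radius)
            if t is None:
                line.append(0.0)              # background
                continue
            # Step 5: shade the hit point by how directly it faces the light.
            point = tuple(o + t * d for o, d in zip(camera, direction))
            normal = normalize(sub(point, center))
            to_light = normalize(sub(light, point))
            line.append(max(0.0, dot(normal, to_light)))
        image.append(line)
    return image                              # step 7: brightness per cell
```

Calling `render(8, 8)` returns an 8x8 grid of brightness values in which the cells near the middle are lit and the corner cells stay at the background value.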

The parallelism in this algorithm comes from the fact that the computations for different cells are completely independent of each other. This means that the computation of each cell can be given to a separate thread, which makes the algorithm parallelizable not only on a CPU but also on a GPU, where more than 1000 threads can easily be launched.
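The independence of cells can be illustrated with a small sketch. The function `shade_cell` is a hypothetical stand-in for tracing one ray; the only property that matters here is that its result depends on nothing but its own cell coordinates, so rows can be handed to a pool of workers in any order. Note that in CPython, threads show the structure of the decomposition but will not by themselves speed up pure Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def shade_cell(row, col):
    # Hypothetical stand-in for tracing one ray: depends only on (row, col).
    return (row * 31 + col * 17) % 256

def render_row(row, width):
    return [shade_cell(row, col) for col in range(width)]

def render_sequential(width, height):
    return [render_row(row, width) for row in range(height)]

def render_parallel(width, height, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in row order even if rows finish out of order.
        return list(pool.map(lambda row: render_row(row, width), range(height)))
```

Because no cell reads another cell's result, `render_parallel` and `render_sequential` produce identical images for any worker count.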

3 Optimizations

3.1 Computation Optimizations

The ray-object intersection check is done through a linear search over all objects present in the world, testing each object for intersection. This is not an optimal approach, since the direction of the ray is not used to eliminate objects that cannot intersect the ray.

This is where the K-d tree comes into the picture. A K-d tree is a space-partitioning data structure. It takes the positions of the objects into consideration and partitions the space so that each partition holds an equal number of objects. Combined with the ray direction, this can eliminate up to half the objects in the environment per iteration of the search. This brings the intersection time complexity from O(n) to O(log₂ n), allowing scenes with millions of objects to be rendered much faster. This data structure is also crucial for a ray tracing quality optimization, photon mapping (Section 3.2).
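A minimal sketch of the balanced partitioning, assuming objects are represented by 3-D points and the split is taken at the median along an axis that cycles with depth. The dictionary layout and function names are hypothetical, and ray traversal is replaced by a simpler point lookup that shows the same halving at every level.

```python
def build(points, depth=0, leaf_size=1):
    """Build a K-d tree as nested dicts, splitting at the median."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % 3                        # cycle through x, y, z
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # equal halves on both sides
    return {
        "axis": axis,
        "split": points[mid][axis],
        "left": build(points[:mid], depth + 1, leaf_size),
        "right": build(points[mid:], depth + 1, leaf_size),
    }

def height(node):
    if "leaf" in node:
        return 0
    return 1 + max(height(node["left"]), height(node["right"]))

def locate(node, p):
    """Descend to the leaf whose region contains p; count the nodes visited."""
    steps = 0
    while "leaf" not in node:
        # Each comparison discards the half of the objects on the other side.
        node = node["left"] if p[node["axis"]] < node["split"] else node["right"]
        steps += 1
    return node["leaf"], steps
```

For 1024 points with distinct coordinates the tree has height 10, so a lookup inspects 10 nodes rather than scanning all 1024 objects.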


3.2 Photon Mapping

Although vanilla ray tracing can produce quite impressive results, it still does not produce very realistic renderings. So we perform a pre-processing step which attempts to simulate the particle behaviour of light before we perform the initial ray tracing. This process is called photon mapping.

Simply put, this involves emitting a large number of photons and recording the intersections of photons and surfaces in a photon map. Implementation-wise, the photon map is a K-d tree. This procedure effectively simulates phenomena such as caustics.

Figure 1: Caustics simulated using photon mapping. Courtesy: wikipedia.org
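The photon emission pass can be sketched as follows. The scene is an assumption chosen to keep the example short: a point light at a given height above an infinite floor at y = 0, with each photon carrying an equal share of the light's power. For brevity the photon map is a plain list rather than a K-d tree, and the density estimate is a brute-force scan.

```python
import math
import random

def emit_photons(n, light_height=2.0, power=100.0, seed=0):
    """Shoot n photons in uniformly random directions; record floor hits."""
    rng = random.Random(seed)
    photon_map = []
    for _ in range(n):
        # Uniform direction on the unit sphere.
        dy = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - dy * dy)
        dx, dz = r * math.cos(phi), r * math.sin(phi)
        if dy >= 0.0:
            continue                        # heading upward: never hits the floor
        t = -light_height / dy              # ray parameter where y reaches 0
        photon_map.append((t * dx, t * dz, power / n))
    return photon_map

def irradiance(photon_map, x, z, radius):
    """Estimate light arriving near (x, z) from photons within the radius."""
    total = sum(p for (px, pz, p) in photon_map
                if (px - x) ** 2 + (pz - z) ** 2 <= radius * radius)
    return total / (math.pi * radius * radius)
```

With enough photons the estimate directly below the light comes out noticeably higher than the estimate a few units away, which is the kind of density variation that produces caustics in a full scene.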

4 Literature Survey

4.1 Parallelizing K-d tree construction on a GPU

K-d trees are usually constructed sequentially, since ray tracing is done for one image. But this is a sequential component of the program and would therefore reduce the efficacy of parallelizing ray tracing. To reduce this sequential component, an algorithm was suggested in [Kun].

Given a bounding box containing all objects, the sequential construction involves:

1. Computing a splitting plane that splits the bounding box in two. The bounding box need not be split in half.

2. Sorting the objects based on the splitting plane.

3. Recursively repeating the above steps for the bounding boxes obtained by the split, until a bounding box contains no more than a certain number of objects.

For a task to take advantage of a GPU, it should be massively parallel, i.e. it should launch at least 1000 threads, to compensate for the overhead of passing data from CPU memory to GPU memory.

So to parallelize the K-d tree construction, we construct the K-d tree in breadth-first (BFS) fashion. Since BFS uses a queue, the threads can create child nodes, push them onto the queue, and then pop nodes from the same queue and perform the same operation.

However, at the beginning there is only one node, so it is not optimal to launch one thread per computed node of the K-d tree. Instead, the authors classify the nodes into two classes, large nodes and small nodes. At the beginning of the tree construction there are few nodes, but each node holds a large number of objects (large nodes). So when a large node is encountered, multiple threads are launched to sort its objects into the child nodes. As we move deeper, the number of nodes increases but the number of objects inside each node decreases (small nodes). At this point one thread can be launched per node of the K-d tree. This basic classification of nodes allows a large number of threads to be used throughout the K-d tree construction. This optimisation allows computation of about 12 ray traced frames per second.
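The breadth-first construction and the large/small classification can be sketched sequentially. This is only an illustration of the control flow, not the GPU algorithm from [Kun]: the thresholds `LARGE_THRESHOLD` and `LEAF_SIZE`, the median split and the node layout are assumptions, and the comments mark where a GPU version would launch threads.

```python
from collections import deque

LARGE_THRESHOLD = 64   # hypothetical cutoff between large and small nodes
LEAF_SIZE = 4          # hypothetical stopping criterion

def build_bfs(points):
    """Build a K-d tree level by level; return (nodes, stats per depth)."""
    nodes = []
    stats = {}                              # depth -> [large count, small count]
    queue = deque([(points, 0)])
    while queue:
        pts, depth = queue.popleft()
        is_small = len(pts) <= LARGE_THRESHOLD
        stats.setdefault(depth, [0, 0])[1 if is_small else 0] += 1
        if len(pts) <= LEAF_SIZE:
            nodes.append({"depth": depth, "leaf": pts})
            continue
        axis = depth % 3
        # Large node: a GPU version would use many threads for this sort.
        # Small node: a GPU version would give the whole node to one thread.
        pts = sorted(pts, key=lambda p: p[axis])
        mid = len(pts) // 2
        nodes.append({"depth": depth, "axis": axis, "split": pts[mid][axis]})
        queue.append((pts[:mid], depth + 1))    # children join the queue
        queue.append((pts[mid:], depth + 1))
    return nodes, stats
```

For 1024 points the statistics show one large node at depth 0 and sixteen small nodes at depth 4, matching the observation that node count grows while node size shrinks as construction proceeds.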

4.2 Progressive Photon Mapping

Photon mapping is a technique to approximate the behaviour of light, and as the number of photons is increased, the realism of the rendered images also increases. However, a greater number of photons means higher memory usage. [Jensen1] suggests progressive photon mapping. In this method we perform photon mapping in multiple iterations. So if we perform photon mapping with 10,000 photons for 1000 iterations, we have effectively done photon mapping with 10,000,000 photons while using the memory of only 10,000 photons. Although this method is slower, it ensures that memory size is not a constraint on the quality of rendering. Memory usage becomes even more vital on GPUs, since a GPU works best when most, if not all, of the data it needs to access is within its own memory.

Figure 2: Increasing fidelity with more photons
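The memory argument can be made concrete with a small sketch. It assumes a toy setting in which each pass scatters photons uniformly over a unit square (`photon_pass` is a hypothetical stand-in for real photon tracing) and each hit point only keeps a running count. The published method additionally shrinks the gather radius from pass to pass; that refinement is left out here.

```python
import random

def photon_pass(n, rng):
    """Hypothetical pass: n photon hits spread uniformly over a unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def progressive_estimate(hit_points, radius, photons_per_pass, passes, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(hit_points)          # the only state kept between passes
    peak_stored = 0
    for _ in range(passes):
        photons = photon_pass(photons_per_pass, rng)
        peak_stored = max(peak_stored, len(photons))
        for i, (hx, hy) in enumerate(hit_points):
            counts[i] += sum(1 for (px, py) in photons
                             if (px - hx) ** 2 + (py - hy) ** 2 <= radius ** 2)
        # The photon list is discarded here; the next pass reuses the memory.
    total_photons = photons_per_pass * passes
    return counts, total_photons, peak_stored
```

Running 20 passes of 1000 photons accumulates statistics from 20,000 photons while never holding more than 1000 in memory, the same trade of time for memory as the 10,000 x 1000 example above.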


4.3 Parallel Progressive Photon Mapping

Progressive photon mapping reduces memory usage; however, if the iterations were performed in parallel, each would hold its own photons at the same time, and we would negate the memory advantage of progressive photon mapping. To fix this issue, [Jensen2] recommends a new data structure, a spatial hash map. With a spatial hash map, the ray tracing step is performed first in order to find the points of intersection of rays and objects. These points of intersection become the keys of the spatial hash map, which replaces the K-d tree used to store photon maps. For each photon-surface interaction, the intersection point within a certain radius of the photon's position is found, and instead of storing all photons for that intersection point, we simply select one of the photons with probability 1/n while increasing the power of the selected photon by a factor of n. This keeps memory usage in check even when multiple threads are used in progressive photon mapping, and finds intersections in O(1) time instead of O(log₂ n).
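The 1/n selection can be sketched with a grid-based hash. This is a loose illustration of the idea as summarized above, not the data structure from [Jensen2]: the class name, the grid cell keying and the one-slot-per-cell layout are assumptions. Each cell stores a single photon plus a count n of photons that landed there; the survivor is chosen so that every photon is equally likely to be kept, and its power is scaled by n on lookup.

```python
import random

class SpatialHash:
    def __init__(self, cell_size, seed=0):
        self.cell_size = cell_size
        self.cells = {}                     # cell key -> [photon, count]
        self.rng = random.Random(seed)

    def key(self, position):
        # Quantize a 3-D position to integer grid coordinates.
        return tuple(int(c // self.cell_size) for c in position)

    def insert(self, position, power):
        k = self.key(position)
        slot = self.cells.get(k)
        if slot is None:
            self.cells[k] = [(position, power), 1]
            return
        slot[1] += 1
        # Replace the stored photon with probability 1/n, so each of the
        # n photons seen so far is equally likely to be the survivor.
        if self.rng.random() < 1.0 / slot[1]:
            slot[0] = (position, power)

    def lookup(self, position):
        """O(1) expected time: one dictionary access, no tree descent."""
        slot = self.cells.get(self.key(position))
        if slot is None:
            return None
        (pos, power), n = slot
        return pos, power * n               # scale power by n to compensate
```

Memory stays at one photon per occupied cell no matter how many photons arrive, and when all photons in a cell carry equal power the scaled value equals their exact sum.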

5 Sequential Ray Tracing

The sequential ray tracing program follows the procedure described in Section 2. Here the environment is hard-coded in the program. The cells of the projection plane are calculated row-wise; each row is added to the image queue, and once ray tracing is complete, the image queue writes its data to an image file.

6 Parallel Ray Tracing

6.1 Single node Parallel

In the single-node parallel program, since we have two for-loops with no sequential dependency, we can parallelize the outer for-loop as a parallelFor using a leapfrog schedule. This keeps all the threads working on the same region of the image at the same time, so a dense region is computed jointly by all the threads.
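The effect of the schedule can be seen by listing which rows each thread receives. The sketch below is written in Python for illustration and does not use the pj2 API; the function names are hypothetical. A fixed (chunked) schedule is included for contrast.

```python
def leapfrog(thread_rank, thread_count, total_rows):
    """Rows rank, rank + K, rank + 2K, ... for K threads."""
    return range(thread_rank, total_rows, thread_count)

def fixed(thread_rank, thread_count, total_rows):
    """Contiguous chunks, shown for contrast."""
    chunk = -(-total_rows // thread_count)  # ceiling division
    start = thread_rank * chunk
    return range(start, min(start + chunk, total_rows))
```

With 4 threads and 10 rows, leapfrog gives thread 1 the rows 1, 5 and 9, while the fixed schedule gives it rows 3, 4 and 5. Under leapfrog, neighbouring rows always belong to different threads, so an expensive band of the image is shared among all of them.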

But we are doing file I/O along with computation, so we keep a dedicated thread that performs file I/O. This is done by wrapping the parallelFor in a parallelDo: the parallelFor computes in one section, while in the second section a writer thread takes the image queue and writes it to the respective file as data arrives.
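The compute-plus-writer arrangement is a producer-consumer pattern, sketched here with Python's standard `threading` and `queue` modules rather than pj2. The pixel formula, the text output format and the choice to buffer rows until they can be written in order are assumptions made for the example.

```python
import io
import queue
import threading

def writer(image_queue, out, total_rows):
    """Consume (row_index, pixels) pairs and write them in row order."""
    pending, next_row = {}, 0
    while next_row < total_rows:
        index, pixels = image_queue.get()   # blocks until a row arrives
        pending[index] = pixels
        while next_row in pending:          # flush every row that is now in order
            out.write(" ".join(map(str, pending.pop(next_row))) + "\n")
            next_row += 1

def compute_rows(rank, thread_count, width, height, image_queue):
    for row in range(rank, height, thread_count):      # leapfrog schedule
        pixels = [(row + col) % 256 for col in range(width)]  # stand-in for tracing
        image_queue.put((row, pixels))

def render_to(out, width, height, thread_count=4):
    image_queue = queue.Queue()
    threads = [threading.Thread(target=writer, args=(image_queue, out, height))]
    threads += [threading.Thread(target=compute_rows,
                                 args=(rank, thread_count, width, height, image_queue))
                for rank in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def render_to_string(width, height, thread_count=4):
    out = io.StringIO()
    render_to(out, width, height, thread_count)
    return out.getvalue()
```

Only the writer thread touches the output, so the compute threads never wait on I/O; they only enqueue finished rows.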

6.2 Cluster Parallel

Cluster parallel computers are much more complicated than single-node parallel computers, since they have a front-end job-assigning node and back-end worker nodes. So a file that is available on the front-end node has to be explicitly transferred to the back end, or vice versa.


Figure 3: material properties simulated.

The cluster parallel program performs the same task as the single-node parallel program; however, it does not take into consideration the material properties of the objects in the 3-D scene, so the end result is not as realistic. Its output is the one showcased in presentation 3.

The cluster program runs much like the MandelBrotClu example program in the pj2 library. The 3-D scene is created in the job process and then put in tuple space. The size of the projection plane is given on the command line. The projection plane is split row-wise by a masterFor with a leapfrog schedule. In the worker tasks, the workerFor computes the final image row by row, putting the data of each row in tuple space once it has been computed.

The reducer task running on the job node collects all the rows from tuple space and consolidates them to form the final image.

7 Scaling

The testing was done on a quad-core Core i7 processor with two hyperthreads per core (8 threads).

7.1 Strong Scaling

The strong scaling performance of the program was tested on a projection plane of resolution 2160x2160 using the single-node parallel ray tracing program. Overall, the program shows non-ideal scaling up to 4 cores. With more than 4 cores the performance drops sharply, with speedups in a few cases lower than what was observed for 4 cores.

Figure 4: without material properties simulated.

7.2 Weak Scaling

The weak scaling performance was tested by increasing the number of cells in the projection plane from a little under 250x250 to about 3000x3000 while increasing the thread count. This increase in resolution, or number of cells, can be thought of as shrinking the size of the cells and thereby sharpening the image. Again we observe non-ideal scaling up to 4 cores, and sizeup gets even worse for more than 4 cores.
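For reference, the metrics used in the strong and weak scaling tests can be written as small functions. The definitions below are the commonly used ones and are assumed here rather than taken from the report: speedup compares running times on the same problem, efficiency divides speedup by the core count, and sizeup additionally weights the time ratio by the growth in problem size.

```python
def speedup(t_seq, t_par):
    """Strong scaling: same problem, time on 1 core over time on K cores."""
    return t_seq / t_par

def efficiency(t_seq, t_par, cores):
    """Fraction of ideal speedup achieved; 1.0 means ideal scaling."""
    return speedup(t_seq, t_par) / cores

def sizeup(n_seq, t_seq, n_par, t_par):
    """Weak scaling: problem size grows from n_seq to n_par with the cores."""
    return (n_par / n_seq) * (t_seq / t_par)
```

As an illustration with made-up numbers (not measurements from this project), a run that takes 8.0 s on 1 core and 2.5 s on 4 cores has a speedup of 3.2 and an efficiency of 0.8, which is the kind of non-ideal result described above.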

7.3 Reason for non-ideal scaling

The main reason behind the non-ideal scaling is that the task involves file I/O, and the amount of file I/O increases as we increase the problem size. Secondly, since all the threads are performing the same task, launching more threads than there are physical cores might cause contention for functional units inside the processor, causing sub-par speedup and sizeup.

8 Future Work

We plan to implement material-based computation in the cluster parallel program, and then to shift the project from CPU parallel processing to GPU parallel processing, in which the ray tracing is performed in CUDA. Since the calculations involved are not too complicated, the speedup on a GPU should be considerable. Furthermore, we intend to implement photon mapping on top of the ray tracing that has already been done. The supporting data structure (the K-d tree) has already been implemented; it has to be tested and then absorbed into the main program.

9 Learning Experience

The more complex the computer we work with, the more rigorously coding standards must be followed so that the computer works in a predictable manner. Ray tracing is an inherently unbalanced task: a scene may contain reflective objects that require extra computation, or regions with a large number of objects alongside regions that are sparsely populated. This problem is ameliorated to some extent by using an appropriate schedule. Lastly, it is excellent practice to treat all variables as immutable: instead of changing the state of an object, we make a copy, change the state of the copy, and then copy it back.

10 Contributions

Advait and Siddhant together wrote the initial code responsible for the result shown in the 3rd presentation. Later the work diverged: Advait worked on simulating the material properties of objects, while Siddhant built the cluster parallel version of the same program for testing on tardis. For the scaling performance tests we used the single-node parallel and sequential programs, which simulate the material properties of objects.

11 Reference

[Kun] Real-time KD-tree construction on graphics hardware (2008) by Kun Zhou, Qiming Hou, Rui Wang, Baining Guo. https://dl.acm.org/citation.cfm?id=1409079

[Jensen1] Progressive Photon Mapping (2008) by Hachisuka, Ogaki and Jensen. https://dl.acm.org/citation.cfm?id=1409083

[Jensen2] Parallel Progressive Ray Tracing (2010) by Hachisuka, Jensen. https://dl.acm.org/citation.cfm?id=1900004

1. A Practical Guide to Global Illumination using Photon Maps, Siggraph 2000 Course 8, Henrik Wann Jensen.
