CPU vs. GPU presentation

Shortest Path Algorithms: Application to the Traffic Assignment Problem, Comparing Central Processing Unit (CPU) vs. Graphics Processing Unit (GPU)

Vishal Singh
Department of Computer Science & Engineering
University of Texas-Arlington
Arlington, TX

Advisor: Dr. Srinivas Peeta
Mentors: Dr. Xiaozheng He & Mr. Amit Kumar

NEXTRANS Center/Department of Civil Engineering
Purdue University
West Lafayette, Indiana

Traffic Assignment Problem

A long-standing problem that has been addressed over the past five decades through a number of different iterative algorithms [3].

It is the fourth phase of the classical urban transportation planning system model, following Trip Generation, Trip Distribution, and Mode Choice [4].

Figure 1: The Urban Transportation Model System. Source: Pas (1995, p.65). Copyright 1995 by The Guilford Press.

Traffic Assignment Problem (TAP)

To estimate the volume of traffic on the links of the network.

To provide estimates of travel costs between trip origins and destinations.

To identify heavily traveled or congested arcs (links) as well as the routes used between each origin-destination (O-D) pair.

Traffic Assignment Problem (TAP)

The goal of TAP is User Equilibrium, which is based on each individual user minimizing their own travel time [3].

User Equilibrium

User Equilibrium is achieved when no driver can improve their travel time by switching to an alternative path [2].

Every used route connecting an origin and destination has equal and minimal travel time.
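The condition above can be checked mechanically. The following is a minimal sketch in C, not the code from this project; the function name `is_user_equilibrium`, the array layout, and the tolerance parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: checks the user-equilibrium condition for the
 * routes of one O-D pair. flow[r] and time[r] are the flow and travel
 * time on route r. Returns true if every route that carries flow has a
 * travel time within tol of the minimum travel time over all routes. */
bool is_user_equilibrium(const double *flow, const double *time,
                         size_t n_routes, double tol)
{
    assert(n_routes > 0);

    /* Minimum travel time over all routes, used or not. */
    double t_min = time[0];
    for (size_t r = 1; r < n_routes; ++r)
        if (time[r] < t_min)
            t_min = time[r];

    /* A used route that is slower than the best route violates UE. */
    for (size_t r = 0; r < n_routes; ++r)
        if (flow[r] > 0.0 && time[r] - t_min > tol)
            return false;

    return true;
}
```

Unused routes may be slower than the minimum without violating the condition, which is why only routes with positive flow are tested.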

Route 1 vs. Route 2

Figure 1: The intersection of the two route curves marks the point where User Equilibrium is satisfied [2].

Figure 2: No intersection means that Path 2 is the faster alternative to Path 1 [2].
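The two cases in the figures can be worked through for a simple two-route network. The sketch below assumes linear travel time functions t1 = a1 + b1*x1 and t2 = a2 + b2*x2 with x1 + x2 equal to the total demand; these cost functions and the name `two_route_equilibrium` are illustrative assumptions, not taken from the cited notes.

```c
#include <assert.h>

/* Hypothetical two-route example with linear travel times:
 *   t1 = a1 + b1 * x1,  t2 = a2 + b2 * x2,  x1 + x2 = demand.
 * Returns the equilibrium flow on route 1. */
double two_route_equilibrium(double a1, double b1,
                             double a2, double b2, double demand)
{
    assert(b1 > 0.0 && b2 > 0.0 && demand >= 0.0);

    /* Flow on route 1 at which the two travel time curves intersect:
     * a1 + b1*x1 = a2 + b2*(demand - x1). */
    double x1 = (a2 - a1 + b2 * demand) / (b1 + b2);

    /* No intersection inside [0, demand]: one route is faster for
     * every split of the demand, so it carries all of the flow. */
    if (x1 < 0.0)
        return 0.0;      /* route 2 is always faster */
    if (x1 > demand)
        return demand;   /* route 1 is always faster */
    return x1;
}
```

With a1 = 10, b1 = 1, a2 = 20, b2 = 1 and a demand of 30, the curves intersect at x1 = 20, where both routes take 30 time units. With a1 = 50 and a2 = 10 (same slopes, demand of 20) there is no intersection and all flow uses route 2.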

User Equilibrium

Slope-based Multipath Algorithm (SMPA)

Several approaches have been established to solve TAP:

Gradient projection (GP) algorithm of Jayakrishnan

Frank-Wolfe (F-W) algorithm

Origin-based algorithm (OBA)

SMPA seeks to move path costs towards the average cost for an O-D pair at each respective iteration.

Flow Update Mechanism

Figure 3: At each iteration, the flow update seeks to reduce the costs of costlier paths and bring them to the average cost (Cav) for the O-D pair and aims to increase the costs of the cheaper paths to a value μ [3]. (Figure labels: costlier paths, cheaper paths.)
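The first step of such an update, splitting the paths of an O-D pair into costlier and cheaper sets around the average cost, might be sketched as follows. This is only an illustration: it assumes Cav is the unweighted mean of the path costs, the name `classify_paths` is hypothetical, and the actual flow shift of SMPA [3] is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes the average cost of the paths of one
 * O-D pair (assumed here to be the unweighted mean) and marks each path:
 *   side[k] = +1  costlier than average (flow should be reduced)
 *   side[k] = -1  cheaper than average  (flow should be increased)
 *   side[k] =  0  already at the average cost
 * Returns the average cost. */
double classify_paths(const double *cost, int *side, size_t n_paths)
{
    assert(n_paths > 0);

    double sum = 0.0;
    for (size_t k = 0; k < n_paths; ++k)
        sum += cost[k];
    double c_av = sum / (double)n_paths;

    for (size_t k = 0; k < n_paths; ++k)
        side[k] = (cost[k] > c_av) - (cost[k] < c_av);

    return c_av;
}
```

For path costs 10, 14 and 12 the average is 12, so the first path is marked cheaper, the second costlier, and the third is left unchanged.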

What is GPU Computing?

GPU computing is the use of a GPU (graphics processing unit) together with a CPU to accelerate general-purpose scientific and engineering applications.

CUDA

CUDA is NVIDIA's parallel computing platform and programming model for GPU computing.

It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Well suited to computation-heavy workloads and large data sets.

Tailored for engineering simulation and massive data sets.

How is this beneficial to the TAP?

In transportation engineering, simulations play a vital role in obtaining data and in network modeling.

For the Winnipeg and Austin networks, the data sets are so large that a CPU implementation would take hours.

This is where GPU computing comes into play: it is efficient for massive data sets because it exploits parallel computing to a much greater extent.

GPU hardware has more ALUs (Arithmetic Logic Units) than a typical CPU [6].

This gives it a better capability for parallel arithmetic, in which the same operation is performed on different data elements.
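"The same operation on different data elements" can be illustrated with a plain C loop. The function name `scale_and_add` and the arithmetic chosen are assumptions for illustration only; the point is that no iteration depends on another, so on a GPU each index could in principle be handled by its own thread.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative data-parallel operation: y[i] = a * x[i] + y[i].
 * Every iteration reads and writes only element i, so the iterations
 * are independent and could run in any order, or all at once. */
void scale_and_add(double a, const double *x, double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

On the CPU this loop visits the elements one after another; the work per element is identical, which is the pattern that many ALUs can process simultaneously.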

GPU and CPU Architecture

CPU + GPU

CPUs consist of a few cores optimized for serial processing.

GPUs consist of thousands of smaller, more efficient cores.

Serial portions of the code run on the CPU while parallel portions run on the GPU.

CPU vs. GPU

CPUs are designed for a wide variety of applications and to provide fast response times to a single task.

The limited number of cores restricts how many pieces of data can be processed simultaneously.

GPUs, by contrast, are built specifically for rendering and other graphics applications that have a large degree of data parallelism [2].

The larger number of cores makes them ideal for throughput computing.

CPU Implementation

The CPU code for the shortest path has been implemented in C, as it is one of the most efficient languages in terms of computational speed.

Dijkstra's algorithm is used for the shortest-path step, which is the most time-consuming part of the computation; this step has been implemented successfully.
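For reference, a compact form of Dijkstra's algorithm in C is sketched below. It is not the project code: the dense row-major cost matrix, the `NO_LINK` marker, and the function signature are assumptions chosen to keep the example short, and a real network of the size mentioned above would more likely use adjacency lists and a priority queue.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

#define NO_LINK (-1.0)  /* assumed marker: no direct link between nodes */

/* Single-source shortest paths on a dense graph with non-negative costs.
 * cost is an n*n row-major matrix; cost[u*n + v] is the cost of link
 * u -> v, or NO_LINK. dist receives n entries; unreachable nodes keep
 * DBL_MAX. Returns 0 on success, -1 if memory allocation fails. */
int dijkstra(const double *cost, size_t n, size_t source, double *dist)
{
    assert(n > 0 && source < n);

    bool *done = calloc(n, sizeof *done);
    if (done == NULL)
        return -1;

    for (size_t v = 0; v < n; ++v)
        dist[v] = DBL_MAX;
    dist[source] = 0.0;

    for (size_t iter = 0; iter < n; ++iter) {
        /* Select the closest node that is reachable and not finalized. */
        size_t u = n;
        for (size_t v = 0; v < n; ++v)
            if (!done[v] && dist[v] < DBL_MAX &&
                (u == n || dist[v] < dist[u]))
                u = v;
        if (u == n)
            break;  /* all remaining nodes are unreachable */
        done[u] = true;

        /* Relax every outgoing link of u. */
        for (size_t v = 0; v < n; ++v) {
            double w = cost[u * n + v];
            if (w >= 0.0 && !done[v] && dist[u] + w < dist[v])
                dist[v] = dist[u] + w;
        }
    }

    free(done);
    return 0;
}
```

The `done` array is allocated once and freed on every exit path after allocation, which is the kind of bookkeeping that tends to cause memory bugs in C if it is skipped.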

Constraints

The implementation still has bugs, caused by memory management problems and by my limited knowledge of data structures.

C is not my strongest language.

GPU coding

More time is required to learn CUDA programming, as it is relatively new and there are a limited number of resources.

Programs written in CUDA are compiled by NVIDIA's nvcc compiler and can be run only on NVIDIA GPUs, so this hardware restriction limits access for the programmer.

CPU vs. GPU comparison

Table 1: Simple implementation of the Floyd-Warshall all-pairs-shortest-path algorithm written in two versions, a standard serial CPU version and a CUDA GPU version [5].

On average the GPU time is 45X faster!
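The serial CPU side of such a comparison can be sketched in a few lines of C. This is a generic textbook form of Floyd-Warshall, not the benchmarked code from [5]; the row-major matrix layout and the use of `INFINITY` for missing links are assumptions.

```c
#include <assert.h>
#include <math.h>    /* INFINITY macro */
#include <stddef.h>

/* Serial Floyd-Warshall, all-pairs shortest paths, updated in place.
 * dist is an n*n row-major matrix: dist[i*n + j] starts as the cost of
 * link i -> j (INFINITY if there is none, 0 on the diagonal) and ends
 * as the shortest-path cost from i to j. */
void floyd_warshall(double *dist, size_t n)
{
    assert(dist != NULL);

    for (size_t k = 0; k < n; ++k)          /* allowed intermediate node */
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j) {
                double via_k = dist[i * n + k] + dist[k * n + j];
                if (via_k < dist[i * n + j])
                    dist[i * n + j] = via_k;
            }
}
```

For a fixed k, the updates of the different (i, j) entries do not depend on one another, which is what makes the two inner loops a natural candidate for parallel execution on a GPU.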

Conclusion

The GPU version of the shortest path algorithm has not yet been programmed in CUDA, so the CPU vs. GPU comparison is only partially complete.

Sample output for the Floyd-Warshall shortest path algorithm shows the GPU to be on average 45 times faster [5].

For smaller tasks, the GPU is not much faster than the CPU, as the overhead of data transfer exceeds the time saved by parallelization [6].

Many factors contribute to the large performance gap, including which CPU and GPU are used and, especially, which optimizations are applied to the code on each platform [1].
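The small-task caveat above can be made concrete with a simple timing model. The function `effective_speedup` and the timings used with it are hypothetical; the model only assumes that a GPU run pays for the kernel plus the host-device data transfer.

```c
#include <assert.h>

/* Hypothetical timing model: speedup of a GPU run over a CPU run when
 * the GPU run must also pay for copying data to and from the device.
 * All times are in the same unit (for example, seconds). */
double effective_speedup(double cpu_time, double gpu_kernel_time,
                         double transfer_time)
{
    assert(cpu_time >= 0.0 && gpu_kernel_time >= 0.0 &&
           transfer_time >= 0.0);
    assert(gpu_kernel_time + transfer_time > 0.0);

    return cpu_time / (gpu_kernel_time + transfer_time);
}
```

With made-up timings of 100 s on the CPU, 1 s of kernel time, and 1 s of transfer, the effective speedup is 50. For a small task of 0.5 s on the CPU, 0.01 s of kernel time, and the same 1 s of transfer, the value drops below 1, meaning the GPU run is slower overall.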

What I learned essentially…

The significance of C has become more evident than ever to me, as it is clearly a very time-efficient language. However, C is difficult to optimize because of its low-level nature: the code gives the compiler very few clues as to where data structures and algorithms can be optimized or parallelized.

GPU computing is gaining momentum, as parallel computation takes precedence in today's age of massive data. I will certainly work on CUDA programming over the course of my undergraduate studies.

Data structures is an area in which I want to gain a strong grasp, as without structure, data cannot be converted into information.

My Doctorate Analogy

Grad school is like an isolated journey towards monkhood: the student can be compared to Luke Skywalker, with the advisor assuming the role of Yoda, the wise one.

References

[1] Abhranil Das. Process Time Comparison between GPU and CPU. High Performance Computing on Graphics Processing Unit. Hamburg University. (July 2011), pp. 1-11.

[2] Jesse Gawling. CUDA Floyd Warshall. GitHub.com. Collaborative Revision Control. (March 2013). Web. (July 2013).

[3] R. A. Johnston. "The Urban Transportation Planning Process." 2004. Book chapter in The Geography of Urban Transportation. Ed. by Susan Hanson and Genevieve Giuliano.

[4] Srinivas Peeta, Amit Kumar. Slope-Based Multipath Flow Update Algorithm for Static User Equilibrium Traffic Assignment Problem. Transportation Research Record: Journal of the Transportation Research Board, Vol. 2196. (February 2010), pp. 1-10.

[5] Stephen D. Boyles. User Equilibrium and System Optimum. https://webspace.utexas.edu/sdb382/www/teaching/ce392c/ueso.pdf

[6] Victor W. Lee, Changkyu Kim, Jatin Chhugani, Michael Deisher, Daehyun Kim, Anthony D. Nguyen, Nadathur Satish, Mikhail Smelyanskiy, Srinivas Chennupaty, Per Hammarlund, Ronak Singhal, Pradeep Dubey. Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU. SIGARCH Comput. Archit. News, Vol. 38, No. 3. (June 2010), pp. 451-460.

Acknowledgements

Srinivas Peeta, Ph.D.
NEXTRANS Center Director
Purdue University
Professor of Civil Engineering

Xiaozheng "Sean" He, Ph.D.
Research Associate
Purdue University
Department of Civil Engineering

Amit Kumar
Doctoral Student
Purdue University
Department of Civil Engineering

Kumer Pial Das, Ph.D.
Lamar University
Department of Mathematics

Mamta Singh, Ph.D. (My Mum)
Lamar University
Department of Teacher Education
