Parallel All-Points Shortest Paths ECE 563 - Spring 2013 Jason Holmes Bharadwaj Krishnamurthy Hector Rodriguez-Simmonds

Parallel All -Points Shortest Paths - Purdue Engineeringeigenman/ECE563/Project... · 2013-04-23 · Overview • Tackled the All-Points Shortest Paths problem • Constructed graphs



Parallel All-Points Shortest Paths
ECE 563 - Spring 2013

Jason Holmes
Bharadwaj Krishnamurthy
Hector Rodriguez-Simmonds


Outline
• Overview
• Sequential Code Development
• Parallel Dijkstra
• Parallel Floyd-Warshall
• Results


Overview
• Tackled the All-Points Shortest Paths problem
• Constructed graphs from real data (social networks, road networks, etc.)
• Wrote modification of Dijkstra’s Algorithm
  • Better for sparse graphs
• Wrote Floyd-Warshall Dynamic Programming Algorithm
  • Less structural overhead
  • Can handle negative edge weights
• Developed parallel versions using OpenMP
  • Parallel Dijkstra: 7.6x speedup on 8 cores
  • Parallel Floyd-Warshall: ~6x speedup on 8 cores


Sequential Code - graphCreate

#Input File
<1, 2>
<1, 4>
<2, 5>
<3, 5>
<3, 6>
<4, 2>
<5, 4>
<6, 6>

[Diagram: Input Data → buildGraph(Dijkstra) → Adj. List; Input Data → buildGraph(FW) → Adj. Matrix]
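The two graph representations can be sketched as follows. This is a minimal illustration, not the project's graphCreate code: the types Edge and AdjNode and the functions build_adj_list, build_adj_matrix and free_adj_list are hypothetical names, and the sketch assumes unweighted, directed edges with 1-based vertex ids as in the input file above.

```c
#include <stdlib.h>

/* Hypothetical types: the slides do not show the project's real vertex/edge structs. */
typedef struct { int from, to; } Edge;   /* 1-based vertex ids, as in the input file */

typedef struct AdjNode {
    int to;                 /* 0-based destination vertex */
    struct AdjNode *next;   /* next outgoing edge of the same source */
} AdjNode;

/* Frees every per-vertex list, then the array of list heads. */
void free_adj_list(AdjNode **list, int n_vertices) {
    if (!list) return;
    for (int v = 0; v < n_vertices; v++) {
        AdjNode *node = list[v];
        while (node) {
            AdjNode *next = node->next;
            free(node);
            node = next;
        }
    }
    free(list);
}

/* Adjacency list (compact for sparse graphs): one linked list of out-edges per vertex. */
AdjNode **build_adj_list(const Edge *edges, int n_edges, int n_vertices) {
    AdjNode **list = calloc((size_t)n_vertices, sizeof *list);
    if (!list) return NULL;
    for (int e = 0; e < n_edges; e++) {
        AdjNode *node = malloc(sizeof *node);
        if (!node) { free_adj_list(list, n_vertices); return NULL; }
        node->to = edges[e].to - 1;
        node->next = list[edges[e].from - 1];   /* push at the head of the source's list */
        list[edges[e].from - 1] = node;
    }
    return list;
}

/* Adjacency matrix (O(V^2) memory): m[i * n + j] is 1 when edge i -> j exists. */
int *build_adj_matrix(const Edge *edges, int n_edges, int n_vertices) {
    int *m = calloc((size_t)n_vertices * (size_t)n_vertices, sizeof *m);
    if (!m) return NULL;
    for (int e = 0; e < n_edges; e++)
        m[(edges[e].from - 1) * n_vertices + (edges[e].to - 1)] = 1;
    return m;
}
```

The list stores only the edges that exist, which suits the sparse inputs used for Dijkstra; the matrix gives the constant-time (i, j) lookups that Floyd-Warshall relies on.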


Sequential Code - Dijkstra

    graph = (vertex **) buildGraphFromFile(argv[1], LIST, &numberOfVertices);

    for (source = 0; source < numberOfVertices; source++) {
        for (target = 0; target < numberOfVertices; target++) {
            vertex * VSource = returnVertex(graph, source);
            vertex * VTarget = returnVertex(graph, target);
            VSource->distance = 0;
            int dist = Dijkstra3(graph, VSource, VTarget, VSource->number);
            initGraph(graph, numberOfVertices);
        }
    }

Run Dijkstra’s single source algorithm V times
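A minimal sketch of the "one single-source run per vertex" idea, assuming a dense weight matrix rather than the project's adjacency list. It is not the authors' Dijkstra3 routine: dijkstra_from, all_pairs_dijkstra and the INF sentinel are hypothetical, and weights are assumed non-negative and small enough that sums do not overflow an int.

```c
#include <limits.h>
#include <stdlib.h>

#define INF INT_MAX   /* marks "no edge" in w and "unreachable" in the result */

/* Array-based Dijkstra, O(V^2): fills dist[0..n-1] with shortest distances from source.
   w is an n*n row-major matrix of non-negative weights; done is scratch space of n bytes. */
void dijkstra_from(const int *w, int n, int source, int *dist, char *done) {
    for (int v = 0; v < n; v++) { dist[v] = INF; done[v] = 0; }
    dist[source] = 0;
    for (int iter = 0; iter < n; iter++) {
        int u = -1;
        for (int v = 0; v < n; v++)   /* pick the closest unfinished, reachable vertex */
            if (!done[v] && dist[v] != INF && (u < 0 || dist[v] < dist[u])) u = v;
        if (u < 0) break;             /* everything left is unreachable */
        done[u] = 1;
        for (int v = 0; v < n; v++) { /* relax every edge leaving u */
            int wuv = w[u * n + v];
            if (wuv != INF && dist[u] + wuv < dist[v]) dist[v] = dist[u] + wuv;
        }
    }
}

/* All-pairs: one single-source run per vertex; row s of out holds the distances from s.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int all_pairs_dijkstra(const int *w, int n, int *out) {
    if (n <= 0) return 0;
    char *done = malloc((size_t)n);
    if (!done) return -1;
    for (int s = 0; s < n; s++)
        dijkstra_from(w, n, s, out + (size_t)s * (size_t)n, done);
    free(done);
    return 0;
}
```

Each single-source run already yields the distance to every target, so V runs cover all V x V pairs.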


Sequential Code - FW
• Dynamic programming problem
• Find the shortest path from i to j using only intermediate nodes 1 to k-1
• Once k reaches total number of nodes, we have the shortest path from i to j
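The dynamic-programming step can be written as a recurrence. The notation here is an assumption, not taken from the slides: $w_{ij}$ is the weight of edge $(i, j)$ (infinite if there is no edge, zero if $i = j$), $n$ is the number of nodes, and $d^{(k-1)}_{ij}$ is the length of the shortest path from $i$ to $j$ using only intermediate nodes $1$ to $k-1$.

```latex
\[
d^{(0)}_{ij} = w_{ij}, \qquad
d^{(k)}_{ij} = \min\!\left( d^{(k-1)}_{ij},\; d^{(k-1)}_{ik} + d^{(k-1)}_{kj} \right),
\quad k = 1, \dots, n
\]
```

Step $k$ either keeps the best path that avoids node $k$ or routes through node $k$; after step $k = n$ every node has been allowed as an intermediate, so $d^{(n)}_{ij}$ is the shortest-path length.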


Sequential Code - FW

    edge ** FW_direct(edge ** matrix, int v_count) {
        int i, j, k;
        edge ** max_node;

        max_node = malloc(v_count * sizeof(edge *));
        for (i = 0; i < v_count; i++) {….}
        for (k = 1; k < v_count; k++) {
            for (j = 0; j < v_count; j++) {
                for (i = 0; i < v_count; i++) {
                    if (matrix[i][j] > matrix[i][k] + matrix[k][j]) {
                        matrix[i][j] = matrix[i][k] + matrix[k][j];
                        max_node[i][j] = k;
                    }
                }
            }
        }
        return(max_node);
    }

K loop cannot be parallelized!
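The k loop has to stay sequential because step k reads distances written by step k-1, while for a fixed k the (i, j) updates do not depend on one another. A minimal sketch of that naive split, assuming a flat int matrix with no negative cycles; floyd_warshall, via and INF are hypothetical names, not the project's FW_direct:

```c
#include <limits.h>

#define INF (INT_MAX / 2)   /* "no path"; halved so that INF + INF cannot overflow */

/* In-place Floyd-Warshall on an n*n row-major distance matrix d.
   via[i * n + j] records the last intermediate vertex that improved (i, j), or -1.
   Assumes no negative cycles, so row k and column k are not modified during step k. */
void floyd_warshall(int *d, int *via, int n) {
    for (int c = 0; c < n * n; c++) via[c] = -1;
    for (int k = 0; k < n; k++) {              /* sequential: step k needs step k-1 */
        #pragma omp parallel for               /* rows are independent for a fixed k */
        for (int i = 0; i < n; i++) {
            if (d[i * n + k] == INF) continue; /* no path i -> k, nothing to relax */
            for (int j = 0; j < n; j++) {
                if (d[k * n + j] == INF) continue;
                int through_k = d[i * n + k] + d[k * n + j];
                if (through_k < d[i * n + j]) {
                    d[i * n + j] = through_k;
                    via[i * n + j] = k;
                }
            }
        }
    }
}
```

Compiled without OpenMP support the pragma is ignored and the function runs serially with the same result.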


Bad Parallelization

[Figure: i-j matrix divided among CORE 0, CORE 1, CORE 2 and CORE 3, shown for K = 5]


Sequential Code - FW
• Change the algorithm: use smaller blocks and deal with dependencies
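One common way to block Floyd-Warshall is the three-phase scheme sketched below; whether the project used exactly this structure is an assumption. For each block of k values, the diagonal block is updated first (it depends only on itself), then the blocks in the same block-row and block-column, then all remaining blocks. The names relax_block, blocked_floyd_warshall and INF are hypothetical, and the sketch assumes non-negative weights and a positive block size bs.

```c
#include <limits.h>

#define INF (INT_MAX / 2)   /* "no path"; halved so that INF + INF cannot overflow */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Relaxes every (i, j) in block (bi, bj) through each k in block bk.
   Block edges are clamped to n so that n need not be a multiple of bs. */
static void relax_block(int *d, int n, int bs, int bi, int bj, int bk) {
    int i_end = min_int((bi + 1) * bs, n);
    int j_end = min_int((bj + 1) * bs, n);
    int k_end = min_int((bk + 1) * bs, n);
    for (int k = bk * bs; k < k_end; k++)
        for (int i = bi * bs; i < i_end; i++)
            for (int j = bj * bs; j < j_end; j++) {
                int through_k = d[i * n + k] + d[k * n + j];
                if (through_k < d[i * n + j]) d[i * n + j] = through_k;
            }
}

/* In-place blocked Floyd-Warshall on an n*n row-major distance matrix d. */
void blocked_floyd_warshall(int *d, int n, int bs) {
    int nb = (n + bs - 1) / bs;                 /* number of blocks per dimension */
    for (int b = 0; b < nb; b++) {
        /* Phase 1: the diagonal block depends only on itself, so it runs serially. */
        relax_block(d, n, bs, b, b, b);

        /* Phase 2: blocks in block-row b and block-column b read the diagonal block. */
        #pragma omp parallel for
        for (int x = 0; x < nb; x++) {
            if (x == b) continue;
            relax_block(d, n, bs, b, x, b);
            relax_block(d, n, bs, x, b, b);
        }

        /* Phase 3: every other block reads one row block and one column block. */
        #pragma omp parallel for
        for (int bi = 0; bi < nb; bi++) {
            if (bi == b) continue;
            for (int bj = 0; bj < nb; bj++) {
                if (bj == b) continue;
                relax_block(d, n, bs, bi, bj, b);
            }
        }
    }
}
```

Within phases 2 and 3 each thread writes a different block and only reads blocks finished in an earlier phase, which is what makes those loops safe to run in parallel.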


Parallel Floyd-Warshall
• Transformations
  1. Parallel with tuned blocks
  2. Restructured parallel with nowait
  3. Manual balancing of workload distribution
  4. Parallelized computation of self dependent block
  5. Loop coalesced version of previous transformation


Parallel Floyd-Warshall

[Figure: matrix with axes i and j]


Parallel Floyd-Warshall

[Figure: matrix with axes i and j]


1. Parallel With Tuned Blocks
• Transformed from naïve OpenMP directives
• Large block size reduces number of independent blocks that can run in parallel
• Small block sizes cut down on number of computations per block
• Optimum block size found to be ~20x20
  • This is somewhat graph-size dependent


2. Restructured with NOWAIT
• Issue: Many separate loops can run in parallel for processing different block types
• Most for loops combined into one OMP parallel construct
  • Eliminates multiple fork/join (wakeup/sleep) operations
• Intermediate serial sections handled by OpenMP master
• NOWAIT clause added to loops where correctness would not be violated
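The restructuring pattern can be sketched on a toy workload, which is an assumption and not the project's block loops. One parallel region holds several worksharing loops, nowait removes the barrier after loops that are independent of each other, an explicit barrier is kept where a later loop reads earlier results, and a master section handles the serial part. The function fused_region_sum is a hypothetical name.

```c
/* Fills a and b in two independent loops, computes c = a + b, and returns the sum of c.
   All three loops share one parallel region, so threads are forked and joined once. */
long fused_region_sum(int *a, int *b, int *c, int n) {
    long total = 0;
    #pragma omp parallel
    {
        #pragma omp for nowait          /* a and b are independent: no barrier needed */
        for (int i = 0; i < n; i++) a[i] = i;

        #pragma omp for nowait
        for (int i = 0; i < n; i++) b[i] = 2 * i;

        #pragma omp barrier             /* c reads a and b, so wait for both loops here */

        #pragma omp for                 /* implicit barrier at the end of this loop */
        for (int i = 0; i < n; i++) c[i] = a[i] + b[i];

        #pragma omp master              /* serial section, run by the master thread only */
        {
            for (int i = 0; i < n; i++) total += c[i];
        }
    }
    return total;
}
```

Dropping the explicit barrier would be a correctness bug, since a thread could start reading a and b before other threads have finished writing them; this is the kind of dependency that decides where nowait is safe.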


3. Redistribute Workload
• Issue: Self dependent block migrates as k varies, workload becomes unbalanced
• Using various scheduling options (guided, dynamic) decreased performance
• Hence, manually restructured the loops to balance workload


4/5. Loop Coalescing
• 4. Parallelize internal loops of self-dependent blocks to eliminate serialization
• 5. Coalesce loops as number of iterations is small

Normal:

    #pragma omp for nowait
    for (i = block_ly; i < (block_ly + BLOCK_SIZE); i++) {
        for (j = block_lx; j < (block_lx + BLOCK_SIZE); j++) {
            if ((i >= v_count) || (j >= v_count) || (k >= v_count)) continue;
            if (submatrix[i][j] > (submatrix[i][k] + submatrix[k][j])) {
                submatrix[i][j] = submatrix[i][k] + submatrix[k][j];
                max_node[i][j] = k;
            }
        }
    }

Coalesced:

    for (k = start_k; k < (start_k + BLOCK_SIZE); k++) {
        #pragma omp for nowait
        for (ij = 0; ij < BLOCK_SIZE_SQ; ij++) {
            i = (ij / BLOCK_SIZE) + block_ly;
            j = (ij % BLOCK_SIZE) + block_lx;
            if ((i >= v_count) || (j >= v_count) || (k >= v_count)) continue;
            if (submatrix[i][j] > (submatrix[i][k] + submatrix[k][j])) {
                submatrix[i][j] = submatrix[i][k] + submatrix[k][j];
                max_node[i][j] = k;
            }
        }
    }
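Coalescing can be checked in isolation on a toy grid. In the sketch below, fill_nested and fill_coalesced are hypothetical functions that write the same value to every cell; the only difference is how many iterations the worksharing loop exposes. With a 20x20 block on 8 threads, the nested form has 20 outer iterations to divide, while the coalesced form has 400.

```c
/* Nested form: only the outer loop's `rows` iterations are divided among threads. */
void fill_nested(int *grid, int rows, int cols) {
    #pragma omp parallel for
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            grid[i * cols + j] = i * 100 + j;
}

/* Coalesced form: one loop of rows * cols iterations; (i, j) is recovered
   from the flat index with a division and a remainder. */
void fill_coalesced(int *grid, int rows, int cols) {
    #pragma omp parallel for
    for (int ij = 0; ij < rows * cols; ij++) {
        int i = ij / cols;
        int j = ij % cols;
        grid[i * cols + j] = i * 100 + j;
    }
}
```

Declaring i and j inside the loop body keeps them private to each thread. OpenMP's collapse(2) clause on the nested form is another way to get the same flattening without the manual index arithmetic.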


Parallel Dijkstra

    graph0 = (vertex **) buildGraphFromFile(argv[1], LIST, &numberOfVertices);

    // Able to parallelize the very outer loop; compiler could not detect this due to subroutine calls
    #pragma omp parallel
    {
        // Declared inside the parallel region, so each of the X threads gets its own copy
        vertex ** graphX = copyGraph(graph0, numberOfVertices);

        #pragma omp for private(target)
        for (source = 0; source < numberOfVertices; source++) {
            for (target = 0; target < numberOfVertices; target++) {
                // Thread X reads and modifies only its private copy, graphX
                vertex * VSource = returnVertex(graphX, source);
                vertex * VTarget = returnVertex(graphX, target);
                VSource->distance = 0;
                int dist = Dijkstra3(graphX, VSource, VTarget, VSource->number);
                initGraph(graphX, numberOfVertices);
            }
        }
    }


Parallel Dijkstra

[Diagram: Build Graph, then four parallel threads, each performing Copy Graph followed by Process N/X single source shortest paths]

• Outer loop parallelized, each thread executes Dijkstra’s algorithm with N/X source vertices (X = # cores)
• Each thread retains a copy of the graph to modify
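The private-copy pattern can be sketched as below. This is an illustration under assumptions, not the project's code: the Vertex struct, dijkstra_private and parallel_all_pairs are hypothetical, the edge weights are kept in a shared read-only matrix, and only the mutable per-vertex search state (distance, visited) is duplicated per thread, which stands in for the full graph copy described above. Weights are assumed non-negative.

```c
#include <limits.h>
#include <stdlib.h>

#define INF INT_MAX   /* marks "no edge" in w and "unreachable" in the result */

/* Hypothetical vertex record: the search state that each thread must own privately. */
typedef struct {
    int distance;
    int visited;
} Vertex;

/* Dijkstra from source using a private vertex array vs; w is shared and read-only. */
static void dijkstra_private(const int *w, int n, int source, Vertex *vs) {
    for (int v = 0; v < n; v++) { vs[v].distance = INF; vs[v].visited = 0; }
    vs[source].distance = 0;
    for (;;) {
        int u = -1;
        for (int v = 0; v < n; v++)
            if (!vs[v].visited && vs[v].distance != INF &&
                (u < 0 || vs[v].distance < vs[u].distance)) u = v;
        if (u < 0) break;                  /* all reachable vertices are finished */
        vs[u].visited = 1;
        for (int v = 0; v < n; v++) {
            int wuv = w[u * n + v];
            if (wuv != INF && vs[u].distance + wuv < vs[v].distance)
                vs[v].distance = vs[u].distance + wuv;
        }
    }
}

/* Row s of out receives the distances from s. Returns 0 on success, -1 on allocation failure. */
int parallel_all_pairs(const int *w, int n, int *out) {
    int failed = 0;
    if (n <= 0) return 0;
    #pragma omp parallel
    {
        /* Allocated inside the region: one private state array per thread. */
        Vertex *mine = malloc((size_t)n * sizeof *mine);
        if (!mine) {
            #pragma omp atomic write
            failed = 1;
        }
        #pragma omp for                    /* sources are divided among the threads */
        for (int s = 0; s < n; s++) {
            if (!mine) continue;
            dijkstra_private(w, n, s, mine);
            for (int v = 0; v < n; v++) out[(size_t)s * (size_t)n + v] = mine[v].distance;
        }
        free(mine);
    }
    return failed ? -1 : 0;
}
```

Because every thread writes only its own rows of out and its own state array, the source loop needs no locking.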


Results - FW

[Chart: Floyd-Warshall Speedup - Input Graph 1; y-axis: Speedup (0 to 5); x-axis: Program Version]

• Graph 1
  • 493 vertices
  • 1189 edges
• Final Speedup: 4.93 on 8 cores


Results - FW

[Chart: Floyd-Warshall Speedup - Input Graph 2; y-axis: Speedup (0 to 7); x-axis: Program Version]

• Graph 2
  • 767 vertices
  • 1795 edges
• Final Speedup: 6.66 on 8 cores


Results - FW

[Chart: Floyd-Warshall Speedup - Input Graph 3; y-axis: Speedup (0 to 8); x-axis: Program Version]

• Graph 3
  • 5,242 vertices
  • 28,980 edges
• Final Speedup: 7.66 on 8 cores


Results - FW

[Chart: Speedup vs. # of Cores; x-axis: 1, 2, 4, 8 cores; y-axis: Speedup (0 to 9); series: Graph 1, Graph 2, Graph 3]

• More parallelism exploited for larger graphs
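One way to compare the three graphs is parallel efficiency, the speedup divided by the number of cores. The helper below is a hypothetical illustration; applied to the final 8-core Floyd-Warshall speedups reported above it gives 4.93 / 8 = 0.616 for Graph 1, 6.66 / 8 = 0.833 for Graph 2 and 7.66 / 8 = 0.958 for Graph 3.

```c
/* Parallel efficiency: the fraction of ideal linear speedup that was achieved. */
double parallel_efficiency(double speedup, int cores) {
    return speedup / (double)cores;
}
```

Efficiency rising with graph size is consistent with the fixed parallel overheads being spread over more work per block round.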


Results - Dijkstra

[Chart: Parallel Dijkstra Speedup on 8 Cores; bars for Graph 1, Graph 2, Graph 3; y-axis: 7.66 to 7.82]

• Near linear speedup due to outer loop parallelization
• As graph size increases, graph build and copy overhead becomes relatively smaller


Future Work / Improvements
• Utilize MapReduce for huge graph input sets
• Convert to MPI for Floyd-Warshall to deal with memory issues on one machine
• Port to map API to view shortest path information on a GUI
  • OpenStreetMap
• Add mechanisms to detect sparsity, negative edge weights and call appropriate routines


Questions?