COSC 6374 Parallel Computation
Graph Algorithms
Edgar Gabriel
Spring 2008
Definitions and Representations (I)
• A graph G is a pair (V,E) with
– V: finite set of vertices
– E: finite set of edges between pairs of vertices
• Directed vs. undirected graphs
– Undirected: an edge e ∈ E is an unordered pair (u,v)
– Directed: an edge e ∈ E is an ordered pair (u,v); a connection from u to v does not imply a connection from v to u
[Figure: a directed and an undirected example graph on vertices 1–4]
Definitions and Representations (II)
• A path from v to u is a sequence of vertices ‹v0, v1, v2,…,vk› with v0 = v, vk = u, and (vi, vi+1) ∈ E for 0 ≤ i < k
• The length of the path is defined as the number of edges in the path
• If there is a path from v to u, then u is reachable from v
• A path forms a cycle if its starting and ending vertices are the same
• A graph without cycles is called acyclic
• An undirected graph is connected, if every pair of
vertices is connected by a path
• G’=(V’,E’) is a subgraph of G=(V,E) if V’ ⊆ V and E’ ⊆ E
Definitions and Representations (III)
• Weighted Graphs G=(V,E,ω):
– Weights are associated with each edge in E
– Weights ω are real numbers representing costs or
benefits of traversing the associated edge
[Figure: weighted graph on vertices 1–4 with edge weights 3, 5, 8]
Representation of Graphs in
Computer Programs
• Adjacency matrix A = (aij) such that

      aij = ω(vi,vj)   if (vi,vj) ∈ E
      aij = 0          if i = j
      aij = ∞          otherwise

• Note: the adjacency matrix of an undirected graph is symmetric
Examples for adjacency matrix
[Figure: directed graph on vertices 0–3 with edges 0→1 (weight 3), 0→2 (weight 5), 1→3 (weight 8)]

      0  3  5  ∞
A  =  ∞  0  ∞  8
      ∞  ∞  0  ∞
      ∞  ∞  ∞  0

[Figure: the same graph with undirected edges]

      0  3  5  ∞
A  =  3  0  ∞  8
      5  ∞  0  ∞
      ∞  8  ∞  0
All-pair shortest-path
• All-pair shortest-path:
– find the length of the shortest path between each pair of vertices
• Floyd’s Algorithm
– Transforms the adjacency matrix into a matrix containing the shortest path between all pairs of vertices
– Checks in iteration k whether the path between vertices i and j is shorter when going through vertex k than the currently stored shortest path between i and j
– O(n³)
Floyd’s Algorithm - Example

[Figure: undirected example graph on vertices 0–3 with edge weights ω(0,1)=1, ω(0,2)=2, ω(1,3)=3, ω(2,3)=1]

k=0 (initial adjacency matrix):
    0  1  2  ∞
    1  0  ∞  3
    2  ∞  0  1
    ∞  3  1  0

k=1 (paths through vertex 0 have been checked):
    0  1  2  ∞
    1  0  3  3
    2  3  0  1
    ∞  3  1  0

k=2 (paths through vertex 1 have been checked):
    0  1  2  4
    1  0  3  3
    2  3  0  1
    4  3  1  0

k=3 (paths through vertex 2 have been checked; this is the final shortest-path matrix):
    0  1  2  3
    1  0  3  3
    2  3  0  1
    3  3  1  0
Floyd’s Algorithm
• Sequential algorithm – Input:
• A: adjacency matrix
• n: number of vertices
for k=0,n-1
for i=0,n-1
for j=0,n-1
A[i,j]= min(A[i,j],A[i,k]+A[k,j])
end for
end for
end for
Parallelizing Floyd’s Algorithm (I)
• Data Parallel Problem – the same operation applied to
different data items
• Initial guess: consider one element of the adjacency
matrix on a separate processor
• E.g. for k=1, the update of element A[3,4] requires A[3,1] and A[1,4]
Parallelizing Floyd’s Algorithm (II)
• Generalizing:
– At iteration k, every task in column k has to broadcast its value to all processes in the same row
e.g. k=1, A[i,j] needs A[i,1]
and
– At iteration k, every task in row k has to broadcast its value to all processes in the same column
e.g. k=1, A[i,j] needs A[1,j]

  i,j  needs  i,k
  0,0  needs  0,1
  0,1  needs  0,1
  0,2  needs  0,1

  i,j  needs  k,j
  0,0  needs  1,0
  1,0  needs  1,0
  2,0  needs  1,0
Parallelizing Floyd’s Algorithm (III)
• All processes have a unique coordinate (cx,cy), which is identical to the position of the element in the matrix which they own
• Create row-wise subgroups, e.g. with MPI:

Given A, n, cx, cy
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_split (MPI_COMM_WORLD, cy, rank, &rowcomm);

• … and similarly column-wise subgroups
Preliminary algorithm
Given Axy, n, cx, cy
int rank, rowtemp, coltemp;
MPI_Comm rowcomm, colcomm;
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_split (MPI_COMM_WORLD, cy, rank, &rowcomm);
MPI_Comm_split (MPI_COMM_WORLD, cx, rank, &colcomm);
for (k=0; k<n; k++ ) {
    if (cx == k) rowtemp = Axy;
    MPI_Bcast (&rowtemp, 1, MPI_INT, k, rowcomm);
    if (cy == k) coltemp = Axy;
    MPI_Bcast (&coltemp, 1, MPI_INT, k, colcomm);
    Axy = min(Axy, rowtemp + coltemp);
}
A more realistic data decomposition
• 1-D column-wise data distribution
– Each process holds a column of the adjacency matrix
– No need to broadcast data in column-communicator
• No need to create column-wise communicators
• No need to use sub-communicators at all!
Preliminary algorithm (IIa)
Given Ax0 … Axn, n, cx
int temp[n];
/* note: rank in MPI_COMM_WORLD = cx */
for (k=0; k<n; k++ ){
if (cx == k ) {
for (i=0; i<n; i++ ) {
temp[i]= Axi ;
}
}
MPI_Bcast (temp, n, MPI_INT, k, MPI_COMM_WORLD );
for (i=0; i<n; i++) {
Axi =min ( Axi, Axk +temp[i] );
}
}
Preliminary algorithm (IIb)
Given Ax0 … Axn, n, cx
int temp[n];
/* note: rank in MPI_COMM_WORLD = cx */
for (k=0; k<n; k++ ){
for (i=0; i<n; i++) {
if (cx == k ) {
temp[i]= Axi ;
}
MPI_Bcast (&temp[i], 1, MPI_INT, k, MPI_COMM_WORLD );
Axi =min ( Axi, Axk +temp[i] );
}
}
An even more realistic data decomposition
• Each process holds a certain number of columns, e.g. nx
• Thus, each process is the owner of the columns
rank* nx to [(rank+1)* nx]-1
• The owner of the column k is the process with the rank
r=[floor(k/nx)]
• Mapping of global to local indices: column k of the
global matrix is column s in the local matrix of process
r with s=k%nx
Preliminary Algorithm (III)
Given a[n][nx], n, nx
int temp[n];
/* note: rank in MPI_COMM_WORLD = cx */
for ( k = 0; k < n; k++ ) {
root = floor(k/nx);
if ( root == rank ) {
for (i=0; i<n; i++ ) {
temp[i] = a[i][k%nx];
}
}
MPI_Bcast ( temp, n, MPI_INT, root, MPI_COMM_WORLD);
for (i=0; i<n; i++ ) {
for ( j=0; j<nx; j++ ) {
a[i][j] = min(a[i][j], (temp[i]+a[k][j]));
}
}
}
2-D data decomposition
• Each process holds a block of the adjacency matrix, e.g. (nx,ny) elements
• Need to re-introduce row- and column-wise sub-communicators
Preliminary algorithm (IVa)

Given A, n, nx, ny
int coltemp[ny], rowtemp[nx];
px = n/nx; py = n/ny;
cx = rank % px;
cy = floor (rank/px);
/* Generate subcommunicators */
MPI_Comm_split (MPI_COMM_WORLD, cx, rank, &colcomm);
MPI_Comm_split (MPI_COMM_WORLD, cy, rank, &rowcomm);
for ( k = 0; k< n; k++ ) {
rootx = floor(k/nx);
if ( rootx == cx ) {
for (i=0; i<ny; i++ )
coltemp[i] = a[i][k%nx];
}
MPI_Bcast ( coltemp, ny, MPI_INT, rootx, rowcomm);
Preliminary algorithm (IVb)
rooty = floor(k/ny);
if ( rooty == cy ) {
for (i=0; i<nx; i++ )
rowtemp[i] = a[k%ny][i];
}
MPI_Bcast ( rowtemp, nx, MPI_INT, rooty, colcomm);
for (i=0; i<ny; i++ ) {
for ( j=0; j<nx; j++ ) {
a[i][j] = min (a[i][j],(coltemp[i]+rowtemp[j]));
}
}
}
Minimum Spanning Tree
• Minimum Spanning Tree:
– Spanning tree: a subgraph G’ of an undirected graph G that is a tree and contains all vertices of G
– Minimum spanning tree: a spanning tree with minimum total edge weight
[Figure: a weighted graph on vertices 1–4 and its minimum spanning tree]
Prim’s Algorithm
Given G=(V,E,ω) and an arbitrary vertex r
VT = {r}
d[r] = 0
for all v ∈ (V-VT) do
    d[v] = ω(r,v);
while ( VT ≠ V ) do
    find vertex u such that d[u] = min(d[v], v ∈ (V-VT));
    VT = VT ∪ {u};
    for all v ∈ (V-VT) do
        d[v] = min (d[v], ω(u,v));
end while
COSC 6374 – Parallel Computation
Edgar Gabriel
Prim’s Algorithm (II)
• VT: vector containing the vertices already added to the spanning tree
• (V-VT): set of vertices which have not yet been added to the spanning tree
• d: distance vector, e.g. d[i] contains the minimum weight of an edge from vertex i to any vertex in the spanning tree
Example (I)

[Figure: weighted undirected graph on vertices 0–5]

Adjacency matrix:
    0  1  3  ∞  ∞  3
    1  0  5  1  ∞  ∞
    3  5  0  2  1  ∞
    ∞  1  2  0  4  ∞
    ∞  ∞  1  4  0  5
    3  ∞  ∞  ∞  5  0

Arbitrary starting point r=1:
    d  = [ 1  0  5  1  ∞  ∞ ]
    VT = [ 1 -1 -1 -1 -1 -1 ]

-1 := undefined; a vertex already in the spanning tree is not considered in the following search
Example (II)

Add vertex 3 (d[3]=1 is minimal among the vertices not in VT):
    VT = [ 1  3 -1 -1 -1 -1 ]
    d  = [ 1  0  2  1  4  ∞ ]

Add vertex 0:
    VT = [ 1  3  0 -1 -1 -1 ]
    d  = [ 1  0  2  1  4  3 ]
Example (III)

Add vertex 2:
    VT = [ 1  3  0  2 -1 -1 ]
    d  = [ 1  0  2  1  1  3 ]

Add vertex 4:
    VT = [ 1  3  0  2  4 -1 ]
    d  = [ 1  0  2  1  1  3 ]
Example (IV)

Add vertex 5 - all vertices are now part of the spanning tree:
    VT = [ 1  3  0  2  4  5 ]
    d  = [ 1  0  2  1  1  3 ]
Sequential implementation (I)
Given A[N][N], N
u = 1; vtcount = 0; vt[vtcount++] = u;
for ( i=0; i<N; i++ ) {
d[i] = A[u][i];
}
for ( i=1; i<N; i++ ) {
u = find_u ( vt, vtcount, d );
vt[vtcount++] = u;
update_d (d, vt, vtcount, N, u, a );
}
COSC 6374 – Parallel Computation
Edgar Gabriel
Sequential implementation (II)
int find_u ( int vt[N], int vtcount, int d[N] )
{
int i, j, found;
int current_min=MY_INF, current_minloc=-1;
for ( i=0; i<N; i++ ) {
for (found = 0, j=0; j<vtcount; j++ ) {
if (i==vt[j]) found=1;
}
if (found) continue;
if ( d[i] < current_min) {
current_minloc = i;
current_min = d[i];
}
}
return current_minloc;
}
Sequential implementation (III)
void update_d ( int d[N], int vt[N], int vtcount,
int N, int u, int a[N][N] )
{
int i, j, found;
for ( i=0; i<N; i++ ) {
for ( found=0, j=0; j<vtcount; j++ ) {
if (i==vt[j]) found=1;
}
if (found) continue;
d[i] = min (d[i], a[u][i]);
}
return;
}
Parallel Algorithm 1 (I)
• Each process owns one column of the adjacency matrix
• Each process owns the corresponding part of the distance vector d
• VT is replicated on each process
• Only find_u and update_d need to be modified!

[Figure: column-wise distribution of the matrix A and the distance vector d]
Parallel Algorithm 1 (II)
int find_u ( int vt[N], int vtcount, int d )
{
int min[2], gmin[2], i, rank;
MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
min[0] = d;
min[1] = rank;
for ( i=0; i < vtcount; i++ ) {
if ( vt[i] == rank ) {
min[0] = MY_INF;
break;
}
}
MPI_Allreduce ( min, gmin, 1, MPI_2INT, MPI_MINLOC,
MPI_COMM_WORLD);
return gmin[1];
}
MPI_MINLOC and MPI_MAXLOC (I)
• Operators for reduction operations returning the minimum/maximum value and the rank of the process owning the minimum/maximum value
• Special MPI data types have to be used
– MPI_2INT: array of two integers
• Element zero contains the min/max value
• Element one contains the rank of the process owning the minimal/maximal value
– MPI_FLOAT_INT: structure consisting of a float and an int

struct {
    float val;
    int   rank;
};
MPI_MINLOC and MPI_MAXLOC (II)
• Similarly:
– MPI_DOUBLE_INT,
– MPI_SHORT_INT,
– MPI_LONG_INT …
• Note:
– the rank in the second element has to be set by each process for the input values
– MPI guarantees that each process will have the same rank in the result vector for the location of the minimum/maximum, even if several processes have the same minimal/maximal value
Parallel Algorithm 1 (III)
void update_d ( int *d, int *vt, int vtcount, int u, int a[N] )
{
int i, rank;
MPI_Comm_rank ( MPI_COMM_WORLD, &rank);
for ( i=0; i < vtcount; i++ ) {
if ( vt[i] == rank ) {
return;
}
}
*d = min (*d, a[u]);
return;
}
Parallel Algorithm 2 (I)
• Each process owns a certain number of columns of the adjacency matrix
• Each process owns the corresponding elements of the distance vector
• VT is replicated on each process

[Figure: block-column distribution of the matrix A and the distance vector d]
Parallel Algorithm 2 (II)
for ( i=0; i<nx; i++ ) {
for (found=0, j=0; j<vtcount; j++ ) {
if ((cx*nx+i)==vt[j]) found=1;
}
if (found) continue;
if ( d[i] < current_min) {
current_minloc = (cx*nx+i);
current_min = d[i];
}
}
min[0] = current_min;
min[1] = rank;
MPI_Allreduce ( min, gmin, 1, MPI_2INT,
MPI_MINLOC, MPI_COMM_WORLD);
MPI_Bcast ( &current_minloc, 1, MPI_INT, gmin[1],
MPI_COMM_WORLD );
return current_minloc;
Parallel Algorithm 2 (III)
void update_d ( int *d, int *vt, int vtcount, int u, int a[N][nx] )
{
    int i, j, found;
for ( i=0; i<nx; i++ ) {
for ( found=0, j=0; j<vtcount; j++ ) {
if ((cx*nx+i)== vt[j]) {
found=1;
}
}
if (found) continue;
d[i] = my_min (d[i], a[u][i]);
}
return;
}