COSC 6374 Parallel Computation
Graph Algorithms
Edgar Gabriel
Spring 2008
Definitions and Representations (I)
• A graph G is a pair (V,E) with
– V: finite set of vertices
– E: finite set of edges between pairs of vertices
• Directed vs. undirected graphs
– Undirected: an edge e ∈ E is an unordered pair (u,v)
– Directed: an edge e ∈ E is an ordered pair (u,v); a connection from u to v does not imply a connection from v to u
[Figure: a directed and an undirected example graph on vertices 1–4]
Definitions and Representations (II)
• A path from v to u is a sequence of vertices ‹v0, v1, v2,…,vk› with v0 = v, vk = u, and (vi, vi+1) ∈ E for 0 ≤ i < k
• The length of the path is defined as the number of edges in the path
• If there is a path from v to u, then u is reachable from v
• A path forms a cycle if its starting and ending vertices are the same
• A graph without cycles is called acyclic
• An undirected graph is connected, if every pair of
vertices is connected by a path
• G’=(V’,E’) is a subgraph of G=(V,E) if V’ ⊆ V and E’ ⊆ E
Definitions and Representations (III)
• Weighted Graphs G=(V,E,ω):
– Weights are associated with each edge in E
– Weights ω are real numbers representing costs or
benefits of traversing the associated edge
[Figure: weighted graph on vertices 1–4 with edge weights 3, 5, 8]
Representation of Graphs in
Computer Programs
• Adjacency matrix A = (aij) such that

      aij = ω(vi,vj)   if (vi,vj) ∈ E
      aij = 0          if i = j
      aij = ∞          otherwise

• Note: the adjacency matrix of an undirected graph is symmetric
Examples for adjacency matrix
[Figure: directed graph on vertices 0–3 with edges 0→1 (weight 3), 0→2 (weight 5), 1→3 (weight 8)]

      0  3  5  ∞
A  =  ∞  0  ∞  8
      ∞  ∞  0  ∞
      ∞  ∞  ∞  0

[Figure: the same graph with undirected edges]

      0  3  5  ∞
A  =  3  0  ∞  8
      5  ∞  0  ∞
      ∞  8  ∞  0
All-pair shortest-path
• All-pair shortest-path:
– find the length of the shortest path between each pair of vertices
• Floyd’s Algorithm
– Transforms the adjacency matrix into a matrix containing the shortest path between all pairs of vertices
– Checks in iteration k whether the path between vertices i and j is shorter when going through vertex k than the currently stored shortest path between i and j
– O(n³)
Floyd’s Algorithm - Example

[Figure: undirected example graph on vertices 0–3 with edge weights ω(0,1)=1, ω(0,2)=2, ω(1,3)=3, ω(2,3)=1]

k=0 (initial adjacency matrix):
    0  1  2  ∞
    1  0  ∞  3
    2  ∞  0  1
    ∞  3  1  0

k=1 (paths through vertex 0 have been checked):
    0  1  2  ∞
    1  0  3  3
    2  3  0  1
    ∞  3  1  0

k=2 (paths through vertex 1 have been checked):
    0  1  2  4
    1  0  3  3
    2  3  0  1
    4  3  1  0

k=3 (paths through vertex 2 have been checked; this is the final shortest-path matrix):
    0  1  2  3
    1  0  3  3
    2  3  0  1
    3  3  1  0
Floyd’s Algorithm
• Sequential algorithm – Input:
• A: adjacency matrix
• n: number of vertices
for k=0,n-1
for i=0,n-1
for j=0,n-1
A[i,j]= min(A[i,j],A[i,k]+A[k,j])
end for
end for
end for
Parallelizing Floyd’s Algorithm (I)
• Data Parallel Problem – the same operation applied to
different data items
• Initial guess: consider one element of the adjacency
matrix on a separate processor
• E.g. for k=1, the update of element A[3,4] requires A[3,1] and A[1,4]
Parallelizing Floyd’s Algorithm (II)
• Generalizing:
– At iteration k, every task in column k has to broadcast its value to all processes in the same row
e.g. k=1, A[i,j] needs A[i,1]
and
– At iteration k, every task in row k has to broadcast its value to all processes in the same column
e.g. k=1, A[i,j] needs A[1,j]

  i,j  needs  i,k
  0,0  needs  0,1
  0,1  needs  0,1
  0,2  needs  0,1

  i,j  needs  k,j
  0,0  needs  1,0
  1,0  needs  1,0
  2,0  needs  1,0
Parallelizing Floyd’s Algorithm (III)
• All processes have a unique coordinate (cx,cy), which is identical to the position of the element in the matrix which they own
• Create row-wise subgroups, e.g. with MPI:

Given A, n, cx, cy
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_split (MPI_COMM_WORLD, cy, rank, &rowcomm);

• … and similarly column-wise subgroups
Preliminary algorithm
Given Axy, n, cx, cy
int rank, rowtemp, coltemp;
MPI_Comm rowcomm, colcomm;
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_split (MPI_COMM_WORLD, cy, rank, &rowcomm);
MPI_Comm_split (MPI_COMM_WORLD, cx, rank, &colcomm);
for (k=0; k<n; k++ ) {
    if (cx == k) rowtemp = Axy;
    MPI_Bcast (&rowtemp, 1, MPI_INT, k, rowcomm);
    if (cy == k) coltemp = Axy;
    MPI_Bcast (&coltemp, 1, MPI_INT, k, colcomm);
    Axy = min(Axy, rowtemp + coltemp);
}
A more realistic data decomposition
• 1-D column-wise data distribution
– Each process holds a column of the adjacency matrix
– No need to broadcast data in column-communicator
• No need to create column-wise communicators
• No need to use sub-communicators at all!
Preliminary algorithm (IIa)
Given Ax0 … Axn, n, cx
int temp[n];
/* note: rank in MPI_COMM_WORLD = cx */
for (k=0; k<n; k++ ){
if (cx == k ) {
for (i=0; i<n; i++ ) {
temp[i]= Axi ;
}
}
MPI_Bcast (temp, n, MPI_INT, k, MPI_COMM_WORLD );
for (i=0; i<n; i++) {
Axi =min ( Axi, Axk +temp[i] );
}
}
Preliminary algorithm (IIb)
Given Ax0 … Axn, n, cx
int temp[n];
/* note: rank in MPI_COMM_WORLD = cx */
for (k=0; k<n; k++ ){
for (i=0; i<n; i++) {
if (cx == k ) {
temp[i]= Axi ;
}
MPI_Bcast (&temp[i], 1, MPI_INT, k, MPI_COMM_WORLD );
Axi =min ( Axi, Axk +temp[i] );
}
}
An even more realistic data decomposition
• Each process holds a certain number of columns, e.g. nx
• Thus, each process is the owner of the columns
rank* nx to [(rank+1)* nx]-1
• The owner of the column k is the process with the rank
r=[floor(k/nx)]
• Mapping of global to local indices: column k of the
global matrix is column s in the local matrix of process
r with s=k%nx
Preliminary Algorithm (III)
Given a[n][nx], n, nx
int temp[n];
/* note: rank in MPI_COMM_WORLD = cx */
for ( k = 0; k < n; k++ ) {
root = floor(k/nx);
if ( root == rank ) {
for (i=0; i<n; i++ ) {
temp[i] = a[i][k%nx];
}
}
MPI_Bcast ( temp, n, MPI_INT, root, MPI_COMM_WORLD);
for (i=0; i<n; i++ ) {
for ( j=0; j<nx; j++ ) {
a[i][j] = min(a[i][j], (temp[i]+a[k][j]));
}
}
}
2-D data decomposition
• Each process holds a block of the adjacency matrix, e.g. (nx,ny) elements
• Need to re-introduce row- and column-wise sub-communicators
Preliminary algorithm (IVa)

Given A, n, nx, ny
int coltemp[ny], rowtemp[nx];
px = n/nx; py = n/ny;
cx = rank % px;
cy = floor (rank/px);
/* Generate subcommunicators */
MPI_Comm_split (MPI_COMM_WORLD, cx, rank, &colcomm);
MPI_Comm_split (MPI_COMM_WORLD, cy, rank, &rowcomm);
for ( k = 0; k< n; k++ ) {
rootx = floor(k/nx);
if ( rootx == cx ) {
for (i=0; i<ny; i++ )
coltemp[i] = a[i][k%nx];
}
MPI_Bcast ( coltemp, ny, MPI_INT, rootx, rowcomm);
Preliminary algorithm (IVb)
rooty = floor(k/ny);
if ( rooty == cy ) {
for (i=0; i<nx; i++ )
rowtemp[i] = a[k%ny][i];
}
MPI_Bcast ( rowtemp, nx, MPI_INT, rooty, colcomm);
for (i=0; i<ny; i++ ) {
for ( j=0; j<nx; j++ ) {
a[i][j] = min (a[i][j],(coltemp[i]+rowtemp[j]));
}
}
}
Minimum Spanning Tree
• Minimum Spanning Tree:
– Spanning tree: a subgraph G’ of an undirected graph G that is a tree and contains all vertices of G
– Minimum spanning tree: a spanning tree with minimum total edge weight
[Figure: a weighted graph on vertices 1–4 and its minimum spanning tree]
Prim’s Algorithm
Given G=(V,E,ω) and an arbitrary vertex r
VT = {r}
d[r] = 0
for all v ∈ (V-VT) do
    d[v] = ω(r,v);
while ( VT ≠ V ) do
    find vertex u such that d[u] = min(d[v], v ∈ (V-VT));
    VT = VT ∪ {u};
    for all v ∈ (V-VT) do
        d[v] = min (d[v], ω(u,v));
end while
COSC 6374 – Parallel Computation
Edgar Gabriel
Prim’s Algorithm (II)
• VT: vector containing the vertices already added to the spanning tree
• (V-VT): set of vertices which have not yet been added to the spanning tree
• d: distance vector, e.g. d[i] contains the minimum weight of an edge from vertex i to any vertex in the spanning tree
Example (I)

[Figure: weighted undirected graph on vertices 0–5]

Adjacency matrix:
    0  1  3  ∞  ∞  3
    1  0  5  1  ∞  ∞
    3  5  0  2  1  ∞
    ∞  1  2  0  4  ∞
    ∞  ∞  1  4  0  5
    3  ∞  ∞  ∞  5  0

Arbitrary starting point r=1:
    d  = [ 1  0  5  1  ∞  ∞ ]
    VT = [ 1 -1 -1 -1 -1 -1 ]

-1 := undefined; a vertex already in the spanning tree is not considered in the following search
Example (II)

Add vertex 3 (d[3]=1 is minimal among the vertices not in VT):
    VT = [ 1  3 -1 -1 -1 -1 ]
    d  = [ 1  0  2  1  4  ∞ ]

Add vertex 0:
    VT = [ 1  3  0 -1 -1 -1 ]
    d  = [ 1  0  2  1  4  3 ]
Example (III)

Add vertex 2:
    VT = [ 1  3  0  2 -1 -1 ]
    d  = [ 1  0  2  1  1  3 ]

Add vertex 4:
    VT = [ 1  3  0  2  4 -1 ]
    d  = [ 1  0  2  1  1  3 ]
Example (IV)

Add vertex 5 - all vertices are now part of the spanning tree:
    VT = [ 1  3  0  2  4  5 ]
    d  = [ 1  0  2  1  1  3 ]
Sequential implementation (I)
Given A[N][N], N
u = 1; vtcount = 0; vt[vtcount++] = u;
for ( i=0; i<N; i++ ) {
d[i] = A[u][i];
}
for ( i=1; i<N; i++ ) {
u = find_u ( vt, vtcount, d );
vt[vtcount++] = u;
update_d (d, vt, vtcount, N, u, a );
}
COSC 6374 – Parallel Computation
Edgar Gabriel
Sequential implementation (II)
int find_u ( int vt[N], int vtcount, int d[N] )
{
int i, j, found;
int current_min=MY_INF, current_minloc=-1;
for ( i=0; i<N; i++ ) {
for (found = 0, j=0; j<vtcount; j++ ) {
if (i==vt[j]) found=1;
}
if (found) continue;
if ( d[i] < current_min) {
current_minloc = i;
current_min = d[i];
}
}
return current_minloc;
}
Sequential implementation (III)
void update_d ( int d[N], int vt[N], int vtcount,
int N, int u, int a[N][N] )
{
int i, j, found;
for ( i=0; i<N; i++ ) {
for ( found=0, j=0; j<vtcount; j++ ) {
if (i==vt[j]) found=1;
}
if (found) continue;
d[i] = min (d[i], a[u][i]);
}
return;
}
Parallel Algorithm 1 (I)
• Each process owns one column of the adjacency matrix
• Each process owns the corresponding part of the distance vector d
• VT is replicated on each process
• Only find_u and update_d need to be modified!

[Figure: column-wise distribution of the matrix A and the distance vector d]
Parallel Algorithm 1 (II)
int find_u ( int vt[N], int vtcount, int d )
{
int min[2], gmin[2], i, rank;
MPI_Comm_rank ( MPI_COMM_WORLD, &rank );
min[0] = d;
min[1] = rank;
for ( i=0; i < vtcount; i++ ) {
if ( vt[i] == rank ) {
min[0] = MY_INF;
break;
}
}
MPI_Allreduce ( min, gmin, 1, MPI_2INT, MPI_MINLOC,
MPI_COMM_WORLD);
return gmin[1];
}
MPI_MINLOC and MPI_MAXLOC (I)
• Operators for reduction operations returning the minimum/maximum value and the rank of the process owning the minimum/maximum value
• Special MPI data types have to be used
– MPI_2INT: array of two integers
• Element zero contains the min/max value
• Element one contains the rank of the process owning the minimal/maximal value
– MPI_FLOAT_INT: structure consisting of a float and an int

struct {
    float val;
    int   rank;
};
MPI_MINLOC and MPI_MAXLOC (II)
• Similarly:
– MPI_DOUBLE_INT,
– MPI_SHORT_INT,
– MPI_LONG_INT …
• Note:
– the rank in the second element has to be set by each process for the input values
– MPI guarantees that each process will have the same rank in the result vector for the location of the minimum/maximum, even if several processes have the same minimal/maximal value
Parallel Algorithm 1 (III)
void update_d ( int *d, int *vt, int vtcount, int u, int a[N] )
{
int i, rank;
MPI_Comm_rank ( MPI_COMM_WORLD, &rank);
for ( i=0; i < vtcount; i++ ) {
if ( vt[i] == rank ) {
return;
}
}
*d = min (*d, a[u]);
return;
}
Parallel Algorithm 2 (I)
• Each process owns a certain number of columns of the adjacency matrix
• Each process owns the corresponding elements of the distance vector
• VT is replicated on each process

[Figure: block-column distribution of the matrix A and the distance vector d]
Parallel Algorithm 2 (II)
for ( i=0; i<nx; i++ ) {
for (found=0, j=0; j<vtcount; j++ ) {
if ((cx*nx+i)==vt[j]) found=1;
}
if (found) continue;
if ( d[i] < current_min) {
current_minloc = (cx*nx+i);
current_min = d[i];
}
}
min[0] = current_min;
min[1] = rank;
MPI_Allreduce ( min, gmin, 1, MPI_2INT,
MPI_MINLOC, MPI_COMM_WORLD);
MPI_Bcast ( &current_minloc, 1, MPI_INT, gmin[1],
MPI_COMM_WORLD );
return current_minloc;
Parallel Algorithm 2 (III)
void update_d ( int *d, int *vt, int vtcount, int u, int a[N][nx] )
{
    int i, j, found;
for ( i=0; i<nx; i++ ) {
for ( found=0, j=0; j<vtcount; j++ ) {
if ((cx*nx+i)== vt[j]) {
found=1;
}
}
if (found) continue;
d[i] = my_min (d[i], a[u][i]);
}
return;
}