Slides from my talk at the Mathworks summit at Stanford
A tale of two Matlab libraries for graph algorithms: MatlabBGL and gaimc
David F. Gleich Purdue University
The Setting
recursive spectral graph partitioning
Sparse matrices in Matlab
To store an m × n sparse matrix M, Matlab uses compressed column format [Gilbert et al., 1992]. Matlab never stores a 0 value in a sparse matrix; it always "re-compresses" the data structure in these cases. If M is the adjacency matrix of a graph, then storing the matrix by columns corresponds to storing the graph as an in-edge list.
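A minimal sketch of this point, using the 6-node example graph of figure 6.1:

```matlab
% Build the weighted adjacency matrix from triplet (i,j,w) lists.
i = [1 1 2 2 3 3 4 4 5 5];          % source vertices
j = [2 3 3 4 2 5 3 6 4 6];          % target vertices
w = [16 13 10 12 4 14 9 20 7 4];    % edge weights
A = sparse(i,j,w,6,6);

% Because Matlab stores A by columns, extracting a column is cheap.
% Column v lists the in-edges of vertex v:
[src,~,wt] = find(A(:,3));   % in-neighbors of vertex 3: src = [1;2;4], wt = [13;10;9]

% Extracting a row (the out-edges) requires scanning the column structure,
% which is why MatlabBGL transposes A before running algorithms:
[~,dst,wt2] = find(A(1,:));  % out-neighbors of vertex 1: dst = [2 3]
```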
We briefly illustrate compressed row and column storage schemes in figure 6.1.
The weighted adjacency matrix of the example graph (vertices 1 through 6) is

  A = [ 0 16 13  0  0  0
        0  0 10 12  0  0
        0  4  0  0 14  0
        0  0  9  0  0 20
        0  0  0  7  0  4
        0  0  0  0  0  0 ]

Compressed sparse row:
  rp = [1 3 5 7 9 11 11]
  ci = [2 3 3 4 2 5 3 6 4 6]
  ai = [16 13 10 12 4 14 9 20 7 4]

Compressed sparse column:
  cp = [1 1 3 6 8 9 11]
  ri = [1 3 1 2 4 2 5 3 4 5]
  ai = [16 4 13 10 9 12 7 14 20 4]
Figure 6.1 – Compressed row and column storage. At far left, we have a weighted, directed graph. Its weighted adjacency matrix lies below. At right are the compressed row and compressed column arrays for this graph and matrix. For sparse matrices, compressed row and column storage make it easy to access entries in rows and columns, respectively. Consider the 3rd entry in rp. It says to look at the 5th element in ci to find all the columns in the 3rd row of the matrix. The 5th and 6th elements of ci and ai tell us that row 3 has non-zeros in columns 2 and 5, with values 4 and 14. When the sparse matrix corresponds to the adjacency matrix of a graph, this corresponds to efficient access to the out-edges and in-edges of a vertex.
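The caption's walk-through of rp, ci, and ai can be checked directly; here is a small sketch using the arrays from figure 6.1:

```matlab
% CSR arrays for the example graph (figure 6.1).
rp = [1 3 5 7 9 11 11];              % row pointers
ci = [2 3 3 4 2 5 3 6 4 6];          % column indices
ai = [16 13 10 12 4 14 9 20 7 4];    % values

% Non-zeros of row r live at positions rp(r) .. rp(r+1)-1 of ci and ai.
r = 3;
cols = ci(rp(r):rp(r+1)-1);   % columns 2 and 5
vals = ai(rp(r):rp(r+1)-1);   % values 4 and 14
```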
Most graph algorithms are designed to work with out-edge lists instead of in-edge lists. (See the discussion later in this chapter about the requirements of various graph algorithms.) Before running an algorithm, MatlabBGL explicitly transposes the graph so that Matlab's internal representation corresponds to storing out-edge lists. For algorithms on symmetric graphs, these transposes are not required.
The mex commands mxGetPr, mxGetJc, and mxGetIr retrieve pointers to Matlab's internal storage of the matrix without making a copy. These functions make it possible to access a sparse matrix efficiently and are a requirement of our implementation.
Let us recap. Sparse matrices are the best way to store graphs in Matlab. They provide all the necessary pieces to integrate cleanly with "natural" Matlab syntax and allow us access to their internals to run algorithms efficiently.
Other packages
There are other graph packages for Matlab too. One of the first was the meshpart toolkit [Gilbert and Teng], which focuses on partitioning meshes. A more recent example is Matgraph [Scheinerman], which contains a rich set of graph constructors to create adjacency matrices for standard graphs. It also provides an interface to support graph properties, such as labels and weights. Various authors released individual graph theory functions on the Mathworks File Exchange [Various, search for dijkstra]. For example, the Exchange contains more than three separate implementations of Dijkstra's shortest path algorithm.
The Setting
recursive spectral graph partitioning
>> A = load_adjacency_matrix;
>> L = diag(sparse(sum(A,2))) - A;
>> [V,D] = eigs(L,2,'SA');
>> f = V(:,2);
>> A1 = A(f>=0,f>=0); A2 = A(f<0,f<0);*
*Warning: can do much better than this split!
The Problem
disconnected components
>> C = components(A);
??? Undefined function or method 'components' for input arguments of type 'double'.
*Warning: strictly speaking, this isn't a problem. However, it's inefficient to solve larger eigenproblems than required.
The Rescue
disconnected components
MESHPART toolkit by John Gilbert and Shang-Hua Teng
>> C = components(A);
Uses Matlab's dmperm function
The Failed Rescue
disconnected components
>> C = components(A);
caused Matlab to randomly crash
I wanted a fast max-flow routine too
MatlabBGL: Matlab and the Boost graph library
The Recoup
working recursive spectral partitioning code using the Boost graph library in C++, including a max-flow heuristic extension
the Boost graph library has a components function and many other graph algorithms
Boost has a "generic" graph data type
The Idea
add graph algorithms to Matlab naturally using Boost graph library
The Plan
graph data type => Matlab sparse matrix
results => "natural" Matlab types
The Plan
>> A = load_adjacency_matrix
>> d = bfs(A,1);
>> d = dijkstra(A,size(A,1));
>> T = mst(A);
>> c = components(A);
>> F = maxflow(A,s,t);
>> test_dag(A)
>> [flag,K] = test_planarity(A);
The Plan
suitable for large problems
=> 10 million edges circa 2006
=> avoid copying data
The Catch
Matlab sparse type (compressed sparse column) vs. Boost graph type:
1:n                      <-> vertices(G)
[i,j,w] = find(A);       <-> edges(G)
size(A,1)                <-> num_vertices(G)
[~,j,w] = find(A(v,:))   <-> out_edges(G,v)
[~,j] = find(A(v,:))     <-> adjacent(G,v)
The Compromise
make a transpose when it's required, but let "smart" users bypass it
The Details
…BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example.
Next, figure 6.2 shows the high-level architecture of MatlabBGL. There
[Figure 6.2 diagram: Matlab sparse matrix -> M code -> mex code -> libmbgl (extern C code) -> Boost CSR graph (C++ code), illustrated with functions such as dfs, bfs, and Prim's mst.]
Figure 6.2 – MatlabBGL architecture. MatlabBGL consists of four components: m-files, mex-files, libmbgl, and Boost graph library functions. See the text for a description of how data flows through these components.
are four main components: m-files, mex-files, libmbgl, and BGL functions. Let's illustrate a typical call to a MatlabBGL function: dfs for a depth-first search through the graph.
First, the dfs.m file (an M code) receives the sparse matrix representation of the graph and the identifier of a vertex that originates the search. It performs some basic parameter checking on the data, transposes the matrix to get the graph stored by out-edges in the Matlab data structure, and forwards the information to the dfs_mex.c mex-file. By providing an optional argument to the function, both the check and the transpose can be eliminated for the fastest performance. The mex-file extracts the compressed sparse column arrays for the sparse matrix, which correspond to a compressed out-edge list representation of the graph, and sends the information to the libmbgl function depth_first_search. The libmbgl functions implement wrappers around Boost functions on compressed sparse row arrays and expose them via a C calling convention. This library is described further below. For the depth_first_search function, the wrapper takes the compressed sparse row arrays and instantiates a csr_graph type that implements the VertexListGraph, IncidenceGraph, EdgeListGraph, and AdjacencyGraph concepts directly on the compressed sparse row arrays. With the csr_graph object, the libmbgl wrapper calls a Boost graph library function.
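A sketch of the two calling patterns described above; the option-struct field names here ('istrans' and 'nocheck') follow my recollection of MatlabBGL's options convention and should be checked against the library's documentation:

```matlab
% Ordinary call: dfs.m checks the input and transposes A internally.
d = dfs(A,1);

% "Smart" call: hand MatlabBGL an already-transposed matrix and ask it
% to skip both the parameter check and the transpose.
At = A';
d = dfs(At,1,struct('istrans',1,'nocheck',1));
```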
Throughout this entire process, the only copy of the data occurs when the initial sparse matrix is transposed to store the data by out-edges (rows) instead of in-edges (columns). (Some Boost graph functions make a copy of the graph inside the algorithm. We can do nothing about these copies without modifying the BGL itself.)
Thus far, the interface between the libraries is only complicated by the layers of abstraction. Although maintaining three layers (m-files, mex-files, and libmbgl) may seem unnecessary, it simplifies calling conventions across multiple platforms. The m-files call mex-files, which Matlab always supports. The mex-files call functions in libmbgl with a C calling convention, which is also extremely portable. And the C functions interface with the Boost graph library. We discuss other reasons to keep libmbgl separate from the mex files in the next section.
MatlabBGL – Version 1.0
Released April 2006 on the Matlab File Exchange
July '06, v2.0: added visitors
April '07, v2.1: 64-bit Matlab
April '08, v3.0: performance improvements
Oct '08, v4.0: planarity testing, layout, structural zeros
Jan '12, v5.0: update forthcoming?
Impact
Downloaded over 20,000 times
Used in over 10 publications by others! including a PNAS article on brain topology
Identified numerous bugs in the Boost graph library
Impact
Network Partitioning
… and now for a demo …
The Devil of the Details
Compile mex files on OSX/Linux/Win in 32-bit and 64-bit mode
Compile libmbgl on OSX/Linux/Win in 32-bit and 64-bit mode
Hard to keep up with changes in Matlab
Hard for users to compile it themselves (changes in Boost and changes in Matlab)
Hard to play around with new algorithms
Mathworks graph library in bioinformatics toolbox
gaimc: graph algorithms in Matlab code
A vision
function n=my1norm(x)
n = 0;
for i=1:numel(x), n=n+abs(x(i)); end

>> x = randn(1e7,1);
>> tic, n1=my1norm(x); toc
Elapsed time is 0.16 seconds
>> tic, n1 = norm(x,1); toc;
Elapsed time is 0.32 seconds

Note: R2007b on 64-bit Linux
A vision
function n=my1norm(x)
n = 0;
for i=1:numel(x), n=n+abs(x(i)); end

>> x = randn(1e7,1);
>> tic, n1=my1norm(x); toc
Elapsed time is 0.15 seconds
>> tic, n1 = norm(x,1); toc;
Elapsed time is 0.1 seconds

Note: R2011a on 64-bit OSX
Quite impressed
get within spitting distance of vectorized performance using Matlab for loops; even faster than some things in Python
Another idea
implement graph algorithms in pure Matlab code
should only be "somewhat" slower
much more portable
More problems
function calls make things REALLY slow (unless the function is built-in, e.g. abs)
mst and dijkstra need a heap; a heap in Matlab?
Problem specifics
function n=my1normfunc(x)
n = 0;
for i=1:numel(x), n=n+myabs(x(i)); end

function a=myabs(a), if a<0, a=-a; end

>> tic, n1=my1norm(x); toc
Elapsed time is 0.15 seconds
>> tic, n1 = my1normfunc(x); toc;
Elapsed time is 3.16 seconds

Note: R2011a on 64-bit OSX
A heap in Matlab code
Old reference: D. K. Kahaner, Algorithm 561: Fortran implementation of heap programs. ACM TOMS, 1980.
A Matlab heap structure
To implement both Dijkstra's shortest path algorithm and Prim's minimum spanning tree algorithm we need a means to store and access vertices, in sorted order, based on a constantly changing set of values. A heap is one data structure that meets these requirements [Cormen et al., 2001]. In this section, we discuss a Matlab implementation of a heap.
The following implementation is inspired by Kahaner [1980]. (More generally speaking, algorithms written in Fortran 77 are excellent candidates for the Matlab just-in-time compiler.) From a data structure perspective, a heap is a binary tree where smaller elements are parents of larger elements. It supports the following operations:
insert: add an element to the heap;
pop: remove the element from the heap with the smallest value; and
update: change the value of an element in the heap.
Matlab specializes in arrays (or vectors), and a common way to store a binary tree in an array is to associate the tree node of index j with a left child of index 2j and a right child of index 2j + 1. See figure 6.3 for an example.
For example, the array 5 6 7 1 9 6 corresponds to the tree with 5 at the root, 6 and 7 as its children, 1 and 9 as the children of node 6, and 6 as the child of node 7.
Figure 6.3 – Binary trees as arrays.
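The index arithmetic is simple enough to state as code (a minimal sketch):

```matlab
% For the tree node stored at array index j:
j = 2;
left   = 2*j;         % index of the left child
right  = 2*j + 1;     % index of the right child
parent = floor(j/2);  % index of the parent (valid for j > 1)
```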
The data structure for our Matlab heap will consist of four arrays and one number:

1. the heap tree. The array T stores the identifiers of the items in the heap. That is, T(i) is the id of the element in tree node i and T(1) is the id of the element at the root of the heap tree.
2. the data store. The array T stores ids of elements in D so that D(T(i)) is the actual item for tree node i. (For items without natural ids, ids can be uniquely assigned based on how many items have already been added to the heap; in this case, D(i) contains the ith item added to the heap.) The size of D must be the maximum number of items ever added to the heap. (An alternative is to grow the heap by reallocating the arrays if additional items must be added.)
3. a look-up table. The size of L is the maximum id of any item added to the heap. For id i, L(i) is the tree node index of i in T, and T(L(i)) = i.
4. the value array. The current value associated with id i is given by V(i). This means that D(i) and V(i) are the item and its value, respectively.
5. the current size of the heap.
When we use this heap structure to store vertices of a graph, there is no need to maintain the data array D. Each vertex is just a unique numeric identifier for the compressed sparse row arrays that gaimc uses. With D, T(⋅) contains the index of an element in D. When we store vertices in the heap, each vertex already has a unique identifier (its index) and the array D is unnecessary.
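A minimal sketch of the insert and pop operations on the arrays described above, storing vertices directly (so the data store D is omitted); this illustrates the idea and is not the exact gaimc code:

```matlab
% State: T (heap tree of vertex ids), L (position of each id in T),
% V (current value for each id), n (number of items in the heap).

% Insert vertex v with value val, then sift it up toward the root.
n = n + 1; T(n) = v; L(v) = n; V(v) = val;
j = n;
while j > 1 && V(T(floor(j/2))) > V(T(j))
    p = floor(j/2);
    T([p j]) = T([j p]);          % swap tree nodes
    L(T(p)) = p; L(T(j)) = j;     % keep the look-up table consistent
    j = p;
end

% Pop the minimum: remove the root T(1), move the last element to the
% root, and sift it down.
vmin = T(1);
T(1) = T(n); L(T(1)) = 1; n = n - 1;
j = 1;
while 2*j <= n
    c = 2*j;                                      % pick the smaller child
    if c < n && V(T(c+1)) < V(T(c)), c = c + 1; end
    if V(T(c)) >= V(T(j)), break; end
    T([c j]) = T([j c]); L(T(c)) = c; L(T(j)) = j;
    j = c;
end
```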
Graph access, take 1
Simple, efficient neighbor access
At = A';
[v,~,w] = find(At(:,u));
Graph access, take 2
Complicated neighbor access
[i,j,w] = find(A);
[ai,aj,a] = indexed2csr(i,j,w,size(A,1));
v = aj(ai(u):ai(u+1)-1);
Graph access
bfs, take 1: At=A'; for w=find(At(:,v)) …
>> tic, d=bfs(A,1); toc
Elapsed time is 0.05 seconds

bfs, take 2: indexed2csr(A); for ci=rp(v):rp(v+1)-1 …
>> tic, d=bfs(A,1); toc
Elapsed time is 0.007 seconds
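The "take 2" loop can be fleshed out into a complete search; this is a sketch in the spirit of gaimc's approach, not the library code itself:

```matlab
function d = bfs_csr(rp,ci,u)
% Breadth-first search over CSR arrays rp (row pointers) and ci
% (column indices), starting from vertex u. Returns distances d,
% with -1 marking unreachable vertices.
n = numel(rp)-1;
d = -ones(n,1); d(u) = 0;
q = zeros(n,1); head = 1; tail = 1; q(1) = u;   % simple array queue
while head <= tail
    v = q(head); head = head + 1;
    for k = rp(v):rp(v+1)-1                     % out-edges of v
        w = ci(k);
        if d(w) < 0
            d(w) = d(v) + 1;
            tail = tail + 1; q(tail) = w;
        end
    end
end
```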
gaimc
convert input to CSR arrays
run graph algorithms on CSR arrays
bfs, clustering coefficients, core numbers, cosine knn, dfs, dijkstra, floyd warshall, mst, strong components
bipartite_matching (thanks to Ying Wang)
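Typical gaimc usage looks roughly like this; the function names follow the list above, but the exact signatures here are from memory and should be checked against the library:

```matlab
% Convert a Matlab sparse matrix to CSR arrays once ...
[rp,ci,ai] = sparse_to_csr(A);

% ... or just call the high-level functions directly on A:
d  = bfs(A,1);         % breadth-first distances from vertex 1
d2 = dijkstra(A,1);    % shortest-path distances from vertex 1
cc = scomponents(A);   % strongly connected component labels
```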
The pudding
function s=mysumsq(x)
s = 0;
for i=1:numel(x), s = s + x(i)^2; end

>> x = randn(1e7,1);
>> tic, s1 = mysumsq(x); toc;
>> tic, s2 = x'*x; toc
We evaluate each function on either a small set of sample graphs (dfs and dijkstra) or a set of synthetic graphs (scomponents, dirclustercoeffs, prim_mst, and clustercoeffs). For each function, we call it once to ensure that the Matlab just-in-time compiler has the current version compiled. The two search functions that begin with a source vertex, dfs and dijkstra, are called on each of the graphs listed in table 6.2 with many random starting vertices, and every test is repeated several times. The functions scomponents and dirclustercoeffs are evaluated on repeated instances of random directed graphs with a fixed number of edges per row and 10, 100, 5000, 10000, and 50000 vertices. The function clustercoeffs is evaluated similarly, but with random symmetric graphs instead. Finally, the minimum spanning tree function is evaluated on repeated instances of a random symmetric graph with fixed average degree and 100, 5000, and 10000 vertices. The aggregated results of all these tests are shown in figure 6.4.
Table 6.2 – gaimc evaluation graphs.

Graph       Verts.  Edges
allsp1           5      9
clr24-1          9     14
wb-cs.stan    9914  36584
minnesota     2642   3303
tapir         1024   2846
[Bar chart: slowdown of gaimc relative to MatlabBGL (scale 0 to 14) for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs, with Standard and Fast variants.]
Figure 6.4 – Performance of the gaimc library. An experimental comparison of the performance of the gaimc library to MatlabBGL shows that many functions in gaimc take only twice as much time as their MatlabBGL counterparts. The difference between the standard and fast operations is that fast operations eliminate any data translations and measure pure algorithm speed. Standard calls in these libraries involve some data translation, which is included in the time for the standard operations.
With the exception of mst_prim, the gaimc functions are only a few times slower than their MatlabBGL counterparts. At the moment, we don't understand why the dfs function is faster in gaimc or why the mst_prim routine has dramatically different performance. Exploring these differences is a task for the future.
Matlab and bv graphs
The final software package that we discuss in this chapter is libbvg and its Matlab counterpart bvgraph. All the source code and examples for these paired packages are online at the LaunchPad open-source hosting repository, https://launchpad.net/libbvg. (We anticipate migrating them to the github system soon.) Both of these packages work with web graphs, which are graphs formed by hyper-linking relationships on the world wide web. These graphs are extremely large (the complete network has over one trillion nodes [Alpert and Hajaj, 2008]) and subsets often have more than one hundred million
The pudding changes
[Updated bar chart: slowdown of gaimc relative to MatlabBGL (scale now 0 to 35) for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs, with Standard and Fast variants.]
Afterward
"putting the graph into Matlab"
Matlab could just as easily have been called "Graphlab" with a few extra functions.
It's a great environment to play with graphs as matrices.