InfoVis Infrastructure Workshop
Chris Mueller
Open Systems Lab, Indiana University
October 9, 2004
chemuell at cs dot indiana dot edu
www.osl.iu.edu
Overview
• Position Paper
  – Repository-style infrastructure (SourceForge, GenBank, CPAN)
  – Standard software protocols
  – Guiding policies to help ensure quality
• Current Work: Open Systems Lab, IU
  – High-performance components for IVC
  – Boost Graph Library
  – Very large data sets/visualization
• Interests
  – Understand community needs
  – Learn what's available and where we're going
• Industry Viewpoint
  – Web-based Scientific Visualization and Analysis products
  – In-house visualization and analysis tools (high-throughput analytical chemistry)
Boost Graph Library Algorithms

• Core Algorithm Patterns
  – breadth_first_search, breadth_first_visit, depth_first_search, depth_first_visit, undirected_dfs
• Shortest Paths Algorithms
  – dijkstra_shortest_paths, bellman_ford_shortest_paths, dag_shortest_paths, johnson_all_pairs_shortest_paths
• Minimum Spanning Tree Algorithms
  – kruskal_minimum_spanning_tree, prim_minimum_spanning_tree
• Connected Components Algorithms
  – connected_components, strong_components
• Incremental Connected Components
  – initialize_incremental_components, incremental_components, same_component, component_index
• Maximum Flow Algorithms
  – edmunds_karp_max_flow, push_relabel_max_flow
• Other Algorithms
  – topological_sort, transitive_closure, copy_graph, transpose_graph, isomorphism
  – cuthill_mckee_ordering, sequential_vertex_coloring*, minimum_degree_ordering, sloan_ordering
  – ith_wavefront, max_wavefront, aver_wavefront, rms_wavefront

Recent Additions
  – Betweenness Centrality
  – Betweenness Centrality clustering
  – A* search
  – Floyd-Warshall all-pairs shortest paths
  – Kamada-Kawai layout
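Betweenness centrality, listed among the recent additions, scores a vertex by how many shortest paths between other vertices pass through it. The sketch below is a standard-library C++ version of Brandes' algorithm for an unweighted graph, offered only to illustrate the computation; it does not use the BGL interface, and the `Graph` alias and `brandes_betweenness` name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stack>
#include <vector>

// Hypothetical adjacency-list type: adj[v] lists the neighbours of v.
using Graph = std::vector<std::vector<std::size_t>>;

// Brandes' algorithm: one BFS per source, then back-propagate dependencies.
// Every ordered (source, target) pair is counted, so on an undirected graph
// each unordered pair contributes twice. Returns unnormalised scores.
std::vector<double> brandes_betweenness(const Graph& adj) {
    const std::size_t n = adj.size();
    std::vector<double> centrality(n, 0.0);
    for (std::size_t s = 0; s < n; ++s) {
        std::vector<std::vector<std::size_t>> preds(n); // predecessors on shortest paths
        std::vector<double> sigma(n, 0.0);              // number of shortest s-v paths
        std::vector<long> dist(n, -1);                  // BFS distance, -1 = unvisited
        std::stack<std::size_t> order;                  // vertices in BFS order
        std::queue<std::size_t> frontier;
        sigma[s] = 1.0;
        dist[s] = 0;
        frontier.push(s);
        while (!frontier.empty()) {
            std::size_t v = frontier.front();
            frontier.pop();
            order.push(v);
            for (std::size_t w : adj[v]) {
                assert(w < n && "neighbour index out of range");
                if (dist[w] < 0) {            // first time w is reached
                    dist[w] = dist[v] + 1;
                    frontier.push(w);
                }
                if (dist[w] == dist[v] + 1) { // v precedes w on a shortest path
                    sigma[w] += sigma[v];
                    preds[w].push_back(v);
                }
            }
        }
        std::vector<double> delta(n, 0.0); // dependency of s on each vertex
        while (!order.empty()) {           // farthest vertices first
            std::size_t w = order.top();
            order.pop();
            for (std::size_t v : preds[w])
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w]);
            if (w != s) centrality[w] += delta[w];
        }
    }
    return centrality;
}
```

For the path 0-1-2, `brandes_betweenness({{1}, {0, 2}, {1}})` yields {0, 2, 0}: vertex 1 lies on the only shortest path between 0 and 2, counted once from each end. Halve the scores if unordered pairs are wanted.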
BC Clustering Performance: BGL vs. JUNG
[Chart: wall-clock time (minutes, 0 to 60) vs. number of movies (200 to 400) for BGL and JUNG. Actor Collaboration Database, Betweenness Centrality Clustering (threshold = 0.01), single processor.]
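The clustering benchmark above splits a graph into communities by removing high-betweenness edges. A minimal standard-library sketch of that idea (the Girvan-Newman scheme: repeatedly delete the edge with the highest betweenness) follows. The stopping rule, which divides the largest edge score by the number of ordered vertex pairs n(n-1) and compares it with a threshold, is an assumption for illustration and may not match how BGL normalises its threshold of 0.01; `edge_betweenness` and `bc_cluster` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <stack>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;  // undirected adjacency list
using Edge = std::pair<std::size_t, std::size_t>;     // stored with first < second

// Edge betweenness over all ordered (source, target) pairs, Brandes-style.
std::map<Edge, double> edge_betweenness(const Graph& adj) {
    const std::size_t n = adj.size();
    std::map<Edge, double> score;
    for (std::size_t v = 0; v < n; ++v)
        for (std::size_t w : adj[v])
            if (v < w) score[{v, w}] = 0.0;
    for (std::size_t s = 0; s < n; ++s) {
        std::vector<std::vector<std::size_t>> preds(n);
        std::vector<double> sigma(n, 0.0), delta(n, 0.0);
        std::vector<long> dist(n, -1);
        std::stack<std::size_t> order;
        std::queue<std::size_t> frontier;
        sigma[s] = 1.0; dist[s] = 0; frontier.push(s);
        while (!frontier.empty()) {              // BFS counting shortest paths
            std::size_t v = frontier.front(); frontier.pop();
            order.push(v);
            for (std::size_t w : adj[v]) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; frontier.push(w); }
                if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; preds[w].push_back(v); }
            }
        }
        while (!order.empty()) {                 // credit edges, farthest first
            std::size_t w = order.top(); order.pop();
            for (std::size_t v : preds[w]) {
                double c = sigma[v] / sigma[w] * (1.0 + delta[w]);
                score[{std::min(v, w), std::max(v, w)}] += c;
                delta[v] += c;
            }
        }
    }
    return score;
}

// Removes the highest-betweenness edge until max_score / (n*(n-1)) < threshold.
// Modifies adj in place and returns the removed edges in removal order.
std::vector<Edge> bc_cluster(Graph& adj, double threshold) {
    const double pairs = static_cast<double>(adj.size()) * (adj.size() - 1);
    std::vector<Edge> removed;
    while (true) {
        auto score = edge_betweenness(adj);
        if (score.empty()) break;
        auto best = std::max_element(score.begin(), score.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        if (pairs == 0 || best->second / pairs < threshold) break;
        auto drop = [&](std::size_t from, std::size_t to) {
            auto& nbrs = adj[from];
            auto it = std::find(nbrs.begin(), nbrs.end(), to);
            assert(it != nbrs.end());
            nbrs.erase(it);
        };
        auto [u, v] = best->first;
        drop(u, v);
        drop(v, u);
        removed.push_back(best->first);
    }
    return removed;
}
```

On two triangles joined by one bridge, the bridge carries every shortest path between the halves (18 of the 30 ordered pairs), so with a threshold of 0.5 it is the only edge removed.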
Scalability of Parallel BGL Betweenness Centrality
[Chart (log-log axes): wall-clock time (seconds, 1 to 1000) vs. number of processors (1 to 100). Actor Collaboration Database, Betweenness Centrality.]
Dot Plot Performance Results

              Base   SIMD 1   SIMD 2   Thread
Ideal          140     1163     1163     2193
NFS             88      370      400        -
NFS Touch       88        -      446      891
Local            -      500      731        -
Local Touch     90        -      881     1868
• Base is a direct port of the DOTTER algorithm.
• SIMD 1 is the SIMD algorithm using a sparse matrix data structure based on STL vectors.
• SIMD 2 is the SIMD algorithm using a binary format and memory-mapped output files.
• Thread is the SIMD 2 algorithm on 2 processors.
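A dot plot compares two sequences window by window and marks each pair of positions whose windows are similar enough. The scalar sketch below shows that basic computation with a sparse list of hits, loosely in the spirit of the vector-based sparse output described for SIMD 1. It is not the DOTTER code or the SIMD implementation: it counts exact character matches as a simplification, and `dot_plot`, `window`, and `threshold` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sparse output: one (row, column) entry per window pair at or above threshold.
using Hits = std::vector<std::pair<std::size_t, std::size_t>>;

// Walks each diagonal once, updating the window's match count incrementally
// (drop the position leaving the window, add the one entering it).
Hits dot_plot(const std::string& a, const std::string& b,
              std::size_t window, std::size_t threshold) {
    assert(window > 0);
    Hits hits;
    if (a.size() < window || b.size() < window) return hits;
    const std::size_t rows = a.size() - window + 1;
    const std::size_t cols = b.size() - window + 1;
    // Scan the diagonal that starts at (i0, j0); i0 == 0 or j0 == 0.
    auto scan = [&](std::size_t i0, std::size_t j0) {
        std::size_t count = 0;
        for (std::size_t k = 0; k < window; ++k)
            count += (a[i0 + k] == b[j0 + k]) ? 1 : 0;
        for (std::size_t i = i0, j = j0; i < rows && j < cols; ++i, ++j) {
            if (i > i0) { // slide the window one step along the diagonal
                count -= (a[i - 1] == b[j - 1]) ? 1 : 0;
                count += (a[i + window - 1] == b[j + window - 1]) ? 1 : 0;
            }
            if (count >= threshold) hits.emplace_back(i, j);
        }
    };
    for (std::size_t i0 = 0; i0 < rows; ++i0) scan(i0, 0);
    for (std::size_t j0 = 1; j0 < cols; ++j0) scan(0, j0);
    return hits;
}
```

For example, `dot_plot("ACGT", "ACGT", 2, 2)` returns the three main-diagonal hits (0,0), (1,1), (2,2). Hits come out grouped by diagonal rather than in row-major order.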
                      Ideal Speedup   Real Speedup   Real Throughput (% of Ideal)
SIMD                           8.3x           9.7x                            75%
Thread                          15x          18.1x                            77%
Thread (large data)           13.3x          21.2x                            85%
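One reading of how the speedup table relates to the results table, offered as an assumption rather than the author's stated method: the ideal SIMD speedup is the Ideal SIMD 2 figure over the Ideal Base figure, and the throughput percentages are Local Touch results over the matching Ideal results.

```latex
\begin{aligned}
\text{ideal SIMD speedup} &= \tfrac{1163}{140} \approx 8.3 \\
\text{SIMD 2: Local Touch / Ideal} &= \tfrac{881}{1163} \approx 0.76 \\
\text{Thread: Local Touch / Ideal} &= \tfrac{1868}{2193} \approx 0.85
\end{aligned}
```

The first two values agree with the SIMD row (8.3x and 75%, if the table truncates rather than rounds), and the third agrees with the 85% reported for Thread (large data); the Thread row's 77% does not follow from the values shown.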