ACCELERATING REAL-WORLD
APPLICATIONS
David A. Bader
Georgia Institute of Technology
Professor, School of Computational Science and Engineering
3 | Accelerating Real-world Applications | June 14, 2011
EXASCALE STREAMING DATA ANALYTICS:
REAL-WORLD CHALLENGES
All involve analyzing massive streaming complex networks:
– Health care: disease spread; detection and prevention of epidemics/pandemics (e.g., SARS, avian flu, H1N1 "swine" flu)
– Massive social networks: understanding communities, intentions, population dynamics, pandemic spread, transportation and evacuation
– Intelligence: business analytics, anomaly detection, security, knowledge discovery from massive data sets
– Systems biology: understanding complex life systems, drug design, microbial research, unraveling the mysteries of the HIV virus; understanding life and disease
– Electric power grid: communication, transportation, energy, water, and food supply
– Modeling and simulation: full-scale economic-social-political simulations
[Figure: Facebook active users, in millions, Dec 2004 through Dec 2009]
Exponential growth: more than 600 million active users
Sample queries:
– Allegiance switching: identify entities that switch communities.
– Community structure: identify the genesis and dissipation of communities.
– Phase change: identify significant change in the network structure.
REQUIRES PREDICTING / INFLUENCING CHANGE IN REAL TIME AT SCALE
Ex: discovered minimal changes in an O(billions)-size complex network that could hide or reveal top influencers in the community
EXAMPLE: MINING TWITTER FOR SOCIAL GOOD
ICPP 2010
Image credit: bioethicsinstitute.org
MASSIVE DATA ANALYTICS: PROTECTING OUR NATION
Public health
– CDC / nation-scale surveillance of public health
Cancer genomics and drug design
– Computed betweenness centrality of the human proteome
[Figure: Human genome core protein interactions, degree vs. betweenness centrality (log-log scatter); ENSG00000145332.2, Kelch-like protein 8, implicated in breast cancer]
US high-voltage transmission grid (>150,000 miles of line)
GRAPHS ARE PERVASIVE IN LARGE-SCALE DATA ANALYSIS
Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.
New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.
Astrophysics
– Problem: outlier detection
– Challenges: massive datasets, temporal variations
– Graph problems: clustering, matching
Bioinformatics
– Problem: identifying drug target proteins
– Challenges: data heterogeneity, quality
– Graph problems: centrality, clustering
Social informatics
– Problem: discover emergent communities, model spread of information
– Challenges: new analytics routines, uncertainty in data
– Graph problems: clustering, shortest paths, flows
Image sources: (1) http://physics.nmt.edu/images/astro/hst_starfield.jpg; (2, 3) www.visualComplexity.com
NETWORK ANALYSIS FOR INTELLIGENCE AND SURVEILLANCE
[Krebs '04] Post-9/11 terrorist network analysis from public-domain information
Plot masterminds correctly identified from interaction patterns: centrality
A global view of entities is often more
insightful
Detect anomalous activities by
exact/approximate graph matching
Image Source: http://www.orgnet.com/hijackers.html
Image source: T. Coffman, S. Greenblatt, S. Marcus, "Graph-based technologies for intelligence analysis," CACM, 47(3), March 2004, pp. 45-47
LOOKING AHEAD: GRAND CHALLENGE IN GRAPH ANALYTICS
Driving real-world applications are not just traditional HPC:
– health care, transportation, energy, proteomics, security, data sciences, informatics, …
FLOPS are free; data is the challenge!
Fundamental abstractions are more irregular and complex than sparse/dense matrices
– Very sparse, very high-dimensional data
– For example, there are few (if any) efficient distributed-memory parallel implementations of even the simplest algorithms for sparse, arbitrary graphs!
We must integrate across algorithms, programming models, and architectures to address the growing requirements of grand challenge applications.
INFORMATICS GRAPHS ARE EVEN TOUGHER
Very different from graphs in scientific computing!
– Graphs can be enormous
– Power-law distribution of the number of neighbors
– Small-world property: no long paths
– Very limited locality; not partitionable
– Highly unstructured
– Edges and vertices have types
Experience in scientific computing applications provides only limited insight.
[Figure: "Six degrees of Kevin Bacon" network. Source: Seokhee Hong]
OPEN QUESTIONS: ALGORITHMIC KERNELS FOR
SPATIO-TEMPORAL INTERACTION NETWORKS AND GRAPHS
(STING)
Traditional graph theory:
– Graph traversal (e.g. breadth-first search)
– S-T connectivity
– Single-source shortest paths
– All-pairs shortest paths
– Spanning Tree
– Connected Components
– Biconnected Components
– Subgraph isomorphism (pattern matching)
– …
HIERARCHY OF INTERESTING GRAPH ANALYTICS
Extend single-shot graph queries to include time.
– Are there s-t paths between time T1 and T2?
– What are the important vertices at time T?
Use persistent queries to monitor properties.
– Does the path between s and t shorten drastically?
– Is some vertex suddenly very central?
Extend persistent queries to fully dynamic properties.
– Does a small community stay independent rather than merge with larger groups?
– When does a vertex jump between communities?
New types of queries, new challenges...
GRAPH ANALYTICS FOR SOCIAL NETWORKS
Are there new graph techniques? Do they parallelize? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy, massive, streaming data?
• Communities may overlap, exhibit different properties and
sizes, and be driven by different models
– Detect communities (static or emerging)
– Identify important individuals
– Detect anomalous behavior
– Given a community, find a representative member of the
community
– Given a set of individuals, find the best community that
includes them
CENTRALITY IN MASSIVE SOCIAL NETWORK ANALYSIS
Centrality metrics: quantitative measures that capture the importance of a person in a social network
– Betweenness is a global index related to the shortest paths that pass through the person
– Can be used for community detection as well
Identifying central nodes in large complex networks is the key metric in a number of
applications:
– Biological networks, protein-protein interactions
– Sexual networks and AIDS
– Identifying key actors in terrorist networks
– Organizational behavior
– Supply chain management
– Transportation networks
Current Social Network Analysis (SNA) packages handle 1,000s of entities; our techniques handle BILLIONS (6+ orders of magnitude larger data sets)
BETWEENNESS CENTRALITY (BC)
Key metric in social network analysis [Freeman '77, Goh '02, Newman '03, Brandes '03]
\sigma_{st}: number of shortest paths between vertices s and t
\sigma_{st}(v): number of shortest paths between vertices s and t passing through v

BC(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}

Exact BC is compute-intensive.
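The definition above can be checked numerically. The sketch below (illustrative Python, not from the talk; graph and names are mine) uses BFS to count shortest paths on a small diamond graph and evaluates one term \sigma_{st}(v)/\sigma_{st} of the sum:

```python
from collections import deque

def shortest_path_counts(adj, s):
    """BFS from s returning (dist, sigma): sigma[t] = number of shortest s-t paths."""
    dist, sigma = {s: 0}, {v: 0 for v in adj}
    sigma[s] = 1
    q = deque([s])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[v] + 1
                q.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                sigma[w] += sigma[v]
    return dist, sigma

# Diamond graph: two equally short 0 -> 3 routes, one via 1 and one via 2.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
dist_s, sigma_s = shortest_path_counts(adj, 0)   # counts from s = 0
dist_v, sigma_v = shortest_path_counts(adj, 1)   # counts from v = 1
# sigma_st(v) = sigma_sv * sigma_vt when d(s,v) + d(v,t) = d(s,t)
through_v = sigma_s[1] * sigma_v[3]
pair_contribution = through_v / sigma_s[3]       # the term sigma_st(v)/sigma_st
```

Here vertex 1 carries one of the two shortest 0-3 paths, so the (s, t) = (0, 3) pair contributes 1/2 to BC(1).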
[Figure: Betweenness centrality score vs. vertex ID (log scale, 1e-1 to 1e+7): exact (scatter data) vs. approximate (smooth)]
BETWEENNESS CENTRALITY ALGORITHMS
Brandes [2003] proposed a faster sequential algorithm for BC on sparse graphs
– O(mn + n^2 \log n) time and O(n + m) space for weighted graphs
– O(mn) time for unweighted graphs
We designed and implemented the first parallel algorithm:
– [Bader, Madduri; ICPP 2006]
Approximating betweenness centrality [Bader, Kintali, Madduri, Mihail 2007]
– Novel approximation algorithm for determining the betweenness of a specific vertex or edge in a graph
– Adaptive in the number of samples
– Empirical result: at least 20x speedup over exact BC
Graph: 4K vertices and 32K edges; system: Sun Fire T2000 (Niagara 1)
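The sampling idea behind the approximation can be sketched as follows (illustrative Python; the function names and the fixed sample count are my assumptions — the 2007 algorithm additionally chooses the number of samples adaptively, stopping once the running dependency sum for the vertex of interest exceeds a threshold proportional to n):

```python
import random
from collections import deque

def source_dependency(adj, s):
    """delta_s[v]: dependency of source s on every vertex v (one Brandes pass)."""
    dist = {s: 0}; sigma = {v: 0 for v in adj}; sigma[s] = 1
    pred = {v: [] for v in adj}
    order, q = [], deque([s])
    while q:                                   # BFS: distances, counts, predecessors
        v = q.popleft(); order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1; q.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]; pred[w].append(v)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):                  # back-propagate dependencies
        for v in pred[w]:
            delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
    delta[s] = 0.0                             # a source contributes nothing to itself
    return delta

def approx_bc(adj, v_star, n_samples, seed=0):
    """Unbiased estimate of BC(v_star) from uniformly sampled source vertices."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = rng.choice(sorted(adj))
        total += source_dependency(adj, s)[v_star]
    return len(adj) * total / n_samples        # n * (sample mean of delta_s(v_star))
```

Since BC(v*) = \sum_s \delta_s(v^*), scaling the sample mean by n gives an unbiased estimator, and far fewer than n sources usually suffice for high-centrality vertices.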
RELATED WORK: PARTITIONING ALGORITHMS
FROM SCIENTIFIC COMPUTING
Theoretical and empirical evidence: existing techniques perform poorly on small-world networks
– [Mihail, Papadimitriou '02] Spectral properties of power-law graphs are skewed in favor of high-degree vertices
– [Lang '04] On using spectral techniques, "cut quality varies inversely with cut balance" in social graphs: Yahoo! IM graph, DBLP collaborations
– [Abou-Rjeili, Karypis '06] Multilevel partitioning heuristics give large edge cuts for small-world networks; new coarsening schemes are necessary
BRANDES' ALGORITHM
Brandes showed that the dependency of a source vertex s on a vertex v obeys the following recursion:

\delta_s(v) = \sum_{w : v \in \mathrm{pred}_s(w)} \frac{\sigma_{sv}}{\sigma_{sw}} \left(1 + \delta_s(w)\right)

BC is computed in two stages:
1. Iterate over all vertices to compute the distance and shortest-path counts from s to each vertex.
2. Revisit vertices starting from the farthest vertex and accumulate dependencies to compute BC(v) = \sum_{s \neq v} \delta_s(v).
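Put together, the two stages amount to the following (a compact Python sketch of Brandes' unweighted algorithm, assuming an adjacency-dict graph; variable names are mine):

```python
from collections import deque

def brandes_bc(adj):
    """Exact betweenness centrality via Brandes' two-stage algorithm (unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Stage 1: BFS computes distances, shortest-path counts, predecessor lists.
        dist = {s: 0}; sigma = {v: 0 for v in adj}; sigma[s] = 1
        pred = {v: [] for v in adj}
        order, q = [], deque([s])
        while q:
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1; q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]; pred[w].append(v)
        # Stage 2: revisit farthest-first, applying the dependency recursion
        # delta[v] += sigma[v]/sigma[w] * (1 + delta[w]) for each successor w of v.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On a three-vertex path the middle vertex mediates both ordered endpoint pairs, so its score is 2; on the diamond graph each of the two middle vertices carries half of each endpoint pair, giving 1.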
BETWEENNESS CENTRALITY USING GPGPU (JOINT WORK WITH PUSHKAR PANDE)
SEQUENTIAL IMPLEMENTATION
1.  for each v ∈ V do
2.      BC[v] = 0;
3.  endfor
4.  for each v ∈ V do
5.      I. Initialization
6.      succCount[t] = 0, sigma[t] = 0,
7.          and d[t] = -1, for all t ∈ V
8.      level = 0; S[level] = v; sigma[v] = 1;
9.      count = 1;
10.     II. Shortest paths
11.     while count > 0 do
12.         count = 0;
13.         for each neighbor w of each v ∈ S[level] do
14.             if d[w] = -1 then
15.                 p = count++;
16.                 Insert w at pos p in S[level+1];
17.                 d[w] = level + 1;
18.             if d[w] = level + 1 then
19.                 p = succCount[v]++;
20.                 Insert w at pos p of succList[v];
21.                 sigma[w] += sigma[v];
22.         endfor
23.     endwhile
24.     III. Dependency accumulation by back propagation
25.     level--;
26.     while level > 0 do
27.         for each w ∈ S[level] do
28.             del[w] = 0;
29.             for each v ∈ succList[w] do
30.                 del[w] += sigma[w] / sigma[v] * (1 + del[v]);
31.             endfor
32.             BC[w] += del[w];
33.         endfor
34.         level--;
35.     endwhile
36. endfor
K. Madduri, D. Ediger, K. Jiang, D.A. Bader, and D.G. Chavarría-Miranda, “A Faster Parallel Algorithm and Efficient
Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets,” Third Workshop on Multithreaded
Architectures and Applications (MTAAP), Rome, Italy, May 29, 2009.
PARALLELIZATION STRATEGIES
Parallelism can be exploited at three levels of granularity:
1. Coarse-grained: run multiple breadth-first traversals in parallel
2. Medium-grained: process different vertices of the same level in parallel
3. Fine-grained: process the neighbors of the same vertex in parallel
MAPPING BC ALGORITHM TO GPGPU
Because the programming model (CUDA/OpenCL) provides a hierarchy of threads grouped into grids and blocks, it is promising to exploit medium-grained parallelism at the grid level and fine-grained parallelism at the block level.
Coarse-grained parallelism can also be exploited, but it has the following limitations:
– Memory constraints limit the amount of coarse-grained parallelism that can be exploited: more memory is required, since more copies of the data structures are needed.
– The centrality running sum has to be updated atomically.
OUTLINE OF THE HOST CODE
1.  BC[v] = 0, for all v ∈ V
2.  for each v ∈ V do
3.      I. Initialization
4.      level = 0; S[level] = empty stack;
5.      cuda_initialize (succCount[], sigma[], d[], del[]);  // kernel invocation
6.      Enqueue v in S[level]; count = 1;
7.      II. Shortest paths (augmented breadth-first traversal)
8.      while count > 0 do
9.          count = 0;
10.         for all v ∈ S[level] in parallel do
11.             cuda_traverse (v, S[], level, succList[], succCount[], sigma[], d[], &count);  // kernel invocation
12.         level++;
14.     III. Dependency accumulation by back propagation
15.     level--;
16.     chunkSize = numThreadsPerBlock;
17.     while level > 0 do
18.         for all w1…wchunkSize ∈ S[level] in parallel do
19.             cuda_accumulate (w1…wchunkSize, S[], level, succList[], succCount[], sigma[], del[]);  // kernel invocation
20.         level--;
INITIALIZATION KERNEL
cuda_initialize (succCount[], sigma[], d[], del[], initBC)
1. global (d[], sigma[], del[], succCount[])
2. idx = blockIdx.x * chunkSize + threadIdx.x;  // chunkSize = blockDim.x
3. if (idx < n) then  // n = |V|
4.     d[idx] = -1;
5.     sigma[idx] = 0;
6.     del[idx] = 0;
7.     succCount[idx] = 0;
[Figure: thread blocks 0, 1, 2, …, b each initialize a chunkSize = blockDim.x slice of d[], sigma[], del[], succCount[]]
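The idx computation on line 2 is the standard CUDA global-index pattern. A small Python model (my own illustration, not part of the talk) shows that a ceiling-division grid size plus the `idx < n` guard covers each of the n array slots exactly once:

```python
def launch_config(n, block_dim=256):
    """Blocks needed so that idx = blockIdx.x * blockDim.x + threadIdx.x reaches 0..n-1."""
    return (n + block_dim - 1) // block_dim          # ceiling division

def covered_indices(n, block_dim=256):
    """Simulate which global indices the guarded kernel would touch."""
    grid_dim = launch_config(n, block_dim)
    return [b * block_dim + t
            for b in range(grid_dim)
            for t in range(block_dim)
            if b * block_dim + t < n]                # the `if (idx < n)` guard
```

The guard matters because the last block is generally only partially full; without it, up to block_dim - 1 threads would write past the end of the arrays.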
GRAPH TRAVERSAL KERNEL
cuda_traverse (v, S[], level, succList[], succCount[], sigma[], d[], &count)
1.  shared (v, nw, myS[], mySuccList[], myCount, mySuccCount)
2.  global (G(V, E), sigma[], d[], S[], succList[], count, succCount[])
3.  v = S[ beg + blockIdx.x ];
4.  w = (thid < nw) ? G.Ev[thid] : -1;
5.  if (w != -1 and v != w) then
6.      dw = atomicCAS (&d[w], -1, level + 1);
7.      if (dw == -1) then
8.          p = atomicAdd (&myCount, 1);
9.          myS[p] = w;
10.         dw = level + 1;
11.     if (dw == level + 1) then
12.         p = atomicAdd (&mySuccCount, 1);
13.         mySuccList[p] = w;
14.         atomicAdd (&sigma[w], sigma[v]);
15. copy_to_global (S, &count, myS, myCount);
16. copy_to_global (succList, &succCount[v], mySuccList, mySuccCount);
[Figure: each thread block 1…b takes one frontier vertex v from S[level] and expands its neighbors w1, w2, … in parallel]
DEPENDENCY ACCUMULATION KERNEL
cuda_accumulate (w1…wchunkSize, S[], level, succList[], succCount[], sigma[], del[])
1.  global (S[], succList[], succCount[], del[], sigma[], BC[])
2.  offset = beg + blockIdx.x * chunkSize + threadIdx.x;
3.  w1…wchunkSize = (offset < end) ? S[offset] : -1;
4.  if (w != -1) then  // w ∈ w1…wchunkSize
5.      myDelw = 0.0;
6.      nSuccw = succCount[w];
7.      mySuccList = succList + |G.Ew|;
8.      for j = 0 to nSuccw do
9.          v = mySuccList[j];
10.         myDelw += sigma[w] / sigma[v] * (1 + del[v]);
11.     del[w] = myDelw;
12.     BC[w] += myDelw;
[Figure: thread blocks 0, 1, 2, …, b each process a chunkSize = blockDim.x slice of S[level]]
TEST GRAPH INSTANCES

Graph    #Vertices  #Edges      Avg. degree  Max. degree  #Connected components
syn1.gr  262,144    2,097,152   8            4,588        53,888
syn2.gr  524,288    4,194,304   8            4,063        115,376
syn3.gr  1,048,576  8,388,608   8            5,421        247,339
syn4.gr  1,000,000  10,000,000  10           10,030       536,229
syn5.gr  1,000,000  10,000,000  10           1,090        475,952
[Figure: degree distributions of syn4.gr and syn5.gr — frequency vs. out-degree on log-log axes, showing power-law tails]
• Unweighted, directed graphs were generated using the R-MAT graph generator in SNAP [2].
• The R-MAT [1] method generates graphs that match properties of real-world graphs, such as small diameter and a power-law degree distribution.
1. D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In SDM, 2004.
2. D.A. Bader and K. Madduri, “SNAP, Small-world Network Analysis and Partitioning: an open-source parallel graph framework for the exploration of large-scale networks,” 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Miami, FL, April 14-18, 2008.
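For reference, the R-MAT recursion can be sketched as follows (illustrative Python, not the SNAP implementation; the default quadrant probabilities a, b, c are commonly used values, assumed here). Each edge descends the recursively partitioned adjacency matrix one quadrant per bit level, and the skew toward one quadrant produces the heavy-tailed degree distribution shown above:

```python
import random

def rmat_edges(scale, n_edges, a=0.57, b=0.19, c=0.19, seed=0):
    """Sample directed edges over 2**scale vertices from the R-MAT distribution
    (Chakrabarti et al. 2004); d = 1 - a - b - c is the remaining quadrant."""
    rng = random.Random(seed)
    edges = []
    for _ in range(n_edges):
        u = v = 0
        for _ in range(scale):              # one quadrant choice per bit
            r = rng.random()
            u <<= 1
            v <<= 1
            if r < a:
                pass                        # top-left quadrant: both bits 0
            elif r < a + b:
                v |= 1                      # top-right: destination bit 1
            elif r < a + b + c:
                u |= 1                      # bottom-left: source bit 1
            else:
                u |= 1; v |= 1              # bottom-right: both bits 1
        edges.append((u, v))
    return edges
```

With scale = 20 and 10 M samples this mirrors the syn4/syn5-style instances in spirit, though the exact parameters used in the talk are not stated on the slide.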
PRELIMINARY PERFORMANCE RESULTS
Graph    #Vertices  #Edges      CPU time (s)  GPU time (s)  Speedup
syn1.gr  262,144    2,097,152   26.0          20.7          1.26
syn2.gr  524,288    4,194,304   89.9          49.9          1.80
syn3.gr  1,048,576  8,388,608   193.2         87.1          2.22
syn4.gr  1,000,000  10,000,000  92.2          44.0          2.09
syn5.gr  1,000,000  10,000,000  252.2         79.7          3.16

• The computation was carried out for a subset of all-pairs shortest paths by using 500 source vertices.
CPU: Intel(R) Core(TM)2 CPU 6400
• Clock rate: 2.13 GHz
• Cache: 2 MB
• Memory: 2 GB
GPU: Tesla C1060 with 30 MPs × 8 cores/MP
• Clock rate: 1.3 GHz
• Shared memory per block: 16 KB
• Memory: 4 GB
Runtime on CPU and GPU in seconds (with global-memory stack and successor lists)
PERFORMANCE RESULTS
Graph    #Vertices  #Edges      CPU time (s)  GPU global (s)  GPU shared (s)  Speedup
syn1.gr  262,144    2,097,152   26.0          20.7            10.3            2.52
syn2.gr  524,288    4,194,304   89.9          49.9            23.4            3.84
syn3.gr  1,048,576  8,388,608   193.2         87.1            40.2            4.80
syn4.gr  1,000,000  10,000,000  92.2          44.0            22.3            4.14
syn5.gr  1,000,000  10,000,000  252.2         79.7            24.7            10.22

Using shared memory for the stack and successor lists improves performance by more than 2x.
PERFORMANCE BOTTLENECK
The implementation exploits parallelism at the block level by distributing vertices at
each level of traversal to different blocks and distributing vertices to threads for
dependency accumulation.
Major performance bottlenecks:
– Load imbalance due to the power-law degree distribution causes different thread blocks in a kernel to execute at different speeds. Distributing the edges of the frontier to be examined, instead of the vertices, can improve load balance.
– Irregular memory access makes it difficult to realize coalesced memory access, which constrains effective memory bandwidth.
– Atomic operations on global memory are slow, but they are necessary for exploiting fine-grained parallelism in graph algorithms.
PERFORMANCE SUMMARY (WORK IN PROGRESS)
FURTHER IMPROVEMENTS: LOAD BALANCE
Frontier pre-processing
• Pre-process the frontier for balanced load
• Split adjacencies into warpSize chunks
• Observed best performance with half-warp-sized chunks
• Each instruction is fetched and executed in parallel over 4 cycles for 32 data elements (a warp)
• Global memory requests for a warp are split into two, one per half warp
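The pre-processing step can be sketched as follows (illustrative Python; the function name is mine, and the chunk size of 16 models a half warp on this hardware). Instead of assigning one vertex per thread group, the frontier is flattened into fixed-size pieces of adjacency lists, so a million-degree hub and a degree-3 vertex no longer create wildly uneven work units:

```python
def chunk_frontier(adj, frontier, chunk=16):
    """Split each frontier vertex's adjacency list into fixed-size chunks so
    every chunk is a comparable unit of work for one group of threads."""
    work = []
    for v in frontier:
        nbrs = adj[v]
        for i in range(0, len(nbrs), chunk):
            work.append((v, nbrs[i:i + chunk]))   # (owner vertex, slice of edges)
    return work
```

For example, a frontier vertex with 40 neighbors becomes three work items of sizes 16, 16, and 8, each small enough for one half warp to process in lockstep.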
FURTHER IMPROVEMENT:
DEPENDENCY ACCUMULATION PER WARP
CONCLUSION AND FUTURE WORK
GPGPU is a promising accelerator for real-world applications
Data-intensive irregular applications are suitable candidates for parallelization
– Use of shared memory improved performance by as much as 3x.
Demonstrating high-performance betweenness centrality, a key kernel in the DARPA HPCS SSCA2 benchmark, highlights the advantage of a GPU-accelerated system for an important application.
COLLABORATORS AND ACKNOWLEDGMENTS
Jason Riedy, Research Scientist, (Georgia Tech)
Graduate Students (Georgia Tech):
– Seunghwa Kang (PNNL)
– David Ediger
– Karl Jiang
– Pushkar Pande
Bader PhD graduates:
– Kamesh Madduri (Lawrence Berkeley National Lab, now Penn State)
– Guojing Cong (IBM TJ Watson Research Center)
John Feo and Daniel Chavarría-Miranda (Pacific Northwest Nat'l Lab)
RELATED RECENT PUBLICATIONS (2008-2010)
D.A. Bader and K. Madduri, “SNAP, Small-world Network Analysis and Partitioning: an open-source parallel graph framework for the
exploration of large-scale networks,” 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Miami, FL, April 14-18,
2008.
S. Kang, D.A. Bader, “An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest of Sparse Graphs,” 14th
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Raleigh, NC, February 2009.
Karl Jiang, David Ediger, and David A. Bader. “Generalizing k-Betweenness Centrality Using Short Paths and a Parallel Multithreaded
Implementation.” The 38th International Conference on Parallel Processing (ICPP), Vienna, Austria, September 2009.
Kamesh Madduri, David Ediger, Karl Jiang, David A. Bader, Daniel Chavarría-Miranda. “A Faster Parallel Algorithm and Efficient
Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets.” 3rd Workshop on Multithreaded
Architectures and Applications (MTAAP), Rome, Italy, May 2009.
David A. Bader, et al. “STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation.” 2009.
David Ediger, Karl Jiang, E. Jason Riedy, and David A. Bader. "Massive Streaming Data Analytics: A Case Study with Clustering Coefficients," Fourth Workshop on Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 2010.
Seunghwa Kang, David A. Bader. "Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System," Fourth Workshop on Multithreaded Architectures and Applications (MTAAP), Atlanta, GA, April 2010.
David Ediger, Karl Jiang, Jason Riedy, David A. Bader, Courtney Corley, Rob Farber and William N. Reynolds. “Massive Social Network
Analysis: Mining Twitter for Social Good,” The 39th International Conference on Parallel Processing (ICPP 2010), San Diego, CA, September
2010.
Virat Agarwal, Fabrizio Petrini, Davide Pasetto and David A. Bader. “Scalable Graph Exploration on Multicore Processors,” The 22nd IEEE
and ACM Supercomputing Conference (SC10), New Orleans, LA, November 2010.
RELATED RECENT PUBLICATIONS (2005-2007)
D.A. Bader, G. Cong, and J. Feo, "On the Architectural Requirements for Efficient Execution of Graph Algorithms," The 34th International Conference
on Parallel Processing (ICPP 2005), pp. 547-556, Georg Sverdrups House, University of Oslo, Norway, June 14-17, 2005.
D.A. Bader and K. Madduri, “Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors,” The 12th
International Conference on High Performance Computing (HiPC 2005), D.A. Bader et al., (eds.), Springer-Verlag LNCS 3769, 465-476, Goa, India, December
2005.
D.A. Bader and K. Madduri, “Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2,” The 35th
International Conference on Parallel Processing (ICPP 2006), Columbus, OH, August 14-18, 2006.
D.A. Bader and K. Madduri, “Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks,” The 35th International Conference on Parallel
Processing (ICPP 2006), Columbus, OH, August 14-18, 2006.
K. Madduri, D.A. Bader, J.W. Berry, and J.R. Crobak, “Parallel Shortest Path Algorithms for Solving Large-Scale Instances,” 9th DIMACS Implementation
Challenge -- The Shortest Path Problem, DIMACS Center, Rutgers University, Piscataway, NJ, November 13-14, 2006.
K. Madduri, D.A. Bader, J.W. Berry, and J.R. Crobak, “An Experimental Study of A Parallel Shortest Path Algorithm for Solving Large-Scale Graph
Instances,” Workshop on Algorithm Engineering and Experiments (ALENEX), New Orleans, LA, January 6, 2007.
J.R. Crobak, J.W. Berry, K. Madduri, and D.A. Bader, “Advanced Shortest Path Algorithms on a Massively-Multithreaded Architecture,” First Workshop
on Multithreaded Architectures and Applications (MTAAP), Long Beach, CA, March 30, 2007.
D.A. Bader and K. Madduri, “High-Performance Combinatorial Techniques for Analyzing Massive Dynamic Interaction Networks,” DIMACS Workshop
on Computational Methods for Dynamic Interaction Networks, DIMACS Center, Rutgers University, Piscataway, NJ, September 24-25, 2007.
D.A. Bader, S. Kintali, K. Madduri, and M. Mihail, "Approximating Betweenness Centrality," The 5th Workshop on Algorithms and Models for the Web-Graph (WAW2007), San Diego, CA, December 11-12, 2007.
David A. Bader, Kamesh Madduri, Guojing Cong, and John Feo, “Design of Multithreaded Algorithms for Combinatorial Problems,” in S. Rajasekaran and
J. Reif, editors, Handbook of Parallel Computing: Models, Algorithms, and Applications, CRC Press, Chapter 31, 2007.
Kamesh Madduri, David A. Bader, Jonathan W. Berry, Joseph R. Crobak, and Bruce A. Hendrickson, “Multithreaded Algorithms for Processing Massive
Graphs,” in D.A. Bader, editor, Petascale Computing: Algorithms and Applications, Chapman & Hall / CRC Press, Chapter 12, 2007.
ACKNOWLEDGMENT OF SUPPORT
Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions
and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases,
product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no
obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to
make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO
RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS
INFORMATION.
ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY
DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT,
SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED
HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in
this presentation are for informational purposes only and may be trademarks of their respective owners.
The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and
opinions presented in this presentation may not represent AMD's positions, strategies or opinions. Unless explicitly stated, AMD
is not responsible for the content herein and no endorsements are implied.