Graph Algorithms for Irregular, Unstructured Data
John Feo
Center for Adaptive Supercomputing Software
Pacific Northwest National Laboratory
July, 2010
Analytic methods and applications
Community thought leaders
Blog Analysis
Community Activities
Facebook - 300 M users
Connect-the-dots
Bus
Hayashi
Zaire
Train
Anthrax
Money
Endo
National Security
People, Places, & Actions
Semantic Web
Anomaly detection
Security
N-x contingency analysis
SmartGrid
Data analytics
Sample queries:
Allegiance switching: identify entities that switch communities
Community structure: identify the genesis and dissipation of communities
Phase change: identify significant change in the network structure
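As a minimal sketch of the allegiance-switching query, the snippet below compares two snapshots of community labels and reports the entities whose label changed. The function name, the sample entities, and the assumptions that each entity belongs to exactly one community and that labels keep their meaning between snapshots are all illustrative; none of them come from the slides.

```python
def allegiance_switches(before, after):
    """Return {entity: (old, new)} for entities present in both
    snapshots whose community label changed."""
    return {
        entity: (before[entity], after[entity])
        for entity in before.keys() & after.keys()
        if before[entity] != after[entity]
    }


# Hypothetical snapshots: entity -> community label.
snapshot_t0 = {"ann": "A", "bob": "A", "cat": "B", "dan": "B"}
snapshot_t1 = {"ann": "A", "bob": "B", "cat": "B", "eve": "A"}

switches = allegiance_switches(snapshot_t0, snapshot_t1)
print(switches)  # {'bob': ('A', 'B')}
```

Entities that appear in only one snapshot ("dan", "eve") are ignored here; a real query would also have to handle overlapping membership.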
Traditional graph partitioning often fails:
Topology: Interaction graph is low-diameter and has no good separators
Irregularity: Communities are not uniform in size
Overlap: individuals are members of one or more communities
1000x growth in 3 years!
Facebook has more than 300 million active users
Graphs are not grids
Graphs arising in informatics are very different from the grids used in scientific computing
Scientific Grids
Static or slowly evolving
Planar
Nearest neighbor communication
Work performed per cell or node
Work modifies local data
Graphs for Data Informatics
Dynamic
Non-planar
Communications are non-local and dynamic
Work performed by crawlers or autonomous agents
Work modifies data in many places
Small-world and scale-free
In low-diameter graphs
work explodes
difficult to partition
high percentage of nodes are visited
“Six degrees of separation”
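To make "work explodes" concrete, here is a small sketch that counts how many nodes a breadth-first search reaches within six hops. It compares a 32 x 32 grid with a 10-dimensional hypercube, both with 1024 nodes. The hypercube is only a stand-in for a low-diameter graph (it is not a small-world network), and all names are illustrative.

```python
from collections import deque


def reached_within(neighbors, source, max_hops):
    """Count nodes within max_hops of source using breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the hop limit
        for v in neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist)


SIDE = 32  # 32 x 32 grid: 1024 nodes, diameter 62
DIM = 10   # 10-dimensional hypercube: 1024 nodes, diameter 10


def grid_neighbors(u):
    r, c = divmod(u, SIDE)
    out = []
    if r > 0:
        out.append(u - SIDE)
    if r < SIDE - 1:
        out.append(u + SIDE)
    if c > 0:
        out.append(u - 1)
    if c < SIDE - 1:
        out.append(u + 1)
    return out


def hypercube_neighbors(u):
    # Flip each of the DIM bits of the node id in turn.
    return [u ^ (1 << b) for b in range(DIM)]


grid_reach = reached_within(grid_neighbors, 0, 6)
cube_reach = reached_within(hypercube_neighbors, 0, 6)
print(grid_reach, cube_reach)  # 28 848
```

Six hops from a grid corner touch 28 of 1024 nodes; six hops in the low-diameter graph touch 848, so a single traversal visits most of the graph.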
Large hubs are in grey
In scale-free graphs
difficult to partition
work concentrates in a few nodes
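The concentration of work in hubs can be illustrated with a toy preferential-attachment generator: each new node links to existing nodes with probability proportional to their current degree. This is a sketch under assumed parameters (2000 nodes, 2 edges per new node, fixed seed), not the model used for the figures in the talk.

```python
import random


def preferential_attachment_degrees(n, m, seed=0):
    """Grow a graph where each new node attaches to m distinct existing
    nodes chosen with probability proportional to degree.
    Returns the list of node degrees."""
    rng = random.Random(seed)
    degree = [0] * n
    endpoints = []  # each node appears once per incident edge

    def add_edge(u, v):
        degree[u] += 1
        degree[v] += 1
        endpoints.extend((u, v))

    # Seed graph: a clique on the first m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            add_edge(u, v)

    for u in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional pick
        for v in targets:
            add_edge(u, v)
    return degree


degree = preferential_attachment_degrees(2000, 2)
top = sorted(degree, reverse=True)[: len(degree) // 100]  # top 1% of nodes
top_share = sum(top) / sum(degree)
print(f"max degree {max(degree)}, mean {sum(degree) / len(degree):.1f}")
print(f"top 1% of nodes hold {top_share:.1%} of edge endpoints")
```

If per-node work is proportional to degree, the few hubs at the top of this list dominate the running time of whichever processor owns them.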
Graph methods
Paths: Shortest path, Betweenness, Min/max flow
Structures: Spanning trees, Connected components, Graph isomorphism
Groups: Matching/Coloring, Partitioning, Equivalence
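As one concrete instance of the structure methods listed, the sketch below finds connected components with a sequential union-find. It is a textbook approach chosen for brevity; the slides do not say which algorithm was used, and the example edge list is made up.

```python
def connected_components(num_nodes, edges):
    """Group nodes 0..num_nodes-1 into connected components (union-find)."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)  # smaller id becomes the root

    groups = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical 7-node graph; node 5 has a self-loop, node 6 is isolated.
edges = [(0, 1), (1, 2), (3, 4), (5, 5)]
components = connected_components(7, edges)
print(components)  # [[0, 1, 2], [3, 4], [5], [6]]
```

Note that every `find` is a chain of dependent single-word reads with no locality, which is the access pattern the rest of the talk is concerned with.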
Influential Factors
Degree distribution
Normal
Scale-free
Planar or non-planar
Static or dynamic
Weighted or unweighted
Weight distribution
Typed or untyped edges
Challenges
Load imbalance
Non-planar
Concurrent inserts and deletions
Difficult to partition
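The load-imbalance challenge can be sketched with a small scheduling simulation: a skewed list of task costs (a few "hub" tasks and many cheap ones) is run once with a fixed block assignment and once with workers pulling the next task whenever they go idle. The costs and worker count are made-up values for illustration.

```python
import heapq


def static_makespan(costs, workers):
    """Each worker gets one contiguous block of tasks, fixed in advance."""
    block = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + block]) for i in range(0, len(costs), block))


def dynamic_makespan(costs, workers):
    """Whichever worker becomes idle first takes the next task."""
    finish = [0] * workers
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)


# Hypothetical workload: four expensive hub tasks, eight cheap ones.
costs = [8, 8, 8, 8] + [1] * 8
static_time = static_makespan(costs, 4)
dynamic_time = dynamic_makespan(costs, 4)
print(static_time, dynamic_time)  # 24 10
```

With the fixed assignment one worker receives three hub tasks and finishes at time 24; with dynamic assignment all four workers finish at time 10, which is the ideal 40 / 4.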
Problem size
Ton of bytes, not ton of flops
Little data locality
Have only parallelism to tolerate latencies
Low computation to communication ratio
Single word access
Threads limited by loads and stores
Frequent synchronization
Node, edge, record
Work tends to be dynamic and imbalanced
Let any processor execute any thread
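How much parallelism is needed to tolerate latency can be estimated with Little's law: the number of memory requests that must be in flight equals the latency multiplied by the issue rate. The sketch below applies it with made-up numbers (a 600-cycle memory latency and one request issued per cycle); these are assumptions for illustration, not measurements of any machine.

```python
import math


def required_concurrency(latency_cycles, requests_per_cycle):
    """Little's law: requests in flight = latency x issue rate."""
    return latency_cycles * requests_per_cycle


def threads_needed(latency_cycles, requests_per_cycle, outstanding_per_thread):
    """Threads required if each thread can keep a fixed number of
    loads or stores outstanding at once."""
    in_flight = required_concurrency(latency_cycles, requests_per_cycle)
    return math.ceil(in_flight / outstanding_per_thread)


# Hypothetical values, chosen only to show the arithmetic.
LATENCY_CYCLES = 600
REQUESTS_PER_CYCLE = 1

one_per_thread = threads_needed(LATENCY_CYCLES, REQUESTS_PER_CYCLE, 1)
eight_per_thread = threads_needed(LATENCY_CYCLES, REQUESTS_PER_CYCLE, 8)
print(one_per_thread, eight_per_thread)  # 600 75
```

When each thread can have only one access outstanding, the thread count alone must cover the latency; allowing several outstanding accesses per thread reduces the number of threads required proportionally.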
Grids, Uniform, and Scale-Free Graphs
USA Roadmap
Uniform
Scale-Free
METIS Partitioner
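Partitioners such as METIS try to minimize the edge cut, the number of edges whose endpoints land in different parts. The sketch below only measures the edge cut of a hand-picked bisection on two toy graphs (it does not call METIS): an 8 x 8 grid and a 64-node star, used here as an extreme stand-in for a hub-dominated graph.

```python
def edge_cut(edges, part):
    """Number of edges whose endpoints are in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


SIDE = 8

# 8 x 8 grid, split into left and right halves.
grid_edges = []
for r in range(SIDE):
    for c in range(SIDE):
        u = r * SIDE + c
        if c + 1 < SIDE:
            grid_edges.append((u, u + 1))
        if r + 1 < SIDE:
            grid_edges.append((u, u + SIDE))
grid_part = {
    r * SIDE + c: (0 if c < SIDE // 2 else 1)
    for r in range(SIDE)
    for c in range(SIDE)
}

# Star: node 0 is the hub, nodes 1..63 are leaves; split 32 / 32.
star_edges = [(0, leaf) for leaf in range(1, 64)]
star_part = {v: (0 if v < 32 else 1) for v in range(64)}

grid_cut = edge_cut(grid_edges, grid_part)
star_cut = edge_cut(star_edges, star_part)
print(f"grid: {grid_cut} of {len(grid_edges)} edges cut")  # 8 of 112
print(f"star: {star_cut} of {len(star_edges)} edges cut")  # 32 of 63
```

The grid has a separator that cuts about 7% of its edges. In the star, every balanced bisection leaves 32 leaves on the side without the hub, so half of the edges are cut no matter how the partition is chosen.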
System requirements
Global shared memory
No simple data partitions
Local storage for thread private data
Network support for single word accesses
Transfer multiple words when locality exists
Multi-threaded processors
Hide latency with parallelism
Single cycle context switching
Multiple outstanding loads and stores per thread
Full-and-empty bits
Efficient synchronization
Wait in memory
Message driven operations
Dynamic work queues
Hardware support for thread migration
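The full-and-empty bit requirement can be pictured with a software model: a memory word carries a flag, a write waits until the word is empty and then marks it full, and a read waits until the word is full and then marks it empty. The class below is a hypothetical Python emulation built on a condition variable; the class and method names are invented for this sketch and are not the hardware interface, which performs the wait in memory without a lock.

```python
import threading


class FullEmptyCell:
    """Hypothetical software model of a word with a full/empty bit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write_when_empty(self, value):
        """Block until the cell is empty, store the value, mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_and_empty(self):
        """Block until the cell is full, take the value, mark it empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


cell = FullEmptyCell()
received = []


def producer():
    for i in range(5):
        cell.write_when_empty(i)


def consumer():
    for _ in range(5):
        received.append(cell.read_and_empty())


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # [0, 1, 2, 3, 4]
```

The producer and consumer never share a lock explicitly; the state of the word itself orders them, which is what makes synchronization at the granularity of a node, edge, or record affordable.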
Cray XMT
Center for Adaptive Supercomputing Software
Driving Development of Next-Generation Massively Multithreaded Architectures
Sponsored by DOD
Summary
The new HPC is irregular and sparse
There are commercial and consumer applications
If the applications are important enough, machines will be built
HPC is too large and too diverse for “one size fits all”
We need to build the right machines for the problems we have to solve