Dept. of Computer Science, Rutgers
Node and Graph Similarity: Theory and Applications
Danai Koutra (CMU), Tina Eliassi-Rad (Rutgers), Christos Faloutsos (CMU)
ICDM 2014, Monday, December 15th, 2014, Shenzhen, China
Copyright for the tutorial materials is held by the authors. The authors grant IEEE ICDM permission to distribute the materials through its website.
D. Koutra & T. Eliassi-Rad & C. Faloutsos, ICDM’14 Tutorial
Part 2a
Graph Similarity: known node correspondence
What to remember
• Numerous applications:
  – Network monitoring, anomaly detection, network intrusion, behavioral studies
• Although it seems an easy problem, it is not!
  – Some measures are counter-intuitive.
  – DeltaCon [Koutra+, SDM’13] (based on node proximity) satisfies several intuitive properties.
• There are multiple measures, but which one to use?
  – Depends on the application!
  – Good news, according to the guide of [Soundarajan+, SDM’14]!
Roadmap
• Known node correspondence
  – Simple features
  – Complex features
  – Visualization
  – Summary
• Unknown node correspondence
Problem Definition: Graph Similarity
• Given: (a) two graphs, GA and GB, with the same nodes and different edge sets; (b) the node correspondence
• Find: a similarity score s ∈ [0,1]
  – s = 0: GA <> GB
  – s = 1: GA == GB
Applications
1. Classification (e.g., different brain wiring?)
2. Discontinuity detection (a graph per day: Day 1, Day 2, Day 3, Day 4, Day 5)
3. Behavioral patterns (e.g., FB message graph vs. wall-to-wall network)
4. Intrusion detection
Roadmap: Known node correspondence (Simple features)
Is there any obvious solution?
One Solution
Edge Overlap (EO): the number of edges common to GA and GB (normalized or not)
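
Edge overlap can be sketched in a few lines. The slide does not say how the count is normalized, so the Jaccard-style normalization below (common edges divided by the edges in either graph) is an assumption, as are the function and parameter names.

```python
def edge_overlap(edges_a, edges_b, normalized=True):
    """Edge overlap of two undirected graphs that share node identifiers."""
    # Store each undirected edge as a frozenset so (u, v) and (v, u) match.
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    common = len(ea & eb)
    if not normalized:
        return common
    # Assumed normalization: fraction of all observed edges that are shared.
    union = len(ea | eb)
    return common / union if union else 1.0


# One shared edge out of three distinct edges.
print(edge_overlap([(1, 2), (2, 3)], [(2, 1), (3, 4)]))
print(edge_overlap([(1, 2), (2, 3)], [(2, 1), (3, 4)], normalized=False))
```

Because the nodes are in known correspondence, no matching step is needed: edges are compared directly by their endpoint identifiers.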
… but “barbell” …
EO(B10, mB10) == EO(B10, mmB10)
Edge overlap gives both modified barbell graphs (GB and GB’) the same score against the original barbell graph GA, no matter which edge was removed.
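
The barbell issue can be reproduced with a small sketch. It assumes a barbell made of two 5-cliques joined by one bridge edge, and compares removing the bridge with removing an edge inside a clique; the helper names are hypothetical and the unnormalized edge count stands in for EO.

```python
from itertools import combinations


def barbell_edges(k):
    """Two k-cliques (nodes 0..k-1 and k..2k-1) joined by a single bridge."""
    left = set(combinations(range(k), 2))
    right = set(combinations(range(k, 2 * k), 2))
    bridge = (k - 1, k)
    return left | right | {bridge}, bridge


def common_edges(ea, eb):
    """Unnormalized edge overlap: number of shared edges."""
    return len(ea & eb)


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge set."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


ga, bridge = barbell_edges(5)
gb_no_bridge = ga - {bridge}       # removing the bridge disconnects the graph
gb_no_clique_edge = ga - {(0, 1)}  # removing a clique edge keeps it connected

print(common_edges(ga, gb_no_bridge), common_edges(ga, gb_no_clique_edge))
print(is_connected(10, gb_no_bridge), is_connected(10, gb_no_clique_edge))
```

Both modified graphs share the same number of edges with the original, so edge overlap cannot tell them apart even though only one of them is disconnected.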
Other solutions?
Similar if …
1. “… they share many vertices and/or edges.”
   Vertex/Edge Overlap: O(|V|+|V’|+|E|+|E’|)
2. “… the rankings of their vertices are similar.”
   Vertex Ranking (VR = rank correlation of node PageRank): O(|V|+|V’|)
3. “… their edge weights are similar.”
   Weighted distance: O(|E|+|E’|)
[Papadimitriou, Dasdan, Garcia-Molina ’10; Bunke ’06, Shoubridge+ ’02, Dickinson+ ’04]
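
As an illustration of the first measure, here is a minimal sketch of vertex/edge overlap. It assumes the definition from [Papadimitriou, Dasdan, Garcia-Molina ’10], VEO = 2(|V∩V’| + |E∩E’|) / (|V| + |V’| + |E| + |E’|), applied to undirected graphs; the function name is hypothetical.

```python
def vertex_edge_overlap(nodes_a, edges_a, nodes_b, edges_b):
    """Vertex/edge overlap of two undirected graphs, a value in [0, 1]."""
    va, vb = set(nodes_a), set(nodes_b)
    # frozenset makes the edge (u, v) equal to (v, u).
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    total = len(va) + len(vb) + len(ea) + len(eb)
    if total == 0:
        return 1.0  # two empty graphs are treated as identical
    return 2 * (len(va & vb) + len(ea & eb)) / total


# Same three nodes, one of two edges shared: 2 * (3 + 1) / (3 + 3 + 2 + 1).
print(vertex_edge_overlap([1, 2, 3], [(1, 2), (2, 3)], [1, 2, 3], [(1, 2)]))
```

The set operations take time linear in the sizes of the vertex and edge sets, in line with the O(|V|+|V’|+|E|+|E’|) bound quoted on the slide.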
Similar if …
4. “… they have similar subgraphs.”
   Maximum Common Subgraph (NP-complete): Vertex MCS Distance, Edge MCS Distance
5. “… we need few node/edge additions/deletions to transform GA to GB.”
   (Weighted) Graph Edit Distance
[Bunke ’06, Shoubridge+ ’02, Dickinson+ ’04; Bunke+ ’98, ’06, Riesen ’09, Gao ’10, Fankhauser ’11; Kapsabelis+ ’07]
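
When the node correspondence is known, the unweighted graph edit distance no longer needs a search over mappings. The sketch below assumes unit cost for every node or edge insertion and deletion, so the distance is the size of the symmetric differences; the function name is hypothetical.

```python
def graph_edit_distance_known_nodes(nodes_a, edges_a, nodes_b, edges_b):
    """Unit-cost edit distance between undirected graphs with aligned node ids."""
    va, vb = set(nodes_a), set(nodes_b)
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    # Each node or edge present in exactly one graph costs one edit.
    return len(va ^ vb) + len(ea ^ eb)


# One node added, one edge deleted, one edge added: distance 3.
print(graph_edit_distance_known_nodes(
    [1, 2, 3], [(1, 2), (2, 3)],
    [1, 2, 3, 4], [(1, 2), (3, 4)],
))
```

With identical node sets this reduces to the XOR of the two edge sets.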
Similar if …
6. “… they have similar fingerprints.”
   Signature similarity [Papadimitriou, Dasdan, Garcia-Molina ’10]
   b-bit fingerprint of GA: 1 0 1 0 1
   b-bit fingerprint of GB: 0 0 1 0 1
   Hamming distance: 1
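
The comparison step of signature similarity can be sketched as follows, using the two 5-bit fingerprints from the slide. Turning the Hamming distance into a similarity as 1 - Hamming/b is assumed here, and the fingerprinting step itself (hashing graph features into b bits) is not shown.

```python
def hamming_distance(fp_a, fp_b):
    """Number of positions at which two equal-length bit vectors differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same number of bits")
    return sum(x != y for x, y in zip(fp_a, fp_b))


def signature_similarity(fp_a, fp_b):
    """Assumed form: 1 - Hamming(fp_a, fp_b) / b for b-bit fingerprints."""
    return 1.0 - hamming_distance(fp_a, fp_b) / len(fp_a)


fp_ga = [1, 0, 1, 0, 1]
fp_gb = [0, 0, 1, 0, 1]
print(hamming_distance(fp_ga, fp_gb))
print(signature_similarity(fp_ga, fp_gb))
```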
Event Detection
[Bunke+ ’06]
[Figure: MCS distance (|G| = |V|) plotted per day]
Application: Web graph anomaly detection
[Papadimitriou, Dasdan, Garcia-Molina ‘10]
Roadmap: Known node correspondence (Complex features)
Graph Kernels: Idea
1) Compute graph substructures in polynomial time
2) Compare them to find sim(GA, GB)
Source: http://mloss.org/software/view/139/ [Vishwanathan]
Fast Subtree Kernel
[Shervashidze+ ’09 NIPS, JMLR’11] O(m h) per graph pair (m edges, h iterations)
Based on the Weisfeiler-Lehman algorithm (test for isomorphism), applied to labeled graphs:
1. Sorted list of neighbor labels for each node
2. Label compression (hash function on the sorted strings)
3. Relabeling
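
The relabeling steps above can be sketched as follows. This is a simplified illustration of a Weisfeiler-Lehman subtree kernel, not the authors' optimized implementation: graphs are assumed to be adjacency dictionaries with a label per node, a shared dictionary plays the role of the hash function, and all names are hypothetical.

```python
from collections import Counter


def wl_subtree_kernel(adj_a, labels_a, adj_b, labels_b, h=2):
    """Sum, over h+1 rounds, of the number of node pairs with equal labels."""
    graphs = [(adj_a, dict(labels_a)), (adj_b, dict(labels_b))]
    compress = {}  # shared, so equal signatures get equal labels in both graphs
    kernel = 0
    for iteration in range(h + 1):
        counts = [Counter(labels.values()) for _, labels in graphs]
        kernel += sum(counts[0][lab] * counts[1][lab] for lab in counts[0])
        if iteration == h:
            break
        relabeled_graphs = []
        for adj, labels in graphs:
            new_labels = {}
            for v in adj:
                # Own label plus the sorted list of neighbor labels.
                signature = (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                # Label compression: one short id per distinct signature.
                new_labels[v] = compress.setdefault(signature, len(compress))
            relabeled_graphs.append((adj, new_labels))
        graphs = relabeled_graphs
    return kernel


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
same_label = {0: "a", 1: "a", 2: "a"}

print(wl_subtree_kernel(triangle, same_label, triangle, same_label, h=1))
print(wl_subtree_kernel(triangle, same_label, path, same_label, h=1))
```

After one round only the middle node of the path still matches the triangle nodes, so the triangle-path value is lower than the triangle-triangle value.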
Graph kernels: Applications
• Aligning chemical compounds
• Function prediction
[Ralaivola+ ’05, Borgwardt+ ’05]
Source: http://www.ra.cs.uni-tuebingen.de/forschung/molsim/welcome_e.html
Other Graph Kernels
• RWR [Kashima+ ’03, Gaertner+ ’03, Vishwanathan ’10]
• Shortest path kernels [Borgwardt & Kriegel ’05]
• Cyclic path kernels [Horvath+ ’04]
• Depth-first search kernels [Swamidass+ ’05]
• Subtree kernels [Shervashidze+ ’09 NIPS, JMLR’11, Ralaivola+ ’05]
• Graphlet / Subgraph kernels [Shervashidze+ ’09, Thoma+ ’10]
• All-paths kernels [Airola+ ’08]
• …
… Many similarity functions can be defined …
What properties should a good similarity function have?
Axioms
A1. Identity property: sim(GA, GA) = 1
A2. Symmetric property: sim(GA, GB) = sim(GB, GA)
A3. Zero property: sim(complete graph, empty graph) = 0
[Koutra, Faloutsos, Vogelstein ’13]
Desired Properties
• Intuitiveness
  P1. Edge Importance: creation of disconnected components matters more than small connectivity changes.
  P2. Weight Awareness: the bigger the edge weight, the more the edge change matters (figure: removing an edge with w=5 vs. an edge with w=1).
  P3. Edge-“Submodularity” (“Diminishing Returns”): the sparser the graphs, the more important a “fixed” change is (figure: graphs with n=5).
  P4. Focus Awareness: targeted changes are more important than random changes of the same extent (figure: GA vs. targeted GB’ and random GB).
• Scalability
[Koutra, Faloutsos, Vogelstein ’13]
How do state-of-the-art methods fare?
(P1: edge importance, P2: weight awareness, P3: diminishing returns, P4: focus awareness)
Metric                           P1   P2   P3   P4
Vertex/Edge Overlap              ✗    ✗    ✗    ?
Graph Edit Distance (XOR)        ✗    ✗    ✗    ?
Signature Similarity             ✗    ✔    ✗    ?
λ-distance (adjacency matrix)    ✗    ✔    ✗    ?
λ-distance (graph Laplacian)     ✗    ✔    ✗    ?
λ-distance (normalized Lapl.)    ✗    ✔    ✗    ?
[Koutra, Vogelstein, Faloutsos ’13]
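
The slides do not define λ-distance. A common formulation, assumed here, is the Euclidean distance between the largest k eigenvalues of a matrix representation of each graph. The sketch uses the adjacency matrix and NumPy; the function name and the parameter k are hypothetical.

```python
import numpy as np


def lambda_distance(adj_a, adj_b, k=None):
    """Euclidean distance between the top-k adjacency eigenvalues (assumed form)."""
    # eigvalsh handles symmetric matrices and returns real eigenvalues.
    ev_a = np.sort(np.linalg.eigvalsh(np.asarray(adj_a, dtype=float)))[::-1]
    ev_b = np.sort(np.linalg.eigvalsh(np.asarray(adj_b, dtype=float)))[::-1]
    if k is None:
        k = min(len(ev_a), len(ev_b))
    return float(np.sqrt(np.sum((ev_a[:k] - ev_b[:k]) ** 2)))


single_edge = [[0, 1], [1, 0]]  # eigenvalues 1 and -1
no_edges = [[0, 0], [0, 0]]     # eigenvalues 0 and 0
print(lambda_distance(single_edge, no_edges))
```

Passing a Laplacian or normalized Laplacian instead of the adjacency matrix would give the other two variants listed in the table.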
Is there a method that satisfies the properties?
Yes! DeltaCon
DELTACON (details)
① Find the pairwise node influence matrices, SA & SB.
② Find the similarity between SA & SB.
[Koutra, Faloutsos, Vogelstein ’13]
How? Using FaBP (Fast Belief Propagation).
Intuition:
• Sound theoretical background (MLE on marginals)
• Attenuating neighboring influence for small ε: 1-hop, 2-hops, …
Note: ε > ε² > ..., 0 < ε < 1
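
The attenuation can be written out as a power series. This is a sketch assuming the FaBP node-affinity matrix as given in [Koutra, Faloutsos, Vogelstein ’13], with A the adjacency matrix, D the diagonal degree matrix and I the identity; these symbols are not defined on the slide.

```latex
S = \left[ I + \varepsilon^2 D - \varepsilon A \right]^{-1}
  \approx \left[ I - \varepsilon A \right]^{-1}
  = I + \varepsilon A + \varepsilon^2 A^2 + \dots
```

Entry (i, j) of A^k counts the walks of length k from node i to node j, so 1-hop neighbors contribute with weight ε, 2-hop neighbors with ε², and so on. The series converges when ε times the largest eigenvalue magnitude of A is below 1.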
OUR SOLUTION: DELTACON (details)
① Find the pairwise node influence matrices, SA & SB.
② Find the similarity between SA & SB, e.g., sim(SA, SB) = 0.3.
[Koutra, Faloutsos, Vogelstein ’13]
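
The two steps can be sketched with dense matrices. This is an illustration in the spirit of the exact O(n²) variant, not the authors' released code: the affinity matrix S = [I + ε²D - εA]⁻¹, the choice ε = 1/(1 + maximum degree), the root Euclidean distance and the mapping sim = 1/(1 + d) are assumptions taken from the DeltaCon paper rather than from the slides, and the function names are hypothetical.

```python
import numpy as np


def node_influence(adj, eps):
    """Step 1: pairwise node influence S = [I + eps^2 D - eps A]^-1."""
    n = adj.shape[0]
    deg = np.diag(adj.sum(axis=1))
    s = np.linalg.inv(np.eye(n) + eps ** 2 * deg - eps * adj)
    # Clip tiny negative round-off values so the square root below is defined.
    return np.clip(s, 0.0, None)


def deltacon0(adj_a, adj_b):
    """Step 2: similarity of the two influence matrices, a value in (0, 1]."""
    a = np.asarray(adj_a, dtype=float)
    b = np.asarray(adj_b, dtype=float)
    # Assumed choice of eps, shared by both graphs.
    max_degree = max(a.sum(axis=1).max(), b.sum(axis=1).max())
    eps = 1.0 / (1.0 + max_degree)
    sa, sb = node_influence(a, eps), node_influence(b, eps)
    # Root Euclidean distance between the two influence matrices.
    d = np.sqrt(np.sum((np.sqrt(sa) - np.sqrt(sb)) ** 2))
    return float(1.0 / (1.0 + d))


path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path3_minus_edge = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(deltacon0(path3, path3))
print(deltacon0(path3, path3_minus_edge))
```

The matrix inverse makes this sketch cubic in the number of nodes, which is only practical for small graphs.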
… but O(n²) … faster?
Details are in the paper. Code: http://www.cs.cmu.edu/~dkoutra/CODE/deltacon.zip
[Koutra, Faloutsos, Vogelstein ’13]
Comparison of methods revisited
(P1: edge importance, P2: weight awareness, P3: diminishing returns, P4: focus awareness)
Metric                       P1   P2   P3   P4
Vertex/Edge Overlap          ✗    ✗    ✗    ?
Graph Edit Distance (XOR)    ✗    ✗    ✗    ?
Signature Similarity         ✗    ✔    ✗    ?
DELTACON0                    ✔    ✔    ✔    ✔
DELTACON                     ✔    ✔    ✔    ✔
[Koutra, Faloutsos, Vogelstein ’13]
Temporal Anomaly Detection
• Nodes: employees
• Edges: email exchange
One graph per day (Day 1, Day 2, Day 3, Day 4, Day 5); the similarity of each pair of consecutive days gives sim1, sim2, sim3, sim4.
[Koutra, Faloutsos, Vogelstein ’13]
D. Koutra & T. Eliassi-Rad & C. FaloutsosICDM’14 Tutorial
sim
ilari
ty
consecutive days
Feb 4: Lay resigns
Temporal Anomaly Detection
[Koutra, Faloutsos, Vogelstein ‘13]
39
D. Koutra & T. Eliassi-Rad & C. FaloutsosICDM’14 Tutorial
Brain Connectivity Graph Clustering
• 114 brain graphs
  – Nodes: 70 cortical regions
  – Edges: connections
• Attributes: gender, IQ, age, …
[Koutra, Faloutsos, Vogelstein ’13]
Brain Connectivity Graph Clustering
t-test p-value = 0.0057 [Koutra, Faloutsos, Vogelstein ‘13]
Roadmap: Known node correspondence (Visualization)
Tested Visual Encodings
[Alper+ ’13, CHI]
Augmenting the graphs / adjacency matrices to show the differences.
User study result: for bigger and sparser graphs, matrices are better.
Figure labels: 40-80 nodes, low density.
More on visualization
• For large graphs: HoneyComb [van Ham+ ’09]
• Reference graph [Andrews ’09]
• Interactive comparison [Hascoet+ ’12]
• General principles [Gleicher+ ’11]
• …
Roadmap: Known node correspondence (Summary)
A Guide to Selecting a Measure
[Soundarajan, Gallagher, Eliassi-Rad. SDM’14]
[Figure: items labeled H1, H2, …, H15, H20, …, Hk]
A Guide to Selecting a Measure
[Soundarajan, Gallagher, Eliassi-Rad. SDM’14]
Q1. Are the graph similarity methods correlated?
    Much higher than expected!
Q2. Are there groups of methods that behave comparably?
    Some complex methods are very similar to simpler methods.
Q3. How can we get a single consensus method?
    NetSimile and RWR are often close to the consensus.
Related: RWR ≈ BP ≈ SSL [Koutra+ PKDD’11]
Summary
• Numerous applications:
  – Network monitoring, anomaly detection, network intrusion, behavioral studies
• Although it seems an easy problem, it is not!
  – Some measures are counter-intuitive.
  – DeltaCon [Koutra+, SDM’13] (based on node proximity) satisfies several intuitive properties.
• There are multiple measures, but which one to use?
  – Depends on the application!
  – Good news, according to the guide of [Soundarajan+, SDM’14]!
References (in reverse chronological order)
• S. Soundarajan, B. Gallagher, and T. Eliassi-Rad. 2014. A Guide to Selecting a Network Similarity Method. SDM 2014.
• D. Koutra, J.T. Vogelstein, C. Faloutsos. 2013. DELTACON: A Principled Massive-Graph Similarity Function. SDM 2013: 162-170. [CODE]
• Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. 2011. Speeding up graph edit distance computation through fast bipartite matching. In GbRPR'11.
• Xinbo Gao, Bing Xiao, Dacheng Tao, and Xuelong Li. 2010. A survey of graph edit distance. Pattern Anal. Appl. 13, 1 (January 2010), 113-129.
• Panagiotis Papadimitriou, Ali Dasdan, and Hector Garcia-Molina. 2010. Web Graph Similarity for Anomaly Detection. Journal of Internet Services and Applications, Volume 1 (1), pp. 19-30.
• Kaspar Riesen and Horst Bunke. 2009. Approximate graph edit distance computation by means of bipartite graph matching.
• Kelly Marie Kapsabelis, Peter John Dickinson, Kutluyil Dogancay. 2007. Investigation of graph edit distance cost functions for detection of network anomalies. ANZIAM J. 48 (CTAC2006), pp. 436–449.
• H. Bunke, P. J. Dickinson, M. Kraetzl, and W. D. Wallis. 2006. A Graph-Theoretic Approach to Enterprise Network Dynamics (PCS). Birkhauser.
• P. Shoubridge, M. Kraetzl, W. D. Wallis, H. Bunke. 2002. Detection of Abnormal Change in a Time Series of Graphs. Journal of Interconnection Networks (JOIN) 3(1-2):85-101.
• Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern Recogn. Lett. 19, 3-4 (March 1998), 255-259.
• A. Kelmans. 1976. Comparison of graphs by their number of spanning trees. Discrete Mathematics 16, 3, 241–261.
References: Kernels (for more references, see the graph kernel slides above)
• U. Kang, H. Tong, and J. Sun. 2012. Fast random walk graph kernel. In SDM 2012.
• Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. 2011. Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res. 12, 2539-2561.
• N. Shervashidze and K. M. Borgwardt. 2009. Fast subtree kernels on graphs. In Advances in Neural Information Processing Systems, pages 1660–1668.
• A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski. 2008. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics 9(Suppl 11), S2.
References: Visualization
• Basak Alper, Benjamin Bach, Nathalie Henry Riche, Tobias Isenberg, and Jean-Daniel Fekete. 2013. Weighted graph comparison techniques for brain connectivity analysis. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13).
• Mountaz Hascoët and Pierre Dragicevic. 2012. Interactive graph matching and visual comparison of graphs and clustered graphs. In Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI '12).
• Michael Gleicher, Danielle Albers, Rick Walker, Ilir Jusufi, Charles D. Hansen, and Jonathan C. Roberts. 2011. Visual comparison for information visualization.
• K. Andrews, M. Wohlfahrt, and G. Wurzinger. 2009. Visual graph comparison. In Information Visualisation, 2009 13th International Conference, 62–67.
• Frank van Ham, Hans-Jörg Schulz, and Joan M. Dimicco. 2009. Honeycomb: Visual Analysis of Large Scale Social Networks. In Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part II (INTERACT '09).