Dept. of Computer Science, Rutgers

Node and Graph Similarity: Theory and Applications

Danai Koutra (CMU), Tina Eliassi-Rad (Rutgers), Christos Faloutsos (CMU)

ICDM 2014, Monday December 15th 2014, Shenzhen, China

Copyright for the tutorial materials is held by the authors. The authors grant IEEE ICDM permission to distribute the materials through its website.

D. Koutra & T. Eliassi-Rad & C. Faloutsos, ICDM’14 Tutorial

Part 2a

Graph Similarity: known node correspondence

What to remember

• Numerous applications:
– Network monitoring, anomaly detection, network intrusion, behavioral studies
• Although it seems an easy problem, it’s not!
– Some measures are counter-intuitive.
– DeltaCon [Koutra+, SDM’13] (based on node proximity) satisfies several intuitive properties.
• There are multiple measures, but which one to use?
– Depends on the application!
– Good news according to the guide of [Soundarajan+, SDM’14]!

Roadmap

• Known node correspondence
– Simple features
– Complex features
– Visualization
– Summary
• Unknown node correspondence


Problem Definition: Graph Similarity

• Given: (a) 2 graphs with the same nodes and different edge sets (b) node correspondence
• Find: similarity score, s ∈ [0,1]

s = 0: GA <> GB

s = 1: GA == GB

Applications

1. Classification: different brain wiring?
2. Discontinuity detection: Day 1, Day 2, Day 3, Day 4, Day 5
3. Behavioral patterns: FB message graph vs. wall-to-wall network
4. Intrusion detection

Roadmap

• Known node correspondence
– Simple features
– Complex features
– Visualization
– Summary
• Unknown node correspondence

Is there any obvious solution?


One Solution

Edge Overlap (EO)

# of common edges (normalized or not)

… but “barbell”…

EO(B10, mB10) == EO(B10, mmB10)

[Figure: GA compared with GB, and GA compared with GB’]
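The barbell pitfall can be checked on a toy graph. The sketch below is a minimal illustration, not the tutorial's code: it assumes undirected graphs stored as sets of node pairs, uses the Jaccard ratio as one possible normalization, and builds a small 6-node barbell (two triangles joined by a bridge) as a stand-in for B10. All helper names are hypothetical.

```python
def undirected(edges):
    """Store each undirected edge as a frozenset so (u, v) equals (v, u)."""
    return {frozenset(e) for e in edges}


def edge_overlap(edges_a, edges_b, normalized=False):
    """Number of common edges, optionally divided by the size of the union."""
    common = len(edges_a & edges_b)
    if not normalized:
        return common
    union = len(edges_a | edges_b)
    return common / union if union else 1.0


# Toy barbell: triangles {0, 1, 2} and {3, 4, 5} joined by the bridge (2, 3).
barbell = undirected([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
no_clique_edge = barbell - undirected([(0, 1)])  # edge inside a triangle removed
no_bridge = barbell - undirected([(2, 3)])       # bridge removed: graph splits

# Both perturbed graphs keep 6 of the 7 original edges, so EO cannot tell them apart.
print(edge_overlap(barbell, no_clique_edge), edge_overlap(barbell, no_bridge))
```

Because edge overlap only counts edges, removing the bridge (which disconnects the graph) scores exactly the same as removing an edge inside one of the triangles.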

Other solutions?


Similar if …

1. “… they share many vertices and/or edges”: Vertex/Edge Overlap, O(|V|+|V’|+|E|+|E’|)
2. “… the rankings of their vertices are similar”: Vertex Ranking, O(|V|+|V’|). VR = rank correlation of node PageRank
3. “… their edge weights are similar”: Weighted distance, O(|E|+|E’|)

[Papadimitriou, Dasdan, Garcia-Molina ’10; Bunke ’06, Shoubridge+ ’02, Dickinson+ ’04]
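As an illustration of the first measure, vertex/edge overlap can be computed with plain set operations. This is a minimal sketch that assumes the normalization 2(|V∩V’| + |E∩E’|) / (|V|+|V’|+|E|+|E’|), as recalled from [Papadimitriou, Dasdan, Garcia-Molina ’10]; the function name and the representation (node sets, directed edges as tuples) are hypothetical.

```python
def vertex_edge_overlap(nodes_a, edges_a, nodes_b, edges_b):
    """Shared vertices and edges, normalized to [0, 1].

    Assumed formula: 2 * (|V & V'| + |E & E'|) / (|V| + |V'| + |E| + |E'|).
    """
    total = len(nodes_a) + len(nodes_b) + len(edges_a) + len(edges_b)
    if total == 0:
        return 1.0  # two empty graphs are treated as identical
    shared = len(nodes_a & nodes_b) + len(edges_a & edges_b)
    return 2.0 * shared / total


nodes_a, edges_a = {1, 2, 3}, {(1, 2), (2, 3)}
nodes_b, edges_b = {1, 2, 3, 4}, {(1, 2), (3, 4)}
print(vertex_edge_overlap(nodes_a, edges_a, nodes_b, edges_b))
```

Each set is scanned once, which matches the linear cost quoted for this measure.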

Similar if …

4. “… they have similar subgraphs”: Maximum Common Subgraph (NP-complete); Vertex MCS Distance, Edge MCS Distance
5. “… we need few node/edge additions/deletions to transform GA to GB”: (weighted) Graph Edit Distance

[Bunke ’06, Shoubridge+ ’02, Dickinson+ ’04; Bunke+ ’98, ’06, Riesen ’09, Gao ’10, Fankhauser ’11; Kapsabelis+ ’07]
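When the node correspondence is known, the unweighted edit distance reduces to counting the nodes and edges that appear in only one of the two graphs. The sketch below assumes unit cost for every addition or deletion and edges stored as tuples with a consistent orientation; the weighted variant and the function name are outside what the slide specifies.

```python
def graph_edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Unit-cost edit distance for graphs with known node correspondence.

    Counts node and edge additions/deletions, i.e. the sizes of the
    symmetric differences (an XOR over the node and edge sets).
    """
    return len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)


nodes_a, edges_a = {1, 2, 3}, {(1, 2), (2, 3)}
nodes_b, edges_b = {1, 2, 3, 4}, {(1, 2), (3, 4)}
# Add node 4, delete edge (2, 3), add edge (3, 4): three edit operations.
print(graph_edit_distance(nodes_a, edges_a, nodes_b, edges_b))
```

Without a known correspondence the problem is much harder, since the best node matching has to be searched for as well.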

Similar if …

6. “… they have similar fingerprints”: Signature similarity

b-bit fingerprint of GA: 1 0 1 0 1
b-bit fingerprint of GB: 0 0 1 0 1
Hamming Distance: 1

[Papadimitriou, Dasdan, Garcia-Molina ’10]
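The fingerprint comparison itself is a bitwise count. The sketch below reproduces the 5-bit example and assumes the similarity is 1 minus the Hamming distance divided by b, which is one common normalization. How the fingerprints are derived from the graphs is not covered here, and the function names are hypothetical.

```python
def hamming_distance(fp_a, fp_b):
    """Number of positions where two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same number of bits")
    return sum(x != y for x, y in zip(fp_a, fp_b))


def signature_similarity(fp_a, fp_b):
    """Assumed normalization: 1 - Hamming distance / b."""
    if not fp_a:
        raise ValueError("fingerprints must not be empty")
    return 1.0 - hamming_distance(fp_a, fp_b) / len(fp_a)


fp_a = [1, 0, 1, 0, 1]  # b-bit fingerprint of GA
fp_b = [0, 0, 1, 0, 1]  # b-bit fingerprint of GB
print(hamming_distance(fp_a, fp_b), signature_similarity(fp_a, fp_b))
```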

Event Detection

[Figure: MCS Distance (|G|=|V|) per day]

[Bunke+ ’06]

Application: Web graph anomaly detection

[Papadimitriou, Dasdan, Garcia-Molina ‘10]


Roadmap

• Known node correspondence
– Simple features
– Complex features
– Visualization
– Summary
• Unknown node correspondence

Graph Kernels: Idea

1) Compute graph substructures in poly time
2) Compare them to find sim(GA, GB)

Source: http://mloss.org/software/view/139/

[Vishwanathan]

Fast Subtree Kernel

[Shervashidze+ ’09 NIPS, JMLR’11] O(m h) per graph pair

Weisfeiler-Lehman algorithm (test for isomorphism) on labeled graphs:

• Sorted list of neighbors
• Label compression (hash func. on sorted strings)
• Relabeling
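The relabeling loop can be sketched as follows. This is a simplified illustration of a Weisfeiler-Lehman subtree kernel, not the authors' implementation: it uses a shared dictionary in place of a hash function for label compression, sorts neighbor labels with the built-in sort (so it does not reach the O(m h) bound), and all names are hypothetical.

```python
from collections import Counter


def wl_label_counts(adj, labels, h, table):
    """Count compressed labels over h Weisfeiler-Lehman iterations.

    adj maps node -> list of neighbours, labels maps node -> initial label,
    table is the signature -> integer id dictionary shared by both graphs.
    """
    current = {v: table.setdefault(("init", labels[v]), len(table)) for v in adj}
    counts = Counter(current.values())
    for _ in range(h):
        relabeled = {}
        for v in adj:
            # Signature: own label plus the sorted list of neighbour labels.
            signature = (current[v], tuple(sorted(current[u] for u in adj[v])))
            # Label compression: identical signatures get the same short id.
            relabeled[v] = table.setdefault(signature, len(table))
        current = relabeled
        counts.update(current.values())
    return counts


def wl_subtree_kernel(graph_a, graph_b, h=2):
    """Dot product of the label-count vectors of two (adj, labels) graphs."""
    table = {}
    counts_a = wl_label_counts(*graph_a, h, table)
    counts_b = wl_label_counts(*graph_b, h, table)
    return sum(c * counts_b[label] for label, c in counts_a.items())


path = ({"a": ["b"], "b": ["a", "c"], "c": ["b"]}, {"a": "x", "b": "x", "c": "x"})
path_renamed = ({1: [3], 3: [1, 2], 2: [3]}, {1: "x", 2: "x", 3: "x"})
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "x", 1: "x", 2: "x"})

print(wl_subtree_kernel(path, path_renamed, h=1))
print(wl_subtree_kernel(path, triangle, h=1))
```

The kernel depends only on label patterns, so renaming the nodes of the path leaves its value unchanged, while the triangle shares fewer subtree patterns with the path.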

Graph kernels: Applications

• Aligning chemical compounds
• Function prediction

[Ralaivola+ ’05, Borgwardt+ ’05]
Source: http://www.ra.cs.uni-tuebingen.de/forschung/molsim/welcome_e.html

Other Graph Kernels

• RWR [Kashima+ ’03, Gaertner+ ’03, Vishwanathan ’10]
• Shortest path kernels [Borgwardt & Kriegel ’05]
• Cyclic path kernels [Horvath+ ’04]
• Depth-first search kernels [Swamidass+ ’05]
• Subtree kernels [Shervashidze+ ’09 NIPS, JMLR’11, Ralaivola+ ’05]
• Graphlet / Subgraph kernels [Shervashidze+ ’09, Thoma+ ’10]
• All-paths kernels [Airola+ ’08]
• …

… Many similarity functions can be defined…

What properties should a good similarity function have?

Axioms

A1. Identity property: sim(GA, GA) = 1

A2. Symmetric property: sim(GA, GB) = sim(GB, GA)

A3. Zero property: sim(complete graph, empty graph) = 0

[Koutra, Faloutsos, Vogelstein ’13]

Desired Properties

• Intuitiveness
– P1. Edge Importance: Creation of disconnected components matters more than small connectivity changes.
– P2. Weight Awareness: The bigger the edge weight, the more the edge change matters.
– P3. Edge-“Submodularity” (“Diminishing Returns”): The sparser the graphs, the more important is a “fixed” change.
– P4. Focus Awareness: Targeted changes are more important than random changes of the same extent.
• Scalability

[Koutra, Faloutsos, Vogelstein ’13]

How do state-of-the-art methods fare?

Metric                          P1   P2   P3   P4
Vertex/Edge Overlap             ✗    ✗    ✗    ?
Graph Edit Distance (XOR)       ✗    ✗    ✗    ?
Signature Similarity            ✗    ✔    ✗    ?
λ-distance (adjacency matrix)   ✗    ✔    ✗    ?
λ-distance (graph laplacian)    ✗    ✔    ✗    ?
λ-distance (normalized lapl.)   ✗    ✔    ✗    ?

P1: edge importance, P2: weight awareness, P3: diminishing returns, P4: focus awareness

[Koutra, Vogelstein, Faloutsos ’13]

Is there a method that satisfies the properties?

Yes! DeltaCon


DELTACON (DETAILS)

① Find the pairwise node influence, SA & SB.

② Find the similarity between SA & SB.

[Koutra, Faloutsos, Vogelstein ’13]

How? Using FaBP.

INTUITION

• Sound theoretical background (MLE on marginals)
• Attenuating Neighboring Influence for small ε: 1-hop, 2-hops, …

Note: ε > ε² > ..., 0 < ε < 1
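For reference, the attenuation can be written out explicitly. The formula below is recalled from the DeltaCon paper [Koutra, Faloutsos, Vogelstein ’13] rather than taken from the slide text, so treat the notation as an assumption: A is the adjacency matrix, D the diagonal degree matrix, I the identity, and S the matrix of pairwise node influences.

```latex
S = \left[ I + \epsilon^{2} D - \epsilon A \right]^{-1}
  = \sum_{k \ge 0} \left( \epsilon A - \epsilon^{2} D \right)^{k}
  \approx I + \epsilon A + \epsilon^{2} \left( A^{2} - D \right) + \dots
```

The series converges when ε is small enough. Direct neighbors (entries of A) are weighted by ε, 2-hop connections (entries of A²) by ε², and so on, so the influence of a node decays with the number of hops.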

OUR SOLUTION: DELTACON (DETAILS)

① Find the pairwise node influence, SA & SB.

② Find the similarity between SA & SB.

sim(SA, SB) = 0.3

[Koutra, Faloutsos, Vogelstein ’13]
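The two steps can be sketched in a few lines. This is a minimal sketch of the quadratic-time variant, assuming the affinity matrix S = [I + ε²D − εA]⁻¹ and the root Euclidean distance as recalled from the paper [Koutra, Faloutsos, Vogelstein ’13]. The function names, the pure-Python matrix inversion, and the default choice of ε (1 / (1 + maximum degree over both graphs)) are illustrative assumptions, not the authors' released code.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small graphs)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def degrees(n, edges):
    """Degree of each node 0..n-1; every undirected edge is listed once."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def affinity(n, edges, eps):
    """Step 1: pairwise node influence S = [I + eps^2 D - eps A]^-1."""
    deg = degrees(n, edges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0 + eps * eps * deg[i]
    for u, v in edges:
        m[u][v] -= eps
        m[v][u] -= eps
    return invert(m)


def deltacon0(n, edges_a, edges_b, eps=None):
    """Step 2: similarity 1 / (1 + root Euclidean distance of SA and SB)."""
    if eps is None:  # assumed default, see the lead-in
        max_deg = max(degrees(n, edges_a) + degrees(n, edges_b), default=0)
        eps = 1.0 / (1.0 + max_deg)
    s_a = affinity(n, edges_a, eps)
    s_b = affinity(n, edges_b, eps)
    # max(..., 0.0) guards the square roots against tiny negative round-off.
    dist = math.sqrt(sum(
        (math.sqrt(max(x, 0.0)) - math.sqrt(max(y, 0.0))) ** 2
        for row_a, row_b in zip(s_a, s_b)
        for x, y in zip(row_a, row_b)))
    return 1.0 / (1.0 + dist)


# Toy barbell: triangles {0, 1, 2} and {3, 4, 5} joined by the bridge (2, 3).
barbell = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
no_clique_edge = [e for e in barbell if e != (0, 1)]
no_bridge = [e for e in barbell if e != (2, 3)]
print(deltacon0(6, barbell, no_clique_edge))
print(deltacon0(6, barbell, no_bridge))
```

Comparing the two printed scores shows how this measure treats removing the bridge relative to removing an edge inside a triangle, a case in which counting common edges alone gives identical results.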

… but O(n²) …

Faster? In the paper.

http://www.cs.cmu.edu/~dkoutra/CODE/deltacon.zip

[Koutra, Faloutsos, Vogelstein ’13]

Comparison of methods revisited

Metric                          P1   P2   P3   P4
Vertex/Edge Overlap             ✗    ✗    ✗    ?
Graph Edit Distance (XOR)       ✗    ✗    ✗    ?
Signature Similarity            ✗    ✔    ✗    ?
DELTACON0                       ✔    ✔    ✔    ✔
DELTACON                        ✔    ✔    ✔    ✔

P1: edge importance, P2: weight awareness, P3: diminishing returns, P4: focus awareness

[Koutra, Faloutsos, Vogelstein ’13]

Temporal Anomaly Detection

• Nodes: employees
• Edges: email exchange

Day 1, Day 2, Day 3, Day 4, Day 5: one similarity score per pair of consecutive days (sim1, sim2, sim3, sim4)

[Koutra, Faloutsos, Vogelstein ’13]
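This setup amounts to scoring each pair of consecutive daily graphs and looking for dips. The sketch below is only an illustration under stated assumptions: graphs are edge sets over a fixed set of employees, the similarity function is pluggable (Jaccard edge overlap is used as a stand-in, not DeltaCon), and the fixed threshold is a hypothetical choice rather than the rule used in the paper.

```python
def jaccard(edges_a, edges_b):
    """Stand-in similarity: shared edges divided by all edges seen."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def consecutive_similarities(daily_graphs, similarity=jaccard):
    """sim_t = similarity(G_t, G_{t+1}) for every pair of consecutive days."""
    return [similarity(a, b) for a, b in zip(daily_graphs, daily_graphs[1:])]


def flag_dips(sims, threshold):
    """Indices t where the similarity between day t and day t+1 is low."""
    return [t for t, s in enumerate(sims) if s < threshold]


# Hypothetical email graphs for five days (sender, receiver pairs).
days = [
    {("ann", "bob"), ("bob", "cat"), ("cat", "dan")},
    {("ann", "bob"), ("bob", "cat"), ("cat", "dan")},
    {("ann", "dan")},
    {("ann", "dan")},
    {("ann", "dan"), ("bob", "cat")},
]
sims = consecutive_similarities(days)
print(sims, flag_dips(sims, threshold=0.3))
```

Any graph similarity function with the same two-argument shape can be passed through the `similarity` parameter.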

Temporal Anomaly Detection

[Figure: similarity between consecutive days; annotated event: “Feb 4: Lay resigns”]

[Koutra, Faloutsos, Vogelstein ’13]

Brain Connectivity Graph Clustering

• 114 brain graphs
– Nodes: 70 cortical regions
– Edges: connections
• Attributes: gender, IQ, age…

[Koutra, Faloutsos, Vogelstein ’13]

Brain Connectivity Graph Clustering

t-test p-value = 0.0057 [Koutra, Faloutsos, Vogelstein ‘13]


Roadmap

• Known node correspondence
– Simple features
– Complex features
– Visualization
– Summary
• Unknown node correspondence

Tested Visual Encodings

[Alper+ ’13, CHI]

Augmenting the graphs / adjacency matrices to show the differences.

User Study Result: For bigger (40-80 nodes) and sparser (low density) graphs, matrices are better.

More on visualization

• For large graphs: HoneyComb [van Ham+ ’09]
• Reference graph [Andrews ’09]
• Interactive comparison [Hascoet+ ’12]
• General principles [Gleicher+ ’11]
• …

Roadmap

• Known node correspondence
– Simple features
– Complex features
– Visualization
– Summary
• Unknown node correspondence

A Guide to Selecting a Measure

[Soundarajan, Gallagher, Eliassi-Rad. SDM’14]


A Guide to Selecting a Measure

Q1: Are the graph similarity methods correlated?
Much higher than expected!

Q2: Are there groups of methods that behave comparably?
Some complex methods are very similar to simpler methods.

Q3: How can we get a single consensus method?
NetSimile, RWR often close to consensus.

Note: RWR ≈ BP ≈ SSL [Koutra+ PKDD’11]

[Soundarajan, Gallagher, Eliassi-Rad. SDM’14]

Summary

• Numerous applications:
– Network monitoring, anomaly detection, network intrusion, behavioral studies
• Although it seems an easy problem, it’s not!
– Some measures are counter-intuitive.
– DeltaCon [Koutra+, SDM’13] (based on node proximity) satisfies several intuitive properties.
• There are multiple measures, but which one to use?
– Depends on the application!
– Good news according to the guide of [Soundarajan+, SDM’14]!

References (in reverse chronological order)

• S. Soundarajan, B. Gallagher, and T. Eliassi-Rad. 2014. A Guide to Selecting a Network Similarity Method. SDM 2014.
• D. Koutra, J.T. Vogelstein, C. Faloutsos. 2013. DELTACON: A Principled Massive-Graph Similarity Function. SDM 2013: 162-170. [CODE]
• Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. 2011. Speeding up graph edit distance computation through fast bipartite matching. In GbRPR'11.
• Xinbo Gao, Bing Xiao, Dacheng Tao, and Xuelong Li. 2010. A survey of graph edit distance. Pattern Anal. Appl. 13, 1 (January 2010), 113-129.
• Panagiotis Papadimitriou, Ali Dasdan, and Hector Garcia-Molina. 2010. Web Graph Similarity for Anomaly Detection. Journal of Internet Services and Applications, Volume 1 (1), pp. 19-30.
• Kaspar Riesen and Horst Bunke. 2009. Approximate graph edit distance computation by means of bipartite graph matching.
• Kelly Marie Kapsabelis, Peter John Dickinson, Kutluyil Dogancay. 2007. Investigation of graph edit distance cost functions for detection of network anomalies. ANZIAM J. 48 (CTAC2006), pp. 436–449.
• H. Bunke, P. J. Dickinson, M. Kraetzl, and W. D. Wallis. 2006. A Graph-Theoretic Approach to Enterprise Network Dynamics (PCS). Birkhauser.
• P. Shoubridge, M. Kraetzl, W. D. Wallis, H. Bunke. 2002. Detection of Abnormal Change in a Time Series of Graphs. Journal of Interconnection Networks (JOIN) 3(1-2):85-101.
• Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern Recogn. Lett. 19, 3-4 (March 1998), 255-259.
• A. Kelmans. 1976. Comparison of graphs by their number of spanning trees. Discrete Mathematics 16, 3, 241–261.

Kernels (for more references, check the “Other Graph Kernels” slide)

• U. Kang, H. Tong, and J. Sun. 2012. Fast random walk graph kernel. In SDM.
• Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. 2011. Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res. 12, 2539-2561.
• N. Shervashidze and K. M. Borgwardt. 2009. Fast subtree kernels on graphs. In Advances in Neural Information Processing Systems, pages 1660–1668.
• A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski. 2008. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics 9(Suppl 11), S2.

Visualization

• Basak Alper, Benjamin Bach, Nathalie Henry Riche, Tobias Isenberg, and Jean-Daniel Fekete. 2013. Weighted graph comparison techniques for brain connectivity analysis. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13).
• Mountaz Hascoët and Pierre Dragicevic. 2012. Interactive graph matching and visual comparison of graphs and clustered graphs. In Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI '12).
• Michael Gleicher, Danielle Albers, Rick Walker, Ilir Jusufi, Charles D. Hansen, and Jonathan C. Roberts. 2011. Visual comparison for information visualization.
• K. Andrews, M. Wohlfahrt, and G. Wurzinger. 2009. Visual graph comparison. In Information Visualisation, 2009 13th International Conference, 62–67.
• Frank van Ham, Hans-Jörg Schulz, and Joan M. Dimicco. 2009. Honeycomb: Visual Analysis of Large Scale Social Networks. In Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part II (INTERACT '09).