Colibri: Fast Mining of Large Static and Dynamic Graphs

Joint Work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos

Speaker: Hanghang Tong

KDD 2008, Aug. 24-27, 2008, Las Vegas

SCS CMU


Graphs are everywhere!

Q: How to find patterns? e.g., community, anomaly, etc.

Motivation

• Q: How to find patterns?

– e.g., community, anomaly, etc.

• A: Low-Rank Approximation (LRA) for Adjacency Matrix of the Graph.

$A \approx L \times M \times R$

LRA for Graph Mining: Example

[Figure: author-conference graph. Authors: John, Tom, Bob, Carl, Van, Roy. Conf.: KDD, ICDM, RECOMB, ISMB.]

Adj. matrix: $A \approx L \times M \times R$

• L: Au. clusters

• M: Interaction

• R: Conf. clusters

Recon. error is high → ‘Carl’ is abnormal
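The idea that a high reconstruction error flags an abnormal node can be sketched in a few lines. This is only an illustration: the edges in `A` are made up (the slide's actual graph is not recoverable here), and a rank-2 truncated SVD stands in for the generic (L, M, R) factorization.

```python
import numpy as np

# Hypothetical author-by-conference adjacency matrix (edges are invented
# for illustration; they are not taken from the slide).
authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
confs = ["KDD", "ICDM", "RECOMB", "ISMB"]
A = np.array([
    [1, 1, 0, 0],  # John
    [1, 1, 0, 0],  # Tom
    [1, 1, 0, 0],  # Bob
    [1, 0, 0, 1],  # Carl: publishes across both communities
    [0, 0, 1, 1],  # Van
    [0, 0, 1, 1],  # Roy
], dtype=float)


def rank_k_factors(A, k):
    """Return (L, M, R) with A ~= L @ M @ R, using a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]


L, M, R = rank_k_factors(A, k=2)
A_hat = L @ M @ R

# Squared reconstruction error per author (one row of A per author).
row_error = np.sum((A - A_hat) ** 2, axis=1)
most_abnormal = authors[int(np.argmax(row_error))]
print(most_abnormal, row_error.round(3))
```

With two author clusters and two conference clusters, a rank-2 approximation captures the community structure, so the row that fits neither cluster is the one left with the largest residual.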

Challenges

• How to get (L, M, R)

+ Efficiently (both time and space);

+ Intuitively (easy for interpretation);

+ Dynamically (track patterns over time)?

Roadmap

• Motivation

• Existing Methods
– SVD
– CUR/CX

• Proposed Methods: Colibri

• Experimental Results

• Conclusion

Matrix & Column Space

• Matrix

$B = [b_1 \; b_2] = \begin{bmatrix} 3 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix}$

• Column Space of a Matrix

$b_1$, $b_2$ are vectors in 3-d space!

[Figure: $b_1$ and $b_2$ drawn in 3-d space.]

Projection, Projection Matrix & Core Matrix

$\tilde{v} = B (B^T B)^{+} B^T v$

• $v$: an arbitrary vector

• $\tilde{v}$: projection of v

• $B (B^T B)^{+} B^T$: projection matrix of B

• $(B^T B)^{+}$: core matrix
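A small numeric check of the projection formula may help. The matrix `B` and the vector `v` below are assumed example values chosen so the result is easy to verify by eye; `np.linalg.pinv` plays the role of the pseudo-inverse $(\cdot)^{+}$.

```python
import numpy as np

# Assumed example: two columns that span the x-y plane of 3-d space.
B = np.array([[3.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
v = np.array([1.0, 2.0, 3.0])  # an arbitrary vector

core = np.linalg.pinv(B.T @ B)   # core matrix (B^T B)^+
P = B @ core @ B.T               # projection matrix of B
v_proj = P @ v                   # projection of v onto the column space of B

print(v_proj)
```

Because the columns of `B` span the x-y plane, projecting `v` simply drops its third coordinate, and applying `P` twice gives the same result as applying it once.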

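Singular-Value-Decomposition (SVD)

$A \approx U \times \Sigma \times V$

• A: n x m, with columns $a_1, a_2, a_3, \dots, a_m$

• U: left singular vectors $u_1, \dots, u_k$

• $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_k)$

• V: right singular vectors $v_1, \dots, v_k$ (drawn as rows)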

SVD: How to

• #1: Find the left matrix U, where

$u_i = \frac{A v_i^T}{\sigma_i} = \frac{a_1 v_{i,1} + a_2 v_{i,2} + \dots + a_m v_{i,m}}{\sigma_i}$

• #2: Project A into the column space of U

$A \approx U (U^T U)^{+} U^T A = \dots = U \Sigma V$

Here $U (U^T U)^{+} U^T$ is the projection matrix of the column space of U.
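The two steps can be checked numerically. This is a sketch under assumptions: the random matrix `A`, its size, and the rank `k` are arbitrary choices, and NumPy's `svd` is used to obtain the singular vectors rather than any method from the talk. `V_rows` holds the right singular vectors as rows.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))   # arbitrary n x m example matrix
k = 2                    # assumed target rank

# NumPy returns the right singular vectors as the rows of the third output.
U_full, sigma, V_rows = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :k]

# Step 1: each left singular vector is a weighted sum of the columns of A.
u_first = A @ V_rows[0] / sigma[0]

# Step 2: project A into the column space of U.
A_proj = U @ np.linalg.pinv(U.T @ U) @ U.T @ A

# The projection coincides with the rank-k truncated SVD.
A_svd = U @ np.diag(sigma[:k]) @ V_rows[:k]
print(np.allclose(A_proj, A_svd))
```

Since the columns of `U` are orthonormal, $(U^T U)^{+}$ is the identity, which is why the projection collapses to $U U^T A$.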

SVD: drawbacks

• Efficiency
– Time: $O(\min(n^2 m, n m^2))$
– Space: (U, V) are dense

• Interpretation

• Dynamic: not easy

[Figure: $A = U \Sigma V$, with the 1st singular vector and 2nd singular vector marked.]

CUR (CX) decomposition

• Sample columns from A to form C

• Project A onto the col. space of C

$A \approx C \times U \times R$, with $U = (C^T C)^{+}$ and $R = C^T A$

[Figure: A (n x m) approximated by the product C x U x R.]
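A minimal sketch of the two CX steps follows. The sampling step is replaced by fixed, hand-picked column indices (`col_idx` is hypothetical) so the example is deterministic; the actual sampling scheme is not shown in this transcript.

```python
import numpy as np


def cx_factors(A, col_idx):
    """Form C from the chosen columns of A and project A onto span(C)."""
    C = A[:, col_idx]
    U = np.linalg.pinv(C.T @ C)   # core matrix (C^T C)^+
    R = C.T @ A
    return C, U, R


rng = np.random.default_rng(0)
A = rng.random((8, 6))            # arbitrary example matrix
col_idx = [0, 2, 5]               # stand-in for "sampled" columns

C, U, R = cx_factors(A, col_idx)
A_hat = C @ U @ R
residual = A - A_hat

print(np.linalg.norm(residual))
```

Two properties follow from `A_hat` being a projection: the sampled columns are reproduced exactly, and the residual is orthogonal to every column of `C`.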

CUR (CX): advantages

• Efficiency (better than SVD)
– Time: $O(c^2 n)$ or $O(c^3 + cm)$ (c is # of sampled col.s)
– Space: (C, R) are sparse

• Interpretation

CUR (CX): drawbacks

• Redundancy in C, wasting both time and space

• Dynamic: not easy

[Figure: sampled columns in C, with 3 copies of green, 2 copies of red, 2 copies of purple, where purple = 0.5*green + red…]
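The redundancy can be made concrete with a toy sampled matrix. The vectors `green` and `red` are invented values; only the multiplicities (3, 2, 2) and the relation purple = 0.5*green + red come from the slide.

```python
import numpy as np

# Hypothetical column vectors standing in for the coloured columns.
green = np.array([2.0, 0.0, 2.0, 0.0])
red = np.array([0.0, 1.0, 0.0, 1.0])
purple = 0.5 * green + red

# Sampled matrix C: 3 copies of green, 2 of red, 2 of purple.
C = np.column_stack([green, green, green, red, red, purple, purple])

n_cols = C.shape[1]
rank = np.linalg.matrix_rank(C)
print(n_cols, rank)
```

Seven columns are stored and processed, yet they span only a 2-dimensional subspace, so five of them add nothing to the projection.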

Roadmap

• Motivation

• Existing Methods

• Colibri
– Colibri-S for static graphs
– Colibri-D for dynamic graphs

• Experimental Results

• Conclusion

Colibri-S: Basic Idea

[Figure: Original Matrix, CUR (CX), and Colibri-S side by side. CUR (CX) samples 3 copies of green, 2 copies of red, 2 copies of purple (purple = 0.5*green + red…); Colibri-S uses L x M x R.]

We want the col.s in L to be linearly independent of each other!

Input: initially sampled matrix C

Output:

• L: linearly ind. col.s

• M = $(L^T L)^{-1}$: core matrix

• R = $L^T \times A$

Q: How to find L & M from C efficiently?

A: Find L & M iteratively!

[Flowchart: start from the initially sampled matrix C and the current L & M. For each col. v in C, project it on L. Redundant? If yes, discard v; if no, expand L & M.]
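The loop in the flowchart can be sketched as follows. This is an illustrative reading, not the authors' code: the function name, the tolerance `tol`, and the toy matrix are assumptions, and the update of M uses the standard block-matrix inverse so that M stays equal to $(L^T L)^{-1}$ without re-inverting from scratch.

```python
import numpy as np


def colibri_s_sketch(C, tol=1e-8):
    """Keep only linearly independent columns of C; return L, M, kept indices."""
    n, c = C.shape
    L = np.empty((n, 0))
    M = np.empty((0, 0))          # invariant: M == inv(L.T @ L)
    kept = []
    for j in range(c):
        v = C[:, j]
        y = M @ (L.T @ v)         # coefficients of v's projection on L
        residual = v - L @ y
        delta = residual @ residual
        if delta <= tol * (v @ v):
            continue              # redundant: discard v
        # Expand M via the block-inverse (Schur complement) update.
        k = L.shape[1]
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        M = M_new
        L = np.column_stack([L, v])
        kept.append(j)
    return L, M, kept


# Toy input (invented values): 3 green, 2 red, 2 purple = 0.5*green + red.
green = np.array([2.0, 0.0, 2.0, 0.0])
red = np.array([0.0, 1.0, 0.0, 1.0])
purple = 0.5 * green + red
C = np.column_stack([green, green, green, red, red, purple, purple])

L, M, kept = colibri_s_sketch(C)
print(kept, L.shape)
```

On the toy input only the first green and the first red column survive, and projecting C onto the two kept columns reproduces all seven original columns.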

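Colibri-S vs. CUR (CX)

• Quality: Colibri-S = CUR (CX)

• Time: Colibri-S >= CUR (CX), i.e., at least as fast

• Space: Colibri-S >= CUR (CX), i.e., at least as compact

• Illustrations

Colibri-S vs. CUR (CX): $O(\tilde{c}^3 + \tilde{c}\tilde{m})$ vs. $O(c^3 + cm)$, where $\tilde{c} \le c$, $\tilde{m} \le m$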

Colibri-D for dynamic graphs

[Figure: initially sampled matrix at time t, with ($L_t$, $M_t$, $R_t$), and at time t+1, with ($L_{t+1}$, $M_{t+1}$, $R_{t+1}$) still to be found.]

Q: How to update L and M efficiently?

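Colibri-D: How-To

[Figure: initially sampled matrix at time t, with cols marked Selected or Redundant and ($L_t$, $M_t$, $R_t$); and at time t+1, with cols marked Selected, Redundant, or Changed from t, and ($L_{t+1}$, $M_{t+1}$, $R_{t+1}$) still to be found.]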

Colibri-D: How-To

[Figure: initially sampled matrix at time t and at time t+1, with cols marked Selected or Redundant, ($L_t$, $M_t$) and ($L_{t+1}$, $M_{t+1}$). Label: Unchanged cols!]

$\tilde{L}$: subspace spanned by the blue cols at t+1
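One way to read the two How-To slides is as a warm start: previously selected columns that did not change seed the new basis, and only the remaining columns are tested. The sketch below follows that reading under loud assumptions: the function names and toy data are hypothetical, the seed's core matrix is obtained by direct inversion for simplicity, and every non-seed column is conservatively re-tested, which may differ from the actual Colibri-D update.

```python
import numpy as np


def expand(C, candidates, L, M, kept, tol=1e-8):
    """Add columns C[:, j] (j in candidates) that are not in span(L)."""
    kept = list(kept)
    for j in candidates:
        v = C[:, j]
        y = M @ (L.T @ v)
        residual = v - L @ y
        delta = residual @ residual
        if delta <= tol * (v @ v):
            continue                       # redundant column
        k = L.shape[1]
        M_new = np.empty((k + 1, k + 1))   # block-inverse update of M
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        M, L = M_new, np.column_stack([L, v])
        kept.append(j)
    return L, M, kept


def colibri_d_sketch(C_next, kept_prev, changed, tol=1e-8):
    """Warm-start the selection at t+1 from unchanged selected columns."""
    changed = set(changed)
    seed = [j for j in kept_prev if j not in changed]
    L = C_next[:, seed]
    # Simplification: invert directly instead of deriving from M_t.
    M = np.linalg.inv(L.T @ L) if seed else np.empty((0, 0))
    rest = [j for j in range(C_next.shape[1]) if j not in seed]
    return expand(C_next, rest, L, M, seed, tol)


# Toy data (invented): at time t, columns 0 and 3 were selected.
green = np.array([2.0, 0.0, 2.0, 0.0])
red = np.array([0.0, 1.0, 0.0, 1.0])
purple = 0.5 * green + red
C_next = np.column_stack([green, green, green, red, red, purple, purple])
C_next[:, 3] = [0.0, 1.0, 0.0, 2.0]        # column 3 changed at t+1

L, M, kept = colibri_d_sketch(C_next, kept_prev=[0, 3], changed=[3])
print(kept)
```

In this toy run the unchanged green column is reused as is, the changed column is re-admitted, and the old red copy at index 4 is promoted because the changed column no longer covers it.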


Roadmap

• Motivation

• Existing Methods

• Colibri

• Experimental Results

• Conclusion

Experimental Setup

• Datasets
– Network traffic
– 21,837 sources/destinations
– 1,222 consecutive hours
– 22,800 edges per hour

• Accuracy: Accu =

• Space Cost:

Performance of Colibri-S

[Figure: two panels, Time and Space, comparing CUR, CMD, and Ours.]

• Accuracy: same, 91%+

• Time: 12x faster than CMD, 28x faster than CUR

• Space: ~1/3 of CMD, ~10% of CUR

More Evaluation on Colibri-S

[Figure: Log Time (Sec) and Approximation Accuracy for CUR, CMD, and Colibri-S.]

Performance of Colibri-D

[Figure: Time vs. # of changed cols for CMD, Colibri-S, and Colibri-D.]

Colibri-D achieves up to 112x speedups

A Family of Low-Rank Approximations for Fast Graph Mining

• Colibri-S
– For static graphs
– Removes redundancy
– Significant savings in time & space for “free”

• Colibri-D
– For dynamic graphs
– Explores “smoothness”
– Up to 112x faster than the best known methods


Poster tonight!

Thank you!

www.cs.cmu.edu/~htong