SCS CMU
Joint Work by
Hanghang Tong, Spiros Papadimitriou, Jimeng Sun,
Philip S. Yu, Christos Faloutsos
Speaker: Hanghang Tong
Aug. 24-27, 2008, Las Vegas KDD 2008
Colibri: Fast Mining of Large Static and Dynamic Graphs
Motivation
• Q: How to find patterns?
– e.g., community, anomaly, etc.
• A: Low-Rank Approximation (LRA) for Adjacency Matrix of the Graph.
$A \approx L \times M \times R$
LRA for Graph Mining: Example
[Figure: author-conference graph. Authors: John, Tom, Bob, Carl, Van, Roy. Conferences: KDD, RECOMB, ISMB, ICDM.]
Adj. matrix: $A \approx L \times M \times R$
• L: author clusters
• M: interaction
• R: conference clusters
Recon. error is high → ‘Carl’ is abnormal
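A minimal sketch of the "high reconstruction error means abnormal" idea, under stated assumptions: the 0/1 adjacency values below are made up for illustration (the slide does not give them), and a rank-2 truncated SVD stands in for the generic low-rank approximation.

```python
import numpy as np

authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
confs = ["KDD", "ICDM", "RECOMB", "ISMB"]

# Hypothetical author-by-conference adjacency matrix (assumed values).
A = np.array([
    [1, 1, 0, 0],  # John
    [1, 1, 0, 0],  # Tom
    [1, 1, 0, 0],  # Bob
    [1, 0, 0, 1],  # Carl: straddles both groups
    [0, 0, 1, 1],  # Van
    [0, 0, 1, 1],  # Roy
], dtype=float)

def row_reconstruction_errors(A, k):
    """Squared reconstruction error of each row under a rank-k approximation."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]      # rank-k approximation of A
    return np.sum((A - A_k) ** 2, axis=1)  # one error value per author

errors = row_reconstruction_errors(A, k=2)
most_abnormal = authors[int(np.argmax(errors))]
```

With these assumed values, the row that fits the two-cluster structure worst is flagged as the anomaly.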
Challenges
• How to get (L, M, R)
+ Efficiently (both time and space);
+ Intuitively (easy for interpretation);
+ Dynamically (track patterns over time)?
Roadmap
• Motivation
• Existing Methods
– SVD
– CUR/CX
• Proposed Methods: Colibri
• Experimental Results
• Conclusion
Matrix & Column Space
• Matrix
$B = \begin{bmatrix} 3 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix} = [\, b_1 \;\; b_2 \,]$
• Column Space of a Matrix
b1, b2 are vectors in 3-d space!
[Figure: vectors b1 and b2]
Projection, Projection Matrix & Core Matrix
$\tilde{v} = B \, (B^T B)^{+} \, B^T \, v$
• $\tilde{v}$: projection of v
• $B (B^T B)^{+} B^T$: projection matrix of B
• $(B^T B)^{+}$: core matrix
• $v$: an arbitrary vector
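A small sketch of the projection formula, assuming NumPy and an illustrative 3 x 2 matrix B whose columns lie in the x-y plane (the values of B and v are assumptions chosen for this example).

```python
import numpy as np

def project_onto_column_space(B, v):
    """Return the projection of v onto the column space of B, and the projector."""
    core = np.linalg.pinv(B.T @ B)  # core matrix (B^T B)^+
    P = B @ core @ B.T              # projection matrix of B
    return P @ v, P

# Illustrative inputs (assumed values).
B = np.array([[3.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
v = np.array([1.0, 2.0, 3.0])

v_tilde, P = project_onto_column_space(B, v)
```

Because both columns of B have a zero third coordinate, the projection keeps the first two coordinates of v and drops the third.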
Singular-Value-Decomposition (SVD)
$A \approx U \, \Sigma \, V^T$
• A: n x m, with columns a1, a2, a3, ..., am
• U: left singular vectors u1, ..., uk
• Σ: diagonal matrix of singular values σ1, ..., σk
• V: right singular vectors v1, ..., vk
SVD: How to
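• #1: Find the left matrix U, where
$u_i = \frac{A v_i}{\sigma_i} = \frac{a_1 v_{i,1} + a_2 v_{i,2} + \dots + a_m v_{i,m}}{\sigma_i}$
• #2: Project A into the column space of U
$A \approx U \, (U^T U)^{+} \, U^T A = \dots = U \Sigma V^T$
where $U (U^T U)^{+} U^T$ is the projection matrix of the column space of U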
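The two steps can be checked numerically. This is a sketch, assuming NumPy, a random 6 x 4 test matrix, and rank k = 2 (all three are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))  # assumed test matrix, n = 6, m = 4
k = 2                   # assumed target rank

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k].T

# Step 1: each left singular vector is u_i = A v_i / sigma_i.
U_from_V = (A @ V_k) / s_k

# Step 2: project A onto the column space of U_k.
P = U_k @ np.linalg.pinv(U_k.T @ U_k) @ U_k.T
A_projected = P @ A

# The projection coincides with the rank-k SVD reconstruction.
A_rank_k = (U_k * s_k) @ V_k.T
```

Since the columns of U_k are orthonormal, the core matrix here is the identity and the projector reduces to U_k U_k^T.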
SVD: drawbacks
• Efficiency
– Time: $O(\min(n^2 m, n m^2))$
– Space: (U, V) are dense
• Interpretation
• Dynamic: not easy
[Figure: $A = U \Sigma V^T$, with the 1st and 2nd singular vectors marked]
CUR (CX) decomposition
$A \approx C \, U \, R$, with $U = (C^T C)^{+}$ and $R = C^T A$
• A: n x m
• Sample columns from A to form C
• Project A onto the col. space of C
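A sketch of the sample-then-project recipe, assuming NumPy. Uniform sampling with replacement is a simplification assumed here (the slides do not specify the sampling distribution); the function name and the cutoff `rcond=1e-10` are also assumptions.

```python
import numpy as np

def cx_approximation(A, c, rng):
    """Sample c columns of A (with replacement) and project A onto their span."""
    m = A.shape[1]
    cols = rng.choice(m, size=c, replace=True)  # duplicates are possible
    C = A[:, cols]                              # sampled columns
    U = np.linalg.pinv(C.T @ C, rcond=1e-10)    # (C^T C)^+, tolerant of duplicates
    R = C.T @ A
    return C, U, R, cols

rng = np.random.default_rng(0)
A = rng.random((8, 3)) @ rng.random((3, 10))    # assumed rank-3 test matrix
C, U, R, cols = cx_approximation(A, c=6, rng=rng)
A_approx = C @ U @ R
```

Because sampling is with replacement, C can contain repeated columns, which is the redundancy that the later slides set out to remove.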
CUR (CX): advantages
• Efficiency (better than SVD)
– Time: $O(c^2 n)$ or $O(c^3 + cm)$ (c is # of sampled col.s)
– Space: (C, R) are sparse
• Interpretation
CUR (CX): drawbacks
• Redundancy in C, wasting both time and space
• Dynamic: not easy
[Figure: columns sampled into C]
• 3 copies of green
• 2 copies of red
• 2 copies of purple
• purple = 0.5*green + red …
Roadmap
• Motivation
• Existing Methods
• Colibri
– Colibri-S for static graphs
– Colibri-D for dynamic graphs
• Experimental Results
• Conclusion
Colibri-S: Basic Idea
[Figure: original matrix; CUR (CX) sample; Colibri-S decomposition L x M x R]
• 3 copies of green
• 2 copies of red
• 2 copies of purple
• purple = 0.5*green + red …
We want the col.s in L to be linearly independent of each other!
• Input: initially sampled matrix C
• Output:
– L: linearly independent col.s
– M = $(L^T L)^{-1}$: core matrix
– R = $L^T \times A$
Q: How to find L & M from C efficiently?
A: Find L & M iteratively!
• Start from the initially sampled matrix C and the current L & M
• For each col. v in C: project it on L
– Redundant? Discard v
– Otherwise: expand L & M
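The loop above can be sketched as follows. This is a sketch under assumptions, not the authors' implementation: it uses dense NumPy arrays, a hypothetical relative tolerance `eps` for the redundancy test, and the standard block-inverse (Schur complement) identity to expand M, since the slides do not spell out the update.

```python
import numpy as np

def colibri_s_sketch(A, C, eps=1e-10):
    """Build L (independent columns of C), M = (L^T L)^{-1}, and R = L^T A."""
    n = C.shape[0]
    L = np.empty((n, 0))
    M = np.empty((0, 0))
    for j in range(C.shape[1]):
        v = C[:, j]
        norm2 = v @ v
        if norm2 == 0.0:
            continue                      # a zero column adds nothing
        y = M @ (L.T @ v)                 # coefficients of v in the span of L
        residual = v - L @ y              # part of v outside the span of L
        delta = residual @ residual
        if delta <= eps * norm2:
            continue                      # redundant: discard v
        # Expand M with the block-inverse identity (delta is the Schur complement).
        k = M.shape[0]
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        M = M_new
        L = np.column_stack([L, v])       # expand L
    R = L.T @ A
    return L, M, R

# Assumed example mirroring the colored columns: column 2 is 0.5*col0 + col1.
rng = np.random.default_rng(0)
A = rng.random((5, 8))
A[:, 2] = 0.5 * A[:, 0] + A[:, 1]
C = A[:, [0, 0, 0, 1, 1, 2, 2]]           # 3 green, 2 red, 2 purple
L, M, R = colibri_s_sketch(A, C)
```

In this example only two of the seven sampled columns survive in L, while the product L M R gives the same approximation as projecting A onto the column space of the full C.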
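Colibri-S vs. CUR (CX)
• Quality: Colibri-S = CUR (CX)
• Time: Colibri-S >= CUR (CX) (at least as good)
– $O(\tilde{c}^3 + \tilde{c}\tilde{m})$ for Colibri-S vs. $O(c^3 + cm)$ for CUR (CX), where $\tilde{c} \le c$, $\tilde{m} \le m$
• Space: Colibri-S >= CUR (CX) (at least as good)
• Illustrations
[Figure: Colibri-S vs. CUR (CX)]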
Colibri-D for dynamic graphs
[Figure: initially sampled matrix at time t, with $L_t$, $M_t$, $R_t$, and at time t+1, with $L_{t+1}$, $M_{t+1}$, $R_{t+1}$ to be found]
Q: How to update L and M efficiently?
Colibri-D: How-To
[Figure: initially sampled matrix at times t and t+1, with columns marked Selected or Redundant at each time and some columns changed from t. Given $L_t$, $M_t$, $R_t$, find $L_{t+1}$, $M_{t+1}$, $R_{t+1}$.]
Colibri-D: How-To
[Figure: initially sampled matrix at times t and t+1, with columns marked Selected or Redundant; $L_t$, $M_t$ and $L_{t+1}$, $M_{t+1}$]
• Unchanged cols!
• $\tilde{L}$: subspace spanned by the blue cols at t+1
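One way the unchanged columns could be reused is sketched below, assuming NumPy. The slides do not give the update rule, so this only illustrates a standard identity: if M = (L^T L)^{-1} is known, the core matrix for a subset of L's columns follows from M by a Schur complement, without touching L. The function name `downdate_core` is hypothetical.

```python
import numpy as np

def downdate_core(M, keep):
    """Given M = (L^T L)^{-1}, return (L_keep^T L_keep)^{-1} for columns `keep`."""
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(M.shape[0]), keep)
    M11 = M[np.ix_(keep, keep)]
    if drop.size == 0:
        return M11
    M12 = M[np.ix_(keep, drop)]
    M22 = M[np.ix_(drop, drop)]
    # Block-inverse identity: (L1^T L1)^{-1} = M11 - M12 M22^{-1} M12^T
    return M11 - M12 @ np.linalg.solve(M22, M12.T)

# Assumed example: columns 0 and 2 of L_t are unchanged at t+1.
rng = np.random.default_rng(1)
L_t = rng.random((6, 4))
M_t = np.linalg.inv(L_t.T @ L_t)
unchanged = [0, 2]
M_unchanged = downdate_core(M_t, unchanged)
```

Starting from this smaller core matrix, the changed columns could then be tested and added one at a time, as in the static case.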
Experimental Setup
• Datasets
– Network traffic
– 21,837 sources/destinations
– 1,222 consecutive hours
– 22,800 edges per hour
• Accuracy: Accu =
• Space Cost:
Performance of Colibri-S
[Figure: Time and Space panels comparing CUR, CMD, and Ours]
• Accuracy
– Same: 91%+
• Time
– 12x faster than CMD
– 28x faster than CUR
• Space
– ~1/3 of CMD
– ~10% of CUR
Performance of Colibri-D
[Figure: time vs. # of changed cols for CMD, Colibri-S, and Colibri-D]
Colibri-D achieves up to 112x speedups
A Family of Low-Rank Approximations for Fast Graph Mining
• Colibri-S
– For static graphs
– Remove redundancy
– Significant savings in time & space for “free”
• Colibri-D
– For dynamic graphs
– Explores “smoothness”
– Up to 112x faster than best known methods