Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California 1
A Tutorial on Spectral Clustering
Chris Ding, Computational Research Division
Lawrence Berkeley National Laboratory, University of California
Supported by Office of Science, U.S. Dept. of Energy
Some historical notes
• Fiedler, 1973, 1975: graph Laplacian matrix
• Donath & Hoffman, 1973: bounds
• Pothen, Simon & Liou, 1990: spectral graph partitioning (many related papers thereafter)
• Hagen & Kahng, 1992: Ratio-cut
• Chan, Schlag & Zien, 1994: multi-way Ratio-cut
• Chung, 1997: Spectral Graph Theory book
• Shi & Malik, 2000: Normalized Cut
Spectral Gold-Rush of 2001
Nine papers on spectral clustering:
• Meila & Shi, AI-Stat 2001: random-walk interpretation of Normalized Cut
• Ding, He & Zha, KDD 2001: perturbation analysis of the Laplacian matrix on sparsely connected graphs
• Ng, Jordan & Weiss, NIPS 2001: K-means algorithm on the embedded eigenspace
• Belkin & Niyogi, NIPS 2001: spectral embedding
• Dhillon, KDD 2001: bipartite graph clustering
• Zha et al, CIKM 2001: bipartite graph clustering
• Zha et al, NIPS 2001: spectral relaxation of K-means
• Ding et al, ICDM 2001: MinMaxCut, uniqueness of the relaxation
• Gu et al, 2001: K-way relaxation of NormCut and MinMaxCut
Part I: Basic Theory, 1973 – 2001
Spectral Graph Partitioning

MinCut: minimize cutsize
cutsize = number of cut edges
Constraint on sizes: |A| = |B|
2-way Spectral Graph Partitioning

Partition membership indicator:
    q_i = +1 if i ∈ A;  q_i = −1 if i ∈ B

J(q) = CutSize = (1/4) Σ_{i,j} w_ij (q_i − q_j)²
     = (1/4) Σ_{i,j} w_ij (q_i² + q_j² − 2 q_i q_j)
     = (1/2) Σ_{i,j} q_i [d_i δ_ij − w_ij] q_j
     = (1/2) q^T (D − W) q

Relaxing the indicators q_i from discrete to continuous values, the solution for min J(q) is given by the eigenvectors of
    (D − W) q = λ q
(Fiedler, 1973, 1975; Pothen, Simon & Liou, 1990)
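As a minimal numerical sketch of this relaxation (not from the tutorial; the example graph and variable names are mine), one can build L = D − W for a small graph and read the 2-way partition off the sign of the second eigenvector:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by one weak edge (2,3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

D = np.diag(W.sum(axis=1))
L = D - W                      # graph Laplacian

# eigh returns eigenvalues in ascending order for symmetric matrices
evals, evecs = np.linalg.eigh(L)
q2 = evecs[:, 1]               # Fiedler vector (second eigenvector)

A = np.where(q2 < 0)[0]
B = np.where(q2 >= 0)[0]
```

The smallest eigenvalue comes out (numerically) zero with a constant eigenvector, and the sign of q2 separates the two triangles.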
Properties of the Graph Laplacian

Laplacian matrix of the graph: L = D − W

• L is positive semi-definite: x^T L x ≥ 0 for any x.
• The first eigenvector is q_1 = (1, …, 1)^T = e, with λ_1 = 0.
• The second eigenvector q_2 is the desired solution.
• The smaller λ_2, the better the quality of the partitioning. Perturbation analysis gives
    λ_2 = cutsize/|A| + cutsize/|B|
• Higher eigenvectors are also useful.
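The semi-definiteness follows from the identity x^T L x = (1/2) Σ_{i,j} w_ij (x_i − x_j)², which is easy to confirm numerically (a quick sanity check I added; the random graph is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2              # symmetric, non-negative weights
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))
L = D - W

x = rng.standard_normal(n)
quad = x @ L @ x               # quadratic form x^T L x
pairwise = 0.5 * sum(W[i, j] * (x[i] - x[j]) ** 2
                     for i in range(n) for j in range(n))

e = np.ones(n)                 # constant vector: L e = 0
```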
Recovering Partitions

From the definition of the cluster indicators, partitions A, B are determined by:
    A = {i | q_2(i) < 0},  B = {i | q_2(i) ≥ 0}

However, the objective function J(q) is insensitive to an additive constant c:
    J = CutSize = (1/4) Σ_{i,j} w_ij [(q_i + c) − (q_j + c)]²

Thus, we sort q_2 in increasing order and cut at the middle point.
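Because of this shift invariance, the zero threshold need not be the best cut point; a common variant (sketched here with illustrative names, not the tutorial's exact procedure) sorts q_2 and evaluates the cut weight at every split position:

```python
import numpy as np

def best_split(W, q2):
    """Sort vertices by q2 and return the split with the smallest cut weight."""
    order = np.argsort(q2)
    best = None
    for m in range(1, len(order)):          # split after sorted position m-1
        A, B = order[:m], order[m:]
        cut = W[np.ix_(A, B)].sum()
        if best is None or cut < best[0]:
            best = (cut, set(A.tolist()), set(B.tolist()))
    return best

# Two cliques {0,1,2}, {3,4,5} with a weak bridge of weight 0.1.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

L = np.diag(W.sum(axis=1)) - W
q2 = np.linalg.eigh(L)[1][:, 1]
cut, A, B = best_split(W, q2)
```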
Multi-way Graph Partitioning

• Recursively apply the 2-way partitioning
  – Recursive 2-way partitioning
  – Use Kernighan-Lin to do local refinements
• Use higher eigenvectors
  – Use q_3 to further partition the clusters obtained via q_2
• Popular graph partitioning packages
  – Metis, Univ of Minnesota
  – Chaco, Sandia Nat'l Lab
2-way Spectral Clustering

• Undirected graphs (pairwise similarities)
• Bipartite graphs (contingency tables)
• Directed graphs (web graphs)
Spectral Clustering

min cutsize, without explicit size constraints
Need to balance sizes
But where to cut?
Clustering Objective Functions

Between-cluster similarity: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij;  cluster degree: d_A = Σ_{i∈A} d_i

• Ratio Cut
    J_Rcut(A,B) = s(A,B)/|A| + s(A,B)/|B|

• Normalized Cut
    J_Ncut(A,B) = s(A,B)/d_A + s(A,B)/d_B
                = s(A,B)/[s(A,A) + s(A,B)] + s(A,B)/[s(B,B) + s(A,B)]

• Min-Max-Cut
    J_MMC(A,B) = s(A,B)/s(A,A) + s(A,B)/s(B,B)
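The three objectives can be computed directly from a weight matrix; a small sketch with a hand-checkable example (the function name and graph are mine, not the tutorial's):

```python
import numpy as np

def cut_objectives(W, A, B):
    """Return (J_Rcut, J_Ncut, J_MMC) for a 2-way partition A, B."""
    A, B = list(A), list(B)
    sAB = W[np.ix_(A, B)].sum()            # s(A,B)
    sAA = W[np.ix_(A, A)].sum()            # s(A,A), summed over ordered pairs
    sBB = W[np.ix_(B, B)].sum()
    d = W.sum(axis=1)
    dA, dB = d[A].sum(), d[B].sum()        # note dA = sAA + sAB
    return (sAB / len(A) + sAB / len(B),
            sAB / dA + sAB / dB,
            sAB / sAA + sAB / sBB)

# w01 = w23 = 1 inside the clusters, w12 = 0.5 between them.
W = np.zeros((4, 4))
W[0, 1] = W[1, 0] = 1.0
W[2, 3] = W[3, 2] = 1.0
W[1, 2] = W[2, 1] = 0.5
jr, jn, jm = cut_objectives(W, [0, 1], [2, 3])
```

Here s(A,B) = 0.5, s(A,A) = s(B,B) = 2, d_A = d_B = 2.5, so J_Rcut = 0.5, J_Ncut = 0.4, J_MMC = 0.5.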
Ratio Cut (Hagen & Kahng, 1992)

Min similarity between A and B: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
Size balance (Wei & Cheng, 1989):
    J_Rcut(A,B) = s(A,B)/|A| + s(A,B)/|B|

Cluster membership indicator (n_1 = |A|, n_2 = |B|, n = n_1 + n_2):
    q_i = +sqrt(n_2/(n n_1)) if i ∈ A;  q_i = −sqrt(n_1/(n n_2)) if i ∈ B

Normalization: q^T q = 1, q^T e = 0

Substituting q leads to J_Rcut(q) = q^T (D − W) q.
Now relax q; the solution is the 2nd eigenvector of L = D − W.
Normalized Cut (Shi & Malik, 1997)

Min similarity between A and B: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
Balance weights: d_A = Σ_{i∈A} d_i,  d = Σ_{i∈G} d_i
    J_Ncut(A,B) = s(A,B)/d_A + s(A,B)/d_B

Cluster indicator:
    q_i = +sqrt(d_B/(d d_A)) if i ∈ A;  q_i = −sqrt(d_A/(d d_B)) if i ∈ B

Normalization: q^T D q = 1, q^T D e = 0

Substituting q leads to J_Ncut(q) = q^T (D − W) q. Minimize
    q^T (D − W) q + λ (q^T D q − 1)
The solution is an eigenvector of (D − W) q = λ D q.
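The generalized problem (D − W) q = λ D q can be reduced to a standard symmetric eigenproblem via L_sym = D^{−1/2} (D − W) D^{−1/2}; a sketch of that reduction (assuming all degrees are positive; the example graph is mine):

```python
import numpy as np

W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

d = W.sum(axis=1)
D = np.diag(d)
Dm12 = np.diag(1.0 / np.sqrt(d))          # D^{-1/2}; requires all degrees > 0
Lsym = Dm12 @ (D - W) @ Dm12              # normalized Laplacian

lam, V = np.linalg.eigh(Lsym)
q = Dm12 @ V[:, 1]                        # generalized eigenvector for lambda_2
neg = set(np.where(q < 0)[0].tolist())
```

If L_sym v = λ v, then q = D^{−1/2} v satisfies (D − W) q = λ D q, so the sign of q still recovers the two clusters.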
MinMaxCut (Ding et al, 2001)

Min similarity between A and B: s(A,B) = Σ_{i∈A} Σ_{j∈B} w_ij
Max similarity within A and B: s(A,A) = Σ_{i∈A} Σ_{j∈A} w_ij
    J_MMC(A,B) = s(A,B)/s(A,A) + s(A,B)/s(B,B)

Cluster indicator (same as Normalized Cut):
    q_i = +sqrt(d_B/(d d_A)) if i ∈ A;  q_i = −sqrt(d_A/(d d_B)) if i ∈ B

Substituting, with J_m(q) = q^T W q / q^T D q:
    J_MMC(q) = (1 + d_A/d_B)/(J_m + d_A/d_B) + (1 + d_B/d_A)/(J_m + d_B/d_A) − 2

Because dJ_MMC/dJ_m < 0, min J_MMC ⇔ max J_m(q). The maximizer of J_m satisfies
    W q = ξ D q  ⇔  (D − W) q = λ D q
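The substitution formula can be verified numerically: for any symmetric W and the degree-weighted indicator q, J_MMC computed from its definition agrees with the expression in J_m (a sanity check I added, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

A, B = list(range(4)), list(range(4, n))
d = W.sum(axis=1)
dA, dB = d[A].sum(), d[B].sum()
dtot = dA + dB
sAB = W[np.ix_(A, B)].sum()
sAA = W[np.ix_(A, A)].sum()
sBB = W[np.ix_(B, B)].sum()
J_direct = sAB / sAA + sAB / sBB           # definition of J_MMC

# Degree-weighted indicator and J_m = q^T W q / q^T D q
q = np.empty(n)
q[A] = np.sqrt(dB / (dtot * dA))
q[B] = -np.sqrt(dA / (dtot * dB))
Jm = (q @ W @ q) / (q @ (d * q))
J_formula = ((1 + dA / dB) / (Jm + dA / dB)
             + (1 + dB / dA) / (Jm + dB / dA) - 2)
```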
A simple example: 2 dense clusters, with sparse connections between them.
[Figure: adjacency matrix and eigenvector q_2]
Comparison of Clustering Objectives

• If clusters are well separated, all three objectives give very similar and accurate results.
• When clusters are marginally separated, NormCut and MinMaxCut give better results.
• When clusters overlap significantly, MinMaxCut tends to give more compact and balanced clusters.

    J_Ncut(A,B) = s(A,B)/[s(A,A) + s(A,B)] + s(A,B)/[s(B,B) + s(A,B)]

Cluster compactness ⇒ max s(A,A)
2-way Clustering of Newsgroups

Newsgroups                        RatioCut     NormCut      MinMaxCut
Atheism / Comp.graphics           63.2 ± 16.2  97.2 ± 0.8   97.2 ± 1.1
Baseball / Hockey                 54.9 ± 2.5   74.4 ± 20.4  79.5 ± 11.0
Politics.mideast / Politics.misc  53.6 ± 3.1   57.5 ± 0.9   83.6 ± 2.5
Cluster Balance Analysis I: Random Graph Model

• Random graph: edges are randomly assigned with probability p, 0 ≤ p ≤ 1.
• RatioCut and NormCut show no size dependence:
    J_Rcut(A,B) = p|A||B|/|A| + p|A||B|/|B| = p|B| + p|A| = pn = constant
    J_Ncut(A,B) = p|A||B|/[p|A|(n−1)] + p|A||B|/[p|B|(n−1)] = n/(n−1) = constant
• MinMaxCut favors balanced clusters, |A| = |B|:
    J_MMC(A,B) = p|A||B|/[p|A|(|A|−1)] + p|A||B|/[p|B|(|B|−1)] = |B|/(|A|−1) + |A|/(|B|−1)
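Using the expected adjacency matrix W = p(11^T − I), these constants can be checked for every split size (the loop and names are mine; a verification sketch, not part of the tutorial):

```python
import numpy as np

n, p = 12, 0.3
W = p * (np.ones((n, n)) - np.eye(n))      # expected adjacency of a random graph

def objectives(W, A, B):
    sAB = W[np.ix_(A, B)].sum()
    sAA = W[np.ix_(A, A)].sum()
    sBB = W[np.ix_(B, B)].sum()
    d = W.sum(axis=1)
    dA, dB = d[A].sum(), d[B].sum()
    return (sAB / len(A) + sAB / len(B),
            sAB / dA + sAB / dB,
            sAB / sAA + sAB / sBB)

results = {}
for m in range(2, n - 1):                  # |A| = m, |B| = n - m
    A, B = list(range(m)), list(range(m, n))
    results[m] = objectives(W, A, B)

rcut = [r[0] for r in results.values()]
ncut = [r[1] for r in results.values()]
mmc = {m: r[2] for m, r in results.items()}
```

J_Rcut and J_Ncut come out identical for every split, while J_MMC is minimized at the balanced split |A| = |B| = n/2.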
2-way Clustering of Newsgroups
[Figure: eigenvector, J_Ncut(i), J_MMC(i), and cluster balance]
Cluster Balance Analysis II: Large Overlap Case

Define the overlap fraction
    f = s(A,B) / [(1/2)(s(A,A) + s(B,B))] > 0.5

Conditions for skewed cuts:
    NormCut:   s(A,A) ≥ (1/2 − 1/(2f)) s(A,B)
    MinMaxCut: s(A,A) ≥ 2f s(A,B)

Thus MinMaxCut is much less prone to skewed cuts.
Spectral Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns of a contingency table (adjacency matrix B).

Examples of bipartite graphs:
• Information retrieval: word-by-document matrix
• Market basket data: transaction-by-item matrix
• DNA gene expression profiles
• Protein vs protein-complex
Spectral Clustering of Bipartite Graphs

Simultaneous clustering of rows and columns (adjacency matrix B):
• min between-cluster (cut) sums of edge weights: s(R_1,C_2), s(R_2,C_1)
• max within-cluster sums of edge weights: s(R_1,C_1), s(R_2,C_2)

where s_{R_1,C_2}(B) = Σ_{r_i∈R_1} Σ_{c_j∈C_2} b_ij, etc. The objective is (Ding, AI-STAT 2003):

    J_MMC(C_1,C_2; R_1,R_2) = [s_{R_1,C_2}(B) + s_{R_2,C_1}(B)] / [2 s_{R_1,C_1}(B)]
                            + [s_{R_1,C_2}(B) + s_{R_2,C_1}(B)] / [2 s_{R_2,C_2}(B)]
Bipartite Graph Clustering

Clustering indicators for rows and columns:
    f_i = +1 if r_i ∈ R_1;  f_i = −1 if r_i ∈ R_2
    g_i = +1 if c_i ∈ C_1;  g_i = −1 if c_i ∈ C_2

    B = [ B_{R1,C1}  B_{R1,C2} ]     W = [ 0    B ]     q = [ f ]
        [ B_{R2,C1}  B_{R2,C2} ]         [ B^T  0 ]         [ g ]

Substituting, we obtain
    J_MMC(C_1,C_2; R_1,R_2) = s(W_12)/s(W_11) + s(W_12)/s(W_22)

f, g are determined by
    [ 0    B ] [ f ]  =  λ [ D_r  0  ] [ f ]
    [ B^T  0 ] [ g ]       [ 0   D_c ] [ g ]
Clustering of Bipartite Graphs

Let
    B~ = D_r^{−1/2} B D_c^{−1/2},   q = D^{−1/2} z,   z = (u; v),   u = D_r^{1/2} f,   v = D_c^{1/2} g

We obtain
    [ 0     B~ ] [ u ]  =  λ [ u ]
    [ B~^T  0  ] [ v ]       [ v ]

The solution is the SVD:
    B~ = Σ_{k=1}^m λ_k u_k v_k^T
(Zha et al, 2001; Dhillon, 2001)
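A sketch of this SVD-based co-clustering on a tiny contingency table (the block structure of the table and the thresholding at zero are illustrative assumptions of mine):

```python
import numpy as np

# Contingency table with two row groups and two column groups.
B = np.array([[5.0, 4.0, 0.2, 0.1],
              [4.0, 5.0, 0.1, 0.2],
              [0.2, 0.1, 5.0, 4.0],
              [0.1, 0.2, 4.0, 5.0]])

Dr = B.sum(axis=1)                         # row degrees
Dc = B.sum(axis=0)                         # column degrees
Bt = B / np.sqrt(Dr)[:, None] / np.sqrt(Dc)[None, :]   # D_r^{-1/2} B D_c^{-1/2}

U, s, Vt = np.linalg.svd(Bt)               # singular values in descending order
f2 = U[:, 1] / np.sqrt(Dr)                 # row indicator from 2nd left singular vector
g2 = Vt[1, :] / np.sqrt(Dc)                # column indicator from 2nd right singular vector

rows = set(np.where(f2 < 0)[0].tolist())
cols = set(np.where(g2 < 0)[0].tolist())
```

The leading singular value of B~ is 1 (with trivial singular vectors), and the signs of the second pair co-cluster rows and columns.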
Clustering of Bipartite Graphs

Recovering row clusters:
    R_1 = {r_i | f_2(i) < z_r},  R_2 = {r_i | f_2(i) ≥ z_r}
Recovering column clusters:
    C_1 = {c_i | g_2(i) < z_c},  C_2 = {c_i | g_2(i) ≥ z_c}

z_r = z_c = 0 are the natural dividing points, but the relaxation is invariant up to a constant shift.
Algorithm: search for optimal points i_cut, j_cut, and let z_r = f_2(i_cut), z_c = g_2(j_cut), such that J_MMC(C_1,C_2; R_1,R_2) is minimized. (Zha et al, 2001)
Clustering of Directed Graphs

Min directed edge weights between A and B: s(A,B) = Σ_{i∈A} Σ_{j∈B} (w_ij + w_ji)
Max directed edge weights within A and B: s(A,A) = Σ_{i∈A} Σ_{j∈A} (w_ij + w_ji)

• Equivalent to dealing with W~ = W + W^T
• All spectral methods apply to W~
• For example, web graphs are clustered in this way

(He, Ding, Zha, Simon, ICDM 2001)
K-way Spectral Clustering (K ≥ 2)
K-way Clustering Objectives

• Ratio Cut
    J_Rcut(C_1,…,C_K) = Σ_{k<l} [ s(C_k,C_l)/|C_k| + s(C_k,C_l)/|C_l| ] = Σ_k s(C_k, G−C_k)/|C_k|

• Normalized Cut
    J_Ncut(C_1,…,C_K) = Σ_{k<l} [ s(C_k,C_l)/d_k + s(C_k,C_l)/d_l ] = Σ_k s(C_k, G−C_k)/d_k

• Min-Max-Cut
    J_MMC(C_1,…,C_K) = Σ_{k<l} [ s(C_k,C_l)/s(C_k,C_k) + s(C_k,C_l)/s(C_l,C_l) ] = Σ_k s(C_k, G−C_k)/s(C_k,C_k)
K-way Spectral Relaxation

• Prove that the solution lies in the subspace spanned by the first k eigenvectors
• Ratio Cut
• Normalized Cut
• Min-Max-Cut
K-way Spectral Relaxation

Unsigned cluster indicators:
    h_1 = (1…1, 0…0, …, 0…0)^T
    h_2 = (0…0, 1…1, …, 0…0)^T
    ⋮
    h_k = (0…0, 0…0, …, 1…1)^T

Re-write:
    J_Rcut(h_1,…,h_k) = h_1^T(D−W)h_1 / (h_1^T h_1) + … + h_k^T(D−W)h_k / (h_k^T h_k)
    J_Ncut(h_1,…,h_k) = h_1^T(D−W)h_1 / (h_1^T D h_1) + … + h_k^T(D−W)h_k / (h_k^T D h_k)
    J_MMC(h_1,…,h_k)  = h_1^T(D−W)h_1 / (h_1^T W h_1) + … + h_k^T(D−W)h_k / (h_k^T W h_k)
K-way Ratio Cut Spectral Relaxation

Unsigned cluster indicators:
    x_k = (0…0, 1…1, 0…0)^T / n_k^{1/2}

Re-write:
    J_Rcut(x_1,…,x_k) = x_1^T(D−W)x_1 + … + x_k^T(D−W)x_k = Tr(X^T(D−W)X),  X = (x_1,…,x_k)

Optimize: min_X Tr(X^T(D−W)X) subject to X^T X = I.

By Ky Fan's theorem, the optimal solution is given by eigenvectors: X = (v_1,…,v_k), (D−W)v_k = λ_k v_k, with lower bound
    λ_1 + … + λ_k ≤ min J_Rcut(x_1,…,x_k)
(Chan, Schlag & Zien, 1994)
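The Ky Fan bound can be illustrated numerically: the sum of the k smallest Laplacian eigenvalues never exceeds the discrete ratio-cut objective, computed here via the trace formula (the 3-clique graph is my construction):

```python
import numpy as np

# Three cliques of size 3, weakly linked in a ring.
n, k = 9, 3
W = np.zeros((n, n))
for c in range(k):
    for i in range(3 * c, 3 * c + 3):
        for j in range(3 * c, 3 * c + 3):
            if i != j:
                W[i, j] = 1.0
for i, j in [(2, 3), (5, 6), (8, 0)]:
    W[i, j] = W[j, i] = 0.05

L = np.diag(W.sum(axis=1)) - W
lam = np.linalg.eigvalsh(L)                # ascending eigenvalues

# Discrete ratio-cut objective via Tr(X^T L X), x_c = indicator / sqrt(n_c)
X = np.zeros((n, k))
for c in range(k):
    X[3 * c:3 * c + 3, c] = 1.0 / np.sqrt(3)
J_rcut = np.trace(X.T @ L @ X)
bound = lam[:k].sum()
```

Each cluster cuts two bridge edges of weight 0.05, so J_rcut = 3 · (0.1/3) = 0.1, and the eigenvalue bound sits below it.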
K-way Normalized Cut Spectral Relaxation

Unsigned cluster indicators:
    y_k = D^{1/2} h_k / ||D^{1/2} h_k||,    W~ = D^{−1/2} W D^{−1/2}

Re-write:
    J_Ncut(y_1,…,y_k) = y_1^T(I−W~)y_1 + … + y_k^T(I−W~)y_k = Tr(Y^T(I−W~)Y)

Optimize: min_Y Tr(Y^T(I−W~)Y) subject to Y^T Y = I.

By Ky Fan's theorem, the optimal solution is given by eigenvectors: Y = (v_1,…,v_k), (I−W~)v_k = λ_k v_k, equivalently (D−W)u_k = λ_k D u_k with u_k = D^{−1/2} v_k, and
    λ_1 + … + λ_k ≤ min J_Ncut(y_1,…,y_k)
(Gu et al, 2001)
K-way Min-Max Cut Spectral Relaxation

Unsigned cluster indicators:
    y_k = D^{1/2} h_k / ||D^{1/2} h_k||,    W~ = D^{−1/2} W D^{−1/2}

Re-write:
    J_MMC(y_1,…,y_k) = 1/(y_1^T W~ y_1) + … + 1/(y_k^T W~ y_k) − k

Optimize: min_Y J_MMC(Y) subject to Y^T Y = I and y_k^T W~ y_k > 0.

Theorem: the optimal solution is given by eigenvectors: Y = (v_1,…,v_k), W~ v_k = λ_k v_k, and
    k²/(λ_1 + … + λ_k) − k ≤ min J_MMC(y_1,…,y_k)
(Gu et al, 2001)
K-way Spectral Clustering

• Embedding (similar to the PCA subspace approach)
  – Embed data points in the subspace of the K eigenvectors
  – Cluster the embedded points using another algorithm, such as K-means (Shi & Malik; Ng et al; Zha et al)
• Recursive 2-way clustering (standard graph partitioning)
  – If the desired K is not a power of 2, how to optimally choose the next sub-cluster to split? (Ding et al, 2002)
• Both approaches above do not use the K-way clustering objective functions.
• Refining the obtained clusters with the K-way clustering objective function typically improves the results (Ding et al, 2002).
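A compact sketch of the embedding route (numpy only; the tiny Lloyd-style k-means and its deterministic seeding are my simplifications, not the tutorial's algorithm):

```python
import numpy as np

def spectral_embed(W, k):
    """Rows of the k bottom eigenvectors of L_sym as embedded points."""
    d = W.sum(axis=1)
    Dm12 = np.diag(1.0 / np.sqrt(d))
    Lsym = Dm12 @ (np.diag(d) - W) @ Dm12
    _, V = np.linalg.eigh(Lsym)
    return V[:, :k]

def kmeans(X, seeds, iters=20):
    """Plain Lloyd iterations seeded with fixed data points (deterministic)."""
    C = X[seeds].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == c].mean(axis=0) for c in range(len(seeds))])
    return labels

# Three weakly linked cliques of size 3.
n, k = 9, 3
W = np.zeros((n, n))
for c in range(k):
    for i in range(3 * c, 3 * c + 3):
        for j in range(3 * c, 3 * c + 3):
            if i != j:
                W[i, j] = 1.0
for i, j in [(2, 3), (5, 6), (8, 0)]:
    W[i, j] = W[j, i] = 0.05

X = spectral_embed(W, k)
labels = kmeans(X, seeds=[0, 3, 6])
```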
DNA Gene Expression

Effects of feature selection: select 900 genes out of 4025 genes.
[Figure: genes × tissue-sample expression matrices, before and after feature selection]
Lymphoma cancer data (Alizadeh et al, 2000)
Lymphoma Cancer Tissue Samples

B-cell lymphoma goes through different stages:
– 3 cancer stages
– 3 normal stages
Key question: can we detect them automatically?
[Figure: PCA 2D display]
Brief Summary of Part I

• Spectral graph partitioning as the origin
• Clustering objective functions and solutions
• Extensions to bipartite and directed graphs
• Characteristics
  – Principled approach
  – Well-motivated objective functions
  – Clear, unambiguous
  – A framework of rich structures and contents
  – Everything is proved rigorously (within the relaxation framework, i.e., using continuous approximation of the discrete variables)
• The results above were mostly done by 2001.
• More to come in Part II.