On the Equivalence of Nonnegative Matrix Factorization and …ranger.uta.edu/~heng/CSE6389_15_slides/On the Equivalence... · 2015. 2. 26. · the Equivalence of Nonnegative Matrix

On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering

Chris Ding, Xiaofeng He, Horst D. Simon

Published at SDM '05

Hongchang Gao

Presenter

Presentation Notes

SIAM International Conference on Data Mining

Outline

• NMF
• NMF <=> Kmeans
• NMF <=> Spectral Clustering
• NMF <=> Bipartite graph Kmeans

NMF

• Paatero and Tapper (1994)
– Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values, Environmetrics

• Lee and Seung (1999, 2000)
– Learning the parts of objects by non-negative matrix factorization, Nature
– Algorithms for non-negative matrix factorization, NIPS

Presenter

Presentation Notes

Paatero published his initial factorization algorithms in 1994. However, Paatero's work is rarely cited by subsequent authors. This is partly due to his unfortunate choice of the name "positive matrix factorization", which is misleading because his algorithms actually produce an NMF. Since then, a lot of work has been devoted to NMF algorithms. Today, we will focus on the relationship between NMF, Kmeans, and spectral clustering.

NMF

• Matrix factorization, such as SVD, is widely used in machine learning
– interpretation of the basis vectors is difficult due to mixed signs

$X_{\text{nonneg}} = U_{\text{mixed}} \, \Sigma \, V_{\text{mixed}}^T$

Presenter

Presentation Notes

Although SVD has many strengths, its basis vectors are mixed in sign. Negative elements make interpretation difficult.

NMF

• Nonnegative Matrix Factorization

$X = F_{\text{nonneg}} \, G_{\text{nonneg}}^T$

– where $X \in \mathbb{R}^{d \times n}$, $F \in \mathbb{R}^{d \times k}$, $G \in \mathbb{R}^{n \times k}$
– columns of F are the underlying basis vectors
– rows of G give the weights associated with each basis vector

Presenter

Presentation Notes

Each column of X can be built as a nonnegative combination of the k columns of F.
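To make the factorization concrete, here is a minimal NumPy sketch. It assumes the multiplicative update rules from the Lee and Seung NIPS paper cited earlier; the slides themselves do not prescribe an algorithm. The function name `nmf_multiplicative`, the toy data, and the iteration count are illustrative choices, not part of the original talk.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (d x n) by F @ G.T with F (d x k), G (n x k) >= 0."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    F = rng.random((d, k)) + eps
    G = rng.random((n, k)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: each factor is rescaled entrywise by a
        # nonnegative ratio, so F and G never leave the nonnegative orthant.
        F *= (X @ G) / (F @ (G.T @ G) + eps)
        G *= (X.T @ F) / (G @ (F.T @ F) + eps)
    return F, G

rng = np.random.default_rng(1)
X = rng.random((6, 3)) @ rng.random((3, 10))   # nonnegative toy data of rank 3
F, G = nmf_multiplicative(X, k=3)

# Relative reconstruction error of X by F G^T.
rel_err = np.linalg.norm(X - F @ G.T) / np.linalg.norm(X)
print(f"relative error: {rel_err:.4f}")
```

Column i of X is approximated by `F @ G[i]`, that is, the basis vectors in F weighted by row i of G.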

Outline

• NMF
• NMF <=> Kmeans
• NMF <=> Spectral
• NMF <=> Bipartite graph Kmeans

Kmeans

• Kmeans clustering is one of the most widely used clustering methods.

Kmeans

• Reformulate Kmeans Clustering

• Cluster membership indicators:
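As a sketch of the reformulation this slide refers to, assuming the standard construction from the Ding, He, and Simon paper: the symbols $m_k$ (centroid of cluster $C_k$) and $n_k$ (size of cluster $C_k$) are introduced here for illustration, and the data points are assumed to be ordered by cluster when writing out $h_k$.

```latex
J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \| x_i - m_k \|^2
    = \sum_{i} \| x_i \|^2 - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} x_i^T x_j

h_k = (0, \ldots, 0, \underbrace{1, \ldots, 1}_{n_k}, 0, \ldots, 0)^T / n_k^{1/2},
\qquad H = (h_1, \ldots, h_K), \qquad H^T H = I

J_K = \mathrm{Tr}(X^T X) - \mathrm{Tr}(H^T X^T X H)
```

Since $\mathrm{Tr}(X^T X)$ does not depend on the clustering, minimizing $J_K$ amounts to maximizing $\mathrm{Tr}(H^T X^T X H)$ over such indicator matrices.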

Kmeans

• Objective function

• Replace $X^T X$ by $W = X^T X$, which is the standard inner-product (linear) kernel matrix

Presenter

Presentation Notes

Kernel K-means aims at maximizing within-cluster similarities. The advantage of Kernel K-means is that it can describe data distributions more complicated than Gaussian distributions.
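The link between the Kmeans objective and the kernel matrix can be checked numerically. The sketch below assumes the normalized indicator matrix H with entries $1/\sqrt{n_k}$ and the linear kernel $W = X^T X$; the cluster assignment `labels` is an arbitrary, hypothetical one chosen only so that every cluster is non-empty.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, K = 4, 12, 3
X = rng.normal(size=(d, n))          # columns are data points
labels = np.arange(n) % K            # hypothetical assignment, every cluster non-empty

# Normalized indicators: H[i, k] = 1/sqrt(n_k) if point i belongs to cluster k.
H = np.zeros((n, K))
H[np.arange(n), labels] = 1.0
H /= np.sqrt(H.sum(axis=0))

# Kmeans objective computed directly from squared distances to the centroids.
J_direct = 0.0
for k in range(K):
    Xk = X[:, labels == k]
    J_direct += np.sum((Xk - Xk.mean(axis=1, keepdims=True)) ** 2)

# The same objective in trace form with the linear kernel W = X^T X.
W = X.T @ X
J_trace = np.trace(W) - np.trace(H.T @ W @ H)

print(np.isclose(J_direct, J_trace))
```

Swapping `W` for another positive semidefinite kernel matrix gives the kernel Kmeans objective in the same trace form.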

Kernel Kmeans

• Map x to a higher-dimensional space:

• Kernel Kmeans objective:
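One common way to write these two steps, assuming a feature map $\phi$ and the normalized indicator matrix $H$ from the Kmeans reformulation ($\phi$, $\bar{\phi}_k$, and $n_k$ are notation chosen here for illustration):

```latex
x_i \mapsto \phi(x_i), \qquad W_{ij} = \phi(x_i)^T \phi(x_j)

J_K^{\phi} = \sum_{k=1}^{K} \sum_{i \in C_k} \| \phi(x_i) - \bar{\phi}_k \|^2,
\qquad \bar{\phi}_k = \frac{1}{n_k} \sum_{i \in C_k} \phi(x_i)

\min J_K^{\phi} \;\Longleftrightarrow\; \max_{H^T H = I,\; H \ge 0} \mathrm{Tr}(H^T W H)
```

Only the kernel matrix $W$ is needed, so $\phi$ never has to be evaluated explicitly.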

NMF <=> Kmeans

• Orthogonal symmetric NMF is equivalent to Kernel Kmeans clustering

Kernel Kmeans=>Symmetric NMF

• The factorization $W = HH^T$ is equivalent to Kernel K-means clustering with the strict orthogonality relaxed

$$
\begin{aligned}
H &= \arg\max_{H^T H = I,\; H \ge 0} \mathrm{Tr}(H^T W H) \\
  &= \arg\min_{H^T H = I,\; H \ge 0} \; -2\,\mathrm{Tr}(H^T W H) \\
  &= \arg\min_{H^T H = I,\; H \ge 0} \; \|W\|^2 - 2\,\mathrm{Tr}(H^T W H) + \|H^T H\|^2 \\
  &= \arg\min_{H^T H = I,\; H \ge 0} \; \|W - HH^T\|^2
\end{aligned}
$$

Relaxing the orthogonality $H^T H = I$ completes the proof
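The last step of this chain rests on expanding the squared Frobenius norm, which is easy to verify numerically. The sketch below uses a random symmetric matrix as a stand-in for the kernel matrix W and a hypothetical cluster assignment to build a nonnegative H with orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 10, 3
A = rng.random((n, n))
W = A @ A.T                          # symmetric positive semidefinite toy kernel

labels = np.arange(n) % K            # hypothetical assignment
H = np.zeros((n, K))
H[np.arange(n), labels] = 1.0
H /= np.sqrt(H.sum(axis=0))          # H >= 0 and H^T H = I

lhs = np.linalg.norm(W - H @ H.T, "fro") ** 2
rhs = (np.linalg.norm(W, "fro") ** 2
       - 2.0 * np.trace(H.T @ W @ H)
       + np.linalg.norm(H.T @ H, "fro") ** 2)

# Under H^T H = I the last term is the constant K, and ||W||^2 does not depend on H,
# so minimizing lhs is the same as maximizing Tr(H^T W H).
print(np.isclose(lhs, rhs), np.linalg.norm(H.T @ H, "fro") ** 2)
```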

Symmetric NMF=> Kernel Kmeans

• The factorization $W = HH^T$ retains the orthogonality of H approximately.
– Proof. $\min_{H \ge 0} \|W - HH^T\|^2$ is equivalent to

$\max_{H \ge 0} \mathrm{Tr}(H^T W H)$ and $\min_{H \ge 0} \|H^T H\|^2$

– The first one recovers the Kernel Kmeans objective

Symmetric NMF=> Kernel Kmeans

• The second one
– Minimizing the first term, we get

– Minimizing the second term

• We must make sure that H is not all zero
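One way to read the "first term" and "second term" here, assuming they refer to the off-diagonal and diagonal parts of $H^T H$ as in the Ding, He, and Simon paper, is the following decomposition, where $h_k$ denotes column k of H:

```latex
\| H^T H \|^2 = \sum_{l \ne k} (h_l^T h_k)^2 + \sum_{k} (h_k^T h_k)^2
% first term small  => h_l^T h_k \approx 0 for l \ne k (approximate orthogonality)
% second term small => the column norms \|h_k\| shrink, so H = 0 must be excluded
```

The first sum is what pushes the columns of H toward orthogonality; the second sum alone would be minimized by $H = 0$, which is why the trivial solution has to be ruled out.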

Outline

• NMF
• NMF <=> Kmeans
• NMF <=> Spectral
• NMF <=> Bipartite graph Kmeans

Spectral Clustering

• Spectral clustering objective functions

Spectral Clustering

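• Reformulate the objective based on Ncut

$$J = \frac{1}{K} \sum_{l=1}^{K} \frac{\mathrm{cut}(V_l, V - V_l)}{\mathrm{vol}(V_l)} = \frac{1}{K} \sum_{l=1}^{K} \frac{h_l^T (D - W) h_l}{h_l^T D h_l}$$

• Replace

$$z_l = \frac{D^{1/2} h_l}{\| D^{1/2} h_l \|}$$

• Then, with $\tilde{W} = D^{-1/2} W D^{-1/2}$ and dropping the constant factor $1/K$,

$$J = \sum_{l=1}^{K} z_l^T (I - \tilde{W}) z_l = \sum_{l=1}^{K} z_l^T z_l - \mathrm{Tr}(Z^T \tilde{W} Z)$$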
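The Ncut reformulation can be checked on a small random graph. This sketch assumes the usual definitions: `cut` sums the affinities between a cluster and its complement, `vol` sums the degrees inside a cluster, and $\tilde{W} = D^{-1/2} W D^{-1/2}$. Any constant normalization such as $1/K$ is left out because it does not change the minimizer. The affinity matrix and the assignment `labels` are toy, hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 9, 3
A = rng.random((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)             # symmetric nonnegative affinities, no self-loops
deg = W.sum(axis=1)                  # diagonal of the degree matrix D

labels = np.arange(n) % K            # hypothetical assignment
Hind = np.zeros((n, K))
Hind[np.arange(n), labels] = 1.0     # 0/1 indicator vectors h_l as columns

# Ncut straight from its definition: sum over clusters of cut / vol.
J_cut = 0.0
for l in range(K):
    inside = labels == l
    cut = W[inside][:, ~inside].sum()
    vol = deg[inside].sum()
    J_cut += cut / vol

# Trace form: z_l = D^{1/2} h_l / ||D^{1/2} h_l||.
Z = np.sqrt(deg)[:, None] * Hind
Z /= np.linalg.norm(Z, axis=0)
W_tilde = W / np.sqrt(np.outer(deg, deg))
J_trace = K - np.trace(Z.T @ W_tilde @ Z)

print(np.isclose(J_cut, J_trace))
```

Because the columns of Z are orthonormal and nonnegative, minimizing Ncut has the same form as the kernel Kmeans trace maximization with $\tilde{W}$ in place of W.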

NMF <=> Spectral Clustering

• The objective of spectral clustering

$$\max_{Z^T Z = I,\; Z \ge 0} \mathrm{Tr}(Z^T \tilde{W} Z)$$

• This is identical to the Kernel Kmeans clustering objective

• Spectral Clustering <=> Kernel Kmeans <=> NMF

Outline

• NMF
• NMF <=> Kmeans
• NMF <=> Spectral
• NMF <=> Bipartite graph Kmeans

Bipartite graph Kmeans

• Simultaneous clustering of rows and columns


• Simultaneously cluster the rows and columns of the data matrix $B = (x_1, x_2, \ldots, x_n)$

• Row Clustering

$$\max_{F^T F = I,\; F \ge 0} \mathrm{Tr}(F^T B B^T F)$$

• Column Clustering

$$\max_{G^T G = I,\; G \ge 0} \mathrm{Tr}(G^T B^T B G)$$

Presenter

Presentation Notes

For example, in document clustering, documents are the data points and words are the features. Data points can be grouped based on their features, and features can be grouped based on the data points in which they occur.


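• Equivalent problem: $\max_{F, G} \mathrm{Tr}(F^T B G)$ subject to $F^T F = I$, $G^T G = I$, $F, G \ge 0$

• Solution

$B g_k = \lambda_k f_k, \quad B^T f_k = \lambda_k g_k$

• Then,

$B^T B g_k = \lambda_k^2 g_k, \quad B B^T f_k = \lambda_k^2 f_k$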

Presenter

Presentation Notes

The solution of this quadratic optimization is given by the first K eigenvectors. This proves that Eq. (20) is the objective for simultaneous row and column K-means clustering.
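The coupled relations between $f_k$ and $g_k$ are the defining equations of singular vector pairs, which the following sketch checks with `numpy.linalg.svd`. It assumes the relaxed problem in which the nonnegativity constraints are ignored and only orthogonality is kept; the matrix B is random toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((5, 8))               # toy data matrix
K = 2

U, s, Vt = np.linalg.svd(B, full_matrices=False)
F, G, lam = U[:, :K], Vt[:K].T, s[:K]   # top-K left/right singular vectors and values

# Coupled relations: B g_k = lam_k f_k and B^T f_k = lam_k g_k.
ok_right = np.allclose(B @ G, F * lam)
ok_left = np.allclose(B.T @ F, G * lam)

# Decoupled eigen-relations for column clustering (B^T B) and row clustering (B B^T).
ok_cols = np.allclose(B.T @ B @ G, G * lam ** 2)
ok_rows = np.allclose(B @ B.T @ F, F * lam ** 2)

# For these F and G the objective Tr(F^T B G) equals the sum of the top-K singular values.
trace_val = np.trace(F.T @ B @ G)
print(ok_right, ok_left, ok_cols, ok_rows, np.isclose(trace_val, lam.sum()))
```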

Bipartite graph Kmeans=>NMF

• Simultaneous row and column Kmeans clustering is equivalent to the following optimization problem: $\min_{F, G} \|B - FG^T\|^2$

Bipartite graph Kmeans=>NMF

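• Proof.

$$
\begin{aligned}
& \max_{F, G} \mathrm{Tr}(F^T B G) \\
\Rightarrow\; & \min_{F, G} \; -\mathrm{Tr}(F^T B G) \\
\Rightarrow\; & \min_{F, G} \; \|B\|^2 - 2\,\mathrm{Tr}(F^T B G) + \mathrm{Tr}(F^T F G^T G) \\
\Rightarrow\; & \min_{F, G} \; \|B - FG^T\|^2
\end{aligned}
$$

• Therefore, NMF is equivalent to Kmeans clustering with relaxed orthogonality constraints.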

NMF=>Bipartite graph Kmeans

• Previously, we assumed that both F and G are orthogonal. If only one of them is orthogonal, we can write $\|B - FG^T\|^2$ explicitly as a Kmeans clustering objective function.

• NMF with orthogonal G is identical to Kmeans clustering of the columns of B.


• Proof.
– First, normalize the rows of G, s.t.

– Then, for the objective function

– We have


• The orthogonality condition on G implies that in each row of G only one element is nonzero, and $g_{ik} = 0, 1$

• Summing over i:

$$J = \sum_{k=1}^{K} \sum_{i \in C_k} \| x_i - f_k \|^2$$

– which is the Kmeans clustering objective
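A small numerical check of this last statement, assuming G is a 0/1 indicator matrix with exactly one nonzero per row (so its columns are orthogonal) and taking the columns of F to be the cluster centroids. The assignment `labels` is hypothetical toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, K = 3, 12, 3
B = rng.random((d, n))               # columns x_i are the data points
labels = np.arange(n) % K            # hypothetical assignment

G = np.zeros((n, K))
G[np.arange(n), labels] = 1.0        # g_ik in {0, 1}, one nonzero per row
F = (B @ G) / G.sum(axis=0)          # column f_k is the centroid of cluster k

# NMF objective ||B - F G^T||^2 ...
J_nmf = np.linalg.norm(B - F @ G.T, "fro") ** 2

# ... equals the Kmeans objective: sum over clusters of ||x_i - f_k||^2.
J_kmeans = sum(np.sum((B[:, labels == k] - F[:, [k]]) ** 2) for k in range(K))

print(np.isclose(J_nmf, J_kmeans))
```

Column i of `F @ G.T` is simply the centroid assigned to point i, which is why the two objectives coincide term by term.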

Reference

• Ding, Chris HQ, Xiaofeng He, and Horst D. Simon. "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering." SDM. Vol. 5. 2005.

• Li, Tao, and Chris Ding. "The relationships among various nonnegative matrix factorization methods for clustering." Data Mining, 2006. ICDM'06. Sixth International Conference on. IEEE, 2006.

• Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17.4 (2007): 395-416.

• Shi, Jianbo, and Jitendra Malik. "Normalized cuts and image segmentation." Pattern Analysis and Machine Intelligence, IEEE Transactions on 22.8 (2000): 888-905.

Thanks
