A penalized matrix decomposition, with application to sparse hierarchical clustering Daniela Witten Department of Statistics Stanford University October 20, 2009




Hierarchical clustering

There has been a resurgence of interest in hierarchical clustering in the field of genomics.


Clustering when p ≫ n

Suppose we wish to cluster n observations on p features, where p ≫ n.

If the true underlying classes are defined on only a subset of the features, then the presence of noise features can obscure this signal.
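The dilution effect can be checked numerically. The sketch below is an illustration, not the simulation used in the talk: it assumes Gaussian features with unit variance, a mean shift of 3 between two classes on the signal features, and squared Euclidean distance; the function name `mean_sq_dist_ratio` and all parameter values are hypothetical. It reports the ratio of the mean between-class to the mean within-class dissimilarity, which should move toward 1 as noise features are added.

```python
import random

def mean_sq_dist_ratio(n_per_class, n_signal, n_total, shift, seed=0):
    """Mean between-class / mean within-class squared Euclidean distance."""
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class
    # Only the first n_signal features differ in mean between the classes.
    X = [[rng.gauss(shift * lab if j < n_signal else 0.0, 1.0)
          for j in range(n_total)]
         for lab in labels]
    between, within = [], []
    for i in range(len(X)):
        for k in range(i + 1, len(X)):
            d = sum((a - b) ** 2 for a, b in zip(X[i], X[k]))
            (between if labels[i] != labels[k] else within).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))

# 10 observations, 10 signal features, growing number of features in total.
for p in (10, 500, 5000):
    print(p, round(mean_sq_dist_ratio(5, 10, p, shift=3.0), 3))
```

With many noise features the ratio is close to 1, so the between-class signal is nearly invisible in the overall dissimilarity.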


Example

A simple example with 10 observations; 2 classes are defined on 10 important features.


Example: 10 important features; 10 features total

[Figure: dendrogram; leaf order 9 8 10 6 7 1 4 5 2 3]


Example: 10 important features; 500 features total

[Figure: dendrogram; leaf order 7 6 8 10 9 3 1 4 5 2]


Example: 10 important features; 5000 features total

[Figure: dendrogram; leaf order 9 7 6 10 5 2 8 1 4 3]


Sparse hierarchical clustering results: 10 important features; 5000 features total

[Figure: dendrogram; leaf order 1 4 5 2 3 10 6 7 8 9]


Sparse Clustering

We want a method to hierarchically cluster observations based on a small subset of the features; we will call this sparse hierarchical clustering.

We want an automated way to

• find a subset of features to use in the clustering, and

• obtain a more accurate or interesting clustering using that subset of features.

Assumption: We assume that the dissimilarity measure used is additive in the features: $D_{i,i'} = \sum_{j=1}^{p} d_{i,i',j}$
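As a small illustration of the additivity assumption, the sketch below takes squared difference as the feature-wise dissimilarity, so that the sum over features is the squared Euclidean distance. That choice, the function names, and the toy data are assumptions made for the example; any dissimilarity that is additive in the features would fit the same pattern.

```python
def featurewise_dissimilarities(X):
    """d[i][k][j] = (X[i][j] - X[k][j])**2 for every pair (i, k) and feature j."""
    n, p = len(X), len(X[0])
    return [[[(X[i][j] - X[k][j]) ** 2 for j in range(p)]
             for k in range(n)]
            for i in range(n)]

def overall_dissimilarity(d):
    """Sum the feature-wise dissimilarities over j to get the n x n matrix D."""
    return [[sum(per_feature) for per_feature in row] for row in d]

# Toy data: 3 observations on 3 features.
X = [[0.0, 1.0, 2.0],
     [1.0, 1.0, 0.0],
     [3.0, 0.0, 2.0]]
d = featurewise_dissimilarities(X)
D = overall_dissimilarity(d)
print(d[0][1])  # per-feature contributions for observations 0 and 1
print(D[0][1])  # their sum: the squared Euclidean distance
```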


Dissimilarity matrix for the n observations



Dissimilarity matrix is a sum of dissimilarity matrices over the features


Hierarchical clustering sums the dissimilarity matrices for the features


Weighted sum of the dissimilarity matrices for the features


Sparse hierarchical clustering and the PMD

Let D denote the n² × p matrix for which column j is the feature-wise dissimilarity matrix for feature j.

Then, suppose we apply the PMD to D:

$$\underset{u,\,w}{\text{maximize}} \;\; u^T D w \quad \text{subject to} \quad \|u\|^2 \le 1, \;\; \|w\|^2 \le 1, \;\; \sum_j w_j \le s, \;\; w_j \ge 0$$

Then, $w_j$ is a weight on the dissimilarity matrix for feature j. If we re-arrange the elements of Dw into an n × n matrix, then performing hierarchical clustering on this re-weighted dissimilarity matrix gives sparse hierarchical clustering.
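One way to approach this criterion is by alternating updates of u and w. The sketch below is not taken from the slides: it assumes the rank-one update scheme described in the PMD and sparse clustering papers listed in the references (normalize Dw to get u, then soft-threshold the positive part of Dᵀu and normalize to get w, with the threshold found by bisection so that the sum of the weights is at most s). All function names are hypothetical, D is a plain list of rows, and s is assumed to lie between 1 and √p.

```python
import math

def soft_threshold_pos(a, delta):
    """Soft-threshold the positive part of a: max(a_j - delta, 0)."""
    return [max(x - delta, 0.0) for x in a]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def update_w(a, s, n_steps=60):
    """Unit-norm, non-negative w built from a, with sum(w) <= s."""
    w = l2_normalize(soft_threshold_pos(a, 0.0))
    if sum(w) <= s:
        return w  # constraint already satisfied, no thresholding needed
    lo, hi = 0.0, max(a)
    for _ in range(n_steps):  # bisection on the threshold delta
        mid = (lo + hi) / 2
        if sum(l2_normalize(soft_threshold_pos(a, mid))) > s:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold_pos(a, hi))

def pmd_weights(D, s, n_iter=20):
    """Alternate the u and w updates, starting from equal weights."""
    p = len(D[0])
    w = [1 / math.sqrt(p)] * p
    for _ in range(n_iter):
        u = l2_normalize([sum(d * wj for d, wj in zip(row, w)) for row in D])
        a = [sum(u[r] * D[r][j] for r in range(len(D))) for j in range(p)]
        w = update_w(a, s)
    return w

def reweighted_dissimilarity(D, w, n):
    """Rearrange D w (length n*n, row-major pair order) into an n x n matrix."""
    dw = [sum(d * wj for d, wj in zip(row, w)) for row in D]
    return [dw[i * n:(i + 1) * n] for i in range(n)]

# Toy input: 3 observations, 4 features; features 0 and 1 carry the signal.
pairs = {(0, 1): [4.0, 3.0, 0.1, 0.2],
         (0, 2): [9.0, 8.0, 0.2, 0.1],
         (1, 2): [1.0, 2.0, 0.1, 0.1]}
n, p = 3, 4
D = [[0.0] * p if i == k else pairs[(min(i, k), max(i, k))]
     for i in range(n) for k in range(n)]
w = pmd_weights(D, s=1.2)
print([round(x, 3) for x in w])
print(reweighted_dissimilarity(D, w, n))
```

The resulting n × n matrix can be passed to any standard hierarchical clustering routine that accepts a precomputed dissimilarity matrix. Smaller values of s give sparser weight vectors.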


Sparse hierarchical clustering in action

A simulated example with 6 classes defined on 100 signal features; 2000 features in total.

[Figure: dendrograms for Ordinary Clustering and Sparse Clustering, and a plot of the feature weights W against feature Index (0 to 2000).]


An important breast cancer paper

Nature (2000) 406:747-752.


Breast cancer data

• 65 breast tumor samples for which gene expression data is available. Some samples are replicates from the same tumor (before and after chemo).

• Clustered based on full set of 1753 genes first.

• Clustered based on 496 intrinsic genes for which the variation between different tumors is large relative to the variation within a tumor.

• Based on the intrinsic gene clustering, determined that 62 of 65 tumors fall into one of four classes: normal-breast-like, basal-like, ER+, Erb-B2+.


Clustering results: normal-breast-like, basal-like, ER+, Erb-B2+

[Figure: two dendrograms, Clustering Using All 1753 Genes and Clustering Using 496 Intrinsic Genes.]


Sparse clustering

We wonder: If we sparsely cluster the observations using all of the genes, can we identify the four classes successfully?

Three types of clustering:

1. Sparse hierarchical clustering of all 1753 genes, with the tuning parameter chosen to yield 496 genes.

2. Sparse hierarchical clustering of all 1753 genes, with the tuning parameter chosen by the gap statistic.

3. Standard hierarchical clustering using the 496 genes with highest marginal variance.
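The third approach is a simple filtering baseline. A minimal sketch of the selection step is given below, assuming the data are stored as rows of observations and that population variance is used for ranking; the function name `top_variance_features` and the toy data are hypothetical.

```python
from statistics import pvariance

def top_variance_features(X, k):
    """Indices of the k columns of X with the highest marginal variance."""
    p = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(p)]
    order = sorted(range(p), key=lambda j: variances[j], reverse=True)
    return sorted(order[:k])

# Toy data: 3 observations on 4 features; columns 1 and 3 vary the most.
X = [[1.0, 10.0, 0.0, 5.0],
     [1.1, -10.0, 0.1, 2.0],
     [0.9, 0.0, 0.0, 8.0]]
print(top_variance_features(X, 2))
```

Unlike the sparse clustering weights, this ranking looks at each feature on its own and ignores how the feature relates to the clustering.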


normal-breast-like, basal-like, ER+, Erb-B2+

[Figure: three dendrograms: Sparse Clustering: 496 Genes; Sparse Clustering: 106 Genes; 496 High-Variance Genes.]


Genes with high weights

 #  Gene                                                    Weight
 1  S100 CALCIUM-BINDING PROTEIN A8 (CALGRANULIN A)         0.223
 2  SECRETED FRIZZLED-RELATED PROTEIN 1                     0.2126
 3  ESTROGEN RECEPTOR 1                                     0.2076
 4  KERATIN 17                                              0.1627
 5  HUMAN REARRANGED IMMUNOGLOBULIN LAMBDA                  0.1568
 6  CYTOCHROME P450, SUBFAMILY IIA                          0.155
 7  APOLIPOPROTEIN D                                        0.1509
 8  LACTOTRANSFERRIN                                        0.1471
 9  ESTROGEN RECEPTOR 1                                     0.1405
10  134783                                                  0.14
11  HEPATOCYTE NUCLEAR FACTOR 3, ALPHA                      0.1332
12  HUMAN REARRANGED IMMUNOGLOBULIN LAMBDA LIGHT            0.1309
13  FATTY ACID BINDING PROTEIN 4, ADIPOCYTE                 0.1292
14  CERULOPLASMIN (FERROXIDASE)                             0.126
15  HUMAN SECRETORY PROTEIN (P1.B) MRNA                     0.1208
16  NON-SPECIFIC CROSS REACTING ANTIGEN                     0.1199
17  LIPOPROTEIN LIPASE                                      0.1123
18  IMMUNOGLOBULIN LAMBDA LIGHT CHAIN                       0.112
19  CRYSTALLIN, ALPHA B                                     0.1108
20  FATTY ACID BINDING PROTEIN 4, ADIPOCYTE                 0.11
21  PLEIOTROPHIN (HEPARIN BINDING GROWTH FACTOR 8)          0.1099
22  85660                                                   0.1077
23  ESTS, HIGHLY SIMILAR TO PROBABLE ATAXIA-TELANGIECTASIA  0.1071
24  V-FOS FBJ MURINE OSTEOSARCOMA VIRAL ONCOGENE HOMOLOG    0.1056
25  EPIDIDYMIS-SPECIFIC, WHEY-ACIDIC PROTEIN TYPE           0.1013
26  ALDO-KETO REDUCTASE FAMILY 1, MEMBER C1                 0.1007


Conclusions

• Clustering methods are very sensitive to the set of features used.

• In high dimensions, we may not want to simply use all of the features that happen to be available.

• Objective methods are required for selecting the features for use in clustering.

• This proposal can be applied to K-means clustering, hierarchical clustering, K-medoids clustering, and more.

• Unsupervised learning when p ≫ n: need better ways to select tuning parameters and validate results obtained.

• R package sparcl: Sparse Clustering.

• Witten and Tibshirani (2010) 'A framework for feature selection in clustering', JASA (T & M) 105(490): 713-726.


References

1. Chin et al. (2006) 'Genomic and transcriptional aberrations linked to breast cancer pathophysiologies', Cancer Cell 10: 529-541.

2. Perou et al. (2000) 'Molecular portraits of human breast tumours', Nature 406: 747-752.

3. Shen and Huang (2008) 'Sparse principal component analysis via regularized low rank matrix approximation', Journal of Multivariate Analysis 99(6): 1015-1034.

4. Witten, Tibshirani, and Hastie (2009) 'A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis', Biostatistics 10(3): 515-534.

5. Witten and Tibshirani (2010) 'A framework for feature selection in clustering', JASA (T & M) 105(490): 713-726.