
Page 1:

L2 and L1 Criteria for K-Means Bilinear Clustering

B. Mirkin, School of Computer Science, Birkbeck College, University of London

Advert of a Special Issue: The Computer Journal, Profiling Expertise and Behaviour. Deadline 15 Nov. 2006. To submit: http://www.dcs.bbk.ac.uk/~mark/cfp_cj_profiling.txt

Page 2:

Outline: more of properties than methods
- Clustering, K-Means and issues
- Data recovery PCA model and clustering
- Data scatter decompositions for L2 and L1
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 3:

- WHAT IS CLUSTERING; WHAT IS DATA
- K-MEANS CLUSTERING: Conventional K-Means; Initialization of K-Means; Intelligent K-Means; Mixed Data; Interpretation Aids
- WARD HIERARCHICAL CLUSTERING: Agglomeration; Divisive Clustering with Ward Criterion; Extensions of Ward Clustering
- DATA RECOVERY MODELS: Statistics Modelling as Data Recovery; Data Recovery Model for K-Means; for Ward; Extensions to Other Data Types; One-by-One Clustering
- DIFFERENT CLUSTERING APPROACHES: Extensions of K-Means; Graph-Theoretic Approaches; Conceptual Description of Clusters
- GENERAL ISSUES: Feature Selection and Extraction; Similarity on Subsets and Partitions; Validity and Reliability

Page 4:

- Clustering, K-Means and issues
- Bilinear PCA model and clustering
- Data scatter decompositions: quadratic and absolute
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 5:

Example: W. Jevons (1857) planet clusters, updated (Mirkin, 1996)

Pluto doesn't fit in the two clusters of planets: it originated another cluster, of dwarf planets (2006)

Page 6:

Clustering algorithms

- Nearest neighbour
- Ward's
- Conceptual clustering
- K-Means
- Kohonen SOM
- Spectral clustering
- …

Page 7:

K-Means: a generic clustering method

Entities are presented as multidimensional points (*)
0. Put K hypothetical centroids (seeds)
1. Assign points to the centroids according to the minimum-distance rule
2. Put centroids in the gravity centres of the clusters thus obtained
3. Iterate 1. and 2. until convergence

K= 3 hypothetical centroids (@)

* *

* * * * * * * * @ @

@** * * *
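The generic loop above can be sketched in a few lines of plain Python; the data points and seed positions below are made up for illustration, not taken from the talk.

```python
def kmeans(points, seeds, max_iter=100):
    """Alternate the minimum-distance rule (1.) and the centroid update (2.)."""
    centroids = [list(s) for s in seeds]
    for _ in range(max_iter):
        # 1. assign every point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)),
                    key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            clusters[k].append(p)
        # 2. move every centroid to the gravity centre of its cluster
        new = [[sum(xs) / len(c) for xs in zip(*c)] if c else centroids[k]
               for k, c in enumerate(clusters)]
        if new == centroids:      # 3. stop at convergence
            break
        centroids = new
    return centroids, clusters

# made-up data: two well-separated groups of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, cls = kmeans(pts, seeds=[(0, 0), (5, 5)])
```

With these seeds the assignment and the centroid update stabilise after one round, splitting the six points into two clusters of three.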

Page 10:

K-Means: a generic clustering method

Entities are presented as multidimensional points (*)
0. Put K hypothetical centroids (seeds)
1. Assign points to the centroids according to the minimum-distance rule
2. Put centroids in the gravity centres of the clusters thus obtained
3. Iterate 1. and 2. until convergence
4. Output the final centroids and clusters

* * @ * * * @ * * * *

** * * *

@

Page 11:

Advantages of K-Means:
- Models typology building
- Computationally effective
- Can be utilised incrementally, 'on-line'

Shortcomings (?) of K-Means:
- Initialisation affects results
- Convex cluster shape

Page 12:

Initial Centroids: Correct

Two cluster case

Page 13:

Initial Centroids: Correct

Initial Final

Page 14:

Different Initial Centroids

Page 15:

Different Initial Centroids: Wrong

Initial Final

Page 16:

Issues: K-Means gives no advice on:
- the number of clusters
- the initial setting
- data normalisation
- mixed variable scales
- multiple data sets

K-Means gives limited advice on:
- interpretation of results

These all can be addressed with the data recovery approach.

Page 17:

- Clustering, K-Means and issues
- Data recovery PCA model and clustering
- Data scatter decompositions: quadratic and absolute
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 18:

Data recovery for data mining (discovery of patterns in data)

Type of data: similarity; temporal; entity-to-feature
Type of model: regression; principal components; clusters

Model: Data = Model_Derived_Data + Residual
Pythagoras: |Data|^m = |Model_Derived_Data|^m + |Residual|^m, for m = 1, 2.

The better the fit, the better the model: a natural source of optimisation problems.
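The Pythagorean decomposition for m = 2 is easy to check numerically. The toy vectors below are made up, and the "model" here is just a least-squares fit of a constant, so the residual is orthogonal to the model-derived data.

```python
# made-up data vector and the simplest possible model: fitting a constant
data = [3.0, 1.0, 4.0, 0.0]
predictor = [1.0, 1.0, 1.0, 1.0]

# least-squares coefficient: projection of data on the predictor
coef = sum(d * p for d, p in zip(data, predictor)) / sum(p * p for p in predictor)
model = [coef * p for p in predictor]              # model-derived data
resid = [d - m for d, m in zip(data, model)]       # residual

sq = lambda v: sum(x * x for x in v)
# |Data|^2 = |Model_Derived_Data|^2 + |Residual|^2
assert abs(sq(data) - (sq(model) + sq(resid))) < 1e-9
```

For m = 1 the two norms need not add up exactly in general, which is one reason the L1 case studied in this talk is harder.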

Page 19:

K-Means as a data recovery method

Page 20:

Representing a partition

Cluster k:
- Centroid c_kv (v: a feature)
- Binary 1/0 membership z_ik (i: an entity)

Page 21:

Basic equations (same as for PCA, but the score vectors z_k are constrained to be binary):

$$y_{iv} \;=\; \sum_{k=1}^{K} c_{kv}\, z_{ik} + e_{iv}$$

y: data entry; z: 1/0 membership, not a score; c: cluster centroid; N: cardinality; i: entity; v: feature/category; k: cluster.

Page 22:

- Clustering: general and K-Means
- Bilinear PCA model and clustering
- Data scatter decompositions L2 and L1
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 23:

Quadratic data scatter decomposition (classic)

$$\sum_{i=1}^{N}\sum_{v=1}^{V} y_{iv}^2 \;=\; \sum_{k=1}^{K}\sum_{v=1}^{V} N_k\, c_{kv}^2 \;+\; \sum_{k=1}^{K}\sum_{i\in S_k}\sum_{v=1}^{V} (y_{iv}-c_{kv})^2$$

K-Means: alternating least-squares minimisation of the residual term in the model

$$y_{iv} \;=\; \sum_{k=1}^{K} c_{kv}\, z_{ik} + e_{iv}$$

y: data entry; z: 1/0 membership; c: cluster centroid; N_k: cardinality of cluster S_k; i: entity; v: feature/category; k: cluster.
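The quadratic decomposition above can be verified on a small example; the data matrix and the two-cluster partition below are made up for illustration.

```python
# made-up 4 x 2 data matrix, assumed already centred, and a fixed partition
Y = [[1.0, 2.0], [2.0, 0.0], [8.0, 9.0], [9.0, 7.0]]
S = [[0, 1], [2, 3]]                     # clusters as lists of row indices

scatter = sum(y * y for row in Y for y in row)     # total data scatter T(Y)

explained = 0.0                                    # sum_k sum_v N_k c_kv^2
residual = 0.0                                     # within-cluster squared errors
for Sk in S:
    Nk = len(Sk)
    c = [sum(Y[i][v] for i in Sk) / Nk for v in range(len(Y[0]))]  # centroid
    explained += Nk * sum(cv * cv for cv in c)
    residual += sum((Y[i][v] - c[v]) ** 2 for i in Sk for v in range(len(c)))
```

Here `scatter` equals `explained + residual` exactly, as the decomposition states.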

Page 24:

Absolute data scatter decomposition (Mirkin 1997)

$$\sum_{i=1}^{N}\sum_{v=1}^{V} |y_{iv}| \;=\; \sum_{k=1}^{K}\sum_{v=1}^{V}\Big(2\sum_{i\in S_{kv}} |y_{iv}| \;-\; n_{kv}\,|c_{kv}|\Big) \;+\; \sum_{k=1}^{K}\sum_{i\in S_k}\sum_{v=1}^{V} |y_{iv}-c_{kv}|$$

where $S_{kv} = \{\, i \in S_k : 0 \le y_{iv} \le c_{kv} \ \text{or}\ c_{kv} \le y_{iv} \le 0 \,\}$ and $n_{kv} = |S_{kv}|$.

The c_kv are medians.
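One way to see why the c_kv in the L1 decomposition are medians: for any list of numbers, the median minimises the summed absolute deviation, just as the mean minimises the squared one. A quick check on made-up values:

```python
# made-up within-cluster values for one feature, including an outlier
xs = [1.0, 2.0, 2.0, 3.0, 10.0]

def l1_error(c):
    """Summed absolute deviation from a candidate centre c."""
    return sum(abs(x - c) for x in xs)

median = sorted(xs)[len(xs) // 2]          # 2.0
mean = sum(xs) / len(xs)                   # 3.6
assert l1_error(median) <= l1_error(mean)

# the median also beats every candidate on a fine grid
best = min(l1_error(c / 10) for c in range(0, 120))
assert l1_error(median) <= best + 1e-9
```

Note how the outlier pulls the mean but leaves the median (and hence the L1-optimal centre) in place.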

Page 25:

Outline
- Clustering: general and K-Means
- Bilinear PCA model and clustering
- Data scatter decompositions L2 and L1
- Implications for data pre-processing
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 26:

Meaning of the data scatter

$$D_m \;=\; \sum_{i=1}^{N}\sum_{v=1}^{V} |y_{iv}|^m \;=\; \sum_{v=1}^{V}\sum_{i=1}^{N} |y_{iv}|^m, \qquad m = 1, 2.$$

The sum of the features' contributions: the basis for feature pre-processing (dividing by the range rather than the standard deviation). Proportional to the summary variance (L2) / absolute deviation from the median (L1).

Page 27:

Standardisation of features

Y_ik = (X_ik − A_k) / B_k

X: original data; Y: standardised data; i: entities; k: features
A_k: shift of the origin, typically the average
B_k: rescaling factor, traditionally the standard deviation, but the range may be better in clustering
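The standardisation formula can be sketched directly; the matrix below is made up, and the choice of the range for B_k follows the slide's recommendation for clustering.

```python
# made-up mixed-magnitude data: feature 2 is on a much larger scale
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
cols = list(zip(*X))

A = [sum(col) / len(col) for col in cols]            # shift: the average
B = [max(col) - min(col) for col in cols]            # scale: the range
Y = [[(x - a) / b for x, a, b in zip(row, A, B)] for row in X]
```

After this step both features vary over an interval of length 1 around 0, so neither dominates the data scatter merely by its units of measurement.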

Page 28:

Normalising by the std decreases the effect of the more useful feature 2.

Normalising by the range keeps the effect of the distribution shape in T(Y); for categories, B = range × #categories (in the L2 case), under the equality-of-variables assumption.

B1 = Std1 << B2 = Std2

Page 29:

Data standardisation

- Categories as one/zero variables
- Subtracting the average
- All features: normalising by the range
- Categories: sometimes also by the number of them

Page 30:

Illustration of data pre-processing: a mixed-scale data table

Page 31:

Conventional quantitative coding + … data standardisation

Page 32:

No normalisation

Tom Sawyer

Page 33:

Z-scoring (scaling by std)

Tom Sawyer

Page 34:

Normalising by range*#categories

Tom Sawyer

Page 35:

Outline
- Clustering: general and K-Means
- Bilinear PCA model and clustering
- Data scatter decompositions: quadratic and absolute
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 36:

Contribution of a feature F to a partition (m=2)

$$\mathrm{Contrib}(F) \;=\; \sum_{k=1}^{K}\sum_{v\in F} N_k\, c_{kv}^2$$

Proportional to the correlation ratio η² if F is quantitative; to a contingency coefficient between the cluster partition and F if F is nominal: Pearson chi-squared (Poisson-normalised) or Goodman–Kruskal tau-b (range-normalised).

Page 37:

Contribution of a quantitative feature to a partition (m=2)

Proportional to the correlation ratio η² if F is quantitative:

$$\sum_{k=1}^{K} N_k\, c_k^2 \;=\; N\,\sigma^2\,\eta^2$$
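The link between the contribution of a centred quantitative feature and the correlation ratio can be verified numerically; the feature values and the two-cluster partition below are made up for illustration.

```python
# made-up feature values and a partition into two groups of indices
y = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
S = [[0, 1, 2], [3, 4, 5]]

N = len(y)
mean = sum(y) / N
y = [v - mean for v in y]                  # centre the feature
var = sum(v * v for v in y) / N            # sigma^2, the total variance

contrib = 0.0                              # sum_k N_k c_k^2
for Sk in S:
    ck = sum(y[i] for i in Sk) / len(Sk)   # within-cluster mean of the feature
    contrib += len(Sk) * ck * ck

between = contrib / N                      # between-group variance
eta2 = between / var                       # correlation ratio eta^2
```

Here `contrib` equals `N * var * eta2` exactly, which is the stated proportionality; the partition above separates the feature well, so eta2 is close to 1.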

Page 38:

Contribution of a pair nominal feature – partition, L2 case

$$\mathrm{Contr} \;=\; N \sum_{k}\sum_{j} \frac{(p_{kj} - p_k\, p_j)^2}{p_k\, B_j}$$

Proportional to a contingency coefficient: with $B_j = p_j$, Pearson chi-squared (Poisson-normalised); with $B_j = 1$, Goodman–Kruskal tau-b (range-normalised).

Still needs to be normalised by the square root of the number of categories, to balance the contribution of a numerical feature.

Page 39:

Contribution of a pair nominal feature – partition, L1 case

A highly original contingency coefficient:

$$\mathrm{Contr}(k, v) \;=\; N\,|2 p_{kv} - p_k\, p_v|$$

Still needs to be normalised by the square root of the number of categories, to balance the contribution of a numerical feature.

Page 40:

- Clustering: general and K-Means
- Bilinear PCA model and clustering
- Data scatter decompositions: quadratic and absolute
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 41:

Equivalent criteria (1)

A. Bilinear residuals squared, MIN:
$$\sum_{i=1}^{N}\sum_{v\in V} e_{iv}^2$$
Minimising the difference between the data and the cluster structure.

B. Distance-to-centre squared, MIN:
$$W \;=\; \sum_{k=1}^{K}\sum_{i\in S_k} d^2(i, c_k)$$
Minimising the summary squared distance from entities to their cluster centroids.

Page 42:

Equivalent criteria (2)

C. Within-group error squared, MIN:
$$\sum_{k=1}^{K}\sum_{i\in S_k}\sum_{v\in V} (c_{kv} - y_{iv})^2$$
Minimising the difference between the data and the cluster structure.

D. Weighted within-group variance, MIN:
$$\sum_{k=1}^{K} |S_k|\,\sigma^2(S_k)$$
Minimising the within-cluster variance.

Page 43:

Equivalent criteria (3)

E. Semi-averaged within-cluster distance squared, MIN:
$$\sum_{k=1}^{K}\,\sum_{i,j\in S_k} d^2(i,j)\,\big/\,|S_k|$$
Minimising dissimilarities within clusters.

F. Semi-averaged within-cluster similarity, MAX:
$$\sum_{k=1}^{K}\,\sum_{i,j\in S_k} a(i,j)\,\big/\,|S_k|, \qquad a(i,j) = \langle y_i, y_j \rangle$$
Maximising similarities within clusters.
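The equivalence of the centroid-based criterion B and the pairwise criterion E rests on a classical identity for squared Euclidean distances: summed over ordered pairs within a cluster, the pairwise distances equal 2|S| times the summed distances to the gravity centre. A check on a single made-up cluster:

```python
# one made-up cluster of 2-D points
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
n = len(pts)

d2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
c = tuple(sum(xs) / n for xs in zip(*pts))            # gravity centre

W = sum(d2(p, c) for p in pts)                        # criterion B, one cluster
pairs = sum(d2(p, q) for p in pts for q in pts)       # all ordered pairs
semi_avg = pairs / (2 * n)                            # pairwise form
```

Since `semi_avg` equals `W` (the slide's version, summing ordered pairs over |S_k|, differs only by the constant factor 2), minimising one criterion minimises the other, with no centroids appearing in the pairwise form at all.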

Page 44:

Equivalent criteria (4)

G. Distant centroids, MAX:
$$\sum_{k=1}^{K} |S_k| \sum_{v\in V} c_{kv}^2$$
Finding anomalous types.

H. Consensus partition, MAX:
$$\sum_{v=1}^{V} a(S, v)$$
Maximising the summary correlation between the sought partition S and the given variables v.

Page 45:

Equivalent criteria (5)

I. Spectral clusters, MAX:
$$\sum_{k=1}^{K} z_k^T Y Y^T z_k \,\big/\, z_k^T z_k$$
Maximising the summary Rayleigh quotient over binary vectors.
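Over unconstrained real vectors, the Rayleigh quotient of criterion I is maximised by the principal eigenvector of Y Yᵀ, which power iteration finds; the binary constraint on z is exactly what this relaxation drops. A minimal sketch on a made-up 3×2 data matrix:

```python
# made-up data: rows 0 and 1 are similar entities, row 2 is different
Y = [[2.0, 0.0], [1.9, 0.1], [0.0, 1.0]]

# A = Y Y^T, the 3 x 3 entity-to-entity similarity matrix
A = [[sum(a * b for a, b in zip(ri, rj)) for rj in Y] for ri in Y]

x = [1.0, 1.0, 1.0]
for _ in range(200):                       # power iteration
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    norm = sum(v * v for v in x) ** 0.5
    x = [v / norm for v in x]

# the maximised Rayleigh quotient x^T A x (x is unit-norm)
rayleigh = sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3))
```

The leading eigenvector loads heavily on the two similar entities and only weakly on the third, which is why thresholding it is the usual, though (as the next slides show) not always optimal, route back to a binary z.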

Page 46:

Gower's controversy: 2N+1 entities

Two-cluster possibilities:
W(c1, c2 / c3) = N² d(c1, c2)
W(c1 / c2, c3) = N d(c2, c3)
W(c1 / c2, c3) = o(W(c1, c2 / c3))

Separation over the grand mean/median rather than just over distances (in the most general setting of d).

c1 (N entities), c2 (N entities), c3 (1 entity)

Page 47:

Outline
- Clustering: general and K-Means
- Bilinear PCA model and clustering
- Data scatter decompositions: quadratic and absolute
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction strategy: Anomalous Patterns and iK-Means
- Comments on optimisation problems
- The issue of the number of clusters
- Conclusion and future work

Page 48:

PCA-inspired Anomalous Pattern clustering

$$y_{iv} = c_v z_i + e_{iv}, \qquad z_i = 1 \ \text{if}\ i \in S, \quad z_i = 0 \ \text{if}\ i \notin S$$

With the Euclidean distance squared:

$$\sum_{i=1}^{N}\sum_{v=1}^{V} y_{iv}^2 \;=\; N_S \sum_{v=1}^{V} c_{Sv}^2 \;+\; \sum_{i\in S}\sum_{v=1}^{V} (y_{iv} - c_{Sv})^2 \;+\; \sum_{i\notin S}\sum_{v=1}^{V} y_{iv}^2$$

or, in distance form,

$$\sum_{i=1}^{N} d(i, 0) \;=\; N_S\, d(c_S, 0) \;+\; \sum_{i\in S} d(i, c_S) \;+\; \sum_{i\notin S} d(i, 0).$$

c_S must be anomalous, that is, interesting.
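A sketch of one Anomalous Pattern extraction, assuming the data have been centred so that the origin is the grand mean. The seeding rule (the entity farthest from the origin) and the alternating updates follow the slides; the toy data are made up.

```python
def anomalous_pattern(Y):
    """Extract one AP cluster: seed at the farthest entity, then alternate."""
    d2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    zero = [0.0] * len(Y[0])
    c = max(Y, key=lambda p: d2(p, zero))            # farthest entity as seed
    while True:
        # keep only entities closer to the tentative centre than to the origin
        S = [i for i, p in enumerate(Y) if d2(p, c) < d2(p, zero)]
        new_c = [sum(Y[i][v] for i in S) / len(S) for v in range(len(Y[0]))]
        if new_c == list(c):                         # converged
            return S, new_c
        c = new_c

# made-up centred data: three anomalous entities far from the grand mean
Y = [[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1], [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
S, c = anomalous_pattern(Y)
```

The extracted cluster collects exactly the three distant entities, and its centre is anomalous (far from the origin), which is the point of the criterion.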

Page 49:

Spectral clustering can be non-optimal (1)

Spectral clustering (becoming popular, in a different setting): find the maximum eigenvector by maximising

$$x^T Y Y^T x \,\big/\, x^T x$$

over all possible x; then define $z_i = 1$ if $x_i > a$ and $z_i = 0$ if $x_i \le a$, for some threshold a.

Page 50:

Spectral clustering can be non-optimal (2)

Example (for similarity data; figure: a similarity graph on entities 1–20):

z: 0.681 (i = 1), 0.260 (i = 2), 0.126 (i = 3–5), 0.168 (i = 6–20)

This cannot be typical…

Page 51:

Initial setting with Anomalous Pattern Cluster

Tom Sawyer

Page 52:

Anomalous Pattern clusters: iterate

(figure: clusters 0–3 extracted one by one on the Tom Sawyer data)

Page 53:

iK-Means: Anomalous clusters + K-Means. After extracting 2 clusters (how can one know that 2 is right?)

Final

Page 54:

iK-Means: defining K and the initial setting with iterative Anomalous Pattern clustering

1. Find all Anomalous Pattern clusters
2. Remove smaller (e.g., singleton) clusters
3. Take the number of remaining clusters as K and initialise K-Means with their centres
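The three steps above can be sketched as a driver that repeatedly extracts Anomalous Pattern clusters. The helper `ap`, the data, and the `min_size` parameter (which encodes the "remove smaller clusters" step) are illustrative, and the data are again assumed centred at the grand mean.

```python
def ap(points):
    """One Anomalous Pattern cluster: seed farthest from 0, then alternate."""
    d2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    zero = (0.0,) * len(points[0])
    c = max(points, key=lambda p: d2(p, zero))
    while True:
        S = [p for p in points if d2(p, c) < d2(p, zero)]
        new_c = tuple(sum(xs) / len(S) for xs in zip(*S))
        if new_c == tuple(c):
            return S, new_c
        c = new_c

def ik_means_init(points, min_size=2):
    """Steps 1-3: extract APs one by one, drop small ones, return K and centres."""
    rest, centres = list(points), []
    while rest:
        S, c = ap(rest)
        if len(S) >= min_size:             # step 2: drop e.g. singletons
            centres.append(c)
        rest = [p for p in rest if p not in S]
    return len(centres), centres           # step 3: K and initial centroids

# made-up centred data: two tight groups plus one lone outlier
pts = [(5.0, 5.0), (5.2, 4.8), (-5.0, -5.0), (-4.8, -5.2), (9.0, -9.0)]
K, centres = ik_means_init(pts)
```

On this toy set the lone outlier forms a singleton AP and is discarded, so the procedure returns K = 2 with the two group centres as the K-Means initialisation.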

Page 55:

Outline
- Clustering: general and K-Means
- Bilinear PCA model and clustering
- Data scatter decompositions: quadratic and absolute
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 56:

Study of eight number-of-clusters methods (joint work with Mark Chiang):

- Variance based: Hartigan (HK); Calinski & Harabasz (CH); Jump Statistic (JS)
- Structure based: Silhouette Width (SW)
- Consensus based: Consensus Distribution area (CD); Consensus Distribution mean (DD)
- Sequential extraction of APs (iK-Means): Least Squares (LS); Least Moduli (LM)

Page 57:

Experimental results at 9 Gaussian clusters (3 size patterns), 1000 × 15 data

(table: for each method HK, CH, JS, SW, CD, DD, LS, LM, the estimated number of clusters and the Adjusted Rand Index, each under large and small spread; cells mark 1-, 2- and 3-time winners, with two winners counted each time)

Page 58:

- Clustering: general and K-Means
- Bilinear PCA model and clustering
- Data scatter decompositions: quadratic and absolute
- Contributions of nominal features
- Explications of the quadratic criterion
- One-by-one cluster extraction: Anomalous Patterns and iK-Means
- The issue of the number of clusters
- Comments on optimisation problems
- Conclusion and future work

Page 59:

Some other data recovery clustering models

- Hierarchical clustering: Ward agglomerative and Ward-like divisive; relation to wavelets and the Haar basis (1997)
- Additive clustering: partition and one-by-one clustering (1987)
- Biclustering: box clustering (1995)

Page 60:

Hierarchical clustering for conventional and spatial data

Model: the same,

$$y_{iv} \;=\; \sum_{k=1}^{K} c_{kv}\, z_{ik} + e_{iv}$$

Cluster structure: a 3-valued z representing a split. For a split S = S1 + S2 of a node S into children S1 and S2:

$$z_i = \begin{cases} 0 & \text{if } i \notin S \\ a & \text{if } i \in S_1 \\ -b & \text{if } i \in S_2 \end{cases}$$

If a and b are taken so that z is centred, the node vectors of a hierarchy form an orthogonal basis (an analogue to the SVD).

Page 61:

Last:

- The data recovery K-Means-wise model is an adequate tool that involves a wealth of interesting criteria for mathematical investigation.
- The L1 criterion remains a mystery, even for the most popular method, PCA.
- Greedy-wise approaches remain a vital element, both theoretically and practically.
- Evolutionary approaches have started sneaking in: they should be given more attention.
- Extending the approach to other data types is a promising direction.