Ensemble Clustering


ENSEMBLE CLUSTERING

[Diagram: unlabeled data is fed to clustering algorithms 1 through N, producing partitions 1 through N, which are then combined into a final partition]

Combine multiple partitions of the given data into a single partition of better quality

WHY ENSEMBLE CLUSTERING?

Different clustering algorithms may produce different partitions because they impose different structure on the data; no single clustering algorithm is optimal

Different realizations of the same algorithm may generate different partitions

WHY ENSEMBLE CLUSTERING?

Goal: exploit the complementary nature of different partitions; each partition can be viewed as taking a different “look” or “cut” through the data

(Punch, Topchy, and Jain, PAMI, 2005)

CHALLENGE I: HOW TO GENERATE CLUSTERING ENSEMBLES?

Produce a clustering ensemble by either:

Using different clustering algorithms, e.g., K-means, Hierarchical Clustering, Fuzzy C-means, Spectral Clustering, Gaussian Mixture Model, …

Running the same algorithm many times with different parameters or initializations, e.g., run the K-means algorithm N times using randomly initialized cluster centers, different dissimilarity measures, or different numbers of clusters

Using different samples of the data, e.g., many different bootstrap samples from the given data

Random projections (feature extraction), e.g., project the data onto a random subspace

Feature selection, e.g., use different subsets of features

A sketch of these generation strategies in code follows this list.
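A minimal sketch of three of these strategies in Python, assuming scikit-learn and NumPy are available; the toy data set and all variable names are illustrative, not part of the original slides:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.random_projection import GaussianRandomProjection

# Toy data standing in for the unlabeled input (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

ensemble = []

# 1) Same algorithm, different random initializations and numbers of clusters.
for seed in range(5):
    for k in (2, 3, 4):
        ensemble.append(KMeans(n_clusters=k, n_init=1, random_state=seed).fit_predict(X))

# 2) Different bootstrap samples: cluster a resample, then assign every point
#    to the nearest learned center so each partition covers the full data set.
rng = np.random.default_rng(0)
for _ in range(5):
    idx = rng.integers(0, len(X), size=len(X))        # bootstrap indices
    km = KMeans(n_clusters=3, n_init=1, random_state=0).fit(X[idx])
    ensemble.append(km.predict(X))

# 3) Random projections: cluster the data in a random low-dimensional subspace.
for seed in range(5):
    Xp = GaussianRandomProjection(n_components=1, random_state=seed).fit_transform(X)
    ensemble.append(KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(Xp))

ensemble = np.array(ensemble)   # shape: (n_partitions, n_samples)
```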

CHALLENGE II: HOW TO COMBINE MULTIPLE PARTITIONS?

According to (Vega-Pons & Ruiz-Shulcloper, 2011), ensemble clustering algorithms can be divided into:

Median partition based approaches

Object co-occurrence based approaches: relabeling/voting based methods, co-association matrix based methods, and graph based methods

MEDIAN PARTITION BASED APPROACHES

Basic idea: find a partition P* that maximizes the total similarity between P* and the N partitions in the ensemble P1, P2, …, PN, i.e., P* = argmax_P [ Sim(P, P1) + Sim(P, P2) + … + Sim(P, PN) ]

Need to define the similarity between two partitions:

Normalized mutual information (Strehl & Ghosh, 2002)

Utility function (Topchy, Jain, and Punch, 2005)

Fowlkes-Mallows index (Fowlkes & Mallows, 1983)

Purity and inverse purity (Zhao & Karypis, 2005)
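Solving for the exact median partition is intractable in general; a cheap and common approximation is to return the ensemble member with the highest average similarity to all partitions ("best of ensemble"). A minimal sketch using scikit-learn's NMI as the similarity; the function name median_partition is illustrative:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def median_partition(ensemble, candidates=None):
    """Return the candidate maximizing average NMI to all ensemble partitions.

    ensemble: array-like of shape (n_partitions, n_samples).
    candidates: partitions to score; defaults to the ensemble itself,
    giving the "best of ensemble" approximation to the median partition.
    """
    if candidates is None:
        candidates = ensemble
    scores = [np.mean([normalized_mutual_info_score(c, p) for p in ensemble])
              for c in candidates]
    return candidates[int(np.argmax(scores))]
```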

[Diagram: candidate median partition P* compared with the ensemble partitions P1, P2, P3, …, PN-1, PN through similarities S1, S2, S3, …, SN-1, SN]

RELABELING/VOTING BASED METHODS

Basic idea: first find the corresponding cluster labels among multiple partitions, then obtain the consensus partition through a voting process. (Ayad & Kamel, 2007; Dimitriadou et al., 2002; Dudoit & Fridlyand, 2003; Fischer & Buhmann, 2003; Tumer & Agogino, 2008; etc.)

Example: six points v1–v6 under three partitions, before and after re-labeling.

     P1 P2 P3              P1 P2 P3           P*
v1    1  3  2         v1    1  1  1            1
v2    1  3  2  ---->  v2    1  1  1   ---->    1
v3    2  1  2  re-    v3    2  2  1   voting   2
v4    2  1  3  label  v4    2  2  2            2
v5    3  2  1         v5    3  3  3            3
v6    3  2  1         v6    3  3  3            3

The label correspondence across partitions is found with the Hungarian algorithm; the final partition P* is then obtained by majority voting.
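A sketch of the re-labeling and voting steps, assuming NumPy and SciPy; relabel is a hypothetical helper that matches a partition's labels to a reference partition via the Hungarian algorithm (scipy.optimize.linear_sum_assignment), after which a per-point majority vote gives the consensus:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import mode

def relabel(reference, labels):
    """Map cluster labels onto the reference labeling (Hungarian algorithm)."""
    k = max(reference.max(), labels.max()) + 1
    # overlap[i, j] = number of points with reference label i and current label j;
    # maximizing total agreement = minimizing the negated overlap.
    overlap = np.zeros((k, k), dtype=int)
    for r, l in zip(reference, labels):
        overlap[r, l] += 1
    row, col = linear_sum_assignment(-overlap)
    mapping = dict(zip(col, row))
    return np.array([mapping[l] for l in labels])

# The toy ensemble from the slide, with labels shifted to start at 0.
P1 = np.array([0, 0, 1, 1, 2, 2])
P2 = np.array([2, 2, 0, 0, 1, 1])
P3 = np.array([1, 1, 1, 2, 0, 0])

aligned = np.vstack([P1, relabel(P1, P2), relabel(P1, P3)])
consensus = mode(aligned, axis=0, keepdims=False).mode   # majority vote per point
print(consensus)   # [0 0 1 1 2 2], i.e., the slide's P* = 1,1,2,2,3,3
```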

CO-ASSOCIATION MATRIX BASED METHODS

Basic idea: first compute a co-association matrix based on multiple data partitions, then apply a similarity-based clustering algorithm (e.g., single link or normalized cut) to the co-association matrix to obtain the final partition of the data. (Fred & Jain, 2005; Iam-On et al., 2008; Vega-Pons & Ruiz-Shulcloper, 2009; Wang et al., 2009; Li et al., 2007; etc.)

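A sketch of this pipeline, assuming NumPy and SciPy; co_association and consensus_single_link are illustrative names. Using single link on the co-association matrix corresponds to the evidence-accumulation approach of Fred & Jain (2005) cited above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def co_association(ensemble):
    """Fraction of partitions in which each pair of points shares a cluster.

    ensemble: array of shape (n_partitions, n_samples).
    """
    n = ensemble.shape[1]
    C = np.zeros((n, n))
    for labels in ensemble:
        C += (labels[:, None] == labels[None, :])
    return C / len(ensemble)

def consensus_single_link(ensemble, n_clusters):
    C = co_association(ensemble)
    D = 1.0 - C                    # turn co-association into a dissimilarity
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='single')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```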

GRAPH BASED METHODS

Basic idea: construct a weighted graph to represent multiple clustering results from the ensemble, then find the optimal partition of the data by minimizing the graph cut. (Fern & Brodley, 2004; Strehl & Ghosh, 2002; etc.)

Example: six points v1–v6 under three partitions, combined by graph clustering.

     P1 P2 P3             P*
v1    1  1  1              1
v2    1  2  2   ---->      2
v3    2  1  1   graph      1
v4    2  2  2   clustering 2
v5    3  3  3              3
v6    3  4  3              3
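One way to realize the graph formulation, loosely in the spirit of the bipartite object-cluster graph of Fern & Brodley (2004): link every object to each cluster it falls in across the ensemble, then partition that bipartite graph. The sketch below substitutes scikit-learn's SpectralCoclustering for the graph-cut solvers used in the cited papers; graph_consensus is an illustrative name:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

def graph_consensus(ensemble, n_clusters, random_state=0):
    """Bipartite object-cluster graph consensus (a sketch, not the cited method).

    Assumes each partition labels its clusters 0..k-1 with every label used
    (otherwise drop empty columns before fitting).
    """
    n = ensemble.shape[1]
    blocks = []
    for labels in ensemble:
        k = labels.max() + 1
        H = np.zeros((n, k))
        H[np.arange(n), labels] = 1.0   # object i is linked to its cluster
        blocks.append(H)
    H = np.hstack(blocks)               # incidence matrix of the bipartite graph
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=random_state)
    model.fit(H)
    return model.row_labels_            # consensus labels for the objects
```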

ENSEMBLE CLUSTERING IN IMAGE SEGMENTATION

Ensemble Clustering using Semidefinite Programming, Singh et al., NIPS 2007


OTHER RESEARCH PROBLEMS

Ensemble clustering theory:

Ensemble clustering converges to the true clustering as the number of partitions in the ensemble increases (Topchy, Law, Jain, and Fred, ICDM, 2004)

Bound the error incurred by approximation (Gionis, Mannila, and Tsaparas, TKDD, 2007)

Bound the error when some partitions in the ensemble are extremely bad (Yi, Yang, Jin, and Jain, ICDM, 2012)

Partition selection:

Adaptive selection (Azimi & Fern, IJCAI, 2009)

Diversity analysis (Kuncheva & Whitaker, Machine Learning, 2003); a small diversity sketch follows below
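Diversity between ensemble members can be quantified with any partition-similarity score; one simple option, not necessarily the measure used in the cited paper, is the average pairwise (1 - NMI). A minimal sketch, with mean_pairwise_diversity as an illustrative name:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def mean_pairwise_diversity(ensemble):
    """Average pairwise (1 - NMI) over all distinct pairs of partitions."""
    m = len(ensemble)
    d = [1.0 - normalized_mutual_info_score(ensemble[i], ensemble[j])
         for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(d))
```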
