DENCLUE 2.0: Fast Clustering based on Kernel Density Estimation
Alexander Hinneburg, Martin-Luther-University Halle-Wittenberg, Germany
Hans-Henning Gabriel, 101tec GmbH, Halle, Germany
Overview
• Density-based clustering and DENCLUE 1.0
• Hill climbing as EM-algorithm
• Identification of local maxima
• Applications of general EM-acceleration
• Experiments
Density-Based Clustering
• Assumption
  – clusters are regions of high density in the data space
• How to estimate density?
  – parametric models
    • mixture models
  – non-parametric models
    • histogram
    • kernel density estimation
Kernel Density Estimation
• Idea
  – influence of a data point is modeled by a kernel
  – density is the normalized sum of all kernels
  – smoothing parameter h
Gaussian Kernel
Density Estimate
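The formulas behind these two labels do not survive the slide export; in standard notation, the d-dimensional Gaussian kernel and the kernel density estimate over data points x_1, …, x_N with bandwidth h read:

```latex
K(u) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\lVert u \rVert^{2}\right),
\qquad
\hat{p}(x) = \frac{1}{N h^{d}} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)
```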
DENCLUE 1.0 Framework
• Clusters are defined by local maxima of the density estimate
  – find all maxima by hill climbing
• Problem
  – constant step size
Gradient and hill climbing with constant step size:
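The slide's equations are lost in the export; a reconstruction of the DENCLUE 1.0 hill climbing, assuming the Gaussian kernel with bandwidth h and a fixed step size δ, is:

```latex
\nabla \hat{p}(x) = \frac{1}{N h^{d+2}} \sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)(x_i - x),
\qquad
x^{(t+1)} = x^{(t)} + \delta \, \frac{\nabla \hat{p}\bigl(x^{(t)}\bigr)}{\bigl\lVert \nabla \hat{p}\bigl(x^{(t)}\bigr) \bigr\rVert}
```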
Problem of Constant Step Size
• Not efficient
  – many unnecessary small steps
• Not effective
  – does not converge to a local maximum, just comes close
• Example
New Hill Climbing Approach
• General approach
  – differentiate the density estimate and set it to zero
  – no closed-form solution, but the equation can be used for iteration (see the derivation sketched below)
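A sketch of that derivation, assuming the Gaussian kernel: setting the gradient of the density estimate to zero and solving for the x that appears outside the kernel yields a fixed-point update whose step size adapts automatically:

```latex
\nabla \hat{p}(x) = 0
\;\Longrightarrow\;
x = \frac{\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right) x_i}{\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)}
\;\Longrightarrow\;
x^{(t+1)} = \frac{\sum_{i=1}^{N} K\!\left(\frac{x^{(t)} - x_i}{h}\right) x_i}{\sum_{i=1}^{N} K\!\left(\frac{x^{(t)} - x_i}{h}\right)}
```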
New DENCLUE 2.0 Hill Climbing
• Efficient
  – automatically adjusted step size at no extra cost
• Effective
  – converges to a local maximum (proof follows; a code sketch is given below)
• Example
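A minimal sketch of this hill climbing in Python, assuming the Gaussian kernel with bandwidth h; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def denclue2_hill_climb(x0, data, h, tol=1e-5, max_iter=100):
    """Fixed-point hill climbing towards a local maximum of the
    Gaussian kernel density estimate (step size adapts automatically)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # kernel weights of all data points relative to the current position
        sq_dist = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-0.5 * sq_dist / h**2)
        # next position = kernel-weighted mean of the data
        x_new = (w[:, None] * data).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# usage: climb from every data point, then group the end points into clusters
# data = np.random.rand(500, 2)
# end_points = np.array([denclue2_hill_climb(p, data, h=0.1) for p in data])
```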
Proof of Convergence
• Cast the problem of maximizing the kernel density as maximizing the likelihood of a mixture model
• Introduce a hidden variable
Proof of Convergence
• The complete likelihood is maximized by the EM algorithm
• This also maximizes the original likelihood, which is the kernel density estimate
• Starting the EM iteration at a point performs the hill climbing for that point
E-Step
M-Step
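The formulas for both steps are lost in the export; under the mixture view above (equal-weight Gaussian kernels centered at the data points, bandwidth h), the standard EM steps would read as follows, with the M-step reproducing the DENCLUE 2.0 update:

```latex
\text{E-step:}\quad
q_i = \frac{K\!\left(\frac{x^{(t)} - x_i}{h}\right)}{\sum_{j=1}^{N} K\!\left(\frac{x^{(t)} - x_j}{h}\right)},
\qquad
\text{M-step:}\quad
x^{(t+1)} = \sum_{i=1}^{N} q_i \, x_i
```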
Identification of Local Maxima
• The EM algorithm iterates until the sum of the k last step sizes falls below a threshold; the point reached is taken as the end point
• Assumption
  – the true local maximum lies in a small ball around the end point
• Points whose end points are closer than the ball's size belong to the same maximum M (see the sketch below)
• In case of a non-unique assignment, do a few extra EM iterations
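A minimal sketch of the assignment step, assuming that end points within a distance threshold eps are merged into one maximum; the greedy merging and all names are illustrative, not from the paper:

```python
import numpy as np

def assign_clusters(end_points, eps):
    """Greedily group hill-climbing end points that lie within eps of an
    already discovered maximum; returns cluster labels and the maxima."""
    maxima = []                                   # one representative per cluster
    labels = np.empty(len(end_points), dtype=int)
    for idx, p in enumerate(end_points):
        for c, m in enumerate(maxima):
            if np.linalg.norm(p - m) <= eps:      # close enough: same maximum
                labels[idx] = c
                break
        else:                                     # no nearby maximum: new cluster
            maxima.append(p)
            labels[idx] = len(maxima) - 1
    return labels, np.array(maxima)
```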
Acceleration
• Sparse EM
  – update only the p% of points with the largest posterior (sketched below)
  – saves the remaining (1-p)% of kernel computations after the first iteration
• Data reduction
  – use only p% of the data as representative points
  – random sampling
  – k-means
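A rough sketch of one reading of the sparse EM idea, reusing the hill climbing above: after the first full iteration, only the kernels of the points with the largest posterior weights are recomputed, while the contribution of the remaining points stays frozen. The active/frozen split and all names are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def sparse_hill_climb(x0, data, h, p=0.2, tol=1e-5, max_iter=100):
    """Hill climbing that, after one full E-step, recomputes kernel weights
    only for the top-p fraction of points (largest posterior) and keeps the
    other points' contribution fixed (assumption)."""
    x = np.asarray(x0, dtype=float)
    w = np.exp(-0.5 * np.sum((data - x) ** 2, axis=1) / h**2)   # full E-step
    order = np.argsort(w)
    k = max(1, int(p * len(data)))
    active = order[-k:]                    # points kept up to date
    frozen = order[:-k]                    # contribution frozen after iteration 1
    frozen_w = w[frozen].sum()
    frozen_wx = (w[frozen, None] * data[frozen]).sum(axis=0)
    for _ in range(max_iter):
        w_act = np.exp(-0.5 * np.sum((data[active] - x) ** 2, axis=1) / h**2)
        x_new = (frozen_wx + (w_act[:, None] * data[active]).sum(axis=0)) \
                / (frozen_w + w_act.sum())
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```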
Experiments
• Comparison of DENCLUE 1.0 (FS) vs. 2.0 (SSA)
• 16-dim. artificial data
• both methods are tuned to find the correct clustering
Experiments
• Cluster quality (NMI) of DENCLUE 2.0 (SSA), its acceleration methods, and k-means on real data
  – sample sizes 0.8, 0.4, 0.2
Conclusion
• New hill climbing for DENCLUE
• Automatic step size adjustment
• Convergence proof by reduction to EM
• Allows the application of general EM accelerations
• Future work
  – automatic setting of the smoothing parameter h (so far tuned manually)