
Page 1:

MeanShift, MedoidShift and QuickShift

Advanced Clustering Methods

References:

D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach Toward Feature Space Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24(5), pp. 603-619, 2002.

Y. Sheikh, E. Khan, and T. Kanade, "Mode-seeking by Medoidshifts," IEEE International Conference on Computer Vision (ICCV), 2007.

A. Vedaldi and S. Soatto, "Quick Shift and Kernel Methods for Mode Seeking," Proceedings of the European Conference on Computer Vision (ECCV), 2008.

Page 2:

MeanShift Clustering

nonparametric mode seeking

Page 3:

Background: Parzen Estimation

(a.k.a. Kernel Density Estimation)

Consider a mathematical model of how histograms are formed. Assume continuous data points.

Page 4:

Parzen Estimation

(a.k.a. Kernel Density Estimation)

Consider a mathematical model of how histograms are formed. Assume continuous data points.

• Convolve with a box filter of width w (e.g., [1 1 1])

• Subsample the results, with spacing w

• The resulting value at point u represents the count of data points falling in the range u - w/2 to u + w/2
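
To make this concrete, here is a minimal Python sketch of the convolve-then-subsample construction (the data points, grid resolution, and endpoints below are illustrative, not from the slides):

```python
import numpy as np

def box_histogram(points, w, lo, hi, step=0.01):
    # Place a unit spike at each continuous data point on a fine grid.
    grid = np.arange(lo, hi, step)
    spikes = np.zeros_like(grid)
    for p in points:
        spikes[np.argmin(np.abs(grid - p))] += 1.0
    # Convolve with a box filter of width w (measured in grid cells).
    width = int(round(w / step))
    density = np.convolve(spikes, np.ones(width), mode="same")
    # Subsample with spacing w: the value at u counts the data points
    # falling in the range u - w/2 to u + w/2.
    return grid[::width], density[::width]

bin_centers, counts = box_histogram([1.1, 1.3, 2.7, 3.0, 3.1], w=1.0, lo=0.0, hi=4.0)
```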

Page 5:

Example Histograms

Box filter:

[1 1 1]

[1 1 1 1 1 1 1]

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

Increased smoothing

(figure: the resulting histogram samples h[0] h[1] ... h[9] for each filter width; wider box filters give smoother histograms)

Page 6:

Why Formulate it This Way?

• Generalize from box filter to other filters (for example, Gaussian)

• Gaussian acts as a smoothing filter.

• Key idea: produces a nonparametric estimate of a continuous distribution from a discrete set of points.

Page 7:

Kernel Density Estimation

• Parzen windows: approximate the probability density by estimating the local density of points (same idea as a histogram)
  – Convolve the points with a window/kernel function (e.g., Gaussian) using a scale parameter (e.g., sigma)

from Hastie et al.
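
The same idea with a Gaussian kernel gives a smooth density estimate; a minimal sketch (data, evaluation grid, and sigma are illustrative):

```python
import numpy as np

def parzen_kde(x, data, sigma):
    # One Gaussian bump per data point, summed and normalized to a density.
    diffs = x[:, None] - data[None, :]
    bumps = np.exp(-0.5 * (diffs / sigma) ** 2)
    return bumps.sum(axis=1) / (len(data) * sigma * np.sqrt(2.0 * np.pi))

data = np.array([-1.0, -0.8, 0.1, 1.2, 1.3])
xs = np.linspace(-3.0, 3.0, 200)
density = parzen_kde(xs, data, sigma=0.3)  # try sigma = 0.1 vs 1.0 to see the scale effect
```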

Page 8:

Density Estimation at Different Scales

• Example: Density estimates for 5 data points with differently-scaled kernels

• Scale influences accuracy vs. generality (overfitting)

from Duda et al.

Page 9:

Smoothing Scale Selection

• Unresolved issue: how to pick the scale (sigma value) of the smoothing filter

• Answer for now: a user-settable parameter

from Duda et al.

Page 10:

Mean-Shift

The mean-shift algorithm is a hill-climbing algorithm that seeks modes of a density without explicitly computing that density.

The density is implicitly represented by the raw samples and a kernel function: it is the density that would be computed if Parzen estimation were applied to the data with the given kernel.

Page 11:

What is Mean Shift?

A tool for: finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N

(diagram: Data → Discrete PDF Representation → Non-parametric Density Estimation / Non-parametric Density GRADIENT Estimation (Mean Shift) → PDF Analysis)

PDF in feature space:
• Color space
• Scale space
• Actually any feature space you can conceive
• ...

Ukrainitz & Sarel, Weizmann

Page 12:

Intuitive Description

Distribution of identical billiard balls. Objective: find the densest region.

(figure: a region of interest over the points, its center of mass, and the mean-shift vector from the region's center to the center of mass)

Ukrainitz & Sarel, Weizmann

Pages 13-18: (the same slide repeated as animation frames: at each step the region of interest shifts along the mean-shift vector toward its center of mass, until it settles on the densest region)

Page 19:

MeanShift as Mode Seeking

(figure: the Parzen density estimate P(x), a sum of kernels centered at the data points x1, x2, x3)

Page 20:

MeanShift as Mode Seeking

Seeking the mode from an initial point y1:

• Construct a tight quadratic lower bound b at y1, with b(y1) = P(y1)

• Find y2 to maximize b

(figure: P(x) with the bound b touching it at y1, and y2 at the maximum of b)

Page 21:

MeanShift as Mode Seeking

Note: P(y2) >= b(y2) > b(y1) = P(y1); therefore P(y2) > P(y1).

(figure: P(y1) and P(y2) marked on the density, showing the guaranteed increase)

Page 22:

MeanShift as Mode Seeking

Move to y2 and repeat until you reach a mode.

(figure: successive iterates y2, y3, y4, y5, y6 climbing P(x) to the mode)

Page 23:

MeanShift in Equations

Parzen estimate using {x1, x2, ..., xN} (normalization constant omitted):

P(y) = (1/N) Σ_j φ(||y - x_j||^2)

where φ(r) is a convex, symmetric windowing (weighting) kernel; e.g., φ(r) = exp(-r)

Form a lower bound on each φ by expanding about r_0 = ||y_old - x_j||^2

note: first-order Taylor series expansion: f(r) ≈ f(r_0) + (r - r_0) f'(r_0) + h.o.t.

Page 24:

MeanShift in Equations

Lower bound of φ at r_0 (the tangent line; convexity makes it a lower bound):

φ(r) >= φ(r_0) + (r - r_0) φ'(r_0)

and substitute into the Parzen equation:

b(y) = (1/N) Σ_j [ φ(r_0j) + (||y - x_j||^2 - r_0j) φ'(r_0j) ]

now let w_j = -φ'(r_0j), where r_0j = ||y_old - x_j||^2,

to get a quadratic lower bound (w.r.t. y) on the Parzen density function:

b(y) = const - (1/N) Σ_j w_j ||y - x_j||^2

Page 25:

MeanShift in Equations

Quadratic lower bound on the Parzen density function:

b(y) = const - (1/N) Σ_j w_j ||y - x_j||^2

We want to find y to maximize this. We therefore need to minimize Σ_j w_j ||y - x_j||^2.

Take the derivative w.r.t. y, set it to 0, and solve:

y_new = ( Σ_j w_j x_j ) / ( Σ_j w_j )    (mean-shift update eqn)
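
As a sanity check, a small Python sketch (with the assumed profile φ(r) = exp(-r) and made-up data) verifying that one application of the update equation never decreases the Parzen estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                    # sample data points
P = lambda y: np.mean(np.exp(-np.sum((X - y) ** 2, axis=1)))  # Parzen estimate

y_old = np.array([2.0, -1.0])
w = np.exp(-np.sum((X - y_old) ** 2, axis=1))   # w_j = -phi'(r_0j) for phi(r) = exp(-r)
y_new = (w[:, None] * X).sum(axis=0) / w.sum()  # mean-shift update eqn
assert P(y_new) >= P(y_old)                     # guaranteed by the lower-bound argument
```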

Page 26:

MeanShift Clustering

For each data point xi:

  Let yi(0) = xi; t = 0

  Repeat:
    Compute weights w1, w2, ..., wN for all data points, using yi(t)
    Compute the updated location yi(t+1) using the mean-shift update eqn
    If yi(t) and yi(t+1) are "close enough", declare convergence to a mode and exit
    else t = t + 1
  end (repeat)

The number of distinct modes found = the number of clusters.

Points that converge to the "same" mode are labeled as belonging to one cluster.
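
A runnable sketch of this procedure, under assumptions the slide leaves open: a Gaussian profile for the weights, and a heuristic merge radius for deciding when two converged points reached the "same" mode:

```python
import numpy as np

def mean_shift_cluster(X, sigma=1.0, tol=1e-5, max_iter=500):
    N = len(X)
    modes = np.empty((N, X.shape[1]))
    for i in range(N):
        y = X[i].astype(float)
        for _ in range(max_iter):
            # Weights w_j for the Gaussian profile, evaluated at the current y.
            w = np.exp(-np.sum((X - y) ** 2, axis=1) / sigma**2)
            y_next = (w[:, None] * X).sum(axis=0) / w.sum()  # mean-shift update eqn
            if np.linalg.norm(y_next - y) < tol:             # "close enough": a mode
                break
            y = y_next
        modes[i] = y_next
    # Points whose modes coincide (up to a heuristic merge radius) share a cluster.
    labels, centers = np.full(N, -1, dtype=int), []
    for i in range(N):
        for c, m in enumerate(centers):
            if np.linalg.norm(modes[i] - m) < 100 * tol:
                labels[i] = c
                break
        else:
            centers.append(modes[i])
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```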

Page 27:

What is MedoidShift?

Instead of solving for a continuous-valued y to maximize the quadratic lower bound on the Parzen density estimate, consider only a discrete set of y locations, taken from the initial set of data points {x1, x2, ..., xN}.

yi(0) = xi; each subsequent y is now chosen from the discrete set of data points.

That is, for the first step toward the mode, yi(1) is the data point that maximizes the lower bound b constructed at xi.

Page 28:

MedoidShift

Implications of this strategy for clustering:

Since we are choosing the best y from among {x1, x2, ..., xN}, yi(1) will be one of the original data points. Therefore, we only need to compute a single step starting from each data point: yi(2) is simply the (already computed) first step taken from the data point yi(1).

(figure: points x1 ... x5; each first step yi(1) points at another data point, so following the links y5(0) → y5(1) → y5(2) → y5(3) traces a path to the mode)

Page 29:

MedoidShift

Different clusters are found as separate connected components (trees, actually, so efficient tree traversal can locate all the points in each cluster).

Also note: there is no need for heuristic distance thresholds to test for convergence to a mode, or to determine membership in a cluster.

(figure: eleven points x1 ... x11 whose medoidshift links form separate trees, one tree per cluster)
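
A compact Python sketch of this scheme (Gaussian profile assumed): parent[i] is the single medoidshift step from xi, and following parent links to a root yields one label per tree:

```python
import numpy as np

def medoidshift_parents(X, sigma=1.0):
    # Pairwise squared distances D[k, j] and positive weights F[j, i] = w_j at x_i.
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    F = np.exp(-D / sigma**2)
    # Column i of D @ F is sum_j D[k, j] * w_j; minimizing over k picks the
    # data point that maximizes the quadratic lower bound built at x_i.
    return np.argmin(D @ F, axis=0)

def root_of(parent, i):
    seen = set()                              # guard against ties forming a cycle
    while parent[i] != i and i not in seen:
        seen.add(i)
        i = parent[i]
    return i

X = np.random.randn(60, 2)
parent = medoidshift_parents(X)
labels = np.array([root_of(parent, i) for i in range(len(X))])  # one label per tree
```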

Page 30:

MedoidShift

MedoidShift behavior asymptotically approaches MeanShift behavior as the number of data samples approaches infinity.

Page 31:

Advantages of MedoidShift

• Automatically determines the number of clusters (as does mean-shift)

• Previous computations can be reused when new samples are added or old samples are deleted (good for incremental clustering applications)

• Can work in domains where only distances between samples are defined (e.g., EMD or ISOMAP); there is no need to compute a mean

• No need for heuristic terminating conditions (no distance threshold is needed to determine convergence)

Page 32:

MeanShift vs MedoidShift

Page 33:

Drawbacks of MedoidShift

• O(N^3) [although not really; see the Vedaldi paper]

• Can fail to find all the modes of a density, thereby causing over-fragmentation

(figure: from the starting point, a continuous meanshift step moves into the gap between samples, but medoidshift has to stay at the starting data point, so the far mode is never reached)

Page 34:

Iterated MedoidShift

• Keep a count of how many points move to each xj, and apply a new iteration of medoidshift with points weighted by that count.

• Each iteration essentially uses a smoother Parzen estimate.

Page 35:

Seeds of QuickShift!

Page 36:

Vedaldi and Soatto

Contributions:

• If using Euclidean distance, MedoidShift can actually be implemented in O(N^2)

• Generalize to non-Euclidean distances using the kernel trick

• Show that MedoidShift does not always find all modes

• Propose QuickShift, an alternative approach that is simple, fast, and yields a one-parameter family of clustering results that explicitly represents the tradeoff between over- and under-fragmentation

Page 37:

Complexity of MedoidShift

yi(1) = argmax_k Σ_j D_kj F_ji, where D_kj = ||x_k - x_j||^2 and F_ji = φ'(D_ji)    (N×N matrices)

note: define N×N matrices D = [D_kj] and F = [F_ji]; then the matrix multiply DF does the summation over j, and the argmax looks for the max element in column i. Computing the full product DF naively takes O(N^3).

Page 38:

Complexity of MedoidShift

However, if using Euclidean distance (which is typical), the computation time is better than this.

Page 39:

Euclidean MedoidShift

Let X = [x1 x2 ... xN] be the d×N matrix of data points, and let s be the N-vector of squared norms, s_k = x_k^T x_k; then we can write the squared Euclidean distance matrix D as

D = s 1^T - 2 X^T X + 1 s^T

Page 40:

Euclidean MedoidShift

Therefore

DF = s (1^T F) + 1 (s^T F) - 2 X^T (X F)

The term 1 (s^T F) has constant columns and can be ignored (the argmax is taken within a single column).

We have to compute s (1^T F) and X^T (X F), but these quantities can all be computed in O(N^2) time (for fixed dimension d).
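
A NumPy sketch of this factorization, using positive weights w_j = -φ'(r) in F so the slide's argmax becomes an argmin; D is never built from pairwise loops and the whole step costs O(d N^2):

```python
import numpy as np

def euclidean_medoidshift_parents(X, F):
    """X: d x N data columns; F: N x N positive weights (column i = weights at x_i)."""
    s = np.sum(X * X, axis=0)                  # squared norms, shape (N,)
    # D @ F = s (1^T F) + 1 (s^T F) - 2 X^T (X F); the middle term is constant
    # within each column, so it cannot change the per-column argmin and is dropped.
    DF = np.outer(s, F.sum(axis=0)) - 2.0 * (X.T @ (X @ F))
    return np.argmin(DF, axis=0)               # best next point for each start i
```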

Page 41:

Kernel MedoidShift

Insight: medoidshift is defined purely in terms of pairwise distances between data points.

Therefore, using the "kernel trick", we can generalize it to handle non-Euclidean distances defined by a kernel matrix K (similar to how many machine learning algorithms are "kernelized").

Page 42:

Kernel MedoidShift

In feature space, squared distances can be written in terms of the kernel matrix:

D_kj = K_kk - 2 K_kj + K_jj

and we can apply medoidshift using this DF matrix instead.

Page 43:

Kernel MedoidShift

note: as on the previous page, D = k 1^T + 1 k^T - 2 K with k = diag(K), so DF = k (1^T F) + 1 (k^T F) - 2 K F

But the product K F is a full N×N matrix multiply, which is O(N^3) in general.

Page 44:

What is QuickShift?

A combination of the above kernel trick and...

instead of choosing the data point that optimizes the lower-bound function b(x) of the Parzen estimate P(x),

choose the closest data point that yields a higher value of P(x) directly:

yi(1) = argmin over { x_j : P(x_j) > P(x_i) } of d(x_i, x_j)
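
A short Python sketch of this rule (Gaussian kernel assumed for the Parzen density; the O(N^2) double loop is written for clarity, not speed):

```python
import numpy as np

def quickshift_parents(X, sigma=1.0):
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # squared distances
    P = np.exp(-D / sigma**2).sum(axis=1)                     # Parzen density at each point
    N = len(X)
    parent = np.arange(N)                 # density maxima remain their own parent
    link = np.full(N, np.inf)             # squared length of each link
    for i in range(N):
        for j in range(N):
            # nearest neighbor with strictly higher density
            if P[j] > P[i] and D[i, j] < link[i]:
                parent[i], link[i] = j, D[i, j]
    return parent, np.sqrt(link)          # tree links and their lengths d(x_i, x_j)
```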

Page 45:

QuickShift vs MedoidShift

• Recall our failure case for medoidshift

(figure: from the starting point, continuous meanshift moves into the gap, and medoidshift has to stay put, but quickshift jumps across the gap to the nearest data point of higher density)

Page 46:

Comparisons

(figure: side-by-side clustering results for meanshift, medoidshift, and quickshift)

Page 47:

Quickshift Benefits

• Similar benefits to medoidshift, plus some additional ones:

• All points are in one connected tree. You can therefore explore various clusterings by choosing a threshold T and pruning the edges for which d(xi, xj) > T (see the sketch below)

• Runs much faster in practice than either meanshift or medoidshift
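
Continuing the quickshift sketch above, pruning the long links and reading off cluster labels might look like this (T is the user-chosen threshold):

```python
import numpy as np

def quickshift_labels(parent, link_len, T):
    parent = parent.copy()
    broken = link_len > T                        # prune edges longer than T ...
    parent[broken] = np.flatnonzero(broken)      # ... turning their tails into roots
    labels = np.empty(len(parent), dtype=int)
    for i in range(len(parent)):
        r = i
        while parent[r] != r:                    # links go strictly uphill in density,
            r = parent[r]                        # so the walk always terminates
        labels[i] = r
    return labels

# e.g., with the previous sketch:  parent, link = quickshift_parents(X)
#                                  labels = quickshift_labels(parent, link, T=0.5)
```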

Page 48:

Applications: image segmentation by clustering (r,g,b), or (r,g,b,x,y), or whatever your favorite color space is

Page 49:

Applications: manifold clustering using ISOMAP distance

Page 50:

Applications: clustering bag-of-features signatures using a Chi-squared kernel (distances between histograms) to discover visual "categories" in Caltech-4

(figure: clusters in 400-D space, visualized by projection to rank 2)

5 discovered categories; the ground-truth category "airplanes" is split into two clusters.

Page 51:

Applications: clustering images using a Bhattacharyya coefficient on 5D histograms composed of (L,a,b,x,y) values