
ECE 5984: Introduction to Machine Learning

Dhruv Batra Virginia Tech

Topics: –  Unsupervised Learning: Kmeans, GMM, EM

Readings: Barber 20.1-20.3

Midsem Presentations Graded •  Mean 8/10 = 80%

–  Min: 3 –  Max: 10

(C) Dhruv Batra 2

Tasks

(C) Dhruv Batra 3

Supervised Learning:
–  Classification: x → y, y discrete
–  Regression: x → y, y continuous

Unsupervised Learning:
–  Clustering: x → c, c a discrete cluster ID
–  Dimensionality Reduction: x → z, z continuous

Unsupervised Learning •  Learning only with X

–  Y not present in training data

•  Some example unsupervised learning problems: –  Clustering / Factor Analysis –  Dimensionality Reduction / Embeddings –  Density Estimation with Mixture Models

(C) Dhruv Batra 4

New Topic: Clustering

Slide Credit: Carlos Guestrin 5

Synonyms •  Clustering

•  Vector Quantization

•  Latent Variable Models •  Hidden Variable Models •  Mixture Models

•  Algorithms: –  K-means –  Expectation Maximization (EM)

(C) Dhruv Batra 6

Some Data

7 (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means

1.  Ask user how many clusters they’d like.

(e.g. k=5)

8 (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means

1.  Ask user how many clusters they’d like.

(e.g. k=5)

2.  Randomly guess k cluster Center

locations

9 (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means

1.  Ask user how many clusters they’d like.

(e.g. k=5)

2.  Randomly guess k cluster Center

locations

3.  Each datapoint finds out which Center it’s

closest to. (Thus each Center “owns” a set of datapoints)

10 (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means

1.  Ask user how many clusters they’d like.

(e.g. k=5)

2.  Randomly guess k cluster Center

locations

3.  Each datapoint finds out which Center it’s

closest to.

4.  Each Center finds the centroid of the

points it owns

11 (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means

1.  Ask user how many clusters they’d like.

(e.g. k=5)

2.  Randomly guess k cluster Center

locations

3.  Each datapoint finds out which Center it’s

closest to.

4.  Each Center finds the centroid of the

points it owns…

5.  …and jumps there

6.  …Repeat until terminated! 12 (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means •  Randomly initialize k centers

–  $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$

•  Assign:
–  Assign each point $i \in \{1, \ldots, n\}$ to the nearest center:
–  $C(i) \leftarrow \arg\min_j \|x_i - \mu_j\|^2$

•  Recenter:
–  $\mu_j$ becomes the centroid of its points

13 (C) Dhruv Batra Slide Credit: Carlos Guestrin
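Not on the original slide: a minimal NumPy sketch of the assign/recenter loop just described; the function and variable names (kmeans, n_iters, etc.) are illustrative, not from the lecture.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means. X has shape (N, d); returns centers mu (k, d) and assignments C (N,)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the k centers at k distinct data points
    mu = X[rng.choice(len(X), size=k, replace=False)]
    C = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assign: C(i) = argmin_j ||x_i - mu_j||^2
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, k)
        C = dists.argmin(axis=1)
        # Recenter: mu_j becomes the centroid of the points it owns
        new_mu = np.array([X[C == j].mean(axis=0) if np.any(C == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):    # centers stopped moving: converged
            break
        mu = new_mu
    return mu, C
```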

K-means •  Demo

–  http://www.kovan.ceng.metu.edu.tr/~maya/kmeans/
–  http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

(C) Dhruv Batra 14

What is K-means optimizing? •  Objective F(µ, C): function of centers µ and point allocations C:

–  $F(\mu, C) = \sum_{i=1}^{N} \|x_i - \mu_{C(i)}\|^2$

–  1-of-k encoding $a$: $F(\mu, a) = \sum_{i=1}^{N} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$

•  Optimal K-means: –  $\min_\mu \min_a F(\mu, a)$

15 (C) Dhruv Batra
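Not from the slides either: with the kmeans sketch above, the objective F(µ, C) can be evaluated in one line, which is handy for checking that it never increases across iterations.

```python
import numpy as np

def kmeans_objective(X, mu, C):
    # F(mu, C) = sum_i ||x_i - mu_{C(i)}||^2
    return float(((X - mu[C]) ** 2).sum())
```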

Coordinate descent algorithms

16 (C) Dhruv Batra Slide Credit: Carlos Guestrin

•  Want: mina minb F(a,b)

•  Coordinate descent: –  fix a, minimize b –  fix b, minimize a –  repeat

•  Converges!!! –  if F is bounded –  to a (often good) local optimum

•  as we saw in applet (play with it!)

•  K-means is a coordinate descent algorithm!

K-means as Co-ordinate Descent

•  Optimize objective function:

$\min_{\mu_1,\ldots,\mu_k}\; \min_{a_1,\ldots,a_N}\; F(\mu, a) \;=\; \min_{\mu_1,\ldots,\mu_k}\; \min_{a_1,\ldots,a_N}\; \sum_{i=1}^{N} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$

•  Fix µ, optimize a (or C)

17 (C) Dhruv Batra Slide Credit: Carlos Guestrin

K-means as Co-ordinate Descent

•  Optimize objective function:

$\min_{\mu_1,\ldots,\mu_k}\; \min_{a_1,\ldots,a_N}\; F(\mu, a) \;=\; \min_{\mu_1,\ldots,\mu_k}\; \min_{a_1,\ldots,a_N}\; \sum_{i=1}^{N} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$

•  Fix a (or C), optimize µ

18 (C) Dhruv Batra Slide Credit: Carlos Guestrin
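A step the slides leave implicit (standard k-means algebra, not copied from the deck): each coordinate sub-problem has a closed-form solution.

```latex
% Fix \mu, optimize a: each point picks its nearest center
a_{ij} = \begin{cases} 1 & \text{if } j = \arg\min_{j'} \|x_i - \mu_{j'}\|^2 \\ 0 & \text{otherwise} \end{cases}

% Fix a, optimize \mu: setting \partial F / \partial \mu_j = 0 gives the centroid
\mu_j = \frac{\sum_{i=1}^{N} a_{ij}\, x_i}{\sum_{i=1}^{N} a_{ij}}
```

These two updates are exactly the Assign and Recenter steps, which is why each k-means iteration can only decrease F.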

One important use of K-means •  Bag-of-words models in computer vision

(C) Dhruv Batra 19

Bag of Words model

aardvark 0

about 2

all 2

Africa 1

apple 0

anxious 0

...

gas 1

...

oil 1

Zaire 0

Slide Credit: Carlos Guestrin (C) Dhruv Batra 20

Object Bag of ‘words’

Fei-Fei Li

Fei-Fei Li

Interest Point Features

Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]

Normalize patch

Compute SIFT descriptor [Lowe ’99]

Slide credit: Josef Sivic

Patch Features

Slide credit: Josef Sivic

dictionary formation

Slide credit: Josef Sivic

Clustering (usually k-means)

Vector quantization

Slide credit: Josef Sivic

Clustered Image Patches

Fei-Fei et al. 2005

Visual Polysemy. A single visual word occurring on different (but locally similar) parts of different object categories.

Visual Synonyms. Two different visual words representing a similar part of an object (wheel of a motorbike).

Visual synonyms and polysemy

Andrew Zisserman

Image representation

[Figure: bag-of-words image representation as a histogram of codeword frequencies]

Fei-Fei Li
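Not part of the slides: a rough sketch of the bag-of-visual-words pipeline just described, assuming local descriptors (e.g. 128-d SIFT vectors) have already been extracted; it reuses the kmeans function sketched earlier, and all names here are illustrative.

```python
import numpy as np

def build_vocabulary(all_descriptors, k=1000):
    """Cluster descriptors pooled from many training images into k visual words."""
    codebook, _ = kmeans(all_descriptors, k)    # kmeans as sketched above
    return codebook                             # (k, d) cluster centers = codewords

def bow_histogram(image_descriptors, codebook):
    """Vector-quantize one image's descriptors and count codeword frequencies."""
    d2 = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                   # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                    # normalized histogram = image representation
```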

(One) bad case for k-means

•  Clusters may overlap
•  Some clusters may be “wider” than others

•  GMM to the rescue!

Slide Credit: Carlos Guestrin (C) Dhruv Batra 30

GMM

(C) Dhruv Batra 31 Figure Credit: Kevin Murphy


Recall Multi-variate Gaussians

(C) Dhruv Batra 32
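Not in the extracted slide text, added here for reference: the standard density of a d-dimensional Gaussian with mean µ and covariance Σ, which the GMM slides that follow build on.

```latex
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
```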

GMM

(C) Dhruv Batra 33


Figure Credit: Kevin Murphy
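Also not on the slide itself: the mixture density the contour figure depicts, written in the standard form (the mixing weights P(y = j) match the generative recipe given later in the deck).

```latex
p(x) = \sum_{j=1}^{k} P(y = j)\, \mathcal{N}\!\left( x \mid \mu_j, \Sigma_j \right)
```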

Hidden Data Causes Problems #1 •  Fully Observed (Log) Likelihood factorizes

•  Marginal (Log) Likelihood doesn’t factorize

•  All parameters coupled!

(C) Dhruv Batra 34
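A sketch of the contrast the slide is pointing at (standard derivation, not copied from the deck): with the component labels observed, the log-likelihood splits into per-component terms; with the labels hidden, a log of a sum remains and all parameters are coupled.

```latex
% Fully observed (x_i, z_i): the log-likelihood factorizes
\ell(\theta) = \sum_{i=1}^{N} \log p(x_i, z_i \mid \theta)
             = \sum_{i=1}^{N} \left[ \log P(z_i) + \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right]

% Hidden z: the marginal log-likelihood keeps a log of a sum
\ell(\theta) = \sum_{i=1}^{N} \log \sum_{j=1}^{k} P(z_i = j)\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)
```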

GMM vs Gaussian Joint Bayes Classifier •  On Board

–  Observed Y vs Unobserved Z –  Likelihood vs Marginal Likelihood

(C) Dhruv Batra 35

Hidden Data Causes Problems #2

(C) Dhruv Batra 36


Figure Credit: Kevin Murphy

Hidden Data Causes Problems #2 •  Identifiability: swapping the component labels (e.g. µ1 ↔ µ2) leaves the likelihood unchanged, so the parameters are not uniquely identified

(C) Dhruv Batra 37

−25 −20 −15 −10 −5 0 5 10 15 20 250

5

10

15

20

25

30

35

µ1

µ2

−15.5 −10.5 −5.5 −0.5 4.5 9.5 14.5 19.5

−15.5

−10.5

−5.5

−0.5

4.5

9.5

14.5

19.5

Figure Credit: Kevin Murphy

Hidden Data Causes Problems #3 •  Likelihood has singularities if one Gaussian

“collapses”

(C) Dhruv Batra 38

[Figure: density p(x) with one Gaussian component collapsing onto a single point]

Special case: spherical Gaussians and hard assignments

Slide Credit: Carlos Guestrin (C) Dhruv Batra 39

•  If P(X|Z=k) is spherical, with same σ for all classes:

$P(x_i \mid z = j) \;\propto\; \exp\!\left( -\frac{1}{2\sigma^2}\, \|x_i - \mu_j\|^2 \right)$

•  If each xi belongs to one class C(i) (hard assignment), marginal likelihood:

$\prod_{i=1}^{N} \sum_{j=1}^{k} P(x_i, y = j) \;\propto\; \prod_{i=1}^{N} \exp\!\left( -\frac{1}{2\sigma^2}\, \|x_i - \mu_{C(i)}\|^2 \right)$

•  M(M)LE same as K-means!!!
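One step the slide compresses (standard algebra, not from the deck): taking the log of the hard-assignment likelihood above turns maximization into exactly the k-means objective.

```latex
\log \prod_{i=1}^{N} \exp\!\left( -\tfrac{1}{2\sigma^2} \|x_i - \mu_{C(i)}\|^2 \right)
  = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \|x_i - \mu_{C(i)}\|^2
```

so maximizing over µ and C is the same as minimizing $F(\mu, C) = \sum_i \|x_i - \mu_{C(i)}\|^2$.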

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

µ1

µ2

µ3

Slide Credit: Carlos Guestrin (C) Dhruv Batra 40

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Each data point is generated according to the following recipe:

µ1

µ2

µ3

Slide Credit: Carlos Guestrin (C) Dhruv Batra 41

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Each data point is generated according to the following

recipe:

1.  Pick a component at random: Choose component i with

probability P(y=i)

µ2

Slide Credit: Carlos Guestrin (C) Dhruv Batra 42

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Each data point is generated according to the following

recipe:

1.  Pick a component at random: Choose component i with

probability P(y=i)

2.  Datapoint ∼ N(µi, σ²I)

µ2

x

Slide Credit: Carlos Guestrin (C) Dhruv Batra 43

The General GMM assumption

µ1

µ2

µ3

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix Σi

Each data point is generated according to the following

recipe:

1.  Pick a component at random: Choose component i with

probability P(y=i)

2.  Datapoint ∼ N(µi, Σi) Slide Credit: Carlos Guestrin (C) Dhruv Batra 44
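Not on the slide: a small NumPy sketch of this generative recipe; the function and parameter names (sample_gmm, pi, mus, Sigmas) are made up for illustration.

```python
import numpy as np

def sample_gmm(n, pi, mus, Sigmas, seed=0):
    """Draw n points from a GMM: pick component i with probability pi[i],
    then draw x ~ N(mus[i], Sigmas[i])."""
    rng = np.random.default_rng(seed)
    ys = rng.choice(len(pi), size=n, p=pi)   # step 1: pick a component at random
    X = np.stack([rng.multivariate_normal(mus[y], Sigmas[y]) for y in ys])  # step 2: Gaussian draw
    return X, ys
```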

K-means vs GMM •  K-Means

–  http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

•  GMM –  http://www.socr.ucla.edu/applets.dir/mixtureem.html

(C) Dhruv Batra 45
