58
Data Mining Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert, Eirini Ntoutsi Ludwig-Maximilians-Universität München 2012-06-28 — KDD class tutorial

Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Data Mining TutorialSession 5: Clustering

Erich Schubert, Eirini Ntoutsi

Ludwig-Maximilians-Universität München

2012-06-28 — KDD class tutorial

Page 2: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distribution

A

B

C

p

Let us look at this in more detail!

Page 3: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distribution

A

B

C

p

Clusters can be:I Spherical (A)I Ellipsoid (C)I Rotated ellipsoid (B)

But: same formula for each situation!

Let us look at this in more detail!

Page 4: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distribution

A

B

C

p

Probability Density Function ofmultivariate normal distribution:

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)T Σ−1(p−µ))

The most essential multi-dimensionaldistribution!

Let us look at this in more detail!

Page 5: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distribution

A

B

C

p

Let us look at this in more detail!

Page 6: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

normalization and squared distance from meanMahalanobis distance:What is the role of Σ−1?

Page 7: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

One-dimensional normal distribution

pdf (x, µ, σ) :=1√

(2π)σ· e−

12

((x−µ)

σ

)2

normalization and squared distance from meanMahalanobis distance:What is the role of Σ−1?

Page 8: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

One-dimensional normal distribution

pdf (x, µ, σ) :=1√

(2π)σ· e−

12

((x−µ)

σ

)2

normalization and squared distance from meanMahalanobis distance:What is the role of Σ−1?

Page 9: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

One-dimensional normal distribution

pdf (x, µ, σ) :=1√

(2π)σ· e−

12

((x−µ)

σ

)2

normalization and squared distance from mean

Mahalanobis distance:What is the role of Σ−1?

Page 10: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

One-dimensional normal distribution

pdf (x, µ, σ) :=1√

(2π)σ2· e−

12 ((x−µ)σ−2(x−µ))

normalization and squared distance from mean

Mahalanobis distance:What is the role of Σ−1?

Page 11: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

One-dimensional normal distribution

pdf (x, µ, σ) :=1√

(2π)σ2· e−

12 ((x−µ)σ−2(x−µ))

normalization and squared distance from mean

Mahalanobis distance:

dMahalanobis(x, µ,Σ) :=√

(x− µ)TΣ−1(x− µ)

What is the role of Σ−1?

Page 12: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

One-dimensional normal distribution

pdf (x, µ, σ) :=1√

(2π)σ2· e−

12 ((x−µ)σ−2(x−µ))

normalization and squared distance from mean

Mahalanobis distance:

dMahalanobis(x, µ,Σ)2 := (x− µ)TΣ−1(x− µ)

What is the role of Σ−1?

Page 13: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionDissecting the formula

pdf (p, µ,Σ) :=1√

(2π)d|Σ|· e−

12 ((p−µ)TΣ−1(p−µ))

One-dimensional normal distribution

pdf (x, µ, σ) :=1√

(2π)σ2· e−

12 ((x−µ)σ−2(x−µ))

normalization and squared distance from mean

Mahalanobis distance:

dMahalanobis(x, µ,Σ)2 := (x− µ)TΣ−1(x− µ)

What is the role of Σ−1?

Page 14: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Covariance matrix is symmetric, and non-negative ondiagonal, therefore can usually be inverted (ignoreconstant dimensions).

They can be decomposed (equivalently):

VΛV−1 = Σ ≡ VΛ−1V−1 = Σ−1

where V contains the Eigenvectors and Λ diagonalcontaining the Eigenvalues. V ∼= rotation, Λ ∼= scaling!(This is the key idea of Principal Component Analysis!)

Page 15: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Covariance matrix is symmetric, and non-negative ondiagonal, therefore can usually be inverted (ignoreconstant dimensions).They can be decomposed (equivalently):

VΛV−1 = Σ ≡ VΛ−1V−1 = Σ−1

where V contains the Eigenvectors and Λ diagonalcontaining the Eigenvalues.

V ∼= rotation, Λ ∼= scaling!(This is the key idea of Principal Component Analysis!)

Page 16: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Covariance matrix is symmetric, and non-negative ondiagonal, therefore can usually be inverted (ignoreconstant dimensions).They can be decomposed (equivalently):

VΛV−1 = Σ ≡ VΛ−1V−1 = Σ−1

where V contains the Eigenvectors and Λ diagonalcontaining the Eigenvalues. V ∼= rotation, Λ ∼= scaling!(This is the key idea of Principal Component Analysis!)

Page 17: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)T(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

⟩= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)Mahalanobis ≈ Euclidean distance after PCA!

Page 18: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)T(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

⟩= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)Mahalanobis ≈ Euclidean distance after PCA!

Page 19: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)TΣ−1(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

⟩= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)Mahalanobis ≈ Euclidean distance after PCA!

Page 20: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)TVΩ(VΩ)T(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

⟩= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)Mahalanobis ≈ Euclidean distance after PCA!

Page 21: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)TVΩ(VΩ)T(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)Mahalanobis ≈ Euclidean distance after PCA!

Page 22: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)TVΩ(VΩ)T(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

⟩= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)Mahalanobis ≈ Euclidean distance after PCA!

Page 23: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)TVΩ(VΩ)T(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

⟩= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)

Mahalanobis ≈ Euclidean distance after PCA!

Page 24: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

Multivariate normal distributionInverse of covariance matrix – Σ−1

Build Ω using ωi = 1/√λi = λ

− 12

i . Then ΩΩ = Λ−1.

Σ−1 = VΛ−1V−1 = VΩΩTVT = VΩ(VΩ)T

d2Mahalanobis = (x− µ)TVΩ(VΩ)T(x− µ)

=⟨(VΩ)T(x− µ), (VΩ)T(x− µ)

⟩= L2((VΩ)T(x− µ))2

L2 is the L2 norm (Euclidean distance d(x, y) = L2(x− y)!)Mahalanobis ≈ Euclidean distance after PCA!

Page 25: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

p

Page 26: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

p

Page 27: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣA =

(3 00 3

)Σ−1

A =

( 13 00 1

3

)

Page 28: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣA =

(3 00 3

)Σ−1

A =

( 13 00 1

3

)

p− µA = (0.5, 1)

Page 29: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣA =

(3 00 3

)Σ−1

A =

( 13 00 1

3

)

p− µA = (0.5, 1)

dist2 = (p− µ)TΣ−1(p− µ) ≈ 0.41666

Page 30: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣA =

(3 00 3

)Σ−1

A =

( 13 00 1

3

)

p− µA = (0.5, 1)

dist2 = (p− µ)TΣ−1(p− µ) ≈ 0.41666

probA ≈1√

(2π)29e−

12 0.41666

≈ 0.04307456

Page 31: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣ−1

B ≈(

0.571428 −0.142857−0.142857 0.285714

)

Page 32: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣ−1

B ≈(

0.571428 −0.142857−0.142857 0.285714

)

p− µB = (−2.5, 0)

Page 33: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣ−1

B ≈(

0.571428 −0.142857−0.142857 0.285714

)

p− µB = (−2.5, 0)

dist2 = (p− µ)TΣ−1(p− µ) ≈ 3.5714285

Page 34: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣ−1

B ≈(

0.571428 −0.142857−0.142857 0.285714

)

p− µB = (−2.5, 0)

dist2 = (p− µ)TΣ−1(p− µ) ≈ 3.5714285

probB ≈1√

(2π)27e−

12 3.5714285

≈ 0.01008661

Page 35: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣC =

(16 00 4

)Σ−1

C =

( 116 00 1

4

)

Page 36: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣC =

(16 00 4

)Σ−1

C =

( 116 00 1

4

)

p− µC = (1.5,−1)

Page 37: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣC =

(16 00 4

)Σ−1

C =

( 116 00 1

4

)

p− µC = (1.5,−1)

dist2 = (p− µ)TΣ−1(p− µ) ≈ 0.390625

Page 38: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

pΣC =

(16 00 4

)Σ−1

C =

( 116 00 1

4

)

p− µC = (1.5,−1)

dist2 = (p− µ)TΣ−1(p− µ) ≈ 0.390625

probC ≈1√

(2π)264e−

12 0.390625

≈ 0.01636466

Page 39: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

p

A B Cprob 0.043075 0.010087 0.016365size 30% 20% 50%

score 0.012922 0.002017 0.008182sum divide by 0.023122weight ≈ 55.9% ≈ 8.2% ≈ 35.4%

Page 40: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

p

A B Cprob 0.043075 0.010087 0.016365size 30% 20% 50%score 0.012922 0.002017 0.008182

sum divide by 0.023122weight ≈ 55.9% ≈ 8.2% ≈ 35.4%

Page 41: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

p

A B Cprob 0.043075 0.010087 0.016365size 30% 20% 50%score 0.012922 0.002017 0.008182sum divide by 0.023122

weight ≈ 55.9% ≈ 8.2% ≈ 35.4%

Page 42: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

p

A B Cprob 0.043075 0.010087 0.016365size 30% 20% 50%score 0.012922 0.002017 0.008182sum divide by 0.023122weight ≈ 55.9% ≈ 8.2% ≈ 35.4%

Page 43: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM homework

A

B

C

p

A B Cprob 0.043075 0.010087 0.016365size 30% 20% 50%score 0.012922 0.002017 0.008182sum divide by 0.023122weight ≈ 55.9% ≈ 8.2% ≈ 35.4%

Point p belongs mostly to cluster A!

Page 44: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 0):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 45: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 1):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 46: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 2):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 47: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 3):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 48: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 4):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 49: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 5):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 50: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 6):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 51: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 7):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 52: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 8):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 53: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 9):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 54: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 10):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 55: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 11):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 56: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 12):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 57: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 13):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay

Page 58: Data Mining Tutorial - uni-muenchen.de · Tutorial E. Schubert, E. Ntoutsi Aufgabe 8-1 + Aufgabe 8-2 Old Faithful EM Demo Data Mining Tutorial Session 5: Clustering Erich Schubert,

Data MiningTutorial

E. Schubert,E. Ntoutsi

Aufgabe 8-1 +Aufgabe 8-2

Old FaithfulEM Demo

EM demonstration

Old Faithful geyser data (Iteration 14):

1 2 3 4 5 6Duration40

50

60

70

80

90

100Delay