26
EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc´ ıa-Escudero, A. Gordaliza, C. Matr´ an and A. Mayo-Iscar Dpto. de Estad´ ıstica e I. O. Universidad de Valladolid (SPAIN)

EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

Embed Size (px)

Citation preview

Page 1: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

EXPLORATORY TOOLS IN MODEL-BASEDCLUSTERING

L.A. Garcıa-Escudero, A. Gordaliza, C. Matran and A. Mayo-IscarDpto. de Estadıstica e I. O. Universidad de Valladolid

(SPAIN)

usuario
Nota adhesiva
MigrationConfirmed definida por usuario
usuario
Nota adhesiva
MigrationConfirmed definida por usuario
usuario
Texto escrito a máquina
usuario
Texto escrito a máquina
usuario
Texto escrito a máquina
usuario
Texto escrito a máquina
BoSanTouVal 2009
Page 2: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

1.- Introduction

• Two key questions:

¦ How to adequately choose the number of groups in a clusteringproblem?

¦ How to measure the strength of data point cluster assignments?

• It is impossible to answer these two questions without:

¦ Stating clearly which is the probabilistic model assumed.¦ Putting constrains on the allowed clusters scatters.¦ Stating clearly what we understand by noise.

Page 3: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

2.- Model Based Clustering:

• Many statistical practitioners view the Cluster Analysis as a collectionof mostly heuristic techniques for partitioning multivariate data.

• This view relies on the fact that most of the cluster techniques are notexplicitly based on a probabilistic model:

“...lead the naive investigator into believing that he or she didnot make any assumption at all, and that the results thereforeare ‘objective’...” (Flury 1997, page 123)

⇒ A properly stated underlying probabilistic model is convenient

Page 4: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Two model-based clustering approaches:

¦ Mixture approach:

n∏

i=1

[ k∑

j=1

πjφ(xi; θj)]

(assign xi to cluster j whenever πjφ(x; θj) > πlφ(x; θl) for l 6= j).

¦ “Crisp” (0-1) approach:

k∏

j=1

i∈Rj

φ(xi; θj)

(Rj indexes of the xi’s assigned to cluster j).

Page 5: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

¦ Mixture approach:

n∏

i=1

[ k∑

j=1

πjφ(xi; θj)]⇒ EM-algorithm

(assign xi to cluster j whenever πjφ(x; θj) > πlφ(x; θl) for l 6= j).

¦ “Crisp” (0-1) approach:

k∏

j=1

i∈Rj

φ(xi; θj) ⇒ CEM-algorithm

(Rj indexes of the xi’s assigned to cluster j).

Page 6: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Noise in real problems ⇒ Robust Clustering

• Two robust clustering approaches providing “theoretical well-basedclustering criterion in presence of outliers” (Bock 2002):

¦ Mixture modeling: The noise is fitted through mixture components(Fraley and Raftery, Peel and McLachlan,...)

¦ Trimming approach: A fraction α of most outlying data is trimmed.(Gallegos and Ritter, Cuesta-Albertos et al., Garcıa-Escudero et al.,Neykov et al.,...).

Page 7: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Noise in real problems ⇒ Robust Clustering

• Two robust clustering approaches providing “theoretical well-basedclustering criterion in presence of outliers” (Bock 2002):

¦ Mixture modeling: The noise is fitted through mixture components(Fraley and Raftery, Peel and McLachlan,...)

¦ Trimming approach: A fraction α of most outlying data is trimmed.(Gallegos and Ritter, Cuesta-Albertos et al., Garcıa-Escudero et al.,Neykov et al.,...).

⇒ We will focus on the trimming approach!!

Page 8: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

3.- TCLUST methodology

• Spurious-Outlier Model (Gallegos 2001 and Gallegos and Ritter 2005):

[ k∏

j=1

i∈Rj

f(xi; µj, Σj)][ ∏

i/∈R

gψi(xi)

]

¦ f(xi; µ, Σ) is a p-variate normal p.d.f..

¦ R = ∪kj=1Rj contains [n(1− α)] regular data.

¦ gψiare some p.d.f.’s for the non-regular data.

Page 9: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• If no conditions are possed on Σi’s ⇒ Not a well-defined problem.

• Restrictions are needed:

¦ Same spherical covariance matrices (i.e., Σj = σ2 · I) ⇒ Trimmedk-means (Cuesta-Albertos et al. 1997).

¦ Same (not necessarily spherical) covariance matrices (Σj = Σ) ⇒Determinantal criteria (Gallegos and Ritter 2005).

¦ Different covariances but with equal scales (|Σ1| = ... = |Σg|) ⇒Heterogeneous robust clustering (Gallegos 2001, 2003)

Page 10: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• A different constrain:

Mn = maxj=1,...,k

maxl=1,...,p

λl(Σj) and mn = minj=1,...,k

minl=1,...,p

λl(Σj),

where λl(Σj) are the eigenvalues of the Σj.

• Fix a constant c such that

Mn/mn ≤ c (Eigenvalues-ratio restriction).

¦ c controls the strength of the restriction:· c = 1 ⇒ Trimmed k-means.· Large c ⇒ An almost unrestricted solution.

• It extends Hathaway’s restrictions σ2i ≤ c · σ2

j for 1 ≤ i, j ≤ k.

Page 11: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Weights: We consider group weights πj ∈ [0, 1].

• Trimming + Eigenvalue restrictions + Weights ⇒ TCLUST

(Garcıa-Escudero et al. (2008) Annals of Statistics, 36, 1324-1345)

¦ Existence of both theoretical and sample solutions.¦ Consistency.¦ Feasible algorithm.

Page 12: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

3. Guidance in choosing k

• Many procedures for choosing k in “crisp” clustering are based onmonitoring the size of the (log-)“likelihoods”:

k 7→k∑

j=1

i∈Rj

log φ(xi; θj) for k = 1, 2, ... .

• Examples:

¦ Σj = σ2I (k-means) ⇒ Friedman and Rubin 1967, Engelman andHartigan 1969, Calinski and Harabasz 1974,...

¦ Σj = Σ (determinant criterium) ⇒ Marriot 1971,...

• Trimmed versions can be also considered⇒ Garcıa-Escudero et al 2003.

Page 13: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Drawback: The log-likelihoods strictly increase when increasing k:

0 5 10

−20

24

68

10

k = 1

x1

x2

0 5 10

−20

24

68

10

k = 2

x1

x2

0 5 10

−20

24

68

10

k = 3

x1

x2

• Log-likelihoods: -4765.8 (k = 1) < -3530.1 (k = 2) < -3197.7 (k = 3)

Page 14: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Figure:

Number of groups

Log−

likeli

hood

1 2 3

−500

0−4

500

−400

0−3

500

• Solutions:

¦ Searching for an “elbow”.¦ Nonlinear transformation by Sugar and James 2003.

Page 15: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• The TCLUST does not suffer from this problem:

0 5 10

−20

24

68

10k = 1

x1

x2

0 5 10

−20

24

68

10

k = 2

x1

x2

0 5 10

−20

24

68

10

k = 3

x1

x2

• Log-likelihoods: -4765.8 (k = 1) < -4203.2 (k = 2) = -4203.2 (k = 3)

• Recall the presence of weights (which can be set to zero) ⇒ π3 = 0 ink = 3 solution!.

Page 16: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• The TCLUST does not suffer from this problem:

0 5 10

−20

24

68

10k = 1

x1

x2

0 5 10

−20

24

68

10

k = 2

x1

x2

0 5 10

−20

24

68

10

k = 3

x1

x2

• Log-likelihoods: -4765.8 (k = 1) < -4203.2 (k = 2) = -4203.2 (k = 3)

• This fact was already noticed by Bryant (1991) when dealing with theso-called Penalized Classification Maximum Likelihood...

Page 17: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Importance of the “trimming” and the scatter constrain:

−10 0 5 10 15 20

−10−5

05

1015

20

(a) k=3, alpha=0 and large c

−10 0 5 10 15 20−10

−50

510

1520

(b) k=2, alpha=0.1 and moderate c

⇒ “◦” are trimmed points in the figure on the right.

Page 18: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

4.- Classification Trimmed Likelihood Curves

• Based on the TCLUST methodology, we monitor:

(α, k) 7→ L(α, k) :=k∑

j=1

nj log πj +k∑

j=1

i∈Rj

log φ(xi; θj),

when k = 1, 2, ... and α ∈ (0, 1).

• Smallest k such that L(α, k) ' L(α, k + 1) (for almost every α) ⇒ k isa good choice for the number of groups.

• L(α, k) increase very fast till α ≤ α0 ⇒ α0 is a good choice for thetrimming level.

• They provide information about the group sizes.

Page 19: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

Example 1: Mixture 0.9 ·N(( 0

0 ) , ( 4 11 1 )

)+ 0.1 ·N(

( 5−5 ) , ( 30 00 30 )

)

0.0 0.1 0.2 0.3 0.4 0.5

−500

0−3

000

−100

0

k = 1

alpha

0.0 0.1 0.2 0.3 0.4 0.5

−500

0−3

000

−100

0

k = 2

alpha

0.0 0.1 0.2 0.3 0.4 0.5

−500

0−3

000

−100

0

k = 3

alpha

−5 0 5 10 15 20

−20

−10

05

Unrestric

x

y

k = 2 vs. 1

alpha

0 0.1 0.2 0.3 0.4 0.5

020

060

0

k = 3 vs. 2

alpha

0 0.1 0.2 0.3 0.4 0.5

020

060

0

← L(α, k)

← L(α, k − 1)

Page 20: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

Example 2: 0.3 · N((−5

5 ) , ( 4 11 1 )

)+ 0.3 · N

(( 5−5 ) ,

(4 −1−1 1

) )+ 0.3 ·

N(( 5

5 ) , ( 4 00 4 )

)+ 0.1 ·N(

( 00 ) , ( 30 0

0 30 ))

0.0 0.2 0.4 0.6

−600

0−4

000

−200

0

k = 1

alpha

0.0 0.2 0.4 0.6−6

000

−400

0−2

000

k = 2

alpha

0.0 0.2 0.4 0.6

−600

0−4

000

−200

0

k = 3

alpha

0.0 0.2 0.4 0.6

−600

0−4

000

−200

0

k = 4

alpha

−15 −5 5 15

−15

−55

15

Unrestric

x

y

k = 2 vs. 1

alpha

0 0.2 0.5

020

040

060

080

0

k = 3 vs. 2

alpha

0 0.2 0.50

200

400

600

800

k = 4 vs. 3

alpha

0 0.2 0.5

020

040

060

080

0

Page 21: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

Example 3: “The topography of multivariate normal mixtures” (Ray andLindsay 2005) ⇒ Mixture with 2 components.

0.0 0.2 0.4 0.6

−250

0−1

500

−500

k = 1

alpha

0.0 0.2 0.4 0.6

−250

0−1

500

−500

k = 2

alpha

0.0 0.2 0.4 0.6

−250

0−1

500

−500

k = 3

alpha

−3 −2 −1 0 1 2

−10

12

34

Unrestric

x

y

k = 2 vs. 1

alpha

0 0.1 0.3 0.5 0.7

010

030

050

0k = 3 vs. 2

alpha

0 0.1 0.3 0.5 0.7

010

030

050

0

Page 22: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

Mixture with 2 components with 3 modes:

−2 −1 0 1 2 3

−2−1

01

23

x

y

−2 −1 0 1 2 3

−2−1

01

23

alpha = 0.5

x

y

Page 23: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

5.- Strength of cluster assignments

• Confirmatory tool: Were satisfactory the choices made for k and α?• The strength of the cluster assignment of observation xi to group j:

Dj(xi, θ) = πjφ(xi, θj)

• If D(1)(x, θ) ≤ ... ≤ D(k)(x, θ), define some Bayes factors as:

BF(i) = log(D(k−1)(xi; θ)/D(k)(xi; θ)

).

¦ Small BF(i) ⇒ Clear cluster assignment for the observation xi.

Page 24: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Bayes factors for trimmed points: Given the maximum possiblestrength:

di = maxj=1,...,k

{πjφ(xi, θj)} = D(k)(xi, θ), we have:

¦ d(1) ≤ ... ≤ d([nα]) ≤ ... ≤ d(n) ⇒ [nα] observations to be trimmed.

¦ Bayes factors for trimmed data ⇒ BF(i) = log(D(k)(xi, θ)/d([nα])

).

¦ Small BF(i) ⇒ More clearly observation i should be trimmed.

Page 25: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Graphical display I: “Silhouette” plot

−10 −5 0 5 10 15 20

−10

−50

510

15

(a) k= 3 and alpha= 0.1

x1

x2

−150 −100 −50 0

020

040

060

080

010

00

(b)

Bayes factor

Grou

ps

Page 26: EXPLORATORY TOOLS IN MODEL-BASED …bbercu/Event/Bosantouval...EXPLORATORY TOOLS IN MODEL-BASED CLUSTERING L.A. Garc ‡a-Escudero, A. Gordaliza, C. Matr an and A. Mayo-Iscar Dpto

• Graphical display II: “Most Doubtful assignments”¦ Label observations i’s with BF(i) ≥ log(3/4):

−10 −5 0 5 10 15 20

−10

−50

510

15

BF > log(3/4)

[PCA, discriminant or Bhattacharyya coordinates (Hennig and Christlieb 2002) if p > 2...]