Characterization of Linkage-Based Algorithms Margareta Ackerman Joint work with Shai Ben-David and David Loker University of Waterloo To appear in COLT 2010


Characterization of Linkage-Based Algorithms

Margareta Ackerman

Joint work with Shai Ben-David and David Loker

University of Waterloo

To appear in COLT 2010


There are a wide variety of clustering algorithms, which often produce very different clusterings.

How can we distinguish between clustering algorithms?

How should a user decide which algorithm to use for a given application?

Motivation


We propose a framework that lets a user utilize prior knowledge to select an algorithm

• Identify properties that distinguish between different clustering paradigms

• The properties should be:
1) Intuitive and “user-friendly”
2) Useful for classifying clustering algorithms

Our approach for clustering algorithm selection


• Kleinberg proposes abstract properties (“Axioms”) of clustering functions (NIPS, 2002)

• Bosagh Zadeh and Ben-David provide a set of properties that characterize single linkage clustering (UAI, 2009)

Previous work


• Propose a set of intuitive properties that uniquely identify linkage-based clustering algorithms

• Construct a taxonomy of clustering algorithms based on the properties

Our contributions


• Define linkage-based clustering
• Our new clustering properties
• Main result
• Sketch of proof
• A taxonomy of common clustering algorithms using clustering properties
• Conclusions

Outline


Given a finite domain set X and a dissimilarity function d over the members of X:

A clustering function F maps
Input: (X,d) and k>0
to
Output: a k-partition (clustering) of X

Formal setup


• Start with the clustering of singletons
• Merge the closest pair of clusters
• Repeat until only k clusters remain.

Ex. Single linkage, average linkage, complete linkage

Informally, a linkage function is an extension of the between-point distance that applies to subsets of the domain.

• The choice of the linkage function distinguishes between different linkage-based algorithms.

Linkage-based algorithm: An informal definition
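The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the absolute-difference dissimilarity are all assumptions made for the example, and points are assumed to be distinct and hashable.

```python
from itertools import combinations


def single_linkage(a, b, d):
    """Smallest between-point distance across the two clusters."""
    return min(d(x, y) for x in a for y in b)


def average_linkage(a, b, d):
    """Mean between-point distance across the two clusters."""
    return sum(d(x, y) for x in a for y in b) / (len(a) * len(b))


def complete_linkage(a, b, d):
    """Largest between-point distance across the two clusters."""
    return max(d(x, y) for x in a for y in b)


def linkage_clustering(points, d, k, linkage):
    """Start from singletons and merge the closest pair until k clusters remain."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest linkage value.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], d))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


points = [0.0, 1.0, 1.5, 10.0, 11.5]   # toy data: distinct points on a line
dist = lambda x, y: abs(x - y)         # toy dissimilarity (an assumption)
result = linkage_clustering(points, dist, 2, single_linkage)
```

Only the `linkage` argument changes between single, average, and complete linkage; the merging loop is shared, which is the sense in which the linkage function distinguishes the algorithms.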



• A clustering C is a refinement of clustering C’ if every cluster in C’ is a union of some clusters in C.

• A clustering function is hierarchical if for every $d$ over $X$ and every $1 \le k \le k' \le |X|$, $F(X,d,k')$ is a refinement of $F(X,d,k)$.

Hierarchical clustering
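The refinement and hierarchy definitions can be checked mechanically on a small data set. The sketch below is illustrative only: `is_refinement`, `is_hierarchical_on`, and the deliberately non-hierarchical `equal_chunks` function are hypothetical names introduced here, and the check covers one data set, not all of them as the definition requires.

```python
from itertools import combinations


def is_refinement(fine, coarse):
    """True if every cluster of `coarse` is a union of clusters of `fine`."""
    same_domain = set().union(*fine) == set().union(*coarse)
    nested = all(any(c <= big for big in coarse) for c in fine)
    return same_domain and nested


def single_linkage_clustering(points, d, k):
    """Merge the two clusters with the smallest minimum distance until k remain."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: min(d(x, y) for x in p[0] for y in p[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def equal_chunks(points, d, k):
    """Hypothetical counterexample: k contiguous chunks of near-equal size (ignores d)."""
    pts = sorted(points)
    q, r = divmod(len(pts), k)
    out, start = [], 0
    for i in range(k):
        size = q + (1 if i < r else 0)
        out.append(frozenset(pts[start:start + size]))
        start += size
    return set(out)


def is_hierarchical_on(F, points, d):
    """Check that F(X,d,k') refines F(X,d,k) for all 1 <= k <= k' <= |X| on this data."""
    n = len(points)
    return all(is_refinement(F(points, d, k2), F(points, d, k1))
               for k1 in range(1, n + 1) for k2 in range(k1, n + 1))


points = [0, 1, 3, 10, 12, 20]
dist = lambda x, y: abs(x - y)
```

On this toy input, single linkage passes the check, while `equal_chunks` fails because its 3-clustering splits the data across the boundary of its 2-clustering.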


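F is local if for any $C = F(X,d,k)$ and any $C' \subseteq C$,

$C' = F\left(\bigcup_{c \in C'} c,\ d,\ |C'|\right)$.

[Figure: the clustering $F(X,d,4)$ alongside $F(X', d/X', 2)$ for a subset $X'$ of $X$]

Locality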
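Locality can be tested on a concrete input by re-running F on every sub-collection of its output clusters. The following is a rough sketch under stated assumptions: `is_local_on` is a hypothetical helper, the data are chosen so that no two candidate merges tie, and the check covers a single (X, d, k) and therefore cannot prove locality in general.

```python
from itertools import combinations


def single_linkage_clustering(points, d, k):
    """Merge the two clusters with the smallest minimum distance until k remain."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: min(d(x, y) for x in p[0] for y in p[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def is_local_on(F, points, d, k):
    """For C = F(X,d,k), check F(union of C', d, |C'|) == C' for every non-empty C' in C."""
    C = F(points, d, k)
    for size in range(1, len(C) + 1):
        for sub in combinations(C, size):
            union = sorted(set().union(*sub))   # restrict the data to the chosen clusters
            if F(union, d, size) != set(sub):
                return False
    return True


points = [0.0, 1.0, 1.5, 10.0, 11.5, 30.0]   # toy data with no tied merges
dist = lambda x, y: abs(x - y)
local_ok = is_local_on(single_linkage_clustering, points, dist, 3)
```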


• Many clustering algorithms are local:
K-means
K-median
Single-linkage
Average-linkage
Complete-linkage

• Notably, some clustering algorithms fail locality:
Ratio cut
Normalized cut

Which paradigms satisfy locality?


If d’ equals d, except for increasing distances between the clusters of F(X,d,k), then F(X,d,k)=F(X,d’,k), for all d, X, and k.

[Figure: a data set under d and under d’, with the clusterings F(X,d,3) and F(X,d’,3)]

Outer Consistency (based on Kleinberg, 2002)
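To make the outer-consistency condition concrete, one can stretch only the between-cluster distances of an output clustering and confirm that the output is unchanged. The sketch below is an illustration under assumptions: `stretch_between` and the factor of 3 are invented for the example, and a single passing instance does not establish the property for all d, X, and k.

```python
from itertools import combinations


def single_linkage_clustering(points, d, k):
    """Merge the two clusters with the smallest minimum distance until k remain."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: min(d(x, y) for x in p[0] for y in p[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def stretch_between(d, clustering, factor=3.0):
    """Return d' equal to d within clusters and scaled up by `factor` between clusters."""
    label = {p: i for i, c in enumerate(clustering) for p in c}

    def d_prime(x, y):
        return d(x, y) * (factor if label[x] != label[y] else 1.0)

    return d_prime


points = [0.0, 1.0, 1.5, 10.0, 11.5, 30.0]
dist = lambda x, y: abs(x - y)

C = single_linkage_clustering(points, dist, 3)
d_prime = stretch_between(dist, C)
C_prime = single_linkage_clustering(points, d_prime, 3)
```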


• K-means
• K-median
• Single-linkage
• Average-linkage
• Complete-linkage

Not all clustering algorithms are outer-consistent:
Ratio cut
Normalized cut

Which paradigms satisfy outer-consistency?


[Figure: three data sets $(X_1,d_1)$, $(X_2,d_2)$, $(X_3,d_3)$, shown separately and then together inside a single data set $(X,d)$]

Extended Richness


[Figure: the data sets $(X_1,d_1)$, $(X_2,d_2)$, $(X_3,d_3)$ and the clustering $F(X,d,3)$]

Extended Richness


F satisfies extended richness if for any set of domains $\{(X_1,d_1), (X_2,d_2), \ldots, (X_k,d_k)\}$ there is a $d$ over $X = \bigcup_i X_i$ that extends each of the $d_i$'s so that

$F(X,d,k) = \{X_1, X_2, \ldots, X_k\}$.

[Figure: the data sets $(X_1,d_1)$, $(X_2,d_2)$, $(X_3,d_3)$ and the clustering $F(X,d,3)$]

Extended Richness


• K-means
• K-median
• Single-linkage
• Average-linkage
• Complete-linkage
• Ratio cut
• Normalized cut

Many clustering algorithms satisfy extended richness



Theorem: A clustering function is Linkage Based

if and only if it is Hierarchical and it satisfies Outer Consistency, Locality and Extended Richness.

Our main result


Every Linkage Based clustering function is Hierarchical, Local, Outer-Consistent, and satisfies Extended Richness.

The proof is quite straightforward.

Easy direction of proof


If F is Hierarchical and it satisfies Outer Consistency, Locality and Extended-Richness

then F is Linkage-Based.

To prove this direction we first need to formalize linkage-based clustering, by formally defining what a linkage function is.

Interesting direction of proof


A linkage function is a function

$l : \{(X_1, X_2, d) : d \text{ is a distance function over } X_1 \cup X_2\} \to \mathbb{R}$

that satisfies the following:

1) Representation independent: Doesn’t change if we re-label the data

2) Monotonic: if we increase edges that go between $X_1$ and $X_2$, then $l(X_1, X_2, d)$ doesn’t decrease.

3) Any pair of clusters can be made arbitrarily distant: By increasing edges that go between $X_1$ and $X_2$, we can make $l(X_1, X_2, d)$ reach any value in the range of $l$.

What do we expect from a linkage function?


Recall direction: If F is a hierarchical function that satisfies outer-consistency, locality, and extended richness then F is linkage-based.

Goal: Define a linkage function l so that the linkage based clustering based on l outputs F(X,d,k) (for every X, d and k).

Sketch of proof


• Define an operator $<_F$: $(A,B,d_1) <_F (C,D,d_2)$ if, when we run F on $(A \cup B \cup C \cup D, d)$, where d extends $d_1$ and $d_2$, A and B are merged before C and D.

[Figure: $F(A \cup B \cup C \cup D, d, 4)$, with clusters A, B, C, D]

Sketch of proof (continued…)


Sketch of proof (continued…)

[Figure: $F(A \cup B \cup C \cup D, d, 3)$, drawn over the clusters A, B, C, D]


Sketch of proof (continued…)

• Prove that $<_F$ can be extended to a partial ordering

• Use the ordering to define $l$

[Figure: $F(A \cup B \cup C \cup D, d, 3)$, drawn over the clusters A, B, C, D]


Sketch of proof (continued): show that $<_F$ is a partial ordering

We show that $<_F$ is cycle-free.

Lemma: Given a function F that is hierarchical, local, outer-consistent and satisfies extended richness, there are no $(A_1,B_1,d_1), (A_2,B_2,d_2), \ldots, (A_n,B_n,d_n)$ so that

$(A_1,B_1,d_1) <_F (A_2,B_2,d_2) <_F \cdots <_F (A_n,B_n,d_n)$

and

$(A_1,B_1,d_1) = (A_n,B_n,d_n)$.


• By the above Lemma, the transitive closure of $<_F$ is a partial ordering.

• This implies that there exists an order-preserving function $l$ that maps pairs of data sets to $\mathbb{R}$.

• It can be shown that l satisfies the properties of a linkage function.

Sketch of proof (continued…)
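For a finite set of items, the step from a cycle-free relation to an order-preserving real-valued function can be illustrated with a topological sort. This is only a finite analogy to the argument above, not the construction used in the proof: the helper name `order_preserving_values` and the string labels standing in for triples (A,B,d) are assumptions. It relies on `graphlib` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError


def order_preserving_values(less_than):
    """Map each item to a real number so that x < y in the relation implies l[x] < l[y].

    `less_than` is an iterable of (x, y) pairs meaning "x comes before y".
    Raises CycleError if the relation contains a cycle.
    """
    preds = {}
    for x, y in less_than:
        preds.setdefault(x, set())
        preds.setdefault(y, set()).add(x)   # x must be placed before y
    order = TopologicalSorter(preds).static_order()
    return {item: float(rank) for rank, item in enumerate(order)}


# Hypothetical labels standing in for triples such as (A, B, d1).
pairs = [("AB", "CD"), ("CD", "EF"), ("AB", "GH")]
l = order_preserving_values(pairs)

try:
    order_preserving_values([("AB", "CD"), ("CD", "AB")])
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

The cycle-freeness lemma plays the role of the `CycleError` check here: without it, no such assignment of values could exist.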



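[Table: clustering algorithms (rows) against properties (columns); the cell entries are not preserved in this transcript]
Columns: Local, Outer Consistency, Inner Consistency, Hierarchical, Path Dist., Order Invariance, Extended Richness, Scale Invariance, Isomorphism Invariance
Rows: Single linkage, Average linkage, Complete linkage, K-means, K-median, Min-Sum, Ratio-cut, Normalized-cut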

Taxonomy of clustering algorithms


[Table: the same algorithm-by-property grid as on the "Taxonomy of clustering algorithms" slide]

Characterization of Linkage-Based Algorithms


[Table: the same algorithm-by-property grid as on the "Taxonomy of clustering algorithms" slide]

Characterization of Single-Linkage, by Bosagh Zadeh and Ben-David (UAI, 09)


[Table: the same algorithm-by-property grid as on the "Taxonomy of clustering algorithms" slide]

Distinguishing among Linkage-Based Algorithms


A function F is order invariant if for all d and d’ such that, for all points p,q,r,s, d(p,q) < d(r,s) iff d’(p,q) < d’(r,s), we have that F(X,d,k) = F(X,d’,k).

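[Table: the three linkage-based algorithms against the properties; the cell entries are not preserved in this transcript]
Columns: Local, Outer Consistency, Inner Consistency, Hierarchical, Path Dist., Order Invariance, Extended Richness, Scale Invariance, Isomorphism Invariance
Rows: Single linkage, Average linkage, Complete linkage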

Distinguishing among Linkage-Based Algorithms
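The order-invariance definition above can be probed with a small experiment: squaring a non-negative dissimilarity preserves the ordering of all pairwise distances, so an order-invariant function must return the same clustering under d and d². The sketch below is an illustration on hand-picked toy data; the function name `linkage_clustering`, the `agg` parameter, and the data are assumptions for the example. On this input, the minimum- and maximum-based linkages give the same output under both dissimilarities, while the mean-based linkage does not.

```python
from itertools import combinations
from statistics import mean


def linkage_clustering(points, d, k, agg):
    """Linkage-based clustering where the linkage is `agg` over between-cluster distances.

    agg=min gives single linkage, agg=max complete linkage, agg=mean average linkage.
    """
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: agg([d(x, y) for x in pair[0] for y in pair[1]]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


points = [0, 40, 120, 2000, 2101]        # toy data chosen to avoid ties
d = lambda x, y: abs(x - y)
d_sq = lambda x, y: abs(x - y) ** 2      # same ordering of pairwise distances as d

single_same = linkage_clustering(points, d, 3, min) == linkage_clustering(points, d_sq, 3, min)
complete_same = linkage_clustering(points, d, 3, max) == linkage_clustering(points, d_sq, 3, max)
average_same = linkage_clustering(points, d, 3, mean) == linkage_clustering(points, d_sq, 3, mean)
```

Under d, average linkage merges {0, 40} with 120 (mean distance 100) before merging 2000 with 2101 (distance 101); under d² the comparison becomes 10400 against 10201, so the order of merges flips.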


[Table: the same algorithm-by-property grid as on the "Taxonomy of clustering algorithms" slide]

When “Natural” properties are not satisfied


[Table: the same algorithm-by-property grid as on the "Taxonomy of clustering algorithms" slide, with the columns grouped under the headings "Properties" and "Axioms"]


• Using this framework, clustering users can utilize prior knowledge to determine which properties make sense for their application

• The goal is to construct a property-based taxonomy for many useful clustering algorithms

• Using this approach, a user will be able to find a suitable algorithm without the overhead of executing many clustering algorithms

Advantages of the Framework


• We introduced new properties of clustering algorithms.

• We use these properties to provide a characterization of linkage-based algorithms.

• We classified common clustering algorithms using these properties.

Conclusions


• Kleinberg (NIPS, 02) proposed 3 “axioms” of clustering functions, which he showed to be inconsistent.

• Ackerman and Ben-David (NIPS, 08) showed that these properties are consistent in the setting of clustering quality measures.

• Goal: find a consistent set of axioms of clustering functions.

Axioms of clustering


• An axiom is a property that is satisfied by all members of a class

• A complete set of axioms of clustering functions would be satisfied by all clustering functions, and only by clustering functions

• Our goal is to find a complete set of axioms of clustering

• We use Kleinberg’s axioms as a starting point

• If we fix k, Kleinberg’s axioms are consistent.

Axioms vs. properties


• Scale Invariance: The output of the function doesn’t change if the data is scaled uniformly. Satisfied by common clustering algorithms.

• Richness: For every k-clustering C of X, there exists a distance function d over X so that F(X,d,k) = C. Richness is implied by extended richness. Satisfied by common clustering algorithms.

• Consistency: If d’ equals d, except for increasing between-cluster distances and decreasing within-cluster distances, then F(X,d,k)=F(X,d’,k). Not satisfied by some common clustering algorithms. Relaxations of this property, inner and outer consistency, are also not satisfied by some common algorithms.

Kleinberg’s axioms for fixed k


• We propose using the following as axioms of clustering:
Scale invariance
Isomorphism invariance
Extended richness

• Are there natural clustering functions that fail any of these properties?

• Are these axioms sufficient?

Towards axioms of clustering functions


[Table: the same algorithm-by-property grid as on the "Taxonomy of clustering algorithms" slide, with the columns grouped under the headings "Properties" and "Axioms"]