Characterization of Linkage-Based Algorithms
Margareta Ackerman
Joint work with Shai Ben-David and David Loker
University of Waterloo
To appear in COLT 2010
There is a wide variety of clustering algorithms, and they often produce very different clusterings.
How can we distinguish between clustering algorithms?
How should a user decide which algorithm to use for a given application?
Motivation
We propose a framework that lets a user utilize prior knowledge to select an algorithm
• Identify properties that distinguish between different clustering paradigms
• The properties should be:
  1) Intuitive and “user-friendly”
  2) Useful for classifying clustering algorithms
Our approach for clustering algorithm selection
• Kleinberg proposes abstract properties (“Axioms”) of clustering functions (NIPS, 2002)
• Bosagh Zadeh and Ben-David provide a set of properties that characterize single linkage clustering (UAI, 2009)
Previous work
• Propose a set of intuitive properties that uniquely identify linkage-based clustering algorithms
• Construct a taxonomy of clustering algorithms based on the properties
Our contributions
• Define linkage-based clustering
• Our new clustering properties
• Main result
• Sketch of proof
• A taxonomy of common clustering algorithms using clustering properties
• Conclusions
Outline
Given a finite domain set X, let d be a dissimilarity function over the members of X.
A clustering function F maps
Input: (X,d) and k>0
to
Output: a k-partition (clustering) of X
Formal setup
• Start with the clustering of singletons
• Merge the closest pair of clusters
• Repeat until only k clusters remain.
Examples: single linkage, average linkage, complete linkage
Informally, a linkage function is an extension of the between-point distance that applies to subsets of the domain.
• The choice of the linkage function distinguishes between different linkage-based algorithms.
Linkage-based algorithm: An informal definition
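The informal procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the representation of d as a Python callable, and the tie-breaking rule (the first closest pair found is merged) are all assumptions made for the example.

```python
from itertools import combinations

def single_linkage(a, b, d):
    # Smallest between-point distance across the two clusters.
    return min(d(p, q) for p in a for q in b)

def average_linkage(a, b, d):
    # Mean between-point distance across the two clusters.
    return sum(d(p, q) for p in a for q in b) / (len(a) * len(b))

def complete_linkage(a, b, d):
    # Largest between-point distance across the two clusters.
    return max(d(p, q) for p in a for q in b)

def linkage_clustering(points, d, k, linkage):
    """Start from singletons and merge the closest pair until k clusters remain."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], d))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

# Hypothetical one-dimensional data with absolute difference as dissimilarity.
points = [0.0, 1.0, 2.5, 10.0, 11.0]
d = lambda p, q: abs(p - q)
result = linkage_clustering(points, d, 2, single_linkage)
```

Swapping `single_linkage` for `average_linkage` or `complete_linkage` changes only the linkage function, which is the point made on the slide: the merge loop stays the same.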
• A clustering C is a refinement of clustering C’ if every cluster in C’ is a union of some clusters in C.
• A clustering function F is hierarchical if for every d over X and every $1 \le k \le k' \le |X|$, F(X,d,k’) is a refinement of F(X,d,k).
Hierarchical clustering
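The refinement relation used in the definition of hierarchical can be checked directly. The helper below is a sketch under the assumption that clusterings are given as collections of sets partitioning the same domain; the name `is_refinement` is hypothetical.

```python
def is_refinement(fine, coarse):
    """True if every cluster of `coarse` is a union of clusters of `fine`.

    Both arguments are assumed to be partitions. For partitions of the same
    set, this holds exactly when each fine cluster lies inside a coarse one.
    """
    fine = [frozenset(c) for c in fine]
    coarse = [frozenset(c) for c in coarse]
    if set().union(*fine) != set().union(*coarse):
        return False  # not clusterings of the same domain
    return all(any(c <= big for big in coarse) for c in fine)

four_clusters = [{1}, {2}, {3, 4}, {5}]
two_clusters = [{1, 2}, {3, 4, 5}]
crossing = [{1, 3}, {2, 4, 5}]
```

With this helper, checking that F is hierarchical on a given (X,d) amounts to testing `is_refinement(F(X,d,k'), F(X,d,k))` for each pair k ≤ k'.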
F is local if for any $C \subseteq F(X,d,k)$, $F\left(\bigcup_{c \in C} c,\, d,\, |C|\right) = C$.
[Figure: $F(X,d,4)$ and $F(X', d/X', 2)$]
Locality
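Locality can be tested empirically on a single input by re-running F on every sub-collection of its output clusters. The sketch below assumes d is a callable (so restricting d to a subset of points is implicit) and uses a small single-linkage routine only as a test subject; `single_linkage_F` and `is_local_on` are hypothetical names, and passing on one input does not prove that F is local.

```python
from itertools import combinations

def single_linkage_F(X, d, k):
    """A small single-linkage clustering function F(X, d, k)."""
    clusters = [frozenset([x]) for x in X]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: min(d(x, y) for x in p[0] for y in p[1]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

def is_local_on(F, X, d, k):
    """Check F(union of C', d, |C'|) == C' for every non-empty C' within F(X, d, k)."""
    full = list(F(X, d, k))
    for size in range(1, len(full) + 1):
        for subset in combinations(full, size):
            sub_points = sorted(set().union(*subset))
            if F(sub_points, d, size) != set(subset):
                return False
    return True

# Hypothetical one-dimensional data.
X = [0.0, 1.0, 2.5, 10.0, 11.0, 30.0]
d = lambda p, q: abs(p - q)
local_ok = is_local_on(single_linkage_F, X, d, 4)
```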
• Many clustering algorithms are local:
  K-means
  K-median
  Single-linkage
  Average-linkage
  Complete-linkage
• Notably, some clustering algorithms fail locality:
  Ratio cut
  Normalized cut
Which paradigms satisfy locality ?
F is outer-consistent if, for all X, d, and k, whenever d’ equals d except for increased distances between clusters of F(X,d,k), we have F(X,d,k)=F(X,d’,k).
[Figure: F(X,d,3) and F(X,d’,3)]
Outer Consistency (based on Kleinberg, 2002)
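The premise of outer consistency, that d’ agrees with d inside clusters and only increases distances between clusters, can be written as a small predicate. This is an illustrative sketch; `is_outer_consistent_change` and the example distance functions are hypothetical, and distances are assumed to be callables.

```python
from itertools import combinations

def is_outer_consistent_change(clustering, d, d_new):
    """True if d_new equals d within clusters and never decreases a between-cluster distance."""
    label = {p: i for i, c in enumerate(clustering) for p in c}
    for p, q in combinations(label, 2):
        if label[p] == label[q]:
            if d_new(p, q) != d(p, q):
                return False  # a within-cluster distance changed
        elif d_new(p, q) < d(p, q):
            return False      # a between-cluster distance shrank
    return True

clustering = [{0.0, 1.0}, {10.0, 11.0}]
d = lambda p, q: abs(p - q)

def d_pushed_apart(p, q):
    # Adds 5 to every distance that crosses the two clusters.
    crosses = (p < 5) != (q < 5)
    return abs(p - q) + 5 if crosses else abs(p - q)

def d_pulled_together(p, q):
    # Halves every distance that crosses the two clusters.
    crosses = (p < 5) != (q < 5)
    return abs(p - q) / 2 if crosses else abs(p - q)
```

An outer-consistent F must return the same clustering for `d` and `d_pushed_apart`; the property places no constraint on what F does with `d_pulled_together`.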
• K-means
• K-median
• Single-linkage
• Average-linkage
• Complete-linkage
Not all clustering algorithms are outer-consistent:
  Ratio cut
  Normalized cut
Which paradigms satisfy outer-consistency?
F satisfies extended richness if for any set of domains $\{(X_1,d_1), (X_2,d_2), \dots, (X_k,d_k)\}$ there is a $d$ over $X = \bigcup_i X_i$ that extends each of the $d_i$'s so that $F(X,d,k) = \{X_1, X_2, \dots, X_k\}$.
[Figure: three domains $(X_1,d_1)$, $(X_2,d_2)$, $(X_3,d_3)$ combined into $(X,d)$, with $F(X,d,3)$]
Extended Richness
• K-means
• K-median
• Single-linkage
• Average-linkage
• Complete-linkage
• Ratio cut
• Normalized cut
Many clustering algorithms satisfy extended richness
Theorem: A clustering function is Linkage Based
if and only if it is Hierarchical and it satisfies Outer Consistency, Locality and Extended Richness.
Our main result
Every Linkage Based clustering function is Hierarchical, Local, Outer-Consistent, and satisfies Extended Richness.
The proof is quite straightforward.
Easy direction of proof
If F is Hierarchical and it satisfies Outer Consistency, Locality and Extended-Richness
then F is Linkage-Based.
To prove this direction we first need to formalize linkage-based clustering by formally defining what a linkage function is.
Interesting direction of proof
A linkage function is a function
$l : \{(X_1, X_2, d) : d \text{ is a distance function over } X_1 \cup X_2\} \to \mathbb{R}$ that satisfies the following:
1) Representation independent: Doesn’t change if we re-label the data.
2) Monotonic: If we increase edges that go between $X_1$ and $X_2$, then $l$ doesn’t decrease.
3) Any pair of clusters can be made arbitrarily distant: By increasing edges that go between $X_1$ and $X_2$, we can make $l$ reach any value in the range of $l$.
[Figure: $(X_1 \cup X_2, d)$ and $l(X_1, X_2, d)$]
What do we expect from a linkage function?
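As a concrete illustration of the monotonicity property, the sketch below evaluates the three standard linkage functions before and after increasing every edge between two small sets. The data, the amount added (3), and all names are assumptions made for the example; it checks one instance and is not a proof.

```python
def single_linkage(a, b, d):
    return min(d(p, q) for p in a for q in b)

def average_linkage(a, b, d):
    return sum(d(p, q) for p in a for q in b) / (len(a) * len(b))

def complete_linkage(a, b, d):
    return max(d(p, q) for p in a for q in b)

# Two hypothetical point sets on the real line.
X1, X2 = [0.0, 1.0], [4.0, 6.0]

def d(p, q):
    return abs(p - q)

def d_stretched(p, q):
    """Same as d inside X1 and inside X2; edges between the sets grow by 3."""
    between = (p in X1) != (q in X1)
    return d(p, q) + 3.0 if between else d(p, q)

linkages = (single_linkage, average_linkage, complete_linkage)
before = {f.__name__: f(X1, X2, d) for f in linkages}
after = {f.__name__: f(X1, X2, d_stretched) for f in linkages}
```

In this instance none of the three linkage values decreases when the between-set edges are increased, as property 2 requires.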
Recall the direction to prove: If F is a hierarchical function that satisfies outer-consistency, locality, and extended richness, then F is linkage-based.
Goal: Define a linkage function l so that the linkage-based clustering induced by l outputs F(X,d,k) (for every X, d and k).
Sketch of proof
• Define an operator <F: (A,B,d1) <F (C,D,d2) if, when we run F on $(A \cup B \cup C \cup D, d)$, where d extends d1 and d2, A and B are merged before C and D.
Sketch of proof (continued…)
[Figure: clusters A, B, C, D under $F(A \cup B \cup C \cup D, d, 4)$ and $F(A \cup B \cup C \cup D, d, 3)$]
• Prove that <F can be extended to a partial ordering
• Use the ordering to define l
Sketch of proof (continued): Show that <F is a partial ordering
We show that <F is cycle-free.
Lemma: Given a function F that is hierarchical, local, outer-consistent and satisfies extended richness, there are no $(A_1,B_1,d_1), (A_2,B_2,d_2), \dots, (A_n,B_n,d_n)$ so that
$(A_1,B_1,d_1) <_F (A_2,B_2,d_2) <_F \cdots <_F (A_n,B_n,d_n)$
and
$(A_1,B_1,d_1) = (A_n,B_n,d_n)$.
• By the above Lemma, the transitive closure of <F is a partial ordering.
• This implies that there exists an order-preserving function l that maps pairs of data sets to $\mathbb{R}$.
• It can be shown that l satisfies the properties of a linkage function.
Sketch of proof (continued…)
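The step from "cycle-free" to "an order-preserving function exists" can be illustrated on a finite relation. The sketch below is only an analogy for the argument: the proof works over all triples (A,B,d), while this example assumes a finite list of pairs, and the name `order_preserving_values` is hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), which raises `CycleError` if the relation has a cycle.

```python
from graphlib import TopologicalSorter

def order_preserving_values(pairs):
    """Given pairs (u, v) meaning u < v in a cycle-free relation,
    return a dict l with l[u] < l[v] for every listed pair."""
    predecessors = {}
    for u, v in pairs:
        predecessors.setdefault(v, set()).add(u)
        predecessors.setdefault(u, set())
    l = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        l[node] = 1 + max((l[p] for p in predecessors[node]), default=0)
    return l

# Hypothetical labels standing in for triples such as (A, B, d1).
pairs = [("AB", "CD"), ("CD", "EF"), ("AB", "EF")]
l = order_preserving_values(pairs)
```

Because every value is assigned after its predecessors, the resulting l also respects the transitive closure of the relation, which mirrors the role of the Lemma in the sketch of proof.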
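[Table: properties (columns) versus algorithms (rows); cell values were not preserved.
Columns: Local, Outer Con., Inner Con., Hierarchical, Path Dist., Order Inv., Extended Rich., Scale Inv., Iso. Inv.
Rows: Single linkage, Average linkage, Complete linkage, K-means, K-median, Min-Sum, Ratio-cut, Normalized-cut]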
Taxonomy of clustering algorithms
Characterization of Linkage-Based Algorithms
Characterization of Single-Linkage, by Bosagh Zadeh and Ben-David (UAI, 09)
Distinguishing among Linkage-Based Algorithms
A function F is order invariant if for all d and d’ such that, for all points p, q, r, s, d(p,q) < d(r,s) iff d’(p,q) < d’(r,s), we have F(X,d,k) = F(X,d’,k).
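The premise of order invariance, that d and d’ rank all pairs of points the same way, can be checked by brute force on a small domain. The sketch below is illustrative only; `same_order` and the example distance functions are hypothetical names and choices.

```python
from itertools import combinations

def same_order(points, d, d_other):
    """True if d(p,q) < d(r,s) exactly when d_other(p,q) < d_other(r,s), over all pairs."""
    pairs = list(combinations(points, 2))
    return all((d(*a) < d(*b)) == (d_other(*a) < d_other(*b))
               for a in pairs for b in pairs)

points = [0.0, 1.0, 3.0]
d = lambda p, q: abs(p - q)
d_squared = lambda p, q: abs(p - q) ** 2          # monotone transform: order kept
d_reversed = lambda p, q: 1.0 / (1.0 + abs(p - q))  # decreasing transform: order flipped
```

An order-invariant F must produce the same output on `d` and `d_squared`, since the two agree on every comparison between pairs.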
When “Natural” properties are not satisfied
[Table: the same taxonomy, with its columns grouped under “Properties” and “Axioms”]
• Using this framework, clustering users can utilize prior knowledge to determine which properties make sense for their application
• The goal is to construct a property-based taxonomy for many useful clustering algorithms
• Using this approach, a user will be able to find a suitable algorithm without the overhead of executing many clustering algorithms
Advantages of the Framework
• We introduced new properties of clustering algorithms.
• We used these properties to provide a characterization of linkage-based algorithms.
• We classified common clustering algorithms using these properties.
Conclusions
• Kleinberg (NIPS, 02) proposed 3 “axioms” of clustering functions, which he showed to be inconsistent.
• Ackerman and Ben-David (NIPS, 08) showed that these properties are consistent in the setting of clustering quality measures.
• Goal: find a consistent set of axioms of clustering functions.
Axioms of clustering
• An axiom is a property that is satisfied by all members of a class
• A complete set of axioms of clustering functions would be satisfied by all clustering functions, and only by clustering functions
• Our goal is to find a complete set of axioms of clustering
• We use Kleinberg’s axioms as a starting point
• If we fix k, Kleinberg’s axioms are consistent.
Axioms vs. properties
• Scale Invariance: The output of the function doesn’t change if the data is scaled uniformly. Satisfied by common clustering algorithms.
• Richness: For every k-clustering C of X, there exists a distance function d over X so that F(X,d,k) = C. Richness is implied by extended richness. Satisfied by common clustering algorithms.
Kleinberg’s axioms for fixed k
• Consistency: If d’ equals d, except for shrinking within-cluster distances and increasing between-cluster distances, then F(X,d,k)=F(X,d’,k).
  Not satisfied by some common clustering algorithms.
  Relaxations of this property, inner and outer consistency, are also not satisfied by some common algorithms.
• We propose using the following as axioms of clustering:
  Scale invariance
  Isomorphism invariance
  Extended richness
• Are there natural clustering functions that fail any of these properties?
• Are these axioms sufficient?
Towards axioms of clustering functions