Towards Theoretical Foundations of Clustering

Margareta Ackerman, Caltech

Joint work with Shai Ben-David and David Loker

The Theory-Practice Gap

Clustering is one of the most widely used tools for exploratory data analysis.

• Social Sciences
• Biology
• Astronomy
• Computer Science
• …

All apply clustering to gain a first understanding of the structure of large data sets.

The Theory-Practice Gap

“While the interest in and application of cluster analysis has been rising rapidly, the abstract nature of the tool is still poorly understood” (Wright, 1973)

“There has been relatively little work aimed at reasoning about clustering independently of any particular algorithm, objective function, or generative data model” (Kleinberg, 2002)

Both statements still apply today.

Inherent Obstacles: Clustering is ill-defined

Clustering aims to assign data into groups of similar items.

Beyond that, there is very little consensus on the definition of clustering.

Inherent Obstacles

• Clustering is inherently ambiguous
• There may be multiple reasonable clusterings
• There is usually no ground truth

Differences in Input/Output Behavior of Clustering Algorithms

[Figures: two slides of illustrations, not preserved]

Outline

• Previous work
• Clustering algorithm selection
• Characterization of Linkage-Based clustering
• Conclusions and future work

Previous Work Towards a General Theory

• Axioms of clustering [(Wright, ’73), (Meila, ACM ’05), (Pattern Recognition, ’00), (Kleinberg, NIPS ’02), (Ackerman & Ben-David, NIPS ’08)].
• Clusterability [(Balcan, Blum, and Vempala, STOC ’08), (Balcan, Blum, and Gupta, SODA ’09), (Ackerman & Ben-David, AISTATS ’09)].


Selecting a Clustering Algorithm

There is a wide variety of clustering algorithms, which often produce very different clusterings.

How should a user decide which algorithm to use for a given application?

Selecting a Clustering Algorithm

Users rely on cost-related considerations: running times, space usage, software purchasing costs, etc.

There is inadequate emphasis on input-output behaviour.

Our Framework for Selecting a Clustering Algorithm

• Identify properties that distinguish between the input-output behaviours of different clustering paradigms
• The properties should be:
1) Intuitive and “user-friendly”
2) Useful for distinguishing clustering algorithms

Our Framework for Selecting a Clustering Algorithm

• Enables users to identify a suitable algorithm without the overhead of executing many algorithms
• Helps understand the behaviour of algorithms
• The long-term goal is to construct a large property-based classification for many useful clustering algorithms

Taxonomy of Partitional Algorithms (Ackerman, Ben-David, Loker, NIPS 2010)

[Slide content not preserved]

Properties vs. Axioms

[Two columns headed “Properties” and “Axioms”; column contents not preserved]

Characterization of Linkage-Based Clustering (Ackerman, Ben-David, Loker, COLT 2010)

The 2010 characterization applies in the partitional setting, using the k-stopping criterion.

This characterization distinguishes linkage-based algorithms from other partitional algorithms.

Characterizing Linkage-Based Clustering in the Hierarchical Setting (Ackerman and Ben-David, IJCAI 2011)

• Propose two intuitive properties that uniquely identify hierarchical linkage-based clustering algorithms.
• Show that common hierarchical algorithms, including bisecting k-means, cannot be simulated by any linkage-based algorithm.


Formal Setup: Dendrograms and clusterings

C_i is a cluster in a dendrogram D if there exists a node in D such that C_i is the set of its leaf descendants.

C = {C_1, …, C_k} is a clustering in a dendrogram D if
– C_i is a cluster in D for all 1 ≤ i ≤ k, and
– the clusters are disjoint.
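
The two definitions above can be sketched in a few lines of Python. This is only an illustration, assuming a dendrogram is stored as nested tuples whose leaves are labels; the helper names `leaves`, `clusters_of`, and `is_clustering_in` are hypothetical and do not come from the talk.

```python
from itertools import combinations

def leaves(node):
    """Return the set of leaf labels below a dendrogram node."""
    if isinstance(node, tuple):
        return frozenset().union(*(leaves(child) for child in node))
    return frozenset([node])

def clusters_of(node):
    """Return every cluster in the dendrogram: one leaf set per node."""
    found = {leaves(node)}
    if isinstance(node, tuple):
        for child in node:
            found |= clusters_of(child)
    return found

def is_clustering_in(candidate, dendrogram):
    """Check that each C_i is a cluster in the dendrogram and the C_i are pairwise disjoint."""
    all_clusters = clusters_of(dendrogram)
    sets = [frozenset(c) for c in candidate]
    each_is_cluster = all(c in all_clusters for c in sets)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    return each_is_cluster and disjoint

# Toy dendrogram over {a, b, c, d, e} (an assumed example).
D = (("a", "b"), ("c", ("d", "e")))
print(is_clustering_in([{"a", "b"}, {"d", "e"}], D))  # both are clusters in D and disjoint
print(is_clustering_in([{"a", "b"}, {"b", "c"}], D))  # {b, c} is not the leaf set of any node
```

Note that a clustering in a dendrogram need not cover all of X under this reading; the check only requires that the chosen clusters appear in D and do not overlap.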

Formal Setup: Hierarchical clustering algorithm

A hierarchical clustering algorithm A maps

Input: a data set X with a dissimilarity function d, denoted (X,d)

to

Output: a dendrogram of X

Linkage-Based Algorithm

• Create a leaf node for every element of X
• Repeat the following until a single tree remains:
– Consider the clusters represented by the remaining root nodes.
– Merge the closest pair of clusters by assigning them a common parent node.
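
The merge loop described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `linkage_based`, the nested-tuple dendrogram, the dictionary encoding of d, and the choice of single linkage as the example notion of "closest" are all assumptions made for the example.

```python
from itertools import combinations

def single_linkage(c1, c2, d):
    """Example linkage function: shortest between-cluster distance."""
    return min(d[frozenset((x, y))] for x in c1 for y in c2)

def linkage_based(points, d, linkage):
    """Build a dendrogram (nested tuples) by repeatedly merging the closest pair of roots."""
    # Each root is (subtree, leaf_set); start with one leaf node per element.
    roots = [(p, frozenset([p])) for p in points]
    while len(roots) > 1:
        # Pick the pair of root clusters with the smallest linkage value.
        i, j = min(
            combinations(range(len(roots)), 2),
            key=lambda ij: linkage(roots[ij[0]][1], roots[ij[1]][1], d),
        )
        # Give the two chosen roots a common parent node.
        merged = ((roots[i][0], roots[j][0]), roots[i][1] | roots[j][1])
        roots = [r for k, r in enumerate(roots) if k not in (i, j)] + [merged]
    return roots[0][0]

# Toy data: points on a line with distance |x - y| (an illustrative assumption).
coords = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 7.0}
d = {frozenset((x, y)): abs(coords[x] - coords[y]) for x, y in combinations(coords, 2)}
tree = linkage_based(list(coords), d, single_linkage)
print(tree)
```

Passing the linkage function as a parameter keeps the loop itself fixed, so swapping in another notion of between-cluster distance changes only that one argument.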

Example Linkage-Based Algorithms

• The choice of linkage function distinguishes between different linkage-based algorithms.
• Examples of common linkage functions:
– Single-linkage: shortest between-cluster distance
– Average-linkage: average between-cluster distance
– Complete-linkage: maximum between-cluster distance
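
The three linkage functions listed above differ only in how they summarise the between-cluster distances. A small sketch, assuming d is given as a Python function and using a toy data set on the real line (both are assumptions for illustration):

```python
from statistics import mean

def between(c1, c2, d):
    """All between-cluster distances for clusters c1 and c2 under d."""
    return [d(x, y) for x in c1 for y in c2]

def single_linkage(c1, c2, d):
    """Shortest between-cluster distance."""
    return min(between(c1, c2, d))

def average_linkage(c1, c2, d):
    """Average between-cluster distance."""
    return mean(between(c1, c2, d))

def complete_linkage(c1, c2, d):
    """Maximum between-cluster distance."""
    return max(between(c1, c2, d))

# Toy example: d(x, y) = |x - y| on two small clusters.
d = lambda x, y: abs(x - y)
X1, X2 = [0, 1], [4, 6]
print(single_linkage(X1, X2, d))    # smallest of 4, 6, 3, 5
print(average_linkage(X1, X2, d))   # mean of 4, 6, 3, 5
print(complete_linkage(X1, X2, d))  # largest of 4, 6, 3, 5
```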

Locality: Informal Definition

If we select a set of disjoint clusters from a dendrogram, and run the algorithm on the union of these clusters, we obtain a result that is consistent with the original dendrogram.

[Figure: D = A(X,d) and D’ = A(X’,d), where X’ = {x1, …, x6}]
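
The informal statement can be tried out on a small example. The sketch below is a simplified empirical check, not the formal definition: it assumes single linkage as the algorithm A, a toy data set on the real line, and reads "consistent" as "every cluster of D that lies inside a selected cluster reappears in D’". All helper names are hypothetical.

```python
from itertools import combinations

def single_linkage_dendrogram(points, dist):
    """Single-linkage dendrogram as nested tuples (hypothetical helper)."""
    roots = [(p, frozenset([p])) for p in points]
    while len(roots) > 1:
        i, j = min(
            combinations(range(len(roots)), 2),
            key=lambda ij: min(dist(x, y) for x in roots[ij[0]][1] for y in roots[ij[1]][1]),
        )
        merged = ((roots[i][0], roots[j][0]), roots[i][1] | roots[j][1])
        roots = [r for k, r in enumerate(roots) if k not in (i, j)] + [merged]
    return roots[0][0]

def clusters_of(node):
    """Return (leaf set of node, set of all clusters at or below node)."""
    if not isinstance(node, tuple):
        leaf = frozenset([node])
        return leaf, {leaf}
    left_leaves, left_all = clusters_of(node[0])
    right_leaves, right_all = clusters_of(node[1])
    here = left_leaves | right_leaves
    return here, left_all | right_all | {here}

dist = lambda x, y: abs(x - y)
X = [0, 1, 3, 7, 12, 20]
_, clusters_D = clusters_of(single_linkage_dendrogram(X, dist))

# Select disjoint clusters from D and rerun the algorithm on their union X'.
chosen = [frozenset([0, 1]), frozenset([7])]
assert all(c in clusters_D for c in chosen)
X_prime = sorted(set().union(*chosen))
_, clusters_D_prime = clusters_of(single_linkage_dendrogram(X_prime, dist))

# Simplified consistency check: clusters of D inside a chosen cluster reappear in D'.
inside = {c for c in clusters_D if any(c <= chosen_c for chosen_c in chosen)}
print(inside <= clusters_D_prime)
```

A passing check on one data set does not prove locality; it only shows what the property asserts about the relationship between D and D’.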

Outer Consistency

[Figure: a clustering C in A(X,d), shown on dataset (X,d) and on dataset (X,d’) after an outer-consistent change]

If A is outer-consistent, then A(X,d’) will also include the clustering C.
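
The slide does not spell out what an outer-consistent change is. The sketch below assumes the usual reading: d’ keeps every within-cluster distance of C unchanged and never decreases a between-cluster distance. The function name and the dictionary encoding of distances are hypothetical.

```python
from itertools import combinations

def is_outer_consistent_change(clustering, d, d_new):
    """True if d_new keeps within-cluster distances and never shrinks between-cluster ones."""
    label = {x: i for i, cluster in enumerate(clustering) for x in cluster}
    for x, y in combinations(label, 2):
        pair = frozenset((x, y))
        if label[x] == label[y]:
            # Same cluster: the distance must stay exactly the same (assumed reading).
            if d_new[pair] != d[pair]:
                return False
        elif d_new[pair] < d[pair]:
            # Different clusters: the distance may only grow.
            return False
    return True

C = [{"a", "b"}, {"c"}]
d = {frozenset("ab"): 1.0, frozenset("ac"): 4.0, frozenset("bc"): 3.0}
stretched = {**d, frozenset("ac"): 9.0}  # a between-cluster distance grows
squeezed = {**d, frozenset("ab"): 0.5}   # a within-cluster distance changes
print(is_outer_consistent_change(C, d, stretched))
print(is_outer_consistent_change(C, d, squeezed))
```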

Theorem (Ackerman & Ben-David, IJCAI 2011):

A hierarchical clustering algorithm is Linkage-Based if and only if it is Local and Outer-Consistent.

Easy Direction of Proof

Every Linkage-Based hierarchical clustering algorithm is Local and Outer-Consistent.

The proof is quite straightforward.

Interesting Direction of Proof

If A is Local and Outer-Consistent, then A is Linkage-Based.

To prove this direction, we first need to formalize Linkage-Based clustering by formally defining what a Linkage Function is.

What Do We Expect From Linkage Functions?

A linkage function is a function

l : {(X1, X2, d) : d is a distance function over X1 ∪ X2} → R+

that satisfies the following:
– Representation independence: l doesn’t change if we re-label the data.
– Monotonicity: if we increase the edges that go between X1 and X2, then l(X1, X2, d) doesn’t decrease.

[Figure: the data set (X1 ∪ X2, d), split into X1 and X2]
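
One way to write the monotonicity requirement more formally is given below. It assumes that "increasing the edges between X1 and X2" means passing to a distance function d’ that agrees with d inside X1 and inside X2; this reading is an assumption, and ℓ stands for the slide's l.

```latex
\[
\begin{aligned}
&\text{If } d'(x,y) = d(x,y) \text{ whenever } x,y \in X_1 \text{ or } x,y \in X_2,\\
&\text{and } d'(x,y) \ge d(x,y) \text{ whenever } x \in X_1,\ y \in X_2,\\
&\text{then } \ell(X_1, X_2, d') \ge \ell(X_1, X_2, d).
\end{aligned}
\]
```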

Proof Sketch

Recall the direction: if A satisfies Outer-Consistency and Locality, then A is Linkage-Based.

Goal: define a linkage function l so that the linkage-based algorithm based on l outputs A(X,d) for every X and d.

Proof Sketch

• Define an operator <A:
(X,Y,d1) <A (Z,W,d2) if, when we run A on (X ∪ Y ∪ Z ∪ W, d), where d extends d1 and d2, X and Y are merged before Z and W.
• Prove that <A can be extended to a partial ordering
• Use the ordering to define l

Proof Sketch (continued): Show that <A is a partial ordering

We show that <A is cycle-free.

Lemma: Given a hierarchical algorithm A that is Local and Outer-Consistent, there exists no finite sequence such that

(X1,Y1,d1) <A … <A (Xn,Yn,dn) <A (X1,Y1,d1).

Proof Sketch (continued…)

• By the above Lemma, the transitive closure of <A is a partial ordering.
• This implies that there exists an order-preserving function l that maps pairs of data sets to R+.
• It can be shown that l satisfies the properties of a linkage function.
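
For a finite collection of pairs, the step from "cycle-free" to "an order-preserving function exists" can be illustrated with a topological sort. This is only a finite analogue of the argument, not the proof itself, which has to handle all data sets at once. The relation below and the name `order_preserving_values` are hypothetical; `graphlib` is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter, CycleError

def order_preserving_values(relation):
    """Map each item to a positive number so that a < b in the relation implies value(a) < value(b)."""
    sorter = TopologicalSorter()
    for earlier, later in relation:
        sorter.add(later, earlier)  # 'later' must come after 'earlier'
    try:
        order = list(sorter.static_order())
    except CycleError:
        raise ValueError("relation has a cycle, so no order-preserving map exists") from None
    return {item: rank + 1 for rank, item in enumerate(order)}

# Hypothetical finite relation "<A": each pair (p, q) records that p is merged before q.
merged_before = [("XY", "ZW"), ("ZW", "UV"), ("XY", "UV")]
l_values = order_preserving_values(merged_before)
print(l_values)
```

If the relation contained a cycle, the sort would fail, which is why the cycle-freeness lemma is needed before l can be defined.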

Hierarchical but Not Linkage-Based

P-Divisive algorithms construct dendrograms top-down, using a partitional 2-clustering algorithm P to split nodes.

At each node, apply the partitional clustering algorithm P (e.g., k-means for k=2).
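
A P-divisive algorithm can be sketched as a short recursion. In the example below, P is replaced by a stand-in: an exhaustive search for the 2-partition with the lowest k-means cost on one-dimensional data. That stand-in, the assumption of distinct points, and the function names are illustrative choices and not the talk's implementation.

```python
from itertools import combinations

def two_means_brute_force(points):
    """Stand-in for P: the 2-partition minimising the k-means cost, found by exhaustive search."""
    def cost(cluster):
        centre = sum(cluster) / len(cluster)
        return sum((x - centre) ** 2 for x in cluster)

    best = None
    first, rest = points[0], points[1:]
    # Keep points[0] in the left part so that each 2-partition is tried once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = [first, *extra]
            right = [x for x in rest if x not in extra]  # assumes distinct points
            total = cost(left) + cost(right)
            if best is None or total < best[0]:
                best = (total, left, right)
    return best[1], best[2]

def divisive(points, split):
    """Build a dendrogram top-down, splitting each node with the 2-clustering algorithm `split`."""
    if len(points) == 1:
        return points[0]
    left, right = split(points)
    return (divisive(left, split), divisive(right, split))

print(divisive([0.0, 1.0, 5.0, 6.0, 20.0], two_means_brute_force))
```

Unlike the bottom-up merge loop, each split here depends on the whole subset being divided.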

Hierarchical but Not Linkage-Based

A partitional 2-clustering algorithm P is Context Sensitive if there exist distance functions d and d’, with d’ extending d, such that

P({x,y,z},d) = {{x}, {y,z}} and P({x,y,z,w},d’) = {{x,y}, {z,w}}.

Ex. k-means, min-sum, min-diameter.

Theorem [Ackerman & Ben-David, IJCAI ’11]: If P is context-sensitive, then the P-divisive algorithm fails the locality property.

Hierarchical but Not Linkage-Based

• The input-output behaviour of some natural divisive algorithms is distinct from that of all linkage-based algorithms.
• The bisecting k-means algorithm, and other natural divisive algorithms, cannot be simulated by any linkage-based algorithm.

Conclusions

• We present a new framework for clustering algorithm selection
• Provide a property-based classification of common clustering algorithms
• Characterize linkage-based clustering in terms of two natural properties
• Show that no linkage-based algorithm can simulate some natural divisive algorithms

What’s Next?

• Apply our approach to specific clustering applications (Ackerman, Brow, and Loker, ICCABS ’12).
• Bridging the gap in other clustering settings
– clustering with a “noise cluster”
– algorithms for categorical data
• Axioms of clustering algorithms