Upload
finian
View
54
Download
0
Embed Size (px)
DESCRIPTION
Generalized Single Linkage Clustering Werner Stuetzle Rebecca Nugent Department of Statistics University of Washington. Groups K-means with k = 2. Detect that there are 5 or 6 distinct groups. Assign group labels to observations. leaf.code - PowerPoint PPT Presentation
Citation preview
February 27, 2007 Stanford 1
Generalized Single Linkage Clustering
Werner StuetzleRebecca NugentDepartment of StatisticsUniversity of Washington
February 27, 2007 Stanford 2
February 27, 2007 Stanford 3
February 27, 2007 Stanford 4
Groups K-means with k = 2
February 27, 2007 Stanford 5
February 27, 2007 Stanford 6
• Detect that there are 5 or 6 distinct groups.
• Assign group labels to observations.
40 45 50 55
74
76
78
80
82
84
February 27, 2007 Stanford 7
February 27, 2007 Stanford 8
February 27, 2007 Stanford 9
-2 0 2 4 6 8 100
10
02
00
30
0
Feature histogram
February 27, 2007 Stanford 10
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
(a) (b) (c)
(d) (e) (f)
February 27, 2007 Stanford 11
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
(a) (b) (c)
(d) (e) (f)
February 27, 2007 Stanford 12
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 13
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 14
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 15
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 16
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 17
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 18
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 19
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 20
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 21
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 22
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 23
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 24
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 25
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 26
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 27
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 28
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 29
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 30
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 31
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
leaf.codegroup.id 4 6 14 15 20 21 23 44 45 1 24 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 47 0 9 3 0 0 0 0 0 0 4 78 124 4 5 0 0 1 0 0 17 0 13 5 0 0 0 0 4 61 0 0 0 6 0 0 0 0 28 5 0 0 0 7 0 3 33 14 0 0 0 0 0 8 0 0 0 48 0 1 1 0 0 9 0 51 0 0 0 0 0 0 0
February 27, 2007 Stanford 32
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
February 27, 2007 Stanford 33
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).