Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Cluster AnalysisOrganize observations into similar groups
Outline
• Visual clustering
• Algorithmic clustering
• Hierarchical clustering
• Self-organising maps
Graphical clustering
How many clusters are cluster-unknown.csv?
Use brush and spin to identify them
Spin and brush
“book”2007/7/19page 109!
!!
!
!!
!!
5.2 Purely graphics 109
Initial projection...
!
! !
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
! !
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!!
!!
! !!
!
!
!
!!
!
!
!! !
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!!!
!
!!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!!
!!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
! !!
!
!
!
!
!
!
!
!
!
!
!
! !
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
! !
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!!
!
!
!
!
!
!!
PC1
PC2
Spin,Stop,Brush,...
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!!
! !!
!
!
!
!
!!
!
!
!!!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!!!!
!!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!!!
!!
!
!
!!
!
!
!
!
!
!
!
!!!!
!
!
!
!!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!!
!
!!
!
!!! !
!!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!!
!
!!!
!
!!
!
!
!!!
!
!
!!!!
!!
!
!!
!!
!
!
!!
! !
!!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!!
!!!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!!
!
!!
!!
!
!
!
! !!
!!
!
!!!
!
!
!!!
!
!
!
!!
!
!
!
!!
!!!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!!
!
!! !
!!
!
!!
!!
!!
!
!
!
!
!
!!!
!
!
!
!!!
!!!
!
!
!
!
!!!
!
!
!!
!
!!!
!
!!
!
!!!
!
!
!!
!
!!!
!
!!
!
!!!
!
!!
!
!
!
!!!!!
!
!
!
!
!!
!!
!
!
!!
!
!
!
!
!!
!
!
!
!!
!
!
!!
!
!
!
!!!
!
!
!
!
!
!
! !
!
!!
PC2
PC3
PC4
PC5
PC6
PC7
Brush,Brush,Spin...
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!!
! !!
!
!
!
!
!!
!
!
!!!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!!!!
!!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!!!
!!
!
!
!!
!
!
!
!
!
!
!
!!!!
!
!
!
!!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!!
!
!!
!
!!! !
!!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!!
!
!!!
!
!!
!
!
!!!
!
!
!!!!
!!
!
!!
!!
!
!
!!
! !
!!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!!
!!!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!!
!
!!
!!
!
!
!
! !!
!!
!
!!!
!
!
!!!
!
!
!
!!
!
!
!
!!
!!!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!!
!
!! !
!!
!
!!
!!
!!
!
!
!
!
!
!!!
!
!
!
!!!
!!!
!
!
!
!
!!!
!
!
!!
!
!!!
!
!!
!
!!!
!
!
!!
!
!!!
!
!!
!
!!!
!
!!
!
!
!
!!!!!
!
!
!
!
!!
!!
!
!
!!
!
!
!
!
!!
!
!
!
!!
!
!
!!
!
!
!
!!!
!
!
!
!
!
!
! !
!
!!
PC2
PC3
PC4
PC5
PC6
PC7
...Brush...
!
!
!!
!
! !!
!!
!!!
!!
!!
! !!
!
!!!
!!
!
!
!
!!
!
!!!
!!
! !!
!
!!
!
!
!! !
!
!
!
!!
!!
!!
!
!!
!
!!
!
!!
!!
!!
!
!
!!
!!
!!!!
! !
!
!
!
!
!!
!!
!!
!!!!!
!
!
!!
!!
!! !
!
!
!!
!
!
!
!
!
!
!!! !
!! !!!
!
!
!!!
!
!
!!
!
!
!
!!
!!
!
!
!!
!
!
!!
!! !
!
!
!
!
!! !
!
!
!!
!
!
!
!!
!
!
!
!
!!
!
!!
!
!!
! !!!!
!! !!
!
!!!
!
!
!!
!!
!
!
!
!
!
!!! !
!!
! !
!
!
!
!
!
!
!!
!
!
!! !!
!!
!!! !!
!
!
!
!!
!! !!
!!
!
!!
!!
!
!!
!!
!!
! !!
!!!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!! !
!!
!
!!
!
!
!!!!!
!!
!!
!
!!
!!!!!
!
!
!!
! !
! !
!!!! !! !
!
!! !
!
!
!
!
!! !
!!
!
!
!
!!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!!
!!!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!!
!
!
!
! ! !
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!!
!!
!
!
!!
! !
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
! !!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!!
PC1PC3
PC4 PC6PC7
Hide,Brush,Spin...
!!
! !
!
!
!!
!
!!
!
! !
!
!
!
!!
!
! !!!!
!
!
!
!! !
!!
!
! !
!
!!
!!
!
!!
!
!
!
!
!
!
!!
!
!
! !!
!!
!
!!!
!!
!!
!
! !!
!
!
!!!
!
!
!
!
!!
!
!
!!
!
!!!
!!
!
!
!!
!
!!
!
!
!!
!
!
!
!
!
!
!
!!
!!
!!
!
!
! !
!!
!!
!!
!
!!
!!
!
!!
!
!
!!
!!
!!!!
!
!
! !
!
!
!
!
!
!!
!!
!!
!
!!!!!
!
!
!
!!
!!
!
!
! !
!
!
!!
!
!
!
!
!
!
!
!!!
!
!
!
!
!! !!
!!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!!
!
!
!!
!! !
!
!
!
!
!
!
!
!! !
!
!
!!
!
!
! !
!
!!
!
!
!
!
!
!
!
!
!!
!
!!
! !!!!
!
!
! !!!
!
!
!!!
!
!
!!
!!
!
!
! !
!
!
!
!!! !
!!
! !
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!! !!
!!
!!
!! !!
!
!
!
!
!
!
!
!!
!
!
!!! !
!
!
!!!
!
!!
!
!!
!
!
!
!
!!
!!
!!
! !!
!!!
!
!
!
!
!!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
! !!
!
!
!!
!
!
!
!
!!
!
!!!
!
!
!
!
!!
!
!
!!
!!
!
!!!
!
!
!
!
!
! !
! !
!
!
!!! !! !
!
!
!
!
!
! !
!
!
!
!
!! !
!!
!
! !
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!!
!
!!
PC1PC3
PC4 PC6PC7
...Brush...
!!!!
!
!
!!
!
!
!
!
!
!
!!!
!
!!
!!
!
!
!!
!
!
!!
!!
!
!!
!!
!
!!
!
!
!
!!
!
!
!! !
!
!!!!
!
!
!
!
!
!
!
!!
!!
!
!
!
!!
!
!
!!
!
!!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
! ! !
!
!
!
! !!!
!
!!! !
!!
!
!
!!
!!!
!
!
!!!!!
!
!!
! !! !
!
! !
!
!
!
!
!!
!!
!
!!
!
!
!
!!
!!! !
!
!
!!
!
!!
!!
!
!!
!!
!!
!
! !
!
!
!
!!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!!!!
!! !
!
!
!
!
!
!
!!
!
!
!
!!!
!!
!
!!
!!!
!
!
!
!!
! !
!
!!
!
!!
!
!
!!
!!!!
!
!
!
!
!
!
!
!
!
!!!
!!!
!
! !!
!
!
!
!
!!
!!
!
!!
!
!!
!
!
!
!!
!
!!
!!
!
!
!
!
!
!!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!!
!
! !
!
!
!
!
! !!
!
!
!
!
!
!
!!
!
!
!
!
!!
! !!
!
!
!
!!
!
! !!
!
!
!
!
!
!
! !!!
!!
!
!
!
!
!
!
!!
!
!
!!!!
!!!
!
!!
!
!!!
!
!
! !!
!
!
!!!!
!
!
!
!
!
!!!
!!!!!!
!
!
!!!
!
!
!
!
!
!
!!!!!
!
!
!
!
!!!
!
!
!!
!
!!
!
!!
!
!!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!!
!
!!
!!
PC3
PC5PC6
PC7
Connect Dots
!
!
!
!!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
! !
!
!
!
!
! !
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!! ! !
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
PC1PC3
PC5PC7
Show,Connect
!
!!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!!!
!
!
!!
!
!!!
!
!!
!
!
!
!!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!!
!
!!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!!
!
!
!
!
!
! !
!
!
!!
!
!!!
!
!
!
!
!
!
!
!
PC1
PC2PC3
PC4PC5
PC6
...Finished!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
! !
!
!!
!!
!
!
!
!!
!!
!
!
!!
!
!
!
!!
!
! !
! !
!!!
!
!
! !!
!!
!!
! !!!
!
!
!!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!!
!
!!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!!
!
!!
!!
!
!
!
!
!!!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
! !
!
!
!
!
!
!
! !
!
!
!!
!!!
!
! !
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!!
!
! !
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!!
!!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!!!
!!!
!!
!!
!
!
!!
!!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
X3
X4X5
X6
Fig. 5.3. Stages of spin and brush on PRIM7. The high-dimensional geometryemerges as the clusters are painted.
The results at this stage are summarized by the bottom row of plots.There is a very visible triangular component (in gray) revealed when all ofthe colored points are hidden. We check the shape of this cluster by drawinglines between outer points to contain the inner ones. Touring after the linesare drawn helps to check how well they match the shape of the clusters. Thecolored groups pair up at each vertex, and we draw in the shape of these too— a single line matches the structures reasonably well.
The final step of the spin and brush clustering is to clean up this solution,touching up the color groups by continuing to tour, and repainting a point
Tour, paint a cluster, continue touring, until no more clusters are revealed.
“book”2007/7/19page 105!
!!
!
!!
!!
5.1 Background 105
had not been found using numerical clustering methods, and to find a varietyof structures in high-dimensional space. Section 5.3 describes methods for re-ducing the interpoint distance matrix to an intercluster distance matrix usinghierarchical algorithms and model-based clustering, and shows how graphicaltools are used to assess the results of numerical methods. Section 5.5 summa-rizes the chapter and revisits the data analysis strategies used in the examples.A good companion to the material presented in this chapter is Venables &Ripley (2002), which provides data and code for practical examples of clusteranalysis using R. Section 5.5 summarizes the chapter and revisits the dataanalysis strategies used in the examples.
5.1 Background
Before we can begin finding groups of cases that are similar, we needto decide on a definition of similarity. How is similarity defined? Consider adataset with three cases and four variables, described in matrix format as
X =
!
"X1
X2
X3
#
$ =
!
"7.3 7.6 7.7 8.07.4 7.2 7.3 7.24.1 4.6 4.6 4.8
#
$
which is plotted in Fig. 5.2. The Euclidean distance between two cases (rowsof the matrix) is defined as
dEuc(Xi,Xj) = ||Xi !Xj || i, j = 1, . . . , n,
where ||Xi|| =%
X2i1 + X2
i2 + . . . + X2ip. For example, the Euclidean distance
between cases 1 and 2 in the above data, is&
(7.3! 7.4)2 + (7.6! 7.2)2 + (7.7! 7.3)2 + (8.0! 7.2)2 = 1.0.
For the three cases, the interpoint Euclidean distance matrix is
dEuc =
!
"0.01.0 0.06.3 5.5 0.0
#
$X1
X2
X3
Cases 1 and 2 are more similar to each other than they are to case 3, becausethe Euclidean distance between cases 1 and 2 is much smaller than the distancebetween cases 1 and 3 and between cases 2 and 3.
What is similar?
“book”2007/7/19page 105!
!!
!
!!
!!
5.1 Background 105
had not been found using numerical clustering methods, and to find a varietyof structures in high-dimensional space. Section 5.3 describes methods for re-ducing the interpoint distance matrix to an intercluster distance matrix usinghierarchical algorithms and model-based clustering, and shows how graphicaltools are used to assess the results of numerical methods. Section 5.5 summa-rizes the chapter and revisits the data analysis strategies used in the examples.A good companion to the material presented in this chapter is Venables &Ripley (2002), which provides data and code for practical examples of clusteranalysis using R. Section 5.5 summarizes the chapter and revisits the dataanalysis strategies used in the examples.
5.1 Background
Before we can begin finding groups of cases that are similar, we needto decide on a definition of similarity. How is similarity defined? Consider adataset with three cases and four variables, described in matrix format as
X =
!
"X1
X2
X3
#
$ =
!
"7.3 7.6 7.7 8.07.4 7.2 7.3 7.24.1 4.6 4.6 4.8
#
$
which is plotted in Fig. 5.2. The Euclidean distance between two cases (rowsof the matrix) is defined as
dEuc(Xi,Xj) = ||Xi !Xj || i, j = 1, . . . , n,
where ||Xi|| =%
X2i1 + X2
i2 + . . . + X2ip. For example, the Euclidean distance
between cases 1 and 2 in the above data, is&
(7.3! 7.4)2 + (7.6! 7.2)2 + (7.7! 7.3)2 + (8.0! 7.2)2 = 1.0.
For the three cases, the interpoint Euclidean distance matrix is
dEuc =
!
"0.01.0 0.06.3 5.5 0.0
#
$X1
X2
X3
Cases 1 and 2 are more similar to each other than they are to case 3, becausethe Euclidean distance between cases 1 and 2 is much smaller than the distancebetween cases 1 and 3 and between cases 2 and 3.
“book”2007/7/19page 105!
!!
!
!!
!!
5.1 Background 105
had not been found using numerical clustering methods, and to find a varietyof structures in high-dimensional space. Section 5.3 describes methods for re-ducing the interpoint distance matrix to an intercluster distance matrix usinghierarchical algorithms and model-based clustering, and shows how graphicaltools are used to assess the results of numerical methods. Section 5.5 summa-rizes the chapter and revisits the data analysis strategies used in the examples.A good companion to the material presented in this chapter is Venables &Ripley (2002), which provides data and code for practical examples of clusteranalysis using R. Section 5.5 summarizes the chapter and revisits the dataanalysis strategies used in the examples.
5.1 Background
Before we can begin finding groups of cases that are similar, we needto decide on a definition of similarity. How is similarity defined? Consider adataset with three cases and four variables, described in matrix format as
X =
!
"X1
X2
X3
#
$ =
!
"7.3 7.6 7.7 8.07.4 7.2 7.3 7.24.1 4.6 4.6 4.8
#
$
which is plotted in Fig. 5.2. The Euclidean distance between two cases (rowsof the matrix) is defined as
dEuc(Xi,Xj) = ||Xi !Xj || i, j = 1, . . . , n,
where ||Xi|| =%
X2i1 + X2
i2 + . . . + X2ip. For example, the Euclidean distance
between cases 1 and 2 in the above data, is&
(7.3! 7.4)2 + (7.6! 7.2)2 + (7.7! 7.3)2 + (8.0! 7.2)2 = 1.0.
For the three cases, the interpoint Euclidean distance matrix is
dEuc =
!
"0.01.0 0.06.3 5.5 0.0
#
$X1
X2
X3
Cases 1 and 2 are more similar to each other than they are to case 3, becausethe Euclidean distance between cases 1 and 2 is much smaller than the distancebetween cases 1 and 3 and between cases 2 and 3.
Hierarchical clustering> library(rggobi)> d.prim7 <- read.csv("prim7.csv")> d.prim7.dist <- dist(d.prim7)> d.prim7.dend <- hclust(d.prim7.dist, method="average")> plot(d.prim7.dend)
“book”2007/7/19page 112!
!!
!
!!
!!
112 5 Cluster Analysis119
485
108
410
306
44
319
493 22
64
271
170
461
52
285
449
451 197
80
416 1
92
138
382
148
371
424
439
224
472 2
4247
184
328
337
25 7
36
304
135
435
103
146 45
426
256
287 31
387
173
190
290
89
258 29
254
309
324
293
174
168
12
475 181
466
134
179
462
123
83
237
341
359
437
447
65
145
428
434
440
253
23 3
286
480
238
405
139
218 2
244
478
487
379
383
210
467
204
430
203
331
113
444
419
54
296
185
265
346
282
388
473
311
463
222
325
368
396
167
261 1
307
221
122
308
357
151
355
458
154
206
117
187
464
294
494
124
219
333
101
160
432 79
234
116
66
121 161
220
281
377
120 4
342 352
267
131
196
298
233
360
402
407
34
317
299
385 43
58
107
14
72 92
90
484 77
413
88
351
102
63
105
127
277
335
89
431
295
230
193
349
421
250
251 5
1358
338
339 98
401
268
381 474
363
111
367
411
81
32932
199
279
399
406
441
384
443
156
344
479
262
188
165
442
71
162 322
19
13
96
320
274
115
378 6
2499
21
429
245
452
490
240
415
53
266
455
278
86
152 91
106
177
158
126
414
400
433
140
393
453
147
284
195
46
418
476 26
297
292
109
438
37
100
180
200
389
94
133 7
8456
75
471
372
252
305 74
270
110
194
482
249
454
369
469
137
495
144
289
404
364
225
354
136
216
243
242
353
149
272
376
27
395 175
291
48
483
47
235
422
186
155
205
213
191
460
500
201
239
39
423
255
301 5
348
326
327
11
202 56
231
207
208 33
366
118
446
60
20
496 61
232 87
318 198
392
217
227
340
457
302
171
486
10
241 8
468
373
82
288 38
214
394
276
459
313
189
347 492
67
215
350 6
40
112
303
28
182
128
336
163
330
183
73
248
310
176
209
226
212
321
172
273
125
30
104
477
489
257
59
409
498
275
228
260
142
99
488
130
166
356
465
97
450
398
236
211
427 50
169
70
334
93
365 18
390
129
17
263
315
481
314
391
380
468
76
361 132
412
35
345
420
16
143
408
15
55
150
445
114
264
95
69
164
386
323
374
300
362 85
343
159
229
425
49
153
497
316
312
259
283
370
397
157
417
41
280
448
491
470
332
141
223
436
246
375
403
178
269
42
57
05
10
15
20
25
hclust (*, "average")
d.prim7.dist
Heig
ht
Dendrogram for Hierarchical Clustering with Average Linkage
Cluster 1
!
!
!
!!
!
!
!
!
!!
! !
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!! !
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!!!
!
!
!
!! !!
!!
!
!!
!!
!
!!!
! !
!
!
!
!
!!
!!
!
!
!!
!!
!
!
!
!!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
! !
!!!
!
!
!!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
X3
X5
Cluster 2
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
! !
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
! !
!!!!
!
!
!
!
!
!
!!
!
!!
!
!
!
!!
!
!
!
!
!
!!!
!
!
!! !!
!
!
!!
!
!
!!
!
!!
!
! !
!
!
!
!!
!
!
!
!!
!
!
!!
!
!
!!
!
!!
!
!!
!!
!
!!
!!
!
!
!!
!
!
!
!!
!
!
!!
!
!
!!!
!
!!! !
!!
!
!
!
!
!
!
! !! !
!
!
!
!
!
!
!
!
!
!
!!
!! !!!
!
!
!!
!
!!
!
!
!
X3
X5
Cluster 3
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
! !
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!
!
!
!
!
!
!
! !
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
! !!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!!
!
!!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!! !
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!!!! !!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!! !
!!!
!!!!!!!
!
!
!
!
!!
!
!
!
!!!
!!
!
!!
!
!!
!!
!
!!
!
!
!
!
!!
!!
!!!
!
!
!
!
!!
!!!
!
!!
!
!
!
!!
X3
X5
Cluster 5
!
!!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
! !
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
! !
!!
!
!
!!
!
!
!
!!
!!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!! !
!
!!
!
!
!!
!!
!
!
!
!
!
!
!
!!
!!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
! !!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!!
!
! !
!!
!
!
!!
!!!
!
!!!!
!
!!
!!!!
!
!
!
!!
! !!!
!!
!
!!!!!!!!
!
!
!!
!!
!!
!
!
! !!!!!!!
X1 X3
X5
X6
X7
Cluster 6
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!!
!
!
!
!!
!!
!
!!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!!
!
!
!!
!
!
!!
!!!!!
!
!!!
!
!
!!! !
X1 X3
X5
X6
X7
Cluster 7
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!!
!
!
!
!!
!!
!
!!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!!
!
! !
!!
!
!
!!
!
X1 X3
X5
X6
X7
Fig. 5.5. Hierarchical clustering of the particle physics data. The dendrogram showsthe results of clustering the data using average linkage. Clusters 1, 2, and 3 carveup the base triangle of the data; clusters 5 and 6 divide one of the arms; and cluster7 is a singleton.
The top three plots show, respectively, clusters 1, 2, and 3: These clustersroughly divide the main triangular section of the data into three. The bottomrow of plots show clusters labeled 5, and 6, which lie along the linear pieces,and cluster 7, which is a singleton cluster corresponding to an outlier in thedata.
The results are reasonably easy to interpret. Recall that the basic geometryunderlying this data is that there is a 2D triangle with two linear strandsextending from each vertex. The hierarchical average linkage clustering ofthe particle physics data using nine clusters essentially divides the data into
> gd <- ggobi(d.prim7)[1]> clust9 <- cutree(d.prim7.dend, k=9)> glyph_color(gd)[clust9==1] <- 9 # highlight triangle> glyph_color(gd)[clust9==1] <- 1 # reset color> glyph_color(gd)[clust9==2] <- 9 # highlight cluster 2
“book”2007/7/19page 112!
!!
!
!!
!!
112 5 Cluster Analysis
119
485
108
410
30
644
319
493 22
64
27
1170
46
152
285
44
945
1 197
80
416 1
92
13
83
82
14
83
71
42
443
92
24
472 2
42
47
18
43
28
337
25 7
36
30
41
35
435
103
146 45
42
62
56
287 31
387
17
319
029
08
92
58 29
254
30
932
42
93
17
41
68
12
475 18
14
66
13
41
79
46
21
23
83
23
73
41
359
437
44
76
51
45
428
434
440
25
323 3
286
480
23
840
51
39
21
8 224
447
84
87
37
938
321
046
72
04
430
203
331
11
344
44
19
54
29
618
52
65
346
28
238
847
33
11
463
22
23
25
368
396
16
726
1 13
07
22
112
230
83
57
15
135
54
58
15
42
06
11
71
87
46
42
94
49
412
421
93
33
101
160
432 79
234
116
66
12
1 161
220
28
13
77
12
0 43
42 352
267
13
11
96
298
233
360
402
407
34
317
29
938
5 43
58
10
71
47
2 92
90
48
4 77
413
88
351
10
26
31
05
127
277
335
89
43
129
52
30
193
349
421
25
025
1 51
35
83
38
33
9 98
40
12
68
381 47
43
63
11
136
74
11
81
32
93
21
99
279
399
406
441
384
443
15
63
44
479
262
18
81
65
44
27
11
62 322
19
13
96
320
27
41
15
37
8 62
499
21
429
245
452
490
24
041
553
26
64
55
278
86
15
2 91
106
177
158
12
641
44
00
433
14
039
34
53
14
728
41
95
46
41
847
6 26
29
729
21
09
43
83
71
00
18
020
03
89
94
13
3 78
456
75
47
137
225
230
5 74
27
01
10
19
44
82
249
454
36
94
69
137
495
144
289
404
364
225
354
13
62
16
243
24
235
31
49
272
376
27
395 17
52
91
48
483
47
23
54
22
186
155
20
52
13
19
146
05
00
20
12
39
39
42
325
530
1 53
48
32
63
27
11
20
2 56
231
207
208 33
36
61
18
44
66
02
04
96 61
232 87
318 19
839
22
17
227
340
45
73
02
17
14
86
10
241 8
46
837
382
28
8 38
214
394
27
64
59
31
31
89
347 49
26
721
535
0 64
01
12
30
328
182
128
336
163
330
183
73
24
83
10
17
620
922
621
232
11
72
27
31
25
30
10
44
77
489
25
75
94
09
49
82
75
22
826
014
29
94
88
13
016
635
646
59
74
50
398
23
621
142
7 50
16
97
03
34
93
36
5 18
390
129
17
26
33
15
481
314
391
38
046
87
636
1 132
41
235
345
420
16
143
408
15
55
15
044
51
14
264
95
69
16
43
86
323
374
300
362 85
34
315
922
942
54
91
53
49
73
16
312
259
283
37
039
71
57
417
41
280
448
491
470
332
141
223
436
246
375
403
178
269
42
57
05
10
15
20
25
hclust (*, "average")
d.prim7.dist
Heig
ht
Dendrogram for Hierarchical Clustering with Average Linkage
Cluster 1
!
!
!
!!
!
!
!
!
!!
! !
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!! !
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!!!
!
!
!
!! !!
!!
!
!!
!!
!
!!!
! !
!
!
!
!
!!
!!
!
!
!!
!!
!
!
!
!!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
! !
!!!
!
!
!!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
X3
X5
Cluster 2
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!
!
! !
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
! !
!!!!
!
!
!
!
!
!
!!
!
!!
!
!
!
!!
!
!
!
!
!
!!!
!
!
!! !!
!
!
!!
!
!
!!
!
!!
!
! !
!
!
!
!!
!
!
!
!!
!
!
!!
!
!
!!
!
!!
!
!!
!!
!
!!
!!
!
!
!!
!
!
!
!!
!
!
!!
!
!
!!!
!
!!! !
!!
!
!
!
!
!
!
! !! !
!
!
!
!
!
!
!
!
!
!
!!
!! !!!
!
!
!!
!
!!
!
!
!
X3
X5
Cluster 3
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
! !
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!
!
!
!
!
!
!
! !
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
! !!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!!
!
!!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!! !
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!
!
!!!! !!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!! !
!!!
!!!!!!!
!
!
!
!
!!
!
!
!
!!!
!!
!
!!
!
!!
!!
!
!!
!
!
!
!
!!
!!
!!!
!
!
!
!
!!
!!!
!
!!
!
!
!
!!
X3
X5
Cluster 5
!
!!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
! !
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
! !
!!
!
!
!!
!
!
!
!!
!!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!! !
!
!!
!
!
!!
!!
!
!
!
!
!
!
!
!!
!!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
! !!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!!
!
! !
!!
!
!
!!
!!!
!
!!!!
!
!!
!!!!
!
!
!
!!
! !!!
!!
!
!!!!!!!!
!
!
!!
!!
!!
!
!
! !!!!!!!
X1 X3
X5
X6
X7
Cluster 6
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!!
!
!
!
!!
!!
!
!!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!!
!
!
!!
!
!
!!
!!!!!
!
!!!
!
!
!!! !
X1 X3
X5
X6
X7
Cluster 7
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!!
!
!
!!
!
!
!
!!
!!
!
!!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!!!
!
! !
!!
!
!
!!
!
X1 X3
X5
X6
X7
Fig. 5.5. Hierarchical clustering of the particle physics data. The dendrogram showsthe results of clustering the data using average linkage. Clusters 1, 2, and 3 carveup the base triangle of the data; clusters 5 and 6 divide one of the arms; and cluster7 is a singleton.
The top three plots show, respectively, clusters 1, 2, and 3: These clustersroughly divide the main triangular section of the data into three. The bottomrow of plots show clusters labeled 5, and 6, which lie along the linear pieces,and cluster 7, which is a singleton cluster corresponding to an outlier in thedata.
The results are reasonably easy to interpret. Recall that the basic geometryunderlying this data is that there is a 2D triangle with two linear strandsextending from each vertex. The hierarchical average linkage clustering ofthe particle physics data using nine clusters essentially divides the data into
Clusters 1, 2, 3 carve up the base triangle.
Clusters 5, 6 divide an arm.
Cluster 7 is a singleton cluster containing an outlier.
Self-organizing maps
• A constrained k-means algorithm.
• A 1D or 2D net is stretched through the data. The knots in the net form the cluster centres, and the points closest to the knot are considered to belong to that cluster.
• Net can be unwrapped and laid out flat to give a visual representation of the clusters.
Music data
• Descriptive statistics for 62 songs: Variance, Mean, Max, Energy & Frequency
• 32 rock (Abba, Beatles, Eels) 27 classical (Beethoven, Mozart, Vivaldi)3 new wave (Enya)
• Can a computer identify the different types?
Can you hear the difference?
• One
• Two
• Three
Can you hear the difference?
• One
• Two
• Three
Can you hear the difference?
"Rock" - Abba• One
• Two
• Three
Can you hear the difference?
"Rock" - Abba• One
• Two
• Three
Can you hear the difference?
"Rock" - Abba
New wave - Enya
• One
• Two
• Three
Can you hear the difference?
"Rock" - Abba
New wave - Enya
• One
• Two
• Three
Can you hear the difference?
"Rock" - Abba
New wave - Enya
Classical - Vivaldi
• One
• Two
• Three
Can you see the difference?
Is rock different from classical?
Can you see the difference?
Is rock different from classical?
Can also see the shape in 5d: outliers and non-linearity
Model & data
1
2
3
4
5
6
1 2 3 4 5 6
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
GirlB3 AgonyV2 V1 B4
M2M4 Saturday Morning
I Want to Hold Your Hand
Love of the Loveless
V10
M1 B8 HelpM5The Good Old Days
Ticket to RideB7V5 Restraining All in a Days Work
Rock Hard Times
YesterdayV9B6I Have A Dream Eleanor Rigby Yellow SubmarineV3 Penny LaneWrong About BobbyI Feel Fine
B5
Lone WolfLove Me DoCant Buy Me LoveHey Jude
The WinnerThe Memory of Trees Anywhere IsV4 B2 V13
M3
Money WaterlooV12
Super TrouperMamma MiaTake a ChanceKnowing MePax Deorum Lay All YouV11
V7
V8 Dancing QueenSOS
V6M6 B1
map
2
map1
data in the model space
How well does the model fit the data?
SOM
−0.2
0.0
0.2
0.4
0.6
●●
●
●● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
● ●●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
V8
Lay All YouDancing QueenSuper TrouperTake a ChanceAnywhere Is The WinnerMoney
WaterlooLove Me Do V4
V11
Eleanor RigbyYellow Submarine B2All in a Days Work I Have A DreamV9Penny LaneLone WolfI Feel Fine Restraining
Pax Deorum
V3Cant Buy Me Love V5Ticket to RideWrong About Bobby B7 Hey JudeYesterday B6M5
SOS
The Good Old DaysHelpSaturday Morning M1B8Love of the LovelessM4
I Want to Hold Your Hand Agony B1
Mamma Mia
V1 V10B3Rock Hard Times M2B5V6 M3 M6
Girl
V13
B4
V2
V7
The Memory of TreesKnowing Me V12
Com
p.2
Comp.1
PCA
Model & data
1
2
3
4
5
6
1 2 3 4 5 6
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
GirlB3 AgonyV2 V1 B4
M2M4 Saturday Morning
I Want to Hold Your Hand
Love of the Loveless
V10
M1 B8 HelpM5The Good Old Days
Ticket to RideB7V5 Restraining All in a Days Work
Rock Hard Times
YesterdayV9B6I Have A Dream Eleanor Rigby Yellow SubmarineV3 Penny LaneWrong About BobbyI Feel Fine
B5
Lone WolfLove Me DoCant Buy Me LoveHey Jude
The WinnerThe Memory of Trees Anywhere IsV4 B2 V13
M3
Money WaterlooV12
Super TrouperMamma MiaTake a ChanceKnowing MePax Deorum Lay All YouV11
V7
V8 Dancing QueenSOS
V6M6 B1
map
2
map1
data in the model space
How well does the model fit the data?
SOM
−0.2
0.0
0.2
0.4
0.6
●●
●
●● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
● ●●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
V8
Lay All YouDancing QueenSuper TrouperTake a ChanceAnywhere Is The WinnerMoney
WaterlooLove Me Do V4
V11
Eleanor RigbyYellow Submarine B2All in a Days Work I Have A DreamV9Penny LaneLone WolfI Feel Fine Restraining
Pax Deorum
V3Cant Buy Me Love V5Ticket to RideWrong About Bobby B7 Hey JudeYesterday B6M5
SOS
The Good Old DaysHelpSaturday Morning M1B8Love of the LovelessM4
I Want to Hold Your Hand Agony B1
Mamma Mia
V1 V10B3Rock Hard Times M2B5V6 M3 M6
Girl
V13
B4
V2
V7
The Memory of TreesKnowing Me V12
Com
p.2
Comp.1
PCA
Model & data
1
2
3
4
5
6
1 2 3 4 5 6
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
GirlB3 AgonyV2 V1 B4
M2M4 Saturday Morning
I Want to Hold Your Hand
Love of the Loveless
V10
M1 B8 HelpM5The Good Old Days
Ticket to RideB7V5 Restraining All in a Days Work
Rock Hard Times
YesterdayV9B6I Have A Dream Eleanor Rigby Yellow SubmarineV3 Penny LaneWrong About BobbyI Feel Fine
B5
Lone WolfLove Me DoCant Buy Me LoveHey Jude
The WinnerThe Memory of Trees Anywhere IsV4 B2 V13
M3
Money WaterlooV12
Super TrouperMamma MiaTake a ChanceKnowing MePax Deorum Lay All YouV11
V7
V8 Dancing QueenSOS
V6M6 B1
map
2
map1
data in the model space
How well does the model fit the data?
1
2
3
4
5
6
1 2 3 4 5 6
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●
●●●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
map2
map1
−0.2
0.0
0.2
0.4
0.6
● ●
●
●● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●●●
●
●
● ●●
●●
● ●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
Com
p.2
Comp.1
SOM
Model & datamodel in the data space
som
“book”2007/7/19page 122!
!!
!
!!
!!
122 5 Cluster Analysis
points are spread evenly through the grid, with rock tracks (purple) at theupper right, classical tracks (green) at the lower left, and new wave tracks(the three black rectangles) in between. The tour view in the same figure,however, shows the fit to be inadequate. The net is a flat rectangle in the 5Dspace and has not su!ciently wrapped through the data. This is the result ofstopping the algorithm too soon, thus failing to let it converge fully.
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !!!
!
!!!
!!
!
!
!
!
!!
!
!
!
!
!!!
!
!!
!
!!
!
!!
!
!
!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!!
!
!
!!!
!
!
!
!
!
!
!
!!
!
!!
!!!
!
!
! !!
!!
!!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!!
lvar lave
lmax lfreq
Fig. 5.9. Unsuccessful SOM fit shown in a 2D map view and a tour projection.Although the fit looks good in the map view, the view of the fit in the 5D dataspace shows that the net has not su!ciently wrapped into the data: The algorithmhas not converged fully.
Figure 5.10 shows our favorite fit to the data. The data was standardized,we used a 6!6 net, and we ran the SOM algorithm for 1,000 iterations. Themap is at the top left, and it matches the map already shown in Fig. 5.8,except for the small jittering of points in the earlier figure. The other threeplots show di"erent projections from the grand tour. The upper right plotshows how the net curves with the nonlinear dependency in the data: The netis warped in some directions to fit the variance pattern. At the bottom rightwe see that one side of the net collects a long separated cluster of the Abbatracks. We can also see that the net has not been stretched out to the fullextent of the range of the data. It is tempting to manually manipulate the netto stretch it in di"erent directions and update the fit.
It turns out that the PCA view of the data more accurately reflects thestructure in the data than the map view. The music pieces really are clumpedtogether in the 5D space, and there are a few outliers.
5.3.4 Comparing methods
To compare the results of two methods we commonly compute a confusiontable. For example, Table 5.1 is the confusion table for five-cluster solutions
First attempt at model fitting doesn’t work. Model appears to be unfinished, flat in the data space rather than wrapped into the points.
“book”2007/7/19page 123!
!!
!
!!
!!
5.3 Numerical methods 123
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !
! ! ! ! ! !!
!
!
!
!!!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!!
!
!
!
! !
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
! !!
! !!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
! ! !
!
!
!
! ! !
!!
!
!!
!
!
!
!
!!
!!!
!
!!!
!
!
!
!
!
!
!!
!
!
!
!!
!
! !!
!!
!!!
!
!!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
lvar
lavelmax
lfenerlfreq
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
!
!!!!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!!
!
!
!
!!
!
!
!
!
!
!
!
!!
!
lvarlmaxlfenerlfreq
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
lvarlavelmaxlfener
lfreq
Fig. 5.10. Successful SOM fit shown in a 2D map view and tour projections. Herewe see a more successful SOM fit, using standardized data. The net wraps throughthe nonlinear dependencies in the data, but some outliers remain.
for the Music data from k-means and Ward’s linkage hierarchical clustering,generated by:
> d.music.dist <- dist(subset(d.music.std,select=c(lvar:lfreq)))
> d.music.hc <- hclust(d.music.dist, method="ward")> cl5 <- cutree(d.music.hc,5)> d.music.km <- kmeans(subset(d.music.std,
select=c(lvar:lfreq)), 5)> table(d.music.km$cluster, cl5)> d.music.clustcompare <-
cbind(d.music.std,cl5,d.music.km$cluster)> names(d.music.clustcompare)[8] <- "Wards"> names(d.music.clustcompare)[9] <- "km"> gd <- ggobi(d.music.clustcompare)[1]
Next attempt looks better.
Problem was solved by increasing the number of iterations used.
Deciding on a solution• One part in coming to a decision about the
best solution is to compare the results from different methods.
• This can be done using a confusion table.
“book”2007/7/19page 124!
!!
!
!!
!!
124 5 Cluster Analysis
The numerical labels of clusters are arbitrary, so these can be rearranged tobetter digest the table. There is a lot of agreement between the two methods:Both methods agree on the cluster for 48 tracks out of 62, or 77% of the time.We want to explore the data space to see where the agreement occurs andwhere the two methods disagree.
Table 5.1. Tables showing the agreement between two five-cluster solutions for theMusic data, showing a lot of agreement between k-means and Ward’s linkage hierar-chical clustering. The rows have been rearranged to make the table more readable.
Ward’sk-means 1 2 3 4 5
1 0 0 3 0 142 0 0 1 0 03 0 9 5 0 04 8 2 1 0 05 0 0 3 16 0
Rearrange rows !
Ward’sk-means 1 2 3 4 5
4 8 2 1 0 03 0 9 5 0 02 0 0 1 0 05 0 0 3 16 01 0 0 3 0 14
In Fig. 5.11, we link jittered plots of the confusion table for the two clus-tering methods with 2D tour plots of the data. The first column contains twojittered plots of the confusion table. In the top row of the figure, we have high-lighted a group of 14 points that both methods agree form a cluster, paintingthem as orange triangles. From the plot at the right, we see that this clus-ter is a closely grouped set of points in the data space. From the tour axeswe see that lvar has the largest axis pointing in the direction of the clusterseparation, which suggests that music pieces in this cluster are characterizedby high values on lvar (variable 3 in the data); that is, they have large vari-ance in frequency. By further investigating which tracks are in this cluster,we can learn that it consists of a mix of tracks by the Beatles (“Penny Lane,”“Help,” “Yellow Submarine,” ...) and the Eels (“Saturday Morning,” “Loveof the Loveless,” ...).
In the bottom row of the figure, we have highlighted a second group oftracks that were clustered together by both methods, painting them againusing orange triangles. In the plot to the right, we see that this cluster isclosely grouped in the data space. Despite that, this cluster is a bit moredi!cult to characterize. It is oriented mostly in the negative direction of lave(variable 4), so it would have smaller values on this variable. But this verticaldirection in the plot also has large contributions from variables 3 (lvar) and 7(lfreq). If you label these eight points on your own, you will see that they areall Abba songs (“Dancing Queen,” “Waterloo,” “Mamma Mia,” ...).
We have explored two groups of tracks where the methods agree. In asimilar fashion, we could also explore the tracks where the methods disagree.
Agree on 48/62 cases = 77%
...and graphically“book”2007/7/19page 125!
!!
!
!!
!!
5.4 Characterizing clusters 125
1
2
3
4
5
1 2 3 4 5
!!
!!
!
!
!
!!
!!!!!!!!
!
!!!!!
!!!!
!!!!!! !
!
! !
!
!
!
!!
!
!
!
KM!5
HC!W5
!
!
!
!
!
!
!
!
!
!!!
!!
!
!!
!
!!
!
!
!
!
!!
!
!
!!!!!
!
!
!
!
!!
!
!
!
!
!
!
lvar
lavelmax
lfener
1
2
3
4
5
1 2 3 4 5
!!
!!
!
!
!
! !
!!!!!!!!
!
!!!!!
!!!!
!
!
!
!!! !
!
!
!
!
!!! !!!!!! !
!
!
!
KM!5
HC!W5
!
!
!!
!
!
!
!!
!
!
!
! !!
!
!
!
!!
!!
!! !
!
!!
!
!!
!
!!
!
!
!
!
!! !
!
!!
!!!!!!
!
lvarlave
lmaxlfener
lfreq
Fig. 5.11. Comparing two five-cluster models of the Music data using confusiontables linked to tour plots. In the confusion tables, k-means cluster identifiers foreach plot are plotted against Ward’s linkage hierarchical clustering ids. (The valueshave been jittered.) Two di!erent areas of agreement have been highlighted, and thetour projections show the tightness of each cluster where the methods agree.
5.4 Characterizing clusters
The final step in a cluster analysis is to characterize the clusters. Actually, wehave engaged in cluster characterization throughout the examples, becauseit is an intrinsic part of assessing the results of any cluster analysis. If wecannot detect any numerical or qualitative di!erences between clusters, thenour analysis was not successful, and we start over with a di!erent distancemetric or algorithm.
However, once we are satisfied that we have found a set of clusters that canbe di!erentiated from one another, we want to describe them more formally,both quantitatively and qualitatively. We characterize them quantitatively bycomputing such statistics as cluster means and standard deviations for eachvariable. We can look at these results in tables and in plots, and we can refine
Two different areas of agreement have been highlighted, and the tour projections show the tightness of each cluster where the methods agree.
Your turn
Run hierarchical clustering with average linkage on the Flea Beetles data (excluding species).
• Cut the tree at three clusters and append a cluster id to the dataset.
• How well do the clusters correspond to the species? (Plot cluster id vs species} and use jittering if necessary.)
• Using brushing in a plot of cluster id linked to a tour plot of the six variables, examine the beetles that are misclassified.
• Now cut the tree at four clusters, and repeat the last part.
• Which is the better solution, three or four clusters? Why?
Timeline20 mins Toolbox
30 mins Missing values
45 minsSupervised
Classification
45 minsUnsupervised Classification
30 mins Inference
Break