Sunday, 30 November 2014 9:42 PM (PowerPoint presentation)
Discriminant Analysis
Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups.
Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA).
Wednesday 19 April 2023 04:37 PM
For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose, the researcher could collect data on numerous variables prior to the students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.
For example, a medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.
The data examined here consist of five measurements on each of 32 skulls found in the southwestern and eastern districts of Tibet.
1. Greatest length of skull (measure 1)
2. Greatest horizontal breadth of skull (measure 2)
3. Height of skull (measure 3)
4. Upper face length (measure 4)
5. Face breadth between outermost points of cheekbones (measure 5)
There are also location and grouping variables.
This work is loosely based on A Handbook of Statistical Analyses Using SPSS by Sabine Landau and Brian S. Everitt (Chapman and Hall/CRC, 2003) and Handbook of Statistical Analyses Using Stata, Fourth Edition, by Brian S. Everitt and Sophia Rabe-Hesketh (CRC, 2006).
These data, collected by Colonel L.A. Waddell, were first reported in Morant (1923) and are also given in Hand et al. (1994).
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. (1994). A Handbook of Small Data Sets. London: Chapman & Hall.
Morant, G.M. (1923). A first study of the Tibetan skull. Biometrika, 14, 193-260.
The data can be divided into two groups. The first comprises skulls 1 to 17 found in graves in Sikkim and the neighbouring area of Tibet (Type A skulls). The remaining 15 skulls (Type B skulls) were picked up on a battlefield in the Lhasa district and are believed to be those of native soldiers from the eastern province of Khams. These skulls were of particular interest since it was thought at the time that Tibetans from Khams might be survivors of a particular human type, unrelated to the Mongolian and Indian types that surrounded them.
There are two questions that might be of interest for these data:
Do the five measurements discriminate between the two assumed groups of skulls and can they be used to produce a useful rule for classifying other skulls that might become available?
Taking the 32 skulls together, are there any natural groupings in the data and, if so, do they correspond to the groups assumed?
Classification is an important component of virtually all scientific research. Statistical techniques concerned with classification are essentially of two types. The first (cluster analysis) aims to uncover groups of observations from initially unclassified data. The second (discriminant analysis) works with data that is already classified into groups to derive rules for classifying new (and as yet unclassified) individuals on the basis of their observed variable values.
Initially it is wise to take a look at your raw data.
Select Matrix Scatter, then use Define to select the variables.
Select the matrix variables and the variable used to set the markers.
Note that Greatest length of skull is also selected, although it sits above the part of the list shown.
Use OK to accept.
While this diagram only allows us to assess the group separation in two dimensions, it seems to suggest that face breadth between outermost points of cheekbones (meas5), greatest length of skull (meas1) and upper face length (meas4) provide the greatest discrimination between the two skull types.
We shall now use Fisher’s linear discriminant function to derive a classification rule for assigning skulls to one of the two predefined groups on the basis of the five measurements available.
Now proceed to complete the analysis.
As before, use the secondary screens to select the grouping variable (place) and use Define Range.
From the Statistics button, make the following selections.
Now proceed to complete the analysis.
Select the independents, use OK to run.
The Group Statistics table gives the resulting descriptive output. It displays the means and standard deviations of each of the five measurements for each type of skull and overall (Total).
Group Statistics: Mean (Std. Deviation)
Place where skulls were found:                         Sikkem or Tibet     Lhasa               Total
Greatest length of skull                               174.824 (6.7475)    185.733 (8.6269)    179.938 (9.3651)
Greatest horizontal breadth of skull                   139.353 (7.6030)    138.733 (6.1117)    139.063 (6.8412)
Height of skull                                        132.000 (6.0078)    134.767 (6.0263)    133.297 (6.0826)
Upper face length                                       69.824 (4.5756)     76.467 (3.9118)     72.938 (5.3908)
Face breadth between outermost points of cheek bones   130.353 (8.1370)    137.500 (4.2384)    133.703 (7.4443)
Valid N (listwise), unweighted and weighted            17                  15                  32
The within-group covariance matrices are shown in the Covariance Matrices table; whether they differ significantly is assessed by Box's test for equality of covariances (Log Determinants and Test Results tables, below).
Covariance Matrices (columns in the same order as the rows)
Sikkem or Tibet                                                   meas1    meas2    meas3    meas4    meas5
Greatest length of skull (meas1)                                 45.529   25.222   12.391   22.154   27.972
Greatest horizontal breadth of skull (meas2)                     25.222   57.805   11.875    7.519   48.055
Height of skull (meas3)                                          12.391   11.875   36.094    -.313    1.406
Upper face length (meas4)                                        22.154    7.519    -.313   20.936   16.769
Face breadth between outermost points of cheek bones (meas5)     27.972   48.055    1.406   16.769   66.211
Lhasa                                                             meas1    meas2    meas3    meas4    meas5
Greatest length of skull (meas1)                                 74.424   -9.523   22.737   17.794   11.125
Greatest horizontal breadth of skull (meas2)                     -9.523   37.352  -11.263     .705    9.464
Height of skull (meas3)                                          22.737  -11.263   36.317   10.724    7.196
Upper face length (meas4)                                        17.794     .705   10.724   15.302    8.661
Face breadth between outermost points of cheek bones (meas5)     11.125    9.464    7.196    8.661   17.964
The within-group covariance matrices shown in the Covariance Matrices table suggest that the sample values differ to some extent, but according to Box’s test for equality of covariances (tables Log Determinants and Test Results) these differences are not statistically significant (F(15,3490) = 1.2, p = 0.25).
Log Determinants
Place where skulls were found   Rank   Log Determinant
Sikkem or Tibet                 5      16.164
Lhasa                           5      15.773
Pooled within-groups            5      16.727
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results
Box's M            22.371
F      Approx.     1.218
       df1         15
       df2         3489.901
       Sig.        .249
Tests null hypothesis of equal population covariance matrices.
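Box's M can be recovered from the Log Determinants table. The sketch below applies the usual F approximation to Box's M using only the printed log determinants and the group sizes (17 and 15); the function name `box_m_from_log_dets` is hypothetical, not an SPSS routine, and the formula shown is the branch of the approximation that applies when c2 > c1².

```python
from scipy.stats import f as f_dist


def box_m_from_log_dets(log_dets, log_det_pooled, sizes, p):
    """Box's M and its F approximation from natural-log covariance determinants.

    log_dets       -- ln|S_i| for each group covariance matrix
    log_det_pooled -- ln|S| for the pooled within-groups covariance matrix
    sizes          -- group sample sizes n_i
    p              -- number of variables
    """
    g = len(sizes)
    n_total = sum(sizes)
    # M = (N - g) ln|S| - sum_i (n_i - 1) ln|S_i|
    m = (n_total - g) * log_det_pooled - sum(
        (n - 1) * ld for n, ld in zip(sizes, log_dets)
    )
    inv = sum(1.0 / (n - 1) for n in sizes) - 1.0 / (n_total - g)
    inv2 = sum(1.0 / (n - 1) ** 2 for n in sizes) - 1.0 / (n_total - g) ** 2
    c1 = inv * (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1))
    c2 = inv2 * (p - 1) * (p + 2) / (6 * (g - 1))
    if c2 <= c1**2:
        raise ValueError("the other form of the F approximation is needed when c2 <= c1**2")
    df1 = (g - 1) * p * (p + 1) / 2
    df2 = (df1 + 2) / (c2 - c1**2)
    b = df1 / (1 - c1 - df1 / df2)
    f_stat = m / b
    return m, f_stat, df1, df2, f_dist.sf(f_stat, df1, df2)


m, f_stat, df1, df2, p_value = box_m_from_log_dets(
    log_dets=[16.164, 15.773],  # Sikkem or Tibet, Lhasa
    log_det_pooled=16.727,      # pooled within-groups
    sizes=[17, 15],
    p=5,
)
print(f"M = {m:.3f}, F({df1:.0f}, {df2:.1f}) = {f_stat:.3f}, p = {p_value:.3f}")
```

Because the log determinants are rounded to three decimals, M comes out near 22.36 rather than the printed 22.371; F, df2 and the p-value should land close to the Test Results table.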
It appears that the equality of covariance matrices assumption needed for Fisher’s linear discriminant approach to be strictly correct is valid here.
In practice, Box’s test is not of great use, since even if it suggests a departure from the equality hypothesis, the linear discriminant function may still be preferable to a quadratic one. Here we shall simply assume normality for our data, relying on the robustness of Fisher’s approach to deal with any minor departures from the assumption.
The Eigenvalues table shows the eigenvalue of the discriminant function (here 0.93), which is the ratio of the between-group sum of squares to the within-group sum of squares of the discriminant scores. It is this criterion that is maximized in discriminant function analysis.
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .930(a)      100.0           100.0          .694
a. First 1 canonical discriminant functions were used in the analysis.
The canonical correlation is simply the Pearson correlation between the discriminant function scores and group membership coded as 0 and 1. For the skull data, the canonical correlation is 0.694, so that 0.694² × 100 = 48% of the variance in the discriminant function scores can be explained by group differences.
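With a single discriminant function, the eigenvalue and the canonical correlation r are linked by eigenvalue = r² / (1 - r²). The short sketch below recovers the printed canonical correlation and the 48% figure from the eigenvalue alone.

```python
import math

eigenvalue = 0.930  # Eigenvalues table

# eigenvalue = r^2 / (1 - r^2)  =>  r^2 = eigenvalue / (1 + eigenvalue)
r_squared = eigenvalue / (1 + eigenvalue)
canonical_r = math.sqrt(r_squared)

print(f"canonical correlation = {canonical_r:.3f}")  # 0.694
print(f"variance explained = {100 * r_squared:.0f}%")  # 48%
```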
Wilks' Lambda provides a test of the null hypothesis that, in the population, the vectors of means of the five measurements are the same in the two groups. The lambda coefficient is defined as the proportion of the total variance in the discriminant scores not explained by differences among the groups, here 51.8%. The formal test confirms that the sets of five mean skull measurements differ significantly between the two sites (χ²(5) = 18.1, p = 0.003). If the equality of mean vectors hypothesis had not been rejected, there would be little point in carrying out a linear discriminant function analysis.
Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .518            18.083       5    .003
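For one discriminant function, Wilks' lambda equals 1 / (1 + eigenvalue). The sketch below recomputes it and, assuming the chi-square shown is Bartlett's approximation with p(g - 1) degrees of freedom, reproduces the test statistic from n = 32 skulls, p = 5 measurements and g = 2 groups.

```python
import math

from scipy.stats import chi2

eigenvalue = 0.930   # Eigenvalues table
n, p, g = 32, 5, 2   # skulls, measurements, groups

# One discriminant function: Wilks' lambda = 1 / (1 + eigenvalue)
wilks = 1 / (1 + eigenvalue)

# Bartlett's chi-square approximation: -(n - 1 - (p + g)/2) * ln(lambda)
chi_square = -(n - 1 - (p + g) / 2) * math.log(wilks)
df = p * (g - 1)
p_value = chi2.sf(chi_square, df)

print(f"lambda = {wilks:.3f}, chi-square({df}) = {chi_square:.2f}, p = {p_value:.4f}")
```

Note that 1 - 0.518 = 0.482 is the squared canonical correlation, so lambda and the 48% figure are two views of the same quantity.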
Next we come to the Classification Function Coefficients. This table is displayed as a result of checking Fisher’s in the Statistics sub-dialogue box.
Classification Function Coefficients
                                                        Place where skulls were found
                                                        Sikkem or Tibet    Lhasa
Greatest length of skull                                   1.468            1.558
Greatest horizontal breadth of skull                       2.361            2.205
Height of skull                                            2.752            2.747
Upper face length                                           .775             .952
Face breadth between outermost points of cheek bones        .195             .372
(Constant)                                              -514.956         -545.419
Fisher's linear discriminant functions
This table can be used to find Fisher’s linear discriminant function simply by subtracting, for each variable, the Lhasa coefficient from the Sikkim or Tibet coefficient, giving the following result:
Measurement                                                      Sikkim or Tibet   Lhasa   Difference
Greatest length of skull (measure 1)                             1.468             1.558   -0.090
Greatest horizontal breadth of skull (measure 2)                 2.361             2.205    0.156
Height of skull (measure 3)                                      2.752             2.747    0.005
Upper face length (measure 4)                                    0.775             0.952   -0.177
Face breadth between outermost points of cheekbones (measure 5)  0.195             0.372   -0.177
Z = -0.090 meas1 + 0.156 meas2 + 0.005 meas3 - 0.177 meas4 - 0.177 meas5
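The difference between the constant coefficients (-514.956 and -545.419, bottom row of the Classification Function Coefficients table) gives the cut-off for the discriminant function scores (with equal prior probabilities, the average of the two group mean scores):
z̄ = -545.419 - (-514.956) = -30.463
A skull whose score Z exceeds this value is classified as Sikkim or Tibet (type A); otherwise it is classified as Lhasa (type B).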
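The subtraction and the cut-off can be checked numerically. The sketch below, using only the printed coefficients, builds Z as the difference of the two classification functions and classifies a skull by the larger classification score; because f_A(x) - f_B(x) = Z(x) + const_A - const_B, this is the same as comparing Z with const_B - const_A. The group mean vectors from the Group Statistics table stand in as test skulls; the helper names are illustrative only.

```python
import numpy as np

# Classification function coefficients (meas1..meas5) and constants,
# copied from the Classification Function Coefficients table
coef = {
    "Sikkim or Tibet": np.array([1.468, 2.361, 2.752, 0.775, 0.195]),
    "Lhasa": np.array([1.558, 2.205, 2.747, 0.952, 0.372]),
}
const = {"Sikkim or Tibet": -514.956, "Lhasa": -545.419}

# Fisher's linear discriminant function Z: Sikkim/Tibet minus Lhasa coefficients
z_coef = coef["Sikkim or Tibet"] - coef["Lhasa"]

# f_A(x) - f_B(x) = Z(x) + const_A - const_B, so "f_A > f_B" means "Z > const_B - const_A"
cutoff = const["Lhasa"] - const["Sikkim or Tibet"]


def classification_scores(x):
    """Classification function value of a skull (meas1..meas5, mm) for each group."""
    return {group: float(coef[group] @ x + const[group]) for group in coef}


def classify(x):
    """Assign a skull to the group with the larger classification score."""
    scores = classification_scores(x)
    return max(scores, key=scores.get)


def z_score(x):
    return float(z_coef @ x)


# Group mean vectors from the Group Statistics table, used as test skulls
tibet_mean = np.array([174.824, 139.353, 132.000, 69.824, 130.353])
lhasa_mean = np.array([185.733, 138.733, 134.767, 76.467, 137.500])

print(np.round(z_coef, 3), round(cutoff, 3))
print(classify(tibet_mean), classify(lhasa_mean))
```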
The coefficients defining Fisher’s linear discriminant function Z are proportional to the unstandardised coefficients given in the “Canonical Discriminant Function Coefficients” table, which is produced when Unstandardised is checked in the Statistics sub-dialogue box.
Canonical Discriminant Function Coefficients
                                                        Function 1
Greatest length of skull                                   .048
Greatest horizontal breadth of skull                      -.083
Height of skull                                           -.003
Upper face length                                          .095
Face breadth between outermost points of cheek bones       .095
(Constant)                                              -16.222
Unstandardized coefficients
The discriminant scores obtained from these unstandardized coefficients can be compared with the average of the two group means (shown in the Functions at Group Centroids table) to allocate skulls to groups. Here the threshold against which a skull’s discriminant score is evaluated is
0.0585 = ½ × (-0.877 + 0.994)
Functions at Group Centroids
Place where skulls were found   Function 1
Sikkem or Tibet                 -.877
Lhasa                            .994
Unstandardized canonical discriminant functions evaluated at group means
Thus new skulls with discriminant scores above 0.0585 would be assigned to the Lhasa site (type B); otherwise, they would be classified as Sikkim/Tibet (type A).
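The allocation rule can be written as a few lines of code. This sketch scores a skull with the printed unstandardized coefficients and compares the score with the midpoint of the two centroids; the function names are hypothetical, and the group mean vectors from the Group Statistics table again serve as test skulls.

```python
import numpy as np

# Unstandardized canonical coefficients (meas1..meas5) and constant
b = np.array([0.048, -0.083, -0.003, 0.095, 0.095])
b0 = -16.222

# Functions at Group Centroids: Sikkim or Tibet, Lhasa
centroid_a, centroid_b = -0.877, 0.994
threshold = (centroid_a + centroid_b) / 2


def canonical_score(x):
    return float(b0 + b @ x)


def assign(x):
    """Lhasa (type B) above the threshold, Sikkim or Tibet (type A) otherwise."""
    return "Lhasa" if canonical_score(x) > threshold else "Sikkim or Tibet"


tibet_mean = np.array([174.824, 139.353, 132.000, 69.824, 130.353])
lhasa_mean = np.array([185.733, 138.733, 134.767, 76.467, 137.500])
print(threshold, assign(tibet_mean), assign(lhasa_mean))
```

Scores computed from the three-decimal coefficients (about -0.78 and 1.10 at the group means) differ slightly from the printed centroids because of rounding, but the allocation is unaffected.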
When variables are measured on different scales, the magnitude of an unstandardised coefficient provides little indication of the relative contribution of the variable to the overall discrimination. The “Standardized Canonical Discriminant Function Coefficients” listed attempt to overcome this problem by rescaling the variables to unit standard deviation.
Standardized Canonical Discriminant Function Coefficients
                                                        Function 1
Greatest length of skull                                   .367
Greatest horizontal breadth of skull                      -.578
Height of skull                                           -.017
Upper face length                                          .405
Face breadth between outermost points of cheek bones       .627
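One way to see where the standardized coefficients come from: assuming the rescaling uses the pooled within-group standard deviations, each standardized coefficient is the unstandardized coefficient times that variable's pooled within-group SD. The sketch below pools the diagonals of the two Covariance Matrices by their degrees of freedom and reproduces the printed values to rounding.

```python
import numpy as np

# Within-group variances (diagonals of the Covariance Matrices table), meas1..meas5
var_tibet = np.array([45.529, 57.805, 36.094, 20.936, 66.211])
var_lhasa = np.array([74.424, 37.352, 36.317, 15.302, 17.964])
n_tibet, n_lhasa = 17, 15

# Pooled within-group variance: weight each group by its degrees of freedom
pooled_var = ((n_tibet - 1) * var_tibet + (n_lhasa - 1) * var_lhasa) / (
    n_tibet + n_lhasa - 2
)

unstandardized = np.array([0.048, -0.083, -0.003, 0.095, 0.095])
standardized = unstandardized * np.sqrt(pooled_var)
print(np.round(standardized, 3))
```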
For our data, such standardisation is not strictly necessary, since all skull measurements were in millimetres. Standardisation should not matter much in any case, since the within-group standard deviations were similar across the different skull measures. According to the standardized coefficients, skull height (meas3) seems to contribute little to discriminating between the two types of skulls.
A question of some importance about a discriminant function is: how well does it perform? One possible method of evaluating performance is to apply the derived classification rule to the data set and calculate the misclassification rate.
Repeat the analysis using the following classification options.
Now proceed to complete the analysis.
This is known as the re-substitution estimate, and the corresponding results are shown in the Original part of the Classification Results table. According to this estimate, 81.3% ((17×82.4 + 15×80)/(17 + 15)) of skulls can be correctly classified as type A or type B on the basis of the discriminant rule.
Classification Results(b,c)
                                                              Predicted Group Membership
                            Place where skulls were found     Sikkem or Tibet   Lhasa   Total
Original            Count   Sikkem or Tibet                   14                3       17
                            Lhasa                             3                 12      15
                    %       Sikkem or Tibet                   82.4              17.6    100.0
                            Lhasa                             20.0              80.0    100.0
Cross-validated(a)  Count   Sikkem or Tibet                   12                5       17
                            Lhasa                             6                 9       15
                    %       Sikkem or Tibet                   70.6              29.4    100.0
                            Lhasa                             40.0              60.0    100.0
a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
b. 81.3% of original grouped cases correctly classified.
c. 65.6% of cross-validated grouped cases correctly classified.
However, estimating misclassification rates in this way is known to be overly optimistic, and several alternatives for estimating misclassification rates in discriminant analysis have been suggested. One of the most commonly used is the so-called leave-one-out method, in which the discriminant function is first derived from only n - 1 sample members and then used to classify the observation left out. The procedure is repeated n times, each time omitting a different observation.
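The difference between re-substitution and leave-one-out can be sketched in a few lines. This is a generic illustration, not SPSS's internal code: it fits a linear discriminant rule with a pooled within-group covariance matrix and equal prior probabilities (for two groups, the rule Fisher's linear discriminant function gives), then scores it both ways. The data are hypothetical random numbers, since the 32 raw skull measurements are not reproduced here.

```python
import numpy as np


def fit_lda(X, y):
    """Linear discriminant rule: group means plus inverse pooled within-group covariance."""
    groups = np.unique(y)
    means = {k: X[y == k].mean(axis=0) for k in groups}
    scatter = sum((X[y == k] - means[k]).T @ (X[y == k] - means[k]) for k in groups)
    pooled_inv = np.linalg.inv(scatter / (len(y) - len(groups)))
    return groups, means, pooled_inv


def predict_lda(model, x):
    """Equal priors: pick the group maximising m' S^-1 x - m' S^-1 m / 2."""
    groups, means, s_inv = model
    scores = [means[k] @ s_inv @ x - 0.5 * means[k] @ s_inv @ means[k] for k in groups]
    return groups[int(np.argmax(scores))]


def resubstitution_accuracy(X, y):
    model = fit_lda(X, y)  # rule derived from all n cases and tested on the same cases
    return float(np.mean([predict_lda(model, x) == t for x, t in zip(X, y)]))


def leave_one_out_accuracy(X, y):
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit_lda(X[keep], y[keep])              # derive the rule from n - 1 cases
        hits += int(predict_lda(model, X[i]) == y[i])  # classify the omitted case
    return hits / len(y)


# Hypothetical data: two groups of 20 cases on 3 variables (not the skull data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(3.0, 1.0, (20, 3))])
y = np.repeat([0, 1], 20)
print(resubstitution_accuracy(X, y), leave_one_out_accuracy(X, y))
```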
The Cross-validated part of the Classification Results table shows the results from applying this procedure. The correct classification rate now drops to 65.6% ((17×70.6+15×60)/(15+17)), a considerably lower success rate than suggested by the simple re-substitution rule.
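Both hit rates can be read straight off the count rows of the Classification Results table: the correctly classified cases lie on the diagonal. The weighted-average formula used in the text gives the same number, because each row percentage times the row size recovers the diagonal count. A minimal sketch:

```python
import numpy as np

# Rows: actual group (Sikkem or Tibet, Lhasa); columns: predicted group
original = np.array([[14, 3], [3, 12]])
cross_validated = np.array([[12, 5], [6, 9]])


def percent_correct(counts):
    """Correctly classified cases (the diagonal) as a percentage of all cases."""
    return 100.0 * np.trace(counts) / counts.sum()


print(f"re-substitution: {percent_correct(original):.3f}%")
print(f"leave-one-out:   {percent_correct(cross_validated):.3f}%")
```

The exact values are 26/32 = 81.25% and 21/32 = 65.625%, which SPSS reports as 81.3% and 65.6%.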
We now turn to applying cluster analysis to the skull data. Here the prior classification of the skulls will be ignored and the data simply “explored” to see whether there is any evidence of interesting “natural” groupings of the skulls and, if there is, whether these groups correspond in any way with Morant’s classification.
Here we will use two hierarchical agglomerative clustering procedures, complete linkage and average linkage clustering, followed by k-means clustering.
Select Analyze > Classify > Hierarchical Cluster.
In the usual way, select the variables of interest.
Select the plots desired.
Select the desired method.
Now proceed to complete the analysis.
The complete linkage clustering output shows which skulls or clusters are combined at each stage of the cluster procedure.
First, skull 8 is joined with skull 13, since the Euclidean distance between these two skulls is smaller than the distance between any other pair of skulls. The distance is shown in the column labelled “Coefficients”.
Agglomeration Schedule
        Cluster Combined                        Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients    Cluster 1   Cluster 2   Next Stage
1       8           13           3.041          0           0           4
2       15          17           5.385          0           0           14
3       9           23           5.701          0           0           11
4       8           19           5.979          1           0           8
5       24          28           6.819          0           0           17
6       21          22           6.910          0           0           21
7       16          29           7.211          0           0           15
8       7           8            8.703          0           4           13
9       2           3            8.874          0           0           14
10      27          30           9.247          0           0           23
11      5           9            9.579          0           3           13
12      18          32           9.874          0           0           18
13      5           7           10.700          11          8           24
14      2           15          11.522          9           2           28
15      6           16          12.104          0           7           22
16      14          25          12.339          0           0           21
17      24          31          13.528          5           0           23
18      11          18          13.537          0           12          22
19      1           20          13.802          0           0           26
20      4           10          14.062          0           0           28
21      14          21          15.588          16          6           25
22      6           11          16.302          15          18          24
23      24          27          18.554          17          10          27
24      5           6           18.828          13          22          29
25      12          14          20.700          0           21          30
26      1           26          24.597          19          0           27
27      1           24          25.269          26          23          30
28      2           4           25.880          14          20          29
29      2           5           26.930          28          24          31
30      1           12          36.342          27          25          31
31      1           2           48.816          30          29          0
Second, skull 15 is joined with skull 17, and so on.
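The same agglomerative procedure is available outside SPSS. The sketch below runs complete linkage on Euclidean distances with SciPy's `linkage` on a hypothetical five-case toy data set (not the skull data) and prints one line per stage, with the fusion distance playing the role of the “Coefficients” column. Note that SciPy numbers cases from 0 and gives each newly formed cluster a new number, whereas SPSS labels a merged cluster by its lowest case number.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Hypothetical toy data: 5 cases on 2 variables (not the skull measurements)
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.5], [10.0, 0.0]])

# Complete linkage on Euclidean distances
Z = linkage(X, method="complete", metric="euclidean")

# Each row of Z is one stage: [cluster i, cluster j, fusion distance, new cluster size].
# Cases are numbered 0..n-1; the cluster formed at stage s gets the number n + s - 1.
for stage, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"stage {stage}: join {int(i)} and {int(j)} at {dist:.3f} (size {int(size)})")
```

As in the SPSS schedule, the first stage joins the closest pair, and with complete linkage the fusion distances never decrease from one stage to the next.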
The dendrogram is simpler to interpret (see next slide).
The dendrogram may, on occasion, also be useful in deciding the number of clusters in a data set: a sudden increase in the size of the difference between adjacent steps is taken as an informal indication of the appropriate number of clusters to consider.
A fairly large jump occurs between stages 29 and 30 (indicating a three-group solution), and an even bigger one between this penultimate fusion and the ultimate fusion of groups (a two-group solution).
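The “jump” heuristic is simple arithmetic on the Coefficients column of the Agglomeration Schedule. The sketch below takes the fusion distances of the last five stages, computes the increase at each stage, and reports how many clusters existed just before that fusion (32 skulls, so 33 - s clusters remain before stage s).

```python
import numpy as np

# Fusion coefficients for stages 27-31 of the complete-linkage Agglomeration Schedule
stages = np.array([27, 28, 29, 30, 31])
coefficients = np.array([25.269, 25.880, 26.930, 36.342, 48.816])

jumps = np.diff(coefficients)  # increase at stages 28, 29, 30, 31
for stage, jump in zip(stages[1:], jumps):
    clusters_before = 33 - stage  # 32 cases; stage s starts from 33 - s clusters
    print(f"stage {stage}: jump {jump:.3f}, {clusters_before} clusters before this fusion")
```

The two large jumps (about 9.4 at stage 30 and 12.5 at stage 31) are the ones that point to the three-group and two-group solutions.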
For an alternative approach, use average linkage.
Now proceed to produce the plot.
The initial steps agree with the complete linkage solution, but eventually the trees diverge, with the average linkage dendrogram successively adding small clusters to one increasingly large cluster. For the average linkage dendrogram (see next slide), it is not clear where to cut the dendrogram to give a specific number of groups.
Since we believe there are two groups, a final k-means cluster analysis employing this information may be attempted.
The variable selection and number of clusters are shown.
In the resulting cluster output, the Initial Cluster Centers table displays the starting values used by the algorithm.
Initial Cluster Centers
                                                        Cluster 1   Cluster 2
Greatest length of skull                                 200.0       167.0
Greatest horizontal breadth of skull                     139.5       130.0
Height of skull                                          143.5       125.5
Upper face length                                         82.5        69.5
Face breadth between outermost points of cheek bones     146.0       119.5
The Iteration History table indicates that the algorithm has converged.
Iteration History(a)
             Change in Cluster Centers
Iteration    Cluster 1    Cluster 2
1            16.626       16.262
2            .000         .000
a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 2. The minimum distance between initial centers is 48.729.
The Final Cluster Centers table describes the final cluster solution.
Final Cluster Centers
                                                        Cluster 1   Cluster 2
Greatest length of skull                                 188.4       174.1
Greatest horizontal breadth of skull                     141.3       137.6
Height of skull                                          135.8       131.6
Upper face length                                         77.6        69.7
Face breadth between outermost points of cheek bones     138.5       130.4
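The iterations summarized above follow the usual k-means pattern: assign every case to its nearest centre, move each centre to the mean of its cases, and stop when no centre moves. The sketch below is plain Lloyd-style k-means started from user-supplied centres; it is an illustration, not SPSS's exact procedure (SPSS chooses its own initial centres and its update details may differ). The function name and the toy data are hypothetical, and it assumes no cluster ever becomes empty.

```python
import numpy as np


def kmeans(X, initial_centers, max_iter=100, tol=0.0):
    """Lloyd-style k-means started from user-supplied centers.

    Returns the final centers, 0-based cluster labels and, for each iteration,
    the distance each center moved (compare the Iteration History table).
    Assumes no cluster ever becomes empty.
    """
    centers = np.array(initial_centers, dtype=float)
    history = []
    for _ in range(max_iter):
        # Assign each case to its nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the cases assigned to it
        new_centers = np.array([X[labels == k].mean(axis=0) for k in range(len(centers))])
        moved = np.linalg.norm(new_centers - centers, axis=1)
        history.append(moved)
        centers = new_centers
        if moved.max() <= tol:  # converged: no center moved
            break
    return centers, labels, history


# Hypothetical toy data: two well-separated groups of 15 cases on 2 variables
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (15, 2)), rng.normal(6.0, 1.0, (15, 2))])
centers, labels, history = kmeans(X, initial_centers=X[[0, 29]])
print(np.round(centers, 2), np.bincount(labels), len(history))
```

As in the Iteration History table, the last entry of `history` is zero for every centre once the cluster memberships stop changing.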
The Number of Cases in each Cluster table gives the size of each cluster in the final solution.
Number of Cases in each Cluster
Cluster   1      13.000
          2      19.000
Valid            32.000
Missing            .000
How does the k-means two-group solution compare with the original classification of the skulls into types A and B?
We can investigate this by first using the Save button on the k-Means Cluster Analysis dialogue box to save cluster membership for each skull in the Data View spreadsheet.
The new categorical variable now available (labelled QCL_1) can be cross-tabulated with assumed skull type (variable place). The resulting table shows that the k-means clusters largely agree with the skull types as originally suggested by Morant, with cluster 1 consisting primarily of Type B skulls (those from Lhasa) and cluster 2 containing mostly skulls of Type A (from Sikkim and the neighbouring area of Tibet). Only six skulls are wrongly placed.
Cluster Number of Case by Assumed type of skull (counts)
                           Assumed type of skull
Cluster Number of Case     A       B
1                          2       11
2                          15      4