Discriminant Analysis

1

Discriminant Analysis

Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups.

Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA).

Wednesday 19 April 2023 04:37 PM

https://www.staff.ncl.ac.uk/mike.cox/III/spss1.ppt

2


For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide to (1) go to college, (2) attend a trade or professional school, or (3) seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.

3


For example, a medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.

4


The data examined here consist of five measurements on each of 32 skulls found in the southwestern and eastern districts of Tibet.

1. Greatest length of skull (measure 1)
2. Greatest horizontal breadth of skull (measure 2)
3. Height of skull (measure 3)
4. Upper face length (measure 4)
5. Face breadth between outermost points of cheekbones (measure 5)

There are also location and grouping variables.

5

This work is loosely based on A Handbook of Statistical Analyses Using SPSS (Sabine Landau and Brian S. Everitt, Chapman & Hall/CRC, 2003) and Handbook of Statistical Analyses Using Stata, Fourth Edition (Brian S. Everitt and Sophia Rabe-Hesketh, CRC, 2006).

These data, collected by Colonel L.A. Waddell, were first reported in Morant (1923) and are also given in Hand et al. (1994).

Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. 1994 A Handbook of Small Data Sets. London: Chapman & Hall.

Morant, G.M. 1923 A first study of the Tibetan skull. Biometrika 14, 193-260.

https://www.crcpress.com/A-Handbook-of-Statistical-Analyses-Using-SPSS/Landau-Everitt/9781584883692

https://www.crcpress.com/Handbook-of-Statistical-Analyses-Using-Stata-Fourth-Edition/Everitt-RabeHesketh/9781584887560

6


The data can be divided into two groups. The first comprises skulls 1 to 17 found in graves in Sikkim and the neighbouring area of Tibet (Type A skulls). The remaining 15 skulls (Type B skulls) were picked up on a battlefield in the Lhasa district and are believed to be those of native soldiers from the eastern province of Khams. These skulls were of particular interest since it was thought at the time that Tibetans from Khams might be survivors of a particular human type, unrelated to the Mongolian and Indian types that surrounded them.

7


There are two questions that might be of interest for these data:

Do the five measurements discriminate between the two assumed groups of skulls and can they be used to produce a useful rule for classifying other skulls that might become available?

Taking the 32 skulls together, are there any natural groupings in the data and, if so, do they correspond to the groups assumed?

8


Classification is an important component of virtually all scientific research. Statistical techniques concerned with classification are essentially of two types. The first (cluster analysis) aims to uncover groups of observations from initially unclassified data. The second (discriminant analysis) works with data that is already classified into groups to derive rules for classifying new (and as yet unclassified) individuals on the basis of their observed variable values.

https://www.staff.ncl.ac.uk/mike.cox/III/spss4.ppt

9


Initially it is wise to take a look at your raw data.

10


Select matrix scatter

Use Define to select.

11


Select matrix variables and markers.

Note that greatest length of skull is above the list shown.

Use OK to accept.

12

While this diagram only allows us to assess the group separation in two dimensions, it seems to suggest that face breadth between outermost points of cheekbones (meas5), greatest length of skull (meas1), and upper face length (meas4) provide the greatest discrimination between the two skull types.

13


We shall now use Fisher’s linear discriminant function to derive a classification rule for assigning skulls to one of the two predefined groups on the basis of the five measurements available.

14


Now proceed to complete the analysis.

15


As before use the secondary screens to select the grouping variable (place) and use Define Range.

16


From the statistics button make the following selection


17


Select the independents, use OK to run.

18


The Group Statistics table gives the resulting descriptive output. It displays the means and standard deviations of each of the five measurements for each type of skull, and overall (total).

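Group Statistics

Place where skulls were found   Measurement                                             Mean      Std. Deviation   N (unweighted)   N (weighted)
Sikkim or Tibet                 Greatest length of skull                                174.824   6.7475           17               17.000
Sikkim or Tibet                 Greatest horizontal breadth of skull                    139.353   7.6030           17               17.000
Sikkim or Tibet                 Height of skull                                         132.000   6.0078           17               17.000
Sikkim or Tibet                 Upper face length                                        69.824   4.5756           17               17.000
Sikkim or Tibet                 Face breadth between outermost points of cheekbones     130.353   8.1370           17               17.000
Lhasa                           Greatest length of skull                                185.733   8.6269           15               15.000
Lhasa                           Greatest horizontal breadth of skull                    138.733   6.1117           15               15.000
Lhasa                           Height of skull                                         134.767   6.0263           15               15.000
Lhasa                           Upper face length                                        76.467   3.9118           15               15.000
Lhasa                           Face breadth between outermost points of cheekbones     137.500   4.2384           15               15.000
Total                           Greatest length of skull                                179.938   9.3651           32               32.000
Total                           Greatest horizontal breadth of skull                    139.063   6.8412           32               32.000
Total                           Height of skull                                         133.297   6.0826           32               32.000
Total                           Upper face length                                        72.938   5.3908           32               32.000
Total                           Face breadth between outermost points of cheekbones     133.703   7.4443           32               32.000

N is the Valid N (listwise).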

19

The within-group covariance matrices shown in the Covariance Matrices table suggest that the sample values differ to some extent. Whether these differences matter is assessed by Box’s test for equality of covariance matrices (see the Log Determinants and Test Results tables, below).

Covariance Matrices

Columns meas1 to meas5 follow the same order as the rows.

Sikkim or Tibet
Measurement                                             meas1     meas2     meas3     meas4     meas5
Greatest length of skull                                45.529    25.222    12.391    22.154    27.972
Greatest horizontal breadth of skull                    25.222    57.805    11.875     7.519    48.055
Height of skull                                         12.391    11.875    36.094     -.313     1.406
Upper face length                                       22.154     7.519     -.313    20.936    16.769
Face breadth between outermost points of cheekbones     27.972    48.055     1.406    16.769    66.211

Lhasa
Measurement                                             meas1     meas2     meas3     meas4     meas5
Greatest length of skull                                74.424    -9.523    22.737    17.794    11.125
Greatest horizontal breadth of skull                    -9.523    37.352   -11.263      .705     9.464
Height of skull                                         22.737   -11.263    36.317    10.724     7.196
Upper face length                                       17.794      .705    10.724    15.302     8.661
Face breadth between outermost points of cheekbones     11.125     9.464     7.196     8.661    17.964

20


The within-group covariance matrices shown in the Covariance Matrices table suggest that the sample values differ to some extent, but according to Box’s test for equality of covariances (tables Log Determinants and Test Results) these differences are not statistically significant (F(15,3490) = 1.2, p = 0.25).

Log Determinants

Place where skulls were found   Rank   Log Determinant
Sikkim or Tibet                 5      16.164
Lhasa                           5      15.773
Pooled within-groups            5      16.727

The ranks and natural logarithms of determinants printed are those of the group covariance matrices.

Test Results

Box's M        22.371
F   Approx.     1.218
    df1        15
    df2      3489.901
    Sig.         .249

Tests null hypothesis of equal population covariance matrices.
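As a rough check on the output above, Box's M can be recomputed from the printed log determinants. The sketch below assumes the usual definition M = (N - g) ln|S_pooled| - sum of (n_i - 1) ln|S_i|; the function name `box_m` is illustrative and the inputs are the rounded values from the Log Determinants table, so the result only approximates the reported 22.371.

```python
# Values transcribed from the Log Determinants table; group sizes from Group Statistics.
log_det = {"Sikkim or Tibet": 16.164, "Lhasa": 15.773}
log_det_pooled = 16.727
n = {"Sikkim or Tibet": 17, "Lhasa": 15}


def box_m(log_det, log_det_pooled, n):
    """Box's M = (N - g) ln|S_pooled| - sum_i (n_i - 1) ln|S_i|."""
    total_df = sum(n.values()) - len(n)          # N - g = 30
    within = sum((n[g] - 1) * log_det[g] for g in n)
    return total_df * log_det_pooled - within


m = box_m(log_det, log_det_pooled, n)
print(f"Box's M from rounded log determinants: {m:.3f}")
```

The small discrepancy from the SPSS value comes from the three-decimal rounding of the log determinants.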

21


It appears that the equality of covariance matrices assumption needed for Fisher’s linear discriminant approach to be strictly correct is reasonable here.

In practice, Box’s test is not of great use since even if it suggests a departure from the equality hypothesis, the linear discriminant may still be preferable over a quadratic function. Here we shall simply assume normality for our data, relying on the robustness of Fisher’s approach to deal with any minor departure from the assumption.

22


In the resulting discriminant analysis output, the eigenvalue (here 0.93) represents the ratio of the between-group sum of squares to the within-group sum of squares of the discriminant scores. It is this criterion that is maximized in discriminant function analysis.

Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .930 (a)     100.0           100.0          .694

a. First 1 canonical discriminant functions were used in the analysis.

23


The canonical correlation is simply the Pearson correlation between the discriminant function scores and group membership coded as 0 and 1. For the skull data, the canonical correlation value is 0.694, so that 0.694² × 100 ≈ 48% of the variance in the discriminant function scores can be explained by group differences.
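The link between the eigenvalue and the canonical correlation can be checked numerically. This is a minimal sketch assuming the standard relation r² = eigenvalue / (1 + eigenvalue) for a single discriminant function; the variable names are illustrative.

```python
import math

eigenvalue = 0.930  # from the Eigenvalues table

# For one discriminant function: r^2 = eigenvalue / (1 + eigenvalue).
r_squared = eigenvalue / (1 + eigenvalue)
canonical_r = math.sqrt(r_squared)

print(f"canonical correlation = {canonical_r:.3f}")        # 0.694
print(f"variance explained    = {100 * r_squared:.1f}%")   # 48.2%
```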


24

Wilks’ Lambda provides a test for assessing the null hypothesis that in the population the vectors of means of the five measurements are the same in the two groups. The lambda coefficient is defined as the proportion of the total variance in the discriminant scores not explained by differences among the groups, here 51.8%. The formal test confirms that the sets of five mean skull measurements differ significantly between the two sites (χ²(5) = 18.1, p = 0.003). If the equality of mean vectors hypothesis had not been rejected, there would be little point in carrying out a linear discriminant function analysis.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .518            18.083       5    .003
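The Wilks' Lambda row can be approximately reproduced from the eigenvalue. The sketch assumes Λ = 1 / (1 + eigenvalue) for a single function and Bartlett's chi-square approximation, χ² = -(N - 1 - (p + g)/2) ln Λ with p(g - 1) degrees of freedom; the helper `chi2_sf_df5` is an illustrative closed form valid only for 5 degrees of freedom, written so that no statistics library is needed.

```python
import math

eigenvalue = 0.930                      # from the Eigenvalues table (rounded)
n_cases, n_vars, n_groups = 32, 5, 2

wilks_lambda = 1 / (1 + eigenvalue)
chi_square = -(n_cases - 1 - (n_vars + n_groups) / 2) * math.log(wilks_lambda)
df = n_vars * (n_groups - 1)


def chi2_sf_df5(x):
    """Upper-tail probability of a chi-square variable with exactly 5 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


p_value = chi2_sf_df5(chi_square)
print(f"Lambda = {wilks_lambda:.3f}, chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}")
```

Because the eigenvalue is rounded to three decimals, the chi-square value agrees with the reported 18.083 only to about two decimals.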

25


Next we come to the Classification Function Coefficients. This table is displayed as a result of checking Fisher’s in the Statistics sub-dialogue box.

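Classification Function Coefficients

                                                        Place where skulls were found
Measurement                                             Sikkim or Tibet   Lhasa
Greatest length of skull                                   1.468             1.558
Greatest horizontal breadth of skull                       2.361             2.205
Height of skull                                            2.752             2.747
Upper face length                                           .775              .952
Face breadth between outermost points of cheekbones         .195              .372
(Constant)                                              -514.956          -545.419

Fisher's linear discriminant functions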

26


It can be used to find Fisher’s linear discriminant function by simply subtracting the coefficients given for each variable in each group, giving the following result:

Measurement                                                       Sikkim or Tibet   Lhasa   Difference
Greatest length of skull (measure 1)                              1.468             1.558   -0.090
Greatest horizontal breadth of skull (measure 2)                  2.361             2.205    0.156
Height of skull (measure 3)                                       2.752             2.747    0.005
Upper face length (measure 4)                                     0.775             0.952   -0.177
Face breadth between outermost points of cheekbones (measure 5)   0.195             0.372   -0.177

Z = -0.090 meas1 + 0.156 meas2 + 0.005 meas3 - 0.177 meas4 - 0.177 meas5
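The same coefficients can be recovered, up to rounding, from the summary statistics reported earlier. The sketch below assumes the textbook form of Fisher's rule, a = S⁻¹(mean_A - mean_B), where S is the pooled within-group covariance matrix; the helper names `pooled_covariance` and `solve` are illustrative, and the inputs are the three-decimal values from the Group Statistics and Covariance Matrices tables.

```python
# Group means and covariance matrices transcribed from the SPSS tables.
mean_a = [174.824, 139.353, 132.000, 69.824, 130.353]   # Sikkim or Tibet, n = 17
mean_b = [185.733, 138.733, 134.767, 76.467, 137.500]   # Lhasa, n = 15
cov_a = [[45.529, 25.222, 12.391, 22.154, 27.972],
         [25.222, 57.805, 11.875, 7.519, 48.055],
         [12.391, 11.875, 36.094, -0.313, 1.406],
         [22.154, 7.519, -0.313, 20.936, 16.769],
         [27.972, 48.055, 1.406, 16.769, 66.211]]
cov_b = [[74.424, -9.523, 22.737, 17.794, 11.125],
         [-9.523, 37.352, -11.263, 0.705, 9.464],
         [22.737, -11.263, 36.317, 10.724, 7.196],
         [17.794, 0.705, 10.724, 15.302, 8.661],
         [11.125, 9.464, 7.196, 8.661, 17.964]]
n_a, n_b = 17, 15


def pooled_covariance(cov_a, cov_b, n_a, n_b):
    """Weight each group covariance by its degrees of freedom."""
    w_a, w_b = n_a - 1, n_b - 1
    return [[(w_a * x + w_b * y) / (w_a + w_b) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(cov_a, cov_b)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


pooled = pooled_covariance(cov_a, cov_b, n_a, n_b)
mean_diff = [a - b for a, b in zip(mean_a, mean_b)]
coef = solve(pooled, mean_diff)
print([round(c, 3) for c in coef])
```

The printed values should sit close to the Difference column above; exact agreement is not expected because the inputs are rounded.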

27

Z = -0.090 meas1 + 0.156 meas2 + 0.005 meas3 - 0.177 meas4 - 0.177 meas5

The difference between the constant coefficients (-514.956 and -545.419, bottom row of Classification Function Coefficients, previously) provides the sample mean of the discriminant function scores

z̄ = 30.463

28


The coefficients defining Fisher’s linear discriminant function in the equation are proportional to the unstandardised coefficients given in the “Canonical Discriminant Function Coefficients” table which is produced when Unstandardised is checked in the Statistics sub-dialogue box.

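Canonical Discriminant Function Coefficients

Measurement                                             Function 1
Greatest length of skull                                   .048
Greatest horizontal breadth of skull                      -.083
Height of skull                                           -.003
Upper face length                                          .095
Face breadth between outermost points of cheekbones        .095
(Constant)                                              -16.222

Unstandardized coefficients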

29


The discriminant scores computed from these coefficients can be compared with the average of the two group means (shown in the Functions at Group Centroids table) to allocate skulls to groups. Here the threshold against which a skull’s discriminant score is evaluated is

0.0585 = ½ (-0.877 + 0.994)

Functions at Group Centroids

Place where skulls were found   Function 1
Sikkim or Tibet                 -.877
Lhasa                            .994

Unstandardized canonical discriminant functions evaluated at group means

Thus new skulls with discriminant scores above 0.0585 would be assigned to the Lhasa site (type B); otherwise, they would be classified as Sikkim/Tibet (type A).
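The allocation rule can be written out as a short function. This is only a sketch: the coefficients are the three-decimal values from the Canonical Discriminant Function Coefficients table, so scores computed here differ slightly from those SPSS would report, and the names `discriminant_score` and `classify` are illustrative. The two group mean vectors from the Group Statistics table are used as example inputs.

```python
# Unstandardized canonical coefficients (rounded) and group centroids from the tables.
coef = [0.048, -0.083, -0.003, 0.095, 0.095]
constant = -16.222
centroid_a, centroid_b = -0.877, 0.994          # Sikkim or Tibet, Lhasa
threshold = (centroid_a + centroid_b) / 2        # 0.0585


def discriminant_score(measurements):
    """Linear score for one skull given its five measurements (meas1..meas5)."""
    return constant + sum(b * x for b, x in zip(coef, measurements))


def classify(measurements):
    """Assign to Lhasa (type B) when the score exceeds the threshold."""
    if discriminant_score(measurements) > threshold:
        return "Lhasa (type B)"
    return "Sikkim/Tibet (type A)"


mean_a = [174.824, 139.353, 132.000, 69.824, 130.353]   # Sikkim or Tibet group means
mean_b = [185.733, 138.733, 134.767, 76.467, 137.500]   # Lhasa group means
print(classify(mean_a), "|", classify(mean_b))
```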

30


When variables are measured on different scales, the magnitude of an unstandardised coefficient provides little indication of the relative contribution of the variable to the overall discrimination. The “Standardized Canonical Discriminant Function Coefficients” listed attempt to overcome this problem by rescaling the variables to unit standard deviation.

Standardized Canonical Discriminant Function Coefficients

Measurement                                             Function 1
Greatest length of skull                                 .367
Greatest horizontal breadth of skull                    -.578
Height of skull                                         -.017
Upper face length                                        .405
Face breadth between outermost points of cheekbones      .627
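The rescaling can be illustrated by multiplying each unstandardised coefficient by the pooled within-group standard deviation of its variable. This is a sketch under that assumption, using the rounded standard deviations from the Group Statistics table; the helper `pooled_sd` is an illustrative name.

```python
import math

n_a, n_b = 17, 15
# Within-group standard deviations from the Group Statistics table (meas1..meas5).
sd_a = [6.7475, 7.6030, 6.0078, 4.5756, 8.1370]   # Sikkim or Tibet
sd_b = [8.6269, 6.1117, 6.0263, 3.9118, 4.2384]   # Lhasa
unstandardized = [0.048, -0.083, -0.003, 0.095, 0.095]


def pooled_sd(s_a, s_b):
    """Pooled within-group standard deviation for one variable."""
    variance = ((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / (n_a + n_b - 2)
    return math.sqrt(variance)


standardized = [b * pooled_sd(s_a, s_b)
                for b, s_a, s_b in zip(unstandardized, sd_a, sd_b)]
print([round(v, 3) for v in standardized])
```

The results agree with the standardized coefficients in the table to about two decimals; the remaining gap reflects the rounding of the unstandardised coefficients.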

31


For our data, such standardisation is not necessary since all skull measurements were in millimetres. Standardization should, however, not matter much since the within-group standard deviations were similar across different skull measures. According to the standardized coefficients, skull height (meas3) seems to contribute little to discriminating between the two types of skulls.

32


A question of some importance about a discriminant function is: how well does it perform? One possible method of evaluating performance is to apply the derived classification rule to the data set and calculate the misclassification rate.

33


Repeat using the following classification.


34

This is known as the re-substitution estimate and the corresponding results are shown in the Original part of the Classification Results table. According to this estimate, 81.3% ((17×82.4+15×80)/(15+17)) of skulls can be correctly classified as type A or type B on the basis of the discriminant rule.

Classification Results (b, c)

                                                        Predicted Group Membership
                          Place where skulls were found   Sikkim or Tibet   Lhasa   Total
Original              Count   Sikkim or Tibet             14                3       17
                              Lhasa                        3                12      15
                      %       Sikkim or Tibet             82.4              17.6    100.0
                              Lhasa                       20.0              80.0    100.0
Cross-validated (a)   Count   Sikkim or Tibet             12                5       17
                              Lhasa                        6                9       15
                      %       Sikkim or Tibet             70.6              29.4    100.0
                              Lhasa                       40.0              60.0    100.0

a. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.

b. 81.3% of original grouped cases correctly classified.

c. 65.6% of cross-validated grouped cases correctly classified.

35

However, estimating misclassification rates in this way is known to be overly optimistic, and several alternatives for estimating misclassification rates in discriminant analysis have been suggested. One of the most commonly used of these alternatives is the so-called leaving-one-out method, in which the discriminant function is first derived from only n – 1 sample members, and then used to classify the observation left out. The procedure is repeated n times, each time omitting a different observation.
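The leaving-one-out idea can be sketched with a deliberately simple rule. The example below is not the skull analysis: it uses a one-dimensional nearest-mean classifier as a stand-in for the discriminant rule, and the data in `toy` are made-up values chosen only to show how the two estimates can differ.

```python
def nearest_mean_label(x, groups):
    """Assign x to the group whose mean is closest (stand-in for a discriminant rule)."""
    means = {label: sum(vals) / len(vals) for label, vals in groups.items()}
    return min(means, key=lambda label: abs(x - means[label]))


def resubstitution_accuracy(groups):
    """Classify every case with a rule derived from all cases, itself included."""
    total = sum(len(vals) for vals in groups.values())
    hits = sum(nearest_mean_label(x, groups) == label
               for label, vals in groups.items() for x in vals)
    return hits / total


def leave_one_out_accuracy(groups):
    """Classify each case with a rule derived from the other n - 1 cases."""
    total = sum(len(vals) for vals in groups.values())
    hits = 0
    for label, vals in groups.items():
        for i, x in enumerate(vals):
            reduced = dict(groups)
            reduced[label] = vals[:i] + vals[i + 1:]   # omit the case being classified
            hits += nearest_mean_label(x, reduced) == label
    return hits / total


# Hypothetical one-dimensional data, not taken from the skull measurements.
toy = {"A": [1.0, 2.0, 3.0, 5.0], "B": [5.2, 7.0, 8.0, 9.0]}
resub = resubstitution_accuracy(toy)
loo = leave_one_out_accuracy(toy)
print(resub, loo)  # 1.0 0.75
```

In this toy case the two borderline observations are classified correctly only when they help define their own group mean, which is the source of the optimism in the re-substitution estimate.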

36



The Cross-validated part of the Classification Results table shows the results from applying this procedure. The correct classification rate now drops to 65.6% ((17×70.6+15×60)/(15+17)), a considerably lower success rate than suggested by the simple re-substitution rule.
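Both percentages follow directly from the counts in the Classification Results table. A minimal sketch (the function name `percent_correct` is illustrative):

```python
# Rows: actual group (Sikkim or Tibet, Lhasa); columns: predicted group, same order.
original = [[14, 3], [3, 12]]
cross_validated = [[12, 5], [6, 9]]


def percent_correct(table):
    """Percentage of cases on the diagonal of a square classification table."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return 100 * correct / total


resub_pct = percent_correct(original)          # 26 of 32 cases
loo_pct = percent_correct(cross_validated)     # 21 of 32 cases
print(f"{resub_pct:.2f}% vs {loo_pct:.2f}%")
```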

37

We now turn to applying cluster analysis to the skull data. Here the prior classification of the skulls will be ignored and the data simply “explored” to see if there is any evidence of interesting “natural” groupings of the skulls and, if there is, whether these groups correspond in any way with Morant’s classification.

Here we will use two hierarchical agglomerative clustering procedures, complete and average linkage clustering and then k-means clustering.

38

Select Analyze > Classify > Hierarchical Cluster

39

In the usual way select the variables of interest

40

Select the plots desired

41

Select the desired method


42

The complete linkage clustering output shows which skulls or clusters are combined at each stage of the cluster procedure.

43

First, skull 8 is joined with skull 13 since the Euclidean distance between these two skulls is smaller than the distance between any other pair of skulls. The distance is shown in the column labelled “Coefficients”.

Agglomeration Schedule

        Cluster Combined                       Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2         Next Stage
1       8           13           3.041         0           0                 4
2       15          17           5.385         0           0                 14
3       9           23           5.701         0           0                 11
4       8           19           5.979         1           0                 8
5       24          28           6.819         0           0                 17
6       21          22           6.910         0           0                 21
7       16          29           7.211         0           0                 15
8       7           8            8.703         0           4                 13
9       2           3            8.874         0           0                 14
10      27          30           9.247         0           0                 23
11      5           9            9.579         0           3                 13
12      18          32           9.874         0           0                 18
13      5           7           10.700         11          8                 24
14      2           15          11.522         9           2                 28
15      6           16          12.104         0           7                 22
16      14          25          12.339         0           0                 21
17      24          31          13.528         5           0                 23
18      11          18          13.537         0           12                22
19      1           20          13.802         0           0                 26
20      4           10          14.062         0           0                 28
21      14          21          15.588         16          6                 25
22      6           11          16.302         15          18                24
23      24          27          18.554         17          10                27
24      5           6           18.828         13          22                29
25      12          14          20.700         0           21                30
26      1           26          24.597         19          0                 27
27      1           24          25.269         26          23                30
28      2           4           25.880         14          20                29
29      2           5           26.930         28          24                31
30      1           12          36.342         27          25                31
31      1           2           48.816         30          29                0

44

Second, skull 15 is joined with skull 17, and so on.
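The way such a schedule is built can be sketched in a few lines. The code below is a naive illustration of complete linkage (the distance between two clusters is the largest pairwise Euclidean distance), not the SPSS implementation; the five two-dimensional points are made up, since the raw skull measurements are not listed in these slides.

```python
import math


def complete_linkage(points):
    """Return the agglomeration schedule as (cluster_1, cluster_2, distance) tuples."""
    clusters = [[i] for i in range(len(points))]
    schedule = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        schedule.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]   # fuse the closest pair
        del clusters[j]
    return schedule


# Hypothetical points, indexed 0..4.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6.5), (11, 0)]
schedule = complete_linkage(toy_points)
for stage, (c1, c2, d) in enumerate(schedule, start=1):
    print(stage, c1, c2, round(d, 3))
```

As in the SPSS table, the fusion distances increase from stage to stage, and the final stage joins everything into one cluster.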


45

The dendrogram is simpler to interpret (see next slide).

46

47

The dendrogram may, on occasions, also be useful in deciding the number of clusters in a data set, with a sudden increase in the size of the difference in adjacent steps taken as an informal indication of the appropriate number of clusters to consider.

48

A fairly large jump occurs between stages 29 and 30 (indicating a three-group solution) and an even bigger one between this penultimate and the ultimate fusion of groups (a two-group solution).
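The size of these jumps can be read off the Coefficients column of the Agglomeration Schedule. The sketch below lists the two largest increases between adjacent stages and the number of clusters left if the tree is cut just before the later stage; the variable names are illustrative.

```python
# Fusion distances for stages 1..31, transcribed from the Agglomeration Schedule.
coefficients = [3.041, 5.385, 5.701, 5.979, 6.819, 6.910, 7.211, 8.703,
                8.874, 9.247, 9.579, 9.874, 10.700, 11.522, 12.104, 12.339,
                13.528, 13.537, 13.802, 14.062, 15.588, 16.302, 18.554, 18.828,
                20.700, 24.597, 25.269, 25.880, 26.930, 36.342, 48.816]
n_cases = 32

jumps = []
for stage in range(2, len(coefficients) + 1):
    increase = coefficients[stage - 1] - coefficients[stage - 2]
    clusters_if_cut_before = n_cases - (stage - 1)   # clusters remaining before this fusion
    jumps.append((increase, stage, clusters_if_cut_before))

largest = sorted(jumps, reverse=True)[:2]
for increase, stage, k in largest:
    print(f"stage {stage - 1} -> {stage}: +{increase:.3f}, suggests {k} clusters")
```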

49

For an alternative approach, use average linkage as the clustering method.

Now proceed to produce the plot.

50

The initial steps agree with the complete linkage solution, but eventually the trees diverge, with the average linkage dendrogram successively adding small clusters to one increasingly large cluster. For the average linkage dendrogram (see next slide) it is not clear where to cut the dendrogram to give a specific number of groups.

51

52

Since we believe there are two groups, a final cluster analysis employing this information may be attempted.

53

The variable selection and number of clusters are shown.

54

In the resulting cluster output, the Initial Cluster Centers table displays the starting values used by the algorithm.

Initial Cluster Centers

                                                        Cluster
Measurement                                             1       2
Greatest length of skull                                200.0   167.0
Greatest horizontal breadth of skull                    139.5   130.0
Height of skull                                         143.5   125.5
Upper face length                                        82.5    69.5
Face breadth between outermost points of cheekbones     146.0   119.5

55

The Iteration History table indicates that the algorithm has converged.

Iteration History (a)

            Change in Cluster Centers
Iteration   1        2
1           16.626   16.262
2             .000     .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 2. The minimum distance between initial centers is 48.729.
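What the iteration history records can be sketched with a basic k-means loop: assign each case to its nearest centre, recompute the centres, and stop once they no longer move. This is a generic illustration, not the SPSS routine; the six points and starting centres are made up, and the code assumes no cluster ever becomes empty.

```python
import math


def k_means(points, centers, max_iter=10):
    """Basic k-means; returns final centres, cluster members and iterations used."""
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[nearest].append(p)
        # New centre = coordinate-wise mean of the cluster members.
        new_centers = [tuple(sum(coord) / len(members) for coord in zip(*members))
                       for members in clusters]
        change = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if change == 0:          # centres did not move: converged
            break
    return centers, clusters, iteration


# Hypothetical data and starting centres.
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, members, iterations = k_means(toy_points, [(1, 1), (9, 8)])
print(final_centers, [len(m) for m in members], iterations)
```

As in the SPSS output, the centres move in the first iteration and show zero change in the second, at which point the procedure stops.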

56

The Final Cluster Centers table describes the final cluster solution.

Final Cluster Centers

                                                        Cluster
Measurement                                             1       2
Greatest length of skull                                188.4   174.1
Greatest horizontal breadth of skull                    141.3   137.6
Height of skull                                         135.8   131.6
Upper face length                                        77.6    69.7
Face breadth between outermost points of cheekbones     138.5   130.4

57

The Number of Cases in each Cluster table gives the size of each cluster in the final solution.

Number of Cases in each Cluster

Cluster   1   13.000
          2   19.000
Valid         32.000
Missing         .000

58

How does the k-means two-group solution compare with the original classification of the skulls into types A and B?

We can investigate this by first using the Save button on the k-Means Cluster Analysis dialogue box to save cluster membership for each skull in the Data View spreadsheet.

59

The new categorical variable now available (labelled QCL_1) can be cross-tabulated with assumed skull type (variable place). The display shows the resulting table; the k-means clusters largely agree with the skull types as originally suggested by Morant, with cluster 1 consisting primarily of Type B skulls (those from Lhasa) and cluster 2 containing mostly skulls of Type A (from Sikkim and the neighbouring area of Tibet). Only six skulls are wrongly placed.

60


The new categorical variable now available (labelled QCL_1) can be cross-tabulated with assumed skull type.

                           Assumed type of skull
Cluster Number of Case     A (Count)   B (Count)
1                          2           11
2                          15          4

The k-means clusters largely agree with the skull types as originally suggested, with cluster 1 consisting primarily of Type B skulls (those from Lhasa) and cluster 2 containing mostly skulls of Type A (from Sikkim and the neighbouring area of Tibet). Only six skulls are wrongly placed.
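The count of wrongly placed skulls follows from the cross-tabulation once each cluster is matched to its majority skull type. A minimal sketch of that arithmetic (variable names are illustrative):

```python
# Cross-tabulation of k-means cluster (QCL_1) by assumed skull type.
crosstab = {1: {"A": 2, "B": 11},
            2: {"A": 15, "B": 4}}

# Match each cluster to the skull type that dominates it.
majority = {cluster: max(row, key=row.get) for cluster, row in crosstab.items()}
agree = sum(row[majority[cluster]] for cluster, row in crosstab.items())
total = sum(sum(row.values()) for row in crosstab.values())
misplaced = total - agree

print(majority, agree, misplaced)  # {1: 'B', 2: 'A'} 26 6
```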
