2003
Bio 8102A Applied Multivariate Biostatistics L9.1
Université d’Ottawa / University of Ottawa
Lecture 9: Discriminant function analysis (DFA)
Rationale and use of DFA
The underlying model (What is a discriminant function anyway?)
Finding discriminant functions: principles and procedures
Linear versus quadratic discriminant functions
Significance testing
Rotating discriminant functions
Component retention, significance, and reliability

What is discriminant function analysis?
Given a set of p variables X1, X2,…, Xp and a set of N objects belonging to m known groups (classes) G1, G2,…, Gm, we try to construct a set of functions Z1, Z2,…, Zr, where r = min{m-1, p}, that allow us to classify each object correctly.
The hope (sometimes faint) is that “good” classification results (i.e., low misclassification rate, high reliability) will be obtained through a relatively small set of simple functions.

What is a discriminant function anyway?
A discriminant function is a function
$$Z_i = f_i(X_1, \ldots, X_p)$$
which maximizes the “separation” between the groups under consideration, or (more technically) maximizes the ratio of between-group to within-group variation.
[Figure: frequency distributions of Group 1 and Group 2 along two functions, Z1 and Z2, one labelled “(not so good)” and the other “(better)”.]

The linear discriminant model
For a set of p variables X1, X2,…, Xp, the general model is
$$Z_i = \sum_{j=1}^{p} a_{ij} X_j$$
where the Xjs are the original variables and the aijs are the discriminant function coefficients.
Note: unlike in PCA and FA, the discriminant functions are based on the raw (unstandardized) variables, since the resulting classifications are unaffected by scale.
For p variables and m groups, the maximum number of DFs is min{p, m-1}.

The geometry of a single linear discriminant function
2 groups with measurements of two variables (X1 and X2) on each object.
In this case, the linear DF Z* results in no misclassifications, whereas another possible DF (Z) gives two misclassifications.
[Figure: Group 1 and Group 2 plotted on axes X1 and X2, with the two candidate functions Z and Z*; points are marked “Misclassified under Z but not under Z*”.]

Finding discriminant functions: principles
The first discriminant function is that which maximizes the differences between groups compared to the differences within groups…
…which is equivalent to maximizing F in a one-way ANOVA:
$$F(Z) = \frac{MS_B(Z)}{MS_W(Z)}, \qquad Z_1 = \max\{F(Z)\}$$
[Figure: F(Z) plotted against the coefficient vector a = (a1,…, ap), with Z1 marked.]
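To make the F criterion concrete, here is a minimal sketch that computes the one-way ANOVA F for the scores Z = Xa of any candidate coefficient vector a. The function name `f_ratio` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def f_ratio(X, groups, a):
    """One-way ANOVA F = MS_B(Z) / MS_W(Z) for the scores Z = X @ a."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    z = X @ np.asarray(a, dtype=float)
    labels = np.unique(groups)
    n, m = len(z), len(labels)
    grand_mean = z.mean()
    # Between-groups and within-groups sums of squares of the scores.
    ss_b = sum(np.sum(groups == g) * (z[groups == g].mean() - grand_mean) ** 2
               for g in labels)
    ss_w = sum(np.sum((z[groups == g] - z[groups == g].mean()) ** 2)
               for g in labels)
    return (ss_b / (m - 1)) / (ss_w / (n - m))

# Hypothetical toy data: two groups of three objects, two variables.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5],
              [4.0, 2.0], [5.0, 2.5], [4.5, 1.5]])
groups = np.array([1, 1, 1, 2, 2, 2])

f_x1 = f_ratio(X, groups, [1.0, 0.0])  # scores are just X1
f_x2 = f_ratio(X, groups, [0.0, 1.0])  # scores are just X2
print(f_x1, f_x2)
```

In this toy example the groups differ only in X1, so the direction along X1 gives a large F and the direction along X2 gives F = 0; the first discriminant function is the direction with the largest F over all possible a.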

Finding discriminant functions: principles (cont’d)
The second discriminant function is that which maximizes the differences between groups compared to the differences within groups unaccounted for by Z1…
…which is equivalent to maximizing F in a one-way ANOVA given the constraint that Z1 and Z2 are uncorrelated:
$$F(Z) = \frac{MS_B(Z)}{MS_W(Z)}, \qquad Z_2 = \max\{F(Z) \mid r_{Z_1, Z_2} = 0\}$$
[Figure: F(Z) plotted against the coefficient vector a = (a1,…, ap), with Z2 marked.]

The geometry of several linear discriminant functions
2 groups with measurements of two variables (X1 and X2) on each individual.
Using only Z1, 4 objects are misclassified, whereas using both Z1 and Z2, only one object is misclassified.
[Figure: Group 1 and Group 2 plotted on axes X1 and X2 with discriminant functions Z1 and Z2; points are marked “Misclassified using only Z1” and “Misclassified using both Z1 and Z2”.]

SSCP matrices: within, between, and total
The total (T) SSCP matrix (based on p variables X1, X2,…, Xp) in a sample of objects belonging to m groups G1, G2,…, Gm with sizes n1, n2,…, nm can be partitioned into within-groups (W) and between-groups (B) SSCP matrices:
$$T = B + W$$
$$t_{rc} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (x_{ijr} - \bar{x}_{r})(x_{ijc} - \bar{x}_{c})$$
$$w_{rc} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (x_{ijr} - \bar{x}_{jr})(x_{ijc} - \bar{x}_{jc})$$
where
$x_{ijk}$ is the value of variable Xk for the ith observation in group j,
$\bar{x}_{jk}$ is the mean of variable Xk for group j,
$\bar{x}_{k}$ is the overall mean of variable Xk, and
$t_{rc}$, $w_{rc}$ are the elements in row r and column c of the total (T) and within (W) SSCP matrices.
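The partition can be checked numerically. The sketch below computes T and W from their definitions and B directly from the group means, then confirms that T = B + W. The function name `sscp_matrices` and the toy data are hypothetical.

```python
import numpy as np

def sscp_matrices(X, groups):
    """Return the total (T), within (W) and between (B) SSCP matrices."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    dev_total = X - grand_mean
    T = dev_total.T @ dev_total            # deviations from overall means
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        dev_within = Xg - Xg.mean(axis=0)  # deviations from group means
        W += dev_within.T @ dev_within
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        B += len(Xg) * (d @ d.T)           # group means about overall means
    return T, W, B

# Hypothetical toy data: two groups of three objects, two variables.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5],
              [4.0, 2.0], [5.0, 2.5], [4.5, 1.5]])
groups = np.array([1, 1, 1, 2, 2, 2])

T, W, B = sscp_matrices(X, groups)
print(np.allclose(T, B + W))
```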

Finding discriminant functions: analytic procedures
Calculate total (T), within (W) and between (B) SSCPs:
$$T = B + W$$
Determine eigenvalues and eigenvectors of the product $W^{-1}B$:
$$\boldsymbol{\lambda}(W^{-1}B) = (\lambda_1, \lambda_2, \ldots, \lambda_p)$$
λi is the ratio of between to within SSs for the ith discriminant function Zi…
$$\lambda_i = \frac{SS_B(Z_i)}{SS_W(Z_i)}$$
…and the elements of the corresponding eigenvectors are the discriminant function coefficients:
$$\mathbf{a}_i(W^{-1}B) = (a_{i1}, a_{i2}, \ldots, a_{ip})$$
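A minimal NumPy sketch of this procedure follows, assuming W is non-singular. The function name `discriminant_functions` and the toy data are hypothetical; the eigenvectors are returned with NumPy's default unit-length scaling, which statistical packages typically rescale.

```python
import numpy as np

def discriminant_functions(X, groups):
    """Eigenvalues and coefficient vectors of W^{-1} B, largest eigenvalue first."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    p, m = X.shape[1], len(labels)
    dev_total = X - X.mean(axis=0)
    T = dev_total.T @ dev_total
    W = np.zeros((p, p))
    for g in labels:
        dev = X[groups == g] - X[groups == g].mean(axis=0)
        W += dev.T @ dev
    B = T - W
    # solve(W, B) gives W^{-1} B without forming the inverse explicitly.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]
    r = min(p, m - 1)                      # maximum number of DFs
    return eigvals[order][:r], eigvecs[:, order][:, :r], W, B

# Hypothetical toy data: two groups of three objects, two variables.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5],
              [4.0, 2.0], [5.0, 2.5], [4.5, 1.5]])
groups = np.array([1, 1, 1, 2, 2, 2])

lams, coefs, W, B = discriminant_functions(X, groups)
a1 = coefs[:, 0]
ratio = (a1 @ B @ a1) / (a1 @ W @ a1)      # SS_B(Z1) / SS_W(Z1)
print(lams[0], ratio)
```

The last two lines illustrate the interpretation of the eigenvalue: for the first coefficient vector, the ratio of between to within sums of squares of the scores equals λ1.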

Assumptions
Equality of within-group covariance matrices (C1 = C2 = ...) implies that each element of C1 is equal to the corresponding element in C2, etc.
Within-group covariance matrix for G1 (diagonal entries s2 are variances, off-diagonal entries cjk are covariances):
Variable   X1    X2    X3
X1         s2
X2         c21   s2
X3         c31   c32   s2
Within-group covariance matrix for G2:
Variable   X1    X2    X3
X1         s2
X2         c21   s2
X3         c31   c32   s2

The quadratic discriminant model
For a set of p variables X1, X2,…, Xp, the general quadratic model is
$$Z_i = \sum_{j=1}^{p} \left( a_{ij} X_j + b_{ij} X_i X_j \right)$$
where the Xjs are the original variables, the aijs are the linear coefficients, and the bijs are the 2nd order coefficients.
Because the quadratic model involves many more parameters, sample sizes must be considerably larger to get reasonably stable estimates of coefficients.
[Figure: Group 1 and Group 2 plotted on axes X1 and X2, separated by a linear Z1 and by a quadratic Z1.]

Fitting discriminant function models: the problems
Goal: find the “best” model, given the available data.
Problem 1: what is “best”?
Problem 2: even if “best” is defined, by what method do we find it?
Possibilities:
If there are p variables, we might compute DFs using all possible subsets of variables (2^p - 1 models) and choose the best one;
use some procedure for winnowing down the set of possible models.

Criteria for choosing the “best” discriminant model
Discriminating ability: better models are better able to distinguish among groups.
Implication: better models will have lower misclassification rates.
N.B. Raw misclassification rates can be very misleading.
Parsimony: a discriminant model which includes fewer variables is better than one with more variables.
Implication: if the elimination/addition of a variable does not significantly increase/decrease the misclassification rate, it may not be very useful.

Criteria for choosing the “best” discriminant model (cont’d)
Model stability: better models have coefficients that are stable as judged through cross-validation.
Procedure: Judge stability through cross-validation (jackknifing, bootstrapping).
NB.1. In general, linear discriminant functions will be more stable than quadratic functions, especially if the sample is small.
NB.2. If the sample is small, then “outliers” may dramatically decrease model stability.

Analytic procedures: general approach
Evaluate significance of a variable (Xi) in DF by computing the difference in group resolution between two models, one with the variable included, the other with it excluded.
Evaluate change in discriminating ability (ΔDA) associated with inclusion of the variable in question.
Unfortunately, change in discriminating ability may depend on what other variables are in the model!
[Diagram: Model A (Xi in) and Model B (Xi out) are compared through ΔDA; delete Xi if ΔDA is small, retain Xi if ΔDA is large.]

Strategy I: computing all possible models
Compute all possible models and choose the “best” one.
Impractical unless number of variables is relatively small.
[Diagram: the full set {X1, X2, X3} gives seven candidate models: {X1}, {X2}, {X3}, {X1, X2}, {X1, X3}, {X2, X3}, {X1, X2, X3}.]
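The growth in the number of candidate models is easy to see by enumerating them. This short sketch lists every non-empty subset of a set of variable names; the helper name `all_variable_subsets` is hypothetical.

```python
from itertools import combinations

def all_variable_subsets(variables):
    """Every non-empty subset of the variables, smallest subsets first."""
    subsets = []
    for size in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets

models = all_variable_subsets(["X1", "X2", "X3"])
for model in models:
    print(model)

# The count doubles (plus one) with each added variable.
n_models_10 = len(all_variable_subsets(list(range(10))))
n_models_15 = len(all_variable_subsets(list(range(15))))
print(len(models), n_models_10, n_models_15)
```

Three variables give 7 models, ten give 1023 and fifteen give 32767, which is why the strategy is only practical for a small number of variables.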

Strategy II: forward selection
Start with the variable for which differences among group means are the largest (largest F-value).
Add others one at a time based on F to enter (p to enter) until no further significant increase in discriminating ability is achieved.
Problem: if Xj is included, it stays in even if it contributes little to discriminating ability once other variables are included.
[Diagram: worked example with four variables.
(X1, X2, X3, X4), all variables: F2 > F1, F3, F4 and F2 > F to enter (p < p to enter), so X2 enters, giving (X2).
F1 > F3, F4 and F1 > F to enter (p < p to enter), so X1 enters, giving (X1, X2).
F4 > F3 but F4 < F to enter (p > p to enter), so nothing more enters; final model (X1, X2).]

What is F to enter/remove (p to enter/remove) anyway?
When no variables are in the model, F to enter is the F-value from a univariate one-way ANOVA comparing group means with respect to the variable in question, and p to enter is the Type I error probability associated with the null hypothesis that all group means are equal.
When other variables are in the model, F to enter corresponds to the F-value for an ANCOVA comparing group means with respect to the variable in question, where the covariates are the variables already entered.
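One way to compute F to enter is through the partial Wilks' lambda, which is a standard route to the same F as the ANOVA/ANCOVA described above rather than something stated on the slide. With q variables already entered, the partial lambda is the ratio of Wilks' lambda (|W|/|T|) with and without the candidate, and F = [(N - m - q)/(m - 1)] (1 - Λ)/Λ. The function names and toy data below are hypothetical.

```python
import numpy as np

def wilks_lambda(X, groups, cols):
    """Wilks' lambda |W| / |T| for the variables in cols (1.0 if cols is empty)."""
    if len(cols) == 0:
        return 1.0
    Xs = X[:, cols]
    dev_total = Xs - Xs.mean(axis=0)
    T = dev_total.T @ dev_total
    W = np.zeros_like(T)
    for g in np.unique(groups):
        dev = Xs[groups == g] - Xs[groups == g].mean(axis=0)
        W += dev.T @ dev
    return np.linalg.det(W) / np.linalg.det(T)

def f_to_enter(X, groups, candidate, entered=()):
    """F to enter for one candidate column, given the columns already entered."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, m, q = len(groups), len(np.unique(groups)), len(entered)
    partial = (wilks_lambda(X, groups, list(entered) + [candidate])
               / wilks_lambda(X, groups, list(entered)))
    return (n - m - q) / (m - 1) * (1.0 - partial) / partial

# Hypothetical toy data: two groups of three objects, two variables.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5],
              [4.0, 2.0], [5.0, 2.5], [4.5, 1.5]])
groups = np.array([1, 1, 1, 2, 2, 2])

f_first = f_to_enter(X, groups, candidate=0)                 # no variables in model
f_second = f_to_enter(X, groups, candidate=1, entered=(0,))  # X1 already entered
print(f_first, f_second)
```

With no variables entered the statistic reduces to the one-way ANOVA F for the candidate; here the second variable adds nothing once the first is in the model, so its F to enter is 0.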

Strategy III: backward selection
Start with all variables and drop that for which differences among group means are the smallest (smallest F-value).
Delete others one at a time based on F to remove (p to remove) until further removal results in a significant reduction in the ability to discriminate groups.
Problem: if Xj is excluded, it stays out even if it contributes substantially to discriminating ability once other variables are excluded.
[Diagram: worked example with four variables.
(X1, X2, X3, X4), all variables in: F2 < F1, F3, F4 and F2 < F to remove (p > p to remove), so X2 is dropped, giving (X1, X3, X4).
F1 < F3, F4 and F1 < F to remove (p > p to remove), so X1 is dropped, giving (X3, X4).
F4 < F3 but F4 > F to remove (p < p to remove), so nothing more is dropped; final model (X3, X4).]

Canonical scores
Because discriminant functions are functions, we can “plug in” the values for each variable for each observation, and calculate a canonical score for each observation and each discriminant function.
Observation   X1    X2
1             3.7   11.5
2             2.3   10.2
…
$$Z_{11} = 0.027(3.7) + 0.97(11.5)$$
$$Z_{12} = 0.92(3.7) + 0.39(11.5)$$
$$Z_{21} = 0.027(2.3) + 0.97(10.2)$$
$$Z_{22} = 0.92(2.3) + 0.39(10.2)$$
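The same calculation can be written as a matrix product. This sketch uses the two observations and the coefficients shown on the slide, and assumes that the two terms of each function are summed, as the linear model implies; the array names are hypothetical.

```python
import numpy as np

# Observations from the slide: one row per object, columns X1 and X2.
X = np.array([[3.7, 11.5],
              [2.3, 10.2]])

# Coefficients from the slide: one row per discriminant function.
# Assumption: each score is the sum of its two coefficient-times-value terms.
A = np.array([[0.027, 0.97],
              [0.92, 0.39]])

# Z[k, i] is the canonical score of observation k on discriminant function i.
Z = X @ A.T
print(Z)
```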

Canonical scores plots
Plots of canonical scores for each object.
The better the model, the greater the separation between clouds of points representing individual groups, e.g. Fisher’s famous irises.
[Figure: Canonical Scores Plot of FACTOR(2) against FACTOR(1), both axes from -10 to 10, points coded by SPECIES (1, 2, 3), with 95% confidence ellipses.]
Canonical scores of group means
Species   FACTOR(1)   FACTOR(2)
1           7.608       0.215
2          -1.825      -0.728
3          -5.783       0.513

Priors
In standard DFA, it is assumed that in the absence of any information, the a priori (prior) probability πi of a given object belonging to one of i = 1,…,m groups is the same for all groups:
$$\pi_i = \frac{1}{m}$$
But, if each group is not equally likely, then priors should be adjusted so as to reflect this bias.
E.g. in species with biased sex-ratios, males and females should have unequal priors.

Caveats: unequal priors
For a given set of discriminant functions, misclassification rates will usually depend on the priors…
…so that artificially low misclassification rates can be obtained simply by strategically adjusting the priors.
So, only adjust priors if you are confident that the true frequency of each group in the population is (reasonably) accurately estimated by the group frequencies in the sample.

Significance testing
Question: which discriminant functions are statistically “significant”?
For testing significance of all r DFs for m groups based on p variables, calculate Bartlett’s V and compare to the χ² distribution with p(m-1) degrees of freedom:
$$V = \left[ N - 1 - \tfrac{1}{2}(p + m) \right] \sum_{i=1}^{r} \ln(1 + \lambda_i)$$
where λi is the eigenvalue associated with the ith discriminant function.
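As a sketch of the overall test, the function below computes Bartlett's V and its degrees of freedom from a list of eigenvalues, taking the multiplier to be N - 1 - (p + m)/2. The function name `bartlett_v` and the eigenvalues in the usage lines are hypothetical, not results from any data set in these notes.

```python
import math

def bartlett_v(eigenvalues, n_objects, n_variables, n_groups):
    """Bartlett's V for all DFs combined, with its chi-square degrees of freedom."""
    multiplier = n_objects - 1 - (n_variables + n_groups) / 2
    v = multiplier * sum(math.log(1 + lam) for lam in eigenvalues)
    df = n_variables * (n_groups - 1)
    return v, df

# Hypothetical eigenvalues for p = 4 variables, m = 3 groups, N = 50 objects.
v, df = bartlett_v([2.0, 0.5], n_objects=50, n_variables=4, n_groups=3)
print(v, df)
```

V is then compared with the upper tail of the χ² distribution with df degrees of freedom, for example from a table or a statistics package.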

Significance testing (cont’d)
Each DF is tested in a hierarchical fashion by first testing significance of all DFs combined.
If all DFs combined are not significant, then no DF is significant.
If all DFs combined are significant, then remove the first DF, recalculate V (= V1) and test.
Continue until the residual Vj is no longer significant at df = (p – j)(m – j - 1).
$$V = \left[ N - 1 - \tfrac{1}{2}(p + m) \right] \sum_{i=1}^{r} \ln(1 + \lambda_i)$$
The residual statistics V1, V2,…, Vj are based on the remaining eigenvalues: V1 sums ln(1 + λi) over i = 2,…, r, V2 over i = 3,…, r, and Vj over i = j + 1,…, r.

Caveats/assumptions: tests of significance
Tests of significance assume that within-group covariance matrices are the same for all groups, and that within groups, observations have a multivariate normal distribution.
Tests of significance can be very misleading because the jth discriminant function in the population may not appear as the jth discriminant function in the sample due to sampling errors…
So be careful, especially if the sample is small!

Caveats/assumptions: tests of significance (cont’d)
If stepwise (forward or backward) procedures are used, significance tests are biased because given enough variables, significant discriminant functions can be produced by chance alone.
In such cases, it is advisable to (1) test results with more standard analyses or (2) use randomization procedures whereby objects are randomly assigned to groups.

Assessing classification accuracy I. Raw classification results
The derived discriminant functions are used to classify all objects in the sample, and a classification table is produced.
Classification accuracy is likely to be overestimated, since the data used to generate the DFs in the first place are themselves being classified.
                  Predicted group
Actual group      1      2      Total
1                 43     5      48
2                 8      14     22
Total             51     19     70
Misclassification (G1) = 5/48
Misclassification (G2) = 8/22
Overall misclassification = 13/70
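The misclassification rates follow directly from the classification table. This sketch recomputes them from the counts above, reading rows as actual groups and columns as predicted groups; the variable names are hypothetical.

```python
import numpy as np

# Raw classification table: rows = actual group, columns = predicted group.
table = np.array([[43, 5],
                  [8, 14]])

# Off-diagonal counts are the misclassified objects.
per_group = 1 - np.diag(table) / table.sum(axis=1)
overall = 1 - np.trace(table) / table.sum()

print(per_group)  # misclassification rate for G1 and G2
print(overall)    # overall misclassification rate
```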

Assessing classification accuracy II. Jackknifed classification
Discriminant functions are derived using N – 1 objects, and the Nth object is then classified.
This procedure is repeated for all N objects, each time leaving a different one out, and a classification table is produced.
In general, jackknifed classification results are worse than raw classification results, but more reliable.
                  Predicted group
Actual group      1      2      Total
1                 41     7      48
2                 9      13     22
Total             50     20     70
Misclassification (G1) = 7/48
Misclassification (G2) = 9/22
Overall misclassification = 16/70
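A minimal sketch of jackknifed (leave-one-out) classification is given below. As a stand-in for the discriminant functions it assigns each object to the group whose centroid is nearest in Mahalanobis distance under the pooled within-group covariance, which corresponds to linear discriminant classification with equal priors. All function names and the simulated data are hypothetical, and each group is assumed to keep at least two objects when one is left out.

```python
import numpy as np

def fit_lda(X, y):
    """Group labels, group centroids and inverse pooled within-group covariance."""
    labels = np.unique(y)
    means = np.array([X[y == g].mean(axis=0) for g in labels])
    W = np.zeros((X.shape[1], X.shape[1]))
    for g, mu in zip(labels, means):
        dev = X[y == g] - mu
        W += dev.T @ dev
    pooled_cov = W / (len(y) - len(labels))
    return labels, means, np.linalg.inv(pooled_cov)

def predict_lda(model, X):
    """Assign each row of X to the group with the nearest centroid (Mahalanobis)."""
    labels, means, inv_cov = model
    diffs = X[:, None, :] - means[None, :, :]
    d2 = np.einsum("nmp,pq,nmq->nm", diffs, inv_cov, diffs)
    return labels[np.argmin(d2, axis=1)]

def jackknifed_table(X, y):
    """Classification table (rows actual, columns predicted), leaving one object out."""
    labels = np.unique(y)
    table = np.zeros((len(labels), len(labels)), dtype=int)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = fit_lda(X[keep], y[keep])      # derived from the other N - 1 objects
        pred = predict_lda(model, X[i:i + 1])[0]
        table[np.searchsorted(labels, y[i]), np.searchsorted(labels, pred)] += 1
    return table

# Simulated, well-separated groups (hypothetical data, fixed seed).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),
               rng.normal(10.0, 1.0, size=(15, 2))])
y = np.array([1] * 20 + [2] * 15)

table = jackknifed_table(X, y)
print(table)
```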

Assessing classification accuracy III. Data splitting
Use 2/3 of the sample data (randomly selected) to generate discriminant functions (learning set).
Use the derived discriminant functions to classify the other 1/3 (test set) and produce a classification table.
In general, data-splitting classification results are worse than both raw and jackknifed classification results, but more reliable.
                  Predicted group
Actual group      1      2      Total
1                 40     8      48
2                 9      13     22
Total             49     21     70
Misclassification (G1) = 8/48
Misclassification (G2) = 9/22
Overall misclassification = 17/70

Assessing classification accuracy IV. Bootstrapped data splitting
Use 2/3 of sample data (randomly sampled) to generate discriminant functions (learning set).
Use derived discriminant functions to classify other 1/3 (test set) and produce classification results.
Repeat a large number (e.g. 1000) times, each time sampling with replacement.
Generate classification statistics over bootstrapped samples, e.g. mean classification results, standard errors, etc.
                  Predicted group
Actual group      1             2             Total
1                 41.2 (1.7)    6.8 (0.6)     48
2                 9.3 (0.5)     12.7 (1.1)    22
Total             51            19            70
Misclassification (G1) = 14.2%
Misclassification (G2) = 42.3%
Overall misclassification = 23.0%
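The repeated-splitting idea can be sketched as follows. This is one interpretation, not necessarily the exact resampling scheme behind the table: each replicate draws a fresh random 2/3 learning set and tests on the remaining 1/3, and the misclassification rate is summarised over replicates. The classifier is a simple stand-in (nearest centroid in Mahalanobis distance with pooled covariance, i.e. linear discrimination with equal priors); all names and the simulated data are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Group labels, group centroids and inverse pooled within-group covariance."""
    labels = np.unique(y)
    means = np.array([X[y == g].mean(axis=0) for g in labels])
    W = np.zeros((X.shape[1], X.shape[1]))
    for g, mu in zip(labels, means):
        dev = X[y == g] - mu
        W += dev.T @ dev
    return labels, means, np.linalg.inv(W / (len(y) - len(labels)))

def predict_lda(model, X):
    """Assign each row of X to the group with the nearest centroid (Mahalanobis)."""
    labels, means, inv_cov = model
    diffs = X[:, None, :] - means[None, :, :]
    d2 = np.einsum("nmp,pq,nmq->nm", diffs, inv_cov, diffs)
    return labels[np.argmin(d2, axis=1)]

def repeated_split_error(X, y, n_reps=200, train_frac=2 / 3, seed=0):
    """Mean and standard deviation of the test-set misclassification rate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    rates = np.empty(n_reps)
    for b in range(n_reps):
        perm = rng.permutation(n)              # new random split each replicate
        train, test = perm[:n_train], perm[n_train:]
        model = fit_lda(X[train], y[train])    # learning set
        rates[b] = np.mean(predict_lda(model, X[test]) != y[test])  # test set
    return rates.mean(), rates.std(ddof=1)

# Simulated, well-separated groups (hypothetical data, fixed seed).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)),
               rng.normal(10.0, 1.0, size=(15, 2))])
y = np.array([1] * 20 + [2] * 15)

mean_rate, sd_rate = repeated_split_error(X, y)
print(mean_rate, sd_rate)
```

With real data the groups overlap, so the rate varies from replicate to replicate and its spread gives a sense of how reliable the estimate is.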

Interpreting discriminant functions
Examine standardized coefficients (coefficients of discriminant functions based on standardized values).
For interpretation, use variables with large absolute standardized coefficients.
Examine the discriminant-variable correlations.
For interpretation, use variables with high correlations with important discriminant functions.
2003
Bio 8102A Applied Multivariate Biostatistics L9.35
Université d’Ottawa / University of Ottawa
Example: Fisher’s famous irises
Data: four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).
Problem: find the “best” set of DFs.
[Figure: scatterplot matrix of SEPALLEN, SEPALWID, PETALLEN and PETALWID, points coded by SPECIES (1, 2, 3).]
Example: Fisher’s famous irises: between-groups F-matrix
Matrix entries are F-values from one-way MANOVA comparing each pair of group means, and can be considered measures of the distance between group centroids.
Do not use associated probabilities to determine “significance” unless you correct for multiple tests.
Species   1        2       3
1         0.0
2         550.2    0.0
3         1098.3   105.3   0.0
[Figure: canonical scores plot, FACTOR(2) versus FACTOR(1), both axes from -10 to 10, points coded by SPECIES (1, 2, 3).]
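The warning about multiple tests can be made concrete. With three species there are three pairwise comparisons in the F-matrix. One common correction (Bonferroni, chosen here as an assumption; the slide does not name a method) divides the overall significance level by the number of tests. The 0.05 level and the function name are illustrative.

```python
from itertools import combinations

species = [1, 2, 3]
pairs = list(combinations(species, 2))   # (1, 2), (1, 3), (2, 3)

alpha = 0.05                             # assumed overall significance level
alpha_per_test = alpha / len(pairs)      # Bonferroni-adjusted threshold

def is_significant(p_value):
    """Judge one pairwise comparison against the adjusted threshold."""
    return p_value < alpha_per_test

print(pairs, round(alpha_per_test, 4))
```

A pairwise probability of 0.03, for example, would pass an uncorrected 0.05 threshold but not the adjusted one.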
Example: Fisher’s famous irises: canonical discriminant functions
Four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).
Canonical discriminant functions
           1        2
Constant   2.105   -6.661
SEPALLEN   0.829    0.024
SEPALWID   1.534    2.165
PETALLEN  -2.201   -0.932
PETALWID  -2.810    2.839
Note: discriminant functions are derived using equal priors.
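Each canonical discriminant function is a linear combination of the four measurements plus a constant, so a flower's position on the canonical scores plot can be computed directly from the coefficients above. The sketch below does this for one hypothetical flower; the measurements and the assumption that they are in centimetres are illustrative, and the function name is invented.

```python
# Constants and coefficients for discriminant functions 1 and 2
CONSTANT = (2.105, -6.661)
COEFFICIENTS = {
    "SEPALLEN": (0.829, 0.024),
    "SEPALWID": (1.534, 2.165),
    "PETALLEN": (-2.201, -0.932),
    "PETALWID": (-2.810, 2.839),
}

def canonical_scores(flower):
    """Return (Z1, Z2): constant plus coefficient-weighted measurements."""
    scores = list(CONSTANT)
    for name, value in flower.items():
        for k in range(2):
            scores[k] += COEFFICIENTS[name][k] * value
    return tuple(scores)

# Hypothetical flower (short petals, as is typical of one of the species)
flower = {"SEPALLEN": 5.1, "SEPALWID": 3.5, "PETALLEN": 1.4, "PETALWID": 0.2}
z1, z2 = canonical_scores(flower)
print(round(z1, 3), round(z2, 3))
```

Petal length and petal width carry the largest coefficients on function 1, so flowers with short, narrow petals score high on it.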
Example: Fisher’s famous irises: standardized canonical discriminant functions
Four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).
Standardized canonical discriminant functions
           1        2
SEPALLEN   0.427    0.012
SEPALWID   0.521    0.735
PETALLEN  -0.942   -0.401
PETALWID  -0.575    0.581
Note: standardized canonical discriminant functions are based on standardized values.
Eigenvalues measure the amount of among-group difference captured by a particular discriminant function; the cumulative proportion of dispersion is the running sum of the eigenvalues divided by their total.
Parameter                             DF 1     DF 2
Eigenvalue                            32.192   0.285
Canonical correlation                 0.985    0.471
Cumulative proportion of dispersion   0.991    1.000
Canonical correlation is the correlation between a given canonical variate (DF) and the set of two dummy variables that code membership in the three groups.
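The three rows of the table are linked. Assuming the eigenvalues are those of W^-1 B (the usual convention, not stated on the slide), each canonical correlation equals sqrt(eigenvalue / (1 + eigenvalue)), and the cumulative proportion of dispersion is a running sum of eigenvalues over their total. The check below reproduces the tabled values from the two eigenvalues alone.

```python
import math

eigenvalues = [32.192, 0.285]

# Canonical correlation for each discriminant function
canonical_correlations = [math.sqrt(ev / (1 + ev)) for ev in eigenvalues]

# Cumulative proportion of dispersion
total = sum(eigenvalues)
cumulative = []
running = 0.0
for ev in eigenvalues:
    running += ev
    cumulative.append(running / total)

print([round(r, 3) for r in canonical_correlations])
print([round(c, 3) for c in cumulative])
```

The first function accounts for about 99% of the among-group dispersion, so the second adds little to the separation of the species.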
[Figure: canonical scores plot, FACTOR(2) versus FACTOR(1), both axes from -10 to 10, points coded by SPECIES (1, 2, 3).]
Fisher’s irises: raw and jackknifed classification results
In this case, results are identical (a relatively rare occurrence!)
Classification matrix, raw and jackknifed (rows: actual species; columns: predicted species)
Species   1    2    3    % correct
1         50   0    0    100
2         0    48   2    96
3         0    1    49   98
Total     50   49   51   98
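The percentages and totals follow from the counts. The sketch below assumes rows are actual species and columns are predicted species, which matches the 50 flowers per species; the variable names are illustrative.

```python
# counts[actual][predicted] for species 1, 2, 3
counts = [[50, 0, 0],
          [0, 48, 2],
          [0, 1, 49]]

row_totals = [sum(row) for row in counts]                        # flowers per species
column_totals = [sum(row[j] for row in counts) for j in range(3)]

# Percent of each species classified correctly (diagonal over row total)
percent_correct = [100 * counts[i][i] / row_totals[i] for i in range(3)]

# Overall percent correct: all diagonal counts over all flowers
overall = 100 * sum(counts[i][i] for i in range(3)) / sum(row_totals)

print(percent_correct, column_totals, overall)
```

Only three of the 150 flowers are misclassified, all of them confusions between species 2 and 3.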
Discriminant function analysis: caveats and notes
Unless the ratio of number of objects/number of variables is large (> 20), standardized coefficients and correlations are unstable.
DFA is unaffected by differences among variables in scale, so standardization is not required (unlike PCA, FA, etc.)
Linear DFA is quite sensitive to the assumption of equality of covariance matrices among groups. If this assumption is violated, use quadratic classification.
However, quadratic DFA is more unstable when N is small and normality does not hold.
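The claim that DFA is unaffected by scale can be checked on a small case. The sketch below fits a two-group, two-variable linear discriminant rule with equal priors to hypothetical data, then refits after multiplying the second variable by 1000 (as if changing its units). The raw coefficients change, but every object receives the same classification. The data and function names are invented for the example.

```python
def fit_lda(g1, g2):
    """Two groups, two variables, equal priors: return coefficients and cut point."""
    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(2)]
    m1, m2 = mean(g1), mean(g2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((g1, m1), (g2, m2)):
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    df = len(g1) + len(g2) - 2
    w = [[s[i][j] / df for j in range(2)] for i in range(2)]   # pooled covariance
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    coef = ((w[1][1] * diff[0] - w[0][1] * diff[1]) / det,
            (w[0][0] * diff[1] - w[1][0] * diff[0]) / det)
    # Cut point: discriminant score of the midpoint between the group means
    cut = sum(coef[j] * (m1[j] + m2[j]) / 2 for j in range(2))
    return coef, cut

def classify(rows, coef, cut):
    return [1 if coef[0] * r[0] + coef[1] * r[1] > cut else 2 for r in rows]

def rescale(rows, factor):
    return [(r[0], r[1] * factor) for r in rows]

# Hypothetical toy data with some overlap between groups
group1 = [(2.0, 3.1), (2.5, 3.9), (3.0, 3.4), (3.5, 4.6), (4.0, 4.0), (4.8, 3.0)]
group2 = [(4.5, 2.0), (5.0, 2.9), (5.5, 2.3), (6.0, 3.4), (6.5, 2.9), (3.8, 3.2)]

coef_a, cut_a = fit_lda(group1, group2)
labels_original = classify(group1 + group2, coef_a, cut_a)

big1, big2 = rescale(group1, 1000), rescale(group2, 1000)
coef_b, cut_b = fit_lda(big1, big2)
labels_rescaled = classify(big1 + big2, coef_b, cut_b)

print(labels_original == labels_rescaled)
```

The same exercise applied to PCA would generally change the components, which is why standardization matters there but not here.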