Lecture 9: Discriminant function analysis (DFA)


Page 1: Lecture 9: Discriminant function analysis (DFA)

2003

Bio 8102A Applied Multivariate Biostatistics L9.1

Université d’Ottawa / University of Ottawa

Lecture 9: Discriminant function analysis (DFA)

Rationale and use of DFA

The underlying model (What is a discriminant function anyway?)

Finding discriminant functions: principles and procedures

Linear versus quadratic discriminant functions

Significance testing

Rotating discriminant functions

Component retention, significance, and reliability

Page 2: Lecture 9: Discriminant function analysis (DFA)

What is discriminant function analysis?

Given a set of p variables X1, X2,…, Xp, and a set of N objects belonging to m known groups (classes) G1, G2,…, Gm, we try to construct a set of functions Z1, Z2,…, Z_min{m-1,p} that allow us to classify each object correctly.

The hope (sometimes faint) is that “good” classification results (i.e., low misclassification rate, high reliability) will be obtained through a relatively small set of simple functions.

Page 3: Lecture 9: Discriminant function analysis (DFA)

What is a discriminant function anyway?

A discriminant function is a function

$$Z_i = f(X_1, \ldots, X_p)$$

which maximizes the “separation” between the groups under consideration, or (more technically) maximizes the ratio of between-group to within-group variation.

[Figure: frequency distributions of Group 1 and Group 2 along two candidate functions, Z1 and Z2; one panel is labelled “not so good” and the other “better”.]

Page 4: Lecture 9: Discriminant function analysis (DFA)

The linear discriminant model

For a set of p variables X1, X2,…, Xp, the general model is

$$Z_i = \sum_{j=1}^{p} a_{ij} X_j$$

where the Xjs are the original variables and the aijs are the discriminant function coefficients.

Note: unlike in PCA and FA, the discriminant functions are based on the raw (unstandardized) variables, since the resulting classifications are unaffected by scale.

For p variables and m groups, the maximum number of DFs is min{p, m-1}.

Page 5: Lecture 9: Discriminant function analysis (DFA)

The geometry of a single linear discriminant function

2 groups with measurements of two variables (X1 and X2) on each object.

In this case, the linear DF Z* results in no misclassifications, whereas another possible DF (Z) gives two misclassifications.

[Figure: scatter of Group 1 and Group 2 in the (X1, X2) plane with two candidate discriminant axes, Z and Z*; two objects are marked as misclassified under Z but not under Z*.]

Page 6: Lecture 9: Discriminant function analysis (DFA)

Finding discriminant functions: principles

The first discriminant function is that which maximizes the differences between groups compared to the differences within groups…

…which is equivalent to maximizing F in a one-way ANOVA.

$$F(Z) = \frac{MS_B(Z)}{MS_W(Z)}, \qquad Z_1 : \max\{F(Z)\}$$

[Figure: F(Z) plotted against the coefficient vector a = (a1,…, ap), with Z1 marked on the curve.]

Page 7: Lecture 9: Discriminant function analysis (DFA)

Finding discriminant functions: principles

The second discriminant function is that which maximizes the differences between groups compared to the differences within groups unaccounted for by Z1...

…which is equivalent to maximizing F in a one-way ANOVA given the constraint that Z1, Z2 are uncorrelated.

$$F(Z) = \frac{MS_B(Z)}{MS_W(Z)}, \qquad Z_2 : \max\{F(Z) \mid r_{Z_1, Z_2} = 0\}$$

[Figure: F(Z) plotted against the coefficient vector a = (a1,…, ap), with Z2 marked on the curve.]

Page 8: Lecture 9: Discriminant function analysis (DFA)

The geometry of several linear discriminant functions

2 groups with measurements of two variables (X1 and X2) on each individual.

Using only Z1, 4 objects are misclassified, whereas using both Z1 and Z2, only one object is misclassified.

[Figure: scatter of Group 1 and Group 2 in the (X1, X2) plane with discriminant axes Z1 and Z2; markers distinguish the objects misclassified using only Z1 from the object misclassified using both Z1 and Z2.]

Page 9: Lecture 9: Discriminant function analysis (DFA)

SSCP matrices: within, between, and total

The total (T) SSCP matrix (based on p variables X1, X2,…, Xp) in a sample of objects belonging to m groups G1, G2,…, Gm with sizes n1, n2,…, nm can be partitioned into within-groups (W) and between-groups (B) SSCP matrices:

$$\mathbf{T} = \mathbf{B} + \mathbf{W}$$

$$t_{rc} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (x_{ijr} - \bar{x}_{r})(x_{ijc} - \bar{x}_{c})$$

$$w_{rc} = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (x_{ijr} - \bar{x}_{jr})(x_{ijc} - \bar{x}_{jc})$$

$x_{ijk}$: value of variable Xk for the ith observation in group j

$\bar{x}_{jk}$: mean of variable Xk for group j

$\bar{x}_{k}$: overall mean of variable Xk

$t_{rc}, w_{rc}$: element in row r and column c of the total (T, t) and within (W, w) SSCP matrices
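The partition can be checked numerically. The sketch below is illustrative only: the data are made up, the function name `sscp_matrices` is hypothetical, and the between-groups matrix is computed from the standard definition b_rc = Σ_j n_j (x̄_jr − x̄_r)(x̄_jc − x̄_c), which the slide does not write out.

```python
import numpy as np

def sscp_matrices(X, groups):
    """Return the total (T), within-groups (W) and between-groups (B) SSCP matrices."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    dev_total = X - grand_mean                 # deviations from the overall means
    T = dev_total.T @ dev_total
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        group_mean = Xg.mean(axis=0)
        dev_within = Xg - group_mean           # deviations from the group means
        W += dev_within.T @ dev_within
        d = (group_mean - grand_mean).reshape(-1, 1)
        B += len(Xg) * (d @ d.T)               # assumed standard between-groups form
    return T, W, B

# Hypothetical data: 6 objects, 2 variables, 2 groups.
X = [[3.7, 11.5], [2.3, 10.2], [3.1, 10.9],
     [5.2, 8.1], [4.8, 7.7], [5.9, 8.6]]
groups = [1, 1, 1, 2, 2, 2]

T, W, B = sscp_matrices(X, groups)
print(np.allclose(T, B + W))
```

Computing B independently, rather than as T − W, is what makes the final comparison a real check of the partition.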

Page 10: Lecture 9: Discriminant function analysis (DFA)

Finding discriminant functions: analytic procedures

Calculate total (T), within (W) and between (B) SSCPs:

$$\mathbf{T} = \mathbf{B} + \mathbf{W}$$

Determine the eigenvalues and eigenvectors of the product W^-1 B:

$$\boldsymbol{\lambda}(\mathbf{W}^{-1}\mathbf{B}) = (\lambda_1, \lambda_2, \ldots, \lambda_p)$$

λ_i is the ratio of between to within SSs for the ith discriminant function Zi…

$$\lambda_i = \frac{SS_B(Z_i)}{SS_W(Z_i)}$$

…and the elements of the corresponding eigenvectors are the discriminant function coefficients:

$$\mathbf{a}_i(\mathbf{W}^{-1}\mathbf{B}) = (a_{i1}, a_{i2}, \ldots, a_{ip})$$
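As a rough illustration of this procedure, the sketch below extracts the eigenvalues and eigenvectors of W^-1 B with NumPy. The data and the function name `discriminant_functions` are hypothetical, and the between-groups matrix uses the standard weighted form rather than anything stated on the slide.

```python
import numpy as np

def discriminant_functions(X, groups):
    """Eigenvalues and coefficient vectors of W^-1 B, largest eigenvalue first."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in labels:
        Xg = X[groups == g]
        group_mean = Xg.mean(axis=0)
        dev = Xg - group_mean
        W += dev.T @ dev
        d = (group_mean - grand_mean).reshape(-1, 1)
        B += len(Xg) * (d @ d.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
    eigvals, eigvecs = eigvals.real, eigvecs.real   # drop numerical imaginary parts
    order = np.argsort(eigvals)[::-1]
    n_df = min(p, len(labels) - 1)                  # maximum number of DFs
    return eigvals[order][:n_df], eigvecs[:, order][:, :n_df], W, B

# Hypothetical data: 9 objects, 2 variables, 3 groups.
X = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.6],
     [3.0, 3.5], [3.4, 3.1], [2.8, 4.0],
     [5.1, 1.0], [4.7, 1.6], [5.5, 1.3]]
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]

eigenvalues, coefficients, W, B = discriminant_functions(X, groups)
a1 = coefficients[:, 0]
# The first eigenvalue equals SS_B(Z1) / SS_W(Z1) for Z1 = X a1.
print(eigenvalues[0], (a1 @ B @ a1) / (a1 @ W @ a1))
```

Eigenvectors are only defined up to scale, so statistical packages typically rescale the coefficients; no rescaling is applied here.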

Page 11: Lecture 9: Discriminant function analysis (DFA)

Assumptions

Equality of within-group covariance matrices (C1 = C2 = ...) implies that each element of C1 is equal to the corresponding element in C2, etc.

G1

Variable   X1    X2    X3
X1         s²
X2         c21   s²
X3         c31   c32   s²

G2

Variable   X1    X2    X3
X1         s²
X2         c21   s²
X3         c31   c32   s²

Diagonal elements (s²) are variances; off-diagonal elements (c) are covariances.

Page 12: Lecture 9: Discriminant function analysis (DFA)

The quadratic discriminant model

For a set of p variables X1, X2,…, Xp, the general quadratic model is

$$Z_i = \sum_{j=1}^{p} a_{ij} X_j + b_{ij} X_i X_j$$

where the Xjs are the original variables, the aijs are the linear coefficients, and the bijs are the 2nd-order coefficients.

Because the quadratic model involves many more parameters, sample sizes must be considerably larger to get reasonably stable estimates of coefficients.

[Figure: Group 1 and Group 2 in the (X1, X2) plane, with a linear Z1 and a quadratic Z1 drawn for comparison.]
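To give a feel for "many more parameters", the sketch below counts coefficients per discriminant function. It assumes a full quadratic model with one second-order term for every square and every pairwise product of variables, p(p + 1)/2 in all; the slide does not spell out which second-order terms are included, and the function name is hypothetical.

```python
def n_coefficients(p, quadratic=False):
    """Coefficients per discriminant function for p variables."""
    linear_terms = p
    # Assumed: all squares and pairwise products X_j * X_k with j <= k.
    second_order_terms = p * (p + 1) // 2 if quadratic else 0
    return linear_terms + second_order_terms

for p in (2, 4, 10):
    print(p, n_coefficients(p), n_coefficients(p, quadratic=True))
```

Under that assumption, 10 variables need 10 coefficients in the linear model but 65 in the quadratic one.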

Page 13: Lecture 9: Discriminant function analysis (DFA)

Fitting discriminant function models: the problems

Goal: find the “best” model, given the available data.

Problem 1: what is “best”?

Problem 2: even if “best” is defined, by what method do we find it?

Possibilities:

If there are p variables, we might compute DFs using all possible subsets of variables (2^p − 1 models) and choose the best one,

or use some procedure for winnowing down the set of possible models.

Page 14: Lecture 9: Discriminant function analysis (DFA)

Criteria for choosing the “best” discriminant model

Discriminating ability: better models are better able to distinguish among groups.

Implication: better models will have lower misclassification rates.

N.B. Raw misclassification rates can be very misleading.

Parsimony: a discriminant model which includes fewer variables is better than one with more variables.

Implication: if the elimination/addition of a variable does not significantly increase/decrease the misclassification rate, that variable may not be very useful.

Page 15: Lecture 9: Discriminant function analysis (DFA)

Criteria for choosing the “best” discriminant model (cont’d)

Model stability: better models have coefficients that are stable as judged through cross-validation.

Procedure: Judge stability through cross-validation (jackknifing, bootstrapping).

NB.1. In general, linear discriminant functions will be more stable than quadratic functions, especially if the sample is small.

NB.2. If the sample is small, then “outliers” may dramatically decrease model stability.

Page 16: Lecture 9: Discriminant function analysis (DFA)

Fitting discriminant function models: the problems

Goal: find the “best” model, given the available data.

Problem 1: what is “best”?

Problem 2: even if “best” is defined, by what method do we find it?

Possibilities:

If there are p variables, we might compute DFs using all possible subsets of variables (2^p − 1 models) and choose the best one,

or use some procedure for winnowing down the set of possible models.

Page 17: Lecture 9: Discriminant function analysis (DFA)

Analytic procedures: general approach

Evaluate significance of a variable (Xi) in DF by computing the difference in group resolution between two models, one with the variable included, the other with it excluded.

Evaluate the change in discriminating ability (ΔDA) associated with inclusion of the variable in question.

Unfortunately, the change in discriminating ability may depend on what other variables are in the model!

[Diagram: Model A (Xi in) is compared with Model B (Xi out) to give ΔDA. If Δ is small, delete Xi; if Δ is large, retain Xi.]

Page 18: Lecture 9: Discriminant function analysis (DFA)

Strategy I: computing all possible models

Compute all possible models and choose the “best” one.

Impractical unless the number of variables is relatively small.

[Diagram: the variable set {X1, X2, X3} yields the candidate models {X1}, {X2}, {X3}, {X1, X2}, {X1, X3}, {X2, X3}, and {X1, X2, X3}.]
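A short sketch of why this strategy becomes impractical: enumerating every non-empty subset of the variables gives 2^p − 1 candidate models for p variables. The function name `all_subsets` is hypothetical, and fitting and scoring each model is left out.

```python
from itertools import combinations

def all_subsets(variables):
    """Every non-empty subset of the variables, smallest subsets first."""
    subsets = []
    for size in range(1, len(variables) + 1):
        subsets.extend(combinations(variables, size))
    return subsets

models = all_subsets(["X1", "X2", "X3"])
for model in models:
    print(model)
print(len(models))   # 7 candidate models for 3 variables
print(2 ** 20 - 1)   # 1048575 candidate models for 20 variables
```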

Page 19: Lecture 9: Discriminant function analysis (DFA)

Strategy II: forward selection

Start with the variable for which differences among group means are the largest (largest F-value).

Add others one at a time based on F to enter (p to enter) until no further significant increase in discriminating ability is achieved.

Problem: if Xj is included, it stays in even if it contributes little to discriminating ability once other variables are included.

[Diagram: forward selection starting from all variables (X1, X2, X3, X4).
Step 1: F2 > F1, F3, F4 and F2 > F to enter (< p to enter), giving the model (X2).
Step 2: F1 > F3, F4 and F1 > F to enter (< p to enter), giving the model (X1, X2).
Step 3: F4 > F3 but F4 < F to enter (> p to enter), so the final model is (X1, X2).]

Page 20: Lecture 9: Discriminant function analysis (DFA)

What is F to enter/remove (p to enter/remove) anyway?

When no variables are in the model, F to enter is the F-value from a univariate one-way ANOVA comparing group means with respect to the variable in question, and p to enter is the Type I error probability associated with the null hypothesis that all group means are equal.

When other variables are in the model, F to enter corresponds to the F-value for an ANCOVA comparing group means with respect to the variable in question, where the covariates are the variables already entered.
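For the first case (no variables yet in the model), F to enter can be sketched as an ordinary one-way ANOVA F computed separately for each candidate variable. The data and names below are hypothetical, and the ANCOVA case for later steps is not covered.

```python
def one_way_f(values, groups):
    """F = MS_between / MS_within for one variable in a one-way ANOVA."""
    labels = sorted(set(groups))
    n, m = len(values), len(labels)
    grand_mean = sum(values) / n
    ss_between = 0.0
    ss_within = 0.0
    for g in labels:
        vals = [v for v, lab in zip(values, groups) if lab == g]
        group_mean = sum(vals) / len(vals)
        ss_between += len(vals) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in vals)
    return (ss_between / (m - 1)) / (ss_within / (n - m))

# Hypothetical data: two candidate variables measured on two groups.
groups = [1, 1, 1, 2, 2, 2]
candidates = {
    "X1": [1, 2, 3, 2, 3, 4],
    "X2": [1, 2, 3, 7, 8, 9],
}

f_to_enter = {name: one_way_f(vals, groups) for name, vals in candidates.items()}
first_in = max(f_to_enter, key=f_to_enter.get)
print(f_to_enter)   # X1: 1.5, X2: 54.0
print(first_in)     # X2 enters first
```

Whether the winning variable actually enters still depends on comparing its F (or p) with the chosen threshold, which is not shown here.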

Page 21: Lecture 9: Discriminant function analysis (DFA)

Strategy III: backward selection

Start with all variables and drop that for which differences among group means are the smallest (smallest F-value).

Delete others one at a time based on F to remove (p to remove) until further removal results in a significant reduction in the ability to discriminate groups.

Problem: if Xj is excluded, it stays out even if it contributes substantially to discriminating ability once other variables are excluded.

[Diagram: backward selection starting with all variables in (X1, X2, X3, X4).
Step 1: F2 < F1, F3, F4 and F2 < F to remove (> p to remove), leaving (X1, X3, X4).
Step 2: F1 < F3, F4 and F1 < F to remove (> p to remove), leaving (X3, X4).
Step 3: F4 < F3 but F4 > F to remove (< p to remove), so the final model is (X3, X4).]

Page 22: Lecture 9: Discriminant function analysis (DFA)

Canonical scores

Because discriminant functions are functions, we can “plug in” the values for each variable for each observation, and calculate a canonical score for each observation and each discriminant function.

Observation   X1    X2
1             3.7   11.5
2             2.3   10.2
…             …     …

$$Z_{11} = 0.027(3.7) + 0.97(11.5)$$

$$Z_{12} = 0.92(3.7) + 0.39(11.5)$$

$$Z_{21} = 0.027(2.3) + 0.97(10.2)$$

$$Z_{22} = 0.92(2.3) + 0.39(10.2)$$
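The "plug in" step amounts to multiplying each observation by each coefficient vector. The sketch below uses the two observations and the coefficient magnitudes shown on the slide; the coefficient signs are assumed to be positive here, since any signs were not recoverable from the slide text, and the function name is hypothetical.

```python
# Observations (rows) measured on X1 and X2, as on the slide.
observations = [[3.7, 11.5],
                [2.3, 10.2]]

# One row of coefficients per discriminant function (signs assumed positive).
coefficients = [[0.027, 0.97],   # function 1
                [0.92, 0.39]]    # function 2

def canonical_scores(observations, coefficients):
    """scores[k][i] is the score of observation k on discriminant function i."""
    return [[sum(a * x for a, x in zip(coef, obs)) for coef in coefficients]
            for obs in observations]

scores = canonical_scores(observations, coefficients)
for k, row in enumerate(scores, start=1):
    print(k, [round(z, 4) for z in row])
```

Plotting the first column against the second gives a canonical scores plot.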

Page 23: Lecture 9: Discriminant function analysis (DFA)

Canonical scores plots

Plots of canonical scores for each object.

The better the model, the greater the separation between clouds of points representing individual groups, e.g. Fisher’s famous irises.

[Figure: Canonical Scores Plot of FACTOR(2) against FACTOR(1), both axes running from -10 to 10, with points and a 95% confidence ellipse for each of SPECIES 1, 2, and 3.]

Canonical scores of group means

Species   Factor 1   Factor 2
1          7.608      0.215
2         -1.825     -0.728
3         -5.783      0.513

Page 24: Lecture 9: Discriminant function analysis (DFA)

Priors

In standard DFA, it is assumed that in the absence of any information, the a priori (prior) probability π_i of a given object belonging to one of i = 1,…,m groups is the same for all groups:

$$\pi_i = \frac{1}{m}$$

But, if each group is not equally likely, then priors should be adjusted so as to reflect this bias.

E.g. in species with biased sex-ratios, males and females should have unequal priors.
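One way to see what priors do is to combine them with the likelihood of an object under each group using Bayes' rule. This is a sketch under that assumption: the likelihood values are invented for illustration and the function names are hypothetical.

```python
def equal_priors(m):
    """Prior probability 1/m for each of m groups."""
    return [1.0 / m] * m

def posterior(likelihoods, priors):
    """Posterior group probabilities from likelihoods and priors (Bayes' rule)."""
    weighted = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical likelihoods of one object under groups 1 and 2.
likelihoods = [0.2, 0.3]

print(posterior(likelihoods, equal_priors(2)))   # favours group 2
print(posterior(likelihoods, [0.8, 0.2]))        # favours group 1
```

The same object is assigned to a different group once group 1 is taken to be four times as common as group 2.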

Page 25: Lecture 9: Discriminant function analysis (DFA)

Caveats: unequal priors

For a given set of discriminant functions, misclassification rates will usually depend on the priors…

…so that artificially low misclassification rates can be obtained simply by strategically adjusting the priors.

So, only adjust priors if you are confident that the true frequency of each group in the population is (reasonably) accurately estimated by the group frequencies in the sample.

Page 26: Lecture 9: Discriminant function analysis (DFA)

Significance testing

Question: which discriminant functions are statistically “significant”?

For testing significance of all r DFs for m groups based on p variables, calculate Bartlett’s V and compare to the χ² distribution with p(m-1) degrees of freedom:

$$V = \left[N - 1 - \tfrac{1}{2}(p + m)\right] \sum_{i=1}^{r} \ln(1 + \lambda_i)$$

where λ_i is the eigenvalue associated with the ith discriminant function.
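The overall test can be sketched in a few lines. The eigenvalues and sample sizes below are hypothetical, and the function name `bartlett_v` is made up for the example; only the statistic and its degrees of freedom are computed.

```python
import math

def bartlett_v(eigenvalues, n_objects, n_variables, n_groups):
    """Bartlett's V for all discriminant functions combined, with its df."""
    multiplier = n_objects - 1 - 0.5 * (n_variables + n_groups)
    v = multiplier * sum(math.log(1 + lam) for lam in eigenvalues)
    df = n_variables * (n_groups - 1)
    return v, df

# Hypothetical example: N = 50 objects, p = 4 variables, m = 3 groups.
v, df = bartlett_v([2.0, 0.5], n_objects=50, n_variables=4, n_groups=3)
print(round(v, 3), df)
```

V is then compared with a chi-square distribution on df degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(v, df)` gives the corresponding p-value.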

Page 27: Lecture 9: Discriminant function analysis (DFA)

Significance testing (cont’d)

Each DF is tested in a hierarchical fashion by first testing significance of all DFs combined.

If all DFs combined are not significant, then no DF is significant.

If all DFs combined are significant, then remove the first DF, recalculate V (= V1), and test.

Continue until the residual Vj is no longer significant at df = (p – j)(m – j - 1).

$$V = \left[N - 1 - \tfrac{1}{2}(p + m)\right] \sum_{i=1}^{r} \ln(1 + \lambda_i)$$

$$V_1 = \left[N - 1 - \tfrac{1}{2}(p + m - 1)\right] \sum_{i=2}^{r} \ln(1 + \lambda_i)$$

$$V_2 = \left[N - 1 - \tfrac{1}{2}(p + m - 2)\right] \sum_{i=3}^{r} \ln(1 + \lambda_i)$$

$$V_j = \left[N - 1 - \tfrac{1}{2}(p + m - j)\right] \sum_{i=j+1}^{r} \ln(1 + \lambda_i)$$

Page 28: Lecture 9: Discriminant function analysis (DFA)

Caveats/assumptions: tests of significance

Tests of significance assume that within-group covariance matrices are the same for all groups, and that within groups, observations have a multivariate normal distribution.

Tests of significance can be very misleading because the jth discriminant function in the population may not appear as the jth discriminant function in the sample due to sampling errors…

So be careful, especially if the sample is small!

Page 29: Lecture 9: Discriminant function analysis (DFA)

Caveats/assumptions: tests of significance

If stepwise (forward or backward) procedures are used, significance tests are biased because given enough variables, significant discriminant functions can be produced by chance alone.

In such cases, it is advisable to (1) test results with more standard analyses or (2) use randomization procedures whereby objects are randomly assigned to groups.

Page 30: Lecture 9: Discriminant function analysis (DFA)

Assessing classification accuracy I. Raw classification results

The derived discriminant functions are used to classify all objects in the sample, and a classification table is produced.

Classification accuracy is likely to be overestimated, since the data used to generate the DFs in the first place are themselves being classified.

Classification table (rows: actual group; columns: assigned group)

Group    1     2     Total
1        43    5     48
2        8     14    22
Total    51    19    70

Misclassification (G1) = 5/48

Misclassification (G2) = 8/22

Overall misclassification = 13/70
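The rates quoted above follow directly from the table. A minimal sketch, assuming rows hold the actual group and columns the assigned group (the function name is hypothetical):

```python
# Raw classification table from the slide: rows = actual group, columns = assigned group.
table = [[43, 5],
         [8, 14]]

def misclassification_rates(table):
    """Per-group and overall misclassification rates from a square table."""
    per_group = []
    errors = 0
    total = 0
    for i, row in enumerate(table):
        wrong = sum(row) - row[i]        # off-diagonal counts in this row
        per_group.append(wrong / sum(row))
        errors += wrong
        total += sum(row)
    return per_group, errors / total

per_group, overall = misclassification_rates(table)
print(per_group)   # 5/48 and 8/22
print(overall)     # 13/70
```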

Page 31: Lecture 9: Discriminant function analysis (DFA)

Assessing classification accuracy II. Jackknifed classification

Discriminant functions are derived using N – 1 objects, and the Nth object is then classified.

This procedure is repeated for all N objects, each time leaving a different one out, and a classification table is produced.

In general, jackknifed classification results are worse than raw classification results, but more reliable.

Classification table (rows: actual group; columns: assigned group)

Group    1     2     Total
1        41    7     48
2        9     13    22
Total    50    20    70

Misclassification (G1) = 7/48

Misclassification (G2) = 9/22

Overall misclassification = 16/70
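The leave-one-out loop can be sketched as follows. The classifier is a deliberately minimal linear discriminant rule (pooled within-group covariance, equal priors, assignment to the nearest group mean in Mahalanobis distance), chosen here for illustration; the data are simulated, and all function names are hypothetical.

```python
import numpy as np

def fit_lda(X, y):
    """Group means and inverse pooled within-group covariance matrix."""
    labels = np.unique(y)
    means = np.array([X[y == g].mean(axis=0) for g in labels])
    W = sum((X[y == g] - means[k]).T @ (X[y == g] - means[k])
            for k, g in enumerate(labels))
    pooled_inv = np.linalg.inv(W / (len(X) - len(labels)))
    return labels, means, pooled_inv

def predict_lda(model, x):
    """Assign x to the group with the nearest mean (Mahalanobis distance)."""
    labels, means, pooled_inv = model
    diffs = means - x
    d2 = np.einsum("ij,jk,ik->i", diffs, pooled_inv, diffs)
    return labels[np.argmin(d2)]

def jackknifed_misclassification(X, y):
    """Leave each object out in turn, refit, and classify the omitted object."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    errors = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = fit_lda(X[keep], y[keep])
        errors += int(predict_lda(model, X[i]) != y[i])
    return errors / len(X)

# Simulated, well-separated groups (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(15, 2)),
               rng.normal(6.0, 1.0, size=(15, 2))])
y = np.array([1] * 15 + [2] * 15)

print(jackknifed_misclassification(X, y))
```

Because each object is classified by functions fitted without it, the resulting rate is not inflated in the way the raw (resubstitution) rate is.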

Page 32: Lecture 9: Discriminant function analysis (DFA)

Assessing classification accuracy III. Assessing classification accuracy III. Data splittingData splittingAssessing classification accuracy III. Assessing classification accuracy III. Data splittingData splitting

Use a randomly selected 2/3 of the sample data to generate the discriminant functions (learning set).

Use the derived discriminant functions to classify the other 1/3 (test set) and produce a classification table.

In general, data-splitting classification results are worse than both raw and jackknifed classification results, but more reliable.

Data-splitting classification table (rows: actual group; columns: predicted group)

Group     1     2   Total
1        40     8      48
2         9    13      22
Total    49    21      70

Misclassification (G1) = 8/48
Misclassification (G2) = 9/22
Overall misclassification = 17/70
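The random partition itself takes only a few lines. The sketch below assumes objects are identified by their index 0 to N - 1; the function name `split_indices` and the seed are illustrative.

```python
import random

def split_indices(n, learn_fraction=2 / 3, seed=None):
    """Randomly partition object indices 0..n-1 into a learning set and a test set."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_learn = round(learn_fraction * n)
    return sorted(indices[:n_learn]), sorted(indices[n_learn:])

learn, test = split_indices(70, seed=42)
print(len(learn), len(test))  # 47 23
```

With N = 70 this leaves 47 objects for deriving the discriminant functions and 23 for testing them. Because the test set is small, a single split gives a noisy estimate of classification accuracy.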

Assessing classification accuracy IV. Bootstrapped data splitting

Use 2/3 of sample data (randomly sampled) to generate discriminant functions (learning set)

Use derived discriminant functions to classify other 1/3 (test set) and produce classification results.

Repeat a large number of times (e.g. 1000), each time sampling with replacement.

Generate classification statistics over bootstrapped samples, e.g. mean classification results, standard errors, etc.

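Bootstrapped classification table (rows: actual group; columns: predicted group; entries: mean over bootstrapped samples, standard error in parentheses)

Group     1             2             Total
1         41.2 (1.7)     6.8 (0.6)    48
2          9.3 (0.5)    12.7 (1.1)    22
Total     51            19            70

Misclassification (G1) = 14.2%
Misclassification (G2) = 42.3%
Overall misclassification = 23.0%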
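A minimal sketch of the repeated procedure follows. It rests on one reading of "sampling with replacement": each repetition draws a fresh 2/3 learning set from the full sample, so an object can be in the learning set in one repetition and in the test set in another. The data are hypothetical, the classifier is a plain linear discriminant with equal priors, and the names `fit_lda`, `classify` and `repeated_split_error` are illustrative.

```python
import numpy as np

def fit_lda(X, y):
    """Linear classification functions, equal priors, pooled covariance matrix."""
    labels = np.unique(y)
    means = np.array([X[y == g].mean(axis=0) for g in labels])
    resid = X - means[np.searchsorted(labels, y)]
    pooled = resid.T @ resid / (len(y) - len(labels))
    weights = np.linalg.solve(pooled, means.T).T
    offsets = -0.5 * np.sum(weights * means, axis=1)
    return labels, weights, offsets

def classify(model, X):
    labels, weights, offsets = model
    return labels[np.argmax(X @ weights.T + offsets, axis=1)]

def repeated_split_error(X, y, n_repeats=1000, learn_fraction=2 / 3, seed=0):
    """Test-set misclassification rate for each of n_repeats random 2/3 : 1/3 splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_learn = int(round(learn_fraction * n))
    rates = np.empty(n_repeats)
    for r in range(n_repeats):
        order = rng.permutation(n)                      # fresh random split each time
        learn, test = order[:n_learn], order[n_learn:]
        predicted = classify(fit_lda(X[learn], y[learn]), X[test])
        rates[r] = np.mean(predicted != y[test])
    return rates

# Hypothetical data: 48 objects in group 1, 22 in group 2, two variables
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(48, 2)),
               rng.normal([2.0, 1.0], 1.0, size=(22, 2))])
y = np.repeat([1, 2], [48, 22])

rates = repeated_split_error(X, y)
print(f"mean misclassification = {rates.mean():.3f}, "
      f"standard error = {rates.std(ddof=1):.3f}")
```

The standard deviation of the rates across repetitions serves as the standard error of the estimate, which a single data split cannot provide.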

Interpreting discriminant functions

Examine standardized coefficients (coefficients of discriminant functions based on standardized values)

For interpretation, use variables with large absolute standardized coefficients.

Examine the discriminant-variable correlations.

For interpretation, use variables with high correlations with important discriminant functions.
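Both diagnostics can be computed from the pooled within-group covariance matrix. The sketch below uses hypothetical two-group data with three variables on deliberately different scales, and Fisher's two-group discriminant function, which matches a canonical discriminant function only up to a scaling constant. All variable names are illustrative.

```python
import numpy as np

# Hypothetical data: two groups of 40 objects, three variables on different scales
rng = np.random.default_rng(2)
scales = np.array([1.0, 10.0, 0.1])
X = np.vstack([rng.normal([0.0, 0.0, 0.0], 1.0, size=(40, 3)),
               rng.normal([1.5, 0.5, 0.0], 1.0, size=(40, 3))]) * scales
y = np.repeat([1, 2], 40)

# Deviations of every object from its own group mean
means = np.array([X[y == g].mean(axis=0) for g in (1, 2)])
resid = X - means[y - 1]
pooled = resid.T @ resid / (len(y) - 2)       # pooled within-group covariance

# Raw coefficients of the single discriminant function (two groups)
raw = np.linalg.solve(pooled, means[1] - means[0])

# Standardized coefficients: raw coefficient times pooled within-group SD
standardized = raw * np.sqrt(np.diag(pooled))

# Discriminant-variable correlations, pooled within groups
score_resid = resid @ raw
structure = (resid.T @ score_resid) / np.sqrt(
    np.sum(resid ** 2, axis=0) * np.sum(score_resid ** 2))

for j in range(X.shape[1]):
    print(f"variable {j + 1}: raw = {raw[j]: .3f}, "
          f"standardized = {standardized[j]: .3f}, r = {structure[j]: .3f}")
```

Raw coefficients depend on the units of each variable, so they cannot be compared across variables. The standardized coefficients and the correlations can.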

Example: Fisher’s famous irises

Data: four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).

Problem: find the “best” set of DFs.

[Figure: scatterplot matrix of SEPALLEN, SEPALWID, PETALLEN and PETALWID, with points coded by SPECIES (1, 2, 3)]

Example: Fisher’s famous irises: between-groups F-matrix

Matrix entries are F-values from one-way MANOVA comparing the means of each pair of groups, and can be considered measures of the distance between group centroids.

Do not use associated probabilities to determine “significance” unless you correct for multiple tests.

Between-groups F-matrix

Species        1        2      3
1            0.0
2          550.2      0.0
3         1098.3    105.3    0.0

[Figure: canonical scores plot, FACTOR(1) against FACTOR(2), both axes from -10 to 10, points coded by SPECIES (1, 2, 3)]
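One simple way to correct for multiple tests is a Bonferroni adjustment, used here purely as an illustration because the slides do not name a specific correction. The family-wise error rate of 0.05 is likewise an assumed value.

```python
from itertools import combinations

species = [1, 2, 3]
alpha = 0.05  # assumed family-wise error rate

# Every off-diagonal entry of the F-matrix is one pairwise comparison
pairs = list(combinations(species, 2))
per_test_alpha = alpha / len(pairs)

print(pairs)                     # [(1, 2), (1, 3), (2, 3)]
print(round(per_test_alpha, 4))  # 0.0167
```

With three species there are three pairwise comparisons, so each probability would be judged against roughly 0.0167 rather than 0.05.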

Example: Fisher’s famous irises: canonical discriminant functions

Four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).

Canonical discriminant functions

Variable        1        2
Constant    2.105   -6.661
SEPALLEN    0.829    0.024
SEPALWID    1.534    2.165
PETALLEN   -2.201   -0.932
PETALWID   -2.810    2.839

Note: discriminant functions are derived using equal priors.
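To see how the coefficients are used, the sketch below computes the two canonical scores of a single flower as the constant plus the coefficient-weighted sum of its measurements. The flower's measurements are hypothetical and assumed to be in centimetres; the function name `canonical_scores` is illustrative.

```python
# Coefficients copied from the canonical discriminant functions table
constants = (2.105, -6.661)
coefficients = {
    "SEPALLEN": (0.829, 0.024),
    "SEPALWID": (1.534, 2.165),
    "PETALLEN": (-2.201, -0.932),
    "PETALWID": (-2.810, 2.839),
}

def canonical_scores(flower):
    """Return the scores of one flower on discriminant functions 1 and 2."""
    return tuple(
        constants[k] + sum(coefficients[name][k] * value
                           for name, value in flower.items())
        for k in range(2)
    )

# Hypothetical flower; measurements assumed to be in cm
flower = {"SEPALLEN": 5.1, "SEPALWID": 3.5, "PETALLEN": 1.4, "PETALWID": 0.2}
score1, score2 = canonical_scores(flower)
print(f"{score1:.2f} {score2:.2f}")  # 8.06 0.30
```

The pair of scores gives the flower's coordinates on a canonical scores plot, with function 1 on the horizontal axis and function 2 on the vertical axis.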

Example: Fisher’s famous irises: standardized canonical discriminant functions

Four variables (sepal length, sepal width, petal length, petal width), 3 species, N = 150 (50 for each species).

Standardized canonical discriminant functions

Variable        1        2
SEPALLEN    0.427    0.012
SEPALWID    0.521    0.735
PETALLEN   -0.942   -0.401
PETALWID   -0.575    0.581

Note: standardized canonical discriminant functions are based on standardized values.

Eigenvalues give the amount of among-group difference captured by a particular discriminant function; the cumulative proportion of dispersion is the corresponding running share of the total over functions.

Parameter                              Discriminant function
                                            1        2
Eigenvalues                            32.192    0.285
Canonical correlation                   0.985    0.471
Cumulative proportion of dispersion     0.991    1.000

Canonical correlation is the correlation between a given canonical variate (DF) and the set of two dummy variables that code membership in the three groups.
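The three rows of the table are tied together by two standard relations: the canonical correlation of a function with eigenvalue λ is sqrt(λ / (1 + λ)), and the cumulative proportion of dispersion is the running sum of eigenvalues divided by their total. A short check, using only the eigenvalues from the table:

```python
import math

eigenvalues = [32.192, 0.285]  # from the table

total = sum(eigenvalues)
running = 0.0
results = []
for k, eig in enumerate(eigenvalues, start=1):
    running += eig
    canonical_r = math.sqrt(eig / (1.0 + eig))
    cumulative = running / total
    results.append((canonical_r, cumulative))
    print(f"DF {k}: canonical correlation = {canonical_r:.3f}, "
          f"cumulative proportion = {cumulative:.3f}")
```

This reproduces the tabled values (0.985 and 0.471; 0.991 and 1.000) and shows that the first function carries about 99% of the among-group dispersion.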

[Figure: canonical scores plot, FACTOR(1) against FACTOR(2), both axes from -10 to 10, points coded by SPECIES (1, 2, 3)]

Fisher’s irises: raw and jackknifed classification results

In this case, results are identical (a relatively rare occurrence!)

Classification table, raw and jackknifed (rows: actual species; columns: predicted species)

Species     1     2     3   % correct
1          50     0     0         100
2           0    48     2          96
3           0     1    49          98
Total      50    49    51          98

Discriminant function analysis: caveats and notes

Unless the ratio of number of objects/number of variables is large (> 20), standardized coefficients and correlations are unstable.

DFA is unaffected by differences among variables in scale, so standardization is not required (unlike PCA, FA, etc.)

Linear DFA is quite sensitive to the assumption of equality of covariance matrices among groups. If this assumption is violated, use quadratic classification.

However, quadratic DFA is more unstable when N is small and normality does not hold.
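The scale-invariance note can be checked numerically. The sketch below fits a two-group linear discriminant function (equal priors) to hypothetical data, re-expresses one variable in different units, and compares the resulting classifications. The data and the function name `ldf_predict` are illustrative assumptions.

```python
import numpy as np

def ldf_predict(X, y):
    """Fit a two-group linear discriminant function (equal priors) and classify X itself."""
    m1, m2 = X[y == 1].mean(axis=0), X[y == 2].mean(axis=0)
    resid = np.where((y == 1)[:, None], X - m1, X - m2)   # deviations from own-group mean
    pooled = resid.T @ resid / (len(y) - 2)               # pooled within-group covariance
    w = np.linalg.solve(pooled, m1 - m2)
    cutoff = w @ (m1 + m2) / 2.0                          # midpoint between group centroids
    return np.where(X @ w > cutoff, 1, 2)

# Hypothetical data: two groups of 30 objects, two variables
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(30, 2)),
               rng.normal([1.5, 1.0], 1.0, size=(30, 2))])
y = np.repeat([1, 2], 30)

# Re-express the first variable in different units (e.g. cm -> mm)
X_rescaled = X * np.array([10.0, 1.0])

same = np.array_equal(ldf_predict(X, y), ldf_predict(X_rescaled, y))
print("identical classifications after rescaling:", same)
```

Rescaling a variable changes its raw coefficient by the inverse factor, so the discriminant scores and the classifications stay the same. Principal components computed on a covariance matrix would change under the same rescaling.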
