Lecture 9: ANOVA tables, F-tests
BMTRY 701 Biostatistical Methods II

Page 1: Lecture 9: ANOVA tables F-tests

Lecture 9: ANOVA tables, F-tests

BMTRY 701 Biostatistical Methods II

Page 2: Lecture 9: ANOVA tables F-tests

ANOVA

Analysis of Variance
Similar in derivation to the ANOVA that generalizes the two-sample t-test
Partitions the variance into several parts:
• that due to the ‘model’: SSR
• that due to ‘error’: SSE

The sum of the two parts is the total sum of squares: SST

Page 3: Lecture 9: ANOVA tables F-tests

Total Deviations:

[Figure: scatterplot of data$logLOS vs. data$BEDS; vertical segments show the total deviations Yi - Ȳ]

Page 4: Lecture 9: ANOVA tables F-tests

Regression Deviations:

[Figure: scatterplot of data$logLOS vs. data$BEDS with the fitted line; vertical segments show the regression deviations Ŷi - Ȳ]

Page 5: Lecture 9: ANOVA tables F-tests

Error Deviations:

[Figure: scatterplot of data$logLOS vs. data$BEDS with the fitted line; vertical segments show the error deviations Yi - Ŷi]

Page 6: Lecture 9: ANOVA tables F-tests

Definitions

SST = Σi (Yi - Ȳ)²
SSR = Σi (Ŷi - Ȳ)²
SSE = Σi (Yi - Ŷi)²

SST = SSR + SSE

and each total deviation splits into a regression piece and an error piece:

Yi - Ȳ = (Ŷi - Ȳ) + (Yi - Ŷi)

Page 7: Lecture 9: ANOVA tables F-tests

Example: logLOS ~ BEDS

> ybar <- mean(data$logLOS)
> yhati <- reg$fitted.values
> sst <- sum((data$logLOS - ybar)^2)
> ssr <- sum((yhati - ybar)^2)
> sse <- sum((data$logLOS - yhati)^2)
> sst
[1] 3.547454
> ssr
[1] 0.6401715
> sse
[1] 2.907282
> sse+ssr
[1] 3.547454

Page 8: Lecture 9: ANOVA tables F-tests

Degrees of Freedom

Degrees of freedom for SST: n - 1
• one df is lost because it is used to estimate the mean of Y

Degrees of freedom for SSR: 1
• only one df because all fitted values are based on the same fitted regression line

Degrees of freedom for SSE: n - 2
• two df are lost in estimating the regression line (slope and intercept)

Page 9: Lecture 9: ANOVA tables F-tests

Mean Squares

A “scaled” version of a sum of squares: Mean Square = SS/df

MSR = SSR/1
MSE = SSE/(n - 2)

Notes:
• mean squares are not additive! That is, MSR + MSE ≠ SST/(n - 1)
• MSE is the same estimate of σ² we saw previously
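A minimal sketch continuing the logLOS ~ BEDS example, assuming the ssr and sse objects computed on the earlier slide and the same data frame:

# mean squares for the logLOS ~ BEDS example (sketch; assumes ssr, sse from above)
n   <- nrow(data)     # 113 hospitals, so n - 2 = 111
msr <- ssr / 1        # 0.6401715
mse <- sse / (n - 2)  # 2.907282 / 111 = 0.0262 (approximately)

These values match the Mean Sq column of the anova(reg) output shown later in the lecture.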

Page 10: Lecture 9: ANOVA tables F-tests

Standard ANOVA Table

              SS     df     MS
Regression    SSR    1      MSR
Error         SSE    n-2    MSE
Total         SST    n-1

Page 11: Lecture 9: ANOVA tables F-tests

ANOVA for logLOS ~ BEDS

> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619

Page 12: Lecture 9: ANOVA tables F-tests

Inference?

What is of interest, and how do we interpret it? We’d like to know if BEDS is related to logLOS. How do we do that using the ANOVA table?

We need to know the expected values of MSR and MSE:

E(MSE) = σ²
E(MSR) = σ² + β1² Σi (Xi - X̄)²

Page 13: Lecture 9: ANOVA tables F-tests

Implications

The mean of the sampling distribution of MSE is σ² regardless of whether or not β1 = 0

If β1= 0, E(MSE) = E(MSR)

If β1≠ 0, E(MSE) < E(MSR)

To test significance of β1, we can test if MSR and MSE are of the same magnitude.

Recall:
E(MSE) = σ²
E(MSR) = σ² + β1² Σi (Xi - X̄)²
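One way to see these two expectations is a small simulation. The sketch below uses made-up x values, σ = 1, and arbitrary β1 values (none of these numbers come from the lecture data); it refits the regression many times and averages MSR and MSE across replicates:

# simulation sketch: average MSR and MSE under beta1 = 0 and beta1 != 0
set.seed(701)
sim.ms <- function(beta1, n = 113, sigma = 1, nrep = 2000) {
  x <- runif(n, 0, 800)                    # made-up covariate values, held fixed
  out <- replicate(nrep, {
    y <- 2.5 + beta1 * x + rnorm(n, sd = sigma)
    anova(lm(y ~ x))[, "Mean Sq"]          # c(MSR, MSE)
  })
  rowMeans(out)                            # average MSR and MSE over replicates
}
sim.ms(beta1 = 0)      # both averages sit near sigma^2 = 1
sim.ms(beta1 = 0.001)  # average MSR is inflated; average MSE stays near 1

With β1 = 0 both averages are near σ²; with β1 ≠ 0 the average MSR is inflated by roughly β1² Σ(xi - x̄)² while the average MSE stays near σ².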

Page 14: Lecture 9: ANOVA tables F-tests

F-test

Derived naturally from the arguments just made

Hypotheses:
• H0: β1 = 0
• H1: β1 ≠ 0

Test statistic: F* = MSR/MSE

Based on the earlier argument, we expect F* > 1 if H1 is true.

This implies a one-sided test.

Page 15: Lecture 9: ANOVA tables F-tests

F-test

The distribution of F* under the null has two sets of degrees of freedom (df)
• numerator degrees of freedom
• denominator degrees of freedom

These correspond to the df shown in the ANOVA table
• numerator df = 1
• denominator df = n - 2

The test is based on

F* = MSR/MSE ~ F(1, n - 2)

Page 16: Lecture 9: ANOVA tables F-tests

Implementing the F-test

The decision rule

If F* > F(1-α; 1, n-2), then reject H0
If F* ≤ F(1-α; 1, n-2), then fail to reject H0
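A sketch of this decision rule in R, assuming the simple regression fit reg from the earlier slides:

# decision rule at alpha = 0.05 (sketch; assumes the fitted model 'reg')
a      <- anova(reg)
Fstar  <- a[1, "F value"]        # F* = MSR/MSE
df.den <- a["Residuals", "Df"]   # n - 2
Fcrit  <- qf(0.95, 1, df.den)    # F(1-alpha; 1, n-2)
Fstar > Fcrit                    # TRUE => reject H0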

Page 17: Lecture 9: ANOVA tables F-tests

F-distributions

[Figure: densities of the F(1,10), F(1,1000), F(5,10), and F(5,1000) distributions, plotted for x from 0 to 6]
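A sketch that reproduces this figure using the built-in F density df(); the line types and legend placement are arbitrary choices:

# densities of several F distributions on (0, 6)
# start just above 0 because the F(1, .) density diverges at 0
curve(df(x, 1, 10),   from = 0.01, to = 6, ylim = c(0, 0.8),
      xlab = "x", ylab = "density", lty = 1)
curve(df(x, 1, 1000), add = TRUE, lty = 2)
curve(df(x, 5, 10),   add = TRUE, lty = 3)
curve(df(x, 5, 1000), add = TRUE, lty = 4)
legend("topright", c("F(1,10)", "F(1,1000)", "F(5,10)", "F(5,1000)"), lty = 1:4)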

Page 18: Lecture 9: ANOVA tables F-tests

ANOVA for logLOS ~ BEDS

> anova(reg)
Analysis of Variance Table

Response: logLOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
Residuals 111 2.90728 0.02619

> qf(0.95, 1, 111)
[1] 3.926607

> 1-pf(24.44, 1, 111)
[1] 2.739016e-06

Page 19: Lecture 9: ANOVA tables F-tests

More interesting: MLR

You can test that several coefficients are zero at the same time

Otherwise, the F-test gives the same result as a t-test

That is: for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result (a sketch follows below):
• H0: β1 = 0
• H1: β1 ≠ 0
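A quick check of this equivalence for the logLOS ~ BEDS fit (sketch; assumes the reg object from earlier): the squared t statistic for BEDS equals the F statistic, and the two p-values agree.

# t-test vs. F-test for a single covariate: t^2 = F*
tval <- summary(reg)$coefficients["BEDS", "t value"]
tval^2                                          # approximately 24.44, the F value
anova(reg)["BEDS", "F value"]                   # same number
summary(reg)$coefficients["BEDS", "Pr(>|t|)"]   # same p-value as Pr(>F)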

Page 20: Lecture 9: ANOVA tables F-tests

general F testing approach

The previous test seems simple
It is in this case, but the approach can be generalized to be more useful

Imagine a more general test:
• H0: small model
• Ha: large model

Constraint: the small model must be ‘nested’ in the large model

That is, the small model must be a ‘subset’ of the large model
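The general linear F-test compares the residual sums of squares of the two fits. Writing SSE(S) and dfS for the small model and SSE(L) and dfL for the large model, the statistic is F* = [(SSE(S) - SSE(L)) / (dfS - dfL)] / [SSE(L) / dfL], which is compared to an F(dfS - dfL, dfL) distribution under H0 (the small model). This is what anova() computes when given two nested fits, as on the following slides.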

Page 21: Lecture 9: ANOVA tables F-tests

Example of ‘nested’ models

Model 1: LOSi = β0 + β1·INFRISKi + β2·MSi + β3·NURSEi + β4·NURSEi² + ei

Model 2: LOSi = β0 + β1·INFRISKi + β3·NURSEi + β4·NURSEi² + ei

Model 3: LOSi = β0 + β1·INFRISKi + β2·MSi + ei

Models 2 and 3 are nested in Model 1
Model 2 is not nested in Model 3
Model 3 is not nested in Model 2

Page 22: Lecture 9: ANOVA tables F-tests

Testing: Models must be nested!

To test Model 1 vs. Model 2
• we are testing that β2 = 0

• H0: β2 = 0 vs. Ha: β2 ≠ 0

• If β2 = 0, then we conclude that the smaller Model 2 is preferred over Model 1

• That is, if we fail to reject the null hypothesis

Model 1: LOSi = β0 + β1·INFRISKi + β2·MSi + β3·NURSEi + β4·NURSEi² + ei

Model 2: LOSi = β0 + β1·INFRISKi + β3·NURSEi + β4·NURSEi² + ei

Page 23: Lecture 9: ANOVA tables F-tests

R

reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data=data)
reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data=data)
reg3 <- lm(LOS ~ INFRISK + ms, data=data)

> anova(reg1)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.4043 8.115e-10 ***
ms          1  12.897  12.897  5.0288   0.02697 *
NURSE       1   1.097   1.097  0.4277   0.51449
nurse2      1   1.789   1.789  0.6976   0.40543
Residuals 108 276.981   2.565
---

Page 24: Lecture 9: ANOVA tables F-tests

R

> anova(reg2)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 44.8865 9.507e-10 ***
NURSE       1   8.212   8.212  3.1653     0.078 .
nurse2      1   1.782   1.782  0.6870     0.409
Residuals 109 282.771   2.594
---

> anova(reg1, reg2)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + NURSE + nurse2
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    109 282.771 -1    -5.789 2.2574 0.1359
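The F value reported by anova(reg1, reg2) is just the general F statistic applied to the two residual sums of squares above; a quick hand check:

# partial F-test of ms by hand, using the RSS values from the anova() output
Fstar <- ((282.771 - 276.981) / 1) / (276.981 / 108)
Fstar                    # approximately 2.257
1 - pf(Fstar, 1, 108)    # approximately 0.136

Note that this p-value matches the t-test p-value for ms in summary(reg1) on the next slide, since t² = F when a single coefficient is tested.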

Page 25: Lecture 9: ANOVA tables F-tests

R

> summary(reg1)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.355e+00  5.266e-01  12.068  < 2e-16 ***
INFRISK      6.289e-01  1.339e-01   4.696 7.86e-06 ***
ms           7.829e-01  5.211e-01   1.502    0.136
NURSE        4.136e-03  4.093e-03   1.010    0.315
nurse2      -5.676e-06  6.796e-06  -0.835    0.405
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.601 on 108 degrees of freedom
Multiple R-squared: 0.3231, Adjusted R-squared: 0.2981
F-statistic: 12.89 on 4 and 108 DF, p-value: 1.298e-08

>

Page 26: Lecture 9: ANOVA tables F-tests

Testing more than two covariates

To test Model 1 vs. Model 3
• we are testing that β3 = 0 AND β4 = 0

• H0: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0

• If β3 = β4 = 0, then we conclude that the smaller Model 3 is preferred over Model 1

• That is, if we fail to reject the null hypothesis

Model 1: LOSi = β0 + β1·INFRISKi + β2·MSi + β3·NURSEi + β4·NURSEi² + ei

Model 3: LOSi = β0 + β1·INFRISKi + β2·MSi + ei

Page 27: Lecture 9: ANOVA tables F-tests

R

> anova(reg3)
Analysis of Variance Table

Response: LOS
           Df  Sum Sq Mean Sq F value    Pr(>F)
INFRISK     1 116.446 116.446 45.7683 6.724e-10 ***
ms          1  12.897  12.897  5.0691   0.02634 *
Residuals 110 279.867   2.544
---

> anova(reg1, reg3)
Analysis of Variance Table

Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
Model 2: LOS ~ INFRISK + ms
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    108 276.981
2    110 279.867 -2    -2.886 0.5627 0.5713

Page 28: Lecture 9: ANOVA tables F-tests

R

> summary(reg3)

Call:
lm(formula = LOS ~ INFRISK + ms, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.9037 -0.8739 -0.1142  0.5965  8.5568

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.4547     0.5146  12.542   <2e-16 ***
INFRISK       0.6998     0.1156   6.054    2e-08 ***
ms            0.9717     0.4316   2.251   0.0263 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.595 on 110 degrees of freedom
Multiple R-squared: 0.3161, Adjusted R-squared: 0.3036
F-statistic: 25.42 on 2 and 110 DF, p-value: 8.42e-10

Page 29: Lecture 9: ANOVA tables F-tests

Testing multiple coefficients simultaneously

Region is a ‘factor’ variable with 4 categories

LOSi = β0 + β1·I(Ri = 2) + β2·I(Ri = 3) + β3·I(Ri = 4) + ei

where Ri is the region of hospital i.
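A sketch of how this model could be fit and tested in R; the column name REGION and the choice of reference category are assumptions, not taken from the lecture code:

# sketch: test all three region coefficients at once (assumes data$REGION holds codes 1-4)
data$region <- factor(data$REGION)       # region 1 becomes the reference category
regR <- lm(LOS ~ region, data = data)    # estimates beta1, beta2, beta3 for regions 2-4

reg0 <- lm(LOS ~ 1, data = data)         # small model: intercept only
anova(reg0, regR)                        # general F-test with 3 numerator df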