23-1

Analysis of Covariance (Chapter 16)

• A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable, X, sometimes called a covariate.

• The procedure, ANCOVA, is a combination of ANOVA with regression.

23-2

Example: Calf Weight Gain

• An animal scientist wishes to examine the impact of a pair of new dietary supplements on calf weight gain (response).

• Three treatments are defined: standard diet, standard diet + supplement Q, and standard diet + supplement R.

• All new calves from a large herd are available for use as study units. She selects 30 calves for study. Calves are assigned to the three diets at random (completely randomized design).

• Initial weights are recorded, then calves are placed on the diets. At the end of four weeks the final weight is taken and weight gain is computed.

• Simple analysis of variance and the associated multiple comparisons procedures indicate no significant difference in weight gain between the two supplemented diets, but large differences between the supplemented diets and the standard diet.

• Is this the end of the story? …

23-3

ANOVA Results

[Figure: dot plots of average weight gain (response, g/day) for the three groups: Standard Diet, + Supplement Q, + Supplement R.]

Simple ANOVA of a one-way classification would suggest no difference between Supplements Q and R but both different from Standard diet.

23-4

Initial Weights

[Figure: dot plots of initial weight for the three groups: Standard Diet, + Supplement Q, + Supplement R.]

Plotting of the initial weights by group shows that the groups were not equal when it came to initial weights.

23-5

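Weight Gain to Initial Weight

[Figure: growth curve for the Standard Diet, weight (kg) against age, marking two initial weights wi1 and wi2, the final weights wF1 and wF2, and the resulting gains wgain1 and wgain2.]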

If animals come into the study at different ages, they have different initial weights and are at different points on the growth curve. Expected weight gains will be different depending on age at entry into study.

23-6

Regression of Initial Weight to Weight Gain

[Figure: weight gain (g/day), Y, against initial weight, x, marking the initial weights wi1 and wi2 and the corresponding gains wgain1 and wgain2.]

If we disregard the age of the animal but instead focus on the initial weight, we see that there is a linear relationship between initial weight and the weight gain expected.

23-7

Covariates

Initial weight in the previous example is a covariable or covariate.

A covariate is a disturbing variable (confounder), that is, it is known to have an effect on the response. Usually, the covariate can be measured but often we may not be able to control its effect through blocking.

In the EXAMPLE, had the animal scientist known that the calves were very variable in initial weight (or age), she could have:

• Created blocks of 3 or 6 animals of similar weight, and randomized treatments to calves within these blocks.

• This would have entailed some cost in terms of time spent sorting the calves and then keeping track of block membership over the life of the study.

• It was much easier to simply record the calf initial weight and then use analysis of covariance for the final analysis.

• In many cases, due to the continuous nature of the covariate, blocking is just not feasible.
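The blocking alternative described in the first bullet can be sketched in R. This is only an illustration under stated assumptions: the slides give no calf data, so the 30 initial weights are simulated, and the names `calves`, `init_wt`, `block`, and `diet` are made up for the example.

```r
# Hypothetical illustration: the 30 initial weights are simulated, not real data.
set.seed(1)
calves <- data.frame(id = 1:30,
                     init_wt = round(rnorm(30, mean = 40, sd = 5), 1))

# Sort by initial weight and form 10 blocks of 3 similar-weight calves.
calves <- calves[order(calves$init_wt), ]
calves$block <- rep(1:10, each = 3)

# Randomize the three diets separately within each block.
diets <- c("Standard", "Supplement Q", "Supplement R")
calves$diet <- unlist(lapply(1:10, function(b) sample(diets)))

# Each block should contain every diet exactly once.
table(calves$block, calves$diet)
```

The sorting and bookkeeping in this sketch is the extra cost the slide refers to; recording `init_wt` and using it as a covariate avoids it.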

23-8

Expectations under Ho

[Figure: expected weight gain (g/day), Y, against initial weight, x, with the average-weight animal marked on the x-axis.]

Under Ho: no treatment effects.

If all animals had come in with the same initial weight, all three treatments would produce the same weight gain.

23-9

Expectations under HA

[Figure: expected weight gain (g/day), Y, against initial weight, x, with one line each for Standard Diet (c), + Supplement Q (q), and + Supplement R (r). The expected gains WGs, WGQ, and WGR are marked at the average-weight animal.]

Under HA: significant treatment effects.

Different treatments produce different weight gains for animals of the same initial weight.

23-10

Different Initial Weights

[Figure: expected weight gain (g/day), Y, against initial weight, x. The Standard Diet (c), Supplement Q (q), and Supplement R (r) animals fall in different ranges of initial weight, giving different group gains WGs, WGQ, and WGR.]

Under Ho: no treatment effects.

If the average initial weights in the treatment groups differ, the observed weight gains will be different, even if treatments have no effect.
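This point can be checked with a small noise-free example in R. All numbers below are made up for illustration (they are not the calf data): weight gain is an exact linear function of initial weight, the same for every diet, but the three groups start at different weights.

```r
# Hypothetical, noise-free illustration: weight gain depends only on initial
# weight (no diet effect), but the groups start at different weights.
init_wt <- c(30, 32, 34, 36, 38,   # Standard
             40, 42, 44, 46, 48,   # Supplement Q
             50, 52, 54, 56, 58)   # Supplement R
diet <- factor(rep(c("Standard", "Q", "R"), each = 5),
               levels = c("Standard", "Q", "R"))
gain <- 200 + 10 * init_wt         # same line for every diet (made-up numbers)

# Unadjusted group means differ purely because of initial weight.
tapply(gain, diet, mean)

# With initial weight in the model, the diet coefficients are essentially zero.
fit <- lm(gain ~ diet + init_wt)
round(coef(fit), 6)
```

A one-way ANOVA on `gain` alone would attribute the differences in group means to diet, although by construction diet has no effect here.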

23-11

Observed Responses under HA

[Figure: observed weight gain (g/day), Y, against initial weight, x, for Standard Diet (c), + Supplement Q (q), and + Supplement R (r) animals, with the group gains WGs, WGQ, and WGR marked.]

Under HA: significant treatment effects.

Suppose now that different supplements actually do increase weight gain. This translates to animals in different treatment groups following different but parallel regression lines on initial weight.

What difference in weight gain is due to Initial weight and what is due to Treatment?

23-12

Observed Group Means

[Figure: scatter of weight gain (g/day), Y, against initial weight, x, for Standard Diet (c), + Supplement Q (q), and + Supplement R (r) animals, with the unadjusted treatment means $\bar{y}_c$, $\bar{y}_q$, and $\bar{y}_r$ marked.]

Simple one-way classification ANOVA (without accounting for initial weight) gives us the wrong answer!

23-13

Predicted Average Responses

[Figure: scatter of weight gain (g/day), Y, against initial weight, x, for Standard Diet (c), + Supplement Q (q), and + Supplement R (r) animals, with the adjusted treatment means $\hat{y}_c \mid X = \bar{x}$, $\hat{y}_q \mid X = \bar{x}$, and $\hat{y}_r \mid X = \bar{x}$ marked at the average initial weight $\bar{x}$.]

Adjusted treatment means: expected weight gain is computed for each treatment at the average initial weight, and comparisons are then made.
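For a model with one common slope across treatments, the adjustment can be written explicitly. The notation below is introduced here for illustration and is not on the original slide: $\bar{y}_i$ and $\bar{x}_i$ are the observed response and covariate means in treatment $i$, $\bar{x}$ is the overall covariate mean, and $\hat{\beta}$ is the estimated common slope.

```latex
\bar{y}_{i,\mathrm{adj}} \;=\; \bar{y}_i \;-\; \hat{\beta}\,\bigl(\bar{x}_i - \bar{x}\bigr)
```

With a positive slope, a group whose average initial weight is above the overall average has its mean adjusted downward, and a group that started below average has its mean adjusted upward.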

23-14

ANCOVA: Objectives

The objective of an analysis of covariance is to compare the treatment means after adjusting for differences among the treatment groups in their covariate levels.

The analysis proceeds by combining a regression model with an analysis of variance model.

23-15

Model

$E(y_{ij}) = \mu + \alpha_i + \beta x_{ij}$

The $\alpha_i$, i = 1,…,t, describe how each of the t treatments modifies the overall mean response. (The index j = 1,…,n runs over the n replicates for each treatment.)

The slope coefficient, $\beta$, is a measure of how the average response changes as the value of the covariate changes.

The analysis proceeds by fitting a linear regression model with dummy variables to code for the different treatment levels.
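As an illustration of the dummy-variable coding, the three-diet calf example could be written with the standard diet as the reference level. The indicator names $D_Q$, $D_R$ and the coefficients $\beta_0$, $\gamma_Q$, $\gamma_R$ are notation assumed here, and reference-level coding is only one of several equivalent choices.

```latex
E(y_{ij}) = \beta_0 + \gamma_Q D_Q + \gamma_R D_R + \beta x_{ij},
\qquad
D_Q = \begin{cases} 1 & \text{standard diet + supplement Q} \\ 0 & \text{otherwise} \end{cases}
\qquad
D_R = \begin{cases} 1 & \text{standard diet + supplement R} \\ 0 & \text{otherwise} \end{cases}
```

In this coding $\beta_0$ is the intercept for the standard diet, and $\gamma_Q$ and $\gamma_R$ are the vertical shifts of the supplement Q and supplement R lines relative to it.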

23-16

A Priori Assumptions

The covariate is related to the response, and can account for variation in the response.

Check with a scatterplot of Y vs. X.

The covariate is NOT related to the treatments. If X is related to the treatments, then the variance of the treatment differences is increased relative to that obtained from an ANOVA model without X, which results in a loss of precision.

The regression equation for each treatment is linear in the covariate.

Check with a scatterplot of Y vs. X, for each treatment. Non-linearity can be accommodated (e.g. polynomial terms, transforms), but analysis may be more complex.

The regression lines for the different treatments are parallel. This means there is only one slope in the Y vs. X plots. Non-parallel lines can be accommodated, but this complicates the analysis since differences in treatments will now depend on the value of X.

23-17

Example

Four different formulations of an industrial glue are being tested. The tensile strength (response) of the glue is known to be related to the thickness as applied. Five observations on strength (Y) in pounds, and thickness (X) in 0.01 inches are made for each formulation.

Here:

• There are t=4 treatments (formulations of glue).

• Covariate X is thickness of applied glue.

• Each treatment is replicated n=5 times at different values of X.

Formulation   Strength (lb)   Thickness (0.01 in)
1             46.5            13
1             45.9            14
1             49.8            12
1             46.1            12
1             44.3            14
2             48.7            12
2             49.0            10
2             50.1            11
2             48.5            12
2             45.2            14
3             46.3            15
3             47.1            14
3             48.9            11
3             48.2            11
3             50.3            10
4             44.7            16
4             43.0            15
4             51.0            10
4             48.1            12
4             46.8            11
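The a priori assumptions listed earlier can be examined on these data. The following is a minimal sketch in R, not part of the original slides: the data frame is typed in from the table, and the object names `common` and `separate` are chosen here for illustration.

```r
# Glue data as listed in the table.
glue <- data.frame(
  Formulation = factor(rep(1:4, each = 5)),
  Strength  = c(46.5, 45.9, 49.8, 46.1, 44.3,  48.7, 49.0, 50.1, 48.5, 45.2,
                46.3, 47.1, 48.9, 48.2, 50.3,  44.7, 43.0, 51.0, 48.1, 46.8),
  Thickness = c(13, 14, 12, 12, 14,  12, 10, 11, 12, 14,
                15, 14, 11, 11, 10,  16, 15, 10, 12, 11)
)

# Relationship and linearity: Strength vs. Thickness, one symbol per formulation.
plot(Strength ~ Thickness, data = glue, pch = as.integer(glue$Formulation))

# Is the covariate related to the treatments? One-way ANOVA of Thickness.
anova(lm(Thickness ~ Formulation, data = glue))

# Parallel lines: compare the common-slope model with a model that allows
# a separate slope for each formulation (the interaction term).
common   <- lm(Strength ~ Formulation + Thickness, data = glue)
separate <- lm(Strength ~ Formulation * Thickness, data = glue)
anova(common, separate)
```

A small F statistic in the last comparison gives no evidence against a single common slope; a large one would suggest that treatment differences depend on the value of X.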

23-18

Formulation Profiles

[Figure: Strength (Y), axis from 40.0 to 52.0, plotted against Thickness (X), with one profile per formulation: Form_1, Form_2, Form_3, Form_4.]

23-19

SAS Program

data glue;
  input Formulation Strength Thickness;
  datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;

proc glm;
  class formulation;
  model strength = thickness formulation / solution;
  lsmeans formulation / stderr pdiff;
run;

The basic model is a combination of regression and one-way classification.

23-20

Output: Use Type III SS to test significance of each variable

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4   66.31065753      16.57766438   10.17     0.0003
Error             15   24.44684247       1.62978950
Corrected Total   19   90.75750000

R-Square   Coeff Var   Root MSE   Strength Mean
0.730636   2.691897    1.276632   47.42500

Source        DF   Type I SS     Mean Square   F Value   Pr > F
Thickness      1   63.50120135   63.50120135   38.96     <.0001
Formulation    3    2.80945618    0.93648539    0.57     0.6405

Source        DF   Type III SS   Mean Square   F Value   Pr > F
Thickness      1   53.20115753   53.20115753   32.64     <.0001
Formulation    3    2.80945618    0.93648539    0.57     0.6405

Parameter       Estimate         Standard Error   t Value   Pr > |t|
Intercept       58.93698630 B    2.21321008       26.63     <.0001
Thickness       -0.95445205      0.16705494       -5.71     <.0001
Formulation 1   -0.00910959 B    0.80810401       -0.01     0.9912
Formulation 2    0.62554795 B    0.82451389        0.76     0.4598
Formulation 3    0.86732877 B    0.81361075        1.07     0.3033
Formulation 4    0.00000000 B    .                 .        .

Regression on thickness is significant. No significant formulation differences.

Divide each mean square by the MSE (the Error mean square, 1.62978950) to get its F value.
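As a worked check on the Type III table, each F value is the corresponding mean square divided by the error mean square. The figures below are taken from the output above, rounded to four decimals:

```latex
\mathrm{MSE} = \frac{24.4468}{15} \approx 1.6298,
\qquad
F_{\text{Thickness}} = \frac{53.2012}{1.6298} \approx 32.64,
\qquad
F_{\text{Formulation}} = \frac{0.9365}{1.6298} \approx 0.57
```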

23-21

Least Squares Means (adjusted Formulation means computed at the average value of Thickness [=12.45])

The GLM Procedure
Least Squares Means

Formulation   Strength LSMEAN   Standard Error   Pr > |t|   LSMEAN Number
1             47.0449486        0.5782732        <.0001     1
2             47.6796062        0.5811616        <.0001     2
3             47.9213870        0.5724527        <.0001     3
4             47.0540582        0.5739134        <.0001     4

Least Squares Means for effect Formulation
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Strength

i/j   1        2        3        4
1              0.4574   0.3011   0.9912
2     0.4574            0.7695   0.4598
3     0.3011   0.7695            0.3033
4     0.9912   0.4598   0.3033
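The least squares means can be reproduced outside SAS. The sketch below uses base R only and is an illustration rather than the method of the original slides: it predicts each formulation at the overall mean thickness, then repeats the calculation by hand as the group mean minus the slope times the group's deviation from the overall mean thickness. The names `at_mean`, `lsmeans`, and `by_hand` are chosen here.

```r
# Glue data as listed in the example.
glue <- data.frame(
  Formulation = factor(rep(1:4, each = 5)),
  Strength  = c(46.5, 45.9, 49.8, 46.1, 44.3,  48.7, 49.0, 50.1, 48.5, 45.2,
                46.3, 47.1, 48.9, 48.2, 50.3,  44.7, 43.0, 51.0, 48.1, 46.8),
  Thickness = c(13, 14, 12, 12, 14,  12, 10, 11, 12, 14,
                15, 14, 11, 11, 10,  16, 15, 10, 12, 11)
)
full.lm <- lm(Strength ~ Formulation + Thickness, data = glue)

# Predict each formulation at the overall mean thickness (12.45).
at_mean <- data.frame(Formulation = factor(1:4),
                      Thickness = mean(glue$Thickness))
lsmeans <- predict(full.lm, newdata = at_mean)

# Hand calculation: group mean minus slope times the group's deviation
# from the overall mean thickness.
ybar <- tapply(glue$Strength, glue$Formulation, mean)
xbar <- tapply(glue$Thickness, glue$Formulation, mean)
by_hand <- ybar - coef(full.lm)["Thickness"] * (xbar - mean(glue$Thickness))

round(cbind(lsmeans, by_hand), 4)
```

Both columns should agree with the Strength LSMEAN column in the SAS output.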

23-22

ANCOVA in Minitab

Worksheet columns: Formulation, Strength, Thickness (the same 20 observations as in the glue example).

Stat > ANOVA > General Linear Model …

> Responses: Strength

> Model: Formulation

> Covariates: Thickness

> Options: Adjusted (Type III) Sums of Squares

General Linear Model: Strength versus Formulation

Factor     Type    Levels   Values
Formulat   fixed   4        1 2 3 4

Source     DF   Seq SS   Adj SS   Adj MS   F       P
Thicknes    1   63.501   53.201   53.201   32.64   0.000
Formulat    3    2.809    2.809    0.936    0.57   0.640
Error      15   24.447   24.447    1.630
Total      19   90.758

Term        Coef      SE Coef   T       P
Constant    59.308    2.099     28.25   0.000
Thicknes    -0.9545   0.1671    -5.71   0.000
Formulat
  1         -0.3801   0.5029    -0.76   0.462
  2          0.2546   0.5062     0.50   0.622
  3          0.4964   0.4962     1.00   0.333
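The Minitab coefficients differ from the SAS estimates, although the fitted model and the F tests are the same. The Minitab values are consistent with sum-to-zero (effect) coding, in which each formulation coefficient is a deviation from the average of the four formulation effects. The R sketch below reproduces them under that assumption; `sum.lm` is a name chosen here.

```r
# Glue data as listed in the example.
glue <- data.frame(
  Formulation = factor(rep(1:4, each = 5)),
  Strength  = c(46.5, 45.9, 49.8, 46.1, 44.3,  48.7, 49.0, 50.1, 48.5, 45.2,
                46.3, 47.1, 48.9, 48.2, 50.3,  44.7, 43.0, 51.0, 48.1, 46.8),
  Thickness = c(13, 14, 12, 12, 14,  12, 10, 11, 12, 14,
                15, 14, 11, 11, 10,  16, 15, 10, 12, 11)
)

# Sum-to-zero coding: formulation effects are deviations from their average.
sum.lm <- lm(Strength ~ Thickness + Formulation, data = glue,
             contrasts = list(Formulation = "contr.sum"))
round(coef(sum.lm), 4)

# The effect for formulation 4 is not printed; it is minus the sum of the others.
-sum(coef(sum.lm)[c("Formulation1", "Formulation2", "Formulation3")])
```

The SAS estimates instead set the last level (Formulation 4) to zero, so each SAS coefficient is a difference from formulation 4 rather than from the average.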

23-23

Main Effects Plot - LS Means for Strength

[Figure: LS means of Strength (axis from 47.0 to 47.9) plotted against Formulation (1 to 4).]

Factor Plots… > Main Effects Plot > Formulation

23-24

ANCOVA in R

> glue <- read.table("glue.txt",header=TRUE)
> glue$Formulation <- as.factor(glue$Formulation)
> # fit linear models: full, thickness only, formulation only
> full.lm <- lm(Strength ~ Formulation + Thickness, data=glue)
> thick.lm <- lm(Strength ~ Thickness, data=glue)
> formu.lm <- lm(Strength ~ Formulation, data=glue)
>
> anova(thick.lm,full.lm)
Analysis of Variance Table

Model 1: Strength ~ Thickness
Model 2: Strength ~ Formulation + Thickness
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     18 27.2563
2     15 24.4468  3    2.8095 0.5746 0.6405

Test for Formulation differences

> anova(formu.lm,full.lm)
Analysis of Variance Table

Model 1: Strength ~ Formulation
Model 2: Strength ~ Formulation + Thickness
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     16 77.648
2     15 24.447  1    53.201 32.643 4.105e-05 ***

Test for significance of Thickness

23-25

R

Full model (can be refined by omitting formulation)

> summary(full.lm)
Call:
lm(formula = Strength ~ Formulation + Thickness, data = glue)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6380 -1.0398  0.1873  0.6966  2.3255

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   58.92788    2.24551  26.243 5.97e-14 ***
Formulation2   0.63466    0.83193   0.763    0.457
Formulation3   0.87644    0.81840   1.071    0.301
Formulation4   0.00911    0.80810   0.011    0.991
Thickness     -0.95445    0.16706  -5.713 4.11e-05 ***

Reduced model (formulation omitted)

> summary(thick.lm)
Call:
lm(formula = Strength ~ Thickness, data = glue)

Residuals:
    Min      1Q  Median      3Q     Max
-2.0813 -0.7324  0.1274  0.9090  1.9230

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  59.9294     1.9504  30.726  < 2e-16 ***
Thickness    -1.0044     0.1551  -6.476 4.32e-06 ***

Residual standard error: 1.231 on 18 degrees of freedom
Multiple R-Squared: 0.6997, Adjusted R-squared: 0.683
F-statistic: 41.94 on 1 and 18 DF, p-value: 4.317e-06

23-26

R

Plot lines for full model; these can all be replaced by a single line for the reduced model (blue).
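The plotting code is not shown on the slide. One way such a plot could be drawn in base R is sketched below; the line types, symbols, and legend are choices made here, and only the blue colour for the reduced model comes from the slide.

```r
# Glue data as listed in the example.
glue <- data.frame(
  Formulation = factor(rep(1:4, each = 5)),
  Strength  = c(46.5, 45.9, 49.8, 46.1, 44.3,  48.7, 49.0, 50.1, 48.5, 45.2,
                46.3, 47.1, 48.9, 48.2, 50.3,  44.7, 43.0, 51.0, 48.1, 46.8),
  Thickness = c(13, 14, 12, 12, 14,  12, 10, 11, 12, 14,
                15, 14, 11, 11, 10,  16, 15, 10, 12, 11)
)
full.lm  <- lm(Strength ~ Formulation + Thickness, data = glue)
thick.lm <- lm(Strength ~ Thickness, data = glue)

plot(Strength ~ Thickness, data = glue, pch = as.integer(glue$Formulation),
     xlab = "Thickness (X)", ylab = "Strength (Y)")

# Full model: one intercept per formulation, common slope.
b <- coef(full.lm)
intercepts <- b["(Intercept)"] +
  c(0, b["Formulation2"], b["Formulation3"], b["Formulation4"])
for (k in 1:4) abline(a = intercepts[k], b = b["Thickness"], lty = k)

# Reduced model: a single line, drawn in blue.
abline(thick.lm, col = "blue", lwd = 2)
legend("topright", legend = paste("Formulation", 1:4), pch = 1:4, lty = 1:4)
```

The four full-model lines are parallel by construction and lie close together, which is the visual counterpart of the non-significant Formulation test.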

23-27

R

Check fit of reduced model (with just thickness).
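The diagnostic plots themselves are not reproduced here. A minimal sketch of how the fit might be checked in base R follows; the choice of diagnostics (residuals vs. fitted values, a normal Q-Q plot, and residual means by formulation) is an assumption, not necessarily what the original slide showed.

```r
# Glue data as listed in the example.
glue <- data.frame(
  Formulation = factor(rep(1:4, each = 5)),
  Strength  = c(46.5, 45.9, 49.8, 46.1, 44.3,  48.7, 49.0, 50.1, 48.5, 45.2,
                46.3, 47.1, 48.9, 48.2, 50.3,  44.7, 43.0, 51.0, 48.1, 46.8),
  Thickness = c(13, 14, 12, 12, 14,  12, 10, 11, 12, 14,
                15, 14, 11, 11, 10,  16, 15, 10, 12, 11)
)
thick.lm <- lm(Strength ~ Thickness, data = glue)

# Residuals vs. fitted values and a normal Q-Q plot of the residuals.
par(mfrow = c(1, 2))
plot(thick.lm, which = 1:2)

# Mean residual by formulation: no group should sit clearly above or below zero
# if dropping Formulation from the model is reasonable.
tapply(resid(thick.lm), glue$Formulation, mean)
```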