Chapter7.pdf

Mixed Linear Models

Basic model:Y = X + Zu +

X is a n p model matrix of known constants is a p 1 vector of fixed unknown parameter valuesZ is a n q model matrix of known constantsu is a q 1 random vector is a n 1 vector of random errors with

E() = 0 V ar() = R

E(u) = 0 V ar(u) = G

Cov(,u) = 0

E(Y ) = X

V ar(Y ) = ZGZ +R

Normal-theory mixed model[u

] N

([00

],

[G 00 R

])Then,

Y N (X, ZGZ +R)Y N (X,)

Example 1: Random Blocks

Comparison of four processes for producing penicillin

Process AProcess BProcess CProcess D

}fixed treatment levels

of a design factor

Blocks correspond to different batches of an importantraw material, corn steep liquor

Random sample of five batches

Split each batch into four parts:- run each process on one part- randomize the order in which the processes are runwithin each batch

Here, batch effects are considered as random block effects:

Batches are sampled from a population of manypossible batches

Need to use a different set of batches of raw materialto repeat this experiment

Model:

Yij = + i + j + eij

Overall mean: Fixed treatment effect: i, for i = 1, , a = 4Random block (batch) effect:

j N(0, 2), j = 1, , b = 5Random error: ij N(0, 2 )and any ij is independent of any j

Mean yield for the i-th process, averaging across the entirepopulation of possible batches: i = + i

Variance-covariance structure:

V ar(Yij) = V ar(j + ij)

= 2 + 2

Different runs on the same batch:

Cov(Yij, Ykj)

= Cov(j + ij, j + kj)

= V ar(j) + Cov(j, kj) + Cov(ij, j) + Cov(ij, kj)

= 2 for all i 6= k

Correlation among observations for runs on the same batch:

=Cov(Yij, Ykj)V ar(Yij)V ar(Ykj)

=2

2 + 2

for i 6= k

Results for runs on different batches are uncorrelated(independent):

Cov(Yij, Ykl) = 0 for j 6= l

Results from the four runs on a single batch:

V ar

Y1jY2jY3jY4j

=2 +

2

2

2

2

2 2 +

2

2

2

2 2

2 +

2

2

2 2

2

2 +

2

= 2Ja +

2 Ia

G = V ar(u) = 2Ibb

R = V ar() = 2 Iabab

V ar(Y ) = ZGZ +R

= 2ZZ + 2 I

=

2Ja +

2 Ia

2Ja + 2 Ia

. . .

2Ja + 2 Ia

Example 2: Subsampling Random Effects Model

Analysis of sources of variation in a process used tomonitor the production of a pigment paste.

Current Procedure:

Sample barrels of pigment paste

One sample from each barrel

Send the sample to a lab for determination of moisturecontent

Measured Response: (Y ) moisture content of the pigmentpaste (units of one tenth of 1%).

Problem: Variation in moisture content is too large

Data Collection: Subsampling (or nested) Study Design

Sample b = 15 barrels of pigment paste

s = 2 samples are taken from the content of each barrel

Each sample is mixed and divided into r = 2 parts.

Each part is sent to the lab for measuring moisturecontent

There are n = bsr = 60 observations.

Yijk = + i + ij + ijk

Yijk is the moisture content determination for the k-th partof the j-th sample from the i-th barrel is the mean moisture contenti is a random barrel effect: i N(0, 2), i = 1, . . . , bij is a random sample effect: ij N(0, 2 ), j = 1, , sijk is the random measurement error:ijk N(0, 2 ), k = 1, , r

Variance-covariance structure:

V ar(Yijk) = V ar(i) + V ar(ij) + V ar(ijk)

= 2 + 2 +

2

Two parts of one sample:

Cov(Yijk, Yijk) = Cov(i, i) + Cov(ij, ij)

= 2 + 2 for k 6= k

Observations on different samples taken from the samebarrel:

Cov(Yijk, Yijk) = Cov(i, i)

= 2 for j 6= j

Observations from different barrels:

Cov(Yijk, Yijk) = 0; for i 6= i

R = V ar() = 2 Ibsr

G = V ar(u) =

[2Ib 0

0 2Ibs

]Then

E(Y ) = X = 1

V ar(Y ) = ZGZ +R

= 2(Ib Jsr) + 2 (Ibs Jr) + 2 Ibsr

Note that Z = [Ib 1sr|Ibs 1r]

Analysis of Mixed Linear Models:

Inferences about estimable functions of fixed effects

point estimatesconfidence intervalstests of hypotheses

Estimation of variance components

Predictions of random effects (BLUP)

Predictions of future observations

Estimation methods:

Ordinary Least Squares (OLS) Estimation:

Solutions OLS = (XX)X Y

The Gauss-Markov Theorem cannot be applied sinceV ar(Y ) = ZGZ +R 6= 2IThe OLS estimator is not necessarily a BLUEThe OLS estimator for C isC = C (X X)X Y N (C , C (X X)X (ZGZ +R)X(X X)C)

Generalized Least Squares (GLS) estimation:

Solutions GLS = (X1X)X 1Y where

= ZGZ +RFor any estimable function C , the unique BLUE isC GLS = C

(X 1X)X 1Y N(C , C (X 1X)C

)When G and/or R contain unknown parameters, youcould obtain an approximate BLUE by replacingthe unknown parameters with consistent estimators toobtain = ZGZ + R andC

GLS = C

(X 1X)X 1Y

C GLS is not a linear function of Y

C GLS is not a BLUE

For large samplesC

GLS

N (C , C (X 1X)C)C (X 1X)C tends to underestimate

V ar(C

GLS

)

Variance component estimation

Estimation of parameters in G and R

Crucial to the estimation of estimable functions offixed effects

Of interest in its own right (source of variation)

Basic approaches:

1 ANOVA methods (method of moments):- set observed values of mean squares equal to theirexpectations and solve the resulting equations

2 Maximum likelihood estimation (ML)

3 Restricted maximum likelihood estimation (REML)

ANOVA method (Method of moments):

Compute an ANOVA table

Equate mean squares to their expected values

Solve the resulting equations

Example 1: Penicillin production

Yij = + i + j + ij

with j N(0, 2) and ij N(0, 2 )i = 1, , a and j = 1, , b

ANOVA table:

Source d.f. Sum of Squares

Blocks b 1 = 4 SSblocks = ab

j=1(Yj Y)2Processes a 1 = 3 SSprocesses = b

ai=1(Yi Y)2

Error (a 1)(b 1) SSE == 12

ai=1

bj=1(Yij Yi Yj + Y)2

C. Total ab 1 = 19a

i=1

bj=1(Yij Y)2

MSE =SSE

(a 1)(b 1)E(MSE) = 2

Then an unbiased estimator for is

=MSE

MSblocks =SSblocksb 1

E(MSblocks) = 2 + a

2

Then,

2 =E(MSblocks) E(MSE)

a

An unbiased estimator for 2 is

2 =MSblocks MSE

a

Also,V ar(Yij) =

2 +

2

Inference about treatment means:

Consider the sample mean:

Yi =1

b

bj=1

Yij

withE(Yi) = + i

for random block effects j N(0, 2) and

V ar(Yi) =1

b

(2 +

2

)

S2Yi =1

b

(2 +

2

)=

a 1ab

MSE +1

abMSblocks

=1

ab(b 1)[SSE + SSblocks]

whereSSE 22(a1)(b1)

SSblocks (2 + a

2

)2(b1)

Cochran Satterthwaite approximation

Suppose MS1,MS2, ,MSk are mean squares withindependent distributions

degrees of freedom = dfi(dfi)MSiE(MSi)

2dfi

Then, for positive constants ai > 0, i = 1, 2, , k, thedistribution of

S2 = a1MS1 + a2MS2 + + akMSk

is approximated byvS2

E(S2)

2v

where

v =[E(S2)]2

[a1E(MS1)]2

df1+ + [akE(MSk)]2

dfk

is the value for the degrees of freedom.

In practice, the degrees of freedom are evaluated as

v =[S2]2

[a1MS1]2

df1+ + [akMSk]2

dfk

These are called the Cochran-Satterthwaite degrees offreedom.

An approximate (1 ) 100% confidence interval for+ i is

Yi tv,/2

a 1ab

MSE +1

abMSblocks

where v =[a1

abMSE+ 1

abMSblocks]

2

(a1ab

MSE)2

(a1)(b1) +( 1ab

MSblocks)2

b1

Difference between two means:

E(Yi Yk

)= E

(1

b

bj=1

(Yij Ykj)

)

= i k +1

b

bj=1

E(ij kj)

= i k= (+ i) (+ k)

V ar(Yi Yk

)= V ar

(i k +

1

b

bj=1

(ij kj)

)

=1

b2

bj=1

V ar(ij kj)

=22b

The standard error for Yi Yk is

SYiYk =

2MSE

b

A (1 ) 100% confidence interval for i k is

Yi Yk t(a1)(b1),/2

2MSE

b

T-test about H0 : i k = 0: Reject H0 if

|t| = |Yi Yk|2MSEb

> t(a1)(b1),/2

Properties of ANOVA methods for variance componentestimation:

Broad applicability- easy to compute in balanced cases- ANOVA is widely known

May produce negative estimates of variances

Can produce the same estimates of variances as REML- in simple balanced cases- when ANOVA estimates of variance components areinside the parameter space

If MS1,MS2, ,MSk are distributed independently with

dfiMSiE(MSi)

2dfi

and constants ai > 0, i = 1, 2, , k, are selected so that

2 =ki

aiMSi

has expectation 2, then

V ar(2) = 2ki

a2i [E(MSi)]2

dfi

and an unbiased estimator of this variance is

V ar(2) =ki

2a2iMS2i

(dfi + 2)

Using the Cochran-Satterthwaite approximation, anapproximate (1 ) 100% confidence interval for 2could be constructed as

1 .= Pr{2v,1/2 v2

2 2v,/2}

= Pr{ v2

2v,/2 2 v

2

2v,1/2}

where 2 =k

i aiMSi and v =[k

i aiMSi]2k

i=1[aiMSi]

2

dfi

Maximum Likelihood Estimation:

The model must include a specification of the jointdistribution of the observations

Find the parameter values that maximize thelikelihood of the observed data

Example: Normal theory Gauss-Markov model:

Y = X + , N(0, 2I

)Y N

(X, 2I

)

The likelihood function is

L(, 2;Y) =1

(2)n/2ne{

122

(YX)(YX)}

Need to find values of and 2 that maximize thislikelihood function

This is equivalent to finding values of and 2 thatmaximize the log-likelihood

The log-likelihood function is

l(, 2;Y) = log(L(, 2;Y)

)= n

2log(2) n

2log(2)

122

(Y X)(Y X)

Solve the likelihood equations:

0 =1

2(X X X Y)

0 = n22

+1

2(2)2(Y X)(Y X)

Solution:

ML = OLS = (XX)X Y

2ML =1

n(Y XML)(Y XML)

=1

nY (I PX)Y =

1

nSSE

2ML =1nSSE is a biased estimator for 2

2OLS =1

nrank(X)SSE is an unbiased estimator for 2

The two are asymptotically equivalent

Normal-theory Aitken model:

Y = X + , N(0, 2H

)H is a known positive definite matrix.

The multivariate normal likelihood function is

L(, 2;Y) =1

(22)n/2|H|1/2e

122

(YX)H1(YX)


l(, 2;Y) = n2log(2) 1

2log(|H|) n

2log(2)

122

(Y X)H1(Y X)

For any value of , the log-likelihood is maximized byfinding a that minimizes

(Y X)H1(Y X)

The estimating equations are(X H1X

) = X H1Y

Solutions are of the form

ML = GLS =(X H1X

)X H1Y

When H is known, the MLE for is also the generalizedleast squares estimator.

The additional estimating equation corresponding to 2 is

0 = n22

+1

2(2)2(Y X)H1(Y X)

Substituting the solution to the other estimating equationsfor , the solution is

2ML =1

n(Y XGLS)H1(Y XGLS)

This is a biased estimator for 2.

When H contains unknown parameters:

We could maximize the log-likelihood with respect to, 2, and the parameters in H

There may be no algebraic formulas for the solutionsto the joint likelihood equations

The MLEs for 2 and the parameters in H are usuallybiased (too small)

Restricted (Residual) maximum likelihood (REML)estimates are often used

Consider the mixed linear model

Y = X + Zu + [u

] N

([00

],

[G 00 R

])Then,

Y N (X,)

where = ZGZ +R.

Multivariate normal likelihood:

L(,;Y) = (2)n/2||1/2 exp{12

(YX)1(YX)}


l(,;Y) = n2log(2) 1

2log(||)

12

(Y X)1(Y X)

Given the values of the observed responses, Y, find values and that maximize the log-likelihood function.

This is a difficult computational problem:

no analytic solution except in some balanced cases

use iterative numerical methods- Need starting values (initial guesses at the values of and = ZGZ + R- local or global maxima?- what if becomes singular or is not positive definite?

constrained optimization- estimates of variances cannot be negative

estimates of variance components tend to be small

Consider a simple case when Y1, , Yn N(, 2).

2OLS =1

n 1

nj=1

(Yj Y

)2is an unbiased estimator for 2. While the MLE for 2 is

2ML =1

n

nj=1

(Yj Y

)2with E(2ML) =

(n1n

)2 < 2.

Note that 2OLS and 2ML are based on error contrasts

e1 = Y1 Y =(n 1n

, 1n, , 1

n

)Y

...

en = Yn Y =( 1n, 1

n, , 1

n,n 1n

)Y

whose distribution does not depend on = E(Yj).

When Y N(1, 2I),

e =

e1...en

= (I P1)Y N (0, 2(I P1))

The MLE 2ML =1n

nj=1 e

2j fails to acknowledge that e

is restricted to an (n 1)dimensional space, i.e.,j ej = 0

The MLE fails to make the appropriate adjustment indegrees of freedom needed to obtain an unbiasedestimator for 2

Example: Suppose n = 4 and Y N(14, 2I4). Then

e =

Y1 YY2 YY3 YY4 Y

= (I P1)Y N (0, 2(I P1)).Note that the covariance matrix (I P1) is singular andm = rank(I P1) = n 1 = 3.

Definer = Me = M(I PX)Y

where

M =

1 1 1 11 1 1 11 1 1 1

has row rank equal to m = rank(I PX). Then

r =

r1r2r3

=Y1 + Y2 Y3 Y4Y1 Y2 + Y3 Y4Y1 Y2 Y3 + Y4

= M(I P1)Y N

(0, 2M(I P1)M

)

Let W = M(I P1)M , and define restricted likelihoodfunction:

L(2; r) =1

(2)m/2|2W |1/2e

122

rW1r

Restricted log-likelihood function:

l(2; r) = m2log(2) m

2log(2)

12log(|W |) 1

22rW1r

(Restricted) likelihood equation:

0 =l(2; r)

2= m

22+

1

2(2)2rW1r

Solution (REML estimator for 2):

2REML =1

mrW1r

=1

mY(I P1)Y

=1

n 1

nj=1

(Yj Y )2 = 2OLS

REML (Restricted Maximum Likelihood)estimation

Estimate parameters in = ZGZ +R by maximizingthe part of the likelihood that does not depend onE(Y ) = X

Maximize a likelihood function for error contrasts

linear combinations of observations that do notdepend on XFind a set of n rank(X) linearly independent errorcontrasts

Mixed (normal-theory) model:

Y = X + Zu +

where

[u

] N

([00

],

[G 00 R

]). Then

LY = L (X + Zu + )

= LX + LZu + L

is invariant to X if and only if LX = 0, if and only if

L = M(I PX)

for some M .

To avoid losing information we must have

row rank(M) = n rank(X) = n r

Then a set of n rank(X) error contrasts is

r = M(I PX)Y Nnp

(0,M(I PX)1(I PX)M

)

Define W = M(I PX)1(I PX)M , whererank(W ) = n rank(X) and W1 exists.

The restricted likelihood function is

L(; r) =1

(2)(nr)/2|W |1/2e

122

rW1r

The restricted log-likelihood function is

l(; r) = (n r)2

log(2) 12log(|W |) 1

22rW1r

For any M(nr)n with row rank equal ton r = n rank(X), the log-likelihood can be expressed interms of

e =(I X(X1X )X 1

)Y

as

l(; e) = constant 12log(||)

12log(|X 1X |)

1

2e1e

where X is any set of r = rank(X) linearly independentcolumns of X

An MLE for the vector of unknown variancecomponents in can be found in the general case usingnumerical methods to obtain the REML estimates

It can be proved that every set of n r linearlyindependent error contrasts yield s the same REMLestimates of the variance components

For the normal theory Gauss-Markov linear model,Y N (X, 2I), the REML estimator of 2 is2 = Y(I PX)Y/(n r), which is the same as theunbiased OLS estimator

For linear mixed effects models, the REML estimatorsof variance components produce the same estimates asthe unbiased ANOVA-based estimators formed bytaking appropriate linear combinations of mean squareswhen the latter are positive and data are balanced

Denote the resulting REML estimators as

G, R, and = ZGZ + G

Estimation of fixed effects:By using the REML estimator for , an approximationof the BLUE is

C = C (X1X )X 1Y

Prediction of random effects: Given the observedresponses Y, predict the value of u

For the mixed model with

[u

] N

([00

],

[G 00 R

]),

[uY

]=

[u

X + Zu +

]=

[0X

]+

[I 0Z I

] [u

] N

([0X

],

[G GZ

ZG ZGZ +R

])

The Best Linear Unbiased Predictor (BLUP) is the BLUEfor

E(u|Y) = E(u) + (GZ ) (ZGZ +R)1 (Y E(Y))= 0 + (GZ ) (ZGZ +R)

1(Y X)

Then, the BLUP for u is

BLUP (u) = (GZ )1(Y XGLS

)= (GZ )1

(I X(X1X )X 1

)Y

when G and = ZGZ +R are known.

Substituting REML estimators G and R for G and R, anapproximate BLUP for u is

u = (GZ )1(I X(X1X )X 1

)Y

= (GZ )1(Y X

GLS

)For large samples, the distribution of u is approximatelymultivariate normal with mean vector 0 and covariancematrix

GZ 1(I P )(I P )1ZG

where P = X(X 1X)X 1.

>penicillinbatchprocessyield

11A8921B8831C9741D9452A8462B7772C9282D7993A81103B87113C87123D85134A87144B92154C89164D84175A79185B81195C80205D88

Rcode&output:

>penicillin$batch=as.factor(penicillin$batch)>penicillin$process=as.factor(penicillin$process)

>##ANOVAmethod:

>options(contrast=c("contr.sum","contr.sum"))>lm1=lm(yield~process+batch,data=penicillin)>anova(lm1)AnalysisofVarianceTable

Response:yieldDf SumSq MeanSq FvaluePr(>F)

process37023.3331.23890.33866batch426466.0003.50440.04075*Residuals1222618.833Signif.codes:0***0.001**0.01*0.05.0.11

, 11.792 3.4339

, 18.833 4.3397

>#fitamixedmodelwithadditivebatch(random)andporcess (fixed)effects>library(nlme)>options(contrasts=c("contr.sum","contr.sum"))>lme2=lme(yield~process,random=~1|batch,data=penicillin)>summary(lme2)LinearmixedeffectsmodelfitbyREMLData:penicillin

AICBIClogLik118.6025123.238153.30127

Randomeffects:Formula:~1|batch

(Intercept) ResidualStdDev:3.4339 4.339739

, ,

REML produces the same estimates of variance components as the ANOVA method for balanced data

Fixedeffects:yield~processValueStd.Error DFtvaluepvalue

(Intercept)861.8165901247.341440.0000process121.680774121.189930.2571process211.680774120.594960.5629process331.680774121.784890.0996Correlation:

(Intr)prcss1prcss2process10.000process20.0000.333process30.0000.3330.333

StandardizedWithinGroupResiduals:MinQ1MedQ3Max

1.41515750.50173500.16438400.68299391.2836503

NumberofObservations:20NumberofGroups:5

>lme3=lme(yield~process,random=~1|batch,method="ML",data=penicillin)>summary(lme3)LinearmixedeffectsmodelfitbymaximumlikelihoodData:penicillin

AICBIClogLik129.2774135.251858.63868

Randomeffects:Formula:~1|batch

(Intercept) ResidualStdDev:3.071377 3.881579

DefaultmethodisREML

Fixedeffects:yield~processValueStd.Error DFtvaluepvalue

(Intercept)861.8165921247.341400.0000process121.680773121.189930.2571process211.680773120.594960.5629process331.680773121.784890.0996Correlation:

(Intr)prcss1prcss2process10.000process20.0000.333process30.0000.3330.333

StandardizedWithinGroupResiduals:MinQ1MedQ3Max

1.58219400.56095640.18378660.76361051.4351647

NumberofObservations:20NumberofGroups:5

>anova(lme2)numDF denDF Fvaluepvalue

(Intercept)1122241.2121#estimatedparametersforfixedeffects>fixed.effects(lme2)(Intercept)process1process2process3

86213

>#BLUP'sforrandomeffects>random.effects(lme2)(Intercept)14.287878822.143939430.714646541.42929295 2.8585859

>#Confidenceintervalsforfixedeffectsandestimatestandarddeviations>intervals(lme2)Approximate95%confidenceintervals

Fixedeffects:lowerest.upper

(Intercept)82.04198998689.958010process15.662091221.662091process24.662091212.662091process30.662091236.662091attr(,"label")[1]"Fixedeffects:"

RandomEffects:Level:batch

lowerest.uppersd((Intercept))1.2851973.43399.174989

Withingroupstandarderror:lowerest.upper

2.9087734.3397396.474665

>lme2$fittedfixedbatch

18488.2878828589.2878838993.2878848690.2878858481.8560668582.8560678986.8560688683.85606178481.14141188582.14141198986.1414120 8683.14141>predict(lme2)

1111222288.2878889.2878893.2878890.2878881.8560682.8560686.8560683.85606

3333444483.2853584.2853588.2853585.2853585.4292986.4292990.4292987.42929

555581.1414182.1414186.1414183.14141

>lme2$residfixedbatch

150.7121212231.2878788383.7121212483.7121212502.1439394685.8560606735.1439394874.85606061752.14141411841.14141411996.141414120 24.8585859attr(,"std")[1]4.3397394.3397394.3397394.3397394.3397394.3397394.3397394.339739[9]4.3397394.3397394.3397394.3397394.3397394.3397394.3397394.339739[17]4.3397394.3397394.3397394.339739

>#CreateResidualplots>plot(lme2$fitted[,2],lme2$resid[,2],xlab="EstimatedMeans",ylab="Residuals",main="ResidualPlot",pch=16,cex=1.3)>abline(h=0,lty=2,lwd=2)>#Normalprobabilityplotforresiduals>qqnorm(lme2$resid,pch=16,main="NormalProbabilityPlotforResiduals")>qqline(lme2$resid,lty=2)

82 84 86 88 90 92

-6-4

-20

24

6

Residual Plot

Estimated Means

Res

idua

ls

-2 -1 0 1 2

-50

5

Normal Probability Plot for Residuals

Theoretical Quantiles

Sam

ple

Qua

ntile

s
chapter7chapter7_part2Chapt7_R

Documents

Chapter7.pdf