Upload
nawarajpokhrel
View
214
Download
0
Tags:
Embed Size (px)
Citation preview
Mixed Linear Models
Basic model:Y = X + Zu +
X is a n p model matrix of known constants is a p 1 vector of fixed unknown parameter valuesZ is a n q model matrix of known constantsu is a q 1 random vector is a n 1 vector of random errors with
E() = 0 V ar() = R
E(u) = 0 V ar(u) = G
Cov(,u) = 0
E(Y ) = X
V ar(Y ) = ZGZ +R
Normal-theory mixed model[u
] N
([00
],
[G 00 R
])Then,
Y N (X, ZGZ +R)Y N (X,)
Example 1: Random Blocks
Comparison of four processes for producing penicillin
Process AProcess BProcess CProcess D
}fixed treatment levels
of a design factor
Blocks correspond to different batches of an importantraw material, corn steep liquor
Random sample of five batches
Split each batch into four parts:- run each process on one part- randomize the order in which the processes are runwithin each batch
Here, batch effects are considered as random block effects:
Batches are sampled from a population of manypossible batches
Need to use a different set of batches of raw materialto repeat this experiment
Model:
Yij = + i + j + eij
Overall mean: Fixed treatment effect: i, for i = 1, , a = 4Random block (batch) effect:
j N(0, 2), j = 1, , b = 5Random error: ij N(0, 2 )and any ij is independent of any j
Mean yield for the i-th process, averaging across the entirepopulation of possible batches: i = + i
Variance-covariance structure:
V ar(Yij) = V ar(j + ij)
= 2 + 2
Different runs on the same batch:
Cov(Yij, Ykj)
= Cov(j + ij, j + kj)
= V ar(j) + Cov(j, kj) + Cov(ij, j) + Cov(ij, kj)
= 2 for all i 6= k
Correlation among observations for runs on the same batch:
=Cov(Yij, Ykj)V ar(Yij)V ar(Ykj)
=2
2 + 2
for i 6= k
Results for runs on different batches are uncorrelated(independent):
Cov(Yij, Ykl) = 0 for j 6= l
Results from the four runs on a single batch:
V ar
Y1jY2jY3jY4j
=2 +
2
2
2
2
2 2 +
2
2
2
2 2
2 +
2
2
2 2
2
2 +
2
= 2Ja +
2 Ia
G = V ar(u) = 2Ibb
R = V ar() = 2 Iabab
V ar(Y ) = ZGZ +R
= 2ZZ + 2 I
=
2Ja +
2 Ia
2Ja + 2 Ia
. . .
2Ja + 2 Ia
Example 2: Subsampling Random Effects Model
Analysis of sources of variation in a process used tomonitor the production of a pigment paste.
Current Procedure:
Sample barrels of pigment paste
One sample from each barrel
Send the sample to a lab for determination of moisturecontent
Measured Response: (Y ) moisture content of the pigmentpaste (units of one tenth of 1%).
Problem: Variation in moisture content is too large
Data Collection: Subsampling (or nested) Study Design
Sample b = 15 barrels of pigment paste
s = 2 samples are taken from the content of each barrel
Each sample is mixed and divided into r = 2 parts.
Each part is sent to the lab for measuring moisturecontent
There are n = bsr = 60 observations.
Yijk = + i + ij + ijk
Yijk is the moisture content determination for the k-th partof the j-th sample from the i-th barrel is the mean moisture contenti is a random barrel effect: i N(0, 2), i = 1, . . . , bij is a random sample effect: ij N(0, 2 ), j = 1, , sijk is the random measurement error:ijk N(0, 2 ), k = 1, , r
Variance-covariance structure:
V ar(Yijk) = V ar(i) + V ar(ij) + V ar(ijk)
= 2 + 2 +
2
Two parts of one sample:
Cov(Yijk, Yijk) = Cov(i, i) + Cov(ij, ij)
= 2 + 2 for k 6= k
Observations on different samples taken from the samebarrel:
Cov(Yijk, Yijk) = Cov(i, i)
= 2 for j 6= j
Observations from different barrels:
Cov(Yijk, Yijk) = 0; for i 6= i
R = V ar() = 2 Ibsr
G = V ar(u) =
[2Ib 0
0 2Ibs
]Then
E(Y ) = X = 1
V ar(Y ) = ZGZ +R
= 2(Ib Jsr) + 2 (Ibs Jr) + 2 Ibsr
Note that Z = [Ib 1sr|Ibs 1r]
Analysis of Mixed Linear Models:
Inferences about estimable functions of fixed effects
point estimatesconfidence intervalstests of hypotheses
Estimation of variance components
Predictions of random effects (BLUP)
Predictions of future observations
Estimation methods:
Ordinary Least Squares (OLS) Estimation:
Solutions OLS = (XX)X Y
The Gauss-Markov Theorem cannot be applied sinceV ar(Y ) = ZGZ +R 6= 2IThe OLS estimator is not necessarily a BLUEThe OLS estimator for C isC = C (X X)X Y N (C , C (X X)X (ZGZ +R)X(X X)C)
Generalized Least Squares (GLS) estimation:
Solutions GLS = (X1X)X 1Y where
= ZGZ +RFor any estimable function C , the unique BLUE isC GLS = C
(X 1X)X 1Y N(C , C (X 1X)C
)When G and/or R contain unknown parameters, youcould obtain an approximate BLUE by replacingthe unknown parameters with consistent estimators toobtain = ZGZ + R andC
GLS = C
(X 1X)X 1Y
C GLS is not a linear function of Y
C GLS is not a BLUE
For large samplesC
GLS
N (C , C (X 1X)C)C (X 1X)C tends to underestimate
V ar(C
GLS
)
Variance component estimation
Estimation of parameters in G and R
Crucial to the estimation of estimable functions offixed effects
Of interest in its own right (source of variation)
Basic approaches:
1 ANOVA methods (method of moments):- set observed values of mean squares equal to theirexpectations and solve the resulting equations
2 Maximum likelihood estimation (ML)
3 Restricted maximum likelihood estimation (REML)
ANOVA method (Method of moments):
Compute an ANOVA table
Equate mean squares to their expected values
Solve the resulting equations
Example 1: Penicillin production
Yij = + i + j + ij
with j N(0, 2) and ij N(0, 2 )i = 1, , a and j = 1, , b
ANOVA table:
Source d.f. Sum of Squares
Blocks b 1 = 4 SSblocks = ab
j=1(Yj Y)2Processes a 1 = 3 SSprocesses = b
ai=1(Yi Y)2
Error (a 1)(b 1) SSE == 12
ai=1
bj=1(Yij Yi Yj + Y)2
C. Total ab 1 = 19a
i=1
bj=1(Yij Y)2
MSE =SSE
(a 1)(b 1)E(MSE) = 2
Then an unbiased estimator for is
=MSE
MSblocks =SSblocksb 1
E(MSblocks) = 2 + a
2
Then,
2 =E(MSblocks) E(MSE)
a
An unbiased estimator for 2 is
2 =MSblocks MSE
a
Also,V ar(Yij) =
2 +
2
Inference about treatment means:
Consider the sample mean:
Yi =1
b
bj=1
Yij
withE(Yi) = + i
for random block effects j N(0, 2) and
V ar(Yi) =1
b
(2 +
2
)
S2Yi =1
b
(2 +
2
)=
a 1ab
MSE +1
abMSblocks
=1
ab(b 1)[SSE + SSblocks]
whereSSE 22(a1)(b1)
SSblocks (2 + a
2
)2(b1)
Cochran Satterthwaite approximation
Suppose MS1,MS2, ,MSk are mean squares withindependent distributions
degrees of freedom = dfi(dfi)MSiE(MSi)
2dfi
Then, for positive constants ai > 0, i = 1, 2, , k, thedistribution of
S2 = a1MS1 + a2MS2 + + akMSk
is approximated byvS2
E(S2)
2v
where
v =[E(S2)]2
[a1E(MS1)]2
df1+ + [akE(MSk)]2
dfk
is the value for the degrees of freedom.
In practice, the degrees of freedom are evaluated as
v =[S2]2
[a1MS1]2
df1+ + [akMSk]2
dfk
These are called the Cochran-Satterthwaite degrees offreedom.
An approximate (1 ) 100% confidence interval for+ i is
Yi tv,/2
a 1ab
MSE +1
abMSblocks
where v =[a1
abMSE+ 1
abMSblocks]
2
(a1ab
MSE)2
(a1)(b1) +( 1ab
MSblocks)2
b1
Difference between two means:
E(Yi Yk
)= E
(1
b
bj=1
(Yij Ykj)
)
= i k +1
b
bj=1
E(ij kj)
= i k= (+ i) (+ k)
V ar(Yi Yk
)= V ar
(i k +
1
b
bj=1
(ij kj)
)
=1
b2
bj=1
V ar(ij kj)
=22b
The standard error for Yi Yk is
SYiYk =
2MSE
b
A (1 ) 100% confidence interval for i k is
Yi Yk t(a1)(b1),/2
2MSE
b
T-test about H0 : i k = 0: Reject H0 if
|t| = |Yi Yk|2MSEb
> t(a1)(b1),/2
Properties of ANOVA methods for variance componentestimation:
Broad applicability- easy to compute in balanced cases- ANOVA is widely known
May produce negative estimates of variances
Can produce the same estimates of variances as REML- in simple balanced cases- when ANOVA estimates of variance components areinside the parameter space
If MS1,MS2, ,MSk are distributed independently with
dfiMSiE(MSi)
2dfi
and constants ai > 0, i = 1, 2, , k, are selected so that
2 =ki
aiMSi
has expectation 2, then
V ar(2) = 2ki
a2i [E(MSi)]2
dfi
and an unbiased estimator of this variance is
V ar(2) =ki
2a2iMS2i
(dfi + 2)
Using the Cochran-Satterthwaite approximation, anapproximate (1 ) 100% confidence interval for 2could be constructed as
1 .= Pr{2v,1/2 v2
2 2v,/2}
= Pr{ v2
2v,/2 2 v
2
2v,1/2}
where 2 =k
i aiMSi and v =[k
i aiMSi]2k
i=1[aiMSi]
2
dfi
Maximum Likelihood Estimation:
The model must include a specification of the jointdistribution of the observations
Find the parameter values that maximize thelikelihood of the observed data
Example: Normal theory Gauss-Markov model:
Y = X + , N(0, 2I
)Y N
(X, 2I
)
The likelihood function is
L(, 2;Y) =1
(2)n/2ne{
122
(YX)(YX)}
Need to find values of and 2 that maximize thislikelihood function
This is equivalent to finding values of and 2 thatmaximize the log-likelihood
The log-likelihood function is
l(, 2;Y) = log(L(, 2;Y)
)= n
2log(2) n
2log(2)
122
(Y X)(Y X)
Solve the likelihood equations:
0 =1
2(X X X Y)
0 = n22
+1
2(2)2(Y X)(Y X)
Solution:
ML = OLS = (XX)X Y
2ML =1
n(Y XML)(Y XML)
=1
nY (I PX)Y =
1
nSSE
2ML =1nSSE is a biased estimator for 2
2OLS =1
nrank(X)SSE is an unbiased estimator for 2
The two are asymptotically equivalent
Normal-theory Aitken model:
Y = X + , N(0, 2H
)H is a known positive definite matrix.
The multivariate normal likelihood function is
L(, 2;Y) =1
(22)n/2|H|1/2e
122
(YX)H1(YX)
The log-likelihood function is
l(, 2;Y) = n2log(2) 1
2log(|H|) n
2log(2)
122
(Y X)H1(Y X)
For any value of , the log-likelihood is maximized byfinding a that minimizes
(Y X)H1(Y X)
The estimating equations are(X H1X
) = X H1Y
Solutions are of the form
ML = GLS =(X H1X
)X H1Y
When H is known, the MLE for is also the generalizedleast squares estimator.
The additional estimating equation corresponding to 2 is
0 = n22
+1
2(2)2(Y X)H1(Y X)
Substituting the solution to the other estimating equationsfor , the solution is
2ML =1
n(Y XGLS)H1(Y XGLS)
This is a biased estimator for 2.
When H contains unknown parameters:
We could maximize the log-likelihood with respect to, 2, and the parameters in H
There may be no algebraic formulas for the solutionsto the joint likelihood equations
The MLEs for 2 and the parameters in H are usuallybiased (too small)
Restricted (Residual) maximum likelihood (REML)estimates are often used
Consider the mixed linear model
Y = X + Zu + [u
] N
([00
],
[G 00 R
])Then,
Y N (X,)
where = ZGZ +R.
Multivariate normal likelihood:
L(,;Y) = (2)n/2||1/2 exp{12
(YX)1(YX)}
The log-likelihood function is
l(,;Y) = n2log(2) 1
2log(||)
12
(Y X)1(Y X)
Given the values of the observed responses, Y, find values and that maximize the log-likelihood function.
This is a difficult computational problem:
no analytic solution except in some balanced cases
use iterative numerical methods- Need starting values (initial guesses at the values of and = ZGZ + R- local or global maxima?- what if becomes singular or is not positive definite?
constrained optimization- estimates of variances cannot be negative
estimates of variance components tend to be small
Consider a simple case when Y1, , Yn N(, 2).
2OLS =1
n 1
nj=1
(Yj Y
)2is an unbiased estimator for 2. While the MLE for 2 is
2ML =1
n
nj=1
(Yj Y
)2with E(2ML) =
(n1n
)2 < 2.
Note that 2OLS and 2ML are based on error contrasts
e1 = Y1 Y =(n 1n
, 1n, , 1
n
)Y
...
en = Yn Y =( 1n, 1
n, , 1
n,n 1n
)Y
whose distribution does not depend on = E(Yj).
When Y N(1, 2I),
e =
e1...en
= (I P1)Y N (0, 2(I P1))
The MLE 2ML =1n
nj=1 e
2j fails to acknowledge that e
is restricted to an (n 1)dimensional space, i.e.,j ej = 0
The MLE fails to make the appropriate adjustment indegrees of freedom needed to obtain an unbiasedestimator for 2
Example: Suppose n = 4 and Y N(14, 2I4). Then
e =
Y1 YY2 YY3 YY4 Y
= (I P1)Y N (0, 2(I P1)).Note that the covariance matrix (I P1) is singular andm = rank(I P1) = n 1 = 3.
Definer = Me = M(I PX)Y
where
M =
1 1 1 11 1 1 11 1 1 1
has row rank equal to m = rank(I PX). Then
r =
r1r2r3
=Y1 + Y2 Y3 Y4Y1 Y2 + Y3 Y4Y1 Y2 Y3 + Y4
= M(I P1)Y N
(0, 2M(I P1)M
)
Let W = M(I P1)M , and define restricted likelihoodfunction:
L(2; r) =1
(2)m/2|2W |1/2e
122
rW1r
Restricted log-likelihood function:
l(2; r) = m2log(2) m
2log(2)
12log(|W |) 1
22rW1r
(Restricted) likelihood equation:
0 =l(2; r)
2= m
22+
1
2(2)2rW1r
Solution (REML estimator for 2):
2REML =1
mrW1r
=1
mY(I P1)Y
=1
n 1
nj=1
(Yj Y )2 = 2OLS
REML (Restricted Maximum Likelihood)estimation
Estimate parameters in = ZGZ +R by maximizingthe part of the likelihood that does not depend onE(Y ) = X
Maximize a likelihood function for error contrasts
linear combinations of observations that do notdepend on XFind a set of n rank(X) linearly independent errorcontrasts
Mixed (normal-theory) model:
Y = X + Zu +
where
[u
] N
([00
],
[G 00 R
]). Then
LY = L (X + Zu + )
= LX + LZu + L
is invariant to X if and only if LX = 0, if and only if
L = M(I PX)
for some M .
To avoid losing information we must have
row rank(M) = n rank(X) = n r
Then a set of n rank(X) error contrasts is
r = M(I PX)Y Nnp
(0,M(I PX)1(I PX)M
)
Define W = M(I PX)1(I PX)M , whererank(W ) = n rank(X) and W1 exists.
The restricted likelihood function is
L(; r) =1
(2)(nr)/2|W |1/2e
122
rW1r
The restricted log-likelihood function is
l(; r) = (n r)2
log(2) 12log(|W |) 1
22rW1r
For any M(nr)n with row rank equal ton r = n rank(X), the log-likelihood can be expressed interms of
e =(I X(X1X )X 1
)Y
as
l(; e) = constant 12log(||)
12log(|X 1X |)
1
2e1e
where X is any set of r = rank(X) linearly independentcolumns of X
An MLE for the vector of unknown variancecomponents in can be found in the general case usingnumerical methods to obtain the REML estimates
It can be proved that every set of n r linearlyindependent error contrasts yield s the same REMLestimates of the variance components
For the normal theory Gauss-Markov linear model,Y N (X, 2I), the REML estimator of 2 is2 = Y(I PX)Y/(n r), which is the same as theunbiased OLS estimator
For linear mixed effects models, the REML estimatorsof variance components produce the same estimates asthe unbiased ANOVA-based estimators formed bytaking appropriate linear combinations of mean squareswhen the latter are positive and data are balanced
Denote the resulting REML estimators as
G, R, and = ZGZ + G
Estimation of fixed effects:By using the REML estimator for , an approximationof the BLUE is
C = C (X1X )X 1Y
Prediction of random effects: Given the observedresponses Y, predict the value of u
For the mixed model with
[u
] N
([00
],
[G 00 R
]),
[uY
]=
[u
X + Zu +
]=
[0X
]+
[I 0Z I
] [u
] N
([0X
],
[G GZ
ZG ZGZ +R
])
The Best Linear Unbiased Predictor (BLUP) is the BLUEfor
E(u|Y) = E(u) + (GZ ) (ZGZ +R)1 (Y E(Y))= 0 + (GZ ) (ZGZ +R)
1(Y X)
Then, the BLUP for u is
BLUP (u) = (GZ )1(Y XGLS
)= (GZ )1
(I X(X1X )X 1
)Y
when G and = ZGZ +R are known.
Substituting REML estimators G and R for G and R, anapproximate BLUP for u is
u = (GZ )1(I X(X1X )X 1
)Y
= (GZ )1(Y X
GLS
)For large samples, the distribution of u is approximatelymultivariate normal with mean vector 0 and covariancematrix
GZ 1(I P )(I P )1ZG
where P = X(X 1X)X 1.
>penicillinbatchprocessyield
11A8921B8831C9741D9452A8462B7772C9282D7993A81103B87113C87123D85134A87144B92154C89164D84175A79185B81195C80205D88
Rcode&output:
>penicillin$batch=as.factor(penicillin$batch)>penicillin$process=as.factor(penicillin$process)
>##ANOVAmethod:
>options(contrast=c("contr.sum","contr.sum"))>lm1=lm(yield~process+batch,data=penicillin)>anova(lm1)AnalysisofVarianceTable
Response:yieldDf SumSq MeanSq FvaluePr(>F)
process37023.3331.23890.33866batch426466.0003.50440.04075*Residuals1222618.833Signif.codes:0***0.001**0.01*0.05.0.11
, 11.792 3.4339
, 18.833 4.3397
>#fitamixedmodelwithadditivebatch(random)andporcess (fixed)effects>library(nlme)>options(contrasts=c("contr.sum","contr.sum"))>lme2=lme(yield~process,random=~1|batch,data=penicillin)>summary(lme2)LinearmixedeffectsmodelfitbyREMLData:penicillin
AICBIClogLik118.6025123.238153.30127
Randomeffects:Formula:~1|batch
(Intercept) ResidualStdDev:3.4339 4.339739
, ,
REML produces the same estimates of variance components as the ANOVA method for balanced data
Fixedeffects:yield~processValueStd.Error DFtvaluepvalue
(Intercept)861.8165901247.341440.0000process121.680774121.189930.2571process211.680774120.594960.5629process331.680774121.784890.0996Correlation:
(Intr)prcss1prcss2process10.000process20.0000.333process30.0000.3330.333
StandardizedWithinGroupResiduals:MinQ1MedQ3Max
1.41515750.50173500.16438400.68299391.2836503
NumberofObservations:20NumberofGroups:5
>lme3=lme(yield~process,random=~1|batch,method="ML",data=penicillin)>summary(lme3)LinearmixedeffectsmodelfitbymaximumlikelihoodData:penicillin
AICBIClogLik129.2774135.251858.63868
Randomeffects:Formula:~1|batch
(Intercept) ResidualStdDev:3.071377 3.881579
DefaultmethodisREML
Fixedeffects:yield~processValueStd.Error DFtvaluepvalue
(Intercept)861.8165921247.341400.0000process121.680773121.189930.2571process211.680773120.594960.5629process331.680773121.784890.0996Correlation:
(Intr)prcss1prcss2process10.000process20.0000.333process30.0000.3330.333
StandardizedWithinGroupResiduals:MinQ1MedQ3Max
1.58219400.56095640.18378660.76361051.4351647
NumberofObservations:20NumberofGroups:5
>anova(lme2)numDF denDF Fvaluepvalue
(Intercept)1122241.2121#estimatedparametersforfixedeffects>fixed.effects(lme2)(Intercept)process1process2process3
86213
>#BLUP'sforrandomeffects>random.effects(lme2)(Intercept)14.287878822.143939430.714646541.42929295 2.8585859
>#Confidenceintervalsforfixedeffectsandestimatestandarddeviations>intervals(lme2)Approximate95%confidenceintervals
Fixedeffects:lowerest.upper
(Intercept)82.04198998689.958010process15.662091221.662091process24.662091212.662091process30.662091236.662091attr(,"label")[1]"Fixedeffects:"
RandomEffects:Level:batch
lowerest.uppersd((Intercept))1.2851973.43399.174989
Withingroupstandarderror:lowerest.upper
2.9087734.3397396.474665
>lme2$fittedfixedbatch
18488.2878828589.2878838993.2878848690.2878858481.8560668582.8560678986.8560688683.85606178481.14141188582.14141198986.1414120 8683.14141>predict(lme2)
1111222288.2878889.2878893.2878890.2878881.8560682.8560686.8560683.85606
3333444483.2853584.2853588.2853585.2853585.4292986.4292990.4292987.42929
555581.1414182.1414186.1414183.14141
>lme2$residfixedbatch
150.7121212231.2878788383.7121212483.7121212502.1439394685.8560606735.1439394874.85606061752.14141411841.14141411996.141414120 24.8585859attr(,"std")[1]4.3397394.3397394.3397394.3397394.3397394.3397394.3397394.339739[9]4.3397394.3397394.3397394.3397394.3397394.3397394.3397394.339739[17]4.3397394.3397394.3397394.339739
>#CreateResidualplots>plot(lme2$fitted[,2],lme2$resid[,2],xlab="EstimatedMeans",ylab="Residuals",main="ResidualPlot",pch=16,cex=1.3)>abline(h=0,lty=2,lwd=2)>#Normalprobabilityplotforresiduals>qqnorm(lme2$resid,pch=16,main="NormalProbabilityPlotforResiduals")>qqline(lme2$resid,lty=2)
82 84 86 88 90 92
-6-4
-20
24
6
Residual Plot
Estimated Means
Res
idua
ls
-2 -1 0 1 2
-50
5
Normal Probability Plot for Residuals
Theoretical Quantiles
Sam
ple
Qua
ntile
s
chapter7chapter7_part2Chapt7_R