Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
The Hierarchical Linear Model560 Hierarchical modeling
Peter Hoff
Statistics, University of Washington
1/1
Heterogeneity of β̂j ’s
β̂j = (XTj Xj)
−1XTj yj
hist(BETA.OLS[,1]) hist(BETA.OLS[,2])
intercept
20 30 40 50 60 70 80 90
050
100
150
200
250
300
slope
−10 0 10 20
050
100
150
200
250
300
2/1
Heterogeneity of β̂j ’s
plot(BETA.OLS)
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
● ●
●
●●
●●
●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●●
●●
●●
● ●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
● ●●●
●
●
●
●
● ●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●● ●
●●●
●●
●
●
● ●
●
●●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
● ●●
●●
●
●
●
●
●
●
●●
●
●
●●
●● ●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
●● ●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●●
●
●● ●
●
●
●●
●
●
●
● ●
●
● ●
●
●
●
●
●
●
●●
●●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●●
● ●●
●
●
●●
●
●
●●
●
●
● ●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●●
● ●
●
● ●
●●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
● ●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
● ●
●●
●●
●
● ●●
●
● ●
●●●
●●
● ●
●
●
●
●
●●
●
●
●
●
●
●●
●●●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
● ●●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●●
● ●
●
● ●
●●
●
●●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●● ●
●
●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
30 40 50 60 70 80 90
−10
010
20
intercept
slop
e
Var[β̂j ] = σ2(XTj Xj)
−1
3/1
Heterogeneity as a function of sample size
30 40 50 60 70 80 90
−10
010
20
intercept
slop
e
3015
122913
22
1824
3216 22
12
29
27
182327
24
2525 16
16
27251726
2817
5
15
8
24
16
3022
28
28 25
26
2931
27272916
3017
1714
142624
18
2016
1116
20
29
19
21
19
22
2011
22
25
27
2515
22
19
12
21
23
1823
31
25
1116
1326
18
21
2116
26
16
2821
28231427
16
2017
2522
16
3224
20
2312
11
16
29
32
21
18
1715
12
23
24
111213
27
222217
1517
7
182030
1418 1821
19
15
14
30
3631
22
27
274
23
11
18
122622
23
1522
22
9
146
9
12
19
1424
1422
14
24
1725 19271919
2017
19
1819 24
232114 1710
8 12
19
17
9
28
29
12
15
16
10
2126
1127
191426
12
1118
1921 1626
2025
31
1221
23
22
171613
11
23
1418
1723 12
17
10
232114
20
20
16
1426
2012
22
821
18 27
19 1924
9
22
12
1520
2519
12
21
19
18
24
14
229
20
1813
13
1218
8
18
15
26
14
14
26162717
2315
25
16
17
7
12
2916
18
1420
16 16
19
1725
16 13
1919 21
20
12
8 261725
16
20 22
2223 11
22
11
1210
6
1822
17
3019
18
7
2325 21
2421
28
15
17
14
17
18
1814
1719
23
2523
24
21
25
17
27 24 29
18
31
26
2019
18
29
18
20
26
2532
24
17
14
2028
26
24
29
311922
2
26
27
1620
1815
2322
19 13
1622
1715
47
261317
21
18
15
621
8
16
10
17
11
106 10
2110
28
1717
22
2522
14
11
18
171319
17
29
18
25
23
201320
24 15 1931
14
17 2615
2013
18
13
1621 32
5
1720
1916
23
14
1716
17
15
2215
13
1823202213
24
21 22
1518
17
22
21
2311
242113
2310
12
13
2018
1719
2919
19
9
24
20
19
21
7182024 25
2024 22
1624
18
231714
1419
14
2211
10
12
21
13
17
16
18
1815
1816222120 16
3019
19222320 169
19
23 19222212
23 2318 18
17
2830
18
2524
2020
15
16
14
17 17
102529
193212
2618
1415
1218
23
13
18
2126
11
11
34
20
2628 24
17
1512
28 2514151616
19
17
22
1820
22141720
16
2829
13
16
27
242621
28
16
25
12
3148
2
16171530
143
68
32313 1320
46 27
4 9
11
102724
19
920
2822
50
4
29
17
8
2
189
152324
27
24
21
3
2215
11 18 2424
14
1727
1019 15
39
3
2916
20
25 13
28
13
5
18
15
2411
22
6
1513
13
Var[β̂j ] = σ2(XTj Xj)
−1
4/1
Modeling heterogeneity
In the hierarchical normal model:
θj = {µj , σ2}
yi,j = µj + σ2, {εi,j} ∼ i.i.d normal(µj , σ2)
µ1, . . . , µm ∼ i.i.d. normal (µ, τ 2)
What should we do for a hierarchical regression model?
θj = {βj , σ2}
yi,j = βTj xi,j + εi,j , {εi,j} ∼ i.i.d. normal(0, σ2)
β1, . . . ,βm ∼ i.i.d. p(βj)
5/1
HLM
MVN model for across-group heterogeneity:
β1, . . . ,βm ∼ i.i.d. multivariate normal(β,Σβ)
The parameters in this model include
β, an across-group mean regression vector
Σβ , a covariance matrix describing the variability of the βj ’s around β.
6/1
Ad-hoc estimates
## rough estimate of betaapply(BETA.OLS,2,mean,na.rm=TRUE)
## (Intercept) xj## 50.618228 3.672483
This estimator of β equally weights all schools.Generally, we want to assign a lower weight to schools with less data.
## rough estimate of Sigma_betacov(BETA.OLS,use="complete.obs")
## (Intercept) xj## (Intercept) 26.795851 1.001585## xj 1.001585 15.818939
This is a very rough estimate of Σβ :
• It ignores sample size differences;• It ignores the variability of β̂j around βj .
Var[β̂j ’s around β̂ ] ≈ Var[βj ’s around β ] + Var[β̂j ’s around βj ’s ]
Sample covariance of β̂j ’s ≈ Σβ + Estimation error
7/1
Fixed and random effects
Recall the following:
µj ∼ N(µ, τ 2)⇔ µj = µ+ aj , aj ∼ N(0, τ 2)
Analogously,
βj ∼ N(β,Σβ)⇔ βj = β + bj , bj ∼ N(0,Σβ)
Therefore, our hierarchical model says that
yj = Xjβj + εj
= Xj(β + bj) + εj
= Xjβ + Xjbj + εj
• β is sometimes called a fixed effect, as it is fixed across all groups.
• bj is sometimes called a random effect“random” as it varies across groups, or“random” if the groups were randomly sampled.
A model with fixed and random effects is called a mixed-effects model.
8/1
Within-group covariance
Recall the HNM:yi,j = µ+ aj + εi,j
What was the within-group covariance?
Cov[yi1,j , yi2,j ] = E[(yi,j − µ)(yi2,j − µ)]
= E[(aj + εi1,j)(aj + εi2,j)]
= E[a2j ] + 0 + 0 + 0
= τ 2
9/1
Within-group covariance, matrix form
More generally, we might want the within-group covariance matrix:
yj =
y1,j
...yn,j
Cov[yj ] =
Var[y1,j ] Cov[y1,j , y2,j ] · · · Cov[y1,j , yn,j ]
Cov[y1,j , y2,j ] Var[y2,j ] · · · Cov[y2,j , y2,j ]...
...Cov[y1,j , yn,j ] Cov[y2,j , yn,j ] · · · Var[yn,j ]
Our calculations have shown that for the HNM
Cov[yj ] =
σ2 + τ 2 τ 2 · · · τ 2
......
τ 2 τ 2 · · · σ2 + τ 2
10/1
Within-group covariance, matrix form
In general,Cov[yj ] = E[(yj − E[yj ])(yj − E[yj ])
T ]
For the HLM,yj − E[yj ] = yj − Xjβ = Xjbj + εj ,
so
Cov[yj ] = E[(Xjbj + εj)(Xjbj + εj)T ]
= E[(XjbjbTj XT
j ] + E[εjεTj ]
= XjΣβXTj + σ2I
11/1
Dependence and conditional independence
Thus p(yj |β,Σβ , σ2), unconditional on bj , is
yj ∼ multivariate normal(Xjβ,XjΣβXTj + σ2I).
On the other hand, conditional on bj ,
yj ∼ multivariate normal(Xjβ + Xjbj , σ2I).
12/1
Dependence and conditional independence
Marginal dependence: If I don’t know βj (or bj), then knowing yi1,j gives mea bit of information about βj , which in turn gives me information about yi2,j ,and so the observations are dependent: My information about yi2,j depends onthe value of yi1,j if I don’t know βj .
Conditional independence: If I know βj , then knowing yi1,j doesn’t give meany information about yi2,j , and so they are independent. My informationabout yi2,j does not depend on the value of yi1,j if I know βj .
Note: Within-group covariance can be positive or negative, depending on Xj .
13/1
Within-group covariance
Consider the case that xi,j = {1, xi,j} and βj = {β0,j , β1,j}.• Xj is nj × 2
• XjΣXTj is nj × nj , the covariances between observations within a group.
Cov[y1,j , y2,j ] = xT1,jΣx2,j
= Σ1,1 + Σ1,2(x1,j + x2,j) + Σ2,2x1,jx2,j
= Var[β0,j ] + Var[β1,j ]x1,jx2,j + Cov[β0,j , β1,j ](x1,j + x2,j)
• Intercept variance positivly correlates the observations within a group.
• Slope variance can lead to positive or negative correlation, depending onhow close x1,j and x2,j are.
14/1
Sources of variation and correlation
−2 −1 0 1 2
−4
−2
02
4
x
●●
●
●●●
●●
●
●
●●
●
●
●●
●●
●
●
●●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●●
● ●
●
● ●
●
●
●●●
●
●●●
●
●
●
●
●
●
●●
●
−2 −1 0 1 2
−4
−2
02
4
x
mu0 ●
●●
●●●
●●
●
●
●●
●
●
●●
●●
●
●
●●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●●
●
●●
● ●
●
● ●
●
●
●●●
●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●●●●
●
● ●●●●●●
●● ●
●● ●
●●●
● ●● ● ●●
●●●
●
●
● ●
●
● ●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
−2 −1 0 1 2
−4
−2
02
4
x
mu0
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●●
●
●
●●●●
●
● ●●●●●●
●● ●
●● ●
●●●
● ●● ● ●●
●●●
●
●
● ●
●
● ●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
● ● ●●
●
●●●●
● ●
●
●●
●●
●●
●
●
● ●●
●
●
● ●
●
● ●
●
●●
●
●
●
● ●
●●
● ●
●
●
●
●
●
●
●
●
●
15/1
Fitting a HLM
Assuming data are independent across groups, the likelihood at a value(β,Σβ , σ
2) can be computed as follows:
0. Set ll= 0.
1. Set ll= ll + ldmvnorm( y1 , X1β , X1ΣβX1 + σ2I).
2. Set ll= ll + ldmvnorm( y2 , X2β , X2ΣβX2 + σ2I).
...
m. Set ll= ll + ldmvnorm( ym , Xmβ , XmΣβXm1 + σ2I).
We can then numerically optimize the likelihood to find the MLEs.
16/1
Fitting the HLM with lmer
library(lme4)fit.lme<-lmer( y.nels ~ ses.nels + (ses.nels | g.nels),REML=FALSE)
summary(fit.lme)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: y.nels ~ ses.nels + (ses.nels | g.nels)
##
## AIC BIC logLik deviance df.resid
## 92553.1 92597.9 -46270.5 92541.1 12968
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8910 -0.6382 0.0179 0.6669 4.4613
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## g.nels (Intercept) 12.223 3.496
## ses.nels 1.515 1.231 0.11
## Residual 67.345 8.206
## Number of obs: 12974, groups: g.nels, 684
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 50.6767 0.1551 326.7
## ses.nels 4.3594 0.1231 35.4
##
## Correlation of Fixed Effects:
## (Intr)
## ses.nels 0.007
17/1
Extracting results - fixed effects
### fixed effectsbeta.hat<-fixef(fit.lme)beta.hat
## (Intercept) ses.nels## 50.676704 4.359399
### variance-covariance of fixed effects estimatesVBETA<-vcov(fit.lme)VBETA
## 2 x 2 Matrix of class "dpoMatrix"## (Intercept) ses.nels## (Intercept) 0.0240606603 0.0001309645## ses.nels 0.0001309645 0.0151610507
### standard errorssqrt(diag(VBETA))
## [1] 0.1551150 0.1231302
### t-valuesbeta.hat/sqrt(diag(VBETA))
## (Intercept) ses.nels## 326.70410 35.40479
18/1
Extracting results - variance components
### within-group variances2.hat<-sigma(fit.lme)^2
### across-group varianceVarCorr(fit.lme)$g.nels
## (Intercept) ses.nels## (Intercept) 12.2231940 0.4887692## ses.nels 0.4887692 1.5148005## attr(,"stddev")## (Intercept) ses.nels## 3.496168 1.230772## attr(,"correlation")## (Intercept) ses.nels## (Intercept) 1.0000000 0.1135884## ses.nels 0.1135884 1.0000000
### remove the S4 uglinessVB<-matrix(VarCorr(fit.lme)$g.nels,2,2)
VB
## [,1] [,2]## [1,] 12.2231940 0.4887692## [2,] 0.4887692 1.5148005
19/1
Random effects estimates
B.LME<-as.matrix(ranef(fit.lme)$g.nels)BETA.LME<-sweep( B.LME , 2 , beta.hat, "+" )
−2 −1 0 1 2
2030
4050
6070
80
ses
mat
h sc
ore
OLS regression lines
−2 −1 0 1 2
2030
4050
6070
80
ses
mat
h sc
ore
HLM shrinkage estimates
20/1
Range of shrinkage estimates
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●●
●●●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
●
●
●
●
●●
●
●●
●●
●●●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●●
●
●
●●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●●
●
●
●●●●
●
●
●
●
●●
●●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●●
●
●●
●
●
●
● ●●
●
●
●
●
●
●●
●●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
● ●●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
● ●
●
●
●
●●
●
●
●
●
●
●●
●
●●
●
●
●
●●
●
●● ●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
30 40 50 60 70 80 90
4550
5560
OLS intercept
LME
inte
rcep
t
●
●●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
● ●
●●●
●
●
●
●
●●
●
●
●
●
●
● ●
●
●●
●●
●●
●
●●
●
●
●
●
●●
●
●
●
●●●●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●
●
●
●
●
●
●●●●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●● ●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●●
●
●
●
●●
●
●●
●
●●
●
●
●●●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●●
●
●●
●
●●
● ●
● ●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
● ●
●
●
●
●
●
●●●
●
●
●
●
●
●●●
●
●●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●●
●
●●
●●
●
●
●
●●
●
●
●
●
●
● ●
●
●●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●●
●
●
●
●
●●
●●
●
●
●●
●
●
●●●
●
●●
●
●
●
●
●
●
●
●● ●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
−10 0 10 203
45
6
OLS slope
MLE
slo
pe
21/1
Formula for shrinkage estimates
Intuitively:β̃j = wj β̂j + (1− wj)β̂
where wj depends on Σb and σ2(XTj Xj)
1:
• wj is big if σ2(XTj Xj)
1 small compared to Σb;
• wj is small if σ2(XTj Xj)
1 large compared to Σb.
This is almost right. The averaging has to be done using matrices:
β̃j =(
XTj Xj/σ
2 + Σ−1β
)−1 (Xjyj/σ
2 + Σ−1β β
)In practice, σ2,Σβ ,β are usually replaced with σ̂2, Σ̂β , β̂.
Quiz: How does β̃j vary with Xj , σ2 and Σβ?
22/1
Macro-level effectsLME regression estimates:
●
●●
●
●
●
●●
●
● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●●
●
●●
●
●
● ●
●
●
●●
●●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●●
●●
●
●
●
● ●
●
●
●
●●
●
● ●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●●
●
●
●●
●●
●●●
● ●
●
●
●
●
● ●
●
●
●●
●
● ●
●●●
●●
●●
●
●●
●
●
●
●
●●
●
●
●
●● ● ●
●
●
● ●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●● ●●
●●
●
●
●● ●●
● ●
●
●
●
●
●
●●●
●
●
●●
●
● ●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
● ●
●
● ●
●●
●●
●
●●
● ●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●● ●● ●
●
●●
●
●
●
●
●
●●
●
●
●
● ●
●
●●●
●
●●
●●
●
● ●
●
●●●
●
●● ●
●●
●
●
●
●●
●
●
●
●
●●● ●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
● ●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●
●
● ● ●
●
●
● ●
●
●
●
●
●
●●
●
●
●●
●
●●
● ●
●●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●
●
● ●●
●
●
● ●
● ●
●
●
●
●
●
●
●
●
●●●
●
●
●
●●
●
●
●
●
●● ●
●
●
● ●
●
●
● ●
●
●
●
●
●
●●●
●
●
●
●
●
●●●●
●●
●
●
●●
●●●
●
●
●
●●
●
●
●●
●●
●
●
●
●●
●
●●
●
●●
●
● ●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●●●
●●●●
●●●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●●
●●
●●
●●
● ●●●
●
●
●
●
●●
●●
●
●
●●
●●
● ●●●
●●
●
●
●
●
●
●
●
●●●●
●
●●
●
●
●
●●
●
●● ●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
45 50 55 60
34
56
intercept
slop
e
Questions:
• What kind of schools have big intercepts?
• What kind of schools have big slopes?
Can we relate macro-level parameters to macro-level effects ?
23/1
Macro-level effects### FLP variableflp.school<-tapply( flp.nels , g.nels, mean)table(flp.school)
## flp.school## 1 2 3## 226 257 201
### RE and FLP associationmpar()par(mfrow=c(1,2))boxplot(BETA.LME[,1]~flp.school,col="lightblue")boxplot(BETA.LME[,2]~flp.school,col="lightblue")
●
●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●●●
●
1 2 3
4550
5560
●
●●
●
●●
●
●
●
●
●
1 2 3
34
56
24/1
Macro-level effects
It seems that β0,j and possibly β1,j are associated with flpj .
• Testing: Is there evidence for the association?
• Estimation: What is the association?
These questions can be addressed by expanding the model:
Old model:
yi,j = β0,j + β1,j × sesi,j + εi,j
= (β0 + b0,j) + (β1 + b1,j)× sesi,j + εi,j
New model:
yi,j = β0,j + β1,j × sesi,j + εi,j
= (β0 + α0 × flpj + b0,j) + (β1 + α1 × flpj + b1,j)× sesi,j + εi,j
Note that under this model,
• The intercept for school j is β0,j = (β0 + α0 × flpj + b0,j)
• The slope for school j is β1,j = (β1 + α1 × flpj + b1,j)
(Alternatively, we could treat flpj as a categorical variable)
25/1
Macro-level fixed effects
yi,j = β0,j + β1,j × sesi,j + εi,j
= (β0 + α0 × flpj + b0,j) + (β1 + α1 × flpj + b1,j)× sesi,j + εi,j
• α0 represents the macro effect of flpj on the intercept/mean in group j
• α1 represents the macro effect of flpj on the slope with sesi,j in group j
Note: α0 and α1 do not vary across groups. If they did, they would beconfouned with b0,j and b1,j .
Note: As they are fixed across groups, they are in fact fixed effects:
26/1
Macro-level fixed effects
yi,j = (β0 + α0 × flpj + b0,j) + (β1 + α1 × flpj + b1,j)× sesi,j + εi,j
Rearranging, we get
yi,j =β0 + α0 × flpj + β1 × sesi,j + α1 × flpj × sesi,j +
b0,j + b1,j × sesi,j +
εi,j
Fixed effects regression: β0 + α0 × flpj + β1 × sesi,j + α1 × flpj × sesi,j
Random effects regression: b0,j + b1,j × sesi,j
Note:
• The covariates for the two regressions are different.
• Macro-effects do not appear in the random effects regression.
27/1
Mixed-effects model
yi,j =β0 + α0 × flpj + β1 × sesi,j + α1 × flpj × sesi,j +
b0,j + b1,j × sesi,j +
εi,j
We see the distinction between α’s and β’s is meaningless.
We rewrite the model as
yi,j = β0 + β1 × flpj + β2 × sesi,j + β3 × flpj × sesi,j +
b0,j + b1,j × sesi,j +
εi,j
=βTxi,j + bjT zi,j + εi,j
• xi,j = (1, flpj , sesi,j)
• zi,j = (1, sesi,j)
28/1
Group-level representation
Micro-level representation:
yi,j = βTxi,j + bTj zi,j + εi,j
Combining observations within a group:y1,j
...yn,j
=
x1,j →...
xn,j →
β1
...βp
+
z1,j →...
zn,j →
b1,j
...bp,j
+
ε1,j
...εn,j
Two-level HLM: General form
yj = Xjβ + Zjbj + εj
Note: This formulation allows the fixed effects predictors to be different fromthe random effects predictors.
29/1
Two-level HLM: General form
This is the general form of a two-level hierarchical linear model
yj = Xjβ + Zjbj + εj
where bj and εj are multivariate normal.
• β are the fixed effects coefficients;
• Xj is the design matrix for the fixed effects.
• bj are the random effects coefficients for group j ;
• Zj is the design matrix for the fixed effects.
30/1
Variance components
yj = Xjβ + Zjbj + εj
E
[bj
εj
]=
[00
]and Cov
[bj
εj
]=
[Ψ 00 Σ
].
Across-group heterogeneity: Ψ is the variance-covariance in b1, . . . , bm.
Within-group heterogeneity: Σ is the variance-covariance of y1,j , . . . , ynj ,j .
Note: We should write Σj instead of Σ, as
Cov[yj ] = Cov[εj ] = Σj is an nj × nj matrix.
Note: In the examples so far,
Σj = σ2Inj .
Question: What other forms for Σj might be useful?
31/1
Example: One-way random effects model, aka the HNM
yi,j = µ+ aj + εi,j
{aj} ∼ iid N(0, τ 2)
{εi,j} ∼ iid N(0, σ2)
Exercise: Express this model as yj = Xjβ + Zjbj + εj
• Regression parameters:β = µ , bj = aj
• Design matrices:
Xj = Zj =
1...1
for each j ∈ {1, . . . ,m}
• Covariance terms:Ψ = Var[aj ] = τ 2 , Σ = σ2I
Exercise: Check your work by going in reverse.32/1
Example: One-way random effects model, aka the HNM
fit.0<-lmer(y.nels~ 1 + (1|g.nels), REML=FALSE)
summary(fit.0)
## Linear mixed model fit by maximum likelihood ['lmerMod']## Formula: y.nels ~ 1 + (1 | g.nels)#### AIC BIC logLik deviance df.resid## 93919.3 93941.7 -46956.6 93913.3 12971#### Scaled residuals:## Min 1Q Median 3Q Max## -3.8112 -0.6534 0.0093 0.6732 4.6999#### Random effects:## Groups Name Variance Std.Dev.## g.nels (Intercept) 23.63 4.861## Residual 73.71 8.585## Number of obs: 12974, groups: g.nels, 684#### Fixed effects:## Estimate Std. Error t value## (Intercept) 50.9391 0.2026 251.4
33/1
Group-specific linear regression
yi,j = βTxi,j + bTj xi,j + εi,j
{bj} ∼ iid N(0,Ψ)
{εi,j} ∼ iid N(0, σ2)
Exercise: Express this model as yj = Xjβ + Zjbj + εj
• Design matrices:
Xj = Zj =
x1,j →...
xnj ,j →
for each j ∈ {1, . . . ,m}
• Regression parameters:β = β , bj = bj
• Covariance terms:Ψ = Cov[bj ], Σ = σ2I
This is just a special case where Xj = Zj .34/1
Group-specific linear regression
fit.1<-lmer(y.nels~ ses.nels + (ses.nels|g.nels), REML=FALSE)
summary(fit.1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: y.nels ~ ses.nels + (ses.nels | g.nels)
##
## AIC BIC logLik deviance df.resid
## 92553.1 92597.9 -46270.5 92541.1 12968
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8910 -0.6382 0.0179 0.6669 4.4613
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## g.nels (Intercept) 12.223 3.496
## ses.nels 1.515 1.231 0.11
## Residual 67.345 8.206
## Number of obs: 12974, groups: g.nels, 684
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 50.6767 0.1551 326.7
## ses.nels 4.3594 0.1231 35.4
##
## Correlation of Fixed Effects:
## (Intr)
## ses.nels 0.007
35/1
General LME
yi,j = βTxi,j + bTj zi,j + εi,j
{bj} ∼ iid N(0,Ψ)
{εj} ∼ iid N(0,Σ)∗
* modulo different sample sizes.
Review of benefits of model extension:
• Group-specific regressors should appear in Xj but not Zj ;
• If {bk,1, . . . , bk,m} shows little variability (ψk,k small), we may want toremove xi,j,k from the random effects model, and include it as a fixedeffect only.
• Within-group covariances other than Σ = σ2I might be useful:• Σ with temporal correlation for longitudinal/panel data;• Unrestricted Σ for correlation but unordered outcomes (teeth, eg.)
36/1
General LMEfit.2<-lmer(y.nels~ flp.nels + ses.nels + flp.nels*ses.nels + (ses.nels | g.nels), REML=FALSE)
summary(fit.2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels + (ses.nels |
## g.nels)
##
## AIC BIC logLik deviance df.resid
## 92396.3 92456.0 -46190.1 92380.3 12966
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9773 -0.6417 0.0201 0.6659 4.5202
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## g.nels (Intercept) 9.012 3.002
## ses.nels 1.572 1.254 0.06
## Residual 67.260 8.201
## Number of obs: 12974, groups: g.nels, 684
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 55.3975 0.3860 143.52
## flp.nels -2.4062 0.1819 -13.23
## ses.nels 4.4909 0.3327 13.50
## flp.nels:ses.nels -0.1931 0.1587 -1.22
##
## Correlation of Fixed Effects:
## (Intr) flp.nl ss.nls
## flp.nels -0.930
## ses.nels -0.158 0.088
## flp.nls:ss. 0.086 -0.007 -0.92637/1