The Hierarchical Linear Model - Duke University

The Hierarchical Linear Model

560 Hierarchical modeling

Peter Hoff

Statistics, University of Washington


Heterogeneity of $\hat\beta_j$'s

$\hat\beta_j = (X_j^T X_j)^{-1} X_j^T y_j$

hist(BETA.OLS[,1])
hist(BETA.OLS[,2])

[Figure: histograms of the OLS intercepts (horizontal axis about 20 to 90) and the OLS slopes (horizontal axis about −10 to 20) across schools.]
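The slides use BETA.OLS without showing how it was built. A minimal sketch of one way to compute group-wise OLS estimates follows; the data are simulated stand-ins and the names y, x, g and beta.true are assumptions, not the course data.

```r
# Simulated stand-in data (assumption: the real data are not shown in the slides)
set.seed(1)
m <- 20                                   # number of groups
n <- sample(5:30, m, replace = TRUE)      # group sample sizes
g <- rep(seq_len(m), times = n)           # group label of each observation
x <- rnorm(length(g))                     # one within-group covariate
beta.true <- cbind(50 + rnorm(m, sd = 3), 4 + rnorm(m, sd = 1))
y <- beta.true[g, 1] + beta.true[g, 2] * x + rnorm(length(g), sd = 8)

# One OLS fit per group: beta.hat_j = (X_j^T X_j)^{-1} X_j^T y_j
BETA.OLS <- t(sapply(seq_len(m), function(j) {
  Xj <- cbind(1, x[g == j])
  yj <- y[g == j]
  drop(solve(crossprod(Xj), crossprod(Xj, yj)))
}))
colnames(BETA.OLS) <- c("(Intercept)", "xj")

# The first row agrees with lm() applied to group 1 alone
coef(lm(y[g == 1] ~ x[g == 1]))
BETA.OLS[1, ]
```

Each row of BETA.OLS holds one group's intercept and slope, which is the layout that hist(BETA.OLS[,1]) and plot(BETA.OLS) expect.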

Heterogeneity of $\hat\beta_j$'s

plot(BETA.OLS)

[Figure: scatterplot of OLS slope against OLS intercept, one point per school (intercepts about 30 to 90, slopes about −10 to 20).]

$\mathrm{Var}[\hat\beta_j] = \sigma^2 (X_j^T X_j)^{-1}$

Heterogeneity as a function of sample size

[Figure: scatterplot of OLS slope against OLS intercept (intercepts about 30 to 90, slopes about −10 to 20), with each school plotted as its sample size $n_j$.]

$\mathrm{Var}[\hat\beta_j] = \sigma^2 (X_j^T X_j)^{-1}$
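The variance formula explains why small schools produce the most extreme OLS estimates. A minimal sketch, assuming an illustrative within-group variance of 64 and evenly spaced covariate values (both hypothetical), compares a school of 5 students with a school of 50:

```r
# Sampling covariance of the OLS estimate for a given design: sigma^2 (X^T X)^{-1}
var.ols <- function(x, sigma2) {
  X <- cbind(1, x)
  sigma2 * solve(crossprod(X))
}

sigma2  <- 64                              # illustrative within-group variance (assumption)
x.small <- seq(-1, 1, length.out = 5)      # hypothetical school with 5 students
x.large <- seq(-1, 1, length.out = 50)     # hypothetical school with 50 students

V.small <- var.ols(x.small, sigma2)
V.large <- var.ols(x.large, sigma2)

sqrt(diag(V.small))   # standard errors of intercept and slope, n = 5
sqrt(diag(V.large))   # standard errors of intercept and slope, n = 50
```

Because the covariate is centered in both designs, the intercept variance is simply sigma2 / n, so it drops by a factor of ten between the two schools.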

Modeling heterogeneity

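In the hierarchical normal model:

$\theta_j = \{\mu_j, \sigma^2\}$

$y_{i,j} = \mu_j + \varepsilon_{i,j}$, $\{\varepsilon_{i,j}\} \sim$ i.i.d. normal$(0, \sigma^2)$

$\mu_1, \ldots, \mu_m \sim$ i.i.d. normal$(\mu, \tau^2)$

What should we do for a hierarchical regression model?

$\theta_j = \{\beta_j, \sigma^2\}$

$y_{i,j} = \beta_j^T x_{i,j} + \varepsilon_{i,j}$, $\{\varepsilon_{i,j}\} \sim$ i.i.d. normal$(0, \sigma^2)$

$\beta_1, \ldots, \beta_m \sim$ i.i.d. $p(\beta_j)$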

HLM

MVN model for across-group heterogeneity:

$\beta_1, \ldots, \beta_m \sim$ i.i.d. multivariate normal$(\beta, \Sigma_\beta)$

The parameters in this model include

• $\beta$, an across-group mean regression vector;

• $\Sigma_\beta$, a covariance matrix describing the variability of the $\beta_j$'s around $\beta$.

Ad-hoc estimates

## rough estimate of beta
apply(BETA.OLS,2,mean,na.rm=TRUE)
## (Intercept)          xj
##   50.618228    3.672483

This estimator of $\beta$ weights all schools equally. Generally, we want to assign a lower weight to schools with less data.

## rough estimate of Sigma_beta
cov(BETA.OLS,use="complete.obs")
##             (Intercept)        xj
## (Intercept)   26.795851  1.001585
## xj             1.001585 15.818939

This is a very rough estimate of $\Sigma_\beta$:

• It ignores sample size differences;

• It ignores the variability of $\hat\beta_j$ around $\beta_j$.

Var[$\hat\beta_j$'s around $\hat\beta$] ≈ Var[$\beta_j$'s around $\beta$] + Var[$\hat\beta_j$'s around $\beta_j$'s]

Sample covariance of $\hat\beta_j$'s ≈ $\Sigma_\beta$ + estimation error
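The approximate decomposition above can be made exact for a single group with the law of total covariance. The following is a sketch that treats $X_j$ as fixed and uses only two facts from the earlier slides: $E[\hat\beta_j \mid \beta_j] = \beta_j$ and $\mathrm{Var}[\hat\beta_j \mid \beta_j] = \sigma^2 (X_j^T X_j)^{-1}$.

```latex
\begin{aligned}
\mathrm{Cov}[\hat\beta_j]
  &= \mathrm{Cov}\big[\, E[\hat\beta_j \mid \beta_j] \,\big]
   + E\big[\, \mathrm{Cov}[\hat\beta_j \mid \beta_j] \,\big] \\
  &= \mathrm{Cov}[\beta_j] + \sigma^2 (X_j^T X_j)^{-1} \\
  &= \Sigma_\beta + \sigma^2 (X_j^T X_j)^{-1}
\end{aligned}
```

The second term is positive definite, so the sample covariance of the $\hat\beta_j$'s tends to overstate $\Sigma_\beta$. This is consistent with the numbers in these slides: the ad-hoc variances are about 26.8 and 15.8, while the lmer fit reported later gives about 12.2 and 1.5.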

Fixed and random effects

Recall the following:

$\mu_j \sim N(\mu, \tau^2) \Leftrightarrow \mu_j = \mu + a_j,\ a_j \sim N(0, \tau^2)$

Analogously,

$\beta_j \sim N(\beta, \Sigma_\beta) \Leftrightarrow \beta_j = \beta + b_j,\ b_j \sim N(0, \Sigma_\beta)$

Therefore, our hierarchical model says that

$$\begin{aligned}
y_j &= X_j \beta_j + \varepsilon_j \\
    &= X_j (\beta + b_j) + \varepsilon_j \\
    &= X_j \beta + X_j b_j + \varepsilon_j
\end{aligned}$$

• $\beta$ is sometimes called a fixed effect, as it is fixed across all groups.

• $b_j$ is sometimes called a random effect: "random" because it varies across groups, or "random" because the groups were randomly sampled.

A model with fixed and random effects is called a mixed-effects model.

Within-group covariance

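Recall the HNM: $y_{i,j} = \mu + a_j + \varepsilon_{i,j}$

What was the within-group covariance? For $i_1 \neq i_2$,

$$\begin{aligned}
\mathrm{Cov}[y_{i_1,j}, y_{i_2,j}] &= E[(y_{i_1,j} - \mu)(y_{i_2,j} - \mu)] \\
  &= E[(a_j + \varepsilon_{i_1,j})(a_j + \varepsilon_{i_2,j})] \\
  &= E[a_j^2] + 0 + 0 + 0 \\
  &= \tau^2
\end{aligned}$$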

Within-group covariance, matrix form

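More generally, we might want the within-group covariance matrix:

$$y_j = \begin{pmatrix} y_{1,j} \\ \vdots \\ y_{n,j} \end{pmatrix}, \qquad
\mathrm{Cov}[y_j] = \begin{pmatrix}
\mathrm{Var}[y_{1,j}] & \mathrm{Cov}[y_{1,j}, y_{2,j}] & \cdots & \mathrm{Cov}[y_{1,j}, y_{n,j}] \\
\mathrm{Cov}[y_{1,j}, y_{2,j}] & \mathrm{Var}[y_{2,j}] & \cdots & \mathrm{Cov}[y_{2,j}, y_{n,j}] \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}[y_{1,j}, y_{n,j}] & \mathrm{Cov}[y_{2,j}, y_{n,j}] & \cdots & \mathrm{Var}[y_{n,j}]
\end{pmatrix}$$

Our calculations have shown that for the HNM

$$\mathrm{Cov}[y_j] = \begin{pmatrix}
\sigma^2 + \tau^2 & \tau^2 & \cdots & \tau^2 \\
\tau^2 & \sigma^2 + \tau^2 & \cdots & \tau^2 \\
\vdots & \vdots & \ddots & \vdots \\
\tau^2 & \tau^2 & \cdots & \sigma^2 + \tau^2
\end{pmatrix}$$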

Within-group covariance, matrix form

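In general,

$\mathrm{Cov}[y_j] = E[(y_j - E[y_j])(y_j - E[y_j])^T]$

For the HLM, $y_j - E[y_j] = y_j - X_j\beta = X_j b_j + \varepsilon_j$, so

$$\begin{aligned}
\mathrm{Cov}[y_j] &= E[(X_j b_j + \varepsilon_j)(X_j b_j + \varepsilon_j)^T] \\
  &= E[X_j b_j b_j^T X_j^T] + E[\varepsilon_j \varepsilon_j^T] \\
  &= X_j \Sigma_\beta X_j^T + \sigma^2 I
\end{aligned}$$

The cross terms vanish because $b_j$ and $\varepsilon_j$ are independent with mean zero.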

Dependence and conditional independence

Thus $p(y_j \mid \beta, \Sigma_\beta, \sigma^2)$, unconditional on $b_j$, is

$y_j \sim$ multivariate normal$(X_j\beta,\ X_j \Sigma_\beta X_j^T + \sigma^2 I)$.

On the other hand, conditional on $b_j$,

$y_j \sim$ multivariate normal$(X_j\beta + X_j b_j,\ \sigma^2 I)$.

Dependence and conditional independence

Marginal dependence: If I don't know $\beta_j$ (or $b_j$), then knowing $y_{i_1,j}$ gives me a bit of information about $\beta_j$, which in turn gives me information about $y_{i_2,j}$, and so the observations are dependent: My information about $y_{i_2,j}$ depends on the value of $y_{i_1,j}$ if I don't know $\beta_j$.

Conditional independence: If I know $\beta_j$, then knowing $y_{i_1,j}$ doesn't give me any information about $y_{i_2,j}$, and so they are independent. My information about $y_{i_2,j}$ does not depend on the value of $y_{i_1,j}$ if I know $\beta_j$.

Note: Within-group covariance can be positive or negative, depending on $X_j$.

Within-group covariance

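Consider the case that $x_{i,j} = (1, x_{i,j})$ and $\beta_j = (\beta_{0,j}, \beta_{1,j})$. Here $\Sigma$ denotes $\Sigma_\beta$.

• $X_j$ is $n_j \times 2$

• $X_j \Sigma X_j^T$ is $n_j \times n_j$, and gives the covariances between observations within a group.

$$\begin{aligned}
\mathrm{Cov}[y_{1,j}, y_{2,j}] &= x_{1,j}^T \Sigma x_{2,j} \\
  &= \Sigma_{1,1} + \Sigma_{1,2}(x_{1,j} + x_{2,j}) + \Sigma_{2,2}\, x_{1,j} x_{2,j} \\
  &= \mathrm{Var}[\beta_{0,j}] + \mathrm{Var}[\beta_{1,j}]\, x_{1,j} x_{2,j} + \mathrm{Cov}[\beta_{0,j}, \beta_{1,j}](x_{1,j} + x_{2,j})
\end{aligned}$$

• Intercept variance positively correlates the observations within a group.

• Slope variance can lead to positive or negative correlation, depending on the sign of $x_{1,j} x_{2,j}$: its contribution is positive when the two covariate values lie on the same side of zero and negative when they lie on opposite sides.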
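To make the covariance formula concrete, here is a minimal sketch that builds the marginal covariance matrix of one group and checks a single entry against the element-wise expression. The variance components are copied from the lmer output reported later in these slides; the three ses values are hypothetical.

```r
# Across-group covariance of (intercept, slope), copied from the lmer output in these slides
Sigma.beta <- matrix(c(12.2231940, 0.4887692,
                       0.4887692,  1.5148005), 2, 2)
sigma2 <- 67.345                          # residual variance from the same fit

x  <- c(-1, 0, 1)                         # hypothetical ses values for three students
Xj <- cbind(1, x)

# Marginal covariance of y_j: X_j Sigma_beta X_j^T + sigma^2 I
Cov.yj <- Xj %*% Sigma.beta %*% t(Xj) + sigma2 * diag(length(x))

# Element-wise formula for students 1 and 3
cov13 <- Sigma.beta[1, 1] + Sigma.beta[1, 2] * (x[1] + x[3]) +
  Sigma.beta[2, 2] * x[1] * x[3]

Cov.yj[1, 3]
cov13
```

With ses values of −1 and 1 the covariance is about 12.22 − 1.51 = 10.71, still positive because the intercept variance dominates. With hypothetical values of −3 and 3 the same formula gives about 12.22 − 9 × 1.51 = −1.41, a negative within-group covariance.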

Sources of variation and correlation

[Figure: three scatterplot panels with horizontal axis x (−2 to 2) and vertical axis from −4 to 4; the second and third panels carry a "mu0" label.]

Fitting a HLM

Assuming data are independent across groups, the log-likelihood at a value $(\beta, \Sigma_\beta, \sigma^2)$ can be computed as follows:

0. Set ll = 0.

1. Set ll = ll + ldmvnorm($y_1$, $X_1\beta$, $X_1 \Sigma_\beta X_1^T + \sigma^2 I$).

2. Set ll = ll + ldmvnorm($y_2$, $X_2\beta$, $X_2 \Sigma_\beta X_2^T + \sigma^2 I$).

...

m. Set ll = ll + ldmvnorm($y_m$, $X_m\beta$, $X_m \Sigma_\beta X_m^T + \sigma^2 I$).

We can then numerically optimize the log-likelihood to find the MLEs.
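The steps above can be sketched in base R. The slides do not define ldmvnorm, so the version below is a stand-in written for this example (a multivariate normal log density computed through a Cholesky factor); hlm.loglik and the toy data are likewise assumptions.

```r
# Stand-in for ldmvnorm: log density of a multivariate normal via the Cholesky factor
ldmvnorm <- function(y, mu, Sig) {
  R <- chol(Sig)                                   # Sig = R^T R, R upper triangular
  z <- backsolve(R, y - mu, transpose = TRUE)      # solves R^T z = y - mu
  -0.5 * length(y) * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}

# HLM log-likelihood: sum of the group-wise marginal log densities
hlm.loglik <- function(beta, Sigma.beta, sigma2, y, X, g) {
  ll <- 0
  for (j in unique(g)) {
    Xj <- X[g == j, , drop = FALSE]
    yj <- y[g == j]
    Vj <- Xj %*% Sigma.beta %*% t(Xj) + sigma2 * diag(nrow(Xj))
    ll <- ll + ldmvnorm(yj, drop(Xj %*% beta), Vj)
  }
  ll
}

# Toy data: three groups of four observations (hypothetical values)
set.seed(2)
g <- rep(1:3, each = 4)
X <- cbind(1, rnorm(12))
y <- rnorm(12, mean = 50, sd = 8)

ll <- hlm.loglik(c(50, 0), diag(c(12, 1.5)), 64, y, X, g)
ll
```

One sanity check on the design: when Sigma.beta is the zero matrix the groups no longer share anything, and the function reduces to a sum of independent normal log densities. A function like this could be handed to a general-purpose optimizer such as optim() after parameterizing Sigma.beta so that it stays positive definite; lmer does this far more efficiently.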

Fitting the HLM with lmer

library(lme4)
fit.lme<-lmer( y.nels ~ ses.nels + (ses.nels | g.nels),REML=FALSE)

summary(fit.lme)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: y.nels ~ ses.nels + (ses.nels | g.nels)
##
##      AIC      BIC   logLik deviance df.resid
##  92553.1  92597.9 -46270.5  92541.1    12968
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.8910 -0.6382  0.0179  0.6669  4.4613
##
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  g.nels   (Intercept) 12.223   3.496
##           ses.nels     1.515   1.231    0.11
##  Residual             67.345   8.206
## Number of obs: 12974, groups: g.nels, 684
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  50.6767     0.1551   326.7
## ses.nels      4.3594     0.1231    35.4
##
## Correlation of Fixed Effects:
##          (Intr)
## ses.nels 0.007

Extracting results - fixed effects

### fixed effects
beta.hat<-fixef(fit.lme)
beta.hat
## (Intercept)    ses.nels
##   50.676704    4.359399

### variance-covariance of fixed effects estimates
VBETA<-vcov(fit.lme)
VBETA
## 2 x 2 Matrix of class "dpoMatrix"
##              (Intercept)     ses.nels
## (Intercept) 0.0240606603 0.0001309645
## ses.nels    0.0001309645 0.0151610507

### standard errors
sqrt(diag(VBETA))
## [1] 0.1551150 0.1231302

### t-values
beta.hat/sqrt(diag(VBETA))
## (Intercept)    ses.nels
##   326.70410    35.40479

Extracting results - variance components

### within-group variance
s2.hat<-sigma(fit.lme)^2

### across-group variance
VarCorr(fit.lme)$g.nels
##             (Intercept)  ses.nels
## (Intercept)  12.2231940 0.4887692
## ses.nels      0.4887692 1.5148005
## attr(,"stddev")
## (Intercept)    ses.nels
##    3.496168    1.230772
## attr(,"correlation")
##             (Intercept)  ses.nels
## (Intercept)   1.0000000 0.1135884
## ses.nels      0.1135884 1.0000000

### remove the S4 ugliness
VB<-matrix(VarCorr(fit.lme)$g.nels,2,2)
VB
##            [,1]      [,2]
## [1,] 12.2231940 0.4887692
## [2,]  0.4887692 1.5148005

Random effects estimates

B.LME<-as.matrix(ranef(fit.lme)$g.nels)
BETA.LME<-sweep( B.LME , 2 , beta.hat, "+" )

[Figure: two panels of math score (about 20 to 80) against ses (−2 to 2), showing the school-specific regression lines: "OLS regression lines" and "HLM shrinkage estimates".]

Range of shrinkage estimates

[Figure: two scatterplots comparing estimates across schools: LME intercept (about 45 to 60) against OLS intercept (about 30 to 90), and slope from the mixed model (about 3 to 6, axis labelled "MLE slope") against OLS slope (about −10 to 20).]

Formula for shrinkage estimates

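Intuitively:

$\tilde\beta_j = w_j \hat\beta_j + (1 - w_j)\hat\beta$

where $w_j$ depends on $\Sigma_\beta$ and $\sigma^2 (X_j^T X_j)^{-1}$:

• $w_j$ is big if $\sigma^2 (X_j^T X_j)^{-1}$ is small compared to $\Sigma_\beta$;

• $w_j$ is small if $\sigma^2 (X_j^T X_j)^{-1}$ is large compared to $\Sigma_\beta$.

This is almost right. The averaging has to be done using matrices:

$\tilde\beta_j = \left( X_j^T X_j/\sigma^2 + \Sigma_\beta^{-1} \right)^{-1} \left( X_j^T y_j/\sigma^2 + \Sigma_\beta^{-1}\beta \right)$

In practice, $\sigma^2, \Sigma_\beta, \beta$ are usually replaced with $\hat\sigma^2, \hat\Sigma_\beta, \hat\beta$.

Quiz: How does $\tilde\beta_j$ vary with $X_j$, $\sigma^2$ and $\Sigma_\beta$?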
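The matrix formula can be tried out directly. The sketch below is an illustration under stated assumptions: shrink.beta is a hypothetical helper name, the single school's data are simulated, and the plug-in values are rounded from the lmer output earlier in these slides.

```r
# Shrinkage estimate for one group, following the matrix formula above
shrink.beta <- function(Xj, yj, beta, Sigma.beta, sigma2) {
  iSig <- solve(Sigma.beta)
  A <- crossprod(Xj) / sigma2 + iSig               # X_j^T X_j / sigma^2 + Sigma_beta^{-1}
  b <- crossprod(Xj, yj) / sigma2 + iSig %*% beta  # X_j^T y_j / sigma^2 + Sigma_beta^{-1} beta
  drop(solve(A, b))
}

# One simulated school with six students (hypothetical data)
set.seed(3)
Xj <- cbind(1, rnorm(6))
yj <- drop(Xj %*% c(55, 2)) + rnorm(6, sd = 8)

beta       <- c(50.68, 4.36)                       # rounded fixed effects from the lmer fit
Sigma.beta <- matrix(c(12.22, 0.49, 0.49, 1.51), 2, 2)
sigma2     <- 67.3

beta.ols    <- drop(solve(crossprod(Xj), crossprod(Xj, yj)))
beta.shrunk <- shrink.beta(Xj, yj, beta, Sigma.beta, sigma2)
rbind(beta.ols, beta.shrunk, beta)
```

This also gives a way to explore the quiz: making Sigma.beta very large returns essentially the OLS estimate, while making it very small returns essentially $\beta$.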

Macro-level effects

LME regression estimates:

[Figure: scatterplot of LME slope estimates (about 3 to 6) against LME intercept estimates (about 45 to 60), one point per school.]

Questions:

• What kind of schools have big intercepts?

• What kind of schools have big slopes?

Can we relate macro-level parameters to macro-level effects?

Macro-level effects

### FLP variable
flp.school<-tapply( flp.nels , g.nels, mean)
table(flp.school)
## flp.school
##   1   2   3
## 226 257 201

### RE and FLP association
mpar()
par(mfrow=c(1,2))
boxplot(BETA.LME[,1]~flp.school,col="lightblue")
boxplot(BETA.LME[,2]~flp.school,col="lightblue")

[Figure: side-by-side boxplots of the LME intercepts (about 45 to 60) and LME slopes (about 3 to 6) by flp.school level (1, 2, 3).]

Macro-level effects

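It seems that $\beta_{0,j}$ and possibly $\beta_{1,j}$ are associated with $\mathrm{flp}_j$.

• Testing: Is there evidence for the association?

• Estimation: What is the association?

These questions can be addressed by expanding the model:

Old model:

$$\begin{aligned}
y_{i,j} &= \beta_{0,j} + \beta_{1,j} \times \mathrm{ses}_{i,j} + \varepsilon_{i,j} \\
  &= (\beta_0 + b_{0,j}) + (\beta_1 + b_{1,j}) \times \mathrm{ses}_{i,j} + \varepsilon_{i,j}
\end{aligned}$$

New model:

$y_{i,j} = (\beta_0 + \alpha_0 \times \mathrm{flp}_j + b_{0,j}) + (\beta_1 + \alpha_1 \times \mathrm{flp}_j + b_{1,j}) \times \mathrm{ses}_{i,j} + \varepsilon_{i,j}$

Note that under this model,

• The intercept for school $j$ is $\beta_{0,j} = \beta_0 + \alpha_0 \times \mathrm{flp}_j + b_{0,j}$

• The slope for school $j$ is $\beta_{1,j} = \beta_1 + \alpha_1 \times \mathrm{flp}_j + b_{1,j}$

(Alternatively, we could treat $\mathrm{flp}_j$ as a categorical variable.)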

Macro-level fixed effects

$y_{i,j} = (\beta_0 + \alpha_0 \times \mathrm{flp}_j + b_{0,j}) + (\beta_1 + \alpha_1 \times \mathrm{flp}_j + b_{1,j}) \times \mathrm{ses}_{i,j} + \varepsilon_{i,j}$

• $\alpha_0$ represents the macro effect of $\mathrm{flp}_j$ on the intercept/mean in group $j$

• $\alpha_1$ represents the macro effect of $\mathrm{flp}_j$ on the slope with $\mathrm{ses}_{i,j}$ in group $j$

Note: $\alpha_0$ and $\alpha_1$ do not vary across groups. If they did, they would be confounded with $b_{0,j}$ and $b_{1,j}$.

Note: As they are fixed across groups, they are in fact fixed effects.

Macro-level fixed effects

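$y_{i,j} = (\beta_0 + \alpha_0 \times \mathrm{flp}_j + b_{0,j}) + (\beta_1 + \alpha_1 \times \mathrm{flp}_j + b_{1,j}) \times \mathrm{ses}_{i,j} + \varepsilon_{i,j}$

Rearranging, we get

$$\begin{aligned}
y_{i,j} = {} & \beta_0 + \alpha_0 \times \mathrm{flp}_j + \beta_1 \times \mathrm{ses}_{i,j} + \alpha_1 \times \mathrm{flp}_j \times \mathrm{ses}_{i,j} \; + \\
  & b_{0,j} + b_{1,j} \times \mathrm{ses}_{i,j} \; + \\
  & \varepsilon_{i,j}
\end{aligned}$$

Fixed effects regression: $\beta_0 + \alpha_0 \times \mathrm{flp}_j + \beta_1 \times \mathrm{ses}_{i,j} + \alpha_1 \times \mathrm{flp}_j \times \mathrm{ses}_{i,j}$

Random effects regression: $b_{0,j} + b_{1,j} \times \mathrm{ses}_{i,j}$

Note:

• The covariates for the two regressions are different.

• Macro-effects do not appear in the random effects regression.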

Mixed-effects model

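$$\begin{aligned}
y_{i,j} = {} & \beta_0 + \alpha_0 \times \mathrm{flp}_j + \beta_1 \times \mathrm{ses}_{i,j} + \alpha_1 \times \mathrm{flp}_j \times \mathrm{ses}_{i,j} \; + \\
  & b_{0,j} + b_{1,j} \times \mathrm{ses}_{i,j} \; + \\
  & \varepsilon_{i,j}
\end{aligned}$$

We see the distinction between $\alpha$'s and $\beta$'s is meaningless.

We rewrite the model as

$$\begin{aligned}
y_{i,j} &= \beta_0 + \beta_1 \times \mathrm{flp}_j + \beta_2 \times \mathrm{ses}_{i,j} + \beta_3 \times \mathrm{flp}_j \times \mathrm{ses}_{i,j} + b_{0,j} + b_{1,j} \times \mathrm{ses}_{i,j} + \varepsilon_{i,j} \\
  &= \beta^T x_{i,j} + b_j^T z_{i,j} + \varepsilon_{i,j}
\end{aligned}$$

• $x_{i,j} = (1, \mathrm{flp}_j, \mathrm{ses}_{i,j}, \mathrm{flp}_j \times \mathrm{ses}_{i,j})$

• $z_{i,j} = (1, \mathrm{ses}_{i,j})$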
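The two covariate vectors correspond to two different model matrices. As a minimal sketch on a toy data frame (the values and the names d, X, Z are hypothetical), base R's model.matrix shows which columns each regression uses:

```r
# Toy data: two schools, with flp constant within school (hypothetical values)
d <- data.frame(g   = c(1, 1, 1, 2, 2),
                flp = c(1, 1, 1, 3, 3),
                ses = c(-0.5, 0.2, 1.1, -1.0, 0.4))

X <- model.matrix(~ flp * ses, data = d)   # fixed-effects covariates x_{i,j}
Z <- model.matrix(~ ses, data = d)         # random-effects covariates z_{i,j}

colnames(X)   # "(Intercept)" "flp" "ses" "flp:ses"
colnames(Z)   # "(Intercept)" "ses"

X1 <- X[d$g == 1, , drop = FALSE]          # rows belonging to school 1
Z1 <- Z[d$g == 1, , drop = FALSE]
```

The fixed-effects matrix has four columns because the product term flp × ses is a covariate in its own right, while the random-effects matrix keeps only the intercept and ses.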

Group-level representation

Micro-level representation:

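$y_{i,j} = \beta^T x_{i,j} + b_j^T z_{i,j} + \varepsilon_{i,j}$

Combining observations within a group:

$$\begin{pmatrix} y_{1,j} \\ \vdots \\ y_{n,j} \end{pmatrix}
= \begin{pmatrix} x_{1,j}^T \\ \vdots \\ x_{n,j}^T \end{pmatrix}
  \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
+ \begin{pmatrix} z_{1,j}^T \\ \vdots \\ z_{n,j}^T \end{pmatrix}
  \begin{pmatrix} b_{1,j} \\ \vdots \\ b_{q,j} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1,j} \\ \vdots \\ \varepsilon_{n,j} \end{pmatrix}$$

Each $x_{i,j}^T$ and $z_{i,j}^T$ is a row of the corresponding design matrix, and the number $q$ of random effects may differ from the number $p$ of fixed effects.

Two-level HLM: General form

$y_j = X_j\beta + Z_j b_j + \varepsilon_j$

Note: This formulation allows the fixed effects predictors to be different from the random effects predictors.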

Two-level HLM: General form

This is the general form of a two-level hierarchical linear model

$y_j = X_j\beta + Z_j b_j + \varepsilon_j$

where $b_j$ and $\varepsilon_j$ are multivariate normal.

• $\beta$ are the fixed effects coefficients;

• $X_j$ is the design matrix for the fixed effects.

• $b_j$ are the random effects coefficients for group $j$;

• $Z_j$ is the design matrix for the random effects.

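Variance components

$$E\begin{bmatrix} b_j \\ \varepsilon_j \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\mathrm{Cov}\begin{bmatrix} b_j \\ \varepsilon_j \end{bmatrix} = \begin{bmatrix} \Psi & 0 \\ 0 & \Sigma \end{bmatrix}.$$

Across-group heterogeneity: $\Psi$ is the variance-covariance of $b_1, \ldots, b_m$.

Within-group heterogeneity: $\Sigma$ is the variance-covariance of $y_{1,j}, \ldots, y_{n_j,j}$ around the group-specific regression, that is, given $b_j$.

Note: We should write $\Sigma_j$ instead of $\Sigma$, as $\mathrm{Cov}[y_j \mid b_j] = \mathrm{Cov}[\varepsilon_j] = \Sigma_j$ is an $n_j \times n_j$ matrix.

Note: In the examples so far, $\Sigma_j = \sigma^2 I_{n_j}$.

Question: What other forms for $\Sigma_j$ might be useful?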

Example: One-way random effects model, aka the HNM

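$y_{i,j} = \mu + a_j + \varepsilon_{i,j}$

$\{a_j\} \sim$ iid $N(0, \tau^2)$

$\{\varepsilon_{i,j}\} \sim$ iid $N(0, \sigma^2)$

Exercise: Express this model as $y_j = X_j\beta + Z_j b_j + \varepsilon_j$

• Regression parameters: $\beta = \mu$, $b_j = a_j$

• Design matrices: $X_j = Z_j = (1, \ldots, 1)^T$, a column of $n_j$ ones, for each $j \in \{1, \ldots, m\}$

• Covariance terms: $\Psi = \mathrm{Var}[a_j] = \tau^2$, $\Sigma = \sigma^2 I$

Exercise: Check your work by going in reverse.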

Example: One-way random effects model, aka the HNM

fit.0<-lmer(y.nels~ 1 + (1|g.nels), REML=FALSE)

summary(fit.0)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: y.nels ~ 1 + (1 | g.nels)
##
##      AIC      BIC   logLik deviance df.resid
##  93919.3  93941.7 -46956.6  93913.3    12971
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.8112 -0.6534  0.0093  0.6732  4.6999
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  g.nels   (Intercept) 23.63    4.861
##  Residual             73.71    8.585
## Number of obs: 12974, groups: g.nels, 684
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  50.9391     0.2026   251.4
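The two variance components in this output combine into the within-school correlation implied by the HNM, $\tau^2/(\tau^2 + \sigma^2)$, which follows from the within-group covariance $\tau^2$ and the total variance $\sigma^2 + \tau^2$ derived earlier. A minimal sketch using the printed values (the variable names are assumptions):

```r
# Variance components copied from the summary(fit.0) output above
tau2   <- 23.63    # across-group variance, g.nels (Intercept)
sigma2 <- 73.71    # residual (within-group) variance

# Correlation between two students in the same school under the HNM
icc <- tau2 / (tau2 + sigma2)
round(icc, 3)      # 0.243

# With the fitted object available, the same inputs could be taken from
# VarCorr(fit.0)$g.nels[1, 1] and sigma(fit.0)^2
```

So roughly a quarter of the total variance in scores is attributable to differences between schools.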

Group-specific linear regression

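$y_{i,j} = \beta^T x_{i,j} + b_j^T x_{i,j} + \varepsilon_{i,j}$

$\{b_j\} \sim$ iid $N(0, \Psi)$

$\{\varepsilon_{i,j}\} \sim$ iid $N(0, \sigma^2)$

Exercise: Express this model as $y_j = X_j\beta + Z_j b_j + \varepsilon_j$

• Design matrices: $X_j = Z_j$, the $n_j$-row matrix whose rows are $x_{1,j}^T, \ldots, x_{n_j,j}^T$, for each $j \in \{1, \ldots, m\}$

• Regression parameters: $\beta = \beta$, $b_j = b_j$

• Covariance terms: $\Psi = \mathrm{Cov}[b_j]$, $\Sigma = \sigma^2 I$

This is just a special case where $X_j = Z_j$.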

Group-specific linear regression

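fit.1<-lmer(y.nels~ ses.nels + (ses.nels|g.nels), REML=FALSE)

summary(fit.1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: y.nels ~ ses.nels + (ses.nels | g.nels)
##
##      AIC      BIC   logLik deviance df.resid
##  92553.1  92597.9 -46270.5  92541.1    12968
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -3.8910 -0.6382  0.0179  0.6669  4.4613
##
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  g.nels   (Intercept) 12.223   3.496
##           ses.nels     1.515   1.231    0.11
##  Residual             67.345   8.206
## Number of obs: 12974, groups: g.nels, 684
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  50.6767     0.1551   326.7
## ses.nels      4.3594     0.1231    35.4
##
## Correlation of Fixed Effects:
##          (Intr)
## ses.nels 0.007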

General LME

$y_{i,j} = \beta^T x_{i,j} + b_j^T z_{i,j} + \varepsilon_{i,j}$

$\{b_j\} \sim$ iid $N(0, \Psi)$

$\{\varepsilon_j\} \sim$ iid $N(0, \Sigma)$*

* modulo different sample sizes.

Review of benefits of model extension:

• Group-specific regressors should appear in $X_j$ but not $Z_j$;

• If $\{b_{k,1}, \ldots, b_{k,m}\}$ shows little variability ($\psi_{k,k}$ small), we may want to remove $x_{i,j,k}$ from the random effects model, and include it as a fixed effect only.

• Within-group covariances other than $\Sigma = \sigma^2 I$ might be useful:

  • $\Sigma$ with temporal correlation for longitudinal/panel data;

  • Unrestricted $\Sigma$ for correlated but unordered outcomes (teeth, e.g.)
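As one illustration of a within-group covariance with temporal correlation, the sketch below builds a first-order autoregressive, AR(1), matrix. This particular structure is an assumption chosen for the example and is not taken from the slides; ar1.cov, sigma2 and rho are hypothetical names.

```r
# AR(1)-type within-group covariance: Sigma[s, t] = sigma2 * rho^|s - t|
ar1.cov <- function(n, sigma2, rho) {
  sigma2 * rho^abs(outer(seq_len(n), seq_len(n), "-"))
}

# Four repeated measurements on one unit, with illustrative parameter values
Sigma.j <- ar1.cov(4, sigma2 = 2, rho = 0.5)
Sigma.j
```

Observations one time step apart have covariance sigma2 × rho, and the covariance decays geometrically with the time gap. Setting rho = 0 recovers $\Sigma_j = \sigma^2 I$.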

General LME

fit.2<-lmer(y.nels~ flp.nels + ses.nels + flp.nels*ses.nels + (ses.nels | g.nels), REML=FALSE)

summary(fit.2)


## Formula: y.nels ~ flp.nels + ses.nels + flp.nels * ses.nels + (ses.nels |

## g.nels)

##


## 92396.3 92456.0 -46190.1 92380.3 12966

##



## -3.9773 -0.6417 0.0201 0.6659 4.5202

##

## Random effects:



## ses.nels 1.572 1.254 0.06

## Residual 67.260 8.201


##

## Fixed effects:


## (Intercept) 55.3975 0.3860 143.52

## flp.nels -2.4062 0.1819 -13.23

## ses.nels 4.4909 0.3327 13.50

## flp.nels:ses.nels -0.1931 0.1587 -1.22

##


## (Intr) flp.nl ss.nls

## flp.nels -0.930

## ses.nels -0.158 0.088

## flp.nls:ss. 0.086 -0.007 -0.926
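The testing question raised earlier (is there evidence that flp is associated with the school-level coefficients?) can be approached with a likelihood ratio comparison of the two maximum likelihood fits. The sketch below uses only the deviances and residual degrees of freedom printed in these slides; the chi-square reference distribution is the usual large-sample approximation, and the variable names are assumptions.

```r
# Deviances and residual df copied from the two summaries in these slides
dev.1 <- 92541.1; df.1 <- 12968   # model with ses.nels only
dev.2 <- 92380.3; df.2 <- 12966   # model adding flp.nels and flp.nels:ses.nels

lrt <- dev.1 - dev.2              # likelihood ratio statistic
df  <- df.1 - df.2                # number of added fixed effects
p.value <- pchisq(lrt, df = df, lower.tail = FALSE)
c(statistic = lrt, df = df, p.value = p.value)

# With both fitted objects available, anova(fit.1, fit.2) carries out this comparison
```

The statistic is about 160.8 on 2 degrees of freedom, so the two flp terms jointly improve the fit a great deal. The individual t values in the output suggest this is driven by the effect of flp on the intercept (t = −13.23) rather than on the slope (t = −1.22).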
