Applied Bayesian Inference, KSU, April 29, 2012
§❻ Hierarchical (Multi-Stage) Generalized Linear Models
Robert J. Tempelman
Introduction
• Some inferential problems require non-classical approaches, e.g.,
  – Heterogeneous variances and covariances across environments.
  – Different distributional forms (e.g., heavy-tailed or mixtures for residual/random effects).
  – High-dimensional variable selection models.
• Hierarchical Bayesian modeling provides some flexibility for such problems.
Heterogeneous variance models(Kizilkaya and Tempelman, 2005)
• Consider a study involving different subclasses (e.g., herds).
  – Mean responses are different.
  – But suppose residual variances are different too.
• Let's discuss this in the context of the LMM (linear mixed model).
Recall linear mixed model
• Given:
  y = Xβ + Zu + e,  e ~ N(0, R(ξ)),
  so that p(y | β, u, ξ) = N(y | Xβ + Zu, R(ξ)).
• R(ξ) has a certain “heteroskedastic” specification.
• ξ determines the nature of the heterogeneous residual variances.
Modeling Heterogeneous Variances
• Suppose e = (e′_11, e′_12, …, e′_st)′, with
  e_kl ~ N(0, R(ξ) = I_{n_kl} σ²_{e_kl}) and
  σ²_{e_kl} = σ²_e g_k v_l;  k = 1, 2, …, s;  l = 1, 2, …, t,
  – with σ²_e as a “fixed” intercept residual variance,
  – g_k > 0 the kth fixed scaling effect,
  – v_l > 0 the lth random scaling effect.
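This multiplicative specification is easy to simulate. The sketch below (the subclass sizes and effect values are illustrative assumptions, not the study's settings) draws residuals with variance σ²_e g_k v_l and checks the empirical variance:

```python
import random
import statistics

random.seed(1)

s2_e = 1.0                     # "intercept" residual variance
g = {1: 2.0, 2: 1.0}           # fixed scaling effects (corner: g_s = 1)
v = {1: 0.5, 2: 1.0, 3: 1.8}   # random scaling effects (e.g., herds)

def draw_residuals(k, l, n):
    """Draw n residuals e_ikl ~ N(0, s2_e * g_k * v_l)."""
    sd = (s2_e * g[k] * v[l]) ** 0.5
    return [random.gauss(0.0, sd) for _ in range(n)]

e = draw_residuals(k=1, l=3, n=200_000)
emp_var = statistics.pvariance(e)   # theory: 1.0 * 2.0 * 1.8 = 3.6
```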
Subjective and Structural Priors
• “Intercept” variance σ²_e: subjective flat or conjugate vague inverted-gamma (IG) prior p(σ²_e).
• Invoke typical constraints for “fixed effects”:
  – Corner parameterization: g_s = 1.
  – Flat or vague IG prior p(g_k); k = 1, 2, …, s.
• Structural prior for “random effects”:
  – i.e., v_l ~ IG(α_e, α_e − 1):
    p(v_l | α_e) = [(α_e − 1)^{α_e} / Γ(α_e)] v_l^{−(α_e + 1)} exp(−(α_e − 1)/v_l)
  – E(v_l) = 1;  Var(v_l) = 1/(α_e − 2);  CV(v_l | α_e) = 1/√(α_e − 2).
• α_e functions like a “variance component” for residual variances → hyperparameter.
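The stated moments of the structural prior v_l ~ IG(α_e, α_e − 1) can be verified by Monte Carlo; a minimal sketch, drawing IG variates as reciprocals of Gamma variates:

```python
import random
import statistics

random.seed(2)

def draw_v(alpha_e, n):
    """n draws from IG(alpha_e, alpha_e - 1), via reciprocal Gamma draws."""
    scale = 1.0 / (alpha_e - 1.0)   # Gamma scale = 1/rate
    return [1.0 / random.gammavariate(alpha_e, scale) for _ in range(n)]

alpha_e = 15.0
v = draw_v(alpha_e, 200_000)
mean_v = statistics.fmean(v)          # theory: E(v_l) = 1
cv_v = statistics.pstdev(v) / mean_v  # theory: 1/sqrt(alpha_e - 2) ~ 0.277
```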
Remaining priors
• “Classical” random effects: p(u | φ) = N(u | 0, G(φ)).
• “Classical” fixed effects: p(β).
• “Classical” random effects VC: p(φ).
• Hyperparameter (Albert, 1988): p(α_e) = 1/(1 + α_e)².
  – SAS PROC MCMC doesn’t seem to handle this: the prior can’t be written as a function of the corresponding parameter.
What was the last prior again???
p(α_e) = 1/(1 + α_e)² is exactly the prior induced by placing a Uniform(0,1) prior on 1/(1 + α_e), or equivalently on α_e/(1 + α_e).

Different diffuse priors can have different impacts on posterior inferences if the data information is poor! (Rosa et al., 2004)
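A quick way to see the equivalence: draw u ~ Uniform(0,1) and transform to α_e = u/(1 − u); the induced density is 1/(1 + α_e)², whose median is 1 and which puts probability 3/4 on α_e < 3. A sketch:

```python
import random
import statistics

random.seed(3)

u = [random.random() for _ in range(200_000)]   # u ~ Uniform(0, 1)
alpha = [ui / (1.0 - ui) for ui in u]           # alpha = u / (1 - u)

median_alpha = statistics.median(alpha)                   # theory: 1.0
frac_below_3 = sum(a < 3.0 for a in alpha) / len(alpha)   # theory: 0.75
```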
Joint Posterior Density
• LMM:
  p(β, u, γ, v, φ, α_e, σ²_e | y)
    ∝ p(y | β, u, γ, v, σ²_e) p(β) p(u | φ) p(φ)
      × [∏_{k=1}^{s} p(g_k)] [∏_{l=1}^{t} p(v_l | α_e)] p(α_e) p(σ²_e)
Details on FCD
• All provided by Kizilkaya and Tempelman (2005).
  – All are recognizable except that for α_e:
    p(α_e | β, u, φ, γ, v, y) ∝ [(α_e − 1)^{α_e} / Γ(α_e)]^t [∏_{l=1}^{t} v_l]^{−(α_e + 1)} exp(−(α_e − 1) ∑_{l=1}^{t} 1/v_l) p(α_e)
  – Use a Metropolis-Hastings random walk on log(α_e), using a normal proposal density.
• For MH, it is generally a good idea to transform parameters so that the parameter space is the entire real line, but don't forget to include the Jacobian of the transform.
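A minimal sketch of such an MH step, under the assumption (my choice for this illustration, since IG(α_e, α_e − 1) needs α_e > 1) that we transform to η = log(α_e − 1) so the proposal lives on the whole real line; the Jacobian dα_e/dη = α_e − 1 enters the acceptance ratio on the log scale. The v's are simulated here.

```python
import math
import random

random.seed(4)

# Simulated "data": t subclass scalers v_l ~ IG(alpha_true, alpha_true - 1)
alpha_true, t = 10.0, 200
v = [1.0 / random.gammavariate(alpha_true, 1.0 / (alpha_true - 1.0))
     for _ in range(t)]
sum_log_v = sum(math.log(x) for x in v)
sum_inv_v = sum(1.0 / x for x in v)

def log_post(alpha):
    """log of prod_l IG(v_l | alpha, alpha - 1) times p(alpha) = (1+alpha)^-2."""
    return (t * (alpha * math.log(alpha - 1.0) - math.lgamma(alpha))
            - (alpha + 1.0) * sum_log_v
            - (alpha - 1.0) * sum_inv_v
            - 2.0 * math.log(1.0 + alpha))

eta = 0.0                      # eta = log(alpha - 1); start at alpha = 2
draws = []
for _ in range(20_000):
    eta_new = eta + random.gauss(0.0, 0.3)   # normal random-walk proposal
    # log-Jacobian of alpha = exp(eta) + 1 is eta itself
    log_ratio = (log_post(math.exp(eta_new) + 1.0) + eta_new
                 - log_post(math.exp(eta) + 1.0) - eta)
    if math.log(random.random()) < log_ratio:
        eta = eta_new
    draws.append(math.exp(eta) + 1.0)

post_mean = sum(draws[5_000:]) / len(draws[5_000:])
```

With t = 200 subclasses the posterior concentrates near the simulating value, so the posterior mean lands in the vicinity of 10.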
Small simulation study
• Two different levels of heterogeneity:
  – α_e = 5 vs. α_e = 15, i.e., CV(v_l | α_e) = 1/√(α_e − 2) ≈ 0.58 vs. 0.28.
  – σ²_e = 1.
• Two different average random subclass sizes:
  – n_e = 10 vs. n_e = 30.
  – 20 subclasses (habitats) in total.
• Also modeled fixed effects:
  – Sex (2 levels) for location and dispersion (g_1 = 2, g_2 = 1).
• Additional set of random effects:
  – 30 levels (e.g., sires) cross-classified with habitats.
PROC MIXED code
• “Fixed” effects models for residual variances:
  – REML estimates of “herd” variances expressed relative to the average.

proc mixed data=phenotype;
  class sireid habitatid sexid;
  model y = sexid;
  random intercept / subject = habitatid;
  random intercept / subject = sireid;
  repeated / local = exp(sexid habitatid);
  ods output covparms=covparms;
run;

• Models σ²_{e_kl} = σ²_e g_k v_l; k = 1, 2, …, s; l = 1, 2, …, t, but treats v_l as a fixed effect.
MCMC analyses (code available online): posterior summaries of α_e.

α_e = 15; n_e = 10:
  Mean    Median  Std Dev  1st Pctl  99th Pctl
  58.84   20.36   99.72    3.755     562.6

α_e = 5; n_e = 10:
  Mean    Median  Std Dev  1st Pctl  99th Pctl
  4.531   3.416   3.428    2.073     22.24

α_e = 5; n_e = 30:
  Mean    Median  Std Dev  1st Pctl  99th Pctl
  3.683   3.382   1.302    2.081     8.006

α_e = 15; n_e = 30:
  Mean    Median  Std Dev  1st Pctl  99th Pctl
  67.24   41.25   85.30    7.918     487.5
[Figure: MCMC (o) and REML (•) estimates of subclass residual variances plotted against the truth (v_l) for each of the four scenarios (α_e = 15, n_e = 10; α_e = 5, n_e = 10; α_e = 5, n_e = 30; α_e = 15, n_e = 30), spanning high- to low-shrinkage situations.]
Heterogeneous variances for ordinal categorical data
• Suppose we had a situation where residual variances were heterogeneous on the underlying latent scale, i.e., a greater frequency of extreme vs. intermediate categories in some subclasses.

[Figure: densities of liability for Herd 1, Herd 2, and Herd 3, illustrating different latent-scale variances.]
Heterogeneous variances for ordinal categorical data?
• On the liability scale:
  ℓ = Xβ + Zu + e,  e ~ N(0, R(ξ)),
  so that p(ℓ | β, u, ξ) = N(ℓ | Xβ + Zu, R(ξ)).
• R(ξ) has a certain “heteroskedastic” specification.
• ξ determines the nature of the heterogeneous variances.
Cumulative probit mixed model (CPMM)
• For CPMM, the liability ℓ_i maps to Y_i:
  Y_i = 1 if ℓ_i < τ_1,
  Y_i = 2 if τ_1 ≤ ℓ_i < τ_2,
  ⁞
  Y_i = C if ℓ_i ≥ τ_{C−1};
  so that
  p(y | ℓ, τ) = ∏_{k=1}^{s} ∏_{l=1}^{t} ∏_{i=1}^{n_kl} ∏_{j=1}^{C} [1(τ_{j−1} ≤ ℓ_ikl < τ_j)]^{1(y_ikl = j)}.
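The threshold mapping is just a binning rule; a sketch with illustrative thresholds τ_1 = −1, τ_2 = 1.5 (so C = 3):

```python
import bisect

tau = [-1.0, 1.5]   # thresholds tau_1 < tau_2, so C = 3 categories

def category(liability):
    """Return j such that tau_{j-1} <= liability < tau_j (1-based category)."""
    return bisect.bisect_right(tau, liability) + 1

cats = [category(x) for x in (-2.0, 0.0, 3.0)]   # -> [1, 2, 3]
```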
Modeling Heterogeneous Variances in CPMM
• Suppose e = (e′_11, e′_12, …, e′_st)′, with
  e_kl ~ N(0, R(ξ) = I_{n_kl} σ²_{e_kl}) and
  σ²_{e_kl} = σ²_e g_k v_l;  k = 1, 2, …, s;  l = 1, 2, …, t,
  – with σ²_e as a “fixed” reference residual variance,
  – g_k > 0 the kth fixed scaling effect,
  – v_l > 0 the lth random scaling effect,
  – all other priors the same as with the LMM.
Joint Posterior Density in CPMM
• CPMM:
  p(ℓ, τ, β, u, γ, v, φ, α_e, σ²_e | y)
    ∝ p(y | ℓ, τ) p(ℓ | β, u, γ, v, σ²_e) p(β) p(u | φ) p(φ)
      × [∏_{k=1}^{s} p(g_k)] [∏_{l=1}^{t} p(v_l | α_e)] p(α_e) p(σ²_e)
Another small simulation study
• Two different levels of heterogeneity:
  – α_e = 5 vs. α_e = 15.
• Average random subclass size: n_e = 30.
  – 20 subclasses (habitats) in total.
• Also modeled fixed effects:
  – Sex (2 levels) for location and dispersion.
• Additional set of random effects:
  – 30 levels (e.g., sires) cross-classified with habitats.
• Thresholds: τ_1 = −1, τ_2 = 1.5.
α_e = 15; n_e = 30 (ESS = 391):
  Mean    Median  Std Dev  1st Pctl  99th Pctl
  49.44   23.21   75.31    5.018     404.7

α_e = 5; n_e = 30 (ESS = 1422):
  Mean    Median  Std Dev  1st Pctl  99th Pctl
  5.018   4.344   2.118    2.125     11.56
[Figure: posterior means of subclass residual variances vs. truth (v_l), for α_e = 5, n_e = 30 and α_e = 15, n_e = 30.]
• No PROC GLIMMIX counterpart.
• Another alternative: heterogeneous thresholds!!! (Varona and Hernandez, 2006)
Additional extensions
• PhD work by Fernando Cardoso:
  – Heterogeneous residual variances as functions of multiple fixed effects and multiple random effects, e.g.,
    σ²_{e_j} = (σ²_e ∏_{k=1}^{K} g_k^{p_jk} ∏_{l=1}^{L} v_l^{q_jl}) / w_j
  – Heterogeneous t-error (Cardoso et al., 2007):
    w_j | ν ~ Gamma(ν/2, ν/2), so that the t-error is outlier-robust.
  – Helps separate the effects of outliers from those of high-variance subclasses.
  – Other candidates for the distribution of w_j lead to alternative heavy-tailed specifications (Rosa et al., 2004).
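The Gamma scale-mixture construction of t-error can be checked directly: with w_j ~ Gamma(ν/2, ν/2) and e_j | w_j ~ N(0, σ²_e/w_j), marginally e_j/σ_e is Student-t with ν degrees of freedom (variance ν/(ν − 2) for ν > 2). A sketch:

```python
import random
import statistics

random.seed(6)

nu, s2_e = 6.0, 1.0

def draw_e():
    """One heavy-tailed residual: e | w ~ N(0, s2_e/w), w ~ Gamma(nu/2, nu/2)."""
    w = random.gammavariate(nu / 2.0, 2.0 / nu)   # (shape, scale); rate nu/2
    return random.gauss(0.0, (s2_e / w) ** 0.5)

e = [draw_e() for _ in range(300_000)]
emp_var = statistics.pvariance(e)   # theory for t_6: nu/(nu - 2) = 1.5
```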
Posterior densities of breed-group heritabilities in multibreed Brazilian cattle (Fernando Cardoso)

[Figure, panel a: Gaussian homoskedastic model; posterior densities of heritability (0 to 0.5) for Nelore, Hereford, F1, and A38, based on homogeneous residual variance (Cardoso and Tempelman, 2004).]
[Figure, panel c: Gaussian heteroskedastic model; posterior densities of heritability (0 to 0.5) for the same breed groups, based on heterogeneous residual variances (Fixed: breed additive and dominance, sex; Random: CG) (Cardoso et al., 2005).]
• Some of the most variable herds were exclusively Herefords.
• Estimated CV of CG-specific σ²_e: 0.72 ± 0.06.
• F1 σ²_e = 0.70 ± 0.16 × purebred σ²_e.
Heterogeneous G-side scale parameters
• Could be accommodated in a similar manner.
• In fact, the borrowing of information across subclasses in estimating subclass-specific random effects variances is even more critical.
  – Low information per subclass? REML estimates will converge to zero.
Heterogeneous bivariate G-side and R-side inferences!
• Bello et al. (2010, 2012) investigated the herd-level and cow-level relationship between 305-day milk production and calving interval (CI) as a function of various factors:
  milk_j = (milk fixed effects) + z′_{milk,j} u_milk + e_{milk,j}
  CI_j = (CI fixed effects) + z′_{CI,j} u_CI + e_{CI,j}
  – u_milk, u_CI: CG (herd-year) effects.
  – e_{milk,j}, e_{CI,j}: residual (cow) effects.
Herd-Specific and Cow-Specific (Co)variances

• Cow j:
  Var(e_j) = [ σ²_{e_milk,j}      σ_{e_milk,CI,j} ]
             [ σ_{e_milk,CI,j}    σ²_{e_CI,j}     ]
• Herd k:
  Var(u_k) = [ σ²_{u_milk,k}      σ_{u_milk,CI,k} ]
             [ σ_{u_milk,CI,k}    σ²_{u_CI,k}     ]
• Let b_{u_k} = σ_{u_milk,CI,k} / σ²_{u_milk,k} and b_{e_j} = σ_{e_milk,CI,j} / σ²_{e_milk,j}.
Rewrite this

• Cow j:
  σ_{e_milk,CI,j} = b_{e_j} σ²_{e_milk,j}
  σ²_{e_CI,j} = σ²_{e_CI|milk,j} + b²_{e_j} σ²_{e_milk,j}
• Herd k:
  σ_{u_milk,CI,k} = b_{u_k} σ²_{u_milk,k}
  σ²_{u_CI,k} = σ²_{u_CI|milk,k} + b²_{u_k} σ²_{u_milk,k}
• Model each of these terms (the regressions b_{e_j} and b_{u_k} and the variances) as functions of fixed and random effects, in addition to the classical β and u!
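This rewrite is the regression (square-root-free Cholesky) factorization of a 2×2 covariance matrix; a sketch with arbitrary illustrative numbers shows that the factorization reconstructs the original (co)variances exactly:

```python
# Illustrative (made-up) cow-level (co)variances
var_milk = 4.0        # sigma^2_{e_milk,j}
cov_milk_ci = 1.2     # sigma_{e_milk,CI,j}
var_ci = 2.0          # sigma^2_{e_CI,j}

b = cov_milk_ci / var_milk                    # regression of CI on milk
var_ci_given_milk = var_ci - b * cov_milk_ci  # conditional (residual) variance

# Reconstruct the original parameterization from (var_milk, b, var_ci_given_milk)
recon_cov = b * var_milk
recon_var_ci = var_ci_given_milk + b ** 2 * var_milk
```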
bST effect on Herd-Level Association (b_{u_k}) between Milk Yield and Calving Interval

[Figure: bar chart of days of CI per 100 kg milk yield by % of herd on bST supplementation: 0%: 0.01a; <50%: 0.07a; ≥50%: −1.37b. Means with different letters differ, P < 0.0001.]
• bST: bovine somatotropin.
Number of times milking/day on Cow-Level Association (b_{e_j}) between Milk Yield and Calving Interval

[Figure: bar chart of days of CI per 100 kg milk yield by daily milking frequency: 2X: 0.57a; 3+X: 0.45b. Means with different letters differ, P < 0.0001.]
• Overall antagonism: 0.51 ± 0.01 days longer CI per 100 kg increase in cumulative 305-d milk yield.
Variability between Herds for b_{e_j} (Random effects)
• DIC_M0 − DIC_M1 = 243.
• Estimated variance of the herd-year-specific regressions: 0.030 ± 0.005.
• Expected range between extreme herd-years (Ott and Longnecker, 2001): ±2 SD around the overall 0.51, i.e., about 0.16 to 0.86 days of CI per 100 kg herd milk yield, a range of ≈ 0.7 d/100 kg.
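The "expected range" is empirical-rule arithmetic: roughly 95% of herd-year slopes fall within ±2 SD of the overall regression of 0.51, with SD = √0.030:

```python
import math

overall_b = 0.51   # overall days of CI per 100 kg milk (posterior mean)
var_b = 0.030      # estimated variance of herd-year-specific regressions

sd_b = math.sqrt(var_b)
low, high = overall_b - 2.0 * sd_b, overall_b + 2.0 * sd_b
expected_range = high - low   # = 4 * sd_b, about 0.7 d per 100 kg
```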
Whole Genome Selection (WGS)
• Model: y_i = (fixed effects, e.g., age, parity) + z′_i g + u_i + e_i;  i = 1, 2, …, n, where
  – y_i: phenotype;
  – z_i = (z_i1, z_i2, …, z_im)′: genotypes;
  – g = (g_1, g_2, …, g_m)′: SNP allelic substitution effects, capturing LD (linkage disequilibrium);
  – u = {u_i} ~ N(0, A σ²_u): polygenic effects;
  – e = {e_i} ~ N(0, I σ²_e): residual effects.
• Note that m >>> n.
Typical WGS specifications
• Random effects specification on g (Meuwissen et al., 2001):
  – BLUP: g ~ N(0, I σ²_g).
  – BayesA/B: g ~ N(0, diag{σ²_{g_j}}), with
    σ²_{g_j} ~ scaled inverse chi-square(ν, S²) with probability (1 − π), and
    σ²_{g_j} = 0 with probability π.
  – BayesA = BayesB with π = 0.
  – “Random effects/Bayes” modeling allows m >> n:
    • borrowing of information across genes.
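A sketch of drawing SNP effects under this BayesB-style spike-slab prior (the hyperparameter values π, ν, S² below are illustrative assumptions; the scaled inverse chi-square is drawn as a reciprocal Gamma):

```python
import random

random.seed(9)

pi = 0.9            # assumed probability that a SNP has zero effect
nu, S2 = 4.2, 0.01  # illustrative slab hyperparameters

def draw_g():
    """One draw of a SNP effect g_j under the BayesB prior."""
    if random.random() < pi:
        return 0.0
    # scaled inverse chi-square(nu, S2) == 1/Gamma(shape=nu/2, rate=nu*S2/2)
    s2_gj = 1.0 / random.gammavariate(nu / 2.0, 2.0 / (nu * S2))
    return random.gauss(0.0, s2_gj ** 0.5)

g = [draw_g() for _ in range(50_000)]
frac_zero = sum(gj == 0.0 for gj in g) / len(g)  # should be close to pi
```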
First-order antedependence specifications (Yang and Tempelman, 2012)
• Instead of independence, specify first-order antedependence on the SNP marker genetic effects:
  SNP 1: g_1 = δ_1,
  SNP 2: g_2 = t_21 g_1 + δ_2,
  SNP 3: g_3 = t_32 g_2 + δ_3,
  ⁞
  SNP m: g_m = t_{m,m−1} g_{m−1} + δ_m.
• Priors:
  – t_{j,j−1} ~ N(μ_t, σ²_t).
  – Ante-BayesB: δ ~ N(0, diag{σ²_{δ_j}}), with σ²_{δ_j} ~ scaled inverse chi-square(ν, S²) with probability (1 − π) and σ²_{δ_j} = 0 with probability π.
  – Ante-BayesA = Ante-BayesB with π = 0.
• The t_{j,j−1} induce nonzero correlations between effects of adjacent SNPs, with corr(g_j, g_{j−1}) a function f(t_{j,j−1}).
• Random effects modeling facilitates borrowing of information across SNP intervals.
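The antedependence construction is easy to simulate; the sketch below uses a constant t (an illustrative simplification of the SNP-specific t_{j,j−1}) and standard normal δ's, and checks the induced correlation between adjacent SNP effects, t/√(1 + t²):

```python
import random
import statistics

random.seed(10)

t_coef = 0.8   # illustrative constant antedependence parameter
n_rep = 100_000

g1, g2 = [], []
for _ in range(n_rep):
    a = random.gauss(0.0, 1.0)                # g_1 = delta_1
    b = t_coef * a + random.gauss(0.0, 1.0)   # g_2 = t * g_1 + delta_2
    g1.append(a)
    g2.append(b)

m1, m2 = statistics.fmean(g1), statistics.fmean(g2)
cov = sum((x - m1) * (y - m2) for x, y in zip(g1, g2)) / n_rep
corr_12 = cov / (statistics.pstdev(g1) * statistics.pstdev(g2))
# theory: corr(g1, g2) = t / sqrt(1 + t^2) = 0.8 / sqrt(1.64)
```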
Results from a simulation study
• The advantage of Ante-BayesA/B over conventional BayesA/B increases with increasing marker density (LD = linkage disequilibrium).

[Figure: accuracy of genomic EBV (0.70 to 1.00) vs. LD level (r², 0.18 to 0.32) for BayesA, Ante-BayesA, BayesB, and Ante-BayesB; P < .001 for BayesA/B vs. Ante-BayesA/B.]
Other examples of multi-stage hierarchical modeling?
• Spatial variability in agronomy using t-error (Besag and Higdon, 1999).
• Ecology (Cressie et al., 2009).
• Conceptually, one could model heterogeneous and spatially correlated overdispersion parameters in Poisson/binomial GLMMs as well!
What I haven’t covered in this workshop
• Model choice criteria:
  – Bayes factors (generally too challenging to compute).
  – DIC (deviance information criterion).
• Bayesian model averaging:
  – Advantage over conditioning on one model (e.g., for multiple regression involving many covariates).
• Posterior predictive checks:
  – Great for diagnostics.
• Residual diagnostics based on latent residuals for GLMM (Johnson and Albert, 1999).
Some closing comments/opinions
• Merit of Bayesian inference:
  – Marginal for LMM with classical assumptions.
    • GLS with REML seems to work fine.
  – Of greater benefit for GLMM.
    • Especially binary data with complex error structures.
  – Greatest benefit for multi-stage hierarchical models.
    • Larger datasets are nevertheless required than with more classical (homogeneous) assumptions.
Implications
• Increased programming capabilities/skills are needed.
  – Cloud/cluster computing wouldn't hurt.
• Don't go in blind with canned Bayesian software.
  – Watch the diagnostics (e.g., trace plots) like a hawk!
• Don't go on autopilot.
  – WinBUGS/PROC MCMC work nicely for the simpler stuff.
  – Highly hierarchical models require statistical/algorithmic insights; do recognize limitations in parameter identifiability (Cressie et al., 2009).
National Needs PhD Fellowships, Michigan State University
Focus: integrated training in quantitative, statistical, and molecular genetics, and breeding of food animals.
Features:
• Research in animal genetics/genomics with a collaborative faculty team
• Industry internship experience
• Public policy internship in Washington, DC
• Statistical consulting center experience
• Teaching or Extension/outreach learning opportunities
• Optional affiliation with inter-departmental programs in Quantitative Biology, Genetics, and others
Faculty team: C. Ernst, J. Steibel, R. Tempelman, R. Bates, H. Cheng, T. Brown, B. Alston-Mills.
Eligibility is open to citizens and nationals of the US. Women and underrepresented groups are encouraged to apply.
Thank You!!!
• Any more questions???
http://actuary-info.blogspot.com/2011/05/homo-actuarius-bayesianes.html