62
Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Embed Size (px)

Citation preview

Page 1: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Using complex random effect models in epidemiology and

ecology

Dr William J. Browne

School of Mathematical Sciences

University of Nottingham

Page 2: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Outline

• Background to my research, random effect and multilevel models and MCMC estimation.

• Random effect models for complex data structures including artificial insemination and Danish chicken examples.

• Multivariate random effect models and great tit nesting behaviour.

• Efficient MCMC algorithms.• Conclusions and future work.

Page 3: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Background

• 1995-1998 – PhD in Statistics, University of Bath.“Applying MCMC methods to multilevel models.”• 1998-2003 – Postdoctoral research positions at the

Centre for Multilevel Modelling at the Institute of Education, London.

• 2003-2006 – Lecturer in Statistics at University of Nottingham.

• 2006-2007 Associate professor of Statistics at University of Nottingham.

• 2007- Professor in Biostatistics, University of Bristol.Research interests:Multilevel modelling, complex random effect modelling,

applied statistics, Bayesian statistics and MCMC estimation.

Page 4: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Random effect models

• Models that account for the underlying structure in the dataset.• Originally developed for nested structures (multilevel models), for

example in education, pupils nested within schools.• An extension of linear modelling with the inclusion of random effects.A typical 2-level model is

Here i might index pupils and j index schools.Alternatively in another example i might index cows and j index herds.The important thing is that the model and statistical methods used are

the same!

),0(~),,0(~ 22

10

eijuj

ijjijij

NeNu

euxy

Page 5: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Estimation Methods for Multilevel Models

Due to additional random effects no simple matrix formulae exist for finding estimates in multilevel models.

Two alternative approaches exist:1. Iterative algorithms e.g. IGLS, RIGLS, that alternate

between estimating fixed and random effects until convergence. Can produce ML and REML estimates.

2. Simulation-based Bayesian methods e.g. MCMC that attempt to draw samples from the posterior distribution of the model.

One possible computer program to use for multilevel models which incorporates both approaches is MLwiN.

Page 6: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

MLwiN

• Software package designed specifically for fitting multilevel models.

• Developed by a team led by Harvey Goldstein and Jon Rasbash at the Institute of Education in London over past 15 years or so. Earlier incarnations ML2, ML3, MLN.

• Originally contained ‘classical’ estimation methods (IGLS) for fitting models.

• MLwiN launched in 1998 also included MCMC estimation.

• My role in the team was as developer of the MCMC functionality in MLwiN in my time at Bath and during 4.5 years at the IOE.

Note: MLwiN core team relocated to Bristol in 2005.

Page 7: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

MCMC Algorithm

• Consider the 2-level normal response model

• MCMC algorithms usually work in a Bayesian framework and so we need to add prior distributions for the unknown parameters.

• Here there are 4 sets of unknown parameters:

• We will add prior distributions

),0(~),,0(~ 22

10

eijuj

ijjijij

NeNu

euxy

22 ,,, euu

)(),(),( 22eu ppp

Page 8: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

MCMC Algorithm (2)

One possible MCMC algorithm for this model then involves simulating in turn from the 4 sets of conditional distributions. Such an algorithm is known as Gibbs Sampling. MLwiN uses Gibbs sampling for all normal response models.

Firstly we set starting values for each group of unknown parameters,

Then sample from the following conditional distributions, firstly

To get .

)0(2

)0(2

)0()0( ,,, euu

),,,|( 2)0(

2)0()0( euuyp

)1(

Page 9: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

MCMC Algorithm (3)

We next sample from

to get , then

to get , then finally

To get . We have then updated all of the unknowns in the model. The process is then simply repeated many times, each time using the previously generated parameter values to generate the next set

),,,|( 2)0(

2)0()1( euyup

)1(u),,,|( 2

)0()1()1(2

eu uyp 2

)1(u),,,|( 2

)1()1()1(2

ue uyp 2

)1(e

Page 10: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Burn-in and estimates

Burn-in: It is general practice to throw away the first n values to allow the Markov chain to approach its equilibrium distribution namely the joint posterior distribution of interest. These iterations are known as the burn-in.

Finding Estimates: We continue generating values at the end of the burn-in for another m iterations. These m values are then averaged to give point estimates of the parameter of interest. Posterior standard deviations and other summary measures can also be obtained from the chains.

Page 11: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

So why use MCMC?

• Often gives better (in terms of bias) estimates for non-normal responses (see Browne and Draper, 2006).

• Gives full posterior distribution so interval estimates for derived quantities are easy to produce.

• Can easily be extended to more complex problems as we will see next.

• Potential downside 1: Prior distributions required for all unknown parameters.

• Potential downside 2: MCMC estimation is much slower than the IGLS algorithm.

• For more information see my book: MCMC Estimation in MLwiN – Browne (2003).

Page 12: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Extension 1: Cross-classified modelsFor example, schools by neighbourhoods. Schools will draw pupils from many different neighbourhoods and the pupils of a neighbourhood will go to several schools. No pure hierarchy can be found and pupils are said to be contained within a cross-classification of schools by neighbourhoods:

 

nbhd 1 nbhd 2 Nbhd 3

School 1 xx x

School 2 x x

School 3 xx x

School 4 x xxx

School S1 S2 S3 S4

Pupil P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12

Nbhd N1 N2 N3

Page 13: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Notation

With hierarchical models we use a subscript notation that has one subscript per level and nesting is implied reading from the left. For example, subscript pattern ijk denotes the i’th level 1 unit within the j’th level 2 unit within the k’th level 3 unit.

If models become cross-classified we use the term classification instead of level. With notation that has one subscript per classification, that also captures the relationship between classifications, notation can become very cumbersome. We propose an alternative notation introduced in Browne et al. (2001) that only has a single subscript no matter how many classifications are in the model.

Page 14: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Single subscript notationSchool S1 S2 S3 S4

Pupil P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12

Nbhd N1 N2 N3

i nbhd(i) sch(i)1 1 12 2 13 1 14 2 25 1 26 2 27 2 38 3 39 3 410 2 411 3 412 3 4

)1()3()(

)2()(0 iischinbhdi euuy

We write the model as

1)3(

4)2(

3011

1)3(

1)2(

101

euuy

euuy

where classification 2 is neighbourhood and classification 3 is school. Classification 1 always corresponds to the classification at which the response measurements are made, in this case pupils. For pupils 1 and 11 equation (1) becomes:

Page 15: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Classification diagrams

School

Pupil

Neighbourhood

School

Pupil

Neighbourhood

Nested structure where schools are contained within neighbourhoods

Cross-classified structure where pupils from a school come from many neighbourhoods and pupils from a neighbourhood attend several schools.

In the single subscript notation we lose information about the relationship (crossed or nested) between classifications. A useful way of conveying this information is with the classification diagram. Which has one node per classification and nodes linked by arrows have a nested relationship and unlinked nodes have a crossed relationship.

Page 16: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Example : Artificial insemination by donor

Women w1 w2 w3 Cycles c1 c2 c3 c4… c1 c2 c3 c4… c1 c2 c3 c4… Donations d1 d2 d1 d2 d3 d1 d2 Donors m1 m2 m3

1901 women279 donors 1328 donations12100 ovulatory cyclesresponse is whether conception occurs in a given cycle

In terms of a unit diagram:

Donor

Woman

Cycle

Donation

Or a classification diagram:

Page 17: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Model for artificial insemination data

),0(~

),0(~

),0(~

)()logit(

),1(~

2)4(

)4()(

2)3(

)3()(

2)2(

)2()(

)4()(

)3()(

)2()(i

uidonor

uidonation

uiwoman

idonoridonationiwomani

ii

Nu

Nu

Nu

uuuX

Binomialy

We can write the model as

2)4(u

0

1

2

3

4

5

6

7

2)2(u

2)3(u

Parameter Description Estimate(se)

intercept -4.04(2.30)

azoospermia 0.22(0.11)

semen quality 0.19(0.03)

womens age>35 -0.30(0.14)

sperm count 0.20(0.07)

sperm motility 0.02(0.06)

insemination to early -0.72(0.19)

insemination to late -0.27(0.10)

women variance 1.02(0.21)

donation variance 0.644(0.21)

donor variance 0.338(0.07)

Results:

Note cross-classified models can be fitted in IGLS but are far easier to fit using MCMC estimation.

Page 18: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Extension 2: Multiple membership models

 

When level 1 units are members of more than one higher level unit we describe a model for such data as a multiple membership model.

For example,

•  Pupils change schools/classes and each school/class has an effect on pupil outcomes.

• Patients are seen by more than one nurse during the course of their treatment.

Page 19: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Notation

),0(~

)2(),0(~

)(

2

2)2(

)2(

)(

)2()2(,

ei

uj

inursejijjiii

Ne

Nu

euwXBy

Note that nurse(i) now indexes the set of nurses that treat patient i and w(2)

i,j is a weighting factor relating patient i to nurse j. For example, with four patients and three nurses, we may have the following weights:

  n1(j=1) n2(j=2) n3(j=3)

p1(i=1) 0.5 0 0.5

p2(i=2) 1 0 0

p3(i=3) 0 0.5 0.5

p4(i=4) 0.5 0.5 0

4)2(

2)2(

144

3)2(

3)2(

233

2)2(

122

1)2(

3)2(

111

5.05.0)(

5.05.0)(

1)(

5.05.0)(

euuXBy

euuXBy

euXBy

euuXBy

Here patient 1 was seen by nurse 1 and 3 but not nurse 2 and so on. If we substitute the values of w(2)

i,j , i and j. from the table into (2) we get the series of equations :

Page 20: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Classification diagrams for multiple membership relationships

Double arrows indicate a multiple membership relationship between classifications.

patient

nurseWe can mix multiple membership, crossed and hierarchical structures in a single model.

patient

nurse

hospital

GP practice

Here patients are multiple members of nurses, nurses are nested within hospitals and GP

practice is crossed with both nurse and hospital.

Page 21: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Example involving nesting, crossing and multiple membership – Danish chickens

Production hierarchy10,127 child flocks 725 houses 304 farms

Breeding hierarchy10,127 child flocks200 parent flocks

farm f1 f2… Houses h1 h2 h1 h2 Child flocks c1 c2 c3… c1 c2 c3…. c1 c2 c3…. c1 c2 c3…. Parent flock p1 p2 p3 p4 p5….

Child flock

House

Farm

Parent flock

As a unit diagram: As a classification diagram:

Page 22: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Model and results

),0(~

),0(~),0(~

)()logit(

),1(~

2)4(

)4()(

2)3(

)3()(

2)2(

)2(

)(.

)4()(

)3()(

)2()2(,i

uifarm

uihouseuj

iflockpjiifarmihousejjii

ii

Nu

NuNu

euuuwXB

Binomialy

0

1

2

3

4

5

2)2(u

2)3(u

2)4(u

Parameter Description Estimate(se)

intercept -2.322(0.213)

1996 -1.239(0.162)

1997 -1.165(0.187)

hatchery 2 -1.733(0.255)

hatchery 3 -0.211(0.252)

hatchery 4 -1.062(0.388)

parent flock variance 0.895(0.179)

house variance 0.208(0.108)

farm variance 0.927(0.197)

Results:

Response is cases of salmonella

Note multiple membership models can be fitted in IGLS and this model/dataset represents roughly the most complex model that the method can handle.

Such models are far easier to fit using MCMC estimation.

Page 23: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Random effect modelling of great tit nesting behaviour

• An extension of cross-classified models to multivariate responses.

• Collaborative research with Richard Pettifor (Institute of Zoology, London), and Robin McCleery and Ben Sheldon (University of Oxford).

Page 24: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Wytham woods great tit dataset

• A longitudinal study of great tits nesting in Wytham Woods, Oxfordshire.

• 6 responses : 3 continuous & 3 binary. • Clutch size, lay date and mean

nestling mass.• Nest success, male and female

survival.• Data: 4165 nesting attempts over a

period of 34 years. • There are 4 higher-level classifications

of the data: female parent, male parent, nestbox and year.

Page 25: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Data background

Source Numberof IDs

Median#obs

Mean#obs

Year 34 104 122.5

Nestbox 968 4 4.30

Male parent 2986 1 1.39

Female parent 2944 1 1.41

Note there is very little information on each individual male and female bird but we can get some estimates of variability via a random effects model.

The data structure can be summarised as follows:

Page 26: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Diagrammatic representation of the dataset.

Nest-box

Year

1

2

3

4

5

6

70

♀16♂16

♀17♂17

♀15♂18

♀18♂11

69

♀14♂14

♀13♂15

♀15♂11

68

♀11♂10

♀13♂13

♀12♂11

67

♀11♂10

♀12♂11

♀10♂12

66

♀8♂8

♀9♂6

♀10♂9

65

♀5♂1

♀6♂5

♀7♂6

♀3♂7

64

♀1♂1

♀2♂2

♀3♂3

♀4♂4

Page 27: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Univariate cross-classified random effect modelling

• For each of the 6 responses we will firstly fit a univariate model, normal responses for the continuous variables and probit regression for the binary variables. For example using notation of Browne et al. (2001) and letting response yi be clutch size:

)2,0(~

),2)5(

,0(~)5()(

),2)4(

,0(~)4()(

),2)3(

,0(~)3()(

),2)2(

,0(~)2()(

)5()(

)4()(

)3()(

)2()(

eN

ie

uN

iyearu

uN

inestboxu

uN

ifemaleu

uN

imaleu

ie

iyearu

inestboxu

ifemaleu

imaleu

iy

Page 28: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Estimation

• We use MCMC estimation in MLwiN and choose ‘diffuse’ priors for all parameters.

• We run 3 MCMC chains from different starting points for 250k iterations each (500k for binary responses) and use the Gelman-Rubin diagnostic to decide burn-in length.

• We compared results with the equivalent classical model using the Genstat software package and got broadly similar results.

• We fit all four higher classifications and do not consider model comparison.

Page 29: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Clutch SizeParameter Estimate (S.E.) Percentage

variance β 8.808 (0.109) -

2)5(u (Year) 0.365 (0.101) 14.3%

2)4(u (Nest box) 0.107 (0.026) 4.2%

2)3(u (Male) 0.046 (0.043) 1.8%

2)2(u (Female) 0.974 (0.062) 38.1%

2e (Observation) 1.064 (0.055) 41.6%

Here we see that the average clutch size is just below 9 eggs with large variability between female birds and some variability between years. Male birds and nest boxes have less impact.

Page 30: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Lay Date (days after April 1st)Parameter Estimate (S.E.) Percentage

variance β 29.38 (1.07) -

2)5(u (Year) 37.74 (10.08) 50.3%

2)4(u (Nest box) 3.38 (0.56) 4.5%

2)3(u (Male) 0.22 (0.39) 0.3%

2)2(u (Female) 8.55 (1.03) 11.4%

2e (Observation) 25.10 (1.04) 33.5%

Here we see that the mean lay date is around the end of April/beginning of May. The biggest driver of lay date is the year which is probably indicating weather differences. There is some variability due to female birds but little impact of nest box and male bird.

Page 31: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Nestling MassParameter Estimate (S.E.) Percentage

variance β 18.829 (0.060) -

2)5(u (Year) 0.105 (0.032) 9.0%

2)4(u (Nest box) 0.026 (0.013) 2.2%

2)3(u (Male) 0.153 (0.030) 13.1%

2)2(u (Female) 0.163 (0.031) 14.0%

2e (Observation) 0.720 (0.035) 61.7%

Here the response is the average mass of the chicks in a brood at 10 days old. Note here lots of the variability is unexplained and both parents are equally important.

Page 32: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Human example

Sarah Victoria Browne

Born 20th July 2004

Birth Weight 6lb 6oz

Helena Jayne Browne

Born 22nd May 2006

Birth Weight 8lb 0oz

Father’s birth weight 9lb 13oz, Mother’s birth weight 6lb 8oz

Page 33: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Nest SuccessParameter Estimate (S.E.) Percentage

overdispersion β 0.010 (0.080) -

2)5(u (Year) 0.191 (0.058) 56.0%

2)4(u (Nest box) 0.025 (0.020) 7.3%

2)3(u (Male) 0.065 (0.054) 19.1%

2)2(u (Female) 0.060 (0.052) 17.6%

Here we define nest success as one of the ringed nestlings captured in later years. The value 0.01 for β corresponds to around a 50% success rate. Most of the variability is explained by the Binomial assumption with the bulk of the over-dispersion mainly due to yearly differences.

Page 34: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Male SurvivalParameter Estimate (S.E.) Percentage

overdispersion β -0.428 (0.041) -

2)5(u (Year) 0.032 (0.013) 41.6%

2)4(u (Nest box) 0.006 (0.006) 7.8%

2)3(u (Male) 0.025 (0.023) 32.5%

2)2(u (Female) 0.014 (0.017) 18.2%

Here male survival is defined as being observed breeding in later years. The average probability is 0.334 and there is very little over-dispersion with differences between years being the main factor. Note the actual response is being observed breeding in later years and so the real probability is higher!

Page 35: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Female survivalParameter Estimate (S.E.) Percentage

overdispersion β -0.302 (0.048) -

2)5(u (Year) 0.053 (0.018) 36.6%

2)4(u (Nest box) 0.065 (0.024) 44.8%

2)3(u (Male) 0.014 (0.017) 9.7%

2)2(u (Female) 0.013 (0.014) 9.0%

Here female survival is defined as being observed breeding in later years. The average probability is 0.381 and again there isn’t much over-dispersion with differences between nestboxes and years being the main factors.

Page 36: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Multivariate modelling of the great tit dataset

• We now wish to combine the six univariate models into one big model that will also account for the correlations between the responses.

• We choose a MV Normal model and use latent variables (Chib and Greenburg, 1998) for the 3 binary responses that take positive values if the response is 1 and negative values if the response is 0.

• We are then left with a 6-vector for each observation consisting of the 3 continuous responses and 3 latent variables. The latent variables are estimated as an additional step in the MCMC algorithm and for identifiability the elements of the level 1 variance matrix that correspond to their variances are constrained to equal 1.

Page 37: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Multivariate Model

),0(~

),)5(

,0(~)5()(

),)4(

,0(~)4()(

),)3(

,0(~)3()(

),)2(

,0(~)2()(

)5()(

)4()(

)3()(

)2()(

eMVN

ie

uMVN

iyearu

uMVN

inestboxu

uMVN

ifemaleu

uMVN

imaleu

ie

iyearu

inestboxu

ifemaleu

imaleu

iy

05.000000

005.00000

0005.0000

0002.000

0000150

000005.0

)(

],)(*6,6[6~))((

iuS

iuSinvWisharti

up

Here the vector valued response is decomposed into a mean vector plus random effects for each classification.

Inverse Wishart priors are used for each of the classification variance matrices. The values are based on considering overall variability in each response and assuming an equal split for the 5 classifications.

Page 38: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Use of the multivariate model

• The multivariate model was fitted using an MCMC algorithm programmed into the MLwiN package which consists of Gibbs sampling steps for all parameters apart from the level 1 variance matrix which requires Metropolis sampling (see Browne 2006).

• The multivariate model will give variance estimates in line with the 6 univariate models.

• In addition the covariances/correlations at each level can be assessed to look at how correlations are partitioned.

Page 39: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Partitioning of covariances

a

Laying date

Clu

tch

siz

e

Raw phenotypic correlation

(1) Causal environmental effect

(2) Female condition effect

[partitioning ofcovariance

at different levels]

Laying date

Clu

tch

Siz

e

(i) Territory

(ii) Female

(iii) Observation

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

Laying date

Clu

tch

Siz

e

b

Fecundity

Su

rviv

al

partitioning ofcovariance atdifferent levels

Raw phenotypic correlation

Fecundity

Sur

viva

l

Fecundity

Sur

viva

l

(i) Female

(ii) Observation

Page 40: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Correlations from a 1-level model• If we ignore the structure of the data and consider it as 4165

independent observations we get the following correlations:

CS LD NM NS MS

LD -0.30 X X X X

NM -0.09 -0.06 X X X

NS 0.20 -0.22 0.16 X X

MS 0.02 -0.02 0.04 0.07 X

FS -0.02 -0.02 0.06 0.11 0.21

Note correlations in bold are statistically significant i.e. 95% credible interval doesn’t contain 0.

Page 41: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Correlations in full modelCS LD NM NS MS

LD N, F, O

-0.30

X X X X

NM F, O

-0.09

F, O

-0.06

X X X

NS Y, F

0.20

N, F, O

-0.22

O

0.16

X X

MS -

0.02

-

-0.02

-

0.04

Y

0.07

X

FS F, O

-0.02

F, O

-0.02

-

0.06

Y, F

0.11

Y, O

0.21

Key: Blue +ve, Red –ve: Y – year, N – nestbox, F – female, O - observation

Page 42: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Pairs of antagonistic covariances at different classifications

There are 3 pairs of antagonistic correlations i.e. correlations with different signs at different classifications:

LD & NM : Female 0.20 Observation -0.19Interpretation: Females who generally lay late, lay heavier

chicks but the later a particular bird lays the lighter its chicks.

CS & FS : Female 0.48 Observation -0.20Interpretation: Birds that lay larger clutches are more likely

to survive but a particular bird has less chance of surviving if it lays more eggs.

LD & FS : Female -0.67 Observation 0.11Interpretation: Birds that lay early are more likely to survive

but for a particular bird the later they lay the better!

Page 43: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Prior Sensitivity

Our choice of variance prior assumes a priori • No correlation between the 6 traits. • Variance for each trait is split equally between the 5

classifications.We compared this approach with a more Bayesian

approach by splitting the data into 2 halves:In the first 17 years (1964-1980) there were 1,116

observations whilst in the second 17 years (1981-1997) there were 3,049 observations

We therefore used estimates from the first 17 years of the data to give a prior for the second 17 years and compared this prior with our earlier prior.

Page 44: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Correlations for 2 priorsCS LD NM NS MS

LD 1. N, F, O

2. N, F, O

(N, F, O)

X X X X

NM 1. F, O

2. F, O

(F, O)

1. O

2. O

(F, O)

X X X

NS 1. Y, F

2. Y, F

(Y, F)

1. Y, F, O

2. N, F, O

(N, F, O)

1. O

2. O

(O)

X X

MS -

-

-

1. M

2. M, O

-

-

-

-

1. Y

2. Y

(Y)

X

FS 1. F, O

2. F, O

(F, O)

1. F, O

2. F, O

(F, O)

-

-

-

1. Y, F

2. Y, F

(Y, F)

1. Y, O

2. Y, O

(Y, O)

Key: Blue +ve, Red –ve: 1,2 prior numbers with full data results in brackets Y – year, N – nestbox, M – male, F – female, O - observation

Page 45: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

MCMC efficiency for clutch size response

• The MCMC algorithm used in the univariate analysis of clutch size was a simple 10-step Gibbs sampling algorithm.

• The same Gibbs sampling algorithm can be used in both the MLwiN and WinBUGS software packages and we ran both for 50,000 iterations.

• To compare methods for each parameter we can look at the effective sample sizes (ESS) which give an estimate of how many ‘independent samples we have’ for each parameter as opposed to 50,000 dependent samples.

• ESS = # of iterations/,

1

)(21k

k

Page 46: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Effective Sample sizes

Parameter MLwiN WinBUGS

Fixed Effect 671 602

Year 30632 29604

Nestbox 833 788

Male 36 33

Female 3098 3685

Observation 110 135

Time 519s 2601s

The effective sample sizes are similar for both packages. Note that MLwiN is 5 times quicker!!

We will now consider methods that will improve theESS values for particular parameters. We will firstly consider the fixed effect parameter.

Page 47: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Trace and autocorrelation plots for fixed effect using standard Gibbs sampling algorithm

Page 48: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Hierarchical Centering

This method was devised by Gelfand et al. (1995) for use in nested models. Basically (where feasible) parameters are moved up the hierarchy in a model reformulation. For example:

),0(~),,0(~, 220 eijujijjij NeNueuy

is equivalent to

),0(~),,(~, 220 eijujijjij NeNey

The motivation here is we remove the strong negative correlation between the fixed and random effects by reformulation.

Page 49: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Hierarchical Centering

).,(~),,(~),,(~

),,(~),,(~,1

),,0(~),,(~),,0(~

),,0(~),,0(~

,

1212)5(

12)4(

12)3(

12)2(0

22)5(0

)5()(

2)4(

)4()(

2)3(

)3()(

2)2(

)2()(

)5()(

)4()(

)3()(

)2()(

euu

uu

eiuiyearuinestbox

uimaleuifemale

iiyearinestboximaleifemalei

NeNNu

NuNu

euuuy

In our cross-classified model we have 4 possible hierarchies up which we can move parameters. We have chosen to move the fixed effect up the year hierarchy as it’s variance had biggest ESS although this choice is rather arbitrary.

The ESS for the fixed effect increases 50-fold from 602 to 35,063 while for the year level variance we have a smaller improvement from 29,604 to 34,626. Note this formulation also runs faster 1864s vs 2601s (in WinBUGS).

Page 50: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Trace and autocorrelation plots for fixed effect using hierarchical centering formulation

Page 51: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Parameter Expansion

• We next consider the variances and in particular the between-male bird variance. • When the posterior distribution of a variance parameter has some mass near zero this can hamper the mixing of the chains for both the variance parameter and the associated random effects. • The pictures over the page illustrate such poor mixing.• One solution is parameter expansion (Liu et al. 1998). • In this method we add an extra parameter to the model to improve mixing.

Page 52: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Trace plots for between males variance and a sample male effect using standard Gibbs sampling

algorithm

Page 53: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Autocorrelation plot for male variance and a sample male effect using standard Gibbs sampling

algorithm

Page 54: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Parameter Expansion

),(~,5,..2),,(~,1,1

),,0(~),,0(~),,0(~

),,0(~),,0(~

,

1212)(0

22)5(

)5()(

2)4(

)4()(

2)3(

)3()(

2)2(

)2()(

)5()(5

)4()(4

)3()(3

)2()(20

ekvk

eiviyearvinestbox

vimalevifemale

iiyearinestboximaleifemalei

k

NeNvNv

NvNv

evvvvy

In our example we use parameter expansion for all 4 hierarchies. Note the parameters have an impact on both the random effects and their variance.

The original parameters can be found by:2

)(22

)()()( and kvkku

kik

ki vu

Note the models are not identical as we now have different prior distributions for the variances.

Page 55: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Parameter Expansion

• For the between males variance we have a 20-fold increase in ESS from 33 to 600. • The parameter expanded model has different prior distributions for the variances although these priors are still ‘diffuse’.• It should be noted that the point and interval estimate of the level 2 variance has changed from 0.034 (0.002,0.126) to 0.064 (0.000,0.172).• Parameter expansion is computationally slower 3662s vs 2601s for our example.

Page 56: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Trace plots for between males variance and a sample male effect using parameter expansion.

Page 57: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Autocorrelation plot for male variance and a sample male effect using parameter expansion.

Page 58: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Combining the two methods

),(~,5,..2),,(~,1,1

),,0(~),,(~),,0(~

),,0(~),,0(~

,

1212)(0

22)5(0

)5()(

2)4(

)4()(

2)3(

)3()(

2)2(

)2()(

)5()(

)4()(4

)3()(3

)2()(20

ekvk

eiviyearuinestbox

vimalevifemale

iiyearinestboximaleifemalei

k

NeNNv

NvNv

evvvy

Hierarchical centering and parameter expansion can easily be combined in the same model. Here we perform centering on the year classification and parameter expansion on the other 3 hierarchies.

Page 59: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Effective Sample sizes

Parameter WinBUGS originally

WinBUGS combined

Fixed Effect 602 34296

Year 29604 34817

Nestbox 788 5170

Male 33 557

Female 3685 8580

Observation 135 1431

Time 2601s 2526s

As we can see below the effective sample sizes for all parameters are improved for this formulation while running time remains approximately the same.

Page 60: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Conclusions

• In this talk we have considered using complex random effects models in three application areas.

• For the bird ecology example we have seen how these models can be used to partition both variability and correlation between various classifications to identify interesting relationships.

• We then investigated hierarchical centering and parameter expansion for a model for one of our responses. These are both useful methods for improving mixing when using MCMC.

• Both methods are simple to implement in the WinBUGS package and can be easily combined to produce an efficient MCMC algorithm.

Page 61: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

Further Work

• Incorporating hierarchical centering and parameter expansion in MLwiN.

• Investigating their use in conjunction with the Metropolis-Hastings algorithm.

• Investigate block-updating methods e.g. the structured MCMC algorithm.

• Extending the methods to our original multivariate response problem.

Page 62: Using complex random effect models in epidemiology and ecology Dr William J. Browne School of Mathematical Sciences University of Nottingham

References• Browne, W.J. (2002). MCMC Estimation in MLwiN. London: Institute of Education, University of

London • Browne, W.J. (2004). An illustration of the use of reparameterisation methods for improving

MCMC efficiency in crossed random effect models. Multilevel Modelling Newsletter 16 (1): 13-25• Browne, W.J. (2006). MCMC Estimation of ‘constrained’ variance matrices with applications in

multilevel modelling. Computational Statistics and Data Analysis. 50: 1655-1677.• Browne, W.J. and Draper D. (2006). A Comparison of Bayesian and likelihood methods for fitting

multilevel models (with discussion). Bayesian Analysis. 1: 473-550.• Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple membership multiple classification

(MMMC) models. Statistical Modelling 1: 103-124.• Browne, W.J., McCleery, R.H., Sheldon, B.C., and Pettifor, R.A. (2006). Using cross-classified

multivariate mixed response models with application to the reproductive success of great tits (Parus Major). Statistical Modelling (to appear)

• Chib, S. and Greenburg, E. (1998). Analysis of multivariate probit models. Biometrika 85, 347-361.

• Gelfand A.E., Sahu S.K., and Carlin B.P. (1995). Efficient Parametrizations For Normal Linear Mixed Models. Biometrika 82 (3): 479-488.

• Kass, R.E., Carlin, B.P., Gelman, A. and Neal, R. (1998). Markov chain Monte Carlo in practice: a roundtable discussion. American Statistician, 52, 93-100.

• Liu, C., Rubin, D.B., and Wu, Y.N. (1998) Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika 85 (4): 755-770.

• Rasbash, J., Browne, W.J., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G.,Draper, D., Langford, I., Lewis, T. (2000). A User’s Guide to MLwiN, Version 2.1, London: Institute of Education, University of London.