33
A Bayesian hierarchical model to integrate dietary exposure and biomarker measurements for the risk of cancer Marta Pittavino, PhD Senior Research Associate, Research Center for Statistics UNIGE, Geneva - Switzerland 1 / 23

A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

A Bayesian hierarchical model tointegrate dietary exposure and biomarkermeasurements for the risk of cancer

Marta Pittavino, PhD

Senior Research Associate, Research Center for StatisticsUNIGE, Geneva - Switzerland

1 / 23

Page 2: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Summary

Outline on measurement error

- Combine biomarkers with self-reports

A Bayesian hierarchical model with three components

- An application from the EPIC study

Concluding remarks

2 / 23

Page 3: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Outline on measurement error

- Difference between the actual true value and themeasured value

- Self-reported assessments (questionnaire: Q and24-HDR: R) of dietary exposure prone to random andsystematic measurement errors

- Estimates of the association between dietary factorsand risk of disease can be biased

It has been suggested to complement self-reports withobjective measurements, as dietary biomarkers (M)

3 / 23

Page 4: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Outline on measurement error

- Difference between the actual true value and themeasured value

- Self-reported assessments (questionnaire: Q and24-HDR: R) of dietary exposure prone to random andsystematic measurement errors

- Estimates of the association between dietary factorsand risk of disease can be biased

It has been suggested to complement self-reports withobjective measurements, as dietary biomarkers (M)

3 / 23

Page 5: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Outline on measurement error

- Difference between the actual true value and themeasured value

- Self-reported assessments (questionnaire: Q and24-HDR: R) of dietary exposure prone to random andsystematic measurement errors

- Estimates of the association between dietary factorsand risk of disease can be biased

It has been suggested to complement self-reports withobjective measurements, as dietary biomarkers (M)

3 / 23

Page 6: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Outline on measurement error

- Difference between the actual true value and themeasured value

- Self-reported assessments (questionnaire: Q and24-HDR: R) of dietary exposure prone to random andsystematic measurement errors

- Estimates of the association between dietary factorsand risk of disease can be biased

It has been suggested to complement self-reports withobjective measurements, as dietary biomarkers (M)

3 / 23

Page 7: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Why combine self-reports and biomarkers?

Objective biomarkers have long integrated self-reported

dietary (or physical activity) measurements

- Several validation studies with recovery (OPEN, NHS,

WHI, EPIC) and concentration biomarker

measurements (Ferrari et al., AJE, 2007)

- Biomarkers also integrated in diet-disease association

studies (Freedman et al., EPI, 2011; Tasevska et al., AJE, 2018;

Prentice et al., AJE, 2017)

4 / 23

Page 8: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Motivating scenario

Plasma levels of Vit-B6 were inversely associated with

kidney cancer risk in EPIC (Johansson, JNCI, 2014)

- Serum levels of Vit-B6, unlike Q measurements, were

inversely related to lung cancer (Johansson, JAMA, 2010)

- Dietary folate inversely related to breast cancer (de

Battle, JNCI, 2014 ), unlike plasma levels (Matejcic, IJC,

2017)

5 / 23

Page 9: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Application

Data from two EPIC nested case-control studies on

kidney (n=1,108) and lung (n=1,764) cancers

- Q, R and M measurements on vit-B6 and folate

R measurements, available on ∼10% of the sample,

were imputed

Data were log-transformed to approximate normality

6 / 23

Page 10: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

The EPIC Study

Prospective cohort with500,000 participants from23 centres

- Dietary and lifestyleexposures assessed atbaseline

- Biological samples collectedat baseline from 80%disease-free participants

7 / 23

Page 11: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

The setting

Develop a Bayesian latent factor hierarchical model to:

- Integrate self-reported (Q) and (R) with concentration

biomarker (M) measurements

- Evaluate the measurement error structure

- Estimate the unknown association between unknown

true dietary intakes (X) and risk of disease (Y)

8 / 23

Page 12: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

How did we get here?

Long time ago, Ray Caroll and Pietro Ferrari discussed

the idea of a Bayesian model for measurement errors

- DQs and 24-HDRs of energy and fat intakes were

related to the risk of breast cancer in EPIC

Pietro Ferrari and Martyn Plummer submitted an

applicationto WCRF in 2013.

First results were produced last year

9 / 23

Page 13: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

10 / 23

Page 14: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

How did we get here?

Long time ago, Ray Caroll and Pietro Ferrari discussed

the idea of a Bayesian model for measurement errors

- Combine DQs and 24-HDRs of energy and fat intakes,

and relate them to risk of breast cancer in EPIC

Pietro Ferrari and Martyn Plummer submitted an

applicationto WCRF in 2013

- First results were produced in 2017

11 / 23

Page 15: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

The Bayesian hierarchical model

1) Exposure model: p(X |µ,Σ)

2) Measurement model:p(Q,R ,M |X )

3) Disease model: p(Y |X )

µ Σ

XR

Q(1)

M

(2)

(3)

Y

12 / 23

Page 16: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

The Bayesian hierarchical model

1) Exposure model: p(X |µ,Σ)

2) Measurement model:p(Q,R ,M |X )

3) Disease model: p(Y |X )

µ Σ

XR

Q(1)

M

(2)

(3)

Y

13 / 23

Page 17: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

The Bayesian hierarchical model

1) Exposure model:p(X |µ,Σ)

2) Measurement model:p(Q,R ,M |X )

3) Disease model: p(Y |X )

µ Σ

XR

Q(1)

M

(2)

(3)

Y

14 / 23

Page 18: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

1. The exposure model

Xik : vector of unknown true (latent factor) dietaryintake of vit-B6 and folate (k = 1, 2), withi = 1, . . . , n:

Xik ∼ MVN(µk ,ΣX ) = MVN(0,ΣX )

Σ−1X ∼ Wishart(DxDxDx , rxrxrx)

- - - - - - - - - - - - - - - - - - - - - - - - - -

- Dx : scale matrix, (k × k)

- rx : rank of the Wishart distribution

15 / 23

Page 19: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

1. The exposure model

Xik : vector of unknown true (latent factor) dietaryintake of vit-B6 and folate (k = 1, 2), withi = 1, . . . , n:

Xik ∼ MVN(µk ,ΣX ) = MVN(0,ΣX )

Σ−1X ∼ Wishart(DxDxDx , rxrxrx)

- - - - - - - - - - - - - - - - - - - - - - - - - -

- Dx : scale matrix, (k × k)

- rx : rank of the Wishart distribution

15 / 23

Page 20: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

2. The measurement model

for i = 1, . . . , n and k = 1, 2:

Qik = αQk+ βQk

· Xik + εQk

Rik = Xik + εRk

Mik = αMk+ βMk

· Xik + εMk

- Assumptions:

cov(εQk, εRk

) 6= 0, cov(εQ1 , εQ2) 6= 0, cov(εR1 , εR2) 6= 0,

cov(εQk, εMk

) = 0, cov(εRk, εMk

) = 0, cov(εM1 , εM2) = 0

16 / 23

Page 21: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

2. The measurement model

for i = 1, . . . , n and k = 1, 2:

Qik = αQk+ βQk

· Xik + εQk

Rik = Xik + εRk

Mik = αMk+ βMk

· Xik + εMk

- Assumptions:

cov(εQk, εRk

) 6= 0, cov(εQ1 , εQ2) 6= 0, cov(εR1 , εR2) 6= 0,

cov(εQk, εMk

) = 0, cov(εRk, εMk

) = 0, cov(εM1 , εM2) = 0

16 / 23

Page 22: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

2. The measurement model

for i = 1, . . . , n and k = 1, 2:

Qik = αQk+ βQk

· Xik + εQk

Rik = Xik + εRk

Mik = αMk+ βMk

· Xik + εMk

- Assumptions:

cov(εQk, εRk

) 6= 0, cov(εQ1 , εQ2) 6= 0, cov(εR1 , εR2) 6= 0,

cov(εQk, εMk

) = 0, cov(εRk, εMk

) = 0, cov(εM1 , εM2) = 0

16 / 23

Page 23: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

2. The measurement model (ii)

for i = 1, . . . , n and k = 1, 2:

Qik = αQk+ βQk

· Xik + εQk

Rik = Xik + εRk

Mik = αMk+ βMk

· Xik + εMk

- Prior distributions:

αQk∼ N(0,σ2

αQkσ2αQkσ2αQk

), βQk∼ N(0,σ2

βQkσ2βQkσ2βQk

)

εQR ∼ MVN(0,ΣεQR), Σ−1

εQR∼ Wishart(DεQR

DεQRDεQR, rεQRrεQRrεQR

)

αMk∼ N(0,σ2

αMkσ2αMkσ2αMk

), βMk∼ N(0,σ2

βMkσ2βMkσ2βMk

)

εM ∼ MVN(0,ΣεM ), Σ−1εM∼ Wishart(DεMDεMDεM , rεMrεMrεM )

17 / 23

Page 24: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

2. The measurement model (ii)

for i = 1, . . . , n and k = 1, 2:

Qik = αQk+ βQk

· Xik + εQk

Rik = Xik + εRk

Mik = αMk+ βMk

· Xik + εMk

- Prior distributions:

αQk∼ N(0,σ2

αQkσ2αQkσ2αQk

), βQk∼ N(0,σ2

βQkσ2βQkσ2βQk

)

εQR ∼ MVN(0,ΣεQR), Σ−1

εQR∼ Wishart(DεQR

DεQRDεQR, rεQRrεQRrεQR

)

αMk∼ N(0,σ2

αMkσ2αMkσ2αMk

), βMk∼ N(0,σ2

βMkσ2βMkσ2βMk

)

εM ∼ MVN(0,ΣεM ), Σ−1εM∼ Wishart(DεMDεMDεM , rεMrεMrεM )

17 / 23

Page 25: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

2. The measurement model (ii)

for i = 1, . . . , n and k = 1, 2:

Qik = αQk+ βQk

· Xik + εQk

Rik = Xik + εRk

Mik = αMk+ βMk

· Xik + εMk

- Prior distributions:

αQk∼ N(0,σ2

αQkσ2αQkσ2αQk

), βQk∼ N(0,σ2

βQkσ2βQkσ2βQk

)

εQR ∼ MVN(0,ΣεQR), Σ−1

εQR∼ Wishart(DεQR

DεQRDεQR, rεQRrεQRrεQR

)

αMk∼ N(0,σ2

αMkσ2αMkσ2αMk

), βMk∼ N(0,σ2

βMkσ2βMkσ2βMk

)

εM ∼ MVN(0,ΣεM ), Σ−1εM∼ Wishart(DεMDεMDεM , rεMrεMrεM )

17 / 23

Page 26: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

3. The disease model

Yi ∈ (0, 1): disease indicator for the i th study subject

Yi ∼ Bin(1, πi), with πi the probability of developingthe disease

- A conditional logistic model is assumed as:

P(Yi |Xk ,Zp) = H(γ1 · Xi1 + γ2 · Xi2 + ZTip γ3)

Γ = (γ1, γ2, γ3) ∼ N(0,σ2Γ(k+p)σ2

Γ(k+p)σ2

Γ(k+p))

18 / 23

Page 27: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

3. The disease model

Yi ∈ (0, 1): disease indicator for the i th study subject

Yi ∼ Bin(1, πi), with πi the probability of developingthe disease

- A conditional logistic model is assumed as:

P(Yi |Xk ,Zp) = H(γ1 · Xi1 + γ2 · Xi2 + ZTip γ3)

Γ = (γ1, γ2, γ3) ∼ N(0,σ2Γ(k+p)σ2

Γ(k+p)σ2

Γ(k+p))

18 / 23

Page 28: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Analytical steps

R measurements, available on ∼10% of the sample,were imputed as E (R |Q)

- Data were log-transformed to approximate normality

- Residuals were computed by country, study, smoking,batch, age and sex (M), country, age and sex (Q), ageand sex (R)

- Disease models were run separately by study

- Only results for kidney cancer disease model are shown

Analyses were run in R with JAGS (Martyn Plummer)

19 / 23

Page 29: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

3. Results: disease modelTable 3. Relative risk estimates ( ˆRRk = e γ̂k ).

Vit-B6 Folate

R̂R1 (95% CI1) R̂R2 (95% CI1)

Q 1.00 (0.88, 1.14) 0.96 (0.84, 1.10)

R - - - -

M 0.79 (0.69, 0.89) 0.95 (0.84, 1.07)

X 0.71 (0.58, 0.88) 0.85 (0.72, 1.02)

1Credible Intervals

20 / 23

Page 30: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

3. Results: disease modelTable 3. Relative risk estimates ( ˆRRk = e γ̂k ).

Vit-B6 Folate

R̂R1 (95% CI1) R̂R2 (95% CI1)

Q 1.00 (0.88, 1.14) 0.96 (0.84, 1.10)

R - - - -

M 0.79 (0.69, 0.89) 0.95 (0.84, 1.07)

X 0.71 (0.58, 0.88) 0.85 (0.72, 1.02)

1Credible Intervals

20 / 23

Page 31: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Concluding remarks

Challenging to tackle the complexity of dietary andbiomarker measurements

- After measurement error correction Vit-B6 and folateintakes were inversely associated with the risk ofkidney cancer

Aspects still to tune up

- Great potential to use data from studies with availableconcentration biomarker data

21 / 23

Page 32: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Concluding remarks (ii)

Are Bayesian modelling worth the trouble in nutritional

epidemiology?

- They are flexible and can be very informative

Need of informative data, possibly replicates of R

(and M) measurements

- Need of more biomarker measurements of dietary

exposure

22 / 23

Page 33: A Bayesian hierarchical model to integrate dietary ... · R measurements, available on ˘10% of the sample, were imputed as E(RjQ)-Data were log-transformed to approximate normality-Residuals

Acknowledgments

Pietro Ferrari, Hannah Lennon, Martyn Plummer

- WCRF/AICR for funding the project

- Mattias Johansson, Veronique Chajes

- Ray Carroll, Paul Gustafson, Victor Kipnis, HeatherBowles

EPIC collaborators

23 / 23