Full Bayesian models to handle missing values in cost ...statistica.it/gianluca/Talks/RSS.pdf ·...

Full Bayesian models to handle missing values in cost-effectivenessanalysis from individual level data

Gianluca Baio

(Joint work with Andrea Gabrio and Alexina Mason)

University College LondonDepartment of Statistical Science

g.baio@ucl.ac.uk

http://www.ucl.ac.uk/statistics/research/statistics-health-economics/http://www.statistica.it/gianluca

https://github.com/giabaio

RSS 2017 International ConferenceUniversity of Strathclyde, Glasgow

Wednesday 6 September 2017

Gianluca Baio (UCL) Bayesian models for missing data in CEA RSS Conference, 6 Sept 2017 1 / 15

Health technology assessment (HTA)

Objective: Combine costs & benefits of a given intervention into a rational scheme forallocating resources, increasingly often under a Bayesian framework

Statisticalmodel

Economicmodel

Decisionanalysis

Uncertaintyanalysis

• Estimates relevant populationparameters θ

• Varies with the type ofavailable data (& statisticalapproach!)

• Combines the parameters to obtaina population average measure forcosts and clinical benefits

• Varies with the type of availabledata & statistical model used

• Summarises the economic modelby computing suitable measures of“cost-effectiveness”

• Dictates the best course ofactions, given current evidence

• Standardised process

∆e = fe(θ)

∆c = fc(θ)

ICER = E[∆c]/E[∆e]

EIB = kE[∆e] − E[∆c]

CEAC = Pr(k∆e − ∆c > 0)

• Assesses the impact of uncertainty (eg inparameters or model structure) on theeconomic results

• Mandatory in many jurisdictions (includingNICE)

• Fundamentally Bayesian!

Statistical model — individual level data

• The available data usually look something like this:

Demographics HRQL data Resource use dataID Trt Sex Age . . . u0 u1 . . . uJ c0 c1 . . . cJ

1 1 M 23 . . . 0.32 0.66 . . . 0.44 103 241 . . . 802 1 M 21 . . . 0.12 0.16 . . . 0.38 1 204 1 808 . . . 8773 2 F 19 . . . 0.49 0.55 . . . 0.88 16 12 . . . 22. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

and the typical analysis is based on the following steps:

1 Compute individual QALYs and total costs as

J∑j=1

(uij + uij−1)δj

2and ci =

J∑j=0

cij ,[

with: δj =Timej − Timej−1

Unit of time

2 (Often implicitly) assume normality and linearity and model independently individualQALYs and total costs by controlling for baseline values

ei = αe0 + αe1u0i + αe2Trti + εei [+ . . .], εei ∼ Normal(0, σe)

ci = αc0 + αc1c0i + αc2Trti + εci [+ . . .], εci ∼ Normal(0, σc)

3 Estimate population average cost and effectiveness differentials and use bootstrap toquantify uncertainty

What’s wrong with this?

• Potential correlation between costs & clinical benefits– Strong positive correlation — effective treatments are innovative and result from

intensive and lengthy research ⇒ are associated with higher unit costs– Negative correlation — more effective treatments may reduce total care pathway costs

e.g. by reducing hospitalisations, side effects, etc.– NB: In any case, the economic evaluation is based on both!

• Joint/marginal normality not realistic– Costs usually skewed and benefits may be bounded in [0; 1]– Can use transformation (e.g. logs) — but care is needed when back transforming to

the natural scale– Can use more suitable models (e.g. Gamma or log-Normal) — especially under the

Bayesian approach

• ... and of course Partially Observed data– Can have item and/or unit non-response– Missingness may occur in either or both benefits/costs– The missingness mechanisms may also be correlated

– Focus in decision-making — not inference!

To be or not to be (Bayesians)?...

p(θ | y) ∝ p(θ)p(y | θ)?

• In general, can represent a joint distribution as a conditional regression

p(e, c) = p(e)p(e)p(c | e)p(c | e) = p(c)p(e | c)

φicψc

µc[. . .]

µe [. . .]β

Conditional model for cMarginal model for e

ei ∼ p(e | φei,ψe)

g(φei) = µe [+ . . .]

φei = locationψe = ancillary

φci = locationψc = ancillary

φei = marginal meanψe = marginal variance

φci = conditional meanψc = conditional variance

φei = marginal meanψe = marginal precision

φci = conditional meanψc = rateφciψc = shape

ci ∼ p(c | e, φci,ψc)

g(φci) = µc + β(ei − µe) [+ . . .]

• For example:

• Combining “modules” and fully characterising uncertainty about deterministicfunctions of random quantities is relatively straightforward using MCMC

• Prior information can help stabilise inference (especially with sparse data!), eg– Cancer patients are unlikely to survive as long as the general population– ORs are unlikely to be greater than ±5

To be or not to be (Bayesians)?...

• In general, can represent a joint distribution as a conditional regression

p(e, c) = p(e)p(e)p(c | e)p(c | e) = p(c)p(e | c)

φicψc

µc[. . .]

µe [. . .]β

Conditional model for cMarginal model for e

ei ∼ p(e | φei,ψe)

g(φei) = µe [+ . . .]

φei = locationψe = ancillary

φci = locationψc = ancillary

φei = marginal meanψe = marginal variance

φci = conditional meanψc = conditional variance

φei = marginal meanψe = marginal precision

φci = conditional meanψc = rateφciψc = shape

ci ∼ p(c | e, φci,ψc)

g(φci) = µc + β(ei − µe) [+ . . .]

• For example:

ei ∼ Beta(φeiψe, (1− φei)ψe), logit(φei) = µe [+ . . . ]

ci | ei ∼ Gamma(ψcφci, ψc), log(φci) = µc + β(ei − µe) [+ . . .]

• Combining “modules” and fully characterising uncertainty about deterministicfunctions of random quantities is relatively straightforward using MCMC

• Prior information can help stabilise inference (especially with sparse data!), eg– Cancer patients are unlikely to survive as long as the general population– ORs are unlikely to be greater than ±5

Dealing with missing data — selection models

MCAR (e, c)

φicψc

µcxci

φie ψe

µe xei

Model of analysis for (c, e) Model of missingness for eModel of missingness for c

Partially observed dataUnobservable parametersDeterministic function of random quantitiesFully observed, unmodelled dataFully observed, modelled data

• mei ∼ Bernoulli(πei); logit(πei) = γe0

• mci ∼ Bernoulli(πci); logit(πci) = γc0

MAR (e, c)

φicψc

µcxci

φie ψe

µe xei

• mei ∼ Bernoulli(πei); logit(πei) = γe0 +∑Kk=1 γekxeik

• mci ∼ Bernoulli(πci); logit(πci) = γc0 +∑Hh=1 γchxcih

MNAR (e, c)

φicψc

µcxci

φie ψe

µe xei

• mei ∼ Bernoulli(πei); logit(πei) = γe0 +∑Kk=1 γekxeik + γeK+1ei

• mci ∼ Bernoulli(πci); logit(πci) = γc0 +∑Hh=1 γchxcih + γcH+1ci

MNAR e; MAR c

φicψc

µcxci

φie ψe

µe xei

• mei ∼ Bernoulli(πei); logit(πei) = γe0 +∑Kk=1 γekxeik + γeK+1ei

• mci ∼ Bernoulli(πci); logit(πci) = γc0 +∑Hh=1 γchxcih

MAR e; MNAR c

φicψc

µcxci

φie ψe

µe xei

• mei ∼ Bernoulli(πei); logit(πei) = γe0 +∑Kk=1 γekxeik

• mci ∼ Bernoulli(πci); logit(πci) = γc0 +∑Hh=1 γchxcih + γcH+1ci

Motivating example: MenSS trial Partially observed data

• The MenSS pilot RCT evaluates the cost-effectiveness of a new digital interventionto reduce the incidence of STI in young men with respect to the SOC

– QALYs calculated from utilities (EQ-5D 3L)– Total costs calculated from different components (no baseline)

Time Type of outcome observed (%) observed (%)Control (n1=75) Intervention (n2=84)

Baseline utilities 72 (96%) 72 (86%)3 months utilities and costs 34 (45%) 23 (27%)6 months utilities and costs 35 (47%) 23 (27%)

12 months utilities and costs 43 (57%) 36 (43%)Complete cases utilities and costs 27 (44%) 19 (23%)

0.5 0.6 0.7 0.8 0.9 1.0

810 n1 = 27

mean = 0.904 median = 0.930

0 200 400 600 800 1000

810 n1 = 27

mean = 186 median = 176

0.5 0.6 0.7 0.8 0.9 1.0

810 n2 = 19

mean = 0.902 median = 0.940

0 100 200 300 400 500

810 n2 = 19

Motivating example: MenSS trial Skewness & “structural values”

• The MenSS pilot RCT evaluates the cost-effectiveness of a new digital interventionto reduce the incidence of STI in young men with respect to the SOC

– QALYs calculated from utilities (EQ-5D 3L)– Total costs calculated from different components (no baseline)

0.5 0.6 0.7 0.8 0.9 1.0

810 n1 = 27

mean = 0.904 median = 0.930

0 200 400 600 800 1000

810 n1 = 27

0.5 0.6 0.7 0.8 0.9 1.0

810 n2 = 19

mean = 0.902 median = 0.940

0 100 200 300 400 500

810 n2 = 19

Modelling

1 Bivariate Normal– Simpler and closer to “standard” frequentist model– Account for correlation between QALYs and costs

2 Beta-Gamma– Account for– Model the relevant ranges: QALYs ∈ (0, 1) and costs ∈ (0,∞)– But: needs to rescale observed data e∗it = (eit − ε) to avoid spikes at 1

3 Hurdle model– Model eit as a mixture to account for correlation between outcomes, model the

relevant ranges and account for structural values– May expand to account for partially observed baseline utility u0it

φictψct

µct βt

µet u∗0it αtMarginal model for e

eit ∼ Normal(φeit, ψet)

φeit = µet + αt(u0it − u0t)

φeit = µet + αtu∗0it

Conditional model for c | ecit | eit ∼ Normal(φcit, ψct)

φcit = µct + βt(eit − µet)

Modelling

2 Beta-Gamma– Account for correlation between outcomes– Model the relevant ranges: QALYs ∈ (0, 1) and costs ∈ (0,∞)– But: needs to rescale observed data e∗it = (eit − ε) to avoid spikes at 1

φictψct

µct βt

e∗it

µet u∗0it αt

Marginal model for e∗

e∗it ∼ Beta (φeitψet, (1− φeit)ψet)logit(φeit) = µet + αt(u0it − u0t)

φeit = µet + αtu∗0it

Conditional model for c | e∗

cit | e∗it ∼ Gamma(ψctφcit, ψct)

log(φcit) = µct + βt(e∗it − µet)

Modelling

2 Beta-Gamma– Account for– Model the relevant ranges: QALYs ∈ (0, 1) and costs ∈ (0,∞)– But: needs to rescale observed data e∗it = (eit − ε) to avoid spikes at 1

φictψct

µct βt

φiet ψet

µ<1et

u∗0it αt

e∗itπit

ditXit ηt

Model for the structural ones

dit := I(eit = 1) ∼ Bernoulli(πit)

logit(πit) = Xitηt

Mixture model for e

e1it := 1

e<1it ∼ Beta (φeitψet, (1− φeit)ψet)

logit(φeit) = µ<1et + αt(u0it − u0t)

logit(φeit) = µ<1et + αtu

∗0it

e∗it = πite1it + (1− πit)e<1

µet = (1− πt)µ<1et + πt

Conditional model for c | e∗

cit | e∗it ∼ Gamma(ψctφcit, ψct)

log(φcit) = µct + βt(e∗it − µet)

Results — estimation of the main parameters (CCA + MAR)

QALYs (Control)0.80 0.85 0.90 0.95 1.00

QALYs (Intervention)0.80 0.85 0.90 0.95 1.00

Costs (Control)100 200 300 400 500

Costs (Intervention)100 200 300 400 500

Bivariate Normal

Beta-Gamma

Hurdle model

µe1 µe2 µc1 µc2

—•— Fully observed—•— All (MAR)

Probability of structural ones0.1 0.2 0.3 0.4 0.5

Mean QALY (non−ones)0.75 0.80 0.85 0.90

Intervention (t=2)

Control (t=1)

πt µ<1et

Bayesian multiple imputation (under MAR)

●●●

●●

● ●●

● ● ●●

●●

● ● ●

●● ●●

●●●

●●

●● ●● ●●

●●

● ●

●● ●●

●●

● ●● ●● ● ● ● ● ● ● ● ●Q

●●●

●●

● ●●

● ● ●●

●●

● ● ●

●●● ●●

●●●

●●

●●●

●●

●● ●● ●●

● ●

●● ●●

●●

● ●● ●● ● ● ● ● ● ● ● ●

●●

●●●

●● ●●

●●● ●●

●●

● ●●

●●

● ●●

● ●

●● ●●

●●

●● ●●

●●

● ● ● ●

Bivariate Normal

Individuals (n1 = 75) Individuals (n2 = 84)

Beta-Gamma

Hurdle model

—•— Imputed, observed baseline—•— Imputed, missing baseline× Observed

• We observe n01 = 13 and n02 = 22 individuals with u0it = 1 and ujit = NA, forj > 1

• For those individuals, we cannot compute directly the structural one indicator ditand so need to make assumptions/model this

– Sensitivity analysis to alternative MNAR departures from MAR

MNAR1. Set dit = 1 for all individuals with unit observed baseline utility

MNAR2. Set dit = 0 for all individuals with unit observed baseline utility

MNAR3. Set dit = 1 for the n01 = 13 individuals with u0i1 = 1 and dit = 0 for then02 = 22 individuals with u0i2 = 1

MNAR4. Set dit = 0 for the n01 = 13 individuals with u0i1 = 1 and dit = 1 for then02 = 22 individuals with u0i2 = 1

Results — MNAR

Probability of structural ones0.0 0.1 0.2 0.3 0.4 0.5 0.6

Mean QALY0.75 0.80 0.85 0.90 0.95

πt µet

—•— Control (t = 1)—•— Intervention(t = 2)

Cost-effectiveness analysis

Conclusions

• A full Bayesian approach to handling missing data extends standard“imputation methods”

– Can consider MAR and MNAR with relatively little expansion to the basic model

• Particularly helpful in cost-effectiveness analysis, to account for– Asymmetrical distributions for the main outcomes– Correlation between costs & benefits– Structural values (eg spikes at 1 for utilities or spikes at 0 for costs)

• Need specialised software + coding skills– R package missingHE under development to implement a set of general models– Preliminary work available at https://github.com/giabaio/missingHE

– Eventually, will be able to combine with existing packages (eg BCEA:http://www.statistica.it/gianluca/BCEA; https://github.com/giabaio/BCEA) toperform the whole economic analysis

Thank you!

Full Bayesian models to handle missing values in cost ...statistica.it/gianluca/Talks/RSS.pdf ·...

Documents

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ... · Examples: Missing Data Modeling And Bayesian Analysis 443 CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS

Semiparametric Bayesian Instrumental Variables …...Semiparametric Bayesian Instrumental Variables Estimation for Nonignorable Missing Instruments Ryo Kato Research Institute for

Modeling Fake Missing Transverse Energy with Bayesian Neural NetwoRkS

A Bayesian Maritime Search Model for Missing Aircraft

Bayesian Methods to Handle Missing Data in High-Dimensional Data Sets using Factor Analysis Strategies Thomas R. Belin UCLA Department of Biostatistics

Bayesian Nonparametric Adaptive Control of Time-varying Systems using Gaussian Processesdspace.mit.edu/bitstream/handle/1721.1/77933/Chowdhary... · 2019-04-10 · Bayesian Nonparametric

Bayesian Search for Missing Aircraft - inf.ed.ac.uk · Lawrence D. Stone Bayesian Search for Missing Aircraft 20 April 2017 Bayesian search theory provides a principled and successful

How To Handle Missing Information Without Using NULL

Bayesian Approaches to Handling Missing Data › Missing2012 › MissingDataLectureNotes.pdf · Bayesian Approaches to Handling Missing Data Nicky Best and Alexina Mason BIAS Short

Missing data & how to handle it.pptx

Learning Bayesian Belief Networks with Neural Network ... · Learning Bayesian Belief Networks with Neural Network Estimators 579 the presence of data points with missing values

Cumulative bayesian ridge for handling missing data · Cumulative bayesian ridge for handling missing data Samih M. Mostafa a,* Abdelrahman S. Mohamed a and Safwat Hamad b aMathematics

MACHINE LEARNING TOOLBOX - Amazon S3 · Machine Learning Toolbox Dealing with missing values Most models require numbers, can’t handle missing data Common approach: remove rows

Variational Bayesian Learning of ICA with Missing Data

Bayesian Missing Data Problems - GBV

On Factor Models with Random Missing: EM Estimation ...mis.sem.tsinghua.edu.cn/ueditor/jsp/upload/file/... · To handle the missing data problem in factor models eﬀectively, two

Sequential Imputations and Bayesian Missing Data Problems ... · For missing data problems, ... The multiple complete data sets used in the mixture are ideally created by draws from

Statistical Methods for Drug Safetydocshare01.docshare.tips/files/31056/310569218.pdf · Bayesian Analysis Made Simple: An Excel ... Gianluca Baio Bayesian Missing Data Problems:

Running Head: BAYESIAN MEDIATION WITH MISSING DATA 1 · Running Head: BAYESIAN MEDIATION WITH MISSING DATA 5 ̂ (2) where is the slope coefficient from the regression of M on X, is

ITEM #0127039 SINGLE-HANDLE KITCHEN FAUCETpdf.lowes.com/installationguides/6953123900396_install.pdfSINGLE-HANDLE KITCHEN FAUCET Questions, problems, missing parts? Before returning