Combining prevalence estimates from multiple sources Julian Flowers



The problem (1)...

• No systematic way of monitoring health behaviours at small area level in England

• => Have smoking targets but don’t know smoking prevalence for PCTs/districts

• But multiple potential sources of data

– Surveys

– Commercial datasets

– GP data

– Synthetic estimates


The problem (2)...

• Tend to use “favourite” data sources

• Different datasets give different answers

• But all may have useful information about smoking

• Question: what is the best estimate of smoking prevalence given the data we have?


7 Datasets about districts...

• Synthetic estimates (from DH) for districts based on Health Survey for England 2003-5

• Estimates based on commercial data about tobacco expenditure by households at small-area level (actually a synthetic estimate)

• 3 years of commercial data based on responses to market research data

• Separate analysis of HSE by ASH


7 datasets...

• All biased in some way: some estimates looked too low, some were not well correlated. Which one(s) to believe?

• Could/should they be combined? If so, how? (heptangulation...)


Pairwise comparison of data sources (lower triangle, values as shown on the slide)

Y axis \ X axis   HP02   Acx03  Acx04  Acx05  CACI05  ASH02
Acx03             0.74
Acx04             0.63   0.54
Acx05             0.83   0.78   0.58
CACI05            0.71   0.67   0.49   0.57
ASH02             0.89   0.80   0.62   0.75   0.89
HP05              0.89   0.79   0.61   0.77   0.72    0.89


08/01/2008

The situation in the East of England: different estimates from different sources

Motivation for combining estimates

Basildon: pooled smoking prevalence estimates

Proportion meta-analysis plot [random effects]

Source                 Proportion (95% confidence interval)
ASH estimates          0.30 (0.30, 0.31)
Synthetic estimates    0.27 (0.27, 0.27)
HDA 2003               0.33 (0.32, 0.33)
Acxiom                 0.17 (0.17, 0.18)
CACI 2005              0.24 (0.24, 0.24)
Combined               0.26 (0.21, 0.32)
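As a rough illustration of how a random-effects pooled value like the Basildon one can arise, here is a minimal sketch using the DerSimonian-Laird estimator on the raw proportion scale. The slide does not say which software or transformation produced the plot, so the estimator, the function names and the standard-error floor are all assumptions. The floor is needed because intervals printed to two decimals can have zero width.

```python
import math

# Basildon estimates as printed on the slide: (source, proportion, lower, upper).
ESTIMATES = [
    ("ASH estimates", 0.30, 0.30, 0.31),
    ("Synthetic estimates", 0.27, 0.27, 0.27),
    ("HDA 2003", 0.33, 0.32, 0.33),
    ("Acxiom", 0.17, 0.17, 0.18),
    ("CACI 2005", 0.24, 0.24, 0.24),
]

Z95 = 1.96
# ASSUMPTION: illustrative floor on the standard error, because some of the
# rounded intervals have zero width. Not a value taken from the slides.
SE_FLOOR = 0.0025


def se_from_ci(lower, upper, floor=SE_FLOOR):
    """Back out a standard error from 95% limits, with a hypothetical floor."""
    return max((upper - lower) / (2 * Z95), floor)


def pool_random_effects(estimates):
    """DerSimonian-Laird random-effects pooling on the raw proportion scale."""
    y = [p for _, p, _, _ in estimates]
    v = [se_from_ci(lo, hi) ** 2 for _, _, lo, hi in estimates]

    # Fixed-effect (inverse-variance) weights and mean.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w

    # Method-of-moments estimate of the between-source variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(y) - 1)) / c)

    # Random-effects weights add tau^2 to each sampling variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, pooled - Z95 * se, pooled + Z95 * se, tau2


if __name__ == "__main__":
    pooled, lo, hi, tau2 = pool_random_effects(ESTIMATES)
    print(f"pooled = {pooled:.2f} ({lo:.2f}, {hi:.2f}), tau^2 = {tau2:.4f}")
```

With these inputs the between-source variance dominates the sampling variances, so the five sources receive almost equal weight and the pooled value sits near their simple mean of about 0.26. A wide pooled interval of this kind reflects disagreement between sources, not sampling error within any one of them.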


Bayesian modelling

• Work with MRC Biostatistics Unit

• Based on work looking at bias adjusted meta-analysis

• The idea is that a meta-analysis should include all studies that contain relevant information, but weight them according to their bias


The basic model

Bayesian hierarchical model structure, developed in WinBUGS.

Allows for additive bias (Turner et al. 2007, Spiegelhalter and Best 2003).

The model assumes that the biases affecting the smoking prevalence (SP) estimates vary between data sources.

Let $y_{ij}$ be the SP estimate obtained from data source $j$ ($j = 1, \dots, 7$) for local authority (LA) $i$ ($i = 1, \dots, 48$ for the East of England), $\sigma^2_{ij}$ the corresponding sampling variance (obtained from the 95% confidence limits and assumed known) and $\delta_{ij}$ the corresponding biases, assumed exchangeable within data sources. Then the SP estimates are believed to be generated by a normal distribution with mean $\theta_i + \delta_{ij}$ and variance $\sigma^2_{ij}$, where $\theta_i$ is the true SP for the $i$-th LA.

Model

$$y_{ij} \sim \mathrm{N}\left(\theta_i + \delta_{ij},\ \sigma^2_{ij}\right)$$

A constraint is needed, because the data alone cannot separate a shift in the true prevalences from an opposite shift in the biases: our choice is an overall 23% smoking prevalence for the East of England.

Several variants of this model (including a multivariate model aiming to detect correlation among data sources) have been fitted, with no significant differences.
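To show why the additive-bias model needs a constraint, the sketch below uses a deliberately simplified, non-Bayesian stand-in for it. Writing y[i][j] for the estimate for area i from source j, it assumes y[i][j] = theta[i] + bias[j], with one constant bias per source in place of exchangeable area-level biases. It ignores the sampling variances and pins the unweighted mean of the area prevalences to the overall figure. These simplifications, the function name and the toy numbers are assumptions for illustration only; this is not the WinBUGS model.

```python
def decompose_additive_bias(y, overall_prevalence):
    """Split a complete area-by-source table into area effects and source biases.

    y[i][j] is the estimate for area i from source j. Least-squares fit of
    y[i][j] = theta[i] + bias[j], identified by forcing the unweighted mean
    of theta to equal overall_prevalence (a hypothetical simplification).
    """
    n_areas = len(y)
    n_sources = len(y[0])

    grand_mean = sum(sum(row) for row in y) / (n_areas * n_sources)
    area_means = [sum(row) / n_sources for row in y]
    source_means = [
        sum(y[i][j] for i in range(n_areas)) / n_areas for j in range(n_sources)
    ]

    # Area effects are centred on the constrained overall prevalence.
    theta = [m - grand_mean + overall_prevalence for m in area_means]
    # Each source's bias is its average departure from that overall level.
    bias = [m - overall_prevalence for m in source_means]
    return theta, bias


if __name__ == "__main__":
    # Toy, noise-free data (invented for illustration, not from the slides).
    true_theta = [0.20, 0.23, 0.26]   # three areas, mean 0.23
    true_bias = [0.04, 0.00, -0.06]   # three sources
    table = [[t + b for b in true_bias] for t in true_theta]

    theta, bias = decompose_additive_bias(table, overall_prevalence=0.23)
    print([round(t, 3) for t in theta])
    print([round(b, 3) for b in bias])
```

Changing `overall_prevalence` moves every area estimate up or down by the same amount and every source bias by the opposite amount, while the fitted table stays identical. That is the identifiability problem the 23% constraint resolves.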


Synthetic + classical + recent approaches

Statistical literature

Multilevel synthetic estimation (Twigg et al. 2000): using a multilevel modelling approach and nesting individuals within postcode sectors within health authorities, multilevel-derived synthetic estimates are obtained by means of ecological and individual variables associated with the phenomenon of interest. Prevalence estimates can be combined directly from surveys.

Multiple-frame estimation (Lohr and Rao, 2000; 2006): different sampling frames (not necessarily non-overlapping) whose union covers the whole population are considered and probability samples are drawn independently from each frame. Samples are then properly combined to obtain optimal linear estimators of population quantities. The survey database is needed.

Statistical matching (Rodgers, 1984; Moriarity and Scheuren, 2001) considers records of subjects having “similar profiles” from different data sources, and puts together different information from them. The survey database is needed.

Scoring method (Elliot and Davis, 2005): this method adjusts the survey weights so that the complementary strengths of each survey, in terms of sample size or unbiasedness, are exchanged. The surveys are then scored accordingly. The survey database is needed.

Bayesian hierarchical methods: recent work by Raghunathan et al. (2007) addresses the problem of combining prevalence rates from two surveys by means of a hierarchical Bayesian approach. One of the two surveys is believed to be less biased in terms of coverage and records whether respondents have a telephone line at home. Its respondents are divided into two groups, depending on whether or not they have a telephone at home. The other survey is based on telephone interviews only and for this reason is believed to be more biased, but it is larger. The hierarchical Bayesian model maps the larger survey onto the telephone information provided by the less biased survey. Prevalence estimates can be combined directly from surveys.


Modelled estimates with CIs


Comparison with 2008 survey


Conclusions

• Bayesian hierarchical models can be used to pool prevalence estimates from different sources adjusting for measured bias in each source. This is a type of formal triangulation of data.

• This method can be used when direct estimates are not available.

• It could be applied to any lifestyle or prevalence data where multiple sources are available

• Further work is needed to compare modelled estimates with direct estimates, and to apply the method to other lifestyle behaviours

• Further work is needed to implement the modelling in conventional statistical packages

• Local surveys can help to recalibrate the models on a regular basis


Modelling bias in combining small-area prevalence estimates from multiple surveys

Giancarlo Manzi (1, *), David J Spiegelhalter (1), Rebecca M Turner (1), Julian Flowers (2), Simon G Thompson (1)

(1) MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK

(2) Eastern Region Public Health Observatory, Institute of Public Health, Cambridge, UK

(*) Current address: Department of Economics, Business and Statistics, University of Milan, Italy.