Data and Statistics: New methods and future challenges Phil O’Neill University of Nottingham

Data and Statistics: New methods and future challenges

Phil O’Neill, University of Nottingham

Professors: How they spend their time

1. High-resolution genetic data
2. Model assessment

Gardy 2011 NEJM

“High-resolution genetic data”: what are they?

 - individual-level data on the pathogen
 - can be taken at single or multiple time points
 - high-dimensional, e.g. whole genome sequences
 - proportion of individuals sampled could be high/low
 - becoming far more common due to cost reduction

“High-resolution genetic data”: what use are they?

 - better inference about transmission paths
 - more reliable estimates of epi quantities?
 - understand evolution of the pathogen

A C C C T T G G G A A A .....

Modelling and Data Analysis methods

Two kinds of approaches exist:

1. Separate genetic and epidemic components (e.g. Volz, Rasmussen)

2. Combine genetic and epidemic components (e.g. Ypma, Worby, Morelli)

1. Separate genetic and epidemic components

e.g.:
 - estimate phylogenetic tree
 - given the tree, fit epidemic model
or
 - cluster individuals into genetically similar groups
 - given the groups, fit multi-type epidemic model

1. Separate genetic and epidemic components

+ “Simple” approach
+ Avoids complex modelling
- Ignores any relationship between transmission and genetic information

2. Combine genetic and epidemic components

e.g.:
 - model genetic evolution explicitly
 - define model featuring both genetic and epidemic parts

2. Combine genetic and epidemic components

+ “Integrated” approach
- Is modelling too detailed?
- Initial conditions: typical sequence?
+/- Model differences between individuals instead?
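As a rough illustration of what "differences between individuals" could mean in practice, the sketch below computes pairwise Hamming distances (number of differing sites) between aligned sequences. The sequences, case labels and function names are hypothetical, and the slides do not say which distance measure is intended.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(seqs: dict) -> dict:
    """Map each unordered pair of case labels to its Hamming distance."""
    return {
        (i, j): hamming(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


# Hypothetical toy sequences, one per sampled case (not real data).
seqs = {
    "case1": "ACCCTTGGGAAA",
    "case2": "ACCCTTGGGAAT",
    "case3": "ACGCTTGGCAAT",
}
dist = pairwise_distances(seqs)
for pair, d in dist.items():
    print(pair, d)
```

A model built on such a distance matrix describes how far apart sampled cases are genetically, rather than how each full sequence evolved, which is one reading of the "+/-" item above.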

1. High-resolution genetic data
2. Model assessment

“Model assessment”: what is it?

 - Does our model fit the data?
 - Is there a better model?

“Model assessment”: why do it?

 - Poor fit casts doubt on conclusions drawn from the modelling
 - Model choice can be a tool for directly addressing questions of interest

Linear regression: $y_k = a x_k + b + e_k$, $e_k \sim N(0, v)$

Minimise distance of model mean from observed data
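For the regression case, "minimise distance of model mean from observed data" is ordinary least squares, and the residuals are simply observed minus fitted values. A minimal sketch on simulated data follows; the true values a = 2, b = 1, v = 0.25 and the function name are chosen purely for illustration.

```python
import random


def fit_line(x, y):
    """Least-squares estimates of slope a and intercept b, plus residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a_hat = sxy / sxx
    b_hat = y_bar - a_hat * x_bar
    # Residuals: observed data minus the fitted model mean.
    residuals = [yi - (a_hat * xi + b_hat) for xi, yi in zip(x, y)]
    # Unbiased estimate of the noise variance v (two parameters fitted).
    v_hat = sum(r * r for r in residuals) / (n - 2)
    return a_hat, b_hat, v_hat, residuals


# Simulated data with illustrative values a = 2, b = 1, v = 0.25.
random.seed(1)
x = [0.2 * k for k in range(50)]
y = [2.0 * xk + 1.0 + random.gauss(0.0, 0.5) for xk in x]

a_hat, b_hat, v_hat, residuals = fit_line(x, y)
print(a_hat, b_hat, v_hat)
```

Here the residuals are unambiguous and can be plotted directly to look for structure that the model has missed.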

For outbreak data:

 - What are the right residuals?
 - Should observed or unobserved data be compared to the model? (Streftaris and Gibson)
 - Mean model may only be available via simulation
 - Is the mean the right quantity to consider?

Simulation-based approaches to model fit:

 - Forward simulation – “close” to data?
 - Choice of summary statistics?
 - Close ties to ABC methods (McKinley, Neal)
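A minimal sketch of a forward-simulation check, assuming a Markov SIR model with infection rate βSI and removal rate γI, and taking final size as the summary statistic. The parameter values and the observed final size are hypothetical; in a Bayesian analysis the parameters would be drawn from the posterior rather than fixed.

```python
import random


def sir_final_size(beta, gamma, n_susceptible, n_infective, rng):
    """Simulate the jump chain of a Markov SIR epidemic; return final size.

    Only the order of events matters for the final size, so event times
    are not simulated. Final size counts initial susceptibles infected.
    """
    s, i = n_susceptible, n_infective
    while i > 0:
        # Next event is an infection with probability
        # beta*S*I / (beta*S*I + gamma*I) = beta*S / (beta*S + gamma).
        if rng.random() < beta * s / (beta * s + gamma):
            s, i = s - 1, i + 1
        else:
            i -= 1
    return n_susceptible - s


def predictive_tail_prob(observed, beta, gamma, n_susceptible,
                         n_infective, n_sims=2000, seed=0):
    """Proportion of simulated final sizes at least as large as observed."""
    rng = random.Random(seed)
    sims = [
        sir_final_size(beta, gamma, n_susceptible, n_infective, rng)
        for _ in range(n_sims)
    ]
    return sum(f >= observed for f in sims) / n_sims, sims


# Hypothetical values: 50 susceptibles, 1 initial infective, observed size 30.
tail_prob, sims = predictive_tail_prob(
    observed=30, beta=0.04, gamma=1.0, n_susceptible=50, n_infective=1
)
print(tail_prob)
```

A tail probability near 0 or 1 would suggest the observed outbreak is atypical under the model. The answer depends on which summary statistic is used, which is exactly the question the slide raises.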

Approaches to model choice

 - Hypermodels/saturated models
 - Bayesian non-parametric methods
 - Bayesian methods e.g. RJMCMC
 - Mixture models

Page 25: Data and Statistics: New methods and future challenges Phil O’Neill University of Nottingham

Hypermodels/saturated models

e.g. Infection rate $\beta S$ or $\beta S I$ or $\beta S I^{0.5}$ in an SIR model? Instead use $\beta S I^{\alpha}$ and estimate $\alpha$

(O’Neill and Wen)
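The idea of embedding the candidate models in one larger model can be sketched as a single rate function with a free exponent. The name `alpha` for the exponent and the function name are assumptions made for illustration; the guard for zero infectives is also an added convention, since Python evaluates `0 ** 0` as 1.

```python
def infection_rate(beta: float, s: int, i: int, alpha: float) -> float:
    """Overall infection rate beta * S * I**alpha (alpha is an assumed name).

    alpha = 0 gives beta*S, alpha = 1 gives beta*S*I, and alpha = 0.5 gives
    beta*S*sqrt(I), so the candidate models are special cases.
    """
    if i == 0:
        # No infectives means no new infections, whatever alpha is.
        return 0.0
    return beta * s * i ** alpha


# The three candidate models evaluated at the same hypothetical state.
for alpha in (0.0, 1.0, 0.5):
    print(alpha, infection_rate(beta=0.5, s=10, i=4, alpha=alpha))
```

Estimating the exponent then addresses the model choice question directly: a posterior concentrated near one of the special values supports the corresponding model.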

Bayesian non-parametric methods

e.g. Infection rate $\beta(t) S I$ or $\beta(t)$ in an SIR model; estimate $\beta(t)$ in a Bayesian non-parametric manner using Gaussian process machinery

(Kypraios, O’Neill and Xu; Knock and Kypraios)
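To give a feel for what a Gaussian process prior on β(t) looks like, the sketch below draws one prior sample path. It is not the cited authors' construction: the log link (to keep β(t) positive), the squared-exponential kernel, and all numerical values are assumptions, and no inference is performed.

```python
import math
import random


def sq_exp_kernel(t1, t2, variance=1.0, length_scale=5.0):
    """Squared-exponential covariance between log beta at times t1 and t2."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(low[r][k] * low[c][k] for k in range(c))
            if r == c:
                low[r][c] = math.sqrt(a[r][r] - s)
            else:
                low[r][c] = (a[r][c] - s) / low[c][c]
    return low


def draw_beta_path(times, mean_log_beta=math.log(0.3), jitter=1e-6, seed=0):
    """One prior draw of beta(t) = exp(GP) on a grid of times."""
    rng = random.Random(seed)
    n = len(times)
    # Covariance matrix, with a small diagonal jitter for numerical stability.
    cov = [
        [sq_exp_kernel(times[r], times[c]) + (jitter if r == c else 0.0)
         for c in range(n)]
        for r in range(n)
    ]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    log_beta = [
        mean_log_beta + sum(low[r][k] * z[k] for k in range(r + 1))
        for r in range(n)
    ]
    return [math.exp(v) for v in log_beta]


times = list(range(0, 20, 2))
beta_path = draw_beta_path(times)
print([round(b, 3) for b in beta_path])
```

The length scale controls how quickly β(t) can change over time, so the shape of the infection rate is learned from the data rather than fixed in advance.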

Reversible Jump MCMC

e.g. Distinct models (usually small number), estimate Bayes factors by running MCMC on union of parameter spaces

(O’Neill; Neal and Roberts; Knock and O’Neill)
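Once a sampler has moved between two models, the Bayes factor can be estimated from how often each model was visited, using the identity Bayes factor = posterior odds / prior odds. The sketch below assumes the model indicator has already been recorded at each iteration. The function name and the sample values are hypothetical, and the trans-dimensional sampler itself is not shown.

```python
def bayes_factor_from_indicator(model_samples, prior_prob_model1=0.5):
    """Estimate the Bayes factor of model 1 against model 2.

    model_samples holds the model label (1 or 2) visited at each MCMC
    iteration; prior_prob_model1 is the prior probability of model 1.
    """
    n1 = sum(1 for m in model_samples if m == 1)
    n2 = len(model_samples) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("chain never visited one of the models")
    posterior_odds = n1 / n2
    prior_odds = prior_prob_model1 / (1.0 - prior_prob_model1)
    return posterior_odds / prior_odds


# Hypothetical chain output: model 1 visited in 750 of 1000 iterations.
samples = [1] * 750 + [2] * 250
print(bayes_factor_from_indicator(samples))
```

If one model is rarely visited the estimate becomes unstable, in which case adjusting the prior model probabilities and correcting for them afterwards is a common remedy.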

Mixture models

e.g. Given two models (g, h), create mixture model

$f(x) = \alpha\, g(x) + (1 - \alpha)\, h(x)$;

estimation of $\alpha$ enables estimation of Bayes factors (Kypraios and O’Neill)
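One way to see why the mixture weight carries Bayes factor information is the following sketch, which rests on assumptions not stated on the slide: the whole data set $x$ is treated as a single draw from the mixture, $g(x)$ and $h(x)$ denote the marginal likelihoods of the two models, the weight is called $\alpha$, and $\alpha$ is given a Uniform(0, 1) prior.

```latex
\begin{align*}
\pi(\alpha \mid x) &\propto \alpha\, g(x) + (1-\alpha)\, h(x),
  \qquad 0 \le \alpha \le 1, \\
E[\alpha \mid x]
  &= \frac{\int_0^1 \alpha \{\alpha\, g(x) + (1-\alpha)\, h(x)\}\, d\alpha}
          {\int_0^1 \{\alpha\, g(x) + (1-\alpha)\, h(x)\}\, d\alpha}
   = \frac{g(x)/3 + h(x)/6}{\{g(x) + h(x)\}/2}
   = \frac{2 g(x) + h(x)}{3 \{g(x) + h(x)\}}, \\
B &= \frac{g(x)}{h(x)}
   = \frac{3 E[\alpha \mid x] - 1}{2 - 3 E[\alpha \mid x]}.
\end{align*}
```

Under these assumptions the posterior mean of $\alpha$ lies between 1/3 and 2/3, and an MCMC estimate of it can be converted into an estimate of the Bayes factor $B$ without computing either marginal likelihood directly.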