A Data-driven Epidemiological Model


A Data-driven Epidemiological Model. Stephen Eubank, Christopher Barrett, Madhav V. Marathe. GIACS Conference on "Data in Complex Systems", Palermo, Italy, April 7-9, 2008. Topics: complex systems; data-driven, individual-based simulation; privacy and accuracy issues.


Network Dynamics and Simulation Science Laboratory

A Data-driven Epidemiological Model

Stephen Eubank, Christopher Barrett, Madhav V. Marathe

GIACS Conference on "Data in Complex Systems"

Palermo, Italy, April 7-9 2008


Data driven epidemiological models

I. Complex system

II. Data driven, individual-based simulation

III. Privacy and accuracy issues


What’s so complex about epidemiology?

Consider an “outbreak” among 4 people

(Figure legend: susceptible, infectious, removed)


Outbreaks can be represented as Markov processes

A given configuration of the system probabilistically transitions into any of several other configurations.

Even a small system has many possible configurations.
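To make the size of the state space concrete, here is a minimal sketch that enumerates the configurations of the 4-person outbreak. It assumes each person is susceptible (S), infectious (I), or removed (R), and that the only per-person moves in one step are staying put, S to I, and I to R. The names `STATES`, `ALLOWED` and `can_follow` are illustrative and do not come from the talk.

```python
from itertools import product

STATES = ("S", "I", "R")
# Assumed per-person moves in one time step: stay put, S -> I, or I -> R.
ALLOWED = {("S", "S"), ("S", "I"), ("I", "I"), ("I", "R"), ("R", "R")}
N_PEOPLE = 4

# Every assignment of a state to each of the 4 people is one configuration.
configurations = list(product(STATES, repeat=N_PEOPLE))

def can_follow(current, nxt):
    """True if every person's change of state is an allowed move."""
    return all((a, b) in ALLOWED for a, b in zip(current, nxt))

# Edges of the Markov chain (self-loops included). This ignores who infects
# whom, so it over-counts: S -> I is allowed even with nobody infectious.
transitions = [(c, d) for c in configurations for d in configurations
               if can_follow(c, d)]

print(len(configurations), len(transitions))  # 81 625
```

Even under these simplifying assumptions, four people give 81 configurations and several hundred edges, each needing a transition probability.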


Very little data is available to estimate this process

Historically, we (partially) observe 1 or 2 Markov chains

We want to estimate transition probabilities on every edge


Aggregation simplifies the model …

… at the cost of reduced information content.

p(C'_{t+1} | C'_t) is less informative than p(C_{t+1} | C_t) when C' is an aggregation of C.

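(Figure: aggregated state space for the 4-person outbreak; the axes are #S, the number susceptible, and #I, the number infectious, each running from 0 to 4.)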
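A small sketch of what aggregation does to the 4-person example: each individual-level configuration is mapped to the pair (#S, #I), and we count how many configurations collapse onto each pair. The three-state S/I/R labelling is an assumption carried over from the outbreak example, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# All individual-level configurations of 4 people with states S, I, R.
configurations = list(product("SIR", repeat=4))

# Aggregate: keep only the counts (#S, #I); #R is then 4 - #S - #I.
aggregated = Counter((c.count("S"), c.count("I")) for c in configurations)

for (n_s, n_i), n_configs in sorted(aggregated.items()):
    print(f"#S={n_s} #I={n_i}: {n_configs} configurations")
```

The aggregated chain has 15 states instead of 81, but a state such as (#S=2, #I=2) no longer says which two people are infectious.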


Other assumptions further simplify the model …

… but are unwarranted in social systems, where components are

1. Heterogeneous (distinguishable)
2. Intentional (behavior not determined by physical laws)


Aggregation naturally makes contact with observations

Observations of outbreaks often ignore heterogeneity and intention, and provide only point estimates.

“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” - J. Tukey

“All models are wrong, but some are useful.” - G.E.P. Box

A system is complex “if its behavior crucially depends on the details of its parts.” - G. Parisi


Interaction approach simplifies process itself

Interactions among system components completely determine transition probabilities among configurations

(Figure: two diagrams joined by the label "replaced with"; images not available.)


Calibrating with unexpectedly rich data

• For aerosol-borne pathogens, the probability of transmission is related to physical proximity, duration of contact, etc.

• The interaction approach therefore reduces to estimating a social network.

• Much more data is available for this than for outbreaks.

• But the network itself is not directly observable.

How can we estimate a social network?


A possible approach we didn’t use

• Consider a subset of random networks subject to certain constraints
• Constraints should be relevant to the global dynamics, i.e. epidemics
• But what are those? A “chicken or the egg” problem:

It would seem offhand that a taxonomy of “nets” … would arise naturally from the consideration of the statistical parameters... But the statistical parameters themselves are singled out on the basis of taxonomic considerations, which have yet to be clarified.

- Anatol Rapoport and William Horvath, Behav Sci. 1961, 6, 279–291


Questions to drive model development

1. What is the optimal targeted allocation of antivirals, used prophylactically or therapeutically, to mitigate an influenza pandemic?

2. What combination of targeted antivirals and feasible, community-based, non-pharmaceutical interventions (e.g. closing schools, allowing liberal leave from work) can best delay an outbreak from becoming epidemic for several months?

For both 1 and 2, models must compare changes in the social network with changes in transmissibility.

This is an example of policy informatics for complex systems


Interventions specified naturally by effect on network

No single “knob” reduces overall transmission by 50%


Step 1. Create a synthetic population

• Census data
  – Individual demographics
    • Age and gender
  – Household characteristics
    • Size and income


Successive refinement of synthetic data

• Start from a proto-population, e.g. a list of IDs.
• Add observed data.
• Capture correlations in the data using statistical models (iterative proportional fitting from Public Use Microdata).

ID          Gender   Household
1           M        1
...         ...      ...
3 x 10^8    F        1.2 x 10^8
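Iterative proportional fitting can be sketched in a few lines. The example below is a minimal two-way version, not the laboratory's implementation: it rescales a seed table (standing in for a microdata sample) until its row and column sums match target marginals (standing in for census totals). The seed counts, the targets, and the function name `ipf` are all made up for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=100):
    """Rescale a positive 2-D seed table to match row and column totals."""
    table = [[float(v) for v in row] for row in seed]
    for _ in range(iterations):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [v * target / row_sum for v in table[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
    return table

# Hypothetical sample: rows = household size (small, large),
# columns = income band (low, high).
seed = [[10, 20], [30, 40]]
fitted = ipf(seed, row_targets=[400, 600], col_targets=[450, 550])
for row in fitted:
    print([round(v, 1) for v in row])
```

The fitted table reproduces the target marginals while keeping the association (odds ratio) present in the seed, which is how correlations in the sample carry over to the synthetic population.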


Step 2. Assign activities, locations & times

• Locations
  – Dun & Bradstreet data
• Activity surveys
  – Matched to households by demographics
  – Matched to locations by activity type & travel time
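One way to picture matching an activity to a location by activity type and travel time is a weighted random choice that favors nearby locations of the right type. The sketch below is an assumption, not the method used in the talk: the exponential decay, the parameter `beta`, the Manhattan-distance `travel_time`, and the toy locations are all hypothetical.

```python
import math
import random

def travel_time(origin_xy, destination_xy, minutes_per_unit=1.0):
    """Hypothetical travel time: Manhattan distance times a constant."""
    dx = abs(origin_xy[0] - destination_xy[0])
    dy = abs(origin_xy[1] - destination_xy[1])
    return minutes_per_unit * (dx + dy)

def choose_location(home_xy, locations, activity_type, beta=0.1, rng=random):
    """Pick a location of the requested type, favoring short travel times."""
    matching = [loc for loc in locations if loc["type"] == activity_type]
    if not matching:
        raise ValueError(f"no location offers activity {activity_type!r}")
    # Assumed weighting: attractiveness decays exponentially with travel time.
    weights = [math.exp(-beta * travel_time(home_xy, loc["xy"]))
               for loc in matching]
    return rng.choices(matching, weights=weights, k=1)[0]

locations = [
    {"id": "near_shop", "type": "shop", "xy": (3, 2)},
    {"id": "far_shop", "type": "shop", "xy": (30, 20)},
    {"id": "school", "type": "school", "xy": (1, 1)},
]
rng = random.Random(0)
picks = [choose_location((0, 0), locations, "shop", rng=rng)["id"]
         for _ in range(1000)]
print(picks.count("near_shop"), picks.count("far_shop"))
```

Only locations of the requested type are eligible, and the nearer shop is chosen far more often than the distant one.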


Successive refinement of synthetic data

• Surveys are very different kinds of data sources than the census.
• This step depends on data fusion capability.
• Some values may be outcomes of very large games, not statistical models.

ID          Gender   Household     Activities      Activity Locations   Activity Times
1           M        1             School, shop    2743                 8:00, 3:00
...         ...      ...           ...             ...                  ...
3 x 10^8    F        1.2 x 10^8    Work, social    98734723947          9:00, 7:30


So far: a typical family’s day

(Figure: timeline of a typical family's day. Horizontal axis: time. Activity labels: Home, Work, Lunch, Daycare, School, Shopping. Travel-mode labels: Carpool, Bus, Car.)


Overlapping families’ days create a social network
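The idea that overlapping schedules generate a social network can be sketched directly: two people are in contact whenever they are at the same location at the same time, and the edge weight is the total overlap. The visit records below (person ids, location names, times in minutes after midnight) are hypothetical, as is the function name.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical visits: (person_id, location_id, start_minute, end_minute).
visits = [
    (1, "school_A", 480, 900),
    (2, "school_A", 480, 840),
    (3, "office_B", 540, 1020),
    (1, "shop_C", 915, 960),
    (3, "shop_C", 930, 990),
]

def contact_durations(visits):
    """Map each pair of co-located people to their total overlap in minutes."""
    by_location = defaultdict(list)
    for person, location, start, end in visits:
        by_location[location].append((person, start, end))

    contacts = defaultdict(int)
    for occupants in by_location.values():
        for (p, s1, e1), (q, s2, e2) in combinations(occupants, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if p != q and overlap > 0:
                contacts[tuple(sorted((p, q)))] += overlap
    return dict(contacts)

contacts = contact_durations(visits)
print(contacts)  # {(1, 2): 360, (1, 3): 30}
```

Persons 1 and 2 share six hours at school, and persons 1 and 3 overlap for half an hour at the shop; persons 2 and 3 never meet, so no edge is created between them.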


Successive refinement of synthetic data

• Gives us a generative model for contacts
• More powerful than traditional encapsulated agents
• Note: each byte of data per person adds ~300 MB to the database (3 x 10^8 people x 1 byte)

ID          Gender   Household     Activities      Activity Locations   Activity Times   Contacts         Contact Duration
1           M        1             School, shop    2743                 8:00, 3:00       2,3,4836, 289    5:20, 0:45
...         ...      ...           ...             ...                  ...              ...              ...
3 x 10^8    F        1.2 x 10^8    Work, social    98734723947          9:00, 7:30


Using data for purposes other than intended

Possibly the only epidemiological model that has been calibrated using automobile traffic counts!

(Because the same activity model generates both transportation demand and contact networks)


Activities adapt to situation & generate network changes


Derive disease interaction from social network

Interactions only need to get a few things right:
• Susceptibility
• Infectivity as a function of time since exposure
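A minimal sketch of how these two ingredients might combine with contact duration into a per-contact transmission probability. The functional form (a constant hazard while infectious, giving 1 - exp(-hazard)), the step-shaped infectivity profile, and the `rate_per_minute` value are assumptions for illustration only; they are not taken from the talk.

```python
import math

def infectivity(days_since_exposure, incubation_days=2, infectious_days=4):
    """Assumed step profile: 0 while incubating, 1 while infectious, then 0."""
    if incubation_days <= days_since_exposure < incubation_days + infectious_days:
        return 1.0
    return 0.0

def transmission_probability(contact_minutes, days_since_exposure,
                             susceptibility=1.0, rate_per_minute=0.0005):
    """Probability that one contact of the given duration transmits infection.

    rate_per_minute is a hypothetical calibration constant.
    """
    hazard = (rate_per_minute * susceptibility
              * infectivity(days_since_exposure) * contact_minutes)
    return 1.0 - math.exp(-hazard)

p_incubating = transmission_probability(60, days_since_exposure=0)
p_one_hour = transmission_probability(60, days_since_exposure=3)
p_two_hours = transmission_probability(120, days_since_exposure=3)
print(p_incubating, p_one_hour, p_two_hours)
```

In this sketch a contact with someone who is still incubating cannot transmit, and longer contacts with an infectious person are more likely to transmit.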


Modeling pandemic influenza

• Nobody knows what pandemic flu will look like
• Assume something like seasonal flu, but with less immunity
• Create several “flu” bugs in silico
  – Moderate (10% attack rate)
  – Strong (20-25% attack rate)
  – Catastrophic (> 50% attack rate)
• For each, fix other characteristics:
  – Incubation period: 2-3 days
  – Infectious period: 2-5 days


Resolution, fidelity, and accuracy are different

• Resolution describes the level of aggregation, e.g. individuals vs populations
• Fidelity describes the completeness of the representation’s features, e.g. age vs (age, gender, income, household size, education)
• Accuracy describes the correctness of features and correlations, e.g. is mixing by age derived from the social network correct?

“Validity” (always for a particular question) depends on all 3.

Effect of changes in social networks (above) on disease dynamics (below)


Characterizing the resulting network

(Figures: degree distribution of the location-location network and of the people-people network, each shown with its sensitivity to parameters.)


Assortative Mixing

• The static people-people projection is assortative
  – by degree (~0.25)
  – but not as strongly by age, income, household size, …

This is
• Like other social networks
• Unlike
  – technological networks
  – Erdős-Rényi random graphs
  – Barabási-Albert networks

Removing high-degree people is useless.

Removing high-degree locations is better.
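Assortativity by degree is usually measured as the Pearson correlation between the degrees found at the two ends of each edge. The sketch below computes that coefficient for a small undirected edge list; the toy graphs are illustrative and unrelated to the synthetic population, and the function name is hypothetical.

```python
def degree_assortativity(edges):
    """Pearson correlation of end-point degrees over an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]

    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("all nodes have the same degree; coefficient undefined")
    return cov / var

star = [(0, 1), (0, 2), (0, 3), (0, 4)]  # one hub, four leaves
path = [(0, 1), (1, 2), (2, 3)]          # four nodes in a line
r_star = degree_assortativity(star)
r_path = degree_assortativity(path)
print(r_star, r_path)
```

Positive values, like the ~0.25 quoted for the people-people projection, mean high-degree nodes tend to attach to other high-degree nodes; a star, where the hub connects only to leaves, sits at the opposite extreme.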


Summary

• Complex systems models are hungry for detail (= data)

• Privacy & extrapolation require “synthetic” data, combining observations (declarative), statistical models, and simulation results (procedural)

• Validity of synthetic data depends on resolution, fidelity, accuracy, and the question it is intended to answer


When is this model simpler?

Notation: x and y are states of a component at time t and t+1

1. Components’ states are updated independently:

# parameters

2. Interactions are pairwise independent:

# parameters


When is this model simpler?

3. Most components do not interact directly:

# parameters

4. Only one state transition, S → I, is affected by interactions:

# parameters

Architecture


Computational Resources

• Demonstration experiment
  – 8 experiments (exp ids: 1083 to 1090)
  – 24 cells with 200 days and 25 reps
• Computations performed
  – 291 million contacts * 200 days * 25 reps * 24 cells = 34.92 trillion transmission evaluations
• Time requirements
  – Single processor: 2 years 340 days
  – Small cluster (10 nodes, 4 cores each): 26 days 18 hours
  – Current IDAC cluster: > 3 hours
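The arithmetic behind these figures is easy to check. The sketch below multiplies out the stated factors and rescales the single-processor time to the small cluster, assuming 365-day years, 4 cores on each of the 10 nodes, and perfect linear speed-up; those assumptions are mine, chosen because they reproduce the quoted cluster time.

```python
contacts = 291_000_000
days, reps, cells = 200, 25, 24

# Total transmission evaluations: one per contact, day, replicate and cell.
evaluations = contacts * days * reps * cells
print(f"{evaluations:.4e}")  # 3.4920e+13, i.e. about 34.92 trillion

# Single-processor time, assuming 365-day years.
single_processor_days = 2 * 365 + 340

# Small cluster: 10 nodes x 4 cores, assuming perfect linear speed-up.
cores = 10 * 4
cluster_days = single_processor_days / cores
whole_days = int(cluster_days)
hours = (cluster_days - whole_days) * 24
print(whole_days, "days", hours, "hours")  # 26 days 18.0 hours
```

The stated factors multiply to 3.492 x 10^13 evaluations, and 1070 processor-days spread over 40 cores gives the quoted 26 days 18 hours.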


Example Located Synthetic Population

Example Route Plans

(Figure: route plans for two people in one household. Stop labels: HOME, WORK, LUNCH, WORK, DOCTOR, SHOP, HOME and HOME, WORK, SHOP. Legend: first person in household, second person in household.)


Time Slice of a Typical Family’s Day


How much does detail matter?

• Interaction picture:
  – Dynamics of outbreak depend on topology
  – How and how much?
  – What differences in network topology are relevant to prevention/mitigation?
• What statistics capture the difference?
• Answer staring us in the face (see above):
  – Overall attack rate is a function of the topology of the network
• Other measures for other questions
  – Attack rate by transmissibility as function of edges retained
  – Vulnerability of a subset as function of edges retained
  – Distribution of vulnerabilities as function of edges retained


Edge deletion in a graph

• RTI-synthesized poultry farm network
• In collaboration with UPenn, studying outbreaks
• National network, essentially a complete graph
  – Distribution of weights
• Attack rate as function of edges retained
• Attack rate by transmissibility as function of edges retained
• Vulnerability of a subset as function of edges retained
• Distribution of vulnerabilities as function of edges retained
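A rough sketch of an "outcome as a function of edges retained" sweep. It starts from a complete weighted graph, keeps only the heaviest fraction of edges, and reports the size of the largest connected component, which bounds the attack rate from above if disease spreads only along retained edges. The random weights, the 50-node size, and the use of component size as a stand-in for attack rate are all assumptions, not the analysis from the talk.

```python
import random
from itertools import combinations

def largest_component_fraction(n, edges):
    """Fraction of the n nodes in the largest connected component (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

def sweep(n, weighted_edges, fractions):
    """Keep the heaviest fraction of edges and measure connectivity each time."""
    ranked = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    curve = []
    for f in fractions:
        kept = ranked[: int(round(f * len(ranked)))]
        pairs = [(u, v) for u, v, _ in kept]
        curve.append((f, largest_component_fraction(n, pairs)))
    return curve

rng = random.Random(1)
n_farms = 50  # hypothetical network size
weighted_edges = [(u, v, rng.random())
                  for u, v in combinations(range(n_farms), 2)]
curve = sweep(n_farms, weighted_edges, [0.0, 0.01, 0.05, 0.25, 1.0])
for fraction, reachable in curve:
    print(fraction, reachable)
```

Replacing the component-size proxy with repeated outbreak simulations on the retained edges would give the attack-rate curves named in the list above.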


Model comparison

• Compare outcomes of same scenarios
  – Compare distributions of outcomes of similar scenarios
  – Compare distributions of summary statistics of outcomes of similar scenarios
  – Compare distributions of answers to questions about similar scenarios

• Compare


Adds up to a serious informatics challenge

• Managing the refinement process

• Integrating various data sources & simulations

• Curating the database

• Providing HPC services

• Providing analysis support
