1
Multi-site Time Series Analysis
Motivation and Methodology
SAMSI Spatial Epidemiology, Fall 2009
Howard Chang
2
Epidemiology
The study of factors affecting
the health of human populations
Some objectives of epidemiologic studies:
– Identify the cause of a disease and its risk factors.
– Measure the extent and occurrence of the disease.
– Quantify the burden of the disease.
– Evaluate current methods of health care delivery.
– Create preventive and intervention programs.
– Provide information for policy and regulatory decisions.
3
First Step in Epidemiology
Exposure → Adverse Health Outcome?
Exposure Examples
A few factors studied for breast cancer:
genes, physical activity, schizophrenia, birth-weight, obesity,
consumption of fruits and vegetables, total visual blindness, arthritis,
… (about 22,000 hits from PubMed)
Health Outcome Examples
Some ways to measure frailty in the elderly:
slow walking speed, poor grip strength, exhaustion,
unintended weight loss and low physical activity
4
Challenges in Epidemiologic Study
Test subjects = Humans
Study Design
– How to select and recruit subjects?
• experimental versus observational
• sample size and cost
– How to define and measure exposure?
• duration, intensity
– Ethical concerns
Interpretations
– How to establish causation through associations?
– Can the results be generalized to the whole population?
Bias, Confounder, Interaction
5
The London Smog (1952)
Adverse health effects of extreme air pollution are well established.
6
Air Pollution Epidemiology
Scientific Question:
Does everyday level of air pollution affect human health?
Motivations:
Air pollution is experienced by everyone and there is no alternative to
breathing!
The health impact and economic cost to the population can be substantial.
Ambient pollutants are mostly generated by human activities and regulatory
policies are required to protect public health.
7
Background
The EPA currently regulates six criteria pollutants:
Ozone, particulate matter, carbon monoxide, nitrogen oxides,
sulfur dioxide and lead
The National Ambient Air Quality Standards (NAAQS) provide limits
on both long-term and short-term exposure.
Example: Fine particulate matter (PM2.5)
Averaging Time    Level
Annual            15 µg/m3
24-hour           35 µg/m3
Similarly, the health effects of air pollution are classified as chronic or
acute, and the two kinds are estimated using different study designs.
8
Time Series Analysis
It is the most common population-based study design to estimate the
short-term (acute) health effects of air pollution.
IDEA: Quantify the association between daily variations in air pollution
level and variations in daily adverse health outcomes.
Example: Cook County, IL
9
Chronic Health Effects
Chronic effects cannot be estimated with the time series design, which relies
on temporal (between-day) comparisons.
Studies of chronic health effects quantify the association between spatial
variation in air pollution levels and health outcomes across different
geographic areas.
Annual Average Level of PM2.5 (µg/m3)
10
Multi-site Time Series Analysis
Goal: Estimate the acute health effect of an exposure that
varies both spatially and temporally.
[Figure: Daily variation: daily PM2.5 level, 1999–2003, for Kern County and King County. Spatial variation: annual average level of PM2.5 (µg/m3).]
11
Multi-site Time Series Analysis
Stage I
A single-site time series analysis is conducted within a community such
as a city, a county, or a metropolitan area.
Data:
– Outcome of interest: daily count of an adverse health outcome in the
community. Examples: hospital admissions, deaths.
– Exposure of interest: daily community-level exposure to air pollution that reflects the average level of exposure experienced by all at-risk individuals.
– Other known predictors (confounders) of the health outcome, such as
temperature, humidity, …
Stage II
A multi-site analysis combines the health effects across locations.
12
Case Study Example: NMMAPS
National Morbidity, Mortality, Air Pollution Study
– Study period: 1987 ~ 2000
– 108 urban communities (cities).
– Daily mortality count from National Center for Health Statistics
– Daily air pollution data (PM2.5, PM10, O3, NO2, SO2, CO)
– Weather data from the National Climate Data Center
– City characteristics from the 2000 Census
13
NMMAPS Resources
Website: http://www.ihapss.jhsph.edu/
Book:
14
Case Study Example: MCAPS
Medicare and Air Pollution Study
– Study period: 1999 ~ 2005 (on-going)
– Approximately 204 counties
– Medicare enrollees aged 65 or above
– Daily hospital admission count for primary diagnosis
– 11.5 million Medicare enrollees residing an average of 5.9 miles from a PM2.5
15
Case Study
Study Population
Medicare Enrollees from 204 US counties with population greater than 200,000
Exposure Data
Time series of daily county-level average concentrations of PM2.5 were calculated
using measurements from EPA's monitoring network.
Health Outcome Data
Time series of the daily number of hospitalizations for various cardiovascular and
respiratory diseases were constructed for each county.
Time series of the total number of at-risk individuals were also constructed for each
hospitalization outcome.
16
Stage I County-specific Model
For each county separately, we model the count outcome via Poisson
regression with over-dispersion:
$y_{ct} \sim \mathrm{Poisson}(\mu_{ct})$
$\log \mu_{ct} = \log N_{ct} + \alpha_c + \beta_c x_{c(t-p)} + \text{confounders}$
For county c:
$y_{ct}$ = number of admissions on day t
$x_{c(t-p)}$ = county-level PM2.5 exposure on day t − p (lag p)
(e.g., p = 0 for same-day exposure; p = 1 for previous-day exposure)
$N_{ct}$ = population at risk on day t
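A minimal sketch of a Stage I fit for one county, assuming the model log µ_t = log N_t + α + β x_t with no confounder terms. The data are simulated, and the function names, the exposure distribution, and the true coefficient values are all illustrative assumptions, not values from the study. The over-dispersion parameter is estimated here by the Pearson statistic divided by the residual degrees of freedom, which is one common quasi-Poisson convention.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson variate (Knuth's method; fine for small means)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def fit_poisson_offset(y, x, n_at_risk, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of log(mu_t) = log(N_t) + alpha + beta * x_t.

    Returns (alpha, beta, se_beta, phi); phi is the Pearson over-dispersion
    estimate and se_beta is already inflated by sqrt(phi).
    """
    def moments(alpha, beta):
        # Fitted means and the entries of the 2 x 2 Fisher information.
        mu = [n * math.exp(alpha + beta * xi) for n, xi in zip(n_at_risk, x)]
        i00 = sum(mu)
        i01 = sum(mi * xi for mi, xi in zip(mu, x))
        i11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        return mu, i00, i01, i11

    alpha = math.log(sum(y) / sum(n_at_risk))  # crude log rate as a start
    beta = 0.0
    for _ in range(max_iter):
        mu, i00, i01, i11 = moments(alpha, beta)
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        det = i00 * i11 - i01 * i01
        step_a = (i11 * g0 - i01 * g1) / det
        step_b = (i00 * g1 - i01 * g0) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < tol:
            break

    mu, i00, i01, i11 = moments(alpha, beta)
    det = i00 * i11 - i01 * i01
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    phi = pearson / (len(y) - 2)
    se_beta = math.sqrt(phi * i00 / det)
    return alpha, beta, se_beta, phi


# Simulated county (all numbers below are hypothetical).
rng = random.Random(2009)
days = 2000
n_at_risk = [100000.0] * days
pm25 = [rng.gammavariate(4.0, 3.0) for _ in range(days)]  # mean 12 µg/m3
true_alpha, true_beta = math.log(2e-4), 0.01  # effect exaggerated for clarity
counts = [
    sample_poisson(rng, n * math.exp(true_alpha + true_beta * xi))
    for n, xi in zip(n_at_risk, pm25)
]

alpha_hat, beta_hat, se_beta, phi = fit_poisson_offset(counts, pm25, n_at_risk)
print(f"beta_hat = {beta_hat:.4f} (SE {se_beta:.4f}), phi = {phi:.2f}")
```

Because the simulated counts are exactly Poisson, the estimated over-dispersion should be close to 1; with real admissions data it is typically larger. A real analysis would add the confounder terms and would normally use an established GLM routine rather than a hand-written solver.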
17
Stage I Modelling
Time series analysis is ecological in time:
(1) We regress an aggregated health outcome on an aggregated
exposure.
(2) Day serves as the unit of comparison.
Over-dispersion may be due to residual confounding, measurement error, or
ecological bias.
The acute health effect βc represents:
– the county-specific log relative risk associated with a unit increase in
same-day PM2.5 level, controlling for known confounders.
– through 100 × (exp(βc) − 1), the % increase in hospital admissions associated
with a unit increase in same-day PM2.5 level, controlling for known confounders.
– a single number with great policy implications!
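The conversion from a log relative risk to a percent increase can be sketched as follows. The function name and the value of `beta_c` are hypothetical; the formula 100 × (exp(β × Δ) − 1) is the standard conversion for a log-linear model.

```python
import math


def percent_increase(beta, delta=1.0):
    """Percent change in expected admissions for a `delta`-unit rise in exposure."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


beta_c = 0.001  # hypothetical log relative risk per 1 µg/m3 of PM2.5
print(percent_increase(beta_c))        # about 0.1% per 1 µg/m3
print(percent_increase(beta_c, 10.0))  # about 1.0% per 10 µg/m3
```

For small β the percent increase is close to 100 × β × Δ, which is why the log relative risk is often read directly as a percent change.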
18
Confounders
Also known as hidden variables or lurking variables.
In establishing whether A causes B,
factor C is a confounder if:
(1) C is a known risk factor for B
(2) C is associated with A but is not
on the causal pathway from A to B.
[Diagram: (A) Air Pollution → (B) Health Outcome, with (C) Temperature? linked to both.]
19
Controlling for Confounders
It is important to rigorously control for confounders. A typical model will include:
• Day of the week
• Age-group categories (under 65 versus 65 to 75 versus 75+)
• Smooth function of calendar time to control for long-term trends and seasonality due to
epidemics of influenza and respiratory infections.
• Interaction between age-group and smooth function of time
• Smooth functions of current-day and previous-day temperature
• Smooth function of current-day and previous-day dew-point temperature to control for humidity
Smooth functions for the confounders are modelled via natural cubic splines.
Note that confounders that do not vary with time are automatically controlled for!
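As a sketch of how the terms listed above could enter the Stage I linear predictor, one possible written-out form is given below. The age-group index a, the notation ns(·; df) for a natural cubic spline, and the degrees-of-freedom symbols df_1, …, df_6 are assumptions made for illustration; the slides do not state the values used.

```latex
\begin{aligned}
\log \mu_{cta} ={} & \log N_{cta} + \alpha_c + \beta_c\, x_{c(t-p)}
   + \mathrm{DOW}_t + \mathrm{age}_a \\
 & + ns(t;\, df_1) + \mathrm{age}_a \times ns(t;\, df_2) \\
 & + ns(\mathrm{temp}_t;\, df_3) + ns(\mathrm{temp}_{t-1};\, df_4) \\
 & + ns(\mathrm{dew}_t;\, df_5) + ns(\mathrm{dew}_{t-1};\, df_6)
\end{aligned}
```

Any county-level factor that is constant over the study period is absorbed by the intercept α_c, which is why time-invariant confounders need no explicit term.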
20
Controlling for Confounders Examples
(1) Mortality and Temperature
Association between lag 1 PM10 and mortality as
the number of lags of temperature included in the
model is increased, New York, NY, 1987–2000.
(2) Mortality and Time
Estimates of the log relative risk PM10 for
Denver, Colorado, 1987–2000, as the number
of degrees of freedom per year in the smooth
function of time is varied
Peng RD, Dominici F (2008). Statistical Methods for Environmental Epidemiology in R: A Case Study in Air Pollution and Health, Springer.
21
Stage II Combining Across Locations
A simple hierarchical model:
Assuming the true location-specific log relative risks are independent across
locations,
$\beta_c \sim \mathrm{Normal}(\mu, \tau^2)$
µ = (pooled / overall / average / national) log relative risk
τ² = between-county variability (spatial heterogeneity) in log relative risks
One can view the adverse health effects of PM2.5 as treatments that were randomly
assigned to the selected counties, or view the risks as exchangeable among counties.
22
Estimation
We cannot carry out estimation for both Stage I and Stage II simultaneously because of
the large number of county-specific regression coefficients for confounders.
A two-stage approximation approach:
1. First estimate the county-specific log relative risk $\hat{\beta}_c$ and its variance $\hat{V}_c$.
2. Use an MLE-based Normal approximation:
$\hat{\beta}_c \mid \beta_c \sim \mathrm{Normal}(\beta_c, \hat{V}_c)$
$\beta_c \sim \mathrm{Normal}(\mu, \tau^2)$
The above two-level Normal-Normal model can be estimated via MCMC,
programs for meta-analysis, or the TLNISE algorithm of Everson and
Morris (2000).
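As one example of the "programs for meta-analysis" route, the sketch below pools county-specific estimates with the DerSimonian-Laird method-of-moments estimator of the between-county variance. This is a swapped-in, simpler technique, not MCMC and not TLNISE, and the function name and the county estimates are hypothetical.

```python
import math


def pool_estimates(beta_hat, v_hat):
    """Method-of-moments (DerSimonian-Laird) fit of the Normal-Normal model.

    beta_hat: county-specific log relative risk estimates
    v_hat:    their estimated variances
    Returns (mu, se_mu, tau2).
    """
    k = len(beta_hat)
    w = [1.0 / v for v in v_hat]  # inverse-variance weights
    mu_fixed = sum(wi * b for wi, b in zip(w, beta_hat)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (b - mu_fixed) ** 2 for wi, b in zip(w, beta_hat))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    # Re-weight with the total variance V_c + tau2.
    w_star = [1.0 / (v + tau2) for v in v_hat]
    mu = sum(wi * b for wi, b in zip(w_star, beta_hat)) / sum(w_star)
    se_mu = math.sqrt(1.0 / sum(w_star))
    return mu, se_mu, tau2


# Hypothetical Stage I output for five counties.
county_beta = [0.0012, 0.0004, 0.0021, -0.0003, 0.0009]
county_var = [2.5e-7, 4.0e-7, 9.0e-7, 6.4e-7, 1.6e-7]

mu_hat, se_mu, tau2_hat = pool_estimates(county_beta, county_var)
print(f"pooled log RR = {mu_hat:.5f} (SE {se_mu:.5f}), tau^2 = {tau2_hat:.2e}")
```

Unlike a fully Bayesian fit, this estimator treats the estimated τ² as known when computing the standard error of µ, so its intervals can be too narrow when the number of counties is small.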
23
National Estimates for PM2.5 and Admissions
24
Example: County-specific Effect of PM10 on Mortality
F. Dominici, A. McDermott, M. Daniels, S. L. Zeger, and J. M. Samet. Mortality among residents of 90 cities. In Revised Analyses of Time-Series Studies of Air Pollution and
Health, pages 9–24. The Health Effects Institute, Cambridge, MA, 2003.
MLE Estimates
25
Example: County-specific Effect of PM10 on Mortality
F. Dominici, A. McDermott, M. Daniels, S. L. Zeger, and J. M. Samet. Mortality among residents of 90 cities. In Revised Analyses of Time-Series Studies of Air Pollution and
Health, pages 9–24. The Health Effects Institute, Cambridge, MA, 2003.
Bayesian Estimates
26
County-specific Estimates
The hierarchical framework borrows strength across studies (locations).
In Stage I, county-specific relative risks are often estimated imprecisely.
Example: Mortality and PM10
[Figure panels: MLE, Bayesian]
Log relative rates of mortality from exposure to PM10. Areas of the circles are proportional to the posterior precisions of
the Bayesian estimates; larger circles indicate more precise estimates. Black outlines denote relative rates whose ratio of
posterior mean to posterior standard deviation exceeds 1.96.
Dominici F, McDermott A, Zeger SL, Samet JM. National Maps of the Effects of PM on Mortality: Exploring Geographical Variation.
Environmental Health Perspectives, vol. 111, no. 1, 39–43.
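The "borrowing strength" idea on this slide can be sketched with the closed-form posterior of the Normal-Normal model: each county estimate is pulled toward the pooled mean by an amount that depends on its precision. The sketch assumes µ and τ² are known (a plug-in simplification, unlike the fully Bayesian estimates in the cited maps), and the function name and all numbers are hypothetical.

```python
def shrink(beta_hat, v_hat, mu, tau2):
    """Posterior mean and variance of beta_c given mu and tau2 (plug-in)."""
    b = v_hat / (v_hat + tau2)  # shrinkage factor B_c, between 0 and 1
    post_mean = b * mu + (1.0 - b) * beta_hat
    post_var = (1.0 - b) * v_hat
    return post_mean, post_var


mu, tau2 = 0.0010, 1.0e-7  # hypothetical pooled mean and heterogeneity

# A noisy county is pulled strongly toward mu; a precise one barely moves.
noisy = shrink(beta_hat=0.0040, v_hat=4.0e-6, mu=mu, tau2=tau2)
precise = shrink(beta_hat=0.0040, v_hat=1.0e-8, mu=mu, tau2=tau2)
print("noisy county:  ", noisy)
print("precise county:", precise)
```

The posterior variance is never larger than the Stage I variance, which is the sense in which the hierarchical estimates are more stable than the county-specific MLEs.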
27
Risk Heterogeneity
The observed heterogeneity in risks can arise from unmeasured confounders and effect
modifications due to county-specific characteristics.
We can include higher level covariates in the hierarchical model:
$\beta_c \sim \mathrm{Normal}(Z_c \gamma, \tau^2)$
County-specific covariates (Zc) may include factors that potentially modify the true
health effects. Examples:
Variable                                          To test the effect of
% poverty                                         Socio-economic status
% urbanicity                                      Pollutant composition
Average distance between residents and monitor    Exposure measurement error
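A minimal sketch of a second-stage regression on one county-level covariate, assuming the model β_c ~ Normal(γ0 + γ1 Z_c, τ²) with Stage I variances V_c and with τ² treated as known. It uses weighted least squares with weights 1 / (V_c + τ²); the function name, the covariate values, and τ² are hypothetical.

```python
import math


def meta_regression(beta_hat, v_hat, z, tau2):
    """Weighted least squares for beta_c = gamma0 + gamma1 * z_c + error.

    Weights are 1 / (V_c + tau2). Returns (gamma0, gamma1, se_gamma1).
    """
    w = [1.0 / (v + tau2) for v in v_hat]
    sw = sum(w)
    swz = sum(wi * zi for wi, zi in zip(w, z))
    swzz = sum(wi * zi * zi for wi, zi in zip(w, z))
    swb = sum(wi * bi for wi, bi in zip(w, beta_hat))
    swzb = sum(wi * zi * bi for wi, zi, bi in zip(w, z, beta_hat))
    det = sw * swzz - swz ** 2
    # Solve the 2 x 2 weighted normal equations directly.
    gamma0 = (swzz * swb - swz * swzb) / det
    gamma1 = (sw * swzb - swz * swb) / det
    se_gamma1 = math.sqrt(sw / det)
    return gamma0, gamma1, se_gamma1


# Hypothetical county data: Z_c is % poverty.
pct_poverty = [8.0, 12.0, 15.0, 20.0, 25.0]
county_beta = [0.0005, 0.0008, 0.0009, 0.0014, 0.0016]
county_var = [2.5e-7, 4.0e-7, 9.0e-7, 6.4e-7, 1.6e-7]

g0, g1, se_g1 = meta_regression(county_beta, county_var, pct_poverty, tau2=1.0e-7)
print(f"gamma1 = {g1:.2e} (SE {se_g1:.2e}) per percentage point of poverty")
```

A slope γ1 that is large relative to its standard error would suggest that the covariate modifies the health effect; in practice τ² would be estimated jointly with γ rather than fixed in advance.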
28
Example of Risk Heterogeneity
[Diagram: Air Pollution → Health Outcome, with East versus West as a potential effect modifier.]
Does region (East versus West) modify the health effects?
29
Example of Health Burden Estimates
$\text{Annual reduction} = [\exp(10 \times \mu) - 1] \times N$
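The burden formula on this slide, annual reduction = [exp(10 × µ) − 1] × N, can be evaluated as in the sketch below. The slide does not define N; the sketch assumes it is the annual number of admissions in the population of interest, and both input values are hypothetical.

```python
import math


def annual_reduction(mu, n_events, delta=10.0):
    """Events per year attributable to a `delta`-unit change in exposure.

    mu:       pooled log relative risk per unit of exposure
    n_events: assumed annual number of admissions (interpretation of N)
    """
    return (math.exp(delta * mu) - 1.0) * n_events


mu_pooled = 0.001          # hypothetical pooled log relative risk per µg/m3
admissions = 500000.0      # hypothetical annual admissions
print(round(annual_reduction(mu_pooled, admissions)))
```

Uncertainty in µ carries straight through: evaluating the same function at the ends of an interval for µ gives a corresponding interval for the burden estimate.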
30
Advantages of Multi-site Time Series
• Can achieve a large study population and a long study period by using publicly
available national air pollution and health surveillance databases.
• Day-to-day comparison allows a community to serve as its own control, which
removes confounding by unmeasured factors that are relatively constant between days.
• A multi-site approach combines evidence, borrows information across locations, and
potentially enhances statistical power.
• A multi-site design ensures that the same analytic method is used at each location,
minimizing publication/selection bias and allowing better generalizability of the
results.
• By comparing risk estimates from different locations, effect modification due to
location-specific characteristics can be examined.
31
Epidemiologic Evidence and Policy
Regarding the time series design, the EPA’s 2004 Criteria document for
particulate matter states that
“the temporal relationship supports a conclusion of a causal relation, even when both the
outcome and the exposure are community indices.”
Consistency and Strength
Regarding the evidence on the health effects of fine PM,
“A growing body of epidemiologic evidence both (a) confirms associations between
short-term ambient exposures to fine-fraction particles (generally indexed by PM2.5) and various
mortality or morbidity endpoint effects and (b) supports the general conclusion that PM2.5
(or one or more PM2.5 components), acting alone and/or in combination with gaseous
co-pollutants, are likely causally related to observed ambient fine particle associated health
effects.”