Evaluation of Generalized Additive Models for Spatial Analyses of Cancer Data
Verónica Vieira, Department of Environmental Health
May 25, 2011

Generalized Additive Models and Power of Smoother Hypothesis Tests
Outline
Generalized Additive Models
Cancer Study Spatial Analyses
Investigating New Hypothesis
Conclusions and Future Research
Rationale
Cancer Registry Analyses: latency, spatial confounding, population density
Applying non-parametric methods to population-based case-control data is one method for dealing with these issues
Data Source: Silent Spring Institute
Non-parametric Regression
nonparametric regression (smoothing)
Generalized Additive Models
A generalization of Generalized Linear Models
Allows simultaneous smoothing and adjustment for covariates
Advantages include optimal degree of smoothing and global and local hypothesis testing
logit[p(x1,x2)] = S(x1,x2) + γz
where S(x1,x2) is a bivariate smoothing function of location (x1,x2) and γz adjusts for covariates z
Loess Smoother
Locally weighted straight line smoother combines advantages of nearest neighbor & fixed kernel
Tri-cube weight function assigned to k nearest neighbors of target point x0 is adaptive to changes in data density
Smooth at x0 is the fitted value from the weighted linear fit
Percent of points in dataset included in the k nearest neighbors is called the span
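The loess bullets above can be illustrated with a minimal sketch of a single locally weighted linear fit at a target point x0. It assumes a continuous response and plain weighted least squares; the logistic GAM in these slides would use this smoother inside an iterative fitting loop that is not shown here. The function names `tricube` and `loess_at` are illustrative, not from the talk.

```python
import numpy as np

def tricube(u):
    """Tri-cube weight: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u**3) ** 3

def loess_at(x0, X, y, span=0.5):
    """Fitted value at x0 from a weighted linear fit to its k nearest neighbours.

    X: (n, d) array of locations, y: (n,) responses,
    span: fraction of the n points used as neighbours.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, d = X.shape
    # k is the span expressed as a number of points (at least d + 1 for a linear fit)
    k = min(max(int(np.ceil(span * n)), d + 1), n)
    dist = np.linalg.norm(X - x0, axis=1)
    idx = np.argsort(dist)[:k]
    h = dist[idx].max()  # distance to the k-th neighbour adapts to data density
    w = tricube(dist[idx] / h) if h > 0 else np.ones(k)
    # Weighted least squares with the design centred at x0,
    # so the intercept is the smooth at x0
    A = np.hstack([np.ones((k, 1)), X[idx] - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return beta[0]
```

Because the neighbourhood is defined by a fraction of points rather than a fixed radius, the window widens automatically where data are sparse, which is the adaptive behaviour the slide describes.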
Span = degree of smoothing in LOESS
Apply GAM across a range of possible span sizes
Select model that minimizes AIC statistic
Corresponding span is the “Optimal Span Size”
Optimal Span Size
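The span search can be sketched as a loop over candidate spans. This is a minimal illustration, assuming AIC is computed as deviance plus twice the effective degrees of freedom; `fit_gam` is a hypothetical callable standing in for whatever GAM fitting routine is used, and is not a real library function.

```python
def select_span(spans, fit_gam):
    """Return (best_span, best_aic, table) over the candidate spans.

    fit_gam is a hypothetical callable: span -> (deviance, effective_df).
    """
    table = []
    for span in spans:
        deviance, edf = fit_gam(span)
        aic = deviance + 2.0 * edf  # assumed AIC form
        table.append((span, aic))
    best_span, best_aic = min(table, key=lambda row: row[1])
    return best_span, best_aic, table
```

Small spans lower the deviance but spend more degrees of freedom, so the minimum AIC balances fit against smoothness.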
Hypothesis Testing
Global Hypothesis Testing
H0: There is no association between smoothed location and disease risk
HA: There is an association between smoothed location and disease risk
Testing methods:
1. Approximate Chi-Square Test: based on the likelihood ratio test; known to be only approximate; approximate statistic, degrees of freedom, and p-value are produced by S-Plus and R
2. Conditional Permutation Test
Conditional Permutation Test
Global Hypothesis Test Procedure:
1. Select span size by minimizing AIC statistic
2. Compute the difference in deviance between the two models with and without the smooth
3. Perform 999 permutations of smoothed location using optimal span, otherwise maintaining link of outcome and non-smoothed covariates
4. Rank statistics from lowest to highest values
5. For nominal significance level of 0.05, if observed statistic falls in top 2.5% of the permutation distribution then reject H0
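The procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: `stat_fn` is a hypothetical callable that refits the models at the already selected span and returns the deviance difference for a given arrangement of locations. Only the rows of the coordinate array are shuffled, so each subject's outcome stays linked to its non-smoothed covariates.

```python
import numpy as np

def conditional_permutation_test(coords, stat_fn, n_perm=999,
                                 upper_tail=0.025, seed=0):
    """Permutation test with the span held fixed.

    coords: (n, 2) array of locations.
    stat_fn: hypothetical callable, coords -> deviance difference.
    Returns (observed statistic, permutation p-value, reject H0?).
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords)
    observed = stat_fn(coords)
    perm_stats = np.empty(n_perm)
    for b in range(n_perm):
        # Reassign whole (x1, x2) pairs to subjects at random
        shuffled = coords[rng.permutation(len(coords))]
        perm_stats[b] = stat_fn(shuffled)
    # Position of the observed statistic among all n_perm + 1 values
    n_at_least = np.sum(perm_stats >= observed)
    p_value = (n_at_least + 1) / (n_perm + 1)
    return observed, p_value, p_value <= upper_tail
```

With 999 permutations the smallest attainable p-value is 0.001, and the default `upper_tail=0.025` mirrors step 5.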
Alternative Methods
CPT has an inflated type I error rate
There is approximately twice the probability of falsely rejecting the null hypothesis when applied with a bivariate smooth.
In practice, when the observed statistic is extreme, e.g. in the upper 2.5% of the permutation distribution, investigators may feel confident with the study results
Unconditional Permutation Test
Perform same span selection procedure for each permuted dataset
Compare observed statistic to permutation distribution as previously described
Correct type I error rate but computationally more intensive
Spatial Scan Statistic (SaTScan)
Popular method for cluster detection
A likelihood ratio test
Create circles centered at each observation with radii varying continuously from zero to an upper limit
For a given circle, the likelihood of being a case within that circle can be calculated
Number of individuals at risk and number of cases in region are assumed known
The most likely cluster maximizes the likelihood
Calculates Risk Ratios but does not produce a map
Applicable through freely available software SaTScan
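The circular scan can be illustrated with a small sketch for case-control (Bernoulli) data. This is not SaTScan's implementation: it assumes the standard Bernoulli log likelihood ratio, scans only for elevated risk, grows each circle one subject at a time (ignoring ties in distance), and caps the circle at a fraction `max_fraction` of all subjects. All function and parameter names are illustrative.

```python
import numpy as np

def bernoulli_llr(c, n, C, N):
    """Log likelihood ratio for a zone holding c cases among n subjects,
    given C cases among N subjects overall. Returns 0 unless the risk
    inside the zone exceeds the risk outside."""
    def xlogy(a, b):
        return 0.0 if a == 0 else a * np.log(b)
    if n == 0 or n == N:
        return 0.0
    p_in, p_out = c / n, (C - c) / (N - n)
    if p_in <= p_out:
        return 0.0
    p0 = C / N
    ll_alt = (xlogy(c, p_in) + xlogy(n - c, 1 - p_in)
              + xlogy(C - c, p_out) + xlogy((N - n) - (C - c), 1 - p_out))
    ll_null = xlogy(C, p0) + xlogy(N - C, 1 - p0)
    return ll_alt - ll_null

def most_likely_cluster(coords, is_case, max_fraction=0.5):
    """Return (llr, centre index, radius) of the circle maximizing the LLR."""
    coords = np.asarray(coords, dtype=float)
    is_case = np.asarray(is_case, dtype=int)
    N, C = len(is_case), int(is_case.sum())
    max_n = int(max_fraction * N)
    best = (0.0, None, 0.0)
    for i in range(N):
        # Circles centred at observation i, grown through its neighbours
        dist = np.linalg.norm(coords - coords[i], axis=1)
        order = np.argsort(dist)
        cum_cases = np.cumsum(is_case[order])
        for n in range(1, max_n + 1):
            llr = bernoulli_llr(int(cum_cases[n - 1]), n, C, N)
            if llr > best[0]:
                best = (llr, i, float(dist[order[n - 1]]))
    return best
```

The sketch only finds the maximizing circle. Judging its significance would need a reference distribution for the maximum, for example by repeating the scan on datasets with randomly reassigned case labels.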
Simulation Study
Compare the “sensitivity” of GAMs and SaTScan to detect clusters given that H0 is rejected
Simulated Data Parameters:
Dichotomous Outcome
Three Cluster Patterns
OR = 0.5, 1.0, 2.0, 3.0
n=1000
1000 datasets simulated for each parameter combination
Nominal significance level = 0.05
Sensitivity to Detect Cluster
Locating areas of increased/decreased risk
Performed if global null hypothesis was rejected
Generalized Additive Model Screening Tool
Overlay region map with a fine regular grid
Produce pointwise predicted log odds from models applied to observed and permuted datasets
For each point:
Compare predicted log odds to permutation distribution of predicted log odds
If predicted value from observed data falls in the upper/lower 2.5% of the distribution, the point belongs to a hot-/coldspot
Spatial Scan Statistic
Most likely cluster identified by SaTScan
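The pointwise GAM screening step can be sketched as a rank comparison at each grid point. This is a minimal illustration that assumes the predicted log odds have already been computed on the grid for the observed data and for each permuted dataset; `classify_grid` and its arguments are illustrative names.

```python
import numpy as np

def classify_grid(observed, permuted, tail=0.025):
    """Label each grid point as hotspot (+1), coldspot (-1) or neither (0).

    observed: (G,) predicted log odds from the observed data.
    permuted: (P, G) predicted log odds from P permuted datasets.
    """
    observed = np.asarray(observed, dtype=float)
    permuted = np.asarray(permuted, dtype=float)
    n_perm = permuted.shape[0]
    # Share of the permutation distribution at or beyond the observed value
    upper_p = ((permuted >= observed).sum(axis=0) + 1) / (n_perm + 1)
    lower_p = ((permuted <= observed).sum(axis=0) + 1) / (n_perm + 1)
    labels = np.zeros(observed.shape, dtype=int)
    labels[upper_p <= tail] = 1    # upper 2.5%: hotspot
    labels[lower_p <= tail] = -1   # lower 2.5%: coldspot
    return labels
```

The labels can then be drawn over the region map to outline areas of increased or decreased risk.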
[Figure: gam-CPT ~ SaTScan; x-axis: OR; series: gam-CPT, SaTScan]
[Figure: Case 2: Proportion of datasets correctly detecting the point-source (center); x-axis: OR; series: gam-CPT, SaTScan]
[Figure: x-axis: OR; series: gam-CPT, SaTScan]
Summary
Identifying Risk
Circular cluster (percent area detected)
Point source (probability of finding center)
Line source (percent line detected)
Advantages/Disadvantages
Non-technical software available*
Regression/regression based inference
Continuous or dichotomous outcomes
Adjustment for covariates
Choosing a Method
Outcome of interest?
Sample size?
Hypothesized shape a priori?
Exploratory analysis?
Scan Statistic GAM/CPT GAM/CPT Both GAM/CPT GAM/CPT
Scan Statistic GAM/CPT GAM/CPT Both^
Motivating Problem – Breast Cancer
Study Population
Upper Cape Cancer Study (Aschengrau et al. 1989)
Women’s Health on Cape Cod Study (Aschengrau et al. 1997)
Diagnosis
Study Area
Residential History
Geocoded addresses for past 40 years
Residency years for all addresses
Water supply: private or public water
1,480 study participants; 2,432 residences
Controls / Cases
Distribution of Subjects
Space-Time Analysis
Hypothesis Generating
of wastewater contamination in many of its public water supplies.
Drinking water contaminated by wastewater is a potential source of exposure to mammary carcinogens.
Mammary carcinogens include benzene and other organic solvents; PAHs; some pesticides; and some pharmaceuticals and endogenous hormones.
Barnstable Water Co. (BWC) Wells
BWC wells (2)
after primary treatment into the groundwater through sand filter beds.
Annual Barnstable Town Reports provided the gallons of sewage and private septage processed at the facility every year.
Residences on Public Water
Private Wells in Plumes
Ground Water Model in GIS
Darcy velocity estimates direction and magnitude of ground water flow based on elevation, porosity, thickness, and transmissivity.
Constants: Porosity = 0.35; Thickness = 60 meters; Transmissivity = 180 m²/day
Elevation varied across study area.
Concentration gradient was produced using the resulting groundwater flow and available effluent volumes.
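The constants above can be combined in a short sketch of the velocity calculation. It assumes the usual form of Darcy's law: hydraulic conductivity K equals transmissivity divided by thickness, the Darcy flux is -K times the head gradient, and dividing by porosity gives the seepage velocity. The head gradient in the usage note is a made-up value, since the slides give only the constants.

```python
import numpy as np

def seepage_velocity(head_gradient, transmissivity=180.0,
                     thickness=60.0, porosity=0.35):
    """Groundwater velocity vector (m/day) from a hydraulic head gradient.

    head_gradient: (dh/dx, dh/dy), dimensionless (m of head per m of distance).
    Defaults are the constants listed on the slide.
    """
    conductivity = transmissivity / thickness  # K in m/day (180 / 60 = 3)
    # Flow runs down-gradient, hence the minus sign
    darcy_flux = -conductivity * np.asarray(head_gradient, dtype=float)
    return darcy_flux / porosity
```

For a hypothetical gradient of 0.002 along x, `seepage_velocity([0.002, 0.0])` gives a speed of about 0.017 m/day in the negative x direction, roughly 6 m per year.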
Exposure Assessment
Using GIS, plume tracked from WWTF for 1 to 60 years beginning in 1937
First reached public drinking water well in 1966
Spatially joined residences to plume
Residential history was used to determine relative exposure measures.
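One way to turn a residential history into an exposure duration is sketched below. It is an illustration under stated assumptions, not the study's exposure metric: each residence is assumed to carry the year the modeled plume first reached it (or `None` if it never did), and an optional `latency` drops exposure in the years just before an index year. All names and the example values are hypothetical.

```python
def exposure_years(residences, index_year, latency=0):
    """Total years lived at addresses after the plume had reached them.

    residences: list of (move_in, move_out, plume_arrival) tuples;
                plume_arrival is None when the address is never in the plume.
    index_year: year at which exposure counting stops.
    latency: years immediately before index_year that are not counted.
    """
    cutoff = index_year - latency
    total = 0
    for move_in, move_out, plume_arrival in residences:
        if plume_arrival is None:
            continue
        start = max(move_in, plume_arrival)
        end = min(move_out, cutoff)
        total += max(0, end - start)  # no overlap contributes zero
    return total
```

For example, a hypothetical subject who lived from 1960 to 1975 at an address reached by the plume in 1966 accrues 9 exposed years, or 7 if a 15-year latency before a 1988 index year is applied.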
Model Validation
Our plume matches the plume modeled by the USGS in 1993.
Nitrate samples were collected from the public wells starting in 1972. Values are highest for the wells located in the Barnstable plume.
Statistical Analyses
Divided exposure into durations based on the distribution among controls
Considered (1) exposure over the entire residential period and (2) exposure restricted by a latency period
Used common unexposed reference group of 700 controls and 533 cases for these analyses
Controlled for age, vital status, family history, personal history, age at first birth, education, race and study of origin
Latency period ≤5 years >5 years
Plume Analyses
detecting clusters that addresses many limitations of cancer registry analyses
Performance is similar to or better than that of SaTScan for detecting the exposure source in different cluster scenarios
Cluster analyses can provide new hypotheses for further research
Breast cancer cluster in Cape Cod likely due to contaminated drinking water
Future Research
Verify results
More complex alternative hypotheses: irregular edges, non-uniform population density, multiple clusters, areas of sparse data
Alternative smoothing types: splines
http://www.busrp.org/
http://www.cireeh.org/pmwiki.php/Main/SpatialEpidemiology
Acknowledgement
Funding: This work was supported by the Superfund Basic Research Program 5 P42 ES007381 and National Cancer Institute 5R03CA119703-02.
Coauthors: Robin Young (Biostatistics), Lisa Gallagher (Environmental Health), Tom Webster (Environmental Health), Janice Weinberg (Biostatistics), Ann Aschengrau (Epidemiology)
Departments of Biostatistics, Environmental Health, and Epidemiology, Boston University School of Public Health, Boston, MA
Thank you.