
FINAL REPORT

Title: Health effects from wildfire air pollution: a spatiotemporal modeling approach

JFSP PROJECT ID: L14AC00173

July 2019

PI Name

Michael Jerrett, PhD Professor and Chair

Affiliation: Department of Environmental Health Sciences and Director, Center for Occupational and Environmental Health
Fielding School of Public Health
University of California, Los Angeles
650 Charles E. Young Drive S
Rm. 56-070 CHS, Mail Code: 177220
Los Angeles, CA 90095
Fax: 1-310-794-2106
Phone: 1-310-825-9037
Email: [email protected]


The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of the U.S. Government. Mention of trade names or commercial products does not constitute their endorsement by the U.S. Government.


Abstract
The frequency and severity of wildland fires are projected to increase in many parts of the world due to alterations of temperature and precipitation patterns related to climate change, making an understanding of the population health impacts from exposure to air pollution from wildland fires paramount. Although there is increasing evidence of respiratory health effects from exposure to particulate matter (PM) from wildland fires, the evidence for mortality or cardiovascular morbidity effects is conflicting, despite clear evidence of these effects from ambient PM. Our study has four main areas of significance for understanding the public health impacts associated with wildland fire smoke exposure.

Firstly, we have used cutting-edge statistical models to develop spatiotemporal estimates of population exposure to PM2.5 (particulate matter less than 2.5 microns in aerodynamic diameter) and ozone from the 2007 southern California and 2008 northern California mega-complex wildfires. These models combine information from globally available datasets such as chemical transport models, satellite products of atmospheric composition, meteorological data, and other relevant data sets. Spatiotemporal exposure estimates can account for variability in air pollution exposures at fine spatial and temporal scales, which helps to minimize exposure misclassification from spatial and temporal averaging of exposure and leads to more accurate estimation of health effects from wildland fires. Model predictions can be used in locations with few or no monitoring stations, thereby allowing analysis of the health effects of wildland fire smoke in previously unstudied populations.

Secondly, we filled a knowledge gap by assessing the health impacts of wildland fire-related ozone exposure. To date, only a few studies have investigated the health effects of wildland fire-related ozone exposure, despite many studies documenting increases in ozone during wildfires and a large literature on the health effects of ambient ozone. We estimated the effects of wildfire ozone and PM2.5 on cardiorespiratory hospitalizations and emergency department visits. We found generally positive associations between PM2.5 and respiratory hospitalizations or emergency department visits, but no associations with ozone.

Thirdly, we investigated whether particular subgroups or areas were more susceptible to the health effects of wildland fires. Only a few studies have identified vulnerable populations for wildfires. Given the limited information on the health effects of fires, identifying vulnerable populations can help public health agencies target messages and interventions during future wildland fire events. We found that people living in lower income ZIP codes experienced worse health effects from wildland fire smoke. Women were also found to be more susceptible. Such information can be useful in planning prevention measures and evacuations.

Finally, we sought to use the modeling results to better inform future public health protection plans. This was to involve counterfactual modeling, which would have estimated the potential lives saved or the emergency department visits and hospital admissions avoided by, for example, evacuating more people earlier. As explained below, we were unable to fulfill this aim due to newly imposed data restrictions by the State of California; the resulting redactions were so heavy that the data were functionally useless for this purpose.

Following from these broader aims, we summarize the specific project objectives below and give commentary on how each was fulfilled.


Main Report: Project Objectives and Accomplishments

1. Project Objectives
Our original project objectives were specified as follows:

1. To create a spatiotemporal O3 model for exposure to the 2008 northern California fires. We will create a spatiotemporal O3 model for the 2008 northern California wildfires using methods similar to those of our previously created PM2.5 model. We will use 10-fold cross-validation to select the optimal model from a set of satellite data products, chemical transport model output, meteorology, and other data that could predict O3 well. By allowing the data to inform which of 10 different statistical models best fits the data, rather than assuming one functional form, we are likely to obtain a better exposure model.

2. To test the ability of spatiotemporal exposure assessments created for one wildfire episode to estimate exposure accurately for another wildfire in a different location.

We will assess the extent to which our spatiotemporal PM2.5 and O3 exposure models can be applied to another fire to estimate known concentrations of PM2.5 and O3. We chose the 2007 southern California wildfires as the test case because that region has a dense monitoring network for validation. Also, the vegetation type, meteorology, and topography are quite different from northern California, and these fires occurred during a different season.

3. To estimate the effects of exposure to PM2.5 and O3 from wildfires on hospitalizations, emergency department (ED) visits, and mortality for the 2008 wildfires.

Applying our exposure models, we will estimate exposure to each pollutant at the ZIP code centroid. The use of a spatiotemporal exposure assessment on large populations affected by wildfire smoke will shed light on whether these high acute air pollution exposures relate to cardiovascular or mortality effects and will contribute information on the respiratory health effects of PM2.5 and particularly O3 from fires. We will also identify vulnerable groups through stratification by individual and community-level covariates.

4. To quantify the health benefits of interventions that could be employed during a wildfire by applying the PIM to the 2008 fires.

Using our time series model for hospitalization and emergency department visits from exposure to PM2.5 and O3 from the wildfires, we will estimate the population health effects of various counterfactual scenarios to identify vulnerable populations.

Overall Summary
We successfully completed Aims 1-3. We were unable to complete Aim 4 because the California Department of Public Health changed the rules governing access to geocoded mortality data. Despite numerous time-consuming applications to obtain these data, the data we received were largely redacted and therefore not useful for our analyses.

Outputs and Dissemination
We successfully published several important papers in high-impact public health journals and presented at some of the best international conferences in the field. We also have two other papers that are accepted for publication or in press in high-impact atmospheric science and public health journals. The papers and presentations are listed below.


Peer-Reviewed Publications
Reid, C.E., Jerrett, M., Petersen, M.L., Pfister, G.G., Morefield, P.E., Tager, I.B., Raffuse, S.M., and Balmes, J.R. 2015. Spatiotemporal prediction of fine particulate matter during the 2008 Northern California wildfires using machine learning. Environmental Science and Technology 49:3887-3896.
Reid, C.E., Brauer, M., Johnston, F., Jerrett, M., Balmes, J.R., and Elliott, C.T. 2016. Critical review of health impacts of wildfire smoke exposure. Environmental Health Perspectives 124:1334-1343.
Reid, C.E., Jerrett, M., Tager, I.B., Petersen, M.L., Mann, J.K., and Balmes, J.R. 2016. Differential respiratory health effects from the 2008 northern California wildfires: A spatiotemporal approach. Environmental Research 150:227-235.
Reid, C.E., Considine, E.M., Watson, G.L., Telesca, D., Pfister, G.G., and Jerrett, M. 2019. Associations between respiratory health and ozone and fine particulate matter during a wildfire event. Environment International 129:291-298.
Watson, G.L., Telesca, D., Reid, C.E., Pfister, G.G., and Jerrett, M. Machine Learning Models Accurately Downscale Ozone during Wildfire Events. Accepted for publication in Environmental Pollution. doi:10.1016/j.envpol.2019.06.088.
Reid, C.E., Considine, E.M., Telesca, D., Watson, G.L., Pfister, G.G., and Jerrett, M. Effect modification of the association between wildfire smoke and respiratory health by area-level measures of income inequality and racial segregation. In preparation for submission to Social Science and Medicine.

Peer-Reviewed Presentations
Jerrett, M., Reid, C.E., Tager, I.B., Petersen, M.L., Mann, J.K., and Balmes, J.R. 2016. Differential respiratory health effects from the 2008 northern California wildfires: A spatiotemporal approach. Annual Fire Behavior Meeting, Portland, OR. April 2016.
Watson, G.L., Telesca, D., Reid, C.E., Pfister, G.G., and Jerrett, M. Machine Learning Models Accurately Downscale Ozone during Wildfire Events. Second Annual Smoke Science Conference, Long Beach, CA. November 2016.
Reid, C.E. Health Impacts of Wildfire Smoke Exposure. Air Quality & Health meeting of the Colorado Air Quality Control Commission and the Colorado School of Public Health. May 9, 2019.
Reid, C.E. Epidemiological Evidence of Wildfire Smoke Health Impacts. British Columbia Lung Association 16th Annual Air Quality & Health Workshop, Vancouver, Canada. February 6, 2019.
Reid, C.E. Wildfire Smoke Exposure and Population Health. Center for Health and the Environment Webinar. October 1, 2018.


Reid, C.E. Wildfire Smoke Exposure and Population Health. National Association of County and City Health Officials (NACCHO) Webinar: Western Wildfires - Keeping Communities from Polluted Air. May 21, 2018.
Reid, C.E. Wildfire Smoke Exposure and Population Health. UCSF Fresno Air Pollution and Health Meeting, University of California San Francisco Fresno Center, Fresno, CA. April 7, 2018.
Reid, C.E. Using a Novel Spatiotemporal Exposure Assessment to Estimate Health Effects during the 2008 Northern California Wildfires. Presentation at the Colorado School of Public Health, Aurora, CO. October 4, 2017.
Reid, C.E. 2017. Using a Novel Spatiotemporal Exposure Assessment to Estimate Health Effects during the 2008 Northern California Wildfires. 2017 NCAR/CDC Workshop on Weather, Climate, and Health, Boulder, CO. July 17, 2017.
Reid, C.E. 2017. Health effects of the 2008 northern California wildfires: a spatiotemporal approach. Presentation at the University of Michigan, Ann Arbor. January 10, 2017.
Jerrett, M. Health effects of the 2008 northern California wildfires: a spatiotemporal approach. International Medical Geography Symposium, Angers, France. July 2017.
Jerrett, M. Assessing Air Pollution Exposures in Large Cohort Studies: A Critical Review. International Society for Environmental Epidemiology, Sydney, Australia. September 2017. Abstract published in Environmental Health Perspectives as part of the conference proceedings.
Jerrett, M. Geographic Information Science, Sensor Systems and Data Fusion for Better Exposure Estimation. International Society of Exposure Sciences Annual Meeting, Raleigh, NC. October 17, 2017.
Jerrett, M., Watson, G.L., Telesca, D., Reid, C.E., and Pfister, G.G. Machine Learning Models Accurately Downscale Ozone during Wildfire Events. International Conference on Air Quality - Science and Application, Barcelona, Spain. March 2018.
Reid, C.E., Jerrett, M., Telesca, D., Watson, G.L., and Pfister, G.G. Respiratory health impacts associated with fine particulate matter and ozone from a California wildfire. American Association of Geographers Annual Meeting, New Orleans, LA. April 14, 2018.
Reid, C.E., Jerrett, M., Considine, E.M., Telesca, D., Watson, G.L., and Pfister, G.G. Health Impacts Associated with Fine Particulate Matter and Ozone during a Wildfire: Evidence of Differential Effects Due to Measures of the Social Environment. Oral presentation at the International Society for Environmental Epidemiology Annual Meeting, Ottawa, Canada. August 30, 2018.


Jerrett, M., and Reid, C.E. Wildfire Exposures: Understanding Health Effects from Unnatural Disasters. Health Effects Institute Annual Conference, Seattle, WA. May 2019. Invited plenary.
Reid, C.E. 2019. Challenges in Studying Wildfire Health Impacts - Exposure Assessment. National Academies of Sciences Workshop on "Implications of California Wildfires for Health, Communities, and Preparedness". June 4, 2019.
Reid, C.E. 2019. Epidemiological Evidence of Health Impacts from Wildfire Smoke Exposure. National Academies of Sciences Workshop on "Implications of California Wildfires for Health, Communities, and Preparedness". June 4, 2019.
Reid, C.E., Considine, E.M., Watson, G.L., Telesca, D., and Pfister, G.G. Differential Effects of Wildfire Smoke Exposures on Respiratory Health: A Case Study of the 2008 Northern California Wildfires. 44th Annual Natural Hazards Research and Applications Workshop, Broomfield, CO. July 18, 2019.

As documented above, the results have been widely disseminated in peer-reviewed publications and at leading conferences. We refer readers to the published reports for summaries of the extensive research contributions we have made. For the paper led by Mr. Watson, which is accepted but not yet publicly available in final form, we refer readers to Appendix A below.

Student, Postdoctoral Researcher, and Junior Faculty Training
The grant supported the training of several students and postdoctoral researchers. Mr. Greg Watson was supported on the grant for 2.5 years. He successfully completed his PhD qualifying exam in 2018 and is now on track to finish his doctorate later this year. Dr. Colleen Reid obtained another postdoctoral fellowship from Harvard University just at the inception of this JFSP grant. Thus, she did not draw directly on this grant for salary as a postdoctoral researcher, but she did receive support from a consultant we hired to assemble data for the exposure modeling and from Mr. Watson, who conducted the exposure modeling under the supervision of Drs. Telesca and Jerrett. Dr. Reid successfully competed for a ladder-rank, tenure-track position at the University of Colorado, Boulder. The grant supported part of her salary and the training of an undergraduate student, Ellen Considine, who helped her conduct the health effects analyses that were just published in Environment International.


Appendix A:

Machine Learning Models Accurately Downscale Ozone during Wildfire Events: Summary for JFSP Final Report

Gregory L. Watson, Donatello Telesca, Colleen E. Reid, Gabriele G. Pfister, Michael Jerrett (Note: Full paper is accepted for publication and currently in press)

This study evaluates for the first time the predictive accuracy of nine machine learning algorithms for predicting ground-level ozone exposure, quantified as the daily maximum 8-hour average, during a 2008 wildfire event in northern California. Models were evaluated using a leave-one-location-out cross-validation (LOLO CV) procedure to account for the spatial and temporal dependence of the data and produce more realistic estimates of prediction error, avoiding the overly optimistic bias of k-fold cross-validation when used on data exhibiting spatial or temporal dependence. Gradient boosting was the most accurate of the nine machine learning algorithms, with the lowest LOLO CV estimated root mean square error (0.236 log ppm) and the highest LOLO CV R^2 (0.653). Random forest was the second best performing algorithm, with a LOLO CV R^2 of 0.627. The LOLO CV estimates of predictive accuracy were less optimistic than the 10-fold CV estimates for all nine models. The difference in estimated accuracy between 10-fold CV and LOLO CV was greater for more flexible models like gradient boosting and random forest. The order of model accuracy was not invariant to the choice of evaluation metric, indicating that 10-fold CV and LOLO CV may select different models or sets of covariates as optimal and calling into question the reliability of 10-fold CV for model (or variable) selection. Predictions were much less accurate when models trained on the 2008 northern California data were used to predict ozone exposure during the 2007 southern California wildfires, suggesting these models may not be suitable for extrapolating beyond the domain of the training data to predict exposure at new times and locations.

Data

One hundred ground-based ozone monitors administered by the United States Environmental Protection Agency (EPA) made hourly observations from which the daily maximum 8-hour averages were computed across northern California between May 6, 2008 and September 26, 2008 for a total of 13,487 observations. We selected this time period with the goal of estimating ozone exposures before, during, and after a spate of wildfires that afflicted northern California in late June and July of 2008. The mean maximum 8-hour average concentration was 36.2 ppb, and the standard deviation was 13.6 ppb. During the study, the maximum 8-hour average exceeded 70 ppb 236 times and exceeded 75 ppb 107 times. Most exceedances occurred while the fires were burning (153 and 75 respectively). Figure 1 depicts the temporal evolution of monitor observations throughout this time period.
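For readers unfamiliar with the metric, the daily maximum 8-hour average for a monitor-day is the largest of the 8-hour running means of that monitor's hourly observations. A simplified sketch of the computation with pandas is shown below; the column names are illustrative placeholders, and the window-labeling and data-completeness rules are simplified relative to the regulatory definition.

```python
import pandas as pd


def daily_max_8hr(hourly: pd.DataFrame) -> pd.DataFrame:
    """Daily maximum 8-hour average ozone per monitor (simplified sketch).

    Expects columns 'site', 'datetime' (hourly timestamps), and 'ozone_ppb';
    these column names are illustrative placeholders.
    """
    results = []
    for site, grp in hourly.groupby("site"):
        series = (grp.set_index("datetime")["ozone_ppb"]
                     .sort_index()
                     .asfreq("h"))  # insert NaN for any missing hours
        # 8-hour running mean; require at least 6 of 8 hours (75% completeness)
        run8 = series.rolling(window=8, min_periods=6).mean()
        # daily maximum of the running means (windows labeled by their ending hour)
        daily = run8.resample("D").max().rename("o3_8hr_max")
        results.append(daily.to_frame().assign(site=site))
    return pd.concat(results).reset_index()
```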


Figure 1. The daily empirical distribution of maximum daily 8-hour average ozone between May 6 and September 26, 2008 at 100 northern California monitoring sites.

Eighteen covariates were also collected for the monitor locations, including location, elevation, date, an indicator variable for weekend, atmospheric weather data (dew point, boundary layer height, surface pressure, relative humidity, temperature, and wind speed), inverse distance to the nearest fire, traffic, land use information (agricultural, urban, and vegetation), tropospheric nitrogen dioxide (NO2) vertical column density, and predictions of daily mean carbon monoxide (CO) concentration and daily maximum 8-hour average ozone. Table 1 lists the covariates and their sources. Monitor elevation was determined from the 2010 National Elevation Dataset for California. The date of each observation was encoded as the continuous covariate Julian date and dichotomized into an indicator variable for weekend. The U.S. National Centers for Environmental Prediction's Rapid Update Cycle atmospheric prediction model provided hourly predictions of dew point, planetary boundary layer height, surface pressure, relative humidity, temperature, and the U and V components of wind speed, which were averaged into daily values. Inverse distance to the nearest fire was included as a covariate. The Fire Inventory from NCAR (FINN) v1.5 provided estimates of fire point locations in California during the study period. Fire points occurring within 5 km of each other were clustered, and the distance between each monitoring site and the nearest fire cluster was determined for each day. On days with no fire in California, distance to the nearest fire was undefined. Conceptualizing this undefined distance as equivalent to the nearest fire cluster being infinitely far away, inverse distance to fire was defined as 0 for observations taken when there were no fires and as the inverse of the distance to the nearest fire cluster otherwise (a code sketch of this computation appears after the covariate descriptions below).

Table 1. Covariates used to predict ozone.

Covariate | Data Source
Monitor Latitude | U.S. Environmental Protection Agency
Monitor Longitude | U.S. Environmental Protection Agency
Elevation (m) | National Elevation Dataset
Date | U.S. Environmental Protection Agency
Weekend | U.S. Environmental Protection Agency
Dew Point (K) | Rapid Update Cycle
Boundary Layer Height (m) | Rapid Update Cycle
Surface Pressure (Pa) | Rapid Update Cycle
Relative Humidity (%) | Rapid Update Cycle
Temperature at 2 m (K) | Rapid Update Cycle
U-Component of Wind Speed (m/s) | Rapid Update Cycle
V-Component of Wind Speed (m/s) | Rapid Update Cycle
Inverse Distance to Nearest Fire (1/m) | Fire Inventory from NCAR (FINN) v1.5
Annual Average Traffic within 1 km | Dynamap 2000, TeleAtlas
Agricultural Land Use within 1 km (%) | 2006 National Land Cover Database
Urban Land Use within 1 km (%) | 2006 National Land Cover Database
Vegetation Land Use within 1 km (%) | 2006 National Land Cover Database
Normalized Difference Vegetation Index | Landsat
Nitrogen Dioxide (log molecules/cm2) | Ozone Monitoring Instrument (OMI) satellite
Carbon Monoxide (moles) | WRF-Chem
Ozone (8-Hour Maximum) | WRF-Chem

Dynamap 2000, a TeleAtlas product, was used to compute the annual average of roadway traffic within 1 km of each monitor. The National Land Cover Database for 2006 was used to calculate the percentage of urban development (codes 22, 23, and 24), agriculture (codes 81 and 82), and other vegetation (codes 21, 41, 42, 43, 52, and 71) within 1 km of each monitor. The normalized difference vegetation index (NDVI) quantifies the density of green vegetation on a scale between -1 and 1 by measuring the visible and near-infrared light reflected at a location via remote sensing. The chlorophyll in healthy vegetation absorbs most of the visible light and reflects much of the near-infrared light to which it is exposed, giving locations with more vegetation a higher NDVI score. NDVI for each monitor location was extracted from the NDVI remote sensing raster surface and included as a covariate. Nitrogen dioxide (NO2) was estimated on each day at monitor locations (where available) using NO2 tropospheric column amounts retrieved from NASA's Ozone Monitoring Instrument (OMI) satellite, which has an overpass time of approximately 1:30 pm local time. Predicted carbon monoxide and ozone concentrations were extracted from the Weather Research and Forecasting with Chemistry (WRF-Chem) 3.2 model. WRF-Chem is a regional chemical transport model that simulates meteorology and the behavior of atmospheric gases and aerosols. Our simulations rely upon the global Model for OZone And Related chemical Tracers (MOZART)-4 chemical transport model for chemical boundary conditions and upon the National Centers for Environmental Prediction (NCEP) Final Analysis (FNL) for meteorological spatial and temporal boundary conditions, and they incorporate a variety of input data sources, including meteorological, physical, and chemical atmospheric processes, a 2008 California emissions inventory, online biogenic emissions, and fire emissions estimated by FINN. WRF-Chem predictions of the daily total CO concentrations and the daily maximum 8-hour average ozone were included as covariates.
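The inverse-distance-to-fire covariate described above can be computed from the daily FINN fire detections and the monitor coordinates. The sketch below is one way to do so, clustering fire points that fall within 5 km of one another and returning 0 on days with no fires; the single-linkage clustering step and the use of projected (metric) coordinates are assumptions for illustration, not a description of the exact GIS workflow used in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist


def inverse_distance_to_fire(monitor_xy, fire_xy, cluster_radius_m=5000.0):
    """Inverse distance (1/m) from each monitor to the nearest fire cluster for one day.

    monitor_xy : (n_monitors, 2) projected coordinates in meters.
    fire_xy    : (n_fires, 2) fire detection locations for that day (may be empty).
    Returns 0 for all monitors on days with no fires, treating the nearest fire
    as infinitely far away.
    """
    monitor_xy = np.asarray(monitor_xy, dtype=float)
    fire_xy = np.asarray(fire_xy, dtype=float).reshape(-1, 2)
    if fire_xy.shape[0] == 0:
        return np.zeros(monitor_xy.shape[0])
    if fire_xy.shape[0] == 1:
        centroids = fire_xy
    else:
        # merge fire points within ~5 km of each other via single-linkage clustering
        labels = fcluster(linkage(fire_xy, method="single"),
                          t=cluster_radius_m, criterion="distance")
        centroids = np.array([fire_xy[labels == k].mean(axis=0)
                              for k in np.unique(labels)])
    nearest_m = cdist(monitor_xy, centroids).min(axis=1)  # meters to nearest cluster
    return 1.0 / np.maximum(nearest_m, 1.0)  # guard against a monitor sitting on a fire
```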

Statistical Analysis


Each observation comprises an outcome, $y_i$, the log maximum 8-hour average ozone on a given day at a given monitoring location, and a vector of covariates, $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})'$, $i = 1, \ldots, n$, where $n$ is the number of observations and $p$ is the number of covariates. The vector of outcomes, $\mathbf{y} = (y_1, \ldots, y_n)'$, and the matrix of covariates, $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)'$, together compose the data, $D = \{\mathbf{X}, \mathbf{y}\}$. Ozone observations were log transformed to reduce the impact of heteroscedasticity, as data exploration revealed the variance was substantially greater than the mean at high values. The maximum daily 8-hour average ozone from the WRF chemical transport model (WRF-Chem) was also log transformed to have the same scale as the outcome. All other covariates were transformed to have a mean of 0 and variance of 1.
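A minimal sketch of this preprocessing in Python, assuming a pandas data frame with one row per monitor-day and hypothetical column names ('o3_8hr_max' for the observed metric and 'wrfchem_o3' for the WRF-Chem prediction):

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame):
    """Log-transform the outcome and the WRF-Chem ozone covariate; standardize the rest.

    The column names 'o3_8hr_max' and 'wrfchem_o3' are hypothetical placeholders
    for the actual variable names.
    """
    y = np.log(df["o3_8hr_max"])                       # outcome on the log scale
    X = df.drop(columns=["o3_8hr_max"]).copy()
    X["wrfchem_o3"] = np.log(X["wrfchem_o3"])          # same scale as the outcome
    other = X.columns.drop("wrfchem_o3")
    X[other] = (X[other] - X[other].mean()) / X[other].std()  # mean 0, variance 1
    return X, y
```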

Nine predictive algorithms were trained and evaluated on these data: linear models, lasso regression, elastic net regression, generalized additive models (GAM), multivariate adaptive regression splines (MARS), random forest, gradient boosting, support vector machines with a radial basis kernel (SVM), and $k$-nearest neighbor regression. Training each prediction model produces a prediction rule $\eta(\mathbf{x}, D)$, which is a function of $D$, the data on which it was trained, and a vector of covariates, $\mathbf{x}$, mapping them to a prediction for $y \mid \mathbf{x}$, which is often used as an estimator of $E(y \mid \mathbf{x})$, the conditional expectation of $y$ given $\mathbf{x}$. The models were tuned, selected, and evaluated using cross-validated estimators of root mean square error (RMSE) and $R^2$, which are both functions of the mean square error (MSE). The MSE of a prediction rule $\eta(\mathbf{x}, D_R)$, where $D_R$ is the data with which $\eta$ was trained, may be estimated using a test data set $D_T$ as

$$\widehat{MSE}(D_T, \eta(\mathbf{x}, D_R)) = \frac{1}{n_T} \sum_{j \in D_T} \left( y_j - \eta(\mathbf{x}_j, D_R) \right)^2,$$

where $n_T$ is the number of data points in $D_T$. If $D_T$ and $D_R$ are disjoint (i.e., if $\eta$ was not trained using any part of $D_T$), then this is an out-of-sample estimator of the MSE. RMSE may be estimated by the square root of $\widehat{MSE}$, and $R^2$ is estimated by

$$\hat{R}^2(D_T, \eta(\mathbf{x}, D_R)) = 1 - \frac{\widehat{MSE}(D_T, \eta(\mathbf{x}, D_R))}{n_T^{-1} \sum_{j \in D_T} (y_j - \bar{y})^2},$$

where $\bar{y} = n_T^{-1} \sum_{j \in D_T} y_j$ is the mean outcome in $D_T$. For ease of notation, the function arguments of $\widehat{MSE}$, $\widehat{RMSE}$, and $\hat{R}^2$ are hereafter suppressed.
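For concreteness, these out-of-sample estimators translate directly into a short helper function. The sketch below is illustrative rather than code from the study; note that the $R^2$ defined this way can be negative when predictions are worse than simply predicting the test-set mean.

```python
import numpy as np


def oos_rmse_r2(y_test, y_pred):
    """Out-of-sample RMSE and R^2 of predictions against a held-out test set.

    R^2 is computed relative to the test-set mean, so it can be negative when
    the predictions are worse than simply predicting that mean.
    """
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_test - y_pred) ** 2)
    r2 = 1.0 - mse / np.mean((y_test - y_test.mean()) ** 2)
    return np.sqrt(mse), r2
```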

Two different cross-validation (CV) strategies were employed for model evaluation: 10-fold cross-validation and leave-one-location-out (LOLO) cross-validation. For 10-fold CV, the data were randomly partitioned into 10 non-overlapping subsets, each containing one tenth of the data. Each subset served as the test data for models trained on the other nine tenths of the data, resulting in ten different pairs of training and test sets, with each observation appearing in one test set and the nine training sets not paired with that test set. This yielded 10 estimates of MSE for each model, which were averaged into an overall estimate of MSE, from which the 10-fold CV estimates of RMSE and 𝑅$ were computed. Ten-fold cross-validation is widely used for estimating prediction error; however, it is known to be overly optimistic for dependent data. Data recorded by air pollution monitors are expected to exhibit spatial or space-time dependence. To more accurately estimate the error associated with predicting ozone at an unobserved location, RMSE and 𝑅$ were estimated using LOLO CV, in which a model is trained on data from all but one location, and its prediction error is computed for the observations at the withheld location. This process is repeated with observations at each location serving as the withheld test set once, and the resulting errors are averaged into the LOLO CV estimate of prediction error. Unlike 10-fold CV in which observations are distributed among folds uniformly at random, LOLO CV ensures that no observations from the test location may appear in the training data. This provides a realistic estimate of the prediction error associated with predicting ozone observations at a new location within the same region as the monitoring data.
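Leave-one-location-out CV is a grouped cross-validation in which each monitor defines its own group. The sketch below illustrates both evaluation schemes with scikit-learn, using gradient boosting as an example learner; the estimator settings are illustrative rather than the tuned configurations used in the study, and for simplicity the held-out predictions are pooled before computing error rather than averaged fold by fold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_predict


def cv_error(X, y, site_ids, scheme="lolo"):
    """Estimate prediction error under leave-one-location-out or 10-fold CV.

    site_ids gives the monitor identifier for each observation and defines the
    groups for LOLO CV, so no test location ever appears in the training data.
    """
    model = GradientBoostingRegressor(random_state=0)  # illustrative settings, untuned
    if scheme == "lolo":
        y_hat = cross_val_predict(model, X, y, groups=site_ids, cv=LeaveOneGroupOut())
    else:
        # conventional 10-fold CV: observations assigned to folds at random
        y_hat = cross_val_predict(model, X, y,
                                  cv=KFold(n_splits=10, shuffle=True, random_state=0))
    y = np.asarray(y, dtype=float)
    mse = np.mean((y - y_hat) ** 2)
    return {"rmse": float(np.sqrt(mse)),
            "r2": float(1.0 - mse / np.mean((y - y.mean()) ** 2))}
```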


Most predictive machine learning algorithms depend upon one or more parameters whose values must be set prior to fitting the model. Algorithm performance can vary greatly depending on these parameter values, and it is often desirable to select values that optimize some criterion in an attempt to improve model performance. The process of choosing values for these parameters is often referred to as tuning, and the parameters themselves as tuning parameters. In our analysis, most tuning parameter values were selected by comparing the performance of candidate values on 25 bootstrap samples of the data using the caret R package. Parameters for k-nearest neighbors and GAM were specifically tuned for LOLO CV in an attempt to stabilize the LOLO CV prediction error, as these models made extremely poor LOLO predictions using bootstrap-selected tuning parameter values. To investigate the transferability of a model trained on data in one region to another, the predictive performance of the two best models trained on the 2008 northern California wildfire period (those two with the lowest LOLO CV estimates of RMSE) was evaluated on data collected during a 2007 wildfire event in southern California. The southern California data consisted of 5,978 daily 8-hour maximum ozone values recorded at 72 monitors between September 1, 2007 and November 28, 2007.
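Conceptually, the transferability check amounts to fitting the selected models on all of the 2008 northern California data and scoring the resulting predictions on the 2007 southern California observations. A hedged sketch, with illustrative (untuned) models rather than the configurations used in the paper:

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor


def out_of_domain_scores(X_train, y_train, X_test, y_test):
    """Fit candidate models on one wildfire domain and score them on another.

    Intended use: train on the 2008 northern California observations and
    evaluate on the 2007 southern California observations. Models and their
    settings are illustrative, not the tuned configurations from the paper.
    """
    candidates = {
        "gradient boosting": GradientBoostingRegressor(random_state=0),
        "random forest": RandomForestRegressor(random_state=0),
    }
    y_test = np.asarray(y_test, dtype=float)
    scores = {}
    for name, model in candidates.items():
        fitted = clone(model).fit(X_train, y_train)
        resid = y_test - fitted.predict(X_test)
        mse = np.mean(resid ** 2)
        scores[name] = {"rmse": float(np.sqrt(mse)),
                        "r2": float(1.0 - mse / np.mean((y_test - y_test.mean()) ** 2))}
    return scores
```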

Results
Table 2 reports and Figure 2 graphically depicts the cross-validated estimates of RMSE and R^2 for each algorithm using 10-fold CV and LOLO CV. In every case, the 10-fold CV RMSE was lower than the LOLO CV RMSE, and the 10-fold CV R^2 was higher than the LOLO CV R^2. Gradient boosting had the lowest 10-fold CV RMSE (0.209 log ppm), lowest LOLO CV RMSE (0.236 log ppm), highest 10-fold CV R^2 (0.729), and highest LOLO CV R^2 (0.653). Random forest placed second in all four categories. These results answer the two primary questions posed by this study, demonstrating that machine learning methods can model ozone during a wildfire reasonably well and identifying gradient boosting and random forest as performing particularly well.

Table 2. Comparison of ozone prediction models using 10-fold and leave-one-location-out cross-validation for the 2008 California wildfire period (May 6 to September 26, 2008).

Model | 10-Fold CV RMSE (log ppm) | LOLO CV RMSE (log ppm) | 10-Fold CV R^2 | LOLO CV R^2
Gradient Boosting | 0.209 | 0.236 | 0.729 | 0.653
Random Forest | 0.222 | 0.245 | 0.698 | 0.627
Generalized Additive Model | 0.224 | 0.265 | 0.688 | 0.563
Support Vector Machines | 0.229 | 0.258 | 0.679 | 0.587
Multivariate Adaptive Regression Splines | 0.249 | 0.259 | 0.615 | 0.584
Lasso | 0.263 | 0.271 | 0.569 | 0.542
Linear Model | 0.263 | 0.271 | 0.569 | 0.541
Elastic Net | 0.263 | 0.271 | 0.569 | 0.542
k-Nearest Neighbors | 0.266 | 0.386 | 0.562 | 0.073

The 10-fold CV estimates of RMSE and R^2 were optimistic compared to those of LOLO CV for every model. In 10-fold CV, the observations from each monitor location were distributed uniformly at random throughout the 10 folds. Consequently, the test set observations used by 10-fold CV to evaluate prediction error were recorded at locations that appear in the training data. The strong dependence between observations taken at the same location produces estimates of prediction error that are unrealistically optimistic for estimating prediction error at a new location. In contrast, LOLO CV defines each fold as the observations taken at one monitor location, ensuring that predictions are made using models that have not been trained on data from the test locations.


Figure 2. 10-fold and leave-one-location-out cross-validated estimates of RMSE (panel a) and R^2 (panel b) for ozone prediction models.

Figure 3. The mean leave-one-location-out (LOLO) cross-validated estimates of RMSE for gradient boosting and random forest at each monitor location between May 6, 2008 and September 26, 2008 smoothed throughout the study region using a two-dimensional spline-on-sphere smoother.

Discussion
The 10-fold CV estimators are too optimistic and also unreliable for model selection. The ordering of model performance is not invariant to the choice of evaluation criterion, which is demonstrated here in the comparison of GAM, SVM, and MARS. Though their estimated prediction errors are similar, the relative ordering of the LOLO CV estimates differs from those of 10-fold CV. While the magnitude of the difference is not substantial in this case, this demonstrates that model (or variable) selection using 10-fold CV can give different results than LOLO CV. The difference between the 10-fold and LOLO cross-validated estimates of performance was smaller for relatively inflexible models like lasso, elastic net, and linear regression than for the other models, whose greater flexibility enabled them to better exploit the more highly dependent folds of 10-fold CV. This difference was very large for k-nearest neighbor regression. In 10-fold CV, the nearest neighbors of an observation are very likely to be other observations taken at that location. In LOLO CV, no observations taken at the test location appear in the training data. It is no surprise that this yields substantially higher estimates of prediction error. Every effort was made to stabilize the k-nearest neighbors models in the LOLO setting, including alternate parameter tuning procedures. In this case the 10-fold CV estimate is highly misleading, and the consistently poor LOLO predictive performance illustrates the importance of selecting an appropriate model performance measure.

Figure 4. 10-fold and leave-one-location-out (LOLO) cross-validated gradient boosting and random forest predictions plotted against observed values: (a) gradient boosting, 10-fold CV; (b) gradient boosting, LOLO CV; (c) random forest, 10-fold CV; (d) random forest, LOLO CV.

The magnitude of the difference between the LOLO and 10-fold CV estimates of prediction error has meaningful consequences for estimating exposures for subsequent epidemiological analyses. Predicted exposure is often used as the covariate of interest in analyses seeking to infer the health consequences of air pollution without accounting for prediction uncertainty. The more realistic estimates of prediction error provided by LOLO CV offer better insight into whether it is reasonable to ignore this uncertainty. This may motivate improvements to epidemiological models to account for exposure measurement error. The increased accuracy of LOLO CV comes at a computational cost. When the number of locations exceeds 10, LOLO CV is more computationally expensive than 10-fold CV. In this analysis, there are 100 monitor locations and therefore 100 folds in LOLO CV, corresponding to approximately 10 times the computational burden of 10-fold CV. The additional computational burden can be eliminated, however, if one-tenth of the monitor locations are withheld as each CV fold. Using monitor locations to define the folds for a 10-fold CV analysis estimates approximately the same prediction error as LOLO CV but has the same computational cost as traditional 10-fold CV.
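This location-grouped 10-fold scheme maps directly onto scikit-learn's GroupKFold, with the monitor identifier as the group label and 10 splits, so each fold withholds roughly one tenth of the monitor locations rather than one tenth of the observations. A sketch under the same illustrative settings as above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GroupKFold, cross_val_predict


def grouped_10fold_error(X, y, site_ids):
    """Approximate LOLO CV at roughly the cost of ordinary 10-fold CV.

    GroupKFold keeps every observation from a given monitor in the same fold,
    so each test fold contains only locations unseen during training, but only
    10 models are fit instead of one per monitor location.
    """
    model = GradientBoostingRegressor(random_state=0)  # illustrative settings, untuned
    y_hat = cross_val_predict(model, X, y, groups=site_ids, cv=GroupKFold(n_splits=10))
    y = np.asarray(y, dtype=float)
    mse = np.mean((y - y_hat) ** 2)
    return {"rmse": float(np.sqrt(mse)),
            "r2": float(1.0 - mse / np.mean((y - y.mean()) ** 2))}
```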


The top two models, gradient boosting and random forest, are both ensembles of tree-based models that provide very flexible mean structures. The 10-fold CV estimates of RMSE and R^2 were similar to, although slightly lower than, those reported in a similar analysis of machine learning exposure models for PM2.5 during the same wildfire time period. Tree-based ensembles were also the best performing models in that study, suggesting that algorithms with flexible mean structures can produce useful exposure models for both ozone and PM2.5 during wildfire events. Traditional exposure models have focused on modeling the dependence between observations while employing a simple mean structure. The machine learning models evaluated here assume independent observations but offer much greater flexibility in modeling the mean. This approach is expected to provide more accurate predictions at locations distant from the observations on which the model was trained than methods that rely upon the dependence between observations. Combining the flexible mean structure of tree-based ensembles with the dependence structures of traditional spatial statistics models is a promising avenue for future work. While these algorithms predict ozone concentrations reasonably well, they do not provide reliable inference on the effect of wildfires on ozone concentrations. The WRF-Chem simulations that produced the carbon monoxide and ozone concentration covariates incorporated wildfire emissions. A second set of simulations that excludes the wildfire emissions could be used to disentangle the effect of wildfires from the other inputs of the WRF-Chem model, but such simulations were not available for this analysis.

Figure 5. The RMSE of predictions made for the 2007 Southern California fire by gradient boosting and random forest models trained on the 2008 data and smoothed throughout the study region using a two-dimensional spline-on-sphere smoother.

Neither the gradient boosting nor the random forest model trained on the 2008 northern California wildfire data performed well when predicting ozone exposure in southern California in 2007. The predictions from both models had negative R^2, indicating that their predictions were less accurate (i.e., had higher estimated MSE) than the sample mean of the southern California ozone monitors, which by definition has an R^2 of 0. In fairness to gradient boosting and random forest, in an out-of-domain prediction problem the sample mean is unknown and therefore cannot be used as a prediction rule. When in-fill prediction error for the 2007 southern California wildfire data was estimated using LOLO CV, gradient boosting and random forest predictive accuracy was comparable to that for the 2008 northern California wildfire data. The substantial decrease from within-domain to out-of-domain predictive performance suggests that the relationships between covariates and ozone exposure differ in space and time. Gradient boosting and random forest are flexible enough to learn complicated relationships in the data upon which they are trained, but they are unable to make accurate predictions at locations or times where those relationships are different. While these flexible machine learning algorithms are recommended for modeling ozone exposure within the spatial and temporal domains of the data upon which they were trained, they may not be suitable for extrapolating to other locations or times. Machine learning prediction algorithms, especially ensembles of tree models like gradient boosting and random forest, perform well at the in-fill prediction of ozone concentrations during wildfire events and likely outside of wildfire events as well. The choice of evaluation metric is critical to understanding predictive performance in this context. Metrics that ignore the dependence structure of the data, including k-fold CV, are overly optimistic and unreliable for model selection. LOLO CV is a superior alternative that accounts for the spatial dependence of the data and yields more reliable estimates of predictive performance.