U.S. Department of the InteriorU.S. Geological Survey
Data Series 697
Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate and Arsenic Concentrations in Basin-Fill Aquifers in the Southwest Principal Aquifers Study Area
National Water-Quality Assessment Program
Observed arsenic concentration, micrograms per literLess than 1.01.0 to 1.92.0 to 2.93.0 to 4.95.0 to 9.910 to 24Equal to or greater than 25
EXPLANATION
Arsenic Training Observations
TrinityTrinity
RiverRiver
Eel RiverEel River
Russian River
Russian River
RiverRiver
Sacramento
Sacramento
Feath
er R
iver
Feath
er R
iver
NorthNorthYubaYuba River
River
American RiverAmerican River
TruckeeTruckee RiverRiver
Mokelumne River
Mokelumne RiverRiverRiver
Sacramento
Sacramento
RiverRiverAmericanAmerican
North
For
k
North
For
k
RiverRiverTuolumneTuolumne
Merced River
Merced River
RiverRiver
San JoaquinSan Joaquin
RiverRiver
San Joaquin
San Joaquin
River
River
KingsKings
Salinas River
Salinas River
Kern River
Kern River
Quinn R
iver
Quinn R
iver
RiverRiver
Humboldt
Humboldt
Rees
e Rive
r
Rees
e Rive
r
Sevier R
iver
Sevier R
iver
River
River
Bear
Bear
Jordan RiverJordan River
Mea
dow
Valle
y Was
h
Mea
dow
Valle
y Was
hW
hite RiverW
hite River
Pahranagat Wash
Pahranagat Wash
Muddy
RiverM
uddy
River
Amaragosa
Amaragosa
RiverRiver
Verde River
Verde River
Salt RiverSalt River
Colorado River
Colorado River
Gila RiverGila River
Mojave River
Mojave River
Rio Grande
Rio Grande
RioChama
RioChama
Rio PuercoRio Puerco
Owens River
Owens River
GrandeGrande
RioRio
ARIZONA NEW MEXICO
UTAH COLORADO
MEXICO
OREGON
CALIFORNIA
IDAHO
WYOMINGNEVADA
0 45 90 135 180 MILES
0 50 100 150 200 KILOMETERS
35°
40°
124°
117°
110°
Study-areaboundary
RenoReno
SacramentoSacramento
SanFrancisco
SanFrancisco
StocktonStockton
ModestoModesto
San JoseSan Jose
FresnoFresno
BakersfieldBakersfield
Las VegasLas Vegas
Los AngelesLos Angeles
San DiegoSan DiegoYumaYuma
PhoenixPhoenix
TucsonTucson
El PasoEl Paso
FlagstaffFlagstaff
Santa FeSanta Fe
AlbuquerqueAlbuquerque
Salt Lake CitySalt Lake City
U.S. Geological Survey digital data, 1:2,000,000, 2,500,000, and 5,000,000 scale, 2003, 2005, and 2006National Elevation Data 1:24,000, 1999Albers Equal Area Conic Projection, central meridian -113, NAD 83
Pacific Ocean
LasCrucesLasCruces
Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate and Arsenic Concentrations in Basin-Fill Aquifers in the Southwest Principal Aquifers Study Area
By Tim S. McKinney and David W. Anning
National Water-Quality Assessment Program
Data Series 697
U.S. Department of the InteriorU.S. Geological Survey
U.S. Department of the InteriorKEN SALAZAR, Secretary
U.S. Geological SurveyMarcia K. McNutt, Director
U.S. Geological Survey, Reston, Virginia: 2012
For more information on the USGS—the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment, visit http://www.usgs.gov or call 1–888–ASK–USGS.
For an overview of USGS information products, including maps, imagery, and publications, visit http://www.usgs.gov/pubprod
To order this and other USGS information products, visit http://store.usgs.gov
Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Although this report is in the public domain, permission must be secured from the individual copyright owners to reproduce any copyrighted materials contained within this report.
Suggested citation:McKinney, T.S. and Anning, D.W., 2012, Digital spatial data for observed, predicted, and misclassification errors for observations in the training dataset for nitrate and arsenic concentrations in basin-fill aquifers in the Southwest Principal Aquifers study area: U.S. Geological Survey Data Series Report 697, 2 p. Available at http://pubs.usgs.gov/ds/697.
iii
ContentsAbstract ......................................................................................................................................................... 1Background ................................................................................................................................................... 1Development of Classifiers ......................................................................................................................... 1Supplemental Information .......................................................................................................................... 1References Cited .......................................................................................................................................... 2
iv
Digital Spatial Data for Observed, Predicted, and Misclassification Errors for Observations in the Training Dataset for Nitrate and Arsenic Concentrations in Basin-Fill Aquifers in the Southwest Principal Aquifers Study Area
AbstractThis product “Digital spatial data for observed, predicted,
and misclassification errors for observations in the training dataset for nitrate and arsenic concentrations in basin-fill aquifers in the Southwest Principal Aquifers study area” is a 1:250,000-scale point spatial dataset developed as part of a regional Southwest Principal Aquifers (SWPA) study (Anning and others, 2012). The study examined the vulnerability of basin-fill aquifers in the southwestern United States to nitrate contamination and arsenic enrichment. Statistical models were developed by using the random forest classifier algorithm to predict concentrations of nitrate and arsenic across a model grid that represents local- and basin-scale measures of source, aquifer susceptibility, and geochemical conditions.
BackgroundThis dataset was developed as part of a regional Southwest
Principal Aquifers (SWPA) study and for the National Water-Quality Assessment (NAWQA) Program. The dataset repre-sents observed nitrate and arsenic concentrations that were used to train the confirmatory and prediction classifiers.
Development of ClassifiersSeparate classifiers were developed for nitrate and arsenic
because each constituent was expected to be affected by a different set of factors, and each factor could have a different magnitude or directional influence (increase/decrease) on con-centration. For each constituent, two different classifiers were developed: a prediction classifier and a confirmatory classi-fier. The prediction classifiers were developed specifically to predict nitrate and arsenic concentrations in basin-fill aquifers across the SWPA study area and were based on explanatory variables representing source and susceptibility conditions. These explanatory variables were available throughout the entire SWPA study area and, therefore, did not pose a limita-tion for using the classifiers to predict concentrations.
The confirmatory classifiers were developed to supplement the prediction classifiers in the evaluation of the conceptual model. The name, “confirmatory,” reflects the classifier’s pur-pose for evaluation of a-priori hypotheses and contrasts other general types of statistical models, such as those used for pre-diction or exploratory purposes. The confirmatory classifiers included the explanatory variables used in the prediction clas-sifiers, as well as additional variables representing geochemi-cal conditions and basin groundwater budget components. The inclusion of the geochemical and basin groundwater budget variables in the confirmatory classifiers allowed for further evaluation of the conceptual models, which was not possible with the prediction classifiers alone. The geochemical data, however, were only available at specific well locations, and consistent water-budget data were not available for every basin in the study area. The limited availability of the data for these variables constrained the confirmatory classifiers to obser-vations from 16 case-study basins and precluded use of the confirmatory classifier for predicting concentrations across the SWPA study area. To contrast the scope of the two classifiers, the confirmatory classifiers were developed by using all avail-able explanatory variables but with observations restricted to the 16 case-study basins, whereas the prediction classifiers were unrestricted with respect to spatial extent because these were developed by using a subset of the explanatory variables that were available throughout the study area.
Supplemental InformationThe nitrate and arsenic observations, predictions, and
misclassification error data for observations are part of a larger dataset of explanatory variables and model input data. Those data are archived in tabular form along with the model report. The explanatory data may be joined to this dataset via the site identifier. For more information on the development of the explanatory variables, see “Compilation and Processing of Explanatory Variables” in the “Approach and Methods” sec-tion of Anning and others (2012).
The digital dataset can be downloaded from the USGS at http://water.usgs.gov/lookup/getspatial?ds2012-697_SWPA_NO3_As_training
2 Digital Spatial Data for Observed, Predicted, and Misclassification Errors
References Cited
Anning, D.W., Paul, A.P., McKinney, T.S., Huntington, J.M., Bexfield, L.M., and Thiros, S.A., 2012, Predicted nitrate and arsenic concentrations in basin-fill aquifers of the south-western United States: U.S. Geological Survey Scientific Investigations Report 2012–5065, 78 p. Available at http://pubs.usgs.gov/sir/2012/5065.
McKinney and Anning—
Digital Spatial D
ata for Observed, Predicted, and M
isclassification Errors for Observations in the Training D
ataset for Nitrate and A
rsenic Concentrations in B
asin-Fill Aquifers in the Southw
est Principal Aquifers Study A
rea—Data Series 697