USGS WRD National Internal QW Database Pilot Project
Nate Booth, Sandy Williamson
To WRD, 10-27-03

Goals of the Pilot
• Network and upload performance: Feasibility of aggregating from 45 NWIS databases in a weekend
• Content scalability: Compare performance for ad hoc queries [65M] with the NAWQA Data Warehouse [8M]
• Access: Test direct database access (ODBC) vs. NWIS export programs
• Conform systems: Test a precedence scheme to collapse duplicate sites and samples across multiple NWIS systems
• Incremental Refresh: Accommodate add/change/delete transactions from NWIS without “drop and rebuild”
• COTS: Evaluation of “commercial off the shelf” data integration software for all of the above

Why Did NAWQA Build It?
• Need to aggregate NAWQA data more often
• NWIS changes required revisions in NAWQA Agg. Scheme and caused export programs to slow
• NAWQA interested in QW data beyond NAWQA
• NAWQA Hydrologists commonly asked “can we use Discoverer to access all QWDATA?”

Other Reasons
• Three National aggregation systems are currently maintained (NAWQA, NWIS, NASQAN) -- the proposed system could be used by ALL
• No system available for nationwide “internal” QW
• Meets many of the QW user group’s Critical priorities
  • Weekly refresh currently possible
  • Contains all NWIS 4.1 attributes
  • Contains a merged complete set of data by site
  • Almost all reference lists integrated
• Ad hoc internal access allows:
  • querying by any attribute and flexible output formats
• Leveraging the prepaid USGS Oracle license

Pilot Contains:
• All site and QW data from 45 NWIS systems including 77 databases
• 1.3M sites, 3.8M samples and 61M results
• Data load pipeline contains both a lifecycle and NWIS system ownership audit trail for sites, samples and results, between data aggregations

Two Data Aggregation Schemes
• One for new and updated sites, samples and results -- can be completed for the 46 servers in the test case in approximately 48 hours, all on a single-processor PC server [and must do complete data transfer]
• The other is a complete refresh needed for deleting purged sites, samples and results until NWIS transaction logging is working -- requires approximately 3 days
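
One way to read the difference between the two schemes is sketched below. It is only an illustration under stated assumptions: the record key `(host, dbnum, record_id)`, the dictionaries, and both function names are hypothetical and do not come from NWIS or the pilot. The point it shows is that an add/update pass leaves purged records in place, while a rebuild from a full extract drops them.

```python
# Hypothetical sketch: records are keyed by (host, dbnum, record_id).
# None of these names come from NWIS or from the pilot itself.

def incremental_refresh(warehouse, extract):
    """Apply new and updated records; records purged at the source linger."""
    added = changed = 0
    for key, row in extract.items():
        if key not in warehouse:
            added += 1
        elif warehouse[key] != row:
            changed += 1
        else:
            continue  # unchanged record, nothing to write
        warehouse[key] = row
    return added, changed


def complete_refresh(warehouse, extract):
    """Rebuild from a full extract; anything missing from it counts as purged."""
    purged = sorted(set(warehouse) - set(extract))
    warehouse.clear()
    warehouse.update(extract)
    return purged


warehouse = {
    ("hostA", 1, "s1"): {"value": 0.5},
    ("hostA", 1, "s2"): {"value": 1.2},
}
extract = {
    ("hostA", 1, "s1"): {"value": 0.7},  # changed at the source
    ("hostA", 1, "s3"): {"value": 2.0},  # new at the source
    # s2 was purged at the source, so it is simply absent
}

counts = incremental_refresh(warehouse, extract)
stale_keys = sorted(set(warehouse) - set(extract))  # still present after the upsert
purged = complete_refresh(warehouse, extract)
print(counts, stale_keys, purged)
```

In this toy case the purged record survives the incremental pass and disappears only after the complete refresh, which matches the slide's note that a full rebuild is needed until transaction logging is available.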

Dealing With Duplicated Samples
• Distributed NWIS architecture allows duplicate sites, samples and results with differing attribute values
• To allow system to work beyond NAWQA, needed a flexible and effective way to collapse duplicates through multiple precedence schemes
• So far, a NAWQA scheme and a generic USGS wide scheme have been applied
• Yields multiple data marts with one transfer of data
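
The idea of collapsing duplicates through interchangeable precedence schemes can be sketched as follows. This is a minimal illustration, not the pilot's actual rules: the host names, the ranking of hosts, the `sample_key` field, and the function `collapse_duplicates` are all assumptions made for the example. It shows how one staged set of records can yield a different data mart per scheme.

```python
from collections import defaultdict

# Hypothetical precedence schemes: lower rank wins. The host names and the
# ranking rule are illustrative assumptions, not the pilot's actual rules.
NAWQA_SCHEME = {"nawqa-host": 0, "district-a": 1, "district-b": 2}
USGS_WIDE_SCHEME = {"district-a": 0, "district-b": 1, "nawqa-host": 2}


def collapse_duplicates(records, scheme):
    """Keep one record per sample key, chosen by the scheme's host ranking."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["sample_key"]].append(rec)
    unranked = len(scheme)  # hosts missing from the scheme sort last
    return {
        key: min(dups, key=lambda r: scheme.get(r["host"], unranked))
        for key, dups in groups.items()
    }


# One staged transfer of data, containing a duplicated sample key.
staged = [
    {"sample_key": "S-001", "host": "district-a", "medium": "SW"},
    {"sample_key": "S-001", "host": "nawqa-host", "medium": "GW"},
    {"sample_key": "S-002", "host": "district-b", "medium": "SW"},
]

# Two data marts built from the same staged records.
nawqa_mart = collapse_duplicates(staged, NAWQA_SCHEME)
usgs_mart = collapse_duplicates(staged, USGS_WIDE_SCHEME)
print(nawqa_mart["S-001"]["host"], usgs_mart["S-001"]["host"])
```

Keeping the scheme as data rather than code means a new precedence rule needs only a new ranking table, not a new load.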

System Specs
• Oracle 9i database on Linux – 1-cpu PC server
• Informatica PowerMart data integration/aggregation software on Windows 2000
• 3X Oracle Discoverer ad hoc query servers on Windows 2000 (load balanced using a mirrored set of Cisco Local Directors)
• Systems are firewalled as per DOI/USGS headquarters guidance and database access is password protected
• All systems housed and maintained by the Database Team in the Wisconsin District

Example Queries Demonstrate:
• Nested analytical functions
• Atomic level data retrieval
• Querying the audit trail
• Animal tissue data
• Sample accounting with drilldown
• Nested spatial functionality

Ex. 1: Simple statistics. Find the min and max concentrations for surface water samples. Also, count the number of detections and non-detections for the group of samples analyzed.
Discoverer Wizard for field selection
<- Note meta data
Wizard 2: Custom or predefined selection conditions can be applied.
<- Note meta data maintained in the DB
The database collects statistics and “learns” performance estimates.
Results for Caffeine in surface water with statistics by State, ordered by percent detection. Query done in 2 min.
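
The summary described in Ex. 1 (min, max, detections, non-detections, ordered by percent detection) can be sketched outside Discoverer as below. The rows are made-up values, and coding a non-detection with a "<" remark is an assumption for this example; so is including censored values in the min and max.

```python
from collections import defaultdict

# Hypothetical result rows: (state, concentration, remark). A remark of "<"
# marks a non-detection here; that coding is an assumption for this sketch.
rows = [
    ("WI", 0.12, ""),
    ("WI", 0.05, "<"),
    ("WI", 0.30, ""),
    ("CO", 0.05, "<"),
    ("CO", 0.08, ""),
]


def summarize_by_state(rows):
    """Min, max, detection counts and percent detection per state."""
    groups = defaultdict(list)
    for state, conc, remark in rows:
        groups[state].append((conc, remark))

    summary = []
    for state, values in groups.items():
        concs = [c for c, _ in values]
        detects = sum(1 for _, r in values if r != "<")
        summary.append({
            "state": state,
            "min": min(concs),
            "max": max(concs),
            "detections": detects,
            "non_detections": len(values) - detects,
            "pct_detection": 100.0 * detects / len(values),
        })
    # Order by percent detection, highest first.
    summary.sort(key=lambda s: s["pct_detection"], reverse=True)
    return summary


summary = summarize_by_state(rows)
for line in summary:
    print(line)
```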
By pivoting the elements in the layout wizard, an end user can easily display the results in a new format.
This shows percent detections for Estradiol, Caffeine, and Cholesterol in surface water, ordered by state.

Example 2: Detailed data retrieval. Atrazine in surface water
Wildcard condition chooses station names like “DUCK CREEK” in WI
This query completed in 3 seconds.

Example 3: Audit Trail. For the pilot project, there is not a GUI for querying the audit trail. Using standard SQL, what hosts have sample keys duplicated across DBNUMs in the pilot?
There are 18 hosts with samples duplicated across DBNUMs. While the staging database is not indexed for ad hoc query, this query still completed in about 1 minute.
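
The slide does not show the SQL itself, so the following is only a sketch of what such an audit query might look like. The table `sample_stage` and its columns `host`, `dbnum`, and `sample_key` are hypothetical, and SQLite stands in for the Oracle staging database so the example runs on its own. It reads "duplicated across DBNUMs" as the same sample key appearing under more than one DBNUM on a host.

```python
import sqlite3

# In-memory stand-in for the staging database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_stage (host TEXT, dbnum INTEGER, sample_key TEXT)"
)
conn.executemany(
    "INSERT INTO sample_stage VALUES (?, ?, ?)",
    [
        ("hostA", 1, "K1"),
        ("hostA", 2, "K1"),  # same key under a second DBNUM on hostA
        ("hostA", 1, "K2"),
        ("hostB", 1, "K3"),
        ("hostB", 1, "K4"),
    ],
)

# Hosts where at least one sample key appears in more than one DBNUM.
query = """
SELECT DISTINCT host
FROM (
    SELECT host, sample_key
    FROM sample_stage
    GROUP BY host, sample_key
    HAVING COUNT(DISTINCT dbnum) > 1
)
ORDER BY host
"""
hosts = [row[0] for row in conn.execute(query)]
print(hosts)
```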

Ex. 4: Animal Tissue Data. For Colorado, min, max, average and counts of tissue concentrations in trout for all constituents analyzed. -- 20 sec.

Ex. 5: Sample Accounting. Number of Orthophosphate and BOD samples by WY in WI since 1980. Drill into 1997 to counts by Station. -- 20 sec.

Ex. 6: Spatial Analysis for planning a group of wells for a sampling effort. Considering Alachlor samples in water year 2002, find what well has a sample with an Alachlor concentration closest to the 99th percentile of all groundwater environmental Alachlor concentrations in WY 2002. Then find the 9 nearest wells to broaden a future sampling effort.
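
The two nested steps in Ex. 6 (percentile, then nearest neighbors) can be sketched in plain Python as below. The pilot ran this inside the database; here the well identifiers, coordinates, and concentrations are invented, the percentile uses linear interpolation as an assumption, and great-circle distance stands in for whatever spatial function the database used.

```python
import math


def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def plan_wells(samples, wells, pct=99, k=9):
    """Well whose sample is closest to the percentile, plus its k nearest wells."""
    target = percentile([s["conc"] for s in samples], pct)
    anchor = min(samples, key=lambda s: abs(s["conc"] - target))["well"]
    lat0, lon0 = wells[anchor]
    others = [w for w in wells if w != anchor]
    others.sort(key=lambda w: haversine_km(lat0, lon0, *wells[w]))
    return anchor, others[:k]


# Hypothetical wells spaced along a line of latitude, one sample per well.
wells = {f"W{i}": (43.0, -89.0 - 0.01 * i) for i in range(12)}
samples = [{"well": f"W{i}", "conc": 0.01 * (i + 1)} for i in range(12)]

anchor, neighbors = plan_wells(samples, wells)
print(anchor, neighbors)
```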

Mercury In Bed Sediment (34910)
Nitrate in GW (00631)
Zinc in GW (01090)
Zinc in SW (01090)
Atrazine in SW (39632)
Uranium in GW (22703)
Sulfate in SW (00945) – 47K sites
Arsenic in GW (0100) – MCL Scenario
Manganese in GW (01056) – 80K

Future Possibilities?
• National internal DWH AND data source for at least weekly updates to NWIS QW?
• Additional functionality possible with existing software, expertise, licensing:
  • Daily updates from NWIS site file to ARC-ready points
  • Web mapping & spatial search – location checks and neighbor searches
  • Store multimedia attributes like site photos
  • XML output
  • Web based ad hoc query software -- easy extensions to Discoverer
  • Web based graphics / data visualization
  • Redox potentials characterization for all QW data that qualifies
• User testing/requirement gathering