USGS WRD National Internal QW Database Pilot Project Nate Booth Sandy Williamson To WRD 10-27-03

Page 1: USGS WRD National Internal QW Database Pilot Project Nate Booth Sandy Williamson To WRD 10-27-03

Page 2: Goals of the Pilot

• Network and upload performance: Feasibility of aggregating from 45 NWIS databases in a weekend
• Content scalability: Compare performance for ad hoc queries [65M] with the NAWQA Data Warehouse [8M]
• Access: Test direct database access (ODBC) vs. NWIS export programs
• Conform systems: Test a precedence scheme to collapse duplicate sites and samples across multiple NWIS systems
• Incremental refresh: Accommodate add/change/delete transactions from NWIS without “drop and rebuild”
• COTS: Evaluation of “commercial off the shelf” data integration software for all of the above

Page 3: Why Did NAWQA Build It?

• Need to aggregate NAWQA data more often
• NWIS changes required revisions in the NAWQA aggregation scheme and caused export programs to slow
• NAWQA interested in QW data beyond NAWQA
• NAWQA hydrologists commonly asked “can we use Discoverer to access all QWDATA?”

Page 4: Other Reasons

• Three current national aggregation systems maintained (NAWQA, NWIS, NASQAN); the proposed system could be used by ALL
• No system available for nationwide “internal” QW
• Meets many of the QW user group’s Critical priorities
  • Weekly refresh currently possible
  • Contains all NWIS 4.1 attributes
  • Contains a merged complete set of data by site
  • Almost all reference lists integrated
• Ad hoc internal access allows:
  • Querying by any attribute and flexible output formats
• Leveraging the prepaid USGS Oracle license

Page 5: Pilot Contains:

• All site and QW data from 45 NWIS systems including 77 databases
• 1.3M sites, 3.8M samples and 61M results
• Data load pipeline contains both a lifecycle and NWIS system ownership audit trail for sites, samples and results, between data aggregations

Page 6: Two Data Aggregation Schemes

• One for new and updated sites, samples and results: can be completed for the 46 servers in the test case in approximately 48 hours, all on a single-processor PC server [and must do complete data transfer]
• The other is a complete refresh, needed for deleting purged sites, samples and results until NWIS transaction logging is working; requires approximately 3 days.
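
The difference between the two schemes can be illustrated with a small Python sketch. It is not the pilot's Informatica workflow; the key layout `(system, record_id)` and the function name `diff_snapshots` are hypothetical. It shows why, without a transaction log, a purged record only becomes visible when the complete set of source keys is compared against what was loaded before.

```python
from typing import Dict, Set, Tuple

# Hypothetical key: (NWIS system name, record identifier).
Key = Tuple[str, str]
Row = Dict[str, str]


def diff_snapshots(
    old: Dict[Key, Row], new: Dict[Key, Row]
) -> Tuple[Set[Key], Set[Key], Set[Key]]:
    """Classify keys as added, changed, or deleted between two full snapshots."""
    added = {k for k in new if k not in old}
    # A delete leaves no trace in the new snapshot, so it can only be found
    # by checking every previously loaded key against the full source.
    deleted = {k for k in old if k not in new}
    changed = {k for k in new if k in old and new[k] != old[k]}
    return added, changed, deleted
```

With transaction logging in the source system, the `deleted` set could be read from the log instead of being derived from a complete comparison.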

Page 7: Dealing With Duplicated Samples

• Distributed NWIS architecture allows duplicate sites, samples and results with differing attribute values
• To allow the system to work beyond NAWQA, needed a flexible and effective way to collapse duplicates through multiple precedence schemes
• So far, a NAWQA scheme and a generic USGS-wide scheme have been applied
• Yields multiple data marts with one transfer of data
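
A minimal Python sketch of the idea, assuming a precedence scheme can be expressed as an ordered list of source hosts (the slides do not say how the pilot's schemes are actually defined). The field names `site_no` and `host` and the function `collapse` are illustrative only. The same staged records are collapsed twice, once per scheme, which is how one transfer of data can feed more than one data mart.

```python
from typing import Dict, List

Record = Dict[str, str]


def collapse(records: List[Record], precedence: List[str]) -> Dict[str, Record]:
    """Keep one record per site: the copy from the highest-precedence host."""
    rank = {host: i for i, host in enumerate(precedence)}
    best: Dict[str, Record] = {}
    best_rank: Dict[str, int] = {}
    for rec in records:
        key = rec["site_no"]
        # Hosts missing from the scheme sort after every listed host.
        r = rank.get(rec["host"], len(rank))
        if key not in best or r < best_rank[key]:
            best[key] = rec
            best_rank[key] = r
    return best
```

Usage: calling `collapse(staged, scheme_a)` and `collapse(staged, scheme_b)` on the same `staged` list gives two independently conformed result sets without re-reading the source systems.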

Page 8: System Specs

• Oracle 9i database on Linux, 1-CPU PC server
• Informatica PowerMart data integration/aggregation software on Windows 2000
• 3X Oracle Discoverer ad hoc query servers on Windows 2000 (load balanced using a mirrored set of Cisco Local Directors)
• Systems are firewalled as per DOI/USGS headquarters guidance and database access is password protected
• All systems housed and maintained by the Database Team in the Wisconsin District

Page 9: Example Queries Demonstrate:

• Nested analytical functions
• Atomic level data retrieval
• Querying the audit trail
• Animal tissue data
• Sample accounting with drilldown
• Nested spatial functionality

Page 10: Ex. 1: Simple statistics

Find the min and max concentrations for surface water samples. Also, count the number of detections and non-detections for the group of samples analyzed.

Discoverer Wizard for field selection

<- Note metadata

Page 11: Wizard 2: Custom or predefined selection conditions can be applied.

<- Note metadata maintained in the DB

Page 12: The database collects statistics and “learns” performance estimates.

Page 13: Results for Caffeine in surface water, with statistics by State, ordered by percent detection. Query done in 2 min.
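
The kind of query behind Ex. 1 can be sketched with Python's built-in `sqlite3` module. This is not the pilot's Oracle schema or the SQL that Discoverer generated: the table `result`, its columns, and the convention that a `'<'` remark marks a non-detection are all assumptions made for the example.

```python
import sqlite3


def stats_by_state(rows, parameter="Caffeine", medium="surface water"):
    """Min/max of detected values plus detection counts, by state.

    rows: iterable of (state, medium, parameter, remark, value).
    A remark of '<' is assumed to mean a non-detection.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE result "
        "(state TEXT, medium TEXT, parameter TEXT, remark TEXT, value REAL)"
    )
    con.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?)", rows)
    sql = """
        SELECT state,
               MIN(CASE WHEN remark = '<' THEN NULL ELSE value END) AS min_conc,
               MAX(CASE WHEN remark = '<' THEN NULL ELSE value END) AS max_conc,
               SUM(CASE WHEN remark = '<' THEN 0 ELSE 1 END) AS detections,
               SUM(CASE WHEN remark = '<' THEN 1 ELSE 0 END) AS non_detections,
               100.0 * SUM(CASE WHEN remark = '<' THEN 0 ELSE 1 END) / COUNT(*)
                   AS pct_detect
        FROM result
        WHERE parameter = ? AND medium = ?
        GROUP BY state
        ORDER BY pct_detect DESC
    """
    out = con.execute(sql, (parameter, medium)).fetchall()
    con.close()
    return out
```

Each returned tuple is `(state, min_conc, max_conc, detections, non_detections, pct_detect)`, with states ordered by percent detection as on the slide.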

Page 14: By pivoting the elements in the layout wizard, an end user can easily display the results in a new format.

Page 15: This shows percent detections for Estradiol, Caffeine, and Cholesterol in surface water, ordered by state.
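
What the layout pivot does to the data can be shown in a few lines of Python. This is only an illustration of the reshaping, not of how Discoverer implements it; the input layout `(state, constituent, percent_detect)` and the function name `pivot_by_state` are assumptions.

```python
from typing import Dict, Iterable, Tuple


def pivot_by_state(
    rows: Iterable[Tuple[str, str, float]]
) -> Dict[str, Dict[str, float]]:
    """Turn (state, constituent, pct) rows into one row per state,
    with one column per constituent, ordered by state."""
    table: Dict[str, Dict[str, float]] = {}
    for state, constituent, pct in rows:
        table.setdefault(state, {})[constituent] = pct
    return dict(sorted(table.items()))
```

The underlying query result is unchanged; only the arrangement of rows and columns differs.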

Page 16: Example 2: Detailed data retrieval. Atrazine in surface water

Wildcard condition chooses station names like “DUCK CREEK” in WI

Page 17: This query completed in 3 seconds.
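
A wildcard condition of this kind corresponds to a SQL `LIKE` predicate. The sketch below uses Python's built-in `sqlite3` with a hypothetical `site` table; the column names and the station names in any test data are made up for illustration and are not taken from NWIS.

```python
import sqlite3


def stations_like(sites, pattern="%DUCK CREEK%", state="WI"):
    """Return station names matching a LIKE pattern within one state.

    sites: iterable of (station_nm, state).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE site (station_nm TEXT, state TEXT)")
    con.executemany("INSERT INTO site VALUES (?, ?)", sites)
    # '%' matches any run of characters on either side of the fixed text.
    sql = (
        "SELECT station_nm FROM site "
        "WHERE station_nm LIKE ? AND state = ? "
        "ORDER BY station_nm"
    )
    out = [name for (name,) in con.execute(sql, (pattern, state))]
    con.close()
    return out
```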

Page 18: Example 3: Audit Trail

For the pilot project, there is not a GUI for querying the audit trail. Using standard SQL, what hosts have sample keys duplicated across DBNUMs in the pilot?

There are 18 hosts with samples duplicated across DBNUMs. While the staging database is not indexed for ad hoc query, this query still completed in about 1 minute.
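
The slide does not show the SQL itself. One way such a question could be phrased is sketched below with Python's built-in `sqlite3`; the staging table `staged_sample` and its columns `host`, `dbnum`, and `sample_key` are hypothetical stand-ins for the pilot's audit-trail tables.

```python
import sqlite3


def hosts_with_duplicated_samples(rows):
    """Hosts where the same sample key appears under more than one DBNUM.

    rows: iterable of (host, dbnum, sample_key).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE staged_sample (host TEXT, dbnum TEXT, sample_key TEXT)"
    )
    con.executemany("INSERT INTO staged_sample VALUES (?, ?, ?)", rows)
    sql = """
        SELECT DISTINCT host
        FROM (
            SELECT host, sample_key
            FROM staged_sample
            GROUP BY host, sample_key
            HAVING COUNT(DISTINCT dbnum) > 1
        ) AS dup
        ORDER BY host
    """
    out = [host for (host,) in con.execute(sql)]
    con.close()
    return out
```

The inner query finds each (host, sample key) pair that occurs in more than one database; the outer query reduces that to the list of affected hosts.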

Page 19: Ex. 4: Animal Tissue Data

For Colorado, min, max, average and counts of tissue concentrations in trout for all constituents analyzed. (20 sec.)

Page 20: Ex. 5: Sample Accounting

Number of Orthophosphate and BOD samples by WY in WI since 1980. Drill into 1997 to counts by Station. (20 sec.)

Page 21: Drill into 1997 to counts by Station
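
Sample accounting by water year (WY) depends on mapping each sample date to its water year: the USGS water year runs from October 1 through September 30 and is named for the calendar year in which it ends. The Python sketch below applies that rule and then mimics the drilldown; the input layout `(station, sample_date)` and the function names are assumptions, not part of the pilot.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

Sample = Tuple[str, date]  # (station identifier, sample date)


def water_year(d: date) -> int:
    """October through December belong to the following water year."""
    return d.year + 1 if d.month >= 10 else d.year


def counts_by_water_year(samples: Iterable[Sample]) -> Counter:
    """Top-level accounting: number of samples per water year."""
    return Counter(water_year(d) for _, d in samples)


def drill_into_year(samples: Iterable[Sample], wy: int) -> Counter:
    """Drilldown: number of samples per station within one water year."""
    return Counter(st for st, d in samples if water_year(d) == wy)
```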

Page 22: Ex. 6: Spatial Analysis

Spatial analysis for planning a group of wells for a sampling effort. Considering Alachlor samples in water year 2002, find what well has a sample with an Alachlor concentration closest to the 99th percentile of all groundwater environmental Alachlor concentrations in WY 2002. Then find the 9 nearest wells to broaden a future sampling effort.
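
The two nested steps can be sketched in plain Python. This is not the pilot's query: the percentile here uses linear interpolation between ranked values, and distance is great-circle (haversine) distance on a spherical Earth, both chosen for the example. The input layouts and the names `percentile`, `haversine_km`, and `plan_wells` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def percentile(values: List[float], p: float) -> float:
    """p-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def plan_wells(
    samples: List[Tuple[str, float]],
    wells: Dict[str, Tuple[float, float]],
    p: float = 99.0,
    k: int = 9,
) -> Tuple[str, List[str]]:
    """Pick the well whose sample is closest to the p-th percentile,
    then return it with its k nearest neighboring wells.

    samples: (well_id, concentration); wells: well_id -> (lat, lon).
    """
    target = percentile([c for _, c in samples], p)
    anchor = min(samples, key=lambda s: abs(s[1] - target))[0]
    lat0, lon0 = wells[anchor]
    others = sorted(
        (w for w in wells if w != anchor),
        key=lambda w: haversine_km(lat0, lon0, wells[w][0], wells[w][1]),
    )
    return anchor, others[:k]
```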

Page 23: Mercury in Bed Sediment (34910)

Page 24: Nitrate in GW (00631)

Page 25: Zinc in GW (01090)

Page 26: Zinc in SW (01090)

Page 27: Atrazine in SW (39632)

Page 28: Uranium in GW (22703)

Page 29: Sulfate in SW (00945), 47K sites

Page 30: Arsenic in GW (0100), MCL Scenario

Page 31: Manganese in GW (01056), 80K

Page 32: Future Possibilities?

• National internal DWH AND data source for at least weekly updates to NWIS QW?
• Additional functionality possible with existing software, expertise, licensing:
  • Daily updates from NWIS site file to ARC-ready points
  • Web mapping & spatial search: location checks and neighbor searches
  • Store multimedia attributes like site photos
  • XML output
  • Web based ad hoc query software: easy extensions to Discoverer
  • Web based graphics / data visualization
  • Redox potentials characterization for all QW data that qualifies
• User testing/requirement gathering