EPA STORET

The EPA STORET datacube contains 273K sites and 2.7M series. Florida is the source of about 25% of the total records. 93% of the series are water quality data.

About 60% of the water quality records are short-term measurements (one year or less in duration). The starting decades of the longer series are shown to the right.

Total Available Data

Water Quality Data Record Counts by Site Classification

Number of years of record by start decade

CUAHSI HIS architecture diagram, components shown: Testbed HIS Servers; Central HIS servers; HIS Lite Servers; external data providers; deployment to test beds; desktop clients (ArcGIS, Matlab, IDL, MapWindow, Excel, programming in Fortran, C, VB); customizable web interface (DASH); global search (Hydroseek); other popular online clients; modeling (OpenMI); HTML-XML and WSDL-SOAP interfaces; WaterOneFlow Web Services and WaterML; data publishing (ODM DataLoader, Streaming Data Loading, ontology tagging with Hydrotagger, WSDL and ODM registration, ODM Tools, server config tools).


Analysis and Visualization of Hydrologic Data and Observations Catalogs Using the OLAP Data Cube Technology

Ilya Zaslavsky (a), Matthew Rodriguez (a), Bora Beran (b), David Valentine (a), and Catharine van Ingen (b)

(a) San Diego Supercomputer Center, UCSD, San Diego, CA; (b) Microsoft Research, San Francisco, CA

Abstract

One of the goals of the CUAHSI Hydrologic Information System project is to create a comprehensive portrait of hydrologic observations for the country, integrating observations data and metadata from multiple sources, at federal, regional, and local levels. The data are made available via a uniform set of web service interfaces, called CUAHSI WaterOneFlow services, and via online mapping applications that include the Data Access System for Hydrology (DASH) and Hydroseek. To provide rapid access to data summaries, in particular for several nation-wide data repositories including EPA STORET, USGS NWIS, and USDA SNOTEL, we convert the observations data catalogs and the databases with the harvested values into special representations that support high performance analysis and visualization.

Construction of OLAP (Online Analytical Processing) cubes, often called data cubes, is an approach to organizing and querying large multi-dimensional data collections. We have applied the OLAP techniques, as implemented in Microsoft SQL Server 2005, to the analysis of the catalogs from several agencies.

The initial results reflect hydrologic data availability from a group of observations data catalogs, and temporal and spatial drilldown summaries for selected datasets. The results demonstrate the geography and history of available data totals from the USGS NWIS, EPA STORET, and USDA SNOTEL repositories, and the spatial and temporal dynamics of available measurements for several key parameters.

About CUAHSI HIS

The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) is an organization representing 120+ universities in the US and abroad. As part of its mission, CUAHSI supports the development of cyberinfrastructure for the hydrologic sciences. The CUAHSI HIS (Hydrologic Information System) project is a multi-year, multi-institution effort focused on consistent management of observations data available from several federal agencies (USGS, EPA, USDA, NOAA, etc.) as well as data published by individual investigators.

CUAHSI HIS develops a service-oriented architecture for hydrologic research and education, to enable publication, discovery, retrieval, analysis and integration of hydrologic data. The project team has defined a common information model for organizing hydrologic observation data, designed a common exchange protocol (Water Markup Language) and developed a collection of SOAP web services (WaterOneFlow services) that provide uniform access to different federal, state and local hydrologic data repositories.

This system is now implemented as a collection of Hydrologic Information Servers deployed at NSF-supported Hydrologic Observatory test beds.


Conclusion

OLAP datacubes allow domain scientists to access and mine huge volumes of observations data and catalogs simply and efficiently using familiar desktop tools such as Excel PivotTables.

We demonstrate how OLAP datacubes are constructed for several large observations data catalogs which contain site summary information, including each site's period of record for each parameter, or series. The datacubes are analyzed spatially and temporally. The summary maps and diagrams of data availability by various datacube dimensions and filters appear useful for understanding the nature of the datasets. Constructing datacubes over data values as well can automatically produce what are now referred to as unique data products, such as daily and yearly statistics (mean, minimum, maximum, variance) across one or more sites filtered by quality or other attributes.

Future work will include construction of domain-specific hierarchies (e.g. using HUC codes), and generation and analysis of watershed-specific cubes that integrate observations from multiple sources.
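As a rough illustration of what a HUC-based hierarchy could look like, the sketch below relies on the standard nesting of USGS Hydrologic Unit Codes, in which each level adds two digits to its parent (2-digit region, 4-digit subregion, 6-digit basin, 8-digit subbasin). The function names and the series counts are hypothetical and are not taken from the poster; this is one possible roll-up, not the planned implementation.

```python
from collections import defaultdict

# Standard HUC nesting: each level adds two digits to its parent code.
HUC_LEVELS = (("region", 2), ("subregion", 4), ("basin", 6), ("subbasin", 8))


def huc_path(huc8: str) -> dict:
    """Split an 8-digit HUC into its ancestor codes, one per hierarchy level."""
    if len(huc8) != 8 or not huc8.isdigit():
        raise ValueError(f"expected an 8-digit HUC code, got {huc8!r}")
    return {name: huc8[:width] for name, width in HUC_LEVELS}


def rollup_by_level(series_counts: dict, level: str) -> dict:
    """Sum per-subbasin series counts up to the requested hierarchy level."""
    totals = defaultdict(int)
    for huc8, count in series_counts.items():
        totals[huc_path(huc8)[level]] += count
    return dict(totals)


# Hypothetical series counts keyed by 8-digit HUC (illustrative values only).
sample_counts = {"16010203": 12, "16010204": 5, "16020101": 7, "17040201": 3}
by_region = rollup_by_level(sample_counts, "region")
by_subregion = rollup_by_level(sample_counts, "subregion")
```

Because the hierarchy is encoded in the code prefix, no separate lookup table is needed to move between levels.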

Changes over time

Some physical properties by decade

Available Data Total

USGS NWIS Daily Values

US Map of USGS Daily Values Series by 1 degree latitude-longitude bins

CUAHSI HIS Service Oriented Architecture: General Outline

OLAP Technology for Catalogs

Observations data catalogs assembled within CUAHSI HIS follow the standard relational schema of the Observations Data Model (ODM). The ODM catalogs collections of observation series, each described by:

• what: observed parameter variable including metadata such as processing methodology, units, quality assessments

• where: location or site where the observation was made including relevant site metadata such as Hydrologic Unit or vegetation class

• when: start and end date of measurements and time granularity of measurement

OLAP (Online Analytical Processing) is an approach to rapidly analyze such large multi-dimensional databases. OLAP datacubes organize data along dimensions, enable drilldown on hierarchies, and compute aggregates such as average or minimum. Using SQL Server Analysis Services, an OLAP cube can be created from any database with a star schema, such as ODM. Simple aggregates (such as daily averages or yearly maxima) can be pre-computed for improved query performance.
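The idea of dimensions and pre-computed aggregates can be sketched in a few lines of plain Python. This is only a conceptual toy, assuming a tiny fact table with hypothetical site names and values; it does not reproduce how SQL Server Analysis Services stores or selects its aggregations, and materializing every combination of dimensions, as done here, is practical only for very small cubes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (site, variable, year, month, value).
FACTS = [
    ("site_A", "discharge", 1984, 5, 10.0),
    ("site_A", "discharge", 1984, 6, 14.0),
    ("site_A", "discharge", 1985, 5, 6.0),
    ("site_B", "discharge", 1984, 5, 2.0),
    ("site_B", "temperature", 1984, 5, 8.0),
]
DIMENSIONS = ("site", "variable", "year", "month")


def build_cube(facts):
    """Pre-compute count/sum/min/max for every subset of the dimensions."""
    cube = defaultdict(lambda: {"count": 0, "sum": 0.0, "min": None, "max": None})
    for *coords, value in facts:
        row = dict(zip(DIMENSIONS, coords))
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                cell = cube[tuple((d, row[d]) for d in dims)]
                cell["count"] += 1
                cell["sum"] += value
                cell["min"] = value if cell["min"] is None else min(cell["min"], value)
                cell["max"] = value if cell["max"] is None else max(cell["max"], value)
    return cube


def query(cube, **fixed):
    """Look up a pre-computed cell; dimensions left out are aggregated over."""
    key = tuple((d, fixed[d]) for d in DIMENSIONS if d in fixed)
    cell = cube.get(key)
    if cell is None:
        return None
    return {
        "count": cell["count"],
        "mean": cell["sum"] / cell["count"],
        "min": cell["min"],
        "max": cell["max"],
    }


cube = build_cube(FACTS)
wet_year = query(cube, variable="discharge", year=1984)
```

Each query is a single dictionary lookup, which is the point of pre-computing aggregates: the cost is paid once at build time rather than on every request.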

Links

CUAHSI HIS: http://www.cuahsi.org/his/

HIS Wiki @ SDSC: http://river.sdsc.edu/wiki

BWC OLAP user manual: http://bwc.berkeley.edu/UserManual/UserManual.htm

Building a Cube from a CUAHSI ODM database: http://research.microsoft.com/~vaningen/Hydrology/BearRiver/default.htm

Using OLAP Datacubes

Bear River Sample Watershed: stream gage comings and goings, 1958-1988

Daily Discharge Averaged by Year, Month, Week, 1980-1990 (Top 12 Sites Only)

Panels: Annual Mean, Monthly Mean, Weekly Mean

Drilling down the time hierarchy gives an overall sense of wet years and flood events.

The datacube enables simple temporal drilldown. These examples are for the ODM sample database from the Little Bear River region in Utah (2 sites, 12 series, 593,000 observations).

The three graphs to the right plot mean annual, monthly, and weekly discharge at the dozen highest flow gages in the watershed over the years 1980 to 1990. The annual graph clearly shows the wet years (the years of highest flow). The monthly and weekly graphs show differences between years. For example, the high flows in 1984 occur in the middle of the year as the result of snow melt. The high flows in 1986 begin in the early part of the year, before the usual snow melt, and continue through the summer season.
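The temporal drilldown described above amounts to averaging the same daily values under progressively finer keys of a time hierarchy. The following is a minimal sketch under stated assumptions: the daily values are synthetic, the helper names are hypothetical, and weeks are taken to be ISO weeks, which may differ from the week definition used in the actual cube.

```python
from collections import defaultdict
from datetime import date, timedelta


def mean_by(daily, key_fn):
    """Average daily values grouped by key_fn(date); returns {key: mean}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily:
        key = key_fn(day)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# One key function per level of the time hierarchy.
def by_year(day):
    return (day.year,)


def by_month(day):
    return (day.year, day.month)


def by_week(day):
    iso = day.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)


# Synthetic daily discharge: 1.0 throughout January 1984, 3.0 in February.
start = date(1984, 1, 1)
daily = [(start + timedelta(days=i), 1.0 if i < 31 else 3.0) for i in range(60)]

annual = mean_by(daily, by_year)
monthly = mean_by(daily, by_month)
weekly = mean_by(daily, by_week)
```

Note that ISO weeks can straddle month and year boundaries, so a week is not a strict child of a month; a cube would normally model year-month and year-week as separate hierarchies.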

• Datacubes can be easily browsed over the internet with Excel PivotTables, using drag-and-drop
• Integration with MATLAB and other rich analysis clients is coming
• Browsing can give a fast overview of data availability and quality

Mean Period of Record

Mapping Data Availability

Different types of nutrients by decade

These are examples from the USGS These are examples from the USGS National Water Information System National Water Information System datacube. The cube contains 1.4 million datacube. The cube contains 1.4 million sites and 12 million series. sites and 12 million series.

To the left are “maps” of data availability, by one-degree cells. Note the higher volumes of measurements accumulated along the coasts, especially the east coast. Stations with the longest mean period of record are distributed across the country, with peaks in the central parts.
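Binning sites into one-degree cells can be sketched as follows. This is an illustrative approximation only: the site tuples are invented, and the period of record is assumed here to be the inclusive count of years between a site's first and last observation, which may differ from the definition used in the cube.

```python
import math
from collections import defaultdict

# Hypothetical sites: (site_id, latitude, longitude, first_year, last_year).
sites = [
    ("s1", 32.7, -117.2, 1950, 2000),
    ("s2", 32.1, -117.9, 1990, 2000),
    ("s3", 40.5, -111.3, 1980, 1985),
]


def cell_of(lat, lon):
    """Identify a one-degree cell by its south-west corner."""
    return (math.floor(lat), math.floor(lon))


def summarize_by_cell(site_list):
    """Site count and mean period of record (years) per one-degree cell."""
    periods = defaultdict(list)
    for _site_id, lat, lon, first_year, last_year in site_list:
        # Assumed definition: inclusive span of years with a record.
        periods[cell_of(lat, lon)].append(last_year - first_year + 1)
    return {
        cell: {"sites": len(p), "mean_period_years": sum(p) / len(p)}
        for cell, p in periods.items()
    }


cell_summary = summarize_by_cell(sites)
```

Using `math.floor` rather than `int` keeps negative longitudes in the correct cell, since `int(-117.2)` would truncate toward zero.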

To the right are the dynamics of data availability for selected parameters. The top chart shows changes in the measurement of nutrients over the years: note the drop in phosphate and the increase in phosphorus and orthophosphate measurements. The lower chart shows that ‘discharge’ measurements dropped as ‘gauge height’ measurements increased over the last two decades.

To the left, data availability for one parameter, discharge, at a single site shows that data availability is not always continuous. Sites have disappeared and reappeared.
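Discontinuities of this kind can be located by scanning the years for which a site reports a parameter. The sketch below assumes availability is summarized at yearly resolution; the function name and the sample years are hypothetical.

```python
def availability_gaps(years_with_data):
    """Return (first_missing_year, last_missing_year) for each gap."""
    years = sorted(set(years_with_data))
    gaps = []
    for previous, following in zip(years, years[1:]):
        if following - previous > 1:
            gaps.append((previous + 1, following - 1))
    return gaps


# Hypothetical discharge record that stops and restarts twice.
gaps = availability_gaps([1970, 1971, 1972, 1980, 1981, 1990])
```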

USDA SNOTEL

The SNOTEL datacube contains 810 sites, 4,552 series, and 28 million observations. The data are collected for a standard set of variables, and the number of observations increases over time.

The SNOTEL datacube also contains values, so you can rapidly investigate aggregations over counties, states, and the entire dataset.
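Aggregation over counties, states, and the whole dataset is a roll-up along a geographic hierarchy. The following is a minimal sketch of that idea, assuming each record carries a state and a county; the records and values are invented and do not come from SNOTEL.

```python
from collections import defaultdict

# Hypothetical records: (state, county, observed value).
records = [
    ("UT", "Cache", 10.0),
    ("UT", "Cache", 20.0),
    ("UT", "Summit", 30.0),
    ("CO", "Routt", 40.0),
]


def rollup(rows, depth):
    """Aggregate at a hierarchy depth: 2 = county, 1 = state, 0 = all data."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in rows:
        key = row[:depth]
        totals[key][0] += row[-1]
        totals[key][1] += 1
    return {
        key: {"count": count, "mean": total / count}
        for key, (total, count) in totals.items()
    }


by_county = rollup(records, 2)
by_state = rollup(records, 1)
overall = rollup(records, 0)
```

Keeping a sum and a count per group, rather than a mean, is what lets coarser levels be derived from finer ones, because sums and counts add up while means do not.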

Total Available Data

The EPA STORET datacube contains 273K sites and 2.7M series.

Florida is the source of about 25% of the total records.

Water Quality Data Record Counts by Site Classification

Number of years of record by start decade

About 60% of the water quality records are short-term measurements (one year or less in duration). The starting decades of the longer series are shown to the right.
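A summary of this kind separates short series from longer ones and then bins the longer ones by the decade in which they start. The sketch below assumes series are described by start and end year only, so "one year or less" is approximated as a difference of at most one calendar year; the series listed are toy values, not STORET data.

```python
from collections import Counter

# Hypothetical series: (series_id, start_year, end_year).
series = [
    ("a", 1975, 1975),
    ("b", 1982, 1983),
    ("c", 1968, 1990),
    ("d", 1991, 1991),
    ("e", 1985, 1999),
]


def summarize_durations(series_list, short_limit_years=1):
    """Return (share of short series, count of longer series per start decade)."""
    short_count = 0
    long_by_decade = Counter()
    for _series_id, start_year, end_year in series_list:
        if end_year - start_year <= short_limit_years:
            short_count += 1
        else:
            long_by_decade[(start_year // 10) * 10] += 1
    return short_count / len(series_list), dict(long_by_decade)


short_share, long_starts = summarize_durations(series)
```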

93% of the series are water quality data.

[Figure: system diagram; labels recovered from the graphic]
Servers: Test bed HIS Servers; Central HIS servers; HIS Lite Servers; External data providers
Desktop clients: ArcGIS; Matlab; IDL; MapWindow; Excel; Programming (Fortran, C, VB)
Other clients: Customizable web interface (DASH); Modeling (OpenMI); Global search (Hydroseek); Other popular online clients
Interfaces: HTML - XML; WSDL - SOAP; WaterOneFlow Web Services, WaterML
Data publishing: ODM DataLoader; Streaming Data Loading; Ontology tagging (Hydrotagger); WSDL and ODM registration; ODMTools; Server config tools
Deployment to test beds
