Web Services and Water Markup Language for Distributed Hydrologic Data Access

Ilya Zaslavsky, San Diego Supercomputer Center, UCSD

CUAHSI = Consortium of Universities for the Advancement of Hydrologic Sciences, Inc.; HIS = Hydrologic Information System

NSF-supported Collaborative Project: UT Austin + SDSC + Drexel + Duke + Utah State

www.cuahsi.org/his/


Page 2

The Grid is becoming the backbone for collaborative science and data sharing

CI (cyberinfrastructure) is about RE-USING data and research resources!

Page 3

Cyberinfrastructure for hydrology (in the U.S.)

• Hydrologic observations:
  – Reliance on federally-organized data collection (NWIS, STORET, NCDC, etc.) with huge and complex nomenclatures
  – Simplifying access to federal repositories
  – Relatively lower emphasis on data ownership
  – Handling time in both UTC and local time
  – Various spatial offsets
  – Multiple data types: time series, fields, spatial data
• Integrative discipline:
  – Interoperation with atmospheric, ocean, soils, geomorphology and social datasets and services…
• Community:
  – Organized by “natural boundaries”
  – Networks of relatively autonomous, self-managed data nodes
  – Partnership with public sector water management
  – 96% use Windows for research; Excel, ArcGIS and Matlab are the most popular tools

Mix of standards, software licensing models and vocabularies; leveraging tools developed in other CI projects.

Page 4

Hydrologic Information System Service Oriented Architecture

Figure: WaterOneFlow Web Services (WSDL - SOAP) provide data access (downloads) and data storage (uploads) through web services, connecting:
• Servers: Observatory servers, Workgroup HIS, SDSC HIS servers, 3rd party servers (e.g. USGS, NCDC)
• Clients using the web services interface: GIS, Matlab, IDL, Splus, R, D2K, I2K, Programming (Fortran, C, VB)
• DASH: Data Access System for Hydrology (HTML - XML): information input, display, query and output services; preliminary data exploration and discovery. See what is available and perform exploratory analyses.

Page 5

The CUAHSI Community, HIS and WATERS

Figure: CUAHSI HIS, the WATERS Testbed and the WATERS Network Information System, with their partners:
• CUAHSI: 116 Universities (Nov. 2006)
• HIS Team: Texas, SDSC, Utah, Drexel, Duke
• Supercomputer Centers: NCSA, TACC
• Domain Sciences: Unidata, NCAR, LTER, GEON
• Government: USGS, EPA, NCDC, USDA
• Industry: ESRI, Kisters, OpenMI

Page 6

CUAHSI HIS as a mediator across multiple agency and PI data

• Keeps identifiers for sites, variables, etc. across observation networks

• Manages and publishes controlled vocabularies, and provides vocabulary/ontology management and update tools

• Provides common structural definitions for data interchange

• Provides a sample protocol implementation

• Governance framework: a consortium of universities, MOUs with federal agencies, collaboration with key commercial partners, led by renowned hydrologists, and NSF support for core development and test beds

Page 7

Main Components

• Hydrologic Observations Data Model, ODM databases and site catalogs

• Web services for accessing hydrologic repositories and data in ODMs

• Clients: Online Data Access System + multiple desktop application add-ons

• Network of CUAHSI HIS servers, deployed at hydrologic observatories and integrated with other observing systems and sensor data collection

Figure: CUAHSI Web Services link data sources (NWIS, NCAR, Unidata, NASA, Storet, NCDC, Ameriflux) with client applications (ArcGIS, Excel, Matlab, Access, SAS, Fortran, Visual Basic, C/C++) and with a network of Remote CUAHSI HIS Nodes (Windows). Each node holds data and uses IIS Web Server, ASP.Net, SQL Server and ArcGIS technologies, with HDAS, HODM, web services and web service proxies.

Page 8

Point Observations Information Model

• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• An observation series is an array of observations at a given site, for a given variable, with start time and end time
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value

Figure: the model as a hierarchy, with a USGS example and what the services return at each level:
• Data Source, Network (example: USGS, Streamflow gages): return network information, and variable information within the network
• Sites (example: Neuse River near Clayton, NC): return site information, including a series catalog of variables measured at a site with their periods of record
• ObservationSeries, Values {Value, Time, Qualifier} (example: discharge or stage with start and end, daily or instantaneous; 206 cfs on 13 August 2006): return time series of values
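The containment hierarchy above can be sketched as plain Python dataclasses. This is only an illustration of the information model under our own naming; the class and field names are hypothetical and are not the ODM tables or the WaterML element names. The example values are the USGS ones from the slide.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Value:
    """One observation: the value, its time, and an optional qualifier."""
    value: float
    time: date
    qualifier: Optional[str] = None


@dataclass
class ObservationSeries:
    """All values of one variable at one site (hypothetical field names)."""
    variable: str
    units: str
    values: List[Value] = field(default_factory=list)

    def period_of_record(self) -> Optional[Tuple[date, date]]:
        # Start and end times are derived from the values themselves.
        times = [v.time for v in self.values]
        return (min(times), max(times)) if times else None


@dataclass
class Site:
    name: str
    series: List[ObservationSeries] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


# The slide's example: one discharge value at one USGS streamflow gage.
discharge = ObservationSeries("Discharge", "cfs", [Value(206.0, date(2006, 8, 13))])
clayton = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [clayton])])
```

Keeping the period of record derived from the values, rather than stored separately, is one design choice; a series catalog that must answer "when" without loading values would instead store start and end explicitly.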

Page 9

Challenges… (1/2)
• Sites
  – STORET has stations, and measurement points at various offsets…
  – Site metadata are lacking and inconsistent (e.g. 2/3 have no HUC info, 1/3 no state/county info); agency site files need to be upgraded to ODM…
  – A groundwater site is different from a stream gauge…
• Censored values
  – Values have qualifiers, such as “less than”, “censored”, etc., per value. Sometimes mixed data types…
• Units
  – There are multiple renditions of the same units, even within one repository
  – There may be several units for the same parameter code (STORET)
  – If no value is recorded, there are no units??
  – Unit multipliers: e.g. NCDC ASOS keeps measurements as integers, and provides a multiplier for each variable
• Sources
  – STORET requires organization IDs (of the organization that collected the data for STORET) in addition to site IDs
• Time stamps: ISO 8601
  – A service to determine UTC offsets given lat/lon and date??
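Two of the issues above, per-variable unit multipliers and time stamps with differing UTC offsets, reduce to small normalization steps. The sketch below assumes the raw value arrives as an integer plus a multiplier (as described for NCDC ASOS) and the time stamp as an ISO 8601 string carrying an explicit offset; the function name `normalize` is hypothetical. It does not address the harder open question on the slide, inferring the offset from lat/lon and date when the source omits it.

```python
from datetime import datetime, timezone


def normalize(raw_value: int, multiplier: float, timestamp: str):
    """Hypothetical helper: scale an integer-encoded value and move its time to UTC."""
    # ASOS-style storage: integer value times a per-variable multiplier.
    value = raw_value * multiplier
    # fromisoformat accepts offsets like "-06:00" (Python 3.7+);
    # a trailing "Z" is only accepted from Python 3.11 on.
    local = datetime.fromisoformat(timestamp)
    if local.tzinfo is None:
        # Without an offset the UTC instant is ambiguous; refuse rather than guess.
        raise ValueError("timestamp has no UTC offset: " + timestamp)
    return value, local.astimezone(timezone.utc)


value, utc_time = normalize(253, 0.1, "2006-08-13T14:00:00-06:00")
```

Here `value` is approximately 25.3 (floating-point scaling) and `utc_time` is 20:00 UTC on 13 August 2006.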

Page 10

Challenges… (2/2)
• Values retrieval
  – USGS: by site, variable, time range
  – EPA: by organization-site, variable, medium, units, time range
  – NCDC: fewer variables; period of record applies to the site, not to the seriesCatalog
• Variable semantics
  – Variable names and measurement methods don’t match
  – E.g. NWIS parameter # 625 is labeled ‘ammonia + organic nitrogen’; the Kjeldahl method is used for determination but is not mentioned in the parameter description. In STORET this parameter is referred to as Kjeldahl Nitrogen.
  – One-to-one mapping is not always possible
  – E.g. NWIS ‘bed sediment’ and ‘suspended sediment’ medium types vs. STORET’s ‘sediment’.

Ontology tagging, semantic mediation
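Ontology tagging can be sketched as a lookup from each source's native term to a shared concept. The concept names below are hypothetical, and the table encodes only the two examples on the slide. Because the sediment mapping is not one-to-one, translating from the general STORET term returns a set of narrower NWIS candidates, while translating the other way falls back to the broader term and loses the bed/suspended distinction.

```python
# Hypothetical shared concepts; keys are (source, native term).
# "A/B" marks concept B as narrower than concept A.
TO_CONCEPT = {
    ("NWIS", "625"): "KjeldahlNitrogen",
    ("STORET", "Kjeldahl Nitrogen"): "KjeldahlNitrogen",
    ("NWIS", "bed sediment"): "Sediment/Bed",
    ("NWIS", "suspended sediment"): "Sediment/Suspended",
    ("STORET", "sediment"): "Sediment",
}


def broader(concept: str) -> str:
    """Drop the last path segment: "Sediment/Bed" -> "Sediment"."""
    return concept.rsplit("/", 1)[0]


def translate(source: str, term: str, target: str) -> set:
    """Target-source terms whose concept is equal, narrower, or broader."""
    concept = TO_CONCEPT[(source, term)]
    matches = set()
    for (src, native), c in TO_CONCEPT.items():
        if src != target:
            continue
        if c == concept or broader(c) == concept or c == broader(concept):
            matches.add(native)
    return matches
```

For example, `translate("STORET", "sediment", "NWIS")` yields both NWIS sediment media, and `translate("NWIS", "bed sediment", "STORET")` yields `{"sediment"}`.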

Page 11

From different database structures, data collection procedures, quality control and access mechanisms to uniform signatures … Water Markup Language
• Tested in different environments
• Standards-based
• Can support advanced interfaces via harvested catalogs
• Accessible to community
• Templates for development of new services
• Optimized, error handling, memory management, versioning, run from fast servers
• Working with agencies on setting up services and updating site files

NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM12K, ODM

Page 12

WaterOneFlow API, v. 1.0

• GetValues: returns a TimeSeries
• GetSiteInfo: station information, including a period of record
• GetVariableInfo: returns variable/parameter information
• Also: GetSites, GetVariables
• Object and string output
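WaterOneFlow methods are SOAP web services described by WSDL. The sketch below builds, but does not send, a SOAP 1.1 request envelope for GetValues using only the standard library. The WaterOneFlow namespace URI, the parameter names (`location`, `variable`, `startDate`, `endDate`, `authToken`) and the network-prefixed codes are assumptions for illustration; the authoritative signature is whatever the service's own WSDL declares.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed WaterOneFlow 1.0 namespace; check it against the service's WSDL.
WOF_NS = "http://www.cuahsi.org/his/1.0/ws/"
ET.register_namespace("soap", SOAP_NS)


def get_values_envelope(location, variable, start, end, auth_token=""):
    """Build a GetValues request body (parameter names are assumptions)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{WOF_NS}}}GetValues")
    for name, text in [("location", location), ("variable", variable),
                       ("startDate", start), ("endDate", end),
                       ("authToken", auth_token)]:
        ET.SubElement(call, f"{{{WOF_NS}}}{name}").text = text
    return ET.tostring(envelope, encoding="unicode")


# Illustrative network-prefixed site and variable codes.
request = get_values_envelope("NWIS:02087500", "NWIS:00060",
                              "2006-08-01", "2006-08-31")
```

In practice a SOAP client generated from the WSDL would hide this layer; building the envelope by hand just makes the uniform signature visible.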

Page 13

WaterML design principles

• Driven largely by hydrologists; the goal is to capture semantics of hydrologic observations discovery and retrieval

• Relies to a large extent on the information model of ODM (Observations Data Model), and terms are aligned as much as possible

• Several community reviews since 2005

• Driven by data served by USGS NWIS, EPA STORET, and multiple individual PI-collected observations

• Is no more than an exchange schema for CUAHSI web services

• The lowest barrier to adoption by hydrologists

• A fairly simple and rigid schema tuned to the current implementation

• Conformance with OGC specs not in the initial scope

Page 14

WaterML key elements

• Response types
  – SiteInfo (returned by GetSiteInfo)
  – Variables (returned by GetVariableInfo)
  – TimeSeries (returned by GetValues)
• Key elements
  – site
  – sourceInfo
  – seriesCatalog
  – variable
  – timeSeries
    • values
  – queryInfo

Page 15

Structure of responses
• variablesResponse
  – variables: variable (1 to many)
• timeSeriesResponse
  – queryInfo: criteria, queryURL
  – timeSeries: sourceInfo, variable, values
• sitesResponse
  – queryInfo: criteria, queryURL
  – site: siteInfo, seriesCatalog
    • seriesCatalog: series (1 to many), each with variable and variableTimeInterval

Page 16

SiteInfo response
• queryInfo
• site
  – name
  – code
  – location
  – seriesCatalog: for each variable, what was measured, how many values, and when (TimePeriodType)
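The "what, how many, when" summary of a series catalog can be read from a SiteInfo response with the standard XML parser. The sketch assumes the WaterML 1.0 namespace URI shown and inner element names (`siteName`, `variableName`, `valueCount`, `beginDateTime`, `endDateTime`) that this slide does not spell out; the sample document and its numbers are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"wml": "http://www.cuahsi.org/waterML/1.0/"}  # assumed namespace URI

# Hypothetical SiteInfo response; inner element names are assumptions.
SAMPLE_SITE = """<sitesResponse xmlns="http://www.cuahsi.org/waterML/1.0/">
  <site>
    <siteInfo><siteName>Neuse River near Clayton, NC</siteName></siteInfo>
    <seriesCatalog>
      <series>
        <variable><variableName>Discharge</variableName></variable>
        <valueCount>365</valueCount>
        <variableTimeInterval>
          <beginDateTime>2005-10-01</beginDateTime>
          <endDateTime>2006-09-30</endDateTime>
        </variableTimeInterval>
      </series>
    </seriesCatalog>
  </site>
</sitesResponse>"""


def series_catalog(xml_text):
    """Summarize each series as (what, how many, (begin, end))."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iterfind("wml:site/wml:seriesCatalog/wml:series", NS):
        what = series.findtext("wml:variable/wml:variableName", namespaces=NS)
        count = int(series.findtext("wml:valueCount", namespaces=NS))
        interval = series.find("wml:variableTimeInterval", NS)
        when = (interval.findtext("wml:beginDateTime", namespaces=NS),
                interval.findtext("wml:endDateTime", namespaces=NS))
        rows.append((what, count, when))
    return rows
```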

Page 17

TimeSeries response
• queryInfo
• location
• variable
• values
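Reading the values of a TimeSeries response follows the same pattern. This sketch assumes each `value` element carries its time in a `dateTime` attribute and any qualifier code in a `qualifiers` attribute; those attribute names, the namespace URI and the sample document (including the second value and the "P" qualifier) are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"wml": "http://www.cuahsi.org/waterML/1.0/"}  # assumed namespace URI

# Hypothetical TimeSeries response; attribute names are assumptions.
SAMPLE_TS = """<timeSeriesResponse xmlns="http://www.cuahsi.org/waterML/1.0/">
  <queryInfo><criteria/></queryInfo>
  <timeSeries>
    <variable><variableName>Discharge</variableName></variable>
    <values>
      <value dateTime="2006-08-13T00:00:00">206</value>
      <value dateTime="2006-08-14T00:00:00" qualifiers="P">198</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""


def read_values(xml_text):
    """Return (time, value, qualifier) tuples; qualifier is None if absent."""
    root = ET.fromstring(xml_text)
    return [(v.get("dateTime"), float(v.text), v.get("qualifiers"))
            for v in root.iterfind("wml:timeSeries/wml:values/wml:value", NS)]
```

Keeping the qualifier next to each value, rather than dropping it, matters for the censored "less than" values discussed on the challenges slide.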

Page 18

Clients
• Tested with .Net and Java
• Desktop clients: Excel, Matlab, ArcGIS, VB.NET; more being written

• Web client: DASH (Data Access System for Hydrology): http://river.sdsc.edu/DASH (beta)

Page 19

Current Deployment Architecture

Figure: a Windows 2003 Server (4 GB RAM, 1 TB disk, quad-core CPU) running IIS, SQL Server and ArcGIS 9.2 (AGS Server), hosting DASH, the WaterOneFlow Web Services, GIS data (Mxd service), ODM databases, the ODM Loader and ODM tools (built with VS 2005), with a direct DB connection.

Page 20

Workgroup HIS Server Organization: Steps for Registering Observation Data

• SQL Server: ODMs and catalogs. All instances exposed as ODM (i.e. have standard ODM tables or views: Sites, Variables, SeriesCatalog, etc.): NWIS-IID, NWIS-DV, ASOS, STORET, TCEQ, BearRiver, . . ., My new ODM; more databases
• Spatial store: geodatabase or collection of shapefiles, or both: NWIS-IID points, NWIS-DV points, ASOS points, STORET points, TCEQ points, BearRiver points, . . ., My new points; more synced layers
• DASH Web Application, with background layers (can be in the same or separate spatial store)
• WOF services: web services from a common template: NWIS-IID WS, NWIS-DV WS, ASOS WS, STORET WS, TCEQ WS, BearRiver WS, . . ., My new WS; more WS from the ODM-WS template
• Data providers: USGS, NCDC, EPA, TCEQ
• Web configuration file: stores information about registered networks (WSDLs, web service URLs, connection strings)
• MXD: stores information about layers (layer info, symbology, etc.)
• ODMDataLoader
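Because every instance exposes the same ODM tables or views, one query can serve any registered network, which is what lets new web services come from a common template. The sketch below uses SQLite in place of SQL Server, and its three tables are a simplified, hypothetical subset of ODM columns; the sample rows are placeholders.

```python
import sqlite3

# Simplified, hypothetical subset of ODM-style tables (real ODM has more columns).
SCHEMA = """
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteCode TEXT, SiteName TEXT);
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableCode TEXT,
                        VariableName TEXT);
CREATE TABLE SeriesCatalog (SiteID INTEGER, VariableID INTEGER,
                            ValueCount INTEGER, BeginDateTime TEXT,
                            EndDateTime TEXT);
"""


def series_for_site(conn, site_code):
    """The same query works against any database exposing these tables/views."""
    sql = """
        SELECT v.VariableCode, v.VariableName, c.ValueCount,
               c.BeginDateTime, c.EndDateTime
        FROM SeriesCatalog c
        JOIN Sites s ON s.SiteID = c.SiteID
        JOIN Variables v ON v.VariableID = c.VariableID
        WHERE s.SiteCode = ?
        ORDER BY v.VariableCode
    """
    return conn.execute(sql, (site_code,)).fetchall()


# A stand-in for "My new ODM" with placeholder rows.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Sites VALUES (1, 'SITE1', 'Example site')")
conn.execute("INSERT INTO Variables VALUES (1, 'VAR1', 'Discharge')")
conn.execute("INSERT INTO SeriesCatalog VALUES (1, 1, 10, '2006-08-01', '2006-08-10')")
```

Where an agency database has a different layout, defining views with these names is one way to satisfy the same query without copying data.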

Page 21

HIS Scalability

• Adding data types and datasets, processing models and services, servers, users and roles shall not create unmanageable bottlenecks that require system re-engineering

• Designing for scalability:
  – Distilling a generic set of web service signatures; resolving semantic and structural heterogeneities
  – Using ODM as a common generic format for time series data, for ease of coding and uniform search interfaces
  – DASH GUI design to abstract specifics of disparate repositories
  – Leveraging common CI components developed in GEON
  – Working with agencies to remove web service bottlenecks
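"Distilling a generic set of web service signatures" can be sketched as an adapter pattern: each repository keeps its native retrieval arguments (by site, variable and time range for USGS; by organization-site, variable, medium, units and time range for EPA STORET), and an adapter fixes the extras at registration time so callers see one signature. All class names, the `fetch` callables and the "ORG-SITE" key format are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

Row = Tuple[str, float]  # (ISO 8601 time, value)


class ObservationService(ABC):
    """Hypothetical generic signature every repository adapter implements."""

    @abstractmethod
    def get_values(self, site: str, variable: str,
                   start: str, end: str) -> List[Row]:
        ...


class SiteVariableAdapter(ObservationService):
    """Source queried by site, variable and time range (USGS-like)."""

    def __init__(self, fetch):
        self.fetch = fetch  # fetch(site, variable, start, end) -> rows

    def get_values(self, site, variable, start, end):
        return self.fetch(site, variable, start, end)


class OrgSiteAdapter(ObservationService):
    """Source that also needs organization, medium and units (STORET-like)."""

    def __init__(self, fetch, organization, medium, units):
        self.fetch = fetch
        self.organization, self.medium, self.units = organization, medium, units

    def get_values(self, site, variable, start, end):
        # Extra native arguments are fixed when the network is registered.
        return self.fetch(f"{self.organization}-{site}", variable,
                          self.medium, self.units, start, end)


calls = []


def storet_like(org_site, variable, medium, units, start, end):
    calls.append(org_site)  # record the native key the adapter built
    return []


services = [
    SiteVariableAdapter(lambda s, v, a, b: [(a, 206.0)]),
    OrgSiteAdapter(storet_like, "ORG", "water", "mg/L"),
]
results = [svc.get_values("S1", "V1", "2006-08-13", "2006-08-14")
           for svc in services]
```

A client such as DASH would then loop over registered services without knowing which native query each one issues.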

Page 22

Near future

• Deployment at the 11 WATERS test beds, and beyond
  – And documenting experience
• Organizing HIS support
• Working with federal and state agencies on web services
  – NCDC, USGS, EPA, state agencies (e.g. TCEQ)
• Analysis services for site catalogs and ODMs (see next slide)
• OGC connections: WaterML is an OGC Discussion Paper (approved at the April 2007 TC Meeting)
  – Needs to be reviewed further, based on initial implementation
  – Internationalization (with CSIRO WRON, European WISE, H2OML)
  – Carry CUAHSI WaterML messages over O&M, as an O&M profile
• Towards WaterML and web services 1.1

Page 23

US Map of USGS Observations

Figure: map with insets for Alaska, Hawaii, Puerto Rico and Antarctica

Page 24

US Map of USGS Observations – by Mean Period of Record

Page 25

Different types of nutrients by decade: Available Data Total

Page 26

Some physical properties by decade: Available Data Total

Page 27

Same without discharge, gage height, temperature and precipitation (the four most common, in that order): Available Data Total


Page 29

SDSC Spatial Information Systems Lab

Research and system development
• Services-based spatial information integration infrastructure
• Mediation services for spatial data, query processing, map assembly services
• Long-term spatial data preservation
• Spatial data standards and technologies for online mapping (SVG, WMS/WFS)
• Support of spatial data projects at SDSC and beyond

Figure: applications of the lab's work
• Grid services for map integration (Mediator, Legend Generator, Map Assembler, Ontology), used in Geosciences (GEON, CUAHSI, CBEO,…)
• Spatial web services (WSDL/WS) linking Federal Agencies, ESRI, CA state, county spatial data and toxicant information, and Telesis and other local non-profits (Figure 1.26 The Geography Network.)
• In regional development (NIEHS SBRP, Katrina)
• In Neurosciences (BIRN, CCDB)
• Student projects; The CHI ME Model

http://scirad.sdsc.edu/datatech/si.html


Page 30

Links and Acknowledgments

• The CUAHSI HIS project:
  – http://www.cuahsi.org/his/ (main site)
  – http://water.sdsc.edu (central development server)

• Many thanks to Microsoft Research for partly sponsoring this trip