Unidata Seminar Series - 30 January 2004 OPeNDAP and THREDDS: Access and Discovery of Distributed Scientific Data Yuan Ho Ethan Davis UCAR Unidata

Unidata Seminar Series - 30 January 2004

OPeNDAP and THREDDS: Access and Discovery of Distributed Scientific Data

Yuan Ho

Ethan Davis

UCAR Unidata

Access and Discovery of Distributed Scientific Data

• OPeNDAP – access to scientific data but no standard inventory or discovery mechanisms

• THREDDS – cataloging, describing, and discovery of scientific data

What is OPeNDAP

• OPeNDAP (Open source Project for a Network Data Access Protocol) is a protocol for accessing distributed scientific data (aka DODS DAP).

• OPeNDAP is a generic data exchange mechanism that lies at the core of a variety of discipline-specific data systems.

• OPeNDAP is two reference implementations of the protocol (C++ and Java).

• OPeNDAP is a software framework that simplifies all aspects of scientific data networking, allowing simple access to remote data.

• OPeNDAP is a community of users and developers.

• OPeNDAP is a non-profit corporation called OPeNDAP, Inc.

Design Principles

• The user should be able to share their data over the network via OPeNDAP (server).

• The user should be able to use their application package to examine or analyze the data of interest (client).

Client/Server Interaction

• Data access (client)

– Access to remote data in the user's normal application

    • IDL (win32)
    • Matlab
    • Ferret
    • GrADS
    • Any netCDF application
    • Excel

– Don't need to know the format in which the data is stored

– Can access data subsets

• Data publishing (server)

– Network interface via HTTP

– DAP provides a common network representation for data

– Can serve data in various formats

    • netCDF
    • HDF
    • SQL
    • FreeForm
    • JGOFS
    • DSP

– Allows subsetting of data
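As a rough illustration of subset access over HTTP, the sketch below builds a request URL for part of one array. It assumes the DAP 2 URL conventions used by DODS servers (a `.dods` suffix for binary data and `[start:stride:stop]` index ranges); the helper name `subset_url` is hypothetical, and the dataset URL is borrowed from the DDX example in this talk.

```python
def subset_url(dataset_url, variable, ranges, suffix=".dods"):
    """Build a DAP 2 style request URL for a hyperslab of one array variable.

    ranges holds one (start, stride, stop) index triple per dimension.
    The suffix and bracket syntax are assumptions based on DAP 2 conventions.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges
    )
    return f"{dataset_url}{suffix}?{variable}{hyperslab}"


# Request the first four steps along the first dimension of variable "u".
url = subset_url(
    "http://dcz.opendap.org/dap/data/nc/fnoc1.nc",
    "u",
    [(0, 1, 3), (0, 1, 15), (0, 1, 16)],
)
print(url)
```

The client only names the variable and index ranges; it never needs to know that the file behind the URL is netCDF.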

OPeNDAP Status

• OPeNDAP/DODS 3.4 release

• OPeNDAP Java 1.1.3

• OPeNDAP Data Connector 2.3X

• OPeNDAP DAP Specification 4.0

OPeNDAP Data Objects

• Three important OPeNDAP data objects:

– DDX

    • The DDX is an XML representation of the structure of all or part of a dataset, as well as a description of the variables within that dataset.

– Blob

    • Binary data transferred from the data source to the client. The Blob contains the serialized data represented by the DDX.

– ErrorX

    • The ErrorX object is an XML document containing information about any errors that the server may have encountered while processing a request.

DDX Example

<Dataset name="fnoc1.nc"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://www.opendap.org/ns/OPeNDAP"
         xsi:schemaLocation="http://www.opendap.org/ns/OPeNDAP
                             http://dods.coas.oregonstate.edu:8080/opendap/opendap.xsd">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>
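A minimal sketch of how a client might read such a DDX, using only Python's standard library. The embedded document is a trimmed, well-formed copy of the example above, and the function name `describe_arrays` is an illustrative choice, not part of any OPeNDAP library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the DDX example, with quoting and closing tags normalized.
DDX = """<Dataset name="fnoc1.nc" xmlns="http://www.opendap.org/ns/OPeNDAP">
  <Array name="u">
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>"""

NS = {"dap": "http://www.opendap.org/ns/OPeNDAP"}


def describe_arrays(ddx_text):
    """Return ({array name: [(dimension name, size), ...]}, blob URL)."""
    root = ET.fromstring(ddx_text)
    arrays = {}
    for array in root.findall("dap:Array", NS):
        arrays[array.get("name")] = [
            (dim.get("name"), int(dim.get("size")))
            for dim in array.findall("dap:dimension", NS)
        ]
    blob = root.find("dap:Blob", NS)
    return arrays, blob.get("URL")


arrays, blob_url = describe_arrays(DDX)
print(arrays)
print(blob_url)
```

The DDX tells the client the shape of `u` before any binary data moves; the Blob URL says where to fetch the serialized values.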

Variables and Attributes

• Each variable consists of a name, a type, a value and a collection of Attributes.

– Atomic variables: atomic data types are indivisible.

    • integer, floating-point, string, and binary images.
    • Example

      <Float64 name="Depth"/>
      <Binary name="sound_sample" size="17623"/>

– Constructor variables: a constructor variable is assembled from collections of other variables, including both atomic and constructor types.

    • array, structure, grid, and sequence.
    • Example

      <Array name="temp">
        <Byte/>
        <dimension size="5" name="lon"/>
        <dimension size="3" name="lat"/>
      </Array>
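The atomic/constructor distinction can be sketched as plain data classes. This is an illustrative model only, not the data model of either reference implementation; the class and field names are assumptions.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple


@dataclass
class Atomic:
    """An indivisible variable such as <Float64 name="Depth"/>."""
    name: str
    type_name: str


@dataclass
class Array:
    """A constructor variable: one element type repeated over named dimensions."""
    name: str
    element_type: str
    dimensions: List[Tuple[str, int]]

    def element_count(self) -> int:
        # Total number of elements is the product of the dimension sizes.
        return prod(size for _, size in self.dimensions)


depth = Atomic("Depth", "Float64")
temp = Array("temp", "Byte", [("lon", 5), ("lat", 3)])
print(temp.element_count())
```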

Variables and Attributes

• An attribute is composed of a name, a type, and a value.

– Each variable may have zero or more attributes.

– Types: Boolean, Byte, IntXX, UIntXX, FloatXX, String, URL.

– Example

<Dataset name="test">
  <Structure name="measurement">
    <Attribute name="data" type="String">
      <value> 18 Mar 03</value>
    </Attribute>
    <Attribute name="other" type="Structure">
      <Attribute name="satellite_name" type="String">
        <value>GOES</value>
      </Attribute>
      <Attribute name="experiment number" type="Int32">
        <value>898976</value>
      </Attribute>
    </Attribute>
    <Float64 name="value"/>
    <Array name="time_series">
      <dimension size="32"/>
    </Array>
  </Structure>
</Dataset>
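Nested attributes like the "other" structure above map naturally onto nested dictionaries. The sketch below assumes a well-formed fragment with no XML namespace, and the integer conversion rule (any type starting with "Int" or "UInt") is a simplification chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed fragment modeled on the attribute example above.
SAMPLE = """<Structure name="measurement">
  <Attribute name="other" type="Structure">
    <Attribute name="satellite_name" type="String"><value>GOES</value></Attribute>
    <Attribute name="experiment number" type="Int32"><value>898976</value></Attribute>
  </Attribute>
</Structure>"""


def attributes_to_dict(element):
    """Collect the direct Attribute children of element, recursing into containers."""
    result = {}
    for attr in element.findall("Attribute"):
        name, type_name = attr.get("name"), attr.get("type", "")
        if type_name == "Structure":
            # Attribute containers hold further attributes instead of a value.
            result[name] = attributes_to_dict(attr)
        else:
            text = attr.findtext("value", default="").strip()
            is_integer = type_name.lower().startswith(("int", "uint"))
            result[name] = int(text) if is_integer else text
    return result


attributes = attributes_to_dict(ET.fromstring(SAMPLE))
print(attributes)
```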

Requests/Responses

Responses: four categories of information pass from the server to the client

– Information about the data: DDX

– The data: Blob

– Error messages: ErrorX object

– Information about the server: version messages and server capabilities document

Requests: a constraint expression provides a way for a client to request certain information from a dataset, such as certain variables, or parts of certain variables.

– Projection clause: a collection of one or more project elements

– Selection clause: one or more select elements

– Example:

<Constraint>
  <Project variable="/sample/temp"/>
  <Project variable="/sample/salt"/>
  <Select condition="/sample/salt>34.0" target="sample"/>
</Constraint>
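To make the projection/selection split concrete, the sketch below evaluates such a constraint against in-memory records. It is a toy interpreter under stated assumptions: conditions have the form `path op number` with `op` one of `<`, `>`, `=`, only the last path segment is used as the field name, and the record fields (including `depth`) are invented sample data.

```python
import operator
import re
import xml.etree.ElementTree as ET

CONSTRAINT = """<Constraint>
  <Project variable="/sample/temp"/>
  <Project variable="/sample/salt"/>
  <Select condition="/sample/salt>34.0" target="sample"/>
</Constraint>"""

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def apply_constraint(constraint_xml, rows):
    """Keep rows passing every Select, then keep only the Projected fields."""
    root = ET.fromstring(constraint_xml)
    fields = [p.get("variable").rsplit("/", 1)[-1] for p in root.findall("Project")]
    tests = []
    for select in root.findall("Select"):
        match = re.fullmatch(r"([\w/]+)\s*([<>=])\s*(.+)", select.get("condition"))
        path, op, value = match.groups()
        tests.append((path.rsplit("/", 1)[-1], OPS[op], float(value)))
    kept = [r for r in rows if all(op(r[name], value) for name, op, value in tests)]
    return [{field: r[field] for field in fields} for r in kept]


# Invented sample records standing in for the "sample" sequence.
rows = [
    {"temp": 12.5, "salt": 33.8, "depth": 10.0},
    {"temp": 11.0, "salt": 34.2, "depth": 50.0},
]
result = apply_constraint(CONSTRAINT, rows)
print(result)
```

Selection decides which records survive; projection decides which variables of those records are returned.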

Problems with searching for and retrieving datasets from an OPeNDAP server

• Metadata

– Use metadata: metadata at the data level

– Search metadata: metadata at the directory level

• OPeNDAP was built up from the data level, so its functionality is strongest at the data acquisition level.

• The OPeNDAP AIS (Ancillary Information Service) adds metadata to the OPeNDAP data stream. The role of ancillary data is to support translation of and access to data.

• The ODC (OPeNDAP Data Connector) is more of a directory service, with limited data-searching functionality.

Summary of OPeNDAP

• The OPeNDAP data delivery architecture provides remote access to data via the Internet.

• OPeNDAP uses HTTP (or FTP, GridFTP, Telnet, et cetera) to transport its data objects.

• OPeNDAP has proved very versatile.

• XML is used for the persistent form of the data objects.

• OPeNDAP is a data access tool; it needs a data discovery tool to complement it.

THREDDS Project

• Develop a framework to bridge the gap between data providers and data users, to make scientific data discoverable and usable as well as referencable from scientific publications and educational materials.

• The framework should be:

– Scalable for large and small projects

– Easy to use yet powerful and flexible

– Capable of supporting various user interfaces

THREDDS Catalogs

• Hierarchical structure of datasets

• Dataset access methods

• Structure on which to hang (reference) metadata

[Diagram: catalog structure with multiplicities 1 and 0..*]

THREDDS catalogs are for communicating information about a collection of datasets.

THREDDS Catalogs

<catalog version="0.6" xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore"
                xlink:href="http://server/dods/eta.xml" />
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS"
                urlPath="http://server/dods/2003092412_eta.nc" />
      </dataset>
    </dataset>
  </dataset>
</catalog>
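A short sketch of how a client might walk such a catalog to find access points. The embedded catalog is the example above made well-formed (closing tags and the standard XLink namespace declaration added so it parses); the function name `list_access_points` is illustrative.

```python
import xml.etree.ElementTree as ET

# The slide's catalog, completed so that a standard XML parser accepts it.
CATALOG = """<catalog version="0.6" xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore" xlink:href="http://server/dods/eta.xml"/>
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS" urlPath="http://server/dods/2003092412_eta.nc"/>
      </dataset>
    </dataset>
  </dataset>
</catalog>"""


def list_access_points(node, path=()):
    """Depth-first walk returning (dataset path, service type, URL path) tuples."""
    found = []
    for dataset in node.findall("dataset"):
        here = path + (dataset.get("name"),)
        for access in dataset.findall("access"):
            found.append(
                (" / ".join(here), access.get("serviceType"), access.get("urlPath"))
            )
        # Nested datasets form the hierarchy, so recurse into them.
        found.extend(list_access_points(dataset, here))
    return found


access_points = list_access_points(ET.fromstring(CATALOG))
for entry in access_points:
    print(entry)
```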

THREDDS Catalogs

<dc:title>NCEP Eta 80km CONUS model data</dc:title>
<dc:creator>NOAA/NCEP</dc:creator>
<dc:subject>NCEP Eta Model data; Real-time data</dc:subject>
<dc:description>
  This collection of real-time NOAA/NCEP Eta model data contains five
  days' worth of data. The data is on an 80km CONUS grid (GRIB grid
  211). Daily 00Z and 12Z runs are available, where each dataset
  includes analysis data and forecast data from a single Eta run. Each
  dataset contains forecasts for every 6 hours going out two and a half
  days (60hrs) from the run time.
</dc:description>

THREDDS DQC (Dataset Query Capabilities)

• THREDDS DQC documents describe how a subset of a data collection can be requested.

– Large and time-varying data collections are cumbersome to view as a hierarchical structure

• THREDDS DQC documents describe the set of requests that can be made to one or more DQC services and the form of those requests.

• THREDDS DQC documents are an abstract representation of a collection of datasets.

THREDDS DQC: Subsetting Large Collections

THREDDS DQC

<?xml version="1.0" encoding="UTF-8"?>
<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl"
         construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
    …
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R"
            description=".5 reflectivity .54nm res 16 levels id 19/r"/>
    …
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
    …
  </selectList>
</queryCapability>
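One way a client could use a DQC document is to check a user's choices against the advertised selectors before sending anything. The sketch below does only that; it deliberately does not build the request URL, since the talk does not spell out how `construct="append"` combines values. The embedded document is a cut-down copy of the example with the elided entries omitted, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DQC = """<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl"
         construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R"/>
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
  </selectList>
</queryCapability>"""


def selectors(dqc_text):
    """Map each selector id to its required flag and allowed values."""
    out = {}
    for selector in ET.fromstring(dqc_text):
        if selector.tag == "query":
            continue  # the query element describes the service, not a selector
        values = [c.get("value") for c in selector if c.get("value") is not None]
        out[selector.get("id")] = {
            "required": selector.get("required") == "true",
            "values": values,
        }
    return out


def missing_or_invalid(dqc_text, request):
    """List problems in request, a dict of selector id -> list of chosen values."""
    problems = []
    for selector_id, info in selectors(dqc_text).items():
        chosen = request.get(selector_id, [])
        if info["required"] and not chosen:
            problems.append(f"missing {selector_id}")
        problems.extend(
            f"bad {selector_id}={value}" for value in chosen if value not in info["values"]
        )
    return problems


print(missing_or_invalid(DQC, {"station": ["ABC"], "product": ["N0R"], "time": ["latest"]}))
print(missing_or_invalid(DQC, {"station": ["XYZ"], "product": ["N0R"]}))
```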

THREDDS Services

• THREDDS catalogs are sources of information about a collection of data on top of which complex services can be built. For instance, tools that:

– Provide interoperability with GIS systems

– Supply external discovery systems with needed information (e.g., Dublin Core, DIF, FGDC)

– Supply information to improve data display and analysis, e.g., geolocation information

THREDDS and Discovery Systems

• To supply external discovery services with the information they require, we need:

– The proper information added to a catalog, e.g., title and description of a dataset, spatial and temporal ranges, parameters, dataset ID

– Service to provide metadata in desired encoding

– Service to feed information to discovery system

• Use discovery systems to search for data
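As a sketch of a service that provides metadata in a desired encoding, the function below turns a few catalog-level fields into Dublin Core elements. The `dc` namespace URI is the standard Dublin Core element set; the `record` wrapper element, the function name, and the identifier value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(title, creator, description, identifier):
    """Serialize dataset metadata as Dublin Core elements inside a wrapper."""
    record = ET.Element("record")  # hypothetical wrapper element
    fields = (
        ("title", title),
        ("creator", creator),
        ("description", description),
        ("identifier", identifier),
    )
    for tag, text in fields:
        ET.SubElement(record, f"{{{DC_NS}}}{tag}").text = text
    return ET.tostring(record, encoding="unicode")


xml_text = dublin_core_record(
    title="NCEP Eta 80km CONUS model data",
    creator="NOAA/NCEP",
    description="Real-time NOAA/NCEP Eta model data on an 80km CONUS grid.",
    identifier="eta-80km-conus",  # made-up dataset ID
)
print(xml_text)
```

A harvester could call a function like this for each dataset it finds in a catalog and hand the result to the discovery system.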

THREDDS and Discovery Systems

Communicate with Discovery Systems

[Diagram. Components: Data server; THREDDS Services with data server; Catalog; Metadata Harvester; Dublin Core Generator; Metadata Repository; Discovery System (e.g., DLESE). Arrow labels: Writes, Reads, References, Searches.]

Search and Discovery Services

THREDDS Status

• Working on new versions of the catalog and DQC schemas

• Working on updating existing tools to use new schemas

• Working with UCAR DMWG and NCAR CDP on enhancing descriptive metadata

• Working with OPeNDAP developers on integrating THREDDS and OPeNDAP

OPeNDAP and THREDDS

• Enhance OPeNDAP C++ implementation to serve THREDDS catalogs

• THREDDS DQC to replace OPeNDAP File Servers

OPeNDAP and THREDDS: More Information

• OPeNDAP Web page: http://www.unidata.ucar.edu/packages/dods/

• OPeNDAP Email list: [email protected], subscribe at http://www.unidata.ucar.edu/packages/dods/home/mailLists/

• THREDDS Email list: [email protected], subscribe at http://www.unidata.ucar.edu/projects/THREDDS/maillists/

• THREDDS Web page: http://www.unidata.ucar.edu/projects/THREDDS/

• Support questions: [email protected]