Unidata Seminar Series - 30 January 2004
OPeNDAP and THREDDS: Access and Discovery of Distributed Scientific Data
Yuan Ho
Ethan Davis
UCAR Unidata
Access and Discovery of Distributed Scientific Data
• OPeNDAP – access to scientific data but no standard inventory or discovery mechanisms
• THREDDS – cataloging, describing, and discovery of scientific data
What is OPeNDAP?
• OPeNDAP (Open-source Project for a Network Data Access Protocol) is a protocol for accessing distributed scientific data (also known as the DODS DAP).
• OPeNDAP is a generic data exchange mechanism that lies at the core of a variety of discipline data systems.
• OPeNDAP has two reference implementations of the protocol (C++ and Java).
• OPeNDAP is a software framework that simplifies all aspects of scientific data networking, allowing simple access to remote data.
• OPeNDAP is a community of users and developers.
• OPeNDAP is a non-profit corporation, OPeNDAP Inc.
Design Principles
• Users should be able to share their data over the network via OPeNDAP (server).
• Users should be able to examine or analyze the data of interest with their own application packages (client).
Client/Server Interaction
• Data access (client)
– Access to remote data from the user's normal application:
• IDL (win32) • Matlab • Ferret • GrADS • Any netCDF application • Excel
– No need to know the format in which the data is stored
– Can access data subsets
• Data publishing (server)
– Network interface via HTTP
– The DAP provides a common network representation for data
– Can serve data stored in various formats:
• netCDF • HDF • SQL • FreeForm • JGOFS • DSP
– Allows subsetting of data
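In practice, with the DAP2 (DODS) protocol that current OPeNDAP servers speak, a client requests a subset by appending a constraint expression to the dataset URL, writing each array range as `[start:stride:stop]` with an inclusive stop. The sketch below only builds such a URL and counts the selected elements; the helper names are hypothetical, the dataset URL is borrowed from the DDX example, and the `.dods` suffix (binary data, where `.dds` and `.das` would return structure and attributes) reflects DAP2 conventions rather than anything stated on this slide.

```python
def dap2_subset_url(dataset_url, variable, ranges, response="dods"):
    """Build a DAP2-style request URL for a hyperslab of one variable.

    ranges holds one (start, stride, stop) tuple per dimension; stop is
    inclusive, as in DAP2 constraint expressions. Percent-encoding of the
    brackets is left out for readability; some clients add it.
    """
    for start, stride, stop in ranges:
        if stride < 1 or stop < start:
            raise ValueError(f"invalid range {(start, stride, stop)}")
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


def subset_size(ranges):
    """Number of array elements selected by the (start, stride, stop) ranges."""
    count = 1
    for start, stride, stop in ranges:
        count *= (stop - start) // stride + 1
    return count


# All 16 latitudes, every other longitude (9 of 17), first time step only.
ranges = [(0, 1, 15), (0, 2, 16), (0, 1, 0)]
print(dap2_subset_url("http://dcz.opendap.org/dap/data/nc/fnoc1.nc", "u", ranges))
# http://dcz.opendap.org/dap/data/nc/fnoc1.nc.dods?u[0:1:15][0:2:16][0:1:0]
print(subset_size(ranges))
# 144
```

Only 144 of the 5,712 values of `u` (16 × 17 × 21) would cross the network, which is the point of server-side subsetting.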
OPeNDAP Status
• OPeNDAP/DODS 3.4 release
• OPeNDAP Java 1.1.3
• OPeNDAP Data Connector 2.3X
• OPeNDAP DAP Specification 4.0
OPeNDAP Data Objects
• Three important OPeNDAP data objects:
– DDX
• The DDX is an XML representation of the structure of all or part of a dataset, together with a description of the variables within that dataset.
– Blob
• Binary data transferred from the data source to the client. The Blob contains the serialized data described by the DDX.
– ErrorX
• The ErrorX object is an XML document containing information about any errors the server encountered while processing a request.
DDX Example
<Dataset name="fnoc1.nc"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.opendap.org/ns/OPeNDAP"
  xsi:schemaLocation="http://www.opendap.org/ns/OPeNDAP
    http://dods.coas.oregonstate.edu:8080/opendap/opendap.xsd">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>
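Because the DDX is plain XML, a client can inspect a dataset's structure with an ordinary XML parser before requesting any data. The sketch below uses Python's standard `xml.etree.ElementTree` on a copy of the example (schema attributes dropped); `summarize_ddx` and `local_name` are hypothetical helper names, not part of any OPeNDAP library.

```python
import xml.etree.ElementTree as ET

NS = {"dap": "http://www.opendap.org/ns/OPeNDAP"}

# Copy of the fnoc1.nc DDX, without the xsi schema attributes.
DDX = """<Dataset name="fnoc1.nc" xmlns="http://www.opendap.org/ns/OPeNDAP">
  <Attribute name="Description" type="String">
    <value>Fleet Numerical Wind Data</value>
  </Attribute>
  <Array name="u">
    <Attribute name="long_name" type="String">
      <value>U_Wind_Vector</value>
    </Attribute>
    <Float32/>
    <dimension size="16" name="latitude"/>
    <dimension size="17" name="longitude"/>
    <dimension size="21" name="time"/>
  </Array>
  <Blob URL="http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u"/>
</Dataset>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]


def summarize_ddx(ddx_text):
    """Return ({array name: (element type, [(dim name, size), ...])}, blob URL)."""
    root = ET.fromstring(ddx_text)
    arrays = {}
    for array in root.findall("dap:Array", NS):
        dims = [(d.get("name"), int(d.get("size")))
                for d in array.findall("dap:dimension", NS)]
        # The element type is the one child that is neither an Attribute nor a dimension.
        elem_type = next(local_name(c.tag) for c in array
                         if local_name(c.tag) not in ("Attribute", "dimension"))
        arrays[array.get("name")] = (elem_type, dims)
    blob = root.find("dap:Blob", NS)
    return arrays, (blob.get("URL") if blob is not None else None)


arrays, blob_url = summarize_ddx(DDX)
print(arrays)
# {'u': ('Float32', [('latitude', 16), ('longitude', 17), ('time', 21)])}
print(blob_url)
# http://dcz.opendap.org/dap/data/nc/fnoc1.nc?u
```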
Variables and Attributes
• Each variable consists of a name, a type, a value, and a collection of attributes.
– Atomic variables: atomic data types are indivisible.
• Integer, floating-point, string, and binary images
• Example
<Float64 name="Depth"/>
<Binary name="sound_sample" size="17623"/>
– Constructor variables: a constructor variable is assembled from collections of other variables, including both atomic and constructor types.
• Array, structure, grid, and sequence
• Example
<Array name="temp">
  <Byte/>
  <dimension size="5" name="lon"/>
  <dimension size="3" name="lat"/>
</Array>
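The split between atomic and constructor variables maps naturally onto a recursive data structure: constructor types hold other variables, and atomic types end the recursion. The sketch below models only `Array` and `Structure` (grids and sequences are omitted) and computes the in-memory payload size of the `temp` example. The byte widths are assumptions about the named types, not the DAP wire encoding, and the `profile` structure is a hypothetical example.

```python
from dataclasses import dataclass, field
from math import prod

# Assumed payload widths in bytes for fixed-size atomic types. Variable-length
# types (String, URL, Binary) are left out of this sketch.
ATOMIC_BYTES = {"Byte": 1, "Int16": 2, "UInt16": 2, "Int32": 4, "UInt32": 4,
                "Float32": 4, "Float64": 8}


@dataclass
class Atomic:
    """An indivisible variable such as <Float64 name="Depth"/>."""
    name: str
    type: str

    def nbytes(self):
        return ATOMIC_BYTES[self.type]


@dataclass
class Array:
    """A constructor variable: one atomic template type plus named dimensions."""
    name: str
    template: str
    dims: list = field(default_factory=list)  # (dimension name, size) pairs

    def count(self):
        return prod(size for _, size in self.dims)

    def nbytes(self):
        return self.count() * ATOMIC_BYTES[self.template]


@dataclass
class Structure:
    """A constructor variable that groups other variables, atomic or constructor."""
    name: str
    members: list = field(default_factory=list)

    def nbytes(self):
        return sum(member.nbytes() for member in self.members)


temp = Array("temp", "Byte", [("lon", 5), ("lat", 3)])
print(temp.count(), temp.nbytes())  # 15 15
profile = Structure("profile", [Atomic("Depth", "Float64"), temp])
print(profile.nbytes())  # 23
```

Because `Structure.nbytes` simply asks each member for its own size, nested constructors (a structure holding an array holding bytes) need no special handling.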
Variables and Attributes
• An attribute is composed of a name, a type, and a value.
– Each variable may have zero or more attributes.
– Types: Boolean, Byte, IntXX, UIntXX, FloatXX, String, URL.
– Example
<Dataset name="test">
  <Structure name="measurement">
    <Attribute name="date" type="String">
      <value>18 Mar 03</value>
    </Attribute>
    <Attribute name="other" type="Structure">
      <Attribute name="satellite_name" type="String">
        <value>GOES</value>
      </Attribute>
      <Attribute name="experiment number" type="Int32">
        <value>898976</value>
      </Attribute>
    </Attribute>
    <Float64 name="value"/>
    <Array name="time_series">
      <dimension size="32"/>
    </Array>
  </Structure>
</Dataset>
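Attributes of type `Structure` act as containers, so a client that wants a flat lookup table has to walk them recursively. The sketch below works on a trimmed copy of the example (the `time_series` array is omitted, and the first attribute is named `date`). The dotted-path naming and the `flatten_attributes` name are this sketch's own choices, not part of the DAP specification.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the attribute example.
DATASET = """<Dataset name="test">
  <Structure name="measurement">
    <Attribute name="date" type="String"><value>18 Mar 03</value></Attribute>
    <Attribute name="other" type="Structure">
      <Attribute name="satellite_name" type="String"><value>GOES</value></Attribute>
      <Attribute name="experiment number" type="Int32"><value>898976</value></Attribute>
    </Attribute>
    <Float64 name="value"/>
  </Structure>
</Dataset>"""

# Converters for numeric attribute types; anything else stays a string.
CONVERT = {"Byte": int, "Int16": int, "UInt16": int, "Int32": int,
           "UInt32": int, "Float32": float, "Float64": float}


def flatten_attributes(element, prefix=""):
    """Map dotted attribute paths to (type, [values]), descending into
    Structure-typed attribute containers."""
    flat = {}
    for attr in element.findall("Attribute"):
        path = prefix + attr.get("name")
        attr_type = attr.get("type")
        if attr_type == "Structure":
            flat.update(flatten_attributes(attr, path + "."))
        else:
            convert = CONVERT.get(attr_type, str)
            values = [convert((v.text or "").strip()) for v in attr.findall("value")]
            flat[path] = (attr_type, values)
    return flat


measurement = ET.fromstring(DATASET).find("Structure")
for path, (attr_type, values) in flatten_attributes(measurement).items():
    print(f"{path} ({attr_type}): {values}")
# date (String): ['18 Mar 03']
# other.satellite_name (String): ['GOES']
# other.experiment number (Int32): [898976]
```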
Requests/Responses
• Responses: four categories of information pass from the server to the client:
– Information about the data: the DDX
– The data: the Blob
– Error messages: the ErrorX object
– Information about the server: version messages and the server capabilities document
• Requests: a constraint expression lets the client request specific information from a dataset, such as certain variables or parts of certain variables.
– Projection clause: a collection of one or more project elements
– Selection clause: one or more select elements
– Example:
<Constraint>
  <Project variable="/sample/temp"/>
  <Project variable="/sample/salt"/>
  <Select condition="/sample/salt>34.0" target="sample"/>
</Constraint>
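On the server side, the constraint acts as a selection followed by a projection: rows of the `sample` sequence are filtered by the condition and then trimmed to the projected variables. The sketch below applies the example constraint to a few hypothetical in-memory records. It supports only a single `path op number` condition, and its parsing rules are assumptions, not the DAP grammar. For comparison, a DAP2 server would receive roughly the same request as the URL query `?sample.temp,sample.salt&sample.salt>34.0`.

```python
import operator
import re

# Relational operators this sketch understands (an assumed subset).
OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}
# "path op number"; two-character operators come first so ">=" wins over ">".
CONDITION = re.compile(r"^([\w/]+)\s*(>=|<=|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)$")


def leaf(path):
    """'/sample/salt' -> 'salt'."""
    return path.rsplit("/", 1)[-1]


def apply_constraint(records, projections, condition):
    """Keep records that satisfy the selection, then keep only projected fields."""
    match = CONDITION.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    path, op, threshold = match.groups()
    test, key = OPS[op], leaf(path)
    fields = [leaf(p) for p in projections]
    return [{f: rec[f] for f in fields}
            for rec in records if test(rec[key], float(threshold))]


# Hypothetical members of the "sample" sequence.
sample = [
    {"temp": 12.1, "salt": 34.5, "depth": 10.0},
    {"temp": 13.4, "salt": 33.9, "depth": 20.0},
    {"temp": 11.8, "salt": 34.2, "depth": 30.0},
]
print(apply_constraint(sample, ["/sample/temp", "/sample/salt"], "/sample/salt>34.0"))
# [{'temp': 12.1, 'salt': 34.5}, {'temp': 11.8, 'salt': 34.2}]
```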
Problems of searching for and retrieving datasets from an OPeNDAP server
• Metadata
– Use metadata: metadata at the data level
– Search metadata: metadata at the directory level
• OPeNDAP has been built up from the data level, with high functionality at the data acquisition level.
• The OPeNDAP AIS (Ancillary Information Service) adds metadata to the OPeNDAP data stream. The role of the ancillary data is to aid in translating and accessing the data.
• The ODC (OPeNDAP Data Connector) is more of a directory service, with limited data-searching functionality.
Summary of OPeNDAP
• The OPeNDAP data delivery architecture provides remote access to data via the Internet.
• OPeNDAP uses HTTP (or FTP, GridFTP, Telnet, et cetera) to transport its data objects.
• OPeNDAP has proved very versatile.
• XML is used for the persistent form of the data objects.
• OPeNDAP is a data access tool; it needs a data discovery tool to complement it.
THREDDS Project
• Develop a framework that bridges the gap between data providers and data users, making scientific data discoverable and usable, as well as referenceable from scientific publications and educational materials.
• The framework should be:
– Scalable for large and small projects
– Easy to use yet powerful and flexible
– Capable of supporting various user interfaces
THREDDS Catalogs
• Hierarchical structure of datasets
• Dataset access methods
• Structure on which to hang (reference) metadata
[Figure: THREDDS catalog structure diagram]
THREDDS catalogs are for communicating information about a collection of datasets.
THREDDS Catalogs
<catalog version="0.6">
<dataset name="Unidata IDD Model Data">
<dataset name="NCEP Eta 80km CONUS model data">
<metadata metadataType="DublinCore"
xlink:href="http://server/dods/eta.xml" />
<dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
<access serviceType="DODS"
urlPath="http://server/dods/2003092412_eta.nc" />
</dataset>
…
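Because the catalog is a nested `dataset` hierarchy, a client can recover the tree and every access point with a simple recursive walk. The sketch below assumes a hypothetical completion of the elided example: closing tags are added, plus an `xmlns:xlink` declaration that the fragment omits but that a parser needs for `xlink:href`. It treats `urlPath` values as complete URLs, as they appear in the example; `walk` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical completion of the elided catalog; the xlink namespace
# declaration is added so the xlink:href attribute parses.
CATALOG = """<catalog version="0.6" xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="Unidata IDD Model Data">
    <dataset name="NCEP Eta 80km CONUS model data">
      <metadata metadataType="DublinCore" xlink:href="http://server/dods/eta.xml"/>
      <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
        <access serviceType="DODS" urlPath="http://server/dods/2003092412_eta.nc"/>
      </dataset>
    </dataset>
  </dataset>
</catalog>"""


def walk(element, depth=0):
    """Yield (depth, dataset name, access URLs, metadata links) for nested datasets."""
    for ds in element.findall("dataset"):
        urls = [a.get("urlPath") for a in ds.findall("access")]
        links = [m.get(XLINK_HREF) for m in ds.findall("metadata")]
        yield depth, ds.get("name"), urls, links
        yield from walk(ds, depth + 1)


for depth, name, urls, links in walk(ET.fromstring(CATALOG)):
    indent = "  " * depth
    print(indent + name)
    for url in urls:
        print(indent + "  access: " + url)
    for link in links:
        print(indent + "  metadata: " + link)
```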
THREDDS Catalogs
<dc:title>NCEP Eta 80km CONUS model data</dc:title>
<dc:creator>NOAA/NCEP</dc:creator>
<dc:subject>NCEP Eta Model data; Real-time data</dc:subject>
<dc:description>
This collection of real-time NOAA/NCEP Eta model data contains five
days' worth of data. The data is on an 80km CONUS grid (GRIB grid
211). Daily 00Z and 12Z runs are available where each dataset
includes analysis data and forecast data from a single Eta run. Each
dataset contains forecasts for every 6 hours going out two and a half
days (60hrs) from the run time.
</dc:description>
…
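Records like this one can be generated mechanically from catalog information. The sketch below serializes (element, value) pairs under the standard Dublin Core 1.1 namespace and rejects names outside the fifteen DCMES elements. The `record` wrapper element and the function name are hypothetical, and whatever envelope a particular discovery system expects (for example an OAI-PMH `oai_dc` record) is not modeled.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {"contributor", "coverage", "creator", "date", "description",
               "format", "identifier", "language", "publisher", "relation",
               "rights", "source", "subject", "title", "type"}


def dublin_core_record(fields):
    """Serialize (element, value) pairs as dc:* children of a wrapper element."""
    record = ET.Element("record")  # hypothetical wrapper element name
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


xml_text = dublin_core_record([
    ("title", "NCEP Eta 80km CONUS model data"),
    ("creator", "NOAA/NCEP"),
    ("subject", "NCEP Eta Model data; Real-time data"),
])
print(xml_text)
```

The input is a list of pairs rather than a dict because Dublin Core elements may repeat (several `dc:subject` entries, for instance).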
THREDDS DQC (Dataset Query Capabilities)
• THREDDS DQC documents describe how a subset of a data collection can be requested.
– Large and time-varying data collections are cumbersome to view as a hierarchical structure.
• THREDDS DQC documents describe the set of requests that can be made to one or more DQC services and the form of those requests.
• THREDDS DQC documents are an abstract representation of a collection of datasets.
THREDDS DQC: Subsetting Large Collections
THREDDS DQC
<?xml version="1.0" encoding="UTF-8"?>
<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl" construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
    …
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R" description=".5 reflectivity .54nm res 16 levels id 19/r"/>
    …
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
    …
  </selectList>
</queryCapability>
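A DQC document is essentially a machine-readable form: each `select…` element names a parameter, lists its allowed values, and says whether it is required or may repeat. The sketch below validates a user's choices against a trimmed copy of the radar example (the elided entries are left out) and builds a request URL. It assumes that `construct="append"` means appending `id=value` pairs to the `base` URL as a query string. The exact parameter format the RadarServer expects is not given here, so treat the resulting URL as illustrative; `build_query` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Trimmed copy of the radar DQC; elided stations, products, and times omitted.
DQC = """<queryCapability name="Unidata IDD NEXRAD Level 3 Radar Data" version="0.2">
  <query base="http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl" construct="append" returns="catalog"/>
  <selectStation id="station" title="Stations:" multiple="true" required="true">
    <station name="ANCHORAGE/Bethel AK" value="ABC">
      <location latitude="60.78" longitude="-161.87"/>
    </station>
  </selectStation>
  <selectList id="product" title="Products:" multiple="true" required="true">
    <choice name=".5 reflectivity .54nm res" value="N0R"/>
  </selectList>
  <selectList id="time" title="Times:" required="true">
    <choice name="Latest" value="latest"/>
  </selectList>
</queryCapability>"""


def build_query(dqc_text, selections):
    """Validate {selector id: [values]} against the DQC and return a request URL."""
    root = ET.fromstring(dqc_text)
    query = root.find("query")
    if query.get("construct") != "append":
        raise NotImplementedError("only construct='append' is sketched here")
    params = []
    for selector in root:
        if not selector.tag.startswith("select"):
            continue
        sel_id = selector.get("id")
        # Direct children (station or choice elements) carry the allowed values.
        allowed = {child.get("value") for child in selector}
        chosen = selections.get(sel_id, [])
        if selector.get("required") == "true" and not chosen:
            raise ValueError(f"missing required selection: {sel_id}")
        if len(chosen) > 1 and selector.get("multiple") != "true":
            raise ValueError(f"{sel_id} accepts only one value")
        for value in chosen:
            if value not in allowed:
                raise ValueError(f"{value!r} is not offered for {sel_id}")
            params.append((sel_id, value))
    return query.get("base") + "?" + urlencode(params)


print(build_query(DQC, {"station": ["ABC"], "product": ["N0R"], "time": ["latest"]}))
# http://motherlode.ucar.edu/cgi-bin/thredds/RadarServer.pl?station=ABC&product=N0R&time=latest
```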
THREDDS Services
• THREDDS catalogs are sources of information about a collection of data, on top of which complex services can be built. For instance, tools that:
– Provide interoperability with GIS systems
– Supply external discovery systems with needed information (e.g., Dublin Core, DIF, FGDC)
– Supply information to improve data display and analysis, e.g., geolocation information
THREDDS and Discovery Systems
• To supply external discovery services with the information they require, we need:
– The proper information added to a catalog, e.g., title and description of a dataset, spatial and temporal ranges, parameters, dataset ID
– A service to provide metadata in the desired encoding
– A service to feed information to the discovery system
• Use discovery systems to search for data
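The harvest, store, and search steps can be sketched in a few lines: harvest records from a catalog, keep them in a repository, and search them by keyword. This is a toy stand-in under stated assumptions. The catalog, including its second model run, is hypothetical; an in-memory list plays the metadata repository; and title substring matching plays the discovery system. Real systems such as DLESE have their own ingest formats and search services, which are not modeled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the second model run is invented for illustration.
CATALOG = """<catalog version="0.6">
  <dataset name="NCEP Eta 80km CONUS model data">
    <dataset name="NCEP Eta 80km CONUS 2003-09-24 12Z">
      <access serviceType="DODS" urlPath="http://server/dods/2003092412_eta.nc"/>
    </dataset>
    <dataset name="NCEP Eta 80km CONUS 2003-09-25 00Z">
      <access serviceType="DODS" urlPath="http://server/dods/2003092500_eta.nc"/>
    </dataset>
  </dataset>
</catalog>"""


def harvest(catalog_text):
    """Harvester: one repository record per (dataset, access method) pair."""
    repository = []
    for ds in ET.fromstring(catalog_text).iter("dataset"):
        for access in ds.findall("access"):
            repository.append({"title": ds.get("name"),
                               "service": access.get("serviceType"),
                               "url": access.get("urlPath")})
    return repository


def search(repository, query):
    """Discovery: records whose title contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [rec for rec in repository
            if all(term in rec["title"].lower() for term in terms)]


repo = harvest(CATALOG)
for rec in search(repo, "eta 12z"):
    print(rec["url"])
# http://server/dods/2003092412_eta.nc
```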
THREDDS and Discovery Systems
[Figure: "Communicate with Discovery Systems". Components: data server with THREDDS services, catalog, Dublin Core generator, metadata repository, metadata harvester, and a discovery system (e.g., DLESE). Arrow labels: reads, writes, references, searches.]
Search and Discovery Services
THREDDS Status
• Working on new versions of the catalog and DQC schemas
• Working on updating existing tools to use new schemas
• Working with UCAR DMWG and NCAR CDP on enhancing descriptive metadata
• Working with OPeNDAP developers on integrating THREDDS and OPeNDAP
OPeNDAP and THREDDS
• Enhance the OPeNDAP C++ implementation to serve THREDDS catalogs
• Replace OPeNDAP file servers with THREDDS DQC
OPeNDAP and THREDDS: More Information
• OPeNDAP Web page: http://www.unidata.ucar.edu/packages/dods/
• OPeNDAP Email list: [email protected], subscribe at http://www.unidata.ucar.edu/packages/dods/home/mailLists/
• THREDDS Email list: [email protected], subscribe at http://www.unidata.ucar.edu/projects/THREDDS/maillists/
• THREDDS Web page: http://www.unidata.ucar.edu/projects/THREDDS/
• Support questions: [email protected]