THREDDS Data Server Unidata’s Common Data Model Background / Summary


THREDDS Data Server
Unidata’s Common Data Model

Background / Summary

John Caron
Unidata/UCAR

Mar 2007

THREDDS Data Server

[Diagram: an HTTP Tomcat Server at motherlode.ucar.edu hosting the THREDDS Server Application, with catalog.xml, the NetCDF-Java library, and Datasets (IDD Data). Services shown: HTTPServer, NetcdfSubset, WCS, OPeNDAP.]

THREDDS Catalogs

• XML over HTTP
• Hierarchical listing of online resources (datasets)
• Container for arbitrary search metadata
  – Standard set maps to DC, GCMD, ADN – Unidata/CDP
• Metadata can be inherited
• Design goal: make it easy for data providers
• TDS uses catalogs for configuration
  – Client view vs. server view
• Data access URLs
  – “Crossing the protocol boundary”
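The inheritance bullet can be illustrated with a minimal sketch: a dataset node resolves a metadata key locally and otherwise falls back to its ancestors. The class `CatalogNode` and its methods are hypothetical and are not part of the THREDDS catalog API; real catalogs express this in XML.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not a THREDDS class.
class CatalogNode {
    private final String name;
    private final CatalogNode parent;                 // null for the catalog root
    private final Map<String, String> metadata = new HashMap<>();

    CatalogNode(String name, CatalogNode parent) {
        this.name = name;
        this.parent = parent;
    }

    String getName() { return name; }

    CatalogNode put(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    // Look up a key locally first, then walk up the hierarchy.
    String resolve(String key) {
        for (CatalogNode n = this; n != null; n = n.parent) {
            String v = n.metadata.get(key);
            if (v != null) return v;
        }
        return null;
    }

    public static void main(String[] args) {
        CatalogNode root = new CatalogNode("Model data", null).put("dataFormat", "GRIB-2");
        CatalogNode child = new CatalogNode("Run 00Z", root).put("dataType", "Grid");
        System.out.println(child.resolve("dataFormat")); // inherited from the parent
        System.out.println(child.resolve("dataType"));   // defined locally
    }
}
```

A data provider sets a property once on a container and every nested dataset sees it, which is one way to read the "make it easy for data providers" goal.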

catalog.xml

Motherlode catalog example

THREDDS WCS 1.0 Server

• Each (gridded) Dataset is WCS
• Each Grid is a Coverage
• Return formats
  – GeoTIFF: floating point, greyscale
  – NetCDF / CF-1.0 (same as NetcdfSubset Service)
• No reprojections, resampling
• GALEON 2
  – upgrade to WCS 1.1
  – Try returning point datasets

THREDDS OPeNDAP Server

• Current version 2.0; NASA ESE standard
  – Working on new 4.0 protocol spec
• Based on Java-OPeNDAP library
  – shared development by Unidata/opendap.org
• Any CDM dataset can be served
• Server4 (Hyrax):
  – latest version of opendap.org C++ library
  – uses THREDDS catalog generation code
  – THREDDS Catalogs replace dods_dir

Common Data Model

[Diagram: the same server picture (HTTP Tomcat Server at hostname.edu, THREDDS Server Application, catalog.xml, NetCDF-Java library, Datasets, IDD Data; services HTTPServer, NetcdfSubset, WCS, OPeNDAP), annotated "Then a miracle happens".]

NetCDF-Java version 2.2 architecture

[Diagram labels: Application; Scientific Datatypes; Datatype Adapter; NetcdfDataset; CoordSystem Builder; NcML; NetcdfFile; I/O service provider; NetCDF-3; NetCDF-4; HDF5; GRIB; GINI; NIDS; Nexrad; DMSP; …; OPeNDAP; ADDE; THREDDS Catalog.xml.]

I/O Service Provider Implementations

• General: NetCDF, HDF5, OPeNDAP
• Gridded: GRIB-1, GRIB-2
• Radar: NEXRAD level 2 and 3, DORADE, Chinese NEXRAD
• Point: BUFR, ASCII
• Satellite: DMSP, GINI, McIDAS AREA
• In development / tentative
  – NOAA CLASS legacy files
  – Barrodale DataBlade

Common Data Model Layers

[Diagram: three layers. Data Access; Coordinate Systems; Scientific Datatypes (Grid, Point, Radial, Trajectory, Swath, Station Profile).]

NetCDF-4 and Common Data Model (Data Access Layer)

NetCDF-4 C library

• 4.0 Beta implements CDM access layer
  – complete, but waiting for HDF5 release 1.8 to finalize file format (Maybe this month, 1.5 years late!)
  – Persistence format for complete CDM
• 4.1: adding Coordinate Systems
  – Optional layer, focus on CF-1 (libcf)
• 4.?: merge OPeNDAP access (pending funding)

Coordinate Systems UML

NcML: NetCDF Markup Language

XML representation of netCDF metadata
• Core: netCDF data access model
• Coordinate System: general and georeferencing coordinate system
• Dataset: redefine, aggregate, subset

Luca Cinquini (NCAR/SCD/ESG), John Caron, Ethan Davis, Bob Drach (LLNL), Stefano Nativi (Florence), Russ Rew

NcML

• NcML Coordinate Systems further developed into NcML-G by Stefano et al.

• NcML Core and Dataset combined into single schema to allow dataset modification

• Aggregation:
  – Union
  – Syntactic join on (existing or new) outer dimension
  – Semantic aggregation of (runtime, forecast time) = Forecast Model Run Collection

NcML example

<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2" location="/data/nids/N0R_20041119_2147">
  <attribute name="cdm_datatype" value="Radial" />
  <remove type="attribute" name="password" />
  <variable name="Reflectivity" orgName="R34768">
    <attribute name="units" value="dBZ" />
  </variable>
</netcdf>

TDS / NcML example

<datasetScan name="Ocean Satellite Data" path="ocean/sat" dirLocation="R:/tds/netcdf/">
  <netcdf>
    <attribute name="Conventions" value="CF-1.0"/>
  </netcdf>
</datasetScan>

TDS / NcML aggregation

<dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WEST-CONUS_4km">
  <netcdf>
    <aggregation dimName="time" type="joinNew">
      <scan location="/data/ldm/pub/satellite/3.9/WEST-CONUS_4km/" suffix=".gini" />
    </aggregation>
  </netcdf>
</dataset>
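What a joinNew aggregation does to the data can be sketched in a few lines: each file contributes one slice, and the slices are stacked along a new outer dimension (here, time). This is an in-memory illustration with hypothetical names (`JoinNewSketch`, `joinNew`), with arrays standing in for the scanned files. It is not how the NetCDF-Java library implements aggregation.

```java
// Hypothetical sketch: each "file" contributes one 2-D slice (y, x);
// joinNew stacks the slices along a new outer dimension (time).
class JoinNewSketch {
    static float[][][] joinNew(float[][]... slices) {
        float[][][] joined = new float[slices.length][][];
        for (int t = 0; t < slices.length; t++) {
            // All slices must share the shape of the first one.
            if (t > 0 && (slices[t].length != slices[0].length
                    || slices[t][0].length != slices[0][0].length)) {
                throw new IllegalArgumentException("slice " + t + " has a different shape");
            }
            joined[t] = slices[t];
        }
        return joined;
    }

    public static void main(String[] args) {
        float[][] file0 = {{1f, 2f}, {3f, 4f}};
        float[][] file1 = {{5f, 6f}, {7f, 8f}};
        float[][][] agg = joinNew(file0, file1);
        System.out.println("shape: " + agg.length + " x " + agg[0].length + " x " + agg[0][0].length);
    }
}
```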

Datasets vs. Files

• Must hide actual location of data files on your server
• Would like to hide actual file format
• Must encapsulate collections of files into logical datasets
  – Homogeneous metadata
  – Hide arbitrary storage decisions
  – Minimize number of datasets

Forecast Model Run Collection (FMRC)

Data Model: Sampled Functions

Our phenomena are continuous functions:
  F: Domain → Range
where
  Domain = subset of space-time (3 spatial, time) (E⁴)
  Range = Rⁿ (product set of real numbers)

Our measurements are sampled functions:
  Domain is a point subset = {p : p ∈ E⁴}
  M: E⁴ → Rⁿ

Variables

Variable is a container for an Array of values

dimensions:
  lat = 64;
  lon = 128;
variables:
  float temperature(lat, lon);

Domain is a set of points in Index space:
  Temperature : {[0..63] × [0..127]} → R
  Temperature : I² → R
  Variable : Iᵐ → Rⁿ
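The idea of a variable as a function on index space can be made concrete with a small sketch. It assumes row-major storage (last dimension varies fastest) in a flat array; the class `VariableSketch` is hypothetical and much simpler than the library's own Variable and Array classes.

```java
// Hypothetical sketch: a variable such as float temperature(lat, lon) with lat = 64, lon = 128,
// stored as a flat array in row-major order (last dimension varies fastest).
class VariableSketch {
    final int[] shape;
    final float[] values;

    VariableSketch(int... shape) {
        this.shape = shape.clone();
        int size = 1;
        for (int n : shape) size *= n;
        this.values = new float[size];
    }

    // Map an index tuple in index space to an offset into the flat array.
    int offset(int... index) {
        if (index.length != shape.length) throw new IllegalArgumentException("rank mismatch");
        int off = 0;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) throw new IndexOutOfBoundsException("dim " + d);
            off = off * shape[d] + index[d];
        }
        return off;
    }

    float get(int... index) { return values[offset(index)]; }

    void set(float v, int... index) { values[offset(index)] = v; }

    public static void main(String[] args) {
        VariableSketch temperature = new VariableSketch(64, 128);
        temperature.set(287.5f, 10, 20);
        System.out.println(temperature.get(10, 20));
        System.out.println(temperature.offset(63, 127)); // offset of the last element
    }
}
```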

Coordinate Systems

Coordinate Axis : Iᵐ → R
{Axis} = Coordinate System : Iᵐ → E⁴

V: Iᵐ → Rⁿ
CS: Iᵐ → E⁴
V ∘ CS⁻¹ : E⁴ → Rⁿ
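A minimal sketch of the composition of V with the inverse of CS, under simplifying assumptions: a 2-D grid, 1-D lat and lon axes, and a nearest-neighbour lookup standing in for the inverse of the coordinate system. All names here (`CoordLookupSketch`, `nearestIndex`, `valueAt`) are hypothetical.

```java
// Hypothetical sketch: evaluate a variable at a coordinate by inverting the
// coordinate system (nearest neighbour) and then reading in index space.
class CoordLookupSketch {
    // Approximate inverse of one coordinate axis: index of the nearest coordinate value.
    static int nearestIndex(double[] axis, double coord) {
        int best = 0;
        for (int i = 1; i < axis.length; i++) {
            if (Math.abs(axis[i] - coord) < Math.abs(axis[best] - coord)) best = i;
        }
        return best;
    }

    // Coordinate-space query: invert each axis, then evaluate V in index space.
    static float valueAt(float[][] v, double[] latAxis, double[] lonAxis, double lat, double lon) {
        return v[nearestIndex(latAxis, lat)][nearestIndex(lonAxis, lon)];
    }

    public static void main(String[] args) {
        double[] lat = {30.0, 40.0, 50.0};
        double[] lon = {-110.0, -100.0, -90.0, -80.0};
        float[][] temperature = new float[lat.length][lon.length];
        temperature[1][2] = 291.0f;
        System.out.println(valueAt(temperature, lat, lon, 41.2, -91.0));
    }
}
```

The caller asks in coordinates and never sees indices, which is the kind of space and time query the scientific data types aim to support.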

Scientific Data Types

• Trying to go beyond index-space subsetting
• Trying to satisfy V ∘ CS⁻¹ : E⁴ → Rⁿ
  – I.e. support subsetting using Space, Time “queries”
• Based on datasets Unidata is familiar with
  – APIs are evolving
• Intended to scale to large, multifile collections
• Corresponding “standard” NetCDF file format conventions

Implementations

Datatype
• Grid
• PointObs
• RadialSweep
• Swath

Dataset
• GridDataset
• FMRCDataset
• CollectionOfPointObs
• StationCollectionOfPointObs
• StationCollectionOfRadialSweep

Conclusions

• CDM is our implementation data model
• Map to data access models such as OGC
• Current work is to serve collections instead of individual files.
• Dataset is desired level of granularity
• Scientific data types are implementations with specialized access

Datatype Collection

• GridDataset collection of GridDatatype


Gridded Datatype

float gridData(t,z,y,x);
float time(t);
float y(y);
float x(x);
float lat(y,x);
float lon(y,x);
float z(z);
float height(t,z,y,x);

• Cartesian coordinates
• All dimensions are connected
• horizontal: lat,lon or projection x,y
• time(time) orthogonal 1D
• separable: (x, y) × time × z

GridDatatype methods

CoordinateAxis getTaxis();
CoordinateAxis getXaxis();
CoordinateAxis getYaxis();
CoordinateAxis getZaxis();
Projection getProjection();
int[] findXYindexFromCoord(double x_coord, double y_coord);
LatLonRect getLatLonBoundingBox();
Array getDataSlice(Range[] …)
GridDatatype makeSubset(Range[] …)

Radial Data

radialData(radial, gate) :
  distance(gate)
  azimuth(radial)
  elevation(radial)
  time(radial)

• Polar coordinates
• All dimensions are connected
• Not separate time dimension

Swath

swathData(line,cell)
  lat(line,cell)
  lon(line,cell)
  time(line)
  z(line,cell) ??

• lat/lon coordinates
• not separate time dimension
• all dimensions are connected

Unstructured Grid

float unstructGrid(t,z,pt);
float lat(pt);
float lon(pt);
float time(t);
float height(z);

• Pt dimension not connected
• Looks the same as point data
• Need to specify the connectivity explicitly

Point Observation Data

Structure {
  lat, lon, z, time;
  v1, v2, ...
} obs(pt);

• Set of measurements at the same point in space and time
• Point dimension not connected

float obs1(pt);
float obs2(pt);
float lat(pt);
float lon(pt);
float z(pt);
float time(pt);

PointObsDataset Methods

// Iterator<StructureData>
Iterator getData(LatLonRect boundingBox, Date start, Date end);
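The space and time query behind a method like getData can be sketched as a simple filter. This is an in-memory stand-in under stated assumptions: `PointObsSketch` and `Obs` are hypothetical types, the bounding box is passed as four doubles instead of a LatLonRect, and the box is assumed not to cross the date line. A real implementation would avoid scanning every observation.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins; not the NetCDF-Java PointObsDataset classes.
class PointObsSketch {
    static class Obs {
        final double lat, lon;
        final Date time;
        final double value;

        Obs(double lat, double lon, Date time, double value) {
            this.lat = lat;
            this.lon = lon;
            this.time = time;
            this.value = value;
        }
    }

    // Return the observations inside a lat/lon box and a closed time interval.
    static Iterator<Obs> getData(List<Obs> all, double latMin, double latMax,
                                 double lonMin, double lonMax, Date start, Date end) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : all) {
            boolean inBox = o.lat >= latMin && o.lat <= latMax
                    && o.lon >= lonMin && o.lon <= lonMax;
            boolean inTime = !o.time.before(start) && !o.time.after(end);
            if (inBox && inTime) hits.add(o);
        }
        return hits.iterator();
    }

    public static void main(String[] args) {
        List<Obs> all = new ArrayList<>();
        all.add(new Obs(40.0, -105.0, new Date(1000L), 12.5));
        all.add(new Obs(10.0, 20.0, new Date(2000L), 30.1));
        Iterator<Obs> it = getData(all, 35.0, 45.0, -110.0, -100.0, new Date(0L), new Date(5000L));
        while (it.hasNext()) System.out.println(it.next().value);
    }
}
```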

Time series Station Data

Structure {
  name;
  lat, lon, z;
  Structure {
    time;
    v1, v2, ...
  } obs(*);   // connected
} stn(stn);   // not connected

StationObs Methods

// List<Station>
List getStations(LatLonRect boundingBox);

// Iterator<StructureData>
Iterator getData(Station s, Date start, Date end);

Trajectory Data

Structure {
  lat, lon, z, time;
  v1, v2, ...
} obs(pt);     // connected

• pt dimension is connected
• Collection dimension not connected

Structure {
  name;
  Structure {
    lat, lon, z, time;
    v1, v2, ...
  } obs(*);    // connected
} traj(traj);  // not connected

Profiler/Sounding Station Data

Structure {
  name;
  lat, lon, time;
  Structure {
    z;
    v1, v2, ...
  } obs(*);    // connected
} loc(nloc);   // not connected

Structure {
  name;
  lat, lon;
  Structure {
    time;
    Structure {
      z;
      v1, v2, ...
    } obs(*);  // connected
  } time(*);   // connected
} stn(stn);    // not connected

Data Types Summary

• Data access through a standard API
• Convenient georeferencing
• Specialized subsetting methods
  – Efficiency for large datasets

Payoff

N + M instead of N * M things on your TODO List!

[Diagram labels: File Format #1; File Format #2; File Format #N; CDM; Visualization & Analysis; NetCDF file; OPeNDAP Server; WCS Service; Web Service.]
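The payoff is plain arithmetic, sketched here with illustrative counts that are not figures from the talk: with N formats and M applications, direct connections need one reader per pair, while a shared data model needs one mapping per format plus one per application. The class `PayoffSketch` is hypothetical.

```java
// Counts the adapters needed to connect N file formats to M client applications.
class PayoffSketch {
    // Every application reads every format directly.
    static int pairwise(int formats, int apps) {
        return formats * apps;
    }

    // Every format and every application maps once to a shared data model.
    static int viaCommonModel(int formats, int apps) {
        return formats + apps;
    }

    public static void main(String[] args) {
        int n = 10, m = 4; // illustrative counts, not figures from the talk
        System.out.println(pairwise(n, m) + " vs " + viaCommonModel(n, m));
    }
}
```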

Next: DataType Aggregation

• Work at the CDM DataType level, know (some) data semantics

• Forecast Model Collection
  – Combine multiple model forecasts into single dataset with two time dimensions
  – With NOAA/IOOS (Steve Hankin)
• Point/Station/Trajectory/Profile Data
  – Allow space/time queries, return nested sequences
  – Start from / standardize “Dapper conventions”
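The "two time dimensions" of a forecast model collection can be sketched as a small table: one axis for the model run time, one for the forecast offset, with each cell holding the time that the forecast is valid for. The class `FmrcSketch` and the integer hour values are assumptions made for illustration and do not reflect the library's FMRC implementation.

```java
// Hypothetical sketch: a forecast model run collection indexed by two time dimensions.
class FmrcSketch {
    // validTime[run][step] = runTimes[run] + forecastOffsets[step],
    // all in hours since a common reference time.
    static int[][] validTimes(int[] runTimes, int[] forecastOffsets) {
        int[][] valid = new int[runTimes.length][forecastOffsets.length];
        for (int r = 0; r < runTimes.length; r++) {
            for (int f = 0; f < forecastOffsets.length; f++) {
                valid[r][f] = runTimes[r] + forecastOffsets[f];
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        int[] runs = {0, 12};        // two model runs, 12 hours apart
        int[] offsets = {0, 6, 12};  // each run forecasts 0, 6 and 12 hours ahead
        int[][] valid = validTimes(runs, offsets);
        // Run 0 at +12 h and run 1 at +0 h describe the same valid time.
        System.out.println(valid[0][2] == valid[1][0]);
    }
}
```

Different runs overlap in valid time, which is why the collection needs both dimensions and not a single time axis.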

Forecast Model Collections

Coordinate Systems: implicit/explicit

• NetCDF, OPeNDAP, HDF data models do not have explicit coordinate systems
  – so georeferencing not part of API
  – Need conventions to specify (e.g. CF-1, COARDS, etc.)
• GRIB, HDF-EOS (e.g.) are explicit
  – But no uniform API


NetCDF-4 C Library

[Diagram labels: HDF5 Library; netCDF-4 Library; netCDF-3 Interface.]

Conclusion

• Standardized Data Access in good shape
  – HDF5, NetCDF, OPeNDAP
  – Write an IOSP for proprietary formats (Java)
• But that’s not good enough!
• To do:
  – Standard representations of coordinate systems
  – Classifications of data types, standard services for them
