Best Practices to Promote Data Interoperability Chris Lynnes Joe Glassy Technology Infusion Working Group


Best Practices to Promote Data Interoperability

Chris Lynnes, Joe Glassy

Technology Infusion Working Group

Outline

• Data interoperability: what and why?
• Factors affecting data interoperability
• Implementations that support interoperability

What is Data Interoperability?

Data interoperability exists when a data user is able to work with (view, analyze, process, etc.) a data provider's science data or model output “transparently,” without having to reformat the data, write special tools to read or extract the data, or rely on specific proprietary software.

Quicker data usability, easier portability, more transparency – S. Volz

Illustration: Panoply

DATASET COMPARISON
• North American Regional Reanalysis (NARR) from NCDC
• Atmospheric Infrared Sounder (AIRS) from GES DISC

PROCEDURE
1. Cut and paste the NARR OPeNDAP URL
2. Double-click a variable to display it
3. Repeat for AIRS

What good is data interoperability?

• Makes it easier to write tools that work with many datasets...

• ...Which increases the ability to work with multiple datasets together...

• ...And promotes user satisfaction and positive early experiences with {your|my|our} data...

• ...Which enhances a dataset’s life-cycle economics.

FACTORS AFFECTING DATA INTEROPERABILITY

There is no single path to interoperability…

File Formats

• Standard formats
  – More economical to develop general tools
  – Format is well documented
  – APIs* exist
  – Many datasets enabled by one set of code modules
• “Self-describing” formats
  – Contain embedded metadata to interpret the content, context, and/or structure of the file

*Application Programming Interfaces

File Structures

• Coordinates: where and named how?
  – Latitude, longitude
  – Vertical dimension: altitude, pressure, sigma level, depth, ...
  – Time
• Flat vs. hierarchical
• Simple vs. complex

Usage Metadata

• Inside file vs. separate file
  – Easy for users to lose a separate file
  – A key benefit of self-describing formats
• Variable-level metadata
  – Units
  – Fill Value
  – Scale / offset
• File-level metadata
• Standards (e.g., CF-1, HDF-EOS, ISO 19115)
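The variable-level metadata listed above (units, fill value, scale/offset) is what lets a generic tool turn stored integers into physical values. A minimal sketch in Python, assuming the CF-style attribute names `scale_factor`, `add_offset`, and `_FillValue`; the function name `unpack` and the sample attribute values are made up for illustration:

```python
def unpack(packed, attrs):
    """Convert packed integers to physical values using variable-level metadata.

    Assumes CF-style attribute names. Missing attributes fall back to
    scale_factor=1, add_offset=0, and no fill value.
    """
    scale = attrs.get("scale_factor", 1.0)
    offset = attrs.get("add_offset", 0.0)
    fill = attrs.get("_FillValue")
    # Fill values mark missing data, so they are masked rather than scaled.
    return [None if v == fill else v * scale + offset for v in packed]


# Hypothetical temperature variable stored as 16-bit integers.
attrs = {"units": "K", "scale_factor": 0.5, "add_offset": 200.0, "_FillValue": -9999}
values = unpack([100, -9999, 180], attrs)
print(values, attrs["units"])
```

If these attributes lived in a separate file that the user lost, the stored integers alone would not be interpretable.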

Grids

• Common grids enable dataset comparison, merging, etc.

• Reprojection from one grid to another usually loses information

• Tradeoff
  – Most appropriate grid for a dataset vs....
  – ...most commonly used grid in the “community”
  – Keep in mind that the potential community may be much broader than you think
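The point that reprojection usually loses information can be seen even in one dimension. The sketch below is a toy illustration, not a real regridding method: it averages a fine grid onto a coarser one and then copies the result back, assuming the grid length is a multiple of the coarsening factor. The sample values are invented.

```python
def coarsen(values, factor):
    """Average consecutive groups of `factor` cells (fine grid -> coarse grid)."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values), factor)]


def refine(values, factor):
    """Repeat each coarse cell `factor` times (coarse grid -> fine grid)."""
    return [v for v in values for _ in range(factor)]


fine = [1.0, 3.0, 2.0, 6.0]        # made-up values on a 4-cell grid
coarse = coarsen(fine, 2)          # [2.0, 4.0]
round_trip = refine(coarse, 2)     # [2.0, 2.0, 4.0, 4.0]
print(round_trip == fine)          # False: the sub-cell variation is gone
```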

Names and Units

• Variable names
  – Standard names (CF-1)
  – Unique names within file
    • Some tools have difficulty with hierarchies having variables with the same name in different branches
• Dimension / coordinate names
  – Latitude, longitude, time, altitude/pressure
• Unit names
  – Standard units
  – Unit conversion
    • Note that altitude <-> pressure requires additional information
• Filenames
  – Descriptive filenames: dataset, version, data date/time…
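To illustrate why altitude <-> pressure conversion requires additional information, the sketch below uses the isothermal barometric approximation p = p0 · exp(−z/H). The surface pressure `p0_hpa` and scale height `scale_height_km` are assumptions supplied by the caller, not values taken from the data, and the function names are hypothetical.

```python
import math


def pressure_to_altitude_km(p_hpa, p0_hpa=1013.25, scale_height_km=7.0):
    """Isothermal approximation: z = H * ln(p0 / p)."""
    return scale_height_km * math.log(p0_hpa / p_hpa)


def altitude_to_pressure_hpa(z_km, p0_hpa=1013.25, scale_height_km=7.0):
    """Inverse of the above: p = p0 * exp(-z / H)."""
    return p0_hpa * math.exp(-z_km / scale_height_km)


# The same 500 hPa level maps to different altitudes depending on the
# assumed scale height, which is why the conversion is not a pure unit change.
z7 = pressure_to_altitude_km(500.0, scale_height_km=7.0)
z8 = pressure_to_altitude_km(500.0, scale_height_km=8.0)
print(f"{z7:.2f} km vs {z8:.2f} km")
```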

Sidebar: Data Identifiers

• Filenames, even descriptive ones, may not be completely reliable as unique identifiers
• Identifiers are ideally embedded within the data file
• Uniquely identifying datasets and data files helps:
  – Catalog interoperability
  – Transparency / provenance
  – Citation metrics

• See Ruth Duerr’s talk on recommendations for unique identifiers for datasets and granules

• Future tools may make use of these embedded identifiers: look up references, get related data...

IMPLEMENTATIONS OF DATA INTEROPERABILITY

CF-1

• Climate and Forecast (CF) metadata convention
  – Popular in modeling community
  – Extending to point and satellite data
• Coordinate system: Key for tool usage
  – Latitude + longitude
    • Specifications for both regular L3 grids and L2 swaths
  – Time, vertical
  – Recognizable via units (e.g. “degrees_north”)
• Standard variable names: Key for model incorporation
• Most often associated with netCDF
  – Also applicable in OPeNDAP
  – Work is underway to apply to HDF5
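As a rough illustration of how a tool can recognize coordinates via units, the sketch below guesses a coordinate type from a `units` string. The unit spellings follow the CF convention as far as we are aware, but the pressure list is deliberately partial, and the function name `guess_axis` is hypothetical rather than part of any library.

```python
import re

LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}
# Time units look like "<unit> since <reference date>".
TIME_UNITS = re.compile(r"^\s*\w+\s+since\s+\S+", re.IGNORECASE)
# Partial list of pressure units that suggest a vertical coordinate.
PRESSURE_UNITS = {"Pa", "hPa", "bar", "millibar", "decibar", "atm"}


def guess_axis(units):
    """Return 'latitude', 'longitude', 'time', 'vertical', or 'unknown'."""
    if units in LAT_UNITS:
        return "latitude"
    if units in LON_UNITS:
        return "longitude"
    if TIME_UNITS.match(units):
        return "time"
    if units in PRESSURE_UNITS:
        return "vertical"
    return "unknown"


for u in ("degrees_north", "degrees_east", "hours since 2010-10-12 00:00:00", "hPa", "K"):
    print(u, "->", guess_axis(u))
```

Vertical coordinates expressed in length units (e.g. metres) cannot be recognized from units alone and need further attributes, so this sketch reports them as unknown.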

OPeNDAP

• Open-Source Project for a Network Data Access Protocol
• Client-Server framework
  – Standard web (GET) request syntax
• Remote fine-grained access to data files
• Presents a standard data model and “format” to clients
• Supports multiple formats on the back end
  – HDF, netCDF, ASCII, GRIB, binary
• Multiple server implementations
  – Hyrax, THREDDS, ERDDAP, GDS, Dapper, PyDAP, TSDS...
• Client support in many tools
  – IDV, McIDAS-V, GrADS, Matlab, IDL, Ferret, Panoply

Web Coverage Service

• Client-Server framework
  – Open Geospatial Consortium protocol
  – Standard web (GET) request syntax
• Multiple response formats, including GeoTIFF, netCDF/CF-1 and HDF-EOS
• Includes spatial subsetting
• BUT:
  – Client support is still nascent outside GIS community
  – Some datatypes are difficult or impossible to fit into WCS (e.g., limb-scanning profiles)
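The "standard web (GET) request syntax" with spatial subsetting can be sketched as a key-value URL. The example below assumes the WCS 1.0.0 GetCoverage parameter names; the endpoint `http://example.org/wcs`, the coverage name, and the bounding box are hypothetical, and the accepted `FORMAT` strings vary by server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def get_coverage_url(endpoint, coverage, bbox, width, height, fmt="GeoTIFF"):
    """Build a WCS 1.0.0-style GetCoverage request URL (no request is sent)."""
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": "EPSG:4326",
        # Spatial subset: min lon, min lat, max lon, max lat.
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage name, for illustration only.
url = get_coverage_url("http://example.org/wcs", "surface_temperature",
                       (-130, 20, -60, 55), 700, 350)
print(url)
print(parse_qs(urlsplit(url).query)["BBOX"])
```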

Semantic Web

• Enables machine recognition of:
  – names
  – relationships
• Effective for:
  – Metadata
  – Small ASCII data

• Use of semantic web to make Earth Science data interoperable is still in its experimental phase

Data Tools for Use with Interoperable Data

• Panoply
  – http://www.giss.nasa.gov/tools/panoply/
• IDV
  – http://www.unidata.ucar.edu/software/idv/
• McIDAS-V
  – http://www.ssec.wisc.edu/mcidas/software/v/
• GrADS
  – http://www.iges.org/grads/
• Ferret
  – http://ferret.wrc.noaa.gov/Ferret/

Summary

• Data users benefit from data interoperability
  – More tools available to handle more datasets

• Consider format, structure, grids, metadata and naming

• If interoperability cannot be built in at data production, some tools (OPeNDAP, WCS, semantic web) can compensate...

• ...IF the metadata and information content of the data are sufficient

BACKUP SLIDES

References

• Practical Data Interoperability for Earth Scientists
  http://www.esdswg.org/techinfusion/downloads/pdies/view
• Recommendations for Data Level Interoperability
  http://tiwg.wik.is/Interoperability/Interoperability_Recommendations
• HDF
  http://www.hdfgroup.org/
• HDF-EOS
  http://hdfeos.org/
• netCDF
  http://www.unidata.ucar.edu/software/netcdf/
• OPeNDAP
  http://www.opendap.org
• CF-1
  http://cf-pcmdi.llnl.gov/
• Web Coverage Service
  http://en.wikipedia.org/wiki/Web_Coverage_Service

OPeNDAP URL examples

• Get metadata in XML:
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ddx
• Get data slice in ASCII:
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ascii?H2OMMRStd[0:1:44][0:1:29][4:1:5]
• Data access URL for clients (IDV, Panoply):
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf
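The three URL forms above follow one pattern: a base data URL, an optional response suffix (`.ddx`, `.ascii`), and an optional constraint of the form `variable[start:stride:stop]` per dimension. A minimal sketch that rebuilds them as strings; the helper names `hyperslab` and `request_url` are hypothetical, and no network request is made (the server may no longer be reachable).

```python
BASE = ("http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/"
        "AIRX2RET.005/2010/285/"
        "AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf")


def hyperslab(variable, *ranges):
    """Build a constraint: variable[start:stride:stop] for each dimension."""
    return variable + "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges
    )


def request_url(base, suffix="", constraint=""):
    """Append a response suffix and, if given, a '?constraint' to the base URL."""
    url = base + suffix
    return url + "?" + constraint if constraint else url


ddx_url = request_url(BASE, ".ddx")        # metadata in XML
ascii_url = request_url(                   # data slice in ASCII
    BASE, ".ascii",
    hyperslab("H2OMMRStd", (0, 1, 44), (0, 1, 29), (4, 1, 5)),
)
client_url = request_url(BASE)             # URL to hand to IDV or Panoply
print(ascii_url)
```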