XML for Science Data Access. R. Suresh (NASA/MTECH) (suresh@mayurtech.com), Kenneth McDonald (NASA/GSFC) (Mcdonald@rattler.gsfc.nasa.gov). CEOS Joint Sub-Group Meeting, Frascati, Italy

XML for Science Data Access

R. Suresh (NASA/MTECH) (suresh@mayurtech.com)

Kenneth McDonald (NASA/GSFC) (Mcdonald@rattler.gsfc.nasa.gov)

CEOS Joint Sub-Group Meeting, Frascati, Italy

Introduction

• Earth Science data is exploding in
  – resolution
  – complexity
  – heterogeneity
  – volume
• Access to data collection is not a mere website
• Data Access needs to provide data services across the user community
• XML related technologies can provide building blocks to improve data access

XML Technologies

• XML is really a set of closely related technologies, including
  – XML: generalized markup
  – XLink and URI: inter-object reference and linking
  – XML-Schema: document model definition
  – XSL: transformation and presentation
  – RDF: metadata and inference
  – XQuery: retrieval from XML documents
  – SOAP: remote procedure calling
• Key commonalities:
  – draft standards from the WWW Consortium
  – text-based
  – extensible/portable

XML Technologies

• Suitable for metadata and "light data"
• Structured
• Hierarchical
  – Limited graph-like relationships (e.g. IDs)
• Portable across
  – languages
  – operating systems
• Becoming ubiquitous
  – standard parser APIs (DOM, SAX)
  – parsers available in all major languages and platforms
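As a rough illustration of the two standard parser APIs named above, the following Python sketch reads the same small record once with DOM and once with SAX, using only the standard library. The record and its element names (`granule`, `instrument`, `parameter`) are hypothetical and chosen only for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical metadata record; element names are illustrative only.
SAMPLE = """<granule id="g1">
  <instrument>MODIS</instrument>
  <parameter>sea_surface_temperature</parameter>
  <parameter>chlorophyll</parameter>
</granule>"""


def parameters_dom(text):
    """DOM: build the whole tree in memory, then navigate it."""
    doc = minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("parameter")]


class ParameterHandler(xml.sax.ContentHandler):
    """SAX: react to parse events as the document streams past."""

    def __init__(self):
        super().__init__()
        self.parameters = []
        self._buffer = None  # collects text while inside <parameter>

    def startElement(self, name, attrs):
        if name == "parameter":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "parameter":
            self.parameters.append("".join(self._buffer))
            self._buffer = None


def parameters_sax(text):
    handler = ParameterHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.parameters
```

DOM is convenient for small metadata documents; SAX avoids holding the whole document in memory, which matters more as documents grow.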

XML Issues

• No semantics associated with markup
• No random access
• No non-textual content
• Document Type Definition
  – Not itself encoded in XML
  – No constraints on element content
  – Context-free
    • Syntax of element contents independent of element's position in document tree
  – No cardinality constraints
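One consequence of "no constraints on element content" can be sketched in Python: a DTD that declares an element as `(#PCDATA)` accepts any text, so numeric type and range checks have to live in application code. The `point` and `latitude` element names and the `read_latitude` helper are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: both are well-formed, and a DTD declaring
# <!ELEMENT latitude (#PCDATA)> would accept both.
GOOD = "<point><latitude>41.8</latitude></point>"
BAD = "<point><latitude>forty-one</latitude></point>"


def read_latitude(text):
    """Parse a record and enforce the constraints the markup cannot."""
    root = ET.fromstring(text)  # succeeds for any well-formed input
    raw = root.findtext("latitude")
    if raw is None:
        raise ValueError("missing <latitude>")
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"latitude is not a number: {raw!r}") from None
    if not -90.0 <= value <= 90.0:
        raise ValueError(f"latitude out of range: {value}")
    return value
```

Schema languages such as XML-Schema move some of these checks back into the document model, which is one reason the earlier slide lists it among the related technologies.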

XML for Scientific Data Access

• Good because it supports more than one data collection across:
  – discipline or sub-discipline (ocean, atmosphere, land)
  – multiple data types (e.g. satellite swath, grid, point, vector, raster)
  – access modality (e.g. browsing, search, visualization, simulation)
• Requires the generation of use scenarios
  – input from scientific community
• Develop ontologies
• Identify requirements

How to Use XML for Scientific Data Access (cont.)

• Develop data and metadata models to enable the scenarios
  – identify community-wide data semantics
    • formal, incremental process
    • ongoing review and documentation
  – target key semantics for scenarios
  – use extensible data modeling technologies (e.g. XML, RDF, HDF) to implement data models
• Link scenarios to build network of data services
• Other concerns
  – security
  – intellectual property
  – data preservation

Building Blocks

XML
• Translators
• Description Languages
• Applications

Advantages
• Fosters evolution
• Preserves interoperability
• Internationalized text (Unicode)
• Structured text

XML based data format for interoperability

[Diagram: XML shown alongside the data formats HDF, netCDF, CDF, CEOS, FITS, GRIB, BUFR and SDTS]

Extensible Data Format (XDF)

What is XDF?

XDF is an XML-based language for encapsulating scientific data, developed at NASA GSFC. XDF aims to be the (mathematical) kernel of other fully-featured, discipline-oriented scientific formats written in XML.

Key features:
• Hierarchical data structures
• Arrays of any dimension merged with coordinate information
• High-dimensional tables merged with field information, variable resolution
• Easy wrapping of existing data
• User-specified coordinate systems
• Searchable ASCII metadata
• Extensibility to new features

XDF Features

1. Structures, arrays, parameters, axes
2. Clear coordinate information
3. Unrestrictive binary and ASCII formats
4. Examples: EOS, astronomy, biology, etc.
5. OO Perl and Java application interfaces
6. FITSML - adopt FITS keywords and an XML kernel
7. Converters between FITS, FITSML, HDF, and CDF

XDF home page: http://tarantella.gsfc.nasa.gov/xml/XDF_home.html

A simplified structure with an image

<XDF>
  <structure>
    <array>
      <axis name="X-axis">
        <values> a list of values along one dimension </values>
      </axis>
      <axis name="Y-axis">
        <values> a list of values along other dimension </values>
      </axis>
      <read>
        info on the ordering of the data values and record format.
        <recordFormat>...</recordFormat>
      </read>
      <data> The Data goes here </data>
    </array>
    <array> Some other array of data... </array>
  </structure>
</XDF>
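To make the structure above concrete, here is a minimal Python sketch that reads a toy document built from the same element names. It is not an XDF reader: the whitespace-separated numbers inside `<values>` and `<data>`, and the assumption that the first axis varies fastest, are simplifications made only for this example.

```python
import xml.etree.ElementTree as ET

# Toy document using the element names from the slide. The whitespace-
# separated text in <values> and <data> is an assumption for this sketch,
# not the XDF specification.
DOC = """<XDF>
  <structure>
    <array>
      <axis name="X-axis"><values>0 1 2</values></axis>
      <axis name="Y-axis"><values>10 20</values></axis>
      <data>1 2 3 4 5 6</data>
    </array>
  </structure>
</XDF>"""


def read_arrays(text):
    """Return one dict per <array>: axis values plus data split into rows."""
    arrays = []
    for array in ET.fromstring(text).iter("array"):
        axes = {
            axis.get("name"): [float(v) for v in axis.findtext("values").split()]
            for axis in array.findall("axis")
        }
        flat = [float(v) for v in array.findtext("data").split()]
        # Assumption: the first axis varies fastest. The slide says the
        # actual ordering is described inside <read>.
        width = len(next(iter(axes.values())))
        rows = [flat[i:i + width] for i in range(0, len(flat), width)]
        arrays.append({"axes": axes, "rows": rows})
    return arrays
```

Calling `read_arrays(DOC)` yields a single array with two named axes and the six data values grouped into two rows of three.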

Advantages of XML based translators

1. Universal acceptance
2. Separation of information and presentation
3. Automatic validation
4. File inclusion (Internal and External Entities)
5. Hierarchical
6. Parsers
7. Stylesheet languages
8. Field specific languages
9. Extensible namespace

Earth Science Markup Language (ESML)

ESML is currently developed at the University of Alabama, Huntsville under a NASA grant.

• Specialized markup language for Earth Science metadata based on XML
• Machine readable and interpretable
• Representation of the structure and content of any data file, regardless of data format
• Human readable
• External metadata files that can be generated by either data producer or consumer (at collection, data set or granule level)
• Supports data/service interoperability

ESML

• Users can describe and publish files using ESML

• Users can describe ASCII and Binary data

• ESML will facilitate data discovery

• Metadata can be indexed and searched by web search engines

• Allows users to utilize internet search engines to locate data

• Web site: http://esml.itsc.uah.edu
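The idea of an external description file that tells software how to read a binary data file can be sketched as follows. This is only an illustration of the concept: the `description` and `field` elements, their attributes, and the type names are invented here and are not the ESML schema.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description file; these element and attribute names are
# invented for illustration and are not the ESML schema.
DESCRIPTION = """<description byteorder="little">
  <field name="station" type="int32"/>
  <field name="latitude" type="float32"/>
  <field name="longitude" type="float32"/>
</description>"""

# Map the illustrative type names onto struct format codes.
STRUCT_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}


def build_reader(description_xml):
    """Turn the XML description into field names and a struct.Struct."""
    root = ET.fromstring(description_xml)
    prefix = "<" if root.get("byteorder") == "little" else ">"
    fields = root.findall("field")
    names = [f.get("name") for f in fields]
    codes = "".join(STRUCT_CODES[f.get("type")] for f in fields)
    return names, struct.Struct(prefix + codes)


def read_records(data, description_xml):
    """Decode fixed-size binary records into dictionaries."""
    names, layout = build_reader(description_xml)
    return [dict(zip(names, values)) for values in layout.iter_unpack(data)]
```

Because the description lives outside the data file, either the producer or a consumer could write it, and the same reading code works for any file that has a description.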

ODL – XML Translator

• A standalone Java program

• Extracts ODL metadata from HDF file

• Displays metadata using style sheet

• This program will be useful for building a metadata catalog system in XML
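The translator described above is a Java program; the following is only a Python sketch of the translation step, assuming the ODL text has already been extracted from the HDF file. The sample values and the output element names (`odl`, `group`, `object`, `attribute`) are hypothetical, and multi-line or parenthesized ODL values are not handled.

```python
import xml.etree.ElementTree as ET

# A small fragment in the GROUP/OBJECT style of ODL metadata; the values
# are made up for this example.
ODL = '''GROUP = INVENTORYMETADATA
  OBJECT = SHORTNAME
    NUM_VAL = 1
    VALUE = "MOD021KM"
  END_OBJECT = SHORTNAME
END_GROUP = INVENTORYMETADATA
END'''


def odl_to_xml(text):
    """Convert nested GROUP/OBJECT blocks into nested XML elements."""
    root = ET.Element("odl")
    stack = [root]  # innermost open block is stack[-1]
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "END":
            continue
        key, _, value = (part.strip() for part in line.partition("="))
        value = value.strip('"')
        if key in ("GROUP", "OBJECT"):
            stack.append(ET.SubElement(stack[-1], key.lower(), name=value))
        elif key in ("END_GROUP", "END_OBJECT"):
            stack.pop()
        else:
            ET.SubElement(stack[-1], "attribute", name=key).text = value
    return root
```

`ET.tostring(odl_to_xml(ODL), encoding="unicode")` then gives XML text that a style sheet or catalog system could consume.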

HDF-EOS Metadata

Each HDF file contains three metadata elements: inventory, archive and structural.

HDF-EOS has three file types or objects. Each file type will contain all three metadata elements:
• HDF-EOS Grid
• HDF-EOS Point
• HDF-EOS Swath

[Diagram: the HDF-EOS Grid, Point and Swath file types shown with XML]

Metadata Tools & Systems – XML

• The Earth Science Markup Language (ESML, University of Alabama-Huntsville)
• The DIstributed MEtadata System (DIMES, George Mason University)
• The aggregation data catalog that is part of the Distributed Oceanographic Data System (DODS, University of Rhode Island)
• General Digital Library Interchange Protocol (GDLIP, Alexandria Digital Library)
• Digital Library for Earth System Education (DLESE)
• Web Mapping Testbed (OGC, Digital Earth)
• Global Change Master Directory (GCMD, NASA)

Tools and Systems

• VISAD infrastructure from SSEC: http://www.ssec.wisc.edu/~billh/visad.html
• Live Access Server (PMEL): http://www.ferret.noaa.gov/nopp/main.pl?
• WXWise applets, University of Wisconsin-Madison: http://itg1.meteor.wisc.edu/wxwise/
• The Virtual Exploratorium: http://www.unidata.ucar.edu/workshops/ShapingFuture/Presentations/Mohan_files/frame.htm
• EDMI (Earth Data Multimedia Instrument, Bruce Caron, New Media Studio)
• WorldWatcher from Northwestern University and University of Northern Colorado: http://www.worldwatcher.nwu.edu/

Unified Access to Metadata

[Diagram: several User/system boxes and several Meta/Data System boxes connected through an XML layer (database, access tool) and a conceptual/physical layer, with the side label: Various Schemas (describing various "types" of metadata)]
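One way to read the diagram is that each metadata system keeps its own schema while an XML layer presents a shared view to users. The sketch below illustrates that reading in Python; the two catalogs, their field names, the `MAPPINGS` table and the `dataset` element are all hypothetical and do not reflect any real system.

```python
import xml.etree.ElementTree as ET

# Two hypothetical catalogs that describe the same kind of thing with
# different field names; neither reflects a real system's schema.
CATALOG_A = [{"Entry_Title": "Sea Surface Temperature", "Data_Center": "A"}]
CATALOG_B = [{"title": "Global Land Cover", "provider": "B"}]

# Per-source mapping from local field names to the shared vocabulary.
MAPPINGS = {
    "A": {"Entry_Title": "title", "Data_Center": "source"},
    "B": {"title": "title", "provider": "source"},
}


def to_common_xml(source, records):
    """Wrap records from one source in the shared <dataset> structure."""
    mapping = MAPPINGS[source]
    for record in records:
        dataset = ET.Element("dataset")
        for local_name, value in record.items():
            ET.SubElement(dataset, mapping[local_name]).text = value
        yield dataset


def search(term):
    """Query every catalog through the common layer (case-insensitive)."""
    datasets = list(to_common_xml("A", CATALOG_A)) + list(to_common_xml("B", CATALOG_B))
    return [
        d.findtext("title")
        for d in datasets
        if term.lower() in d.findtext("title").lower()
    ]
```

The user-facing `search` function never sees the source schemas; adding a new metadata system means adding one mapping entry.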

New Technologies: The Semantic Web

• Multiple metadata objects (RDF documents) linked together

• Ontologies
  – Taxonomies
  – Inference rules

• Promise: agents can synthesize information from multiple documents

• Like a world-wide ORDBMS

T. Berners-Lee et al, Scientific American, May 2001
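The combination of linked metadata, a taxonomy and inference rules can be illustrated with a very small Python sketch. Plain tuples stand in for RDF triples, and the vocabulary (`subClassOf`, `measures`, the class names, `granule42`) is hypothetical; a real system would use an RDF library and a published ontology.

```python
# Hypothetical vocabulary; plain tuples stand in for RDF triples here.
TRIPLES = {
    ("SeaSurfaceTemperature", "subClassOf", "OceanParameter"),
    ("OceanParameter", "subClassOf", "EarthScienceParameter"),
    ("granule42", "measures", "SeaSurfaceTemperature"),
}


def infer(triples):
    """Apply two rules until no new triples appear (forward chaining).

    Rule 1: subClassOf is transitive.
    Rule 2: if X measures A and A subClassOf B, then X measures B.
    """
    known = set(triples)
    while True:
        new = set()
        for (a, p, b) in known:
            if p != "subClassOf":
                continue
            for (x, q, y) in known:
                if q == "subClassOf" and y == a:
                    new.add((x, "subClassOf", b))
                if q == "measures" and y == a:
                    new.add((x, "measures", b))
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new
```

After inference, a query for everything that measures an `EarthScienceParameter` finds `granule42` even though no document states that fact directly.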

Semantic Web

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web

In the Semantic Web we will need:
• Machines talking to machines – semantics need to be unambiguously declared
• Joined-up data – enabling complex tasks based on information from various sources
• Wide scope – from, say, home to government to commerce
• Trust – both in data and who is saying it
• This is not going to be easily achieved

Conclusion

• XML usage has increased in scientific data applications

• Usage is not common across the systems

• Web Services and Data Services

• Semantic web for scientific applications is in its infancy