Upload
joylyn
View
47
Download
0
Embed Size (px)
DESCRIPTION
Describing data through metadata and XML: the CSML experience. Bryan Lawrence 1. Andrew Woolf 2 , Roy Lowry 3 , Kerstin Kleese van Dam 2 , Ray Cramer 3 , Marta Gutierrez 1 , Siva Kondapalli 3 , Susan Latham 1 , Dominic Lowe 1 , Kevin O’Neill 2 , Ag Stephens 1 … and Michael Burek 4 - PowerPoint PPT Presentation
Citation preview
BADC, BODC, CCLRC, PML and SOC
Describing data through metadata and XML: the CSML
experience.
Describing data through metadata and XML: the CSML
experience.
+ ++ + +[ ]=
Bryan Lawrence1
Andrew Woolf2, Roy Lowry3, Kerstin Kleese van Dam2, Ray Cramer3, Marta Gutierrez1, Siva Kondapalli3, Susan Latham1, Dominic Lowe1,
Kevin O’Neill2, Ag Stephens1
… and Michael Burek4
1British Atmospheric Data Centre2CCLRC e-Science Centre
3British Oceanographic Data Centre4National Center for Atmospheric Research
WMO Metadata, Sep, 2005
Outline
• Motivation– The NERC Datagrid– Metadata Taxonomy
• Standards– OGC Vision– Feature Types
• CSML– Description– Prototyping in MarineXML– Round-Tripping
• The importance of CF
• Feature Type Issues• Next Steps• NDG Discovery Demo
WMO Metadata, Sep, 2005
http://ndg.nerc.ac.uk
British Atmospheric Data Centre
British Oceanographic Data Centre
Complexity + Volume + Remote Access = Grid Challenge
NCAR
WMO Metadata, Sep, 2005
Wider InternetNERC Grid
taperobot
XML data-base
XML data-base
BADC NDG Wrapper
OnlineData
OnlineData
BODC NDGWrapper
OnlineData
XML data-base
Group NDGWrapper
Software Agent
Grid User
Satellite Supercomputer
Research Group DataSources
Internet Link
Internet User
Internet LinkESG (&other)Applications
Wider Internet
NDGWeb
Portal
XML data-base
WMO Metadata, Sep, 2005
NDG Metadata Taxonomy
… not one schema, not one solution!
CSMLNCML+CF
MOLES THREDDS
DIF -> ISO19115
CLADDIER
WMO Metadata, Sep, 2005
NDG
Things I’m not going to talk about:
• NDG PKI based secure access to datasets– Supporting authentication/authorisation across differing virtual
organisations without centralised user or role databases.– Built on bilateral trust relationships
• NDG Discovery– Portal already provides access to hundreds of datasets with
international scope– Open Archives Initiative Harvesting– Web Service Interfaces
• NDG Browse– New method of “walking” around datasets.
WMO Metadata, Sep, 2005
OGC Web Feature Service
GML WFSServer
Community-specific GML application language– TigerGML, LandGML, O&M, XMML, CGI-GML, ADX,
GPML, CSML, MarineXML etc
public private boundary
Data-source organised for custodian’s requirements
WFSClient
HTML
Slide adapted from Simon Cox (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
Standard transfer format allows multiple data sources
WFSClient
WFSServer
WFSServer
B
WFSServer
C
Slide adapted from Simon Cox (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
e.g.: ERA40 re-analysis surface air temperature, 2001-04-27– deegree open-source WMS modified with netCDF connector
Overlaid with rainfall from globe.digitalearth.gov WMS server
NetCDF + WMS
NB: Now using Mapserver for Interoperability experiments
WMO Metadata, Sep, 2005
Standards
• ISO 19101: Geographic information – Reference model
A geospatial dataset…
…consists of features and related objects…
…in a defined logical
structure…
…delivered through
services…
…and described by metadata.
WMO Metadata, Sep, 2005
Standards• Geographic ‘features’
– “abstraction of real world phenomena” [ISO 19101]
– Type or instance– Encapsulate important
semantics in universe of discourse
• Application schema– Defines semantic content
and logical structure of datasets
– ISO standards provide toolkit:
• spatial/temporal referencing
• geometry (1-, 2-, 3-D)• topology• dictionaries (phenomena,
units, etc.)– GML – canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]
WMO Metadata, Sep, 2005
Standards
• Emerging ISO standards– TC211 – around 40 standards for geographic
information– Cover activity spectrum: discovery access use– Provide a framework for data integration
Universe of discourse
...
startTime[1]bottomTime[1]salinity[*]temperature[*]depth[*]
CTDProfile
...
...
Feature types
ISO 19103
<?xml version="1.0" encoding="UTF-8"?> <schema targetNamespace="http://ndg.nerc.ac.uk/csml" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:csml="http://ndg.nerc.ac.uk/csml" xmlns:om="http://www.opengis.net/om" xmlns:gml="http://www.opengis.net/gml" elementFormDefault="qualified" attributeFormDefault="unqualified" version="0.1"> <annotation> <documentation>CSML application schema</documentation> </annotation> <!--====================================================================== --> <import namespace="http://www.opengis.net/gml" schemaLocation="GML-3.1.0/base/gml.xsd"/> <import namespace="http://www.opengis.net/om" schemaLocation="phenomenon.xsd"/> <!--====================================================================== --> <!--===== Root element for CSML dataset =====--> <!--====================================================================== --> <complexType name="DatasetType"> <complexContent> <extension base="gml:AbstractGMLType"> <sequence> <element ref="csml:UnitDefinitions" minOccurs="0" maxOccurs="unbounded"/> <element ref="csml:ReferenceSystemDefinitions" minOccurs="0" maxOccurs="unbounded"/> <element ref="csml:PhenomenonDefinitions" minOccurs="0"/> <element ref="csml:_ArrayDescriptor" minOccurs="0" maxOccurs="unbounded"/> <element ref="gml:FeatureCollection" minOccurs="0" maxOccurs="unbounded"/> </sequence> </extension> </complexContent> </complexType> <element name="Dataset" type="csml:DatasetType"/> <!--====================================================================== --> <!--===== Dictionary/definition elements =====--> <!--====================================================================== --> <complexType name="ReferenceSystemDefinitionsType"> <complexContent> <extension base="gml:DictionaryType"/> </complexContent> </complexType> <element name="ReferenceSystemDefinitions" type="csml:ReferenceSystemDefinitionsType"/> <complexType name="ReferenceSystemDefinitionsPropertyType"> <sequence> <element ref="csml:ReferenceSystemDefinitions" minOccurs="0"/> </sequence> <attributeGroup ref="gml:AssociationAttributeGroup"/> </complexType>
Application schema
ISO 19109
FTC
ISO 19110
WMO Metadata, Sep, 2005
Standards
• The importance of governance– Information community defined by shared semantics– Need community process to manage those semantics
(definitions, models, vocabularies, taxonomies, etc.)– e.g. CF conventions for netCDF files– Role of Feature Type Catalogues [ISO 19110] and registers [ISO
19135]
• Governance as driver for granularity– Remit / interest determines appropriate granularity– ref. IOC, IHO, WMO
abstract generic highly specialised
feature types spectrum
<temperatureProfile/><measurement type=“Radiosonde” measurand=“temperature”/> <Sonde parameter=“temperature”/>
WMO Metadata, Sep, 2005
NDG-A: Climate Science Modelling Language
• Aims:– provide semantic integration mechanism for NDG data– explore new standards-based interoperability framework– emphasise content, not container
• Design principles:– offload semantics onto parameter type (‘phenomenon’,
observable, measurand)• e.g. wind-profiler, balloon temperature sounding
– offload semantics onto CRS• e.g. scanning radar, sounding radar
– ‘sensible plotting’ as discriminant• ‘in-principle’ unsupervised portrayal
– explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit)
WMO Metadata, Sep, 2005
Climate ScienceModelling Language
• CSML feature types– defined on basis of geometric and topologic
structureCSML feature
type Description Examples
TrajectoryFeature
Discrete path in time and space of a platform or instrument.
ship’s cruise track, aircraft’s flight path
PointFeature Single point measurement. raingauge measurement
ProfileFeature Single ‘profile’ of some parameter along a directed line in space.
wind sounding, XBT, CTD, radiosonde
GridFeature Single time-snapshot of a gridded field. gridded analysis fieldPointSeriesFeature Series of single datum measurements. tidegauge, rainfall
timeseries
ProfileSeriesFeature Series of profile-type measurements.
vertical or scanning radar, shipborne ADCP, thermistor chain timeseries
GridSeriesFeature Timeseries of gridded parameter fields.
numerical weather prediction model, ocean general circulation model
WMO Metadata, Sep, 2005
Climate Science Modelling Language
• CSML feature types– examples...
ProfileSeriesFeature
ProfileFeature
GridFeature
WMO Metadata, Sep, 2005
Climate Science Modelling Language
• Application schema– logical structure and semantic content of NDG
‘Dataset’– Based on GML 3.1
«Type»GML::AbstractGMLType
«Type»Dataset
«Type»UnitDefinitions
«Type»ReferenceSystemDefinitions
«Type»PhenomenonDefinitions
«Type»AbstractArrayDescriptor
«Type»GML::FeatureCollection
**
*
*
*
*
*
*
*
*
WMO Metadata, Sep, 2005
Climate Science Modelling Language
• Numerical array descriptors– provides ‘wrapper’
architecture for legacy data files
– ‘Connected’ to data model numerical content through ‘xlink:href’
• Three subtypes:– InlineArray– ArrayGenerator– FileExtract (NASAAmes,
NetCDF, GRIB)
• Composite design pattern for aggregation
+arraySize[1]+uom[0..1]+numericType[0..1]+numericTransform[0..1]+regExpTransform[0..1]
«Type»AbstractArrayDescriptor
+aggType[1]+aggIndex[1]
«Type»AggregatedArray
1
+component
*
+values[*]
«Type»InlineArray
+expression[1]
«Type»ArrayGenerator
+fileName[1]
«Type»AbstractFileExtract
+variableName[1]+index[0..1]
«Type»NASAAmesExtract
+variableName[1]
«Type»NetCDFExtract
+parameterCode[1]+recordNumber[0..1]+fileOffset[0..1]
«Type»GRIBExtract
+id+metaDataProperty+description+name
«Type»GML::AbstractGMLType
WMO Metadata, Sep, 2005
Climate Science Modelling Language
• Inline array
• Array generator
<NDGInlineArray><arraySize>5 2</arraySize><uom>udunits.xml#degreeC</uom><numericType>float</numericType><regExpTransform>s/10/9/ge</regExpTransform><numericTransform>+5</numericTransform><values>1 2 3 4 5 6 7 8 9 10</values>
</NDGInlineArray>
<NDGArrayGenerator><arraySize>10001</arraySize><uom>udunits.xml#minute</uom><numericType>float</numericType><expression>0:5:50000</expression>
</NDGArrayGenerator>
WMO Metadata, Sep, 2005
Climate Science Modelling Language
File extract<NDGNASAAmesExtract>
<arraySize>526</arraySize><numericType>double</numericType><fileName>/data/BADC/macehead/mh960606.cf1</fileName><variableName>CFC-12</variableName>
</NDGNASAAmesExtract>
<NDGNetCDFExtract gml:id="feat04azimuth"><arraySize>10000</arraySize><fileName>radar_data.nc</fileName><variableName>az</variableName>
</NDGNetCDFExtract>
<NDGGRIBExtract><arraySize>320 160</arraySize><numericType>double</numericType><fileName>/e40/ggas1992010100rsn.grb</fileName><parameterCode>203</parameterCode><recordNumber>5</ recordNumber><fileOffset>289412</fileOffset>
</NDGGRIBExtract>
WMO Metadata, Sep, 2005
CSML Observations
<gml:dictionaryEntry><om:Phenomenon gml:id="atmosphere_cloud_condensed_water_content"><gml:description>"condensed_water" means liquid and ice. "Content" indicates a quantity per unit area. The "atmosphere content" of a quantity refers to the vertical integral from the surface to the top of the atmosphere. For the content between specified levels in the atmosphere, standard names including content_of_atmosphere_layer are used.</gml:description><gml:name codeSpace="http://www.cgd.ucar.edu/cms/eaton/cf-metadata/">atmosphere_cloud_condensed_water_content</gml:name><gml:name codeSpace="GRIB">76</gml:name><gml:name codeSpace="PCMDI">clwvi</gml:name></om:Phenomenon></gml:dictionaryEntry>
WMO Metadata, Sep, 2005
XM
L P
arser
SeeMyDENC
Data Dictionary
S52 Portrayal Library
SENC
MarineGML
(NDG) Feature
Types
XML
XML
XML
Biological Species
Chl-a from Satellite
ModelledHydrodynamics
XSLT
XSLT
XSLT
For each XSD (for the source data) there is an
XSLT to translate the data to the Feature
Types (FT) defined by CSML. The FT’s and
XSLT are maintained in a ‘MarineXML registry’ The FTs can then
be translated to equivalent FTs for
display in the ECDIS system
XSLT
Features in the source XSD must be present in
the data dictionary.
XSD
XSD
XSD
XML
XML
The result of the translation is an encoding that contains the
marine data in weakly typed (i.e. generic) Features
XSLT
XSLT
Phenomena in the XSD must have an associated
portrayal
ECDIS acts as an example client for
the data.
Data from different parts of the marine
community conforming to a variety of schema
(XSD)
MeasuredHydrodynamics
S-57v3 GML
XML
XSD
XML
XSD
Feature described using S-57v3.1Application
Schema can be imported and are equivalent to the same features in CSML’
Slide adapted from Kieran Millard (AUKEGGS, 2005)
MarineXML Testbed
WMO Metadata, Sep, 2005
Biological sampling station with attributes for the species sampled at each
Grid of Chl-a from the MERIS instrument on ENVISAT
Predicted and measured wave climate timeseries (height, direction and period)
Vectors of currents from instruments
MarineXML Testbed
Slide adapted from Kieran Millard (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
The Concept of re-using Features
Here structured XML is converted to plain ascii text in the form required for a numerical model
HTML warning service pages are generated ‘on the fly’XML can also be converted to SVG to display data graphically
Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.
All this requires agreement on standards
Slide adapted from Kieran Millard (AUKEGGS, 2005)
WMO Metadata, Sep, 2005
Climate Science Modelling Language
• Status:– Initial feature types defined– First draft application schema complete– Trial software tooling being coded (parser, netCDF
instantiation)– Initial deployment trial across BODC, BADC datasets
• Future:– Separate out wrapper implementation (array
descriptors)– Disallow ‘internal’ dictionaries– More strongly-typed features?
• Complex feature (vector measurements)• Implicit Ensemble Support• Swathes
– Follow (and pursue!) GML evolution, enhance compliance
– Expand tooling
WMO Metadata, Sep, 2005
CSML Round Tripping - 1
Managing semantics
UGAS
GML app schema
XML
<gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList>
GML dataset
instance
Class1
Class2
-End1
1
-End2
*
«datatype»DataType1
conceptual model
Conforms to
101010
New Dataset
Application
produces
parser
Under Development
WMO Metadata, Sep, 2005
CSML Round Tripping - 2
Managing data - 1
parser
Under Development
<gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList>
GML dataset
scanner
Under Development
GML app schema
XML
instance
101010
CF Dataset
Application
producesCF
WMO Metadata, Sep, 2005
Climate Forecasting Conventions
1. Data should be self-describing. No external tables are needed to interpret the file. For instance, CF encodings do not depend upon numeric codes (by contrast with GRIB).
2. The convention should be easy to use, both for data-writers and users of data.
3. The metadata and the semantic meaning encoding though the metadata should be readable by humans as well as easily utilized by programs.
4. Redundancy should be minimised as much as possible (because it reduces the chance of errors of inconsistency when writing data)
For using NetCDF to store CF data:
WMO Metadata, Sep, 2005
CF
CF consists of:• Vocabulary management• Semantic concepts (axes, cells etc), • and format specific conventions (NetCDF
now)
CF is at the heart of • IPCC data comparison• Academic earth system science data
exploitation (and archival).
WMO Metadata, Sep, 2005
CF
CF• Exploits 100’s of man years of effort on
NetCDF evolution and tools• Is one of the means by which we can take
NetCDF data and make meaningful feature types.
• Application schema will depend on CF (a la the registry relationships).
CF• Needs WMO blessing (to validate the effort of
those who work on it as a standards activity).
WMO Metadata, Sep, 2005
Managing Data 2
101010
CF Dataset
<gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList>
GML dataset
scanner
XSLT
ISO19115
XMLPUBLISH
DECISIONPROCESSES
101010
CF Dataset
Define Dataset
Add Information
WMO Metadata, Sep, 2005
Metadata extensions and profiles
ISO19115 ConceptShould apply to all “metadata”
WMO Metadata, Sep, 2005
D – ISO19115
• Most important decision to make is:– “What is a dataset”
• Governance Issue• Granularity Issue
• Not to be confused with application schema
• Can and should exploit external vocabularies
• Should be extended!
WMO Metadata, Sep, 2005
NumSim.xsd – extension to D
WMO Metadata, Sep, 2005
Registry Relationships
Feature Types, need
• relationships
• operations GML
Managed Registries Crucial to Interoperabilty
WMO Metadata, Sep, 2005
Catalogues or Registries?
The terms catalogue and registry are often used interchangeably, but in OGC speak:
“a registry is a catalogue that exemplifies a formal registration process (e.g., such as those described in ISO 19135 or ISO 11179-6).”
A registry is typically maintained by an authorized registration authority who assumes responsibility for complying with a set of policies and procedures for accessing and managing metadata items within some application domain.
WMO Metadata, Sep, 2005
Exploit Theory of Measurement
Work with Simon Cox on relationship between features, coverages, complex (multiple) measurements and CSML. Get “metocean” concepts in future version of GML!
WMO Metadata, Sep, 2005
Where next with feature types?
• Feature Types (ISO 19110) are the primary language component for a spatial community to define real world phenomena.
• Although ISO19136 provides a route for the encoding of Feature Types in XML, it does not provide a route-map for defining the features themselves.
• The level of granularity for Feature Types has to be based on the governance model. If no-one is prepared to put in place a process to define semantics, interoperability has reached its limits.
• … Marine: MOTIIVE, WMO: ????
WMO Metadata, Sep, 2005
Summary Points• Need clear distinction in mind between inter- and intra-operability
– When to use GML etc, and when not to!• Need clear governance principles at a number of levels for
– Dataset definitions (data provider, e.g. BADC)– Feature-type definitions (community, e.g. WMO)– Interoperability (frameworks, e.g. OGC)
• CSML – Provides some geometric features based on portrayal– Can and will be extended– Provides generalisation of specific examples, e.g. METARS
• WMO should:– “stand-up” a reusable feature-type catalogue
(but how? Work underway in MOTIIVE etc)– Support CF development– Work on dictionaries and thesauri and publish them.– Exploit work in the academic community
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
Choose to return either data or “B-”Metadata
Look at DIFs in either HTML or XML
Can order responses by Title, Data Centre or Temporal coverage (default random)
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
WMO Metadata, Sep, 2005
Background activity being parallelised with GODIVA/CCLRC e-science collaboration (spectral -> gridpoint + CDMS + visualisation tools)
Download either plot or the data that went into the plot.
WMO Metadata, Sep, 2005
ERA40:
•All driven from one CDML file, 9 TB online spherical harmonics, looking like 40 TB “virtual” gridded!