BADC, BODC, CCLRC, PML and SOC
The NERC Metadata Gateway: a product of the NERC DataGrid
Bryan Lawrence
(on behalf of a big team)
TECO-WIS, Nov 2006
Outline
• Introduction to NERC, the NERC Data Centres, and NCAS
• The NERC DataGrid Project
  – Key Components:
    • Data Tools, Data Discovery, {Access Control}
  – NDG Information Environment
    • Key Standards Structures: the ISO Family
    • From CSML, {MOLES}, DIF to ISO19139 (NumSim)
• Distributed Content Search
  – Why we did it this way
  – Our Discovery Architecture
• NDG Discovery
  – Now … and
  – The Future: the “New NERC Metadata Gateway”
• ISO19139 Best Practice
• Summary
Some Introductions
• NERC: The Natural Environment Research Council
  – The major player in UK environmental research
  – Is both a funding agency and a conglomeration of “centres”:
    • internal “research” institutes (the British Oceanographic Data Centre, BODC, is part of one of these), and
    • external “collaborative” centres, which include:
      – The Plymouth Marine Laboratory
      – The National Oceanographic Centre, Southampton
      – The National Centre for Atmospheric Science (NCAS), mostly embedded in universities, but part of which is the British Atmospheric Data Centre (BADC), which is embedded in the CCLRC
• CCLRC: Council for the Central Laboratories of the Research Councils
  – Is about to be replaced by a new entity, which might be called the “Large Facilities Research Council”
• NERC has seven discipline-based designated data centres (including the BODC and BADC), and requires as much integration of data access as possible:
  – from discovery to utilisation, from genomics to ecology, from oceanography to atmospheric science, from Antarctic science to British geology …
Complexity + Volume + Remote Access = Grid Challenge
http://ndg.nerc.ac.uk
[Figure: NDG overview, with labels for the British Atmospheric Data Centre, the British Oceanographic Data Centre and NCAR]
If it’s not obvious
• Lots of organisations
  – Varying membership, and trust within and between them is not consistent.
• Lots of priorities
  – Not all organisations are “about” data
• Different internal storage structures
  – Data stored in a variety of databases and filesystems
  – Some things well documented, but not automated
  – Some things automated, but information content is sparse …
• Integrating data access is non-trivial
And none of that includes the important relationships with customers and collaborators!
Key Components
Discovery Tools
• Discovery Portal
  – Metadata Search
  – Direct Links to Data and Services
Data Tools
• Slice and Dice
• Visualisation
• Manipulation
Access Control
• Systems are resource limited
• Data access may be restricted by licence
Metadata Structures to support all the above
Standards Landscape
Or two:
• ISO TC211 Standards, e.g.
  – ISO 19101: Geographic information – Reference model
  – ISO 19103: Geographic information – Conceptual schema language
  – ISO 19107: Geographic information – Spatial schema
  – ISO 19108: Geographic information – Temporal schema
  – ISO 19109: Geographic information – Rules for application schema
  – ISO 19111: Geographic information – Spatial referencing by coordinates
  – ISO 19115: Geographic information – Metadata
• Open Geospatial Consortium Specs
  – Geography Markup Language (GML), a toolkit for building data descriptions
  – WMS, WCS, WFS, WPS: the Web (Map, Coverage, Feature, and Processing) services
Standards
• ISO 19101: Geographic information – Reference model
A geospatial dataset…
…consists of features and related objects…
…in a defined logical structure…
…delivered through services…
…and described by metadata.
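Here is a minimal Python sketch that illustrates this reference model. The class and function names (`Feature`, `GeospatialDataset`, `get_features`) are hypothetical and the model is deliberately simplified. It encodes only the reading given above, not ISO 19101 itself: features in a named logical structure, described by metadata, and handed out by a service.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A named abstraction of a real-world phenomenon (hypothetical, minimal)."""
    feature_type: str                      # e.g. "PointFeature"
    properties: dict = field(default_factory=dict)


@dataclass
class GeospatialDataset:
    """Features and related objects in a defined logical structure,
    described by metadata (the ISO 19101 reading of a dataset)."""
    application_schema: str                # names the logical structure
    features: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def add(self, feature: Feature) -> None:
        self.features.append(feature)

    def feature_types(self) -> set:
        """The distinct feature types present in the dataset."""
        return {f.feature_type for f in self.features}


def get_features(dataset: GeospatialDataset, feature_type: str) -> list:
    """Stand-in for 'delivered through services': return features of one type."""
    return [f for f in dataset.features if f.feature_type == feature_type]


# Usage: a point feature borrowing the ICES_100 identifier and location used in this talk.
ds = GeospatialDataset("CSML", metadata={"title": "Example dataset"})
ds.add(Feature("PointFeature", {"id": "ICES_100", "location": (55.25, 6.5)}))
ds.add(Feature("ProfileFeature", {"id": "profile-1"}))
```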
Data Description Standards
• Geographic ‘features’
  – “abstraction of real world phenomena” [ISO 19101]
  – Type or instance
  – Encapsulate important semantics in universe of discourse
  – “Something you can name”
• Application schema
  – Defines semantic content and logical structure
  – ISO standards provide toolkit:
    • spatial/temporal referencing
    • geometry (1-, 2-, 3-D)
    • topology
    • dictionaries (phenomena, units, etc.)
  – GML: canonical encoding
[from ISO 19109 “Geographic information – Rules for Application Schema”]
CSML: Climate Science Modelling Language
• Fully featured GML Application Schema, with extensions for
  – External binary data (GRIB, netCDF etc.)
  – Irregular grids and “proper” vertical coordinate systems (both activities now on OGC and ISO standards tracks)
• V1.0 included seven feature types and provided only “data” modelling.
• V1.0 CSML tooling includes a scanner, which creates CSML from netCDF files, and a parser, which instantiates Python objects (based on the XML CSML documents) that can be manipulated scientifically.
MarineXML Testbed
[Figure: data from Biological Species, Chl-a from Satellite, Modelled Hydrodynamics, Measured Hydrodynamics and S-57v3 GML sources, each with its own XSD, translated by XSLT into MarineGML / (NDG) Feature Types, then via an XML parser, data dictionary, S52 portrayal library and SENC to SeeMyDENC]
• Data from different parts of the marine community conform to a variety of schemas (XSD).
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FTs) defined by CSML. The FTs and XSLTs are maintained in a ‘MarineXML registry’. The FTs can then be translated to equivalent FTs for display in the ECDIS system.
• Features in the source XSD must be present in the data dictionary.
• Phenomena in the XSD must have an associated portrayal.
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features.
• Features described using the S-57v3.1 Application Schema can be imported and are equivalent to the same features in CSML.
• ECDIS acts as an example client for the data.
Slide adapted from Kieran Millard (AUKEGGS, 2005)
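The registry idea can be sketched in a few lines of Python. This is a stand-in that rests on several assumptions. The testbed used XSLT stylesheets; plain functions replace them here. The schema names, record fields and portrayal symbols are all invented for illustration. The sketch shows only the shape of the process:

- one translation per source schema,
- a common generic feature as the output,
- the data dictionary and portrayal checks applied to the result.

```python
# Hypothetical registry contents: the schema names, record fields and symbol
# names are invented for illustration. The testbed used XSLT for each
# translation; plain Python functions stand in for those stylesheets here.
DATA_DICTIONARY = {"PointFeature"}                      # known feature types
PORTRAYAL = {"chlorophyll_a": "chl_symbol",             # phenomenon -> portrayal
             "sea_water_temperature": "temp_symbol"}


def from_satellite_chl(rec: dict) -> dict:
    """Source schema A: satellite chlorophyll-a retrievals."""
    return {"type": "PointFeature", "phenomenon": "chlorophyll_a",
            "location": (rec["lat"], rec["lon"]), "value": rec["chl"]}


def from_ctd(rec: dict) -> dict:
    """Source schema B: measured hydrodynamics with differently named fields."""
    return {"type": "PointFeature", "phenomenon": "sea_water_temperature",
            "location": (rec["latitude"], rec["longitude"]), "value": rec["temp"]}


REGISTRY = {"satellite_chl.xsd": from_satellite_chl, "ctd.xsd": from_ctd}


def translate(schema: str, record: dict) -> dict:
    """Translate one source record into a generic (weakly typed) feature."""
    feature = REGISTRY[schema](record)
    if feature["type"] not in DATA_DICTIONARY:
        raise ValueError(f"feature type {feature['type']!r} not in data dictionary")
    if feature["phenomenon"] not in PORTRAYAL:
        raise ValueError(f"no portrayal for {feature['phenomenon']!r}")
    feature["portrayal"] = PORTRAYAL[feature["phenomenon"]]
    return feature
```

Records from two differently structured sources come out as the same generic `PointFeature`. That is what lets a single client, such as the ECDIS display, consume both.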
The Concept of re-using Features
• Here structured XML is converted to plain ASCII text in the form required by a numerical model.
• HTML warning-service pages are generated ‘on the fly’.
• XML can also be converted to SVG to display data graphically.
• Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.
All this requires agreement on standards
Slide adapted from Kieran Millard (AUKEGGS, 2005)
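Here is a small Python sketch of this kind of re-use, rendering one set of features three ways. Several details are assumptions made for illustration:

- the feature values,
- the fixed-width column layout for the model input,
- the SVG coordinate mapping.

The real conversions on the slide were driven from XML, and SENC output is not attempted here.

```python
from xml.sax.saxutils import escape

# Hypothetical point features, as they might be read out of the shared XML.
features = [
    {"id": "A", "lat": 55.25, "lon": 6.5, "value": 3.0},
    {"id": "B", "lat": 55.75, "lon": 7.0, "value": 1.5},
]


def to_model_ascii(feats) -> str:
    """Fixed-width plain text, one line per feature (column layout is assumed)."""
    return "\n".join(f"{f['lat']:8.3f}{f['lon']:9.3f}{f['value']:10.4f}" for f in feats)


def to_html(feats) -> str:
    """A bare HTML table, as a warning-service page might embed."""
    rows = "".join(f"<tr><td>{escape(f['id'])}</td><td>{f['value']}</td></tr>" for f in feats)
    return f"<table>{rows}</table>"


def to_svg(feats, west: float = 6.0, north: float = 56.0, scale: float = 100.0) -> str:
    """Circles sized by value; longitude maps to x, latitude to y (north at the top)."""
    circles = "".join(
        '<circle cx="{:.1f}" cy="{:.1f}" r="{:.1f}"/>'.format(
            (f["lon"] - west) * scale, (north - f["lat"]) * scale, f["value"])
        for f in feats
    )
    return f'<svg xmlns="http://www.w3.org/2000/svg">{circles}</svg>'
```

The point is the design choice rather than the formats. Every output is a thin function of the same feature list, so adding another target means adding another renderer, not another data pipeline.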
CSML Round Tripping - 1
Managing semantics
[Figure: conceptual model (UML classes Class1, Class2 and «datatype» DataType1) → UGAS → GML application schema; a GML dataset instance that conforms to it; a parser (V1.0, Python, complete); an application that produces a new dataset]
GML dataset instance (fragment as shown on the slide, truncated at gml:tupleList):
<gml:featureMember>
  <NDGPointFeature gml:id="ICES_100">
    <NDGPointDomain>
      <domainReference>
        <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
          <location>55.25 6.5</location>
        </NDGPosition>
      </domainReference>
    </NDGPointDomain>
    <gml:rangeSet>
      <gml:DataBlock>
        <gml:rangeParameters>
          <gml:CompositeValue>
            <gml:valueComponents>
              <gml:measure uom="#tn"/>
              <gml:measure uom="#amount"/>
              <gml:measure uom="#gsm"/>
            </gml:valueComponents>
          </gml:CompositeValue>
        </gml:rangeParameters>
        <gml:tupleList>
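Here is a minimal sketch of the parser's role. It uses Python's standard `xml.etree.ElementTree` to turn the `NDGPointFeature` fragment on this slide into a Python object. It is not the CSML parser and does not reproduce its API. It also rests on several assumptions:

- the `PointFeature` class is hypothetical;
- the fragment is closed off without its truncated `gml:tupleList`;
- the NDG elements are assumed to be unqualified;
- `http://www.opengis.net/gml` is assumed as the GML namespace.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"  # assumed GML 3.x namespace URI

# The slide's fragment, closed off for illustration; the truncated tupleList
# (the actual data values) is omitted and NDG elements are left unqualified.
FRAGMENT = f"""
<gml:featureMember xmlns:gml="{GML}">
  <NDGPointFeature gml:id="ICES_100">
    <NDGPointDomain>
      <domainReference>
        <NDGPosition srsName="urn:EPSG:geographicCRS:4979"
                     axisLabels="Lat Long" uomLabels="degree degree">
          <location>55.25 6.5</location>
        </NDGPosition>
      </domainReference>
    </NDGPointDomain>
    <gml:rangeSet>
      <gml:DataBlock>
        <gml:rangeParameters>
          <gml:CompositeValue>
            <gml:valueComponents>
              <gml:measure uom="#tn"/>
              <gml:measure uom="#amount"/>
              <gml:measure uom="#gsm"/>
            </gml:valueComponents>
          </gml:CompositeValue>
        </gml:rangeParameters>
      </gml:DataBlock>
    </gml:rangeSet>
  </NDGPointFeature>
</gml:featureMember>
"""


class PointFeature:
    """Hypothetical Python object built from one NDGPointFeature element."""

    def __init__(self, elem: ET.Element):
        self.id = elem.get(f"{{{GML}}}id")
        pos = elem.find("NDGPointDomain/domainReference/NDGPosition")
        self.srs = pos.get("srsName")
        # Pair each axis label with its coordinate, e.g. {"Lat": 55.25, "Long": 6.5}
        coords = [float(v) for v in pos.findtext("location").split()]
        self.location = dict(zip(pos.get("axisLabels").split(), coords))
        # Units of the range parameters (the values themselves are not in the fragment)
        self.range_uoms = [m.get("uom") for m in elem.iter(f"{{{GML}}}measure")]


root = ET.fromstring(FRAGMENT.strip())
feature = PointFeature(root.find("NDGPointFeature"))
```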
CSML Round Tripping - 2
Managing data - 1
[Figure: an application produces a CF dataset (binary); the scanner (V1.0; V2 in development) creates from it a GML dataset instance (the same NDGPointFeature fragment as in Round Tripping - 1) conforming to the GML application schema; the parser (V1.0; V2 in development) reads the instance]
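The scanner direction can be sketched in the same way. This is a hypothetical illustration, not the CSML scanner:

- a Python dict stands in for an opened CF-netCDF file, so no netCDF library is needed;
- the element names are only loosely modelled on CSML feature types and are not schema-valid;
- `example.nc` is a placeholder file name.

The `standard_name` and `units` attributes are genuine CF conventions.

```python
import xml.etree.ElementTree as ET

# Stand-in for an opened CF-netCDF file: variable name -> attributes and dimensions.
# (Real tooling would read these with a netCDF library; this dict is hypothetical.)
cf_dataset = {
    "tas": {"standard_name": "air_temperature", "units": "K",
            "dims": ("time", "lat", "lon")},
    "pr": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1",
           "dims": ("time", "lat", "lon")},
    "lat": {"standard_name": "latitude", "units": "degrees_north", "dims": ("lat",)},
}
COORDINATES = {"time", "lat", "lon"}


def scan(dataset: dict, filename: str) -> ET.Element:
    """Emit one simplified feature per data variable, recording its phenomenon,
    units, domain axes and a pointer back to the binary file."""
    root = ET.Element("Dataset")
    for name, attrs in dataset.items():
        if name in COORDINATES:
            continue  # coordinate variables describe the domain, not features
        feat = ET.SubElement(root, "GridSeriesFeature", id=name)
        ET.SubElement(feat, "phenomenon").text = attrs["standard_name"]
        ET.SubElement(feat, "uom").text = attrs["units"]
        ET.SubElement(feat, "domain", axes=" ".join(attrs["dims"]))
        ET.SubElement(feat, "fileReference").text = filename
    return root
```

The feature description stays in XML while the values stay in the original file, referenced by name. That is what makes the round trip back to binary data possible.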
CSML2: Structure “Affords” Behaviour
[UML class diagram: cd ProfileSeriesFeature]
• «FeatureType» ProfileSeriesFeature: + location: GM_Point [0..1]; + time: TM_Instant [0..1]
• Coverage Types::ProfileSeriesCoverage: +/ domainSet: ProfileSeriesDomain; +/ rangeSet: Record [0..*]
• ISO 19123 coverage class Discrete Coverages::CV_DiscreteGridPointCoverage (a CV_DiscreteCoverage):
  + find(DirectPosition*, Integer*) : Sequence<CV_GridPointValuePair>
  + list() : Set<CV_GridPointValuePair>
  + locate(DirectPosition*) : Set<CV_GridPointValuePair>
  + point(CV_GridCoordinate*) : CV_GridPointValuePair
• «ObjectType» phenomenon::Phenomenon (an AnyDefinition)
• «type» FT types::ProfileSeriesType: + location: GM_Point [0..1]; + time: TM_Instant [0..1]; + value: ProfileSeriesCoverage
  + extractPoint() : PointFeature
  + extractPointSeries() : PointSeriesFeature
  + extractProfile() : ProfileFeature
  + extractProfileSeries() : ProfileSeriesFeature
• Association roles: +parameter, +value; «realize»/«implement»
‘Affordance’ modelled with UML <<type>>
Moving beyond GML, but staying in the ISO Frame!
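The idea that the `ProfileSeriesFeature` structure “affords” the four extract operations can be sketched in Python. This is a hypothetical in-memory model, not the CSML2 implementation: it stores a list of profiles indexed by time and depth, and uses snake_case method names. Each extract method returns a simpler feature type, mirroring the operations on `ProfileSeriesType`.

```python
from dataclasses import dataclass


@dataclass
class PointFeature:
    time: str
    depth: float
    value: float


@dataclass
class PointSeriesFeature:
    depth: float
    times: list
    values: list


@dataclass
class ProfileFeature:
    time: str
    depths: list
    values: list


@dataclass
class ProfileSeriesFeature:
    """Profiles at one location over time; values[t][z] holds the value at
    times[t] and depths[z] (a hypothetical in-memory layout)."""
    location: tuple
    times: list
    depths: list
    values: list

    def extract_point(self, time: str, depth: float) -> PointFeature:
        t, z = self.times.index(time), self.depths.index(depth)
        return PointFeature(time, depth, self.values[t][z])

    def extract_point_series(self, depth: float) -> PointSeriesFeature:
        z = self.depths.index(depth)
        return PointSeriesFeature(depth, list(self.times), [row[z] for row in self.values])

    def extract_profile(self, time: str) -> ProfileFeature:
        t = self.times.index(time)
        return ProfileFeature(time, list(self.depths), list(self.values[t]))

    def extract_profile_series(self, times: list) -> "ProfileSeriesFeature":
        rows = [list(self.values[self.times.index(t)]) for t in times]
        return ProfileSeriesFeature(self.location, list(times), list(self.depths), rows)
```

Because the structure is known, each operation is a simple slice along one axis. This is the sense in which the structure “affords” the behaviour.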
CSML2: Related to new OGC Observations and Measurements Spec
[UML class diagram: cd O&M]
• «FeatureType» observation::Observation (an Event): + quality: DQ_DataQuality [0..1]; + responsible: CI_ResponsibleParty [0..1]; + result: Any
• «Union» procedure::Procedure: + procedureUse: ProcedureEvent; + standardProcedure: ProcedureSystem
• «ObjectType» phenomenon::Phenomenon (an AnyDefinition)
• «FeatureType» Feature Types::ProfileSeriesFeature: + location: GM_Point [0..1]; + time: TM_Instant [0..1]
• Coverage Types::ProfileSeriesCoverage (a CV_DiscreteGridPointCoverage): +/ domainSet: ProfileSeriesDomain; +/ rangeSet: Record [0..*]
• Association roles: +procedure (1) / +generatedObservation (0..*); +observedProperty (1) {Definition must be of a phenomenon that is a property of the featureOfInterest}; +featureOfInterest; +result; +parameter; +value
An Observation is an Event whose result is an estimate of the value of some Property of the Feature-of-interest, obtained using a specified Procedure.
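Here is a minimal Python sketch of that definition. It includes the diagram's constraint that the observed property must be a property of the feature of interest. All names are hypothetical simplifications of the O&M classes, and the quality and responsible-party attributes are left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Feature:
    name: str
    properties: set = field(default_factory=set)   # phenomena this feature has


@dataclass
class Procedure:
    description: str


@dataclass
class Observation:
    """An event whose result estimates a property of the feature of interest."""
    feature_of_interest: Feature
    observed_property: str
    procedure: Procedure
    result: Any
    time: str = ""

    def __post_init__(self):
        # Mirrors the UML constraint: the observed phenomenon must be a
        # property of the feature of interest.
        if self.observed_property not in self.feature_of_interest.properties:
            raise ValueError(
                f"{self.observed_property!r} is not a property of "
                f"{self.feature_of_interest.name!r}"
            )
```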
Managing Data 2
[Figure: CF dataset → scanner → GML dataset (NDGPointFeature instance) → XSLT → ISO19115 XML → PUBLISH, alongside DECISION PROCESSES: “Define Dataset” and “Add Information”]
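The XSLT step (GML dataset to ISO19115 XML) is not reproduced here. As a stand-in, this Python sketch derives one piece of discovery metadata from the point locations of scanned features: a geographic bounding box in ISO19139 encoding. The `gmd`/`gco` namespace URIs are the standard ISO19139 ones. The function name is hypothetical, and the simple min/max does not handle boxes that cross the dateline.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)   # so serialised output uses the usual prefixes
ET.register_namespace("gco", GCO)


def bounding_box(points) -> ET.Element:
    """points: iterable of (lat, lon) pairs, e.g. taken from scanned CSML features.
    Returns a gmd:EX_GeographicBoundingBox element (no dateline handling)."""
    points = list(points)
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    bbox = ET.Element(f"{{{GMD}}}EX_GeographicBoundingBox")
    for tag, value in (("westBoundLongitude", min(lons)), ("eastBoundLongitude", max(lons)),
                       ("southBoundLatitude", min(lats)), ("northBoundLatitude", max(lats))):
        child = ET.SubElement(bbox, f"{{{GMD}}}{tag}")
        ET.SubElement(child, f"{{{GCO}}}Decimal").text = str(value)
    return bbox
```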
The Most Important Decision
What is a dataset?
Granularity too coarse: can’t find what you want – not enough information exposed.
Granularity too fine: can’t find what you want – buried in unordered results.
Distributed Query
Options:
• Harvest or crawl
• Distribute the query to known targets versus harvest from known targets and query locally
  – Timeliness versus responsiveness
Decision:
• NDG Discovery based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
  – Additional partners include NCAR, MPI-WDCC, TPAC, UK-MDIP
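OAI-PMH harvesting can be sketched as follows, using incremental `from` dates and `resumptionToken` paging. The `verb`, `metadataPrefix`, `from` and `resumptionToken` parameters are part of OAI-PMH 2.0. The `dif` metadata prefix, the base URL and the injected `fetch` function are assumptions, so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url, metadata_prefix=None, resumption_token=None, from_date=None):
    """Build a ListRecords request. A resumptionToken is an exclusive argument:
    after the first page it is sent alone, alongside the verb."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date   # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) for one response page.
    An empty or missing resumptionToken marks the last page."""
    root = ET.fromstring(xml_text)
    ids = [h.findtext(f"{{{OAI}}}identifier") for h in root.iter(f"{{{OAI}}}header")]
    token = root.findtext(f".//{{{OAI}}}resumptionToken")
    return ids, (token or None)


def harvest(base_url, fetch, metadata_prefix="dif", from_date=None):
    """Incremental harvest; `fetch` is any function mapping a URL to response text."""
    url = list_records_url(base_url, metadata_prefix, from_date=from_date)
    harvested = []
    while True:
        ids, token = parse_page(fetch(url))
        harvested.extend(ids)
        if not token:
            return harvested
        url = list_records_url(base_url, resumption_token=token)
```

Passing the date of the last successful harvest as `from_date` limits each run to changed records. That is how a harvesting design trades some timeliness for fast local queries.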
Discovery Metadata Usage
[Figure: at Data Centre 1 and Data Centre 2, a local metadata store generates local discovery metadata in an OAI provider; OAI harvesting gathers these into the NDG discovery metadata store; Portal 1 and Portal 2 GUIs send queries to its query WS interface and receive results; feeds are also produced]
XML metadata store: can support a limited variety of different XML schemas, provided the WS interface understands them (a unique XQuery is needed for each (method, schema) pair).
Metadata Formats
Currently supporting:
• NASA Global Change Master Directory: Directory Interchange Format (DIF)
Experimenting with:
• Vanilla ISO19139
• Dublin Core
• UK Gemini V1 format
Will support the following ISO profiles for harvest:
• (eventually) UK Gemini profile
• WMO profile
• IOC profile
• (whenever) US FGDC profile
ALL SIMULTANEOUSLY: XML database plus appropriate XQueries
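Supporting several schemas at once amounts to keeping one query per (method, schema) pair. In this Python sketch, `xml.etree.ElementTree` paths stand in for the XQueries, and only a `title` method is shown. The dictionary keys, the root-element detection and the sample records are assumptions. The `{*}` wildcard (Python 3.8 and later) matches any or no namespace, which avoids hard-coding namespace URIs.

```python
import xml.etree.ElementTree as ET

# One query per (method, schema) pair; ElementTree paths stand in for XQueries.
QUERIES = {
    ("title", "DIF"): "{*}Entry_Title",
    ("title", "ISO19139"): "{*}identificationInfo/{*}MD_DataIdentification/"
                           "{*}citation/{*}CI_Citation/{*}title/{*}CharacterString",
    ("title", "DC"): "{*}title",
}

# Root element local name -> schema key (a simplified, assumed detection rule).
ROOTS = {"DIF": "DIF", "MD_Metadata": "ISO19139", "dc": "DC"}


def detect_schema(root: ET.Element) -> str:
    local_name = root.tag.rsplit("}", 1)[-1]   # strip any "{namespace}" prefix
    return ROOTS[local_name]


def query(method: str, xml_text: str):
    """Dispatch to the query registered for this method and the record's schema."""
    root = ET.fromstring(xml_text)
    schema = detect_schema(root)
    path = QUERIES.get((method, schema))
    if path is None:
        raise KeyError(f"no query for method {method!r} on schema {schema!r}")
    return root.findtext(path)
```

Adding a new profile means adding rows to `QUERIES`, not changing the store. The cost is that every method must be written once per supported schema.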
Simulation in the context of ISO19139: NumSim
NDG Products: NumSim
NumSim Example
Firefox Search Plugin
International Discovery - Climate
NDG “New Interface”
Within Record
Scrolling Down
New Interfaces
(No CSS as yet)
[Screenshots: Simple and Advanced search interfaces]
Issues:
• Times (forecast, paleo etc.)
• BBOX (near poles and dateline)
• Semantic vocabulary matching (exploiting a new NDG web service providing thesaurus content and ontology mapping)
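The dateline part of the BBOX issue can be illustrated with a bounding-box overlap test. This Python sketch assumes boxes are given as (west, south, east, north) in degrees. A box whose west edge is greater than its east edge is taken to cross the 180° meridian. The function names are hypothetical. The polar case is not handled; there, a true search region may not be a lat/lon box at all.

```python
def lon_intervals(west: float, east: float) -> list:
    """Split a longitude range into non-wrapping pieces; west > east means the
    box crosses the 180° meridian."""
    if west <= east:
        return [(west, east)]
    return [(west, 180.0), (-180.0, east)]


def bboxes_intersect(a: tuple, b: tuple) -> bool:
    """a, b: (west, south, east, north) in degrees, longitudes in [-180, 180]."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    if a_south > b_north or b_south > a_north:   # latitude ranges never wrap
        return False
    return any(lo1 <= hi2 and lo2 <= hi1
               for lo1, hi1 in lon_intervals(a_west, a_east)
               for lo2, hi2 in lon_intervals(b_west, b_east))
```

A naive test that compares `west`/`east` directly would report no overlap between a Pacific box such as (170, -10, -170, 10) and a query box at 175° to 179° E. Splitting at the meridian fixes that.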
ISO
• Metadata extensions and profiles
ISO19139
Background:
• Designed to exploit as much as possible of the XML-schema machinery
• Not designed for humans!
Advice:
• Use in conjunction with a clear concept of why it’s being used:
  • Decide on dataset granularity, and use other metadata schemas to describe how to use content (“A” metadata; e.g. an application schema of GML).
• Devise a profile with utility, then: restrict, restrict, restrict. Document. Register.
On Restriction
ISO19139 is also about INTEROPERABILITY!
• Don’t follow the ISO19139 advice and produce a new schema!
• Ensure that your profile instances are valid vanilla ISO19139
• Restrict content out of band, e.g. with Schematron
• Agree on how you’re going to deploy ISO19139
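Out-of-band restriction can be sketched in Python as a set of extra rules checked against a record that is otherwise vanilla ISO19139. This is roughly the job Schematron would do. The three rules (title, abstract, bounding box) are a hypothetical profile chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd", "gco": "http://www.isotc211.org/2005/gco"}
IDENT = "gmd:identificationInfo/gmd:MD_DataIdentification/"

# A hypothetical community profile. The record stays vanilla ISO19139; the
# profile only demands that some optional content be present, checked outside
# the schema, as Schematron would.
PROFILE_RULES = [
    (IDENT + "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
     "title is required"),
    (IDENT + "gmd:abstract/gco:CharacterString", "abstract is required"),
    (IDENT + "gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox",
     "a geographic bounding box is required"),
]


def check_profile(xml_text: str) -> list:
    """Return the profile violations, in rule order (empty if the record conforms)."""
    root = ET.fromstring(xml_text)
    failures = []
    for path, message in PROFILE_RULES:
        node = root.find(path, NS)
        # Missing, or present but with neither children nor non-blank text
        if node is None or (len(node) == 0 and not (node.text or "").strip()):
            failures.append(message)
    return failures
```

Because the rules live outside the schema, a conforming record still validates against plain ISO19139, and any generic ISO19139 tool can read it.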
On Extension
ISO19139 is also about INTEROPERABILITY!
Do follow the ISO19139 advice and produce a new schema!
• Do what you need for your community, but:
• Design so that code expecting ISO19139 instances can parse yours!
• Make it easy for third party code to ignore your content!
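Making extension content easy to ignore can be illustrated with two readers in Python. The extension namespace and its elements are hypothetical, and the sample record is not claimed to be schema-valid; ISO19139's formal extension mechanism is not shown. The point is that a consumer following only ISO paths still gets what it expects, while a community-aware consumer can also read the extra elements.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd", "gco": "http://www.isotc211.org/2005/gco"}
TITLE = ("gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
         "gmd:CI_Citation/gmd:title/gco:CharacterString")


def read_title(xml_text: str):
    """An ISO19139-only consumer: follows ISO paths and never looks elsewhere,
    so elements in other namespaces are simply never visited."""
    return ET.fromstring(xml_text).findtext(TITLE, namespaces=NS)


def community_extras(xml_text: str, ext_ns: str) -> dict:
    """A community-aware consumer: also reads leaf elements in its own namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{" + ext_ns + "}"
    return {el.tag[len(prefix):]: (el.text or "").strip()
            for el in root.iter() if el.tag.startswith(prefix) and len(el) == 0}
```

Keeping extension content in its own namespace, in separate elements rather than inside ISO ones, is what lets `read_title` stay unaware of the extension entirely.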
Summary
• NDG dealing with heterogeneous environment
• Successful deployment of OAI with discovery metadata
(There are some issues differentiating between model simulations and ordering response sets)
• Directly linking to and exploiting GML application schema
• Web Service backends make deployment easier.
• Communities need to be very careful how they deploy ISO19139