Harmonization of environmental data using the Climate Science Modelling Language

9 September 2008

© University of Reading 2008 www.reading.ac.uk

Reading e-Science Centre

Harmonization of environmental data using the Climate Science Modelling Language

Jon Blower, Alastair Gemmell (Reading e-Science Centre)
Andrew Woolf, Dominic Lowe, Arif Shaon (STFC e-Science Centre)
Stephen Pascoe (British Atmospheric Data Centre)
Keiran Millard, Quillon Harphem (HR Wallingford)

We need to integrate and compare lots of different types of data…

[Figure: four data sources compared. SSM/I (satellite); ERA-40 (re-analysis product); HadCM3 (low-res. climate GCM); HiGEM (hi-res climate GCM, new physics). Credit: Putt, Gurney and Haines]

…for validating numerical models…

… calibrating instruments …

…data assimilation…

[Figure: time series plotted against time. Black line: control run; green stars: observations; red line: assimilation run]

… and making predictions

• Flood prediction

• Search and rescue

• Climate prediction

Where we are now (mostly)

Separate websites for each data provider

The need for harmonization

• Each community has evolved its own means for presenting data:
  – File formats
  – Metadata conventions
  – Coordinate systems

• These are not usually mutually compatible

• … and vital metadata can be missing

• No widely-accepted standards exist for certain types of data

• Hence scientists spend lots of time dealing with low-level technical issues

• Need a common view onto all these datasets

Open Geospatial standards

• Aim to describe all geographic data

• XML encoding
  – Geography Markup Language

• Web Services for data exchange

• Rooted in international standards

• Mandated by European INSPIRE directive

• But fiendishly complex

• Evolved from map-oriented systems
  – Vertical and temporal information not handled cleanly

Bridging the gap: CSML

• Climate Science Modelling Language
  – Abstract data model defined using ISO/OGC approach
  – XML encoding based upon GML

• Adapts open geospatial standards to environmental science data
  – “Best of both worlds”

• Wraps existing data
  – Doesn’t expect providers to convert data

• Data are seen as geographical “features”, not as a file system

Selected CSML Feature Types

• PointSeriesFeature (timeseries at a point)

• ProfileFeature (vertical profile at a point)

• GridSeriesFeature (series of multidimensional grids)

• SwathFeature (single satellite sweep)

• SectionFeature (vertical section)

Feature Types are classified by their geometry
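As a rough illustration of classifying data by geometry, the sketch below models the five feature types listed on this slide as a Java enum and groups a catalogue of datasets by type. The names `CsmlFeatureType`, `Dataset` and `FeatureCatalogue` are hypothetical and are not taken from CSML or Java-CSML; only the feature type names and their descriptions come from the slide.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical enum: one constant per feature type named on the slide.
enum CsmlFeatureType {
    POINT_SERIES("timeseries at a point"),
    PROFILE("vertical profile at a point"),
    GRID_SERIES("series of multidimensional grids"),
    SWATH("single satellite sweep"),
    SECTION("vertical section");

    private final String geometry;

    CsmlFeatureType(String geometry) {
        this.geometry = geometry;
    }

    // Short description of the geometry that defines this feature type.
    String geometry() {
        return geometry;
    }
}

// Hypothetical catalogue entry: a dataset name plus its feature type.
final class Dataset {
    final String name;
    final CsmlFeatureType type;

    Dataset(String name, CsmlFeatureType type) {
        this.name = name;
        this.type = type;
    }
}

final class FeatureCatalogue {
    // Groups dataset names by feature type, preserving input order within each group.
    static Map<CsmlFeatureType, List<String>> groupByType(List<Dataset> datasets) {
        Map<CsmlFeatureType, List<String>> byType = new EnumMap<>(CsmlFeatureType.class);
        for (Dataset d : datasets) {
            byType.computeIfAbsent(d.type, k -> new ArrayList<>()).add(d.name);
        }
        return byType;
    }
}
```

Grouping by geometry rather than by provider or file format means that any tool written for one feature type can, in principle, be offered every dataset in that group.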

Harmonizing 2 databases using CSML

• Different data providers, different internal representation
  – Met Office “MIDAS” dataset
  – “Environmental Change Network” dataset

• Modelled both databases as collections of CSML PointSeriesFeatures

• Allowed sharing of plotting and analysis tools
  – CSML-XML documents converted to maps, plots and KML

• Intermediate step via XML not necessary in ideal world
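A minimal sketch of the idea on this slide: two sources with different internal layouts are both exposed as collections of point series, so one analysis routine serves both. Everything here is an assumption made for illustration. The two layouts are invented and do not reflect the real MIDAS or Environmental Change Network schemas, and `PointSeriesFeature`, `PointSeriesSource`, `RowStoreSource`, `ColumnStoreSource` and `SharedTools` are hypothetical names, not the Java-CSML API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Common view: values of one variable over time at a single station.
final class PointSeriesFeature {
    final String stationId;
    final long[] epochSeconds;
    final double[] values;

    PointSeriesFeature(String stationId, long[] epochSeconds, double[] values) {
        if (epochSeconds.length != values.length) {
            throw new IllegalArgumentException("times and values differ in length");
        }
        this.stationId = stationId;
        this.epochSeconds = epochSeconds;
        this.values = values;
    }
}

interface PointSeriesSource {
    List<PointSeriesFeature> features();
}

// Invented layout 1: one text row per observation, "station,epochSeconds,value".
final class RowStoreSource implements PointSeriesSource {
    private final List<String> rows;

    RowStoreSource(List<String> rows) {
        this.rows = rows;
    }

    public List<PointSeriesFeature> features() {
        // Collect rows per station, keeping stations in first-seen order.
        Map<String, List<String[]>> byStation = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.split(",");
            byStation.computeIfAbsent(cols[0], k -> new ArrayList<>()).add(cols);
        }
        List<PointSeriesFeature> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : byStation.entrySet()) {
            List<String[]> obs = e.getValue();
            long[] t = new long[obs.size()];
            double[] v = new double[obs.size()];
            for (int i = 0; i < obs.size(); i++) {
                t[i] = Long.parseLong(obs.get(i)[1]);
                v[i] = Double.parseDouble(obs.get(i)[2]);
            }
            out.add(new PointSeriesFeature(e.getKey(), t, v));
        }
        return out;
    }
}

// Invented layout 2: one station held as parallel arrays, times in minutes since the epoch.
final class ColumnStoreSource implements PointSeriesSource {
    private final String stationId;
    private final int[] epochMinutes;
    private final double[] values;

    ColumnStoreSource(String stationId, int[] epochMinutes, double[] values) {
        this.stationId = stationId;
        this.epochMinutes = epochMinutes;
        this.values = values;
    }

    public List<PointSeriesFeature> features() {
        long[] t = new long[epochMinutes.length];
        for (int i = 0; i < t.length; i++) {
            t[i] = epochMinutes[i] * 60L; // convert to the common time unit
        }
        List<PointSeriesFeature> out = new ArrayList<>();
        out.add(new PointSeriesFeature(stationId, t, values.clone()));
        return out;
    }
}

// A shared tool: written once against the common view, usable with either source.
final class SharedTools {
    static double mean(PointSeriesFeature f) {
        double sum = 0.0;
        for (double v : f.values) {
            sum += v;
        }
        return sum / f.values.length; // NaN for an empty series
    }
}
```

In this sketch the adapters build features directly in memory, with no XML document in between.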

Java-CSML

• Need reusable libraries to apply CSML more widely

• Aim is to reduce cost of developing data-driven applications

• Interoperates with other means of modelling data in Java:
  – GeoAPI, Common Data Model

• High-level analysis/visualization routines completely decoupled from low-level data access

Java-CSML: Design attempts

1. Transform CSML’s XML schema to Java code using automated tool
  • Led to very deeply-nested code

2. Based upon OGC-sponsored GeoAPI
  • Incomprehensible unless very familiar with ISO standards
  • GeoAPI is a moving target

3. Based on well-known Java concepts
  • Accessible to “typical” Java programmer
  • Compatibility with other data models assured through wrappers
  • Insulated against inevitable changes to standards
  • More code needs to be written by Java-CSML designers
  • Less code needs to be written by users

Java-CSML Application 1: Coastal oceanography decision support system

[Figure: red line: Smartbuoy data; blue dots: model output]

Behind the scenes

Smartbuoys (via Web Feature Service)

Physical model (via NetCDF files)

Biological model (via OPeNDAP server)

Java-CSML wrappers

Java-CSML plotting routines
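Once buoy observations and model output are both available as time series, a comparison routine needs no knowledge of where either came from. The sketch below is one possible routine of that kind and is not described in the slides: it linearly interpolates model output to each observation time and reports the root-mean-square difference. The class name `SeriesComparison` and both method names are hypothetical, and the choice of linear interpolation and RMS difference is an assumption.

```java
final class SeriesComparison {

    // Linear interpolation of (times, values) at time t.
    // Assumes times is strictly increasing with at least two entries.
    // Returns NaN when t lies outside the covered time range.
    static double interpolate(double[] times, double[] values, double t) {
        if (times.length < 2 || times.length != values.length) {
            throw new IllegalArgumentException("need two or more matching samples");
        }
        if (t < times[0] || t > times[times.length - 1]) {
            return Double.NaN;
        }
        for (int i = 1; i < times.length; i++) {
            if (t <= times[i]) {
                double w = (t - times[i - 1]) / (times[i] - times[i - 1]);
                return values[i - 1] + w * (values[i] - values[i - 1]);
            }
        }
        return Double.NaN; // reached only if t itself is NaN
    }

    // Root-mean-square difference between model and observations, evaluated at
    // the observation times. Observations outside the model's time range are
    // skipped; returns NaN if no observation could be matched.
    static double rmsDifference(double[] modelTimes, double[] modelValues,
                                double[] obsTimes, double[] obsValues) {
        double sumSq = 0.0;
        int n = 0;
        for (int i = 0; i < obsTimes.length; i++) {
            double m = interpolate(modelTimes, modelValues, obsTimes[i]);
            if (!Double.isNaN(m)) {
                double d = m - obsValues[i];
                sumSq += d * d;
                n++;
            }
        }
        return n == 0 ? Double.NaN : Math.sqrt(sumSq / n);
    }
}
```

The routine takes plain arrays, so the same code would apply whether the values arrived through a Web Feature Service, NetCDF files or an OPeNDAP server.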

Java-CSML Application 2: Atmospheric ozone

[Figure panels: control run; assimilation run]

Specializing CSML Features

• A generic data model can’t encode all possible metadata without becoming extremely complex

• In CSML generic feature types can be specialized
  – cf. object-oriented inheritance

• Hence core data model retains simplicity

[Diagram: ArgoProfileFeature (adds int qualityFlag) specializes ProfileFeature]
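The specialization in the diagram maps naturally onto Java inheritance. The sketch below assumes a minimal `ProfileFeature` holding depths and values; only the two class names and the `int qualityFlag` field come from the slide. The fields of `ProfileFeature`, the helper class `ProfileTools` and its methods are hypothetical, and which flag values count as acceptable is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Generic feature type: values of one variable at a set of depths.
class ProfileFeature {
    final double[] depths;
    final double[] values;

    ProfileFeature(double[] depths, double[] values) {
        if (depths.length == 0 || depths.length != values.length) {
            throw new IllegalArgumentException("need matching, non-empty depths and values");
        }
        this.depths = depths;
        this.values = values;
    }
}

// Specialized feature type: adds Argo-specific metadata without changing the core model.
class ArgoProfileFeature extends ProfileFeature {
    final int qualityFlag;

    ArgoProfileFeature(double[] depths, double[] values, int qualityFlag) {
        super(depths, values);
        this.qualityFlag = qualityFlag;
    }
}

final class ProfileTools {
    // Generic routine: accepts any ProfileFeature, including specializations.
    static double maxValue(ProfileFeature p) {
        double max = p.values[0];
        for (double v : p.values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Argo-aware routine: keeps only profiles carrying the requested flag value.
    static List<ArgoProfileFeature> withFlag(List<ArgoProfileFeature> profiles, int flag) {
        List<ArgoProfileFeature> kept = new ArrayList<>();
        for (ArgoProfileFeature p : profiles) {
            if (p.qualityFlag == flag) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

Generic tools such as `maxValue` keep working on the specialized type, while only code that cares about quality flags needs to know that `ArgoProfileFeature` exists.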

Java-CSML Application 3: Ocean data assimilation

[Figure: profile plots labelled ArgoProfileFeature and ProfileFeature. Red lines: Argo data; blue lines: model output]

Summary

• CSML bridges gap between bottom-up (science) and top-down (GIS) approaches to modelling data
  – Wraps existing data holdings

• Data modelled as Feature Types distinguished by geometry and “sensible plotting”
  – Complexity managed through feature inheritance

• Doesn’t attempt to model everything!
  – Other technologies deal with discovery, provenance, security…

• Java-CSML framework allows data intercomparison applications to be built quickly
  – Automates tedious and error-prone tasks

Wider lessons

• “Interoperable” data formats not necessarily suitable for storage
  – Because no single data model can satisfy every application
  – Abstraction usually leads to data loss!

• Trade-offs between scope and complexity
  – Don’t attempt to put everything in one specification

• Symbiotic relationship between standards, tools and applications
  – Must be developed in parallel
