PSI Meta Data meeting, Toulouse - 15 November 2005 - 1 The CERA Climate and Environment data Retrieval and Archiving system at MPI-Met / M&D S. Legutke, F. Toussaint , M. Lautenschlager

PSI Meta Data meeting, Toulouse - 15 November 2005 - 1

The CERA Climate and Environment data Retrieval and Archiving system

at MPI-Met / M&D

S. Legutke, F. Toussaint, M. Lautenschlager

PSI Meta Data meeting, Toulouse - 15 November 2005 - 2

Content

• History, Architecture, Usage of the CERA DB

• WDCC, IPCC/DDC, CEOP: data archives hosted by CERA

• Core and Extensions of the CERA meta data model

• Relations with other meta data standards

PSI Meta Data meeting, Toulouse - 15 November 2005 - 3

CERA-1 (1995): compliant with DIF (Directory Interchange Format), NASA

Hierarchical 2-layer structure: Experiments => Datasets

Shortcomings:

- static 2-layer horizontal structure of climate model data

- restructuring needed

PSI Meta Data meeting, Toulouse - 15 November 2005 - 4

CERA-2 (1997): additionally compliant with the FGDC meta data standard

1-layer structure: RDBMS with tree-like / hierarchical / network relations between entities

Requirements:

- geographically distributed archives

- common meta data model for all archives

=> simple but extendible

- one GUI for all archives

Unchanged for 7 years

PSI Meta Data meeting, Toulouse - 15 November 2005 - 5

Architecture diagram (component and link labels):

• User

• Application Server

• DBMS (Oracle): Metadata, Blob-Data, Processing; 12 TB in 10/2002, 177 TB in 11/2005

• Fileserver (Unitree): Processed + Raw Data

• Mass Storage Archive: 0.5 PB in 10/2002, 3.4 PB in 11/2005

• Link labels: FTP; Data Migration; SQL*Net; IIOP; CORBA-Client; RMI/IIOP; http, jdbc, iiop; direct file access

PSI Meta Data meeting, Toulouse - 15 November 2005 - 6

Mass Storage capacity/load

tape archive: STK Tape Silo > 3.4 PB

disks: 177 TB in Oracle RDBMS (web accessible; applet or servlet)

Bandwidth compute server - data server: 450 MB/s

1 TB/day automated filling at model run time (IPCC)

3.4 PB of data in files (no. = 67263)

No. of experiments: 570

> 1000 requests per day
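
The capacity and load figures above can be put in relation to each other with a few lines of arithmetic. The following is only a rough sketch: it assumes decimal units (1 PB = 1e15 bytes, 1 MB = 1e6 bytes), which the slide does not specify, and the variable names are chosen here for illustration.

```python
# Figures quoted on the slide (11/2005). Decimal units are an assumption;
# the slide does not say which convention it uses.
ARCHIVE_BYTES = 3.4e15        # 3.4 PB of data in files
FILE_COUNT = 67263            # number of files
DAILY_FILL_BYTES = 1e12       # 1 TB/day automated filling
LINK_BYTES_PER_S = 450e6      # 450 MB/s between compute and data server

SECONDS_PER_DAY = 24 * 60 * 60

# Average size of one archived file, in GB.
avg_file_gb = ARCHIVE_BYTES / FILE_COUNT / 1e9

# Sustained rate needed to move 1 TB per day, in MB/s.
fill_rate_mb_s = DAILY_FILL_BYTES / SECONDS_PER_DAY / 1e6

# Fraction of the compute-to-data-server bandwidth used by that rate.
link_share = fill_rate_mb_s * 1e6 / LINK_BYTES_PER_S

print(f"average file size: {avg_file_gb:.1f} GB")
print(f"sustained fill rate: {fill_rate_mb_s:.1f} MB/s")
print(f"share of link bandwidth: {link_share:.1%}")
```

Under these assumptions the average file is about 50 GB, and the automated daily filling needs under 3% of the quoted link bandwidth.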

PSI Meta Data meeting, Toulouse - 15 November 2005 - 7

CERA hosts the data of the World Data Center for Climate (WDCC)

• Maintained by M&D in cooperation with DKRZ and MPI-Met

• Collection and dissemination of data related to climate change (focus on georeferenced data)

• Access: WWW or FTP (on request)

PSI Meta Data meeting, Toulouse - 15 November 2005 - 8

M&D, with its CERA DB, is acknowledged as a Data Distribution Centre (DDC) for IPCC model data

• Hosting (and distributing) a subset of IPCC data

• all monthly mean model data of AR4, TAR, SAR

PSI Meta Data meeting, Toulouse - 15 November 2005 - 9

PSI Meta Data meeting, Toulouse - 15 November 2005 - 10

CERA-2 holds the CEOP data archive (Coordinated Enhanced Observing Period)

http://www.ceop.net

Strong cooperation with GEWEX, CLIVAR, CLiC, IGOS-P, CEOS

Web-based access to XML meta data and data files

PSI Meta Data meeting, Toulouse - 15 November 2005 - 11

The Winter TopTen Program identifies the world’s largest and most heavily used databases.

E-mail received on 13 September: ….. Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program! .......

(1) Grand prizes are awarded for first place winners in the All Environments categories only.

WDCC's CERA DB has been identified as the largest Linux DB.

http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf

PSI Meta Data meeting, Toulouse - 15 November 2005 - 12

Collaborations within Climate Community Data Archive Initiative

Distributed Archive: DFD/DLR, IPA/DLR, DOD, DWD, GFZ, PANGAEA/AWI, xDAT/PIK, CERA-2/PIK, ECMWF, CERA-2/DKRZ, BADC

PSI Meta Data meeting, Toulouse - 15 November 2005 - 13

CERA-2 meta data model

Core scheme:

- valid for all entries

Extensions:

- community-defined module (e.g. PIK, DKRZ, PRISM to be defined?)

- user-defined local extension

Structural flexibility:

- definable fields, tables, entry types and various others

- flexible lists of valid values (LOV): extensible but controlled

Simple structure:

- blockwise table groups

- all CERA-2 blocks have a similar structure

- more complex structures go into CERA modules
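
The idea of a list of valid values that is "extensible but controlled" can be illustrated with a small sketch. This is not the CERA implementation; the class name `ListOfValues`, its methods and the `approved_by` argument are hypothetical and only show one way such a rule could be enforced.

```python
class ListOfValues:
    """Controlled vocabulary: entries must use a listed value, and new
    values can only be added through an explicit, approved step."""

    def __init__(self, name, values):
        self.name = name
        self._values = set(values)

    def validate(self, value):
        # Reject anything that is not in the controlled list.
        if value not in self._values:
            raise ValueError(f"{value!r} is not a valid value for {self.name}")
        return value

    def extend(self, value, approved_by):
        # 'approved_by' models the "controlled" part: no anonymous additions.
        if not approved_by:
            raise PermissionError("new values need an approving curator")
        self._values.add(value)


units = ListOfValues("unit", ["K", "Pa"])
units.validate("K")                            # accepted
units.extend("hPa", approved_by="curator")     # controlled extension
units.validate("hPa")                          # accepted after the extension
```

A value outside the list raises `ValueError` until someone with the right to extend the list has added it, which keeps the vocabulary open without letting it drift.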

PSI Meta Data meeting, Toulouse - 15 November 2005 - 14

The CERA Core meta data:

- only data common to most data in geophysics

- compliant with the 1st level of the FGDC standard

- sufficient to answer: What data are stored? How to get assistance? How to get the data?

Only little information is mandatory, so that the model is applicable to as many institutions/data as possible!

Schema and example at http://wini.wdc-climate.de

The core meta data system is extendible but not changeable (e.g. the CERA Core table structure may not be changed)

PSI Meta Data meeting, Toulouse - 15 November 2005 - 15

Parameter: Block describes data topic, variable and unit

Metadata Entry: This is the central CERA Block, providing information on

• the entry's title

• type and relation to other entries

• the project the data belong to

• a summary of the entry

• a list of general keywords related to data

• creation and review dates of the metadata

Coverage: Information on the volume of space-time covered by the data

Reference: Any publication related to the data together with the publication form

Status: Status information like data quality, processing steps, etc.

Distribution: Distribution information including access restrictions, data format and fees if necessary

Contact: Data related to contact persons and institutes like distributor, investigator, and owner of copyright

Spatial Reference: Information on the coordinate system used

FGDC level 1

Extension needed for grid description

PSI Meta Data meeting, Toulouse - 15 November 2005 - 16

The Core structure

PSI Meta Data meeting, Toulouse - 15 November 2005 - 17

Core blocks: Parameter, Metadata Entry, Coverage, Reference, Status, Distribution, Contact, Spatial Reference (descriptions as on slide 15)

Additionally: Modules / Local Extensions

Module DATA_ORGANIZATION (grid structure)

Module DATA_ACCESS (physical storage)

Local extension for specific information on (e.g.)

• data usage

• data access and data administration

PSI Meta Data meeting, Toulouse - 15 November 2005 - 18

Core and Extensions

ENTRY (CORE): entry_id, ....

PARAMETER (CORE): entry_id, .., data_org_id, data_access_id, ...

DATA_ORG: data_org_id, data_org_descr, space_id, time_id

DATA_ACCESS: data_access_id, access_structure_id, storage1_id, storage2_id, storage3_id, storage4_id, rec_structure_id, modification_date
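
How the core tables reference the module tables through keys can be sketched as follows. This is a simplified illustration, not the CERA schema: SQLite stands in for the Oracle RDBMS, the column types, the `title` column and the sample rows are assumptions, and columns elided on the slide are left out.

```python
import sqlite3

# Simplified, hypothetical version of the four tables named on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    entry_id INTEGER PRIMARY KEY,
    title    TEXT                      -- assumed column, not on the slide
);
CREATE TABLE data_org (
    data_org_id    INTEGER PRIMARY KEY,
    data_org_descr TEXT,
    space_id       INTEGER,
    time_id        INTEGER
);
CREATE TABLE data_access (
    data_access_id      INTEGER PRIMARY KEY,
    access_structure_id INTEGER,
    storage1_id         INTEGER,
    rec_structure_id    INTEGER,
    modification_date   TEXT
);
CREATE TABLE parameter (
    entry_id       INTEGER REFERENCES entry(entry_id),
    data_org_id    INTEGER REFERENCES data_org(data_org_id),
    data_access_id INTEGER REFERENCES data_access(data_access_id)
);
""")

# Invented sample rows, only to make the join below return something.
conn.execute("INSERT INTO entry VALUES (1, 'example entry')")
conn.execute("INSERT INTO data_org VALUES (10, 'regular lat/lon grid', 100, 200)")
conn.execute("INSERT INTO data_access VALUES (20, 1, 1, 1, '2005-11-15')")
conn.execute("INSERT INTO parameter VALUES (1, 10, 20)")

# The core PARAMETER row links an entry to its module information.
rows = conn.execute("""
SELECT e.title, o.data_org_descr, a.modification_date
FROM parameter p
JOIN entry e       ON e.entry_id = p.entry_id
JOIN data_org o    ON o.data_org_id = p.data_org_id
JOIN data_access a ON a.data_access_id = p.data_access_id
""").fetchall()
print(rows)
```

The point of the layout is that the core tables stay fixed, while module tables such as DATA_ORG and DATA_ACCESS are attached through key columns only.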

PSI Meta Data meeting, Toulouse - 15 November 2005 - 19

CERA: Module Example

PSI Meta Data meeting, Toulouse - 15 November 2005 - 20

Core and Extensions

DATA_ORG module

data_org_descr/name/acronym

space_id: key of table with space information

- gridded or point data (station data, buoys, ships, …)

- gridded data only if lat/lon coordinates

time_id: key of table with time information (grid)

=> any data value is locatable in space / time
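
What "locatable in space / time" can mean for gridded data is sketched below. Everything here is an assumption made for illustration: the classes `SpaceGrid` and `TimeGrid` stand in for the tables referenced by space_id and time_id, the grid is taken to be regular, and the storage order (time, then latitude, then longitude) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SpaceGrid:
    """Stand-in for the table referenced by space_id (regular lat/lon grid)."""
    lat0: float
    dlat: float
    nlat: int
    lon0: float
    dlon: float
    nlon: int


@dataclass
class TimeGrid:
    """Stand-in for the table referenced by time_id (regular time axis)."""
    start: datetime
    step: timedelta
    nsteps: int


def locate(index, space, time):
    """Map a flat value index to (time, lat, lon), assuming the values are
    stored time-major, then by latitude row, then by longitude."""
    per_step = space.nlat * space.nlon
    if not 0 <= index < per_step * time.nsteps:
        raise IndexError("index outside the described data volume")
    t, rest = divmod(index, per_step)
    j, i = divmod(rest, space.nlon)
    return (time.start + t * time.step,
            space.lat0 + j * space.dlat,
            space.lon0 + i * space.dlon)


space = SpaceGrid(lat0=-90.0, dlat=1.0, nlat=181, lon0=0.0, dlon=1.0, nlon=360)
time = TimeGrid(start=datetime(2000, 1, 1), step=timedelta(hours=6), nsteps=4)
print(locate(181 * 360 + 360 + 5, space, time))
```

With a space description and a time description attached to each parameter, the position of any stored value follows from its index alone, which is the property the slide asks for.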

PSI Meta Data meeting, Toulouse - 15 November 2005 - 21

Meta data not in the CERA core can be defined in new modules. Presently:

• DATA_ORG module

• DATA_ACCESS module

Presently there is little information in CERA on model code (= NMM code base) or on configurations of models (= NMM models)

=> define a model meta data module

• A minimum of specifications should be required (enough to reproduce a model run exactly)

• Most specifications should be optional

PSI Meta Data meeting, Toulouse - 15 November 2005 - 22

A minimum of specifications should be required (enough to reproduce a model run exactly):

• Components involved

• Code repository for each component

• Code release numbers for each component

• Compile scripts

• Namelists

• Initial data files

• Forcing data files
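
The required items listed above could be captured in a record like the following. This is only a sketch of a possible model meta data module: the class names, field names, the optional `comment` field and all sample values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Component:
    name: str
    code_repository: str   # where the component's code is kept
    release: str           # code release number


@dataclass
class ModelRunSpec:
    # Required items, following the list on the slide.
    components: List[Component]
    compile_scripts: List[str]
    namelists: List[str]
    initial_data_files: List[str]
    forcing_data_files: List[str]
    # Example of an optional specification (assumed, not from the slide).
    comment: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of required items that were left empty."""
        required = {
            "components": self.components,
            "compile_scripts": self.compile_scripts,
            "namelists": self.namelists,
            "initial_data_files": self.initial_data_files,
            "forcing_data_files": self.forcing_data_files,
        }
        return [name for name, value in required.items() if not value]


# Placeholder values; none of these names refer to real files or repositories.
spec = ModelRunSpec(
    components=[Component("atmosphere", "repository-placeholder", "1.0")],
    compile_scripts=["compile.sh"],
    namelists=["namelist.atm"],
    initial_data_files=["init.nc"],
    forcing_data_files=[],
)
print(spec.missing())
```

A check like `missing()` makes the split between required and optional items explicit: an entry is accepted once the required list is complete, regardless of how many optional fields are filled.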

PSI Meta Data meeting, Toulouse - 15 November 2005 - 23

Most specifications should be optional: all of the required items above can be split into small pieces of information and placed at the appropriate position in the meta data / tables

PSI Meta Data meeting, Toulouse - 15 November 2005 - 24

CF standard

CF standard compliance:

• Any data file with any file format can be an entry of CERA

• CERA primarily contains single-variable GRIB data files

• Support for NetCDF/CF file format is being implemented:

- adding meta data elements for the NetCDF/CF attributes if needed

- e.g. additional CF_UNIT table

- optional retrieval of data time windows of fine granularity

- search along NetCDF-CF attributes
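
A search along NetCDF/CF attributes might look like the sketch below. It does not show the CERA implementation: the entries are an in-memory stand-in for database rows, the entry names are invented, and only two CF attributes (`standard_name`, `units`) are used as examples.

```python
# Hypothetical in-memory stand-in for CERA entries carrying CF attributes.
ENTRIES = [
    {"entry": "EXP1_TAS",
     "cf": {"standard_name": "air_temperature", "units": "K"}},
    {"entry": "EXP1_PR",
     "cf": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"}},
    {"entry": "EXP2_TAS",
     "cf": {"standard_name": "air_temperature", "units": "K"}},
]


def search_by_cf(entries, **wanted):
    """Return the names of entries whose CF attributes match every
    attribute=value pair given as a keyword argument."""
    return [
        e["entry"]
        for e in entries
        if all(e["cf"].get(key) == value for key, value in wanted.items())
    ]


hits = search_by_cf(ENTRIES, standard_name="air_temperature")
print(hits)
```

Storing the CF attributes as meta data elements, as proposed in the list above, is what makes such a query possible without opening the data files themselves.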

PSI Meta Data meeting, Toulouse - 15 November 2005 - 25

Other standards

XSL scripts exist to transform the CERA meta data into other standards/formats:

• xhtml

• DIF (NASA) - xml

• CSDGM (FGDC) - xml

• ISO/TC211 (19115/19139) - xml

• Dublin Core - xml
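
The kind of mapping such a transformation performs can be sketched for the Dublin Core case. The slides name XSL scripts as the actual mechanism; the Python below is a stand-in, and both the input field names and the choice of target elements are assumptions. Only the Dublin Core element namespace is taken as given.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(entry: dict) -> str:
    """Map a CERA-like entry (hypothetical field names) to Dublin Core XML."""
    root = ET.Element("metadata")

    def add(tag, text):
        child = ET.SubElement(root, f"{{{DC_NS}}}{tag}")
        child.text = text

    add("title", entry["title"])
    add("description", entry["summary"])
    for keyword in entry.get("keywords", []):
        add("subject", keyword)
    add("date", entry["creation_date"])
    return ET.tostring(root, encoding="unicode")


# Invented example record.
example = {
    "title": "Example entry",
    "summary": "Monthly mean model output",
    "keywords": ["climate", "model data"],
    "creation_date": "2005-11-15",
}
dc_xml = to_dublin_core(example)
print(dc_xml)
```

Because the core meta data are few and fixed, a mapping of this size already covers much of what a simple target format such as Dublin Core can hold; richer targets like ISO 19115 would need correspondingly more fields.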

PSI Meta Data meeting, Toulouse - 15 November 2005 - 26

The End