PSI Meta Data meeting, Toulouse - 15 November 2005 - 1
The CERA Climate and Environment data Retrieval and Archiving system
at MPI-Met / M&D
S. Legutke, F. Toussaint, M. Lautenschlager
Content
• History, Architecture, Usage of the CERA DB
• WDCC, IPCC/DDC, CEOP: data archives hosted by CERA
• Core and Extensions of the CERA meta data model
• Relations with other meta data standards
CERA-1 (1995): compliant with DIF (Directory Interchange Format), NASA
Hierarchical 2-layer structure: Experiments => Datasets
Shortcomings:
- static 2-layer horizontal structure of climate model data
- restructuring needed
History Architecture Usage
CERA-2 (1997): additionally compliant with the FGDC meta data standard
1-layer structure: RDBMS with tree-like / hierarchical / network relations between entities
Requirements:
- geographically distributed archives
- common meta data model for all archives
=> simple but extendible
- one GUI for all archives
History Architecture Usage
Unchanged for 7 years
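The step from CERA-1's fixed Experiments => Datasets layers to CERA-2's single entity table with tree-like and network relations can be sketched as below. This is a minimal in-memory illustration, not the CERA-2 schema: the `Entry` class, the relation names "parent" and "derived_from", and the example titles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical CERA-2-style entry; all entries share one flat table."""
    entry_id: int
    title: str
    entry_type: str  # illustrative values: "experiment", "dataset"
    # (relation_type, target entry_id) pairs: any entry may point to any other
    relations: list[tuple[str, int]] = field(default_factory=list)


def related(entries: dict[int, Entry], entry_id: int, relation_type: str) -> list[Entry]:
    """Entries that entry_id points to via the given relation type."""
    return [entries[target] for rel, target in entries[entry_id].relations
            if rel == relation_type]


def ancestors(entries: dict[int, Entry], entry_id: int) -> list[int]:
    """Follow 'parent' links upwards; depth is not fixed at two layers.

    Assumes the parent links contain no cycles.
    """
    chain = []
    current = entry_id
    while parents := related(entries, current, "parent"):
        current = parents[0].entry_id  # first parent only, for brevity
        chain.append(current)
    return chain


entries = {
    1: Entry(1, "Control run", "experiment"),
    2: Entry(2, "Monthly means", "dataset", [("parent", 1)]),
    3: Entry(3, "Monthly means, European subset", "dataset", [("parent", 2)]),
    4: Entry(4, "Anomalies", "dataset", [("parent", 1), ("derived_from", 3)]),
}

print(ancestors(entries, 3))  # [2, 1]
print([e.title for e in related(entries, 4, "derived_from")])
```

Entry 3 sits three levels deep and entry 4 links sideways to another dataset; neither fits a strict two-layer layout, which is the restructuring problem noted for CERA-1.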
History Architecture Usage
• User
• Application Server
• DBMS (Oracle): metadata, BLOB data, processing; 12 TB in 10/2002, 177 TB in 11/2005
• File server (Unitree): processed + raw data
• Mass storage archive: 0.5 PB in 10/2002, 3.4 PB in 11/2005
• Connections: http, jdbc, iiop; SQL*Net, IIOP; CORBA client; RMI/IIOP; FTP; direct file access; data migration
Mass Storage capacity/load
Tape archive: STK tape silo, > 3.4 PB
Disks: 177 TB in Oracle RDBMS (web accessible via applet or servlet)
Bandwidth between compute and data server: 450 MB/s
1 TB/day automated filling at model run time (IPCC)
3.4 PB of data in files (number of files: 67,263)
Number of experiments: 570
> 1000 requests per day
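The load figures above can be combined into a few derived numbers. The sketch below assumes decimal SI units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and treats the 450 MB/s link as fully available; both are simplifications, not statements about how the system was actually operated.

```python
# Figures quoted on the slide (November 2005); decimal SI units are an assumption.
ARCHIVE_BYTES = 3.4e15       # 3.4 PB of data in files
FILE_COUNT = 67_263          # number of files
BANDWIDTH_BPS = 450e6        # 450 MB/s between compute and data server
DAILY_INGEST_BYTES = 1e12    # 1 TB/day automated filling (IPCC)

# Average size of one archived file
mean_file_size_gb = ARCHIVE_BYTES / FILE_COUNT / 1e9
# Time to move one day's ingest if the link were used exclusively for it
ingest_minutes = DAILY_INGEST_BYTES / BANDWIDTH_BPS / 60
# Share of one day's theoretical link capacity taken by the ingest
link_share = DAILY_INGEST_BYTES / (BANDWIDTH_BPS * 86_400)

print(f"mean file size: {mean_file_size_gb:.1f} GB")                 # 50.5 GB
print(f"one day's ingest at full bandwidth: {ingest_minutes:.0f} min")  # 37 min
print(f"share of daily link capacity: {link_share:.1%}")               # 2.6%
```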
History Architecture Usage
WDCC IPCC/DDC CEOP Other
CERA hosts the data of the World Data Center for Climate (WDCC)
• Maintained by M&D in cooperation with DKRZ and MPI-Met
• Collection and dissemination of data related to climate change (focus on georeferenced data)
• Access: WWW or FTP (on request)
WDCC IPCC/DDC CEOP Other
M&D and its CERA DB are recognized as a Data Distribution Centre (DDC) for IPCC model data
• Hosting (and distributing) a subset of IPCC data
• All monthly mean model data of AR4, TAR and SAR
WDCC IPCC/DDC CEOP Other
CERA-2 holds the CEOP data archive (Coordinated Enhanced Observing Period)
http://www.ceop.net
Strong cooperation with GEWEX, CLIVAR, CLiC, IGOS-P, CEOS
Web-based access to XML meta data and data files
WDCC IPCC/DDC CEOP Other
The Winter TopTen Program identifies the world’s largest and most heavily used databases.
Email received on 13 September: "… Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program! …"
(1) Grand prizes are awarded for first place winners in the All Environments categories only.
WDCC's CERA DB has been identified as the largest Linux DB.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf
Collaborations within the Climate Community Data Archive Initiative (distributed archive):
DFD/DLR, IPA/DLR, DOD, DWD, GFZ, PANGAEA/AWI, xDAT/PIK, CERA-2/PIK, ECMWF, CERA-2/DKRZ, BADC
CERA-2 meta data model
Core scheme:
- valid for all entries
Extensions:
- community-defined modules (e.g. PIK, DKRZ; PRISM to be defined?)
- user-defined local extensions
Structural flexibility:
- definable fields, tables, entry types and more
- flexible lists of valid values (LOV): extensible but controlled
Simple structure:
- blockwise table groups
- all CERA-2 blocks have a similar structure
- more complex structures go into CERA modules
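A list of valid values that is "extensible but controlled" behaves like a controlled vocabulary: unknown terms are rejected, and new terms enter only through an explicit extension step. The class `ListOfValues` and the format names below are hypothetical; this is an in-memory stand-in, not the CERA-2 implementation.

```python
class ListOfValues:
    """Hypothetical controlled vocabulary: extensible, but only explicitly."""

    def __init__(self, name: str, values: list[str]):
        self.name = name
        self._values = set(values)

    def validate(self, value: str) -> str:
        """Return value unchanged if allowed, otherwise raise ValueError."""
        if value not in self._values:
            raise ValueError(f"{value!r} is not a valid {self.name}; "
                             f"allowed: {sorted(self._values)}")
        return value

    def extend(self, value: str) -> None:
        """Deliberate, reviewable step that adds a new term to the list."""
        self._values.add(value)


data_formats = ListOfValues("data format", ["GRIB", "NetCDF", "ASCII"])
data_formats.validate("GRIB")   # accepted
data_formats.extend("HDF5")     # controlled extension
data_formats.validate("HDF5")   # now accepted as well
```

Keeping `extend` separate from `validate` is the point of the design: free-text input cannot silently grow the list.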
Core and Extensions
The CERA core meta data contain only the data common to most data in geophysics. They are compliant with the 1st level of the FGDC standard and sufficient to answer:
What data are stored? How to get assistance? How to get the data?
Little information is required, in order to make the model applicable to as many institutions and data as possible!
Schema and example at http://wini.wdc-climate.de
The core meta data system is extendible but not changeable (e.g. the CERA core table structure may not be changed).
Core and Extensions
Parameter Block: describes data topic, variable and unit
Metadata Entry: the central CERA block, providing information on
• the entry's title
• type and relation to other entries
• the project the data belong to
• a summary of the entry
• a list of general keywords related to the data
• creation and review dates of the metadata
Coverage: information on the volume of space-time covered by the data
Reference: any publication related to the data, together with the publication form
Status: status information like data quality, processing steps, etc.
Distribution: distribution information including access restrictions, data format and fees if necessary
Contact: data related to contact persons and institutes like distributor, investigator and owner of copyright
Spatial Reference: information on the coordinate system used
Core and Extensions
FGDC level 1; extension needed for grid description
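One way to picture a core entry is as a single record holding the blocks listed above, with module and local-extension content kept beside it rather than inside it. In the sketch below only the block names come from the slide; the field names, the flattening of each block to one text field, and the mapping from blocks to the three core questions are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CoreEntry:
    """Hypothetical in-memory view of one CERA core entry.

    Field names are illustrative; only the block names come from the slide.
    """
    # Metadata Entry block
    title: str
    entry_type: str
    project: str
    summary: str
    keywords: list[str] = field(default_factory=list)
    # Remaining core blocks, flattened to one text field each for brevity
    parameter: Optional[str] = None          # topic, variable and unit
    coverage: Optional[str] = None           # space-time volume covered
    reference: Optional[str] = None          # related publications
    status: Optional[str] = None             # quality, processing steps
    distribution: Optional[str] = None       # access restrictions, format, fees
    contact: Optional[str] = None            # distributor, investigator, copyright owner
    spatial_reference: Optional[str] = None  # coordinate system
    # Module / local-extension content is kept beside the core, not inside it
    extensions: dict[str, dict] = field(default_factory=dict)


def answerable(entry: CoreEntry) -> dict[str, bool]:
    """Which core questions the filled-in blocks can answer.

    The question-to-block mapping is an assumption made for illustration.
    """
    return {
        "What data are stored?": bool(entry.summary and entry.parameter),
        "How to get assistance?": entry.contact is not None,
        "How to get the data?": entry.distribution is not None,
    }


entry = CoreEntry(
    title="Example monthly means",
    entry_type="dataset",
    project="Example project",
    summary="Monthly mean near-surface temperature",
    parameter="air temperature [K]",
    contact="data distributor",
)
entry.extensions["DATA_ORG"] = {"space_id": 100, "time_id": 200}
print(answerable(entry))  # the Distribution block is still empty
```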
The Core structure
Additionally: Modules / Local Extensions
Module DATA_ORGANIZATION (grid structure)
Module DATA_ACCESS (physical storage)
Local extension for specific information on (e.g.)
• data usage
• data access and data administration
Core and Extensions
Core and Extensions
ENTRY (core): entry_id, ...
PARAMETER (core): entry_id, .., data_org_id, data_access_id, ...
DATA_ORG (module): data_org_id, data_org_descr, space_id, time_id
DATA_ACCESS (module): data_access_id, access_structure_id, storage1_id, storage2_id, storage3_id, storage4_id, rec_structure_id, modification_date
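The table layout above links the core PARAMETER table to the two module tables only through the keys `data_org_id` and `data_access_id`. The sketch below reproduces that linkage in SQLite via Python's `sqlite3` module. Column types, the `title` column of ENTRY, and all inserted values are assumptions; the production system is Oracle, and the elided core columns are not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE entry (
    entry_id INTEGER PRIMARY KEY,
    title    TEXT                  -- illustrative; other core columns elided
);
CREATE TABLE data_org (            -- module table
    data_org_id    INTEGER PRIMARY KEY,
    data_org_descr TEXT,
    space_id       INTEGER,
    time_id        INTEGER
);
CREATE TABLE data_access (         -- module table
    data_access_id      INTEGER PRIMARY KEY,
    access_structure_id INTEGER,
    storage1_id INTEGER, storage2_id INTEGER, storage3_id INTEGER, storage4_id INTEGER,
    rec_structure_id    INTEGER,
    modification_date   TEXT
);
CREATE TABLE parameter (           -- core table; points into the modules by key
    entry_id       INTEGER REFERENCES entry(entry_id),
    data_org_id    INTEGER REFERENCES data_org(data_org_id),
    data_access_id INTEGER REFERENCES data_access(data_access_id)
);
""")

# Hypothetical example rows
conn.execute("INSERT INTO entry VALUES (1, 'Example dataset')")
conn.execute("INSERT INTO data_org VALUES (10, 'regular 2.5 degree lat/lon grid, monthly', 100, 200)")
conn.execute("INSERT INTO data_access VALUES (20, 1, 5, NULL, NULL, NULL, 3, '2005-11-15')")
conn.execute("INSERT INTO parameter VALUES (1, 10, 20)")

row = conn.execute("""
    SELECT e.title, o.data_org_descr, a.modification_date
    FROM parameter p
    JOIN entry e       ON e.entry_id = p.entry_id
    JOIN data_org o    ON o.data_org_id = p.data_org_id
    JOIN data_access a ON a.data_access_id = p.data_access_id
    WHERE p.entry_id = ?
""", (1,)).fetchone()
print(row)
```

Because the core only stores the foreign keys, the module tables can gain columns without touching the core table structure, which matches the rule that the core is extendible but not changeable.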
CERA: Module Example
Core and Extensions
DATA_ORG module
• data_org_descr / name / acronym
• space_id: key of the table with space information; gridded or point data (station data, buoys, ships, …); gridded data only with lat/lon coordinates
• time_id: key of the table with time information (grid)
=> any data value is locatable in space / time
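For gridded data, "locatable in space / time" amounts to mapping array indices to coordinates using the grid descriptions behind `space_id` and `time_id`. The sketch assumes a regular lat/lon grid and a constant time step; the function `locate`, its parameters and the example grid are hypothetical. Point data (stations, buoys, ships) would instead look up coordinates in a list.

```python
from datetime import datetime, timedelta


def locate(i_lon: int, j_lat: int, k_time: int, *,
           lon0: float, dlon: float, lat0: float, dlat: float,
           t0: datetime, dt: timedelta) -> tuple[float, float, datetime]:
    """Map indices of a regular lat/lon/time grid to (lon, lat, time)."""
    return (lon0 + i_lon * dlon, lat0 + j_lat * dlat, t0 + k_time * dt)


# Hypothetical grid: 2.5-degree global grid starting at 0E / 90N, 6-hourly steps.
GRID = dict(lon0=0.0, dlon=2.5, lat0=90.0, dlat=-2.5,
            t0=datetime(2000, 1, 1), dt=timedelta(hours=6))

print(locate(4, 36, 5, **GRID))  # lon 10.0, lat 0.0, 2000-01-02 06:00
```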
Meta data not in the CERA core can be defined in new modules. Presently:
• DATA_ORG module
• DATA_ACCESS module
Presently there is little information in CERA on model code (= NMM code base) or on model configurations (= NMM models)
=> define a model meta data module
• A minimum of specifications should be required (allowing a model run to be reproduced exactly)
• Most specifications should be optional
Core and Extensions
A minimum of specifications should be required (allowing a model run to be reproduced exactly):
• Components involved
• Code repository for each component
• Code release numbers for each component
• Compile scripts
• Namelists
• Initial data files
• Forcing data files
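The required items above could be captured in a small record with a completeness check, leaving everything else in an optional bag of key/value pairs. This is a hypothetical structure for illustration only; `ComponentSpec`, `ModelRunSpec`, `missing()` and the example repository URL and file names are not part of CERA.

```python
from dataclasses import dataclass, field

# Required lists taken from the slide; names are illustrative.
REQUIRED_LISTS = ("components", "namelists", "initial_data_files", "forcing_data_files")


@dataclass
class ComponentSpec:
    """One model component (hypothetical structure)."""
    name: str
    repository: str       # code repository of the component
    release: str          # code release number
    compile_script: str


@dataclass
class ModelRunSpec:
    """Required items from the slide; further details go into `optional`."""
    components: list[ComponentSpec]
    namelists: list[str]
    initial_data_files: list[str]
    forcing_data_files: list[str]
    optional: dict[str, str] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Names of required items that are still empty."""
        gaps = [name for name in REQUIRED_LISTS if not getattr(self, name)]
        for comp in self.components:
            for attr in ("repository", "release", "compile_script"):
                if not getattr(comp, attr):
                    gaps.append(f"{comp.name}.{attr}")
        return gaps


run = ModelRunSpec(
    components=[ComponentSpec("atmosphere", "https://repo.example.org/atm", "", "build_atm.sh")],
    namelists=["atm.nml"],
    initial_data_files=["atm_init.nc"],
    forcing_data_files=[],
)
print(run.missing())  # ['forcing_data_files', 'atmosphere.release']
```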
Core and Extensions
Most specifications should be optional: all of the required items above can be split into small pieces of information and included in the right place of the meta data / tables
Core and Extensions
CF standard
CF standard compliance:
• Any data file with any file format can be an entry of CERA
• CERA primarily contains GRIB single-variable data files
• Support for the NetCDF/CF file format is being implemented:
- adding meta data elements for the NetCDF/CF attributes if needed
- e.g. an additional CF_UNIT table
- optional retrieval of fine-grained data time windows
- search by NetCDF/CF attributes
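Retrieving a time window from a NetCDF/CF file starts with decoding the time coordinate, whose `units` attribute has the form "<unit> since <reference time>". The sketch handles only a restricted form of that string (days/hours/minutes/seconds since YYYY-MM-DD with optional hh:mm:ss) and uses Python's proleptic Gregorian `datetime`, so it ignores other CF calendars and time-zone offsets. The function names are hypothetical; real code would normally rely on a dedicated CF time library.

```python
import re
from datetime import datetime, timedelta

_SECONDS_PER_UNIT = {"days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}
_UNITS_RE = re.compile(
    r"\s*(days|hours|minutes|seconds)\s+since\s+"
    r"(\d{4}-\d{2}-\d{2})(?:[ T](\d{2}:\d{2}:\d{2}))?\s*")


def parse_cf_time_units(units: str) -> tuple[int, datetime]:
    """Return (seconds per unit, reference time) for a restricted CF units string."""
    m = _UNITS_RE.fullmatch(units)
    if m is None:
        raise ValueError(f"unsupported time units: {units!r}")
    unit, date, time = m.groups()
    return _SECONDS_PER_UNIT[unit], datetime.fromisoformat(f"{date}T{time or '00:00:00'}")


def time_window(values: list[float], units: str,
                start: datetime, end: datetime) -> list[int]:
    """Indices of time values that fall in the half-open window [start, end)."""
    seconds, ref = parse_cf_time_units(units)
    return [i for i, v in enumerate(values)
            if start <= ref + timedelta(seconds=v * seconds) < end]


times = [0, 6, 12, 18, 24, 30]  # 6-hourly values
print(time_window(times, "hours since 2000-01-01",
                  datetime(2000, 1, 1, 12), datetime(2000, 1, 2)))  # [2, 3]
```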
Other standards
XSL scripts exist to transform the CERA meta data into other standards/formats:
• XHTML
• DIF (NASA) - XML
• CSDGM (FGDC) - XML
• ISO/TC211 (19115/19139) - XML
• Dublin Core - XML
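The transformations above are done with XSL. As a rough stand-in, the Python sketch below shows the same idea for Dublin Core: each CERA field is mapped to a Dublin Core element in the `http://purl.org/dc/elements/1.1/` namespace. The field names, the field-to-element mapping and the `metadata` wrapper element are assumptions, not the mapping used by the CERA stylesheets.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# Hypothetical mapping from CERA-style field names to Dublin Core elements.
FIELD_MAP = {
    "title": "title",
    "summary": "description",
    "keywords": "subject",
    "investigator": "creator",
    "data_format": "format",
    "entry_id": "identifier",
}


def to_dublin_core(entry: dict) -> str:
    """Serialize the mapped fields of one entry as Dublin Core XML."""
    root = ET.Element("metadata")  # wrapper element name is illustrative
    for field_name, dc_name in FIELD_MAP.items():
        value = entry.get(field_name)
        if value is None:
            continue
        # Lists (e.g. keywords) become one repeated element per item
        for item in value if isinstance(value, list) else [value]:
            ET.SubElement(root, f"{{{DC}}}{dc_name}").text = str(item)
    return ET.tostring(root, encoding="unicode")


example_entry = {
    "entry_id": 42,
    "title": "Example monthly means",
    "summary": "Monthly mean near-surface temperature",
    "keywords": ["temperature", "climate model"],
    "investigator": "Example Institute",
    "data_format": "GRIB",
}
xml_text = to_dublin_core(example_entry)
print(xml_text)
```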
The End