PV 2007 CONFERENCE, Germany, Oberpfaffenhofen, October 9 - 11, 2007

The ISDC concept for long-term sustainability

of geoscience data and information

B. Ritschel

ISDC Team (V. Mende, H. Palm¹, Ch. Bruhns², R. Kopischke², S. Freiberg³, L. Gericke³)

¹ left the group, ² administrator, ³ university student

The Electronic Geophysical Year

Information in time and space

Problem: Digital information sustainability

ISDC portal homepage: isdc.gfz-potsdam.de

ISDC collaboration projects (1)

• CHAMP and GRACE satellite missions: Orbit/Gravity + Magnetic/Electric Field + Atmosphere data

• GGP (Global Geodynamics Project): Superconducting Gravimeter + auxiliary data

• GNSS (Galileo testbed phase 1 project): GFZ Potsdam GPS ground station data

• GPS-PDR (Potsdam-Dresden-Reprocessing): GPS Orbit + Earth rotation parameter data

ISDC collaboration projects (2)

• GGSP (Galileo Geodetic Service Provider): GPS time series, orbit, ERP, SLR, auxiliary data

• TerraSAR-X: Orbit + Atmosphere/Ionosphere data

• ICGEM (International Centre for Global Earth Models): Global Earth gravity models

• GGOS (Global Geodetic Observing System)

Number and volume of data

• 288 different product types from different geoscience domains
– 92 + 20* product types for public use
– 4 product types with extended rights for specific science team members
– 180 product types for internal use only

• 10.2 terabytes of data
• 15.9 million products

• 1576 national and international users and user groups
• 19 data providers (scientific groups): GFZ + NASA

*TerraSAR-X (coming soon)
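As a rough orientation, the totals above imply an average product size of about 0.64 MB. This assumes decimal units (1 TB = 10^12 bytes) and says nothing about how sizes are distributed across product types:

```latex
\bar{s} = \frac{10.2\ \text{TB}}{15.9 \times 10^{6}\ \text{products}}
        = \frac{10.2 \times 10^{12}\ \text{B}}{15.9 \times 10^{6}}
        \approx 6.4 \times 10^{5}\ \text{B} \approx 0.64\ \text{MB per product}
```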

ISDC portal – Product Types: isdc.gfz-potsdam.de/product_types

ISDC users by country (graph)

1576 registered and active users and user groups

ISDC user development graph

User development (2007-10-05)

1596 registered users and user groups

Product data flow

Number of data files per time (2007-10-05)

ISDC architecture schema

http://gcmd.nasa.gov/User/difguide/difman.html

ISDC Metadata Standard = Parent DIF (V. 9.0) + Extended Child DIF(s)*

*in preparation

[Figure labels: Metadata, Producer, Data pump, User]

DIF metadata fields (extract)

Required fields

• Entry ID
• Entry Title
• Parameters (Science Keywords)
• ISO Topic Category
• Data Center
• Summary
• Metadata_Name
• Metadata_Version

Further fields

• Personnel
• Data Set Citation
• Instrument
• Platform
• Temporal Coverage
• Paleo-Temporal Coverage
• Data Set Progress
• Spatial Coverage
• Location
• Data Resolution
• Project
• Keyword (Ancillary Keyword)
• Quality
• Access Constraints
• Use Constraints
• Data Set Language
• Originating Center
• Distribution
• Multimedia Sample
• Reference
• Discipline
• Related URL
• Parent DIF
• …

Science keyword vocabulary: 5-level hierarchical classification
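A catalog ingest step could check the required DIF fields before accepting a record. The Python sketch below is an illustration, not the ISDC implementation; it assumes the DIF XML element names `Entry_ID`, `ISO_Topic_Category` and so on as the spelling of the slide's field names, and it treats a group element such as `Parameters` as present when it has sub-elements.

```python
import xml.etree.ElementTree as ET

# Required DIF fields, spelled as DIF XML element names (assumed spelling).
REQUIRED_FIELDS = (
    "Entry_ID", "Entry_Title", "Parameters", "ISO_Topic_Category",
    "Data_Center", "Summary", "Metadata_Name", "Metadata_Version",
)


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix of the form '{uri}'."""
    return tag.rsplit("}", 1)[-1]


def missing_required(dif_xml: str) -> list[str]:
    """Return required DIF fields that are absent or empty in a DIF document."""
    root = ET.fromstring(dif_xml)
    present = {
        local_name(child.tag)
        for child in root
        # a field counts if it has text or (for groups such as Parameters) sub-elements
        if len(child) > 0 or (child.text or "").strip()
    }
    return [name for name in REQUIRED_FIELDS if name not in present]
```

For a complete record the function returns an empty list; otherwise the list names what a data provider still has to supply.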

GCMD's Science Keywords and Associated Directory Keywords

Example of the structure of science keywords:

EARTH SCIENCE > Solid Earth > Geodetics/Gravity > Satellite Orbits
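Because science keywords form a hierarchy written as `Level > Level > ...`, a search for a broader keyword can match every product tagged with a narrower one. The Python sketch below shows that prefix matching; the function names are hypothetical, and the case-insensitive comparison is an assumption rather than documented GCMD behavior.

```python
MAX_LEVELS = 5  # the vocabulary is described as a 5-level hierarchy


def parse_keyword(path: str) -> list[str]:
    """Split an 'A > B > C' science keyword path into its levels."""
    levels = [part.strip() for part in path.split(">")]
    if any(not level for level in levels):
        raise ValueError(f"empty level in keyword path: {path!r}")
    if len(levels) > MAX_LEVELS:
        raise ValueError(f"more than {MAX_LEVELS} levels: {path!r}")
    return levels


def keyword_matches(query: str, keyword: str) -> bool:
    """True if `keyword` equals `query` or lies below it in the hierarchy."""
    q = [level.casefold() for level in parse_keyword(query)]
    k = [level.casefold() for level in parse_keyword(keyword)]
    return k[: len(q)] == q


keyword = "EARTH SCIENCE > Solid Earth > Geodetics/Gravity > Satellite Orbits"
print(keyword_matches("EARTH SCIENCE > SOLID EARTH", keyword))  # True
```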

ISDC and GCMD¹ DIF²

• Product-type-independent base DIF V.9 XML schema
• Product-type-dependent child DIF XML schemata

• Product type referencing parent DIF V.9 XML document
• Product (data file) referencing child DIF XML documents (containing skinny DIF + data file describing data)

Generating GCMD DIF standard-compliant metadata documents

¹NASA's Global Change Master Directory, ²Directory Interchange Format

ISDC Parent DIF V.9 XML document

CH-OG-3-RSO (XML schema: base-dif.xsd)

<DIF xmlns:... "http://isdc.gfz-potsdam.de/xsd/base-dif.xsd">

...

ISDC Child DIF V.9 XML document

CH-OG-3-RSO+CTS-CHA_2000_219_10

<Parent_DIF>CH-OG-3-RSO</Parent_DIF>

<Data_Parameters> element (XML schema: CH-OG-3-RSO.xsd)
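The parent-child principle can be illustrated with Python's standard `xml.etree.ElementTree`: a skinny child DIF carries its own `Entry_ID`, a `<Parent_DIF>` reference to the product-type DIF, and product-specific `<Data_Parameters>`. This is a sketch under assumptions: the function names, the `Example_Parameter` element, and the in-memory dictionary standing in for the ISDC XML database are all hypothetical.

```python
import xml.etree.ElementTree as ET


def make_child_dif(product_id: str, parent_id: str, params: dict[str, str]) -> ET.Element:
    """Build a skinny child DIF that points at its product-type parent DIF."""
    dif = ET.Element("DIF")
    ET.SubElement(dif, "Entry_ID").text = product_id
    ET.SubElement(dif, "Parent_DIF").text = parent_id
    data = ET.SubElement(dif, "Data_Parameters")
    for name, value in params.items():
        ET.SubElement(data, name).text = value
    return dif


def resolve_parent(child: ET.Element, parents: dict[str, ET.Element]) -> ET.Element:
    """Look up the parent DIF named in <Parent_DIF>; `parents` stands in for an XML database."""
    parent_id = child.findtext("Parent_DIF")
    if parent_id is None or parent_id not in parents:
        raise KeyError(f"unknown parent DIF: {parent_id!r}")
    return parents[parent_id]


parent = ET.Element("DIF")
ET.SubElement(parent, "Entry_ID").text = "CH-OG-3-RSO"
child = make_child_dif(
    "CH-OG-3-RSO+CTS-CHA_2000_219_10",
    "CH-OG-3-RSO",
    {"Example_Parameter": "example value"},  # hypothetical data parameter
)
print(ET.tostring(child, encoding="unicode"))
print(resolve_parent(child, {"CH-OG-3-RSO": parent}).findtext("Entry_ID"))
```

Keeping product-type information only in the parent means each per-file child document stays small, which matters when one product type has many files.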

CH-OG-3-RSO Data Parameters

Mapping of standards

DIF <=> ISO XSL transformation

DIF XML metadata file (DIF Version 9.0 XSD)

ISO 19115 XML metadata file (ISO 19115/19139 XSD)

XSD: XML Schema Definition; XSL: Extensible Stylesheet Language
Source: http://en.wikipedia.org/wiki/Xslt
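The ISDC transformation uses XSLT. As a hedged illustration of what such a crosswalk does, the Python sketch below maps three DIF fields onto ISO 19139 elements with `xml.etree.ElementTree` instead of a stylesheet. The mapping table is a simplified assumption, it expects DIF elements without a namespace, and the output is only a fragment: a schema-valid ISO 19115 record needs further mandatory elements (for example contact and dateStamp).

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

# Simplified crosswalk (assumption): DIF element -> gmd path below gmd:MD_Metadata.
CROSSWALK = {
    "Entry_ID": ["fileIdentifier"],
    "Entry_Title": ["identificationInfo", "MD_DataIdentification",
                    "citation", "CI_Citation", "title"],
    "Summary": ["identificationInfo", "MD_DataIdentification", "abstract"],
}


def dif_to_iso(dif: ET.Element) -> ET.Element:
    """Copy mapped DIF fields into an ISO 19139 gmd:MD_Metadata fragment."""
    md = ET.Element(f"{{{GMD}}}MD_Metadata")
    for dif_name, path in CROSSWALK.items():
        value = dif.findtext(dif_name)
        if value is None:
            continue  # field absent in the DIF record
        node = md
        for step in path:
            # reuse an existing element so shared paths are not duplicated
            child = node.find(f"{{{GMD}}}{step}")
            if child is None:
                child = ET.SubElement(node, f"{{{GMD}}}{step}")
            node = child
        ET.SubElement(node, f"{{{GCO}}}CharacterString").text = value.strip()
    return md


dif = ET.fromstring(
    "<DIF><Entry_ID>CH-OG-3-RSO</Entry_ID>"
    "<Entry_Title>Example title</Entry_Title><Summary>Example summary</Summary></DIF>"
)
print(ET.tostring(dif_to_iso(dif), encoding="unicode"))
```

An XSLT stylesheet expresses the same field-to-path mapping declaratively, which is why it suits a standard-to-standard conversion that must be maintained alongside both schemas.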

ISDC main system components deployment

ISDC storage management structure (being implemented)

Data lifecycle management (1)

• ISDC product philosophy (product = metadata + data)

• Providing data input via dedicated FTP directories for data providers (GFZ internal + external)

• Ensuring the sustainability of data by
– Long-term archiving (storage of the original and one copy)
– Online Product Archive (OPA)

• Filling and maintaining the ISDC product catalog using product-type-related and product-related metadata
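Long-term archiving of an original and one copy can be sketched as writing each delivered file to two locations and verifying both against a checksum. This Python example is an assumption-laden illustration, not the ISDC archive: the directory layout, the use of SHA-256, and the function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_product(src: Path, primary: Path, secondary: Path) -> str:
    """Write the delivered file to a primary and a secondary archive location.

    Each stored file is re-read and compared with the source checksum,
    so a silent copy error is reported instead of archived.
    """
    expected = sha256_of(src)
    for target_dir in (primary, secondary):
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / src.name
        shutil.copy2(src, target)
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
    return expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        product = root / "example_product.dat"  # hypothetical product file
        product.write_bytes(b"orbit data")
        print(archive_product(product, root / "archive_a", root / "archive_b"))
```

Storing the returned checksum in the catalog would also allow the two archive copies to be re-verified later.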

Data lifecycle management (2)

• GUI and API for product retrieval
– Product-type-dependent retrieval forms
– Product browser
– Product request list file for bulk requests

• Providing data output via dedicated FTP directories for users (GFZ internal + external)

• Personalization (selecting favorite product types, subscribing to Really Simple Syndication [RSS] feeds)

• User management, user forum, monitoring components
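The format of the product request list file is not specified here. Assuming a hypothetical plain-text format with one product name per line and `#` comments, a reader for it could look like this Python sketch, which drops duplicates and splits the request into batches:

```python
def read_request_list(text: str) -> list[str]:
    """Parse a hypothetical request list: one product name per line, '#' starts a comment."""
    seen: set[str] = set()
    products: list[str] = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in seen:  # skip blank lines and repeated names
            seen.add(name)
            products.append(name)
    return products


def batches(items: list[str], size: int) -> list[list[str]]:
    """Split a bulk request into chunks of at most `size` products."""
    if size < 1:
        raise ValueError("batch size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


request = """\
# bulk request (hypothetical format)
CH-OG-3-RSO+CTS-CHA_2000_219_10
CH-OG-3-RSO+CTS-CHA_2000_219_10   # duplicate, ignored
"""
print(batches(read_request_list(request), size=100))
```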

Data lifecycle management (3)

Missing tasks:

• Harmonization of data
• Tailoring of data
• Merging of data
• Aggregation of data
• Removal of data
– Prediction data
– Semi-finished products
– Backup files

=> Enhancement of data interoperability

=> Providing data for other scientific domains

=> Keeping the operational status

A science-driven data review process is necessary!

Service Oriented Architecture

Improving the interoperability of the ISDC portal system by using Service Oriented Architecture (SOA) techniques

Interoperability via OGC CSW
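An OGC CSW catalog can be queried with a key-value-pair GetRecords request. The Python sketch below builds such a URL using CSW 2.0.2 parameter names; the endpoint is a placeholder, and the free-text `AnyText LIKE` constraint is one possible query, not necessarily what the ISDC service supports.

```python
from urllib.parse import urlencode


def csw_getrecords_url(endpoint: str, text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request (KVP encoding) with a free-text filter."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "namespace": "xmlns(csw=http://www.opengis.net/cat/csw/2.0.2)",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # `text` is inserted as-is; quotes in it would need escaping for CQL
        "constraint": f"AnyText LIKE '%{text}%'",
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint; a real catalog URL would go here.
print(csw_getrecords_url("https://example.org/csw", "GRACE"))
```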

Networking data: Sensor Web concept*

*OGC® Sensor Web Enablement: Overview And High Level Architecture.

+ virtual sensors (database, data archive)**

**extended by the author

Mashup geoscientific data

Katrina Hurricane Tracking and Google Maps

Science video portal

www.scivee.tv

Techniques for Using Web 2.0*

• Dramatically lower the experience barrier

• Collect user contributions

• Enable formation of communities

• Become an open platform

• Provide self-evolving customer relationship management (CRM)

Differences in the way of interaction between data providers and users

*Dion Hinchcliffe’s Web 2.0 Blog

Connecting different worlds

Committee driven developments
• Metadata/Service standards
• Catalog Web services
• Data standards
• Data/Application services
• SOA approach

Community driven developments
• Mashups
• Social software
• Networks
• (corporate) Blogs
• Wikis
• Chats/Messenger
• Social navigation
• Tagging

Integration of sustainable Web techniques from both worlds: Web 3.0* (Semantic web + Web 2.0)

*W. Wahlster (DFKI), acatech Symposium, Berlin, 31 May 2007

ISDC activities (1)

• Preparing the TerraSAR-X data management

• Improving metadata (management) interoperability
– Using and developing the Directory Interchange Format Standard Version 9.x
– Changing from ASCII-based DIF to XML-based DIF documents
– Introducing a specific ISDC parent-child principle
– Using an XML database for parent DIF XML documents

• Providing thematic catalog product search using the ISDC catalog and user-generated (Web 2.0) metadata ontologies
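The change from ASCII-based to XML-based DIF documents can be pictured with a simplified converter. The Python sketch below assumes one `Field: value` entry per line and `Group:`/`End_Group` blocks for nesting; real ASCII DIF files follow further rules (for example multi-line fields) that this sketch ignores, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def ascii_dif_to_xml(text: str) -> ET.Element:
    """Convert simplified 'Field: value' ASCII DIF text into an XML element tree.

    Assumptions: one field per line, 'Group: Name' opens a nested element,
    'End_Group' closes it; continuation lines and other ASCII DIF rules are ignored.
    """
    root = ET.Element("DIF")
    stack = [root]  # innermost open element is stack[-1]
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line == "End_Group":
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: End_Group without Group")
            stack.pop()
            continue
        # split at the first colon only, so URLs in values stay intact
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'Field: value', got {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "Group":
            stack.append(ET.SubElement(stack[-1], value))
        else:
            ET.SubElement(stack[-1], key).text = value
    if len(stack) != 1:
        raise ValueError("unclosed Group at end of input")
    return root


sample = """\
Entry_ID: CH-OG-3-RSO
Group: Data_Center
  Data_Center_URL: http://isdc.gfz-potsdam.de/
End_Group
"""
print(ET.tostring(ascii_dif_to_xml(sample), encoding="unicode"))
```

Once the records are XML, they can be validated against an XSD and stored in an XML database, which the line-oriented ASCII form does not support directly.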

ISDC activities (2)

• Developing interoperable catalog and data services for distributing and networking metadata and data
– Catalog Web Services (OGC C-WS) and OAI-PMH
– Sensor Web Enablement (SWE)
– Virtual Observatory (VO; object-oriented ontology methods, OWL)
– Open-source Project for a Network Data Access Protocol (OPeNDAP)
– Evaluating Earth Science Markup Language (ESML)

• Providing information about the usage of data via user-driven activities like tagging and social navigation data
– Object-oriented approach (relations between product types)
– Different types of classification (project, scientific domain, application)
– Networking different semantic layers based on metadata created by data providers and users (Web 2.0 techniques)
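One way to derive relations between product types from user tagging is to compare their tag sets. The Python sketch below uses Jaccard similarity, a technique chosen here for illustration and not one named in the source; the product type names, tags, and the 0.2 threshold are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tags two product types have in common (0.0 if both are untagged)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_product_types(
    product_type: str, tags: dict[str, set[str]], min_score: float = 0.2
) -> list[tuple[str, float]]:
    """Other product types whose tag sets overlap enough, best match first."""
    own = tags[product_type]
    scored = [
        (other, jaccard(own, other_tags))
        for other, other_tags in tags.items()
        if other != product_type
    ]
    hits = [(name, score) for name, score in scored if score >= min_score]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical user tags per product type.
user_tags = {
    "orbit_product_A": {"orbit", "champ", "gravity"},
    "orbit_product_B": {"orbit", "grace", "gravity"},
    "ionosphere_product_C": {"ionosphere", "champ"},
}
print(related_product_types("orbit_product_A", user_tags))
```

Provider-defined relations (for example shared missions) and user-derived ones like these could then sit in separate semantic layers over the same catalog.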

ISDC activities (3)

• Developing a service for publication of data via unique identifiers (e.g. DOI, URN)

• ISDC has become part of the CEOS International Directory Network (IDN), a gateway to Earth science data and information maintained by NASA's GCMD

• Implementing framework software and preparing the ISDC DIF metadata XSLT for an ISO 19115-compliant CWS

• Providing information about, and access to, data-related e-print publications using OAI-PMH harvesting
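OAI-PMH harvesting works by requesting `ListRecords` and following the `resumptionToken` until it is empty. The Python sketch below builds the request URLs and extracts record identifiers from a response; the base URL is a placeholder, and `oai_dc` (the Dublin Core format every OAI-PMH repository must support) is assumed as the metadata format.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace in ElementTree form


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """First ListRecords request, or a follow-up request for the next page."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix alongside it
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return record identifiers and the resumption token (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    identifiers = [h.findtext(f"{OAI}identifier") for h in root.iter(f"{OAI}header")]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = (token_el.text or "").strip() if token_el is not None else ""
    return identifiers, token or None
```

A harvester would fetch `list_records_url(base)`, parse it, and keep requesting `list_records_url(base, resumption_token=token)` while the token is not `None`.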

ISDC activities (4)

• Integration of science application services
– Spatial visualization of retrieval result sets on maps
– Visualization of data products (e.g. profile data)

• Design of ISDC portal (version 3.x) using

• Active role in Global Geodetic Observing System project
– System design
– Software development
– ISDC as active data and service provider
– ISDC is part of GEOSS

Questions and Challenges

• How to improve interoperability concerning metadata and services?
– Different metadata standards (DIF, ISO, Dublin Core, …)
– OGC WCS standard but different metadata profiles (ISO 19115:profile xyz)
– Web 2.0 community is providing new techniques …

• How to make data and data products available for other domains (science and non-science)?
– Lack of information about processing the data (input data and models, processing software, constraints, original domain for the product)
– Lack of information about applications and domains where data are used
– Product tailoring (inter-domain knowledge is necessary)

Challenges and Tasks

• Providing sufficient funding for everything necessary to guarantee the long-term sustainability of data

• Improving awareness and understanding of ESSI concepts at the administrative and senior management levels

• Helping scientists take responsibility for making their data available to all interested communities
– Understanding metadata and Web services concepts
– Describing the process of product generation in a way that scientists from other domains can understand, so that they can use the data for their own purposes
– Providing data in different kinds and formats (tailored products)
– Overcoming personal egoism in keeping data and only publishing results (most difficult task)

Einsteinturm (1921)

Thank you for your attention.

ESSI Goals of GGOS

• Promote the data and products of the services and become the collective voice for IAG;

• Collect and archive, through interoperable** services, geodetic observations, products, and models and ensure their consistency, reliability and accessibility;

• Identify a consistent set of geodetic products generated by the services and establish the requirements concerning the products’ accuracy, time resolution, and consistency;

**added by the author

ESSI challenges of GGOS

• Approx. 1,000 different geodetic product types (covering all geodetic techniques and levels of processing)

• > 100,000,000 data sets, > 100 TB of data (distributed all over the world)

• Completely heterogeneous picture of data management across the different data providers (single scientist <=> world data center)

• Different data policies regarding access to data
• No common understanding of the meaning, the importance and the realization of IT-based geoscientific infrastructure