Data Discovery and Basic Processing within the German Collaborative Climate Community Data and Processing Grid (C3Grid) Project

Heinrich Widmann and Stephan Kindermann
Model and Data / DKRZ / Max-Planck-Institute for Meteorology, Hamburg, Germany
GO-ESSP at LLNL, Livermore, June 19th – 21st, 2006
C3Grid Home: www.c3grid.de

Page 1: Heinrich Widmann and Stephan Kindermann

H. Widmann (M&D) Data Discovery and Processing within C3Grid GO-ESSP/LLNL / June, 19th 2006 / 1

Data Discovery and Basic Processing within the German Collaborative Climate Community Data and Processing Grid (C3Grid) Project

Heinrich Widmann and Stephan Kindermann
Model and Data / DKRZ / Max-Planck-Institute for Meteorology
Hamburg, Germany

GO-ESSP at LLNL
Livermore, June 19th – 21st, 2006

C3Grid Home: www.c3grid.de

Page 2: Heinrich Widmann and Stephan Kindermann


Overview

• C3Grid Background
• Data Analysis Workflows
• C3Grid Architecture and Interfaces
• Data Discovery and Metadata in C3Grid
• Data Information Service with Lucene
• Data Access and Preprocessing
• Summary

Page 3: Heinrich Widmann and Stephan Kindermann


C3Grid Background

• C3Grid
  – Status: month 10 of 36 (phase 1)
  – is the earth system science community grid within the German D-Grid initiative
  – D-Grid includes five further community grid projects (AstroGrid, HEP-Grid, InGrid, MediGrid, TextGrid)
  – is a community driven grid

Goal is to develop a grid infrastructure appropriate for typical climate analysis workflows

Stepwise introduction and integration

Page 4: Heinrich Widmann and Stephan Kindermann


C3Grid Data Analysis Workflow Requirements

Requirements
• Metadata
• Discovery
• Data access (+ preprocessing)
• Security
• Scheduling
• Complex processing

Grid technologies
• ISO 19115 / ISO 19139
• OAI-PMH + Lucene
• community webservice
• Shibboleth
• Globus Toolkit 4
• WS-GRAM

Page 5: Heinrich Widmann and Stephan Kindermann


C3Grid Architecture and Interfaces

[Architecture diagram; labels: Data Discovery, Data Access and Basic Processing]

Page 6: Heinrich Widmann and Stephan Kindermann


C3Grid Data Discovery and Data Access

[Diagram: data discovery and data access paths. Recoverable labels:]
• Portal: discovery, workflow composition
• C3 Metadata catalog; OAI harvester; OAI-PMH; web server / OAI provider
• Metadata: ISO 19115 / 19139 (discovery, use); Grid Infrastructure Metadata
• Resource providers: World Data Centers (Climate, Mare, RSAT), DWD, PIK, IFM-Geomar, ...
• Provider storage: DB, files; Prop. XML, Prop. Rel.
• Data request: oids, time/space constraints, processing constraints
• Data Access Web Service; preprocessing; data; workspace
• Scheduling; Data Management Service
• WS-GRAM: job submission, analysis job
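The harvesting path named on this slide (OAI provider, OAI harvester, OAI-PMH) can be illustrated with a minimal sketch of how a harvester might form its requests. `ListRecords`, `metadataPrefix` and `resumptionToken` are standard OAI-PMH arguments; the endpoint URL and the prefix name `iso19139` are assumptions for illustration and are not taken from the slides.

```python
from urllib.parse import urlencode

# Hypothetical provider endpoint; not taken from the slides.
BASE_URL = "https://provider.example.org/oai"


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb may be sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# The prefix name "iso19139" is an assumption; providers advertise their
# supported prefixes via the ListMetadataFormats verb.
first_page = list_records_url(BASE_URL, "iso19139")
next_page = list_records_url(BASE_URL, "iso19139", resumption_token="token-1")
print(first_page)
print(next_page)
```

A harvester would repeat the second form until the provider returns an empty resumption token, then hand the collected records to the catalog.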

Page 7: Heinrich Widmann and Stephan Kindermann


<MD_Metadata http://www.isotc211.org/xxx">

<fileIdentifier ../>

<resourceConstraints ../>

<extent … spatial+temporal bounding box .. />

<contentInfo ..>

<attributeDescription ../>

<distributionInfo ..>

<DS_Series>

<composed_of>

<composed_of>

</MD_Metadata>

<MD_Metadata …. >

<MD_Metadata …. >

C3 ISO 19139 Metadata “Profile”

Data items:
• gridded data

Postprocessed experiment data:
• 2D single variable time series

Raw experiment data:
• 3D multi variable files

[Diagram labels: Metadata Database, Archive Database, “implicit” metadata, post-processing]
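To make the role of the identifier and the spatial/temporal bounding box in discovery more concrete, here is a minimal sketch that reads those fields from a record and tests them against a query region. The record is a simplified, hypothetical stand-in: its flat element and attribute names loosely follow the slide and are not the real namespaced ISO 19139 schema, and all values are invented.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical record. Element and attribute names loosely follow
# the slide; real ISO 19139 documents are namespaced and more deeply nested.
RECORD = """
<MD_Metadata>
  <fileIdentifier>example:0001</fileIdentifier>
  <extent min_lat="-23.23" max_lat="30.43" min_lon="-10.0" max_lon="40.0"
          begin="1990-01-01" end="1999-12-31"/>
  <contentInfo>
    <attributeDescription>air_temperature</attributeDescription>
  </contentInfo>
</MD_Metadata>
"""


def parse_record(xml_text):
    """Extract the fields a discovery service would typically index."""
    root = ET.fromstring(xml_text.strip())
    extent = root.find("extent")
    keys = ("min_lat", "max_lat", "min_lon", "max_lon")
    return {
        "identifier": root.findtext("fileIdentifier"),
        "bbox": tuple(float(extent.get(k)) for k in keys),
        "period": (extent.get("begin"), extent.get("end")),
        "variables": [e.text for e in root.iter("attributeDescription")],
    }


def overlaps(bbox, query):
    """True if two (min_lat, max_lat, min_lon, max_lon) boxes intersect.

    Boxes crossing the antimeridian are not handled in this sketch.
    """
    return (bbox[0] <= query[1] and query[0] <= bbox[1]
            and bbox[2] <= query[3] and query[2] <= bbox[3])


record = parse_record(RECORD)
print(record["identifier"], overlaps(record["bbox"], (0.0, 10.0, 0.0, 10.0)))
```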

Page 8: Heinrich Widmann and Stephan Kindermann


C3Grid Data Information Service with Lucene

[Diagram: Data Information Service (DIS). Recoverable labels:]
• Sources: CERA, Pangaea, Archive; web server; OAI-PMH
• DIS: harvesting backend; cache for ISO 19139 documents (<MD_Metadata>...</MD_Metadata>); indexing of selected fields; full-text index (Apache Lucene); web service frontend (Apache Axis + Servlet Container)
• Portal

Inverted index:

Field       Term         Document
identifier  ABC:123      2
identifier  XYZ:223      6
identifier  MI6:007      12
abstract    region       2,23,112
abstract    pressure     3,23
abstract    humid        4,33,215,6,4
min_lat     030.43       1
min_lat     -023.23      2
local       file://path/ 4

[T. Langhammber, ZIB, Berlin]
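The inverted index on this slide maps a (field, term) pair to the documents that contain it. The following is a minimal sketch of that data structure in plain Python, not of the Lucene API; the documents, the choice of indexed fields and the whitespace tokenizer are assumptions for illustration (Lucene would additionally apply analyzers such as stemming).

```python
from collections import defaultdict

# Indexed fields chosen for illustration only.
FIELDS = ("identifier", "abstract")


def build_index(docs):
    """Map each (field, term) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in FIELDS:
            # Naive tokenizer: lowercase and split on whitespace.
            for term in doc.get(field, "").lower().split():
                index[(field, term)].add(doc_id)
    return index


def search(index, field, *terms):
    """AND query: ids of documents whose field contains every term."""
    postings = [index.get((field, t.lower()), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical documents; ids and texts are invented for this example.
docs = {
    2: {"identifier": "ABC:123", "abstract": "Surface pressure in the region"},
    3: {"identifier": "DEF:456", "abstract": "Sea level pressure"},
    23: {"identifier": "GHI:789", "abstract": "Pressure fields for the Baltic region"},
}
index = build_index(docs)
print(search(index, "abstract", "pressure"))
print(search(index, "abstract", "region", "pressure"))
print(search(index, "identifier", "ABC:123"))
```

Keeping the field as part of the key is what lets a query such as "pressure in the abstract" be answered without scanning the cached ISO 19139 documents themselves.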

Page 9: Heinrich Widmann and Stephan Kindermann


C3Grid Portal – Simple search

Page 10: Heinrich Widmann and Stephan Kindermann


C3Grid Portal – Advanced search

Page 11: Heinrich Widmann and Stephan Kindermann


C3Grid Data Access and Preprocessing

• Data access interface
  – Community-specific webservice (WSDL)
  – Solutions of the individual institutes will be adapted to support the webservice
    • e.g. triggering of local data processing tools
  – Support for database and file-based storage types
  – More detailed use metadata will be provided with the data during the extraction process

Page 12: Heinrich Widmann and Stephan Kindermann


C3Grid Data Access/Preprocessing Interface

Stage file webservice request (SOAP-XML StageFileRequest) contains:
• ObjectList of OIDs requested
• CFList of standard names
• Space constraints
• Time constraints
• Target directory
• File format, e.g. netCDF or grib
• …

[Diagram labels: Data Access Web service; access to DB and files; constraints and necessary processing (CDO processing); CF standard names mapped to local variable names; data]
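As a rough illustration of how a provider might turn such a stage file request into local CDO processing, the sketch below models the listed request fields and derives a CDO command line from them. The class layout, the field names, the CF-to-local name mapping and the file paths are all hypothetical; only the CDO operators used (`sellonlatbox`, `seldate`, `selname`, and the `-f` output format option) are existing CDO features. The command is only built, not executed.

```python
from dataclasses import dataclass

# Hypothetical mapping from CF standard names to one archive's local
# variable names; a real provider would maintain its own table.
CF_TO_LOCAL = {"air_temperature": "temp2", "precipitation_flux": "precip"}

# CDO output format names for the two formats mentioned on the slide.
CDO_FORMATS = {"netCDF": "nc", "grib": "grb"}


@dataclass
class StageFileRequest:
    """Illustrative stand-in for the request fields listed on the slide."""
    oids: list                 # object ids of the requested data sets
    cf_names: list             # CF standard names of requested variables
    bbox: tuple                # (lon_min, lon_max, lat_min, lat_max)
    time_range: tuple          # (start, end) as ISO dates
    target_dir: str
    file_format: str = "netCDF"


def cdo_command(request, infile, outfile):
    """Translate the request constraints into a chained CDO call."""
    local_names = [CF_TO_LOCAL[name] for name in request.cf_names]
    lon1, lon2, lat1, lat2 = request.bbox
    start, end = request.time_range
    return [
        "cdo", "-f", CDO_FORMATS[request.file_format],
        f"-sellonlatbox,{lon1},{lon2},{lat1},{lat2}",
        f"-seldate,{start},{end}",
        f"-selname,{','.join(local_names)}",
        infile,
        f"{request.target_dir}/{outfile}",
    ]


request = StageFileRequest(
    oids=["example:0001"],
    cf_names=["air_temperature"],
    bbox=(0, 30, 40, 60),
    time_range=("1990-01-01", "1990-12-31"),
    target_dir="/tmp/stage",
)
command = cdo_command(request, "in.grb", "out.nc")
print(" ".join(command))
```

Returning an argument list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without shell quoting, should a provider choose to execute it.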

Page 13: Heinrich Widmann and Stephan Kindermann


Summary

• Grid development is application driven
• Discovery is based on
  – ISO 19115/19139 based metadata catalog
  – Hierarchical, two-level metadata scheme
  – Text based search in the catalog
• Data access is implemented by
  – Proprietary C3Grid data access interface (webservice)
• Part of the use metadata is provided along with the data extraction

Page 14: Heinrich Widmann and Stephan Kindermann


The end

Page 15: Heinrich Widmann and Stephan Kindermann


C3Grid Architecture

[Architecture diagram. Recoverable labels:]
• User: User Interface, API (Web Services), GUI, Monitoring, Job Submission
• DMS (global); Workflow Scheduler; Resource Information Service; DIS
• Functions: Search, Harvesting, Staging, Matchmaking, Task Execution
• Site C3Grid components: DMS (local), OAI / WS, Pre-Proc, Grid Workspace, Resource Scheduler, Archive Interface, File Management, Data Transfer Service
• Data: Meta Data, Job Data, Base Data & Meta Data, DBMS/File, Available Resources
• Distributed Processing Resources; Distributed Data Archives
• Distributed Grid Infrastructure: GT4 based, new Metadata-Service