Page 1: GFDL Data Portal

GFDL Data Portal: Current Status, Achievements and Future Development

NOAATECH-2006

K.Dixon, V.Balaji, S.Nikonov GFDL, Princeton

Page 2: GFDL Data Portal

History

  • Data Portal was launched in 1995 as a simple FTP server.
  • The idea and the term “Data Portal” arose 3 years ago.
  • Originally it served data in response to occasional requests. Now the main assets are IPCC data.

Page 3: GFDL Data Portal

Common technical characteristics

Software

  • Red Hat Linux
  • Apache Web Server
  • DODS Aggregation Server
  • THREDDS
  • LAS Server
  • GrADS-DODS

Page 4: GFDL Data Portal

Hardware

  • Dell PowerEdge 2650 machine
  • Dual-processor Intel Xeon, 2.4 GHz
  • 3 GB RAM
  • 7 Dell PowerVault 220S with 14 HDs in each, 19 TB total (expansion pending up to 35 TB)
  • Network bandwidth: Internet, 9 Mbit/s; Internet2, 100 Mbit/s

Page 5: GFDL Data Portal

Web Site Structure

[Site map diagram. Nodes shown: Main GFDL Page, Data Portal, Model description, CM2.0 Model, CM2.1 Model, Ocean Data Assimilation Experiments, Ocean Simulation, LAS Server, Metadata, IPCC Data (HTTP protocol), IPCC Data (FTP protocol), Flexible Modeling System, Dataset.]

Page 6: GFDL Data Portal

Basic Metadata

  • Model description
  • Experiment description
  • Institution
  • Extra metadata for treating tripolar grids (including Ferret scripts for their visualization)
  • Metadata is compliant with the CF standard
  • Metadata accompanies each data file
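To illustrate the idea that metadata accompanies each data file, the following minimal sketch checks an attribute dictionary for the items listed on this slide. The attribute names `model_description`, `experiment_description` and `institution` are hypothetical stand-ins, not the portal's actual attribute names; `Conventions` is the global attribute in which CF-compliant files declare the convention version. The example values are illustrative only.

```python
# Hypothetical attribute names standing in for the items on the slide.
REQUIRED_ATTRIBUTES = (
    "model_description",
    "experiment_description",
    "institution",
    "Conventions",  # CF files declare compliance here, e.g. "CF-1.0"
)


def missing_metadata(attributes, required=REQUIRED_ATTRIBUTES):
    """Return the required attribute names that are absent or empty."""
    return [
        name for name in required
        if not str(attributes.get(name, "")).strip()
    ]


# Illustrative attribute set with one item deliberately left out.
example = {
    "model_description": "CM2.1 coupled climate model",
    "institution": "GFDL, Princeton",
    "Conventions": "CF-1.0",
}
print(missing_metadata(example))
```

A check of this kind could run as part of quality control before a file is published, so that incomplete files are caught early.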

Page 7: GFDL Data Portal

Basic features of the GFDL LAS server

  • Dynamic data presentation chosen by the user
  • Spatial/time subsampling with included metadata
  • Defining new variables on the fly, calculated by a given formula
  • Ferret visualization
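The subsampling and on-the-fly variable features can be pictured with a small pure-Python sketch. This is not how LAS or Ferret implement them; the function names `subsample` and `derive`, and the representation of a field as a list of rows, are assumptions made for illustration.

```python
def subsample(field, lats, lons, lat_range, lon_range, stride=1):
    """Cut a latitude/longitude box out of a 2-D field.

    `field` is a list of rows (one row per latitude). The result keeps the
    matching coordinates, so the subset stays self-describing.
    """
    lat_idx = [i for i, lat in enumerate(lats)
               if lat_range[0] <= lat <= lat_range[1]][::stride]
    lon_idx = [j for j, lon in enumerate(lons)
               if lon_range[0] <= lon <= lon_range[1]][::stride]
    return {
        "lat": [lats[i] for i in lat_idx],
        "lon": [lons[j] for j in lon_idx],
        "values": [[field[i][j] for j in lon_idx] for i in lat_idx],
    }


def derive(formula, *fields):
    """Build a new variable by applying `formula` cell by cell.

    All input fields must have the same shape.
    """
    return [[formula(*cells) for cells in zip(*rows)]
            for rows in zip(*fields)]


# Illustrative 3 x 3 temperature field in kelvin.
lats = [-10.0, 0.0, 10.0]
lons = [0.0, 90.0, 180.0]
temp_k = [[280.0, 281.0, 282.0],
          [290.0, 291.0, 292.0],
          [285.0, 286.0, 287.0]]

box = subsample(temp_k, lats, lons, lat_range=(0, 10), lon_range=(0, 90))
temp_c = derive(lambda t: t - 273.15, box["values"])
```

Returning the coordinates together with the values mirrors the slide's point that subsampled data keeps its metadata.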

Page 8: GFDL Data Portal

General Statistics, 01-Oct-2004 to 01-Oct-2005

  • Total amount of CM2 Climate Model Data: 12 TB
  • More than 10,000 NetCDF files, average file size: 1 GB
  • Successful requests: ~62,000
  • Average successful requests per day: ~200
  • Distinct files requested: 5,000
  • Distinct hosts served: ~850
  • Data transferred: 15 TB
  • Average data transferred per day: ~42 GB
  • Number of journal articles submitted that include analyses of GFDL CM2 model output: > 100
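The per-day figures can be re-derived from the yearly totals with a few lines of arithmetic. The sketch assumes a 365-day reporting period and binary units (1 TB = 1024 GB); the slide states neither. Under those assumptions the data volume comes out at about 42 GB per day, in line with the slide, while a straight division gives roughly 170 requests per day, so the quoted "~200" is apparently a coarser rounding or based on a different day count.

```python
# Totals quoted on the slide for 01-Oct-2004 to 01-Oct-2005.
DAYS = 365                    # assumption: full-year reporting period
GB_PER_TB = 1024              # assumption: binary units
successful_requests = 62_000  # "~62,000"
data_transferred_tb = 15      # "15 TB"

requests_per_day = successful_requests / DAYS
gb_per_day = data_transferred_tb * GB_PER_TB / DAYS
gb_per_request = data_transferred_tb * GB_PER_TB / successful_requests

print(f"requests per day:   {requests_per_day:.0f}")
print(f"GB per day:         {gb_per_day:.1f}")
print(f"GB per request:     {gb_per_request:.2f}")
```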

Page 9: GFDL Data Portal

Current standard procedure of publishing data

  • Climate Model Output Rewriter (CMOR) processing
      - configured manually for different models, experiments, variables
      - triggered manually
  • Quality Control
      - performed by a scientist; includes checking metadata, time ranges, value ranges, etc.
  • Splitting CMORized, QC-ed data into small (<2 GB) NetCDF files and pushing them through the firewall to the Data Portal
      - the scripts doing this are configured manually
      - the scripts are started manually
  • Preparing a checksum report on the Data Portal
      - done by a script started by cron
  • Configuring Aggregation Server and LAS
      - done manually
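The checksum report step could look roughly like the sketch below. The slide does not say which hash algorithm or report layout the portal uses; SHA-256, the `*.nc` file pattern, and the "digest, two spaces, relative path" line format are all assumptions, as are the function names.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large NetCDF files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_report(root, pattern="*.nc", algorithm="sha256"):
    """Return one 'digest  relative/path' line per matching file under root."""
    root = Path(root)
    lines = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            lines.append(f"{file_checksum(path, algorithm)}  {relative}")
    return lines
```

A cron-started job could write these lines to a file next to the data, letting users verify downloads with a standard checksum tool.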

Page 10: GFDL Data Portal

Current Data Portal workflow

[Workflow diagram.]

Page 11: GFDL Data Portal

Desirable Features of Data Portal

  • Relational database storing metadata with descriptions of
      - model components and model configuration
      - scenarios
      - postprocessing (model output and CMOR)
      - experiments
      - variables
      - formalized rules of Quality Control
      - data locations in the archive
      - task scheduler
      - user and group accounts
  • XML as data exchange format
      - for compliance with the FMS Runtime Environment (FRE)
      - working format of existing third-party software
      - well suited to hierarchical metadata description
      - widely used, easy to exchange with other data portals
  • Publisher Control Center (PCC)
      - controls the CMOR subsystem
      - controls the Data Publisher Manager
      - controls data quality (QAC)
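The point about XML suiting hierarchical metadata can be shown with a short sketch using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`experiment`, `model`, `variables`, `variable`, `units`) are invented for illustration and are not the FRE schema; the experiment name and variables are placeholder values.

```python
import xml.etree.ElementTree as ET


def build_experiment_xml(model, experiment, variables):
    """Serialize a small model/experiment/variables hierarchy to XML text."""
    root = ET.Element("experiment", name=experiment)
    ET.SubElement(root, "model", name=model)
    variables_element = ET.SubElement(root, "variables")
    for name, units in variables:
        ET.SubElement(variables_element, "variable", name=name, units=units)
    return ET.tostring(root, encoding="unicode")


def read_variable_names(xml_text):
    """Parse the XML text back and list the variable names in order."""
    root = ET.fromstring(xml_text)
    return [v.get("name") for v in root.findall("./variables/variable")]


# Placeholder values; only the model name CM2.1 comes from the slides.
document = build_experiment_xml(
    "CM2.1", "example_experiment", [("tas", "K"), ("pr", "kg m-2 s-1")]
)
print(read_variable_names(document))
```

Because both sides only need a generic XML parser, a document like this can be exchanged between the database, FRE, and other portals without a custom format.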

Page 12: GFDL Data Portal

Desirable Features of Data Portal (continued)

  • Climate Model Output Rewriter (CMOR) subsystem
      - prepares data consistently with specific project requirements
  • Data Publisher Manager
      - transfers data to the target destination in accordance with settings from the DB
  • Front-end Data Portal Software Package
      - Configuration Manager (configures Aggregation Server and Data Portal Interface)
      - Search Catalog Engine
      - Data Subsampling Engine
      - Data Computation Engine
      - Data Visualization
      - Data Delivery Manager

Page 13: GFDL Data Portal

Proposed functionality schema of ‘GFDL Data Factory’

[Schema diagram.]

Page 14: GFDL Data Portal

Standard scenario of the functioning Model Data Factory (ideal picture)

  • A scientist builds a model in the existing GFDL FMS Runtime Environment (FRE) using available model components, datasets and a forcing scenario.
  • FRE puts metadata about the built model, scenario and experiment into the “curator” DB and runs the experiment.
  • The postprocessing subsystem extracts metadata about the postprocessing plan from the “curator” DB and executes it; on finishing, it puts metadata about the processed experiment back into the DB.
  • Data Publisher (DP) regularly checks the “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
  • CMOR goes to the “curator” DB for metadata and processes the needed data following the metadata instructions.
  • DP calls QAC and then transfers the data to Data Portal storage.
  • Configuration Manager configures Aggregation Server and Data Portal Interface and puts records about the new public data into the “curator” DB.
  • End of process; the data is ready to go.
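The Data Publisher part of this scenario (poll the database, run CMOR, run QAC, transfer) can be sketched as one polling pass. Everything concrete here is an assumption: the `experiments` table with `status` and `published` columns, the use of SQLite, and the `cmor`, `qac` and `transfer` callables, which stand in for the real subsystems.

```python
import sqlite3


def make_curator_db():
    """Create an in-memory stand-in for the curator DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE experiments (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               status TEXT NOT NULL,
               published INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def publish_new_experiments(conn, cmor, qac, transfer):
    """Run one polling pass and return the names of newly published experiments.

    cmor(name) -> dataset, qac(dataset) -> bool, transfer(dataset) -> None.
    Experiments that fail QAC stay unpublished and are retried next pass.
    """
    rows = conn.execute(
        "SELECT id, name FROM experiments "
        "WHERE status = 'public' AND published = 0 ORDER BY id"
    ).fetchall()
    published = []
    for experiment_id, name in rows:
        dataset = cmor(name)
        if not qac(dataset):
            continue
        transfer(dataset)
        conn.execute(
            "UPDATE experiments SET published = 1 WHERE id = ?",
            (experiment_id,),
        )
        published.append(name)
    conn.commit()
    return published
```

Marking an experiment as published only after a successful transfer keeps the pass safe to repeat, which suits a job that runs on a regular schedule.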

Page 15: GFDL Data Portal

Database ‘curator’ design

Database Compartments:

  • Model Metadata Compartment
      - contains model descriptions; allows building a coupled model of the needed configuration
  • Variables Compartment
      - list of all related physical variables
  • Workflow Compartment
      - contains scenarios, experiments, institutions, projects and user information
  • Postprocessing Compartment
      - defines the postprocessing plan for the experiment being conducted
  • Data Portal Compartment
      - contains information about experiment data

Page 16: GFDL Data Portal

Interaction between compartments

[Diagram.]

Page 17: GFDL Data Portal

Model Metadata Compartment (in development)

[Schema diagram. Tables shown: Coupled_Models, Model_List, Component_Medias, Models; also Experiments (labelled Workflow Compartment) and Variables (labelled Variables Compartment).]

Page 18: GFDL Data Portal

Data Samples from Model Compartment

[Table screenshots: Components_Medias, Coupled_Models, Model_List, Models.]

Page 19: GFDL Data Portal

Variables Compartment

[Schema diagram. Tables shown: Variables, Variable_Bundles, Variable_Lists, Variable_List_Contents, Proj_Var_Names; also Projects (labelled Workflow Compartment).]

Page 20: GFDL Data Portal

Data Sample from Variables Compartment

[Table screenshots: Variable_Lists, Variable_List_Contents, Proj_Var_Names, Variables, Variable_Bundles.]

Page 21: GFDL Data Portal

Workflow Compartment

[Schema diagram. Tables shown: Institutions, GFDL_USERS, Experiment_Status, Realization, Projects, Experiments, Scenarios.]

Page 22: GFDL Data Portal

Data Samples from Workflow Compartment

[Table screenshots: Experiments, Scenarios.]

Page 23: GFDL Data Portal

Postprocessing Compartment

[Schema diagram. Tables shown: PP_Units, Post_Proc, PP_Content, Average_Periods, Coupled_Models, Variable_Lists, Projects, GFDL_USERS.]

Data Samples from Postprocessing Compartment

[Table screenshots: PP_Units, PP_Content.]

Page 24: GFDL Data Portal

Data Portal Compartment

[Schema diagram. Tables shown: MissedData_Descriptors, Data_Grids, Data_Files, Variables, Experiments, Variable_Bundles, Coupled_Models.]

Page 25: GFDL Data Portal

Data Samples from Data Portal Compartment

[Table screenshots: Data_Files, Data_Grids, MissedData_Descriptors.]

Page 26: GFDL Data Portal

Curator DB on Data Portal stream

  • Curator DB is already used on the GFDL Data Portal.
  • JSP technology with servlets on the back end was applied.
  • New data transferred onto the Data Portal is automatically registered in Curator DB with all accompanying metadata.
  • It turned out to be the fastest way to search for data on the Data Portal:
      - CM2.0
      - CM2.1
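Searching registered data through the database amounts to a join between experiments and their files. The sketch below uses Python with SQLite rather than the JSP and servlet stack named on the slide. The table names `Experiments` and `Data_Files` appear in the slides, but every column name, the demo rows, and the file paths are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with two of the tables named in the slides.

    Column names and rows are invented for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Experiments (
            id INTEGER PRIMARY KEY,
            model TEXT NOT NULL,
            name TEXT NOT NULL
        );
        CREATE TABLE Data_Files (
            id INTEGER PRIMARY KEY,
            experiment_id INTEGER NOT NULL REFERENCES Experiments(id),
            variable TEXT NOT NULL,
            path TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO Experiments (id, model, name) VALUES (?, ?, ?)",
        [(1, "CM2.0", "demo_run_a"), (2, "CM2.1", "demo_run_b")],
    )
    conn.executemany(
        "INSERT INTO Data_Files (experiment_id, variable, path) VALUES (?, ?, ?)",
        [
            (1, "tas", "cm2.0/demo_run_a/tas.nc"),
            (2, "tas", "cm2.1/demo_run_b/tas.nc"),
            (2, "pr", "cm2.1/demo_run_b/pr.nc"),
        ],
    )
    conn.commit()
    return conn


def find_files(conn, model, variable=None):
    """Return file paths for a model, optionally restricted to one variable."""
    query = (
        "SELECT f.path FROM Data_Files AS f "
        "JOIN Experiments AS e ON e.id = f.experiment_id "
        "WHERE e.model = ?"
    )
    params = [model]
    if variable is not None:
        query += " AND f.variable = ?"
        params.append(variable)
    query += " ORDER BY f.path"
    return [row[0] for row in conn.execute(query, params)]
```

Because new files are registered with their metadata on arrival, a query like this can answer a search without scanning the file system.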

Page 27: GFDL Data Portal

Other Aspects of Future Development

  • Establish model metadata schema standards in the scientific community and develop an SQL metadata schema.
  • Populate Curator with real metadata extracted from GFDL models.
  • Integrate Curator DB with the GFDL Flexible Modeling System (FMS).
  • Customize the LAS server to use the Curator DB.
  • Design user interfaces.

Page 28: GFDL Data Portal

END

Questions?

Thanks!