Earth System CoG and the Earth System Grid Federation: A Partnership for Improved Data Management and Project Coordination
NOAA ESRL Seminar, April 8, 2014, Boulder, CO
Sylvia Murphy (NOAA/CIRES) ([email protected]), Luca Cinquini (JPL/NOAA), Cecelia DeLuca (NOAA/CIRES), Allyn Treshansky (NOAA/CIRES)
Slide 2
Presentation Outline
- ESGF-CoG Integration
- Overview of ESGF
- ESGF Architecture and Local Data Holdings
- Overview of CoG
- CoG Capabilities (Live Demo)
- ESGF-CoG Integration Development Tasks
- Upcoming Tutorials
Slide 3
ESGF-CoG Integration
- ESGF is an international, federated data archive focused on climate projects.
- CoG is a collaboration environment and hub to connect projects in the Earth sciences.
- CoG is going to become the new front end for ESGF. This will give ESGF users and data managers a better interface in terms of:
  - Overall usability
  - Content management
  - Model Intercomparison Project (MIP) support
  - Multi-project support
  - Online collaboration tools
Reference: 3rd Annual Earth System Grid Federation and Ultrascale Visualization Climate Data Analysis Tools Face-to-Face Meeting Report December
(http://aims-group.github.io/pdf/ESGF_UV-CDAT_Meeting_Report_March2014.pdf)
Slide 4
ESGF Overview
- The Earth System Grid Federation (ESGF) is a multi-agency, international collaboration of people and institutions working together to build an open source software infrastructure for the management and analysis of Earth Science data on a global scale.
- The collaboration is led by PCMDI and includes institutions from several agencies in the U.S.A. (DOE, NASA, NOAA), Canada, Europe (IS-ENES-2), Australia, and Asia.
- ESGF manages and serves a global archive of climate data including:
  - CMIP5 model output (basis of IPCC-AR5). Possibly the largest modeling effort in history: 40+ models, 25+ modeling centers, 17 countries, 2 PB of data
  - Obs4MIPs: selected observations from NASA and DOE, specially formatted for comparison with and evaluation of CMIP5 models
  - Ana4MIPs: reanalysis data, also formatted as CMIP5 model output
  - CORDEX: regional climate models, 2 PB of data
  - TAMIP: atmospheric model intercomparison
  - GeoMIP: geo-engineering model intercomparison
  - DCMIP: atmospheric dynamical core model intercomparison
- WCRP recommended use of ESGF infrastructure for all future MIPs.
Slide 5
ESGF System Architecture
- ESGF is a system of distributed and federated Nodes that interact dynamically through a Peer-To-Peer (P2P) protocol:
  - Distributed: data and metadata are published, stored and served from multiple centers (Nodes)
  - Federated: Nodes interoperate because of the adoption of common services, protocols and APIs, and the establishment of mutual trust relationships
  - Dynamic: Nodes can join/leave the federation dynamically; global data and services will change accordingly
- A client (browser or program) can start from any Node in the federation and discover, download and analyze data from multiple locations as if they were stored in a single central archive.
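The "dynamic" property of the federation can be illustrated with a toy model. This is a minimal sketch only, not the ESGF P2P protocol: the `Node` and `Federation` classes, the node names, and the dataset identifiers are all hypothetical, and a simple in-memory registry stands in for the real membership exchange between Nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical federation member holding its own datasets."""
    name: str
    datasets: set[str] = field(default_factory=set)


class Federation:
    """Toy in-memory registry standing in for P2P membership (assumption)."""

    def __init__(self) -> None:
        self._nodes: dict[str, Node] = {}

    def join(self, node: Node) -> None:
        self._nodes[node.name] = node

    def leave(self, name: str) -> None:
        self._nodes.pop(name, None)

    def search(self, keyword: str) -> list[tuple[str, str]]:
        """Return sorted (node, dataset) pairs whose id contains the keyword."""
        hits = []
        for node in self._nodes.values():
            for dataset in node.datasets:
                if keyword.lower() in dataset.lower():
                    hits.append((node.name, dataset))
        return sorted(hits)


federation = Federation()
federation.join(Node("node-a", {"cmip5.tas.monthly", "obs4mips.tas"}))
federation.join(Node("node-b", {"cmip5.pr.daily"}))

before = federation.search("cmip5")   # results span both nodes
federation.leave("node-b")
after = federation.search("cmip5")    # node-b holdings are no longer visible
```

A client sees one result list regardless of which Node holds the data, and that list shrinks or grows as Nodes leave or join.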
Slide 6
ESGF Software Stack
- Software components can be grouped into 4 areas of functionality (the Node "flavors"):
  - Data Node: secure data publication and access
  - Index Node: metadata indexing and searching
  - Identity Provider: user authentication and group membership
  - Compute Node: analysis and visualization
- The ESGF software stack is based on the integration of several applications and APIs:
  - Open source engines (Postgres, Tomcat, Solr)
  - Geo-spatial servers (THREDDS Data Server, Live Access Server)
  - Industry standards: OpenSSL, X509, OpenID, REST
  - Custom ESGF software
- Node flavors can be installed in various combinations depending on site needs, or to achieve higher performance and scalability.
- All ESGF software is Open Source (BSD License) and freely available on GitHub: https://github.com/ESGF
Slide 7
ESGF ESRL Node
- NOAA/ESRL is hosting a full-featured ESGF Node: http://hydra.fsl.noaa.gov/
- Node system administrator: Doug Ohlhorst (big thanks!)
- Available data collections:
  - Ana4MIPs 20th Century Reanalysis (Gil Compo, Cathy Smith)
  - DCMIP-2012 (Atmospheric Dynamical Core Inter-Comparison workshop at NCAR, led by Christiane Jablonowski), including the NOAA FIM model
  - QED-2013 (Quantitative Evaluation of Downscaling workshop at NCAR, sponsored by the National Climate Projection and Prediction (NCPP) project)
- The ESRL Node is part of the ESGF federation:
  - ESRL collections can be accessed and discovered from other ESGF sites
  - Vice versa, a user can start from the ESRL Node and find CMIP5 data throughout ESGF
Figure: Vertical mesh layout from FIM test 5-1 (idealized tropical cyclone) conducted during DCMIP-2012.
Slide 8
Summary of ESGF Achievements
- ESGF represents a significant step forward for the management and access of climate data world-wide:
  - Established the first global, distributed database holding petabytes (PB) of climate model output and observations
  - Data can be discovered through a federated faceted search or a RESTful API
  - Data download can be scripted and executed by programs
  - Users register only once and can then authenticate everywhere
  - Architecture is scalable (for increased model and instrument resolution and rates) and extensible (to other formats, repositories and scientific domains)
- ESGF has established an open source collaboration across agencies and international boundaries.
Image courtesy of NCAR/CGD
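A scripted faceted search might be assembled as in the sketch below. It only builds the request URL and makes no network call. The host `esgf-node.example.org` is a placeholder, and the `esg-search/search` path and the `format`, `distrib`, `limit`, `project` and `variable` parameters reflect our understanding of the ESGF search service; treat them as assumptions and check them against the documentation of the node being queried.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, **facets: str) -> str:
    """Compose a faceted search URL (path and parameter names are assumptions)."""
    params = {
        "format": "application/solr+json",  # ask for a JSON response
        "distrib": "true",                  # search across the federation
        "limit": "10",                      # cap the number of results
    }
    params.update(facets)                   # facet constraints, e.g. project
    return f"{base_url.rstrip('/')}/esg-search/search?{urlencode(params)}"


# Placeholder host; substitute a real Index Node before use.
url = build_search_url(
    "https://esgf-node.example.org", project="CMIP5", variable="tas"
)
print(url)
```

Because the query is just a URL, the same request can be issued from a browser, a shell script, or an analysis program.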
Slide 9
Overview of CoG
- CoG is a collaboration environment and hub to connect projects in the Earth sciences.
- It hosts software development projects, model intercomparison projects (MIPs), university short-courses, and workshops.
- It includes a configurable search of data on any ESGF data node.
- It provides projects with a wiki and customizable navigation to wiki content. Projects, files, or pages can be made private.
- It contains an ontology for the description and management of projects and provides a consolidated look at this content across a project network.
- It contains a file server for documents and images.
- It provides services for Earth system model metadata collection and display.
- Some of the 74 projects hosted on CoG include:
  - NOAA's High Impact Weather Prediction Project (HIWPP)
  - Atmospheric Dynamical Core Model Intercomparison Project (DCMIP)
  - Reanalysis Data for CMIP5 (Ana4MIPs)
  - Observational Data for CMIP5 (Obs4MIPs)
  - National Unified Operational Prediction Capability (NUOPC)
  - National Climate Predictions and Projections Platform (NCPP)
  - Earth System Documentation (ES-DOC)
  - Earth System Prediction Capability (ESPC)
Slide 10
CoG Development Partners
Slide 11
Wiki and Collaboration Tools
https://www.earthsystemcog.org/projects/dcmip-2012/
- The CoG layout is color-coded:
  - The right-hand side (dark yellow) is where services (data, news, project connectivity) are located.
  - The upper navigation bar (dark teal) contains links to project-level metadata.
  - On the left (light teal) is an auto-generated navigation system created when projects develop freeform content.
  - The central portion of the site is a wiki that allows projects to create their own content.
Figure: Screenshot of the CoG project workspace for the 2012 Dynamical Core Model Intercomparison (DCMIP) Workshop.
Slide 12
Customizable Data Services: Interfacing with ESGF
https://www.earthsystemcog.org/search/downscaling-2013/
- The search widget can be turned on/off.
- Search can be narrowed to any ESGF node and to any project (e.g. CMIP).
- Search facets can be created, deleted, and grouped.
- Help text can be added to the top of the search page.
- Search results can be saved to a Data Cart associated with a user. Items in the Data Cart persist.
- Search results can be:
  - Forwarded to the Live Access Server (LAS) for simple visualization.
  - Downloaded directly via a WGET script.
  - Associated with model metadata if it exists.
Data Cart
- Items in the Data Cart can be sent individually or collectively to LAS or WGET.
- The Data Cart is associated with a user and not a project.
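The Data Cart behavior described above (tied to a user, persistent, items sent one at a time or all together) could be modeled roughly as follows. This is a sketch under stated assumptions, not CoG's implementation: the `DataCart` class, the JSON file used for persistence, and the dataset identifiers are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class DataCart:
    """Hypothetical per-user cart that persists its items to a JSON file."""

    def __init__(self, user: str, store_dir: Path) -> None:
        self.user = user
        self._path = store_dir / f"{user}.json"
        # Reload earlier items so the cart survives between sessions.
        self.items: list[str] = (
            json.loads(self._path.read_text()) if self._path.exists() else []
        )

    def add(self, dataset_id: str) -> None:
        if dataset_id not in self.items:
            self.items.append(dataset_id)
            self._path.write_text(json.dumps(self.items))

    def send(self, target: str, dataset_id: Optional[str] = None) -> dict:
        """Describe a request for one item, or for all items if none is named."""
        if target not in {"LAS", "WGET"}:
            raise ValueError(f"unknown target: {target}")
        if dataset_id is not None and dataset_id not in self.items:
            raise KeyError(dataset_id)
        selected = self.items if dataset_id is None else [dataset_id]
        return {"user": self.user, "target": target, "datasets": list(selected)}


store = Path(tempfile.mkdtemp())          # stand-in for server-side storage
cart = DataCart("alice", store)
cart.add("dataset-1")
cart.add("dataset-2")

reloaded = DataCart("alice", store)       # a later session for the same user
request_all = reloaded.send("WGET")       # collectively
request_one = reloaded.send("LAS", "dataset-1")  # individually
```

The cart is keyed by user name only, which mirrors the point that it belongs to a user rather than to any one project.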
Slide 15
Show Metadata
Slide 16
Project Networks and the Project Browser
https://www.earthsystemcog.org/projects/nesii/
- Projects in CoG are arranged in a hierarchy of Parents, Peers, and Children.
- The Project Browser displays the network and allows for inter-project navigation.
- Projects can be tagged with keywords, and projects can be searched for using keywords.
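The Parents/Peers/Children arrangement and keyword tagging can be pictured as a small graph. The sketch below is illustrative only: the `Project` class, its methods, and the project names and tags are hypothetical and are not taken from the CoG code base.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through cycles
class Project:
    name: str
    tags: set[str] = field(default_factory=set)
    parents: list["Project"] = field(default_factory=list)
    peers: list["Project"] = field(default_factory=list)
    children: list["Project"] = field(default_factory=list)

    def add_child(self, child: "Project") -> None:
        """Link both directions so navigation works from either project."""
        self.children.append(child)
        child.parents.append(self)

    def add_peer(self, peer: "Project") -> None:
        self.peers.append(peer)
        peer.peers.append(self)


def find_by_tag(projects: list[Project], tag: str) -> list[str]:
    """Return the names of projects carrying the given keyword."""
    return sorted(p.name for p in projects if tag in p.tags)


# Hypothetical network: one parent, two children, and one peer.
hub = Project("hub", {"coordination"})
modeling = Project("modeling", {"climate", "software"})
workshop = Project("workshop", {"climate", "training"})
partner = Project("partner", {"software"})

hub.add_child(modeling)
hub.add_child(workshop)
modeling.add_peer(partner)

child_names = [c.name for c in hub.children]
climate_projects = find_by_tag([hub, modeling, workshop, partner], "climate")
```

A browser over this structure only needs to follow the `parents`, `peers`, and `children` lists of the current project.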
Slide 17
CoG Schema
https://www.earthsystemcog.org/projects/cog/
- The CoG schema contains classes to describe software development projects, short-courses or meetings, and overall project coordination.
- Projects select which metadata to display via a simple web form.
- Project-level metadata is linked in standardized locations via the upper navigation bar.
Slide 18
Project-level Metadata Roll-up
https://www.earthsystemcog.org/projects/es-doc-models/aboutus/
- Management of information is a major problem in projects that involve many sub-projects, partners, multiple leads, and many resources.
- CoG acts as an index into the project information that is necessary for coordination and collaboration, and it enables the people responsible for overall coordination to quickly get consolidated views of that information.
- This example shows the Partners feature, which allows projects to list their project partners and include a logo for each. Below the list for ES-DOC is a consolidated view of the partners for ES-DOC's peer projects.
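A roll-up of this kind amounts to merging one metadata field across related projects. The sketch below shows the idea for partners; the function name, the dictionaries, the project and partner names, and the choice to drop duplicates are all assumptions made for illustration, not a description of how CoG computes the view.

```python
def roll_up_partners(
    project: str,
    partners: dict[str, list[str]],
    peers: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return a project's own partners plus those contributed by its peers."""
    own = list(partners.get(project, []))
    seen = set(own)
    from_peers = []
    for peer in peers.get(project, []):
        for partner in partners.get(peer, []):
            if partner not in seen:      # assumption: show each partner once
                seen.add(partner)
                from_peers.append(partner)
    return {"own": own, "peers": from_peers}


# Hypothetical project network and partner lists.
partners = {
    "docs": ["Agency A", "Lab B"],
    "docs-models": ["Lab B", "University C"],
    "docs-tools": ["Center D"],
}
peers = {"docs": ["docs-models", "docs-tools"]}

view = roll_up_partners("docs", partners, peers)
```

The same pattern applies to any other project-level field that a coordinator wants to see consolidated.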
Slide 19
Resources
https://www.earthsystemcog.org/projects/es-doc-models/resources/
- Resources are pointers to data, files, and URLs.
- Resources folders can be created, moved, and deleted.
- Projects can turn on a set of standardized Resources folders (e.g. Presentations, Minutes).
- Data searches can be saved as a Resource.
- Each Resource can have a private wiki-based notes page to facilitate discussions.
Slide 20
News
https://www.earthsystemcog.org/projects/climatetranslator/
- News is a way to send announcements across a project network.
- News is visible in the news widget on any targeted project.
- News will be added to social media (Google+, Facebook, Twitter, RSS) in a future release.
Slide 21
Model Metadata Services
- The CoG Team is partnering with the international Earth System Documentation (ES-DOC) project to develop and use an Earth System Model metadata entry and view capability.
- The ES-DOC Viewer is a lightweight JavaScript plugin that will display any Common Information Model (CIM) record.
- The ES-Questionnaire collects standardized CIM metadata through a highly customizable web form. The output is saved to a community CIM repository.
Slide 22
CoG-ESGF Future Work
- Requirements are coming from HIWPP, CMIP6, the ESGF integration, and other projects.
- CMIP6 will include a set of interconnected MIPs. Work is starting on the CMIP6 sites.
- CoG is going to replace the ESGF web front end. Work should be completed by the end of summer 2014, with a production system in place by the end of the year.
- CoG will be federated so that projects hosted on one CoG-ESGF instance will be visible on others.
- CoG is being modified to conform to Federal and DOC requirements.
- OpenID access will be added to CoG, which will improve the security of the site.
- The local CoG URL will be changed to a .gov address.
Slide 23
Webinar/Tutorials
- Fridays at 10am Mountain Time
- Contact [email protected] for more information.
- Other group or individual sessions are available on demand.
- Scheduled sessions:
  - 11 Apr: HIWPP
  - 02 May: ESPC
Slide 24
Questions? [email protected]
- CoG: https://earthsystemcog.org/
- ESRL ESGF data node: http://hydra.fsl.noaa.gov/esgf-web-fe/
- PCMDI ESGF data node: http://pcmdi9.llnl.gov/esgf-web-fe/
- JPL ESGF data node: http://esg-datanode.jpl.nasa.gov/esgf-web-fe/