dana-franklin
1. Simulation
2. Archive and distribute
3. Analysis
4. Understanding
Problem Space
Heterogeneous HPC environments
Large community
SSH is king
No global view
Very complex workflow
etc.
TGCC in a nutshell
[Diagram: TGCC machines, file systems and transfer commands. Labels recovered from the figure:]
Compute: curie hybrid nodes (-q hybrid), curie thin nodes (-q standard), curie large nodes (-q xlarge), airain nodes
Login: curie front-end, airain front-end
File systems: $HOME, $CCCSTOREDIR, $CCCWORKDIR, $SCRATCHDIR; HPSS: robotic tapes
Contents: small precious files; sources, small results, IGCM_OUT: MONITORING/ATLAS; temporary REBUILD, IGCM_OUT: files to be packed, outputs of post-proc jobs; IGCM_OUT: packed results (Output, Analyse SE and TS)
Transfer commands: cp, dods_cp, ccc_hsm get
External: ESGF
Legend: temporary space, saved space, non saved space, space on tapes, visible from www, quotas
[Diagram: simulation job and post-processing chain on curie at TGCC. Labels recovered from the figure:]
• Compute (curie): Job_EXP00, PeriodLength
• Temporary outputs: $SCRATCHDIR/IGCM_OUT/XXX/Output, $SCRATCHDIR/IGCM_OUT/XXX/Restart Debug, $SCRATCHDIR/IGCM_OUT/.../REBUILD
• Post (curie): rebuild, RebuildFrequency
• Post (curie): pack_output (ncrcat), pack_restart and pack_debug (tar), PackFrequency
• Packed outputs: $CCCSTOREDIR/IGCM_OUT/XXX/Output, $CCCSTOREDIR/IGCM_OUT/.../RESTART DEBUG
• Post (curie): create_ts (TimeSerieFrequency), create_se (SeasonalFrequency), monitoring, Atlas/metrics
• TS and SE: $CCCSTOREDIR/IGCM_OUT/… dods/store
• MONITORING and ATLAS: $CCCWORKDIR dods/work
• ESGF=TRUE/FALSE
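The diagram labels suggest that each post-processing step (rebuild, pack, time series, seasonal means) fires at its own frequency relative to the simulation period. The sketch below is only an illustration of that idea, assuming every frequency can be expressed as a whole number of completed periods. The function name `due_tasks` and the numeric values are hypothetical; the real scheduling rules are not shown on the slide.

```python
def due_tasks(completed_periods, frequencies):
    """Return the tasks due once `completed_periods` periods have run.

    `frequencies` maps a task name to "run every N completed periods".
    This is an assumed simplification, not the actual workflow logic.
    """
    if completed_periods <= 0:
        return []
    return sorted(
        name
        for name, every in frequencies.items()
        if every > 0 and completed_periods % every == 0
    )


# Hypothetical settings: one period per month, pack yearly, and so on.
frequencies = {"rebuild": 1, "pack_output": 12, "create_ts": 24, "create_se": 120}

for n in (1, 12, 24):
    print(n, due_tasks(n, frequencies))
```

Keeping the rule as a pure function of the period counter makes it easy to check which jobs a given configuration would launch before submitting anything.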
Solution Space
[Diagram: messaging architecture. Labels recovered from the figure:]
• IPSL user @ browser | command line | desktop, exchanging json with the API
• IPSL: MQ Cluster, MQ Apps, API, DBs
• MQ Relays: TGCC, IDRIS, CINES, IPSL, XXX, each sending msg
Simulation monitoring & control
ESG-F integration: data publishing
ES-DOC integration: documentation publishing
PCMDI simulation metrics publishing
HPC diagnostics aggregation
Controlled vocabulary management
Push notifications: Web Socket, SMS, SMTP, MQ
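The architecture on this slide moves msg payloads from MQ relays at each computing centre and serves json to users. As a rough sketch of what such a monitoring message could look like, the example below encodes and validates a JSON event. All field names (`centre`, `simulation`, `event`, `timestamp`) are assumptions for illustration; the slides do not show the real message schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical set of mandatory fields for a monitoring message.
REQUIRED_FIELDS = {"centre", "simulation", "event", "timestamp"}


def encode_event(centre, simulation, event, timestamp=None):
    """Serialise one simulation event as a JSON string."""
    ts = timestamp or datetime.now(timezone.utc)
    message = {
        "centre": centre,
        "simulation": simulation,
        "event": event,
        "timestamp": ts.isoformat(),
    }
    return json.dumps(message, sort_keys=True)


def decode_event(payload):
    """Parse a JSON payload and reject messages with missing fields."""
    message = json.loads(payload)
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return message


payload = encode_event("TGCC", "Job_EXP00", "period_completed")
print(decode_event(payload)["event"])
```

Validating on receipt keeps malformed messages from one relay out of the shared databases.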
Metrics Garden
Metrics Garden user web interface
Tests of Gleckler-like metrics on the CMIP5 versions of the IPSL models
How do we usually present ourselves
• Prodiguer, the national level
– Coordination between French partners
– IPSL, CNRM-CERFACS, TGCC, IDRIS, CINES
– Supporting the community
• IS-ENES, the European level
– Coordination between European partners
– Heavy workload on the operational implementation of ESGF (the biggest source of climate model results)
– Strengthening the infrastructure
• ESGF, ES-DOC, the international level
– WGCM Infrastructure Panel
– ESGF Governance (Executive Committee)
– ES-DOC Governance (Principal Investigator)
Many, many processes, many, many communities!
Interconnected communities, all needing access to (some of) the data!
Resolution
Complexity
Duration and ensemble size
Enhanced computing resources produce MORE DATA
Earth Observations
The Earth System Grid Federation (ESGF) is a multi-agency, international collaboration of persons and institutions working together to build an open source software infrastructure for the management and analysis of Earth Science data on a global scale.
• Software development and project management: ANL, ANU, BADC, CMCC, DKRZ, ESRL, GFDL, GSFC, JPL, IPSL, ORNL, LLNL (lead), PMEL, …
• Operations: tens of data centers across Asia, Australia, Europe and North America
Worldwide distributed system
Storage growth over six years (from CMIP3 to CMIP5): a factor of 30
● Operational since 2011
● Hundreds of users per month
● Hundreds of TB per month
● About 10,000 registered users
CMIP3: centralized
CMIP5: distributed system, 60 climate models, 2 PB of data
ESGF France
- Working framework for the administrators of ESGF nodes in France
- IPSL tests and validates ESGF releases, then publishes detailed deployment procedures adapted to each center
- Knowledge sharing
- Synchronized deployments
- Production support
- Annual meetings at IPSL
- The community is part of the Big Data theme of the ANR Convergence project and of the data working group of the European IS-ENES2 project.
http://forge.ipsl.jussieu.fr/prodiguer/wiki/ESGF-FR
1. ESGF IWT Missions and Challenges
Missions
• Release management
• Build, test and validate
• Provide installation tools
• Secure deployments
• Administrator training and support
Challenges
• Automated builds and tests
• Easier installation
• Node set up in less than one hour
2. ESGF IWT RM Process
Release Management Process
ESGF software stack development follows a release management process that ensures the quality of deliverables. Three distinct roles are identified:
• Developers push new features into the system
• The IWT Release Manager is responsible for code freeze, cutting releases and compilation
• IWT Administrators are asked to test and validate release candidates
3. ESGF IWT Continuous Build
Continuous Build
The ESGF software stack source code is hosted in GitHub repositories. 'Devel' branches are continuously updated with new features by the development teams. GitHub webhooks trigger compilation of the project on a dedicated machine running Jenkins. Distribution binaries are then made available to the community for testing via a web server. Continuous builds give real-time visibility into source code quality and the consistency of inter-project dependencies throughout the development phases.
[Diagram: Developers push code to the GitHub 'devel' branches, which automatically triggers builds on the Jenkins continuous build server. Binaries (wars, jars) are published to the binaries web server if the build completes; a warning email is sent if the build breaks.]
http://esgf-build.ipsl.upmc.fr/jenkins
http://esgf-build.ipsl.upmc.fr/builds
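The continuous build flow described on this slide boils down to one decision per push: build pushes to a 'devel' branch, publish the binaries on success, and send a warning email on failure. The sketch below captures only that decision logic. The function name `handle_push` and the returned action strings are hypothetical; it does not use the Jenkins or GitHub APIs.

```python
def handle_push(branch, build):
    """Decide what to do after a push.

    `build` is any callable that runs the compilation and returns True
    on success. The action names returned here are illustrative only.
    """
    if branch != "devel":
        return "ignored"  # only 'devel' branches are built continuously
    try:
        succeeded = bool(build())
    except Exception:
        succeeded = False  # a crashing build counts as a broken build
    return "publish_binaries" if succeeded else "send_warning_email"


print(handle_push("devel", lambda: True))
print(handle_push("devel", lambda: False))
```

Passing the build step in as a callable keeps the decision testable without a build server.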
4. ESGF IWT Integration Testing
Integration Testing
The ESGF Test Federation is based on VMware virtual machines. It is completely independent of the production federation and is used to run the ESGF test suite, which performs tests from the user's perspective in order to validate release candidates as well as new installations or upgrades.
ESGF Test Infrastructure: http://vesgint-data.ipsl.jussieu.fr
ESGF Test Suite: https://github.com/ESGF/esgf-test-suite
• Python Nose: test framework
• Python Requests: HTTP support
• Python Subprocess: system execution
• Python Selenium: browser simulation
• Python Multiprocessing: parallelisation
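A test "from the user's perspective" typically requests pages on a node and checks the responses. The sketch below illustrates that pattern with the standard library only, rather than the Nose and Requests stack listed on the slide, so that it stays self-contained. The helper names and the example paths are hypothetical and are not taken from the esgf-test-suite repository.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # the server answered with an error code
        return err.code
    except URLError:  # DNS failure, refused connection, timeout...
        return None


def check_node(base_url, paths, fetch_status=http_status):
    """Map each path to True when the node answers it with HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch_status(base + path) == 200 for path in paths}


# Offline usage with a stand-in fetcher; the paths are made up.
fake_answers = {"http://node.example/search": 200, "http://node.example/login": 503}
print(check_node("http://node.example/", ["/search", "/login"], fake_answers.get))
```

Injecting the fetch function lets the same check run against a live test node or against canned answers.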
5. ESGF IWT Installer and Distribution Mirrors
Installer and Distribution Mirrors
Once a release has been cut and validated, it is deployed into production. The installer helps each node administrator across the federation pull the new binaries. Three synchronized distribution mirrors (1 master at IPSL, 1 slave at PCMDI, 1 slave at BADC) improve the availability of binaries and reduce transfer delays, because the installer identifies the fastest mirror.
[Diagram: Node admins execute the ESGF Installer, which calls get_fastest_mirror() to choose among the U.S., U.K. and FR mirrors.]
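The slides name a get_fastest_mirror() step but do not show how it works. One plausible approach, sketched here purely as an assumption, is to time a small probe request against each mirror and keep the quickest one that responds. The function `pick_fastest_mirror`, its parameters and the mirror URLs are hypothetical and are not the installer's actual code.

```python
import time


def pick_fastest_mirror(mirrors, probe, clock=time.monotonic):
    """Return the name of the mirror whose probe completes fastest.

    `mirrors` maps a name to a URL; `probe` is a callable that performs
    a small request and raises OSError when the mirror is unreachable.
    """
    timings = {}
    for name, url in mirrors.items():
        start = clock()
        try:
            probe(url)
        except OSError:
            continue  # skip mirrors that cannot be reached
        timings[name] = clock() - start
    if not timings:
        raise RuntimeError("no mirror reachable")
    return min(timings, key=timings.get)


# Offline usage: simulated delays instead of real network probes.
delays = {"https://us.example": 0.03, "https://uk.example": 0.01, "https://fr.example": 0.02}
mirrors = {"U.S.": "https://us.example", "U.K.": "https://uk.example", "FR": "https://fr.example"}
print(pick_fastest_mirror(mirrors, lambda url: time.sleep(delays[url])))
```

A single probe is sensitive to transient network noise; repeating it and taking the median would be a natural refinement.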
Original timing: O(2) PB of requested output from 20+ modelling centres, finished early 2010!
Actual timing? Years late.
Our data perspective
IPSL
CMIP3: 35 TB
CNRM-CERFACS
Depending on security constraints (e.g., computing centers), two architectures are possible:
ESG-F datanode and data on the same network. Example: CICLAD (IPSL - Jussieu)
Network + ESG-F datanode + data
ESG-F indexnode (remote from the datanode or not)
ESG-F datanode in a DMZ. Example: TGCC-CCRT
DMZ + ESG-F datanode
Local network + data
ESG-F indexnode
No interactive access
One-way network flow
Read-only NFS export
Login
Overview
(1) Data Reference Syntax
(2) {datanode}: file system visible from the datanode
(3) {project}: project (e.g., CMIP5)