Grid Remote Execution of Large Climate Models
(NERC Cluster Grid)
Dan Bretherton, Jon Blower and Keith Haines
Reading e-Science Centre, www.resc.rdg.ac.uk
Environmental Systems Science Centre, University of Reading, UK
Main themes of presentation
- Sharing HPC clusters used for running climate models
- Why share clusters
- A Grid approach to cluster sharing: the NERC Cluster Grid (NERC: the UK's Natural Environment Research Council)
- G-Rex grid middleware
- Large climate models as grid services
- Please also see the demonstration and poster
Background
- Many NERC institutes now have HPC clusters
- Beowulf clusters with commodity hardware
- Common applications are ocean, atmosphere and climate models
- Pressure to justify spending and increase utilisation
- Sharing clusters helps increase utilisation
- Sharing clusters facilitates collaborations
- Running climate models on remote clusters in the traditional way is not easy
Using remote clusters the traditional way
[Diagram: the local machine holds only the model input and output; the remote cluster holds the complete model setup, including source code, work-flow scripts, model input and output. Input data is copied to the cluster with SCP, the run is controlled over SSH, and roughly 100 GB of output data is copied back with SCP.]
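To make the inconvenience concrete, here is a minimal sketch of that traditional workflow; the host name, paths and batch commands (qsub/qstat) are illustrative assumptions, not details from the slides.

    #!/bin/bash
    # Hypothetical traditional workflow; host, paths and batch
    # commands are assumptions inferred from the SCP/SSH picture.
    set -e
    REMOTE=user@remote-cluster.example.ac.uk   # assumed cluster login node

    # 1. Copy the input data over by hand (slow for large inputs)
    scp -r ./input_data "$REMOTE:/work/nemo/input"

    # 2. Log in and submit the run manually
    ssh "$REMOTE" "cd /work/nemo && qsub run_nemo.sh"

    # 3. Poll the batch queue, possibly for days
    ssh "$REMOTE" "qstat -u user"

    # 4. Only after the whole run finishes, pull back ~100 GB of output
    scp -r "$REMOTE:/work/nemo/output" ./output_data

Every step is manual, and nothing comes back until the end; this is the pain the NERC Cluster Grid is meant to remove.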
Computational challenges of climate models
Typical requirements:
- Parallel processing (MPI) with a large number of processors, usually 20-100 (a minimal launch sketch follows below)
- Each cluster needs a high-speed interconnect (e.g. Myrinet or Infiniband)
- Long runs lasting several days
- Large volumes of output
- Large number of separate output files
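To give a sense of scale, launching such a run with MPI might look like the single line below; the process count matches the slide, but the executable name is an assumption.

    # Illustrative MPI launch at the scale described; the executable
    # name is an assumption.
    mpirun -np 40 ./nemo.exe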
NEMO Ocean Model (e.g. European operational oceanography)
Main parameters of a typical 1° global assimilation run:
- Runs with 40 processors; 2-3 hours per model year on a cluster
- Outputs 300 MB in 700 separate files as diagnostics every 5-10 minutes
- Output for one model year is roughly 20 GB, in a total of 50,000 separate files
- A 50-year reanalysis therefore amounts to about 1 TB
- The model is automatically re-submitted as a new job for each model year (sketched below)
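The slides do not show the re-submission mechanism, so the following is only a sketch of how a job script might re-submit itself for the next model year; the year-counter file, executable name and qsub usage are all assumptions.

    #!/bin/bash
    # Hypothetical self-resubmitting job script (run_nemo.sh); the
    # year-counter file, executable and batch commands are assumptions.
    YEAR=$(cat year.txt)             # model year for this job
    mpirun -np 40 ./nemo.exe         # one model year: 2-3 hours on 40 processors
    echo $((YEAR + 1)) > year.txt    # advance the counter for the next job
    if [ "$YEAR" -lt 2007 ]; then    # assumed final year of the 50-year reanalysis
        qsub run_nemo.sh             # re-submit as a new job for the next year
    fi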
NERC Cluster Grid
- Includes 3 clusters so far, with plans for 11: Reading (64 processors), Proudman (360 processors), British Antarctic Survey (160 processors)
- Main aim: make it easier to use remote clusters for running large models
- Key features:
  - Minimal data footprint on remote clusters
  - Easy job submission and control
  - Light-weight grid middleware (G-Rex)
  - Load and performance monitoring (Ganglia)
  - Security
G-Rex (Grid Remote EXecution)
- G-Rex is light-weight grid middleware, implemented in Java using the Spring framework
- The G-Rex server is a Web application:
  - Allows applications to be exposed as services
  - Runs inside a servlet container
- The G-Rex client program, grexrun, behaves as if the remote service were actually running on the user's own computer (illustrated below)
- Security is based on HTTP digest authentication
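The slides do not give grexrun's command-line syntax, so the invocation below is purely illustrative; the service URL and argument layout are assumptions meant only to convey the "looks local" idea.

    # Hypothetical grexrun invocation; the URL and argument syntax are
    # assumptions, not G-Rex's documented interface.
    grexrun http://remote-cluster.example.ac.uk:9092/G-Rex/nemo run_nemo.sh
    # Output streams back continuously, as if the model ran locally.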
NEMO G-Rex service: deployment scenario 1
[Diagram: the client side holds the complete NEMO model setup, including source code, work-flow scripts, input data and output from all runs, and runs the G-Rex client. The server side holds only the NEMO launch scripts and forcing data (the same for every run), with the G-Rex server running inside Apache Tomcat. Input and output travel over HTTP on port 9092, and the Tomcat port is open to the client.]
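Because the service speaks plain HTTP with digest authentication on an open port, a client can in principle check connectivity with standard tools; the URL path and credentials below are assumptions.

    # Illustrative connectivity check against the open Tomcat port;
    # the URL path and credentials are assumptions.
    curl --digest -u user:password http://remote-cluster.example.ac.uk:9092/G-Rex/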
NEMO G-Rex service: deployment scenario 2
[Diagram: as in scenario 1, the client holds the complete NEMO model setup (source code, work-flow scripts, input data and output from all runs) and the server holds only the NEMO launch scripts and forcing data, with the G-Rex server inside Apache Tomcat and input and output travelling over HTTP on port 9092. Unlike scenario 1, the Tomcat port is not marked as open to the client.]
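The slides do not say how the client reaches Tomcat when its port is not open; one common arrangement in that situation, offered here purely as an assumption rather than as G-Rex's documented setup, is an SSH tunnel.

    # Assumed approach only: forward a local port to the server's Tomcat
    # over SSH, then point the G-Rex client at localhost.
    ssh -N -L 9092:localhost:9092 user@remote-cluster.example.ac.uk &
    grexrun http://localhost:9092/G-Rex/nemo run_nemo.sh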
Advantages of G-Rex
- Output is continuously transferred back to the user:
  - The job can be monitored easily
  - No data transfer delay at the end of the run
- Files are deleted from the server when no longer needed:
  - Prevents unnecessary accumulation of data
  - Reduces the data footprint of services
- Work-flows can be created using shell scripts (see the sketch below)
- Very easy to install and use
- See the poster; a demonstration is also available: www.resc.reading.ac.uk
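As an illustration of the shell-script point, a multi-year work-flow might simply chain grexrun calls in an ordinary script; the URL, script names and grexrun syntax here are all assumptions.

    #!/bin/bash
    # Hypothetical shell work-flow around grexrun; URLs, script names
    # and grexrun syntax are assumptions.
    set -e
    for year in 1958 1959 1960; do
        # grexrun blocks until the remote run finishes while streaming
        # output back, so plain shell sequencing is enough.
        grexrun http://remote-cluster.example.ac.uk:9092/G-Rex/nemo run_nemo.sh "$year"
        ./postprocess.sh "$year"   # local processing of the returned output
    done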