29
The U.S. DOE Accelerated Climate Modeling for Energy Project Robert Jacob April 22, 2015 Third Workshop on Coupling Technologies for Earth System Models

The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

The U.S. DOE Accelerated Climate Modeling for Energy Project

Robert Jacob April 22, 2015 Third Workshop on Coupling Technologies for Earth System Models

Page 2: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Argonne National Laboratory §  $675M  opera,ng  

budget  §  3,200  employees  §  1,450  scien,sts  and  

engineers  §  750  Ph.D.s  

Page 3: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME in a nutshell… A new U.S. climate modeling effort led by the U.S. Department of Energy Office of Biological and Enviornmental Research

or…  

“A  collabora,on  among  the  DOE  na,onal  laboratories  (and  a  few  other  ins,tu,ons)  to  develop  and  apply  the  most  complete,  leading-­‐edge  climate  and  Earth  system  models  for  the  most  challenging  and  demanding  climate-­‐change  research  problems  and  DOE  mission  needs  while  efficiently  using  DOE  Leadership  Compu,ng  Facili,es.”        

Page 4: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Why  should  DOE  in  par3cular  be  interested?  

Page 5: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Leading DOE machines at our last meeting

•  IBM  Blue  Gene/Q  System  –  48  racks  –  49,152  nodes  –  786  TB  of  memory  –  Peak  flop  rate:  10  PF  

•  Cray  XK7  System  –  299,008  cores  –   18,688  NVIDIA  Kepler  

K20x  GPUs  –  710  TB  of  memory  –  Peak  flop  rate:  27  PF  

Page 6: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Upcoming DOE machines

•  Intel/Cray  Aurora  (ALCF)  –  50,000  Xeon  Phi  nodes  (“Knight’s  Hill”)  –  Approx  150PF  –  Produc,on  in  2019  

•  IBM/NVIDIA  Summit  (OLCF)  –  3,400  Power9  nodes  –  Mul,ple  NVIDIA  Volta  GPUs  per  

node  –  Approx  150PF  –  Produc,on  in  2018  

•  Intel/Cray  Cori  (NERSC)  –  9,300  Xeon  Phi  nodes  (“Knight’s  Corner”)  –  Approx  30PF  –  Produc,on  in  2017  

Page 7: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME Project Goals •  a series of prediction and simulation experiments

addressing scientific questions and mission needs; •  a well documented and tested, continuously advancing,

evolving, and improving system of model codes that comprise the ACME Earth system model;

•  the ability to use effectively leading (and “bleeding”) edge computational facilities soon after their deployment at DOE national laboratories; and

•  an infrastructure to support code development, hypothesis testing, simulation execution, and analysis of results.

Page 8: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Climate Science Drivers for ACME

Water cycle: How do the hydrological cycle and water resources interact with the climate system on local to global scales?

Biogeochemistry: How do biogeochemical cycles

interact with global climate change? Cryosphere: How do rapid changes in cryospheric systems

interact with the climate system?

Page 9: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME: let specific science questions drive development

Atmosphere: More accurate simulation of aerosols, clouds, wind, and precipitation

Land: More accurate simulation of terrestrial feedbacks from more complex carbon, nutrient and water cycles Ocean: Introduction of multi-resolution dynamics to more accurately simulate ocean heat uptake and

water masses Sea ice: Recast numerics to focus resolution in polar regions, and add icebergs, sea ice strength, and

snow physics Land ice: Addition of the first realistic, dynamic coupled ice-sheet model

Driver  

Ques,ons   Hypotheses   Experiments   Requirements  

Development  

Page 10: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME Roadmap

Page 11: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME size/start: •  No new funding: 6-7 existing DOE lab climate projects

combined in to one program •  8 U.S. national laboratories and 6 partner institutions •  85 researchers working ¼ time or more •  Total effort ~43 FTE •  Started from beta tag of CESM1.3

–  Using cpl7/MCT for coupler.

•  Started July 1, 2014. 3 years initially.

Page 12: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME organization  

ACME  Council  Dave  Bader,  Chair  

Execu,ve  Commihee:  W.  Collins,  M.  Taylor    R.  Jacob,  P.  Jones,  P.  Rasch,  P.  Thornton,  D.  Williams  

Ex  Officio:  J.  Edmonds,  J.  Hack,  W.  Large,  E.  Ng        

Execu3ve  CommiAee      Chair:  D.  Bader        

Chief  Scien,st:  William  Collins  Chief  Computa,onal  Scien,st:  Mark  Taylor    

 

   Project  Engineer  Renata  McCoy  

   

     

Coupled  Simula3on    Group  Dave  Bader,  Bill  Collins,  

Mark  Taylor      

       

Coupled    Sim.  Task      Leaders  

       

     

Workflow      Group  

Dean  Williams  Katherine  Evans  

   

       

Workflow  Task      Leaders  

       

     

SoLware  Eng./Coupler  Group  

Robert  Jacob  Andrew  Salinger  

   

       

SE/Coupler  Task      Leaders  

       

     

Performance/  Algorithms    Group  

Patrick  Worley  Hans  Johansen  

   

       

Perf.  /  Alg.  Task      Leaders  

       

     

Land      Group  

Peter  Thornton  William  Riley  

   

       

Land  Task        Leaders  

       

     

Atmosphere    Group  

Philip  Rasch  Shaocheng  Xie  

   

       

Atmosphere  Task      Leaders  

       

     

Ocean/Ice    Group  

Philip  Jones  Todd  Ringler  

   

       

Ocean/Ice  Task      Leaders  

       

Page 13: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME organization  

ACME  Council  Dave  Bader,  Chair  

Execu,ve  Commihee:  W.  Collins,  M.  Taylor    R.  Jacob,  P.  Jones,  P.  Rasch,  P.  Thornton,  D.  Williams  

Ex  Officio:  J.  Edmonds,  J.  Hack,  W.  Large,  E.  Ng        

Execu3ve  CommiAee      Chair:  D.  Bader        

Chief  Scien,st:  William  Collins  Chief  Computa,onal  Scien,st:  Mark  Taylor    

 

   Project  Engineer  Renata  McCoy  

   

     

Coupled  Simula3on    Group  Dave  Bader,  Bill  Collins,  

Mark  Taylor      

       

Coupled    Sim.  Task      Leaders  

       

     

Workflow      Group  

Dean  Williams  Katherine  Evans  

   

       

Workflow  Task      Leaders  

       

     

SoLware  Eng./Coupler  Group  

Robert  Jacob  Andrew  Salinger  

   

       

SE/Coupler  Task      Leaders  

       

     

Performance/  Algorithms    Group  

Patrick  Worley  Hans  Johansen  

   

       

Perf.  /  Alg.  Task      Leaders  

       

     

Land      Group  

Peter  Thornton  William  Riley  

   

       

Land  Task        Leaders  

       

     

Atmosphere    Group  

Philip  Rasch  Shaocheng  Xie  

   

       

Atmosphere  Task      Leaders  

       

     

Ocean/Ice    Group  

Philip  Jones  Todd  Ringler  

   

       

Ocean/Ice  Task      Leaders  

       

Page 14: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

ACME  Workflow  group:    Developing  a  comprehensive  approach  to  enable  large  scale  climate  science  

•  The end-to-end workflow integrates –  a model run simulation manager

(AKUNA) –  a data publishing/sharing/archiving

infrastructure (ESGF) –  a secure data transport (Globus) –  analysis/diagnostics/visualization tools

(UV-CDAT) –  provenance capture framework (ProvEn)

to improve reproducibility and tracking

Page 15: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

•  Performance  Monitoring  and  Analysis  •  Internode  and  I/O:    Load  balancing,  communica,on  algorithm  op,miza,on,  

computa,on/communica,on  overlap,  exploi,ng  addi,onal  concurrency      •  On-­‐node:  Accelerators,  threading,  memory  management,  programming  models  •  Next  Genera,on  Architectures:  NERSC  NESAP,  OLCF-­‐4  CAAR,  ALCF-­‐3  ESP  

•  Work  with  DOE  computer  scien,sts  in  other  projects  involving  performance  and  fast  numerical  libraries.  

ACME  Performance  Group:    Simula3on  Throughput  with  a  target  of  5  simulated  years/day  

Page 16: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

•  Best Practices: Common tools, methodologies adopted across ACME science/tech teams. –  Developer’s test suites –  Continuous Integration with Jenkins –  Repository set up and workflow

•  I/O: Parallel I/O at ACME model scales, increased use of in-situ diagnostics.

•  Modularity and configurability: Modular interfaces for all new models, runtime configurability

•  Coupling: Coupler performance, coupler/main design, and MCT!

ACME  SoLware  Engineering/Coupler  Group  

ACME  git  workflow  

Page 17: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Model Coupling Toolkit •  A set of Fortran90 datatypes and functions for building

parallel coupled models. –  With or without a coupler (which MCT doesn’t provide). –  All models are assumed to be parallel with MPI. –  2-sided send/recv model for moving data similar to MPI –  Support for online parallel interpolation using offline-calculated

weights. –  Model registry, decomposition descriptors (by numbering dofs),

distributed data type, communication tables. –  Functions for time accumulation, spatial averaging and merging.

Page 18: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Model Coupling Toolkit

•  At  last  mee,ng,  Feb,  2013,  this  was  s,ll  news  •  Now,  a  lihle  embarrassing  

MCT  website  as  of  4/22/15  

Page 19: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

A lot has been happening with MCT…

GitX  display  of  recent  MCT  repository  history  

Page 20: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Model Coupling Toolkit, v2.9

•  Moved repository from Argonne git server to github.com –  https://github.com/MCSclimate/MCT!

•  New Features to aid in studying Router initialization –  GSMap and MCTWorld print().

•  Print contents to ascii file for later reading –  Router init internal timers

•  Invoked with optional string argument to Router init. –  RouterTest.F90 - test program which reads in output GSMaps

and MCTWorld info and builds a Router. •  Will build on same number of procs and same decomposition as

original model. –  Great for creating coupling benchmarks!

Page 21: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Model Coupling Toolkit, v2.9 •  Support for NAG 6.0 •  Support for Mac builds •  Bug fixes including ones found by valgrind (many thanks to

NCAR’s Sean Santos for above)

•  mpi-serial 2.0 –  a small single-node MPI library. –  For programs that assume MPI and users who don’t want to install

a full MPI-library on their laptop/desktop. Doesn’t require mpirun. –  Not a stub-library MPI_Send/Recv really copies data. –  In 2.0

•  Many more MPI datatypes/functions added. •  Self-contained build system (autoconf)

–  Developed by ALCF’s Raymond Loy •  MCT 2.9 release is imminent.

Page 22: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Model Coupling Toolkit: Development process

•  Use gitworkflows on github. –  Anyone with a free github account can create a fork (git clone) of

the MCT repo. –  Make a branch and develop your cool new feature. –  Submit a “Pull Request” to have your feature included in master

•  A few developers make branches directly in MCT main repo. –  Change gets reviewed and tested (by me) before inclusion. –  Branch gets merged to master.

•  Discussion of bugs/proposed new features with github issues.

•  Documentation on github wiki.

Page 23: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Model Coupling Toolkit: Future plans

•  Near term –  Router initilization benchmarks based on ACME science cases

(0.25 degree atmosphere, 1/10th degree ocean, 10K cores) –  Improve scaling/timing of Router init and Rearranger/MCT_Send/

Recv communication for ACME cases on LCFs. Releasing any MCT improvements.

•  Long term –  MCT-MOAB

•  Talked about before. Mesh Oriented Database •  Tried using MOAB’s Fortran interface to build MCT datatypes/

functions. Not very satisfying. •  New plan (notion): MCT-MOAB in C with MOAB’s C/C++ interface

and Fortran interfaces defined using F2003 standard.

Page 24: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Exascale is coming

Top500  list  and  projected  performance.  (top500.org)  

#1:    Tianhae-­‐2  (33PF)  #2:  Titan  (DOE  OLCF  17PF)  #5:  Mira  (DOE  ALCF  8.5PF)  

Page 25: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Exascale impact on software (including coupling software) •  Massive in-node parallelism (exponential growth)

–  Programmer cannot hand-pick work granularity –  Deeper memory hierarchy –  “Communication is expensive, FLOPS are free”

•  Power as a managed system resource –  Turning on/off components –  Selecting algorithms for speed within power envelope –  Adjusting arithmetic precision –  Potentially adjusting fault protection

•  Dynamic parallelism and work decomposition •  Fault tolerance actively managed in software at many levels •  Architecture organization:

–  Heterogeneous cores –  Specialized functional units –  In-situ NVRAM

Page 26: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Coupling at Exascale: possible problems

•  Coupler (for Earth sub-system interfaces) is almost entirely 2D –  Limited amount of parallelism –  Also not a huge number of flops compared to full model –  Not a huge memory demand except for datatypes that

grow with number of cores. •  But coupler does lots of memory movement

–  Moving data between model’s native data type and coupler data type.

–  Moving data from one model’s processors to another’s

Page 27: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Coupling at Exascale: to do

•  More parallelism through more components executing concurrently –  Ensembles –  Different models

•  Reduce memory movement –  One data type across all model components? –  Co-located decompositions.

Page 28: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

Co-located decomposition

2

3 1

2

3 1

1 1 1

2 2 2

3 3 3

3

3

2

2 3 2

1

1 1 2  models:  unrelated  decomposi,ons   2  models:  related  decomposi,ons  

Page 29: The U.S. DOE Accelerated Climate Modeling for Energy Project · 2015-04-27 · across ACME science/tech teams. – Developer’s test suites – Continuous Integration with Jenkins

hhp://climatemodeling.science.energy.gov/projects/accelerated-­‐climate-­‐modeling-­‐energy  

More  informa3on  

hhp://www.mcs.anl.gov/mct/