30
CREST Challenges of getting ECMWF’s weather forecast model (IFS) to the Exascale ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 George Mozdzynski, Willem Deconinck and Mats Hamrud 1

of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CRESTChallenges of getting ECMWF’s weather forecast model (IFS) to the Exascale

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

George Mozdzynski, Willem Deconinck and Mats Hamrud

1

Page 2: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Acknowledgements

Mats Hamrud ECMWF

Nils Wedi ECMWF

Willem Deconinck ECMWF

Jens Doleschal Technische Universität Dresden

Harvey Richardson Cray UK

Alistair Hart Cray UK

John Levesque Cray USA

Peter Messmer Nvidia

Jesus Labarta BSC

And my other partners in the CRESTA Project

The CRESTA project has received funding from the EU Seventh

Framework Programme (ICT-2011.9.13)

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 2

Page 3: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Acknowledgements/2

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 3

An award of computer time was provided by the Innovative and Novel

Computational Impact on Theory and Experiment (INCITE) program.

This research used resources of the Argonne Leadership Computing

Facility, which is a DOE Office of Science User Facility supported under

Contract DE-AC02-06CH11357.

Page 4: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Outline

CRESTA?

IFS model focus

IFS model evolution

Overlapping computation and communication

one-sided Fortran2008 coarray communications

DAG scheduling (OmpSs) study

Radiation in parallel development

OpenACC port of spectral transform test

Alternative dynamical core option

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 4

Page 5: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

What is CRESTA - see http://cresta-project.eu/

• Collaborative Research into Exascale Systemware, Tools and Applications

• EU funded project, 3 years (started Oct 2011), ~ 50 scientists

• Six co-design vehicles (aka applications)

• ELMFIRE (CSC, ABO,UEDIN) - fusion plasma

• GROMACS (KTH) - molecular dynamics

• HEMELB (UCL) - biomedical

• IFS (ECMWF) - weather

• NEK5000 (KTH) & OPENFOAM (USTUTT, UEDIN) - comp. fluid dynamics

• Two tool suppliers

• ALLINEA (ddt : debugger ) & TUD (vampir : performance analysis )

• Technology and system supplier – CRAY UK

• Many Others (mostly universities)

• ABO, CRSA, CSC, DLR, JYU, KTH, UCL, UEDIN-EPCC, USTUTT-HRLS

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 5

Page 6: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

IFS model: some background

10-15 day forecasts, Hi-Resolution and Ensembles

Spectral, semi-implicit, semi-Lagrangian

Long time step (600 seconds today for operational TL1279L137)

Joint development between ECMWF and Météo France

MPI+OpenMP parallelisation

Operational Hi-Res model 10-day forecast to complete in UNDER one

hour

Current and Future Goals

Improving forecast skill & initial conditions from data assimilation

Performance, Scalability

Portability, Reliability and Low Power

We all want these!!!!

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 6

Page 7: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 7

Grid spacing halves

about every 8 years

Page 8: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

IFS model: current and future model resolutions

(at start of CRESTA project)

IFS model

resolution

Envisaged

Operational

Implementation

Grid point

spacing (km)

Time-step

(seconds)

Estimated

number of

cores1

T1279 H2 2013 (L137) 16 600 2K

T2047 H 2014-2015 10 450 6K

T3999 NH3 2023-2024 5 240 80K

T7999 NH 2031-2032 2.5 30-120 1-4M

1 – a gross estimate for the number of ‘IBM Power7’ equivalent cores needed to achieve a 10 day

model forecast in under 1 hour (~240 FD/D), system size would normally be ~10 times this number.

2 – Hydrostatic Dynamics

3 – Non-Hydrostatic Dynamics

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 8

Page 9: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Compute/Communication Overlap strategies

(for hi-res IFS model)

• OpenMP parallel loops containing,

• Expensive computations

• PGAS one-sided communications

• Implemented in IFS model using Fortran2008 coarrays

• Need for coarray teams (in next Fortran standard)

• GASPI/GPI library calls could be used as an alternative to coarrays

• Graph based approaches (DAG), e.g. OmpSs from BSC

• Create computation tasks and communication tasks including the

dependencies e.g. !$OMP TASK IN(…) OUT(…)

• Run time system to process the graph as it evolves

• Explored with an IFS model kernel (collaboration with BSC)

• Extrae, Paraver, Mercurium & Nanos installed and used on XC-30

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 9

Page 10: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

IFS coarray optimizations for [Tera,Peta,Exa]scale

Grid-point space

-semi-Lagrangian advection

-physics

-radiation

-GP dynamics

Fourier space

Spectral space

-horizontal gradients

-semi-implicit calculations

-horizontal diffusion

FTDIR

LTDIR

FTINV

LTINV

Fourier space

trmtoltrltom

trltogtrgtol

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 10

Page 11: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Semi-Lagrangian Transport

• Computation of a trajectory from each grid-point backwards in time, and

• Interpolation of various quantities at the departure and at the mid-point of the trajectory

x

arrival

departure

mid-point

MPI task partition

x

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 11

Page 12: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

IFS: T799L91 (25 km)

SL-halos for task 11 / 256

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

Coarray implementation

– no blue halo; only

columns (red) that are

needed are obtained

12

Page 13: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

• #1 in Nov 2012 Top500 list, and still #2 today

• CRESTA awarded access (INCITE14 programme)

• 7.5X peak perf. of ECMWF’s XC-30 clusters (CCA+CCB=3.6 Petaflops)

• Cray Linux Environment operating system

• Gemini interconnect

• 3-D Torus

• Globally addressable memory

• AMD Interlagos cores (16 cores per node)

• New accelerated node design using NVIDIA K20 “Kepler” multi-core accelerators

• 600 TB DDR3 mem. + 88 TB GDDR5 mem

ORNL’s “Titan” System

Titan Specs

Compute Nodes 18,688

Login & I/O Nodes 512

Memory per node 32 GB + 6 GB

# of NVIDIA K20 “Kepler”

processors14,592

Total System Memory 688 TB

Total System Peak

Performance27 Petaflops

Source (edited): James J. Hack, Director, Oak Ridge National Laboratory

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 13

Page 14: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

“At the start of the CRESTA

project the maximum number of

cores that an IFS model had

run on was less than 10,000”

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 14

Page 15: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

0

100

200

300

400

500

600

700

800

900

1000

1100

1200

1300

1400

0 10 20 30 40 50 60 70 80 90 100

Fore

cast

Day

s /

Day

Number of Cores (thousands)

COARRAYS=T

COARRAYS=F

Tc1023L137 10 km (~2015) IFS model scaling on TITAN

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

No GPUs Used

15

Page 16: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

0

50

100

150

200

250

300

350

400

450

500

0 20 40 60 80 100 120 140 160 180 200 220

Fore

cast

Day

s /

Day

Number of Cores (thousands)

COARRAYS=T

COARRAYS=F

Tc1999L137 5 km (~2024) IFS model scaling on TITAN

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

No GPUs Used

16

Page 17: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

0

10

20

30

40

50

60

70

80

90

100

0 20 40 60 80 100 120 140 160 180 200 220 240

Fo

rec

as

t D

ays

/ D

ay

Number of Cores (thousands)

CRAY XC-30 TITAN

Tc3999L137 2.5 km (~2032) IFS model scaling

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 17

No GPUs Used

Page 18: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

2.5 km (~2032) global IFS model EXTRAPOLATION

Est. 10 MW XC-30 for single 2.5 km model

Est. 100 MW – 200 MW for full XC-30 system

Only 5 MW is ‘affordable’

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 18

Page 19: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Co-design: IFS initialization study with vampir (& IFS gstats)

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

T3999 IFS on HECToR using

1536 tasks x 16 threads

= 24,576 cores

Trace files 52 Gbytes

Reduced to 12 Gbytes by

turning off tracing (the purple bit)

19

Page 20: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

IFS kernel using OmpSs (on single node)

PhysicsFFTs

Comms

collaboration with Prof Jesus Labarta @ BSC

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 20

Page 21: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Spectral Transform test : OpenACC K20X GPU port

Grid-point space

Fourier space

Spectral space

FTDIR

LTDIR

FTINV

LTINV

Fourier space

trmtoltrltom

trltogtrgtol

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

FFTs

Legendre

Transforms

21

Page 22: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

646.2 645.1

345.3 351.1

189.3

152.9

281.7 281.9

0

100

200

300

400

500

600

700

LTINV_CTL LTDIR_CTL FTDIR_CTL FTINV_CTL

mse

c p

er

tim

e-s

tep

Tc1999 5 km model Spectral Transform Compute Cost (120 nodes, 800 fields)

XC-30

TITAN

Work in progress: to improve FFT

performance

one K20X GPU peak DP perf. = 2.5

times that of a 24 core (Ivybridge)

CRAY XC-30 node at ECMWF

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 22

Page 23: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Lessons Learnt from transform test OpenACC port

OpenACC programming effort

Replaced ~20 OpenMP directives (high-level parallelisation)

By ~280 OpenACC directives (low-level parallelisation)

Most of the porting time spent on,

Strategy for porting existing IFS FFT interface

Replaced by calls to new cuda interface

Calls to NVIDIA cuFFT library routines

Performance issues

GPU/OpenACC: on each GPU, FFTs execute sequentially over latitudes (all

fields)

XC-30/OpenMP: on each node, FFTs execute in parallel over latitudes (all

fields)

FFT data layout important on GPU (fields,latitudes) v (latitudes,fields)

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 23

Page 24: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Radiation computations in parallel with model

Sequential Radiation in parallel

Cores

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

Tim

e

24

Page 25: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

IFS alternative dynamical core option

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

IFS T63 mesh

Nodes (points) are existing T63 gridExisting IFS EQ_REGIONS partitioning

25

Page 26: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

What do we want to support?

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

Nearest neighbour communication algorithms, compact stencils

Edge-based Finite-Volume

scheme (MPDATA)

Element-based Finite-Element

scheme

Element-based Discontinuous Higher-

Order schemes

26

Page 27: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

Summary

• Many challenges exist for IFS applications to run at the Exascale

• First of these is for hardware vendors to build Exascale computers

that are both affordable (cost + power) and reliable

• Overlapping computations and communications must be considered

at the Petascale/Exascale (F2008 coarrays v GASPI/GPI or DAG?)

• MPI/OpenMP (high level parallelisation) is easier to program/maintain

than MPI/OpenACC (low level parallelisation)

• Ease of programming GPU technology should be easier in the future

when there is a single address space for GPU and conventional cores

• IFS model initialisation expensive at scale (single reader, grib_api)

• IFS applications (not just the model) will require substantial

development in the years to come to run efficiently on Petascale and

more so on Exascale computers

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 27

Page 28: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 28

Thank you for

your attention

Questions?

Page 29: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

!$OMP PARALLEL DO SCHEDULE(DYNAMIC,1) PRIVATE(JM,IM,JW,IPE,ILEN,ILENS,IOFFS,IOFFR)

DO JM=1,D%NUMP

IM = D%MYMS(JM)

CALL LTINV(IM,JM,KF_OUT_LT,KF_UV,KF_SCALARS,KF_SCDERS,ILEI2,IDIM1,&

& PSPVOR,PSPDIV,PSPSCALAR ,&

& PSPSC3A,PSPSC3B,PSPSC2 , &

& KFLDPTRUV,KFLDPTRSC,FSPGL_PROC)

DO JW=1,NPRTRW

CALL SET2PE(IPE,0,0,JW,MYSETV)

ILEN = D%NLEN_M(JW,1,JM)*IFIELD

IF( ILEN > 0 )THEN

IOFFS = (D%NSTAGT0B(JW)+D%NOFF_M(JW,1,JM))*IFIELD

IOFFR = (D%NSTAGT0BW(JW,MYSETW)+D%NOFF_M(JW,1,JM))*IFIELD

FOUBUF_C(IOFFR+1:IOFFR+ILEN)[IPE]=FOUBUF_IN(IOFFS+1:IOFFS+ILEN)

ENDIF

ILENS = D%NLEN_M(JW,2,JM)*IFIELD

IF( ILENS > 0 )THEN

IOFFS = (D%NSTAGT0B(JW)+D%NOFF_M(JW,2,JM))*IFIELD

IOFFR = (D%NSTAGT0BW(JW,MYSETW)+D%NOFF_M(JW,2,JM))*IFIELD

FOUBUF_C(IOFFR+1:IOFFR+ILENS)[IPE]=FOUBUF_IN(IOFFS+1:IOFFS+ILENS)

ENDIF

ENDDO

ENDDO

!$OMP END PARALLEL DO

SYNC IMAGES(D%NMYSETW)

FOUBUF(1:IBLEN)=FOUBUF_C(1:IBLEN)[MYPROC]

Fortran2008 coarray (PGAS) example

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014 29

Page 30: of getting ECMWF’s weather forecast model (IFS) to the ... · epcc |cresta Vi s u a l Id e n tity D e s ig n s CRE ST IFS model: some background 10-15 day forecasts, Hi-Resolution

epcc|crestaVisual Identity Designs

CREST

ECMWF HPC in Meteorology workshop, 27-31 Oct 2014

Tc3999 XC-30 GPU XC-30+GPU prediction

LTINV_CTL 1024.9 324.3 324.3

LTDIR_CTL 1178.6 279.8 279.8

FTDIR_CTL 428.3 342.3 342.3

FTINV_CTL 424.6 341.8 341.8

TRMTOL 752.5 4763.0 752.5

TRLTOM 407.9 4782.9 407.9

TRLTOG 1225.9 1541.9 1225.9

TRGTOL 401.5 1658.4 401.5

HOST2GPU n/a 655.4 655.4

GPU2HOST n/a 650.0 650.0

Total 5844.2 14034.4 5381.4

Tc3999 transform test performance comparison, including computation

and communication (simple 1D parallel, IFS uses 2D)

Using 400 nodes on both XC-30 and TITAN (GPU)

single time-step cost in millisecs

30