
Page 1:

Recent Developments for Parallel CMAQ

• Jeff Young, AMDB/ASMD/ARL/NOAA

• David Wong, SAIC – NESCC/EPA

Page 2:

AQF-CMAQ

Running in “quasi-operational” mode at NCEP

26+ minutes for a 48-hour forecast (on 33 processors)

Page 3:

Some background: MPI data communication – ghost (halo) regions

Code modifications to improve data locality

Some vdiff optimization

Jerry Gipson’s MEBI/EBI chem solver

Tried CGRID( SPC, LAYER, COLUMN, ROW ) – tests indicated this ordering is not good for MPI communication of data (see the sketch below)
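
The index-order experiment matters because the ordering of CGRID's dimensions fixes how ghost-region data lie in memory when a subdomain edge has to be shipped to a neighbor. As a rough illustration only (not CMAQ source; the routine, array and buffer names are assumptions), packing one ghost column ahead of an MPI send looks like the sketch below, and changing the dimension order changes the memory-access pattern of exactly this copy and of the matching unpack on the receiving side.

!  Minimal sketch, not CMAQ code: copy the easternmost interior column
!  of a local 4-D concentration array into a contiguous buffer before
!  an MPI send.  The ( COL, ROW, LAY, SPC ) ordering used here is one
!  choice; with ( SPC, LAY, COL, ROW ) the same ghost data would be
!  gathered with a very different stride pattern.
SUBROUTINE PACK_EAST_GHOST( CGRID, NCOLS, NROWS, NLAYS, NSPCS, BUF )
   IMPLICIT NONE
   INTEGER, INTENT( IN )  :: NCOLS, NROWS, NLAYS, NSPCS
   REAL,    INTENT( IN )  :: CGRID( NCOLS, NROWS, NLAYS, NSPCS )
   REAL,    INTENT( OUT ) :: BUF( NROWS, NLAYS, NSPCS )
   INTEGER :: R, L, S

   DO S = 1, NSPCS
      DO L = 1, NLAYS
         DO R = 1, NROWS
            BUF( R, L, S ) = CGRID( NCOLS, R, L, S )   ! strided gather of the east edge
         END DO
      END DO
   END DO
END SUBROUTINE PACK_EAST_GHOST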

Page 4:

!  A( I+2,J ) reaches two cells beyond the local subdomain in I, and
!  A( I,J-1 ) reaches one cell behind it in J, so this loop needs a
!  ghost (halo) region two cells deep in I and one cell deep in J,
!  filled from the neighboring subdomains before the loop runs.
DO J = 1, N
   DO I = 1, M
      DATA( I,J ) = A( I+2,J ) * A( I,J-1 )
   END DO
END DO

Ghost (Halo) Regions

Page 5:

[Diagram: the domain divided among six processor subdomains (0 - 5), showing the exterior boundary and one subdomain's ghost region]

Stencil Exchange Data Communication Function

Horizontal Advection and Diffusion Data Requirements
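
As a deliberately simplified illustration of what the stencil-exchange step does (this is not the CMAQ stencil exchange API; the routine name, the one-cell ghost depth and the neighbor-rank arguments are assumptions), a single north-south ghost-row update with plain MPI might look like:

SUBROUTINE NS_GHOST_EXCHANGE( A, M, N, NORTH, SOUTH, COMM )
!  Minimal halo-update sketch, not the CMAQ stencil exchange API:
!  send this subdomain's northernmost interior row to the NORTH
!  neighbor and receive the SOUTH neighbor's north edge into the
!  south ghost row (J = 0).  NORTH and SOUTH are neighbor ranks, or
!  MPI_PROC_NULL on the exterior boundary, where nothing is moved.
   IMPLICIT NONE
   INCLUDE 'mpif.h'
   INTEGER, INTENT( IN )    :: M, N, NORTH, SOUTH, COMM
   REAL,    INTENT( INOUT ) :: A( M, 0:N+1 )   ! interior rows 1:N plus ghost rows 0 and N+1
   INTEGER :: STATUS( MPI_STATUS_SIZE ), IERR

   CALL MPI_SENDRECV( A( 1, N ), M, MPI_REAL, NORTH, 1,  &
                      A( 1, 0 ), M, MPI_REAL, SOUTH, 1,  &
                      COMM, STATUS, IERR )
END SUBROUTINE NS_GHOST_EXCHANGE

The actual stencil exchange generalizes this to the directions and ghost depths each science process requires, such as the two-cell reach in the loop on the previous slide.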

Page 6:

Architectural Changes for Parallel I/O

Tests confirmed that the latest releases (May and Sep 2003) are not very scalable

Background:

Original Version

Latest Release

AQF Version

Page 7:

Parallel I/O

[Diagram: how read, write and computation are divided among processors in the 2002 Release, the 2003 Release and the AQF version. The earlier versions combine reading and/or writing with computation on the same processors ("read and computation", "write and computation", "read, write and computation"); in the AQF version the worker processors handle reading and computation while a dedicated processor handles writes only, asynchronously, with the data transferred to it by message passing.]

Page 8:

Modifications to the I/O API for Parallel I/O

For 3-D data, e.g.:

   INTERP3 ( FileName, VarName, ProgName, Date, Time,
             Ncols*Nrows*Nlays, Data_Buffer )                      ! time interpolation

   INTERPX ( FileName, VarName, ProgName,
             StartCol, EndCol, StartRow, EndRow, StartLay, EndLay,
             Date, Time, Data_Buffer )                             ! time interpolation + spatial subset

   XTRACT3 ( FileName, VarName,
             StartLay, EndLay, StartRow, EndRow, StartCol, EndCol,
             Date, Time, Data_Buffer )                             ! spatial subset

   WRPATCH ( FileID, VarID, TimeStamp, Record_No )                 ! called from PWRITE3 (pario)
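
A sketch of how a science process might use the windowed call, following the argument order shown above. The helper routine, the file and variable names, and the window arguments (the processor's own patch, possibly extended by its ghost cells) are illustrative assumptions, not CMAQ source:

SUBROUTINE READ_LOCAL_TA( MY_COL0, MY_COL1, MY_ROW0, MY_ROW1,  &
                          NLAYS, DATE, TIME, TA )
!  Hypothetical helper: read and time-interpolate only this
!  processor's window of temperature from MET_CRO_3D via INTERPX,
!  instead of reading the whole domain with INTERP3.
   IMPLICIT NONE
   INTEGER, INTENT( IN )  :: MY_COL0, MY_COL1, MY_ROW0, MY_ROW1
   INTEGER, INTENT( IN )  :: NLAYS, DATE, TIME
   REAL,    INTENT( OUT ) :: TA( MY_COL1-MY_COL0+1, MY_ROW1-MY_ROW0+1, NLAYS )
   LOGICAL, EXTERNAL :: INTERPX

   IF ( .NOT. INTERPX( 'MET_CRO_3D', 'TA', 'READ_LOCAL_TA',   &
                       MY_COL0, MY_COL1, MY_ROW0, MY_ROW1,    &
                       1, NLAYS, DATE, TIME, TA ) ) THEN
      CALL M3EXIT( 'READ_LOCAL_TA', DATE, TIME,               &
                   'Could not interpolate TA', 1 )
   END IF
END SUBROUTINE READ_LOCAL_TA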

Page 9:

Standard (May 2003 Release)

DRIVER
   read IC's into CGRID
   begin output timestep loop
      advstep (determine sync timestep)
      couple
      begin sync timestep loop
         SCIPROC
            X-Y-Z advect
            adjadv
            hdiff
            decouple
            vdiff       DRYDEP
            cloud       WETDEP
            gas chem    (aero   VIS)
            couple
      end sync timestep loop
      decouple
      write conc and avg conc    CONC, ACONC
   end output timestep loop

Page 10:

AQF CMAQ

DRIVER
   set WORKERs, WRITER
   if WORKER, read IC's into CGRID
   begin output timestep loop
      if WORKER
         advstep (determine sync timestep)
         couple
         begin sync timestep loop
            SCIPROC
               X-Y-Z advect
               hdiff
               decouple
               vdiff
               cloud
               gas chem   (aero)
               couple
         end sync timestep loop
         decouple
         MPI send conc, aconc, drydep, wetdep, (vis)
      if WRITER
         completion-wait: for conc, write conc   CONC ; for aconc, write aconc, etc.
   end output timestep loop
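
A rough sketch of the worker/writer pattern above, using plain blocking MPI calls for clarity. It is not the AQF source: the routine name, the message tag and the simplification that every worker's patch has the same size are assumptions, and the actual code uses the completion-wait logic shown in the outline rather than a simple receive loop.

SUBROUTINE SEND_OR_WRITE_CONC( CGRID_LOCAL, LOCAL_SIZE, MYPE, WRITER, COMM )
!  Sketch of the worker/writer split: worker processors ship their
!  piece of the concentration field to the dedicated writer processor
!  and return to computation; the writer collects the pieces and does
!  the file output, so writing overlaps the workers' next sync-timestep
!  loop.
   IMPLICIT NONE
   INCLUDE 'mpif.h'
   INTEGER, INTENT( IN ) :: LOCAL_SIZE, MYPE, WRITER, COMM
   REAL,    INTENT( IN ) :: CGRID_LOCAL( LOCAL_SIZE )
   INTEGER, PARAMETER :: CONC_TAG = 100            ! assumed message tag
   INTEGER :: NPROCS, P, IERR, STATUS( MPI_STATUS_SIZE )
   REAL, ALLOCATABLE :: PATCH( : )

   CALL MPI_COMM_SIZE( COMM, NPROCS, IERR )

   IF ( MYPE .NE. WRITER ) THEN                    ! worker: send and move on
      CALL MPI_SEND( CGRID_LOCAL, LOCAL_SIZE, MPI_REAL,  &
                     WRITER, CONC_TAG, COMM, IERR )
   ELSE                                            ! writer: gather, then write
      ALLOCATE( PATCH( LOCAL_SIZE ) )              ! equal patch sizes assumed
      DO P = 0, NPROCS - 1
         IF ( P .EQ. WRITER ) CYCLE
         CALL MPI_RECV( PATCH, LOCAL_SIZE, MPI_REAL,     &
                        P, CONC_TAG, COMM, STATUS, IERR )
!        ... place PATCH into the output array for processor P,
!        then write the timestep (e.g. via PWRITE3 / WRPATCH) ...
      END DO
      DEALLOCATE( PATCH )
   END IF
END SUBROUTINE SEND_OR_WRITE_CONC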

Page 11:

Power3 Cluster (NESCC's LPAR)

[Diagram: three 16-CPU SP Nighthawk nodes (CPUs 0 - 15), each with shared memory, connected by a switch]

The platform (cypress00, cypress01, cypress02) consists of 3 SP-Nighthawk nodes

All CPUs are shared among user applications, file servers, interactive use, etc.

About 2X slower than NCEP's Power4 nodes

Page 12:

Power4 p690 Servers (NCEP's LPAR)

[Diagram: four 4-CPU nodes (CPUs 0 - 3), each with its own memory, connected by a switch]

Each platform (snow, frost) is composed of 22 p690 (Regatta) servers

Each server has 32 CPUs, LPARed into 8 nodes per server (4 CPUs per node)

Some nodes are dedicated to file servers, interactive use, etc.

There are effectively 20 servers for general use (160 nodes, 640 CPUs)

2 “colony” switch connectors per node

~2X the performance of the cypress nodes

Page 13:

“ice” Beowulf Cluster

Pentium 3 – 1.4 GHz

[Diagram: six dual-CPU nodes, each with its own memory, connected by an internal network]

Isolated from outside network traffic

Page 14:

“global” MPICH Cluster

Pentium 4 XEON – 2.4 GHz

[Diagram: six dual-CPU nodes (global, global1, global2, global3, global4, global5), each with its own memory, connected by a network]

Page 15:

RESULTS

5-hour (12Z – 17Z) and 24-hour (12Z – 12Z) runs

20 Sept 2002 test data set used for developing the AQF-CMAQ

Input met from ETA, processed through PRDGEN and PREMAQ

Domain seen on following slides: 166 columns X 142 rows X 22 layers at 12-km resolution

CB4 mechanism, no aerosols

Pleim's Yamartino advection for AQF-CMAQ; PPM advection for the May 2003 Release

Page 16:

The Matrix

[Table: test matrix of 24-hour and 5-hour runs, with run-time and wall-time comparisons, for the AQF and 2003 Release codes on cypress (4, 8, 16, 32 worker processors), snow (4, 8, 16, 32, 64) and, for 8 processors, ice and global]

run  = comparison of run times

wall = comparison of relative wall times for the main science processes

Worker processor counts and their decompositions:

   4 = 2 X 2
   8 = 4 X 2
  16 = 4 X 4
  32 = 8 X 4
  64 = 8 X 8
  64 = 16 X 4
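
These processor counts are decompositions of the 166-column X 142-row grid into a grid of subdomains. When a dimension does not divide evenly, the leftover cells are spread one per subdomain, which is where sizes such as 42/41 and 36/35 on the aspect-ratio slides at the end come from. A minimal sketch of that arithmetic (not CMAQ's actual decomposition routine; the function name is made up):

INTEGER FUNCTION PATCH_SIZE( NGRID, NPROCS, RANK )
!  Hypothetical helper: number of columns (or rows) owned by one
!  processor when NGRID cells are split over NPROCS processors.
!  Each processor gets the quotient; the first MOD( NGRID, NPROCS )
!  processors get one extra cell.
   IMPLICIT NONE
   INTEGER, INTENT( IN ) :: NGRID    ! e.g. 166 columns or 142 rows
   INTEGER, INTENT( IN ) :: NPROCS   ! processors along this dimension
   INTEGER, INTENT( IN ) :: RANK     ! 0 .. NPROCS-1

   PATCH_SIZE = NGRID / NPROCS
   IF ( RANK .LT. MOD( NGRID, NPROCS ) ) PATCH_SIZE = PATCH_SIZE + 1
END FUNCTION PATCH_SIZE

For example, 166 cells over 4 processors gives patches of 42, 42, 41 and 41, and 142 cells over 4 gives 36, 36, 35 and 35, consistent with the subdomain sizes on the aspect-ratio slides.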

Page 17:

24–Hour AQF vs. 2003 Release

[Figure: AQF-CMAQ and 2003 Release model fields, shown at the peak hour]

Page 18:

24–Hour AQF vs. 2003 Release

Less than 0.2 ppb difference between AQF on cypress and AQF on snow; almost 9 ppb maximum difference between Yamartino and PPM

Page 19:

AQF vs. 2003 Release

Absolute Run Times, 24 Hours, 8 Worker Processors

[Bar chart: run time in seconds (0 - 10000) for the AQF and Release codes; runs: cypress1 - cypress4, snow, ice, global]

Page 20:

AQF vs. 2003 Release on cypress, and AQF on snow

Absolute Run Times, 24 Hours, 32 Worker Processors

[Bar chart: run time in seconds (0 - 12000) for the AQF and Release codes; runs: cypress1 - cypress5, snow1 - snow3]

Page 21:

AQF CMAQ – Various Platforms, Relative Run Times

5 Hours, 8 Worker Processors

[Bar chart: run time as % of slowest (0 - 100); runs: cypress1 - cypress4, snow1 - snow3, ice1 - ice2, global1 - global2]

Page 22:

AQF-CMAQ cypress vs. snow

Relative Run Times, 5 Hours

[Bar chart: run time as % of slowest (0 - 100) on cypress and snow; runs: 4, 4', 4'', 4''', 8, 8', 8'', 8''', 16, 16', 16'', 16''', 32, 32', 64, 64', 64b, 64b']

Page 23:

AQF-CMAQ on Various Platforms

Relative Run Times, 5 Hours

[Chart: run time as % of slowest (0 - 100) vs. number of worker processors (4, 8, 16, 32, 64); runs: cypress1 - cypress4, snow1 - snow4, ice1 - ice2, global1 - global2]

Page 24:

AQF-CMAQ on cypress

Relative Wall Times, 5 Hours

[Bar chart: % of wall time (0 - 35) spent in xadv, yadv, zadv, hdiff, vdiff, cloud and chem; runs: 4, 4', 4'', 4''', 8, 8', 8'', 8''', 16, 16', 16'', 16''', 32, 32']

Page 25:

AQF-CMAQ on snow

Relative Wall Times, 5 Hours

[Bar chart: % of wall time (0 - 35) spent in xadv, yadv, zadv, hdiff, vdiff, cloud and chem; runs: 4, 4', 4'', 8, 8', 8'', 16, 16', 32, 32', 64, 64', 64b, 64b']

Page 26:

AQF vs. 2003 Release on cypress

Relative Wall Times, 24 hr, 8 Worker Processors

[Bar chart: % of wall time (0 - 35) spent in xadv, yadv, zadv, hdiff, vdiff, cloud and chem; runs: rel1 - rel4 (PPM advection) and aqf1 - aqf4 (Yamo, i.e. Yamartino advection)]

Page 27:

AQF vs. 2003 Release on cypress, with snow added for 8 and 32 processors

Relative Wall Times, 24 hr, 8 Worker Processors

[Bar chart: % of wall time (0 - 35) spent in xadv, yadv, zadv, hdiff, vdiff, cloud and chem; runs: rel1 - rel4, aqf1 - aqf4, snow (8 processors), snow32]

Page 28:

AQF Horizontal Advection

Relative Wall Times, 24 hr, 8 Worker Processors

[Bar chart: % of wall time (0 - 30) spent in x-r-l, x-hppm, y-c-l and y-hppm; runs: cypress1 - cypress4, snow1]

Key: x-r-l = x row loop; y-c-l = y column loop; -hppm = the kernel solver in each loop

Page 29:

AQF Horizontal Advection

Relative Wall Times, 24 hr, 32 Worker Processors

[Bar chart: % of wall time (0 - 20) spent in x-r-l, x-hppm, y-c-l and y-hppm; runs: cypress1 - cypress3, snow1 - snow3]

Page 30:

AQF & Release Horizontal Advection

Relative Wall Times, 24 hr, 32 Worker Processors

[Bar chart: % of wall time (0 - 20) spent in x-r-l, x-hppm, y-c-l and y-hppm; runs: cypress1 - cypress3, snow1 - snow3, r-cypress1 - r-cypress3 (r- = Release)]

Page 31:

Future Work

Add aerosols back in

TKE vdiff

Improve horizontal advection/diffusion scalability

Some I/O improvements

Layer-variable horizontal advection time steps

Page 32:

Aspect Ratios

[Diagram: the 166-column X 142-row domain split into subdomains for the decompositions in the test matrix; subdomain dimensions shown include 83, 71, 42/41, 36/35 and 21/20]

Page 33:

Aspect Ratios (continued)

[Diagram: further decompositions, with subdomain dimensions of 11/10, 36/35, 21/20 and 18/17]