
Parallel IO in the Community Earth System Model

Jim Edwards (NCAR, jedwards@ucar.edu), John Dennis (NCAR), Ray Loy (ANL), Pat Worley (ORNL)

NCAR, P.O. Box 3000, Boulder, CO 80307-3000, USA

Workshop on Scalable IO in Climate Models, 27/02/2012



• Some CESM 1.1 capabilities:
– Ensemble configurations with multiple instances of each component
– Highly scalable capability, proven to 100K+ tasks
– Regionally refined grids
– Data assimilation with DART


Prior to PIO
• Each model component was independent, with its own IO interface
• Mix of file formats:
– NetCDF
– Binary (POSIX)
– Binary (Fortran)
• Gather-scatter method to interface with serial IO (sketched below)
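For context, the pre-PIO gather-scatter pattern funnels every field through a single root task before a serial write. The following is a minimal sketch of that idiom, not CESM's actual code, assuming an equal-sized 1-D block decomposition; the root task's buffer grows with the job size, which is exactly the memory bottleneck PIO was designed to remove.

program gather_write
  ! Hypothetical illustration of the old gather-then-serial-write idiom (not CESM source).
  use mpi
  implicit none
  integer, parameter :: nlocal = 1000
  integer :: ierr, rank, nprocs
  real(kind=8) :: local(nlocal)
  real(kind=8), allocatable :: global(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  local = real(rank, kind=8)                 ! stand-in for model data

  if (rank == 0) then
     allocate(global(nlocal*nprocs))         ! root buffer grows with job size: the bottleneck
  else
     allocate(global(1))                     ! unused on non-root tasks
  end if

  ! Root gathers the whole field, then writes it serially (plain binary here for brevity).
  call MPI_Gather(local, nlocal, MPI_REAL8, global, nlocal, MPI_REAL8, 0, MPI_COMM_WORLD, ierr)
  if (rank == 0) then
     open(unit=10, file='field.bin', form='unformatted', access='stream')
     write(10) global
     close(10)
  end if

  call MPI_Finalize(ierr)
end program gather_write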


Steps toward PIO
• Converge on a single file format
– NetCDF selected:
• Self-describing
• Lossless, with lossy capability (netCDF-4 only)
• Works with the current postprocessing tool chain


• Extension to parallel IO
• Reduce the single-task memory profile
• Maintain a single-file, decomposition-independent format
• Performance (a secondary issue)


• Parallel IO from all compute tasks is not the best strategy:
– Data rearrangement is complicated, leading to numerous small and inefficient IO operations
– MPI-IO aggregation alone cannot overcome this problem


Parallel I/O library (PIO)
• Goals:
– Reduce per-MPI-task memory usage
– Easy to use
– Improve performance
• Write/read a single file from a parallel application (a minimal usage sketch follows)
• Multiple backend libraries: MPI-IO, netCDF-3, netCDF-4, pnetCDF, netCDF+VDC
• Meta-IO library: a potential interface to other general-purpose libraries
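To make the interface concrete, here is a minimal write path through PIO. It is a sketch based on the PIO1 Fortran API (pio_init, pio_initdecomp, pio_createfile, pio_write_darray) as described in the PIO documentation; argument lists may differ slightly between releases, and the IO-task count, stride, decomposition, and file/variable names are illustrative choices only.

program pio_write_sketch
  ! Minimal PIO write path; based on the PIO1 Fortran interface, hedged as above.
  use mpi
  use pio
  implicit none
  integer, parameter :: nglobal = 3600*2400      ! toy global size (assumes nprocs divides it)
  type(iosystem_desc_t) :: iosystem
  type(file_desc_t)     :: file
  type(io_desc_t)       :: iodesc
  type(var_desc_t)      :: vardesc
  integer :: ierr, rank, nprocs, nlocal, i, dimid
  integer, allocatable  :: compdof(:)
  real(kind=8), allocatable :: local(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! 4 IO tasks with stride 4 and the box rearranger (illustrative choices only).
  call pio_init(rank, MPI_COMM_WORLD, 4, 0, 4, PIO_rearr_box, iosystem)

  ! Block decomposition: each compute task owns a contiguous slice of the global array.
  nlocal = nglobal / nprocs
  allocate(compdof(nlocal), local(nlocal))
  do i = 1, nlocal
     compdof(i) = rank*nlocal + i                ! 1-based offsets into the global array
  end do
  local = real(rank, kind=8)                     ! stand-in for model data
  call pio_initdecomp(iosystem, PIO_double, (/ nglobal /), compdof, iodesc)

  ! Create the file through the pnetcdf backend, define metadata, write collectively.
  ierr = pio_createfile(iosystem, file, PIO_iotype_pnetcdf, 'example.nc', PIO_clobber)
  ierr = pio_def_dim(file, 'n', nglobal, dimid)
  ierr = pio_def_var(file, 'T', PIO_double, (/ dimid /), vardesc)
  ierr = pio_enddef(file)
  call pio_write_darray(file, vardesc, iodesc, local, ierr)

  call pio_closefile(file)
  call pio_finalize(iosystem, ierr)
  call MPI_Finalize(ierr)
end program pio_write_sketch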


[Architecture diagram: the CESM components (CPL7 coupler, CAM atmospheric model, CLM land model, POP2 ocean model, CICE sea ice model, and the CISM land ice model) all perform IO through PIO, which sits on top of the netcdf3, pnetcdf, and netcdf4 (HDF5) backends, the VDC library, and MPI-IO.]


PIO design principles
• Separation of concerns
• Separate computational and I/O decompositions
• Flexible user-level rearrangement
• Encapsulate expert knowledge


Separation of concerns
• What versus how
– Concern of the user:
• What to write/read to/from disk?
• e.g., "I want to write T, V, PS."
– Concern of the library developer:
• How to access the disk efficiently?
• e.g., "How do I construct I/O operations so that write bandwidth is maximized?"
• Improves ease of use
• Improves robustness
• Enables better reuse


Separate computational and I/O decompositions
[Diagram: data is rearranged between the computational decomposition on the compute tasks and a separate I/O decomposition on the IO tasks.]
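The link between the two decompositions is a per-task mapping, called compdof in the PIO documentation, from each locally owned element to its 1-based offset in the global array. The toy sketch below builds that mapping for a cyclic 1-D layout; it assumes the PIO1 pio_initdecomp interface, and the cyclic layout itself is purely illustrative.

subroutine build_decomp(iosystem, rank, nprocs, iodesc)
  ! Toy illustration: describing the computational decomposition to PIO.
  ! Assumes the PIO1 pio_initdecomp interface; the cyclic layout is illustrative only.
  use pio
  implicit none
  type(iosystem_desc_t), intent(inout) :: iosystem
  integer, intent(in)                  :: rank, nprocs
  type(io_desc_t), intent(out)         :: iodesc
  integer, parameter :: nx = 3600, ny = 2400          ! global grid (POP-sized)
  integer :: i, nlocal
  integer, allocatable :: compdof(:)

  ! Cyclic distribution of the nx*ny global points over the compute tasks:
  ! task 'rank' owns global offsets rank+1, rank+1+nprocs, rank+1+2*nprocs, ...
  nlocal = (nx*ny - rank + nprocs - 1) / nprocs
  allocate(compdof(nlocal))
  do i = 1, nlocal
     compdof(i) = rank + 1 + (i-1)*nprocs              ! 1-based global offsets
  end do

  ! PIO uses this mapping to rearrange data onto the I/O decomposition internally.
  call pio_initdecomp(iosystem, PIO_double, (/ nx, ny /), compdof, iodesc)
end subroutine build_decomp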


Flexible user-level rearrangement
• A single technical solution is not suitable for the entire user community:
– User A: Linux cluster, 32-core job, 200 MB files, NFS file system
– User B: Cray XE6, 115,000-core job, 100 GB files, Lustre file system
• Different compute environments require different technical solutions! (Illustrative PIO setups for the two cases are sketched below.)
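One way that flexibility is expressed is that the number of IO tasks, their stride across the job, and the rearranger are user-level choices made at initialization. The sketch below shows plausible, not prescribed, settings for the two users; it assumes the PIO1 pio_init argument list, and the specific counts and strides are assumptions for illustration.

subroutine init_pio_for_site(rank, big_lustre_job, iosystem)
  ! Illustrative only: task counts, strides, and the rearranger below are plausible
  ! choices for the two environments, not recommendations from the presentation.
  use mpi
  use pio
  implicit none
  integer, intent(in)                :: rank
  logical, intent(in)                :: big_lustre_job
  type(iosystem_desc_t), intent(out) :: iosystem

  if (.not. big_lustre_job) then
     ! User A: 32-core cluster job, ~200 MB files, NFS.
     ! Funnel IO through a single task (effectively serial writes).
     call pio_init(rank, MPI_COMM_WORLD, 1, 0, 1, PIO_rearr_box, iosystem)
  else
     ! User B: 115,000-core Cray XE6 job, ~100 GB files, Lustre.
     ! Spread a few hundred IO tasks across the machine (one per 512 ranks here).
     call pio_init(rank, MPI_COMM_WORLD, 224, 0, 512, PIO_rearr_box, iosystem)
  end if
end subroutine init_pio_for_site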


Writing distributed data (I)
[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]
+ Maximizes the size of individual IO operations to disk
- Non-scalable user-space buffering
- Very large fan-in -> large MPI buffer allocations
Correct solution for User A


Writing distributed data (II)
[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]
+ Scalable user-space memory
+ Relatively large individual IO operations to disk
- Very large fan-in -> large MPI buffer allocations


Writing distributed data (III)
[Diagram: rearrangement from the computational decomposition to the I/O decomposition.]
+ Scalable user-space memory
+ Smaller fan-in -> modest MPI buffer allocations
- Smaller individual IO operations to disk
Correct solution for User B


Encapsulate expert knowledge
• Flow-control algorithm
• Match the size of I/O operations to the stripe size
– Cray XT5/XE6 + Lustre file system
– Minimizes message-passing traffic at the MPI-IO layer
• Load-balance disk traffic over all I/O nodes
– IBM Blue Gene/{L,P} + GPFS file system
– Utilizes Blue Gene-specific topology information
(An example of the kind of low-level tuning PIO hides from the user is sketched below.)
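For a sense of what this expert knowledge looks like below the PIO layer, the sketch attaches stripe-related hints to a raw MPI-IO file open. The hint names (striping_factor, striping_unit, cb_nodes) are standard ROMIO conventions for Lustre-like file systems, but whether they are honored depends on the MPI implementation and file system; PIO applies this kind of tuning internally rather than exposing it to the user like this.

program mpiio_hints
  ! Hedged illustration of stripe-aware MPI-IO tuning (the sort of detail PIO encapsulates).
  ! Hint names are ROMIO conventions; support varies by MPI library and file system.
  use mpi
  implicit none
  integer :: ierr, info, fh

  call MPI_Init(ierr)
  call MPI_Info_create(info, ierr)
  call MPI_Info_set(info, 'striping_factor', '16',      ierr)  ! stripe over 16 OSTs
  call MPI_Info_set(info, 'striping_unit',   '1048576', ierr)  ! 1 MiB stripe size
  call MPI_Info_set(info, 'cb_nodes',        '16',      ierr)  ! 16 collective-buffering aggregators

  call MPI_File_open(MPI_COMM_WORLD, 'output.dat', &
                     ior(MPI_MODE_CREATE, MPI_MODE_WRONLY), info, fh, ierr)

  ! ... collective writes sized to match the stripe would go here ...

  call MPI_File_close(fh, ierr)
  call MPI_Info_free(info, ierr)
  call MPI_Finalize(ierr)
end program mpiio_hints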


Experimental setup
• Did we achieve our design goals?
• Impact of PIO features:
– Flow control
– Varying the number of IO tasks
– Different general I/O backends
• Read/write a 3D POP-sized variable [3600x2400x40] (roughly 2.8 GB per variable in double precision)
• 10 files, 10 variables per file [max bandwidth reported]
• Using Kraken (Cray XT5) + Lustre file system
– Used 16 of 336 OSTs

[Results charts (five slides): 3D POP arrays [3600x2400x40].]


PIOVDC: parallel output to a VAPOR Data Collection (VDC)
• VDC:
– A wavelet-based, gridded data format supporting both progressive access and efficient data subsetting
• Data may be progressively accessed (read back) at different levels of detail, permitting the application to trade off speed and accuracy
– Think Google Earth: less detail when the viewer is far away, progressively more detail as the viewer zooms in
– Enables rapid (interactive) exploration and hypothesis testing that can subsequently be validated with full-fidelity data as needed
• Subsetting:
– Arrays are decomposed into smaller blocks that significantly improve extraction of arbitrarily oriented sub-arrays
• Wavelet transform:
– Similar to Fourier transforms
– Computationally efficient: O(n)
– Basis for many multimedia compression technologies (e.g., MPEG-4, JPEG 2000)


Other PIO users
• Earth System Modeling Framework (ESMF)
• Model for Prediction Across Scales (MPAS)
• Geophysical High Order Suite for Turbulence (GHOST)
• Data Assimilation Research Testbed (DART)


[Chart: write performance on BG/L (Penn State University, April 26, 2010).]


[Chart: read performance on BG/L (Penn State University, April 26, 2010).]


100:1 compression with coefficient prioritization
[Side-by-side images: 1024^3 Taylor-Green turbulence (enstrophy field) [P. Mininni, 2006]; no compression vs. coefficient prioritization (VDC2).]


[Images: 4096^3 homogeneous turbulence simulation; volume rendering of the original enstrophy field and the 800:1 compressed field.]
Data provided by P.K. Yeung (Georgia Tech) and Diego Donzis (Texas A&M)
Original: 275 GB/field; 800:1 compressed: 0.34 GB/field


F90 code generation

A genf90.pl template (the ! TYPE and ! DIMS directives list the type and rank combinations to generate):

interface PIO_write_darray
! TYPE real,int
! DIMS 1,2,3
   module procedure write_darray_{DIMS}d_{TYPE}
end interface

Running genf90.pl over this template produces one specific procedure per combination:


# 1 "tmp.F90.in"
interface PIO_write_darray
   module procedure write_darray_1d_real
   module procedure write_darray_2d_real
   module procedure write_darray_3d_real
   module procedure write_darray_1d_int
   module procedure write_darray_2d_int
   module procedure write_darray_3d_int
end interface
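From the caller's side, nothing genf90-specific is visible: the call is written against the generic name and standard Fortran generic resolution picks the specific procedure matching the array's type and rank. The fragment below is a sketch assuming the PIO1 argument list for pio_write_darray; the intents shown are assumptions.

subroutine write_field(file, vardesc, iodesc, field)
  ! The compiler resolves the generic call to write_darray_2d_real because
  ! 'field' is a rank-2 default-real array. Argument list per the PIO1 docs (hedged).
  use pio
  implicit none
  type(file_desc_t), intent(inout) :: file
  type(var_desc_t),  intent(inout) :: vardesc
  type(io_desc_t),   intent(inout) :: iodesc
  real, intent(in)                 :: field(:,:)
  integer :: ierr
  call pio_write_darray(file, vardesc, iodesc, field, ierr)
end subroutine write_field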


• PIO is open source:
– http://code.google.com/p/parallelio/
• Documentation generated with Doxygen:
– http://web.ncar.teragrid.org/~dennis/pio_doc/html/


• Thank you


Existing I/O libraries
• netCDF-3
– Serial
– Easy to implement
– Limited flexibility
• HDF5
– Serial and parallel
– Very flexible
– Difficult to implement
– Difficult to achieve good performance
• netCDF-4
– Serial and parallel
– Based on HDF5
– Easy to implement
– Limited flexibility
– Difficult to achieve good performance


Existing I/O libraries (cont'd)
• Parallel-netCDF
– Parallel
– Easy to implement
– Limited flexibility
– Difficult to achieve good performance
• MPI-IO
– Parallel
– Very difficult to implement
– Very flexible
– Difficult to achieve good performance
• ADIOS
– Serial and parallel
– Easy to implement
– BP file format:
• Easy to achieve good performance
– All other file formats:
• Difficult to achieve good performance