
March 9, 2009 10th International LCI Conference - HDF5 Tutorial 1

Tutorial II: HDF5 and NetCDF-4

10th International LCI Conference

Albert Cheng, Neil Fortner

The HDF Group

Ed Hartnett

Unidata/UCAR

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 2

Outline

8:30 – 9:30    Introduction to HDF5 data, programming models and tools
9:30 – 10:00   Advanced features of the HDF5 library
10:30 – 11:30  Advanced features of the HDF5 library (continued)
11:30 – 12:00  Introduction to Parallel HDF5
1:00 – 2:30    Introduction to Parallel HDF5 (continued) and Parallel I/O Performance Study
3:00 – 4:30    NetCDF-4

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 3

Introduction to HDF5 Data, Programming Models

and Tools

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 4

What is HDF?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 5

HDF is…

• HDF stands for Hierarchical Data Format
• A file format for managing any kind of data
• A software system to manage data in the format
• Designed for high volume or complex data
• Designed for every size and type of system
• Open format and software library, tools
• There are two HDF’s: HDF4 and HDF5
• Today we focus on HDF5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 6

Brief History of HDF

1987  At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format), which became HDF.

Early 1990’s  NASA adopted HDF for the Earth Observing System project.

1996  DOE’s ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create “Big HDF”. (The increase in computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files.) “Big HDF” became HDF5.

1998  HDF5 was released with support from National Labs, NASA, and NCSA.

2006  The HDF Group spun off from the University of Illinois as a non-profit corporation.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 7

Why HDF5?

In one sentence ...

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 8

Answering big questions …

• Matter and the universe
• Weather and climate
• Life and nature

[Figure: Total Column Ozone (Dobson), August 24, 2001 vs. August 24, 2002]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 9

… involves big data …

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 10


… varied data …

• Thanks to Mark Miller, LLNL

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 11

… and complex relationships …

[Figure: genome assembly data relating contig summaries, discrepancies, contig qualities, coverage depth, read quality, aligned bases, contigs, reads, percent match, traces, and SNP scores]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 12

… on big computers …

• … and small computers …

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 13

How do we…

• Describe our data?
• Read it? Store it? Find it? Share it? Mine it?
• Move it into, out of, and between computers and repositories?
• Achieve storage and I/O efficiency?
• Give applications and tools easy access to our data?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 14

Solution: HDF5!

• Can store all kinds of data in a variety of ways

• Runs on most systems

• Lots of tools to access data

• Emphasis on standards (HDF-EOS, CGNS)

• Library and format emphasis on I/O efficiency and storage

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 15

HDF5 Philosophy

A single platform with multiple uses:
• One general format
• One library, with
  • Options to adapt I/O and storage to data needs
  • Layers on top and below
• Ability to interact well with other technologies
• Attention to past, present, and future compatibility

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 16

Who uses HDF5?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 17

Who uses HDF5?

• Applications that deal with big or complex data
• Over 200 different types of apps
• 2+ million product users world-wide
• Academia, government agencies, industry

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 18

NASA EOS remote sense data

• HDF format is the standard file format for storing data from NASA's Earth Observing System (EOS) mission.

• Petabytes of data stored in HDF and HDF5 to support the Global Climate Change Research Program.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 19

Structure of HDF5 Library

• Applications
• Object API (C, F90, C++, Java)
• Library internals
• Virtual file I/O
• File or other “storage”

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 20

HDF Tools

- HDFView and Java Products

- Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 21

HDF5 Applications & Domains

[Diagram: application communities (HDF-EOS, CGNS, ASC; simulation, visualization, remote sensing; thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models) sit on the HDF5 Data Model & API; below it, the Virtual File Layer (I/O drivers: Stdio, Split Files, MPI I/O, Custom) maps the HDF5 format onto storage — a file, a file on a parallel file system, split metadata and raw data files, or a user-defined device.]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 22

HDF5: The Format

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 23

An HDF5 “file” is a container…

• …into which you can put your data objects, for example a palette, a raster image, or a table:

lat | lon | temp
----|-----|-----
 12 |  23 | 3.1
 15 |  24 | 4.2
 17 |  21 | 3.6

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 24

Structures to organize objects

• “Groups” — e.g., “/” (root) and “/foo”
• “Datasets” — e.g., a palette, raster images, a 2-D array, a 3-D array, and a table (lat | lon | temp)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 25

HDF5 model

• Groups – provide structure among objects
• Datasets – where the primary data goes
  • Data arrays
  • Rich set of datatype options
  • Flexible, efficient storage and I/O
• Attributes, for metadata

Everything else is built essentially from these parts.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 26

HDF5: The Software

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 27

HDF5 Software

• Tools, Applications, Libraries
• HDF5 I/O Library
• HDF5 File

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 28

Users of HDF5 Software

• Tools & Applications — most data consumers are here: scientific/engineering applications, domain-specific libraries/APIs, and tools.
• HDF5 Application Programming Interface — applications and tools use this API to create, read, write, query, etc.; also used directly by power users (consumers).
• “Virtual file layer” (VFL) — modules to adapt I/O to specific features of a system, or to do I/O in some special way.
• “HDF5 File” — the “file” could be on a parallel system, in memory, a collection of files, etc.
• File system, MPI-IO, SAN, and other layers below.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 29

HDF5 Data Model

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 30

HDF5 model (recap)

• Groups – provide structure among objects
• Datasets – where the primary data goes
  • Data arrays
  • Rich set of datatype options
  • Flexible, efficient storage and I/O
• Attributes, for metadata
• Other objects
  • Links (point to data in a file or in another HDF5 file)
  • Datatypes (can be stored for complex structures and reused by multiple datasets)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 31

HDF5 Dataset

• Data
• Metadata
  • Dataspace — Rank: 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype — IEEE 32-bit float
  • Attributes — Time = 32.4, Pressure = 987, Temp = 56
  • Storage info — Chunked, Compressed

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 32

HDF5 Dataspace

• Two roles:
  • A dataspace contains spatial info about a dataset stored in a file
    • Rank and dimensions
    • Permanent part of the dataset definition
  • A dataspace describes the application’s data buffer and the data elements participating in I/O
• Examples: Rank = 2, Dimensions = 4x6; Rank = 1, Dimensions = 12

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 33

HDF5 Datatype

• Datatype – how to interpret a data element
• Permanent part of the dataset definition
• Two classes: atomic and compound
• Can be stored in a file as an HDF5 object (HDF5 committed datatype)
• Can be shared among different datasets

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 34

HDF5 Datatype

• HDF5 atomic types include
  • normal integer & float
  • user-definable (e.g., 13-bit integer)
  • variable length types (e.g., strings)
  • references to objects/dataset regions
  • enumeration – names mapped to integers
  • array
• HDF5 compound types
  • Comparable to C structs (“records”)
  • Members can be atomic or compound types

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 35

HDF5 dataset: array of records

• Dimensionality: 5 x 3
• Datatype (record): int8, int4, int16, 2x3x2 array of float32

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 36

Special storage options for datasets

• chunked – better subsetting access time; compressible; extendable
• compressed – improves storage efficiency, transmission speed
• extendable – arrays can be extended in any direction
• external – metadata stays in the HDF5 file (File A, dataset “Fred”), raw data lives in a separate binary file (File B)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 37

HDF5 Attribute

• Attribute – data of the form “name = value”, attached to an object by the application
• Operations are similar to dataset operations, but…
  • Not extendible
  • No compression or partial I/O
• Can be overwritten, deleted, or added during the “life” of a dataset (a small example follows)
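For instance, attaching a scalar attribute to an already-open dataset takes only a few calls. A minimal sketch, assuming the HDF5 1.8 H5Acreate signature; the function name and the attribute name "Temp" are illustrative, not part of the tutorial:

#include <hdf5.h>

/* Attach a scalar double attribute "Temp" to an open dataset. */
herr_t add_temp_attribute(hid_t dset_id)
{
    double temp     = 56.0;
    hid_t  space_id = H5Screate(H5S_SCALAR);          /* dataspace for one value */
    hid_t  attr_id  = H5Acreate(dset_id, "Temp",      /* "name = value" pair     */
                                H5T_NATIVE_DOUBLE, space_id,
                                H5P_DEFAULT, H5P_DEFAULT);
    herr_t status   = H5Awrite(attr_id, H5T_NATIVE_DOUBLE, &temp);

    H5Aclose(attr_id);                                /* close like any object   */
    H5Sclose(space_id);
    return status;
}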

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 38

HDF5 Group

• A mechanism for organizing collections of related objects
• Every file starts with a root group, “/”
• Similar to UNIX directories
• Can have attributes

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 39

Path to HDF5 object in a file

• / (root)
• /X
• /Y
• /Y/temp
• /Y/bar/temp

(Here “/” contains X and Y; Y contains a dataset temp and a group bar, which holds another dataset temp.)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 40

Shared HDF5 objects

• /A/P
• /B/R
• /C/R

(Groups A, B, and C under the root “/”; B and C share the same object R.)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 41

HDF5 Data Model: Example

ENSIGHT: Automotive crash simulation

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 42

Automotive crash simulation

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 43

Automotive crash simulation

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 44

Solid modeling

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 45

Solid modeling

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 46

HDF5 mesh

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 47

Mesh Example, in HDFView

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 48

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 49

HDF5 Software

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 50

HDF5 software stack

• Tools & Applications
• HDF5 I/O Library
• HDF5 File

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 51

Structure of HDF5 Library

• Object API (C, Fortran 90, Java, C++)
  • Specify objects and transformation properties
  • Invoke data movement operations and data transformations
• Library internals
  • Perform data transformations and other prep for I/O
  • Configurable transformations (compression, etc.)
• Virtual file I/O (C only)
  • Perform byte-stream I/O operations (open/close, read/write, seek)
  • User-implementable I/O (stdio, network, memory, etc.)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 52

Write – from memory to disk

[Figure: an application buffer in memory written to a dataset on disk]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 53

Partial I/O — move just part of a dataset

• (a) Hyperslab from a 2D array to the corner of a smaller 2D array
• (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
(Each case moves data between a dataset on disk and a buffer in memory.)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 54

Partial I/O — move just part of a dataset (continued)

• (c) A sequence of points from a 2D array to a sequence of points in a 3D array
• (d) Union of hyperslabs in the file to a union of hyperslabs in memory

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 55

Layers – parallel example

I/O flows through many layers from application to disk:
• Application
• I/O library (HDF5)
• Parallel I/O library (MPI-I/O)
• Parallel file system (GPFS)
• Switch network / I/O servers
• Disk architecture & layout of data on disk
(Parallel computing system: a Linux cluster of compute nodes)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 56

Virtual I/O layer

• Object API (C, Fortran 90, Java, C++)
• Library internals
• Virtual file I/O (C only)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 57

Virtual file I/O layer

• A public API for writing I/O drivers
• Allows HDF5 to interface to disk, memory, or a user-defined device
• Virtual file I/O drivers: File, File Family, MPI I/O, Core (memory), Stdio, Custom, …
• Each driver maps HDF5 I/O onto a particular kind of “storage”

Applications & Domains

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 58

[Diagram: domain-specific data models and APIs (UDM – LANL, SAF – LLNL/SNL, H5Part – Grids, HDF-EOS – NASA, IDL and other COTS tools) and application communities (simulation, visualization, remote sensing; thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models) built on the HDF5 data model & API. HDF5 serial & parallel I/O goes through the HDF5 virtual file layer (I/O drivers: MPI I/O, Multi, Stdio, Custom, Core) to storage in HDF5 format — a file, a file on a parallel file system, split metadata and raw data files, system memory, or a user-defined device.]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 59

Portability & Robustness

• Runs almost anywhere
  • Linux and UNIX workstations
  • Windows, Mac OS X
  • Big ASC machines, Crays, VMS systems
  • TeraGrid and other clusters
  • Source and binaries available from http://www.hdfgroup.org/HDF5/release/index.html
• QA
  • Daily regression tests on key platforms
  • Meets NASA’s highest technology readiness level

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 60

Other Software

• The HDF Group
  • HDFView
  • Java tools
  • Command-line utilities
  • Web browser plug-in
  • Regression and performance testing software
  • Parallel h5diff
• 3rd party (IDL, MATLAB, Mathematica, PyTables, HDF Explorer, LabVIEW)
• Communities (EOS, ASC, CGNS)
• Integration with other software (iRODS, OPeNDAP)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 61

Creating an HDF5 File with HDFView

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 62

Example: Create this HDF5 File

• “/” (root)
  • A — 4x6 array of integers
  • B
  • Storm

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 63

Demo

• Demonstrate the use of HDFView to create the HDF5 file
• Use h5dump to see the contents of the HDF5 file
• Use h5import to add data to the HDF5 file
• Use h5repack to change properties of the stored objects
• Use h5diff to compare two files

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 64

Introduction to HDF5 Programming Model and APIs

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 65

Structure of HDF5 Library (recap)

• Object API (C, Fortran 90, Java, C++)
  • Specify objects and transformation properties
  • Invoke data movement operations and data transformations
• Library internals
  • Perform data transformations and other prep for I/O
  • Configurable transformations (compression, etc.)
• Virtual file I/O API (C only)
  • Perform byte-stream I/O operations (open/close, read/write, seek)
  • User-implementable I/O (stdio, mpi-io, memory, etc.)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 66

Goals of HDF5 Library

• Provide flexible API to support a wide range of operations on data.

• Support high performance access in serial and parallel computing environments.

• Be compatible with common data models and programming languages.

• Because of these goals, the HDF5 API is rich and large.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 67

Operations Supported by the API

• Create groups, datasets, attributes, linkages
• Create complex data types
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use a variety of I/O “devices” (parallel, remote, etc.)
• Transform data during I/O
• Query about file structure and properties
• Query about object structure, content, properties

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 68

Characteristics of the HDF5 API

• For flexibility, the API is extensive: 300+ functions
• This can be daunting… but there is hope
  • A few functions can do a lot
  • Start simple
  • Build up knowledge as more features are needed
• Library functions are categorized by object type
• The “H5Lite” API supports basic capabilities
(Pictured: a Victorinox Swiss Army CyberTool 34.)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 69

The General HDF5 API

• Currently C, Fortran 90, Java, and C++ bindings
• C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on
• Example APIs:
  • H5D: Dataset interface, e.g., H5Dread
  • H5F: File interface, e.g., H5Fopen
  • H5S: dataSpace interface, e.g., H5Sclose

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 70

Compiling HDF5 Applications

• h5cc – HDF5 C compiler command (similar to mpicc)
• h5fc – HDF5 Fortran 90 compiler command (similar to mpif90)
• h5c++ – HDF5 C++ compiler command

To compile:
% h5cc h5prog.c
% h5fc h5prog.f90

(A minimal h5prog.c follows.)
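A minimal h5prog.c that the h5cc command above could compile might look like the sketch below; the file name h5prog.h5 is illustrative and not part of the tutorial:

/* h5prog.c - smallest useful HDF5 program: create a file, then close it. */
#include <hdf5.h>
#include <stdio.h>

int main(void)
{
    hid_t file_id = H5Fcreate("h5prog.h5", H5F_ACC_TRUNC,
                              H5P_DEFAULT, H5P_DEFAULT);
    if (file_id < 0) {
        fprintf(stderr, "H5Fcreate failed\n");
        return 1;
    }
    H5Fclose(file_id);
    return 0;
}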

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 71

Compile option: -show

-show: displays the compiler commands and options without executing them

% h5cc –show Sample_c.c
gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG
    -I/home/packages/szip/static/encoder/Linux2.6-gcc/include
    -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
    -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O
    -fomit-frame-pointer -finline-functions -c Sample_c.c
gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions
    -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o
    -L/home/packages/hdf5_1.6.6/Linux_2.6/lib
    /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a
    /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a
    -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 72

General Programming Paradigm

• Properties of the object are optionally defined
  • Creation properties
  • Access property lists
  • Default values are used if none are defined
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 73

Order of Operations

• An order is imposed on operations by argument dependencies

For Example:

A file must be opened before a dataset -because-

the dataset open call requires a file handle as an argument.

• Objects can be closed in any order.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 74

HDF5 Defined Types

• For portability, the HDF5 library has its own defined types:
  • hid_t: object identifiers (native integer)
  • hsize_t: size used for dimensions (unsigned long or unsigned long long)
  • hssize_t: for specifying coordinates and sometimes for dimensions (signed long or signed long long)
  • herr_t: function return value
  • hvl_t: variable length datatype
• For C, include hdf5.h in your HDF5 application.

Example: Create this HDF5 File

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 75

• “/” (root)
  • A — 4x6 array of integers
  • B

Example: Step by Step

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 76

• “/” (root)
  • A — 4x6 array of integers
  • B

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 77

Example: Create a File

• “/” (root)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 78

Steps to Create a File

1. Decide any special properties the file should have
   • Creation properties, like the size of the user block
   • Access properties, such as metadata cache size
2. Create property lists, if necessary
3. Create the file
4. Close the file and the property lists, as needed

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 79

Code: Create a File

hid_t file_id;

file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC,
                    H5P_DEFAULT, H5P_DEFAULT);

• H5F_ACC_TRUNC flag – removes an existing file
• H5P_DEFAULT flags – create a regular UNIX file and access it with the HDF5 SEC2 I/O file driver
(A sketch with non-default properties follows.)
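To exercise step 1 of the recipe above (non-default properties), the same call can take a file creation property list. A hedged sketch, assuming the standard H5Pset_userblock call; the 512-byte user block size is illustrative:

#include <hdf5.h>

int main(void)
{
    hid_t fcpl_id = H5Pcreate(H5P_FILE_CREATE);   /* file creation property list */
    H5Pset_userblock(fcpl_id, 512);               /* e.g., reserve a 512-byte user block */

    hid_t file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC,
                              fcpl_id, H5P_DEFAULT);  /* 3rd arg: creation properties */

    H5Pclose(fcpl_id);                            /* close the list and the file */
    H5Fclose(file_id);
    return (file_id < 0) ? 1 : 0;
}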

Example: Add a Dataset

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 80

• “/” (root)
  • A — 4x6 array of integers

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 81

Dataset Components

• Data
• Metadata
  • Dataspace — Rank: 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype — IEEE 32-bit float
  • Attributes — Time = 32.4, Pressure = 987, Temp = 56
  • Storage info — Chunked, Compressed

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 82

Dataset Creation Property List

• Dataset creation property list: information on how to store data in a file
  • e.g., chunked, or chunked & compressed

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 83

Steps to Create a Dataset

1. Define dataset characteristics
   • Dataspace – 4x6
   • Datatype – integer
   • Properties (if needed)
2. Decide where to put it – “/” (root group)
   • Obtain a location identifier
3. Decide link or path – “A”
4. Create the link and dataset in the file
5. (Eventually) Close everything

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 84

Code: Create a Dataset

hid_t   file_id, dataset_id, dataspace_id;
hsize_t dims[2];
herr_t  status;

file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC,
                    H5P_DEFAULT, H5P_DEFAULT);

/* Create a dataspace: rank 2, current dims 4x6 */
dims[0] = 4;
dims[1] = 6;
dataspace_id = H5Screate_simple(2, dims, NULL);

/* Create a dataset: pathname, datatype, dataspace, property list (default) */
dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE,
                       dataspace_id, H5P_DEFAULT);

/* Terminate access to the dataset, dataspace, and file */
status = H5Dclose(dataset_id);
status = H5Sclose(dataspace_id);
status = H5Fclose(file_id);

(A sketch that adds an H5Dwrite call follows.)
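The listing above creates dataset "A" but writes no values into it. A hedged continuation sketch (it reuses dataset_id and status from the code above and assumes a buffer shaped like the 4x6 dataspace) would add one H5Dwrite call before the close calls:

int data[4][6];                       /* fill with application values first    */
/* ... */
status = H5Dwrite(dataset_id,         /* dataset created above                 */
                  H5T_NATIVE_INT,     /* memory datatype; the library converts */
                  H5S_ALL, H5S_ALL,   /* write the full memory and file extent */
                  H5P_DEFAULT, data);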

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 85

Example: Create a Group

• “/” (root) in file.h5
  • A — 4x6 array of integers
  • B

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 86

Steps to Create a Group

1. Decide where to put it – “/” (root group)
   • Obtain a location identifier
2. Decide link or path – “B”
3. Create the link and group in the file
   • Specify the number of bytes to store names of objects to be added to the group (as a hint) – or use the default
4. (Eventually) Close the group

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 87

Code: Create a Group

hid_t file_id, group_id;
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

/* Create group "/B" in the file (HDF5 1.8 API: link creation,
   group creation, and group access property lists). */
group_id = H5Gcreate(file_id, "/B", H5P_DEFAULT,
                     H5P_DEFAULT, H5P_DEFAULT);

/* Close group and file. */
status = H5Gclose(group_id);
status = H5Fclose(file_id);
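If deeper paths are needed, a link creation property list can ask the library to create missing parent groups (the same behavior h5mkgrp exposes as -p, covered later). A sketch assuming the HDF5 1.8 API; the path "/B/C/D" is illustrative:

hid_t lcpl_id = H5Pcreate(H5P_LINK_CREATE);
H5Pset_create_intermediate_group(lcpl_id, 1);   /* make parent groups as needed */

/* Creates /B/C/D, creating /B/C along the way if it does not exist. */
hid_t group_id = H5Gcreate(file_id, "/B/C/D",
                           lcpl_id, H5P_DEFAULT, H5P_DEFAULT);

H5Pclose(lcpl_id);
H5Gclose(group_id);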

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 88

HDF5 Information

HDF Information Center: http://www.hdfgroup.org
HDF Help email address: help@hdfgroup.org
HDF users mailing lists: news@hdfgroup.org, hdf-forum@hdfgroup.org

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 89

Questions?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 90

Introduction to HDF5 Command-line Tools

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 91

HDF5 Command-line Tools

• Readers: h5dump, h5diff, h5ls; h5stat, h5check (new in release 1.8)
• Writers: h5import, h5repack, h5repart, h5jam/h5unjam; h5copy, h5mkgrp (new in release 1.8)
• Converters: h4toh5, h5toh4, gif2h5, h52gif

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 92

h5dump

h5dump: exports (dumps) the contents of an HDF5 file

Multiple output types:
  ASCII, binary, XML

Complete or selected file content:
  Object header information (the structure)
  Attributes (the metadata)
  Datasets (the data)
    All dataset values
    Subsets of dataset values
  Properties (filters, storage layout, fill value)
  Specific objects: groups / datasets / attributes / named datatypes / soft links

h5dump –help
  Lists all option flags

Example: h5dump

No options: “all” contents to standard out

% h5dump Sample.h5
HDF5 "Sample.h5" {
GROUP "/" {
   GROUP "Floats" {
      DATASET "FloatArray" {
         DATATYPE  H5T_IEEE_F32LE
         DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
         DATA {
         (0,0): 0.01, 0.02, 0.03,
         (1,0): 0.1, 0.2, 0.3,
         (2,0): 1, 2, 3,
         (3,0): 10, 20, 30
         }
      }
   }
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
      DATA {
      (0,0): 0, 1, 2, 3, 4, 5,
      (1,0): 10, 11, 12, 13, 14, 15,
      (2,0): 20, 21, 22, 23, 24, 25,
      (3,0): 30, 31, 32, 33, 34, 35,
      (4,0): 40, 41, 42, 43, 44, 45
      }
   }
}
}

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 93

h5dump - object header information

HDF5 "Sample.h5" {

GROUP "/" {

GROUP "Floats" {

DATASET "FloatArray" {

DATATYPE H5T_IEEE_F32LE

DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }

}

}

DATASET "IntArray" {

DATATYPE H5T_STD_I32LE

DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }

}

}

}

-H option: Object header information

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 94

• % h5dump –H Sample.h5

h5dump – specific dataset

HDF5 "Sample.h5" {

DATASET "/Floats/FloatArray" {

DATATYPE H5T_IEEE_F32LE

DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }

DATA {

(0,0): 0.01, 0.02, 0.03,

(1,0): 0.1, 0.2, 0.3,

(2,0): 1, 2, 3,

(3,0): 10, 20, 30

}

}

-d dataset option: Specific dataset

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 95

• % h5dump –d /Floats/FloatArray Sample.h5

h5dump – dataset values to file

HDF5 "Sample.h5" {

DATASET "/IntArray" {

DATATYPE H5T_STD_I32LE

DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }

DATA {

}

}

}

-o file option: Dataset values output to file

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 96

• % h5dump –o Ofile –d /IntArray Sample.h5

• (0,0): 0, 1, 2, 3, 4, 5,• (1,0): 10, 11, 12, 13, 14, 15,• (2,0): 20, 21, 22, 23, 24, 25,• (3,0): 30, 31, 32, 33, 34, 35,• (4,0): 40, 41, 42, 43, 44, 45

• % cat Ofile

• -y option: Do not output array indices with data values

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 97

h5dump – binary output

-b FORMAT option: Binary output, where FORMAT can be:
  MEMORY  Data exported with datatypes matching memory on the system where h5dump is run
  FILE    Data exported with datatypes matching those in the HDF5 file being dumped
  LE      Data exported with a pre-defined little-endian datatype
  BE      Data exported with a pre-defined big-endian datatype

• Typically used with the –d dataset and -o outputFile options
• Allows data values to be exported for use with other applications
• When –b and –d are used together, array indices are not output

h5dump – binary output

0000000 000 000 000 000 000 000 000 001 000 000 000 002 000 000 000 003

0000020 000 000 000 004 000 000 000 005 000 000 000 012 000 000 000 013

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 98

• % h5dump –b BE –d /IntArray -o OBE Sample.h5

• % od –b OBE | head -2

• % h5dump –b LE –d /IntArray -o OLE Sample.h5

• % od –b OLE | head -2• 0000000 000 000 000 000 001 000 000 000 002 000 000 000 003 000 000 000• 0000020 004 000 000 000 005 000 000 000 012 000 000 000 013 000 000 000

• % h5dump –b MEMORY –d /IntArray -o OME Sample.h5

• % od –b OME | head -2• 0000000 000 000 000 000 001 000 000 000 002 000 000 000 003 000 000 000• 0000020 004 000 000 000 005 000 000 000 012 000 000 000 013 000 000 000

h5dump – properties information

HDF5 "Sample.h5" {GROUP "/" { GROUP "Floats" { DATASET "FloatArray" { DATATYPE H5T_IEEE_F32LE DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) } STORAGE_LAYOUT { CONTIGUOUS SIZE 48 OFFSET 3696 } FILTERS { NONE } FILLVALUE { FILL_TIME H5D_FILL_TIME_IFSET VALUE 0 } ALLOCATION_TIME { H5D_ALLOC_TIME_LATE } …

-p option: Print dataset filters, storage layout, fill value

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 99

• % h5dump –p –H Sample.h5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 100

h5import

h5import: loads data into an existing or new HDF5 file
• Data loaded from ASCII or binary files
• Each file corresponds to data values for one dataset
• Integer (signed or unsigned) and float data can be loaded
• Per-dataset settable properties include:
  • datatype (int or float; size; architecture; byte order)
  • storage (compression, chunking, external file, maximum dimensions)
• Properties set via
  • command line:
      % h5import in in_opts [in2 in2_opts] –o out
  • configuration file:
      % h5import in –c conf1 [in2 –c conf2] –o out

Example: h5import

Create Sample2.h5 based on Sample.h5

% cat config.FloatArray
PATH /Floats/FloatArray
INPUT-CLASS TEXTFP
RANK 2
DIMENSION-SIZES 4 3

% cat in.FloatArray
0.01 0.02 0.03
0.1 0.2 0.3
1 2 3
10 20 30

% h5dump –d Floats/FloatArray –y Sample.h5
HDF5 "Sample.h5" {
DATASET "/Floats/FloatArray" {
   DATATYPE  H5T_IEEE_F32LE
   DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
   DATA {
      0.01, 0.02, 0.03,
      0.1, 0.2, 0.3,
      1, 2, 3,
      10, 20, 30
   }
}
}

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 101

Example: h5import

% cat config.IntArray
PATH /IntArray
INPUT-CLASS TEXTIN
RANK 2
DIMENSION-SIZES 5 6

% cat in.IntArray
0 1 2 3 4 5
10 11 12 13 14 15
20 21 22 23 24 25
30 31 32 38 34 35
40 41 42 43 44 45

Input and configuration files ready; issue the command:

% h5import in.FloatArray -c config.FloatArray \
           in.IntArray -c config.IntArray -o Sample2.h5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 102

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 103

h5mkgrp

h5mkgrp: makes groups in an HDF5 file.

Usage: h5mkgrp [OPTIONS] FILE GROUP...

OPTIONS
  -h, --help      Print a usage message and exit
  -l, --latest    Use latest version of file format to create groups
  -p, --parents   No error if existing, make parent groups as needed
  -v, --verbose   Print information about OBJECTS and OPTIONS
  -V, --version   Print version number and exit

Example:
  % h5mkgrp Sample2.h5 /EmptyGroup

Introduced in HDF5 release 1.8.0.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 104

h5diff

h5diff: compares HDF5 files and reports differences
• compare two HDF5 files:
    % h5diff file1 file2
• compare the same object in two files:
    % h5diff file1 file2 object
• compare different objects in two files:
    % h5diff file1 file2 object1 object2

Option flags:
  none: report the number of differences found in objects and where they occurred
  -r:   in addition, report the differences
  -v:   in addition, print the list of objects and warnings; typically used when comparing two files without specifying objects

Example: h5diff

% h5diff –v Sample.h5 Sample2.h5

file1     file2
---------------------------------------
   x        x    /
            x    /EmptyGroup
   x        x    /Floats
   x        x    /Floats/FloatArray
   x        x    /IntArray

group  : </> and </>
0 differences found
group  : </Floats> and </Floats>
0 differences found
dataset: </Floats/FloatArray> and </Floats/FloatArray>
0 differences found
dataset: </IntArray> and </IntArray>
size:  [5x6]  [5x6]
position IntArray IntArray difference
-------------------------------------------------------------------
[ 3 3 ]   33      38       5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 105

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 106

h5repack

h5repack: copies an HDF5 file to a new file with a specified filter and storage layout

• Removes unused space introduced when…
  • objects were deleted
  • compressed datasets were updated and no longer fit in the original space
  • the full space allocated for variable-length data was not used
• Optionally applies filters to datasets: gzip, szip, shuffle, checksum
• Optionally applies a storage layout to datasets: contiguous, chunked, compact

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 107

h5repack: filters

• -f FILTER option: apply a filter, where FILTER can be:
  • GZIP to apply GZIP compression
  • SZIP to apply SZIP compression
  • SHUF to apply the HDF5 shuffle filter
  • FLET to apply the HDF5 checksum filter
  • NBIT to apply NBIT compression
  • SOFF to apply the HDF5 scale/offset filter
  • NONE to remove all filters
• Compression will not be performed if the data is smaller than 1K unless the –m flag is used.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 108

h5repack: storage layout

• -l LAYOUT option: apply a layout, where LAYOUT can be:
  • CHUNK to apply chunked layout
  • COMPA to apply compact layout
  • CONTI to apply contiguous layout

Example: h5repack (filter)

% h5repack –f SHUF –f GZIP=1 TES-Aura.he5 TES-rp.he5
% ls –sk TES-Aura.he5 TES-rp.he5
75608 TES-Aura.he5
56808 TES-rp.he5

33% reduction in file size

• Tropospheric Emission Spectrometer on Aura, the third of NASA's Earth Observing System spacecraft.
• Makes global 3-D measurements of ozone and other chemical species involved in its formation and destruction.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 109

Example: h5repack (layout)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 110

% h5repack –m 1 –l Floats/FloatArray:CHUNK=4x1 \
           Sample.h5 Sample-rp.h5
% h5dump –p –H Sample-rp.h5
HDF5 "Sample-rp.h5" {
GROUP "/" {
   GROUP "Floats" {
      DATASET "FloatArray" {
         DATATYPE  H5T_IEEE_F32LE
         DATASPACE  SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
         STORAGE_LAYOUT {
            CHUNKED ( 4, 1 )
            SIZE 48
         }
         FILTERS {
            NONE
         }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_IFSET
            VALUE  0
         }
         ALLOCATION_TIME {
            H5D_ALLOC_TIME_INCR
         }
…

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 111

Performance Tuning & Troubleshooting

• HDF5 tools can assist with performance tuning and troubleshooting:
  • Discover objects and their properties in HDF5 files: h5dump -p
  • Get file size overhead information: h5stat
  • Find locations of objects in a file: h5ls
  • Discover differences: h5diff, h5ls
  • Location of raw data: h5ls –var
  • Does the file conform to the HDF5 File Format Specification? h5check

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 112

h5stat

h5stat: prints statistics about HDF5 files

• Reports two types of statistics:
  • High-level information about objects:
    • Number of different objects (groups, datasets, datatypes)
    • Number of unique datatypes
    • Size of raw data
  • Information about an object's structural metadata:
    • Size of structural metadata (total/free)
      • Object headers, local and global heaps
      • Size of B-trees
    • Object header fragmentation

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 113

h5stat

• Helps…
  • troubleshoot size overhead in HDF5 files
  • choose appropriate properties and storage strategies
• Usage:
    % h5stat –help
    % h5stat file.h5
• Full specification at: http://www.hdfgroup.uiuc.edu/RFC/HDF5/h5stat/
• Introduced in HDF5 release 1.8.0.

h5check

• Verifies that a file is encoded according to the HDF5 File Format Specification: http://www.hdfgroup.org/HDF5/doc/H5.format.html
• Does not use the HDF5 library
• Used to confirm that files written by the HDF5 library are compliant with the specification
• The tool is not part of the HDF5 source code distribution: ftp://ftp.hdfgroup.org/HDF5/special_tools/h5check/

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 114

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 115

Questions?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 116

HDF5 Advanced Topics

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 117

Outline

• Part I
  • Overview of HDF5 datatypes
• Part II
  • Partial I/O in HDF5
    • Hyperslab selection
    • Dataset region references
  • Chunking and compression
• Part III
  • Performance issues (how to do it right)
• Part IV
  • Performance benefits of HDF5 version 1.8

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 118

Part I: HDF5 Datatypes

Quick overview of the most difficult topics

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 119

HDF5 Datatypes

• HDF5 has a rich set of pre-defined datatypes and supports the creation of an unlimited variety of complex user-defined datatypes.

• Datatype definitions are stored in the HDF5 file with the data.

• Datatype definitions include information such as byte order (endianness), size, and floating point representation to fully describe how the data is stored and to ensure portability across platforms.

• Datatype definitions can be shared among objects in an HDF file, providing a powerful and efficient mechanism for describing data.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 120

Example

• Array of integers written on an IA32 platform
  • Native integer is little-endian, 4 bytes
  • H5Dwrite with memory datatype H5T_NATIVE_INT; stored in the file as H5T_STD_I32LE (little-endian, 4-byte integer)
• Array of integers read on a SPARC64 platform
  • Native integer is big-endian, 8 bytes
  • H5Dread with memory datatype H5T_NATIVE_INT; the library converts from the stored H5T_STD_I32LE
• The same mechanism handles other stored representations (e.g., VAX G-floating) on H5Dwrite/H5Dread
(A code sketch follows.)
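A sketch of the two sides of that conversion; the file identifier, dataset name "ints", and element count are illustrative, and the H5Dcreate/H5Dopen calls assume the HDF5 1.8 signatures:

/* Writer (e.g., on IA32): store as 4-byte little-endian regardless of host. */
hsize_t dims[1] = {100};
hid_t   space   = H5Screate_simple(1, dims, NULL);
hid_t   dset    = H5Dcreate(file_id, "ints", H5T_STD_I32LE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
int     wbuf[100];                   /* fill with application values first   */
H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, wbuf);

/* Reader (e.g., on SPARC64): ask for native ints; the library converts. */
hid_t   dset2 = H5Dopen(file_id, "ints", H5P_DEFAULT);
int     rbuf[100];
H5Dread(dset2, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, rbuf);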

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 121

Storing Variable Length Data in HDF5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 122

HDF5 Fixed and Variable Length Array Storage

[Figure: fixed-length vs. variable-length storage of (Time, Data) records]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 123

Storing Strings in HDF5

• Array of characters (array datatype or an extra dimension in the dataset)
  • Quick access to each character
  • Extra work to access and interpret each string
• Fixed length
    string_id = H5Tcopy(H5T_C_S1);
    H5Tset_size(string_id, size);
  • Wasted space in shorter strings
  • Can be compressed
• Variable length
    string_id = H5Tcopy(H5T_C_S1);
    H5Tset_size(string_id, H5T_VARIABLE);
  • Overhead as for all VL datatypes
  • Compression will not be applied to the actual data
(A variable-length string example follows.)
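A hedged sketch of writing a small variable-length string dataset; file_id is assumed to be an open file and the dataset name "strings" is illustrative. It assumes the HDF5 1.8 H5Dcreate signature; the element buffer for VL strings is simply an array of char pointers:

const char *strings[2] = {"short", "a longer string"};

hid_t str_tid = H5Tcopy(H5T_C_S1);
H5Tset_size(str_tid, H5T_VARIABLE);                 /* VL string datatype */

hsize_t dims[1]  = {2};
hid_t   space_id = H5Screate_simple(1, dims, NULL);
hid_t   dset_id  = H5Dcreate(file_id, "strings", str_tid, space_id,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

H5Dwrite(dset_id, str_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, strings);

H5Dclose(dset_id);  H5Sclose(space_id);  H5Tclose(str_tid);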

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 124

Storing Variable Length Data in HDF5

• Each element is represented by the C structure
    typedef struct {
        size_t len;
        void  *p;
    } hvl_t;

• Base type can be any HDF5 typeH5Tvlen_create(base_type)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 125

Example

hvl_t data[LENGTH];

for (i = 0; i < LENGTH; i++) {
    data[i].p   = malloc((i+1) * sizeof(unsigned int));
    data[i].len = i + 1;
}

tvl = H5Tvlen_create (H5T_NATIVE_UINT);

/* In the figure: data[0].p points to a 1-element buffer, …, data[4].len is 5. */
(A sketch that writes this VL data follows.)
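To finish that example, the VL data can be written and the malloc'ed element buffers released afterwards. A hedged continuation sketch reusing data, tvl, i, LENGTH, and an open file_id from above, and assuming the HDF5 1.8 H5Dcreate signature (the dataset name "vl_data" is illustrative):

hsize_t dims[1]  = {LENGTH};
hid_t   space_id = H5Screate_simple(1, dims, NULL);
hid_t   dset_id  = H5Dcreate(file_id, "vl_data", tvl, space_id,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

H5Dwrite(dset_id, tvl, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

/* Free the application-owned element buffers once the write is done. */
for (i = 0; i < LENGTH; i++)
    free(data[i].p);

H5Dclose(dset_id);  H5Sclose(space_id);  H5Tclose(tvl);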

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 126

Reading HDF5 Variable Length Array

hvl_t rdata[LENGTH];
/* Create the memory vlen type */
tvl = H5Tvlen_create (H5T_NATIVE_UINT);
ret = H5Dread(dataset, tvl, H5S_ALL, H5S_ALL,
              H5P_DEFAULT, rdata);
/* Reclaim the read VL data */
H5Dvlen_reclaim(tvl, H5S_ALL, H5P_DEFAULT, rdata);

• On read, the HDF5 library allocates memory to read the data into; the application only needs to allocate the array of hvl_t elements (pointers and lengths).

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 127

Storing Tables in HDF5 file

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 128

Example

a_name (integer)

b_name (float)

c_name (double)

0 0. 1.0000

1 1. 0.5000

2 4. 0.3333

3 9. 0.2500

4 16. 0.2000

5 25. 0.1667

6 36. 0.1429

7 49. 0.1250

8 64. 0.1111

9 81. 0.1000

Multiple ways to store a table:
  • A dataset for each field
  • A dataset with a compound datatype
  • If all fields have the same type: a 2-dim array, or a 1-dim array of an array datatype
  • … (continued)

Choose to achieve your goal:
  • How much overhead does each type of storage create?
  • Do I always read all fields?
  • Do I need to read some fields more often?
  • Do I want to use compression?
  • Do I want to access some records?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 129

HDF5 Compound Datatypes

• Compound types
  • Comparable to C structs
  • Members can be atomic or compound types
  • Members can be multidimensional
  • Can be written/read by a field or a set of fields
  • Not all data filters can be applied (shuffling, SZIP)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 130

HDF5 Compound Datatypes

• Which APIs to use?
  • H5TB APIs
    • Create, read, get info, and merge tables
    • Add, delete, and append records
    • Insert and delete fields
    • Limited control over the table's properties (i.e., only GZIP compression, level 6, default allocation time for the table, extendible, etc.)
  • PyTables http://www.pytables.org
    • Based on H5TB
    • Python interface
    • Indexing capabilities
  • HDF5 APIs
    • H5Tcreate(H5T_COMPOUND) and H5Tinsert calls to create a compound datatype
    • H5Dcreate, etc.
    • See the H5Tget_member* functions for discovering properties of an HDF5 compound datatype

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 131

Creating and Writing Compound Dataset

• h5_compound.c example

typedef struct s1_t {
    int    a;
    float  b;
    double c;
} s1_t;

s1_t s1[LENGTH];

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 132

Creating and Writing Compound Dataset

/* Create datatype in memory. */
s1_tid = H5Tcreate (H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT);

Note:
• Use the HOFFSET macro instead of calculating offsets by hand.
• The order of the H5Tinsert calls is not important if HOFFSET is used.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 133

Creating and Writing Compound Dataset

/* Create dataset and write data */
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
status  = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL,
                   H5P_DEFAULT, s1);

Note:
• In this example the memory and file datatypes are the same.
• The type is not packed; use H5Tpack to save space in the file:
    status  = H5Tpack(s1_tid);
    dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 134

File Content with h5dump

• HDF5 "SDScompound.h5" {• GROUP "/" { • DATASET "ArrayOfStructures" {• DATATYPE { • H5T_STD_I32BE "a_name"; • H5T_IEEE_F32BE "b_name"; • H5T_IEEE_F64BE "c_name"; } • DATASPACE { SIMPLE ( 10 ) / ( 10 ) }

• DATA { • {• [ 0 ],• [ 0 ],• [ 1 ] • }, • { • [ 1 ],• …

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 135

Reading Compound Dataset

/* Create datatype in memory and read data. */
dataset = H5Dopen(file, DATASETNAME, H5P_DEFAULT);

s2_tid  = H5Dget_type(dataset);
mem_tid = H5Tget_native_type (s2_tid, H5T_DIR_DEFAULT);
s1      = malloc(H5Tget_size(mem_tid) * number_of_elements);

status  = H5Dread(dataset, mem_tid, H5S_ALL,
                  H5S_ALL, H5P_DEFAULT, s1);

Note:
• We could construct the memory type as we did in the writing example.
• For general applications we need to discover the type in the file, find the corresponding memory type, allocate space, and do the read.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 136

Reading Compound Dataset by Fields

typedef struct s2_t {
    double c;
    int    a;
} s2_t;
s2_t s2[LENGTH];
…
s2_tid = H5Tcreate (H5T_COMPOUND, sizeof(s2_t));
H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE);
H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT);
…
status = H5Dread(dataset, s2_tid, H5S_ALL,
                 H5S_ALL, H5P_DEFAULT, s2);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 137

New Way of Creating Datatypes

• Another way to create a compound datatype:

#include "H5LTpublic.h"
…
s2_tid = H5LTtext_to_dtype(
             "H5T_COMPOUND
              {H5T_NATIVE_DOUBLE \"c_name\";
               H5T_NATIVE_INT \"a_name\";
              }",
             H5LT_DDL);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 138

Need Help with Datatypes?

• Check our support web pages

• http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-api/api18-c.html

• http://www.hdfgroup.uiuc.edu/UserSupport/examples-by-api/api16-c.html

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 139

Part II: Working with subsets

Collect data one way ….
• Array of images (3D)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 140

Display data another way …
• Stitched image (2D array)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 141

Data is too big to read….

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 142

Refer to a region…
• Need to select and access the same elements of a dataset

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 143

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 144

HDF5 Library Features

• The HDF5 library provides capabilities to
  • Describe subsets of data and perform write/read operations on subsets
    • Hyperslab selections and partial I/O
  • Store descriptions of data subsets in a file
    • Object references
    • Region references
  • Use efficient storage mechanisms to achieve good performance while writing/reading subsets of data
    • Chunking, compression

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 145

Partial I/O in HDF5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 146

How to Describe a Subset in HDF5?

• Before writing and reading a subset of data one has to describe it to the HDF5 Library.

• HDF5 APIs and documentation refer to a subset as a “selection” or “hyperslab selection”.

• If a selection is specified, the HDF5 library will perform I/O on that selection only and not on all elements of the dataset.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 147

Types of Selections in HDF5

• Two types of selections
  • Hyperslab selection
    • Regular hyperslab
    • Simple hyperslab
    • Result of set operations on hyperslabs (union, difference, …)
  • Point selection
• Hyperslab selection is especially important for doing parallel I/O in HDF5 (see the Parallel HDF5 tutorial)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 148

Regular Hyperslab

• A collection of regularly spaced, equal-sized blocks

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 149

Simple Hyperslab

• Contiguous subset or sub-array

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 150

Hyperslab Selection

• Result of a union operation on three simple hyperslabs

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 151

Hyperslab Description

• Start – starting location of a hyperslab (1,1)
• Stride – number of elements that separate each block (3,2)
• Count – number of blocks (2,6)
• Block – block size (2,1)
• Everything is “measured” in number of elements

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 152

Simple Hyperslab Description

• Two ways to describe a simple 4x6 hyperslab (see the sketch below)
  • As several blocks
    • Stride – (2,1)
    • Count – (2,6)
    • Block – (2,1)
  • As one block
    • Stride – (1,1)
    • Count – (1,1)
    • Block – (4,6)
• No performance penalty for one way or the other
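Both descriptions translate directly into H5Sselect_hyperslab calls. A sketch, assuming space_id is the 4x6 file dataspace (the variable names are illustrative):

hsize_t start[2] = {0, 0};

/* Description 1: 2 x 6 blocks of size 2 x 1, spaced 2 rows apart. */
hsize_t stride1[2] = {2, 1}, count1[2] = {2, 6}, block1[2] = {2, 1};
H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, stride1, count1, block1);

/* Description 2: the same region as one 4 x 6 block (NULL stride = 1). */
hsize_t count2[2] = {1, 1}, block2[2] = {4, 6};
H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, count2, block2);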

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 153

H5Sselect_hyperslab Function

• space_id  Identifier of the dataspace
• op        Selection operator: H5S_SELECT_SET or H5S_SELECT_OR
• start     Array with the starting coordinates of the hyperslab
• stride    Array specifying which positions along a dimension to select
• count     Array specifying how many blocks to select from the dataspace, in each dimension
• block     Array specifying the size of the element block (NULL indicates a block size of a single element in a dimension)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 154

Reading/Writing Selections

Programming model for reading from a dataset in a file:
1. Open the dataset.
2. Get the file dataspace handle of the dataset and specify the subset to read from.
   a. H5Dget_space returns the file dataspace handle
      • The file dataspace describes the array stored in the file (number of dimensions and their sizes).
   b. H5Sselect_hyperslab selects the elements of the array that participate in the I/O operation.
3. Allocate a data buffer of an appropriate shape and size.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 155

Reading/Writing Selections

Programming model (continued):
4. Create a memory dataspace and specify the subset to write to.
   a. The memory dataspace describes the data buffer (its rank and dimension sizes).
   b. Use the H5Screate_simple function to create the memory dataspace.
   c. Use H5Sselect_hyperslab to select the elements of the data buffer that participate in the I/O operation.
5. Issue H5Dread or H5Dwrite to move the data between the file and the memory buffer.
6. Close the file dataspace and memory dataspace when done.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 156

Example : Reading Two Rows

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

• Data in a file: 4x6 matrix
• Buffer in memory: 1-dim array of length 14

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 157

Example: Reading Two Rows

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

start  = {1,0}
count  = {2,6}
block  = {1,1}
stride = {1,1}

filespace = H5Dget_space (dataset);
H5Sselect_hyperslab (filespace, H5S_SELECT_SET,
                     start, NULL, count, NULL);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 158

Example: Reading Two Rows

-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

start[1] = {1}
count[1] = {12}
dim[1]   = {14}

memspace = H5Screate_simple(1, dim, NULL);
H5Sselect_hyperslab (memspace, H5S_SELECT_SET,
                     start, NULL, count, NULL);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 159

Example: Reading Two Rows

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

-1 7 8 9 10 11 12 13 14 15 16 17 18 -1

H5Dread (…, …, memspace, filespace, …, …);
(A complete sketch follows.)
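Putting the last three slides together, a hedged end-to-end sketch of the two-row read; file_id is assumed to be an open file, the dataset name "M" is illustrative, and H5Dopen uses the HDF5 1.8 three-argument form:

hid_t   dataset   = H5Dopen(file_id, "M", H5P_DEFAULT);   /* the 4x6 dataset  */
hid_t   filespace = H5Dget_space(dataset);

hsize_t fstart[2] = {1, 0}, fcount[2] = {2, 6};           /* rows 1 and 2     */
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, fstart, NULL, fcount, NULL);

int     buf[14] = {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1};
hsize_t mdim[1] = {14}, mstart[1] = {1}, mcount[1] = {12};
hid_t   memspace = H5Screate_simple(1, mdim, NULL);
H5Sselect_hyperslab(memspace, H5S_SELECT_SET, mstart, NULL, mcount, NULL);

H5Dread(dataset, H5T_NATIVE_INT, memspace, filespace, H5P_DEFAULT, buf);

H5Sclose(memspace);  H5Sclose(filespace);  H5Dclose(dataset);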

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 160

Things to Remember

• The number of elements selected in the file and in the memory buffer must be the same
  • H5Sget_select_npoints returns the number of selected elements in a hyperslab selection
• HDF5 partial I/O is tuned to move data between selections that have the same dimensionality; avoid choosing subsets that have different ranks (as in the example above)
• Allocate a buffer of an appropriate size when reading data; use H5Tget_native_type and H5Tget_size to get the correct size of the data element in memory.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 161

HDF5 Region References and Selections

• Need to select and access the same elements of a dataset

Saving Selected Region in a File

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 162

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 163

Reference Datatype

• Reference to an HDF5 object
  • Pointer to a group or a dataset in a file
  • Predefined datatype H5T_STD_REF_OBJ describes object references
• Reference to a dataset region (or to a selection)
  • Pointer to the dataspace selection
  • Predefined datatype H5T_STD_REF_DSETREG describes region references

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 164

Reference to Dataset Region

• REF_REG.h5 file layout:
  • Root group
    • “Matrix” dataset
    • “Region References” dataset
• Matrix values:
    1 1 2 3 3 4 5 5 6
    1 2 2 3 4 4 5 6 6

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 165

Reference to Dataset Region

Example

dsetr_id = H5Dcreate(file_id, "REGION_REFERENCES",
                     H5T_STD_REF_DSETREG, …);

H5Sselect_hyperslab(space_id, H5S_SELECT_SET, start, NULL, …);
H5Rcreate(&ref[0], file_id, "MATRIX", H5R_DATASET_REGION, space_id);

H5Dwrite(dsetr_id, H5T_STD_REF_DSETREG, H5S_ALL, H5S_ALL,
         H5P_DEFAULT, ref);
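Reading the references back later recovers both the dataset and the stored selection. A sketch assuming the HDF5 1.8 H5Rdereference and H5Rget_region signatures and the dsetr_id from the example above:

hdset_reg_ref_t ref_out[2];
H5Dread(dsetr_id, H5T_STD_REF_DSETREG, H5S_ALL, H5S_ALL,
        H5P_DEFAULT, ref_out);

/* Recover the referenced dataset and the stored dataspace selection. */
hid_t dset_id  = H5Rdereference(dsetr_id, H5R_DATASET_REGION, &ref_out[0]);
hid_t space_id = H5Rget_region(dsetr_id, H5R_DATASET_REGION, &ref_out[0]);

/* space_id can now be used as the file dataspace in an H5Dread call. */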

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 166

Reference to Dataset Region

HDF5 "REF_REG.h5" {
GROUP "/" {
   DATASET "MATRIX" {
      ……
   }
   DATASET "REGION_REFERENCES" {
      DATATYPE  H5T_REFERENCE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
         (0): DATASET /MATRIX {(0,3)-(1,5)},
         (1): DATASET /MATRIX {(0,0), (1,6), (0,8)}
      }
   }
}
}

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 167

Chunking in HDF5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 168

HDF5 Chunking

• Dataset data is divided into equally sized blocks (chunks).
• Each chunk is stored separately as a contiguous block in the HDF5 file.

[Figure: the dataset header (datatype, dataspace, attributes, …) and a chunk index live in the metadata cache in application memory; the chunk index maps chunks A, B, C, D to their separate locations in the file.]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 169

HDF5 Chunking

• Chunking is needed for
  • Enabling compression and other filters
  • Extendible datasets

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 170

HDF5 Chunking

• If used appropriately, chunking improves partial I/O for big datasets
• (In the pictured selection, only two chunks are involved in the I/O)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 171

HDF5 Chunking

• A chunk has the same rank as the dataset
• A chunk’s dimensions do not need to be factors of the dataset’s dimensions

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 172

Creating Chunked Dataset

1. Create a dataset creation property list.
2. Set the property list to use chunked storage layout.
3. Create the dataset with the above property list.

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 100;
H5Pset_chunk(dcpl_id, rank, ch_dims);
dset_id = H5Dcreate (…, dcpl_id);
H5Pclose(dcpl_id);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 173

Writing or Reading Chunked Dataset

1. The chunking mechanism is transparent to the application.
2. Use the same set of operations as for a contiguous dataset, for example:
     H5Dopen(…);
     H5Sselect_hyperslab (…);
     H5Dread(…);
3. Selections do not need to coincide precisely with the chunk boundaries.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 174

HDF5 Filters

• HDF5 filters modify data during I/O operations
• Available filters (see the combined sketch below):
  1. Checksum (H5Pset_fletcher32)
  2. Shuffling filter (H5Pset_shuffle)
  3. Data transformation (in 1.8.*)
  4. Compression
     • Scale + offset (in 1.8.*)
     • N-bit (in 1.8.*)
     • GZIP (deflate), SZIP (H5Pset_deflate, H5Pset_szip)
     • User-defined filters (e.g., BZIP2)
• An example of a user-defined compression filter can be found at http://www.hdfgroup.uiuc.edu/papers/papers/bzip2/
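Several of these filters can be combined on a single dataset creation property list; the order of the calls sets the pipeline order. A sketch (the chunk size is illustrative and the list would then be passed to H5Dcreate as on the next slide):

hid_t   dcpl_id    = H5Pcreate(H5P_DATASET_CREATE);
hsize_t ch_dims[2] = {100, 100};

H5Pset_chunk(dcpl_id, 2, ch_dims);   /* filters require chunked layout        */
H5Pset_shuffle(dcpl_id);             /* byte shuffle to help compression      */
H5Pset_deflate(dcpl_id, 6);          /* GZIP, compression level 6             */
H5Pset_fletcher32(dcpl_id);          /* checksum for error detection          */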

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 175

Creating Compressed Dataset

1. Create a dataset creation property list
2. Set the property list to use chunked storage layout
3. Set the property list to use filters
4. Create the dataset with the above property list

crp_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 100;
H5Pset_chunk(crp_id, rank, ch_dims);
H5Pset_deflate(crp_id, 9);
dset_id = H5Dcreate (…, crp_id);
H5Pclose(crp_id);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 176

Writing Compressed Dataset

• The default chunk cache size is 1 MB.
• Filters, including compression, are applied when a chunk is evicted from the cache.
• Chunks in the file may have different sizes.

[Diagram: chunks A-C of the chunked dataset pass through the per-dataset chunk cache and the filter pipeline before being written to the file.]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 177

Chunking Basics to Remember

• Chunking creates storage overhead in the file.
• Performance is affected by:
  • Chunking and compression parameters
  • Chunk cache size (H5Pset_cache call)
• Some hints for getting better performance:
  • Use a chunk size not smaller than the file system block size (4 KB).
  • Use a compression method appropriate for your data.
  • Avoid selections that do not coincide with chunk boundaries.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 178

Example

Creates a compressed 1000x20 integer dataset in a file

% h5dump -p -H zip.h5

HDF5 "zip.h5" {
GROUP "/" {
   GROUP "Data" {
      DATASET "Compressed_Data" {
         DATATYPE  H5T_STD_I32BE
         DATASPACE SIMPLE { ( 1000, 20 ) ...
         STORAGE_LAYOUT {
            CHUNKED ( 20, 20 )
            SIZE 5316
         }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 179

Example (continued)

         FILTERS {
            COMPRESSION DEFLATE { LEVEL 6 }
         }
         FILLVALUE {
            FILL_TIME H5D_FILL_TIME_IFSET
            VALUE 0
         }
         ALLOCATION_TIME {
            H5D_ALLOC_TIME_INCR
         }
      }
   }
}
}

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 180

Example (bigger chunk)

Creates a compressed 1000x20 integer dataset in a file; a better compression ratio is achieved with the larger chunk.

% h5dump -p -H zip.h5

HDF5 "zip.h5" {
GROUP "/" {
   GROUP "Data" {
      DATASET "Compressed_Data" {
         DATATYPE  H5T_STD_I32BE
         DATASPACE SIMPLE { ( 1000, 20 ) ...
         STORAGE_LAYOUT {
            CHUNKED ( 200, 20 )
            SIZE 2936
         }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 181

Part III: Performance Issues (How to Do It Right)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 182

Performance of Serial I/O Operations

• The next slides show the performance effects of using different access patterns and storage layouts.
• We use three test cases, each of which writes a selection to an array of characters.
• Data is stored in row-major order.
• Tests were executed on a THG Linux x86_64 box using h5perf_serial and HDF5 version 1.8.0.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 183

Serial Benchmarking Tool

• Benchmarking tool, h5perf_serial, publicly released with HDF5 1.8.1
• Features include:
  • Support for POSIX and HDF5 I/O calls.
  • Support for datasets and buffers with multiple dimensions.
  • Entire dataset access using a single I/O operation or several.
  • Selection of contiguous or chunked storage for HDF5 operations.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 184

Contiguous Storage (Case 1)

• Rectangular dataset of size 48K x 48K, with write selections of 512 x 48K.
• HDF5 storage layout is contiguous.
• Good I/O pattern for POSIX and HDF5 because each selection is contiguous.
• POSIX: 5.19 MB/s
• HDF5: 5.36 MB/s

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 185

Contiguous Storage (Case 2)

• Rectangular dataset of 48K x 48K, with write selections of 48K x 512.
• HDF5 storage layout is contiguous.
• Bad I/O pattern for POSIX and HDF5 because each selection is noncontiguous.
• POSIX: 1.24 MB/s
• HDF5: 0.05 MB/s

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 186

Chunked Storage

• Rectangular dataset of 48K x 48K, with write selections of 48K x 512.
• HDF5 storage layout is chunked. Chunk and selection sizes are equal.
• Bad I/O case for POSIX because selections are noncontiguous.
• Good I/O case for HDF5 since selections are contiguous thanks to the chunked layout.
• POSIX: 1.51 MB/s
• HDF5: 5.58 MB/s

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 187

Conclusions

• Access patterns with many small I/O operations incur latency and overhead costs over and over.
• Chunked storage may improve I/O performance by making each selection contiguous in the file.

Writing Chunked Dataset

• 1000x100x100 dataset
• 4-byte integers
• Random values 0-99
• 50x100x100 chunks (20 total)
• Chunk size: 2 MB
• Write the entire dataset using 1x100x100 slices
• Slices are written sequentially (see the sketch below)
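A minimal sketch of the slice-by-slice write loop described above; dset_id and the application buffer buf are assumed to already exist, and error checking is omitted.

hsize_t slice[3]  = {1, 100, 100};
hsize_t offset[3] = {0, 0, 0};
hid_t   filespace = H5Dget_space(dset_id);
hid_t   memspace  = H5Screate_simple(3, slice, NULL);
int     i;

for (i = 0; i < 1000; i++) {
    offset[0] = i;   /* select the next 1x100x100 slice in the file */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, slice, NULL);
    H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, H5P_DEFAULT, buf);
}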

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 188

Test Setup

• 20 chunks
• 1000 slices
• Chunk size is 2 MB

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 189

Test Setup (continued)

• Tests performed with 1 MB and 5 MB chunk cache sizes
• Cache size set with the H5Pset_cache function:

  H5Pget_cache(fapl, NULL, &rdcc_nelmts, &rdcc_nbytes, &rdcc_w0);
  H5Pset_cache(fapl, 0, rdcc_nelmts, 5*1024*1024, rdcc_w0);

• Tests performed with no compression and with gzip (deflate) compression

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 190

Effect of Chunk Cache Size on Write

• No compression:

  Cache size       I/O operations   Total data written   File size
  1 MB (default)   1002             75.54 MB             38.15 MB
  5 MB             22               38.16 MB             38.15 MB

• Gzip compression:

  Cache size       I/O operations   Total data written           File size
  1 MB (default)   1982             335.42 MB (322.34 MB read)   13.08 MB
  5 MB             22               13.08 MB                     13.08 MB

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 191

Effect of Chunk Cache Size on Write

• With the 1 MB cache size, a chunk will not fit into the cache.
  • All writes to the dataset must be immediately written to disk.
  • With compression, the entire chunk must be read and rewritten every time a part of the chunk is written to.
    • Data must also be decompressed and recompressed each time.
    • Non-sequential writes could result in a larger file.
  • Without compression, the entire chunk must be written when it is first written to the file.
    • If the selection were not contiguous on disk, it could require as much as one I/O operation for each element.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 192

Effect of Chunk Cache Size on Write

• With the 5 MB cache size, the chunk is written only after it is full.
  • Drastically reduces the number of I/O operations.
  • Reduces the amount of data that must be written (and read).
  • Reduces processing time, especially with the compression filter.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 193

Conclusion

• It is important to make sure that a chunk will fit into the raw data chunk cache.
• If you will be writing to multiple chunks at once, increase the cache size even more.
• Try to design chunk dimensions to minimize the number of chunks you will be writing to at once.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 194

Reading Chunked Dataset

• Read the same dataset, again by slices, but the slices cross through all the chunks.
• Two orientations for the read plane:
  • The plane includes the fastest changing dimension.
  • The plane does not include the fastest changing dimension.
• Measure total read operations and total size read.
• Chunk sizes of 50x100x100 and 10x100x100.
• 1 MB cache.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 195

• Chunks

• Read slices• Vertical and horizontal

Test Setup

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 196

Results

• Read slice includes the fastest changing dimension

  Chunk size   Compression   I/O operations   Total data read
  50           Yes           2010             1307 MB
  10           Yes           10012            1308 MB
  50           No            100010           38 MB
  10           No            10012            3814 MB

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 197

Results (continued)

• Read slice does not include the fastest changing dimension

  Chunk size   Compression   I/O operations   Total data read
  50           Yes           2010             1307 MB
  10           Yes           10012            1308 MB
  50           No            10000010         38 MB
  10           No            10012            3814 MB

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 198

Effect of Cache Size on Read

• When compression is enabled, the library must always read each entire chunk once for each call to H5Dread.
• When compression is disabled, the library's behavior depends on the cache size relative to the chunk size:
  • If the chunk fits in the cache, the library reads each entire chunk once for each call to H5Dread.
  • If the chunk does not fit in the cache, the library reads only the data that is selected.
    • More read operations, especially if the read plane does not include the fastest changing dimension.
    • Less total data read.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 199

Conclusion

• In this case, cache size does not matter when reading if compression is enabled.
• Without compression, a larger cache may not be beneficial unless the cache is large enough to hold all of the chunks.
• The optimum cache size depends on the exact shape of the data, as well as the hardware.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 200

Hints for Chunk Settings

• Chunk dimensions should align as closely as possible with the hyperslab dimensions used for read/write.
• Chunk cache size (rdcc_nbytes) should be large enough to hold all the chunks in the selection.
  • If this is not possible, it may be best to disable chunk caching altogether (set rdcc_nbytes to 0).
• rdcc_nelmts should be a prime number that is at least 10 to 100 times the number of chunks that can fit into rdcc_nbytes.
• rdcc_w0 should be set to 1 if chunks that have been fully read/written will never be read/written again (see the sketch below).
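A minimal sketch of applying these hints through the file access property list, which controls the chunk cache for every dataset in the file; the specific sizes here are illustrative assumptions, not recommendations for any particular dataset.

int    mdc_nelmts;
size_t rdcc_nelmts, rdcc_nbytes;
double rdcc_w0;

fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pget_cache(fapl_id, &mdc_nelmts, &rdcc_nelmts, &rdcc_nbytes, &rdcc_w0);

rdcc_nbytes = 16*1024*1024;   /* large enough to hold all chunks in a selection */
rdcc_nelmts = 1009;           /* a prime, 10-100x the number of chunks that fit */
rdcc_w0     = 1.0;            /* evict fully read/written chunks first */

H5Pset_cache(fapl_id, mdc_nelmts, rdcc_nelmts, rdcc_nbytes, rdcc_w0);
file_id = H5Fopen("zip.h5", H5F_ACC_RDWR, fapl_id);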

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 201

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 202

Part IV: Performance Benefits of HDF5 version 1.8

What Did We Do in HDF5 1.8?

• Extended File Format Specification • Reviewed group implementations• Introduced new link object• Revamped metadata cache implementation• Improved handling of datasets and datatypes• Introduced shared object header message• Extended error handling• Enhanced backward/forward APIs and file format

compatibility

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 203

What Did We Do in HDF5 1.8?

And much more good stuff to make HDF5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 204

•Better and Faster

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 205

HDF5 File Format Extension

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 206

HDF5 File Format Extension

• Why:
  • Address deficiencies of the original file format
  • Reduce space overhead in an HDF5 file
  • Enable new features
• What:
  • A new routine that instructs the HDF5 library to create all objects using the latest version of the HDF5 file format (compare with the earliest version in which an object became available, for example, the array datatype)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 207

HDF5 File Format Extension

Example

/* Use the latest version of the file format for each object created in a file */

fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
fid = H5Fcreate(..., ..., ..., fapl_id);
/* or */
fid = H5Fopen(..., ..., fapl_id);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 208

Group Revisions

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 209

Better Large Group Storage

• Why: • Faster, more scalable storage and access for large

groups• What:

• New format and method for storing groups with many links

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 210

Informal Benchmark

• Create a file and a group in the file
• Create up to 10^6 groups, with one dataset in each group (see the sketch below)
• Compare file sizes and performance of HDF5 1.8.1 using the latest group format with HDF5 1.8.1 (default, old format) and 1.6.7
• Note: default 1.8.1 and 1.6.7 became very slow after 700,000 groups
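A minimal sketch of such a benchmark loop, assuming the HDF5 1.8 API (H5Gcreate2/H5Dcreate2) and illustrative object names; timing and error checking are omitted.

hid_t   fapl_id, file_id, grp_id, space_id, dset_id;
hsize_t dims[1] = {10};
char    name[32];
int     i;

fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);  /* new group format */
file_id  = H5Fcreate("groups.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);
space_id = H5Screate_simple(1, dims, NULL);

for (i = 0; i < 1000000; i++) {
    sprintf(name, "group_%d", i);
    grp_id  = H5Gcreate2(file_id, name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    dset_id = H5Dcreate2(grp_id, "dset", H5T_NATIVE_INT, space_id,
                         H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dclose(dset_id);
    H5Gclose(grp_id);
}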

Time to Open and Read a Dataset

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 211

[Chart: time to open and read a dataset (milliseconds, log scale 0.1-1000) vs. number of groups (10,000 to 1,000,000) for HDF5 1.6, 1.8 (old groups), and 1.8 (new groups).]

File Size

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 212

[Chart: file size (kilobytes, 0-1,000,000) vs. number of groups (0-800,000) for 1.8 (old groups) and 1.8 (new groups).]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 213

Questions?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 214

Data Storage and I/O in HDF5

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 215

Software stack

• Life cycle: What happens to data when it is transferred from application buffer to HDF5 file and from HDF5 file to application buffer?

• File or other “storage”

• Virtual file I/O

• Library internals

• Object API

• Application • Data buffer

• H5Dwrite

• ?

• Unbuffered I/O

• Data in a file

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 216

Goals

• Understanding of what is happening to data inside the HDF5 library will help to write efficient applications

• Goals of this talk:• Describe some basic operations and data

structures, and explain how they affect performance and storage sizes

• Give some “recipes” for how to improve performance

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 217

Topics

• Dataset metadata and array data storage layouts• Types of dataset storage layouts• Factors affecting I/O performance

• I/O with compact datasets• I/O with contiguous datasets• I/O with chunked datasets• Variable length data and I/O

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 218

HDF5 dataset metadata and array data storage

layouts

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 219

HDF5 Dataset

• Data array• Ordered collection of identically typed data items

distinguished by their indices

• Metadata• Dataspace: Rank, dimensions of dataset array• Datatype: Information on how to interpret data• Storage Properties: How array is organized on

disk• Attributes: User-defined metadata (optional)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 220

HDF5 Dataset

• Dataset data• Metadata• Dataspace

• 3

• Rank

• Dim_2 = 5

• Dim_1 = 4

• Dimensions

• Time = 32.4

• Pressure = 987

• Temp = 56

• Attributes

• Chunked

• Compressed

• Dim_3 = 7

• Storage info

• IEEE 32-bit float

• Datatype

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 221

Metadata cache and dataset data

• Dataset data typically kept in application memory• Dataset header in separate space – metadata cache

• Application memory

• Metadata cache

• File •

• Dataset data

• Dataset header • Dataset data

• Dataset header• ………….• Datatype

• Dataspace• ………….• Attributes

• …

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 222

Metadata and metadata cache

• HDF5 metadata• Information about HDF5 objects used by the library• Examples: object headers, B-tree nodes for group,

B-Tree nodes for chunks, heaps, super-block, etc. • Usually small compared to raw data sizes (KB vs.

MB-GB)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 223

Metadata and metadata cache

• Metadata cache• Space allocated to handle pieces of the HDF5

metadata • Allocated by the HDF5 library in application’s

memory space• Cache behavior affects overall performance• Metadata cache implementation prior to HDF5

1.6.5 could cause performance degradation for some applications

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 224

Types of data storage layouts

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 225

HDF5 datasets storage layouts

• Contiguous• Chunked• Compact

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 226

Contiguous storage layout

• Metadata header separate from dataset data• Data stored in one contiguous block in HDF5 file

• Application memory

• Metadata cache• Dataset header

• ………….• Datatype

• Dataspace• ………….• Attributes

• …

• File •

• Dataset data

• Dataset data

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 227

Chunked storage

• Chunking – storage layout where a dataset is partitioned in fixed-size multi-dimensional tiles or chunks

• Used for extendible datasets and datasets with filters applied (checksum, compression)

• HDF5 library treats each chunk as atomic object• Greatly affects performance and file sizes

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 228

Chunked storage layout

• Dataset data divided into equal sized blocks (chunks)• Each chunk stored separately as a contiguous block in

HDF5 file

• Application memory

• Metadata cache• Dataset header

• ………….• Datatype

• Dataspace• ………….• Attributes

• …

• File

• Dataset data

• A • D• C • B• header• Chunk

index

• Chunkindex

• A • B • C • D

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 229

Compact storage layout

• Dataset data and metadata stored together in the object header

• File

• Application memory

• Dataset header• ………….• Datatype

• Dataspace• ………….• Attributes

• …

• Metadata cache • Dataset data

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 230

Factors affecting I/O performance

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 231

What goes on inside the library?

• Operations on data inside the library• Copying to/from internal buffers• Datatype conversion• Scattering - gathering • Data transformation (filters, compression)

• Data structures used• B-trees (groups, dataset chunks)• Hash tables• Local and Global heaps (variable length data: link names, strings,

etc.)• Other concepts

• HDF5 metadata, metadata cache• Chunking, chunk cache

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 232

Operations on data inside the library

• Copying to/from internal buffers• Datatype conversion, such as

• float integer• Little-endian big-endian• 64-bit integer to 16-bit integer

• Scattering - gathering • Data is scattered/gathered from/to application buffers

into internal buffers for datatype conversion and partial I/O

• Data transformation (filters, compression)• Checksum on raw data and metadata (in 1.8.0)• Algebraic transform• GZIP and SZIP compressions• User-defined filters

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 233

I/O performance

• I/O performance depends on • Storage layouts• Dataset storage properties• Chunking strategy• Metadata cache performance• Datatype conversion performance• Other filters, such as compression• Access patterns

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 234

I/O with different storage layouts

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 235

Writing a compact dataset

• Application memory

• Dataset header• ………….• Datatype

• Dataspace• ………….• Attributes

• …

• File

• Metadata cache

• Dataset data

• One write to store header and dataset data

• Dataset data

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 236

Writing contiguous dataset – no conversion

• Application memory

• Metadata cache• Dataset header

• ………….• Datatype

• Dataspace• ………….• Attributes

• …

• File •

• Dataset data

• No sub-setting in memory or a file is performed

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 237

Writing a contiguous dataset with datatype conversion

• Dataset header• ………….• Datatype

• Dataspace• ………….• Attribute 1• Attribute 2• ………… • Application memory

• Metadata cache

• File

• Conversion buffer 1MB

• Dataset data

• No sub-setting in memory or a file is performed

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 238

Partial I/O with contiguous datasets

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 239

Writing whole dataset – contiguous rows

• File

• Application data in memory

• Data is contiguous in a file

• One I/O operation

• M rows

• M

• N

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 240

Sub-setting of contiguous datasetSeries of adjacent rows

• File

• N

• Application data in memory

• Subset – contiguous in a file

• One I/O operation

• M rows

• M

• Entire dataset – contiguous in a file

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 241

Sub-setting of contiguous datasetAdjacent, partial rows

• File

• N

• M

• …

• Application data in memory

• Data is scattered in a file in M contiguous blocks

• Several small I/O operation

• N elements

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 242

Sub-setting of contiguous datasetExtreme case: writing a column

• N

• M

• Application data in memory

• Subset data is scattered in a file in M different locations

• Several small I/O operation

• …

• 1 element

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 243

Sub-setting of contiguous datasetData sieve buffer

• File

• N

• M

• …

• Application data in memory

• Data is scattered in a file

• 1 element

• Data in a sieve buffer (64K) in memory

• memcopy

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 244

Performance tuning for contiguous dataset

• Datatype conversion
  • Avoid it for better performance.
  • Use the H5Pset_buffer function to customize the conversion buffer size.
• Partial I/O
  • Write/read in big contiguous blocks.
  • Use H5Pset_sieve_buf_size to improve performance for complex subsetting.
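A minimal sketch of the two tuning calls mentioned above; the buffer sizes are illustrative assumptions to be adjusted for the application.

/* Conversion buffer: set on a dataset transfer property list */
dxpl_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_buffer(dxpl_id, 8*1024*1024, NULL, NULL);   /* 8 MB type-conversion buffer */

/* Sieve buffer: set on the file access property list before opening the file */
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_sieve_buf_size(fapl_id, 1024*1024);         /* 1 MB sieve buffer */

/* use fapl_id in H5Fopen/H5Fcreate and dxpl_id in H5Dread/H5Dwrite calls */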

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 245

I/O with Chunking

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 246

Chunked storage layout

• Raw data divided into equal sized blocks (chunks)• Each chunk stored separately as a contiguous block

in a file

• Application memory

• Metadata cache• Dataset header

• ………….• Datatype

• Dataspace• ………….• Attributes

• …

• File

• Dataset data

• A • D• C • B• header

• Chunkindex

• Chunkindex

• A • B • C • D

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 247

Information about chunking

• HDF5 library treats each chunk as atomic object• Compression and other filters are applied to each chunk• Datatype conversion is performed on each chunk

• Chunk size greatly affects performance• Chunk overhead adds to file size• Chunk processing involves many steps

• Chunk cache• Caches chunks for better performance• Size of chunk cache is set for file (default size 1MB)• Each chunked dataset has its own chunk cache• Chunk may be too big to fit into cache• Memory may grow if application keeps opening datasets

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 248

Chunk cache

• Dataset_1 header

• …………

• Application memory

• Metadata cache

• Chunking B-tree nodes• Chunk cache

• Default size is 1MB• Dataset_N header

• …………

• ………

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 249

Writing chunked dataset

• C • B• A

• …………..

• Filters including compression are applied when chunk is evicted from cache

• A• B • C

• C

• File

• Chunk cache• Chunked dataset

• Filter pipeline

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 250

Partial I/O with Chunking

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 251

Partial I/O for chunked dataset

Example: write the green subset from the dataset , converting the data

Dataset is stored as six chunks in the file. The subset spans four chunks, numbered 1-4 in the figure. Hence four chunks must be written to the file. But first, the four chunks must be read from the file, to preserve

those parts of each chunk that are not to be overwritten.

• 1 • 2

• 3 • 4

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 252

Partial I/O for chunked dataset

• For each of four chunks on writing:• Read chunk from file into chunk

cache, unless it’s already there• Determine which part of the chunk will

be replaced by the selection• Move those elements from application

buffer to conversion buffer • Perform conversion• Replace that part of the chunk in the

cache with the corresponding elements from the conversion buffer

• Apply filters (compression) when chunk is flushed from chunk cache

• For each element 3 (or more) memcopy operations are performed

• 1 • 2

• 3 • 4

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 253

Partial I/O for chunked dataset

• 3

• Application memory

• conversion buffer

• Application buffer

• Chunk

• Elements participating in I/O are gathered into corresponding chunk• after going through conversion buffer

• Chunk cache

• 3

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 254

Partial I/O for chunked dataset

• 3 • Conversion buffer

• Application memory

• Chunk cache

• File • Chunk

• Apply filters and write to file

• Application buffer

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 255

Variable length data and I/O

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 256

Examples of variable length data

• String A[0] “the first string we want to write”

…………………………………

A[N-1] “the N-th string we want to write”• Each element is a record of variable-length

A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]

A[1] (0,0,110,2005) [length = 4]

………………………..

A[N] (1,2,3,4,5,6,7,8,9,10,11,12,….,M) [length = M]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 257

Variable length data in HDF5

• Variable length data are described in an HDF5 application by:

  typedef struct {
      size_t len;   /* length of the element */
      void   *p;    /* pointer to the element's data */
  } hvl_t;

• The base type can be any HDF5 type: H5Tvlen_create(base_type)
• ~20 bytes of overhead for each element
• The data cannot be compressed
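A minimal sketch of writing a variable length integer dataset with this type; file_id is assumed to exist, the record lengths are illustrative, and a read would be followed by H5Dvlen_reclaim to free the library-allocated buffers.

hvl_t   wdata[2];
int     a0[3] = {1, 2, 3}, a1[5] = {10, 20, 30, 40, 50};
hsize_t dims[1] = {2};

wdata[0].len = 3;  wdata[0].p = a0;
wdata[1].len = 5;  wdata[1].p = a1;

vltype   = H5Tvlen_create(H5T_NATIVE_INT);          /* base type: native int */
space_id = H5Screate_simple(1, dims, NULL);
dset_id  = H5Dcreate(file_id, "vl_data", vltype, space_id, H5P_DEFAULT);
H5Dwrite(dset_id, vltype, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);

/* after H5Dread of VL data: H5Dvlen_reclaim(vltype, space_id, H5P_DEFAULT, rdata); */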

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 258

Variable length data storage in HDF5

• Global hea

p

• Actual variable length data

• Dataset withvariable length elements

• Pointer intoglobal heap

• File

• Dataset header

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 259

Variable length datasets and I/O

• When writing variable length data, elements in application buffer always go through conversion and are copied to the global heaps in a metadata cache before ending in a file

• Global heap

• Application buffer

• Metadata cache

• Raw VL data

• conversion buffer

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 260

There may be more than one global heap

• Global

heap

• Raw data

• Global

heap• Metadata cache

• Application memory

• Raw VL data

• Raw VL data

• Application buffer• Conversion buffer

• Raw VL data

• On a write request, VL data goes through conversion and is written to • a global heap; elements of the same dataset may be written to different heaps.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 261

Variable length datasets and I/O

• File

• Global

heap

• Raw data

• Global

heap• Metadata cache

• Application memory

• Raw VL data

• Raw VL data

• Application buffer• Conversion buffer

• Raw VL data

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 262

VL chunked dataset in a file

• File

• Dataset header

• Chunk B-tree

• Dataset chunks• Heaps with VL data

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 263

Writing chunked VL datasets

• Dataset header

• …………

• Application memory• Metadata cache • B-tree nodes

• Chunk cache

• ………

• Conversion buffer• VL• data

• Raw data

• Global heap

• Chunk cache

• Data in applicati

on buffers

• File

• Filter pipeline

• hvl_t pointers • 1

• 2

• 3 • 4

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 264

Hints for variable length data I/O

• Avoid closing/opening a file while writing VL datasets.
  • Global heap information is lost.
  • Global heaps may end up with unused space.
• Avoid alternately writing different VL datasets.
  • Data from different datasets will go into the same heap.
• If the maximum length of the record is known, consider using fixed-length records and compression (see the sketch below).
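A minimal sketch of the fixed-length alternative, assuming string records that never exceed an illustrative MAX_LEN; the padding introduced by the fixed size typically compresses away.

#define MAX_LEN 64
strtype = H5Tcopy(H5T_C_S1);
H5Tset_size(strtype, MAX_LEN);          /* fixed-length, padded string records */

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
ch_dims[0] = 1024;
H5Pset_chunk(dcpl_id, 1, ch_dims);
H5Pset_deflate(dcpl_id, 6);             /* compression recovers the padding overhead */

dset_id = H5Dcreate(file_id, "records", strtype, space_id, dcpl_id);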

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 265

Questions?

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 266

Parallel HDF5Tutorial

Albert Cheng

The HDF Group

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 267

Parallel HDF5Introductory Tutorial

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 268

Outline

• Overview of Parallel HDF5 design• Setting up parallel environment• Programming model for

• Creating and accessing a File• Creating and accessing a Dataset• Writing and reading Hyperslabs

• Parallel tutorial available at• http://www.hdfgroup.org/HDF5/Tutor/

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 269

Overview of Parallel HDF5 Design

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 270

PHDF5 Requirements

• Support MPI programming• PHDF5 files compatible with serial HDF5 files

• Shareable between different serial or parallel platforms

• Single file image to all processes• One file per process design is undesirable

• Expensive post processing• Not usable by different number of processes

• Standard parallel I/O interface• Must be portable to different platforms

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 271

PHDF5 Implementation Layers

Application

Parallel computing system (Linux cluster)Compute

node

I/O library (HDF5)

Parallel I/O library (MPI-I/O)

Parallel file system (GPFS)

Switch network/I/O servers

Computenode

Computenode

Computenode

Disk architecture & layout of data on disk

PHDF5 built on top of standard MPI-IO API

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 272

Parallel Environment Requirements

• MPI with MPI-IO. E.g.,• MPICH2 ROMIO• Vendor’s MPI-IO

• POSIX compliant parallel file system. E.g.,• GPFS• Lustre

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 273

MPI-IO vs. HDF5

• MPI-IO is an Input/Output API.• It treats the data file as a “linear byte stream”

and each MPI application needs to provide its own file view and data representations to interpret those bytes.

• All data stored are machine dependent except the “external32” representation.

• External32 is defined in Big Endianness• Little-endian machines have to do the data

conversion in both read or write operations.• 64bit sized data types may lose information.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 274

MPI-IO vs. HDF5 Cont.

• HDF5 is a data management software.• It stores the data and metadata according to

the HDF5 data format definition.• HDF5 file is self-described.• Each machine can store the data in its own

native representation for efficient I/O without loss of data precision.

• Any necessary data representation conversion is done by the HDF5 library automatically.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 275

How to Compile PHDF5 Applications

• h5pcc – HDF5 C compiler command• Similar to mpicc

• h5pfc – HDF5 F90 compiler command• Similar to mpif90

• To compile:• % h5pcc h5prog.c• % h5pfc h5prog.f90

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 276

h5pcc/h5pfc -show option

• -show displays the compiler commands and options without executing them (a dry run):

% h5pcc -show Sample_mpio.c
mpicc -I/home/packages/phdf5/include \
  -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE \
  -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE \
  -D_BSD_SOURCE -std=c99 -c Sample_mpio.c

mpicc -std=c99 Sample_mpio.o \
  -L/home/packages/phdf5/lib \
  /home/packages/phdf5/lib/libhdf5_hl.a \
  /home/packages/phdf5/lib/libhdf5.a -lz -lm \
  -Wl,-rpath -Wl,/home/packages/phdf5/lib

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 277

Collective vs. Independent Calls

• MPI definition of collective call• All processes of the communicator must

participate in the right order. E.g.,• Process1 Process2• call A(); call B(); call A(); call B(); **right**• call A(); call B(); call B(); call A(); **wrong**

• Independent means not collective• Collective is not necessarily synchronous

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 278

Programming Restrictions

• Most PHDF5 APIs are collective• PHDF5 opens a parallel file with a communicator

• Returns a file-handle• Future access to the file via the file-handle• All processes must participate in collective PHDF5

APIs• Different files can be opened via different

communicators

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 279

Examples of PHDF5 API

• Examples of PHDF5 collective API• File operations: H5Fcreate, H5Fopen, H5Fclose• Objects creation: H5Dcreate, H5Dopen, H5Dclose• Objects structure: H5Dextend (increase dimension

sizes)• Array data transfer can be collective or

independent• Dataset operations: H5Dwrite, H5Dread• Collectiveness is indicated by function parameters,

not by function names as in MPI API

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 280

What Does PHDF5 Support ?

• After a file is opened by the processes of a communicator• All parts of file are accessible by all processes• All objects in the file are accessible by all

processes• Multiple processes may write to the same data

array• Each process may write to individual data array

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 281

PHDF5 API Languages

• C and F90 language interfaces
• Platforms supported:
  • Most platforms with MPI-IO support, e.g., IBM SP, Linux clusters, SGI Altix, Cray XT3, ...

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 282

Programming model for creating and accessing a file

• HDF5 uses access template object (property list) to control the file access mechanism

• General model to access HDF5 file in parallel:• Setup MPI-IO access template (access

property list)• Open File • Access Data• Close File

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 283

Setup MPI-IO access template

Each process of the MPI communicator creates anaccess template and sets it up with MPI parallel access informationC:

herr_t H5Pset_fapl_mpio(hid_t plist_id, MPI_Comm comm, MPI_Info info);

F90:

h5pset_fapl_mpio_f(plist_id, comm, info) integer(hid_t) :: plist_id integer :: comm, info

plist_id is a file access property list identifier

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 284

C Example Parallel File Create

23    comm = MPI_COMM_WORLD;
24    info = MPI_INFO_NULL;
26    /*
27     * Initialize MPI
28     */
29    MPI_Init(&argc, &argv);
30    /*
34     * Set up file access property list for MPI-IO access
35     */
->36  plist_id = H5Pcreate(H5P_FILE_ACCESS);
->37  H5Pset_fapl_mpio(plist_id, comm, info);
38
->42  file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
49    /*
50     * Close the file.
51     */
52    H5Fclose(file_id);
54    MPI_Finalize();

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 285

F90 Example Parallel File Create

23    comm = MPI_COMM_WORLD
24    info = MPI_INFO_NULL
26    CALL MPI_INIT(mpierror)
29    !
30    ! Initialize FORTRAN predefined datatypes
32    CALL h5open_f(error)
34    !
35    ! Setup file access property list for MPI-IO access.
->37  CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error)
->38  CALL h5pset_fapl_mpio_f(plist_id, comm, info, error)
40    !
41    ! Create the file collectively.
->43  CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
45    !
46    ! Close the file.
49    CALL h5fclose_f(file_id, error)
51    !
52    ! Close FORTRAN interface
54    CALL h5close_f(error)
56    CALL MPI_FINALIZE(mpierror)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 286

Creating and Opening Dataset

• All processes of the communicator open/close a dataset by a collective callC: H5Dcreate or H5Dopen; H5DcloseF90: h5dcreate_f or h5dopen_f; h5dclose_f

• All processes of the communicator must extend an unlimited dimension dataset before writing to itC: H5DextendF90: h5dextend_f

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 287

C Example: Create Dataset

56    file_id = H5Fcreate(...);
57    /*
58     * Create the dataspace for the dataset.
59     */
60    dimsf[0] = NX;
61    dimsf[1] = NY;
62    filespace = H5Screate_simple(RANK, dimsf, NULL);
63
64    /*
65     * Create the dataset with default properties, collectively.
66     */
->67  dset_id = H5Dcreate(file_id, "dataset1", H5T_NATIVE_INT,
68                        filespace, H5P_DEFAULT);

70    H5Dclose(dset_id);
71    /*
72     * Close the file.
73     */
74    H5Fclose(file_id);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 288

F90 Example: Create Dataset

43    CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
73    CALL h5screate_simple_f(rank, dimsf, filespace, error)
76    !
77    ! Create the dataset with default properties.
78    !
->79  CALL h5dcreate_f(file_id, "dataset1", H5T_NATIVE_INTEGER, filespace, dset_id, error)
90    !
91    ! Close the dataset.
92    CALL h5dclose_f(dset_id, error)
93    !
94    ! Close the file.
95    CALL h5fclose_f(file_id, error)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 289

Accessing a Dataset

• All processes that have opened dataset may do collective I/O

• Each process may do independent and arbitrary number of data I/O access calls • C: H5Dwrite and H5Dread• F90: h5dwrite_f and h5dread_f

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 290

Programming model for dataset access

• Create and set dataset transfer property• C: H5Pset_dxpl_mpio

• H5FD_MPIO_COLLECTIVE• H5FD_MPIO_INDEPENDENT (default)

• F90: h5pset_dxpl_mpio_f• H5FD_MPIO_COLLECTIVE_F• H5FD_MPIO_INDEPENDENT_F (default)

• Access dataset with the defined transfer property

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 291

C Example: Collective write

95    /*
96     * Create property list for collective dataset write.
97     */
98    plist_id = H5Pcreate(H5P_DATASET_XFER);
->99  H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
100
101   status = H5Dwrite(dset_id, H5T_NATIVE_INT,
102                     memspace, filespace, plist_id, data);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 292

F90 Example: Collective write

88    ! Create property list for collective dataset write
89    !
90    CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
->91  CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
92
93    !
94    ! Write the dataset collectively.
95    !
96    CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, error, &
                      file_space_id = filespace, &
                      mem_space_id = memspace, &
                      xfer_prp = plist_id)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 293

Writing and Reading Hyperslabs

• Distributed memory model: data is split among processes

• PHDF5 uses HDF5 hyperslab model• Each process defines memory and file hyperslabs• Each process executes partial write/read call

• Collective calls• Independent calls

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 294

Set up the Hyperslab for Read/Write

H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
                    offset, stride, count, block)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 295

P0

P1File

Example 1: Writing dataset by rows

P2

P3

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 296

Writing by rows: Output of h5dump

HDF5 "SDS_row.h5" {GROUP "/" { DATASET "IntArray" { DATATYPE H5T_STD_I32BE DATASPACE SIMPLE { ( 8, 5 ) / ( 8, 5 ) } DATA { 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13 } } } }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 297

Memory File

Example 1: Writing dataset by rows

count[0] = dimsf[0]/mpi_sizecount[1] = dimsf[1];offset[0] = mpi_rank * count[0]; /* = 2 */offset[1] = 0;

count[0]

count[1]

offset[0]

offset[1]

Process 1

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 298

Example 1: Writing dataset by rows

71    /*
72     * Each process defines a dataset in memory and writes it
73     * to the hyperslab in the file.
74     */
75    count[0] = dimsf[0]/mpi_size;
76    count[1] = dimsf[1];
77    offset[0] = mpi_rank * count[0];
78    offset[1] = 0;
79    memspace = H5Screate_simple(RANK, count, NULL);
80
81    /*
82     * Select hyperslab in the file.
83     */
84    filespace = H5Dget_space(dset_id);
85    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 299

P0

P1

File

Example 2: Writing dataset by columns

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 300

Writing by columns: Output of h5dump

HDF5 "SDS_col.h5" {GROUP "/" { DATASET "IntArray" { DATATYPE H5T_STD_I32BE DATASPACE SIMPLE { ( 8, 6 ) / ( 8, 6 ) } DATA { 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200 } } } }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 301

Example 2: Writing dataset by column

Process 1

Process 0

FileMemory

block[1]

block[0]

P0 offset[1]

P1 offset[1]stride[1]

dimsm[0]dimsm[1]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 302

Example 2: Writing dataset by column

85    /*
86     * Each process defines a hyperslab in the file.
88     */
89    count[0] = 1;
90    count[1] = dimsm[1];
91    offset[0] = 0;
92    offset[1] = mpi_rank;
93    stride[0] = 1;
94    stride[1] = 2;
95    block[0] = dimsf[0];
96    block[1] = 1;
97
98    /*
99     * Each process selects a hyperslab.
100    */
101   filespace = H5Dget_space(dset_id);
102   H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, stride, count, block);

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 303

Example 3: Writing dataset by pattern

Process 0

Process 2

File

Process 3

Process 1

Memory

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 304

Writing by Pattern: Output of h5dump

HDF5 "SDS_pat.h5" {GROUP "/" { DATASET "IntArray" { DATATYPE H5T_STD_I32BE DATASPACE SIMPLE { ( 8, 4 ) / ( 8, 4 ) } DATA { 1, 3, 1, 3, 2, 4, 2, 4, 1, 3, 1, 3, 2, 4, 2, 4, 1, 3, 1, 3, 2, 4, 2, 4, 1, 3, 1, 3, 2, 4, 2, 4 } } } }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 305

Process 2

File

Example 3: Writing dataset by pattern

offset[0] = 0;
offset[1] = 1;
count[0]  = 4;
count[1]  = 2;
stride[0] = 2;
stride[1] = 2;

Memory

stride[0]

stride[1]

offset[1]

count[1]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 306

Example 3: Writing by pattern

90    /* Each process defines a dataset in memory and
91     * writes it to the hyperslab in the file.
92     */
93    count[0] = 4;
94    count[1] = 2;
95    stride[0] = 2;
96    stride[1] = 2;
97    if(mpi_rank == 0) {
98        offset[0] = 0;
99        offset[1] = 0;
100   }
101   if(mpi_rank == 1) {
102       offset[0] = 1;
103       offset[1] = 0;
104   }
105   if(mpi_rank == 2) {
106       offset[0] = 0;
107       offset[1] = 1;
108   }
109   if(mpi_rank == 3) {
110       offset[0] = 1;
111       offset[1] = 1;
112   }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 307

P0 P2 File

Example 4: Writing dataset by chunks

P1 P3

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 308

Writing by Chunks: Output of h5dump

HDF5 "SDS_chnk.h5" {GROUP "/" { DATASET "IntArray" { DATATYPE H5T_STD_I32BE DATASPACE SIMPLE { ( 8, 4 ) / ( 8, 4 ) } DATA { 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4 } } } }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 309

Example 4: Writing dataset by chunks

FileProcess 2: Memory

block[0]  = chunk_dims[0];
block[1]  = chunk_dims[1];
offset[0] = chunk_dims[0];
offset[1] = 0;

chunk_dims[0]

chunk_dims[1]

block[0]

block[1]

offset[0]

offset[1]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 310

Example 4: Writing by chunks

97    count[0] = 1;
98    count[1] = 1;
99    stride[0] = 1;
100   stride[1] = 1;
101   block[0] = chunk_dims[0];
102   block[1] = chunk_dims[1];
103   if(mpi_rank == 0) {
104       offset[0] = 0;
105       offset[1] = 0;
106   }
107   if(mpi_rank == 1) {
108       offset[0] = 0;
109       offset[1] = chunk_dims[1];
110   }
111   if(mpi_rank == 2) {
112       offset[0] = chunk_dims[0];
113       offset[1] = 0;
114   }
115   if(mpi_rank == 3) {
116       offset[0] = chunk_dims[0];
117       offset[1] = chunk_dims[1];
118   }

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 311

Parallel HDF5Intermediate Tutorial

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 312

Outline

• Performance• Parallel tools

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 313

My PHDF5 Application I/O is slow

• If my application I/O performance is slow, what can I do?• Use larger I/O data sizes• Independent vs. Collective I/O• Specific I/O system hints• Increase Parallel File System capacity

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 314

Write Speed vs. Block Size

TFLOPS: HDF5 Write vs. MPIO Write (file size 3200 MB, 8 nodes)

[Chart: write speed (MB/s, 0-120) vs. block size (1, 2, 4, 8, 16, 32 MB) for HDF5 Write and MPIO Write.]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 315

Independent vs. Collective Access

• A user reported that independent data transfer mode was much slower than collective data transfer mode.
• The data array was tall and thin: 230,000 rows by 6 columns.

Debug Slow Parallel I/O Speed (1)

• Writing to one dataset
  • Using 4 processes == 4 columns
  • Data type is 8-byte doubles
  • 4 processes, 1000 rows == 4 x 1000 x 8 = 32,000 bytes
• % mpirun -np 4 ./a.out i t 1000
  • Execution time: 1.783798 s
• % mpirun -np 4 ./a.out i t 2000
  • Execution time: 3.838858 s
• Difference of 2 seconds for 1000 more rows = 32,000 bytes
• A speed of 16 KB/s. Way too slow!

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 316

Debug Slow Parallel I/O Speed(2)

• Build a version of PHDF5 with
  ./configure --enable-debug --enable-parallel ...
• This allows tracing of the MPI-IO calls in the HDF5 library.
• E.g., to trace MPI_File_read_xx and MPI_File_write_xx calls:
  % setenv H5FD_mpio_Debug "rw"

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 317

Debug Slow Parallel I/O Speed(3)

% setenv H5FD_mpio_Debug 'rw'
% mpirun -np 4 ./a.out i t 1000    # Independent; contiguous
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=0    size_i=96
in H5FD_mpio_write mpi_off=2056 size_i=8
in H5FD_mpio_write mpi_off=2048 size_i=8
in H5FD_mpio_write mpi_off=2072 size_i=8
in H5FD_mpio_write mpi_off=2064 size_i=8
in H5FD_mpio_write mpi_off=2088 size_i=8
in H5FD_mpio_write mpi_off=2080 size_i=8
...
# Total of 4000 of these little 8-byte writes == 32,000 bytes

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 318

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 319

Independent calls are many and small

• Each process writes one element of one row, skips to the next row, writes one element, and so on.
• Each process issues 230,000 writes of 8 bytes each.
• Not good: just like many independent cars driving to work, it wastes gas and time and creates a total traffic jam.

Debug Slow Parallel I/O Speed (4)

% setenv H5FD_mpio_Debug 'rw'
% mpirun -np 4 ./a.out i h 1000    # Independent; chunked
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=0     size_i=96
in H5FD_mpio_write mpi_off=3688  size_i=8000
in H5FD_mpio_write mpi_off=11688 size_i=8000
in H5FD_mpio_write mpi_off=27688 size_i=8000
in H5FD_mpio_write mpi_off=19688 size_i=8000
in H5FD_mpio_write mpi_off=96    size_i=40
in H5FD_mpio_write mpi_off=136   size_i=544
in H5FD_mpio_write mpi_off=680   size_i=120
in H5FD_mpio_write mpi_off=800   size_i=272
...
Execution time: 0.011599 s

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 320

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 321

Use Collective Mode or Chunked Storage

• Collective mode combines many small independent calls into a few bigger calls, like people going to work by train collectively.
• Chunks of columns speed things up too, like people living and working in the suburbs to reduce overlapping traffic.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 322

Independent vs. Collective write
6 processes, IBM p-690, AIX, GPFS

# of Rows   Data Size (MB)   Independent (s)   Collective (s)
16384       0.25             8.26              1.72
32768       0.50             65.12             1.80
65536       1.00             108.20            2.68
122918      1.88             276.57            3.11
150000      2.29             528.15            3.63
180300      2.75             881.39            4.12

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 323

Independent vs. Collective write (cont.)

Performance (non-contiguous)

[Chart: time (s, 0-1000) vs. data space size (MB, 0-3.00) for independent and collective writes.]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 324

Effects of I/O Hints: IBM_largeblock_io

• GPFS at LLNL Blue
• 4 nodes, 16 tasks
• Total data size 1024 MB
• I/O buffer size 1 MB

                    IBM_largeblock_io=false    IBM_largeblock_io=true
Tasks               MPI-IO     PHDF5           MPI-IO     PHDF5
16 write (MB/s)     60         48              354        294
16 read  (MB/s)     44         39              256        248

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 325

Effects of I/O Hints: IBM_largeblock_io

• GPFS at the LLNL ASCI Blue machine
• 4 nodes, 16 tasks
• Total data size 1024 MB
• I/O buffer size 1 MB

[Chart: 16-task write and read bandwidth (MB/s, 0-400) for MPI-IO and PHDF5 with IBM_largeblock_io=false and IBM_largeblock_io=true.]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 326

Parallel Tools

• ph5diff• Parallel version of the h5diff tool

• h5perf• Performance measuring tools showing

I/O performance for different I/O API

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 327

ph5diff

• A parallel version of the h5diff tool
• Supports all features of h5diff
• An MPI parallel tool
• The manager process (proc 0)
  • coordinates the remaining processes (workers) to "diff" one dataset at a time;
  • collects the output from each worker and prints it out.
• Works best if there are many datasets in the two files with few differences.
• Available in v1.8.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 328

h5perf

• An I/O performance measurement tool• Test 3 File I/O API

• POSIX I/O (open/write/read/close…)• MPIO (MPI_File_{open,write,read,close})• PHDF5

• H5Pset_fapl_mpio (using MPI-IO)• H5Pset_fapl_mpiposix (using POSIX I/O)

• An indication of I/O speed upper limits

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 329

h5perf: Some features

• Check (-c) verifies data correctness
• 2-D chunk patterns added in v1.8
• -h shows the help page

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 330

h5perf: example output 1/3

% mpirun -np 4 h5perf    # Ran on a Linux system
Number of processors = 4
Transfer Buffer Size: 131072 bytes, File size: 1.00 MBs
# of files: 1, # of datasets: 1, dataset size: 1.00 MBs
   IO API = POSIX
      Write (1 iteration(s)):
         Maximum Throughput:   18.75 MB/s
         Average Throughput:   18.75 MB/s
         Minimum Throughput:   18.75 MB/s
      Write Open-Close (1 iteration(s)):
         Maximum Throughput:   10.79 MB/s
         Average Throughput:   10.79 MB/s
         Minimum Throughput:   10.79 MB/s
      Read (1 iteration(s)):
         Maximum Throughput: 2241.74 MB/s
         Average Throughput: 2241.74 MB/s
         Minimum Throughput: 2241.74 MB/s
      Read Open-Close (1 iteration(s)):
         Maximum Throughput:  756.41 MB/s
         Average Throughput:  756.41 MB/s
         Minimum Throughput:  756.41 MB/s

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 331

h5perf: example output 2/3

% mpirun -np 4 h5perf
...
   IO API = MPIO
      Write (1 iteration(s)):
         Maximum Throughput:  611.95 MB/s
         Average Throughput:  611.95 MB/s
         Minimum Throughput:  611.95 MB/s
      Write Open-Close (1 iteration(s)):
         Maximum Throughput:   16.89 MB/s
         Average Throughput:   16.89 MB/s
         Minimum Throughput:   16.89 MB/s
      Read (1 iteration(s)):
         Maximum Throughput:  421.75 MB/s
         Average Throughput:  421.75 MB/s
         Minimum Throughput:  421.75 MB/s
      Read Open-Close (1 iteration(s)):
         Maximum Throughput:  109.22 MB/s
         Average Throughput:  109.22 MB/s
         Minimum Throughput:  109.22 MB/s

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 332

h5perf: example output 3/3

% mpirun -np 4 h5perf
...
   IO API = PHDF5 (w/MPI-I/O driver)
      Write (1 iteration(s)):
         Maximum Throughput:  304.40 MB/s
         Average Throughput:  304.40 MB/s
         Minimum Throughput:  304.40 MB/s
      Write Open-Close (1 iteration(s)):
         Maximum Throughput:   15.14 MB/s
         Average Throughput:   15.14 MB/s
         Minimum Throughput:   15.14 MB/s
      Read (1 iteration(s)):
         Maximum Throughput: 1718.27 MB/s
         Average Throughput: 1718.27 MB/s
         Minimum Throughput: 1718.27 MB/s
      Read Open-Close (1 iteration(s)):
         Maximum Throughput:   78.06 MB/s
         Average Throughput:   78.06 MB/s
         Minimum Throughput:   78.06 MB/s
Transfer Buffer Size: 262144 bytes, File size: 1.00 MBs
# of files: 1, # of datasets: 1, dataset size: 1.00 MBs

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 333

Useful Parallel HDF Links

• Parallel HDF information sitehttp://www.hdfgroup.org/HDF5/PHDF5/

• Parallel HDF5 tutorial available athttp://www.hdfgroup.org/HDF5/Tutor/

• HDF Help email addresshelp@hdfgroup.org

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 334

Questions?

Parallel I/O Performance Study

(preliminary results)

Albert Cheng

The HDF Group

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 335

Introduction

• Parallel performance is affected by the I/O access pattern, the file system, and the MPI communication modes.
• Determining the interaction of these elements provides hints for improving performance.
• The study presents four test cases using h5perf and h5perf_serial.
  • h5perf has been extended to support parallel testing of 2D datasets.
  • h5perf_serial, based on h5perf, allows serial testing of n-dimensional datasets and various file drivers.
• Testing includes various combinations of MPI communication modes and HDF5 storage layouts.
• Finally, we make recommendations that can improve the I/O performance for specific patterns.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 336

Testing Systems and Configuration

System    Architecture                    File System   MPI Implementation
abe       Linux cluster with Intel 64     Lustre        MVAPICH2 1.0.2p1, Intel compiler
cobalt    ccNUMA with Itanium 2           CXFS          SGI Message Passing Toolkit 1.16
mercury   Linux cluster with Itanium 2    GPFS          MPICH Myrinet 1.2.5..10, GM 2.0.8, Intel 8.0

Processors       4
Dataset Size     64K x 64K (4 GB)
I/O Selection    64 MB per processor (shape depends on test case)
API              HDF5 v1.8.1 (default build options)
Iterations       3
MPI-I/O Type     Collective / Independent
Storage Layout   Contiguous / Chunked (chunk size depends on test case)

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 337

HDF5 Storage Layouts

• Contiguous• HDF5 assigns a static contiguous region of storage

for raw data.

Dataset File storage

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 338

HDF5 Storage Layouts

• Chunked
  • HDF5 defines separate regions of storage for the raw data, named chunks, which are pre-allocated in row-major order when a file is created in parallel.
  • This layout is only valid when the file is created and the chunks are pre-allocated; further modification of the file may cause the chunks to be arranged differently.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 339

C0 C1

C2 C3C0 C1 C2 C3

Test Cases

• Case A
  • The transfer selections extend over entire columns, with a size of 64K x 1K. If the storage is chunked, the size of the chunks is 1K x 1K. The selections are interleaved horizontally with respect to the processors.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 340

P0 P1 P2 P3 P0 P1 P2 P3 P0 P1 P2 P3

64K

64K

1K

Test Cases

• Case B
  • The transfer selections only span half the columns, with a size of 32K x 2K. If the storage is chunked, the size of the chunks is 2K x 2K. The selections are interleaved horizontally with respect to the processors.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 341

P0 P1 P2 P3 P0 P1 P2 P3 P0 P1 P2 P332K …

2K

P0 P1 P2 P3 P0 P1 P2 P3 P0 P1 P2 P3…

64K

64K

Test Cases

• Case C
  • The transfer selections only span half the rows, with a size of 2K x 32K. If the storage is chunked, the size of the chunks is 2K x 2K. The lower dimension (column) is evenly divided among the processors.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 342

P0…P0P1…P1P2…P2P3…P3

P0…P0P1…P1P2…P2P3…P3

64K

64K

32K

2K

Test Cases

• Case D
  • The transfer selection extends over entire rows, with a size of 1K x 64K. If the storage is chunked, the size of the chunks is 1K x 1K. The lower dimension (column) is evenly divided among the processors.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 343

[Figure: 64K×64K dataset; 1K×64K full-row selections, with the columns divided evenly among P0-P3]

Access Patterns

• Contiguous
• Each processor accesses a separate region of contiguous storage. An example of this pattern is case D using contiguous storage.

• Non-contiguous
• Separate regions are still assigned to each processor, but those regions contain gaps. Examples of this pattern include case C using contiguous storage and collective cases C-D using chunked storage.

[Figure: contiguous pattern (solid regions P0 P1 P2 P3) vs. non-contiguous pattern (each processor's region interspersed with gaps)]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 344

Access Patterns

• Interleaved (or overlapped)
• Each processor writes many portions that are interleaved with respect to the other processors. For example, using contiguous storage with cases A-B generates this pattern.
• Another instance results from using chunked storage with collective cases A-B.

[Figure: interleaved pattern, P0 P1 P2 P3 P0 P1 P2 P3 … repeated across the file]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 345

Performance Results and Analysis

• The results correspond to the maximum throughput of the Write Open-Close operation over 3 iterations.

• Serial throughput is the performance baseline since our objective is to determine how parallel access can improve performance.

• Unlike GPFS and CXFS, Lustre does not stripe files by default. To enable parallel access, the directory or file must be striped using the lfs command.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 346

I/O Performance in Lustre

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 347

Write throughput in MB/s (maximum over 3 iterations):
                  NON-STRIPED                       STRIPED
COLLECTIVE    Case A  Case B  Case C  Case D    Case A  Case B  Case C  Case D
Contiguous     11.66   23.68   46.12   36.67     25.35   50.26   42.67  119.26
Chunked       179.85  117.31  124.88  106.95    180.33  224.28   86.88   93.45

INDEPENDENT   Case A  Case B  Case C  Case D    Case A  Case B  Case C  Case D
Contiguous      5.92    8.17   20.98  304.06      6.70   10.81   73.45  298.09
Chunked       219.15  328.04   12.15    8.16    158.90  133.27   12.94   10.51

I/O Performance in Lustre

• Striping partitions the file space into stripes and assigns them to several Object Storage Targets (OSTs) in round-robin fashion.

• Since each OST stores portions of the file that differ from those on the other OSTs, the file can be accessed through all OSTs in parallel.

• The default configuration on abe uses a stripe size of 4MB and a stripe count of 16.

• Striping improves performance when the I/O request of each processor spans several stripes (and OSTs) after MPI aggregations, if any.

• When the processors make small independent I/O requests that are practically contiguous, as in cases A-B using chunked storage, a single OST can provide better performance due to asynchronous operations.
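The study stripes the test directory with the lfs command. As a side note, striping parameters can also be suggested at file-creation time through MPI-IO hints; the sketch below uses ROMIO's "striping_factor" and "striping_unit" hints, and whether they are honored depends on the MPI-IO implementation and file system. It is an illustration, not the procedure used on abe.

/* Hedged sketch: suggest Lustre striping through MPI-IO hints passed
 * to the HDF5 MPI-IO driver.  "striping_factor" and "striping_unit"
 * are ROMIO hints; support varies by MPI library and file system. */
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "striping_factor", "16");      /* stripe count, as in abe's default  */
MPI_Info_set(info, "striping_unit", "4194304");   /* 4 MB stripe size                   */

hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
/* create the file with this fapl, then MPI_Info_free(&info) and H5Pclose(fapl) */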

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 348

I/O Performance

[Chart: I/O throughput in MB/s (log scale, 1-1000) on abe for cases A-D; series: serial/cont, serial/chk, ind/cont, ind/chk, coll/cont, coll/chk]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 349

I/O Performance

[Chart: I/O throughput in MB/s (log scale, 1-1000) on cobalt for cases A-D; series: serial/cont, serial/chk, ind/cont, ind/chk, coll/cont, coll/chk]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 350

I/O Performance

[Chart: I/O throughput in MB/s (log scale, 0.1-1000) on mercury for cases A-D; series: serial/cont, serial/chk, ind/cont, ind/chk, coll/cont, coll/chk]

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 351

Performance of Serial I/O

• Access using contiguous storage has the steepest performance trend as the cases change from A to D.

• When using chunked storage, the throughput remains almost constant at the upper bound.

• The allocation of chunks at the time they are written causes the access pattern to be virtually contiguous regardless of the test cases.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 352

Performance of Independent I/O

• Processors perform their I/O requests independently from each other.

• For contiguous storage, performance improves as the tests move from A to D.

• For chunked storage, throughput is high for the interleaved cases A-B since the write blocks (chunks) are larger and caching is exploited. For cases C-D, the many write requests (one per chunk) multiply the overhead of unnecessary locking and caching in Lustre and CXFS.

• Unlike these file systems, GPFS has shown better scalability [1,2].

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 353

Performance of Collective I/O

• The participating processors coordinate and combine their many requests into fewer I/O operations, reducing latency.

• Since the file space is evenly divided among the processors, there is no need for locking, which reduces overhead [3].

• For contiguous storage, performance is high overall, but there is still an increasing trend as the cases change from A to D.

• For chunked storage, performance is even higher, with minor variations among the test cases, because several chunks can be written with a single I/O operation.
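For reference, the independent and collective modes compared throughout this study are requested in HDF5 through a dataset transfer property list. A minimal sketch follows; USE_COLLECTIVE is a hypothetical compile-time switch, not part of HDF5.

/* Select the MPI I/O mode used by each H5Dwrite()/H5Dread() call. */
hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);

#ifdef USE_COLLECTIVE
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   /* processes combine their requests */
#else
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);  /* each process issues its own I/O  */
#endif

/* pass dxpl as the transfer property list argument to H5Dwrite()/H5Dread() */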

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 354

Conclusion

• It is important to determine the access pattern by analyzing the I/O requirements of the application and the storage implementation.

• For contiguous access patterns, independent access is preferable because it avoids the unnecessary overhead of collective calls.

• For non-contiguous patterns, there is little difference between independent and collective access. However, writing many chunks in independent mode may be expensive in Lustre and CXFS if caching is not exploited.

• For interleaved access patterns, collective mode is usually faster.

• Across all the access patterns, the combination of collective mode and chunked storage yields the highest average performance.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 355

References

1. J. Borrill, L. Oliker, J. Shalf, and H. Shan. Investigation of Leading HPC I/O Performance Using A Scientific-Application Derived Benchmark. In Proceedings of SC’07: High Performance Networking and Computing, Reno, NV, November 2007.

2. W. Liao, A. Ching, K. Coloma, A. Choudhary, and L. Ward. An Implementation and Evaluation of Client-Side File Caching for MPI-IO. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), pages 1-10, March 2007.

3. R. Thakur, W. Gropp, and E. Lusk. Data Sieving and Collective I/O in ROMIO. In Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation. IEEE Computer Society Press, February 1999.

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 356

March 9, 2009 10th International LCI Conference - HDF5 Tutorial 357

Questions?
