www.hdfgroup.org

The HDF Group

HDF5 Datasets and I/O

Dataset storage and its effect on performance

May 30-31, 2012, HDF5 Workshop at PSI


Outline

• Dataset metadata and array data storage layouts
• Types of dataset storage layouts
• Factors affecting I/O performance
• I/O with compact datasets
• I/O with contiguous datasets
• I/O with chunked datasets
• Variable-length data and I/O


HDF5 Layers


Figure: the layers a write passes through, from top to bottom:
• HDF5 Application: data is in the application buffer
• HDF5 Object Layer (API): H5Dwrite is called
• HDF5 Internals: data is prepared for I/O
• VFD Layer: the SEC2 driver performs I/O
• HDF5 file


Goal of this talk

• Present what is happening to data inside the HDF5 library

• Show how an application can control the HDF5 library's behavior

• Specifically:
  - Describe some basic operations and data structures, and explain how they affect performance and storage sizes
  - Give some “recipes” for improving performance


HDF5 DATASET METADATA


HDF5 Dataset

• Data array
  - Also called raw data
• Metadata
  - Dataspace: rank, dimensions of the dataset array
  - Datatype: information on how to interpret data
  - Storage properties: how the array is organized on disk
  - Attributes: user-defined metadata (optional)


HDF5 dataset components

Figure: a dataset consists of a dataset header (metadata) and a dataset data array (raw data). The header in the example holds:
• Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: IEEE 32-bit float
• Attributes: Time = 32.4, Pressure = 987, Temp = 56
• Storage info: chunked, compressed
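The figure's numbers connect the metadata to the size of the raw data: a rank-3 dataspace of 4 × 5 × 7 elements of IEEE 32-bit float holds 140 elements, or 560 bytes of raw data before any compression. The plain-C sketch below computes this. `dataspace_model` and the function names are hypothetical stand-ins for what the header describes, not HDF5 API, and the sketch makes no HDF5 calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the dataspace part of a dataset header. */
typedef struct {
    int    rank;      /* number of dimensions */
    size_t dims[8];   /* current size of each dimension */
} dataspace_model;

/* Number of elements in the dataset array: product of all dimensions. */
size_t dataspace_nelements(const dataspace_model *ds)
{
    assert(ds->rank >= 0 && ds->rank <= 8);
    size_t n = 1;
    for (int i = 0; i < ds->rank; i++)
        n *= ds->dims[i];
    return n;
}

/* Raw data bytes = number of elements * size of one element of the datatype. */
size_t raw_data_bytes(const dataspace_model *ds, size_t type_size)
{
    return dataspace_nelements(ds) * type_size;
}
```

With `dataspace_model ds = {3, {4, 5, 7}}`, `raw_data_bytes(&ds, 4)` returns 560.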


HDF5 metadata

• HDF5 metadata
  - Information about HDF5 objects used by the HDF5 library
  - Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, superblock, etc.

• Usually small compared to raw data sizes (KB vs. MB-GB)


HDF5 metadata cache

Figure: application memory holds the metadata cache (MDC), which contains the dataset header, alongside the dataset array data; the HDF5 file holds HDF5 metadata interleaved with dataset array data.

The dataset header resides in the MDC. The MDC is handled by the HDF5 library.

Metadata is mixed with raw data in the HDF5 file.


HDF5 metadata cache

• Metadata cache
  - Space allocated to handle pieces of the HDF5 metadata
  - Allocated by the HDF5 library in the application's memory space
  - Allocated per file; released when the file is closed
• Metadata cache behavior affects overall performance
• The metadata cache implementation prior to HDF5 1.6.5 could cause performance degradation for some applications


HDF5 DATASET STORAGE LAYOUTS


HDF5 dataset storage layouts

• Contiguous
• External
• Chunked
• Compact


Contiguous storage layout

• Contiguous storage layout is the default storage layout for an HDF5 dataset

• Dataset raw data is stored in one contiguous block in the HDF5 file
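Because contiguous raw data is a single block, the position of any element inside it follows directly from its coordinates. A minimal sketch, assuming row-major (C) element order and ignoring where the block itself starts in the file; `contiguous_offset` is a hypothetical helper, not an HDF5 function.

```c
#include <stddef.h>

/* Byte offset of an element within a contiguous, row-major (C-order) block.
 * dims[k]  : extent of dimension k
 * coord[k] : index of the element along dimension k
 * elem_size: size of one element in the file datatype */
size_t contiguous_offset(int rank, const size_t *dims,
                         const size_t *coord, size_t elem_size)
{
    size_t linear = 0;
    for (int k = 0; k < rank; k++)
        linear = linear * dims[k] + coord[k];   /* row-major linear index */
    return linear * elem_size;
}
```

For a 4 × 5 dataset of 4-byte elements, element (2, 3) is element 13 of the block, at byte offset 52.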


Contiguous storage layout

Figure: in application memory, the dataset header sits in the metadata cache (MDC) next to the dataset array data; in the HDF5 file, the dataset header and the dataset array data are stored separately.

Raw data is stored in one contiguous block in the HDF5 file.


External storage layout

• Dataset raw data is stored in one or more external files that should be kept together with the HDF5 file

• The layout in the external files is specified by the application

• An easy way to make legacy data available to the HDF5 library
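HDF5 describes external storage as an ordered list of segments, each made of a file name, an offset in that file and a size (added one at a time with H5Pset_external); the dataset's raw bytes fill the segments in order. The sketch below models that mapping in plain C under this assumption; `ext_segment` and `ext_locate` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical description of one external segment. */
typedef struct {
    const char *name;    /* external file name */
    size_t      offset;  /* where the segment starts inside that file */
    size_t      size;    /* number of dataset bytes held by the segment */
} ext_segment;

/* Map a byte offset within the dataset's raw data to (segment, file offset).
 * Returns the segment index, or -1 if the offset is past the last segment. */
int ext_locate(const ext_segment *segs, int nsegs,
               size_t data_offset, size_t *file_offset)
{
    for (int i = 0; i < nsegs; i++) {
        if (data_offset < segs[i].size) {
            *file_offset = segs[i].offset + data_offset;
            return i;
        }
        data_offset -= segs[i].size;   /* skip past this segment */
    }
    return -1;
}
```

With a 1000-byte segment at offset 100 of `a.raw` followed by a 500-byte segment at offset 0 of `b.raw`, dataset byte 1200 lands at offset 200 of `b.raw`.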


External storage layout

Figure: the dataset header is cached in the MDC in application memory and stored in the HDF5 file, while the dataset array data goes to a separate Unix/Windows file.

Metadata is stored in the HDF5 file. Raw data is stored in a separate file as specified by the application.


Chunked storage layout

• Chunking: a storage layout where a dataset is partitioned into fixed-size multi-dimensional tiles, or chunks

• Each chunk is stored as a contiguous block
• The HDF5 library treats each chunk as an atomic object for I/O
• Greatly affects performance and file sizes
• Use for extendible datasets and for datasets with filters applied (checksum, compression)
• Use for sub-setting of big datasets
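The number of chunks follows from rounding each dimension up to a whole number of chunks. The plain-C sketch below (hypothetical helper names) also estimates storage for unfiltered data under the simplifying assumption that partial edge chunks occupy a full chunk's worth of space, which is one way chunk size shows up in file size.

```c
#include <stddef.h>

/* Number of chunks covering the dataset: product of ceil(dims[k] / chunk[k]). */
size_t chunk_count(int rank, const size_t *dims, const size_t *chunk)
{
    size_t n = 1;
    for (int k = 0; k < rank; k++)
        n *= (dims[k] + chunk[k] - 1) / chunk[k];   /* integer ceiling */
    return n;
}

/* Chunk coordinates (in units of chunks) of the chunk holding an element. */
void chunk_of_element(int rank, const size_t *coord, const size_t *chunk,
                      size_t *chunk_coord)
{
    for (int k = 0; k < rank; k++)
        chunk_coord[k] = coord[k] / chunk[k];
}

/* Storage for unfiltered chunks, assuming every chunk (including partial
 * edge chunks) is allocated at full chunk size: a modeling assumption. */
size_t chunked_storage_bytes(int rank, const size_t *dims,
                             const size_t *chunk, size_t elem_size)
{
    size_t per_chunk = elem_size;
    for (int k = 0; k < rank; k++)
        per_chunk *= chunk[k];
    return chunk_count(rank, dims, chunk) * per_chunk;
}
```

A 10 × 10 dataset of 4-byte elements with 4 × 4 chunks needs 3 × 3 = 9 chunks; at full chunk size that is 576 bytes of storage for 400 bytes of data, and element (5, 9) lives in chunk (1, 2).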


Chunked storage layout

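Figure: the dataset header and chunk index are held in the MDC in application memory; the dataset array, divided into chunks A, B, C and D, is stored in the HDF5 file as separate chunks in a different order (C, A, B, D), located through the chunk index.

Raw data is stored in separate chunks in the HDF5 file.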


Compact storage layout

• Raw data is stored in the dataset object header
• Raw data is read/written with the header
• Use for small (a few KB) datasets to minimize small I/O operations


Compact storage layout

Figure: the dataset header, with the dataset array data inside it, is held in the MDC in application memory and stored as one unit in the HDF5 file.

Raw data is stored in the dataset object header.


FACTORS AFFECTING I/O PERFORMANCE


HDF5 data structures

• Data structures used by the HDF5 library
  - B-trees (groups, dataset chunks)
  - Hash tables
  - Local and global heaps (variable-length data: link names, strings, etc.)
• Other concepts
  - HDF5 metadata cache
  - HDF5 chunk cache
  - Free-space management data structures
  - Etc.


Operations on data inside HDF5 library

• Copying to/from internal buffers
• Datatype conversion, e.g.,
  - Float to integer
  - Little-endian to big-endian
  - 64-bit integer to 16-bit integer
  - Variable-length data conversion from memory to file
• Scattering/gathering
  - Data is scattered/gathered from/to application buffers into internal buffers for datatype conversion and partial I/O
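Little-endian to big-endian conversion amounts to reversing the bytes of every element, so its cost grows with the number of elements transferred. A standalone sketch for 32-bit values; it illustrates the operation and is not HDF5's converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value (little- <-> big-endian). */
uint32_t swap32(uint32_t v)
{
    return  (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         |  (v << 24);
}

/* Convert a whole buffer in place, one element at a time. */
void swap32_buffer(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = swap32(buf[i]);
}
```

`swap32(0x01020304u)` yields `0x04030201u`; applying it twice restores the original value.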


Operations on data inside HDF5 library

• Data transformation (filters, compression)
  - Checksum on raw data and metadata
  - Algebraic transform
  - GZIP and SZIP compression
  - HDF5 and user-defined data transformations
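HDF5's raw-data checksum filter (enabled with H5Pset_fletcher32) is based on the Fletcher-32 algorithm. The sketch below is the textbook form over 16-bit little-endian words and is illustrative only: HDF5's implementation may combine bytes in a different order, so the values need not match checksums stored in HDF5 files.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook Fletcher-32 over 16-bit little-endian words; an odd trailing
 * byte is padded with zero. Not guaranteed to match HDF5's variant. */
uint32_t fletcher32(const uint8_t *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i += 2) {
        uint16_t word = data[i];
        if (i + 1 < len)
            word |= (uint16_t)(data[i + 1] << 8);
        sum1 = (sum1 + word) % 65535;   /* running sum of words */
        sum2 = (sum2 + sum1) % 65535;   /* running sum of sums */
    }
    return (sum2 << 16) | sum1;
}
```

For the five bytes "abcde" this gives `0xF04FC729`.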


I/O performance

• I/O performance depends on many factors
  - Storage layouts
  - Dataset storage properties
  - Chunking strategy
  - Metadata cache performance
  - Datatype conversion performance
  - Other filters, such as compression
  - Access patterns


I/O WITH DIFFERENT STORAGE LAYOUTS


WRITING COMPACT DATASET


Writing compact dataset

Figure: the dataset header, including the dataset array data, goes from the MDC in application memory to the HDF5 file as one unit.

Raw data is written when the object header is written.


WRITING CONTIGUOUS DATASET


Writing contiguous dataset

Figure: the dataset array data is written from application memory directly into its block in the HDF5 file; the dataset header stays in the MDC until it is flushed.

Raw data is written first. The header is written when it is flushed to the file (H5Dclose, H5Fflush, or an MDC flush done by the HDF5 library).


Writing contiguous dataset with conversion

Figure: the dataset array data passes from application memory through a 1 MB conversion buffer on its way to the HDF5 file; the dataset header stays in the MDC.

Raw data goes through the conversion buffer. The header is written when it is flushed to the file (H5Dclose, H5Fflush, or an MDC flush done by the HDF5 library).
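The conversion buffer is used in passes: each pass converts as many elements as fit, so a large transfer needs several passes. A simplified model, assuming each element needs the larger of its source and destination sizes in the buffer; the function is hypothetical, and H5Pset_buffer is the call that changes the buffer size.

```c
#include <stddef.h>

/* Simplified model of strip-mining a conversion through a fixed-size buffer.
 * Assumption: every element occupies max(src_size, dst_size) bytes in the
 * buffer, so each pass converts buf_size / max(...) elements.
 * Returns 0 when there is nothing to do or the buffer cannot hold one element. */
size_t conversion_passes(size_t nelem, size_t src_size, size_t dst_size,
                         size_t buf_size)
{
    size_t widest   = src_size > dst_size ? src_size : dst_size;
    size_t per_pass = widest ? buf_size / widest : 0;

    if (per_pass == 0)
        return 0;
    return (nelem + per_pass - 1) / per_pass;   /* ceiling division */
}
```

Converting 1,000,000 8-byte doubles to 4-byte floats through a 1 MB (1,048,576-byte) buffer takes 131,072 elements per pass, so 8 passes; an 8 MB buffer does it in one.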


PARTIAL I/O FOR CONTIGUOUS DATASET


Sub-setting of contiguous dataset: series of adjacent rows

Figure: a subset of M complete rows, each N elements long, in application memory and in the HDF5 file.

The subset is contiguous in the file: one I/O operation.


Sub-setting of contiguous dataset: adjacent, partial rows

Figure: a subset covering N elements in each of M adjacent rows, where N is less than the row length.

The subset is stored in M contiguous blocks in the file: several I/O operations.


Sub-setting of contiguous dataset: extreme case, writing a column

Figure: a subset of 1 element in each of M rows.

The subset data is scattered in the file in M different locations: several small I/O operations.
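The three cases above can be reproduced by listing the contiguous byte ranges that a rectangular selection occupies in a 2-D contiguous dataset, merging ranges that touch. A plain-C sketch assuming row-major order; `selection_runs` is a hypothetical helper, and in this model each run stands for one I/O operation.

```c
#include <stddef.h>

/* Fill offsets[]/lengths[] (bytes, relative to the start of the dataset
 * block) with the contiguous runs touched by a selection of nrows rows from
 * row0, nsel columns from col0, in a row-major dataset ncols wide.
 * Touching runs are merged, so whole-row selections collapse into one run.
 * The caller provides arrays of at least nrows entries. Returns the count. */
size_t selection_runs(size_t ncols, size_t elem_size,
                      size_t row0, size_t nrows, size_t col0, size_t nsel,
                      size_t *offsets, size_t *lengths)
{
    size_t nruns = 0;
    if (nsel == 0)
        return 0;
    for (size_t r = row0; r < row0 + nrows; r++) {
        size_t off = (r * ncols + col0) * elem_size;
        size_t len = nsel * elem_size;
        if (nruns > 0 && offsets[nruns - 1] + lengths[nruns - 1] == off) {
            lengths[nruns - 1] += len;   /* continues the previous run */
        } else {
            offsets[nruns] = off;        /* starts a new run */
            lengths[nruns] = len;
            nruns++;
        }
    }
    return nruns;
}
```

In a dataset 100 elements wide with 4-byte elements, 3 whole rows give 1 run of 1200 bytes, 5 partial rows of 20 elements give 5 runs, and a 5-row column gives 5 runs of 4 bytes each.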


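Sub-setting of contiguous dataset: data sieve buffer

Figure: the M single-element column subset is copied from application memory into a sieve buffer, which is then written to the HDF5 file.

Data is copied (memcpy) to a sieve buffer in memory (64 KB), then written with one write operation.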
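The benefit of the sieve buffer can be estimated by grouping scattered element offsets into windows no larger than the buffer. This is a simplified model, not HDF5's algorithm: it assumes sorted offsets and counts one I/O operation per window. The function name is hypothetical; the real buffer size is set with H5Pset_sieve_buf_size.

```c
#include <stddef.h>

/* Simplified sieve model: walk sorted element offsets and open a new window
 * whenever the next element would not fit inside the current window of
 * sieve_size bytes. Returns the number of windows (I/O operations). */
size_t sieve_windows(const size_t *offsets, size_t n, size_t elem_size,
                     size_t sieve_size)
{
    size_t windows = 0, window_start = 0;
    for (size_t i = 0; i < n; i++) {
        if (windows == 0 || offsets[i] + elem_size > window_start + sieve_size) {
            window_start = offsets[i];   /* new window begins at this element */
            windows++;
        }
    }
    return windows;
}
```

For a column of 4-byte elements in rows 400 bytes apart, 100 rows fit in one 64 KB (65,536-byte) window, while 1,000 rows need 7 windows instead of 1,000 separate writes.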


Performance tuning for contiguous dataset

• Datatype conversion
  - Avoid it for better performance
  - Use the H5Pset_buffer function to customize the conversion buffer size
• Partial I/O
  - Write/read in big contiguous blocks
  - Use H5Pset_sieve_buf_size to improve performance for complex sub-setting
  - Caution:
    - The sieve buffer is allocated when the first write occurs and is released when the dataset is closed.
    - Memory will grow if there are a lot of open datasets.


I/O FOR CHUNKED DATASET


Recall: Chunked storage layout

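Figure: the dataset header and chunk index are held in the MDC in application memory; the dataset array, divided into chunks A, B, C and D, is stored in the HDF5 file as separate chunks in a different order (C, A, B, D), located through the chunk index.

Raw data is stored in separate chunks in the HDF5 file.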


HDF5 chunking

• The HDF5 library treats each chunk as an atomic object
  - Compression is applied to each chunk
  - Datatype conversion and other filters are applied per chunk
• Chunk size greatly affects performance
  - Chunk overhead adds to file size
  - Chunk processing involves many steps


HDF5 chunk cache

• Chunk cache (general points, details later)
  - Caches chunks for better performance; remains allocated across multiple calls
  - Created for each chunked dataset
  - Size of the chunk cache is set for the file (default size 1 MB)
  - Each chunked dataset has its own chunk cache
  - A chunk may be too big to fit into the cache
  - Memory may grow if the application keeps opening datasets
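Whether chunks fit in the cache, and how much memory the caches can take, is simple arithmetic. A plain-C sketch with hypothetical helper names, taking the 1 MB default as 1,048,576 bytes and ignoring compression.

```c
#include <stddef.h>

/* Bytes in one uncompressed chunk. */
size_t chunk_bytes(int rank, const size_t *chunk, size_t elem_size)
{
    size_t n = elem_size;
    for (int k = 0; k < rank; k++)
        n *= chunk[k];
    return n;
}

/* Whole chunks that fit in a cache of cache_bytes;
 * 0 means a single chunk is larger than the cache. */
size_t chunks_in_cache(size_t chunk_size, size_t cache_bytes)
{
    return chunk_size == 0 ? 0 : cache_bytes / chunk_size;
}

/* Worst-case chunk-cache memory with ndatasets open chunked datasets,
 * since each open dataset has its own cache. */
size_t total_chunk_cache_bytes(size_t ndatasets, size_t cache_bytes)
{
    return ndatasets * cache_bytes;
}
```

A 100 × 100 chunk of 8-byte doubles is 80,000 bytes, so 13 such chunks fit in a 1 MB cache; a 1000 × 1000 chunk of 4-byte floats (4,000,000 bytes) does not fit at all; and 50 open chunked datasets can hold up to 50 MB of chunk caches.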


HDF5 chunk cache

Figure: application memory holds the metadata cache (MDC), which contains the dataset header and the chunking B-tree nodes, and a separate chunk cache for each dataset (default size 1 MB).


Writing chunked dataset

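Figure: chunks A, B and C move from the application memory space through the conversion buffer into the chunk cache; when chunk C is evicted, it passes through the filter pipeline and is written to the chunked dataset in the HDF5 file.

• Datatype conversion is performed before the chunk is placed in the cache
• A chunk is written when it is evicted from the cache
• Compression and other filters are applied on eviction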
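Because chunks are filtered and written on eviction, a cache that is too small for the access pattern causes the same chunk to be written several times. The toy model below uses plain LRU replacement and treats every accessed chunk as modified; HDF5's actual replacement policy is more involved, so this is an assumption-laden sketch, not the library's algorithm.

```c
#include <stddef.h>

#define MAX_SLOTS 16

/* Toy write-back chunk cache with LRU replacement. Counts chunk writes
 * (each would run the filter pipeline) for a sequence of chunk accesses,
 * including the final flush at close. Returns 0 for an unsupported capacity. */
size_t lru_chunk_writes(const int *access, size_t n, size_t capacity)
{
    int    slot[MAX_SLOTS];           /* slot[0] = most recently used chunk */
    size_t used = 0, writes = 0;

    if (capacity == 0 || capacity > MAX_SLOTS)
        return 0;
    for (size_t i = 0; i < n; i++) {
        size_t pos = used;            /* position of the chunk, if cached */
        for (size_t s = 0; s < used; s++)
            if (slot[s] == access[i]) { pos = s; break; }
        if (pos == used) {            /* miss */
            if (used == capacity) {   /* full: evict least recently used */
                writes++;             /* evicted chunk is filtered and written */
                pos = used - 1;       /* reuse its slot */
            } else {
                used++;
            }
        }
        for (size_t s = pos; s > 0; s--)   /* move chunk to the front */
            slot[s] = slot[s - 1];
        slot[0] = access[i];
    }
    return writes + used;             /* close: flush the chunks still cached */
}
```

Accessing chunks 0, 1, 2, 0, 1, 2 with room for two chunks costs 6 chunk writes; with room for three it costs 3, one per chunk at close.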


PARTIAL I/O FOR CHUNKED DATASET


Partial I/O for chunked dataset

• Example: write the green subset of the dataset, converting the data

• The dataset is stored as six chunks in the file.
• The subset spans four chunks, numbered 1-4 in the figure.
• Hence four chunks must be written to the file.
• But first, the four chunks must be read from the file, to preserve those parts of each chunk that are not to be overwritten.

Figure: the subset spans chunks 1 and 2 (top) and chunks 3 and 4 (bottom).
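Which chunks a selection touches follows from the first and last chunk index along each dimension. A plain-C sketch with a hypothetical helper; the dataset and chunk sizes in the example are assumptions chosen to reproduce the figure (an 8 × 12 dataset with 4 × 4 chunks, i.e. six chunks), since the slide does not give them.

```c
#include <stddef.h>

/* Number of chunks touched by a rectangular selection: along each dimension,
 * chunks from start/chunk through (start + count - 1)/chunk. */
size_t chunks_touched(int rank, const size_t *start, const size_t *count,
                      const size_t *chunk)
{
    size_t n = 1;
    for (int k = 0; k < rank; k++) {
        if (count[k] == 0)
            return 0;
        size_t first = start[k] / chunk[k];
        size_t last  = (start[k] + count[k] - 1) / chunk[k];
        n *= last - first + 1;
    }
    return n;
}
```

A 4 × 4 subset starting at (2, 2) touches 2 × 2 = 4 chunks; the same subset starting at (0, 0) is aligned with the chunk boundaries and touches only 1.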


Partial I/O for chunked dataset

• For each of the four chunks:
  - Read the chunk from the file into the chunk cache, unless it is already there.
  - Determine which part of the chunk will be replaced by the selection.
  - Move those elements to the conversion buffer and perform conversion.
  - Move the data elements to be written from the application buffer to the conversion buffer.
  - Move those elements back from the conversion buffer to the chunk cache.
  - Apply filters (compression) when the chunk is flushed from the chunk cache.
• For each element, 3 memcpy operations are performed.

Figure: chunks 1-4, in the file and in the chunk cache.


Partial I/O for chunked dataset

Figure: for chunk 3, elements are moved by memcpy between application memory, the conversion buffer and the chunk cache (three copies); the chunk is then compressed and written to the HDF5 file.


I/O FOR VARIABLE-LENGTH DATASET


Examples of variable length data

• String
  A[0] “the first string we want to write”
  …
  A[N-1] “the N-th string we want to write”
• Each element is a record of variable length
  A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]
  A[1] (0,0,110,2005) [length = 4]
  …
  A[N-1] (1,2,3,4,5,6,7,8,9,10,11,12,…,M) [length = M]


Variable length data in HDF5

• Variable-length description in an HDF5 application:

    typedef struct { size_t len; void *p; } hvl_t;

• Base type can be any HDF5 type: H5Tvlen_create(base_type)
• ~20 bytes of overhead for each element
• Data cannot be compressed
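A buffer of variable-length records is an array of hvl_t structs, each pointing to its own memory. The sketch below builds the integer records from the earlier example in plain C; hvl_t is declared locally with HDF5's field names so the sketch compiles without hdf5.h, and the helper functions are hypothetical. In a real program, such a buffer is what gets passed to H5Dwrite with a memory type created by H5Tvlen_create(H5T_NATIVE_INT).

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-in with the same layout as HDF5's hvl_t. */
typedef struct {
    size_t len;   /* number of base-type elements in this record */
    void  *p;     /* pointer to the record's elements */
} hvl_t;

/* Build one variable-length record of ints by copying n values.
 * Returns 0 on success, -1 if memory allocation fails. */
int make_int_record(hvl_t *rec, const int *values, size_t n)
{
    rec->len = n;
    rec->p = NULL;
    if (n == 0)
        return 0;
    rec->p = malloc(n * sizeof(int));
    if (rec->p == NULL)
        return -1;
    memcpy(rec->p, values, n * sizeof(int));
    return 0;
}

/* Release the memory owned by an array of records. */
void free_records(hvl_t *recs, size_t nrecs)
{
    for (size_t i = 0; i < nrecs; i++) {
        free(recs[i].p);
        recs[i].p = NULL;
        recs[i].len = 0;
    }
}
```

Each record owns a separate allocation, which is why variable-length data cannot simply be streamed as one contiguous block.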


How variable length data is stored in HDF5

Figure: in the HDF5 file, the dataset header describes a dataset with variable-length elements; each element is a pointer into a global heap, which holds the actual variable-length data.


Variable length datasets and I/O

• Elements from the application buffer are “transferred” to/from heaps in the metadata cache during I/O

Figure: the raw data holds pointers into a global heap kept in the metadata cache; the element data comes from the application buffer.


There may be more than one global heap

Figure: pointers in the raw data refer to objects in two different global heaps; the element data comes from the application buffer.


VL dataset and I/O

Figure: VL data moves between the application buffer and global heaps in memory through conversion buffers, and the global heaps are written to the HDF5 file.


Hints for variable length data I/O

• Avoid closing/opening a file while writing VL datasets
  - Global heap information is lost
  - Global heaps may have unused space

• Avoid alternately writing different VL datasets
  - Data from different datasets will go into the same heap
• If the maximum length of the record is known, consider using fixed-length records and compression
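When the maximum record length is known, records can be padded to that length and stored as an ordinary fixed-size array, which can then be chunked and compressed. A plain-C sketch of the packing step; the helper name is hypothetical, and keeping the true lengths in a separate array (for example, a second dataset) is an assumed design choice, not something the slide prescribes.

```c
#include <stddef.h>
#include <string.h>

/* Pack variable-length int records into a fixed nrecs x maxlen buffer,
 * padding unused slots with fill. Returns -1 if a record is longer than
 * maxlen. The true lengths stay in lengths[] and must be stored separately. */
int pack_fixed(const int *const *records, const size_t *lengths, size_t nrecs,
               size_t maxlen, int fill, int *out)
{
    for (size_t i = 0; i < nrecs; i++) {
        if (lengths[i] > maxlen)
            return -1;
        int *row = out + i * maxlen;
        memcpy(row, records[i], lengths[i] * sizeof(int));
        for (size_t j = lengths[i]; j < maxlen; j++)
            row[j] = fill;   /* runs of identical padding typically compress well */
    }
    return 0;
}
```

Padding the two records from the earlier example to length 10 turns the 4-element record into 0, 0, 110, 2005, 0, 0, 0, 0, 0, 0.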


The HDF Group


Thank You!

Questions?
