April 28, 2008 LCI Tutorial 1
Introduction to HDF5 Tools
Tutorial
Part II
Outline
• Overview of HDF5 tools
• Using the tools for troubleshooting
HDF5 command-line tools
• Readers: h5dump, h5diff, h5ls; 1.8 tools: h5check, h5stat
• Writers: h5repack, h5repart, h5import, h5jam/h5unjam; 1.8 tools: h5copy, h5mkgrp
• Converters: h4toh5, h5toh4, gif2h5, h52gif
h5dump
• Dumps the content of an HDF5 file to standard output and, optionally, to one of the following types of files:
1. ASCII text file
2. XML file
3. Binary file
• Flags to remember:
-H to print header information
-p to print objects’ properties
-b to export data in a binary form
-o to export data to a file (text by default)
-y to skip printing indices
-w to specify line width
h5dump -H SDS.h5
HDF5 "SDS.h5" {
GROUP "/" {
GROUP "Floats" {
DATASET "FloatArray" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
}
}
DATASET "IntArray" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
}
}
}
h5dump -d /Floats/FloatArray SDS.h5
HDF5 "SDS.h5" {
DATASET "/Floats/FloatArray" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
DATA {
(0,0): 0.01, 0.02, 0.03,
(1,0): 0.1, 0.2, 0.3,
(2,0): 1, 2, 3,
(3,0): 10, 20, 30
}
}
}
h5dump -x SDS.h5
h5dump binary output
-b F, --binary=F The form of the binary output (F):
• MEMORY: the memory type. Data in the output file will have the same datatype as in memory.
• FILE: the disk file type. Data in the output file will have the same datatype as the corresponding dataset in the HDF5 file.
• LE: a pre-defined little-endian type, for example H5T_IEEE_F64LE.
• BE: a pre-defined big-endian type, for example H5T_STD_I32BE.
h5dump -d /IntArray -o out_le.bin -b LE SDS.h5
od --width=24 -t x4 out_le.bin
0000000 00000000 00000001 00000002 00000003 00000004 00000005
0000030 0000000a 0000000b 0000000c 0000000d 0000000e 0000000f
0000060 00000014 00000015 00000016 00000017 00000018 00000019
0000110 0000001e 0000001f 00000020 00000021 00000022 00000023
0000140 00000028 00000029 0000002a 0000002b 0000002c 0000002d
The h5dump command dumps the 32-bit integer dataset IntArray from SDS.h5 to a little-endian binary file, out_le.bin; od then displays that file as 4-byte hexadecimal words.
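As a rough illustration of what the od listing contains, the following Python sketch decodes a little-endian 32-bit integer dump back into rows. It builds the bytes in memory instead of reading out_le.bin, and it assumes the 5 x 6 shape and the value pattern (10*row + col) visible in the listing. decode_le_int32 is a hypothetical helper name, not part of any HDF5 tool.

```python
import struct

def decode_le_int32(raw, rows, cols):
    """Decode a flat little-endian int32 buffer into a list of rows."""
    count = rows * cols
    if len(raw) != 4 * count:
        raise ValueError("buffer size does not match the requested shape")
    # '<' selects little-endian byte order, 'i' is a 4-byte signed integer.
    flat = struct.unpack("<%di" % count, raw)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

# Stand-in for the contents of out_le.bin (assumed pattern: 10*row + col).
sample = struct.pack("<30i", *[10 * r + c for r in range(5) for c in range(6)])
table = decode_le_int32(sample, 5, 6)

# Second row of the dataset, corresponding to the od line at offset 0000030.
print(table[1])
```

To try this on a real dump, the sample bytes could be replaced by the contents of the binary file opened in "rb" mode.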
h5diff
Using h5diff, you can
• compare two objects in the same file
• compare two objects between two files
• compare all objects between two files
h5diff SDS.h5 SDS2.h5
Dataset: </IntArray> and </IntArray>
5 differences found
h5diff SDS.h5 SDS2.h5 -r /IntArray
Dataset: </IntArray> and </IntArray>
position IntArray IntArray difference
------------------------------------------------------------
[ 0 0 ] 0 10 10
[ 1 0 ] 10 100 90
[ 2 0 ] 20 200 180
[ 3 0 ] 30 300 270
[ 4 0 ] 40 400 360
5 differences found
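The report can be read as a per-element comparison. The sketch below reproduces its difference column for column 0 of IntArray, assuming that the difference shown is the absolute value of the element-wise subtraction. The two lists are copied from the report, and report_differences is a hypothetical name used only for illustration.

```python
def report_differences(first, second):
    """Return (index, a, b, |a - b|) for every position where the values differ."""
    return [(i, a, b, abs(a - b))
            for i, (a, b) in enumerate(zip(first, second))
            if a != b]

# Column 0 of /IntArray in SDS.h5 and in SDS2.h5, copied from the report.
sds_col = [0, 10, 20, 30, 40]
sds2_col = [10, 100, 200, 300, 400]

diffs = report_differences(sds_col, sds2_col)
for row, a, b, d in diffs:
    print("[ %d 0 ] %d %d %d" % (row, a, b, d))
print("%d differences found" % len(diffs))
```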
h5repack
• Copies an HDF5 file to a new file, with or without compression and chunking. It can:
Remove unused space
Apply a compression filter
Apply a layout
h5repack: Applying filters
-f FILTER
GZIP, to apply GZIP compression
SZIP, to apply SZIP compression
SHUF, to apply the HDF5 shuffle filter
FLET, to apply the HDF5 checksum filter
NBIT, to apply NBIT compression
SOFF, to apply the HDF5 Scale/Offset filter
NONE, to remove all filters
For example:
h5repack -i SDS2.h5 -o SDS2_compressed.h5 -f /IntArray:GZIP=9
Remember that compression is not applied to a dataset smaller than 1 KB; see the -m flag.
h5repack: Data layout
-l LAYOUT
CHUNK, to apply chunking layout
COMPA, to apply compact layout
CONTI, to apply contiguous layout
For example:
h5repack -i SDS.h5 -o SDS_chunk.h5 -l /Floats/FloatArray,/IntArray:CHUNK=2x3
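To get a feel for what CHUNK=2x3 means for these two datasets, the sketch below counts the chunks needed to cover each dataspace. It assumes that a partial chunk at an edge still occupies a whole chunk, and it takes the dataset shapes from the h5dump -H output shown earlier. chunk_grid and chunk_count are hypothetical helper names.

```python
def chunk_grid(shape, chunk):
    """Number of chunks along each dimension (ceiling division)."""
    if len(shape) != len(chunk):
        raise ValueError("dataset rank and chunk rank must match")
    return tuple(-(-dim // c) for dim, c in zip(shape, chunk))

def chunk_count(shape, chunk):
    """Total number of chunks needed to cover the dataspace."""
    total = 1
    for n in chunk_grid(shape, chunk):
        total *= n
    return total

float_grid = chunk_grid((4, 3), (2, 3))   # /Floats/FloatArray is 4 x 3
int_grid = chunk_grid((5, 6), (2, 3))     # /IntArray is 5 x 6

print(float_grid, chunk_count((4, 3), (2, 3)))
print(int_grid, chunk_count((5, 6), (2, 3)))
```

The 5 x 6 dataset does not divide evenly by 2 along its first dimension, so under this assumption its last row of chunks is only half used.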
h5repart
Repartitions a file or a family of files
For example:
h5repart -m 200m int16kx16k.h5 part200m%d.h5
977 MB int16kx16k.h5
200 MB part200m0.h5
200 MB part200m1.h5
200 MB part200m2.h5
200 MB part200m3.h5
177 MB part200m4.h5
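The arithmetic behind this listing can be sketched in a few lines of Python. This is a simplification that treats the file as a plain run of megabytes and assumes that family members are numbered from 0 through the %d in the name pattern. member_sizes is a hypothetical helper name.

```python
def member_sizes(total_mb, member_mb):
    """Sizes of the pieces produced when a file is split into fixed-size members."""
    if member_mb <= 0:
        raise ValueError("member size must be positive")
    full, rest = divmod(total_mb, member_mb)
    sizes = [member_mb] * full
    if rest:
        sizes.append(rest)  # the last member holds whatever is left over
    return sizes

sizes = member_sizes(977, 200)
for index, size in enumerate(sizes):
    # The member index replaces %d in the output name pattern.
    print("%d MB part200m%d.h5" % (size, index))
```

Under these assumptions a 977 MB file yields four full 200 MB members and a fifth member of 177 MB.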
h5import
Imports binary/ASCII data into an HDF5 file
h5import infile -c config_file [infile -c config_file2 ...] -outfile outfile
Example:
h5import float5x4x2.txt -c First_set.conf -o First_set.h5
Configuration file First_set.conf:
PATH work/First-set
INPUT-CLASS TEXTFP
RANK 3
DIMENSION-SIZES 5 2 4
OUTPUT-CLASS FP
OUTPUT-SIZE 64
OUTPUT-ARCHITECTURE IEEE
OUTPUT-BYTE-ORDER LE
CHUNKED-DIMENSION-SIZES 2 2 2
MAXIMUM-DIMENSIONS 8 8 -1
Resulting file, as shown by h5dump:
HDF5 "First_set.h5" {
GROUP "/" {
GROUP "work" {
DATASET "First-set" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 5, 2, 4 ) / ( 8, 8, H5S_UNLIMITED ) }
DATA {
(0,0,0): 1.01, 1.02, 1.03, 1.04,
(0,1,0): 1.11, 1.12, 1.13, 1.14,
(1,0,0): 1.21, 1.22, 1.23, 1.24,
(1,1,0): 1.31, 1.32, 1.33, 1.34,
(2,0,0): 1.41, 1.42, 1.43, 1.44,
(2,1,0): 1.51, 1.52, 1.53, 1.54,
(3,0,0): 2.01, 2.02, 2.03, 2.04,
(3,1,0): 2.11, 2.12, 2.13, 2.14,
(4,0,0): 2.21, 2.22, 2.23, 2.24,
(4,1,0): 2.31, 2.32, 2.33, 2.34
}
}
}
}
}
h5jam/h5unjam
• Add or remove a file at the beginning of an HDF5 file (the user block)
• Example:
• h5jam: adds text to the user block
h5jam -u test_ub.txt -i test_ub.h5
• h5unjam: removes text from the user block
h5unjam -i test_ub.h5 -u out_ub.txt -o out_ub.h5
h5ls
• Lists selected information about file objects in the specified format
Example: h5ls -r SDS2.h5
/Floats Group
/Floats/DoubleArray Dataset {10, 5}
/Floats/FloatArray Dataset {4, 3}
/Floats/subs Group
/IntArray Dataset {5, 6}
gif2h5 / h52gif
• gif2h5: converts a GIF file into HDF5
gif2h5 apollo17_earth.gif apollo17_earth.h5
• h52gif: converts an HDF5 file into GIF
h52gif apollo17_earth.h5 apollo17_earth2.gif -i /apollo17_earth.gif/Image0 -p "/apollo17_earth.gif/Global Palette"
h5copy
• Copies an object from one location to another location within a file or across files
• Available in 1.8.0 and later
[Figure: a source file tree (/ with Floats, FloatArray and IntArray) and a destination file tree (/ with FloatArray)]
h5copy
usage: h5copy [OPTIONS] [OBJECTS...]
• -i, --input input file name
• -o, --output output file name
• -s, --source source object name
• -d, --destination destination object name
• -f, --flag <value>
shallow Copy only immediate members for groups
soft Expand soft links into new objects
ext Expand external links into new objects
ref Copy objects that are pointed to by references
noattr Copy object without copying attributes
h5copy
Example:
h5copy -i SDS.h5 -o SDS_cp.h5 -s /Floats/FloatArray -d /FloatArray
[Figure: SDS.h5 (/ with group Floats, which holds FloatArray, and dataset IntArray) and SDS_cp.h5 (/ with FloatArray)]
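h5copy -f shallow
[Figure: file trees for a copy of the group floats. The source has / with groups floats and integers; floats contains 64-bit and f32, with f1 and f2 below them. One copy shows floats with 64-bit, f32, f1 and f2; the copy labeled -f shallow shows floats with only 64-bit and f32.]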
h5copy -f soft
[Figure: file trees showing the copy of the soft link dset_SL, which points to /f1/f1, without and with -f soft]
h5copy -f ref
[Figure: file trees showing the copy of dset_ref, a dataset of references to d1 and d2, without and with -f ref]
h5stat
• Prints statistics about an HDF5 file
• Helps
To troubleshoot size overhead in HDF5 files
To choose specific object properties and storage strategies
• Available in 1.8.0 and later
h5check
Verifies that an HDF5 file is encoded according to the HDF5 File Format Specification
Does not use the HDF5 library
Serves as a watchdog that the HDF5 library implementation is compliant with the HDF5 File Format Specification
The tool is NOT a part of the HDF5 source code distribution
How to use it?
h5check [-vn] <filename>
-vn verbosity mode
n=0 Terse: only prints whether the file is compliant or not
n=1 Default: prints its progress and all errors found
n=2 Verbose: prints everything it knows, usually for debugging
Example: a compliant file
% h5check example1.h5
VALIDATING example1.h5
FOUND super block signature
VALIDATING the super block at 0...
VALIDATING the object header at 928...
VALIDATING the btree at 384...
FOUND btree signature.
VALIDATING the local heap at 96...
FOUND local heap signature.
…
Result: File is in compliance.
Example: a non-compliant file
% h5check invalid2.h5
FOUND super block signature
VALIDATING the super block at 0...
VALIDATING the object header at 928...
VALIDATING the btree at 384...
FOUND btree signature.
VALIDATING the SNOD at 1248...
FOUND SNOD signature.
VALIDATING the object header at 976...
check_sym(at 1248): Errors from check_obj_header()
decode_validate_messages(): Failure in type->decode().
H5O_sdspace_decode(): Bad version number in simple dataspace message.
VALIDATING the local heap at 96...
FOUND local heap signature.
Main(): Errors from check_obj_header().
decode_validate_messages(): Failure in type->decode().
H5O_attr_decode(): Can't decode attribute dataspace.
H5O_sdspace_decode(): Bad version number in simple dataspace message.
…
Result: File is not in compliance.
Using HDF5 Tools for Performance Tuning and Troubleshooting
Introduction
• HDF5 tools may be very useful for performance tuning and troubleshooting
Discover objects and their properties in HDF5 files: h5dump -p
Get file size overhead information: h5stat
Get locations of the objects in a file: h5ls
Discover differences: h5diff, h5ls
Location of raw data: h5ls -var
h5stat
• Prints statistics about an HDF5 file
• Helps
To troubleshoot size overhead in HDF5 files
To choose specific object properties and storage strategies
• To use:
h5stat --help
h5stat file.h5
• The full specification can be found at http://www.hdfgroup.uiuc.edu/RFC/HDF5/h5stat/
• Let us know if you need some “special” type of statistics
h5stat
• Reports two types of statistics:
• High-level information about objects (examples):
Number of different objects (groups, datasets, datatypes) in a file
Number of unique datatypes
Size of raw data in a file
• Information about objects’ structural metadata
• Sizes of structural metadata (total/free)
Object headers, local and global heaps
Sizes of B-trees
• Object header fragmentation
h5stat
• Examples of high-level information:
File information
# of unique groups: 10008
# of unique datasets: 30
# of unique named datatypes: 0
……………………
Max. # of links to object: 1
Max. depth of hierarchy: 4
Max. # of objects in group: 19
……………………
Group bins:
# of groups of size 0: 10000
# of groups of size 1 - 9: 7
# of groups of size 10 - 99: 1
……………………
Max. dimension size of 1-D datasets: 1643
……………………
Dataset filters information:
Number of datasets with
………………
SZIP filter: 2
………………
NBIT filter: 10
USER-DEFINED filter: 1
h5stat
• Conclusion:
• There are a lot of empty groups in the file; it is a good candidate for the compact group feature (h5repack -l ….)
• Some datasets use “user-defined” filters and may not be readable by the HDF5 library
• SZIP compression is needed to read some datasets
Oh… my application uses buffers of size 1024 to read data… No wonder it crashes on reading… Do I have all the filters needed to read the data?
h5stat
• Examples of structural metadata information:
Object header size: (total/unused)
Groups: 1808/72
Datasets: 15792/832
………
Dataset storage information:
Total raw data size: 6140688
………
Dataset datatype #3:
Count (total/named) = (2/0)
Size (desc./elmt) = (10/65535)
Dataset datatype #4:
Count (total/named) = (1/0)
Size (desc./elmt) = (10/32000)
h5stat
• Conclusions
• File size: 6228197
• 1.5% overhead (not bad at all!)
• There are some elements of size 65535 and 32000
Oh… Is it really what I want? Should I use another datatype and take advantage of compression?
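The overhead figure can be checked with a short calculation. The sketch below assumes that overhead means everything in the file that is not raw data, expressed as a fraction of the file size. Both byte counts are taken from the h5stat slides, and overhead_fraction is a hypothetical helper name.

```python
def overhead_fraction(file_size, raw_data_size):
    """Fraction of the file that is not raw data."""
    if not 0 <= raw_data_size <= file_size:
        raise ValueError("raw data size must be between 0 and the file size")
    return (file_size - raw_data_size) / file_size

FILE_SIZE = 6228197       # bytes, from the conclusions
RAW_DATA_SIZE = 6140688   # bytes, "Total raw data size" reported by h5stat

overhead = overhead_fraction(FILE_SIZE, RAW_DATA_SIZE)
print("overhead: %d bytes (%.1f%% of the file)"
      % (FILE_SIZE - RAW_DATA_SIZE, 100 * overhead))
```

With these numbers the result is a little over 1.4 percent, in line with the roughly 1.5% quoted in the conclusions.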
Case study: Using HDF5 tools to debug a problem
• My application creates files on Windows with VS2005 and VS2003. I can read the VS2003 file but not the VS2005 one. h5dump reads both files without errors and its output shows no differences. What am I doing wrong?
• h5diff good.h5 bad.h5
Datatype: </Definitions/timespec> and </Definitions/timespec>
1 differences found
• h5ls -var good.h5 /Definitions/timespec
Type Location: 0:1:0:900
• h5debug good.h5 900
Message Information:
Type class: compound
Size: 8 bytes
• h5debug bad.h5 900
Message Information:
Type class: compound
Size: 16 bytes
Case study: Using HDF5 tools to debug a problem
• Conclusions
The compound datatype “timespec” requires a different number of bytes on VS2005 (16 bytes; 2 x 8 bytes) than on VS2003 (8 bytes; 2 x 4 bytes)
Oh… How do I read my data back? I assumed that my struct would need only 8 bytes for each element, but it needs 16 bytes on VS2005. I need the H5Tget_native_type function to find the type of my data in memory.
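The size mismatch can be illustrated outside HDF5 with Python's struct module. This is only a stand-in for the real fix (querying the datatype with H5Tget_native_type in the application). It assumes that the compound type has two integer fields, each 4 bytes wide in one build and 8 bytes wide in the other; the field names seconds and nanoseconds and the sample value are hypothetical.

```python
import struct

# Assumed layouts: two integer fields, 4 bytes each in one build and
# 8 bytes each in the other. '<' fixes the byte order and disables padding.
LAYOUT_4_BYTE_FIELDS = "<ii"   # 2 x 4 bytes
LAYOUT_8_BYTE_FIELDS = "<qq"   # 2 x 8 bytes

size_small = struct.calcsize(LAYOUT_4_BYTE_FIELDS)
size_large = struct.calcsize(LAYOUT_8_BYTE_FIELDS)
print(size_small, size_large)

# A record written with 8-byte fields (hypothetical sample values).
record = struct.pack(LAYOUT_8_BYTE_FIELDS, 1209340800, 0)

# Reading it back with the 8-byte-per-element assumption fails.
try:
    struct.unpack(LAYOUT_4_BYTE_FIELDS, record)
    decoded_with_wrong_layout = True
except struct.error:
    decoded_with_wrong_layout = False

# Reading it back with the layout that matches the stored data works.
seconds, nanoseconds = struct.unpack(LAYOUT_8_BYTE_FIELDS, record)
print(decoded_with_wrong_layout, seconds, nanoseconds)
```

The point of the sketch is that the reader should take the element layout from the stored data rather than assume a fixed struct size.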
Questions?
End of Part II