N-bit and ScaleOffset filters

1

N-bit and ScaleOffset N-bit and ScaleOffset filtersfilters

MuQun Yang

National Center for Supercomputing ApplicationsUniversity of Illinois at Urbana-Champaign

Urbana, IL 61801contact: [email protected]

2

N-Bit Filter OutlineN-Bit Filter Outline

►DefinitionDefinition►An usage exampleAn usage example►LimitationsLimitations

33

N-Bit datatypeN-Bit datatypeA user-defined HDF5 datatype that can use A user-defined HDF5 datatype that can use

less bits than predefined datatype.less bits than predefined datatype.

Illustration of an N-Bit datatype on a little-endian machinebase type: integeroffset: 4precision: 16

| byte 3 | byte 2 | byte 1 | byte 0 ||????????|????SPPP|PPPPPPPP|PPPP????|

S - sign bit P - significant bit ? - padding bit

Without using N-bit filter, HDF5 saves the padding bits of N-bit datatype, no disk space is saved.

4

N-bit filterN-bit filter

►When using N-bit filter for N-bit When using N-bit filter for N-bit datatype,datatype,all padding bits will be chopped off during all padding bits will be chopped off during compression, and will be stored on disk like:compression, and will be stored on disk like:

|SPPPPPPP|PPPPPPPP|SPPPPPPP|PPPPPPPP||SPPPPPPP|PPPPPPPP|SPPPPPPP|PPPPPPPP|

5

Enable N-Bit filterEnable N-Bit filter

► Create a dataset creation property listCreate a dataset creation property list► Set chunking (and specify chunk dimensions)Set chunking (and specify chunk dimensions)► Set up use of the N-Bit filterSet up use of the N-Bit filter► Create dataset specifying this property listCreate dataset specifying this property list► Close property list Close property list

6

N-Bit filter: usage exampleN-Bit filter: usage example/*Define dataset datatype (N-Bit), and set precision, offset *//*Define dataset datatype (N-Bit), and set precision, offset */ datatype = H5Tcopy(H5T_NATIVE_INT);datatype = H5Tcopy(H5T_NATIVE_INT); precision = 17; precision = 17; H5Tset_precision(datatype,precision);H5Tset_precision(datatype,precision); offset = 4;offset = 4; if(H5Tset_offset(datatype,offset);if(H5Tset_offset(datatype,offset);

/* Set the dataset creation property list for N-Bit *//* Set the dataset creation property list for N-Bit */ chunk_size[0] = CH_NX;chunk_size[0] = CH_NX; chunk_size[1] = CH_NY;chunk_size[1] = CH_NY; properties = H5Pcreate (H5P_DATASET_CREATE);properties = H5Pcreate (H5P_DATASET_CREATE); H5Pset_chunk (properties, 2, chunk_size);H5Pset_chunk (properties, 2, chunk_size); H5Pset_nbit (properties);H5Pset_nbit (properties);

/* Create a new dataset with N-Bit datatype *//* Create a new dataset with N-Bit datatype */ dataset = H5Dcreate (file, DATASET_NAME, datatype, dataset = H5Dcreate (file, DATASET_NAME, datatype,

dataspace, dataspace, propertiesproperties););

7

N-bit filter restrictionsN-bit filter restrictions

► Only compresses N-Bit datatype or field Only compresses N-Bit datatype or field derived from integer or floating-pointderived from integer or floating-point

► Assumes padding bits of zero Assumes padding bits of zero ► No handling if fill value datatype differs from No handling if fill value datatype differs from

dataset datatypedataset datatype

8

ScaleOffset Filter OutlineScaleOffset Filter Outline

► DefinitionDefinition► How does ScaleOffset filter workHow does ScaleOffset filter work► Why uses ScaleOffset filterWhy uses ScaleOffset filter► Usage examplesUsage examples► Performance with EOS dataPerformance with EOS data► LimitationsLimitations

9

ScaleOffset filterScaleOffset filter► ScaleOffset compression performs a scale

and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point.

► offsetoffset in Scale-Offset compression means the in Scale-Offset compression means the minimum value of a set of data valuesminimum value of a set of data values

► If a fill value is defined for the dataset, the If a fill value is defined for the dataset, the filter will ignore it when finding the minimum filter will ignore it when finding the minimum valuevalue

10

How ScaleOffset worksHow ScaleOffset works

An example for Integer Type

► Maximum is 7065; Minimum is 2970Maximum is 7065; Minimum is 2970The "span" = Max-Min+1 = 4096

► Case 1: No fill value is defined.Minimum number of bits per element to store

= ceiling(log2(span)) = 12

► Case2: Fill value is defined in this array.Minimum number of bits per element to store = ceiling(log2(span+1)) = 13

11

How ScaleOffset worksHow ScaleOffset worksAn example for Integer Type (cont.)

► Compression:1. Subtract minimum value from each element2. Pack all data with minimum number of bits

► Decompression:1. Unpack all data2. Add minimum value to each element

► Outcome:1. Save about 60% disk space for this case

12

How ScaleOffset worksHow ScaleOffset worksAn example for Floating-point Type

► D-scaling factor: the number of decimal precision to keep for the filter output

► Floating-point data: {104.561, 99.459, 100.545, 105.644}

► D-scaling factor: 2► Preparation for Compression

1. Calculate the minimum value = 99.4592. Subtract the minimum value

Modified data: {5.102, 0, 1.086, 6.185}3. Scale the data by multiplying 10 ^ D-scaling factor = 100

Modified data: {510.2, 0, 108.6, 618.5}4. Round the data to integer

Modified data: {510 , 0, 109, 619}

13

How ScaleOffset worksHow ScaleOffset works

An example for Floating-point Type (cont.)

► Compression and Decompression: using ScaleOffset for integer

► Restoration after decompression1. Divide each value by 10^ D-scaling factor2. Add the offset 99.4593. The floating point data

{104.559, 99.459, 100.549, 105.649}► Outcome:

1. Lossy compression 2. Compression ratio will depend on D-scaling factor

14

Scale-Offset filter Scale-Offset filter compresses floating-point compresses floating-point

datadata► GRiB data packing methodGRiB data packing method► The Scale-Offset compression of floating-point The Scale-Offset compression of floating-point

data is data is lossylossy ► Two scaling methods: Two scaling methods:

D-scaling D-scaling E-scaling E-scaling

Currently only D-scaling is implementedCurrently only D-scaling is implemented

15

Why ScaleOffset Filter?Why ScaleOffset Filter?►Internal HDF5 filter► Easy to understand

Integer: lossless (by default) Floating-point: GRIB lossy compression

► Easy to control floating compression D-scaling factor

► Easy to estimate the compression ratio

16

H5Pset_scaleoffset APIH5Pset_scaleoffset API

► plist_id IN: Dataset creation property list plist_id IN: Dataset creation property list identifieridentifier

► scale_type IN: Flag indicating compression scale_type IN: Flag indicating compression methodmethod

H5Z_SO_FLOAT_DSCALE (0) Floating-point typeH5Z_SO_FLOAT_DSCALE (0) Floating-point type H5Z_SO_INT (2) Integer type H5Z_SO_INT (2) Integer type

► scale_factor IN: Flag indicating compression methodscale_factor IN: Flag indicating compression method If scale_type is H5Z_SO_FLOAT_DSCALE, If scale_type is H5Z_SO_FLOAT_DSCALE,

decimal scale factordecimal scale factor If scale_type is H5Z_SO_INT, If scale_type is H5Z_SO_INT,

scale_factor denotes minimum-bits, should be a scale_factor denotes minimum-bits, should be a positive integer or H5Z_SO_INT_MINBITS_DEFAULTpositive integer or H5Z_SO_INT_MINBITS_DEFAULT

H5Pset_scaleoffset (hid_t plist_id, H5Z_SO_scale_type_t H5Pset_scaleoffset (hid_t plist_id, H5Z_SO_scale_type_t scale_type, scale_type,

int scale_factor)int scale_factor)

17

Integer exampleInteger example /* Set the fill value of dataset *//* Set the fill value of dataset */ fill_val = 10000;fill_val = 10000; H5Pset_fill_value(properties, H5T_NATIVE_INT, H5Pset_fill_value(properties, H5T_NATIVE_INT,

&fill_val);&fill_val);

/* Set parameters for Scale-Offset /* Set parameters for Scale-Offset compression*/compression*/

H5Pset_scaleoffset (properties, H5Z_SO_INT H5Pset_scaleoffset (properties, H5Z_SO_INT H5Z_SO_INT_MINBITS_DEFAULT);H5Z_SO_INT_MINBITS_DEFAULT);

/* Create a new dataset *//* Create a new dataset */ dataset = H5Dcreate (file, DATASET_NAME, dataset = H5Dcreate (file, DATASET_NAME,

H5T_NATIVE_INT, dataspace, properties);H5T_NATIVE_INT, dataspace, properties);

18

Floating-point exampleFloating-point example fill_val = 10000.0;fill_val = 10000.0;

/* Set the fill value of dataset *//* Set the fill value of dataset */ H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val);&fill_val);

/* Set parameters for Scale-Offset compression; /* Set parameters for Scale-Offset compression; use D-scaling method,use D-scaling method, set decimal scale factor to 3 */set decimal scale factor to 3 */ H5Pset_scaleoffset (properties,H5Z_SO_FLOAT_DSCALE, 3);H5Pset_scaleoffset (properties,H5Z_SO_FLOAT_DSCALE, 3);

/* Create a new dataset *//* Create a new dataset */ dataset = H5Dcreate (file, DATASET_NAME, H5T_NATIVE_FLOAT,dataset = H5Dcreate (file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);dataspace, properties);

19

0

1

2

3

4

5

2.43 29.46 47.35 47.35 47.35 47.35 47.35 47.35 109.94 109.94

-9999 -1.27E30 0 -1.27E30 -1.27E30 -1.27E30 -1.27E30

DCR SCR RE

Co

mp

ressio

n R

ati

o &

Rela

tive E

ffic

ien

cy

Data Size(100KB)

Missing Value

Compression Ratio Comparison & Relative Efficiency (Scale-offset v.s. Deflate)(Data Type : H5_IEEE_F32)

DCR: Deflate Compression RatioSCR: Scale-offset Compression RatioRE : Relative EfficiencyRE=(1-SCR)/(1-DCR)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o






20

LimitationsLimitations

Compressed floating-point data range is limited by the size of corresponding unsigned integer type.

Long double is not supported.

21

Thank youThank you

This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and UCAR Subaward No S03-38820. Other support was provided based upon the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.

Documents

N-bit and ScaleOffset filters