1
NASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTER NASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTER This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and UCAR Subaward No S03-38820. Other support was provided based upon the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA. For more information or documentation, contact For more information or documentation, contact [email protected] [email protected] Muqun Yang, Fang Guo, Xiaowen Wu, Robert E. McGrath, Quincey Koziol National Center for Supercomputing Applications University of Illinois, Urbana-Champaign •ScaleOffset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point. •Offset in Scale-Offset compression means the minimum value of a set of data values. •If a fill value is defined for the dataset, the filter will ignore it when finding the minimum value. An example for Integer Type Dataset : : 32-bit integer array Maximum is 7065; Minimum Maximum is 7065; Minimum is 2970 is 2970 The "span" of dataset values = Max-Min+1 = 4076 Case 1: No fill value is defined. Minimum number of bits per element to store = ceiling(log2(span)) = 12 Case2: Fill value is defined in this array. Minimum number of bits per element to store = ceiling(log2(span+1)) = 13 Compression: 1. Subtract each element from minimum value. 2. Pack all data with minimum number of bits. Decompression: 1. Unpack all data. 2. Add each element to minimum value. Outcome: It can save about 60% disk space for this case. An example for Floating-point Type D-scaling factor: the decimal precision to keep for the filter output Floating-point data: {104.561, 99.459, 100.545, 105.644} D-scaling factor: 2 Preparation for Compression Step 1: Determine the minimum value = 99.459 Step 2: Subtract the minimum value from each data element Modified data: {5.102, 0, 1.086, 6.185} Step 3: Scale the data by multiplying 10 ^ D- scaling factor = 100 Modified data: {510.2, 0, 108.6, 618.5} Step 4: Round the data to integer Modified data: {510 , 0, 109, 619} Compression and Decompression: using ScaleOffset for integer Restoration after decompression Step 1: Divide each value by 10^ D-scaling factor Step 2: Add the offset 99.459 The floating point data {104.559, 99.459, 100.549, 105.649} Outcome: Produces lossy compression. Compression ratio will depend on D-scaling factor. Internal HDF5 filter Easy to understand - Integer: lossless - Floating-point: GRIB lossy compression Easy to control floating compression - D-scaling factor Easy to estimate the outcome How does ScaleOffset work? What is the ScaleOffset filter? Why use ScaleOffset filter? • ScaleOffset Filter can gain better compression ratio for floating-point data • ScaleOffset Filter can use more encoding time and decoding time. • Knowledge of data characteristics benefits the usage of ScaleOffset filter. EOS Data Compression Performance Comparison EOS Data Compression Performance Comparison ScaleOffset vs. Deflate Filter ScaleOffset vs. Deflate Filter Limitations Compressed floating-point data range is limited by the size of corresponding unsigned integer type. • Long double is not supported. 1 2 3 4 5 6 7 0.0 0.2 0.4 0.6 0.8 1.0 1.2 Inform ation on D ataS et: D ata S ize :(2030,660) C om m on V alue:189.67592 M issing V alue :3.402824E+38 SCR D C R w ith C om pression Level6 C o m p re s s io n R a tio S cale Factor C om pression R atio v.s.S cale F acto r (S cale-o ffset) (D ata Type :H 5_IE E E _F 32) O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o 1 2 3 4 5 6 7 0.0 0.2 0.4 0.6 0.8 1.0 1.2 SET D E T w ith C om pression Level6 E n c o d in g T im e (s ) D ata P o in ts E n co d in g T im e v.s.S cale F acto r (S cale-o ffset) (D ata T ype :H 5_IE E E _F 32) Inform ation on D ataS e t: D a ta S ize :(2030,660) C om m on V alue:189.67592 M issing V a lue :3.402824E+38 O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o 0 1 2 3 4 5 6 7 8 9 10 11 0 1 2 3 4 5 2.43 29.46 47.35 47.35 47.35 47.35 47.35 47.35 109.94 109.94 -9999 -1.27E 30 0 -1.27E 30 -1.27E 30 -1.27E 30 -1.27E 30 DCR SCR RE C o m p re s s io n R a t io & R e la t iv e E ffic ie n cy D ata P o in ts Miss Val. : Data Size(100KB): C om pression R atio C om p ariso n & R elative E fficien cy (S cale-o ffset v.s.D eflate) (D ata T ype :H 5_IE E E _F 32) D C R :D eflate C om pression R atio S C R :S cale-offsetC om pression R atio RE : R elative E fficiency R E =(1-S C R )/(1-D C R ) O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 D C R :D eflate C om pression R atio S C R :S cale-offsetC om pression R atio DCR SCR C o m p re s s io n R a tio C om pression R atio v.s.E n co d in g T im e (S cale-o ffset v.s.D eflate) (D ata T ype :H 5T _ IE E E _F 32 ) E n co d in g T im e (s) O r ig in D em o O rig in D em o O rig in D em o O r ig in D em o O rig in D em o O rig in D em o O r ig in D em o O rig in D em o O rig in D em o O r ig in D em o O rig in D em o O rig in D em o O r ig in D em o O rig in D em o O rig in D em o O r ig in D em o O rig in D em o O rig in D em o O r ig in D em o O rig in D em o O rig in D em o 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 D D T:D eflate D ecoding T im e S D T:S cale-offsetD ecoding T im e DDT SDT D e c o d in g T im e (s ) E ncooding T im e (s) D eco d in g T im e v.s.E n co d in g T im e (S cale-o ffset & D eflate) (D ata T ype :H 5_IE E E _F 32) O r ig in D e m o O r ig in D e m o O r ig in D em o O r ig in D e m o O r ig in D e m o O r ig in D em o O r ig in D e m o O r ig in D e m o O r ig in D em o O r ig in D e m o O r ig in D e m o O r ig in D em o O r ig in D e m o O r ig in D e m o O r ig in D em o O r ig in D e m o O r ig in D e m o O r ig in D em o 0 1 2 3 4 5 6 7 8 9 10 11 12 0.0 0.5 1.0 4 .38 4.38 4.38 5 .5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0 DCR SCR C o m p re s s io n R a tio D ata P o in ts Miss Val. : Data Size(MB): C om pression R atio C om p ariso n (S cale-o ffset v.s.D eflate) (D ata T ype :H 5T_S TD _I16) O r ig in D em o O r ig in D e m o O r ig in D em o O r ig in D em o O r ig in D e m o O r ig in D em o O r ig in D em o O r ig in D e m o O r ig in D em o O r ig in D em o O r ig in D e m o O r ig in D em o O r ig in D em o O r ig in D e m o O r ig in D em o O r ig in D em o O r ig in D e m o O r ig in D em o O r ig in D em o O r ig in D e m o O r ig in D em o 0 1 2 3 4 5 6 7 8 9 10 11 12 -2 0 2 4 6 8 10 4.38 4.3 8 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0 DDT SDT D e c o d in g T im e D ata P o in ts Miss Val. : Data Size(MB): D eco d in g T im e C om p ariso n (S cale-o ffset v.s.D eflate) (D ata T ype :H 5T_S TD _I16) O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o 0 1 2 3 4 5 6 7 8 9 10 11 12 -2 0 2 4 6 8 10 12 4.38 4.38 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0 DET SET E n c o d in g T im e D ata P o in ts Miss Val. : Data Size(MB): E n co d in g T im e C om p ariso n (S cale-o ffset v.s.D eflate) (D ata T ype :H 5T_S TD _I16) O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o O r ig in D em o Conclusions Conclusions

NASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTER This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and

Embed Size (px)

Citation preview

Page 1: NASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTER This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and

NASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTERNASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTER

This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and UCAR Subaward No S03-38820. Other support was provided based upon the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A.  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA. 

For more information or documentation, contactFor more information or documentation, [email protected]@ncsa.uiuc.edu

Muqun Yang, Fang Guo, Xiaowen Wu, Robert E. McGrath, Quincey Koziol

National Center for Supercomputing ApplicationsUniversity of Illinois, Urbana-Champaign

• ScaleOffset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point.

• Offset in Scale-Offset compression means the minimum value of a set of data values.• If a fill value is defined for the dataset, the filter will ignore it when finding the minimum value.

An example for Integer Type

Dataset: : 32-bit integer array Maximum is 7065; Minimum is 2970Maximum is 7065; Minimum is 2970 The "span" of dataset values

= Max-Min+1 = 4076

Case 1: No fill value is defined. Minimum number of bits per element to store

= ceiling(log2(span)) = 12

Case2: Fill value is defined in this array.

Minimum number of bits per element to store = ceiling(log2(span+1)) = 13

Compression:1. Subtract each element from minimum value.2. Pack all data with minimum number of bits.

Decompression:1. Unpack all data.2. Add each element to minimum value.

Outcome:It can save about 60% disk space for this case.

An example for Floating-point Type

D-scaling factor: the decimal precision to keep for the filter output

Floating-point data: {104.561, 99.459, 100.545, 105.644}

D-scaling factor: 2Preparation for Compression

Step 1: Determine the minimum value = 99.459Step 2: Subtract the minimum value from each data element

Modified data: {5.102, 0, 1.086, 6.185}

Step 3: Scale the data by multiplying 10 ^ D-scaling factor = 100 Modified data: {510.2, 0, 108.6, 618.5}

Step 4: Round the data to integer

Modified data: {510 , 0, 109, 619}

Compression and Decompression: using ScaleOffset for integer Restoration after decompression

Step 1: Divide each value by 10^ D-scaling factor

Step 2: Add the offset 99.459

The floating point data {104.559, 99.459, 100.549, 105.649}

Outcome:Produces lossy compression. Compression ratio will depend on D-scaling factor.

• Internal HDF5 filter• Easy to understand

- Integer: lossless- Floating-point: GRIB lossy compression

• Easy to control floating compression- D-scaling factor

• Easy to estimate the outcome

How does ScaleOffset work?

What is the ScaleOffset filter?

Why use ScaleOffset filter?

• ScaleOffset Filter can gain better compression ratio for floating-point data.• ScaleOffset Filter can use more encoding time and decoding time. • Knowledge of data characteristics benefits the usage of ScaleOffset filter.

EOS Data Compression Performance ComparisonEOS Data Compression Performance ComparisonScaleOffset vs. Deflate Filter ScaleOffset vs. Deflate Filter

Limitations

• Compressed floating-point data range is limited by the size of corresponding unsigned integer type.

• Long double is not supported.

1 2 3 4 5 6 70.0

0.2

0.4

0.6

0.8

1.0

1.2

Information on DataSet:Data Size : (2030,660)Common Value: 189.67592Missing Value : 3.402824E+38

SCR DCR with Compression Level 6

Co

mp

ressio

n R

ati

o

Scale Factor

Compression Ratio v.s. Scale Factor (Scale-offset)(Data Type : H5_IEEE_F32)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

1 2 3 4 5 6 70.0

0.2

0.4

0.6

0.8

1.0

1.2

SET DET with Compression Level 6

En

co

din

g T

ime (

s)

Data Points

Encoding Time v.s. Scale Factor (Scale-offset)(Data Type : H5_IEEE_F32)

Information on DataSet:Data Size : (2030,660)Common Value: 189.67592Missing Value : 3.402824E+38

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

0 1 2 3 4 5 6 7 8 9 10 110

1

2

3

4

5

2.43 29.46 47.35 47.35 47.35 47.35 47.35 47.35 109.94 109.94-9999 -1.27E30 0 -1.27E30 -1.27E30 -1.27E30 -1.27E30

DCR SCR RE

Co

mp

ressio

n R

ati

o &

Rela

tive E

ffic

ien

cy

Data Points Miss Val. :Data Size(100KB):

Compression Ratio Comparison & Relative Efficiency (Scale-offset v.s. Deflate)(Data Type : H5_IEEE_F32)

DCR: Deflate Compression RatioSCR: Scale-offset Compression RatioRE : Relative EfficiencyRE=(1-SCR)/(1-DCR)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

0.0 0.5 1.0 1.5 2.0 2.5 3.0

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

DCR: Deflate Compression RatioSCR: Scale-offset Compression Ratio

DCR SCR

Co

mp

ressio

n R

ati

o

Compression Ratio v.s. Encoding Time (Scale-offset v.s. Deflate)(Data Type : H5T_IEEE_F32)

Encoding Time (s)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

0.0 0.5 1.0 1.5 2.0 2.5 3.00.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

DDT: Deflate Decoding TimeSDT: Scale-offset Decoding Time

DDT SDT

Deco

din

g T

ime (

s)

Encooding Time (s)

Decoding Time v.s. Encoding Time (Scale-offset & Deflate)(Data Type : H5_IEEE_F32)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

0 1 2 3 4 5 6 7 8 9 10 11 12

0.0

0.5

1.0

4.38 4.38 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0

DCR SCR

Co

mp

ressio

n R

atio

Data Points

Miss Val. : Data Size(MB):

Compression Ratio Comparison (Scale-offset v.s. Deflate)(Data Type : H5T_STD_I16)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

0 1 2 3 4 5 6 7 8 9 10 11 12

-2

0

2

4

6

8

10

4.38 4.38 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0

DDT SDT

Deco

din

g T

ime

Data Points

Miss Val. : Data Size(MB):

Decoding Time Comparison (Scale-offset v.s. Deflate)(Data Type : H5T_STD_I16)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

0 1 2 3 4 5 6 7 8 9 10 11 12-2

0

2

4

6

8

10

12

4.38 4.38 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0

DET SET

En

co

din

g T

ime

Data Points

Miss Val. : Data Size(MB):

Encoding Time Comparison (Scale-offset v.s. Deflate)(Data Type : H5T_STD_I16)

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o

ConclusionsConclusions