Upload
felicia-warren
View
213
Download
1
Embed Size (px)
Citation preview
NASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTERNASA EOS DATA COMPRESSION WITH HDF5 SCALEOFFSET FILTER
This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and UCAR Subaward No S03-38820. Other support was provided based upon the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.
For more information or documentation, contactFor more information or documentation, [email protected]@ncsa.uiuc.edu
Muqun Yang, Fang Guo, Xiaowen Wu, Robert E. McGrath, Quincey Koziol
National Center for Supercomputing ApplicationsUniversity of Illinois, Urbana-Champaign
• ScaleOffset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point.
• Offset in Scale-Offset compression means the minimum value of a set of data values.• If a fill value is defined for the dataset, the filter will ignore it when finding the minimum value.
An example for Integer Type
Dataset: : 32-bit integer array Maximum is 7065; Minimum is 2970Maximum is 7065; Minimum is 2970 The "span" of dataset values
= Max-Min+1 = 4076
Case 1: No fill value is defined. Minimum number of bits per element to store
= ceiling(log2(span)) = 12
Case2: Fill value is defined in this array.
Minimum number of bits per element to store = ceiling(log2(span+1)) = 13
Compression:1. Subtract each element from minimum value.2. Pack all data with minimum number of bits.
Decompression:1. Unpack all data.2. Add each element to minimum value.
Outcome:It can save about 60% disk space for this case.
An example for Floating-point Type
D-scaling factor: the decimal precision to keep for the filter output
Floating-point data: {104.561, 99.459, 100.545, 105.644}
D-scaling factor: 2Preparation for Compression
Step 1: Determine the minimum value = 99.459Step 2: Subtract the minimum value from each data element
Modified data: {5.102, 0, 1.086, 6.185}
Step 3: Scale the data by multiplying 10 ^ D-scaling factor = 100 Modified data: {510.2, 0, 108.6, 618.5}
Step 4: Round the data to integer
Modified data: {510 , 0, 109, 619}
Compression and Decompression: using ScaleOffset for integer Restoration after decompression
Step 1: Divide each value by 10^ D-scaling factor
Step 2: Add the offset 99.459
The floating point data {104.559, 99.459, 100.549, 105.649}
Outcome:Produces lossy compression. Compression ratio will depend on D-scaling factor.
• Internal HDF5 filter• Easy to understand
- Integer: lossless- Floating-point: GRIB lossy compression
• Easy to control floating compression- D-scaling factor
• Easy to estimate the outcome
How does ScaleOffset work?
What is the ScaleOffset filter?
Why use ScaleOffset filter?
• ScaleOffset Filter can gain better compression ratio for floating-point data.• ScaleOffset Filter can use more encoding time and decoding time. • Knowledge of data characteristics benefits the usage of ScaleOffset filter.
EOS Data Compression Performance ComparisonEOS Data Compression Performance ComparisonScaleOffset vs. Deflate Filter ScaleOffset vs. Deflate Filter
Limitations
• Compressed floating-point data range is limited by the size of corresponding unsigned integer type.
• Long double is not supported.
1 2 3 4 5 6 70.0
0.2
0.4
0.6
0.8
1.0
1.2
Information on DataSet:Data Size : (2030,660)Common Value: 189.67592Missing Value : 3.402824E+38
SCR DCR with Compression Level 6
Co
mp
ressio
n R
ati
o
Scale Factor
Compression Ratio v.s. Scale Factor (Scale-offset)(Data Type : H5_IEEE_F32)
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
1 2 3 4 5 6 70.0
0.2
0.4
0.6
0.8
1.0
1.2
SET DET with Compression Level 6
En
co
din
g T
ime (
s)
Data Points
Encoding Time v.s. Scale Factor (Scale-offset)(Data Type : H5_IEEE_F32)
Information on DataSet:Data Size : (2030,660)Common Value: 189.67592Missing Value : 3.402824E+38
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
0 1 2 3 4 5 6 7 8 9 10 110
1
2
3
4
5
2.43 29.46 47.35 47.35 47.35 47.35 47.35 47.35 109.94 109.94-9999 -1.27E30 0 -1.27E30 -1.27E30 -1.27E30 -1.27E30
DCR SCR RE
Co
mp
ressio
n R
ati
o &
Rela
tive E
ffic
ien
cy
Data Points Miss Val. :Data Size(100KB):
Compression Ratio Comparison & Relative Efficiency (Scale-offset v.s. Deflate)(Data Type : H5_IEEE_F32)
DCR: Deflate Compression RatioSCR: Scale-offset Compression RatioRE : Relative EfficiencyRE=(1-SCR)/(1-DCR)
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
0.0 0.5 1.0 1.5 2.0 2.5 3.0
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
DCR: Deflate Compression RatioSCR: Scale-offset Compression Ratio
DCR SCR
Co
mp
ressio
n R
ati
o
Compression Ratio v.s. Encoding Time (Scale-offset v.s. Deflate)(Data Type : H5T_IEEE_F32)
Encoding Time (s)
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
0.0 0.5 1.0 1.5 2.0 2.5 3.00.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
DDT: Deflate Decoding TimeSDT: Scale-offset Decoding Time
DDT SDT
Deco
din
g T
ime (
s)
Encooding Time (s)
Decoding Time v.s. Encoding Time (Scale-offset & Deflate)(Data Type : H5_IEEE_F32)
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
0 1 2 3 4 5 6 7 8 9 10 11 12
0.0
0.5
1.0
4.38 4.38 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0
DCR SCR
Co
mp
ressio
n R
atio
Data Points
Miss Val. : Data Size(MB):
Compression Ratio Comparison (Scale-offset v.s. Deflate)(Data Type : H5T_STD_I16)
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
0 1 2 3 4 5 6 7 8 9 10 11 12
-2
0
2
4
6
8
10
4.38 4.38 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0
DDT SDT
Deco
din
g T
ime
Data Points
Miss Val. : Data Size(MB):
Decoding Time Comparison (Scale-offset v.s. Deflate)(Data Type : H5T_STD_I16)
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
0 1 2 3 4 5 6 7 8 9 10 11 12-2
0
2
4
6
8
10
12
4.38 4.38 4.38 5.5 5.5 10.46 10.46 10.46 43.98 54.97 146.68 -32768 0 0
DET SET
En
co
din
g T
ime
Data Points
Miss Val. : Data Size(MB):
Encoding Time Comparison (Scale-offset v.s. Deflate)(Data Type : H5T_STD_I16)
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
O r i g i n D e m o O r i g i n D e m o O r i g i n D e m o
ConclusionsConclusions