A Multi-phase Approach to Floating-Point Compression
Kevin Townsend and Joseph Zambreno
Reconfigurable Computing Laboratory
Iowa State University
EIT’15
Townsend and Zambreno (RCL@ISU) float zip EIT’15 1 / 16
Outline
1 Introduction
2 Approach
8-byte patterns
Less Than 8-byte patterns
More Than 8-byte patterns
Combining into fzip
3 Results
4 Future Work
Introduction
What are floating point datasets?
They are arrays of floating point values.
A 64-bit floating point value has a sign bit, 11 exponent bits, and 52 fraction bits.
However, the problem can be viewed as compressing an array of 64-bit integers.
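The 64-bit layout described above can be illustrated with a short sketch. It is an illustration only, not part of fzip: the helper names `float_to_bits` and `split_fields` are hypothetical, and the code assumes IEEE 754 double precision as provided by Python's `struct` module.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer (same bits)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def split_fields(x: float) -> tuple:
    """Return (sign, exponent, fraction) as raw bit fields of a double."""
    bits = float_to_bits(x)
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits
    fraction = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, fraction


if __name__ == "__main__":
    print(hex(float_to_bits(1.0)))
    print(split_fields(-2.0))
```

Working on the integer view means a compressor only has to compare and store bit patterns, without any floating point arithmetic.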
Why compress them?
Compressed floating point datasets take up less space.
Compression can accelerate data transfer.
Knowledge of floating point datasets can lead to better compression than general compression schemes.
Approach
Analysis of 3 different patterns:
Repeating values
Common prefixes
Patterns in the value sequence
We created 3 different compression schemes:
List all values and use indices in this list.
Create a tree of all prefixes and create prefix codes.
Use the Burrows-Wheeler Transform and a simple compression scheme.
We combined all 3 algorithms into one algorithm.
Approach 8-byte patterns
Analysis
[Figure: for each dataset (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp), the percent of total values (0% to 100%), shaded from "Many Repeats" to "No Repeats".]
In all the datasets at least 50% of the values have a repeat.
Values are 8 bytes, so indices are much smaller than values.
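A measurement of this kind could be sketched as follows. This is an assumption about how the statistic is defined, not the authors' analysis code: a value is counted as "having a repeat" when its 64-bit pattern occurs more than once in the dataset, and `repeat_fraction` is a hypothetical helper name.

```python
from collections import Counter


def repeat_fraction(values):
    """Fraction of entries whose value occurs more than once in the list.

    `values` is assumed to hold 64-bit patterns as Python ints, so that
    equality means bit-for-bit equality.
    """
    if not values:
        return 0.0
    counts = Counter(values)
    repeated = sum(1 for v in values if counts[v] > 1)
    return repeated / len(values)


if __name__ == "__main__":
    sample = [1, 2, 2, 3, 3, 3]
    print(repeat_fraction(sample))
```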
Algorithm
All repeats stored in a separate array.
One bit indicates if the value that is encoded repeats or not.
If the value does not repeat, the 64-bit value follows.
If the value does repeat, its index in the repeat array follows.
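The four rules above can be sketched as a small encoder and decoder. This is a minimal illustration under stated assumptions, not the fzip implementation: the bitstream is modelled as a string of '0' and '1' characters, the flag is '1' for a repeat, the index width is fixed at the number of bits needed for the largest index, and the function names are hypothetical.

```python
from collections import Counter


def index_width(repeats):
    """Bits per index (assumed fixed width, at least 1)."""
    return max(1, (len(repeats) - 1).bit_length())


def repeat_encode(values):
    """Return (repeat array, bitstream) for a list of 64-bit ints."""
    counts = Counter(values)
    repeats = sorted(v for v, c in counts.items() if c > 1)
    index_of = {v: i for i, v in enumerate(repeats)}
    width = index_width(repeats)
    parts = []
    for v in values:
        if v in index_of:
            # Flag bit 1, then the index into the repeat array.
            parts.append("1" + format(index_of[v], "0{}b".format(width)))
        else:
            # Flag bit 0, then the full 64-bit value.
            parts.append("0" + format(v, "064b"))
    return repeats, "".join(parts)


def repeat_decode(repeats, bitstream, count):
    """Invert repeat_encode for `count` values."""
    width = index_width(repeats)
    out, pos = [], 0
    for _ in range(count):
        is_repeat = bitstream[pos] == "1"
        pos += 1
        n = width if is_repeat else 64
        field = int(bitstream[pos:pos + n], 2)
        pos += n
        out.append(repeats[field] if is_repeat else field)
    return out


if __name__ == "__main__":
    data = [7, 9, 7, 2**63, 9, 5]
    table, stream = repeat_encode(data)
    print(table, len(stream))
    print(repeat_decode(table, stream, len(data)) == data)
```

In this sketch a repeated value costs 1 + width bits instead of 64, while a unique value costs 65 bits, so the scheme only pays off when repeats are common.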
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) for each dataset and the average, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]
Approach Less Than 8-byte patterns
Analysis
[Figure: for each dataset, the share of values (shaded from 100% to 0%) plotted against the number of bits matching the previous value (0 to 64), with the sign, exponent, and fraction fields marked along the bit axis.]
This figure shows the amount that adjacent prefixes repeat in a given dataset.
As seen, the bits quickly start differing after the 12th bit.
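The quantity on the bit axis, the number of leading bits a value shares with the previous value, can be computed with an XOR. The sketch below is an illustration with hypothetical helper names, assuming IEEE 754 doubles; it is not the authors' analysis code.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def matching_prefix_bits(a: int, b: int) -> int:
    """Number of leading bits (out of 64) that a and b have in common."""
    diff = a ^ b  # differing bits are 1; the highest one ends the prefix
    return 64 - diff.bit_length()


def adjacent_matches(values):
    """Shared-prefix length between each value and its predecessor."""
    bits = [float_to_bits(v) for v in values]
    return [matching_prefix_bits(p, c) for p, c in zip(bits, bits[1:])]


if __name__ == "__main__":
    print(adjacent_matches([2.0, 3.0, 3.0]))
```

For example, 2.0 and 3.0 have the same sign and exponent bits and first differ in the top fraction bit, so they share exactly 12 leading bits.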
Algorithm
Value    Leading bits
0.1      0 0 1 0 1 1 1 0 ...
1.0      0 0 1 1 1 1 0 0 ...
2.0      0 1 0 0 0 0 0 0 ...
3.0      0 1 0 0 0 0 1 0 ...
3.0      0 1 0 0 0 0 1 0 ...
4.0      0 1 0 0 0 1 0 0 ...
5.0      0 1 0 0 0 1 0 1 ...
100.0    0 1 0 1 0 1 1 0 ...
[Figure: prefix tree built from these values, with nodes marked Encoded or Not Encoded.]
Resulting (prefix, code) pairs: (00,00), (010000,01), (010001,10), (0101,11)
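The (prefix, code) pairs on this slide can be applied as a simple substitution: the matched prefix is replaced by its code and the remaining bits are copied unchanged. The sketch below is a toy illustration only. It assumes 8-bit values written as bit strings, fixed-length 2-bit codes as in the slide's example, and hypothetical function names; how fzip selects the prefixes from the tree is not shown.

```python
# (prefix, code) pairs taken from the slide's example.
PREFIX_CODES = {"00": "00", "010000": "01", "010001": "10", "0101": "11"}


def prefix_encode(value_bits, codes=PREFIX_CODES):
    """Replace the matching prefix with its code; keep the remaining bits."""
    for prefix, code in codes.items():
        if value_bits.startswith(prefix):
            return code + value_bits[len(prefix):]
    raise ValueError("no prefix in the table matches " + value_bits)


def prefix_decode_stream(stream, total_bits, codes=PREFIX_CODES):
    """Decode a concatenation of encoded values of `total_bits` bits each."""
    by_code = {code: prefix for prefix, code in codes.items()}
    code_len = len(next(iter(by_code)))  # assumes all codes share one length
    out, pos = [], 0
    while pos < len(stream):
        prefix = by_code[stream[pos:pos + code_len]]
        pos += code_len
        rest_len = total_bits - len(prefix)
        out.append(prefix + stream[pos:pos + rest_len])
        pos += rest_len
    return out


if __name__ == "__main__":
    values = ["00101110", "00111100", "01000000", "01000010",
              "01000010", "01000100", "01000101", "01010110"]
    stream = "".join(prefix_encode(v) for v in values)
    print(len(stream), "bits instead of", 8 * len(values))
    print(prefix_decode_stream(stream, 8) == values)
```

Longer prefixes save more bits per value: the 6-bit prefixes shrink each matching value by 4 bits, while the 2-bit prefix 00 saves nothing.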
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) for each dataset and the average, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]
Approach More Than 8-byte patterns
Burrows Wheeler Transform
Input: ABCDEABCDEABC$
[Figure: all rotations of the input, followed by the same rotations in sorted order.]
Transformed output: $EEAAABBBCCDDC
New arrays: 11010010010101 and $EABCDC
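The example on this slide can be reproduced with a naive rotation-sorting transform followed by a split into two arrays: one flag bit per symbol marking the start of a new run, and one array holding the symbol of each run. This is a sketch with hypothetical function names, not the fzip code. It works on characters rather than 8-byte values, and it assumes the end marker `$` sorts after every other symbol, which is the ordering that matches the slide's output.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorting all rotations (quadratic, toy only)."""
    def key(rotation):
        # Assumption: the sentinel compares greater than any other symbol.
        return [(1, "") if ch == sentinel else (0, ch) for ch in rotation]

    rotations = sorted((text[i:] + text[:i] for i in range(len(text))), key=key)
    return "".join(r[-1] for r in rotations)  # last column


def run_arrays(text):
    """Split text into (flags, symbols): flag 1 starts a new run of a symbol."""
    flags, symbols = [], []
    prev = None
    for ch in text:
        if ch != prev:
            flags.append("1")
            symbols.append(ch)
        else:
            flags.append("0")
        prev = ch
    return "".join(flags), "".join(symbols)


if __name__ == "__main__":
    transformed = bwt("ABCDEABCDEABC$")
    print(transformed)
    print(run_arrays(transformed))
```

The transform groups equal symbols together, so the symbol array shrinks from 14 entries to 7 at the cost of one flag bit per input symbol.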
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) for each dataset and the average, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]
Approach Combining into fzip
Algorithm
fzip starts with the BWT compression which creates a new dataset.
Repeats are added to the prefix codes to combine repeat and prefix compression.
Results
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) for each dataset and the average, comparing fzip, BWT Compression, Prefix Compression, and Repeat Compression.]
Results
Floating Point Compression Performance
[Figure: compression ratio (axis ticks 1, 2, 4, 8, 16) for each dataset and the average, comparing fzip, bzip -9, FPC 25, and gzip -9.]
Future Work
Two directions for future work: toward traditional dictionary approaches and toward BWT.
The BWT road:
Replace prefix and repeat compression with a "Move-to-Front" algorithm.
Has the potential for high compression ratios.
BWT makes this slower and less hardware amenable.
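For reference, the textbook Move-to-Front transform can be sketched as below. The slides only name the technique as future work, so this is the standard character-level version with hypothetical function names, not a design proposed by the authors. It is applied here to the BWT output from the earlier example.

```python
def mtf_encode(symbols, alphabet):
    """Replace each symbol by its position in a list that moves hits to the front."""
    table = list(alphabet)
    out = []
    for s in symbols:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))  # move the symbol to the front
    return out


def mtf_decode(indices, alphabet):
    """Invert mtf_encode using the same starting alphabet order."""
    table = list(alphabet)
    out = []
    for i in indices:
        s = table.pop(i)
        out.append(s)
        table.insert(0, s)
    return "".join(out)


if __name__ == "__main__":
    codes = mtf_encode("$EEAAABBBCCDDC", "$ABCDE")
    print(codes)
    print(mtf_decode(codes, "$ABCDE"))
```

Runs of equal symbols turn into runs of zeros, which a later entropy coder can store compactly.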
The traditional road:
Replace BWT with a dictionary approach (LZW).
This is more hardware amenable.