High Throughput Compression of Double-Precision Floating-Point Data
Martin Burtscher and Paruj Ratanaworabhan
School of Electrical and Computer Engineering
Cornell University
Fast Floating-Point Compression
Introduction
Scientific programs
  Produce and transfer lots of 64-bit FP data
  Exchange 100s of MB/s, generate 1 TB/day of new data
Large amounts of data
  Are expensive to store and transfer
  Take a long time to transfer
Data compression
  Can reduce amount of data
  Can speed up transfer
IEEE 754 Double-Precision Values
Goal
  Compress linear streams of FP data fast and well
  Online operation and lossless compression
Challenges
  Floating-point data are hard to compress
  FP codes may generate over 90% unique values
Related work on lossless FP compression
  Focuses on 32-bit single-precision values
  Relies on smoothness of data or known geometry
Floating-Point Data Compression
Our approach
  Predict FP data with value prediction algorithms and encode the difference
Value predictors
  Hardware devices to speed up processors
  Predict instruction result by extrapolating sequences of previously computed results
  Employ very fast and simple algorithms
Format (64-bit double): S (sign, bit 63) | Exponent (bits 62 to 52) | Mantissa (bits 51 to 0)
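A minimal sketch of why "predict and encode the difference" can pay off, using the bit layout above. The helper names and the sample prediction are illustrative assumptions, not part of the slides: when a prediction is close to the true value, the sign, exponent, and upper mantissa bits agree, so the XOR of the two bit patterns starts with a run of zeros that need not be stored.

```python
import struct


def double_to_bits(x: float) -> int:
    """Reinterpret a Python float (an IEEE 754 double) as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def split_fields(bits: int):
    """Split a 64-bit pattern into (sign, exponent, mantissa)."""
    sign = bits >> 63                    # bit 63
    exponent = (bits >> 52) & 0x7FF      # bits 62..52
    mantissa = bits & ((1 << 52) - 1)    # bits 51..0
    return sign, exponent, mantissa


def leading_zero_bits(bits: int) -> int:
    """Number of leading zero bits in a 64-bit pattern."""
    return 64 - bits.bit_length()


true_value = 1.5
prediction = 1.5000001  # hypothetical predictor output, chosen for illustration
diff = double_to_bits(true_value) ^ double_to_bits(prediction)

print(split_fields(double_to_bits(true_value)))
print(leading_zero_bits(diff))  # sign and exponent agree, so at least 12 bits are zero
```

A perfect prediction gives a difference of zero, i.e. 64 leading zero bits.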
FPC Algorithm
  Make two predictions
  Select closer value
  XOR with true value
  Count leading zeros
  Encode value
  Update predictors
[Figure: FPC data flow. Each double of the uncompressed 1D stream (example: 3f82 3b1e 0e32 f39d) is compared with the 64-bit FCM and DFCM predictions (examples: 3f82 4…, 3f51 9…). A selector picks the closer prediction, which is XORed with the true value. A leading zero byte counter and an encoder emit a 1+3-bit code (predictor bit, zero byte count; example: 0, 2) followed by a remainder of 0 to 8 bytes (example: 7129 889b 0e5d). Compressed stream layout: bit_a cnt_a bit_b cnt_b remainder_a remainder_b ...]
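The six steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the table size is arbitrary, the hash functions and the handling of the 3-bit count (a count of four is stored as three, since nine possible counts do not fit in three bits) follow my recollection of the published FPC description rather than anything in these slides, and records are kept as Python tuples instead of being packed into a byte stream.

```python
import struct

MASK64 = (1 << 64) - 1
TABLE_SIZE = 1 << 10  # assumed table size (power of two); not given in the slides


class Predictors:
    """FCM and DFCM value predictors over the 64-bit integer view of doubles."""

    def __init__(self):
        self.fcm = [0] * TABLE_SIZE
        self.dfcm = [0] * TABLE_SIZE
        self.fcm_hash = 0
        self.dfcm_hash = 0
        self.last = 0

    def predict(self):
        fcm_pred = self.fcm[self.fcm_hash]
        dfcm_pred = (self.dfcm[self.dfcm_hash] + self.last) & MASK64
        return fcm_pred, dfcm_pred

    def update(self, true_bits):
        # Shift amounts (6/48 and 2/40) are assumptions, not taken from the slides.
        self.fcm[self.fcm_hash] = true_bits
        self.fcm_hash = ((self.fcm_hash << 6) ^ (true_bits >> 48)) & (TABLE_SIZE - 1)
        delta = (true_bits - self.last) & MASK64
        self.dfcm[self.dfcm_hash] = delta
        self.dfcm_hash = ((self.dfcm_hash << 2) ^ (delta >> 40)) & (TABLE_SIZE - 1)
        self.last = true_bits


def compress(values):
    """Return one (selector_bit, count_code, remainder_bytes) record per double."""
    p = Predictors()
    records = []
    for v in values:
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        fcm_pred, dfcm_pred = p.predict()              # make two predictions
        x_fcm, x_dfcm = bits ^ fcm_pred, bits ^ dfcm_pred
        # The closer prediction leaves the smaller XOR residual.
        selector, residual = (0, x_fcm) if x_fcm <= x_dfcm else (1, x_dfcm)
        lzb = (64 - residual.bit_length()) // 8        # leading zero bytes, 0..8
        if lzb == 4:
            lzb = 3                                    # assumed: count 4 stored as 3
        code = lzb if lzb < 4 else lzb - 1             # {0,1,2,3,5,6,7,8} -> 0..7
        records.append((selector, code, residual.to_bytes(8, "big")[lzb:]))
        p.update(bits)                                 # update predictors
    return records


def decompress(records):
    """Invert compress() by replaying the same predictor updates."""
    p = Predictors()
    values = []
    for selector, code, remainder in records:
        lzb = code if code < 4 else code + 1
        assert len(remainder) == 8 - lzb               # the code gives the remainder length
        fcm_pred, dfcm_pred = p.predict()
        bits = (dfcm_pred if selector else fcm_pred) ^ int.from_bytes(remainder, "big")
        values.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
        p.update(bits)
    return values


data = [1.0, 1.0, 1.0, 2.5, 2.5, 3.75, 1.0e-3, -7.25, 1.0, 1.0]
assert decompress(compress(data)) == data
```

Because the decompressor applies the same table updates as the compressor, both sides always hold identical predictor state, which is what makes the scheme lossless without transmitting the tables.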
Algorithm/Implementation Co-Design
Inner loop (about 50 and 70 C statements)
  Compresses or decompresses one block of data
  Accounts for over 90% of execution time
Loop body optimizations
  Loop body is used to hide memory latency
  No fp, int mult, or int div instructions
  No branches (only conditional moves)
  Single basic block (>100 machine instructions)
  Average IPC > 5.4 and 5.1 on Itanium 2
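To illustrate what a branch-free loop body looks like at the source level, here is a small sketch of selecting the smaller of two XOR residuals using only arithmetic and masks. The function name is hypothetical, and Python will not turn this into conditional moves; the sketch only shows the data-flow formulation that lets a compiler keep the loop a single basic block.

```python
MASK64 = (1 << 64) - 1


def select_branch_free(a: int, b: int) -> int:
    """Return the smaller of two 64-bit unsigned values without an if/else."""
    take_a = -(a <= b) & MASK64            # all ones if a <= b, otherwise zero
    return (a & take_a) | (b & ~take_a & MASK64)


print(select_branch_free(3, 7))
print(select_branch_free(7, 3))
```

Both calls return 3: the comparison result is widened into a mask that keeps one operand and zeroes the other.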
Evaluation Method
System
  1.6 GHz Itanium 2, Intel C Itanium Compiler 9.1
  Red Hat Enterprise Linux AS4
Scientific datasets
  Linear streams of 64-bit FP data (18 to 277 MB)
  4 observations: spitzer, temp, error, info
  4 simulations: comet, plasma, brain, control
  5 messages: bt, lu, sp, sppm, sweep3d
Compression Throughput
[Figure: compression throughput (Gb/s, axis 0 to 6) versus compression ratio (axis 1.0 to 2.0) for BZIP2, GZIP, PLMI, FSD, DFCM, and FPC]
Decompression Throughput
[Figure: decompression throughput (Gb/s, axis 0 to 7) versus compression ratio (axis 1.0 to 2.0) for BZIP2, DFCM, FPC, FSD, GZIP, and PLMI]
Summary and Conclusions
FPC algorithm
  Highest throughput and mean compression ratio
  1.02 to 15.05 absolute compression ratio
  840 and 680 MB/s throughput on a 1.6 GHz Itanium 2 (about 2 and 2.5 machine cycles per byte)
  http://www.csl.cornell.edu/~burtscher/research/FPC/
Conclusions
  Value predictors are fast & accurate data models
  Algorithm/implementation co-design is essential
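The cycles-per-byte figures follow from dividing the clock rate by the throughput. A small worked check, assuming decimal megabytes (1 MB = 10^6 bytes, which the slides do not specify):

```python
def cycles_per_byte(clock_hz: float, bytes_per_second: float) -> float:
    """Machine cycles spent per byte processed."""
    return clock_hz / bytes_per_second


CLOCK_HZ = 1.6e9   # 1.6 GHz Itanium 2
MB = 1e6           # assumption: decimal megabytes

for mb_per_s in (840, 680):
    cpb = cycles_per_byte(CLOCK_HZ, mb_per_s * MB)
    gbit_per_s = mb_per_s * 8 / 1000
    print(mb_per_s, "MB/s:", round(cpb, 2), "cycles/byte,", round(gbit_per_s, 2), "Gb/s")
```

Under this assumption the results are about 1.9 and 2.35 cycles per byte, which the slide rounds to 2 and 2.5. The same throughputs are roughly 6.7 and 5.4 Gb/s, the unit used on the throughput plots.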