gFPC: A Self-Tuning Compression Algorithm
Martin Burtscher (The University of Texas at Austin) and Paruj Ratanaworabhan (Kasetsart University)
Introduction
Many compression algorithms are parameterizable
Some parameters allow straightforward trade-offs
  E.g., compression ratio vs. speed
  Controlled via command line
Other parameters provide no obvious trade-off
  Best value is input dependent and changes dynamically
  E.g., hash function in a predictor
  Typically hardcoded
Contribution
Self-tuning approach to optimize parameters
  Automatic, on-line, and genetic-algorithm-based
  Slower compression but higher compression ratio
gFPC algorithm for IEEE 754 double-precision data
  Compresses linear streams of FP values
  Lossless single-pass algorithm
  Repeatedly self-tunes 4 hash-table parameters
FPC Algorithm [DCC’07]
Make two predictions
Select closer value
XOR with true value
Count leading zero bytes
Encode value
Update predictors
[Figure: FPC data flow. Each 64-bit double from the uncompressed 1D stream is predicted by the FCM and DFCM predictors; the selector picks the closer prediction, which is XORed with the true value; a leading-zero-byte counter and encoder emit a 1+3-bit header (selector bit and count) plus 0 to 8 remainder bytes per value into the compressed stream.]
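The per-value steps of FPC can be sketched in C as follows. This is a minimal illustration, not the authors' code: the names `leading_zero_bytes`, `encoded_t`, and `encode_value` are hypothetical, the predictions are passed in as plain arguments, and packing the count into the 3-bit header field and writing the remainder bytes are left out.

```c
#include <stdint.h>

/* Count the leading zero bytes (0..8) of a 64-bit value. */
static int leading_zero_bytes(uint64_t x)
{
    int n = 0;
    while (n < 8 && (x >> 56) == 0) {
        x <<= 8;   /* drop the top byte and look at the next one */
        n++;
    }
    return n;
}

/* Hypothetical record for one encoded value (not the on-disk format). */
typedef struct {
    int use_dfcm;       /* selector: 0 = FCM prediction, 1 = DFCM prediction */
    int zero_bytes;     /* leading zero bytes of the residual */
    uint64_t residual;  /* true value XOR chosen prediction */
} encoded_t;

/* Select the closer prediction, XOR it with the true value, and count
 * the leading zero bytes of the result. The doubles are handled as
 * their 64-bit integer bit patterns. */
static encoded_t encode_value(uint64_t true_bits,
                              uint64_t fcm_pred, uint64_t dfcm_pred)
{
    uint64_t xa = true_bits ^ fcm_pred;
    uint64_t xb = true_bits ^ dfcm_pred;
    encoded_t e;

    /* The numerically smaller XOR result has at least as many leading
     * zero bytes, so it identifies the closer prediction. */
    e.use_dfcm = xb < xa;
    e.residual = e.use_dfcm ? xb : xa;
    e.zero_bytes = leading_zero_bytes(e.residual);
    return e;
}
```

Only the `8 - zero_bytes` low-order bytes of `residual` would need to be stored, which is where the compression comes from when predictions are good.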
Hash Function Parameters
Two predictors
  FCM predicts values, DFCM predicts differences

  fcm_prediction = fcm[fcm_hash];  // prediction: read hash table entry
  fcm[fcm_hash] = true_value;      // update: write hash table entry
  fcm_hash = ((fcm_hash << lshift) ^ (true_value >> rshift)) & (table_size - 1);

Two parameters each
  lshift for aging
  rshift for eliminating random bits
  802,816 possibilities with 256 kB table_size
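A self-contained C sketch of the two predictors with tunable shifts is given here. The FCM part follows the three statements on the slide; the DFCM part is an assumption that mirrors the same update on the difference between consecutive values. All type and function names (`predictor_t`, `fcm_predict`, `dfcm_update`, and so on) are hypothetical. As an aside, 802,816 = (14 × 64)², which would be consistent with 14 lshift and 64 rshift choices per predictor, though the slide does not state the ranges.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t *table;   /* hash table of previously seen values (or deltas) */
    uint64_t mask;     /* table_size - 1; table_size must be a power of two */
    uint64_t hash;     /* hash of the recent history */
    unsigned lshift;   /* aging: shifts older history out of the hash (< 64) */
    unsigned rshift;   /* drops low-order, mostly random bits (< 64) */
} predictor_t;

/* Allocate a zeroed table; returns 0 on success, -1 on failure. */
static int predictor_init(predictor_t *p, uint64_t table_size,
                          unsigned lshift, unsigned rshift)
{
    p->table = calloc(table_size, sizeof *p->table);
    if (p->table == NULL)
        return -1;
    p->mask = table_size - 1;
    p->hash = 0;
    p->lshift = lshift;
    p->rshift = rshift;
    return 0;
}

/* FCM: predict the value that followed this history last time. */
static uint64_t fcm_predict(const predictor_t *p)
{
    return p->table[p->hash];
}

static void fcm_update(predictor_t *p, uint64_t true_value)
{
    p->table[p->hash] = true_value;
    p->hash = ((p->hash << p->lshift) ^ (true_value >> p->rshift)) & p->mask;
}

/* DFCM (assumed form): same scheme applied to differences. */
typedef struct {
    predictor_t p;
    uint64_t last;     /* previous true value */
} dfcm_t;

static uint64_t dfcm_predict(const dfcm_t *d)
{
    return d->p.table[d->p.hash] + d->last;
}

static void dfcm_update(dfcm_t *d, uint64_t true_value)
{
    uint64_t delta = true_value - d->last;  /* wraps modulo 2^64 */
    d->p.table[d->p.hash] = delta;
    d->p.hash = ((d->p.hash << d->p.lshift) ^ (delta >> d->p.rshift)) & d->p.mask;
    d->last = true_value;
}
```

Keeping `lshift` and `rshift` as fields rather than constants is what lets a tuner change them between blocks without touching the prediction code.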
Genetic Self-Tuning
Compress blocks with several sets of parameters
  Start with FPC and otherwise random sets
Create new sets for next data block
  Keep best set of parameters
  Evolve remaining sets
[Figure: each data block (of the chosen block size) is compressed with every parameter set in the population (set1, set2, ..., setn) and the shortest compressed block is output; a child set is produced from parent 1 and parent 2 by crossover and mutation.]
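One generation of such a tuner might look like the C sketch below. It only illustrates "keep the best set, evolve the rest"; the actual selection, crossover, and mutation rules of gFPC are not given on the slides, so the ones used here are assumptions. The names `param_set_t`, `gene_limit`, `best_index`, and `evolve` are hypothetical, as are the per-gene ranges.

```c
#include <stdlib.h>

#define GENES 4  /* FCM lshift, FCM rshift, DFCM lshift, DFCM rshift */

typedef struct {
    unsigned gene[GENES];
} param_set_t;

/* Assumed exclusive upper bound for each gene (illustrative values). */
static const unsigned gene_limit[GENES] = {16, 64, 16, 64};

/* Index of the parameter set that produced the shortest compressed block. */
static size_t best_index(const size_t *sizes, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (sizes[i] < sizes[best])
            best = i;
    return best;
}

/* Build the population for the next block. sizes[i] is the compressed
 * size obtained with pop[i] on the current block. */
static void evolve(param_set_t *pop, const size_t *sizes, size_t n)
{
    if (n == 0)
        return;

    /* Keep the best set unchanged in slot 0. */
    size_t b = best_index(sizes, n);
    param_set_t tmp = pop[0];
    pop[0] = pop[b];
    pop[b] = tmp;

    for (size_t i = 1; i < n; i++) {
        /* Crossover: each gene comes from the best set or stays as is. */
        for (int g = 0; g < GENES; g++)
            if (rand() % 2)
                pop[i].gene[g] = pop[0].gene[g];

        /* Mutation: replace one randomly chosen gene by a random value. */
        int m = rand() % GENES;
        pop[i].gene[m] = (unsigned)rand() % gene_limit[m];
    }
}
```

In use, the compressor would run every set in `pop` on the current block, record the compressed sizes, output the shortest result, and call `evolve` before moving to the next block.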
Related Work
Genetic algorithms (GAs) for evolving programs
  Program output approximates original data
GAs for evolving compressor parameters off-line
  Rate distortion
  Vector quantization
  Fractal codes
  Dictionary n-grams
  Best compressor for each block
We use on-line GA: faster, adapts dynamically
Evaluation Method
System
  Sun Fire X2270 Server, Ubuntu Linux 8.06
  2.93 GHz 64-bit Intel Xeon 5570 (Nehalem) processor
Datasets
  Linear streams of real-world data (18 – 277 MB)
  4 observations: error, info, spitzer, temp
  4 simulations: brain, comet, control, plasma
  5 MPI messages: bt, lu, sp, sppm, sweep3d
Population Size
Affects
  Compression speed
  Compression ratio
Result
  Population size of 4 performs within 0.5% of maximum
  (Population size = 1 → FPC)
[Figure: harmonic-mean compression ratio (y-axis, 1.20 to 1.55) versus population size (x-axis, 1 to 19) for predictor sizes of 8 kB, 256 kB, and 8 MB.]
Block Size
Affects
  Reconfiguration frequency
  Compression ratio
Result
  512 kB blocks good
  Medium sizes best
  Warm-up versus adaptivity tradeoff
[Figure: harmonic-mean compression ratio (y-axis, 1.20 to 1.55) versus block size (x-axis, 4 kB to 256 MB) for predictor sizes of 8 kB, 256 kB, and 8 MB.]
Compression Ratio Comparison
FPCsize and FPCall
  Use off-line GA and LS to find best parameters for each size (and input)
Results
  FPC is 5% worse
  FPCsize has no input adaptivity
  FPCall is (mostly) better
    gFPC is retroactive (but can adapt on-the-fly)
    gFPC is 317 times faster
[Figure: harmonic-mean compression ratio (y-axis, 1.20 to 1.55) versus predictor size (x-axis, 8 kB to 8 MB) for FPC, gFPC, FPCsize, and FPCall.]
Self-Tuning Benefit
Rarely worse, mostly better (up to 72%)
Relative to FPC, which was tuned for these inputs
  Benefit is likely higher on other inputs
[Figure: compression ratio improvement (y-axis, 0.90 to 1.20) for each dataset (bt, lu, sp, sppm, sweep3d, brain, comet, control, plasma, error, info, spitzer, temp) at predictor sizes of 8 kB, 32 kB, 128 kB, 512 kB, 2 MB, and 8 MB.]
Throughput on Xeon System
Compression is slower with larger population size
Small compression overhead due to self-tuning
Decompression is faster due to better compression
[Figure: two panels, compression and decompression, showing harmonic-mean throughput (Gb/s, y-axis, 0 to 8) versus predictor size (x-axis, 8 kB to 8 MB) for FPC, gFPC4, and gFPC1.]
Summary
Self-tuning approach
  Based on on-line genetic algorithm
  Repeatedly tunes 4 hash-table parameters in gFPC
  Applicable to other compressors
Results
  Higher compression ratio, lower compression speed
  gFPC compresses at 1 Gb/s, decompresses at 7 Gb/s
C source code of gFPC is freely available
  http://users.ices.utexas.edu/~burtscher/research/gFPC/