Floating Point Compression EIT'15

A Multi-phase Approach to Floating-Point Compression

Kevin Townsend and Joseph Zambreno

Reconfigurable Computing Laboratory
Iowa State University

EIT’15

Outline

1 Introduction

2 Approach
    8-byte patterns
    Less Than 8-byte patterns
    More Than 8-byte patterns
    Combining into fzip

3 Results

4 Future Work


Introduction

What are floating point datasets?

They are arrays of floating point values.
A 64-bit floating point value has 1 sign bit, 11 exponent bits, and 52 fraction bits.
For compression, however, the dataset can be viewed as an array of 64-bit integers.
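
As an illustration of the layout described above, the following Python sketch reinterprets a double as a 64-bit integer and splits it into its three fields. The function name `split_double` is a hypothetical helper, not something taken from fzip.

```python
import struct

def split_double(x):
    """Return the (sign, exponent, fraction) fields of a 64-bit double."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # 1 bit
    exponent = (bits >> 52) & 0x7FF    # 11 bits (biased by 1023)
    fraction = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, fraction

print(split_double(1.0))   # (0, 1023, 0)
print(split_double(-2.0))  # (1, 1024, 0)
```

Viewing the value as the single integer `bits` is what lets a compressor treat the dataset as an array of 64-bit integers.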

Why compress them?

Compressed floating point datasets take up less space.
Compression can accelerate data transfer.
Knowledge of floating point datasets can lead to better compression than general compression schemes.


Approach

Analysis of 3 different patterns:

Repeating values
Common prefixes
Patterns in the value sequence

We created 3 different compression schemes:

List all values and use indices in this list.
Create a tree of all prefixes and create prefix codes.
Use the Burrows-Wheeler Transform and a simple compression scheme.

We combined all 3 algorithms into one algorithm.


Approach 8-byte patterns

Analysis

[Figure: percent of total values (0% to 100%) per data set, shaded from Many Repeats to No Repeats. Data sets: msg bt, msg lu, msg sp, msg sppm, msg sweep3d, num brain, num comet, num control, num plasma, obs error, obs info, obs spitzer, obs temp.]

In all the datasets, at least 50% of the values have a repeat.

Values are 8 bytes, so indices are much smaller than values.



Algorithm

All repeating values are stored in a separate array.

One bit indicates whether the encoded value repeats or not.

If the value does not repeat, the 64-bit value follows.

If the value does repeat, its index in the repeat array follows.
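
The scheme above can be sketched in Python as follows. This is a minimal illustration, not the fzip implementation: the function names, the token representation, the fixed-width index, and the cost of storing the repeat array as raw 64-bit values are all assumptions made for the example.

```python
from collections import Counter
from math import ceil, log2

def repeat_encode(values):
    """Return (repeats, tokens) for a list of 64-bit integers.

    A token is (1, index) for a value found in the repeat array,
    or (0, value) for a value that occurs only once.
    """
    counts = Counter(values)
    # Values occurring more than once go into the separate repeat array.
    repeats = [v for v in dict.fromkeys(values) if counts[v] > 1]
    index_of = {v: i for i, v in enumerate(repeats)}
    tokens = [(1, index_of[v]) if v in index_of else (0, v) for v in values]
    return repeats, tokens

def repeat_decode(repeats, tokens):
    """Invert repeat_encode."""
    return [repeats[p] if flag else p for flag, p in tokens]

def encoded_bits(repeats, tokens):
    """Size estimate (assumed layout): raw table plus flag and payload bits."""
    index_bits = max(1, ceil(log2(len(repeats)))) if repeats else 0
    body = sum(1 + (index_bits if flag else 64) for flag, _ in tokens)
    return 64 * len(repeats) + body

values = [10, 20, 10, 30, 20, 10]
repeats, tokens = repeat_encode(values)
print(repeats)                        # [10, 20]
print(encoded_bits(repeats, tokens))  # 203, versus 384 bits uncompressed
```

Because an index needs only enough bits to address the repeat array, each repeated value costs far fewer than 64 bits once the table itself has been paid for.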



Results

[Figure: compression ratio per data set and average, axis ticks at 1, 2, 4, 8, 16. Legend: fzip, BWT Compression, Prefix Compression, Repeat Compression.]


Approach Less Than 8-byte patterns

Analysis

[Figure: number of bits (0 to 64) matching the previous value, per data set, shaded from 100% to 0%, with the SIGN, EXPONENT, and FRACTION fields marked.]

This figure shows how much adjacent prefixes repeat in a given dataset.

As seen, the bits quickly start differing after the 12th bit.



Algorithm

Example values and their bit patterns:

  0.1:    0 0 1 0 1 1 1 0 . . .
  1.0:    0 0 1 1 1 1 0 0 . . .
  2.0:    0 1 0 0 0 0 0 0 . . .
  3.0:    0 1 0 0 0 0 1 0 . . .
  3.0:    0 1 0 0 0 0 1 0 . . .
  4.0:    0 1 0 0 0 1 0 0 . . .
  5.0:    0 1 0 0 0 1 0 1 . . .
  100.0:  0 1 0 1 0 1 1 0 . . .

[Figure: prefix tree built over these bit patterns; each value is split into an Encoded prefix and a Not Encoded remainder.]

Resulting (prefix, code) pairs:

(00,00), (010000,01), (010001,10), (0101,11)
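
To make the (prefix, code) pairs above concrete, here is a small Python sketch that swaps the matching prefix of a bit pattern for its 2-bit code and back. Bit patterns are kept as text strings for readability; the function names and the single-value interface are assumptions for illustration, and building the prefix tree itself is not shown.

```python
# (prefix, code) pairs as listed on the slide.
PREFIX_CODES = {"00": "00", "010000": "01", "010001": "10", "0101": "11"}

def prefix_encode(bits, codes=PREFIX_CODES):
    """Replace the one matching prefix of `bits` with its code."""
    for prefix, code in codes.items():
        if bits.startswith(prefix):
            # The remainder after the prefix is copied through unencoded.
            return code + bits[len(prefix):]
    raise ValueError("no prefix matches " + bits)

def prefix_decode(encoded, codes=PREFIX_CODES, code_len=2):
    """Invert prefix_encode for a single value."""
    inverse = {code: prefix for prefix, code in codes.items()}
    return inverse[encoded[:code_len]] + encoded[code_len:]

print(prefix_encode("01000010"))  # 0110   (pattern shown for 3.0)
print(prefix_encode("01010110"))  # 110110 (pattern shown for 100.0)
```

No prefix in the table is the start of another, so at most one entry can match a given bit pattern. Longer prefixes save more bits: the pattern for 3.0 shrinks from 8 to 4 bits, while a pattern starting with 00 does not shrink at all.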



Results

[Figure: compression ratio per data set and average, axis ticks at 1, 2, 4, 8, 16.]



Approach More Than 8-byte patterns

Burrows Wheeler Transform

Rotations of ABCDEABCDEABC$    Sorted rotations

ABCDEABCDEABC$                 ABCDEABCDEABC$
$ABCDEABCDEABC                 ABCDEABC$ABCDE
C$ABCDEABCDEAB                 ABC$ABCDEABCDE
BC$ABCDEABCDEA                 BCDEABCDEABC$A
ABC$ABCDEABCDE                 BCDEABC$ABCDEA
EABC$ABCDEABCD                 BC$ABCDEABCDEA
DEABC$ABCDEABC                 CDEABCDEABC$AB
CDEABC$ABCDEAB                 CDEABC$ABCDEAB
BCDEABC$ABCDEA                 C$ABCDEABCDEAB
ABCDEABC$ABCDE                 DEABCDEABC$ABC
EABCDEABC$ABCD                 DEABC$ABCDEABC
DEABCDEABC$ABC                 EABCDEABC$ABCD
CDEABCDEABC$AB                 EABC$ABCDEABCD
BCDEABCDEABC$A                 $ABCDEABCDEABC

Last column of the sorted rotations: $EEAAABBBCCDDC

New arrays: 11010010010101 and $EABCDC
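
The example above can be reproduced with a short Python sketch. It uses the naive approach of sorting all rotations, which is fine for a 14-symbol example but far too slow for real datasets. The function names are hypothetical, and reading the two "new arrays" as a run-start flag array plus a run-value array is an interpretation inferred from the example.

```python
def bwt(text, end="$"):
    """Burrows-Wheeler Transform; the end marker sorts after all symbols."""
    s = text + end
    rotations = [s[i:] + s[:i] for i in range(len(s))]
    # The key makes the end marker compare greater than every other symbol,
    # which matches the ordering used in the example.
    rotations.sort(key=lambda r: [(c == end, c) for c in r])
    return "".join(r[-1] for r in rotations)

def run_arrays(seq):
    """Return (flags, values): flag 1 marks the start of a new run."""
    flags, values = [], []
    prev = None
    for c in seq:
        if c != prev:
            flags.append("1")
            values.append(c)
        else:
            flags.append("0")
        prev = c
    return "".join(flags), "".join(values)

last_column = bwt("ABCDEABCDEABC")
print(last_column)              # $EEAAABBBCCDDC
print(run_arrays(last_column))  # ('11010010010101', '$EABCDC')
```

The transform groups equal symbols into runs, so the value array ends up shorter than the input and the flag array costs only one bit per symbol.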


Results

[Figure: compression ratio per data set and average, axis ticks at 1, 2, 4, 8, 16.]



Approach Combining into fzip

Algorithm

fzip starts with the BWT compression, which creates a new dataset.

Repeats are added to the prefix codes to combine repeat and prefix compression.


Results

[Figure: compression ratio per data set and average, axis ticks at 1, 2, 4, 8, 16.]



Results

Floating Point Compression Performance

[Figure: compression ratio per data set and average, axis ticks at 1, 2, 4, 8, 16. Legend: fzip, bzip -9, FPC 25, gzip -9.]


Future Work

Two directions for future work: towards traditional dictionary approaches and towards BWT.

The BWT road:

Replace prefix and repeat compression with a "Move-to-Front" algorithm.
Has the potential for high compression ratios.
BWT makes this slower and less hardware amenable.
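
Move-to-Front is named here only as a future direction, so the following is a generic textbook sketch in Python rather than anything fzip implements. The function names and the explicit `alphabet` argument are assumptions for illustration.

```python
def move_to_front(seq, alphabet):
    """Encode each symbol as its current position in a recency list."""
    table = list(alphabet)
    out = []
    for c in seq:
        i = table.index(c)
        out.append(i)
        # Move the symbol just seen to the front of the list.
        table.insert(0, table.pop(i))
    return out

def move_to_front_inverse(indices, alphabet):
    """Invert move_to_front."""
    table = list(alphabet)
    out = []
    for i in indices:
        c = table.pop(i)
        out.append(c)
        table.insert(0, c)
    return "".join(out)

codes = move_to_front("$EEAAABBBCCDDC", "$ABCDE")
print(codes)  # [0, 5, 0, 2, 0, 0, 3, 0, 0, 4, 0, 5, 0, 1]
```

Applied to BWT output, runs of equal symbols turn into runs of zeros, which a later entropy coder can represent compactly.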

The traditional road:

Replace BWT with a dictionary approach (LZW).
This is more hardware amenable.

