A Multi-phase Approach to Floating-Point Compression

Kevin Townsend and Joseph Zambreno

Reconfigurable Computing Laboratory
Iowa State University

EIT’15

Townsend and Zambreno (RCL@ISU) float zip EIT’15 1 / 16

Outline

1 Introduction

2 Approach
    8-byte patterns
    Less Than 8-byte patterns
    More Than 8-byte patterns
    Combining into fzip

3 Results

4 Future Work

Introduction

What are floating point datasets?

They are arrays of floating point values.

A 64-bit floating point value has a sign bit, 11 exponent bits, and 52 fraction bits.

However, compressing them can be viewed as compressing an array of 64-bit integers.
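The bit layout above can be illustrated with a minimal sketch. It assumes Python floats are IEEE 754 doubles; the helper names `float_to_bits` and `split_fields` are hypothetical and not from the slides.

```python
import struct

def float_to_bits(x):
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def split_fields(x):
    """Return (sign, exponent, fraction) of a 64-bit float."""
    bits = float_to_bits(x)
    sign = bits >> 63                  # 1 sign bit
    exponent = (bits >> 52) & 0x7FF    # 11 exponent bits (biased)
    fraction = bits & ((1 << 52) - 1)  # 52 fraction bits
    return sign, exponent, fraction

print(hex(float_to_bits(1.0)))  # the same 8 bytes, viewed as an integer
print(split_fields(-2.0))
```

Viewing each value as an integer is what lets the later schemes work on raw bit patterns rather than on numeric values.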

Why compress them?

Compressed floating point datasets take up less space.

Compression can accelerate data transfer.

Knowledge of floating point datasets can lead to better compression than general compression schemes.

Approach

Analysis of 3 different patterns:

    Repeating values
    Common prefixes
    Patterns in the value sequence

We created 3 different compression schemes:

    List all values and use indices in this list.
    Create a tree of all prefixes and create prefix codes.
    Use the Burrows-Wheeler Transform and a simple compression scheme.

We combined all 3 algorithms into one algorithm.

Approach 8-byte patterns

Analysis

[Figure: horizontal bar chart. Vertical axis: data sets (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp). Horizontal axis: percent of total values, 0% to 100%. Legend: Many Repeats to No Repeats.]

In all the datasets, at least 50% of the values have a repeat.

Values are 8 bytes, so indices are much smaller than values.

Algorithm

All repeated values are stored in a separate array.

One bit indicates whether the encoded value repeats.

If the value does not repeat, the 64-bit value follows.

If the value does repeat, its index in the repeat array follows.
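The four steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fixed-width index, the text bit string used as the output stream, and the names `repeat_encode` and `repeat_decode` are all assumptions made for the example.

```python
import struct
from collections import Counter

def to_bits(x):
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(b):
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def repeat_encode(values):
    words = [to_bits(v) for v in values]
    counts = Counter(words)
    # Repeat array: every distinct 64-bit word that occurs more than once.
    repeats = sorted(w for w, n in counts.items() if n > 1)
    index_of = {w: i for i, w in enumerate(repeats)}
    # Assumed layout: every index uses the same number of bits.
    width = max(1, (len(repeats) - 1).bit_length())
    out = []
    for w in words:
        if w in index_of:
            out.append("1" + format(index_of[w], f"0{width}b"))  # flag + index
        else:
            out.append("0" + format(w, "064b"))                  # flag + raw value
    return repeats, width, "".join(out)

def repeat_decode(repeats, width, stream):
    values, pos = [], 0
    while pos < len(stream):
        flag = stream[pos]
        pos += 1
        if flag == "1":
            values.append(from_bits(repeats[int(stream[pos:pos + width], 2)]))
            pos += width
        else:
            values.append(from_bits(int(stream[pos:pos + 64], 2)))
            pos += 64
    return values

data = [1.5, 2.0, 1.5, 3.25, 2.0, 1.5]
repeats, width, stream = repeat_encode(data)
print(len(stream), "bits instead of", 64 * len(data))
```

The repeat array itself must also be stored, so the scheme only pays off when repeated values occur often enough to amortize it.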

Results

[Figure: bar chart of compression ratio (axis ticks at 1, 2, 4, 8, 16) for each data set (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp) and the average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]

Approach Less Than 8-byte patterns

Analysis

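[Figure: for each data set (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp), the number of bits matching the previous value, on a 0 to 64 bit axis with the SIGN, EXPONENT, and FRACTION fields marked. Shading runs from 100% to 0%.]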

This figure shows how much adjacent prefixes repeat in a given dataset.

As seen, the bits quickly start differing after the 12th bit, which is where the sign and exponent bits end and the fraction begins.
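The quantity plotted in the figure, the number of leading bits a value shares with its predecessor, can be computed with a short sketch. The function names are hypothetical, and Python floats are assumed to be IEEE 754 doubles.

```python
import struct

def to_bits(x):
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def matching_prefix_bits(a, b):
    """Number of leading bits (0 to 64) on which two 64-bit floats agree."""
    diff = to_bits(a) ^ to_bits(b)
    # The highest set bit of the XOR is the first position where they differ.
    return 64 - diff.bit_length()

def prefix_match_profile(values):
    """Match length of each value against the value before it."""
    return [matching_prefix_bits(p, c) for p, c in zip(values, values[1:])]

print(prefix_match_profile([1.0, 1.5, 1.75, 2.0, -2.0]))
```

Values of similar magnitude share their sign and exponent, which is why matches of about 12 bits are common while longer matches are rarer.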

Algorithm

Example values and their leading bits (columns labeled Encoded and Not Encoded on the slide):

    0.1:    0 0 1 0 1 1 1 0 . . .
    1.0:    0 0 1 1 1 1 0 0 . . .
    2.0:    0 1 0 0 0 0 0 0 . . .
    3.0:    0 1 0 0 0 0 1 0 . . .
    3.0:    0 1 0 0 0 0 1 0 . . .
    4.0:    0 1 0 0 0 1 0 0 . . .
    5.0:    0 1 0 0 0 1 0 1 . . .
    100.0:  0 1 0 1 0 1 1 0 . . .

[Figure: prefix tree over these bit strings, with leaves labeled 0.1, 1.0, 2.0, 3.0, 4.0, 5.0, 100.0 and the tree divided into an Encoded region and a Not Encoded region. Node counts per level, from the root down:

    8
    8
    2 6
    2 6
    1 1 5 1
    1 1 5 1
    1 1 3 2 1
    1 1 1 2 2 1
    1 1 1 2 1 1 1
    1 1 1 2 1 1 1
]

Prefix codes: (00,00), (010000,01), (010001,10), (0101,11)
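To make the example concrete, the sketch below applies the four pairs listed on the slide, read here as (prefix, code), to the eight example bit strings. It only applies a given code table; how the tree is cut to choose the prefixes is not specified on the slide and is not attempted. The function names and the 8-bit row length are assumptions for illustration.

```python
# (prefix, code) pairs copied from the slide.
CODES = {"00": "00", "010000": "01", "010001": "10", "0101": "11"}
CODE_LEN = 2  # every code in this example is 2 bits long

def encode(bits):
    """Replace the matching prefix by its code; the remaining bits stay raw."""
    for prefix, code in CODES.items():
        if bits.startswith(prefix):
            return code + bits[len(prefix):]
    raise ValueError(f"no prefix code covers {bits}")

def decode(stream, total_bits=8):
    """Invert encode() for a stream of fixed-length rows."""
    inverse = {code: prefix for prefix, code in CODES.items()}
    rows, pos = [], 0
    while pos < len(stream):
        prefix = inverse[stream[pos:pos + CODE_LEN]]
        pos += CODE_LEN
        rest = total_bits - len(prefix)
        rows.append(prefix + stream[pos:pos + rest])
        pos += rest
    return rows

rows = ["00101110", "00111100", "01000000", "01000010",
        "01000010", "01000100", "01000101", "01010110"]
stream = "".join(encode(r) for r in rows)
print(len(stream), "bits instead of", 8 * len(rows))
```

Because no prefix in the table is a prefix of another, at most one entry matches each row, and the decoder can recover the row length from the code alone.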

Results

[Figure: bar chart of compression ratio (axis ticks at 1, 2, 4, 8, 16) for each data set (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp) and the average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]

Approach More Than 8-byte patterns

Burrows Wheeler Transform

All rotations of ABCDEABCDEABC$:

    ABCDEABCDEABC$
    $ABCDEABCDEABC
    C$ABCDEABCDEAB
    BC$ABCDEABCDEA
    ABC$ABCDEABCDE
    EABC$ABCDEABCD
    DEABC$ABCDEABC
    CDEABC$ABCDEAB
    BCDEABC$ABCDEA
    ABCDEABC$ABCDE
    EABCDEABC$ABCD
    DEABCDEABC$ABC
    CDEABCDEABC$AB
    BCDEABCDEABC$A

Sorted rotations:

    ABCDEABCDEABC$
    ABCDEABC$ABCDE
    ABC$ABCDEABCDE
    BCDEABCDEABC$A
    BCDEABC$ABCDEA
    BC$ABCDEABCDEA
    CDEABCDEABC$AB
    CDEABC$ABCDEAB
    C$ABCDEABCDEAB
    DEABCDEABC$ABC
    DEABC$ABCDEABC
    EABCDEABC$ABCD
    EABC$ABCDEABCD
    $ABCDEABCDEABC

Last column: $EEAAABBBCCDDC

New arrays:

    11010010010101
    $EABCDC
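The slide's example can be reproduced with a small sketch. It assumes, as the sorted rotations on the slide suggest, that `$` sorts after every letter, and it reads the two new arrays as run-start flags plus one symbol per run. The function names are hypothetical, and a real implementation would operate on 64-bit values rather than characters and would avoid materializing every rotation.

```python
def bwt(text):
    """Last column of the sorted rotations; '$' is ordered after all letters."""
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    rotations.sort(key=lambda r: [0x110000 if c == "$" else ord(c) for c in r])
    return "".join(r[-1] for r in rotations)

def run_arrays(s):
    """Split a string into run-start flags and one symbol per run."""
    flags, symbols = [], []
    for i, c in enumerate(s):
        starts_run = i == 0 or c != s[i - 1]
        flags.append("1" if starts_run else "0")
        if starts_run:
            symbols.append(c)
    return "".join(flags), "".join(symbols)

def expand_runs(flags, symbols):
    """Rebuild the string from the two arrays."""
    out, it, current = [], iter(symbols), ""
    for f in flags:
        if f == "1":
            current = next(it)
        out.append(current)
    return "".join(out)

last = bwt("ABCDEABCDEABC$")
flags, symbols = run_arrays(last)
print(last, flags, symbols)
```

The transform groups identical symbols into runs, so the symbol array shrinks from 14 entries to 7 at the cost of one flag bit per position.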

Results

[Figure: bar chart of compression ratio (axis ticks at 1, 2, 4, 8, 16) for each data set (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp) and the average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]

Approach Combining into fzip

Algorithm

fzip starts with the BWT compression, which creates a new dataset.

Repeats are added to the prefix codes to combine repeat and prefix compression.

Results

[Figure: bar chart of compression ratio (axis ticks at 1, 2, 4, 8, 16) for each data set (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp) and the average. Series: fzip, BWT Compression, Prefix Compression, Repeat Compression.]

Results

Floating Point Compression Performance

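[Figure: bar chart of compression ratio (axis ticks at 1, 2, 4, 8, 16) for each data set (msg_bt, msg_lu, msg_sp, msg_sppm, msg_sweep3d, num_brain, num_comet, num_control, num_plasma, obs_error, obs_info, obs_spitzer, obs_temp) and the average. Series: fzip, bzip -9, FPC 25, gzip -9.]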

Future Work

Two directions for future work: towards traditional dictionary approaches and towards BWT.

The BWT road:

Replace prefix and repeat compression with a "Move-to-Front" algorithm.

Has the potential for high compression ratios.

BWT makes this slower and less hardware amenable.
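For reference, the standard Move-to-Front transform can be sketched as below, applied to the BWT output from the earlier character example. This is the textbook algorithm, not a design from the slides; the function names and the initial alphabet order are assumptions.

```python
def mtf_encode(data, alphabet):
    """Emit each symbol's current position, then move that symbol to the front."""
    table = list(alphabet)
    out = []
    for sym in data:
        idx = table.index(sym)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out

def mtf_decode(indices, alphabet):
    """Invert mtf_encode() using the same initial alphabet order."""
    table = list(alphabet)
    out = []
    for idx in indices:
        sym = table.pop(idx)
        out.append(sym)
        table.insert(0, sym)
    return "".join(out)

codes = mtf_encode("$EEAAABBBCCDDC", "$ABCDE")
print(codes)
```

Runs in the BWT output become runs of zeros, which a later entropy coder can represent compactly.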

The traditional road:

Replace BWT with a dictionary approach (LZW).

This is more hardware amenable.
