
This presentation illustrates the mechanisms behind Huffman and Arithmetic Coding for lossless data compression.


Page 1: Huffman and Arithmetic Coding

Huffman Coding, Arithmetic Coding, and JBIG2

Illustrations

Arber Borici

2010

University of Northern British Columbia

Page 2: Huffman and Arithmetic Coding

Huffman Coding

Entropy encoder for lossless compression

Input: symbols and their corresponding probabilities

Output: prefix-free codes with minimum expected length

Prefix property: there exists no code in the output that is a prefix of another code

Optimal encoding algorithm

Page 3: Huffman and Arithmetic Coding

Huffman Coding: Algorithm

1. Create a forest of leaf nodes, one for each symbol.

2. Take the two nodes with the lowest probabilities and make them siblings. The new internal node has a probability equal to the sum of the probabilities of the two child nodes.

3. The new internal node acts as any other node in the forest.

4. Repeat steps 2–3 until a single tree remains.
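As a sketch, the steps above can be implemented with a min-heap holding the forest. The frequencies are those of the ARBER example that follows; note that tie-breaking among equal frequencies may produce different (but equally optimal) codes than the ones drawn in the slides.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code for {symbol: frequency} via the forest-merging algorithm."""
    tick = count()  # tie-breaker so heapq never has to compare trees
    # Step 1: a forest of leaf nodes, one per symbol.
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    # Steps 2-4: repeatedly merge the two lowest-frequency trees.
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (t1, t2)))
    # Walk the final tree: left edge = 0, right edge = 1.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"  # degenerate single-symbol alphabet
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"A": 1, "B": 1, "E": 1, "R": 2})
encoded = "".join(codes[s] for s in "ARBER")
```

Any optimal tree for these frequencies encodes ARBER in 10 bits, matching the slides' total, even when the individual codewords differ.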

Page 4: Huffman and Arithmetic Coding

Huffman Coding: Example

Consider the string ARBER. The probabilities of the symbols A, B, E, and R are:

Symbol       A    B    E    R
Frequency    1    1    1    2
Probability  20%  20%  20%  40%

The initial forest thus comprises four leaf nodes. Now, we apply the Huffman algorithm.

Page 5: Huffman and Arithmetic Coding

Generating Huffman Codes

[Figure, built up over Pages 5–10: A (0.2) and B (0.2) are merged into internal node 1 (0.4); E (0.2) and node 1 are merged into internal node 2 (0.6); node 2 and R (0.4) are merged into the root. Each left edge is labeled 0 and each right edge 1.]

Reading the codes off the tree:

Symbol  Code
A       000
B       001
E       01
R       1

Page 11: Huffman and Arithmetic Coding

Huffman Codes: Decoding

[Figure, stepped through Pages 11–16: the Huffman tree from the previous slides, used to decode the bitstream 0001001011 one codeword at a time.]

Remaining bits   Decoded so far
0001001011       —
1001011          A
001011           A R
011              A R B
1                A R B E
(none)           A R B E R

The prefix property ensures unique decodability.
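The decoding walk can be sketched as a greedy left-to-right scan, which is unambiguous precisely because the code is prefix-free. The code table below is the one derived on the preceding slides (A=000, B=001, E=01, R=1):

```python
def huffman_decode(bits, codes):
    """Decode a bitstring by greedy prefix matching (valid because no code prefixes another)."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:       # a complete codeword: emit it and restart
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "trailing bits did not form a codeword"
    return "".join(out)

codes = {"A": "000", "B": "001", "E": "01", "R": "1"}
print(huffman_decode("0001001011", codes))  # -> ARBER
```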

Page 17: Huffman and Arithmetic Coding

Arithmetic Coding

Entropy coder for lossless compression

Encodes the entire input as a single real interval

Slightly more efficient than Huffman coding

Implementation is harder: practical implementation variations have been proposed

Page 18: Huffman and Arithmetic Coding

Arithmetic Coding: Algorithm

Create an interval for each symbol, based on cumulative probabilities. The interval for a symbol is [low, high).

Given an input string, determine the interval of the first symbol.

Scale the remaining intervals:

New Low = Current Low + CumP(i−1) × (High − Low)

New High = Current Low + CumP(i) × (High − Low)

where CumP(i) is the cumulative probability up to and including the i-th symbol.
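The interval update above can be sketched with exact rational arithmetic (floating point drifts after a few symbols; Fraction keeps the endpoints exact). The symbol intervals are those of the ARBER example that follows:

```python
from fractions import Fraction as F

# [low, high) interval for each symbol, from the cumulative probabilities
INTERVALS = {"A": (F(0), F(1, 5)), "B": (F(1, 5), F(2, 5)),
             "E": (F(2, 5), F(3, 5)), "R": (F(3, 5), F(1))}

def arith_encode(text):
    """Narrow [0, 1) once per symbol: new_low = low + c_lo*(high - low), etc."""
    low, high = F(0), F(1)
    for sym in text:
        c_lo, c_hi = INTERVALS[sym]
        width = high - low
        low, high = low + c_lo * width, low + c_hi * width
    return low, high

low, high = arith_encode("ARBER")
print(float(low), float(high))  # -> 0.14432 0.1456
```

The result reproduces the final interval [0.14432, 0.1456) derived on the later slides.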

Page 19: Huffman and Arithmetic Coding

Arithmetic Coding: Example

Consider the string ARBER. The intervals of the symbols A, B, E, and R are:

A: [0, 0.2); B: [0.2, 0.4); E: [0.4, 0.6); and R: [0.6, 1).

Symbol  A    B    E    R
Low     0    0.2  0.4  0.6
High    0.2  0.4  0.6  1

Page 20: Huffman and Arithmetic Coding

Arithmetic Coding: Example — A R B E R

[Figure, stepped through Pages 20–23: the interval [0, 1) is split at 0.2, 0.4, and 0.6 into the subintervals for A, B, E, and R; encoding each symbol of ARBER selects one subinterval and re-splits it in the same 20/20/20/40 proportions.]

Symbol encoded   Current interval
(start)          [0, 1)
A                [0, 0.2)
R                [0.12, 0.2)
B                [0.136, 0.152)
E                [0.1424, 0.1456)
R                [0.14432, 0.1456)

Page 24: Huffman and Arithmetic Coding

Arithmetic Coding: Example

The final interval for the input string ARBER is [0.14432, 0.1456).

To obtain bits, one chooses a number in the interval and encodes its fractional part in binary.

For the sample interval, one may choose the point 0.14432, which in binary is:

0 0 1 0 0 1 0 0 1 1 1 1 0 0 1 0 0 0 1 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 1 1 1 1 (51 bits)
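The expansion can be reproduced by repeatedly doubling the fractional part (exact with Fraction; note the slide's final bit is rounded up so the truncated value stays inside the interval rather than falling just below 0.14432):

```python
from fractions import Fraction

def binary_fraction(x, n):
    """First n binary digits of the fractional part of x (truncated, not rounded)."""
    bits = []
    for _ in range(n):
        x *= 2
        bit = int(x)          # 1 if doubling crossed 1.0, else 0
        bits.append(str(bit))
        x -= bit
    return "".join(bits)

print(binary_fraction(Fraction(14432, 100000), 24))  # -> 001001001111001000100111
```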

Page 25: Huffman and Arithmetic Coding

Arithmetic Coding

Practical implementations use absolute frequencies (integers), since the low and high interval values otherwise become vanishingly small.

An END-OF-STREAM flag is usually required (with a very small probability).

Decoding is straightforward: divide the interval proportionally to the symbol probabilities, select the subinterval containing the encoded value, and repeat. Proceed until an END-OF-STREAM symbol is reached.
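The decoding loop can be sketched as repeated subdivision: find the symbol interval containing the encoded value, then rescale. Here the known message length stands in for the END-OF-STREAM symbol:

```python
from fractions import Fraction as F

# Same [low, high) symbol intervals as in the ARBER example
INTERVALS = {"A": (F(0), F(1, 5)), "B": (F(1, 5), F(2, 5)),
             "E": (F(2, 5), F(3, 5)), "R": (F(3, 5), F(1))}

def arith_decode(value, n_symbols):
    """Invert the encoder: pick the subinterval containing value, then rescale to [0, 1)."""
    out = []
    for _ in range(n_symbols):
        for sym, (c_lo, c_hi) in INTERVALS.items():
            if c_lo <= value < c_hi:
                out.append(sym)
                value = (value - c_lo) / (c_hi - c_lo)  # zoom back out
                break
    return "".join(out)

print(arith_decode(F(14432, 100000), 5))  # -> ARBER
```

Feeding in the low endpoint 0.14432 of the final interval recovers the original string.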

Page 26: Huffman and Arithmetic Coding

JBIG-2

Lossless and lossy bi-level data compression standard

Emerged from JBIG-1 (Joint Bi-Level Image Experts Group)

Supports three coding modes: generic, halftone, and text

The image is segmented into regions, which can be encoded using different methods

Page 27: Huffman and Arithmetic Coding

JBIG-2: Segmentation

The image on the left is segmented into a binary image, text, and a grayscale image.

[Figure: source image with its binary, grayscale, and text regions labeled]

Page 28: Huffman and Arithmetic Coding

JBIG-2: Encoding

Arithmetic coding (QM coder)

Context-based prediction: larger contexts than in JBIG-1

Progressive compression (display)

Predictive context uses previous information

Adaptive coder

[Figure: context template of neighboring pixels]

X = pixel to be coded

A = adaptive pixel (which can be moved)

Page 29: Huffman and Arithmetic Coding

JBIG-2: Halftone and Text

Halftone images are coded as multi-level images, along with pattern and grid parameters.

Each text symbol is encoded in a dictionary, along with relative coordinates.

Page 30: Huffman and Arithmetic Coding

Color Separation

Images comprising discrete colors can be considered as multi-layered binary images: each color and the image background form one binary layer.

If there are N colors, where one color represents the image background, then there will be N−1 binary layers: a map with a white background and four colors will thus yield 4 binary layers.
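A minimal sketch of the layering idea, in pure Python: pixels are color indices, 0 is the background, and each nonzero color yields one binary layer (the function and representation here are illustrative, not the deck's implementation):

```python
def color_layers(image, background=0):
    """Split an indexed-color image (list of rows) into one binary layer per non-background color."""
    colors = sorted({p for row in image for p in row} - {background})
    # Each layer is 1 where the pixel has that color, 0 elsewhere (background).
    return {c: [[1 if p == c else 0 for p in row] for row in image] for c in colors}

image = [[0, 1, 1],
         [2, 0, 3],
         [3, 3, 0]]           # 3 colors + background -> 3 binary layers
layers = color_layers(image)
print(len(layers))            # -> 3
```

Each layer can then be handed to a bi-level coder such as JBIG-2 independently.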

Page 31: Huffman and Arithmetic Coding

Color Separation: Example

The following Excel graph comprises 34 colors plus the white background:

Page 32: Huffman and Arithmetic Coding

Layer 1

Page 33: Huffman and Arithmetic Coding

Layer 5

Page 34: Huffman and Arithmetic Coding

Layer 12

Page 35: Huffman and Arithmetic Coding

Comparison with JBIG2 and JPEG

[Results for two test images]

Our Method: 96%; JBIG2: 94%; JPEG: 91%

Our Method: 98%; JBIG2: 97%; JPEG: 92%

Page 36: Huffman and Arithmetic Coding

Encoding Example

[Figure: encoded image with its codebook; entries marked RC, RC, uncompressible, 0]

Original size: 64 × 3 = 192 bits

The compression savings are one minus the size of the encoded stream over the original size:

1 − (1 + 20 + 64) / 192 = 56%
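The slide's arithmetic, as a check (the 1, 20, and 64 are the encoded-stream bit counts read off the slide):

```python
# Encoded stream: 1 + 20 + 64 = 85 bits; original: 64 symbols x 3 bits = 192 bits.
encoded_bits = 1 + 20 + 64
original_bits = 64 * 3
savings = 1 - encoded_bits / original_bits
print(f"{savings:.0%}")  # -> 56%
```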

Page 37: Huffman and Arithmetic Coding

Definitions (cont.)

Compression ratio is defined as the number of bits after a coding scheme has been applied to the source data, over the original source data size. It is expressed as a percentage or, when the source data is an image, usually in bits per pixel (bpp).

JBIG-2 is the standard binary image compression scheme, based mainly on arithmetic coding with context modeling.

Other methods in the literature are designed for specific classes of binary images.

Our objective: design a coding method that works regardless of the nature of a binary image.