Computer notes - Huffman Encoding


Class No.24

Data Structures

http://ecomputernotes.com


Huffman Encoding

Huffman coding is a method for compressing standard text documents.

It uses a binary tree to develop codes of varying lengths for the letters used in the original message.

Huffman coding is also part of the JPEG image compression scheme.

David Huffman devised the algorithm as part of a course assignment at MIT and published it in 1952.


To understand Huffman encoding, it is best to use a simple example.

Consider encoding the 32-character phrase "traversing threaded binary trees".

If we send the phrase as a message over a network using standard 8-bit ASCII codes, we have to send 8 × 32 = 256 bits.

Using the Huffman algorithm, we can send the same message in only 116 bits.
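The two bit counts can be checked with a short script. This is only an illustrative sketch, not part of the original notes; the function name `huffman_cost` is an assumption. It relies on the fact that the length of a Huffman-coded message equals the sum of the frequencies of all the merged (internal) nodes.

```python
import heapq
from collections import Counter

def huffman_cost(text):
    """Return the number of bits in an optimal Huffman encoding of text."""
    weights = list(Counter(text).values())
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        # Merge the two smallest frequencies; the merged node's weight
        # is paid once for every level it adds above its leaves.
        a = heapq.heappop(weights)
        b = heapq.heappop(weights)
        total += a + b
        heapq.heappush(weights, a + b)
    return total

phrase = "traversing threaded binary trees"
fixed_bits = 8 * len(phrase)
huffman_bits = huffman_cost(phrase)
print(fixed_bits, huffman_bits)  # prints: 256 116
```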


List all the letters used, including the "space" character, along with the frequency with which they occur in the message.

Consider each of these (character,frequency) pairs to be nodes; they are actually leaf nodes, as we will see.

Pick the two nodes with the lowest frequency, and if there is a tie, pick randomly amongst those with equal frequencies.


Make a new node out of these two, and make the two nodes its children.

This new node is assigned the sum of the frequencies of its children.

Continue the process of combining the two nodes of lowest frequency until only one node, the root, remains.
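The steps above can be sketched with a priority queue. This is a minimal illustration under stated assumptions, not the notes' own implementation: the name `build_huffman_tree` and the tuple layout are hypothetical, and ties are broken by insertion order rather than randomly.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Leaves are (freq, char); internal nodes are (freq, left, right)."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tiebreak), (freq, ch))
            for ch, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pick the two nodes with the lowest frequency ...
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # ... and make them the children of a new node holding the sum.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (f1 + f2, left, right)))
    return heap[0][2]  # the single remaining node is the root

root = build_huffman_tree("traversing threaded binary trees\n")
print(root[0])  # frequency stored at the root
```

The root's frequency always equals the length of the message, because every merge preserves the total.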


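Original text: traversing threaded binary trees

Size: 33 characters (the 32-character phrase, which contains 3 spaces, plus a terminating newline)

Character frequencies (NL = newline, SP = space):

NL : 1   SP : 3   a : 3   b : 1   d : 2   e : 5   g : 1   h : 1

i : 2   n : 2   r : 5   s : 2   t : 3   v : 1   y : 1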
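The frequency list can be reproduced in a few lines. This is a sketch that assumes the message ends with a single newline, as the character count of 33 suggests; the variable names are illustrative.

```python
from collections import Counter

message = "traversing threaded binary trees\n"
freq = Counter(message)

# Print the characters in sorted order, naming the two invisible ones.
for ch in sorted(freq):
    label = {"\n": "NL", " ": "SP"}.get(ch, ch)
    print(f"{label} : {freq[ch]}")
```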


[Figures: the tree is built bottom-up from the 15 leaf nodes v 1, y 1, SP 3, r 5, h 1, e 5, g 1, b 1, NL 1, s 2, n 2, i 2, d 2, t 3, a 3.]

Two leaves of frequency 1 are joined under a new node of frequency 2. 2 is equal to the sum of the frequencies of the two child nodes.

A second node of frequency 2 is formed in the same way. There are a number of ways to combine nodes. We have chosen just one such way.

A third node of frequency 2 is formed.

Two nodes of frequency 4 are formed.

Further merges give a third node of frequency 4, a node of frequency 5 and a node of frequency 6.

Next come nodes of frequency 8, 9 and 10.

Then come nodes of frequency 14 and 19.

Finally, the nodes of frequency 14 and 19 are joined under the root, whose frequency of 33 equals the length of the message.


Start at the root. Assign 0 to the left branch and 1 to the right branch.

Repeat the process down the left and right subtrees.

To get the code for a character, traverse the tree from the root to the character's leaf node and read off the 0s and 1s along the path.
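The labelling step can be sketched as a recursive traversal. The nested-tuple tree below is an assumption: it is transcribed from the character codes these notes list for the example, with each internal node written as a (left, right) pair and each leaf as a one-character string. The function name `assign_codes` is hypothetical.

```python
# Tree for "traversing threaded binary trees\n", as (left, right) pairs.
tree = (
    (("a", "t"), (("d", "i"), ("n", "s"))),
    (((("\n", "b"), ("g", "h")), "e"), ("r", (("v", "y"), " "))),
)

def assign_codes(node, prefix="", table=None):
    """Walk the tree, appending 0 for a left branch and 1 for a right branch."""
    if table is None:
        table = {}
    if isinstance(node, str):       # leaf: the path so far is the code
        table[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

codes = assign_codes(tree)
print(codes["t"], codes["r"], codes["v"])  # prints: 001 110 11100
```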


[Figures: the completed tree (root frequency 33) with its branches labelled level by level, 0 on each left branch and 1 on each right branch, starting at the root and continuing down to the leaves.]


Huffman character codes

Character   Code
NL          10000
SP          1111
a           000
b           10001
d           0100
e           101
g           10010
h           10011
i           0101
n           0110
r           110
s           0111
t           001
v           11100
y           11101

• Notice that the code is variable length.

• Letters with higher frequencies have shorter codes.

• The tree could have been built in a number of ways; each would have yielded different codes, but the encoded message would still be of minimal length.
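The first two observations can be checked directly against the code table. This is an illustrative sketch; the `TABLE` string is copied from the notes, while the variable names are assumptions.

```python
from collections import Counter

# Code table copied from the notes (NL = newline, SP = space).
TABLE = ("NL 10000, SP 1111, a 000, b 10001, d 0100, e 101, g 10010, "
         "h 10011, i 0101, n 0110, r 110, s 0111, t 001, v 11100, y 11101")
codes = {}
for item in TABLE.split(", "):
    name, bits = item.split()
    codes[{"NL": "\n", "SP": " "}.get(name, name)] = bits

freq = Counter("traversing threaded binary trees\n")

# Variable length: the distinct code lengths in use.
lengths = sorted({len(bits) for bits in codes.values()})
# A more frequent character never has a longer code than a rarer one.
never_longer = all(len(codes[a]) <= len(codes[b])
                   for a in freq for b in freq if freq[a] > freq[b])
# Message length: each character contributes frequency * code length.
total_bits = sum(freq[ch] * len(codes[ch]) for ch in freq)
print(lengths, never_longer, total_bits)  # prints: [3, 4, 5] True 122
```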


Original: traversing threaded binary trees

Encoded:

00111000011100101110011101010110100101111001100111101010000100101010011111000101010110000110111011111001110101101011110000

The first five codes, 001 110 000 11100 101, stand for t r a v e.

With 8 bits per character, the original 33-character message (including the newline) is 264 bits long.

Compressed into 122 bits, a 54% reduction.
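A round trip through the code table confirms these figures. This is a sketch under stated assumptions: the functions `encode` and `decode` are hypothetical names, and the decoder works only because every character sits at a leaf of the tree, so no code is a prefix of another.

```python
# Code table copied from the notes (NL = newline, SP = space).
TABLE = ("NL 10000, SP 1111, a 000, b 10001, d 0100, e 101, g 10010, "
         "h 10011, i 0101, n 0110, r 110, s 0111, t 001, v 11100, y 11101")
codes = {}
for item in TABLE.split(", "):
    name, bits = item.split()
    codes[{"NL": "\n", "SP": " "}.get(name, name)] = bits
decode_map = {bits: ch for ch, bits in codes.items()}

def encode(text):
    return "".join(codes[ch] for ch in text)

def decode(bitstring):
    out, current = [], ""
    for bit in bitstring:
        current += bit
        if current in decode_map:   # a complete code has been read
            out.append(decode_map[current])
            current = ""
    return "".join(out)

message = "traversing threaded binary trees\n"
encoded = encode(message)
reduction = 1 - len(encoded) / (8 * len(message))
print(len(encoded), 8 * len(message), round(100 * reduction))
```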


Mathematical Properties of Binary Trees


Properties of Binary Tree

Property: A binary tree with N internal nodes has N+1 external nodes.
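The property can be checked by counting. The sketch below assumes the usual convention that an external node stands for an empty subtree (a `None` child); the class `Node`, the function `count_nodes` and the example tree are illustrative, not taken from the notes.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

def count_nodes(node):
    """Return (internal, external); an empty subtree counts as one external node."""
    if node is None:
        return 0, 1
    left_int, left_ext = count_nodes(node.left)
    right_int, right_ext = count_nodes(node.right)
    return left_int + right_int + 1, left_ext + right_ext

# An arbitrary tree with 9 internal nodes.
tree = Node(Node(Node(), Node(Node())), Node(None, Node(Node(), Node())))
internal, external = count_nodes(tree)
print(internal, external)  # prints: 9 10
```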


[Figure: a binary tree with its internal nodes and external nodes marked.]

Internal nodes: 9. External nodes: 10.


Property: A binary tree with N internal nodes has 2N links: N-1 links to internal nodes and N+1 links to external nodes.


[Figure: a binary tree with its internal links and external links marked.]

Internal links: 8. External links: 10.


• In every rooted tree, each node, except the root, has a unique parent.

• Every link connects a node to its parent, so there are N-1 links to the N-1 internal nodes other than the root.

• Similarly, each of the N+1 external nodes has one link to its parent.

• Thus there are (N-1) + (N+1) = 2N links.
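The link count can also be verified empirically on randomly shaped trees. This is an illustrative sketch: trees are written as nested (left, right) tuples with `None` standing for an external node, and the helper names `random_tree` and `count_links` are assumptions.

```python
import random

def random_tree(n, rng):
    """Return a random binary tree with n internal nodes."""
    if n == 0:
        return None
    left_size = rng.randrange(n)
    return (random_tree(left_size, rng), random_tree(n - 1 - left_size, rng))

def count_links(tree):
    """Return (links to internal nodes, links to external nodes)."""
    if tree is None:
        return 0, 0
    internal = external = 0
    for child in tree:
        if child is None:
            external += 1           # link to an empty subtree
        else:
            internal += 1           # link to an internal child
            sub_int, sub_ext = count_links(child)
            internal += sub_int
            external += sub_ext
    return internal, external

rng = random.Random(0)
for n in range(1, 20):
    # N-1 internal links plus N+1 external links gives 2N in total.
    assert count_links(random_tree(n, rng)) == (n - 1, n + 1)
```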
