Huffman Code and Data Decomposition

Pranav Shah, CS157B


Why Data Compression?

Fixed-length data is inefficient for transfer and storage.

Types of Compression

Lossless Compression: The exact original data is reconstructed from the compressed data. Nothing is lost. Examples: Zip, bank account records.

Lossy Compression: An approximation of the original data is reconstructed from the compressed data. Example: JPEG, which loses image quality after repeated compressions.

[Figure: image comparison. File Size: 87 KB, File Size: 26 KB]

Variable Length Bit Coding

Maps source symbols to a variable number of bits.

Allows sources to be compressed and decompressed with zero error.

Examples: Huffman coding, Lempel-Ziv coding, and arithmetic coding.

Variable Bit Coding Rules

Use the minimum number of bits. This helps speed up the transfer rate and reduces the storage needed.

No code may be a prefix of another code. Example: Assume A has the code 01. Then B cannot have the code 010, because it starts with A's code.

Enable unambiguous left-to-right decoding. Example: If you read 01, then you know that it is A and not any other character (not B!).
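The two rules above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the function names `is_prefix_free` and `decode` are made up for illustration, and the codes for B and C in the second table are assumed values chosen only so that the table satisfies the prefix rule.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True


def decode(bits, codes):
    """Decode a bit string left to right using a prefix-free code table."""
    inverse = {code: symbol for symbol, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        # With a prefix-free table, the first match is the only possible match.
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(decoded)


# The slide's counterexample: 010 starts with A's code 01.
bad_table = {"A": "01", "B": "010"}
# Hypothetical table: the codes for B and C are assumptions for illustration.
good_table = {"A": "01", "B": "10", "C": "11"}

print(is_prefix_free(bad_table))
print(is_prefix_free(good_table))
print(decode("011011", good_table))
```

Because no code word is a prefix of another, the decoder never has to look ahead or backtrack: as soon as the bits read so far match a code word, that symbol can be emitted.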

Huffman Code

Entropy encoding algorithm used for lossless data compression.

Variable-length code with average length $L = l_1 p_1 + l_2 p_2 + \dots + l_M p_M$, where $l_1, l_2, \dots, l_M$ are the code lengths and $p_1, p_2, \dots, p_M$ are the probabilities of the source symbols $A_1, A_2, \dots, A_M$ being generated.

Uses binary tree.

The Huffman code is generated using the binary Huffman code construction method. When all symbols are equally likely and their number is a power of two, it is equivalent to simple binary block encoding (example: ASCII).

Algorithm

(1) Make a leaf node for each code symbol. Add the generation probability of each symbol to the leaf node.

(2) Take the two nodes with the smallest probability and connect them into a new node. Add 1 or 0 to each of the two branches. The probability of the new node is the sum of the probabilities of the two connecting nodes.

(3) If there is only one node left, the code construction is completed. If not, go back to (2).
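The steps above can be sketched in Python with a priority queue. This is one possible illustration under stated assumptions, not the method used to produce the slides: the function name `huffman_codes` is made up, symbols are assumed to be strings, the first node taken gets branch 0 and the second gets branch 1, and ties are broken by insertion order.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Build a {symbol: bit string} table by merging the two least probable nodes."""
    order = count()  # unique tiebreaker so the heap never compares subtrees
    # Step (1): one leaf per symbol, stored as (probability, tiebreaker, tree).
    heap = [(p, next(order), symbol) for symbol, p in probabilities.items()]
    heapq.heapify(heap)

    # Steps (2) and (3): merge until a single root node is left.
    while len(heap) > 1:
        p0, _, tree0 = heapq.heappop(heap)  # smallest probability, branch 0
        p1, _, tree1 = heapq.heappop(heap)  # next smallest, branch 1
        heapq.heappush(heap, (p0 + p1, next(order), (tree0, tree1)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):  # internal node: (branch 0, branch 1)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:  # leaf: a symbol (assumed to be a string)
            codes[tree] = prefix or "0"  # a lone symbol still needs one bit

    walk(heap[0][2], "")
    return codes


example = {"A": 0.19, "B": 0.28, "C": 0.13, "D": 0.30, "E": 0.10}
print(huffman_codes(example))
```

With the probabilities from the example that follows, this sketch yields A = 00, B = 10, C = 011, D = 11, E = 010. The 0/1 labelling of branches is arbitrary, so a different convention gives different bit patterns with the same code lengths.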

Example

Character   Frequency
A           19% (0.19)
B           28% (0.28)
C           13% (0.13)
D           30% (0.30)
E           10% (0.10)

Step 1

Take lowest two frequencies and make a node.

E (0.10) + C (0.13) = 0.23

Step 2

Take next two lowest and connect into a node.

E (0.10) + C (0.13) = 0.23

A (0.19) + 0.23 = 0.42

Step 3

Continue…

E (0.10) + C (0.13) = 0.23

A (0.19) + 0.23 = 0.42

B (0.28) + D (0.30) = 0.58

Completed Tree

1.0
    0.42
        0.19 (A)
        0.23
            0.10 (E)
            0.13 (C)
    0.58
        0.28 (B)
        0.30 (D)

Add 0 or 1 to each branch

1.0
    0: 0.42
        0: 0.19 (A)
        1: 0.23
            0: 0.10 (E)
            1: 0.13 (C)
    1: 0.58
        0: 0.28 (B)
        1: 0.30 (D)

Generated Code

Character   Frequency     Code
A           19% (0.19)    00
B           28% (0.28)    10
C           13% (0.13)    011
D           30% (0.30)    11
E           10% (0.10)    010
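As a quick check, the average length formula from the Huffman Code slide can be applied to this generated code. The sketch below copies the probabilities and codes from the example; the helper name `average_length` is made up for illustration, and the fixed-length comparison assumes each of the five symbols would otherwise get the same whole number of bits.

```python
import math

# Probabilities and codes copied from the example.
probabilities = {"A": 0.19, "B": 0.28, "C": 0.13, "D": 0.30, "E": 0.10}
codes = {"A": "00", "B": "10", "C": "011", "D": "11", "E": "010"}


def average_length(probabilities, codes):
    """Sum of (code length * probability) over all symbols."""
    return sum(p * len(codes[symbol]) for symbol, p in probabilities.items())


avg = average_length(probabilities, codes)
# A fixed-length code needs enough bits to tell all symbols apart.
fixed_bits = math.ceil(math.log2(len(probabilities)))

print(f"Huffman average: {avg:.2f} bits per symbol")
print(f"Fixed length:    {fixed_bits} bits per symbol")
```

Written out, the sum is 2(0.19) + 2(0.28) + 3(0.13) + 2(0.30) + 3(0.10) = 2.23 bits per symbol, compared with 3 bits per symbol for a fixed-length code over five symbols.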
