Compression
• Representing data using less space than the original representation
• Encoding data using fewer bits than the original encoding
Compression
• Lossless compression – The original encoding can be fully reconstructed from the compressed encoding; no information is lost
• Lossy compression – Information is lost; the original encoding cannot be fully reconstructed from the compressed encoding
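A minimal sketch of the difference, assuming Python's standard zlib module as the lossless compressor and a toy bit-dropping step as the lossy one (the sample values are made up for illustration):

```python
import zlib

original = b"AAAAABBBBBCCCCC" * 20

# Lossless: decompressing gives back exactly the original bytes.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossy (toy illustration): dropping the low 4 bits of each sample
# saves space, but the dropped bits cannot be recovered.
samples = [17, 18, 19, 200, 201]
quantized = [s >> 4 for s in samples]   # keep only the high bits
approx = [q << 4 for q in quantized]    # best possible reconstruction

print(len(original), len(compressed), restored == original)
print(samples, approx)
```

The lossless round trip compares equal to the input, while the lossy reconstruction only approximates it.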
Huffman Encoding
• Create variable length encodings for each distinct character in the input
• Encodings are created so characters that appear more frequently are given shorter encodings compared to characters that appear less frequently
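• In a fixed-width text encoding each character uses the same number of bits, no matter how often it appears
• Examples: ASCII (one byte per character), Unicode stored as UTF-32 (four bytes per character)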
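To make the contrast concrete, here is a small sketch comparing a fixed-width cost with a variable-length one. The three-symbol code table is hypothetical and chosen by hand; it is not produced by the Huffman procedure.

```python
text = "aaaabc"

# Fixed-width: every character costs 8 bits when stored as one ASCII byte.
fixed_bits = len(text.encode("ascii")) * 8

# Hypothetical variable-length prefix code: the frequent 'a' gets the
# shortest code, the rare 'b' and 'c' get longer ones.
codes = {"a": "0", "b": "10", "c": "11"}
variable_bits = sum(len(codes[ch]) for ch in text)

print(fixed_bits, variable_bits)
```

No code in the table is a prefix of another, which is what lets a decoder split the bit stream without separators.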
Huffman Encoding
• 1. Find the frequency of each character in the input file
• 2. Build a Huffman tree from the frequency data
• 3. Traverse the Huffman tree and build the encodings for each character found in the input file
• 4. Write a representation of the Huffman tree to the output file
• 5. Write the number of characters in the input file to the output file
• 6. For each character in the input file, write the bits of the Huffman encoding to the output file.
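The steps above can be sketched roughly as follows. This is an illustrative in-memory version, not a file format: the tree is kept as nested tuples and the output bits as a string of "0" and "1" characters. The function names are made up for this sketch, and it assumes the input has at least two distinct characters.

```python
import heapq
from collections import Counter
from itertools import count


def build_tree(freqs):
    """Leaf = a character; internal node = a (left, right) tuple."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), ch) for ch, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lowest-weight subtrees.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]


def build_codes(node, prefix="", table=None):
    """Walk the tree; going left appends 0, going right appends 1."""
    table = {} if table is None else table
    if isinstance(node, tuple):
        build_codes(node[0], prefix + "0", table)
        build_codes(node[1], prefix + "1", table)
    else:
        table[node] = prefix
    return table


def encode(text):
    freqs = Counter(text)                      # step 1
    tree = build_tree(freqs)                   # step 2
    codes = build_codes(tree)                  # step 3
    bits = "".join(codes[ch] for ch in text)   # step 6
    # Steps 4 and 5: the tree and the character count travel with the bits.
    return tree, len(text), bits


tree, n_chars, bits = encode("aaaabc")
print(n_chars, bits)
```

A real implementation would also serialize the tree and pack the bits into bytes; those details are left out here.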
Huffman Decoding
• 1. Read the representation of the Huffman tree from the input file and rebuild the Huffman tree
• 2. Read the number of characters from the input file
• 3. Starting at the root of the Huffman tree, read each bit from the input file and walk down the Huffman tree. When a leaf is reached write the character value in the leaf to the output file and go back to the root of the Huffman tree.
• 4. Repeat step 3 until the number of characters written to the output file matches the value found in step 2.
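A minimal sketch of the decoding loop, assuming the tree has already been rebuilt as nested tuples (a leaf is a character, an internal node is a (left, right) pair) and the bits arrive as a string. The small tree and the padding bits are made up for illustration.

```python
def decode(tree, n_chars, bits):
    out = []
    node = tree
    for bit in bits:
        # Walk down: 0 goes to the left child, 1 to the right child.
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):   # reached a leaf
            out.append(node)
            node = tree                   # go back to the root
            if len(out) == n_chars:       # stop once the stored count is met
                break
    return "".join(out)


# Hypothetical tree: 'a' -> 1, 'b' -> 00, 'c' -> 01
tree = (("b", "c"), "a")
message = decode(tree, 6, "1111" + "00" + "01" + "000")
print(message)
```

The trailing "000" stands in for padding bits at the end of a file. The stored character count is what keeps the decoder from turning them into extra output.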
Huffman Encoding
• Find the binary representation for each character in the Huffman tree shown on the previous slide
Huffman Encoding
• Decode the following bit string based on the Huffman tree shown in the previous slide.
• 11000000110011000011001001100101001001101
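• 11000000110011000011001001100101001001101
[Figure: Huffman tree with branches labeled 0 and 1; leaves A and I at depth 2, S and E at depth 3, F and ! at depth 4, and N, U, C and the space character at depth 5]
11000 000 11001 10 000 11001 0011 00101 00100 1101
C S (space) I S (space) F U N !
Decoded text: CS IS FUN!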
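The decoding can be checked with a short sketch. The code table below is an assumption: the slide's tree is not reproduced in the text, so the codes are inferred from the grouped bits of the worked answer and the leaf labels.

```python
# Assumed code table, inferred from the worked answer (not printed on the slide).
CODES = {
    "A": "01", "I": "10", "S": "000", "E": "111", "F": "0011",
    "!": "1101", "N": "00100", "U": "00101", "C": "11000", " ": "11001",
}


def decode_bits(bits, codes=CODES):
    by_code = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # In a prefix code, the first match is the only possible match.
        if current in by_code:
            out.append(by_code[current])
            current = ""
    return "".join(out)


decoded = decode_bits("11000000110011000011001001100101001001101")
print(decoded)
```

Matching accumulated bits against a dictionary is equivalent to walking the tree from the root, because no code is a prefix of another.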
Huffman Encoding
• Encode the following text based on the Huffman tree shown in the previous slide.
• FUN CASE
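• FUN CASE
[Figure: Huffman tree with branches labeled 0 and 1; leaves A and I at depth 2, S and E at depth 3, F and ! at depth 4, and N, U, C and the space character at depth 5]
0011 00101 00100 11001 11000 01 000 111
F U N (space) C A S E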
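The encoding direction is a straight table lookup. As before, the code table is an assumption inferred from the worked examples, since the tree figure itself is not reproduced in the text.

```python
# Assumed code table, inferred from the worked examples (not printed on the slide).
CODES = {
    "A": "01", "I": "10", "S": "000", "E": "111", "F": "0011",
    "!": "1101", "N": "00100", "U": "00101", "C": "11000", " ": "11001",
}

text = "FUN CASE"
groups = [CODES[ch] for ch in text]   # one code per character, in order
encoded = "".join(groups)

print(" ".join(groups))
print(len(encoded), "bits, versus", len(text) * 8, "bits at 8 bits per character")
```

Under these assumed codes the eight characters take 32 bits, half of what one byte per character would need.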
Build a Huffman Tree Given the following Frequency Counts
Character Frequency
A 100
B 30
C 40
D 10
E 130
F 20
G 5
H 25
I 57
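One way to check a hand-built tree for this table is to compute the code length of each character. The sketch below tracks only depths rather than building explicit nodes; the function name is made up for illustration. The exact bit patterns depend on which child is labeled 0 or 1, but the lengths do not, because no two weights tie during the merges for these counts.

```python
import heapq
from itertools import count

FREQS = {"A": 100, "B": 30, "C": 40, "D": 10, "E": 130,
         "F": 20, "G": 5, "H": 25, "I": 57}


def huffman_code_lengths(freqs):
    tiebreak = count()  # unique counter so the dicts are never compared
    # Each heap entry: (weight, tiebreak, {character: depth within subtree}).
    heap = [(w, next(tiebreak), {ch: 0}) for ch, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


lengths = huffman_code_lengths(FREQS)
total_bits = sum(FREQS[ch] * lengths[ch] for ch in FREQS)

for ch in sorted(lengths, key=lambda c: (lengths[c], c)):
    print(ch, FREQS[ch], lengths[ch])
print("total bits:", total_bits)
```

The merge order is G+D = 15, 15+F = 35, H+B = 55, 35+C = 75, 55+I = 112, 75+A = 175, 112+E = 242, and finally 175+242 = 417. That gives 2-bit codes for A and E, 3-bit codes for C and I, 4-bit codes for B, F and H, and 5-bit codes for D and G, for a total of 1126 bits.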