Data Structures
Week 6: Assignment #2 Problem
http://www.cs.hongik.ac.kr/~rhanha/rhanha_teaching.html/
Requirement
Encode a message using Huffman's algorithm
    Use a min heap as the priority queue
    Use dynamic allocation
The input consists of strings
    A string consists of alphabetic characters only
    Upper-case and lower-case letters are treated as different characters
    The strings are stored in a text file, one per line
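The input rules above can be sketched as a small frequency counter. This is only an illustration: the function name `count_frequencies` is hypothetical, and filtering with `str.isalpha` is an assumption about how non-letter characters such as line endings should be handled.

```python
from collections import Counter

def count_frequencies(lines):
    """Count how often each letter occurs across all input lines."""
    counts = Counter()
    for line in lines:
        # Keep letters only. No case folding is applied, so 'A' and 'a'
        # are counted as different symbols, as the assignment requires.
        counts.update(ch for ch in line if ch.isalpha())
    return counts

freq = count_frequencies(["ABACCDA\n"])
print(sorted(freq.items()))  # [('A', 3), ('B', 1), ('C', 2), ('D', 1)]
```

In a real submission the list of lines would come from the input text file, for example by iterating over an open file object.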
Requirement (cont'd)
Due date: 2001/5/23 24:00
Output should be stored in a text file in the following format:
Heap Traversal:[character or string]...
Huffman Tree Traversal:[character or string]...
character: frequency, code...
the code for the message:
Encoding
Encode the message as a long bit string
    Assign a bit string code to each symbol of the alphabet
    Then concatenate the individual codes of the symbols making up the message to produce an encoding for the message
Example #1
Symbol  Code
A       010
B       100
C       000
D       111
ABACCDA is encoded as 010100010000000111010
Three bits are used for each symbol
21 bits are needed to encode the message: inefficient
Example #2
Symbol  Code
A       00
B       01
C       10
D       11
ABACCDA is encoded as 00010010101100
Two bits are used for each symbol
14 bits are needed to encode the message
Example #3
ABACCDA
    Each of the letters B and D appears only once in the message
    The letter A appears three times
    So the letter A is assigned a shorter bit string than the letters B and D
Example #3 (cont'd)
Symbol  Code
A       0
B       110
C       10
D       111
ABACCDA is encoded as 0110010101110
Encoding of the message requires only 13 bits: more efficient
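The three examples can be checked with a few lines of Python. This is a minimal sketch; the function name `encode` and the table names are chosen here for illustration only.

```python
def encode(message, table):
    """Concatenate the code of every symbol in the message."""
    return "".join(table[symbol] for symbol in message)

# Code tables taken from Examples #1, #2 and #3.
fixed3 = {"A": "010", "B": "100", "C": "000", "D": "111"}
fixed2 = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable = {"A": "0", "B": "110", "C": "10", "D": "111"}

for name, table in [("3-bit", fixed3), ("2-bit", fixed2), ("variable", variable)]:
    bits = encode("ABACCDA", table)
    print(name, bits, len(bits))
# 3-bit 010100010000000111010 21
# 2-bit 00010010101100 14
# variable 0110010101110 13
```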
Variable-Length Code
If variable-length codes are used, the code for one symbol must not be a prefix of the code for another
Example
    Suppose the code for a symbol x, c(x), is a prefix of the code of another symbol y, c(y)
    When c(x) is encountered in a left-to-right scan, it is unclear whether c(x) represents the symbol x or whether it is the first part of c(y)
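The prefix condition is what makes a left-to-right scan unambiguous. The sketch below, with the hypothetical helper names `is_prefix_free` and `decode`, checks the condition and decodes the 13-bit string from Example #3.

```python
def is_prefix_free(table):
    """Return True if no code is a prefix of a different symbol's code."""
    codes = list(table.values())
    for i, a in enumerate(codes):
        for j, b in enumerate(codes):
            if i != j and b.startswith(a):
                return False
    return True

def decode(bits, table):
    """Scan left to right, emitting a symbol as soon as a full code is seen."""
    inverse = {code: symbol for symbol, code in table.items()}
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            message.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(message)

good = {"A": "0", "B": "110", "C": "10", "D": "111"}
bad = {"A": "0", "B": "01", "C": "1"}  # "0" is a prefix of "01"

print(is_prefix_free(good))           # True
print(is_prefix_free(bad))            # False
print(decode("0110010101110", good))  # ABACCDA
```

With the `bad` table, the bit string 01 could mean either B or A followed by C, which is exactly the ambiguity described above.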
Optimal Encoding Scheme (1)
Symbol  Frequency
A       3
B       1
C       2
D       1
Find the two symbols that appear least frequently
    These are B and D
Combine these two symbols into the single symbol BD
The frequency of this new symbol is the sum of the frequencies of its two symbols
    The frequency of BD is 2
Optimal Encoding Scheme (2)
Symbol  Frequency
A       3
C       2
BD      2
Again choose the two symbols with the smallest frequency
    These are C and BD
Combine these two symbols into the single symbol CBD
The frequency of this new symbol is the sum of the frequencies of its two symbols
    The frequency of CBD is 4
Optimal Encoding Scheme (3)
Symbol  Frequency
A       3
CBD     4
There are now only two symbols remaining
These are combined into the single symbol ACBD
    The frequency of ACBD is 7
Symbol  Frequency
ACBD    7
Optimal Encoding Scheme (4)
ACBD is split into A and CBD, which are assigned the codes 0 and 1
CBD is split into C and BD, which are assigned the codes 10 and 11
BD is split into B and D, which are assigned the codes 110 and 111
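The four slides above can be replayed with a plain list instead of a heap. This is a sketch under stated assumptions: symbols are single characters, ties between equal frequencies are broken by list order (the slides do not specify a rule), and the name `merge_scheme` is hypothetical. Each merge prepends 0 to every code in the first group and 1 to every code in the second, which gives the same result as assigning codes from the top down.

```python
def merge_scheme(frequencies):
    """Repeatedly merge the two least frequent groups, building codes as we go."""
    groups = list(frequencies.items())           # [(symbol string, frequency)]
    codes = {symbol: "" for symbol in frequencies}
    while len(groups) > 1:
        groups.sort(key=lambda g: g[1])          # stable sort: ties keep list order
        (s1, f1), (s2, f2) = groups[0], groups[1]
        for ch in s1:                            # first group gets a leading 0
            codes[ch] = "0" + codes[ch]
        for ch in s2:                            # second group gets a leading 1
            codes[ch] = "1" + codes[ch]
        groups = groups[2:] + [(s1 + s2, f1 + f2)]
        print(groups)
    return groups[0], codes

top, codes = merge_scheme({"A": 3, "B": 1, "C": 2, "D": 1})
print(top)    # ('ACBD', 7)
print(codes)  # {'A': '0', 'B': '110', 'C': '10', 'D': '111'}
```

Sorting the whole list at every step is fine for four symbols; the min heap required by the assignment does the same job more efficiently.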
The Huffman’s Algorithm (1) to (7)
[Figures: step-by-step construction of the Huffman tree. The nodes start as A3, B1, C2 and D1. B1 and D1 are combined under the new node BD2, C2 and BD2 are combined under CBD4, and A3 and CBD4 are combined under the root ACBD7.]
The Huffman’s Algorithm (8)
1. Build a min heap which contains the nodes of all symbols, with the frequency values as the keys
2. Delete two nodes from the heap, concatenate the two symbols, add their frequencies, and put the result back into the heap
3. Make the two nodes the two children of the node of the concatenated symbol, i.e. if s = s1 s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and right child of s
4. Repeat steps 2 and 3 until only one node, the root of the Huffman tree, remains in the priority queue
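Steps 1 to 4 can be sketched with Python's standard `heapq` module standing in for the min heap. The `Node` class, the function name `build_huffman_tree`, and the insertion-order tie-breaker are assumptions made for this illustration; the assignment itself asks for a hand-written heap.

```python
import heapq
from itertools import count

class Node:
    def __init__(self, symbol, freq, left=None, right=None):
        self.symbol = symbol
        self.freq = freq
        self.left = left
        self.right = right

def build_huffman_tree(frequencies):
    """Build the tree for a non-empty {symbol: frequency} mapping."""
    order = count()  # tie-breaker so heapq never has to compare Node objects
    # Step 1: one heap entry per symbol, keyed by frequency.
    heap = [(freq, next(order), Node(sym, freq)) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    # Step 4: repeat until a single node (the root) is left.
    while len(heap) > 1:
        # Step 2: remove the two least frequent nodes and merge them.
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        # Step 3: the removed nodes become the left and right children.
        parent = Node(n1.symbol + n2.symbol, f1 + f2, left=n1, right=n2)
        heapq.heappush(heap, (parent.freq, next(order), parent))
    return heap[0][2]

root = build_huffman_tree({"A": 3, "B": 1, "C": 2, "D": 1})
print(root.symbol, root.freq)                # ACBD 7
print(root.left.symbol, root.right.symbol)   # A CBD
```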
The Huffman’s Algorithm (9)
Once the Huffman tree is constructed, the code of any symbol can be constructed by starting at the leaf representing that symbol and climbing up to the root
    The code is initialized to null
    Each time that a left branch is climbed, 0 is prepended to the code
    Each time that a right branch is climbed, 1 is prepended to the code
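The climb from leaf to root can be sketched as follows. The tree for the running example is wired up by hand, and the `Node` fields `parent` and `is_left` as well as the function name `code_for` are assumptions made for this illustration.

```python
class Node:
    def __init__(self, symbol, parent=None, is_left=False):
        self.symbol = symbol
        self.parent = parent      # None for the root
        self.is_left = is_left    # True if this node is its parent's left child

def code_for(leaf):
    """Climb from a leaf to the root, prepending one bit per branch."""
    code = ""                     # the code starts out empty
    node = leaf
    while node.parent is not None:
        code = ("0" if node.is_left else "1") + code
        node = node.parent
    return code

# The tree from the running example: ACBD -> (A, CBD), CBD -> (C, BD), BD -> (B, D).
root = Node("ACBD")
a = Node("A", root, is_left=True)
cbd = Node("CBD", root, is_left=False)
c = Node("C", cbd, is_left=True)
bd = Node("BD", cbd, is_left=False)
b = Node("B", bd, is_left=True)
d = Node("D", bd, is_left=False)

for leaf in (a, b, c, d):
    print(leaf.symbol, code_for(leaf))
# A 0
# B 110
# C 10
# D 111
```

Bits are prepended rather than appended because the climb visits the branches in the reverse of the order in which a decoder walks down from the root.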
The Huffman’s Algorithm (10)
VAR
    position[i] : a pointer to the ith symbol
    n : the number of symbols /* nonzero frequency */
    frequency[i] : the relative frequency of the ith symbol
    code[i] : the code assigned to the ith symbol
    p, p1, p2 : a pointer to Min heap's node or huffman tree's node
Main Function{
    initialization;
    count the frequency of each symbol within the message;
    // construct a node for each symbol
    for(i=0; i < n; i++){
        <p> = create <frequency[i]> a node;
        position[i] = p; // a pointer to the leaf containing the ith symbol
        insert <p> into Min heap;
    }//end for
The Huffman’s Algorithm (11)
    while(Min heap contains more than one item){
        <p1> = delete Min heap;
        <p2> = delete Min heap;
        //combine p1 and p2 as branches of a single tree
        <p> = create < info(p1)+info(p2) > a node;
        set <p1> to be left_child of huffman tree p;
        set <p2> to be right_child of huffman tree p;
        insert <p> into Min heap;
    }//end while
The Huffman’s Algorithm (12)
    //the tree is now constructed; use it to find codes
    <root> = delete Min heap;
    for(i=0; i<n; i++){
        p = position[i];
        code[i] = NULL;
        while(p != root){
            //travel up to the root
            if(is left<p>)
                code[i] = 0 followed by code[i];
            else
                code[i] = 1 followed by code[i];
            <p> = move <p> to father node;
        }//end while
    }//end for
}//end main
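The pseudocode of the Main Function can be transcribed into Python roughly as follows. This is one possible sketch, not the reference solution: the array-based `MinHeap`, the `Node` fields `father` and `is_left`, and the function name `huffman_codes` are all assumptions, and the output file format required by the assignment is left out.

```python
class Node:
    def __init__(self, freq, symbol=None):
        self.freq = freq
        self.symbol = symbol
        self.father = None
        self.is_left = False

class MinHeap:
    """Array-based binary min heap keyed on node frequency."""
    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)

    def insert(self, node):
        items = self.items
        items.append(node)
        i = len(items) - 1
        while i > 0:                      # sift the new node up
            parent = (i - 1) // 2
            if items[i].freq >= items[parent].freq:
                break
            items[i], items[parent] = items[parent], items[i]
            i = parent

    def delete_min(self):
        items = self.items
        smallest = items[0]
        last = items.pop()
        if items:                         # move the last node to the top, sift down
            items[0] = last
            i = 0
            while True:
                left, right, least = 2 * i + 1, 2 * i + 2, i
                if left < len(items) and items[left].freq < items[least].freq:
                    least = left
                if right < len(items) and items[right].freq < items[least].freq:
                    least = right
                if least == i:
                    break
                items[i], items[least] = items[least], items[i]
                i = least
        return smallest

def huffman_codes(message):
    """Return {symbol: code} for a non-empty message."""
    frequency = {}
    for ch in message:                    # count the frequency of each symbol
        frequency[ch] = frequency.get(ch, 0) + 1
    symbols = sorted(frequency)
    heap, position = MinHeap(), []
    for s in symbols:                     # construct a leaf for each symbol
        p = Node(frequency[s], s)
        position.append(p)
        heap.insert(p)
    while len(heap) > 1:                  # combine the two smallest nodes
        p1, p2 = heap.delete_min(), heap.delete_min()
        p = Node(p1.freq + p2.freq)
        p1.father, p1.is_left = p, True
        p2.father, p2.is_left = p, False
        heap.insert(p)
    root = heap.delete_min()              # the tree is now constructed
    code = {}
    for s, p in zip(symbols, position):
        bits = ""
        while p is not root:              # travel up to the root
            bits = ("0" if p.is_left else "1") + bits
            p = p.father
        code[s] = bits
    return code

codes = huffman_codes("ABACCDA")
print(codes)
print(sum(len(codes[ch]) for ch in "ABACCDA"))  # 13
```

For the message ABACCDA this yields code lengths 1, 3, 2 and 3 for A, B, C and D, so the encoded message is 13 bits long, matching Example #3.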