
Data Structures

Week 6: Assignment #2 Problem
http://www.cs.hongik.ac.kr/~rhanha/rhanha_teaching.html/

Requirement

Encode a message using Huffman's algorithm
Use a min heap as the priority queue
    dynamic allocation

The input consists of strings
    A string consists of alphabetic characters only
    Upper-case and lower-case letters are treated as different characters
    The strings are stored in a text file, one string per line

Requirement – cont’

Output should be stored in a text file in the following format

    Heap Traversal: [character or string] ...
    Huffman Tree Traversal: [character or string] ...
    character: frequency, code ...
    the code for the message:

Due date: 2001/5/23 24:00

Encoding

Encode the message as a long bit string
    Assign a bit string code to each symbol of the alphabet
    Then concatenate the individual codes of the symbols making up the message to produce an encoding for the message

Example#1

Symbol  Code
A       010
B       100
C       000
D       111

ABACCDA: 010100010000000111010
Three bits are used for each symbol
21 bits are needed to encode the message
    inefficient

Example#2

Symbol  Code
A       00
B       01
C       10
D       11

ABACCDA: 00010010101100
Two bits are used for each symbol
14 bits are needed to encode the message

Example#3

ABACCDA
Each of the letters B and D appears only once in the message
The letter A appears three times
The letter A is therefore assigned a shorter bit string than the letters B and D

Example#3 - cont’

Symbol  Code
A       0
B       110
C       10
D       111

ABACCDA: 0110010101110
Encoding of the message requires only 13 bits
    more efficient
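The three examples can be checked with a short sketch. The function name `encode` and the dictionary names are illustrative assumptions, not part of the assignment; the code tables are copied from Examples #1 to #3.

```python
def encode(message, table):
    """Concatenate the bit-string code of each symbol in the message."""
    return "".join(table[symbol] for symbol in message)

# Code tables from Examples #1, #2 and #3 (dictionary names are hypothetical).
fixed3 = {"A": "010", "B": "100", "C": "000", "D": "111"}
fixed2 = {"A": "00", "B": "01", "C": "10", "D": "11"}
variable = {"A": "0", "B": "110", "C": "10", "D": "111"}

for name, table in [("Example#1", fixed3), ("Example#2", fixed2), ("Example#3", variable)]:
    bits = encode("ABACCDA", table)
    print(name, bits, len(bits), "bits")
```

Running it prints the three bit strings shown above, with lengths 21, 14 and 13.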

Variable-Length Code

If variable-length codes are used, the code for one symbol must not be a prefix of the code for another

Example
    Suppose the code for a symbol x, c(x), were a prefix of the code of another symbol y, c(y)
    When c(x) is encountered in a left-to-right scan, it is unclear whether c(x) represents the symbol x or whether it is the first part of c(y)
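A minimal sketch of why the prefix condition matters, using the code table from Example#3. The helper names `is_prefix_free` and `decode`, and the two-symbol table `ambiguous`, are assumptions made for illustration only.

```python
def is_prefix_free(table):
    """Return True if no code in the table is a prefix of another code."""
    codes = list(table.values())
    return not any(
        i != j and codes[j].startswith(codes[i])
        for i in range(len(codes))
        for j in range(len(codes))
    )

def decode(bits, table):
    """Left-to-right scan: emit a symbol as soon as the bits read so far match a code.

    This is only unambiguous when the table is prefix-free.
    """
    symbol_of = {code: symbol for symbol, code in table.items()}
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in symbol_of:
            message.append(symbol_of[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(message)

variable = {"A": "0", "B": "110", "C": "10", "D": "111"}  # table from Example#3
ambiguous = {"x": "0", "y": "01"}                         # c(x) is a prefix of c(y)

print(is_prefix_free(variable))           # True
print(is_prefix_free(ambiguous))          # False
print(decode("0110010101110", variable))  # ABACCDA
```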

Optimal Encoding Scheme (1)

Symbol  Frequency
A       3
B       1
C       2
D       1

Find the two symbols that appear least frequently
These are B and D
Combine these two symbols into the single symbol BD
The frequency of this new symbol is the sum of the frequencies of its two symbols
The frequency of BD is 2

Optimal Encoding Scheme (2)

Symbol  Frequency
A       3
C       2
BD      2

Again choose the two symbols with the smallest frequency
These are C and BD
Combine these two symbols into the single symbol CBD
The frequency of this new symbol is the sum of the frequencies of its two symbols
The frequency of CBD is 4

Optimal Encoding Scheme (3)

Symbol  Frequency
A       3
CBD     4

There are now only two symbols remaining
These are combined into the single symbol ACBD
The frequency of ACBD is 7

Symbol  Frequency
ACBD    7
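The three combining steps can be reproduced with a small sketch that uses a plain sorted list rather than a heap. The function name `merge_steps` is hypothetical, and ties between equal frequencies are broken by list order, which is an assumption chosen so the result matches the tables above.

```python
def merge_steps(frequencies):
    """Repeatedly combine the two least frequent symbols; return each merged symbol."""
    symbols = list(frequencies.items())
    steps = []
    while len(symbols) > 1:
        # Stable sort: symbols with equal frequency keep their current order.
        symbols.sort(key=lambda item: item[1])
        (s1, f1), (s2, f2) = symbols[0], symbols[1]
        merged = (s1 + s2, f1 + f2)        # concatenate symbols, add frequencies
        symbols = symbols[2:] + [merged]   # put the combined symbol back
        steps.append(merged)
    return steps

print(merge_steps({"A": 3, "B": 1, "C": 2, "D": 1}))
# [('BD', 2), ('CBD', 4), ('ACBD', 7)]
```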

Optimal Encoding Scheme (4)

ACBD (A and CBD): A and CBD are assigned the codes 0 and 1

CBD (C and BD): C and BD are assigned the codes 10 and 11

BD (B and D): B and D are assigned the codes 110 and 111
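The assignment of codes from the combined symbol downward can be sketched as a short recursion. The `splits` dictionary and the function name `assign_codes` are illustrative assumptions; the dictionary simply records which two symbols each combined symbol was built from.

```python
def assign_codes(symbol, splits, prefix=""):
    """Give the first part of a combined symbol a trailing 0 and the second a trailing 1."""
    if symbol not in splits:   # an original symbol, not a combined one
        return {symbol: prefix}
    first, second = splits[symbol]
    codes = assign_codes(first, splits, prefix + "0")
    codes.update(assign_codes(second, splits, prefix + "1"))
    return codes

# How each combined symbol was formed in schemes (1) to (3).
splits = {"ACBD": ("A", "CBD"), "CBD": ("C", "BD"), "BD": ("B", "D")}

print(assign_codes("ACBD", splits))
# {'A': '0', 'C': '10', 'B': '110', 'D': '111'}
```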

The Huffman’s Algorithm (1)-(7)

[Figure sequence: the Huffman tree for A3, B1, C2, D1 is built step by step. B1 and D1 are joined under the new node BD2; C2 and BD2 are joined under the new node CBD4; A3 and CBD4 are joined under the root ACBD7.]

The Huffman’s Algorithm (8)

1. Build a min heap that contains the nodes of all symbols, with the frequency values as the keys

2. Delete two nodes from the heap, concatenate the two symbols, add their frequencies, and put the result back into the heap

3. Make the two deleted nodes the two children of the node of the concatenated symbol
   i.e., if s = s1 s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and right child of s

4. Repeat steps 2 and 3 until only one node, the root of the Huffman tree, remains in the priority queue
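The four steps can be sketched as follows. This sketch uses Python's `heapq` module as the min heap instead of a hand-written heap, so it illustrates the steps rather than fulfilling the assignment. The `Node` class, the function name `build_huffman_tree`, and the insertion-order tie-break are assumptions; the input is assumed to be non-empty.

```python
import heapq
from itertools import count

class Node:
    def __init__(self, symbol, frequency, left=None, right=None):
        self.symbol = symbol
        self.frequency = frequency
        self.left = left
        self.right = right

def build_huffman_tree(frequencies):
    """Build a Huffman tree from a non-empty {symbol: frequency} mapping."""
    order = count()  # tie-breaker so the heap never has to compare Node objects
    # Step 1: a min heap of leaf nodes keyed by frequency.
    heap = [(f, next(order), Node(s, f)) for s, f in frequencies.items()]
    heapq.heapify(heap)
    # Step 4: repeat until a single node (the root) remains.
    while len(heap) > 1:
        # Step 2: delete the two nodes with the smallest frequencies.
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        # Step 3: they become the left and right child of the combined node.
        parent = Node(n1.symbol + n2.symbol, f1 + f2, n1, n2)
        heapq.heappush(heap, (parent.frequency, next(order), parent))
    return heap[0][2]

root = build_huffman_tree({"A": 3, "B": 1, "C": 2, "D": 1})
print(root.symbol, root.frequency)  # ACBD 7
```

With a different tie-break the tree shape may differ, but the total encoded length of the message stays the same.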

The Huffman’s Algorithm (9)

Once the Huffman tree is constructed, the code of any symbol can be constructed by starting at the leaf representing that symbol and climbing up to the root
    The code is initialized to null
    Each time a left branch is climbed, 0 is prepended to the code
    Each time a right branch is climbed, 1 is prepended to the code
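A minimal sketch of the climb from leaf to root, assuming each node stores a pointer to its parent. The `Node` class and the function name `code_for` are hypothetical; the tree is built by hand to match the ACBD example.

```python
class Node:
    def __init__(self, symbol, left=None, right=None):
        self.symbol = symbol
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self  # lets us climb from a leaf to the root

def code_for(leaf):
    """Climb from the leaf to the root, prepending 0 for a left branch and 1 for a right branch."""
    code = ""  # the code starts out empty
    node = leaf
    while node.parent is not None:
        bit = "0" if node.parent.left is node else "1"
        code = bit + code
        node = node.parent
    return code

# The tree from the example: ACBD -> (A, CBD), CBD -> (C, BD), BD -> (B, D).
a, b, c, d = Node("A"), Node("B"), Node("C"), Node("D")
bd = Node("BD", b, d)
cbd = Node("CBD", c, bd)
root = Node("ACBD", a, cbd)

for leaf in (a, b, c, d):
    print(leaf.symbol, code_for(leaf))
```

This prints the codes 0, 110, 10 and 111 for A, B, C and D.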

The Huffman’s Algorithm (10)

VAR
    position[i]  : a pointer to the ith symbol
    n            : the number of symbols /* nonzero frequency */
    frequency[i] : the relative frequency of the ith symbol
    code[i]      : the code assigned to the ith symbol
    p, p1, p2    : a pointer to Min heap's node or Huffman tree's node

Main Function
{
    initialization;
    count the frequency of each symbol within the message;

    // construct a node for each symbol
    for(i=0; i < n; i++){
        <p> = create <frequency[i]> a node;
        position[i] = p;  // a pointer to the leaf containing the ith symbol
        insert <p> into Min heap;
    }//end for

The Huffman’s Algorithm (11)

    while(Min heap contains more than one item){
        <p1> = delete Min heap;
        <p2> = delete Min heap;

        // combine p1 and p2 as branches of a single tree
        <p> = create < info(p1)+info(p2) > a node;
        set <p1> to be left_child of huffman tree p;
        set <p2> to be right_child of huffman tree p;
        insert <p> into Min heap;
    }//end while

The Huffman’s Algorithm (12)

    // the tree is now constructed; use it to find codes
    <root> = delete Min heap;
    for(i=0; i<n; i++){
        p = position[i];
        code[i] = NULL;
        while(p != root){
            // travel up to the root
            if(is left<p>)
                code[i] = 0 followed by code[i];
            else
                code[i] = 1 followed by code[i];
            <p> = move <p> to father node;
        } // end while
    }//end for
}//end main
