Data Structures Week 6: Assignment #2 Problem http://www.cs.hongik.ac.kr/~rhanha/rhanha_teaching.html/


Requirement

Encode a message using Huffman's algorithm.
Use a min heap as the priority queue.
Use dynamic memory allocation.

The input consists of strings stored in a text file, one string per line.
A string consists of alphabetic characters only.
Upper-case and lower-case letters are treated as different characters.


Requirement (cont.)

Output should be stored in a text file in the following format:

Heap Traversal: [character or string] ...
Huffman Tree Traversal: [character or string] ...
character: frequency, code ...
the code for the message:

Due date: 2001/5/23 24:00


Encoding

Encode the message as a long bit string: assign a bit-string code to each symbol of the alphabet, then concatenate the individual codes of the symbols making up the message to produce an encoding for the message.
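A minimal C sketch of this encoding step, assuming a per-character lookup table of code strings. The names `CodeTable` and `encode_message` are hypothetical and not part of the assignment; codes are kept as strings of '0'/'1' characters for readability rather than packed into bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: codes[c] is the bit string for character c,
   or NULL if c has no code. */
typedef const char *CodeTable[256];

/* Concatenate the codes of the message's symbols into out (capacity cap,
   including the terminating '\0'). Returns the number of bits written. */
size_t encode_message(const char *msg, CodeTable codes, char *out, size_t cap)
{
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';
    for (const char *s = msg; *s != '\0'; s++) {
        const char *c = codes[(unsigned char)*s];
        assert(c != NULL);              /* every symbol must have a code */
        size_t k = strlen(c);
        assert(len + k < cap);          /* room for the code and the '\0' */
        memcpy(out + len, c, k + 1);    /* append the code and its '\0' */
        len += k;
    }
    return len;
}
```

With the table of Example #1 (A 010, B 100, C 000, D 111), encoding ABACCDA writes 21 bits.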


Example #1

Symbol  Code
A       010
B       100
C       000
D       111

ABACCDA is encoded as 010100010000000111010.
Three bits are used for each symbol, so 21 bits are needed to encode the message. This is inefficient.


Example #2

Symbol  Code
A       00
B       01
C       10
D       11

ABACCDA is encoded as 00010010101100.
Two bits are used for each symbol, so 14 bits are needed to encode the message.
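Examples #1 and #2 differ only in code length: four distinct symbols need just two bits each under a fixed-length code. The sketch below (hypothetical helper names `fixed_code_bits` and `fixed_encoding_bits`, assuming a 32-bit `unsigned`) computes the smallest fixed length b with 2^b >= k for k symbols, and the resulting message length.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest b with 2^b >= k: the bits per symbol that a fixed-length code
   needs to give k symbols distinct codes. */
unsigned fixed_code_bits(unsigned k)
{
    assert(k >= 1 && k <= (1u << 31));  /* keeps the shift below in range */
    unsigned b = 0;
    while ((1u << b) < k)
        b++;
    return b;
}

/* Total bits for a message of len symbols drawn from k distinct symbols. */
size_t fixed_encoding_bits(unsigned k, size_t len)
{
    return (size_t)fixed_code_bits(k) * len;
}
```

For the four symbols A to D and the 7-symbol message ABACCDA this gives 2 bits per symbol and 14 bits in total, matching Example #2.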


Example #3

In ABACCDA, each of the letters B and D appears only once, while the letter A appears three times. The letter A is therefore assigned a shorter bit string than the letters B and D.


Example #3 (cont.)

Symbol  Code
A       0
B       110
C       10
D       111

ABACCDA is encoded as 0110010101110.
Encoding the message requires only 13 bits, which is more efficient.
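The three examples can be compared with a single formula: the length of the encoded message is the sum, over the symbols x, of the frequency f(x) times the code length |c(x)|. The notation f and |c(x)| is introduced here for the comparison; the frequencies are those of ABACCDA (A 3, B 1, C 2, D 1).

```latex
L = \sum_{x} f(x)\,\lvert c(x)\rvert
\qquad
\begin{aligned}
\text{Example \#1:}\quad & (3 + 1 + 2 + 1) \cdot 3 = 21 \\
\text{Example \#2:}\quad & (3 + 1 + 2 + 1) \cdot 2 = 14 \\
\text{Example \#3:}\quad & 3 \cdot 1 + 1 \cdot 3 + 2 \cdot 2 + 1 \cdot 3 = 13
\end{aligned}
```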


Variable-Length Code

If variable-length codes are used, the code for one symbol must not be a prefix of the code for another.

Example: suppose the code for a symbol x, c(x), is a prefix of the code for another symbol y, c(y). When c(x) is encountered in a left-to-right scan, it is unclear whether c(x) represents the symbol x or is the first part of c(y).
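The prefix condition can be checked directly by comparing every pair of codes. This is a minimal, quadratic-time C sketch with a hypothetical function name `is_prefix_free`; a Huffman tree satisfies the condition automatically, because every symbol sits at a leaf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if string a is a prefix of string b. */
static bool is_prefix(const char *a, const char *b)
{
    return strncmp(a, b, strlen(a)) == 0;
}

/* True if no code in codes[0..n-1] is a prefix of another code.
   Two identical codes also count as a violation. */
bool is_prefix_free(const char *codes[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(codes[i] != NULL);       /* every symbol must have a code */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i != j && is_prefix(codes[i], codes[j]))
                return false;
    return true;
}
```

The codes of Example #3 (0, 110, 10, 111) pass the check, while a set such as 0, 01, 11 fails because 0 is a prefix of 01.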


Optimal Encoding Scheme (1)

Symbol  Frequency
A       3
B       1
C       2
D       1

Find the two symbols that appear least frequently. These are B and D. Combine these two symbols into the single symbol BD. The frequency of this new symbol is the sum of the frequencies of its two symbols, so the frequency of BD is 2.
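The frequency table above is what counting the symbols of ABACCDA yields. A minimal C sketch of that counting step, assuming one counter per byte value (hypothetical name `count_frequencies`); since 'A' and 'a' are different byte values, upper-case and lower-case letters are counted separately, as the requirement asks.

```c
#include <assert.h>
#include <stddef.h>

/* Count how often each byte value occurs in msg.
   freq must have room for 256 counters. */
void count_frequencies(const char *msg, size_t freq[256])
{
    assert(msg != NULL);
    for (int c = 0; c < 256; c++)
        freq[c] = 0;
    for (const char *s = msg; *s != '\0'; s++)
        freq[(unsigned char)*s]++;
}
```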


Optimal Encoding Scheme (2)

Symbol  Frequency
A       3
C       2
BD      2

Again choose the two symbols with the smallest frequencies. These are C and BD. Combine these two symbols into the single symbol CBD. The frequency of this new symbol is the sum of the frequencies of its two symbols, so the frequency of CBD is 4.


Optimal Encoding Scheme (3)

Symbol  Frequency
A       3
CBD     4

There are now only two symbols remaining. These are combined into the single symbol ACBD, whose frequency is 7.

Symbol  Frequency
ACBD    7


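Optimal Encoding Scheme (4)

Codes are then assigned by splitting each combined symbol back into its two parts:

ACBD (A and CBD): A and CBD are assigned the codes 0 and 1
CBD (C and BD): C and BD are assigned the codes 10 and 11
BD (B and D): B and D are assigned the codes 110 and 111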
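Because these codes are prefix-free, a left-to-right scan decodes the bit string unambiguously: as soon as the bits read so far equal a complete code, that code's symbol is emitted. A minimal C sketch using a code table (hypothetical function `decode_prefix_code`); walking the Huffman tree from the root is the more usual way to decode, and the table scan is used here only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode bits with a prefix-free code: symbols[i] has code codes[i].
   Writes the symbols to out (capacity cap) and returns how many were
   decoded, or -1 if the bits do not end on a code boundary. */
int decode_prefix_code(const char *bits, const char symbols[],
                       const char *codes[], size_t n, char *out, size_t cap)
{
    size_t start = 0, len = 0, k = 0;
    assert(cap > 0);
    while (bits[start + len] != '\0') {
        len++;                                  /* extend the current prefix */
        for (size_t i = 0; i < n; i++) {
            if (strlen(codes[i]) == len &&
                strncmp(bits + start, codes[i], len) == 0) {
                assert(k + 1 < cap);
                out[k++] = symbols[i];          /* a whole code was matched */
                start += len;
                len = 0;
                break;
            }
        }
    }
    out[k] = '\0';
    return len == 0 ? (int)k : -1;
}
```

With the codes above, 0110010101110 decodes back to ABACCDA.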

Huffman's Algorithm (1)
[Figure: nodes A3, B1, C2, D1 (symbol followed by frequency) in the min heap]

Huffman's Algorithm (2)
[Figure: nodes A3, B1, C2, D1; B1 and D1 have the smallest frequencies]

Huffman's Algorithm (3)
[Figure: B1 and D1 combined into BD2, with B1 and D1 as its children]

Huffman's Algorithm (4)
[Figure: nodes A3, C2, and BD2 (children B1, D1)]

Huffman's Algorithm (5)
[Figure: C2 and BD2 combined into CBD4]

Huffman's Algorithm (6)
[Figure: nodes A3 and CBD4]

Huffman's Algorithm (7)
[Figure: A3 and CBD4 combined into the root ACBD7]


Huffman's Algorithm (8)

1. Build a min heap that contains the nodes of all symbols, with the frequency values as the keys.

2. Delete two nodes from the heap, concatenate their two symbols, add their frequencies, and put the resulting node back into the heap.

3. Make the two deleted nodes the two children of the node of the concatenated symbol; i.e., if s = s1 s2 is the symbol concatenated from s1 and s2, then s1 and s2 become the left child and the right child of s.

4. Repeat steps 2 and 3 until the priority queue contains only one node, which is the root of the Huffman tree.
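The four steps above can be sketched in C as follows. This is an illustrative sketch, not the course's reference solution: the names `Node`, `MinHeap`, `heap_insert`, `heap_delete_min`, `build_huffman`, and `free_tree` are hypothetical, the heap is an array-based binary min heap allocated with `malloc`, each node keeps a parent pointer so codes can later be read by climbing to the root, and internal nodes store '\0' instead of the concatenated symbol string.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type shared by the min heap and the Huffman tree. */
typedef struct Node {
    size_t freq;                        /* key used by the min heap */
    char symbol;                        /* leaf symbol; '\0' for internal nodes */
    struct Node *left, *right, *parent;
} Node;

/* Array-based binary min heap of node pointers, keyed on freq. */
typedef struct {
    Node **a;
    size_t size;
} MinHeap;

static Node *new_node(size_t freq, char symbol, Node *left, Node *right)
{
    Node *p = malloc(sizeof *p);        /* dynamic allocation, as required */
    assert(p != NULL);
    p->freq = freq;
    p->symbol = symbol;
    p->left = left;
    p->right = right;
    p->parent = NULL;
    if (left != NULL)
        left->parent = p;               /* father links for climbing later */
    if (right != NULL)
        right->parent = p;
    return p;
}

static void heap_insert(MinHeap *h, Node *p)
{
    size_t i = h->size++;
    while (i > 0 && h->a[(i - 1) / 2]->freq > p->freq) {   /* sift up */
        h->a[i] = h->a[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    h->a[i] = p;
}

static Node *heap_delete_min(MinHeap *h)
{
    assert(h->size > 0);
    Node *min = h->a[0];
    Node *last = h->a[--h->size];
    size_t i = 0, child;
    while ((child = 2 * i + 1) < h->size) {                 /* sift down */
        if (child + 1 < h->size && h->a[child + 1]->freq < h->a[child]->freq)
            child++;
        if (last->freq <= h->a[child]->freq)
            break;
        h->a[i] = h->a[child];
        i = child;
    }
    if (h->size > 0)
        h->a[i] = last;
    return min;
}

/* Steps 1-4 for n >= 1 symbols. leaves[i] receives the leaf for
   symbols[i] (the role of position[] in the pseudocode). */
Node *build_huffman(const char symbols[], const size_t freq[], size_t n,
                    Node *leaves[])
{
    assert(n >= 1);
    MinHeap h = { malloc(n * sizeof(Node *)), 0 };  /* never holds more than n */
    assert(h.a != NULL);
    for (size_t i = 0; i < n; i++) {                /* step 1 */
        leaves[i] = new_node(freq[i], symbols[i], NULL, NULL);
        heap_insert(&h, leaves[i]);
    }
    while (h.size > 1) {                            /* steps 2-4 */
        Node *p1 = heap_delete_min(&h);
        Node *p2 = heap_delete_min(&h);
        heap_insert(&h, new_node(p1->freq + p2->freq, '\0', p1, p2));
    }
    Node *root = heap_delete_min(&h);
    free(h.a);
    return root;
}

/* Release every node of the tree. */
void free_tree(Node *p)
{
    if (p == NULL)
        return;
    free_tree(p->left);
    free_tree(p->right);
    free(p);
}
```

When two nodes have equal frequencies (C2 and BD2 in the example), the order in which they leave the heap depends on the heap layout; a different tie-break yields a different but equally short code.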


Huffman's Algorithm (9)

Once the Huffman tree is constructed, the code of any symbol can be constructed by starting at the leaf representing that symbol and climbing up to the root. The code is initialized to null (the empty code). Each time a left branch is climbed, 0 is prepended to the code; each time a right branch is climbed, 1 is prepended to the code.
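A minimal C sketch of this climbing procedure, assuming each node stores a parent pointer; the `Node` type below is hypothetical and holds only the links needed here. The code is built in a character buffer, and each step prepends '0' or '1' by shifting the existing code one position to the right.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tree node with only the links needed for climbing. */
typedef struct Node {
    struct Node *left, *right, *parent;
} Node;

/* Build the code of leaf p by climbing to the root, prepending '0' for a
   left branch and '1' for a right branch. cap must exceed the leaf depth. */
void code_by_climbing(const Node *p, char *out, size_t cap)
{
    size_t len = 0;
    assert(cap > 0);
    out[0] = '\0';                          /* the code starts empty */
    for (; p->parent != NULL; p = p->parent) {
        assert(len + 1 < cap);
        memmove(out + 1, out, len + 1);     /* shift code and '\0' right */
        out[0] = (p == p->parent->left) ? '0' : '1';
        len++;
    }
}
```

For the tree built in the Optimal Encoding Scheme, climbing from B's leaf gives 110 and climbing from A's leaf gives 0.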

Huffman's Algorithm (10)

VAR
  position[i] : a pointer to the leaf containing the ith symbol
  n : the number of symbols /* with nonzero frequency */
  frequency[i] : the relative frequency of the ith symbol
  code[i] : the code assigned to the ith symbol
  p, p1, p2 : pointers to a min heap node or a Huffman tree node

Main Function {
  initialization;
  count the frequency of each symbol within the message;

  // construct a node for each symbol
  for (i = 0; i < n; i++) {
    <p> = create a node with key <frequency[i]>;
    position[i] = p;  // a pointer to the leaf containing the ith symbol
    insert <p> into Min heap;
  } // end for

Huffman's Algorithm (11)

  while (Min heap contains more than one item) {
    <p1> = delete Min heap;
    <p2> = delete Min heap;

    // combine p1 and p2 as branches of a single tree
    <p> = create a node with key <info(p1) + info(p2)>;
    set <p1> to be left_child of Huffman tree node p;   // p becomes the father of p1
    set <p2> to be right_child of Huffman tree node p;  // p becomes the father of p2
    insert <p> into Min heap;
  } // end while

Huffman's Algorithm (12)

  // the tree is now constructed; use it to find the codes
  <root> = delete Min heap;
  for (i = 0; i < n; i++) {
    p = position[i];
    code[i] = NULL;  // the empty code
    while (p != root) {
      // travel up to the root
      if (is left <p>)
        code[i] = 0 followed by code[i];
      else
        code[i] = 1 followed by code[i];
      <p> = move <p> to father node;
    } // end while
  } // end for
} // end main