Lec7 8 9_10 coding techniques

INTERACTIVE MULTIMEDIA SYSTEMS

CODING TECHNIQUES

RUN LENGTH CODING

• Suited to compressing any type of data regardless of its content, although the content affects the compression ratio. It typically achieves low compression ratios, but it is easy to implement and quick to execute. It works by reducing the size of repeating strings of characters. A repeating string, called a RUN, is typically encoded into two bytes. The first byte holds the number of characters in the run and is called the RUN COUNT. The second byte holds the value of the character in the run and is called the RUN VALUE.

• This is most useful on data that contains many such runs: for example, relatively simple graphic images such as icons, line drawings, and animations. It is not useful with files that don't have many runs as it could potentially double the file size.

• RLE (run-length encoding) also refers to a little-used image format.

• Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line, with B representing a black pixel and W representing white:

• WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWW

• If we apply the run-length encoding (RLE) data compression algorithm to the above hypothetical scan line, we get the following:

– 12W1B12W3B15W

• Interpret this as twelve W's, one B, twelve W's, three B's, etc.

• The run-length code represents the original 43 characters in only 13 characters.
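The scan-line example can be reproduced with a short Python sketch. The function names are hypothetical, the text form assumes the symbols themselves are never digits, and the two-byte form assumes a run count of at most 255 per pair (longer runs are split), which the slides do not specify.

```python
from itertools import groupby


def rle_encode_text(line: str) -> str:
    """Encode each run as <count><symbol>, e.g. 'WWWB' -> '3W1B'."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(line))


def rle_decode_text(code: str) -> str:
    """Invert rle_encode_text; assumes the symbols themselves are not digits."""
    out, count = [], ""
    for ch in code:
        if ch.isdigit():
            count += ch              # accumulate a multi-digit run count
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


def rle_encode_bytes(data: bytes) -> bytes:
    """Two-byte form: RUN COUNT (1..255, an assumed cap) followed by RUN VALUE."""
    out = bytearray()
    for value, run in groupby(data):
        n = len(list(run))
        while n > 0:                 # split runs too long for one count byte
            chunk = min(n, 255)
            out += bytes([chunk, value])
            n -= chunk
    return bytes(out)


scan_line = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 15
encoded = rle_encode_text(scan_line)
print(encoded, len(scan_line), len(encoded))   # 12W1B12W3B15W 43 13
print(len(rle_encode_bytes(b"ABCDEFGH")))      # 16: no runs, so the size doubles
```

The last line illustrates the worst case mentioned earlier: eight bytes with no repeated neighbours become sixteen.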

HUFFMAN CODING

• The Huffman coding algorithm determines an optimal symbol-by-symbol code: one that uses the minimum average number of bits per symbol for the given symbol frequencies. Huffman codes have the unique prefix property, meaning that no code word is a prefix of another, so they can be decoded correctly despite being variable length. The procedure for building the tree is simple and elegant. The individual symbols are laid out as a string of leaf nodes that are going to be connected by a binary tree. Each node has a weight, which is simply the frequency or probability of the symbol’s appearance.

• The tree is then built with the following steps:

1. The two free nodes with the lowest weights are located.

2. A parent node for these two nodes is created.

3. It is assigned a weight equal to the sum of the two child nodes.

4. The parent node is added to the list of free nodes, and the two child nodes are removed from that list.

5. The previous steps are repeated until only one free node is left. This free node is designated the root of the tree.
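The five steps can be sketched in Python with a priority queue holding the free nodes. This is a minimal sketch, not a reference implementation: the function names are hypothetical, a parent is represented as a (left, right) tuple (so symbols must not themselves be tuples), at least one symbol is assumed, and the weights in the usage lines are the six frequencies from this section's worked example.

```python
import heapq
from itertools import count


def build_huffman_tree(weights):
    """Return (root_weight, tree); a leaf is its symbol, a parent is a (left, right) tuple."""
    tie = count()  # tie-breaker so the heap never has to compare two subtrees
    free = [(w, next(tie), sym) for sym, w in weights.items()]
    heapq.heapify(free)
    while len(free) > 1:
        w1, _, a = heapq.heappop(free)   # step 1: the two lowest-weight free nodes
        w2, _, b = heapq.heappop(free)
        parent = (a, b)                  # step 2: create their parent
        # steps 3 and 4: weight is the sum; parent replaces its children in the list
        heapq.heappush(free, (w1 + w2, next(tie), parent))
    root_weight, _, root = free[0]       # step 5: the last free node is the root
    return root_weight, root


def code_lengths(tree, depth=0):
    """Depth of every leaf, i.e. the length in bits of each symbol's code."""
    if not isinstance(tree, tuple):
        return {tree: max(depth, 1)}     # a lone symbol still needs one bit
    left, right = tree
    return {**code_lengths(left, depth + 1), **code_lengths(right, depth + 1)}


root_weight, tree = build_huffman_tree({1: 5, 2: 7, 3: 10, 4: 15, 5: 20, 6: 45})
print(root_weight)                        # 102
print(sorted(code_lengths(tree).items())) # [(1, 4), (2, 4), (3, 3), (4, 3), (5, 3), (6, 1)]
```

Which child goes left or right is arbitrary here, so the exact bit patterns may differ from a hand-drawn tree, but the code lengths do not.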

• To generate a Huffman code, traverse the tree from the root to the value you want, outputting a 0 every time you take a left-hand branch and a 1 every time you take a right-hand branch.

• Let's say you have a set of numbers and their frequencies of use, and you want to create a Huffman encoding for them:

FREQUENCY   VALUE
    5         1
    7         2
   10         3
   15         4
   20         5
   45         6

• Sort the list in ascending order of weights and then just follow the steps.

• Create a parent node with a frequency that is the sum of the two lowest elements' frequencies:

     12 : *
    /      \
5 : 1      7 : 2

• The two elements are removed from the list and the new parent node, with frequency 12, is inserted into the list by frequency. So now the list, sorted by frequency, is:

10 : 3

12 : *

15 : 4

20 : 5

45 : 6

• You then repeat the loop, combining the two lowest elements.

       22 : *
      /      \
 10 : 3      12 : *
            /      \
        5 : 1      7 : 2

• The list is now:

15 : 4

20 : 5

22 : *

45 : 6

• You repeat until there is only one element left in the list.

      35 : *
     /      \
15 : 4      20 : 5

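• The list is now:

22 : *

35 : *

45 : 6

• Combining the two lowest elements, 22 and 35, gives a parent with frequency 57:

      57 : *
     /      \
22 : *      35 : *

• The list now holds only 45 : 6 and 57 : *. Combining these last two elements gives the root, with frequency 102:

      102 : *
     /       \
57 : *       45 : 6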

[Figure: the completed Huffman tree, with each left-hand branch labelled 0 and each right-hand branch labelled 1.]

Thus the code for 15:4 becomes 010, the code for 5:1 becomes 0010, and so on.
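The traversal rule can be checked against the finished tree with a small Python sketch. The tree is written by hand as nested (left, right) pairs; its left/right arrangement is an assumption, chosen so that it agrees with the two codes quoted above (010 and 0010). The function names are hypothetical.

```python
# Tree from the worked example; leaves are the values 1..6.
# Left/right placement is inferred from the two quoted codes, not taken from a figure.
TREE = (                      # 102 : *
    (                         # 57 : *
        (3, (1, 2)),          # 22 : * -> 10:3 and 12:* (5:1, 7:2)
        (4, 5),               # 35 : * -> 15:4 and 20:5
    ),
    6,                        # 45 : 6
)


def assign_codes(node, prefix=""):
    """Walk the tree, appending 0 for a left branch and 1 for a right branch."""
    if not isinstance(node, tuple):
        return {node: prefix}
    left, right = node
    return {**assign_codes(left, prefix + "0"), **assign_codes(right, prefix + "1")}


def decode(bits, tree=TREE):
    """Decode a bit string by walking down from the root, restarting at each leaf."""
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]            # 0 selects the left child, 1 the right
        if not isinstance(node, tuple):  # reached a leaf: emit it and start over
            out.append(node)
            node = tree
    return out


codes = assign_codes(TREE)
print(codes[4], codes[1])                # 010 0010
message = [6, 4, 1, 6]
bits = "".join(codes[v] for v in message)
print(bits, decode(bits))                # 101000101 [6, 4, 1, 6]
```

Because no code is a prefix of another, the decoder needs no separators between code words. With these codes the 102 occurrences in the frequency table take 228 bits in total, against 306 bits for a fixed 3-bit code.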

ARITHMETIC CODING

• Arithmetic coding bypasses the idea of replacing each input symbol with a specific code. Instead, it replaces a stream of input symbols with a single floating-point output number. The output from an arithmetic coding process is a single number less than 1 and greater than or equal to 0.

• This single number can be uniquely decoded to create the exact stream of symbols that went into its construction. To construct the output number, the symbols are assigned a set of probabilities.

• The message “BILL GATES,” for example, would have a probability distribution like this:

Character   Probability
SPACE       1/10
A           1/10
B           1/10
E           1/10
G           1/10
I           1/10
L           2/10
S           1/10
T           1/10

• Once character probabilities are known, individual symbols need to be assigned a range along a “probability line,” nominally 0 to 1. The nine-character symbol set used here would look like the following:

Character   Probability   Range
SPACE       1/10          0.00 <= r < 0.10
A           1/10          0.10 <= r < 0.20
B           1/10          0.20 <= r < 0.30
E           1/10          0.30 <= r < 0.40
G           1/10          0.40 <= r < 0.50
I           1/10          0.50 <= r < 0.60
L           2/10          0.60 <= r < 0.80
S           1/10          0.80 <= r < 0.90
T           1/10          0.90 <= r < 1.00
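The range table can be derived from the message itself. The sketch below is one way to do it in Python, under two assumptions: symbols are laid along the probability line in sorted character order (which happens to match the table, since the space sorts before the letters), and exact fractions are used instead of floating point. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def build_ranges(message):
    """Map each symbol to its half-open range [low, high) on the 0-to-1 line."""
    counts = Counter(message)
    total = len(message)
    ranges, low = {}, Fraction(0)
    for symbol in sorted(counts):        # assumed order: sorted by character code
        high = low + Fraction(counts[symbol], total)
        ranges[symbol] = (low, high)     # width equals the symbol's probability
        low = high                       # the next range starts where this one ends
    return ranges


for symbol, (low, high) in build_ranges("BILL GATES").items():
    print(repr(symbol), float(low), float(high))
```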

• Each character is assigned the portion of the 0 to 1 range that corresponds to its probability of appearance. The most significant portion of an arithmetic-coded message belongs to the first symbol to be encoded, which is B in the message “BILL GATES.”

• To decode the first character properly, the final coded message has to be a number greater than or equal to .20 and less than .30. To encode this number, track the range it could fall in. After the first character is encoded, the low end for this range is .20 and the high end is .30. During the rest of the encoding process, each new symbol will further restrict the possible range of the output number. The next character to be encoded, the letter I, owns the range .50 to .60 within the new subrange of .2 to .3.

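• So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range. Applying this logic further restricts the number to the range .25 to .26. The same step works for a message of any length: start with low = 0 and high = 1, and for each input symbol compute range = high - low, then set high = low + range × (the symbol's high bound) and low = low + range × (the symbol's low bound), using the old value of low in both formulas. After the last symbol, output low.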

So the final low value, 0.2572167752, will uniquely encode the message “BILL GATES” using our present coding scheme.
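The narrowing of the range can be reproduced with a minimal Python sketch. It assumes the range table given earlier for “BILL GATES” and uses exact fractions rather than floating point so that the ten narrowing steps lose no precision; the names `TENTHS`, `RANGES` and `arithmetic_encode` are hypothetical.

```python
from fractions import Fraction

# Range table from the text, written in tenths: symbol -> (low, high).
TENTHS = {" ": (0, 1), "A": (1, 2), "B": (2, 3), "E": (3, 4), "G": (4, 5),
          "I": (5, 6), "L": (6, 8), "S": (8, 9), "T": (9, 10)}
RANGES = {s: (Fraction(lo, 10), Fraction(hi, 10)) for s, (lo, hi) in TENTHS.items()}


def arithmetic_encode(message, ranges=RANGES):
    """Narrow [low, high) once per symbol and return the final pair."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = ranges[symbol]
        high = low + width * sym_high    # computed first, so it uses the old low
        low = low + width * sym_low
    return low, high


print(arithmetic_encode("BI"))           # (Fraction(1, 4), Fraction(13, 50)): .25 to .26
low, high = arithmetic_encode("BILL GATES")
print(float(low), float(high))           # 0.2572167752 0.2572167756
```

Any number in the final interval, including the low end itself, identifies the message under this table.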

Decoding The Text

• Given this encoding scheme, it is relatively easy to see how the decoding process operates.

• Find the first symbol in the message by seeing which symbol owns the space our encoded message falls in.

• Since 0.2572167752 falls between .2 and .3, the first character must be B. Then remove B from the encoded number.

• Since we know the low and high bounds of B's range, we can remove their effects by reversing the process that put them in.

• First, subtract the low value of B, giving .0572167752. Then divide by the width of the range of B, which is .1. This gives a value of .572167752. Then find where that lands, which is in the range of the next letter, I.
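The decoding steps just described can be sketched in Python as follows. Two assumptions are made that the text does not state: the decoder is told the message length (the encoded number alone does not say where to stop), and exact fractions are used so that repeated subtraction and division stay exact. The names are hypothetical and the range table is the one given earlier.

```python
from fractions import Fraction

# Range table from the text, written in tenths: symbol -> (low, high).
TENTHS = {" ": (0, 1), "A": (1, 2), "B": (2, 3), "E": (3, 4), "G": (4, 5),
          "I": (5, 6), "L": (6, 8), "S": (8, 9), "T": (9, 10)}
RANGES = {s: (Fraction(lo, 10), Fraction(hi, 10)) for s, (lo, hi) in TENTHS.items()}


def arithmetic_decode(number, length, ranges=RANGES):
    """Recover `length` symbols from an encoded number in [0, 1)."""
    out = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in ranges.items():
            if sym_low <= number < sym_high:   # which symbol owns this value?
                break
        else:
            raise ValueError("encoded number is outside the range 0 <= r < 1")
        out.append(symbol)
        # Remove the symbol's effect: subtract its low bound, divide by its width.
        number = (number - sym_low) / (sym_high - sym_low)
    return "".join(out)


print(arithmetic_decode(Fraction("0.2572167752"), 10))   # BILL GATES
```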

The algorithm for decoding the incoming number is