Information Theory
Computer Engineering Department
Second Year
Dr. Eng. Riyadh J.S. Al-Bahadili
Dr. Riyadh Jabbar Information Theory 2nd class computer engineering
Information theory provides a quantitative measure of the information contained in message signals and
allows us to determine the capacity of a communication system to transfer this information from source
to destination. Information theory was originally known as the 'Mathematical Theory of
Communication'. It deals with the mathematical modeling and analysis of a communication system
rather than with the physical channel.
Information theory tells us the most efficient way to communicate data. In particular, it
provides limits on:
1. The minimum number of bits per symbol required to fully represent the source.
2. The maximum rate at which reliable communication can take place over the channel.
1. Concept of Information
An information source is an object that produces an event whose outcome is selected at random
according to a probability distribution.
A discrete information source is a source that has only a finite set of symbols as outputs. The set of source
symbols is called the source alphabet, and the elements of the set are called symbols or letters. Information
sources can be classified as having memory or being memoryless.
A source with memory is one for which the current symbol depends on the previous symbols. A
memoryless source is one for which each symbol produced is independent of the previous symbols.
A communication system can never be described in a deterministic sense; it is statistical in
nature. To describe a communication system completely, we have to account for its
unpredictable (uncertain) behavior.
This is easy to see with an example: a transmitter sends information randomly, so we cannot
predict which message it is going to send at the next moment.
We do, however, know the probability of transmitting a particular message.
So, to describe a system completely, we need a statistical study of the system, and this study
is performed with the help of probability.
Consider, for example, the following two messages:
(a) Bird flies
(b) Cat flies
Sentence (a) carries very little information, whereas sentence (b) carries a lot of information, because
sentence (b) is the least likely to occur. So there is an inverse relationship between the probability of
an event and the amount of information associated with it:
$$I(x_i) \propto \frac{1}{P(x_i)}$$
where $x_i$ is an event with probability $P(x_i)$ and $I(x_i)$ is the amount of information associated with it. For simplicity, information is generally measured on a logarithmic scale:
$$I(x_i) = \log_2\frac{1}{P(x_i)} = -\log_2 P(x_i)$$
Example.1:
How many bits per symbol are needed to encode 32 different, equally likely symbols?
We have M = 32 symbols, so $P(x_i) = 1/32$ and $I(x_i) = \log_2 32 = 5$ bits/symbol.
The advantage of the logarithmic measure is that, for a joint event $(x_i, y_j)$ whose components are statistically independent,
$$P(x_i, y_j) = P(x_i)\,P(y_j) \quad\Rightarrow\quad I(x_i, y_j) = I(x_i) + I(y_j) \quad \text{[proof H.W]}$$
Example.2:
The symbols A, B, C, and D occur with probabilities ½, ¼, 1/8, and 1/8, respectively. Find the information content of the message 'BDA', where the symbols are independent.
$I(BDA) = I(B) + I(D) + I(A) = \log_2 4 + \log_2 8 + \log_2 2 = 2 + 3 + 1 = 6$ bits
Since the base of the logarithm can differ, information can be expressed in different units:
• Bits (base 2)
• Nats (base e)
• Decits (base 10)
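For example: 1 bit = $1/\log_2 e = \ln 2 = 0.6931$ nat, 1 bit = $1/\log_2 10 = \log_{10} 2 = 0.3010$ decit, and 1 decit = $1/\log_{10} 2 = \log_2 10 = 3.3219$ bits.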
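As a quick numerical check of Examples 1 and 2 and of the unit conversions above, here is a minimal Python sketch. `self_information` is a hypothetical helper name; its `base` argument selects the unit (2 for bits, e for nats, 10 for decits).

```python
import math

def self_information(p, base=2):
    """I(x) = log_base(1/p): information carried by an event of probability p.
    base=2 gives bits, base=math.e gives nats, base=10 gives decits."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return math.log(1 / p) / math.log(base)

# Example 1: 32 equally likely symbols
print(self_information(1 / 32))                        # approximately 5 bits/symbol

# Example 2: independent symbols, so their information adds up
probs = {"A": 1 / 2, "B": 1 / 4, "C": 1 / 8, "D": 1 / 8}
print(sum(self_information(probs[s]) for s in "BDA"))  # approximately 6 bits

# Unit conversions
print(self_information(1 / 2, math.e))   # 1 bit in nats, about 0.6931
print(self_information(1 / 2, 10))       # 1 bit in decits, about 0.3010
print(self_information(1 / 10, 2))       # 1 decit in bits, about 3.3219
```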
2. Entropy
In a communication system there is not just a single message but a number of
messages. So, instead of calculating the information of each individual message and adding them,
we calculate the average information of the system, known as the entropy
of the source.
Let there be M different messages $m_1, m_2, \ldots, m_M$ with respective probabilities $P_1, P_2, \ldots, P_M$.
Assume that, over a long time interval, L messages have been generated, where L is very large ($L \gg M$). Then the number of $m_1$ messages is $P_1 L$.
The amount of information in one message $m_1$ is $\log_2(1/P_1)$.
Thus, the total amount of information in all $m_1$ messages is $P_1 L \log_2(1/P_1)$.
The total amount of information in all L messages will be
$$I_t = P_1 L \log_2\frac{1}{P_1} + P_2 L \log_2\frac{1}{P_2} + \cdots + P_M L \log_2\frac{1}{P_M}$$
So, the average information will be
$$H = \frac{I_t}{L} = P_1 \log_2\frac{1}{P_1} + P_2 \log_2\frac{1}{P_2} + \cdots + P_M \log_2\frac{1}{P_M}$$
or
$$H = \sum_{i=1}^{M} P_i \log_2\frac{1}{P_i}$$
Thus the unit of entropy is information/message (bits/message). $I(x)$ is called self-information and $H(x)$ is simply called self-entropy.
Example.3:
A discrete source has 4 symbols x = [x1, x2, x3, x4] with probabilities P = [1/2, 1/4, 1/8, 1/8]. Find the information content of each symbol, then calculate the entropy. Also calculate the bit average for the message 'x1 x2 x1 x4 x3 x1 x1 x2'.
$I(x_i) = \log_2 \frac{1}{P(x_i)}$: I(x1) = 1 bit, I(x2) = 2 bits, I(x3) = 3 bits, I(x4) = 3 bits.
$$H(X) = -\sum_i P(x_i)\log_2 P(x_i) = \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(3) = 1.75 \text{ bits/symbol}$$
The message 'x1 x2 x1 x4 x3 x1 x1 x2' has 8 symbols; coding each symbol with $I(x_i)$ bits gives 4(1) + 2(2) + 3 + 3 = 14 bits.
So the bit average = 14/8 = 1.75 bits/symbol.
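Example 3 can be reproduced with a short Python sketch. It assumes, as the example's count of 14 bits implies, that each symbol is coded with exactly $I(x_i)$ bits; `entropy` is a hypothetical helper name.

```python
import math

def entropy(probs):
    """H(X) = -sum P(x_i) log2 P(x_i) in bits/symbol (zero probabilities skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = {"x1": 1 / 2, "x2": 1 / 4, "x3": 1 / 8, "x4": 1 / 8}
print(entropy(probs.values()))                     # 1.75 bits/symbol

message = "x1 x2 x1 x4 x3 x1 x1 x2".split()
total_bits = sum(math.log2(1 / probs[s]) for s in message)   # I(x_i) bits per symbol
print(total_bits, total_bits / len(message))       # 14.0 bits, 1.75 bits/symbol
```

Because these probabilities are powers of 1/2, the bit average of this particular message equals the entropy exactly.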
Exercise.1: For a binary discrete source X with two symbols x1 and x2, prove mathematically that H(X) is maximum when p(x1) = p(x2), and that max H(X) = 1 bit/symbol.
3. Rate of Information
If a message source generates messages (or symbols) at the rate of r messages (or symbols) per second, then the rate of information is
$$R = rH \quad \text{bits/second}$$
Example.4:
An event has six possible outcomes with probabilities P1 = 1/2, P2 = 1/4, P3 = 1/8, P4 = 1/16, P5 = 1/32, and P6 = 1/32. Find the entropy of the system. Also find the rate of information if there are 18 outcomes per second.
Solution. By the entropy formula $H = \sum_i P_i \log_2(1/P_i)$:
$$H = \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{8}\log_2 8 + \tfrac{1}{16}\log_2 16 + \tfrac{1}{32}\log_2 32 + \tfrac{1}{32}\log_2 32 = \tfrac{31}{16} \text{ bits/message}$$
Now r = 18 outcomes/second, so
$$R = rH = 18 \times \tfrac{31}{16} = 34.875 \text{ bits/sec}$$
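A minimal Python sketch of $R = rH$ for Example 4. It uses plain floats; because these probabilities are all powers of 1/2, the arithmetic happens to be exact.

```python
import math

def entropy(probs):
    """H = sum P_i log2(1/P_i) in bits/message; zero-probability outcomes are skipped."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

probs = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 32]
H = entropy(probs)   # bits/message
r = 18               # outcomes per second
R = r * H            # rate of information, bits/second
print(H, R)          # 1.9375 (= 31/16) and 34.875
```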
Exercise.2: A discrete source emits one of five symbols once every microsecond with probabilities ½, ¼, 1/8, 1/16, 1/32, respectively. Determine the entropy and the information rate.
(Check answer: H=57/32 bits/symbol, R=1.78125 Mb/s)
Exercise.3: A TV picture consists of 2 × 10^6 pixels and 16 different grey levels. Pictures are repeated at the rate of 32 pictures/sec. All grey levels are equally likely. Find the average rate of information conveyed by this TV.
(Check answer: R= 256 Mbits/sec)
Exercise.4: A telegraph source has two symbols (dot and dash). The dot lasts 0.5 sec, the dash lasts three times as long as the dot, and the time between symbols is 0.2 sec. The probability of a dot occurring is twice that of a dash. Find the average rate of information for this telegraph.
(Check answer: R= 1.725 bit/sec)
4. Discrete Memoryless Channel (DMC)
A communication channel is the path or medium through which the symbols flow to the receiver. A DMC is a statistical model with input X and output Y. It is memoryless when the current output depends only on the current input and not on any of the previous inputs.
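4.1 Conditional Probability
$p(y_j/x_i)$ represents the conditional probability of obtaining output $y_j$ given that the input is $x_i$; it is also called the channel transition probability.
(Figure: a channel with inputs x1, x2, ..., xm (X) and outputs y1, y2, ..., yn (Y); input x_i reaches output y_j with probability p(y_j/x_i).)
A channel is completely specified by the complete set of transition probabilities:
$$P(Y/X) = \begin{bmatrix} p(y_1/x_1) & p(y_2/x_1) & \cdots & p(y_n/x_1) \\ p(y_1/x_2) & p(y_2/x_2) & \cdots & p(y_n/x_2) \\ \vdots & \vdots & \ddots & \vdots \\ p(y_1/x_m) & p(y_2/x_m) & \cdots & p(y_n/x_m) \end{bmatrix}$$
The matrix $P(Y/X)$ is called the channel matrix, and each row of this matrix must sum to unity:
$$\sum_{j=1}^{n} p(y_j/x_i) = 1 \quad \text{for every } i$$
Let
P(X) = [p(x1) p(x2) … p(xm)]
P(Y) = [p(y1) p(y2) … p(yn)]
Then
$$P(Y) = P(X)\,P(Y/X)$$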
4.2 Joint Probability
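If P(X) is represented as a diagonal matrix:
$$P(X) = \begin{bmatrix} p(x_1) & 0 & \cdots & 0 \\ 0 & p(x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p(x_m) \end{bmatrix}$$
then
$$P(X,Y) = P(X)\,P(Y/X)$$
The matrix P(X, Y) is the joint probability matrix, and its element $p(x_i, y_j)$ is the joint probability of transmitting $x_i$ and receiving $y_j$:
$$p(x_i, y_j) = p(x_i)\,p(y_j/x_i) = p(y_j)\,p(x_i/y_j)$$
Note:
$$p(x_i) = \sum_{j=1}^{n} p(x_i, y_j) \quad\text{and}\quad p(y_j) = \sum_{i=1}^{m} p(x_i, y_j)$$
Example.5: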
For the following binary channel (p(x1) = p(x2) = 1/2):
(Channel diagram: x1 → y1 with probability 0.9, x1 → y2 with 0.1, x2 → y1 with 0.2, x2 → y2 with 0.8.)
a. Construct the channel matrix for this channel.
b. Find p(y1) and p(y2).
c. Find the joint probabilities p(x1, y2) and p(x2, y1).
Solution:
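(a)
$$P(Y/X) = \begin{bmatrix} p(y_1/x_1) & p(y_2/x_1) \\ p(y_1/x_2) & p(y_2/x_2) \end{bmatrix} = \begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix}$$
(b) P(X) = [p(x1) p(x2)] = [0.5 0.5]
$$P(Y) = P(X)\,P(Y/X) = [0.5 \;\; 0.5]\begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} = [0.55 \;\; 0.45] = [p(y_1) \;\; p(y_2)]$$
(c)
$$P(X,Y) = P(X)\,P(Y/X) = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}\begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} = \begin{bmatrix} 0.45 & 0.05 \\ 0.1 & 0.4 \end{bmatrix}$$
Hence p(x1, y2) = 0.05 and p(x2, y1) = 0.1.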
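The matrix operations of Example 5 can be written out in plain Python, with no external libraries assumed. `output_probs` and `joint_probs` are hypothetical helper names for $P(Y) = P(X)P(Y/X)$ and for scaling row i of the channel matrix by $p(x_i)$ (the diagonal-matrix product).

```python
def output_probs(px, channel):
    """P(Y) = P(X) P(Y/X): p(y_j) = sum_i p(x_i) p(y_j/x_i)."""
    return [sum(px[i] * channel[i][j] for i in range(len(px)))
            for j in range(len(channel[0]))]

def joint_probs(px, channel):
    """P(X,Y) with P(X) as a diagonal matrix: p(x_i, y_j) = p(x_i) p(y_j/x_i)."""
    return [[px[i] * pyx for pyx in row] for i, row in enumerate(channel)]

channel = [[0.9, 0.1],   # row i lists p(y_j/x_i); each row must sum to 1
           [0.2, 0.8]]
px = [0.5, 0.5]

assert all(abs(sum(row) - 1) < 1e-12 for row in channel)
print(output_probs(px, channel))   # approximately [0.55, 0.45]
print(joint_probs(px, channel))    # approximately [[0.45, 0.05], [0.1, 0.4]]
```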
Exercise.5:
The following channel matrix has P(X) = [0.5 0.5]:
$$P(Y/X) = \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}$$
a. Draw the channel.
b. Find P(Y) if p = 0.2.
5. Joint Entropy and Conditional Entropy
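Suppose the transmitter can send m messages in total:
[X] = [x1, x2, ..., xm]
and at the receiver we can receive n messages in total:
[Y] = [y1, y2, ..., yn]
Then $P(x_i)$ is called the marginal probability of the x messages, and $P(y_j)$ is called the marginal probability of the y messages.
Thus, the marginal entropies are
$$H(X) = -\sum_{i=1}^{m} P(x_i)\log_2 P(x_i), \qquad H(Y) = -\sum_{j=1}^{n} P(y_j)\log_2 P(y_j)$$
where $P(x_i)$ may be obtained as $P(x_i) = \sum_{j=1}^{n} P(x_i, y_j)$ (Note: $0 \le H(X) \le \log_2 m$), and $P(x_i, y_j)$ is the joint probability of events $x_i$ and $y_j$.
Joint entropy of X and Y:
$$H(X,Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\log_2 P(x_i, y_j)$$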
Note that $P(x_i/y_j)$ and $P(y_j/x_i)$ are conditional probabilities, defined as follows:
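$P(x_i/y_j)$ = probability of $x_i$ when $y_j$ has been received; $P(y_j/x_i)$ = probability of $y_j$ when $x_i$ has been transmitted.
For the conditional entropies, we use the relation
$$H(X/Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\log_2 P(x_i/y_j)$$
Similarly
$$H(Y/X) = -\sum_{i=1}^{m}\sum_{j=1}^{n} P(x_i, y_j)\log_2 P(y_j/x_i)$$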
6. Relations between the different entropies
The relations between the joint, conditional, and marginal entropies are given by
$$H(X,Y) = H(X/Y) + H(Y) = H(Y/X) + H(X)$$
Exercise.6:
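Find H(X), P(X, Y), and H(Y) for the channel described below, given that P(x1) = 0.2, P(x2) = 0.5, and P(x3) = 0.3.
(Channel diagram: x1 → y1 with probability 0.8, x1 → y2 with 0.2, x2 → y2 with 1, x3 → y2 with 0.3, x3 → y3 with 0.7.) The corresponding channel matrix is
$$P(Y/X) = \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0 & 1 & 0 \\ 0 & 0.3 & 0.7 \end{bmatrix}$$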
Exercise.7:
A transmitter has an alphabet of four letters [x1, x2, x3, x4] and the receiver has an alphabet of three letters. Calculate all entropies if the joint probability matrix is:
$$P(X,Y) = \begin{bmatrix} 0.3 & 0.05 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0.15 & 0.05 \\ 0 & 0.05 & 0.15 \end{bmatrix}$$
(Check answer: H(X) = 1.96, H(Y/X) = 0.53, H(X, Y) = 2.49, H(Y) = 1.49, H(X/Y) = 1.0)
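The check answers of Exercise 7 can be verified with a short Python sketch. It assumes the joint matrix reads as in the code (rows x1 to x4, columns y1 to y3), which is the reading of the flattened matrix that reproduces every check answer. The conditional entropies are computed directly from their definitions, so the relation $H(X,Y) = H(Y/X) + H(X)$ is checked independently rather than assumed.

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

pxy = [[0.30, 0.05, 0.00],   # assumed reading of P(X, Y) in Exercise 7
       [0.00, 0.25, 0.00],
       [0.00, 0.15, 0.05],
       [0.00, 0.05, 0.15]]

px = [sum(row) for row in pxy]          # p(x_i) = sum_j p(x_i, y_j)
py = [sum(col) for col in zip(*pxy)]    # p(y_j) = sum_i p(x_i, y_j)
pairs = [(i, j) for i in range(len(px)) for j in range(len(py)) if pxy[i][j] > 0]

HX, HY = H(px), H(py)
HXY = H([p for row in pxy for p in row])
HY_X = -sum(pxy[i][j] * math.log2(pxy[i][j] / px[i]) for i, j in pairs)  # log2 p(y_j/x_i)
HX_Y = -sum(pxy[i][j] * math.log2(pxy[i][j] / py[j]) for i, j in pairs)  # log2 p(x_i/y_j)

print(f"H(X)={HX:.2f}  H(Y)={HY:.2f}  H(X,Y)={HXY:.2f}  H(Y/X)={HY_X:.2f}  H(X/Y)={HX_Y:.2f}")
print(abs(HXY - (HY_X + HX)) < 1e-9, abs(HXY - (HX_Y + HY)) < 1e-9)   # both relations hold
```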
7. Mutual Information
We have
$P(x_i)$ = probability of transmitting $x_i$,
$P(x_i/y_j)$ = probability that $x_i$ was transmitted when $y_j$ has been received.
Thus $\log_2[1/P(x_i)]$ measures the uncertainty about x before anything is received (the prior uncertainty), and $\log_2[1/P(x_i/y_j)]$ measures the uncertainty about x that remains after $y_j$ has been received (the final uncertainty). The difference between these uncertainties, averaged over all input/output pairs, is called the mutual information. Mutual information represents the uncertainty about the input that is resolved by observing the output:
$$I(X;Y) = \sum_{i}\sum_{j} P(x_i, y_j)\log_2\frac{P(x_i/y_j)}{P(x_i)}$$
Properties of I(X; Y):
• I(X; Y) = I(Y; X)
• I(X; Y) ≥ 0
• I(X; Y) = H(Y) − H(Y/X) = H(X) − H(X/Y)
• I(X; Y) = H(Y) + H(X) − H(X, Y)
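The definition of $I(X;Y)$ and its listed properties can be checked numerically. This minimal Python sketch uses the joint matrix obtained in Example 5 and the equivalent form $\log_2\frac{P(x_i/y_j)}{P(x_i)} = \log_2\frac{P(x_i,y_j)}{P(x_i)P(y_j)}$; the helper names are hypothetical.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(pxy):
    """I(X;Y) = sum_ij p(x_i,y_j) log2[p(x_i/y_j) / p(x_i)]
              = sum_ij p(x_i,y_j) log2[p(x_i,y_j) / (p(x_i) p(y_j))]."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(pxy)
               for j, p in enumerate(row) if p > 0)

pxy = [[0.45, 0.05],    # joint matrix P(X, Y) from Example 5
       [0.10, 0.40]]
pyx = [list(col) for col in zip(*pxy)]   # transpose: the same pairs seen as (y, x)

I = mutual_information(pxy)
HX = entropy([sum(row) for row in pxy])
HY = entropy([sum(col) for col in zip(*pxy)])
HXY = entropy([p for row in pxy for p in row])

print(f"I(X;Y) = {I:.4f} bits")
print(abs(I - mutual_information(pyx)) < 1e-9)   # I(X;Y) = I(Y;X)
print(I >= 0)                                    # I(X;Y) >= 0
print(abs(I - (HX + HY - HXY)) < 1e-9)           # I(X;Y) = H(X) + H(Y) - H(X,Y)
```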
8. Channel Types
8.1 Lossless Channel
Channel matrix with only one nonzero element in each column. No source information is lost in transmission.
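e.g.
$$P(Y/X) = \begin{bmatrix} 3/4 & 1/4 & 0 & 0 & 0 \\ 0 & 0 & 1/3 & 2/3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad \text{where } p(x_i/y_j) = 0 \text{ or } 1$$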
8.2 Deterministic Channel
Channel matrix with only one nonzero element in each row; that element must be 1.
e.g.
$$P(Y/X) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
8.3 Noiseless Channel
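It is both lossless and deterministic, with m = n.
(Figure: each input is connected only to its own output: x1 → y1, x2 → y2, ..., xm → yn.)
where
$$p(y_j/x_i) = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}$$
8.4 Binary Symmetric Channel (BSC)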
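$$P(Y/X) = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}$$
Example.6:
Consider a BSC with $P(x_1) = \alpha$, in which each input is received correctly with probability 1 − p and inverted with probability p (x1 → y1 and x2 → y2 with probability 1 − p; x1 → y2 and x2 → y1 with probability p).
a. Show that:
$$I(X;Y) = H(Y) + p\log_2 p + (1-p)\log_2(1-p)$$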
b. Calculate I(X; Y) for α = 0.5 and p = 0.1.
c. Repeat (b) for p = 0.5 and comment on the result.
Solution:
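a. We have $P(x_1) = \alpha$, so $P(x_2) = 1 - \alpha$.
$$P(X,Y) = P(X)\,P(Y/X) = \begin{bmatrix} \alpha & 0 \\ 0 & 1-\alpha \end{bmatrix}\begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix} = \begin{bmatrix} \alpha(1-p) & \alpha p \\ (1-\alpha)p & (1-\alpha)(1-p) \end{bmatrix}$$
$$H(Y/X) = -\sum_i\sum_j P(x_i,y_j)\log_2 P(y_j/x_i)$$
$$= -\alpha(1-p)\log_2(1-p) - \alpha p\log_2 p - (1-\alpha)p\log_2 p - (1-\alpha)(1-p)\log_2(1-p) = -p\log_2 p - (1-p)\log_2(1-p)$$
$$I(X;Y) = H(Y) - H(Y/X) = H(Y) + p\log_2 p + (1-p)\log_2(1-p)$$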
b. When α=0.5 and p=0.1
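$$P(Y) = P(X)\,P(Y/X) = [0.5 \;\; 0.5]\begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix} = [0.5 \;\; 0.5] = [p(y_1) \;\; p(y_2)]$$
$$H(Y) = -\sum_j p(y_j)\log_2 p(y_j) = -0.5\log_2 0.5 - 0.5\log_2 0.5 = 1$$
$$p\log_2 p + (1-p)\log_2(1-p) = 0.1\log_2 0.1 + 0.9\log_2 0.9 = -0.469$$
Hence I(X; Y) = 1 − 0.469 = 0.531 bits/symbol.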
c. When α=0.5 and p=0.5
$$P(Y) = P(X)\,P(Y/X) = [0.5 \;\; 0.5]\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{bmatrix} = [0.5 \;\; 0.5], \qquad H(Y) = 1$$
$$p\log_2 p + (1-p)\log_2(1-p) = 0.5\log_2 0.5 + 0.5\log_2 0.5 = -1$$
Hence I(X; Y) = 1 − 1 = 0. When I(X; Y) = 0 the channel is useless: with p = 0.5 the output is independent of the input, so no information is transmitted at all.
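Parts (b) and (c) can be reproduced with a short Python sketch of the result of part (a). `h2` is a hypothetical name for the binary entropy function $-p\log_2 p - (1-p)\log_2(1-p)$, which is exactly $H(Y/X)$ for the BSC.

```python
import math

def h2(p):
    """Binary entropy in bits: -p log2 p - (1-p) log2(1-p), with h2(0) = h2(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(alpha, p):
    """I(X;Y) = H(Y) + p log2 p + (1-p) log2(1-p) for a BSC with P(x1) = alpha."""
    py1 = alpha * (1 - p) + (1 - alpha) * p   # p(y1) from P(Y) = P(X) P(Y/X)
    return h2(py1) - h2(p)                    # H(Y) - H(Y/X), where H(Y/X) = h2(p)

print(round(bsc_mutual_information(0.5, 0.1), 3))   # 0.531
print(bsc_mutual_information(0.5, 0.5))             # 0.0: the channel is useless
```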
Exercise.8:
For a lossless channel show that: H(X/Y) = 0
Exercise.9:
For a noiseless channel with m inputs and m outputs, show that
H(X) = H(Y) and H(Y/X) = 0
Exercise.10:
Show that: H(X, Y) = H(X/Y) + H(Y)
Exercise.11:
Show that: $I(X;Y) = \sum_i\sum_j P(x_i,y_j)\log_2\frac{P(x_i/y_j)}{P(x_i)}$
Exercise.12:
Show that: I(X; Y) = I(Y; X)
Exercise.13:
Show that: I(X; Y) ≥ 0 (Hint: $\log\frac{a}{b} = -\log\frac{b}{a}$, and $\ln\alpha \le \alpha - 1$)
9. Channel Capacity
The mutual information I(X; Y) gives the average information per symbol transmitted through the system. Shannon showed that the capacity of a channel is the maximum rate at which information can be transmitted reliably over it. So the capacity per symbol, $C_s$, of a channel is given by:
$$C_s = \max_{P(X)} I(X;Y) = \max_{P(X)}[H(X) - H(X/Y)] = \max_{P(X)}[H(Y) - H(Y/X)] \quad \text{bits/symbol}$$
If r is the symbol rate, then the channel capacity per second, C, is:
$$C = r\,C_s \quad \text{bits/sec}$$
• For a lossless channel: $C_s = \log_2 m$
• For a deterministic channel: $C_s = \log_2 n$
• For a noiseless channel: $C_s = \log_2 m = \log_2 n$
• For a BSC: $C_s = 1 + p\log_2 p + (1-p)\log_2(1-p)$
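To illustrate that, for a BSC, the maximization over $P(X)$ is achieved by equally likely inputs, this Python sketch performs a simple grid search over $\alpha = P(x_1)$ and compares the result with the closed form $1 + p\log_2 p + (1-p)\log_2(1-p)$. The grid search and the symbol rate `r` are illustrative assumptions, not part of the notes.

```python
import math

def h2(p):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(alpha, p):
    """I(X;Y) = H(Y) - H(Y/X) for a BSC with P(x1) = alpha and crossover probability p."""
    py1 = alpha * (1 - p) + (1 - alpha) * p
    return h2(py1) - h2(p)

def bsc_capacity_grid(p, steps=1000):
    """Approximate Cs = max over P(X) of I(X;Y) by trying alpha = 0, 1/steps, ..., 1."""
    alphas = [k / steps for k in range(steps + 1)]
    best = max(alphas, key=lambda a: bsc_mutual_information(a, p))
    return best, bsc_mutual_information(best, p)

p = 0.1
alpha_best, cs = bsc_capacity_grid(p)
print(alpha_best, cs, 1 - h2(p))    # best alpha is 0.5; cs equals 1 + p log2 p + (1-p) log2(1-p)

r = 1000                            # hypothetical symbol rate, symbols/sec
print("C =", r * cs, "bits/sec")    # C = r Cs
```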
Example.7:
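Find the capacity of the following channel (where $P(x_1) = \alpha$): each input is received correctly with probability 1 − p (x1 → y1, x2 → y3) and is mapped to the middle output y2 with probability p (a binary erasure channel).
Solution:
$$P(Y/X) = \begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix}$$
$$P(Y) = P(X)\,P(Y/X) = [\alpha \;\; 1-\alpha]\begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix} = [\alpha(1-p) \;\; p \;\; (1-\alpha)(1-p)]$$
$$P(X,Y) = P(X)\,P(Y/X) = \begin{bmatrix} \alpha & 0 \\ 0 & 1-\alpha \end{bmatrix}\begin{bmatrix} 1-p & p & 0 \\ 0 & p & 1-p \end{bmatrix} = \begin{bmatrix} \alpha(1-p) & \alpha p & 0 \\ 0 & (1-\alpha)p & (1-\alpha)(1-p) \end{bmatrix}$$
$$H(Y) = -\sum_j p(y_j)\log_2 p(y_j) = (1-p)[-\alpha\log_2\alpha - (1-\alpha)\log_2(1-\alpha)] - p\log_2 p - (1-p)\log_2(1-p)$$
$$H(Y/X) = -\sum_i\sum_j P(x_i,y_j)\log_2 P(y_j/x_i) = -p\log_2 p - (1-p)\log_2(1-p)$$
$$I(X;Y) = H(Y) - H(Y/X) = (1-p)\,H(X)$$
$$C_s = \max_{P(X)} I(X;Y) = \max_{P(X)} (1-p)H(X) = (1-p)\max_{P(X)} H(X) = 1 - p$$
(Note: $\max_{P(X)} H(X) = \log_2 m = \log_2 2 = 1$)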
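A numerical check of Example 7's result $I(X;Y) = (1-p)H(X)$ for a few values of α and p, as a minimal Python sketch (the test values are arbitrary; `erasure_mi` is a hypothetical helper name).

```python
import math

def entropy(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)

def erasure_mi(alpha, p):
    """I(X;Y) = H(Y) - H(Y/X) for Example 7's channel with P(x1) = alpha."""
    channel = [[1 - p, p, 0.0],    # x1: received as y1, or erased (y2)
               [0.0, p, 1 - p]]    # x2: received as y3, or erased (y2)
    px = [alpha, 1 - alpha]
    pxy = [[px[i] * channel[i][j] for j in range(3)] for i in range(2)]
    py = [pxy[0][j] + pxy[1][j] for j in range(3)]
    HY_X = -sum(pxy[i][j] * math.log2(channel[i][j])
                for i in range(2) for j in range(3) if pxy[i][j] > 0)
    return entropy(py) - HY_X

for alpha, p in [(0.5, 0.2), (0.3, 0.2), (0.7, 0.6)]:
    closed_form = (1 - p) * entropy([alpha, 1 - alpha])
    print(alpha, p, round(erasure_mi(alpha, p), 6), round(closed_form, 6))
```

With α = 0.5 the result reaches its maximum, 1 − p, which is the capacity found above.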
10. Additive White Gaussian Noise Channel (AWGN)
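(Block diagram: the noise N is added to the channel input X to give the output Y.)
The noise observed in practical channels is commonly assumed to be Gaussian. The channel capacity of this channel is:
$$C = \max\{R\}$$
or
$$C = B\log_2\left(1 + \frac{S}{N}\right) \quad \text{bits/sec}$$
where: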
B: bandwidth of the channel (Hz)
S/N: signal to noise ratio (SNR)
S: signal power in watts
N: noise power in watts (N = B N0), where N0 is the power spectral density (PSD) of the noise (W/Hz)
Note: error-free transmission over the channel is possible if and only if $C \ge R$.
Example.8:
Consider an AWGN channel with 4 kHz bandwidth and a noise PSD of $2\times10^{-12}$ W/Hz. The signal power required at the receiver is 0.1 mW. Calculate the capacity of this channel.
Solution:
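We have: B = 4000 Hz, S = $0.1\times10^{-3}$ W, $N_0 = 2\times10^{-12}$ W/Hz
$N = N_0 B = (2\times10^{-12})(4000) = 8\times10^{-9}$ W
$$\text{SNR} = \frac{S}{N} = \frac{0.1\times10^{-3}}{8\times10^{-9}} = 1.25\times10^{4}$$
Hence
$$C = B\log_2\left(1 + \frac{S}{N}\right) = 4000\log_2\left(1 + 1.25\times10^{4}\right) = 54.44 \text{ kbits/sec}$$
Example.9: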
The terminal of a computer used to enter alphabetic data is connected to the computer through a voice-grade telephone line having a usable bandwidth of 3 kHz and SNR = 10. Assuming that the terminal has 128 equally likely characters, determine:
a. The capacity of the channel.
b. The maximum rate of transmission without error.
Solution:
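a. $C = B\log_2(1 + S/N) = 3000\log_2(1 + 10) = 10{,}378$ bits/sec
b. Average information per character: $H = \log_2 128 = 7$ bits/character.
The information rate is $R = rH$, where r is the character rate. For error-free transmission:
$$R \le C \;\Rightarrow\; 7r \le 10378 \;\Rightarrow\; r \le 1482 \text{ characters/sec}$$
Exercise.14: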
Calculate the capacity of a low-pass channel with a usable bandwidth of 3 kHz and SNR = 100 at the channel output. Assume the channel noise to be white Gaussian.
Exercise.15:
A discrete signal with 256 samples is transmitted at a rate of $10^4$ samples/sec.
a. What is the information rate?
b. Can the output be transmitted without error over an AWGN channel with B = 10 kHz and SNR = 100?
c. Find the SNR required for error-free transmission in part b.
d. Find the bandwidth required for error-free transmission over the AWGN channel if SNR = 100.
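The AWGN capacity formula and the arithmetic of Examples 8 and 9 can be checked with a short Python sketch. `awgn_capacity` is a hypothetical helper name; the SNR is passed as a power ratio (not in dB), and Example 9 assumes the 128 characters are equally likely, as its solution does.

```python
import math

def awgn_capacity(bandwidth_hz, snr):
    """Shannon-Hartley: C = B log2(1 + S/N) bits/sec; snr is a power ratio, not dB."""
    return bandwidth_hz * math.log2(1 + snr)

# Example 8: B = 4 kHz, N0 = 2e-12 W/Hz, S = 0.1 mW at the receiver
B, N0, S = 4000, 2e-12, 0.1e-3
N = N0 * B                                  # noise power, 8e-9 W
C8 = awgn_capacity(B, S / N)
print(f"SNR = {S / N:.4g}, C = {C8 / 1e3:.2f} kbits/sec")

# Example 9: B = 3 kHz, SNR = 10, 128 equally likely characters
C9 = awgn_capacity(3000, 10)
bits_per_char = math.log2(128)              # 7 bits/character
r_max = math.floor(C9 / bits_per_char)      # largest integer r with 7r <= C
print(f"C = {C9:.0f} bits/sec, r_max = {r_max} characters/sec")
```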
11. Code Length, Code Efficiency and Redundancy
The length of a code word is the number of bits in the code word (symbol). The average code-word length per source symbol is:
$$\bar{L} = \sum_{i} P(x_i)\,N_i$$
where $N_i$ is the length of the code word for symbol $x_i$ in bits. Code efficiency may be defined as the ratio of the actual transmission rate to the maximum transmission rate:
$$\eta = \frac{I(X;Y)}{\max I(X;Y)} \quad\text{or}\quad \eta = \frac{H(X)}{\bar{L}}\cdot 100\%$$
The redundancy of the code is defined as
$$\gamma = 1 - \eta$$
We know that the rate of information is
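$$R = rH(X) \quad \text{bits/sec}$$
where r is the symbol rate (symbols/sec). If all M symbols convey the same amount of information (that is, they are equally likely), then $H(X) = \log_2 M$, so $R = r\log_2 M$. Now consider an encoder that converts the incoming symbols into code words of N binary digits, produced at the same fixed rate r. Since each binary digit carries at most $\log_2 2 = 1$ bit, the encoder output can carry at most
$$R = rN\log_2 2 = rN$$
If the symbols have different probabilities, then $rH(X) \le rN$, or $N \ge H(X)$. Here N represents a very important parameter, called the average code length and denoted by $\bar{L}$. For optimum source coding $\bar{L} = H(X)$, but in practice $\bar{L} \ge H(X)$.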
12. Kraft Inequality
A necessary and sufficient condition for the existence of a uniquely decipherable binary code with code-word lengths $N_1, N_2, \ldots, N_M$ is that the lengths satisfy
$$K = \sum_{i=1}^{M} 2^{-N_i} \le 1$$
(Satisfying the inequality does not by itself make a given code uniquely decipherable; it guarantees that some uniquely decipherable code with those lengths exists.)
The simplest code is a fixed-length code, in which all code words have the same length: $N_i = N$, so $K = M\,2^{-N}$. This means that for decipherable (uniquely identified) fixed-length codes we need
$$N \ge \log_2 M$$
So the resulting efficiency is
$$\eta = \frac{H(X)}{N} \le \frac{H(X)}{\log_2 M}$$
The conclusion is that if $H(X) < \log_2 M$ and we need higher efficiency, we have to reduce the average code length $\bar{L}$. That is why we use variable-length codes.
Example.10:
For the following codes:
xi    Code A   Code B   Code C   Code D
x1    00       0        0        0
x2    01       10       11       100
x3    10       11       100      110
x4    11       110      110      111
a. Show that all codes except code B satisfy the Kraft inequality.
b. Show that codes A and D are uniquely decodable but codes B and C are not.
Solution:
a. For Code A: N1 = N2 = N3 = N4 = 2, $K = \sum_i 2^{-N_i}$ = 1/4 + 1/4 + 1/4 + 1/4 = 1
For Code B: N1 = 1, N2 = N3 = 2, N4 = 3, K = 1/2 + 1/4 + 1/4 + 1/8 = 9/8 > 1
For Code C: N1 = 1, N2 = 2, N3 = N4 = 3, K = 1/2 + 1/4 + 1/8 + 1/8 = 1
For Code D: N1 = 1, N2 = N3 = N4 = 3, K = 1/2 + 1/8 + 1/8 + 1/8 = 7/8
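Part (a), and one way to see part (b), can be checked with a short Python sketch. `kraft_sum` and `is_prefix_free` follow the definitions in these notes; `parses` is a hypothetical brute-force helper that lists every way to split a bit string into code words. Codes A and D are prefix codes, hence uniquely decodable, while for codes B and C the string 110 has two different parses (x4, or x3 followed by x1 for B; x4, or x2 followed by x1 for C).

```python
from fractions import Fraction

codes = {                     # code words for x1..x4 from the table above
    "A": ["00", "01", "10", "11"],
    "B": ["0", "10", "11", "110"],
    "C": ["0", "11", "100", "110"],
    "D": ["0", "100", "110", "111"],
}

def kraft_sum(words):
    """K = sum of 2^(-N_i) over the code-word lengths N_i, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(w)) for w in words)

def is_prefix_free(words):
    """True if no code word is a prefix of a different code word."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

def parses(bits, words):
    """Every way of splitting `bits` into a sequence of code words (brute force)."""
    if not bits:
        return [[]]
    return [[w] + rest
            for w in words if bits.startswith(w)
            for rest in parses(bits[len(w):], words)]

for name, words in codes.items():
    print(name, kraft_sum(words), is_prefix_free(words), parses("110", words))
```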
An objective of source encoding is to minimize the average bit rate required for representation of source by reducing the redundancy of information source.
13.1 Prefix Coding
The code words generated by the encoder must be uniquely decodable so that decoding is error-free. This requirement limits the code words (and code-word lengths) that each source can use. A convenient way to obtain uniquely decodable codes is 'prefix coding'.
‘A prefix code is defined as a code in which no code word is the prefix of any other code-word’
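This can be understood from the following example:
Symbol   Code I   Code II (prefix code)
S0       0        0
S1       1        10
S2       00       110
S3       11       111
In Code I, the code word of S0 is a prefix of the code word of S2, and the code word of S1 is a prefix of the code word of S3.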
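Code II can be decoded easily because no code word is a prefix of another; its lengths also satisfy the Kraft inequality (K = 1/2 + 1/4 + 1/8 + 1/8 = 1). Code I, by contrast, violates it (K = 1/2 + 1/2 + 1/4 + 1/4 = 3/2 > 1).
Some codes satisfy the Kraft inequality and can be decoded without error even though they are not prefix codes. For example:
Code III
0
01
011
0111
(Every code word of Code III starts with 0, so the decoder knows that a word has ended only when it sees the next 0; decoding is unique but requires looking ahead.)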
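The difference between Code II and Code III shows up directly in a decoder. In this minimal Python sketch (helper names hypothetical), the prefix code is decoded instantaneously, emitting a symbol as soon as the buffered bits match a code word, while Code III can only be split at each 0 that starts the next word. It assumes Code III's words are assigned to S0, S1, S2, S3 in the order listed.

```python
def decode_prefix(bits, codebook):
    """Instantaneous decoding of a prefix code: emit a symbol as soon as the
    buffered bits match a code word (safe because no word prefixes another)."""
    inverse = {word: sym for sym, word in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError(f"incomplete code word at the end: {buf!r}")
    return out

def decode_code_iii(bits):
    """Code III (0, 01, 011, 0111) is not a prefix code, but every word starts
    with '0', so a word is known to be complete only when the next '0' arrives
    (or the stream ends): decoding is unique but needs look-ahead."""
    words = ["0", "01", "011", "0111"]
    symbols = ["S0", "S1", "S2", "S3"]       # assumed symbol order
    if bits and not bits.startswith("0"):
        raise ValueError("a Code III stream must start with '0'")
    chunks = ["0" + tail for tail in bits.split("0")[1:]]   # cut before each '0'
    return [symbols[words.index(c)] for c in chunks]       # ValueError if invalid

code_ii = {"S0": "0", "S1": "10", "S2": "110", "S3": "111"}
print(decode_prefix("0101101110", code_ii))   # ['S0', 'S1', 'S2', 'S3', 'S0']
print(decode_code_iii("0010110111"))          # ['S0', 'S1', 'S2', 'S3']
```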
13.2. Shannon-Fano Coding
For variable-length coding, we apply a practical principle: frequently used symbols (those with high probability) should be given the shortest code words, and rarely used symbols can be given longer code words, since they do not affect the efficiency much.
Shannon Fano coding generates efficient codes in which the word length increases as the probability of symbol decreases.
In this method, first arrange the messages in descending order of probability. Then draw a line that divides the symbols into two groups whose total probabilities are as nearly equal as possible.
Then assign the digit 0 to each symbol in the group above the line and the digit 1 to each symbol in the
group below the line. In all subsequent steps, subdivide each group into subgroups and again
assign digits following the same rule. Whenever a group contains only one symbol, no further
subdivision is possible and the code word for that symbol is complete. When all the groups have been
reduced to one symbol, the code words for all symbols have been assigned.
Example.11: For the given messages and their probabilities, apply Shannon-Fano coding and calculate the code efficiency.
[x] = [x1 x2 x3 x4 x5 x6 x7 x8]
[P] = [1/4 1/8 1/16 1/16 1/16 1/4 1/16 1/8]
Solution. Arrange the probabilities in descending order.
Message   Prob.   Code   Length
x1        1/4     00     2
x6        1/4     01     2
x2        1/8     100    3
x8        1/8     101    3
x3        1/16    1100   4
x4        1/16    1101   4
x5        1/16    1110   4
x7        1/16    1111   4
Now the average length per message is $\bar{L} = \sum_i P(x_i)N_i = 2.75$ letters/message.
$H(X) = \sum_i P(x_i)\log_2\frac{1}{P(x_i)} = 2.75$ bits/message.
Code efficiency: $\eta = \frac{H(X)}{\bar{L}}\cdot 100 = \frac{2.75}{2.75}\cdot 100 = 100\%$
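A minimal Python sketch of the Shannon-Fano procedure described above, applied to Example 11. It uses exact fractions for the probabilities. When two split points are equally balanced, this sketch takes the first one; other tie-breaking choices are possible and can give different codes (and efficiencies) for some sources.

```python
from fractions import Fraction
import math

def shannon_fano(probs):
    """Sort by descending probability, then recursively split each group where the
    two parts' total probabilities are as nearly equal as possible, appending
    0 to the upper part and 1 to the lower part."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)  # stable for ties
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        best_k, best_diff, running = 1, None, 0
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(2 * running - total)      # |upper total - lower total|
            if best_diff is None or diff < best_diff:
                best_k, best_diff = k, diff
        upper, lower = group[:best_k], group[best_k:]
        for sym, _ in upper:
            codes[sym] += "0"
        for sym, _ in lower:
            codes[sym] += "1"
        split(upper)
        split(lower)

    split(items)
    return codes

F = Fraction
probs = {"x1": F(1, 4), "x2": F(1, 8), "x3": F(1, 16), "x4": F(1, 16),
         "x5": F(1, 16), "x6": F(1, 4), "x7": F(1, 16), "x8": F(1, 8)}
codes = shannon_fano(probs)
L = sum(probs[s] * len(c) for s, c in codes.items())                 # average length
H = sum(float(p) * math.log2(1 / float(p)) for p in probs.values())  # entropy, bits
print(codes)
print(float(L), H, f"efficiency = {H / float(L):.0%}")
```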
Exercise.16:
Apply the Shannon Fano coding and find the code efficiency
[x] = [x1 x2 x3 x4 x5 x6 x7]
[P] = [.4 .2 .12 .08 .08 .08 .04]
(Check answer: efficiency = 96.03 % )
13.3. Huffman Coding
Huffman coding (compact coding) is optimum in the sense that no other uniquely
decodable set of code words has a smaller average code-word length for a given source.
The Huffman encoding algorithm proceeds as follows:
1. The source symbols are arranged in descending order of probability.
2. The two source symbols of lowest probability are combined into a new source
symbol whose probability equals the sum of the two original probabilities. The
new symbol is placed in the list according to its probability.
3. The procedure is repeated until the list contains only two source symbols, to
which a '0' and a '1' are assigned. The code for each symbol is found by working backward and
tracing the sequence of 0s and 1s assigned to that symbol and its successors.
Example.12: We have 5 symbols for a discrete source
Xi S0 S1 S2 S3 S4
P (xi) .4 .2 .2 .1 .1
Obtain the Huffman code, the average code-word length, and the entropy of the given system.
Solution
(a) The entropy of the system is $H(X) = \sum_i P(x_i)\log_2\frac{1}{P(x_i)} = 2.12193$ bits/message.
(b) Huffman coding can be performed as follows:
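Symbol   Step 1   Final code   Step 2     Step 3     Step 4
S0       .4       00           .4 (00)    .4 (1)     .6 (0)
S1       .2       10           .2 (01)    .4 (00)    .4 (1)
S2       .2       11           .2 (10)    .2 (01)
S3       .1       010          .2 (11)
S4       .1       011
(In Steps 2 to 4, each column lists the reduced source in descending order of probability, with the code assigned to each entry in parentheses; a combined entry is placed above entries of equal probability.)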
Thus we finally have
Symbol   Code   Code length (Ni)
S0       00     2
S1       10     2
S2       11     2
S3       010    3
S4       011    3
(c) Average code length
$$\bar{L} = \sum_i P(x_i)N_i = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2 \text{ letters/message}$$
Exercise.17:
A message source generates ten messages with probabilities 0.1, 0.13, 0.01, 0.04, 0.08, 0.29, 0.06, 0.22, 0.05, and 0.02. The rate of message generation is 300 messages/sec. Find the entropy of the source and the information rate. Obtain the Huffman code for the messages and calculate the average number of bits/message. What is the code redundancy?
(Check answer: H = 2.38 bits/message, R = 714 bits/sec, $\bar{L}$ = 2.43 letters/message, η = 97.94 %, redundancy γ = 2.06 %)
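A compact Python sketch of the Huffman procedure, using a binary heap (`heapq`) to pick the two least probable entries at each step. Ties may be broken differently from the hand-worked table, so the individual code words can differ from Example 12, but every Huffman code for this source has the same (optimal) average length, 2.2 letters/message.

```python
import heapq
from fractions import Fraction
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly combine the two least probable entries,
    prefixing '0' to the code words on one side and '1' on the other."""
    if len(probs) == 1:                            # degenerate one-symbol source
        return {sym: "0" for sym in probs}
    order = count()                                # unique tie-breaker for equal probabilities
    heap = [(p, next(order), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)        # least probable entry
        p1, _, codes1 = heapq.heappop(heap)        # second least probable entry
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(order), merged))
    return heap[0][2]

probs = {"S0": Fraction(4, 10), "S1": Fraction(2, 10), "S2": Fraction(2, 10),
         "S3": Fraction(1, 10), "S4": Fraction(1, 10)}
codes = huffman(probs)
avg_len = sum(probs[s] * len(c) for s, c in codes.items())
print(codes)
print(float(avg_len), "letters/message")          # 2.2 letters/message
```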
14. Error Detection and Correction
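(Figure: Model of a digital communication system, Tx → codec → modem → channel → Rx. Transmitter: message source → source encoder → channel encoder → modulator; receiver: demodulator → channel decoder → source decoder → user.)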
The goal of the channel encoding and decoding process is to detect (or decode) the data digits with minimum probability of error. It is an effective way of achieving reliable transmission at rates approaching the channel capacity.
The basic idea of coding is to add a group of check digits to the message digits. The check digits may then provide the receiver with sufficient information to either detect or correct channel errors.
$n = k + r$, where
k: number of message digits
r: number of check digits
n: code-word length (total number of digits)
Single parity check
A simple single-error-detecting code with r = 1: the k message digits are followed by one check digit equal to their XOR (modulo-2 sum).
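A minimal Python sketch of the single parity check, assuming even parity (the check digit makes the number of 1s in each code word even); the helper names are hypothetical.

```python
def add_even_parity(message_bits):
    """Append one check digit equal to the XOR (modulo-2 sum) of the k message digits,
    so that every code word contains an even number of 1s."""
    parity = 0
    for b in message_bits:
        parity ^= b
    return message_bits + [parity]

def parity_error_detected(received_bits):
    """XOR of all n = k + 1 received digits: 1 means an odd number of digits flipped."""
    check = 0
    for b in received_bits:
        check ^= b
    return check == 1

word = add_even_parity([1, 0, 1, 1])
print(word, parity_error_detected(word))            # [1, 0, 1, 1, 1] False

single = word.copy()
single[2] ^= 1                                      # one digit in error
print(single, parity_error_detected(single))        # detected: True

double = single.copy()
double[0] ^= 1                                      # a second digit in error
print(double, parity_error_detected(double))        # not detected: False
```

Flipping any two digits leaves the XOR unchanged, so a double error goes undetected; this is why a single check digit only detects single (more generally, odd numbers of) errors.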
Hamming code
A class of linear codes that can correct every single-error pattern in the received word. The code-word length is
$$n = 2^r - 1$$
Block coding
Let the encoded word be
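$$C = [m_1\; m_2\; \ldots\; m_k\; c_1\; c_2\; \ldots\; c_r]$$
(message digits followed by check digits). Then
$$c_1 = f_1(m_1, m_2, \ldots, m_k),\quad c_2 = f_2(m_1, m_2, \ldots, m_k),\quad \ldots,\quad c_r = f_r(m_1, m_2, \ldots, m_k)$$
For k = 3 and r = 3: $C = [m_1\; m_2\; m_3\; c_1\; c_2\; c_3]$, with the following functions for the check digits:
$$c_1 = m_2 \oplus m_3,\quad c_2 = m_1 \oplus m_3,\quad c_3 = m_1 \oplus m_2$$
(Note: the operator ⊕ is modulo-2 addition.) Then
$$c_1 = 0\cdot m_1 \oplus 1\cdot m_2 \oplus 1\cdot m_3,\quad c_2 = 1\cdot m_1 \oplus 0\cdot m_2 \oplus 1\cdot m_3,\quad c_3 = 1\cdot m_1 \oplus 1\cdot m_2 \oplus 0\cdot m_3$$
or, in matrix form:
$$\begin{bmatrix} 0&1&1&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&0&0&0&1 \end{bmatrix}\begin{bmatrix} m_1\\m_2\\m_3\\c_1\\c_2\\c_3 \end{bmatrix} = \begin{bmatrix} 0\\0\\0 \end{bmatrix}$$
In general:
$$H\,C^T = 0$$
where H is an r × n matrix called the parity-check matrix.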
Decoding process
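Let the error vector be
$$E = [e_1\; e_2\; \ldots\; e_n], \quad e_i = \begin{cases} 0 & \text{no error in digit } i \\ 1 & \text{error in digit } i \end{cases}$$
The received word is
$$R = [r_1\; r_2\; \ldots\; r_n], \quad R = C \oplus E$$
The decoder begins by computing the syndrome S:
$$S = H\,R^T$$
Now we have two cases:
• If E = 0 (no error), then S = 0, i.e., R = C.
• If E ≠ 0 (there is an error), then
$$S = H\,R^T = H(C \oplus E)^T = H\,C^T \oplus H\,E^T$$
But $H\,C^T = 0$, so $S = H\,E^T$. For a single error, this means S equals the column of H corresponding to the position of the error.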
Example.13:
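For k = 3, n = 6, and the parity-check matrix
$$H = \begin{bmatrix} 0&1&1&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&0&0&0&1 \end{bmatrix}$$
if the received word is R = [0 1 0 0 1 1], check whether an error has occurred in R, then find the correct transmitted word C.
Solution:
$$S = H\,R^T = \begin{bmatrix} 0&1&1&1&0&0 \\ 1&0&1&0&1&0 \\ 1&1&0&0&0&1 \end{bmatrix}\begin{bmatrix} 0\\1\\0\\0\\1\\1 \end{bmatrix} = \begin{bmatrix} 1\\1\\0 \end{bmatrix}$$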
Since S ≠ 0, an error has occurred. S equals the third column of H, so the error is in digit 3:
E = [0 0 1 0 0 0]
$C = R \oplus E$ = [0 1 0 0 1 1] ⊕ [0 0 1 0 0 0] = [0 1 1 0 1 1]
Exercise.18:
For k = 4 and n = 7, if the received word is R = [1 1 1 1 0 1 0], check whether an error has occurred in R, then find the correct transmitted word C for the following functions:
= ⨁ ⨁ = ⨁ ⨁ = ⨁ ⨁
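The encoding and syndrome decoding of Example 13 can be sketched in plain Python. The parity-check matrix and check functions are those of the k = 3, r = 3 example; `correct_single_error` assumes at most one digit is in error, as in the decoding procedure above, and all helper names are hypothetical.

```python
H = [[0, 1, 1, 1, 0, 0],    # parity-check matrix of Example 13 (r x n = 3 x 6)
     [1, 0, 1, 0, 1, 0],
     [1, 1, 0, 0, 0, 1]]

def encode(m):
    """C = [m1 m2 m3 c1 c2 c3] with c1 = m2 xor m3, c2 = m1 xor m3, c3 = m1 xor m2."""
    m1, m2, m3 = m
    return [m1, m2, m3, m2 ^ m3, m1 ^ m3, m1 ^ m2]

def syndrome(H, word):
    """S = H R^T over modulo-2 arithmetic."""
    return [sum(h * x for h, x in zip(row, word)) % 2 for row in H]

def correct_single_error(H, received):
    """If S != 0, flip the digit whose column of H equals S (single-error assumption)."""
    s = syndrome(H, received)
    if not any(s):
        return list(received), None           # S = 0: accept R as the code word
    columns = [list(col) for col in zip(*H)]
    pos = columns.index(s)                    # ValueError if S matches no column
    corrected = list(received)
    corrected[pos] ^= 1                       # C = R xor E, where E has a 1 at pos
    return corrected, pos

R = [0, 1, 0, 0, 1, 1]
print(syndrome(H, R))                  # [1, 1, 0]
C, pos = correct_single_error(H, R)
print(C, "error in digit", pos + 1)    # [0, 1, 1, 0, 1, 1] error in digit 3
```

Because every column of this H is distinct and nonzero, any single error produces a syndrome that identifies its position uniquely.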