
Page 1: Dcs unit 2

Unit 2

Information Theory and Coding

By 

Prof A K Nigam


Page 2: Dcs unit 2

Syllabus for Unit 2

• Definition of information

• Concept of entropy

• Shannon’s theorem for channel capacity

• Shannon‐Hartley theorem

• Shannon channel capacity
(Reference: Communication Systems, 4th Edition, Simon Haykin)

Page 3: Dcs unit 2

Definition of information

We define the amount of information gained after observing the event sk, which occurs with a defined probability, as the logarithmic function

I(sk) = log(1/pk)

where pk is the probability of occurrence of event sk.

Remember:
Joint probability P(X, Y)
Conditional probability P(A/B) = probability of occurrence of A after B has occurred
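To make the definition concrete, here is a minimal sketch (added; not part of the original slides) that evaluates I(sk) = log2(1/pk) for a couple of arbitrary probabilities:

```python
import math

def self_information(p: float) -> float:
    """Information gained on observing an event of probability p, in bits."""
    return math.log2(1.0 / p)

print(self_information(0.25))  # 2.0 bits: a 1-in-4 event carries 2 bits
print(self_information(0.5))   # 1.0 bit, the p = 1/2 case discussed a few slides later
```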

Page 4: Dcs unit 2

Important properties
• If we are absolutely certain of the outcome of an event, even before it occurs, there is no information gained.

• The occurrence of an event either provides some or no information, but never brings about a loss of information.

• The less probable an event is, the more information we gain when it occurs.

• If sk and sl are statistically independent, then I(sk sl) = I(sk) + I(sl).


Page 5: Dcs unit 2

Standard Practice for defining information

• It is the standard practice today to use a logarithm to base 2. The resulting unit of information is called the bit.

• When pk = 1/2, we have I(sk) = 1 bit. Hence, one bit is the amount of information that we gain when one of two possible and equally likely events occurs.


Page 6: Dcs unit 2

Entropy of a discrete memoryless source

• Entropy of a discrete memoryless source with source alphabet S is a measure of the average information content per source symbol.
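As an illustrative sketch (added), the definition H(S) = Σ pk log2(1/pk) can be evaluated directly; the four-symbol distribution below is an arbitrary example:

```python
import math

def entropy(probs) -> float:
    """Average information content in bits/symbol: H = sum of p_k * log2(1/p_k)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Arbitrary four-symbol source alphabet
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```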


Page 7: Dcs unit 2

Properties of Entropy

1. Entropy is a measure of the uncertainty of the random variable.

2. H(S) = 0 if and only if the probability pk = 1 for some k, and the remaining probabilities in the set are all zero; this lower bound on entropy corresponds to no uncertainty.

3. H(S) = log2 K if and only if pk = 1/K for all k (i.e., all the symbols in the alphabet are equiprobable); this upper bound on entropy corresponds to maximum uncertainty.


Page 8: Dcs unit 2

Proof of these properties of H(S)

2nd Property

• Since each probability pk is less than or equal to unity, it follows that each term pk log2(1/pk) is always nonnegative, and so H(S) ≥ 0.

• Next, we note that the product term pk log2(1/pk) is zero if, and only if, pk = 0 or 1.

• We therefore deduce that H(S) = 0 if, and only if, pk = 0 or 1, that is, pk = 1 for some k and all the rest are zero.


Page 9: Dcs unit 2

Example: Entropy of Binary Memoryless Source

• We consider a binary source for which symbol 0 occurs with probability P(0) and symbol 1 with probability P(1) = 1 − P(0). We assume that the source is memoryless.

• The entropy of such a source equals

H(S) = −P(0) log2 P(0) − P(1) log2 P(1)
     = −P(0) log2 P(0) − {1 − P(0)} log2{1 − P(0)} bits

• For P(0) = 0, P(1) = 1 and thus H(S) = 0.

• For P(0) = 1, P(1) = 0 and thus H(S) = 0.

• For P(0) = P(1) = 1/2 the entropy is maximum = 1 bit.
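A short sketch (added) that evaluates this binary entropy H(S) = −P(0) log2 P(0) − {1 − P(0)} log2{1 − P(0)} at the three cases listed above:

```python
import math

def binary_entropy(p0: float) -> float:
    """Entropy of a memoryless binary source with P(0) = p0, in bits/symbol."""
    h = 0.0
    for p in (p0, 1.0 - p0):
        if p > 0:
            h -= p * math.log2(p)
    return h

for p0 in (0.0, 1.0, 0.5):
    print(p0, binary_entropy(p0))   # 0.0 -> 0, 1.0 -> 0, 0.5 -> 1 (the maximum)
```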

Page 10: Dcs unit 2

Proof that the binary entropy is maximum at p = 1/2:

H = −p log2 p − (1 − p) log2(1 − p)

We know that (d/dx) log_a x = (1/x) log_a e, thus we can write

dH/dp = −{log2 p + log2 e − log2(1 − p) − log2 e}
      = −log2 p + log2(1 − p)

Setting dH/dp = 0 gives log2 p = log2(1 − p), i.e. p = 1 − p, or p = 0.5.

The maximum entropy is thus H_max = (1/2) log2 2 + (1/2) log2 2 = 1.

Thus entropy is maximum when the probabilities are equal, and we can write the maximum value of entropy as

H_max = Σ (k = 1 to M) (1/M) log2 M = log2 M bits/message,  with pk = 1/M.

Page 11: Dcs unit 2


Page 12: Dcs unit 2

Proof of 3rd statement: Condition for Maximum Entropy

• We know that the entropy can achieve a maximum value of log2 M, where M is the number of symbols.

• If we assume that all symbols are equiprobable, then the probability of each occurring is 1/M.

• The associated entropy is therefore

H(S) = Σ (k = 1 to M) pk log2(1/pk)
     = M × (1/M) × log2(1/(1/M))
     = log2 M

• This is the maximum value of entropy, and thus entropy is maximum when all symbols have equal probability of occurrence.
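As a quick numerical check (added, with arbitrary example distributions), an equiprobable source reaches the bound log2 M while a skewed one stays below it:

```python
import math

def entropy(probs) -> float:
    """H = -sum of p_k * log2(p_k), in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

M = 4
print(entropy([1 / M] * M), math.log2(M))   # both 2.0: the upper bound is reached
print(entropy([0.7, 0.1, 0.1, 0.1]))        # about 1.357, strictly below log2 M
```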


Page 13: Dcs unit 2

EXAMPLE: Entropy of Source


Page 14: Dcs unit 2

EXAMPLE: Entropy of Source

• Six messages with probabilities 0.30, 0.25, 0.15, 0.12, 0.10, and 0.08, respectively, are transmitted. Find the entropy.

H(X) = −(0.30 log2 0.30 + 0.25 log2 0.25 + 0.15 log2 0.15 + 0.12 log2 0.12 + 0.10 log2 0.10 + 0.08 log2 0.08)
     = −3.322 × (0.30 log10 0.30 + 0.25 log10 0.25 + 0.15 log10 0.15 + 0.12 log10 0.12 + 0.10 log10 0.10 + 0.08 log10 0.08)
     ≈ 2.42 bits/message
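A two-line sketch (added) reproducing the result above:

```python
import math

probs = [0.30, 0.25, 0.15, 0.12, 0.10, 0.08]        # the six message probabilities
print(-sum(p * math.log2(p) for p in probs))        # about 2.4224 bits/message
```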


Page 15: Dcs unit 2

Discrete Memoryless Channel

• A discrete memory‐less channel is a statistical model with an input X and an output Y that is a noisy version of X; both X and Y are random variables.


Page 16: Dcs unit 2

Channel matrix, or transition matrix

A convenient way of describing a discrete memoryless channel is to arrange the various transition probabilities of the channel in the form of a matrix as follows:

  P(Y/X) = [ p(y1/x1)  p(y2/x1)  …  p(yn/x1) ]
           [ p(y1/x2)  p(y2/x2)  …  p(yn/x2) ]
           [    …         …      …     …     ]
           [ p(y1/xm)  p(y2/xm)  …  p(yn/xm) ]
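As a minimal added sketch, such a matrix can be stored row-by-row in code; the 2×2 values below are borrowed from the worked example later in this unit, not from this slide:

```python
# Rows are inputs x_j, columns are outputs y_k; entry [j][k] = P(y_k / x_j).
P_Y_given_X = [
    [0.8, 0.2],   # P(y1/x1), P(y2/x1)
    [0.3, 0.7],   # P(y1/x2), P(y2/x2)
]

# Every row of a channel (transition) matrix must sum to 1.
for row in P_Y_given_X:
    assert abs(sum(row) - 1.0) < 1e-9
```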


Page 17: Dcs unit 2

Joint Entropy

• Joint entropy is defined as

H(X, Y) = Σ (j = 1 to m) Σ (k = 1 to n) p(xj, yk) log2( 1 / p(xj, yk) )
        = −Σ (j = 1 to m) Σ (k = 1 to n) p(xj, yk) log2 p(xj, yk)
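A small sketch (added) computing H(X, Y) from a joint probability table; the table is the P(X, Y) matrix that appears in the worked example later in this unit:

```python
import math

# Joint probabilities p(x_j, y_k), taken from the later worked example
P_XY = [
    [0.48, 0.12],
    [0.12, 0.28],
]

H_XY = -sum(p * math.log2(p) for row in P_XY for p in row if p > 0)
print(round(H_XY, 4))   # about 1.7566 bits
```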


Page 18: Dcs unit 2

Conditional Entropy

• The quantity H(X/Y) is called a conditional entropy.

• It represents the amount of uncertainty remaining about the channel input after the channel output has been observed, and is given by:

• H(X/Y) = −Σ Σ p(xj, yk) log2 p(xj/yk)   (derived on the following slides)

• Similarly, H(Y/X) can be computed, which is the average uncertainty of the channel output given that X was transmitted.


Page 19: Dcs unit 2

Conditional Entropy: Proof

• Conditional probability is defined as

  p(x/y) = p(x, y) / p(y)

• If the received symbol is yk, then

  p(xj/yk) = p(xj, yk) / p(yk)

• The associated entropy can therefore be computed as follows.


Page 20: Dcs unit 2

H(X/yk) = −Σ (j = 1 to m) [ p(xj, yk) / p(yk) ] log2 [ p(xj, yk) / p(yk) ]
        = −Σ (j = 1 to m) p(xj/yk) log2 p(xj/yk)   ............(1)

Taking the average over all values of k:

H(X/Y) = Σ (k = 1 to n) p(yk) H(X/yk)
       = −Σ (k = 1 to n) Σ (j = 1 to m) p(yk) p(xj/yk) log2 p(xj/yk)
       = −Σ (k = 1 to n) Σ (j = 1 to m) p(xj, yk) log2 p(xj/yk)
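The averaging above maps directly to code; a sketch (added) using the same joint matrix as the later worked example, with rows indexing xj and columns indexing yk:

```python
import math

P_XY = [[0.48, 0.12],
        [0.12, 0.28]]                                   # p(x_j, y_k)

# Marginal p(y_k), then H(X/Y) = -sum p(x,y) * log2 p(x/y)
p_y = [sum(P_XY[j][k] for j in range(2)) for k in range(2)]
H_X_given_Y = -sum(P_XY[j][k] * math.log2(P_XY[j][k] / p_y[k])
                   for j in range(2) for k in range(2) if P_XY[j][k] > 0)
print(round(H_X_given_Y, 4))   # about 0.7857 bits
```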

Page 21: Dcs unit 2

Mutual Information

Problem statement


Given that the channel output yk is a noisy version of the channel input xj.

Given that the entropy H(X) is a measure of the prior uncertainty about X, how can we measure the uncertainty about X after observing Y?

Page 22: Dcs unit 2

Mutual Information Defined
• Note that the entropy H(X) represents our uncertainty about the channel input before observing the channel output, and the conditional entropy H(X/Y) represents our uncertainty about the channel input after observing the channel output.

• It follows that the difference H(X) − H(X/Y) must represent our uncertainty about the channel input that is resolved by observing the channel output.

• This important quantity is called the mutual information of the channel, denoted by I(X; Y).

• We may thus write I(X; Y) = H(X) − H(X/Y), or = H(Y) − H(Y/X).

• Also, it can be shown that I(X; Y) = H(X) + H(Y) − H(X, Y).
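A brief sketch (added) that checks these identities numerically on the joint distribution of the worked example that follows:

```python
import math

P_XY = [[0.48, 0.12],
        [0.12, 0.28]]                       # p(x_j, y_k)

def H(probs):
    """Entropy in bits of a probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [sum(row) for row in P_XY]                                   # [0.6, 0.4]
p_y = [sum(P_XY[j][k] for j in range(2)) for k in range(2)]        # [0.6, 0.4]
H_XY = H([p for row in P_XY for p in row])

# I(X;Y) = H(X) + H(Y) - H(X,Y)
print(round(H(p_x) + H(p_y) - H_XY, 4))    # about 0.1853 bits
```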


Page 23: Dcs unit 2

Capacity of a Discrete Memoryless Channel

• The channel capacity of a discrete memoryless channel is defined as the maximum mutual information I(X; Y) in any single use of the channel, where the maximization is over all possible input probability distributions {p(xj)} on X.

• The channel capacity is commonly denoted by C. We thus write

  C = max over {p(xj)} of I(X; Y)

• The channel capacity C is measured in bits per channel use, or bits per transmission.


Page 24: Dcs unit 2

Examples of Mutual Information Numericals

• Do numericals from Singh and Sapre, Chapter 10 (10.3.1, 10.4.1, 10.4.2, 10.4.3, 10.5.2, 10.6.2).


Page 25: Dcs unit 2

Example: Find the Mutual Information for the channel shown below

• Source probabilities: P(x1) = 0.6, P(x2) = 0.4
• Transition probabilities: P(y1/x1) = 0.8, P(y2/x1) = 0.2, P(y1/x2) = 0.3, P(y2/x2) = 0.7

  P(Y/X) = [ 0.8  0.2 ]
           [ 0.3  0.7 ]


Page 26: Dcs unit 2

Solution
• We know that I(X; Y) = H(Y) − H(Y/X) ……..(1)

• Finding H(Y):
  P(y1) = 0.6 × 0.8 + 0.4 × 0.3 = 0.6
  P(y2) = 0.6 × 0.2 + 0.4 × 0.7 = 0.4
  H(Y) = −[0.6 log2 0.6 + 0.4 log2 0.4] = 0.971 bits/message

• Finding H(Y/X) = −Σ Σ p(x, y) log2 p(y/x)

• Finding P(X, Y):

  P(X, Y) = [ 0.48  0.12 ]
            [ 0.12  0.28 ]

• H(Y/X) = −[0.48 log2 0.8 + 0.12 log2 0.2 + 0.12 log2 0.3 + 0.28 log2 0.7] ≈ 0.7857

• Putting the values in (1) we get I(X; Y) = 0.971 − 0.7857 ≈ 0.185 bits
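The same numbers can be reproduced in a few lines (added as a cross-check of the solution above):

```python
import math

P_X = [0.6, 0.4]
P_Y_given_X = [[0.8, 0.2],
               [0.3, 0.7]]

# Joint distribution, output distribution, and the two entropies used in eq. (1)
P_XY = [[P_X[j] * P_Y_given_X[j][k] for k in range(2)] for j in range(2)]
P_Y = [sum(P_XY[j][k] for j in range(2)) for k in range(2)]

H_Y = -sum(p * math.log2(p) for p in P_Y)
H_Y_given_X = -sum(P_XY[j][k] * math.log2(P_Y_given_X[j][k])
                   for j in range(2) for k in range(2))

print(round(H_Y, 3), round(H_Y_given_X, 4), round(H_Y - H_Y_given_X, 4))
# -> 0.971 0.7857 0.1853
```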


Page 27: Dcs unit 2

Types of channels and associated Entropy

• Lossless channel

• Deterministic channel

• Noiseless channel

• Binary symmetric channel


Page 28: Dcs unit 2

General Treatment for all the channels

We know

  I(X, Y) = H(X) − H(X/Y) ........(1)
          = H(Y) − H(Y/X) ........(2)

Also that

  H(X/Y) = −Σ (k = 1 to n) Σ (j = 1 to m) p(xj, yk) log2 p(xj/yk)

We know that p(x, y) = p(x) p(y/x) = p(y) p(x/y), thus we can write

  H(X/Y) = −Σ (k = 1 to n) Σ (j = 1 to m) p(yk) p(xj/yk) log2 p(xj/yk) ........(3)

Similarly we can write

  H(Y/X) = −Σ (j = 1 to m) Σ (k = 1 to n) p(xj) p(yk/xj) log2 p(yk/xj) ........(4)

Page 29: Dcs unit 2

Lossless channel
• For a lossless channel no source information is lost in transmission. It has only one non-zero element in each column. For example

  P(Y/X) = [ 3/4  1/4   0    0    0 ]
           [  0    0   1/3  2/3   0 ]
           [  0    0    0    0    1 ]

• In the case of a lossless channel p(x/y) = 0 or 1, as the probability of x given that y has occurred is 0 or 1.

• Putting this in eq. (3) we get H(X/Y) = 0. Thus from eq. (1) we get

  I(X, Y) = H(X), and C = max H(X)


Page 30: Dcs unit 2

Deterministic channel
• The channel matrix has only one non-zero element in each row, for example

  P(Y/X) = [ 1  0  0 ]
           [ 1  0  0 ]
           [ 0  1  0 ]
           [ 0  1  0 ]
           [ 0  0  1 ]

• In the case of a deterministic channel p(y/x) = 0 or 1, as the probability of y given that x has occurred is 0 or 1.

• Putting this in eq. (4) we get H(Y/X) = 0. Thus from eq. (2) we get

  I(X, Y) = H(Y), and C = max H(Y)


Page 31: Dcs unit 2

Noiseless channel
• A channel which is both lossless and deterministic has only one non-zero element in each row and column. For example

  P(Y/X) = [ 1  0  0  0 ]
           [ 0  1  0  0 ]
           [ 0  0  1  0 ]
           [ 0  0  0  1 ]

• A noiseless channel is both lossless and deterministic, thus H(X/Y) = H(Y/X) = 0.

• Thus from eqs. (1) and (2) we get

  I(X, Y) = H(Y) = H(X), and C = max H(Y) = max H(X) = log2 m = log2 n, where m and n are the numbers of input and output symbols.

Page 32: Dcs unit 2

Binary Symmetric Channel

Input probabilities: P(x1) = α, P(x2) = 1 − α; crossover probability p.

  P(Y/X) = [ 1−p    p  ]
           [  p    1−p ]

  P(X, Y) = [ α(1−p)       αp         ]
            [ (1−α)p       (1−α)(1−p) ]


Page 33: Dcs unit 2

H(Y/X) = −Σ Σ p(xj, yk) log2 p(yk/xj)

Putting values from the matrix we get

H(Y/X) = −[α(1−p) log(1−p) + αp log p + (1−α)p log p + (1−α)(1−p) log(1−p)]
       = −[p log p + (1−p) log(1−p)]

Putting this in eq. (2) we get

I(X, Y) = H(Y) + p log p + (1−p) log(1−p)


Page 34: Dcs unit 2

CHANNEL CAPACITY OF A CONTINUOUS CHANNEL

• For a discrete random variable X the entropy H(X) was defined as H(X) = −Σ p(x) log2 p(x).

• H(X) for continuous random variables is obtained by using an integral instead of the discrete summation, thus H(X) = −∫ p(x) log p(x) dx.


Page 35: Dcs unit 2

Similarly

  H(X, Y) = −∫∫ p(x, y) log p(x, y) dx dy

  H(X/Y) = −∫∫ p(x, y) log p(x/y) dx dy

  H(Y/X) = −∫∫ p(x, y) log p(y/x) dx dy

For a continuous channel, I(X; Y) is defined as

  I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / ( p(x) p(y) ) ] dx dy

(all integrals taken from −∞ to ∞)


Page 36: Dcs unit 2

Transmission Efficiency of a channel

  η = Actual transinformation / Maximum transinformation
    = I(X; Y) / max I(X; Y) = I(X; Y) / C

Redundancy of a channel

  R = 1 − η = ( C − I(X; Y) ) / C
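A tiny sketch (added). The values of I(X; Y) and C used here are only assumed placeholders for illustration; they are not derived from any example in the slides:

```python
def efficiency(mutual_info: float, capacity: float) -> float:
    """Transmission efficiency: actual transinformation / maximum transinformation."""
    return mutual_info / capacity

def redundancy(mutual_info: float, capacity: float) -> float:
    """Redundancy R = 1 - efficiency = (C - I) / C."""
    return 1.0 - efficiency(mutual_info, capacity)

# Assumed illustrative numbers: I(X;Y) = 0.185 bits/use and C = 0.5 bits/use
print(efficiency(0.185, 0.5), redundancy(0.185, 0.5))   # efficiency ~0.37, redundancy ~0.63
```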


Page 37: Dcs unit 2

Information Capacity Theorem for band-limited, power-limited Gaussian channels

• Consider X(t) that is band-limited to B hertz.

• Also we assume that uniform sampling of the process X(t) at the transmitter, at the Nyquist rate of 2B samples per second, produces the samples which are to be transmitted over the channel.

• We also know that the mutual information for a channel is I(X; Y) = H(Y) − H(Y/X) = H(X) − H(X/Y) …. already done


Page 38: Dcs unit 2

Information Capacity Theorem…….

• For the Gaussian channel the probability density is given by

  p(x) = ( 1 / √(2πσ²) ) e^( −x² / 2σ² )

• For this p(x), H(X) can be shown to be (not required to be solved)

  H(X) = (1/2) log(2πeσ²) = log √(2πeσ²)  ………….(1)

• If the signal power is S and the noise power is N, then the received signal is the sum of the transmitted signal with power S and noise with power N; the joint entropy of the source and noise is


Page 39: Dcs unit 2

  H(x, n) = H(x) + H(n/x)

If the transmitted signal and noise are independent, then H(n/x) = H(n). Thus

  H(x, n) = H(x) + H(n)  ............(A)

Since the received signal is the sum of the signal x and the noise n, we may equate

  H(x, y) = H(x, n)

But H(x, y) = H(y) + H(x/y); using this and eq. (A) we get

  H(y) + H(x/y) = H(x) + H(n)

Rearranging this we get

  H(x) − H(x/y) = H(y) − H(n) = Mutual Information  ..........(2)

Now, using σ² = N or (S + N) in eq. (1), we get

  H(y) = (1/2) log{2πe(S + N)}   (y has power S + N)

  and H(N) = (1/2) log{2πeN}

Page 40: Dcs unit 2

• Putting these values in eq. (2) we get

  I(X, Y) = (1/2) log( (S + N) / N )
          = (1/2) log( 1 + S/N )

  C = No. of samples per second × Mutual Information
    = 2B × (1/2) log2( 1 + S/N )
    = B log2( 1 + S/N )

(Note: the number of samples per second is 2B, as per the sampling theorem.)
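A small sketch (added) of the formula just derived, with arbitrary example numbers (a 3 kHz channel at 30 dB SNR):

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

snr = 10 ** (30 / 10)                   # 30 dB -> 1000 in linear terms
print(shannon_capacity(3000.0, snr))    # about 29,902 bits per second
```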


Page 41: Dcs unit 2

• With noise spectral density N0, the total noise in bandwidth B is the spectral density multiplied by the bandwidth, i.e. N = BN0. Thus we can write

  C = B log2( 1 + S/(B N0) )

• This is the Shannon theorem for channel capacity and is widely used in communication computations.


Page 42: Dcs unit 2

BW and S/N trade off

  C = B log2( 1 + S/(B N0) )
    = (S/N0) (B N0/S) log2( 1 + S/(B N0) )
    = (S/N0) log2 [ 1 + S/(B N0) ]^(B N0/S)

We know that lim (x→0) (1 + x)^(1/x) = e

Thus, for B → ∞,

  C_max = lim (B→∞) B log2( 1 + S/(B N0) ) = (S/N0) log2 e = 1.44 S/N0
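A sketch (added, with assumed values S = 1 W and N0 = 10⁻³ W/Hz) showing the capacity approaching the limit 1.44 S/N0 as the bandwidth grows:

```python
import math

def capacity(bandwidth_hz: float, s: float, n0: float) -> float:
    """C = B * log2(1 + S / (B * N0)) for an AWGN channel."""
    return bandwidth_hz * math.log2(1.0 + s / (bandwidth_hz * n0))

S, N0 = 1.0, 1e-3                              # assumed illustrative values
for B in (1e3, 1e4, 1e5, 1e6):
    print(f"B = {B:>9.0f} Hz -> C = {capacity(B, S, N0):8.1f} b/s")
print("Limit 1.44 * S/N0 =", 1.44 * S / N0)    # about 1440 b/s (more precisely 1.443 * S/N0)
```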
