
Page 1: Dcs unit 2

Unit 2

Information Theory and Coding

By 

Prof A K Nigam


Page 2: Dcs unit 2

Syllabus for Unit 2

• Definition of information

• Concept of entropy

• Shannon’s theorem for channel capacity

• Shannon‐Hartley theorem

• Shannon channel capacity
(Reference: Communication Systems, 4th Edition, Simon Haykin)

Page 3: Dcs unit 2

Definition of information

We define the amount of information gained after observing the event sk, which occurs with a defined probability, as the logarithmic function

I(sk) = log(1/pk)

where pk is the probability of occurrence of event sk.

Remember:
Joint probability P(X, Y)
Conditional probability P(A/B) = probability of occurrence of A after B has occurred
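To make the definition concrete, here is a minimal sketch (added; not part of the original slides) that evaluates I(sk) = log2(1/pk) for a couple of arbitrary probabilities:

```python
import math

def self_information(p: float) -> float:
    """Information gained on observing an event of probability p, in bits."""
    return math.log2(1.0 / p)

print(self_information(0.25))  # 2.0 bits: a 1-in-4 event carries 2 bits
print(self_information(0.5))   # 1.0 bit, the p = 1/2 case discussed a few slides later
```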

Page 4: Dcs unit 2

Important properties
• If we are absolutely certain of the outcome of an event, even before it occurs, there is no information gained.

• The occurrence of an event either provides some or no information, but never brings about a loss of information.

• The less probable an event is, the more information we gain when it occurs.

• If sk and sl are statistically independent, then I(sk sl) = I(sk) + I(sl).


Page 5: Dcs unit 2

Standard Practice for defining information

• It is the standard practice today to use a logarithm to base 2. The resulting unit of information is called the bit.

• When pk = 1/2, we have I(sk) = 1 bit. Hence, one bit is the amount of information that we gain when one of two possible and equally likely events occurs.


Page 6: Dcs unit 2

Entropy of a discrete memoryless source

• Entropy of a discrete memoryless source with source alphabet S is a measure of the average information content per source symbol.
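As an illustrative sketch (added), the definition H(S) = Σ pk log2(1/pk) can be evaluated directly; the four-symbol distribution below is an arbitrary example:

```python
import math

def entropy(probs) -> float:
    """Average information content in bits/symbol: H = sum of p_k * log2(1/p_k)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Arbitrary four-symbol source alphabet
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```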


Page 7: Dcs unit 2

Properties of Entropy

1. Entropy is a measure of the uncertainty of the random variable.

2. H(S) = 0 if and only if the probability pk = 1 for some k, and the remaining probabilities in the set are all zero; this lower bound on entropy corresponds to no uncertainty.

3. H(S) = log2 K if and only if pk = 1/K for all k (i.e., all the symbols in the alphabet are equiprobable); this upper bound on entropy corresponds to maximum uncertainty.


Page 8: Dcs unit 2

Proof of these properties of H(S)

2nd Property

• Since each probability pk is less than or equal to unity, it follows that each term pk log2(1/pk) is always nonnegative, and so H(S) ≥ 0.

• Next, we note that the product term pk log2(1/pk) is zero if, and only if, pk = 0 or 1.

• We therefore deduce that H(S) = 0 if, and only if, pk = 0 or 1, that is, pk = 1 for some k and all the rest are zero.


Page 9: Dcs unit 2

Example: Entropy of Binary Memoryless Source

• We consider a binary source for which symbol 0 occurs with probability P(0) and symbol 1 with probability P(1) = 1 − P(0). We assume that the source is memoryless.

• The entropy of such a source equals

H(S) = −P(0) log2 P(0) − P(1) log2 P(1)
     = −P(0) log2 P(0) − {1 − P(0)} log2{1 − P(0)} bits

• For P(0) = 0, P(1) = 1 and thus H(S) = 0.

• For P(0) = 1, P(1) = 0 and thus H(S) = 0.

• For P(0) = P(1) = 1/2 the entropy is maximum = 1 bit.
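A short sketch (added) that evaluates this binary entropy H(S) = −P(0) log2 P(0) − {1 − P(0)} log2{1 − P(0)} at the three cases listed above:

```python
import math

def binary_entropy(p0: float) -> float:
    """Entropy of a memoryless binary source with P(0) = p0, in bits/symbol."""
    h = 0.0
    for p in (p0, 1.0 - p0):
        if p > 0:
            h -= p * math.log2(p)
    return h

for p0 in (0.0, 1.0, 0.5):
    print(p0, binary_entropy(p0))   # 0.0 -> 0, 1.0 -> 0, 0.5 -> 1 (the maximum)
```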

Page 10: Dcs unit 2

Proof that the binary entropy is maximum at p = 1/2:

H = −p log2 p − (1 − p) log2(1 − p)

We know that (d/dx) log_a x = (1/x) log_a e, thus we can write

dH/dp = −{log2 p + log2 e − log2(1 − p) − log2 e}
      = −log2 p + log2(1 − p)

Setting dH/dp = 0 gives log2 p = log2(1 − p), i.e. p = 1 − p, or p = 0.5.

The maximum entropy is thus H_max = (1/2) log2 2 + (1/2) log2 2 = 1.

Thus entropy is maximum when the probabilities are equal, and we can write the maximum value of entropy as

H_max = Σ (k = 1 to M) (1/M) log2 M = log2 M bits/message,  with pk = 1/M.

Page 11: Dcs unit 2


Page 12: Dcs unit 2

Proof of 3rd statement: Condition for Maximum Entropy

• We know that the entropy can achieve a maximum value of log2 M, where M is the number of symbols.

• If we assume that all symbols are equiprobable, then the probability of each occurring is 1/M.

• The associated entropy is therefore

H(S) = Σ (k = 1 to M) pk log2(1/pk)
     = M × (1/M) × log2(1/(1/M))
     = log2 M

• This is the maximum value of entropy, and thus entropy is maximum when all symbols have equal probability of occurrence.
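As a quick numerical check (added, with arbitrary example distributions), an equiprobable source reaches the bound log2 M while a skewed one stays below it:

```python
import math

def entropy(probs) -> float:
    """H = -sum of p_k * log2(p_k), in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

M = 4
print(entropy([1 / M] * M), math.log2(M))   # both 2.0: the upper bound is reached
print(entropy([0.7, 0.1, 0.1, 0.1]))        # about 1.357, strictly below log2 M
```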


Page 13: Dcs unit 2

EXAMPLE: Entropy of Source


Page 14: Dcs unit 2

EXAMPLE: Entropy of Source

• Six messages with probabilities 0.30, 0.25, 0.15, 0.12, 0.10, and 0.08, respectively, are transmitted. Find the entropy.

H(X) = −(0.30 log2 0.30 + 0.25 log2 0.25 + 0.15 log2 0.15 + 0.12 log2 0.12 + 0.10 log2 0.10 + 0.08 log2 0.08)
     = −3.322 × (0.30 log10 0.30 + 0.25 log10 0.25 + 0.15 log10 0.15 + 0.12 log10 0.12 + 0.10 log10 0.10 + 0.08 log10 0.08)
     ≈ 2.42 bits/message
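A two-line sketch (added) reproducing the result above:

```python
import math

probs = [0.30, 0.25, 0.15, 0.12, 0.10, 0.08]        # the six message probabilities
print(-sum(p * math.log2(p) for p in probs))        # about 2.4224 bits/message
```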


Page 15: Dcs unit 2

Discrete Memoryless Channel

• A discrete memory‐less channel is a statistical model with an input X and an output Y that is a noisy version of X; both X and Y are random variables.


Page 16: Dcs unit 2

Channel matrix, or transition matrix

A convenient way of describing a discrete memoryless channel is to arrange the various transition probabilities of the channel in the form of a matrix as follows:

  P(Y/X) = [ p(y1/x1)  p(y2/x1)  …  p(yn/x1) ]
           [ p(y1/x2)  p(y2/x2)  …  p(yn/x2) ]
           [    …         …      …     …     ]
           [ p(y1/xm)  p(y2/xm)  …  p(yn/xm) ]
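As a minimal added sketch, such a matrix can be stored row-by-row in code; the 2×2 values below are borrowed from the worked example later in this unit, not from this slide:

```python
# Rows are inputs x_j, columns are outputs y_k; entry [j][k] = P(y_k / x_j).
P_Y_given_X = [
    [0.8, 0.2],   # P(y1/x1), P(y2/x1)
    [0.3, 0.7],   # P(y1/x2), P(y2/x2)
]

# Every row of a channel (transition) matrix must sum to 1.
for row in P_Y_given_X:
    assert abs(sum(row) - 1.0) < 1e-9
```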


Page 17: Dcs unit 2

Joint Entropy

• Joint entropy is defined as

H(X, Y) = Σ (j = 1 to m) Σ (k = 1 to n) p(xj, yk) log2( 1 / p(xj, yk) )
        = −Σ (j = 1 to m) Σ (k = 1 to n) p(xj, yk) log2 p(xj, yk)
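A small sketch (added) computing H(X, Y) from a joint probability table; the table is the P(X, Y) matrix that appears in the worked example later in this unit:

```python
import math

# Joint probabilities p(x_j, y_k), taken from the later worked example
P_XY = [
    [0.48, 0.12],
    [0.12, 0.28],
]

H_XY = -sum(p * math.log2(p) for row in P_XY for p in row if p > 0)
print(round(H_XY, 4))   # about 1.7566 bits
```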


Page 18: Dcs unit 2

Conditional Entropy

• The quantity H(X/Y) is called a conditional entropy.

• It represents the amount of uncertainty remaining about the channel input after the channel output has been observed, and is given by:

• H(X/Y) = −Σ Σ p(xj, yk) log2 p(xj/yk)   (derived on the following slides)

• Similarly, H(Y/X) can be computed, which is the average uncertainty of the channel output given that X was transmitted.


Page 19: Dcs unit 2

Conditional Entropy: Proof

• Conditional probability is defined as

  p(x/y) = p(x, y) / p(y)

• If the received symbol is yk, then

  p(xj/yk) = p(xj, yk) / p(yk)

• The associated entropy can therefore be computed as follows.


Page 20: Dcs unit 2

H(X/yk) = −Σ (j = 1 to m) [ p(xj, yk) / p(yk) ] log2 [ p(xj, yk) / p(yk) ]
        = −Σ (j = 1 to m) p(xj/yk) log2 p(xj/yk)   ............(1)

Taking the average over all values of k:

H(X/Y) = Σ (k = 1 to n) p(yk) H(X/yk)
       = −Σ (k = 1 to n) Σ (j = 1 to m) p(yk) p(xj/yk) log2 p(xj/yk)
       = −Σ (k = 1 to n) Σ (j = 1 to m) p(xj, yk) log2 p(xj/yk)
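The averaging above maps directly to code; a sketch (added) using the same joint matrix as the later worked example, with rows indexing xj and columns indexing yk:

```python
import math

P_XY = [[0.48, 0.12],
        [0.12, 0.28]]                                   # p(x_j, y_k)

# Marginal p(y_k), then H(X/Y) = -sum p(x,y) * log2 p(x/y)
p_y = [sum(P_XY[j][k] for j in range(2)) for k in range(2)]
H_X_given_Y = -sum(P_XY[j][k] * math.log2(P_XY[j][k] / p_y[k])
                   for j in range(2) for k in range(2) if P_XY[j][k] > 0)
print(round(H_X_given_Y, 4))   # about 0.7857 bits
```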

Page 21: Dcs unit 2

Mutual Information

Problem statement


Given that the channel output yk is a noisy version of the channel input xj.

Given that the entropy H(X) is a measure of the prior uncertainty about X, how can we measure the uncertainty about X after observing Y?

Page 22: Dcs unit 2

Mutual Information Defined
• Note that the entropy H(X) represents our uncertainty about the channel input before observing the channel output, and the conditional entropy H(X/Y) represents our uncertainty about the channel input after observing the channel output.

• It follows that the difference H(X) − H(X/Y) must represent our uncertainty about the channel input that is resolved by observing the channel output.

• This important quantity is called the mutual information of the channel, denoted by I(X; Y).

• We may thus write I(X; Y) = H(X) − H(X/Y), or = H(Y) − H(Y/X).

• Also, it can be shown that I(X; Y) = H(X) + H(Y) − H(X, Y).
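A brief sketch (added) that checks these identities numerically on the joint distribution of the worked example that follows:

```python
import math

P_XY = [[0.48, 0.12],
        [0.12, 0.28]]                       # p(x_j, y_k)

def H(probs):
    """Entropy in bits of a probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [sum(row) for row in P_XY]                                   # [0.6, 0.4]
p_y = [sum(P_XY[j][k] for j in range(2)) for k in range(2)]        # [0.6, 0.4]
H_XY = H([p for row in P_XY for p in row])

# I(X;Y) = H(X) + H(Y) - H(X,Y)
print(round(H(p_x) + H(p_y) - H_XY, 4))    # about 0.1853 bits
```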


Page 23: Dcs unit 2

Capacity of a Discrete Memoryless Channel

• The channel capacity of a discrete memoryless channel is defined as the maximum mutual information I(X; Y) in any single use of the channel, where the maximization is over all possible input probability distributions {p(xj)} on X.

• The channel capacity is commonly denoted by C. We thus write

  C = max over {p(xj)} of I(X; Y)

• The channel capacity C is measured in bits per channel use, or bits per transmission.


Page 24: Dcs unit 2

Examples of Mutual Information Numericals

• Do numericals from Singh and Sapre, Chapter 10 (10.3.1, 10.4.1, 10.4.2, 10.4.3, 10.5.2, 10.6.2).


Page 25: Dcs unit 2

Example: Find the Mutual Information for the channel shown below

• Source probabilities: P(x1) = 0.6, P(x2) = 0.4
• Transition probabilities: P(y1/x1) = 0.8, P(y2/x1) = 0.2, P(y1/x2) = 0.3, P(y2/x2) = 0.7

  P(Y/X) = [ 0.8  0.2 ]
           [ 0.3  0.7 ]


Page 26: Dcs unit 2

Solution
• We know that I(X; Y) = H(Y) − H(Y/X) ……..(1)

• Finding H(Y):
  P(y1) = 0.6 × 0.8 + 0.4 × 0.3 = 0.6
  P(y2) = 0.6 × 0.2 + 0.4 × 0.7 = 0.4
  H(Y) = −[0.6 log2 0.6 + 0.4 log2 0.4] = 0.971 bits/message

• Finding H(Y/X) = −Σ Σ p(x, y) log2 p(y/x)

• Finding P(X, Y):

  P(X, Y) = [ 0.48  0.12 ]
            [ 0.12  0.28 ]

• H(Y/X) = −[0.48 log2 0.8 + 0.12 log2 0.2 + 0.12 log2 0.3 + 0.28 log2 0.7] ≈ 0.7857

• Putting the values in (1) we get I(X; Y) = 0.971 − 0.7857 ≈ 0.185 bits
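The same numbers can be reproduced in a few lines (added as a cross-check of the solution above):

```python
import math

P_X = [0.6, 0.4]
P_Y_given_X = [[0.8, 0.2],
               [0.3, 0.7]]

# Joint distribution, output distribution, and the two entropies used in eq. (1)
P_XY = [[P_X[j] * P_Y_given_X[j][k] for k in range(2)] for j in range(2)]
P_Y = [sum(P_XY[j][k] for j in range(2)) for k in range(2)]

H_Y = -sum(p * math.log2(p) for p in P_Y)
H_Y_given_X = -sum(P_XY[j][k] * math.log2(P_Y_given_X[j][k])
                   for j in range(2) for k in range(2))

print(round(H_Y, 3), round(H_Y_given_X, 4), round(H_Y - H_Y_given_X, 4))
# -> 0.971 0.7857 0.1853
```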


Page 27: Dcs unit 2

Types of channels and associated Entropy

• Lossless channel

• Deterministic channel

• Noiseless channel

• Binary symmetric channel


Page 28: Dcs unit 2

General Treatment for all the channels

We know

  I(X, Y) = H(X) − H(X/Y) ........(1)
          = H(Y) − H(Y/X) ........(2)

Also that

  H(X/Y) = −Σ (k = 1 to n) Σ (j = 1 to m) p(xj, yk) log2 p(xj/yk)

We know that p(x, y) = p(x) p(y/x) = p(y) p(x/y), thus we can write

  H(X/Y) = −Σ (k = 1 to n) Σ (j = 1 to m) p(yk) p(xj/yk) log2 p(xj/yk) ........(3)

Similarly we can write

  H(Y/X) = −Σ (j = 1 to m) Σ (k = 1 to n) p(xj) p(yk/xj) log2 p(yk/xj) ........(4)

Page 29: Dcs unit 2

Lossless channel
• For a lossless channel no source information is lost in transmission. It has only one non-zero element in each column. For example

  P(Y/X) = [ 3/4  1/4   0    0    0 ]
           [  0    0   1/3  2/3   0 ]
           [  0    0    0    0    1 ]

• In the case of a lossless channel p(x/y) = 0 or 1, as the probability of x given that y has occurred is 0 or 1.

• Putting this in eq. (3) we get H(X/Y) = 0. Thus from eq. (1) we get

  I(X, Y) = H(X), and C = max H(X)


Page 30: Dcs unit 2

Deterministic channel
• The channel matrix has only one non-zero element in each row, for example

  P(Y/X) = [ 1  0  0 ]
           [ 1  0  0 ]
           [ 0  1  0 ]
           [ 0  1  0 ]
           [ 0  0  1 ]

• In the case of a deterministic channel p(y/x) = 0 or 1, as the probability of y given that x has occurred is 0 or 1.

• Putting this in eq. (4) we get H(Y/X) = 0. Thus from eq. (2) we get

  I(X, Y) = H(Y), and C = max H(Y)


Page 31: Dcs unit 2

Noiseless channel
• A channel which is both lossless and deterministic has only one non-zero element in each row and column. For example

  P(Y/X) = [ 1  0  0  0 ]
           [ 0  1  0  0 ]
           [ 0  0  1  0 ]
           [ 0  0  0  1 ]

• A noiseless channel is both lossless and deterministic, thus H(X/Y) = H(Y/X) = 0.

• Thus from eqs. (1) and (2) we get

  I(X, Y) = H(Y) = H(X), and C = max H(Y) = max H(X) = log2 m = log2 n, where m and n are the numbers of input and output symbols.

Page 32: Dcs unit 2

Binary Symmetric Channel

Input probabilities: P(x1) = α, P(x2) = 1 − α; crossover probability p.

  P(Y/X) = [ 1−p    p  ]
           [  p    1−p ]

  P(X, Y) = [ α(1−p)       αp         ]
            [ (1−α)p       (1−α)(1−p) ]


Page 33: Dcs unit 2

H(Y/X) = −Σ Σ p(xj, yk) log2 p(yk/xj)

Putting values from the matrix we get

H(Y/X) = −[α(1−p) log(1−p) + αp log p + (1−α)p log p + (1−α)(1−p) log(1−p)]
       = −[p log p + (1−p) log(1−p)]

Putting this in eq. (2) we get

I(X, Y) = H(Y) + p log p + (1−p) log(1−p)


Page 34: Dcs unit 2

CHANNEL CAPACITY OF A CONTINUOUS CHANNEL

• For a discrete random variable X the entropy H(X) was defined as H(X) = −Σ p(x) log2 p(x).

• H(X) for continuous random variables is obtained by using an integral instead of the discrete summation, thus H(X) = −∫ p(x) log p(x) dx.


Page 35: Dcs unit 2

Similarly

  H(X, Y) = −∫∫ p(x, y) log p(x, y) dx dy

  H(X/Y) = −∫∫ p(x, y) log p(x/y) dx dy

  H(Y/X) = −∫∫ p(x, y) log p(y/x) dx dy

For a continuous channel, I(X; Y) is defined as

  I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / ( p(x) p(y) ) ] dx dy

(all integrals taken from −∞ to ∞)


Page 36: Dcs unit 2

Transmission Efficiency of a channel

  η = Actual transinformation / Maximum transinformation
    = I(X; Y) / max I(X; Y) = I(X; Y) / C

Redundancy of a channel

  R = 1 − η = ( C − I(X; Y) ) / C
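A tiny sketch (added). The values of I(X; Y) and C used here are only assumed placeholders for illustration; they are not derived from any example in the slides:

```python
def efficiency(mutual_info: float, capacity: float) -> float:
    """Transmission efficiency: actual transinformation / maximum transinformation."""
    return mutual_info / capacity

def redundancy(mutual_info: float, capacity: float) -> float:
    """Redundancy R = 1 - efficiency = (C - I) / C."""
    return 1.0 - efficiency(mutual_info, capacity)

# Assumed illustrative numbers: I(X;Y) = 0.185 bits/use and C = 0.5 bits/use
print(efficiency(0.185, 0.5), redundancy(0.185, 0.5))   # efficiency ~0.37, redundancy ~0.63
```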


Page 37: Dcs unit 2

Information Capacity Theorem for band-limited, power-limited Gaussian channels

• Consider X(t) that is band-limited to B hertz.

• Also we assume that uniform sampling of the process X(t) at the transmitter, at the Nyquist rate of 2B samples per second, produces the samples which are to be transmitted over the channel.

• We also know that the mutual information for a channel is I(X; Y) = H(Y) − H(Y/X) = H(X) − H(X/Y) …. already done


Page 38: Dcs unit 2

Information Capacity Theorem…….

• For the Gaussian channel the probability density is given by

  p(x) = ( 1 / √(2πσ²) ) e^( −x² / 2σ² )

• For this p(x), H(X) can be shown to be (not required to be solved)

  H(X) = (1/2) log(2πeσ²) = log √(2πeσ²)  ………….(1)

• If the signal power is S and the noise power is N, then the received signal is the sum of the transmitted signal with power S and noise with power N; the joint entropy of the source and noise is


Page 39: Dcs unit 2

  H(x, n) = H(x) + H(n/x)

If the transmitted signal and noise are independent, then H(n/x) = H(n). Thus

  H(x, n) = H(x) + H(n)  ............(A)

Since the received signal is the sum of the signal x and the noise n, we may equate

  H(x, y) = H(x, n)

But H(x, y) = H(y) + H(x/y); using this and eq. (A) we get

  H(y) + H(x/y) = H(x) + H(n)

Rearranging this we get

  H(x) − H(x/y) = H(y) − H(n) = Mutual Information  ..........(2)

Now, using σ² = N or (S + N) in eq. (1), we get

  H(y) = (1/2) log{2πe(S + N)}   (y has power S + N)

  and H(N) = (1/2) log{2πeN}

Page 40: Dcs unit 2

• Putting these values in eq. (2) we get

  I(X, Y) = (1/2) log( (S + N) / N )
          = (1/2) log( 1 + S/N )

  C = No. of samples per second × Mutual Information
    = 2B × (1/2) log2( 1 + S/N )
    = B log2( 1 + S/N )

(Note: the number of samples per second is 2B, as per the sampling theorem.)
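A small sketch (added) of the formula just derived, with arbitrary example numbers (a 3 kHz channel at 30 dB SNR):

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

snr = 10 ** (30 / 10)                   # 30 dB -> 1000 in linear terms
print(shannon_capacity(3000.0, snr))    # about 29,902 bits per second
```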


Page 41: Dcs unit 2

• With noise spectral density N0, the total noise in bandwidth B is the spectral density multiplied by the bandwidth, i.e. N = BN0. Thus we can write

  C = B log2( 1 + S/(B N0) )

• This is the Shannon theorem for channel capacity and is widely used in communication computations.


Page 42: Dcs unit 2

BW and S/N trade off

  C = B log2( 1 + S/(B N0) )
    = (S/N0) (B N0/S) log2( 1 + S/(B N0) )
    = (S/N0) log2 [ 1 + S/(B N0) ]^(B N0/S)

We know that lim (x→0) (1 + x)^(1/x) = e

Thus, for B → ∞,

  C_max = lim (B→∞) B log2( 1 + S/(B N0) ) = (S/N0) log2 e = 1.44 S/N0
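A sketch (added, with assumed values S = 1 W and N0 = 10⁻³ W/Hz) showing the capacity approaching the limit 1.44 S/N0 as the bandwidth grows:

```python
import math

def capacity(bandwidth_hz: float, s: float, n0: float) -> float:
    """C = B * log2(1 + S / (B * N0)) for an AWGN channel."""
    return bandwidth_hz * math.log2(1.0 + s / (bandwidth_hz * n0))

S, N0 = 1.0, 1e-3                              # assumed illustrative values
for B in (1e3, 1e4, 1e5, 1e6):
    print(f"B = {B:>9.0f} Hz -> C = {capacity(B, S, N0):8.1f} b/s")
print("Limit 1.44 * S/N0 =", 1.44 * S / N0)    # about 1440 b/s (more precisely 1.443 * S/N0)
```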
