Differential Entropy
Definition
Let $X$ be a random variable with cumulative distribution function $F(x) = \Pr(X \le x)$. If $F(x)$ is continuous, the r.v. $X$ is said to be continuous. Let $f(x) = F'(x)$ when the derivative is defined. If $\int f(x)\,dx = 1$, then $f(x)$ is called the pdf of $X$. The set where $f(x) > 0$ is called the support set of $X$.
Definition
The differential entropy $h(X)$ of a continuous r.v. $X$ with a density function $f(x)$ is defined as

$$h(X) = -\int_S f(x) \log f(x)\,dx \qquad (1)$$

where $S$ is the support set of the r.v. Since $h(X)$ depends only on $f(x)$, the differential entropy is sometimes written as $h(f)$ rather than $h(X)$.
Ex. 1: (Uniform distribution)

Let $X \sim U(0, a)$, so $f(x) = 1/a$ on $[0, a]$ and $0$ elsewhere. Then

$$h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\,dx = \log a.$$

Note: For $a < 1$, $\log a < 0$, so unlike discrete entropy, differential entropy can be negative.
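A minimal numerical sketch of the definition and of Ex. 1 (the helper name `differential_entropy` and the test densities are illustrative, not from the slides):

```python
# A minimal sketch: estimate h(X) = -∫ f(x) log2 f(x) dx numerically.
import numpy as np
from scipy.integrate import quad

def differential_entropy(f, a, b):
    """Integrate -f(x) * log2 f(x) over the support [a, b]."""
    return quad(lambda x: -f(x) * np.log2(f(x)), a, b)[0]

# Ex. 1: uniform on [0, 8] has h(X) = log2(8) = 3 bits.
print(differential_entropy(lambda x: 1 / 8, 0, 8))         # ~3.0
# A width < 1 gives negative differential entropy: log2(1/2) = -1.
print(differential_entropy(lambda x: 1 / 0.5, 0, 0.5))     # ~-1.0
```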
Ex. 2: (Normal distribution)
Let $X \sim \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}$. Then

$$h(\phi) = -\int \phi \ln \phi = -\int \phi(x)\left[-\frac{x^2}{2\sigma^2} - \ln\sqrt{2\pi\sigma^2}\right] dx = \frac{E[X^2]}{2\sigma^2} + \frac{1}{2}\ln 2\pi\sigma^2$$
$$= \frac{1}{2} + \frac{1}{2}\ln 2\pi\sigma^2 = \frac{1}{2}\ln e + \frac{1}{2}\ln 2\pi\sigma^2 = \frac{1}{2}\ln 2\pi e\sigma^2 \ \text{nats.}$$

Changing the base of the logarithm, we have

$$h(\phi) = \frac{1}{2}\log 2\pi e\sigma^2 \ \text{bits.}$$
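As a sanity check, the closed form can be compared against direct numerical integration; a minimal sketch, assuming an illustrative $\sigma = 2$:

```python
# Sketch: compare h(N(0, σ²)) = ½ log2(2πeσ²) with numerical integration.
import numpy as np
from scipy.integrate import quad

sigma = 2.0
phi = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

h_numeric = quad(lambda x: -phi(x) * np.log2(phi(x)), -40, 40)[0]
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_closed)   # both ≈ 3.047 bits
```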
Theorem 1
Let $X_1, X_2, \ldots, X_n$ be a sequence of r.v.s drawn i.i.d. according to the density $f(x)$. Then

$$-\frac{1}{n}\log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X) \quad \text{in probability.}$$

Proof: The proof follows directly from the weak law of large numbers.
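A brief simulation of Theorem 1 (the choice of $f$ as the standard normal is illustrative, not from the slides): the empirical average of $-\log f(X_i)$ approaches $h(X)$ as $n$ grows.

```python
# Sketch of Theorem 1 (AEP): -(1/n) Σ log f(Xi) → h(X) in probability.
# Illustrative f: standard normal, with h(X) = ½ ln(2πe) ≈ 1.4189 nats.
import numpy as np

rng = np.random.default_rng(0)
h_true = 0.5 * np.log(2 * np.pi * np.e)

for n in (10, 1_000, 100_000):
    x = rng.standard_normal(n)
    log_f = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)   # log f(xi)
    print(n, -log_f.mean(), "->", h_true)
```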
Def: For $\epsilon > 0$ and any $n$, we define the typical set $A_\epsilon^{(n)}$ w.r.t. $f(x)$ as follows:

$$A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n : \left| -\frac{1}{n}\log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\},$$

where $f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i)$.
Def: The volume $\mathrm{Vol}(A)$ of a set $A \subset \mathbb{R}^n$ is defined as

$$\mathrm{Vol}(A) = \int_A dx_1\,dx_2 \cdots dx_n.$$

Thm: The typical set $A_\epsilon^{(n)}$ has the following properties:

1. $\Pr\left(A_\epsilon^{(n)}\right) > 1 - \epsilon$ for $n$ sufficiently large.
2. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \le 2^{n(h(X)+\epsilon)}$ for all $n$.
3. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \ge (1-\epsilon)\,2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large.
Thm: The set $A_\epsilon^{(n)}$ is the smallest-volume set with probability $\ge 1 - \epsilon$, to first order in the exponent.

The volume of the smallest set that contains most of the probability is approximately $2^{nh}$. This is an $n$-dimensional volume, so the corresponding side length is $(2^{nh})^{1/n} = 2^h$. The differential entropy is therefore the logarithm of the equivalent side length of the smallest set that contains most of the probability: low entropy implies that the r.v. is confined to a small effective volume, and high entropy indicates that the r.v. is widely dispersed.
Relation of Differential Entropy to Discrete Entropy
Suppose we divide the range of $X$ into bins of length $\Delta$. Let's assume that the density $f(x)$ is continuous within the bins.

[Figure: quantization of a continuous r.v. — the density $f(x)$ plotted against $x$, partitioned into bins of width $\Delta$.]
By the mean value theorem, there is a value $x_i$ within each bin such that

$$f(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx.$$

Consider the quantized r.v. $X^\Delta$, which is defined by $X^\Delta = x_i$ if $i\Delta \le X < (i+1)\Delta$. Then the probability that $X^\Delta = x_i$ is

$$p_i = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx = f(x_i)\Delta.$$
The entropy of the quantized version is

$$H(X^\Delta) = -\sum_i p_i \log p_i = -\sum_i f(x_i)\Delta \log\left(f(x_i)\Delta\right) = -\sum_i \Delta f(x_i)\log f(x_i) - \log\Delta,$$

since $\sum_i f(x_i)\Delta = \int f(x)\,dx = 1$.
If $f(x)\log f(x)$ is Riemann integrable, then

$$-\sum_i \Delta f(x_i)\log f(x_i) \to -\int f(x)\log f(x)\,dx \quad \text{as } \Delta \to 0.$$

This proves the following theorem.

Thm: If the density $f(x)$ of the r.v. $X$ is Riemann integrable, then

$$H(X^\Delta) + \log\Delta \to h(f) = h(X), \quad \text{as } \Delta \to 0.$$

Thus the entropy of an $n$-bit quantization of a continuous r.v. $X$ is approximately $h(X) + n$, since $\Delta = 2^{-n}$ for an $n$-bit uniform quantizer and hence $-\log\Delta = n$.
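The theorem can be illustrated numerically; in this sketch (the standard-normal choice and the bin range are illustrative) $H(X^\Delta) + \log_2\Delta$ is compared with $h(X) \approx 2.047$ bits:

```python
# Sketch: H(X^Δ) + log2 Δ → h(X) as Δ → 0, for X ~ N(0,1),
# where h(X) = ½ log2(2πe) ≈ 2.047 bits.
import numpy as np
from scipy.stats import norm

h_x = 0.5 * np.log2(2 * np.pi * np.e)

for delta in (1.0, 0.1, 0.01):
    edges = np.arange(-10.0, 10.0 + delta, delta)   # bins covering the mass
    p = np.diff(norm.cdf(edges))                    # p_i = ∫ f over bin i
    p = p[p > 0]
    H = -(p * np.log2(p)).sum()                     # H(X^Δ)
    print(delta, H + np.log2(delta), "->", h_x)
```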
Joint and Conditional Differential Entropy:

$$h(X_1, X_2, \ldots, X_n) = -\int f(x^n)\log f(x^n)\,dx^n$$
$$h(X|Y) = -\int f(x,y)\log f(x|y)\,dx\,dy$$
$$h(X|Y) = h(X,Y) - h(Y)$$
Theorem (Entropy of a multivariate normal distribution)

Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with mean $\mu$ and covariance matrix $K$. Then

$$h(X_1, X_2, \ldots, X_n) = h(N(\mu, K)) = \frac{1}{2}\log\left((2\pi e)^n |K|\right) \ \text{bits},$$

where $|K|$ denotes the determinant of $K$.
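A quick check of the formula for an illustrative $2\times 2$ covariance matrix, cross-validated against SciPy's built-in entropy (which returns nats):

```python
# Sketch: h(N(μ, K)) = ½ log2((2πe)^n |K|), for an illustrative 2x2 K.
import numpy as np
from scipy.stats import multivariate_normal

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
n = K.shape[0]

h_bits = 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))
# SciPy's entropy() returns nats; convert to bits for comparison.
h_scipy = multivariate_normal(mean=np.zeros(n), cov=K).entropy() / np.log(2)
print(h_bits, h_scipy)   # identical
```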
pf: Let $(X_1, X_2, \ldots, X_n) \sim N(\mu, K)$ with density

$$f(\tilde{x}) = \frac{1}{(\sqrt{2\pi})^n |K|^{1/2}} \exp\left(-\frac{1}{2}(\tilde{x}-\mu)^T K^{-1} (\tilde{x}-\mu)\right).$$

Then

$$h(f) = -\int f(\tilde{x})\left[-\frac{1}{2}(\tilde{x}-\mu)^T K^{-1}(\tilde{x}-\mu) - \ln\left((\sqrt{2\pi})^n |K|^{1/2}\right)\right] d\tilde{x}$$
$$= \frac{1}{2} E\left[\sum_{i,j} (X_i - \mu_i)\left(K^{-1}\right)_{ij}(X_j - \mu_j)\right] + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\sum_{i,j} E\left[(X_i - \mu_i)(X_j - \mu_j)\right]\left(K^{-1}\right)_{ij} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\sum_j \sum_i E\left[(X_j - \mu_j)(X_i - \mu_i)\right]\left(K^{-1}\right)_{ij} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\sum_j \left(K K^{-1}\right)_{jj} + \frac{1}{2}\ln\left((2\pi)^n |K|\right) = \frac{1}{2}\sum_j I_{jj} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{n}{2} + \frac{1}{2}\ln\left((2\pi)^n |K|\right) = \frac{1}{2}\ln e^n + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\ln\left((2\pi e)^n |K|\right) \ \text{nats} = \frac{1}{2}\log\left((2\pi e)^n |K|\right) \ \text{bits.}$$
Relative Entropy and Mutual Information

$$D(f\|g) = \int f \log\frac{f}{g}$$
$$I(X;Y) = \int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy$$
$$I(X;Y) = h(X) - h(X|Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X,Y)$$
$$I(X;Y) = D\left(f(x,y)\,\|\,f(x)f(y)\right)$$
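For a bivariate Gaussian with correlation $\rho$, the identities above give the closed form $I(X;Y) = -\frac{1}{2}\log(1-\rho^2)$; a short sketch (the value $\rho = 0.9$ is illustrative):

```python
# Sketch: for a bivariate Gaussian with correlation ρ,
# I(X;Y) = h(X) + h(Y) - h(X,Y) = -½ log2(1 - ρ²) bits.
import numpy as np

rho = 0.9
K = np.array([[1.0, rho],
              [rho, 1.0]])

h_marginal = 0.5 * np.log2(2 * np.pi * np.e)                    # h(X) = h(Y)
h_joint = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
print(2 * h_marginal - h_joint, -0.5 * np.log2(1 - rho**2))     # both ≈ 1.198
```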
Remark:

The mutual information between two continuous r.v.s is the limit of the mutual information between their quantized versions:

$$I(X^\Delta; Y^\Delta) = H(X^\Delta) - H(X^\Delta | Y^\Delta) \approx \left(h(X) - \log\Delta\right) - \left(h(X|Y) - \log\Delta\right) = I(X;Y).$$
Properties of $D(f\|g)$ and $I(X;Y)$

(a) $D(f\|g) \ge 0$.

pf: $-D(f\|g) = \int_S f \log\frac{g}{f} \le \log\int_S f\,\frac{g}{f}$ (Jensen's inequality) $= \log\int_S g \le \log 1 = 0$.

(b) $I(X;Y) \ge 0$; $h(X|Y) \le h(X)$.
Theorems:

$$h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n h(X_i \mid X_1, \ldots, X_{i-1}) \le \sum_{i=1}^n h(X_i)$$

Translation does not change the differential entropy:

$$h(X + c) = h(X)$$

$$h(aX) = h(X) + \log|a|$$
pf: Let $Y = aX$. Then $f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)$, and

$$h(aX) = -\int f_Y(y)\log f_Y(y)\,dy = -\int \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)\log\left(\frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)\right) dy$$
$$= -\int f_X(x)\log f_X(x)\,dx + \log|a| = h(X) + \log|a|.$$

Corollary: $h(AX) = h(X) + \log|\det(A)|$.
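A numerical illustration of the scaling property, using a Gaussian so that both sides have closed forms (the choice $a = 3$ is illustrative):

```python
# Sketch: h(aX) = h(X) + log2|a|. With X ~ N(0,1), aX ~ N(0, a²).
import numpy as np

a = 3.0
h_x = 0.5 * np.log2(2 * np.pi * np.e)            # h(N(0,1))
h_ax = 0.5 * np.log2(2 * np.pi * np.e * a**2)    # h(N(0, a²))
print(h_ax, h_x + np.log2(abs(a)))               # both ≈ 3.632 bits
```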
Theorem: The multivariate normal distribution maximizes the entropy over all distributions with the same variance.

Let the random vector $X \in \mathbb{R}^n$ have zero mean and covariance $K = E[XX^T]$ (i.e., $K_{ij} = E[X_i X_j]$, $1 \le i, j \le n$). Then

$$h(X) \le \frac{1}{2}\log\left((2\pi e)^n |K|\right),$$

with equality iff $X \sim N(0, K)$.
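An illustrative comparison in one dimension (the uniform and Laplace alternatives are my choices, not from the slides): at unit variance, the Gaussian entropy exceeds that of the other densities.

```python
# Sketch: at fixed variance 1, the Gaussian has the largest
# differential entropy among these densities.
import numpy as np

h_gauss = 0.5 * np.log2(2 * np.pi * np.e)   # ≈ 2.047 bits
h_uniform = np.log2(np.sqrt(12.0))          # U[-√3, √3]: h = log2(2√3) ≈ 1.792
h_laplace = np.log2(2 * np.e / np.sqrt(2))  # Laplace, b = 1/√2 (var 2b² = 1): ≈ 1.943
print(h_gauss, h_uniform, h_laplace)        # Gaussian is largest
```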
Pf: Let $g(\tilde{x})$ be any density satisfying $\int g(\tilde{x})\,x_i x_j\,d\tilde{x} = K_{ij}$ for all $i, j$. Let $\phi_K$ be the density of a $N(0, K)$ vector. Note that $\log\phi_K(\tilde{x})$ is a quadratic form and $\int \phi_K(\tilde{x})\,x_i x_j\,d\tilde{x} = K_{ij}$. Then

$$0 \le D(g\|\phi_K) = \int g\log\frac{g}{\phi_K} = -h(g) - \int g\log\phi_K = -h(g) - \int \phi_K\log\phi_K = -h(g) + h(\phi_K),$$
where the substitution $\int g\log\phi_K = \int \phi_K\log\phi_K$ follows from the fact that $g$ and $\phi_K$ yield the same moments of the quadratic form $\log\phi_K(\tilde{x})$. Hence the Gaussian distribution maximizes the entropy over all distributions with the same variance.
Let $X$ be a random variable with differential entropy $h(X)$. Let $\hat{X}$ be an estimate of $X$, and let $E(X - \hat{X})^2$ be the expected prediction error. Let $h(X)$ be in nats.

Theorem: For any r.v. $X$ and estimator $\hat{X}$,

$$E(X - \hat{X})^2 \ge \frac{1}{2\pi e}\, e^{2h(X)},$$

with equality iff $X$ is Gaussian and $\hat{X}$ is the mean of $X$.
pf: Let $\hat{X}$ be any estimator of $X$. Then

$$E(X - \hat{X})^2 \ge \min_{\hat{X}} E(X - \hat{X})^2 \qquad (1)$$
$$= E\left(X - E(X)\right)^2 \quad \text{[the mean of $X$ is the best estimator for $X$]}$$
$$= \mathrm{var}(X) \ge \frac{1}{2\pi e}\, e^{2h(X)} \qquad (2)$$

[the Gaussian distribution has the maximum entropy for a given variance, i.e., $h(X) \le \frac{1}{2}\ln 2\pi e\sigma^2$].
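A quick check of the bound on a non-Gaussian example (the choice $X \sim U[0,1]$ is illustrative): here $h(X) = \ln 1 = 0$ nats, so the bound is $1/2\pi e$, strictly below $\mathrm{var}(X) = 1/12$.

```python
# Sketch: var(X) ≥ (1/2πe) e^{2h(X)} for a non-Gaussian X.
# X ~ U[0,1]: h(X) = 0 nats, var(X) = 1/12.
import numpy as np

var_x = 1 / 12                                  # ≈ 0.0833
bound = np.exp(2 * 0.0) / (2 * np.pi * np.e)    # ≈ 0.0585
print(var_x, bound, var_x >= bound)             # strict: X is not Gaussian
```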
We have equality in (1) only if $\hat{X}$ is the best estimator (i.e., $\hat{X}$ is the mean of $X$) and equality in (2) only if $X$ is Gaussian.

Corollary: Given side information $Y$ and estimator $\hat{X}(Y)$, it follows that

$$E\left(X - \hat{X}(Y)\right)^2 \ge \frac{1}{2\pi e}\, e^{2h(X|Y)}.$$

[cf. Fano's inequality]