Differential Entropy
Definition
Let $X$ be a random variable with cumulative distribution function $F(x) = \Pr(X \le x)$. If $F(x)$ is continuous, the r.v. $X$ is said to be continuous. Let $f(x) = F'(x)$ when the derivative is defined. If $\int f(x)\,dx = 1$, then $f(x)$ is called the pdf of $X$. The set where $f(x) > 0$ is called the support set of $X$.
Definition
The differential entropy $h(X)$ of a continuous r.v. $X$ with a density function $f(x)$ is defined as

$$h(X) = -\int_S f(x) \log f(x)\,dx \qquad (1)$$

where $S$ is the support set of the r.v. Since $h(X)$ depends only on $f(x)$, the differential entropy is sometimes written as $h(f)$ rather than $h(X)$.
Ex. 1: (Uniform distribution)

Let $X \sim U(0, a)$, so $f(x) = 1/a$ on $[0, a]$ and $0$ elsewhere. Then

$$h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\,dx = \log a.$$

Note: For $a < 1$, $\log a < 0$, so unlike discrete entropy, differential entropy can be negative.
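A minimal numerical sketch of the definition and of Ex. 1 (the helper name `differential_entropy` and the test densities are illustrative, not from the slides):

```python
# A minimal sketch: estimate h(X) = -∫ f(x) log2 f(x) dx numerically.
import numpy as np
from scipy.integrate import quad

def differential_entropy(f, a, b):
    """Integrate -f(x) * log2 f(x) over the support [a, b]."""
    return quad(lambda x: -f(x) * np.log2(f(x)), a, b)[0]

# Ex. 1: uniform on [0, 8] has h(X) = log2(8) = 3 bits.
print(differential_entropy(lambda x: 1 / 8, 0, 8))         # ~3.0
# A width < 1 gives negative differential entropy: log2(1/2) = -1.
print(differential_entropy(lambda x: 1 / 0.5, 0, 0.5))     # ~-1.0
```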
Ex. 2: (Normal distribution)
Let $X \sim \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}$. Then

$$h(\phi) = -\int \phi \ln \phi = -\int \phi(x)\left[-\frac{x^2}{2\sigma^2} - \ln\sqrt{2\pi\sigma^2}\right] dx = \frac{E[X^2]}{2\sigma^2} + \frac{1}{2}\ln 2\pi\sigma^2$$
$$= \frac{1}{2} + \frac{1}{2}\ln 2\pi\sigma^2 = \frac{1}{2}\ln e + \frac{1}{2}\ln 2\pi\sigma^2 = \frac{1}{2}\ln 2\pi e\sigma^2 \ \text{nats.}$$

Changing the base of the logarithm, we have

$$h(\phi) = \frac{1}{2}\log 2\pi e\sigma^2 \ \text{bits.}$$
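As a sanity check, the closed form can be compared against direct numerical integration; a minimal sketch, assuming an illustrative $\sigma = 2$:

```python
# Sketch: compare h(N(0, σ²)) = ½ log2(2πeσ²) with numerical integration.
import numpy as np
from scipy.integrate import quad

sigma = 2.0
phi = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

h_numeric = quad(lambda x: -phi(x) * np.log2(phi(x)), -40, 40)[0]
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_closed)   # both ≈ 3.047 bits
```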
Theorem 1
Let $X_1, X_2, \ldots, X_n$ be a sequence of r.v.s drawn i.i.d. according to the density $f(x)$. Then

$$-\frac{1}{n}\log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X) \quad \text{in probability.}$$

Proof: The proof follows directly from the weak law of large numbers.
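A brief simulation of Theorem 1 (the choice of $f$ as the standard normal is illustrative, not from the slides): the empirical average of $-\log f(X_i)$ approaches $h(X)$ as $n$ grows.

```python
# Sketch of Theorem 1 (AEP): -(1/n) Σ log f(Xi) → h(X) in probability.
# Illustrative f: standard normal, with h(X) = ½ ln(2πe) ≈ 1.4189 nats.
import numpy as np

rng = np.random.default_rng(0)
h_true = 0.5 * np.log(2 * np.pi * np.e)

for n in (10, 1_000, 100_000):
    x = rng.standard_normal(n)
    log_f = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)   # log f(xi)
    print(n, -log_f.mean(), "->", h_true)
```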
Def: For $\epsilon > 0$ and any $n$, we define the typical set $A_\epsilon^{(n)}$ w.r.t. $f(x)$ as follows:

$$A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n : \left| -\frac{1}{n}\log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\},$$

where $f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i)$.
Def: The volume $\mathrm{Vol}(A)$ of a set $A \subset \mathbb{R}^n$ is defined as

$$\mathrm{Vol}(A) = \int_A dx_1\,dx_2 \cdots dx_n.$$

Thm: The typical set $A_\epsilon^{(n)}$ has the following properties:

1. $\Pr\left(A_\epsilon^{(n)}\right) > 1 - \epsilon$ for $n$ sufficiently large.
2. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \le 2^{n(h(X)+\epsilon)}$ for all $n$.
3. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \ge (1-\epsilon)\,2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large.
Thm: The set $A_\epsilon^{(n)}$ is the smallest-volume set with probability $\ge 1 - \epsilon$, to first order in the exponent.

The volume of the smallest set that contains most of the probability is approximately $2^{nh}$. This is an $n$-dimensional volume, so the corresponding side length is $(2^{nh})^{1/n} = 2^h$. The differential entropy is therefore the logarithm of the equivalent side length of the smallest set that contains most of the probability: low entropy implies that the r.v. is confined to a small effective volume, and high entropy indicates that the r.v. is widely dispersed.
Relation of Differential Entropy to Discrete Entropy
Suppose we divide the range of $X$ into bins of length $\Delta$. Let's assume that the density $f(x)$ is continuous within the bins.

[Figure: quantization of a continuous r.v. — the density $f(x)$ plotted against $x$, partitioned into bins of width $\Delta$.]
By the mean value theorem, there is a value $x_i$ within each bin such that

$$f(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx.$$

Consider the quantized r.v. $X^\Delta$, which is defined by $X^\Delta = x_i$ if $i\Delta \le X < (i+1)\Delta$. Then the probability that $X^\Delta = x_i$ is

$$p_i = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx = f(x_i)\Delta.$$
The entropy of the quantized version is

$$H(X^\Delta) = -\sum_i p_i \log p_i = -\sum_i f(x_i)\Delta \log\left(f(x_i)\Delta\right) = -\sum_i \Delta f(x_i)\log f(x_i) - \log\Delta,$$

since $\sum_i f(x_i)\Delta = \int f(x)\,dx = 1$.
If $f(x)\log f(x)$ is Riemann integrable, then

$$-\sum_i \Delta f(x_i)\log f(x_i) \to -\int f(x)\log f(x)\,dx \quad \text{as } \Delta \to 0.$$

This proves the following theorem.

Thm: If the density $f(x)$ of the r.v. $X$ is Riemann integrable, then

$$H(X^\Delta) + \log\Delta \to h(f) = h(X), \quad \text{as } \Delta \to 0.$$

Thus the entropy of an $n$-bit quantization of a continuous r.v. $X$ is approximately $h(X) + n$, since $\Delta = 2^{-n}$ for an $n$-bit uniform quantizer and hence $-\log\Delta = n$.
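The theorem can be illustrated numerically; in this sketch (the standard-normal choice and the bin range are illustrative) $H(X^\Delta) + \log_2\Delta$ is compared with $h(X) \approx 2.047$ bits:

```python
# Sketch: H(X^Δ) + log2 Δ → h(X) as Δ → 0, for X ~ N(0,1),
# where h(X) = ½ log2(2πe) ≈ 2.047 bits.
import numpy as np
from scipy.stats import norm

h_x = 0.5 * np.log2(2 * np.pi * np.e)

for delta in (1.0, 0.1, 0.01):
    edges = np.arange(-10.0, 10.0 + delta, delta)   # bins covering the mass
    p = np.diff(norm.cdf(edges))                    # p_i = ∫ f over bin i
    p = p[p > 0]
    H = -(p * np.log2(p)).sum()                     # H(X^Δ)
    print(delta, H + np.log2(delta), "->", h_x)
```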
Joint and Conditional Differential Entropy:

$$h(X_1, X_2, \ldots, X_n) = -\int f(x^n)\log f(x^n)\,dx^n$$
$$h(X|Y) = -\int f(x,y)\log f(x|y)\,dx\,dy$$
$$h(X|Y) = h(X,Y) - h(Y)$$
Theorem (Entropy of a multivariate normal distribution)

Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with mean $\mu$ and covariance matrix $K$. Then

$$h(X_1, X_2, \ldots, X_n) = h(N(\mu, K)) = \frac{1}{2}\log\left((2\pi e)^n |K|\right) \ \text{bits},$$

where $|K|$ denotes the determinant of $K$.
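A quick check of the formula for an illustrative $2\times 2$ covariance matrix, cross-validated against SciPy's built-in entropy (which returns nats):

```python
# Sketch: h(N(μ, K)) = ½ log2((2πe)^n |K|), for an illustrative 2x2 K.
import numpy as np
from scipy.stats import multivariate_normal

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
n = K.shape[0]

h_bits = 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))
# SciPy's entropy() returns nats; convert to bits for comparison.
h_scipy = multivariate_normal(mean=np.zeros(n), cov=K).entropy() / np.log(2)
print(h_bits, h_scipy)   # identical
```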
pf: Let $(X_1, X_2, \ldots, X_n) \sim N(\mu, K)$ with density

$$f(\tilde{x}) = \frac{1}{(\sqrt{2\pi})^n |K|^{1/2}} \exp\left(-\frac{1}{2}(\tilde{x}-\mu)^T K^{-1} (\tilde{x}-\mu)\right).$$

Then

$$h(f) = -\int f(\tilde{x})\left[-\frac{1}{2}(\tilde{x}-\mu)^T K^{-1}(\tilde{x}-\mu) - \ln\left((\sqrt{2\pi})^n |K|^{1/2}\right)\right] d\tilde{x}$$
$$= \frac{1}{2} E\left[\sum_{i,j} (X_i - \mu_i)\left(K^{-1}\right)_{ij}(X_j - \mu_j)\right] + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\sum_{i,j} E\left[(X_i - \mu_i)(X_j - \mu_j)\right]\left(K^{-1}\right)_{ij} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\sum_j \sum_i E\left[(X_j - \mu_j)(X_i - \mu_i)\right]\left(K^{-1}\right)_{ij} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\sum_j \left(K K^{-1}\right)_{jj} + \frac{1}{2}\ln\left((2\pi)^n |K|\right) = \frac{1}{2}\sum_j I_{jj} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{n}{2} + \frac{1}{2}\ln\left((2\pi)^n |K|\right) = \frac{1}{2}\ln e^n + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$$
$$= \frac{1}{2}\ln\left((2\pi e)^n |K|\right) \ \text{nats} = \frac{1}{2}\log\left((2\pi e)^n |K|\right) \ \text{bits.}$$
Relative Entropy and Mutual Information

$$D(f\|g) = \int f \log\frac{f}{g}$$
$$I(X;Y) = \int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy$$
$$I(X;Y) = h(X) - h(X|Y) = h(Y) - h(Y|X) = h(X) + h(Y) - h(X,Y)$$
$$I(X;Y) = D\left(f(x,y)\,\|\,f(x)f(y)\right)$$
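For a bivariate Gaussian with correlation $\rho$, the identities above give the closed form $I(X;Y) = -\frac{1}{2}\log(1-\rho^2)$; a short sketch (the value $\rho = 0.9$ is illustrative):

```python
# Sketch: for a bivariate Gaussian with correlation ρ,
# I(X;Y) = h(X) + h(Y) - h(X,Y) = -½ log2(1 - ρ²) bits.
import numpy as np

rho = 0.9
K = np.array([[1.0, rho],
              [rho, 1.0]])

h_marginal = 0.5 * np.log2(2 * np.pi * np.e)                    # h(X) = h(Y)
h_joint = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
print(2 * h_marginal - h_joint, -0.5 * np.log2(1 - rho**2))     # both ≈ 1.198
```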
Remark:

The mutual information between two continuous r.v.s is the limit of the mutual information between their quantized versions:

$$I(X^\Delta; Y^\Delta) = H(X^\Delta) - H(X^\Delta | Y^\Delta) \approx \left(h(X) - \log\Delta\right) - \left(h(X|Y) - \log\Delta\right) = I(X;Y).$$
Properties of $D(f\|g)$ and $I(X;Y)$

(a) $D(f\|g) \ge 0$.

pf: $-D(f\|g) = \int_S f \log\frac{g}{f} \le \log\int_S f\,\frac{g}{f}$ (Jensen's inequality) $= \log\int_S g \le \log 1 = 0$.

(b) $I(X;Y) \ge 0$; $h(X|Y) \le h(X)$.
Theorems:

$$h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n h(X_i \mid X_1, \ldots, X_{i-1}) \le \sum_{i=1}^n h(X_i)$$

Translation does not change the differential entropy:

$$h(X + c) = h(X)$$

$$h(aX) = h(X) + \log|a|$$
pf: Let $Y = aX$. Then $f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)$, and

$$h(aX) = -\int f_Y(y)\log f_Y(y)\,dy = -\int \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)\log\left(\frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)\right) dy$$
$$= -\int f_X(x)\log f_X(x)\,dx + \log|a| = h(X) + \log|a|.$$

Corollary: $h(AX) = h(X) + \log|\det(A)|$.
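A numerical illustration of the scaling property, using a Gaussian so that both sides have closed forms (the choice $a = 3$ is illustrative):

```python
# Sketch: h(aX) = h(X) + log2|a|. With X ~ N(0,1), aX ~ N(0, a²).
import numpy as np

a = 3.0
h_x = 0.5 * np.log2(2 * np.pi * np.e)            # h(N(0,1))
h_ax = 0.5 * np.log2(2 * np.pi * np.e * a**2)    # h(N(0, a²))
print(h_ax, h_x + np.log2(abs(a)))               # both ≈ 3.632 bits
```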
Theorem: The multivariate normal distribution maximizes the entropy over all distributions with the same variance.

Let the random vector $X \in \mathbb{R}^n$ have zero mean and covariance $K = E[XX^T]$ (i.e., $K_{ij} = E[X_i X_j]$, $1 \le i, j \le n$). Then

$$h(X) \le \frac{1}{2}\log\left((2\pi e)^n |K|\right),$$

with equality iff $X \sim N(0, K)$.
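An illustrative comparison in one dimension (the uniform and Laplace alternatives are my choices, not from the slides): at unit variance, the Gaussian entropy exceeds that of the other densities.

```python
# Sketch: at fixed variance 1, the Gaussian has the largest
# differential entropy among these densities.
import numpy as np

h_gauss = 0.5 * np.log2(2 * np.pi * np.e)   # ≈ 2.047 bits
h_uniform = np.log2(np.sqrt(12.0))          # U[-√3, √3]: h = log2(2√3) ≈ 1.792
h_laplace = np.log2(2 * np.e / np.sqrt(2))  # Laplace, b = 1/√2 (var 2b² = 1): ≈ 1.943
print(h_gauss, h_uniform, h_laplace)        # Gaussian is largest
```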
Pf: Let $g(\tilde{x})$ be any density satisfying $\int g(\tilde{x})\,x_i x_j\,d\tilde{x} = K_{ij}$ for all $i, j$. Let $\phi_K$ be the density of a $N(0, K)$ vector. Note that $\log\phi_K(\tilde{x})$ is a quadratic form and $\int \phi_K(\tilde{x})\,x_i x_j\,d\tilde{x} = K_{ij}$. Then

$$0 \le D(g\|\phi_K) = \int g\log\frac{g}{\phi_K} = -h(g) - \int g\log\phi_K = -h(g) - \int \phi_K\log\phi_K = -h(g) + h(\phi_K),$$
where the substitution $\int g\log\phi_K = \int \phi_K\log\phi_K$ follows from the fact that $g$ and $\phi_K$ yield the same moments of the quadratic form $\log\phi_K(\tilde{x})$. Hence the Gaussian distribution maximizes the entropy over all distributions with the same variance.
Let $X$ be a random variable with differential entropy $h(X)$. Let $\hat{X}$ be an estimate of $X$, and let $E(X - \hat{X})^2$ be the expected prediction error. Let $h(X)$ be in nats.

Theorem: For any r.v. $X$ and estimator $\hat{X}$,

$$E(X - \hat{X})^2 \ge \frac{1}{2\pi e}\, e^{2h(X)},$$

with equality iff $X$ is Gaussian and $\hat{X}$ is the mean of $X$.
pf: Let $\hat{X}$ be any estimator of $X$. Then

$$E(X - \hat{X})^2 \ge \min_{\hat{X}} E(X - \hat{X})^2 \qquad (1)$$
$$= E\left(X - E(X)\right)^2 \quad \text{[the mean of $X$ is the best estimator for $X$]}$$
$$= \mathrm{var}(X) \ge \frac{1}{2\pi e}\, e^{2h(X)} \qquad (2)$$

[the Gaussian distribution has the maximum entropy for a given variance, i.e., $h(X) \le \frac{1}{2}\ln 2\pi e\sigma^2$].
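A quick check of the bound on a non-Gaussian example (the choice $X \sim U[0,1]$ is illustrative): here $h(X) = \ln 1 = 0$ nats, so the bound is $1/2\pi e$, strictly below $\mathrm{var}(X) = 1/12$.

```python
# Sketch: var(X) ≥ (1/2πe) e^{2h(X)} for a non-Gaussian X.
# X ~ U[0,1]: h(X) = 0 nats, var(X) = 1/12.
import numpy as np

var_x = 1 / 12                                  # ≈ 0.0833
bound = np.exp(2 * 0.0) / (2 * np.pi * np.e)    # ≈ 0.0585
print(var_x, bound, var_x >= bound)             # strict: X is not Gaussian
```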
We have equality in (1) only if $\hat{X}$ is the best estimator (i.e., $\hat{X}$ is the mean of $X$) and equality in (2) only if $X$ is Gaussian.

Corollary: Given side information $Y$ and estimator $\hat{X}(Y)$, it follows that

$$E\left(X - \hat{X}(Y)\right)^2 \ge \frac{1}{2\pi e}\, e^{2h(X|Y)}.$$

[cf. Fano's inequality]