Information Geometry of MaxEnt Principle
Shun-ichi Amari
RIKEN Brain Science Institute
MaxEnt 07’
Information Geometry
Related fields: Systems Theory, Information Theory, Statistics, Neural Networks, Combinatorics, Physics, Information Sciences, Math., AI
Riemannian Manifold, Dual Affine Connections
Manifold of Probability Distributions
Information Geometry?
$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$
$S = \{p(x; \theta)\}$, $\theta = (\mu, \sigma)$
Riemannian metric
Dual affine connections
Manifold of Probability Distributions
$x = 1, 2, 3$, $S = \{p(x)\}$
$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
(Figure: the simplex with axes $p_1$, $p_2$, $p_3$, a point $p$, and a submanifold $M = \{p(x; \theta)\}$.)
Manifold of Probability Distributions
$M = \{p(x)\}$, $x = 1, 2, 3$
$P\{x = i\} = p_i$, $\sum_{i=1}^{3} p_i = 1$, $p_3 = 1 - p_1 - p_2$
Coordinate systems on $M$:
$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
$\xi_i = \sqrt{p_i}$, $\xi_1^2 + \xi_2^2 + \xi_3^2 = 1$
$\theta_1 = \log \frac{p_1}{p_3}$, $\theta_2 = \log \frac{p_2}{p_3}$
Invariance
$S = \{p(x, \theta)\}$
1. Invariant under reparameterization of $p(x, \theta)$
2. Invariant under different representation: $y = y(x)$, $\bar{p}(y, \theta)$
$\int |p(x, \theta_1) - p(x, \theta_2)|^2 \, dx \neq \int |\bar{p}(y, \theta_1) - \bar{p}(y, \theta_2)|^2 \, dy$
2 2 i iD
Two Structures
Riemannian metric: Fisher information
$ds^2 = \sum g_{ij} \, d\theta^i d\theta^j$
Affine connection: geodesic, straight line
How curved is the manifold?
Riemannian Structure
$ds^2 = \sum g_{ij}(\theta) \, d\theta^i d\theta^j = d\theta^T G(\theta) \, d\theta$
$G(\theta) = (g_{ij}(\theta))$
Euclidean: $G = E$ (the identity matrix)
Kullback-Leibler Divergence
$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$
$D[p(x) : q(x)] \ge 0$, with $= 0$ iff $p(x) = q(x)$
$D[p : q] \neq D[q : p]$
quasi-distance: no triangular inequality --- behaves like the square of a distance
Pythagorean Theorem
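The properties listed on this slide can be checked numerically. The following is a minimal sketch for discrete distributions; the function name `kl_divergence` and the two example distributions are illustrative choices, not taken from the slides.

```python
import numpy as np

def kl_divergence(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Two illustrative distributions on three points (assumed values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

d_pq = kl_divergence(p, q)  # D[p : q]
d_qp = kl_divergence(q, p)  # D[q : p], generally a different number
d_pp = kl_divergence(p, p)  # zero, since the arguments coincide
print(d_pq, d_qp, d_pp)
```

The two directions give different values, which is why the divergence is only a quasi-distance.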
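KL-divergence and Riemannian Structure
Relation between the nearby points $p(x, \theta)$ and $p(x, \theta + d\theta)$:
$D[p(x, \theta) : p(x, \theta + d\theta)] = \frac{1}{2} \, d\theta^T G(\theta) \, d\theta$
$g_{ij}(\theta) = E\left[\frac{\partial \log p(x, \theta)}{\partial \theta_i} \frac{\partial \log p(x, \theta)}{\partial \theta_j}\right]$
$G(\theta) = (g_{ij}(\theta))$: Fisher information matrix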
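The local relation between the KL-divergence and the Fisher information can be illustrated on the simplest one-parameter family. The sketch below assumes a Bernoulli distribution with parameter $\theta = P\{x = 1\}$, for which the Fisher information is $1 / (\theta (1 - \theta))$; the family and the step size are illustrative assumptions, not part of the slides.

```python
import numpy as np

def bernoulli_kl(a, b):
    """KL-divergence D[Bernoulli(a) : Bernoulli(b)]."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

def bernoulli_fisher(theta):
    """Fisher information of Bernoulli(theta) in the parameter theta."""
    return 1.0 / (theta * (1.0 - theta))

theta, d_theta = 0.3, 1e-3

# Divergence between neighbouring points versus the quadratic form (1/2) g dtheta^2.
kl_local = bernoulli_kl(theta, theta + d_theta)
quadratic = 0.5 * bernoulli_fisher(theta) * d_theta**2
relative_error = abs(kl_local - quadratic) / quadratic
print(kl_local, quadratic, relative_error)
```

The two numbers agree up to a correction of order $d\theta$, which shrinks as the step is reduced.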
Affine Connection
covariant derivative: $\nabla_X Y$
geodesic $X = X(t)$: $\nabla_{\dot{X}} \dot{X} = c(t) \dot{X}$ (straight line)
minimal distance: $s = \int \sqrt{g_{ij} \, d\theta^i d\theta^j}$
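α-connection
$\nabla^{(\alpha)} = \frac{1 + \alpha}{2} \nabla + \frac{1 - \alpha}{2} \nabla^*$
$\nabla^{(0)}$: Levi-Civita (Riemannian) connection
$\nabla^{(1)} = \nabla$: exponential connection
$\nabla^{(-1)} = \nabla^*$: mixture connection
Related quantities: Rényi-Tsallis entropy, KL-divergence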
Affine Connections
e-geodesic: $\log r(x, t) = t \log p(x) + (1 - t) \log q(x) + c(t)$
m-geodesic: $r(x, t) = t \, p(x) + (1 - t) \, q(x)$
(Figure: the e-geodesic and the m-geodesic connecting $q(x)$ and $p(x)$, labeled $(\nabla, \nabla^*)$.)
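The two geodesics can be computed directly for discrete distributions. This is a minimal sketch; the function names and the example endpoints are illustrative assumptions. The normalizing term $c(t)$ of the e-geodesic is obtained by renormalizing the geometric mixture.

```python
import numpy as np

def m_geodesic(p, q, t):
    """Mixture geodesic: r(x, t) = t p(x) + (1 - t) q(x)."""
    return t * p + (1 - t) * q

def e_geodesic(p, q, t):
    """Exponential geodesic: log r = t log p + (1 - t) log q + c(t)."""
    r = np.exp(t * np.log(p) + (1 - t) * np.log(q))
    return r / r.sum()  # dividing by the sum plays the role of exp(c(t))

# Illustrative endpoints (assumed values).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

mid_m = m_geodesic(p, q, 0.5)
mid_e = e_geodesic(p, q, 0.5)
print(mid_m, mid_e)
```

Both curves start at $q$ for $t = 0$ and end at $p$ for $t = 1$, but their midpoints differ, which reflects the two different notions of straightness.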
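Duality
$\langle X, Y \rangle = g_{ij} X^i Y^j$
Riemannian geometry: $\langle X, Y \rangle = \langle \Pi X, \Pi Y \rangle$ under parallel transport $\Pi$
Dual connections: $\langle X, Y \rangle = \langle \Pi X, \Pi^* Y \rangle$
$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$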
Independent Distributions
$S = \{p(x_1, x_2)\}$, $x_1, x_2 = 0, 1$
$M = \{q(x_1) \, q(x_2)\}$
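Dually flat manifold
$\theta$-coordinates, $\eta$-coordinates
potential functions $\psi(\theta)$, $\varphi(\eta)$
$\psi(\theta) + \varphi(\eta) - \sum \theta_i \eta_i = 0$
$g_{ij} = \partial_i \partial_j \psi(\theta)$, $g^{ij} = \partial^i \partial^j \varphi(\eta)$
$\eta_i = \partial_i \psi(\theta)$, $\theta_i = \partial^i \varphi(\eta)$, where $\partial_i = \partial / \partial \theta_i$ and $\partial^i = \partial / \partial \eta_i$
$p(x, \theta) = \exp\{\sum \theta_i x_i - \psi(\theta)\}$: exponential family
$\psi(\theta)$: cumulant generating function
$\varphi(\eta)$: negative entropy
$S = \{p(x), x \text{ discrete}\}$ --- characterized by flatness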
Dually Flat Manifold
1. Potential Functions --- convex (Legendre transformation)
2. Divergence $D[p : q]$
3. Pythagoras Theorem: $D[p : q] + D[q : r] = D[p : r]$
4. Projection Theorem
(Figure: triangle with vertices $p$, $q$, $r$.)
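KL-divergence
Projection Theorem
m-geodesic projection: $q = \arg\min_{s \in M} D[p : s]$
e-geodesic projection: $q = \arg\min_{s \in M} D[s : p]$
(Figure: a point $p$ of $S$ projected onto the submanifold $M \subset S$ at $q$; $s$ is a generic point of $M$.)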
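The projection and Pythagorean theorems can be checked on the manifold of independent distributions $M = \{q(x_1) q(x_2)\}$ introduced earlier. The sketch below relies on the standard fact that the minimizer of $D[p : s]$ over product distributions $s$ is the product of the marginals of $p$; the joint table `p` and the product distribution `r` are illustrative values, not taken from the slides.

```python
import numpy as np

def kl(p, q):
    """D[p : q] for strictly positive discrete distributions of equal shape."""
    return float(np.sum(p * np.log(p / q)))

# Joint distribution of two binary variables (assumed values); rows index x1.
p = np.array([[0.4, 0.1],
              [0.2, 0.3]])

# m-projection of p onto M: the product of its marginals.
q = np.outer(p.sum(axis=1), p.sum(axis=0))

# Any other point of M, here an arbitrary product distribution.
r = np.outer([0.7, 0.3], [0.5, 0.5])

lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)
print(lhs, rhs)  # D[p : r] and D[p : q] + D[q : r]
```

Since $D[q : r] \ge 0$, the relation also shows that no product distribution is closer to $p$ than $q$ in the sense of $D[p : \cdot]$.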
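Applications to Statistics
exponential family: $p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$
curved exponential family: $p(x, u) = \exp\{\theta(u) \cdot x - \psi(\theta(u))\}$
observations $x_1, x_2, \ldots, x_n$ from $p(x, u)$
$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k$, $\hat{\eta} = \bar{x}$
estimation: $\hat{u}(x_1, \ldots, x_n)$
testing: $H_0 : u = u_0$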
High-Order Asymptotics
$p(x, \theta(u))$: observations $x_1, \ldots, x_n$
$\hat{u} = \hat{u}(x_1, \ldots, x_n)$
$e = E[(\hat{u} - u)(\hat{u} - u)^T]$
$e = \frac{1}{n} G_1 + \frac{1}{n^2} G_2$
$G_1 \ge G^{-1}$: Cramér-Rao
$G_2 = (H_M^{e})^2 + (H_A^{m})^2 + (\Gamma^{m})^2$
Other Applications
• Systems theory
• Information theory
• Neuromanifold
• Belief propagation
• Boosting (Murata-Eguchi)
• Higher-order correlations
• Mathematics --- Orlicz space (Pistone, Grasselli)
• Physics ---
Amari-Nagaoka, Methods of Information Geometry, AMS & Oxford U., 2000
Amari, Differential-Geometrical Methods in Statistics, Springer, 1985
Kass and Vos, Geometrical Foundations of Asymptotic Inference, Wiley, 1997
Murray and Rice, Differential Geometry and Statistics, Chapman, 1993
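Exponential Family: dually flat
$S = \{p(x, \theta)\}$
$p(x, \theta) = \exp\{\sum_i \theta_i x_i - \psi(\theta)\}$
more generally, $p(x, \theta) = \exp\{\sum_i \theta_i k_i(x) - \psi(\theta)\}$
$\eta_i = E[x_i]$
Two coordinate systems
$\theta = (\theta_1, \ldots, \theta_n)$: e-flat
$\eta = (\eta_1, \ldots, \eta_n)$: m-flat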
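Exponential Family
example (1): discrete distributions
$X = \{x_0, x_1, \ldots, x_n\}$, $p_i = \mathrm{Prob}\{x = x_i\}$
$p(x, \theta) = \exp\{\sum_{i=1}^{n} \theta_i \delta_i(x) - \psi(\theta)\}$, where $\delta_i(x) = 1$ if $x = x_i$ and $0$ otherwise
$\theta_i = \log \frac{p_i}{p_0}$, $\eta_i = E[\delta_i(x)] = p_i$
$\psi(\theta) = -\log p_0 = \log\{1 + \sum_{i=1}^{n} \exp \theta_i\}$
$\varphi(\eta) = \sum_{i=0}^{n} p_i \log p_i$: negative entropy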
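The two coordinate systems and the two potentials of the discrete example can be verified numerically. The sketch below checks the Legendre relation $\psi(\theta) + \varphi(\eta) - \sum \theta_i \eta_i = 0$ and the relation $\eta_i = \partial \psi / \partial \theta_i$ by central differences; the helper names and the example distribution are illustrative assumptions.

```python
import numpy as np

def to_theta(p):
    """Natural coordinates theta_i = log(p_i / p_0), i = 1..n."""
    return np.log(p[1:] / p[0])

def psi(theta):
    """Cumulant generating function psi(theta) = log(1 + sum_i exp(theta_i))."""
    return float(np.log1p(np.sum(np.exp(theta))))

def phi(p):
    """Negative entropy phi = sum_{i=0}^{n} p_i log p_i."""
    return float(np.sum(p * np.log(p)))

# Illustrative distribution on four points (assumed values).
p = np.array([0.1, 0.2, 0.3, 0.4])
theta = to_theta(p)
eta = p[1:]  # expectation coordinates eta_i = p_i

# Legendre relation between the two potentials; should vanish.
legendre_gap = psi(theta) + phi(p) - float(theta @ eta)

# eta should equal the gradient of psi; approximate it by central differences.
eps = 1e-6
grad_psi = np.array([
    (psi(theta + eps * e) - psi(theta - eps * e)) / (2 * eps)
    for e in np.eye(len(theta))
])
print(legendre_gap, grad_psi, eta)
```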
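example (2): Gaussian distributions
$p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$
$= \exp\left\{\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma)\right\}$
$= \exp\{\theta_1 x + \theta_2 x^2 - \psi(\theta)\}$
$\theta_1 = \frac{\mu}{\sigma^2}$, $\theta_2 = -\frac{1}{2\sigma^2}$
$\eta_1 = E[x] = \mu$, $\eta_2 = E[x^2] = \mu^2 + \sigma^2$
example (3): AR model
$x_t = \sum_i a_i x_{t-i} + \varepsilon_t$, $x = (\ldots, x_0, x_1, x_2, \ldots)$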
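The Gaussian coordinates can be checked by writing the density in its exponential-family form and integrating numerically. This is a rough sketch: the helper names, the values of $\mu$ and $\sigma$, and the integration grid are illustrative assumptions, and the integrals are approximated by a plain Riemann sum.

```python
import numpy as np

def gaussian_theta(mu, sigma):
    """Natural coordinates (theta_1, theta_2) = (mu / sigma^2, -1 / (2 sigma^2))."""
    return np.array([mu / sigma**2, -1.0 / (2.0 * sigma**2)])

def gaussian_eta(mu, sigma):
    """Expectation coordinates (eta_1, eta_2) = (E[x], E[x^2])."""
    return np.array([mu, mu**2 + sigma**2])

def gaussian_psi(mu, sigma):
    """psi = mu^2 / (2 sigma^2) + log(sqrt(2 pi) sigma)."""
    return mu**2 / (2.0 * sigma**2) + np.log(np.sqrt(2.0 * np.pi) * sigma)

mu, sigma = 1.0, 2.0
theta = gaussian_theta(mu, sigma)

# Density in exponential-family form on a wide grid.
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200001)
dx = x[1] - x[0]
density = np.exp(theta[0] * x + theta[1] * x**2 - gaussian_psi(mu, sigma))

total = np.sum(density) * dx  # should be close to 1
moments = np.array([np.sum(x * density) * dx, np.sum(x**2 * density) * dx])
print(total, moments, gaussian_eta(mu, sigma))
```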
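Legendre transformation
$\eta_i = \frac{\partial}{\partial \theta_i} \psi(\theta)$, $\theta_i = \frac{\partial}{\partial \eta_i} \varphi(\eta)$
$\varphi(\eta) = -H$, $H = \min_\theta \{\psi(\theta) - \sum \theta_i \eta_i\}$: entropy
$\psi(\theta)$: cumulant generating function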
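Divergence
$D[P : Q] = \varphi(\eta_P) + \psi(\theta_Q) - \eta_P \cdot \theta_Q$
$KL[P : Q] = \sum_x p(x) \log \frac{p(x)}{q(x)}$
$D[P : P + dP] = \frac{1}{2} \sum g_{ij} \, d\theta_i d\theta_j$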
Pythagorean Theorem
$D[P : Q] + D[Q : R] = D[P : R]$
(Figure: points $P$, $Q$, $R$, with labels m-flat and e-flat.)
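Divergence and Entropy
Max entropy: $\theta_i = 0$
$p(x, 0) = \exp\{-\psi(0)\}$: uniform
$D[p(x, \theta) : p(x, 0)] = -H(\theta) + \psi(0)$
equi-divergence: equi-entropy
(Figure: equi-divergence surfaces around $p(x, 0)$.)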
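Dual Foliation
$\theta = (\theta_1; \theta_2)$, $\eta = (\eta_1; \eta_2)$
$M(c_1)$: $\eta_1 = c_1$ fixed, $\theta_2$ free
$E(d_2)$: $\theta_2 = d_2$ fixed, $\eta_1$ free
$M(c_1) \perp E(d_2)$
$E(0)$: $\theta_2 = 0$; Pythagorean theorem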
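Maximum Entropy
$\max H$ subject to $E[a_i(x)] = c_i$, $i = 1, \ldots, k$
$\theta = (\theta_1; \theta_2)$, $\eta = (\eta_1; \eta_2)$
$M = \{P : E[a_i(x)] = c_i\}$, $P_0$: uniform distribution
$D[P : P_0] = c - H(P)$, so $\max H$ is equivalent to $\min_{P \in M} D[P : P_0]$
$\hat{P} = \arg\min_{P \in M} D[P : P_0]$
$D[P : P_0] = D[P : \hat{P}] + D[\hat{P} : P_0]$
(Figure: $P_0$, its projection $\hat{P}$ onto $M$, and a generic point $P$ of $M$.)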
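The maximum entropy projection can be illustrated with a single linear constraint. The sketch below assumes a six-sided die with the mean constrained to 4.5, so that the solution has the exponential form $p(x) \propto \exp(\theta x)$ and $\theta$ can be found by bisection. The constraint, the bracketing interval, and the comparison distribution `p_other` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def entropy(p):
    return float(-np.sum(p * np.log(p)))

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def maxent_given_mean(values, target_mean, lo=-10.0, hi=10.0, iters=100):
    """Bisection on theta for p(x) proportional to exp(theta * x)."""
    def dist(theta):
        w = np.exp(theta * values)
        return w / w.sum()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The mean is increasing in theta, so move the bracket accordingly.
        if dist(mid) @ values < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

values = np.arange(1.0, 7.0)       # faces of the die
p0 = np.full(6, 1.0 / 6.0)         # uniform distribution P0
p_hat = maxent_given_mean(values, 4.5)

# Another member of M, i.e. another distribution with mean 4.5.
p_other = np.array([0.05, 0.05, 0.15, 0.15, 0.30, 0.30])

# D[P : P0] - (D[P : P_hat] + D[P_hat : P0]) should vanish.
pythagoras_gap = kl(p_other, p0) - (kl(p_other, p_hat) + kl(p_hat, p0))
# D[P_hat : P0] should equal log 6 - H(P_hat).
entropy_identity_gap = kl(p_hat, p0) - (np.log(6.0) - entropy(p_hat))
print(p_hat, pythagoras_gap, entropy_identity_gap)
```

Because $D[P : \hat{P}] \ge 0$, the Pythagorean relation implies that $\hat{P}$ has the largest entropy among all distributions satisfying the constraint.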
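Simple Example: independence
$x = (x_1, x_2)$, $x_1, x_2 \in \{0, 1\}$
$p(x_1, x_2) = \exp\{\theta_1 x_1 + \theta_2 x_2 + \theta_{12} x_1 x_2 - \psi\}$
$\theta = (\theta_1, \theta_2; \theta_{12})$, $\eta = (\eta_1, \eta_2; \eta_{12})$
$\eta_i = E[x_i]$, $\eta_{12} = E[x_1 x_2]$
$\max H$ subject to $E[x_i] = c_i$, $i = 1, 2$
solution: $\theta_{12} = 0$, the intersection of $M(c)$ and $E(0)$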
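Simple example: Gaussian
$p(x) = \exp\{\sum_i \theta_i x^i - \psi\}$
$\theta = (\theta_1, \theta_2; \theta_3, \ldots)$, $\eta = (\eta_1, \eta_2; \eta_3, \ldots)$
$\eta_1 = E[x]$, $\eta_2 = E[x^2]$
$p(x) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{x^2}{2}\right\}$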
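Time Series
$x = (\ldots, x_0, x_1, x_2, \ldots)$
AR: $x_t = \sum_i a_i x_{t-i} + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$
MA: $x_t = \sum_i b_i \varepsilon_{t-i}$
$x_t = \sum_{i=0}^{\infty} h_i \varepsilon_{t-i}$, i.e. $x_t = h * \varepsilon_t$
$S(\omega) = |H(e^{i\omega})|^2$, $H(z) = \sum_i h_i z^{-i}$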
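Geometry
$S = \{S(\omega; \theta)\}$: dually flat
$g_{ij}(\theta) = \frac{1}{2\pi} \int \partial_i \log S(\omega; \theta) \, \partial_j \log S(\omega; \theta) \, d\omega$
$H = \frac{1}{4\pi} \int \log S(\omega) \, d\omega$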
Potentials
1 11 0
0 0
12
21
22
1: 1 log
2
const
H
H
S SD S S d
S S
H
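Stochastic Realization
$c_i = E[x_t x_{t-i}]$: autocorrelation
$\eta = (\eta_1; \eta_2)$, $\eta_1 = (c_0, c_1, \ldots, c_k)$
$\theta = (\theta_1; \theta_2)$
$S \in M$, $\max H$
$S \in E$: $S \in AR(k)$ (exp. fam.)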
Dual Problem
$\tilde{c}_i$: inverse autocorrelation, $i = 1, \ldots, k$
minimize $H$
MA model
inverse autocorrelation: $\tilde{c}_t = \int \frac{1}{S(\omega)} \cos \omega t \, d\omega$
1 12 2
1
1
entropy :
geometry
-divergence : 1
max entropy, ,
family of probability distributions
T
T
D P Q p x q x dx
E E D
p c c D
x xx
x x x
Rényi-Tsallis entropy
Manifold of positive measures $m(x)$
flat 1
21
{1 ( ) }2
1
2
H p x
q
Entropy (alpha-entropy) is a fundamental quantity
---- It arises from a fundamental geometrical structure.
KL-divergence is derived therefrom.
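How the KL-divergence is obtained from the α-structure can be illustrated numerically. The sketch below assumes the standard form of Amari's α-divergence, $D_\alpha[p : q] = \frac{4}{1 - \alpha^2} \{1 - \sum_x p(x)^{(1-\alpha)/2} q(x)^{(1+\alpha)/2}\}$, which is not legible in the slides as extracted and is therefore an assumption here, as are the example distributions.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def alpha_divergence(p, q, alpha):
    """Assumed standard form of the alpha-divergence, valid for alpha != +/-1."""
    overlap = np.sum(p ** ((1 - alpha) / 2) * q ** ((1 + alpha) / 2))
    return float(4.0 / (1.0 - alpha**2) * (1.0 - overlap))

# Illustrative distributions (assumed values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

near_minus_one = alpha_divergence(p, q, -1 + 1e-6)  # approaches KL[p : q]
near_plus_one = alpha_divergence(p, q, 1 - 1e-6)    # approaches KL[q : p]
symmetric_gap = alpha_divergence(p, q, 0.0) - alpha_divergence(q, p, 0.0)
print(near_minus_one, kl(p, q), near_plus_one, kl(q, p), symmetric_gap)
```

Under this convention the limits $\alpha \to -1$ and $\alpha \to +1$ recover the two directions of the KL-divergence, while $\alpha = 0$ gives a symmetric divergence.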