
Information Geometry of MaxEnt Principle. Shun-ichi Amari, RIKEN Brain Science Institute. MaxEnt '07


Page 1

Information Geometry of MaxEnt Principle

Shun-ichi Amari
RIKEN Brain Science Institute

MaxEnt '07

Page 2

Information Geometry

Systems Theory, Information Theory

Statistics, Neural Networks

Combinatorics, Physics

Information Sciences

Math., AI

Riemannian Manifold

Dual Affine Connections

Manifold of Probability Distributions

Page 3

Information Geometry?

$S = \{p(x; \mu, \sigma)\}$, $p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$

$\{p(x)\}$

$S = \{p(x; \theta)\}$, $\theta = (\mu, \sigma)$

Riemannian metric

Dual affine connections

Page 4

Manifold of Probability Distributions

$x = 1, 2, 3$; $\{p(x)\}$

$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$

[Figure: probability simplex with axes $p_1$, $p_2$, $p_3$ and a point $p$]

$M = \{p(x; \theta)\}$

Page 5

Manifold of Probability Distributions

$M = \{p(x)\}$, $x = 1, 2, 3$

$P(x) = \sum_{i=1}^{3} p_i \delta_i(x)$, $p_1 + p_2 + p_3 = 1$

[Figure: $M$ drawn in the coordinates $p_1$, $p_2$, $p_3$]

Coordinate systems on $M$:

$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$

$\xi_i = \sqrt{p_i}$, $\xi_1^2 + \xi_2^2 + \xi_3^2 = 1$

$\theta_1 = \log\frac{p_1}{p_3}$, $\theta_2 = \log\frac{p_2}{p_3}$

Page 6

Invariance

$S = \{p(x, \theta)\}$

1. Invariant under reparameterization

$p(x, \theta)$, $D^2 = \sum_i (\Delta\theta_i)^2$

2. Invariant under different representation

$y = y(x)$, $p(y, \theta)$

$\int |p(x, \theta_1) - p(x, \theta_2)|^2 \, dx \ne \int |p(y, \theta_1) - p(y, \theta_2)|^2 \, dy$

Page 7

Two Structures

Riemannian metric: Fisher information

$ds^2 = \sum g_{ij} \, d\theta_i \, d\theta_j$

Affine connection: geodesic, straight line

How curved is the manifold?

Page 8

Riemannian Structure

$ds^2 = \sum g_{ij}(\theta) \, d\theta_i \, d\theta_j = d\theta^T G(\theta) \, d\theta$

$G(\theta) = (g_{ij}(\theta))$

Euclidean: $G = E$

Page 9

Kullback-Leibler Divergence

$D[p(x) : q(x)] = \sum_x p(x) \log \frac{p(x)}{q(x)}$

$D[p(x) : q(x)] \ge 0$, $= 0$ iff $p(x) = q(x)$

$D[p : q] \ne D[q : p]$

quasi-distance: no triangle inequality --- square of distance

Pythagorean Theorem
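The properties listed on this slide can be checked numerically. The following is a minimal sketch for discrete distributions; the function name `kl_divergence` and the example vectors `p` and `q` are illustrative choices, not part of the original slides.

```python
import math

def kl_divergence(p, q):
    """D[p:q] = sum_x p(x) log(p(x)/q(x)) for discrete p, q on a common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two arbitrary example distributions on three points (assumed values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)

# Non-negative, zero only for identical arguments, and not symmetric.
print(d_pq, d_qp, kl_divergence(p, p))
```

The asymmetry $D[p:q] \ne D[q:p]$ is what makes the divergence a quasi-distance rather than a metric.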

Page 10

KL-divergence and Riemannian Structure

relation

$D\{p(x, \theta) : p(x, \theta + d\theta)\} = \frac{1}{2} \, d\theta^T G(\theta) \, d\theta$

$g_{ij}(\theta) = E\left[\frac{\partial \log p(x, \theta)}{\partial\theta_i} \frac{\partial \log p(x, \theta)}{\partial\theta_j}\right]$

Fisher information matrix

[Figure: neighboring points $p(x, \theta)$ and $p(x, \theta + d\theta)$]
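The local relation between the KL-divergence and the Fisher information can be illustrated on a three-point distribution. This is a sketch under assumptions: the parameterization $\theta = (p_1, p_2)$ with $p_3 = 1 - p_1 - p_2$, the helper names, and the step `d` are chosen here for illustration only.

```python
import math

def probs(theta):
    """Three-point distribution with theta = (p1, p2) and p3 = 1 - p1 - p2."""
    p1, p2 = theta
    return [p1, p2, 1.0 - p1 - p2]

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def fisher(theta):
    """g_ij = E[d_i log p * d_j log p] in the coordinates (p1, p2)."""
    p1, p2, p3 = probs(theta)
    return [[1.0 / p1 + 1.0 / p3, 1.0 / p3],
            [1.0 / p3, 1.0 / p2 + 1.0 / p3]]

theta = (0.2, 0.3)
d = (1e-3, -2e-3)  # small displacement d(theta)

G = fisher(theta)
quadratic = 0.5 * sum(d[i] * G[i][j] * d[j] for i in range(2) for j in range(2))
exact = kl(probs(theta), probs((theta[0] + d[0], theta[1] + d[1])))

# The two values agree up to third-order terms in d(theta).
print(exact, quadratic)
```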

Page 11

Affine Connection

covariant derivative: $\nabla_X Y$

geodesic: $\nabla_{\dot{X}} \dot{X} = 0$, $X = X(t)$ (straight line)

$s = \int_c \sqrt{\sum g_{ij}(\theta) \, d\theta_i \, d\theta_j}$: minimal distance

[Figure: tangent vectors $X$ and $Y$]

Page 12

α-connection

$\nabla^{(\alpha)} = \frac{1+\alpha}{2} \nabla + \frac{1-\alpha}{2} \nabla^*$

$\nabla^{(0)}$: Levi-Civita (Riemannian)

$\nabla^{(1)} = \nabla$: Exponential connection

$\nabla^{(-1)} = \nabla^*$: Mixture connection

general $\alpha$: Rényi-Tsallis entropy

$\alpha = \pm 1$: KL-divergence

Page 13

Affine Connections

e-geodesic: $\log r(x, t) = t \log p(x) + (1 - t) \log q(x) + c(t)$

m-geodesic: $r(x, t) = t \, p(x) + (1 - t) \, q(x)$

[Figure: the two geodesics joining $q(x)$ and $p(x)$ under the dual connections $(\nabla, \nabla^*)$]
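The two geodesics can be written out directly for discrete distributions. The sketch below assumes strictly positive probability vectors; the function names and the example vectors are illustrative. In the e-geodesic, the normalizer corresponds to $c(t) = -\log z$.

```python
import math

def m_geodesic(p, q, t):
    """Mixture geodesic: r(x,t) = t p(x) + (1-t) q(x)."""
    return [t * pi + (1.0 - t) * qi for pi, qi in zip(p, q)]

def e_geodesic(p, q, t):
    """Exponential geodesic: log r(x,t) = t log p(x) + (1-t) log q(x) + c(t)."""
    w = [math.exp(t * math.log(pi) + (1.0 - t) * math.log(qi))
         for pi, qi in zip(p, q)]
    z = sum(w)  # c(t) = -log z restores normalization
    return [wi / z for wi in w]

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

m_mid = m_geodesic(p, q, 0.5)
e_mid = e_geodesic(p, q, 0.5)
print(m_mid)
print(e_mid)
```

Both curves share the endpoints $q$ (at $t = 0$) and $p$ (at $t = 1$) but pass through different interior points, which is the sense in which "straight line" depends on the connection.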

Page 14

Duality

$\langle X, Y \rangle = g_{ij} X^i Y^j$

$\langle X, Y \rangle = \langle \Pi X, \Pi^* Y \rangle$

$X \langle Y, Z \rangle = \langle \nabla_X Y, Z \rangle + \langle Y, \nabla^*_X Z \rangle$

Riemannian geometry: $\nabla = \nabla^*$

[Figure: vectors $X$, $Y$ and their parallel transports]

Page 15

$S = \{p(x_1, x_2)\}$, $x_1, x_2 \in \{0, 1\}$

$M = \{q(x_1) \, q(x_2)\}$

Independent Distributions

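Page 16

Dually flat manifold

$\theta$-coordinates, $\eta$-coordinates

potential functions $\psi(\theta)$, $\varphi(\eta)$

$g_{ij} = \frac{\partial^2 \psi(\theta)}{\partial\theta_i \, \partial\theta_j}$, $g^{ij} = \frac{\partial^2 \varphi(\eta)}{\partial\eta_i \, \partial\eta_j}$

$\psi(\theta) + \varphi(\eta) - \sum_i \theta_i \eta_i = 0$

$p(x, \theta) = \exp\{\sum_i \theta_i x_i - \psi(\theta)\}$: exponential family

$\psi(\theta)$: cumulant generating function

$\varphi(\eta)$: negative entropy

$S = \{p(x), x \text{ discrete}\}$ --- characterized by flatness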

Page 17

Dually Flat Manifold

1. Potential Functions --- convex (Legendre transformation)

2. Divergence $D[p : q]$: KL-divergence

3. Pythagoras Theorem: $D[p : q] + D[q : r] = D[p : r]$

4. Projection Theorem

[Figure: triangle with vertices $p$, $q$, $r$]

Page 18

Projection Theorem

$q = \arg\min_{s \in M} D[p : s]$: m-geodesic

$q = \arg\min_{s \in M} D[s : p]$: e-geodesic

[Figure: point $p$ in $S$, submanifold $M \subset S$, projection $q$ and a generic point $s$ in $M$]
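The projection and Pythagorean theorems can be illustrated with the independent family of two binary variables as $M$. In this sketch the minimizer of $D[p : s]$ over $M$ is taken to be the product of the marginals of $p$, a standard fact; the joint distribution `p`, the comparison point `s`, and all function names are assumptions made for illustration.

```python
import math
from itertools import product

def kl(p, q):
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

def independent(a, b):
    """Product distribution with P(x1 = 1) = a and P(x2 = 1) = b."""
    return {(x1, x2): (a if x1 else 1.0 - a) * (b if x2 else 1.0 - b)
            for x1, x2 in product((0, 1), repeat=2)}

# An arbitrary correlated joint distribution p in S (assumed values).
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Projection of p onto M: keep the marginals, drop the interaction.
a = p[(1, 0)] + p[(1, 1)]
b = p[(0, 1)] + p[(1, 1)]
q = independent(a, b)

# Any other point s of M satisfies D[p:s] = D[p:q] + D[q:s].
s = independent(0.7, 0.2)
lhs = kl(p, s)
rhs = kl(p, q) + kl(q, s)
print(lhs, rhs)
```

Because $D[q : s] \ge 0$, the decomposition also shows that $q$ attains the minimum of $D[p : s]$ over $M$.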

Page 19

Applications to Statistics

curved exponential family: $p(x, u) = \exp\{\theta(u) \cdot x - \psi(\theta(u))\}$

embedded in $p(x, \theta) = \exp\{\theta \cdot x - \psi(\theta)\}$

$x_1, x_2, \dots, x_n \sim p(x, u)$

$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k$, $\hat{\eta} = \bar{x}$

estimation: $\hat{u}(x_1, \dots, x_n)$

testing: $H_0 : u = u_0$

Page 20

High-Order Asymptotics

$p(x, \theta(u))$: $x_1, \dots, x_n$

$\hat{u} = \hat{u}(x_1, \dots, x_n)$

$e = E[(\hat{u} - u)(\hat{u} - u)^T]$

$e = \frac{1}{n} G_1 + \frac{1}{n^2} G_2$

$G_1 = G^{-1}$: Cramér-Rao

$G_2 = (\Gamma^{m})^2 + (H^{e}_{M})^2 + (H^{m}_{A})^2$

Page 21

Other Applications

• Systems theory
• Information theory
• Neuromanifold
• Belief propagation
• Boosting (Murata-Eguchi)
• Higher-order correlations
• Mathematics --- Orlicz space (Pistone, Grasselli)
• Physics

Amari and Nagaoka, Methods of Information Geometry, AMS & Oxford U., 2000

Amari, Differential-Geometrical Methods in Statistics, Springer, 1985

Kass and Vos, Geometrical Foundations of Asymptotic Inference, Wiley, 1997

Murray and Rice, Differential Geometry and Statistics, Chapman, 1993

Page 22

Exponential Family: dually flat

$S = \{p(x, \theta)\}$

$p(x, \theta) = \exp\{\sum_i \theta_i x_i - \psi(\theta)\}$

with sufficient statistics $k_i(x)$: $p(x, \theta) = \exp\{\sum_i \theta_i k_i(x) - \psi(\theta)\}$

$\eta = E[x]$

Two coordinate systems

$\theta = (\theta_1, \dots, \theta_n)$: e-flat

$\eta = (\eta_1, \dots, \eta_n)$: m-flat

Page 23

Exponential Family

example (1): discrete distributions

$X = \{x_0, x_1, \dots, x_n\}$

$p(x, \theta) = \exp\{\sum_{i=1}^{n} \theta_i \delta_i(x) - \psi(\theta)\}$

$\theta_i = \log\frac{p_i}{p_0}$, $\eta_i = E[\delta_i(x)] = p_i$

$\psi(\theta) = -\log p_0 = \log\{1 + \sum_{i=1}^{n} \exp \theta_i\}$

$\varphi(\eta) = \sum_{i=0}^{n} p_i \log p_i$: Negative entropy
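The dual coordinates and potentials of the discrete example can be checked numerically. The sketch below assumes the formulas $\theta_i = \log(p_i/p_0)$, $\eta_i = p_i$, $\psi = \log(1 + \sum \exp\theta_i)$ and $\varphi = \sum p_i \log p_i$; the function names and the example vector `p` are illustrative.

```python
import math

def psi(theta):
    """Cumulant generating function: log(1 + sum_i exp(theta_i))."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def eta_from_theta(theta):
    """eta_i = p_i for i >= 1, the gradient of psi."""
    z = 1.0 + sum(math.exp(t) for t in theta)
    return [math.exp(t) / z for t in theta]

def theta_from_eta(eta):
    """theta_i = log(p_i / p_0) with p_0 = 1 - sum(eta)."""
    p0 = 1.0 - sum(eta)
    return [math.log(v / p0) for v in eta]

def phi(eta):
    """Negative entropy: sum over i = 0..n of p_i log p_i."""
    p0 = 1.0 - sum(eta)
    return sum(v * math.log(v) for v in [p0] + list(eta))

p = [0.1, 0.2, 0.3, 0.4]          # p_0, ..., p_3 (assumed example)
theta = theta_from_eta(p[1:])
eta = eta_from_theta(theta)

# Legendre relation: psi(theta) + phi(eta) - theta . eta = 0
gap = psi(theta) + phi(eta) - sum(t * e for t, e in zip(theta, eta))
print(theta, eta, gap)
```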

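Page 24

example (2): Gaussian distributions

$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$

$= \exp\left\{\frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log 2\pi\sigma^2\right\}$

$= \exp\{\theta_1 x + \theta_2 x^2 - \psi(\theta)\}$

$\theta_1 = \frac{\mu}{\sigma^2}$, $\theta_2 = -\frac{1}{2\sigma^2}$

$\eta_1 = E[x] = \mu$, $\eta_2 = E[x^2] = \mu^2 + \sigma^2$

example (3): AR model

$x_t = \sum_i a_i x_{t-i} + \varepsilon_t$, $x = (\dots, x_0, x_1, x_2, \dots)$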
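For the Gaussian example, the change of coordinates between $(\mu, \sigma^2)$, the natural parameters $\theta$ and the expectation parameters $\eta$ can be sketched as follows. The closed form used for $\psi(\theta)$ is obtained by substituting $\theta_1 = \mu/\sigma^2$, $\theta_2 = -1/(2\sigma^2)$ into $\mu^2/(2\sigma^2) + \frac{1}{2}\log 2\pi\sigma^2$; function names and example values are assumptions for illustration.

```python
import math

def theta_from_moments(mu, sigma2):
    """Natural (e-flat) coordinates of N(mu, sigma2)."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def eta_from_moments(mu, sigma2):
    """Expectation (m-flat) coordinates: (E[x], E[x^2])."""
    return (mu, mu * mu + sigma2)

def moments_from_theta(theta):
    t1, t2 = theta
    sigma2 = -1.0 / (2.0 * t2)
    return (t1 * sigma2, sigma2)

def psi(theta):
    """psi(theta) = mu^2/(2 sigma^2) + (1/2) log(2 pi sigma^2), in theta."""
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)

def grad_psi(theta, h=1e-6):
    """Central-difference gradient of psi; should reproduce eta."""
    t1, t2 = theta
    return ((psi((t1 + h, t2)) - psi((t1 - h, t2))) / (2 * h),
            (psi((t1, t2 + h)) - psi((t1, t2 - h))) / (2 * h))

mu, sigma2 = 1.5, 0.8
theta = theta_from_moments(mu, sigma2)
eta = eta_from_moments(mu, sigma2)
print(theta, eta, grad_psi(theta))
```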

Page 25

Legendre transformation

$\eta_i = \frac{\partial}{\partial\theta_i} \psi(\theta)$, $\theta_i = \frac{\partial}{\partial\eta_i} \varphi(\eta)$

$H(\eta) = -\varphi(\eta) = \min_{\theta} \{\psi(\theta) - \sum_i \theta_i \eta_i\}$: entropy

$\psi(\theta)$: cumulant generating function

Page 26

Divergence

$D[P : Q] = \varphi(\eta_P) + \psi(\theta_Q) - \theta_Q \cdot \eta_P$

$KL[P : Q] = \sum_x p(x) \log\frac{p(x)}{q(x)}$

$D[P : P + dP] = \frac{1}{2} \sum g_{ij} \, d\theta_i \, d\theta_j$

Pythagorean Theorem

$D[P : Q] + D[Q : R] = D[P : R]$

[Figure: triangle with vertices $P$, $Q$, $R$; sides labeled m-flat and e-flat]

Page 27

Divergence and Entropy

Max entropy: $\theta_i = 0$

$p(x, 0) = \exp\{-\psi(0)\}$: uniform

$D[p(x, \theta) : p(x, 0)] = -H(\theta) + \psi(0)$

equi-divergence: equi-entropy
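The relation between the divergence from the uniform distribution and the entropy can be verified for a discrete distribution, where $\psi(0) = \log N$ for $N$ states. A minimal sketch, with illustrative names and example values:

```python
import math

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

n_states = 4
uniform = [1.0 / n_states] * n_states   # p(x, 0)
p = [0.1, 0.2, 0.3, 0.4]                # an arbitrary p(x, theta)

lhs = kl(p, uniform)
rhs = math.log(n_states) - entropy(p)   # psi(0) - H
print(lhs, rhs)
```

Since the two sides differ from $-H$ only by the constant $\psi(0)$, surfaces of equal divergence from the uniform distribution are surfaces of equal entropy.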

Page 28

Dual Foliation

$\theta = (\theta_1; \theta_2)$, $\eta = (\eta_1; \eta_2)$

$M(c_1)$: $\eta_1 = c_1$ fixed, $\theta_2$ free

$E(d_2)$: $\theta_2 = d_2$ fixed, $\eta_1$ free

$M(c_1) \perp E(d_2)$

$E(\theta_2 = 0)$; Pythagorean theorem

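Page 29

Maximum Entropy

constraints: $E[a_i(x)] = c_i$, $i = 1, \dots, k$

$\max H$: $\eta_i = E[a_i(x)]$, $i = 1, \dots, k$

$\theta = (\theta_1; \theta_2)$, $\eta = (\eta_1; \eta_2)$

$\min_{P \in M} D[P : P_0]$, where $D[P : P_0] = c - H(P)$ with $c$ a constant

$\hat{P} = \arg\min_{P \in M} D[P : P_0]$: $D[P : P_0] = D[P : \hat{P}] + D[\hat{P} : P_0]$

[Figure: $P$ and $\hat{P}$ in $M$, and the uniform distribution $P_0$]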
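A small worked instance of the maximum entropy problem is a die with a prescribed mean, where the single constraint is $E[x] = c$ and the solution lies in the exponential family $p(x) \propto \exp(\theta x)$. The sketch below finds $\theta$ by bisection, a simple solver chosen here for illustration; the target mean, the comparison distribution `p_other`, and all names are assumptions.

```python
import math

FACES = [1, 2, 3, 4, 5, 6]

def expfam(theta):
    """p(x) proportional to exp(theta * x) on the faces of a die."""
    w = [math.exp(theta * x) for x in FACES]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p):
    return sum(x * pi for x, pi in zip(FACES, p))

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def solve_theta(c, lo=-10.0, hi=10.0, iters=200):
    """Bisection on theta; the mean of expfam(theta) increases with theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(expfam(mid)) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = 4.5
p_hat = expfam(solve_theta(c))                 # maximum entropy solution
p0 = [1.0 / 6.0] * 6                           # uniform distribution P_0
p_other = [0.05, 0.05, 0.1, 0.2, 0.35, 0.25]   # another P with mean 4.5

# Pythagorean decomposition: D[P:P0] = D[P:P_hat] + D[P_hat:P0]
lhs = kl(p_other, p0)
rhs = kl(p_other, p_hat) + kl(p_hat, p0)
print(lhs, rhs, entropy(p_hat), entropy(p_other))
```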

Page 30

Simple Example: independence

$x = (x_1, x_2)$, $x_1, x_2 \in \{0, 1\}$

$p(x_1, x_2) = \exp\{\theta_1 x_1 + \theta_2 x_2 + \theta_{12} x_1 x_2 - \psi\}$

$\theta = (\theta_1, \theta_2; \theta_{12})$

$\eta = (\eta_1, \eta_2; \eta_{12})$

$\eta_i = E[x_i]$, $\eta_{12} = E[x_1 x_2]$

$E[x_i] = c_i$, $i = 1, 2$; $\max H$

$E(\theta_{12} = 0)$

[Figure: $M(c)$ and $E(0)$]
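The independence example can be checked by scanning the one remaining free coordinate $\eta_{12} = E[x_1 x_2]$ with the marginals held fixed. In this sketch the entropy is maximized on a grid, a deliberately crude method chosen for transparency; the values of `c1`, `c2` and the function names are assumptions.

```python
import math

def joint(c1, c2, r):
    """Joint distribution with E[x1] = c1, E[x2] = c2, E[x1 x2] = r."""
    return {(1, 1): r, (1, 0): c1 - r, (0, 1): c2 - r,
            (0, 0): 1.0 - c1 - c2 + r}

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def theta12(p):
    """Interaction coordinate: log of p11 p00 / (p10 p01)."""
    return math.log(p[(1, 1)] * p[(0, 0)] / (p[(1, 0)] * p[(0, 1)]))

c1, c2 = 0.3, 0.6
lo = max(0.0, c1 + c2 - 1.0)
hi = min(c1, c2)
grid = [lo + (hi - lo) * k / 1000 for k in range(1, 1000)]

# Entropy along M(c) peaks where the interaction theta_12 vanishes.
best_r = max(grid, key=lambda r: entropy(joint(c1, c2, r)))
print(best_r, c1 * c2, theta12(joint(c1, c2, c1 * c2)))
```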

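Page 31

Simple example: Gaussian

$p(x) = \exp\{\sum_i \theta_i x^i - \psi\}$

$\theta = (\theta_1, \theta_2; \theta_3, \dots)$

$\eta = (\eta_1, \eta_2; \eta_3, \dots)$, $\eta_1 = E[x]$, $\eta_2 = E[x^2]$

maximum entropy for fixed $\eta_1$, $\eta_2$:

$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$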

Page 32

Time Series

$x = (\dots, x_0, x_1, x_2, \dots)$

AR: $x_t = \sum_i a_i x_{t-i} + \varepsilon_t$, $\varepsilon_t \sim N(0, 1)$

MA: $x_t = \sum_i b_i \varepsilon_{t-i}$

general linear system: $x_t = \sum_{i=0}^{\infty} h_i \varepsilon_{t-i}$

[Figure: $\varepsilon_t$ passed through the filter $h$ to give $x_t$]

$S(\omega) = |H(e^{i\omega})|^2$, $H(z) = \sum_i h_i z^{-i}$
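The relation between the impulse response and the power spectrum can be illustrated with an AR(1) process $x_t = a x_{t-1} + \varepsilon_t$, whose impulse response is $h_i = a^i$. The sketch assumes the convention $H(z) = \sum_i h_i z^{-i}$ and compares a truncated sum with the closed form $1/(1 - 2a\cos\omega + a^2)$; the values of `a` and `omega` and the function names are illustrative.

```python
import cmath
import math

def transfer(h, omega):
    """H(e^{i omega}) = sum_i h_i e^{-i omega i} for a finite impulse response."""
    return sum(hi * cmath.exp(-1j * omega * i) for i, hi in enumerate(h))

def spectrum(h, omega):
    """Power spectrum S(omega) = |H(e^{i omega})|^2."""
    return abs(transfer(h, omega)) ** 2

a = 0.6
h = [a ** i for i in range(200)]   # truncated impulse response of AR(1)
omega = 1.0

s_numeric = spectrum(h, omega)
s_closed = 1.0 / (1.0 - 2.0 * a * math.cos(omega) + a * a)
print(s_numeric, s_closed)
```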

Page 33

Geometry

$S = \{S(\omega; \theta)\}$

$g_{ij} = \frac{1}{2\pi} \int \partial_i \log S \, \partial_j \log S \, d\omega$

$S$: dually flat

$H = \frac{1}{4\pi} \int \log S(\omega) \, d\omega + \text{const}$

Potentials: expressed through the entropy $H$ up to constants

$D[S_1 : S_2] = \frac{1}{2\pi} \int \left\{\frac{S_1}{S_2} - 1 - \log\frac{S_1}{S_2}\right\} d\omega$

Page 34

Stochastic Realization

$c_i = E[x_t x_{t-i}]$: autocorrelation

$c = (c_0, c_1, \dots, c_k; \, c_{k+1}, \dots)$

$S \in M(c)$, $\max H$

$\hat{S} \in E(0)$: $\hat{S} \in AR(k)$: exp. fam.

Page 35

Dual Problem

$\tilde{c}_i$, $i = 1, \dots, k$: inverse autocorrelation

minimize $H$

MA model

$S^{-1}(\omega) = \sum_t \tilde{c}_t \cos \omega t$: inverse autocorrelation

Page 36

α-geometry

α-divergence: $D_\alpha[P : Q] = 1 - \int p(x)^{\frac{1-\alpha}{2}} q(x)^{\frac{1+\alpha}{2}} \, dx$ (up to a constant factor)

α-entropy: Rényi-Tsallis entropy, $H = \frac{1}{q-1}\left\{1 - \int p(x)^q \, dx\right\}$

max entropy, $E[x] = c$, $E[x x^T] = D$: family of probability distributions $p(x; c, D)$

Manifold of positive measures $m(x)$: flat

Page 37

Entropy (alpha-entropy) is a fundamental quantity.

---- It arises from a fundamental geometrical structure.

KL-divergence is derived from it.