Bert Kappen SNN Donders Institute, Radboud University ...krzakala/LESHOUCHES2017/talks/BertKappen.pdf · udt = E q dWeS E q eS J;u can be computed by forward sampling from q. Bert

Latent state estimation using control theory

Bert Kappen, SNN Donders Institute, Radboud University, Nijmegen

Gatsby Unit, UCL London

February 27, 2017

With Hans Christian Ruiz

Bert Kappen

Time series inference

The joint probability is

$$p(x_{1:T}, y_{1:T}) = p(x_{1:T})\, p(y_{1:T} \mid x_{1:T})$$

$$p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{t-1}), \qquad p(y_{1:T} \mid x_{1:T}) = \prod_{t=1}^{T} p(y_t \mid x_t)$$
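The factorization of the joint probability can be sketched in code. The example below is only an illustration: it assumes Gaussian random-walk transitions and Gaussian observations, and the names `sample_state_space`, `log_joint`, `sigma_x` and `sigma_y` are hypothetical, not taken from the talk.

```python
import math
import random

def sample_state_space(T, sigma_x=1.0, sigma_y=0.5, x0=0.0, seed=0):
    """Draw (x_1..x_T, y_1..y_T) from an assumed Gaussian random-walk model."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(T):
        x = rng.gauss(x, sigma_x)          # transition p(x_t | x_{t-1})
        xs.append(x)
        ys.append(rng.gauss(x, sigma_y))   # observation p(y_t | x_t)
    return xs, ys

def log_normal(z, mean, sigma):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((z - mean) / sigma) ** 2

def log_joint(xs, ys, sigma_x=1.0, sigma_y=0.5, x0=0.0):
    """log p(x_{1:T}, y_{1:T}) = sum_t [log p(x_t | x_{t-1}) + log p(y_t | x_t)]."""
    total, prev = 0.0, x0
    for x, y in zip(xs, ys):
        total += log_normal(x, prev, sigma_x) + log_normal(y, x, sigma_y)
        prev = x
    return total

xs, ys = sample_state_space(T=5)
print(log_joint(xs, ys))
```

The log joint is a sum of one transition term and one observation term per time step, mirroring the two products in the factorization.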


Given observed data $y_{1:T}$, estimate the latent states.

$$p(x_{1:T} \mid y_{1:T}) = \frac{p(x_{1:T}, y_{1:T})}{p(y_{1:T})}, \qquad p(y_{1:T}) = \sum_{x_{1:T}} p(x_{1:T}, y_{1:T})$$


This problem occurs in system identification: identify the system dynamics from observations.

Examples: cellular dynamical processes, finance, fMRI.

Parameter estimation by likelihood maximization

Suppose the model is known up to parameters $\theta$: $p(x, y \mid \theta)$, where $x = x_{1:T}$ and $y = y_{1:T}$.

Maximize the log likelihood:

$$L(\theta) = \log p(y \mid \theta) = \log \sum_x p(x, y \mid \theta)$$

$$\nabla_\theta L(\theta) = \frac{1}{p(y \mid \theta)} \sum_x \nabla_\theta p(x, y \mid \theta) = \sum_x p(x \mid y, \theta)\, \nabla_\theta \log p(x, y \mid \theta)$$

The parameter estimate requires statistics from the posterior.

Common approaches:

Kalman filtering: exact for linear dynamics and Gaussian observations

Variational methods, EP, extended Kalman filter: valid for simple posterior distributions

Particle filtering, SMC: generic, but inefficient


$p(x_{1:T} \mid x_0)$: $dX_t = dW_t$

$p(y_T \mid x_T)$: $\exp(-(y_T - x_T)^2)$
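A naive way to estimate the posterior for this example is to sample paths from the prior and weight them by the likelihood of the end point. The sketch below assumes $x_0 = 0$, a horizon $T = 1$, an observation $y_T = 1$ and an Euler discretization; these values and the name `prior_importance_sampling` are illustrative assumptions.

```python
import math
import random

def prior_importance_sampling(y_T=1.0, n_paths=2000, n_steps=50, T=1.0, seed=0):
    """Sample dX_t = dW_t from x_0 = 0 and weight by exp(-(y_T - x_T)^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n_steps)
    end_points, weights = [], []
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sd)                   # Euler step of dX_t = dW_t
        end_points.append(x)
        weights.append(math.exp(-(y_T - x) ** 2))     # likelihood of the end point
    total = sum(weights)
    norm = [w / total for w in weights]               # self-normalized weights
    post_mean = sum(w * x for w, x in zip(norm, end_points))
    ess = 1.0 / sum(w * w for w in norm)              # effective sample size
    return post_mean, ess

# For T = 1 the prior on x_T is N(0, 1), so the exact posterior mean is 2 * y_T / 3.
post_mean, ess = prior_importance_sampling()
print(post_mean, ess)
```

The effective sample size returned alongside the estimate indicates how many of the prior samples actually contribute; it drops as the likelihood becomes sharper or the observation less typical under the prior.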

[Figures: plots against $t$, with $t$ from 0 to 100]

Control approach to time-series inference

Outline:

Introduction to control theory and path integral control theory

Optimal control and optimal sampling

Adaptive importance sampler: estimate an improved sampler from self-generated data

fMRI application

Control theory

Consider a stochastic dynamical system

$$dX_t = f(X_t, u)\,dt + dW_t, \qquad E(dW_{t,i}\, dW_{t,j}) = \nu_{ij}\,dt$$

Given $X_0$, find the control function $u(x, t)$ that minimizes the expected future cost

$$C = E\left(\phi(X_T) + \int_0^T dt\, V(X_t, u(X_t, t))\right)$$

Control theory

Standard approach: define $J(x, t)$ as the optimal cost-to-go from $(x, t)$.

$$J(x, t) = \min_{u_{t:T}} E_u\left(\phi(X_T) + \int_t^T ds\, V(X_s, u(X_s, s)) \,\Big|\, X_t = x\right)$$

$J$ satisfies a partial differential equation

$$-\partial_t J(x, t) = \min_u \left(V(x, u) + f(x, u)\,\partial_x J(x, t) + \frac{1}{2}\nu\,\partial_x^2 J(x, t)\right), \qquad J(x, T) = \phi(x)$$

with $u = u(x, t)$ and $\nu$ the covariance of the noise $dW_t$. This is the Hamilton-Jacobi-Bellman (HJB) equation. The optimal control $u^*(x, t)$ defines a distribution over trajectories $p^*(\tau)$ $(= p^*(\tau \mid x_0, 0))$.

Path integral control theory

$$dX_t = f(X_t, t)\,dt + g(X_t, t)\left(u(X_t, t)\,dt + dW_t\right), \qquad X_0 = x_0$$

The goal is to find the function $u(x, t)$ that minimizes

$$C = E_u\left(S(\tau) + \int_0^T dt\, \frac{1}{2} u(X_t, t)^2\right), \qquad S(\tau) = \phi(X_T) + \int_0^T dt\, V(X_t, t)$$


Equivalent formulation: find the distribution over trajectories $p$ that minimizes

$$C(p) = \int d\tau\, p(\tau)\left(S(\tau) + \log\frac{p(\tau)}{q(\tau)}\right)$$

where $q(\tau \mid x_0, 0)$ is the distribution over uncontrolled trajectories. Note that

$$E_u \int_0^T dt\, \frac{1}{2} u(X_t, t)^2 = \int d\tau\, p(\tau) \log\frac{p(\tau)}{q(\tau)}$$

The optimal solution is given by $p^*(\tau) = \frac{1}{\psi}\, q(\tau)\, e^{-S(\tau)}$.


The optimal solution coincides with the optimally controlled distribution: $p^*(\tau) = \frac{1}{\psi}\, q(\tau)\, e^{-S(\tau)} = p(\tau \mid u^*)$.

Equivalence of optimal control and discounted cost (Girsanov)


The optimal control cost is $C(p^*) = -\log\psi = J(x_0, 0)$ with

$$\psi = \int d\tau\, q(\tau)\, e^{-S(\tau)} = E_q\, e^{-S}$$

$$u^*\,dt = \frac{E_q\left(dW\, e^{-S}\right)}{E_q\left(e^{-S}\right)}$$

$J$ and $u^*$ can be computed by forward sampling from $q$.
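A minimal sketch of forward sampling from $q$, under assumptions that are not in the slides: a one-dimensional toy problem with $f = 0$, $g = 1$, $V = 0$ and end cost $\phi(x) = \alpha x^2 / 2$, chosen because $\psi$ and $u^*$ then have a closed form to compare against. The function names are hypothetical.

```python
import math
import random

def forward_sample_estimates(x0=1.0, alpha=1.0, T=1.0, n_steps=10,
                             n_paths=20000, seed=0):
    """Estimate psi, J = -log psi and u*(x0, 0) from uncontrolled paths."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    sum_w, sum_w_dw0 = 0.0, 0.0
    for _ in range(n_paths):
        x, dw0 = x0, 0.0
        for k in range(n_steps):
            dw = rng.gauss(0.0, sd)
            if k == 0:
                dw0 = dw                      # first noise increment, used for u*
            x += dw                           # uncontrolled dynamics q
        w = math.exp(-0.5 * alpha * x * x)    # e^{-S} with S = phi(X_T)
        sum_w += w
        sum_w_dw0 += w * dw0
    psi = sum_w / n_paths
    u0 = (sum_w_dw0 / sum_w) / dt             # u* dt = E_q[dW e^{-S}] / E_q[e^{-S}]
    return psi, -math.log(psi), u0

def exact_values(x0=1.0, alpha=1.0, T=1.0):
    """Closed form for the assumed toy problem (a Gaussian integral)."""
    psi = math.exp(-0.5 * alpha * x0 ** 2 / (1 + alpha * T)) / math.sqrt(1 + alpha * T)
    return psi, -math.log(psi), -alpha * x0 / (1 + alpha * T)

psi_hat, J_hat, u0_hat = forward_sample_estimates()
psi_ref, J_ref, u0_ref = exact_values()
print(psi_hat, J_hat, u0_hat)
print(psi_ref, J_ref, u0_ref)
```

The estimate of $u^*$ divides a weighted noise increment by $dt$, so it is considerably noisier than the estimate of $\psi$ at the same number of paths.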

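Estimating $\psi = E\, e^{-S}$

[Figure: sampled trajectories; ESS = 1.8, C = 31.7]

Sample $N$ trajectories from the uncontrolled dynamics:

$$\tau_i \sim q(\tau), \qquad w_i = e^{-S(\tau_i)}, \qquad \hat\psi = \frac{1}{N} \sum_i w_i$$

$\hat\psi$ is an unbiased estimate of $\psi$.

Sampling efficiency is inversely proportional to the variance in the (normalized) $w_i$:

$$ESS = \frac{N}{1 + N^2\, \mathrm{Var}(w)}$$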
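The effective sample size can be computed directly from a list of weights. The sketch below assumes the weights are first normalized to sum to one and uses the population variance; the function name is hypothetical.

```python
def effective_sample_size(weights):
    """ESS = N / (1 + N^2 Var(w)) for weights normalized to sum to one."""
    n = len(weights)
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = 1.0 / n                                  # mean of the normalized weights
    var = sum((wi - mean) ** 2 for wi in w) / n     # population variance
    return n / (1.0 + n * n * var)

print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))   # equal weights: 4.0
print(effective_sample_size([1.0, 0.0, 0.0, 0.0]))   # one dominant weight: 1.0
```

With this normalization the expression equals $1 / \sum_i w_i^2$, so the ESS ranges from 1 (a single dominant sample) to $N$ (equal weights).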

Importance sampling

[Figure: three panels of sampled trajectories: ESS = 1.8, C = 31.7; ESS = 3.5, C = 5.0; ESS = 9.5, C = 2.0]

Sampling $N$ trajectories from the controlled dynamics and reweighting yields an unbiased estimate of the cost-to-go:

$$\tau_i \sim p(\tau), \qquad w_i = e^{-S(\tau_i)}\, \frac{q(\tau_i)}{p(\tau_i)} = e^{-S_u(\tau_i)}, \qquad \hat\psi = \frac{1}{N} \sum_i w_i$$

$$S_u(\tau) = S(\tau) + \int_0^T dt\, \frac{1}{2} u(X_t, t)^2 + \int_0^T u(X_t, t)\, dW_t$$

Theorem: a better $u$ (in the sense of optimal control) provides a better sampler (in the sense of effective sample size). The optimal $u = u^*$ (in the sense of optimal control) requires only one sample, and $S_u(\tau)$ is deterministic!

Thijssen, Kappen 2015
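The effect of the control on the sampler can be illustrated numerically. The sketch below assumes a one-dimensional toy problem ($dX = u\,dt + dW$, $V = 0$, $\phi(x) = \alpha x^2/2$, $x_0 = 2$), for which the optimal control is known in closed form; it compares the uncontrolled sampler with the optimally controlled one. All names and parameter values are illustrative assumptions.

```python
import math
import random

def controlled_sampler(u, x0=2.0, alpha=1.0, T=1.0, n_steps=50,
                       n_paths=2000, seed=0):
    """Sample dX = u(x, t) dt + dW and weight each path by e^{-S_u}."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    weights = []
    for _ in range(n_paths):
        x, s_u = x0, 0.0
        for k in range(n_steps):
            uk = u(x, k * dt)
            dw = rng.gauss(0.0, sd)
            s_u += 0.5 * uk * uk * dt + uk * dw   # int u^2/2 dt + int u dW
            x += uk * dt + dw
        s_u += 0.5 * alpha * x * x                # S = phi(X_T), since V = 0
        weights.append(math.exp(-s_u))
    psi = sum(weights) / n_paths
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return psi, ess

def no_control(x, t):
    return 0.0

def optimal_control(x, t, alpha=1.0, T=1.0):
    """Closed-form optimal control for the assumed toy problem."""
    return -alpha * x / (1.0 + alpha * (T - t))

psi_exact = math.exp(-0.5 * 4.0 / 2.0) / math.sqrt(2.0)   # closed form for the defaults
psi0, ess0 = controlled_sampler(no_control)
psi1, ess1 = controlled_sampler(optimal_control)
print(psi_exact, (psi0, ess0), (psi1, ess1))
```

Both samplers target the same $\psi$, but under the optimal control the weights are nearly constant; the residual variance comes from the time discretization.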

Proof

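The control cost is $C(p) = E_p\left(S(\tau) + \log\frac{p(\tau)}{q(\tau)}\right)$. Using Jensen's inequality:

$$C(p^*) = -\log\int d\tau\, q(\tau)\, e^{-S(\tau)} = -\log\int d\tau\, p(\tau)\, e^{-S(\tau) - \log\frac{p(\tau)}{q(\tau)}} \le \int d\tau\, p(\tau)\left(S(\tau) + \log\frac{p(\tau)}{q(\tau)}\right) = C(p)$$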

The inequality is saturated when $S(\tau) + \log\frac{p(\tau)}{q(\tau)}$ has zero variance.

This is realized when $p = p^*$, which exists when $\int d\tau\, q(\tau)\, e^{-S(\tau)} < \infty$.

The Path Integral Cross Entropy (PICE) method

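We wish to estimate

$$\psi = \int d\tau\, q(\tau)\, e^{-S(\tau)}$$

The optimal (zero-variance) importance sampler is $p^*(\tau) = \frac{1}{\psi}\, q(\tau)\, e^{-S(\tau)}$.

We approximate $p^*(\tau)$ with $p_u(\tau)$, where $u(x, t \mid \theta)$ is a parametrized control function.

Following the cross-entropy method, we minimise $KL(p^* \mid p_u)$, which gives the gradient update

$$\Delta\theta \propto E_u\left(e^{-S_u} \int_0^T dW_t\, \frac{\partial u(X_t, t \mid \theta)}{\partial\theta}\right)$$

The parametrization $u(x, t \mid \theta)$ is arbitrary.

Kappen, Ruiz 2016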

Adaptive importance sampling

for k = 0, 1, ... do
    data_k = generate_data(model, u_k)      % importance sampler
    u_{k+1} = learn_control(data_k, u_k)    % gradient descent
end for

Parallel sampling

Parallel gradient computation
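The loop above can be sketched end to end on a toy problem. The example assumes $dX = u\,dt + dW$ with $V = 0$, end cost $\phi(x) = x^2/2$, $x_0 = 2$ and a one-parameter linear controller $u(x, t \mid \theta) = \theta x$; it uses a plain gradient step with self-normalized weights. The controller, step size and names are illustrative assumptions, not the setup used in the talk.

```python
import math
import random

def rollout_batch(theta, rng, x0=2.0, alpha=1.0, T=1.0, n_steps=20, n_paths=1000):
    """Importance sampler: simulate dX = u dt + dW with u = theta * x.

    Returns normalized weights and, per path, int dW_t du/dtheta = sum_t X_t dW_t.
    """
    dt = T / n_steps
    sd = math.sqrt(dt)
    weights, grads = [], []
    for _ in range(n_paths):
        x, s_u, g = x0, 0.0, 0.0
        for _ in range(n_steps):
            u = theta * x
            dw = rng.gauss(0.0, sd)
            s_u += 0.5 * u * u * dt + u * dw
            g += x * dw                       # du/dtheta = x for this controller
            x += u * dt + dw
        s_u += 0.5 * alpha * x * x            # end cost phi(X_T); V = 0 assumed
        weights.append(math.exp(-s_u))
        grads.append(g)
    total = sum(weights)
    return [w / total for w in weights], grads

def pice(n_iters=15, step=0.2, seed=0):
    """Alternate sampling with the current control and a gradient step on theta."""
    rng = random.Random(seed)
    theta, history = 0.0, []
    for _ in range(n_iters):
        w, g = rollout_batch(theta, rng)                       # generate_data
        ess_fraction = 1.0 / (len(w) * sum(wi * wi for wi in w))
        history.append((theta, ess_fraction))
        theta += step * sum(wi * gi for wi, gi in zip(w, g))   # learn_control
    return theta, history

theta, history = pice()
print(history[0], history[-1], theta)
```

Each entry of `history` records the parameter and the ESS fraction of the batch sampled with it, so the first and last entries show how the sampler improves as the control is learned. The rollouts within a batch are independent, which is what makes the sampling and the gradient sum easy to parallelize.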

Inverted pendulum

Simple second-order pendulum with noise, $X = (\alpha, \dot\alpha)$:

$$\ddot\alpha = -\cos\alpha + u, \qquad C = E \int_0^T dt \left(V(X_t) + \frac{1}{2} u(X_t, t)^2\right)$$

Naive grid: $u(x) = \sum_k u_k\, \delta_{x, x_k}$.

[Figure: two learning curves over 1000 iterations (axis labels "ss", from 0 to 1, and $J$) and a plot over the state space]

The final ESS fraction is below 1 due to time discretization and finite sample size effects.

Acrobot

Swing up and stabilize underactuated stochastic double pendulum.

(acrobot.mp4)

Neural network with 2 hidden layers, 50 neurons per layer. Input is position and velocity. 2000 iterations, with 30000 rollouts per iteration. 100 cores. 15 minutes.


Coordination of UAVs

Centralized path integral solution computed in real time (simulation) for 10 quadrotors. The objective is to fly a holding pattern near a fixed location while maintaining a minimal velocity and distance to other drones. Video at: http://www.snn.ru.nl/~bertk/control_theory/PI_quadrotors.mp4

Gomez et al. 2016

Coordination of UAVs

This behavior was replicated on real quadrotors, demonstrating high-dimensional non-linear stochastic optimal control in real time.

Chao Xu ACC 2017

Control and inference

Fleming, Mitter

Inferring neural activity from BOLD fMRI

Subjects were asked to respond as fast as possible to a visual or auditory stimulus.

Measure the fMRI BOLD response in the motor cortex.

Reconstruct the neural signal from the BOLD signal.

Neural activity: $dZ_t = A Z_t\,dt + \sigma_z\,dW_t$

Hemodynamics: vasodilation $\dot s(z, s, f)$, blood flow $\dot f(s)$

Balloon model: blood volume $\dot v(f, v)$, deoxygenated hemoglobin $\dot q(f, v, q)$

BOLD: $y(v, q)$









Neural network with 8 layers and 50 units per layer. 45 learning iterations. 300,000 MC samples per iteration. 500 cores. CPU time 3100 sec.

Thank you!

Kappen, Hilbert Johan, and Hans Christian Ruiz. Adaptive importance sampling for control and inference. Journal of Statistical Physics 162.5 (2016): 1244-1266.
