Upload
vanhanh
View
219
Download
1
Embed Size (px)
Citation preview
Latent state estimation using control theory
Bert KappenSNN Donders Institute, Radboud University, Nijmegen
Gatsby Unit, UCL London
February 27, 2017
1
1with Hans Christian Ruiz
Bert Kappen
Time series inference
The joint probability is
p(x1:T , y1:T ) = p(x1:T )p(y1:T |x1:T )
p(x1:T ) =T
t=1
p(xt|xt1) p(y1:T |x1:T ) =T
t=1
p(yt|xt)
Bert Kappen 1
Time series inference
Given observed data y1:T , estimate the latent states.
p(x1:T |y1:T ) =p(x1:T , y1:T )
y1:T )p(y1:T ) =
x1:T
p(x1:T , y1:T )
Bert Kappen 2
Time series inference
Given observed data y1:T , estimate the latent states.
p(x1:T |y1:T ) =p(x1:T , y1:T )
y1:T )p(y1:T ) =
x1:T
p(x1:T , y1:T )
Problem occurs in system identification: identify system dynamics from observa-tions.
cellular dynamical processes, finance,
Bert Kappen 3
fMRI
Bert Kappen 4
Parameter estimation by likelihood maximization
Suppose the model is known up to parameters: p(x, y|). 2.
Maximize the log likelihood:
L() = log p(y|) = log
x
p(x, y|)
L() =
1p(y|)
x
p(x, y|) =
x
log p(x, y|)
p(x|y, )
Parameter estimate requires statistics from the posterior.
Common approaches:Kalman Filtering: exact for linear dynamics and Gaussian observationsVariational, EP, extended Kalman Filter: valid for simple posterior distributionsParticle filtering, SMC: generic, but inefficient
2x = x1:T , y = y1:T .
Bert Kappen 5
Time series inference
p(x1:T |x0) : dXt = dWtp(yT |xT ) : exp((yT xT )2)
0 10 20 30 40 50 60 70 80 90 1000.5
0
0.5
1
1.5
2
t
Bert Kappen 6
Time series inference
p(x1:T |x0) : dXt = dWtp(yT |xT ) : exp((yT xT )2)
0 10 20 30 40 50 60 70 80 90 1000.5
0
0.5
1
1.5
2
t
Bert Kappen 7
Time series inference
p(x1:T |x0) : dXt = dWtp(yT |xT ) : exp((yT xT )2)
0 10 20 30 40 50 60 70 80 90 100
0.4
0.2
0
0.2
0.4
0.6
0.8
1
t
0 10 20 30 40 50 60 70 80 90 100
0.4
0.2
0
0.2
0.4
0.6
0.8
1
t
Bert Kappen 8
Control approach to time-series inference
Outline:
Introduction to control theory and path integral control theory
Optimal control optimal sampling
Adaptive importance sampler: Estimate improved sampler from self-generateddata
fMRI application
Bert Kappen 9
Control theory
0 0.5 1 1.5 22
1
0
1
2
Consider a stochastic dynamical system
dXt = f (Xt, u)dt + dWt E(dWt,idWt. j) = i jdt
Given X0 find control function u(x, t) that minimizes the expected future cost
C = E((XT ) +
T0
dtV(Xt, u(Xt, t)))
Bert Kappen 10
Control theory
0.5 1 1.5 2
2
1
0
1
20 0.5 1 1.5 2
2
1
0
1
2
Standard approach: define J(x, t) is optimal cost-to-go from x, t.
J(x, t) = min ut:TEu
((XT ) +
Tt
dtV(Xt, u(Xt, t)))
Xt = x
J satisfies a partial differential equation
tJ(t, x) = minu
(V(x, u) + f (x, u)xJ(x, t) +
122xJ(x, t)
)J(x,T ) = (x)
with u = u(x, t).This is HJB equation. Optimal control u(x, t) defines distributionover trajectories p() (= p(|x0, 0)).
Bert Kappen 11
Path integral control theory
0 0.5 1 1.5 22
1
0
1
2
dXt = f (Xt, t)dt + g(Xt, t)(u(Xt, t)dt + dWt) X0 = x0
Goal is to find function u(x, t) that minimizes
C = Eu
(S () +
T0
dt12
u(Xt, t)2)
S () = (XT ) + T
0V(Xt, t)
Bert Kappen 12
Path integral control theory
0 0.5 1 1.5 22
1
0
1
2
0 0.5 1 1.5 22
1
0
1
2
Equivalent formulation: Find distribution over trajectories p that minimizes 3
C(p) =
dp()(S () + log
p()q()
)q(|x0, 0) is distribution over uncontrolled trajectories.
The optimal solution is given by p() = 1q()eS ()
3Eu T0 dt
12u(Xt, t)
2 =
dp() log p()q() .
Bert Kappen 13
Path integral control theory
0 0.5 1 1.5 22
1
0
1
2
0 0.5 1 1.5 22
1
0
1
2
Equivalent formulation: Find distribution over trajectories p that minimizes
C(p) =
dp()(S () + log
p()q()
)q(|x0, 0) is distribution over uncontrolled trajectories.
The optimal solution is given by p() = 1q()eS () = p(|u).
Equivalence of optimal control and discounted cost (Girsanov)
Bert Kappen 14
Path integral control theory
0 0.5 1 1.5 22
1
0
1
2
0 0.5 1 1.5 22
1
0
1
2
The optimal control cost is C(p) = log = J(x0, 0) with
=
dq()eS () = EqeS
udt =Eq
(dWeS
)Eq
(eS
)J, u can be computed by forward sampling from q.
Bert Kappen 15
Estimating = EeS
0 0.5 1 1.5 22
1
0
1
2
ESS = 1.8, C=31.7
Sample N trajectories from uncontrolled dynamics
i q() wi = eS (i) =1N
i
wi
unbiased estimate of .
Sampling efficiency is inversely proportional to variance in (normalized) wi.
ES S =N
1 + N2Var(w)
Bert Kappen 16
Importance sampling
0 0.5 1 1.5 22
1
0
1
2
ESS = 1.8, C=31.7
0 0.5 1 1.5 22
1
0
1
2
ESS = 3.5, C=5.0
0 0.5 1 1.5 22
1
0
1
2
ESS=9.5, C=2.0
Sample N trajectories from controlled dynamics and reweight yields unbiased es-timate of cost-to-go:
i p() wi = eS (i)q(i)p(i)
= eS u(i) =1N
i
wi
S u() = S () + T
0dt
12
u(Xt, t)2 + T
0u(Xt, t)dWt
Bert Kappen 17
Importance sampling
0 0.5 1 1.5 22
1
0
1
2
ESS = 1.8, C=31.7
0 0.5 1 1.5 22
1
0
1
2
ESS = 3.5, C=5.0
0 0.5 1 1.5 22
1
0
1
2
ESS=9.5, C=2.0
S u() = S () + T
0dt
12
u(Xt, t)2 + T
0u(Xt, t)dWt
Thm: Better u (in the sense of optimal control) provides a better sampler (in the senseof effective sample size). Optimal u = u (in the sense of optimal control) requires only one sample andS u() deterministic!
Thijssen, Kappen 2015
Bert Kappen 18
Proof
Control cost is C(p) = Ep(S () + log p()q()
)Using Jensens inequality:
C = log
q()eS () = log
p()eS ()logp()q()
p()(S () + log
p()q()
)
Bert Kappen 19
Proof
Control cost is C(p) = Ep(S () + log p()q()
)Using Jensens inequality:
C = log
q()eS () = log
p()eS ()logp()q()
p()(S () + log
p()q()
)
The inequality is saturated when S () + log p()q() has zero variance.
This is realized when p = p 4.
4p exists when q()eS () <
Bert Kappen 20
The Path Integral Cross Entropy (PICE) method
We wish to estimate
=
dq()eS ()
The optimal (zero variance) importance sampler is p() = 1q()eS ().
We approximate p() with pu(), where u(x, t|) is a parametrized control function.
Following the Cross Entropy method, we minimise KL(p|pu).
EueS u T
0dWt
u(Xt, t|)
u(x, t|) is arbitrary.
Kappen, Ruiz 2016
Bert Kappen 21
Adaptive importance samplingfor k = 0, . . . do
datak = generate data(model, uk) % Importance sampleruk+1 = learn control(datak, uk) % Gradient descent
end for
Parallel samplingParallel gradient computation
Bert Kappen 22
Inverted pendulum
Simple 2nd order pendulum with noise, X = (, )
= cos + u C = E T
0dtV(Xt) +
12
u(Xt, t)2
Naive grid: u(x) =
k ukx,xk.
0 200 400 600 800 10000
0.2
0.4
0.6
0.8
1
ss
0 200 400 600 800 10001.4
1.2
1
0.8
0.6
0.4
0.2
0
J
0 2 4 6
2
1
0
1
2 3
2
1
0
1
2
3
Final fraction ES S < 1 due to time discretization, finite sample size effects.
Bert Kappen 23
Acrobot
Swing up and stabilize underactuated stochastic double pendulum.
(acrobot.mp4)
Neural network 2 hidden layers, 50 neurons per layer. Input is position and velocity. 2000iterations, with 30000 rollouts per iteration. 100 cores. 15 minutes
Bert Kappen 24
Lavf56.36.100
basic_animation.mp4Media File (video/mp4)
Coordination of UAVs
Centralized path integral solution computed in real time (simulation) for 10 quadrotors. Objectiveis to fly a holding pattern near a fixed location maintaining a minimal velocity and distance to otherdrones. Video at: http://www.snn.ru.nl/bertk/control_theory/PI_quadrotors.mp4
Gomez et al. 2016
Bert Kappen 25
http://www.snn.ru.nl/~bertk/control_theory/PI_quadrotors.mp4
Coordination of UAVs
This behavior was replicated on real quadrotors demonstrating high dimensionalnon-linear stochastic optimal control in real-time.
Chao Xu ACC 2017
Bert Kappen 26
Control Inference
Fleming, Mitter
Bert Kappen 27
Infering neural activity from BOLD fMRI
Subjects were asked to respond as fast as possible to a visual or auditory stimulusMeasure fMRI BOLD response in motor cortex
Reconstruct neural signal from BOLD signal.
Neural activity dZt = AZtdt + zdWt
HemodynamicsVasodilation s(z, s, f )Blood flow f (s)
Balloon modelBlood volume v( f , v)Deoxygenated hemoglobin q( f , v, q)
BOLD y(v, q)
Bert Kappen 28
Infering neural activity from BOLD fMRI
Subjects were asked to respond as fast as possible to a visual or auditory stimulusMeasure fMRI BOLD response in motor cortex
Reconstruct neural signal from BOLD signal.
Neural activity dZt = AZtdt + zdWt
HemodynamicsVasodilation s(z, s, f )Blood flow f (s)
Balloon modelBlood volume v( f , v)Deoxygenated hemoglobin q( f , v, q)
BOLD y(v, q)
Bert Kappen 29
Infering neural activity from BOLD fMRI
Subjects were asked to respond as fast as possible to a visual or auditory stimulusMeasure fMRI BOLD response in motor cortex
Reconstruct neural signal from BOLD signal.
Neural activity dZt = AZtdt + zdWt
HemodynamicsVasodilation s(z, s, f )Blood flow f (s)
Balloon modelBlood volume v( f , v)Deoxygenated hemoglobin q( f , v, q)
BOLD y(v, q)
Bert Kappen 30
Infering neural activity from BOLD fMRI
Neural network with 8 layers and 50 units per layer. 45 learning iterations. 300.000 MC samplesper iteration. 500 cores. CPU time 3100 sec.
Bert Kappen 31
Infering neural activity from BOLD fMRI
Bert Kappen 32
Thank you!
Kappen, Hilbert Johan, and Hans Christian Ruiz. Adaptive importance sampling for control andinference. Journal of Statistical Physics 162.5 (2016): 1244-1266.
Bert Kappen 33