MC2019 Part 2 Optimal control for stochastic systems. Hirokazu Tanaka



Page 1

MC2019 Part 2: Optimal control for stochastic systems.

Hirokazu Tanaka

Page 2

Optimal control for stochastic systems.

In this lecture, we will learn:
• Feedforward and feedback control
• Trial-by-trial variability
• Signal-dependent noise
• Dynamic programming
• Bellman's optimality equation
• Linear-quadratic-Gaussian (LQG) control
• Optimal feedback control (OFC) model
• Human psychophysics

Page 3

Straight paths and bell-shaped velocity in reaching.

Morasso (1981) Exp Brain Res

[Figure: point-to-point reaching movements between six targets (1-6). For movements T1→T4, T3→T5, T2→T5, and T1→T5, the panels show joint angles, angular velocities, angular accelerations, and hand speed; hand paths are roughly straight and the hand-speed profiles are bell-shaped.]

Page 4

Power laws for curved movements.

Lacquaniti et al. (1983) Acta Psychologica

$$\dot{\theta} \propto \kappa^{2/3} \qquad \text{or} \qquad v \propto \kappa^{-1/3}$$

where $\dot{\theta}$ is the angular velocity, $v$ the hand speed, and $\kappa$ the curvature.

[Figure: log-log plot of angular velocity against curvature, illustrating the two-thirds power law.]

Page 5

Fitts’ law for movement duration in rapid pointing movements.

Fitts (1954) J Exp Psychol

$$t_f = a + b \log_2\!\left(\frac{2D}{W}\right)$$

where $D$ is the movement distance and $W$ the target width.
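As a quick numerical illustration of the formula, the following Matlab snippet computes predicted movement times for a few target distances; the coefficients a and b below are assumed values for illustration, not the fits reported by Fitts (1954).

% Fitts' law: predicted movement time t_f = a + b*log2(2D/W)
a = 0.05;                       % assumed intercept (s)
b = 0.10;                       % assumed slope (s per bit)
D = [0.05 0.10 0.20 0.40];      % movement distances (m)
W = 0.01;                       % target width (m)
ID = log2(2*D/W);               % index of difficulty (bits)
tf = a + b*ID;                  % predicted movement times (s)
disp([D(:) ID(:) tf(:)]);       % doubling the distance adds one bit, hence b seconds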

Page 6

“Main sequence” for saccadic movements

Bahill et al. (1975) Math Biosci

Page 7

Optimality principle: how a unique trajectory is chosen.

http://en.wikipedia.org/wiki/Snell%27s_law

Snell's law:

$$\frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2} = \frac{1/n_1}{1/n_2}$$

Fermat's principle of least time: light takes the path between P and Q that minimizes the travel time,

$$T = \int_P^Q dt = \int_P^Q \frac{ds}{v(s)} = \frac{1}{c}\int_P^Q n(s)\,ds, \qquad \delta T = \frac{1}{c}\,\delta\!\int_P^Q n(s)\,ds = 0$$

Page 8

Feedforward and feedback control.

Feedforward control as a function of the time step:

$$x_{k+1} = A x_k + B u_k, \qquad u_k = u_k(k)$$

Feedback control as a function of the state:

$$u_k = u_k(x_k)$$

For a deterministic system, feedforward and feedback control are equivalent. On the other hand, for a stochastic system, feedforward and feedback control are NOT equivalent.
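This difference can be checked numerically. The following Matlab sketch simulates a simple scalar system with additive noise under a precomputed feedforward sequence and under a state-feedback law; the system, the gains, and the noise level are assumed for illustration only.

% feedforward vs. feedback control of a noisy scalar system x_{k+1} = x_k + u_k + w_k
A = 1; B = 1; N = 50; sigma = 0.05; target = 1; nTrial = 1000;
endFF = zeros(nTrial,1); endFB = zeros(nTrial,1);
for trial = 1:nTrial
    xff = 0; xfb = 0;
    for k = 1:N
        w = sigma*randn;                 % same noise draw for both controllers (paired comparison)
        uff = target/N;                  % feedforward: depends only on the time step
        ufb = 0.2*(target - xfb);        % feedback: depends on the current state
        xff = A*xff + B*uff + w;
        xfb = A*xfb + B*ufb + w;
    end
    endFF(trial) = xff; endFB(trial) = xfb;
end
fprintf('endpoint std: feedforward %.3f, feedback %.3f\n', std(endFF), std(endFB));
% the feedback controller corrects the accumulated noise, giving much smaller endpoint variability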

Page 9

Signal-dependent noise: trial-by-trial variability.

van Beers et al. (2004) J Neurophysiol;Schmidt et al. (1979) Psychol Rev; Jones et al. (2002) J Neurophysiol

[Figure: variability of reaching endpoints (left) and variability of force production (right).]

$$u^{\mathrm{noise}} = (1+\xi)\,u, \qquad \xi \sim \mathcal{N}(0,1)$$

$$\mathrm{E}\!\left[u^{\mathrm{noise}}\right] = u, \qquad \mathrm{Cov}\!\left[u^{\mathrm{noise}}\right] = u\,u^{\mathrm T} \qquad \text{(signal-dependent noise)}$$
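A quick Monte Carlo check of these moments in Matlab (the example control vector u below is arbitrary):

% verify E[u_noise] = u and Cov[u_noise] = u*u' for u_noise = (1+xi)*u, xi ~ N(0,1)
u = [2; 5];                          % an arbitrary control vector
xi = randn(1, 1e5);                  % one noise sample per trial
Unoise = (1 + xi) .* u;              % 2 x 1e5 matrix of noisy controls (implicit expansion)
disp(mean(Unoise, 2));               % approximately u
disp(cov(Unoise'));                  % approximately u*u' = [4 10; 10 25]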

Page 10

Minimum-variance control predicts smooth trajectory.

Harris & Wolpert (1998) Nature

The state dynamics with signal-dependent noise:

$$x_{n+1} = A x_n + B u_n (1 + \xi_n)$$

Iterating the dynamics from the initial state x_0:

$$x_n = A x_{n-1} + B u_{n-1}(1+\xi_{n-1}) = A^2 x_{n-2} + A B u_{n-2}(1+\xi_{n-2}) + B u_{n-1}(1+\xi_{n-1}) = \cdots = A^n x_0 + \sum_{k=0}^{n-1} A^{n-k-1} B u_k (1+\xi_k)$$

Taking the mean and the covariance over the noise:

$$\mathrm{E}[x_n] = A^n x_0 + \sum_{k=0}^{n-1} A^{n-k-1} B u_k$$

$$\mathrm{Cov}[x_n] = \sum_{k=0}^{n-1} A^{n-k-1} B\, u_k u_k^{\mathrm T} B^{\mathrm T} (A^{\mathrm T})^{n-k-1}$$
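The closed-form mean and covariance above can be verified against a Monte Carlo simulation; the two-state system and the control sequence in this Matlab sketch are assumed for illustration.

% compare the analytic mean/covariance of x_n with Monte Carlo under signal-dependent noise
A = [1 0.01; 0 1]; B = [0; 0.01]; n = 50; x0 = [0; 0];
u = sin(linspace(0, pi, n));                     % an arbitrary control sequence
Ex = A^n * x0;  Cx = zeros(2);                   % closed-form mean and covariance
for k = 0:n-1
    Ex = Ex + A^(n-k-1)*B*u(k+1);
    Cx = Cx + A^(n-k-1)*B*u(k+1)^2*B'*A'^(n-k-1);
end
m = 1e4; X = zeros(2, m);                        % Monte Carlo simulation
for t = 1:m
    x = x0;
    for k = 0:n-1
        x = A*x + B*u(k+1)*(1 + randn);          % xi_k ~ N(0,1)
    end
    X(:,t) = x;
end
disp([Ex mean(X,2)]);                            % means agree
disp(Cx); disp(cov(X'));                         % covariances agree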

Page 11

Minimum-variance control predicts smooth trajectory.

Harris & Wolpert (1998) Nature

The expected state is required to stay at the target x_f throughout the post-movement period:

$$\mathrm{E}[x_n] = A^n x_0 + \sum_{k=0}^{n-1} A^{n-k-1} B u_k = x_f, \qquad t_f \le n \le t_f + T$$

The quantity to be minimized is the positional variance accumulated over the post-movement period:

$$\sum_{n=t_f}^{t_f+T} \mathrm{Cov}[x_n] = \sum_{n=t_f}^{t_f+T}\, \sum_{k=0}^{n-1} A^{n-k-1} B\, u_k u_k^{\mathrm T} B^{\mathrm T} (A^{\mathrm T})^{n-k-1}$$

Minimize the variance of the final position (quadratic with respect to u) under these constraints (T+1 constraints with respect to u). This problem can be solved with quadratic programming (the quadprog command).

Page 12

Minimum-variance control predicts smooth trajectory.

Harris & Wolpert (1998) Nature

[Figure: minimum-variance model predictions of saccade velocity profiles and of reaching and drawing trajectories.]

Page 13

Matlab code for the minimum-variance model.

Harris & Wolpert (1998) Nature

t1 = 224/1000;   % time constant of eye dynamics
t2 = 13/1000;    % another time constant of eye dynamics
tm = 10/1000;    % third time constant
dt = 1/1000;     % simulation time step
tf = 50/1000;    % movement duration
tp = 20/1000;    % post-movement duration
K = tf/dt;
L = tp/dt;

x0 = [0; 0; 0];  % initial state
xf = [10; 0; 0]; % final state

Ac = [0 1 0; ...
      0 0 1; ...
      -1/(t1*t2*tm) -1/(t1*t2)-1/(t1*tm)-1/(t2*tm) -1/t1-1/t2-1/tm];
Bc = [0; 0; 1];
A = expm(Ac*dt);
B = inv(Ac)*(eye(3)-expm(Ac*dt))*Bc;

Q = zeros(K+L);

% calculation of Q
for ell=0:K+L-1
    if ell<K
        for k=K:K+L-1
            tmpQ = A^(k-ell-1)*B*B'*A'^(k-ell-1);
            Q(ell+1, ell+1) = Q(ell+1, ell+1) + tmpQ(1,1);
        end
    else
        for k=ell+1:K+L-1
            tmpQ = A^(k-ell-1)*B*B'*A'^(k-ell-1);
            Q(ell+1, ell+1) = Q(ell+1, ell+1) + tmpQ(1,1);
        end
    end
end

% calculation of C
C = [];
for p=1:L+1
    Ctmp = [];
    for q=1:K+L
        if K-1-(q-1)+(p-1)>0
            Ctmp = [Ctmp A^(K-1-(q-1)+(p-1))*B];
        elseif K-1-(q-1)+(p-1)==0
            Ctmp = [Ctmp B];
        else
            Ctmp = [Ctmp zeros(3,1)];
        end
    end
    C = [C; Ctmp];
end

% calculation of d
d = [];
for ell=0:L
    d = [d; xf-A^(K+ell)*x0];
end

% solution by quadratic programming
u = quadprog(Q, [], [], [], C, d);

% forward solution
x = zeros(3, K+L);
x(:,1) = x0;
for k=1:K+L-1
    x(:,k+1) = A*x(:,k) + B*u(k);
end

figure(1);
subplot(211); hold on; plot(x(1,:), 'k', 'linewidth', 2);
set(gca, 'plotboxaspectratio', [1.5 1 1]);
xlabel('time (ms)'); ylabel('eye position (deg)');
subplot(212); hold on; plot(x(2,:), 'k', 'linewidth', 2);
set(gca, 'plotboxaspectratio', [1.5 1 1]);
xlabel('time (ms)'); ylabel('eye velocity (deg/s)');

Page 14

Matlab code for the minimum-variance model.

Harris & Wolpert (1998) Nature

Page 15

Minimum-time control predicts Fitts’ law and main sequence.

Tanaka et al. (2006) J Neurophysiol

[Figure: simulated trajectories for movement times of 300 ms (faster but less accurate), 500 ms, and 700 ms (more precise but slower); the speed-accuracy trade-off determines a unique movement time.]

The cost functional combines the movement time with the final-variance and final-position constraints through the multipliers λ and μ(t):

$$V = t_f + t_p + \lambda\left\{\frac{1}{t_p}\int_{t_f}^{t_f+t_p}\mathrm{Var}\!\left[x(t)\right]dt - S\right\} + \int_{t_f}^{t_f+t_p}\boldsymbol{\mu}(t)^{\mathrm T}\left(\mathrm{E}\!\left[\mathbf{x}(t)\right] - \mathbf{x}_f\right)dt$$

The first term is the minimization of movement time, the second is the final variance constraint, and the third is the final position constraint.

Page 16

Minimum-time control predicts Fitts’ law and main sequence.

Tanaka et al. (2006) J Neurophysiol

[Figure: model versus experiment for Fitts' law and for the saccadic main sequence.]

Page 17

Optimal control: Bellman’s optimality equation.

Deterministic (i.e., noiseless) dynamics:

$$x_{k+1} = A x_k + B u_k$$

Cost function:

$$J(u_0, u_1, \ldots, u_{N-1}; x_0) = \sum_{k=0}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N$$

The first term penalizes error during the movement, the second is the control cost, and the third penalizes the endpoint error. Find the control signals {u0, u1, …, uN-1} that minimize the cost function.

Page 18

Optimal control: Pontryagin’s minimum principle.

[Figure: a state trajectory x0, x1, ..., xN over time steps 0 to N, with control signals u0, u1, ..., uN-1 applied between successive states.]

$$J(u_0, u_1, \ldots, u_{N-1}; x_0) = \sum_{k=0}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N$$

The control signals {u0, u1, …, uN-1} are optimized as a whole.

Page 19

Optimal control: Bellman’s optimality equation.

[Figure: the same state-trajectory diagram, highlighting the successive steps n, n+1, n+2.]

(cost at state x_n at step n) = (cost of moving from x_n to x_{n+1}) + (cost at state x_{n+1} at step n+1), i.e.,

$$V_n(x_n) = \left(\text{cost of moving from } x_n \text{ to } x_{n+1}\right) + V_{n+1}(x_{n+1})$$

Page 20

Optimal control: Bellman’s optimality equation.

The optimal cost-to-go at step n is defined as

$$V_n(x_n) = \min_{u_n, \ldots, u_{N-1}} \left[\sum_{k=n}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N\right]$$

Separating the cost of step n from the cost of the remaining steps,

$$V_n(x_n) = \min_{u_n, \ldots, u_{N-1}}\left[\frac{1}{2}\,x_n^{\mathrm T} Q_n x_n + \frac{1}{2}\,u_n^{\mathrm T} R u_n + \sum_{k=n+1}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N\right]$$

$$= \min_{u_n}\left[\frac{1}{2}\,x_n^{\mathrm T} Q_n x_n + \frac{1}{2}\,u_n^{\mathrm T} R u_n + \min_{u_{n+1}, \ldots, u_{N-1}}\left\{\sum_{k=n+1}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N\right\}\right]$$

The inner minimization is V_{n+1}(x_{n+1}), which yields Bellman's optimality equation:

$$V_n(x_n) = \min_{u_n}\left[\frac{1}{2}\,x_n^{\mathrm T} Q_n x_n + \frac{1}{2}\,u_n^{\mathrm T} R u_n + V_{n+1}(x_{n+1})\right]$$
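The backward recursion can be made concrete with a small dynamic-programming example. The toy problem in the Matlab sketch below (a point moving on a line of discrete positions, with an assumed stage cost and terminal cost) is not from the lecture; it only illustrates how V_n is computed backward from V_N and how the minimizing action defines the policy.

% Bellman backward recursion on a toy deterministic problem
nS = 11; target = 9; N = 10; actions = [-1 0 1];
stageCost = @(s,a) 0.1*(s-target)^2 + 0.5*abs(a);     % assumed stage cost
V = 10*((1:nS) - target).^2;                          % terminal cost V_N(s)
policy = zeros(N, nS);
for n = N:-1:1                                        % backward in time
    Vnew = zeros(1, nS);
    for s = 1:nS
        best = inf;
        for a = actions
            s2 = min(max(s+a, 1), nS);                % next state, clipped to the grid
            c  = stageCost(s,a) + V(s2);              % stage cost + cost-to-go
            if c < best, best = c; policy(n,s) = a; end
        end
        Vnew(s) = best;                               % V_n(s) = minimum over actions
    end
    V = Vnew;
end
s = 1; traj = s;                                      % roll out the optimal policy from state 1
for n = 1:N
    s = min(max(s + policy(n,s), 1), nS);
    traj(end+1) = s;
end
disp(traj)                                            % the state moves toward the target and stays there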

Page 21

Solvable example: Linear-Quadratic-Regulator control.

Assume that the cost-to-go function has a quadratic form:

$$V_{k+1}(x_{k+1}) = \frac{1}{2}\,x_{k+1}^{\mathrm T} S_{k+1} x_{k+1}$$

Substituting this into the Bellman equation gives

$$V_k(x_k) = \min_{u_k}\left[\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k + \frac{1}{2}\,x_{k+1}^{\mathrm T} S_{k+1} x_{k+1}\right] = \min_{u_k}\left[\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k + \frac{1}{2}\left(A x_k + B u_k\right)^{\mathrm T} S_{k+1}\left(A x_k + B u_k\right)\right]$$

Then, by minimizing with respect to u_k (setting the derivative to zero),

$$\frac{\partial}{\partial u_k}\left[\frac{1}{2}\,u_k^{\mathrm T} R u_k + \frac{1}{2}\,u_k^{\mathrm T} B^{\mathrm T} S_{k+1} B u_k + x_k^{\mathrm T} A^{\mathrm T} S_{k+1} B u_k\right] = \left(R + B^{\mathrm T} S_{k+1} B\right) u_k + B^{\mathrm T} S_{k+1} A x_k = 0$$

A feedback control law is obtained:

$$u_k = -\left(R + B^{\mathrm T} S_{k+1} B\right)^{-1} B^{\mathrm T} S_{k+1} A\, x_k = -L_k x_k$$

Page 22

Solvable example: Linear-Quadratic-Regulator control.

Substituting the control law

$$u_k = -\left(R + B^{\mathrm T} S_{k+1} B\right)^{-1} B^{\mathrm T} S_{k+1} A\, x_k = -L_k x_k$$

into the Bellman equation gives

$$V_k(x_k) = \frac{1}{2}\,x_k^{\mathrm T} S_k x_k = \frac{1}{2}\,x_k^{\mathrm T}\left[Q_k + A^{\mathrm T}\left(S_{k+1}^{-1} + B R^{-1} B^{\mathrm T}\right)^{-1} A\right] x_k$$

Therefore a backward recursive equation for the matrix S is obtained:

$$S_k = Q_k + A^{\mathrm T}\left(S_{k+1}^{-1} + B R^{-1} B^{\mathrm T}\right)^{-1} A$$

[Figure: the backward recursion runs from step N down to step 0; S_N = Q_N determines L_{N-1} and S_{N-1}, then L_{N-2} and S_{N-2}, and so on down to S_0 and L_0.]

Page 23

Solvable example: Linear-Quadratic-Regulator control.

Feedback control law:

$$u_k = -L_k x_k, \qquad \text{where} \qquad L_k = \left(R + B^{\mathrm T} S_{k+1} B\right)^{-1} B^{\mathrm T} S_{k+1} A$$

Backward recursion equation for the matrix S:

$$S_k = Q_k + A^{\mathrm T}\left(S_{k+1}^{-1} + B R^{-1} B^{\mathrm T}\right)^{-1} A$$

with the terminal condition

$$S_N = Q_N$$

Therefore, once the matrices {S0, …, SN} and {L0, …, LN-1} are obtained, the optimal feedback control signals {u0, …, uN-1} can be computed.
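The following Matlab sketch implements this backward recursion for a hypothetical discretized point mass; the matrices A, B, Q, Q_N, R, the horizon N, and the initial state are assumed for illustration, and the recursion for S uses the form given above (which assumes S_{k+1} is invertible).

% LQR backward recursion and forward simulation for an assumed point-mass system
dt = 0.01; N = 100;
A  = [1 dt; 0 1];                  % state: position and velocity
B  = [0; dt];                      % control acts on velocity
Q  = zeros(2);                     % no state cost during the movement
QN = diag([1e4 1]);                % endpoint cost on position and velocity
R  = 1e-4;                         % control cost

S = QN;                            % terminal condition S_N = Q_N
L = zeros(1, 2, N);                % feedback gains
for k = N:-1:1
    L(:,:,k) = (R + B'*S*B) \ (B'*S*A);          % gain computed from the current S
    S = Q + A' * inv(inv(S) + B*(R\B')) * A;     % backward recursion for S
end

x = zeros(2, N+1); x(:,1) = [-0.1; 0];           % start 10 cm from the target, at rest
for k = 1:N
    u = -L(:,:,k) * x(:,k);                      % feedback control law u = -L*x
    x(:,k+1) = A*x(:,k) + B*u;
end
plot(0:N, x(1,:), 'k'); xlabel('time step'); ylabel('position (m)');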

Page 24

Deterministic and stochastic optimal control.

Deterministic optimal control:

$$x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k$$

$$J(u_0, u_1, \ldots, u_{N-1}; x_0) = \sum_{k=0}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N$$

Stochastic optimal control:

$$x_{k+1} = A x_k + B u_k + w_k, \qquad y_k = C x_k + v_k$$

$$J(u_0, u_1, \ldots, u_{N-1}; \hat{x}_0, \Sigma_0) = \mathrm{E}_{w,v}\!\left[\sum_{k=0}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N\right]$$

Page 25

Stochastic optimal control combines Kalman filter and feedback control.

Stochastic optimal control:

$$x_{k+1} = A x_k + B u_k + w_k, \qquad y_k = C x_k + v_k$$

$$J(u_0, u_1, \ldots, u_{N-1}; \hat{x}_0, \Sigma_0) = \mathrm{E}_{w,v}\!\left[\sum_{k=0}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N\right]$$

Optimal state estimation (Kalman filter):

$$\hat{x}_{k+1} = A \hat{x}_k + B u_k + K_k\left(y_k - C \hat{x}_k\right)$$

Feedback controller:

$$u_k = -L_k \hat{x}_k$$
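A minimal Matlab sketch of this architecture for the same hypothetical point-mass system used in the LQR sketch above; the noise covariances, the observation matrix, and the cost weights are assumed for illustration, and the finite-horizon Kalman gains are computed by the standard forward Riccati recursion.

% LQG: time-varying Kalman filter + LQR feedback on an assumed point-mass system
dt = 0.01; N = 100;
A = [1 dt; 0 1]; B = [0; dt]; C = [1 0];            % only position is observed
W = 1e-6*eye(2); V = 1e-4;                          % process and observation noise covariances
Q = zeros(2); QN = diag([1e4 1]); R = 1e-4;

S = QN; L = zeros(1,2,N);                           % backward recursion for the feedback gains
for k = N:-1:1
    L(:,:,k) = (R + B'*S*B) \ (B'*S*A);
    S = Q + A' * inv(inv(S) + B*(R\B')) * A;
end

Sig = 1e-4*eye(2); K = zeros(2,1,N);                % forward recursion for the Kalman gains
for k = 1:N
    K(:,:,k) = A*Sig*C' / (C*Sig*C' + V);
    Sig = A*Sig*A' + W - K(:,:,k)*C*Sig*A';
end

x = [-0.1; 0]; xhat = x; X = zeros(2, N+1); X(:,1) = x;
for k = 1:N
    u = -L(:,:,k)*xhat;                             % feedback on the estimated state
    y = C*x + sqrt(V)*randn;                        % noisy observation of the true state
    xhat = A*xhat + B*u + K(:,:,k)*(y - C*xhat);    % Kalman filter update
    x = A*x + B*u + chol(W)'*randn(2,1);            % true dynamics with process noise
    X(:,k+1) = x;
end
plot(0:N, X(1,:), 'k'); xlabel('time step'); ylabel('position (m)');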

Page 26

Optimal feedback control as a model of motor control.

Todorov & Jordan (2002) Nature Neurosci

$$x_{k+1} = A x_k + B u_k + \varepsilon_k\, C u_k, \qquad y_k = H x_k + \omega_k$$

$$J(u_0, u_1, \ldots, u_{N-1}; \hat{x}_0, \Sigma_0) = \mathrm{E}_{\varepsilon,\omega}\!\left[\sum_{k=0}^{N-1}\left(\frac{1}{2}\,x_k^{\mathrm T} Q_k x_k + \frac{1}{2}\,u_k^{\mathrm T} R u_k\right) + \frac{1}{2}\,x_N^{\mathrm T} Q_N x_N\right]$$

The dynamics include signal-dependent control noise (the term with ε_k) and the observation is corrupted by sensory noise ω_k.

Page 27

Optimal feedback control as a model of motor control.

Todorov & Jordan (2002) Nature Neurosci; Todorov (2005) Neural Comput

Kalman filter:

$$\hat{x}_{k+1} = A \hat{x}_k + B u_k + K_k\left(y_k - H \hat{x}_k\right)$$

$$K_k = A \Sigma^{e}_k H^{\mathrm T}\left(H \Sigma^{e}_k H^{\mathrm T} + \Omega^{\omega}\right)^{-1}$$

$$\Sigma^{e}_{k+1} = \Omega^{\xi} + \left(A - K_k H\right)\Sigma^{e}_k A^{\mathrm T} + C L_k \Sigma^{\hat{x}}_k L_k^{\mathrm T} C^{\mathrm T}$$

$$\Sigma^{\hat{x}}_{k+1} = K_k H \Sigma^{e}_k A^{\mathrm T} + \left(A - B L_k\right)\Sigma^{\hat{x}}_k\left(A - B L_k\right)^{\mathrm T}, \qquad \Sigma^{\hat{x}}_1 = \hat{x}_1 \hat{x}_1^{\mathrm T}$$

Feedback control:

$$u_k = -L_k \hat{x}_k$$

$$L_k = \left(R + B^{\mathrm T} S^{x}_{k+1} B + C^{\mathrm T}\left(S^{x}_{k+1} + S^{e}_{k+1}\right)C\right)^{-1} B^{\mathrm T} S^{x}_{k+1} A$$

$$S^{x}_k = Q_k + A^{\mathrm T} S^{x}_{k+1}\left(A - B L_k\right), \qquad S^{x}_N = Q_N$$

$$S^{e}_k = A^{\mathrm T} S^{x}_{k+1} B L_k + \left(A - K_k H\right)^{\mathrm T} S^{e}_{k+1}\left(A - K_k H\right), \qquad S^{e}_N = 0$$

Page 28

Computational neuroanatomy for motor control.

Shadmehr & Krakauer (2008); Haar & Donchin (2019)

[Figure: the components of optimal feedback control mapped onto brain areas. Cerebellum: internal forward model (prediction, x̂_prediction). Parietal cortex: Kalman filter (state estimate x̂_filter). Motor cortex: feedback control (u = −L x̂_filter). Basal ganglia: setting of the cost function. Somatosensory cortex: sensory processing of the observation y. Body and environment: receive the motor command u.]

Page 29

Matlab code: optimal feedback control model.

Todorov & Jordan (2002) Nature Neurosci; Todorov (2005) Neural Comput

for iter = 1:MaxIter

    % initialize covariances
    SiE  = S1;
    SiX  = X1*X1';
    SiXE = zeros(szX,szX);

    % forward pass - recompute Kalman filter
    for k = 1:N-1

        % compute Kalman gain
        temp = SiE + SiX + SiXE + SiXE';
        if size(D,2)==1,
            DSiD = diag(diag(temp).*D.^2);
        else
            DSiD = zeros(szY,szY);
            for i=1:szD
                DSiD = DSiD + D(:,:,i)*temp*D(:,:,i)';
            end;
        end;
        K(:,:,k) = A*SiE*H'*pinv(H*SiE*H'+D0*D0'+DSiD);

        % compute new SiE
        newE = E0*E0' + C0*C0' + (A-K(:,:,k)*H)*SiE*A';
        LSiL = L(:,:,k)*SiX*L(:,:,k)';
        if size(C,2)==1,
            newE = newE + B*diag(diag(LSiL).*C.^2)*B';
        else
            for i=1:szC
                newE = newE + B*C(:,:,i)*LSiL*C(:,:,i)'*B';
            end;
        end;

        % update SiX, SiE, SiXE
        SiX  = E0*E0' + K(:,:,k)*H*SiE*A' + (A-B*L(:,:,k))*SiX*(A-B*L(:,:,k))' + ...
               (A-B*L(:,:,k))*SiXE*H'*K(:,:,k)' + K(:,:,k)*H*SiXE'*(A-B*L(:,:,k))';
        SiE  = newE;
        SiXE = (A-B*L(:,:,k))*SiXE*(A-K(:,:,k)*H)' - E0*E0';
    end;

    % first pass initialization
    if iter==1,
        if Init==0,        % open loop
            K = zeros(szX,szY,N-1);
        elseif Init==2,    % random
            K = randn(szX,szY,N-1);
        end;
    end;

    % initialize optimal cost-to-go function
    Sx = Q(:,:,N);
    Se = zeros(szX,szX);
    Cost(iter) = 0;

    % backward pass - recompute control policy
    for k=N-1:-1:1

        % update Cost
        Cost(iter) = Cost(iter) + trace(Sx*C0*C0') + ...
            trace(Se*(K(:,:,k)*D0*D0'*K(:,:,k)' + E0*E0' + C0*C0'));

        % controller
        temp  = R + B'*Sx*B;
        BSxeB = B'*(Sx+Se)*B;
        if size(C,2)==1,
            temp = temp + diag(diag(BSxeB).*C.^2);
        else
            for i=1:size(C,3)
                temp = temp + C(:,:,i)'*BSxeB*C(:,:,i);
            end;
        end;
        L(:,:,k) = pinv(temp)*B'*Sx*A;

        % compute new Se
        newE = A'*Sx*B*L(:,:,k) + (A-K(:,:,k)*H)'*Se*(A-K(:,:,k)*H);

        % update Sx and Se
        Sx   = Q(:,:,k) + A'*Sx*(A-B*L(:,:,k));
        KSeK = K(:,:,k)'*Se*K(:,:,k);
        if size(D,2)==1,
            Sx = Sx + diag(diag(KSeK).*D.^2);
        else
            for i=1:szD
                Sx = Sx + D(:,:,i)'*KSeK*D(:,:,i);
            end;
        end;
        Se = newE;
    end;

    % adjust cost
    Cost(iter) = Cost(iter) + X1'*Sx*X1 + trace((Se+Sx)*S1);

    % progress bar
    if ~rem(iter,10),
        fprintf('.');
    end;

    % check convergence of Cost
    if (Niter>0 & iter>=Niter) | ...
       (Niter==0 & iter>1 & abs(Cost(iter-1)-Cost(iter))<Eps) | ...
       (Niter==0 & iter>20 & sum(diff(dist(iter-10:iter))>0)>3),
        break;
    end;
end;

Page 30

Matlab code: optimal feedback control model.

Todorov & Jordan (2002) Nature Neurosci; Todorov (2005) Neural Comput

Page 31

Infinite-horizon optimal feedback control.

Phillis (1985) IEEE Trans Auto Contr; Qian et al. (2013) Neural Comput

Stochastic dynamics with signal-dependent and state-dependent noise:

$$dx = \left(A x + B u\right)dt + Y u\,d\beta + G x\,d\gamma + F\,d\omega, \qquad dy = C x\,dt + D\,d\xi$$

Infinite-horizon cost:

$$J = J_1 + J_2 = \lim_{t\to\infty} \mathrm{E}\!\left[x^{\mathrm T} U x\right] + \lim_{t\to\infty} \frac{1}{t}\,\mathrm{E}\!\left[\int_0^{t}\left(x^{\mathrm T} Q x + u^{\mathrm T} R u\right)dt\right]$$

Estimator and controller:

$$d\hat{x} = \left(A \hat{x} + B u\right)dt + K\left(dy - C \hat{x}\,dt\right), \qquad u = -L \hat{x}$$

In infinite-horizon optimal feedback control, the Kalman gain (K) and the feedback gain (L) are time invariant.

Page 32

Bimanual coordination explained by OFC model.

Diedrichsen (2007) Curr Biol

[Figure: model and experiment for the two-cursor and one-cursor bimanual conditions.]

Page 33

Motor adaptation as reoptimization.

Izawa et al. (2008) J Neurosci

[Figure: model and experiment for sigmoidal hand paths and for reaching under force-field uncertainty.]

Page 34

Movement as a real-time decision making process.

Nashed et al. (2014) J Neurosci

Page 35

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

Stochastic dynamics with signal-dependent noise:

$$dx = \left(A x + B u\right)dt + \sum_i B C_i u\,d\eta_i$$

Infinite-horizon cost and linear feedback policy:

$$J = \int_0^{\infty}\left(x^{\mathrm T} Q x + u^{\mathrm T} R u\right)dt, \qquad u = -K x$$

Algebraic Riccati equation:

$$0 = A^{\mathrm T} P + P A + Q - P B R^{-1} B^{\mathrm T} P, \qquad K = R^{-1} B^{\mathrm T} P$$

The Riccati equation for the optimal feedback gain requires explicit knowledge of the forward model (the matrices A and B)!

Page 36

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

Algebraic Riccati equation:

$$0 = A^{\mathrm T} P + P A + Q - P B R^{-1} B^{\mathrm T} P$$

Iterative solution to the Riccati equation:

$$0 = \left(A - B K_k\right)^{\mathrm T} P_k + P_k\left(A - B K_k\right) + Q + K_k^{\mathrm T} R K_k, \qquad K_{k+1} = R^{-1} B^{\mathrm T} P_k$$

Properties of the iteration:
1) A − BK_k is Hurwitz.
2) P* ≤ P_{k+1} ≤ P_k
3) K_k → K* and P_k → P*
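A minimal Matlab sketch of this iteration for an assumed stable second-order system; the Lyapunov equation at each step is solved by vectorization (kron) so that no toolbox is needed, and the initial gain K = 0 is stabilizing because the assumed A is already Hurwitz.

% iterative (Kleinman) solution of the algebraic Riccati equation for an assumed system
A = [0 1; -1 -0.5]; B = [0; 1]; Q = eye(2); R = 1;
K = [0 0];                                   % initial stabilizing policy (A itself is Hurwitz)
I = eye(2);
for k = 1:50
    M = A - B*K;                             % closed-loop matrix
    Ck = Q + K'*R*K;
    % solve the Lyapunov equation M'*P + P*M + Ck = 0 via vectorization
    P = reshape(-(kron(I, M') + kron(M', I)) \ Ck(:), 2, 2);
    P = (P + P')/2;                          % symmetrize against round-off
    Knew = R \ (B'*P);                       % policy improvement K_{k+1} = R^{-1} B' P_k
    if norm(Knew - K) < 1e-10, break; end
    K = Knew;
end
disp(P); disp(K);
% with the Control System Toolbox, [P, ~, K] = care(A, B, Q, R) should give the same result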


Page 38

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

Problem: In determining the optimal policy, the system matrices need to be known:

$$0 = A^{\mathrm T} P + P A + Q - P B R^{-1} B^{\mathrm T} P$$

How can the optimal policy be derived without direct knowledge of the system matrices?

Solution: Note that, along trajectories of the closed-loop system with u = −Kx,

$$x^{\mathrm T}\left(A^{\mathrm T} P + P A + Q - P B R^{-1} B^{\mathrm T} P\right)x = 0 \qquad\Longleftrightarrow\qquad \frac{d}{dt}\left(x^{\mathrm T} P x\right) = -x^{\mathrm T}\left(Q + K^{\mathrm T} R K\right)x$$

Therefore, knowledge of A and B is replaced with knowledge of the measured state x and its increment dx.

Page 39

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

By integrating the identity

$$d\!\left(x^{\mathrm T} P x\right) + x^{\mathrm T}\left(Q + K^{\mathrm T} R K\right)x\,dt = 0$$

over a short interval [t, t+δt]:

$$x(t+\delta t)^{\mathrm T} P\, x(t+\delta t) - x(t)^{\mathrm T} P\, x(t) = -\int_t^{t+\delta t} x^{\mathrm T}\left(Q + K^{\mathrm T} R K\right)x\,dt$$

Therefore, the matrices P and K can be determined from measured state trajectories, without knowing the system dynamics.
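The following Matlab sketch illustrates the idea for an assumed second-order system: P is estimated by least squares from integrals of the quadratic cost along a simulated closed-loop trajectory, using the relation above. The policy-improvement step at the end still uses B, so this is only the data-driven policy-evaluation half of the fully model-free scheme in Jiang & Jiang (2014).

% estimate P for a fixed policy u = -K*x from trajectory data, using
% x(t)'*P*x(t) - x(t+dT)'*P*x(t+dT) = integral of x'*(Q + K'*R*K)*x dt over [t, t+dT]
A = [0 1; -1 -0.5]; B = [0; 1]; Q = eye(2); R = 1;
K = [0.5 0.5];                                   % a stabilizing policy to be evaluated
dt = 1e-3; dT = 0.05; nInt = 60;                 % Euler step, interval length, number of intervals
M = A - B*K; Ck = Q + K'*R*K;
x = [1; -0.5];                                   % initial state
Phi = zeros(nInt, 3); b = zeros(nInt, 1);
for j = 1:nInt
    x0 = x; q = 0;
    for s = 1:round(dT/dt)                       % simulate the closed-loop system, accumulate the cost
        q = q + x'*Ck*x*dt;
        x = x + M*x*dt;                          % Euler step of dx/dt = (A - B*K)*x
    end
    % the left-hand side is linear in the entries of the symmetric matrix P
    Phi(j,:) = [x0(1)^2 - x(1)^2, 2*(x0(1)*x0(2) - x(1)*x(2)), x0(2)^2 - x(2)^2];
    b(j) = q;
end
p = Phi \ b;  P = [p(1) p(2); p(2) p(3)];        % least-squares estimate of P
Knew = R \ (B'*P);                               % policy improvement (still uses B here)
disp(P); disp(Knew);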

Page 40

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

Page 41

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

Page 42

Learning optimal control policy without knowing dynamics.

Jiang & Jiang (2014) Biol Cybern

Page 43

Summary

• Feedforward and feedback control are equivalent for deterministic dynamics but differ for stochastic dynamics.

• Signal-dependent control noise contributes to trial-by-trial motor variability.

• Maximizing movement accuracy under signal-dependent noise reproduces the regularities of human movements.

• Optimal feedback control (OFC) integrates state estimation and motor control in a single computational framework.

• A number of psychophysical experiments can be explained by the OFC model.

Page 44

References

• Schmidt, R. A., Zelaznik, H., Hawkins, B., Frank, J. S., & Quinn Jr, J. T. (1979). Motor-output variability: a theory for the accuracy of rapid motor acts. Psychological review, 86(5), 415.

• van Beers, R. J., Haggard, P., & Wolpert, D. M. (2004). The role of execution noise in movement variability. Journal of Neurophysiology, 91(2), 1050-1063.

• Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor planning. Nature, 394(6695), 780-784.

• Tanaka, H., Krakauer, J. W., & Qian, N. (2006). An optimization principle for determining movement duration. Journal of neurophysiology, 95(6), 3875-3886.

• Todorov, E., & Jordan, M. I. (2002). Optimal feedback control as a theory of motor coordination. Nature neuroscience, 5(11), 1226-1235.

• Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural computation, 17(5), 1084-1108.

• Athans, M. (1967). The matrix minimum principle. Information and control, 11(5), 592-606.

• Phillis, Y. A. (1985). Controller design of systems with multiplicative noise. IEEE Transactions on Automatic Control, 30(10), 1017-1019.

• Qian, N., Jiang, Y., Jiang, Z. P., & Mazzoni, P. (2013). Movement duration, Fitts's law, and an infinite-horizon optimal feedback control model for biological motor systems. Neural computation, 25(3), 697-724.

• Diedrichsen, J. (2007). Optimal task-dependent changes of bimanual feedback control and adaptation. Current Biology, 17(19), 1675-1679.

• Izawa, J., Rane, T., Donchin, O., & Shadmehr, R. (2008). Motor adaptation as a process of reoptimization. The Journal of Neuroscience, 28(11), 2883-2891.

• Nashed, J. Y., Crevecoeur, F., & Scott, S. H. (2014). Rapid online selection between multiple motor plans. The Journal of Neuroscience, 34(5), 1769-1780.

• Jiang, Y., & Jiang, Z. P. (2014). Adaptive dynamic programming as a theory of sensorimotor control. Biological cybernetics, 108(4), 459-473.

Page 45

Exercise

• Write Matlab code for the minimum-variance model for a trajectory with given initial and final positions.

• Write Matlab code for the optimal feedback control model for a trajectory with given initial and final positions.

• Investigate how the minimum-variance model and the optimal feedback model differ.