MC2019 Part 2: Optimal control for stochastic systems.
Hirokazu Tanaka
Optimal control for stochastic systems.
In this lecture, we will learn:
• Feedforward and feedback control
• Trial-by-trial variability
• Signal-dependent noise
• Dynamic programming
• Bellman's optimality equation
• Linear-quadratic-Gaussian (LQG) control
• Optimal feedback control (OFC) model
• Human psychophysics
Straight paths and bell-shaped velocity in reaching.
Morasso (1981) Exp Brain Res
[Figure: point-to-point reaching between targets 1–6; panels show joint angles, angular velocity, angular acceleration, and hand speed for movements T1->T4, T3->T5, T2->T5, and T1->T5. Hand paths are roughly straight and hand-speed profiles are bell-shaped.]
Power laws for curved movements.
Lacquaniti et al. (1983) Acta Psychologica
The two-thirds power law relates angular velocity and curvature (equivalently, speed and curvature):
$$ \dot\theta \propto \kappa^{2/3}, \qquad v \propto \kappa^{-1/3}, $$
where $\dot\theta$ is the angular velocity, $v$ the hand speed, and $\kappa$ the path curvature. Since $\dot\theta = v\kappa$, the two forms are equivalent.
[Figure: log-log plot of angular velocity versus curvature.]
Fitts’ law for movement duration in rapid pointing movements.
Fitts (1954) J Exp Psychol
$$ t_f = a + b\,\log_2\!\frac{2D}{W}, $$
where $D$ is the distance to the target and $W$ is the target width.
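As a quick check of the formula, a hypothetical pointing task with assumed values D = 20 cm and W = 2.5 cm gives
$$ t_f = a + b\,\log_2\!\frac{2\times 20}{2.5} = a + 4b, $$
so halving the target width W adds a fixed increment b to the predicted movement time.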
"Main sequence" for saccadic movements.
Bahill et al. (1975) Math Biosci
Optimality principle: how a unique trajectory is chosen.
http://en.wikipedia.org/wiki/Snell%27s_law
Snell's law:
$$ \frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2} = \frac{n_2}{n_1} $$
Fermat's principle of least time: the travel time from P to Q,
$$ T = \int_0^T dt = \int_P^Q \frac{ds}{v(s)} = \frac{1}{c}\int_P^Q n(s)\,ds, $$
is stationary with respect to variations of the path:
$$ \delta T = \frac{1}{c}\,\delta\!\int_P^Q n(s)\,ds = 0. $$
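As a minimal worked sketch of how the stationarity condition yields Snell's law, consider light traveling from P = (0, h1) to Q = (d, -h2), crossing the interface y = 0 at horizontal position x (assumed geometry, not from the slide):
$$ T(x) = \frac{\sqrt{h_1^2+x^2}}{v_1} + \frac{\sqrt{h_2^2+(d-x)^2}}{v_2}, \qquad \frac{dT}{dx} = \frac{\sin\theta_1}{v_1} - \frac{\sin\theta_2}{v_2} = 0 \;\Rightarrow\; \frac{\sin\theta_1}{\sin\theta_2} = \frac{v_1}{v_2}. $$
The unique refracted path is the one for which the travel time is stationary, which is the sense in which an optimality principle selects a unique trajectory.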
Feedforward and feedback control.
Consider the linear dynamics
$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k. $$
Feedforward control is a function of the time step:
$$ \mathbf{u}_k = \mathbf{u}_k(k). $$
Feedback control is a function of the state:
$$ \mathbf{u}_k = \mathbf{u}_k(\mathbf{x}_k). $$
For a deterministic system, feedforward and feedback control are equivalent. On the other hand, for a stochastic system, feedforward and feedback control are NOT equivalent.
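A minimal sketch of this difference, assuming a scalar integrator with additive noise (all numerical values are illustrative, not from the lecture): with the noise switched off, a pre-planned schedule and a proportional feedback law both reach the goal, but only the feedback law corrects for noise.

% Feedforward vs feedback control of a noisy scalar integrator (illustrative).
A = 1; B = 1; N = 50; xf = 1; sigma = 0.05;   % assumed system and noise level
rng(0);
x_ff = 0; x_fb = 0;
for k = 1:N
    w = sigma*randn;                  % additive noise
    u_ff = xf/N;                      % feedforward: pre-planned, independent of state
    u_fb = 0.2*(xf - x_fb(end));      % feedback: depends on the current state
    x_ff(end+1) = A*x_ff(end) + B*u_ff + w;
    x_fb(end+1) = A*x_fb(end) + B*u_fb + w;
end
fprintf('endpoint error  feedforward: %.3f   feedback: %.3f\n', ...
        abs(x_ff(end)-xf), abs(x_fb(end)-xf));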
Signal-dependent noise: trial-by-trial variability.
van Beers et al. (2004) J Neurophysiol; Schmidt et al. (1979) Psychol Rev; Jones et al. (2002) J Neurophysiol
Variability in reaching endpoints Variability in force production
$$ \mathbf{u}^{\text{noise}} = \mathbf{u}\,(1+\xi), \qquad \xi \sim \mathcal{N}(0,1) $$
$$ \mathrm{E}\!\left[\mathbf{u}^{\text{noise}}\right] = \mathbf{u}, \qquad \mathrm{Cov}\!\left[\mathbf{u}^{\text{noise}}\right] = \mathbf{u}\mathbf{u}^{\mathrm T} \quad \text{(signal-dependent noise)} $$
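A small Monte-Carlo check of the noise model (using an assumed noise scale sigma instead of the unit variance on the slide): the mean of the noisy command equals the command, and its standard deviation grows in proportion to the command size.

% Monte-Carlo check of signal-dependent noise: std grows with the command size.
sigma = 0.2;                       % assumed noise scale
for u = [1 2 4]                    % command magnitudes
    xi = sigma*randn(1e5, 1);      % zero-mean multiplicative noise
    u_noise = u*(1 + xi);
    fprintf('u = %g : mean = %.3f, std = %.3f (prediction %.3f)\n', ...
            u, mean(u_noise), std(u_noise), sigma*abs(u));
end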
Minimum-variance control predicts smooth trajectory.
Harris & Wolpert (1998) Nature
With signal-dependent noise the dynamics become
$$ \mathbf{x}_{n+1} = \mathbf{A}\mathbf{x}_n + \mathbf{B}\mathbf{u}_n\,(1+\xi_n). $$
Unrolling the recursion from the initial state $\mathbf{x}_0$,
$$ \mathbf{x}_n = \mathbf{A}\mathbf{x}_{n-1} + \mathbf{B}\mathbf{u}_{n-1}(1+\xi_{n-1}) = \mathbf{A}^2\mathbf{x}_{n-2} + \mathbf{A}\mathbf{B}\mathbf{u}_{n-2}(1+\xi_{n-2}) + \mathbf{B}\mathbf{u}_{n-1}(1+\xi_{n-1}) = \cdots = \mathbf{A}^n\mathbf{x}_0 + \sum_{k=0}^{n-1}\mathbf{A}^{n-1-k}\mathbf{B}\mathbf{u}_k(1+\xi_k), $$
so the mean and the covariance of the state are
$$ \mathrm{E}\left[\mathbf{x}_n\right] = \mathbf{A}^n\mathbf{x}_0 + \sum_{k=0}^{n-1}\mathbf{A}^{n-1-k}\mathbf{B}\mathbf{u}_k, $$
$$ \mathrm{Cov}\left[\mathbf{x}_n\right] = \sum_{k=0}^{n-1}\mathbf{A}^{n-1-k}\mathbf{B}\mathbf{u}_k\mathbf{u}_k^{\mathrm T}\mathbf{B}^{\mathrm T}\left(\mathbf{A}^{n-1-k}\right)^{\mathrm T}. $$
Minimum-variance control predicts smooth trajectory.
Harris & Wolpert (1998) Nature
The mean state is constrained to stay on the target throughout the post-movement period,
$$ \mathrm{E}\left[\mathbf{x}_n\right] = \mathbf{A}^n\mathbf{x}_0 + \sum_{k=0}^{n-1}\mathbf{A}^{n-1-k}\mathbf{B}\mathbf{u}_k = \mathbf{x}_f \qquad (t_f \le n \le t_f + T), $$
and the positional variance accumulated over the post-movement period,
$$ \sum_{n=t_f+1}^{t_f+T}\mathrm{Cov}\left[\mathbf{x}_n\right] = \sum_{n=t_f+1}^{t_f+T}\sum_{k=0}^{n-1}\mathbf{A}^{n-1-k}\mathbf{B}\mathbf{u}_k\mathbf{u}_k^{\mathrm T}\mathbf{B}^{\mathrm T}\left(\mathbf{A}^{n-1-k}\right)^{\mathrm T}, $$
is minimized.
Minimize the variance of the final position (quadratic in u) under the constraints (T+1 linear constraints in u).
This problem can be solved with quadratic programming (the quadprog command).
Minimum-variance control predicts smooth trajectory.
Harris & Wolpert (1998) Nature
[Figure: predicted and observed saccade velocity profiles; reaching and drawing trajectories.]
Matlab code for the minimum-variance model.
Harris & Wolpert (1998) Nature
t1 = 224/1000;   % time constant of eye dynamics
t2 = 13/1000;    % another time constant of eye dynamics
tm = 10/1000;    % third time constant
dt = 1/1000;     % simulation time step

tf = 50/1000;    % movement duration
tp = 20/1000;    % post-movement duration
K = tf/dt;
L = tp/dt;

x0 = [0; 0; 0];  % initial state
xf = [10; 0; 0]; % final state

Ac = [0 1 0; ...
      0 0 1; ...
      -1/(t1*t2*tm) -1/(t1*t2)-1/(t1*tm)-1/(t2*tm) -1/t1-1/t2-1/tm];
Bc = [0; 0; 1];
A = expm(Ac*dt);
B = inv(Ac)*(expm(Ac*dt)-eye(3))*Bc;   % zero-order-hold discretization

Q = zeros(K+L);

% calculation of Q
for ell=0:K+L-1
    if ell<K
        for k=K:K+L-1
            tmpQ = A^(k-ell-1)*B*B'*A'^(k-ell-1);
            Q(ell+1, ell+1) = Q(ell+1, ell+1) + tmpQ(1,1);
        end
    else
        for k=ell+1:K+L-1
            tmpQ = A^(k-ell-1)*B*B'*A'^(k-ell-1);
            Q(ell+1, ell+1) = Q(ell+1, ell+1) + tmpQ(1,1);
        end
    end
end

% calculation of C
C = [];
for p=1:L+1
    Ctmp = [];
    for q=1:K+L
        if K-1-(q-1)+(p-1)>0
            Ctmp = [Ctmp A^(K-1-(q-1)+(p-1))*B];
        elseif K-1-(q-1)+(p-1)==0
            Ctmp = [Ctmp B];
        else
            Ctmp = [Ctmp zeros(3,1)];
        end
    end
    C = [C; Ctmp];
end

% calculation of d
d = [];
for ell=0:L
    d = [d; xf-A^(K+ell)*x0];
end

% solution by quadratic programming
u = quadprog(Q, [], [], [], C, d);

% forward solution
x = zeros(3, K+L);
x(:,1) = x0;
for k=1:K+L-1
    x(:,k+1) = A*x(:,k) + B*u(k);
end

figure(1);
subplot(211); hold on;
plot(x(1,:), 'k', 'linewidth', 2);
set(gca, 'plotboxaspectratio', [1.5 1 1]);
xlabel('time (ms)'); ylabel('eye position (deg)');
subplot(212); hold on;
plot(x(2,:), 'k', 'linewidth', 2);
set(gca, 'plotboxaspectratio', [1.5 1 1]);
xlabel('time (ms)'); ylabel('eye velocity (deg/s)');
Matlab code for the minimum-variance model.
Harris & Wolpert (1998) Nature
Minimum-time control predicts Fitts’ law and main sequence.
Tanaka et al. (2006) J Neurophysiol
[Figure: example movements with durations of 300 ms, 500 ms, and 700 ms; faster movements are less accurate, slower movements are more precise, and a unique movement time can be determined from this trade-off.]
The movement time $t_f$ itself is minimized, subject to a final-variance constraint and a final-position constraint imposed over the post-movement period $[t_f,\ t_f+t_p]$ (with multipliers $\lambda$ and $\boldsymbol{\mu}(t)$):
$$ J\left[\{\mathbf{u}(t)\},\,t_f\right] = t_f + \lambda\left(\frac{1}{t_p}\int_{t_f}^{t_f+t_p}\mathrm{Var}\left[\mathbf{x}(t)\right]dt - S\right) + \int_{t_f}^{t_f+t_p}\boldsymbol{\mu}(t)^{\mathrm T}\left(\mathrm{E}\left[\mathbf{x}(t)\right]-\mathbf{x}_f\right)dt, $$
where the first term is the minimization of movement time, the second term is the final variance constraint, and the third term is the final position constraint.
Minimum-time control predicts Fitts’ law and main sequence.
Tanaka et al. (2006) J Neurophysiol
[Figure: model versus experiment for Fitts' law and for the saccadic main sequence.]
Optimal control: Bellman’s optimality equation.
Deterministic (i.e., noiseless) dynamics:
$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k $$
Cost function (error during movement + control cost + endpoint error):
$$ J\left(\mathbf{u}_0,\mathbf{u}_1,\ldots,\mathbf{u}_{N-1};\mathbf{x}_0\right) = \sum_{k=0}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N $$
Find the control signals {u0, u1, …, uN-1} that minimize the cost function.
Optimal control: Pontryagin’s minimum principle.
[Diagram: state x versus time step 0, 1, 2, 3, …, N-3, N-2, N-1, N; the control signals u0, u1, …, uN-1 drive the state through x0, x1, …, xN.]
$$ J\left(\mathbf{u}_0,\mathbf{u}_1,\ldots,\mathbf{u}_{N-1};\mathbf{x}_0\right) = \sum_{k=0}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N $$
The control signals {u0, u1, …, uN-1} are optimized as a whole.
Optimal control: Bellman’s optimality equation.
[Diagram: the same state trajectory, viewed locally around steps n, n+1, n+2.]
(cost-to-go $V_n(\mathbf{x}_n)$ at state $\mathbf{x}_n$ at step $n$) = (cost of moving from $\mathbf{x}_n$ to $\mathbf{x}_{n+1}$) + (cost-to-go $V_{n+1}(\mathbf{x}_{n+1})$ at step $n+1$)
Optimal control: Bellman’s optimality equation.
$$ V_n(\mathbf{x}_n) = \min_{\mathbf{u}_n,\ldots,\mathbf{u}_{N-1}}\left[\sum_{k=n}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N\right] $$
$$ = \min_{\mathbf{u}_n}\left[\frac{1}{2}\mathbf{x}_n^{\mathrm T}\mathbf{Q}_n\mathbf{x}_n + \frac{1}{2}\mathbf{u}_n^{\mathrm T}\mathbf{R}\mathbf{u}_n + \min_{\mathbf{u}_{n+1},\ldots,\mathbf{u}_{N-1}}\left(\sum_{k=n+1}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N\right)\right] $$
$$ = \min_{\mathbf{u}_n}\left[\frac{1}{2}\mathbf{x}_n^{\mathrm T}\mathbf{Q}_n\mathbf{x}_n + \frac{1}{2}\mathbf{u}_n^{\mathrm T}\mathbf{R}\mathbf{u}_n + V_{n+1}(\mathbf{x}_{n+1})\right] $$
This recursive relation is Bellman's optimality equation:
$$ V_n(\mathbf{x}_n) = \min_{\mathbf{u}_n}\left[\frac{1}{2}\mathbf{x}_n^{\mathrm T}\mathbf{Q}_n\mathbf{x}_n + \frac{1}{2}\mathbf{u}_n^{\mathrm T}\mathbf{R}\mathbf{u}_n + V_{n+1}(\mathbf{x}_{n+1})\right] $$
Solvable example: Linear-Quadratic-Regulator control.
Assume that the cost-to-go function has a quadratic form:
$$ V_{k+1}(\mathbf{x}_{k+1}) = \frac{1}{2}\mathbf{x}_{k+1}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{x}_{k+1} $$
Substituting this into the Bellman equation gives
$$ V_k(\mathbf{x}_k) = \min_{\mathbf{u}_k}\left[\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k + \frac{1}{2}\mathbf{x}_{k+1}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{x}_{k+1}\right] = \min_{\mathbf{u}_k}\left[\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k + \frac{1}{2}\left(\mathbf{A}\mathbf{x}_k+\mathbf{B}\mathbf{u}_k\right)^{\mathrm T}\mathbf{S}_{k+1}\left(\mathbf{A}\mathbf{x}_k+\mathbf{B}\mathbf{u}_k\right)\right] $$
Then, by minimizing with respect to $\mathbf{u}_k$,
$$ \frac{\partial}{\partial \mathbf{u}_k}\left[\,\cdot\,\right] = \mathbf{R}\mathbf{u}_k + \mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{B}\mathbf{u}_k + \mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{A}\mathbf{x}_k = \left(\mathbf{R}+\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{B}\right)\mathbf{u}_k + \mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{A}\mathbf{x}_k = \mathbf{0}, $$
a feedback control law is obtained:
$$ \mathbf{u}_k = -\left(\mathbf{R}+\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{A}\mathbf{x}_k = -\mathbf{L}_k\mathbf{x}_k $$
Solvable example: Linear-Quadratic-Regulator control.
Substituting the control law
$$ \mathbf{u}_k = -\left(\mathbf{R}+\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{A}\mathbf{x}_k = -\mathbf{L}_k\mathbf{x}_k $$
into the Bellman equation gives
$$ V_k(\mathbf{x}_k) = \frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{S}_k\mathbf{x}_k = \frac{1}{2}\mathbf{x}_k^{\mathrm T}\left[\mathbf{Q}_k + \mathbf{A}^{\mathrm T}\left(\mathbf{S}_{k+1}^{-1}+\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\right)^{-1}\mathbf{A}\right]\mathbf{x}_k. $$
Therefore a backward recursive equation for the matrix S is obtained:
$$ \mathbf{S}_k = \mathbf{Q}_k + \mathbf{A}^{\mathrm T}\left(\mathbf{S}_{k+1}^{-1}+\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\right)^{-1}\mathbf{A} $$
[Diagram: the recursion runs backward in time, S_N → S_{N-1} → … → S_1 → S_0, and each feedback gain L_k is computed from S_{k+1}.]
Solvable example: Linear-Quadratic-Regulator control.
Feedback control law:
$$ \mathbf{u}_k = -\mathbf{L}_k\mathbf{x}_k, $$
where
$$ \mathbf{L}_k = \left(\mathbf{R}+\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{B}\right)^{-1}\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}\mathbf{A}. $$
Backward recursion equation for the matrix S:
$$ \mathbf{S}_k = \mathbf{Q}_k + \mathbf{A}^{\mathrm T}\left(\mathbf{S}_{k+1}^{-1}+\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\right)^{-1}\mathbf{A}, $$
with the terminal condition
$$ \mathbf{S}_N = \mathbf{Q}_N. $$
Therefore, once the matrices {S0, …, SN} and {L0, …, LN-1} are obtained, the optimal feedback control signals {u0, …, uN-1} can be computed.
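A minimal Matlab sketch of the backward recursion and the resulting feedback simulation, written in the algebraically equivalent standard Riccati form S_k = Q_k + A'S_{k+1}A - A'S_{k+1}B(R + B'S_{k+1}B)^{-1}B'S_{k+1}A (the plant and cost matrices below are illustrative placeholders, not from the lecture):

% Finite-horizon LQR: backward Riccati recursion and forward simulation.
A = [1 0.01; 0 1];  B = [0; 0.01];           % assumed discrete-time double integrator
Qk = diag([1 0.1]);  QN = diag([1e4 1]);     % running and terminal state costs
R = 1e-3;  N = 100;

S = QN;  L = zeros(1, 2, N);                 % terminal condition S_N = Q_N
for k = N:-1:1
    L(:,:,k) = (R + B'*S*B) \ (B'*S*A);      % feedback gain L_k (uses S_{k+1})
    S = Qk + A'*S*A - A'*S*B*L(:,:,k);       % backward Riccati step gives S_k
end

x = [1; 0];                                  % initial state away from the target (origin)
for k = 1:N
    u = -L(:,:,k)*x;                         % optimal feedback control u_k = -L_k x_k
    x = A*x + B*u;
end
disp(x');                                    % state has been driven toward the origin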
Deterministic and stochastic optimal control.
Deterministic optimal control:
$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k, \qquad \mathbf{y}_k = \mathbf{C}\mathbf{x}_k $$
$$ J\left(\mathbf{u}_0,\mathbf{u}_1,\ldots,\mathbf{u}_{N-1};\mathbf{x}_0\right) = \sum_{k=0}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N $$
Stochastic optimal control:
$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \mathbf{w}_k, \qquad \mathbf{y}_k = \mathbf{C}\mathbf{x}_k + \mathbf{v}_k $$
$$ J\left(\mathbf{u}_0,\mathbf{u}_1,\ldots,\mathbf{u}_{N-1};\hat{\mathbf{x}}_0,\boldsymbol{\Sigma}_0\right) = \mathrm{E}_{\mathbf{w},\mathbf{v}}\left[\sum_{k=0}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N\right] $$
Stochastic optimal control combines Kalman filter and feedback control.
$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \mathbf{w}_k, \qquad \mathbf{y}_k = \mathbf{C}\mathbf{x}_k + \mathbf{v}_k $$
$$ J\left(\mathbf{u}_0,\mathbf{u}_1,\ldots,\mathbf{u}_{N-1};\hat{\mathbf{x}}_0,\boldsymbol{\Sigma}_0\right) = \mathrm{E}_{\mathbf{w},\mathbf{v}}\left[\sum_{k=0}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N\right] $$
The solution separates into optimal state estimation (Kalman filter),
$$ \hat{\mathbf{x}}_{k+1} = \mathbf{A}\hat{\mathbf{x}}_k + \mathbf{B}\mathbf{u}_k + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{C}\hat{\mathbf{x}}_k\right), $$
and a feedback controller acting on the estimated state,
$$ \mathbf{u}_k = -\mathbf{L}_k\hat{\mathbf{x}}_k. $$
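A minimal sketch of how the two pieces fit together in simulation, assuming additive Gaussian noise and time-invariant gains L and K that have been computed beforehand (for example by the LQR recursion above and a standard Kalman-filter recursion); all matrices and gain values are illustrative, not from the lecture:

% LQG simulation loop: the Kalman-filter estimate feeds the feedback controller.
A = [1 0.01; 0 1];  B = [0; 0.01];  C = [1 0];   % assumed plant and observation
W = 1e-6*eye(2);  V = 1e-4;                      % process / measurement noise covariances
L = [50 10];                                     % feedback gain (assumed precomputed)
K = [0.5; 5];                                    % Kalman gain (assumed precomputed)

x = [1; 0];  xhat = [0; 0];  N = 200;
for k = 1:N
    u    = -L*xhat;                              % control uses the estimate, not the true state
    y    = C*x + sqrt(V)*randn;                  % noisy observation of the true state
    xhat = A*xhat + B*u + K*(y - C*xhat);        % Kalman filter (predictor form)
    x    = A*x + B*u + chol(W,'lower')*randn(2,1);  % true (unobserved) state update
end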
Optimal feedback control as a model of motor control.
Todorov & Jordan (2002) Nature Neurosci
$$ \mathbf{x}_{k+1} = \mathbf{A}\mathbf{x}_k + \mathbf{B}\mathbf{u}_k + \varepsilon_k\,\mathbf{C}\mathbf{u}_k, \qquad \mathbf{y}_k = \mathbf{H}\mathbf{x}_k + \boldsymbol{\omega}_k $$
(the control-dependent term $\varepsilon_k\mathbf{C}\mathbf{u}_k$ implements signal-dependent motor noise)
$$ J\left(\mathbf{u}_0,\mathbf{u}_1,\ldots,\mathbf{u}_{N-1};\hat{\mathbf{x}}_0,\boldsymbol{\Sigma}_0\right) = \mathrm{E}_{\varepsilon,\boldsymbol{\omega}}\left[\sum_{k=0}^{N-1}\left(\frac{1}{2}\mathbf{x}_k^{\mathrm T}\mathbf{Q}_k\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^{\mathrm T}\mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_N^{\mathrm T}\mathbf{Q}_N\mathbf{x}_N\right] $$
Optimal feedback control as a model of motor control.
Todorov & Jordan (2002) Nature Neurosci; Todorov (2005) Neural Comput
Kalman filter:
$$ \hat{\mathbf{x}}_{k+1} = \mathbf{A}\hat{\mathbf{x}}_k + \mathbf{B}\mathbf{u}_k + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{H}\hat{\mathbf{x}}_k\right) $$
$$ \mathbf{K}_k = \mathbf{A}\boldsymbol{\Sigma}_k^{\mathbf{e}}\mathbf{H}^{\mathrm T}\left(\mathbf{H}\boldsymbol{\Sigma}_k^{\mathbf{e}}\mathbf{H}^{\mathrm T} + \boldsymbol{\Omega}^{\boldsymbol{\omega}}\right)^{-1} $$
$$ \boldsymbol{\Sigma}_{k+1}^{\mathbf{e}} = \left(\mathbf{A}-\mathbf{K}_k\mathbf{H}\right)\boldsymbol{\Sigma}_k^{\mathbf{e}}\mathbf{A}^{\mathrm T} + \mathbf{C}\mathbf{L}_k\boldsymbol{\Sigma}_k^{\hat{\mathbf{x}}}\mathbf{L}_k^{\mathrm T}\mathbf{C}^{\mathrm T}; \qquad \boldsymbol{\Sigma}_1^{\mathbf{e}} = \boldsymbol{\Sigma}_1 $$
$$ \boldsymbol{\Sigma}_{k+1}^{\hat{\mathbf{x}}} = \mathbf{K}_k\mathbf{H}\boldsymbol{\Sigma}_k^{\mathbf{e}}\mathbf{A}^{\mathrm T} + \left(\mathbf{A}-\mathbf{B}\mathbf{L}_k\right)\boldsymbol{\Sigma}_k^{\hat{\mathbf{x}}}\left(\mathbf{A}-\mathbf{B}\mathbf{L}_k\right)^{\mathrm T}; \qquad \boldsymbol{\Sigma}_1^{\hat{\mathbf{x}}} = \hat{\mathbf{x}}_1\hat{\mathbf{x}}_1^{\mathrm T} $$
Feedback control:
$$ \mathbf{u}_k = -\mathbf{L}_k\hat{\mathbf{x}}_k $$
$$ \mathbf{L}_k = \left(\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}^{\mathbf{x}}\mathbf{B} + \mathbf{R} + \mathbf{C}^{\mathrm T}\left(\mathbf{S}_{k+1}^{\mathbf{x}}+\mathbf{S}_{k+1}^{\mathbf{e}}\right)\mathbf{C}\right)^{-1}\mathbf{B}^{\mathrm T}\mathbf{S}_{k+1}^{\mathbf{x}}\mathbf{A} $$
$$ \mathbf{S}_k^{\mathbf{x}} = \mathbf{Q}_k + \mathbf{A}^{\mathrm T}\mathbf{S}_{k+1}^{\mathbf{x}}\left(\mathbf{A}-\mathbf{B}\mathbf{L}_k\right); \qquad \mathbf{S}_N^{\mathbf{x}} = \mathbf{Q}_N $$
$$ \mathbf{S}_k^{\mathbf{e}} = \mathbf{A}^{\mathrm T}\mathbf{S}_{k+1}^{\mathbf{x}}\mathbf{B}\mathbf{L}_k + \left(\mathbf{A}-\mathbf{K}_k\mathbf{H}\right)^{\mathrm T}\mathbf{S}_{k+1}^{\mathbf{e}}\left(\mathbf{A}-\mathbf{K}_k\mathbf{H}\right); \qquad \mathbf{S}_N^{\mathbf{e}} = \mathbf{0} $$
Because the filter depends on the control gains and the controller depends on the filter gains, the two recursions are iterated until convergence (as in the Matlab code below).
Computational neuroanatomy for motor control.
Shadmehr & Krakauer (2008); Haar & Donchin (2019)
[Block diagram mapping the OFC components onto brain structures: cerebellum = internal forward model (prediction $\hat{\mathbf{x}}^{\text{prediction}}$); parietal cortex = Kalman filter (estimate $\hat{\mathbf{x}}^{\text{filter}}$); motor cortex = feedback control ($\mathbf{u} = -\mathbf{L}\hat{\mathbf{x}}^{\text{filter}}$); somatosensory cortex = sensory processing of the observation $\mathbf{y}$; basal ganglia = setting of the cost function; body and environment = controlled plant receiving the motor command $\mathbf{u}$.]
Matlab code: optimal feedback control model.
Todorov & Jordan (2002) Nature Neurosci; Todorov (2005) Neural Comput
for iter = 1:MaxIter

    % initialize covariances
    SiE = S1;
    SiX = X1*X1';
    SiXE = zeros(szX,szX);

    % forward pass - recompute Kalman filter
    for k = 1:N-1
        % compute Kalman gain
        temp = SiE + SiX + SiXE + SiXE';
        if size(D,2)==1,
            DSiD = diag(diag(temp).*D.^2);
        else
            DSiD = zeros(szY,szY);
            for i=1:szD
                DSiD = DSiD + D(:,:,i)*temp*D(:,:,i)';
            end;
        end;
        K(:,:,k) = A*SiE*H'*pinv(H*SiE*H'+D0*D0'+DSiD);

        % compute new SiE
        newE = E0*E0' + C0*C0' + (A-K(:,:,k)*H)*SiE*A';
        LSiL = L(:,:,k)*SiX*L(:,:,k)';
        if size(C,2)==1,
            newE = newE + B*diag(diag(LSiL).*C.^2)*B';
        else
            for i=1:szC
                newE = newE + B*C(:,:,i)*LSiL*C(:,:,i)'*B';
            end;
        end;

        % update SiX, SiE, SiXE
        SiX = E0*E0' + K(:,:,k)*H*SiE*A' + (A-B*L(:,:,k))*SiX*(A-B*L(:,:,k))' + ...
              (A-B*L(:,:,k))*SiXE*H'*K(:,:,k)' + K(:,:,k)*H*SiXE'*(A-B*L(:,:,k))';
        SiE = newE;
        SiXE = (A-B*L(:,:,k))*SiXE*(A-K(:,:,k)*H)' - E0*E0';
    end;

    % first pass initialization
    if iter==1,
        if Init==0,      % open loop
            K = zeros(szX,szY,N-1);
        elseif Init==2,  % random
            K = randn(szX,szY,N-1);
        end;
    end;

    % initialize optimal cost-to-go function
    Sx = Q(:,:,N);
    Se = zeros(szX,szX);
    Cost(iter) = 0;

    % backward pass - recompute control policy
    for k=N-1:-1:1
        % update Cost
        Cost(iter) = Cost(iter) + trace(Sx*C0*C0') + ...
                     trace(Se*(K(:,:,k)*D0*D0'*K(:,:,k)' + E0*E0' + C0*C0'));

        % Controller
        temp = R + B'*Sx*B;
        BSxeB = B'*(Sx+Se)*B;
        if size(C,2)==1,
            temp = temp + diag(diag(BSxeB).*C.^2);
        else
            for i=1:size(C,3)
                temp = temp + C(:,:,i)'*BSxeB*C(:,:,i);
            end;
        end;
        L(:,:,k) = pinv(temp)*B'*Sx*A;

        % compute new Se
        newE = A'*Sx*B*L(:,:,k) + (A-K(:,:,k)*H)'*Se*(A-K(:,:,k)*H);

        % update Sx and Se
        Sx = Q(:,:,k) + A'*Sx*(A-B*L(:,:,k));
        KSeK = K(:,:,k)'*Se*K(:,:,k);
        if size(D,2)==1,
            Sx = Sx + diag(diag(KSeK).*D.^2);
        else
            for i=1:szD
                Sx = Sx + D(:,:,i)'*KSeK*D(:,:,i);
            end;
        end;
        Se = newE;
    end;

    % adjust cost
    Cost(iter) = Cost(iter) + X1'*Sx*X1 + trace((Se+Sx)*S1);

    % progress bar
    if ~rem(iter,10),
        fprintf('.');
    end;

    % check convergence of Cost
    if (Niter>0 & iter>=Niter) | ...
       (Niter==0 & iter>1 & abs(Cost(iter-1)-Cost(iter))<Eps) | ...
       (Niter==0 & iter>20 & sum(diff(dist(iter-10:iter))>0)>3),
        break;
    end;
end;
Matlab code: optimal feedback control model.
Todorov & Jordan (2002) Nature Neurosci; Todorov (2005) Neural Comput
Infinite-horizon optimal feedback control.
Phillis (1985) IEEE Trans Auto Contr; Qian et al. (2013) Neural Comput
Stochastic dynamics with signal- and state-dependent noise:
$$ d\mathbf{x} = \left(\mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u}\right)dt + \mathbf{Y}\mathbf{u}\,d\beta + \mathbf{G}\mathbf{x}\,d\gamma + \mathbf{F}\,d\omega, \qquad d\mathbf{y} = \mathbf{C}\mathbf{x}\,dt + \mathbf{D}\,d\xi $$
Infinite-horizon cost:
$$ J = J_1 + J_2 = \lim_{t\to\infty}\mathrm{E}\left[\mathbf{x}^{\mathrm T}\mathbf{U}\mathbf{x}\right] + \lim_{t\to\infty}\frac{1}{t}\,\mathrm{E}\left[\int_0^t\left(\mathbf{x}^{\mathrm T}\mathbf{Q}\mathbf{x}+\mathbf{u}^{\mathrm T}\mathbf{R}\mathbf{u}\right)dt\right] $$
Estimator and controller:
$$ d\hat{\mathbf{x}} = \left(\mathbf{A}\hat{\mathbf{x}}+\mathbf{B}\mathbf{u}\right)dt + \mathbf{K}\left(d\mathbf{y}-\mathbf{C}\hat{\mathbf{x}}\,dt\right), \qquad \mathbf{u} = -\mathbf{L}\hat{\mathbf{x}} $$
In infinite-horizon optimal feedback control, the Kalman gain (K) and the feedback gain (L) are time invariant.
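When the multiplicative (signal- and state-dependent) noise terms are dropped, the time-invariant gains reduce to the standard steady-state LQG solution, which a sketch like the following computes with the Control System Toolbox (plant and cost matrices are assumed for illustration; the full Phillis/Qian solution with multiplicative noise requires the coupled equations given in the papers):

% Time-invariant LQG gains for an infinite-horizon problem (additive noise only).
A = [0 1; 0 -10];  B = [0; 1];  C = [1 0];     % assumed continuous-time plant
Q = diag([1 0.01]);  R = 1e-4;                 % state and control costs
W = diag([0 1e-3]);  V = 1e-5;                 % process and measurement noise covariances

L = lqr(A, B, Q, R);       % constant feedback gain: u = -L*xhat
K = lqr(A', C', W, V)';    % constant Kalman gain by duality: dxhat = (...)dt + K*(dy - C*xhat*dt)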
Bimanual coordination explained by OFC model.
Diedrichsen (2007) Curr Biol
[Figure: model versus experiment for the two-cursor and one-cursor conditions.]
Motor adaptation as reoptimization.
Izawa et al. (2008) J Neurosci
[Figure: model versus experiment for sigmoidal movement paths and for adaptation under field uncertainty.]
Movement as a real-time decision making process.
Nashed et al. (2014) J Neurosci
Learning optimal control policy without knowing dynamics.
Jiang & Jiang (2014) Biol Cybern
Stochastic dynamics with signal-dependent noise:
$$ d\mathbf{x} = \left(\mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u}\right)dt + \sum_i \mathbf{B}\mathbf{C}_i\mathbf{u}\,d\eta_i $$
Infinite-horizon cost and linear feedback law:
$$ J = \int_0^\infty\left(\mathbf{x}^{\mathrm T}\mathbf{Q}\mathbf{x}+\mathbf{u}^{\mathrm T}\mathbf{R}\mathbf{u}\right)dt, \qquad \mathbf{u} = -\mathbf{K}\mathbf{x} $$
Algebraic Riccati equation:
$$ \mathbf{0} = \mathbf{A}^{\mathrm T}\mathbf{P} + \mathbf{P}\mathbf{A} + \mathbf{Q} - \mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\mathbf{P}, \qquad \mathbf{K} = \mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\mathbf{P} $$
The Riccati equation for the optimal feedback gain requires explicit knowledge of the forward model (the matrices A and B)!
Learning optimal control policy without knowing dynamics.
Jiang & Jiang (2014) Biol Cybern
Algebraic Riccati equation:
$$ \mathbf{0} = \mathbf{A}^{\mathrm T}\mathbf{P} + \mathbf{P}\mathbf{A} + \mathbf{Q} - \mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\mathbf{P} $$
Iterative solution to the Riccati equation:
$$ \mathbf{0} = \left(\mathbf{A}-\mathbf{B}\mathbf{K}_k\right)^{\mathrm T}\mathbf{P}_k + \mathbf{P}_k\left(\mathbf{A}-\mathbf{B}\mathbf{K}_k\right) + \mathbf{Q} + \mathbf{K}_k^{\mathrm T}\mathbf{R}\mathbf{K}_k, \qquad \mathbf{K}_{k+1} = \mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\mathbf{P}_k $$
1) A − BK_k is Hurwitz. 2) P* ≤ P_{k+1} ≤ P_k. 3) K_k → K* and P_k → P*.
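A minimal Matlab sketch of this iterative solution when the model is known, using lyap for each Lyapunov equation (plant, costs, and the initial stabilizing gain are assumed for illustration); the data-driven scheme on the following slides replaces this model-based evaluation step:

% Iterative (policy-iteration) solution of the continuous-time Riccati equation.
A = [0 1; -1 -2];  B = [0; 1];                 % assumed plant
Q = eye(2);  R = 1;
K = [0 0];                                     % initial gain; A - B*K must be Hurwitz (it is here)
for k = 1:20
    Acl = A - B*K;
    P = lyap(Acl', Q + K'*R*K);                % solves Acl'*P + P*Acl + Q + K'*R*K = 0
    K = R \ (B'*P);                            % policy improvement K_{k+1} = R^{-1} B' P_k
end
K_exact = lqr(A, B, Q, R);                     % toolbox solution, for comparison
disp(norm(K - K_exact));                       % should be near zero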
Learning optimal control policy without knowing dynamics.
Jiang & Jiang (2014) Biol Cybern
Problem: determining the optimal policy via the algebraic Riccati equation,
$$ \mathbf{0} = \mathbf{A}^{\mathrm T}\mathbf{P} + \mathbf{P}\mathbf{A} + \mathbf{Q} - \mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\mathbf{P}, $$
requires the system matrices to be known. How can the optimal policy be derived without assuming direct knowledge of the system matrices?
Solution: note that, along a closed-loop trajectory with $\mathbf{u} = -\mathbf{K}\mathbf{x}$,
$$ 0 = \mathbf{x}^{\mathrm T}\left(\mathbf{A}^{\mathrm T}\mathbf{P} + \mathbf{P}\mathbf{A} + \mathbf{Q} - \mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^{\mathrm T}\mathbf{P}\right)\mathbf{x}\,dt = d\left(\mathbf{x}^{\mathrm T}\mathbf{P}\mathbf{x}\right) + \mathbf{x}^{\mathrm T}\left(\mathbf{Q}+\mathbf{K}^{\mathrm T}\mathbf{R}\mathbf{K}\right)\mathbf{x}\,dt. $$
Therefore, the knowledge of A and B is replaced with the knowledge of dx, i.e., measured state trajectories.
Learning optimal control policy without knowing dynamics.
Jiang & Jiang (2014) Biol Cybern
$$ 0 = d\left(\mathbf{x}^{\mathrm T}\mathbf{P}\mathbf{x}\right) + \mathbf{x}^{\mathrm T}\left(\mathbf{Q}+\mathbf{K}^{\mathrm T}\mathbf{R}\mathbf{K}\right)\mathbf{x}\,dt $$
By integrating over a short interval $[t,\ t+\delta t]$:
$$ \mathbf{x}(t+\delta t)^{\mathrm T}\mathbf{P}\,\mathbf{x}(t+\delta t) - \mathbf{x}(t)^{\mathrm T}\mathbf{P}\,\mathbf{x}(t) = -\int_t^{t+\delta t}\mathbf{x}^{\mathrm T}\left(\mathbf{Q}+\mathbf{K}^{\mathrm T}\mathbf{R}\mathbf{K}\right)\mathbf{x}\,dt $$
Therefore, without knowing the system dynamics, the matrices P and K are determined from measured data.
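A minimal sketch of the policy-evaluation step from data, assuming a fixed stabilizing gain K and access only to sampled states along the closed-loop trajectory (the plant matrices below are used only to generate the data; the regression itself never touches A or B). Jiang & Jiang additionally update the gain K itself from data, which is omitted in this sketch:

% Data-driven evaluation of the cost matrix P for a fixed policy u = -K*x.
A = [0 1; -1 -2];  B = [0; 1];               % used only to simulate measurement data
Q = eye(2);  R = 1;  K = [0.5 0.5];          % fixed stabilizing policy under evaluation
dt = 1e-3;  Tseg = 0.05;  nseg = 60;

x = [1; -0.5];  Phi = [];  rhs = [];
for s = 1:nseg
    x0 = x;  c = 0;
    for i = 1:round(Tseg/dt)                 % one interval [t, t+delta_t]
        u = -K*x;
        c = c + (x'*(Q + K'*R*K)*x)*dt;      % accumulate int x'(Q + K'RK)x dt
        x = x + (A*x + B*u)*dt;              % Euler step (stands in for the measured data)
    end
    Phi = [Phi; x(1)^2-x0(1)^2, 2*(x(1)*x(2)-x0(1)*x0(2)), x(2)^2-x0(2)^2];
    rhs = [rhs; -c];
end
p = Phi \ rhs;                               % least squares for [P11; P12; P22]
P = [p(1) p(2); p(2) p(3)];
P_model = lyap((A-B*K)', Q + K'*R*K);        % model-based solution, for comparison only
disp(norm(P - P_model));                     % should be small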
Summary
• Feedforward and feedback control differ in stochastic dynamics.
• Signal-dependent control noise contributes to trial-by-trial motor variability.
• Minimizing movement variability (or movement duration) under signal-dependent noise reproduces the regularities of human movements.
• Optimal feedback control (OFC) integrates state estimation and motor control in a single computational framework.
• A number of psychophysical experiments can be explained by the OFC model.
References
• Schmidt, R. A., Zelaznik, H., Hawkins, B., Frank, J. S., & Quinn Jr, J. T. (1979). Motor-output variability: a theory for the accuracy of rapid motor acts. Psychological review, 86(5), 415.
• van Beers, R. J., Haggard, P., & Wolpert, D. M. (2004). The role of execution noise in movement variability. Journal of Neurophysiology, 91(2), 1050-1063.
• Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor planning. Nature, 394(6695), 780-784.
• Tanaka, H., Krakauer, J. W., & Qian, N. (2006). An optimization principle for determining movement duration. Journal of neurophysiology, 95(6), 3875-3886.
• Todorov, E., & Jordan, M. I. (2002). Optimal feedback control as a theory of motor coordination. Nature neuroscience, 5(11), 1226-1235.
• Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural computation, 17(5), 1084-1108.
• Athans, M. (1967). The matrix minimum principle. Information and control, 11(5), 592-606.
• Phillis, Y. A. (1985). Controller design of systems with multiplicative noise. IEEE Transactions on Automatic Control, 30(10), 1017-1019.
• Qian, N., Jiang, Y., Jiang, Z. P., & Mazzoni, P. (2013). Movement duration, Fitts's law, and an infinite-horizon optimal feedback control model for biological motor systems. Neural computation, 25(3), 697-724.
• Diedrichsen, J. (2007). Optimal task-dependent changes of bimanual feedback control and adaptation. Current Biology, 17(19), 1675-1679.
• Izawa, J., Rane, T., Donchin, O., & Shadmehr, R. (2008). Motor adaptation as a process of reoptimization. The Journal of Neuroscience, 28(11), 2883-2891.
• Nashed, J. Y., Crevecoeur, F., & Scott, S. H. (2014). Rapid online selection between multiple motor plans. The Journal of Neuroscience, 34(5), 1769-1780.
• Jiang, Y., & Jiang, Z. P. (2014). Adaptive dynamic programming as a theory of sensorimotor control. Biological cybernetics, 108(4), 459-473.
Exercise
• Write Matlab code for the minimum-variance model that generates a trajectory with given initial and final positions.
• Write Matlab code for the optimal feedback control model that generates a trajectory with given initial and final positions.
• Investigate how the minimum-variance model and the optimal feedback control model differ.