25
LETTER Communicated by Tamar Flash Stochastic Optimal Control and Estimation Methods Adapted to the Noise Characteristics of the Sensorimotor System Emanuel Todorov [email protected] Department of Cognitive Science, University of California San Diego, La Jolla CA 92093-0515. Optimality principles of biological movement are conceptually appeal- ing and straightforward to formulate. Testing them empirically, however, requires the solution to stochastic optimal control and estimation prob- lems for reasonably realistic models of the motor task and the senso- rimotor periphery. Recent studies have highlighted the importance of incorporating biologically plausible noise into such models. Here we ex- tend the linear-quadratic-gaussian framework—currently the only frame- work where such problems can be solved efficiently—to include control- dependent, state-dependent, and internal noise. Under this extended noise model, we derive a coordinate-descent algorithm guaranteed to converge to a feedback control law and a nonadaptive linear estimator optimal with respect to each other. Numerical simulations indicate that convergence is exponential, local minima do not exist, and the restric- tion to nonadaptive linear estimators has negligible effects in the control problems of interest. The application of the algorithm is illustrated in the context of reaching movements. A Matlab implementation is available at www.cogsci.ucsd.edu/todorov. 1 Introduction Many theories in the physical sciences are expressed in terms of optimality principles, which often provide the most compact description of the laws governing a system’s behavior. Such principles play an important role in the field of sensorimotor control as well (Todorov, 2004). A quantitative the- ory of sensorimotor control requires a precise definition of success in the form of a scalar cost function. By combining top-down reasoning with in- tuitions derived from empirical observations, researchers have proposed a number of hypothetical cost functions for biological movement. While such hypotheses are not difficult to formulate, comparing their predictions to experimental data is complicated by the fact that the predictions have to be derived in the first place—that is, the hypothetical optimal control and esti- mation problems have to be solved. The most popular approach has been to optimize, in an open loop, the sequence of control signals (Chow & Jacobson, Neural Computation 17, 1084–1108 (2005) © 2005 Massachusetts Institute of Technology

Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

LETTER Communicated by Tamar Flash

Stochastic Optimal Control and Estimation Methods Adaptedto the Noise Characteristics of the Sensorimotor System

Emanuel TodorovtodorovcogsciucsdeduDepartment of Cognitive Science University of California San DiegoLa Jolla CA 92093-0515

Optimality principles of biological movement are conceptually appeal-ing and straightforward to formulate Testing them empirically howeverrequires the solution to stochastic optimal control and estimation prob-lems for reasonably realistic models of the motor task and the senso-rimotor periphery Recent studies have highlighted the importance ofincorporating biologically plausible noise into such models Here we ex-tend the linear-quadratic-gaussian frameworkmdashcurrently the only frame-work where such problems can be solved efficientlymdashto include control-dependent state-dependent and internal noise Under this extendednoise model we derive a coordinate-descent algorithm guaranteed toconverge to a feedback control law and a nonadaptive linear estimatoroptimal with respect to each other Numerical simulations indicate thatconvergence is exponential local minima do not exist and the restric-tion to nonadaptive linear estimators has negligible effects in the controlproblems of interest The application of the algorithm is illustrated in thecontext of reaching movements A Matlab implementation is available atwwwcogsciucsdedusimtodorov

1 Introduction

Many theories in the physical sciences are expressed in terms of optimalityprinciples which often provide the most compact description of the lawsgoverning a systemrsquos behavior Such principles play an important role inthe field of sensorimotor control as well (Todorov 2004) A quantitative the-ory of sensorimotor control requires a precise definition of success in theform of a scalar cost function By combining top-down reasoning with in-tuitions derived from empirical observations researchers have proposed anumber of hypothetical cost functions for biological movement While suchhypotheses are not difficult to formulate comparing their predictions toexperimental data is complicated by the fact that the predictions have to bederived in the first placemdashthat is the hypothetical optimal control and esti-mation problems have to be solved The most popular approach has been tooptimize in an open loop the sequence of control signals (Chow amp Jacobson

Neural Computation 17 1084ndash1108 (2005) copy 2005 Massachusetts Institute of Technology

Methods for Optimal Sensorimotor Control 1085

1971 Hatze amp Buys 1977 Anderson amp Pandy 2001) or limb states (Nelson1983 Flash amp Hogan 1985 Uno Kawato amp Suzuki 1989 Harris amp Wolpert1998) For stochastic partially observable plants such as the musculoskele-tal system however open-loop approaches yield suboptimal performance(Todorov amp Jordan 2002b Todorov 2004) Optimal performance can beachieved only by a feedback control law which uses all sensory data avail-able online to compute the most appropriate muscle activations under thecircumstances

Optimization in the space of feedback control laws is studied in the re-lated fields of stochastic optimal control dynamic programming and rein-forcement learning Despite many advances the general-purpose methodsthat are guaranteed to converge in a reasonable amount of time to a reason-able answer remain limited to discrete state and action spaces (Bertsekas ampTsitsiklis 1997 Sutton amp Barto 1998 Kushner amp Dupuis 2001) Discretiza-tion methods are well suited for higher-level control problems such as theproblem faced by a rat that has to choose which way to turn in a two-dimensional maze But the main focus in sensorimotor control is on a dif-ferent level of analysis on how the rat chooses a hundred or so gradedmuscle activations at each point in time in a way that causes its body tomove toward the reward without falling or hitting walls Even when themusculoskeletal system is idealized and simplified the state and actionspaces of interest remain continuous and high-dimensional and the curseof dimensionality prevents the use of discretization methods Generaliza-tions of these methods to continuous high-dimensional spaces typicallyinvolve function approximations whose properties are not yet well under-stood Such approximations can produce good enough solutions whichis often acceptable in engineering applications However the success ofa theory of sensorimotor control ultimately depends on its ability to ex-plain data in a principled manner Unless the theoryrsquos predictions are closeto the globally optimal solution of the hypothetical control problem itis difficult to determine whether the (mis)match to experimental data isdue to the general (in)applicability of optimality ideas to biological move-ment or the (in)appropriateness of the specific cost function or the specificapproximationsmdashin both the plant model and the controller designmdashusedto derive the predictions

Accelerated progress will require efficient and well-understood meth-ods for optimal feedback control of stochastic partially observable contin-uous nonstationary and high-dimensional systems The only frameworkthat currently provides such methods is linear-quadratic-gaussian (LQG)control which has been used to model biological systems subject to sen-sory and motor uncertainty (Loeb Levine amp He 1990 Hoff 1992 Kuo1995) While optimal solutions can be obtained efficiently within the LQGsetting (via Riccati equations) this computational efficiency comes at theprice of reduced biological realism because (1) musculoskeletal dynamicsare generally nonlinear (2) behaviorally relevant performance criteria are

1086 E Todorov

unlikely to be globally quadratic (Kording amp Wolpert 2004) and (3) noise inthe sensorimotor apparatus is not additive but signal-dependent The thirdlimitation is particularly problematic because it is becoming increasinglyclear that many robust and extensively studied phenomenamdashsuch as tra-jectory smoothness speed-accuracy trade-offs task-dependent impedancestructured motor variability and synergistic control and cosine tuningmdashare linked to the signal-dependent nature of sensorimotor noise (Harris ampWolpert 1998 Todorov 2002 Todorov amp Jordan 2002b)

It is thus desirable to extend the LQG setting as much as possible andadapt it to the online control and estimation problems that the nervoussystem faces Indeed extensions are possible in each of the three directionslisted above

1 Nonlinear dynamics (and nonquadratic costs) can be approximatedin the vicinity of the expected trajectory generated by an existingcontroller One can then apply modified LQG methodology to theapproximate problem and use it to improve the existing controlleriteratively Differential dynamic programming (Jacobson amp Mayne1970) as well as iterative LQG methods (Li amp Todorov 2004 Todorovamp Li 2004) are based on this general idea In their present formmost such methods assume deterministic dynamics but stochasticextensions are possible (Todorov amp Li 2004)

2 Quadratic costs can be replaced with a parametric family ofexponential-of-quadratic costs for which optimal LQG-like solutionscan be obtained efficiently (Whittle 1990 Bensoussan 1992) The con-trollers that are optimal for such costs range from risk averse (ierobust) through classic LQG to risk seeking This extended family ofcost functions has not yet been explored in the context of biologicalmovement

3 Additive gaussian noise in the plant dynamics can be replaced withmultiplicative noise which is still gaussian but has standard devi-ation proportional to the magnitude of the control signals or statevariables When the state of the plant is fully observable optimalLQG-like solutions can be computed efficiently as shown by severalauthors (Kleinman 1969 McLane 1971 Willems amp Willems 1976Bensoussan 1992 El Ghaoui 1995 Beghi amp DrsquoAlessandro 1998 RamiChen amp Moore 2001) Such methodology has also been used to modelreaching movements (Hoff 1992) Most relevant to the study of sen-sorimotor control however is the partially observable case whichremains an open problem While some work along these lines hasbeen done (Pakshin 1978 Phillis 1985) it has not produced reliablealgorithms that one can use off the shelf in building biologically rele-vant models (see section 9) Our goal here is to address that problemand provide the model-building methodology that is needed

Methods for Optimal Sensorimotor Control 1087

Table 1 List of Notation

xt isin Rm state vector at time step t

ut isin Rp control signal

yt isin Rk sensory observation

n total number of time stepsA B H system dynamics and observation matricesξtωt εt εtηt zero-mean noise termsξ ω ε ε η covariances of noise termsC1 Cc scaling matrices for control-dependent system noiseD1 Dd scaling matrices for state-dependent observation noiseQt R matrices defining state- and control-dependent costsxt state estimateet estimation errort conditional estimation error covariancee

t xt xe

t unconditional covariancesvt optimal cost-to-go functionSx

t Set st parameters of the optimal cost-to-go function

Kt filter gain matricesLt control gain matrices

In this letter we define an extended noise model that reflects the prop-erties of the sensorimotor system derive an efficient algorithm for solvingthe stochastic optimal control and estimation problems under that noisemodel illustrate the application of this extended LQG methodology in thecontext of reaching movements and study the properties of the new algo-rithm through extensive numerical simulations A special case of the al-gorithm derived here has already allowed us (Todorov amp Jordan 2002b)to construct models of a wider range of empirical results than previouslypossible

In section 2 we motivate our extended noise model which includescontrol-dependent state-dependent and internal estimation noise Insection 3 we formalize the problem and restrict the feedback control lawsunder consideration to functions of state estimates that are obtained by un-biased nonadaptive linear filters In section 4 we compute the optimal feed-back control law for any nonadaptive linear filter and show that it is linearin the state estimate In section 5 we derive the optimal nonadaptive linearfilter for any linear control law The two results together provide an iterativecoordinate-descent algorithm (equations 42 and 52) which is guaranteedto converge to a filter and a control law optimal with respect to each otherIn section 6 we illustrate the application of our method to the analysis ofreaching movements In section 7 we explore numerically the convergenceproperties of the algorithm and observe exponential convergence with nolocal minima In section 8 we assess the effects of assuming a nonadap-tive linear filter and find them to be negligible for the control problems ofinterest

Table 1 shows the notation used in this letter

1088 E Todorov

2 Noise Characteristics of the Sensorimotor System

Noise in the motor output is not additive but instead increases with themagnitude of the control signals This is intuitively obvious if you restyour arm on the table it does not bounce around (ie the passive plantdynamics have little noise) but when you make a movement (ie generatecontrol signals) the outcome is not always as desired Quantitatively therelationship between motor noise and control magnitude is surprisinglysimple Such noise has been found to be multiplicative the standard de-viation of muscle force is well fit with a linear function of the mean forcein both static (Sutton amp Sykes 1967 Todorov 2002) and dynamic (SchmidtZelaznick Hawkins Frank amp Quinn 1979) isometric force tasks The exactreasons for this dependence are not entirely clear although it can be ex-plained at least in part with Poisson noise on the neural level combined withHennemanrsquos size principle of motoneuron recruitment (Jones Hamilton ampWolpert 2002) To formalize the empirically established dependence let ube a vector of control signals (corresponding to the muscle activation levelsthat the nervous system attempts to set) and ε be a vector of zero-mean ran-dom numbers A general multiplicative noise model takes the form C(u)εwhere C(u) is a matrix whose elements depend linearly on u To expressa linear relationship between a vector u and a matrix C we make the ithcolumn of C equal to Ci u where Ci are constant scaling matrices Thenwe have C(u)ε = sum

i Ci uεi where εi is the ith component of the randomvector ε

Online movement control relies on feedback from a variety of sensorymodalities with vision and proprioception typically playing the dominantrole Visual noise obviously depends on the retinal position of the objectsof interest and increases with distance away from the fovea (ie eccen-tricity) The accuracy of visual positional estimates is again surprisinglywell modeled with multiplicative noise whose standard deviation is pro-portional to eccentricity This is an instantiation of Weberrsquos law and hasbeen found to be quite robust in a variety of interval discrimination ex-periments (Burbeck amp Yap 1990 Whitaker amp Latham 1997) We have alsoconfirmed this scaling law in a visuomotor setting where subjects pointedto memorized targets presented in the visual periphery (Todorov 1998)Such results motivate the use of a multiplicative observation noise modelof the form D (x)ε = sum

i Di xεi where x is the state of the plant and environ-ment including the current fixation point and the positions and velocities ofrelevant objects Incorporating state-dependent noise in analyses of senso-rimotor control can allow more accurate modeling of the effects of feedbackand various experimental perturbations it also can effectively induce a costfunction over eye movement patterns and allow us to predict the eye move-ments that would result in optimal hand performance (Todorov 1998) Notethat if other forms of state-dependent sensory noise are found the modelcan still be useful as a linear approximation

Methods for Optimal Sensorimotor Control 1089

Intelligent control of a partially observable stochastic plant requiresa feedback control law which is typically a function of a state estimatethat is computed recursively over time In engineering applications theestimation-control loop is implemented in a noiseless digital computer andso all noise is external In models of biological movement we usually makethe same assumption treating all noise as being a property of the muscu-loskeletal plant or the sensory apparatus This is in principle unrealisticbecause neural representations are likely subject to internal fluctuationsthat do not arise in the periphery It is also unrealistic in modeling practiceAn ideal observer model predicts that the estimation error covariance ofany stationary feature of the environment will asymptote to 0 In partic-ular such models predict that if we view a stationary object in the visualperiphery long enough we should eventually know exactly where it is andbe able to reach for it as accurately as if it were at the center of fixation Thiscontradicts our intuition as well as experimental data Both interval dis-crimination experiments and reaching to remembered peripheral targetsexperiments indicate that estimation errors asymptote rather quickly butnot to 0 Instead the asymptote level depends linearly on eccentricity Thesimplest way to model this is to assume another noise process which wecall internal noise acting directly on whatever state estimate the nervoussystem chooses to compute

3 Problem Statement and Assumptions

Consider a linear dynamical system with state xt isin Rm control ut isin R

pfeedback yt isin R

k in discrete time t

Dynamics xt+1 = Axt + But + ξt +csum

i=1

εitCi ut

Feedback yt = Hxt + ωt +dsum

i=1

εit Di xt

Cost per step xTt Qtxt + uT

t Rut

(31)

The feedback signal yt is received after the control signal ut has been gen-erated The initial state has known mean x1 and covariance 1 All matricesare known and have compatible dimensions making them time varyingis straightforward The control cost matrix R is symmetric positive defi-nite (R gt 0) and the state cost matrices Q1 Qn are symmetric positivesemidefinite (Qt ge 0) Each movement lasts n time steps at t = n the finalcost is xT

n Qnxn and un is undefined The independent random variablesξt isin R

m ωt isin Rk εt isin R

c and εt isin Rd have multidimensional gaussian dis-

tributions with mean 0 and covariances ξ ge 0 ω gt 0 ε = I and ε = Irespectively Thus the control-dependent and state-dependent noise termsin equation 31 have covariances

sumi Ci utuT

t CTi and

sumi Di xtxT

t DTi When the

1090 E Todorov

control-dependent noise is meant to be added to the control signal (which isusually the case) the matrices Ci should have the form B Fi where Fi are theactual noise scaling factors Then the control-dependent part of the plantdynamics becomes B(I + sum

i εit Fi )ut

The problem of optimal control is to find the optimal control law thatis the sequence of causal control functions ut(u1 utminus1 y1 ytminus1) thatminimize the expected total cost over the movement Note that computingthe optimal sequence of functions u1(middot) unminus1(middot) is a different and ingeneral much more difficult problem than computing the optimal sequenceof open-loop controls u1 unminus1

When only additive noise is present (ie C1 Cc = 0 and D1 Dd =0) this reduces to the classic LQG problem which has the well-knownoptimal solution (Davis amp Vinter 1985)

Linear-Quadratic Regulator Kalman Filterut = minusLtxt xt+1 = A xt + But + Kt (yt minus H xt)

Lt = (R + BTSt+1 B)minus1 BTSt+1 A Kt = At HT(Ht HT + ω)minus1

St = Qt + ATSt+1(Aminus BLt) t+1 = ξ + (Aminus Kt H) t AT

(32)

In that case the optimal control law depends on the history of control andfeedback signals only through the state estimate xt which is updated recur-sively by the Kalman filter The matrices L that define the optimal controllaw do not depend on the noise covariances or filter coefficients and thematrices K that define the optimal filter do not depend on the cost andcontrol law

In the case of control-dependent and state-dependent noise the aboveindependence properties no longer hold This complicates the problem sub-stantially and forces us to adopt a more restricted formulation in the interestof analytical tractability We assume that as in equation 32 the entire his-tory of control and feedback signals is summarized by a state estimate xt which is all the information available to the control system at time t Thefeedback control law ut(middot) is allowed to be an arbitrary function of xt butxt can be updated only by a recursive linear filter of the form

xt+1 = Axt + But + Kt(yt minus H xt) + ηt

The internal noise ηt isin Rm has mean 0 and covariance η ge 0 The fil-

ter gains K1 Knminus1 are nonadaptive they are determined in advanceand cannot change as a function of the specific controls and observa-tions within a simulation run Such a filter is always unbiased for anyK1 Knminus1 we have E [xt| xt] = xt for all t Note however that underthe extended noise model any nonadaptive linear filter is suboptimalwhen xt is computed as defined above Cov [xt| xt] is generally larger thanCov [xt|u1 utminus1 y1 ytminus1] The consequences of this will be explorednumerically in section 8

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 2: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1085

1971 Hatze amp Buys 1977 Anderson amp Pandy 2001) or limb states (Nelson1983 Flash amp Hogan 1985 Uno Kawato amp Suzuki 1989 Harris amp Wolpert1998) For stochastic partially observable plants such as the musculoskele-tal system however open-loop approaches yield suboptimal performance(Todorov amp Jordan 2002b Todorov 2004) Optimal performance can beachieved only by a feedback control law which uses all sensory data avail-able online to compute the most appropriate muscle activations under thecircumstances

Optimization in the space of feedback control laws is studied in the re-lated fields of stochastic optimal control dynamic programming and rein-forcement learning Despite many advances the general-purpose methodsthat are guaranteed to converge in a reasonable amount of time to a reason-able answer remain limited to discrete state and action spaces (Bertsekas ampTsitsiklis 1997 Sutton amp Barto 1998 Kushner amp Dupuis 2001) Discretiza-tion methods are well suited for higher-level control problems such as theproblem faced by a rat that has to choose which way to turn in a two-dimensional maze But the main focus in sensorimotor control is on a dif-ferent level of analysis on how the rat chooses a hundred or so gradedmuscle activations at each point in time in a way that causes its body tomove toward the reward without falling or hitting walls Even when themusculoskeletal system is idealized and simplified the state and actionspaces of interest remain continuous and high-dimensional and the curseof dimensionality prevents the use of discretization methods Generaliza-tions of these methods to continuous high-dimensional spaces typicallyinvolve function approximations whose properties are not yet well under-stood Such approximations can produce good enough solutions whichis often acceptable in engineering applications However the success ofa theory of sensorimotor control ultimately depends on its ability to ex-plain data in a principled manner Unless the theoryrsquos predictions are closeto the globally optimal solution of the hypothetical control problem itis difficult to determine whether the (mis)match to experimental data isdue to the general (in)applicability of optimality ideas to biological move-ment or the (in)appropriateness of the specific cost function or the specificapproximationsmdashin both the plant model and the controller designmdashusedto derive the predictions

Accelerated progress will require efficient and well-understood meth-ods for optimal feedback control of stochastic partially observable contin-uous nonstationary and high-dimensional systems The only frameworkthat currently provides such methods is linear-quadratic-gaussian (LQG)control which has been used to model biological systems subject to sen-sory and motor uncertainty (Loeb Levine amp He 1990 Hoff 1992 Kuo1995) While optimal solutions can be obtained efficiently within the LQGsetting (via Riccati equations) this computational efficiency comes at theprice of reduced biological realism because (1) musculoskeletal dynamicsare generally nonlinear (2) behaviorally relevant performance criteria are

1086 E Todorov

unlikely to be globally quadratic (Kording amp Wolpert 2004) and (3) noise inthe sensorimotor apparatus is not additive but signal-dependent The thirdlimitation is particularly problematic because it is becoming increasinglyclear that many robust and extensively studied phenomenamdashsuch as tra-jectory smoothness speed-accuracy trade-offs task-dependent impedancestructured motor variability and synergistic control and cosine tuningmdashare linked to the signal-dependent nature of sensorimotor noise (Harris ampWolpert 1998 Todorov 2002 Todorov amp Jordan 2002b)

It is thus desirable to extend the LQG setting as much as possible andadapt it to the online control and estimation problems that the nervoussystem faces Indeed extensions are possible in each of the three directionslisted above

1 Nonlinear dynamics (and nonquadratic costs) can be approximatedin the vicinity of the expected trajectory generated by an existingcontroller One can then apply modified LQG methodology to theapproximate problem and use it to improve the existing controlleriteratively Differential dynamic programming (Jacobson amp Mayne1970) as well as iterative LQG methods (Li amp Todorov 2004 Todorovamp Li 2004) are based on this general idea In their present formmost such methods assume deterministic dynamics but stochasticextensions are possible (Todorov amp Li 2004)

2 Quadratic costs can be replaced with a parametric family ofexponential-of-quadratic costs for which optimal LQG-like solutionscan be obtained efficiently (Whittle 1990 Bensoussan 1992) The con-trollers that are optimal for such costs range from risk averse (ierobust) through classic LQG to risk seeking This extended family ofcost functions has not yet been explored in the context of biologicalmovement

3 Additive gaussian noise in the plant dynamics can be replaced withmultiplicative noise which is still gaussian but has standard devi-ation proportional to the magnitude of the control signals or statevariables When the state of the plant is fully observable optimalLQG-like solutions can be computed efficiently as shown by severalauthors (Kleinman 1969 McLane 1971 Willems amp Willems 1976Bensoussan 1992 El Ghaoui 1995 Beghi amp DrsquoAlessandro 1998 RamiChen amp Moore 2001) Such methodology has also been used to modelreaching movements (Hoff 1992) Most relevant to the study of sen-sorimotor control however is the partially observable case whichremains an open problem While some work along these lines hasbeen done (Pakshin 1978 Phillis 1985) it has not produced reliablealgorithms that one can use off the shelf in building biologically rele-vant models (see section 9) Our goal here is to address that problemand provide the model-building methodology that is needed

Methods for Optimal Sensorimotor Control 1087

Table 1 List of Notation

xt isin Rm state vector at time step t

ut isin Rp control signal

yt isin Rk sensory observation

n total number of time stepsA B H system dynamics and observation matricesξtωt εt εtηt zero-mean noise termsξ ω ε ε η covariances of noise termsC1 Cc scaling matrices for control-dependent system noiseD1 Dd scaling matrices for state-dependent observation noiseQt R matrices defining state- and control-dependent costsxt state estimateet estimation errort conditional estimation error covariancee

t xt xe

t unconditional covariancesvt optimal cost-to-go functionSx

t Set st parameters of the optimal cost-to-go function

Kt filter gain matricesLt control gain matrices

In this letter we define an extended noise model that reflects the prop-erties of the sensorimotor system derive an efficient algorithm for solvingthe stochastic optimal control and estimation problems under that noisemodel illustrate the application of this extended LQG methodology in thecontext of reaching movements and study the properties of the new algo-rithm through extensive numerical simulations A special case of the al-gorithm derived here has already allowed us (Todorov amp Jordan 2002b)to construct models of a wider range of empirical results than previouslypossible

In section 2 we motivate our extended noise model which includescontrol-dependent state-dependent and internal estimation noise Insection 3 we formalize the problem and restrict the feedback control lawsunder consideration to functions of state estimates that are obtained by un-biased nonadaptive linear filters In section 4 we compute the optimal feed-back control law for any nonadaptive linear filter and show that it is linearin the state estimate In section 5 we derive the optimal nonadaptive linearfilter for any linear control law The two results together provide an iterativecoordinate-descent algorithm (equations 42 and 52) which is guaranteedto converge to a filter and a control law optimal with respect to each otherIn section 6 we illustrate the application of our method to the analysis ofreaching movements In section 7 we explore numerically the convergenceproperties of the algorithm and observe exponential convergence with nolocal minima In section 8 we assess the effects of assuming a nonadap-tive linear filter and find them to be negligible for the control problems ofinterest

Table 1 shows the notation used in this letter

1088 E Todorov

2 Noise Characteristics of the Sensorimotor System

Noise in the motor output is not additive but instead increases with themagnitude of the control signals This is intuitively obvious if you restyour arm on the table it does not bounce around (ie the passive plantdynamics have little noise) but when you make a movement (ie generatecontrol signals) the outcome is not always as desired Quantitatively therelationship between motor noise and control magnitude is surprisinglysimple Such noise has been found to be multiplicative the standard de-viation of muscle force is well fit with a linear function of the mean forcein both static (Sutton amp Sykes 1967 Todorov 2002) and dynamic (SchmidtZelaznick Hawkins Frank amp Quinn 1979) isometric force tasks The exactreasons for this dependence are not entirely clear although it can be ex-plained at least in part with Poisson noise on the neural level combined withHennemanrsquos size principle of motoneuron recruitment (Jones Hamilton ampWolpert 2002) To formalize the empirically established dependence let ube a vector of control signals (corresponding to the muscle activation levelsthat the nervous system attempts to set) and ε be a vector of zero-mean ran-dom numbers A general multiplicative noise model takes the form C(u)εwhere C(u) is a matrix whose elements depend linearly on u To expressa linear relationship between a vector u and a matrix C we make the ithcolumn of C equal to Ci u where Ci are constant scaling matrices Thenwe have C(u)ε = sum

i Ci uεi where εi is the ith component of the randomvector ε

Online movement control relies on feedback from a variety of sensorymodalities with vision and proprioception typically playing the dominantrole Visual noise obviously depends on the retinal position of the objectsof interest and increases with distance away from the fovea (ie eccen-tricity) The accuracy of visual positional estimates is again surprisinglywell modeled with multiplicative noise whose standard deviation is pro-portional to eccentricity This is an instantiation of Weberrsquos law and hasbeen found to be quite robust in a variety of interval discrimination ex-periments (Burbeck amp Yap 1990 Whitaker amp Latham 1997) We have alsoconfirmed this scaling law in a visuomotor setting where subjects pointedto memorized targets presented in the visual periphery (Todorov 1998)Such results motivate the use of a multiplicative observation noise modelof the form D (x)ε = sum

i Di xεi where x is the state of the plant and environ-ment including the current fixation point and the positions and velocities ofrelevant objects Incorporating state-dependent noise in analyses of senso-rimotor control can allow more accurate modeling of the effects of feedbackand various experimental perturbations it also can effectively induce a costfunction over eye movement patterns and allow us to predict the eye move-ments that would result in optimal hand performance (Todorov 1998) Notethat if other forms of state-dependent sensory noise are found the modelcan still be useful as a linear approximation

Methods for Optimal Sensorimotor Control 1089

Intelligent control of a partially observable stochastic plant requiresa feedback control law which is typically a function of a state estimatethat is computed recursively over time In engineering applications theestimation-control loop is implemented in a noiseless digital computer andso all noise is external In models of biological movement we usually makethe same assumption treating all noise as being a property of the muscu-loskeletal plant or the sensory apparatus This is in principle unrealisticbecause neural representations are likely subject to internal fluctuationsthat do not arise in the periphery It is also unrealistic in modeling practiceAn ideal observer model predicts that the estimation error covariance ofany stationary feature of the environment will asymptote to 0 In partic-ular such models predict that if we view a stationary object in the visualperiphery long enough we should eventually know exactly where it is andbe able to reach for it as accurately as if it were at the center of fixation Thiscontradicts our intuition as well as experimental data Both interval dis-crimination experiments and reaching to remembered peripheral targetsexperiments indicate that estimation errors asymptote rather quickly butnot to 0 Instead the asymptote level depends linearly on eccentricity Thesimplest way to model this is to assume another noise process which wecall internal noise acting directly on whatever state estimate the nervoussystem chooses to compute

3 Problem Statement and Assumptions

Consider a linear dynamical system with state xt isin Rm control ut isin R

pfeedback yt isin R

k in discrete time t

Dynamics xt+1 = Axt + But + ξt +csum

i=1

εitCi ut

Feedback yt = Hxt + ωt +dsum

i=1

εit Di xt

Cost per step xTt Qtxt + uT

t Rut

(31)

The feedback signal yt is received after the control signal ut has been gen-erated The initial state has known mean x1 and covariance 1 All matricesare known and have compatible dimensions making them time varyingis straightforward The control cost matrix R is symmetric positive defi-nite (R gt 0) and the state cost matrices Q1 Qn are symmetric positivesemidefinite (Qt ge 0) Each movement lasts n time steps at t = n the finalcost is xT

n Qnxn and un is undefined The independent random variablesξt isin R

m ωt isin Rk εt isin R

c and εt isin Rd have multidimensional gaussian dis-

tributions with mean 0 and covariances ξ ge 0 ω gt 0 ε = I and ε = Irespectively Thus the control-dependent and state-dependent noise termsin equation 31 have covariances

sumi Ci utuT

t CTi and

sumi Di xtxT

t DTi When the

1090 E Todorov

control-dependent noise is meant to be added to the control signal (which isusually the case) the matrices Ci should have the form B Fi where Fi are theactual noise scaling factors Then the control-dependent part of the plantdynamics becomes B(I + sum

i εit Fi )ut

The problem of optimal control is to find the optimal control law thatis the sequence of causal control functions ut(u1 utminus1 y1 ytminus1) thatminimize the expected total cost over the movement Note that computingthe optimal sequence of functions u1(middot) unminus1(middot) is a different and ingeneral much more difficult problem than computing the optimal sequenceof open-loop controls u1 unminus1

When only additive noise is present (ie C1 Cc = 0 and D1 Dd =0) this reduces to the classic LQG problem which has the well-knownoptimal solution (Davis amp Vinter 1985)

Linear-Quadratic Regulator Kalman Filterut = minusLtxt xt+1 = A xt + But + Kt (yt minus H xt)

Lt = (R + BTSt+1 B)minus1 BTSt+1 A Kt = At HT(Ht HT + ω)minus1

St = Qt + ATSt+1(Aminus BLt) t+1 = ξ + (Aminus Kt H) t AT

(32)

In that case the optimal control law depends on the history of control andfeedback signals only through the state estimate xt which is updated recur-sively by the Kalman filter The matrices L that define the optimal controllaw do not depend on the noise covariances or filter coefficients and thematrices K that define the optimal filter do not depend on the cost andcontrol law

In the case of control-dependent and state-dependent noise the aboveindependence properties no longer hold This complicates the problem sub-stantially and forces us to adopt a more restricted formulation in the interestof analytical tractability We assume that as in equation 32 the entire his-tory of control and feedback signals is summarized by a state estimate xt which is all the information available to the control system at time t Thefeedback control law ut(middot) is allowed to be an arbitrary function of xt butxt can be updated only by a recursive linear filter of the form

xt+1 = Axt + But + Kt(yt minus H xt) + ηt

The internal noise ηt isin Rm has mean 0 and covariance η ge 0 The fil-

ter gains K1 Knminus1 are nonadaptive they are determined in advanceand cannot change as a function of the specific controls and observa-tions within a simulation run Such a filter is always unbiased for anyK1 Knminus1 we have E [xt| xt] = xt for all t Note however that underthe extended noise model any nonadaptive linear filter is suboptimalwhen xt is computed as defined above Cov [xt| xt] is generally larger thanCov [xt|u1 utminus1 y1 ytminus1] The consequences of this will be explorednumerically in section 8

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 3: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1086 E Todorov

unlikely to be globally quadratic (Kording amp Wolpert 2004) and (3) noise inthe sensorimotor apparatus is not additive but signal-dependent The thirdlimitation is particularly problematic because it is becoming increasinglyclear that many robust and extensively studied phenomenamdashsuch as tra-jectory smoothness speed-accuracy trade-offs task-dependent impedancestructured motor variability and synergistic control and cosine tuningmdashare linked to the signal-dependent nature of sensorimotor noise (Harris ampWolpert 1998 Todorov 2002 Todorov amp Jordan 2002b)

It is thus desirable to extend the LQG setting as much as possible andadapt it to the online control and estimation problems that the nervoussystem faces Indeed extensions are possible in each of the three directionslisted above

1 Nonlinear dynamics (and nonquadratic costs) can be approximatedin the vicinity of the expected trajectory generated by an existingcontroller One can then apply modified LQG methodology to theapproximate problem and use it to improve the existing controlleriteratively Differential dynamic programming (Jacobson amp Mayne1970) as well as iterative LQG methods (Li amp Todorov 2004 Todorovamp Li 2004) are based on this general idea In their present formmost such methods assume deterministic dynamics but stochasticextensions are possible (Todorov amp Li 2004)

2 Quadratic costs can be replaced with a parametric family ofexponential-of-quadratic costs for which optimal LQG-like solutionscan be obtained efficiently (Whittle 1990 Bensoussan 1992) The con-trollers that are optimal for such costs range from risk averse (ierobust) through classic LQG to risk seeking This extended family ofcost functions has not yet been explored in the context of biologicalmovement

3 Additive gaussian noise in the plant dynamics can be replaced withmultiplicative noise which is still gaussian but has standard devi-ation proportional to the magnitude of the control signals or statevariables When the state of the plant is fully observable optimalLQG-like solutions can be computed efficiently as shown by severalauthors (Kleinman 1969 McLane 1971 Willems amp Willems 1976Bensoussan 1992 El Ghaoui 1995 Beghi amp DrsquoAlessandro 1998 RamiChen amp Moore 2001) Such methodology has also been used to modelreaching movements (Hoff 1992) Most relevant to the study of sen-sorimotor control however is the partially observable case whichremains an open problem While some work along these lines hasbeen done (Pakshin 1978 Phillis 1985) it has not produced reliablealgorithms that one can use off the shelf in building biologically rele-vant models (see section 9) Our goal here is to address that problemand provide the model-building methodology that is needed

Methods for Optimal Sensorimotor Control 1087

Table 1 List of Notation

xt isin Rm state vector at time step t

ut isin Rp control signal

yt isin Rk sensory observation

n total number of time stepsA B H system dynamics and observation matricesξtωt εt εtηt zero-mean noise termsξ ω ε ε η covariances of noise termsC1 Cc scaling matrices for control-dependent system noiseD1 Dd scaling matrices for state-dependent observation noiseQt R matrices defining state- and control-dependent costsxt state estimateet estimation errort conditional estimation error covariancee

t xt xe

t unconditional covariancesvt optimal cost-to-go functionSx

t Set st parameters of the optimal cost-to-go function

Kt filter gain matricesLt control gain matrices

In this letter we define an extended noise model that reflects the prop-erties of the sensorimotor system derive an efficient algorithm for solvingthe stochastic optimal control and estimation problems under that noisemodel illustrate the application of this extended LQG methodology in thecontext of reaching movements and study the properties of the new algo-rithm through extensive numerical simulations A special case of the al-gorithm derived here has already allowed us (Todorov amp Jordan 2002b)to construct models of a wider range of empirical results than previouslypossible

In section 2 we motivate our extended noise model which includescontrol-dependent state-dependent and internal estimation noise Insection 3 we formalize the problem and restrict the feedback control lawsunder consideration to functions of state estimates that are obtained by un-biased nonadaptive linear filters In section 4 we compute the optimal feed-back control law for any nonadaptive linear filter and show that it is linearin the state estimate In section 5 we derive the optimal nonadaptive linearfilter for any linear control law The two results together provide an iterativecoordinate-descent algorithm (equations 42 and 52) which is guaranteedto converge to a filter and a control law optimal with respect to each otherIn section 6 we illustrate the application of our method to the analysis ofreaching movements In section 7 we explore numerically the convergenceproperties of the algorithm and observe exponential convergence with nolocal minima In section 8 we assess the effects of assuming a nonadap-tive linear filter and find them to be negligible for the control problems ofinterest

Table 1 shows the notation used in this letter

1088 E Todorov

2 Noise Characteristics of the Sensorimotor System

Noise in the motor output is not additive but instead increases with themagnitude of the control signals This is intuitively obvious if you restyour arm on the table it does not bounce around (ie the passive plantdynamics have little noise) but when you make a movement (ie generatecontrol signals) the outcome is not always as desired Quantitatively therelationship between motor noise and control magnitude is surprisinglysimple Such noise has been found to be multiplicative the standard de-viation of muscle force is well fit with a linear function of the mean forcein both static (Sutton amp Sykes 1967 Todorov 2002) and dynamic (SchmidtZelaznick Hawkins Frank amp Quinn 1979) isometric force tasks The exactreasons for this dependence are not entirely clear although it can be ex-plained at least in part with Poisson noise on the neural level combined withHennemanrsquos size principle of motoneuron recruitment (Jones Hamilton ampWolpert 2002) To formalize the empirically established dependence let ube a vector of control signals (corresponding to the muscle activation levelsthat the nervous system attempts to set) and ε be a vector of zero-mean ran-dom numbers A general multiplicative noise model takes the form C(u)εwhere C(u) is a matrix whose elements depend linearly on u To expressa linear relationship between a vector u and a matrix C we make the ithcolumn of C equal to Ci u where Ci are constant scaling matrices Thenwe have C(u)ε = sum

i Ci uεi where εi is the ith component of the randomvector ε

Online movement control relies on feedback from a variety of sensorymodalities with vision and proprioception typically playing the dominantrole Visual noise obviously depends on the retinal position of the objectsof interest and increases with distance away from the fovea (ie eccen-tricity) The accuracy of visual positional estimates is again surprisinglywell modeled with multiplicative noise whose standard deviation is pro-portional to eccentricity This is an instantiation of Weberrsquos law and hasbeen found to be quite robust in a variety of interval discrimination ex-periments (Burbeck amp Yap 1990 Whitaker amp Latham 1997) We have alsoconfirmed this scaling law in a visuomotor setting where subjects pointedto memorized targets presented in the visual periphery (Todorov 1998)Such results motivate the use of a multiplicative observation noise modelof the form D (x)ε = sum

i Di xεi where x is the state of the plant and environ-ment including the current fixation point and the positions and velocities ofrelevant objects Incorporating state-dependent noise in analyses of senso-rimotor control can allow more accurate modeling of the effects of feedbackand various experimental perturbations it also can effectively induce a costfunction over eye movement patterns and allow us to predict the eye move-ments that would result in optimal hand performance (Todorov 1998) Notethat if other forms of state-dependent sensory noise are found the modelcan still be useful as a linear approximation

Methods for Optimal Sensorimotor Control 1089

Intelligent control of a partially observable stochastic plant requiresa feedback control law which is typically a function of a state estimatethat is computed recursively over time In engineering applications theestimation-control loop is implemented in a noiseless digital computer andso all noise is external In models of biological movement we usually makethe same assumption treating all noise as being a property of the muscu-loskeletal plant or the sensory apparatus This is in principle unrealisticbecause neural representations are likely subject to internal fluctuationsthat do not arise in the periphery It is also unrealistic in modeling practiceAn ideal observer model predicts that the estimation error covariance ofany stationary feature of the environment will asymptote to 0 In partic-ular such models predict that if we view a stationary object in the visualperiphery long enough we should eventually know exactly where it is andbe able to reach for it as accurately as if it were at the center of fixation Thiscontradicts our intuition as well as experimental data Both interval dis-crimination experiments and reaching to remembered peripheral targetsexperiments indicate that estimation errors asymptote rather quickly butnot to 0 Instead the asymptote level depends linearly on eccentricity Thesimplest way to model this is to assume another noise process which wecall internal noise acting directly on whatever state estimate the nervoussystem chooses to compute

3 Problem Statement and Assumptions

Consider a linear dynamical system with state xt isin Rm control ut isin R

pfeedback yt isin R

k in discrete time t

Dynamics xt+1 = Axt + But + ξt +csum

i=1

εitCi ut

Feedback yt = Hxt + ωt +dsum

i=1

εit Di xt

Cost per step xTt Qtxt + uT

t Rut

(31)

The feedback signal yt is received after the control signal ut has been gen-erated The initial state has known mean x1 and covariance 1 All matricesare known and have compatible dimensions making them time varyingis straightforward The control cost matrix R is symmetric positive defi-nite (R gt 0) and the state cost matrices Q1 Qn are symmetric positivesemidefinite (Qt ge 0) Each movement lasts n time steps at t = n the finalcost is xT

n Qnxn and un is undefined The independent random variablesξt isin R

m ωt isin Rk εt isin R

c and εt isin Rd have multidimensional gaussian dis-

tributions with mean 0 and covariances ξ ge 0 ω gt 0 ε = I and ε = Irespectively Thus the control-dependent and state-dependent noise termsin equation 31 have covariances

sumi Ci utuT

t CTi and

sumi Di xtxT

t DTi When the

1090 E Todorov

control-dependent noise is meant to be added to the control signal (which isusually the case) the matrices Ci should have the form B Fi where Fi are theactual noise scaling factors Then the control-dependent part of the plantdynamics becomes B(I + sum

i εit Fi )ut

The problem of optimal control is to find the optimal control law thatis the sequence of causal control functions ut(u1 utminus1 y1 ytminus1) thatminimize the expected total cost over the movement Note that computingthe optimal sequence of functions u1(middot) unminus1(middot) is a different and ingeneral much more difficult problem than computing the optimal sequenceof open-loop controls u1 unminus1

When only additive noise is present (ie C1 Cc = 0 and D1 Dd =0) this reduces to the classic LQG problem which has the well-knownoptimal solution (Davis amp Vinter 1985)

Linear-Quadratic Regulator Kalman Filterut = minusLtxt xt+1 = A xt + But + Kt (yt minus H xt)

Lt = (R + BTSt+1 B)minus1 BTSt+1 A Kt = At HT(Ht HT + ω)minus1

St = Qt + ATSt+1(Aminus BLt) t+1 = ξ + (Aminus Kt H) t AT

(32)

In that case the optimal control law depends on the history of control andfeedback signals only through the state estimate xt which is updated recur-sively by the Kalman filter The matrices L that define the optimal controllaw do not depend on the noise covariances or filter coefficients and thematrices K that define the optimal filter do not depend on the cost andcontrol law

In the case of control-dependent and state-dependent noise the aboveindependence properties no longer hold This complicates the problem sub-stantially and forces us to adopt a more restricted formulation in the interestof analytical tractability We assume that as in equation 32 the entire his-tory of control and feedback signals is summarized by a state estimate xt which is all the information available to the control system at time t Thefeedback control law ut(middot) is allowed to be an arbitrary function of xt butxt can be updated only by a recursive linear filter of the form

xt+1 = Axt + But + Kt(yt minus H xt) + ηt

The internal noise ηt isin Rm has mean 0 and covariance η ge 0 The fil-

ter gains K1 Knminus1 are nonadaptive they are determined in advanceand cannot change as a function of the specific controls and observa-tions within a simulation run Such a filter is always unbiased for anyK1 Knminus1 we have E [xt| xt] = xt for all t Note however that underthe extended noise model any nonadaptive linear filter is suboptimalwhen xt is computed as defined above Cov [xt| xt] is generally larger thanCov [xt|u1 utminus1 y1 ytminus1] The consequences of this will be explorednumerically in section 8

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 4: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1087

Table 1 List of Notation

xt isin Rm state vector at time step t

ut isin Rp control signal

yt isin Rk sensory observation

n total number of time stepsA B H system dynamics and observation matricesξtωt εt εtηt zero-mean noise termsξ ω ε ε η covariances of noise termsC1 Cc scaling matrices for control-dependent system noiseD1 Dd scaling matrices for state-dependent observation noiseQt R matrices defining state- and control-dependent costsxt state estimateet estimation errort conditional estimation error covariancee

t xt xe

t unconditional covariancesvt optimal cost-to-go functionSx

t Set st parameters of the optimal cost-to-go function

Kt filter gain matricesLt control gain matrices

In this letter we define an extended noise model that reflects the prop-erties of the sensorimotor system derive an efficient algorithm for solvingthe stochastic optimal control and estimation problems under that noisemodel illustrate the application of this extended LQG methodology in thecontext of reaching movements and study the properties of the new algo-rithm through extensive numerical simulations A special case of the al-gorithm derived here has already allowed us (Todorov amp Jordan 2002b)to construct models of a wider range of empirical results than previouslypossible

In section 2 we motivate our extended noise model which includescontrol-dependent state-dependent and internal estimation noise Insection 3 we formalize the problem and restrict the feedback control lawsunder consideration to functions of state estimates that are obtained by un-biased nonadaptive linear filters In section 4 we compute the optimal feed-back control law for any nonadaptive linear filter and show that it is linearin the state estimate In section 5 we derive the optimal nonadaptive linearfilter for any linear control law The two results together provide an iterativecoordinate-descent algorithm (equations 42 and 52) which is guaranteedto converge to a filter and a control law optimal with respect to each otherIn section 6 we illustrate the application of our method to the analysis ofreaching movements In section 7 we explore numerically the convergenceproperties of the algorithm and observe exponential convergence with nolocal minima In section 8 we assess the effects of assuming a nonadap-tive linear filter and find them to be negligible for the control problems ofinterest

Table 1 shows the notation used in this letter

1088 E Todorov

2 Noise Characteristics of the Sensorimotor System

Noise in the motor output is not additive but instead increases with themagnitude of the control signals This is intuitively obvious if you restyour arm on the table it does not bounce around (ie the passive plantdynamics have little noise) but when you make a movement (ie generatecontrol signals) the outcome is not always as desired Quantitatively therelationship between motor noise and control magnitude is surprisinglysimple Such noise has been found to be multiplicative the standard de-viation of muscle force is well fit with a linear function of the mean forcein both static (Sutton amp Sykes 1967 Todorov 2002) and dynamic (SchmidtZelaznick Hawkins Frank amp Quinn 1979) isometric force tasks The exactreasons for this dependence are not entirely clear although it can be ex-plained at least in part with Poisson noise on the neural level combined withHennemanrsquos size principle of motoneuron recruitment (Jones Hamilton ampWolpert 2002) To formalize the empirically established dependence let ube a vector of control signals (corresponding to the muscle activation levelsthat the nervous system attempts to set) and ε be a vector of zero-mean ran-dom numbers A general multiplicative noise model takes the form C(u)εwhere C(u) is a matrix whose elements depend linearly on u To expressa linear relationship between a vector u and a matrix C we make the ithcolumn of C equal to Ci u where Ci are constant scaling matrices Thenwe have C(u)ε = sum

i Ci uεi where εi is the ith component of the randomvector ε

Online movement control relies on feedback from a variety of sensorymodalities with vision and proprioception typically playing the dominantrole Visual noise obviously depends on the retinal position of the objectsof interest and increases with distance away from the fovea (ie eccen-tricity) The accuracy of visual positional estimates is again surprisinglywell modeled with multiplicative noise whose standard deviation is pro-portional to eccentricity This is an instantiation of Weberrsquos law and hasbeen found to be quite robust in a variety of interval discrimination ex-periments (Burbeck amp Yap 1990 Whitaker amp Latham 1997) We have alsoconfirmed this scaling law in a visuomotor setting where subjects pointedto memorized targets presented in the visual periphery (Todorov 1998)Such results motivate the use of a multiplicative observation noise modelof the form D (x)ε = sum

i Di xεi where x is the state of the plant and environ-ment including the current fixation point and the positions and velocities ofrelevant objects Incorporating state-dependent noise in analyses of senso-rimotor control can allow more accurate modeling of the effects of feedbackand various experimental perturbations it also can effectively induce a costfunction over eye movement patterns and allow us to predict the eye move-ments that would result in optimal hand performance (Todorov 1998) Notethat if other forms of state-dependent sensory noise are found the modelcan still be useful as a linear approximation

Methods for Optimal Sensorimotor Control 1089

Intelligent control of a partially observable stochastic plant requiresa feedback control law which is typically a function of a state estimatethat is computed recursively over time In engineering applications theestimation-control loop is implemented in a noiseless digital computer andso all noise is external In models of biological movement we usually makethe same assumption treating all noise as being a property of the muscu-loskeletal plant or the sensory apparatus This is in principle unrealisticbecause neural representations are likely subject to internal fluctuationsthat do not arise in the periphery It is also unrealistic in modeling practiceAn ideal observer model predicts that the estimation error covariance ofany stationary feature of the environment will asymptote to 0 In partic-ular such models predict that if we view a stationary object in the visualperiphery long enough we should eventually know exactly where it is andbe able to reach for it as accurately as if it were at the center of fixation Thiscontradicts our intuition as well as experimental data Both interval dis-crimination experiments and reaching to remembered peripheral targetsexperiments indicate that estimation errors asymptote rather quickly butnot to 0 Instead the asymptote level depends linearly on eccentricity Thesimplest way to model this is to assume another noise process which wecall internal noise acting directly on whatever state estimate the nervoussystem chooses to compute

3 Problem Statement and Assumptions

Consider a linear dynamical system with state xt isin Rm control ut isin R

pfeedback yt isin R

k in discrete time t

Dynamics xt+1 = Axt + But + ξt +csum

i=1

εitCi ut

Feedback yt = Hxt + ωt +dsum

i=1

εit Di xt

Cost per step xTt Qtxt + uT

t Rut

(31)

The feedback signal yt is received after the control signal ut has been gen-erated The initial state has known mean x1 and covariance 1 All matricesare known and have compatible dimensions making them time varyingis straightforward The control cost matrix R is symmetric positive defi-nite (R gt 0) and the state cost matrices Q1 Qn are symmetric positivesemidefinite (Qt ge 0) Each movement lasts n time steps at t = n the finalcost is xT

n Qnxn and un is undefined The independent random variablesξt isin R

m ωt isin Rk εt isin R

c and εt isin Rd have multidimensional gaussian dis-

tributions with mean 0 and covariances ξ ge 0 ω gt 0 ε = I and ε = Irespectively Thus the control-dependent and state-dependent noise termsin equation 31 have covariances

sumi Ci utuT

t CTi and

sumi Di xtxT

t DTi When the

1090 E Todorov

control-dependent noise is meant to be added to the control signal (which isusually the case) the matrices Ci should have the form B Fi where Fi are theactual noise scaling factors Then the control-dependent part of the plantdynamics becomes B(I + sum

i εit Fi )ut

The problem of optimal control is to find the optimal control law thatis the sequence of causal control functions ut(u1 utminus1 y1 ytminus1) thatminimize the expected total cost over the movement Note that computingthe optimal sequence of functions u1(middot) unminus1(middot) is a different and ingeneral much more difficult problem than computing the optimal sequenceof open-loop controls u1 unminus1

When only additive noise is present (ie C1 Cc = 0 and D1 Dd =0) this reduces to the classic LQG problem which has the well-knownoptimal solution (Davis amp Vinter 1985)

Linear-Quadratic Regulator Kalman Filterut = minusLtxt xt+1 = A xt + But + Kt (yt minus H xt)

Lt = (R + BTSt+1 B)minus1 BTSt+1 A Kt = At HT(Ht HT + ω)minus1

St = Qt + ATSt+1(Aminus BLt) t+1 = ξ + (Aminus Kt H) t AT

(32)

In that case the optimal control law depends on the history of control andfeedback signals only through the state estimate xt which is updated recur-sively by the Kalman filter The matrices L that define the optimal controllaw do not depend on the noise covariances or filter coefficients and thematrices K that define the optimal filter do not depend on the cost andcontrol law

In the case of control-dependent and state-dependent noise the aboveindependence properties no longer hold This complicates the problem sub-stantially and forces us to adopt a more restricted formulation in the interestof analytical tractability We assume that as in equation 32 the entire his-tory of control and feedback signals is summarized by a state estimate xt which is all the information available to the control system at time t Thefeedback control law ut(middot) is allowed to be an arbitrary function of xt butxt can be updated only by a recursive linear filter of the form

xt+1 = Axt + But + Kt(yt minus H xt) + ηt

The internal noise ηt isin Rm has mean 0 and covariance η ge 0 The fil-

ter gains K1 Knminus1 are nonadaptive they are determined in advanceand cannot change as a function of the specific controls and observa-tions within a simulation run Such a filter is always unbiased for anyK1 Knminus1 we have E [xt| xt] = xt for all t Note however that underthe extended noise model any nonadaptive linear filter is suboptimalwhen xt is computed as defined above Cov [xt| xt] is generally larger thanCov [xt|u1 utminus1 y1 ytminus1] The consequences of this will be explorednumerically in section 8

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 5: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1088 E Todorov

2 Noise Characteristics of the Sensorimotor System

Noise in the motor output is not additive but instead increases with themagnitude of the control signals This is intuitively obvious if you restyour arm on the table it does not bounce around (ie the passive plantdynamics have little noise) but when you make a movement (ie generatecontrol signals) the outcome is not always as desired Quantitatively therelationship between motor noise and control magnitude is surprisinglysimple Such noise has been found to be multiplicative the standard de-viation of muscle force is well fit with a linear function of the mean forcein both static (Sutton amp Sykes 1967 Todorov 2002) and dynamic (SchmidtZelaznick Hawkins Frank amp Quinn 1979) isometric force tasks The exactreasons for this dependence are not entirely clear although it can be ex-plained at least in part with Poisson noise on the neural level combined withHennemanrsquos size principle of motoneuron recruitment (Jones Hamilton ampWolpert 2002) To formalize the empirically established dependence let ube a vector of control signals (corresponding to the muscle activation levelsthat the nervous system attempts to set) and ε be a vector of zero-mean ran-dom numbers A general multiplicative noise model takes the form C(u)εwhere C(u) is a matrix whose elements depend linearly on u To expressa linear relationship between a vector u and a matrix C we make the ithcolumn of C equal to Ci u where Ci are constant scaling matrices Thenwe have C(u)ε = sum

i Ci uεi where εi is the ith component of the randomvector ε

Online movement control relies on feedback from a variety of sensorymodalities with vision and proprioception typically playing the dominantrole Visual noise obviously depends on the retinal position of the objectsof interest and increases with distance away from the fovea (ie eccen-tricity) The accuracy of visual positional estimates is again surprisinglywell modeled with multiplicative noise whose standard deviation is pro-portional to eccentricity This is an instantiation of Weberrsquos law and hasbeen found to be quite robust in a variety of interval discrimination ex-periments (Burbeck amp Yap 1990 Whitaker amp Latham 1997) We have alsoconfirmed this scaling law in a visuomotor setting where subjects pointedto memorized targets presented in the visual periphery (Todorov 1998)Such results motivate the use of a multiplicative observation noise modelof the form D (x)ε = sum

i Di xεi where x is the state of the plant and environ-ment including the current fixation point and the positions and velocities ofrelevant objects Incorporating state-dependent noise in analyses of senso-rimotor control can allow more accurate modeling of the effects of feedbackand various experimental perturbations it also can effectively induce a costfunction over eye movement patterns and allow us to predict the eye move-ments that would result in optimal hand performance (Todorov 1998) Notethat if other forms of state-dependent sensory noise are found the modelcan still be useful as a linear approximation

Methods for Optimal Sensorimotor Control 1089

Intelligent control of a partially observable stochastic plant requiresa feedback control law which is typically a function of a state estimatethat is computed recursively over time In engineering applications theestimation-control loop is implemented in a noiseless digital computer andso all noise is external In models of biological movement we usually makethe same assumption treating all noise as being a property of the muscu-loskeletal plant or the sensory apparatus This is in principle unrealisticbecause neural representations are likely subject to internal fluctuationsthat do not arise in the periphery It is also unrealistic in modeling practiceAn ideal observer model predicts that the estimation error covariance ofany stationary feature of the environment will asymptote to 0 In partic-ular such models predict that if we view a stationary object in the visualperiphery long enough we should eventually know exactly where it is andbe able to reach for it as accurately as if it were at the center of fixation Thiscontradicts our intuition as well as experimental data Both interval dis-crimination experiments and reaching to remembered peripheral targetsexperiments indicate that estimation errors asymptote rather quickly butnot to 0 Instead the asymptote level depends linearly on eccentricity Thesimplest way to model this is to assume another noise process which wecall internal noise acting directly on whatever state estimate the nervoussystem chooses to compute

3 Problem Statement and Assumptions

Consider a linear dynamical system with state xt isin Rm control ut isin R

pfeedback yt isin R

k in discrete time t

Dynamics xt+1 = Axt + But + ξt +csum

i=1

εitCi ut

Feedback yt = Hxt + ωt +dsum

i=1

εit Di xt

Cost per step xTt Qtxt + uT

t Rut

(31)

The feedback signal yt is received after the control signal ut has been gen-erated The initial state has known mean x1 and covariance 1 All matricesare known and have compatible dimensions making them time varyingis straightforward The control cost matrix R is symmetric positive defi-nite (R gt 0) and the state cost matrices Q1 Qn are symmetric positivesemidefinite (Qt ge 0) Each movement lasts n time steps at t = n the finalcost is xT

n Qnxn and un is undefined The independent random variablesξt isin R

m ωt isin Rk εt isin R

c and εt isin Rd have multidimensional gaussian dis-

tributions with mean 0 and covariances ξ ge 0 ω gt 0 ε = I and ε = Irespectively Thus the control-dependent and state-dependent noise termsin equation 31 have covariances

sumi Ci utuT

t CTi and

sumi Di xtxT

t DTi When the

1090 E Todorov

control-dependent noise is meant to be added to the control signal (which isusually the case) the matrices Ci should have the form B Fi where Fi are theactual noise scaling factors Then the control-dependent part of the plantdynamics becomes B(I + sum

i εit Fi )ut

The problem of optimal control is to find the optimal control law thatis the sequence of causal control functions ut(u1 utminus1 y1 ytminus1) thatminimize the expected total cost over the movement Note that computingthe optimal sequence of functions u1(middot) unminus1(middot) is a different and ingeneral much more difficult problem than computing the optimal sequenceof open-loop controls u1 unminus1

When only additive noise is present (ie C1 Cc = 0 and D1 Dd =0) this reduces to the classic LQG problem which has the well-knownoptimal solution (Davis amp Vinter 1985)

Linear-Quadratic Regulator Kalman Filterut = minusLtxt xt+1 = A xt + But + Kt (yt minus H xt)

Lt = (R + BTSt+1 B)minus1 BTSt+1 A Kt = At HT(Ht HT + ω)minus1

St = Qt + ATSt+1(Aminus BLt) t+1 = ξ + (Aminus Kt H) t AT

(32)

In that case the optimal control law depends on the history of control andfeedback signals only through the state estimate xt which is updated recur-sively by the Kalman filter The matrices L that define the optimal controllaw do not depend on the noise covariances or filter coefficients and thematrices K that define the optimal filter do not depend on the cost andcontrol law

In the case of control-dependent and state-dependent noise the aboveindependence properties no longer hold This complicates the problem sub-stantially and forces us to adopt a more restricted formulation in the interestof analytical tractability We assume that as in equation 32 the entire his-tory of control and feedback signals is summarized by a state estimate xt which is all the information available to the control system at time t Thefeedback control law ut(middot) is allowed to be an arbitrary function of xt butxt can be updated only by a recursive linear filter of the form

xt+1 = Axt + But + Kt(yt minus H xt) + ηt

The internal noise ηt isin Rm has mean 0 and covariance η ge 0 The fil-

ter gains K1 Knminus1 are nonadaptive they are determined in advanceand cannot change as a function of the specific controls and observa-tions within a simulation run Such a filter is always unbiased for anyK1 Knminus1 we have E [xt| xt] = xt for all t Note however that underthe extended noise model any nonadaptive linear filter is suboptimalwhen xt is computed as defined above Cov [xt| xt] is generally larger thanCov [xt|u1 utminus1 y1 ytminus1] The consequences of this will be explorednumerically in section 8

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 6: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1089

Intelligent control of a partially observable stochastic plant requiresa feedback control law which is typically a function of a state estimatethat is computed recursively over time In engineering applications theestimation-control loop is implemented in a noiseless digital computer andso all noise is external In models of biological movement we usually makethe same assumption treating all noise as being a property of the muscu-loskeletal plant or the sensory apparatus This is in principle unrealisticbecause neural representations are likely subject to internal fluctuationsthat do not arise in the periphery It is also unrealistic in modeling practiceAn ideal observer model predicts that the estimation error covariance ofany stationary feature of the environment will asymptote to 0 In partic-ular such models predict that if we view a stationary object in the visualperiphery long enough we should eventually know exactly where it is andbe able to reach for it as accurately as if it were at the center of fixation Thiscontradicts our intuition as well as experimental data Both interval dis-crimination experiments and reaching to remembered peripheral targetsexperiments indicate that estimation errors asymptote rather quickly butnot to 0 Instead the asymptote level depends linearly on eccentricity Thesimplest way to model this is to assume another noise process which wecall internal noise acting directly on whatever state estimate the nervoussystem chooses to compute

3 Problem Statement and Assumptions

Consider a linear dynamical system with state xt isin Rm control ut isin R

pfeedback yt isin R

k in discrete time t

Dynamics xt+1 = Axt + But + ξt +csum

i=1

εitCi ut

Feedback yt = Hxt + ωt +dsum

i=1

εit Di xt

Cost per step xTt Qtxt + uT

t Rut

(31)

The feedback signal yt is received after the control signal ut has been gen-erated The initial state has known mean x1 and covariance 1 All matricesare known and have compatible dimensions making them time varyingis straightforward The control cost matrix R is symmetric positive defi-nite (R gt 0) and the state cost matrices Q1 Qn are symmetric positivesemidefinite (Qt ge 0) Each movement lasts n time steps at t = n the finalcost is xT

n Qnxn and un is undefined The independent random variablesξt isin R

m ωt isin Rk εt isin R

c and εt isin Rd have multidimensional gaussian dis-

tributions with mean 0 and covariances ξ ge 0 ω gt 0 ε = I and ε = Irespectively Thus the control-dependent and state-dependent noise termsin equation 31 have covariances

sumi Ci utuT

t CTi and

sumi Di xtxT

t DTi When the

1090 E Todorov

control-dependent noise is meant to be added to the control signal (which isusually the case) the matrices Ci should have the form B Fi where Fi are theactual noise scaling factors Then the control-dependent part of the plantdynamics becomes B(I + sum

i εit Fi )ut

The problem of optimal control is to find the optimal control law thatis the sequence of causal control functions ut(u1 utminus1 y1 ytminus1) thatminimize the expected total cost over the movement Note that computingthe optimal sequence of functions u1(middot) unminus1(middot) is a different and ingeneral much more difficult problem than computing the optimal sequenceof open-loop controls u1 unminus1

When only additive noise is present (ie C1 Cc = 0 and D1 Dd =0) this reduces to the classic LQG problem which has the well-knownoptimal solution (Davis amp Vinter 1985)

Linear-Quadratic Regulator Kalman Filterut = minusLtxt xt+1 = A xt + But + Kt (yt minus H xt)

Lt = (R + BTSt+1 B)minus1 BTSt+1 A Kt = At HT(Ht HT + ω)minus1

St = Qt + ATSt+1(Aminus BLt) t+1 = ξ + (Aminus Kt H) t AT

(32)

In that case the optimal control law depends on the history of control andfeedback signals only through the state estimate xt which is updated recur-sively by the Kalman filter The matrices L that define the optimal controllaw do not depend on the noise covariances or filter coefficients and thematrices K that define the optimal filter do not depend on the cost andcontrol law

In the case of control-dependent and state-dependent noise the aboveindependence properties no longer hold This complicates the problem sub-stantially and forces us to adopt a more restricted formulation in the interestof analytical tractability We assume that as in equation 32 the entire his-tory of control and feedback signals is summarized by a state estimate xt which is all the information available to the control system at time t Thefeedback control law ut(middot) is allowed to be an arbitrary function of xt butxt can be updated only by a recursive linear filter of the form

xt+1 = Axt + But + Kt(yt minus H xt) + ηt

The internal noise ηt isin Rm has mean 0 and covariance η ge 0 The fil-

ter gains K1 Knminus1 are nonadaptive they are determined in advanceand cannot change as a function of the specific controls and observa-tions within a simulation run Such a filter is always unbiased for anyK1 Knminus1 we have E [xt| xt] = xt for all t Note however that underthe extended noise model any nonadaptive linear filter is suboptimalwhen xt is computed as defined above Cov [xt| xt] is generally larger thanCov [xt|u1 utminus1 y1 ytminus1] The consequences of this will be explorednumerically in section 8

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 7: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1090 E Todorov

control-dependent noise is meant to be added to the control signal (which isusually the case) the matrices Ci should have the form B Fi where Fi are theactual noise scaling factors Then the control-dependent part of the plantdynamics becomes B(I + sum

i εit Fi )ut

The problem of optimal control is to find the optimal control law thatis the sequence of causal control functions ut(u1 utminus1 y1 ytminus1) thatminimize the expected total cost over the movement Note that computingthe optimal sequence of functions u1(middot) unminus1(middot) is a different and ingeneral much more difficult problem than computing the optimal sequenceof open-loop controls u1 unminus1

When only additive noise is present (ie C1 Cc = 0 and D1 Dd =0) this reduces to the classic LQG problem which has the well-knownoptimal solution (Davis amp Vinter 1985)

Linear-Quadratic Regulator Kalman Filterut = minusLtxt xt+1 = A xt + But + Kt (yt minus H xt)

Lt = (R + BTSt+1 B)minus1 BTSt+1 A Kt = At HT(Ht HT + ω)minus1

St = Qt + ATSt+1(Aminus BLt) t+1 = ξ + (Aminus Kt H) t AT

(32)

In that case the optimal control law depends on the history of control andfeedback signals only through the state estimate xt which is updated recur-sively by the Kalman filter The matrices L that define the optimal controllaw do not depend on the noise covariances or filter coefficients and thematrices K that define the optimal filter do not depend on the cost andcontrol law

In the case of control-dependent and state-dependent noise the aboveindependence properties no longer hold This complicates the problem sub-stantially and forces us to adopt a more restricted formulation in the interestof analytical tractability We assume that as in equation 32 the entire his-tory of control and feedback signals is summarized by a state estimate xt which is all the information available to the control system at time t Thefeedback control law ut(middot) is allowed to be an arbitrary function of xt butxt can be updated only by a recursive linear filter of the form

xt+1 = Axt + But + Kt(yt minus H xt) + ηt

The internal noise ηt isin Rm has mean 0 and covariance η ge 0 The fil-

ter gains K1 Knminus1 are nonadaptive they are determined in advanceand cannot change as a function of the specific controls and observa-tions within a simulation run Such a filter is always unbiased for anyK1 Knminus1 we have E [xt| xt] = xt for all t Note however that underthe extended noise model any nonadaptive linear filter is suboptimalwhen xt is computed as defined above Cov [xt| xt] is generally larger thanCov [xt|u1 utminus1 y1 ytminus1] The consequences of this will be explorednumerically in section 8

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 8: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1091

4 Optimal Controller

The optimal ut will be computed using the method of dynamic program-ming We will show by induction that if the true state at time t is xt andthe unbiased state estimate available to the control system is xt then theoptimal cost-to-go function (ie the cost expected to accumulate under theoptimal control law) has the quadratic form

vt(xt xt) = xTt Sx

t xt + (xt minus xt)TSet (xt minus xt) + st = xT

t Sxt xt + eT

t Set et + st

where et xt minus xt is the estimation error At the final time t = n the optimalcost-to-go is simply the final cost xT

n Qnxn and so vn is in the assumed formwith Sx

n = Qn Sen = 0 sn = 0 To carry out the induction proof we have to

show that if vt+1 is in the above form for some t lt n then vt is also in thatform

Consider a time-varying control law that is optimal at times t + 1 nand at time t is given by ut = π ( xt) Let vπ

t (xt xt) be the correspondingcost-to-go function Since this control law is optimal after time t we havevπ

t+1 = vt+1 Then the cost-to-go function vπt satisfies the Bellman equation

vπt (xt xt) = xT

t Qtxt + π ( xt)T Rπ ( xt) + E [vt+1(xt+1 xt+1)|xt xt π ]

To compute the above expectation term we need the update equationsfor the system variables Using the definitions of the observation yt andthe estimation error et the stochastic dynamics of the variables of interestbecome

xt+1 = Axt + Bπ ( xt) + ξt +sum

i

εitCiπ ( xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt +sum

i

εitCiπ ( xt) minus

sumi

εit Kt Di xt

(41)

Then the conditional means and covariances of xt+1 and et+1 are

E [xt+1|xt xt π ] = Axt + Bπ ( xt)

E [et+1|xt xt π ] = (Aminus Kt H)et

Cov [xt+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi

Cov [et+1|xt xt π ] = ξ +sum

i

Ciπ ( xt)π ( xt)TCTi + η

+ Ktω K T

t +sum

i

Kt Di xtxTt DT

i K Tt

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 9: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1092 E Todorov

and the conditional expectation in the Bellman equation can be computedThe cost-to-go becomes

vπt (xt xt) = xT

t

(Qt + ATSx

t+1 A+ Dt)xt

+ eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et

+ tr (Mt) + π ( xt)T (R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 Axt

where we defined the shortcuts

Ct sum

i

CTi

(Se

t+1 + Sxt+1

)Ci

Dt sum

i

DTi K T

t Set+1 Kt Di and

Mt Sxt+1

ξ + Set+1

(ξ + η + Kt

ω K Tt

)

Note that the control law affects only the cost-go-to function through anexpression that is quadratic in π ( xt) which can be minimized analyticallyBut there is a problem the minimum depends on xt while π is only allowedto be a function of xt To obtain the optimal control law at time t we haveto take an expectation over xt conditional on xt and find the function π

that minimizes the resulting expression Note that the control-dependentexpression is linear in xt and so its expectation depends on the conditionalmean of xt but not on any higher moments Since E [xt| xt] = xt we have

E[vπ

t (xt xt)| xt] = const + π ( xt)T

(R + BTSx

t+1 B + Ct)π ( xt)

+ 2π ( xt)T BTSxt+1 A xt

and thus the optimal control law at time t is

ut = π ( xt) = minusLtxt Lt (R + BTSx

t+1 B + Ct)minus1

BTSxt+1 A

Note that the linear form of the optimal control law fell out of the opti-mization and was not assumed Given our assumptions the matrix beinginverted is symmetric positive-definite

To complete the induction proof we have to compute the optimal cost-to-go vt which is equal to vπ

t when π is set to the optimal control law minusLtxt Using the fact that LT

t (R + BTSxt+1 B + Ct)Lt = LT

t BTSxt+1 A = ATSx

t+1 BLt andthat xT Z x minus 2 xT Zx = (x minus x )T Z(x minus x ) minus xT Zx = eT Ze minus xT Zx for a sym-metric matrix Z (in our case equal to LT

t BTSxt+1 A) the result is

vt(xt xt) = xTt

(Qt + ATSx

t+1(Aminus BLt) + Dt)xt + tr (Mt) + st+1

+ eTt

(ATSx

t+1 BLt + (Aminus Kt H)TSet+1(Aminus Kt H)

)et

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 10: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1093

We now see that the optimal cost-to-go function remains in the assumedquadratic form which completes the induction proof The optimal controllaw is computed recursively backward in time as

Controller ut = minusLtxt

Lt =(

R + BTSxt+1 B +

sumi

CTi

(Sx

t+1 + Set+1

)Ci

)minus1

BTSxt+1 A

Sxt = Qt + ATSx

t+1(Aminus BLt) +sum

i

DTi K T

t Set+1 Kt Di Sx

n = Qn

Set = ATSx

t+1 BLt + (Aminus Kt H)T Set+1 (Aminus Kt H) Se

n = 0

st = tr(Sx

t+1ξ + Se

t+1

(ξ + η + Kt

ω K Tt

)) + st+1 sn = 0

(42)

The total expected cost is x T1 Sx

1 x1 + tr((Sx1 + Se

1 )1) + s1When the control-dependent and state-dependent noise terms are re-

moved (ie C1 Cc = 0 D1 Dd = 0) the control laws given byequation 42 and 32 are identical The internal noise term η as well asthe additive noise terms ξ and ω do not directly affect the calculation ofthe feedback gain matrices L However all noise terms affect the calculation(see below) of the optimal filter gains K which in turn affect L

One can attempt to transform equation 31 into a fully observable systemby setting H = I ω = η = 0 D1 Dd = 0 in which case K = A andapply equation 42 Recall however our assumption that the control signalis generated before the current state is measured Thus even if we makethe sensory measurement equal to the state we would still be dealing witha partially observable system To derive the optimal controller for the fullyobservable case we have to assume that xt is known at the time when ut isgenerated The above derivation is now much simplified the optimal cost-to-go function vt is in the form xT

t Stxt + st and the expectation term thatneeds to be minimized with regard to ut = π (xt) becomes

E [vt+1] = (Axt + But)TSt+1(Axt + But)

+ uTt

(sumi

CTi St+1Ci

)ut + tr [St+1

ξ ] + st+1

and the optimal controller is computed in a backward pass through time as

Fully observable controller ut = minusLtxt

Lt =(

R + BTSt+1 B +sum

i

CTi St+1Ci

)minus1

BTSt+1 A

St = Qt + ATSt+1(Aminus BLt) Sn = Qn

st = tr (St+1ξ ) + st+1 sn = 0

(43)

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 11: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1094 E Todorov

5 Optimal Estimator

So far we have computed the optimal control law L for any fixed sequenceof filter gains K What should these gains be fixed to Ideally they shouldcorrespond to a Kalman filter which is the optimal linear estimator How-ever in the presence of control-dependent and state-dependent noise theKalman filter gains become adaptive (ie Kt depends on xt and ut) whichwould make our control law derivation invalid Thus if we want to preservethe optimality of the control law given by equation 42 and obtain an iter-ative algorithm with guaranteed convergence we need to compute a fixedsequence of filter gains that are optimal for a given control law Once theiterative algorithm has converged and the control law has been designedwe could use an adaptive filter in place of the fixed-gain filter in run time(see section 8)

Thus our objective here is the following given a linear feedback controllaw L1 Lnminus1 (which is optimal for the previous filter K1 Knminus1)compute a new filter that in conjunction with the given control law resultsin minimal expected cost In other words we will evaluate the filter not bythe magnitude of its estimation errors but by the effect that these estimationerrors have on the performance of the composite estimation-control system

We will show that the new optimal filter can be designed in a forwardpass through time In particular we will show that regardless of the newvalues of K1 Ktminus1 the optimal Kt can be found analytically as long asKt+1 Knminus1 still have the values for which Lt+1 Lnminus1 are optimalRecall that the optimal Lt+1 Lnminus1 depend only on Kt+1 Knminus1 andso the parameters (as well as the form) of the optimal cost-to-go functionvt+1 cannot be affected by changing K1 Kt Since Kt affects only thecomputation of xt+1 and the effect of xt+1 on the total expected cost is cap-tured by the function vt+1 we have to minimize vt+1 with respect to Kt But v is a function of x and x while K cannot be adapted to the specificvalues of x and x within a simulation run (by assumption) Thus thequantity we have to minimize is the unconditional expectation of vt+1 Indoing so we will use that fact that

E [vt+1(xt+1 xt+1)] = Ext xt [E [vt+1(xt+1 xt+1)|xt xt Lt]]

The conditional expectation was already computed as an intermediate stepin the previous section (not shown) The terms in E [vt+1(xt+1 xt+1)|xt xt Lt]that depend on Kt are

eTt (Aminus Kt H)TSe

t+1(Aminus Kt H)et + tr

(Kt

(ω +

sumi

Di xtxTt DT

i

)K T

t Set+1

)

Defining the (uncentered) unconditional covariances et E [eteT

t ] andx

t E [xtxTt ] the unconditional expectation of the Kt-dependent expression

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 12: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1095

above becomes

a (Kt) = tr((

(Aminus Kt H)et (Aminus Kt H)T + KtPt K T

t

)Se

t+1

)Pt ω +

sumi

Dixt DT

i

The minimum of a (Kt) is found by setting its derivative with regard to Kt to0 Using the matrix identities part

part X tr (XU) = UT and partpart X tr

(XU XTV

) = VXU +VT XUT and the fact that the matrices Se

t+1 ω e

t xt are symmetric we

obtainparta (Kt)partKt

= 2Set+1

(Kt

(He

t HT + Pt) minus Ae

t HT)

This expression is equal to 0 whenever Kt = Aet HT(He

t HT + Pt)minus1 re-gardless of the value of Se

t+1 Given our assumptions the matrix being in-verted is symmetric positive-definite Note that the optimal Kt dependson K1 Ktminus1 (through e

t and xt ) but is independent of Kt+1 Knminus1

(since it is independent of Set+1) This is the reason that the filter gains are

reoptimized in a forward passTo complete the derivation we have to substitute the optimal filter

gains and compute the unconditional covariances Recall that the variablesxt xt et are deterministically related by et = xt minus xt so the covariance ofany one of them can be computed given the covariances of the other twoand we have a choice of which pair of covariance matrices to computeThe resulting equations are most compact for the pair xt et The stochasticdynamics of these variables are

xt+1 = (Aminus BLt) xt + Kt Het + Ktωt + ηt +sum

i

εit Kt Di (et + xt)

et+1 = (Aminus Kt H)et + ξt minus Ktωt minus ηt minussum

i

εitCi Ltxt (51)

minussum

i

εit Kt Di (et + xt)

Define the unconditional covariances

et E

[eteT

t

] xt E

[xtxT

t

] xet E

[xteT

t

]

noting that xt is uncentered and ex

t = (xet )T Since x1 is a known constant

the initialization at t = 1 is e1 = 1 x

1 = x1xT1 xe

1 = 0 With these defi-nitions we have x

t = E [(et + xt)(et + xt)T] = et + x

t + xet + xeT

t Usingequation 51 the updates for the unconditional covariances are

et+1 = (Aminus Kt H)e

t (Aminus Kt H)T + ξ + η + KtPt K Tt

+sum

i

Ci Ltxt LT

t CTi

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 13: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1096 E Todorov

xt+1 = (Aminus BLt)x

t (Aminus BLt)T + η + Kt(He

t HT + Pt)K T

t

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T

xet+1 = (Aminus BLt)xe

t (Aminus Kt H)T + Kt Het (Aminus Kt H)T

minus η minus KtPt K Tt

Substituting the optimal value of Kt which allows some simplifications tothe above update equations the optimal nonadaptive linear filter is com-puted in a forward pass through time as

Estimator xt+1 = (Aminus BLt) xt + Kt(yt minus H xt) + ηt

Kt = Aet HT

(He

t HT + ω +sum

i

Di(e

t + xt + xe

t + ext

)DT

i

)minus1

et+1 = ξ + η + (Aminus Kt H)e

t AT +sum

i

Ci Ltxt LT

t CTi e

1 = 1

xt+1 = η + Kt He

t AT + (Aminus BLt)xt (Aminus BLt)T

+ (Aminus BLt)xet HT K T

t + Kt Hext (Aminus BLt)T x

1 = x1xT1

xet+1 = (Aminus BLt) xe

t (Aminus Kt H)T minus η xe1 = 0

(52)It is worth noting the effects of the internal noise ηt If that term did not

exist (ie η = 0) the last update equation would yield xet = 0 for all t

Indeed for an optimal filter one would expect xet = 0 from the orthogo-

nality principle if the state estimate and estimation error were correlatedone could improve the filter by taking that correlation into account How-ever the situation here is different because we have noise acting directlyon the state estimate When such noise pushes xt in one direction et is (bydefinition) pushed in the opposite direction creating a negative correlationbetween xt and et This is the reason for the negative sign in front of the η

term in the last update equationThe complete algorithm is the followingAlgorithmInitialize K1 Knminus1 and iterate equation 42 and equation 52 until

convergence Convergence is guaranteed because the expected cost is non-negative by definition and we are using a coordinate-descent algorithmwhich decreases the expected cost in each step The initial sequence K couldbe set to 0mdashin which case the first pass of equation 42 will find the optimalopen-loop controls or initialized from equation 32mdashwhich is equivalent toassuming additive noise in the first pass

We can also derive the optimal adaptive linear filter with gains Kt thatdepend on the specific xt and ut = minusLtxt within each simulation run This isagain accomplished by minimizing E [vt+1] with respect to Kt but the expec-tation is computed with xt being a known constant rather than a randomvariable We now have x

t = xtxTt and xe

t = 0 and so the last two update

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 14: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1097

equations in equation 52 are no longer needed The optimal adaptive linearfilter is

Adaptive estimator xt+1 = (Aminus BLt) xt + Kt (yt minus H xt) + ηt

Kt = At HT

(Ht HT + ω +

sumi

Di(t + xt xT

t

)DT

i

)minus1

t+1 = ξ + η + (Aminus Kt H) t AT +sum

i

Ci Ltxt xTt LT

t CTi

(53)

where t = Cov [xt| xt] is the conditional estimation error covariance(initialized from 1 which is given) When the control-dependentstate-dependent and internal noise terms are removed (C1 Cc = 0D1 Dd = 0 η = 0) equation 53 reduces to the Kalman filter in equa-tion 32 Note that using equation 53 instead of equation 52 online reducesthe total expected cost because equation 53 achieves lower estimation errorthan any other linear filter and the expected cost depends on the conditionalestimation error covariance This can be seen from

E[vt(xt xt)| xt] = xTt Sx

t xt + st + tr((

Sxt + Se

t

)Cov[xt| xt]

)6 Application to Reaching Movements

We now illustrate how the methodology developed above can be used toconstruct models relevant to motor control Since this is a methodologi-cal rather than a modeling article a detailed evaluation of the resultingmodels in the context of the motor control literature will not be givenhere The first model is a one-dimensional model of reaching and in-cludes control-dependent noise but no state-dependent or internal noiseThe latter two forms of noise are illustrated in the second model wherewe estimate the position of a stationary peripheral target without making amovement

61 Models We model a single-joint movement (such as flexing the el-bow) that brings the hand to a specified target For simplicity the rotationalmotion is replaced with translational motion the hand is modeled as a pointmass (m = 1 kg) whose one-dimensional position at time t is p(t) The com-bined action of all muscles is represented with the force f (t) acting on thehand The control signal u(t) is transformed into force f (t) by adding control-dependent multiplicative noise and applying a second-order muscle-likelow-pass filter (Winter 1990) of the form τ1τ2 f (t) + (τ1 + τ2) f(t) + f (t) =u(t) with time constants τ1 = τ2 = 004 sec Note that a second-order filtercan be written as a pair of coupled first-order filters (with outputs g and f )as follows τ1g(t) + g(t) = u(t) τ2 f (t) + f (t) = g(t)

The task is to move the hand from the starting position p(0) = 0 m to thetarget position plowast = 01 m and stop there at time tend with minimal energy

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 15: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1098 E Todorov

consumption Movement durations are in the interval tend isin [025 sec035 sec] Time is discretized at = 001 sec The total cost is defined as

(p(tend) minus plowast)2 + (wv p(tend))2 + (w f f (tend))2 + rn minus 1

nminus1sumk=1

u(k)2

The first term enforces positional accuracy the second and third termsspecify that the movement has to stop at time tend that is both the velocityand force have to vanish and the last term penalizes energy consumptionIt makes sense to set the scaling weights wv and w f so that wv p(t) andw f f (t) averaged over the movement have magnitudes similar to the handdisplacement plowast minus p(0) For a 01 m reaching movement that lasts about03 sec these weights are wv = 02 and w f = 002 The weight of the energyterm was set to r = 000001

The discrete-time system state is represented with the five-dimensionalvector

xt = [p(t) p(t) f (t) g(t) plowast]

initialized from a gaussian with mean x1 = [0 0 0 0 plowast] The auxiliarystate variable g(t) is needed to implement a second-order filter The tar-get plowast is included in the state so that we can capture the above cost func-tion using a quadratic with no linear terms defining p = [1 0 0 0 minus1]we have pTxt = p(tend) minus plowast and so xT

t (ppT)xt = (p(tend) minus plowast)2 Note thatthe same could be accomplished by setting p = [1 0 0 0 minusplowast] and xt =[p(t) p(t) f (t) g(t) 1] The advantage of the formulation used here is thatbecause the target is represented in the state the same control law can bereused for other targets The control law of course depends on the filterwhich depends on the initial expected state which depends on the targetmdashand so a control law optimal for one target is not necessarily optimal for allother targets Unpublished simulation results indicate good generalizationbut a more detailed investigation of how the optimal control law dependson the target position is needed

The sensory feedback carries information about position velocity andforce

yt = [p(t) p(t) f (t)] + ωt

The vector ωt of sensory noise terms has zero-mean gaussian distributionwith diagonal covariance

ω = (σsdiag[002 m 02 ms 1 N])2

where the relative magnitudes are set using the same order-of-magnitudereasoning as before and σs = 05 The multiplicative noise term added tothe discrete-time control signal ut = u(t) is σcεtut where σc = 05 Note that

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 16: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1099

σc is a unitless quantity that defines the noise magnitude relative to thecontrol signal magnitude

The discrete-time dynamics of the above system are

p(t + ) = p(t) + p(t)

p(t + ) = p(t) + f (t)m

f (t + ) = f (t) (1 minus τ2) + g(t)τ2

g(t + ) = g(t) (1 minus τ1) + u(t) (1 + σcεt) τ1

which is transformed into the form of equation 31 by the matrices

A =

1 0 0 0

0 1 m 0 0

0 0 1 minus τ2 τ2 0

0 0 0 1 minus τ1 0

0 0 0 0 1

B =

0

0

0

τ1

0

H =

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

C1 = Bσc c = 1 d = 0

1 = ξ = η = 0

The cost matrices are R = r Q1nminus1 = 0 and Qn = ppT + vvT + ffT where

p = [1 0 0 0 minus1] v = [0 wv 0 0 0] f = [0 0 w f 0 0]

This completes the formulation of the first model The above algorithmcan now be applied to obtain the control law and filter and the closed-loopsystem can be simulated To replace the control-dependent noise with ad-ditive noise of similar magnitude (and compare the effects of the two formsof noise) we will set c = 0 and ξ = (46 N)2 B BT The value of 46 N is theaverage magnitude of the control-dependent noise over the range of move-ment durations (found through 10000 simulation runs at each movementduration)

We also model an estimation process under state-dependent and inter-nal noise in the absence of movement In that case the state is xt = plowastwhere the stationary target plowast is sampled from a gaussian with mean x1 isin5 cm 15 cm 25 cm and variance 1 = (5 cm)2 Note that target eccentric-ity is represented as distance rather than visual angle The state-dependentnoise has scale D1 = 05 fixation is assumed to be at 0 cm the time step is = 10 msec and we run the estimation process for n = 100 time steps Inone set of simulations we use internal noise η = (05 cm)2 without addi-tive noise In another set of simulations we study additive noise with thesame magnitude ω = (05 cm)2 without internal noise There is no actu-ator to be controlled so we have A = H = 1 and B = L = 0 Estimation isbased on the adaptive filter from equation 53

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 17: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1100 E Todorov

62 Results Reaching movements are known to have stereotyped bell-shaped speed profiles (Flash amp Hogan 1985) Models of this phenomenonhave traditionally been formulated in terms of deterministic open-loop min-imization of some cost function Cost functions that penalize physicallymeaningful quantities (such as duration or energy consumption) did notagree with empirical data (Nelson 1983) in order to obtain realistic speedprofiles it appeared necessary to minimize a smoothness-related cost thatpenalizes the derivative of acceleration (Flash amp Hogan 1985) or torque(Uno et al 1989) Smoothness-related cost functions have also been usedin the context of stochastic optimal feedback control (Hoff 1992) to obtainbell-shaped speed profiles It was recently shown however that smoothnessdoes not have to be explicitly enforced by the cost function open-loop min-imization of end-point error was found sufficient to produce realistic trajec-tories provided that the multiplicative nature of motor noise is taken intoaccount (Harris amp Wolpert 1998) While this is an important step toward amore principled optimization model of trajectory smoothness it still con-tains an ad hoc element the optimization is performed in an open loopwhich is suboptimal especially for movements of longer duration Ourmodel differs from Harris and Wolpert (1998) in that not only the averagesequence of control signals is optimal but the feedback gains that determinethe online sensory-guided adjustments are also optimal Optimal feedbackcontrol of reaching has been studied by Meyer Abrams Kornblum Wrightand Smith (1988) in an intermittent setting and Hoff (1992) in a continu-ous setting However both of these models assume full state observationOurs is the first optimal control model of reaching that incorporates sensorynoise and combines state estimation and feedback control into an optimalsensorimotor loop The predicted movement kinematics shown in Figure 1Aclosely resemble observed movement trajectories (Flash amp Hogan 1985)

Another well-known property of reaching movements first observed acentury ago by Woodworth and later quantified as Fittsrsquo law is the trade-offbetween speed and accuracy The fact that faster movements are less ac-curate implies that the instantaneous noise in the motor system is control-dependent in agreement with direct measurements of isometric force fluc-tuations (Sutton and Sykes 1967 Schmidt et al 1979 Todorov 2002) thatshow standard deviation increasing linearly with the mean Naturally thisnoise scaling has formed the basis of both closed-loop (Meyer et al 1988Hoff 1992) and open-loop (Harris amp Wolpert 1998) optimization models ofthe speed-accuracy trade-off Figure 1B illustrates the effect in our model asthe (specified) movement duration increases the standard deviation of theend-point error achieved by the optimal controller decreases To emphasizethe need for incorporating control-dependent noise we modified the modelby making the noise in the plant dynamics additive with fixed magnitudechosen to match the average multiplicative noise magnitude over the rangeof movement durations With that change the end-point error showed theopposite trend to the one observed experimentally (see Figure 1B)

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 18: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1101

Figure 1 (A) Normalized position (Pos) velocity (Vel) and acceleration (Acc) ofthe average trajectory of the optimal controller (B) A separate optimal controllerwas constructed for each instructed duration the resulting closed-loop systemwas simulated for 10000 trials and the positional standard deviation at the endof the trial was plotted This was done with either multiplicative (solid line)or additive (dashed line) noise in the plant dynamics (C) The position of astationary peripheral target was estimated over time under internal estimationnoise (solid line) or additive observation noise (dashed line) This was done inthree sets of trials with target positions sampled from gaussians with means5 cm (bottom) 15 cm (middle) and 25 cm (top) Each curve is an average over10000 simulation runs

It is interesting to compare the effects of the control penalty r and the mul-tiplicative noise scaling σc As equation 42 shows both terms penalize largecontrol signalsmdashdirectly in the case of r and indirectly (via increased un-certainty) in the case of σc Consequently both terms lead to a negative biasin end-point position (not shown) but the effect is much more pronouncedfor r Another consequence of the fact that larger controls are more costlyarises in the control of redundant systems where the optimal strategy is tofollow a minimal intervention principle that is to leave task-irrelevant de-viations from the average behavior uncorrected (Todorov amp Jordan 2002a2002b) Simulations have shown that this more complex effect is dependenton σc and actually decreases when r is increased while σc is kept constant(Todorov amp Jordan 2002b)

Figure 1C shows simulation results from our second model where the po-sition of a stationary peripheral target is estimated by the optimal adaptivefilter in equation 53 operating under internal estimation noise or additiveobservation noise of the same magnitude In each case we show results forthree sets of targets with varying average eccentricity The standard devi-ations of the estimation error always reach an asymptote (much faster inthe case of internal noise) In the presence of internal noise this asymptotedepends on target eccentricity for the chosen model parameters the depen-dence is in quantitative agreement with our experimental results (Todorov1998) Under additive noise the error always asymptotes to 0

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 19: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1102 E Todorov

Figure 2 Relative change in expected cost as a function of iteration numberin (A) psychophysical models and (B) random models (C) Relative variabil-ity (SDmean) among expected costs obtained from 100 different runs of thealgorithm on the same model (average over models in each class)

7 Convergence Properties

We studied the convergence properties of the algorithm in 10 models of psy-chophysical experiments taken from Todorov and Jordan (2002b) and 200randomly generated models The psychophysical models had dynamics andcost functions similar to the above example They included two models ofplanar reaching three models of passing through sequences of targets onemodel of isometric force production three models of tracking and reach-ing with a mechanically redundant arm and one model of throwing Thedimensionalities of the state control and feedback were between 5 and20 and the horizon n was about 100 The psychophysical models includedcontrol-dependent dynamics noise and additive observation noise but nointernal or state-dependent noise The details of all these models are inter-esting from a motor control point of view but we omit them here since theydid not affect the convergence of the algorithm in any systematic way

The random models were divided into two groups of 100 each passivelystable with all eigenvalues of A being smaller than 1 and passively unsta-ble with the largest eigenvalue of A being between 1 and 2 The dynamicswere restricted so that the last component of xt was 1mdashto make the randommodels more similar to the psychophysical models which always incorpo-rated a constant in the state description The state control and measurementdimensionalities were sampled uniformly between 5 and 20 The randommodels included all forms of noise allowed by equation 31

For each model we initialized K1nminus1 from equation 32 and applied ouriterative algorithm In all cases convergence was very rapid (see Figures 2Aand 2B) with the relative change in expected cost decreasing exponentiallyThe jitter observed at the end of the minimization (see Figure 2A) is due tonumerical round-off errors (note the log scale) and continues indefinitelyThe exponential convergence regime does not always start from the firstiteration (see Figure 2A) Similar behavior was observed for the absolute

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 20: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1103

change in expected cost (not shown) As one would expect random mod-els with unstable passive dynamics converged more slowly than passivelystable models Convergence was observed in all cases

To test for the existence of local minima we focused on five psychophys-ical five random stable and five random unstable models For each modelthe algorithm was initialized 100 times with different randomly chosensequences K1nminus1 and run for 100 iterations For each model we com-puted the standard deviation of the expected cost obtained at each iterationand divided by the mean expected cost at that iteration The results aver-aged within each model class are plotted in Figure 2C The negligibly smallvalues after convergence indicate that the algorithm always finds the samesolution This was true for every model we studied despite the fact thatthe random initialization sometimes produced very large initial costs Wealso examined the K and L sequences found at the end of each run and thedifferences seemed to be due to round-off errors Thus we conjecture thatthe algorithm always converges to the globally optimal solution So far wehave not been able to prove this analytically and cannot offer a satisfyingintuitive explanation at this time

Note that the system can be unstable even for the optimal controllerFormally that does not affect the derivation because in a discrete-time finite-horizon system all numbers remain finite In practice the components of xt

can exceed the maximum floating-point number whenever the eigenvaluesof (Aminus BLt) are sufficiently large In the applications we are interestedin (Todorov 1998 Todorov amp Jordan 2002b) such problems were neverencountered

8 Improving Performance via Adaptive Estimation

Although the iterative algorithm given by equations 42 and 52 is guar-anteed to converge and empirically it appears to converge to the globallyoptimal solution performance can still be suboptimal due to the imposedrestriction to nonadaptive filters Here we present simulations aimed atquantifying this suboptimality

Because the potential suboptimality arises from the restriction to non-adaptive filters it is natural to ask what would happen if that restriction wereremoved in run time and the optimal adaptive linear filter from equation 53were used instead Recall that although the control law is optimized underthe assumption of a nonadaptive filter it yields better performance if a dif-ferent filter which somehow achieves lower estimation error is used in runtime Thus in our first test we simply replace the nonadaptive filter withequation 53 in run time and compute the reduction in expected total cost

The above discussion suggests a possibility for further improvement Thecontrol law is optimal with respect to some sequence of filter gains K1nminus1But the adaptive filter applied in run time uses systematically differentgains because it achieves systematically lower estimation error We can run

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 21: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1104 E Todorov

Table 2 Cost Reduction

Model

Method Psychophysical Random Stable Random Unstable

Adaptive Estimator 19 0 314 Reoptimized Controller 17 0 283

Notes Numbers indicate percent reduction in expected total cost relative to thecost of the solution found by our iterative algorithm The two improvement meth-ods are described in the text Each method is applied to 10 models in each modelclass For each model and method expected total cost is computed from 10000simulation runs A value of 0 indicates that with a sample size of 10 models theimprovement was not significantly different from 0 (t-test p = 005 threshold)

our control law in conjunction with the adaptive filter and find the averagefilter gains K1nminus1 that are used online Now one would think that if wereoptimized the control law for the nonadaptive filter K1nminus1 which betterreflects the gains being used online by the adaptive filter this will furtherimprove performance This is the second test we apply

As Table 2 shows neither method improves performance substantiallyfor psychophysical models or random stable models However both meth-ods result in substantial improvement for random unstable models This isnot surprising In the passively stable models the differences between theexpected and actual values of the states and controls are relatively smalland so the optimal nonadaptive filter is not that different from the optimaladaptive filter The unstable models on the other hand are very sensitive tosmall perturbations and thus follow substantially different state-control tra-jectories in different simulation runs So the advantage of adaptive filteringis much greater Since musculoskeletal plants have stable passive dynamicswe conclude that our algorithm is well suited for approximating the optimalsensorimotor system

It is interesting that control law reoptimization in addition to adaptivefiltering is actually worse than adaptive filtering alonemdashcontrary to our in-tuition This was the case for every model we studied Although it is notclear where the problem with the reoptimization method lies this some-what unexpected result provides further justification for the restriction weintroduced In particular it suggests that the control law that is optimalunder the best nonadaptive filter may be close to optimal under the bestadaptive filter

9 Discussion

We have presented an algorithm for stochastic optimal control and estima-tion of partially observable linear dynamical systems subject to quadraticcosts and noise processes characteristic of the sensorimotor system (see

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 22: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1105

equation 31) We restricted our attention to controllers that use stateestimates obtained by nonadaptive linear filters The optimal control law forany such filter was shown to be linear as given by equation 42 The optimalnonadaptive linear filter for any linear control law is given by equation 52Iteration of equations 42 and 52 is guaranteed to converge to a filter and acontrol law optimal with respect to each other We found numerically thatconvergence is exponential local minima do not to exist and the effectsof assuming nonadaptive filtering are negligible for the control problemsof interest The application of the algorithm was illustrated in the context ofreaching movements The optimal adaptive filter equation 53 as well asthe optimal controller for the fully observable case equation 43 were alsoderived To facilitate the application of our algorithm in the field of motorcontrol and elsewhere we have made a Matlab implementation availableat wwwcogsciucsdedusimtodorov

While our work was motivated by models of biological movement theresults presented here could be of interest to a wider audience Problemswith multiplicative noise have been studied in the optimal control literaturebut most of that work has focused on the fully observable case (Kleinman1969 McLane 1971 Willems amp Willems 1976 Bensoussan 1992 El Ghaoui1995 Beghi amp DrsquoAlessandro 1998 Rami et al 2001) Our equation 43 is con-sistent with these results The partially observable case that we addressed(and that is most relevant to models of sensorimotor control) is much morecomplex because the independence of estimation and control breaks downin the presence of signal-dependent noise The work most similar to ours isPakshin (1978) for discrete-time dynamics and Phillis (1985) for continuous-time dynamics These authors addressed a closely related problem usinga different methodology Instead of analyzing the closed-loop system di-rectly the filter and control gains were treated as open-loop controls to amodified deterministic dynamical system whose cost function matches theexpected cost of the original system With that transformation it is pos-sible to use Pontryaginrsquos maximum principle which is applicable only todeterministic open-loop control and obtain necessary conditions that theoptimal filter and control gains must satisfy Although our results wereobtained independently we have been able to verify that they are consis-tent with Pakshin (1978) by removing from our model the internal estima-tion noise (which to our knowledge has not been studied before) combin-ing equations 42 and 52 and applying certain algebraic transformationsHowever our approach has three important advantages First we managedto prove that the optimal control law is linear under a nonadaptive filterwhile this linearity had to be assumed before Second using the optimalcost-to-go function to derive the optimal filter revealed that adaptive filter-ing improves performance even though the control law is optimized for anonadaptive filter And most important our approach yields a coordinate-descent algorithm with guaranteed convergence as well as appealing nu-merical properties illustrated in sections 7 and 8 Each of the two steps of our

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 23: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1106 E Todorov

coordinate-descent algorithm is computed efficiently in a single passthrough time In contrast application of Pontryaginrsquos maximum principleyields a system of coupled difference (Pakshin 1978) or differential (Phillis1985) equations with boundary conditions at the initial and final time butno algorithm for solving that system In other words earlier approachesobscure the key property we uncovered that half of the problem can besolved efficiently given a solution to the other half

Finally there may be an efficient way to obtain a control law that achievesbetter performance under adaptive filtering Our attempt to do so throughreoptimization (see section 8) failed but another approach is possibleUsing the optimal adaptive filter (see equation 53) would make E [vt+1]a complex function of xt ut and the resulting vt would no longer be in theassumed parametric form (which is why we introduced the restriction tononadaptive filters) But we could force that complex vt in the desired formby approximating it with a quadratic in xt ut This yields additional terms inequation 42 We have pursued this idea in our earlier work (Todorov 1998)an independent but related method has been developed by Moore Zhouand Lim (1999) The problem with such approximations is that convergenceguarantees no longer seem possible While Moore et al did not illustratetheir method with numerical examples in our work we have found thatthe resulting algorithm is not always stable These difficulties convincedus to abandon the earlier idea in favor of the methodology presented hereNevertheless approximations that take adaptive filtering into account mayyield better control laws under certain conditions and deserve further in-vestigation Note however that the resulting control laws will have to beused in conjunction with an adaptive filter which is much less efficient interms of online computation

Acknowledgments

Thanks to Weiwei Li for comments on the manuscript This work was sup-ported by NIH grant R01-NS045915

References

Anderson F amp Pandy M (2001) Dynamic optimization of human walking JBiomech Eng 123(5) 381ndash390

Beghi A amp DrsquoAlessandro D (1998) Discrete-time optimal control with control-dependent noise and generalized Riccati difference equations Automatica 341031ndash1034

Bensoussan A (1992) Stochastic control of partially observable systems CambridgeCambridge University Press

Bertsekas D amp Tsitsiklis J (1997) Neuro-dynamic programming Belmont MAAthena Scientific

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 24: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

Methods for Optimal Sensorimotor Control 1107

Burbeck C amp Yap Y (1990) Two mechanisms for localization Evidence forseparation-dependent and separation-independent processing of position infor-mation Vision Research 30(5) 739ndash750

Chow C amp Jacobson D (1971) Studies of human locomotion via optimal program-ming Math Biosciences 10 239ndash306

Davis M amp Vinter R (1985) Stochastic modelling and control London Chapman andHall

El Ghaoui L (1995) State-feedback control of systems of multiplicative noise vialinear matrix inequalities Systems and Control Letters 24 223ndash228

Flash T amp Hogan N (1985) The coordination of arm movements An exper-imentally confirmed mathematical model Journal of Neuroscience 5(7) 1688ndash1703

Harris C amp Wolpert D (1998) Signal-dependent noise determines motor planningNature 394 780ndash784

Hatze H amp Buys J (1977) Energy-optimal controls in the mammalian neuromus-cular system Biol Cybern 27(1) 9ndash20

Hoff B (1992) A computational description of the organization of human reaching andprehension Unpublished doctoral dissertation University of Southern California

Jacobson D amp Mayne D (1970) Differential dynamic programming New YorkElsevier

Jones K Hamilton A amp Wolpert D (2002) Sources of signal-dependent noiseduring isometric force production Journal of Neurophysiology 88 1533ndash1544

Kleinman D (1969) Optimal stationary control of linear systems with control-dependent noise IEEE Transactions on Automatic Control AC-14(6) 673ndash677

Kording K amp Wolpert D (2004) The loss function of sensorimotor learning Pro-ceedings of the National Academy of Sciences 101 9839ndash9842

Kuo A (1995) An optimal control model for analyzing human postural balanceIEEE Transactions on Biomedical Engineering 42 87ndash101

Kushner H amp Dupuis P (2001) Numerical methods for stochastic optimal control prob-lems in continuous time (2nd ed) New York Springer

Li W amp Todorov E (2004) Iterative linear-quadratic regulator design for nonlinearbiological movement systems In First International Conference on Informatics inControl Automation and Robotics vol 1 222ndash229 NP INSTICC Press

Loeb G Levine W amp He J (1990) Understanding sensorimotor feedback throughoptimal control Cold Spring Harb Symp Quant Biol 55 791ndash803

McLane P (1971) Optimal stochastic control of linear systems with state- andcontrol-dependent disturbances IEEE Transactions on Automatic Control AC-16(6)793ndash798

Meyer D Abrams R Kornblum S Wright C amp Smith J (1988) Optimality inhuman motor performance Ideal control of rapid aimed movements PsychologicalReview 95 340ndash370

Moore J Zhou X amp Lim A (1999) Discrete time LQG controls with control de-pendent noise Systems and Control Letters 36 199ndash206

Nelson W (1983) Physical principles for economies of skilled movements BiologicalCybernetics 46 135ndash147

Pakshin P (1978) State estimation and control synthesis for discrete linear systemswith additive and multiplicative noise Avtomatika i Telemetrika 4 75ndash85

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004

Page 25: Stochastic Optimal Control and Estimation Methods Adapted to …e.guigon.free.fr/rsc/article/Todorov05.pdf · 2011-10-17 · ods for optimal feedback control of stochastic, partially

1108 E Todorov

Phillis Y (1985) Controller design of systems with multiplicative noise IEEE Trans-actions on Automatic Control AC-30(10) 1017ndash1019

Rami M Chen X amp Moore J (2001) Solvability and asymptotic behavior of gener-alized Riccati equations arising in indefinite stochastic LQ problems IEEE Trans-actions on Automatic Control 46(3) 428ndash440

Schmidt R Zelaznik H Hawkins B Frank J amp Quinn J (1979) Motor-outputvariability A theory for the accuracy of rapid motor acts Psychol Rev 86(5)415ndash451

Sutton G amp Sykes K (1967) The variation of hand tremor with force in healthysubjects Journal of Physiology 191(3) 699ndash711

Sutton R amp Barto A (1998) Reinforcement learning An introduction CambridgeMA MIT Press

Todorov E (1998) Studies of goal-directed movements Unpublished doctoral disserta-tion Massachusetts Institute of Technology

Todorov E (2002) Cosine tuning minimizes motor errors Neural Computation 14(6)1233ndash1260

Todorov E (2004) Optimality principles in sensorimotor control Nature Neuroscience7(9) 907ndash915

Todorov E amp Jordan M (2002a) A minimal intervention principle for coordinatedmovement In S Becker S Thrun amp K Obermayer (Eds) Advances in neuralinformation processing systems 15 (pp 27ndash34) Cambridge MA MIT Press

Todorov E amp Jordan M (2002b) Optimal feedback control as a theory of motorcoordination Nature Neuroscience 5(11) 1226ndash1235

Todorov E amp Li W (2004) A generalized iterative LQG method for locally-optimalfeedback control of constrained nonlinear stochastic systems Manuscript submittedfor publication

Uno Y Kawato M amp Suzuki R (1989) Formation and control of optimal trajectoryin human multijoint arm movement Minimum torque-change model BiologicalCybernetics 61 89ndash101

Whitaker D amp Latham K (1997) Disentangling the role of spatial scale separationand eccentricity in Weberrsquos law for position Vision Research 37(5) 515ndash524

Whittle P (1990) Risk-sensitive optimal control New York WileyWillems J L amp Willems J C (1976) Feedback stabilizability for stochastic systems

with state and control dependent noise Automatica 1976 277ndash283Winter D (1990) Biomechanics and motor control of human movement New York Wiley

Received June 21 2002 accepted October 1 2004