Techniques in Computational Stochastic Dynamic Programming

Floyd B. Hanson
University of Illinois at Chicago
Chicago, Illinois 60607-7045

I. INTRODUCTION

When Bellman introduced dynamic programming in his original monograph [8], computers were not as powerful as current personal computers. Hence, his description of the extreme computational demands as the Curse of Dimensionality [9] would not have had the super and massively parallel processors of today in mind. However, massive and super computers cannot overcome the Curse of Dimensionality alone, but parallel and vector computation can permit the solution of problems of higher dimension than was previously possible and thus permit more realistic dynamic programming applications. Today such large problems are called Grand and National Challenge problems [45, 46] in high performance computing. Today's availability of high performance vector supercomputers and massively parallel processors has made it possible to compute optimal policies and values of control systems for much larger dimensions than was possible earlier. Advances in algorithms have also played a large role.

In this chapter, the focus will be on stochastic dynamic programming in continuous time, yet related problems and methods will be discussed where appropriate. The primary stochastic noise considered here is Markov noise in continuous time, since this type of noise is separable in time just as the optimization steps in the principle of optimality. Thus, the stochastic perturbations treated here will be of the continuous, but non-smooth Gaussian type or the discontinuous, randomly distributed Poisson type. Due to its continuity property, Gaussian noise is suitable for modeling background randomness. In contrast, Poisson noise is suitable for modeling catastrophic, rare random events. For some stochastic models, random shocks to the system are more important than the continuous perturbations, although the continuous changes are easier to treat.

Unlike deterministic applications of dynamic programming, the use of general stochastic noise in continuous time makes it difficult to use formulations other than the partial differential equation of dynamic programming or Bellman equation. For deterministic problems in continuous time, there is the option of applying the maximum principle to formulate a dual set of forward and adjoint backward ordinary differential equations coupled with information on the critical points of the Hamiltonian, so the method of solution is quite different from the dynamic programming approach. Other methods will be discussed later.

Numerical partial differential equation (PDE) methods have been modified for the nonstandard characteristics of the PDE of stochastic dynamic programming. In order to manage the large computational requirements, high performance supercomputers have been employed [17, 116, 117, 118]. For instance, problems with up to 5 states and 32 mesh points per state have been successfully solved using finite difference methods [116, 117, 58, 118] on both Cray and Connection Machines. Larger problems are possible with recent hardware and software advances.

This chapter appeared in F. B. Hanson, "Techniques in Computational Stochastic Dynamic Programming," in Stochastic Digital Control System Techniques, within the series Control and Dynamic Systems: Advances in Theory and Applications, vol. 76, (C. T. Leondes, Editor), Academic Press, New York, NY, pp. 103-162, April 1996.

Supported in part by National Science Foundation Grant DMS 93-0117, National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, and Los Alamos National Laboratory's Advanced Computing Laboratory; written while on sabbatical in the Division of Applied Mathematics at Brown University, Providence, RI 02912.



The finite element method has computational and memory advantages. This method requires a smaller number of nodes than the corresponding finite difference method of similar accuracy. We have shown [20] that the finite element method not only helps to alleviate Bellman's Curse of Dimensionality in dynamic programming computations by permitting the solution of higher dimension problems, but also saves supercomputer storage requirements.
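The storage stakes behind these node counts can be made concrete with a short back-of-envelope script (the 5-state, 32-point mesh is the case cited earlier; the assumption of one 8-byte double per node is illustrative only):

```python
# Growth of a uniform dynamic programming mesh: N**d nodes for d state
# dimensions with N mesh points per state (the text cites d = 5, N = 32).
N = 32
for d in range(1, 7):
    nodes = N ** d
    # assume a single 8-byte double per node for the value function alone
    print(f"d = {d}: {nodes:>12d} nodes, {nodes * 8 / 2**20:10.1f} MiB")
```

At d = 5 the value function alone already occupies 2^25 nodes (256 MiB at one double per node), before any storage for controls or work arrays, which is the Curse of Dimensionality in practice.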

The general aim is to develop fast and efficient parallel computational algorithms and data structures for optimal feedback control of large scale, continuous time, nonlinear, stochastic dynamical systems. Since the finite element procedure requires formulation of the mesh data structures, it is desirable to study the mapping from the problem conceptual structure to the machine configuration for either Cray or Connection Machine computational models [116].

However, the computational treatment of Poisson noise is a particularly distinctive feature of this chapter. The numerical approach directly treats the partial differential equation of stochastic dynamic programming. Results give the optimal feedback control variables and the expected optimal performance index in terms of state variables and time.

For the stochastic optimal control problem, Monte Carlo and other simulations using random number generation are a primary alternative to direct dynamic programming computations, but disadvantages result from determining a sufficient sample size (complicated for general problems), and there is the question of maintaining feedback control. Furthermore, for simulation calculations, very complicated Markov processes have to be randomly generated and a tremendous number of sample trajectories would have to be averaged, whereas in the stochastic dynamic programming approach the averaging over the stochastic processes is built into the equation of dynamic programming. Hence, there is a great need to develop the use of high performance computing techniques in stochastic dynamic programming for direct solutions of stochastic optimal control problems.
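The sampling burden can be seen in a small sketch (the linear scalar jump-diffusion below is a hypothetical example, not a model from the text): the expected state must be estimated by brute-force averaging over many simulated paths.

```python
import random

# Monte Carlo simulation of a scalar jump-diffusion SDE (hypothetical linear
# model):  dX = a*X dt + b*X dW + c*X dP,  with dP a Poisson increment of
# rate lam.  Many sample paths must be generated and averaged, in contrast
# to dynamic programming, where the averaging is built into the Bellman
# equation.
def simulate_mean(a=-0.5, b=0.2, c=-0.3, lam=1.0, x0=1.0,
                  T=1.0, n_steps=200, n_paths=2000, seed=7):
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, dt ** 0.5)            # Wiener increment
            dp = 1 if rng.random() < lam * dt else 0  # Poisson increment
            x += a * x * dt + b * x * dw + c * x * dp
        total += x
    return total / n_paths

# For this linear model the exact mean satisfies dE[X]/dt = (a + lam*c)E[X].
print(simulate_mean())
```

Each extra digit of accuracy in the sample mean costs roughly a hundredfold more paths, since the standard error shrinks only as the square root of the path count.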

The report of the panel on the Future Directions in Control Theory [42] confirms the need for advanced scientific computing, both parallelization and vectorization, in control problems. The National Computing Initiative [97] calls stochastic dynamic programming computationally demanding, but avoids the opportunity to classify it as a Grand Challenge along with other problems of similar computational demands, as it should be classified.

Applications of stochastic dynamic programming arise in many areas, such as aerospace dynamics, financial economics, resource management, robotics and power generation. Another main effort in this area, in addition to our own, has been in France, with Quadrat and his coworkers [1] at INRIA developing an expert system that produces a multitude of results for stochastic differential equations with Gaussian noise, provided that discounting is constant and the problem can be transformed to a stationary one. Dantas de Melo, Calvet and Garcia [13, 26] in France have used the Cray-2 multitasking for discrete time dynamic programming problems.

Kushner and coworkers [75, 76, 77] have recently described many numerical approaches to stochastic control, with special emphasis on the well-developed Markov chain approximation method. Also, much theoretical progress has been made for using viscosity solutions [21, 108, 22].

Shoemaker [79, 16, 24, 25] and coworkers have applied several variants of the deterministic differential dynamic programming algorithm to groundwater applications. Differential dynamic programming is a modification of dynamic programming based upon quadratic expansions in state and control differentials and was originally developed by Mayne [91].

Luus [87, 88] has developed a method for deterministic, high dimensional dynamic programming problems using grid size reduction in both state and control, or in just control alone, such that the method converges to optimal control and state trajectories as the region reduction iterations proceed.

The author and his co-workers have been developing computational mathematics solutions for fairly general stochastic dynamic programming problems in continuous time using high performance computing techniques [51, 52, 55, 54, 17, 18, 19, 57, 116, 117, 58, 118, 20, 102, 60].

The presentation in this chapter is in the formal manner of classical applied mathematics in order to focus on the methods and their implementation. In Section II computational stochastic dynamic programming is discussed for continuous time problems, and advanced techniques are discussed in Section III. In Section IV, the direct stochastic dynamic programming approach is compared in some detail with the algorithm models of differential dynamic programming and the Markov chain approximation. These methods are selected for comparison in some depth since they are actively used to solve similar type optimal control problems, rather than presenting a broad survey without much depth. They are reformulated in such a way as to facilitate comparison. In Section V, research directions are briefly mentioned.

II. COMPUTATIONAL STOCHASTIC DYNAMIC PROGRAMMING IN CONTINUOUS TIME


The development of fast and efficient computational algorithms is the goal for larger dimension, relatively general optimal feedback control of nonlinear dynamical systems perturbed by stochastic diffusion and Poisson jump processes. The diffusion processes represent the continuous, background component of the perturbations, such as that due to fluctuating population death rates, randomly varying winds and other background environmental noise. The Poisson processes represent the discontinuous, rare event processes, such as occasional mass mortalities, large random weather changes or other large environmental effects. The Poisson perturbations model the more disastrous disturbances, and these disastrous disturbances are more important for many realistic models than the phenomena modeled by continuous but nonsmooth disturbances resulting from Markov diffusions.

The treatment of Poisson noise is a major feature here. However, there has been much more research on Markov diffusions, and this is undoubtedly due to the fact that they are much easier to analyze than the discontinuous Poisson noise. Random deviations from deterministic results tend to occur in regions of high costs and possible failure, indicating the need for fast algorithms for large fluctuations. Our goal is that our results should be in a practical form suitable for applications.

Our motivation for this research comes from bioeconomic modeling, but the procedures developed are applicable to a wide range of biological, physical, chemical, and engineering applications with a stochastic dynamical system governing the motion or growth of the system, and with a performance or cost function that needs to be optimized. Our applications so far have been primarily the optimal harvesting of fisheries resources. Athans et al. [6] analyze a flight dynamics application perturbed by Gaussian noise, but this application could be treated with the more general random noise described here to model more realistic test conditions. Quadrat and coworkers [1] have made applications to the control of electric power systems. One emphasis here is the use of high performance computing techniques on a wider range of applications.

A. FORMULATION OF PDE FOR STOCHASTIC DYNAMIC PROGRAMMING

Due to the fact that the mathematics of stochastic dynamic programming is not very accessible at the level of application, we present here a relatively general formulation. Much of this formulation, but not all, can be gleaned from Gihman and Skorohod [43, 44] with some difficulty, or from Kushner and Dupuis [76], or from the many other accounts restricted to just continuous, Gaussian noise, such as Fleming and Rishel [38], and Stengel [109]. Additional information on stochastic differential equations can be obtained from Arnold [3], Jazwinski [67], and Schuss [106].

The lumped continuous state variable, X(t), denotes an n x 1 vector of positions, velocities, orientation angles or other important variables. The feedback control variable, U(X(t),t), is an m x 1 vector of other regulating dynamic quantities or orientation variables. The basic formal stochastic differential is given by,

    dX(t) = F(X(t),U(t),t) dt + G(X(t),t) dW(t) + \int_{Q_m} H(X(t),q,t) P(dt,dq),   (1)

for X(t_0) = x_0; 0 <= t_0 <= t <= t_f; X in D_x and U in D_u. In (1), dW(t) is the differential of a standard r-dimensional vector-valued Wiener process, so it has independent Gaussian components, zero mean and Covar[dW, dW^T] = I_r dt. The term P(dt,dq) is a \bar{q}-dimensional Poisson random measure with independent components, Mean[P(dt,dq)] = \lambda(dq) dt and Covar[P, P^T] = \Pi(dq) dt, where \lambda(dq) is a \bar{q}-dimensional distribution of the jump amplitudes indexed by the mark q in mark domain Q_m, \bar{\lambda}_k = \int_{Q_m} \lambda_k(dq) is the kth rate, and \Pi = [\lambda_k \delta_{k,l}], of size \bar{q} x \bar{q}, is its diagonal representation. In addition, Covar[P, dW^T] = 0. The coefficients F, G, and H are matrices whose sizes are compatible with the multiplications indicated above in (1).

For the performance criterion or objective functional, we assume the Bolza type,

    V[X,U,P,W; t] = \int_t^{t_f} ds C(X(s),U(s),s) + Z(X(t_f),t_f),   (2)

where C(x,u,t) is the instantaneous cost function and Z is the terminal or salvage cost function. In (2), the variable time t is taken at the lower limit of the cost integral, rather than t_0, to treat the integral as a variable integral for necessary further analysis, understanding that t_0 <= t <= t_f. Obviously, other forms could be used in place of (2) without much difference in effort. Our objective is to optimize the expected performance on the variable time horizon


(t, t_f],

    V*(x,t) = min_U [ Mean_{P,W} [ V[X,U,P,W; t] | X(t) = x, U(t) = u ] ],   (3)

in order to minimize costs of production, costs of extraction, fuel consumption, or lateral perturbations of motion.

Due to the Markov properties of P and W, the principle of optimality holds as it does in the deterministic case, so both minimization and conditional expectation operations can be separated into the operations over the current time increment [t, t + \delta t) and the future time interval [t + \delta t, t_f]:

    V*(x,t) = min_{U(t,t+\delta t]} [ Mean_{P,W(t,t+\delta t]} [ \int_t^{t+\delta t} ds C(X(s),U(s),s)
              + V*(X(t) + \Delta X(t), t + \delta t) | X(t) = x, U(t) = u ] ].   (4)

Next, it is assumed that the formal SDE (1) is interpreted under Ito integration rules, so an application of the Ito chain rule for Markov processes,

    dV*(X(t),t) = [ (\partial V*/\partial t)(X(t),t) + F^T(X(t),U,t) \nabla_x[V*](X(t),t)
                  + (1/2) G G^T(X(t),t) : \nabla_x \nabla_x^T[V*](X(t),t) ] dt
                  + \nabla_x^T[V*](X(t),t) G(X(t),t) dW(t)
                  + \sum_{k=1}^{\bar{q}} \int_{Q_m} [ V*(x + H_k(x,q,t), t) - V*(X(t),t) ] P_k(dt,dq),   (5)

is required, where H_k(x,q,t) is the kth column vector of the jump amplitude matrix H, the scalar matrix product A : B = Trace[A B^T] denotes the trace of the matrix product A B^T, and A^T denotes the transpose of matrix A. The generalized Ito chain rule is given in Gihman and Skorohod [43, 44]. See Florentin [39], Dreyfus [33], Wonham [114], and Kushner and Dupuis [76] for combined noise problems, with Poisson in addition to Gaussian noise. These combined processes are also referred to as jump diffusions (cf., Kushner and Dupuis [76] and Snyder and Miller [107] for additional references). Fleming and Rishel [38] give treatments for the control of stochastic systems perturbed by Gaussian white noise. The Ito chain rule is basically a generalization of the chain rule of differentiable functions, modified for the non-smoothness of the diffusion processes and the jump discontinuities of the Poisson processes. In fact, this generalized chain rule is probably more about discontinuities in value and derivatives than it is about stochasticity. In contrast to the ordinary chain rule in calculus, the non-smoothness of the Gaussian processes results in a second order Hessian matrix term for V*, while the jump discontinuities of the Poisson processes result in the jump of V* at all Poisson process jumps, represented in (5).

Finally, substitution of the chain rule (5) into the principle of optimality (4) results in the optimal expected performance V* satisfying the Hamilton-Jacobi-Bellman partial differential equation of dynamic programming,

    0 = (\partial V*/\partial t)(x,t) + (1/2) G G^T(x,t) : \nabla_x \nabla_x^T[V*](x,t) + S*(x,t)
        + \sum_{k=1}^{\bar{q}} \int_{Q_m} \lambda_k(dq) [ V*(x + H_k(x,q,t), t) - V*(x,t) ],   (6)

where 0 <= t <= t_f and x in D_x. The control switching term in (6) is given by

    S*(x,t) = min_u [ S(x,u,t) ],  with  S(x,u,t) = C(x,u,t) + F^T(x,u,t) \nabla_x[V*](x,t).   (7)
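As a concrete illustration of the backward sweep implied by (6) and (7), the following sketch solves a scalar toy problem by explicit finite differences. All model choices here are hypothetical (linear drift, constant diffusion, quadratic cost, no Poisson term), not a problem from the text:

```python
# Backward finite-difference sweep for a scalar Bellman equation:
# dynamics F = a*x + u, diffusion G = sigma, cost C = (x**2 + u**2)/2,
# control clamped to [-umax, umax], terminal cost Z = x**2/2.
a, sigma, umax, T = -1.0, 0.5, 1.0, 1.0
nx, nt = 41, 100
dx, dt = 4.0 / (nx - 1), T / nt
xs = [-2.0 + i * dx for i in range(nx)]
V = [0.5 * x * x for x in xs]             # terminal condition
U = [0.0] * nx

for _ in range(nt):                       # march backward from t = T to 0
    Vn = V[:]
    for i in range(1, nx - 1):
        Vx = (V[i + 1] - V[i - 1]) / (2 * dx)
        Vxx = (V[i + 1] - 2 * V[i] + V[i - 1]) / dx**2
        u = max(-umax, min(umax, -Vx))    # regular control, clamped
        S = 0.5 * (xs[i]**2 + u**2) + (a * xs[i] + u) * Vx  # switch term
        Vn[i] = V[i] + dt * (0.5 * sigma**2 * Vxx + S)
        U[i] = u
    Vn[0] = 2 * Vn[1] - Vn[2]             # crude linear boundary fill
    Vn[-1] = 2 * Vn[-2] - Vn[-3]
    V = Vn

print(V[nx // 2], U[nx // 2])             # value and control at x = 0, t = 0
```

The explicit time step must satisfy the usual diffusion stability bound, dt <= dx**2 / sigma**2; implicit or Crank-Nicolson marching relaxes this restriction at the cost of a linear solve per step.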


One advantage of (6) is that it is a deterministic partial differential equation, in contrast to its origin in the stochastic performance criterion (2) subject to stochastic averaging, minimization and constraints by the stochastic ordinary differential equation (1). The output of a program for (6) in the general nonlinear case is the optimal expected performance, V*(x,t), and the optimal feedback control, U*(x,t), for arbitrary values of (x,t). Knowledge of the control is usually the most important resultant output for applications, because the optimal control is the input required by the control user or manager.

The final condition for the optimal, expected performance index is that

    V*(x,t_f) = Z*(x,t_f) = min[ Mean[ Z | X(t_f) = x ] ],   (8)

or salvage costs. The final value problem, rather than initial value problem, property here is due to the fact that (6) is a backward equation with respect to time.

1. Boundary Conditions

The boundary conditions depend more heavily on the precise nature of the stochastic noise and the nature of the boundaries. In many cases, there are no simple boundary specifications, but natural, Dirichlet boundary values sometimes can be obtained by integrating the Bellman equations along the boundaries. It should be noted that, unlike the corresponding forward equation for the optimal trajectory, the Dirichlet boundary conditions are implicitly contained in the backward, Bellman equation, due to the conditioning of the optimal expected performance (3) and the inhomogeneous property of the equation due to the instantaneous cost, provided (1) accurately portrays the dynamics and is valid at the boundary. This is due to the fact that the Dirichlet boundary conditions by first principles are calculated from the application of the boundary values to (3) along with (2). Clearly, the boundary version of the Bellman equation will be the same as the interior version of the Bellman equation (6), including (7), with the boundary values applied, except in the most degenerate cases. In other words, the application of the Dirichlet boundary values and the derivation of the Bellman equation by the principle of optimality together with Ito's rule can be interchanged, again ignoring very exceptional cases.

However, some types of boundary conditions, such as Neumann type boundary conditions, will require the modification of the SDE (1) to account for the boundary processes as part of the modeling procedure. In this case, proper treatments of boundary conditions are given by Kushner and Dupuis [76]. They construct compensating processes that are added to the unconstrained process and force boundary constraints, such as using a reflecting process in the case of reflecting boundaries. In the case of singular control, free boundaries are another problem that needs consideration [85]. Proper treatment of boundary solutions is of major importance.

2. Nearly Quadratic Costs

The principal advantage of the dynamic programming formulation, (6), is that the optimization step is reduced to minimizing the function argument of the switching term (7) over the control, rather than directly optimizing the objective functional in (2) over all realizations or paths in the state and control spaces, as in gradient and Monte Carlo-like simulation methods. The latter optimization, on the original integral objective (2), is much more difficult than just optimizing the pure deterministic function appearing in the argument of the minimum in (7).

In order to facilitate the calculation of the minimum in (7), it is assumed that the cost function is quadratic in the control,

    C(x,u,t) = C_0(x,t) + C_1^T(x,t) u + (1/2) u^T C_2(x,t) u,   (9)

and similarly that the dynamics are linear in the control,

    F(x,u,t) = F_0(x,t) + F_1(x,t) u.   (10)

In the case of nearly quadratic costs and nearly linear dynamics, (9-10) can be considered as local approximations for the instantaneous cost function and dynamical drift vector, respectively.

The quadratic costs assumption is not uncommon in applications, since it may be more realistic to have costs grow faster than a linear rate in the control due to increased inefficiencies, e.g., as with the inclusion of less efficient machinery or less skilled workers with the rise in production. Also, a quadratic, or near-quadratic, costs assumption makes the control determination more straightforward for the algorithm encoding.


Note that this is not the classical linear quadratic (LQ) problem, in general, because the problem can be nonlinear in the state x. Restricting the linear dynamics and quadratic costs assumption only to the control permits more realism at the modeling stage, since the complexities of the physical application usually determine the state nonlinearities. However, control is an input determined by the user, so the control-only LQ assumption permits better and simpler user management of control input.

Also, the proper linear control problem may be approached through the cheap control limit as C_2 -> 0+, using the same model, especially when the determination of linear control and related convexity conditions are not standard. This is somewhat similar to the use of artificial viscosity in fluid models.

With control-only quadratic costs and linear dynamics, the regular or unconstrained control U_reg can be calculated explicitly, using \nabla_u[S](x,u,t) = 0, to yield,

    U_reg(x,t) = argmin_u [ S(x,u,t) ] = -C_2^{-1}(x,t) [ C_1(x,t) + F_1^T(x,t) \nabla_x[V*](x,t) ],   (11)

where C_2(x,t) is assumed to be symmetric and nonsingular. For coefficient functions, F, G, H, and C, with more general control dependency, the regular control may be calculated by appropriate methods of nonlinear optimization, such as Newton's method with quadratic costs as the first approximation.

The optimal feedback control U*(x,t) is calculated as the restriction of the regular control U_reg(x,t) to the set of control constraints D_u,

    U*_k(x,t) = min[ U_max,k, max[ U_min,k, U_reg,k(x,t) ] ],   (12)

as in the use of component-wise or hypercube constraints, U_min,k <= U_k(x,t) <= U_max,k, for k = 1 to m, for example. For symmetric

C_2, the switch term has the simplified form,

    S*(x,t) = S(x,U*,t) = C_0 + F_0^T \nabla_x[V*] + (1/2) (U*)^T C_2 (U* - 2 U_reg),   (13)

which shows that the switch term (7) is quadratic (or nearly quadratic in the approximate case) in the optimal control U* for quadratic costs. Also, since the regular control U_reg from (11) is linear in the solution gradient \nabla_x[V*] and since the optimal control U* is a piecewise linear function (including constant pieces) of U_reg from (12), the Bellman equation (6) is a genuine nonlinear, functional partial differential equation. This will be elaborated on later.

In order to obtain the expectedtrajectoryof the dynamicalsystem,the solution to the forward Kolmogorovequation, �+À p�¿Á � �Â� E@ ��� p À p � �� � @ � E@ � ��"�" E À p -�nà À p

(14) �&�Ä(+*�, U � � � 15 À p ���Å�¢Æ� p� ������12� Á f� Á �Det

�iÇ HÈ ÊÉ�� @ � p E� ���Ë� Æ� p� ������12� Á ���12� Á 4Ì E'Í µ � ���is neededusingthe optimal feedbackcontrol

k p ����� Á found in the backward dynamicprogrammingsweep.Theforwardequation(14)is theformaladjointof thecorrespondingbackwardKolmogorov equation,towhichtheBellmanequationis related.Here, À p � À p ����� Á 9� À p ����� ÁAÎ :9���

Page 7: Techniques in Computational Stochastic Dynamic Programminghomepages.math.uic.edu/~hanson/pub/CADS96/cads96wide.pdf · eral optimal feedback control of nonlinear dynamical systems

F. B. Hanson, ComputationalStochasticDynamicProgramming 7

is thedensityof thedynamicalprocessfor thestate�

at forwardtime Á andstartingat��������:

usingtheoptimalcontrol

k p �������. Also, � p ��� p ����� Á S ��������k p ����� Á f� Á

is thevectordrift coefficientevaluatedat theoptimalfeedbackcontrol.

The Poisson term is more complicated since it is a jump process. Since the Poisson term is not described well elsewhere, the transformations are described in more detail than usual. The total jump intensity is

    \bar{\Lambda} = \sum_{k=1}^{\bar{q}} \bar{\lambda}_k = \sum_{k=1}^{\bar{q}} \int_{Q_m} \lambda_k(dq).

Since forward and backward Kolmogorov equations are adjoints, and since the new state x + H_k where the process jumped to appears in the backward dynamic programming equation (6), the old state where the process jumped from appears in the forward equation (14). Since after a k-jump the new state is

    Y = x + H*_k(x,q,\tau) = (I + \hat{H}*_k)[x](q,\tau),

the inverse of the transition is

    x = (I + \hat{H}*_k)^{-1}[Y] = Y + \hat{x}*_k(Y,q,\tau),

so the equation,

    \hat{x}*_k(Y,q,\tau) = -( I - (I + \hat{H}*_k)^{-1} )[Y](q,\tau),

relates the inverse jump amplitude vector \hat{x}*_k to the direct jump amplitude operator \hat{H}*_k appearing in (14). The Jacobian in (14) comes from the vector differential

    dY = dx + (dx^T \nabla_x)[H*_k](X(\tau),q,\tau) = [ I + (\nabla_x[(H*_k)^T])^T ] dx,

with inverse

    dx = [ I + (\nabla_x[(H*_k)^T])^T ]^{-1} dY,

so that the Jacobian is

    \partial x / \partial Y = Det[ [ I + (\nabla_x[(H*_k)^T])^T ]^{-1} ].

Note that we must solve the backward equation, (6) with (12), for U* before we can solve the forward equation (14) for the optimal density, i.e., p*. Equation (14) is essentially given in Gihman and Skorohod [43], but here the unusual vector product in the Poisson integral is given clearly and explicitly. See also Kushner and Dupuis [76]. The diffusion contribution of this equation has been more extensively investigated and so is much better understood than the Poisson contribution.

Finally, the optimal state vector is calculated as the first moment of p*,

    X*(\tau) = X*(\tau | x,t) = Mean[X(\tau)] = \int_{D_x} dx p*(x,\tau) x,   (15)

starting at X*(t) = x, with similar expressions for the variance and the other moments of the state trajectory. Taking the first moment, with respect to the kth state component x_k, of both sides of the forward equation (14) results in an ordinary differential equation (ODE), in vector representation,

    dX*(\tau)/d\tau = \bar{F}*(\tau) + \sum_{k=1}^{\bar{q}} \bar{\lambda}_k \bar{H}*_k(\tau),   (16)

with mean optimal vector drift

    \bar{F}*(\tau) = \int_{D_x} dx p*(x,\tau) F*(x,\tau)


and mean optimal vector jump amplitude

    \bar{H}*_k(\tau) = (1/\bar{\lambda}_k) \int_{Q_m} \lambda_k(dq) \int_{D_x} dx p*(x,\tau) H*_k(x,q,\tau),

averaging over both state and mark spaces, assuming that p* and its derivatives vanish on the state space boundary \partial D_x. If the optimal feedback control U*(X(\tau),\tau), the plant dynamics F(X(\tau),U*(X(\tau),\tau),\tau), and the component jump amplitudes H_k(X(\tau),q,\tau) are linear in the state X, with sufficient restrictions on the stochastic coefficients, then the first moment will simplify to a linear equation. In general, (16) will not be a closed system of equations in the first moment X*, so the whole density function p* may be needed.

The primary results for a given application are the expected control law and the optimal expected performance response in time parametrized by current state variables.
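For a case where the moment system does close, the first-moment ODE (16) reduces to a scalar linear equation that can be integrated directly. The sketch below assumes a hypothetical linear scalar model, F*(x) = a*x with a single Poisson term of rate lam and linear jump amplitude H(x) = h*x, so that dXbar/dtau = (a + lam*h) Xbar:

```python
import math

# Forward Euler integration of the closed first-moment ODE (16) for a
# hypothetical linear scalar model: drift a*x, jump rate lam, jump h*x.
a, lam, h, x0, T, n = -0.5, 1.0, -0.3, 1.0, 1.0, 1000
dt = T / n
xbar = x0
for _ in range(n):
    xbar += dt * (a * xbar + lam * h * xbar)   # drift + mean jump, as in (16)

exact = x0 * math.exp((a + lam * h) * T)       # closed-form linear solution
print(xbar, exact)
```

For nonlinear drift or state-dependent rates the moment hierarchy does not close, and the full density p* from (14) is required, as noted above.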

B. COMPUTATIONAL APPROACH

For general nonlinear dynamics and performance, the backward partial differential equation of dynamic programming, Eq. (6), together with the switching term, Eq. (7), cannot be solved exactly. Although special formal solutions for linear dynamics, quadratic criterion and Gaussian noise are well known (the LQG problem, e.g., [12, 2]), they require the numerical solution of matrix Riccati equations.

1. Computational Difficulties

Two particular features make numerical approximation of (6) with (7) nonstandard. The Poisson integral term, in general, makes the problem that of solving a functional differential equation, while the particular case of a discrete jump size leads to a problem for a delayed differential equation. In either the general or the particular case (Hanson [49]), the functional inverse image of any finite element will not, in general, consist of existing finite elements. The technique of tracking the delayed nodes was used by Hanson and co-workers for a functional differential equation [63], for a Galerkin approximation of the Bellman equation [61], and for a finite difference approximation [104]. We have had a great deal of experience in the modeling, analysis, and computation of Poisson noise models. Mesh refinement or interpolation is required to prevent pollution of the numerical accuracy expected of standard PDE methods. The reduction of this pollution problem is closely related to Feldstein and Neves' [36] argument concerning the need for accurate determination of the locations of jump discontinuities when applying higher order methods to delay differential equations.
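The interpolation idea can be sketched simply: after a jump, the value function must be evaluated at x + h, which rarely lands on a mesh node, so an interpolant of neighboring nodal values is used instead (the grid and jump target below are hypothetical):

```python
# Linear interpolation for off-grid Poisson jump targets on a uniform mesh.
def interp(V, x_lo, dx, xq):
    # clamp the query index into the grid, then interpolate linearly
    i = min(max(int((xq - x_lo) / dx), 0), len(V) - 2)
    w = (xq - (x_lo + i * dx)) / dx
    return (1 - w) * V[i] + w * V[i + 1]

x_lo, dx = 0.0, 0.5
V = [x * x for x in (x_lo + i * dx for i in range(9))]  # V(x) = x^2 on [0, 4]
print(interp(V, x_lo, dx, 1.7))   # jump target between nodes 1.5 and 2.0
```

Linear interpolation limits the scheme to second-order spatial accuracy; higher order interpolants, or meshes refined so jump targets land on nodes, are needed to preserve the accuracy of higher order PDE discretizations.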

The second nonstandard feature is the nonlinear control switch term, which is more pertinent to the constrained control problem, whether stochastic or deterministic. The fact that the cost gradient $\nabla_x[v^*]$ appears in the argument of the minimum of the switch term $S^*$ means that $S^*$ is really a nonlinear functional of $\nabla_x[v^*]$, and that (6) is a nonlinear partial differential equation.

In the general constrained control case, the nonlinear PDE also has discontinuous coefficients when the constraint set of controls is bounded (see Naimipour and Hanson [96]), as it would be in most practical applications. The constrained case thus leads to switching surfaces where the control passes through the boundary of the constraint set [61].

It can be shown that the optimal switch term $S^*$ of (7) is a piecewise quadratic function of the shadow cost $\nabla_x[v^*]$ with discontinuous coefficients. Here, the term piecewise quadratic includes pieces that are either constant, linear or quadratic. Although the regular or unconstrained control $u^R$ in (11) is a continuous linear function of $\nabla_x[v^*]$, provided the quadratic cost and linear dynamic coefficients are continuous, the optimal constrained control $u^*$ in (12) is a continuous piecewise linear function of $\nabla_x[v^*]$, but with discontinuous coefficients when decomposed as coefficients of $\nabla_x[v^*]$. That is,

$$ u^* = u^*_0 + U^*_1\, \nabla_x[v^*], \qquad (17) $$

where, componentwise, the coefficient pair $(u^*_{0,j},\, U^*_{1,j})$ takes one of three forms according to where the regular control lies relative to the constraints: the lower constraint value $U_{\min,j}$ with zero gradient coefficient when $u^R_j$ falls below $U_{\min,j}$; the regular coefficients $(u^R_{0,j},\, U^R_{1,j})$ of the corresponding decomposition $u^R = u^R_0 + U^R_1 \nabla_x[v^*]$ when $u^R_j$ lies between the constraints; and the upper constraint value $U_{\max,j}$ with zero gradient coefficient when $u^R_j$ exceeds $U_{\max,j}$.
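As a concrete illustration of the hard constraint structure behind (17) (an illustrative Python sketch, not from the original chapter, assuming simple componentwise box constraints), the optimal constrained control is just the regular control clipped to the control set:

```python
import numpy as np

def constrained_control(u_reg, u_min, u_max):
    """Componentwise clipping of the regular (unconstrained) control
    onto the box [u_min, u_max]; the result is continuous and piecewise
    linear in u_reg, but its decomposed coefficients switch discontinuously
    at the constraint boundaries."""
    return np.minimum(u_max, np.maximum(u_min, u_reg))

# Example: the middle component is interior, the other two saturate.
u_reg = np.array([-2.0, 0.5, 3.0])
u_min = np.array([-1.0, -1.0, -1.0])
u_max = np.array([1.0, 1.0, 1.0])
u_star = constrained_control(u_reg, u_min, u_max)
# u_star is [-1.0, 0.5, 1.0]: saturated, regular, saturated pieces.
```

The three branches of the clip correspond exactly to the three coefficient cases of the decomposition (17).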


Clearly, the array coefficients $u^*_0$ and $U^*_1$ are discontinuous by component for nontrivial cost and dynamics. However, $u^*$ itself is continuous in state and time, while the decomposition in (17) leads to the discontinuous coefficients $u^*_0$ and $U^*_1$. Introducing the discontinuous coefficient decomposition of the optimal control into the optimal switching term yields

$$ S^* = S^*_0 + S^{*\top}_1\, \nabla_x[v^*] + \nabla_x^\top[v^*]\, S^*_2\, \nabla_x[v^*], \qquad (18) $$

where the piecewise coefficients $S^*_0$, $S^*_1$ and $S^*_2$ follow from substituting the decomposition (17) into the control-dependent cost and dynamics terms of the switch term (7). Thus, $S^*$ is piecewise quadratic in the cost gradient $\nabla_x[v^*]$, but it inherits the discontinuous coefficients of the decomposition (17), even though $S^*$ is continuous in state and time.

The appearance of $\nabla_x[v^*]$ in the argument of the minimum, $S^*$, also means that the calculated $\nabla_x[v^*]$ should be smooth enough in $x$ to make the minimum computations well-conditioned. Further significance of the genuine nonlinear behavior is that it requires predictor-corrector or other nonlinear techniques for (6). Predictor-corrector methods and related methods in space will be used to handle the nonlinear aspects, with Crank-Nicolson approximations used in time for their enhanced accuracy and stability properties. Our approach is basically an optimal control modification of the work on nonlinear parabolic equations of Douglas [31, 32] and his co-workers: Dupont, Hayes and Percell.

2. Crank–Nicolson, Predictor–Corrector Finite Difference Algorithm

The integration of the Bellman equation (6-7) is backward in time, because $v^*(x,t)$ is specified finally at the final time $t = t_f$, rather than at the initial time. The finite difference discretization in states and backward time is summarized below:

$$ x \Longrightarrow X_j = [X_{i,j_i}]_{n\times 1} = [X_{i,1} + (j_i - 1)\,\Delta x_i]_{n\times 1}, \quad \text{where } j_i = 1 \text{ to } M_i \text{ for } i = 1 \text{ to } n; $$
$$ t \Longrightarrow T_k = t_f - k\,\Delta t, \quad \text{for } k = 0 \text{ to } K; $$
$$ v^*(X_j, T_k) \Longrightarrow V_{j,k}; \qquad v^*_t(X_j, T_k) \Longrightarrow -\tfrac{1}{\Delta t}\left( V_{j,k+1} - V_{j,k} \right); \qquad (19) $$
$$ \nabla_x[v^*](X_j, T_k) \Longrightarrow DV_{j,k}; \qquad \nabla_x \nabla_x^\top[v^*](X_j, T_k) \Longrightarrow DDV_{j,k}; $$
$$ v^*\!\left( (x + a)_j, T_k \right) \Longrightarrow VH_{j,k}; \qquad u^R(X_j, T_k) \Longrightarrow UR_{j,k}; \qquad u^*(X_j, T_k) \Longrightarrow US_{j,k}; $$

where $\Delta x_i$ is the mesh size for state $i$ and $\Delta t$ is the step size in backward time.

The numerical algorithm is a modification of the Crank-Nicolson, predictor-corrector methods for nonlinear parabolic PDEs in [31]. Modifications are made for the switch term and delay term calculations. Derivatives are approximated with an accuracy that is second order in the local truncation error, $O(\Delta x^2)$, at all interior and boundary points, where $\Delta x_i = O(\Delta x)$.

ï Q � ú �Yï¿ .The Poissoninducedfunctionalor delayterm,

o p �j:� �� � ���, changesthe local attribute of the usualPDE to

a globalattribute,suchthat thevalueat a node D �Ë )� � G ìwill, in general,not bea node. Linear interpolationwith

secondordererrormaintainsthenumericalintegrity that is compatiblewith thenumericalaccuracy of thederivative


approximations. Even though the Bellman equation (6-7) is a single PDE, the process of solving it not only produces the optimal expected value $v^*$, but also the optimal expected control law $u^*$.
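To make the delayed-node bookkeeping concrete (an illustrative Python sketch under an assumed one-dimensional uniform mesh, not the chapter's production code), the value at a post-jump state falling between nodes is recovered by second order accurate linear interpolation from the bracketing nodes:

```python
import numpy as np

def interp_jump_value(x_nodes, v_nodes, x_jump):
    """Linearly interpolate the value function at a post-jump state that
    is generally not a mesh node; the O(dx^2) interpolation error matches
    the second order accuracy of the derivative approximations."""
    i = np.searchsorted(x_nodes, x_jump) - 1     # left bracketing node
    i = np.clip(i, 0, len(x_nodes) - 2)          # clamp to grid interior
    w = (x_jump - x_nodes[i]) / (x_nodes[i + 1] - x_nodes[i])
    return (1.0 - w) * v_nodes[i] + w * v_nodes[i + 1]

x_nodes = np.linspace(0.0, 1.0, 11)              # uniform mesh, dx = 0.1
v_nodes = x_nodes**2                             # sample value function
vh = interp_jump_value(x_nodes, v_nodes, 0.37)   # 0.37 is not a node
```

Here the interpolated value is the chord between $v(0.3)$ and $v(0.4)$, i.e., $0.3\cdot 0.09 + 0.7\cdot 0.16 = 0.139$.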

Prior to calculating the values $V_{j,k+1}$ at the new $(k+1)$-st time step for $k = 0$ to $K-1$, the old values $V_{j,k}$ and $V_{j,k-1}$ are assumed to be known, with $V_{j,-1} \equiv V_{j,0}$ when two final starting conditions are needed for extrapolation.

The algorithm begins with a convergence accelerating extrapolator (x) start:

$$ V^{(x)}_{j,k+1/2} = \tfrac{1}{2}\left( 3\, V_{j,k} - V_{j,k-1} \right), \qquad (20) $$

which is then used to compute updated values of finite difference arrays such as the gradient DV, the second order derivatives DDV, the Poisson functional terms VH, the regular controls UR, the optimal controls US, and finally the new value of the Bellman equation spatial functional $F_{j,k+1/2}$. These extrapolator evaluations are used in the extrapolated predictor (xp) step:

$$ V^{(xp)}_{j,k+1} = V_{j,k} + \Delta t\, F^{(x)}_{j,k+1/2}, \qquad (21) $$

which is then used in the predictor evaluation (xpe) step:

$$ V^{(xpe)}_{j,k+1/2} = \tfrac{1}{2}\left( V^{(xp)}_{j,k+1} + V_{j,k} \right), \qquad (22) $$

an approximation which preserves numerical accuracy and which is used to evaluate all terms comprising $F_{j,k+1/2}$.

The evaluated predictions are used in the corrector (xpec) step:

$$ V^{(xpec,\gamma+1)}_{j,k+1} = V_{j,k} + \Delta t\, F^{(xpece,\gamma)}_{j,k+1/2}, \qquad (23) $$

for $\gamma = 0$ to $\gamma_{\max}$ until the stopping criterion is met, with the corrector evaluation (xpece) step:

$$ V^{(xpece,\gamma+1)}_{j,k+1/2} = \tfrac{1}{2}\left( V^{(xpec,\gamma+1)}_{j,k+1} + V_{j,k} \right). \qquad (24) $$

The predicted value is taken as the zeroth ($\gamma = 0$) correction, $V^{(xpece,0)}_{j,k+1/2} \equiv V^{(xpe)}_{j,k+1/2}$. Upon satisfying the corrector stopping criterion, the value for the next time step is set:

$$ V_{j,k+1} = V^{(xpec,\gamma_{\max})}_{j,k+1}. $$

The stopping criterion for the corrections is formally derived from a comparison to a predictor corrector convergence criterion for a linearized, constant coefficient PDE [96, 59].
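The extrapolate-predict-evaluate-correct cycle (20)-(24) for a single backward time step can be sketched in Python (an illustrative rendering, assuming a user-supplied callable F(V, t) that evaluates the Bellman spatial functional on the whole mesh; the names are hypothetical):

```python
import numpy as np

def backward_time_step(V_k, V_km1, F, t_mid, dt, tol=1e-8, gamma_max=20):
    """One Crank-Nicolson predictor-corrector backward step:
    extrapolate (20), predict (21), evaluate (22), then iterate the
    corrector (23)-(24) until successive corrections differ by < tol."""
    V_x = 0.5 * (3.0 * V_k - V_km1)          # (20) midpoint extrapolator
    V_xp = V_k + dt * F(V_x, t_mid)          # (21) extrapolated predictor
    V_mid = 0.5 * (V_xp + V_k)               # (22) predictor evaluation
    V_new = V_xp
    for _ in range(gamma_max):               # (23)-(24) corrector loop
        V_corr = V_k + dt * F(V_mid, t_mid)  # (23) corrector step
        V_mid = 0.5 * (V_corr + V_k)         # (24) corrector evaluation
        if np.max(np.abs(V_corr - V_new)) < tol:
            V_new = V_corr
            break
        V_new = V_corr
    return V_new

# Toy scalar check: F(V, t) = -V reproduces the Crank-Nicolson decay
# factor (1 - dt/2)/(1 + dt/2) per step at the corrector fixed point.
V_next = backward_time_step(np.ones(5), np.ones(5), lambda V, t: -V, 0.0, 0.01)
```

The corrector iteration converges to the implicit Crank-Nicolson solution without forming or solving a linear system, which is what makes the scheme attractive for the nonlinear switch term.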

3. Finite Element Version of Solution Algorithm for SDP

Due to potential higher order interpolation, it is possible to reduce the number of state nodes by using the Galerkin Finite Element Method (FEM) in place of the Finite Difference Method in dynamic programming problems [20], while retaining the same level of accuracy. Thus, the Galerkin approximation is used for the optimal expected value:

$$ v^*(x,t) \approx \widetilde V(x,t) = \sum_{j=1}^{N} \widetilde V_j(t)\, \phi_j(x), \qquad (25) $$

where $\{\phi_j(x)\}_{j=1}^{N}$ is a set of $N$ linearly independent, piecewise continuous basis functions. The basis functions have the normalization property that $\phi_j(X_i) = \delta_{i,j}$ at element node $X_i$, implying the interpolation property $\widetilde V(X_j, t) = \widetilde V_j(t)$. As in [64], the basis or shape functions


could be taken as a set of multi-linear functions (products of linear Lagrange interpolation functions in each state dimension) on hyper-rectangular elements (rectangular elements in two dimensions).

The conditions determining the optimal costs $\widetilde V_j(t)$ at each node $X_j$ are given in the weak sense by the Galerkin variational equation for the residuals of the Bellman dynamic programming equation (6) with respect to the basis functions $\phi_i$ as weights:

$$ 0 = \int_{D_x} dx\, \phi_i(x) \left[ \frac{\partial \widetilde V}{\partial t}(x,t) + F[\widetilde V](x,t) \right], \qquad (26) $$

for $i = 1$ to $N$, where $F$ denotes the nonlinear spatial functional of the Bellman equation. However, Dirichlet boundary condition nodes must be excluded from the set of weights forming the component Galerkin equations (26), although they remain in the applied Galerkin approximation (25), since known costs are specified for Dirichlet nodes. Before the Galerkin approximation can be used, the second order terms of the spatial operator must be reduced to first order by Green's theorem, i.e.,

$$ \frac{1}{2}\int_{D_x} dx\, \phi_i(x)\, G G^\top \!:\! \nabla_x \nabla_x^\top [v^*] = -\frac{1}{2}\int_{D_x} dx\, \nabla_x^\top\!\left[ \phi_i\, G G^\top \right] \nabla_x [v^*] + \frac{1}{2}\oint_{\partial D_x} d\sigma\, \phi_i\, \hat n^\top G G^\top \nabla_x [v^*], \qquad (27) $$

where $\hat n$ is the unit outward normal to the state space boundary $\partial D_x$. Now, substituting the control linear dynamics, quadratic costs model (6,13,27) into the Galerkin equation (26) yields the matrix ODE for the cost node vector $\widetilde V(t) = [\widetilde V_j(t)]_{N\times 1}$:

$$ 0 = A\, \frac{d\widetilde V}{dt}(t) + \left[ A_1(t) + A_2(t) + A_3(t) + A_4(t) \right] \widetilde V(t) + b(t) + c(t), \qquad (28) $$

where, in the notation of the control linear dynamics, quadratic costs model (13), $f_0$ is the control-independent drift coefficient, $G G^\top$ the diffusion coefficient, $\lambda$ the Poisson jump rate, $a(x,q,t)$ the jump amplitude with mark density $\Phi(q)$, and $c_0$, $c_2$ the control-independent and quadratic control cost coefficients:

$$ A = \left[ \int_{D_x} dx\, \phi_i(x)\, \phi_j(x) \right]_{N\times N}, $$
$$ A_1(t) \equiv \left[ \int_{D_x} dx\, \phi_i(x)\, f_0^\top(x,t)\, \nabla_x[\phi_j](x) \right]_{N\times N}, $$
$$ A_2(t) \equiv -\frac{1}{2} \left[ \int_{D_x} dx\, \nabla_x^\top\!\left[ \phi_i(x)\, G G^\top(x,t) \right] \nabla_x[\phi_j](x) \right]_{N\times N}, $$
$$ A_3(t) \equiv \lambda \left[ \int_{D_x} dx \int_{D_q} \Phi(q)\, dq\; \phi_i(x) \left( \phi_j(x + a(x,q,t)) - \phi_j(x) \right) \right]_{N\times N}, $$
$$ A_4(t) \equiv \frac{1}{2} \left[ \oint_{\partial D_x} d\sigma\, \phi_i(x)\, \hat n^\top G G^\top(x,t)\, \nabla_x[\phi_j](x) \right]_{N\times N}, $$
$$ b(t) \equiv \left[ \int_{D_x} dx\, \phi_i(x)\, c_0(x,t) \right]_{N\times 1}, $$
$$ c(t) \equiv \frac{1}{2} \left[ \int_{D_x} dx\, \phi_i(x)\, u^{*\top}(x,t)\, c_2(x,t) \left( u^* - 2 u^R \right)\!(x,t) \right]_{N\times 1}, $$

the last being the integrated optimal switch term. For general coefficients, like $f_0$, $G G^\top$ and $c_0$, some approximate quadrature, such as Simpson's rule or Gauss-Legendre rules, is needed to evaluate these FEM integrals of basis and coefficient functions. However, the approximate quadrature must be at least as accurate as using the selected basis functions on the given elements, e.g., $O(\Delta x^2)$ where the order of the size of the elements is $O(\Delta x)$ for sufficiently small $\Delta x$ using a multi-linear basis on hyper-rectangular

elements [64]. Note that the optimal switch term $c$ implicitly depends on $\widetilde V$ in a nonlinear way through the optimal and regular control vectors, $u^*$ and $u^R$, and is thus subject to calculations like (11,12), except that the Galerkin approximation (25) is used for $v^*$.
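For intuition about the FEM arrays (an illustrative Python sketch, not the chapter's code), the mass matrix $A = [\int \phi_i \phi_j\, dx]$ for linear hat functions on a uniform one-dimensional mesh assembles elementwise into the classical tridiagonal pattern $(h/6)\,\mathrm{tridiag}(1, 4, 1)$:

```python
import numpy as np

def mass_matrix_1d(num_nodes, h):
    """Assemble A[i,j] = integral of phi_i*phi_j for linear hat functions
    on a uniform mesh of spacing h, using the exact local element matrix
    (h/6)*[[2,1],[1,2]] summed over adjacent node pairs."""
    A = np.zeros((num_nodes, num_nodes))
    for e in range(num_nodes - 1):               # element [x_e, x_e + h]
        Ae = (h / 6.0) * np.array([[2.0, 1.0],
                                   [1.0, 2.0]])  # exact local mass matrix
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += Ae
    return A

A = mass_matrix_1d(5, 0.25)
# Since the hat functions sum to 1, the total sum of A equals the domain
# length: here 4 elements of size 0.25, i.e., 1.0.
```

The same element-loop structure carries over to the drift, diffusion and jump matrices of (28), with the local integrals replaced by quadrature over the coefficient functions.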

The Crank–Nicolson, predictor–corrector scheme used for the finite difference formulation above can be modified for the finite element method here. The basic Crank–Nicolson algorithm, sometimes difficult to see from the canonical diffusion equation example, is the mid-point quadrature for the temporal integral, followed by averages to approximate the midpoints of the unknown variable, while leaving the midpoint value for the explicit time-dependence. For the dynamic programming equation, special modifications are needed to handle the unknown control vector that augments the unknown optimal cost variable, and to handle the non-local functional dependence due to the Poisson noise contributions. Thus, the Crank-Nicolson modification of the Galerkin equation (28) is

$$ A \left[ \widetilde V_{k+1} - \widetilde V_k \right] = \Delta t \left\{ \frac{1}{2} \left[ A_1 + A_2 + A_3 + A_4 \right]_{k+1/2} \left[ \widetilde V_{k+1} + \widetilde V_k \right] + b_{k+1/2} + c_{k+1/2} \right\}, \qquad (29) $$

where $T_k \equiv t_f - k\,\Delta t$ for $k = 0$ to $K$; $\widetilde V(T_k) \approx \widetilde V_k$; $\widetilde V_{k+1/2} \equiv \tfrac{1}{2}(\widetilde V_{k+1} + \widetilde V_k)$; $A_m(T_{k+1/2}) = A_{m,k+1/2}$ for $m = 1$ to $4$; $b(T_{k+1/2}) = b_{k+1/2}$; and $c(T_{k+1/2}) \approx c_{k+1/2}$.

An alternative collection of terms in (29) leads to a form less susceptible to catastrophic cancellation in the case of small backward time steps $\Delta t$:

$$ B_{k+1/2}\, \delta \widetilde V_k = \delta A_{k+1/2}\, \widetilde V_k + \Delta t \left[ b + c \right]_{k+1/2}, \qquad (30) $$

where

$$ \delta \widetilde V_k \equiv \widetilde V_{k+1} - \widetilde V_k, \quad \delta A_{k+1/2} \equiv \Delta t \left[ A_1 + A_2 + A_3 + A_4 \right]_{k+1/2}, \quad B_{k+1/2} \equiv A - \tfrac{1}{2}\, \delta A_{k+1/2}, $$

with bulk subscript notation for the temporal midpoint.
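Schematically (an illustrative Python sketch with random stand-in matrices for $A$, the midpoint sum $A_1 + A_2 + A_3 + A_4$, and $b + c$, none of which are taken from the chapter), each backward step in the form (30) reduces to a single linear solve for the increment:

```python
import numpy as np

rng = np.random.default_rng(0)
N, dt = 8, 0.01
A = np.eye(N) + 0.1 * rng.standard_normal((N, N))  # stand-in mass matrix
A_bar = rng.standard_normal((N, N))                # stand-in A1+A2+A3+A4
rhs_bc = rng.standard_normal(N)                    # stand-in b + c
V_k = rng.standard_normal(N)                       # current cost node vector

dA = dt * A_bar                                    # delta-A of (30)
B_mat = A - 0.5 * dA                               # B = A - dA/2
dV = np.linalg.solve(B_mat, dA @ V_k + dt * rhs_bc)  # solve for increment
V_kp1 = V_k + dV                                   # one backward step
```

By construction this is algebraically equivalent to the averaged form (29): $A\,\delta\widetilde V = \Delta t\,[\bar A\,(\widetilde V_k + \widetilde V_{k+1})/2 + b + c]$, but solving for the small increment $\delta\widetilde V$ avoids the cancellation of nearly equal large values when $\Delta t$ is small.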

The form (30) is still implicit, but the use of extrapolation, prediction and correction will convert it to a more explicit form [20]. The procedure at this point is similar to that of the finite difference method, except for the evaluation of the regular and optimal control vectors, and the more complicated matrix structure of the Galerkin equation approximations. The starting values of the backward time iteration begin with the interpolation of the final condition (8) between nodes, given the node vector

$$ \widetilde V_0 = \left[ \widetilde V_j(t_f) \right]_{N\times 1} = \left[ v^*(X_j, t_f) \right]_{N\times 1}. \qquad (31) $$

The extrapolation step needs two starting values, so a simple expedient is to use a post-final value $\widetilde V_{-1} = \widetilde V_0$ to start it off, although a more intelligent guess is desirable. The extrapolated (x) acceleration step supplies the evaluation for the next temporal midpoint:

$$ \widetilde V^{(x)}_{k+1/2} = \tfrac{1}{2} \left( 3\, \widetilde V_k - \widetilde V_{k-1} \right), \qquad (32) $$


for $k = 0$ to $K-1$. Then, the cost node vector is used to compute the regular control $u^R$ in (11) and the optimal control $u^*$ in (12), but based upon the Galerkin approximation (25) at the temporal midpoint $T_{k+1/2}$. This permits evaluation of the nonlinear optimization term $c \approx c^{(x)}$, leading to a reduced extrapolated predictor (xp) Galerkin equation,

$$ B_{k+1/2}\, \delta \widetilde V^{(xp)}_k = \delta A_{k+1/2}\, \widetilde V_k + \Delta t \left[ b + c^{(x)} \right]_{k+1/2}. \qquad (33) $$

When solved, the solution to (33) is used in the predictor evaluation (xpe) step,

$$ \widetilde V^{(xpe)}_{k+1/2} = \widetilde V_k + \tfrac{1}{2}\, \delta \widetilde V^{(xp)}_k, \qquad (34) $$

which, again, in turn is used to update the regular and optimal control vectors. Then there is a set of corrector iterations that continue until the change is sufficiently small to meet the stopping criterion. The corrector (xpec) Galerkin equation is

$$ B_{k+1/2}\, \delta \widetilde V^{(xpec,\gamma+1)}_k = \delta A_{k+1/2}\, \widetilde V_k + \Delta t \left[ b + c^{(xpece,\gamma)} \right]_{k+1/2}, \qquad (35) $$

coupled with the corrector evaluation (xpece) step,

$$ \widetilde V^{(xpece,\gamma+1)}_{k+1/2} = \widetilde V_k + \tfrac{1}{2}\, \delta \widetilde V^{(xpec,\gamma+1)}_k, \qquad (36) $$

where the predictor evaluation is the starting correction, $\widetilde V^{(xpece,0)}_{k+1/2} = \widetilde V^{(xpe)}_{k+1/2}$. See Chung, Hanson and Xu [20] for the analysis of the stability and convergence of this procedure using the heuristic comparison equation (39), presented in the next subsection, except that an eigenvalue analysis is used in [20].

4. Bellman’sCurseof Dimensionality

The main difficulty in treating large systems is the dimensional computational complexity, or Curse of Dimensionality. The order of magnitude of this complexity can simply be approximated by assuming that computation is dominated by the computation of vector functions such as the nonlinearity function $f(x, u(x,t), t)$, the cost gradient $\nabla_x[v^*](x,t)$ and, in the case of uncorrelated noise, the diagonalized cost Hessian array $[\partial^2 v^* / \partial x_i^2\,(x,t)]_{n\times 1}$. Their computation gives a fair representation of the order of the computational and memory requirements. For either the finite difference or finite element methods [20], the order of the number of component vector function evaluations as well as the memory requirements at any time step can be calculated. Here the finite difference representation will be used to motivate this section. Since the $i$-th state component has its own node index $j_i$ in the finite approximation representation,

$$ x = [x_i]_{n\times 1} \Longrightarrow X_j = [X_{i,j_i}]_{n\times 1}, $$

the cost gradient transformed from the continuous representation $\nabla_x[v^*]$ to the finite difference representation DV will depend on all state components and all state finite approximation indices,

$$ \nabla_x[v^*](x, T_k) \Longrightarrow \left[ DV_{i,\, j_1, j_2, \ldots, j_n} \right]_{n \times M_1 \times M_2 \times \cdots \times M_n}, \qquad (37) $$

for fixed time-to-go step $k$. Here, in the case that each $i$-th state component has a common number of nodes $M_i = M$, the total number of finite representation array components is

$$ n \cdot \prod_{i=1}^{n} M_i = n\, M^n, $$

and similarly for other vector functions. The order of the computation or storage requirements will then be some multiple of this,

$$ O\!\left( n\, M^n \right) = O\!\left( n\, e^{\,n \ln(M)} \right), \qquad (38) $$

Page 14: Techniques in Computational Stochastic Dynamic Programminghomepages.math.uic.edu/~hanson/pub/CADS96/cads96wide.pdf · eral optimal feedback control of nonlinear dynamical systems

14 ComputationalStochasticDynamicProgramming, F. B. Hanson

for $i = 1$ to $n$ states and fixed time-to-go step $k$. Hence, the number of nodes grows exponentially with the dimension of the state space $n$, or with the logarithm of the common number of nodes per state $M$. Equation (38) is an analytical representation of Bellman's Curse of Dimensionality. This exponential growth of the Curse of Dimensionality is illustrated in Figure 1.

Figure 1: Order of magnitude representation of the computational or storage requirements illustrating the curse of dimensionality for stochastic dynamic programming for $n$ states, $M$ common nodes per state and uncorrelated noise.

Since the amount of storage is a hardware limitation, the selection of the number of nodes, given the number of states, will typically be determined to avoid memory bound computations, as nodes are chosen to satisfy accuracy requirements. The case of 4 states and $M = 32$ double precision nodes per state requires about 48MW (where 1MW is 1 million words). In the case of correlated noise, a full Hessian, rather than just the diagonal of the Hessian array, needs to be calculated,

so the Hessian array is transformed from the continuous $\nabla_x \nabla_x^\top [v^*]$ to the finite DDV representation as

$$ \nabla_x \nabla_x^\top [v^*](x, T_k) \Longrightarrow \left[ DDV_{i_1, i_2,\, j_1, j_2, \ldots, j_n} \right]_{n \times n \times M_1 \times M_2 \times \cdots \times M_n}, $$

increasing the order of the computational curse of dimensionality $n$ times, to

$$ O\!\left( n^2 M^n \right) = O\!\left( n^2 e^{\,n \ln(M)} \right), $$

for fixed time-to-go step $k$.

Thus, exponential growth in computing and storage requirements is the main bottleneck associated with the Curse of Dimensionality. High performance computing (discussed in the next section) permits the solution of larger dimension problems than mainframe computers allow.
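The growth law (38) is easy to tabulate; a small illustrative Python check (the function name is my own, not the chapter's) shows how the component count $n\,M^n$ explodes with the state dimension $n$:

```python
def storage_order(n_states, nodes_per_state, hessian_full=False):
    """Order-of-magnitude component count n*M^n per Eq. (38), or n^2*M^n
    for the full Hessian array needed with correlated noise."""
    factor = n_states**2 if hessian_full else n_states
    return factor * nodes_per_state**n_states

# Each added state dimension multiplies the total node count by M = 32
# (up to the linear prefactor n): the Curse of Dimensionality.
counts = [storage_order(n, 32) for n in range(1, 6)]
```

With $M = 32$, going from 1 state to 5 states grows the count from $32$ to $5 \cdot 32^5 \approx 1.7 \times 10^8$ components per vector function per time step.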

C. ALGORITHMIC CONVERGENCE

An important component of our stochastic dynamic programming code has been the mesh selection criteria by which we can be assured that the stochastic dynamic programming corrections will converge [96, 59]. This criterion follows from a heuristically constructed, linearized, constant-coefficient comparison PDE,

$$ 0 = \frac{\partial \widetilde v}{\partial t} + \nabla_x^\top \!\left[ D\, \nabla_x \widetilde v \right] + B^\top \nabla_x \widetilde v, \qquad (39) $$

that models the behavior of the original nonlinear stochastic dynamic programming PDE. Here, $D$ is a constant $n \times n$ diagonal matrix and $B$ is a constant $n$-vector. The comparison equation (39) formally corresponds to the SDE

$$ dX(t) = B\, dt + (2D)^{1/2}\, dW(t), $$


provided it is interpreted in terms of the Ito calculus.

Estimates of the constant coefficients in (39) can be appropriate bounds on the control optimized infinitesimal moments of the diffusion approximation:

$$ B_i = \sup_{x,u,t} \left| \mathrm{Mean}\!\left[ dX_i(t) \mid X(t) = x,\, u \right] \right| / dt = \sup_{x,u,t} \left| f_i(x,u,t) + \sum_{l} \lambda_l\, \bar a_{i,l}(x,t) \right|, \qquad (40) $$

and

$$ D_{ii} = \tfrac{1}{2} \sup_{x,u,t} \mathrm{Var}\!\left[ dX_i(t) \mid X(t) = x,\, u \right] / dt = \tfrac{1}{2} \sup_{x,u,t} \left[ \sum_{j} g_{i,j}^2(x,t) + \sum_{l} \overline{a_{i,l}^2}(x,t)\, \lambda_l \right], \qquad (41) $$

where the sums over $l$ range over the independent Poisson noise sources, with $\bar a_{i,l}$ and $\overline{a_{i,l}^2}$ denoting the mean and mean square jump amplitudes, respectively,

for $i = 1$ to $n$. However, other coefficient estimates could be used in place of (40,41).

A von Neumann Fourier analysis of the Crank-Nicolson, predictor-corrector, finite difference method applied to the linear comparison equation yields a generalized time-space mesh ratio condition that is uniform in the parameters, valid for both parabolic-like ($D \neq 0$) and hyperbolic-like ($D \equiv 0$) PDE forms:

$$ \alpha \equiv \Delta t\, \sqrt{ \left( \sum_{i=1}^{n} \frac{D_{ii}}{\Delta x_i^2} \right)^{\!2} + \left( \sum_{i=1}^{n} \frac{B_i}{\Delta x_i} \right)^{\!2} } \;\leq\; 1. \qquad (42) $$

Also, since the drift appears in the corrector convergence criterion (42), upwinding schemes to enhance stability by compensating for drift in convection dominated flow should not be necessary. Since the predictor-corrector part of the method is not needed for the linear comparison equation (39) itself, the application of the predictor-corrector part of the method has no utility for linear equations by themselves. However, the same methods must be used on both the linear comparison and nonlinear equations, so the derivation of the corrector convergence criterion for the linear comparison equation is valid for the nonlinear PDE of interest (6).
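Condition (42) translates directly into a time step bound; a minimal Python check (with made-up coefficient bounds, and names that are illustrative only) computes the mesh ratio and the largest admissible backward time step:

```python
import math

def mesh_ratio(dt, D_diag, B, dx):
    """Generalized time-space mesh ratio alpha of (42): the corrector
    iterations converge when alpha <= 1 (in practice alpha is chosen
    well below 1 for the nonlinear, constrained problem)."""
    diffusion = sum(D / h**2 for D, h in zip(D_diag, dx))  # parabolic part
    drift = sum(b / h for b, h in zip(B, dx))              # hyperbolic part
    return dt * math.sqrt(diffusion**2 + drift**2)

# Assumed coefficient bounds for a 2-state example (made-up numbers):
# diffusion bounds D_ii, drift bounds B_i, state mesh sizes dx_i.
D_diag, B, dx = [0.5, 0.25], [1.0, 2.0], [0.1, 0.1]
alpha = mesh_ratio(1.0e-3, D_diag, B, dx)      # well under 1, admissible
dt_max = 1.0 / mesh_ratio(1.0, D_diag, B, dx)  # alpha is linear in dt
```

Because $\alpha$ scales linearly in $\Delta t$, the largest admissible step is simply the reciprocal of the bracketed root-sum-square factor.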

Note that if (42) is written in terms of averages over the state space (i.e., $\bar q = \frac{1}{n} \sum_{i=1}^{n} q_i$ for some state-indexed quantity $q_i$), then (42) becomes

$$ \alpha_n \equiv \frac{\alpha}{n} = \Delta t\, \sqrt{ \left( \overline{ D / \Delta x^2 } \right)^{2} + \left( \overline{ B / \Delta x } \right)^{2} } \;\leq\; \frac{1}{n}, \qquad (43) $$

where $\overline{ B / \Delta x } \equiv \frac{1}{n} \sum_{i=1}^{n} B_i / \Delta x_i$, for instance. Hence, the root-sum-squared-mean condition (43) based on per state averages is more stringent in that $\alpha_n \leq 1/n \to 0^+$ as $n$ becomes large, i.e., the higher the state dimension, the smaller the time mesh $\Delta t$ has to be relative to the mean measure of the state mesh, i.e.,

$$ \Delta t \;\leq\; \frac{1}{ n\, \sqrt{ \left( \overline{ D / \Delta x^2 } \right)^{2} + \left( \overline{ B / \Delta x } \right)^{2} } }. $$

When the measure of the state mesh size $\Delta x$ (i.e., such that $\Delta x_i = O(\Delta x)$) is sufficiently small, or more precisely $\overline{ D / \Delta x^2 } \gg \overline{ |B| / \Delta x }$, the multidimensional parabolic PDE mesh ratio condition for diffusion dominated flows is approached:

$$ \alpha \to n\, \Delta t\, \overline{ D / \Delta x^2 } \;\leq\; 1. $$

However, in the opposite case, when the measure of the state mesh size $\Delta x$ is sufficiently large, or more precisely $\overline{ |B| / \Delta x } \gg \overline{ D / \Delta x^2 }$, the multidimensional Courant-Friedrichs-Lewy (CFL) hyperbolic mesh ratio condition for convection dominated flows is approached:

$$ \alpha \to n\, \Delta t\, \overline{ |B| / \Delta x } \;\leq\; 1. $$

However, since the full PDE of stochastic dynamic programming (6) is nonlinear for quadratic costs, the corrector convergence criterion for the full problem is to choose the time step $\Delta t$ relative to the state mesh size $\Delta x$ in (42) so that the corrector convergence parameter $\alpha$ is actually selected to be a good deal less than one, to account for nonlinear and constrained control effects.

In [20], similar results were obtained for the finite element method, but using eigenvalue methods on the comparison equation.

The convergence aspect of stochastic dynamic programming calculations is extremely critical, because of the ability to predetermine convergence, maximum corrections and accuracy from the bounds on the SDE coefficients. Many other attempts to encode the dynamic programming solution algorithm have met with failure due to the lack of adequate convergence criteria.

D. OTHER COMPUTATIONAL METHODS

Further improvements in the numerical aspects of dynamic programming can be made in the case where storage is more critical than computation. Decreasing the number of nodes while maintaining global accuracy by using more accurate nodes, such as higher order finite element bases, can decrease both the storage requirements and the exponential dependence on the logarithm of the number of nodes. Finite element methods or Galerkin methods [110, 93], depending on the type of basis functions, are usually more accurate than finite difference methods, but require more costly function evaluations.

Multigrid or multilevel methods of Brandt [10] (see also [11, 89]) can also be used, in conjunction with the finite element method or with other methods, in order to reduce the necessary number of nodes by successive use of fine and coarse grids to enhance accuracy beyond the accuracy of such grids when used only as single grids. Akian, Chancelier and Quadrat [1] have successfully used a variant of multigrid for the stationary dynamic programming problem and have incorporated it into the expert system Pandore. Kushner and Dupuis [76] discuss the use of multigrid methods for stochastic optimal control problems. In [77], Kushner and Jarvis apply multigrid methods to solve telecommunication control problems under the heavy traffic approximation. Hackbusch [48], and Horton and Vandewalle [65] describe a parallel multigrid method for parabolic equations that simultaneously treats both space and time grids.

The collocation method will be used as a comparative benchmark for the numerical performance of other methods. Ascher, Christiansen and Russell [4, 5] describe an efficient code for ODE boundary value problems. Dyksen, Houstis, Lynch and Rice [34], and Flaherty and O'Malley [37] find that collocation tends to outperform the Galerkin method in numerical experiments. Rabbani and Warner [103] point out difficulties with the finite element formulation when its approximation properties are not consistent with flow properties in groundwater models. The Galerkin procedure has the advantage that more theoretical results are available for it.

Other techniques have been applied to optimal control problems. Polak [99, 100] surveys gradient-type, Newton-type and other methods, mostly suitable for deterministic problems. Mufti [94] has also surveyed computational methods in control. Larson [80, 81, 82] surveys dynamic programming methods, discusses their computational requirements and presents the state increment dynamic programming method. Jacobson and Mayne [66] discuss differential dynamic programming, based upon successive approximations and dynamic programming. They point out that their method is only suitable for optimal open loop control when applied to stochastic control problems. Shoemaker and co-workers [79, 15] have continued to make progress on the convergence and parallelization of discrete time differential dynamic programming. Guan and Luh [47] have a parallel differential dynamic programming method in which Lagrange multipliers are used to relax (via a parallel variable metric method) the coupling between discrete time interconnected subsystems.

Kushner [73, 74] developed a convergent (in the sense of weak convergence) finite difference method based on Markov chain approximations. It has advantages such as weakened smoothness requirements and the preservation of probabilistic properties of the stochastic model. Kushner [75] and Kushner and Dupuis [76] cover the more recent developments in the Markov chain approach, such as applications to jump and reflected diffusions. However, solving by the Markov chain approximation may be computationally lengthy in some problems, according to Kushner [74]. This method is currently being developed for parallel computation [76, 77].

Crandall and Lions [21], Souganidis [108] and Crandall, Ishii and Lions [22] present results for vanishing viscosity method finite difference approximations for somewhat abstract Hamilton-Jacobi equations. Their results are not useful for the applications that we have modeled, due to the unrealistic restriction that the Hamiltonian be continuous, but it is expected that viscosity solutions will be shown valid for the jump case eventually, if not already (N. Barron, private communication). We have already mentioned the discontinuous properties that would correspond to the Hamiltonian, since in most applications the controls are bounded rather than unbounded. Their approach using vanishing viscosity is on the right track, and the viscosity in our model can be given either a stochastic or a numerical interpretation. Our ultimate goal is to be able to treat Poisson noise with fast and efficient algorithms. Gaussian noise is relatively trivial to treat compared to Poisson noise.

In their numerical solution for the optimal control of stochastic diffusions, Ludwig [85] and Ludwig and Varah [86] applied combinations of collocation and Newton's method.

Our computational results have emphasized finite difference methods [105, 50, 62, 53, 55, 54, 17, 18, 19], in order to facilitate the development of optimal parallel and vector algorithms. These results began with the one state, one control case. Currently, results are available for up to five state, five control problems, but six state problems are potentially computable with the current generation of parallel computers. Some applications may require more state dimensions.

The application in [53] treated a two species model for Lake Michigan, and that strained the mainframes at that time. However, it is important to treat more interspecific interactions, especially with the high degree of turnover in species dominance in this lake. The complexity of the interactions, both biological and economic, requires very general control models. This is just one application, but it has a great deal of complexity, with markedly different lumped species (one predator and one prey) and different economics (one sport and one commercial fishery). Complex resource applications are a primary motivation for developing algorithms for very general control problems.

III. PARALLEL COMPUTATIONAL DYNAMIC PROGRAMMING

Fast, parallel and vector algorithms appropriate for massive memory supercomputers and massively parallel processors are being developed for the modified numerical methods mentioned in the previous section. These methods are applied to the partial differential equation of dynamic programming for stochastic optimal control problems. The vector form of the finite difference equations permits advantages to be gained from both parallelization and vectorization (or matrization). The methods discussed result in execution speed-ups, making it more practical to numerically solve dynamic programming problems of higher dimensions than would be possible on serial processors. This is a contribution toward relieving Bellman's Curse of Dimensionality.

A. HARDWARE ADVANCES

Our previous supercomputing efforts [51, 52, 53, 55, 54, 17, 18, 19] have been directed toward implementations on the Alliant FX/8 vector multiprocessor, on the Cray multiprocessors X-MP/48, 2S/4-128 and Y-MP/4-64, and on the Connection Machine massively parallel processors CM-2 and CM-200. Work is currently proceeding on the Connection Machine CM-5 massively parallel processor and the Cray C90 vector multiprocessor. These implementations greatly enhanced performance by the removal of almost all data dependent relations [101, 57]. From the Cray-1 in the 1970s to today's vector supercomputers and massively parallel processors, machine performance has gone from megaflops (millions of floating point operations per second) to gigaflops (billions of floating point operations per second), and is heading towards the ultracomputing goal of teraflops (trillions of floating point operations per second) [7]. Supercomputers have major differences in architecture. However, each compiler uses some variant of Fortran 90 [71, 72], so that many code optimizations are portable from one machine to the next. Vectorization can be viewed as a basic form of parallelism implemented by pipelining, and so shares many optimization techniques with the multiple processor type of parallel optimization. This also makes the hardware or architecture more transparent to the user.

The CM-2, CM-200 and CM-5, with their distributed memory processing, have additional Fortran 90 extensions that enhance the power of the computations, but which make them somewhat dissimilar to the shared memory architecture. However, there has been a noticeable convergence of Fortran 90 extensions. Our methods require some knowledge of the architecture and the compiler, since the best optimal code must fit the template the compiler is written to optimize [56]. The main thrust in the future will be implementation on a wide range of architectures, to maintain portability and to avoid over-reliance on machines currently under development that will not survive the high performance computing environment. Getting access to the current generation of ultracomputers [7], such as the Cray C90, CM-5 and Intel Paragon, is essential for solving large scale computing problems. The largest problem that we have computed is 6 states with 16 nodes per state, using about 60MW double precision memory. A dedicated Cray 2S has 128MW (64 bit words), but this requires special requests and costs many extra units. Similarly, the CM-2 has 32KB per processor, or 64MW (64 bit words) for the 16K processor machine, 128MW for the 32K processor machine and 256MW (2GB) for the full 64K processor machine. The new generation Connection Machine CM-5 has up to 1056 Sparc based processors with 32MB RAM memory


and 4 vector units each, with the property that 32 of these CM-5 processors have the power of 2.8 Y-MPs. The new generation Cray C90 can have up to 16 proprietary processors and up to 256 MW RAM (MW means one mega-word of 64 bits length) per processor, while each processor is as powerful as 2.22 Y-MPs. In addition, the Cray C90 may have a Cray T3D massively parallel processor attached with up to 1024 processor nodes, and 32 of these T3D processor nodes have the power of 6.7 Y-MPs. When actual maximal LINPACK performance [30] is used as a benchmark, then the CM-5 performs as well as 6.8 Y-MPs per 32 CM-5 processors, the Cray T3D performs as well as 11.8 Y-MPs per 32 T3D processors, and the C90 performs as well as 3.2 Y-MPs per processor.

A goal is the treatment of 6 or more state variables in realistic models with the present level of accuracy. Five states with 32 nodes per state requires about 32M total nodes, but only about 32K nodes if only 8 nodes per state are needed.

B. SOFTWARE ADVANCES: FASTER AND MORE EFFICIENT NUMERICAL ALGORITHMS

Many of the advances in high performance computing are due to the use of better numerical algorithms or software [111]. In order to develop algorithms for higher dimensional state spaces, a major future direction will be devising new methods for computational stochastic dynamic programming in continuous time. In place of finite difference methods, more powerful methods can be used that require a smaller number of nodes for the same level of accuracy. Some of these more powerful methods are the finite element (Galerkin), the multigrid (multilevel) and collocation methods, as previously mentioned. Originally, our serial implementation was with the Galerkin method [61], as was our first parallel implementation on the first commercial parallel processor, the Denelcor HEP, at Argonne National Laboratory (Hanson, unpublished results), but we switched to the finite difference method for ease of parallel implementation. However, we have found that we needed to return to the finite element method to reduce memory requirements [20, 64].

Some of the early work on parallel algorithms for dynamic programming was by Larson and Tse [83], and Casti, Richardson and Larson [14], but it was essentially theoretical and for discrete time.

Johnsson and Mathur [90] discuss advanced procedures for programming the finite element method on the massively parallel Connection Machine and also recommend efficient data structures for data parallel computers [69]. Xirouchakis and Wang [115] survey finite elements using parallel architectures, as well as other methods such as conjugate gradient, multigrid and domain decomposition, applicable to control problems. Crow, Tylavsky and Bose survey the solution of dynamic power systems by hybrid Newton methods on parallel processors [23]. Frederickson and McBryan [40] have found a superconvergent parallel multigrid method, but Decker [28] has found that their method is not really significantly more efficient than a good parallel version of the standard multigrid algorithm, although they achieve perfect processor utilization. We will focus on techniques for mapping high dimension grids to lower dimension grids for the parallel stochastic dynamic programming algorithm.

1. Other Advanced Techniques: Loop Optimizations, Decompositions, Broadcasting

Rearranging Fortran loops to eliminate data dependencies has been a very important technique for getting the most out of the so-called automatic optimizing compilers. Some understanding of the optimizing compilers (i.e., the machine model) is essential to transform loops into a form recognizable by the compiler. Some loop reordering techniques are changing the loop nest order and changing variables. A crucial objective is putting most of the loop work in the innermost loops of a nest. Some supercomputers will optimize only the innermost loop, such as the Crays in pure vector mode, while others may parallelize and/or vectorize more than one loop. Recall that vectorization is really a primitive kind of parallelization, where the parallelization is carried out by pipelined use of vector registers, so most optimization techniques will work for both parallelization and vectorization. One technique is the collapsing of loops into a smaller number by merging indices, so that a smaller number of indices is used to accomplish the same iteration tasks. The use of more efficient data structures to enhance code optimization will be discussed in the next subsection. Many of these techniques are discussed in [84, 56, 29], for instance.

As we have already mentioned, most supercomputers use similar Fortran extensions, such as Fortran 90 [71, 72], so the use of advanced computer features can be greatly facilitated, codes can be very portable and the hardware can be essentially transparent to the user. In addition, most extensions of the Unix language C will have most of the optimizations of Fortran 90, including the loop optimization techniques just discussed.

With many of the distributed memory, massively parallel processors, the user has the opportunity to spread the workload over the massive memory distributed over many processors. This spreading property results in suppressing


difficulties due to the growth in problem size, making many algorithms such as dynamic programming very scalable, in that the workload can be divided up among many processors [112].

We have shown that our parallel stochastic dynamic algorithm exhibits scaled performance as the size of the problem increases [19]. Our CM-200 performance has exceeded that of the Cray 2S for the 5 state and 16 nodes per state problem. The Connection Machine shows great promise for applications, provided the company making it remains viable. We are starting to develop purely parallel algorithms (i.e., algorithms conceived from the beginning as parallel algorithms). The Connection Machine performs recursions very well using shift operations, and so we have made good use of these operations. We have used operator decomposition techniques, broadcast techniques, front end memory management [116, 117], FORALL loop structures [58], and data vault methods to enhance performance. However, our data vault work [118] has been preliminary, and we plan to use the successors to this facility and the CM-5 to be able to do a six state application. With the introduction of the massively parallel Cray T3D running with a vector multiprocessor Cray C90 as a front end computer, the advantages of massively parallel processing and vector multiprocessing are combined.

2. Vector versus Hypercube Data Structures

One of our biggest accomplishments has been to change the naive hypercube type data structure to that of a global vector data structure [55, 54]. In the usual hypercube (or hyper-rectangle) type representation

$$DV = \bigl[DV(i, j_1, j_2, \ldots, j_n)\bigr]_{n \times M_1 \times M_2 \times \cdots \times M_n} \tag{44}$$

for the finite difference or finite element representation of the component derivatives of the optimal value gradient $\nabla_x[V^*]$, parallel code development and generality is hindered. This is because there must be a highly nested DO-loop, with a loop for the state component index and a loop for each state's node index, for example,

      do 1 i = 1, n
      do 1 j1 = 1, M1
      do 1 j2 = 1, M2
      .....
      do 1 jn = 1, Mn
        DV(i,j1,j2,...,jn) = ........
    1 continue

so when it is necessary to convert the code to a different dimension, a good deal of the existing code must be changed, especially the state DO-loops, state subscript numbers, and state array dimensioning. Further, although the overall scale of the stochastic dynamic programming problem can be very large, the call of the subproblem for each component may not be very large, out of respect for the constraints caused by the Curse of Dimensionality for the entire problem. Hence, there may not be sufficient workload on the component basis to achieve high load balance on the parallel or vector processors, and consequently high efficiency is not achieved on the advanced architecture. Many of these advanced computers will only parallelize or vectorize the innermost loop (e.g., a Cray will only vectorize the innermost loop by default), so other loops in the nest will not be highly optimized, if at all.

One way around this optimization-hindering data structure is the use of a vector data structure to globally represent all of the state nodes. Thus, the hypercube data structure is replaced by

$$DV = \bigl[DV(i, j_s)\bigr]_{n \times M^n} \tag{45}$$

in the case of a common number $M_i = M$ of nodes per state, so $M^n$ is the total number of nodes in the global, vector data structure, with index $j_s = 1$ to $M^n$. Hence, for state loops involving the vector data structure gradient DV, the nest depth will be only two, the state nodes will require only one global subscript, and the array dimensioning need only be changed once in each routine when changing dimension. For instance, a typical state loop would have the form,

      do 2 i = 1, n
      do 2 js = 1, M**n
        DV(i,js) = ........
    2 continue


Further, a large amount of the workload is then in the global state node loop, promoting more efficient use of parallel and vector supercomputers through load balancing and evenly spreading the workload.

In the case of a common number of nodes $M$, the vector data structure scalar index $j_s$ can be computed from the hypercube vector index $\mathbf{j} = (j_i)_{n\times 1} = (j_1, j_2, \ldots, j_n)^T$ by the Fortran linear storage technique that can be used to store the indices,

$$j_s = j_s(\mathbf{j}) = 1 + \sum_{i=1}^{n} (j_i - 1)\, M^{i-1},$$

for a given vector index $\mathbf{j}$, where $j_s = 1$ to $M^n$ includes all the state nodes linearly. Also, there must be a way to go from the vector data structure back to the hypercube data structure, for computing boundary conditions, state components of derivatives and similar quantities. This state index inverse transformation is

$$j_i = j_i(j_s) = 1 + \mathrm{Int}\Biggl[\frac{j_s - 1 - \sum_{k=i+1}^{n} (j_k - 1)\, M^{k-1}}{M^{i-1}}\Biggr],$$

for $i = n$ to $1$ in steps of $-1$, assuming the notation $\sum_{k=n+1}^{n}[\,\cdot\,] \equiv 0$. The direct and inverse transformations only have to be computed once, while the actual coding is much simpler than it seems.
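The direct and inverse index transformations above can be sketched in a few lines of Python (a minimal illustration of the Fortran-style 1-based linear storage rule; the function names are hypothetical, not from the chapter's code):

```python
def to_vector(j, M):
    """Hypercube multi-index (j_1,...,j_n), each 1..M -> global scalar index j_s."""
    return 1 + sum((j[i] - 1) * M ** i for i in range(len(j)))

def to_hypercube(js, M, n):
    """Global scalar index j_s = 1..M**n -> hypercube multi-index, inverted
    component by component from i = n down to i = 1, as in the text."""
    j = [0] * n
    rem = js - 1
    for i in range(n - 1, -1, -1):      # i = n, ..., 1 in the paper's notation
        j[i] = 1 + rem // M ** i        # Int[.] is integer division here
        rem -= (j[i] - 1) * M ** i      # subtract the recovered contribution
    return j

# round trip over all M**n nodes for a small case, n = 3 states, M = 3 nodes
M, n = 3, 3
assert all(to_vector(to_hypercube(js, M, n), M) == js
           for js in range(1, M ** n + 1))
```

The round-trip check mirrors the remark that the transformations need only be coded once; state loops then run over the single index `js`.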

The vector data structure has also been used by Kushner and Jarvis [77] for applications to controlled telecommunications systems under the heavy traffic approximation. In addition, they have improved the index representation by a technique called the compressed auxiliary array index, where spatial indices are compressed into a single array and bit operations are used to perform index operations. They have found enhanced vectorization and simplified multigrid calculations.

The vector data structure would also be useful for more standard PDEs as well, since the data structure problem is mainly a PDE problem. Although there are many general routines for multi-dimensional ODEs, hardly any exist for multi-dimensional PDEs.

C. Graphical Visualization of Multidimensional Results

Scientific visualization is essential for examining the enormous amounts of output from supercomputer calculations. A system for the visualization of multidimensional results called I/O View [102, 60], utilizing an Inner coordinate system inside an Outer coordinate system, has been developed for control applications. Although the development was intended for a resource management and control application in an uncertain environment, to display both optimal costs and components of the optimal control vector against the state vector components, parametrized by other quantities, the system is applicable to almost any multidimensional output.

The management of renewable resources such as commercial and recreational fisheries can be difficult due to lack of data, environmental uncertainty and a multitude of species interactions. The data needed to manage the resource can be biological as well as economic and environmental. Biomathematical modeling can help fill in some of the gaps in the data. Stochastic modeling can approximate the effects of environmental uncertainty. Supercomputing enables the handling of a reasonable number of interacting biological species. However, electronic visualization is essential for interpreting the multidimensional supercomputer results. Further, visualization not only shows the resource manager how changes in management policy affect the overall economic performance of the fishery, but also shows how sensitive the performance is to variations in the poorly known data parameters. An implementation of the world-within-a-world visualization concept of Feiner and Beshers [35] permits visualization of a 3D solution surface in an inner world, which changes along with corresponding changes in the parameters of a 3D outer world.

Refinements were made in the original notion of inner and outer worlds to improve the implementation. For example, the inner and outer coordinate systems were detached so the outer world parameters would be easily readable. A detailed user interface was developed to allow rotations and translations of the image surface, as well as many other features. This implementation allows the resource manager to visualize multidimensional resources along with the parameter sensitivity of the optimal value and optimal controls.

Our implementation is called I/O View and is schematically represented for a particular case in Figure 2. In this figure, the optimal value surface $V^*$ is represented in the inner world (I) coordinate system as a function of two other inner coordinates, the independent states $X_1$ and $X_2$. In the same representation is the outer world (O), in which three outer variables are displayed: a third state $X_3$, time $t$, and a parameter $R_2$ determining the size of the $X_2$ state axis.

[Figure 2 here: the inner world (I) axes $X_1$, $X_2$ and $V^*$ carry the solution surface $S_{v^*}$, nested inside the outer world (O) axes $X_3$, $R_2$ and $t$, with a marked point $(R_2', X_3', t')$.]

Figure 2: Schematic representation of the multidimensional scientific visualization system. See the text for explanation of the Inner and Outer Worlds.

The large dot in the outer world (O) represents the values of the three fixed outer variables-parameters $(R_2', X_3', t')$. These three parameters can be varied by moving the large dot by means of the cursor, with the computed solution surface $V^*$ changing in response to the change of outer variables to exhibit the evolution or parameter sensitivity of the optimal surface. Both inner and outer system axes have color coded attributes as additional visual cues in the actual implementation [102]. Upon nesting more outer world coordinate systems, many more than the six dimensions displayed in Figure 2 can be represented.

The visualization was originally developed on Silicon Graphics hardware [102], but it is being ported to other platforms like the NeXT [60], with improved performance. Originally, the interface Forms Library of Overmars [98], designed for Silicon Graphics, was used. Remote supercomputer output is sent directly to the local visualizer by data streaming between sockets using the Applications Communications Library [68], thus simulating near real time access of output.

D. Numeric and Symbolic Interface

Akian, Chancelier and Quadrat [1] describe an expert system called Pandore that does an extraordinary number of tasks in addition to solving stationary stochastic dynamic programming problems with Gaussian noise perturbations. This system relies heavily on symbolic processing to produce proofs of existence of the solution, analysis of the solution, graphs of the solutions and several other features, which are all summarized in a short LaTeX paper.

Wang and co-workers [113] have developed a symbolic computing system called GENCRAY that automatically generates vectorizable Cray Fortran code. GENCRAY can also generate parallel Cray Fortran.

Some future directions will be to integrate symbolic and numerical computation, by using symbolic computation to simplify the dynamic programming algorithm, and also by generating code that will remove general data dependencies while being portable to other machines. The Future Directions for Research in Symbolic Computations report [41] emphasizes the numeric-symbolic interface and the increased role parallel and supercomputers will play in this area.

IV. SOME RELATED METHODS

In this section, two related methods are presented as competing methods to provide contrast for stochastic dynamic programming. These methods have some similarities to SDP or are used to solve similar problems. These are differential dynamic programming and the Markov chain approximation. There are many other methods that could be included, but only these two are used, to conserve the scope of this chapter. In addition, these two are the ones that are most often mentioned in comparison to stochastic dynamic programming.


A. DIFFERENTIAL DYNAMIC PROGRAMMING

Differential dynamic programming (DDP) is a variant of dynamic programming in which a quadratic approximation of the cost about a nominal state and control plays an essential role. The method uses successive approximations and expansions in differentials or increments to obtain a solution of optimal control problems. The DDP method is due to Mayne [91, 66]. DDP is primarily used in deterministic problems in discrete time, although there are many variations. Mayne [91] in his original paper did give a straightforward extension to continuous time problems, while Jacobson and Mayne [66] present several stochastic variations. The mathematical basis for DDP is given by Mayne in [92], along with the relations between dynamic programming and the Hamiltonian formulation of the maximum principle. A concise, computationally oriented survey of DDP developments is given by Yakowitz [120] in an earlier volume of this series, and the outline for deterministic control problems in discrete time here is roughly based on that chapter. Earlier, Yakowitz [119] surveyed the use of dynamic programming in water resources applications, nicely placing DDP in the larger perspective of other dynamic programming variants. Also, Jones, Willis and Yeh [70], and Yakowitz and Rutherford [121] present brief, helpful summaries with particular emphasis on the computational aspects of DDP.

1. Dynamic Programming in Discrete Time

Let $k$ be the discrete forward time corresponding to $t_k = k\,\Delta t$, with initial time $t_0 = 0$ or $k = 0$ and final time $t_f = T = K\,\Delta t$ or $k = K$, so the stages go from $k = 0$ to $K$ in steps of $1$ (in the opposite direction to the backward time $\tau_j = t_f - j\,\Delta\tau$ of SDP). Let $\mathbf{x}_k = (x_{i,k})_{n\times 1}$ be the $n$-dimensional state vector and $\mathbf{u}_k = (u_{i,k})_{m\times 1}$ be the $m$-dimensional control vector. The discrete time dynamics along the state trajectory is given recursively by

$$\mathbf{x}_{k+1} = \mathbf{f}_k(\mathbf{x}_k, \mathbf{u}_k), \quad \text{for } 0 \le k \le K-1, \tag{46}$$

where the discrete plant function $\mathbf{f}_k$ is at least twice continuously differentiable in both the state and control vectors. The total cost of the trajectory from $k = 0$ to $K$ is

$$V[\mathbf{x},\mathbf{u};\,0,K] = \sum_{k=0}^{K-1} c_k(\mathbf{x}_k,\mathbf{u}_k) + g_K(\mathbf{x}_K), \tag{47}$$

where $c_k(\mathbf{x},\mathbf{u})$ is the discrete-time cost function, assumed to be at least twice continuously differentiable in the state and control vectors. Implicit in (47) is that the cost is separable with regard to the stages $k$. The final or salvage cost is denoted by $g_K(\mathbf{x}_K)$, assumed at least twice continuously differentiable.

The ultimate goal is to seek the minimum total cost

$$V_0^*(\mathbf{x}_0) = \min_{\{\mathbf{u}_0,\ldots,\mathbf{u}_{K-1}\}}\Bigl[V[\mathbf{x},\mathbf{u};\,0,K]\Bigr], \tag{48}$$

subject to the dynamic rule (46). However, for enabling the dynamic programming analysis, the variable time-to-go cost

$$V[\mathbf{x},\mathbf{u};\,k,K] = \sum_{i=k}^{K-1} c_i(\mathbf{x}_i,\mathbf{u}_i) + g_K(\mathbf{x}_K), \tag{49}$$

and its minimization,

$$V_k^*(\mathbf{x}_k) = \min_{\{\mathbf{u}_k,\ldots,\mathbf{u}_{K-1}\}}\Bigl[V[\mathbf{x},\mathbf{u};\,k,K]\Bigr], \tag{50}$$

are considered, subject to the final cost condition

$$V_K^*(\mathbf{x}) = g_K(\mathbf{x}), \tag{51}$$

consistent with the convention that $\sum_{i=K}^{K-1}[\,\cdot\,] \equiv 0$ and with the salvage cost assumed in (47). Applying the dynamic programming principle of optimality,

$$V_k^*(\mathbf{x}_k) = \min_{\mathbf{u}_k}\Bigl[c_k(\mathbf{x}_k,\mathbf{u}_k) + \min_{\{\mathbf{u}_{k+1},\ldots,\mathbf{u}_{K-1}\}}\bigl[V[\mathbf{x},\mathbf{u};\,k+1,K]\bigr]\Bigr],$$


decomposing the optimization into that of the current step plus that for the rest of the cost-to-go. Using the definition of the time-to-go optimal cost (50) and substituting for $\mathbf{x}_{k+1}$ from the recursive dynamic rule (46),

$$V_k^*(\mathbf{x}_k) = \min_{\mathbf{u}_k}\Bigl[c_k(\mathbf{x}_k,\mathbf{u}_k) + V_{k+1}^*\bigl(\mathbf{f}_k(\mathbf{x}_k,\mathbf{u}_k)\bigr)\Bigr]. \tag{52}$$

The calculation of this minimum simultaneously produces the optimal control,

$$\mathbf{u}_k^*(\mathbf{x}_k) = \operatorname{argmin}_{\mathbf{u}_k}\Bigl[c_k(\mathbf{x}_k,\mathbf{u}_k) + V_{k+1}^*\bigl(\mathbf{f}_k(\mathbf{x}_k,\mathbf{u}_k)\bigr)\Bigr], \tag{53}$$

as the argument of the minimization, for $k = K-1$ to $0$ in backward steps (i.e., $-1$ steps). The equations (52, 53) comprise the DP recursive backward sweep. Here, the term sweep is used to indicate an iteration over all time steps, reserving the word step for either time or state steps. This use of the term sweep is not to be confused with its use in the related Successive Sweep Method, as in [27].

The DP recursive forward sweep uses the optimal control $\mathbf{u}_k^*$ found in the backward sweep in a forward recursion of the dynamical equation (46),

$$\mathbf{x}_{k+1}^* = \mathbf{f}_k\bigl(\mathbf{x}_k^*,\mathbf{u}_k^*(\mathbf{x}_k^*)\bigr), \tag{54}$$

starting from the initial state $\mathbf{x}_0^* = \mathbf{x}_0$ and calculating future optimal states for $k = 0$ to $K-1$ in forward steps of $+1$, up to the final state $\mathbf{x}_K^*$.

While the backward and forward recursions of dynamic programming seem to give a method for computing a solution to the discrete time control problem, they do not give any actual computational method for calculating the minimum or the optimal trajectory motion that would be needed for computational implementation. The implementation is especially unclear if the problem is nonlinear (this is true also of the continuous time case). In order to make the actual computation well-posed, a quadratic approximation of the cost at each DDP stage is applied.
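The backward sweep (52, 53) and forward sweep (54) can be illustrated with a minimal Python sketch (the chapter's own snippets are Fortran) for a hypothetical toy problem with a small integer state grid; the dynamics `f`, stage cost `c` and salvage cost `g` below are illustrative only, not from the chapter:

```python
K = 4                                   # number of stages
states = range(-5, 6)                   # admissible integer states
controls = (-1, 0, 1)                   # admissible controls

def f(x, u):                            # discrete plant (46), clipped to the grid
    return max(-5, min(5, x + u))

def c(x, u):                            # stage cost in (47)
    return x * x + u * u

def g(x):                               # salvage cost g_K
    return 2 * x * x

# backward sweep (52), (53): V becomes V*_k, U[k][x] is the argmin control law
V = {x: g(x) for x in states}           # final condition (51)
U = []
for k in range(K - 1, -1, -1):
    Vk, Uk = {}, {}
    for x in states:
        costs = {u: c(x, u) + V[f(x, u)] for u in controls}
        Uk[x] = min(costs, key=costs.get)
        Vk[x] = costs[Uk[x]]
    V, U = Vk, [Uk] + U

# forward sweep (54): recover the optimal trajectory from x*_0 = 4
x = 4
traj = [x]
for k in range(K):
    x = f(x, U[k][x])
    traj.append(x)
```

Here the costs drive the state toward the origin, so the forward sweep steps the trajectory down one node per stage; the tabulated control law `U` is exactly the output (53) of the backward sweep.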

2. Final Time DDP Backward Sweep

Each DDP iterate starts out with a current, approximate iterate $\gamma$ for the state-control set $\{\mathbf{x}_k^\gamma, \mathbf{u}_k^\gamma\}$ of state and control vector pairs near the final time $K$, satisfying the discrete dynamics

$$\mathbf{x}_{k+1}^\gamma = \mathbf{f}_k(\mathbf{x}_k^\gamma,\mathbf{u}_k^\gamma), \tag{55}$$

for $k = 0$ to $K-1$. Since dynamic programming takes backward steps from the final time, the starting iterate is really the final time iterate. It is assumed that these current, nominal iterates are somewhat close to the target optimal trajectory, to justify Taylor approximations. The iterations proceed until the trajectories of successive iterates are sufficiently close.

For the final time-to-go cost (the starting step for each backward DDP sweep), a Taylor approximation about the current iterate state-control set $\{\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma\}$ up to quadratic terms leads to the approximate formula

$$\begin{aligned}
V[\mathbf{x},\mathbf{u};\,K,K] &= c_K(\mathbf{x}_K,\mathbf{u}_K) \equiv g_K(\mathbf{x}_K) \\
&\simeq \widetilde V_K(\mathbf{x}_K,\mathbf{u}_K) \equiv c_K^\gamma
 + \nabla_x^T[c_K]\,\delta\mathbf{x}_K + \nabla_u^T[c_K]\,\delta\mathbf{u}_K \\
&\quad + \tfrac12\,\delta\mathbf{x}_K^T\,\nabla_x\nabla_x^T[c_K]\,\delta\mathbf{x}_K
 + \delta\mathbf{u}_K^T\,\nabla_u\nabla_x^T[c_K]\,\delta\mathbf{x}_K
 + \tfrac12\,\delta\mathbf{u}_K^T\,\nabla_u\nabla_u^T[c_K]\,\delta\mathbf{u}_K,
\end{aligned} \tag{56}$$

with increments $\delta\mathbf{x}_K = \mathbf{x}_K - \mathbf{x}_K^\gamma$ and $\delta\mathbf{u}_K = \mathbf{u}_K - \mathbf{u}_K^\gamma$. The following array Taylor coefficients are redefined for computational storage as

$$\begin{aligned}
c_K^\gamma &\equiv c_K(\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma), \\
(A_K^\gamma)^T &\equiv \nabla_x^T[c_K](\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma), \qquad
(B_K^\gamma)^T \equiv \nabla_u^T[c_K](\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma), \\
Q_K^\gamma &\equiv \tfrac12\nabla_x\nabla_x^T[c_K](\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma), \qquad
S_K^\gamma \equiv \nabla_u\nabla_x^T[c_K](\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma), \\
R_K^\gamma &\equiv \tfrac12\nabla_u\nabla_u^T[c_K](\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma),
\end{aligned} \tag{57}$$

where the notation $[\,\cdot\,]^T$ denotes the transpose of an array, $\nabla_x$ is the gradient in the state and $\nabla_u$ is the gradient in the control. In this section, gradients are evaluated at the current state-control set for the time considered. It is assumed that the set $\{\mathbf{x}_K,\mathbf{u}_K\}$ are variable points on a trajectory nearby the current set $\{\mathbf{x}_K^\gamma,\mathbf{u}_K^\gamma\}$. However, we will not need the $\mathbf{x}_K$ state values themselves, except here for the purposes of analysis, but only the coefficients defining the functional form of the quadratic approximation (note the analogous properties to those of the classical LQ control problem).

In the absence of constraints, the improved approximation to the optimal control is obtained from the critical points of $\widetilde V_K$ at $k = K$,

$$\nabla_u[\widetilde V_K](\mathbf{x}_K,\widetilde{\mathbf{u}}_K^*)
 = B_K^\gamma + S_K^\gamma\,\delta\mathbf{x}_K + 2R_K^\gamma\,\delta\widetilde{\mathbf{u}}_K^*
 = \mathbf{0}, \tag{58}$$

so that

$$\delta\widetilde{\mathbf{u}}_K^* = \widetilde{\mathbf{u}}_K^*(\mathbf{x}_K) - \mathbf{u}_K^\gamma
 = -\tfrac12\,(R_K^\gamma)^{-1}\bigl(B_K^\gamma + S_K^\gamma\,\delta\mathbf{x}_K\bigr)
 \equiv \alpha_K^\gamma + \beta_K^\gamma\,\delta\mathbf{x}_K. \tag{59}$$

This is a linear control law in the state perturbation $\delta\mathbf{x}_K$, with fixed coefficient $\alpha_K^\gamma$ and gain coefficient $\beta_K^\gamma$, provided that the inverse $(R_K^\gamma)^{-1}$ is well defined (e.g., $R_K^\gamma = \tfrac12\nabla_u\nabla_u^T[c_K]$ is positive definite and hence nonsingular; however, positive definiteness is only needed near the optimal trajectory). Since the expansion in Taylor approximations with only a few terms is good only if both $\delta\mathbf{x}_K$ and $\delta\mathbf{u}_K$ are small, a problem in consistency arises if the fixed part of $\delta\mathbf{u}_K$, namely $\alpha_K^\gamma$, is not small in (59). However, near the optimal trajectory, $\alpha_K^\gamma = -\tfrac12(R_K^\gamma)^{-1}B_K^\gamma$ should be small, since $\nabla_u[c_K] \simeq \mathbf{0}$ on the optimal trajectory at the final time. In the case that $\alpha_K^\gamma$ is not small, as when the starting guess is not good, Jacobson and Mayne [66] suggest multiplying $\alpha_K^\gamma$ by a sufficiently small parameter $\epsilon$ until the iterations are closer to the optimal trajectory. These comments apply on the whole time horizon, $0 \le k \le K$, not just for the time $k = K$.

In the presence of constraints, the linear control law (59) yields only the regular control, which must be modified for the constraints, as in the previous section for SDP. However, for inequality constraints, the inconsistency problem of non-small $\alpha_K^\gamma$ can arise, so penalty functions must be used to move the constraints to the objective functional.

The corresponding approximation to the optimal cost value function follows from substituting the current approximation of the optimal control:

$$\widetilde V_K^*(\mathbf{x}_K) = \widetilde V_K\bigl(\mathbf{x}_K,\widetilde{\mathbf{u}}_K^*(\mathbf{x}_K)\bigr)
 = z_K^\gamma + (D_K^\gamma)^T\,\delta\mathbf{x}_K + \delta\mathbf{x}_K^T\,E_K^\gamma\,\delta\mathbf{x}_K, \tag{60}$$

reduced to a quadratic function in $\delta\mathbf{x}_K$ only. Upon calculation, the current coefficients are given by

$$\begin{aligned}
E_K^\gamma &= Q_K^\gamma - \tfrac14\,(S_K^\gamma)^T (R_K^\gamma)^{-1} S_K^\gamma, \\
(D_K^\gamma)^T &= (A_K^\gamma)^T - \tfrac12\,(B_K^\gamma)^T (R_K^\gamma)^{-1} S_K^\gamma, \\
z_K^\gamma &= c_K^\gamma - \tfrac14\,(B_K^\gamma)^T (R_K^\gamma)^{-1} B_K^\gamma.
\end{aligned} \tag{61}$$

The last, currently fixed coefficient $z_K^\gamma$ is not essential for the iteration, unless the optimal cost value is needed as a result.


3. General DDP Backward Sweep in Time

The coefficients at the final time $K$ define the beginning of the DDP backward sweep for the current iteration $\gamma$, while the coefficients for the remaining iteration times are found by marching backward from $k = K-1$ to $k = 0$, assuming the quadratic value induction hypothesis,

$$\widetilde V_{k+1}^*(\mathbf{x}_{k+1}) = z_{k+1}^\gamma + (D_{k+1}^\gamma)^T\,\delta\mathbf{x}_{k+1}
 + \delta\mathbf{x}_{k+1}^T\,E_{k+1}^\gamma\,\delta\mathbf{x}_{k+1}, \tag{62}$$

based on the formula (60) for $k = K$, where $\delta\mathbf{x}_{k+1} \equiv \mathbf{x}_{k+1} - \mathbf{x}_{k+1}^\gamma$. Assuming both $\mathbf{x}_{k+1}$ and $\mathbf{x}_{k+1}^\gamma$ obey the dynamics of (46) (if not, the calculation is more complex and not very suitable for implementation), the increment in the vector dynamics is expanded as the quadratic,

$$\begin{aligned}
\delta\mathbf{x}_{k+1} &= \mathbf{f}_k(\mathbf{x}_k,\mathbf{u}_k) - \mathbf{f}_k(\mathbf{x}_k^\gamma,\mathbf{u}_k^\gamma) \\
&\simeq \Bigl[\nabla_x^T[f_{i,k}]\,\delta\mathbf{x}_k + \nabla_u^T[f_{i,k}]\,\delta\mathbf{u}_k
 + \tfrac12\,\delta\mathbf{x}_k^T\,\nabla_x\nabla_x^T[f_{i,k}]\,\delta\mathbf{x}_k \\
&\qquad + \delta\mathbf{u}_k^T\,\nabla_u\nabla_x^T[f_{i,k}]\,\delta\mathbf{x}_k
 + \tfrac12\,\delta\mathbf{u}_k^T\,\nabla_u\nabla_u^T[f_{i,k}]\,\delta\mathbf{u}_k\Bigr]_{n\times 1},
\end{aligned} \tag{63}$$

which is similar to the expansion of the scalar cost $c_K$ in (56).

Next, the argument of the minimum of the Principle of Optimality in (52) is expanded by substituting the quadratic expansions for $c_k$ in (56) and $\mathbf{f}_k$ in (63), while retaining only terms up to quadratic order (being careful to expand about the later current state $\mathbf{x}_{k+1}^\gamma$ when using $\widetilde V_{k+1}^*(\mathbf{x}_{k+1})$, as required by the induction hypothesis (62)); the argument becomes

$$\begin{aligned}
\widetilde V[\mathbf{x},\mathbf{u};\,k,K] &\equiv c_k(\mathbf{x}_k,\mathbf{u}_k)
 + \widetilde V_{k+1}^*\bigl(\mathbf{f}_k(\mathbf{x}_k,\mathbf{u}_k)\bigr) \\
&\simeq \widetilde V_k(\mathbf{x}_k,\mathbf{u}_k) \equiv C_k^\gamma
 + (A_k^\gamma)^T\,\delta\mathbf{x}_k + (B_k^\gamma)^T\,\delta\mathbf{u}_k \\
&\quad + \delta\mathbf{x}_k^T\,Q_k^\gamma\,\delta\mathbf{x}_k
 + \delta\mathbf{u}_k^T\,S_k^\gamma\,\delta\mathbf{x}_k
 + \delta\mathbf{u}_k^T\,R_k^\gamma\,\delta\mathbf{u}_k.
\end{aligned} \tag{64}$$

This is the quadratic approximation to the minimization argument in (52). The corresponding coefficients are

$$\begin{aligned}
C_k^\gamma &= c_k(\mathbf{x}_k^\gamma,\mathbf{u}_k^\gamma) + z_{k+1}^\gamma, \\
(A_k^\gamma)^T &= \nabla_x^T[c_k](\mathbf{x}_k^\gamma,\mathbf{u}_k^\gamma)
 + (D_{k+1}^\gamma)^T\,\nabla_x^T[\mathbf{f}_k](\mathbf{x}_k^\gamma,\mathbf{u}_k^\gamma), \\
(B_k^\gamma)^T &= \nabla_u^T[c_k](\mathbf{x}_k^\gamma,\mathbf{u}_k^\gamma)
 + (D_{k+1}^\gamma)^T\,\nabla_u^T[\mathbf{f}_k](\mathbf{x}_k^\gamma,\mathbf{u}_k^\gamma), \\
Q_k^\gamma &= \tfrac12\nabla_x\nabla_x^T[c_k]
 + \tfrac12\bigl(\nabla_x\nabla_x^T[\mathbf{f}_k^T]\bigr)D_{k+1}^\gamma
 + \nabla_x[\mathbf{f}_k^T]\,E_{k+1}^\gamma\,\bigl(\nabla_x[\mathbf{f}_k^T]\bigr)^T, \\
S_k^\gamma &= \nabla_u\nabla_x^T[c_k]
 + \bigl(\nabla_u\nabla_x^T[\mathbf{f}_k^T]\bigr)D_{k+1}^\gamma
 + 2\,\nabla_u[\mathbf{f}_k^T]\,E_{k+1}^\gamma\,\bigl(\nabla_x[\mathbf{f}_k^T]\bigr)^T, \\
R_k^\gamma &= \tfrac12\nabla_u\nabla_u^T[c_k]
 + \tfrac12\bigl(\nabla_u\nabla_u^T[\mathbf{f}_k^T]\bigr)D_{k+1}^\gamma
 + \nabla_u[\mathbf{f}_k^T]\,E_{k+1}^\gamma\,\bigl(\nabla_u[\mathbf{f}_k^T]\bigr)^T,
\end{aligned} \tag{65}$$

where the arguments of the gradients at the current time in the last three coefficients have been omitted, and contractions such as $\bigl(\nabla_x\nabla_x^T[\mathbf{f}_k^T]\bigr)D_{k+1}^\gamma \equiv \sum_{i=1}^n D_{i,k+1}^\gamma\,\nabla_x\nabla_x^T[f_{i,k}]$ are implied.

Minimizing the quadratic approximation in (64),

$$\mathbf{0} = \nabla_u[\widetilde V_k]^* = B_k^\gamma + S_k^\gamma\,\delta\mathbf{x}_k + 2R_k^\gamma\,\delta\widetilde{\mathbf{u}}_k^*,$$

yields the current approximation to the optimal control at $k$,

$$\delta\widetilde{\mathbf{u}}_k^* = \alpha_k^\gamma + \beta_k^\gamma\,\delta\mathbf{x}_k, \tag{66}$$

where

$$\alpha_k^\gamma \equiv -\tfrac12\,(R_k^\gamma)^{-1} B_k^\gamma
 \quad \text{and} \quad
\beta_k^\gamma \equiv -\tfrac12\,(R_k^\gamma)^{-1} S_k^\gamma. \tag{67}$$


These latter two coefficients need only be saved for the current approximate optimal control, since they define its linear function in the state increment. The quadratic approximation is critical for the straightforward determination of the control, in this case unconstrained. Recall that in the case where $\alpha_k^\gamma$ is not small, a small multiplier can be used where $\alpha_k^\gamma$ is applied [91] (i.e., $\alpha_k^\gamma$ is replaced by $\epsilon\,\alpha_k^\gamma$ with sufficiently small $\epsilon$ until convergence of the successive approximations to $V_k^*$ is assured). The difficulty arises when the approximations are too far from the optimal trajectory, but close to the optimal trajectory $\nabla_u[\widetilde H_k]$ should be small, if $\widetilde H_k$ is some pseudo-Hamiltonian to which the minimum principle applies.

Upon substituting the linear control law (66) into the bivariate quadratic approximation (64), the formula for the induction hypothesis (62) is obtained for time $k$ given the result for $k+1$,

$$\widetilde V_k^*(\mathbf{x}_k) = z_k^\gamma + (D_k^\gamma)^T\,\delta\mathbf{x}_k + \delta\mathbf{x}_k^T\,E_k^\gamma\,\delta\mathbf{x}_k, \tag{68}$$

proving the induction part of the current backward sweep. Further, the state dependent quadratic coefficients are given in the procedure as

$$\begin{aligned}
E_k^\gamma &= Q_k^\gamma + \tfrac12\,(S_k^\gamma)^T\,\beta_k^\gamma, \\
(D_k^\gamma)^T &= (A_k^\gamma)^T + (B_k^\gamma)^T\,\beta_k^\gamma, \\
z_k^\gamma &= C_k^\gamma + \tfrac12\,(B_k^\gamma)^T\,\alpha_k^\gamma,
\end{aligned} \tag{69}$$

for $k = K$ to $0$ in backward steps, including the final condition in (61) as well. This concludes the backward sweep for iterate $\gamma$.

A primary property of DDP is that it is exact, in infinite precision, for the LQP (Linear Quadratic Problem), since the Taylor approximations are exact in the LQP case [91].

4. DDP Forward Sweep

In the forward sweep for iterate $\gamma$, the $\gamma$th backward sweep approximation for the optimal control,

$$\widetilde{\mathbf{u}}_k^* = \mathbf{u}_k^\gamma + \delta\widetilde{\mathbf{u}}_k^*
 = \mathbf{u}_k^\gamma + \alpha_k^\gamma + \beta_k^\gamma\,\delta\mathbf{x}_k, \tag{70}$$

given in increment form by (66) for all the times $k = 0$ to $K$, is used to calculate improvements in the optimal state trajectory. If $\alpha_k^\gamma$ is not small, then (70) is replaced by

$$\widetilde{\mathbf{u}}_k^* = \mathbf{u}_k^\gamma + \epsilon\,\alpha_k^\gamma + \beta_k^\gamma\,\delta\mathbf{x}_k,$$

with $0 < \epsilon \le 1$ selected to foster convergence until $\alpha_k^\gamma$ is small again [91, 121]. The discrete dynamic law (46),

$$\widetilde{\mathbf{x}}_{k+1}^* = \mathbf{f}_k\bigl(\widetilde{\mathbf{x}}_k^*,\widetilde{\mathbf{u}}_k^*(\widetilde{\mathbf{x}}_k^*)\bigr), \tag{71}$$

can be solved recursively for $k = 0$ to $K-1$ in forward steps, since the linear control law approximation is given. Once done, the successor iterate is defined as

$$\mathbf{x}_k^{\gamma+1} = \widetilde{\mathbf{x}}_k^*, \tag{72}$$

for $k = 0$ to $K$. However, for computational storage considerations, the new iterate just replaces the old, $\mathbf{x}_k^{\gamma+1} \to \mathbf{x}_k^\gamma$, saving the old iterate only for the next check of the iteration stopping criterion.

The combined backward and forward sweep iterations end when an appropriate stopping criterion is reached, such as

$$\max_k\bigl\|\widetilde V_k^*(\mathbf{x}_k^{\gamma+1}) - \widetilde V_k^*(\mathbf{x}_k^\gamma)\bigr\| \le \mathrm{tol}_V,$$

in some suitable norm. This value stopping criterion may also be supplemented using successive states,

$$\max_k\bigl\|\mathbf{x}_k^{\gamma+1} - \mathbf{x}_k^\gamma\bigr\| \le \mathrm{tol}_x,$$


or using successive controls,

$$\max_k\bigl\|\mathbf{u}_k^{\gamma+1} - \mathbf{u}_k^\gamma\bigr\| \le \mathrm{tol}_u,$$

where the tolerances $\mathrm{tol}$ are prescribed.
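One full DDP backward/forward sweep can be sketched in Python (the chapter's own code is Fortran) for a hypothetical scalar linear-quadratic test problem, with dynamics $x_{k+1} = a x_k + b u_k$ and cost $\sum_k (q x_k^2 + r u_k^2) + q_f x_K^2$; the parameter names `a, b, q, r, qf` are illustrative, and the final-stage value coefficients are initialized directly from the salvage cost, as in (51):

```python
def ddp_sweep(x_nom, u_nom, a, b, q, r, qf):
    """One DDP iteration for a scalar LQ test problem: backward sweep for the
    quadratic value coefficients (z, D, E) and the linear control law
    (alpha, beta), then a forward sweep of the dynamics."""
    K = len(u_nom)
    # final condition: V*_K(x) = qf*x^2, expanded about the nominal x_nom[K]
    z, D, E = qf * x_nom[K] ** 2, 2.0 * qf * x_nom[K], qf
    alpha, beta = [0.0] * K, [0.0] * K
    for k in range(K - 1, -1, -1):      # backward sweep, scalar form of (65)-(69)
        C = q * x_nom[k] ** 2 + r * u_nom[k] ** 2 + z
        A = 2.0 * q * x_nom[k] + a * D  # state gradient coefficient
        B = 2.0 * r * u_nom[k] + b * D  # control gradient coefficient
        Q = q + a * a * E               # (1/2) state Hessian coefficient
        S = 2.0 * a * b * E             # mixed Hessian coefficient
        R = r + b * b * E               # (1/2) control Hessian coefficient
        alpha[k] = -B / (2.0 * R)       # fixed part of the control law
        beta[k] = -S / (2.0 * R)        # feedback gain
        E = Q + 0.5 * S * beta[k]       # value coefficients for stage k
        D = A + B * beta[k]
        z = C + 0.5 * B * alpha[k]
    # forward sweep: apply the linear control law along the improved trajectory
    x_new, u_new = [x_nom[0]], []
    for k in range(K):
        u_new.append(u_nom[k] + alpha[k] + beta[k] * (x_new[k] - x_nom[k]))
        x_new.append(a * x_new[k] + b * u_new[k])
    return x_new, u_new

# nominal iterate: zero control starting from x_0 = 1
a, b, q, r, qf, K = 0.9, 0.5, 1.0, 0.1, 2.0, 10
x_nom = [1.0]
for _ in range(K):
    x_nom.append(a * x_nom[-1])
x_opt, u_opt = ddp_sweep(x_nom, [0.0] * K, a, b, q, r, qf)
```

For this LQ case, a single sweep reproduces the discrete Riccati (LQR) feedback exactly, consistent with the exactness property for the LQP noted above; for nonlinear problems the sweeps would be iterated until the stopping criteria are met.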

5. Similarities and Differences between DDP and SDP

The obvious difference between DDP and SDP, as presented here, is that DDP is primarily applied to deterministic problems, rather than stochastic ones as is SDP. However, Jacobson and Mayne [66] and others do give stochastic variants, including a formulation for the LQGP (Linear, Quadratic and Gaussian Problem). These authors also present the continuous time case, as well as give estimates for the errors committed in DDP and step-size adjustments.

Another difference is that the DDP iterations are over the entire time horizon for each iterate, whereas SDP iterations in the backward sweep range over only a single time step, followed by a backward march in time for the next set of iterations. That is, a DDP backward sweep iteration ranges over both state and temporal spaces, whereas the SDP backward sweep ranges over only the state space and stops when successive values for that time step satisfy a corrector stopping criterion. In contrast, an approximation for all times is found in DDP before proceeding to the next iteration. In terms of the time domain, roughly speaking, DDP is somewhat analogous to the point Jacobi method, while SDP is more like Gauss-Seidel.

Also, a mesh selection ratio recipe is available in SDP for selecting the ratio of the time step to an approximate measure of the space step. However, DDP does not discretize either the state $x$ or the control $u$ variables [121, 70], so the mesh selection ratio is not relevant, while the SDP here does discretize the state, but the control is computed as output with the value. The computational and storage cost per time step of SDP is exponential, $O(M^n)$ in (38), consistent with the curse of dimensionality, with $M$ finite difference nodes per state and $n$ states. However, for DDP the cost per stage is the cost of computing the quadratic-expansion coefficient arrays from the associated Hessian arrays, as well as the control law coefficients $p^j_k$ and $q^j_k$, so with $m$ controls the storage cost per stage is only polynomial, roughly $O(n^2 + nm + m^2)$, or $O(n^3)$ if the Hessian of the dynamics vector $F_k$ is stored for the stage, and the computational cost is roughly $O(n^2 m + n m^2)$ for the control law and $O(n^3 + n^2 m + n m^2)$ for the expansion coefficient arrays, depending on how the multiplications in (65) are computed (see also [121, 70] for a somewhat different accounting). Thus DDP manages to keep the curse of dimensionality under control. A major reduction in storage and computation occurs in DDP since calculations are concentrated in the neighborhood of an optimal trajectory, whereas SDP covers the whole state space.
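The contrast in cost scaling can be made concrete with a small arithmetic sketch; the per-stage DDP count below is only a rough, illustrative cubic order, not an exact operation count from the text:

```python
def sdp_nodes(M, n):
    """SDP state-grid size per time step: M^n nodes, the curse of
    dimensionality with M finite difference nodes per each of n states."""
    return M ** n

def ddp_stage_flops(n, m):
    """Rough DDP per-stage operation count, cubic in the state dimension
    n and control dimension m (illustrative order only)."""
    return n**3 + n**2 * m + n * m**2

# With M = 32 nodes per state and n = 5 states (the largest grids cited
# for the Cray and Connection Machine runs), the SDP grid alone has
# 32^5 nodes per time step, while a DDP stage with n = 5, m = 2 costs on
# the order of a few hundred operations.
grid = sdp_nodes(32, 5)
stage = ddp_stage_flops(5, 2)
```

The ratio of these two numbers is what makes trajectory-local methods like DDP attractive as the dimension grows.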

The assumption that costs are quadratic in the control for SDP is somewhat similar to the quadratic assumption in DDP, except that in DDP the quadratic is in both costs and dynamics with respect to both control and state. In SDP, the critical calculation of the optimal control requires only that the cost be quadratic and the dynamics be linear in the control, which is more realistic where the nonlinearity in the dynamics is an essential part of the model. In the version of SDP presented here, the quadratic costs were presented as part of the model rather than part of the method of solution, although more complex costs and dynamics could be treated by a quadratic Taylor approximation of the cost and a linear Taylor approximation of the dynamics to formulate an iterative procedure to solve the more general SDP problem. In DDP, the computation is mainly the computation of the temporal functional coefficients, whereas in the SDP here the computation permits general state and time dependence.

Quadratic convergence for DDP has been shown by Murray and Yakowitz [95]. For SDP in continuous time, Naimipour and Hanson [96, 59] have a heuristic comparison argument for convergence, and the convergence is linear, in that
$$\big\| V^{(\gamma+1)} - V^{(\gamma)} \big\| \simeq \alpha \cdot \big\| V^{(\gamma)} - V^{(\gamma-1)} \big\|,$$
where $\gamma$ is the correction counter and $\alpha$ is the corrector convergence parameter for which the $\alpha$ in (42) is an approximation.

The forward sweeps are quite different in the two cases, due to the difference in the stochastic and deterministic formulations. The explicit recursion in DDP is relatively trivial compared to solving the forward Kolmogorov equations for the state density, given the stochastic optimal control, to get the optimal expected state trajectory.
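A heuristic way to estimate such a linear convergence parameter from computed iterates is to average successive error ratios; a small sketch with a hypothetical helper name:

```python
def linear_rate(errors):
    """Estimate the linear convergence parameter alpha from successive
    corrector errors e_gamma, assuming e_{gamma+1} ~ alpha * e_gamma."""
    ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:]) if e0 > 0]
    return sum(ratios) / len(ratios)

# A geometric error sequence with alpha = 0.5 is recovered exactly:
alpha = linear_rate([1.0, 0.5, 0.25, 0.125])
```

In practice the `errors` would be the max-norm differences of successive corrector values at a fixed backward time step.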

6. DDP Variations and Applications

Yakowitz [119, 120] has given a thorough survey of the computation and techniques of differential dynamic programming in 1989. Liao and Shoemaker [79] studied convergence in unconstrained DDP methods and have found that adaptive shifts in the Hessian are very robust and yield the fastest convergence in the case that the problem Hessian matrix is not positive definite. Chang, Shoemaker and Liu [16] solve for optimal pumping rates to remediate groundwater pollution contamination, using finite elements and hyperbolic penalty functions to include constraints in the DDP method. Culver and Shoemaker [24, 25] include flexible management periods in the model and use a faster quasi-Newton version of DDP. Earlier, Murray and Yakowitz [95] had compared DDP and Newton's methods to show that DDP inherited the quadratic convergence of Newton's method. Caffey, Liao and Shoemaker [15] develop a parallel implementation of DDP that is speeded up by reducing the number of synchronization points over time steps.

B. MARKOV CHAIN APPROXIMATION

Another approach to finite differences is the well developed Markov Chain Approximation (MCA) of Kushner [73, 74]. Recent developments are surveyed and further advanced by Kushner [75], and by Kushner and Dupuis [76], with special attention to methods for jump and reflected diffusions. This method applies a Markov chain approximation to continuous time, continuous state stochastic control problems by renormalizing finite difference forms as proper Markov chain transition probabilities. These transition probabilities arise when deriving finite difference versions of the dynamic programming equation. An important advantage of this method is that the Markov chain approximation facilitates convergence proofs for the numerical methods in terms of probabilistic arguments. Probabilistic interpretation of the approximation is a major motivation for the formulation of this method. Here, the MCA method is given a formal presentation, in the spirit of the SDP notation and formulation, to facilitate comparison. The reader should refer to the above references for greater detail, especially Kushner and Dupuis [76] for a multitude of variations and convergence proofs.

1. MCA Dynamic Programming Model Formulation

Consider the stochastic diffusion without Poisson jumps governed by the stochastic differential equation (SDE)
$$dX(t) = f\big(X(t), U(t), t\big)\,dt + g\big(X(t), t\big)\,dW(t), \qquad (73)$$
where the notation is the same as in (1). It is assumed that the drift $f$ and Gaussian coefficient $g$ are bounded, continuous and Lipschitz continuous in the state $X$, while $f$ is uniformly so in the control $U$. Further, let the expected cost objective functional be
$$J(x, u, t) = \mathrm{Mean}\Big[ \int_t^{t_f} C\big(X(s), U(s), s\big)\,ds + S\big(X(t_f), t_f\big) \,\Big|\, X(t) = x,\ U(t) = u \Big], \qquad (74)$$
with the same notation as in (2). It is assumed that the instantaneous cost $C$ and the salvage cost $S$ are bounded and continuous. Much also may depend on the boundary conditions. The final side condition is the same as that in (8) for SDP.

The optimal costs are defined as
$$V^*(x, t) = \inf_{u}\big[ J(x, u, t) \big], \qquad (75)$$
using the infimum ($\inf$), instead of the less general minimum ($\min$), in the spirit of [76], over all admissible controls. Upon application of the principle of optimality, the dynamic programming equation for the optimal expected cost $V^*$ is
$$0 = \frac{\partial V^*}{\partial t}(x, t) + \inf_{u}\Big[ f^T(x, u, t)\,\nabla_x V^*(x, t) + \tfrac{1}{2}\,\big(gg^T\big)(x, t) : \nabla_x \nabla_x^T V^*(x, t) + C(x, u, t) \Big]. \qquad (76)$$

Since (76) is a backward equation, and since we want to keep it simple, (76) is approximated with the Backward Euler approximation
$$V^*_{k-1}(x) = V^*_k(x) + \Delta t_{k-1}\,\inf_{u}\Big[ f^T_k(x, u)\,\nabla_x V^*_k(x) + \tfrac{1}{2}\,\big(gg^T\big)_k(x) : \nabla_x \nabla_x^T V^*_k(x) + C_k(x, u) \Big], \qquad (77)$$
for $k = 1$ to $N$, with $V^*_k(x) \equiv V^*(x, t_k)$, and similarly for $f_k$, $g_k$ and $C_k$, while $t_k = t_{k-1} + \Delta t_{k-1}$ is time in terms of the forward index $k$. The final condition is $V^*_N(x) = V^*(x, t_f) = S(x, t_f)$. (The correlation, in the case of constant time increments, between the forward index $k$ and the backward index $\kappa$ used in SDP is that $T_\kappa = t_f - \kappa\,\Delta t = (N - \kappa)\,\Delta t = t_{N-\kappa} = t_k$, so the forward and backward time indices are related by $k = N - \kappa$. Hence, $T_0 = t_f = t_N$ and $T_N = 0 = t_0$ are the final and initial times, respectively, in either time direction.) The interpolation time increment is denoted by $\Delta t_{k-1} = t_k - t_{k-1}$ here and has important roles to play for modeling and for numerical convergence.

However, since much of the literature on MCA, corresponding to much of the literature on stochastic control, is formulated as a stationary (time-independent) problem, the indices $\kappa$ and $k$ are treated as iteration indices for the time-independent problem. The time-independent problem may arise from ergodic problems, exit time problems or infinite horizon problems, for example.

2. MCA Local Consistency Conditions

The symbol $h$ will indicate the order of the spacing in the discretization of the state space for the Markov chain $\xi_k$ for discrete steps $k \ge 0$ (note that for the chain we use the forward time index $k$ instead of the backward time index $\kappa$, to give priority to the Markov chain properties usually defined in forward time, instead of completely forcing the backward time notation of SDP). The Markov chain transition probabilities are defined by
$$p(\xi_k, y \mid u_k) = \mathrm{Prob}\big[\, \xi_{k+1} = y \,\big|\, \xi_l, u_l,\; l \le k \,\big]$$
for transitions of the Markov chain from stage $\xi_k$ to stage $\xi_{k+1} = y$ under control policy $u_k$. These transition probabilities must satisfy the non-negativity ($p \ge 0$) and conservation ($\sum p = 1$) of probability properties of any proper probability law. Thus, the chain is defined on a finite state space $S_h$ with discrete time parameter $k$. Further, the Markov chain approximation increments $\Delta\xi_k \equiv \xi_{k+1} - \xi_k$, with $|\Delta\xi_k| = O(h)$, must satisfy the local consistency conditions
$$\mathrm{E}\big[\Delta\xi_k \,\big|\, \xi_k, u_k\big] = \sum_{y}\,(y - \xi_k)\; p(\xi_k, y \mid u_k) = \Delta t_k\, f(\xi_k, u_k) + o(\Delta t_k)$$
and
$$\mathrm{Covar}\big[\Delta\xi_k \,\big|\, \xi_k, u_k\big] = \sum_{y}\,\big(y - \xi_k - \mathrm{E}[\Delta\xi_k]\big)\,\big(y - \xi_k - \mathrm{E}[\Delta\xi_k]\big)^T\, p(\xi_k, y \mid u_k) = \Delta t_k\,\big(g g^T\big)(\xi_k) + o(\Delta t_k), \qquad (78)$$
as $h \to 0^+$, for $k = 0$ to $N - 1$, consistent with the usual conditions for the first two infinitesimal moments of the stochastic diffusion approximation corresponding to the SDE (73). In (78), the conditioning on earlier states and controls $\{\xi_l, u_l,\; l < k\}$ has been suppressed to simplify the notation, but is not meant to imply any assumption on earlier variables.

Here, $\Delta t_k$ is the local interpolation time increment, such that $\Delta t_k \to 0^+$ as $h \to 0^+$, and such that the piecewise constant interpolated chain and control with continuous time parameter $t$ are $\xi(t) \equiv \xi_k$ and $u(t) \equiv u_k$ for $t$ on $[t_k, t_k + \Delta t_k)$, which together with the local consistency conditions (78) satisfied by the corresponding discrete time chain allow approximation of (73). For generality, $\Delta t_k = \Delta t_k(\xi_k, u_k)$ is permitted.

3. MCA Dynamic Programming Equation and Transition Probabilities Construction from Finite Differences


The MCA transition probabilities $p(x, y \mid u)$ often are constructed from the finite difference in space of the parabolic dynamic programming equation (77) (finite elements could be used as well). The usual mixed finite differences in space used in [76] are upwinded (forward and backward) for the first order state derivatives,
$$\frac{\partial V}{\partial x_i}(x, t) \simeq \begin{cases} \big[ V(x + h_i e_i, t) - V(x, t) \big] / h_i, & f_i(x, u, t) \ge 0, \\ \big[ V(x, t) - V(x - h_i e_i, t) \big] / h_i, & f_i(x, u, t) < 0, \end{cases} \qquad (79)$$
and central differences for the second order derivatives,
$$\frac{\partial^2 V}{\partial x_i^2}(x, t) \simeq \frac{V(x + h_i e_i, t) - 2\,V(x, t) + V(x - h_i e_i, t)}{h_i^2}, \qquad (80)$$
with $e_i$ as the unit vector for the $i$th state component and $h_i = O(h)$ as the size of the $i$th step. Here, we assume that the Gaussian noise is not correlated (i.e., the coefficient $g g^T$ is assumed to be diagonal), so the cross second derivatives are not needed. Also, the upwinding on first derivatives sacrifices accuracy (i.e., it has $O(h_i)$ error rather than the smaller $O(h_i^2)$ error of central differences as $h_i \to 0^+$, so the larger error "numerically pollutes" the smaller) for potentially greater numerical stability in the case of convection dominated flows (i.e., the drift part $|f_i|/h_i$ is greater than the diffusion part $\tfrac12 (g g^T)_{i,i}/h_i^2$). However, many refinements to fix up these problems are found in Kushner and Dupuis [76], such as higher order finite differences, including those needed for cross second derivatives.

Upon substituting these finite differences, the dynamic programming equation (77) becomes
$$V^*(x, t_{k-1}) = \inf_{u \in U}\Big[\, p_k(x, x \mid u_{k-1})\; V^*(x, t_k) + \sum_{i=1}^{n} p_k(x, x + h_i e_i \mid u_{k-1})\; V^*(x + h_i e_i, t_k) + \sum_{i=1}^{n} p_k(x, x - h_i e_i \mid u_{k-1})\; V^*(x - h_i e_i, t_k) + \Delta t_{k-1}\, C(x, u_{k-1}, t_k) \Big], \qquad (81)$$
where the transition probabilities, now denoted by $p(\xi_{k-1}, \xi_k \mid u_{k-1}) \equiv p_k(x, y \mid u_{k-1})$ and activated by the control vector $u_{k-1}$ from $t_{k-1}$ to $t_k$, are given by
$$p_k(x, x \mid u_{k-1}) = 1 - \Delta t_{k-1} \sum_{j=1}^{n} \bigg( \frac{(g g^T)_{j,j,k}}{h_j^2} + \frac{|f_{j,k}|}{h_j} \bigg), \qquad (82)$$
$$p_k(x, x + h_i e_i \mid u_{k-1}) = \Delta t_{k-1} \bigg( \frac{(g g^T)_{i,i,k}}{2\,h_i^2} + \frac{[f_{i,k}]_+}{h_i} \bigg), \qquad (83)$$
$$p_k(x, x - h_i e_i \mid u_{k-1}) = \Delta t_{k-1} \bigg( \frac{(g g^T)_{i,i,k}}{2\,h_i^2} + \frac{[f_{i,k}]_-}{h_i} \bigg), \qquad (84)$$
for sufficiently small $\Delta t_{k-1}$ so that $p_k(x, x \mid u_{k-1})$ in (82) is non-negative. Here, $[f]_\pm \equiv \max[\pm f, 0] \ge 0$ (i.e., $[f]_+ + [f]_- = |f|$ and $[f]_+ - [f]_- = f$). The drift component is $f_{i,k} = f_i(x, u_{k-1}, t_k)$ and the diffusion coefficient is $g_{i,i,k} = g_{i,i}(x, t_k)$, assuming that only the diagonal part is needed. While the explicit time-dependence of the dynamical system coefficients is evaluated at the later time $t_k$ (this is not a critical requirement), as would be from the Backward Euler approximation, the control is evaluated at the earlier time $t_{k-1}$. The procedure generates feedback control at the discrete level. This is because the control policy, when known, would be set at the beginning of the period in forward time, although this control policy is determined by the dynamic program in backward time sequence. Kushner and Dupuis [76] call this type of MCA an explicit method, but since, as a backward procedure, it is explicit in the costs $V^*_k$ while implicit in the control $u_{k-1}$, it is really quasi-explicit in the case of the simple approximations used here. However, a number of implicit methods are also presented by Kushner and Dupuis [76].
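The construction (82)-(84), together with the induced time step restriction discussed next, can be sketched directly (helper names hypothetical; two state dimensions with illustrative coefficients):

```python
def mca_probs(f, gg, h, dt):
    """Transition probabilities (82)-(84) at one grid point: f[i] are the
    drift components, gg[i] the diagonal diffusion (g g^T)_{ii}, h[i] the
    mesh widths, and dt the interpolation time increment."""
    n = len(f)
    up = [dt * (gg[i] / (2 * h[i]**2) + max(f[i], 0.0) / h[i])   # (83)
          for i in range(n)]
    dn = [dt * (gg[i] / (2 * h[i]**2) + max(-f[i], 0.0) / h[i])  # (84)
          for i in range(n)]
    stay = 1.0 - dt * sum(gg[i] / h[i]**2 + abs(f[i]) / h[i]     # (82)
                          for i in range(n))
    return stay, up, dn

def cfl_dt(f, gg, h):
    """Largest dt keeping the self-transition (82) non-negative."""
    return 1.0 / sum(gg[i] / h[i]**2 + abs(f[i]) / h[i] for i in range(len(f)))

# Illustrative coefficients only (not from the text):
f, gg, h = [1.0, -2.0], [0.5, 0.25], [0.1, 0.1]
dt = cfl_dt(f, gg, h)
stay, up, dn = mca_probs(f, gg, h, dt)
```

With `dt` at its bound, the self-transition probability vanishes and the remaining probabilities sum to one, which is exactly the renormalization used in the stationary case below.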


In (82), (83) and (84), the really important benefit of upwinding is seen, in that upwinding ensures that the drift coefficient $f_{i,k}$ only enters the transition probabilities through its absolute value or positive and negative parts, guaranteeing the non-negativity property of the non-self transition probabilities in (83) and (84), and of (82) as well, provided the interpolation increment is sufficiently small. Note that the requirement that the self transition term (82) must be non-negative as a probability puts restrictions on the time step $\Delta t_{k-1}$, i.e.,
$$\Delta t_{k-1} \le \Bigg[ \sum_{i=1}^{n} \bigg( \frac{(g g^T)_{i,i,k}}{h_i^2} + \frac{|f_{i,k}|}{h_i} \bigg) \Bigg]^{-1}, \qquad (85)$$
which is something like the generalized Courant-Friedrichs-Lewy (CFL) condition (42) for SDP. This is more readily seen in the form
$$\alpha_{\mathrm{MCA}} \equiv \Delta t_{k-1} \sum_{i=1}^{n} \bigg( \frac{(g g^T)_{i,i,k}}{h_i^2} + \frac{|f_{i,k}|}{h_i} \bigg) \le 1, \qquad (86)$$
which is more like an "$\ell^1$" norm version of the "$\ell^\infty$" norm version of the generalized CFL condition in (42). Thus the conditions for both methods require the same order of magnitude of the time step. The difference in quasi-norms is not significant, given the approximations in both methods.

One can check that the transition probabilities do indeed satisfy the MCA local consistency conditions (78). For instance, the first infinitesimal moment of the MCA is calculated as follows,
$$\mathrm{E}\big[\Delta\xi_k \,\big|\, \xi_k = x\big] = p_k(x, x \mid u_{k-1}) \cdot [\,0\,] + \sum_{i=1}^{n} p_k(x, x + h_i e_i \mid u_{k-1}) \cdot [\,+h_i e_i\,] + \sum_{i=1}^{n} p_k(x, x - h_i e_i \mid u_{k-1}) \cdot [\,-h_i e_i\,]$$
$$= \Delta t_{k-1} \sum_{i=1}^{n} \big( [f_{i,k}]_+ - [f_{i,k}]_- \big)\, e_i = \Delta t_{k-1} \sum_{i=1}^{n} f_{i,k}\, e_i = \Delta t_{k-1}\, f(x, u_{k-1}, t_k),$$
demonstrating that the condition is satisfied for the first moment. The consistency condition can be similarly demonstrated for the second moment.
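This first-moment check is easy to reproduce numerically for a single state dimension (illustrative values only):

```python
# Numerical check of the first consistency condition in (78) for one
# state dimension: E[d(xi)] should equal dt * f (hypothetical values).
f, gg, h, dt = -1.5, 0.4, 0.1, 0.001
p_up = dt * (gg / (2 * h * h) + max(f, 0.0) / h)     # (83)
p_dn = dt * (gg / (2 * h * h) + max(-f, 0.0) / h)    # (84)
p_stay = 1.0 - dt * (gg / (h * h) + abs(f) / h)      # (82)
mean_step = p_up * h + p_dn * (-h) + p_stay * 0.0
# mean_step equals dt * f up to rounding, as derived in the text.
```

The diffusion contributions cancel in the mean, leaving only the drift, exactly as in the derivation above; the analogous covariance check recovers the diffusion term.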

In the case of a stationary dynamic programming equation, i.e., the elliptic case, such as for an exit time problem, optimal stopping problem or infinite horizon, the coefficients from the finite difference formula are renormalized to satisfy the properties of transition probabilities. Renormalization is not essential for the non-stationary case described above. In the stationary case, renormalization is computationally useful in that it can help speed up the computation if wisely done. Typically, the stationary self-transition term $\bar{p}(x, x \mid u)$ is set to zero and the central cost $V^*(x)$ is solved for as a function of the nontrivial transition costs, so the interpolation or iteration time $\bar{\Delta t}$ is selected to be the resultant coefficient of the instantaneous cost $C(x, u)$. For example, suppose that the final horizon time $t_f = \tau_e$, the first exit time for the chain to reach the boundary; then the non-stationary case equations (81), (83) and (84) are reformulated (presumably before taking the infimum) for the stationary case as the iteration
$$\bar{V}^*_{k'+1}(x) = \inf_{\bar{u} \in U}\Big[\, \sum_{i=1}^{n} \bar{p}(x, x + h_i e_i \mid \bar{u}_{k'+1})\; \bar{V}^*_{k'}(x + h_i e_i) + \sum_{i=1}^{n} \bar{p}(x, x - h_i e_i \mid \bar{u}_{k'+1})\; \bar{V}^*_{k'}(x - h_i e_i) + \bar{\Delta t}\; C(x, \bar{u}_{k'+1}) \Big]. \qquad (87)$$


A new iteration index $k'$ replaces the former $k$, which loses its meaning as a time index in the stationary problem. The transition probabilities are now denoted by
$$p(\xi_{k'+1}, \xi_{k'} \mid \bar{u}_{k'+1}) \equiv \bar{p}(\xi_{k'+1}, \xi_{k'} \mid \bar{u}_{k'+1}),$$
conditioned by the control vector $\bar{u}_{k'+1}$ from iteration $k'$ to $k' + 1$. The transition probabilities are now given by
$$\bar{p}(x, x \mid \bar{u}_{k'+1}) \equiv 0,$$
$$\bar{p}(x, x + h_i e_i \mid \bar{u}_{k'+1}) = \bar{\Delta t}\,\bigg( \frac{(g g^T)_{i,i}}{2\,h_i^2} + \frac{[f_i]_+}{h_i} \bigg), \qquad (88)$$
$$\bar{p}(x, x - h_i e_i \mid \bar{u}_{k'+1}) = \bar{\Delta t}\,\bigg( \frac{(g g^T)_{i,i}}{2\,h_i^2} + \frac{[f_i]_-}{h_i} \bigg). \qquad (89)$$
The starting interpolation time increment $\bar{\Delta t}$ is given by the reciprocal of $\big[1 - p(x, x \mid u)\big] / \Delta t_{k-1}$ from the non-stationary case, i.e.,
$$\bar{\Delta t} = \Bigg[ \sum_{j=1}^{n} \bigg( \frac{(g g^T)_{j,j}}{h_j^2} + \frac{|f_j|}{h_j} \bigg) \Bigg]^{-1}.$$
For this stationary case it is assumed that the SDE coefficients are autonomous, i.e., $f_i = f_i(x, u)$ and $g_{i,i} = g_{i,i}(x)$, for consistency. The former non-stationary interpolation time increment $\Delta t_{k-1}$ cancels out of the discrete time dynamic programming equation (77).

4. MCA Dynamic Programming Equation Solution

The dynamic programming equation (81) can be solved for $V^*_{k-1}$ and simultaneously for the optimal control vector approximation for MCA as the infimum or minimum argument:
$$u^*(x, t_{k-1}) = \operatorname{arg\,inf}_{u \in U}\Big[\, p_k(x, x \mid u)\; V^*(x, t_k) + \sum_{i=1}^{n} p_k(x, x + h_i e_i \mid u)\; V^*(x + h_i e_i, t_k) + \sum_{i=1}^{n} p_k(x, x - h_i e_i \mid u)\; V^*(x - h_i e_i, t_k) + \Delta t_{k-1}\, C(x, u, t_k) \Big]. \qquad (90)$$

Kushner and Dupuis [76] discuss many computational methods for solving problems like (81), such as Point-Jacobi, Gauss-Seidel, Successive-Over-Relaxation accelerations, Multigrid or Multilevel, and Domain Decomposition. They also discuss both the benefits and disadvantages of these methods. Please refer to their book for more information. The paper of Kushner and Jarvis [77] gives the recent state of MCA computations using parallel and vector supercomputers. Quadrat and coworkers [1] have used MCA in their expert system for solving stationary optimal control problems.

Jump diffusions, or Poisson processes included with stochastic diffusions, require only a small amount of extra effort to calculate the modifications of the transition probabilities for MCA. The changes due to the diffusion and the jumps are separately accounted for, since one is continuous and the other discontinuous.

Reflecting boundaries are handled for reflecting diffusions and reflecting jump diffusions by adding a border of extra discrete elements around the domain. The width of the border elements is $O(h)$. Any chain that winds up in a border element is reflected back into the interior elements in a way consistent with the application and the stochastic noise, and locally consistent with the local direction of reflection. Non-normal reflections may be difficult to handle in multi-dimensions. The reflections are modeled by auxiliary stochastic processes added to the SDE (73), and these auxiliary stochastic processes are only activated when a chain hits a border element. This seems to be a proper way to handle the complexities of boundary conditions for stochastic processes, whether reflecting or otherwise. The recent paper of Kushner and Yang [78] gives a clear example of the MCA method with reflecting boundary conditions for telecommunication networks in a heavy traffic environment.


5. Similarities and Differences between MCA and SDP

The obvious difference between MCA and SDP, as presented here, is that MCA is primarily applied to stationary problems, in spite of the dynamic model presentation here. The MCA method is extremely adaptable. Kushner and Dupuis [76] give variations for exit time problems, ergodic problems, jump diffusions, discounted costs, average costs, optimal filtering and many other applications. A major advantage of the method is that convergence proofs are facilitated by the probabilistic framework, while they are generally difficult or not available for the traditional numerical PDE methods presented for SDP here. However, the MCA method does introduce an indirect feature in the use of the Markov chain to solve a continuous time, continuous state optimal control problem. A principal similarity is that both MCA and SDP basically use Bellman's dynamic programming equations, although under quite different probabilistic and numerical interpretations. They are both variants of the stochastic dynamic programming algorithm. Although the control is usually the most important output for the control user, sometimes the expected optimal trajectories, i.e., using the optimal control, are desired to study the behavior of the dynamic system in the optimal state. There does not appear to be a formulation of MCA that covers the use of the forward equations for this purpose in Kushner and Dupuis [76], although simulations using random number generators are suggested.

V. RESEARCH DIRECTIONS

Future directions continue to be improvements in software and algorithms. There are always bigger or grander challenging problems to solve, and any advantage is welcomed. Increasing the level of parallelism is desired for dealing with large scale computational and memory requirements. Algorithms are needed that parallelize dynamic programming problems over time, such as time-clustering, and that reduce the state space computation by targeting the neighborhood of the optimal trajectory. Also desired is the use of accurate upwinding schemes for more stability, and the use of methods like multigrid, which make convergence less sensitive to mesh size by using small and large mesh grids to filter out errors.

A future research direction is the direct and fair comparison of methods like SDP, DDP and MCA solving the same problem. It should be possible to determine what features are more effective, and to determine which methods have robust convergence properties, but also which have faster rates of convergence under typical circumstances. Region-restricting techniques, such as DDP effectively uses with its focus on the neighborhood of the optimal trajectory, will be important. Also, methods, like multigrid, which do not demand very fine grid structures will be important for reducing the computational and memory requirements.

ACKNOWLEDGEMENT

The author thanks his collaborators Dr. S.-L. Chung, Dr. H. H. Xu, Prof. K. Naimipour, Dr. D. J. Jarvis, J. J. Westman, C. J. Pratico and M. S. Vetter for their contributions. He also thanks his hosts, Prof. H. J. Kushner of Brown University and C. A. Shoemaker of Cornell University, for their hospitality and insights about computational control when visiting for a sabbatical year. In addition, he thanks Prof. P. G. Dupuis for his helpful comments.

VI. REFERENCES

1. M. Akian, J. P. Chancelier and J. P. Quadrat, "Dynamic Programming Complexity and Applications," Proceedings of 27th IEEE Conference on Decision and Control 2, pp. 1551-1558 (1988).

2. B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods, Prentice Hall, Englewood Cliffs, NJ, 1990.

3. L. Arnold, Stochastic Differential Equations: Theory and Applications, Wiley, New York, NY, 1974.

4. U. Ascher, J. Christiansen and R. D. Russell, "A Collocation Solver for Mixed Order Systems of Boundary Value Problems," Mathematics of Computation 33, pp. 659-679 (1979).

5. U. Ascher, J. Christiansen and R. D. Russell, "Collocation Software for Boundary-Value ODEs," ACM Transactions on Mathematical Software 7, pp. 209-222 (1981).

6. M. Athans, D. Castanon, K. P. Dunn, C. S. Greene, W. H. Lee, N. R. Sandell, Jr., and A. S. Willsky, "The Stochastic Control of the F-8C Aircraft Using a Multiple Model Adaptive Control (MMAC) Method - Part I: Equilibrium Flight," IEEE Transactions on Automatic Control AC-22, pp. 768-780 (1977).


7. G. Bell, "Ultracomputers: A Teraflop Before Its Time," Communications of the ACM 35 (8), pp. 26-47 (1992).

8. R. E. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.

9. R. E. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton, NJ, 1961.

10. A. Brandt, "Multilevel Computations: Review and Recent Developments," Multigrid Methods: Theory, Applications, and Supercomputing (S. F. McCormick, Editor), Marcel Dekker, New York, pp. 35-62, 1988.

11. W. L. Briggs, Multigrid Tutorial, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1987.

12. A. E. Bryson and Y. Ho, Applied Optimal Control, Ginn, Waltham, MA, 1969.

13. J. L. Calvet, J. Dantas de Melo and J. M. Garcia, "A Fast Parallel Dynamic Programming Algorithm for Optimal Control," Proceedings of 1990 American Control Conference, 3 pages (1990).

14. J. Casti, M. Richardson and R. Larson, "Dynamic Programming and Parallel Computers," Journal of Optimization Theory and Applications 12 (4), pp. 423-786 (1973).

15. H. Caffey, L.-Z. Liao and C. A. Shoemaker, "Parallel Processing of Large Scale Discrete-Time Unconstrained Differential Dynamic Programming," Parallel Computing 19, pp. 1003-1017 (1993).

16. L.-C. Chang, C. A. Shoemaker, and P. L.-F. Liu, "Optimal Time-Varying Pumping Rates for Groundwater Remediation: Application of a Constrained Optimal Control Algorithm," Water Resources Research 28 (12), pp. 3157-3173 (1992).

17. S.-L. Chung and F. B. Hanson, "Parallel Optimizations for Computational Stochastic Dynamic Programming," Proceedings of 1990 International Conference on Parallel Processing 3: Algorithms and Applications (P.-C. Yew, Editor), pp. 254-260 (1990).

18. S.-L. Chung and F. B. Hanson, "Optimization Techniques for Stochastic Dynamic Programming," Proceedings of 29th IEEE Conference on Decision and Control 4, pp. 2450-2455 (1990).

19. S.-L. Chung, F. B. Hanson and H. Xu, "Supercomputer Optimizations for Stochastic Optimal Control Applications," Proceedings of 4th NASA Workshop on Computational Control of Flexible Aerospace Systems, NASA Conf. Publ. 10065, Part 1 (L. W. Taylor, Editor), NASA Langley Research Center, pp. 57-70, March 1991.

20. S.-L. Chung, F. B. Hanson and H. H. Xu, "Parallel Stochastic Dynamic Programming: Finite Element Methods," Linear Algebra and Applications 172, pp. 197-218 (1992).

21. M. G. Crandall and P. L. Lions, "Two Approximations of Solutions of Hamilton-Jacobi Equations," Mathematics of Computation 43, pp. 1-19 (1984).

22. M. G. Crandall, H. Ishii and P. L. Lions, "User's Guide to Viscosity Solutions of Second Order Partial Differential Equations," Bulletin of the AMS 27 (1), pp. 1-67 (1992).

23. M. L. Crow, D. J. Tylavsky and A. Bose, "Concurrent Processing in Power System Analysis," Control and Dynamical Systems: Advances in Theory and Applications 42 (C. T. Leondes, Editor), Academic Press, New York, NY, pp. 1-55, 1991.

24. T. Culver and C. A. Shoemaker, "Dynamic Optimal Control for Groundwater Remediation with Flexible Management Periods," Water Resources Research 28 (3), pp. 629-641 (1992).

25. T. Culver and C. A. Shoemaker, "Optimal Control for Groundwater Remediation by Differential Dynamic Programming with Quasi-Newton Approximations," Water Resources Research 29 (4), pp. 823-831 (1993).

26. J. Dantas de Melo, J. L. Calvet and J. M. Garcia, "Vectorization and Multitasking of Dynamic Programming in Control: Experiments on a Cray-2," Parallel Computing 13, pp. 261-269 (1990).

27. P. Dyer and S. R. McReynolds, The Computation and Theory of Optimal Control, Academic Press, New York, NY, 1970.


28. N. H. Decker, "On the Parallel Efficiency of the Frederickson-McBryan Multigrid Algorithm," ICASE Report No. 90-17, NASA Langley Research Center, Hampton, VA, 8 pages, February 1990.

29. J. J. Dongarra, I. S. Duff, D. C. Sorensen and H. A. van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1991.

30. J. J. Dongarra, H. W. Meyer and E. Strohmaier, "TOP500 Supercomputer Sites," Computer Center, University of Mannheim, ftp://ftp.uni-mannheim.de/info/rumdoc/top500.ps.Z, 31 pages, November 1994.

31. J. Douglas Jr. and T. Dupont, "Galerkin Methods for Parabolic Equations," SIAM J. Numerical Analysis 7, pp. 575-626 (1970).

32. J. Douglas Jr., "Time Step Procedures for Nonlinear Parabolic PDEs," Mathematics of Finite Elements and Applications, MAFELAP (J. Whiteman, Editor), Academic Press, London, pp. 289-304, 1979.

33. S. E. Dreyfus, Dynamic Programming and the Calculus of Variations, Academic Press, New York, NY, 1965.

34. W. R. Dyksen, E. N. Houstis, R. E. Lynch and J. R. Rice, "The Performance of the Collocation and Galerkin Methods with Hermite Bicubics," SIAM J. Numerical Analysis 21, pp. 695-715 (1984).

35. S. Feiner and C. Beshers, "Visualizing n-Dimensional Virtual Worlds with n-Vision," Computer Graphics: Special Issue on 3D Interactive Graphics 24 (2), pp. 37-38 (1990).

36. A. Feldstein and K. W. Neves, "High Order Methods for State-Dependent Delay Differential Equations with Nonsmooth Solutions," SIAM J. Numerical Analysis 21, pp. 844-863 (1984).

37. J. E. Flaherty and R. E. O'Malley Jr., "Numerical Methods for Stiff Systems of Two-Point Boundary Value Problems," SIAM J. Scientific and Statistical Computing 5, pp. 865-886 (1984).

38. W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Springer-Verlag, New York, NY, 1975.

39. J. J. Florentin, "Optimal Control of Systems with Generalized Poisson Inputs," ASME Trans. 85D (J. Basic Engr. 2), pp. 217-221 (1963).

40. P. O. Frederickson and O. A. McBryan, "Parallel Superconvergent Multigrid," Multigrid Methods: Theory, Applications, and Supercomputing (S. F. McCormick, Editor), Marcel Dekker, New York, pp. 195-210, 1988.

41. Future Directions for Research in Symbolic Computations: Report of Workshop on Symbolic and Algebraic Computing (A. Boyle and B. F. Caviness, Editors; A. C. Hearn, Workshop Chairperson), Society for Industrial and Applied Mathematics, Philadelphia, PA, 1990.

42. Future Directions in Control Theory: A Mathematical Perspective (W. H. Fleming, Chairman), Society for Industrial and Applied Mathematics, Philadelphia, PA, 1988.

43. I. I. Gihman and A. V. Skorohod, Stochastic Differential Equations, Springer-Verlag, New York, NY, 1972.

44. I. I. Gihman and A. V. Skorohod, Controlled Stochastic Processes, Springer-Verlag, New York, NY, 1979.

45. Grand Challenges: High Performance Computing and Communications: The FY 1993 U.S. Research and Development Program, Federal Coordinating Council for Science, Engineering, and Technology, Committee on Physical, Mathematical, and Engineering Sciences, c/o National Science Foundation (CISE), Washington, DC, 1993.

46. High Performance Computing and Communications: Technology and Communications for the National Information Infrastructure: The 1995 Blue Book, National Coordination Office for HPCC (http://www.hpcc.gov/); Federal Coordinating Council for Science, Engineering, and Technology; Committee on Physical, Mathematical, and Engineering Sciences; c/o National Science Foundation (CISE), Arlington, VA, 1995.

47. X.-H. Guan and P. B. Luh, "A New Parallel Algorithm for Optimal Control Problems of Interconnected Systems," International Journal of Control 56 (6), pp. 1275-1297 (1992).

48. W. Hackbusch, “Parabolic Multigrid Methods,” Computing Methods in Applied Sciences and Engineering 6 (R. Glowinski and J.-L. Lions, Editors), North Holland, Amsterdam, pp. 189-197, 1984.

49. F. B. Hanson, “Analysis of a Singular Functional Differential Equation that Arises from Stochastic Modeling of Population Growth,” Journal of Mathematical Analysis and Applications 96, pp. 73-100 (1983).

50. F. B. Hanson, “Bioeconomic Model of the Lake Michigan Alewife Fishery,” Canadian Journal of Fisheries and Aquatic Sciences 44 (Suppl. 2), pp. 298-305 (1987).

51. F. B. Hanson, “Vector Multiprocessor Implementation for Stochastic Dynamic Programming,” IEEE Distributed Processing Technical Committee Newsletter 10, pp. 44-48 (1988).

52. F. B. Hanson, “Computational Dynamic Programming for Stochastic Optimal Control on a Vector Multiprocessor,” Argonne National Laboratory Technical Memorandum ANL/MCS-TM-113, 26 pages, June 1988.

53. F. B. Hanson, “Parallel Computation for Stochastic Dynamic Programming: Row versus Column Code Orientation,” Proc. 1988 IEEE Conf. on Parallel Processing 3: Algorithms and Applications, pp. 117-119 (1988).

54. F. B. Hanson, “Parallel and Vector Computation for Optimal Control Applications,” Proceedings of 3rd Annual Conference on Aerospace Computational Control, JPL Publication 85-45 1 (D. E. Bernard and G. K. Man, Editors), Jet Propulsion Laboratory, Pasadena, pp. 184-197, December 15, 1989.

55. F. B. Hanson, “Stochastic Dynamic Programming: Advanced Computing Constructs,” Proceedings of 28th IEEE Conference on Decision and Control 1, pp. 901-903 (1989).

56. F. B. Hanson, “A Real Introduction to Supercomputing: A User Training Course,” Proceedings of Supercomputing ’90, pp. 376-385 (1990).

57. F. B. Hanson, “Computational Dynamic Programming on a Vector Multiprocessor,” IEEE Transactions on Automatic Control 36 (4), pp. 507-511 (1991).

58. F. B. Hanson, D. J. Jarvis and H. H. Xu, “Applications of FORALL-Formed Computations in Large Scale Stochastic Dynamic Programming,” Proceedings of Scalable High Performance Computing Conference: SHPCC-92, IEEE Computer Society, pp. 182-185 (1992).

59. F. B. Hanson and K. Naimipour, “Convergence of Numerical Method for Multistate Stochastic Dynamic Programming,” Proceedings of International Federation of Automatic Control 12th World Congress 9, pp. 501-504 (1993).

60. F. B. Hanson, C. J. Pratico, M. S. Vetter, and H. H. Xu, “Multidimensional Visualization Applied to Renewable Resource Management,” Proceedings of Sixth SIAM Conference on Parallel Processing for Scientific Computing 2, pp. 1033-1036 (1993).

61. F. B. Hanson and D. Ryan, “Optimal Harvesting in a Stochastic Environment with Density Dependent Jumps,” Mathematical Models of Renewable Resources 3 (R. Lamberson, Editor), Association of Resource Modelers, Arcata, Cal., pp. 117-123, 1984.

62. F. B. Hanson and D. Ryan, “Optimal Harvesting with Density Dependent Random Effects,” Natural Resource Modeling 2, pp. 439-455 (1987).

63. F. B. Hanson and H. C. Tuckwell, “Logistic Growth with Random Density Independent Disasters,” Theoretical Population Biology 19, pp. 1-18 (1981).

64. F. B. Hanson and J. J. Westman, “Computational Stochastic Dynamic Programming Problems: Groundwater Quality Remediation,” Proceedings of 33rd IEEE Conference on Decision and Control 1, 6 pages (1994).

65. G. Horton and S. Vandewalle, “A Space–Time Multigrid Method for Parabolic PDEs,” SIAM J. Scientific Computing, 17 pages, 1995.

66. D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming, American Elsevier, New York, NY, 1970.

67. A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, NY, 1970.

68. Michael B. Johnson, The Application Communication Library, MIT Media Laboratory, Cambridge, MA, 1992.

69. S. L. Johnsson and K. K. Mathur, “The Finite Element Method on a Data Parallel Computing,” International Journal of High Speed Computing 1, pp. 29-44 (1989).

70. L. D. Jones, R. Willis and W. W.-G. Yeh, “Optimal Control of Nonlinear Groundwater Hydraulics Using Differential Dynamic Programming,” Water Resources Research 23, pp. 2097-2106 (1987).

71. J. F. Kerrigan, Migrating to Fortran 90, O’Reilly & Associates, Sebastopol, CA, 1993.

72. C. H. Koelbel, D. B. Loveman, R. S. Schreiber, G. L. Steele Jr. and M. E. Zosel, The High Performance Fortran Handbook, MIT Press, Cambridge, MA, 1994.

73. H. J. Kushner, “A Survey of Some Applications of Probability and Stochastic Control Theory to Finite Difference Methods for Degenerate Elliptic and Parabolic Equations,” SIAM Review 18, pp. 545-577 (1976).

74. H. J. Kushner, Probability Methods for Approximations in Stochastic Control and for Elliptic Equations, Academic Press, New York, NY, 1977.

75. H. J. Kushner, “Numerical Methods for Stochastic Control in Continuous Time,” SIAM J. Control and Optimization 28, pp. 999-1048 (1990).

76. H. J. Kushner and P. G. Dupuis, Numerical Methods for Stochastic Control in Continuous Time, Springer-Verlag, New York, NY, 1992.

77. H. J. Kushner and D. J. Jarvis, “Large-Scale Computations for High Dimension Control Systems,” Proceedings of 33rd IEEE Conference on Decision and Control 1, 6 pages (1994).

78. H. J. Kushner and J.-C. Yang, “A Numerical Method for Controlled Routing in Large Trunk Line Networks via Stochastic Control Theory,” ORSA Journal on Computing 6 (3), pp. 300-316 (1994).

79. L.-Z. Liao and C. A. Shoemaker, “Convergence in Unconstrained Discrete-Time Differential Dynamic Programming,” IEEE Transactions on Automatic Control 36 (6), pp. 692-706 (1991).

80. R. E. Larson, “Dynamic Programming with Reduced Computational Requirements,” IEEE Transactions on Automatic Control AC-14, pp. 135-143 (1965).

81. R. E. Larson, “A Survey of Dynamic Programming Computational Procedures,” IEEE Transactions on Automatic Control AC-16, pp. 767-774 (1967).

82. R. E. Larson, State Increment Dynamic Programming, American Elsevier, New York, NY, 1968.

83. R. E. Larson and E. Tse, “Parallel Processing Algorithms for the Optimal Control of Nonlinear Systems,” IEEE Transactions on Automatic Control AC-22, pp. 777-786 (1973).

84. John M. Levesque and J. W. Williamson, A Guidebook to FORTRAN on Supercomputers, Academic Press, NY, 1988.

85. D. Ludwig, “Optimal Harvesting of a Randomly Fluctuating Resource. I: Application of Perturbation Methods,” SIAM J. Applied Mathematics 37, pp. 185-205 (1979).

86. D. Ludwig and J. M. Varah, “Optimal Harvesting of a Randomly Fluctuating Resource. II: Numerical Methods and Results,” SIAM J. Applied Mathematics 37, pp. 185-205 (1979).

87. R. Luus, “Optimal Control by Dynamic Programming Using Systematic Reduction in Grid Size,” International Journal of Control 51 (5), pp. 995-1013 (1990).

88. R. Luus, “Application of Dynamic Programming to Higher-Dimensional Non-Linear Optimal Control Problems,” International Journal of Control 52 (1), pp. 239-250 (1990).

89. S. F. McCormick, Multigrid Methods, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1987.

90. K. K. Mathur and S. L. Johnsson, “Data Structures and Algorithms for the Finite Element Method on a Data Parallel Supercomputer,” International Journal for Numerical Methods in Engineering (1989).

91. D. Q. Mayne, “A Second-Order Gradient Method for Determining Optimal Control of Non-Linear Discrete Time Systems,” International Journal of Control 3, pp. 85-95 (1966).

92. D. Q. Mayne, “Differential Dynamic Programming — A Unified Approach to the Optimization of Dynamical Systems,” Control and Dynamic Systems: Advances in Theory and Applications 10 (C. T. Leondes, Editor), Academic Press, New York, NY, pp. 179-254, 1973.

93. A. R. Mitchell and R. Wait, The Finite Element Method in Partial Differential Equations, Wiley, New York, NY, 1977.

94. I. H. Mufti, Computational Methods in Optimal Control Problems, Lecture Notes in Operations Research and Mathematical Systems 27, Springer-Verlag, Berlin, Germany, 1970.

95. D. M. Murray and S. J. Yakowitz, “Differential Dynamic Programming and Newton’s Method for Discrete Optimal Control Problems,” Journal of Optimization Theory and Applications 43, pp. 395-414 (1984).

96. K. Naimipour and F. B. Hanson, “Convergence of a Numerical Method for the Bellman Equation of Stochastic Optimal Control with Quadratic Costs and Constrained Control,” Dynamics and Control 3 (3), pp. 237-259 (1993).

97. A National Computing Initiative: The Agenda for Leadership (H. J. Revechee, Chair), Society for Industrial and Applied Mathematics, Philadelphia, PA, 1987.

98. Mark H. Overmars, “A Graphical User Interface Toolkit for Silicon Graphics Workstations,” Dept. Computer Science, Utrecht University, P.O. Box 80.089 TB, Utrecht, The Netherlands.

99. E. Polak, Computational Methods in Optimization, Academic Press, New York, NY, 1971.

100. E. Polak, “An Historical Survey of Computational Methods in Optimal Control,” SIAM Review 15, pp. 553-584 (1973).

101. C. D. Polychronopoulos, Parallel Programming and Compilers, Kluwer Academic, Boston, MA, 1988.

102. C. J. Pratico, F. B. Hanson, H. H. Xu, D. J. Jarvis and M. S. Vetter, “Visualization for the Management of Renewable Resources in an Uncertain Environment,” Proceedings of Supercomputing ’92, pp. 258-266, color plates p. 843 (1992).

103. R. G. Rabbani and J. W. Warner, “Shortcomings of Existing Finite Element Formulations for Subsurface Water Pollution Modeling and Its Rectification: One-Dimensional Case,” SIAM J. Applied Mathematics 54 (3), pp. 660-673 (1994).

104. D. Ryan and F. B. Hanson, “Optimal Harvesting with Exponential Growth in an Environment with Random Disasters and Bonanzas,” Mathematical Biosciences 74, pp. 37-57 (1985).

105. D. Ryan and F. B. Hanson, “Optimal Harvesting of a Logistic Population in an Environment with Stochastic Jumps,” Journal of Mathematical Biology 24, pp. 259-277 (1986).

106. Z. Schuss, Theory and Applications of Stochastic Differential Equations, Wiley, New York, NY, 1980.

107. D. L. Snyder and M. I. Miller, Random Point Processes in Time and Space, Springer-Verlag, New York, NY, 1991.

108. P. E. Souganidis, “Approximation Schemes for Viscosity Solutions of Hamilton-Jacobi Equations,” Journal of Differential Equations 59, pp. 1-43 (1985).

109. R. F. Stengel, Stochastic Optimal Control, Wiley, New York, NY, 1986.

110. G. Strang and G. J. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Englewood Cliffs, NJ, 1973.

111. System Software and Tools for High Performance Computing Environments (P. Messina and T. Sterling, Editors), Society for Industrial and Applied Mathematics, Philadelphia, PA, 1988.

112. Thinking Machines Corporation, Connection Machine Model CM-5 Technical Summary, Cambridge, MA, November 1992.

113. S. Weerawarana and P. S. Wang, “A Portable Code Generator for Parallel Cray Fortran,” Kent State University Preprint, 23 pages, June 1989.

114. W. M. Wonham, “Random Differential Equations in Control Theory,” Probabilistic Methods in Applied Mathematics 2, Academic Press, New York, pp. 131-212, 1970.

115. P. C. Xirouchakis and P. Y. Wang, “Finite Element Computing on Parallel Architectures,” Control and Dynamic Systems: Advances in Theory and Applications 58 (C. T. Leondes, Editor), Academic Press, New York, NY, pp. 233-260, 1993.

116. H. H. Xu, F. Hanson and S. L. Chung, “Optimal Data Parallel Methods for the Stochastic Dynamical Programming,” Proceedings of 1991 International Conference on Parallel Processing 3: Algorithms & Applications (K. So, Editor), pp. 142-146, 1991.

117. H. H. Xu, F. Hanson and S. L. Chung, “Parallel Solutions of Dimensionality Problems for the Stochastic Dynamical Programming,” Proceedings of 30th IEEE Conference on Decision and Control 2, pp. 1717-1723 (1991).

118. H. H. Xu, D. J. Jarvis and F. B. Hanson, “Parallel Data Vault Methods for Larger Scale Stochastic Dynamic Programming,” Proceedings of 1992 American Control Conference 1, pp. 142-146 (1992).

119. S. Yakowitz, “Dynamic Programming Applications in Water Resources,” Water Resources Research 18, pp. 673-696 (1982).

120. S. Yakowitz, “Algorithms and Computational Techniques in Differential Dynamic Programming,” Control and Dynamic Systems: Advances in Theory and Applications 31 (C. T. Leondes, Editor), Academic Press, New York, NY, pp. 75-91, 1989.

121. S. Yakowitz and B. Rutherford, “Computational Aspects of Discrete-Time Control,” Applied Mathematics and Computation 15, pp. 29-45 (1984).