
Reply to Comments on “Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems”

Chyi Hwang*

Department of Chemical Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan

Jeng-Shaw Lin†

Department of Chemical Engineering, National Cheng Kung University, Tainan 700, Taiwan

† This author passed away in July 1998, 1 month after he received his Ph.D. degree from National Cheng Kung University.

Sir: The iterative dynamic programming (IDP) method initiated by Luus and co-workers is indeed a powerful technique for finding optimal controls for continuous-time dynamic systems described by nonlinear differential equations. In the literature, it has been applied to a variety of optimal control problems with a good chance of obtaining the globally optimal solution. However, as is well known, IDP is a heuristic optimization method whose performance depends on several parameters, such as $N_x$, $N_t$, $N_c$, $N_p$, $N_i$, $R$, and $\eta$. Moreover, whether IDP attains the globally optimal solution also depends on the way candidate controls are generated and on the problem type. Choosing proper parameters for an IDP scheme so that it reaches the global optimum of a specific optimal control problem is always difficult. Hence, any suggestions that improve the computational efficiency and the solution convergence of IDP are helpful to interested readers.

One of the advantages of IDP lies in the fact that the dynamic programming scheme can be implemented in a parallel computing environment, so that the optimization can be completed within a prespecified time interval. This is crucial in real-time applications such as nonlinear model predictive control, where the optimization must finish within a specified time interval. The IDP scheme with a single state grid point at each time stage generally cannot produce a satisfactory result without multipass computations. However, multipass IDP computations are sequential operations that cannot take full advantage of a parallel computer, on which a computing job can be finished in a much shorter time if the underlying computation scheme can be implemented in a parallel manner. Hence, single-pass IDP computation with multiple state grid points has great potential in real-time applications, even if the solution of the problem at hand could be obtained off-line via the multipass single-grid IDP scheme. From this viewpoint, we think the criticism of the examples illustrated in our paper is not necessary.
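As an illustration of the parallelism noted above, the following Python sketch (not from the original paper; the dynamics, grid, and candidate controls are hypothetical placeholders) evaluates the candidate controls of all state grid points at one time stage concurrently. In IDP these evaluations are independent of one another, which is what makes the single-pass, multigrid scheme attractive on parallel hardware.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def stage_cost(x0: np.ndarray, u: float, dt: float = 0.1, steps: int = 10) -> float:
    """Placeholder single-stage shooting: integrate toy dynamics dx/dt = -x + u
    over one time stage with a fixed candidate control and return a cost."""
    x = x0.copy()
    h = dt / steps
    for _ in range(steps):
        x = x + h * (-x + u)
    return float(np.sum(x**2) + 0.1 * u**2)

def evaluate_grid_point(args):
    """Evaluate all candidate controls for one state grid point; keep the best."""
    x_grid, controls = args
    costs = [stage_cost(x_grid, u) for u in controls]
    best = int(np.argmin(costs))
    return best, costs[best]

if __name__ == "__main__":
    grid_points = [np.array([xi, 0.0]) for xi in np.linspace(-1.0, 1.0, 9)]  # N_x = 9
    controls = np.linspace(-2.0, 2.0, 15)                                    # N_c = 15

    # The N_x grid-point evaluations are independent, so they map cleanly
    # onto a pool of worker processes.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(evaluate_grid_point, [(x, controls) for x in grid_points]))

    for x, (i_best, cost) in zip(grid_points, results):
        print(f"grid point {x[0]:+.2f}: best control {controls[i_best]:+.2f}, cost {cost:.4f}")
```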

There is almost always a tradeoff between computational time and global convergence in solving an optimization problem that has more than one local optimum. As mentioned in our paper, the convergence of the IDP solution depends on the contraction factor $R$. It is generally true that the larger the contraction factor used, the more computation time is required and the greater the possibility of obtaining a globally converged solution. It is therefore not surprising that Luus demonstrated a converged solution by IDP with allowable controls chosen randomly and with $R = 0.95$ and 100 iterations, which requires more than triple the computer time of Sobol's systematic approach of assigning allowable controls with $R = 0.85$ and 30 iterations. It is noted that the use of $R = 0.85$ with 30 iterations reduces the control resolution by the factor $0.85^{30} = 0.00763$, which is comparable to the factor $0.95^{100} = 0.00592$ obtained with $R = 0.95$ and 100 iterations. Hence, the contraction factor $R = 0.85$ and the 30 iterations taken in example 2 are quite reasonable as a compromise between the computation time required and the global optimal solution expected. This comparison demonstrates the advantage of using Sobol's systematic approach to generate control candidates: it allows a smaller region contraction factor, which in turn reduces the number of iterations and the computing time needed to obtain a globally optimal solution. We emphasize that we did not conclude that single-pass IDP with randomly generated controls fails to obtain a converged solution for any region contraction factor.
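As a quick numerical check of the figures quoted above, the short Python snippet below (illustrative only) evaluates the overall contraction of the allowable-control region after $N$ iterations with region contraction factor $R$.

```python
# Overall shrinkage of the allowable-control region after N iterations,
# each of which contracts the region by the factor R.
def overall_contraction(R: float, iterations: int) -> float:
    return R ** iterations

print(f"R = 0.85, 30 iterations : {overall_contraction(0.85, 30):.5f}")   # ~0.00763
print(f"R = 0.95, 100 iterations: {overall_contraction(0.95, 100):.5f}")  # ~0.00592
```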

In our paper, we gave the formula $N_c N_t (1 + N_x(N_t - 1)/2)$ for estimating the number of state shootings required in one cycle of the usual backward dynamic programming (DP) computation. In the following, we explain how this formula was derived.

Suppose that the number of state grid points for each time stage is $N_x$ and the number of allowable controls for each state grid point is $N_c$. The time interval of interest $[0, t_f]$ is partitioned into $N_t$ equal-length time stages. The $k$th time stage spans the time interval $[(k-1)\Delta, k\Delta]$, where $\Delta = t_f/N_t$. Let the $N_x$ state grid points of the $k$th time stage be represented by $\hat{x}(k-1, i)$, $i = 1, 2, ..., N_x$, and the $N_c$ allowable controls for each state grid point by $u(k, i)$, $i = 1, 2, ..., N_c$. The backward DP starts the control evaluation from the state grid points of the last time stage, time stage $N_t$, which spans the time interval $[(N_t-1)\Delta, N_t\Delta]$. At time stage $N_t$, the number of state shootings required to evaluate the optimal controls associated with the $N_x$ state grid points $\hat{x}(N_t-1, k)$, $k = 1, 2, ..., N_x$, is $N_c N_x$. At time stage $N_t - 1$, the performance index $I_{N_t-2}(k, i)$ associated with the $k$th state grid point $\hat{x}(N_t-2, k)$ and the $i$th candidate control $u(N_t-2, i)$ is evaluated by first integrating the augmented dynamic equations with the initial condition $[\hat{x}(N_t-2, k), 0]$ and the control $u(N_t-2, i)$ from $t = (N_t-2)\Delta$ to $(N_t-1)\Delta$ to obtain an augmented state, say $[x((N_t-1)\Delta), x_{n+1}((N_t-1)\Delta)]$. In general, the state $x((N_t-1)\Delta)$ may not coincide with one of the state grid points $\hat{x}(N_t-1, j)$, $j = 1, 2, ..., N_x$, generated for time stage $N_t$. Suppose the nearest state grid point to the state $x((N_t-1)\Delta)$ is $\hat{x}(N_t-1, l)$, whose optimal control has been evaluated to be $u(N_t-1, j)$. With the control $u(N_t-1, j)$ and the initial condition $[x((N_t-1)\Delta), x_{n+1}((N_t-1)\Delta)]$, we then integrate the augmented state equations from $t = (N_t-1)\Delta$ to $N_t\Delta$ to obtain the performance index $I_{N_t-2}(k, i) = x_{n+1}(N_t\Delta)$.

Hence, at time stage $N_t - 1$, two state shootings are required for each control evaluation associated with a state grid point, and $2 N_c N_x$ state shootings are required for the control evaluations of the $N_x$ state grid points. Similarly, a control evaluation for a state grid point at time stage $N_t - 2$ requires three state shootings, so $3 N_x N_c$ state shootings are required for the $N_x$ state grid points with $N_c$ allowable controls each. By induction, the control evaluation for the $N_x$ state grid points at time stage $N_t - j$, $0 \le j \le N_t - 2$, requires $(j+1) N_x N_c$ state shootings. Because there is only one state grid point for the first time stage, its control evaluation requires $N_c N_t$ state shootings. On the basis of the above description, the total number of state shootings performed in one cycle of backward dynamic programming is

$$N_c N_t + N_c N_x + 2 N_c N_x + \cdots + (N_t - 1) N_c N_x = N_c N_t + \frac{N_c N_x N_t (N_t - 1)}{2}$$

This does not include the number of state shootings forgenerating state grid points.
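The counting argument above is easy to verify numerically. The Python sketch below (illustrative; the parameter values are hypothetical, not taken from the paper's examples) accumulates the shootings stage by stage exactly as in the derivation and checks the total against the closed-form expression $N_c N_t (1 + N_x(N_t-1)/2)$.

```python
def shootings_by_stage(N_c: int, N_x: int, N_t: int) -> int:
    # Stage N_t - j (0 <= j <= N_t - 2): N_x grid points, N_c candidate controls,
    # and each candidate evaluation needs j + 1 single-stage integrations.
    total = sum((j + 1) * N_x * N_c for j in range(N_t - 1))
    # First stage: a single grid point, and each of its N_c candidates must be
    # propagated through all N_t stages.
    total += N_c * N_t
    return total

def shootings_closed_form(N_c: int, N_x: int, N_t: int) -> int:
    # N_c*N_t*(1 + N_x*(N_t - 1)/2), written in integer arithmetic.
    return N_c * N_t + N_c * N_x * N_t * (N_t - 1) // 2

# Hypothetical parameter values, chosen only to exercise the formulas.
N_c, N_x, N_t = 15, 9, 20
assert shootings_by_stage(N_c, N_x, N_t) == shootings_closed_form(N_c, N_x, N_t)
print(shootings_closed_form(N_c, N_x, N_t))   # 25950 state shootings per DP cycle
```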

It is noted that there is an alternative approach to the control evaluation. At time stage $j$, $0 \le j < N_t$, the performance index $I_{j-1}(k, i)$ associated with the $k$th state grid point $\hat{x}(j-1, k)$ and the $i$th allowable control $u(j-1, i)$ is evaluated by integrating the augmented state equations with the initial condition $[\hat{x}(j-1, k), 0]$ and the control $u(j-1, i)$ from $t = (j-1)\Delta$ to $j\Delta$ to reach an augmented state $[x(j\Delta), x_{n+1}(j\Delta)]$. Let the nearest state grid point at time stage $j + 1$ to the state $x(j\Delta)$ be $\hat{x}(j, l)$, whose optimal control and associated performance index are $u(j, \nu)$ and $I^*(j, l)$. Then the performance index $I_{j-1}(k, i)$ is evaluated as

$$I_{j-1}(k, i) = x_{n+1}(j\Delta) + I^*(j, l)$$

This way of evaluating the performance index greatly reduces the number of state shootings. However, the computed performance index does not represent the true performance index because of the discrepancies between the states reached by shooting and the state grid points at the time instants $t = j\Delta, (j+1)\Delta, ..., (N_t-1)\Delta$; for example, there is a discrepancy between the state $x(j\Delta)$ reached by shooting and the state grid point $\hat{x}(j, l)$. In the authors' experience, this method of evaluating the performance index often leads to poor convergence in IDP optimization.
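For concreteness, the sketch below (again Python, with a placeholder one-stage integrator; none of these names or parameter values come from the original paper) implements the shortcut evaluation $I_{j-1}(k,i) = x_{n+1}(j\Delta) + I^*(j, l)$: one stage of shooting followed by a lookup of the stored cost-to-go of the nearest grid point at the next stage, which is exactly where the approximation error described above enters.

```python
import numpy as np

def integrate_stage(x: np.ndarray, u: float, dt: float = 0.1, steps: int = 10):
    """Placeholder one-stage shooting: returns (state at stage end, accumulated cost)."""
    cost = 0.0
    h = dt / steps
    for _ in range(steps):
        cost += h * (float(np.sum(x**2)) + 0.1 * u**2)   # running cost x'x + 0.1 u^2
        x = x + h * (-x + u)                              # placeholder dynamics dx/dt = -x + u
    return x, cost

def nearest_grid_index(x: np.ndarray, grid: np.ndarray) -> int:
    """Index l of the grid point closest (Euclidean distance) to the shot state x."""
    return int(np.argmin(np.linalg.norm(grid - x, axis=1)))

def shortcut_index(x_grid: np.ndarray, u: float,
                   next_grid: np.ndarray, next_cost_to_go: np.ndarray) -> float:
    """Shortcut evaluation: one stage of shooting, then the stored cost-to-go
    I*(j, l) of the nearest grid point at the next stage is simply added."""
    x_end, stage_cost = integrate_stage(x_grid, u)
    l = nearest_grid_index(x_end, next_grid)
    return stage_cost + float(next_cost_to_go[l])

# Hypothetical data: a 1-D state, 9 grid points at the next stage, and a made-up
# table of their stored cost-to-go values standing in for I*(j, l).
next_grid = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)
next_cost_to_go = 0.5 * next_grid[:, 0] ** 2

print(shortcut_index(np.array([0.8]), u=-0.5,
                     next_grid=next_grid, next_cost_to_go=next_cost_to_go))
```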

Acknowledgment

We thank Dr. Rein Luus for his interest in our paper.



