
Reply to Comments on “Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems”

Chyi Hwang*

Department of Chemical Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan

Jeng-Shaw Lin†

Department of Chemical Engineering, National Cheng Kung University, Tainan 700, Taiwan

† This author passed away in July 1998, 1 month after he received his Ph.D. degree from National Cheng Kung University.

Sir: The iterative dynamic programming (IDP) method initiated by Luus and co-workers is indeed a powerful technique for finding optimal controls for continuous-time dynamic systems described by nonlinear differential equations. In the literature, it has been applied to a variety of optimal control problems with a good chance of obtaining the globally optimal solution. However, as is well known, IDP is a heuristic optimization method whose performance depends on several parameters, such as $N_x$, $N_t$, $N_c$, $N_p$, $N_i$, $R$, and $\eta$. Moreover, whether IDP attains the globally optimal solution also depends on the way candidate controls are generated and on the problem type. Choosing proper parameters for an IDP scheme so that it reaches the global optimum of a specific optimal control problem is always difficult. Hence, any suggestions that improve the computational efficiency and the solution convergence of IDP are helpful to interested readers.

One of the advantages of IDP lies in the fact that the dynamic programming scheme can be implemented in a parallel computing environment, so that the optimization can be completed within a prespecified time interval. This is crucial in real-time applications such as nonlinear model predictive control, where the optimization must finish within a specified time interval. The IDP scheme with a single state grid point at each time stage generally cannot produce a satisfactory result without multipass computations. However, multipass IDP computations are sequential operations that cannot take full advantage of a parallel computer, on which a computing job can be finished in a much shorter time if the underlying computation scheme can be implemented in a parallel manner. Hence, single-pass IDP computation with multiple state grid points has great potential in real-time applications, even if the solution of the problem at hand could be obtained off-line via the multipass single-grid IDP scheme. From this viewpoint, we think the criticism of the examples illustrated in our paper is not necessary.
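As an illustration of the parallelism noted above, the following Python sketch (not from the original paper; the dynamics, grid, and candidate controls are hypothetical placeholders) evaluates the candidate controls of all state grid points at one time stage concurrently. In IDP these evaluations are independent of one another, which is what makes the single-pass, multigrid scheme attractive on parallel hardware.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def stage_cost(x0: np.ndarray, u: float, dt: float = 0.1, steps: int = 10) -> float:
    """Placeholder single-stage shooting: integrate toy dynamics dx/dt = -x + u
    over one time stage with a fixed candidate control and return a cost."""
    x = x0.copy()
    h = dt / steps
    for _ in range(steps):
        x = x + h * (-x + u)
    return float(np.sum(x**2) + 0.1 * u**2)

def evaluate_grid_point(args):
    """Evaluate all candidate controls for one state grid point; keep the best."""
    x_grid, controls = args
    costs = [stage_cost(x_grid, u) for u in controls]
    best = int(np.argmin(costs))
    return best, costs[best]

if __name__ == "__main__":
    grid_points = [np.array([xi, 0.0]) for xi in np.linspace(-1.0, 1.0, 9)]  # N_x = 9
    controls = np.linspace(-2.0, 2.0, 15)                                    # N_c = 15

    # The N_x grid-point evaluations are independent, so they map cleanly
    # onto a pool of worker processes.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(evaluate_grid_point, [(x, controls) for x in grid_points]))

    for x, (i_best, cost) in zip(grid_points, results):
        print(f"grid point {x[0]:+.2f}: best control {controls[i_best]:+.2f}, cost {cost:.4f}")
```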

There is almost always a tradeoff between computational time and global convergence in solving an optimization problem that has more than one local optimum. As mentioned in our paper, the convergence of the IDP solution depends on the contraction factor $R$. It is generally true that the larger the contraction factor used, the more computation time is required and the greater the possibility of obtaining a globally converged solution. It is therefore not surprising that Luus demonstrated a converged solution by IDP with allowable controls chosen randomly and with $R = 0.95$ and 100 iterations, which requires more than triple the computer time of Sobol's systematic approach of assigning allowable controls with $R = 0.85$ and 30 iterations. It is noted that the use of $R = 0.85$ with 30 iterations reduces the control resolution by the factor $0.85^{30} = 0.00763$, which is comparable to the factor $0.95^{100} = 0.00592$ obtained with $R = 0.95$ and 100 iterations. Hence, the contraction factor $R = 0.85$ and the 30 iterations taken in example 2 are quite reasonable as a compromise between the computation time required and the global optimal solution expected. This comparison demonstrates the advantage of using Sobol's systematic approach to generate control candidates: it allows a smaller region contraction factor, which in turn reduces the number of iterations and the computing time needed to obtain a globally optimal solution. We emphasize that we did not conclude that single-pass IDP with randomly generated controls fails to obtain a converged solution for any region contraction factor.
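As a quick numerical check of the figures quoted above, the short Python snippet below (illustrative only) evaluates the overall contraction of the allowable-control region after $N$ iterations with region contraction factor $R$.

```python
# Overall shrinkage of the allowable-control region after N iterations,
# each of which contracts the region by the factor R.
def overall_contraction(R: float, iterations: int) -> float:
    return R ** iterations

print(f"R = 0.85, 30 iterations : {overall_contraction(0.85, 30):.5f}")   # ~0.00763
print(f"R = 0.95, 100 iterations: {overall_contraction(0.95, 100):.5f}")  # ~0.00592
```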

In our paper, we gave the formula $N_c N_t (1 + N_x(N_t - 1)/2)$ for estimating the number of state shootings required in one cycle of the usual backward dynamic programming (DP) computation. In the following, we explain how this formula was derived.

Suppose that the number of state grid points for each time stage is $N_x$ and the number of allowable controls for each state grid point is $N_c$. The time interval of interest $[0, t_f]$ is partitioned into $N_t$ equal-length time stages. The $k$th time stage spans the time interval $[(k-1)\Delta, k\Delta]$, where $\Delta = t_f/N_t$. Let the $N_x$ state grid points of the $k$th time stage be represented by $\hat{x}(k-1, i)$, $i = 1, 2, ..., N_x$, and the $N_c$ allowable controls for each state grid point by $u(k, i)$, $i = 1, 2, ..., N_c$. The backward DP starts the control evaluation from the state grid points of the last time stage, time stage $N_t$, which spans the time interval $[(N_t-1)\Delta, N_t\Delta]$. At time stage $N_t$, the number of state shootings required to evaluate the optimal controls associated with the $N_x$ state grid points $\hat{x}(N_t-1, k)$, $k = 1, 2, ..., N_x$, is $N_c N_x$. At time stage $N_t - 1$, the performance index $I_{N_t-2}(k, i)$ associated with the $k$th state grid point $\hat{x}(N_t-2, k)$ and the $i$th candidate control $u(N_t-2, i)$ is evaluated by first integrating the augmented dynamic equations with the initial condition $[\hat{x}(N_t-2, k), 0]$ and the control $u(N_t-2, i)$ from $t = (N_t-2)\Delta$ to $(N_t-1)\Delta$ to obtain an augmented state, say $[x((N_t-1)\Delta), x_{n+1}((N_t-1)\Delta)]$. In general, the state $x((N_t-1)\Delta)$ may not coincide with one of the state grid points $\hat{x}(N_t-1, j)$, $j = 1, 2, ..., N_x$, generated for time stage $N_t$. Suppose the nearest state grid point to the state $x((N_t-1)\Delta)$ is $\hat{x}(N_t-1, l)$, whose optimal control has been evaluated to be $u(N_t-1, j)$. With the control $u(N_t-1, j)$ and the initial condition $[x((N_t-1)\Delta), x_{n+1}((N_t-1)\Delta)]$, we then integrate the augmented state equations from $t = (N_t-1)\Delta$ to $N_t\Delta$ to obtain the performance index $I_{N_t-2}(k, i) = x_{n+1}(N_t\Delta)$.

Hence, at time stage $N_t - 1$, two state shootings are required for each control evaluation associated with a state grid point, and $2 N_c N_x$ state shootings are required for the control evaluations of the $N_x$ state grid points. Similarly, a control evaluation for a state grid point at time stage $N_t - 2$ requires three state shootings, so $3 N_x N_c$ state shootings are required for the $N_x$ state grid points with $N_c$ allowable controls each. By induction, the control evaluation for the $N_x$ state grid points at time stage $N_t - j$, $0 \le j \le N_t - 2$, requires $(j+1) N_x N_c$ state shootings. Because there is only one state grid point for the first time stage, its control evaluation requires $N_c N_t$ state shootings. On the basis of the above description, the total number of state shootings performed in one cycle of backward dynamic programming is

$$N_c N_t + N_c N_x + 2 N_c N_x + \cdots + (N_t - 1) N_c N_x = N_c N_t + \frac{N_c N_x N_t (N_t - 1)}{2}$$

This does not include the number of state shootings forgenerating state grid points.
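The counting argument above is easy to verify numerically. The Python sketch below (illustrative; the parameter values are hypothetical, not taken from the paper's examples) accumulates the shootings stage by stage exactly as in the derivation and checks the total against the closed-form expression $N_c N_t (1 + N_x(N_t-1)/2)$.

```python
def shootings_by_stage(N_c: int, N_x: int, N_t: int) -> int:
    # Stage N_t - j (0 <= j <= N_t - 2): N_x grid points, N_c candidate controls,
    # and each candidate evaluation needs j + 1 single-stage integrations.
    total = sum((j + 1) * N_x * N_c for j in range(N_t - 1))
    # First stage: a single grid point, and each of its N_c candidates must be
    # propagated through all N_t stages.
    total += N_c * N_t
    return total

def shootings_closed_form(N_c: int, N_x: int, N_t: int) -> int:
    # N_c*N_t*(1 + N_x*(N_t - 1)/2), written in integer arithmetic.
    return N_c * N_t + N_c * N_x * N_t * (N_t - 1) // 2

# Hypothetical parameter values, chosen only to exercise the formulas.
N_c, N_x, N_t = 15, 9, 20
assert shootings_by_stage(N_c, N_x, N_t) == shootings_closed_form(N_c, N_x, N_t)
print(shootings_closed_form(N_c, N_x, N_t))   # 25950 state shootings per DP cycle
```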

It is noted that there is an alternative approach to the control evaluation. At time stage $j$, $0 \le j < N_t$, the performance index $I_{j-1}(k, i)$ associated with the $k$th state grid point $\hat{x}(j-1, k)$ and the $i$th allowable control $u(j-1, i)$ is evaluated by integrating the augmented state equations with the initial condition $[\hat{x}(j-1, k), 0]$ and the control $u(j-1, i)$ from $t = (j-1)\Delta$ to $j\Delta$ to reach an augmented state $[x(j\Delta), x_{n+1}(j\Delta)]$. Let the nearest state grid point at time stage $j + 1$ to the state $x(j\Delta)$ be $\hat{x}(j, l)$, whose optimal control and associated performance index are $u(j, \nu)$ and $I^*(j, l)$. Then the performance index $I_{j-1}(k, i)$ is evaluated as

$$I_{j-1}(k, i) = x_{n+1}(j\Delta) + I^*(j, l)$$

This way of evaluating the performance index greatly reduces the number of state shootings. However, the computed performance index does not represent the true performance index because of the discrepancies between the states reached by shooting and the state grid points at the time instants $t = j\Delta, (j+1)\Delta, ..., (N_t-1)\Delta$; for example, there is a discrepancy between the state $x(j\Delta)$ reached by shooting and the state grid point $\hat{x}(j, l)$. In the authors' experience, this method of evaluating the performance index often leads to poor convergence in IDP optimization.
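For concreteness, the sketch below (again Python, with a placeholder one-stage integrator; none of these names or parameter values come from the original paper) implements the shortcut evaluation $I_{j-1}(k,i) = x_{n+1}(j\Delta) + I^*(j, l)$: one stage of shooting followed by a lookup of the stored cost-to-go of the nearest grid point at the next stage, which is exactly where the approximation error described above enters.

```python
import numpy as np

def integrate_stage(x: np.ndarray, u: float, dt: float = 0.1, steps: int = 10):
    """Placeholder one-stage shooting: returns (state at stage end, accumulated cost)."""
    cost = 0.0
    h = dt / steps
    for _ in range(steps):
        cost += h * (float(np.sum(x**2)) + 0.1 * u**2)   # running cost x'x + 0.1 u^2
        x = x + h * (-x + u)                              # placeholder dynamics dx/dt = -x + u
    return x, cost

def nearest_grid_index(x: np.ndarray, grid: np.ndarray) -> int:
    """Index l of the grid point closest (Euclidean distance) to the shot state x."""
    return int(np.argmin(np.linalg.norm(grid - x, axis=1)))

def shortcut_index(x_grid: np.ndarray, u: float,
                   next_grid: np.ndarray, next_cost_to_go: np.ndarray) -> float:
    """Shortcut evaluation: one stage of shooting, then the stored cost-to-go
    I*(j, l) of the nearest grid point at the next stage is simply added."""
    x_end, stage_cost = integrate_stage(x_grid, u)
    l = nearest_grid_index(x_end, next_grid)
    return stage_cost + float(next_cost_to_go[l])

# Hypothetical data: a 1-D state, 9 grid points at the next stage, and a made-up
# table of their stored cost-to-go values standing in for I*(j, l).
next_grid = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)
next_cost_to_go = 0.5 * next_grid[:, 0] ** 2

print(shortcut_index(np.array([0.8]), u=-0.5,
                     next_grid=next_grid, next_cost_to_go=next_cost_to_go))
```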

Acknowledgment

We thank Dr. Rein Luus for his interest in our paper.



