Comments on “Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems”

CORRESPONDENCE


Rein Luus

Department of Chemical Engineering, University of Toronto, Toronto, Ontario M5S 3E5, Canada

Sir: In a recent paper Lin and Hwang (1998) suggested ways of improving the convergence rate of iterative dynamic programming (IDP) and used two optimal control problems for illustration. There are, however, some gaps in their evaluation of existing approaches, so the aim of these comments is to provide some further information to enable the suggestions of the authors to be put into proper perspective.

In using IDP, Bojkov and Luus (1993) showed that a large number of grid points at each time step is not necessary for many typical chemical engineering problems, and Luus and Bojkov (1994) showed that the use of a single grid point at each time step yielded rapid convergence to the global optimum of a highly nonlinear bifunctional catalyst blending problem, where 26 local optima had been obtained by the use of sequential quadratic programming (SQP). However, there are some problems, such as the fed-batch reactor problem considered by Hartig et al. (1995), which require the use of a relatively large number of grid points at each time step to establish the global optimal control policy. The two problems considered by Lin and Hwang (1998) do not fall into the latter category, so there is no need to consider more than a single grid point at each time step. This reduces the number of function evaluations substantially without sacrificing the accuracy of the resulting solution.

The optimal control problem under consideration is a linear system with a quadratic performance index as given by Nagurka et al. (1991), where the dimensionality of the system is chosen to be n = 8; i.e., there are eight linear differential equations and eight control variables to be determined to minimize a quadratic performance index. A piecewise linear continuous optimal control policy for this system with n = 100, 200, and 250 has been established successfully with IDP (Luus, 1996), so the case with n = 8 should be straightforward, and actually it is, contrary to the findings of Lin and Hwang (1998). The authors carried out numerous runs, varying the number of grid points, the number of allowable values for control, and the random number generators, and reported that, when used in a single pass, IDP leads to convergence to within 0.01% of the optimum only if Sobol’s quasi-random sequence is used. The best that the other random number generators were able to achieve was to reach within 0.03% of the minimum performance index. For the investigation, they used 10 time stages (P = 10) of equal length and sought piecewise constant control to minimize the performance index I.

To check on this surprising result, this problem was run with IDP with the built-in random number generator in the WATCOM FORTRAN compiler version 9.5. A single grid point was used at each time step in all of the runs. The minimum value of the performance index is I = 373.592 574 90 when the integration time step h = 0.01 is used and I = 373.592 574 11 when h = 0.005 is used. Therefore, the choice of h = 0.01, as used by the authors, is very good when the fourth-order Runge-Kutta method is used. The starting control policy is u_i = -5.0, i = 1, 2, ..., 8, and the initial region size for the control variables is set at 5.0. This gives the initial value for the performance index of I = 1988.476 190 23. One important parameter in IDP is the region contraction factor γ used after every iteration to reduce the size of the region over which admissible values for control are taken. As is shown in Figure 1, where R = 25 randomly chosen values are used for control, the region contraction factor has a very significant effect. It is clear that γ = 0.85 leads to premature collapse of the search region, so that adequate convergence cannot be obtained with IDP in a single pass. The use of a larger number of randomly chosen points, let us say R = 100, makes very little difference. However, with γ = 0.90 there is no difficulty in getting convergence to seven figures in a single pass within 100 iterations. The use of γ = 0.95 leads to a slower convergence rate, so a larger number of iterations would be needed for convergence to seven-figure accuracy. Because Lin and Hwang (1998) used γ = 0.85 and allowed only 30 iterations, it is obvious why they concluded erroneously that single-pass IDP does not yield sufficiently accurate convergence for this system.
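The effect of the contraction factor can be illustrated with simple arithmetic. The sketch below assumes only what is stated above, namely that the region size is multiplied by γ after every iteration and starts at 5.0; the function name `region_size` is hypothetical and is not taken from any IDP code.

```python
def region_size(r0, gamma, iterations):
    """Region size after `iterations` contractions by the factor gamma."""
    return r0 * gamma ** iterations

r0 = 5.0  # initial region size for the control variables, as quoted in the text

# Combinations of contraction factor and iteration count discussed in the text.
for gamma, iterations in [(0.85, 30), (0.85, 100), (0.90, 100), (0.95, 100)]:
    size = region_size(r0, gamma, iterations)
    print(f"gamma = {gamma:.2f}, {iterations:3d} iterations: region size = {size:.3e}")
```

The numbers show only how quickly the search region shrinks geometrically. Whether a given shrinkage rate is too fast depends on how far the current policy still is from the optimum, which is what Figure 1 reports for this problem.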

To overcome their perceived problem, instead of changing the region contraction factor, Lin and Hwang (1998) suggested using IDP in a multipass manner and using Sobol’s quasi-random method for generating the candidates for control. The use of a multipass method in IDP was first suggested by Luus (1993) in dealing with very high dimensional systems. The computational advantage of using IDP in a multipass manner has already been reported in the literature in dealing with time-delay problems (Luus et al., 1995), with high-dimensional optimal control problems (Luus, 1996), and with the optimal control of nonseparable optimal control problems (Luus, 1997). The great advantage of using a multipass method is to enable IDP to be applied to constrained systems where some parameters in the penalty functions are updated between passes (Luus and Storey, 1997; Mekarapiruk and Luus, 1997; Luus, 1998b). The systematic developments in the computational improvements of IDP are given in a recent review paper (Luus, 1998a). The need for a different means of generating candidates for control should be carefully examined after taking into full consideration the effectiveness of the existing procedure.

Ind. Eng. Chem. Res. 1999, 38 (6), 2510-2512

10.1021/ie9808357 CCC: $18.00 © 1999 American Chemical Society. Published on Web 05/15/1999

In using IDP in a multipass fashion, there is an additional parameter which plays an important role, namely, the region restoration factor η, which represents the factor by which the initial control regions are restored at the beginning of the particular pass in terms of the initial region at the beginning of the previous pass; i.e., r_in,i(q) = η r_in,i(q - 1), i = 1, 2, ..., 8, where q is the pass number. To reduce computational requirements, we would like to have η as small as possible without prematurely collapsing the search space. Lin and Hwang, using γ = 0.85, put η = 0.85, which is shown in Figure 2 to be too large to obtain adequate convergence when R = 10 and 20 iterations per pass are used. The value η = 0.4 is a much better choice and provides convergence to 10-figure accuracy in only 11 passes. It is also noted that the choice of γ in a multipass method is not that crucial, because the region size is partly restored after each pass. As is shown in Figure 3, with γ = 0.95 almost the same results are obtained as with γ = 0.85.
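How γ and η interact in a multipass run can be sketched on a toy problem. The code below is not the eight-dimensional problem of Nagurka et al. and not the FORTRAN program used for these comments. It is a minimal single-grid-point illustration under loud assumptions: a hypothetical scalar system dx/dt = -x + u with x(0) = 1, the index I = ∫(x² + u²) dt over unit time, Euler integration instead of Runge-Kutta, and the current best control kept as a candidate so that the index never increases. All names (`idp`, `run_stage`, `cost_to_go`, `stage_states`) are hypothetical.

```python
import random

P = 10          # number of time stages (piecewise constant control)
T_FINAL = 1.0   # final time of the toy problem (assumption)
STEPS = 10      # Euler steps per stage (assumption; the text uses Runge-Kutta)
X0 = 1.0        # initial state of the toy problem (assumption)

def run_stage(x, u):
    """Euler-integrate one stage with constant control u; return (new x, stage cost)."""
    dt = T_FINAL / (P * STEPS)
    cost = 0.0
    for _ in range(STEPS):
        cost += (x * x + u * u) * dt
        x += (-x + u) * dt
    return x, cost

def cost_to_go(x, u_k, policy, k):
    """Cost from stage k to the end: u_k at stage k, stored policy afterwards."""
    x, total = run_stage(x, u_k)
    for u in policy[k + 1:]:
        x, c = run_stage(x, u)
        total += c
    return total

def stage_states(policy):
    """State at the start of each stage under the current policy (one grid point per stage)."""
    xs, x = [], X0
    for u in policy:
        xs.append(x)
        x, _ = run_stage(x, u)
    return xs

def idp(passes, iters, R, gamma, eta, r_in=1.0, seed=0):
    rng = random.Random(seed)
    policy = [0.0] * P
    history = [cost_to_go(X0, policy[0], policy, 0)]
    for q in range(passes):
        r = r_in * eta ** q              # region restored by eta at the start of each pass
        for _ in range(iters):
            grid = stage_states(policy)
            for k in reversed(range(P)):  # backward sweep over the stages
                best_u = policy[k]
                best = cost_to_go(grid[k], best_u, policy, k)
                for _ in range(R):        # R random candidates inside the current region
                    u = policy[k] + r * rng.uniform(-1.0, 1.0)
                    c = cost_to_go(grid[k], u, policy, k)
                    if c < best:
                        best_u, best = u, c
                policy[k] = best_u
            r *= gamma                    # region contraction after every iteration
            history.append(cost_to_go(X0, policy[0], policy, 0))
    return policy, history

policy, history = idp(passes=5, iters=20, R=5, gamma=0.95, eta=0.4)
print(f"initial I = {history[0]:.6f}, final I = {history[-1]:.6f}")
```

Changing `gamma` and `eta` in the call shows qualitatively how the two factors trade off, but the toy results should not be read as reproducing Figures 2 and 3.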

When the number of random points per iteration is reduced to R = 5, with γ = 0.95, as is shown in Figure 4, rapid convergence is obtained with η = 0.4. However, with γ = 0.85 premature collapse of the region results when η = 0.4 is used, as is shown in Figure 5. It is clear that with R = 5 there is no difficulty in getting convergence to 10 figures well within 15 passes of 20 iterations each. The computation time on a Pentium/120 with R = 5 for a run consisting of 20 passes, of 20 iterations each, was 48.4 s. This is approximately the same as the 48.5 s required for 100 iterations with R = 25 for the single-pass run. The computation times were obtained by reading the clock and, therefore, also include the time used to write information into files and to the screen.
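A rough count of candidate control evaluations helps put the two timings side by side. The sketch assumes one grid point per stage, P = 10 stages, and R candidates per grid point in every iteration; the helper `candidate_evaluations` is hypothetical, and the count ignores that candidates at early stages need a longer integration than those at late stages.

```python
def candidate_evaluations(passes, iterations, R, P, grid_points=1):
    """Number of candidate controls tried over a whole run."""
    return passes * iterations * P * grid_points * R

# Single-pass run from the text: 100 iterations with R = 25.
single_pass = candidate_evaluations(passes=1, iterations=100, R=25, P=10)

# Multipass run from the text: 20 passes of 20 iterations each with R = 5.
multipass = candidate_evaluations(passes=20, iterations=20, R=5, P=10)

print(single_pass, multipass)
```

Under these assumptions the two runs try a comparable number of candidates, which is in line with the similar clock times quoted above, given that those times also include file and screen output.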

Figure 1. Convergence characteristics of IDP for a single pass, showing the effect of the region contraction factor γ: ●, γ = 0.85; ▲, γ = 0.90; ■, γ = 0.95.

Figure 2. Convergence characteristics of IDP used in a multipass manner with R = 10 and γ = 0.85, showing the effect of the region restoration factor η: ●, η = 0.40; ▲, η = 0.60; ■, η = 0.75; ◆, η = 0.85.

Lin and Hwang (1998) gave a formula showing that the computational effort should increase quadratically with an increase in the number of time stages P. This is not correct, because to solve the problem with P = 20 to yield I = 373.164 718 with R = 5 required 4.6 s/pass and to solve that with P = 50 to yield I = 373.044 727 took 11.0 s/pass. In each case η = 0.4 with γ = 0.95 was used and the convergence was obtained within 20 passes of 20 iterations each. The relationship between the computation time and the number of time stages is almost linear. It is interesting to note that the performance index obtained with 50 time stages is marginally higher than I = 373.022 319 obtained with 10 stages with piecewise linear continuous control.
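The two timings quoted above can be checked against linear and quadratic scaling directly. The sketch assumes a power law, time proportional to P raised to some exponent, fitted through just these two points, so the exponent is indicative only; `scaling_exponent` is a hypothetical helper.

```python
import math

def scaling_exponent(p1, t1, p2, t2):
    """Exponent a in t ~ P**a implied by two (P, time) measurements."""
    return math.log(t2 / t1) / math.log(p2 / p1)

# Seconds per pass reported in the text for P = 20 and P = 50.
p1, t1 = 20, 4.6
p2, t2 = 50, 11.0

time_ratio = t2 / t1               # observed increase in time per pass
linear_ratio = p2 / p1             # expected if effort grows linearly with P
quadratic_ratio = (p2 / p1) ** 2   # expected if effort grows quadratically with P
exponent = scaling_exponent(p1, t1, p2, t2)

print(f"observed {time_ratio:.2f}, linear {linear_ratio:.2f}, "
      f"quadratic {quadratic_ratio:.2f}, exponent {exponent:.2f}")
```

The observed ratio lies close to the linear prediction and far from the quadratic one.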

In conclusion, therefore, the random number generator available in the FORTRAN compiler used in this work, and in previous papers, gives excellent results for IDP, and the random number generators and seed numbers are expected to have minimal effect. The path to the global optimum may differ slightly, but the end result upon convergence will be the same. What is much more important, however, is the size of the region over which the numbers are generated. The size of the region, apart from the initial choice, is controlled by the region contraction factor γ and the region restoration factor η. There is a relatively wide range over which γ and η may be chosen to achieve successful results. The effects of these two parameters have been shown here, and this information should be useful as a guide for solving other optimal control problems with IDP.

Literature Cited

Bojkov, B.; Luus, R. Evaluation of the Parameters Used in Iterative Dynamic Programming. Can. J. Chem. Eng. 1993, 451-459.

Hartig, F.; Keil, F. J.; Luus, R. Comparison of Optimization Methods for a Fed-Batch Reactor. Hung. J. Ind. Chem. 1995, 23, 141-148.

Lin, J. S.; Hwang, C. Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems. Ind. Eng. Chem. Res. 1998, 37, 2469-2478.

Luus, R. Application of Iterative Dynamic Programming to Very High Dimensional Systems. Hung. J. Ind. Chem. 1993, 21, 243-250.

Luus, R. Numerical Convergence Properties of Iterative Dynamic Programming when Applied to High Dimensional Systems. Chem. Eng. Res. Des. 1996, 74, 55-62.

Luus, R. Application of Iterative Dynamic Programming to Optimal Control of Nonseparable Problems. Hung. J. Ind. Chem. 1997, 25, 293-297.

Luus, R. Iterative Dynamic Programming: From Curiosity to a Practical Optimization Procedure. Control Intell. Syst. 1998a, 26, 1-8.

Luus, R. Direct Approach to Time Optimal Control by Iterative Dynamic Programming. Proceedings of the IASTED International Conference on Intelligent Systems and Control, Halifax, Nova Scotia, Canada, June 1-4, IASTED/Acta Press: Anaheim, CA, 1998b; pp 121-125.

Luus, R.; Bojkov, B. Global Optimization of the Bifunctional Catalyst Problem. Can. J. Chem. Eng. 1994, 72, 160-163.

Luus, R.; Storey, C. Optimal Control of Final State Constrained Systems. Proceedings of the IASTED International Conference on Modelling, Simulation and Optimization, Singapore, Aug 11-13, IASTED/Acta Press: Anaheim, CA, 1997; pp 245-249.

Luus, R.; Zhang, X.; Hartig, F.; Keil, F. J. Use of Piecewise Linear Continuous Optimal Control for Time-Delay Systems. Ind. Eng. Chem. Res. 1995, 34, 4136-4139.

Mekarapiruk, W.; Luus, R. Optimal Control of Inequality State Constrained Systems. Ind. Eng. Chem. Res. 1997, 36, 1686-1694.

Nagurka, M.; Wang, S.; Yen, V. Solving Linear Quadratic Optimal Control Problems by Chebychev-Based State Parameterization. Proceedings of the 1991 American Control Conference, Boston, MA, IEEE Service Center: Piscataway, NJ, 1991; pp 104-109.

Received for review July 29, 1998
Accepted March 8, 1999

IE9808357

