
CORRESPONDENCE

Comments on “Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems”

Rein Luus

Department of Chemical Engineering, University of Toronto, Toronto, Ontario M5S 3E5, Canada

Sir: In a recent paper, Lin and Hwang (1998) suggested ways of improving the convergence rate of iterative dynamic programming (IDP) and used two optimal control problems for illustration. There are, however, some gaps in their evaluation of existing approaches, so the aim of these comments is to provide further information that puts the authors' suggestions into proper perspective.

In using IDP, Bojkov and Luus (1993) showed that a large number of grid points at each time step is not necessary for many typical chemical engineering problems, and Luus and Bojkov (1994) showed that the use of a single grid point at each time step yielded rapid convergence to the global optimum of a highly nonlinear bifunctional catalyst blending problem, for which 26 local optima had been obtained by sequential quadratic programming (SQP). However, some problems, such as the fed-batch reactor problem considered by Hartig et al. (1995), require a relatively large number of grid points at each time step to establish the globally optimal control policy. The two problems considered by Lin and Hwang (1998) do not fall into the latter category, so there is no need to consider more than a single grid point at each time step. This reduces the number of function evaluations substantially without sacrificing the accuracy of the resulting solution.

The optimal control problem under consideration is a linear system with a quadratic performance index, as given by Nagurka et al. (1991), with the dimensionality of the system chosen to be n = 8; i.e., there are eight linear differential equations and eight control variables to be determined to minimize a quadratic performance index. A piecewise linear continuous optimal control policy for this system with n = 100, 200, and 250 has been established successfully with IDP (Luus, 1996), so the case with n = 8 should be straightforward, and, contrary to the findings of Lin and Hwang (1998), it is. The authors carried out numerous runs, varying the number of grid points, the number of allowable values for control, and the random number generator, and reported that, when used in a single pass, IDP converges to within 0.01% of the optimum only if Sobol’s quasi-random sequence is used. The best that the other random number generators achieved was to reach within 0.03% of the minimum performance index. For the investigation, they used 10 time stages (P = 10) of equal length and sought piecewise constant control to minimize the performance index I.

To check this surprising result, the problem was run with IDP using the built-in random number generator of the WATCOM FORTRAN compiler, version 9.5. A single grid point was used at each time step in all of the runs. The minimum value of the performance index is I = 373.592 574 90 when the integration time step h = 0.01 is used and I = 373.592 574 11 when h = 0.005 is used. Since the two values differ only in the seventh decimal place, the choice h = 0.01, as used by the authors, is very good when the fourth-order Runge-Kutta method is used. The starting control policy is u_i = -5.0, i = 1, 2, ..., 8, and the initial region size for the control variables is set at 5.0. This gives the initial value of the performance index I = 1988.476 190 23. One important parameter in IDP is the region contraction factor γ, applied after every iteration to reduce the size of the region over which admissible values for control are taken. As shown in Figure 1, where R = 25 randomly chosen values are used for control, the region contraction factor has a very significant effect. It is clear that γ = 0.85 leads to premature collapse of the search region, so that adequate convergence cannot be obtained with IDP in a single pass. Using a larger number of randomly chosen points, say R = 100, makes very little difference. With γ = 0.90, however, there is no difficulty in obtaining convergence to seven figures in a single pass within 100 iterations. The use of γ = 0.95 leads to a slower convergence rate, so a larger number of iterations would be needed for convergence to seven-figure accuracy. Because Lin and Hwang (1998) used γ = 0.85 and allowed only 30 iterations, it is clear why they concluded, erroneously, that single-pass IDP does not yield sufficiently accurate convergence for this system.
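To make the role of γ concrete, the following Python sketch implements the single-pass procedure described above: one grid point per stage, a backward sweep from the last stage to the first, R random candidates per stage drawn around the current best control, and the region multiplied by γ after every iteration. It is a simplified illustration, not the FORTRAN code used for the figures. The scalar test problem (dx/dt = -x + u, x(0) = 1, I = integral over [0, 1] of x^2 + u^2), the coarse RK4 step of 0.05, Python's `random.Random` in place of the compiler's generator, and all function names are assumptions chosen so that it runs quickly; it does not reproduce the eight-dimensional problem or the values quoted above.

```python
import random

# Hypothetical scalar test problem, NOT the n = 8 system of Nagurka et al. (1991):
#   dx/dt = -x + u,  x(0) = 1,  minimize I = integral over [0, 1] of (x^2 + u^2) dt
T_FINAL = 1.0
P = 10      # time stages of equal length, piecewise constant control
STEPS = 2   # RK4 steps per stage (h = 0.05); assumed, coarser than h = 0.01 for speed
X0 = 1.0


def rhs(x, u):
    """Derivatives of the augmented state: x and the running cost z."""
    return -x + u, x * x + u * u


def integrate_stage(x, z, u, length):
    """Fourth-order Runge-Kutta over one stage with constant control u."""
    h = length / STEPS
    for _ in range(STEPS):
        k1 = rhs(x, u)
        k2 = rhs(x + 0.5 * h * k1[0], u)
        k3 = rhs(x + 0.5 * h * k2[0], u)
        k4 = rhs(x + h * k3[0], u)
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        z += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, z


def cost_from(k, x, z, u_k, policy):
    """Performance index if stage k uses u_k and later stages follow policy."""
    length = T_FINAL / P
    x, z = integrate_stage(x, z, u_k, length)
    for j in range(k + 1, P):
        x, z = integrate_stage(x, z, policy[j], length)
    return z


def grid_points(policy):
    """One grid point per stage: (x, z) at the start of each stage on the best trajectory."""
    length = T_FINAL / P
    points, x, z = [], X0, 0.0
    for j in range(P):
        points.append((x, z))
        x, z = integrate_stage(x, z, policy[j], length)
    return points


def idp_single_pass(u_init, r_init, gamma, R, iterations, seed=0):
    """Single-pass IDP with region contraction; returns the policy and the cost history."""
    rng = random.Random(seed)
    policy = [u_init] * P
    r = r_init
    history = [cost_from(0, X0, 0.0, policy[0], policy)]
    for _ in range(iterations):
        points = grid_points(policy)
        for k in reversed(range(P)):  # backward sweep: last stage first
            x, z = points[k]
            best_u = policy[k]
            best_cost = cost_from(k, x, z, best_u, policy)
            for _ in range(R):
                # candidate drawn uniformly from [policy[k] - r, policy[k] + r)
                u = policy[k] + r * (2.0 * rng.random() - 1.0)
                cost = cost_from(k, x, z, u, policy)
                if cost < best_cost:
                    best_u, best_cost = u, cost
            policy[k] = best_u
        history.append(best_cost)  # after the sweep reaches stage 1 this is the full index
        r *= gamma  # region contraction after every iteration
    return policy, history


if __name__ == "__main__":
    for gamma in (0.85, 0.90, 0.95):
        _, hist = idp_single_pass(-5.0, 5.0, gamma, R=25, iterations=30)
        print(f"gamma={gamma:.2f}: I after 30 iterations = {hist[-1]:.8f}")
```

Because the stages before k are not changed until the sweep reaches them, the single grid point at stage k stays on the current best trajectory, so the cost history never increases. Comparing the histories for several γ values lets one explore the same trade-off on a toy problem; the numbers depend on the problem and the seed.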

To overcome their perceived problem, instead of changing the region contraction factor, Lin and Hwang (1998) suggested using IDP in a multipass manner and using Sobol’s quasi-random method to generate the candidates for control. The use of a multipass method in IDP was first suggested by Luus (1993) for very high dimensional systems. The computational advantage of using IDP in a multipass manner has already been reported for time-delay problems (Luus et al., 1995), high-dimensional optimal control problems (Luus, 1996), and nonseparable optimal control problems (Luus, 1997). The great advantage of a multipass method is that it enables IDP to be applied to constrained systems in which some parameters in the penalty functions are updated between passes (Luus and Storey, 1997; Mekarapiruk and Luus, 1997; Luus, 1998b). The systematic developments in the computational improvements of IDP are given in a recent review paper (Luus, 1998a). The need for a different means of generating candidates for control should be examined carefully, after taking full account of the effectiveness of the existing procedure.

2510 Ind. Eng. Chem. Res. 1999, 38, 2510-2512

10.1021/ie9808357 CCC: $18.00 © 1999 American Chemical Society Published on Web 05/15/1999

In using IDP in a multipass fashion, there is an additional parameter that plays an important role, namely, the region restoration factor η, which represents the factor by which the initial control regions are restored at the beginning of a particular pass relative to the initial region at the beginning of the previous pass; i.e., $r_{\mathrm{in},i}(q) = \eta\, r_{\mathrm{in},i}(q-1)$, $i = 1, 2, \ldots, 8$, where q is the pass number. To reduce computational requirements, we would like η to be as small as possible without prematurely collapsing the search space. Lin and Hwang, using γ = 0.85, set η = 0.85, which Figure 2 shows to be too large to obtain adequate convergence when R = 10 and 20 iterations per pass are used. The choice η = 0.4 is much better and provides convergence to 10-figure accuracy in only 11 passes. It is also noted that the choice of γ in a multipass method is not that crucial, because the region size is partly restored after each pass. As shown in Figure 3, with γ = 0.95 almost the same results are obtained as with γ = 0.85.
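The restoration rule can be tabulated in a few lines. The sketch below is only an illustration of the bookkeeping, not the code used for the figures: it assumes that the region at the start of pass q is η times the region at the start of pass q - 1 (as stated above), that within a pass the region is multiplied by γ after each of the iterations, and it uses the hypothetical helper name `pass_regions`. It shows that the starting region of every pass depends only on η, which is one way to see why γ matters less in the multipass mode.

```python
def pass_regions(r0, eta, gamma, passes, iters_per_pass):
    """Return (pass number, region at start of pass, region after last iteration).

    Hypothetical helper: the start-of-pass region follows r_in(q) = eta * r_in(q - 1)
    with r_in(1) = r0; within a pass the region is multiplied by gamma after every iteration.
    """
    rows = []
    r_start = r0
    for q in range(1, passes + 1):
        r_end = r_start * gamma ** iters_per_pass
        rows.append((q, r_start, r_end))
        r_start *= eta  # restoration for the next pass does not depend on gamma
    return rows


if __name__ == "__main__":
    for eta in (0.40, 0.85):
        for gamma in (0.85, 0.95):
            q, r_start, r_end = pass_regions(5.0, eta, gamma, passes=11, iters_per_pass=20)[-1]
            print(f"eta={eta:.2f} gamma={gamma:.2f} pass {q}: "
                  f"start {r_start:.3e}, end {r_end:.3e}")
```

With an initial region of 5.0 and η = 0.4, the starting region of pass 11 is 5 × 0.4^10, about 5.2 × 10^-4, whereas with η = 0.85 it is still about 0.98, roughly one-fifth of the original region, so the search is refocused far more slowly.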

When the number of random points per iteration is reduced to R = 5 with γ = 0.95, rapid convergence is obtained with η = 0.4, as shown in Figure 4. With γ = 0.85, however, premature collapse of the region results when η = 0.4 is used, as shown in Figure 5. It is clear that with R = 5 there is no difficulty in obtaining convergence to 10 figures well within 15 passes of 20 iterations each. On a Pentium/120, a run of 20 passes of 20 iterations each with R = 5 took 48.4 s. This is approximately the same as the 48.5 s required for 100 iterations with R = 25 in the single-pass run. The computation times were obtained by reading the clock and therefore also include the time used to write information to files and to the screen.

Figure 1. Convergence characteristics of IDP for a single pass, showing the effect of the region contraction factor γ: ●, γ = 0.85; ▲, γ = 0.90; ■, γ = 0.95.

Figure 2. Convergence characteristics of IDP used in a multipass manner with R = 10 and γ = 0.85, showing the effect of the region restoration factor η: ●, η = 0.40; ▲, η = 0.60; ■, η = 0.75; ◆, η = 0.85.

Figure 3. Convergence characteristics of IDP used in a multipass manner with R = 10 and γ = 0.95, showing the effect of the region restoration factor η: ●, η = 0.40; ▲, η = 0.60; ■, η = 0.75; ◆, η = 0.85.

Figure 4. Convergence characteristics of IDP used in a multipass manner with R = 5 and γ = 0.95, showing the effect of the region restoration factor η: ●, η = 0.40; ▲, η = 0.60; ■, η = 0.75; ◆, η = 0.85.

Lin and Hwang (1998) gave a formula showing that the computational effort should increase quadratically with an increase in the number of time stages P. This is not correct: solving the problem with P = 20, to yield I = 373.164 718 with R = 5, required 4.6 s/pass, and solving it with P = 50, to yield I = 373.044 727, took 11.0 s/pass. In each case η = 0.4 with γ = 0.95 was used, and convergence was obtained within 20 passes of 20 iterations each. The relationship between the computation time and the number of time stages is therefore almost linear: increasing P by a factor of 2.5 increased the time per pass by a factor of about 2.4, whereas quadratic growth would imply a factor of about 6.25. It is interesting to note that the performance index obtained with 50 time stages is marginally higher than I = 373.022 319, obtained with 10 stages and piecewise linear continuous control.
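One plausible way to see why the time per pass grows roughly linearly with P is to count integration steps. The sketch below is a hypothetical cost model, not an analysis taken from either paper: it assumes a fixed final time and a fixed integration step h (so each stage gets fewer steps as P grows), one grid point per stage, that evaluating a candidate control at stage k requires integrating from the start of stage k to the final time, and that integration dominates the run time. Under these assumptions the steps per iteration are R (t_f/h)(P + 1)/2, which is linear in P. The function name and the normalized time units are illustrative.

```python
import math


def steps_per_iteration(P, R, t_final=1.0, h=0.01):
    """Runge-Kutta steps per IDP iteration with a single grid point per stage.

    Hypothetical cost model: a candidate control at stage k (k = 1..P) is evaluated
    by integrating from the start of stage k to t_final with a fixed step h.
    """
    steps_per_stage = (t_final / P) / h
    return R * sum((P - k + 1) * steps_per_stage for k in range(1, P + 1))


if __name__ == "__main__":
    model_ratio = steps_per_iteration(50, R=5) / steps_per_iteration(20, R=5)
    measured_ratio = 11.0 / 4.6  # s/pass reported above for P = 50 and P = 20
    exponent = math.log(measured_ratio) / math.log(50 / 20)
    print(f"model {model_ratio:.3f}, measured {measured_ratio:.3f}, "
          f"quadratic {(50 / 20) ** 2:.2f}, fitted exponent {exponent:.2f}")
```

The model ratio 51/21, about 2.43, is close to the measured 11.0/4.6, about 2.39, and the exponent fitted to the two timings is about 0.95, far from the value 2 that quadratic growth would imply. The model ignores fixed overheads such as file and screen output, which the timings above include.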

In conclusion, therefore, the random number generator available in the FORTRAN compiler used in this work, and in previous papers, gives excellent results for IDP, and the choice of random number generator and seed is expected to have minimal effect. The path to the global optimum may differ slightly, but the end result upon convergence will be the same. What is much more important is the size of the region over which the numbers are generated. Apart from its initial choice, this size is controlled by the region contraction factor γ and the region restoration factor η.

There is a relatively wide range over which γ and η may be chosen to achieve successful results. The effects of these two parameters have been shown here, and this information should be useful as a guide for solving other optimal control problems with IDP.

Literature Cited

Bojkov, B.; Luus, R. Evaluation of the Parameters Used in Iterative Dynamic Programming. Can. J. Chem. Eng. 1993, 451-459.

Hartig, F.; Keil, F. J.; Luus, R. Comparison of Optimization Methods for a Fed-Batch Reactor. Hung. J. Ind. Chem. 1995, 23, 141-148.

Lin, J. S.; Hwang, C. Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems. Ind. Eng. Chem. Res. 1998, 37, 2469-2478.

Luus, R. Application of Iterative Dynamic Programming to Very High Dimensional Systems. Hung. J. Ind. Chem. 1993, 21, 243-250.

Luus, R. Numerical Convergence Properties of Iterative Dynamic Programming when Applied to High Dimensional Systems. Chem. Eng. Res. Des. 1996, 74, 55-62.

Luus, R. Application of Iterative Dynamic Programming to Optimal Control of Nonseparable Problems. Hung. J. Ind. Chem. 1997, 25, 293-297.

Luus, R. Iterative Dynamic Programming: From Curiosity to a Practical Optimization Procedure. Control Intell. Syst. 1998a, 26, 1-8.

Luus, R. Direct Approach to Time Optimal Control by Iterative Dynamic Programming. Proceedings of the IASTED International Conference on Intelligent Systems and Control, Halifax, Nova Scotia, Canada, June 1-4, IASTED/Acta Press: Anaheim, CA, 1998b; pp 121-125.

Luus, R.; Bojkov, B. Global Optimization of the Bifunctional Catalyst Problem. Can. J. Chem. Eng. 1994, 72, 160-163.

Luus, R.; Storey, C. Optimal Control of Final State Constrained Systems. Proceedings of the IASTED International Conference on Modelling, Simulation and Optimization, Singapore, Aug 11-13, IASTED/Acta Press: Anaheim, CA, 1997; pp 245-249.

Luus, R.; Zhang, X.; Hartig, F.; Keil, F. J. Use of Piecewise Linear Continuous Optimal Control for Time-Delay Systems. Ind. Eng. Chem. Res. 1995, 34, 4136-4139.

Mekarapiruk, W.; Luus, R. Optimal Control of Inequality State Constrained Systems. Ind. Eng. Chem. Res. 1997, 36, 1686-1694.

Nagurka, M.; Wang, S.; Yen, V. Solving Linear Quadratic Optimal Control Problems by Chebychev-Based State Parameterization. Proceedings of the 1991 American Control Conference, Boston, MA, IEEE Service Center: Piscataway, NJ, 1991; pp 104-109.

Received for review July 29, 1998. Accepted March 8, 1999.


Figure 5. Convergence characteristics of IDP used in a multipass manner with R = 5 and γ = 0.85, showing the effect of the region restoration factor η: ●, η = 0.40; ▲, η = 0.60; ■, η = 0.75; ◆, η = 0.85.
