
Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems

Jeng-Shiaw Lin

Department of Chemical Engineering, National Cheng Kung University, Tainan 700, Taiwan

Chyi Hwang*

Department of Chemical Engineering, National Chung Cheng University, Chia-Yi 621, Taiwan

Although iterative dynamic programming (IDP) is well recognized as a powerful method for finding the true optimal solution of a nonlinear dynamic optimization problem, the global optimality of the IDP solution is still not guaranteed. This paper explores ways of enhancing the global convergence of iterative dynamic programming when it is used to solve optimal control problems. Two approaches are employed to increase the likelihood of obtaining the true optimum while reducing the required computational effort. The first uses Sobol's quasi-random sequence generator to generate allowable controls, and the second uses multipass computation. Numerical examples show that multipass IDP computation with small numbers of state grid points and allowable control values can indeed increase the likelihood of obtaining the true optimum. This is particularly true when the allowable control values are generated by Sobol's quasi-random sequence generator.

Introduction

Recently, the iterative dynamic programming (IDP) method has been introduced by Luus and co-workers (Bojkov and Luus, 1992a,b; Luus, 1989, 1990a,b; Luus and Rosen, 1991) to solve continuous-time optimal control problems in which the systems are described by nonlinear differential equations. The IDP method is essentially based on three ideas: treating the control as a time-staged operation, gridding the state and control variables, and using region contraction to overcome the dimensionality problem associated with the state. In IDP computations, the state grid points are generated by assigning different control values. At each state grid point, admissible control values, either uniformly distributed or chosen at random, are evaluated to determine the optimal control associated with that grid point. By contracting the control region of each time stage after every cycle of dynamic programming computation, IDP eventually obtains an optimal control profile. IDP is easy to implement on a personal computer and is more likely than a nonlinear programming method to obtain a globally optimal solution. For these reasons, the original IDP method and several improved versions (Lin and Hwang, 1996b; Hwang and Lin, 1998) have been applied successfully to various optimal control problems of nonlinear processes (Bojkov and Luus, 1994a,b, 1996; Dadebo and McAuley, 1995b; Hartig and Keil, 1993a, 1994; Hartig et al., 1996; Keil et al., 1996; Luus, 1990c, 1991, 1993a–c; Luus and Bojkov, 1994; Luus et al., 1992; Luus and Galli, 1991; Mekarapiruk and Luus, 1997). These applications include time-delay systems (Dadebo and Luus, 1992; Dadebo and McAuley, 1995a; Lin and Hwang, 1996a) and systems with large numbers of state and control variables (Bojkov and Luus, 1992a,b, 1993, 1995; Hartig and Keil, 1993b; Luus, 1990b, 1993d; Luus and Smith, 1991).

Note that IDP is a heuristic method that does not guarantee finding the global optimum. Several cases in which the IDP method gives locally optimal solutions have been reported in the literature (Bojkov and Luus, 1993, 1994b, 1995, 1996; Hartig and Keil, 1993a; Luus and Bojkov, 1994; Luus et al., 1992; Mekarapiruk and Luus, 1997). Whether the IDP method ends at a local optimum depends on many parameters. These include the region contraction factor, the number of state grid points, the number of allowable control values, and the choice of admissible control values. In general, using a large region contraction factor or large numbers of state grid points and control values can reduce the likelihood of obtaining a locally optimal solution. However, the computational cost may become exceptionally high when the system has many state and control variables. Hence, selecting appropriate control values and using multipass computation appear to be viable ways of ensuring the global optimality of an IDP solution without increasing the computational cost.

The purpose of this paper is to explore the possibility of enhancing the global convergence of IDP solutions to optimal control problems. Specifically, we use Sobol's systematic approach (Bratley and Fox, 1988; Fox, 1986; Sobol, 1979) as an alternative way of generating allowable control values, and we demonstrate the efficacy of multipass IDP computation in obtaining the true optimal solution. Two examples are used for two comparisons: first, between the schemes for generating control values, and second, between single-pass and multipass IDP computations in terms of convergence and computational efficiency.

* To whom all correspondence should be addressed.

Ind. Eng. Chem. Res. 1998, 37, 2469–2478. S0888-5885(97)00629-5. CCC: $15.00 © 1998 American Chemical Society. Published on Web 05/15/1998.

Optimal Control Problem

Consider a dynamic process described by the nonlinear differential equation

$$\frac{dx(t)}{dt} = f(x(t), u(t), t), \qquad x(0) = x_0 \tag{1}$$

where $x = [x_1, x_2, \ldots, x_n]^T$ is the n-state vector, $u = [u_1, u_2, \ldots, u_m]^T$ is the m-control vector, and f is an n-component vector-valued function. The optimal control problem is to find the control policy u(t), bounded by

$$\underline{u}_i \le u_i(t) \le \bar{u}_i, \qquad i = 1, 2, \ldots, m \tag{2}$$

over the specified time interval $0 \le t \le t_f$, where $t_f$ is the final time, such that the performance index

$$I = \Phi(x(t_f), t_f) + \int_0^{t_f} \phi(x(t), u(t), t)\, dt \tag{3}$$

is minimized.

To solve this problem by the IDP method, we approximate the control u(t) by the piecewise-constant control profile

$$u(t) = u(k), \qquad t_{k-1} \le t < t_k, \quad k = 1, 2, \ldots, N_t \tag{4}$$

over Nt equal-length time stages, where $t_k = k\, t_f / N_t$. With this control discretization, the problem becomes that of finding the control sequence u(k), k = 1, 2, ..., Nt, bounded by

$$\underline{u}_i \le u_i(k) \le \bar{u}_i, \qquad i = 1, 2, \ldots, m \tag{5}$$

for the augmented dynamic system

$$\frac{d\hat{x}}{dt} \triangleq \frac{d}{dt}\begin{bmatrix} x \\ x_{n+1} \end{bmatrix} = \begin{bmatrix} f(x, u, t) \\ \phi(x, u, t) \end{bmatrix}, \qquad \hat{x}(0) = \begin{bmatrix} x_0 \\ 0 \end{bmatrix} \tag{6a}$$

$$u(t) = u(k), \qquad t_{k-1} \le t < t_k \tag{6b}$$

such that the performance index

$$I = \Phi(x(t_f), t_f) + x_{n+1}(t_f) \tag{7}$$

is minimized.

A Review of Iterative Dynamic Programming

To apply the IDP method to the optimal control problem described by (5)–(7), the time interval [0, tf] is first divided into Nt equal-length time stages. Then, for each iteration of dynamic programming, three steps are required. Step (i) specifies the control region for each time stage. Step (ii) constructs Nx state grid points x(k, j), j = 1, 2, ..., Nx, for every stage except the first, whose only state node is the initial condition x(0). Step (iii) assigns Nc control vectors u(k, j), j = 1, 2, ..., Nc, for each stage. Once the continuous optimization problem has been discretized in this way, dynamic programming is performed iteratively to determine the optimal control sequence u*(k), k = 1, 2, ..., Nt.

Before the IDP computations start, the region of the lth control component $u_l(t)$ ($1 \le l \le m$) at stage k ($1 \le k \le N_t$) is set to

$$U_{l,k}^{1} = \big[\underline{u}_l^{1}(k),\ \bar{u}_l^{1}(k)\big] \tag{8a}$$

where

$$\underline{u}_l^{1}(k) = \max\big(\underline{u}_l,\ u_l^{0}(k) - r_l^{1}\big) \tag{8b}$$

$$\bar{u}_l^{1}(k) = \min\big(\bar{u}_l,\ u_l^{0}(k) + r_l^{1}\big) \tag{8c}$$

In the above equations, $u_l^0(k)$ denotes the initial central control at stage k, and $r_l^1$ denotes the half-width of the region of the lth control variable $u_l(t)$. For subsequent iterations, the region of each control component at each stage is determined by three things: the region contraction factor R, the optimal control sequence of the previous iteration, and the clipping policy. Suppose that the (i − 1)th iteration of dynamic programming gives the optimal control sequence $u^{i-1}(k) = [u_1^{i-1}(k), u_2^{i-1}(k), \ldots, u_m^{i-1}(k)]^T$, k = 1, 2, ..., Nt. Then the region of the lth control variable $u_l(t)$ at the ith iteration is determined as follows:

$$U_{l,k}^{i} = \big[\underline{u}_l^{i}(k),\ \bar{u}_l^{i}(k)\big] \tag{9a}$$

$$\underline{u}_l^{i}(k) = \max\big(\underline{u}_l,\ u_l^{i-1}(k) - r_l^{i}\big) \tag{9b}$$

$$\bar{u}_l^{i}(k) = \min\big(\bar{u}_l,\ u_l^{i-1}(k) + r_l^{i}\big) \tag{9c}$$

where $r_l^i = R\, r_l^{i-1}$. According to (9a), the width of the region of the lth control variable at stage k is

$$|U_{l,k}^{i}| = \bar{u}_l^{i}(k) - \underline{u}_l^{i}(k) \tag{10}$$

Once the control region for the ith iteration has been determined, the state grids are constructed in two steps. First, Nx control sequences $\{u_j(1), u_j(2), \ldots, u_j(N_t)\}$, j = 1, 2, ..., Nx, are chosen as

$$u_j(k) = \underline{u}^{i}(k) + \frac{j-1}{N_x - 1}\,\big[|U_{1,k}^{i}|, |U_{2,k}^{i}|, \ldots, |U_{m,k}^{i}|\big]^T, \qquad k = 1, 2, \ldots, N_t \tag{11}$$

Second, state equation (6) is integrated with each of these control sequences to obtain the state grid points x(k, j), j = 1, 2, ..., Nx; k = 2, 3, ..., Nt. This approach to constructing state grid points was proposed by Luus (1989). The Nc allowable controls $u^i(k, j)$, j = 1, 2, ..., Nc, for stage k at the ith iteration take values from the hyperrectangle

$$U_k^{i} = U_{1,k}^{i} \times U_{2,k}^{i} \times \cdots \times U_{m,k}^{i} \tag{12}$$

The scheme for assigning allowable controls is discussed in detail in the next section.

Having set up the parameters for the ith iteration, the optimal control sequence $u^i(k)$, k = 1, 2, ..., Nt, can be obtained by the reverse dynamic programming algorithm (Luus, 1990b; Hartig and Keil, 1993a; Hwang and Lin, 1998).

Generating Allowable Values for the Control Optimization by IDP

The choice of allowable control values for each stage plays a crucial role in dynamic programming optimization. Suppose an IDP computation uses Nc allowable controls and Nx state grid points per stage, with the usual reverse dynamic programming (Luus, 1990b). Completing one cycle of optimization then requires $N_c N_t \big(N_x (N_t - 1)/2 + 1\big)$ state shootings. Here a state shooting means integrating the augmented system equation (6) over the time interval of one stage. Usually, the number of state grid points Nx chosen for each time stage is proportional to the number of allowable control values Nc. Nx must also be chosen large enough to ensure global convergence. As a result, the number of state shootings increases almost quadratically with Nc. However, several examples in the literature, as well as the examples given in this work, show that very little may be gained by using too many allowable controls for each stage.
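The state-shooting count above can be checked with a few lines of Python. The sketch assumes one reading of where the formula comes from, which these functions reproduce exactly. Stage 1 has only the node x(0), and each of its Nc candidate controls is shot through all Nt stages. Each later stage k has Nx nodes, and each of their Nc candidates is shot through the remaining Nt − k + 1 stages. The function names are ours. Multiplying by the number of iterations reproduces counts quoted later in the paper. For example, it gives 13 200 per 20-iteration pass for Nt = 11, Nx = 1, Nc = 10, and 1 963 764 for the 29-iteration single-pass run with Nc = 81 and Nx = 15.

```python
def shootings_per_iteration(n_c: int, n_x: int, n_t: int) -> int:
    """Closed form Nc*Nt*(Nx*(Nt - 1)/2 + 1), kept in integer arithmetic.

    Nt*(Nx*(Nt - 1) + 2) is always even, so the floor division is exact.
    """
    return n_c * n_t * (n_x * (n_t - 1) + 2) // 2


def shootings_by_stage(n_c: int, n_x: int, n_t: int) -> int:
    """Same count summed stage by stage (assumed interpretation of reverse DP)."""
    total = 0
    for k in range(1, n_t + 1):
        nodes = 1 if k == 1 else n_x        # stage 1 only has the node x(0)
        remaining = n_t - k + 1             # stages integrated for each candidate control
        total += n_c * nodes * remaining
    return total


if __name__ == "__main__":
    # Growth when Nx is taken proportional to Nc (here Nx = Nc, Nt = 11).
    for n in (5, 10, 20, 40):
        print(n, shootings_per_iteration(n, n, 11))  # 1430, 5610, 22220, 88440
```

Doubling Nc (and Nx with it) roughly quadruples the work per iteration. This is the near-quadratic growth that motivates keeping Nc and Nx small.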

In the past development of the IDP computationalgorithms, two schemes of assigning allowable controlvalues have been used. If the number of controlvariables m is small, a uniform gridding of control spaceis preferable to the random distribution. Suppose thatat stage k, Ml + 1 values for the lth control variableul(t) are distributed uniformly over the control region[ul

i-1(k) - rli, ul

i-1(k) + rli] } [ul

i(k), uj li(k)]. Then there

are Nc ) ∏l)1m (Ml + 1) allowable control vectors [v1, v2,

..., vm]T, where

and jl ) 0, 1, ..., Ml; l ) 1, 2, ..., m. To include the centralcontrol sequence ui-1(k) of the previous iteration ofdynamic programming computation in the above set ofcontrol vectors so as to obtain a nonincreasing conver-gence of the performance index, the numbers Ml, l ) 1,2, ..., m, should be chosen as even integers.

On the other hand, if the number of control variablesm is quite large, it becomes difficult to use uniformly-distributed control values. In this case the allowablecontrol vectors for each time stage and each dynamicprogramming iteration are usually taken at randomfrom the allowable control domain (12), which is anm-dimensional hyperrectangle. Suppose that Nc allow-able control vectors are chosen for each stage and thecentral control ui-1(k) is to be included as one of theallowable control vectors, it is required first to generatea sequence of random numbers rν, ν ) 1, 2, ..., mNt-(Nc - 1), which are uniformly distributed in the interval[0, 1]. Then, allowable control vectors ui(k, j), whichlie in the control region (9), are obtained as

where

and ν ) (k - 1)(Nc - 1)m + m(j - 1) + l. It is notedthat, in the IDP computation using randomly-generatedallowable control values, the sequences of randomvariables generated are not repeated from stage to stageand from iteration to iteration.

In this paper, we suggest the use of a systematicapproach to generate allowable values of control for thedynamic programming optimization. The approach isbased on using Sobol’s quasi-random sequence generatorto generate a sequence of points that are distributedvery uniformly in a multidimensional hypercube. Thesepoints have been successfully used for a systematiccrude search, as starting points for a global searchalgorithm. For a detailed description of Sobol’s quasi-random sequence generator and its implementationalgorithm, the readers are referred to the papers by

Bratley and Fox (1988), Fox (1986), Sobol (1979),Homma and Saltelli (1995).

By using Sobol’s quasi-random sequence generator,we can obtain in sequence M sets of (Nc - 1) points inthe m-dimensional unit hypercube [0, 1]m. Let thesepoints be denoted by

where i ) 1, 2, ..., M. On the basis of these sets ofpoints, the allowable controls for each stage are assignedas follows:

where

µ is equal to 1 plus the remainder of dividing i by M.Here it should be noted that the points generated bySobol’s quasi-random sequence generator are determin-istic rather than stochastic. Also noted is that theallowable control patterns are repeated for every Miterations of dynamic programming computation.

Before leaving the section, it is noted that the valuesof allowable control generated by the uniform distribu-tion approach may violate the bounds (2). If a compo-nent of the generated allowable control vector u(k, l)does not reside in the feasible control region [u, uj ], asimple clipping technique (Luus 1989, 1990a-c) or amodified clipping technique (Hartig and Keil, 1993a) canbe used to shift the infeasible control component intothe allowable region.

Multipass IDP ComputationA pass of IDP computation consists of Ni iterations

of dynamic programming optimization with contractingthe control region by a factor R in each iteration. Theoptimal control sequence obtained at the current itera-tion will be used as the central control profile of the nextiteration. Similarly, in solving optimal control problemsby multipass IDP with a pass region contraction factorη, the optimal control sequence obtained in the currentpass of IDP optimization will be used as the centralcontrol profile of the next pass. Multiple passes of IDPcomputation have been used (Bojkov and Luus, 1992a;Dadebo and McAuley, 1995a; Luus and Galli, 1991;Luus and Rosen, 1991) to refine the piecewise-constantcontrol policy by doubling the number of time stages sothat it is comparable to the optimal continuous controlprofile. Recently, it has been found (Luus, 1993d; Luuset al., 1995) that several optimal control problems canbe effectively solved by multipass IDP computationusing only a single state grid for each stage. Therefore,it is worth examining the global convergence propertyof multipass IDP computation. For this purpose, weoutline in the following the procedure of multipass IDPcomputation.

Procedure of Multipass IDP Scheme(1) Let the control region size be initially given by r(1).

Select the parameters Np, Nt, and η, and the centralcontrol profile u(0)(k), k ) 1, 2, ..., Nt.

(2) Set the pass index p ) 1.

vl ) uli(k) + jl

2rli

Ml(13)

ui(k, j) ) {[v1, v2, ..., vm]T for j ) 1, ..., Nc - 1

ui-1(k) for j ) Nc(14)

ul ) uli(k) + rν(uj l

i(k) - uli(k))

pi,j ) [pi,j,1, pi,j,2, ..., pi,j,m]T j ) 1, 2, ..., Nc - 1(15)

ui(k, j) ) {[v1, v2, ..., vm]T for j ) 1, ..., Nc - 1

ui-1(k) for j ) Nc(16)

vl ) uli(k) + pµ,j,l(uj l

i(k) - uj li(k))

Ind. Eng. Chem. Res., Vol. 37, No. 6, 1998 2471

Page 4: Enhancement of the Global Convergence of Using Iterative Dynamic Programming To Solve Optimal Control Problems

(3) Select the parameters Ni, Nc, and R.(4) Set the initial control region size to p(p). Use

u(p-1)(k), k ) 1, 2, ..., Nt, as the central control profileand then use an existing IDP algorithm (Hwang andLin, 1998) to perform one pass of Ni iterations ofdynamic programming optimization and obtain theoptimal control sequence u(p)(k), k ) 1, 2, ..., Nt.

(5) Increment the pass index p by 1, and restore thecontrol region size to r(p) ) ηr(p-1).

(6) If p e Np, go to step (3).(7) Stop.
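The procedure above can be illustrated with a minimal, self-contained Python sketch. It rests on several loud assumptions. First, the test problem is hypothetical and not taken from the paper: dx/dt = x + u, x(0) = 1, −1 ≤ u ≤ 1, tf = 1, minimize x(tf)² + ∫(x² + u²) dt. Second, a single state grid (Nx = 1) is used, taken along the trajectory of the central control profile. Third, the candidates are a uniform grid over the clipped region plus the central control. Fourth, the inner routine `idp_pass` is a simplified reverse-DP stand-in, not the improved IDP algorithm of Hwang and Lin (1998). All function names are ours.

```python
import numpy as np

# Hypothetical scalar test problem (an assumption, NOT an example from the paper):
#   dx/dt = x + u, x(0) = 1, -1 <= u <= 1, tf = 1,
#   minimize I = x(tf)^2 + integral_0^tf (x^2 + u^2) dt.
TF, NT = 1.0, 10                     # final time and number of stages Nt
U_LO, U_HI = -1.0, 1.0               # control bounds, as in (2)
Z0 = np.array([1.0, 0.0])            # augmented initial state [x0, 0], as in (6a)


def rhs(z, u):
    """Augmented right-hand side [f(x, u), phi(x, u)]."""
    x = z[0]
    return np.array([x + u, x * x + u * u])


def shoot(z, u, steps=4):
    """One state shooting: classical RK4 over one stage with constant control u."""
    h = TF / NT / steps
    for _ in range(steps):
        k1 = rhs(z, u)
        k2 = rhs(z + 0.5 * h * k1, u)
        k3 = rhs(z + 0.5 * h * k2, u)
        k4 = rhs(z + h * k3, u)
        z = z + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z


def cost_from(z, k0, controls):
    """Shoot from the start of stage k0 to tf; return I = Phi + x_{n+1}(tf), as in (7)."""
    for k in range(k0, NT):
        z = shoot(z, controls[k])
    return z[0] ** 2 + z[1]


def idp_pass(center, r, n_iter, n_c, R):
    """One pass: Ni single-grid (Nx = 1) DP iterations with region contraction R."""
    center = center.copy()
    for _ in range(n_iter):
        # Assumption for Nx = 1: the state grid is the trajectory of the central profile.
        grid = [Z0]
        for k in range(NT - 1):
            grid.append(shoot(grid[-1], center[k]))
        best = center.copy()
        for k in reversed(range(NT)):                   # reverse dynamic programming
            lo = max(U_LO, center[k] - r)               # clipped region, cf. (9b)
            hi = min(U_HI, center[k] + r)               # clipped region, cf. (9c)
            # Nc - 1 uniform candidates plus the central control (j = Nc, cf. (14)).
            cands = np.append(np.linspace(lo, hi, n_c - 1), center[k])
            costs = []
            for u in cands:
                trial = best.copy()
                trial[k] = u                            # later stages keep their new best controls
                costs.append(cost_from(grid[k], k, trial))
            best[k] = cands[int(np.argmin(costs))]
        center = best
        r *= R                                          # r^i = R r^{i-1}
    return center


def multipass_idp(n_pass=4, n_iter=10, n_c=5, R=0.8, eta=0.8, r1=1.0):
    """Steps (1)-(7): each pass restarts from the previous pass's optimal profile."""
    center = np.zeros(NT)                               # u^(0)(k)
    r = r1                                              # r^(1)
    history = [cost_from(Z0, 0, center)]
    for _ in range(n_pass):                             # p = 1, ..., Np
        center = idp_pass(center, r, n_iter, n_c, R)
        history.append(cost_from(Z0, 0, center))
        r *= eta                                        # step (5): r^(p) = eta * r^(p-1)
    return center, history


if __name__ == "__main__":
    u_opt, hist = multipass_idp()
    print("I after each pass:", [round(v, 6) for v in hist])
```

In this sketch the central control is always among the candidates, and the single grid lies on the central trajectory. As a result, the performance index cannot increase from one iteration or pass to the next. This matches the reason the paper gives for including $u^{i-1}(k)$ among the allowable controls. A Sobol' or random candidate set could replace `np.linspace` without changing the pass structure.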

Illustrative Examples

Two examples are worked out to illustrate how the global convergence of the IDP method is enhanced by a systematic approach to assigning allowable controls and by multipass computation. The improved IDP algorithm of Hwang and Lin (1998) was used to perform the dynamic optimization. The required state shootings are performed with a fourth-order Runge–Kutta method with step length h. All numerical computations are carried out in double precision on an IBM RS/6000 workstation. Most of the computational burden lies in the state shootings used to evaluate the performance index. Computational efficiency is therefore measured by the number of state shootings, Nss, actually performed.

Example 1. Consider an isothermal photochemical continuous stirred tank reactor (CSTR) in which the following five chemical reactions occur:

$$A + B \rightarrow 2D, \qquad C + B \rightarrow (CB), \qquad (CB) + B \rightarrow 2E, \qquad E + D \rightarrow 2F, \qquad F + A \rightarrow 2G$$

where A, B, and C are the reactants, (CB) is an intermediate, and D, E, F, and G are the products. This typical chemical engineering system has been used for optimal control studies by several authors (Jensen, 1964; Lapidus and Luus, 1967; Rao and Luus, 1972; Luus, 1990b, 1991; Bojkov and Luus, 1993). The system equations are given by Lapidus and Luus (1967):

$$\begin{aligned}
\dot{x}_1(t) &= u_4(t) - q(t)\,x_1(t) - 17.6\,x_1(t)x_2(t) - 23\,x_1(t)x_6(t)u_3(t) \\
\dot{x}_2(t) &= u_1(t) - q(t)\,x_2(t) - 17.6\,x_1(t)x_2(t) - 146\,x_2(t)x_3(t) \\
\dot{x}_3(t) &= u_2(t) - q(t)\,x_3(t) - 73\,x_2(t)x_3(t) \\
\dot{x}_4(t) &= -q(t)\,x_4(t) + 35.2\,x_1(t)x_2(t) - 51.3\,x_4(t)x_5(t) \\
\dot{x}_5(t) &= -q(t)\,x_5(t) + 219\,x_2(t)x_3(t) - 51.3\,x_4(t)x_5(t) \\
\dot{x}_6(t) &= -q(t)\,x_6(t) + 102.6\,x_4(t)x_5(t) - 23\,x_1(t)x_6(t)u_3(t) \\
\dot{x}_7(t) &= -q(t)\,x_7(t) + 46\,x_1(t)x_6(t)u_3(t)
\end{aligned}$$

where $x_1(t), x_2(t), \ldots, x_7(t)$ denote the concentrations of species A, B, C, D, E, F, and G, respectively. The quantity

$$q(t) = u_1(t) + u_2(t) + u_4(t)$$

is the total feed rate. Here $u_4(t)$, $u_1(t)$, and $u_2(t)$ are the flow rates of A, B, and C, respectively, and $u_3(t)$ is the square root of the light intensity. For the given initial state

$$x(0) = [0.1883,\ 0.2507,\ 0.0467,\ 0.0899,\ 0.1804,\ 0.1394,\ 0.1046]^T$$

and the control bounds

$$0 \le u_1(t) \le 20, \qquad 0 \le u_2(t) \le 6, \qquad 0 \le u_3(t) \le 4, \qquad 0 \le u_4(t) \le 20$$

the optimal control objective is to maximize the performance index

$$I = \int_0^{t_f} \Big[5.8\big(q(t)x_1(t) - u_4(t)\big) - 3.7u_1(t) - 4.1u_2(t) + q(t)\big(23x_4(t) + 11x_5(t) + 28x_6(t) + 35x_7(t)\big) - 5u_3^2(t) - 0.099\Big]\, dt$$

by choosing the controls $u_1(t)$, $u_2(t)$, $u_3(t)$, and $u_4(t)$ in the time interval $0 \le t \le t_f$.

As reported in the literature (Luus, 1990b), single-pass IDP gave the optimal performance index I = 21.76. That computation used Nt = 11 time stages, Nx = 27 state grid points for each stage, and Nc = 81 (= 3^4) rectangular grid points for allowable controls. We solved the same problem by single-pass IDP with the same number of time stages and the same allowable controls, but with 15 state grid points. The optimal performance index converged to I = 21.7575 in 29 iterations. The total number of state shootings performed was $N_{ss}^{t}$ = 1 963 764, which is used as the basis for assessing the computational savings of multipass IDP computation. To verify this result, we solved the problem again by single-x-grid multipass IDP computation with the same number of time stages and the same allowable controls. The optimal performance index again converged to 21.7575, after 20 passes of 20-iteration IDP computations. Hence, in what follows, the value 21.7575 is regarded as the true optimal solution for Nt = 11.

We then solved this problem by the IDP method with allowable controls chosen randomly or systematically. For this purpose, we partitioned the time interval [0, 0.2] into Nt = 11 equal-length time stages and set the region contraction factor R = 0.85. The step length of numerical integration used for each state shooting is h = 1/110. The initial central control for each time stage is set to $u^0(k) = [10, 3, 2, 6]^T$, k = 1, 2, ..., Nt, and the initial half-width of the control region is $r^1 \triangleq [r_1^1, r_2^1, r_3^1, r_4^1]^T = [10, 3, 2, 6]^T$. Table 1 lists the performance indices converged by 30-iteration IDP computations for different numbers of state grid points and allowable controls, and for both approaches to generating allowable controls. As can be seen from the table, single-pass IDP computations with allowable controls chosen either randomly or systematically fail to converge to the true optimal solution. For each set of IDP parameters, none of the ten runs with randomly chosen allowable controls gave the true optimal solution. Furthermore, single-pass IDP computation with large numbers of allowable controls and state grid points requires an excessively large number of state shootings, yet it still gives a local solution rather than the global solution. In most runs, IDP with allowable controls generated by Sobol's quasi-random distribution gave better results than IDP with allowable controls generated by a random distribution.
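The two candidate-generation schemes compared in Table 1 can be sketched in Python as follows. The assumptions are these. SciPy's `scipy.stats.qmc.Sobol` (unscrambled) stands in for the Sobol' generator of Bratley and Fox (1988). Nc = 10 and M = 4 are illustrative values, not the paper's settings. The helper names `assign_controls` and `sobol_point_sets` are ours. The stage region is the first-iteration region of Example 1 obtained from (8).

```python
import warnings

import numpy as np
from scipy.stats import qmc


def assign_controls(lo, hi, center, unit_points):
    """Map Nc - 1 points of [0, 1]^m onto [lo, hi] and append the central control (j = Nc)."""
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    v = lo + unit_points * (hi - lo)                  # v_l as in (14) and (16)
    return np.vstack([v, np.asarray(center, dtype=float)])


def sobol_point_sets(m, n_c, n_sets):
    """n_sets consecutive sets of Nc - 1 unscrambled Sobol' points in [0, 1)^m, cf. (15)."""
    sampler = qmc.Sobol(d=m, scramble=False)
    with warnings.catch_warnings():
        # SciPy warns when the sample size is not a power of 2; the points remain valid.
        warnings.simplefilter("ignore")
        pts = sampler.random(n_sets * (n_c - 1))
    return pts.reshape(n_sets, n_c - 1, m)


# First-iteration stage region of Example 1 from (8): centre u^0(k) = [10, 3, 2, 6],
# half-width r^1 = [10, 3, 2, 6], bounds [0, 20] x [0, 6] x [0, 4] x [0, 20].
center = np.array([10.0, 3.0, 2.0, 6.0])
r = np.array([10.0, 3.0, 2.0, 6.0])
lo = np.maximum([0.0, 0.0, 0.0, 0.0], center - r)     # (8b)
hi = np.minimum([20.0, 6.0, 4.0, 20.0], center + r)   # (8c)
n_c, M = 10, 4                                        # illustrative values only

sets = sobol_point_sets(4, n_c, M)
i = 1                                                 # DP iteration index
# mu = 1 + (i mod M) in 1-based notation is index i % M here. Equation (16) has
# no stage index, so the same set is applied at every stage of iteration i.
sobol_controls = assign_controls(lo, hi, center, sets[i % M])

# Random scheme: fresh uniform numbers r_nu for every stage and iteration.
rng = np.random.default_rng(seed=0)
random_controls = assign_controls(lo, hi, center, rng.random((n_c - 1, 4)))
```

The Sobol' sets are reproducible and cycle every M iterations. The random candidates change at every stage and iteration unless the seed is fixed, as it is here only for reproducibility.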

On the basis of the results in Table 1, we infer that multipass IDP may be an effective alternative for obtaining the true optimal solution. To verify this, we solved the problem by multipass IDP computation with only a few state and control grid points. Figure 1 shows the convergence of the performance index versus the pass number for multipass 20-iteration IDP computations using Sobol's systematically generated allowable controls. Table 2 shows the convergence of the performance index in multipass 20-iteration IDP computations with allowable controls generated by Sobol's quasi-random sequence. Table 3 shows the corresponding convergence with allowable controls generated by a random distribution. Table 2 shows that, with allowable controls generated by Sobol's quasi-random sequence, all sets of multipass IDP computations gave the global optimal solution. In contrast, as Table 3 shows, not all multipass IDP computations using randomly generated allowable controls gave the true optimal solution. In other words, Sobol's systematic distribution of points is a robust way to generate allowable controls that ensures the global convergence of multipass IDP computation. To compare the computational burdens of single-pass and multipass IDP computation, the numbers of state shootings, Nss, actually performed in each run are given in Tables 1 and 2. For example, the Nx = 1, Nc = 15 run in Table 2 reaches 21.7575 at pass 8, after 8 × 19 800 = 158 400 state shootings. The single-pass computation described above required 1 963 764 state shootings. Multipass IDP computation with allowable controls generated by Sobol's quasi-random sequence is therefore an efficient and effective choice for obtaining the global optimal solution.
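Readers who wish to reproduce Example 1 can start from the model as stated above. The following Python sketch codes the system equations in augmented form (6a), with the integrand of I as an eighth state. It also performs one RK4 state shooting per stage with h = 1/110, which gives two steps per stage of length 0.2/11. The function names are ours. The sketch only evaluates I for a given piecewise-constant control sequence and does not reproduce the optimum 21.7575. Because Example 1 maximizes I, a minimizing dynamic programming routine would be applied to −I.

```python
import numpy as np

# Example 1 data as stated in the text (Lapidus and Luus, 1967).
X0 = np.array([0.1883, 0.2507, 0.0467, 0.0899, 0.1804, 0.1394, 0.1046])
TF, NT, H = 0.2, 11, 1.0 / 110.0      # tf, number of stages Nt, RK4 step length h


def rhs(z, u):
    """Augmented system (6a): x1..x7 plus x8, the integrand of the performance index."""
    x1, x2, x3, x4, x5, x6, x7 = z[:7]
    u1, u2, u3, u4 = u
    q = u1 + u2 + u4                                  # total feed rate q(t)
    return np.array([
        u4 - q * x1 - 17.6 * x1 * x2 - 23.0 * x1 * x6 * u3,
        u1 - q * x2 - 17.6 * x1 * x2 - 146.0 * x2 * x3,
        u2 - q * x3 - 73.0 * x2 * x3,
        -q * x4 + 35.2 * x1 * x2 - 51.3 * x4 * x5,
        -q * x5 + 219.0 * x2 * x3 - 51.3 * x4 * x5,
        -q * x6 + 102.6 * x4 * x5 - 23.0 * x1 * x6 * u3,
        -q * x7 + 46.0 * x1 * x6 * u3,
        5.8 * (q * x1 - u4) - 3.7 * u1 - 4.1 * u2
        + q * (23.0 * x4 + 11.0 * x5 + 28.0 * x6 + 35.0 * x7)
        - 5.0 * u3 ** 2 - 0.099,
    ])


def shoot(z, u):
    """One state shooting: fixed-step RK4 over one stage of length tf/Nt."""
    n_steps = int(round(TF / NT / H))                 # 2 steps of h = 1/110 per stage
    for _ in range(n_steps):
        k1 = rhs(z, u)
        k2 = rhs(z + 0.5 * H * k1, u)
        k3 = rhs(z + 0.5 * H * k2, u)
        k4 = rhs(z + H * k3, u)
        z = z + H * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z


def performance_index(controls):
    """I = x8(tf) for a piecewise-constant control sequence u(1), ..., u(Nt)."""
    z = np.append(X0, 0.0)
    for u in controls:
        z = shoot(z, u)
    return z[7]


if __name__ == "__main__":
    # Index of the initial central control held constant on all 11 stages.
    print(performance_index([[10.0, 3.0, 2.0, 6.0]] * NT))
```

At t = 0 with u = [10, 3, 2, 6], the integrand evaluates to 137.26136, which gives a quick hand check of the coded performance index.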

Table 1. Comparison of Convergence of Performance Indices Obtained by Single-Pass IDP Computations with Allowable Controls Generated by Sobol's Quasi-Random Sequence and Random Distribution for Example 1ᵃ

| allowable controls | | Nc = 40, Nx = 9 | Nc = 40, Nx = 15 | Nc = 40, Nx = 21 | Nc = 60, Nx = 9 | Nc = 60, Nx = 15 | Nc = 60, Nx = 21 |
|---|---|---|---|---|---|---|---|
| random 1 | I | 21.67682 | 21.43227 | 21.49095 | 21.50035 | 21.50101 | 21.65183 |
| | Nss | 600 520 | 977 240 | 1 337 160 | 909 660 | 1 482 600 | 2 054 400 |
| random 2 | I | 21.62828 | 21.57494 | 21.45103 | 21.64168 | 21.66884 | 21.57737 |
| | Nss | 600 880 | 975 120 | 1 346 360 | 903 660 | 1 483 200 | 2 050 920 |
| random 3 | I | 21.55991 | 21.60351 | 21.46932 | 21.60935 | 21.60539 | 21.73167 |
| | Nss | 602 040 | 975 760 | 1 341 840 | 906 600 | 1 487 580 | 2 047 320 |
| random 4 | I | 21.61008 | 21.60490 | 21.68375 | 21.61706 | 21.70640 | 21.70977 |
| | Nss | 597 600 | 980 400 | 1 338 840 | 909 600 | 1 485 000 | 2 055 240 |
| random 5 | I | 21.52543 | 21.49909 | 21.49087 | 21.70706 | 21.59336 | 21.64388 |
| | Nss | 601 200 | 976 480 | 1 339 200 | 906 720 | 1 484 100 | 2 057 820 |
| random 6 | I | 21.71168 | 21.64203 | 21.60291 | 21.53584 | 21.72260 | 21.51181 |
| | Nss | 600 440 | 979 760 | 1 343 760 | 905 400 | 1 491 000 | 2 043 840 |
| random 7 | I | 21.53244 | 21.60740 | 21.54403 | 21.71180 | 21.54368 | 21.71982 |
| | Nss | 599 680 | 977 160 | 1 343 600 | 904 440 | 1 486 200 | 2 045 820 |
| random 8 | I | 21.41399 | 21.55819 | 21.66457 | 21.67841 | 21.72264 | 21.53891 |
| | Nss | 601 240 | 975 040 | 1 341 760 | 904 800 | 1 482 000 | 2 053 980 |
| random 9 | I | 21.66626 | 21.48224 | 21.70024 | 21.54022 | 21.60903 | 21.64857 |
| | Nss | 600 480 | 981 040 | 1 338 000 | 906 600 | 1 483 320 | 2 054 520 |
| random 10 | I | 21.54295 | 21.40547 | 21.69612 | 21.51178 | 21.47746 | 21.74217 |
| | Nss | 600 520 | 978 000 | 1 342 640 | 907 800 | 1 488 000 | 2 053 560 |
| Sobol's quasi-random | I | 21.66771 | 21.75656ᵇ | 21.75224ᶜ | 21.61990 | 21.61145 | 21.72532 |
| | Nss | 607 200 | 996 000 | 1 381 640 | 910 800 | 1 500 000 | 2 088 000 |

ᵃ I: performance index. Nss: number of state shootings performed. ᵇ Performance index within 0.01% of the optimum. ᶜ Performance index within 0.03% of the optimum.

Figure 1. Convergence of the performance index by the multipass IDP algorithm with Sobol's quasi-random distribution of allowable controls for Example 1.

We are now in a position to examine in greater detail how the number of allowable controls Nc and the contraction factor R affect the convergence of multipass IDP computation. Let us first investigate the influence of the number of allowable controls Nc on the convergence of the performance index to within 0.03% of the optimal performance index I = 339.10. To allow comparison with the results obtained using randomly chosen allowable control values (Bojkov and Luus, 1993), we solved this problem with tf = 4.0, Nt = 20, Ni = 20, R = 0.85, h = 0.01, and the same initial control policy and initial region size as before. In some cases, Nc was so small that a one-pass IDP computation did not yield a performance index within 0.03% of the optimum. In those cases, multipass IDP computations were performed. In the multipass IDP computations, the region contraction factor between passes was set to η = R, and 15 IDP passes were carried out. Table 4 records the cumulative number of iterations, Iter, and the cumulative number of state shootings, $N^{*}_{ss}$. The table shows the following:

(i) The global convergence of the IDP solutions appears to be independent of the number of state grid points Nx used.

(ii) The total number of state shootings increases as the number of state grid points increases.

(iii) The total number of state shootings required to give a converged solution does not necessarily increase with the number of allowable controls.

(iv) Convergence does not improve significantly when the number of allowable controls Nc exceeds 10.

(v) When Nc is less than 8, multipass IDP is required to ensure convergence of the solution.

Interestingly, multipass IDP with a single x-grid requires fewer state shootings than multipass IDP with multiple x-grids. In the present case, single-x-grid multipass IDP with Nc = 8 gives a converged solution while requiring the smallest total number of state shootings, $N^{*}_{ss}$ = 31 920.

Note that Bojkov and Luus (1993) applied one-pass IDP with Nc = 16 randomly chosen values for control and Nx = 3 state grid points and obtained convergence to within 0.03% of the optimum in 16 iterations. The cumulative number of state shootings was N*ss = 148 480. Hence, the percentage reduction in the total number of state shootings achieved by the proposed systematic approach to assigning allowable controls can be as high as 78.50%.

Table 2. Convergence of the Performance Indices Obtained by Multipass IDP Computations with Allowable Controls Generated by Sobol’s Quasi-Random Sequence for Example 1^a

            Nx = 1                              Nx = 3
pass        Nc = 10     Nc = 15     Nc = 20     Nc = 10     Nc = 15     Nc = 20
1     I     21.2881     21.5658     21.6406     21.2586     21.5445     21.4077
      Nss   13 200      19 800      26 400      35 100      52 800      70 400
2     I     21.5348     21.7219     21.7469     21.4745     21.6352     21.6180
      Nss   13 200      19 800      26 400      35 200      52 800      70 400
3     I     21.6258     21.7566^c   21.7503     21.5613     21.6846     21.6773
      Nss   13 200      19 800      26 400      35 200      52 800      70 400
4     I     21.6572     21.7572^c   21.7522^d   21.5888     21.7494     21.7550^d
      Nss   13 200      19 800      26 400      35 200      52 800      70 400
5     I     21.7043     21.7574^c   21.7537^d   21.6243     21.7550^d   21.7562^c
      Nss   13 200      19 800      26 400      35 200      52 800      70 400
6     I     21.7475     21.7574^c   21.7547^d   21.6542     21.7565^c   21.7567^c
      Nss   13 200      19 800      26 400      35 200      52 800      70 400
7     I     21.7538^d   21.7574^c   21.7563^c   21.6904     21.7573     21.7571^c
      Nss   13 200      19 800      26 400      35 200      52 800      70 400
8     I     21.7547^d   21.7575^b   21.7569^c   21.7320     21.7575^b   21.7573^c
      Nss   13 200      19 800      26 400      35 200      52 800      70 400
9     I     21.7558^c               21.7570^c   21.7491                 21.7574^c
      Nss   13 200                  26 400      35 200                  70 400
10    I     21.7564^c               21.7572^c   21.7532^d               21.7575^b
      Nss   13 200                  26 400      35 200                  70 400
11    I     21.7568^c               21.7574^c   21.7554^c
      Nss   13 200                  26 400      35 200
12    I     21.7570^c               21.7575^b   21.7563^c
      Nss   13 200                  26 400      35 200
13    I     21.7572^c                           21.7569^c
      Nss   13 200                              35 200
14    I     21.7573^c                           21.7573^c
      Nss   13 200                              35 200
15    I     21.7574^c                           21.7575^b
      Nss   13 200                              35 200
16    I     21.7575^b
      Nss   13 200
E           89.25%      91.93%      83.87%      73.12%      78.49%      64.15%

^a I: performance index; Nss: number of state shootings performed; E: percentage reduction in the number of state shootings, $E = 100\% \times (N_{ss}^{t} - N_{ss}^{*})/N_{ss}^{t}$, where $N_{ss}^{t} = 1\,963\,764$ and $N_{ss}^{*}$ is the cumulative number of state shootings. ^b Optimal performance index. ^c Performance index within 0.01% of the optimum. ^d Performance index within 0.03% of the optimum.

Table 3. Number of Passes Performed by the Multipass IDP Computations with Randomly-Chosen Allowable Controls To Obtain the Optimal Performance Index 21.7575 for Example 1^a

       Nx = 1                              Nx = 3
run    Nc = 10     Nc = 15     Nc = 20     Nc = 10     Nc = 15     Nc = 20
1      6           (21.5118)   10          12          6           (21.4595)
2      15          9           7           13          8           6
3      (21.7574)   (21.4595)   (21.5389)   10          10          4
4      13          11          9           10          9           6
5      9           (21.5383)   12          12          8           9
6      8           (21.4903)   9           7           (21.5385)   8
7      (21.7569)   13          14          12          10          7
8      15          10          6           14          8           8
9      10          12          10          6           8           6
10     (21.7572)   13          10          13          7           8

^a The parenthesized figures are the converged performance indices, which are not the true optimum, obtained after 20 passes of IDP computations.


The convergence of the IDP solution also depends on the contraction factor R used. In one-pass IDP computation, the smaller the contraction factor R, the smaller the number of state shootings required, but the higher the possibility of obtaining a local optimum. To investigate the effect of the contraction factor R on the convergence of multipass IDP computation, we solved the problem using single x-grid multipass IDP with Nc = 6, 8, 10, and 15. In these computations, the contraction factor η between passes has the same value as R. Figure 2 compares the cumulative number of state shootings required by multipass IDP with the systematic control assignment and by one-pass IDP with the random control assignment for various values of the contraction factor R. In this figure, the number of state shootings required for each one-pass IDP computation is calculated by multiplying Nc Nt [1 + Nx(Nt - 1)/2] by the number of iterations performed, as quoted from Figure 5 in the paper by Bojkov and Luus (1993). Note that for R < 0.8, one-pass IDP with random control assignment did not converge for Nc = 11, 16, and 21, whereas multipass IDP with systematic control assignment converged even when a small contraction factor R = 0.65 and a small number of allowable controls Nc = 6 were used. Hence, as Figure 2 shows, multipass IDP with systematic control assignment is more robust in obtaining the global optimum while requiring less computational effort than one-pass IDP with random control assignment.
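The state-shooting counts used in these comparisons follow from the per-iteration formula Nc Nt [1 + Nx(Nt - 1)/2] and from the reduction ratio E defined in the table footnotes. The short Python sketch below evaluates both; the function names are illustrative and not taken from the paper. It reproduces, for example, the Nx = 1 and Nx = 3 entries for Nc = 8 in Table 4, the per-pass count of 11 000 for Nx = 1 and Nc = 10 in Table 6 (with Ni = 20), the 1 515 000 single-pass baseline used for Example 2, and the 78.50% reduction relative to Bojkov and Luus (1993).

```python
def shootings_per_iteration(n_c, n_t, n_x):
    """State shootings in one IDP iteration: Nc * Nt * (1 + Nx * (Nt - 1) / 2)."""
    # Written as Nc * Nt * (2 + Nx * (Nt - 1)) // 2 to stay in exact integers;
    # the product is always even (either Nt is even or Nt - 1 is even).
    return n_c * n_t * (2 + n_x * (n_t - 1)) // 2


def cumulative_shootings(n_c, n_t, n_x, iterations):
    """Total state shootings after a given number of iterations."""
    return iterations * shootings_per_iteration(n_c, n_t, n_x)


def reduction_percent(baseline, used):
    """E = 100% * (baseline - used) / baseline."""
    return 100.0 * (baseline - used) / baseline


# Example 1 (Nt = 20), Nc = 8, 19 cumulative iterations (Table 4).
print(cumulative_shootings(8, 20, 1, 19))                     # 31920
print(cumulative_shootings(8, 20, 3, 19))                     # 89680
# Example 2 single-pass baseline: Nt = 10, Nx = 11, Nc = 100, Ni = 30.
print(cumulative_shootings(100, 10, 11, 30))                  # 1515000
# One pass of Table 6 with Nx = 1, Nc = 10, Ni = 20.
print(cumulative_shootings(10, 10, 1, 20))                    # 11000
# Reduction relative to the 148 480 shootings of Bojkov and Luus (1993).
print(round(reduction_percent(148_480, 31_920), 2))           # 78.5
# Table 6, Nx = 1, Nc = 10: 18 passes of 11 000 shootings each.
print(round(reduction_percent(1_515_000, 18 * 11_000), 2))   # 86.93
```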

Table 4. Effect of the Parameters Nc and Nx Used in Multipass IDP Computations (with tf = 4.0, Nt = 20, and Sobol’s Quasi-Random Allowable Controls) on the Convergence of the Performance Index to within 0.03% of the True Optimum for Example 1^a

       Nx = 1             Nx = 3             Nx = 5             Nx = 7
Nc     Iter    N*ss       Iter    N*ss       Iter    N*ss       Iter    N*ss
3      136     85 680     155     205 782    119     167 553    136     213 537
4      40      33 600     55      113 440    55      123 712    55      44 852
5      59      61 950     54      143 555    72      228 955    72      265 610
6      78      98 280     78      270 420    77      336 468    77      398 124
7      58      85 260     58      228 900    58      315 595    39      258 363
8      19      31 920     19      89 680     19      129 352    19      156 064
9      20      37 800     37      196 470    37      310 698    37      393 129
10     19      39 900     20      118 000    20      177 080    20      226 840
15     19      59 850     19      168 150    19      276 450    20      393 315
20     19      79 800     19      224 200    18      349 200    19      497 800
25     18      94 500     18      265 500    17      401 225    17      562 950
30     19      119 700    18      318 600    18      523 800    18      728 430
35     20      147 000    17      351 050    19      645 050    18      850 500
40     19      159 600    17      401 200    18      698 400    17      918 000

^a Iter: cumulative number of iterations; N*ss: cumulative number of state shootings.

For Example 1, single x-grid multipass IDP with Nc = 8 (N*ss = 31 920) reduces the total number of state shootings, relative to the one-pass IDP computation of Bojkov and Luus (1993) (N*ss = 148 480), by

$$E = \frac{148\,480 - 31\,920}{148\,480} \times 100\% = 78.50\%$$

Figure 2. Effect of the contraction factor R on the number of state shootings required to yield convergence within 0.03% of the optimum for Example 1 with Nt = 20 and tf = 4.

Example 2. Consider the nth-order linear time-invariant dynamic system with n inputs previously presented by Nagurka et al. (1991):

$$\dot{x}(t) = A\,x(t) + u(t), \qquad x(0) = [1, 2, \ldots, n]^T$$

where x = [x1, ..., xn]^T is the state vector, u = [u1, ..., un]^T is the control vector, and the system matrix A is given by

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & -2 & 3 & \cdots & (-1)^{n+1} n \end{bmatrix}$$

The problem here is to find the optimal policy u(t) in the time interval 0 ≤ t ≤ 1 that minimizes the performance index

$$I = 10\,x^T(1)\,x(1) + \int_0^1 \left( x^T(t)\,x(t) + u^T(t)\,u(t) \right) dt$$

Recently, Bojkov and Luus (1992a,b) solved this problem for n = 2, 6, 8, and 20 with randomly chosen values for control. In particular, Bojkov and Luus (1992a) found the optimal control policy for the eighth-order system by using a three-pass IDP method (Luus and Galli, 1991). In their multipass IDP computations, the number of time stages was doubled after each pass, and the control policy at the end of a pass was used as the initial policy for the following pass. The IDP parameters used were as follows: region contraction factor R = 0.85, number of state grid points Nx = 11, number of random allowable values for control Nc = 100, and integration step size h = 0.05 for the fifth-order Runge-Kutta method. The initial control policy was chosen as u0(k) = [-2, -3, -4, -6, -7, -8, -8, -1]^T, k = 1, 2, ..., Nt, and the initial region sizes were r = [1, 1, 1, 1, 1, 1, 1, 1]^T. Moreover, they obtained the minimum performance index I = 373.17 for the case where the time interval [0, 1] was divided into Nt = 20 equal-length time stages. Note that the number of state shootings for single-pass IDP with Nt = 10, Nx = 11, Nc = 100, and Ni = 30 is $N_{ss}^{t} = 1\,515\,000$, which will be used as the basis for evaluating the computational savings of the multipass IDP computation.
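To make the Example 2 problem concrete, the sketch below builds the matrix A for a given n and evaluates the performance index I for a piecewise-constant policy on Nt equal stages, integrating the running cost as an extra state with the classical fourth-order Runge-Kutta method. It is an illustration only: the text mentions a fifth-order Runge-Kutta method with step sizes such as h = 0.05 or 0.01, and the function names and the 20 steps per stage used here are assumptions. The n = 1 case, which is not one of the paper's cases, serves only as a check against the closed form I = 10e^2 + (e^2 - 1)/2 obtained with u = 0.

```python
import math


def system_matrix(n):
    """Example 2 matrix A: ones on the superdiagonal, last row 1, -2, 3, ..., (-1)^(n+1) n."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        a[i][i + 1] = 1.0
    for j in range(n):
        a[n - 1][j] = float((-1) ** j * (j + 1))
    return a


def performance_index(n, policy, steps_per_stage=20):
    """I = 10 x(1)^T x(1) + int_0^1 (x^T x + u^T u) dt for a piecewise-constant policy.

    `policy` holds one n-vector per equal-length stage on [0, 1]; the running
    cost is carried as an extra state and everything is integrated by RK4.
    """
    a = system_matrix(n)
    h = 1.0 / (len(policy) * steps_per_stage)

    def rhs(z, u):
        x = z[:n]
        dx = [sum(a[i][j] * x[j] for j in range(n)) + u[i] for i in range(n)]
        return dx + [sum(v * v for v in x) + sum(v * v for v in u)]

    z = [float(i + 1) for i in range(n)] + [0.0]  # x(0) = [1, 2, ..., n], cost 0
    for u in policy:
        for _ in range(steps_per_stage):
            d1 = rhs(z, u)
            d2 = rhs([zi + 0.5 * h * di for zi, di in zip(z, d1)], u)
            d3 = rhs([zi + 0.5 * h * di for zi, di in zip(z, d2)], u)
            d4 = rhs([zi + h * di for zi, di in zip(z, d3)], u)
            z = [zi + h * (p + 2 * q + 2 * r + s) / 6
                 for zi, p, q, r, s in zip(z, d1, d2, d3, d4)]
    return 10.0 * sum(v * v for v in z[:n]) + z[n]


# Sanity check: with n = 1 and u = 0, x(t) = e^t.
print(performance_index(1, [[0.0]] * 10), 10.5 * math.e ** 2 - 0.5)
# Cost of the constant policy u(k) = -5 for n = 8 on Nt = 10 stages.
print(performance_index(8, [[-5.0] * 8] * 10))
```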


Table 5. Comparison of the Convergence of Performance Indices by Single-Pass IDP Computations with Allowable Controls Generated by Sobol’s Quasi-Random Sequence and Random Distribution for Example 2^a

                           Nx = 7                           Nx = 9
allowable controls         Nc = 50    Nc = 75    Nc = 100   Nc = 50    Nc = 75    Nc = 100
random 1              I    373.993    373.778    373.716    373.869    373.803    373.727
                      Nss  319 700    485 475    650 000    410 950    617 775    827 300
random 2              I    373.890    373.798    373.692^c  374.032    373.824    373.718
                      Nss  318 700    484 800    650 000    407 950    620 475    829 100
random 3              I    373.853    373.726    373.732    374.196    373.774    373.715
                      Nss  322 750    486 150    649 100    408 750    621 825    829 100
random 4              I    373.843    373.830    373.722    373.788    373.805    373.712
                      Nss  321 850    486 150    650 000    407 800    620 475    829 100
random 5              I    373.875    373.810    373.738    373.966    373.789    373.758
                      Nss  323 200    486 150    649 100    407 800    617 175    828 200
random 6              I    373.865    373.714    373.767    373.996    373.751    373.728
                      Nss  324 100    485 475    649 100    409 600    619 125    829 100
random 7              I    373.848    373.747    373.719    373.828    373.743    373.747
                      Nss  319 050    487 500    650 000    407 400    619 800    825 500
random 8              I    373.907    373.764    373.785    373.843    373.775    373.711
                      Nss  321 400    485 475    646 500    409 150    621 150    828 200
random 9              I    373.786    373.762    373.765    373.838    373.761    373.682^c
                      Nss  322 300    485 475    648 200    405 650    621 825    827 300
random 10             I    373.857    373.733    373.710    373.876    373.765    373.710
                      Nss  322 350    481 200    647 300    409 200    619 800    829 100
Sobol’s quasi-random  I    373.701^c  373.648^c  373.683^c  373.665^c  373.625^b  373.684^c
                      Nss  317 500    426 250    635 000    385 000    608 900    805 000

^a I: performance index; Nss: number of state shootings performed. ^b Performance index within 0.01% of the optimum. ^c Performance index within 0.03% of the optimum.

Table 6. Converged Performance Indices and Numbers of State Shootings Performed by the Multipass IDP Computation with Sobol’s Quasi-Random Allowable Controls for Example 2^a

            Nx = 1                              Nx = 3
pass        Nc = 10     Nc = 15     Nc = 20     Nc = 10     Nc = 15     Nc = 20
1     I     403.079     381.857     378.540     407.026     380.077     379.508
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
2     I     379.798     375.523     374.365     379.869     374.870     374.382
      Nss   11 000      16 500      22 000      26 660      43 500      58 000
3     I     375.168     374.336     373.835     375.092     373.940     373.860
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
4     I     374.337     373.815     373.733     374.161     373.762     373.696^d
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
5     I     373.952     373.723     373.678^d   373.849     373.690^d   373.670^d
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
6     I     373.763     373.664^d   373.653^d   373.742     373.651^d   373.648^d
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
7     I     373.695^d   373.641^d   373.636^d   373.682^d   373.628^c   373.635^d
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
8     I     373.648^d   373.623^c   373.615^c   373.652^d   373.620^c   373.619^c
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
9     I     373.639^d   373.614^c   373.608^c   373.639^d   373.610^c   373.609^c
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
10    I     373.633^d   373.606^c   373.605^c   373.626^c   373.604^c   373.603^c
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
11    I     373.613^c   373.602^c   373.604^c   373.621^c   373.601^c   373.600^c
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
12    I     373.608^c   373.599^c   373.600^c   373.613^c   373.598^c   373.599^c
      Nss   11 000      16 500      22 000      26 660      43 500      58 000
13    I     373.602^c   373.596^c   373.598^c   373.608^c   373.597^c   373.597^c
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
14    I     373.600^c   373.595^c   373.596^c   373.604^c   373.596^c   373.596^c
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
15    I     373.598^c   373.595^b   373.595^b   373.601^c   373.595^b   373.595^b
      Nss   11 000      16 500      22 000      26 300      43 500      58 000
16    I     373.597^c                           373.599^c
      Nss   11 000                              26 300
17    I     373.596^c                           373.597^c
      Nss   11 000                              26 300
18    I     373.595^b                           373.596^c
      Nss   11 000                              26 660
19    I                                         373.595^b
      Nss                                       26 300
E           86.93%      83.66%      78.22%      66.95%      56.93%      42.57%

^a I: performance index; Nss: number of state shootings performed; E: percentage reduction in the number of state shootings, $E = 100\% \times (N_{ss}^{t} - N_{ss}^{*})/N_{ss}^{t}$, where $N_{ss}^{t} = 1\,515\,000$ and $N_{ss}^{*}$ is the cumulative number of state shootings. ^b Optimal performance index. ^c Performance index within 0.01% of the optimum. ^d Performance index within 0.03% of the optimum.


Here, we solved the problem with n = 8 to demonstrate the efficacy and robustness of multipass IDP computation with allowable controls generated by Sobol’s quasi-random sequence in obtaining the globally optimal solution. First, we examine the convergence of single-pass IDP computation with randomly chosen allowable control values. For this purpose, the following IDP parameters were used: Nt = 10, Nc = 50, 75, and 100, Nx = 7 and 9, and h = 0.01. The initial control policy is chosen as uc(k) = [-5, -5, -5, -5, -5, -5, -5, -5]^T, k = 1, 2, ..., Nt, and the initial half region size is r0 = [5, 5, 5, 5, 5, 5, 5, 5]^T. The converged performance indices I for Ni = 30 and the numbers of state shootings Nss for 60 runs are recorded in Table 5. With the same parameters but using Sobol’s quasi-random sequence to generate the allowable control values, we obtained the converged performance indices shown in the bottom row of Table 5. It is observed from Table 5 that none of these runs gave the true optimal solution. However, all single-pass IDP computations with systematically chosen allowable controls obtained a performance index within 0.03% of the optimum value (373.595), while only 2 out of 60 runs using randomly chosen controls reached the same performance level. Moreover, single-pass IDP computation using randomly chosen allowable controls fails to obtain the true optimal solution even when the number of allowable controls is chosen to be sufficiently large.
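The difference between the two ways of generating allowable controls can be illustrated with a minimal sketch. It assumes that the allowable controls for a stage are obtained by mapping points s in [0, 1)^m onto the box uc ± r as u = uc + r(2s - 1), and it uses the uc = -5 and r0 = 5 values from the text, but for only two controls. The Sobol generator is a small hand-written version (Gray-code ordering) that only carries direction numbers for up to three dimensions; Example 2 has eight controls, for which a full generator such as Bratley and Fox’s Algorithm 659 would be needed. Other Sobol implementations may use different initial direction numbers, which changes individual points but not the construction. All function names are hypothetical.

```python
import random

BITS = 30  # fractional bits kept for each coordinate

# (degree s, coefficient bits a, initial odd integers m_1..m_s) for the first
# three Sobol dimensions; dimension 1 is the base-2 van der Corput sequence.
SOBOL_INIT = [(0, 0, []), (1, 0, [1]), (2, 1, [1, 3])]


def direction_numbers(s, a, m_init):
    """Integer direction numbers V_k = m_k * 2**(BITS - k) for k = 1..BITS."""
    if s == 0:
        m = [1] * BITS
    else:
        m = list(m_init)
        for k in range(s, BITS):
            # m_k = 2 a_1 m_{k-1} ^ ... ^ 2^(s-1) a_(s-1) m_{k-s+1} ^ 2^s m_{k-s} ^ m_{k-s}
            new = m[k - s] ^ (m[k - s] << s)
            for i in range(1, s):
                if (a >> (s - 1 - i)) & 1:
                    new ^= m[k - i] << i
            m.append(new)
    return [m[k] << (BITS - 1 - k) for k in range(BITS)]


def sobol_points(n, dim):
    """First n points of the unscrambled Sobol sequence in [0, 1)^dim, Gray-code order."""
    if not 1 <= dim <= len(SOBOL_INIT):
        raise ValueError("this sketch only carries direction numbers for 1 to 3 dimensions")
    v = [direction_numbers(*SOBOL_INIT[j]) for j in range(dim)]
    x = [0] * dim
    points = []
    for i in range(n):
        points.append([xj / 2**BITS for xj in x])
        c = (~i & (i + 1)).bit_length() - 1  # index of the rightmost zero bit of i
        x = [xj ^ v[j][c] for j, xj in enumerate(x)]
    return points


def allowable_controls(center, half_width, n_c, method="sobol", seed=None):
    """Nc allowable controls in the box center +/- half_width (u = uc + r(2s - 1))."""
    if method == "sobol":
        unit = sobol_points(n_c, len(center))
    else:
        rng = random.Random(seed)
        unit = [[rng.random() for _ in center] for _ in range(n_c)]
    return [[c + r * (2.0 * s - 1.0) for c, r, s in zip(center, half_width, p)] for p in unit]


def orthant_counts(controls, center):
    """Number of allowable controls lying in each orthant around the central control."""
    counts = {}
    for u in controls:
        key = tuple(int(ui >= ci) for ui, ci in zip(u, center))
        counts[key] = counts.get(key, 0) + 1
    return counts


center, half = [-5.0, -5.0], [5.0, 5.0]
print(orthant_counts(allowable_controls(center, half, 8), center))
# {(0, 0): 2, (1, 1): 2, (1, 0): 2, (0, 1): 2}
print(orthant_counts(allowable_controls(center, half, 8, "random", seed=1), center))
```

With eight Sobol points in two dimensions, each quadrant of the control box receives exactly two allowable controls and each of the eight equal slices along either axis receives exactly one, whereas a pseudo-random draw of the same size carries no such guarantee. This even coverage is one plausible reading of the better single-pass results in Table 5, although the sketch does not by itself establish it.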

Now let us solve this problem using multipass IDP computation with the allowable controls generated by Sobol’s quasi-random sequence. We used the multipass IDP scheme with R = 0.85, η = 0.85, Nt = 10, Ni = 20, and h = 0.01 for various combinations of the number of allowable controls Nc = 10, 15, 20 and the number of state grid points Nx = 1, 3. The convergence properties and the numbers of state shootings of these multipass IDP computations are listed in Table 6. Note that all multipass IDP computations gave the true optimal solution, even when a single state grid point and a small number of allowable controls were used. Compared with the results in Table 5, this shows that multipass IDP computation with allowable controls generated from Sobol’s quasi-random sequence is a robust and efficient approach to obtaining the global solution of the optimal control problem by IDP.
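The single x-grid multipass scheme can be sketched for a small case. This is a hypothetical, simplified illustration, not the authors’ program: it uses the n = 2 instance of Example 2 (so a two-dimensional Sobol generator suffices), Nt = 10 and Nc = 10 as in the smallest case of Table 6, but RK4 with h = 0.05 instead of h = 0.01, Ni = 6 iterations per pass, three passes, and an initial region of half-width 2 around u = 0; those values are assumptions. Within a pass the region shrinks by R (`gamma`) after each iteration; each new pass starts from the previous pass’s policy with a region η (`eta`) times the previous pass’s starting region, which is our reading of the multipass scheme. The current central control is also kept as a candidate, an added assumption that makes the cost non-increasing.

```python
BITS = 30  # fractional bits per Sobol coordinate


def sobol_2d(n):
    """First n points of the unscrambled 2-D Sobol sequence (Gray-code order)."""
    m1 = [1] * BITS                          # dimension 1: base-2 van der Corput
    m2 = [1]
    for _ in range(BITS - 1):                # dimension 2: primitive polynomial x + 1
        m2.append(m2[-1] ^ (m2[-1] << 1))
    v = [[m[k] << (BITS - 1 - k) for k in range(BITS)] for m in (m1, m2)]
    x, pts = [0, 0], []
    for i in range(n):
        pts.append((x[0] / 2**BITS, x[1] / 2**BITS))
        c = (~i & (i + 1)).bit_length() - 1  # index of the rightmost zero bit of i
        x = [x[0] ^ v[0][c], x[1] ^ v[1][c]]
    return pts


def rhs(z, u):
    """Example 2 with n = 2 (A = [[0, 1], [1, -2]]) plus running cost; z = [x1, x2, cost]."""
    x1, x2 = z[0], z[1]
    return [x2 + u[0], x1 - 2.0 * x2 + u[1], x1 * x1 + x2 * x2 + u[0] ** 2 + u[1] ** 2]


def stage(z, u, dt, steps=2):
    """Integrate one time stage with constant control u by classical RK4."""
    h = dt / steps
    for _ in range(steps):
        k1 = rhs(z, u)
        k2 = rhs([a + 0.5 * h * b for a, b in zip(z, k1)], u)
        k3 = rhs([a + 0.5 * h * b for a, b in zip(z, k2)], u)
        k4 = rhs([a + h * b for a, b in zip(z, k3)], u)
        z = [a + h * (p + 2 * q + 2 * r + s) / 6 for a, p, q, r, s in zip(z, k1, k2, k3, k4)]
    return z


def cost(z, k, policy, dt):
    """Apply policy[k:] from state z and return I = 10 x(1)^T x(1) + running cost."""
    for u in policy[k:]:
        z = stage(z, u, dt)
    return 10.0 * (z[0] ** 2 + z[1] ** 2) + z[2]


def idp_pass(policy, r, n_c, n_iter, gamma, dt):
    """One IDP pass with a single state grid point per stage (Nx = 1)."""
    unit = sobol_2d(n_c)
    for _ in range(n_iter):
        # The only x-grid point at each stage comes from the current central policy.
        starts = [[1.0, 2.0, 0.0]]           # x(0) = [1, 2], accumulated cost 0
        for u in policy[:-1]:
            starts.append(stage(starts[-1], u, dt))
        new = [list(u) for u in policy]
        for k in reversed(range(len(policy))):
            # Candidates: the central control plus Nc Sobol points in center +/- r.
            cands = [policy[k]] + [[c + w * (2 * s - 1) for c, w, s in zip(policy[k], r, p)]
                                   for p in unit]
            new[k] = min(cands, key=lambda u: cost(stage(starts[k], u, dt), k + 1, new, dt))
        policy = new
        r = [gamma * w for w in r]           # contract the region after each iteration
    return policy


def multipass_idp(nt=10, n_c=10, n_iter=6, passes=3, gamma=0.85, eta=0.85, r0=2.0):
    dt = 1.0 / nt
    policy = [[0.0, 0.0] for _ in range(nt)]
    r = [r0, r0]
    history = [cost([1.0, 2.0, 0.0], 0, policy, dt)]
    for _ in range(passes):
        policy = idp_pass(policy, r, n_c, n_iter, gamma, dt)
        history.append(cost([1.0, 2.0, 0.0], 0, policy, dt))
        r = [eta * w for w in r]             # next pass starts from a region eta times smaller
    return policy, history


policy, history = multipass_idp()
print([round(v, 4) for v in history])
```

Because the central control is always among the candidates and each stage’s grid point is generated from the central policy, each printed value is no larger than the one before it; how quickly it falls depends on the assumed parameters, and the sketch makes no claim to reproduce the n = 8 results in Table 6.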

Conclusions

Iterative dynamic programming using uniformly distributed allowable controls is a reliable method for finding optimal piecewise-constant control policies for nonlinear optimal control problems. However, the required computational burden becomes excessively high when the number of control variables is large. To avoid the combinatorial explosion in the number of allowable controls caused by rectangular gridding of the feasible control space, it has been suggested in the literature to use a prescribed number of allowable controls generated from random numbers uniformly distributed in the interval (0, 1). In this paper, we have shown by numerical examples that single-pass IDP computation with randomly distributed allowable controls is not very robust in obtaining the true optimal solution. To enhance the global convergence of the IDP method without increasing the computational load, we propose using Sobol’s quasi-random sequence to generate allowable controls, together with multipass IDP computation with very few state grid points and a small number of allowable controls. Simulation results show that the multipass IDP method with allowable controls generated by Sobol’s quasi-random sequence is indeed a robust and efficient means of obtaining the true optimal solution to optimal control problems having a large number of control variables.

Acknowledgment

This work was supported by the National Science Council of the Republic of China under Grant NSC-87-2214-E-194-001.

Nomenclature

E = reduction ratio in the number of state shootings
f(·) = vector-valued function
I = performance index defined in (3)
IDP = iterative dynamic programming
Iter = cumulative number of iterations
NLP = nonlinear programming
Nc = number of control grid points
Ni = number of iterations
Np = number of passes
Nss = number of state shootings
N*ss = cumulative number of state shootings
Nt = number of equal-length time stages
Nx = number of state grid points
r^i = half-width of the control region vector for the ith iteration
tf = final time
u(k) = control used for stage k
u(k, j) = jth allowable control for stage k
u(t) = m-component control vector
u^i(k) = central control vector used for stage k at the ith iteration
u̲ = lower bound of the control vector u(t)
ū = upper bound of the control vector u(t)
U^i_{l,k} = region for the lth component of control for stage k at the ith iteration
U^i_k = m-dimensional hyperrectangle defined in (12)
x(k, j) = jth state grid point for stage k
x(t) = n-component state vector
x̂(t) = augmented state vector in (7a)
x0 = initial state vector

Greek Letters

R = region contraction factor between iterations
η = region contraction factor between passes
Φ(·) = scalar function defined in (3)
φ(·) = scalar function defined in (3)

Suffixes (Subscripts/Superscripts)

a = optimal performance index
b = performance index within 0.01% of the optimum
c = performance index within 0.03% of the optimum
i = component i, ith element
T = matrix transpose
* = optimal value

Literature Cited

Bojkov, B.; Luus, R. Extension of Iterative Dynamic Programming to High-Dimensional Systems by using Randomly Chosen Values for Control. Proc. Am. Control Conf. 1992a, 194.

Bojkov, B.; Luus, R. Use of Random Admissible Values for Control in Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1992b, 31, 1308.


Bojkov, B.; Luus, R. Evaluation of the Parameters Used in Iterative Dynamic Programming. Can. J. Chem. Eng. 1993, 71, 451.

Bojkov, B.; Luus, R. Application of Iterative Dynamic Programming to Time Optimal Control. Trans. Inst. Chem. Eng. 1994a, 72A, 72.

Bojkov, B.; Luus, R. Time Optimal Control by Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1994b, 33, 1486.

Bojkov, B.; Luus, R. Time Optimal Control of High Dimensional Systems by Iterative Dynamic Programming. Can. J. Chem. Eng. 1995, 73, 380.

Bojkov, B.; Luus, R. Optimal Control of Nonlinear Systems with Unspecified Final Times. Chem. Eng. Sci. 1996, 51 (6), 905.

Bratley, P.; Fox, B. L. Algorithm 659: Implementing Sobol’s Quasirandom Sequence Generator. ACM Trans. Math. Soft. 1988, 14 (1), 88.

Dadebo, S.; Luus, R. Optimal Control of Time-Delay Systems by Dynamic Programming. Opt. Control App. Methods 1992, 13, 29.

Dadebo, S. A.; McAuley, K. B. Iterative Dynamic Programming for Minimum Energy Control Problems with Time Delay. Opt. Control App. Methods 1995a, 16, 217.

Dadebo, S. A.; McAuley, K. B. A Simultaneous Iterative Solution Technique for Time-Optimal Control using Dynamic Programming. Ind. Eng. Chem. Res. 1995b, 34, 2077.

de Tremblay, M.; Luus, R. Optimization of Non-Steady-State Operation of Reactors. Can. J. Chem. Eng. 1989, 67, 494.

Fox, B. L. Algorithm 647: Implementation and Relative Efficiency of Quasirandom Sequence Generators. ACM Trans. Math. Soft. 1986, 12 (4), 362.

Hartig, F.; Keil, F. J. A Modified Algorithm of Iterative Dynamic Programming. Hung. J. Ind. Chem. 1993a, 21, 101.

Hartig, F.; Keil, F. J. Large-scale Spherical Fixed Bed Reactors: Modeling and Optimization. Ind. Eng. Chem. Res. 1993b, 32, 424.

Hartig, F.; Keil, F. J. Global Optimization of Quench Reactors by Iterative Dynamic Programming. Hung. J. Ind. Chem. 1994, 22, 233.

Hartig, F.; Keil, F. J.; Kafarov, V. V. Optimization of Complex Reactions by Mixed-Integer Iterative Dynamic Programming. Theoretical Foundations Chem. Eng. 1996, 30 (1), 50.

Homma, T.; Saltelli, A. Use of Sobol’s Quasirandom Sequence Generator for Integration of Modified Uncertainty Importance Measure. J. Nucl. Sci. Technol. 1995, 32 (11), 1164.

Hwang, C.; Lin, J. S. An Improved Computational Scheme for Solving Dynamic Optimization Problem with Iterative Dynamic Programming. Submitted to Chin. Inst. Chem. Eng. 1998.

Jensen, T. Dynamic Control of Large Dimension Nonlinear Chemical Processes. Ph.D. Dissertation, University of Princeton, 1964.

Keil, F. J.; Stoyanov, S.; Chunova, E. Optimization of Ferrite Production by Fuzzy Iterative Dynamic Programming. Hung. J. Ind. Chem. 1996, 24, 309.

Lapidus, L.; Luus, R. Optimal Control of Engineering Processes; Blaisdell: Waltham, MA, 1967.

Lin, J. S.; Hwang, C. Optimal Control of Time-Delay Systems by Forward Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1996a, 35 (8), 2795.

Lin, J. S.; Hwang, C. A Forward Iterative Dynamic Programming Technique for Optimal Control of Nonlinear Dynamic Systems. J. Chinese Inst. Chem. Engrs. 1996b, 27, 477.

Luus, R. Optimal Control by Dynamic Programming using Accessible Grid Points and Region Reduction. Hung. J. Ind. Chem. 1989, 17, 523.

Luus, R. Optimal Control by Dynamic Programming using Systematic Reduction in Grid Size. Int. J. Control 1990a, 51, 995.

Luus, R. Application of Dynamic Programming to High-Dimensional Nonlinear Optimal Control Problems. Int. J. Control 1990b, 52, 239.

Luus, R. Application of Dynamic Programming to Singular Optimal Control Problems. Proc. Am. Control Conf. 1990c, 2932.

Luus, R. Effect of the Choice of Final Time in Optimal Control of Nonlinear Systems. Can. J. Chem. Eng. 1991, 69, 144.

Luus, R. Piecewise Linear Continuous Optimal Control by Iterative Dynamic Programming. Ind. Eng. Chem. Res. 1993a, 32, 859.

Luus, R. Application of Dynamic Programming to Differential-Algebraic Process Systems. Comput. Chem. Eng. 1993b, 17 (4), 373.

Luus, R. Optimization of Fed-Batch Fermentors by Iterative Dynamic Programming. Biotechnol. Bioeng. 1993c, 41, 599.

Luus, R. Application of Iterative Dynamic Programming to Very High-Dimensional Systems. Hung. J. Ind. Chem. 1993d, 21, 243.

Luus, R.; Bojkov, B. Global Optimization of the Bifunctional Catalyst Problem. Can. J. Chem. Eng. 1994, 72, 160.

Luus, R.; Galli, M. Multiplicity of Solutions in using Dynamic Programming for Optimal Control. Hung. J. Ind. Chem. 1991, 19, 55.

Luus, R.; Rosen, O. Application of Dynamic Programming to Final State Constrained Optimal Control Problems. Ind. Eng. Chem. Res. 1991, 30, 1525.

Luus, R.; Smith, S. G. Application of Dynamic Programming to High-Dimensional Systems Described by Difference Equations. Chem. Eng. Technol. 1991, 14, 122.

Luus, R.; Dittrich, J.; Keil, F. J. Multiplicity of Solutions in the Optimization of a Bifunctional Catalyst Blend in a Tubular Reactor. Can. J. Chem. Eng. 1992, 70, 780.

Luus, R.; Zhang, X.; Hartig, F.; Keil, F. J. Use of Piecewise Linear Continuous Optimal Control for Time-Delay Systems. Ind. Eng. Chem. Res. 1995, 34, 4136.

Mekarapiruk, W.; Luus, R. Optimal Control of Inequality State Constrained Systems. Ind. Eng. Chem. Res. 1997, 36, 1686.

Nagurka, M.; Wang, S.; Yen, V. Solving Linear Quadratic Optimal Control Problems by Chebyshev-Based State Parametrization. Proc. Am. Control Conf. 1991, 104.

Rao, S. N.; Luus, R. Evaluation and Improvement of Control Vector Iteration Procedures for Optimal Control. Can. J. Chem. Eng. 1972, 50, 777.

Sobol, I. M. On the Systematic Search in a Hypercube. SIAM J. Numer. Anal. 1979, 16 (5), 790.

Received for review September 8, 1997
Revised manuscript received March 23, 1998

Accepted March 24, 1998

IE970629J
