
ISSN 0081-5438, Proceedings of the Steklov Institute of Mathematics, 2010, Vol. 268, pp. 58–69. © Pleiades Publishing, Ltd., 2010.

The Pontryagin Maximum Principle and a Unified Theory of Dynamic Optimization∗

Francis Clarke^a

Received April 2009

Abstract—The Pontryagin maximum principle is the central result of optimal control theory. In the half-century since its appearance, the underlying theorem has been generalized, strengthened, extended, proved and reinterpreted in a variety of ways. We review in this article one of the principal approaches to obtaining the maximum principle in a powerful and unified context, focusing upon recent results that represent the culmination of over thirty years of progress using the methodology of nonsmooth analysis. We illustrate the novel features of this theory, as well as its versatility, by introducing a far-reaching new theorem that bears upon the currently active subject of mixed constraints in optimal control.

DOI: 10.1134/S0081543810010062

1. INTRODUCTION

In this introductory section we state the classical Pontryagin maximum principle as applied to a standard fixed-time optimal control problem in the Mayer form, the problem which will serve throughout the article for comparison purposes. We also recall a well-known nonsmooth version of the result.

Consider the problem (PC) that consists of minimizing the cost functional ℓ(x(a), x(b)) subject to the boundary or endpoint conditions

(x(a), x(b)) ∈ E

and the dynamics

x′(t) = f(t, x(t), u(t)) for a.e. t ∈ [a, b],

where the (measurable) control function u(·) is constrained by

u(t) ∈ U(t) for a.e. t ∈ [a, b].

Here x(t) is an absolutely continuous function with values in R^n. The pair (x, u) above is referred to as a process. The data of the problem consist of the functions f and ℓ, the set E, and the multifunction U(·). Let us assume for simplicity that the functions in question are continuously differentiable and that E is a classical manifold or manifold with boundary. (Measurable behavior of U is also needed; we omit these details for now.)
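To fix ideas, a simple instance of these data (given here purely for illustration) is the double integrator steered from a fixed initial state toward the origin:

\[
n = 2,\qquad f(t,x,u) = (x_2,\, u),\qquad U(t) \equiv [-1,1],\qquad E = \{x_0\}\times \mathbb{R}^2,\qquad \ell(x(a),x(b)) = |x(b)|^2 .
\]

Here the control is a bounded scalar acceleration, x(a) = x_0 is prescribed, x(b) is free, and the cost penalizes the distance of the final state from the origin.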

The issue is to give a set of necessary conditions that an optimal (or locally optimal) process (x∗, u∗) must satisfy. We say that an admissible process (x∗, u∗) is a strong local minimum for the problem (PC) if, for some ε > 0, for any process (x, u) admissible for (PC) that satisfies

‖x − x∗‖_∞ := max_{t∈[a,b]} |x(t) − x∗(t)| < ε,

∗Plenary talk delivered at the Pontryagin Centennial Conference, June 2008.
^a CNRS, UMR 5208, Institut Camille Jordan, Université Claude Bernard Lyon 1, 43 blvd du 11 novembre 1918, F-69622 Villeurbanne, France.
E-mail address: [email protected]


we have ℓ(x∗(a), x∗(b)) ≤ ℓ(x(a), x(b)). Following Pontryagin et alii, we introduce the pseudo-Hamiltonian function

H(t, x, u, p) := 〈p, f(t, x, u)〉.

The classical Pontryagin maximum principle asserts:

Theorem 1 (Pontryagin et al., 1956). If (x∗, u∗) is a strong local minimum for the problem (PC) and if the optimal control u∗ is essentially bounded, then there exists an absolutely continuous function p(·) on [a, b] together with a scalar λ0 equal to 0 or 1 satisfying the nontriviality condition:

λ0 + |p(t)| ≠ 0, t ∈ [a, b],

the transversality condition:

(p(a),−p(b)) − λ0∇ℓ(x∗(a), x∗(b)) is normal to E at the point (x∗(a), x∗(b)),

the adjoint equation: for almost every t ∈ [a, b],

−p′(t) = D{H(t, ·, u∗(t), p(t))}(x∗(t))

and the maximum condition: for almost every t ∈ [a, b],

max_{u∈U(t)} H(t, x∗(t), u, p(t)) = H(t, x∗(t), u∗(t), p(t)).
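As a simple illustration of how these conditions operate (an example given here for orientation only), take n = 1, f(t, x, u) = u, U(t) ≡ [−1, 1], ℓ(x(a), x(b)) = −x(b), and E = {0} × R, so that the problem is to maximize x(b) starting from x(a) = 0. Since H = pu, the conditions of Theorem 1 read

\begin{align*}
-p'(t) &= 0 \quad\Longrightarrow\quad p \ \text{is constant},\\
(p(a),\,-p(b)) - \lambda_0(0,-1) &\ \text{is normal to } \{0\}\times\mathbb{R} \quad\Longrightarrow\quad p(b) = \lambda_0,\\
\max_{u\in[-1,1]} p(t)\,u &= p(t)\,u_*(t) \quad \text{a.e.}
\end{align*}

Nontriviality rules out λ0 = 0 (which would force p ≡ 0), so p ≡ 1, and the maximum condition then gives u∗ ≡ 1, x∗(t) = t − a, as expected.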

This theorem was the successful conclusion to a long quest. In the words of L.C. Young,¹

The proof of the maximum principle, given in the book of Pontryagin, Boltyanskii, Gamkrelidze and Mishchenko ..., represents, in a sense, the culmination of the efforts of mathematicians, for considerably more than a century, to rectify the Lagrange multiplier rule.

A further, very important contribution of this theory was the very formulation of the basic problem, with its focus upon the explicit control aspect. This greatly enhanced its appeal for purposes of modeling and helped make it a vital tool in a wide range of applications.

Given the importance of the maximum principle, it is natural to seek extensions of it. There have been very many such advances in the last fifty years; we refer the reader to [1, 4, 10, 14, 16–18, 20, 21, 23–29] and to the references therein, but of course the list could be much longer.

One of the persistent themes in developing generalizations of the maximum principle has involved reducing the differentiability requirements in the hypotheses, as well as other regularity assumptions on the data. The first versions of the theorem for merely Lipschitz data were proven by Clarke in the early 1970s; there, the adjoint equation is replaced by an inclusion in terms of the generalized gradient introduced by the author.

The nonsmooth maximum principle can now be considered a well-known result; we proceed to state it, in essentially its original form (see Clarke [5, 6, 8]). The hypotheses are the following: f(t, x, u) is (L × B)-measurable with respect to t and (x, u), the multifunction U(·) has (L × B)-measurable graph, the set E is closed, and ℓ is locally Lipschitz. We posit also that f is Lipschitz in x in the following sense: for each t and u ∈ U(t) there exists k(t, u) such that

|f(t, x2, u) − f(t, x1, u)| ≤ k(t, u)|x2 − x1| ∀x1, x2 ∈ B(x∗(t), ε).

¹Lectures on the Calculus of Variations and Optimal Control (1969).


Theorem 2 (Clarke, 1975). If (x∗, u∗) is a strong local minimum for the problem (PC) and if k(t, u∗(t)) is summable, then there exists an absolutely continuous function p(·) on [a, b] together with a scalar λ0 equal to 0 or 1 satisfying the nontriviality condition:

λ0 + |p(t)| ≠ 0, t ∈ [a, b],

the transversality condition:

(p(a),−p(b)) ∈ λ0 ∂_L ℓ(x∗(a), x∗(b)) + N^L_E(x∗(a), x∗(b)),

the adjoint equation: for almost every t ∈ [a, b],

−p′(t) ∈ ∂_C{H(t, ·, u∗(t), p(t))}(x∗(t)),

and the maximum condition: for almost every t ∈ [a, b],

max_{u∈U(t)} H(t, x∗(t), u, p(t)) = H(t, x∗(t), u∗(t), p(t)).

In the theorem statement, ∂_C denotes the generalized gradient (with respect to the x variable), ∂_L is the limiting subdifferential, and N^L_S the limiting normal cone to S. We refer to Clarke [10] for a brief summary of these constructs of nonsmooth analysis, or to Clarke et al. [13] for a detailed presentation. Let us merely remark that Theorem 2 strictly subsumes the classical result embodied in Theorem 1: under the hypotheses of the latter, the generalized normals and differentials reduce to the classical notions, and the conclusions are identical.
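A one-dimensional example (added here for illustration) indicates why the limiting constructs can be strictly smaller than those of the generalized gradient: for the Lipschitz functions |x| and −|x| one has

\[
\partial_C\lvert\cdot\rvert(0) = \partial_L\lvert\cdot\rvert(0) = [-1,1],\qquad
\partial_C(-\lvert\cdot\rvert)(0) = [-1,1],\qquad
\partial_L(-\lvert\cdot\rvert)(0) = \{-1,\,1\},
\]

so a transversality condition phrased with ∂_L (and N^L_E) can be strictly more precise than one phrased with ∂_C (and N^C_E).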

Certain additional conclusions can be added to the maximum principle, notably when the problem is autonomous, or when the underlying interval [a, b] can vary (see Clarke [10] or Vinter [28] for example). Also, the presence of unilateral state constraints can be considered, necessitating the use of measures in the necessary conditions. We do not discuss such issues here.

We remark that the early versions of Theorem 2 write the transversality condition using ∂_Cℓ and N^C_E rather than the potentially smaller constructs ∂_Lℓ and N^L_E, but (as several authors have noted) the original proof actually yields this minor improvement without any modifications.

We will now describe a longstanding project that seeks to view the maximum principle as one

aspect of a unified approach to dynamic optimization.

2. THE UNIFIED APPROACH

There have been several attempts to develop a unified approach to dynamic optimization. The general theories of Dubovitskiĭ and Milyutin [20], and of Neustadt [25], are well-known examples. Another approach was initiated by Clarke in the early 1970s; it may be called the nonsmooth analysis approach. To explain the motivation behind this work, we need to recall the classical necessary conditions developed for the basic problem in the calculus of variations in the course of its long history; we do so now, somewhat informally.

Consider the problem (PB) of minimizing the so-called Bolza functional

ℓ(x(a), x(b)) + ∫_a^b L(t, x(t), x′(t)) dt

over a class of smooth functions satisfying the endpoint constraints

(x(a), x(b)) ∈ E.


The first necessary condition is the Euler equation, which asserts that if x∗ is a solution of the problem, then there is an arc p which satisfies

(p′(t), p(t)) = ∇_{x,v}L(t, x∗(t), x′∗(t)) for a.e. t ∈ [a, b]. [E]

This is actually an important variant of the equation first proposed by Euler in 1744; it is known as the integral form of the Euler equation, and it was discovered by du Bois-Reymond in the latter half of the nineteenth century. It corresponds to the adjoint equation of the maximum principle, as pointed out by Pontryagin et alii when they first published their result.

The second basic necessary condition was proved at about the same time; it is known as the Weierstrass condition:

max_{v∈R^n} {〈p(t), v〉 − L(t, x∗(t), v)} = 〈p(t), x′∗(t)〉 − L(t, x∗(t), x′∗(t)) for a.e. t ∈ [a, b]. [W]

This corresponds to the maximum condition in the maximum principle.

The third element we require adds the endpoint information to the necessary conditions. Historically this was rarely expressed in very general form, but we can group the various conditions found in the literature exactly as we have expressed them in Theorem 1, in an all-inclusive transversality condition:

(p(a),−p(b)) − ∇ℓ(x∗(a), x∗(b)) is normal to E at the point (x∗(a), x∗(b)). [T]

The attempt to develop a unified theory of dynamic optimization can be described (in part) as the project of obtaining the necessary conditions [E], [W], [T] for the problem of Bolza (PB) when the data (L, ℓ, E) are not necessarily smooth, and more particularly when L is extended-valued (sometimes equal to +∞).

The point of having L extended-valued is that one can implicitly represent additional constraints by defining L to be +∞ when they are violated. For example, if the basic problem above is considered under the additional equality constraint h(t, x(t), x′(t)) = 0 a.e. (this is classically referred to as a problem of Lagrange), we are led to redefine L(t, x, v) to equal +∞ when h(t, x, v) ≠ 0 and to equal its old value L(t, x, v) when h(t, x, v) = 0. Then (and we emphasize that this is completely rigorous) the minimization of the new functional ∫_a^b L(t, x, x′) dt (with no added constraint) is equivalent to the minimization of the old one under the equality constraint.

It turns out that the standard optimal control problem (PC) discussed previously, as well as a number of less standard problems involving mixed equality and inequality constraints, differential inclusions, or generalized control systems, can be brought under the umbrella of the extended-valued approach. Of course, the expression of the basic necessary conditions [E], [W], [T] will have to change to accommodate nonsmooth data. This requires some constructs of nonsmooth analysis.
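In symbols, the device just described for the problem of Lagrange amounts to replacing L by the extended-valued Lagrangian

\[
\tilde L(t,x,v) \;:=\;
\begin{cases}
L(t,x,v) & \text{if } h(t,x,v)=0,\\
+\infty & \text{otherwise,}
\end{cases}
\]

and then minimizing ℓ(x(a), x(b)) + ∫_a^b \tilde L(t, x(t), x′(t)) dt subject only to the endpoint constraint (x(a), x(b)) ∈ E (the notation \tilde L is introduced here merely for the purposes of this illustration).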

The ultimate goal, then, would be to prove extended necessary conditions for (PB) in a sufficiently general setting, and under sufficiently nonrestrictive hypotheses, so that not only the classical cases, but also the maximum principle, its principal extensions, and other cases could be subsumed in a satisfactory manner by the unified theory. Ideally, the new unified theory so obtained would in fact give the state of the art for each of its special cases. This project, initiated in 1973 in the author's doctoral thesis [3] (see also [7]), has required several decades of development and the contributions of many people, principal among whom have been Clarke, Ioffe, Ledyaev, Loewen, Rockafellar and Vinter. The outcome is described in the succinct, self-contained monograph [10].

One of the principal features of the theory is its versatility: it applies to the calculus of variations, to differential inclusions, and to various types of optimal control problems. Another important feature is the absence of many hypotheses that have encumbered such results in the past (convexity, boundedness, data regularity, constraint qualifications). A weaker type of Lipschitz condition (called


pseudo-Lipschitz) is postulated, and an exceptionally weak type of local minimum is shown to suffice in deriving the strongest forms of the necessary conditions.

In addition, and these are the two features we will stress in the examples below, the results are stratified and expressible in a natural geometric form. The first of these features refers to the fact that hypotheses are made only relative to a certain radius function, and the conclusions are then asserted to hold to precisely the same extent. As we will see, this is especially useful in deriving multiplier rules in the presence of functional constraints, as it eliminates the need to call upon implicit function theorems.

The second feature, the geometrical formulation, allows one to state a simple theorem which specializes easily to a variety of contexts. We stress that the interest of the results obtained is not limited to problems with nonsmoothness: we derive a new state of the art even for problems with smooth data.

In order to give an idea of the nature of the unified approach, it is convenient to consider first a control problem phrased in terms of a differential inclusion.

A differential inclusion problem. We are given a multifunction F from [a, b] × R^n to the subsets of R^n. It is assumed that F is measurable and that F(t, ·) has closed graph. A trajectory of F refers to an absolutely continuous function x on [a, b] satisfying x′(t) ∈ F(t, x(t)) a.e. We consider the problem (PD) of minimizing ℓ(x(a), x(b)) over the trajectories x of F satisfying the endpoint constraints (x(a), x(b)) ∈ E.

A measurable function R : [a, b] → (0,+∞] is called a radius function.

Definition 1. The trajectory x∗ is a local W^{1,1} minimum of radius R for the problem (PD) if, for some ε > 0, we have

ℓ(x∗(a), x∗(b)) ≤ ℓ(x(a), x(b))

for all trajectories x satisfying the endpoint constraints as well as

‖x − x∗‖∞ < ε,

∫_a^b |x′(t) − x′∗(t)| dt < ε,

and |x′(t) − x′∗(t)| ≤ R(t) for a.e. t ∈ [a, b].

Note that when R is identically +∞ (which is allowed), this reduces to what is usually referred to as a local W^{1,1} minimum, which is in turn a weaker assumption than that of a strong local minimum (as in Section 1). When R is a finite constant, we obtain a type of minimum that is known in the calculus of variations as a weak local minimum.

In the following, G(t) refers to the graph of the multifunction F(t, ·), and N^P_G refers to the cone of proximal normals. When S is a subset of R^n and x ∈ S, we say that ζ ∈ R^n is a proximal normal to S at x (written ζ ∈ N^P_S(x)) provided that, for some σ > 0, we have

〈ζ, x′ − x〉 ≤ σ|x′ − x|² ∀x′ ∈ S.

This fundamental type of normal vector is a building block which generates all the others and which coincides with the familiar normal vectors if G happens to be smooth or convex.
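Two elementary computations (recorded here for illustration) may help to fix the concept. When S is convex, applying the defining inequality to the points x + t(x′ − x) ∈ S and letting t ↓ 0 shows that the proximal normal cone reduces to the normal cone of convex analysis; in particular, for the closed unit ball:

\[
S \ \text{convex:}\quad N^P_S(x) = \{\zeta : \langle \zeta,\, x'-x\rangle \le 0 \ \ \forall\, x'\in S\},
\qquad
S = \bar B(0,1),\ |x| = 1:\quad N^P_S(x) = \{\lambda x : \lambda \ge 0\}.
\]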

The following is a geometrical version of a property of F that is known as a pseudo-Lipschitz condition:

Definition 2. We say that F satisfies the bounded slope condition of radius R near x∗ if there exist k ∈ L¹(a, b) and ε > 0 such that for almost every t, for every (x, v) ∈ G(t) with x ∈ B(x∗(t), ε) and v ∈ B(x′∗(t), R(t)), and for all (α, β) ∈ N^P_{G(t)}(x, v), one has |α| ≤ k(t)|β|.
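To see what this condition says in the simplest case (an illustration added here, not taken from [10]), suppose F(t, x) = {g(t, x)} is single-valued with g(t, ·) Lipschitz of constant k(t). If (α, β) ∈ N^P_{G(t)}(x, g(t, x)), then applying the proximal normal inequality to the points (x + sα, g(t, x + sα)) ∈ G(t) gives

\[
s\,|\alpha|^2 \;\le\; \lvert\beta\rvert\,k(t)\,s\,|\alpha| \;+\; \sigma\, s^2\,\bigl(1 + k(t)^2\bigr)\,|\alpha|^2 ,
\]

and dividing by s and letting s ↓ 0 yields |α| ≤ k(t)|β|. The bounded slope condition is thus a set-valued analogue of an ordinary Lipschitz estimate on the dynamics.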


The following result, taken from [10], turns out to be a powerful and unifying tool for necessary conditions.

Theorem 3 (Clarke, 2005). Suppose that, for some radius function R, the trajectory x∗ is a local W^{1,1} minimum of radius R for the problem (PD), where F satisfies near x∗ the bounded slope condition of radius R, with R(t) ≥ ηk(t) a.e. for some η > 0. Then there exists an absolutely continuous function p(·) on [a, b] together with a scalar λ0 equal to 0 or 1 satisfying the nontriviality condition:

(λ0, p(t)) ≠ 0 ∀t ∈ [a, b],

the transversality condition:

(p(a),−p(b)) ∈ ∂_L(λ0ℓ)(x∗(a), x∗(b)) + N^L_E(x∗(a), x∗(b)),

and the Euler adjoint inclusion:

p′(t) ∈ co{ω : (ω, p(t)) ∈ N^L_{G(t)}(x∗(t), x′∗(t))} for a.e. t ∈ [a, b],

as well as the Weierstrass condition of radius R: for almost every t we have

〈p(t), v〉 ≤ 〈p(t), x′∗(t)〉 ∀v ∈ F(t, x∗(t)) ∩ B(x′∗(t), R(t)).

If the above holds for a sequence of radius functions R_i (with all parameters ε, k, η possibly depending on i) for which

lim inf_{i→∞} R_i(t) = +∞ a.e.,

then the conclusions hold for an arc p which satisfies the global Weierstrass condition:

〈p(t), v〉 ≤ 〈p(t), x′∗(t)〉 ∀v ∈ F(t, x∗(t)), for a.e. t ∈ [a, b].

The fact that this theorem gives rise to definitive results in such varied contexts as the calculus of variations, standard control problems, and generalized systems is amply described in [10]. In particular, the goal described above of finding satisfactory extended necessary conditions for the problem of Bolza (PB) is achieved through this means.

In this article, however, we proceed to develop its applications in a different direction: optimal control problems in which the control set depends upon the state x. This is often referred to as the case of mixed constraints.

3. A GENERAL THEOREM ON MIXED CONSTRAINTS

We consider now the problem (PM), which is the same as the problem (PC) of Section 1, but with one important difference: the control constraint u(t) ∈ U(t) is replaced by the mixed state/control constraint

(x(t), u(t)) ∈ S(t),

where S(t) is a closed subset of R^n × R^m for each t. As always, S and f are assumed to be measurable in a suitable sense, while ℓ is taken to be locally Lipschitz and E closed.

The main theorem below features hypotheses directly related to a given pair (x∗, u∗) that is admissible for (PM). Let R : [a, b] → (0,+∞] be a given radius function and ε > 0. We set

S(t, x) := {u ∈ R^m : (x, u) ∈ S(t)},

S^{ε,R}_∗(t) := {(x, u) ∈ S(t) : |x − x∗(t)| ≤ ε, |u − u∗(t)| ≤ R(t)}.


We say that (x∗, u∗) is a local minimum of radius R for (PM) provided that for every pair (x, u) admissible for (PM) which also satisfies

(x(t), u(t)) ∈ S^{ε,R}_∗(t) a.e.,

∫_a^b |x′(t) − x′∗(t)| dt ≤ ε,

we have ℓ(x(a), x(b)) ≥ ℓ(x∗(a), x∗(b)).

The main hypotheses of the theorem are conditioned by the radius R; they concern Lipschitz behavior of f(t, x, u) with respect to (x, u) and a certain bounded slope condition bearing upon the sets S(t).

[H1] For almost every t ∈ [a, b], the function f(t, ·, ·) is locally Lipschitz on a neighborhood of S^{ε,R}_∗(t), and there exist measurable functions k_x and k_u, with k_x summable, such that, for almost every t,

(x_i, u_i) ∈ S^{ε,R}_∗(t) (i = 1, 2) ⇒ |f(t, x_1, u_1) − f(t, x_2, u_2)| ≤ k_x(t)|x_1 − x_2| + k_u(t)|u_1 − u_2|.

[H2] There exists a measurable function k_S such that k_S k_u is summable and, for almost every t ∈ [a, b], the following bounded slope condition holds:

(x, u) ∈ S^{ε,R}_∗(t), (α, β) ∈ N^P_{S(t)}(x, u) ⇒ |α| ≤ k_S(t)|β|.
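As a simple case in which these hypotheses are easily checked (an illustration added here, not from the text), consider dynamics that are affine in the control:

\[
f(t,x,u) = g(x) + B(t)\,u,\qquad |g(x_1) - g(x_2)| \le k_g\,|x_1 - x_2| \ \ \forall\, x_1,\, x_2,
\]

with B(·) measurable and matrix-valued. Then [H1] holds (for any ε and R) with k_x ≡ k_g, a summable constant, and k_u(t) = ‖B(t)‖; hypothesis [H2] then asks, in addition, for a measurable k_S with k_S(t)‖B(t)‖ summable and the stated bounded slope estimate on the sets S(t).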

The following theorem asserts necessary conditions under optimality and regularity hypotheses which are imposed only relative to a radius R, and its conclusions hold to the same extent; this situation is referred to in [10] as stratified. We stress that the theorem allows the case R ≡ +∞.

Theorem 4. Let the process (x∗, u∗) be a local minimum of radius R for (PM), where hypotheses [H1] and [H2] hold and where, for some positive constant η, we have R(t) ≥ ηk_S(t) a.e. Then there exist an arc p and a number λ0 in {0, 1} satisfying the nontriviality condition:

(λ0, p(t)) ≠ 0 ∀t ∈ [a, b]

and the transversality condition:

(p(a),−p(b)) ∈ ∂_L(λ0ℓ)(x∗(a), x∗(b)) + N^L_E(x∗(a), x∗(b))

and such that p satisfies the adjoint inclusion: for almost every t

(−p′(t), 0) ∈ ∂_C{H(t, ·, ·, p(t))}(x∗(t), u∗(t)) − N^C_{S(t)}(x∗(t), u∗(t)),

as well as the Weierstrass condition of radius R: for almost every t,

u ∈ S(t, x∗(t)), |u − u∗(t)| ≤ R(t) ⇒ H(t, x∗(t), u, p(t)) ≤ H(t, x∗(t), u∗(t), p(t)).

If the hypotheses hold for a sequence of radius functions R_i (with all parameters ε, k_x, k_u, k_S, η possibly depending on i) for which

lim inf_{i→∞} R_i(t) = +∞ a.e.,

then the conclusions above hold for an arc p which satisfies the global Weierstrass condition: for almost every t

u ∈ S(t, x∗(t)) ⇒ H(t, x∗(t), u, p(t)) ≤ H(t, x∗(t), u∗(t), p(t)).


We omit the proof of Theorem 4 (see [12]), which consists of a direct appeal to Theorem 3. Note that [H1] requires some Lipschitz behavior with respect to the control u; this was not the case in Theorems 1 or 2. This is a price to be paid in order to consider control sets that depend on the state.

4. SPECIAL CASES

It turns out that Theorem 4 unifies, subsumes and significantly extends the existing results on the issue of necessary conditions for problems with mixed constraints, notably in the three special cases that comprise the bulk of the literature: calculus of variations, differential-algebraic systems, and mixed constraints specified by equalities and inequalities. We proceed to give some brief illustrations of this; the forthcoming article [12] develops in detail the new approach, which in our view represents an important simplification and unification of the theory.

4.1. Unilateral control constraints. Let S(t) = {(x, u) : u ∈ U(t)}. Then the problem (PM) of the previous section coincides with the classical optimal control problem (PC) of Section 1, in which the control constraints are unilateral (unmixed).

Let (α, β) belong to N^P_{S(t)}(x, u). Then, by the definition of proximal normal, for some constant σ, the function

(x′, u′) ↦ −〈α, x′〉 − 〈β, u′〉 + σ{|x′ − x|² + |u′ − u|²}

has a minimum relative to (x′, u′) ∈ R^n × U(t) at (x′, u′) = (x, u). It follows that α = 0 (the variable x′ is free in R^n, so the gradient in x′ at the minimum, namely −α, must vanish), so that

the bounded slope condition of [H2] is automatically satisfied, with k_S = 0.

If f satisfies [H1], then Theorem 4 applies. When f is locally Lipschitz, [H1] is a consequence of the classically familiar hypothesis that u∗ is bounded. With the radius function taken to be R ≡ +∞, the necessary conditions reduce to familiar ones which include the full Weierstrass (or maximum) condition

H(t, x∗(t), u, p(t)) ≤ H(t, x∗(t), u∗(t), p(t)) ∀u ∈ U(t), for a.e. t ∈ [a, b],

as well as the Euler form of the adjoint inclusion: for almost every t

(−p′(t), 0) ∈ ∂_C{H(t, ·, ·, p(t))}(x∗(t), u∗(t)) − {0} × N^C_{U(t)}(u∗(t)).

For smooth data, this is precisely Theorem 1. For f merely locally Lipschitz, however, this adjoint inclusion, taken jointly in (x, u), is different from the one obtained in the usual nonsmooth maximum principle, Theorem 2, and neither implies the other.

The relative merits of these two different forms are discussed and illustrated in [11, 15, 22], to which we refer for further details. We add only that since k_S = 0 here, we can allow any (arbitrarily small) positive time-dependent radius function and still get the Euler adjoint equation, together with a corresponding Weierstrass condition. Note also that the function k_u need not be summable. This goes beyond previous results.

4.2. Calculus of variations: the multiplier rule. We consider now the following problem of Lagrange in the calculus of variations:

minimize ∫_a^b L(t, x(t), x′(t)) dt

over the arcs x satisfying the following boundary conditions and pointwise constraint:

x(a) = A, x(b) = B, h(t, x(t), x′(t)) = 0 for a.e. t ∈ [a, b].


There is a vast literature on such problems (see Bliss [2], Hestenes [21] and the references therein). In the classical setting, which we adopt here, it is assumed that L : R × R^n × R^n → R and h : R × R^n × R^n → R^N (with N ≤ n) are continuously differentiable and that there exists a solution x∗ with piecewise continuous derivative. The goal is to derive necessary conditions in the form of a multiplier rule. (This project was in fact the century-long quest that L.C. Young refers to in the quotation that appears in Section 1.)

The problem is framed as a special case of (PM) as follows:

Minimize y(b) subject to

x′(t) = u(t) a.e.,

y′(t) = L(t, x(t), u(t)) a.e.,

(x(t), y(t), u(t)) ∈ S(t) := {(x, y, u) : h(t, x, u) = 0} a.e.,

(x(a), y(a), x(b), y(b)) ∈ E := {(A, 0, B)} × R.

If R is any constant finite radius function, then [H1] holds for certain constants k_x and k_u (depending on R). In order to apply Theorem 4, we study the bounded slope condition, suppressing the (irrelevant) y variable for ease of notation.

We remark that if (α, β) ∈ N^P_{S(t)}(x, u) and if the Jacobian matrix D_{x,u}h(t, x, u) has maximal rank N, then (α, β) belongs to the classical normal space to S(t) at (x, u), whence (α, β) = D_{x,u}〈λ, h〉(t, x, u) for some vector λ ∈ R^N. This simple but crucial geometric fact is a consequence of the definition of proximal normal, together with the classical Lagrange multiplier rule. It will allow us to confirm the bounded slope condition and also to interpret the resulting necessary condition in terms of multipliers (rather than normal vectors).

A rank hypothesis is commonly made in the literature: that the matrix D_u h(t, x, u) is of maximal rank N, either globally in some prescribed region, or else just along the optimal arc. We now show that both scenarios can be handled by Theorem 4.
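A simple concrete case (supplied here for illustration) is the constraint h(t, x, v) = v_1 − x_2 with n = 2 and N = 1, which writes a second-order problem in the coordinate x_1 as a problem of Lagrange. Here

\[
D_u h(t,x,u) = (1,\ 0),
\]

which has maximal rank N = 1 at every point, so that the global rank condition, and a fortiori the local one along x∗, is satisfied.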

Suppose first that we assume only that D_u h(t, x∗(t), x′∗(t)) is of maximal rank for every t. (This is to be interpreted as holding for both x′∗(t+) and x′∗(t−) if x∗ has a corner at t.) We claim (while omitting the simple proof by contradiction) that under this rank condition, there exist constants ε > 0 and k_S such that

t ∈ [a, b], |x − x∗(t)| ≤ ε, |u − x′∗(t)| ≤ ε ⇒ |λ^T D_x h(t, x, u)| ≤ k_S |λ^T D_u h(t, x, u)| ∀λ ∈ R^N, |λ| = 1.

This allows us to verify [H2] for a suitably small radius R and to invoke Theorem 4. A straightforward inspection of the resulting necessary conditions reveals the existence of λ0 ∈ {0, 1}, λ ∈ L^∞(a, b)^N and an arc p such that λ0 + ‖p‖∞ > 0 and

p′(t) = D_x{λ0 L + 〈λ(t), h〉}(t, x∗(t), x′∗(t)),

p(t) = D_u{λ0 L + 〈λ(t), h〉}(t, x∗(t), x′∗(t)).

We recognize this as the classical Euler equation for the Lagrangian λ0 L + 〈λ, h〉, and the desired multiplier rule is obtained. Theorem 4 also yields, for almost every t,

h(t, x∗(t), u) = 0, |u − x′∗(t)| ≤ ε ⇒ 〈p(t), u〉 − λ0 L(t, x∗(t), u) ≤ 〈p(t), x′∗(t)〉 − λ0 L(t, x∗(t), x′∗(t)),

a local Weierstrass condition which is new in this context. We summarize:


Corollary 1. Under the hypotheses above, with a local rank condition, we obtain the multiplier rule together with a local Weierstrass condition.

The other approach to the multiplier rule is to require that the matrix D_u h(t, x, u) be of maximal rank globally (at points where h = 0), and not just along the optimal arc. In that case, we can apply Theorem 4 as above for a sequence of constant radius functions R_i increasing to +∞, and we obtain the multiplier rule accompanied by the global Weierstrass condition, as in [21]:

Corollary 2. Under the hypotheses above, with a global rank condition, we obtain the multiplier rule together with a global Weierstrass condition.

Thus we are able to easily obtain via Theorem 4 the multiplier rule in either its local or global form. In fact, the theorem can be used to obtain such multiplier rules under considerably weaker regularity hypotheses than the ones we have posited in this example (both on the data and the solution), and for constraints that are not necessarily of equality type [12].

4.3. Mixed constraints in optimal control. We study next the case in which the constraint set S(t) of problem (PM) is described as follows:

S(t) := {(x, u) : g(t, x, u) ≤ 0, h(t, x, u) = 0, u ∈ U},

which combines unilateral control constraints with mixed ones defined via functional equalities and inequalities. We take U closed here and, in order to facilitate comparison with the literature, limit ourselves to the smooth setting. Thus, g : R × R^n × R^m → R^M and h : R × R^n × R^m → R^N are taken to be continuously differentiable, as is f. This type of context has dominated the literature on mixed constraints.

In the following, we impose a constraint qualification hypothesis of a type familiar in optimization and often referred to as the Mangasarian–Fromowitz condition. It is this hypothesis that can be used to confirm the bounded slope condition.

Corollary 3. Let (x∗, u∗) be a strong (or W^{1,1}) local minimum, where u∗ is bounded. Suppose that for almost every t, at each point (x, u) ∈ S(t) for which |x − x∗(t)| < ε, we have

λ ∈ R^N, γ ∈ R^M_+, 〈γ, g(t, x, u)〉 = 0, D_u{〈γ, g〉 + 〈λ, h〉}(t, x, u) ∈ −N^L_U(u) ⇒ γ = 0, λ = 0.

Then the conclusions of Theorem 4 hold, with the global Weierstrass condition. In addition, there exist bounded measurable functions

λ : [a, b] → R^N, γ : [a, b] → R^M_+, with 〈γ(t), g(t, x∗(t), u∗(t))〉 = 0 a.e.,

such that the adjoint inclusion is expressible in the explicit multiplier form

(−p′(t), μ(t)) = D_{x,u}{〈p(t), f〉 − 〈γ(t), g〉 − 〈λ(t), h〉}(t, x∗(t), u∗(t)) a.e.,

where μ is a measurable function satisfying μ(t) ∈ N^C_U(u∗(t)) a.e.
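To illustrate the constraint qualification (an example given here for orientation only), take a single inequality constraint and no equalities: M = 1, N = 0, U = R^m and g(t, x, u) = |u|² − 1, so that N^L_U(u) = {0}. The hypothesis of Corollary 3 then reads

\[
\gamma \ge 0,\quad \gamma\,(|u|^2 - 1) = 0,\quad 2\gamma\,u = 0 \;\Longrightarrow\; \gamma = 0,
\]

which does hold at every point of S(t): if |u| < 1, complementary slackness gives γ = 0 directly, while if |u| = 1, then u ≠ 0 and 2γu = 0 again forces γ = 0.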

We remark that it has not been necessary to make a certain restrictive structural assumption that has been a common feature of the literature on mixed constraints (see, for example, [9, 19]). We mean by this the assumption that the control variable u can be partitioned (in the same way at all points) into two parts (v, w) in such a way that v is completely free of unilateral constraints and a maximal rank condition is satisfied with respect to this free part of the control. To be precise, it has been assumed previously that the set S(t) is described by

S(t) := {(x, v, w) : g(t, x, v, w) ≤ 0, h(t, x, v, w) = 0, w ∈ W}


and that the following type of constraint qualification holds near the optimal process:

(x, v, w) ∈ S(t), λ ∈ R^N, γ ∈ R^M_+, 〈γ, g(t, x, v, w)〉 = 0, 0 = D_v{〈γ, g〉 + 〈λ, h〉}(t, x, v, w) ⇒ γ = 0, λ = 0.

It is clear that this implies the constraint qualification adduced in Corollary 3.²

It is possible to extend the result to the case of nonsmooth data. Furthermore, versions of the necessary conditions may be obtained in which the constraint qualification is imposed only pointwise with reference to the optimal process. This somewhat delicate issue is addressed in [12], where it is shown that mistakes have made their way into the existing literature.

The aspects we have stressed in this section are the relative ease with which multiplier rules can be derived from Theorem 4, together with the versatility of the theorem and the transparency of its use. In particular, we obtain a unified treatment of local, intermediate, and global situations, depending on what hypotheses are made.

²If the control is partitioned in this way, one can use the finitization procedure of [10, p. 88] to obtain the necessary conditions when W(t) has measurable t-dependence and is not necessarily closed-valued.

REFERENCES

1. A. V. Arutyunov, Optimality Conditions. Abnormal and Degenerate Problems (Kluwer, Dordrecht, 2000).
2. G. A. Bliss, Lectures on the Calculus of Variations (Univ. Chicago Press, Chicago, 1946).
3. F. H. Clarke, "Necessary Conditions for Nonsmooth Problems in Optimal Control and the Calculus of Variations," Doctoral Thesis (Univ. Washington, 1973); thesis director: R. T. Rockafellar.
4. F. H. Clarke, "Necessary Conditions for Nonsmooth Variational Problems," in Optimal Control Theory and Its Applications (Springer, New York, 1974), Lect. Notes Econ. Math. Syst. 106, pp. 70–91.
5. F. H. Clarke, "Le principe du maximum avec un minimum d'hypothèses," C. R. Acad. Sci. Paris 281, 281–283 (1975).
6. F. H. Clarke, "Maximum Principles without Differentiability," Bull. Am. Math. Soc. 81, 219–222 (1975).
7. F. H. Clarke, "The Generalized Problem of Bolza," SIAM J. Control Optim. 14, 682–699 (1976).
8. F. H. Clarke, "The Maximum Principle under Minimal Hypotheses," SIAM J. Control Optim. 14, 1078–1091 (1976).
9. F. Clarke, "The Maximum Principle in Optimal Control, Then and Now," Control Cybern. 34, 709–722 (2005).
10. F. Clarke, Necessary Conditions in Dynamic Optimization (Am. Math. Soc., Providence, RI, 2005), Mem. AMS 173 (816).
11. F. H. Clarke and M. R. de Pinho, "The Nonsmooth Maximum Principle," Control Cybern. (in press).
12. F. H. Clarke and M. R. de Pinho, "Optimal Control Problems with Mixed Constraints," SIAM J. Control Optim. (in press).
13. F. H. Clarke, Yu. S. Ledyaev, R. J. Stern, and P. R. Wolenski, Nonsmooth Analysis and Control Theory (Springer, New York, 1998), Grad. Texts Math. 178.
14. M. R. de Pinho, M. M. Ferreira, and F. Fontes, "Unmaximized Inclusion Necessary Conditions for Nonconvex Constrained Optimal Control Problems," ESAIM Control Optim. Calc. Var. 11, 614–632 (2005).
15. M. R. de Pinho and R. B. Vinter, "An Euler–Lagrange Inclusion for Optimal Control Problems," IEEE Trans. Autom. Control 40, 1191–1198 (1995).
16. M. R. de Pinho and R. B. Vinter, "Necessary Conditions for Optimal Control Problems Involving Nonlinear Differential Algebraic Equations," J. Math. Anal. Appl. 212, 493–516 (1997).
17. M. R. de Pinho, R. B. Vinter, and H. Zheng, "A Maximum Principle for Optimal Control Problems with Mixed Constraints," IMA J. Math. Control Inf. 18, 189–205 (2001).
18. E. N. Devdariani and Yu. S. Ledyaev, "Maximum Principle for Implicit Control Systems," Appl. Math. Optim. 40, 79–103 (1999).
19. A. V. Dmitruk, "Maximum Principle for the General Optimal Control Problem with Phase and Regular Mixed Constraints," Comput. Math. Model. 4, 364–377 (1993).
20. A. Ya. Dubovitskiĭ and A. A. Milyutin, "Theory of the Maximum Principle," in Methods of the Theory of Extremal Problems in Economics (Nauka, Moscow, 1981), pp. 6–47 [in Russian].
21. M. R. Hestenes, Calculus of Variations and Optimal Control Theory (J. Wiley and Sons, New York, 1966).


22. A. D. Ioffe and R. T. Rockafellar, "The Euler and Weierstrass Conditions for Nonsmooth Variational Problems," Calc. Var. Partial Diff. Eqns. 4, 59–87 (1996).
23. A. D. Ioffe and V. M. Tikhomirov, Theory of Extremal Problems (Nauka, Moscow, 1974; North-Holland, Amsterdam, 1979).
24. A. A. Milyutin and N. P. Osmolovskii, Calculus of Variations and Optimal Control (Am. Math. Soc., Providence, RI, 1998).
25. L. W. Neustadt, Optimization: A Theory of Necessary Conditions (Princeton Univ. Press, Princeton, 1976).
26. Z. Pales and V. Zeidan, "Optimal Control Problems with Set-Valued Control and State Constraints," SIAM J. Optim. 14, 334–358 (2003).
27. G. Stefani and P. Zezza, "Optimality Conditions for a Constrained Control Problem," SIAM J. Control Optim. 34, 635–659 (1996).
28. R. Vinter, Optimal Control (Birkhäuser, Boston, 2000).
29. J. Warga, Optimal Control of Differential and Functional Equations (Academic Press, New York, 1972).

This article was submitted by the author in English
