Nonlinear Analysis 69 (2008) 2095–2113
www.elsevier.com/locate/na

An augmented Lagrangian approach with a variable transformation in nonlinear programming

Liwei Zhang a,b,∗, Xiaoqi Yang c

a Department of Science, Shenyang Institute of Aeronautic Engineering, Shenyang 110136, China
b Applied Mathematics, Dalian University of Technology, Dalian 116024, China

c Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China

Received 23 May 2007; accepted 25 July 2007

Abstract

The tangent cone and (regular) normal cone of a closed set under an invertible variable transformation around a given point are investigated, leading to the concepts of the θ⁻¹-tangent cone of a set and the θ⁻¹-subderivative of a function. When the notion of θ⁻¹-subderivative is applied to perturbation functions, a class of augmented Lagrangians involving an invertible mapping of the perturbation variables is obtained, in which the dualizing parameterization and augmenting functions are not necessarily convex in the perturbation variables. A necessary and sufficient condition for an exact penalty representation under the proposed augmented Lagrangian scheme is obtained. For an augmenting function given by the Euclidean norm, a sufficient condition (resp., a sufficient and necessary condition) for an arbitrary vector (resp., 0) to support an exact penalty representation is given in terms of θ⁻¹-subderivatives. An example of the variable transformation applied to constrained optimization problems is given, which yields several exact penalization results in the literature.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Augmented Lagrangian; Duality; Exact penalty representation; Tangent cone; Normal cone; Subderivative; Subdifferential

1. Introduction

The first augmented Lagrangian, namely the proximal Lagrangian, was introduced by Rockafellar [10], and the theory of augmented Lagrangians was developed in, e.g., Ioffe [6], Bertsekas [1–3] and Rockafellar [11] for constrained optimization problems. Recently, Rockafellar and Wets [12] proposed a general framework of augmented Lagrangians for a primal problem of minimizing an extended real-valued function, in which a convex augmenting function σ and a dualizing parameterization function f(x, u) are employed, where f is convex in the parameter u. Huang and Yang [5] extended the augmented Lagrangian theory to generalized augmented Lagrangians, whose generalized augmenting functions are only required to be proper, lower semicontinuous and level-bounded. By using a

The research of Liwei Zhang is supported by the National Natural Science Foundation of China under project grant No. 10471015 and by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry. The research of Xiaoqi Yang is supported by the Research Grant Council of Hong Kong (B-Q549).

∗ Corresponding author at: Department of Science, Shenyang Institute of Aeronautic Engineering, Shenyang 110136, China.
E-mail addresses: [email protected] (L. Zhang), [email protected] (X. Yang).

0362-546X/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.na.2007.07.048


generalized augmenting function, “lower order” nonsmooth and nonconvex penalty functions used in Luo et al. [7] and Pang [8] can be derived under the scheme of generalized augmented Lagrangians. The generalized augmented Lagrangians also include a class of nonlinear penalty functions studied in Rubinov et al. [9] and Yang and Huang [15] as special cases.

In this paper, we consider another scheme for constructing a class of augmented Lagrangians for constrained optimization problems. The augmented Lagrangian is related to an invertible mapping A as follows:

$$ l_A(\phi, x, v, r) := \inf_u \{\phi(x, u) + r\Delta(A(u)) - \langle v, A(u)\rangle\}. \tag{1.1} $$

We will show that this augmented Lagrangian corresponds to a nonlinear perturbation of the constraints, which produces a dualizing parameterization function that is nonconvex in the parameter vector u when φ is the conventional dualizing parameterization function. For characterizing the exactness property of this class of augmented Lagrangians, we introduce the concepts of subderivative and (regular/horizon) subdifferential derived from a variable transformation and explore their basic properties. It should be pointed out that the subderivative of [12] is based on the quotient

$$ \frac{f(x + tw) - f(x)}{t}, $$

whereas the subderivative based on a variable transformation leads to a quotient of the form

$$ \frac{f(x + \Theta(t)w) - f(x)}{t}, $$

where Θ(t) is a diagonal matrix in t; this kind of subderivative can characterize some non-Lipschitz functions effectively, see the examples in Section 2.
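The effect of the transformed quotient can be seen numerically. The sketch below (ours, not from the paper) uses the non-Lipschitz function f(x) = |x|^{1/3} at x = 0 with the scalar transformation θ(t) = t³: the classical quotient blows up as t ↓ 0, while the θ-based quotient stabilizes at |w|^{1/3}.

```python
import math

def f(x):
    # non-Lipschitz at 0: f is only 1/3-Hoelder continuous there
    return abs(x) ** (1.0 / 3.0)

def classical_quotient(t, w):
    # [f(0 + t*w) - f(0)] / t, the quotient behind the ordinary subderivative
    return (f(t * w) - f(0.0)) / t

def theta_quotient(t, w):
    # [f(0 + theta(t)*w) - f(0)] / t with theta(t) = t^3 (Theta(t) is 1x1 here)
    return (f(t ** 3 * w) - f(0.0)) / t

print(classical_quotient(1e-9, 1.0))  # diverges like t**(-2/3) as t -> 0
print(theta_quotient(1e-9, 8.0))      # stabilizes at |8|**(1/3) = 2.0
```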

If v = 0 and ∆ is convex, then l_A(φ, x, 0, r) can be generated by setting σ(u) = ∆(A(u)) in the generalized augmented Lagrangian scheme of [5], so the set of exact penalty functions supported by 0 derived from the proposed scheme is a subclass of that of [5]. On the other hand, several examples suggest that it is more difficult to derive explicit augmented Lagrangian formulas from the generalized augmented Lagrangian scheme than from the one proposed in this paper. This is an advantage of the proposed scheme, because the multiplier vector plays an important role in designing numerical methods.

The paper is organized as follows. Section 2.1 derives formulas for the tangent cone of A(C) at A(p̄) when C is a closed set and A is an invertible mapping over a neighborhood of p̄, in particular when A satisfies Assumption 2.1. Section 2.2 introduces the concept of θ⁻¹-subderivative, based on the formulas of Section 2.1 applied to the epigraph of a function. Several examples illustrating the notion of θ⁻¹-subderivative are presented in Section 2.3, and optimality conditions for unconstrained and constrained optimization problems are derived in Section 2.4. In Section 3.1 a class of augmented Lagrangians based on nonlinear perturbations of the constraints is introduced and the corresponding duality theorem is established under mild conditions. A sufficient and necessary condition for an exact penalty representation is obtained. Furthermore, for the special case ∆(·) = ‖·‖₂, a sufficient condition for an arbitrary vector v, and a sufficient and necessary condition for 0, to support an exact penalty representation are given in terms of θ⁻¹-subderivatives. In Section 3.2 we give an example of a variable transformation which produces several popular penalty functions and conditions for their exact penalty property.

2. Subderivatives based on variable transformations

Let C ⊂ R^r be a subset of R^r and p̄ ∈ C a given point. Let V ∈ 𝒩(p̄) be a neighborhood of p̄ and A : V → R^r a one-to-one mapping. Without loss of generality, we assume that A is a one-to-one and onto mapping from V to A(V) and that A⁻¹ exists on A(V), being a one-to-one and onto mapping from A(V) to V. In Section 2.1 we derive formulas for the tangent cone of A(C) at a point q̄ ∈ A(C) and introduce T^{θ⁻¹}_C(p̄) under an assumption on the mapping A. In Section 2.2 we derive formulas for the subderivative and (regular/horizon) subdifferentials of a function ψ under a variable transformation and introduce the θ⁻¹-subderivative of a function ψ at a point p̄. Section 2.3 presents some examples illustrating the notion of θ⁻¹-subderivative, and Section 2.4 derives optimality conditions for unconstrained and constrained optimization problems.


2.1. Tangent cones based on variable transformations

The tangent cone of A(C) at A(p̄) is defined by [12]

$$ T_{A(C)}(A(\bar p)) = \limsup_{\tau \downarrow 0} \frac{A(C) - A(\bar p)}{\tau}; $$

the regular normal cone and the normal cone of A(C) at A(p̄) are defined by

$$ \hat N_{A(C)}(A(\bar p)) = \{ v \mid \langle v, q - A(\bar p)\rangle \le o(\|q - A(\bar p)\|),\ q \in A(C) \} $$

and

$$ N_{A(C)}(A(\bar p)) = \limsup_{q \xrightarrow{A(C)} A(\bar p)} \hat N_{A(C)}(q), $$

respectively.

Obviously, if A⁻¹ is continuous at A(p̄), then

$$ T_{A(C)}(A(\bar p)) = \left\{ w \;\middle|\; \exists\, p^\nu \in C,\ p^\nu \to \bar p,\ \exists\, \tau^\nu \downarrow 0,\ w = \lim_{\nu} \frac{A(p^\nu) - A(\bar p)}{\tau^\nu} \right\}. $$

Now we consider a wide class of mappings satisfying the following assumption:

Assumption 2.1. Let A satisfy

$$ A(p) = (\mu_1(p_1), \ldots, \mu_r(p_r))^T, \qquad A^{-1}(q) = (\mu_1^{-1}(q_1), \ldots, \mu_r^{-1}(q_r))^T $$

with the property

$$ \mu_i^{-1}(\bar q_i + \tau d_i) = \bar p_i + \theta_i(\tau)\,\xi_i(d_i), \qquad i = 1, \ldots, r, $$

where θ_i(·) and ξ_i(·) are invertible functions from [0, ε) to [0, δ) and from R to R, respectively, such that θ_i is increasing with θ_i(0) = 0. In this case, we denote A⁻¹(q̄ + τd) = p̄ + Θ(τ)ξ(d), where Θ(τ) = diag(θ₁(τ), …, θ_r(τ)), θ(τ) = (θ₁(τ), …, θ_r(τ))^T and ξ(d) = (ξ₁(d₁), …, ξ_r(d_r))^T.

For expressing T_{A(C)}(A(p̄)) under the above assumption, we introduce the concept of the θ⁻¹-tangent cone of C at p̄ ∈ C, denoted by T^{θ⁻¹}_C(p̄), which is defined by

$$ T^{\theta^{-1}}_C(\bar p) = \{ w \mid \exists\, t^\nu \downarrow 0,\ w^\nu \to w \text{ with } \bar p + \Theta(t^\nu) w^\nu \in C \}. \tag{2.1} $$

Proposition 2.1. If A is an invertible mapping from V ∈ 𝒩(p̄) to A(V), A and A⁻¹ are continuous, and Assumption 2.1 holds, then

$$ T_{A(C)}(A(\bar p)) = \xi^{-1}\big(T^{\theta^{-1}}_C(\bar p)\big) \tag{2.2} $$

and

$$ \hat N_{A(C)}(A(\bar p)) = \{ v \mid \langle v, \xi^{-1}(w)\rangle \le 0,\ w \in T^{\theta^{-1}}_C(\bar p) \}. \tag{2.3} $$

Proof. Let d ∈ T_{A(C)}(A(p̄)). Then there exist sequences {tν}, {pν} satisfying tν ↓ 0 and pν → p̄ such that

$$ d = \lim_{\nu \to \infty} \frac{A(p^\nu) - A(\bar p)}{t^\nu}. $$

According to Assumption 2.1, one has that

$$ p^\nu = \bar p + \Theta(t^\nu)\,\xi(d^\nu) \in C, \qquad \text{where } d^\nu = \frac{A(p^\nu) - A(\bar p)}{t^\nu} \text{ and } A(p^\nu) = A(\bar p) + t^\nu d^\nu. $$

Let wν = ξ(dν); then wν → w = ξ(d). Therefore w ∈ T^{θ⁻¹}_C(p̄) and d ∈ ξ⁻¹(T^{θ⁻¹}_C(p̄)), and in turn

$$ T_{A(C)}(A(\bar p)) \subset \xi^{-1}\big(T^{\theta^{-1}}_C(\bar p)\big). $$

On the other hand, if w ∈ T^{θ⁻¹}_C(p̄), then there exist tν ↓ 0 and wν → w such that p̄ + Θ(tν)wν ∈ C. Let dν = ξ⁻¹(wν); then dν → d = ξ⁻¹(w), satisfying

$$ \bar p + \Theta(t^\nu)\,\xi(d^\nu) \in C. $$

From Assumption 2.1, the above inclusion implies that

$$ A(\bar p + \Theta(t^\nu)\,\xi(d^\nu)) = \bar q + t^\nu d^\nu \in A(C), $$

which implies that ξ⁻¹(w) = d ∈ T_{A(C)}(A(p̄)). Therefore we obtain ξ⁻¹(T^{θ⁻¹}_C(p̄)) ⊂ T_{A(C)}(A(p̄)), and in turn (2.2) is valid.

Noting that $\hat N_{A(C)}(A(\bar p))$ is the polar of T_{A(C)}(A(p̄)), we obtain (2.3) from the expression for T_{A(C)}(A(p̄)) in (2.2). □

2.2. θ⁻¹-subderivatives

Let φ : R^r → R̄ and ψ : R^r → R̄, where R̄ = R ∪ {±∞}, and let C = epi ψ. Assume that φ and ψ are connected by

$$ \phi(q) = \psi(A^{-1}(q)) \qquad \text{or} \qquad \psi(p) = \phi(A(p)). $$

Define Ã : R^{r+1} → R^{r+1} by

$$ \tilde A(p, \alpha) = (A(p), \alpha); $$

then Ã is invertible on V × R with Ã⁻¹(q, α) = (A⁻¹(q), α) for q ∈ A(V) and α ∈ R, and

$$ \operatorname{epi}\phi \cap (A(V) \times \mathbb{R}) = \{ (q, \alpha) \mid \psi(p) \le \alpha,\ p \in V \} = \{ \tilde A(p, \alpha) \mid (p, \alpha) \in \operatorname{epi}\psi,\ p \in V \} = \tilde A(\operatorname{epi}\psi) \cap (A(V) \times \mathbb{R}). $$

Let

$$ \Delta_\tau \phi(\bar q)(d) = \frac{\phi(\bar q + \tau d) - \phi(\bar q)}{\tau}. $$

Then we have, for sufficiently small τ > 0, that

$$ \operatorname{epi} \Delta_\tau \phi(\bar q) = \frac{\operatorname{epi}\phi - (\bar q, \phi(\bar q))}{\tau} = \frac{\tilde A(\operatorname{epi}\psi) - \tilde A(\bar p, \psi(\bar p))}{\tau}. $$

Then the subderivative (see [12]) of φ at q̄, dφ(q̄), can be generated by

$$ \operatorname{epi} \mathrm{d}\phi(\bar q) = T_{\tilde A(\operatorname{epi}\psi)}\big(\tilde A(\bar p, \psi(\bar p))\big). \tag{2.4} $$

The regular subdifferential of φ at q̄ in the sense of Rockafellar and Wets [12], $\hat\partial\phi(\bar q)$, can be calculated by (see Theorem 8.9 of [12])

$$ \hat\partial \phi(\bar q) = \{ v \mid (v, -1) \in \hat N_{\tilde A(\operatorname{epi}\psi)}(\tilde A(\bar p, \psi(\bar p))) \}, \tag{2.5} $$

and if φ is lsc at q̄, the subdifferential ∂φ(q̄) and the horizon subdifferential ∂^∞φ(q̄) are calculated by

$$ \partial \phi(\bar q) = \{ v \mid (v, -1) \in N_{\tilde A(\operatorname{epi}\psi)}(\tilde A(\bar p, \psi(\bar p))) \} $$

and

$$ \partial^\infty \phi(\bar q) = \{ v \mid (v, 0) \in N_{\tilde A(\operatorname{epi}\psi)}(\tilde A(\bar p, \psi(\bar p))) \}. $$

Let A satisfy Assumption 2.1. Define θ̃(τ) = (θ(τ)^T, τ)^T and ξ̃(d, α) = (ξ(d)^T, α)^T. Then

$$ \tilde A^{-1}\big(\bar q + \tau d,\ \psi(\bar p) + \tau\alpha\big) = \begin{pmatrix} \bar p \\ \psi(\bar p) \end{pmatrix} + \tilde\Theta(\tau)\,\tilde\xi(d, \alpha), $$

where Θ̃(τ) = diag(θ̃₁(τ), …, θ̃_{r+1}(τ)) = diag(θ₁(τ), …, θ_r(τ), τ). In this case we obtain from Proposition 2.1 that

$$ T_{\tilde A(\operatorname{epi}\psi)}\big(\tilde A(\bar p, \psi(\bar p))\big) = \tilde\xi^{-1}\big(T^{\tilde\theta^{-1}}_{\operatorname{epi}\psi}(\bar p, \psi(\bar p))\big). \tag{2.6} $$

Let us analyze the expressions of Δ_τφ(q̄)(d) and epi Δ_τφ(q̄) for a special mapping A satisfying Assumption 2.1. Under Assumption 2.1, one has

$$ \Delta_\tau \phi(\bar q)(d) = \frac{\phi(\bar q + \tau d) - \phi(\bar q)}{\tau} = \frac{\psi(A^{-1}(\bar q + \tau d)) - \psi(A^{-1}(\bar q))}{\tau} = \frac{\psi(\bar p + \Theta(\tau)\xi(d)) - \psi(\bar p)}{\tau} = \Delta^\theta_\tau \psi(\bar p)(\xi(d)), $$

where Δ^θ_τψ(p̄) is defined by

$$ \Delta^\theta_\tau \psi(\bar p)(w) = \frac{\psi(\bar p + \Theta(\tau) w) - \psi(\bar p)}{\tau} $$

for a function θ : R₊ → R^r₊.

Now we are in a position to introduce the concept of θ⁻¹-subderivative. The θ⁻¹-subderivative of ψ at p̄ along w is defined by

$$ \mathrm{d}^{\theta^{-1}}\psi(\bar p)(w) = \liminf_{\tau \downarrow 0,\ w' \to w} \Delta^\theta_\tau \psi(\bar p)(w'). $$

For establishing the relations between Δ^θ_τψ(p̄) and Δ_τφ(q̄), we introduce the notation Δ̄^θ_τψ(p̄) and Δ̄_τφ(q̄), defined by

$$ \bar\Delta^\theta_\tau \psi(\bar p)(w) = \Delta^\theta_\tau \psi(\bar p)(w) + \delta(\tau \mid [0, \varepsilon)) $$

and

$$ \bar\Delta_\tau \phi(\bar q)(d) = \Delta_\tau \phi(\bar q)(d) + \delta(\tau \mid [0, \varepsilon)), $$

where δ(· | B) is the indicator function of the set B [12].

Proposition 2.2. Let ψ and A satisfy Assumption 2.1, and let θ, ξ and ξ⁻¹ be continuous. Then:

(i) the graph and epigraph of the mapping (τ, w) → Δ̄^θ_τψ(p̄)(w) coincide with those of (τ, w) → Δ̄_τφ(q̄)(ξ⁻¹(w));
(ii) dφ(q̄)(d) = d^{θ⁻¹}ψ(p̄)(ξ(d)) and d^{θ⁻¹}ψ(p̄)(w) = dφ(q̄)(ξ⁻¹(w));
(iii) epi d^{θ⁻¹}ψ(p̄) = T^{θ̃⁻¹}_{epi ψ}(p̄, ψ(p̄)).

Proof. Assertion (i) comes from the following equality:

$$ \bar\Delta^\theta_\tau \psi(\bar p)(w) = \bar\Delta_\tau \phi(\bar q)(\xi^{-1}(w)), \qquad \tau \in (0, \varepsilon). $$


As ξ is continuous, ξ(B(d, ε)) is a neighborhood of ξ(d) for any d ∈ R^r and ε > 0, and we obtain from Assumption 2.1 that

$$ \begin{aligned} \mathrm{d}\phi(\bar q)(d) &= \liminf_{d' \to d,\ \tau \downarrow 0} \Delta_\tau \phi(\bar q)(d') = \liminf_{d' \to d,\ t \downarrow 0} \frac{\psi(\bar p + \Theta(t)\xi(d')) - \psi(\bar p)}{t} \\ &= \sup_{\varepsilon > 0,\ \delta > 0}\ \inf_{d' \in B(d, \varepsilon),\ t \in (0, \delta)} \frac{\psi(\bar p + \Theta(t)\xi(d')) - \psi(\bar p)}{t} \\ &= \sup_{\varepsilon > 0,\ \delta > 0}\ \inf_{w' \in \xi(B(d, \varepsilon)),\ t \in (0, \delta)} \frac{\psi(\bar p + \Theta(t)w') - \psi(\bar p)}{t} \\ &\ge \sup_{\varepsilon' > 0,\ \delta > 0}\ \inf_{w' \in B(\xi(d), \varepsilon'),\ t \in (0, \delta)} \frac{\psi(\bar p + \Theta(t)w') - \psi(\bar p)}{t} \\ &= \mathrm{d}^{\theta^{-1}}\psi(\bar p)(\xi(d)). \end{aligned} $$

On the other hand, there are sequences {wν}, {tν} satisfying wν → ξ(d) and tν ↓ 0 such that

$$ \mathrm{d}^{\theta^{-1}}\psi(\bar p)(\xi(d)) = \lim_{\nu \to \infty} \frac{\psi(\bar p + \Theta(t^\nu) w^\nu) - \psi(\bar p)}{t^\nu}. $$

Let dν = ξ⁻¹(wν); then, from the continuity of ξ⁻¹, dν → d and

$$ \lim_{\nu \to \infty} \Delta_{t^\nu}\phi(\bar q)(d^\nu) = \mathrm{d}^{\theta^{-1}}\psi(\bar p)(\xi(d)), $$

which implies that d^{θ⁻¹}ψ(p̄)(ξ(d)) ≥ dφ(q̄)(d). Therefore the equalities in (ii) are valid.

Noting, under the conditions given, that

$$ \operatorname{epi} \mathrm{d}\phi(\bar q) = T_{\operatorname{epi}\phi}(\bar q, \phi(\bar q)) = T_{\tilde A(\operatorname{epi}\psi)}\big(\tilde A(\bar p, \psi(\bar p))\big), $$

we obtain (iii) from equality (2.6) and assertion (ii). The proof is completed. □

Let $\hat\partial_A\psi(\bar p)$ be the so-called regular subdifferential of ψ at p̄ with respect to the mapping A, defined by

$$ \hat\partial_A \psi(\bar p) = \{ v \mid \psi(p) \ge \psi(\bar p) + \langle v, A(p) - A(\bar p)\rangle + o(\|A(p) - A(\bar p)\|) \}. $$

Proposition 2.3. The following conclusions are valid:

(i) $\hat\partial\phi(\bar q) = \hat\partial_A\psi(\bar p)$;
(ii) if A and A⁻¹ are continuous, then $v \in \hat\partial_A\psi(\bar p)$ if and only if

$$ \liminf_{p \to \bar p,\ p \ne \bar p} \frac{\psi(p) - \psi(\bar p) - \langle v, A(p) - A(\bar p)\rangle}{\|A(p) - A(\bar p)\|} \ge 0; \tag{2.7} $$

(iii) if A and A⁻¹ are continuous and Assumption 2.1 holds, then $v \in \hat\partial\phi(\bar q)$ (equivalently, $v \in \hat\partial_A\psi(\bar p)$) if and only if

$$ \mathrm{d}^{\theta^{-1}}\psi(\bar p)(w) \ge \langle v, \xi^{-1}(w)\rangle \quad \text{for all } w, $$

that is,

$$ \hat\partial_A \psi(\bar p) = \{ v \mid \mathrm{d}^{\theta^{-1}}\psi(\bar p)(w) \ge \langle v, \xi^{-1}(w)\rangle,\ \forall w \}. $$

Proof. For any $v \in \hat\partial\phi(\bar q)$, one has

$$ \phi(q) \ge \phi(\bar q) + \langle v, q - \bar q\rangle + o(\|q - \bar q\|), $$

which is equivalent to

$$ \psi(p) \ge \psi(\bar p) + \langle v, A(p) - A(\bar p)\rangle + o(\|A(p) - A(\bar p)\|) \tag{2.8} $$

as A is a one-to-one and onto mapping from V to A(V). Therefore v is an element of $\hat\partial_A\psi(\bar p)$ and $\hat\partial\phi(\bar q) \subset \hat\partial_A\psi(\bar p)$. The equality in (i) holds, as the inverse inclusion can be proved in a similar way.

It follows from (2.8) that $v \in \hat\partial\phi(\bar q)$ if and only if

$$ \liminf_{A(p) \to A(\bar p),\ A(p) \ne A(\bar p)} \frac{\psi(p) - \psi(\bar p) - \langle v, A(p) - A(\bar p)\rangle}{\|A(p) - A(\bar p)\|} \ge 0, $$

which is equivalent to (2.7), since [A(p) → A(p̄), A(p) ≠ A(p̄)] is equivalent to [p → p̄, p ≠ p̄] under the condition that both A and A⁻¹ are continuous. This proves (ii).

Setting p = p̄ + Θ(t)w′, we have from Assumption 2.1 that A(p) = q̄ + tξ⁻¹(w′). From (2.8), one has that $v \in \hat\partial_A\psi(\bar p)$ if and only if, for any t > 0 and w′ ∈ R^r,

$$ \psi(\bar p + \Theta(t)w') \ge \psi(\bar p) + \langle v, t\,\xi^{-1}(w')\rangle + o(\|t\,\xi^{-1}(w')\|), $$

which is equivalent to

$$ \liminf_{t \downarrow 0,\ w' \to w} \frac{\psi(\bar p + \Theta(t)w') - \psi(\bar p)}{t} - \langle v, \xi^{-1}(w)\rangle \ge 0, $$

namely

$$ \mathrm{d}^{\theta^{-1}}\psi(\bar p)(w) \ge \langle v, \xi^{-1}(w)\rangle, $$

which proves (iii). □

2.3. Examples

Now we give several examples to illustrate the notion of θ⁻¹-subderivative.

Example 2.1. Let ψ(p) = max{e^{p₁} − 1, log(p₂ + 1), ∛p₃} and p̄ = 0. It is obvious that epi ψ is nonconvex, ψ is not even Lipschitz continuous, and dψ(0)(w) is not a proper function of w. Let A(p) = (e^{p₁} − 1, log(p₂ + 1), ∛p₃); then A⁻¹(q) = (log(q₁ + 1), e^{q₂} − 1, q₃³) and

$$ \phi(q) = \psi(A^{-1}(q)) = \max\{q_1, q_2, q_3\}, \qquad \bar q = (0, 0, 0). $$

It is evident that epi φ is convex, T_{epi φ}(0, 0) = epi max{·, ·, ·} and dφ(0)(d) = max{d₁, d₂, d₃}.
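A quick numerical check of this composition (our sketch, not from the paper; cube roots of negatives are taken via copysign, since float exponentiation of a negative base is not real-valued in Python):

```python
import math

def cbrt(x):
    # real cube root, valid for negative arguments too
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def psi(p):
    # nonconvex and non-Lipschitz near 0
    return max(math.exp(p[0]) - 1.0, math.log(p[1] + 1.0), cbrt(p[2]))

def A_inv(q):
    return (math.log(q[0] + 1.0), math.exp(q[1]) - 1.0, q[2] ** 3)

def phi(q):
    return psi(A_inv(q))

# the transformation straightens each component: phi(q) = max(q1, q2, q3)
print(phi((0.3, -0.2, 0.1)))  # 0.3 up to rounding
```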

Example 2.2. Let

$$ \psi(p) = \begin{cases} p \sin \dfrac{1}{p^{3/7}}, & p \ne 0, \\ 0, & p = 0, \end{cases} $$

and p̄ = 0. Then dψ(0)(w) = −|w|. If we choose A(p) = p^{3/7}, then A⁻¹(q) = q^{7/3}. Let φ(q) = ψ(A⁻¹(q)); then

$$ \phi(q) = \begin{cases} q^{7/3} \sin \dfrac{1}{q}, & q \ne 0, \\ 0, & q = 0, \end{cases} $$

and φ is differentiable at q̄ = 0 with φ′(0) = 0. This example shows that a suitable variable transformation can improve the differentiability property.
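Numerically (our sketch), the oscillation of φ near 0 is damped by the factor q^{7/3}: since |φ(t)/t| ≤ t^{4/3} → 0, the difference quotient at 0 vanishes, confirming φ′(0) = 0.

```python
import math

def phi(q):
    return 0.0 if q == 0 else q ** (7.0 / 3.0) * math.sin(1.0 / q)

# the difference quotient at 0 is bounded by t**(4/3) and shrinks to 0
for t in (1e-2, 1e-4, 1e-6):
    print(t, abs(phi(t) / t))
```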

Example 2.3. Let ψ(p) = √([p]₊) and p̄ = 0, where [p]₊ = max{0, p}. It is obvious that epi ψ is nonconvex, T_{epi ψ}(0, 0) = R₋ × R₊ and

$$ \mathrm{d}\psi(0)(w) = \begin{cases} +\infty & \text{if } w > 0, \\ 0 & \text{if } w \le 0. \end{cases} $$

Let A(p) = μ(p) = sgn(p)√|p|; then A⁻¹(q) = sgn(q)q² and q̄ = 0. We have

$$ \phi(q) = \psi(A^{-1}(q)) = \sqrt{[\operatorname{sgn}(q)q^2]_+} = [q]_+. $$

It is evident that epi φ is convex, T_{epi φ}(0, 0) = epi [·]₊ and dφ(0)(d) = [d]₊.

As μ⁻¹(0 + td) = 0 + t² sgn(d)d², we can choose θ(t) = t² and ξ(d) = sgn(d)d². It is easy to check that Assumption 2.1 holds for the mapping A. Then it follows from Proposition 2.2 that

$$ \mathrm{d}^{\theta^{-1}}\psi(0)(w) = \mathrm{d}\phi(0)(\xi^{-1}(w)) = \begin{cases} \sqrt{w} & \text{if } w > 0, \\ 0 & \text{if } w \le 0 \end{cases} \;=\; \sqrt{[w]_+}. $$

Example 2.4. Let ψ(p) = p^{1/(2k+1)}, where k ≥ 1 is an integer, and p̄ = 0. It is obvious that epi ψ is nonconvex, T_{epi ψ}(0, 0) = R₋ × R and

$$ \mathrm{d}\psi(0)(w) = \begin{cases} +\infty & \text{if } w > 0, \\ -\infty & \text{if } w \le 0. \end{cases} $$

Let A(p) = μ(p) = p^{1/(2k+1)}; then A⁻¹(q) = q^{2k+1} and q̄ = 0. We have

$$ \phi(q) = \psi(A^{-1}(q)) = q. $$

It is evident that dφ(0)(d) = d. As μ⁻¹(0 + td) = 0 + t^{2k+1}d^{2k+1}, we can choose θ(t) = t^{2k+1} and ξ(d) = d^{2k+1}. It is easy to check that Assumption 2.1 holds for the mapping A. Then it follows from Proposition 2.2 that

$$ \mathrm{d}^{\theta^{-1}}\psi(0)(w) = \mathrm{d}\phi(0)(\xi^{-1}(w)) = \sqrt[2k+1]{w}. $$

Example 2.5. Let ψ be the same function as in Example 2.1. Let A(p) = (p₁, p₂, ∛p₃)^T, p̄ = (0, 0, 0)^T and q̄ = A(p̄) = (0, 0, 0)^T. Then A⁻¹(q) = (q₁, q₂, q₃³)^T and φ(q) = max{e^{q₁} − 1, log(q₂ + 1), q₃}. It is obvious that A⁻¹(q̄ + τd) = p̄ + Θ(τ)ξ(d), where θ(τ) = (τ, τ, τ³)^T and ξ(d) = (d₁, d₂, d₃³)^T. We have from Proposition 2.2 that

$$ \mathrm{d}^{\theta^{-1}}\psi(0)(w) = \mathrm{d}\phi(0)(\xi^{-1}(w)) = \max\{w_1, w_2, \sqrt[3]{w_3}\}. $$

2.4. Optimality conditions

First we consider the unconstrained minimization problem

$$ \min_{x \in \mathbb{R}^n} f_0(x), \tag{2.9} $$

where f₀ : R^n → R̄ is a proper lsc function.

Proposition 2.4. Let x* be a point of R^n at which there exists an invertible mapping A from a neighborhood V ∈ 𝒩(x*) to A(V) such that both A and A⁻¹ are continuous over V and A(V), respectively, and let A satisfy Assumption 2.1 at x*. Then, for x* to be a local minimizer of Problem (2.9), it is necessary that

$$ \mathrm{d}^{\theta^{-1}} f_0(x^*)(d) \ge 0 \quad \text{for all } d \in \mathbb{R}^n, $$

or, equivalently,

$$ 0 \in \hat\partial_A f_0(x^*). $$


Proof. Let d ∈ R^n. There exist {d^k}, {t^k} satisfying d^k → d and t^k ↓ 0 such that

$$ \mathrm{d}^{\theta^{-1}} f_0(x^*)(d) = \lim_{k \to \infty} \frac{f_0(x^* + \Theta(t^k) d^k) - f_0(x^*)}{t^k}. $$

Since θ is continuous with θ(0) = 0, we have for k sufficiently large that x* + Θ(t^k)d^k ∈ V₀, where V₀ ∈ 𝒩(x*) is a neighborhood of x* such that f₀(x) ≥ f₀(x*) for x ∈ V₀. Therefore

$$ \frac{f_0(x^* + \Theta(t^k) d^k) - f_0(x^*)}{t^k} \ge 0 $$

for sufficiently large k, which implies that d^{θ⁻¹}f₀(x*)(d) ≥ 0.

It follows from (iii) of Proposition 2.3 that 0 ∈ $\hat\partial_A f_0(x^*)$ is equivalent to d^{θ⁻¹}f₀(x*)(d) ≥ 0 for all d ∈ R^n. □

Now we consider the constrained optimization problem

$$ \min \{ f_0(x) \mid x \in C \}. \tag{2.10} $$

Similarly to Theorem 8.15 of [12], we can demonstrate the following necessary conditions for Problem (2.10).

Proposition 2.5. Let f₀ : R^n → R̄ be a proper lsc function and C ⊂ R^n a closed set, and let x* ∈ C be a point at which the following constraint qualification is fulfilled:

$$ \left. \begin{array}{l} v \in \partial^\infty_A f_0(x^*) \\ -v \in N_{A(C)}(A(x^*)) \end{array} \right\} \Longrightarrow v = 0. $$

Then, for x* to be a local minimizer of Problem (2.10), it is necessary that

$$ 0 \in \partial_A f_0(x^*) + N_{A(C)}(A(x^*)), \tag{2.11} $$

which, in the case that $\hat\partial_A f_0(x^*) = \partial_A f_0(x^*)$ and $\hat N_{A(C)}(A(x^*)) = N_{A(C)}(A(x^*))$, is equivalent to

$$ \mathrm{d}^{\theta^{-1}} f_0(x^*)(d) \ge 0 \quad \text{for all } d \in T^{\theta^{-1}}_C(x^*). \tag{2.12} $$

Proof. Let f̄₀(z) = f₀(A⁻¹(z)). Under the conditions given, x* being a local minimizer of Problem (2.10) implies that A(x*) is a local minimizer of

$$ \min \{ \bar f_0(z) \mid z \in A(C) \}. \tag{2.13} $$

Since

$$ \hat\partial \bar f_0(A(x^*)) = \hat\partial_A f_0(x^*), \qquad \partial \bar f_0(A(x^*)) = \partial_A f_0(x^*), \qquad \partial^\infty \bar f_0(A(x^*)) = \partial^\infty_A f_0(x^*), $$

the constraint qualification means that

$$ \left. \begin{array}{l} v \in \partial^\infty \bar f_0(A(x^*)) \\ -v \in N_{A(C)}(A(x^*)) \end{array} \right\} \Longrightarrow v = 0. $$

Then it follows from Theorem 8.15 of [12] that

$$ 0 \in \partial \bar f_0(A(x^*)) + N_{A(C)}(A(x^*)), $$

which is just (2.11). The conditions $\hat\partial_A f_0(x^*) = \partial_A f_0(x^*)$ and $\hat N_{A(C)}(A(x^*)) = N_{A(C)}(A(x^*))$ imply that f̄₀ and A(C) are regular at A(x*) (in the sense of [12]), and it also follows from Theorem 8.15 of [12] that (2.11) is equivalent to

$$ \mathrm{d}\bar f_0(A(x^*))(d) \ge 0 \quad \text{for all } d \in T_{A(C)}(A(x^*)). \tag{2.14} $$

We have from Proposition 2.2 that d f̄₀(A(x*))(d) = d^{θ⁻¹}f₀(x*)(ξ(d)) and from (2.2) that T_{A(C)}(A(x*)) = ξ⁻¹(T^{θ⁻¹}_C(x*)), and therefore we obtain (2.12) from (2.14). □


3. Zero duality gap and augmented Lagrangians

Consider the primal problem

$$ \begin{array}{ll} \text{minimize} & f_0(x) \\ \text{subject to} & G(x) \in \mathbb{R}^{m_1}_- \times \{0\}^{m - m_1}, \\ & x \in X, \end{array} \tag{3.1} $$

where G(x) = (g₁(x), …, g_m(x))^T and X ⊂ R^n is a closed set. Let

$$ f(x) = f_0(x) + \delta(x \mid \Omega), \qquad \Omega = \{ x \in X \mid G(x) \in \mathbb{R}^{m_1}_- \times \{0\}^{m - m_1} \}; $$

then Problem (3.1) is equivalent to the problem

$$ \inf_{x \in \mathbb{R}^n} f(x). \tag{3.2} $$

The classic perturbation function is

$$ \varsigma(u) = \inf_{x \in \mathbb{R}^n} f(x, u), $$

where

$$ f(x, u) = f_0(x) + \delta(x \mid \Omega(u)), \qquad \Omega(u) = \{ x \in X \mid G(x) + u \in \mathbb{R}^{m_1}_- \times \{0\}^{m - m_1} \}. \tag{3.3} $$

Here f(x, u) is constructed by perturbing the constraint mapping linearly, which yields the convexity of f(x, u) in u.

In this section, we consider nonlinear perturbations, for which the convexity of the dualizing parameterization function required by [12] is not guaranteed. Therefore we here call a function φ : R^n × R^m → R̄ a dualizing parameterization function for f only if f(x) = φ(x, 0) for all x ∈ R^n, and we do not require φ to have the form (3.3).

3.1. A class of augmented Lagrangians

We first give some definitions for use in deriving augmented Lagrangians based on nonlinear perturbations.

Definition 3.1 (Uniform Boundedness [12]). A function φ : R^n × R^m → R̄ with value φ(x, u) is said to be level-bounded in x locally uniformly in u if, for every ū ∈ R^m and α ∈ R, there exist a neighborhood U(ū) of ū and a bounded set D ⊂ R^n such that {x | φ(x, u) ≤ α} ⊂ D for any u ∈ U(ū).

Definition 3.2 (Augmented Lagrangian Functions [12]). For a primal problem of minimizing f(x) over x ∈ R^n and any dualizing parameterization f(·) = f̄(·, 0) for a choice of f̄ : R^n × R^m → R̄, consider any augmenting function ∆; by this is meant a proper, lsc, convex function

$$ \Delta : \mathbb{R}^m \to \bar{\mathbb{R}} \quad \text{with} \quad \min \Delta = 0, \quad \operatorname{argmin} \Delta = \{0\}. $$

The corresponding augmented Lagrangian with penalty parameter r > 0 is then the function l : R^n × R^m × (0, ∞) → R̄ defined by

$$ l(\bar f, x, v, r) := \inf_u \{ \bar f(x, u) + r\Delta(u) - \langle v, u\rangle \}. $$

The corresponding augmented dual problem consists of maximizing over all (v, r) ∈ R^m × (0, ∞) the function

$$ \kappa(\bar f, v, r) := \inf_{x, u} \{ \bar f(x, u) + r\Delta(u) - \langle v, u\rangle \}. $$

Definition 3.3 (Generalized Augmented Lagrangians [5]). A function σ : R^m → R̄ is said to be a generalized augmenting function if it is proper, lsc and level-bounded on R^m with min σ = 0 and argmin σ = {0}. The generalized augmented Lagrangian l : R^n × R^m × (0, ∞) → R̄ is defined by

$$ l(\bar f, x, v, r) := \inf_u \{ \bar f(x, u) + r\sigma(u) - \langle v, u\rangle \}. $$


Now let us introduce a class of generalized augmented Lagrangians based on variable transformations, which is different from Definition 3.3.

Definition 3.4 (A-Augmented Lagrangian). The A-augmented Lagrangian with penalty parameter r > 0 is the function l_A : R^n × R^m × (0, ∞) → R̄ defined by (1.1), namely

$$ l_A(\phi, x, v, r) := \inf_u \{ \phi(x, u) + r\Delta(A(u)) - \langle v, A(u)\rangle \}. $$

The corresponding augmented dual problem consists of maximizing over all (v, r) ∈ R^m × (0, ∞) the function

$$ \kappa_A(\phi, v, r) := \inf_{x, u} \{ \phi(x, u) + r\Delta(A(u)) - \langle v, A(u)\rangle \}, $$

where A is an invertible mapping from V ∈ 𝒩(0) to A(V) and ∆ is an augmenting function satisfying the conditions in Definition 3.2 or a generalized augmenting function satisfying the conditions in Definition 3.3.

We now give two examples of l_A(φ, x, v, r).

Example 3.1. Assume that A : R^m → R^m is a mapping satisfying Assumption 2.1 and that A is order-preserving, namely A(u) ≤ A(u′) if and only if u ≤ u′ (assume in addition that each μ_i is odd, so that μ_i(−t) = −μ_i(t)). Let ∆(·) = ½‖·‖₂² and φ(x, u) = f(x, u); then

$$ l_A(\phi, x, v, r) = f_0(x) + \delta_X(x) + \frac{r}{2}\left[ \sum_{j=1}^{m_1} \big( (r^{-1}v_j + \mu_j(g_j(x)))_+ \big)^2 - \sum_{j=1}^{m_1} (r^{-1}v_j)^2 \right] + \sum_{i=m_1+1}^{m} \left[ v_i\,\mu_i(g_i(x)) + \frac{r}{2}\,\mu_i(g_i(x))^2 \right]. $$

Example 3.2. Assume that the mapping A satisfies the conditions of Example 3.1, ∆(·) = ‖·‖₁ and φ(x, u) = f(x, u); then, for |v_j| ≤ r, j = 1, …, m₁,

$$ l_A(\phi, x, v, r) = f_0(x) + \delta_X(x) + \sum_{j=1}^{m_1} \big[ v_j\,(\mu_j(g_j(x)))_+ + r\,(\mu_j(g_j(x)))_+ \big] + \sum_{i=m_1+1}^{m} \big[ v_i\,\mu_i(g_i(x)) + r\,|\mu_i(g_i(x))| \big]. $$

We have the following obvious conclusion, which is similar to Proposition 2.1 of [5].

Proposition 3.1. For any dualizing parameterization function φ and any augmenting function ∆:

(i) the A-augmented Lagrangian l_A(φ, x, v, r) is concave and upper semicontinuous in (v, r);
(ii) weak duality holds:

$$ \kappa_A(\phi, v, r) \le \beta(0), $$

where β(u) = inf_x φ(x, u) is the perturbation function corresponding to φ.

The following theorem has a format similar to that of Theorem 11.59 of [12].

Theorem 3.1 (Duality). For a problem of minimizing f(x) over x ∈ R^n, suppose that Assumption 2.1 is satisfied at 0 with A(0) = 0, and consider the A-augmented Lagrangian l_A(φ, x, v, r) associated with a dualizing parameterization f = φ(·, 0), φ : R^n × R^m → R̄, and an augmenting function ∆ : R^m → R̄. Suppose that φ(x, u) is level-bounded in x locally uniformly in u and that inf_x l_A(φ, x, v, r) > −∞ for at least one (v, r) ∈ R^m × (0, ∞). Then

$$ f(x) \ge \sup_{v, r} l_A(\phi, x, v, r), \qquad \kappa_A(\phi, v, r) = \inf_x l_A(\phi, x, v, r), $$

and

$$ \inf_x f(x) = \inf_x \Big[ \sup_{v, r} l_A(\phi, x, v, r) \Big] = \sup_{v, r} \Big[ \inf_x l_A(\phi, x, v, r) \Big] = \sup_{v, r} \kappa_A(\phi, v, r). \tag{3.4} $$


Furthermore,

$$ \operatorname*{argmax}_{v, r} \kappa_A(\phi, v, r) = \{ (v, r) \mid \beta(u) \ge \beta(0) + \langle v, A(u)\rangle - r\Delta(A(u)),\ \forall u \}. \tag{3.5} $$

Proof. Noting A(0) = 0 and A−1(0) = 0, ψ(x, y) := φ(x, A−1(y)) satisfies f (x) = ψ(x, 0) and ψ is also adualizing parameterization function of f . The perturbation function corresponding to ψ is

υ(y) = β(A−1 y),

and

l(ψ, x, v, r) = lA(φ, x, v, r), κ(ψ, v, r) = κA(φ, v, r).

As A and A−1 are continuous and A(0) = 0 = A−1(0), we know that φ is level-bounded in x locally uniformly inu near 0 if and only if ψ is level-bounded in x locally in y near 0. As the convexity of ψ(x, y) with respect to y isabsent, 11.59 Theorem of [12] cannot be employed directly, we need to verify the validity of all conclusions in thistheorem.

From the definition of l(ψ, x, v, r), we have

ψ(x, y)+ r∆(y)− 〈v, y〉 ≥ l(ψ, x, v, r), ∀u ∈ Rm,

which yields the inequality

f (x) = ψ(x, 0) ≥ l(ψ, x, v, r)

if setting y = 0. The equality κ(ψ, v, r) = infx l(ψ, x, v, r) is obvious. Since

infx

f (x) ≥ infx

[supv,r

l(ψ, x, v, r)] ≥ supv,r

[infx

l(ψ, x, v, r)] = supv,rκ(ψ, v, r),

for establishing (3.4), we only need to show that

infx

f (x) = υ(0) = supv,rκ(ψ, v, r).

By hypothesis there is at least one pair (v, r) such that κ(ψ, v, r) is finite. To get the equality in question, it sufficesto prove that κ(ψ, v, r) −→ υ(0) as r → ∞. We can rewrite κ(ψ, v, r) as follows

κ(ψ, v, r + s) = infy

υ(y)+ s∆(y),

where s = r − r and υ(y) = υ(y)+ r∆(y)− 〈v, y〉. Now we proceed to prove that κ(ψ, v, r + s) tends to υ(0) as stends to ∞.

Because ∆ is convex with argmin∆ = 0, it is level-coercive (by 3.27 Corollary of [12]). Since

υ(y) ≥ κ(ψ, v, r),

namely υ being bounded below, we have from (a) of 3.26 Theorem of [12] that υ + s∆ is also level-coercive. Since∆(y) > 0 for y 6= 0 and ∆(0) = 0, υ + s∆ increases pointwise as s → ∞ to δ0 + υ(0). It follows from (d) of7.4 Proposition of [12] that υ + s∆ epi-converges to δ0 + υ(0). Therefore we have from 7.33 Theorem of [12] thatinf(υ + s∆) −→ inf(δ0 + υ(0)) = υ(0). The equalities (3.4) are demonstrated.

For (v′, r′) ∈ argmax_{v,r} κ(ψ, v, r), it is sufficient and necessary that

υ(0) = κ(ψ, v′, r′) = inf_y {υ(y) + r′∆(y) − 〈v′, y〉},

which is equivalent to

υ(y) ≥ υ(0)+ 〈v′, y〉 − r ′∆(y), ∀y.

The last inequality implies the equality (3.5).
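The key limit in the proof above, namely that the infimum of a bounded-below function plus s∆ tends to the function's value at 0 as s → ∞, can be illustrated numerically. The functions below are illustrative choices, not taken from the paper.

```python
import numpy as np

# Toy instance (illustrative choices, not from the paper):
# upsilon_bar is bounded below with upsilon_bar(0) = 0,
# Delta is convex with argmin Delta = {0}.
def upsilon_bar(y):
    return -np.minimum(np.abs(y), 1.0)

def Delta(y):
    return y ** 2

y = np.linspace(-10.0, 10.0, 200001)
mins = {s: float((upsilon_bar(y) + s * Delta(y)).min()) for s in (1.0, 10.0, 1000.0)}
# Here inf(upsilon_bar + s*Delta) = -1/(4s), which increases to upsilon_bar(0) = 0 as s grows.
print(mins)
```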

Remark 3.1. (i) From the proof of 11.59 Theorem of [12], under the conditions of Theorem 3.1 there is actually a v such that β(0) = lim_{r→∞} κA(φ, v, r); this result was extended to the generalized augmented Lagrangian in [5].


L. Zhang, X. Yang / Nonlinear Analysis 69 (2008) 2095–2113 2107

(ii) The results in Theorem 3.1 are also true if ∆ is relaxed to a generalized augmenting function of Definition 3.3; see Theorem 2.1 of [5].

Now we discuss the criterion for the exact penalty property of lA(φ, x, v, r).

Definition 3.5 (Exact Penalty Representation [12]). Let β(u) be the perturbation function corresponding to φ(x, u). Let lA(φ, x, v, r) be an A-augmented Lagrangian defined by Definition 3.4. A vector v ∈ Rm is said to support an exact penalty representation for the problem of minimizing f(x) over x ∈ Rn if there exists r̄ > 0 such that

β(0) = inf_x lA(φ, x, v, r), ∀r ≥ r̄,

and

argmin_x f(x) = argmin_x lA(φ, x, v, r), ∀r ≥ r̄.

Theorem 3.2. In the framework of Theorem 3.1, a vector v supports an exact penalty representation for the primal problem if and only if there exist U ∈ N(0) and r̄ > 0 such that

β(u) ≥ β(0) + 〈v, A(u)〉 − r̄∆(A(u)), for all u ∈ U. (3.6)

This criterion is equivalent to the existence of an r̄ > 0 with (v, r̄) ∈ argmax_{v,r} κA(φ, v, r), and moreover such values r̄ are the ones serving as adequate penalty thresholds for the exact penalty property with respect to v.

Proof. Obviously, (3.6) is equivalent to

υ(y) ≥ υ(0) + 〈v, y〉 − r̄∆(y), for all y ∈ A(U). (3.7)

Since A(U) is a neighborhood of 0, 11(33) in 11.61 Theorem of [12] is valid for the perturbation function υ(y), which is the necessary and sufficient condition for v to support an exact penalty representation.

Considering the special case ∆(·) = ‖ · ‖2, in the following theorem we give a sufficient condition for v, and a sufficient and necessary condition for 0, to support an exact penalty representation in terms of θ−1-subderivatives.

Theorem 3.3. Assume that A is invertible from V ∈ N(0) to A(V), that ξ−1 is continuous, and that min{‖ξ−1(w)‖2 | w ∈ bdry B} > 0. Let Assumption 2.1 be satisfied and ∆(·) = ‖ · ‖2.

(i) If φ satisfies

dθ−1β(0)(w) ≥ 〈v, ξ−1(w)〉, ∀w ∈ bdry B, (3.8)

then v supports an exact penalty representation.

(ii) The vector 0 supports an exact penalty representation if and only if

c = inf_{w∈bdry B} dθ−1β(0)(w) > −∞. (3.9)

Proof. Let (3.8) hold. Then one has for any w ∈ bdry B that

lim inf_{t↓0, w′→w} [β(Θ(t)w′) − β(0) − t〈v, ξ−1(w′)〉]/t ≥ 0.

Therefore for any ρ > 0, there exists εw > 0 such that

β(Θ(t)w′) ≥ β(0)+ 〈v, tξ−1(w′)〉 − ρt, ∀t ∈ [0, εw], w′∈ B(w, εw).

As bdry B is compact, there are a finite number of (wi , εi ), i = 1, . . . , l, such that

bdry B ⊂ ⋃_{i=1}^{l} B(wi, εi).

Page 14: An augmented Lagrangian approach with a variable transformation in nonlinear programming

2108 L. Zhang, X. Yang / Nonlinear Analysis 69 (2008) 2095–2113

Taking ε = min{ε1, . . . , εl}, we have

β(Θ(t)w) ≥ β(0)+ 〈v, tξ−1(w)〉 − ρt, ∀t ∈ [0, ε], w ∈ bdry B. (3.10)

Let U = Θ([0, ε])bdry B and c0 = max{1/‖ξ−1(w)‖2 | w ∈ bdry B}; then U ∈ N(0) and c0 is a finite positive scalar by the assumptions on ξ−1. For any u ∈ U, we have u = Θ(t)w for some t ∈ [0, ε] and w ∈ bdry B, so that A(u) = tξ−1(w) and t = ‖A(u)‖2/‖ξ−1(w)‖2. Then it follows from (3.10) that

β(u) ≥ β(0) + 〈v, A(u)〉 − ρc0‖A(u)‖2,

which, from Theorem 3.2, implies the exact penalty representation supported by v; namely, (i) is valid.

Suppose that 0 supports an exact penalty representation. Then we have from Theorem 3.2 that

β(u) ≥ β(0)− r‖A(u)‖2, ∀u ∈ U,

for some neighborhood U ∈ N (0) and r > 0. For sufficiently small ε > 0, ∃tε > 0 such that

Θ(t)w ∈ U, w ∈ B(bdry B, ε),∀t ∈ [0, tε].

Therefore, from the equality A(Θ(t)w) = tξ−1(w), we have

β(Θ(t)w) ≥ β(0) − rt‖ξ−1(w)‖2,

which implies, for w ∈ B(bdry B, ε) and all t ∈ [0, tε], that

∆θt β(0)(w) ≥ −r‖ξ−1(w)‖2 ≥ −r sup{‖ξ−1(w)‖2 | w ∈ B(bdry B, ε)} ≡ −r0.

From the definition of the θ−1-subderivative, we obtain c ≥ −r0, and the necessity is established.

Now we turn to the proof of the sufficiency of (ii). If c > −∞, then for sufficiently small ε > 0 such that inf_{w∈B(bdry B,ε)} ‖ξ−1(w)‖2 > 0, there exists tε > 0 such that

β(Θ(t)w)− β(0) ≥ (c − ε)t, ∀w ∈ B(bdry B, ε),∀t ∈ [0, tε]. (3.11)

Let

r = (c − ε)(sup_{w∈B(bdry B,ε)} ‖ξ−1(w)‖2)^{−1} if c − ε ≥ 0, and r = (c − ε)(inf_{w∈B(bdry B,ε)} ‖ξ−1(w)‖2)^{−1} if c − ε < 0,

t̄ = min{θi(tε) | i = 1, . . . , m} and U = Θ([0, tε])B(bdry B, ε). Then it follows from (3.11), by setting u = Θ(t)w (so that A(u) = tξ−1(w) and t = ‖A(u)‖2/‖ξ−1(w)‖2), that

β(u) − β(0) ≥ −|r|‖A(u)‖2, ∀u ∈ U.

Noting that 0 ∈ [0, t̄]bdry B ⊂ U, namely U ∈ N(0), we see that the above inequality is just (3.6), and the sufficiency of (ii) is demonstrated.

For a general augmenting function ∆, we can demonstrate a sufficient and necessary condition for condition (3.6) under the following assumption.

Assumption 3.1. Assume that ∆ satisfies

γ1 ≡ inf_{w∈bdry B} [d(∆)(0)(w)] > 0, inf_{w∈bdry B} [d(−∆)(0)(w)] > −∞.

Corollary 3.1. Under Assumptions 2.1 and 3.1, condition (3.6) is equivalent to

γ2 ≡ inf_{w∈bdry B} {dθ−1β(0)(w) − 〈v, ξ−1(w)〉} > −∞. (3.12)

Page 15: An augmented Lagrangian approach with a variable transformation in nonlinear programming

L. Zhang, X. Yang / Nonlinear Analysis 69 (2008) 2095–2113 2109

Proof. If (3.6) holds, then for any w ∈ bdry B, t > 0 can be chosen small enough that Θ(t)w ∈ U. Since Assumption 2.1 holds, A(Θ(t)w) = tξ−1(w) and (3.6) becomes

β(Θ(t)w) ≥ β(0) + 〈v, tξ−1(w)〉 − r[∆(tξ−1(w)) − ∆(0)], ∀t > 0 sufficiently small,

which implies

∆θt β(0)(w) ≥ 〈v, ξ−1(w)〉 + r[(−∆)(tξ−1(w)) − (−∆)(0)]/t,

and leads to

dθ−1β(0)(w) − 〈v, ξ−1(w)〉 ≥ r inf_{w′∈bdry B} [d(−∆)(0)(w′)] > −∞,

namely (3.12) holds.

Conversely, if (3.12) holds and γ2 > 0, then there is an ε > 0 such that

υ(td) ≥ υ(0) + 〈v, td〉 + (γ2/2)t, ∀t ∈ [0, ε], ∀d ∈ ξ−1(bdry B),

which implies

υ(y)− υ(0) ≥ 〈v, y〉, ∀y ∈ εξ−1(B).

Therefore, if we choose U = A−1(εξ−1(B)) and r = 0, then (3.6) is valid.

Now suppose γ2 ≤ 0. There is an ε > 0 such that

υ(td) ≥ υ(0) + 〈v, td〉 + (γ2 − 1)t, ∆(td) ≥ (γ1/2)t, ∀t ∈ [0, ε], ∀d ∈ ξ−1(bdry B).

Let r = 2(−γ2 + 1)/γ1. Then

υ(td) ≥ υ(0) + 〈v, td〉 − r∆(td), ∀t ∈ [0, ε], ∀d ∈ ξ−1(bdry B),

which implies

υ(y)− υ(0) ≥ 〈v, y〉 − r∆(y), ∀y ∈ εξ−1(B).

Therefore (3.6) is valid for U = A−1(εξ−1(B)).

3.2. An example

We consider a specific variable transformation defined by

A(u) = (µ1(u), . . . , µm(u))^T, µi(u) = sgn(ui)|ui|^{λi}, λi ∈ (0, ∞). (3.13)

It is easy to check that A is invertible, with

A−1(y) = (µ1^{−1}(y), . . . , µm^{−1}(y))^T, µi^{−1}(y) = sgn(yi)|yi|^{1/λi}. (3.14)

A and A−1 are continuous and Assumption 2.1 holds at u = 0 with

θi(t) = t^{1/λi}, ξi(d) = sgn(di)|di|^{1/λi}.
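The identity A(Θ(t)w) = tξ−1(w) behind Assumption 2.1 can be sanity-checked numerically for the transformation (3.13); the exponent vector `lam` below is an arbitrary sample, not taken from the paper.

```python
import numpy as np

lam = np.array([0.5, 1.0 / 3.0, 2.0])   # sample exponents lambda_i in (0, inf)

def A(u):                               # the transformation (3.13)
    return np.sign(u) * np.abs(u) ** lam

def Theta(t):                           # Theta(t) = diag(t^{1/lambda_1}, ..., t^{1/lambda_m})
    return np.diag(t ** (1.0 / lam))

def xi_inv(w):                          # inverse of xi_i(d) = sgn(d_i)|d_i|^{1/lambda_i}
    return np.sign(w) * np.abs(w) ** lam

rng = np.random.default_rng(0)
w = rng.standard_normal(3)
for t in (1e-1, 1e-3, 1e-6):
    assert np.allclose(A(Theta(t) @ w), t * xi_inv(w))  # A(Theta(t)w) = t * xi^{-1}(w)
```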

We use 1/λ to denote the vector (1/λ1, . . . , 1/λm)^T and use d^{1/λ} to denote dθ−1. Then, for a function g : Rm → R,

d^{1/λ}g(u)(w) = lim inf_{w′→w, t↓0} [g(u + Θ(t)w′) − g(u)]/t, with Θ(t) = diag(t^{1/λ1}, . . . , t^{1/λm}).

In this case we have

∂Ag(0) = {v | d^{1/λ}g(0)(w) ≥ Σ_{i=1}^{m} sgn(wi)|wi|^{λi} vi, ∀w}.


For the case λ1 = · · · = λm = λ, one has

d^{1/λ}g(0)(w) = lim inf_{w′→w, τ↓0} [g(τw′) − g(0)]/τ^λ,

which coincides with d_p g(0;w) in [13] and [14], where it was used to study strict local minima of order p.

Example 3.3. Consider a minimization problem of a non-Lipschitz function as below:

min_u g(u) = max{|u1|^{1/2}, 3u2^{1/3}, −4u3^{1/5}}.

Obviously u = (0, 0, 0)T is a local minimizer. Let

A(u) = (sgn(u1)|u1|^{1/2}, u2^{1/3}, u3^{1/5})^T.

Then

A−1(y) = (sgn(y1)y1^2, y2^3, y3^5)^T

and

g(y) = max{|y1|, 3y2, −4y3}.

It is easy to obtain

dg(0)(d) = max{|d1|, 3d2, −4d3}.

We can verify Assumption 2.1 for g at 0 with

θ(t) = (t^2, t^3, t^5)^T, ξ(d) = (sgn(d1)|d1|^2, d2^3, d3^5)^T, λ = (1/2, 1/3, 1/5)^T.

It follows from Proposition 2.4 that the necessary condition for (0, 0, 0)T being a minimizer is

d^{1/λ}g(0)(w) ≥ 0, ∀w ∈ R3,

namely (from Proposition 2.2)

max{|w1|^{1/2}, 3w2^{1/3}, −4w3^{1/5}} ≥ 0, ∀w ∈ R3.
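A quick numerical check of Example 3.3 (a sketch, not part of the paper): the quotient g(Θ(t)w)/t is independent of t here, so d^{1/λ}g(0)(w) = g(w), which is nonnegative for every w, consistent with the necessary condition above.

```python
import numpy as np

def cbrt(x):    # real (signed) cube root
    return np.sign(x) * np.abs(x) ** (1.0 / 3.0)

def root5(x):   # real (signed) fifth root
    return np.sign(x) * np.abs(x) ** (1.0 / 5.0)

def g(u):       # the objective of Example 3.3
    return max(np.sqrt(abs(u[0])), 3.0 * cbrt(u[1]), -4.0 * root5(u[2]))

def Theta(t):   # theta(t) = (t^2, t^3, t^5)
    return np.array([t ** 2, t ** 3, t ** 5])

rng = np.random.default_rng(1)
for _ in range(100):
    w = rng.standard_normal(3)
    for t in (1e-1, 1e-2, 1e-3):
        assert np.isclose(g(Theta(t) * w), t * g(w))  # g(Theta(t)w) = t * g(w)
    assert g(w) >= 0.0                                # d^{1/lambda}g(0)(w) = g(w) >= 0
```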

Now we discuss the application of the transformation (3.13) to Problem (3.1). Firstly, we express Problem (3.1) equivalently as

minimize f0(x) subject to f1(x) ≤ 0, x ∈ X, (3.15)

where f1(x) = max{0, g1(x), . . . , gm1(x), |gm1+1(x)|, . . . , |gm(x)|}. The conventional dualizing parameterization is

φ(x, u) = f0(x) + δX(x) + δ(x | {x′ | f1(x′) + u ≤ 0})

and the corresponding perturbation function for (3.15) is

β(u) = inf{f0(x) + δX(x) | f1(x) + u ≤ 0}.

Define A : R −→ R by

A(u) = sgn(u)|u|^λ,

where λ ∈ (0,∞). It is obvious that A is invertible and

A−1(y) = sgn(y)|y|^{1/λ},

and both A and A−1 are continuous at every u ∈ R and y ∈ R.
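Invertibility and the inverse formula are easy to confirm numerically; a minimal sketch (not from the paper), using real signed powers:

```python
import numpy as np

def A(u, lam):        # A(u) = sgn(u)|u|^lambda
    return np.sign(u) * np.abs(u) ** lam

def A_inv(y, lam):    # A^{-1}(y) = sgn(y)|y|^{1/lambda}
    return np.sign(y) * np.abs(y) ** (1.0 / lam)

u = np.linspace(-2.0, 2.0, 401)
for lam in (0.5, 1.0, 3.0):
    assert np.allclose(A_inv(A(u, lam), lam), u)  # A^{-1} o A = identity
    assert np.allclose(A(A_inv(u, lam), lam), u)  # A o A^{-1} = identity
```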


Let ∆ be the l1 norm function, namely

∆(y) = ‖y‖1 = |y|, y ∈ R.

According to the scheme above, we have

ψ(x, y) = φ(x, A−1(y)) = f0(x) + δX(x) + δ(x | {x′ | f1(x′) + sgn(y)|y|^{1/λ} ≤ 0}),

and the perturbation in turn is

υ(y) = β(sgn(y)|y|^{1/λ}).

The A-augmented Lagrangian associated with φ is

lA(φ, x, v, r) = f0(x) + δX(x) + v[sgn(f1(x))|f1(x)|^λ]+ + r|f1(x)|^λ, |v| ≤ r,

which actually is

lA(φ, x, v, r) = f0(x) + δX(x) + v f1(x)^λ + r f1(x)^λ, |v| ≤ r. (3.16)

For v = 0, we obtain

lA(φ, x, 0, r) = f0(x) + δX(x) + r[max{0, g1(x), . . . , gm1(x), |gm1+1(x)|, . . . , |gm(x)|}]^λ, (3.17)

which is just the penalty function in [8], and also the λ-th root of the exact penalty function introduced in [9].

Now we discuss the exact penalty property of lA(φ, x, 0, r). It follows from Theorem 3.3 that the function lA(φ, x, 0, r) is an exact penalty function, or 0 supports an exact penalty representation, if and only if there is r > 0 such that

β(u) ≥ β(0)− r∆(A(u)), for sufficiently small |u|,

namely

β(u) ≥ β(0) − r|u|^λ, for sufficiently small |u|,

which is equivalent to

lim inf_{u→0} [β(u) − β(0)]/|u|^λ > −∞. (3.18)
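Condition (3.18) can be explored on a toy one-dimensional instance (an illustration, not from the paper): minimize f0(x) = −x over X = [−1, 1] subject to x ≤ 0, so that f1(x) = max{0, x} and, near 0, β(u) = u for u ≤ 0 while β(u) = +∞ for u > 0. The quotient in (3.18) then stays bounded exactly when λ ≤ 1.

```python
import numpy as np

# Toy instance (an illustration, not from the paper):
#   minimize f0(x) = -x over X = [-1, 1] subject to f1(x) = max(0, x) <= 0.
# Near u = 0 the perturbation function is beta(u) = u for u <= 0, +inf for u > 0.
def beta(u):
    return u if u <= 0.0 else np.inf

beta0 = beta(0.0)
us = -(10.0 ** -np.arange(1, 8))       # u -> 0 from below
for lam in (0.5, 1.0, 2.0):
    quots = [(beta(u) - beta0) / abs(u) ** lam for u in us]
    print(lam, quots[-1])
# lam = 0.5: quotients tend to 0; lam = 1.0: identically -1; lam = 2.0: diverge to -inf.
# So condition (3.18) holds in this toy instance exactly when lam <= 1.
```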

Remark 3.2. (i) Condition (3.18) is just the one given in Theorem 7.1 of [9].

(ii) Suppose that

lim_{‖x‖→+∞, x∈X} max{f0(x), g1(x), . . . , gm1(x), |gm1+1(x)|, . . . , |gm(x)|} = +∞,

then it follows from the proof of Theorem 4.1 of [5] that φ(x, u) is level-bounded in x locally uniformly in u, and (3.18) is the sufficient and necessary condition for lA(φ, x, 0, r) being an exact penalty function of Problem (3.1).

(iii) If we set in (3.15)

f1(x) = Σ_{j=1}^{m1} [gj(x)]+ + Σ_{j=m1+1}^{m} |gj(x)|,

then we obtain

lA(φ, x, 0, r) = f0(x) + δX(x) + r[Σ_{j=1}^{m1} [gj(x)]+ + Σ_{j=m1+1}^{m} |gj(x)|]^λ,

which was considered in [7]. It is easy to check, by using Theorem 3.2, that (3.18) is the sufficient and necessary condition for 0 to support an exact penalty representation. This result was presented in Theorem 4.5 of [5].

(iv) Condition (3.18) was connected to the concept of γ-rank uniformly weak stability in [4], in which a nonlinear Lagrangian for multiobjective optimization was studied.


Now we deal with Problem (3.1) by using A defined as in (3.13) directly. Let φ be the conventional dualizing parameterization

φ(x, u) = f0(x) + δX(x) + δ(x | {x′ | G(x′) + u ∈ R^{m1}_− × {0}^{m−m1}}),

and the corresponding perturbation function for (3.1) is β(u) = inf_x φ(x, u). ψ(x, y) is determined by

ψ(x, y) = f0(x) + δX(x) + δ(x | {x′ | G(x′) + A−1(y) ∈ R^{m1}_− × {0}^{m−m1}}),

which is produced by a nonlinear perturbation; obviously the corresponding perturbation function is υ(y) = β(A−1(y)). If choosing ∆(·) = ‖ · ‖1, we can obtain the A-augmented Lagrangian associated with φ:

lA(φ, x, v, r) = f0(x) + δX(x) + Σ_{i=1}^{m1} [vi[sgn(gi(x))|gi(x)|^λ]+ + r[sgn(gi(x))|gi(x)|^λ]+] + Σ_{j=m1+1}^{m} [vj sgn(gj(x))|gj(x)|^λ + r|gj(x)|^λ], |vi| ≤ r, i = 1, . . . , m1,

which is just

lA(φ, x, v, r) = f0(x) + δX(x) + Σ_{i=1}^{m1} [vi([gi(x)]+)^λ + r([gi(x)]+)^λ] + Σ_{j=m1+1}^{m} [vj sgn(gj(x))|gj(x)|^λ + r|gj(x)|^λ], |vi| ≤ r, i = 1, . . . , m1.

For v = 0, we obtain

lA(φ, x, 0, r) = f0(x) + δX(x) + r Σ_{i=1}^{m1} ([gi(x)]+)^λ + r Σ_{j=m1+1}^{m} |gj(x)|^λ, (3.19)

where v ∈ Rm satisfies |vi | ≤ r, i = 1, . . . ,m1.
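The exact penalization in (3.19) can be observed on a toy two-variable instance (an illustration, not from the paper): minimize f0(x) = x1 + x2 over X = [−2, 2]^2 subject to g1(x) = −x1 ≤ 0 and g2(x) = x2 = 0, whose constrained minimizer is (0, 0). With λ = 1 and r = 2 the minimizer of lA(φ, x, 0, r) over X coincides with it.

```python
import numpy as np

# Toy instance (an illustration, not from the paper):
#   minimize f0(x) = x1 + x2 over X = [-2, 2]^2
#   subject to g1(x) = -x1 <= 0 and g2(x) = x2 = 0 (so m1 = 1, m = 2);
# the constrained minimizer is (0, 0) with optimal value 0.
lam, r = 1.0, 2.0

def l_A(x1, x2):
    # the penalty function (3.19) for this instance, with v = 0
    return (x1 + x2
            + r * np.maximum(-x1, 0.0) ** lam   # inequality term ([g1(x)]_+)^lam
            + r * np.abs(x2) ** lam)            # equality term |g2(x)|^lam

grid = np.linspace(-2.0, 2.0, 401)
X1, X2 = np.meshgrid(grid, grid)
vals = l_A(X1, X2)
i, j = np.unravel_index(np.argmin(vals), vals.shape)
# With r = 2 the grid minimizer is (0, 0): exact penalization at this threshold.
print((X1[i, j], X2[i, j]), vals[i, j])
```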

Proposition 3.2 (Theorem 4.6 of [5]). Assume that the feasible set of Problem (3.1) is nonempty. Then the function lA(φ, x, 0, r) defined by (3.19) is an exact penalty function if and only if

lim inf_{‖u‖→0} [β(u) − β(0)] / Σ_{j=1}^{m} |uj|^λ > −∞. (3.20)

Proof. From Theorem 3.2, the vector 0 supports an exact penalty representation if and only if there exist U ∈ N(0) and r > 0 such that

β(u) ≥ β(0) − r∆(A(u)), for all u ∈ U.

Noting that ∆(A(u)) = Σ_{j=1}^{m} |uj|^λ, we see that the above inequality is equivalent to (3.20).

References

[1] D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, 1982.
[2] D.P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
[3] J.F. Bonnans, A. Shapiro, Perturbation Analysis of Optimization Problems, Springer-Verlag, New York, 2000.
[4] X.X. Huang, X.Q. Yang, Nonlinear Lagrangian for multiobjective optimization and applications to duality and exact penalization, SIAM J. Optim. 13 (2002) 675–692.
[5] X.X. Huang, X.Q. Yang, A unified augmented Lagrangian approach to duality and exact penalization, Math. Oper. Res. 28 (3) (2003) 533–552.
[6] A. Ioffe, Necessary and sufficient conditions for a local minimum 3: Second-order conditions and augmented duality, SIAM J. Control Optim. 17 (1979) 266–288.
[7] Z.Q. Luo, J.S. Pang, D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, New York, 1996.
[8] J.S. Pang, Error bounds in mathematical programming, Math. Program. 79 (1997) 299–332.
[9] A.M. Rubinov, B.M. Glover, X.Q. Yang, Decreasing functions with applications to penalization, SIAM J. Optim. 10 (1) (1999) 289–313.
[10] R.T. Rockafellar, Augmented Lagrange multiplier functions and duality in nonconvex programming, SIAM J. Control 12 (2) (1974) 268–285.
[11] R.T. Rockafellar, Lagrange multipliers and optimality, SIAM Rev. 35 (1993) 183–238.
[12] R.T. Rockafellar, R.J.-B. Wets, Variational Analysis, Springer-Verlag, New York, 1998.
[13] M. Studniarski, Necessary and sufficient conditions for isolated local minima of nonsmooth functions, SIAM J. Control Optim. 24 (1986) 1044–1049.
[14] D.E. Ward, Characterizations of strict local minima and necessary conditions for weak sharp minima, J. Optim. Theory Appl. 80 (1994) 551–571.
[15] X.Q. Yang, X.X. Huang, A nonlinear Lagrangian approach to constrained optimization problems, SIAM J. Optim. 11 (2001) 1119–1144.