
arXiv:1508.06985v1 [math.OC] 27 Aug 2015



BILEVEL POLYNOMIAL PROGRAMS AND SEMIDEFINITE RELAXATION METHODS

JIAWANG NIE, LI WANG, AND JANE J. YE

Abstract. A bilevel program is an optimization problem whose constraints involve another optimization problem. This paper studies bilevel polynomial programs (BPPs), i.e., those in which all the functions are polynomials. We reformulate BPPs equivalently as semi-infinite polynomial programs (SIPPs), using Fritz John conditions and Jacobian representations. Combining the exchange technique and Lasserre type semidefinite relaxations, we propose numerical methods for solving both simple and general BPPs. For simple BPPs, we prove the convergence to global optimal solutions. Numerical experiments are presented to show the efficiency of the proposed algorithms.

1. Introduction

We consider the bilevel polynomial program (BPP):

(1.1)  (P):   F* := min_{x ∈ R^n, y ∈ R^p}  F(x, y)
              s.t.  G_i(x, y) ≥ 0,  i = 1, …, m_1,
                    y ∈ S(x),

where F and all G_i are real polynomials in (x, y), and S(x) is the set of global minimizers of the optimization problem parameterized by x:

(1.2)  P(x):   min_{z ∈ R^p}  f(x, z)   s.t.  g_j(x, z) ≥ 0,  j = 1, …, m_2.

In (1.2), f and each g_j are polynomials in (x, z). For convenience, denote by Z(x) the feasible set of (1.2). The inequalities G_i(x, y) ≥ 0 are called upper level constraints, while g_j(x, z) ≥ 0 are called lower level constraints. When m_1 = 0 (resp., m_2 = 0), there are no upper (resp., lower) level constraints. Accordingly, (P) is called the upper level optimization, while (1.2) is called the lower level optimization. Sometimes, (1.1) is called the outer optimization and (1.2) is called the inner optimization. Similarly, F(x, y) is the upper level (or outer) objective, and f(x, z) is the lower level (or inner) objective. Denote the set of all (x, y) that are feasible for both the upper and lower level optimization:

(1.3)  U := { (x, y) :  G_i(x, y) ≥ 0  (i = 1, …, m_1),  y ∈ Z(x) }.

When the feasible set Z(x) ≡ Z is independent of x, the problem (P) is called a simple bilevel polynomial program (SBPP). The SBPP is not “simple” mathematically; it is actually quite difficult to solve. SBPPs have important applications in economics, e.g., the moral hazard model of the principal-agent problem [18].

2010 Mathematics Subject Classification. 65K05, 90C22, 90C26, 90C34.
Key words and phrases. Bilevel polynomial program, Lasserre relaxation, Fritz John condition, Jacobian representation, exchange method, semi-infinite programming.


When Z(x) depends on x, the problem (P) is called a general bilevel polynomial program (GBPP). GBPP is also an effective modelling tool for many applications in various fields; see, e.g., [6, 8] and the references therein.

1.1. Background. The bilevel program is a class of difficult optimization problems. Even for the case where all the functions are linear, the problem is NP-hard [3]. A general approach for solving bilevel programs is to transform them into single level optimization problems. A commonly used technique is to replace the lower level optimization by its Karush-Kuhn-Tucker (KKT) conditions. When the lower level program has only inequality constraints, the reduced problem becomes a so-called mathematical program with equilibrium constraints (MPEC) [17, 26]. If the lower level optimization is nonconvex, the optimal solution of a bilevel program may not even be a stationary point of the single level optimization obtained by using the KKT conditions. This was shown by a counterexample due to Mirrlees [18]. Moreover, even if the lower level optimization is convex, the optimal solution of a bilevel program may not be found by solving the corresponding MPEC [7].

An alternative approach for solving BPPs is to use the value function [25, 34], which gives an equivalent reformulation. However, the optimal solution of the bilevel optimization may not be a stationary point of the value function reformulation. To overcome this difficulty, there exists work combining KKT conditions and value function reformulations (cf. [35]). Over the past two decades, many numerical algorithms were proposed for solving bilevel programs. However, most of them assume that the lower level optimization is convex, with few exceptions [20, 25]. Recently, smoothing techniques were used to find stationary points of the value function reformulation or the combined reformulation of simple bilevel programs [15, 30, 31, 32].

In general, it is quite difficult to find global optimizers of nonconvex optimization problems. However, when the functions are polynomials, there exists much work on computing global optimizers by using Lasserre type semidefinite relaxations [12]. We refer to [13, 14] for recent work in this area. Recently, Jeyakumar, Lasserre, Li and Pham [11] worked on simple bilevel polynomial programs. When the lower level optimization (1.2) is convex, they transformed (1.1) into a single level polynomial program, by using KKT or Fritz John conditions and Lagrange multipliers. When (1.2) is nonconvex, they proposed to solve (1.1) by a sequence of approximate polynomial optimization problems. They proved convergence to global solutions under some general conditions. The work [11] is very interesting and inspiring, because polynomial optimization techniques are proposed to solve BPPs. In this paper, we also use Lasserre type semidefinite relaxations to solve BPPs, but we make different reformulations as polynomial optimization problems, by using Jacobian representations and the exchange technique in semi-infinite programming.

1.2. From BPP to SIPP. A bilevel optimization problem can be reformulated as a semi-infinite program (SIP). Thus, the classical methods (e.g., the exchange method) for SIPs can be applied to solve bilevel programs. For convenience of description, at the moment, we consider SBPPs, i.e., the feasible set Z(x) ≡ Z in (1.2) is independent of x. Clearly, y ∈ S(x) if and only if y ∈ Z and

(1.4)  H(x, y, z) := f(x, z) − f(x, y) ≥ 0,  ∀ z ∈ Z.


Hence, (P) is equivalent to the problem:

(1.5)  F* := min_{x ∈ R^n, y ∈ R^p}  F(x, y)
       s.t.  G_i(x, y) ≥ 0,  i = 1, …, m_1,
             H(x, y, z) ≥ 0,  ∀ z ∈ Z.

The problem (1.5) is a semi-infinite polynomial program (SIPP) if the set Z is infinite. We refer to [33] for SIPPs. Let v(x) denote the value function:

(1.6)  v(x) := inf_{z ∈ Z(x)}  f(x, z).

If (x, y) is feasible for (1.5), then v(x) = min_{z ∈ Z(x)} f(x, z) and

min_{z ∈ Z} H(x, y, z) = v(x) − f(x, y) ≥ 0.

For all y ∈ Z, we always have v(x) − f(x, y) ≤ 0. Thus,

(1.7)  y ∈ S(x)  ⟺  y ∈ Z and min_{z ∈ Z} H(x, y, z) = 0.

A classical method for solving SIPPs is the so-called exchange method ([21, 33]).

Suppose Z_k is a finite grid of Z. Replacing Z by Z_k in (1.5), we get:

(1.8)  (P_k):  F*_k := min_{x ∈ R^n, y ∈ R^p}  F(x, y)
              s.t.  G_i(x, y) ≥ 0,  i = 1, …, m_1,
                    H(x, y, z) ≥ 0,  ∀ z ∈ Z_k.

The feasible set of (P_k) is larger than that of (1.5). Hence,

F*_k ≤ F*.

Since Z_k is a finite set, (P_k) is a polynomial optimization problem. If, for some Z_k, we can get an optimizer (x^k, y^k) of (P_k) such that

(1.9)  v(x^k) − f(x^k, y^k) ≥ 0,

then y^k ∈ S(x^k) and (x^k, y^k) is feasible for (1.5). In such a case, (x^k, y^k) must be a global optimizer of (P). Otherwise, if (1.9) fails to hold, then there exists z^k ∈ Z such that

f(x^k, z^k) − f(x^k, y^k) < 0.

For such a case, we can construct the new grid set

Z_{k+1} := Z_k ∪ {z^k},

and then solve the new problem (P_{k+1}) with the grid set Z_{k+1}. Repeating this process, we can get an algorithm for solving (P) approximately.
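The following is a minimal sketch of this loop in Python. Here solve_Pk and argmin_H are hypothetical placeholder routines (e.g., implemented with Lasserre relaxations); this is an illustration of the scheme rather than the authors' implementation.

    # Sketch of the basic exchange method for the SIPP (1.5) (illustration only).
    # Assumed hypothetical interfaces:
    #   solve_Pk(Zk) -> (xk, yk): a global minimizer of (P_k) with grid Zk
    #   argmin_H(xk, yk) -> (vk, zk): vk = min over z in Z of H(xk, yk, z), attained at zk
    def exchange_method(solve_Pk, argmin_H, eps=1e-4, kmax=100):
        Zk = []                            # Z_0: start with an empty grid
        for k in range(kmax):
            xk, yk = solve_Pk(Zk)          # solve the relaxed problem (P_k)
            vk, zk = argmin_H(xk, yk)      # check (1.9): vk = v(xk) - f(xk, yk)
            if vk >= -eps:                 # (xk, yk) is feasible, hence globally optimal
                return xk, yk
            Zk.append(zk)                  # exchange step: Z_{k+1} = Z_k + {zk}
        raise RuntimeError("no convergence within kmax iterations")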

How does the above approach work in computational practice? Does it converge to global optimizers? First, we can see that {F*_k} is a sequence of monotonically increasing lower bounds for the global optimal value F*, i.e.,

F*_1 ≤ … ≤ F*_k ≤ F*_{k+1} ≤ … ≤ F*.

By a standard analysis for SIP (cf. [21]), one can expect the convergence F*_k → F* under some conditions. However, we would like to point out that the above exchange process typically converges very slowly for solving BPPs. A major reason is that the feasible set of (P_k) is much bigger than that of (1.5). Indeed, the dimension of the feasible set of (P_k) is typically bigger than that of (1.5). This is because, for every feasible (x, y) in (1.5), y must also satisfy optimality conditions for the lower level optimization (1.2), while the y in (P_k) need not satisfy such optimality conditions. Typically, for (P_k) to approximate (1.5) reasonably well, the grid set Z_k should be very big. In practice, the above standard exchange method is not efficient for solving BPPs.

1.3. Contributions. In this paper, we propose efficient computational methods for solving BPPs. First, we transform a BPP into an equivalent semi-infinite polynomial program (SIPP), by using Fritz John conditions and Jacobian representations. Then, we propose new algorithms for solving BPPs, by using the exchange technique and Lasserre type semidefinite relaxations.

For each (x, y) that is feasible for (1.1), y is a minimizer of the lower level optimization (1.2). If some constraint qualification conditions are satisfied, the KKT conditions hold. If such qualification conditions fail to hold, the KKT conditions are likely not satisfied. However, the Fritz John conditions always hold at y for (1.2) (cf. [4, §3.3.5]). So, we can add the Fritz John conditions to (P) without changing the problem. A disadvantage of using Fritz John conditions is the usage of multipliers, which need to be considered as new variables. Typically, using multipliers makes the polynomial optimization much harder to solve, because of the many new variables. To overcome this difficulty, the technique in [22, Section 2] can be applied to avoid the usage of multipliers. This technique is known as Jacobian representations for optimality conditions.

The above observations motivate us to design efficient algorithms for solving BPPs, by combining Fritz John conditions, Jacobian representations, Lasserre relaxations, and the exchange technique. Our major results are as follows:

• Unlike some prior methods for solving BPPs, we do not assume the KKT conditions hold for the lower level optimization (1.2). Instead, we use the Fritz John conditions. This is because the KKT conditions may fail to hold for BPPs, while the Fritz John conditions always hold. By using Jacobian representations, the usage of multipliers can be avoided. This greatly improves the computational efficiency.

• For simple bilevel polynomial programs, we propose an algorithm using Jacobian representations, Lasserre relaxations and the exchange technique. Its convergence to global minimizers is also proved. The numerical experiments show that it is very efficient for solving SBPPs.

• For general bilevel polynomial programs, we also propose a similar algorithm, using Jacobian representations, Lasserre relaxations and the exchange technique. The numerical experiments show that it also works well for some GBPPs, while it is not theoretically guaranteed to compute global optimizers.

The paper is organized as follows. In Section 2, we review some preliminaries on polynomial optimization and Jacobian representations. In Section 3, we propose a method for solving simple bilevel polynomial programs and prove its convergence. In Section 4, we consider general bilevel polynomial programs and show how the algorithm works. In Section 5, we present numerical experiments to demonstrate the efficiency of the proposed methods.


2. Preliminaries

Notation. The symbol N (resp., R, C) denotes the set of nonnegative integers (resp., real numbers, complex numbers). For an integer n > 0, [n] denotes the set {1, …, n}. For x := (x_1, …, x_n) and α := (α_1, …, α_n), denote the monomial

x^α := x_1^{α_1} ⋯ x_n^{α_n}.

For a finite set T, |T| denotes its cardinality. The symbol R[x] := R[x_1, …, x_n] denotes the ring of polynomials in x := (x_1, …, x_n) with real coefficients, whereas R[x]_k denotes its subspace of polynomials of degree at most k. For a symmetric matrix W, W ⪰ 0 (resp., W ≻ 0) means that W is positive semidefinite (resp., positive definite). For a vector u ∈ R^n, ‖u‖ denotes the standard Euclidean norm. The gradient of a function f(x) is denoted ∇f(x). If f(x, z) is a function in both x and z, then ∇_z f(x, z) denotes the gradient with respect to z. For an optimization problem, argmin denotes the set of its optimizers.

2.1. Polynomial optimization. An ideal I in R[x] is a subset of R[x] such that I · R[x] ⊆ I and I + I ⊆ I. For a tuple p = (p_1, …, p_r) in R[x], I(p) denotes the smallest ideal containing all p_i, i.e.,

I(p) = p_1 · R[x] + ⋯ + p_r · R[x].

The k-th truncation of the ideal I(p), denoted I_k(p), is the set

p_1 · R[x]_{k−deg(p_1)} + ⋯ + p_r · R[x]_{k−deg(p_r)}.

For the polynomial tuple p, denote its real zero set

V(p) := { v ∈ R^n | p(v) = 0 }.

A polynomial σ ∈ R[x] is said to be a sum of squares (SOS) if σ = a_1² + ⋯ + a_k² for some a_1, …, a_k ∈ R[x]. The set of all SOS polynomials in x is denoted Σ[x]. For a degree m, denote the truncation

Σ[x]_m := Σ[x] ∩ R[x]_m.

For a tuple q = (q_1, …, q_t), its quadratic module is the set

Q(q) := Σ[x] + q_1 · Σ[x] + ⋯ + q_t · Σ[x].

The k-th truncation of Q(q), denoted Q_k(q), is the set

Σ[x]_{2k} + q_1 · Σ[x]_{d_1} + ⋯ + q_t · Σ[x]_{d_t},

where each d_i = 2k − deg(q_i). For the tuple q, denote the basic semialgebraic set

S(q) := { v ∈ R^n | q(v) ≥ 0 }.

For the polynomial tuples p and q as above, if f ∈ I(p) + Q(q), then clearly f ≥ 0 on the set V(p) ∩ S(q). However, the reverse is not necessarily true. The sum I(p) + Q(q) is said to be archimedean if there exists b ∈ I(p) + Q(q) such that S(b) = { v ∈ R^n : b(v) ≥ 0 } is a compact set in R^n. When I(p) + Q(q) is archimedean, Putinar [27] proved that if a polynomial f > 0 on V(p) ∩ S(q), then f ∈ I(p) + Q(q). When f is only nonnegative (but not strictly positive) on V(p) ∩ S(q), we still have f ∈ I(p) + Q(q), under some general conditions (cf. [24]).


Now, we review Lasserre type semidefinite relaxations in polynomial optimization. More details can be found in [12, 13, 14]. Consider the general polynomial optimization problem:

(2.1)  f_min := min_{x ∈ R^n}  f(x)   s.t.  p(x) = 0,  q(x) ≥ 0,

where f ∈ R[x] and p, q are tuples of polynomials. The feasible set of (2.1) is precisely the intersection V(p) ∩ S(q). The standard Lasserre hierarchy of semidefinite relaxations for solving (2.1) is (k = 1, 2, …):

(2.2)  f_k := max γ   s.t.  f − γ ∈ I_{2k}(p) + Q_k(q).

When the set I(p) + Q(q) is archimedean, Lasserre proved the convergence f_k → f_min.

Indeed, the sequence {f_k} often has finite convergence to f_min, under some general conditions (cf. [24]). Moreover, we can also get global minimizers of (2.1) by using the flat extension or flat truncation condition (cf. [23]). The optimization (2.2) can be solved as a semidefinite program, so it can be solved by standard SDP packages (e.g., SeDuMi [28], SDPT3 [29]). A convenient and efficient software for using Lasserre relaxations is GloptiPoly 3 [10].
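As an illustration of (2.2), consider minimizing f(x) = x_1 + x_2 with no equality constraints (the tuple p is empty) and q = (1 − x_1² − x_2²), so the feasible set is the unit disk and f_min = −√2. One can check by direct expansion that

f(x) + √2 = (1/√2)(x_1 + 1/√2)² + (1/√2)(x_2 + 1/√2)² + (1/√2) · (1 − x_1² − x_2²),

which exhibits f − f_min ∈ Q_1(q). Hence the first relaxation is already exact: f_1 = f_min = −√2.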

2.2. Jacobian representations. We consider a polynomial optimization problem similar to the lower level optimization (1.2):

(2.3)  min_{z ∈ R^p}  f(z)   s.t.  g_1(z) ≥ 0, …, g_m(z) ≥ 0,

where f, g_1, …, g_m ∈ R[z] := R[z_1, …, z_p]. Let Z be the feasible set of (2.3). For z ∈ Z, let J(z) denote the index set of active constraining functions at z.

Suppose z* is an optimizer of (2.3). By the Fritz John condition (cf. [4, §3.3.5]), there exists 0 ≠ (μ_0, μ_1, …, μ_m) such that

(2.4)  μ_0 ∇f(z*) − Σ_{i=1}^{m} μ_i ∇g_i(z*) = 0,   μ_i g_i(z*) = 0  (i ∈ [m]).

A point z* satisfying (2.4) is called a Fritz John point. If we only consider the active constraints, the above reduces to

(2.5)  μ_0 ∇f(z*) − Σ_{i ∈ J(z*)} μ_i ∇g_i(z*) = 0.

The condition (2.4) uses the multipliers μ_0, …, μ_m, which are often not known in advance. If we consider them as new variables, then the number of variables increases significantly. For an index set J = {i_1, …, i_k}, denote the matrix

B[J, z] := [ ∇f(z)  ∇g_{i_1}(z)  ⋯  ∇g_{i_k}(z) ].

Then condition (2.5) means that the matrix B[J(z*), z*] is rank deficient, i.e.,

rank B[J(z*), z*] ≤ |J(z*)|.

The matrix B[J(z*), z*] depends on the active set J(z*), which is typically unknown in advance.

The technique in [22, §2] can be applied to get explicit equations for Fritz John points, without using the multipliers μ_i. For a subset J = {i_1, …, i_k} ⊆ [m] with cardinality |J| ≤ min{m, p − 1}, write its complement as J^c := [m]\J. List all the (k+1)×(k+1) minors of the matrix B[J, z] as

η_1^J, …, η_ℓ^J.

Consider the products of such minors with the constraining polynomials:

(2.6)  η_1^J · ( Π_{j ∈ J^c} g_j ),  …,  η_ℓ^J · ( Π_{j ∈ J^c} g_j ).

They are all polynomials in z. For convenience of notation, over all possibilities of J ⊆ [m] with |J| ≤ min{m, p − 1}, list all the polynomials obtained in (2.6) as

(2.7)  ψ_1, …, ψ_L.

We point out that the Fritz John points can be characterized by using the polynomials ψ_1, …, ψ_L. Define the set of all Fritz John points as

(2.8)  K_FJ := { z ∈ R^p :  ∃ (μ_0, μ_1, …, μ_m) ≠ 0,  μ_i g_i(z) = 0 (i ∈ [m]),  μ_0 ∇f(z) − Σ_{i=1}^{m} μ_i ∇g_i(z) = 0 }.

Let W be the set of common real zeros of the polynomials ψ_j(z), i.e.,

(2.9)  W := { z ∈ R^p | ψ_1(z) = ⋯ = ψ_L(z) = 0 }.

The set W can be defined by a smaller number of equations; we refer to [22, §2]. It is interesting to note that the sets K_FJ and W are equal.

Lemma 2.1. For K_FJ, W as in (2.8)-(2.9), it holds that K_FJ = W.

Proof. First, we prove that W ⊆ K_FJ. Choose an arbitrary u ∈ W, and let J(u) be the active set at u. If |J(u)| ≥ p, then the gradients ∇f(u) and ∇g_j(u) (j ∈ J(u)) must be linearly dependent, so u ∈ K_FJ. Next, we suppose |J(u)| < p. Note that g_j(u) > 0 for all j ∈ J(u)^c. By the construction, some of ψ_1, …, ψ_L are the polynomials as in (2.6):

η_k^{J(u)} · ( Π_{j ∈ J(u)^c} g_j ).

Thus, ψ(u) = 0 implies that all the polynomials η_k^{J(u)} vanish at u. By their definition, the matrix B[J(u), u] does not have full column rank. This means that u ∈ K_FJ.

Second, we show that K_FJ ⊆ W. Choose an arbitrary u ∈ K_FJ.

• Case I: J(u) = ∅. Then ∇f(u) = 0, so the first column of every matrix B[J, u] is zero; hence all the minors η_i^J, and therefore all the ψ_j, vanish at u, and u ∈ W.

• Case II: J(u) ≠ ∅. Let I ⊆ [m] be an arbitrary index set with |I| ≤ min{m, p − 1}. If J(u) ⊄ I, then at least one j ∈ I^c belongs to J(u). Thus, at least one j ∈ I^c satisfies g_j(u) = 0, so all the polynomials

η_k^I · ( Π_{j ∈ I^c} g_j )

vanish at u. If J(u) ⊆ I, then μ_i g_i(u) = 0 implies that μ_i = 0 for all i ∈ I^c. By the definition of K_FJ, the matrix B[I, u] does not have full column rank. So, the minors η_i^I of B[I, z] vanish at u. By the construction of the ψ_i, all the ψ_i vanish at u, so u ∈ W.

The proof is completed by combining the above two cases.
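As an illustration of the lemma, take p = 2, m = 1, f(z) = z_1 and g_1(z) = 1 − z_1² − z_2². The allowed index sets are J = ∅ and J = {1}. For J = ∅, the 1×1 minors of B[∅, z] = ∇f(z) are 1 and 0, and multiplying by g_1 gives the polynomials g_1(z) and 0. For J = {1} (with J^c = ∅), the only 2×2 minor of B[{1}, z] = [∇f(z) ∇g_1(z)] is the determinant of the matrix with rows (1, −2z_1) and (0, −2z_2), which equals −2z_2. Hence W = { z : g_1(z) = 0, z_2 = 0 } = {(1, 0), (−1, 0)}, and these are exactly the Fritz John points (in the sense of (2.8)) of minimizing z_1 over the unit disk: the minimizer (−1, 0) and the maximizer (1, 0).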


3. Simple bilevel polynomial programs

In this section, we study simple bilevel polynomial programs (SBPPs) and present an algorithm for computing global optimizers. For SBPPs as in (1.1), the feasible set Z(x) of the lower level optimization (1.2) is independent of x. Assume Z(x) is constantly the semialgebraic set

(3.1)  Z := { z ∈ R^p | g_1(z) ≥ 0, …, g_{m_2}(z) ≥ 0 },

for given polynomials g_1, …, g_{m_2} in z := (z_1, …, z_p). For each pair (x, y) that is feasible in (1.1), y is an optimizer for (1.2), which now becomes

(3.2)  min_{z ∈ R^p}  f(x, z)   s.t.  g_1(z) ≥ 0, …, g_{m_2}(z) ≥ 0.

Note that the inner objective f still depends on x. So, y must be a Fritz John point of (3.2), i.e., there exists 0 ≠ (μ_0, μ_1, …, μ_{m_2}) satisfying

μ_0 ∇_z f(x, y) − Σ_{j ∈ [m_2]} μ_j ∇_z g_j(y) = 0,   μ_j g_j(y) = 0  (j ∈ [m_2]).

Let K_FJ(x) denote the set of all Fritz John points of (3.2). The set K_FJ(x) can be characterized by Jacobian representations. Let ψ_1, …, ψ_L be the polynomials constructed as in (2.7). Note that each ψ_j is now a polynomial in (x, z), because the objective of (3.2) depends on x. Thus, each (x, y) feasible for (1.1) satisfies

ψ_1(x, y) = ⋯ = ψ_L(x, y) = 0.

For convenience of notation, denote the polynomial tuples

(3.3)  ξ := (G_1, …, G_{m_1}, g_1, …, g_{m_2}),   ψ := (ψ_1, …, ψ_L).

Then, the simple BPP as in (1.1) is equivalent to the semi-infinite polynomial program (H(x, y, z) is defined as in (1.4)):

(3.4)  F* := min_{x ∈ R^n, y ∈ R^p}  F(x, y)
       s.t.  ψ(x, y) = 0,  ξ(x, y) ≥ 0,
             H(x, y, z) ≥ 0,  ∀ z ∈ Z.

3.1. A semidefinite algorithm for SBPP. We have seen that the SBPP (1.1) is equivalent to (3.4), which is an SIPP. So, we can apply the exchange method to solve it. The basic idea of “exchange” is that we replace Z by a finite grid set Z_k in (3.4), and then solve it for a global minimizer (x^k, y^k) by Lasserre relaxations. If (x^k, y^k) is feasible for (1.1), we stop; otherwise, we compute global minimizers of H(x^k, y^k, z) and add them to Z_k. This process is repeated until convergence. This leads to the following algorithm.

Algorithm 3.1. (A Semidefinite Relaxation Algorithm for SBPP.)

Input: Polynomials F, f, G_1, …, G_{m_1}, g_1, …, g_{m_2} for the SBPP as in (1.1), a tolerance parameter ε ≥ 0, and a maximum iteration number k_max.

Output: The global optimal value F* and a set X* of global minimizers of (1.1), up to the tolerance ε.

Step 1  Let Z_0 = ∅, X* = ∅ and k = 0.


Step 2  Apply Lasserre relaxations to solve

(3.5)  (P_k):  F*_k := min_{x, y}  F(x, y)
              s.t.  ψ(x, y) = 0,  ξ(x, y) ≥ 0,
                    H(x, y, z) ≥ 0,  ∀ z ∈ Z_k,

and get the set S_k = { (x_1^k, y_1^k), …, (x_{r_k}^k, y_{r_k}^k) } of its global minimizers.

Step 3  For each i = 1, …, r_k, do the following:

  (a) Apply Lasserre relaxations to solve

  (3.6)  (Q_i^k):  v_i^k := min_z  H(x_i^k, y_i^k, z)
                  s.t.  ψ(x_i^k, z) = 0,
                        g_1(z) ≥ 0, …, g_{m_2}(z) ≥ 0,

  and get the set T_i^k = { z_{i,j}^k : j = 1, …, t_i^k } of its global minimizers.

  (b) If v_i^k ≥ −ε, then update X* := X* ∪ { (x_i^k, y_i^k) }.

Step 4  If X* ≠ ∅ or k > k_max, stop; otherwise, update

(3.7)  Z_{k+1} := Z_k ∪ T_1^k ∪ ⋯ ∪ T_{r_k}^k.

Let k := k + 1 and go to Step 2.

In the above, the subproblems (3.5) and (3.6) can be solved by Lasserre relaxations as in (2.2). If ε = 0 and the algorithm terminates with k < k_max, we get exact global minimizers of (1.1); otherwise, we only get global minimizers up to some numerical errors.
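For concreteness, the overall flow of Algorithm 3.1 can be outlined as follows. This is a minimal Python sketch under the assumption of two hypothetical wrappers, solve_P for (3.5) and solve_Q for (3.6) (e.g., implemented with Lasserre relaxations via an SDP solver); it is not code from any specific package.

    # Sketch of Algorithm 3.1 (illustration only). Assumed hypothetical interfaces:
    #   solve_P(Zk) -> (Fk, Sk): optimal value and set of global minimizers of (3.5)
    #   solve_Q(xi, yi) -> (vi, Ti): optimal value and set of global minimizers of (3.6)
    def sbpp_algorithm(solve_P, solve_Q, eps=1e-4, kmax=10):
        Zk, k = set(), 0                      # Step 1: Z_0 = empty set, k = 0
        while True:
            Fk, Sk = solve_P(Zk)              # Step 2: solve (P_k)
            Xstar, Tnew = [], set()
            for (xi, yi) in Sk:               # Step 3
                vi, Ti = solve_Q(xi, yi)      # (a): solve (Q_i^k)
                Tnew |= set(Ti)
                if vi >= -eps:                # (b): (xi, yi) is feasible for (3.4)
                    Xstar.append((xi, yi))
            if Xstar or k > kmax:             # Step 4: stop, or expand the grid
                return Fk, Xstar
            Zk = Zk | Tnew                    # Z_{k+1} := Z_k with all T_i^k added
            k += 1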

3.2. Two features of the algorithm. As in the introduction, we do not apply the exchange method directly to (1.5), but instead to (3.4). Both (1.5) and (3.4) are SIPPs that are equivalent to the SBPP (1.1). As the numerical experiments show, the SIPP (3.4) is much easier to solve by the exchange method. This is because the Fritz John condition ψ(x, y) = 0 in (3.4) makes it much easier for (3.5) to approximate (3.4) accurately. Typically, for a finite grid set Z_k of Z, the feasible sets of (3.4) and (3.5) have the same dimension. However, the feasible set of (1.5) has smaller dimension than that of (1.8). Thus, it is usually very difficult for (1.8) to approximate (1.5) accurately by choosing a finite set Z_k. In contrast, it is often much easier for (3.5) to approximate (3.4) accurately. We illustrate this fact with the following example.

Example 3.2. ([19, Example 3.19]) Consider the SBPP:

(3.8)  min_{x ∈ R, y ∈ R}  F(x, y) := xy − y + ½y²
       s.t.  1 − x² ≥ 0,  1 − y² ≥ 0,
             y ∈ S(x) := argmin_{1 − z² ≥ 0}  f(x, z) := −xz² + ½z⁴.

Since f(x, z) = ½(z² − x)² − ½x², one can see that

S(x) = {0} for x ∈ [−1, 0),   S(x) = {±√x} for x ∈ [0, 1].

Therefore, the outer objective F(x, y) can be expressed as

F(x, y) = 0 for x ∈ [−1, 0),   F(x, y) = ½x ± (x − 1)√x for x ∈ [0, 1].


So, the optimal solution and the optimal value of (3.8) are (with a = (√13 − 1)/6):

(x*, y*) = (a², a) ≈ (0.1886, 0.4343),   F* = ½a² + a³ − a ≈ −0.2581.
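This can be verified directly: on the branch y = ±√x we have x = y², so F = y³ + ½y² − y with y ∈ [−1, 1]. Setting its derivative 3y² + y − 1 to zero gives the critical point y = (−1 + √13)/6 = a in (0, 1); comparing the value there with the branch x ∈ [−1, 0) (where F = 0), with the negative critical point, and with the endpoints y = ±1 confirms that (a², a) is the global minimizer.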

If Algorithm 3.1 is applied without using the Jacobian condition ψ(x, y) = 0, the computational results are shown in Table 1; the problem (3.8) cannot be solved reasonably well. In contrast, if we apply Algorithm 3.1 with the Jacobian condition ψ(x, y) = 0, then (3.8) is solved very well. The computational results are shown in Table 2; it takes only two steps for the algorithm to converge.

Table 1. Computational results without ψ(x, y) = 0

Step k   (x_i^k, y_i^k)       z_{i,j}^k    F*_k      v_i^k
0        (−1, 1)              4.098e-13    −1.5000   −1.5000
1        (0.1505, 0.5486)     ±0.3879      −0.3156   −0.0113
2        (0.0752, 0.3879)     ±0.2743      −0.2835   −0.0028
3        (0.2088, 0.5179)     ±0.4569      −0.2754   −0.0018
4        cannot be solved     …            …         …

Table 2. Computational results with ψ(x, y) = 0

Step k   (x_i^k, y_i^k)       z_{i,j}^k    F*_k      v_i^k
0        (−1, 1)              3.283e-21    −1.5000   −1.5000
1        (0.1886, 0.4342)     ±0.4342      −0.2581   −3.625e-12

For the lower level optimization (1.2), the KKT conditions may fail to hold at minimizers y. In such a case, the earlier methods, which replace (1.2) by the KKT conditions, do not work at all. However, such problems can also be solved efficiently by our method. The following are two examples.

Example 3.3. ([7, Example 2.4]) Consider the following SBPP:

(3.9)  F* := min_{x ∈ R, y ∈ R}  (x − 1)² + y²   s.t.  y ∈ S(x) := argmin_{z ∈ Z} x²z,   Z := { z ∈ R | z² ≤ 0 }.

It is easy to see that the global minimizer of this problem is (x*, y*) = (1, 0), with F* = 0. The set Z = {0} is convex. By bringing in a new multiplier variable λ, we get a single level optimization problem:

r* := min_{x ∈ R, y ∈ R, λ ∈ R}  (x − 1)² + y²
      s.t.  x² + 2λy = 0,  λ ≥ 0,  y² ≤ 0,  λy² = 0.

The feasible points of this problem are (0, 0, λ) with λ ≥ 0. We have r* = 1 > F*. The KKT reformulation approach fails in this example, since y* ∈ S(x*) is not a KKT point. Now let us solve the SBPP (3.9) by the method proposed in this paper. The redundant polynomial generated by the Jacobian representation is ψ(x, y) = x²y², and we reformulate the problem as:

s* := min_{x ∈ R, y ∈ R}  (x − 1)² + y²
      s.t.  x²(z − y) ≥ 0,  ∀ z ∈ Z,
            ψ(x, y) = x²y² = 0.


This problem is actually not an SIPP, since the set Z has only one point. At the initial step, we find its optimal solution (x*_0, y*_0) = (1, 0), and it is easy to check that min_{z ∈ Z} H(x*_0, y*_0, z) = 0, which certifies that it is the global minimizer of the SBPP (3.9).

Example 3.4. Consider the SBPP:

(3.10)  min_{x ∈ R, y ∈ R²}  F(x, y) := x + y_1 + y_2
        s.t.  x − 2 ≥ 0,  3 − x ≥ 0,
              y ∈ S(x) := argmin_{z ∈ Z}  f(x, z) := x(z_1 + z_2),

where the set Z is defined by the inequalities

g_1(z) := z_1² − z_2² − (z_1² + z_2²)² ≥ 0,   g_2(z) := z_1 ≥ 0.

For all x ∈ [2, 3], one can check that S(x) = {(0, 0)}. Clearly, the global minimizer of (3.10) is (x*, y*) = (2, 0, 0), and the optimal value is F* = 2. At z* = (0, 0),

∇_z f(x, z*) = (x, x)^T,   ∇_z g_1(z*) = (0, 0)^T,   ∇_z g_2(z*) = (1, 0)^T.

The KKT conditions do not hold for the lower level optimization, since ∇_z f(x, z*) is not a linear combination of ∇_z g_1(z*) and ∇_z g_2(z*). By [24, Proposition 3.4], Lasserre relaxations as in (2.2) do not have finite convergence for solving the lower level optimization. One can easily check that

K_FJ(x) = { (0, 0), (0.8990, 0.2409) }¹,

for all feasible x. By the Jacobian representation of K_FJ(x), we get

ψ(x, z) = ( x g_1(z) g_2(z),  −x z_1 (z_1 + z_2 + 2(z_2 − z_1)(z_1² + z_2²)),  −x g_1(z) ).

Apply Algorithm 3.1 to solve (3.10). Indeed, for k = 0 and Z_0 = ∅, we already get

(x^0, y^0) ≈ (2.0000, 0.0000, 0.0000),

which is the true global minimizer. We also get

z_1^0 ≈ (4.632, −4.633) × 10⁻⁵,   v_1^0 ≈ −5.2510 × 10⁻⁸.

For a small value of ε (e.g., 10⁻⁶), Algorithm 3.1 terminates successfully with the global minimizer of (3.10).

3.3. Convergence analysis. We study the convergence of Algorithm 3.1 for solving SBPPs as in (1.1). For the theoretical analysis, we are mostly interested in the performance of Algorithm 3.1 when the parameters ε = 0 and k_max = ∞.

Theorem 3.5. For the simple bilevel polynomial program as in (1.1), assume the lower level optimization is as in (3.2). At each step k of Algorithm 3.1, suppose the polynomial optimization subproblems (P_k) and each (Q_i^k) are solved successfully by Lasserre relaxations. Suppose ε = 0 and k_max = ∞.

(i) If Algorithm 3.1 stops at Step k, then each (x*, y*) ∈ X* is a global optimizer of (1.1).

¹They are the solutions of the equations g_1(z) = 0, z_1 + z_2 + 2(z_2 − z_1)(z_1² + z_2²) = 0.


(ii) Suppose Algorithm 3.1 does not stop and each S_k ≠ ∅ is finite. Suppose the union ∪_{k≥0} Z_k is bounded. Let (x*, y*) be an arbitrary accumulation point of the set ∪_{k≥0} S_k. If the value function v(x), as in (1.6), is continuous at x*, then (x*, y*) is a global optimizer of the SBPP (1.1).

Proof. (i) The simple bilevel program (1.1) is equivalent to (3.4). Note that each optimal value F*_k ≤ F* and the sequence {F*_k} is monotonically increasing. If Algorithm 3.1 stops at Step k, then each (x*, y*) ∈ X* is feasible for (3.4), and also feasible for (1.1), so it holds that

F* ≥ F*_k = F(x*, y*) ≥ F*.

This implies that (x*, y*) is a global optimizer of problem (1.1).

(ii) Suppose Algorithm 3.1 does not stop and each S_k ≠ ∅ is finite. For each accumulation point (x*, y*) of the union ∪_{k≥0} S_k, there exists a sequence {k_ℓ} of integers such that k_ℓ → ∞ as ℓ → ∞ and

(x^{k_ℓ}, y^{k_ℓ}) → (x*, y*),   with each (x^{k_ℓ}, y^{k_ℓ}) ∈ S_{k_ℓ}.

We prove that (x*, y*) is also feasible for (3.4), and hence it is a global minimizer of (1.1). Define the function

φ(x, y) := min_{z ∈ Z}  H(x, y, z).

Clearly, φ(x, y) = v(x) − f(x, y). Since v(x) is continuous at x*, φ(x, y) is also continuous at (x*, y*). Each (x^{k_ℓ}, y^{k_ℓ}) ∈ U (see (1.3)), so (x*, y*) ∈ U. By the definition of the value function v(x) as in (1.6), we have

φ(x*, y*) ≤ 0.

Next, we prove φ(x*, y*) ≥ 0. For all k′ and for all k_ℓ ≥ k′, the point (x^{k_ℓ}, y^{k_ℓ}) is feasible for the subproblem (P_{k′}), so

H(x^{k_ℓ}, y^{k_ℓ}, z) ≥ 0  ∀ z ∈ Z_{k′}.

Letting ℓ → ∞, we then get

(3.11)  H(x*, y*, z) ≥ 0  ∀ z ∈ Z_{k′}.

The above is true for all k′. In Algorithm 3.1, for each k_ℓ, there exists z^{k_ℓ} ∈ T_i^{k_ℓ}, for some i, such that

φ(x^{k_ℓ}, y^{k_ℓ}) = H(x^{k_ℓ}, y^{k_ℓ}, z^{k_ℓ}).

Since z^{k_ℓ} ∈ Z_{k_ℓ+1}, by (3.11), we know

H(x*, y*, z^{k_ℓ}) ≥ 0.

Therefore, it holds that

φ(x*, y*) = φ(x^{k_ℓ}, y^{k_ℓ}) + [φ(x*, y*) − φ(x^{k_ℓ}, y^{k_ℓ})]
          ≥ [H(x^{k_ℓ}, y^{k_ℓ}, z^{k_ℓ}) − H(x*, y*, z^{k_ℓ})] + [φ(x*, y*) − φ(x^{k_ℓ}, y^{k_ℓ})].

Since z^{k_ℓ} belongs to the bounded set ∪_{k≥0} Z_k, we may assume, passing to a subsequence, that z^{k_ℓ} → z* ∈ Z. The polynomial H(x, y, z) is continuous at (x*, y*, z*), and the function φ(x, y) is continuous at (x*, y*). Letting ℓ → ∞, we get φ(x*, y*) ≥ 0. Thus, (x*, y*) is feasible for (3.4), so it is a global minimizer of (1.1).


In Theorem 3.5(ii), the value function v(x) is assumed to be continuous at x*. This can be satisfied under some conditions; the restricted inf-compactness (RIC) is one such condition. The value function v(x) is said to have RIC at x* if v(x*) is finite and there exist a compact set Ω and a positive number ε_0 such that, for all ‖x − x*‖ < ε_0 with v(x) < v(x*) + ε_0, there exists z ∈ S(x) ∩ Ω. For instance, if the set Z is compact, or if the lower level objective f(x*, z) is weakly coercive in z with respect to the set Z (i.e., lim_{z ∈ Z, ‖z‖ → ∞} f(x*, z) = ∞), then v(x) has restricted inf-compactness at x*; see, e.g., [5, §6.5.1]. Note that the union ∪_{k≥0} Z_k is contained in Z. So, if Z is compact, then ∪_{k≥0} Z_k is bounded.

Proposition 3.6. For the SBPP (1.1), assume the lower level optimization is as in (3.2). If the value function v(x) has restricted inf-compactness at x*, then v(x) is continuous at x*.

Proof. On one hand, since the lower level constraint is independent of x, the value function v(x) is always upper semicontinuous [2, Theorem 4.22 (1)]. On the other hand, since the restricted inf-compactness holds, it follows from [5, page 246] (or see the proof of [9, Theorem 3.9]) that v(x) is lower semicontinuous. Therefore, v(x) is continuous at x*.

4. General Bilevel Polynomial Programs

In this section, we study general bilevel polynomial programs as in (1.1). For GBPPs, the feasible set Z(x) of the lower level optimization (1.2) varies as x changes, i.e., the constraining polynomials g_j(x, z) depend on x.

For each pair (x, y) that is feasible for (1.1), y is an optimizer for the lower level optimization (1.2), so y must be a Fritz John point of (1.2), i.e., there exists 0 ≠ (μ_0, μ_1, …, μ_{m_2}) satisfying

μ_0 ∇_z f(x, y) − Σ_{j ∈ [m_2]} μ_j ∇_z g_j(x, y) = 0,   μ_j g_j(x, y) = 0  (j ∈ [m_2]).

For convenience, we still use K_FJ(x) to denote the set of Fritz John points of (1.2) at x. The set K_FJ(x) consists of the common zeros of some polynomials. As in (2.3), choose the polynomials (f(z), g_1(z), …, g_m(z)) to be (f(x, z), g_1(x, z), …, g_{m_2}(x, z)), whose coefficients depend on x. Then, construct ψ_1, …, ψ_L in the same way as in (2.7). Each ψ_j is also a polynomial in (x, z). Thus, every (x, y) feasible in (1.1) satisfies ψ_j(x, y) = 0 for all j. For convenience of notation, we still denote the polynomial tuples ξ, ψ as in (3.3).

We have seen that (1.1) is equivalent to the semi-infinite polynomial program (H(x, y, z) is as in (1.4)):

(4.1)  F* := min_{x ∈ R^n, y ∈ R^p}  F(x, y)
       s.t.  ψ(x, y) = 0,  ξ(x, y) ≥ 0,
             H(x, y, z) ≥ 0,  ∀ z ∈ Z(x).

Note that the constraint H(x, y, z) ≥ 0 in (4.1) is required for z ∈ Z(x), which depends on x. Algorithm 3.1 can also be applied to solve (4.1). We first give an example showing how it works.


Example 4.1. ([19, Example 3.23]) Consider the GBPP:

(4.2)  min_{x, y ∈ [−1, 1]}  x²
       s.t.  1 + x − 9x² − y ≤ 0,
             y ∈ argmin_{z ∈ [−1, 1]}  z   s.t.  z²(x − 0.5) ≤ 0.

By simple calculations, one can show that

Z(x) = {0} for x ∈ (0.5, 1],   Z(x) = [−1, 1] for x ∈ [−1, 0.5];
S(x) = {0} for x ∈ (0.5, 1],   S(x) = {−1} for x ∈ [−1, 0.5].

Let U = { (x, y) ∈ [−1, 1]² : 1 + x − 9x² − y ≤ 0 }. The feasible set of (4.2) is

F := ( { (x, 0) : x ∈ (0.5, 1] } ∪ { (x, −1) : x ∈ [−1, 0.5] } ) ∩ U.

One can show that the global minimizer and the optimal value are

(x*, y*) = ( (1 − √73)/18, −1 ) ≈ (−0.4191, −1),   F* = ( (1 − √73)/18 )² ≈ 0.1757.

By the Jacobian representation of Fritz John points, we get the polynomial

ψ(x, y) = (x − 0.5) y² (y² − 1).
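To see where this ψ comes from, write the lower level constraints of (4.2) as g_1(x, z) = −z²(x − 0.5) ≥ 0 and g_2(z) = 1 − z² ≥ 0 (one possible way of encoding z ∈ [−1, 1]). Here p = 1, so only J = ∅ is allowed in (2.6); the single 1×1 minor of B[∅, z] = ∇_z f = 1, multiplied by g_1 g_2, gives (up to sign) (x − 0.5) z² (z² − 1), i.e., the displayed polynomial with z replaced by y.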

We apply Algorithm 3.1 to solve (4.2). The computational results are reported in Table 3. As one can see, Algorithm 3.1 takes two steps to solve (4.2) successfully.

Table 3. Results of Algorithm 3.1 for solving (4.2).

Step k   (x_i^k, y_i^k)        z_{i,j}^k   F*_k     v_i^k
0        (0.0000, 1.0000)      −1.0000     0.0000   −2.000
1        (−0.4191, −1.0000)    −1.0000     0.1757   −2.4e-11

However, we would like to point out that Algorithm 3.1 may not be able to solve GBPPs globally. The following is such an example.

Example 4.2. ([19, Example 5.2]) Consider the GBPP:

(4.3)  min_{x, y}  (x − 3)² + (y − 2)²
       s.t.  0 ≤ x ≤ 8,  y ∈ S(x),

where S(x) is the set of minimizers of the optimization problem

min_z  (z − 5)²
s.t.  0 ≤ z ≤ 6,  −2x + z − 1 ≤ 0,
      x − 2z + 2 ≤ 0,  x + 2z − 14 ≤ 0.

It can be shown that

S(x) = {1 + 2x} for x ∈ [0, 2],   S(x) = {5} for x ∈ (2, 4],   S(x) = {7 − x/2} for x ∈ (4, 6].


The feasible set of (4.3) is thus the set

F := { (x, y) | x ∈ [0, 6], y ∈ S(x) }.

It consists of three connected line segments. One can easily check that the global optimizer and the optimal value are

(x*, y*) = (1, 3),   F* = 5.

The polynomial ψ in the Jacobian representation is

ψ(x, y) = (−2x + y − 1)(x − 2y + 2)(x + 2y − 14) y (y − 6)(y − 5).

We apply Algorithm 3.1 to solve (4.3). The computational results are reported in Table 4. For a small ε, like 10⁻⁴, Algorithm 3.1 stops at k = 1 and returns the point (2.9864, 5.0000), which is not a global minimizer. However, it is interesting to note that the computed solution (2.9864, 5.0000) is a local optimizer of (4.3).

Table 4. Results of Algorithm 3.1 for solving (4.3).

Step k   (x_i^k, y_i^k)       z_{i,j}^k   F*_k     v_i^k
0        (2.7996, 2.3998)     5.0021      0.2000   −6.7610
1        (2.9864, 5.0000)     5.0007      9.0101   −6.6e-6

Why does Algorithm 3.1 fail to find a global minimizer in Example 4.2? By adding z^0 to the discrete subset Z_1, the feasible set of the subproblem (P_1) becomes

{ x ∈ X, y ∈ Z(x) } ∩ { ψ(x, y) = 0 } ∩ { |y − 5| ≤ 0.0021 }.

It does not include the unique global optimizer (x*, y*) = (1, 3). In other words, the reason is z^0 ∉ Z(x*).

From the above example, we observe that the difficulty for GBPPs lies in the dependence of the lower level feasible set on x. Let (x*, y*) be a global optimal solution of the bilevel program (P). Since it is possible that z^k ∉ Z(x*) for a certain k, (x*, y*) may not satisfy the newly added constraint in (P_{k+1}): H(x*, y*, z^k) ≥ 0. In other words, it is possible that (x*, y*) is not feasible for problem (P_{k+1}). Let X_k be the feasible set of problem (P_k). Since Z_k ⊆ Z_{k+1}, we have X_{k+1} ⊆ X_k, and (x*, y*) is then infeasible for problem (P_τ) for all τ ≥ k + 1. Hence Algorithm 3.1 will fail to find the global optimal solution. However, this can never happen for SBPPs, since Z(x) = Z for every x. For any z^k ∈ Z, we have H(x*, y*, z^k) ≥ 0, i.e., (x*, y*) is feasible for every subproblem (P_k). The iteration of Algorithm 3.1 will continue until this global optimal solution is found. Under relatively stronger assumptions, Algorithm 3.1 can also solve GBPPs globally.

Theorem 4.3. For the general bilevel polynomial program as in (1.1), assume the lower level optimization is as in (1.2). At each step k of Algorithm 3.1, suppose the polynomial optimization subproblems (P_k) and each (Q_i^k) are solved successfully by Lasserre relaxations. Let X_k be the feasible set of the subproblem (P_k). Suppose F* is achievable and at least one global minimizer (x, y) ∈ X_k for all k ∈ N. Suppose ε = 0 and k_max = ∞.

(i) If Algorithm 3.1 stops at Step k, then each (x*, y*) ∈ X* is a global optimizer of (1.1).


(ii) Suppose Algorithm 3.1 does not stop and each S_k ≠ ∅ is finite. Suppose the union ∪_{k≥0} Z_k is bounded. Let (x*, y*) be an arbitrary accumulation point of the set ∪_{k≥0} S_k. If the value function v(x), as in (1.6), is continuous at x*, then (x*, y*) is a global optimizer of the GBPP (1.1).

Proof. Since at least one global minimizer (x, y) ∈ X_k, we have F*_k ≤ F* for all k ∈ N. The rest of the proof is similar to the proof of Theorem 3.5.

If v(x) has restricted inf-compactness at x* and the Mangasarian-Fromovitz constraint qualification (MFCQ) holds at all solutions of the lower level problem P(x*), then the value function v(x) is Lipschitz continuous at x*; see [5, Corollary 1]. Recently, it was shown in [9, Corollary 4.8] that the MFCQ can be replaced by a weaker condition, called quasinormality, in the above result.

5. Numerical experiments

In this section, we present numerical experiments for solving BPPs. In Algorithm 3.1, the polynomial optimization subproblems are solved by Lasserre semidefinite relaxations, implemented in the software GloptiPoly 3 [10] with the SDP solver SeDuMi [28]. The computation is implemented with Matlab R2012a on a MacBook Pro 64-bit OS X (10.9.5) system with 16GB memory and a 2.3GHz Intel Core i7 CPU. In the algorithms, we set the parameters k_max = 10 and ε = 10⁻⁴. In reporting computational results, we use (x*, y*) to denote the computed global optimizers, F* to denote the optimal value of the outer objective function F, v* to denote min_{z ∈ Z} H(x*, y*, z), and Iter to denote the total number of iterations for convergence. Note that (x*, y*) is a global minimizer of (P) if v* ≥ 0, up to a small numerical error.

5.1. Examples of SBPPs.

Example 5.1. ([19, Example 3.26]) Consider the SBPP:

min_{x ∈ R², y ∈ R³}  x_1 y_1 + x_2 y_2² + x_1 x_2 y_3³
s.t.  x ∈ [−1, 1]²,  0.1 − x_1² ≤ 0,
      1.5 − y_1² − y_2² − y_3² ≤ 0,  −2.5 + y_1² + y_2² + y_3² ≤ 0,
      y ∈ S(x),

where S(x) is the set of minimizers of

min_{z ∈ [−1, 1]³}  x_1 z_1² + x_2 z_2² + (x_1 − x_2) z_3².

Algorithm 3.1 terminates after one iteration, and we get

x* ≈ (−1, −1),   y* ≈ (1, ±1, −0.7071),   F* ≈ −2.3536,   v* ≈ −5.71 × 10⁻⁹.

Example 5.2. Consider the SBPP:

min_{x ∈ R², y ∈ R³}  x_1 y_1 + x_2 y_2 + x_1 x_2 y_1 y_2 y_3
s.t.  x ∈ [−1, 1]²,  y_1 y_2 − x_1² ≤ 0,  y ∈ S(x),


where S(x) is the set of minimizers of

min_{z ∈ R³}  x_1 z_1² + x_2² z_2 z_3 − z_1 z_3²
s.t.  1 ≤ z_1² + z_2² + z_3² ≤ 2.

Algorithm 3.1 terminates after one iteration, and we get

x* ≈ (−1, −1),   y* ≈ (1.1097, 0.3143, −0.8184),   F* ≈ −1.7095,   v* ≈ −1.19 × 10⁻⁹.

Example 5.3. We consider some test problems from [19]. For convenience of display, we choose the problems that have the common constraints x ∈ [−1, 1] for the outer level optimization and z ∈ [−1, 1] for the inner level optimization. When Algorithm 3.1 is applied, all these SBPPs are solved successfully. The outer objective F(x, y), the inner objective f(x, z), the global optimizers (x*, y*), the optimal value F*, the value v*, and the number of consumed iterations (Iter) are reported in Table 5.

Table 5. Results for some SBPP problems in [19]. They have the common constraints x ∈ [−1, 1] and z ∈ [−1, 1].

F(x, y)              f(x, z)           (x*, y*)               Iter   F*        v*
(x − 1/4)² + y²      z³/3 − xz         (0.2500, 0.5000)       2      0.2500    −5.7e-10
x + y                xz²/2 − z³/3      (−1.0000, 1.0000)      2      2.79e-8   −4.22e-8
2x + y               xz²/2 − z⁴/4      (−0.5, −1), (−1, 0)    2      −2.0000   −6.0e-10
(x + 1/2)² + y²/2    xz²/2 + z⁴/4      (−0.2500, ±0.5000)     4      0.1875    −8.3e-11
−x² + y²             xz² + z⁴/2        (1.0000, 0.0000)       2      −1.0000   −3.1e-13
xy − y + y²/2        −xz² + z⁴/2       (0.1886, 0.4342)       2      −0.2581   −3.6e-12
(x − 1/4)² + y²      z³/3 − x²z        (0.5000, 0.5000)       2      0.3125    −1.1e-10

Example 5.4. Consider the SBPP:

(5.1)  min_{x ∈ R⁴, y ∈ R⁴}  x_1² y_1 + x_2 y_2 + x_3 y_3² + x_4 y_4²
       s.t.  ‖x‖² ≤ 1,  y_1 y_2 − x_1 ≤ 0,
             y_3 y_4 − x_3² ≤ 0,  y ∈ S(x),

where S(x) is the set of minimizers of

min_{z ∈ R⁴}  z_1² − z_2(x_1 + x_2) − (z_3 + z_4)(x_3 + x_4)
s.t.  ‖z‖² ≤ 1,  z_2² + z_3² + z_4² − z_1 ≤ 0.

We apply Algorithm 3.1 to solve (5.1). The computational results are reported in Table 6. As one can see, Algorithm 3.1 stops when k = 3 and solves (5.1) successfully.

An interesting special case of SBPPs is when the inner level optimization has no constraints, i.e., Z = R^p. In this case, the set K_FJ(x) of Fritz John points is just the set of critical points of the inner objective f(x, z). It is easy to see that the polynomial tuple ψ is given as

ψ(x, z) = ( ∂f(x, z)/∂z_1, …, ∂f(x, z)/∂z_p ).


Table 6. Results of Algorithm 3.1 for solving (5.1).

Step k   (x_i^k, y_i^k)                                                          F*_k      v_i^k
0        (−0.0000, 1.0000, −0.0000, 0.0000, 0.6180, −0.7862, 0.0000, 0.0000)    −0.7862   −1.6406
1        (0.0000, −0.0000, 0.0000, −1.0000, 0.6180, −0.0000, 0.0000, −0.7862)   −0.6180   −0.3458
         (0.0003, −0.0002, −0.9999, 0.0000, 0.6180, 0.0001, −0.7861, −0.0000)   −0.6180   −0.3458
2        (0.0000, −0.0000, −0.8623, −0.5064, 0.6180, −0.0000, −0.6403, −0.4561) −0.4589   −0.0211
3        (0.0000, −0.0000, −0.7098, −0.7042, 0.6180, −0.0000, −0.5570, −0.5548) −0.4371   −6.37e-5

Table 7. Results for random SBPPs as in (5.2).

n   p   d1   d2   Inst   AvgIter   AvgTime   Avg(v*)
2   3   3    2    20     2.00      0:00:02   −6.05e-7
3   3   2    2    20     2.00      0:00:13   −1.34e-6
3   3   3    2    20     1.50      0:00:05   −5.41e-7
4   2   2    2    20     2.50      0:00:07   −2.63e-6
4   3   2    2    20     2.00      0:00:30   −1.77e-6
5   2   2    2    20     1.85      0:00:35   −4.33e-6
5   3   2    2    20     2.00      0:11:33   −4.63e-6
6   2   2    2    20     1.85      0:09:58   −1.00e-6
7   2   2    2    5      2.20      1:28:50   −1.53e-4

Example 5.5. (SBPPs with Z = R^p) Consider random SBPPs with ball conditions on x and no constraints on z:

(5.2)  F* := min_{x ∈ R^n, y ∈ R^p}  F(x, y)
       s.t.  ‖x‖² ≤ 1,  y ∈ argmin_{z ∈ R^p}  f(x, z),

where F(x, y) and f(x, z) are generated randomly as

F(x, y) := a_1^T [u]_{2d_1−1} + ‖B_1 [u]_{d_1}‖²,
f(x, z) := a_2^T [x]_{2d_2−1} + a_3^T [z]_{2d_2−1} + ‖B_2 ([x]_{d_2}; [z]_{d_2})‖².

In the above, x = (x_1, …, x_n), y = (y_1, …, y_p), z = (z_1, …, z_p), u = (x, y), and d_1, d_2 ∈ N. The symbol [x]_d denotes the vector of monomials in x of degree at most d, while [x]_{=d} denotes the vector of monomials in x of degree equal to d. The symbols [y]_d, [z]_d, [u]_d are defined in the same way.
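For reproducibility, the following is a minimal Python sketch (using numpy and sympy; the helper monomials and all other names are our own illustration, not the authors' code) of how a random outer objective of the form a_1^T [u]_{2d_1−1} + ‖B_1 [u]_{d_1}‖² can be generated, here with Gaussian entries as an assumption; the inner objective f(x, z) can be generated in the same way.

    # Illustrative sketch: generate F(x,y) = a1'*[u]_{2d1-1} + ||B1*[u]_{d1}||^2
    # for random a1, B1. All names here are hypothetical.
    import itertools
    import numpy as np
    import sympy as sp

    def monomials(vars_, d):
        # vector of all monomials in vars_ of total degree <= d
        return [sp.prod([v**e for v, e in zip(vars_, exps)])
                for exps in itertools.product(range(d + 1), repeat=len(vars_))
                if sum(exps) <= d]

    rng = np.random.default_rng(0)
    n, p, d1 = 2, 2, 2
    u = sp.symbols(f'x1:{n+1}') + sp.symbols(f'y1:{p+1}')   # u = (x, y)
    m_lin = sp.Matrix(monomials(u, 2*d1 - 1))                # [u]_{2d1-1}
    m_sq  = sp.Matrix(monomials(u, d1))                      # [u]_{d1}
    a1 = sp.Matrix(rng.standard_normal(len(m_lin)))
    B1 = sp.Matrix(rng.standard_normal((len(m_sq), len(m_sq))))
    F  = sp.expand((a1.T * m_lin)[0, 0] + ((B1 * m_sq).T * (B1 * m_sq))[0, 0])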

We test the performance of Algorithm 3.1 for solving SBPPs in the form (5.2). The computational results are reported in Table 7. In the table, Inst denotes the number of randomly generated instances, AvgIter denotes the average number of iterations taken by Algorithm 3.1, AvgTime denotes the average consumed time, and Avg(v*) denotes the average of the values v*. The consumed computational time is in the format hr:mn:sc, with hr (resp., mn, sc) standing for hours (resp., minutes, seconds). As we can see, all such SBPPs were solved successfully.

Example 5.6. (Random SBPPs with ball conditions) Consider the SBPP:

(5.3)  min_{x ∈ R^n, y ∈ R^p}  F(x, y)
       s.t.  ‖x‖² ≤ 1,  y ∈ argmin_{‖z‖² ≤ 1}  f(x, z).

The outer and inner objectives F(x, y), f(x, z) are generated as

F(x, y) = a^T [(x, y)]_{2d_1},   f(x, z) = ([x]_{d_2}; [z]_{d_2})^T B ([x]_{d_2}; [z]_{d_2}).


Table 8. Results for random SBPPs in (5.3).

n   p   d1   d2   Inst   AvgIter   AvgTime   Avg(v*)
3   2   3    2    20     1.40      0:00:05   −9.98e-8
4   2   2    2    20     1.60      0:00:11   −1.54e-7
3   3   2    2    20     2.00      0:00:22   −6.49e-7
5   2   2    2    20     3.80      0:01:49   −2.30e-4
6   2   2    2    10     3.00      0:10:03   −3.78e-4
6   2   3    2    10     3.40      0:12:14   −6.06e-4

The entries of the vector a and the matrix B are generated randomly, obeying Gaussian distributions. Symbols like [(x, y)]_{2d_1} are defined similarly as in Example 5.5.

We apply Algorithm 3.1 to solve (5.3). The computational results are reported in Table 8. The meanings of Inst, AvgIter, AvgTime, and Avg(v*) are the same as in Example 5.5. As we can see, the SBPPs as in (5.3) can be solved successfully by Algorithm 3.1.

5.2. Examples of GBPPs.

Example 5.7. Consider the GBPP:

(5.4)  min_{x ∈ R², y ∈ R³}  −½ x_1² y_1 + x_2 y_2² − (x_1 + x_2²) y_3
       s.t.  x ∈ [−1, 1]²,  x_1 + x_2 − x_1² − y_1² − y_2² ≥ 0,
             y ∈ S(x),

where S(x) is the set of minimizers of

min_{z ∈ R³}  x_2 (z_1 z_2 z_3 + z_2² − z_3³)
s.t.  x_1 − z_1² − z_2² − z_3² ≥ 0,  1 − 2 z_2 z_3 ≥ 0.

We apply Algorithm 3.1 to solve (5.4). Algorithm 3.1 terminates after one iteration, and we get

x* ≈ (1, 1),   y* ≈ (0, 0, 1),   F* ≈ −2,   v* ≈ −2.95 × 10⁻⁸.

Example 5.8. Consider the GBPP:

(5.5)  min_{x ∈ R⁴, y ∈ R⁴}  (x_1 + x_2 + x_3 + x_4)(y_1 + y_2 + y_3 + y_4)
       s.t.  ‖x‖² ≤ 1,  y_3² − x_4 ≤ 0,
             y_2 y_4 − x_1 ≤ 0,  y ∈ S(x),

where S(x) is the set of minimizers of

min_{z ∈ R⁴}  x_1 z_1 + x_2 z_2 + 0.1 z_3 + 0.5 z_4 − z_3 z_4
s.t.  z_1² + 2 z_2² + 3 z_3² + 4 z_4² ≤ x_1² + x_3² + x_2 + x_4,
      z_2 z_3 − z_1 z_4 ≥ 0.

We apply Algorithm 3.1 to solve (5.5). The computational results are reported in Table 9. As one can see, Algorithm 3.1 stops when k = 1 and solves (5.5) successfully.

Table 9. Results of Algorithm 3.1 for solving (5.5).

Step k   (x_i^k, y_i^k)                                                          F*_k      v_i^k
0        (0.5442, 0.4682, 0.4904, 0.4942, −0.7792, −0.5034, −0.2871, −0.1855)   −3.5050   −0.0391
1        (0.5135, 0.5050, 0.4882, 0.4929, −0.8346, −0.4104, −0.2106, −0.2887)   −3.4880   3.29e-9


Example 5.9. Consider some GBPP examples in [1, 7, 19, 35]. The computational results are reported in Table 10. As we can see, Algorithm 3.1 converges in very few steps.

Table 10. Results for some GBPPs

[1]  min_{x ∈ R, y ∈ R}  −x − y   s.t.  y ∈ S(x) := argmin_{z ∈ Z_x} z,
     Z_x := { z ∈ R | −x + z ≥ 0, −z ≥ 0 }.
     Results: F* = −2.78e-13, Iter = 1, x* = 3.82e-14, y* = 2.40e-13, v* = −7.43e-13.

[35] min_{x ∈ R, y ∈ R}  (x − 1)² + y²   s.t.  x ∈ [−3, 2], y ∈ S(x) := argmin_{z ∈ Z_x} z³ − 3z,
     Z_x := { z ∈ R | z ≥ x }.
     Results: F* = 0.9999, Iter = 2, x* = 0.9996, y* = 1.0000, v* = −4.24e-9.

[19] min_{x ∈ R, y ∈ R}  (x − 0.6)² + y²   s.t.  x, y ∈ [−1, 1], y ∈ S(x) := argmin_{z ∈ Z_x} f(x, z),
     f(x, z) = z⁴ + (4/30)(1 − x)z³ + (0.16x − 0.02x² − 0.4)z² + (0.004x³ − 0.036x² + 0.08x)z,
     Z_x := { z ∈ R | 0.01(1 + x²) ≤ z², z ∈ [−1, 1] }.
     Results: F* = 0.1917, Iter = 2, x* = 0.6436, y* = −0.4356, v* = 2.18e-10.

[19] min_{x ∈ R, y ∈ R²}  x³ y_1 + y_2   s.t.  x ∈ [0, 1], y ∈ [−1, 1] × [0, 100], y ∈ S(x) := argmin_{z ∈ Z_x} −z_2,
     Z_x := { z ∈ R² | x z_1 ≤ 10, z_1² + x z_2 ≤ 1, z ∈ [−1, 1] × [0, 100] }.
     Results: F* = 1, Iter = 1, x* = 1, y* = (0, 1), v* = 3.45e-8.

[19] min_{x ∈ R, y ∈ R²}  −x − 3y_1 + 2y_2   s.t.  x ∈ [0, 8], y ∈ [0, 4] × [0, 6], y ∈ S(x) := argmin_{z ∈ Z_x} −z_1,
     Z_x := { z ∈ R² | −2x + z_1 + 4z_2 ≤ 16, 8x + 3z_1 − 2z_2 ≤ 48, 2x − z_1 + 3z_2 ≥ 12, z ∈ [0, 4] × [0, 6] }.
     Results: F* = −13, Iter = 1, x* = 5, y* = (4, 2), v* = 3.95e-6.

[7]  min_{x ∈ R², y ∈ R²}  −y_2   s.t.  y_1 y_2 = 0, x ≥ 0, y ∈ S(x) := argmin_{z ∈ Z_x} z_1² + (z_2 + 1)²,
     Z_x := { z ∈ R² | (z_1 − x_1)² + (z_2 − 1 − x_1)² ≤ 1, (z_1 + x_2)² + (z_2 − 1 − x_2)² ≤ 1 }.
     Results: F* = −1, Iter = 2, x* = (0.71, 0.71), y* = (0, 1), v* = −3.77e-10.

[1]  min_{x ∈ R², y ∈ R²}  −x_1² − 3x_2 − 4y_1 + y_2²   s.t.  (x, y) ≥ 0, −x_1 − 2x_2 + 4 ≥ 0, y ∈ S(x) := argmin_{z ∈ Z_x} z_1² − 5z_2,
     Z_x := { z ∈ R² | x_1² − 2x_1 + x_2² − 2z_1 + z_2 + 3 ≥ 0, x_2 + 3z_1 − 4z_2 − 4 ≥ 0 }.
     Results: F* = −12.6787, Iter = 2, x* = (0, 2), y* = (1.88, 0.91), v* = −1.07e-6.

Acknowledgement. Jiawang Nie was partially supported by the NSF grants DMS-0844775 and DMS-1417985. Jane Ye's work was partially supported by NSERC.

References

[1] Gemayqzel Bouza Allende and Georg Still. Solving bilevel programs with the KKT-approach. Mathematical Programming, Series A, 138(1-2):309-332, 2013.
[2] B. Bank, J. Guddat, D. Klatte, B. Kummer and K. Tammer. Nonlinear Parametric Optimization. Birkhäuser Verlag, Basel, 1983.
[3] Omar Ben-Ayed and Charles E. Blair. Computational difficulties of bilevel linear programming. Operations Research, 38(3):556-560, 1990.
[4] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.
[5] Frank H. Clarke. Optimization and Nonsmooth Analysis. Wiley Interscience, New York, 1983; reprinted as Vol. 5 of Classics in Applied Mathematics, SIAM, Philadelphia, PA, 1990.
[6] Stephan Dempe. Foundations of Bilevel Programming. Kluwer Academic Publishers, Dordrecht, 2002.
[7] Stephan Dempe and Joydeep Dutta. Is bilevel programming a special case of a mathematical program with complementarity constraints? Mathematical Programming, Series A, 131:37-48, 2012.
[8] Stephan Dempe, Vyacheslav Kalashnikov, Gerardo A. Pérez-Valdés and Nataliya Kalashnykova. Bilevel Programming Problems: Theory, Algorithms and Applications to Energy Networks. Springer-Verlag, Berlin, 2015.
[9] Lei Guo, Gui-Hua Lin, Jane J. Ye, and Jin Zhang. Sensitivity analysis of the value function for parametric mathematical programs with equilibrium constraints. SIAM Journal on Optimization, 24(3):1206-1237, 2014.
[10] Didier Henrion, Jean B. Lasserre, and Johan Löfberg. GloptiPoly 3: moments, optimization and semidefinite programming. Optimization Methods and Software, 24(4-5):761-779, 2009.
[11] Vaithilingam Jeyakumar, Jean B. Lasserre, Guoyin Li and T. S. Pham. Convergent semidefinite programming relaxations for global bilevel polynomial optimization problems. arXiv:1506.02099, 2015.
[12] Jean B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796-817, 2001.
[13] Jean B. Lasserre. Moments, Positive Polynomials and Their Applications. Imperial College Press, London, UK, 2009.
[14] Monique Laurent. Optimization over polynomials: Selected topics. Proceedings of the International Congress of Mathematicians (ICM 2014), pp. 843-869, 2014.
[15] Guihua Lin, Mengwei Xu, and Jane J. Ye. On solving simple bilevel programs with a nonconvex lower level program. Mathematical Programming, Series A, 144(1-2):277-305, 2014.
[16] Johan Löfberg. YALMIP: a toolbox for modeling and optimization in MATLAB. Proceedings of the CACSD Conference, http://users.isy.liu.se/johanl/yalmip, 2004.
[17] Zhi-Quan Luo, Jong-Shi Pang, and Daniel Ralph. Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge, 1996.
[18] J. A. Mirrlees. The theory of moral hazard and unobservable behaviour: part I. The Review of Economic Studies, 66:3-22, 1999.
[19] Alexander Mitsos and Paul I. Barton. A test set for bilevel programs. 2006. http://www.researchgate.net/publication/228455291.
[20] Alexander Mitsos, P. Lemonidis and Paul I. Barton. Global solution of bilevel programs with a nonconvex inner program. Journal of Global Optimization, 42:475-513, 2008.
[21] Rainer Hettich and Kenneth O. Kortanek. Semi-infinite programming: theory, methods, and applications. SIAM Review, 35(3):380-429, 1993.
[22] Jiawang Nie. An exact Jacobian SDP relaxation for polynomial optimization. Mathematical Programming, Series A, 137:225-255, 2013.
[23] Jiawang Nie. Certifying convergence of Lasserre's hierarchy via flat truncation. Mathematical Programming, Series A, 142(1-2):485-510, 2013.
[24] Jiawang Nie. Optimality conditions and finite convergence of Lasserre's hierarchy. Mathematical Programming, Series A, 146(1):97-121, 2014.
[25] Jiří V. Outrata. On the numerical solution of a class of Stackelberg problems. Zeitschrift für Operations Research, 34:255-277, 1990.
[26] Jiří V. Outrata, M. Kočvara and J. Zowe. Nonsmooth Approach to Optimization Problems with Equilibrium Constraints: Theory, Applications and Numerical Results. Kluwer Academic Publishers, Boston, 1998.
[27] Mihai Putinar. Positive polynomials on compact semi-algebraic sets. Indiana University Mathematics Journal, 42:969-984, 1993.
[28] Jos F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11/12:625-653, 1999.
[29] Kim-Chuan Toh, Michael J. Todd, and Reha Tütüncü. SDPT3: a MATLAB software package for semidefinite programming. Optimization Methods and Software, 11:545-581, 1998.
[30] Mengwei Xu and Jane J. Ye. A smoothing augmented Lagrangian method for solving simple bilevel programs. Computational Optimization and Applications, 59(1-2):353-377, 2014.
[31] Mengwei Xu, Jane J. Ye and Liwei Zhang. Smoothing augmented Lagrangian method for solving nonsmooth and nonconvex constrained optimization problems. Journal of Global Optimization, 62(4):675-694, 2014.
[32] Mengwei Xu, Jane J. Ye and Liwei Zhang. A smoothing SQP method for solving degenerate nonsmooth constrained optimization problems with applications to bilevel programs. SIAM Journal on Optimization, 25(3):1388-1410, 2015.
[33] Li Wang and Feng Guo. Semidefinite relaxations for semi-infinite polynomial programming. Computational Optimization and Applications, 58(1):133-159, 2014.
[34] Jane J. Ye and Daoli Zhu. Optimality conditions for bilevel programming problems. Optimization, 33:9-27, 1995.
[35] Jane J. Ye and Daoli Zhu. New necessary optimality conditions for bilevel programs by combining the MPEC and value function approaches. SIAM Journal on Optimization, 20:1885-1905, 2010.

Department of Mathematics, University of California, 9500 Gilman Drive, La Jolla, CA, USA, 92093.
E-mail address: [email protected]

Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL, USA, 60607.
E-mail address: [email protected]

Department of Mathematics and Statistics, University of Victoria, Victoria, B.C., Canada, V8W 2Y2.
E-mail address: [email protected]