
262 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 27, NO. 2, FEBRUARY 2016

A New Continuous-Time Equality-Constrained Optimization to Avoid Singularity

Quan Quan and Kai-Yuan Cai

Abstract— In equality-constrained optimization, a standard regularity assumption is often associated with feasible point methods, namely, that the gradients of the constraints are linearly independent. In practice, the regularity assumption may be violated. In order to avoid such a singularity, a new projection matrix is proposed, based on which a feasible point method for continuous-time, equality-constrained optimization is developed. First, the equality constraint is transformed into a continuous-time dynamical system whose solutions always satisfy the equality constraint. Second, a new projection matrix without singularity is proposed to realize the transformation. An update (or, say, a controller) is subsequently designed to decrease the objective function along the solutions of the transformed continuous-time dynamical system. The invariance principle is then applied to analyze the behavior of the solution. Furthermore, the proposed method is modified to address cases in which solutions do not satisfy the equality constraint. Finally, the proposed optimization approach is applied to three examples to demonstrate its effectiveness.

Index Terms— Continuous-time dynamical systems, equality constraints, optimization, singularity.

I. INTRODUCTION

ACCORDING to how the differential equation is implemented, most approaches to continuous-time optimization can be classified as either a dynamical system [1]–[3] or a neural network [4]–[10]. The dynamical system approach relies on the numerical integration of differential equations on a digital computer. Unlike discrete optimization methods, the step sizes of dynamical system approaches can be automatically controlled in the integration process and can sometimes be made larger than usual. This advantage suggests that the dynamical system approach can, in fact, be comparable with the currently available conventional discrete optimal methods and can facilitate faster convergence [1], [3]. The application of a higher order numerical integration process also enables us to avoid the zigzagging phenomenon, which is often encountered in typical linear extrapolation methods [1]. On the other hand, the neural network approach emphasizes implementation by analog circuits, very large-scale integration, and optical technologies [11]. The major breakthrough of this approach is attributed to [12], which introduced an artificial neural network to solve the traveling salesman problem. By employing analog hardware, the neural network approach offers low computational complexity and is suitable for parallel implementation.

Manuscript received July 15, 2014; accepted August 29, 2015. Date of publication September 22, 2015; date of current version January 18, 2016. This work was supported by the National Natural Science Foundation of China under Grant 61473012.

The authors are with the Department of Automatic Control, Beijing University of Aeronautics and Astronautics, Beijing 100191, China (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNNLS.2015.2476348

For continuous-time equality-constrained optimization, existing methods can be classified into three categories [1], [4], [8], [13]: 1) the feasible point method (or primal method); 2) the penalty function method; and 3) the Lagrangian multiplier method (or primal-dual method). The feasible point method directly solves the original problem by searching through the feasible region for the optimal solution. Each point in the process is feasible, and the value of the objective function constantly decreases. Compared with the two other methods, the feasible point method offers three significant advantages that highlight its usefulness as a general procedure applicable to almost all nonlinear programming problems [13, p. 360]: 1) the terminating point is feasible if the process is terminated before the solution is reached; 2) the limit point of the convergent sequence of solutions must be at least a local constrained minimum; and 3) the method is applicable to general nonlinear programming problems because it does not rely on special problem structures, such as convexity. In equality-constrained optimization, a standard regularity assumption is often associated with feasible point methods, namely, that the gradients of the constraints are linearly independent. In addition, the regularity assumption is also required by the penalty function method [13, p. 417] and the Lagrangian multiplier method [13, p. 476]. However, in practice, the regularity assumption may be violated.

In order to avoid such a singularity, a continuous-time feasible point method is proposed to identify the local minimum from a feedback control perspective for a general equality-constrained optimization problem. Compared with global optimization methods, local optimization methods are still necessary. First, they often serve as a basic component of some global optimization schemes, such as the branch-and-bound method [14]. Second, they require less computation for online optimization. Compared with the discrete optimal methods offered by MATLAB, illustrative examples show that the proposed method avoids convergence to a singular point and facilitates faster convergence through numerical integration on a digital computer. Moreover, one illustrative example shows that the proposed projection matrix also outperforms the modified commonly used projection matrix. In view of these, the contributions of this paper are listed as follows.

1) A new projection matrix is proposed to remove a standard regularity assumption that is often associated with feasible point methods, namely, that the gradients of the constraints are linearly independent (see [1, p. 158, eq. (4)], [2, p. 156, eq. (2.3)], [8, p. 1669, Assumption 1]). Compared with a modified commonly used projection matrix, the proposed projection matrix achieves better precision. Moreover, its recursive form can be implemented more easily.

2) Based on the proposed projection matrix, a continuous-time, equality-constrained optimization method is developed to avoid convergence to a singular point. The invariance principle is applied to analyze the behavior of the solution.

3) The modified version of the proposed optimization is further developed to address cases in which solutions do not satisfy the equality constraint. This ensures its robustness against uncertainties caused by numerical error or realization by analog hardware.

4) Equality-constrained optimization is formulated as a control problem, in which the equality constraint is transformed into a continuous-time dynamical system. The proposed method is easily accessible, especially to practitioners in the control field.

The following notation is used. R^n is the Euclidean space of dimension n. ‖·‖ denotes the Euclidean vector norm or the induced matrix norm. I_n is the identity matrix of dimension n. 0_{n1×n2} denotes a zero vector or a zero matrix of dimension n1 × n2. (·)† denotes the Moore–Penrose inverse. The direct product ⊗ and the vec(·) operation are defined in Appendix A. The function [·]× : R^3 → R^{3×3} and the matrix H ∈ R^{9×3} are also defined in Appendix A. Suppose g : R^n → R. The gradient of the function g is given by ∇g(x) = ∇_x g(x) = [∂g(x)/∂x_1 · · · ∂g(x)/∂x_n]^T ∈ R^n, and the matrix of second partial derivatives of g(x), known as the Hessian, is given by ∇_{xx} g(x) = [∂²g(x)/∂x_i ∂x_j]_{ij} ∈ R^{n×n}.

II. PROBLEM FORMULATION

In this section, the considered equality-constrained optimization problem is formulated first. Then, an equality constraint transformation is proposed to replace the equality constraints. Based on it, the objectives of this paper are stated.

A. Equality-Constrained Optimization

The class of equality-constrained optimization problems considered here is defined as follows:

    min_{x∈R^n} v(x),  s.t.  c(x) = 0_{m×1}                    (1)

where v : R^n → R is the objective function and c = [c_1 c_2 · · · c_m]^T ∈ R^m, c_i : R^n → R, are the equality constraints. Both are twice continuously differentiable. Denote ∇c ≜ [∇c_1, ∇c_2, . . . , ∇c_m] ∈ R^{n×m}. To avoid a trivial case, suppose the constraint (or feasible) set F = {x ∈ R^n | c(x) = 0_{m×1}} ≠ ∅.

Definition 1 [15, pp. 316–317]: For the problem (1), a vector x* ∈ F is a global minimum if v(x*) ≤ v(x), ∀x ∈ F; a vector x* ∈ F is a local (strict local) minimum if there is a neighborhood N of x*, such that v(x*) ≤ v(x) (v(x*) < v(x)) for x ∈ N ∩ F.

Definition 2 [13, p. 325]: A vector x* ∈ F is said to be a regular point if the gradient vectors ∇c_1(x*), ∇c_2(x*), . . . , ∇c_m(x*) are linearly independent. Otherwise, it is called a singular point.

Definition 3 [16, p. 117]: A function v(x) satisfying v(0) = 0 and v(x) > 0 for x ≠ 0_{n×1} is said to be positive definite.

Remark 1 (On Inequality-Constrained Optimization): Inequality-constrained optimization problems can be transformed into equality-constrained optimization problems. For example, the inequality constraint x ≤ 1, x ∈ R, can be replaced with the equality constraint x + z² = 1, z ∈ R. Similarly, the inequality constraint −1 ≤ x ≤ 1, x ∈ R, can be replaced with the equality constraint x = sin(z), z ∈ R.

B. Equality Constraint Transformation

Optimization problems are often solved using numerical iterative methods. For an equality-constrained optimization problem, the major difficulty lies in ensuring that each iteration satisfies the constraints and can further move toward the minimum. To address this difficulty, a transformation of the equality constraints is proposed, which is first formulated as an assumption.

Assumption 1: For a given x_0 ∈ F, there exists a function f : R^n → R^{n×l}, such that

    ẋ(t) = f(x(t))u(t),  x(0) = x_0                    (2)

with solutions that satisfy x(t) ∈ F_u(x_0) ⊂ F, where F_u(x_0) = {x(t) | ẋ(t) = f(x(t))u(t), x(0) = x_0 ∈ F, u(t) ∈ R^l, t ≥ 0}.

The best choice of f(x) is to make F_u(x_0) = F. This holds for linear constraints. For the sake of simplicity, the variable t will be omitted except when necessary.

Theorem 1: Suppose that c(x) = Ax and f(x) ≡ A^⊥, where A^⊥ has full column rank and the space spanned by the columns of A^⊥ is the null space of A. Then, F_u(x_0) = F, ∀x_0 ∈ F.

Proof: See Appendix B. ∎

From the proof of Theorem 1, the choice of f(x) is, in fact, an accessibility problem in the control field. However, it is difficult to achieve F_u(x_0) = F in a general case. For example, if c(x) = (x_1 + 1)(x_1 − 1), x = [x_1 x_2]^T ∈ R², then F = {x ∈ R² | x_1 = 1 or x_1 = −1}. Since the two sets {x ∈ R² | x_1 = 1} and {x ∈ R² | x_1 = −1} are not connected, the solution of (2) starting from either set cannot access the other. Although F_u(x_0) ≠ F, it is expected that the function f(x) is chosen to make the set F_u(x_0) as large as possible, so that the probability of x* ∈ F_u(x_0) is higher. Motivated by the linear case in Theorem 1, the function f(x) should satisfy

V1(x) = V2(x)

where

    V_1(x) = {z ∈ R^n | ∇c(x)^T z = 0_{m×1}},  [null space of ∇c(x)^T]
    V_2(x) = {z ∈ R^n | z = f(x)u, u ∈ R^l},  [range space of f(x)].


For a given x′ ∈ R^n, if V_1(x′) = V_2(x′), then f(x′) is defined as the projection matrix of ∇c(x′). Obviously, it satisfies

    ∇c(x′)^T f(x′) = 0_{m×l}.

One commonly used projection matrix, denoted by f_com(x), is given as follows [1], [2], [8]:

    f_com(x) = I_n − (∇c(∇c^T ∇c)^{−1} ∇c^T)(x).                    (3)

Obviously, the matrix function f_com(x) is the projection matrix at all regular points, but it is not well defined at singular points, where ∇c^T ∇c is not invertible.

C. Objective

This paper aims to propose a continuous-time, equality-constrained optimization method to identify the local minimum from a feedback control perspective. Concretely, the first objective is to propose a new projection matrix to avoid singularity. Based on it, the second objective is to design the update u to make the solutions of (2) achieve a local minimum. By Assumption 1, the update u in (2) can be considered as a control input from the feedback control perspective. The objective function v(x) can be considered as a Lyapunov-like function, although v(x) is, in fact, not required to be positive definite here. Accordingly, the second objective of this paper can be restated as: design a control input u to decrease v(x) along the solutions of (2) until x has achieved a local minimum.

III. NEW PROJECTION MATRIX

In order to avoid such a singularity, a new projection matrix is proposed. Before presenting it, a modification of f_com(x) in (3) is investigated.

A. Modified Commonly Used Projection Matrix

In order to avoid the singularity problem, the commonly used projection matrix (3) is modified as follows:

    f_mcom(x) = I_n − (∇c(εI_m + ∇c^T ∇c)^{−1} ∇c^T)(x)                    (4)

where ε > 0 is a small positive scalar. Obviously, the smaller ε is, the closer ‖∇c^T f_mcom‖ is to zero. On the other hand, however, a very small ε will cause an ill-conditioning problem, especially for a low-precision processor.

Example 1: Consider the following gradient vectors:

    ∇c_1 = [1 1 1 1]^T
    ∇c_2 = [2 1 1 1]^T
    ∇c_3 = [3 2 2 2]^T                    (5)

which are linearly dependent since ∇c_3 = ∇c_1 + ∇c_2. The precision error e_p = ‖∇c^T f_mcom‖ is computed with different ε = 10^{−k}, k = 1, . . . , 15, where ∇c = [∇c_1, ∇c_2, ∇c_3]. As shown in Fig. 1, the smallest precision error is achieved only at ε = 10^{−8}, with a precision error around 10^{−8}. Reducing ε further will increase the numerical error.
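For readers who wish to reproduce this sweep numerically, the following is a minimal NumPy sketch (not part of the original paper) that evaluates the precision error e_p = ‖∇c^T f_mcom‖ of the modified projection matrix (4) for the gradients in Example 1; the variable names are illustrative only.

```python
import numpy as np

# Gradients from Example 1 as columns of grad_c (n = 4, m = 3);
# grad_c3 = grad_c1 + grad_c2, so the columns are linearly dependent.
grad_c = np.array([[1., 2., 3.],
                   [1., 1., 2.],
                   [1., 1., 2.],
                   [1., 1., 2.]])
n, m = grad_c.shape

for k in range(1, 16):
    eps = 10.0 ** (-k)
    # Modified commonly used projection matrix (4):
    # f_mcom = I_n - grad_c (eps I_m + grad_c^T grad_c)^{-1} grad_c^T
    f_mcom = np.eye(n) - grad_c @ np.linalg.solve(
        eps * np.eye(m) + grad_c.T @ grad_c, grad_c.T)
    e_p = np.linalg.norm(grad_c.T @ f_mcom)   # precision error ||grad_c^T f_mcom||
    print(f"eps = 1e-{k:02d}   e_p = {e_p:.3e}")
```

The printed values illustrate the trade-off described above: decreasing ε first improves and then worsens the error once ill-conditioning dominates.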

In order to avoid the singularity problem in Example 1, the best cure is to remove the vector ∇c_3, resulting in

    ∇c_new = [∇c_1, ∇c_2] ∈ R^{4×2}.

Fig. 1. Precision error of the modified commonly used projection matrix with different ε = 10^{−k}.

With it, the projection matrix becomes

    f_newcom = I_4 − ∇c_new(∇c_new^T ∇c_new)^{−1} ∇c_new^T.

It is easy to see that ∇c^T f_newcom ≡ 0_{3×4}. However, this best cure cannot be implemented continuously using (3), and hence it cannot be realized by analog hardware. For such a purpose, a new projection matrix is proposed in the following.

B. New Projection Matrix

A new projection matrix, denoted by f_pro(x), is proposed to avoid the singularity problem. For the special case c : R^n → R, such an f_pro(x) is designed in Theorem 2. Subsequently, a method is proposed to construct a projection matrix for the general case c : R^n → R^m. Before the design, the following preliminary result is needed.

Lemma 1: Suppose

    W_1 = {z ∈ R^n | L^T z = 0}
    W_2 = {z ∈ R^n | z = (I_n − L L^T / (δ(‖L‖²) + ‖L‖²)) u, u ∈ R^n}

where L ∈ R^n and

    δ(x) = 1 if x = 0, x ∈ R;   δ(x) = 0 if x ≠ 0, x ∈ R.

Then, W_1 = W_2.

Proof: See Appendix C. ∎

Theorem 2: Suppose that c : R^n → R and the function f_pro(x) is designed to be

    f_pro = I_n − ∇c ∇c^T / (δ(‖∇c‖²) + ‖∇c‖²).                    (6)

Then, Assumption 1 is satisfied with u ∈ R^n and V_1(x) = V_2(x), ∀x ∈ R^n.

Proof: Since ċ(x) = ∇c(x)^T ẋ and ẋ = f_pro(x)u, it follows that ċ(x) ≡ 0 by Lemma 1. Therefore, Assumption 1 is satisfied with u ∈ R^n. Further, by Lemma 1, V_1(x) = V_2(x). ∎


Theorem 3: Suppose that c : R^n → R^m and the functions f_k(x) are given in a recursive form as follows:

    f_0 = I_n
    f_k = f_{k−1} − f_{k−1} f_{k−1}^T ∇c_k ∇c_k^T f_{k−1} / (δ(‖f_{k−1}^T ∇c_k‖²) + ‖f_{k−1}^T ∇c_k‖²)                    (7)

where k = 1, . . . , m. Then, Assumption 1 is satisfied with f_pro = f_m and u ∈ R^n, and V_1(x) = V_2(x), ∀x ∈ R^n.

Proof: See Appendix D. ∎

Remark 2 (On the Best Cure): If ∇c_m is represented by a linear combination of ∇c_i, i = 1, . . . , m − 1, then ∇c ∇c^T is singular. In this case, ∇c_m^T f_{m−1} = 0_{1×n} since ∇c_i^T f_{m−1} = 0_{1×n}, i = 1, . . . , m − 1. By (7), the projection matrix f_pro then degenerates to f_pro = f_{m−1}, which is equivalent to removing the term ∇c_m. This is consistent with the best cure mentioned above.

Remark 3 (On Example 1): Let us revisit Example 1. In practice, the impulse function δ(x) is approximated by some continuous function, such as δ(x) ≈ e^{−γ|x|}, where γ is a large positive scalar. The precision error e_p = ‖∇c^T f_pro‖ is computed with δ(x) ≈ e^{−30|x|}, resulting in e_p = 2.7629 × 10^{−10}. This demonstrates the advantage of the proposed projection matrix over f_mcom in (4). Furthermore, compared with (3) or (4), the explicit recursive form of the proposed projection matrix is also easier to realize by analog hardware because it avoids matrix inversion.
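For readers who want to experiment with the recursion (7), the following is a minimal NumPy sketch (ours, not the authors' code) that builds f_pro for the gradients of Example 1, with the smooth surrogate δ(x) ≈ e^{−γ|x|} mentioned in Remark 3 (γ = 30 follows the text; all other names are illustrative).

```python
import numpy as np

def delta_approx(x, gamma=30.0):
    """Smooth surrogate for the impulse function: delta(x) ~ exp(-gamma*|x|)."""
    return np.exp(-gamma * abs(x))

def f_pro(grad_c, gamma=30.0):
    """Recursive projection matrix (7): f_0 = I_n and
    f_k = f_{k-1} - f_{k-1} f_{k-1}^T g_k g_k^T f_{k-1}
          / (delta(||f_{k-1}^T g_k||^2) + ||f_{k-1}^T g_k||^2),
    where g_k is the gradient of c_k (k-th column of grad_c)."""
    n, m = grad_c.shape
    f = np.eye(n)
    for k in range(m):
        g = grad_c[:, [k]]                 # n x 1
        w = f.T @ g                        # f_{k-1}^T grad(c_k)
        s = float(w.T @ w)                 # ||f_{k-1}^T grad(c_k)||^2
        f = f - (f @ w) @ (g.T @ f) / (delta_approx(s, gamma) + s)
    return f

# Gradients from Example 1 (grad_c3 = grad_c1 + grad_c2, i.e., a singular case).
grad_c = np.array([[1., 2., 3.],
                   [1., 1., 2.],
                   [1., 1., 2.],
                   [1., 1., 2.]])
f = f_pro(grad_c)
print("precision error e_p =", np.linalg.norm(grad_c.T @ f))
```

When the dependent gradient ∇c_3 is reached, ‖f_2^T ∇c_3‖² is essentially zero, the surrogate δ dominates the denominator, and f_3 ≈ f_2, mirroring the "best cure" behavior of Remark 2.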

IV. UPDATE DESIGN AND CONVERGENCE ANALYSIS

In this section, the update (or, say, controller) u is designed to make v̇(x) ≤ 0, so that v(x) is nonincreasing. If v(x) is positive definite, then the theory of Lyapunov stability is available, which will be very familiar to practitioners in the control field. However, in order to make v(x) more general, the objective function v(x) here is not required to be positive definite or convex. Because of this, the analysis is based on the LaSalle invariance theorem [16, pp. 126–129].

A. Update Design

Based on Assumption 1, taking the time derivative of v(x) along the solutions of (2) results in

    v̇(x) = ∇v(x)^T f(x)u                    (8)

where ∇v(x) ∈ R^n. In order to get v̇(x) ≤ 0, a direct way of designing u is proposed as follows:

    u = −Q(x) f(x)^T ∇v(x)                    (9)

where Q : R^n → R^{l×l} and Q(x) ≥ εI_l > 0, ε > 0, ∀x ∈ R^n. Then, (8) becomes

    v̇(x) = −∇v(x)^T f(x) Q(x) f(x)^T ∇v(x) ≤ 0.                    (10)

Substituting (9) into the continuous-time dynamical system (2) results in

    ẋ = −f(x) Q(x) f(x)^T ∇v(x)                    (11)

with solutions that always satisfy the constraint c(x) = 0_{m×1}. The closed-loop system corresponding to the continuous-time dynamical system (2) and the controller (9) is shown in Fig. 2.

Fig. 2. Closed-loop control system.
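To make the closed loop in Fig. 2 concrete, here is a hedged Python sketch that integrates (11) for a small hypothetical problem (minimize x_1² + x_2² subject to x_1 + x_2 = 1, which is not one of the paper's examples), using the single-constraint projection matrix (6) and the update (9) with Q(x) = I.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy problem (illustrative only): minimize v(x) = x1^2 + x2^2  s.t.  c(x) = x1 + x2 - 1 = 0.
def grad_v(x):
    return 2.0 * x

def grad_c(x):
    return np.array([1.0, 1.0])           # gradient of c, constant here

def f_pro(x, gamma=30.0):
    g = grad_c(x).reshape(-1, 1)
    s = float(g.T @ g)
    delta = np.exp(-gamma * abs(s))        # smooth surrogate for delta(||grad c||^2)
    return np.eye(len(x)) - (g @ g.T) / (delta + s)   # projection matrix (6)

def closed_loop(t, x):
    f = f_pro(x)
    u = -f.T @ grad_v(x)                   # update (9) with Q(x) = I
    return f @ u                           # dynamics (11)

x0 = np.array([0.8, 0.2])                  # feasible initial point (x1 + x2 = 1)
sol = solve_ivp(closed_loop, (0.0, 10.0), x0, rtol=1e-8, atol=1e-10)
x_final = sol.y[:, -1]
print("x(T) =", x_final)                   # expected near [0.5, 0.5]
print("constraint residual:", x_final.sum() - 1.0)
```

The trajectory stays on the constraint set (up to integration error) while the objective decreases, which is exactly the feasible point behavior the update (9) is designed to produce.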

B. Convergence Analysis

The invariance principle is applied to analyze the behavior of the solution of (11). Readers not interested in these details can proceed directly to Section IV-C.

Theorem 4: Under Assumption 1, given x_0 ∈ F, if the set K = {x ∈ R^n | v(x) ≤ v(x_0), c(x) = 0_{m×1}} is bounded, then the solution of (11) starting at x_0 approaches x_l* ∈ S, where S = {x ∈ K | ∇v(x)^T f(x) = 0_{1×l}}. In addition, if V_1(x_l*) = V_2(x_l*), then there must exist λ* = [λ_1* λ_2* · · · λ_m*]^T ∈ R^m, such that ∇v(x_l*) = Σ_{i=1}^m λ_i* ∇c_i(x_l*) and c(x_l*) = 0_{m×1}, namely, x_l* is a Karush–Kuhn–Tucker (KKT) point. Furthermore, if z^T ∇_{xx} L(x_l*, λ*) z > 0 for all z ∈ V_1(x_l*), z ≠ 0_{n×1}, then x_l* is a strict local minimum, where L(x, λ) = v(x) − Σ_{i=1}^m λ_i c_i(x).

Proof: The proof is composed of three propositions. Proposition 1 shows that K is compact and positively invariant with respect to (11). Proposition 2 shows that the solution of (11) starting at x_0 approaches x_l* ∈ S. Proposition 3 shows that x_l* ∈ S is a KKT point and, further, a strict local minimum. The three propositions are proven in Appendix E. ∎

Corollary 1: Suppose that f(x) = f_pro(x) as in (7) for c : R^n → R^m and that the set K = {x ∈ R^n | v(x) ≤ v(x_0), c(x) = 0_{m×1}} is bounded for a given x_0 ∈ F. Then, the solution of (11) starting at x_0 approaches x_l* ∈ S, where S = {x ∈ K | ∇v(x)^T f(x) = 0_{1×n}}, and x_l* is a KKT point. In addition, if z^T ∇_{xx} L(x_l*, λ*) z > 0 for all z ∈ V_1(x_l*), z ≠ 0_{n×1}, then x_l* is a strict local minimum, where L(x, λ) = v(x) − Σ_{i=1}^m λ_i c_i(x).

Proof: Since V_1(x_l*) = V_2(x_l*) by Theorem 3, the remainder of the proof is the same as that of Theorem 4. ∎

Corollary 2: Consider the following equality-constrained optimization problem:

    min_{x∈R^n} v(x),  s.t.  Ax = b                    (12)

where v(x) is convex and twice continuously differentiable, A ∈ R^{m×n} with rank A < n, and K = {x ∈ R^n | v(x) ≤ v(x_0), Ax = b} is bounded. Then, the solution of (11) with f(x) ≡ A^⊥ starting at any x_0 ∈ F approaches the global minimum x*.

Proof: The solution of (11) starting at x_0 approaches x_l* ∈ S. Since rank A < n, it holds that V_1(x_l*) = V_2(x_l*) ≠ ∅. Since the equality-constrained optimization problem (12) is convex, a KKT point x_l* is the global minimum x* of the problem (12). The remainder of the proof is the same as that of Theorem 4. ∎

Remark 4 (On Boundedness of the Set K): If K is not a bounded set, then S defined in Theorem 4 may be empty. Therefore, the boundedness of the set K is necessary. For example, consider v(x) = x_1 + x_2 subject to c(x) = x_1 − x_2 = 0. The set K = {x ∈ R² | x_1 + x_2 ≤ v(x_0), x_1 − x_2 = 0} is unbounded. According to Theorem 1, f(x) ≡ [1 1]^T. In this case, ∇v(x)^T f(x) ≡ 2 ≠ 0, and hence the set S is empty.

C. Modified Closed-Loop Dynamical System

Although the proposed method ensures that the solutions satisfy the constraints, this method may fail if x_0 ∉ F or if numerical algorithms are used to compute the solutions. Moreover, if the impulse function δ in (6) is approximated, then the constraints will also be violated. With these facts, the following modified closed-loop dynamical system is proposed to amend this situation.

Similar to [2], a correction term −ρ∇c(x)c(x) is introduced into (11), resulting in

    ẋ = −ρ∇c(x)c(x) − f(x)Q(x)f(x)^T ∇v(x),  x(0) = x_0                    (13)

where ρ > 0. Define v_c(x) = c(x)^T c(x). Then

    v̇_c(x) = −ρ c(x)^T ∇c(x)^T ∇c(x) c(x) ≤ 0

where ∇c(x)^T f(x) ≡ 0_{m×n} is utilized. Therefore, the solutions of (13) will tend to the feasible set F if ∇c(x) is of full column rank. Once c(x) = 0_{m×1}, the modified dynamical system (13) degenerates to (11). The self-correcting feature enables the step size to be automatically controlled in the numerical integration process, or uncertainties to be tolerated when the differential equation is realized using analog hardware. Singular points will make ∇c(x) not of full column rank. In this case, the self-correction will not occur in the entire space with respect to c(x), namely, ‖Dc(x)‖ → 0 rather than ‖c(x)‖ → 0, where D ∈ R^{p×m}, p < m. After passing singular points, the self-correction works in the entire space again. Once ‖c(x)‖ = 0, the property is mainly maintained by the proposed projection matrix.
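The self-correcting behavior of (13) is easy to observe numerically. The following hedged sketch reuses the same hypothetical toy problem as in the earlier sketch, but starts from an infeasible point; ρ = 5 is an arbitrary illustrative value.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Same illustrative toy problem, now with an infeasible initial point and
# the modified dynamics (13) including the correction term -rho * grad_c(x) * c(x).
rho = 5.0

def c(x):
    return x[0] + x[1] - 1.0               # scalar constraint

def grad_c(x):
    return np.array([1.0, 1.0])

def grad_v(x):
    return 2.0 * x

def f_pro(x, gamma=30.0):
    g = grad_c(x).reshape(-1, 1)
    s = float(g.T @ g)
    return np.eye(2) - (g @ g.T) / (np.exp(-gamma * abs(s)) + s)

def modified_closed_loop(t, x):
    f = f_pro(x)
    return -rho * grad_c(x) * c(x) - f @ f.T @ grad_v(x)   # dynamics (13) with Q = I

x0 = np.array([2.0, 2.0])                   # infeasible: c(x0) = 3 != 0
sol = solve_ivp(modified_closed_loop, (0.0, 10.0), x0, rtol=1e-8, atol=1e-10)
print("x(T) =", sol.y[:, -1], " c(x(T)) =", c(sol.y[:, -1]))
```

The correction term drives c(x) toward zero, after which the trajectory behaves like (11) and slides along the constraint toward the minimizer.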

Remark 5 (On the Matrix Q(x)): The matrix Q(x) plays a role in avoiding instability in the numerical solution of differential equations by normalizing the Lipschitz condition of functions such as f(x)Q(x)f(x)^T ∇v(x). If the dimension of the dynamical system (13) is low, then let Q(x) = μ/‖f(x)f(x)^T ∇v(x)‖ for simplicity, where μ > 0. The matrix Q(x) also plays a role in coordinating the convergence rates of all states by minimizing the condition number of matrix functions such as f(x)Q(x)f(x)^T, especially when the dimension of f(x)Q(x)f(x)^T is high. Ideally, it is expected that f(x)Q(x)f(x)^T = I_n. However, it is difficult to obtain such a Q(x), since the number of degrees of freedom of Q(x) ∈ R^{l×l} is less than the number of elements of I_n. A natural choice is proposed as follows:

    Q(x) = μ(f(x)^T f(x))† + εI_l

where μ, ε > 0. The term εI_l makes Q(x) ≥ εI_l > 0. As a result, one has

    f(x)Q(x)f(x)^T = μ U Σ(Σ^T Σ)† Σ^T U^T + ε f(x)f(x)^T.

Here, the singular value decomposition of the matrix f(x) is f(x) = UΣV, where U ∈ R^{n×n} and V ∈ R^{l×l} are real unitary matrices, and Σ ∈ R^{n×l} is a rectangular diagonal matrix with non-negative real numbers on the diagonal. If the dynamical system (13) is solved by the numerical integration of differential equations, Q(x) needs to be computed at every step. This, however, will cost much time. A time-saving way is to update Q(x) at a reasonable interval. On the other hand, if the dynamical system (13) is realized by a neural network, then Q(x) should be realized by an additional continuous-time dynamical system. Thanks to parallel implementation, this realization can also be implemented online. Finally, the matrix function Q(x) also plays a role in determining the search direction. In the future, it is expected that the matrix function Q(x) can be designed to generate search directions like those provided by the interior point method and the conjugate gradient method.
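The "natural choice" of Q(x) above is straightforward to prototype. The following sketch (with illustrative values of μ and ε) uses NumPy's Moore–Penrose pseudoinverse; it is our own example, not code from the paper.

```python
import numpy as np

def q_matrix(f, mu=1.0, eps=0.01):
    """Natural choice Q(x) = mu * (f(x)^T f(x))^dagger + eps * I_l from Remark 5."""
    l = f.shape[1]
    return mu * np.linalg.pinv(f.T @ f) + eps * np.eye(l)

# Quick check on a random tall matrix f (n = 5, l = 3).
rng = np.random.default_rng(0)
f = rng.standard_normal((5, 3))
Q = q_matrix(f)
# f Q f^T = mu * (orthogonal projector onto range(f)) + eps * f f^T,
# so it is roughly well conditioned on range(f).
print(np.round(f @ Q @ f.T, 3))
```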

Remark 6 (On Large-Scale Optimization): Consider the following equality-constrained optimization problem:

    min_{x∈R^n} v(x),  s.t.  c(x) = c′(x) + c_0 = 0_{m×1}

where c_0 ∈ R^m is a constant vector depending on the concrete problem. The continuous-time dynamical system (13) can be written as

    ẋ = −ρ∇c(x)(c′(x) + c_0) + h(x)

where h(x) = −f(x)Q(x)f(x)^T ∇v(x) ∈ R^n. For such a class of large-scale optimization problems, the term h(x) is the same for any given c_0 ∈ R^m. In order to make the method efficient, the function h(x) can be derived offline first and realized by analog hardware. As shown in Fig. 3(a), h(·) is a block that takes x as an input and produces h(x) as an output. Therefore, h(x) depends on x, but h(·) does not. Thanks to the recursive form, as shown in Fig. 3(b), the realization of the proposed projection matrix f(·) can reuse the element F(·, ·) repeatedly. This makes the realization easier. The realization of F(·, ·) is shown in Fig. 3(c). In practice, the term h(x) can be approximated offline by a static and simple neural network nn(x) [simpler than h(x), at least] if only a bounded domain x ∈ D is considered, resulting in

    ẋ = −ρ∇c(x)c(x) + nn(x).

Although an approximation error exists, the solution can still converge to the constraints because of the correction term −ρ∇c(x)c(x).

V. ILLUSTRATIVE EXAMPLES

Three examples are given. The first example mainly shows that, in the presence of singularity, a commonly used method will fail to find the minimum, whereas the proposed method can. The second example mainly shows that the proposed projection matrix outperforms the modified commonly used projection matrix. The third example mainly shows how the proposed method is applied to a practical problem. Two of the three examples also show the advantage in running time.


TABLE I: RESULT FOR EXAMPLE IN SECTION V-A

Fig. 3. Block diagram realization of (a) (13), (b) the new projection matrix f(·), (c) the element F(·, ·), and (d) the multiplier and sum.

A. Estimate of Attraction Domain

For a given Lyapunov function, the crucial step in any procedure for estimating the attraction domain is determining the optimal estimate. Consider the system of differential equations

    ẋ = Ax + g(x)                    (14)

where x ∈ R^n is the state vector, A ∈ R^{n×n} is a Hurwitz matrix, and g : R^n → R^n is a vector function. Let v(x) = x^T P x be a given quadratic Lyapunov function for the origin of (14), i.e., P ∈ R^{n×n} is a positive-definite matrix such that A^T P + P A < 0_{n×n}. Then, the largest ellipsoidal estimate of the attraction domain of the origin can be computed via the following equality-constrained optimization problem [17]:

    min_{x∈R^n\{0_{n×1}}} x^T P x,  s.t.  x^T P[Ax + g(x)] = 0.

Since {x ∈ R^n | x^T P x ≤ x_0^T P x_0} is bounded, the subset K = {x ∈ R^n | x^T P x ≤ x_0^T P x_0, x^T P[Ax + g(x)] = 0} is bounded no matter what g is. For simplicity, consider (14) with x = [x_1 x_2]^T ∈ R², A = −I_2, P = I_2, and g(x) = (c(x) + 1)[x_1 x_2]^T, where c(x) = (x_1 + x_2 + 2)((x_2 + 1) − 0.1(x_1 + 1)²). Then, the optimization problem is formulated as

    min_{x∈R²\{0_{2×1}}} x_1² + x_2²,  s.t.  (x_1² + x_2²) c(x) = 0.

Since x ≠ 0_{2×1}, the problem is further formulated as

    min_{x∈R²} v(x) = x_1² + x_2²,  s.t.  c(x) = 0.

Then

    ∇v(x) = [2x_1  2x_2]^T
    ∇c(x) = [ d_2 − 0.1d_1² − 0.2d_1 d_3
              d_2 − 0.1d_1² + d_3 ]
    d_1 = x_1 + 1,  d_2 = x_2 + 1,  d_3 = x_1 + x_2 + 2.

In this example, the modified dynamical system (13) is adopted, where f is chosen as f_pro in (6) with δ(x) = e^{−γ|x|}, γ = 10, and ρ = Q = 20/‖∇c c − f_pro f_pro^T ∇v‖. It is then solved using the MATLAB function ode45 with variable step.¹ A comparison with the MATLAB constrained nonlinear multivariable optimization function fmincon is given in Table I. The point x_s = [−1 −1]^T is a singular point, at which ∇c(x_s) = [0 0]^T. As shown in Table I, under the initial points [−3 1]^T ∈ F and [2 −4]^T ∈ F, the MATLAB function finds the singular point instead of the minimum, whereas the proposed method still finds the minimum. Under the initial point [1 −4]^T ∉ F, the proposed method can still find the minimum, similar to the MATLAB function. Under different initial values, the evolutions of (13) are shown in Fig. 4. As shown, once close to the singular point [−1 −1]^T, the solutions of (13) change direction and then move to the minimum x_l* = [0.2061 −0.8545]^T. Compared with the discrete optimal methods offered by MATLAB, these results show that the proposed method avoids convergence to a singular point. Moreover, the proposed method is comparable with currently available conventional discrete optimal methods and facilitates even faster convergence. The latter conclusion is consistent with that proposed in [1] and [3]. The realization by analog hardware is shown in Fig. 5.

¹In this section, all computation is performed by MATLAB 6.5 on a personal computer (Asus x8ai) with an Intel Core 2 Duo processor at 2.2 GHz.

Fig. 4. Optimization of the estimate of the attraction domain. Solid line: solution evolution. Dotted line: constraint evolution. Dashed-dotted line: objective evolution.

Fig. 5. Block diagram realization for the example in Section V-A. (a) Block diagram realization. (b) Block diagram realization of the new projection matrix.
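Since the original MATLAB/ode45 scripts are not included in the paper, the following Python sketch only approximates the setup of this example with SciPy's solve_ivp. For robustness of the sketch, fixed gains are used in place of the state-dependent normalization of ρ = Q described above; the printed endpoints can be compared with the minimum x_l* = [0.2061 −0.8545]^T reported in the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

gamma = 10.0
rho = q = 5.0          # fixed gains for this sketch (the paper normalizes rho = Q online)

def c(x):
    d1, d2, d3 = x[0] + 1.0, x[1] + 1.0, x[0] + x[1] + 2.0
    return d3 * (d2 - 0.1 * d1 ** 2)

def grad_c(x):
    d1, d2, d3 = x[0] + 1.0, x[1] + 1.0, x[0] + x[1] + 2.0
    return np.array([d2 - 0.1 * d1 ** 2 - 0.2 * d1 * d3,
                     d2 - 0.1 * d1 ** 2 + d3])

def grad_v(x):
    return 2.0 * x

def f_pro(x):
    g = grad_c(x).reshape(-1, 1)
    s = float(g.T @ g)
    return np.eye(2) - (g @ g.T) / (np.exp(-gamma * abs(s)) + s)   # (6) with delta ~ exp(-gamma|x|)

def rhs(t, x):
    f = f_pro(x)
    return -rho * grad_c(x) * c(x) - q * f @ f.T @ grad_v(x)        # modified dynamics (13)

for x0 in ([-3.0, 1.0], [2.0, -4.0], [1.0, -4.0]):                  # initial points from Table I
    sol = solve_ivp(rhs, (0.0, 20.0), np.array(x0), rtol=1e-8, atol=1e-10)
    xf = sol.y[:, -1]
    # compare with x_l* ~ [0.2061, -0.8545] reported in the text
    print(f"x0 = {x0} -> x(T) = {np.round(xf, 4)}, c(x(T)) = {c(xf):.2e}")
```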

B. Optimization With Minimum Being Singular Point

Consider a special equality-constrained optimization problem as follows:

    min_{x∈R³} v(x) = x_1² + x_2² + (x_3 − 1/√3)²
    s.t.  c_1(x) = x_1 + x_2 + x_3 − 1
          c_2(x) = x_1 + x_2 + x_3³ − 1 + 1/√3 − 1/(√3)³.

The minimum is

    x* = [ (1/2)(1 − 1/√3)   (1/2)(1 − 1/√3)   1/√3 ]^T.

Meanwhile, the gradients of the constraints are

    ∇c_1 = [1  1  1]^T,   ∇c_2(x) = [1  1  3x_3²]^T.

Obviously, ∇c_1(x*) = ∇c_2(x*), namely, the minimum is also a singular point. Therefore, the closer the solution is to the minimum, the more singular the projection matrix f_com(x) in (3) becomes. Hence, the modified projection matrix f_mcom in (4) is considered. The proposed projection matrix is constructed in the following. By (7), f_1 is obtained as

    f_1 = I_3 − ∇c_1 ∇c_1^T / (δ(‖∇c_1‖²) + ‖∇c_1‖²)
        = [  2/3  −1/3  −1/3
            −1/3   2/3  −1/3
            −1/3  −1/3   2/3 ]

where δ(‖∇c_1‖²) = 0 since ∇c_1 is a nonzero constant vector. Furthermore, the proposed projection matrix is written as

    f_pro(x) = f_1 − f_1 f_1^T ∇c_2(x) ∇c_2^T(x) f_1 / (ε + ‖f_1^T ∇c_2(x)‖²)                    (15)

where δ(‖f_1^T ∇c_2(x)‖²) is replaced by ε, the same as in f_mcom, for comparison. In particular, since ∇c_2^T(x*) f_1 = 0_{1×3} at the singular point, one has

    f_pro(x*) = f_1

which still satisfies ∇c(x*)^T f_pro(x*) = 0_{2×3} no matter what ε is. In this example, the modified dynamical system (13) is adopted, where x_0 = [−3  −2  1/√3]^T, γ = 10, ρ = Q = 20/‖∇c c − f f^T ∇v‖, and f = f_mcom, f_pro. The differential equation (13) is solved using the MATLAB function ode45 with variable step. Denote by x_est(k) the stable solution when ε = 10^{−k}, k = 1, . . . , 10. The results are shown in Fig. 6, where c(x) = [c_1(x) c_2(x)]^T. As shown in Fig. 6 (first subplot), the solutions obtained using f_pro are closer to the minimum than those obtained using f_mcom. Moreover, as shown in Fig. 6 (second subplot), the solutions obtained using f_pro also satisfy the constraints better. These observations can be explained by the fact that, at the singular point, ∇c(x*)^T f_pro(x*) = 0_{2×3}, whereas ∇c(x*)^T f_mcom(x*) ≠ 0_{2×3}. Finally, it is observed that the advantage of the proposed projection matrix exists if ε ≥ 10^{−4}. This is also useful because a larger ε implies an easier realization by lower-precision hardware, and vice versa.

C. Estimate of Rotation Matrix

Fig. 6. Optimization by applying the two projection matrices f_mcom and f_pro to (13), respectively.

Fig. 7. Epipolar geometry.

1) Problem Formulation: For simplicity, assume that images are taken by two identical pinhole cameras with the focal length equal to one. As shown in Fig. 7, the two cameras are specified by the camera centers C_1, C_2 ∈ R³ and attached orthogonal camera frames {e_1, e_2, e_3} and {e′_1, e′_2, e′_3}, respectively. Denote by T = C_2 − C_1 ∈ R³ the translation from the first camera to the second and by R ∈ R^{3×3} the rotation matrix from the basis vectors {e_1, e_2, e_3} to {e′_1, e′_2, e′_3}, expressed with respect to the basis {e_1, e_2, e_3}. Then, it is well known in the computer vision literature [18] that two corresponding image points are represented as follows:

    m_{1,k} = (1/M_k(3)) M_k,   m_{2,k} = (1/M′_k(3)) M′_k,   k = 1, 2, . . . , N                    (16)

where M_k and M′_k represent the positions of the kth point expressed in the two camera frames {e_1, e_2, e_3} and {e′_1, e′_2, e′_3}, respectively; M_k(3) and M′_k(3) represent the third elements of the vectors M_k and M′_k, respectively. They satisfy the relationship M_k = RM′_k + T, k = 1, 2, . . . , N. These corresponding image points satisfy the epipolar constraints [18, p. 257]

    m_{1,k}^T E m_{2,k} = 0,   k = 1, 2, . . . , N                    (17)

where E = [T]× R is known as the essential matrix. Using the direct product ⊗ and the vec(·) operation, the equations in (17) are equivalent to

    Aϕ = 0_{N×1}                    (18)

where

    A = [ m_{2,1}^T ⊗ m_{1,1}^T
          ...
          m_{2,N}^T ⊗ m_{1,N}^T ] ∈ R^{N×9},   ϕ = vec([T]× R).                    (19)

In practice, these image points m_{1,k} and m_{2,k} are subject to noise, k = 1, 2, . . . , N. Therefore, T and R are often solved for by the following optimization problem:

    min_{x∈R^{12}} v(x) = (1/2) ϕ(x)^T A^T A ϕ(x)
    s.t.  (1/2)(‖T‖² − 1) = 0
          (1/2)(R^T R − I_3) = 0_{3×3}                    (20)

where x = [T^T vec(R)^T]^T ∈ R^{12}. This is the equality-constrained optimization problem considered here. In the following, the proposed method is applied to the optimization problem (20).
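As a sanity check of the formulation (16)–(19), the following sketch builds the matrix A from noise-free corresponding points and verifies that Aϕ = 0 for ϕ = vec([T]× R). The pose values are illustrative (not the paper's simulation pose), and the column-stacking vec(·) of Appendix A is used.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x of a 3-vector."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def build_A(m1, m2):
    """Stack the rows m_{2,k}^T (x) m_{1,k}^T as in (19); m1, m2 are N x 3
    arrays of homogeneous image points (third component equal to 1)."""
    return np.vstack([np.kron(m2[k], m1[k]) for k in range(m1.shape[0])])

# Noise-free check with an illustrative pose.
R = np.eye(3)
T = np.array([1.0, 0.0, 0.0])
M = np.array([[-1., 1., 1.], [2., 0., 1.], [1., -1., 1.],
              [-1., -1., 1.], [1., 1., 1.], [-1., 3., 1.]])   # points in camera frame 1
M_prime = (np.linalg.inv(R) @ (M - T).T).T                    # from M_k = R M'_k + T
m1 = M / M[:, [2]]                                            # image points, eq. (16)
m2 = M_prime / M_prime[:, [2]]
A = build_A(m1, m2)
E = skew(T) @ R                                               # essential matrix E = [T]_x R
print("max |A vec(E)| =", np.max(np.abs(A @ E.flatten(order="F"))))   # should be ~0
```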

2) Projection Matrix: By Theorem 2, the projection matrix for the constraint (1/2)(‖T‖² − 1) = 0 is

    f_pro,1 = I_3 − T T^T / (δ(‖T‖²) + ‖T‖²).

Since ‖T‖² = 1 has to be satisfied exactly or approximately, δ(‖T‖²) = 0. Therefore, the projection matrix for the constraint is

    f_pro,1 = I_3 − T T^T/‖T‖².

Then, the constraint is transformed into

    Ṫ = (I_3 − T T^T/‖T‖²) u_1

where u_1 ∈ R³. The constraint (1/2)(R^T R − I_3) = 0_{3×3} is transformed into

    Ṙ = [u_2]× R                    (21)

where u_2 ∈ R³. If so, then

    d/dt (R^T R) = Ṙ^T R + R^T Ṙ = R^T ([u_2]× + [u_2]×^T) R = 0_{3×3}.

Furthermore, (21) is rewritten as

    vec(Ṙ) = (R^T ⊗ I_3) H u_2.

Then, the continuous-time dynamical system, whose solutions always satisfy the equality constraints (1/2)(‖T‖² − 1) = 0 and (1/2)(R^T R − I_3) = 0_{3×3}, is expressed as (2) with

    f_pro(x) = [ f_pro,1        0_{3×3}
                 0_{9×3}   (R^T ⊗ I_3)H ] ∈ R^{12×6},   u = [ u_1
                                                               u_2 ] ∈ R^6.                    (22)

If the initial values satisfy ‖T(0)‖² = 1 and R(0)^T R(0) = I_3, then all solutions of (2) satisfy the equality constraints.
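The claim that (21) preserves R^T R = I_3 is easy to check numerically. The hedged sketch below integrates Ṙ = [u_2]× R for an arbitrary, hypothetical input u_2(t) and reports the orthogonality error, which stays at the level of the integration tolerance.

```python
import numpy as np
from scipy.integrate import solve_ivp

def skew(u):
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

def rotation_flow(t, r_vec):
    """dR/dt = [u2]_x R for an arbitrary illustrative input u2(t)."""
    R = r_vec.reshape(3, 3)
    u2 = np.array([np.sin(t), np.cos(2.0 * t), 0.5])
    return (skew(u2) @ R).flatten()

R0 = np.eye(3)                                          # R(0)^T R(0) = I_3
sol = solve_ivp(rotation_flow, (0.0, 5.0), R0.flatten(), rtol=1e-10, atol=1e-12)
R_T = sol.y[:, -1].reshape(3, 3)
print("||R(T)^T R(T) - I|| =", np.linalg.norm(R_T.T @ R_T - np.eye(3)))
```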


3) Update Design: Since ∇v(x) = [(R^T ⊗ I_3)H   I_3 ⊗ [T]×]^T A^T Aϕ, the time derivative of v(x) along the solutions of (2) is

    v̇(x) = −ϕ^T A^T A Φ(x)^T Q(x) Φ(x) A^T Aϕ ≤ 0

where

    Φ(x) = [ (I_3 − T T^T/‖T‖²)^T H^T (R^T ⊗ I_3)^T
             H^T (R^T ⊗ I_3)^T (I_3 ⊗ [T]×)^T ] ∈ R^{6×9}.

The simplest way of choosing Q(x) is Q(x) ≡ I_6. In this case, the eigenvalues of the matrix A Φ(x)^T Φ(x) A^T are often ill-conditioned, namely

    λ_min(A Φ(x)^T Φ(x) A^T) ≪ λ_max(A Φ(x)^T Φ(x) A^T).

The convergence rates of the components of Aϕ(x) depend on the eigenvalues of A Φ(x)^T Q(x) Φ(x) A^T. As a consequence, some components of Aϕ converge fast, while the others may converge slowly. This leads to poor asymptotic performance of the closed-loop system. It is expected that each component of Aϕ converges at the same speed as far as possible. Suppose that there exists Q(x) such that

    A Φ(x)^T Q(x) Φ(x) A^T = I_9.

Then

    v̇(x) = −ϕ^T A^T Aϕ ≤ 0.

By Theorem 4, x will approach the set {x ∈ R^n | Aϕ(x) = 0_{N×1}}, each element of which is a global minimum since v(x) = 0 in the set. Moreover, each component of Aϕ converges at a similar speed. However, it is difficult to obtain such a Q(x), since the number of degrees of freedom of Q(x) ∈ R^{6×6} is less than the number of elements of I_9. A modified way is to make A Φ(x)^T Q(x) Φ(x) A^T ≈ I_9. A natural choice is proposed as follows:

    Q(x) = μ(Φ(x) A^T A Φ(x)^T)† + εI_6                    (23)

where μ > 0 and (Φ(x) A^T A Φ(x)^T)† denotes the Moore–Penrose inverse of Φ(x) A^T A Φ^T(x). The matrix εI_6 is added to make Q(x) positive definite, where ε is a small positive real number. From the procedure above, (Φ(x) A^T A Φ(x)^T)† needs to be computed at every step. This, however, will cost much time. A time-saving way is to update Q(x) at a reasonable interval. Then, (11) becomes

    ẋ = −f_pro(x) Q(x) Φ(x) A^T Aϕ(x)                    (24)

where f_pro(x) is defined in (22) and Q(x) is defined in (23). The differential equation can be solved by Runge–Kutta methods. The solutions of (24) satisfy the constraints, where x = [T^T vec(R)^T]^T. Moreover, the dynamical system will eventually reach a final resting state.

4) Simulation: Suppose that there exist six points in the field of view, whose positions are expressed in the first camera frame as follows: 1) M_1 = [−1 1 1]^T; 2) M_2 = [2 0 1]^T; 3) M_3 = [1 −1 1]^T; 4) M_4 = [−1 −1 1]^T; 5) M_5 = [1 1 1]^T; and 6) M_6 = [−1 3 1]^T. Compared with the first camera frame, the second camera frame has translated and rotated with

    T = [ 1
          1
         −1 ],   R = [ 0.9900  −0.0894   0.1088
                       0.0993   0.9910  −0.0894
                      −0.0998   0.0993   0.9900 ].

TABLE II: RESULT FOR EXAMPLE IN SECTION V-C

Fig. 8. Evolution of the state.

The image points are generated by (16). Using the generated image points, A is obtained by (19). Set the initial values as follows: T(0) = [0 0 1]^T, R(0) = I_3, μ = 20, and ε = 0.2. The differential equation (24) is solved using the MATLAB function ode45 with variable step. A comparison with the MATLAB constrained nonlinear multivariable optimization function fmincon is given in Table II.

As shown in Table II, the proposed method requires less time to achieve a higher accuracy. Given that v(x*) = 0, the solution is a global minimum. The evolution of each element of x is shown in Fig. 8. The state eventually reaches a rest state, with all elements converging at a similar speed. With different initial values, several other simulations are also implemented. Based on the results, the proposed algorithm has met the expectations.

VI. CONCLUSION

A method for continuous-time, equality-constrained optimization based on a new projection matrix is proposed for the determination of local minima. With the transformation of the equality constraint into a continuous-time dynamical system, the class of equality-constrained optimization problems is formulated as a control problem. The resultant method is more general than the existing control-theoretic approaches. Thus, the proposed method serves as a potential bridge between the optimization and control theories. Compared with other standard discrete-time methods, the proposed method avoids convergence to a singular point and facilitates faster convergence through numerical integration on a digital computer.

APPENDIX

A. Kronecker Product, vec Operation, and Skew-Symmetric Matrix

The symbol vec(X) denotes the column vector obtained by stacking the second column of X under the first, then the third, and so on. With X = [x_{ij}] ∈ R^{n×m}, the Kronecker product X ⊗ Y is the matrix

    X ⊗ Y = [ x_{11}Y  · · ·  x_{1m}Y
                ...     . . .    ...
              x_{n1}Y  · · ·  x_{nm}Y ].

The relation vec(XYZ) = (Z^T ⊗ X)vec(Y) holds [19, p. 318]. The cross product of two vectors x ∈ R³ and y ∈ R³ is denoted by x × y = [x]× y, where the symbol [·]× : R³ → R^{3×3} is defined as [21, p. 194]

    [x]× ≜ [  0   −x_3   x_2
             x_3    0   −x_1
            −x_2   x_1    0  ] ∈ R^{3×3}.

By the definition of [x]×, x × x = [x]× x = 0_{3×1}, ∀x ∈ R³, and

    vec([x]×) = H x,
    H = [ 0 0 0 0 0 1 0 −1 0
          0 0 −1 0 0 0 1 0 0
          0 1 0 −1 0 0 0 0 0 ]^T.
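The identities above are easy to verify numerically. The following sketch checks vec([x]×) = Hx and vec(XYZ) = (Z^T ⊗ X)vec(Y) with NumPy, using column-major flattening for the column-stacking vec(·).

```python
import numpy as np

def skew(x):
    """Cross-product matrix [x]_x."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def vec(X):
    """Column-stacking vec(.) operator (column-major order)."""
    return X.flatten(order="F")

# H from Appendix A, stored as its 3x9 transpose and then transposed to 9x3.
H = np.array([[0, 0, 0, 0, 0, 1, 0, -1, 0],
              [0, 0, -1, 0, 0, 0, 1, 0, 0],
              [0, 1, 0, -1, 0, 0, 0, 0, 0]], dtype=float).T

rng = np.random.default_rng(1)
x = rng.standard_normal(3)
print(np.allclose(vec(skew(x)), H @ x))                          # vec([x]_x) = H x

X, Y, Z = (rng.standard_normal((3, 3)) for _ in range(3))
print(np.allclose(vec(X @ Y @ Z), np.kron(Z.T, X) @ vec(Y)))     # vec(XYZ) = (Z^T kron X) vec(Y)
```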

B. Proof of Theorem 1

Since F_u(x_0) ⊆ F, the remaining task is to prove F ⊆ F_u(x_0), ∀x_0 ∈ F, namely, for any x ∈ F, there exists a control input u ∈ R^l that can transfer any initial state x_0 ∈ F to x. Since x_0, x ∈ F, there exist ū_0, ū ∈ R^l such that x = A^⊥ū and x(0) = A^⊥ū_0 by the definition of A^⊥. Design a control input

    u(t) = (1/t̄)(ū − ū_0)  for 0 ≤ t ≤ t̄;   u(t) = 0_{l×1}  for t > t̄.

With the control input above, it holds that

    x(t) − x(0) = ∫_0^t A^⊥ u(s) ds = ∫_0^{t̄} A^⊥ u(s) ds = A^⊥ū − A^⊥ū_0

when t ≥ t̄. Then, x(t) = x, t ≥ t̄. Hence, F ⊆ F_u(x_0), ∀x_0 ∈ F. Consequently, F = F_u(x_0), ∀x_0 ∈ F.

C. Proof of Lemma 1

Since

    δ(‖L‖²) + ‖L‖² = 1 if L = 0_{n×1},   δ(‖L‖²) + ‖L‖² = ‖L‖² if L ≠ 0_{n×1}

one has δ(‖L‖²) + ‖L‖² ≠ 0, ∀L ∈ R^n. According to this, the following relationship holds:

    L^T (I_n − L L^T/(δ(‖L‖²) + ‖L‖²)) = L^T − L^T ‖L‖²/(δ(‖L‖²) + ‖L‖²) ≡ 0_{1×n},   ∀L ∈ R^n.

This implies that L^T z = 0, ∀z ∈ W_2, namely, W_2 ⊆ W_1. On the other hand, any z ∈ W_1 can be rewritten as

    z = (I_n − L L^T/(δ(‖L‖²) + ‖L‖²)) z

where L^T z = 0 is utilized. Hence, W_1 ⊆ W_2. Consequently, W_1 = W_2.

D. Proof of Theorem 3

The proof is done by mathematical induction. Denote

    V_1^j = {z ∈ R^n | ∇c_i^T z = 0, i = 1, . . . , j},   V_2^j = {z ∈ R^n | z = f_j u_j, u_j ∈ R^n},   j ≤ m.

First, by Theorem 2, it is easy to see that the conclusions are satisfied for j = 1. Suppose that V_1^{k−1} = V_2^{k−1} holds. Then, prove that V_1^k = V_2^k holds. If so, this proof is concluded. By V_1^{k−1}(x) = V_2^{k−1}(x), one has

    V_1^k = {z ∈ R^n | ∇c_k^T z = 0, z ∈ V_1^{k−1}}
          = {z ∈ R^n | ∇c_k^T z = 0, z = f_{k−1}u_{k−1}, u_{k−1} ∈ R^n}
          = {z ∈ R^n | ∇c_k^T f_{k−1}u_{k−1} = 0, z = f_{k−1}u_{k−1}, u_{k−1} ∈ R^n}.

On the other hand, by Lemma 1, one has

    ∇c_k^T f_{k−1}u_{k−1} = 0  ⇔  u_{k−1} = (I_n − f_{k−1}^T ∇c_k ∇c_k^T f_{k−1} / (δ(‖f_{k−1}^T ∇c_k‖²) + ‖f_{k−1}^T ∇c_k‖²)) u_k

namely

    V_1^k = V_2^k = {z ∈ R^n | z = f_k u_k, u_k ∈ R^n}

where

    f_k = f_{k−1}(I_n − f_{k−1}^T ∇c_k ∇c_k^T f_{k−1} / (δ(‖f_{k−1}^T ∇c_k‖²) + ‖f_{k−1}^T ∇c_k‖²)).

E. Proof of Propositions in Theorem 4

The proof of Theorem 4 uses three propositions. Their proofs are given as follows.

1) Proof of Proposition 1: In the space R^n, the set K is compact if and only if it is bounded and closed [20, p. 41, Th. 8.2]. Hence, the remaining task is to prove that K is closed. Suppose, to the contrary, that K is not closed. Then, there exists a sequence x(t_n) ∈ K → p ∉ K with t_n → ∞. However, v(p) = lim_{t_n→∞} v(x(t_n)) ≤ v(x_0) and c(p) = lim_{t_n→∞} c(x(t_n)) = 0_{m×1}, which imply p ∈ K. The contradiction implies that K is closed. Hence, the set K is compact. By (10), v(x) ≤ v(x_0) along (11), t ≥ 0. By Assumption 1, all solutions of (11) satisfy c(x) = 0_{m×1}. Therefore, K is positively invariant with respect to (11).

2) Proof of Proposition 2: Since K is compact and positively invariant with respect to (11), by Theorem 4.4 (invariance principle) in [16, p. 128], the solution of (11) starting at x_0 approaches v̇(x) = 0, namely, ∇v(x)^T f(x) = 0. In addition, since (11) becomes ẋ = 0_{n×1} in S, the solution approaches a constant vector x_l* ∈ S.

3) Proof of Proposition 3: Since V_1(x_l*) = V_2(x_l*) and x_l* ∈ S satisfies the following two equalities:

    ∇v(x_l*)^T f(x_l*) = 0_{1×n},   c(x_l*) = 0_{m×1}

there exists u such that z = f(x_l*)u for any z ∈ V_1(x_l*). As a consequence, for any z ∈ V_1(x_l*), ∇v(x_l*)^T z = ∇v(x_l*)^T f(x_l*)u = 0. There must exist λ_i* ∈ R, i = 1, . . . , m, such that ∇v(x_l*) = Σ_{i=1}^m λ_i* ∇c_i(x_l*); otherwise, there would exist z ∈ V_1(x_l*) with ∇v(x_l*)^T z ≠ 0. Therefore, x_l* ∈ S is a KKT point [15, p. 328]. Furthermore, by Theorem 12.6 in [15, p. 345], x_l* is a strict local minimum if z^T ∇_{xx} L(x_l*, λ*) z > 0 for all z ∈ V_1(x_l*), z ≠ 0.

REFERENCES

[1] K. Tanabe, “A geometric method in nonlinear programming,” J. Optim. Theory Appl., vol. 30, no. 2, pp. 181–210, Feb. 1980.

[2] H. Yamashita, “A differential equation approach to nonlinear programming,” Math. Program., vol. 18, no. 1, pp. 155–168, Dec. 1980.

[3] A. A. Brown and M. C. Bartholomew-Biggs, “ODE versus SQP methods for constrained optimization,” J. Optim. Theory Appl., vol. 62, no. 3, pp. 371–386, Sep. 1989.

[4] S. Zhang and A. G. Constantinides, “Lagrange programming neural networks,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 7, pp. 441–452, Jul. 1992.

[5] Z.-G. Hou, “A hierarchical optimization neural network for large-scale dynamic systems,” Automatica, vol. 37, no. 12, pp. 1931–1940, Dec. 2001.

[6] L.-Z. Liao, H. Qi, and L. Qi, “Neurodynamical optimization,” J. Global Optim., vol. 28, no. 2, pp. 175–195, 2004.

[7] Y. Xia and J. Wang, “A recurrent neural network for solving nonlinear convex programs subject to linear constraints,” IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 379–386, Mar. 2005.

[8] M. P. Barbarosou and N. G. Maratos, “A nonfeasible gradient projection recurrent neural network for equality-constrained optimization problems,” IEEE Trans. Neural Netw., vol. 19, no. 10, pp. 1665–1677, Oct. 2008.

[9] Y. Xia, G. Feng, and J. Wang, “A novel recurrent neural network for solving nonlinear optimization problems with inequality constraints,” IEEE Trans. Neural Netw., vol. 19, no. 8, pp. 1340–1353, Aug. 2008.

[10] S. Qin and X. Xue, “A two-layer recurrent neural network for nonsmooth convex optimization problems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 6, pp. 1149–1160, May 2015.

[11] P.-A. Absil, “Computation with continuous-time dynamical systems,” in Proc. Grand Challenge Non-Classical Comput. Int. Workshop, York, U.K., Apr. 2005, pp. 18–19.

[12] J. J. Hopfield and D. W. Tank, “‘Neural’ computation of decisions in optimization problems,” Biol. Cybern., vol. 52, no. 3, pp. 141–152, Jul. 1985.

[13] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 3rd ed. Boston, MA, USA: Springer-Verlag, 2008.

[14] E. L. Lawler and D. E. Wood, “Branch-and-bound methods: A survey,” Oper. Res., vol. 14, no. 4, pp. 699–719, Jul./Aug. 1966.

[15] J. Nocedal and S. Wright, Numerical Optimization. New York, NY, USA: Springer-Verlag, 1999.

[16] H. K. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.

[17] G. Chesi, A. Garulli, A. Tesi, and A. Vicino, “Solving quadratic distance problems: An LMI-based approach,” IEEE Trans. Autom. Control, vol. 48, no. 2, pp. 200–212, Feb. 2003.

[18] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2003.

[19] U. Helmke and J. B. Moore, Optimization and Dynamical Systems. London, U.K.: Springer-Verlag, 1994.

[20] F. Morgan, Real Analysis and Applications: Including Fourier Series and the Calculus of Variations. Providence, RI, USA: AMS, 2005.

[21] A. Isidori, L. Marconi, and A. Serrani, Robust Autonomous Guidance: An Internal Model-Based Approach. London, U.K.: Springer-Verlag, 2003.

Quan Quan received the B.S. and Ph.D. degrees from Beihang University, Beijing, China, in 2004 and 2010, respectively.

He has been an Associate Professor with Beihang University since 2013. His current research interests include vision-based navigation and reliable flight control.

Kai-Yuan Cai received the B.S., M.S., and Ph.D. degrees from Beihang University, Beijing, China, in 1984, 1987, and 1991, respectively.

He has been a Full Professor with Beihang University since 1995. He is currently a Cheung Kong Scholar (Chair Professor), jointly appointed by the Ministry of Education of China and the Li Ka Shing Foundation of Hong Kong in 1999. His current research interests include software testing, software reliability, reliable flight control, and software cybernetics.