

Available online at www.sciencedirect.com

Automatica 40 (2004) 1955–1964
www.elsevier.com/locate/automatica

Brief paper

Stability regions for constrained nonlinear systems and their functional characterization via support-vector-machine learning

C.J. Ong^a,*, S.S. Keerthi^b, E.G. Gilbert^c, Z.H. Zhang^a

^a Department of Mechanical Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, Singapore
^b Overture, Pasadena, USA
^c Department of Aerospace Engineering, The University of Michigan, Ann Arbor, MI, USA

Received 26 August 2003; received in revised form 16 February 2004; accepted 8 June 2004. Available online 13 August 2004.

Abstract

This paper develops a computational approach for characterizing the stability regions of constrained nonlinear systems. A decision function is constructed that allows arbitrary initial states to be queried for inclusion within the stability region. Data essential to the construction process are generated by simulating the nonlinear system with multiple initial states. Using special procedures based on known properties of the stability region, the state data are randomly selected so that they are concentrated in desirable locations near the boundary of the stability region. Selected states belong either to the stability region or do not, thus producing a two-class pattern recognition problem. Support vector machine learning, applied to this problem, determines the decision function. Special techniques are introduced that significantly improve the accuracy and efficiency of the learning process. Numerical examples illustrate the effectiveness of the overall approach.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Constrained nonlinear system; Stability region; Support vector machine

1. Introduction

In this paper, we consider the constrained dynamical system

ẋ(t) = f(x(t)), x(t) ∈ X, (1)

where the constraint set X is arbitrarily specified by

X = {x : h_i(x) > 0, ∀i ∈ I}, I = {1, . . . , m}. (2)

Given an asymptotically stable equilibrium point x_s ∈ X, the objective is to characterize the corresponding domain of

This paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor Thor I. Fossen under the direction of Editor Hassan Khalil.

* Corresponding author. Tel.: +65-68742217; fax: +65-67791459.
E-mail addresses: [email protected] (C.J. Ong), [email protected] (S.S. Keerthi), [email protected] (E.G. Gilbert).

0005-1098/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.automatica.2004.06.005

attraction or stability region

D = {x(0) ∈ R^n : x(t) ∈ X ∀t ≥ 0 and x(t) → x_s}. (3)

For the unconstrained case (X = R^n), this problem has been intensively studied for over 40 years. See, for example, Davison and Kurak (1971), Michel, Sarabudla, and Miller (1982), Vannelli and Vidyasagar (1985), Genesio, Tartaglia, and Vicino (1985), Chiang, Hirsch, and Wu (1988), Chiang and Thorp (1989), Hauser and Lai (1992), Lai and Hauser (1993), Johansen (2000), Ohta, Imanishi, Gong, and Haneda (1993), Julian, Guivant, and Desages (1999) and the references cited therein. Almost all the prior research determines inner approximations of D as sublevel sets of suitable Lyapunov functions. While the Lyapunov functions allow arbitrary initial states to be numerically queried for their inclusion in D, the sublevel sets are often poor approximations of D. A few theoretical studies, best exemplified by Chiang et al. (1988), address properties of the boundary of D. While these properties implicitly characterize D, they


are not suited to numerical testing of inclusion. Relatively little attention has been given to the constrained problem. See, for instance, Praprost and Loparo (1996) and Venkatasubramanian, Schattler, and Zaborszky (1995). This is unfortunate, because inequalities of the form h_i(x(t)) ≤ 0 often describe unacceptable or dangerous operating conditions for physical systems.

The approach presented here is motivated by the desire to obtain fairly accurate functional characterizations of D for highly nonlinear systems, both with and without constraints. Specifically, we seek to obtain, algorithmically, a function O(x) such that {x ∈ R^n : O(x) > 0} closely approximates D. We assume that (1) is solved numerically for many initial conditions x(0) = x_i and that for each x_i it is determined whether or not x_i ∈ D. The initial data are then collected into two sets: a safe set, 𝒮 = {x_i : i ∈ I_S}, corresponding to x_i ∈ D, and an unsafe set, 𝒰 = {x_i : i ∈ I_U}, corresponding to x_i ∉ D. Once obtained, these sets of points are treated as training data in a two-class pattern recognition problem. The resulting decision function is O(x). Our computations utilize support vector machine (SVM) learning, a powerful algorithmic methodology that has received much attention in recent years (Burges, 1998; Vapnik, 1995; Scholkopf & Smola, 2001). Effectiveness of the SVM learning process is enhanced by selecting the training points in 𝒮 and 𝒰 so that they are most densely concentrated near the boundary of D. We achieve this goal by exploiting a modified version of the theory described in the paper by Praprost and Loparo (1996).

Notations and basic assumptions are introduced in Section 2. Section 3 provides a brief review of the key ideas of the Praprost and Loparo theory, emphasizing the results needed in Section 4, where the procedures for generating 𝒮 and 𝒰 are described. Section 5 reviews the SVM methodology and develops a modified formulation for avoiding the misclassification of points in 𝒰. Section 6 discusses various practical issues associated with the computations. Illustrative examples appear in Section 7. Some conclusions follow in Section 8.

2. Preliminaries

The boundary, interior, closure, cardinality and complement of a set A ⊂ R^n are denoted, respectively, by ∂A, int A, Ā, |A| and A^C. The set of equilibrium points of (1) in X is E := {x_e ∈ X : f(x_e) = 0}. An equilibrium point x_e is hyperbolic if the Jacobian matrix ∂f/∂x(x_e) has no characteristic roots with zero real parts. If ∂f/∂x(x_e) has only one (real) characteristic root that is positive, x_e is said to be type 1. The class of functions that are continuously differentiable up to the r-th order is denoted by C^r.

We make the following overriding assumptions: (A1) f : R^n → R^n is C^1; h_i : R^n → R, i ∈ I, is C^2. (A2) The solution of (1) with x(0) = x, φ_t(x), exists for all x ∈ R^n and t ∈ R. (A3) X is connected and bounded and ∂X ∩ E = ∅.

Fig. 1. Illustration of D. The box X is bounded by h_i(x) = 0, i = 1, . . . , 4, and contains x_s and x_e. Bold line segments without end points are the manifolds M_1, . . . , M_7 ⊂ ∂D (segments of ∂D ∩ ∂X are marked); bold dots are elements of ∂D that separate the M_i.

(A4) (1) has no periodic orbits. (A5) All equilibrium points in E are hyperbolic. (A6) x_s ∈ E is asymptotically stable.

The stable and unstable manifolds of x_e ∈ E, restricted to X, are

W_S(x_e) = {x ∈ X : φ_t(x) ∈ X ∀t ≥ 0 and φ_t(x) → x_e as t → ∞}, (4)

W_U(x_e) = {x ∈ X : φ_t(x) ∈ X ∀t ≤ 0 and φ_t(x) → x_e as t → −∞}. (5)

Clearly, D := W_S(x_s).

3. Properties of D

From the assumptions stated so far, it is easy to show (Chiang et al., 1988) that D is open and positively invariant and that ∂D is closed and of dimension n − 1. The procedures described in the next section generate points that lie near ∂D. They depend on a boundary characterization of ∂D of the type obtained by Praprost and Loparo (1996). We modify their treatment in two ways. First, instead of defining X by a single functional inequality, we use the multiple inequalities of (2). Second, to eliminate technical details that greatly complicate the presentation, we mildly strengthen their assumptions.

To motivate the rigorous characterization of ∂D that follows, consider the illustrative situation shown in Fig. 1. The set X ⊂ R^2 is a box defined by h_i(x) > 0, i = 1, 2, 3, 4. It contains two equilibrium points: x_s, the desired stable equilibrium, and x_e, a type 1 equilibrium point. Since n = 2, ∂D = ⋃_j M_j, where M_j, j = 1, . . . , 7, are the one-dimensional manifolds shown in Fig. 1. Three types of manifolds appear: (a) a submanifold of ∂X such that for all x(0) in the submanifold, x(t) ∈ X for all t > 0 and x(t) → x_s as t → ∞; see M_1, M_3, M_5 and M_6. (b) a stable manifold associated with


a type 1 equilibrium point; see M_2 = W_S(x_e). (c) a solution (or for n > 2 a family of solutions) x(t) that is tangent to ∂X at t = t̄ and belongs to X for some range of times in t > t̄; see M_4 and M_7.

In the general case, the precise characterization of ∂D is given by

∂D = D_a ∪ D_b ∪ D_c, (6)

where

D_a = U⁺ ∩ ∂D, (7)

D_b = ⋃_{x_e ∈ Σ} W_S(x_e), Σ = {x_e ∈ ∂D ∩ E : x_e is type 1}, (8)

D_c = {φ_t(x) : ∀t < 0 such that φ_t(x) ∈ X, x ∈ Σ⁺}. (9)

The definitions of Σ⁺ and U⁺ require more notation. Let

∂X_i := {x : h_i(x) = 0, h_j(x) > 0, j ∈ I, j ≠ i}, (10)

H_i(x) := (∂h_i/∂x)(x) f(x), H̄_i(x) := (∂H_i/∂x)(x) f(x). (11)

The set

Σ⁺_i := {x ∈ ∂X_i : H_i(x) = 0, H̄_i(x) > 0} (12)

describes those points in ∂X_i where admissible trajectories of (1), x(t) ∈ X, become tangent to ∂X_i. Similarly, the set

U⁺_i := {x ∈ ∂X_i : H_i(x) > 0} (13)

describes those points where trajectories of (1), originating in ∂X_i, move into X. Note that for some i, the set Σ⁺_i and/or U⁺_i may be empty. Assembling the sets over i ∈ I yields

Σ⁺ := ⋃_{i∈I} Σ⁺_i, U⁺ := ⋃_{i∈I} U⁺_i. (14)
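For concreteness, the tangency functions of (11)–(12) can be evaluated numerically; the following is a minimal sketch, assuming central finite differences and a hypothetical toy system that is not from the paper:

```python
import numpy as np

def lie_derivatives(f, h, x, eps=1e-6):
    """Evaluate H(x) = (dh/dx)(x) f(x) and Hbar(x) = (dH/dx)(x) f(x),
    i.e. the two functions of Eq. (11), by central finite differences."""
    x = np.asarray(x, dtype=float)
    n = x.size

    def grad(g, z):
        out = np.zeros(n)
        for k in range(n):
            e = np.zeros(n)
            e[k] = eps
            out[k] = (g(z + e) - g(z - e)) / (2.0 * eps)
        return out

    H = lambda z: grad(h, z) @ f(z)   # H_i(x) := (dh_i/dx)(x) f(x)
    Hbar = grad(H, x) @ f(x)          # Hbar_i(x) := (dH_i/dx)(x) f(x)
    return H(x), Hbar

# Hypothetical toy system (illustrative only): f(x) = (x2, -x1), h(x) = 1 - x1.
f = lambda x: np.array([x[1], -x[0]])
h = lambda x: 1.0 - x[0]
H, Hbar = lie_derivatives(f, h, [1.0, 0.0])
# At x = (1, 0): h = 0, H = -x2 = 0 and Hbar = x1 = 1 > 0, so this x
# satisfies the membership conditions of the tangency set in Eq. (12).
```

A point passing this test is a candidate starting point for the projection step of Proc-3 described in the next section.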

A little thought shows the connections between D_a, D_b and D_c and the three types of manifolds shown in Fig. 1.

The proof of (6), not presented here, follows closely the (lengthy) one given by Praprost and Loparo (1996). In addition to (A1)–(A6), it exploits the following intuitively reasonable assumptions: (A7) ∂X = ⋃_{i∈I} ∂X_i. (A8) ∂h_i/∂x(x) ≠ 0 ∀x ∈ ∂X_i and i ∈ I. (A9) H̄_i(x) ≠ 0 ∀x ∈ ∂X_i such that H_i(x) = 0 and i ∈ I. (A10) All trajectories of (1) belonging to ∂D converge to an equilibrium point in ∂D. A final assumption on the transversality of certain solution manifolds, rarely violated in practice, is omitted because of its length and complexity.

4. Generation of data points

The numerical approach to the generation of points in the sets 𝒮 and 𝒰 follows closely the characterization of ∂D in the preceding section. It consists of four procedures, Proc-i, i = 1, 2, 3, 4. The first three collect points near ∂D, each corresponding to one of the three sets on the right side of (6). The last collects points evenly in X. The mix of points taken from each procedure is decided empirically, as discussed later.

In what follows, B_ε(x_s) is a closed ε-ball centered at x_s such that x0 ∈ B_ε(x_s) implies φ_t(x0) ∈ X for all t ≥ 0 and φ_t(x0) → x_s as t → ∞. The ball provides a computational basis for the four procedures. For instance, x0 ∈ D if and only if there exists a t̄ > 0 such that φ_t(x0) ∈ X for all t ∈ [0, t̄] and φ_t̄(x0) ∈ B_ε(x_s). The computations using B_ε(x_s) are faster and more accurate if ε > 0 is chosen so that it is not very small.

Proc-1 generates points using the standard idea of backward integration in time. It samples data points from ∂B_ε(x_s). Suppose an x0 ∈ ∂B_ε(x_s) is chosen and the solution φ_t(x0) is generated for t < 0. If {φ_t(x0) : 0 > t} does not intersect ∂X, choose a different x0. Let t̄ < 0 be the time at which the backward solution exits X (i.e., φ_t(x0) ∈ X for t̄ < t ≤ 0). Choose t_e < t̄ so that φ_t(x0) ∉ X for all t_e < t < t̄, and define x_i = φ_{t_i}(x0). Pick a fixed number of the x_i where t_i is close to t̄. For t̄ < t_i < 0 (t_e < t_i < t̄), assign x_i to 𝒮 (x_i to 𝒰). This subprocedure is repeated for many x0 ∈ ∂B_ε(x_s). In general, non-uniform sampling of points on ∂B_ε(x_s) is preferred to obtain a good spread of the trajectories. As most forward-time trajectories come towards x_s along the eigenvectors of the largest eigenvalue of ∂f/∂x(x_s), the relative sizes of the eigenvalues can be used to effect such a sampling. While effective in capturing the structure D_a of ∂D, Proc-1 is not effective in characterizing D_b and D_c, because the subset of ∂B_ε(x_s) from which the backward solution φ_t(x0) is close to D_b or D_c is small.

Proc-2 is arranged to generate points that capture the structure of the (n − 1)-dimensional manifolds W_S(x_e) of D_b. To identify the correct x_e ∈ E ∩ X, we use the result from Venkatasubramanian et al. (1995) (Theorem 5) that states x_e ∈ Σ ⇔ {W_U(x_e) ∩ D ≠ ∅ and W_U(x_e) ∩ D^C ≠ ∅}. Hence, two forward-time trajectories originating near the type 1 x_e are needed to establish that x_e ∈ Σ: one trajectory converges to x_s while the other does not. At every type 1 x_e ∈ E ∩ X, Proc-2 generates forward-time trajectories by randomly choosing initial points x0 ∈ ∂B_ε(x_e) where ε > 0 is very small; see Fig. 2. Then almost all the x0 will not lie in W_S(x_e). Moreover, x_e ∈ Σ is indicated by the situation where roughly half of the random points converge to x_s (these x0 ∈ D) and the other half do not (these x0 ∈ D^C). Hence, with a modest number of random choices it can be determined whether x_e ∈ Σ. Suppose x_e ∈ Σ. Proc-2 then proceeds to obtain points near W_S(x_e). For each x0 ∈ D ∩ ∂B_ε(x_e), let t̄ < 0 be the time at which the backward solution exits X. If ε is small, the solution set {φ_t(x0) : t̄ < t < 0, x0 ∈ D ∩ ∂B_ε(x_e)} is close to W_S(x_e) in D. A fixed number of points x_i, t̄ < t_i < 0, are collected into 𝒮. Similarly, points in 𝒰, near W_S(x_e) in D^C, are selected from the backward solution starting from x0 ∈ D^C ∩ ∂B_ε(x_e). Fig. 2 shows examples of these solution manifolds.

Proc-3 is designed to collect points near D_c of ∂D. See the points in Fig. 2 selected from {φ_t(x_d^u) : t < 0} and

Fig. 2. Depiction of Proc-2 and Proc-3. The points marked with '×' and '◦' are unsafe and safe points near D_b and D_c of ∂D. The regions near D_b and D_c are expanded for clarity.

{φ_t(x_d^s) : t < 0}, where x_d^u and x_d^s are near x_d ∈ Σ⁺. The point x_d can be generated by several methods. One we have used begins at some point x̄ reasonably close to Σ⁺_i. The projection of x̄ onto Σ⁺_i is then found numerically by solving the following optimization problem: min_{x∈R^n} ½‖x − x̄‖² s.t. h_i(x) = 0, H_i(x) = 0, H̄_i(x) ≥ 0. With an x_d ∈ Σ⁺_i found, Proc-3 continues by ensuring that (i) φ_t(x_d) → x_s as t → ∞ and (ii) φ_t(x_d) ∈ X for all t > 0. When both of these conditions are satisfied, Proc-3 starts the backward solution from an interior point in D given by x_d^s := x_d + δ(∂h_i(x_d)/∂x)/‖∂h_i(x_d)/∂x‖ for some small value of δ > 0, and collects a fixed number of points from {φ_t(x_d^s) : t ≤ 0} ∩ X into 𝒮. If φ_t(x_d^s), t < 0, extends to the infeasible region, several unsafe points near ∂X are also collected into 𝒰. Finally, the initial point x_d is perturbed into D^C by choosing x_d^u := x_d − δ(∂h_i(x_d)/∂x)/‖∂h_i(x_d)/∂x‖ and collecting points from {φ_t(x_d^u) : t ≤ 0} into 𝒰.

Under mild assumptions, the set Σ⁺_i for each i ∈ I, if nonempty, can be shown to be an (n − 2)-dimensional manifold. When n > 2, sample points that cover Σ⁺_i in some reasonable manner are needed as starting points for Proc-3. Various possibilities exist. We describe in general terms a method for obtaining a random coverage. Suppose x* ∈ Σ⁺_i is obtained by the projection approach. Randomly chosen perturbations from the base point x* are then taken in the tangent space {y : [(∂h_i(x*)/∂x) (∂H_i(x*)/∂x)]^T y = 0}. After each perturbation, the projection approach is applied to bring the resulting perturbed point onto Σ⁺_i. Many additional points in Σ⁺_i are generated by repeating this process.

Proc-4 is used to generate data points that are more evenly distributed in X. Points x0 ∈ X are chosen from a uniform distribution on X. If the solution φ_t(x0) satisfies the condition that φ_t(x0) ∈ X for all t > 0 and φ_t(x0) → x_s as t → ∞, x0 is included in 𝒮. Otherwise, x0 is included in 𝒰.
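The simulation-based labeling rule used by Proc-4 can be sketched as follows. This is a minimal Python sketch using the vector field and constraint box of Example 2 (Section 7) and a fixed-step RK4 integrator; the step size dt, horizon t_max and ball radius eps are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Vector field of Example 2 (Section 7)."""
    return np.array([-2.0 * x[0] + x[0] * x[1], -x[1] + x[0] * x[1]])

def in_X(x):
    """Constraint set of Example 2: -3 < x1 < 4, -4 < x2 < 4."""
    return -3.0 < x[0] < 4.0 and -4.0 < x[1] < 4.0

def classify(x0, dt=0.01, t_max=20.0, eps=0.1):
    """Label an initial state: safe iff the forward (RK4) trajectory stays in X
    and reaches the ball B_eps(x_s) around x_s = 0.  Trajectories that are
    still undecided at t_max are conservatively labeled unsafe."""
    x = np.asarray(x0, dtype=float)
    for _ in range(int(t_max / dt)):
        if not in_X(x):
            return False
        if np.linalg.norm(x) < eps:
            return True
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return False

# Proc-4 sketch: uniform sampling over X, each sample labeled by simulation.
S, U = [], []
for _ in range(200):
    x0 = rng.uniform([-3.0, -4.0], [4.0, 4.0])
    (S if classify(x0) else U).append(x0)
```

The same classify routine also serves as the membership oracle x0 ∈ D used implicitly by Proc-1 through Proc-3.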

5. Determination of O(x)

This section is concerned with the determination of O(x), where D̂ := {x ∈ R^n : O(x) > 0} is a close approximation of D. Various learning algorithms may be used for this determination. Our choice of support vector machine (SVM) learning (Vapnik, 1995) is motivated by its good performance on a variety of problems and its desirable properties: a theoretical error bound, an optimal solution defined by a convex quadratic programming problem, sparsity in solution representation and good generalization ability. See Scholkopf and Smola (2001), Campbell (2000, Chapter 4) and Burges (1998) for discussions of SVM and its related properties.

Consider two training sets S ⊂ 𝒮 and U ⊂ 𝒰. For every x_i in S (U), an additional variable y_i = +1 (y_i = −1) is introduced. The standard SVM formulation introduces a mapping Φ from the data space R^n to a high (possibly infinite) dimensional Hilbert space H and attempts to separate the mapped data by a hyperplane in H. The hyperplane generates a discriminant function O(x) := w · Φ(x) + b, where the normal vector to the hyperplane w ∈ H and its bias b ∈ R are obtained by solving the following optimization problem:

min_{w,b,ξ} ½ w · w + C Σ_i ξ_i (15)

s.t. w · Φ(x_i) + b ≥ 1 − ξ_i ∀i with y_i = +1, (16)

w · Φ(x_i) + b ≤ −1 + ξ_i ∀i with y_i = −1, (17)

ξ_i ≥ 0. (18)

If the slack variables ξ_i are all zero, the safe and unsafe data are separated by a slab, centered on the hyperplane, with thickness 2‖w‖⁻¹. This result motivates the inclusion of the term w · w in the cost function (15). Since a separating plane may not exist, the nonzero slack variables allow some of the mapped data points to cross into or even beyond the wrong side of the slab. The parameter C allows different trade-offs between the slack variable "errors" and the thickness of the slab. In our formulation of the SVM, we eliminate any classification of unsafe training data as safe by replacing (17) with

w · Φ(x_i) + b ≤ −1 ∀i with y_i = −1. (19)

Following the standard approach, the resulting convex programming problem in the (w, b, ξ) space is solved via its dual formulation. The Wolfe dual (Fletcher, 1987) of (15), (16), (18) and (19) is

min_α ½ Σ_i Σ_j α_i α_j y_i y_j Φ(x_i) · Φ(x_j) − Σ_i α_i (20)


s.t. Σ_i α_i y_i = 0, (21)

0 ≤ α_i ≤ C ∀i with y_i = +1, (22)

0 ≤ α_i ∀i with y_i = −1. (23)

Here the vector α ∈ R^{|S|+|U|}, with components α_i, corresponds to the vector of Lagrange multipliers of (16) and (19). Let α* be the solution of the dual problem. Then the solution of the primal problem is known. In particular,

O(x) := w · Φ(x) + b = Σ_i α*_i y_i Φ(x_i) · Φ(x) + b. (24)

In most typical situations only a small fraction of the α*_i are non-zero. The data points corresponding to these non-zero α*_i are known as the support vectors. Thus, the number of terms appearing in the sum of (24) is far less than |S| + |U|.

The inner product Φ(x) · Φ(y) is known as the kernel function k(x, y). Note that (20) and (24) are fully expressed in terms of the kernel function, so that the determination of O(x) involves only the kernel function. The most common choice for k is the Gaussian function given by k(x, y) = exp[−‖x − y‖²/(2σ²)]. The parameters C and σ define the SVM model and their choice is determined experimentally as discussed in the next section. There, we also introduce a third parameter δ > 0 to define the desired discriminant function Õ(x).

The numerical solution of SVM dual problems is non-trivial. Since |S| + |U| is typically extremely large, standard quadratic programming algorithms (Fletcher, 1987) are ineffective. Thus, a variety of recursive algorithms have been developed for the dual of the standard SVM problem (15)–(18). A common approach is to use a variant of the Sequential Minimal Optimization (SMO) algorithm (Platt, 1998; Keerthi, Shevade, Bhattacharyya, & Murthy, 2001). By simple changes to SMO, we apply it to the numerical solution of our unconventional dual problem (20)–(23).
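The effect of the modified formulation can be illustrated with a crude sketch: a linear-kernel primal in the spirit of (15), (16), (18), with the hard constraint (19) approximated by a hinge weight M much larger than C, trained by subgradient descent. This stands in for the modified SMO solver used in the paper; the data and all parameter values are illustrative assumptions:

```python
import numpy as np

def train_modified_svm(Xs, Xu, C=1.0, M=50.0, lr=1e-3, epochs=4000):
    """Subgradient sketch of the primal (15), (16), (18): hinge loss with
    weight C on safe points, while the hard constraint (19) on unsafe points
    is approximated by the much larger hinge weight M."""
    X = np.vstack([Xs, Xu])
    y = np.hstack([np.ones(len(Xs)), -np.ones(len(Xu))])
    w = np.zeros(X.shape[1])
    b = 0.0
    weights = np.where(y > 0, C, M)          # M >> C mimics constraint (19)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1.0                 # points violating the slab
        gw = w - X[viol].T @ (weights[viol] * y[viol])
        gb = -np.sum(weights[viol] * y[viol])
        w -= lr * gw
        b -= lr * gb
    return w, b

# Illustrative 1-D data: "safe" states near the origin, "unsafe" ones beyond 2.
rng = np.random.default_rng(1)
Xs = rng.uniform(-1.0, 1.0, size=(50, 1))
Xu = rng.uniform(2.0, 3.0, size=(50, 1))
w, b = train_modified_svm(Xs, Xu)
O = lambda x: float(np.asarray(x) @ w + b)   # linear-kernel analogue of (24)
```

The asymmetric weighting pushes the slab away from the unsafe class, which is the qualitative behavior the hard constraint (19) enforces exactly.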

6. Implementation issues

Suppose 𝒮 and 𝒰 are two data sets generated using Proc-1 to Proc-4. The training set S (U) consists of random samples, amounting to about 10–20% of 𝒮 (𝒰), and the remaining sets, S̄ = 𝒮\S (Ū = 𝒰\U), are used for validation. The following procedure uses S, U, S̄ and Ū as input. Within a given degree of accuracy, it determines values of C, σ and δ that optimize validation performance. Specifically, no points in Ū are misclassified and the number of misclassified points in S̄ is minimized.

6.1. Tuning procedure

(1) Consider a grid over the space of (C, σ).
(2) For every grid point (C, σ), train the SVM using the data sets S and U and obtain the function O(x).
(3) For each O(x) obtained from step 2, compute δ := max{0, δ̄} where δ̄ := max{O(x_i) : x_i ∈ Ū}, and define Õ(x) := O(x) − δ.
(4) Evaluate Õ(x_i) for each x_i ∈ S̄ and form the misclassified set S̄_m := {x_i ∈ S̄ : Õ(x_i) < 0}. Record the ratio |S̄_m|/|S̄| for each grid point (C, σ).
(5) Output Õ(x) for the grid point (C, σ) that minimizes |S̄_m|/|S̄|.
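Steps 3 and 4 above, the δ-shift and the validation score for a single (C, σ) grid point, can be sketched as follows; the validation outputs are hypothetical numbers, not data from the paper:

```python
import numpy as np

def shift_and_score(O_safe, O_unsafe):
    """Steps 3-4 of the tuning procedure for one (C, sigma) grid point:
    shift O(x) by delta = max(0, max O over validation-unsafe points) so that
    no unsafe validation point is classified safe, then record the fraction
    of misclassified safe validation points."""
    delta = max(0.0, float(np.max(O_unsafe)))      # delta := max{0, delta_bar}
    O_tilde_safe = np.asarray(O_safe) - delta      # O~(x) := O(x) - delta
    ratio = float(np.mean(O_tilde_safe < 0.0))     # |S_m| / |S-validation|
    return delta, ratio

# Hypothetical validation outputs (illustrative numbers only):
delta, ratio = shift_and_score([2.0, 1.5, 0.3, -0.2], [-1.0, -0.4, 0.5])
# delta = 0.5; the shifted safe values are (1.5, 1.0, -0.2, -0.7), so ratio = 0.5.
```

The outer grid search then simply retains the (C, σ) pair with the smallest ratio.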

Clearly, the accuracy of D̂ depends on the choices of S and U. D̂ will be inaccurate in regions where S and U under-represent D. In general, it is difficult to identify these regions, resulting in the use of large sets S and U. Besides the obvious increase in computational time, the use of large data sets can still be inaccurate if the data points are not strategically distributed in D. By experimentation, we have developed an adaptive procedure that improves the distribution by shifting limited numbers of points from S̄ and Ū into S and U. It provides improved accuracy and keeps |S| + |U| reasonably small. This is important, since the overall time for determining O(x) scales quadratically with |S| + |U|.

6.2. Adaptive procedure (parameters ε₁, ε₂, q)

(1) Let ℓ = 1.
(2) Find the optimal pair (C, σ) using the tuning procedure with data sets S, U, S̄ and Ū. Obtain the function O(x) and the value δ at the optimal (C, σ).
(3) Evaluate Õ(x) for each x in S̄ and Ū and denote the sets of misclassified points by S̄_m and Ū_m, respectively. Compute the ratio r_ℓ := |S̄_m|/|S̄|.
(4) If ℓ = 1, go to step 5. Else let r^min_{ℓ−1} := min_{1≤i≤ℓ−1} {r_i}. If |r_ℓ − r^min_{ℓ−1}| < ε₁ and r_ℓ < ε₂, terminate the adaptive procedure with Õ(x) = O(x) − δ.
(5) Rank all points in S̄_m ∪ Ū_m based on the degree of misclassification, y_i Õ(x_i). Let the top q% of the most misclassified points be S̄_mq and Ū_mq.
(6) Update S = S ∪ S̄_mq, U = U ∪ Ū_mq, S̄ = 𝒮\S and Ū = 𝒰\U. Increment ℓ by 1 and go to step 2.
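The ranking in step 5 can be sketched as follows; the validation states and discriminant values are hypothetical numbers for illustration:

```python
import numpy as np

def select_top_misclassified(xs, ys, O_vals, q=10.0):
    """Step 5 of the adaptive procedure: among misclassified validation
    points (y_i * O~(x_i) < 0), return the top q% most misclassified,
    i.e. those with the most negative degree y_i * O~(x_i)."""
    degree = np.asarray(ys) * np.asarray(O_vals)
    mis = np.where(degree < 0.0)[0]          # misclassified indices
    mis = mis[np.argsort(degree[mis])]       # most negative first
    k = max(1, int(np.ceil(len(mis) * q / 100.0)))
    return np.asarray(xs)[mis[:k]]

# Hypothetical 1-D validation states (illustrative numbers only):
xs = [0.1, 0.2, 0.3, 0.4, 0.5]
ys = [+1, +1, -1, -1, +1]
O_vals = [-3.0, 0.5, -1.0, 0.8, -0.1]
worst = select_top_misclassified(xs, ys, O_vals, q=40.0)
# Misclassified degrees: -3.0 (x = 0.1), -0.8 (x = 0.4), -0.1 (x = 0.5);
# the top 40% (two points) are x = 0.1 and x = 0.4.
```

The selected points are then moved from the validation sets into the training sets in step 6.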

The adaptive procedure selects the most (top q%) misclassified points in the set S̄ ∪ Ū for inclusion in the next round of training. This step ensures that a good distribution of points is selected in regions where the current training data under-represent D, and it has been found to be very effective in our numerical experiments. We have experimented with various values of q and found that its choice affects ℓ*, the number of iterations until termination. When q is large, ℓ* is typically small, and vice versa.

The stopping criterion in step 4 is ad hoc, one of many possibilities. The adaptive procedure terminates when the possibility of adding new points to the training set has diminished and the fraction of misclassified safe points is small. It has the advantage of being able to identify cases where


𝒮 and 𝒰 are poor representations of D and, hence, flags cases where additional data collection is needed for better accuracy. This happens when |r_ℓ − r^min_{ℓ−1}| is relatively small but r_ℓ and ℓ are reasonably large.

7. Numerical examples

We illustrate several examples using the basic ideas described in Sections 4 through 6. All computations were performed on a Pentium 4 machine (1.9 GHz) with 512 MB of memory. No effort was made to optimize the run-time of the tuning/training process. The parameters for the adaptive procedure are ε₁ = 0.03, ε₂ = 0.08 and q = 10, except in Example 5 where q is changed. The boundaries of the exact stability regions displayed in Figs. 3 through 6 were determined by using the results of Section 3.

Example 1. The first example is a simple speed-control system studied in Genesio et al. (1985) and Chiang et al. (1988). The system is described by

ẋ1 = x2,
ẋ2 = −K_d x2 − x1 − g x1² (x2/K_d + x1 + 1).

For K_d = 1 and g = 6, there are three equilibria: two stable equilibria at (−0.78865, 0) and (0.0, 0.0), and one type-1 equilibrium at (−0.21135, 0.0). Constraints not present in Genesio et al. (1985) were added by setting X = {x : −0.8 < x1 < 0.8, −1.0 < x2 < 1.0, x1 + x2 < 0.5}. The sizes of the point sets are |𝒮| = 667 and |𝒰| = 1299. For the first adaptive iteration |S ∪ U| = 197 with 71 safe points. The adaptive procedure terminated at the 5th iteration, with the successive iterations adding 14, 13, 12, 12 new points. The results are shown in Fig. 3. The number of support vectors is 48 and the CPU time needed for the overall training/tuning process is about 20 s.
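As a quick numerical check, the quoted equilibria can be verified against the vector field above (a sketch; the residual tolerance reflects the five-decimal rounding of the quoted values):

```python
import numpy as np

Kd, g = 1.0, 6.0

def f(x):
    """Vector field of the Example 1 speed-control system."""
    x1, x2 = x
    return np.array([x2, -Kd * x2 - x1 - g * x1**2 * (x2 / Kd + x1 + 1.0)])

# The three equilibria quoted above, rounded to five decimals:
equilibria = [(-0.78865, 0.0), (0.0, 0.0), (-0.21135, 0.0)]
residuals = [np.linalg.norm(f(np.array(xe))) for xe in equilibria]
# Each residual is small (below 1e-3), consistent with the quoted rounding.
```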

Example 2. The second example is also studied by Genesio et al. (1985) and Chiang et al. (1988) and is given by

ẋ1 = −2x1 + x1x2,
ẋ2 = −x2 + x1x2,

where there are two equilibria: a stable equilibrium at (0.0, 0.0) and a type-1 equilibrium at (1.0, 2.0). Again, a constraint set, X = {x : −3.0 < x1 < 4.0, −4.0 < x2 < 4.0}, was added to the original problem. The sizes of the sets are |𝒮| = 986 and |𝒰| = 672. The training sets begin with |S ∪ U| = 185. There are six iterations, with 20, 19, 23, 21, 19 new points added at each iteration. The results are shown in Fig. 4. The number of support vectors is 82 and the CPU time needed for the overall training/tuning process is 30 s.

Example 3. The third example is a second-order model of an electromagnetically actuated spring-mass-damper system studied in Gilbert and Kolmanovsky (2002) and Miller, Kolmanovsky, Gilbert, and Washabaugh (2000). The original system is a nonlinear system subject to control and state inequality constraints. When a feedback linearizing control is used, the resulting system becomes

ẋ1 = x2,
ẋ2 = −(k/m) x1 − ((c + c_d)/m) x2 + (k/m) v,

where v is a constant reference input for the position of the mass. The system has three constraints, arising from limits on the displacement of the mass and the current of the electromagnet:

h1(x, v) : x1 < 0.008,
h2(x, v) : −kv + c_d x2 < 0,
h3(x, v) : (1/α)(d0 − x1)^β (kv − c_d x2) − 0.3 < 0, (25)

with parameters α = 4.5 × 10⁻⁵, β = 1.99, c = 0.6590, k = 38.94, d0 = 0.0102, m = 1.54, c_d = 4.0. It is easy to verify that the equilibrium for this system is (v, 0) and that the constraints are functions of v. For this example, we consider the case v = 0.005, with |𝒮| = 388 and |𝒰| = 663. At the first iteration |S ∪ U| = 117 with |S| = 50. Subsequent iterations added 12, 14, 16, 16 and 9 new points, respectively, and the adaptive procedure terminated at the sixth iteration. The results for ℓ = 1 and 6 are shown in Fig. 5. The total number of support vectors is 58 and the training/tuning process took 20 s.

Example 4. The above example can also be made into a 3-dimensional problem by treating v as the third variable and X as being defined by the three constraints in (25) together with 0 < v < 0.008. The adaptive procedure is used to determine a D̂ for the 3-dimensional stability region in the joint (x1, x2, v) space. A representation of this type is required for implementing the reference governor described in Gilbert and Kolmanovsky (2002). The size of the data set is |𝒮 ∪ 𝒰| = 10,220 with |𝒮| = 5037, and the first iteration uses |S ∪ U| = 1537 with |S| = 768. The subsequent iterations added 220, 142, 146, 95, 103 and 106 new points, respectively. Fig. 6 shows the estimated stability region after 1 and 7 iterations. The total number of support vectors is 992, and the overall training/tuning process took about 5 h.

Example 5. The last example involves the classic problem of controlling an inverted pendulum on a cart. It is a 4-dimensional problem described by

ẋ1 = x2,
ẋ2 = ((M/m) + sin² x3)⁻¹ (u/m + x4² l sin x3 − g sin x3 cos x3),


Fig. 3. The exact (solid line) and estimated (dashed line) stability regions for Example 1: (i) after 1 iteration; (ii) after 5 iterations.

Fig. 4. The exact (solid line) and estimated (dashed line) stability regions for Example 2: (i) after 1 iteration; (ii) after 6 iterations.

Fig. 5. The exact (solid line) and estimated (dashed line) stability regions for Example 3: (i) after 1 iteration; (ii) after 6 iterations.

ẋ3 = x4,

ẋ4 = ((M/m) + sin^2 x3)^(−1) (−(u/m) cos x3 − x4^2 l cos x3 sin x3 + (1 + M/m) g sin x3),

where u is given by the feedback law

u = 0.55 x1 + 1.84 x2 + 27.3 x3 + 8.66 x4. (26)
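The closed-loop model is easy to sanity-check numerically. The following Python sketch codes the vector field exactly as printed, with u given by (26) and the parameter values m = M = 0.5, l = 1.4, g = 10 quoted for this example; the integrator is omitted, since any standard ODE solver will do.

```python
import math

# Parameter values quoted for Example 5
m = M = 0.5
l = 1.4
g = 10.0

def u_fb(x):
    # Stabilizing feedback law (26)
    return 0.55*x[0] + 1.84*x[1] + 27.3*x[2] + 8.66*x[3]

def f(x):
    # Closed-loop vector field of the cart-pendulum, as printed above
    u = u_fb(x)
    den = (M / m) + math.sin(x[2])**2
    dx1 = x[1]
    dx2 = (u / m + x[3]**2 * l * math.sin(x[2])
           - g * math.sin(x[2]) * math.cos(x[2])) / den
    dx3 = x[3]
    dx4 = (-(u / m) * math.cos(x[2])
           - x[3]**2 * l * math.cos(x[2]) * math.sin(x[2])
           + (1 + M / m) * g * math.sin(x[2])) / den
    return [dx1, dx2, dx3, dx4]

def in_X(x):
    # Constraint set X = {x : |x1| < 7.0, |x3| < pi/4, |u| < 10}
    return abs(x[0]) < 7.0 and abs(x[2]) < math.pi / 4 and abs(u_fb(x)) < 10.0
```

At the origin the vector field vanishes and all constraints hold, consistent with the origin being the only equilibrium point in X.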

The set X = {x : |x1| < 7.0, |x3| < π/4, |u| < 10} with the origin being the only equilibrium point in X. The parameter values are m = M = 0.5, l = 1.4 and g = 10. For


Fig. 6. Sections of the exact (solid line) and estimated (dashed line) stability regions for four values of v (0.001, 0.003, 0.005 and 0.007) in Example 4. Left column, after 1 iteration; right column, after 6 iterations.

this problem, q = 80 is used to increase the number of misclassified points added in each iteration of the adaptive procedure. The size of the data set is

|S ∪ U| = 36,399 with |S| = 18,389 and the first iteration uses |S ∪ U| = 7624 with |S| = 3210. The subsequent iterations added 767, 452, 443 and 465 new


points, respectively. The fifth iteration ended with 4553 support vectors and the overall training/tuning process took 9 h. At the final iteration, the misclassification rates are r5 = |Sm|/|S| = 0.0753 and |Um|/|U| = 0.
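The adaptive procedure used across Examples 1-5 (start from a subset of the data, retrain after adding up to q currently misclassified points) can be sketched generically. The outline below is illustrative only, not the paper's implementation: fit, predict and n0 are placeholder names, and the selection and termination rules are simplified.

```python
import random

def adaptive_train(pool, fit, predict, n0=200, q=80, max_iter=10):
    """Sketch of an adaptive training loop over labeled data.

    pool    : list of (x, label) pairs (the full set S u U)
    fit     : fit(train) -> model
    predict : predict(model, x) -> label
    Starting from n0 random points, each iteration adds at most q points
    that the current model misclassifies, until none remain.
    Illustrative outline; not the procedure's exact rules.
    """
    random.shuffle(pool)
    train, rest = pool[:n0], pool[n0:]
    for _ in range(max_iter):
        model = fit(train)
        wrong = [p for p in rest if predict(model, p[0]) != p[1]]
        if not wrong:
            break
        added = wrong[:q]
        train = train + added
        rest = [p for p in rest if p not in added]
    return fit(train), train
```

Any classifier can be plugged in for fit/predict; in the paper's setting these would be the SVM training and decision-function evaluation.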

8. Discussions and conclusions

Very little has been done in the past to characterize the stability region D of constrained nonlinear systems. This is not surprising since the description of the boundary of D, as described in Section 3, is complex. This paper presents a computational methodology for approximately characterizing D as the sublevel set of the discriminant function O(x). The methodology is based on two main steps: the collection of randomly selected safe and unsafe initial data via system simulation; the determination of O(x) by machine learning classification of the safe and unsafe data. Details in the implementation of the methodology, crucial to its success, include: Proc-1, 2, 3 for weighting more heavily the selection of initial data near the boundary of D; a modified formulation of the SVM learning that guarantees that no unsafe data points are classified as safe; a tuning procedure for optimizing the values of the SVM parameters C and σ; and an adaptive procedure that introduces new training data that greatly improve validation accuracy without significantly increasing the run time of the SMO algorithm. Several example problems, of dimension up to four, support the effectiveness of the overall approach.

Possibilities for extending our approach appear promising. As can be seen from the examples, total computational times increase rapidly with the dimension n. The explanation is that |S ∪ U| must increase significantly to suitably cover neighborhoods of the higher-dimensional boundaries of D. Fortunately, in most practical applications, the determination of O(x) is off-line; it is the use of O(x) that is on-line. A feasible upper limit for n with our present approach is 5 or 6. Our emphasis has been on obtaining rather accurate representations of D. By seeking less accurate inner representations, it should be possible to ease the upper limit on n. By choosing S from a suitable subset of D (say, initial points are safe if their resulting trajectories satisfy constraints tightened by hi(x) > ε > 0) and defining U as before, the separation between S and U mapped into H is less tight. This eases the difficulty of the SVM learning and hence the required value of |S ∪ U|. It is also possible to treat systems with a finite number of disturbances and modeling errors. Each x(0) must be safe for all of the disturbances and all of the modeling errors. This greatly increases the number of required simulations (parallel processing is trivially implemented) but not the complexity of the SVM learning. There is some reason to be encouraged by both of these extensions. There are many examples of SVM learning, where separation is not as difficult as in our examples, that have effectively treated problems with large n and relatively small data sets.
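The simulation-based labeling step, including the constraint-tightening idea (label a point safe only if its trajectory satisfies hi(x) > ε > 0), is simple to express in code. A minimal sketch follows, built around a hypothetical label_safe helper with forward-Euler integration; the step size, horizon and convergence tolerance are illustrative choices, not values used in the paper.

```python
def label_safe(x0, f, hs, xs, eps=0.0, dt=1e-3, T=20.0, tol=1e-3):
    """Label an initial state safe (True) or unsafe (False) by simulation.

    x0  : initial state
    f   : vector field, f(x) -> list of derivatives
    hs  : constraint functions; X = {x : h(x) > 0 for all h in hs}
    xs  : target equilibrium
    eps : tightening margin; eps > 0 gives the inner, conservative
          labeling discussed above (constraints h(x) > eps)
    Forward-Euler integration; dt, T and tol are illustrative choices.
    """
    x = list(x0)
    for _ in range(int(T / dt)):
        if any(h(x) <= eps for h in hs):
            return False                      # left the (tightened) set X
        x = [xi + dt * di for xi, di in zip(x, f(x))]
    # safe only if the trajectory has numerically converged to xs
    return max(abs(a - b) for a, b in zip(x, xs)) < tol
```

Running this over many randomly chosen x0 produces the two-class data sets S and U that feed the SVM learning.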

References

Burges, C. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 121–167.

Campbell, C. (2000). An introduction to kernel methods. In Howlett & Jain (Eds.), Radial basis function networks: Design and applications (pp. 155–192). Berlin: Physica (Chapter 7).

Chiang, H. D., Hirsch, M., & Wu, F. (1988). Stability regions of nonlinear autonomous dynamical systems. IEEE Transactions on Automatic Control, AC-33, 16–27.

Chiang, H. D., & Thorp, J. S. (1989). Stability regions of nonlinear dynamical systems: A constructive methodology. IEEE Transactions on Automatic Control, AC-34, 1229–1241.

Davison, E. J., & Kurak, E. M. (1971). A computational method for determining quadratic Lyapunov functions for nonlinear systems. Automatica, 7, 627–636.

Fletcher, R. (1987). Practical methods of optimization (2nd ed.). New York: Wiley.

Genesio, R., Tartaglia, M., & Vicino, A. (1985). On the estimation of asymptotic stability region: State of the art and new proposals. IEEE Transactions on Automatic Control, AC-30, 747–755.

Gilbert, E. G., & Kolmanovsky, I. (2002). Nonlinear tracking control in the presence of state and control constraints: A generalized reference governor. Automatica, 38, 2063–2073.

Hauser, J., & Lai, M. C. (1992). Estimating quadratic stability domains by nonsmooth optimization. Proceedings of the American Control Conference (pp. 571–576).

Julian, P., Guivant, J., & Desages, A. (1999). A parameterization of piecewise linear Lyapunov functions via linear programming. International Journal of Control, 72, 702–715.

Johansen, T. (2000). Computation of Lyapunov functions for smooth nonlinear systems using convex optimization. Automatica, 36, 1617–1626.

Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation, 13, 637–649.

Lai, M. C., & Hauser, J. (1993). Computing maximal stability region using a given Lyapunov function. Proceedings of the American Control Conference (pp. 1500–1502).

Michel, A. N., Sarabudla, N. R., & Miller, R. K. (1982). Stability analysis of complex dynamical systems. Circuits, Systems, and Signal Processing, 1, 171–202.

Miller, R. H., Kolmanovsky, I., Gilbert, E. G., & Washabaugh, P. D. (2000). Control of constrained nonlinear systems: A case study. IEEE Control Systems Magazine, 20, 23–32.

Ohta, Y., Imanishi, H., Gong, L., & Haneda, H. (1993). Computer generated Lyapunov functions for a class of nonlinear systems. IEEE Transactions on Circuits and Systems, 40, 343–354.

Platt, J. (1998). Fast training of support vector machines using sequential minimal optimization. In Schölkopf, Burges, & Smola (Eds.), Advances in kernel methods: Support vector learning. Cambridge, MA: MIT Press.

Praprost, K. L., & Loparo, K. A. (1996). A stability theory for constrained dynamic systems with applications to electric power systems. IEEE Transactions on Automatic Control, 41, 1605–1617.

Schölkopf, B., & Smola, A. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.

Vannelli, A., & Vidyasagar, M. (1985). Maximal Lyapunov functions and domains of attraction for autonomous nonlinear systems. Automatica, 21, 69–80.

Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.

Venkatasubramanian, V., Schattler, H., & Zaborszky, J. (1995). Dynamics of large constrained nonlinear systems: A taxonomy theory. Proceedings of the IEEE, 83(11).


Chong-Jin Ong received his B.Eng (Hons) and M.Eng degrees in mechanical engineering from the National University of Singapore in 1986 and 1988, respectively, and the M.S.E. and Ph.D. degrees in mechanical engineering and applied mechanics from the University of Michigan, Ann Arbor, in 1992 and 1993, respectively. He joined the National University of Singapore in 1993 and is now an Associate Professor with the Department of Mechanical Engineering. His research interests are in robotics, control theory and machine learning.

S. Sathiya Keerthi received the B.S. degree in mechanical engineering from REC Trichy, University of Madras, Madras, India in 1980, the M.S. degree in mechanical engineering from the University of Missouri-Rolla in 1982 and the Ph.D. in control engineering from the University of Michigan, Ann Arbor in 1986. After working for about one year with Applied Dynamics International, Ann Arbor, MI, doing research and development in real-time simulation, he joined the faculty of the Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India in April 1987. From May 1999 till Dec 2003 he was with the Control Division of the Department of Mechanical Engineering, National University of Singapore, as an Associate Professor. Currently he is a researcher at the Yahoo! Research Labs, Pasadena, CA, USA. He has published over 40 papers in leading international journals. His current research interests are mainly in kernel methods for pattern classification.

Elmer Gilbert received the Ph.D. degree from the University of Michigan in 1957. Since then, he has been in the Department of Aerospace Engineering at the University of Michigan and is now Professor Emeritus. His current interests are in optimal control, nonlinear systems, robotics and machine learning. He has published numerous papers and holds eight patents. He received the IEEE Control Systems Field Award in 1994 and the Bellman Control Heritage Award of the American Automatic Control Council in 1996. He is a member of the Johns Hopkins Society of Scholars, a Fellow of the American Association for the Advancement of Science, a Fellow of the Institute of Electrical and Electronics Engineers and a member of the National Academy of Engineering (USA).

Zhang Zhenhua received the B.S. degree in mechanical engineering and the B.S. degree in applied mathematics from Shanghai Jiao Tong University, P.R. China in 1998 and the M.S. degree in mechanical engineering from the National University of Singapore (NUS), Singapore in 2003. While at NUS, his research focused on the use of support vector machines for stability region determination of dynamical systems.