OPTIMIZING STATIC LINEAR FEEDBACK: GRADIENT METHOD∗

ILYAS FATKHULLIN‡† AND BORIS POLYAK‡

Abstract. The linear quadratic regulator is the fundamental problem of optimal control. Its state feedback version was set and solved in the early 1960s. However, the static output feedback problem has no explicit-form solution. It is suggested to look at both of them from another point of view, as matrix optimization problems where the variable is a feedback matrix gain. The properties of such a function are investigated; it turns out to be smooth but not convex, with a possibly non-connected domain. Nevertheless, the gradient method for it with a special step-size choice converges to the optimal solution in the state feedback case and to a stationary point in the output feedback case. The results can be extended to the general framework of unconstrained optimization and to the reduced gradient method for minimization with equality-type constraints.

Key words. Linear quadratic regulator, optimal control, nonconvex minimization, state feedback, output feedback, gradient method, convergence

AMS subject classifications. Primary, 49N10; Secondary, 49M37, 90C26, 90C52

1. Introduction. The linear quadratic regulator (LQR) problem is formulated as an optimization problem of minimizing a quadratic integral cost with respect to the control function. It has been extensively analyzed in the last century since the seminal works of Kalman in 1960 [20, 21]. The main result claims that for an infinite-horizon LTI system the optimal control can be expressed as linear static state feedback. The optimal gain can be found by solving the algebraic matrix Riccati equation (ARE). The results became classical and were immediately included in textbooks on control [4, 5, 24]. New approaches to the problem were based on the techniques of semidefinite programming, that is, reduction to convex optimization with Linear Matrix Inequalities (LMIs) as constraints [10, 17, 6, 23]. Linear static feedback is a very natural and simple form of control for engineers, thus there were many attempts to extend the technique to other control problems.

The nearest relative of LQR is output feedback: the same LTI system with quadratic performance in the case when the full state is not measured but some output (a linear function of the state) is available. The attempts to apply static output feedback (SOF) met numerous difficulties. The problem was first addressed by Levine and Athans [26], but it was discovered that such a stabilizing control may be lacking and there are no simple optimality certificates if it does exist. Serious theoretical efforts were directed at the formulation of existence conditions, see [39, 9], but the problem remains open. If a system is stabilizable via a static output controller, there are just necessary conditions for optimality; moreover, these conditions are formulated as a system of nonlinear matrix equations [26]. Thus the design of an optimal SOF implies the application of numerical methods. The first one was proposed in [26], but it requires solving nonlinear matrix equations at each iteration. The method suggested by Anderson and Moore [4] is based on the solution of linear matrix equations only, but its properties were not obvious. Some results on the convergence of both methods can be found in [30]. Since then, numerous iterative schemes have been proposed, see [40, 30, 34, 31, 13, 38, 17, 35] and references therein. However, rigorous validation is lacking for many of them, while some others include hard nonlinear problems to be solved at each iteration. To sum up, optimization of SOF remains a challenging problem.

A promising tool for solving both state and output feedback control problems is the direct gradient method. The matrix gain K for state u(t) = Kx(t) or output u(t) = Ky(t) control is considered as the variable for optimization of the objective function, which is expressed as f(K). This function is well-defined on the set S of stabilizing controllers (otherwise the quadratic integral performance index is not defined). The set S is open and the minimum of f(K) is achieved at an interior point. Thus a simple gradient method for unconstrained minimization of f(K) can be applied:

Kj+1 = Kj − γj∇f(Kj),

∗Submitted to the editors in April 2020. Funding: The revised version of this work was funded by the Russian Science Foundation under Grant 21-71-30005.
†Moscow Institute of Physics and Technology, Institutskiy per. 9, 141700, Moscow Region, Dolgoprudny, Russia ([email protected]).
‡Institute for Control Sciences, Profsoyuznaya 65, 117806, Moscow, Russia ([email protected]).


provided that an initial stabilizing controller K0 is known. The gradient ∇f(K) for the state feedback case was found in the pioneering paper of Kalman [20]; for output feedback it was obtained by Levine and Athans [26]. Its calculation is computationally inexpensive: it requires the solution of two Lyapunov equations. Such an approach looks very attractive, but there are some obstacles. For state feedback the set S is connected but (in general) nonconvex [2], thus f(K) can be nonconvex as well. A more sophisticated situation is met for output control. The set S can be disconnected [16, 15], while saddle points or local minima can exist in a connected component. These difficulties explain why in many papers the gradient method was applied without rigorous validation, as a purely heuristic algorithm. Luckily, it worked successfully in many applications.

Recently there was a breakthrough in this field. First there appeared papers devoted to the discrete-time version of state-feedback LQR [11, 14]. The function f(K), despite being nonconvex, is shown to satisfy the so-called Lezanski-Polyak-Lojasiewicz (LPL) condition. This condition was proposed in the works [27, 36, 28] back in the 1960s and still remains a powerful tool in nonconvex optimization [22]. Based on the LPL condition it was possible to prove global convergence of the gradient method to the optimal controller. The important works [32, 33] overcome the nonconvexity obstacle for classical continuous-time LQR. It was proved that the LPL condition holds for this case and the gradient method converges. This line of research is continued in [29, 18, 19, 12].

The situation is more complicated for output control. As we mentioned above, the domain S can be nonconnected, and the values of local minima in different connected components are different. Moreover, several local minimum points can exist in a single component. Thus it is hard to expect something better than convergence to a stationary point.

Contributions of the paper. As we have mentioned, direct optimization methods for feedback control are a highly active direction of recent research. Compared with known results, the main contributions of the present paper can be formulated as follows.

a. Most of the results on the convergence of the gradient method for state feedback were known for the discrete-time case [11, 14, 19, 18]. We focus on the continuous-time case and prove convergence of the method to the single minimizer with a linear rate. Similar results have been obtained in [32, 33], but the technique of the proof there is completely different. In [32] the problem was converted into convex optimization by a change of variables. However, such a transformation is possible for state control only, while we use a technique which fits both the state and output cases.

b. Novel results on convergence (and rate of convergence) to stationary points are obtained for output feedback. They are based on the proven L-smoothness of the objective function. It is worth mentioning that a similar analysis can be applied to the wider class of problems which is called parametric LQR in [30]. It includes such important problems as low-order control, PID control, and decentralized control.

c. The particular properties of feedback optimization allow us to design new versions of minimization methods, such as a novel step-size rule for the gradient and conjugate gradient methods, and global convergence of the reduced gradient method. These algorithms can be extended to a general optimization setup, see section 6.

Organization of the paper. In section 2 we formulate the LQR as a matrix optimization problem with nonlinear equality constraints. Then it is reduced to matrix unconstrained minimization with objective f(K) and its domain S. Section 3 discusses the properties of this function defined on a generally nonconvex set. The most important is the L-smoothness property; for the state feedback case the LPL condition holds. In section 4 the gradient flow on this set is shown to be exponentially stable, and the discrete gradient method with a special step-size rule is introduced. The convergence guarantees are presented. Section 5 illustrates numerical experiments for the proposed method. In section 6 we address the links between the proposed method and general optimization problems such as unconstrained smooth minimization and optimization with equality-type constraints. Finally, in section 7 we discuss directions for future research. The proofs of the results are relegated to the Appendix.

2. Problem Statement. We use standard notation: ‖·‖ is the spectral norm of a matrix; ‖·‖F is its Frobenius norm; Sⁿ is the set of symmetric n×n matrices; I is the identity matrix; A ≻ B (A ⪰ B) means that the matrix A − B is positive (semi-)definite; the eigenvalues λi(A) of a matrix A ∈ R^(n×n) are indexed in increasing order with respect to their real parts, i.e., Re λ1(A) ≤ … ≤ Re λn(A).

Consider the linear time-invariant system

(2.1)  ẋ(t) = Ax(t) + Bu(t),
       y(t) = Cx(t),
       E x(0)x(0)ᵀ = Σ,

with state x and output y and matrices A ∈ R^(n×n), B ∈ R^(n×m), C ∈ R^(r×n). The infinite-horizon LQR performance criterion is given by

(2.2)  E ∫_0^∞ [x(t)ᵀQx(t) + u(t)ᵀRu(t)] dt,

where the expectation is taken over the distribution of the initial condition x(0) with zero mean and covariance matrix Σ, and the quadratic cost is parameterized by 0 ≺ Q ∈ Sⁿ and 0 ≺ R ∈ Sᵐ.

The static feedback control is u(t) = −Ky(t), where the gain K ∈ R^(m×r) is a constant matrix. Then the closed loop system is given by

(2.3)  ẋ(t) = AK x(t),  AK = A − BKC,

and the objective function becomes

(2.4)  f(K) = E ∫_0^∞ [x(t)ᵀ(Q + CᵀKᵀRKC)x(t)] dt.

We use the notation f(K) to underline that the performance index depends on the gain only; all other ingredients of the system description are known. Thus our optimization problem is

(2.5)  f(K) → min over K ∈ S,

where S is the set of stabilizing feedback gains,

S = {K ∈ R^(m×r) : Re λi(A − BKC) < 0, i = 1, …, n}.

Indeed, f(K) is defined for stabilizing controllers K ∈ S only. The problem of existence of a stabilizing output feedback is hard, see e.g. [9, 39]. However, we are not interested in this; our main assumption is that a stabilizing controller exists and is available: K0 ∈ S is known. For instance, if A is Hurwitz then we can take K0 = 0. This controller will be taken as the initial approximation for iterative methods. Thus our goal is to improve the performance of the known regulator. Denote by S0 the sublevel set

S0 = {K ∈ S : f(K) ≤ f(K0)}.

We suppose the following assumptions hold:
• K0 ∈ S exists;
• Q, R, Σ ≻ 0;
• rank(C) = r.

Notice that there are no assumptions on controllability/observability; the existence of K0 ∈ S suffices. Also we assume B ≠ 0, otherwise the problem is trivial. The condition Q ≻ 0 sometimes can be relaxed to Q ⪰ 0, but we do not focus on this.

We distinguish two main versions of the problem:
1. SLQR (state LQR) if C = I, that is, the state x(t) is available as the control input. If it is needed to specify the performance index f(K) for this case, we denote it fS(K).
2. OLQR (output LQR) if C ≠ I, when the output y(t) is the only information available. We use the notation fO(K) to specify this case, while f(K) is used in the general situation.
Let us formulate the problem as a matrix constrained optimization one. To avoid the calculation of integrals, Bellman's lemma [7] is instrumental.

Lemma 2.1. Given W ≻ 0 and a Hurwitz matrix A, on the solution of the LTI system

ẋ(t) = Ax(t),  x(0) = x0,

it holds that

∫_0^∞ xᵀ(t)Wx(t) dt = x0ᵀXx0,

where X is the solution of the Lyapunov matrix equation

AᵀX + XA = −W.

Applying this result we rewrite (2.5) in the final form.

Problem 2.2.

(2.6)  f(K) := Tr(XΣ) → min over K,

(2.7)  (A − BKC)ᵀX + X(A − BKC) + CᵀKᵀRKC + Q = 0,  X ≻ 0.

This is an optimization problem with matrix variables K, X and the nonlinear equality-type constraint (2.7). For K ∈ S the solution X ≻ 0 of this equation exists (Lyapunov theorem); we denote it X(K). Thus the problem is rewritten in the form (2.5) with f(K) = Tr(X(K)Σ).
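This reformulation is directly computable. Below is a minimal numerical sketch (not from the paper) of evaluating f(K) through the Lyapunov equation (2.7); it relies on SciPy's solve_continuous_lyapunov and returns +∞ outside S, which is convenient for the line searches used later. The helper name lqr_cost is ours.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def lqr_cost(K, A, B, C, Q, R, Sigma):
        # f(K) = Tr(X(K) Sigma) for a stabilizing gain K, via equation (2.7).
        AK = A - B @ K @ C
        if np.max(np.linalg.eigvals(AK).real) >= 0:
            return np.inf  # K is outside S: the integral (2.4) diverges
        W = Q + C.T @ K.T @ R @ K @ C
        X = solve_continuous_lyapunov(AK.T, -W)  # solves AK^T X + X AK = -W
        return float(np.trace(X @ Sigma))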

In the next section we analyse the properties of the function f(K), its domain S and sublevel set S0.

3. Properties of f(K).

3.1. Examples. We start with a few simple examples to exhibit the variety of situations.

Example 3.1. Let us consider a 1D example with parameters A = 0 ∈ R, Q = R = 2B = 1 ∈ R, K = k ∈ R. The function

f(k) = k + 1/k

can be written explicitly.

Here S = R+ is convex and unbounded, S0 is bounded, f(K) is convex and unbounded on S (see Figure 1).

Example 3.2. Let n = 2, m = 2, and set A, B and C to be identity matrices. Then

S = {K ∈ R^(2×2) : k11 + k22 < 1 + k11k22 + k12k21, k11 + k22 < 2}.

We see that S is not convex. This can be verified if one takes the cut x = k11 = k12, y = k22 = k21 (see Figure 2). Moreover, the boundary of S is non-smooth.

Example 3.3. For n = 3, m = 1 consider the matrices A = [0 1 0; 0 0 1; 0 0 0], B = [0; 0; 1] and C = I. Then

S = {K ∈ R^(1×3) : k1 > 0, k2k3 > k1}.

Again S is not convex, with a non-smooth boundary. For instance, the cut x = k1, y = k2 = k3 (see Figure 3) is not convex.

The previous examples related to SLQR (state feedback). Now we proceed to OLQR (output feedback).

Example 3.4. Consider an example with a scalar control and Q = I3, R = 1, A = [0 1 0; 0 0 1; −1 −1 −α], B = [0; 0; 1] and C = (5 2 1).

Figure 1. f(K) for the 1D example.

Figure 2. Nonconvex cut of S for m = n = 2.  Figure 3. Nonconvex cut of S for m = 1, n = 3.

Then

S = {k ∈ R : k + α > 0, (k + α)(2k + 1) > 5k + 1 > 0}.

If α = −1 this set is non-connected: it has two connectivity components, and the function, illustrated in Figure 4, has a single minimum in each of the components. If α = −1.4 the set is connected, it is a ray k > −0.2; the function, illustrated in Figure 5, has two local minima located in the same connected component.

Figure 4. fO(K) for scalar output control with local minima in two disconnected components.  Figure 5. fO(K) for scalar output control with local minima in the same connected component.

Figure 6. Two local minima of fO(K).  Figure 7. Two connectivity components of S.

Example 3.5. Let

A = [0 1 0; 0 0 1; −1 −1 −α],  B = [0; 0; 1],  C = [1 1 0; 1 −1 1],  Q = I3,  R = 1.

The set

S = {K ∈ R^(1×2) : α + k2 > 0, 1 + k1 + k2 > 0, (1 + k2)(α + k1 − k2) > 1 + k1 + k2}

is connected with two local minima and a saddle point K = (1.95, 0.38) for α = 1.2 (Figure 6). If α is set to 0.9 there are two connectivity components with a single local minimum in each component (Figure 7).

We conclude that the domain S of f(K) can be nonconvex with a non-smooth boundary even for SLQR, and disconnected for OLQR. The function f(K) can be unbounded on its domain, but it looks smooth. We shall validate these properties below.
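The disconnected domain in Example 3.4 is easy to reproduce numerically. A sketch under the assumption Σ = I3 (the example does not specify Σ), reusing lqr_cost from the sketch above; the cost is finite exactly on the two connectivity components of S when α = −1:

    import numpy as np

    alpha = -1.0
    A = np.array([[0., 1., 0.], [0., 0., 1.], [-1., -1., -alpha]])
    B = np.array([[0.], [0.], [1.]])
    C = np.array([[5., 2., 1.]])
    Q, R, Sigma = np.eye(3), np.array([[1.]]), np.eye(3)
    for k in np.linspace(-1.0, 4.0, 26):
        # f is +inf in the gap between the two components and finite inside them
        print(f"k = {k:6.2f}   f(k) = {lqr_cost(np.array([[k]]), A, B, C, Q, R, Sigma):.4f}")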

3.2. Connectedness of S, S0. It is known that S in the state feedback case is connected [32], and the same is true for S0.

Lemma 3.6. Let C = I. The sets S, S0 are connected for every K0 ∈ S.

Proof. For C = I equation (2.7) becomes (A − BK)ᵀX + X(A − BK) + KᵀRK + Q = 0. It is proved in [23] that the equality here can be replaced with an inequality, and after the change of variables P = X⁻¹ the definition of stabilizing controllers becomes

S = {K = R⁻¹BᵀP⁻¹ : AP + PAᵀ − BR⁻¹Bᵀ + PQP ≺ 0, P ≻ 0}.

The inequality for P can be rewritten as a block LMI and defines a convex set. Its image under the continuous map K = R⁻¹BᵀP⁻¹ is connected. Similarly, the set S0 is defined by the same map for the same set of P with the extra constraint Tr(P⁻¹Σ) ≤ f(K0), which is convex (again it can be written as an LMI in P); this implies connectedness of S0.

We provided the proof to demonstrate the well-known technique of variable change [10] which allows one to transform the original problem into a convex one. This line of research was developed in [32, 33] to validate the gradient method. Unfortunately, this trick does not work for output feedback: there exists no convex reparametrization in this case.

As we have seen in the examples, the set S can be non-connected. Upper estimates for the number N of connected components for particular cases may be found in [16]. For instance, if m = r = 1 (single-input single-output system) then N ≤ n + 1. For more general problems with the additional condition K ∈ L, L being a linear subspace in the set of matrices (so-called decentralized control), the number of components can grow exponentially, see [15], where numerous examples can be found.

3.3. f(K) is coercive and S0 is bounded. The examples exhibit that the function f(K) is unbounded on its domain. Below we analyse its behavior in more detail.

Definition 3.7. A continuous function f : K ↦ f(K) ∈ R defined on the set S is called coercive if for any sequence {Kj}, j = 1, 2, …, in S

f(Kj) → +∞

whenever ‖Kj‖ → +∞ or Kj → K ∈ ∂S.

Lemma 3.8. The function f(K) = Tr(X(K)Σ) is coercive and the following estimates hold:

(3.1)  f(K) ≥ λ1(Σ)λ1(Q) / (−2 Re λn(AK)),

(3.2)  f(K) ≥ λ1(Σ)λ1(R)λ1(CCᵀ)‖K‖F² / (2‖A‖ + 2‖K‖F‖B‖‖C‖).

The proof of the lemma and of further results can be found in Appendices B and C. From estimate (3.2) we immediately get

Corollary 3.9. For any K0 ∈ S the set S0 is bounded.

On the other hand, a minimum point of f(K) on S0 exists (a continuous function on a compact set), but S0 has no common points with the boundary of S due to (3.1). Hence

Corollary 3.10. There exists a minimum point K∗ ∈ S.

This reasoning can be seen as an alternative proof of Lemma 2.1 in [40].

3.4. Gradient of f(K). Differentiability of f(K) is a well-known fact, proved in the pioneering papers by Kalman [20] for SLQR and by Levine and Athans [26] for OLQR. We provide it for completeness.

Lemma 3.11. For all K ∈ S, the gradient of (2.6) is

(3.3)  ∇f(K) = 2(RKC − BᵀX)YCᵀ,

where Y is the solution to the Lyapunov matrix equation

(3.4)  AK Y + Y AKᵀ + Σ = 0.

Proof. Consider the increment of the Lyapunov equation (2.7)

A>KdX + dXAK + dA>KX +XdAK + C>dK>RKC + C>K>RdKC = 0,

A>KdX + dXAK + C>dK>(RKC −B>X) + (C>K>R−XB)dKC = 0.

Denote M := RKC −B>X then

df(K) = Tr (ΣdX) = 2 Tr(Y C>dK>M

)= 〈2MYC>, dK〉,

where Y is the solution to (3.4).
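As the lemma shows, one gradient evaluation costs two Lyapunov solves. A minimal sketch with the same imports and conventions as the cost sketch above (the helper name is ours):

    def lqr_grad(K, A, B, C, Q, R, Sigma):
        # Gradient (3.3): solve (2.7) for X and (3.4) for Y, then 2(RKC - B^T X) Y C^T.
        AK = A - B @ K @ C
        X = solve_continuous_lyapunov(AK.T, -(Q + C.T @ K.T @ R @ K @ C))
        Y = solve_continuous_lyapunov(AK, -Sigma)
        M = R @ K @ C - B.T @ X
        return 2.0 * M @ Y @ C.T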

The necessary condition for a minimizer of f(K) is ∇f(K∗) = 0 (because K∗ exists and belongs to the open set S). This condition yields a system of three nonlinear matrix equations for K∗: ∇f(K∗) = 0, (3.4), (2.7). In general they cannot be solved explicitly and numerical methods are required.

However, there is the famous case of state feedback control C = I when an explicit form of the solution (going back to Kalman [20]) can be obtained. Setting the gradient calculated in Lemma 3.11 to zero and noting that Y∗ ≻ 0, C = I, we get

K∗ = R⁻¹BᵀX∗.

Further, substituting this expression for K∗ into (2.7) we obtain the well-known Riccati equation for X∗:

AᵀX∗ − X∗BR⁻¹BᵀX∗ + X∗A − X∗BR⁻¹BᵀX∗ + X∗BR⁻¹RR⁻¹BᵀX∗ + Q = 0,
AᵀX∗ + X∗A − X∗BR⁻¹BᵀX∗ + Q = 0.

Of course, this is not a completely explicit solution because the Riccati equation should be solved numerically, but the methods for this purpose are well developed [3, 8].
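For reference, a sketch of this baseline using SciPy's ARE solver; it gives the K∗ used for accuracy comparisons in section 5 (the helper name is ours):

    from scipy.linalg import solve_continuous_are

    def slqr_optimal_gain(A, B, Q, R):
        # Solve A^T X + X A - X B R^{-1} B^T X + Q = 0, then K* = R^{-1} B^T X*.
        X = solve_continuous_are(A, B, Q, R)
        return np.linalg.solve(R, B.T @ X)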

3.5. Second derivative of f(K). The performance index f(K) is twice differentiable. To avoid tensors, we restrict the analysis to the action ∇²f(K)[E, E] of the Hessian on a matrix E ∈ R^(m×r). It is given by the expression

(3.5)  (1/2)∇²f(K)[E, E] = ⟨(REC − BᵀX′(K)[E])YCᵀ, E⟩ + ⟨MY′(K)[E]Cᵀ, E⟩,

where X′ := X′(K)[E] and Y′ := Y′(K)[E] are the solutions to the equations

AKᵀX′ + X′AK + (−BEC)ᵀX + X(−BEC) + CᵀEᵀRKC + CᵀKᵀREC = 0,
AK Y′ + Y′AKᵀ + (−BEC)Y + Y(−BEC)ᵀ = 0,

which can be equivalently rewritten as

AKᵀX′ + X′AK + MᵀEC + (MᵀEC)ᵀ = 0,
AK Y′ + Y′AKᵀ − (BECY + (BECY)ᵀ) = 0.

Then applying Lemma A.1 and substituting X′ for Y′ in the last term of (3.5) we obtain

Lemma 3.12. For all K ∈ S, the gradient of f(·) is differentiable and the action of the Hessian of f(·) on any E ∈ R^(m×r) satisfies

(3.6)  (1/2)∇²f(K)[E, E] = ⟨RECYCᵀ, E⟩ − 2⟨BᵀX′YCᵀ, E⟩.

As the examples show, f(K) is in general nonconvex. However, for the state feedback case we can guarantee local strong convexity in a neighborhood of the minimum point K∗.

Corollary 3.13. fS(·) is strongly convex in a neighborhood of K∗.

Proof. Note that when K = K∗ the second term in (3.6) turns to zero. If we recall that R, Y ≻ 0, it is straightforward to show that

⟨REY, E⟩ = Tr((R^(1/2)E)Y(R^(1/2)E)ᵀ) > 0.

Thus the Hessian is positive definite at K∗ and there is a neighbourhood of K∗ where the function fS(·) is strongly convex.

An upper bound for the second derivative is available.

Lemma 3.14. On the set S the action of the Hessian ∇²f(K) on a matrix E ∈ R^(m×r), ‖E‖F = 1, can be bounded as

(3.7)  (1/2)∇²f(K)[E, E] ≤ λn(R)λn(CYCᵀ) + ‖X′‖F‖B‖‖C‖F‖Y‖,

where X′ and Y are the solutions to the Lyapunov matrix equations

AKᵀX′ + X′AK + MᵀEC + (MᵀEC)ᵀ = 0,
AK Y + YAKᵀ + Σ = 0.

Proof. It follows from (3.6) that

(1/2) sup over ‖E‖F = 1 of |∇²f(K)[E, E]| ≤ sup over ‖E‖F = 1 of (|⟨RECYCᵀ, E⟩| + 2|⟨BᵀX′YCᵀ, E⟩|).

Now we estimate both terms in this expression separately, assuming ‖E‖F = 1:

⟨RECYCᵀ, E⟩ = Tr(RECYCᵀEᵀ) ≤ λn(R)λn(CYCᵀ).

By the Cauchy-Schwarz inequality,

|⟨BᵀX′YCᵀ, E⟩| = |⟨X′, BECY⟩| ≤ ‖X′‖F‖BECY‖F.

It suffices to bound ‖BECY‖F when ‖E‖F = 1:

‖BECY‖F = √(Tr(BECYYCᵀEᵀBᵀ)) ≤ ‖B‖‖C‖F‖Y‖.

3.6. f(K) is L-smooth on S0. A function is called L-smooth if its gradient satisfies the Lipschitz condition with constant L. The function f(K) fails to be L-smooth on S; however, it has this property on the sublevel set S0.

Theorem 3.15. On the set S0 the function f(K) is L-smooth with constant

(3.8)  L = (2f(K0)/λ1(Q)) (λn(R)‖C‖² + ‖B‖‖C‖F ξ),

where ξ = (√n f(K0)/λ1(Σ)) ( f(K0)‖B‖/(λ1(Σ)λ1(Q)) + √( (f(K0)‖B‖/(λ1(Σ)λ1(Q)))² + λn(R)/λ1(Q) ) ).

For the proof see Appendix B.

Corollary 3.16. The following inequality holds for K ∈ S0:

(3.9)  |∇²f(K)[E, E]| ≤ L‖E‖F²,

where L is given in (3.8).

Indeed, for twice differentiable functions the Lipschitz constant L of the gradient equals the upper bound for the norm of the second derivative.

As we have seen in the examples, the boundary of S can be non-smooth, while the level sets of f(K) are smooth due to the L-smoothness property of f(K).

3.7. Gradient domination property. As we have seen, f(K) can be nonconvex even for the state feedback case (SLQR). However, there is a useful property which replaces convexity in the validation of minimization methods. This property is referred to in the optimization literature as gradient domination or the Lezanski-Polyak-Lojasiewicz (LPL) condition [36, 27, 28, 22].

Theorem 3.17. The function fS(K) defined in (2.6) satisfies the LPL condition on the set S0:

(3.10)  (1/2)‖∇fS(K)‖F² ≥ µ (fS(K) − fS(K∗)),

where µ > 0 is given by

(3.11)  µ = λ1(R)λ1(Σ)²λ1(Q) / ( 8 f(K∗) ( ‖A‖ + ‖B‖²fS(K0)/(λ1(Σ)λ1(R)) )² ).

The constant µ in the LPL condition depends on K0 and tends to zero when fS(K0) tends to infinity. The condition is false on the entire set S, as can be seen from Example 3.1. The condition cannot be applied for output feedback: for instance, in Example 3.4 there are two disconnected components with different values of the minima. Moreover, in Example 3.5 there are two local minima in a connected domain.

4. Methods. Now we proceed to versions of the gradient method for the minimization of f(K). This is not a standard task, because the function f(K) is defined not on the entire space of matrices, is unbounded on its domain, and can be nonconvex. However, the properties of the function obtained in section 3 allow us to get convergence results. In all cases, the gradient methods behave monotonically. For SLQR, global convergence to the single minimum point with a linear rate can be validated. For OLQR, global convergence to a stationary point holds. In all versions of the method, the known stabilizing controller K0 serves as the initial point.

4.1. Continuous Method. First we consider the gradient flow defined by the system of ordinary differential equations

(4.1)  K̇(t) = −∇f(K),  K(0) = K0 ∈ S.

Theorem 4.1. The solution Kt = K(t) ∈ S0 of the above system exists for all t ≥ 0, f(Kt) is monotonically decreasing, and

(4.2)  ∇f(Kt) → 0 as t → ∞,   min over 0 ≤ t ≤ T of ‖∇f(Kt)‖² ≤ f(K0)/T.

If C = I then Kt converges to the global minimum point K∗ exponentially:

(4.3)  ‖Kt − K∗‖F ≤ (√(2L(f(K0) − f(K∗)))/µ) e^(−µt),

where µ and L are determined in Theorems 3.15 and 3.17.

The main idea of the proof is the equality (d/dt)f(Kt) = −‖∇f(Kt)‖²; the details are in Appendix D.

4.2. Discrete Method. Consider the gradient method in general form:

(4.4)  Kj+1 = Kj − γj∇f(Kj).

The properties obtained in Theorems 3.15 and 3.17 allow us to establish convergence guarantees for the above method.

Theorem 4.2. For arbitrary 0 < γj ≤ 2/L, method (4.4) generates a nonincreasing sequence f(Kj):

(4.5)  f(Kj+1) ≤ f(Kj) − γj (1 − Lγj/2) ‖∇f(Kj)‖F².

Moreover, if 0 < ε1 ≤ γj ≤ 2/L − ε2, ε2 > 0, then

∇f(Kj) → 0,   min over 0 ≤ j ≤ k of ‖∇f(Kj)‖F² ≤ f(K0)/(c1 k),   c1 = ε1ε2L/2,

and for C = I the method converges to the global minimum K∗ with a linear rate:

(4.6)  ‖Kj − K∗‖ ≤ c qʲ,  0 ≤ q < 1.

The simplest choice is γj = 1/L; then the constants c, q in the last inequality can be written explicitly. The proof in Appendix D is mainly a replica of the standard ones in [36]; however, the non-trivial part is the proof that all iterations remain in S0.

4.3. Algorithm. The method above is just a "conceptual" one: we do not know the constant L and it is hard to estimate. Thus an implementable version of the algorithm is needed. It can be constructed as follows. Inequality (4.5) provides the opportunity to apply an Armijo-like rule: a step-size γ satisfies this rule if

f(K − γ∇f(K)) ≤ f(K) − αγ‖∇f(K)‖F²

for some 0 < α < 1. We can achieve this inequality by subsequent reduction of the initial guess for γ due to (4.5). This initial guess can be taken as follows. Consider the univariate function

(4.7)  ϕ(t) = f(K − t∇f(K)).

One iteration of Newton's method for the minimization of ϕ(t) starting from t0 = 0 gives

t1 = −ϕ′(0)/ϕ′′(0).

Calculating the derivatives we get

(4.8)  t1 = ‖∇f(K)‖F² / ∇²f(K)[∇f(K), ∇f(K)].

But expressions for these quantities were obtained in section 3 (see (3.3) and (3.6)). Notice that t1 ≥ 1/L due to (3.9), thus this step-size is bounded below. Taking γj = min{t1, T1} with some T1 > 0 (such an upper bound is needed to restrict the step-size) for K = Kj in the gradient method, we arrive at the basic algorithm below.

Algorithm 4.1 Gradient method

1: Initialization: K0 ∈ S, ε > 0, α ∈ (0, 1), T1 > 0.
2: while ‖∇f(K)‖F ≥ ε do
3:   Solve for X: AKᵀX + XAK + Q + KᵀRK = 0.
4:   Solve for Y: AK Y + YAKᵀ + Σ = 0.
5:   M ← RK − BᵀX, ∇f(K) ← 2MY.
6:   Solve for X′: AKᵀX′ + X′AK + Mᵀ∇f(K) + ∇f(K)ᵀM = 0.
7:   ∇²f(K)[∇f(K), ∇f(K)] ← 2⟨R∇f(K)Y, ∇f(K)⟩ − 4⟨BᵀX′Y, ∇f(K)⟩.
8:   t ← min{T1, ‖∇f(K)‖F² / ∇²f(K)[∇f(K), ∇f(K)]}, Kprev ← K.
9:   Gradient step: K ← K − t∇f(K).
10:  if K ∉ S or f(K) ≥ f(Kprev) − αt‖∇f(Kprev)‖F² then
11:    t ← αt,
12:    repeat the gradient step.
13:  end if
14: end while
15: Return K.
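A runnable sketch of Algorithm 4.1 for the state feedback case C = I (the listing above is written for this case); the tolerance, α, T1 and the iteration caps are our illustrative defaults, not values prescribed by the paper. It reuses lqr_cost from section 2.

    def gradient_method(K0, A, B, Q, R, Sigma, eps=1e-8, alpha=0.5, T1=1.0, max_iter=500):
        n = A.shape[0]
        K = K0.copy()
        for _ in range(max_iter):
            AK = A - B @ K
            X = solve_continuous_lyapunov(AK.T, -(Q + K.T @ R @ K))
            Y = solve_continuous_lyapunov(AK, -Sigma)
            M = R @ K - B.T @ X
            G = 2.0 * M @ Y                               # gradient (3.3) with C = I
            g2 = float(np.sum(G * G))
            if np.sqrt(g2) < eps:
                break
            Xp = solve_continuous_lyapunov(AK.T, -(M.T @ G + G.T @ M))
            hess = 2.0 * np.sum((R @ G @ Y) * G) - 4.0 * np.sum((B.T @ Xp @ Y) * G)
            t = min(T1, g2 / hess) if hess > 0 else T1    # Newton step (4.8), truncated
            f_prev = float(np.trace(X @ Sigma))
            for _ in range(60):                           # Armijo-like step reductions
                Knew = K - t * G
                if lqr_cost(Knew, A, B, np.eye(n), Q, R, Sigma) < f_prev - alpha * t * g2:
                    break
                t *= alpha
            K = Knew
        return K

Note that lqr_cost returns +∞ for non-stabilizing gains, so the backtracking test simultaneously enforces K ∈ S and the descent condition of line 10.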

Theorem 4.3. For Algorithm 4.1 the number of step reductions is bounded uniformly over all iterations, and the convergence results of Theorem 4.2 hold true.

The proof follows the same lines as for Theorem 4.2 and is given in the Appendix. There are different ways to choose the constants T1, α in the Algorithm. We do not discuss them here, because there are various implementations of the Algorithm and they deserve a separate consideration.

It is also possible to consider a different approach to the step-size choice. For instance, it can be chosen in such a way that guarantees that a new iterate remains stabilizing. Then there is no need to check whether K ∈ S on every iteration. Consider the Lyapunov equation

(A − BKC)Y + Y(A − BKC)ᵀ + I = 0.

Denote Kt = K − t∇f(K) and G = (B∇f(K)C)Y + Y(B∇f(K)C)ᵀ. Then

AY + YAᵀ − [(BKtC)Y + Y(BKtC)ᵀ] + I − tG = 0,
AKt Y + YAKtᵀ + I − tG = 0.

The function V(x) = xᵀY⁻¹x remains a quadratic Lyapunov function for the new AKt when I − tG ≻ 0. If λmax(G) ≤ 0, then Kt ∈ S for all t > 0. Otherwise, Kt ∈ S if 0 < t < 1/λmax(G).
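This bound is cheap to evaluate: one more Lyapunov solve and one symmetric eigenvalue computation. A sketch (the helper name is ours):

    def stabilizing_step_bound(K, gradK, A, B, C):
        # Largest t for which K - t*gradK is guaranteed stabilizing, via I - tG > 0.
        AK = A - B @ K @ C
        Y = solve_continuous_lyapunov(AK, -np.eye(A.shape[0]))  # (A-BKC)Y + Y(A-BKC)^T + I = 0
        BGC = B @ gradK @ C
        G = BGC @ Y + Y @ BGC.T                                  # symmetric by construction
        lam_max = np.max(np.linalg.eigvalsh(G))
        return np.inf if lam_max <= 0 else 1.0 / lam_max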

5. Simulation. We started with a comparison of various versions of the step-size choice in the gradient descent method on low-dimensional tests, such as Examples 3.1 to 3.5. In all cases, Algorithm 4.1 was superior and converged to global or local minimizers with high accuracy in 10–20 iterations.

For a medium-size simulation we generated matrices with dimensions n = 100, m = 10 for the SLQR problem:

C = I,  A = (1/n) rand(n, n) − I,  B = ones(n, m) + (1/2) rand(n, m),
Q = Q1Q1ᵀ,  Q1 = rand(n, n),  R = R1R1ᵀ,  R1 = rand(m, m),

where ones(n, m) is an n×m matrix with all entries equal to one and rand(n, m) is an n×m matrix with every entry generated from the uniform distribution between 0 and 1. We choose the initial stabilizing controller K0 = 0; it is indeed stabilizing because A is Hurwitz. We find the optimal gain K∗ by solving the ARE, thus we can compare the accuracy of the obtained solutions. Then we apply three different first order methods to this problem. The first one is the simplest version of the gradient method (4.4) with a constant step-size γj = γ tuned at the initial iterations to guarantee monotonicity of f(Kj); it is denoted GD r. The second is our basic Algorithm 4.1 (GDN). The last one is the conjugate gradient method described below by the update rules in (6.5) (CGN). The convergence of the methods is illustrated in Figure 8. Of course, the simplest form of the gradient method GD r is very slow, because the step-size should be strongly enlarged after the initial iterations. Our basic algorithm (GDN) converges satisfactorily; it is worth mentioning that the number of step reductions or truncations is minimal (approximately 10 per 100 iterations), thus the step-size rule (4.8) works with minor corrections at all stages of the iteration process. Finally, the proposed version of the conjugate gradient method strongly accelerates convergence.
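A NumPy sketch of this test setup, using the helpers defined earlier; the seed and Σ = I are our assumptions, as the paper does not state them:

    rng = np.random.default_rng(0)
    n, m = 100, 10
    A = rng.random((n, n)) / n - np.eye(n)           # Hurwitz with high probability
    B = np.ones((n, m)) + 0.5 * rng.random((n, m))
    Q1 = rng.random((n, n)); Q = Q1 @ Q1.T
    R1 = rng.random((m, m)); R = R1 @ R1.T
    Sigma = np.eye(n)
    K0 = np.zeros((m, n))                            # stabilizing since A is Hurwitz
    K_star = slqr_optimal_gain(A, B, Q, R)           # ARE baseline
    K_gdn = gradient_method(K0, A, B, Q, R, Sigma)   # Algorithm 4.1 (GDN)
    print(np.linalg.norm(K_gdn - K_star))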

Of course, these calculations are preliminary; much more should be done to develop reliable and efficient gradient-based algorithms for state feedback which can compete with the classical algorithms based on Riccati-equation techniques. The behavior of the method for output feedback also requires detailed investigation.

6. Links with general optimization problems. The results obtained above for the particular feedback minimization problem can provide some surplus profit for the analysis of several abstract formulations of unconstrained and constrained optimization. We consider three such "side effects".

6.1. Step-size choice for gradient descent. The step-size rule proposed in (4.8) is also valid in the general setup of the smooth unconstrained optimization problem

min over x ∈ Rⁿ of f(x).

Figure 8. Three first-order methods for n = 100, m = 10.

The gradient method becomes

(6.1)  xj+1 = xj − γj∇f(xj),  γj = ‖∇f(xj)‖² / ⟨∇²f(xj)∇f(xj), ∇f(xj)⟩,

and it is promising whenever the problem structure allows an efficient computation of the quadratic form in the denominator. It is particularly attractive in practice because it does not require knowledge of the constants L and µ and uses second order information at a minor cost. For quadratic functions f(x) = (Hx, x) the method coincides with steepest descent. For nonquadratic functions its rigorous validation is possible in the strongly convex case.
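A generic sketch of (6.1) for user-supplied callables grad and hess (all names are ours):

    def gd_newton_step(grad, hess, x0, tol=1e-10, max_iter=1000):
        # Gradient descent with the curvature step (6.1): gamma = ||g||^2 / <H g, g>.
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            x = x - (g @ g) / (g @ (hess(x) @ g)) * g
        return x

    # For a quadratic f(x) = (1/2) x^T H x this reproduces exact steepest descent:
    H = np.array([[4.0, 1.0], [1.0, 2.0]])
    x_min = gd_newton_step(lambda x: H @ x, lambda x: H, np.array([1.0, 1.0]))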

Theorem 6.1. Let f(·) be a twice differentiable µ-strongly convex function on Rⁿ, with ∇f(·) and ∇²f(·) Lipschitz continuous with constants L and M respectively. If the initial condition x0 satisfies

(6.2)  M√(2L(f(x0) − f(x∗))) ≤ 3µ²(1 − δ),  δ > 0,

then the method (6.1) converges to the global minimizer x∗ with a linear rate:

(6.3)  f(xj) − f(x∗) ≤ (f(x0) − f(x∗)) (1 − µδ/L)ʲ.

The damped version of (6.1) (γj replaced with σγj, σ ≤ µ/L) converges for an arbitrary x0:

(6.4)  f(xj) − f(x∗) ≤ (f(x0) − f(x∗)) (1 − µσ/L)ʲ.

The proofs are deferred to Appendix D. A similar step-size rule can be applied to the solution of the constrained minimization problem min over x ∈ Q of f(x) via the gradient projection method [37].

6.2. New version of the conjugate gradient method. A similar approach can be exploited in the conjugate gradient method for unconstrained minimization of f(x) in Rⁿ. The standard version of the method requires 1D minimization for finding the step-size αj, but it can be replaced as follows:

(6.5)  xj+1 = xj + αj pj,  αj = ‖pj‖² / (∇²f(xj)pj, pj),
       pj = −∇f(xj) + βj pj−1,  βj = ‖∇f(xj)‖² / ‖∇f(xj−1)‖²,  β0 = 0.

There are various formulae for βj, see e.g. [37]; we provided above just the simplest one. Probably, convergence results for (6.5) can be obtained.
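A sketch of (6.5) in the same style as the gradient sketch above (Fletcher-Reeves-type β; the names are ours):

    def cg_newton_step(grad, hess, x0, tol=1e-10, max_iter=1000):
        # Conjugate gradient (6.5): alpha from curvature, beta = ||g_j||^2 / ||g_{j-1}||^2.
        x = np.asarray(x0, dtype=float).copy()
        g = grad(x)
        p = -g
        for _ in range(max_iter):
            if np.linalg.norm(g) < tol:
                break
            alpha = (p @ p) / (p @ (hess(x) @ p))
            x = x + alpha * p
            g_new = grad(x)
            p = -g_new + (g_new @ g_new) / (g @ g) * p
            g = g_new
        return x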

6.3. Reduced gradient method. The gradient method for feedback minimization can be considered in the general setup of an abstract optimization problem with equality-type constraints:

min over x, y of f(x, y)  s.t.  g(x, y) = 0,

where x ∈ Rⁿ, y ∈ Rᵐ, g(x, y) ∈ Rⁿ. Suppose that the solution x(y) of the equality g(x, y) = 0 for fixed y ∈ S can be found either explicitly or with minor computational effort. Define F(y) := f(x(y), y). Thus the problem is converted to the unconstrained optimization

min over y ∈ S of F(y).

The gradient of F(y) can be written with no problems:

∇F(y) = −∇y g(x, y)ᵀ((∇x g(x, y))⁻¹)ᵀ∇x f(x, y)ᵀ + ∇y f(x, y)ᵀ,

and the gradient method with y0 ∈ S becomes the so-called reduced gradient method:

(6.6)  yj+1 = yj − γj∇F(yj),  xj = x(yj).

The method was proposed by Ph. Wolfe [41] and implemented in numerous algorithms, see e.g. [1]. The standard assumption was S = Rⁿ. However, for nonlinear equality constraints the method had just a local theoretical validation (see e.g. Theorem 8, Chapter 8.2 in [37]), while the main interest is its global convergence. In the setup of the present paper x corresponds to Y, y to K. The main tool for proving convergence in the general case is to obtain conditions which are analogs of our results on L-smoothness and the LPL condition (Theorems 3.15 and 3.17). If such results hold, the proof is a replica of our considerations.
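One step of (6.6) in adjoint form, equivalent to the displayed formula for ∇F(y); the callables and names are ours:

    def reduced_gradient_step(x_of_y, grad_x_f, grad_y_f, jac_x_g, jac_y_g, y, gamma):
        # Eliminate x via x(y), then take a gradient step on F(y) = f(x(y), y).
        x = x_of_y(y)
        lam = np.linalg.solve(jac_x_g(x, y).T, grad_x_f(x, y))  # adjoint variable
        gradF = grad_y_f(x, y) - jac_y_g(x, y).T @ lam          # reduced gradient
        return y - gamma * gradF, x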

7. Conclusion. The results can be extended in several directions. First, more efficient computational schemes are of interest. The gradient method is the simplest method for unconstrained smooth optimization. Accelerated algorithms, such as conjugate gradient, heavy ball, and Nesterov acceleration, are developed for strongly convex functions. But we have proved (Corollary 3.13) that fS(K) is strongly convex in a neighborhood of the optimal solution K∗. Thus such methods are applicable to accelerate local convergence. Second, more research should be devoted to output minimization. For instance, how common is the effect of multiple minima in one connectivity component (as in Example 3.5)? Does the method converge to a local minimum only, or can it be a saddle point? Third, there is a highly important research direction which unites the problems of control, optimization and machine learning and uses such approaches as policy optimization, reinforcement learning, and adaptive control, see the survey [19]. The gradient method can be easily extended to decentralized control (the additional condition K ∈ L, with L a linear subspace in the space of matrices), see e.g. [15], and to other parametric LQR problems [30]. However, its validation remains an open question.

Appendix A. Basic Facts. The following lemmas are helpful throughout the paper.

Lemma A.1. Let X and Y be the solutions to the dual Lyapunov equations with Hurwitz matrix A:

AᵀX + XA + W = 0,
AY + YAᵀ + V = 0.

Then Tr(XV) = Tr(YW).

Lemma A.2. Let W1 ⪰ W2 and let X1, X2 be the solutions to the Lyapunov equations with Hurwitz matrix A:

AᵀX1 + X1A + W1 = 0,
AᵀX2 + X2A + W2 = 0.

Then X1 ⪰ X2.

Lemma A.3. Let N, L ∈ R^(m×n). Then for any α > 0

NᵀL + LᵀN ⪯ αNᵀN + (1/α)LᵀL.

Lemma A.4. For all positive semi-definite A, B ∈ R^(n×n) it holds that

λ1(A)Tr(B) ≤ Tr(AB) ≤ λn(A)Tr(B).

Lemma A.5. If X ⪰ 0 is the solution of the Lyapunov equation

AᵀX + XA + Q = 0

with A Hurwitz and Q ⪰ 0, then

λn(X) ≥ λ1(Q)/(2σ(A)),  λ1(X) ≥ λ1(Q)/(2‖A‖),

where σ(A) := −max over i of Re λi(A) is the stability degree of A. These are well-known lower bounds for the Lyapunov equation, see e.g. [25].

Appendix B. Analysis of the OLQR.

B.1. Proof of Lemma 3.8.

Proof. Let us first consider a sequence {Kj} ⊆ S with Kj → K ∈ ∂S, i.e. σ(A − BKC) = 0. The stability degree is a continuous map, i.e. σ(A − BKjC) → σ(A − BKC). Therefore, for every ε > 0 there exists N = N(ε) ∈ N such that

|σ(A − BKjC) − σ(A − BKC)| = σ(A − BKjC) < ε,  for all j ≥ N.

Let Xj be the solution to the Lyapunov equation (2.7) associated with Kj; then

(B.1)  f(Kj) = Tr(XjΣ) ≥ λ1(Σ)Tr(Xj) ≥ λ1(Σ)λ1(Q + CᵀKjᵀRKjC)/(2σ(AKj)) ≥ λ1(Σ)λ1(Q)/(2σ(AKj)) ≥ λ1(Σ)λ1(Q)/(2ε) → +∞

as ε → 0. Here the second inequality is based on Lemma A.5.

On the other hand, suppose that the sequence {Kj} ⊆ S satisfies ‖Kj‖ → +∞. Then

f(Kj) = Tr(XjΣ) = Tr(Yj(Q + CᵀKjᵀRKjC)) ≥ Tr(YjCᵀKjᵀRKjC) ≥ λ1(Yj)Tr(CᵀKjᵀRKjC) ≥ λ1(Yj)λ1(CCᵀ)Tr(KjᵀRKj) ≥ λ1(Yj)λ1(CCᵀ)λ1(R)Tr(KjᵀKj) ≥ λ1(Yj)λ1(CCᵀ)λ1(R)‖Kj‖F²,

where Yj is the solution to the Lyapunov equation AKj Yj + Yj AKjᵀ + Σ = 0. Here the second equality follows from Lemma A.1, the first inequality is due to Yj, Q ⪰ 0, and all the remaining inequalities use Lemma A.4.

But Lemma A.5 implies

(B.2)  λ1(Yj) ≥ λ1(Σ)/(2‖AKj‖).

Therefore, we can complete the proof by estimating the denominator using properties of the matrix norm:

‖A − BKjC‖ ≤ ‖A‖ + ‖BKjC‖ ≤ ‖A‖ + ‖BKjC‖F ≤ ‖A‖ + √(Tr(CᵀKjᵀBᵀBKjC)) ≤ ‖A‖ + ‖Kj‖F‖B‖‖C‖,

where the last inequality uses Lemma A.4 and the definitions ‖C‖ = √(λn(CCᵀ)), ‖B‖ = √(λn(BBᵀ)). Hence

(B.3)  f(Kj) ≥ λ1(Σ)λ1(CCᵀ)λ1(R)‖Kj‖F²/(2‖A − BKjC‖) ≥ λ1(Σ)λ1(CCᵀ)λ1(R)‖Kj‖F²/(2‖A‖ + 2‖Kj‖F‖B‖‖C‖) → +∞

if ‖Kj‖F → +∞.

B.2. Proof of Theorem 3.15.

Proof. Note that in Lemma 3.14 Y depends on K, and X′ depends on K and E. We should obtain a uniform estimate that depends only on the problem parameters and K0. The first term in Lemma 3.14 can be upper bounded as

(B.4)  λn(R)λn(CYCᵀ) ≤ (λn(R)/λ1(Q)) f(K0)‖C‖².

The above estimate follows from Lemma C.2 and the inequalities

λn(CYCᵀ) ≤ Tr(CYCᵀ) ≤ Tr(Y)‖C‖².

In view of ‖Y‖ ≤ Tr(Y), the second term in Lemma 3.14 is bounded as

(B.5)  ‖B‖‖C‖F‖Y‖ ≤ (‖B‖‖C‖F/λ1(Q)) f(K0).

Further, it suffices to bound ‖X′‖F. We first show that X′ ⪯ αX with some constant α. Recall that X′ is the solution to

AKᵀX′ + X′AK + CᵀKᵀREC + CᵀEᵀRKC − (XBEC + (XBEC)ᵀ) = 0.

By utilizing the facts from Appendix A we obtain that X′ ⪯ X̄′ for any α, β > 0, where X̄′ is the solution to

AKᵀX̄′ + X̄′AK + αCᵀKᵀRKC + (1/α)CᵀEᵀREC + (βX² + (1/β)(BEC)ᵀBEC) = 0.

Further, we divide the previous equation by α > 0 and aim to choose the constants α and β so as to ensure that X̄′ ⪯ αX:

(B.6)  AKᵀ(X̄′/α) + (X̄′/α)AK + CᵀKᵀRKC + (1/α²)CᵀEᵀREC + (1/α)(βX² + (1/β)(BEC)ᵀBEC) = 0.

Consider the matrix function of two variables

F(α, β) := CᵀEᵀ((1/α)R + (1/β)BᵀB)EC + βX² − αQ.

To obtain an upper bound on X′ we solve the two-dimensional minimization problem in (α, β) with the relaxed matrix inequality constraint F(α, β) ⪯ F1(α, β) ⪯ 0:

α → min over α, β > 0,
subject to F1(α, β) ⪯ 0,

where

F1(α, β) := ((1/α)λn(R) + (1/β)‖B‖² + β‖X‖² − αλ1(Q)) I.

The solution is the pair

α∗ = (‖X‖‖B‖ + √(‖X‖²‖B‖² + λ1(Q)λn(R)))/λ1(Q),  β∗ = ‖B‖/‖X‖.

Note that it trivially follows from X ⪯ (f(K0)/λ1(Σ))I that X² ⪯ (f(K0)²/λ1(Σ)²)I. Therefore,

X′ ⪯ α∗X ⪯ (α∗f(K0)/λ1(Σ))I.

Let us denote η := α∗f(K0)/λ1(Σ) and note that, as X′ and ηI commute, we obtain the bound on the Frobenius norm

(B.7)  ‖X′‖F ≤ √n η ≤ (√n f(K0)/λ1(Σ)) ( f(K0)‖B‖/(λ1(Σ)λ1(Q)) + √( (f(K0)‖B‖/(λ1(Σ)λ1(Q)))² + λn(R)/λ1(Q) ) ) =: ξ.

The result (3.8) follows directly from Lemma 3.14 if we apply the obtained bounds (B.4), (B.5), and (B.7).

Appendix C. Analysis of SLQR.

C.1. Technical Lemmas.

Lemma C.1. Consider the state feedback control (i.e. C = I). Let K∗ ∈ S be the optimal feedback gain and K0 ∈ S. Then for K ∈ S0

(C.1)  fS(K) − fS(K∗) ≤ ( (‖A‖ + ‖K‖F‖B‖)² λn(Y∗) / (λ1(R)λ1(Σ)²) ) ‖∇fS(K)‖F²,

where Y∗ is the solution to the Lyapunov matrix equation

(C.2)  AK∗ Y∗ + Y∗ AK∗ᵀ + Σ = 0.

Proof. The Lyapunov equations for arbitrary X = X(K) and X∗ = X(K∗) are

(C.3)  AKᵀX + XAK + KᵀRK + Q = 0,

(C.4)  AK∗ᵀX∗ + X∗AK∗ + K∗ᵀRK∗ + Q = 0.

Subtracting (C.4) from (C.3) gives

(C.5)  AKᵀX − AK∗ᵀX∗ + XAK − X∗AK∗ + KᵀRK − K∗ᵀRK∗ = 0,

which is equivalent to

(C.6)  AK∗ᵀ(X − X∗) + (X − X∗)AK∗ + (K − K∗)ᵀM + Mᵀ(K − K∗) − (K − K∗)ᵀR(K − K∗) = 0,

where M = RK − BᵀX. For any α > 0,

(K − K∗)ᵀM + Mᵀ(K − K∗) ⪯ (1/α)(K − K∗)ᵀ(K − K∗) + αMᵀM.

Therefore, picking α = 1/λ1(R) we obtain

(K − K∗)ᵀM + Mᵀ(K − K∗) − (K − K∗)ᵀR(K − K∗) ⪯ αMᵀM + (K − K∗)ᵀ((1/α)I − R)(K − K∗) ⪯ (1/λ1(R))MᵀM.

Let Z be the solution to

AK∗ᵀZ + ZAK∗ + (1/λ1(R))MᵀM = 0.

Then X − X∗ ⪯ Z. Further,

fS(K) − fS(K∗) = Tr((X − X∗)Σ) ≤ Tr(ZΣ) = (1/λ1(R))Tr(MᵀMY∗) ≤ (λn(Y∗)/λ1(R))Tr(MᵀM) ≤ (λn(Y∗)/(λ1(R)λ1(Y)²))Tr(YᵀMᵀMY) = (λn(Y∗)/(4λ1(R)λ1(Y)²))‖∇fS(K)‖F²,

where Y satisfies

AK Y + YAKᵀ + Σ = 0.

It follows from (B.2) that

λ1(Y) ≥ λ1(Σ)/(2‖AK‖) ≥ λ1(Σ)/(2(‖A‖ + ‖B‖‖K‖F)) > 0.

Therefore,

fS(K) − fS(K∗) ≤ ( (‖A‖ + ‖B‖‖K‖F)² λn(Y∗) / (λ1(R)λ1(Σ)²) ) ‖∇fS(K)‖F².

Lemma C.2. For K ∈ S and the solution Y to the Lyapunov matrix equation

AK Y + YAKᵀ + Σ = 0

it holds that

(C.7)  λn(Y) ≤ f(K)/λ1(Q + CᵀKᵀRKC).

Proof. λ1(Q + CᵀKᵀRKC)Tr(Y) ≤ Tr(Y(Q + CᵀKᵀRKC)) = Tr(XΣ) = f(K).

Lemma C.3. For K ∈ S the norm ‖K‖F is bounded whenever f(K) is bounded, and

(C.8)  ‖K‖F ≤ 2‖B‖f(K)/(λ1(Σ)λ1(R)) + ‖A‖/‖B‖.

Proof. Indeed, consider (3.2) as a quadratic inequality with respect to ‖K‖F. Bounding its largest root we obtain an explicit expression:

‖K‖F ≤ ( 2‖B‖f(K) + 2‖B‖f(K)√(1 + 2‖A‖λ1(Σ)λ1(R)/(‖B‖²f(K))) ) / (2λ1(Σ)λ1(R)) ≤ 2‖B‖f(K)/(λ1(Σ)λ1(R)) + ‖A‖/‖B‖.

C.2. Proof of Theorem 3.17.

Proof. In view of Lemma C.1, it suffices to plug (C.7) and (C.8) into (C.1).

Appendix D. Analysis of the Methods.

D.1. Proof of Theorem 4.1.

Proof. The proof is a direct replica of those of Theorems 8, 9 in [36]. The only difference is that in [36] the objective function was defined on the entire space, while here it is defined on S and is L-smooth on S0 ⊂ S. Differentiating f(Kt) as a function of t we get (d/dt)f(Kt) = −‖∇f(Kt)‖², thus f(Kt) is monotone and K(t) remains in S0 for all t ≥ 0. The estimate (4.2) follows from

f(K0) ≥ f(K0) − f(KT) = ∫_0^T ‖∇f(Kt)‖² dt ≥ T min over 0 ≤ t ≤ T of ‖∇f(Kt)‖².

D.2. Proof of Theorem 4.2.

Proof. Denote K = K0 and introduce ϕ(t) = f(Kt), Kt = K − t∇f(K), as in (4.7); we assume ∇f(K) ≠ 0. Then the scalar function ϕ(t) is differentiable for small t (because K is an interior point of S and f(K) is differentiable on S) and ϕ′(0) = −‖∇f(K)‖² < 0. Thus ϕ(t) < ϕ(0) = f(K) and Kt ∈ S0 for small t > 0. The set S0 is bounded; denote T = max{t : Kτ ∈ S0, 0 ≤ τ ≤ t} (i.e. the moment of the first intersection of the ray Kt, t > 0, with the boundary of S0). For 0 ≤ t ≤ T we can exploit Theorem 3.15 to guarantee L-smoothness of f(Kt). This implies L′-smoothness of ϕ(t), 0 ≤ t ≤ T, with L′ = L‖∇f(K)‖². Hence |ϕ(T) − ϕ(0) − ϕ′(0)T| ≤ L′T²/2. But ϕ(T) = ϕ(0) = f(K), and we conclude that T ≥ 2/L. This means that Kt = K − t∇f(K) ∈ S0 for all 0 ≤ t ≤ 2/L. Thus the segment [K0, K1] ⊂ S0. The same follows for all Kj, j > 1. The end of the proof is the same as for Theorems 3, 4 in [36], because we are able to use L-smoothness along the entire descent trajectory.

D.3. Proof of Theorem 4.3. The proof can be easily reconstructed via the comments which led to the formulation of the algorithm.

D.4. Proof of Theorem 6.1.

Proof. Lipschitz continuity of ∇²f implies

|f(x + y) − f(x) − ⟨∇f(x), y⟩ − (1/2)⟨∇²f(x)y, y⟩| ≤ (M/6)‖y‖³  for all x, y ∈ Rⁿ.

Taking x = xj, y = −γj∇f(xj) yields

|f(xj+1) − f(xj) + (1/2)γj‖∇f(xj)‖²| ≤ (Mγj³/6)‖∇f(xj)‖³.

Denoting ϕj = f(xj) − f(x∗) we obtain

ϕj+1 ≤ ϕj − (1/2)γj‖∇f(xj)‖² (1 − (Mγj²/3)‖∇f(xj)‖).

Now it is clear that if M‖∇f(xj)‖ < 3µ², then ϕj+1 < ϕj. It follows from (1/2)‖∇f(xj)‖² ≤ Lϕj that this condition is satisfied at each step if (6.2) holds. Then the sequence {ϕj} is strictly monotonically decreasing, and in view of µ-strong convexity we obtain

ϕj+1 ≤ ϕj (1 − µδ/L).

We proceed to proving global convergence of the damped method. For a µ-strongly convex and L-smooth function it holds for any x, y ∈ Rⁿ that

(D.1)  f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/(2µ))‖y − x‖² measured in the metric of ∇²f(x).

Taking into account that σ ≤ µ/L and applying (D.1) with x = xj and y = xj − σγj∇f(xj), we obtain

(D.2)  f(xj+1) ≤ f(xj) + ⟨∇f(xj), xj+1 − xj⟩ + (1/(2σ))⟨∇²f(xj)(xj+1 − xj), xj+1 − xj⟩
              ≤ f(xj) − σγj‖∇f(xj)‖² + (σγj²/2)⟨∇²f(xj)∇f(xj), ∇f(xj)⟩ = f(xj) − (σγj/2)‖∇f(xj)‖².

Further, L-smoothness and µ-strong convexity ensure that

f(xj+1) − f(x∗) ≤ (f(xj) − f(x∗)) (1 − µσ/L).

Acknowledgement. The authors are grateful to two anonymous reviewers for their helpful comments and to Bin Hu for detecting the gap in the proof of Theorem 4.2.

REFERENCES

[1] J. Abadie and J. Carpentier, Generalization of the Wolfe reduced gradient method to the case of nonlinear constraints, in Optimization, R. Fletcher, ed., Academic Press, London, 1969, pp. 37–47.
[2] J. Ackermann, Parameter space design of robust control systems, IEEE Transactions on Automatic Control, 25 (1980), pp. 1058–1072, https://doi.org/10.1109/TAC.1980.1102505.
[3] H. M. Amman and H. Neudecker, Numerical solutions of the algebraic matrix Riccati equation, Journal of Economic Dynamics and Control, 21 (1997), pp. 363–369, https://doi.org/10.1016/S0165-1889(96)00936-0.
[4] B. D. Anderson and J. B. Moore, Linear Optimal Control, Prentice Hall, Englewood Cliffs, NJ, 1971.
[5] M. Athans and P. Falb, Optimal Control, McGraw-Hill, 1st ed., 1966.
[6] D. V. Balandin and M. M. Kogan, Synthesis of linear quadratic control laws on basis of linear matrix inequalities, Automation and Remote Control, 68 (2007), pp. 371–385, https://doi.org/10.1134/S0005117907030010.
[7] R. Bellman, Notes on matrix theory. X. A problem in control, Quarterly of Applied Mathematics, 14 (1957), pp. 417–419, https://doi.org/10.1090/qam/82592.
[8] P. Benner, J.-R. Li, and T. Penzl, Numerical solution of large-scale Lyapunov equations, Riccati equations, and linear-quadratic optimal control problems, Numerical Linear Algebra with Applications, 15 (2008), pp. 755–777, https://doi.org/10.1002/nla.622.
[9] V. Blondel and J. N. Tsitsiklis, NP-hardness of some linear control design problems, SIAM Journal on Control and Optimization, 35 (1997), pp. 2118–2127, https://doi.org/10.1137/S0363012994272630.
[10] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory, SIAM Studies in Applied Mathematics, SIAM, Philadelphia, 1994.
[11] J. Bu, A. Mesbahi, M. Fazel, and M. Mesbahi, LQR through the lens of first order methods: Discrete-time case, 2019, https://arxiv.org/abs/1907.08921.
[12] J. Bu and M. Mesbahi, Global convergence of policy gradient algorithms for indefinite least squares stationary optimal control, IEEE Control Systems Letters, 4 (2020), pp. 638–643.
[13] J. Eilbrecht, M. Jilg, and O. Stursberg, Distributed H2-optimized output feedback controller design using the ADMM, IFAC-PapersOnLine, 50 (2017), pp. 10389–10394, https://doi.org/10.1016/j.ifacol.2017.08.1706.
[14] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, Global convergence of policy gradient methods for the linear quadratic regulator, in Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 1467–1476, http://proceedings.mlr.press/v80/fazel18a.html.
[15] H. Feng and J. Lavaei, On the exponential number of connected components for the feasible set of optimal decentralized control problems, in 2019 American Control Conference (ACC), IEEE, 2019, pp. 1430–1437, https://doi.org/10.23919/ACC.2019.8814952.
[16] E. N. Gryazina, B. T. Polyak, and A. A. Tremba, D-decomposition technique state-of-the-art, Automation and Remote Control, 69 (2008), pp. 1991–2026, https://doi.org/10.1134/S0005117908120011.
[17] T. Iwasaki, R. Skelton, and J. Geromel, Linear quadratic suboptimal control with static output feedback, Systems & Control Letters, 23 (1994), pp. 421–430, https://doi.org/10.1016/0167-6911(94)90096-5.
[18] J. P. Jansch-Porto, B. Hu, and G. E. Dullerud, Convergence guarantees of policy optimization methods for Markovian jump linear systems, 2020, https://arxiv.org/abs/2002.04090.
[19] K. Zhang, B. Hu, and T. Basar, Policy optimization for H2 linear control with H∞ robustness guarantee: Implicit regularization and global convergence, 2019, https://arxiv.org/abs/1910.09496.
[20] R. Kalman, Contributions to the theory of optimal control, Boletin de la Sociedad Matematica Mexicana, 5 (1960), pp. 102–119.
[21] R. Kalman, On the general theory of control systems, Proc. First Internat. Congress Automat. Contr., (1960), pp. 481–491.
[22] H. Karimi, J. Nutini, and M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition, in Machine Learning and Knowledge Discovery in Databases, Springer, Cham, 2016, pp. 795–811.
[23] M. V. Khlebnikov, P. S. Shcherbakov, and V. N. Chestnov, Linear-quadratic regulator. I. A new solution, Automation and Remote Control, 76 (2015), pp. 2143–2155, https://doi.org/10.1134/S0005117915120048.
[24] H. Kwakernaak and R. Sivan, Linear Optimal Control, Wiley, 1st ed., 1972, https://doi.org/10.1115/1.3426828.
[25] C.-H. Lee, New results for the bounds of the solution for the continuous Riccati and Lyapunov equations, IEEE Transactions on Automatic Control, 42 (1997), pp. 118–123, https://doi.org/10.1109/9.553695.
[26] W. Levine and M. Athans, On the determination of the optimal constant output feedback gains for linear multivariable systems, IEEE Transactions on Automatic Control, 15 (1970), pp. 44–48, https://doi.org/10.1109/TAC.1970.1099363.
[27] T. Lezanski, Über das Minimumproblem für Funktionale in Banachschen Räumen, Mathematische Annalen, 152 (1963), pp. 271–274.
[28] S. Lojasiewicz, Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles, Paris: CNRS, (1963), pp. 87–89.
[29] L. Furieri, Y. Zheng, and M. Kamgarpour, Learning the globally optimal distributed LQ regulator, 2019, https://arxiv.org/abs/1912.08774.
[30] P. Makila and H. Toivonen, Computational methods for parametric LQ problems – a survey, IEEE Transactions on Automatic Control, 32 (1987), pp. 658–671, https://doi.org/10.1109/TAC.1987.1104686.
[31] D. Moerder and A. Calise, Convergence of a numerical algorithm for calculating optimal output feedback gains, IEEE Transactions on Automatic Control, 30 (1985), pp. 900–903, https://doi.org/10.1109/TAC.1985.1104073.
[32] H. Mohammadi, A. Zare, M. Soltanolkotabi, and M. R. Jovanovic, Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem, 2019, https://arxiv.org/abs/1912.11899.
[33] H. Mohammadi, A. Zare, M. Soltanolkotabi, and M. R. Jovanovic, Global exponential convergence of gradient methods over the nonconvex landscape of the linear quadratic regulator, in 2019 IEEE 58th Conference on Decision and Control (CDC), 2019, pp. 7474–7479, https://doi.org/10.1109/CDC40024.2019.9029985.
[34] E.-S. M. Mostafa, A conjugate gradient method for discrete-time output feedback control design, Journal of Computational Mathematics, 30 (2012), pp. 279–297, https://doi.org/10.4208/jcm.1109-m3364.
[35] K. Martensson and A. Rantzer, Gradient methods for iterative distributed control synthesis, in Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference, 2009, pp. 549–554, https://doi.org/10.1109/CDC.2009.5400233.
[36] B. Polyak, Gradient methods for the minimisation of functionals, USSR Computational Mathematics and Mathematical Physics, 3 (1963), pp. 864–878, https://doi.org/10.1016/0041-5553(63)90382-3.
[37] B. T. Polyak, Introduction to Optimization, Optimization Software, New York, 1987.
[38] T. Rautert and E. W. Sachs, Computational design of optimal output feedback controllers, SIAM Journal on Optimization, 7 (1997), pp. 837–852, https://doi.org/10.1137/S1052623495290441.
[39] V. Syrmos, C. Abdallah, and P. Dorato, Static output feedback: a survey, in Proceedings of the 33rd IEEE Conference on Decision and Control, vol. 1, IEEE, 1994, pp. 837–842, https://doi.org/10.1109/CDC.1994.410963.
[40] H. T. Toivonen, A globally convergent algorithm for the optimal constant output feedback problem, International Journal of Control, 41 (1985), pp. 1589–1599, https://doi.org/10.1080/0020718508961217.
[41] P. Wolfe, Methods of nonlinear programming, in Nonlinear Programming, J. Abadie, ed., Wiley, New York, 1967, ch. 6, pp. 97–131.
