Markov jump linear systems: Optimal Control

Pantelis Sopasakis

IMT Institute for Advanced Studies Lucca

February 5, 2016

Abbreviations

1. MJLS: Markov Jump Linear Systems

2. FHOC: Finite Horizon Optimal Control

3. IHOC: Infinite Horizon Optimal Control

4. CARE: Coupled Algebraic Riccati Equations

Outline

1. LQR (deterministic case) – A quick revision

2. FHOC for MJLS

3. IHOC for MJLS (CARE)

I. Dynamic programming

Finite horizon optimal control

We have a (deterministic) LTI system

$$x(k+1) = Ax(k) + Bu(k),$$

with $x(0) = x_0$. For a given sequence of inputs of length $N$, that is, $\pi_N = (u(0), u(1), \ldots, u(N-1))$, we define the cost function

$$J_N(\pi_N; x_0) = \sum_{k=0}^{N-1} \ell(x(k), u(k)) + \ell_N(x(N)).$$

Assume $\ell(x, u) = x'Qx + u'Ru$ and $\ell_N(x) = x'P_N x$ for some $Q \in \mathbb{S}^n_{+}$, $P_N \in \mathbb{S}^n_{++}$, $R \in \mathbb{S}^m_{++}$.
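As a quick sanity check of the definition, the cost $J_N(\pi_N; x_0)$ of a given input sequence can be evaluated by simulating the dynamics forward. The following is a minimal Python/NumPy sketch; the function name `fhoc_cost` and its argument order are illustrative choices, not part of the original notes.

```python
import numpy as np


def fhoc_cost(A, B, Q, R, P_N, x0, inputs):
    """Cost J_N(pi_N; x0) of the input sequence `inputs` = (u(0), ..., u(N-1))
    for the LTI system x(k+1) = A x(k) + B u(k)."""
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for u in inputs:
        u = np.asarray(u, dtype=float)
        cost += x @ Q @ x + u @ R @ u      # stage cost l(x(k), u(k))
        x = A @ x + B @ u                  # propagate the dynamics
    return cost + x @ P_N @ x              # terminal cost l_N(x(N))
```

For instance, with scalar data $A = B = Q = R = P_N = 1$, $x_0 = 1$, $N = 1$ and $u(0) = -0.5$, the stage cost is $1.25$, $x(1) = 0.5$ and the terminal cost is $0.25$, so $J_1 = 1.5$.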


We need to determine a finite sequence $\pi_N$ that minimises $J_N(\pi_N; x_0)$:

$$J^\star_N(x_0) = \min_{\pi_N} J_N(\pi_N; x_0)$$

subject to the system dynamics and $x(0) = x_0$. DP recursion¹:

$$V_N(x(N)) = x(N)' P_N x(N),$$

$$V_k(x(k)) = \min_{u(k)} \; \ell(x(k), u(k)) + V_{k+1}(x(k+1)),$$

for $k = N-1, \ldots, 0$.

¹ See, for instance, F. Borrelli, Constrained Optimal Control of Linear and Hybrid Systems, Springer, 2003.
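Without constraints, $J_N$ is a convex quadratic function of the stacked inputs, so the FHOC problem can also be solved in one shot by writing $x(k) = A^k x_0 + \sum_{j<k} A^{k-1-j} B u(j)$ and setting the gradient to zero. The sketch below (hypothetical name `fhoc_batch`, NumPy assumed) does exactly that; it is not the DP approach of these notes, but it can be handy for cross-checking the DP solution on small examples.

```python
import numpy as np


def fhoc_batch(A, B, Q, R, P_N, x0, N):
    """Minimise J_N(pi_N; x0) directly over the stacked inputs (no DP).

    Returns the optimal inputs as an (N, m) array and the optimal cost.
    """
    n, m = B.shape
    # Stacked states X = (x(0), ..., x(N)) = Phi x0 + Gamma U
    Phi = np.vstack([np.linalg.matrix_power(A, k) for k in range(N + 1)])
    Gamma = np.zeros(((N + 1) * n, N * m))
    for k in range(1, N + 1):
        for j in range(k):
            Gamma[k * n:(k + 1) * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, k - 1 - j) @ B)
    # Block-diagonal weights: Q on x(0..N-1), P_N on x(N), R on every u(k)
    Qbar = np.kron(np.eye(N + 1), Q)
    Qbar[N * n:, N * n:] = P_N
    Rbar = np.kron(np.eye(N), R)
    x0 = np.asarray(x0, dtype=float)
    # Stationarity condition of the strictly convex quadratic J_N(U)
    H = Gamma.T @ Qbar @ Gamma + Rbar
    g = Gamma.T @ Qbar @ Phi @ x0
    U = np.linalg.solve(H, -g)
    X = Phi @ x0 + Gamma @ U
    cost = X @ Qbar @ X + U @ Rbar @ U
    return U.reshape(N, m), cost
```

For $A = B = Q = R = P_N = 1$ and $x_0 = 1$ this returns $u^\star(0) = -0.5$ and $J^\star_1 = 1.5$ for $N = 1$, and $J^\star_2 = 1.6$ for $N = 2$.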

Why DP?

DP facts:

- We may decompose a complex optimisation problem into simpler subproblems.

- Here, we solve for one $u(k)$ at a time.

- DP uses Bellman's principle of optimality.

- It can be applied in the same way to stochastic optimal control problems.

- It is a powerful tool to study the mean-square stability (MSS) of MJLS and Markovian switching systems (next class).


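Let $\pi^\star(x_0)$ be the respective minimiser, with

$$\pi^\star(x_0) = (u^\star(0), u^\star(1), \ldots, u^\star(N-1)).$$

Using DP we derive

$$V_k(x) = x' P_k x, \qquad u^\star(k) = F(P_{k+1})\, x(k),$$

where $P_k$ is determined by the backward recursion, starting from the terminal weight $P_N$,

$$P_k = A' P_{k+1} A + Q + A' P_{k+1} B F(P_{k+1}), \qquad k = N-1, \ldots, 0,$$

and $F(P) = -(B'PB + R)^{-1} B'PA$.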
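The backward recursion above translates directly into code. Below is a minimal NumPy sketch (function names are illustrative, not from the notes). As a self-check, simulating the closed loop $u^\star(k) = F(P_{k+1})x(k)$ should give a total cost equal to $x_0' P_0 x_0$, since $V_0(x_0) = x_0'P_0x_0$.

```python
import numpy as np


def riccati_gain(P, A, B, R):
    """Gain F(P) = -(B'PB + R)^{-1} B'PA, computed with a linear solve."""
    return -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)


def lqr_finite_horizon(A, B, Q, R, P_N, N):
    """Backward recursion P_k = A'P_{k+1}A + Q + A'P_{k+1}B F(P_{k+1}).

    Returns the list P[0..N] and the gains K[k] = F(P[k+1]), so that
    the optimal input is u*(k) = K[k] @ x(k).
    """
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = np.asarray(P_N, dtype=float)
    for k in range(N - 1, -1, -1):
        K[k] = riccati_gain(P[k + 1], A, B, R)
        P[k] = A.T @ P[k + 1] @ A + Q + A.T @ P[k + 1] @ B @ K[k]
    return P, K
```

Using a linear solve instead of forming $(B'PB+R)^{-1}$ explicitly is the usual, numerically safer choice.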

Infinite horizon optimal control

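What happens as $N \to \infty$? Let us define

$$J_\infty(\pi; x_0) = \sum_{k=0}^{\infty} \ell(x(k), u(k)),$$

where $\pi$ is a sequence of inputs $\{u(k)\}_{k \in \mathbb{N}}$. For the series to converge it is necessary that $\ell(x(k), u(k)) \to 0$ as $k \to \infty$; since $R \succ 0$, this forces $\|u(k)\|^2 \to 0$ and, if additionally $Q \succ 0$, also $\|x(k)\|^2 \to 0$.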

Infinite horizon optimal control

We can show that, under certain conditions², the IHOC problem is solvable and

$$J^\star_\infty(x) = x' P_\infty x, \qquad u^\star(k) = F(P_\infty)\, x(k),$$

where $P_\infty$ is a fixed point of the DP recursion of the FHOC problem (the Algebraic Riccati Equation), that is,

$$P_\infty = A' P_\infty A + Q - A' P_\infty B (B' P_\infty B + R)^{-1} B' P_\infty A.$$

² Provided that $(A, B)$ is stabilisable and $(Q^{1/2}, A)$ is detectable. Then the matrix $A + BF(P_\infty)$ is stable. Proof: see D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, 2005, Prop. 4.4.1.
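One way to approximate $P_\infty$ numerically, in line with its interpretation as a fixed point of the DP recursion, is to iterate the finite-horizon Riccati map from $P = 0$ until it stops changing (value iteration). The sketch below assumes the stabilisability and detectability conditions stated above hold, so that the iteration converges; the function name is hypothetical, and in practice a dedicated DARE solver such as `scipy.linalg.solve_discrete_are` is usually preferable.

```python
import numpy as np


def dare_value_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Approximate P_inf by iterating the FHOC Riccati map from P = 0.

    Returns (P_inf, F(P_inf)). Convergence is assumed (stabilisable (A, B),
    detectable (Q^{1/2}, A)); an error is raised if it is not reached.
    """
    P = np.zeros_like(Q, dtype=float)
    for _ in range(max_iter):
        F = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
        P_next = A.T @ P @ A + Q + A.T @ P @ B @ F
        if np.max(np.abs(P_next - P)) < tol:
            F_next = -np.linalg.solve(B.T @ P_next @ B + R, B.T @ P_next @ A)
            return P_next, F_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")
```

For scalar $A = B = Q = R = 1$, the ARE reduces to $P^2 - P - 1 = 0$, so the iteration should approach $P_\infty = (1 + \sqrt{5})/2$, with closed-loop pole $A + BF(P_\infty) = 1/(1 + P_\infty) \approx 0.382$.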

End of first section

- Revision of FHOC and DP

- We solved the LQR problem

II. FHOC for MJLS

FHOC for MJLS

Consider an MJLS

$$x(k+1) = A_{\theta(k)} x(k) + B_{\theta(k)} u(k) + M_{\theta(k)} v(k),$$

with $x(0) = x_0$, where $v(k)$ is an additive noise term. Let $z(k) = C_{\theta(k)} x(k) + D_{\theta(k)} u(k)$ be the quantity that will be penalised. We define the following cost functional:

$$J(\theta_0, x_0, \pi_N) := \sum_{k=0}^{N-1} \mathbb{E}\big[\|z(k)\|^2\big] + \mathbb{E}\big[x(N)' V_{\theta(N)} x(N)\big],$$

where $\pi_N = (u(0), \ldots, u(N-1))$ is a policy with

$$u(k) = \mu_k(x(k), \theta(k)).$$
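Because $J(\theta_0, x_0, \pi_N)$ is an expectation over both the Markov chain and the noise, it can be estimated by Monte Carlo simulation for any given policy. The sketch below restricts to mode-dependent linear laws $u(k) = K_{k,\theta(k)} x(k)$ (`gains[k][i]` in code) and, purely as an illustrative assumption not made in the notes, draws $v(k)$ as i.i.d. standard Gaussian vectors independent of $\theta$. The function name and the gain notation $K$ are hypothetical; `Pmat[i, j]` holds the transition probability from mode $i$ to mode $j$, with modes indexed from 0.

```python
import numpy as np


def mjls_cost_mc(A, B, M, C, D, V, Pmat, theta0, x0, gains, n_runs=1000, seed=0):
    """Monte Carlo estimate of J(theta0, x0, pi_N) for u(k) = gains[k][theta(k)] x(k).

    A, B, M, C, D, V are lists of matrices indexed by mode. ASSUMPTION (for
    illustration only): v(k) ~ N(0, I) i.i.d., independent of the chain.
    """
    rng = np.random.default_rng(seed)
    N = len(gains)
    n_modes = len(A)
    total = 0.0
    for _ in range(n_runs):
        theta, x = theta0, np.asarray(x0, dtype=float)
        for k in range(N):
            u = gains[k][theta] @ x                      # u(k) = mu_k(x(k), theta(k))
            z = C[theta] @ x + D[theta] @ u              # penalised output z(k)
            total += z @ z
            v = rng.standard_normal(M[theta].shape[1])   # assumed noise model
            x = A[theta] @ x + B[theta] @ u + M[theta] @ v
            theta = rng.choice(n_modes, p=Pmat[theta])   # Markov transition
        total += x @ V[theta] @ x                        # terminal cost x(N)' V_theta(N) x(N)
    return total / n_runs
```

With a single mode and $M = 0$ the estimate is exact and coincides with the deterministic cost; otherwise its accuracy improves with `n_runs`.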

FHOC assumptions

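Let $\mathcal{G}_k$ be the σ-algebra generated by $\{x(t), \theta(t);\ t = 0, \ldots, k\}$.

Assumptions on $v$:

1. $v(k)$ are random variables with $\mathbb{E}\big[v(k) v(k)' \mathbb{1}_{\{\theta(k) = i\}}\big] = \Xi_i(k)$.

2. For all measurable $f$, $g$, the random variables $f(v(k))$ and $g(\theta(k))$ are independent with respect to $\mathcal{G}_k$.

3. $\mathbb{E}\big[v(0) x(0)' \mathbb{1}_{\{\theta(0) = i\}}\big] = 0$.

Assumptions on $z(k)$:

1. $C_i(k)' D_i(k) = 0$, i.e., there are no cross penalties of the form $x(k)' S_{\theta(k)} u(k)$.

2. $D_i(k)' D_i(k) \succ 0$.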

Control laws and policies for MJLS

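A measurable function $\mu: \mathbb{R}^n \times \mathcal{N} \to \mathbb{R}^m$, where $\mathcal{N}$ is the set of modes of $\theta$, is called a control law.

A (finite or infinite) sequence of control laws

$$\pi = \{\mu_0, \mu_1, \ldots\},$$

where each $\mu_k$ is $\mathcal{G}_k$-measurable, is called a control policy.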

FHOC – Dynamic programming recursion

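To perform DP we introduce the cost functional

$$J_\kappa(\theta(\kappa), x(\kappa), u_\kappa) := \sum_{k=\kappa}^{N-1} \mathbb{E}\big[\|z(k)\|^2 \mid \mathcal{G}_\kappa\big] + \mathbb{E}\big[x(N)' V_{\theta(N)} x(N) \mid \mathcal{G}_\kappa\big],$$

for $\kappa \in \{0, \ldots, N-1\}$, where $u_\kappa = (u(\kappa), \ldots, u(N-1))$ is such that each $u(k)$ is $\mathcal{G}_k$-measurable. The optimal value of $J_\kappa(\theta(\kappa), x(\kappa), u_\kappa)$ is then given by

$$J^\star_\kappa(i, x) = x' X_i(\kappa) x + \alpha(\kappa),$$

where $X_i$ is given by a Riccati-like equation.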

FHOC – Dynamic programming recursion

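We have $J^\star_\kappa(i, x) = x' X_i(\kappa) x + \alpha(\kappa)$, where

$$X_i(N) = V_i,$$

$$X_i(k) = A_i' \mathcal{E}_i(X(k+1)) A_i + A_i' \mathcal{E}_i(X(k+1)) B_i F_i(X(k+1)) + C_i' C_i,$$

where $\mathcal{E}_i(X) = \sum_{j \in \mathcal{N}} p_{ij} X_j$ (with $\mathcal{N}$ the set of modes and $p_{ij}$ the transition probabilities of $\theta$), $\mathcal{R}_i(X) := D_i' D_i + B_i' \mathcal{E}_i(X) B_i$ and

$$F_i(X) := -\mathcal{R}_i(X)^{-1} B_i' \mathcal{E}_i(X) A_i.$$

The respective optimisers are given by

$$u^\star(k) = F_{\theta(k)}(X(k+1))\, x(k).$$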
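The coupled recursion can be implemented mode by mode, going backwards in time. The following NumPy sketch (hypothetical function name `mjls_fhoc`; modes indexed $0, 1, \ldots$ in code, and `Pmat[i, j]` $= p_{ij}$) returns $X_i(k)$ and the gains $F_i(X(k+1))$. The recursion for $X_i$ does not involve the noise statistics, so none are needed here. With a single mode and $p_{11} = 1$ it reduces to the deterministic Riccati recursion of Section I, which is a convenient check.

```python
import numpy as np


def mjls_fhoc(A, B, C, D, V, Pmat, N):
    """Backward coupled Riccati recursion for the MJLS FHOC problem.

    A, B, C, D, V are lists indexed by mode i; Pmat[i, j] = p_ij.
    Returns X[k][i] = X_i(k) for k = 0..N and F[k][i] = F_i(X(k+1)),
    so that u*(k) = F[k][theta(k)] @ x(k).
    """
    n_modes = len(A)
    X = [None] * (N + 1)
    F = [None] * N
    X[N] = [np.asarray(Vi, dtype=float) for Vi in V]
    for k in range(N - 1, -1, -1):
        Xn = X[k + 1]
        # E_i(X(k+1)) = sum_j p_ij X_j(k+1)
        E = [sum(Pmat[i, j] * Xn[j] for j in range(n_modes)) for i in range(n_modes)]
        F[k], X[k] = [], []
        for i in range(n_modes):
            Ri = D[i].T @ D[i] + B[i].T @ E[i] @ B[i]          # R_i(X(k+1))
            Fi = -np.linalg.solve(Ri, B[i].T @ E[i] @ A[i])    # F_i(X(k+1))
            F[k].append(Fi)
            X[k].append(A[i].T @ E[i] @ A[i] + A[i].T @ E[i] @ B[i] @ Fi
                        + C[i].T @ C[i])
    return X, F
```

If all modes share the same data, every $\mathcal{E}_i$ equals the common $X_j$, so the result should not depend on the transition matrix.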

End of second section

- Formulation of FHOC for MJLS, also considering an additive noise term

- Control policies and control laws

- Solution of FHOC: mode-dependent linear control laws

$$u^\star(k) = \mu_k(x(k), \theta(k)) = F_{\theta(k)}(X(k+1))\, x(k).$$

III. IHOC for MJLS and MSS

IHOC for MJLS

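Consider an MJLS without additive noise

$$x(k+1) = A_{\theta(k)} x(k) + B_{\theta(k)} u(k),$$

with $x(0) = x_0$, and let $z(k) = C_{\theta(k)} x(k) + D_{\theta(k)} u(k)$ be the quantity that will be penalised. We are now looking for sequences $\pi = \{u(k)\}_{k \in \mathbb{N}}$ in

$$\mathcal{U} = \Big\{ \pi \;\Big|\; u(k) \text{ is } \mathcal{G}_k\text{-measurable for all } k \in \mathbb{N},\ \lim_{k \to \infty} \mathbb{E}\big[\|x(k)\|^2\big] = 0 \Big\}.$$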

IHOC for MJLS

With $\pi \in \mathcal{U}$, the following is a well-defined infinite horizon cost function:

$$J(\theta_0, x_0, \pi) := \sum_{k=0}^{\infty} \mathbb{E}\big[\|z(k)\|^2\big],$$

and the IHOC problem amounts to determining

$$J^\star(\theta_0, x_0) := \inf_{\pi \in \mathcal{U}} J(\theta_0, x_0, \pi).$$

We define $\pi^\star$ to be the respective optimiser, with elements

$$u^\star(k) = \psi_k(\theta(k), x(k)).$$

Objectives

1. Under what conditions does the IHOC problem have a solution?

2. How can this solution be determined?

3. Can we derive an MS-stabilising controller by solving the IHOC problem?

Control CARE

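Assume that there is $X \in \mathbb{H}^{n+}$ satisfying, for every mode $i$, the control CARE:

$$X_i = A_i' \mathcal{E}_i(X) A_i - A_i' \mathcal{E}_i(X) B_i \big(D_i' D_i + B_i' \mathcal{E}_i(X) B_i\big)^{-1} B_i' \mathcal{E}_i(X) A_i + C_i' C_i,$$

and let

$$F_i(X) := -\big(D_i' D_i + B_i' \mathcal{E}_i(X) B_i\big)^{-1} B_i' \mathcal{E}_i(X) A_i.$$

The solution of the IHOC problem is given by

$$u^\star(k) = F_{\theta(k)}(X)\, x(k),$$

and the value function is

$$J^\star(\theta_0, x_0) = \mathbb{E}\big[x_0' X_{\theta_0} x_0\big].$$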
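Mirroring the deterministic case, one simple way to approximate a solution of the control CARE is to run the coupled Riccati map repeatedly from $X = 0$ until it reaches a fixed point. This is only a sketch: convergence is assumed rather than proved here (we expect it under the solvability conditions of the next slides), and the code stops with an error otherwise. In the code, `Q[i]` stands for $C_i'C_i$ and `R[i]` for $D_i'D_i$; all names are hypothetical.

```python
import numpy as np


def coupled_riccati_step(X, A, B, Q, R, Pmat):
    """One application of the coupled Riccati map; returns (new X, gains F_i(X))."""
    n_modes = len(A)
    E = [sum(Pmat[i, j] * X[j] for j in range(n_modes)) for i in range(n_modes)]
    F = [-np.linalg.solve(R[i] + B[i].T @ E[i] @ B[i], B[i].T @ E[i] @ A[i])
         for i in range(n_modes)]
    X_new = [A[i].T @ E[i] @ A[i] + A[i].T @ E[i] @ B[i] @ F[i] + Q[i]
             for i in range(n_modes)]
    return X_new, F


def solve_control_care(A, B, Q, R, Pmat, tol=1e-10, max_iter=100_000):
    """Approximate a CARE solution by fixed-point iteration from X = 0.

    Returns (X, F) with X[i] ~ X_i and F[i] = F_i(X).
    """
    X = [np.zeros_like(Qi, dtype=float) for Qi in Q]
    for _ in range(max_iter):
        X_new, _ = coupled_riccati_step(X, A, B, Q, R, Pmat)
        gap = max(np.max(np.abs(Xn - Xo)) for Xn, Xo in zip(X_new, X))
        X = X_new
        if gap < tol:
            _, F = coupled_riccati_step(X, A, B, Q, R, Pmat)
            return X, F
    raise RuntimeError("coupled Riccati iteration did not converge")
```

With a single mode the iteration reduces to the deterministic Riccati value iteration; for scalar unit data it should approach $(1 + \sqrt{5})/2$.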

Control CARE ⇒ MSS

The control CARE, when solvable, yields an MS-stabilising control law, i.e., the closed-loop system

$$x(k+1) = \big(A_{\theta(k)} + B_{\theta(k)} F_{\theta(k)}(X)\big) x(k)$$

is mean square stable.
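Mean square stability of such a closed loop can be checked numerically. For $x(k+1) = \bar A_{\theta(k)} x(k)$, the second moments $Q_i(k) = \mathbb{E}[x(k)x(k)'\mathbb{1}_{\{\theta(k)=i\}}]$ evolve as $Q_j(k+1) = \sum_i p_{ij} \bar A_i Q_i(k) \bar A_i'$, and MSS is equivalent to this linear operator having spectral radius less than one (see the book of Costa et al., 2005). The sketch below (hypothetical function name; $\bar A_i$ would be $A_i + B_i F_i(X)$ for the closed loop above) represents the operator as a matrix acting on the stacked $\mathrm{vec}(Q_i)$, using $\mathrm{vec}(\bar A Q \bar A') = (\bar A \otimes \bar A)\,\mathrm{vec}(Q)$.

```python
import numpy as np


def is_mean_square_stable(A_cl, Pmat):
    """MSS test for x(k+1) = A_cl[theta(k)] x(k) with transition matrix Pmat.

    Block (j, i) of the operator matrix is p_ij * kron(A_i, A_i); the system is
    MSS iff its spectral radius is below one. Returns (is_mss, spectral_radius).
    """
    n_modes = len(A_cl)
    n = A_cl[0].shape[0]
    blockdiag = np.zeros((n_modes * n * n, n_modes * n * n))
    for i, Ai in enumerate(A_cl):
        s = slice(i * n * n, (i + 1) * n * n)
        blockdiag[s, s] = np.kron(Ai, Ai)
    op = np.kron(np.asarray(Pmat).T, np.eye(n * n)) @ blockdiag
    rho = np.max(np.abs(np.linalg.eigvals(op)))
    return rho < 1, rho
```

A mode may be unstable on its own while the MJLS is still MSS: with scalar modes $1.2$ and $0.5$ and i.i.d. switching ($p_{ij} = 0.5$), the spectral radius is $0.5(1.44 + 0.25) = 0.845 < 1$.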

Solvability conditions

The following conditions entail the solvability of the control CARE:

1. $(A, B)$, with $A \in \mathbb{H}^n$ and $B \in \mathbb{H}^{n,m}$, is stabilisable;

2. $(C, A)$, with $C \in \mathbb{H}^{n,n_z}$, is detectable.

Proof: see the book of Costa et al., 2005, Corollary A.16.

End of third section

- We formulated the infinite horizon optimal control problem

- The solution of IHOC produces an MS-stabilising control law

- IHOC is solved via a CARE, which can be formulated as an LMI

- Solvability conditions: $(A, B)$ is stabilisable, $(C, A)$ is detectable

References

1. For an introduction to DP: D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2nd ed., 2000.

2. Chapter 4 of: O. L. V. Costa, M. D. Fragoso and R. P. Marques, Discrete-Time Markov Jump Linear Systems, Springer, 2005.

3. Chapter 6 of: M. H. A. Davis and R. B. Vinter, Stochastic Modelling and Control, Chapman and Hall, New York, 1985.

4. M. D. Fragoso, Discrete-time jump LQG problem, Int. J. Systems Sci., 20(12), pp. 2539–2545, 1989.
