Markov jump linear systems: Optimal Control
Pantelis Sopasakis
IMT Institute for Advanced Studies Lucca
February 5, 2016
Abbreviations
1. MJLS: Markov Jump Linear Systems
2. FHOC: Finite Horizon Optimal Control
3. IHOC: Infinite Horizon Optimal Control
4. CARE: Coupled Algebraic Riccati Equations
Outline
1. LQR (deterministic case) – A quick revision
2. FHOC for MJLS
3. IHOC for MJLS (CARE)
I. Dynamic programming
Finite horizon optimal control
We have a (deterministic) LTI system
$$x(k+1) = Ax(k) + Bu(k),$$
with $x(0) = x_0$. For a given sequence of input values of length $N$, that is, $\pi_N = (u(0), u(1), \ldots, u(N-1))$, we define the cost function
$$J_N(\pi_N; x_0) = \sum_{k=0}^{N-1} \ell(x(k), u(k)) + \ell_N(x(N)).$$
Assume
$$\ell(x, u) = x'Qx + u'Ru, \quad \ell_N(x) = x'P_N x,$$
for some $Q \in \mathbb{S}^n_{+}$, $P_N \in \mathbb{S}^n_{++}$, $R \in \mathbb{S}^m_{++}$.
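To make the definition of the cost concrete, here is a minimal sketch that evaluates $J_N(\pi_N; x_0)$ for a given input sequence. It assumes a scalar system (so $A$, $B$, $Q$, $R$, $P_N$ are plain numbers), and the function name and numerical values are illustrative choices, not part of the lecture.

```python
def finite_horizon_cost(a, b, q, r, p_N, x0, inputs):
    """Evaluate J_N for the scalar system x(k+1) = a*x(k) + b*u(k).

    `inputs` plays the role of pi_N = (u(0), ..., u(N-1)).
    """
    x = x0
    cost = 0.0
    for u in inputs:
        cost += q * x * x + r * u * u   # stage cost l(x(k), u(k))
        x = a * x + b * u               # system dynamics
    return cost + p_N * x * x           # terminal cost l_N(x(N))


# Two candidate input sequences of length N = 2 (hypothetical values).
J_zero = finite_horizon_cost(a=1.0, b=1.0, q=1.0, r=1.0, p_N=1.0,
                             x0=1.0, inputs=[0.0, 0.0])
J_deadbeat = finite_horizon_cost(a=1.0, b=1.0, q=1.0, r=1.0, p_N=1.0,
                                 x0=1.0, inputs=[-1.0, 0.0])
print(J_zero, J_deadbeat)
```

Different input sequences give different costs for the same $x_0$; the optimal control problem asks for the sequence with the smallest one.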
Finite horizon optimal control
We need to determine a finite sequence $\pi_N$ that minimises $J_N(\pi_N; x_0)$:
$$J_N^\star(x_0) = \min_{\pi_N} J_N(\pi_N; x_0)$$
subject to the system dynamics and $x(0) = x_0$. DP recursion¹:
$$V_N(x(N)) = x(N)'P_N x(N),$$
$$V_k(x(k)) = \min_{u(k)} \left\{ \ell(x(k), u(k)) + V_{k+1}(x(k+1)) \right\},$$
for $k = N-1, \ldots, 0$.
¹ See for instance: F. Borrelli, Constrained Optimal Control of Linear and Hybrid Systems, Springer, 2003.
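As a sketch of how one step of this recursion can be carried out with the quadratic costs assumed earlier, consider the last stage $k = N-1$. The derivation below is the standard minimisation of a quadratic and is not taken from the slides; it relies on $R \succ 0$ and $P_N \succ 0$, so that the objective is strictly convex in $u$.

```latex
\begin{align*}
V_{N-1}(x) &= \min_{u} \left\{ x'Qx + u'Ru + (Ax + Bu)'P_N(Ax + Bu) \right\}, \\
0 &= 2Ru^\star + 2B'P_N(Ax + Bu^\star)
  \;\Rightarrow\; u^\star = -(B'P_N B + R)^{-1}B'P_N A\,x, \\
V_{N-1}(x) &= x'\left( A'P_N A + Q - A'P_N B(B'P_N B + R)^{-1}B'P_N A \right)x.
\end{align*}
```

The value function remains quadratic in $x$, so the same step can be repeated for $k = N-2, \ldots, 0$.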
Why DP?
DP facts:
- We may decompose a complex optimisation problem into simpler subproblems
- Here, we solve for one $u(k)$ at a time
- DP uses Bellman's principle of optimality
- It can be applied in the same way to stochastic optimal control problems
- It is a powerful tool to study the mean square stability (MSS) of MJLS and Markovian switching systems (next class)
Finite horizon optimal control
Let $\pi^\star(x_0)$ be the respective minimiser with
$$\pi^\star(x_0) = \{u^\star(0), u^\star(1), \ldots, u^\star(N-1)\}.$$
Using DP we derive
$$V_k(x) = x'P_k x,$$
$$u^\star(k) = F(P_{k+1})x(k),$$
where $P_k$ is determined as follows:
$$P_k = A'P_{k+1}A + Q + A'P_{k+1}BF(P_{k+1})$$
and
$$F(P) = -(B'PB + R)^{-1}B'PA.$$
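The backward recursion can be sketched in a few lines. The example below assumes a scalar system, so that matrix products and inverses become ordinary multiplication and division; the function names and numerical values are hypothetical. It then simulates the closed loop to check that the accumulated cost matches $V_0(x_0) = P_0 x_0^2$.

```python
def lqr_gain(a, b, r, p):
    """Scalar version of F(P) = -(B'PB + R)^{-1} B'PA."""
    return -(b * p * a) / (b * p * b + r)


def riccati_backward(a, b, q, r, p_N, N):
    """Return ([P_0, ..., P_N], [F(P_1), ..., F(P_N)]) for the scalar case."""
    P = [0.0] * (N + 1)
    P[N] = p_N
    gains = [0.0] * N
    for k in range(N - 1, -1, -1):
        gains[k] = lqr_gain(a, b, r, P[k + 1])            # used in u*(k) = F(P_{k+1}) x(k)
        P[k] = a * P[k + 1] * a + q + a * P[k + 1] * b * gains[k]
    return P, gains


# Hypothetical data: an open-loop unstable scalar system.
a, b, q, r, p_N, N, x0 = 1.2, 1.0, 1.0, 0.5, 1.0, 10, 2.0
P, gains = riccati_backward(a, b, q, r, p_N, N)

# Simulate the closed loop and accumulate the cost along the trajectory.
x, cost = x0, 0.0
for k in range(N):
    u = gains[k] * x
    cost += q * x * x + r * u * u
    x = a * x + b * u
cost += p_N * x * x

print(cost, P[0] * x0 * x0)  # the two numbers should agree up to round-off
```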
Infinite horizon optimal control
What happens as $N \to \infty$? Let us define
$$J_\infty(\pi; x_0) = \sum_{k=0}^{\infty} \ell(x(k), u(k)),$$
where $\pi$ is a sequence of inputs $\{u(k)\}_{k \in \mathbb{N}}$. For the series to converge it is of course required that
$$\|x(k)\|^2, \|u(k)\|^2 \to 0, \text{ as } k \to \infty.$$
Infinite horizon optimal control
We can show that, under certain conditions², the IHOC problem is solvable and
$$J_\infty^\star(x) = x'P_\infty x,$$
$$u^\star(k) = F(P_\infty)x(k),$$
where $P_\infty$ is a fixed point of the DP recursion of the FHOC problem (Algebraic Riccati Equation), that is
$$P_\infty = A'P_\infty A + Q - A'P_\infty B(B'P_\infty B + R)^{-1}B'P_\infty A.$$
² Provided that $(A, B)$ is stabilisable and $(Q^{1/2}, A)$ is detectable. Then the matrix $A + BF(P_\infty)$ is stable. Proof: see D.P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, 2005, Prop. 4.4.1.
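One simple way to approximate the fixed point is to iterate the finite horizon recursion until it stops changing. The sketch below does this for a scalar system with hypothetical data; it is an illustration of the fixed-point idea, not a recommended numerical method, and it assumes the iteration converges (which is the case for the values chosen here).

```python
def riccati_step(a, b, q, r, p):
    """One step of the scalar DP recursion: A'PA + Q - A'PB (B'PB + R)^{-1} B'PA."""
    return a * p * a + q - (a * p * b) ** 2 / (b * p * b + r)


def solve_are_by_iteration(a, b, q, r, p0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate the recursion from p0 until successive iterates differ by at most tol."""
    p = p0
    for _ in range(max_iter):
        p_next = riccati_step(a, b, q, r, p)
        if abs(p_next - p) <= tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")


a, b, q, r = 1.2, 1.0, 1.0, 0.5            # hypothetical scalar data
p_inf = solve_are_by_iteration(a, b, q, r)
f_inf = -(b * p_inf * a) / (b * p_inf * b + r)   # F(P_inf)
closed_loop = a + b * f_inf                      # A + B F(P_inf)
print(p_inf, f_inf, closed_loop)
```

For a scalar system, stability of $A + BF(P_\infty)$ simply means that `closed_loop` has absolute value less than one.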
End of first section
- Revision of FHOC and DP
- We solved the LQR problem
II. FHOC for MJLS
FHOC for MJLS
Consider a MJLS
$$x(k+1) = A_{\theta(k)}x(k) + B_{\theta(k)}u(k) + M_{\theta(k)}v(k),$$
where $v(k)$ is a noise term, with $x(0) = x_0$, and let $z(k) = C_{\theta(k)}x(k) + D_{\theta(k)}u(k)$ be the quantity that will be penalised. We define the following cost functional
$$J(\theta_0, x_0, \pi_N) := \sum_{k=0}^{N-1} E\left[\|z(k)\|^2\right] + E\left[x(N)'V_{\theta(N)}x(N)\right],$$
where $\pi_N = (u(0), \ldots, u(N-1))$ is a policy with
$$u(k) = \mu_k(x(k), \theta(k)).$$
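The expectations in this cost can be approximated by simulation. The sketch below is a hypothetical Monte Carlo estimate under several assumptions that are not in the slides: scalar state and input, modes indexed from 0, zero-mean Gaussian noise, a fixed mode-dependent linear feedback `gains[theta] * x` chosen only for illustration, and $z(k)$ stacked as $(c_\theta x, d_\theta u)$ so that $\|z(k)\|^2 = c_\theta^2 x^2 + d_\theta^2 u^2$.

```python
import random


def next_mode(p_row, rng):
    """Sample the next mode j with probability p_row[j]."""
    u = rng.random()
    cumulative = 0.0
    for j, p in enumerate(p_row):
        cumulative += p
        if u < cumulative:
            return j
    return len(p_row) - 1  # guard against floating-point round-off


def sample_cost(A, B, M, c, d, V, P, gains, theta0, x0, N, noise_std, rng):
    """One realisation of sum_k ||z(k)||^2 + x(N)' V_theta(N) x(N), scalar case."""
    theta, x, cost = theta0, x0, 0.0
    for _ in range(N):
        u = gains[theta] * x                                # u(k) = mu(x(k), theta(k))
        cost += (c[theta] * x) ** 2 + (d[theta] * u) ** 2   # ||z(k)||^2
        v = rng.gauss(0.0, noise_std)                       # assumed Gaussian noise
        x = A[theta] * x + B[theta] * u + M[theta] * v
        theta = next_mode(P[theta], rng)
    return cost + V[theta] * x * x


def estimate_cost(n_samples, seed=0, **kw):
    """Average sample_cost over n_samples independent realisations."""
    rng = random.Random(seed)
    return sum(sample_cost(rng=rng, **kw) for _ in range(n_samples)) / n_samples


# Hypothetical two-mode example.
params = dict(A=[1.2, 0.8], B=[1.0, 1.0], M=[0.1, 0.1], c=[1.0, 1.0], d=[0.7, 0.7],
              V=[1.0, 1.0], P=[[0.9, 0.1], [0.3, 0.7]], gains=[-0.9, -0.5],
              theta0=0, x0=1.0, N=20, noise_std=1.0)
J_hat = estimate_cost(n_samples=2000, **params)
print(J_hat)
```

The estimate depends on the seed and on the number of samples; it only approximates $J(\theta_0, x_0, \pi_N)$ for the particular policy that was simulated.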
FHOC assumptions
Let $\mathcal{G}_k$ be the σ-algebra generated by $\{x(t), \theta(t);\ t = 0, \ldots, k\}$.
Assumptions on $v$:
1. $v(k)$ are random variables with $E\left[v(k)v(k)'1_{\{\theta(k)=i\}}\right] = \Xi_i(k)$
2. For every $f$, $g$, $f(v(k))$ and $g(\theta(k))$ are independent w.r.t. $\mathcal{G}_k$
3. $E\left[v(0)x(0)'1_{\{\theta(0)=i\}}\right] = 0$
Assumptions on $z(k)$:
1. $C_i(k)'D_i(k) = 0$, that is, no penalties of the form $x(k)'S_{\theta(k)}u(k)$
2. $D_i(k)'D_i(k) \succ 0$
Control laws and policies for MJLS
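A measurable function
$$\mu : \mathbb{R}^n \times \mathcal{N} \to \mathbb{R}^m,$$
with $\mathcal{N} = \{1, \ldots, N\}$ the set of modes, is called a control law.
A (finite or infinite) sequence of control laws
$$\pi = \{\mu_0, \mu_1, \ldots\},$$
where $\mu_k$ is $\mathcal{G}_k$-measurable, is called a control policy.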
FHOC – Dynamic programming recursion
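To perform DP we introduce the cost functional
$$J_\kappa(\theta(\kappa), x(\kappa), u_\kappa) := \sum_{k=\kappa}^{N-1} E\left[\|z(k)\|^2 \mid \mathcal{G}_\kappa\right] + E\left[x(N)'V_{\theta(N)}x(N) \mid \mathcal{G}_\kappa\right],$$
for $\kappa \in \{0, \ldots, N-1\}$, where $u_\kappa = (u(\kappa), \ldots, u(N-1))$ so that each $u(k)$ is $\mathcal{G}_k$-measurable. The optimal value of $J_\kappa(\theta(\kappa), x(\kappa), u_\kappa)$ is then given by
$$J_\kappa^\star(i, x) = x'X_i(\kappa)x + \alpha(\kappa),$$
where $X_i$ is given by a Riccati-like equation.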
FHOC – Dynamic programming recursion
We have
$$J_\kappa^\star(i, x) = x'X_i(\kappa)x + \alpha(\kappa),$$
where
$$X_i(N) = V_i,$$
$$X_i(k) = A_i'E_i(X(k+1))A_i + A_i'E_i(X(k+1))B_iF_i(X(k+1)) + C_i'C_i,$$
where $E_i(X) = \sum_{j=1}^{N} p_{ij}X_j$, $R_i(X) := D_i'D_i + B_i'E_i(X)B_i$ and
$$F_i(X) := -R_i(X)^{-1}B_i'E_i(X)A_i.$$
The respective optimisers are given by
$$u^\star(k) = F_{\theta(k)}(X(k+1))x(k).$$
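The coupled recursion can be sketched as follows, assuming scalar state and input in every mode, time-invariant data, and modes indexed from 0. The arguments `C2[i]` and `D2[i]` stand for $C_i'C_i$ and $D_i'D_i$; all names and values are hypothetical. The only difference from the deterministic recursion is that $P_{k+1}$ is replaced by the mode-averaged quantity $E_i(X(k+1))$.

```python
def coupled_riccati_backward(A, B, C2, D2, V, P, N):
    """Backward recursion for X_i(k), k = N, ..., 0, in the scalar case.

    Returns (X, gains) with X[k][i] = X_i(k) and gains[k][i] = F_i(X(k+1)).
    """
    n_modes = len(A)
    X = [None] * (N + 1)
    X[N] = list(V)                               # X_i(N) = V_i
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        X_k, F_k = [], []
        for i in range(n_modes):
            # E_i(X) = sum_j p_ij X_j couples the modes through the chain.
            E_i = sum(P[i][j] * X[k + 1][j] for j in range(n_modes))
            R_i = D2[i] + B[i] * E_i * B[i]      # R_i(X) = D_i'D_i + B_i'E_i(X)B_i
            F_i = -(B[i] * E_i * A[i]) / R_i     # F_i(X) = -R_i(X)^{-1} B_i'E_i(X)A_i
            X_k.append(A[i] * E_i * A[i] + A[i] * E_i * B[i] * F_i + C2[i])
            F_k.append(F_i)
        X[k], gains[k] = X_k, F_k
    return X, gains


# Hypothetical two-mode example.
A, B = [1.2, 0.8], [1.0, 1.0]
C2, D2, V = [1.0, 1.0], [0.5, 0.5], [1.0, 1.0]
P = [[0.9, 0.1], [0.3, 0.7]]
X, gains = coupled_riccati_backward(A, B, C2, D2, V, P, N=10)
print(X[0], gains[0])
```

With a single mode and $p_{11} = 1$, the recursion reduces to the deterministic Riccati recursion of the first section.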
End of second section
- Formulation of FHOC for MJLS considering also an additive noise term
- Control policies and control laws
- Solution of FHOC: piecewise linear control laws
$$u^\star(k) = \kappa(x(k), \theta(k)) = F_{\theta(k)}x(k).$$
III. IHOC for MJLS and MSS
IHOC for MJLS
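Consider a MJLS without additive noise
$$x(k+1) = A_{\theta(k)}x(k) + B_{\theta(k)}u(k),$$
with $x(0) = x_0$, and let $z(k) = C_{\theta(k)}x(k) + D_{\theta(k)}u(k)$ be the quantity that will be penalised. We are now looking for sequences $\pi = \{u(k)\}_{k \in \mathbb{N}}$ in
$$\mathcal{U} = \left\{ \pi \ \middle|\ u(k) \text{ is } \mathcal{G}_k\text{-measurable},\ \forall k \in \mathbb{N},\ \lim_{k \to \infty} E\left[\|x(k)\|^2\right] = 0 \right\}.$$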
IHOC for MJLS
With $\pi \in \mathcal{U}$ the following is a well-defined infinite horizon cost function
$$J(\theta_0, x_0, \pi) := \sum_{k=0}^{\infty} E\left[\|z(k)\|^2\right],$$
and the IHOC problem amounts to determining
$$J^\star(\theta_0, x_0) := \inf_{\pi \in \mathcal{U}} J(\theta_0, x_0, \pi),$$
and we define $\pi^\star$ to be the respective optimiser with elements
$$u^\star(k) = \psi_k(\theta(k), x(k)).$$
Objectives
1. Under what conditions does the IHOC problem have a solution?
2. How can this solution be determined?
3. Can we derive an MS-stabilising controller by solving the IHOC problem?
Control CARE
Assume that there is $X \in \mathbb{H}^{n+}$ satisfying the control CARE:
$$X_i = A_i'E_i(X)A_i - A_i'E_i(X)B_i(D_i'D_i + B_i'E_i(X)B_i)^{-1}B_i'E_i(X)A_i + C_i'C_i$$
and let
$$F_i(X) := -(D_i'D_i + B_i'E_i(X)B_i)^{-1}B_i'E_i(X)A_i.$$
The IHOC problem solution is given by
$$u^\star(k) = F_{\theta(k)}(X)x(k)$$
and the value function is
$$J^\star(\theta_0, x_0) = E\left[x_0'X_{\theta_0}x_0\right].$$
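A solution of the control CARE can be approximated by iterating its right-hand side, in the same spirit as the fixed point of the deterministic recursion. The sketch below assumes scalar state and input in every mode, modes indexed from 0, and hypothetical data for which the iteration converges; it is not claimed to be the method used in the references. `C2[i]` and `D2[i]` stand for $C_i'C_i$ and $D_i'D_i$.

```python
def care_map(X, A, B, C2, D2, P):
    """Right-hand side of the control CARE, evaluated mode by mode (scalar case)."""
    n = len(A)
    out = []
    for i in range(n):
        E_i = sum(P[i][j] * X[j] for j in range(n))          # E_i(X)
        out.append(A[i] * E_i * A[i]
                   - (A[i] * E_i * B[i]) ** 2 / (D2[i] + B[i] * E_i * B[i])
                   + C2[i])
    return out


def solve_care(A, B, C2, D2, P, tol=1e-12, max_iter=100_000):
    """Iterate X <- care_map(X) from X = 0 until the update is at most tol."""
    X = [0.0] * len(A)
    for _ in range(max_iter):
        X_next = care_map(X, A, B, C2, D2, P)
        if max(abs(xn - xo) for xn, xo in zip(X_next, X)) <= tol:
            return X_next
        X = X_next
    raise RuntimeError("CARE iteration did not converge")


def care_gains(X, A, B, D2, P):
    """F_i(X) = -(D_i'D_i + B_i'E_i(X)B_i)^{-1} B_i'E_i(X)A_i in the scalar case."""
    n = len(A)
    gains = []
    for i in range(n):
        E_i = sum(P[i][j] * X[j] for j in range(n))
        gains.append(-(B[i] * E_i * A[i]) / (D2[i] + B[i] * E_i * B[i]))
    return gains


# Hypothetical two-mode example.
A, B = [1.2, 0.8], [1.0, 1.0]
C2, D2 = [1.0, 1.0], [0.5, 0.5]
P = [[0.9, 0.1], [0.3, 0.7]]
X = solve_care(A, B, C2, D2, P)
F = care_gains(X, A, B, D2, P)
print(X, F)
```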
Control CARE ⇒ MSS
The control CARE, when solvable, yields an MS-stabilising control law, i.e., the closed-loop system
$$x(k+1) = (A_{\theta(k)} + B_{\theta(k)}F_{\theta(k)}(X))x(k)$$
is mean square stable.
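For a scalar closed-loop system, mean square behaviour can be inspected by propagating the quantities $q_i(k) = E[x(k)^2 1_{\{\theta(k)=i\}}]$, which satisfy $q_j(k+1) = \sum_i p_{ij} a_i^2 q_i(k)$, where $a_i$ denotes the closed-loop value $A_i + B_iF_i(X)$ in mode $i$. The sketch below uses hypothetical values of $a_i$ and $p_{ij}$ with modes indexed from 0; observing decay over a finite number of steps illustrates mean square stability but does not prove it.

```python
def propagate_second_moments(a_cl, P, q0, steps):
    """Iterate q_j(k+1) = sum_i p_ij * a_cl[i]^2 * q_i(k).

    Returns the final q and the history of E[x(k)^2] = sum_i q_i(k).
    """
    n = len(a_cl)
    q = list(q0)
    history = [sum(q)]
    for _ in range(steps):
        q = [sum(P[i][j] * a_cl[i] ** 2 * q[i] for i in range(n)) for j in range(n)]
        history.append(sum(q))
    return q, history


# Hypothetical closed-loop values: mode 0 is unstable on its own (|a| > 1).
a_cl = [1.1, 0.3]
P = [[0.5, 0.5], [0.1, 0.9]]

# Start from x(0) = 1 and theta(0) = 0, i.e. q(0) = [1, 0].
q_one, _ = propagate_second_moments(a_cl, P, q0=[1.0, 0.0], steps=1)
q_end, history = propagate_second_moments(a_cl, P, q0=[1.0, 0.0], steps=200)
print(q_one, history[-1])
```

In this example $E[x(k)^2]$ decays even though one mode is unstable in isolation, because the chain leaves that mode often enough.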
Solvability conditions
The following conditions entail the solvability of the control CARE:
1. $(A, B)$, with $A \in \mathbb{H}^n$ and $B \in \mathbb{H}^{n,m}$, is stabilisable,
2. $(C, A)$, with $C \in \mathbb{H}^{n,n_z}$, is detectable.
Proof. Book of Costa et al., 2005, Corollary A.16.
End of third section
- We formulated the infinite horizon optimal control problem
- The solution of IHOC produces an MS-stabilising control law
- IHOC is solved by a CARE which can be formulated as an LMI
- Solvability conditions: (A, B) is stabilisable, (C, A) is detectable
References
1. For an introduction to DP: D. P. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, 2nd ed., 2000.
2. Chapter 4 of: O.L.V. Costa, M.D. Fragoso and R.P. Marques, Discrete-time Markov Jump Linear Systems, Springer, 2005.
3. Chapter 6 of: M.H.A. Davis and R.B. Vinter, Stochastic Modelling and Control, Chapman and Hall, New York, 1985.
4. M.D. Fragoso, Discrete-time jump LQG problem, Int. J. Systems Sci., 20(12), pp. 2539–2545, 1989.