Markov Tutorial CDC Shanghai 2009


A crash course in stochastic Lyapunov theory for Markov processes, with emphasis on continuous time. See also the survey for models in discrete time: https://netfiles.uiuc.edu/meyn/www/spm_files/MarkovTutorial/MarkovTutorialUCSB2010.html


Lyapunov functions, value functions, and performance bounds

Sean Meyn

Department of Electrical and Computer Engineering, University of Illinois, and the Coordinated Science Laboratory

Joint work with R. Tweedie, I. Kontoyiannis, and P. Mehta

Supported in part by NSF (ECS 05 23620, and prior funding) and AFOSR

Objectives

Nonlinear state space model ≡ (controlled) Markov process, with state process X, control U, and noise W.

Typical form:

    dX(t) = f(X(t), U(t)) dt + σ(X(t), U(t)) dW(t)

Questions: For a given feedback law,

• Is the state process stable?
• Is the average cost finite?   (running cost E[c(X(t), U(t))])
• Can we solve the DP equations?   e.g.,  min_u { c(x, u) + D_u h*(x) } = η*
• Can we approximate the average cost η? The value function h?

Outline

I.   Markov Models
II.  Representations
III. Lyapunov Theory
IV.  Conclusions

Recurring bounds:   DV(x) ≤ −f(x) + b I_C(x),    π(f) < ∞,    sup_{x∈C} E_x[S_{τ_C}(f)] < ∞,    ‖P^t(x, ·) − π‖_f → 0

I. Markov Models

Notation: Generators & Resolvents

Markov chain:   X = {X(t) : t ≥ 0}

Countable state space, X

Transition semigroup:

    P^t(x, y) = P{ X(s + t) = y | X(s) = x },        x, y ∈ X

Generator: For h in a suitable domain of functions,

    Dh(x) = lim_{t→0} (1/t) E[ h(X(s + t)) − h(X(s)) | X(s) = x ]
          = lim_{t→0} (1/t) ( P^t h(x) − h(x) )

Rate matrix:   P^t = e^{Qt},  and

    Dh(x) = Σ_y Q(x, y) h(y)

Example: M/M/1 Queue   (arrival rate α, service rate µ)

Sample paths: given X(t) = x,

    X(t + ε) ≈  x + 1   with prob. εα
                x − 1   with prob. εµ
                x       with prob. 1 − ε(α + µ)

Rate matrix:

    Q =  [ −α      α      0      0     0   ···
            µ   −α−µ      α      0     0   ···
            0      µ   −α−µ      α     0   ···
            0      0      µ   −α−µ     α   ···
            ⋮      ⋮      ⋮      ⋮     ⋮    ⋱ ]
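A quick numerical sketch (not from the slides): build the rate matrix of an M/M/1 queue truncated to a finite buffer N (an M/M/1/N chain, used here as a finite-state stand-in for the countable chain) and confirm that P^t = e^{Qt} is a stochastic matrix. The rates α = 0.4, µ = 1.0 and the buffer size are illustrative assumptions.

```python
# Sketch: truncated M/M/1 rate matrix and a check that P^t = exp(Qt) is stochastic.
import numpy as np
from scipy.linalg import expm

alpha, mu, N = 0.4, 1.0, 50          # arrival rate, service rate, buffer size (illustrative)

def mm1_rate_matrix(alpha, mu, N):
    Q = np.zeros((N + 1, N + 1))
    for x in range(N + 1):
        if x < N:
            Q[x, x + 1] = alpha       # arrival: x -> x + 1
        if x > 0:
            Q[x, x - 1] = mu          # service: x -> x - 1
        Q[x, x] = -Q[x].sum()         # rows of a rate matrix sum to zero
    return Q

Q = mm1_rate_matrix(alpha, mu, N)
Pt = expm(Q * 2.0)                    # P^t for t = 2
print(np.allclose(Pt.sum(axis=1), 1.0), (Pt >= -1e-12).all())  # row sums 1, nonnegative entries
```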

Example: O-U Model

    dX(t) = AX(t) dt + B dW(t),        A  n × n,   B  n × 1,   W standard BM

Sample paths:  [figure: sample paths shown for σ²_W = 0 and σ²_W = 1]

Generator:

    Dh(x) = (Ax)ᵀ∇h(x) + Bᵀ∇²h(x)B

h quadratic,  h(x) = ½ xᵀPx:

    ∇h(x) = Px,        ∇²h(x) = P

    Dh(x) = ½ xᵀ(PA + AᵀP)x + BᵀPB
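A minimal simulation sketch of the O-U model, assuming an illustrative stable pair (A, B) that is not taken from the slides: Euler-Maruyama paths with the noise switched off (σ²_W = 0) and on (σ²_W = 1), matching the two sample-path panels.

```python
# Sketch: Euler-Maruyama simulation of dX = AX dt + B dW for an assumed (A, B).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[-1.0, 1.0],
              [ 0.0, -0.5]])          # illustrative stable matrix
B = np.array([[0.0],
              [1.0]])                 # illustrative input direction
dt, T = 1e-3, 10.0
n_steps = int(T / dt)

def simulate(sigma_w, x0=np.array([2.0, -1.0])):
    x = x0.copy()
    path = np.empty((n_steps, 2))
    for k in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal()          # increment of standard BM
        x = x + (A @ x) * dt + sigma_w * (B.flatten() * dW)
        path[k] = x
    return path

path_noiseless = simulate(sigma_w=0.0)   # sigma_W^2 = 0: deterministic linear system
path_noisy     = simulate(sigma_w=1.0)   # sigma_W^2 = 1: O-U diffusion
print(path_noiseless[-1], path_noisy[-1])
```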

Notation: Generators & Resolvents (continued)

Resolvent:

    R_α = ∫_0^∞ e^{−αt} P^t dt = [αI − Q]^{−1}

Resolvent equations:

    QR_α = R_αQ = αR_α − I
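The resolvent formula and the resolvent equation can be checked numerically. The sketch below reuses the truncated M/M/1 chain (illustrative rates and buffer size) and evaluates ∫_0^∞ e^{−αt}P^t dt by quadrature.

```python
# Sketch: verify R_alpha = [alpha I - Q]^{-1} and Q R_alpha = alpha R_alpha - I
# on a truncated M/M/1 chain (finite-buffer stand-in, illustrative parameters).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

alpha_rate, mu, N = 0.4, 1.0, 30
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

a = 1.0                                                   # discount rate alpha
R = np.linalg.inv(a * np.eye(N + 1) - Q)                  # R_alpha = [alpha I - Q]^{-1}

# R_alpha = int_0^infty e^{-alpha t} P^t dt, evaluated by quadrature (tail beyond t = 60 is negligible)
R_quad, _ = quad_vec(lambda t: np.exp(-a * t) * expm(Q * t), 0.0, 60.0)
print(np.allclose(R, R_quad, atol=1e-6))

# Resolvent equation: Q R_alpha = R_alpha Q = alpha R_alpha - I
print(np.allclose(Q @ R, a * R - np.eye(N + 1)),
      np.allclose(R @ Q, a * R - np.eye(N + 1)))
```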

Motivation: Dynamic programming. For a cost function c, the discounted-cost value function is

    h_α(x) = R_αc(x) = Σ_{y∈X} R_α(x, y)c(y)
           = ∫_0^∞ e^{−αt} E[ c(X(t)) | X(0) = x ] dt

Resolvent equation = dynamic programming equation:

    c + Dh_α = αh_α
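A short check of the discounted-cost DP equation, assuming cost c(x) = x on the same illustrative truncated chain.

```python
# Sketch: check c + D h_alpha = alpha h_alpha with h_alpha = R_alpha c.
import numpy as np

alpha_rate, mu, N, a = 0.4, 1.0, 30, 1.0
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

R = np.linalg.inv(a * np.eye(N + 1) - Q)   # resolvent R_alpha
c = np.arange(N + 1, dtype=float)          # cost c(x) = x (illustrative)
h = R @ c                                  # discounted-cost value function h_alpha = R_alpha c

# c + D h_alpha = alpha h_alpha, with D acting as the rate matrix Q
print(np.allclose(c + Q @ h, a * h))
```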

Notation: Steady-State Distribution

Invariant (probability) measure π:  X is stationary. In particular,

    X(t) ∼ π,    t ≥ 0

Characterizations:

    Σ_{x∈X} π(x)P^t(x, y) = π(y),        y ∈ X
    Σ_{x∈X} π(x)Q(x, y) = 0,             y ∈ X
    α Σ_{x∈X} π(x)R_α(x, y) = π(y),      α > 0
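Numerical sketch: compute π for the truncated chain from πQ = 0 and confirm the semigroup and resolvent characterizations (parameters are again illustrative).

```python
# Sketch: invariant measure of the truncated M/M/1 chain and its three characterizations.
import numpy as np
from scipy.linalg import expm, null_space

alpha_rate, mu, N = 0.4, 1.0, 30
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

pi = null_space(Q.T)[:, 0]                # left null vector of Q
pi = pi / pi.sum()                        # normalize to a probability

Pt = expm(Q * 3.0)
a = 0.7
R = np.linalg.inv(a * np.eye(N + 1) - Q)

print(np.allclose(pi @ Q, 0.0, atol=1e-10))   # pi Q = 0
print(np.allclose(pi @ Pt, pi))               # pi P^t = pi
print(np.allclose(a * (pi @ R), pi))          # alpha pi R_alpha = pi
```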

Notation: Relative Value Function

Invariant measure π, cost function c, steady-state mean η.

Relative value function:

    h(x) = ∫_0^∞ E[ c(X(t)) − η | X(0) = x ] dt

Solution to Poisson's equation (average-cost DP equation):

    c + Dh = η
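On a finite chain, Poisson's equation is a singular linear system that can be solved directly. The sketch below pins h(0) = 0 to select one solution; the truncated M/M/1 chain and the cost c(x) = x are illustrative choices, not the slides' construction.

```python
# Sketch: solve Poisson's equation c + Dh = eta on the truncated M/M/1 chain.
import numpy as np
from scipy.linalg import null_space

alpha_rate, mu, N = 0.4, 1.0, 30
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

pi = null_space(Q.T)[:, 0]; pi /= pi.sum()
c = np.arange(N + 1, dtype=float)          # cost c(x) = x
eta = pi @ c                               # steady-state mean

# Q h = -(c - eta) is singular; pin h(0) = 0 by replacing row 0 with that constraint
A = Q.copy(); b = -(c - eta)
A[0, :] = 0.0; A[0, 0] = 1.0; b[0] = 0.0
h = np.linalg.solve(A, b)

print(np.allclose(c + Q @ h, eta))         # Poisson's equation holds on every row
```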

II. Representations

    π ∝ ν[I − (R − s ⊗ ν)]^{−1},        h = [I − (R − s ⊗ ν)]^{−1} c̃

Irreducibility

ψ-Irreducibility:

    ψ(y) > 0   =⇒   R(x, y) > 0 for all x
    ψ(y) > 0   =⇒   P{ X(t) reaches y | X(0) = x } > 0 for all x

Small Functions and Small Measures

Small functions and measures: For a function s and probability measure ν,

    R(x, y) ≥ s(x)ν(y),    x, y ∈ X,        where  R = ∫_0^∞ e^{−t} P^t dt

Resolvent dominates a rank-one matrix:

    R ≥ s ⊗ ν

ψ-Irreducibility justifies the assumption that, WLOG,

    s(x) > 0 for all x,        ν = δ_{x*},  where ψ(x*) > 0

Example: M/M/1 Queue

R(x, y) > 0 for all x and y (irreducible in the usual sense)

Conclusion:

    R(x, y) ≥ s(x)ν(y),        where  s(x) := R(x, 0),   ν := δ_0
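A small check of this construction on the truncated chain: R has strictly positive entries, and it dominates the rank-one matrix s ⊗ ν with s(x) = R(x, 0), ν = δ_0 (finite-buffer stand-in, illustrative rates).

```python
# Sketch: irreducibility and the minorization R >= s (x) nu on a truncated M/M/1 chain.
import numpy as np

alpha_rate, mu, N = 0.4, 1.0, 20
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

R = np.linalg.inv(np.eye(N + 1) - Q)          # R = int_0^inf e^{-t} P^t dt = [I - Q]^{-1}
s = R[:, 0].copy()                            # s(x) = R(x, 0) > 0 for every x
nu = np.zeros(N + 1); nu[0] = 1.0             # nu = delta_0

print((R > 0).all())                          # irreducible in the usual sense
print((R >= np.outer(s, nu) - 1e-12).all())   # R >= s (x) nu entrywise
```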

Example: O-U Model        dX(t) = AX(t) dt + B dW(t)

The transition kernels are Gaussian, with full-rank covariance if and only if (A, B) is controllable.

Conclusion: Under controllability, for any m there is ε > 0 such that

    R(x, A) ≥ s(x)ν(A)    for all x and A,

where  s(x) = ε I{‖x‖ ≤ m}  and  ν is uniform on {‖x‖ ≤ m}.
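For the O-U model, the covariance of X(t) given X(0) = x is the controllability Gramian, which has full rank exactly when (A, B) is controllable. The sketch below checks this for an assumed controllable pair (A, B), not the slides' example.

```python
# Sketch: controllability of (A, B) and full rank of the time-t covariance (Gramian).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

A = np.array([[-1.0, 1.0],
              [ 0.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
n = A.shape[0]

# Kalman controllability matrix [B, AB, ..., A^{n-1}B]
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print(np.linalg.matrix_rank(ctrb) == n)

# Covariance of X(t) given X(0) = x (independent of x): the Gramian at time t
t = 2.0
Sigma_t, _ = quad_vec(lambda s: expm(A * s) @ B @ B.T @ expm(A.T * s), 0.0, t)
print(np.linalg.matrix_rank(Sigma_t) == n)   # full rank <=> nondegenerate Gaussian kernel
```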

Potential Matrix

Potential matrix:

    G(x, y) = Σ_{n=0}^∞ (R − s ⊗ ν)^n (x, y),        G = [I − (R − s ⊗ ν)]^{−1}

Representation of π

With  νG(y) = Σ_{x∈X} ν(x)G(x, y),

    π ∝ νG
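Numerical sketch of the representation π ∝ νG on the truncated M/M/1 chain, with the small pair s(x) = R(x, 0), ν = δ_0 from the example above (illustrative parameters).

```python
# Sketch: potential matrix G and the representation pi proportional to nu G.
import numpy as np
from scipy.linalg import null_space

alpha_rate, mu, N = 0.4, 1.0, 30
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

R = np.linalg.inv(np.eye(N + 1) - Q)             # resolvent with alpha = 1
s = R[:, 0].copy()                               # small function s(x) = R(x, 0)
nu = np.zeros(N + 1); nu[0] = 1.0                # small measure nu = delta_0

G = np.linalg.inv(np.eye(N + 1) - (R - np.outer(s, nu)))   # potential matrix
pi_rep = nu @ G
pi_rep = pi_rep / pi_rep.sum()                   # pi proportional to nu G

pi = null_space(Q.T)[:, 0]; pi /= pi.sum()       # reference: pi Q = 0
print(np.allclose(pi_rep, pi))
```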

Representation of h

Potential matrix:   G(x, y) = Σ_{n=0}^∞ (R − s ⊗ ν)^n (x, y)

Centered cost:   c̃(x) = c(x) − η,        η = Σ_{x∈X} π(x)c(x)

With  Gc̃(x) = Σ_{y∈X} G(x, y)c̃(y),

    h = RGc̃ + constant

If the sum converges, then Poisson's equation is solved:

    c(x) + Dh(x) = η
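And the companion sketch for h: form h = RGc̃ on the truncated chain with cost c(x) = x and verify Poisson's equation c + Dh = η (same illustrative setup as above).

```python
# Sketch: the representation h = R G c_tilde checked against Poisson's equation.
import numpy as np
from scipy.linalg import null_space

alpha_rate, mu, N = 0.4, 1.0, 30
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

R = np.linalg.inv(np.eye(N + 1) - Q)
s = R[:, 0].copy()
nu = np.zeros(N + 1); nu[0] = 1.0
G = np.linalg.inv(np.eye(N + 1) - (R - np.outer(s, nu)))

pi = null_space(Q.T)[:, 0]; pi /= pi.sum()
c = np.arange(N + 1, dtype=float)
eta = pi @ c
c_tilde = c - eta                      # centered cost

h = R @ G @ c_tilde                    # h = R G c_tilde (up to an additive constant)
print(np.allclose(c + Q @ h, eta))     # Poisson's equation: c + Dh = eta
```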

III. Lyapunov Theory

    ∆V(x) ≤ −f(x) + b I_C(x),    π(f) < ∞,    sup_{x∈C} E_x[S_{τ_C}(f)] < ∞,    ‖P^n(x, ·) − π‖_f → 0

Lyapunov Functions

    DV ≤ −g + bs

General assumptions:

    V : X → (0, ∞),    g : X → [1, ∞),    b < ∞,    s small   (e.g., s(x) = I_C(x), C finite)

Lyapunov Bounds on G        (DV ≤ −g + bs)

Resolvent equation gives:

    RV − V ≤ −Rg + bRs

Since s ⊗ ν is non-negative (the bracketed term is G^{−1}):

    −[I − (R − s ⊗ ν)]V ≤ RV − V ≤ −Rg + bRs

More positivity:

    V ≥ GRg − bGRs

Some algebra:

    GR = G(R − s ⊗ ν) + (Gs) ⊗ ν ≥ G − I,        Gs ≤ 1

General bound:

    GRg ≤ V + 2b,        Gg ≤ V + g + 2b
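These bounds can be verified numerically on the truncated M/M/1 chain. In the sketch below V(x) = x², g(x) = 1 + x, s(x) = R(x, 0), ν = δ_0, and b is taken just large enough that DV ≤ −g + bs holds; all of these choices are illustrative.

```python
# Sketch: numerical check of GRg <= V + 2b and Gg <= V + g + 2b on a truncated M/M/1 chain.
import numpy as np

alpha_rate, mu, N = 0.25, 1.0, 40
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

R = np.linalg.inv(np.eye(N + 1) - Q)
s = R[:, 0].copy()
nu = np.zeros(N + 1); nu[0] = 1.0
G = np.linalg.inv(np.eye(N + 1) - (R - np.outer(s, nu)))

x = np.arange(N + 1, dtype=float)
V = x ** 2
g = 1.0 + x
b = max(0.0, np.max((Q @ V + g) / s))        # smallest b with DV <= -g + b s

print(np.all(Q @ V <= -g + b * s + 1e-9))    # drift condition holds
print(np.all(G @ R @ g <= V + 2 * b + 1e-6)) # general bound: GRg <= V + 2b
print(np.all(G @ g <= V + g + 2 * b + 1e-6)) # and Gg <= V + g + 2b
```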

Existence of π

Condition (V2):   DV ≤ −1 + bs

Representation:   π ∝ νG

Bound (with g ≡ 1):   GRg ≤ V + 2b   =⇒   G(x, X) ≤ V(x) + 2b

Conclusion: π exists as a probability measure on X.

Existence of Moments

Condition (V3):   DV ≤ −g + bs

Representation:   π ∝ νG

Bound:   Gg ≤ V + g + 2b

Conclusion: π exists as a probability measure on X, and the steady-state mean of g is finite,

    π(g) := Σ_{x∈X} π(x)g(x) ≤ b
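Continuing the previous sketch, the conclusion π(g) ≤ b can also be checked directly (same illustrative choices of V, g, s, and b).

```python
# Sketch: check pi(g) <= b for the truncated M/M/1 chain under the (V3) drift condition.
import numpy as np
from scipy.linalg import null_space

alpha_rate, mu, N = 0.25, 1.0, 40
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

R = np.linalg.inv(np.eye(N + 1) - Q)
s = R[:, 0].copy()
x = np.arange(N + 1, dtype=float)
V, g = x ** 2, 1.0 + x
b = max(0.0, np.max((Q @ V + g) / s))       # DV <= -g + b s

pi = null_space(Q.T)[:, 0]; pi /= pi.sum()
print(pi @ g, b, pi @ g <= b)               # steady-state mean pi(g) is finite and <= b
```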

Example: M/M/1 Queue        ρ = α/µ

Linear Lyapunov function, V(x) = x:

    DV(x) = Σ_y Q(x, y) y
          = α(x + 1) + µ(x − 1) − (α + µ)x
          = −(µ − α),        x > 0

Conclusion: (V2) holds if and only if ρ < 1.

Example: M/M/1 Queue        ρ = α/µ

Quadratic Lyapunov function, V(x) = x², with g(x) = 1 + x:

    DV(x) = Σ_y Q(x, y) y²
          = α(x + 1)² + µ(x − 1)² − (α + µ)x²
          = α(x² + 2x + 1) + µ(x² − 2x + 1) − (α + µ)x²
          = −2(µ − α)x + α + µ,        x > 0

Conclusion: (V3) holds if and only if ρ < 1.
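The two drift computations are easy to confirm numerically away from the truncation boundary (illustrative rates with ρ < 1):

```python
# Sketch: check DV for V(x) = x and V(x) = x^2 at interior states of a truncated M/M/1 chain.
import numpy as np

alpha_rate, mu, N = 0.4, 1.0, 40
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

x = np.arange(N + 1, dtype=float)
interior = slice(1, N)                     # 0 < x < N, where the truncation does not matter

DV1 = Q @ x                                # drift of V(x) = x
print(np.allclose(DV1[interior], -(mu - alpha_rate)))

DV2 = Q @ (x ** 2)                         # drift of V(x) = x^2
print(np.allclose(DV2[interior], -2 * (mu - alpha_rate) * x[interior] + alpha_rate + mu))
```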

Example: O-U Model        dX(t) = AX(t) dt + B dW(t)

Suppose that P > 0 solves the Lyapunov equation

    PA + AᵀP = −I

Recall, for h quadratic, h(x) = ½ xᵀPx,

    ∇h(x) = Px,    ∇²h(x) = P,    Dh(x) = ½ xᵀ(PA + AᵀP)x + BᵀPB

Then (V3) follows from the identity

    Dh(x) = −½ ‖x‖² + σ²_X,        σ²_X = BᵀPB

The function h(x) = ½ xᵀPx solves Poisson's equation,

    Dh = −g + η,        g(x) = ½ ‖x‖²,    η = σ²_X
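A sketch of the O-U calculation, assuming the same illustrative pair (A, B) as earlier: solve the Lyapunov equation for P, confirm P > 0, and check the quadratic-form identity ½xᵀ(PA + AᵀP)x = −½‖x‖². The scipy solver and the choice of (A, B) are assumptions for illustration.

```python
# Sketch: solve P A + A^T P = -I, confirm P > 0, and check the quadratic-form identity.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 1.0],
              [ 0.0, -0.5]])              # illustrative Hurwitz matrix
B = np.array([[0.0],
              [1.0]])

# solve_continuous_lyapunov(a, q) solves a X + X a^H = q; taking a = A^T gives A^T P + P A = -I
P = solve_continuous_lyapunov(A.T, -np.eye(2))
print(np.allclose(P @ A + A.T @ P, -np.eye(2)))
print(np.all(np.linalg.eigvalsh(P) > 0))            # P > 0 since A is Hurwitz

rng = np.random.default_rng(1)
x = rng.standard_normal(2)
print(np.isclose(0.5 * x @ (P @ A + A.T @ P) @ x, -0.5 * x @ x))
sigma2_X = (B.T @ P @ B).item()                     # the constant B^T P B from the slide
print(sigma2_X)
```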

Poisson's Equation

Condition (V3):   DV ≤ −g + bs

Bound:   RGg ≤ V + 2b   =⇒   RGg(x) ≤ V(x) + 2b

Representation:   h = RGc̃ + constant,        c̃(x) = c(x) − η

Conclusion: If c is bounded by g, then h is bounded,

    h(x) ≤ V(x) + 2b

Example: M/M/1 Queue        ρ = α/µ

Poisson's equation with g(x) = x:

    Dh = −g + η

We have (V3) with V a quadratic function of x. Recall, with h(x) = x²,

    Dh(x) = −2(µ − α)x + α + µ,        x > 0

Solved with

    h(x) = ½ (x² + x)/(µ − α),        η = ρ/(1 − ρ)
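The closed-form solution can be checked by applying the generator row by row; only the truncation boundary row is excluded, since it differs from the infinite chain (rates are illustrative, with ρ < 1).

```python
# Sketch: verify h(x) = (x^2 + x)/(2(mu - alpha)), eta = rho/(1 - rho) for the M/M/1 queue.
import numpy as np

alpha_rate, mu, N = 0.4, 1.0, 40
rho = alpha_rate / mu
Q = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N: Q[x, x + 1] = alpha_rate
    if x > 0: Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

x = np.arange(N + 1, dtype=float)
h = 0.5 * (x ** 2 + x) / (mu - alpha_rate)
eta = rho / (1.0 - rho)
g = x                                        # cost g(x) = x

Dh = Q @ h
print(np.allclose(Dh[:N], -g[:N] + eta))     # Dh = -g + eta on rows 0, ..., N-1
```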

IV. Conclusions

Final words

Just as in linear systems theory, Lyapunov functions provide a characterization of system properties, as well as a practical verification tool.

Much is left out of this survey, in particular:

• Converse theory
• Limit theory
• Approximation techniques to construct Lyapunov functions or approximations to value functions
• Application to controlled Markov processes, and approximate dynamic programming


References

Guide to the references:
[1,4] ψ-Irreducible foundations
[2,11,12,13] Mean-field models, ODE models, and Lyapunov functions
[1,4,5,9,10] Operator-theoretic methods. See also the appendix of [2]
[3,6,7,10] Generators and continuous-time models

[1] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, Cambridge, second edition, 2009. Published in the Cambridge Mathematical Library.

[2] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007. Pre-publication edition online: http://black.csl.uiuc.edu/~meyn.

[3] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. John Wiley & Sons, New York, 1986.

[4] E. Nummelin. General Irreducible Markov Chains and Non-negative Operators. Cambridge University Press, Cambridge, 1984.

[5] S. P. Meyn and R. L. Tweedie. Generalized resolvents and Harris recurrence of Markov processes. Contemporary Mathematics, 149:227-250, 1993.

[6] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes. Adv. Appl. Probab., 25:518-548, 1993.

[7] D. Down, S. P. Meyn, and R. L. Tweedie. Exponential and uniform ergodicity of Markov processes. Ann. Probab., 23(4):1671-1691, 1995.

[8] P. W. Glynn and S. P. Meyn. A Liapounov bound for solutions of the Poisson equation. Ann. Probab., 24(2):916-931, 1996.

[9] I. Kontoyiannis and S. P. Meyn. Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab., 13:304-362, 2003. Presented at the INFORMS Applied Probability Conference, NYC, July 2001.

[10] I. Kontoyiannis and S. P. Meyn. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electron. J. Probab., 10(3):61-123 (electronic), 2005.

[11] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. Accepted for inclusion in the 48th IEEE Conference on Decision and Control, December 16-18, 2009.

[12] P. Mehta and S. Meyn. Q-learning and Pontryagin's Minimum Principle. Accepted for inclusion in the 48th IEEE Conference on Decision and Control, December 16-18, 2009.

[13] G. Fort, S. Meyn, E. Moulines, and P. Priouret. ODE methods for skip-free Markov chain stability with applications to MCMC. Ann. Appl. Probab., 18(2):664-707, 2008.

See also earlier seminal work by Hordijk, Tweedie, and others; full references in [1].
