Probability Theory and Stochastic Modelling
Volume 97
Editors-in-Chief
Peter W. Glynn, Stanford, CA, USA
Andreas E. Kyprianou, Bath, UK
Yves Le Jan, Orsay, France
Advisory Editors
Søren Asmussen, Aarhus, Denmark
Martin Hairer, Coventry, UK
Peter Jagers, Gothenburg, Sweden
Ioannis Karatzas, New York, NY, USA
Frank P. Kelly, Cambridge, UK
Bernt Øksendal, Oslo, Norway
George Papanicolaou, Stanford, CA, USA
Etienne Pardoux, Marseille, France
Edwin Perkins, Vancouver, Canada
Halil Mete Soner, Zürich, Switzerland
The Probability Theory and Stochastic Modelling series is a merger and continuation of Springer's two well-established series, Stochastic Modelling and Applied Probability and Probability and Its Applications. It publishes research monographs that make a significant contribution to probability theory or an applications domain in which advanced probability methods are fundamental. Books in this series are expected to follow rigorous mathematical standards, while also displaying the expository quality necessary to make them useful and accessible to advanced students as well as researchers. The series covers all aspects of modern probability theory including
• Gaussian processes
• Markov processes
• Random fields, point processes and random sets
• Random matrices
• Statistical mechanics and random media
• Stochastic analysis
as well as applications that include (but are not restricted to):
• Branching processes and other models of population growth
• Communications and processing networks
• Computational methods in probability and stochastic processes, including simulation
• Genetics and other stochastic models in biology and the life sciences
• Information theory, signal processing, and image synthesis
• Mathematical economics and finance
• Statistical methods (e.g. empirical processes, MCMC)
• Statistics for stochastic processes
• Stochastic control
• Stochastic models in operations research and stochastic optimization
• Stochastic models in the physical sciences
More information about this series at http://www.springer.com/series/13205
Alexey Piunovskiy • Yi Zhang
Continuous-Time Markov Decision Processes
Borel Space Models and General Control Strategies
Foreword by Albert Nikolaevich Shiryaev
Alexey Piunovskiy
Department of Mathematical Sciences
University of Liverpool
Liverpool, UK
Yi Zhang
Department of Mathematical Sciences
University of Liverpool
Liverpool, UK
ISSN 2199-3130 ISSN 2199-3149 (electronic)
Probability Theory and Stochastic Modelling
ISBN 978-3-030-54986-2 ISBN 978-3-030-54987-9 (eBook)
https://doi.org/10.1007/978-3-030-54987-9
Mathematics Subject Classification: 90C40, 60J76, 62L10, 90C05, 90C29, 90C39, 90C46, 93C27, 93E20
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Nothing is as useful as good theory—investigation of challenging real-life problems produces profound theories.
Guy Soulsby
Foreword
This monograph presents a systematic and modern treatment of continuous-time Markov decision processes. One can view the latter as a special class of (stochastic) optimal control problems. Thus, it is not surprising that the traditional method of investigation is dynamic programming. Another method, sometimes termed the convex analytic approach, is via a reduction to linear programming, and is similar to and comparable with the weak formulation of optimal control problems. On the other hand, this class of problems possesses its own features that can be employed to derive interesting and meaningful results, such as its connection to discrete-time problems. It is this connection that accounts for many modern developments or complete solutions to otherwise delicate problems in this topic.
The authors are well-established researchers in the topic of this book, to which, in recent years, they have made important contributions. They are thus well positioned to compose this updated and timely presentation of the current state of the art of continuous-time Markov decision processes.
Turning to its content, this book presents three major methods of investigating continuous-time Markov decision processes: the dynamic programming approach, the linear programming approach and the method based on reduction to discrete-time problems. The performance criterion under primary consideration is the expected total cost, in addition to one chapter devoted to the long run average cost. Both the unconstrained and constrained versions of the optimal control problem are studied. The issue at the core is the sufficient class of control strategies. In terms of the technical level, in most cases, this book intends to present the results as generally as possible and under conditions as weak as possible. That is, the authors consider Borel models under a wide class of control strategies. This is not only for the sake of generality; indeed, it simultaneously covers both the traditional class of relaxed controls for continuous-time models and randomized controls in semi-Markov decision processes. The more relevant reason is perhaps that it paves the way for a rigorous treatment of more involved issues on the realizability or the implementability of the control strategies. In particular, mixed strategies, which were otherwise often introduced verbally, are now introduced as a subclass of control strategies, making use of an external space, an idea that can be ascribed to
the works of I. V. Girsanov, N. V. Krylov, A. V. Skorokhod and I. I. Gikhman and to the book Statistics of Random Processes by R. S. Liptser and myself for models of various degrees of generality, which was further developed by E. A. Feinberg for discrete-time problems.
The authors have made this book self-contained: all the statements in the main text are proved in detail, and the appendices contain all the necessary facts from mathematical analysis, applied probability and discrete-time Markov decision processes. Moreover, the authors present numerous solved real-life and academic examples, illustrating how the theory can be used in practice.
The selection of the material seems to be balanced. It is natural that many statements presented and proved in this monograph come from the authors themselves, but the rest come from other researchers, to reflect the progress made both in the west and in the east. Moreover, it contains several statements unpublished elsewhere. Finally, the bibliographical remarks also contain useful information. No doubt, active researchers (from the level of graduate students onward) in the fields of applied probability, statistics and operational research, and in particular, stochastic optimal control, as well as statistical decision theory and sequential analysis, will find this monograph useful and valuable. I can recommend this book to any of them.
Albert Nikolaevich Shiryaev
Steklov Mathematical Institute
Russian Academy of Sciences
Moscow, Russia
Preface
The study of continuous-time Markov decision processes dates back at least to the 1950s, shortly after that of its discrete-time analogue. Since then, the theory has rapidly developed and has found a large spectrum of applications to, for example, queueing systems, epidemiology and telecommunications. In this monograph, we present some recent developments on selected topics in the theory of continuous-time Markov decision processes.
Prior to this book, there have been monographs [106, 150, 197] solely devoted to the theory of continuous-time Markov decision processes. They all focus on models with a finite or denumerable state space, with [150] also discussing semi-Markov decision processes and featuring applications to queueing systems. We emphasize the word "solely" in the previous claim because, leaving aside those on controlled diffusion processes, there have also been important books on a more general class of controlled processes, see [46, 49], as well as the thesis [236]. These works are on piecewise deterministic processes and deal with problems without constraints. Here, we consider, in that language, piecewise constant processes in a Borel state space, but we pay special attention to problems with constraints and develop techniques tailored for our processes.
The authors of the books [106, 150, 197] followed a direct approach, in the sense that no reduction to discrete-time Markov decision processes is involved. Consequently, as far as the presentation is concerned, this approach has the desirable advantage of being self-contained. The main tool is the Dynkin formula, and so, to ensure that the class of functions of interest is in the domain of the extended generator of the controlled process, a weight function needs to be imposed on the cost and transition rates. In some parts of this book, we also present this approach and apply it to the study of constrained problems. Following the observation made in [230, 231, 264], we present necessary and sufficient conditions for the applicability of the Dynkin formula to the class of functions of interest. This hopefully leads to a clearer picture of what minimal conditions are needed for this approach to apply.
On the other hand, the main theme of this book is the reduction method for continuous-time Markov decision processes. When this method is applicable, it often allows one to deduce optimality results under more general conditions on the system primitives. Another advantage is that it allows one to make full use of results known for discrete-time Markov decision processes, and referring to recent results of this kind makes the present book an up-to-date treatment of continuous-time Markov decision processes.
In greater detail, a large part of this book is devoted to the justification of the reduction method and its application to problems with total (undiscounted) cost criteria. This performance criterion was rarely touched upon in [106, 150, 197]. Recently, a method for investigating the space of occupation measures for discrete-time Markov decision processes with total cost criteria has been described, see [61, 63]. The extension to continuous-time Markov decision processes with total cost criteria was carried out in [117, 185, 186]. Although the continuous-time Markov decision processes in [117, 185, 186] were all reduced to equivalent discrete-time Markov decision processes, leading to the same optimality results, different methods were pursued. In this book, we present in detail the method of [185, 186], because it is based on the introduction of a class of so-called Poisson-related strategies. This class of strategies is new to the context of continuous-time Markov decision processes. Its advantage is that such strategies are implementable, or realizable, in the sense that they induce action processes that are measurable. This realizability issue does not arise in discrete-time Markov decision processes, but is especially relevant to problems with constraints, where relaxed strategies often need to be considered for the sake of optimality. Although it has long been known that relaxed strategies induce action processes with complicated trajectories, in the context of continuous-time Markov decision processes it was [76] that drew special attention to this issue, and also constructed realizable optimal strategies, termed switching strategies, for discounted problems. Incidentally, a reduction method for discounted problems was developed in [76], which is also presented in this book. This method is different from the standard uniformization technique. Although it is not directly applicable when the discount factor is null, our works [117, 185, 186] were motivated by it.
A different reduction method was followed in [45, 49], where the induced discrete-time Markov decision process has a more complicated action space (in the form of some space of measurable mappings) than the original continuous-time Markov decision process. The reduction method presented in this book is different, as it induces a discrete-time Markov decision process with the same action space as the original problem in continuous time.
An outline of the material presented in this book follows. In Chap. 1, we describe the controlled processes and introduce the class of strategies of primary concern. We discuss their realizability and their sufficiency for problems with total cost criteria; the latter is achieved by investigating the detailed occupation measures. A series of examples of continuous-time Markov decision processes can be found in this chapter, illustrating the practical applications; many of them are solved either analytically or numerically in subsequent chapters. In Chap. 2, we provide conditions for the explosiveness or the non-explosiveness of the controlled process under a particular strategy or under all strategies simultaneously, and discuss the validity of Dynkin's formula. Chapters 3–5 are devoted to problems involving the discounted cost criteria, total undiscounted cost criteria and average cost criteria, respectively. The main tool used in Chap. 4, where Poisson-related strategies are introduced, is the reduction to discrete-time Markov decision processes. For the average cost criteria, extra conditions are imposed in a form based on how they are used in the reasoning of the proofs. Chapter 6 returns to the total cost model with a more general class of strategies. Chapter 7 is devoted to models with both gradual and impulsive control. Each chapter is supplemented with bibliographical remarks. Relevant facts about discrete-time Markov decision processes, as well as those from analysis and probability, together with selected technical proofs, are included in the appendices.
We hope that this monograph will be of interest to the research community in Applied Probability, Statistics, Operational Research/Management Science and Electrical Engineering, including both experts and postgraduate students. In this connection, we have made efforts to present all the proofs in detail. This book may also be used by "outsiders" if they focus on the solved examples. Readers are expected to have taken courses in real analysis, applied probability and stochastic models/processes. Basic knowledge of discrete-time Markov decision processes is also useful, though not essential, as all the necessary facts from these topics are included in the appendices.
Acknowledgements
We would like to take this opportunity to thank Oswaldo Costa, François Dufour, Eugene Feinberg, Xian-Ping Guo and Flora Spieksma for discussions, communications and collaborations on the topic of this book. We also thank our student, Xin Guo, who helped us with some tricks in using LaTeX. Finally, we are very grateful to Professor Albert Nikolaevich Shiryaev for the foreword and his consistent support.
Notations
The following notations are frequently used throughout this book.

For all constants $a, b \in [-\infty, \infty]$: $a \vee b = \max\{a, b\}$, $a \wedge b = \min\{a, b\}$, $b^+ := \max\{b, 0\}$ and $b^- := \max\{-b, 0\}$. The supremum/maximum (infimum/minimum) over the empty set is $-\infty$ ($+\infty$). $\mathbb{N} = \{1, 2, \dots\}$ is the set of natural numbers; $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. $\mathbb{Z}$ is the set of all integers. We often write $\mathbb{R}_+ = (0, \infty)$, $\mathbb{R}^0_+ = [0, \infty)$, $\bar{\mathbb{R}} = [-\infty, +\infty]$, $\bar{\mathbb{R}}_+ = (0, \infty]$, $\bar{\mathbb{R}}^0_+ = [0, \infty]$.

Given two sets $A$ and $B$, if $A$ is a subset of $B$, then we write $A \subseteq B$ or $A \subset B$ interchangeably. We denote by $A^c$ the complement of $A$. On a set $E$, if $\mathcal{E}_1$ and $\mathcal{E}_2$ are two $\sigma$-algebras (or $\sigma$-fields), then $\mathcal{E}_1 \vee \mathcal{E}_2$ denotes the smallest $\sigma$-algebra on $E$ containing the two $\sigma$-algebras $\mathcal{E}_1$ and $\mathcal{E}_2$.

Consider two measurable spaces $(E, \mathcal{E})$ and $(F, \mathcal{F})$. A mapping $X$ from $E$ to $F$ is said to be measurable (from $(E, \mathcal{E})$ to $(F, \mathcal{F})$) if $X^{-1}(\mathcal{F}) \subseteq \mathcal{E}$, i.e., for each $C \in \mathcal{F}$, its preimage $X^{-1}(C)$ with respect to $X$ belongs to $\mathcal{E}$.
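As an illustrative aside (not part of the book), the elementary conventions above have a direct computational reading; the helper names `vee`, `wedge`, `pos`, `neg`, `sup` and `inf` below are ours, a minimal sketch only:

```python
import math

def vee(a, b):
    """a ∨ b = max{a, b} on the extended reals."""
    return max(a, b)

def wedge(a, b):
    """a ∧ b = min{a, b} on the extended reals."""
    return min(a, b)

def pos(b):
    """b⁺ := max{b, 0}."""
    return max(b, 0.0)

def neg(b):
    """b⁻ := max{−b, 0}; note b = b⁺ − b⁻ and |b| = b⁺ + b⁻."""
    return max(-b, 0.0)

def sup(values):
    """Supremum of a finite collection; over the empty set it is −∞."""
    values = list(values)
    return max(values) if values else -math.inf

def inf(values):
    """Infimum of a finite collection; over the empty set it is +∞."""
    values = list(values)
    return min(values) if values else math.inf
```

For instance, `pos(-1.5)` is `0.0` and `neg(-1.5)` is `1.5`, recovering `-1.5` as their difference.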
By a measure $\mu$ on a measurable space $(E, \mathcal{E})$, we always mean an $\bar{\mathbb{R}}^0_+$-valued $\sigma$-additive function on the $\sigma$-algebra $\mathcal{E}$, taking the value $0$ at the empty set $\emptyset$. When there is no confusion regarding the underlying $\sigma$-algebra, we write "a measure on $E$" instead of "on $(E, \mathcal{E})$" or "on $\mathcal{E}$". If the singleton $\{x\}$ is a measurable subset of a measurable space $(E, \mathcal{E})$, then $\delta_x(\cdot)$ is the Dirac measure concentrated at $x$, and we call such distributions degenerate; $I\{\cdot\}$ is the indicator function.
Defined on an arbitrary measure space $(E, \mathcal{E}, \mu)$, an $\bar{\mathbb{R}}$-valued measurable function $f$ is called integrable if $\int_E |f(e)|\,\mu(de) < \infty$. (We shall use the notations for integrals such as $\int_E f(e)\,\mu(de)$ and $\int_E f(e)\,d\mu(e)$ interchangeably.) More generally, for each $1 \le p < \infty$, an $\bar{\mathbb{R}}$-valued measurable function $f$ is said to be $p$th integrable if $|f|^p$ is integrable. The space of $p$th integrable (real-valued) functions on the measure space $(E, \mathcal{E}, \mu)$ is denoted by $L^p(E, \mathcal{E}, \mu)$, where two functions in it are not distinguished if they differ only on a null set with respect to $\mu$. The space $L^p(E, \mathcal{E}, \mu)$ is a Banach space when it is endowed with the norm defined by $\left(\int_E |f(e)|^p\,\mu(de)\right)^{1/p}$ for each $f \in L^p(E, \mathcal{E}, \mu)$.
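As a hedged illustration (again not from the book), the $L^p$ norm takes a concrete form when $\mu$ is a measure with finite support, $\mu = \sum_i w_i \delta_{x_i}$, so that the integral becomes a weighted sum; the helper name `lp_norm` is hypothetical:

```python
def lp_norm(f, support_weights, p):
    """‖f‖_p for μ = Σ_i w_i δ_{x_i}: (Σ_i |f(x_i)|^p · w_i)^(1/p).

    support_weights: iterable of (x, w) pairs with weights w ≥ 0.
    """
    return sum(abs(f(x)) ** p * w for x, w in support_weights) ** (1.0 / p)
```

For example, with $\mu = \delta_3 + \delta_4$ and $f(x) = x$, the $L^2$ norm is $\sqrt{9 + 16} = 5$.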
Here is one more convention regarding integrals. Suppose $r$ is an $\bar{\mathbb{R}}$-valued measurable function on the product measure space
$$(E \times \mathbb{R},\ \mathcal{E} \times \mathcal{B}(\mathbb{R}),\ \mu(de) \times dt),$$
where $\mu$ is a measure on $E$, and $dt$ stands for the Lebesgue measure on $\mathbb{R}$. (By the way, the Lebesgue measure is also often denoted by Leb.) Then we understand the integral of $r$ with respect to $\mu(de) \times dt$ as
$$\int_{E \times \mathbb{R}} r(e,t)\,dt\,\mu(de) := \int_{E \times \mathbb{R}} r^+(e,t)\,dt\,\mu(de) - \int_{E \times \mathbb{R}} r^-(e,t)\,dt\,\mu(de), \qquad (1)$$
where $+\infty - \infty := +\infty$. When $\mu$ is a $\sigma$-finite measure on $E$, the Fubini–Tonelli theorem applies:
$$\int_{E \times \mathbb{R}} r(e,t)\,dt\,\mu(de) = \int_E \int_{\mathbb{R}} r^+(e,t)\,dt\,\mu(de) - \int_E \int_{\mathbb{R}} r^-(e,t)\,dt\,\mu(de).$$
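The arithmetic convention behind (1) can be sketched in code (an illustrative aside, not from the book): the integral of $r$ is the integral of $r^+$ minus that of $r^-$, both nonnegative and possibly $+\infty$, with the explicit rule $+\infty - \infty := +\infty$; the helper name `signed_integral` is ours:

```python
import math

def signed_integral(int_r_plus, int_r_minus):
    """Combine ∫r⁺ and ∫r⁻ (nonnegative, possibly +∞) into ∫r per convention (1)."""
    if int_r_plus == math.inf and int_r_minus == math.inf:
        # The convention +∞ − ∞ := +∞ resolves the otherwise undefined case.
        return math.inf
    return int_r_plus - int_r_minus
```

Note that in floating-point arithmetic `math.inf - math.inf` is NaN, which is exactly why the convention needs the explicit branch; all other cases (e.g. finite minus $+\infty$ giving $-\infty$) fall out of ordinary extended-real subtraction.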
By a Borel space is meant a measurable space that is isomorphic to a Borel subset of a Polish space (complete separable metric space). A topological Borel space is a topological space that is homeomorphic to a Borel subset of a Polish space, endowed with the relative topology. Thus, when talking about a Borel space, only the underlying $\sigma$-algebra is fixed, whereas when talking about a topological Borel space, the topology is fixed. For a topological space $E$, $\mathcal{B}(E)$ denotes its Borel $\sigma$-algebra, i.e., the $\sigma$-algebra on $E$ generated by all the open subsets of $E$. If $E$ is a topological Borel space, then the measurable space $(E, \mathcal{B}(E))$ is a Borel space. If $E$ is a Borel space without a fixed topology, we still typically denote its $\sigma$-algebra by $\mathcal{B}(E)$.
For a topological space $E$, we denote by $C(E)$ (respectively, $C^+(E)$) the space of bounded continuous (respectively, nonnegative bounded continuous) real-valued functions on $E$.
If $(E, \mathcal{E})$ is a measurable space, then we denote by $\mathcal{M}^F(E)$ (respectively, $\mathcal{M}^F_+(E)$) the space of finite signed measures (respectively, finite nonnegative measures) on $(E, \mathcal{E})$. Also, $\mathcal{M}_+(E)$ denotes the space of (possibly infinite) measures on $(E, \mathcal{E})$, and $\mathcal{P}(E)$ is the space of probability measures on $(E, \mathcal{E})$.

If $E$ is a topological space, then $\mathcal{M}^F(E)$ (respectively, $\mathcal{M}^F_+(E)$) is understood as the space of finite signed measures (respectively, finite nonnegative measures) on $(E, \mathcal{B}(E))$. Similarly, $\mathcal{M}_+(E)$ denotes the space of (possibly infinite) measures on $(E, \mathcal{B}(E))$, and $\mathcal{P}(E)$ is the space of probability measures on $(E, \mathcal{B}(E))$.
The abbreviation a.s. (respectively, i.i.d.) stands for "almost surely" (respectively, "independent identically distributed"). Expressions like "for almost all $s \in \mathbb{R}$" refer to the Lebesgue measure, unless stated otherwise. For an $(E, \mathcal{E})$-valued random variable $X$ on $(\Omega, \mathcal{F}, P)$, the assertion "a statement holds for $P$-almost all $X$" and the assertion "a statement holds for $PX^{-1}$-almost all $x \in E$" mean the same, where $PX^{-1}$ denotes the distribution of $X$ under $P$.
Throughout the main text (excluding the appendices), capital letters such as $H$ usually denote random elements, lower case letters such as $h$ denote arguments of functions and realizations of random variables, and spaces are denoted using bold fonts, e.g., $\mathbf{H}$.
In the rest of the book, we use the following abbreviations: CTMDP (respectively, DTMDP, ESMDP, SMDP) stands for continuous-time Markov decision process (respectively, discrete-time Markov decision process, exponential semi-Markov decision process, semi-Markov decision process). They will be recalled when they appear for the first time in the main text below.
Liverpool, UK
Alexey Piunovskiy
Yi Zhang
Contents
1 Description of CTMDPs and Preliminaries  1
  1.1 Description of the CTMDP  1
    1.1.1 Initial Data and Conventional Notations  1
    1.1.2 Informal Description  2
    1.1.3 Strategies, Strategic Measures, and Optimal Control Problems  3
    1.1.4 Realizable Strategies  17
    1.1.5 Instant Costs at the Jump Epochs  24
  1.2 Examples  28
    1.2.1 Queueing Systems  28
    1.2.2 The Freelancer Dilemma  30
    1.2.3 Epidemic Models  30
    1.2.4 Inventory Control  33
    1.2.5 Selling an Asset  33
    1.2.6 Power-Managed Systems  35
    1.2.7 Fragmentation Models  36
    1.2.8 Infrastructure Surveillance Models  37
    1.2.9 Preventive Maintenance  39
  1.3 Detailed Occupation Measures and Further Sufficient Classes of Strategies for Total Cost Problems  40
    1.3.1 Definitions and Notations  40
    1.3.2 Sufficiency of Markov π-Strategies  43
    1.3.3 Sufficiency of Markov Standard ξ-Strategies  49
    1.3.4 Counterexamples  52
    1.3.5 The Discounted Cost Model as a Special Case of Undiscounted  55
  1.4 Bibliographical Remarks  60
2 Selected Properties of Controlled Processes  63
  2.1 Transition Functions and the Markov Property  63
    2.1.1 Basic Definitions and Notations  64
    2.1.2 Construction of the Transition Function  65
    2.1.3 The Minimal (Nonnegative) Solution to the Kolmogorov Forward Equation  69
    2.1.4 Markov Property of the Controlled Process Under a Natural Markov Strategy  76
  2.2 Conditions for Non-explosiveness Under a Fixed Natural Markov Strategy  79
    2.2.1 The Nonhomogeneous Case  80
    2.2.2 The Homogeneous Case  93
    2.2.3 Possible Generalizations  95
    2.2.4 A Condition for Non-explosiveness Under All Strategies  97
    2.2.5 Direct Proof for Non-explosiveness Under All Strategies  102
  2.3 Examples  109
    2.3.1 Birth-and-Death Processes  109
    2.3.2 The Gaussian Model  112
    2.3.3 The Fragmentation Model  114
    2.3.4 The Infrastructure Surveillance Model  115
  2.4 Dynkin's Formula  117
    2.4.1 Preliminaries  117
    2.4.2 Non-explosiveness and Dynkin's Formula  130
    2.4.3 Dynkin's Formula Under All π-Strategies  133
    2.4.4 Example  140
  2.5 Bibliographical Remarks  142
3 The Discounted Cost Model  145
  3.1 The Unconstrained Problem  146
    3.1.1 The Optimality Equation  147
    3.1.2 Dynamic Programming and Dual Linear Programs  151
  3.2 The Constrained Problem  159
    3.2.1 Properties of the Total Occupation Measures  160
    3.2.2 The Primal Linear Program and Its Solvability  167
    3.2.3 Comparison of the Convex Analytic and Dynamic Programming Approaches  173
    3.2.4 Duality  174
    3.2.5 The Space of Performance Vectors  183
  3.3 Examples  192
    3.3.1 A Queuing System  192
    3.3.2 A Birth-and-Death Process  198
  3.4 Bibliographical Remarks  199
4 Reduction to DTMDP: The Total Cost Model  201
  4.1 Poisson-Related Strategies  201
  4.2 Reduction to DTMDP  225
    4.2.1 Description of the Concerned DTMDP  225
    4.2.2 Selected Results of the Reduction to DTMDP  233
    4.2.3 Examples  242
    4.2.4 Models with Strongly Positive Intensities  251
    4.2.5 Example: Preventive Maintenance  258
  4.3 Bibliographical Remarks  261
5 The Average Cost Model  263
  5.1 Unconstrained Problems  264
    5.1.1 Unconstrained Problem: Nonnegative Cost  264
    5.1.2 Unconstrained Problem: Weight Function  281
  5.2 Constrained Problems  293
    5.2.1 The Primal Linear Program and Its Solvability  293
    5.2.2 Duality  302
    5.2.3 The Space of Performance Vectors  307
    5.2.4 Denumerable and Finite Models  314
  5.3 Examples  319
    5.3.1 The Gaussian Model  319
    5.3.2 The Freelancer Dilemma  322
  5.4 Bibliographical Remarks  334
6 The Total Cost Model: General Case  337
  6.1 Description of the General Total Cost Model  337
    6.1.1 Generalized Control Strategies and Their Strategic Measures  337
    6.1.2 Subclasses of Strategies  342
  6.2 Detailed Occupation Measures and Sufficient Classes of Strategies  345
    6.2.1 Detailed Occupation Measures  346
    6.2.2 Sufficient Classes of Strategies  348
    6.2.3 Counterexamples  357
  6.3 Reduction to DTMDP  361
  6.4 Mixtures of Strategies and Convexity of Spaces of Strategic and Occupation Measures  367
    6.4.1 Properties of Strategic Measures  368
    6.4.2 Properties of Occupation Measures  384
  6.5 Example: Utilization of an Unreliable Device  388
  6.6 Realizable Strategies  395
  6.7 Bibliographical Remarks  400
7 Gradual-Impulsive Control Models  403
  7.1 The Total Cost Model and Reduction  403
    7.1.1 System Primitives  403
    7.1.2 Total Cost Gradual-Impulsive Control Problems  405
    7.1.3 Reduction to CTMDP Model with Gradual Control  410
  7.2 Example: An Epidemic with Carriers  425
    7.2.1 Problem Statement  425
    7.2.2 General Plan  428
    7.2.3 Optimal Solution to the Associated DTMDP Problem  431
    7.2.4 The Optimal Solution to the Original Gradual-Impulsive Control Problem  437
  7.3 The Discounted Cost Model  441
    7.3.1 α-Discounted Gradual-Impulsive Control Problems  441
    7.3.2 Reduction to DTMDP with Total Cost  446
    7.3.3 The Dynamic Programming Approach  448
    7.3.4 Example: The Inventory Model  452
  7.4 Bibliographical Remarks  470
Appendix A: Miscellaneous Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
Appendix B: Relevant Definitions and Facts . . . . . . . . . . . . . . . . . . . . . . . 505
Appendix C: Definitions and Facts about Discrete-Time Markov Decision Processes . . . . . 549
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
Notation
A    Action space    1, 549
Ā    Action space in the DTMDP describing the gradual-impulsive model    405
ā = (c̄, b̄, ρ)    Actions in the DTMDP describing the gradual-impulsive model    406
A^G    Space of gradual controls (actions)    404
A^I    Space of impulsive controls (actions)    404
A(x)    Set of admissible actions    5, 156, 307, 551
A(t)    Action process    343, 398
B_f(X)    Space of all f-bounded measurable functions on X    119, 527
b    Impulsive action (control)    404
B̄_n (b̄_n)    Random (realized) impulsive action    407
{B_i}_{i=1}^∞, {A_i}_{i=1}^∞    Controlling process in DTMDP    226, 362, 550
C̄_n (c̄_n)    Random (realized) planned time until the next impulse    407
c_j(x, a), c^G_j(x, a)    Cost rates    2, 404
c^I_j(x, a)    Cost functions    404
C(x, a)    Lump sum cost    24
Co    (Positive) cone    175, 540
Co*    Dual cone    175, 540
d_j    Constraint constants    11, 146, 263, 409, 554
D    Total occupation measures in DTMDP    562
D^S    Sequences of detailed occupation measures    41
D^{ReM}    Sequences of detailed occupation measures generated by Markov π-strategies    41
D^{RaM}    Sequences of detailed occupation measures generated by Markov standard ξ-strategies    41
D^P_ξ    Sequences of detailed occupation measures generated by Poisson-related strategies    205
D_ξ    Sequences of detailed occupation measures generated by generalized standard ξ-strategies    385
D^{det M}    Sequences of detailed occupation measures generated by mixtures of simple deterministic Markov strategies    386
D^{det}    Sequences of detailed occupation measures generated by mixtures of deterministic generalized standard ξ-strategies    387
D^M    Sequences of detailed occupation measures generated by mixtures of Markov standard ξ-strategies    387
D^t    Collection of all total normalized occupation measures    159
D^{av}    Collection of all stable probability measures    294
E^S_γ, E^S_x    Expectations with respect to P^S_γ(dx), P^S_x(dx)    10, 341
Ê^S_γ    Expectation with respect to P̂^S_γ    57
E^σ_γ, E^{σ,α}_γ, E^σ_x    Expectations with respect to P^σ_γ, P^{σ,α}_γ, P^σ_x    226, 252, 551
E^q_x    Expectation with respect to the transition (probability) function generated by the Q-function q    117
Ē^σ_{x_0}    Expectation with respect to P̄^σ_{x_0}    409
F̄_{n+1}(h̄_n, c̄, b̄)    Relaxed control in the gradual-impulsive model    407
{F_t}_{t≥0}, {G_t}_{t≥0}    Filtrations    4, 339, 399, 544
ḡ    Lagrange multipliers    181, 303, 542
G_n, G^ξ_n    Conditional distribution of the sojourn time and the post-jump state    9, 202, 203, 340, 376
G̃    Transition probability in the α-jump chain    82
h_α(x), h(x)    Relative value function    268
(h, g, φ)    Canonical triplet    284
H_n (h_n)    Random (realized) finite history    4, 338
H̄_n (h̄_n)    Random (realized) finite history in the gradual-impulsive model    407
𝐇_n    Space of histories    4, 338
inf(P), inf(P^{av})    Value of the Primal Linear Program    179, 303, 541
inf(P^c), inf(P^c_{av})    Value of the Primal Convex Program    179, 304, 542
J    Number of constraints    11, 146, 263, 409, 554
K, 𝐊    Space of admissible state-action pairs    156, 307, 551
L, L^{av}    Lagrangian    178, 304, 542
l_j(x, a)    Cost functions in DTMDP    226, 252, 362, 549
l̄_j((h, x), ā, (t, y)), l̄^α_j((h, x), ā, (t, y))    Cost functions in the DTMDP describing the gradual-impulsive model    406, 441
M^{GO}    CTMDP model with gradual control only    410
m_α    Infimum of the discounted cost    268
m^{S,α}_{γ,n}(dx × da)    Detailed occupation measure    40, 205, 346
M^σ_γ(dx × da)    Total occupation measure in DTMDP    561
M^σ_{γ,n}(dx × da)    Detailed occupation measure in DTMDP    562
O, Õ, O^{av}_A(·)    Space of performance vectors    184, 186, 309
p^M, ϖ^M    Markov standard ξ-strategy    6, 345
p(s, x; t, dy), p_q(s, x; t, dy)    Transition function    64, 65
p_q(t, dy), p_q(x; t, dy)    (Homogeneous) transition function    76, 482
p̃_q(s, x; t, dy)    Transition probability function generated by p_q(s, x; t, dy)    79
p̃_{n,k}(da|x)    Element of a Poisson-related strategy    202
p(dy|x, a)    Transition probability    226, 362, 376
p_α(dy|x, a)    Transition probability    252
p̄(dt × dy|(h, x), ā), p̄_α(dt × dy|(h, x), ā)    Transition probability in the DTMDP describing the gradual-impulsive model    406, 441
Pr, P̂r    Predictable σ-algebras    5, 339, 399
P_D    Space of strategic measures in the associated DTMDP    377
    1-1 correspondence between strategic measures    377
P^S_γ(dx), P^S_x(dx)    Strategic measure    9, 341
P̂^S_γ    Strategic measure in the “hat” model with killing    56
P̄^σ_{x_0}    Strategic measure in the DTMDP describing the gradual-impulsive model    409
P̃_{(v,x)}    Probability on the trajectories of the α-jump chain    83
P^S_γ(t; dy × da), P^S_γ(t; dy)    Marginal distribution    10, 341
P^σ_γ, P^{σ,α}_γ, P^σ_x    Strategic measures in DTMDP    226, 252, 551
Par(E)    Pareto set    537
q(dy|x, a), q({j}|i, a), q^{GO}(dy|x, a)    Transition rate    1, 2, 404, 411
q(dy|x, s)    Q-function    8
q^f(dy|x, s)    f-transformed Q-function    118
q_x(a), q_x(π), q_x(s), q_x(ρ_t)    Jump intensity    1, 8, 404, 405
q_x(ξ, π, s)    Jump intensity under a generalized π-ξ-strategy    340
q̄_x    Supremum of the jump intensity    2, 404
q̃(dy|x, a), q̃(dy|x, π), q̃(dy|x, s), q̃^{GO}(dy|x, a)    Post-jump measures    1, 7, 8, 404, 411
q̃(dy|x, ρ_t)    Post-jump measure    405
q̃_n(dy|x, s)    Post-jump measure under a Poisson-related strategy    203
q̃(dy|x, ξ, π, s)    Post-jump measure under a generalized π-ξ-strategy    340
Q(dy|x, b)    Post-impulse state distribution    404
R(A^G)    Collection of relaxed controls (P(A^G)-valued mappings)    404
S = {S_n}_{n=1}^∞    Control strategy    5
S = {Ξ, p_0, (p_n, π_n)}_{n=1}^∞    Generalized control strategy    339
S*    (Uniformly) optimal strategy    11
S^P    Poisson-related strategy    202
S    Set of all strategies    7
𝐒    Set of all generalized π-ξ-strategies    339
S^π, 𝐒^π    Set of all π-strategies    7, 343
S^ϖ, 𝐒^ξ    Set of all ξ-strategies    7, 343
S^{Gξ}    Set of all generalized standard ξ-strategies    345
S^{DS}    Set of all deterministic stationary strategies    7
S^{Mπ}, 𝐒^{Mπ} (S^{Mϖ}, 𝐒^{Mξ})    Set of all Markov π-strategies (Markov standard ξ-strategies)    7, 345
S^{sξ}    Set of all stationary standard ξ-strategies    345
𝐒^P, 𝐒^P_e    Set of all Poisson-related strategies    202
S^{stable}    Set of all stable strategies    294
sup(D), sup(D^{av})    Value of the Dual Linear Program    179, 304, 541
sup(D^c), sup(D^c_{av})    Value of the Dual Convex Program    179, 304, 542
T_n (t_n)    Random (realized) jump moment    4, 338
[0, T_∞)    Time horizon    5, 12
U, U′    Adjoint mapping    176, 303
w(t, x), w(x)    Lyapunov function    80, 90, 94, 98
w̃(x), w̃′(x), w(x), w′(x)    Lyapunov functions    133, 136, 173
W^α_j(S, γ), W^α_j(S, x)    Expected total α-discounted costs    11, 41
W_j(S, γ), W_j(S, x)    Long-run average cost    12, 263
W̄_j(σ, x_0), W̄^α_j(σ, x_0)    Performance functionals in the gradual-impulsive model    409, 441
W^{DT}_0(σ, x)    Performance functional in DTMDP    553
W^α_0(x)    Value (Bellman) function    11, 146
W^{DT}_0(x), W^{DT,b}_0(x)    Value (Bellman) function in DTMDP    554, 559
W̃    Performance vector    184
X    State space    1, 403, 549
X̄    State space in the DTMDP describing the gradual-impulsive model    405
X_∞    Extended state space    3, 338
X̃    State space in the α-jump chain    82
X_Δ    State space excluding the cemetery    42
X_d    State space on which the f-transformed Q-function q^f is defined    118
X_n (x_n)    Random (realized) post-jump state of the controlled process    4, 338
X̄_n (x̄_n)    Random (realized) state in the DTMDP describing the gradual-impulsive model    407
x_∞    Artificial isolated point    3, 338
X(t)    Controlled process    5, 339
(X, Y)    Dual pair    540
{Y_i}_{i=0}^∞, {X_i}_{i=0}^∞    Controlled process in DTMDP    226, 362, 550
α    Discount factor    11
γ(dx)    Initial distribution    2, 549
Δ    Cemetery    5, 41
η(dx × da)    Stable measure    293
η^{S,α}_γ(dx × da)    α-discounted total occupation measure    159, 346
η^{S,0}_γ(dx × da)    Total occupation measure    241
Θ_n (θ_n)    Random (realized) sojourn time    4, 338
μ(ω; Γ_R × Γ_X), μ̃(ω; Γ_R × Γ_X)    Random measures    4, 99
μ(ω; Γ_R × Γ_Ξ × Γ_X)    Random measure    339
μ(ω; Γ_R × Γ_X × Γ_Ξ)    Random measure    399
(μ, φ^{(0)}, φ^{(1)})    μ-deterministic stationary strategy in the gradual-impulsive model    408
ν    Compensator of μ    10, 341, 474
ν̃    Compensator of μ̃    476
ν(db), ν(k)    Weights distribution    368, 371
π_n(da|h_{n−1}, s)    Relaxed control    5
π̄(da|x, t)    Natural Markov strategy    6
π^M    Markov π-strategy    6, 344
π_s(da|x)    Stationary π-strategy    7, 344
Π(ω, t)    P(A)-valued predictable process    6, 341
ϖ_n(da|h_{n−1})    Randomized control    5
ϖ^M, p^M    Markov standard ξ-strategy    6, 345
ϖ_s(da|x), p_s(da|x)    Stationary standard ξ-strategy    7, 345
ρ_t(da)    Relaxed control    404
σ = {σ_n}_{n=1}^∞ = {σ^{(0)}_n, F̄_n}_{n=1}^∞    Strategy in the gradual-impulsive model    407
{σ^{(0)}_n, σ^{(1)}_n}_{n=1}^∞ = {σ^{(0)}_n, φ̃_n}_{n=1}^∞    Markov standard ξ-strategy in the gradual-impulsive model    409
σ_C    Hitting time in the α-jump chain    83
σ    Control strategy in DTMDP    550
σ^M, σ_s(da|x)    Markov, stationary strategy in DTMDP    550
Σ    Set of all strategies in DTMDP    226, 550
Σ^S    Set of all stationary strategies in DTMDP    550
Σ^{DM}    Set of all deterministic Markov strategies in DTMDP    550
Σ^{DS}    Set of all deterministic stationary strategies in DTMDP    550
Σ^{GI}    Set of all strategies in the gradual-impulsive CTMDP    407
τ_C    Return time in the α-jump chain    83
τ_{weak}, τ(X, Y)    Weak topology    527, 540
φ(x), φ_s(x)    Deterministic stationary strategy    7, 344
φ(x)    Deterministic stationary strategy in DTMDP    550
φ    Simple deterministic Markov strategy    344
φ̃(da|x)    Stationary randomized gradual control in the gradual-impulsive model    409
φ^{(0)} (φ^{(1)})    Impulsive (gradual) component of a μ-deterministic stationary strategy    408
Ξ    Artificial space for constructing strategic measures    202, 337
Ξ_∞    Extended artificial space    338
ξ_∞    Additional artificial point    338
ξ_n = (s_{n0}, a_{n0}, s_{n1}, a_{n1}, . . .)    Artificial point in the case of a Poisson-related strategy    351
Ξ_n (ξ_n)    Random (realized) artificial point    338
(Ω, F)    Sample space    4, 338, 344
(Ω, B(Ω))    Sample space in DTMDP    550
Ω̂    Sample space in the “hat” model with killing    56
Ω̄    Sample space in the gradual-impulsive model    409
θ    Cemetery in a DTMDP    252, 565
⟨·, ·⟩    Bilinear form, scalar product    175, 186, 540