Probability Theory and Stochastic Modelling
Volume 97
Editors-in-Chief
Peter W. Glynn, Stanford, CA, USA
Andreas E. Kyprianou, Bath, UK
Yves Le Jan, Orsay, France
Advisory Editors
Søren Asmussen, Aarhus, Denmark
Martin Hairer, Coventry, UK
Peter Jagers, Gothenburg, Sweden
Ioannis Karatzas, New York, NY, USA
Frank P. Kelly, Cambridge, UK
Bernt Øksendal, Oslo, Norway
George Papanicolaou, Stanford, CA, USA
Etienne Pardoux, Marseille, France
Edwin Perkins, Vancouver, Canada
Halil Mete Soner, Zürich, Switzerland
The Probability Theory and Stochastic Modelling series is a merger and continuation of Springer's two well-established series, Stochastic Modelling and Applied Probability and Probability and Its Applications. It publishes research monographs that make a significant contribution to probability theory or an applications domain in which advanced probability methods are fundamental. Books in this series are expected to follow rigorous mathematical standards, while also displaying the expository quality necessary to make them useful and accessible to advanced students as well as researchers. The series covers all aspects of modern probability theory including
• Gaussian processes
• Markov processes
• Random fields, point processes and random sets
• Random matrices
• Statistical mechanics and random media
• Stochastic analysis
as well as applications that include (but are not restricted to):
• Branching processes and other models of population growth
• Communications and processing networks
• Computational methods in probability and stochastic processes, including simulation
• Genetics and other stochastic models in biology and the life sciences
• Information theory, signal processing, and image synthesis
• Mathematical economics and finance
• Statistical methods (e.g. empirical processes, MCMC)
• Statistics for stochastic processes
• Stochastic control
• Stochastic models in operations research and stochastic optimization
• Stochastic models in the physical sciences
More information about this series at http://www.springer.com/series/13205
Alexey Piunovskiy • Yi Zhang
Continuous-Time Markov Decision Processes
Borel Space Models and General Control Strategies
Foreword by Albert Nikolaevich Shiryaev
Alexey Piunovskiy
Department of Mathematical Sciences
University of Liverpool
Liverpool, UK
Yi Zhang
Department of Mathematical Sciences
University of Liverpool
Liverpool, UK
ISSN 2199-3130 ISSN 2199-3149 (electronic)
Probability Theory and Stochastic Modelling
ISBN 978-3-030-54986-2 ISBN 978-3-030-54987-9 (eBook)
https://doi.org/10.1007/978-3-030-54987-9
Mathematics Subject Classification: 90C40, 60J76, 62L10, 90C05, 90C29, 90C39, 90C46, 93C27, 93E20
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Nothing is as useful as good theory—investigation of challenging real-life problems produces profound theories.
Guy Soulsby
Foreword
This monograph presents a systematic and modern treatment of continuous-time Markov decision processes. One can view the latter as a special class of (stochastic) optimal control problems. Thus, it is not surprising that the traditional method of investigation is dynamic programming. Another method, sometimes termed the convex analytic approach, is via a reduction to linear programming, and is similar to and comparable with the weak formulation of optimal control problems. On the other hand, this class of problems possesses its own features that can be employed to derive interesting and meaningful results, such as its connection to discrete-time problems. It is this connection that accounts for many modern developments or complete solutions to otherwise delicate problems in this topic.
The authors are well-established researchers in the topic of this book, to which, in recent years, they have made important contributions. They are thus well positioned to compose this updated and timely presentation of the current state of the art of continuous-time Markov decision processes.
Turning to its content, this book presents three major methods of investigating continuous-time Markov decision processes: the dynamic programming approach, the linear programming approach and the method based on reduction to discrete-time problems. The performance criterion under primary consideration is the expected total cost, in addition to one chapter devoted to the long run average cost. Both the unconstrained and constrained versions of the optimal control problem are studied. The issue at the core is the sufficient class of control strategies. In terms of the technical level, in most cases, this book intends to present the results as generally as possible and under conditions as weak as possible. That is, the authors consider Borel models under a wide class of control strategies. This is not only for the sake of generality; indeed, it simultaneously covers both the traditional class of relaxed controls for continuous-time models and randomized controls in semi-Markov decision processes. The more relevant reason is perhaps that it paves the way for a rigorous treatment of more involved issues on the realizability or the implementability of the control strategies. In particular, mixed strategies, which were otherwise often introduced verbally, are now introduced as a subclass of control strategies, making use of an external space, an idea that can be ascribed to
the works of I. V. Girsanov, N. V. Krylov, A. V. Skorokhod and I. I. Gikhman and to the book Statistics of Random Processes by R. S. Liptser and myself for models of various degrees of generality, which was further developed by E. A. Feinberg for discrete-time problems.
The authors have made this book self-contained: all the statements in the main text are proved in detail, and the appendices contain all the necessary facts from mathematical analysis, applied probability and discrete-time Markov decision processes. Moreover, the authors present numerous solved real-life and academic examples, illustrating how the theory can be used in practice.
The selection of the material seems to be balanced. It is natural that many statements presented and proved in this monograph come from the authors themselves, but the rest come from other researchers, to reflect the progress made both in the west and in the east. Moreover, it contains several statements unpublished elsewhere. Finally, the bibliographical remarks also contain useful information. No doubt, active researchers (from the level of graduate students onward) in the fields of applied probability, statistics and operational research, and in particular, stochastic optimal control, as well as statistical decision theory and sequential analysis, will find this monograph useful and valuable. I can recommend this book to any of them.
Albert Nikolaevich Shiryaev
Steklov Mathematical Institute
Russian Academy of Sciences
Moscow, Russia
Preface
The study of continuous-time Markov decision processes dates back at least to the 1950s, shortly after that of its discrete-time analogue. Since then, the theory has rapidly developed and has found a large spectrum of applications to, for example, queueing systems, epidemiology and telecommunications. In this monograph, we present some recent developments on selected topics in the theory of continuous-time Markov decision processes.
Prior to this book, there have been monographs [106, 150, 197] solely devoted to the theory of continuous-time Markov decision processes. They all focus on models with a finite or denumerable state space, with [150] also discussing semi-Markov decision processes and featuring applications to queueing systems. We emphasize the word "solely" in the previous claim because, leaving aside those on controlled diffusion processes, there have also been important books on a more general class of controlled processes, see [46, 49], as well as the thesis [236]. These works are on piecewise deterministic processes and deal with problems without constraints. Here, we consider, in that language, piecewise constant processes in a Borel state space, but we pay special attention to problems with constraints and develop techniques tailored for our processes.
The authors of the books [106, 150, 197] followed a direct approach, in the sense that no reduction to discrete-time Markov decision processes is involved. Consequently, as far as the presentation is concerned, this approach has the desirable advantage of being self-contained. The main tool is the Dynkin formula, and so, to ensure that the class of functions of interest is in the domain of the extended generator of the controlled process, a weight function needs to be imposed on the cost and transition rates. In some parts of this book, we also present this approach and apply it to the study of constrained problems. Following the observation made in [230, 231, 264], we present necessary and sufficient conditions for the applicability of the Dynkin formula to the class of functions of interest. This hopefully leads to a clearer picture of what minimal conditions are needed for this approach to apply.
On the other hand, the main theme of this book is the reduction method for continuous-time Markov decision processes. When this method is applicable, it often allows one to deduce optimality results under more general conditions on the system primitives. Another advantage is that it allows one to make full use of results known for discrete-time Markov decision processes, and referring to recent results of this kind makes the present book an up-to-date treatment of continuous-time Markov decision processes.
In greater detail, a large part of this book is devoted to the justification of the reduction method and its application to problems with total (undiscounted) cost criteria. This performance criterion was rarely touched upon in [106, 150, 197]. Recently, a method for investigating the space of occupation measures for discrete-time Markov decision processes with total cost criteria has been described, see [61, 63]. The extension to continuous-time Markov decision processes with total cost criteria was carried out in [117, 185, 186]. Although the continuous-time Markov decision processes in [117, 185, 186] were all reduced to equivalent discrete-time Markov decision processes, leading to the same optimality results, different methods were pursued. In this book, we present in detail the method of [185, 186], because it is based on the introduction of a class of so-called Poisson-related strategies. This class of strategies is new to the context of continuous-time Markov decision processes. Its advantage is that such strategies are implementable, or realizable, in the sense that they induce action processes that are measurable. This realizability issue does not arise in discrete-time Markov decision processes, but is especially relevant to problems with constraints, where relaxed strategies often need to be considered for the sake of optimality. Although it has long been known that relaxed strategies induce action processes with complicated trajectories, in the context of continuous-time Markov decision processes it was [76] that drew special attention to this issue, and also constructed realizable optimal strategies, termed switching strategies, for discounted problems. Incidentally, a reduction method for discounted problems was developed in [76], which is also presented in this book. This method is different from the standard uniformization technique. Although it is not directly applicable when the discount factor is null, our works [117, 185, 186] were motivated by it.
A different reduction method was followed in [45, 49], where the induced discrete-time Markov decision process has a more complicated action space (in the form of some space of measurable mappings) than the original continuous-time Markov decision process. The reduction method presented in this book is different, as it induces a discrete-time Markov decision process with the same action space as the original problem in continuous time.
An outline of the material presented in this book follows. In Chap. 1, we describe the controlled processes and introduce the class of strategies of primary concern. We discuss their realizability and their sufficiency for problems with total cost criteria; the latter is achieved by investigating the detailed occupation measures. A series of examples of continuous-time Markov decision processes can be found in this chapter, illustrating the practical applications; many of them are solved either analytically or numerically in subsequent chapters. In Chap. 2, we provide conditions for the explosiveness or the non-explosiveness of the controlled process under a particular strategy or under all strategies simultaneously, and discuss the validity of Dynkin's formula. Chapters 3–5 are devoted to problems involving the discounted cost criteria, total undiscounted cost criteria and average cost criteria, respectively. The main tool used in Chap. 4, where Poisson-related strategies are introduced, is the reduction to discrete-time Markov decision processes. For the average cost criteria, extra conditions are imposed in a form based on how they are used in the reasoning of the proofs. Chapter 6 returns to the total cost model with a more general class of strategies. Chapter 7 is devoted to models with both gradual and impulsive control. Each chapter is supplemented with bibliographical remarks. Relevant facts about discrete-time Markov decision processes, as well as those from analysis and probability, together with selected technical proofs, are included in the appendices.
We hope that this monograph will be of interest to the research community in Applied Probability, Statistics, Operational Research/Management Science and Electrical Engineering, including both experts and postgraduate students. In this connection, we have made efforts to present all the proofs in detail. This book may also be used by "outsiders" if they focus on the solved examples. Readers are expected to have taken courses in real analysis, applied probability and stochastic models/processes. Basic knowledge of discrete-time Markov decision processes is also useful, though not essential, as all the necessary facts from these topics are included in the appendices.
Acknowledgements
We would like to take this opportunity to thank Oswaldo Costa, François Dufour, Eugene Feinberg, Xian-Ping Guo and Flora Spieksma for discussions, communications and collaborations on the topic of this book. We also thank our student, Xin Guo, who helped us with some tricks in using LaTeX. Finally, we are very grateful to Professor Albert Nikolaevich Shiryaev for the foreword and his consistent support.
Notations
The following notations are frequently used throughout this book.

For all constants $a, b \in [-\infty, \infty]$: $a \vee b = \max\{a, b\}$, $a \wedge b = \min\{a, b\}$, $b^+ := \max\{b, 0\}$ and $b^- := \max\{-b, 0\}$. The supremum/maximum (infimum/minimum) over the empty set is $-\infty$ ($+\infty$). $\mathbb{N} = \{1, 2, \dots\}$ is the set of natural numbers; $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. $\mathbb{Z}$ is the set of all integers. We often write $\mathbb{R}_+ = (0, \infty)$, $\mathbb{R}^0_+ = [0, \infty)$, $\bar{\mathbb{R}} = [-\infty, +\infty]$, $\bar{\mathbb{R}}_+ = (0, \infty]$, $\bar{\mathbb{R}}^0_+ = [0, \infty]$.

Given two sets $A$ and $B$, if $A$ is a subset of $B$, then we write $A \subseteq B$ or $A \subset B$ interchangeably. We denote by $A^c$ the complement of $A$. On a set $E$, if $\mathcal{E}_1$ and $\mathcal{E}_2$ are two $\sigma$-algebras (or $\sigma$-fields), then $\mathcal{E}_1 \vee \mathcal{E}_2$ denotes the smallest $\sigma$-algebra on $E$ containing the two $\sigma$-algebras $\mathcal{E}_1$ and $\mathcal{E}_2$.

Consider two measurable spaces $(E, \mathcal{E})$ and $(F, \mathcal{F})$. A mapping $X$ from $E$ to $F$ is said to be measurable (from $(E, \mathcal{E})$ to $(F, \mathcal{F})$) if $X^{-1}(\mathcal{F}) \subseteq \mathcal{E}$, i.e., for each $C \in \mathcal{F}$, its preimage $X^{-1}(C)$ with respect to $X$ belongs to $\mathcal{E}$.
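As an illustrative aside (not part of the book), the elementary conventions above have a direct computational reading; the helper names `vee`, `wedge`, `pos`, `neg`, `sup` and `inf` below are ours, a minimal sketch only:

```python
import math

def vee(a, b):
    """a ∨ b = max{a, b} on the extended reals."""
    return max(a, b)

def wedge(a, b):
    """a ∧ b = min{a, b} on the extended reals."""
    return min(a, b)

def pos(b):
    """b⁺ := max{b, 0}."""
    return max(b, 0.0)

def neg(b):
    """b⁻ := max{−b, 0}; note b = b⁺ − b⁻ and |b| = b⁺ + b⁻."""
    return max(-b, 0.0)

def sup(values):
    """Supremum of a finite collection; over the empty set it is −∞."""
    values = list(values)
    return max(values) if values else -math.inf

def inf(values):
    """Infimum of a finite collection; over the empty set it is +∞."""
    values = list(values)
    return min(values) if values else math.inf
```

For instance, `pos(-1.5)` is `0.0` and `neg(-1.5)` is `1.5`, recovering `-1.5` as their difference.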
By a measure $\mu$ on a measurable space $(E, \mathcal{E})$, we always mean an $\bar{\mathbb{R}}^0_+$-valued $\sigma$-additive function on the $\sigma$-algebra $\mathcal{E}$, taking the value $0$ at the empty set $\emptyset$. When there is no confusion regarding the underlying $\sigma$-algebra, we write "a measure on $E$" instead of "on $(E, \mathcal{E})$" or "on $\mathcal{E}$". If the singleton $\{x\}$ is a measurable subset of a measurable space $(E, \mathcal{E})$, then $\delta_x(\cdot)$ is the Dirac measure concentrated at $x$, and we call such distributions degenerate; $I\{\cdot\}$ is the indicator function.
Defined on an arbitrary measure space $(E, \mathcal{E}, \mu)$, an $\bar{\mathbb{R}}$-valued measurable function $f$ is called integrable if $\int_E |f(e)|\,\mu(de) < \infty$. (We shall use the notations for integrals such as $\int_E f(e)\,\mu(de)$ and $\int_E f(e)\,d\mu(e)$ interchangeably.) More generally, for each $1 \le p < \infty$, an $\bar{\mathbb{R}}$-valued measurable function $f$ is said to be $p$th integrable if $|f|^p$ is integrable. The space of $p$th integrable (real-valued) functions on the measure space $(E, \mathcal{E}, \mu)$ is denoted by $L^p(E, \mathcal{E}, \mu)$, where two functions in it are not distinguished if they differ only on a null set with respect to $\mu$. The space $L^p(E, \mathcal{E}, \mu)$ is a Banach space when it is endowed with the norm defined by $\left(\int_E |f(e)|^p\,\mu(de)\right)^{1/p}$ for each $f \in L^p(E, \mathcal{E}, \mu)$.
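As a hedged illustration (again not from the book), the $L^p$ norm takes a concrete form when $\mu$ is a measure with finite support, $\mu = \sum_i w_i \delta_{x_i}$, so that the integral becomes a weighted sum; the helper name `lp_norm` is hypothetical:

```python
def lp_norm(f, support_weights, p):
    """‖f‖_p for μ = Σ_i w_i δ_{x_i}: (Σ_i |f(x_i)|^p · w_i)^(1/p).

    support_weights: iterable of (x, w) pairs with weights w ≥ 0.
    """
    return sum(abs(f(x)) ** p * w for x, w in support_weights) ** (1.0 / p)
```

For example, with $\mu = \delta_3 + \delta_4$ and $f(x) = x$, the $L^2$ norm is $\sqrt{9 + 16} = 5$.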
Here is one more convention regarding integrals. Suppose $r$ is an $\bar{\mathbb{R}}$-valued measurable function on the product measure space
$$(E \times \mathbb{R},\ \mathcal{E} \times \mathcal{B}(\mathbb{R}),\ \mu(de) \times dt),$$
where $\mu$ is a measure on $E$, and $dt$ stands for the Lebesgue measure on $\mathbb{R}$. (By the way, the Lebesgue measure is also often denoted by Leb.) Then we understand the integral of $r$ with respect to $\mu(de) \times dt$ as
$$\int_{E \times \mathbb{R}} r(e,t)\,dt\,\mu(de) := \int_{E \times \mathbb{R}} r^+(e,t)\,dt\,\mu(de) - \int_{E \times \mathbb{R}} r^-(e,t)\,dt\,\mu(de), \qquad (1)$$
where $+\infty - \infty := +\infty$. When $\mu$ is a $\sigma$-finite measure on $E$, the Fubini–Tonelli theorem applies:
$$\int_{E \times \mathbb{R}} r(e,t)\,dt\,\mu(de) = \int_E \int_{\mathbb{R}} r^+(e,t)\,dt\,\mu(de) - \int_E \int_{\mathbb{R}} r^-(e,t)\,dt\,\mu(de).$$
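The arithmetic convention behind (1) can be sketched in code (an illustrative aside, not from the book): the integral of $r$ is the integral of $r^+$ minus that of $r^-$, both nonnegative and possibly $+\infty$, with the explicit rule $+\infty - \infty := +\infty$; the helper name `signed_integral` is ours:

```python
import math

def signed_integral(int_r_plus, int_r_minus):
    """Combine ∫r⁺ and ∫r⁻ (nonnegative, possibly +∞) into ∫r per convention (1)."""
    if int_r_plus == math.inf and int_r_minus == math.inf:
        # The convention +∞ − ∞ := +∞ resolves the otherwise undefined case.
        return math.inf
    return int_r_plus - int_r_minus
```

Note that in floating-point arithmetic `math.inf - math.inf` is NaN, which is exactly why the convention needs the explicit branch; all other cases (e.g. finite minus $+\infty$ giving $-\infty$) fall out of ordinary extended-real subtraction.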
By a Borel space is meant a measurable space that is isomorphic to a Borel subset of a Polish space (complete separable metric space). A topological Borel space is a topological space that is homeomorphic to a Borel subset of a Polish space, endowed with the relative topology. Thus, when talking about a Borel space, only the underlying $\sigma$-algebra is fixed, whereas when talking about a topological Borel space, the topology is fixed. For a topological space $E$, $\mathcal{B}(E)$ denotes its Borel $\sigma$-algebra, i.e., the $\sigma$-algebra on $E$ generated by all the open subsets of $E$. If $E$ is a topological Borel space, then the measurable space $(E, \mathcal{B}(E))$ is a Borel space. If $E$ is a Borel space without a fixed topology, we still typically denote its $\sigma$-algebra by $\mathcal{B}(E)$.
For a topological space $E$, we denote by $C(E)$ (respectively, $C^+(E)$) the space of bounded continuous (respectively, nonnegative bounded continuous) real-valued functions on $E$.
If $(E, \mathcal{E})$ is a measurable space, then we denote by $\mathcal{M}^F(E)$ (respectively, $\mathcal{M}^F_+(E)$) the space of finite signed measures (respectively, finite nonnegative measures) on $(E, \mathcal{E})$. Also, $\mathcal{M}_+(E)$ denotes the space of (possibly infinite) measures on $(E, \mathcal{E})$, and $\mathcal{P}(E)$ is the space of probability measures on $(E, \mathcal{E})$.

If $E$ is a topological space, then $\mathcal{M}^F(E)$ (respectively, $\mathcal{M}^F_+(E)$) is understood as the space of finite signed measures (respectively, finite nonnegative measures) on $(E, \mathcal{B}(E))$. Similarly, $\mathcal{M}_+(E)$ denotes the space of (possibly infinite) measures on $(E, \mathcal{B}(E))$, and $\mathcal{P}(E)$ is the space of probability measures on $(E, \mathcal{B}(E))$.
The abbreviation a.s. (respectively, i.i.d.) stands for "almost surely" (respectively, "independent identically distributed"). Expressions like "for almost all $s \in \mathbb{R}$" refer to the Lebesgue measure, unless stated otherwise. For an $(E, \mathcal{E})$-valued random variable $X$ on $(\Omega, \mathcal{F}, P)$, the assertion "a statement holds for $P$-almost all $X$" and the assertion "a statement holds for $PX^{-1}$-almost all $x \in E$" mean the same, where $PX^{-1}$ denotes the distribution of $X$ under $P$.
Throughout the main text (excluding the appendices), capital letters such as $H$ usually denote random elements, lower case letters such as $h$ denote arguments of functions and realizations of random variables, and spaces are denoted using bold fonts, e.g., $\mathbf{H}$.
In the rest of the book, we use the following abbreviations: CTMDP (respectively, DTMDP, ESMDP, SMDP) stands for continuous-time Markov decision process (respectively, discrete-time Markov decision process, exponential semi-Markov decision process, semi-Markov decision process). They will be recalled when they appear for the first time in the main text below.
Liverpool, UK
Alexey Piunovskiy
Yi Zhang
Contents
1 Description of CTMDPs and Preliminaries  1
  1.1 Description of the CTMDP  1
    1.1.1 Initial Data and Conventional Notations  1
    1.1.2 Informal Description  2
    1.1.3 Strategies, Strategic Measures, and Optimal Control Problems  3
    1.1.4 Realizable Strategies  17
    1.1.5 Instant Costs at the Jump Epochs  24
  1.2 Examples  28
    1.2.1 Queueing Systems  28
    1.2.2 The Freelancer Dilemma  30
    1.2.3 Epidemic Models  30
    1.2.4 Inventory Control  33
    1.2.5 Selling an Asset  33
    1.2.6 Power-Managed Systems  35
    1.2.7 Fragmentation Models  36
    1.2.8 Infrastructure Surveillance Models  37
    1.2.9 Preventive Maintenance  39
  1.3 Detailed Occupation Measures and Further Sufficient Classes of Strategies for Total Cost Problems  40
    1.3.1 Definitions and Notations  40
    1.3.2 Sufficiency of Markov π-Strategies  43
    1.3.3 Sufficiency of Markov Standard ξ-Strategies  49
    1.3.4 Counterexamples  52
    1.3.5 The Discounted Cost Model as a Special Case of Undiscounted  55
  1.4 Bibliographical Remarks  60
2 Selected Properties of Controlled Processes  63
  2.1 Transition Functions and the Markov Property  63
    2.1.1 Basic Definitions and Notations  64
    2.1.2 Construction of the Transition Function  65
    2.1.3 The Minimal (Nonnegative) Solution to the Kolmogorov Forward Equation  69
    2.1.4 Markov Property of the Controlled Process Under a Natural Markov Strategy  76
  2.2 Conditions for Non-explosiveness Under a Fixed Natural Markov Strategy  79
    2.2.1 The Nonhomogeneous Case  80
    2.2.2 The Homogeneous Case  93
    2.2.3 Possible Generalizations  95
    2.2.4 A Condition for Non-explosiveness Under All Strategies  97
    2.2.5 Direct Proof for Non-explosiveness Under All Strategies  102
  2.3 Examples  109
    2.3.1 Birth-and-Death Processes  109
    2.3.2 The Gaussian Model  112
    2.3.3 The Fragmentation Model  114
    2.3.4 The Infrastructure Surveillance Model  115
  2.4 Dynkin's Formula  117
    2.4.1 Preliminaries  117
    2.4.2 Non-explosiveness and Dynkin's Formula  130
    2.4.3 Dynkin's Formula Under All π-Strategies  133
    2.4.4 Example  140
  2.5 Bibliographical Remarks  142
3 The Discounted Cost Model  145
  3.1 The Unconstrained Problem  146
    3.1.1 The Optimality Equation  147
    3.1.2 Dynamic Programming and Dual Linear Programs  151
  3.2 The Constrained Problem  159
    3.2.1 Properties of the Total Occupation Measures  160
    3.2.2 The Primal Linear Program and Its Solvability  167
    3.2.3 Comparison of the Convex Analytic and Dynamic Programming Approaches  173
    3.2.4 Duality  174
    3.2.5 The Space of Performance Vectors  183
  3.3 Examples  192
    3.3.1 A Queuing System  192
    3.3.2 A Birth-and-Death Process  198
  3.4 Bibliographical Remarks  199
4 Reduction to DTMDP: The Total Cost Model  201
  4.1 Poisson-Related Strategies  201
  4.2 Reduction to DTMDP  225
    4.2.1 Description of the Concerned DTMDP  225
    4.2.2 Selected Results of the Reduction to DTMDP  233
    4.2.3 Examples  242
    4.2.4 Models with Strongly Positive Intensities  251
    4.2.5 Example: Preventive Maintenance  258
  4.3 Bibliographical Remarks  261
5 The Average Cost Model  263
  5.1 Unconstrained Problems  264
    5.1.1 Unconstrained Problem: Nonnegative Cost  264
    5.1.2 Unconstrained Problem: Weight Function  281
  5.2 Constrained Problems  293
    5.2.1 The Primal Linear Program and Its Solvability  293
    5.2.2 Duality  302
    5.2.3 The Space of Performance Vectors  307
    5.2.4 Denumerable and Finite Models  314
  5.3 Examples  319
    5.3.1 The Gaussian Model  319
    5.3.2 The Freelancer Dilemma  322
  5.4 Bibliographical Remarks  334
6 The Total Cost Model: General Case  337
  6.1 Description of the General Total Cost Model  337
    6.1.1 Generalized Control Strategies and Their Strategic Measures  337
    6.1.2 Subclasses of Strategies  342
  6.2 Detailed Occupation Measures and Sufficient Classes of Strategies  345
    6.2.1 Detailed Occupation Measures  346
    6.2.2 Sufficient Classes of Strategies  348
    6.2.3 Counterexamples  357
  6.3 Reduction to DTMDP  361
  6.4 Mixtures of Strategies and Convexity of Spaces of Strategic and Occupation Measures  367
    6.4.1 Properties of Strategic Measures  368
    6.4.2 Properties of Occupation Measures  384
  6.5 Example: Utilization of an Unreliable Device  388
  6.6 Realizable Strategies  395
  6.7 Bibliographical Remarks  400
7 Gradual-Impulsive Control Models  403
  7.1 The Total Cost Model and Reduction  403
    7.1.1 System Primitives  403
    7.1.2 Total Cost Gradual-Impulsive Control Problems  405
    7.1.3 Reduction to CTMDP Model with Gradual Control  410
  7.2 Example: An Epidemic with Carriers  425
    7.2.1 Problem Statement  425
    7.2.2 General Plan  428
    7.2.3 Optimal Solution to the Associated DTMDP Problem  431
    7.2.4 The Optimal Solution to the Original Gradual-Impulsive Control Problem  437
  7.3 The Discounted Cost Model  441
    7.3.1 α-Discounted Gradual-Impulsive Control Problems  441
    7.3.2 Reduction to DTMDP with Total Cost  446
    7.3.3 The Dynamic Programming Approach  448
    7.3.4 Example: The Inventory Model  452
  7.4 Bibliographical Remarks  470
Appendix A: Miscellaneous Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473
Appendix B: Relevant Definitions and Facts . . . . . . . . . . . . . . . . . . . . . . . 505
Appendix C: Definitions and Facts about Discrete-Time Markov Decision Processes . . . . . 549
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
Notation
A    Action space    1, 549
Ā    Action space in the DTMDP describing the gradual-impulsive model    405
ā = (c̄, b̄, ρ)    Actions in the DTMDP describing the gradual-impulsive model    406
A^G    Space of gradual controls (actions)    404
A^I    Space of impulsive controls (actions)    404
A(x)    Set of admissible actions    5, 156, 307, 551
A(t)    Action process    343, 398
B_f(X)    Space of all f-bounded measurable functions on X    119, 527
b    Impulsive action (control)    404
B̄_n (b̄_n)    Random (realized) impulsive action    407
{B_i}_{i=1}^∞, {A_i}_{i=1}^∞    Controlling process in DTMDP    226, 362, 550
C̄_n (c̄_n)    Random (realized) planned time until the next impulse    407
c_j(x, a), c^G_j(x, a)    Cost rates    2, 404
c^I_j(x, a)    Cost functions    404
C(x, a)    Lump sum cost    24
Co    (Positive) cone    175, 540
Co*    Dual cone    175, 540
d_j    Constraint constants    11, 146, 263, 409, 554
D    Total occupation measures in DTMDP    562
D^S    Sequences of detailed occupation measures    41
D^{ReM}    Sequences of detailed occupation measures generated by Markov π-strategies    41
D^{RaM}    Sequences of detailed occupation measures generated by Markov standard ξ-strategies    41
D^P_ξ    Sequences of detailed occupation measures generated by Poisson-related strategies    205
D_ξ    Sequences of detailed occupation measures generated by generalized standard ξ-strategies    385
D^{det M}    Sequences of detailed occupation measures generated by mixtures of simple deterministic Markov strategies    386
D^{det}    Sequences of detailed occupation measures generated by mixtures of deterministic generalized standard ξ-strategies    387
D^M    Sequences of detailed occupation measures generated by mixtures of Markov standard ξ-strategies    387
D^t    Collection of all total normalized occupation measures    159
D^{av}    Collection of all stable probability measures    294
E^S_γ, E^S_x    Expectations with respect to P^S_γ(dx), P^S_x(dx)    10, 341
Ê^S_γ    Expectation with respect to P̂^S_γ    57
E^σ_γ, E^{σ,α}_γ, E^σ_x    Expectations with respect to P^σ_γ, P^{σ,α}_γ, P^σ_x    226, 252, 551
E^q_x    Expectation with respect to the transition (probability) function generated by the Q-function q    117
Ē^σ_{x_0}    Expectation with respect to P̄^σ_{x_0}    409
F̄_{n+1}(h̄_n, c̄, b̄)    Relaxed control in the gradual-impulsive model    407
{F_t}_{t≥0}, {G_t}_{t≥0}    Filtrations    4, 339, 399, 544
ḡ    Lagrange multipliers    181, 303, 542
G_n, G^ξ_n    Conditional distribution of the sojourn time and the post-jump state    9, 202, 203, 340, 376
G̃    Transition probability in the α-jump chain    82
h_α(x), h(x)    Relative value function    268
(h, g, φ)    Canonical triplet    284
H_n (h_n)    Random (realized) finite history    4, 338
H̄_n (h̄_n)    Random (realized) finite history in the gradual-impulsive model    407
𝐇_n    Space of histories    4, 338
inf(P), inf(P^{av})    Value of the Primal Linear Program    179, 303, 541
inf(P^c), inf(P^c_{av})    Value of the Primal Convex Program    179, 304, 542
J    Number of constraints    11, 146, 263, 409, 554
K, 𝐊    Space of admissible state-action pairs    156, 307, 551
L, L^{av}    Lagrangian    178, 304, 542
l_j(x, a)    Cost functions in DTMDP    226, 252, 362, 549
l̄_j((h, x), ā, (t, y)), l̄^α_j((h, x), ā, (t, y))    Cost functions in the DTMDP describing the gradual-impulsive model    406, 441
M^{GO}    CTMDP model with gradual control only    410
m_α    Infimum of the discounted cost    268
m^{S,α}_{γ,n}(dx × da)    Detailed occupation measure    40, 205, 346
M^σ_γ(dx × da)    Total occupation measure in DTMDP    561
M^σ_{γ,n}(dx × da)    Detailed occupation measure in DTMDP    562
O, Õ, O^{av}_A(·)    Space of performance vectors    184, 186, 309
p^M, ϖ^M    Markov standard ξ-strategy    6, 345
p(s, x; t, dy), p_q(s, x; t, dy)    Transition function    64, 65
p_q(t, dy), p_q(x; t, dy)    (Homogeneous) transition function    76, 482
p̃_q(s, x; t, dy)    Transition probability function generated by p_q(s, x; t, dy)    79
p̃_{n,k}(da|x)    Element of a Poisson-related strategy    202
p(dy|x, a)    Transition probability    226, 362, 376
p_α(dy|x, a)    Transition probability    252
p̄(dt × dy|(h, x), ā), p̄_α(dt × dy|(h, x), ā)    Transition probability in the DTMDP describing the gradual-impulsive model    406, 441
Pr, P̂r    Predictable σ-algebras    5, 339, 399
P_D    Space of strategic measures in the associated DTMDP    377
    1-1 correspondence between strategic measures    377
P^S_γ(dx), P^S_x(dx)    Strategic measure    9, 341
P̂^S_γ    Strategic measure in the “hat” model with killing    56
P̄^σ_{x_0}    Strategic measure in the DTMDP describing the gradual-impulsive model    409
P̃_{(v,x)}    Probability on the trajectories of the α-jump chain    83
P^S_γ(t; dy × da), P^S_γ(t; dy)    Marginal distribution    10, 341
P^σ_γ, P^{σ,α}_γ, P^σ_x    Strategic measures in DTMDP    226, 252, 551
Par(E)    Pareto set    537
q(dy|x, a), q({j}|i, a), q^{GO}(dy|x, a)    Transition rate    1, 2, 404, 411
q(dy|x, s)    Q-function    8
q^f(dy|x, s)    f-transformed Q-function    118
q_x(a), q_x(π), q_x(s), q_x(ρ_t)    Jump intensity    1, 8, 404, 405
q_x(ξ, π, s)    Jump intensity under a generalized π-ξ-strategy    340
q̄_x    Supremum of the jump intensity    2, 404
q̃(dy|x, a), q̃(dy|x, π), q̃(dy|x, s), q̃^{GO}(dy|x, a)    Post-jump measures    1, 7, 8, 404, 411
q̃(dy|x, ρ_t)    Post-jump measure    405
q̃_n(dy|x, s)    Post-jump measure under a Poisson-related strategy    203
q̃(dy|x, ξ, π, s)    Post-jump measure under a generalized π-ξ-strategy    340
Q(dy|x, b)    Post-impulse state distribution    404
R(A^G)    Collection of relaxed controls (P(A^G)-valued mappings)    404
S = {S_n}_{n=1}^∞    Control strategy    5
S = {Ξ, p_0, (p_n, π_n)}_{n=1}^∞    Generalized control strategy    339
S*    (Uniformly) optimal strategy    11
S^P    Poisson-related strategy    202
S    Set of all strategies    7
𝐒    Set of all generalized π-ξ-strategies    339
S^π, 𝐒^π    Set of all π-strategies    7, 343
S^ϖ, 𝐒^ξ    Set of all ξ-strategies    7, 343
S^{Gξ}    Set of all generalized standard ξ-strategies    345
S^{DS}    Set of all deterministic stationary strategies    7
S^{Mπ}, 𝐒^{Mπ} (S^{Mϖ}, 𝐒^{Mξ})    Set of all Markov π-strategies (Markov standard ξ-strategies)    7, 345
S^{sξ}    Set of all stationary standard ξ-strategies    345
𝐒^P, 𝐒^P_e    Set of all Poisson-related strategies    202
S^{stable}    Set of all stable strategies    294
sup(D), sup(D^{av})    Value of the Dual Linear Program    179, 304, 541
sup(D^c), sup(D^c_{av})    Value of the Dual Convex Program    179, 304, 542
T_n (t_n)    Random (realized) jump moment    4, 338
[0, T_∞)    Time horizon    5, 12
U, U′    Adjoint mapping    176, 303
w(t, x), w(x)    Lyapunov function    80, 90, 94, 98
w̃(x), w̃′(x), w(x), w′(x)    Lyapunov functions    133, 136, 173
W^α_j(S, γ), W^α_j(S, x)    Expected total α-discounted costs    11, 41
W_j(S, γ), W_j(S, x)    Long-run average cost    12, 263
W̄_j(σ, x_0), W̄^α_j(σ, x_0)    Performance functionals in the gradual-impulsive model    409, 441
W^{DT}_0(σ, x)    Performance functional in DTMDP    553
W^α_0(x)    Value (Bellman) function    11, 146
W^{DT}_0(x), W^{DT,b}_0(x)    Value (Bellman) function in DTMDP    554, 559
W̃    Performance vector    184
X    State space    1, 403, 549
X̄    State space in the DTMDP describing the gradual-impulsive model    405
X_∞    Extended state space    3, 338
X̃    State space in the α-jump chain    82
X_Δ    State space excluding the cemetery    42
X_d    State space on which the f-transformed Q-function q^f is defined    118
X_n (x_n)    Random (realized) post-jump state of the controlled process    4, 338
X̄_n (x̄_n)    Random (realized) state in the DTMDP describing the gradual-impulsive model    407
x_∞    Artificial isolated point    3, 338
X(t)    Controlled process    5, 339
(X, Y)    Dual pair    540
{Y_i}_{i=0}^∞, {X_i}_{i=0}^∞    Controlled process in DTMDP    226, 362, 550
α    Discount factor    11
γ(dx)    Initial distribution    2, 549
Δ    Cemetery    5, 41
η(dx × da)    Stable measure    293
η^{S,α}_γ(dx × da)    α-discounted total occupation measure    159, 346
η^{S,0}_γ(dx × da)    Total occupation measure    241
Θ_n (θ_n)    Random (realized) sojourn time    4, 338
μ(ω; Γ_R × Γ_X), μ̃(ω; Γ_R × Γ_X)    Random measures    4, 99
μ(ω; Γ_R × Γ_Ξ × Γ_X)    Random measure    339
μ(ω; Γ_R × Γ_X × Γ_Ξ)    Random measure    399
(μ, φ^{(0)}, φ^{(1)})    μ-deterministic stationary strategy in the gradual-impulsive model    408
ν    Compensator of μ    10, 341, 474
ν̃    Compensator of μ̃    476
ν(db), ν(k)    Weights distribution    368, 371
π_n(da|h_{n−1}, s)    Relaxed control    5
π̄(da|x, t)    Natural Markov strategy    6
π^M    Markov π-strategy    6, 344
π_s(da|x)    Stationary π-strategy    7, 344
Π(ω, t)    P(A)-valued predictable process    6, 341
ϖ_n(da|h_{n−1})    Randomized control    5
ϖ^M, p^M    Markov standard ξ-strategy    6, 345
ϖ_s(da|x), p_s(da|x)    Stationary standard ξ-strategy    7, 345
ρ_t(da)    Relaxed control    404
σ = {σ_n}_{n=1}^∞ = {σ^{(0)}_n, F̄_n}_{n=1}^∞    Strategy in the gradual-impulsive model    407
{σ^{(0)}_n, σ^{(1)}_n}_{n=1}^∞ = {σ^{(0)}_n, φ̃_n}_{n=1}^∞    Markov standard ξ-strategy in the gradual-impulsive model    409
σ_C    Hitting time in the α-jump chain    83
σ    Control strategy in DTMDP    550
σ^M, σ_s(da|x)    Markov, stationary strategy in DTMDP    550
Σ    Set of all strategies in DTMDP    226, 550
Σ^S    Set of all stationary strategies in DTMDP    550
Σ^{DM}    Set of all deterministic Markov strategies in DTMDP    550
Σ^{DS}    Set of all deterministic stationary strategies in DTMDP    550
Σ^{GI}    Set of all strategies in the gradual-impulsive CTMDP    407
τ_C    Return time in the α-jump chain    83
τ_{weak}, τ(X, Y)    Weak topology    527, 540
φ(x), φ_s(x)    Deterministic stationary strategy    7, 344
φ(x)    Deterministic stationary strategy in DTMDP    550
φ    Simple deterministic Markov strategy    344
φ̃(da|x)    Stationary randomized gradual control in the gradual-impulsive model    409
φ^{(0)} (φ^{(1)})    Impulsive (gradual) component of a μ-deterministic stationary strategy    408
Ξ    Artificial space for constructing strategic measures    202, 337
Ξ_∞    Extended artificial space    338
ξ_∞    Additional artificial point    338
ξ_n = (s_{n0}, a_{n0}, s_{n1}, a_{n1}, . . .)    Artificial point in the case of a Poisson-related strategy    351
Ξ_n (ξ_n)    Random (realized) artificial point    338
(Ω, F)    Sample space    4, 338, 344
(Ω, B(Ω))    Sample space in DTMDP    550
Ω̂    Sample space in the “hat” model with killing    56
Ω̄    Sample space in the gradual-impulsive model    409
θ    Cemetery in a DTMDP    252, 565
⟨·, ·⟩    Bilinear form, scalar product    175, 186, 540