MEASURE–VALUED DIFFERENTIATION FOR FINITE PRODUCTS OF MEASURES: THEORY AND APPLICATIONS

ISBN 978 90 5170 905 6

© Haralambie Leahu, 2008

Cover design: Crasborn Graphic Designers bno, Valkenburg a.d. Geul

This book is no. 428 of the Tinbergen Institute Research Series, established through cooperation between Thela Thesis and the Tinbergen Institute. A list of books which already appeared in the series can be found in the back.

VRIJE UNIVERSITEIT

MEASURE–VALUED DIFFERENTIATION FOR FINITE PRODUCTS OF MEASURES: THEORY AND APPLICATIONS

ACADEMISCH PROEFSCHRIFT

to obtain the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus prof.dr. L. M. Bouter, to be defended in public before the doctoral committee of the Faculty of Economics and Business Administration on Monday, 22 September 2008, at 13.45, in the aula of the university, De Boelelaan 1105

by

Haralambie Leahu

born in Galati, Romania

promotor: prof.dr. H.C. Tijms
copromotor: dr. B.F. Heidergott

TO MY PARENTS


CONTENTS

1. Measure Theory and Functional Analysis
   1.1 Introduction
   1.2 Elements of Topology and Measure Theory
       1.2.1 Topological and Metric Spaces
       1.2.2 The Concept of Measure
       1.2.3 Cv-spaces
       1.2.4 Convergence of Measures
   1.3 Norm Linear Spaces
       1.3.1 Basic Facts from Functional Analysis
       1.3.2 Banach Bases
       1.3.3 Spaces of Measures
       1.3.4 Banach Bases on Product Spaces
   1.4 Concluding Remarks

2. Measure-Valued Differentiation
   2.1 Introduction
   2.2 The Concept of Measure-Valued Differentiation
       2.2.1 Weak, Strong and Regular Differentiability
       2.2.2 Representation of the Weak Derivatives
       2.2.3 Computation of Weak Derivatives and Examples
   2.3 Differentiability of Product Measures
   2.4 Non-Continuous Cost-Functions and Set-Wise Differentiation
   2.5 Gradient Estimation Examples
       2.5.1 The Derivative of a Ruin Probability
       2.5.2 Differentiation of the Waiting Times in a G/G/1 Queue
   2.6 Concluding Remarks

3. Strong Bounds on Perturbations Based on Lipschitz Constants
   3.1 Introduction
   3.2 Bounds on Perturbations
       3.2.1 Bounds on Perturbations for Product Measures
       3.2.2 Bounds on Perturbations for Markov Chains
   3.3 Bounds on Perturbations for the Steady-State Waiting Time
       3.3.1 Strong Stability of the Steady-State Waiting Time
       3.3.2 Comments and Bound Improvements
   3.4 Concluding Remarks

4. Measure-Valued Differential Calculus
   4.1 Introduction
   4.2 Leibnitz-Newton Rule and Weak Analyticity
       4.2.1 Leibnitz-Newton Rule and Extensions
       4.2.2 Weak Analyticity
   4.3 Application: Stochastic Activity Networks (SAN)
   4.4 Concluding Remarks

5. A Class of Non-Conventional Algebras with Applications in OR
   5.1 Introduction
   5.2 Topological Algebras of Matrices
   5.3 Dp-Differentiability
       5.3.1 Dp-spaces
       5.3.2 Dp-Differentiability for Random Matrices
   5.4 A Formal Differential Calculus for Random Matrices
       5.4.1 The Extended Algebra of Matrices
       5.4.2 Dp-Differential Calculus
   5.5 Taylor Series Approximations for Stochastic Max-Plus Systems
       5.5.1 A Multi-Server Network with Delays/Breakdowns
       5.5.2 SAN Modeled as Max-Plus-Linear Systems
   5.6 Concluding Remarks

Appendix
   A. Convergence of Infinite Series of Real Numbers
   B. Interchanging Limits
   C. Measure Theory
   D. Conditional Expectations
   E. Fubini Theorem and Applications
   F. Weak Convergence of Measures
   G. Functional Analysis
   H. Overview of Weakly Differentiable Distributions

Summary

Samenvatting

Bibliography

Index

List of Notations

Acknowledgments

PREFACE

A wide range of stochastic systems in the areas of manufacturing, transportation, finance and communication can be modeled by studying cost-functions¹ over a finite collection of independent random variables, called input variables. From a probabilistic point of view, such a system is completely determined by the distributions of the input variables under consideration, which will be called input distributions. Throughout this thesis we consider parameter-dependent stochastic systems, i.e., we assume that the input distributions depend on some real-valued parameter denoted by θ. More specifically, let Θ ⊂ R denote an open, connected subset of real numbers and let µi,θ, for 1 ≤ i ≤ n, be a finite family of probability measures (input distributions) on some state spaces Si, for 1 ≤ i ≤ n, depending on some parameter θ ∈ Θ, such as, for example, the mean. We consider a stochastic system driven by the above specified distributions and we call a performance measure of such a system the expression

Pg(θ) := Eθ[g(X1, . . . , Xn)] = ∫ · · · ∫ g(x1, . . . , xn) Πθ(dx1, . . . , dxn),   (0.1)

for an arbitrary cost-function g, where the input variables Xi, for 1 ≤ i ≤ n, are distributed according to µi,θ, respectively, and Πθ denotes the product measure

∀θ ∈ Θ : Πθ := µ1,θ × · · · × µn,θ.   (0.2)

This thesis is devoted to the analysis of performance measures modeled in (0.1). This class of models covers a wide area of applications such as queueing theory and the project evaluation and review technique (PERT), which provide suitable models for manufacturing or transportation networks, and insurance models. Specifically, the following concrete models will be treated as examples: single-server queueing networks, stochastic activity networks and insurance models over a finite number of claims. Correspondingly, transient waiting times in queueing networks, completion times in stochastic activity networks or ruin probabilities in insurance models are examples of performance measures.
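For concreteness, a performance measure of the form (0.1) can be estimated by straightforward Monte Carlo simulation. The following is a minimal illustrative sketch (the network, the cost-function and the exponential inputs are assumptions made here, not examples from the thesis): a toy stochastic activity network in which two parallel activities are followed by a serial one, and g is the completion time.

```python
import numpy as np

# Illustrative sketch (not from the thesis): Monte Carlo evaluation of a
# performance measure of the form (0.1). Two parallel activities X1, X2
# are followed by a serial activity X3; the cost-function g is the
# completion time. All durations are (hypothetically) Exp(theta).
rng = np.random.default_rng(42)
theta = 1.0       # rate of the exponential input distributions
n = 200_000       # number of independent replications

def g(x1, x2, x3):
    # completion time of the network: parallel stage, then serial stage
    return np.maximum(x1, x2) + x3

x1, x2, x3 = rng.exponential(1.0 / theta, size=(3, n))
p_hat = g(x1, x2, x3).mean()   # Monte Carlo estimate of Pg(theta)

# For i.i.d. Exp(theta) inputs, E[max(X1, X2)] = 3/(2 theta) and
# E[X3] = 1/theta, so Pg(theta) = 5/(2 theta) exactly.
```

In this toy case the expectation is available in closed form, which makes the sketch easy to check; for the models treated in the thesis, simulation is typically the only option.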

The main topic of research put forward in this thesis is the study of analytical properties of the performance measures Pg(θ), such as continuity, differentiability and analyticity with respect to the parameter θ, for g belonging to some pre-specified class of cost-functions D. This allows for a wide range of applications such as gradient estimation (which is often a useful tool for performing stochastic optimization), sensitivity analysis (bounds on perturbations) or Taylor series approximations. To this end, we study the distribution Πθ of the vector (X1, . . . , Xn) rather than investigating each Pg(θ) individually, i.e., we study weak properties of the probability measure Πθ. More specifically, if

¹ Real-valued functions designed to measure some specific performance of the system.


D is a set of cost-functions, we say that a property (P) (e.g., continuity, differentiability) holds weakly, in a D-sense, for the measure-valued mapping θ ↦ µθ if for each g ∈ D the mapping θ ↦ ∫ g dµθ has the same property (P). It turns out that one can simultaneously handle the whole class of performance measures {Pg(θ) : g ∈ D}.

We propose here a modular approach to the analysis of Πθ, explained in the following. Let us identify the original stochastic process with the product measure Πθ defined in (0.2). Assume further that the input distributions µi,θ are weakly D-differentiable, for each 1 ≤ i ≤ n. Then we show that the product probability measure Πθ is weakly differentiable and it follows that Pg(θ) is differentiable with respect to θ, for each g ∈ D. In addition, there exists a finite collection of "parallel processes", {Π^l_θ : 1 ≤ l ≤ 2n}, which have the same physical interpretation as the original process but differ from it by (at most) one input distribution, such that for each g ∈ D we have

P′_g(θ) = d/dθ ∫ g(x) Πθ(dx) = Σ_{l=1}^{2n} βl,θ ∫ g(x) Π^l_θ(dx) = Σ_{l=1}^{2n} βl,θ P^l_g(θ),   (0.3)

for some constants βl,θ which do not depend on g, where x := (x1, . . . , xn) denotes a sample path of the process and, for 1 ≤ l ≤ 2n, P^l_g(θ) denotes the counterpart of Pg(θ) in the process driven by Π^l_θ. Therefore, in accordance with (0.3), one can evaluate the derivative of the performance measure Pg(θ) as a linear combination of the corresponding performance measures P^l_g(θ) in some parallel processes. In particular, if P^l_g is an unbiased estimator for P^l_g(θ), for each 1 ≤ l ≤ 2n, then

∂θ(Pg) := Σ_{l=1}^{2n} βl,θ P^l_g   (0.4)

provides an unbiased estimator for P′_g(θ). As it will turn out, a similar procedure can be applied for evaluating higher-order derivatives of Pg(θ).

The concept of weak differentiation was first introduced in [47] for D consisting of bounded and continuous performance measures and studied further in [48]. Although consistent with classical convergence of probability measures, which induces convergence in distribution for random variables, this approach has a major pitfall: it cannot deal with unbounded cost-functions such as, for instance, the identity mapping. Therefore, the concept was extended to general classes of cost-functions in [32], where it was shown that weak differentiation provides unbiased gradient estimators.
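An unbiased gradient estimator of the kind described above can be sketched numerically for a single exponential input. The representation used below, d/dθ E[g(X)] = (1/θ)(E[g(X⁺)] − E[g(X⁻)]) with X⁺ ~ Exp(θ) and X⁻ ~ Erlang(2, θ), is the standard weak-derivative representation of the exponential distribution with rate θ; the cost-function g and all numerical choices are assumptions made here for illustration.

```python
import numpy as np

# Hedged sketch of a measure-valued gradient estimator in the spirit of
# (0.3)-(0.4), for a single Exp(theta) input (rate parameterization).
# Here 2n = 2: one "plus" process and one "minus" process.
rng = np.random.default_rng(0)
theta = 2.0
n = 200_000

def g(x):
    return x   # cost-function: the identity, so Pg(theta) = 1/theta

x_plus = rng.exponential(1.0 / theta, size=n)                  # Exp(theta)
x_minus = rng.exponential(1.0 / theta, size=(n, 2)).sum(axis=1)  # Erlang(2, theta)

# MVD estimator: P'_g(theta) ~ (1/theta) * (mean g(X+) - mean g(X-))
deriv_hat = (g(x_plus).mean() - g(x_minus).mean()) / theta

# Analytically, d/dtheta (1/theta) = -1/theta**2, i.e. -0.25 here.
```

Note that the estimator uses only samples from ordinary distributions, so the same simulation machinery that estimates Pg(θ) also estimates its derivative.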

In this thesis we aim to develop a weak differential calculus for measures (measure-valued differential calculus). More specifically, if D denotes a class of real-valued mappings on some "well-behaved" metric space S, then for any continuous, non-negative mapping v : S → R one can define the subsequent class [D]v of v-bounded mappings as follows:

[D]v := {g ∈ D : ∃c > 0 s.t. ∀s ∈ S : |g(s)| ≤ c v(s)}.   (0.5)

It turns out that if D denotes the class of either continuous or measurable mappings on S, then [D]v defined by (0.5) becomes a Banach space when endowed with the so-called v-norm ‖ · ‖v given by

∀g ∈ D : ‖g‖v := sup_{s∈S} |g(s)|/v(s).   (0.6)
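As a small numerical illustration of the v-norm (0.6) (the choices S = R, v(s) = 1 + s² and g(s) = s are assumptions made here, not taken from the thesis):

```python
import numpy as np

# Numerical sketch of the v-norm (0.6) on S = R, with the assumed
# choices v(s) = 1 + s**2 and g(s) = s. Then g is v-bounded and
# ||g||_v = sup_s |s| / (1 + s**2) = 1/2, attained at s = +/- 1.
s = np.linspace(-10.0, 10.0, 200_001)   # fine grid standing in for S
v = 1.0 + s**2
g = s
v_norm = np.max(np.abs(g) / v)          # grid approximation of sup_s |g(s)|/v(s)
```

The unbounded mapping g(s) = s thus has finite v-norm, which is precisely why the v-bounded classes [D]v can accommodate unbounded cost-functions.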


The pair (D, v) will be called a Banach base on S and will serve as a basis for defining weak differentiability and, more generally, weak properties. Therefore, in order to establish a solid mathematical background to support our theory, we appeal to rather advanced mathematical machinery. More specifically, starting from the observation that regular measures on metric spaces appear as continuous linear functionals on some functional (Banach) spaces, e.g., [D]v, we apply standard results from functional analysis in order to derive fruitful results for weak differentiation theory. For instance, if we identify a measure with a linear functional on the Banach space [D]v, then weak convergence of measures is equivalent to convergence in the weak topology induced by [D]v on its topological dual [D]*v. In addition, one can define a strong (norm) topology on the space of measures by using the operator v-norm defined as

∀µ ∈ [D]*v : ‖µ‖v := sup_{‖g‖v≤1} |∫ g(s) µ(ds)|,   (0.7)

where ‖g‖v is defined by (0.6). It will turn out that classical theorems such as the Banach-Steinhaus Theorem and the Banach-Alaoglu Theorem fit perfectly into this setting.

The material in this thesis is organized into five chapters and is largely based on the results put forward in [22], [23], [26] and [28]. However, this dissertation does not reduce to a simple concatenation of the results in the above papers but is rather a monograph on weak differentiation of measures, and applications, which, for the sake of the completeness of the theory, includes some results which were not presented in the aforementioned papers. In Chapter 1 we provide a detailed overview of basic concepts and preliminary results which are used to develop a weak differentiation theory. Although most of these facts can be found in any standard textbook on topology, measure theory or functional analysis, we think that a small compendium of mathematical analysis will be helpful for the reader. Apart from that, some new concepts, such as the Banach base, which will later be used to formalize the concept of weak differentiation, are introduced and studied. Moreover, the theory of weak convergence of sequences of signed measures is developed in Chapter 1. More specifically, sufficient conditions for both weak [D]v-convergence of measures and weak convergence of positive and negative parts of signed measures are treated.

In Chapter 2 several types of measure-valued differentiation, among which weak differentiation plays a key role, are discussed. It turns out that, in some situations, weak differentiability is equivalent to Fréchet (strong) differentiability. A key result in this chapter, first established in [28], shows that the product of two weakly differentiable measures is again weakly differentiable. This leads one to conclude that the product measure Πθ defined by (0.2) is weakly differentiable, provided that the input distributions µi,θ, for 1 ≤ i ≤ n, are weakly differentiable. In addition, a result which shows that weak differentiability implies strong Lipschitz continuity, where "strong" means with respect to the operator v-norm defined by (0.7), will be provided. This will be the starting point for establishing strong bounds on perturbations in Chapter 3. Eventually, we investigate under which conditions weak differentiability of measures implies set-wise differentiability and we illustrate our theory with some elaborate gradient estimation examples. For instance, a ruin problem arising in insurance will be treated in Section 2.5.1 and the weak differentiability of the distribution of the transient waiting time will be analyzed in Section 2.5.2.

Chapter 3 deals with strong bounds on perturbations. That is, we establish bounds for expressions such as

∆g(θ1, θ2) := |Pg(θ2)− Pg(θ1)| (0.8)

where, for θ ∈ Θ, Pg(θ) is defined by (0.1). We establish bounds on the perturbations ∆g in (0.8) by showing that the function Pg(θ) is Lipschitz continuous in θ, and we extend our results to general Markov chains. A first attempt at this issue was made in [22] and further developed in [26]. The results presented in Chapter 3 basically rely on the theory developed in Chapter 2. Eventually, we illustrate the results by an application to both transient and steady-state waiting times in the G/G/1 queue. An important result, which shows that weak differentiability of the service-time distribution in a G/G/1 queue implies strong Lipschitz continuity of the stationary distribution of the queue, will indicate that weak differentiation techniques can be successfully applied when studying strong stability of Markov chains.

In Chapter 4 we extend the concept of weak differentiation to higher-order derivatives and weak analyticity. It will turn out that differentiation of products of measures is rather similar to that of functions in classical analysis, i.e., a "Leibnitz-Newton" rule holds true. Moreover, we show that, just like in conventional analysis, the product of two weakly analytic measures is again weakly analytic. Eventually, we perform Taylor series approximations for parameter-dependent stochastic systems. These results were also established in [28].

Finally, in Chapter 5 we apply the measure-valued differential calculus developed in Chapter 4 to distributions of random matrices in some non-conventional algebras of matrices (e.g., the max-plus and min-plus algebras). An elaborate example was treated in [23]. It will turn out that, by choosing the set D to be a class of polynomially bounded cost-functions, a formal calculus of weak differentiation can be introduced for random matrices as well. This appears to be useful in applications as it provides handy tools for algorithmically computing higher-order derivatives and, consequently, constructing Taylor series.

1. MEASURE THEORY AND FUNCTIONAL ANALYSIS

This preliminary chapter deals with basic concepts and results from both measure theory and functional analysis, as much of the theory put forward in this thesis relies on standard results from these two highly inter-connected fields of mathematics.

1.1 Introduction

The connection between measure theory and functional analysis is very well known. Concepts like duality and norm spaces find a perfect justification in terms of measures. More specifically, measures can be viewed as elements of some particular linear space. It is well known that Radon measures appear as linear functionals on the space of continuous functions on some locally compact topological space. For a recent reference see, e.g., [10]. Therefore, one can derive interesting results by establishing structural properties for the space of measures using tools from functional analysis and then translating them in terms of measures. This is particularly useful when dealing with convergence issues on spaces of measures.

Throughout this chapter, particular attention will be paid to signed measures. This deviates from the standard literature, where convergence results are formulated for probability measures only. While many properties of signed measures can be easily derived from similar properties of positive measures via the well-known Hahn-Jordan decomposition, this is not straightforward when dealing with convergence issues, as will be illustrated in Section 1.2.4. This will lead us to introduce the concept of regular convergence.

Most likely, the reason why not many authors have dealt with convergence of signed measures is its apparent lack of applications. So why invest in such a topic? The answer is partly given in Chapter 2, where the concept of weak differentiation is introduced. As it will turn out, the weak derivative is a signed measure, and for studying weak derivatives it will prove fruitful to extend standard results regarding weak convergence of probability measures to signed measures. However, to be able to use tools from functional analysis, like the Banach-Steinhaus and Banach-Alaoglu theorems, an appropriate mathematical setting is needed, and this leads to the concept of Banach base introduced in Section 1.3.2.

Weak convergence of measures is one of the key topics of this chapter. It was originally introduced by P. Billingsley in [8] for probability measures in terms of bounded and continuous functions (test functions). Here we aim to extend the concept in the following directions: (1) by considering signed measures and (2) by considering a larger class of test functions. The main reason is that weak convergence as introduced in [8] is unable to handle unbounded performance measures, e.g., the mean and the standard deviation, which drastically reduces its area of applicability. The analysis of weak convergence of signed measures as put forward in this chapter is new. The theoretical work is a technical preliminary for our later results on weak differentiability.

The chapter is organized as follows. A brief introduction to topology and measure theory is provided in Section 1.2, where basic definitions and notations are presented. Section 1.3 deals with norm spaces of both functions and measures. In particular, the concept of Banach base, which will serve as a basis for developing our theory, will be introduced.

1.2 Elements of Topology and Measure Theory

This section is devoted to recalling basic concepts related to topology and measure theory. In Section 1.2.1 metric spaces, which will be the basis for developing our theory, are discussed. Then, in Section 1.2.2, we discuss the concept of measure; particular attention will be paid to signed measures. Eventually, in Section 1.2.3, a special class of functional spaces is introduced, to be used in Section 1.2.4 for defining weak convergence of measures.

1.2.1 Topological and Metric Spaces

Let S be a non-empty set. A family T of subsets of S is called a topology on S if it satisfies the following requirements:

• S and ∅ belong to T.

• Any union of elements from T belongs to T.

• Any finite intersection of elements from T belongs to T.

A sub-family B ⊂ T is called a base for the topology T if any set A ∈ T can be expressed as a union of elements from B. Bases are useful because many properties of topologies can be reduced to statements about a base generating that topology and because many topologies are most easily defined in terms of a base which generates them.

If T, T′ are topologies on S, we say that T is coarser than T′ if T ⊂ T′. It can be easily seen that any arbitrary intersection of topologies on S is again a topology on S. Therefore, for an arbitrary family A of subsets of S one can define the topology generated by A by taking the intersection of all topologies on S which contain A, i.e., the coarsest topology which contains A. Consequently, it can be shown that B is a base for the topology T if and only if

(i) there exists a family {Ai : i ∈ I} ⊂ B such that

S = ⋃_{i∈I} Ai,

(ii) for any A1, A2 ∈ B and s ∈ A1 ∩ A2 there exists A3 ∈ B such that

s ∈ A3 ⊂ A1 ∩ A2,

(iii) T is the topology generated by the family B.


If T is a topology on S, then the pair (S, T) is called a topological space. The elements of T are called open sets and the closed sets are defined as the complements of the open sets. It follows that any union and any finite intersection of open sets is still an open set, and any topology is determined by its open sets.

Let (S1, T1) and (S2, T2) be topological spaces. A mapping f : S1 → S2 is said to be continuous if

∀A ∈ T2 : f−1(A) ∈ T1,

where f−1(A) denotes the pre-image of the set A through f , i.e.,

f−1(A) = {s ∈ S1 : f(s) ∈ A}.

Note that the continuity property of f depends on the topologies T1 and T2. Moreover, f remains continuous if one enlarges T1, but one cannot draw the same conclusion if T1 becomes coarser. Hence, we conclude that, for fixed T2, there is a minimal (coarsest) topology which makes f continuous. This is generated by the family

{f−1(A) : A ∈ T2}

and it is called the topology generated by f. In the same way, one can define the topology generated by an arbitrary family of functions {fi : i ∈ I}.

While many other concepts, such as compactness, separability and completeness, can be introduced at this abstract level, we prefer to concentrate our attention on the special class of metric spaces, to be introduced presently.

A mapping d : S× S→ [0,∞) is said to be a distance (or metric) on S if

• d(s, t) = 0 if and only if s = t,

• it is symmetric, i.e.,

∀s, t ∈ S : d(s, t) = d(t, s),

• it satisfies the triangle inequality, i.e.,

∀r, s, t ∈ S : d(r, t) ≤ d(r, s) + d(s, t).

If d is a metric on S, then the pair (S, d) will be called a metric space.

In what follows, we assume that (S, d) is a metric space and we let

∀s ∈ S, ε > 0 : Bε(s) := {x ∈ S : d(x, s) < ε}

denote the open ball centered at s of radius ε. S is endowed with the standard topology given by the metric d, i.e., the topology generated by the base

B = {Bε(s) : s ∈ S, ε > 0}.

It turns out that a set A ⊂ S is open if for all s ∈ A there exists ε > 0 such that Bε(s) ⊂ A. The closure of a set A, denoted by Ā, is defined as the smallest closed set which includes A. For instance, it can be shown that the closure of Bε(s), denoted shortly by B̄ε(s), is given by

B̄ε(s) = {x ∈ S : d(x, s) ≤ ε}.

An element x ∈ S is said to be an adherent point for the set A ⊂ S if x ∈ Ā, and we call x an accumulation point for A if x ∈ Ā \ A. If A ⊂ B ⊂ S, we say that A is a dense subset of B if Ā = B, i.e., B consists of all adherent points of A. S is said to be separable if there exists a countable dense subset {si : i ∈ I} ⊂ S. It is known, for instance, that the Euclidean spaces Rn are separable.

The set A ⊂ S is said to be bounded if

sup_{s,t∈A} d(s, t) < ∞

and we call A compact if for each family {Ai : i ∈ I} of open sets satisfying

A ⊂ ⋃_{i∈I} Ai

there exists a finite set of indices {i1, . . . , in} ⊂ I, for some n ≥ 1, such that

A ⊂ ⋃_{k=1}^{n} A_{i_k}.

It turns out that every compact set is closed and bounded, but the converse is, in general, not true.¹

The metric space S is said to be locally compact if for all s ∈ S there exists some ε > 0 such that B̄ε(s) is a compact set. S is said to be complete if each Cauchy sequence {sn}n ⊂ S converges to some limit s ∈ S. Note that compactness implies completeness, while the converse is not true. For instance, R is complete but it fails to be compact. It is, however, locally compact. For more details on general topology we refer to [36].

On the metric space S we denote by C(S) the space of continuous, real-valued functions and by CB(S) the subspace of continuous and bounded functions. The set CB(S) becomes itself a metric space when endowed with the distance

∀f, g ∈ CB(S) : D(f, g) = sup_{s∈S} |f(s) − g(s)|.   (1.1)

Since every continuous function maps compact sets onto compact sets (in particular, bounded sets), CB(S) = C(S) provided that S is compact. Moreover, if S is complete then CB(S) enjoys the same property. For later reference we denote by C+(S) the cone of non-negative, continuous mappings on S, i.e.,

C+(S) = {f ∈ C(S) : f(s) ≥ 0, ∀s ∈ S}.

¹ For Euclidean spaces, compactness is actually equivalent to being closed and bounded.
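A minimal numerical sketch of the sup-distance (1.1), with the assumed illustrative choices S = [0, 2π], f = sin and g = cos (not an example from the thesis):

```python
import numpy as np

# Numerical sketch of the sup-distance (1.1) on CB(S) for S = [0, 2*pi]:
# approximate D(f, g) = sup_s |f(s) - g(s)| on a fine grid.
f, g = np.sin, np.cos
s = np.linspace(0.0, 2.0 * np.pi, 100_001)
D = np.max(np.abs(f(s) - g(s)))

# sin(s) - cos(s) = sqrt(2) * sin(s - pi/4), so D(f, g) = sqrt(2),
# attained at s = 3*pi/4.
```

The grid approximation converges to the true supremum here because |f − g| is continuous on the compact interval S.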


1.2.2 The Concept of Measure

We call a family S of subsets of S a σ-field on S if it satisfies:

• ∅ ∈ S,

• if An ∈ S for each n ∈ N, then ⋃_{n∈N} An ∈ S,

• for each A ∈ S it holds that Ā ∈ S,

where Ā denotes the complement of A, i.e., Ā = S \ A. Similar to topologies, the intersection of an arbitrary family of σ-fields is a σ-field and, consequently, we define the σ-field generated by a family A as the intersection of all σ-fields containing A. On the metric space S we denote by S its Borel field, i.e., the smallest σ-field which contains the open sets. If R denotes the Borel field of R, then we say that the mapping f : S → R is measurable if

∀C ∈ R : {s ∈ S : f(s) ∈ C} ∈ S.

Let F(S) denote the space of measurable functions on S and FB(S) ⊂ F(S) denote the subspace of bounded mappings. Since continuity implies measurability, it holds that

C(S) ⊂ F(S).

σ-fields are the basic structures on which we define measures. A mapping µ : S → R ∪ {±∞} is called a signed measure if µ(∅) = 0 and for each family {An}n ⊂ S of mutually disjoint sets it holds that²

µ(⋃_{n∈N} An) = Σ_{n∈N} µ(An).

If µ(A) ≥ 0 for each A ∈ S, we call µ a positive measure, or simply a measure, when no confusion occurs. In standard terminology, a signed measure is a measure which is allowed to attain negative values.

Positive Measures

The positive measure µ is said to be finite if µ(A) < ∞ for each A ∈ S, i.e., µ(S) < ∞. A (positive) measure µ is said to be locally finite if for all s ∈ S there exists ε > 0 such that µ(Bε(s)) < ∞. We call µ a Radon measure if it is locally finite and regular, i.e.,

• µ is outer regular, i.e., each set A ∈ S satisfies

µ(A) = inf{µ(U) : A ⊂ U, U is open},

² The property is often referred to as σ-additivity. To avoid unnecessary complications we exclude the case when µ takes both +∞ and −∞ as values.


• µ is inner regular, i.e., each open subset U ⊂ S satisfies

µ(U) = sup{µ(K) : K ⊂ U, K is compact}.

We say that a family P of measures is tight if each µ ∈ P is finite and for each ε > 0 there exists a compact subset K of S such that

∀µ ∈ P : µ(S \ K) < ε.

Note that, if P = {µ}, i.e., P consists of a single element, then tightness is equivalent to inner regularity of µ, provided that µ is finite.

For a measure µ and p ≥ 1 we denote by Lp(µ) the family of measurable functions whose pth power is Lebesgue integrable with respect to µ, i.e.,

Lp(µ) = {g ∈ F(S) : ∫ |g(s)|^p µ(ds) < ∞}.

For an arbitrary family of measures {µi : i ∈ I} we denote by Lp(µi : i ∈ I) the family of measurable functions which are Lebesgue integrable with respect to µi, for all i ∈ I, i.e.,

Lp(µi : i ∈ I) = ⋂_{i∈I} Lp(µi).

We say that v ∈ F is uniformly integrable with respect to the family {µi : i ∈ I} if

lim_{x↑∞} sup_{i∈I} ∫ |v(s)| · I_{|v|>x}(s) µi(ds) = 0,

where I_{|v|>x} denotes the indicator function of the set {s ∈ S : |v(s)| > x}. It is worth noting that uniform integrability of v with respect to the family {µi : i ∈ I} implies uniform integrability of v with respect to any sub-family {µi : i ∈ J}, with J ⊂ I, and that if v is uniformly integrable with respect to {µi : i ∈ I} it follows that v ∈ L1(µi : i ∈ I). However, the converse is true only when I is finite.
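To see that integrability of v with respect to every member of an infinite family does not give uniform integrability, consider the (hypothetical, illustrative) discrete measures µn placing mass 1 − 1/n at 0 and mass 1/n at the point n, with v(s) = s. The sketch below computes the tail integrals ∫ v · I{v > x} dµn directly:

```python
# Family of probability measures: mu_n({0}) = 1 - 1/n, mu_n({n}) = 1/n.
# Cost function v(s) = s. Then ∫ v dmu_n = 1 for every n, so v ∈ L1(mu_n : n),
# but sup_n ∫ v · I{v > x} dmu_n = 1 for every x: v is NOT uniformly
# integrable with respect to the whole family.

def tail_integral(n, x):
    """∫ v(s) · I{v > x}(s) mu_n(ds) for v(s) = s."""
    # The atom at 0 contributes nothing; the atom at n contributes
    # n * (1/n) = 1 whenever n > x.
    return 1.0 if n > x else 0.0

def sup_tail(x):
    # tail_integral(n, x) equals 1 for every n > x, so the supremum over
    # the whole family is already attained at n = int(x) + 2.
    return max(tail_integral(n, x) for n in (2, int(x) + 2))

for x in (10.0, 10**3, 10**5):
    print(x, sup_tail(x))   # the supremum stays at 1.0 however large x is
```

Restricting to a finite sub-family {µn : n ≤ N} the tail vanishes for x > N, which is the content of the remark that the converse holds only for finite I.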

In general, checking uniform integrability of a function g with respect to some family {µi : i ∈ I} ⊂ M+ directly by the definition might not be the most convenient method. In practice, a common way to prove uniform integrability is the following.

Lemma 1.1. Let g ∈ F and {µi : i ∈ I} ⊂ M+. If there exists ϑ : [0,∞) → [0,∞) satisfying

M := sup_{i∈I} ∫ ϑ(|g(s)|) µi(ds) < ∞,   lim_{x→∞} ϑ(x)/x = ∞,

then g is uniformly integrable with respect to {µi : i ∈ I}.

Proof. From the limit relation we conclude that for arbitrarily small ε > 0 there exists some xε > 0 such that for each x > xε it holds that ε^{−1} < x^{−1} ϑ(x). Hence, for each s,

|g(s)| > xε ⇒ |g(s)| < ε ϑ(|g(s)|).


Therefore, for any x > xε it holds that

∀i ∈ I : ∫ |g(s)| · I_{|g|>x}(s) µi(ds) ≤ ε ∫ ϑ(|g(s)|) µi(ds) ≤ ε M.

Taking in the above inequality the supremum with respect to i ∈ I, the claim follows by letting ε → 0.
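Lemma 1.1 can be illustrated numerically with the choice ϑ(x) = x² (a hypothetical discrete example, not from the text): take µn with mass 1/n at √n and the remaining mass at 0, and g(s) = s. The ϑ-moments are all equal to 1, so the lemma predicts uniform integrability, and the tail integrals indeed vanish at rate roughly 1/x:

```python
import math

# mu_n({sqrt(n)}) = 1/n, mu_n({0}) = 1 - 1/n; g(s) = s, theta(x) = x**2.
# Then ∫ theta(|g|) dmu_n = (1/n) * n = 1 for all n, so M = 1 < ∞ and
# Lemma 1.1 gives uniform integrability of g.

def theta_moment(n):
    return (1.0 / n) * (math.sqrt(n) ** 2)   # = 1 for every n

def tail_integral(n, x):
    """∫ |g| · I{|g| > x} dmu_n = sqrt(n)/n if sqrt(n) > x, else 0."""
    return math.sqrt(n) / n if math.sqrt(n) > x else 0.0

def sup_tail(x):
    # The tail 1/sqrt(n) is decreasing in n, so the supremum is attained at
    # the smallest n with sqrt(n) > x, namely n0 = int(x*x) + 1.
    n0 = int(x * x) + 1
    return max(tail_integral(n, x) for n in range(n0, n0 + 100))

# sup_n tail(n, x) ≤ 1/x → 0 as x → ∞, as the lemma guarantees.
for x in (1.0, 10.0, 100.0):
    print(x, sup_tail(x))
```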

A measure µ is said to be absolutely continuous with respect to another measure λ if for each A ∈ S, λ(A) = 0 implies µ(A) = 0. Two measures µ and κ are said to be orthogonal if there exists A ∈ S such that µ(A) = 0 and κ(S \ A) = 0. If S is a Euclidean space and we denote by ℓ the Lebesgue measure on S, then any measure which is absolutely continuous with respect to ℓ is referred to as absolutely continuous, or continuous, and any measure which is orthogonal to ℓ is referred to as singular.

Signed Measures

At a theoretical level, signed measures arise as natural extensions of measures because they can be organized as a linear space. This will be explained in Section 1.3.3. In practice, signed measures very often appear as differences between positive measures. In fact, any signed measure can be represented as the difference between two measures. This fact derives from the well-known Hahn-Jordan decomposition theorem, which states that any signed measure µ can be represented as

∀A ∈ S : µ(A) = [µ]+(A)− [µ]−(A), (1.2)

where [µ]± are uniquely determined orthogonal measures called the positive (resp. negative) part of µ. The measure |µ| defined as

∀A ∈ S : |µ|(A) = [µ]+(A) + [µ]−(A)

is called the variation measure of µ and the positive number

‖µ‖ = |µ|(S) = [µ]+(S) + [µ]−(S) (1.3)

is called the total variation (norm) of µ. Note, however, that a representation as in (1.2) is not unique if we drop the orthogonality condition. More specifically, it can be shown that [µ]± satisfy

[µ]+(A) = sup{µ(E) : E ∈ S, E ⊂ A} ≥ max{µ(A), 0},   [µ]−(A) = [µ]+(A) − µ(A) ≥ 0,

and any other decomposition µ = µ+ − µ− of µ satisfies µ± = ν + [µ]±, for some (positive) measure ν. This means in particular that the orthogonal decomposition in (1.2) minimizes the sum µ+ + µ−, where the minimization has to be understood with respect to the order relation given by µ ≥ ν iff µ(A) ≥ ν(A), for all A ∈ S. Therefore, it holds that

|µ| = inf{µ+ + µ− : µ+ − µ− = µ, µ± are positive measures}.

Throughout this thesis we will denote the orthogonal decomposition by [µ]±.
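For a signed measure with finitely many atoms the Hahn-Jordan decomposition can be read off atom by atom. The sketch below (a hypothetical discrete example) computes [µ]±, |µ| and ‖µ‖, and checks that a non-orthogonal decomposition µ = (ν + [µ]+) − (ν + [µ]−) has a strictly larger mass sum, as stated above:

```python
# Signed measure on atoms {point: weight}: mu = 2·δ0 − 3·δ1 + 0.5·δ2.
mu = {0: 2.0, 1: -3.0, 2: 0.5}

pos = {a: max(w, 0.0) for a, w in mu.items()}    # [mu]+
neg = {a: max(-w, 0.0) for a, w in mu.items()}   # [mu]-
var = {a: abs(w) for a, w in mu.items()}         # |mu| = [mu]+ + [mu]-

total_variation = sum(var.values())              # ‖mu‖ = |mu|(S)

# Any other decomposition mu = mu_plus - mu_minus with mu_± = nu + [mu]±
# (nu a positive measure) inflates the sum mu_plus + mu_minus by 2·nu(S):
nu = {0: 1.0, 1: 1.0, 2: 1.0}
mu_plus = {a: pos[a] + nu[a] for a in mu}
mu_minus = {a: neg[a] + nu[a] for a in mu}
inflated = sum(mu_plus.values()) + sum(mu_minus.values())

print(total_variation)   # 5.5
print(inflated)          # 5.5 + 2·nu(S) = 11.5
```

The orthogonality of [µ]± shows up as min{[µ]+({a}), [µ]−({a})} = 0 at every atom.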


In what follows we assume that S is separable and locally compact; we denote by M(S) the space of signed Radon measures on S and by MB(S) the subset of finite (bounded) measures. The cone of positive measures in M(S) is denoted by M+(S) and we denote by M1(S) the subset of probability measures, i.e.,

M1(S) = {µ ∈ M+(S) : µ(S) = 1}.

Many properties of measures can be extended to signed measures by means of the variation measure. More specifically, we say that µ is locally finite, finite, regular or absolutely continuous with respect to some λ if |µ| is locally finite (resp. finite, regular or absolutely continuous with respect to λ). In all of these situations it turns out that both [µ]+ and [µ]− enjoy the same property. Moreover, we say that a measurable function is integrable with respect to a signed measure µ if it is integrable with respect to the variation of µ or, equivalently, if it is integrable with respect to both [µ]±. In the same vein, we say that a family P of signed measures is tight if the corresponding family of positive measures {|µ| : µ ∈ P} is tight, which is equivalent to the tightness of both families {[µ]± : µ ∈ P}. Consequently, some standard results from measure theory can be easily extended to signed measures. A list of a few standard results in measure theory can be found in Section C of the Appendix. For a thorough treatment of signed measures, we refer to [14].

We conclude this section with a few remarks on measure-valued mappings. For a non-empty set Θ ⊂ R let {µθ : θ ∈ Θ} ⊂ M(S) be an arbitrary family of signed measures and consider the mapping µ∗ : Θ → M(S) defined as

∀θ ∈ Θ : µ∗(θ) = µθ,

i.e., {µθ : θ ∈ Θ} is the range of µ∗. Provided that an appropriate topology is introduced on M(S), or on some subset which includes the range of µ∗, continuity of measure-valued mappings is defined in an obvious way.

1.2.3 Cv-spaces

Throughout this section we assume that v is a non-negative, continuous function on S, i.e., v ∈ C+(S), and we denote by Sv the support of v, i.e., the open set

Sv := {s ∈ S : v(s) > 0}.

We denote by Cv(S) the set of v-bounded, continuous functions, i.e.,

Cv(S) := {g ∈ C(S) : ∃c > 0 s.t. |g(s)| ≤ c v(s), ∀s ∈ S}. (1.4)

Note that if v ∈ CB(S) then Cv(S) ⊂ CB(S) while, conversely, CB(S) ⊂ Cv(S) provided that inf{v(s) : s ∈ S} > 0. In addition, if g ∈ Cv then g(s) = 0 for any s ∈ S \ Sv. A typical choice for Cv(S) is provided in the following example.

Example 1.1. Let vα(x) = e^{αx}, for some α > 0 and x ∈ S = [0,∞). Since for every polynomial p it holds that lim_{x→∞} e^{−αx}|p(x)| = 0, the space Cvα([0,∞)) contains all polynomials. However, Cvα is not restricted to polynomials; note, for instance, that the mapping x ↦ ln(1 + x) also belongs to Cvα. Moreover, for α < β we have Cvα ⊂ Cvβ, since α < β implies ‖g‖vβ ≤ ‖g‖vα, for any g.


Remark 1.1. A set D of measurable mappings is said to separate the points of a family P ⊂ M(S) if for each µ1, µ2 ∈ P with µ1 ≠ µ2 there exists some g ∈ D such that

∫ g(s) µ1(ds) ≠ ∫ g(s) µ2(ds).

This can be re-phrased by saying that "the family of integrals with integrands g ∈ D uniquely determines the measure in P". It is known that CB(S) enjoys this property while, in general, such a property fails to hold true when D = Cv(S). Indeed, let us denote by v the identity mapping on S = [0,∞), i.e., v(s) = s, for each s ≥ 0. Then, for all g ∈ Cv(S) it holds that |g(0)| ≤ c v(0) = 0 and if for α > 0 we let

∀A ∈ S : µα(A) = α · IA(0),

i.e., the measure which assigns mass α to 0, we note that Cv does not separate the points of the family P := {µα : α > 0}. Indeed, for α ≠ β, it holds that

∀g ∈ Cv(S) : ∫ g(s) µα(ds) = ∫ g(s) µβ(ds) = 0,

which stems from the fact that all measures in P assign mass exclusively to the point 0 ∉ Sv.
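The separation failure in Remark 1.1 can be checked mechanically: every g ∈ Cv with v(s) = s has the form g(s) = s·h(s) with h bounded, so g(0) = 0 and all integrals against µα = α·δ0 vanish. A small sketch (the candidate functions are chosen for illustration only):

```python
import math

# v(s) = s on [0, ∞); members of C_v satisfy |g(s)| ≤ c·s, hence g(0) = 0.
candidates = [
    lambda s: s,                   # g(s) = s·1
    lambda s: s * math.sin(s),     # g(s) = s·sin(s)
    lambda s: s / (1.0 + s),       # g(s) = s·(1/(1+s))
]

def integral_against_mu_alpha(g, alpha):
    """∫ g dmu_alpha where mu_alpha = alpha · delta_0 (mass alpha at 0)."""
    return alpha * g(0.0)

# All integrals are 0 for every alpha: C_v cannot tell mu_2 from mu_7.
values = [integral_against_mu_alpha(g, a) for g in candidates for a in (2.0, 7.0)]
print(values)   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```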

As detailed in Remark 1.1, Cv-spaces fail to separate the points of M(S). However, in applications one is typically interested in evaluating the integrals ∫ g dµ, for g ∈ Cv(S), rather than investigating the measure µ itself. That is, we study the trace of a measure µ on Sv, since any g ∈ Cv vanishes on S \ Sv. The following result shows that Cv-spaces separate equivalence classes.

Lemma 1.2. Let µ1, µ2 ∈ M(S) and let v ∈ C+(S) ∩ L1(µ1, µ2) be such that

∀g ∈ Cv : ∫ g(s) µ1(ds) = ∫ g(s) µ2(ds). (1.5)

Then the traces of µ1 and µ2 on Sv coincide, i.e.,

∀A ∈ S : µ1(A ∩ Sv) = µ2(A ∩ Sv), (1.6)

provided that min{µ1(Sv), µ2(Sv)} < ∞.

Proof. Since S is the Borel σ-field of S, we may assume without loss of generality that A ∈ S is an arbitrary non-empty, open set. For n ≥ 1 consider the set

An := {s ∈ A : d(s, S \ A) ≥ 1/n} ⊂ A,

where, for E ⊂ S, we denote d(s, E) = inf{d(s, x) : x ∈ E}. Note that, for sufficiently large n, An is a non-empty, closed set satisfying An ∩ (S \ A) = ∅. Since A is an open set, i.e., S \ A is closed, according to Urysohn's Lemma there exists a continuous function fn : S → [0, 1] such that fn(s) = 1 for s ∈ An and fn(s) = 0 for s ∈ S \ A. On the other hand, the family {An : n ≥ 1} ⊂ S is increasing and ⋃_{n≥1} An = A. Hence, fn converges point-wise to IA as n → ∞.


Consider now for each n ≥ 1 the mapping gn ∈ C+(S) defined by

gn(s) = min{fn(s), n · v(s)}.

Note that gn ∈ Cv(S), for each n ≥ 1, and by hypothesis it follows that

∀n ≥ 1 : ∫ gn(s) µ1(ds) = ∫ gn(s) µ2(ds). (1.7)

Moreover, we have gn ≤ I_{A∩Sv} and lim_n gn = I_{A∩Sv}, point-wise. Therefore, provided that min{µ1(Sv), µ2(Sv)} < ∞, by letting n → ∞ in (1.7) it follows from the Dominated Convergence Theorem that

µ1(A ∩ Sv) = µ2(A ∩ Sv),

which concludes the proof of (1.6).

Remark 1.2. If we denote by ∼ the equivalence relation on M(S) given by µ1 ∼ µ2 if (1.6) holds true, then Lemma 1.2 shows that if (1.5) holds true then µ1 ∼ µ2. That is, Cv(S) separates the points of the quotient space M(S)/∼.

For ease of notation, in the following we will omit specifying the space S or the σ-field S, when no confusion occurs.

1.2.4 Convergence of Measures

Throughout this section we discuss the concept of weak convergence of measures. Formally, we say that a sequence of measures {µn}n is weakly D-convergent to some limit µ if the integrals of µn converge to those of µ for some predefined class of cost-functions D. Weak convergence of measures was originally introduced in [8] in terms of continuous and bounded functions, i.e., D = CB. The main reason for this is that CB(S) separates the points of M(S) and, as a consequence, the weak limit is uniquely determined, provided that it exists.

A first step in extending this concept is taking D = Cv since, according to Lemma 1.2, Cv-spaces possess satisfactory separation properties which make them suitable for defining weak convergence. Concurrently, the main result of this section will establish how general Cv-convergence is related to classical CB-convergence. The following definition introduces the concept of weak convergence on M.

Definition 1.1. Let {µn : n ∈ N} ⊂ M and D ⊂ L1(µn : n ∈ N). The sequence {µn}n is weakly D-convergent if there exists µ ∈ M such that

∀g ∈ D : lim_{n→∞} ∫ g(s) µn(ds) = ∫ g(s) µ(ds). (1.8)

We write µn ⇒_D µ (or simply µn ⇒ µ when no confusion occurs) and we call µ a weak D-limit of the sequence {µn}n.


Note that a weak D-limit is determined by the class of integrals with integrands g ∈ D and is not unique if D does not separate the points of M; see Remark 1.2. However, Cv ⊂ L1(µn : n ∈ N) is equivalent to v ∈ L1(µn : n ∈ N) and, by letting D = Cv, the weak limit µ in (1.8) is unique in the sense specified by Lemma 1.2. Therefore, one obtains a sensible definition for Cv-convergence by letting D = Cv, for some v ∈ C+ ∩ L1(µn : n ∈ N), in Definition 1.1. The following example illustrates the dependence of Cv-convergence of a sequence of measures {µn}n on the choice of v.

Example 1.2. On S = R let us consider the family of probability densities

∀θ ∈ (0, 2), x ∈ R : f(θ, x) = (sin(πθ/2)/π) · |x|^{θ−1}/(1 + x²).

If we consider the sequence of probability measures {µn : n ≥ 1}, given by

∀n ≥ 1, x ∈ R : µn(dx) = f((n − 1)/n, x) dx,

then µn ⇒_{CB} µ, where µ denotes the Cauchy distribution, i.e.,

µ(dx) = f(1, x) dx.

Nevertheless, the sequence {µn}n fails to be Cv-convergent when v(x) = |x|, although v ∈ L1(µn : n ≥ 1). Indeed, we have

∀n ≥ 1 : ∫ |x| µn(dx) < ∞ but ∫ |x| µ(dx) = ∞.
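The blow-up in Example 1.2 can be made quantitative: a short computation (substituting x = tan u and using ∫_0^∞ x^{s−1}/(1+x²) dx = (π/2)/sin(πs/2)) gives ∫ |x| f(θ, x) dx = tan(πθ/2) for θ ∈ (0, 1), finite for θ < 1 but divergent as θ ↑ 1. The sketch below is a numerical illustration of this, not part of the text; it estimates the first moment for θ = 1/2 by the midpoint rule:

```python
import math

# E|X| under the density f(theta, x) = sin(pi*theta/2)/pi * |x|**(theta-1)/(1+x**2).
# Substituting x = tan(u):
#   E|X| = 2*sin(pi*theta/2)/pi * ∫_0^{pi/2} tan(u)**theta du = tan(pi*theta/2),
# finite for theta < 1 and infinite in the Cauchy case theta = 1.

def first_moment(theta, n=200_000):
    """Midpoint-rule estimate of E|X| for theta in (0, 1)."""
    h = (math.pi / 2) / n
    s = sum(math.tan((k + 0.5) * h) ** theta for k in range(n))
    return 2.0 * math.sin(math.pi * theta / 2) / math.pi * (s * h)

print(first_moment(0.5))   # ≈ tan(pi/4) = 1

# The closed form blows up as theta ↑ 1 (the Cauchy limit has no mean):
for theta in (0.5, 0.9, 0.99):
    print(theta, math.tan(math.pi * theta / 2))
```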

Now the following question comes naturally: "When does CB-convergence of measures imply Cv-convergence?" More specifically, which g ∈ F satisfy

lim_{n→∞} ∫ g(s) µn(ds) = ∫ g(s) µ(ds), (1.9)

provided that µn ⇒_{CB} µ? In the following we aim to answer this question and investigate how general Cv-convergence relates to classical convergence. A first step in that direction is the following result, which has been proved in [8]; see Theorem F.2 in the Appendix.

Lemma 1.3. Let {µn : n ∈ N} ⊂ M+ be such that µn ⇒_{CB} µ. The mapping g ∈ C+ satisfies equation (1.9) if and only if g is uniformly integrable with respect to the family {µn : n ∈ N}.

Note that, in Example 1.2, v(x) = |x| is not uniformly integrable with respect to the family {µn : n ≥ 1} although v ∈ L1(µn : n ≥ 1). The following result establishes a relationship between Cv-convergence and classical weak convergence of positive measures.

Theorem 1.1. Let v ∈ C+ and let {µn : n ∈ N} ⊂ M+ be a sequence of measures.

(i) If µn ⇒_{CB} µ, i.e., µ is the classical weak limit of the sequence {µn}n, and v is uniformly integrable with respect to {µn : n ∈ N}, then µn ⇒_{Cv} µ.


(ii) If µn ⇒_{Cv} µ, µn(S \ Sv) = 0 for each n ∈ N, and the family {µn : n ∈ N} is tight³, then µn ⇒_{CB} µ.

Proof. (i) We have to show that the limit relation in (1.9) holds true for each g ∈ Cv, and we can assume without loss of generality that

∀x ∈ S : 0 ≤ g(x) ≤ v(x). (1.10)

Therefore, in accordance with Lemma 1.3, it suffices to show that each g satisfying (1.10) is uniformly integrable with respect to {µn : n ∈ N}, provided that v is. Now, this follows immediately from the inequality

∀x ∈ S : g(x) · I_{g>α}(x) ≤ v(x) · I_{v>α}(x).

(ii) We have to show that (1.9) holds true for each g ∈ CB. We can assume without loss of generality that 0 ≤ g(s) ≤ 1, for each s ∈ S. For m ≥ 1, let us define

∀s ∈ S : gm(s) := min{g(s), m · v(s)}

and let us show that the double-indexed sequence {am,n}m,n, defined as

∀m ≥ 1, n ∈ N : am,n := ∫ gm(s) µn(ds),

satisfies the conditions of Theorem B.1 (see the Appendix). First, note that, for m ≥ 1, gm ∈ Cv and, by hypothesis,

∀m ≥ 1 : lim_{n→∞} am,n = bm := ∫ gm(s) µ(ds).

On the other hand, since µn(S \ Sv) = 0, for each n ∈ N, by the Monotone Convergence Theorem (see Theorem C.2 in the Appendix) we conclude that

∀n ∈ N : lim_{m→∞} am,n = cn := ∫ g(s) µn(ds).

Furthermore, the family {µn : n ∈ N} being tight, it follows that there exists some compact Kε ⊂ Sv such that µn(Sv \ Kε) < ε, for each n ∈ N, and µ(Sv \ Kε) < ε. The function g/v being continuous, hence bounded on Kε, it follows that

M := sup_{s∈Kε} g(s)/v(s) < ∞.

Choosing now mε ≥ M, it follows that for n ∈ N and m ≥ mε we have

|am,n − cn| ≤ µn({s : g(s) > m · v(s)}) ≤ µn(Sv \ Kε) ≤ ε, (1.11)

since µn(S \ Sv) = 0, for each n ∈ N, and for s ∈ Kε we have g(s) ≤ M · v(s).

³ Note that, if inf_s v(s) > 0, then tightness of the family {µn : n ∈ N} is a consequence of µn ⇒_{Cv} µ.


Therefore, the sequence {am,n}m,n satisfies the conditions of Theorem B.1 and interchanging limits is justified, i.e.,

lim_{n→∞} ∫ g(s) µn(ds) = lim_{n→∞} lim_{m→∞} am,n = lim_{m→∞} lim_{n→∞} am,n = ∫ g(s) µ(ds),

which concludes the proof.

Theorem 1.1 provides the means for assessing Cv-convergence when classical weak convergence of measures holds true, and vice-versa. For instance, applying Theorem 1.1 to Cv defined in Example 1.1, we conclude that if the sequence {µn}n converges CB-weakly to µ and (1.9) holds true for v(s) = e^{αs}, for some α ≥ 0, then the moments of µn converge to those of µ.

We conclude this section by discussing the concept of regular convergence. Let the sequence {µn}n be Cv-convergent to some limit µ. We say that {µn}n is regularly Cv-convergent if

[µn]+ ⇒_{Cv} [µ]+ and [µn]− ⇒_{Cv} [µ]−,

i.e., the positive and negative parts of µn converge to the positive and negative parts of µ, respectively. A natural question that arises in the study of limits of signed measures is whether Cv-convergence is equivalent to regular Cv-convergence, or whether the sequences [µn]± converge at all. A positive answer would allow one to extend standard results regarding classical weak convergence of measures (e.g., Lemma 1.3 and Theorem 1.1) to general signed measures. Unfortunately, as the following example illustrates, this is not always the case.

Example 1.3. Let ξn = 1/n, for each n ≥ 1, and consider the sequence

µn = δξn + δ1+ξn − δ1, for n even;   µn = δξn, for n odd,

where, for x ∈ S, we denote by δx the Dirac distribution assigning mass 1 to the point x, i.e.,

∀A ∈ S : δx(A) = IA(x).

Then µn ⇒_{CB} δ0 but [µ2k+1]+ ⇒_{CB} δ0 and [µ2k]+ ⇒_{CB} δ0 + δ1.

However, it is worth noting that Cv-convergence of the sequence {[µn]+}n is equivalent to that of {[µn]−}n, provided that µn ⇒_{Cv} µ. Moreover, [µn]+ ⇒_{Cv} [µ]+ is equivalent to [µn]− ⇒_{Cv} [µ]−. A sufficient condition for regular convergence will be given in Section 1.3.3.

1.3 Norm Linear Spaces

This section aims to illustrate the link between measure theory and functional analysis. More specifically, we show how both functions and measures can be treated as ordinary elements of some norm linear spaces. Moreover, powerful results can be derived by applying standard results from Banach space theory. To this end, we provide in Section 1.3.1 a brief overview of the basic concepts and tools from functional analysis which will be used throughout this thesis. In Section 1.3.2 we introduce the concept of Banach base and show, by means of an example, that this leads to a proper generalization of the Cv-spaces introduced in Section 1.2.3. Spaces of measures are treated in Section 1.3.3, whereas Section 1.3.4 provides a method to construct Banach bases on product spaces.


1.3.1 Basic Facts from Functional Analysis

The central concept in functional analysis is the linear (vector) space. We say that V is a (real) linear space if there exist two binary operations

+ : V × V → V , · : R× V → V

called addition and scalar multiplication, respectively, such that

• the addition is commutative and associative, i.e.

∀x,y, z ∈ V : x + y = y + x, x + (y + z) = (x + y) + z,

• there exists a zero element 0, i.e.,

∀x ∈ V : x + 0 = x,

• for each x ∈ V there exists an inverse element −x ∈ V , i.e.,

∀x ∈ V , ∃ − x ∈ V : x + (−x) = 0,

• scalar multiplication is compatible with real number multiplication, i.e.,

∀α, β ∈ R,x ∈ V : (αβ) · x = α · (β · x),

• 1 acts as an identity element for scalar multiplication, i.e.,

∀x ∈ V : 1 · x = x,

• scalar multiplication distributes over both vector addition and addition of real numbers, i.e.,

∀α, β ∈ R,x,y ∈ V : α · (x + y) = α · x + α · y, (α + β) · x = α · x + β · x.

A subset W ⊂ V is called stable, or a linear subspace, if

∀α, β ∈ R,x,y ∈ W : α · x + β · y ∈ W .

We say that the mapping ‖ · ‖ : V → [0,∞) is a semi-norm on V if

• ‖ · ‖ is sub-additive, i.e.,

∀x,y ∈ V : ‖x + y‖ ≤ ‖x‖+ ‖y‖,

• ‖ · ‖ is positively homogeneous, i.e.,

∀α ∈ R,x ∈ V : ‖α · x‖ = |α| ‖x‖.


In particular, from the last property we conclude that ‖0‖ = 0, by letting α = 0. A family of semi-norms {‖ · ‖i : i ∈ I} is said to be separating if for each x ∈ V, x ≠ 0, there exists some i ∈ I such that ‖x‖i > 0. A separating family of semi-norms induces a topology on V if we consider as a base the class of finite intersections from the family

B0 = {Vi(x, ε) : x ∈ V, ε > 0, i ∈ I},

where, for each x ∈ V, ε > 0 and i ∈ I, we set

Vi(x, ε) := {y : ‖y − x‖i < ε}.

A topology generated in this way will be called a locally convex topology; it is the coarsest topology on V which makes the mappings ‖ · ‖i continuous, for each i ∈ I. For a full treatment of locally convex topologies we refer to [54].

If, in addition, ‖x‖ = 0 implies that x = 0, we say that ‖ · ‖ is a norm. A norm ‖ · ‖ induces a metric d on V, as follows:

∀x, y ∈ V : d(x, y) = ‖x − y‖. (1.12)

Therefore, any norm induces a topology on V by means of the metric d given by (1.12); the topology induced by the metric d will be called the norm topology on V. Note that, if ‖ · ‖ is a norm on V, then the single-element family {‖ · ‖} is a separating family of semi-norms and the corresponding locally convex topology coincides with the norm topology, i.e., the norm topology is a particular case of a locally convex topology.

We say that the linear norm space (V, ‖ · ‖) is a Banach space if it is complete under the norm topology. The simplest examples of Banach spaces are Euclidean spaces Rk, for k ≥ 1, with the uniform topology, induced by the norm

∀x = (x1, . . . , xk) ∈ Rk : ‖x‖ = max{|x1|, . . . , |xk|}.

A standard non-elementary Banach space is the space of bounded and continuous functions CB(S) endowed with the supremum norm, i.e.,

∀f ∈ CB(S) : ‖f‖ = sup_{s∈S} |f(s)|. (1.13)

If (U, ‖ · ‖U) and (V, ‖ · ‖V) are norm spaces, we say that the mapping Φ : V → U is a linear operator from V into U if it is additive and homogeneous, i.e.,

∀α, β ∈ R; x, y ∈ V : Φ(α · x + β · y) = α · Φ(x) + β · Φ(y).

The linear operator Φ is said to be bounded if there exists M > 0 such that

∀x ∈ V : ‖Φ(x)‖U ≤ M ‖x‖V (1.14)

and Φ is said to be an isometric operator, or isometry for short, if

∀x ∈ V : ‖Φ(x)‖U = ‖x‖V. (1.15)


It is a standard fact that a linear operator is continuous if and only if it is bounded. Moreover, any isometry is a continuous operator, since (1.14) holds true for any M ≥ 1; and if V is a Banach space and Φ is a bijective isometry, it follows that U is a Banach space as well.

The minimal M > 0 for which (1.14) holds true is called the operator norm of Φ and is denoted by ‖Φ‖; in formula,

‖Φ‖ = inf{M > 0 : ‖Φ(x)‖U ≤ M ‖x‖V, ∀x ∈ V}. (1.16)

If we denote by L(V, U) the class of linear operators from V into U, then L(V, U) is a linear space and ‖ · ‖ defined by (1.16) is a proper norm on the subspace of linear bounded operators, denoted by LB(V, U). In addition, for each Φ ∈ LB(V, U) it holds that

‖Φ‖ = sup{‖Φ(x)‖U : ‖x‖V ≤ 1} = sup{‖Φ(x)‖U : ‖x‖V = 1}.

If (U, ‖ · ‖U) is a Banach space then LB(V, U) is a Banach space as well. Furthermore, if U = R then L(V, R) is called the algebraic dual of V, its elements are called linear functionals, and LB(V, R) is called the topological dual of V, typically denoted by V∗. Therefore, we conclude that the topological dual of a norm space is a Banach space. For more details on continuous linear operators we refer to [19].

Topological duality plays an important role in functional analysis and provides the means for constructing new topologies on norm spaces. In some situations, the new topologies appear more natural for applications. That is why we briefly explain the concept of duality in the following. Let V and U be a pair of topological linear spaces and let < ·, · > : V × U → R be a bilinear mapping such that

< x, y > = 0, ∀x ∈ V ⇒ y = 0, and < x, y > = 0, ∀y ∈ U ⇒ x = 0.

Then one can define on V a minimal, locally convex topology, denoted by σ(U, V), which makes the projection (linear) mappings

{< ·, y > : y ∈ U}

continuous. This is the topology induced by the family of semi-norms

{| < ·, y > | : y ∈ U}.

In addition, one can define by symmetry a minimal topology on U, denoted by σ(V, U), which makes the mappings

{< x, · > : x ∈ V}

continuous. The topologies σ(U, V) and σ(V, U) are called dual topologies.

An interesting situation arises when considering the norm spaces V and V∗, both endowed with the corresponding norm topology, and the continuous, bilinear mapping < ·, · > defined as

∀x ∈ V, Φ ∈ V∗ : < x, Φ > = Φ(x).

In this case, the dual topologies are called weak topologies. More specifically, σ(V∗, V) is called the weak topology and σ(V, V∗) is called the weak-* topology.


Note that, in general, the weak topology is coarser than the norm topology. Consequently, continuity in the norm topology implies continuity in the weak topology whereas, in general, the converse is not true. This justifies the name "weak topology" and the fact that the norm topology is typically called the "strong topology". For details on dual topologies we refer to [9], [19].

1.3.2 Banach Bases

In this section we provide a general method to construct spaces of measurable functions. These are norm spaces (in some cases even Banach spaces) and extend the concept of Cv-space introduced in Section 1.2.3.

For some v ∈ C+ let us consider the so-called v-norm on F, defined as follows:

‖g‖v = sup_{s∈S} |g(s)|/v(s) = inf{c > 0 : |g(s)| ≤ c · v(s), ∀s ∈ S}.

In particular, for each g ∈ F it holds that⁴

∀s ∈ S : |g(s)| ≤ ‖g‖v · v(s). (1.17)

Example 1.4. Let Cv be defined as in Example 1.1, for α = 1, that is, v(x) = e^x, for x ≥ 0. If f(x) = 1 + x, for x ≥ 0, we have f(x) ≤ e^x, for all x ≥ 0, and

sup_{x≥0} f(x)e^{−x} = lim_{x↓0} (1 + x)e^{−x} = 1.

Hence, ‖f‖v = 1. On the other hand, if g(x) = x then ‖g‖v = e^{−1}, since

sup_{x≥0} x e^{−x} = e^{−1}.

Remark 1.3. The v-norm is also known as the weighted supremum norm in the literature. An early reference is [42]. The v-norm is frequently used in Markov decision analysis. First traces date back to the early eighties; see [16] and the revised version which was published as [17]. It was originally used in the analysis of Blackwell optimality; see [17], and [34] for a recent publication on this topic. Since then, it has been used in various forms under different names in many subsequent papers; see, for example, [35] and [44]. For the use of the v-norm in the theory of measure-valued differentiation of Markov chains see, e.g., [24]. For the use of the v-norm in the context of strongly stable Markov chains we refer to [35].

For an arbitrary subset D ⊂ F and v ∈ C+ let us denote by [D]v the set of elements of D with finite v-norm, i.e.,

[D]v = {g ∈ D : ‖g‖v < ∞} (1.18)

and extend Definition 1.1 by calling the sequence {µn}n∈N weakly [D]v-convergent if there exists µ such that

∀g ∈ [D]v : lim_{n→∞} ∫ g(s) µn(ds) = ∫ g(s) µ(ds). (1.19)

⁴ Note that the inequality in (1.17) still holds true if ‖g‖v = ∞.


The set D in (1.18) is called the base set of [D]v; note that it can be chosen, without loss of generality, to be a linear subspace of F. Moreover, the set Cv defined in (1.4) can be written as [C]v, i.e., Cv-convergence introduced in Definition 1.1 is in fact [C]v-convergence and, for any v ∈ CB with inf v > 0, [C]v = CB, i.e., for such v we recover the classical weak convergence. In particular, if v ≡ 1 then the v-norm coincides with the supremum norm on CB.

As it will turn out, powerful results on convergence, continuity and differentiability of product measures can be established if the base set in (1.18) is such that [D]v becomes a Banach space when endowed with the appropriate v-norm. This gives rise to the following definition.

Definition 1.2. The pair (D, v) is called a Banach base on S if:

(i) D is a linear space such that C ⊂ D ⊂ F ,

(ii) v ∈ C+ and the set [D]v in (1.18) endowed with the v-norm is a Banach space.

In the following we present two examples of Banach bases that arise in applications.

Example 1.5. The continuity paradigm: D = C. Taking v ∈ C+ we obtain [C]v as the set of all continuous mappings bounded by v. It can be shown that (C, v) is a Banach base on S. Indeed, the mapping⁵ Φ : [C(S)]v → CB(Sv) defined as

∀s ∈ Sv, g ∈ [C(S)]v : (Φg)(s) = g(s)/v(s) (1.20)

establishes a linear bijection between two norm spaces, and the inverse Φ−1 is given by

∀s ∈ S, g ∈ CB(Sv) : (Φ−1g)(s) = g(s) · v(s).

Furthermore, Φ is an isometry, as it satisfies

∀g ∈ [C(S)]v : ‖Φg‖ = ‖g‖v.

Since CB(Sv) is a Banach space when equipped with the supremum norm, [C(S)]v inherits the same property; see [56].

The measurability paradigm: D = F. Taking v ∈ C+, we obtain [F]v as the set of all measurable mappings bounded by v. Again, the linear mapping Φ : [F(S)]v → FB(Sv) defined by (1.20) is an isometry and we conclude that (F, v) is a Banach base on S.

As the above example shows, the pairs (C, v) and (F, v) are Banach bases for each v ∈ C+. Note that the condition C ⊂ D is a minimal prerequisite for developing our theory since, by Lemma 1.2, the space [C]v possesses satisfactory separation properties, while the condition D ⊂ F comes naturally since we only deal with measurable functions. Therefore, if (D, v) is a Banach base then we have

[C]v ⊂ [D]v ⊂ [F]v.

⁵ The assumption v ∈ C guarantees that the transformation Φ preserves continuity.


Remark 1.4. Theorem F.2 (see the Appendix) shows that, for D = C, the set of functions g satisfying (1.19) includes a significant class of non-continuous, measurable mappings. Namely, if the sequence {µn}n ⊂ M1 is weakly CB-convergent to µ, i.e., (1.19) holds true for each g ∈ CB, then the class of functions g which satisfy (1.19) can be extended to [C(µ)]v, for some v which is uniformly integrable with respect to the family {µn : n ∈ N}, where C(µ) denotes the space of functions which are continuous µ-a.e.

In the remainder of this thesis, we will impose the following assumption:

Whenever a Banach base (D, v) is considered, D is either C or F .

The idea behind this assumption is that one should think of D as a class of functions enjoying some topological property rather than a simple set of functions. This is no severe restriction with respect to our applications; see Remark 1.4. In this setting, [D]v spaces enjoy an important property which will be used in many proofs: namely, if the function g belongs to the class D, then a continuous transformation of g, i.e., the composition f ∘ g or the product f · g, with f continuous, also belongs to the class D.

Many statements in this thesis will be formulated in terms of [D]v spaces, which means that they hold true for both D = C and D = F, i.e., they generate two statements which are obtained by replacing D by C and F, respectively. In most of the cases the proof does not distinguish between these two situations but, when necessary, the proof will be modified accordingly. As a final remark, since a weak [F]v property implies the corresponding weak [C]v property, in some statements we will replace D by C, if possible, in order to make the result stronger.

1.3.3 Spaces of Measures

In functional analysis, signed measures often appear as continuous linear functionals on spaces of functions. More precisely, by the Riesz Representation Theorem (see Theorem F.3 in the Appendix) a space of measures can be seen as the topological dual of a certain space of functions. Throughout this section we aim to exploit this fact in order to derive new results using specific tools from Banach space theory.

Let (D, v) be a Banach base on S and let

Mv := {µ ∈ M : v ∈ L1(µ)}.

If α, β ∈ R and µ, ν ∈ Mv then α · µ + β · ν ∈ Mv, where

∀A ∈ S : (α · µ + β · ν)(A) = α µ(A) + β ν(A).

Hence, Mv can be organized as a linear space. Moreover, note that we have

[D]v ⊂ L1(µθ : θ ∈ Θ) ⇔ v ∈ L1(µθ : θ ∈ Θ) ⇔ {µθ : θ ∈ Θ} ⊂ Mv

and for v ≡ 1 we have Mv = MB, i.e., Mv consists of finite elements. The subset of Mv which consists of probability measures is denoted by M1v, i.e.,

M1v := Mv ∩ M1


and note that if v ≡ 1 then M1v = MB ∩ M1 = M1.

For µ ∈ Mv consider the Hahn-Jordan decomposition µ = [µ]+ − [µ]− and define the weighted total variation norm of µ with respect to v (shortly: v-norm) as follows:

‖µ‖v = ∫ v(s) |µ|(ds) = ∫ v(s) [µ]+(ds) + ∫ v(s) [µ]−(ds). (1.21)

In particular, a Cauchy-Schwarz Inequality holds for v-norms. In formula:

∀g ∈ [D]v, ∀µ ∈ Mv : |∫ g(s)µ(ds)| ≤ ‖g‖v · ‖µ‖v. (1.22)

Note that, using the v-norm, the space Mv can alternatively be described as

Mv = {µ ∈ M : ‖µ‖v < ∞}

and for v ≡ 1 one recovers the total variation norm given by (1.3). On the other hand, for µ ∈ Mv the mapping ΦDµ : [D]v → R defined as

∀g ∈ [D]v : ΦDµ(g) = ∫ g(s)µ(ds)

is a linear functional on the space [D]v, whose operator norm satisfies

‖ΦDµ‖v = sup{|ΦDµ(g)| : g ∈ [D]v, ‖g‖v ≤ 1} = ‖µ‖v.

To see this, note that if A is a Hahn set for µ, i.e., [µ]+(Ac) = [µ]−(A) = 0, where Ac denotes the complement of A, and we set

∀s ∈ S : g∗(s) := v(s)IA(s) − v(s)IAc(s),

it follows that g∗ is measurable, ‖g∗‖v = 1 and ‖µ‖v = |∫ g∗(s)µ(ds)|. Hence,

‖µ‖v = |∫ g∗(s)µ(ds)| ≤ ‖ΦFµ‖v. (1.23)

Moreover, using Urysohn's Lemma it can be shown that there exists some sequence of continuous functions {fn}n such that |fn(s)| ≤ 1, for all n and s, and such that

∀s ∈ S : lim_{n→∞} fn(s) = IA(s) − IAc(s).

Hence, if we define gn(s) = fn(s)v(s), for each n and s, we have gn ∈ C, ‖gn‖v ≤ 1, for each n, and by the Dominated Convergence Theorem (see Theorem C.1 in the Appendix) we have

‖µ‖v = |∫ g∗(s)µ(ds)| = lim_{n→∞} |∫ gn(s)µ(ds)| ≤ ‖ΦCµ‖v. (1.24)

On the other hand, from the Cauchy-Schwarz Inequality we conclude that

‖ΦDµ‖v ≤ ‖µ‖v,


which, together with (1.23) and (1.24), leads to

‖ΦDµ‖v = ‖µ‖v.

Therefore, any element µ of Mv can be identified with a continuous linear functional ΦDµ on the space [D]v, and the operator norm of ΦDµ coincides with the weighted total variation norm of µ given by (1.21). It follows that Mv is a subset of the topological dual of [D]v and that weak [D]v-convergence is in fact the convergence given by the trace of the weak-* topology on Mv. However, for ease of exposition we agree to call it "weak"; since we will not make any reference to the actual weak topology induced on [D]v by its topological dual, no confusion can occur.

As discussed in Section 1.3.1, norm convergence on Mv implies weak convergence. In this case, this is a consequence of the Cauchy-Schwarz Inequality. Indeed, if µn converges in v-norm to µ, then letting ν = µn − µ in (1.22) shows that (1.19) holds true for all g ∈ [D]v. The converse is, however, not true, as detailed in the following example.

Example 1.6. Consider a convergent sequence {xn}n ⊂ R having limit x ∈ R, with xn ≠ x for all n. It is known that the sequence of corresponding Dirac distributions {δxn}n ⊂ M is weakly CB-convergent to δx. However, norm convergence does not hold since

lim_{n→∞} ‖δxn − δx‖ = lim_{n→∞} sup_{|g|≤1} |g(xn) − g(x)| = 2 ≠ 0.
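The gap between the two modes of convergence in Example 1.6 can be made concrete numerically. A minimal sketch (the sequence xn = 1/n and the test function are our choices):

```python
import numpy as np

# Illustrative sketch (not from the thesis): Dirac measures delta_{x_n} with
# x_n -> x = 0. Integrals of a fixed bounded continuous g converge (weak
# CB-convergence), yet the total variation distance ||delta_{x_n} - delta_x||
# stays equal to 2 as long as x_n != x.

x = 0.0
xs = [1.0 / n for n in range(1, 6)]            # x_n = 1/n -> 0
g = np.cos                                      # a bounded continuous test function

weak_gaps = [abs(g(xn) - g(x)) for xn in xs]    # |∫ g d(delta_{x_n} - delta_x)| -> 0

# The sup over |g| <= 1 is attained by any measurable g with g(x_n) = 1 and
# g(x) = -1, so the total variation distance equals 2 for every n:
tv_dists = [abs(1.0 - (-1.0)) for _ in xs]

assert weak_gaps[-1] < weak_gaps[0] < 1.0       # integrals against g converge
assert all(d == 2.0 for d in tv_dists)          # the norm distance does not vanish
```

The point of the sketch is exactly the one made in the example: weak convergence tests one continuous g at a time, while the total variation norm takes a supremum over all measurable g with |g| ≤ 1, which can separate the two atoms no matter how close xn is to x.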

In the following, we endow Mv with the weak-* topology given by [D]v-convergence (we omit specifying [D]v when not relevant) and refer to v-norm convergence as strong convergence. Consequently, by continuity we mean weak continuity, i.e., continuity with respect to the weak-* topology, and by strong continuity we mean continuity with respect to v-norm convergence.

We continue our analysis by presenting a few results which can be easily derived by using a functional analytic approach to spaces of measures. For instance, the Banach-Steinhaus Theorem can be applied to a convergent sequence {µn}n of measures, which allows us to deduce that the family {µn}n is strongly bounded in Mv. For later reference we formalize this statement in the following lemma.

Lemma 1.4. Let (D, v) be a Banach base and let the sequence {µn}n converge to some limit µ in Mv. Then, it holds that

sup_{n∈N} ‖µn‖v < ∞.

Proof. Under the assumption in the lemma, the set {µn : n ∈ N} is bounded in the weak sense, i.e., for each g ∈ [D]v, the set {∫ g dµn : n ∈ N} is bounded in R. The claim then follows from the Banach-Steinhaus Theorem (see Theorem G.1 in the Appendix).

Recall now the definition of regular convergence given in Section 1.2.4. As illustrated by Example 1.3, convergence of a sequence {µn}n towards some limit µ does not imply regular convergence. The following result shows that under an additional condition the positive parts of µn converge to the positive part of µ.


Lemma 1.5. Let (D, v) be a Banach base and let the sequence {µn}n converge to some limit µ in Mv. Then, the sequence {µn}n converges regularly to µ if and only if

lim_{n→∞} ‖µn‖v = ‖µ‖v. (1.25)

Proof. The direct implication is immediate. Assume now that the sequence {µn}n converges to µ and that (1.25) holds true. Lemma 1.4 implies that the family {µn : n ∈ N} is strongly bounded in Mv and so is {[µn]+ : n ∈ N}, since ‖[µn]+‖v ≤ ‖µn‖v. Therefore, in accordance with the Banach-Alaoglu Theorem (see Theorem G.2), it follows that the closure of the set {[µn]+ : n ∈ N} is compact in the weak-* topology and there exists a subsequence {nk}k≥1 ⊂ N such that {[µnk]+}k converges in Mv.

Next, we show that any convergent subsequence of {[µn]+ : n ∈ N} converges to [µ]+. Indeed, choose an arbitrary convergent subsequence {[µnk]+}k and denote by λ ∈ M+ its limit. Since [µnk]− = [µnk]+ − µnk, it follows that {[µnk]−}k converges to λ − µ. Moreover, (λ − µ) ∈ M+ since it is the limit of a sequence of positive measures. The uniqueness of the limit implies that µ = λ − (λ − µ) and from the minimality property of the Hahn-Jordan decomposition we conclude that there exists some ν ∈ M+ such that λ = ν + [µ]+ and λ − µ = ν + [µ]−. Consequently,

lim_{k→∞} ‖µnk‖v = ‖µ‖v + 2‖ν‖v

and by hypothesis it follows that ‖ν‖v = 0. Therefore, from Lemma 1.2 it follows that ν is the null measure, i.e., λ = [µ]+, which concludes the proof.

Remark 1.5. The proof of Lemma 1.5 indicates that if µn converges to µ then it holds that

‖µ‖v ≤ lim inf_n ‖µn‖v.

Therefore, another equivalent condition for regular convergence is

‖µ‖v ≥ lim sup_n ‖µn‖v.

An immediate consequence of Lemma 1.5 is the following result.

Corollary 1.1. Under the conditions put forward in Lemma 1.5, if the sequence {µn}n converges strongly to µ then it converges regularly to µ.

Proof. First, note that the following inequality holds true:

∀n ∈ N : |‖µn‖v − ‖µ‖v| ≤ ‖µn − µ‖v.

Now the claim follows from Lemma 1.5.

We say that the continuous measure-valued mapping µ∗ is regularly continuous at θ if the mapping [µ∗]+ is continuous at θ. It follows that [µ∗]− is continuous at θ as well. The statements in Lemma 1.4 and Lemma 1.5 can easily be extended to arbitrary families of measures. More specifically, the following statement holds true.


Theorem 1.2. Let µ∗ : Θ → Mv be a continuous measure-valued mapping.

(i) Then for each compact K ⊂ Θ it holds that

sup_{θ∈K} ‖µθ‖v < ∞.

(ii) In addition, if the real-valued mapping ‖µ∗‖v is continuous at θ then the measure-valued mapping µ∗ is regularly continuous at θ. In particular, the same conclusion holds true when µ∗ is strongly continuous.

Proof. (i) By hypothesis, for each compact K ⊂ Θ it holds that

∀g ∈ [D]v : sup_{θ∈K} |∫ g(s)µθ(ds)| < ∞.

Assuming that there exists some compact K′ ⊂ Θ such that sup_{K′} ‖µθ‖v = ∞, it follows that there exists a sequence {θn}n in K′ such that sup_n ‖µθn‖v = ∞, which contradicts Lemma 1.4.

(ii) Assuming, for instance, that [µ∗]+ is not continuous at θ, it follows that there exists a sequence ξn → 0 such that [µθ+ξn]+ does not converge to [µθ]+, which contradicts Lemma 1.5. A similar reasoning as in Corollary 1.1 concludes the proof.

1.3.4 Banach Bases on Product Spaces

Let S, T be separable complete metric spaces endowed with Borel fields S and T and Banach bases (D(S), v) and (D(T), u), respectively, and consider the class of mappings g : S × T → R satisfying

∀s ∈ S, t ∈ T : g(s, ·) ∈ D(T), g(·, t) ∈ D(S). (1.26)

In addition, let us define the tensor product v ⊗ u : S × T → R as follows:

∀s ∈ S, t ∈ T : (v ⊗ u)(s, t) = v(s) · u(t). (1.27)

Let us denote by D(S) ⊗ D(T) the class of functions g ∈ F(S × T) satisfying condition (1.26), which, as the following example shows, imposes no restriction in applications.

Example 1.7. We revisit the Banach bases introduced in Example 1.5.

• Let g ∈ C(S × T). Then

∀s ∈ S, t ∈ T : g(s, ·) ∈ C(T), g(·, t) ∈ C(S)

and it follows that

C(S × T) ⊂ C(S) ⊗ C(T). (1.28)

• Let g ∈ F(S × T). Then

∀s ∈ S, t ∈ T : g(s, ·) ∈ F(T), g(·, t) ∈ F(S)

and it follows that

F(S × T) ⊂ F(S) ⊗ F(T). (1.29)


We define now the product of (D(S), v) and (D(T), u) as follows:

(D(S)⊗D(T), v ⊗ u).

The next result shows that products of Banach bases are again Banach bases, where theabove definitions are extended to finite products in the obvious way.

Theorem 1.3. Let (D(Si), vi) be Banach bases on Si, respectively, for 1 ≤ i ≤ k.

(i) Then the pair

(D(S1) ⊗ · · · ⊗ D(Sk), v1 ⊗ · · · ⊗ vk)

is a Banach base on S1 × · · · × Sk. In particular, for all 1 ≤ i ≤ k,

∀sj ∈ Sj, j ≠ i : g(s1, . . . , si−1, ·, si+1, . . . , sk) ∈ [D(Si)]vi,

provided that g ∈ [D(S1) ⊗ · · · ⊗ D(Sk)]v1⊗···⊗vk.

(ii) If for each 1 ≤ i ≤ k, Si is the Borel field on Si and µi ∈ Mvi(Si), then

‖µ1 × · · · × µk‖v1⊗···⊗vk ≤ ‖µ1‖v1 · . . . · ‖µk‖vk.

In particular, µ1 × · · · × µk ∈ Mv1⊗···⊗vk(σ(S1 × · · · × Sk)).⁶

Proof. (i) The proof follows by finite induction with respect to k and we only provide a proof for the case k = 2. More precisely, we prove the following: let (D(S), v) and (D(T), u) be Banach bases on S and T, respectively; then (D(S) ⊗ D(T), v ⊗ u) is a Banach base on the product space S × T; moreover, if g ∈ [D(S) ⊗ D(T)]v⊗u, then g(s, ·) ∈ D(T) and g(·, t) ∈ D(S) for all s ∈ S, t ∈ T. To this end we verify the conditions in Definition 1.2. It is immediate that D(S) ⊗ D(T) is a linear space satisfying

CB(S× T) ⊂ D(S)⊗D(T) ⊂ F(S× T).

For the second part, one proceeds as follows. Let g ∈ [D(S) ⊗ D(T)]v⊗u. It follows that

sup_{t∈T} ‖g(·, t)‖v / u(t) = sup_{t∈T} sup_{s∈S} |g(s, t)| / (v(s) · u(t)) ≤ sup_{(s,t)} |g(s, t)| / (v(s) · u(t)) = ‖g‖v⊗u < ∞. (1.30)

Thus, for t ∈ T we have ‖g(·, t)‖v ≤ ‖g‖v⊗u · u(t) < ∞, which shows that g(·, t) ∈ [D(S)]v. By symmetry, we obtain g(s, ·) ∈ [D(T)]u, for all s ∈ S.

Next, we show that [D(S) ⊗ D(T)]v⊗u is a Banach space with respect to the v ⊗ u-norm. To this end, let {gn}n be a Cauchy sequence in [D(S) ⊗ D(T)]v⊗u. That means that for each ε > 0 there exists a rank nε ≥ 1 such that for all j, k ≥ nε it holds that ‖gj − gk‖v⊗u ≤ ε. Inserting now g = gj − gk in (1.30), one obtains for j, k ≥ nε

∀t ∈ T : ‖gj(·, t) − gk(·, t)‖v ≤ ‖gj − gk‖v⊗u · u(t) ≤ ε · u(t).

6 Here σ(S1 × . . .× Sk) denotes the σ-field generated by the product S1 × . . .× Sk.


Hence, for all t ∈ T, {gn(·, t)}n is a Cauchy sequence in the Banach space [D(S)]v, thus convergent to some limit g(·, t) ∈ [D(S)]v. Using again a symmetry argument we deduce that g(s, ·) ∈ [D(T)]u, for all s ∈ S, and we conclude that g ∈ D(S) ⊗ D(T).

Finally, we show that g is the v ⊗ u-norm limit of the sequence {gn}n. Choose ε > 0 and nε ≥ 1 such that for all j, k ≥ nε we have ‖gj − gk‖v⊗u < ε; more explicitly:

∀s ∈ S, t ∈ T : |gj(s, t) − gk(s, t)| < ε · v(s)u(t),

for all j, k ≥ nε. Letting now k → ∞ in the above inequality yields

∀s ∈ S, t ∈ T : |gj(s, t) − g(s, t)| ≤ ε · v(s)u(t),

for all j ≥ nε, which is equivalent to ‖gj − g‖v⊗u ≤ ε for all j ≥ nε. Therefore, it follows that ‖g‖v⊗u < ∞, i.e., g ∈ [D(S) ⊗ D(T)]v⊗u, and since ε was chosen arbitrarily we conclude that lim_{n→∞} ‖gn − g‖v⊗u = 0, which proves the claim.

(ii) To prove the second statement it suffices to show that

‖µ × η‖v⊗u ≤ ‖µ‖v ‖η‖u. (1.31)

To this end, let µ = [µ]+ − [µ]− and η = [η]+ − [η]− be the Hahn-Jordan decompositionsof µ and η, respectively. Then

µ× η = ([µ]+ × [η]+ + [µ]− × [η]−)− ([µ]+ × [η]− + [µ]− × [η]+)

is a decomposition of µ × η and the minimality property of the Hahn-Jordan decomposition ensures that

[µ× η]+ ≤ [µ]+ × [η]+ + [µ]− × [η]−, [µ× η]− ≤ [µ]+ × [η]− + [µ]− × [η]+.

By adding up the above inequalities we obtain

|µ× η| ≤ |µ| × |η|.

Thus, according to (1.21), it holds that (use the Fubini Theorem; see Theorem E.1 in the Appendix)

‖µ × η‖v⊗u ≤ ∫ (v ⊗ u)(s, z) (|µ| × |η|)(ds, dz) = ‖µ‖v ‖η‖u,

which establishes (1.31).
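Inequality (1.31) is easy to check numerically for measures with finite support. A minimal sketch, with atoms and weight functions of our own choosing (note that for a genuine product measure |µ × η| = |µ| × |η|, so equality is actually attained):

```python
import numpy as np

# Illustrative sketch (our own discrete example): for signed measures with
# finite support, ||mu x eta||_{v(x)u} <= ||mu||_v * ||eta||_u, as in (1.31).

s_pts = np.array([0.5, 1.0, 2.0])
t_pts = np.array([1.0, 3.0])
mu = np.array([0.8, -1.1, 0.3])        # signed atom weights of mu on s_pts
eta = np.array([-0.4, 0.9])            # signed atom weights of eta on t_pts
v = 1 + s_pts**2                        # weight function v on S
u = np.exp(t_pts / 2)                   # weight function u on T

norm_mu = np.sum(v * np.abs(mu))        # ||mu||_v = ∫ v d|mu|
norm_eta = np.sum(u * np.abs(eta))      # ||eta||_u = ∫ u d|eta|

prod = np.outer(mu, eta)                # (mu x eta)({(s,t)}) = mu({s}) * eta({t})
w = np.outer(v, u)                      # (v (x) u)(s, t) = v(s) * u(t)
norm_prod = np.sum(w * np.abs(prod))    # ||mu x eta||_{v(x)u}

assert norm_prod <= norm_mu * norm_eta + 1e-12
assert np.isclose(norm_prod, norm_mu * norm_eta)   # equality for product measures
```

The strict inequality in (1.31) only shows up for measures on the product space that are not themselves products, where the Hahn-Jordan positive and negative parts can overlap with the four product terms in the proof.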

1.4 Concluding Remarks

When the metric space S is compact, the Riesz Representation Theorem (see, e.g., [19]) asserts that the space MB of finite Radon measures on S is isometric to the topological dual of CB, when the latter is endowed with the supremum norm defined by (1.13). Such a result does not hold true in general and MB is isometric to a proper subspace of the topological dual space (CB)∗. Nevertheless, when S is locally compact, it has been shown in [11] that MB


is precisely the topological dual of CB endowed with the so-called strict (compact-open) topology, i.e., the locally convex topology generated by the family of semi-norms

‖f‖K = sup_{s∈K} |f(s)|,

where K ranges over the compact subsets⁷ of S. Moreover, the topological dual of CB, when endowed with the supremum norm topology, is the space of Radon measures on the Stone-Čech compactification of S; see, e.g., [41], [60]. Therefore, tightness of a family of elements in MB is a technical condition which ensures that all the limit points in the weak-* topology are contained in MB; see the Prokhorov Theorem (Theorem F.3 in the Appendix). More specifically, if P ⊂ MB is tight then the closure of P in the weak-* topology is contained in MB. A standard example which illustrates this fact is the following.

Example 1.8. Let us consider the family of Dirac measures {δx : x ≥ 0}. Then, the classical weak limit lim_{x→∞} δx does not exist in MB(R). Indeed,

∀g ∈ CB : lim_{x→∞} ∫ g(s)δx(ds) = lim_{x→∞} g(x),

but the right-hand side limit above does not exist in general. Hence, the closure of the family Px0 := {δx : x ≥ x0} in the weak-* topology is not contained in MB(R), for any x0 ≥ 0. This stems from the fact that the family Px0 is not tight. However, note that for v(s) = 1/s the [C]v-limit of the family δx, for x → ∞, is the null measure.

[C]v-spaces appear as particular cases of weighted spaces, which are introduced by means of the so-called Nachbin families of functions; see, e.g., [46]. A Nachbin family is, in fact, a family N of upper semi-continuous functions which is upper directed, i.e., for each v1, v2 ∈ N there exist α > 0 and v ∈ N such that

∀s ∈ S : max{v1(s), v2(s)} ≤ αv(s).

Then, the weighted space generated by the family N is defined as the class of continuous functions g for which g · v is bounded for each v ∈ N; it becomes a topological vector space when endowed with the locally convex topology generated by the family of semi-norms

∀v ∈ N : ‖g‖v := sup_{s∈S} v(s)|g(s)|.

Therefore, when N = {α/v : α > 0}, for some v ∈ C+, one recovers the definition of the [C]v-space.

Weighted spaces have received a thorough treatment in [50], [51], [57], [58]. For instance, a result regarding the completeness of weighted spaces has been presented in [51] and an extension of the Stone-Weierstrass Theorem to weighted spaces has been discussed in [50]. Moreover, [57] addresses the problem of determining the topological dual of a weighted space. In particular, it turns out that the topological dual of a [C]v-space

7 Local compactness of S implies that the above family of semi-norms is separating.


includes the space Mv, which can alternatively be described as the class of measures µ ∈ M such that v · µ is finite, where, for arbitrary µ ∈ M, we define v · µ ∈ M as follows:

∀s ∈ S : (v · µ)(ds) = v(s)µ(ds).

The reasoning essentially relies on the isometry between [C(S)]v and CB(Sv) defined by (1.20), which induces an isometry between the corresponding spaces of measures. For later reference we synthesize these observations into the following remark.

Remark 1.6. Inspired by Example 1.5, we note that [C(S)]v-convergence is equivalent to CB(Sv)-convergence, i.e., the sequence {µn}n is [C(S)]v-convergent to µ if and only if {v · µn}n is CB(Sv)-convergent to v · µ.

The most important gain of strong convergence is that the limit relation in (1.8) holds uniformly in g ∈ [C]v, ‖g‖v ≤ 1. Nevertheless, as shown in [52], on a [C]v-space weak convergence of measures is equivalent to uniform convergence of integrals with respect to equicontinuous families of functions, i.e., relatively compact subsets K ⊂ [C]v. In general, weak differentiability is strictly weaker than strong differentiability, i.e., the weak-* topology is strictly coarser than the norm topology; see Example 1.6.


2. MEASURE-VALUED DIFFERENTIATION

This chapter is devoted to a detailed analysis of the concept of measure-valued differentiation and its applicability. New results will be established by combining functional analytic and measure theoretical techniques, and some applications will be provided.

2.1 Introduction

Measure-valued differentiation can be described in a general setting as follows. Consider a family {Φθ : θ ∈ Θ} of linear functionals on some Banach space V, where Θ is an open connected subset of R. For fixed θ ∈ Θ, provided that for each x ∈ V the limit

Φ′θ(x) := lim_{ξ→0} (Φθ+ξ(x) − Φθ(x)) / ξ (2.1)

exists in R, it follows that Φ′θ is a linear operator on V. Therefore, the formal derivative (d/dθ)Φθ has the following operator representation:

∀x ∈ V : (d/dθ) Φθ(x) = Φ′θ(x).

If V is a space of functions and Φθ is an integral operator, i.e., it can be represented as the integral with respect to some measure µθ, then we obtain a sensible concept of measure-valued differentiation.

Provided that the limit in (2.1) exists for each x ∈ V, it follows that

lim_{ξ→0} (Φθ+ξ − Φθ)/ξ = Φ′θ, (2.2)

where the above convergence holds in the weak-* topology. Therefore, following the terminology in Definition 1.1, it is natural to call the differentiability concept described by (2.1) weak differentiability. This concept was first introduced in [47] for V = CB. The general definition, for V = [D]v, is postponed to Section 2.2.1.

It is also possible to define a concept of strong differentiability by requiring that the limit relation in (2.2) holds in a strong (norm) sense. As explained in Section 1.4, strong differentiability, which relies on strong convergence, allows for a more powerful analysis. Nevertheless, weak differentiability is the minimal condition for (2.1) to hold true for each x ∈ V, which makes it attractive for applications. The aim of this chapter is to study both types of differentiability and their range of application. In addition, the concept of regular differentiability will be introduced to ensure a smooth extension of the properties of the classical weak convergence of positive measures to signed measures. As it will turn


out, regular differentiability is a stronger property than weak differentiability and weaker than strong differentiability, and it is fulfilled by the usual weakly differentiable distributions.

The chapter is organized as follows. In Section 2.2 the concept of measure-valued differentiation is discussed. In particular, we provide a representation of the weak derivative of a probability measure which will be crucial for our further analysis. Weak differentiability of product measures is treated in Section 2.3, while in Section 2.4 we investigate the relation between weak differentiability and set-wise differentiation. Eventually, in Section 2.5 we illustrate by means of two examples how weak derivatives lead to gradient estimators for some common applications.

2.2 The Concept of Measure-Valued Differentiation

In what follows we assume that (D, v) is a Banach base and {µθ : θ ∈ Θ} ⊂ Mv(S) is a family of (signed) measures, where Θ is an open connected subset of R. In Section 2.2.1 we define and study several types of measure-valued differentiation and in Section 2.2.2 we discuss convenient representations of weak derivatives of probability measures. Eventually, in Section 2.2.3 we establish some results which prove to be useful when assessing weak differentiability and computing weak derivatives. We illustrate the results by several examples of weakly differentiable (usual) distributions.

2.2.1 Weak, Strong and Regular Differentiability

We now define the concept of weak differentiability.

Definition 2.1. Let (D, v) be a Banach base on S. We say that the mapping µ∗ : Θ → Mv is weakly [D]v-differentiable at θ or, for short, µθ is weakly differentiable, if there exists µ′θ ∈ Mv such that

∀g ∈ [D]v : lim_{ξ→0} (1/ξ) (∫ g(s)µθ+ξ(ds) − ∫ g(s)µθ(ds)) = ∫ g(s)µ′θ(ds). (2.3)

Consequently, we call µ′θ the weak derivative¹ of µθ. If the left-hand side of the above equation equals zero for all g ∈ [D]v, then we say that the weak derivative of µθ is not significant.

In addition, we say that µ∗ is weakly [D]v-differentiable if µθ is weakly [D]v-differentiable for each θ ∈ Θ, and we denote by µ′∗ the mapping

∀θ ∈ Θ : µ′∗(θ) = µ′θ.

Remark 2.1. As mentioned before, differentiability of probability measures in the sense of Definition 2.1 was originally introduced for [D]v = CB in [47] and received a thorough treatment in [48]. Other early traces are [39] and [40]. In [31], this concept is extended to general [D]v-differentiability and it is shown that [D]v-derivatives yield efficient unbiased gradient estimators. A recent result in this line of research shows that this class of gradient estimators can outperform single-run estimators such as those provided by infinitesimal perturbation analysis; see [33].

1 Note that a weak derivative is unique in the sense specified by Lemma 1.2.
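The defining limit (2.3) can be checked numerically in a simple parametric case. A minimal sketch, assuming µθ is the exponential distribution with rate θ and taking g(s) = s² (these choices are ours, not an example from the text); here the weak derivative has Lebesgue density d/dθ[θ·exp(−θs)] = (1 − θs)·exp(−θs):

```python
import numpy as np

# Sketch (distribution and test function are our own choices): for
# mu_theta = Exp(theta), equation (2.3) reads
#   d/dtheta ∫ g dmu_theta = ∫ g(s) (1 - theta*s) exp(-theta*s) ds,
# which we compare against a central finite difference.

theta, h = 2.0, 1e-5
s = np.linspace(0.0, 60.0, 400_001)               # quadrature grid on [0, 60]
ds = s[1] - s[0]
trap = lambda y: (y[:-1] + y[1:]).sum() * ds / 2  # trapezoidal rule

g = s**2                                           # test function g(s) = s^2
dens = lambda th: th * np.exp(-th * s)             # density of Exp(th)

fin_diff = trap(g * (dens(theta + h) - dens(theta - h)) / (2 * h))
weak_der = trap(g * (1 - theta * s) * np.exp(-theta * s))

# closed form: E_theta[X^2] = 2/theta^2, so the derivative is -4/theta^3
assert abs(weak_der - (-4 / theta**3)) < 1e-5
assert abs(fin_diff - weak_der) < 1e-4
```

The same interchange of limit and integral is what Definition 2.1 encodes abstractly; the numerical agreement merely illustrates it for one smooth family and one integrand.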


Note that in Definition 2.1 equation (2.3) is equivalent to

(µθ+ξ − µθ)/ξ =[D]v=⇒ µ′θ, (2.4)

i.e., (µθ+ξ − µθ)/ξ converges weakly, in the [D]v sense, to µ′θ. Consequently, we say that µθ is regularly [D]v-differentiable (shortly: regularly differentiable) if the convergence in (2.4) is regular, and we say that µθ is strongly [D]v-differentiable (shortly: strongly differentiable) if the convergence in (2.4) holds in the strong (v-norm) sense, i.e.,

lim_{ξ→0} ‖(µθ+ξ − µθ)/ξ − µ′θ‖v = 0. (2.5)

Strong differentiability implies weak differentiability since (2.5) implies that (2.3) holds true for each g ∈ [D]v. However, strong differentiability is a more powerful tool since it implies that (2.3) holds true uniformly with respect to g ∈ [D]v with ‖g‖v ≤ 1. Indeed, (2.5) is equivalent to

lim_{ξ→0} sup_{‖g‖v≤1} |(1/ξ)(∫ g(s)µθ+ξ(ds) − ∫ g(s)µθ(ds)) − ∫ g(s)µ′θ(ds)| = 0.

However, the converse is not true since, in general, there exist weakly differentiable distributions which are not strongly differentiable, as will be illustrated by an example; see Example 2.6 later on in this section. Moreover, by Theorem 1.2 (ii) we conclude that regular differentiability is equivalent to

lim_{ξ→0} ‖(µθ+ξ − µθ)/ξ‖v = ‖µ′θ‖v, (2.6)

and strong differentiability implies regular differentiability which, by definition, implies weak differentiability.

We continue our analysis by presenting two results which establish connections between the three types of convergence/differentiability on Mv.

The first result shows that weak differentiability implies strong continuity. This result will be particularly useful in Chapter 3, when we establish strong bounds on perturbations. The precise statement is as follows.

Theorem 2.1. Let µ∗ : Θ → Mv be a [D]v-continuous measure-valued mapping such that µθ is [D]v-differentiable. Then for each closed neighborhood V of 0 such that θ + ξ ∈ Θ for each ξ ∈ V, there exists some M > 0 such that

∀ξ ∈ V : ‖µθ+ξ − µθ‖v ≤ M|ξ|.

In words, µθ is v-norm continuous.

Proof. For ξ such that θ + ξ ∈ Θ, let us define the measure-valued mapping

µ̄ξ := (µθ+ξ − µθ)/ξ, for ξ ≠ 0, and µ̄0 := µ′θ.

By hypothesis, the mapping ξ ↦ µ̄ξ is [D]v-continuous on V and Theorem 1.2 (i) concludes the proof.


In general, checking strong and regular differentiability, as defined by (2.5) and (2.6), respectively, might be a very demanding task and it is desirable to have easily verifiable sufficient conditions instead. In the following, we express such sufficient conditions by means of continuity of the weak derivative mapping µ′∗. More specifically, the following result shows that, provided that µ∗ is weakly differentiable, strong (resp. regular) continuity of µ′∗ at θ implies strong (resp. regular) differentiability of µ∗ at θ. The precise statement is as follows.

Theorem 2.2. Let µ∗ : Θ → Mv be weakly [D]v-differentiable.

(i) If µ′∗ is strongly continuous at θ, then µθ is strongly differentiable.

(ii) If µ′∗ is regularly continuous at θ, then µθ is regularly differentiable.

Proof. Applying the Mean Value Theorem to the mapping θ ↦ ∫ g(s)µθ(ds) yields

∀g ∈ [D]v : ∫ g(s)µθ+ξ(ds) − ∫ g(s)µθ(ds) = ξ ∫ g(s)µ′θ+ξg(ds), (2.7)

for some ξg depending on g and satisfying 0 < |ξg| < |ξ|.

(i) Let ε > 0 be arbitrary and choose ζ > 0 such that

∀ξ ∈ (−ζ, ζ) : ‖µ′θ+ξ − µ′θ‖v < ε.

Hence, for all g ∈ [D]v satisfying ‖g‖v ≤ 1 and ξ ∈ (−ζ, ζ) it holds that

|∫ g(s)(µθ+ξ − µθ − ξ · µ′θ)(ds)| = |ξ| · |∫ g(s)(µ′θ+ξg − µ′θ)(ds)| ≤ |ξ| · ‖µ′θ+ξg − µ′θ‖v ≤ ε|ξ|.

Taking the supremum with respect to ‖g‖v ≤ 1 in the above inequality we conclude that

‖µθ+ξ − µθ − ξ · µ′θ‖v ≤ ε|ξ|.

Since ε was arbitrary, dividing both sides in the above inequality by |ξ| and letting ξ → 0 proves the claim.

(ii) By hypothesis, the mapping ‖µ′∗‖v is continuous at θ; for arbitrary ε > 0 choose ζ > 0 such that

∀ξ ∈ (−ζ, ζ) : ‖µ′θ+ξ‖v ≤ ‖µ′θ‖v + ε.

Therefore, from (2.7) we conclude that

∀ξ ∈ (−ζ, ζ) : ‖(µθ+ξ − µθ)/ξ‖v ≤ ‖µ′θ‖v + ε. (2.8)

Since ε was arbitrarily chosen, letting ξ → 0 in (2.8) yields

lim sup_{ξ→0} ‖(µθ+ξ − µθ)/ξ‖v ≤ ‖µ′θ‖v.

Now, in accordance with (2.6), Remark 1.5 concludes the proof.


Basic Rules of Weak Differentiation

In the following we discuss some basic rules of weak differentiation. More specifically, we are interested in which transformations preserve weak differentiability of a measure-valued mapping. To this end, recall that if λ ∈ M and f ∈ F we define f · λ ∈ M as follows:

∀s ∈ S : (f · λ)(ds) := f(s)λ(ds).

Note that, if λ ∈ Mv and f ∈ [F]v, for some v ∈ C+, then f · λ is finite, and if f is a constant function then we recover the scalar multiplication on the space of measures. The following two results are useful in applications. The first result shows that [D]v-differentiation acts as a linear operator.

Lemma 2.1. If µθ and ηθ are [D]v-differentiable then any linear combination α · µθ + β · ηθ, with α, β ∈ R, is [D]v-differentiable and it holds that

∀α, β ∈ R : (α · µθ + β · ηθ)′ = α · µ′θ + β · η′θ.

Proof. Basic properties of classical derivatives show that

(d/dθ) ∫ g(s)(α · µθ + β · ηθ)(ds) = α · (d/dθ) ∫ g(s)µθ(ds) + β · (d/dθ) ∫ g(s)ηθ(ds)
= α ∫ g(s)µ′θ(ds) + β ∫ g(s)η′θ(ds)
= ∫ g(s)(α · µ′θ + β · η′θ)(ds)

holds true for any g ∈ [D]v. Therefore, Lemma 1.2 concludes the proof.

Let v, ϑ ∈ C+. The next result provides sufficient conditions under which [D]ϑ-differentiability of a measure λθ implies [D]v-differentiability of the re-scaled measure fθ · λθ.

Lemma 2.2. Let λ∗ : Θ → Mϑ be a measure-valued mapping and consider a family of measurable functions h and fθ, for θ ∈ Θ, such that h · v ∈ [F]ϑ and fθ · v ∈ [F]ϑ, for each θ ∈ Θ, for some v ∈ C+. Assume further that the derivative f′θ(s) := (d/dθ)fθ(s) exists for each s ∈ S and satisfies

∀s ∈ S : sup_{θ∈Θ} |f′θ(s)| ≤ h(s).

If µθ := fθ · λθ, for θ ∈ Θ, then we have:

(i) If λθ is [F ]ϑ-differentiable then µθ is [F ]v-differentiable and it holds that

µ′θ = f ′θ · λθ + fθ · λ′θ. (2.9)

(ii) If fθ ∈ C and λθ is [C]ϑ-differentiable then µθ is [C]v-differentiable and (2.9) holds true.


(iii) If λθ = λ, for each θ ∈ Θ, then the conditions of the lemma can be relaxed to h · v ∈ L1(λ) and fθ · v ∈ L1(λ), and

µ′θ = f′θ · λ.

Proof. (i) The conclusion is equivalent to

(µθ+ξ − µθ)/ξ =[F]v=⇒ f′θ · λθ + fθ · λ′θ,

for ξ → 0. Moreover, simple algebra shows that

(µθ+ξ − µθ)/ξ = ((fθ+ξ − fθ)/ξ) · λθ + fθ · ((λθ+ξ − λθ)/ξ) + (fθ+ξ − fθ) · ((λθ+ξ − λθ)/ξ). (2.10)

We start by analyzing the first term in (2.10). According to the Dominated Convergence Theorem,

∀g ∈ [F]v : lim_{ξ→0} ∫ g(s) ((fθ+ξ(s) − fθ(s))/ξ) λθ(ds) = ∫ g(s)f′θ(s)λθ(ds). (2.11)

Indeed, note that the integrand on the left-hand side satisfies

∀s ∈ S : lim_{ξ→0} g(s)(fθ+ξ(s) − fθ(s))/ξ = g(s)f′θ(s),

and by the Mean Value Theorem we have

∀ξ : |g(s)(fθ+ξ(s) − fθ(s))/ξ| ≤ |g(s)|h(s) ≤ ‖g‖v v(s)h(s).

We turn now to the second term in (2.10). Since fθ · v ∈ [F]ϑ, we conclude that

∀g ∈ [F]v : lim_{ξ→0} ∫ g(s)fθ(s) ((λθ+ξ − λθ)/ξ)(ds) = ∫ g(s)fθ(s)λ′θ(ds). (2.12)

Finally, for arbitrary ξ and g ∈ [F]v we have

|∫ g(s)((fθ+ξ(s) − fθ(s))/ξ)(λθ+ξ − λθ)(ds)| ≤ ‖g‖v ∫ v(s)h(s)|λθ+ξ − λθ|(ds) ≤ ‖g‖v ‖h · v‖ϑ ‖λθ+ξ − λθ‖ϑ.

Therefore, since h · v ∈ [F]ϑ, letting ξ → 0 in the above inequality, we conclude from Theorem 2.1 that

∀g ∈ [F]v : lim_{ξ→0} ∫ g(s) ((fθ+ξ(s) − fθ(s))/ξ) (λθ+ξ − λθ)(ds) = 0, (2.13)

which, together with (2.11) and (2.12), concludes the proof.

(ii) If fθ is continuous, it follows that for any g ∈ [C]v we have g · fθ ∈ [C]ϑ and, consequently, (2.12) holds true.

(iii) The proof is similar to that of the first part, where we take into account that the expressions on the left-hand side of (2.12) and (2.13) vanish.

Remark 2.2. The statement in Lemma 2.2 admits several variations which would make the result stronger. For instance, the condition "the derivative f′θ(s) exists for each s ∈ S" can be replaced by "both the right- and the left-sided derivatives exist for each s ∈ S and the derivative f′θ(s) exists µθ-a.e."
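The product rule (2.9) of Lemma 2.2 can be checked numerically on a toy family. A minimal sketch, with our own choices fθ(s) = θs and λθ(ds) = exp(−θs) ds on [0, ∞), so that µθ = fθ · λθ has density θs·exp(−θs):

```python
import numpy as np

# Sketch (the measures are our own toy choices): numerical check of the
# product rule (2.9), mu'_theta = f'_theta . lambda_theta + f_theta . lambda'_theta,
# with f_theta(s) = theta*s and lambda_theta(ds) = exp(-theta*s) ds.

theta, h = 1.5, 1e-5
s = np.linspace(0.0, 60.0, 400_001)
ds = s[1] - s[0]
trap = lambda y: (y[:-1] + y[1:]).sum() * ds / 2   # trapezoidal rule

mu_dens = lambda th: th * s * np.exp(-th * s)       # density of mu_theta
fin_diff = trap((mu_dens(theta + h) - mu_dens(theta - h)) / (2 * h))

# product rule densities: f'_theta(s) = s, and lambda'_theta has
# density d/dtheta exp(-theta*s) = -s*exp(-theta*s)
prod_rule = trap(s * np.exp(-theta * s) + theta * s * (-s) * np.exp(-theta * s))

# closed form: ∫ dmu_theta = 1/theta, so the derivative (g = 1) is -1/theta^2
assert abs(prod_rule - (-1 / theta**2)) < 1e-5
assert abs(fin_diff - prod_rule) < 1e-4
```

Here g ≡ 1; any other integrand with the required growth bound would do, and the dominating function h of the lemma is simply h(s) = s in this example.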


Higher-order Differentiation

We conclude this section by discussing higher-order differentiation. To this end, note that (2.3) in Definition 2.1 is equivalent to

∀g ∈ [D]v : (d/dθ) ∫ g(s)µθ(ds) = ∫ g(s)µ′θ(ds),

i.e., one can interchange integration with differentiation. In the same vein one can introduce higher-order differentiation. More specifically, for n ≥ 1 we say that µθ is n-times weakly differentiable if there exists µ(n)θ ∈ Mv such that

∀g ∈ [D]v : (d^n/dθ^n) ∫ g(s)µθ(ds) = ∫ g(s)µ(n)θ(ds). (2.14)

Remark 2.3. Note that, just like in conventional analysis, higher-order derivatives satisfy

∀ 0 ≤ j ≤ n − 1 : (µ(j)θ)′ = µ(j+1)θ,

provided that µθ is n-times weakly differentiable. Indeed, since µθ is (j + 1)-times weakly differentiable it follows that

∀g ∈ [D]v : ∫ g(s)µ(j+1)θ(ds) = (d^{j+1}/dθ^{j+1}) ∫ g(s)µθ(ds) = (d/dθ) ((d^j/dθ^j) ∫ g(s)µθ(ds)) = (d/dθ) ∫ g(s)µ(j)θ(ds).

Therefore, the measures (µ(j)θ)′ and µ(j+1)θ agree when considering integrands from [D]v and by Lemma 1.2 we conclude that they are equal in the sense of Remark 1.2.

2.2.2 Representation of the Weak Derivatives

In general, weak derivatives are abstract objects (that is, signed measures). For instance, if µ∗ : Θ → M1, i.e., µθ is a probability measure for each θ ∈ Θ, then there exist some (abstract) measurable space (Ω, K) and some measurable mapping (random variable) X : Ω → S such that for all θ ∈ Θ we have

∀g ∈ [D]v : ∫ g(s)µθ(ds) = ∫ g(X(ω))Pθ(dω),

where Pθ is a probability measure on (Ω, K) satisfying

∀A ∈ S : Pθ(X ∈ A) = µθ(A), (2.15)

i.e., X is a random variable distributed according to µθ. It follows that for each θ ∈ Θwe have the following representation

∀g ∈ [D]v :

∫g(s)µθ(ds) = Eθ[g(X)], (2.16)


36 2. Measure-Valued Differentiation

where Eθ denotes the expectation operator on the probability field (Ω, K, Pθ). Moreover, the representation in (2.16) is valid whenever (2.15) holds true. Inspired by the above remarks, we give the following definition:

Definition 2.2. Let µi,∗ : Θ → M1(Si), for i ∈ I, be an arbitrary family of measure-valued mappings. We say that Eθ is an expectation operator consistent with Xi ∼ µi,θ, for each i ∈ I, if there exist some measurable space (Ω, K), on which random variables Xi are defined, for each i ∈ I, and there exists² a family of probability measures {Pθ : θ ∈ Θ} on (Ω, K) satisfying

∀θ ∈ Θ, i ∈ I, A ∈ S :  Pθ(Xi ∈ A) = µi,θ(A)

and for each θ ∈ Θ, Eθ coincides with the expectation operator on (Ω, K, Pθ).

Therefore, weak differentiability of µθ provides the means of evaluating the derivatives of the expression Eθ[g(X)], for g ∈ [D]v, provided that Eθ is an expectation operator consistent with X ∼ µθ. Note that the derivative of the right-hand side in (2.16) satisfies

∀g ∈ [D]v :  ∫ g(s) µθ^(n)(ds) = (d^n/dθ^n) Eθ[g(X)]

but does not admit a representation as in (2.16) since µθ^(n) fails to be a probability measure. Fortunately, if µθ^(n) is a finite measure, a convenient representation for higher-order derivatives of probability measures in terms of random variables is possible via the Hahn-Jordan decomposition. This is useful in applications as it provides unbiased gradient estimators for Eθ[g(X)].

For technical reasons we distinguish between the case

inf{v(s) : s ∈ S} > 0,

which we will call the standard case, and the case

inf{v(s) : s ∈ S} = 0,

which will be referred to as the non-standard case.

The Standard Case

Note that, if (D, v) is a Banach base and inf{v(s) : s ∈ S} > 0, it holds that

CB ⊂ [C]v ⊂ [D]v.

For fixed n ≥ 1, letting g = IS in (2.14) yields µθ^(n)(S) = 0. Let

µθ^(n) = [µθ^(n)]^+ − [µθ^(n)]^−

be the Hahn-Jordan decomposition of µθ^(n). It follows that

[µθ^(n)]^+(S) = [µθ^(n)]^−(S),  (2.17)

2 It can be shown that such objects always exist!


provided that µθ^(n) is a finite measure. Denoting by cθ^(n) the common value in (2.17), one can represent the nth-order derivative µθ^(n) as follows:

µθ^(n) = cθ^(n) (µθ^(n+) − µθ^(n−)),  (2.18)

where cθ^(n) > 0 (if the nth derivative is significant) and µθ^(n±) ∈ M1. Therefore, provided that Eθ is an expectation operator consistent with X ∼ µθ and X^(n±) ∼ µθ^(n±), for n ≥ 1, respectively, we have

∀n ≥ 1 :  (d^n/dθ^n) Eθ[g(X)] = cθ^(n) Eθ[g(X^(n+)) − g(X^(n−))].  (2.19)

Note that a representation as in (2.18) is not unique. However, the representation provided by the Hahn-Jordan decomposition has the property that it minimizes the constant cθ^(n) and we call it the orthogonal representation.

Therefore, one can identify the weak derivative µθ^(n) with any triple

(cθ^(n), µθ^(n+), µθ^(n−)) ∈ R × M1 × M1

satisfying equation (2.18). This fact will be exploited in the following. For ease of writing, for n = 1, i.e., µθ^(n) = µ′θ, we use the simplified notation (cθ, µθ^+, µθ^−).
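For a derivative measure with finitely many atoms, passing from the signed measure to the orthogonal triple of (2.18) amounts to separating positive and negative weights and normalizing by the common mass in (2.17). The following Python sketch illustrates this; the function name and the dictionary encoding of an atomic measure are our own illustration, not notation from the text:

```python
def orthogonal_triple(atoms, tol=1e-12):
    """Orthogonal (Hahn-Jordan) representation (2.18) of a signed measure
    with finitely many atoms, given as {point: weight} with total mass zero,
    cf. (2.17).  Returns (c, mu_plus, mu_minus), where mu_plus and mu_minus
    are probability measures and c is the common mass of the two parts."""
    pos = {x: w for x, w in atoms.items() if w > 0}
    neg = {x: -w for x, w in atoms.items() if w < 0}
    c = sum(pos.values())
    assert abs(c - sum(neg.values())) < tol, "total mass must vanish"
    return (c,
            {x: w / c for x, w in pos.items()},
            {x: w / c for x, w in neg.items()})

# two-point signed measure delta_{x2} - delta_{x1}
c, mu_plus, mu_minus = orthogonal_triple({"x1": -1.0, "x2": 1.0})
```

Here the decomposition is orthogonal because no point carries both positive and negative weight, so the constant c is minimal.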

The Non-Standard Case

In the non-standard case, we drop the assumption inf{v(s) : s ∈ S} > 0, so we allow v to take very small values (close to, or even equal to, 0). However, we may assume without loss of generality that

∀θ ∈ Θ :  µθ(S \ Sv) = 0,

since within our theory we consider the trace of µθ on Sv. Unfortunately, in this case IS ∉ [D]v and a representation as in (2.18) cannot be obtained in a straightforward way.

Example 2.1. Let v(s) = 1/s, for s > 0, and consider the family

∀θ ∈ [0, 1] :  µθ := (1 − θ) · µ + θ · δ_{1/θ}, for θ ∈ (0, 1], and µ0 := µ,

for some µ ∈ M1v. Then µ∗ is weakly CB-continuous at θ = 0 but fails to be CB-differentiable since the family

{(µξ − µ0)/ξ : ξ > 0} = {δ_{1/ξ} − µ : ξ > 0}

is not tight and, consequently, the limit lim_{ξ→0} (δ_{1/ξ} − µ) does not exist in MB; see Example 1.8. However, it turns out that µ∗ is Cv-differentiable at θ = 0 since

∀g ∈ Cv :  lim_{ξ↓0} (1/ξ) ( ∫ g(s) µξ(ds) − ∫ g(s) µ0(ds) ) = lim_{ξ↓0} g(1/ξ) − ∫ g(s) µ(ds),


which yields µ′0 = −µ. Therefore, a representation as in (2.18) is not possible for µ′0. In addition, note that µ∗ is strongly Cv-differentiable. Indeed, we have

lim_{ξ↓0} ‖(µξ − µ0)/ξ − µ′0‖v = lim_{ξ↓0} ‖δ_{1/ξ} − µ + µ‖v = lim_{ξ↓0} ξ = 0.

Note that a representation as in (2.18) holds true whenever µθ is CB(Sv)-differentiable. The following result shows that the representation in (2.18) is still possible under slightly less restrictive conditions.

Lemma 2.3. Let µ∗ : Θ → Mv be [D]v-differentiable at θ, such that µθ(Sv) is constant with respect to θ. If there exists a neighborhood V of 0 such that the family

{(µθ+ξ − µθ)/ξ : ξ ∈ V \ {0}}

is tight, then it holds that µ′θ(Sv) = 0.

Proof. Let us define the sequence fn : V \ {0} → R, for n ≥ 1, as follows:

∀n ≥ 1, ξ ∈ V \ {0} :  fn(ξ) := ∫ min{1, n · v(s)} ((µθ+ξ − µθ)/ξ)(ds).

Formally, our statement is equivalent to

µ′θ(Sv) = lim_{n→∞} lim_{ξ→0} fn(ξ) = lim_{ξ→0} lim_{n→∞} fn(ξ) = 0.  (2.20)

In the following we show that the sequence {fn}n satisfies the conditions of Theorem B.2 (see the Appendix), which justifies interchanging the limit operations in (2.20).

First, note that [D]v-differentiability of µθ implies that

∀n ≥ 1 :  lim_{ξ→0} fn(ξ) = Ln := ∫ min{1, n · v(s)} µ′θ(ds),  (2.21)

since, for n ≥ 1, the mapping s ↦ min{1, n · v(s)} is continuous and has finite v-norm; hence it belongs, by assumption, to [D]v.

On the other hand, we have

∀ξ ∈ V \ {0} :  lim_{n→∞} fn(ξ) = ((µθ+ξ − µθ)/ξ)(Sv) = 0.

Moreover, by hypothesis, for each ε > 0 there exists some compact set Kε ⊂ Sv such that

∀ξ ∈ V \ {0} :  |(µθ+ξ − µθ)/ξ|(Sv \ Kε) < ε.  (2.22)

Since v is continuous it follows that 1/v is bounded on Kε, i.e.,

M := sup_{s∈Kε} 1/v(s) < ∞.


Choosing now some nε ≥ M, it follows that the following inclusions hold true:

{s : nε · v(s) < 1} ⊂ {s : M · v(s) < 1} ⊂ Sv \ Kε.

Therefore, for each n ≥ nε and ξ ∈ V \ {0} it holds that

|fn(ξ)| ≤ ∫ |1 − min{1, n · v(s)}| |(µθ+ξ − µθ)/ξ|(ds)
        ≤ ∫ I_{{s : nε·v(s) < 1}}(s) |(µθ+ξ − µθ)/ξ|(ds)
        = |(µθ+ξ − µθ)/ξ|({s : nε · v(s) < 1}) ≤ |(µθ+ξ − µθ)/ξ|(Sv \ Kε)

and by (2.22) we conclude that the sequence {fn}n converges to 0, uniformly with respect to ξ ∈ V \ {0}, i.e., for each ε > 0 there exists nε ≥ 1 such that

∀n ≥ nε, ξ ∈ V \ {0} :  |fn(ξ)| < ε.

Applying now Theorem B.2 to the sequence {fn}n concludes the proof.

Note that, if in Lemma 2.3 µ∗ is regularly [D]v-differentiable at θ, then the conclusion is immediate. Indeed, by part (ii) of Theorem 1.1 we conclude that µ∗ is regularly CB(Sv)-differentiable at θ. The following representation result for the weak derivatives in the non-standard case is a consequence of Lemma 2.3.

Corollary 2.1. Let µ∗ : Θ → M1v be n times [D]v-differentiable at θ, for some n ≥ 1, and let k be such that 1 ≤ k ≤ n. If there exists a neighborhood Vk of 0 such that the family

{(µθ+ξ^(k−1) − µθ^(k−1))/ξ : ξ ∈ Vk \ {0}}

is tight, then the kth-order derivative µθ^(k) admits a representation as in (2.18).

Proof. For k = 1 the proof follows from Lemma 2.3 by taking into account that µθ(Sv) = 1, for each θ ∈ Θ. Indeed, it follows that µ′θ is a finite measure such that µ′θ(Sv) = 0 and, consequently, admits a representation as in (2.18).

By finite induction, for k ≥ 2, one can apply (again) Lemma 2.3 to µ∗^(k−1) which, by Remark 2.3, is [D]v-differentiable at θ and satisfies µθ^(k−1)(Sv) = 0, for each θ for which the derivative exists.

Therefore, we conclude that the "triple" representation of the weak derivatives in the non-standard case is still possible, under some additional conditions.


2.2.3 Computation of Weak Derivatives and Examples

We start with the following remark.

Remark 2.4. It is worth noting that, in principle, weak derivatives can be computed in a straightforward way if it holds that

∀θ ∈ Θ :  µθ(ds) = fθ(s) · λ(ds),

i.e., if µθ has a density fθ with respect to some λ ∈ M. Indeed, by part (ii) of Lemma 2.2 we have

∀g ∈ [D]v :  (d^n/dθ^n) ∫ g(s) fθ(s) λ(ds) = ∫ g(s) (d^n/dθ^n) fθ(s) λ(ds),  (2.23)

provided that fθ(s) is n times differentiable at θ, for all s ∈ S, and interchanging differentiation and integration is justified. Hence, we have

µθ^(n)(ds) = (d^n/dθ^n) fθ(s) · λ(ds),

and a weak derivative can be easily computed by considering the positive and the negative parts of (d^n/dθ^n) fθ(s), i.e.,

[µθ^(n)]^+(ds) = ((d^n/dθ^n) fθ(s))^+ λ(ds),  [µθ^(n)]^−(ds) = ((d^n/dθ^n) fθ(s))^− λ(ds),

where, for a ∈ R, we set a^+ := max{a, 0} and a^− := max{−a, 0} = a^+ − a.

We illustrate the concept of weak differentiation with a few families of measures that are of importance in applications. More examples can be found in Section H of the Appendix. For ease of exposition we agree on the following notations, to be used throughout this thesis: let ℓ denote the Lebesgue measure on S = R^n, for some n ≥ 1, and for arbitrary A ∈ S denote by UA the uniform distribution on A, i.e.,

∀x ∈ S :  UA(dx) := (1/ℓ(A)) I_A(x) dx.

Example 2.2. Let µ ∈ Mv. If µθ = µ, for all θ ∈ Θ, then µθ is obviously weakly [F]v-differentiable since

∀g ∈ [F]v :  (d/dθ) ∫ g(s) µθ(ds) = 0.

In this case the weak derivative is not significant and we set µ′θ = (1, µ, µ).

Example 2.3. The Dirac distribution δθ, for θ ∈ [a, b] ⊂ R, fails to be weakly [D]v-differentiable for any sensible set D. Indeed, ∫ g(x) δθ(dx) = g(θ) is differentiable at θ only if g is differentiable at θ. This, however, would impose quite strong restrictions on the performance measures to be analyzed.

Nevertheless, the mapping δ∗ is weakly [C]v-continuous for any v ∈ C+ and it is strongly continuous at θ only if v(θ) = 0. Therefore, the Dirac distribution δθ is a standard example of a distribution which is weakly continuous everywhere but nowhere weakly differentiable.


Example 2.4. Let S = {x1, x2}, with the discrete topology, and for θ ∈ [0, 1] let us consider

βθ = (1 − θ) · δ_{x1} + θ · δ_{x2},

i.e., the Bernoulli distribution with mass points {x1, x2} and probability weights {1 − θ, θ}, respectively. To avoid trivialities we assume x1 ≠ x2. Then it holds that

∀g ∈ F :  (d/dθ) ∫ g(x) βθ(dx) = (d/dθ) ((1 − θ) g(x1) + θ g(x2)) = g(x2) − g(x1).

This means that βθ is weakly [F]v-differentiable, for any v ∈ C+, and

β′θ = δ_{x2} − δ_{x1},

so that the weak derivative can be represented as β′θ = (1, δ_{x2}, δ_{x1}). In addition, by Theorem 2.2 (i) it follows that βθ is strongly differentiable.

Furthermore, as can easily be seen, higher-order derivatives exist but are not significant in this situation and we set βθ^(n) = (1, βθ, βθ), for n ≥ 2.
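The identity (d/dθ) Eθ[g(X)] = g(x2) − g(x1) is easy to verify numerically. The following sketch (our own illustration; the mass points and cost-function are arbitrary choices) compares the triple representation (1, δ_{x2}, δ_{x1}) with a central finite difference of the exact expectation:

```python
def expect_bernoulli(theta, g, x1, x2):
    # E_theta[g(X)] under the Bernoulli distribution beta_theta
    return (1 - theta) * g(x1) + theta * g(x2)

g = lambda x: x ** 2
x1, x2, theta, h = 1.0, 4.0, 0.3, 1e-6

# weak derivative via the triple (1, delta_{x2}, delta_{x1}), cf. (2.19)
mvd = g(x2) - g(x1)
# central finite difference of theta -> E_theta[g(X)]
fd = (expect_bernoulli(theta + h, g, x1, x2)
      - expect_bernoulli(theta - h, g, x1, x2)) / (2 * h)
```

Since the expectation is affine in θ, the two values agree up to floating-point rounding.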

Example 2.5. Let S = [0, ∞) with the usual topology, Θ = (a, b), for 0 < a < b < ∞, and choose µθ(dx) = θ exp(−θx) · ℓ(dx), i.e., µθ denotes the exponential distribution with rate θ. Moreover, if vp(x) = 1 + x^p, for some p ≥ 0, then µθ is weakly [F]vp-differentiable and its derivative satisfies

µ′θ(dx) = (1 − θx) exp(−θx) ℓ(dx).

In addition, µθ is n times [F]vp-differentiable, for all n ≥ 1, and higher-order derivatives can be computed in the same way, by differentiating the density

fθ(x) = θ exp(−θx)

in the classical sense. Consequently, for each n ≥ 1 we obtain

µθ^(n)(dx) = (−1)^n x^{n−1} (θx − n) exp(−θx) ℓ(dx)

and an orthogonal representation can be obtained as explained in Remark 2.4. To see that, we show that the conditions of Lemma 2.2 are fulfilled. Indeed, note that fθ · vp ∈ L1(ℓ), for each θ ∈ (a, b) and p ≥ 0, and for n ≥ 0 we have

∀θ ∈ (a, b), x ≥ 0 :  |x^{n−1} (θx − n) exp(−θx)| ≤ x^{n−1} (θx + n) exp(−θx).

Therefore, if for n ≥ 0 we set

∀x ≥ 0 :  hn(x) := x^{n−1} (bx + n) exp(−ax),

it follows that for each n ≥ 0 we have

∀x ≥ 0 :  sup_{θ∈Θ} |(d^n/dθ^n) fθ(x)| ≤ hn(x)


and hn · vp ∈ L1(ℓ), for each p ≥ 0, so that part (ii) of Lemma 2.2 concludes the proof.

Furthermore, one can easily check that µ∗^(n+1) is strongly continuous on Θ, for n ≥ 1, and it follows by Theorem 2.2 that µθ is n times strongly (in particular, regularly) differentiable, for each n ≥ 1.

Finally, if we denote by εn,θ the Erlang distribution with parameters n, θ, i.e., the convolution³ of n exponential distributions with rate θ, then we have, for n ≥ 1,

µθ^(n) = (n!/θ^n, εn,θ, εn+1,θ), if n is odd;  µθ^(n) = (n!/θ^n, εn+1,θ, εn,θ), if n is even,

which yields another representation for the higher-order derivatives of µθ, one that is more convenient for applications.
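The Erlang triple yields an unbiased estimator of the form (2.19). The following Monte Carlo sketch (numpy assumed; the rate, sample size, and cost-function g(x) = x are our own choices) estimates the first-order derivative of Eθ[X] = 1/θ and compares it with the exact value −1/θ²:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n_samples = 2.0, 200_000
g = lambda x: x  # a cost-function of finite v_p-norm

# first-order triple (n = 1, odd): (1/theta, Erlang(1, theta), Erlang(2, theta))
c = 1.0 / theta
x_plus = rng.gamma(shape=1.0, scale=1.0 / theta, size=n_samples)   # eps_{1,theta}
x_minus = rng.gamma(shape=2.0, scale=1.0 / theta, size=n_samples)  # eps_{2,theta}
estimate = c * np.mean(g(x_plus) - g(x_minus))   # unbiased, cf. (2.19)

exact = -1.0 / theta ** 2   # d/dtheta E_theta[X] = d/dtheta (1/theta)
```

Note that X^(1+) and X^(1−) need not be coupled; drawing them independently is enough for unbiasedness, although coupling can reduce the variance.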

Example 2.6. Let S = [0, ∞), and denote by ψθ the uniform distribution on the interval [0, θ), i.e., ψθ = U[0,θ), for θ ∈ (0, b), with b > 0. Note that one can extend the measure-valued mapping ψ∗ to 0 by setting ψ0 = δ0. It turns out that ψ∗ is weakly continuous at 0 and it is strongly continuous at 0 only if v(0) = 0. Therefore, by Theorem 2.1 we conclude that, in general, ψ∗ is not weakly differentiable at θ = 0.

Take D as the set C(S). Since the density θ^{−1} I_{[0,θ)}(x) is not differentiable (not even continuous) with respect to θ, Lemma 2.2 does not apply in this situation and we calculate the weak derivative ψ′θ, for θ > 0, by definition. For each g continuous at θ, we have

∫ g(s) ψ′θ(ds) = lim_{ξ→0} (1/ξ) ( (1/(θ+ξ)) ∫_0^{θ+ξ} g(s) ds − (1/θ) ∫_0^θ g(s) ds ),

which yields

∀g ∈ C :  ∫ g(s) ψ′θ(ds) = (1/θ) g(θ) − (1/θ²) ∫_0^θ g(s) ds.

Hence, ψθ is weakly [C]v-differentiable, for any v ∈ C+, and

ψ′θ = (1/θ) δθ − (1/θ) ψθ,

or, in triplet representation, ψ′θ = (θ^{−1}, δθ, ψθ).

It follows from Theorem 2.2 and Example 2.3 that ψθ is regularly differentiable and it is strongly differentiable at θ only if v(θ) = 0. Indeed, one can check that

lim_{ξ→0} ‖(ψθ+ξ − ψθ)/ξ − ψ′θ‖v = 2v(θ).

Higher-order derivatives of ψθ do not exist. This stems from the fact that the Dirac measure δθ fails to be weakly differentiable; see Example 2.3.
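The triplet ψ′θ = (θ^{−1}, δθ, ψθ) can also be checked numerically. A small sketch (our own illustration; the cost-function, quadrature rule, and step sizes are arbitrary choices):

```python
def expect_uniform(theta, g, n=20_000):
    # E[g] under U[0, theta), approximated by the composite midpoint rule
    h = theta / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h / theta

g = lambda x: x ** 2
theta, eps = 1.5, 1e-4

# weak derivative via the triple (1/theta, delta_theta, psi_theta)
mvd = (g(theta) - expect_uniform(theta, g)) / theta
# central finite difference of theta -> E_theta[g]
fd = (expect_uniform(theta + eps, g) - expect_uniform(theta - eps, g)) / (2 * eps)
```

For g(x) = x² one has Eθ[g] = θ²/3, so both quantities approximate 2θ/3 = 1 at θ = 1.5.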

The following example is rather technical and is intended to show that, in general, weak differentiability does not imply regular differentiability.

3 Note that ε1,θ = µθ.


Example 2.7. Let ψθ denote the uniform distribution on [0, θ), introduced in Example 2.6, and consider the following family of distributions:

∀θ ∈ [0, 1] :  φθ = ψ1 + θ · (δ0 − ψθ),

where, by convention, ψ0 = δ0. Note that, for each θ ∈ [0, 1], φθ is a probability measure and by Lemma 2.2 φθ is weakly [C]v-differentiable, for any v ∈ C+. Indeed, we have

∀θ > 0 :  φ′θ = (δ0 − ψθ) − θ · ψ′θ = (δ0 − ψθ) − (δθ − ψθ) = δ0 − δθ.

Furthermore, φθ has a right-hand side weak derivative at θ = 0, which equals the null measure ∅, but fails to be regularly differentiable since

lim_{ξ↓0} ‖(φξ − φ0)/ξ‖v = lim_{ξ↓0} ‖δ0 − ψξ‖v = 2v(0) ≠ 0 = ‖∅‖v,

provided that v(0) > 0.

Truncated Distributions

We conclude this section by treating a special class of weakly differentiable distributions. Truncated distributions play an important role in our analysis as they are typical examples of weakly, but not strongly, differentiable distributions. In particular, it will turn out that the uniform distribution presented in Example 2.6 belongs to this class.

Let X be a real-valued random variable and let −∞ ≤ a < b ≤ ∞ be such that P(a < X < b) > 0. By a truncation µ|(a,b) of the distribution µ of X we mean the conditional distribution of X given the event {a < X < b}. In formula:

∀A :  µ|(a,b)(A) := µ(A ∩ (a, b))/µ((a, b)) = P({X ∈ A} ∩ {a < X < b})/P(a < X < b).

If X (resp. µ) has a probability density ρ, then the mapping

∀x ∈ R :  f(x) := ( ρ(x) / ∫_a^b ρ(s) ds ) · I_(a,b)(x)  (2.24)

is the probability density of the truncated distribution µ|(a,b).

Truncated distributions arise naturally in applications. Indeed, consider a constant a > 0 modeling a traveling time in a transportation network. It is quite common to add a normally distributed noise, say Z, to a in order to model some intrinsic randomness; see [30]. Since, for practical reasons, it is important to ensure that P(a + Z < 0) = 0 (so that traveling times stay larger than zero), one considers a truncated version of Z. In other words, the distribution of a + Z is conditioned on the event {a + Z > θ}, for θ > 0 small.

Note that f as defined by (2.24) is still a probability density if we only require that ρ be a non-negative integrable function on (a, b), i.e., µ a locally finite measure and not necessarily a probability measure on R. For instance, in some models one can observe that some random variable takes values within some given interval, but its distribution density is proportional to a certain function which is not integrable on that interval. Therefore, one can obtain a truncated distribution out of any locally finite measure by an appropriate re-scaling (see, e.g., Pareto, uniform), as the following example illustrates.


Example 2.8. In the following we provide several examples.

(i) Letting ρ(x) = 1, a = 0 and b < ∞ in (2.24), one recovers the uniform distribution on (0, b); cf. Example 2.6.

(ii) Letting ρ(x) = x^{−(β+1)}, for some β > 0, a > 0 and b = ∞ in (2.24), one obtains the Pareto distribution with density

f(x) = β a^β x^{−(β+1)} I_(a,∞)(x).

(iii) For ρ(x) = e^{−λx}, for some λ > 0, and b = ∞, one obtains the shifted exponential distribution with density

f(x) = λ e^{−λ(x−a)} I_(a,∞)(x).

In the setting of this section, the truncated density (2.24) is considered with a = θ and θ < b ≤ ∞; more formally, a parametric family of left-side truncated distributions µ|(θ,b) is introduced, with density given by

fθ(x) = ( ρ(x) / ∫_θ^b ρ(s) ds ) · I_(θ,b)(x).  (2.25)

The remainder of this section is devoted to the computation of the weak derivative of a left-side truncated distribution µ|(θ,b) generated by a density ρ, i.e., µ|(θ,b) has a Lebesgue density fθ given by (2.25). In words, we are interested in the sensitivity of µ|(θ,b) with respect to the point of truncation θ. To this end, let v ∈ C+(R) be such that ∫ v(x) ρ(x) dx < ∞, i.e., v ∈ L1(µ|(θ,b)), for any θ. Using standard computations we obtain (for b = ∞)

∀g ∈ [C]v :  (d/dθ) [ ∫_θ^∞ g(x) ρ(x) dx / ∫_θ^∞ ρ(x) dx ]
  = ρ(θ) ∫_θ^∞ g(x) ρ(x) dx / ( ∫_θ^∞ ρ(x) dx )² − g(θ) ρ(θ) / ∫_θ^∞ ρ(x) dx
  = ( ρ(θ) / ∫_θ^∞ ρ(x) dx ) ( ∫ g(x) µ|(θ,∞)(dx) − ∫ g(x) δθ(dx) ),

provided that ρ is continuous at θ. Hence, one can represent the derivative as follows:

(µ|(θ,b))′ = (cθ, µ|(θ,b), δθ),  cθ = ρ(θ)/µ((θ, b)).

We conclude that a left-side truncated distribution µ|(θ,b) generated by a density ρ is weakly Cv-differentiable, for v ∈ L1(µ), provided that ρ is continuous at θ, and its weak derivative can be represented as the re-scaled difference between the original truncated distribution µ|(θ,b) and the Dirac distribution δθ, which assigns total mass to the point of truncation. Therefore, by Theorem 2.2 (ii) this implies that µ|(θ,b) is regularly differentiable, since c· is continuous at θ and both µ|(θ,b) and δθ are weakly continuous. Moreover, a similar argument as in Example 2.6 shows that µ|(θ,b) is, in general, not strongly differentiable and higher-order derivatives do not exist.

A similar result holds true for right-side truncated distributions µ|(a,θ). Precisely, if µ has a density ρ, then µ|(a,θ) is weakly [C]v-differentiable, for v ∈ L1(µ), provided that ρ is continuous at θ, and its weak derivative can be represented as follows (see Example 2.6):

(µ|(a,θ))′ = (cθ, δθ, µ|(a,θ)),  cθ = ρ(θ)/µ((a, θ)).
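As a quick sanity check on the left-side formula, consider the shifted exponential of Example 2.8 (iii) with λ = 1 and b = ∞ (our own choice of instance): the truncated distribution is an exponential shifted by θ, so Eθ[X] = θ + 1 and the derivative with respect to the truncation point is 1, which the triple (cθ, µ|(θ,∞), δθ) reproduces exactly.

```python
from math import exp

rho = lambda x: exp(-x)            # density generating the truncation (lambda = 1)
theta = 0.7
g = lambda x: x

mass = exp(-theta)                 # mu((theta, inf)) = integral of rho over (theta, inf)
c = rho(theta) / mass              # c_theta = rho(theta) / mu((theta, b))
mean_trunc = theta + 1.0           # E[g] under mu|(theta, inf): shifted exponential
mvd = c * (mean_trunc - g(theta))  # triple (c_theta, mu|(theta,b), delta_theta)
# exact: d/dtheta E_theta[X] = d/dtheta (theta + 1) = 1
```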


2.3 Differentiability of Product Measures

In this section we will establish sufficient conditions for weak differentiability of product measures. As it will turn out, the product of weakly differentiable measures is again weakly differentiable, provided that the functional spaces are Banach bases. The main result is the following theorem.

Theorem 2.3. Let (D(S), v) and (D(T), u) be Banach bases on S and T, respectively. Let µθ ∈ Mv(S) be [D(S)]v-differentiable and ηθ ∈ Mu(T) be [D(T)]u-differentiable. Then the product measure µθ × ηθ is [D(S) ⊗ D(T)]v⊗u-differentiable, and it holds that

(µθ × ηθ)′ = (µ′θ × ηθ) + (µθ × η′θ).

Proof. For ξ such that θ + ξ ∈ Θ, set

µξ = (µθ+ξ − µθ)/ξ − µ′θ;  ηξ = (ηθ+ξ − ηθ)/ξ − η′θ.

By hypothesis, µξ ⇒_{[D]v} ∅ and ηξ ⇒_{[D]u} ∅, for ξ → 0, where ∅ denotes the null measure. Simple algebra shows that the proof of the claim follows from

ξ · (µξ + µ′θ) × (ηξ + η′θ) + µθ × ηξ + µξ × ηθ ⇒_{[D]v⊗u} ∅,  (2.26)

for ξ → 0. Hence, to conclude the proof, we show that each term on the left-hand side of (2.26) converges weakly to the null measure ∅.

Since µξ + µ′θ ⇒_{[D]v} µ′θ and ηξ + η′θ ⇒_{[D]u} η′θ, applying Theorem 1.2 yields

sup_{ξ∈V\{0}} ‖µξ + µ′θ‖v < ∞  and  sup_{ξ∈V\{0}} ‖ηξ + η′θ‖u < ∞,

for any compact neighborhood V of 0. Therefore, applying the Cauchy-Schwarz inequality (1.22) together with Theorem 1.3 yields

| ξ ∫ g(s, t) ((µξ + µ′θ) × (ηξ + η′θ))(ds, dt) | ≤ |ξ| · ‖g‖v⊗u · ‖µξ + µ′θ‖v · ‖ηξ + η′θ‖u.

Letting ξ → 0 in the above inequality, it follows that the first term in (2.26) converges weakly to ∅.

The second and the third terms in (2.26) are symmetric, so they can be treated similarly. For instance, for the second term in (2.26) note that

∫ g(s, t) (µθ × ηξ)(ds, dt) = ∫∫ g(s, t) µθ(ds) ηξ(dt) = ∫ Hθ(g, t) ηξ(dt),

where Hθ(g, t) = ∫ g(s, t) µθ(ds), for all t and all g. Theorem 1.3 implies that the pair (D(S) ⊗ D(T), v ⊗ u) is a Banach base, and applying the Cauchy-Schwarz inequality yields

∀t ∈ T :  |Hθ(g, t)|/u(t) ≤ (‖g(·, t)‖v / u(t)) ‖µθ‖v ≤ ‖g‖v⊗u ‖µθ‖v,


where the second inequality follows from (see (1.30) within the proof of Theorem 1.3)

∀s ∈ S, t ∈ T :  |g(s, t)| ≤ ‖g‖v⊗u v(s) u(t).

Consequently, Hθ(g, ·) ∈ [D(T)]u, for g ∈ [D(S) ⊗ D(T)]v⊗u. We have assumed that ηθ is [D(T)]u-differentiable, which yields ηξ ⇒_{[D]u} ∅. Hence,

lim_{ξ→0} ∫ Hθ(g, t) ηξ(dt) = 0,

which shows that the second term in (2.26) converges weakly to ∅. This concludes the proof of the statement.

Remark 2.5. It is worth noting that if we see D as a particular class of functions, i.e., continuous or measurable, then the condition

D(S × T) ⊂ D(S) ⊗ D(T)

is satisfied for any D ∈ {C, F}, i.e., one considers the same class of functions D on S, T and on the product space S × T; see (1.28) and (1.29) in Example 1.7. It follows from Theorem 2.3 that weak differentiability is preserved by the product measure in both the continuity and the measurability paradigms. Since in the context of our applications we will consider a particular class of functions, e.g., continuous or measurable, we will denote by D the corresponding class of functions on each space under consideration.

For instance, choosing D(S) = C(S), D(T) = C(T), v ≡ 1, u ≡ 1 in Theorem 2.3, we conclude from (1.28) that weak CB-differentiability is preserved by the product measure, i.e., for each g ∈ CB(S × T) it holds that

(d/dθ) ∫ g(s, t) µθ(ds) ηθ(dt) = ∫ g(s, t) µ′θ(ds) ηθ(dt) + ∫ g(s, t) µθ(ds) η′θ(dt).

This is asserted in [48] but no proof is given.

Extension to Finite Products of Measures

In what follows we extend Theorem 2.3 to finite product measures. To this end, let us consider a finite family of positive mappings vi ∈ C+(Si), a finite family of measure-valued mappings µi,∗ : Θ → Mvi(Si), for 1 ≤ i ≤ n, and define the product mapping

Π∗ : Θ → M(σ(S1 × … × Sn)),

where σ(S1 × … × Sn) denotes the product Borel field on S1 × … × Sn, as follows:

∀θ ∈ Θ :  Πθ = µ1,θ × … × µn,θ.  (2.27)

Moreover, to simplify the notation, we denote the tensor product v1 ⊗ … ⊗ vn (see (1.27) for a definition) by ~v. In formula:

∀(s1, …, sn) ∈ S1 × … × Sn :  ~v(s1, …, sn) = v1(s1) · … · vn(sn).  (2.28)

The following statement follows by finite induction from Theorem 2.3.


Theorem 2.4. If µi,θ is weakly [D(Si)]vi-differentiable, for 1 ≤ i ≤ n, then the product measure Πθ is weakly [D(S1) ⊗ … ⊗ D(Sn)]~v-differentiable and

Π′θ = ∑_{i=1}^n µ1,θ × … × µ′i,θ × … × µn,θ.

Moreover, if for θ ∈ Θ, µi,θ ∈ M1(Si) and µ′i,θ = (ci,θ, µi,θ^+, µi,θ^−), for 1 ≤ i ≤ n, then an instance of the weak derivative Π′θ is given by (Cθ, Πθ^+, Πθ^−), where

Cθ = ∑_{i=1}^n ci,θ;  Πθ^± = ∑_{i=1}^n (ci,θ/Cθ) · µ1,θ × … × µi,θ^± × … × µn,θ.  (2.29)

The following two results are immediate consequences of Theorem 2.4.

Corollary 2.2. Consider the Banach base (D(S), v) and denote the k-fold product of µθ by Πθ(k), for k ≥ 1. Assume that µθ has [D(S)]v-derivative (cθ, µθ^+, µθ^−). Then Πθ(n) is [D(S) ⊗ … ⊗ D(S)]~v-differentiable and we have⁴

(d/dθ) ∫ g(x) Πθ(n, dx) = cθ ∑_{j=1}^n ∫ g(s, t, u) Πθ(j − 1, ds) (µθ^+ − µθ^−)(dt) Πθ(n − j, du).

Proof. In Theorem 2.4 we let µ1,θ = … = µn,θ = µθ. If µ′θ = (cθ, µθ^+, µθ^−), then the conclusion follows from the equality

(d/dθ) ∫ g(x) Πθ(n, dx) = ∫ g(x) Π′θ(n, dx).

Corollary 2.3 (Random Variable Version of Theorem 2.4). Let Xi ∈ Si, for 1 ≤ i ≤ n, be independent random variables, having distribution µi,θ, for 1 ≤ i ≤ n, respectively. If for 1 ≤ i ≤ n the distribution µi,θ is [D(Si)]vi-differentiable, having derivative (ci,θ, µi,θ^+, µi,θ^−), then for any g ∈ [D(S1) ⊗ … ⊗ D(Sn)]~v it holds that

(d/dθ) Pg(θ) = ∑_{j=1}^n cj,θ Eθ[g(X1, …, Xj^+, …, Xn) − g(X1, …, Xj^−, …, Xn)],  (2.30)

where we denote Pg(θ) = Eθ[g(X1, …, Xn)] and Eθ denotes an expectation operator consistent with (X1, …, Xj^±, …, Xn) ∼ µ1,θ × … × µj,θ^± × … × µn,θ, for 1 ≤ j ≤ n.

Remark 2.6. Note that, in Corollary 2.3, for any fixed j, Xj^± should be independent of {Xi : i ≠ j}. Nevertheless, note that it is not crucial that Xj^+ and Xj^− be mutually independent. In addition, if we set

Pg^{j±}(θ) := Eθ[g(X1, …, Xj^±, …, Xn)], for 1 ≤ j ≤ n,

then Pg^{j±} denotes the counterpart of Pg in a system where the jth input variable Xj has been replaced by Xj^±. Hence, according to (2.30), we have

(d/dθ) Pg(θ) = ∑_{j=1}^n cj,θ Pg^{j+}(θ) − ∑_{j=1}^n cj,θ Pg^{j−}(θ);

compare to (0.3). Therefore, an unbiased estimator for the stochastic gradient (d/dθ) Pg(θ) can be obtained according to (0.4).

4 By convention we disregard the void product Πθ(0).
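The estimator prescribed by (2.30) can be sketched for n i.i.d. exponential inputs with rate θ (numpy assumed; the cost-function g(x) = x1 + … + xn and all sample sizes are our own choices). Each of the n terms replaces the jth input by a draw from µj,θ^+ = ε1,θ, respectively µj,θ^− = ε2,θ, as in Example 2.5; the exact gradient of Pg(θ) = n/θ is −n/θ²:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, N = 2.0, 3, 100_000
g = lambda x: np.sum(x, axis=1)          # g(X_1, ..., X_n) = X_1 + ... + X_n

grad = 0.0
for j in range(n):
    # independent nominal inputs X_i ~ Exp(theta) for the "+" and "-" systems
    x_plus = rng.gamma(1.0, 1.0 / theta, size=(N, n))
    x_minus = rng.gamma(1.0, 1.0 / theta, size=(N, n))
    x_plus[:, j] = rng.gamma(1.0, 1.0 / theta, size=N)   # X_j^+ ~ Erlang(1, theta)
    x_minus[:, j] = rng.gamma(2.0, 1.0 / theta, size=N)  # X_j^- ~ Erlang(2, theta)
    grad += (1.0 / theta) * np.mean(g(x_plus) - g(x_minus))  # jth term of (2.30)

exact = -n / theta ** 2   # d/dtheta (n / theta)
```

As noted in Remark 2.6, Xj^± must be independent of the other inputs, which the fresh draws above guarantee; reusing the nominal inputs across the "+" and "-" systems would also be valid and typically lowers the variance.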


2.4 Non-Continuous Cost-Functions and Set-Wise Differentiation

In the literature, differentiation of a probability measure µθ has also been defined as differentiability of the corresponding set function. That is,

(d/dθ) µθ(A) = µ′θ(A),  (2.31)

for each A ∈ S; see, e.g., [39]. In fact, this holds true in the standard case when D = F (in particular, when µθ is strongly differentiable). However, this is not always the case, and the following example illustrates this. Taking ψθ to be the uniform distribution on [0, θ) and denoting the Lebesgue measure by ℓ, it holds that

∀x > 0 :  ψθ([0, x)) = (1/θ) ℓ([0, x) ∩ [0, θ)) = (1/θ) min{x, θ}.

At θ = x, the left-sided derivative of ψθ([0, x)) equals 0, whereas the right-sided derivative equals −1/x. Hence, ψθ fails to be weakly differentiable in the set-wise sense, whereas it is shown in Example 2.6 that ψθ is [C]v-differentiable.
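The kink at θ = x is easy to see numerically (a minimal sketch; the choice x = 2 is ours):

```python
x, h = 2.0, 1e-6

def psi_set(theta):
    # psi_theta([0, x)) = min(x, theta) / theta
    return min(x, theta) / theta

right = (psi_set(x + h) - psi_set(x)) / h   # one-sided difference from the right
left = (psi_set(x) - psi_set(x - h)) / h    # one-sided difference from the left
# right approaches -1/x, left approaches 0: no derivative at theta = x
```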

Motivated by this remark, in this section we aim to identify the sets A which satisfy (2.31), provided that µθ is [C]v-differentiable, for some v ∈ C+. More generally, we investigate under which conditions [C]v-differentiability is suitable for differentiating performance measures generated by non-continuous cost-functions. Specifically, if µθ is [C]v-differentiable then, by Definition 2.1, we have

∀g ∈ [C]v :  (d/dθ) ∫ g(s) µθ(ds) = ∫ g(s) µ′θ(ds).  (2.32)

However, the elements of [C]v are, in general, not the only ones satisfying (2.32), as there might be non-continuous functions g ∈ [F]v which satisfy (2.32) as well; see Remark 1.4.

The starting point of our analysis is the Portmanteau Theorem (see Theorem F.1 in the Appendix), which asserts that a sequence {µn}n is CB-convergent to µ if and only if µn(A) → µ(A) for each continuity set A of µ, i.e., each A with µ(∂A) = 0. More generally, if for arbitrary g ∈ F we denote by Dg ⊂ S the set of discontinuities⁵ of g, then for each bounded g ∈ F such that µ(Dg) = 0 it holds that

lim_{n→∞} ∫ g(s) µn(ds) = ∫ g(s) µ(ds);  (2.33)

see Theorem F.2 in the Appendix. The above result can be easily extended from probability measures to general positive measures, as follows:

Lemma 2.4. Let the sequence {µn : n ∈ N} ⊂ M+ be [C]v-convergent to µ. Then, for each mapping g ∈ [F]v such that µ(Dg) = 0, (2.33) holds true.

Proof. First, we show that the statement holds true for v ≡ 1, i.e., [C]v = CB. Indeed, by hypothesis, (2.33) holds true for each g ∈ CB. Letting g = IS in (2.33), it follows that µn(S) → µ(S), for n → ∞. Moreover, this implies

µ(S) < ∞,  sup_{n∈N} µn(S) < ∞.

5 Note that, if g = IA, for some A ∈ S, then Dg = ∂A.


If µ(S) = 0, i.e., µ is the null measure, then the conclusion is immediate. Otherwise, if µ(S) > 0, we define the normalized measure µ̄ ∈ M1 as follows:

∀A ∈ S :  µ̄(A) = µ(A)/µ(S).

It follows that the set N0 := {n ∈ N : µn(S) = 0} is finite and, by considering the sequence {µ̄n : n ∈ N \ N0} ⊂ M1, defined as

∀A ∈ S :  µ̄n(A) := µn(A)/µn(S),

for each n ∈ N \ N0, we conclude that µ̄n is CB-convergent to µ̄. Since µ̄(Dg) = 0 if and only if µ(Dg) = 0, it follows that for each bounded g ∈ F such that µ(Dg) = 0 we have

lim_{n→∞} ∫ g(s) µ̄n(ds) = ∫ g(s) µ̄(ds).

Therefore, since µn(S) → µ(S), for n → ∞, it follows that

∫ g(s) µn(ds) = µn(S) ∫ g(s) µ̄n(ds) → µ(S) ∫ g(s) µ̄(ds) = ∫ g(s) µ(ds),

provided that µ(Dg) = 0, which proves the claim for v ≡ 1.

Let now v ∈ C+. According to Remark 1.6, [C(S)]v-convergence of µn towards µ is equivalent to CB(Sv)-convergence of v · µn towards v · µ, where

∀η ∈ Mv :  (v · η)(ds) = v(s) η(ds).

By hypothesis, µ and µn, for n ∈ N, are v-finite measures, i.e., belong to Mv, which implies that v · µ and v · µn, for n ∈ N, are finite measures. Moreover, if Φ : [F(S)]v → FB(Sv) denotes the isometry defined in Example 1.5, i.e.,

∀s ∈ Sv, g ∈ [F(S)]v :  (Φg)(s) = g(s)/v(s),

then it holds that DΦg ⊂ Dg, and it follows that µ(DΦg) = 0, which implies (v · µ)(DΦg) = 0. Therefore, choose an arbitrary g ∈ [F(S)]v. It follows that Φg ∈ FB(Sv) and, from the first part of the proof, for v ≡ 1, we conclude that

lim_{n→∞} ∫ g(s) µn(ds) = lim_{n→∞} ∫ (Φg)(s) (v · µn)(ds) = ∫ (Φg)(s) (v · µ)(ds) = ∫ g(s) µ(ds),

provided that (v · µ)(DΦg) = 0. This concludes the proof.

Lemma 2.4 is the main technical tool that we use to analyze non-continuous cost-functions from a weak [C]v-differentiation perspective. In the following we apply this result to our differentiability setting. In particular, it will turn out that if µθ is regularly [C]v-differentiable, then (2.31) holds true for each continuity set A of µ′θ. More specifically, the following statement holds true.


Theorem 2.5. If µ∗ : Θ → M1v is a [C]v-continuous measure-valued mapping such that µθ is regularly [C]v-differentiable, then:

(i) for each g ∈ [F]v such that |µ′θ|(Dg) = 0 it holds that

(d/dθ) ∫ g(s) µθ(ds) = ∫ g(s) µ′θ(ds);  (2.34)

(ii) if A ∈ S is such that A ⊂ Sv and A is a continuity set of µ′θ, then A satisfies (2.31).

Proof. Regular [C]v-differentiability of µθ implies that

[(µθ+ξ − µθ)/ξ]^+ ⇒_{[C]v} [µ′θ]^+,  [(µθ+ξ − µθ)/ξ]^− ⇒_{[C]v} [µ′θ]^−

and, since |µ′θ|(Dg) = 0 implies [µ′θ]^±(Dg) = 0, Lemma 2.4 concludes the proof of (2.34).

Since A ⊂ Sv implies that ‖IA‖v < ∞, letting now g = IA in (2.34) concludes (ii).

Therefore, although weaker than [F]v-differentiability, regular [C]v-differentiability of µθ is still a strong hypothesis, since it implies that (2.32) holds true for each g ∈ [F(µ′θ)]v, where we denote by F(µ′θ) the linear space of |µ′θ|-a.e. continuous functions, i.e.,

F(µ′θ) := {g ∈ F : |µ′θ|(Dg) = 0}.

Note that regularity is a crucial assumption in Theorem 2.5. Indeed, let us revisit the parametric distribution φθ introduced in Example 2.7, which is CB-differentiable for θ = 0. Since φ′0 is the null measure, the set A = (0,∞) is a continuity set for φ′0, but it holds that

d/dθ φθ(A) |θ=0 = −1 ≠ 0 = φ′0(A).

The following result is an immediate consequence of Theorem 2.5.

Corollary 2.4. Under the conditions put forward in Theorem 2.5, if there exists a neighborhood V of 0 such that for each ξ ∈ V we have θ + ξ ∈ Θ and the family

{ (µθ+ξ − µθ)/ξ : ξ ∈ V \ {0} }

is tight, then each continuity set A of µ′θ satisfies (2.31).

Proof. Note that, by hypothesis, both families

{ ((µθ+ξ − µθ)/ξ)± : ξ ∈ V \ {0} }

consist of positive measures and are tight. By Theorem 1.1 (ii) it follows that µθ is regularly CB-differentiable. Now, taking into account that CB = [C]v for v = 1 and that DIA = ∂A for each A ∈ S, the proof follows from Theorem 2.5.


Set-Wise Differentiation for Product Measures

We conclude this section by extending Theorem 2.5 to product measures. Note, however, that the decomposition (Cθ, Π+θ, Π−θ) in Theorem 2.4 is not orthogonal, even though one uses the orthogonal decomposition of µ′i,θ, for each 1 ≤ i ≤ n. Therefore, regular differentiability of µi,θ, for each 1 ≤ i ≤ n, does not imply in a straightforward way that Πθ is regularly differentiable and, in order to apply Theorem 2.5 to Πθ, an additional argument is needed. The following (weaker) result turns out to be useful in applications.

Theorem 2.6. If µi,∗ : Θ → M1v are such that µi,θ is regularly [C]vi-differentiable, for 1 ≤ i ≤ n, then for each measurable g ∈ [F]~v satisfying

∀1 ≤ i ≤ n : (|µ1,θ| × . . . × |µ′i,θ| × . . . × |µn,θ|)(Dg) = 0 (2.35)

it holds that

d/dθ ∫ g(x) Πθ(dx) = ∫ g(x) Π′θ(dx). (2.36)

Proof. By Theorem 2.4, Πθ is [C(S1 × . . . × Sn)]~v-differentiable; see Remark 2.5. Moreover, note that for any ξ such that θ + ξ ∈ Θ we have

(Πθ+ξ − Πθ)/ξ = Σ_{i=1}^n µ1,θ+ξ × . . . × µi−1,θ+ξ × [(µi,θ+ξ − µi,θ)/ξ] × µi+1,θ × . . . × µn,θ.

Hence, (Πθ+ξ − Πθ)/ξ = Υ+ξ − Υ−ξ, for some Υ±ξ ∈ M+, namely

Υ±ξ := Σ_{i=1}^n µ1,θ+ξ × . . . × µi−1,θ+ξ × [(µi,θ+ξ − µi,θ)/ξ]± × µi+1,θ × . . . × µn,θ,

and regular differentiability of µi,θ, for 1 ≤ i ≤ n, ensures that for ξ → 0 we have, in the [C]~v sense,

Υ±ξ =⇒ Υ± := Σ_{i=1}^n µ1,θ × . . . × µi−1,θ × [µ′i,θ]± × µi+1,θ × . . . × µn,θ.

Choose now g ∈ [F]~v satisfying (2.35). It follows that

Υ±(Dg) ≤ Σ_{i=1}^n (|µ1,θ| × . . . × |µ′i,θ| × . . . × |µn,θ|)(Dg) = 0

and by Lemma 2.4 we obtain

d/dθ ∫ g(x) Πθ(dx) = lim_{ξ→0} ∫ g(x) (Υ+ξ − Υ−ξ)(dx) = ∫ g(x) (Υ+ − Υ−)(dx). (2.37)

Finally, by the uniqueness of the [C]v-limit it follows that Π′θ = Υ+ − Υ− and using that in (2.37) concludes the proof of (2.36).


2.5 Gradient Estimation Examples

In this section we present some basic applications of the weak differentiation theory. More specifically, if X is a random variable describing the state of a stochastic system in which the input distributions are weakly differentiable with respect to some design parameter θ, we provide an unbiased estimator for the gradient

d/dθ Eθ[g(X)],

for a certain class of performance measures g for which the above expression makes sense. The main theoretical tool to be used will be Theorem 2.4, which provides a representation of the weak derivative of the product measure. In Section 2.5.1 we construct a gradient estimator for a ruin probability, i.e., X is the indicator of the ruin event in some insurance model, whereas in Section 2.5.2 we let X be the transient waiting time in a G/G/1 queue.

2.5.1 The Derivative of a Ruin Probability

Let us consider the following example. An insurance company receives premiums from clients at some constant rate r > 0, while claims {Yi : i ≥ 1} arrive according to a Poisson process with rate λ > 0. Let {Xi : i ≥ 1} denote the inter-arrival times of the Poisson process and let Nτ denote the number of claims recorded up to some fixed time horizon τ > 0. Assume further that the values of the claims are i.i.d. random variables following a Pareto distribution πθ, i.e.,

πθ(dx) = (β θ^β / x^{β+1}) I_{(θ,∞)}(x) dx,

for some β > 0, and assume that the claims are independent of the Poisson process.

Let V(0) ≥ 0 denote the initial credit of the insurance company. The credit (resp. debt) of the company right after the nth claim, denoted by V(n), follows the recursion

∀n ≥ 0 : V(n + 1) = V(n) + r Xn+1 − Yn+1.
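The recursion above can be simulated directly. The following sketch is ours (not part of the original text); the function names are our own, and a zero initial credit is just the default choice in the example:

```python
def surplus_path(v0, r, x, y):
    """Credit V(n) right after the nth claim: V(n+1) = V(n) + r*X_{n+1} - Y_{n+1}."""
    v = [v0]
    for xi, yi in zip(x, y):
        v.append(v[-1] + r * xi - yi)
    return v

def ruin_occurs(v0, r, x, y):
    """Ruin before the horizon: V(n) < 0 for at least one recorded claim n >= 1."""
    return any(vn < 0 for vn in surplus_path(v0, r, x, y)[1:])
```

For instance, with v0 = 0, r = 1, inter-arrival times x = [1, 1] and claims y = [0.5, 0.5] no ruin occurs, while y = [2, 0.5] ruins the company at the first claim.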

Ruin occurs before time τ if at least one n ≤ Nτ exists such that V(n) < 0; see Figure 2.1. We are interested in estimating the derivative with respect to θ of the probability of ruin up to time τ. To this end, we denote by Rτ the event that ruin occurs before time τ and note that

Rτ ∩ {Nτ = n} = ( ⋂_{k=1}^n {V(k) > 0} )^c ∩ {Nτ = n} = { r · Σ_{i=1}^j Xi > Σ_{i=1}^j Yi, ∀1 ≤ j ≤ n }^c ∩ {Nτ = n}.

Hence, considering the sequence {gn : n ≥ 1}, gn ∈ F(R2n), given by

gn(x1, . . . , xn, y1, . . . , yn) = 1 − ∏_{j=1}^n I{r · Σ_{i=1}^j xi > Σ_{i=1}^j yi}(x1, . . . , xn, y1, . . . , yn), (2.38)

we can write, for each n ≥ 1,

Pθ(Rτ ∩ {Nτ = n}) = Eθ[ I{Nτ = n} gn(X1, . . . , Xn, Y1, . . . , Yn) ], (2.39)


where Eθ is an expectation operator consistent with (X1, . . . , Xn, Y1, . . . , Yn) ∼ µ^n × πθ^n and µ denotes the exponential distribution with rate λ.

As explained in Section 2.2.3, the truncated distribution πθ is regularly CB-differentiable and its weak derivative satisfies

π′θ = (β/θ)(πθ − δθ).
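As an independent sanity check (ours, not from the text), this triple representation can be verified against a finite difference on a cost-function with a closed-form expectation; we take g(x) = x with β > 1, for which Eθ[Y] = βθ/(β − 1):

```python
from math import isclose

def pareto_mean(theta, beta):
    # E[Y] for pi_theta with density beta * theta**beta / x**(beta + 1) on (theta, inf); needs beta > 1
    return beta * theta / (beta - 1.0)

def mvd_derivative(theta, beta):
    # (beta/theta) * (E_pi[g] - g(theta)) with g(x) = x, per pi'_theta = (beta/theta)(pi_theta - delta_theta)
    return (beta / theta) * (pareto_mean(theta, beta) - theta)

theta, beta, h = 2.0, 3.0, 1e-6
finite_diff = (pareto_mean(theta + h, beta) - pareto_mean(theta - h, beta)) / (2.0 * h)
assert isclose(mvd_derivative(theta, beta), finite_diff, rel_tol=1e-6)
```

Both expressions equal β/(β − 1), here 1.5, which matches the direct differentiation of the Pareto mean.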

Therefore, if we let v = 1 and

µi,θ := πθ for 1 ≤ i ≤ n, and µi,θ := µ for n + 1 ≤ i ≤ 2n,

we conclude by Theorem 2.4 that the product measure

Πθ = πθ^n × µ^n

is CB-weakly differentiable and one can, according to (2.29), derive an instance of the weak derivative of Πθ using the following representation for the weak derivatives of the input distributions:

µ′i,θ := (β/θ, πθ, δθ) for 1 ≤ i ≤ n, and µ′i,θ := (1, µ, µ) for n + 1 ≤ i ≤ 2n.

Therefore, one can write (see Remark 2.5)

∀g ∈ CB(S^{2n}) : d/dθ ∫ g(s) Πθ(ds) = ∫ g(s) Π′θ(ds), (2.40)

where, according to (2.29), we have

Π′θ = (β/θ) Σ_{i=1}^n πθ^{i−1} × (πθ − δθ) × πθ^{n−i} × µ^n = (β/θ) Σ_{i=1}^n ( Πθ − πθ^{i−1} × δθ × πθ^{n−i} × µ^n ). (2.41)

Note, however, that the cost-function gn introduced for modeling the ruin probability is not continuous; in formula: gn ∉ CB. Fortunately, by virtue of Theorem 2.6, the equality in (2.40) still holds true if g satisfies

∀1 ≤ i ≤ n : (|πθ|^{i−1} × |π′θ| × |πθ|^{n−i} × µ^n)(Dg) = 0.

In our case, we have

Dgn = ∂(Rτ ∩ {Nτ = n}) ⊂ ⋃_{i=1}^n { r · Σ_{j=1}^i xj = Σ_{j=1}^i yj },

which yields

(|πθ|^{i−1} × |π′θ| × |πθ|^{n−i} × µ^n)(Dgn) ≤ Σ_{i=1}^n Pθ( r · Σ_{j=1}^i Xj = Σ_{j=1}^i Yj ) = 0,


since Xi has a continuous (exponential) distribution, for each 1 ≤ i ≤ n. Hence, (2.40) applies for g = gn, even though gn ∉ CB.

Examining (2.41) we note that, while Πθ represents the distribution of the initial process, the product measure

πθ^{i−1} × δθ × πθ^{n−i} × µ^n

represents the distribution of a modified process Vi(·), where the size of the ith claim has been replaced by the constant θ. Consequently, if Rτ^i denotes the event that ruin occurs before time τ when the value of the ith claim is replaced by the constant θ, then, letting g = gn in (2.40), it follows from (2.39) that

d/dθ Pθ(Rτ ∩ {Nτ = n}) = (β/θ) Σ_{i=1}^n ( Pθ(Rτ ∩ {Nτ = n}) − Pθ(Rτ^i ∩ {Nτ = n}) )
                        = (β/θ) Σ_{i=1}^n Pθ( (Rτ \ Rτ^i) ∩ {Nτ = n} ), (2.42)

where the last equality follows from the observation that Yi > θ, which implies that Rτ^i ⊂ Rτ. Moreover, the difference Rτ \ Rτ^i represents the event that ruin occurs up to time τ but does not occur anymore if one reduces the value of the ith claim by Yi − θ. A graphical representation of these facts can be found in Figure 2.1, where the dashed line represents the modified process Vi(·). One can easily note that this event is incompatible with Nτ < i, i.e., with the "reduced claim" arriving after time τ. Hence, it holds that

∀i ≥ 1 : Pθ( Rτ \ Rτ^i ) = Pθ( (Rτ \ Rτ^i) ∩ {Nτ ≥ i} ). (2.43)

Let us consider now the following elementary identity6:

Pθ(Rτ) = Σ_{n=1}^∞ Pθ(Rτ ∩ {Nτ = n}).

Provided that interchanging infinite summation with differentiation is allowed, we obtain the following sequence of equalities:

d/dθ Pθ(Rτ) = Σ_{n=1}^∞ d/dθ Pθ(Rτ ∩ {Nτ = n}) (2.44)
            = Σ_{n=1}^∞ (β/θ) Σ_{i=1}^n Pθ( (Rτ \ Rτ^i) ∩ {Nτ = n} ) [by (2.42)] (2.45)
            = (β/θ) Σ_{i=1}^∞ Pθ( (Rτ \ Rτ^i) ∩ {Nτ ≥ i} ) [by (∗)] (2.46)
            = (β/θ) Σ_{i=1}^∞ Pθ( Rτ \ Rτ^i ), [by (2.43)] (2.47)

6 Note that ruin cannot occur if Nτ = 0, i.e., if no claim is recorded up to time τ.


Fig. 2.1: An occurrence of the event Rτ \ Rτ^3 and {Nτ = 4}. The dashed line represents a version of the process where the value of the 3rd claim is reduced. [Plot of the original and modified surplus paths over the inter-arrival times X1, . . . , X4 up to the horizon τ; not reproduced.]

where the equality (∗) follows by changing the summation order in (2.45), which is allowed because the series in (2.46) is absolutely convergent; see Theorem A.1 in the Appendix. Moreover, the kth remainder term of the series in (2.46) can be bounded as follows:

Σ_{i=k+1}^∞ Pθ( (Rτ \ Rτ^i) ∩ {Nτ ≥ i} ) ≤ Σ_{i=k+1}^∞ Pθ(Nτ ≥ i) ≤ e^{−λτ} Σ_{i=k+1}^∞ Σ_{j=i}^∞ (λτ)^j / j!. (2.48)

Note that the above bound is independent of θ. Interchanging limit with differentiation in (2.44) is justified (see Theorem B.3 in the Appendix) provided that we deal with a uniformly convergent series of functions in θ. Hence, it suffices to show that the double sum in (2.48) converges to 0 as k → ∞. To see that, choose k ≥ 1 such that (λτ)/(k + 1) < q < 1. In particular, it follows that for each j ≥ k + 1 it holds that (λτ)/j < q < 1. Then we have

Σ_{i=k+1}^∞ Σ_{j=i}^∞ (λτ)^j / j! ≤ ((λτ)^k / k!) Σ_{i=k+1}^∞ Σ_{j=i}^∞ q^{j−k} = ((λτ)^k / k!) (q^{−k} / (1 − q)) Σ_{i=k+1}^∞ q^i = ((λτ)^k / k!) · q / (1 − q)^2.
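The remainder bound above is easy to check numerically; the following sketch is ours, for illustration only (truncated sums stand in for the infinite series):

```python
from math import factorial

def double_tail(lam_tau, k, terms=60):
    # Truncation of sum_{i=k+1}^inf sum_{j=i}^inf (lam*tau)^j / j!
    return sum(sum(lam_tau ** j / factorial(j) for j in range(i, i + terms))
               for i in range(k + 1, k + 1 + terms))

def tail_bound(lam_tau, k, q):
    # ((lam*tau)^k / k!) * q / (1 - q)^2, valid whenever lam*tau/(k+1) < q < 1
    return (lam_tau ** k / factorial(k)) * q / (1.0 - q) ** 2
```

For λτ = 1 and k = 3, any q with 1/4 < q < 1 is admissible; e.g., q = 0.3 gives a bound of roughly 0.10 against a true tail of roughly 0.06.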

Now choose an arbitrary ε > 0 and increase (if necessary) k in order to obtain

(λτ)^k / k! ≤ ε (1 − q)^2 / q.

Since ε was arbitrarily chosen, we conclude that (2.47) holds true and that the expression

∂θ(n) := (β/θ) Σ_{j=1}^n ( gn(X1, . . . , Xn, Y1, . . . , Yn) − gn(X1, . . . , Xn, Y1, . . . , Yj−1, θ, Yj+1, . . . , Yn) )


provides an asymptotically unbiased estimator for the derivative of the ruin probability, i.e., the sequence ∂θ(n) converges in mean to an unbiased estimator as n → ∞.
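A single replication of this estimator can be sketched as follows. This is our illustrative code, not part of the original text: the function names are ours, the claims are drawn by inverse-transform sampling from the Pareto distribution, and V(0) = 0 is assumed:

```python
import random

def g_ruin(x, y, r):
    # Ruin indicator g_n: 1 unless r * sum(x[:j]) > sum(y[:j]) for every 1 <= j <= n
    n = len(x)
    return 0.0 if all(r * sum(x[:j]) > sum(y[:j]) for j in range(1, n + 1)) else 1.0

def ruin_derivative_replication(theta, beta, lam, r, tau, rng):
    """One sample of (beta/theta) * sum_j [g_n(..., Y_j, ...) - g_n(..., theta, ...)]."""
    x, y, t = [], [], 0.0
    while True:
        dt = rng.expovariate(lam)                 # inter-arrival times, Exp(lam)
        if t + dt > tau:
            break
        t += dt
        x.append(dt)
        u = 1.0 - rng.random()                    # u in (0, 1]
        y.append(theta * u ** (-1.0 / beta))      # Pareto(theta, beta) claim
    est = 0.0
    for j in range(len(y)):
        phantom = y[:j] + [theta] + y[j + 1:]     # j-th claim replaced by theta
        est += g_ruin(x, y, r) - g_ruin(x, phantom, r)
    return (beta / theta) * est
```

Since replacing a claim Yj > θ by θ can only remove ruin, every summand is nonnegative, so each replication is ≥ 0, consistent with the ruin probability being non-decreasing in θ.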

2.5.2 Differentiation of the Waiting Times in a G/G/1 Queue

Let us consider a G/G/1 queue where the service times have distribution µ and inter-arrival times have distribution η. If {Wn : n ≥ 1} denotes the sequence of waiting times, then Lindley's recursion yields

∀n ≥ 1 : Wn+1 = max{Wn + Sn − Tn+1, 0},

where {Sn : n ≥ 1} and {Tn : n ≥ 1} denote the sequences of service and inter-arrival times, respectively.

Let us assume that the service-time distribution µ = µθ depends on some parameter θ ∈ Θ ⊂ R and the inter-arrival time distribution η is independent of θ. We will investigate under which conditions the distribution of the (n+1)st waiting time Wn+1, which obviously depends on θ, is weakly differentiable. To this end, we aim to apply Theorem 2.4 and we consider the sequence of mappings {wn : n ≥ 1}, wn : R^{2n} → R, defined as w1(s, t) = max{s − t, 0} and

∀n ≥ 1, σ, τ ∈ R^n, s, t ∈ R : wn+1(σ, s, τ, t) = max{wn(σ, τ) + s − t, 0}. (2.49)

Note that wn ∈ C(R^{2n}) and from Lindley's recursion it follows that the waiting times satisfy

∀n ≥ 1 : Wn+1 = wn(S1, . . . , Sn, T2, . . . , Tn+1).
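The mappings wn are just an unrolled Lindley recursion; a minimal sketch (ours, assuming the first customer arrives at an empty queue, i.e., W1 = 0) computes the whole waiting-time sequence:

```python
def waiting_times(s, t):
    """W_1 = 0 and W_{n+1} = max(W_n + S_n - T_{n+1}, 0) (Lindley's recursion).

    s = [S_1, ..., S_n], t = [T_2, ..., T_{n+1}]; returns [W_1, ..., W_{n+1}].
    """
    w = [0.0]
    for sn, tn in zip(s, t):
        w.append(max(w[-1] + sn - tn, 0.0))
    return w
```

For s = [3, 1] and t = [1, 5] this yields [0.0, 2.0, 0.0]: the second customer waits 2, while the third finds an empty queue.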

In what follows, we fix n ≥ 1 and assume that µθ is [D]v-differentiable, having derivative µ′θ = (cθ, µ+θ, µ−θ). By letting

µi,θ := µθ for 1 ≤ i ≤ n, and µi,θ := η for n + 1 ≤ i ≤ 2n,

it follows that µi,θ is [D]v-differentiable, for all 1 ≤ i ≤ 2n, with derivatives

µ′i,θ = (cθ, µ+θ, µ−θ) for 1 ≤ i ≤ n, and µ′i,θ = (1, η, η) for n + 1 ≤ i ≤ 2n,

provided that v ∈ L1(η). Therefore, Theorem 2.4 applies and leads us to conclude that the distribution of Wn+1 is weakly [D]ϑ-differentiable, for all ϑ ∈ C+([0,∞)) satisfying

‖ϑ ∘ wn‖~v < ∞,

where, in this case, we have (see (2.28) for the definition of ~v)

∀s1, . . . , sn, t1, . . . , tn : ~v(s1, . . . , sn, t1, . . . , tn) = ∏_{i=1}^n v(si) ∏_{i=1}^n v(ti).

Now continuity of wn implies that the distribution of Wn+1 is weakly [D]ϑ-differentiable, for any ϑ ∈ C+([0,∞)) satisfying

sup_{s1,...,sn,t1,...,tn ∈ R} ϑ(wn(s1, . . . , sn, t1, . . . , tn)) / ( ∏_{i=1}^n v(si) ∏_{i=1}^n v(ti) ) < ∞. (2.50)

Note that (2.50) holds true if ϑ is non-decreasing and satisfies

∀x, y ≥ 0 : ϑ(x + y) ≤ γ v(x) v(y), (2.51)

for some γ > 0. Indeed, note that for all n ≥ 1, wn in (2.49) satisfies

∀s1, . . . , sn, t1, . . . , tn ∈ R : wn(s1, . . . , sn, t1, . . . , tn) ≤ s1 + . . . + sn + t1 + . . . + tn.

Using monotonicity of ϑ, we conclude with (2.51) that

sup_{s1,...,sn,t1,...,tn ∈ R} ϑ(wn(s1, . . . , sn, t1, . . . , tn)) / ( ∏_{i=1}^n v(si) ∏_{i=1}^n v(ti) ) ≤ γ^{2n−1}.

In particular, if for all x ≥ 0, v(x) = ϑ(x) = e^{αx}, for some α ≥ 0, then (2.51) is fulfilled for γ = 1. We conclude that if µθ is [D]vα-differentiable, then the distribution of Wn+1 is [D]vα-differentiable as well. For later reference we summarize our analysis in the following statement:

Theorem 2.7. Let vα(x) = e^{αx}, for all x ≥ 0, for some α ≥ 0. If the service-time distribution µθ is [D]vα-differentiable, then the distribution of the (n+1)st waiting time is [D]vα-differentiable, for each n ≥ 1, and it holds that

∀f ∈ [D]vα : d/dθ Eθ[f(Wn+1)] = cθ Σ_{k=1}^n Eθ[ f(W^{k+}_{n+1}) − f(W^{k−}_{n+1}) ], (2.52)

where, in accordance with Corollary 2.3, we have

W^{k±}_{n+1} = wn(S1, . . . , S^±_k, . . . , Sn, T2, . . . , Tn+1) (2.53)

and Eθ is an expectation operator consistent with

∀1 ≤ k ≤ n : (S1, . . . , S^±_k, . . . , Sn, T2, . . . , Tn+1) ∼ µθ^{k−1} × µθ^± × µθ^{n−k} × η^n.

Proof. We apply Corollary 2.3 to the family of random variables {Xi : 1 ≤ i ≤ 2n} defined as

Xi := Si for 1 ≤ i ≤ n, and Xi := Ti−n+1 for n + 1 ≤ i ≤ 2n.

Since, by hypothesis, µθ is Cvα-differentiable, having weak derivative µ′θ = (cθ, µ+θ, µ−θ), and η is trivially CB-differentiable, i.e., Cvα-differentiable for α = 0, with vanishing derivative (represented by the triple (1, η, η)), (2.52) follows from (2.30) by letting g = wn.

To complete the proof, one has to show that, for each α ≥ 0, vα satisfies

∀s1, . . . , sn, t1, . . . , tn : vα(wn(s1, . . . , sn, t1, . . . , tn)) ≤ ∏_{i=1}^n vα(si). (2.54)

To see that, note that, for n ≥ 1, wn in (2.49) satisfies

∀s1, . . . , sn, t1, . . . , tn : wn(s1, . . . , sn, t1, . . . , tn) ≤ s1 + . . . + sn;

the proof follows by induction. Now monotonicity of vα concludes the proof of (2.54).

Analyzing equation (2.53), we note that W^{k±}_{n+1} denotes the (n+1)st waiting time in a modified queue, where the kth service time Sk has been replaced by S+k and S−k, respectively. Hence, for each k ≥ 1, one can construct two parallel processes W^{k±}_· whose sample paths coincide with those of the original process up to time k, run parallel to the original path after time k + 1, and, once the "higher" path reaches level 0, coincide with the original path again (the two processes couple). A graphical representation of the two parallel processes {W^{k±}_n : n ≥ 1} can be seen in Figure 2.2.

Fig. 2.2: A sample path of the parallel processes {W^{k±}_n}_{n≥1}, for k = 5. They are obtained by replacing the 5th service time in the original queue by S+5 and S−5, respectively. [Plot of the coupled waiting-time paths W^{5+}_· and W^{5−}_· against n = 1, . . . , 11; not reproduced.]

In particular, Theorem 2.7 shows that the expression

∂θ := cθ Σ_{k=1}^n ( f(W^{k+}_{n+1}) − f(W^{k−}_{n+1}) ),

with W^{k±}_{n+1} given by (2.53), provides an unbiased estimator for the gradient d/dθ Eθ[f(Wn+1)]. More specifically, provided that one has the means to simulate f(Wn+1), by parallel simulations one can also simulate the stochastic gradient of f(Wn+1). While the joint distribution of the pair (S+k, S−k) plays no role in Theorem 2.7, it becomes crucial when simulating the two parallel processes. For better performance it is recommended that the correlation of the two random variables be maximal; see [48]. For more details on the relation between weak derivatives and unbiased estimators, we refer to [31].
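For illustration only (our sketch, not from the text): with Exp(θ)-distributed service times, one valid non-orthogonal representation of the weak derivative, often used in the measure-valued differentiation literature, is (cθ, µ+θ, µ−θ) = (1/θ, Exp(θ), Erlang(2, θ)); we state this representation as an assumption here. Coupling S−k = S+k plus an independent Exp(θ) variable makes the two parallel paths strongly correlated:

```python
import random

def waiting_time(s, t):
    # Final waiting time W_{n+1} from Lindley's recursion, W_1 = 0 assumed
    w = 0.0
    for sn, tn in zip(s, t):
        w = max(w + sn - tn, 0.0)
    return w

def waiting_time_gradient(theta, lam, n, f, rng):
    """One replication of (1/theta) * sum_k [f(W^{k+}) - f(W^{k-})] for Exp(theta) service.

    Assumed representation: mu'_theta = (1/theta) * (Exp(theta) - Erlang(2, theta)).
    """
    s = [rng.expovariate(theta) for _ in range(n)]
    t = [rng.expovariate(lam) for _ in range(n)]   # T_2, ..., T_{n+1}
    est = 0.0
    for k in range(n):
        s_plus, s_minus = list(s), list(s)
        s_plus[k] = rng.expovariate(theta)
        s_minus[k] = s_plus[k] + rng.expovariate(theta)  # couples S- = S+ + Exp(theta) >= S+
        est += f(waiting_time(s_plus, t)) - f(waiting_time(s_minus, t))
    return est / theta
```

For a non-decreasing f the pathwise difference is ≤ 0 (a longer kth service time never shortens any waiting time), matching the intuition that raising the service rate θ reduces waiting.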


2.6 Concluding Remarks

Throughout this chapter much work has been put into formalizing and studying a few relevant types of measure-valued differentiation. Concepts such as weak and strong differentiation have already been treated in the literature; see, e.g., [27], [32], [48], and recall that strong differentiation is a particular case of Frechet differentiation. The concept of regular differentiation, however, is rather new and is meant to ensure a "smooth" extension of some properties related to classical weak convergence of measures to general signed measures. It turns out that, for some applications, e.g., set-wise differentiation, weak differentiability, which is a minimal differentiability condition, is not sufficient, while strong differentiability is too strong a condition, as it is not enjoyed by an important class of distributions, e.g., truncated distributions. Therefore, since regular differentiation is a property enjoyed by most of the common weakly differentiable distributions, it makes sense to consider and study such a concept.

An important aspect of the theory of measure-valued differentiation is the "triple" representation of the derivatives of probability measures, which makes it possible to represent the derivative of an expected value as the re-scaled difference between two expected values, see (2.19), or, when dealing with product measures, as a linear combination of expected values, see Corollary 2.3. This fact is important in simulations as it allows for unbiased (resp. asymptotically unbiased) gradient estimation with reduced variance for the transient (resp. steady-state) performance measures of complex stochastic systems, compared to other parallel methods such as infinitesimal perturbation analysis and the score function method; see, e.g., [25], [33], [39], [47], [48]. However, establishing the accuracy of the estimates is a subject for future research.

Most of the results put forward in this chapter are new and are based on the classical theory of weak convergence of probability measures and the link between measure theory and functional analysis. Out of these results I would like to point out Theorem 2.3, which is crucial for establishing weak differentiability of product measures and makes this theory fruitful for applications. It also provides the means to represent the weak derivative of the product measure, which leads to gradient estimations for complex systems; see Section 2.5. I would also like to mention Theorem 2.2, which establishes sufficient conditions for strong differentiability; I am grateful to Prof.Dr. A. Hordijk for his contribution to this result. The definition of the new concept of regular differentiability is motivated by the results in Section 2.4, which lead to gradient estimations for non-continuous performance measures in Section 2.5.1.

The theory of weak differentiation has been successfully applied to discrete-time stochastic processes, e.g., random walks or, more generally, Markov chains; see [22], [26], [27], [31], [32] and [33]. An interesting topic for future research is to extend these techniques to continuous-time processes, e.g., diffusions and Levy processes, and to see to what extent the resulting theory overlaps with the well-known Malliavin Calculus.

Finally, an important topic for future research is to develop applications in the area of stochastic optimization and risk theory based on weak differentiation theory.


3. STRONG BOUNDS ON PERTURBATIONS BASED ON LIPSCHITZ CONSTANTS

In classical analysis one can easily establish bounds on the variations of a differentiable function by using the Mean Value Theorem. More specifically, differentiability implies local Lipschitz continuity, i.e., the variation of a differentiable function can be bounded by means of Lipschitz constants, provided that the derivative is bounded on a given domain. This chapter is intended to extend the classical results to measure-valued mappings in order to establish bounds on perturbations for the performance measures of parameter-dependent stochastic models.

3.1 Introduction

In classical analysis, a function f : S→ T, where (S, dS) and (T, dT) are metric spaces, iscalled Lipschitz continuous on A ⊂ S if there exists some constant L > 0 such that

∀s1, s2 ∈ A : dT(f(s1), f(s2)) ≤ L · dS(s1, s2) (3.1)

and it is called locally Lipschitz continuous if for each s ∈ S there exists a neighborhoodU of s such that f is Lipschitz continuous on U . Any constant L satisfying (3.1) is calleda Lipschitz constant. In addition, f is said to be Lipschitz continuous if it is Lipschitzcontinuous on S.

Obviously, Lipschitz continuity implies local Lipschitz continuity, but the converse is,in general, not true. A standard counterexample is the function f(x) = 1/x on (0,∞).Further, local Lipschitz continuity implies (uniform) continuity but the converse is, again,not true since any real-valued function which is continuous but nowhere differentiable(e.g., Weierstrass’s function) is not locally Lipschitz continuous.

On Banach spaces, the most common examples of locally Lipschitz continuous functions are the Frechet differentiable functions. Moreover, if f is Frechet differentiable and its derivative is bounded on some domain A, it follows from the Mean Value Theorem that f is Lipschitz continuous on A. In fact, on Euclidean spaces Lipschitz continuity is essentially equivalent to Frechet differentiability. More specifically, a function f : A ⊂ Rn → Rm is Lipschitz continuous if and only if it is differentiable almost everywhere and the essential supremum of its derivative is finite (Rademacher's Theorem).

Lipschitz constants play an important role in perturbation/sensitivity analysis as they provide bounds on the variation of functions. Starting from the fact that Theorem 2.1 essentially says that weak differentiability implies strong local Lipschitz continuity, we aim, in this chapter, to extend this result to product measures and to establish bounds on perturbations for performance measures of stochastic systems by means of Lipschitz constants, which can be easily derived from the expression of the weak derivatives.


The setup of this chapter is as follows: let µi,θ, for 1 ≤ i ≤ n, be a family of probability measures depending on some parameter θ ∈ Θ and set

Pg(θ) := Eθ[g(X1, . . . , Xn)] = ∫ . . . ∫ g(s1, . . . , sn) Πθ(ds1, . . . , dsn), (3.2)

for a cost-function g, where Xi is distributed according to µi,θ, for each 1 ≤ i ≤ n, and Πθ denotes the product of the measures µi,θ, for 1 ≤ i ≤ n; for a formal definition see (2.27). Throughout this chapter we study the following types of bounds on perturbations:

(i) Bounds on |Pg(θ2)− Pg(θ1)|, for some specified cost-function g.

(ii) Uniform (strong) bounds with respect to [D]v, i.e., bounds for

sup_{‖g‖v ≤ 1} |Pg(θ2) − Pg(θ1)|,

for some θ1, θ2 ∈ Θ.

The starting point of the analysis put forward in this chapter is Theorem 2.1, which asserts that weak [D]v-differentiability of a measure-valued mapping implies strong local Lipschitz continuity, i.e., for each neighborhood V of 0 there exists some constant M > 0 such that

∀ξ ∈ V, g ∈ [D]v : | ∫ g(s) µθ+ξ(ds) − ∫ g(s) µθ(ds) | ≤ M |ξ| ‖g‖v. (3.3)

A constant M satisfying (3.3) is called a Lipschitz constant (bound) for µθ. Note that any M′ > M is a Lipschitz bound for µθ, provided that M is. Therefore, one can sharpen a bound by decreasing it, when possible. Although Lipschitz bounds are, in general, not very tight, they still play an important role when studying strong stability of stochastic systems; in other words, they are qualitatively rather than quantitatively important. Extending this result to product measures leads to the desired bounds, provided that the input distributions µi,θ are weakly differentiable, for 1 ≤ i ≤ n.

The chapter is organized as follows: in Section 3.2 we derive Lipschitz bounds for some standard probabilistic models and in Section 3.3 we extend our analysis to the steady-state waiting time. In particular, we show that the stationary distribution of waiting times in the G/G/1 queue is strongly locally Lipschitz continuous, provided that the service-time distribution is weakly differentiable.

3.2 Bounds on Perturbations

Theorem 2.1 establishes strong local Lipschitz continuity of weakly differentiable probability measures. For practical purposes one is interested in calculating an actual Lipschitz bound. Therefore, this section is intended to show how Lipschitz bounds can be derived by evaluating the weak derivative of a probability measure. While the procedure for deriving Lipschitz constants is rather similar to the one in classical analysis, the particular setting we address here imposes some specific formulation, which will be explained in the main result of this section, Theorem 3.1. In Section 3.2.1 we derive bounds on perturbations for product probability measures and in Section 3.2.2 we obtain similar results for homogeneous Markov chains, illustrating the results with an application to the sequence of waiting times in the G/G/1 queue.


3.2.1 Bounds on Perturbations for Product Measures

The aim of this section is to establish bounds on perturbations for product measures. This is useful when considering performance measures which depend on a finite collection of random variables, as in (3.2). We start with a basic result which establishes bounds on perturbations for one-dimensional distributions.

Theorem 3.1. Let µ∗ : Θ → M1v be [C]v-differentiable on Θ. For θ1, θ2 ∈ Θ, such that θ1 < θ2, let us define

Lvµ(θ1, θ2) := sup_{θ∈[θ1,θ2]} ‖µ′θ‖v.

(i) Then it holds that

‖µθ2 − µθ1‖v ≤ Lvµ(θ1, θ2) (θ2 − θ1). (3.4)

(ii) For any g ∈ [F]v it holds that

|Eθ2[g(X)] − Eθ1[g(X)]| ≤ Lvµ(θ1, θ2) ‖g‖v (θ2 − θ1), (3.5)

where, for θ ∈ Θ, Eθ is an expectation operator consistent with X ∼ µθ.

(iii) If µ′θ = (cθ, µ+θ, µ−θ) and g ≥ 0, then we can replace Lvµ(θ1, θ2) in (3.5) by

Mvµ(θ1, θ2) = sup_{θ∈[θ1,θ2]} ( cθ · max{‖µ+θ‖v, ‖µ−θ‖v} ).

Proof. (i) Fix g ∈ [C]v. Applying the Mean Value Theorem yields

| ∫ g(s) µθ2(ds) − ∫ g(s) µθ1(ds) | = (θ2 − θ1) | ∫ g(s) µ′θg(ds) |,

for some θg ∈ (θ1, θ2), depending on g. On the other hand,

∀θ ∈ (θ1, θ2) : | ∫ g(s) µ′θ(ds) | ≤ ‖g‖v · ‖µ′θ‖v ≤ Lvµ(θ1, θ2) ‖g‖v,

according to the Cauchy-Schwarz Inequality, and we conclude that

∀g ∈ [C]v : | ∫ g(s) µθ2(ds) − ∫ g(s) µθ1(ds) | ≤ Lvµ(θ1, θ2) ‖g‖v (θ2 − θ1). (3.6)

Taking in (3.6) the supremum with respect to ‖g‖v ≤ 1 concludes (i).

(ii) Applying again the Cauchy-Schwarz Inequality we obtain

∀g ∈ [F]v : | ∫ g(s) µθ2(ds) − ∫ g(s) µθ1(ds) | ≤ ‖g‖v ‖µθ2 − µθ1‖v

and from (3.4) we conclude (ii).

(iii) Finally, for g ≥ 0 we have

| ∫ g(s) µ′θ(ds) | = cθ | ∫ g(s) µ+θ(ds) − ∫ g(s) µ−θ(ds) |
                  ≤ cθ · max{ ∫ g(s) µ+θ(ds), ∫ g(s) µ−θ(ds) }
                  ≤ cθ · max{‖µ+θ‖v, ‖µ−θ‖v} ‖g‖v,

which, together with (3.5), concludes the proof of (iii).


Lipschitz Bounds for Some Usual Distributions

In applications one is often interested in bounds on the moments of certain random variables. The following example illustrates how Theorem 3.1 applies to two common types of distributions.

Example 3.1. Let S = [0,∞) and v(s) = sp, for each s ≥ 0 and some p ≥ 0.

(i) Let µθ denote the exponential distribution with rate θ discussed in Example 2.5. Standard calculations show that

‖µ′θ‖v = ∫_0^∞ x^p |1 − θx| e^{−θx} dx = ( 2e^{−1} + p γ̄(p + 1, 1) − p γ(p + 1, 1) ) / θ^{p+1},

where γ̄ and γ denote the superior (resp. inferior) incomplete Gamma functions, which are defined as follows:

∀p > 0, x ≥ 0 : γ̄(p, x) = ∫_x^∞ s^{p−1} e^{−s} ds, γ(p, x) = ∫_0^x s^{p−1} e^{−s} ds.

Therefore, the Lipschitz constant Lvµ(θ1, θ2) in Theorem 3.1 is given by

Lvµ(θ1, θ2) = ( 2e^{−1} + p γ̄(p + 1, 1) − p γ(p + 1, 1) ) / θ1^{p+1}

and the constant Mvµ(θ1, θ2) satisfies

Mvµ(θ1, θ2) = sup_{θ∈[θ1,θ2]} max{ ( e^{−1} − p γ(p + 1, 1) ) / θ^{p+1}, ( e^{−1} + p γ̄(p + 1, 1) ) / θ^{p+1} } = ( e^{−1} + p γ̄(p + 1, 1) ) / θ1^{p+1}.

In particular, if p ≥ 0 is an integer, it holds that

Lvµ(θ1, θ2) = ( 2(1 + p·p! Σ_{k=0}^p 1/k!) − p·p!·e ) / ( θ1^{p+1} e )

and

Mvµ(θ1, θ2) = ( 1 + p·p! Σ_{k=0}^p 1/k! ) / ( θ1^{p+1} e ).

(ii) For the uniform distribution ψθ on [0, θ), in accordance with Example 2.6, we obtain the following Lipschitz constants:

Lvψ(θ1, θ2) = ((p + 2)/(p + 1)) θ2^{p−1} and Mvψ(θ1, θ2) = θ2^{p−1}.
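As a quick sanity check (ours, not in the text), the closed form in (i) for integer p can be compared with a direct numerical integration of ∫ x^p |1 − θx| e^{−θx} dx:

```python
from math import exp, e, factorial

def lipschitz_exact(theta1, p):
    # (2*(1 + p*p!*sum_{k<=p} 1/k!) - p*p!*e) / (theta1^(p+1) * e), for integer p >= 0
    s = sum(1.0 / factorial(k) for k in range(p + 1))
    return (2.0 * (1.0 + p * factorial(p) * s) - p * factorial(p) * e) / (theta1 ** (p + 1) * e)

def lipschitz_numeric(theta, p, upper=50.0, steps=200000):
    # Midpoint rule for the v-norm of the weak derivative of the Exp(theta) distribution
    h = upper / steps
    return sum(((i + 0.5) * h) ** p * abs(1.0 - theta * (i + 0.5) * h) * exp(-theta * (i + 0.5) * h)
               for i in range(steps)) * h
```

For θ1 = 1 and p = 1 both evaluate to (6 − e)/e ≈ 1.207.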

Example 3.1 illustrates the fact that the Lipschitz constants very often depend on the values θ1, θ2 ∈ Θ. Thus, from this point of view, notations such as Lvµ(θ1, θ2) and Mvµ(θ1, θ2) are justified. However, in what follows we omit specifying the values θ1, θ2 when not relevant.


Extension to Product Measures

Let us consider now (a) a finite family {µi,θ : θ ∈ Θ} ⊂ M1(Si) of probability measures, (b) a family of non-negative, continuous mappings vi ∈ C+(Si), and (c) recall the definitions of Πθ, ~v and Pg(θ) given in (2.27), (2.28) and (3.2), respectively.

Theorem 3.2. Let µi,∗ : Θ → M1vi be [C(Si)]vi-differentiable on Θ, for each 1 ≤ i ≤ n, and for arbitrary θ1, θ2 ∈ Θ, such that θ1 < θ2, set

∀1 ≤ i ≤ n : Li = sup_{θ∈[θ1,θ2]} ‖µ′i,θ‖vi.

(i) Then it holds that

‖Πθ2 − Πθ1‖~v ≤ L∗ (θ2 − θ1),

where

L∗ = Σ_{i=1}^n ( Li ∏_{j=1}^{i−1} ‖µj,θ2‖vj ∏_{k=i+1}^n ‖µk,θ1‖vk ) (3.7)

and we agree that a void product, such as ∏_{j=1}^0 ‖µj,θ2‖vj, equals 1.

(ii) For each g ∈ [F(S1 × . . . × Sn)]~v it holds that

|Pg(θ2) − Pg(θ1)| ≤ L∗ ‖g‖~v (θ2 − θ1). (3.8)

(iii) If g ≥ 0 and µ′i,θ = (ci,θ, µ+i,θ, µ−i,θ), for 1 ≤ i ≤ n, then the constant L∗ in (3.8) can be improved by replacing Li in (3.7) by

Mi = sup_{θ∈[θ1,θ2]} ( ci,θ · max{‖µ+i,θ‖vi, ‖µ−i,θ‖vi} ),

i.e., L∗ can be replaced by

M∗ := Σ_{i=1}^n ( Mi ∏_{j=1}^{i−1} ‖µj,θ2‖vj ∏_{k=i+1}^n ‖µk,θ1‖vk ). (3.9)

Proof. For arbitrary g ∈ [C(S1 × . . . × Sn)]~v, we have

∀(s1, . . . , sn) : |g(s1, . . . , sn)| ≤ ‖g‖~v · ~v(s1, . . . , sn) = ‖g‖~v · v1(s1) · . . . · vn(sn).

Therefore, for each 1 ≤ i ≤ n, the mapping gi defined as

gi(si) = ∫ . . . ∫ g(s1, . . . , sn) ∏_{j=1}^{i−1} µj,θ2(dsj) ∏_{k=i+1}^n µk,θ1(dsk)

is continuous (for a proof, use the Dominated Convergence Theorem) and satisfies (apply the Fubini Theorem)

‖gi‖vi ≤ ‖g‖~v · ∏_{j=1}^{i−1} ‖µj,θ2‖vj · ∏_{k=i+1}^n ‖µk,θ1‖vk. (3.10)


Therefore, gi ∈ [C(Si)]_{vi}, for each 1 ≤ i ≤ n, and since µi,θ is weakly [C(Si)]_{vi}-differentiable, we conclude from Theorem 3.1 (ii) that

| ∫ gi(si) µi,θ2(dsi) − ∫ gi(si) µi,θ1(dsi) | ≤ Li ‖gi‖_{vi} (θ2 − θ1). (3.11)

On the other hand, simple algebraic calculations show that

∫ g(s) Πθ2(ds) − ∫ g(s) Πθ1(ds) = ∑_{i=1}^{n} ( ∫ gi(si) µi,θ2(dsi) − ∫ gi(si) µi,θ1(dsi) ).

Hence, (3.11) together with (3.10) imply that

∀g ∈ [C(S1 × … × Sn)]~v : | ∫ g(s) Πθ2(ds) − ∫ g(s) Πθ1(ds) | ≤ L∗‖g‖~v (θ2 − θ1)

holds true, for L∗ defined by (3.7). Taking in the above inequality the supremum with respect to ‖g‖~v ≤ 1 concludes (i). A similar reasoning as in Theorem 3.1 concludes the proofs of (ii) and (iii).
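The telescoping decomposition used in this proof can be checked numerically in a discrete toy setting. The sketch below (all names are ours) takes n = 2 coordinates, each carrying a two-point distribution depending on a parameter, and verifies that the difference of product-measure expectations equals the sum of the two one-coordinate differences.

```python
import itertools

def mu(theta):
    # Two-point distribution: outcome 1 has probability theta, outcome 0 has 1 - theta.
    return {0: 1.0 - theta, 1: theta}

def g(s1, s2):
    return (1 + s1) * (2 + 3 * s2)   # an arbitrary test function

theta1, theta2 = 0.3, 0.7

def product_expectation(m1, m2):
    return sum(m1[a] * m2[b] * g(a, b) for a, b in itertools.product(m1, m2))

lhs = product_expectation(mu(theta2), mu(theta2)) - product_expectation(mu(theta1), mu(theta1))

# g_i integrates out all coordinates except the i-th: theta2-measures before
# coordinate i, theta1-measures after it (as in the definition of g_i above).
g1 = {a: sum(mu(theta1)[b] * g(a, b) for b in (0, 1)) for a in (0, 1)}
g2 = {b: sum(mu(theta2)[a] * g(a, b) for a in (0, 1)) for b in (0, 1)}

rhs = sum(g1[a] * (mu(theta2)[a] - mu(theta1)[a]) for a in (0, 1)) \
    + sum(g2[b] * (mu(theta2)[b] - mu(theta1)[b]) for b in (0, 1))

assert abs(lhs - rhs) < 1e-9
```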

Remark 3.1. If µi,θ is [C]_{vi}-differentiable, having derivative µ′i,θ, we conclude from Theorem 2.4 that Pg(θ) is differentiable with respect to θ, for g ∈ C~v. Therefore, one could also derive a Lipschitz bound for Pg(θ) by bounding the derivative P′g(θ) which, according to Theorem 3.1, satisfies

P′g(θ) = ∑_{i=1}^{n} ∫ … ∫ g(s1, …, sn) µ′i,θ(dsi) ∏_{j≠i} µj,θ(dsj).

Using a similar reasoning as in Theorem 3.2 one would obtain in (3.8) the following Lipschitz bound:

L′ = ∑_{i=1}^{n} sup_{θ∈[θ1,θ2]} ( ‖µ′i,θ‖_{vi} ∏_{j≠i} ‖µj,θ‖_{vj} )

which, in general, is less accurate (larger) than L∗ defined in (3.7).

Corollary 3.1. Under the conditions put forward in Theorem 3.2, if for each 1 ≤ i ≤ n, µi,θ = µθ and vi = v, then

L∗ = L^v_µ ∑_{k=1}^{n} ‖µθ2‖_v^{k−1} ‖µθ1‖_v^{n−k},   M∗ = M^v_µ ∑_{k=1}^{n} ‖µθ2‖_v^{k−1} ‖µθ1‖_v^{n−k}.

A Simple Application from Finance

Let us consider the following simple example from finance. Assume that an investor is purchasing S > 0 units worth of stock each month for a number n ≥ 1 months in a row. If we denote by Xk the spot price per share in month k, for 1 ≤ k ≤ n, then the amount purchased in month k equals S/Xk. Hence, the average price per share Xa he or she

pays over the n months is obtained by dividing the total amount of wealth spent by the total number of shares purchased; in formula:

Xa = n·S / (S/X1 + … + S/Xn) = n / (1/X1 + … + 1/Xn),

i.e., the average price is just the harmonic mean of the spot prices X1, …, Xn.

Let us fix n ≥ 1 and assume that {Xi : 1 ≤ i ≤ n} are i.i.d. random variables with distribution µθ depending on some parameter θ. One is interested in studying the sensitivity of the expected average price with respect to θ, i.e., to obtain a bound for the perturbation

∆p(θ1, θ2) := |Eθ2[Xa] − Eθ1[Xa]|,

for some θ1, θ2 ∈ Θ, such that θ1 < θ2. To this end, note that the expected average price can be written as

Eθ[Xa] = Ph(θ) = ∫ … ∫ h(x1, …, xn) µθ(dx1) … µθ(dxn), (3.12)

where h is defined as

∀x1, …, xn > 0 : h(x1, …, xn) = n / (1/x1 + … + 1/xn).

Letting v ∈ C+((0,∞)), v(x) = ⁿ√x, it holds that

∀x1, …, xn > 0 : h(x1, …, xn) ≤ ⁿ√(x1 · … · xn) = ~v(x1, …, xn).

Therefore, since h ≥ 0, by Theorem 3.2 (iii), one concludes that Ph(θ) is Lipschitz continuous with respect to θ, provided that µθ is [C]v-differentiable on Θ, and by Corollary 3.1 it follows that a Lipschitz bound is given by

M∗ = M^v_µ ∑_{k=1}^{n} ‖µθ2‖_v^{k−1} ‖µθ1‖_v^{n−k}. (3.13)

More specifically, noting that ‖h‖~v = 1, we conclude that

∆p(θ1, θ2) ≤ M∗(θ2 − θ1).

Example 3.2. If, for instance, µθ is the exponential distribution with rate θ (introduced in Example 2.5) then we have

∀θ ∈ Θ : ‖µθ‖v = ∫₀^∞ ⁿ√x · θe^{−θx} dx = (1/ⁿ√θ) Γ((n + 1)/n),

where Γ denotes the usual Gamma function. Therefore, in accordance with Example 3.1, we obtain the following Lipschitz bound in (3.13):

M∗ = ( (1/e + (1/n) γ((n+1)/n, 1)) / ⁿ√(θ1^{n+1}) ) · ( (θ2 − θ1) / (ⁿ√θ2 − ⁿ√θ1) ) · ( Γ((n+1)/n) / ⁿ√(θ1θ2) )^{n−1}.

3.2.2 Bounds on Perturbations for Markov Chains

Throughout this section we aim to derive bounds on perturbations for Markov chains. For practical reasons we consider homogeneous Markov chains which are generated by transition kernels. Eventually, we illustrate the results with an application to the sequence of waiting times in the G/G/1 queue. To this end, we briefly present the connection between Markov chains and Markov operators and show how the concept of weak differentiation extends to transition kernels, providing the means of deriving Lipschitz bounds.

Markov Chains Generated By Markov Operators

Recall that a transition kernel on S is a mapping Q : S × S → R satisfying

(i) ∀A ∈ S, the mapping Q(·, A) is measurable,

(ii) ∀s ∈ S, Q(s, ·) ∈ M(S).

If Q(s, ·) ∈ M1, for all s ∈ S, we call Q a Markov operator. A transition kernel Q can be identified with a linear operator (denoted also by Q) on the set of measurable mappings on S defined as

∀s ∈ S : (Qf)(s) = ∫ f(x) Q(s, dx),

for all measurable f for which the right-hand side integral makes sense. Note that one can recover the transition kernel Q from the operator Q, as follows:

∀s ∈ S, A ∈ S : Q(s, A) = (Q I_A)(s).

For v ∈ C+ we introduce the v-norm of Q as follows:

‖Q‖v = sup_{‖f‖v≤1} ‖Qf‖v = sup_{‖f‖v≤1} sup_{s∈S} |(Qf)(s)| / v(s), (3.14)

where the above supremum is taken with respect to f ∈ D. Note that we have

∀f ∈ [F]v : ‖Qf‖v ≤ ‖Q‖v · ‖f‖v. (3.15)

In particular, if ‖Q‖v < ∞, then Q maps [F]v into itself, i.e.,

f ∈ [F]v ⇒ Qf ∈ [F]v,

and note that, in general, such an implication does not hold true for C.

Remark 3.2. In general, determining the v-norm ‖Q‖v of a transition kernel Q is not an easy task since a similar method as in the case of measures is not appropriate. For instance, unlike in the case of measures, there is no straightforward way to show that the supremum in (3.14) is attained for a particular f. Consequently, the value ‖Q‖v may depend on the choice of D.

However, if Q defines a positive operator (for instance, if Q is a Markov operator), i.e.,

∀s ∈ S : Q(s, ·) ∈ M+(S),

by monotonicity of the integral we obtain

‖Q‖v = sup_{‖f‖v≤1} ‖ ∫ f(x) Q(·, dx) ‖v = ‖ ∫ v(x) Q(·, dx) ‖v = sup_{s∈S} ( ∫ v(x) Q(s, dx) ) / v(s).
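On a finite state space the above expression reduces to a maximum over states; a minimal sketch (the kernel and weight function below are ours):

```python
import math

def v_norm(Q, v):
    # For a positive kernel on a finite state space:
    # ||Q||_v = max_s (sum_x v(x) * Q(s, x)) / v(s).
    return max(sum(vx * q for vx, q in zip(v, row)) / vs for row, vs in zip(Q, v))

Q = [[0.9, 0.1],
     [0.3, 0.7]]          # a two-state Markov transition matrix
v = [1.0, math.e]         # v = exp(alpha * state) on states {0, 1}, with alpha = 1
norm = v_norm(Q, v)
assert abs(norm - (0.9 + 0.1 * math.e)) < 1e-12
```

Note that for a Markov operator and v ≡ 1 the row sums are 1, so ‖Q‖v = 1, as expected.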

For general Q we can only show that the v-norm is bounded, i.e.,

‖Q‖v ≤ sup_{s∈S} ( ∫ v(x) |Q(s, dx)| ) / v(s),

where we denote by |Q(s, ·)| the variation of the measure Q(s, ·).

If Q1, Q2 are transition kernels on S we define the composition Q2Q1 as follows:

∀s ∈ S, ∀A ∈ S : (Q2Q1)(s, A) = ∫ Q2(x, A) Q1(s, dx).

It is immediate that Q2Q1 is itself a transition kernel on S; if Q1, Q2 are Markov operators, so is their composition Q2Q1, and the induced operator Q2Q1 is given by

(Q2Q1f)(s) = ∫ f(y) (Q2Q1)(s, dy) = ∫∫ f(y) Q2(x, dy) Q1(s, dx).

One can easily check that Q2Q1f = Q2(Q1f) and according to (3.15) we have

∀f ∈ F : ‖Q2Q1f‖v ≤ ‖Q2‖v · ‖Q1f‖v ≤ ‖Q2‖v‖Q1‖v‖f‖v.

Taking in the above inequality the supremum with respect to ‖f‖v ≤ 1 yields

‖Q2Q1‖v ≤ ‖Q2‖v · ‖Q1‖v.

Moreover, one can iterate the composition of kernels. By convention, for an arbitrary transition kernel Q we define Q⁰ as the identity operator¹ and for n ≥ 1 we define the “nth power” of Q as Qⁿ := Qn ⋯ Q1, where Qi = Q, for each 1 ≤ i ≤ n. Then it holds that ‖Qⁿ‖v ≤ ‖Q‖_v^n, for each n ≥ 0.

We say that the Markov chain {Zn : n ≥ 0} is generated by the Markov operator Q if for all n ≥ 0 and all measurable f it holds that

E[f(Zn+1) | Zn] = (Qf)(Zn),

where the expression on the left-hand side denotes the conditional expectation of f(Zn+1) with respect to Zn. It turns out that, from a probabilistic point of view, the Markov chain {Zn : n ≥ 0} is completely determined by the operator Q and the distribution of Z0, which will be called the initial distribution and denoted by χ0. Indeed, one can show inductively that for all n ≥ 0 and measurable f we have

E[f(Zn) | Z0] = (Qⁿf)(Z0),

¹ The identity operator corresponds to the Dirac transition kernel 1(x, A) = δx(A). It follows that 1f = f, for all measurable f, and ‖1‖v = 1, for any v ∈ C+.

which, by integration with respect to the initial distribution, yields

E[f(Zn)] = E[(Qⁿf)(Z0)] = ∫ (Qⁿf)(s) χ0(ds). (3.16)

Therefore, if for n ≥ 0 we denote by χn the distribution of Zn, it follows that

∀n ≥ 0, A ∈ S : P{Zn ∈ A} = χn(A) = ∫ (Qⁿ I_A)(s) χ0(ds). (3.17)

Example 3.3. Recall the G/G/1 queue described in Section 2.5.2. From Lindley’s recursion we conclude that, for all measurable f, it holds that

E[f(Wn+1) | Wn] = E[f(max{Wn + Sn − Tn+1, 0}) | Wn],

i.e., the Markov operator generating the sequence of waiting times satisfies

∀x ≥ 0, A ∈ S : Q(x, A) = ∫∫ I_A((x + s − t)⁺) η(dt) µ(ds),

or in functional operator form

∀x ≥ 0 : (Qf)(x) = ∫∫ f((x + s − t)⁺) η(dt) µ(ds) = E[f((x + S − T)⁺)], (3.18)

where S and T are independent random variables distributed according to µ and η, respectively. Indeed, one can check using Lemma E.2 (see the Appendix) that

∀n ≥ 1 : E[f(Wn+1) | Wn] = (Qf)(Wn),

for each measurable f for which E[f(Wn+1)] exists.

Furthermore, let v(x) = e^{αx}, for some α ≥ 0. Then, we have

‖Q‖v = sup_{x≥0} e^{−αx} · E[e^{α(x+S−T)⁺}] = sup_{x≥0} E[e^{α[(x+S−T)⁺−x]}];

see Remark 3.2. Since the mapping x ↦ α[(x + S − T)⁺ − x] is non-increasing on [0,∞), a simple application of the Dominated Convergence Theorem yields

‖Q‖v = E[e^{α sup_{x≥0}[(x+S−T)⁺−x]}] = E[e^{α(S−T)⁺}],

provided that the right-hand side expectation above is finite.
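The identity ‖Q‖v = E[e^{α(S−T)⁺}] can be checked by Monte Carlo for, say, exponential service and inter-arrival times, where the expectation is also available in closed form by memorylessness (an illustrative sketch; all names are ours):

```python
import math
import random

def norm_Q_mm1(alpha, mu, lam, samples=100000, seed=42):
    # Monte Carlo estimate of E[exp(alpha * max(S - T, 0))],
    # S ~ Exp(mu) (service) and T ~ Exp(lam) (inter-arrival), independent.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += math.exp(alpha * max(rng.expovariate(mu) - rng.expovariate(lam), 0.0))
    return total / samples

alpha, mu, lam = 0.5, 2.0, 1.0
# Closed form via memorylessness: P(S <= T) + P(S > T) * mu/(mu - alpha).
exact = mu / (mu + lam) + (lam / (lam + mu)) * mu / (mu - alpha)
assert abs(norm_Q_mm1(alpha, mu, lam) - exact) < 0.02
```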

Let {Qθ : θ ∈ Θ} be a family of Markov operators on S, for some Θ ⊂ R. We say that Qθ is weakly [D]v-differentiable if there exists a transition kernel Q′θ such that

∀s ∈ S, f ∈ [D]v : (d/dθ)(Qθf)(s) = (Q′θf)(s).

Let Eθ be an expectation operator under which {Zn : n ≥ 0} is a Markov chain generated by the Markov operator Qθ, i.e., {Zn : n ≥ 0} is a Markov chain satisfying Zn ∼ χ^n_θ, for

n ≥ 0, where χ^n_θ is defined as in (3.17), for Q = Qθ. In addition, we assume that the initial distribution χ0 is independent of θ, i.e., the expression

Eθ[f(Z0)] = ∫ f(s) χ0(ds)

is constant² with respect to θ for any measurable f.

We now address the following problem: we aim to establish bounds for the expression

∆^{Z0}_{n,f}(θ1, θ2) := |Eθ2[f(Zn)] − Eθ1[f(Zn)]|, (3.19)

for arbitrary θ1, θ2 ∈ Θ, such that θ1 < θ2, n ≥ 1 and f for which the right-hand side is finite. The following result provides a bound for ∆^{Z0}_{n,f}(θ1, θ2), assuming weak differentiability of Qθ.

Theorem 3.3. Let {Qθ : θ ∈ Θ} be a family of Markov operators on S, for some Θ ⊂ R, such that Qθ is weakly [D]v-differentiable for each θ ∈ Θ. Then, if {Zn : n ≥ 0} is a Markov chain generated by the operator Qθ, it holds that

∀f ∈ [D]v : ∆^{Z0}_{n,f}(θ1, θ2) ≤ Cn L ‖f‖v (θ2 − θ1) E[v(Z0)], (3.20)

where³

Cn = ∑_{k=1}^{n} ‖Qθ2‖_v^{n−k} ‖Qθ1‖_v^{k−1},   L = sup_{θ∈[θ1,θ2]} ‖Q′θ‖v.

Proof. Taking (3.16) into account we conclude that

∆^{Z0}_{n,f}(θ1, θ2) = | E[(Q^n_{θ2} f)(Z0)] − E[(Q^n_{θ1} f)(Z0)] |.

Consequently, the expression in (3.19) can be bounded as follows:

∆^{Z0}_{n,f}(θ1, θ2) ≤ E[ |(Q^n_{θ2} f)(Z0) − (Q^n_{θ1} f)(Z0)| ] ≤ ‖Q^n_{θ2} − Q^n_{θ1}‖v · ‖f‖v · E[v(Z0)]. (3.21)

Elementary algebraic calculations show that

Q^n_{θ2} − Q^n_{θ1} = ∑_{k=1}^{n} Q^{n−k}_{θ2} (Qθ2 − Qθ1) Q^{k−1}_{θ1}.

Hence, using standard properties of operator norms, we arrive at

‖Q^n_{θ2} − Q^n_{θ1}‖v ≤ ∑_{k=1}^{n} ‖Qθ2‖_v^{n−k} ‖Qθ2 − Qθ1‖v ‖Qθ1‖_v^{k−1}. (3.22)

Since Qθ is [D]v-differentiable on Θ, one can apply the Mean Value Theorem to the mapping θ ↦ (Qθg)(x), which yields

∀x ∈ S, g ∈ [D]v : (Qθ2 g)(x) − (Qθ1 g)(x) = (θ2 − θ1) · (Q′θ g)(x),

for some θ ∈ (θ1, θ2) depending on g and x. Thus, for ‖g‖v ≤ 1 we have

‖(Qθ2 g) − (Qθ1 g)‖v ≤ (θ2 − θ1)‖Q′θ‖v ≤ (θ2 − θ1) sup_{θ∈[θ1,θ2]} ‖Q′θ‖v.

Therefore, taking the supremum with respect to ‖g‖v ≤ 1 yields

‖Qθ2 − Qθ1‖v ≤ (θ2 − θ1) sup_{θ∈[θ1,θ2]} ‖Q′θ‖v,

which, together with (3.21) and (3.22), concludes the proof.

² To illustrate this, we omit the subscript θ, writing E[f(Z0)] instead.
³ It is not crucial to assume that L < ∞ since (3.20) is obviously satisfied by L = ∞.

Application to the Transient Waiting Time

Let us consider the G/G/1 queue as introduced in Section 2.7 and let us assume that the service time distribution µθ depends on some design parameter θ ∈ Θ. Recall from Example 3.3 that the corresponding sequence of waiting times is generated by the operator Qθ, defined as in (3.18), by letting µ = µθ. More specifically, we let S = [0,∞), consider {µθ : θ ∈ Θ} ⊂ M1(S) and denote by Qθ the Markov operator defined as

∀x ≥ 0, f ∈ F : (Qθf)(x) = ∫∫ f((x + s − t)⁺) η(dt) µθ(ds), (3.23)

for all θ ∈ Θ. Let θ1, θ2 ∈ Θ be such that θ1 < θ2. Using Theorem 3.3, we aim to establish bounds for the expression

∆^x_{n,f}(θ1, θ2) = |E^x_{θ2}[f(Wn+1)] − E^x_{θ1}[f(Wn+1)]|, (3.24)

for arbitrary n ≥ 1, x ≥ 0 and f ∈ [C]v, where E^x_θ denotes the expectation operator when W1 = x and the service times follow distribution µθ. To do so, let us consider the family of Markov operators {Qθ : θ ∈ Θ} introduced in (3.23) and let Zn := Wn+1, for each n ≥ 0, i.e., χ0 = δx, in Theorem 3.3.

To apply Theorem 3.3 one has to investigate weak differentiability of Qθ. A formal differentiation of Qθ in (3.23) with respect to θ yields

(d/dθ)(Qθf)(x) = ∫∫ f((x + s − t)⁺) η(dt) µ′θ(ds). (3.25)

It turns out that weak differentiability of Qθ is related to that of µθ. This relation will be established by our next result. Specifically, we present a class of mappings v for which Cv-differentiability of µθ implies that of Qθ.

Lemma 3.1. Let (D, v) be a Banach base on S, µ∗ : Θ → M1_v and let Qθ be defined as in (3.23). If:

(i) µθ is weakly [D]v-differentiable,

(ii) for each x ≥ 0 the mapping v satisfies

sup_{s≥0} ( ∫ v((x + s − t)⁺) η(dt) ) / v(s) < ∞,

then Qθ is weakly [D]v-differentiable and (3.25) holds true, i.e.,

(Q′θf)(x) = ∫∫ f((x + s − t)⁺) η(dt) µ′θ(ds).

Proof. For s, x ≥ 0 and f ∈ F let

Hf(s, x) = ∫ f((x + s − t)⁺) η(dt).

From (i) we conclude that it suffices to show that Hf(·, x) ∈ [D]v, for all f ∈ [D]v and x ≥ 0. Indeed, differentiating (3.23) with respect to θ yields

∀x ≥ 0, f ∈ Cv : (d/dθ)(Qθf)(x) = (d/dθ) ∫ Hf(s, x) µθ(ds) = ∫ Hf(s, x) µ′θ(ds),

where the second equality follows from (i); this concludes (3.25).

which concludes (3.25).Condition (ii) essentially says that ‖Hf (·, x)‖v < ∞, for all x ≥ 0, provided that

‖f‖v < ∞. It follows that the implication

∀x ≥ 0 : f ∈ [D]v ⇒ Hf (·, x) ∈ [D]v (3.26)

holds true for D = F . In order to show that (3.26) holds true for D = C as well, onehas to check that for each x ≥ 0 it holds that Hf (·, x) is continuous provided that f iscontinuous. Indeed, let us assume that f is continuous, let s ≥ 0 be fixed and ε > 0.Since continuity of f implies uniform continuity on each compact set (see, e.g., [53]) itfollows that there exists some ζε > 0 such that for each s1, s2 ∈ [0, x + s + 1] it holds that

|s2 − s1| < ζε ⇒ |f(s2)− f(s1)| < ε.

Therefore, it follows that for any x ≥ 0 and |r| < min1, ζε we have

|Hf (s + r, x)−Hf (s, x)| =

∣∣∣∣∫

f ((x + s + r − t)+)− f ((x + s− t)+) η(dt)

∣∣∣∣

≤∫|f ((x + s + r − t)+)− f ((x + s− t)+)| η(dt)

≤∫

ε I[0,x+s+1](t)η(dt) = ε η([0, x + s + 1]),

where we used the fact that for t ≥ x + s + 1 and |r| < 1 we have

(x + s + r − t)+ = (x + s− t)+ = 0.

Since ε was arbitrary chosen it follows that Hf (·, x) is continuous which concludes theproof.

Let v(x) = e^{αx}, for some α ≥ 0. Since for s, t, x ≥ 0 it holds that

e^{α(x+s−t)⁺} ≤ e^{α(x+s)},

we obtain

∀x ≥ 0 : sup_{s≥0} ( ∫ e^{α(x+s−t)⁺} η(dt) ) / e^{αs} ≤ e^{αx} < ∞.

Provided that µθ is weakly Cv-differentiable, Lemma 3.1 applies and we conclude that Qθ is weakly Cv-differentiable as well. Moreover, provided that ‖f‖v ≤ 1, it holds that (see Remark 3.2)

‖Q′θf‖v ≤ sup_{x≥0} ∫∫ e^{α[(x+s−t)⁺−x]} |µ′θ|(ds) η(dt) = ∫∫ e^{α(s−t)⁺} |µ′θ|(ds) η(dt).

Finally, taking the supremum with respect to ‖f‖v ≤ 1 yields

‖Q′θ‖v ≤ cθ Eθ[ e^{α(S⁺−T)⁺} + e^{α(S⁻−T)⁺} ], (3.27)

where Eθ is an expectation operator consistent with (S±, T) ∼ µ±θ × η and µ′θ = (cθ, µ⁺θ, µ⁻θ).

Example 3.4. Let us consider an M/U/1 queue where service times have uniform distribution ψθ on [0, θ) and inter-arrival times are exponentially distributed with rate λ, i.e., the corresponding Markov operator Qθ is given by

∀x ≥ 0, f ∈ F : (Qθf)(x) = (1/θ) ∫₀^θ ∫₀^∞ f((x + s − t)⁺) λe^{−λt} dt ds.

Then, according to Example 3.3, for v(x) = e^{αx}, for some α ≥ 0, it holds that

∀α, λ, θ : ‖Qθ‖v = Eθ[e^{α(S−T)⁺}] = ( λ²(e^{αθ} − 1) + α²(1 − e^{−λθ}) ) / ( αλ(α + λ)θ ).

Similarly, according to (3.27) we conclude that

∀α, λ, θ : ‖Q′θ‖v ≤ ( λ²((1 + αθ)e^{αθ} − 1) + α²(1 − (1 + λθ)e^{−λθ}) ) / ( αλ(α + λ)θ )

and a bound for the perturbation in (3.24) can be obtained according to (3.20).

To illustrate the above findings we let α = λ and we obtain

∀λ, θ : ‖Qθ‖v = sinh(λθ)/(λθ),   ‖Q′θ‖v ≤ ((1 + λθ)/(λθ)) sinh(λθ),

where sinh denotes the hyperbolic sine function. Consequently, we have

Cn = ∑_{k=1}^{n} ( sinh(λθ2)/(λθ2) )^{n−k} ( sinh(λθ1)/(λθ1) )^{k−1},   L = ((1 + λθ2)/(λθ2)) sinh(λθ2),

where we use the fact that the function x ↦ ((1 + x)/x) sinh(x) is non-decreasing. Substituting the above constants in (3.20) yields the following bound for the expression in (3.24):

∆^x_{n,f}(θ1, θ2) / (θ2 − θ1) ≤ ‖f‖v (1 + λθ2) e^{λx} ∑_{k=1}^{n} ( sinh(λθ2)/(λθ2) )^{k} ( sinh(λθ1)/(λθ1) )^{n−k}.

3.3 Bounds on Perturbations for the Steady-State Waiting Time

Throughout this section we extend our results on the transient waiting times in the G/G/1 queue to stationary waiting times. More specifically, we show that the stationary distribution in a G/G/1 queue governed by service time distribution µθ and inter-arrival time distribution η is strongly Lipschitz continuous with respect to θ, provided that µθ is weakly [C]v-differentiable for a certain class of mappings v ∈ C+(R).

3.3.1 Strong Stability of the Steady-State Waiting Time

A straightforward approach would be the one presented in Theorem 3.3 in Section 3.2.2: letting n → ∞ in (3.20), provided that the sequence of waiting times {Wn : n ≥ 1} is weakly [C]v-convergent to the stationary waiting time W. Unfortunately, such an approach is to no avail since the constant Cn in (3.20) is unbounded with respect to n. This stems from the fact that

∀α ≥ 0, θ ∈ Θ : ‖Qθ‖v = Eθ[e^{α(S−T)⁺}] ≥ 1;

see Example 3.3. Therefore, a sharper approach is needed.

A first observation is that for v(x) = e^{αx}, with α > 0, the distribution of the (n + 1)st waiting time is [C]v-differentiable, for all n ≥ 1, provided that µθ is [C]v-differentiable; see Theorem 2.7. Moreover, as shown by (2.52), the weak derivative can be expressed by summing up differences between n pairs of parallel processes which, under the stability condition, couple in finite time, almost surely; see Figure 2.2. Therefore, intuitively, an early perturbation in the service time distribution counts less after n steps than a late perturbation, provided that the process is stable. In other words, the “magnitude” of the perturbation will decrease with respect to n and eventually will vanish as n tends to infinity. This is formalized in the following result.

Lemma 3.2. Let v(x) = e^{αx}, for some α ≥ 0, such that µθ is [C]v-differentiable. If µ′θ = (cθ, µ⁺θ, µ⁻θ) then for all f ∈ [C]v and 1 ≤ k ≤ n it holds that

| Eθ[f(W^{k+}_{n+1}) − f(W^{k−}_{n+1})] | ≤ 2‖f‖v Eθ[ v(W^{k∗}_{n+1}) · I{W^{k∗}_{k+1} > 0, …, W^{k∗}_{n+1} > 0} ], (3.28)

where the W^{k±}_i are defined in (2.53) and {W^{k∗}_i : i ≥ 1} denotes the sequence of waiting times in a modified queue, where the kth service time Sk is replaced by max{S⁺_k, S⁻_k}; see Section 2.5.2.

Proof. First, note that the perturbation f(W^{k+}_{n+1}) − f(W^{k−}_{n+1}) can only survive until the sequence {W^{k∗}_i : i ≥ k + 1} reaches 0. Indeed, it is immediate that W^{k∗}_i = max{W^{k+}_i, W^{k−}_i}, for all i ≥ k + 1. Consequently, if W^{k∗}_i = 0 then W^{k+}_j = W^{k−}_j, for all j ≥ i; see Figure 2.2. In formula:

{f(W^{k+}_{n+1}) − f(W^{k−}_{n+1}) ≠ 0} ⊂ ⋂_{i=k+1}^{n+1} {W^{k∗}_i > 0}. (3.29)

Furthermore, the fact that v is non-decreasing implies that

|f(W^{k+}_{n+1}) − f(W^{k−}_{n+1})| ≤ |f(W^{k+}_{n+1})| + |f(W^{k−}_{n+1})| ≤ 2‖f‖v · v(W^{k∗}_{n+1}),

which, together with (3.29), proves the claim.

Lemma 3.2 establishes a bound on the effect of the perturbation of the kth service time distribution at time n. In the following, we show that the right-hand side in (3.28) is bounded by a geometric sequence. To this end, we consider the following operators:

∀x ≥ 0, f ∈ F : (Pθf)(x) := Eθ[ f(x + S − T) · I{x + S > T} ],
(P∗θ f)(x) := Eθ[ f(x + S∗ − T) · I{x + S∗ > T} ],

where S± are S-measurable samples (see Remark 2.6) of µ±θ, respectively, and we define⁴ S∗ := max{S⁺, S⁻}. Note that Pθ is different from Qθ in (3.23). Indeed, while Qθ in (3.23) denotes the transition kernel generating the sequence of waiting times, Pθ denotes the corresponding taboo kernel, i.e., a transition kernel which avoids a certain subset of the state-space (in this case the subset is {0}). The following result will provide a bound for the effect of the perturbation of the kth service time distribution in terms of ‖Pθ‖v, ‖P∗θ‖v, ‖f‖v and Eθ[v(Wk)].

Lemma 3.3. Under the conditions put forward in Lemma 3.2 it holds that

| Eθ[f(W^{k+}_{n+1}) − f(W^{k−}_{n+1})] | ≤ 2‖f‖v ‖P∗θ‖v ‖Pθ‖_v^{n−k} Eθ[v(Wk)].

Proof. For ease of notation, for i ≥ k + 1, we set

Yi := v(W^{k∗}_i) · I{W^{k∗}_{k+1} > 0, …, W^{k∗}_i > 0} = v(W^{k∗}_i) · I{W^{k∗}_{k+1} > 0} · … · I{W^{k∗}_i > 0}.

Using basic properties of conditional expectations (see Section D in the Appendix) one can show that

Eθ[Yi+1 | W^{k∗}_i] = Eθ[ v(W^{k∗}_{i+1}) · I{W^{k∗}_{i+1} > 0} | W^{k∗}_i ] · I{W^{k∗}_{k+1} > 0, …, W^{k∗}_i > 0}
= (Pθv)(W^{k∗}_i) · I{W^{k∗}_{k+1} > 0, …, W^{k∗}_i > 0}
≤ ‖Pθ‖v · v(W^{k∗}_i) · I{W^{k∗}_{k+1} > 0, …, W^{k∗}_i > 0} = ‖Pθ‖v Yi.

Consequently, for n ≥ k we have

Eθ[Yn+1 | Wk] = Eθ[ Eθ[Yn+1 | W^{k∗}_n] | Wk ] ≤ ‖Pθ‖v Eθ[Yn | Wk]

and it follows by finite induction that

Eθ[Yn+1 | Wk] ≤ ‖Pθ‖_v^{n−k} Eθ[Yk+1 | Wk]. (3.30)

Furthermore, we have

Eθ[Yk+1 | Wk] = (P∗θ v)(Wk) ≤ ‖P∗θ‖v · v(Wk). (3.31)

From (3.30) together with (3.31) one concludes that

Eθ[ v(W^{k∗}_{n+1}) · I{W^{k∗}_{k+1} > 0, …, W^{k∗}_{n+1} > 0} | Wk ] ≤ ‖P∗θ‖v ‖Pθ‖_v^{n−k} v(Wk).

Taking now the expectation in the above inequality yields

Eθ[ v(W^{k∗}_{n+1}) · I{W^{k∗}_{k+1} > 0, …, W^{k∗}_{n+1} > 0} ] ≤ ‖P∗θ‖v ‖Pθ‖_v^{n−k} Eθ[v(Wk)]. (3.32)

Therefore, Lemma 3.2 concludes the proof.

⁴ Note that the distribution of S∗ depends on the joint distribution of the pair (S⁺, S⁻). While this is not directly relevant here, in some numerical applications this fact should not be overlooked.

Remark 3.3. Recall that the G/G/1 queue is stable if

Eθ[S − T] = ∫ s µθ(ds) − ∫ t η(dt) < 0. (3.33)

If the queue is stable, it is known that the sequence {Wn : n ≥ 1} converges in distribution to the steady-state waiting time W. Moreover, if we denote by W̄ the maximal waiting time in the queue, i.e.,

W̄ = sup{Wn : n ≥ 1},

then W̄ is almost surely finite and has the same distribution as W. For more details on stability of queues see, e.g., [43]. In the following, we denote by Θs ⊂ Θ the stability subset of Θ, i.e.,

Θs := {θ ∈ Θ : Eθ[S − T] < 0}.

Let vα(x) = e^{αx}, for some α ≥ 0. Then we have

‖Pθ‖vα = sup_{x≥0} Eθ[ e^{α(S−T)} · I{x + S > T} ]

and by the Dominated Convergence Theorem we obtain (see Example 3.3)

‖Pθ‖vα = Eθ[e^{α(S−T)}],

provided that the expectation in the right-hand side is finite.

Remark 3.4. In general, requiring that ‖Pθ‖vα be finite is a quite restrictive condition since, in particular, it requires that all moments of S exist. A sufficient condition for ‖Pθ‖vα < ∞ is that µθ has a sub-exponential tail, for each θ, i.e.,

(C) : ∃γ, β, M > 0 : Pθ{S > x} ≤ γe^{−βx}, ∀x ≥ M.

Indeed, let us assume that condition (C) holds true for some θ ∈ Θ. Since e^{αS} is a strictly positive random variable it holds that

Eθ[e^{αS}] = ∫₀^∞ Pθ{e^{αS} > x} dx ≤ e^{αM} + γ ∫_{e^{αM}}^∞ e^{−(β/α) ln x} dx.

Hence, we conclude that

∀α < β : ‖Pθ‖vα = Eθ[e^{α(S−T)}] ≤ Eθ[e^{αS}] ≤ e^{αM} ( 1 + γα e^{−βM} / (β − α) ) < ∞.

The key observation is that, under the stability condition (3.33), we have ‖Pθ‖vα < 1, for some α > 0, which means that the bound in (3.28) decreases at a geometric rate. More specifically, given θ1, θ2 ∈ Θs, such that θ1 < θ2, there exists a sufficiently small α > 0 such that ‖Pθ‖vα < 1, uniformly in θ ∈ [θ1, θ2]. The precise statement is as follows.

Lemma 3.4. For arbitrary α ≥ 0 let vα(x) = e^{αx}, for all x ≥ 0, and let θ1, θ2 ∈ Θs be such that θ1 < θ2. If µθ is weakly [C]vα∗-continuous on [θ1, θ2], for some α∗ > 0, then there exists ᾱ > 0 such that for each α ∈ (0, ᾱ) it holds that

sup_{θ∈[θ1,θ2]} ‖Pθ‖vα < 1. (3.34)

Proof. Let F : [0,∞) × [θ1, θ2] → R ∪ {∞} be defined as

∀α, θ : F(α, θ) = ‖Pθ‖vα = Eθ[e^{α(S−T)}]. (3.35)

Note that, by hypothesis, we have F(α, θ) < ∞ for all α ∈ [0, α∗] and θ ∈ [θ1, θ2]. Moreover, for α ∈ [0, α∗) and θ ∈ [θ1, θ2] we have

∀n ≥ 0 : sup_{y∈R} |y|^n e^{(α−α∗)y} = [ 1 / ((α∗ − α)e) ]^n.

Therefore, it follows that⁵

∀n ≥ 0, y ∈ R : |y|^n e^{αy} ≤ [ 1 / ((α∗ − α)e) ]^n e^{α∗y} (3.36)

and by letting y = S − T in (3.36) and taking expected values, we arrive at

Eθ[ |S − T|^n e^{α(S−T)} ] ≤ [ 1 / ((α∗ − α)e) ]^n Eθ[e^{α∗(S−T)}] < ∞.

On the other hand, for θ ∈ [θ1, θ2] we have F(0, θ) = 1, lim_{α↑∞} F(α, θ) = ∞ and

lim_{α↓0} (d/dα)F(α, θ) = lim_{α↓0} Eθ[(S − T)e^{α(S−T)}] = Eθ[S − T] < 0.

Moreover, the second derivative with respect to α satisfies

∀α ∈ (0, α∗), θ ∈ [θ1, θ2] : (d²/dα²)F(α, θ) = Eθ[(S − T)²e^{α(S−T)}] > 0.

Hence, we conclude that F is strictly convex in α and consequently for each θ ∈ [θ1, θ2] there exists a unique α > 0 satisfying F(α, θ) = 1. If we denote this value by αθ then

∀α ∈ (0, αθ) : F(α, θ) < 1.

Continuity⁶ of F in both α and θ implies continuity of the implicit function θ ↦ αθ; see, e.g., [38]. Therefore, we have inf{αθ : θ ∈ [θ1, θ2]} > 0. Letting

ᾱ = min{ α∗, inf{αθ : θ ∈ [θ1, θ2]} }

concludes the proof.
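In practice αθ can be computed by bisection, exploiting that F is strictly convex with F(0, θ) = 1 and negative initial slope. For exponential S ∼ Exp(µ) and T ∼ Exp(λ) with µ > λ the root is even explicit, αθ = µ − λ, which makes a convenient test case (an illustrative sketch; names are ours):

```python
def F(alpha, mu, lam):
    # F(alpha) = E[exp(alpha*(S - T))] = (mu/(mu - alpha)) * (lam/(lam + alpha)).
    return (mu / (mu - alpha)) * (lam / (lam + alpha))

def alpha_theta(mu, lam, tol=1e-10):
    # Bisection for the unique alpha > 0 with F(alpha) = 1: F < 1 on (0, root)
    # and F > 1 beyond it, blowing up as alpha -> mu.  Requires mu > lam (stability).
    lo, hi = 0.0, mu - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid, mu, lam) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, lam = 3.0, 1.0
assert abs(alpha_theta(mu, lam) - (mu - lam)) < 1e-6   # closed-form root mu - lam
```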

Now we are able to state and prove the main result of this section. The precisestatement is as follows.

⁵ Note that, if v(y) = e^{α∗y} and g(y) = |y|^n e^{αy}, for y ∈ R, then g ∈ [C]v and ‖g‖v = [(α∗ − α)e]^{−n}. Consequently, the inequality in (3.36) reads |g(y)| ≤ ‖g‖v · v(y).

⁶ Note that, if α < α∗ then [C]vα∗-continuity implies [C]vα-continuity; see Remark 1.1.

Theorem 3.4. Let vα(x) = e^{αx}, for α ≥ 0, and let θ1, θ2 ∈ Θs be such that θ1 < θ2. If µθ is [C]vα∗-continuous on Θ, for some α∗ > 0, then for each α ∈ (0, ᾱ), where ᾱ satisfies (3.34) (see Lemma 3.4), we have:

(i) For each θ ∈ [θ1, θ2] the distribution of Wn is [C]vα-convergent to its stationary distribution, i.e.,

∀f ∈ [C]vα : lim_{n→∞} Eθ[f(Wn)] = Eθ[f(W)].

(ii) If, in addition, µθ is weakly [C]vα-differentiable on Θ, for some α ∈ (0, ᾱ), then the stationary distribution of the sequence {Wn : n ≥ 1} is strongly vα-norm Lipschitz continuous, i.e., there exists Kα(θ1, θ2) > 0 such that

∀f ∈ [C]vα : |Eθ2[f(W)] − Eθ1[f(W)]| / (θ2 − θ1) ≤ ‖f‖vα Kα(θ1, θ2). (3.37)

Moreover, the constant Kα(θ1, θ2) can be chosen as

Kα(θ1, θ2) = 2 sup_{θ∈[θ1,θ2]} cθ‖P∗θ‖vα / (1 − ‖Pθ‖vα)² < ∞. (3.38)

Proof. First, we show that for α ∈ (0, ᾱ) (see Lemma 3.4) and θ ∈ [θ1, θ2] it holds that

sup_{n≥0} Eθ[vα(Wn+1)] ≤ 1 / (1 − ‖Pθ‖vα). (3.39)

Indeed, from Lindley’s recursion we have

∀n ≥ 1 : Eθ[e^{αWn+1}] = Pθ{Wn+1 = 0} + Eθ[e^{αWn+1} · I{Wn+1 > 0}]
≤ 1 + Eθ[e^{α(Wn+S−T)} · I{Wn + S > T}]
≤ 1 + ‖Pθ‖vα Eθ[e^{αWn}]

and from finite induction it follows that

∀n ≥ 0 : Eθ[e^{αWn+1}] ≤ ∑_{k=0}^{n} ‖Pθ‖_{vα}^{k} ≤ 1 / (1 − ‖Pθ‖vα).

Taking the supremum with respect to n ≥ 0 concludes the proof of (3.39).

As explained in Remark 3.3, the distribution of Wn is CB-convergent to the stationary distribution. Then, according to Theorem 1.1 (i), [C]vα-convergence follows from the uniform integrability of the sequence {vα(Wn) : n ≥ 1}. A sufficient condition, according to Lemma 1.1, is the existence of a function ϑ satisfying

sup_{n≥1} Eθ[ϑ(vα(Wn))] < ∞,   lim_{x→∞} ϑ(x)/x = ∞.

Recall that vα(x) = e^{αx}, for some α ∈ (0, ᾱ). Choosing some ε ∈ (0, ᾱ − α), it follows from (3.39) that the function ϑ defined as

∀x ≥ 0 : ϑ(x) = x^{(α+ε)/α},

i.e., ϑ(vα(Wn)) = e^{(α+ε)Wn}, satisfies the desired conditions, which concludes part (i) of the theorem.

Let α ∈ (0, ᾱ). For n ≥ 0 and f ∈ [C]vα the Mean Value Theorem yields

( Eθ2[f(Wn+1)] − Eθ1[f(Wn+1)] ) / (θ2 − θ1) = cθ ∑_{k=1}^{n} Eθ[ f(W^{k+}_{n+1}) − f(W^{k−}_{n+1}) ],

for some θ ∈ [θ1, θ2], depending on f and n, and Lemma 3.3 implies that

|Eθ2[f(Wn+1)] − Eθ1[f(Wn+1)]| / (θ2 − θ1) ≤ 2cθ‖f‖vα‖P∗θ‖vα ∑_{k=1}^{n} ‖Pθ‖_{vα}^{n−k} Eθ[vα(Wk)].

Therefore, taking (3.39) into account we conclude that

|Eθ2[f(Wn+1)] − Eθ1[f(Wn+1)]| / (θ2 − θ1) ≤ ( 2cθ‖f‖vα‖P∗θ‖vα / (1 − ‖Pθ‖vα) ) ∑_{k=1}^{n} ‖Pθ‖_{vα}^{n−k} ≤ 2‖f‖vα cθ‖P∗θ‖vα / (1 − ‖Pθ‖vα)² (3.40)

and taking in (3.40) the supremum with respect to θ ∈ [θ1, θ2] yields

|Eθ2[f(Wn+1)] − Eθ1[f(Wn+1)]| / (θ2 − θ1) ≤ 2‖f‖vα sup_{θ∈[θ1,θ2]} cθ‖P∗θ‖vα / (1 − ‖Pθ‖vα)². (3.41)

Letting now n → ∞ in (3.41) and taking (i) into account concludes (ii), i.e., (3.37) holds true for Kα(θ1, θ2) given by (3.38).

The following result is a direct consequence of Theorem 3.4.

Corollary 3.2. The stationary distribution of the waiting times in a G/G/1 queue with parameter-dependent service time distribution µθ is locally Lipschitz continuous on the stability set Θs, provided that the service time distribution is weakly [C]vα-differentiable on Θs, for some α > 0.

Proof. Let us denote by σθ, for θ ∈ Θs, the stationary distribution of the Markov chain {Wn : n ≥ 1} with respect to the expectation operator Eθ; see Remark 3.3. Since Θs is an open set, for arbitrary θ ∈ Θs we choose θ1, θ2 ∈ Θs such that θ1 < θ < θ2 and apply Theorem 3.4. By taking in (3.37) the supremum with respect to ‖f‖vα ≤ 1 we obtain

‖σθ2 − σθ1‖vα ≤ (θ2 − θ1) Kα(θ1, θ2),

with Kα(θ1, θ2) given by (3.38), which concludes the proof.

3.3.2 Comments and Bound Improvements

This section is intended to illustrate how the results in Section 3.3.1, in particular Theorem 3.4, can be used in practice and what issues have to be taken into account when doing so. In particular, we show how the bounds obtained in Section 3.3.1 can be slightly improved, in order to derive more accurate bounds.

We start by noting that in Theorem 3.4 vα(x) = e^{αx}, for some α > 0 satisfying (3.34); see Lemma 3.4. Examining the proof of Lemma 3.4 it turns out that α has to be small enough to satisfy (3.34). In principle, the largest admissible value of α will decrease as θ2 − θ1 increases. In words, the larger the perturbation of the parameter, the smaller α will be and apparently the fewer performance measures will be in [C]vα. But in fact, this is not the real issue since, by construction, we have α > 0, which means that usual performance measures, such as bounded and continuous mappings and moments, belong to [C]vα, for α satisfying (3.34). There is, however, a trade-off between decreasing α and the quality of the bounds in (3.37), which is related to the vα-norm of f. More specifically, the vα-norm of a typical function will increase as α decreases; for instance, if f is the identity mapping note that

‖f‖vα = 1/(αe).

Therefore, we conclude that, while in principle Theorem 3.4 applies to any typical continuous mapping f, the quality of the bound depends on the vα-norm of f which, in certain situations, can be prohibitively large. Nevertheless, Theorem 3.4 is still a worthwhile theoretical result and, depending on the situation, the bounds can be improved by using particular properties of the performance measure under consideration.

In addition, note that αθ in the proof of Theorem 3.4 is defined as an implicit function of θ which, in practice, makes it quite difficult to calculate exactly. It is, however, worth noting that if {µθ : θ ∈ Θ} is a stochastically monotone family, say increasing, then things become simpler. Indeed, the function F defined by (3.35) is non-decreasing in θ and a simple analysis shows that αθ is non-increasing with respect to θ, which yields

ᾱ = min{α∗, αθ2}.

Moreover, if µ′θ = (cθ, µ+θ, µ−θ) then µ+θ is stochastically larger than µ−θ and one can choose S± ∼ µ±θ such that S+ ≥ S− a.s., in which case

‖P ∗θ‖vα = Eθ[e^{α(S+ − T)}].

Recall that vα(x) = e^{αx}, for some α ≥ 0, and the right-hand side in (3.37) depends on α through vα. By Remark 1.1 it follows that [C]vα-differentiability, for some α > 0, implies [C]vβ-differentiability for any β ∈ (0, α). Therefore, for fixed f, one can obtain a more accurate Lipschitz bound in (3.41) by minimizing the right-hand side in (3.37) with respect to β ∈ (0, α), i.e., to replace Kα(θ1, θ2) in (3.38) by

Lα(θ1, θ2) = 2 inf_{β∈(0,α)} ( ‖f‖vβ sup_{θ∈[θ1,θ2]} cθ ‖P ∗θ‖vβ / (1 − ‖Pθ‖vβ)² ).

We conclude this section with two examples which illustrate the above facts.


Example 3.5. We revisit the M/U/1 queue treated in Example 3.4. Standard computation shows that

∀λ, θ, α : cθ = 1/θ, ‖Pθ‖vα = λ(e^{αθ} − 1)/(αθ(α + λ)), ‖P ∗θ‖vα = λe^{αθ}/(α + λ),

which leads to the following Lipschitz bound in (3.37):

Kα(θ1, θ2) = 2 sup_{θ∈[θ1,θ2]} α²λθ(α + λ)e^{αθ} / [αθ(λ + α) − λ(e^{αθ} − 1)]².

Moreover, for fixed λ > 0, the stability set is given by Θs = [0, 2λ⁻¹) and αθ is the unique solution α > 0 of the equation

λ(e^{αθ} − 1) = αθ(α + λ).

Since µθ is stochastically increasing, in this case, it holds that ᾱ = αθ2. If, for instance, λ = 1 and θ2 = 1 then ᾱ = α1 ≈ 1.7934. If θ2 = 1.8, i.e., a high traffic rate, we have ᾱ ≈ 0.9984, whereas for θ2 = 0.1, i.e., a small traffic rate, it turns out that ᾱ ≈ 1.8768.

For f(x) = x, we have ‖f‖vα = (αe)⁻¹ and we obtain

L = (2/e) · inf_{α∈(0,ᾱ)} sup_{θ∈[θ1,θ2]} αλθ(α + λ)e^{αθ} / [αθ(λ + α) − λ(e^{αθ} − 1)]².
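For readers who want to reproduce such bounds numerically, the threshold equation λ(e^{αθ} − 1) = αθ(α + λ) and the bound Kα(θ1, θ2) above translate into a few lines of code. The following Python sketch is not part of the thesis (function names are ours): it finds the positive root by bisection and takes the supremum over a uniform grid.

```python
import math

def alpha_theta(lam, theta, hi=50.0, tol=1e-10):
    # f(a) = lam*(e^{a*theta} - 1) - a*theta*(a + lam) is negative just above 0
    # in the stable regime (lam*theta/2 < 1) and positive for large a, so the
    # unique positive root can be bracketed and located by bisection.
    f = lambda a: lam * math.expm1(a * theta) - a * theta * (a + lam)
    lo = 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def K_alpha(lam, alpha, th1, th2, grid=400):
    # K_alpha(th1, th2) = 2 sup_{theta in [th1, th2]}
    #   alpha^2 * lam * theta * (alpha + lam) * e^{alpha*theta}
    #     / (alpha*theta*(lam + alpha) - lam*(e^{alpha*theta} - 1))^2,
    # evaluated on a uniform grid; requires alpha < alpha_theta(lam, th2).
    sup = 0.0
    for i in range(grid + 1):
        th = th1 + (th2 - th1) * i / grid
        num = alpha ** 2 * lam * th * (alpha + lam) * math.exp(alpha * th)
        den = (alpha * th * (lam + alpha) - lam * math.expm1(alpha * th)) ** 2
        sup = max(sup, num / den)
    return 2.0 * sup

a_bar = alpha_theta(1.0, 1.0)            # approx. 1.793 for lam = 1, theta_2 = 1
bound = K_alpha(1.0, 0.5 * a_bar, 0.5, 1.0)
```

Bisection is adequate here because, in the stable regime, the equation has a single positive root.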

Things become somewhat easier when considering the M/M/1 case, as the following example shows.

Example 3.6. Let us replace in Example 3.4 µθ by the exponential distribution with rate θ. Then

∀λ, θ, α < θ : cθ = 1/(θe), ‖Pθ‖vα = λθ/((α + λ)(θ − α)), ‖P ∗θ‖vα = (θ/(θ − α))² e^{α/θ},

which leads to the following Lipschitz bound in (3.37):

Kα(θ1, θ2) = 2 sup_{θ∈[θ1,θ2]} θ ((α + λ)/(α(θ − λ − α)))² e^{−(θ−α)/θ}.

In this situation, the stability set is given by Θs = (λ, ∞) and αθ can be found in explicit form as the unique positive solution of the equation

λθ = (α + λ)(θ − α).

It turns out that αθ = θ − λ and, since µθ is stochastically decreasing in this case, we conclude that ᾱ = αθ1 = θ1 − λ.
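As a quick sanity check (ours, not from the thesis), one can verify the closed form αθ = θ − λ and evaluate the resulting M/M/1 bound numerically, for instance in Python:

```python
import math

def K_alpha_mm1(lam, alpha, th1, th2, grid=400):
    # K_alpha(th1, th2) = 2 sup_{theta in [th1, th2]}
    #   theta * ((alpha + lam)/(alpha*(theta - lam - alpha)))^2 * e^{-(theta-alpha)/theta},
    # on a uniform grid; requires lam < th1 <= th2 and 0 < alpha < th1 - lam.
    assert lam < th1 <= th2 and 0.0 < alpha < th1 - lam
    sup = 0.0
    for i in range(grid + 1):
        th = th1 + (th2 - th1) * i / grid
        val = th * ((alpha + lam) / (alpha * (th - lam - alpha))) ** 2 \
              * math.exp(-(th - alpha) / th)
        sup = max(sup, val)
    return 2.0 * sup

# alpha_theta = theta - lam solves lam*theta = (alpha + lam)*(theta - alpha) identically
lam, th1, th2 = 1.0, 2.0, 2.5
alpha = 0.5 * (th1 - lam)      # stay strictly below alpha_{theta_1} = th1 - lam
bound = K_alpha_mm1(lam, alpha, th1, th2)
```

Note that the bound blows up as α approaches θ1 − λ, so in practice one stays strictly inside the admissible interval.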


3.4 Concluding Remarks

This chapter presents an important class of applications of weak differentiation theory. Starting from the observation that the gradient provides relevant information about the local variation of some function, we perform a sensitivity analysis for some common mathematical models, among which Markov chains are maybe the most important ones. In this setting we derive bounds on perturbations for transient performance measures in Section 3.2.2 and, moreover, under stability conditions we extend our analysis to steady-state performances in Section 3.3.

Sensitivity analysis based on weak differentiation has been investigated in [33], and the theory of weak differentiation was applied to study the stability of stationary Markov chains in [27]. In addition, the stability of steady-state performances of a Markov chain has been investigated in [35]. Here, we present a general (unified) approach which applies to virtually any stochastic system defined by a finite family of independent random variables and to a large class of performance measures. Unfortunately, while bounds on perturbations can be easily established by using representations of weak derivatives, the main pitfall of this method is the poor accuracy of the bounds, which stems from the fact that the bound should apply to a highly diversified class of performance measures. Therefore, improving the bounds is conditioned on restricting their range of applicability, and this is a subject of future research.

Another possible direction of research is to establish results regarding weak differentiability of the stationary distribution of stable stochastic processes, in both discrete and continuous time, provided that the theory of weak differentiation can be extended to the latter. For instance, an interesting application would be to investigate weak differentiability of the stationary distribution of one-dimensional diffusions with reflecting barrier(s) with respect to the barrier level(s).

Finally, it is worth noting that the methods presented in this chapter can be applied to study the sensitivity of non-parametric models, that is, to study the influence of replacing an input distribution, say µ, by another one, say η. This can be achieved by considering the parametric family of mixed distributions {µθ : θ ∈ [0, 1]} defined as follows:

∀θ ∈ [0, 1] : µθ := (1 − θ) · µ + θ · η. (3.42)

Obviously, µ0 = µ, µ1 = η and the parameter θ can be seen as a measure of the deviation from the initial distribution µ. It readily follows that the distribution µθ given by (3.42) is [F]v-differentiable, for any v ∈ C+ ∩ L1(µ, η), and its weak derivative satisfies

∀θ ∈ [0, 1] : µ′θ = η − µ.

If, for instance, µ is an exponential distribution and η is a non-exponential distribution having the same mean, i.e.,

∫ s η(ds) = ∫ s µ(ds),

then one can use Theorem 3.4 to evaluate the steady-state effect of deviations from the M/G/1 regime in a stable queue.
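The mixture construction (3.42) is straightforward to use in simulation. The sketch below is our illustration (the choice µ = Exp(1) and η = Uniform(0, 2), both with mean 1, is an assumption, not from the text); it estimates the derivative E_η[g] − E_µ[g] by Monte Carlo:

```python
import random

def mixture_sample(theta, rng):
    # one draw from mu_theta = (1 - theta)*mu + theta*eta,
    # with mu = Exp(1) and eta = Uniform(0, 2) (both have mean 1)
    if rng.random() < theta:
        return rng.uniform(0.0, 2.0)   # eta
    return rng.expovariate(1.0)        # mu

def mixture_derivative(g, n, rng):
    # unbiased Monte Carlo estimator of d/dtheta E_theta[g] = E_eta[g] - E_mu[g];
    # the derivative does not depend on theta since mu_theta is linear in theta
    total = 0.0
    for _ in range(n):
        total += g(rng.uniform(0.0, 2.0)) - g(rng.expovariate(1.0))
    return total / n

rng = random.Random(42)
d = mixture_derivative(lambda x: x * x, 200_000, rng)
# exact value for g(x) = x^2: E_eta[X^2] - E_mu[X^2] = 4/3 - 2 = -2/3
```

Because µ′θ = η − µ is constant in θ, a single estimate quantifies the first-order effect of the model deviation over the whole range θ ∈ [0, 1].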


4. MEASURE-VALUED DIFFERENTIAL CALCULUS

Throughout this chapter we aim to further extend the theory of differentiation for product measures, in order to develop a weak differential calculus, i.e., higher-order differentiation formulas and results on analyticity (read: Taylor series expansions). A first step in this direction has already been made in Section 2.3, where Theorem 2.3 and its extension to finite products (in Theorem 2.4) establish rules of differentiation for the first-order derivative of a product measure. In this chapter we extend these rules to higher-order differentiation.

4.1 Introduction

The starting point of our analysis is the resemblance of Theorem 2.3 to the classical differentiation formula for products of functions. Based on this, it is reasonable to expect that a “Leibnitz-Newton” rule for higher-order derivatives of the product of two measures holds true as well. Such a result will be established and then extended to finite products of measures in Section 4.2.1. As in conventional analysis, the extended Leibnitz-Newton rule, established by Theorem 4.2, will serve as a basis for measure-valued differential calculus. In addition, it provides the theoretical background for a formal differential calculus for a particular class of random objects, to be introduced in Chapter 5.

The similarities between classical and measure-valued differential calculus extend further to analyticity, which is a crucial condition for performing Taylor series expansions. This leads us to introduce and study the concept of weak analyticity in Section 4.2.2. It will turn out that, just like in conventional analysis, products of weakly analytic measures are again weakly analytic. This result will be very important in applications as it provides Taylor series approximations for the performance measures of parameter-dependent stochastic models with weakly analytic input distributions.

Although weak analyticity is actually a pointwise property with respect to g in [D]v, for some Banach base (D, v), it will turn out that some stronger results hold true. More specifically, we will show that the Taylor series attached to a weakly analytic probability measure converges strongly on some domain, i.e., “weak analyticity implies strong analyticity.” This fact leads to the concept of [D]v-radius of convergence.

The chapter is organized as follows: In Section 4.2 we extend Theorem 2.4 to higher-order differentiation and we introduce and study the concept of weak analyticity, while in Section 4.3 we illustrate the concept of weak analyticity by evaluating the completion time in a stochastic activity network.


4.2 Leibnitz-Newton Rule and Weak Analyticity

In this section we continue the analysis of product measures and show that, like in conventional analysis, properties such as higher-order differentiation and analyticity are inherited by products of measures. In Section 4.2.1 we present a generalized Leibnitz-Newton rule for weak derivatives and in Section 4.2.2 we will deal with analyticity issues.

4.2.1 Leibnitz-Newton Rule and Extensions

Inspired by Theorem 2.3, we proceed to establish the Leibnitz-Newton product rule which extends Theorem 2.3 to higher-order derivatives. The precise statement is the following.

Theorem 4.1. Let (D(S), v) and (D(T), u) be Banach bases on S and T, respectively. If µθ is n-times [D(S)]v-differentiable and if ηθ is n-times [D(T)]u-differentiable, then the product measure µθ × ηθ ∈ M(σ(S × T)) is n-times [D(S) ⊗ D(T)]v⊗u-differentiable and it holds that

(µθ × ηθ)^(n) = Σ_{j=0}^{n} \binom{n}{j} (µθ^(j) × ηθ^(n−j)).

Proof. We proceed by induction over n ≥ 1. For n = 1 the assertion reduces to Theorem 2.3. Assume now that the conclusion holds true for n ≥ 1. Then,

(µθ × ηθ)^(n+1) = ( Σ_{j=0}^{n} \binom{n}{j} (µθ^(j) × ηθ^(n−j)) )′ = Σ_{j=0}^{n} \binom{n}{j} (µθ^(j) × ηθ^(n−j))′.

Applying Theorem 2.3 to the derivatives in the right-hand side, the proof follows from basic algebraic calculations, just like in conventional analysis, by taking into account that weak derivatives satisfy (see Remark 2.3)

∀j ≥ 0 : (µθ^(j))′ = µθ^(j+1).

The next result is a generalization of Theorem 4.1 and introduces the general formula of the weak differential calculus. Recall the definitions of Πθ and ~v given in Section 2.3 by (2.27) and (2.28), respectively.

Theorem 4.2. For 1 ≤ i ≤ k, let (D(Si), vi) be Banach bases on Si such that µi,θ is n-times [D(Si)]vi-differentiable. Then, Πθ is n-times [D(S1) ⊗ . . . ⊗ D(Sk)]~v-differentiable and it holds that

Πθ^(n) = Σ_{ℓ∈J(k,n)} \binom{n}{ℓ1, . . . , ℓk} · (µ1,θ)^(ℓ1) × . . . × (µk,θ)^(ℓk), (4.1)

where, for k, n ≥ 1, we set

J(k, n) := {ℓ = (ℓ1, . . . , ℓk) : 0 ≤ ℓi ≤ n, ℓ1 + . . . + ℓk = n}.


Proof. The proof follows from Theorem 4.1, via finite induction over k.

Theorem 4.2 can be seen as the generalized Leibnitz-Newton rule for measure-valued differentiation. It provides an expression for the higher-order derivatives of finite product measures, provided that they exist. However, obtaining an instance of the weak derivative of such a product, i.e., a “triplet representation,” is not straightforward since we deal with a sum of product signed measures and obtaining the Hahn-Jordan decomposition of Πθ^(n) in (4.1) is quite demanding even in simple cases. Such a triplet representation would be useful in applications, as explained in Section 2.2.2, and in what follows we aim to establish such a result.

An instance of the weak derivative Πθ^(n) in (4.1) can be obtained by inserting the appropriate weak derivatives for the measures µi,θ^(ℓi) and rearranging terms in (4.1). In order to present the result we introduce the following notations. For ℓ = (ℓ1, . . . , ℓk) ∈ J(k, n) we denote by ν(ℓ) the number of non-zero elements of the vector ℓ and by I(ℓ) the set of vectors ı ∈ {−1, 0, +1}^k such that ıl ≠ 0 if and only if ℓl ≠ 0 and such that the product of all non-zero elements of ı equals one, i.e., ı contains an even number of entries equal to −1. For ı ∈ I(ℓ), we denote by ı̄ the vector obtained from ı by changing the sign of the non-zero element at the highest position.

Corollary 4.1. Under the conditions put forward in Theorem 4.2, let µi,θ have mth-order [D]vi-derivative

µi,θ^(m) = (ci,θ^(m), µi,θ^(m,+), µi,θ^(m,−)),

for 0 ≤ m ≤ n, with ci,θ^(0) = 1 and µi,θ^(0,0) = µi,θ. For n ≥ 1, an instance

(Cθ^(n), Πθ^(n,+), Πθ^(n,−))

of Πθ^(n) is given by

Cθ^(n) = Σ_{ℓ∈J(k,n)} 2^{ν(ℓ)−1} \binom{n}{ℓ1, . . . , ℓk} Π_{i=1}^{k} ci,θ^(ℓi), (4.2)

Πθ^(n,+) = Σ_{ℓ∈J(k,n)} \binom{n}{ℓ1, . . . , ℓk} (Π_{i=1}^{k} ci,θ^(ℓi) / Cθ^(n)) · Σ_{ı∈I(ℓ)} µ1,θ^(ℓ1,ı1) × · · · × µk,θ^(ℓk,ık),

Πθ^(n,−) = Σ_{ℓ∈J(k,n)} \binom{n}{ℓ1, . . . , ℓk} (Π_{i=1}^{k} ci,θ^(ℓi) / Cθ^(n)) · Σ_{ı∈I(ℓ)} µ1,θ^(ℓ1,ı̄1) × · · · × µk,θ^(ℓk,ı̄k),

where, for convenience, we identify

∀1 ≤ i ≤ k : µi,θ^(ℓi,+1) = µi,θ^(ℓi,+), µi,θ^(ℓi,−1) = µi,θ^(ℓi,−), µi,θ^(0,0) = µi,θ.

For practical purposes, the above result becomes more useful when formulated in terms of random variables. The precise statement is as follows.


Corollary 4.2 (Random Variable Version of Theorem 4.2). Under the conditions put forward in Corollary 4.1, if Xi are random variables having distributions µi,θ, for 1 ≤ i ≤ k, respectively, then for each g ∈ [D(S1) ⊗ . . . ⊗ D(Sk)]~v we have

(d^n/dθ^n) Pg(θ) = Σ_{ℓ∈J(k,n)} Cθ(ℓ) Σ_{ı∈I(ℓ)} [ g(X1^(ℓ1,ı1), . . . , Xk^(ℓk,ık)) − g(X1^(ℓ1,ı̄1), . . . , Xk^(ℓk,ı̄k)) ],

where Pg(θ) = Eθ[g(X1, . . . , Xk)], Eθ is an expectation operator consistent with

∀ℓ ∈ J(k, n), ı ∈ {−1, 0, 1}^k : (X1^(ℓ1,ı1), . . . , Xk^(ℓk,ık)) ∼ µ1,θ^(ℓ1,ı1) × · · · × µk,θ^(ℓk,ık),

and for ℓ ∈ J(k, n) we set

Cθ(ℓ) := \binom{n}{ℓ1, . . . , ℓk} Π_{i=1}^{k} ci,θ^(ℓi).
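To make Corollary 4.2 concrete, the following Python sketch (ours, not from the thesis) implements the case k = 2, n = 1 for g(x1, x2) = max(x1, x2) with X1 ~ Exp(θ) and X2 ~ Exp(λ), using the first-order triplet (1/θ, ε1,θ, ε2,θ) of the exponential distribution quoted in Example 2.5; the positive and negative parts are coupled through a common exponential sample:

```python
import random

def mvd_derivative(g, theta, lam, n, rng):
    # Monte Carlo estimator of d/dtheta E[g(X1, X2)] for X1 ~ Exp(theta),
    # X2 ~ Exp(lam), based on the triplet (1/theta, Exp(theta), Erlang(2, theta)):
    #   d/dtheta E[g] = (1/theta) * E[g(S+, X2) - g(S-, X2)],
    # with S+ ~ Exp(theta) and S- ~ Erlang(2, theta) coupled as S- = S+ + extra.
    total = 0.0
    for _ in range(n):
        x2 = rng.expovariate(lam)
        s_plus = rng.expovariate(theta)
        s_minus = s_plus + rng.expovariate(theta)   # common random numbers
        total += g(s_plus, x2) - g(s_minus, x2)
    return total / (theta * n)

rng = random.Random(7)
est = mvd_derivative(max, 1.0, 2.0, 200_000, rng)
# analytic check: E[max(X1, X2)] = 1/theta + 1/lam - 1/(theta + lam),
# so d/dtheta = -1/theta**2 + 1/(theta + lam)**2 = -8/9 at theta = 1, lam = 2
```

The coupling S− = S+ + extra is a standard variance-reduction choice; the estimator stays unbiased because only the marginal distributions of S± matter.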

4.2.2 Weak Analyticity

In this section we introduce the concept of weak [D]v-analyticity for probability measures and we provide results regarding the radius of convergence of the Taylor series and weak analyticity of product measures.

Definition 4.1. Let (D, v) be a Banach base on S. We call the measure-valued mapping µ∗ : Θ → Mv weakly [D]v-analytic at θ, or weakly [D]v-analytic for short, if

• all higher-order [D]v-derivatives of µθ exist,

• there exists a neighborhood V of θ such that for all ξ satisfying θ + ξ ∈ V it holds that

∀g ∈ [D]v : ∫ g(s) µθ+ξ(ds) = Σ_{n=0}^{∞} (ξ^n/n!) · ∫ g(s) µθ^(n)(ds). (4.3)

The expression Tn(µ, θ, ξ) defined as

∀n ≥ 0, ξ ∈ R : Tn(µ, θ, ξ) := Σ_{k=0}^{n} (ξ^k/k!) · µθ^(k) (4.4)

will be called the nth-order Taylor polynomial of µ∗ in θ. In addition, for fixed g ∈ [D]v, the maximal set Dθ(g, µ) for which the equality in (4.3) holds true is called the domain of convergence of the Taylor series.

Remark 4.1. Note that the nth-order Taylor polynomial Tn(µ, θ, ξ) defined by (4.4) is, in fact, an element of Mv and defines a linear functional on [D]v. Therefore, (4.3) is equivalent to

∀ξ; θ + ξ ∈ V : Tn(µ, θ, ξ) ⇒ µθ+ξ in the [D]v-sense. (4.5)

Moreover, since all higher-order derivatives of µθ exist, it follows by Theorem 2.1 that for each n ≥ 1 µθ^(n) is strongly continuous, and by Theorem 2.2 (i) we conclude that µθ^(n−1) is strongly differentiable. In particular, it follows that if µθ is weakly analytic then it is strongly differentiable of any order n ≥ 1.


Note that the domain of convergence Dθ(g, µ) of the series in (4.3) depends on g. Our next result provides a set Dθ^v(µ) ⊂ Θ where the Taylor series in (4.3) converges for all g ∈ [D]v. The precise statement is as follows.

Theorem 4.3. Let (D, v) be a Banach base on S such that µθ is [D]v-analytic. Then for each g ∈ [D]v the Taylor series in (4.3) converges for all ξ such that |ξ| < Rθ^v(µ), where Rθ^v(µ) is given by

1/Rθ^v(µ) = lim sup_{n→∞} ( ‖µθ^(n)‖v / n! )^{1/n}. (4.6)

In particular, the set Dθ^v(µ) := Θ ∩ (θ − Rθ^v(µ), θ + Rθ^v(µ)) satisfies

∀g ∈ [D]v : Dθ^v(µ) ⊂ Dθ(g, µ).

Proof. We apply the Cauchy-Hadamard Theorem; see Theorem A.2 in the Appendix. It follows that the radius of convergence Rθ(g, µ) of the Taylor series in (4.3) is given by

1/Rθ(g, µ) = lim sup_{n→∞} ( |∫ g(s) µθ^(n)(ds)| / n! )^{1/n},

i.e., the series converges for |ξ| < Rθ(g, µ), and it suffices to show that

∀g ∈ [D]v : Rθ^v(µ) ≤ Rθ(g, µ). (4.7)

This follows from the Cauchy-Schwarz inequality. To see this, note that

|∫ g(s) µθ^(n)(ds)|^{1/n} ≤ ( ‖g‖v · ‖µθ^(n)‖v )^{1/n},

which, together with the fact that lim_{n→∞} ‖g‖v^{1/n} = 1, for g ∈ [D]v, concludes the proof.

The non-negative number Rθ^v(µ) is called the [D]v-radius of convergence of µθ and the set Dθ^v(µ) is called the [D]v-domain of convergence of µθ. Note, however, that in general this is not the maximal set for which the series converges for all g ∈ [D]v, since the inequality in (4.7) may be strict.

Example 4.1. Let µθ denote the exponential distribution, cf. Example 2.5. We show that the [F]v-radius of convergence of µθ satisfies Rθ^v(µ) = θ, for v(x) = 1 + x, which shows that the Taylor series converges for |ξ| < θ.

Recall that an instance of the nth-order derivative µθ^(n) is given by

µθ^(n) = (n!/θ^n, εn,θ, εn+1,θ), if n is odd; µθ^(n) = (n!/θ^n, εn+1,θ, εn,θ), for n even,

where, for n ≥ 1,

εn,θ(dx) = (θ^n x^{n−1} / (n − 1)!) e^{−θx} dx.


Consequently, the v-norm ‖µθ^(n)‖v satisfies

|∫ v(x) µθ^(n)(dx)| ≤ ‖µθ^(n)‖v ≤ (n!/θ^n) ∫ v(x) εn+1,θ(dx) + (n!/θ^n) ∫ v(x) εn,θ(dx).

Elementary computation shows that for p ≥ 1 we have

∫ x^p εn,θ(dx) = (θ^n/(n − 1)!) ∫ x^{n+p−1} e^{−θx} dx = (1/θ^p) · (n + p − 1)!/(n − 1)!.

Hence, for v(x) = 1 + x we obtain the following inequalities:

1/θ^{n+1} ≤ ‖µθ^(n)‖v / n! ≤ (2n + 2θ + 1)/θ^{n+1}.

Finally, we obtain

1/Rθ^v(µ) = lim sup_{n→∞} ( ‖µθ^(n)‖v / n! )^{1/n} = 1/θ.

The same result holds true if one replaces v by any polynomial function.
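The squeeze above is easy to visualize numerically; the following short sketch (ours) evaluates the two bounds on (‖µθ^(n)‖v / n!)^{1/n} and shows that both approach 1/θ:

```python
def root_bounds(theta, n):
    # from the inequalities 1/theta**(n+1) <= ||mu_theta^(n)||_v / n!
    #                       <= (2n + 2*theta + 1)/theta**(n+1), with v(x) = 1 + x,
    # both n-th roots converge to 1/theta as n grows
    lo = (1.0 / theta ** (n + 1)) ** (1.0 / n)
    hi = ((2 * n + 2 * theta + 1) / theta ** (n + 1)) ** (1.0 / n)
    return lo, hi

lo, hi = root_bounds(2.0, 500)   # both close to 1/theta = 0.5
```

The polynomial factor 2n + 2θ + 1 disappears under the nth root, which is exactly why the upper and lower bounds pin down the same radius of convergence.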

Remark 4.2. Theorem 4.3 shows that the Taylor series converges for |ξ| < Rθ^v(µ), i.e., θ + ξ ∈ Dθ^v(µ). However, in general, the convergence of the Taylor series does not imply analyticity. Indeed, it can happen that the Taylor series is convergent but the limit does not coincide with the “true value.” A standard example is that of the function

f(x) = e^{−1/x²}, for x ≠ 0, and f(0) = 0,

for which all higher-order derivatives in 0 are equal to 0 but which obviously takes strictly positive values in any neighborhood of 0. Therefore, the maximal neighborhood V for which (4.3) holds true may not be equal to the domain of convergence of the Taylor series in the right-hand side of (4.3).

Nevertheless, since most of the usual functions for which the Taylor series converges are analytic, we will assume in the following that the Taylor series converges to the “true value” for any ξ such that θ + ξ ∈ Dθ^v(µ).

The [D]v-domain of convergence Dθ^v(µ) plays an important role in applications. The following result, which is a consequence of Theorem 4.3, shows that the sequence of Taylor polynomials converges strongly for |ξ| < Rθ^v(µ).

Theorem 4.4. Let (D, v) be a Banach base on S such that µθ is [D]v-analytic with [D]v-radius of convergence Rθ^v(µ). Then,

∀ξ; |ξ| < Rθ^v(µ) : lim_{n→∞} ‖Tn(µ, θ, ξ) − µθ+ξ‖v = 0.


Proof. By hypothesis, we have

‖Tn(µ, θ, ξ) − µθ+ξ‖v = ‖ Σ_{k=n+1}^{∞} (ξ^k/k!) · µθ^(k) ‖v ≤ Σ_{k=n+1}^{∞} (|ξ|^k/k!) ‖µθ^(k)‖v. (4.8)

Let ξ be such that |ξ| < Rθ^v(µ) and choose ε > 0 such that |ξ| + ε < Rθ^v(µ). Since

1/(Rθ^v(µ) − ε) > 1/Rθ^v(µ) = lim sup_{n→∞} ( ‖µθ^(n)‖v / n! )^{1/n},

it follows that there exists some nε ≥ 1 such that

∀k ≥ nε : ( ‖µθ^(k)‖v / k! )^{1/k} < 1/(Rθ^v(µ) − ε).

Consequently, we conclude from (4.8) that for each n ≥ nε it holds that

‖Tn(µ, θ, ξ) − µθ+ξ‖v ≤ Σ_{k=n+1}^{∞} ( |ξ|/(Rθ^v(µ) − ε) )^k = ((Rθ^v(µ) − ε)/(Rθ^v(µ) − ε − |ξ|)) · ( |ξ|/(Rθ^v(µ) − ε) )^{n+1}, (4.9)

since, by assumption, |ξ| < Rθ^v(µ) − ε. Therefore, the conclusion follows by letting n → ∞ in (4.9).

Example 4.2. Let us consider the Bernoulli distribution¹ βθ introduced in Example 2.4. Since β′θ = δx2 − δx1 and the higher-order derivatives βθ^(n), for n ≥ 2, are not significant, it follows that βθ is weakly analytic and the radius of convergence is ∞ (note that the Taylor series is finite). Indeed, we have

∀θ, ξ ∈ R : βθ+ξ = (1 − θ − ξ) · δx1 + (θ + ξ) · δx2 = (1 − θ) · δx1 + θ · δx2 + ξ · (δx2 − δx1) = βθ + ξ · β′θ.

Example 4.3. Let us revisit Example 4.1. We aim to show that the exponential distribution µθ is [F]v-analytic for any polynomial v, i.e., we show that (4.3) holds true for |ξ| < θ, D = F and polynomial v. To this end, note that the density f(x, θ) of µθ is analytic (in the classical sense) in θ, i.e.,

∀x > 0, ∀ξ ∈ R : f(x, θ + ξ) = Σ_{k=0}^{∞} (ξ^k/k!) (d^k/dθ^k) f(x, θ).

¹ Note that, for θ ∈ [0, 1], βθ is a probability distribution while, for general θ ∈ R, βθ is a (signed) measure having total mass 1.


Hence, (4.3) is equivalent to

∀g ∈ [F]v : Σ_{k=0}^{∞} (ξ^k/k!) ∫ g(x) (d^k/dθ^k) f(x, θ) dx = ∫ g(x) Σ_{k=0}^{∞} (ξ^k/k!) (d^k/dθ^k) f(x, θ) dx.

Fix g ∈ [F]v. In order to apply the Dominated Convergence Theorem it suffices to show that for each ξ such that |ξ| < θ the function

Fθ(ξ, x) := Σ_{k=0}^{∞} | g(x) (ξ^k/k!) (d^k/dθ^k) f(x, θ) |

is integrable with respect to x. Computing the derivatives of f(x, θ) (see Example 2.5), we arrive at the following inequality:

Fθ(ξ, x) ≤ |g(x)| Σ_{k=0}^{∞} (|ξ|^k/k!) (θx^k + kx^{k−1}) e^{−θx} ≤ ‖g‖v (θ + |ξ|) v(x) e^{−(θ−|ξ|)x}.

Since the right-hand side above is obviously integrable for |ξ| < θ, we conclude that for θ > 0 the exponential distribution µθ is weakly [F]v-analytic, for any polynomial v, and the corresponding Taylor series converges for |ξ| < θ; compare to Example 4.1.

In classical analysis it is well known that the product of two analytic functions is again analytic. The following theorem establishes the counterpart of this fact for weak analyticity of measures. Namely, if µθ and ηθ are weakly analytic measures then the product (µ × η)θ is again weakly analytic, where

∀θ ∈ Θ : (µ× η)θ := µθ × ηθ.

The precise statement is as follows.

Theorem 4.5. Let (D(S), v) and (D(T), u) be Banach bases on S and T, respectively. Let µθ be [D(S)]v-analytic and ηθ be [D(T)]u-analytic with domains of convergence Dθ^v(µ) and Dθ^u(η), respectively. Then the product measure µθ × ηθ is [D(S) ⊗ D(T)]v⊗u-analytic and its domain of convergence Dθ^{v⊗u}(µ × η) satisfies

Dθ^v(µ) ∩ Dθ^u(η) ⊂ Dθ^{v⊗u}(µ × η). (4.10)

More specifically, if θ + ξ ∈ Dθ^v(µ) ∩ Dθ^u(η) and g ∈ [D(S) ⊗ D(T)]v⊗u it holds that

∫ g(s, t) (µ × η)θ+ξ(ds, dt) = Σ_{k=0}^{∞} (ξ^k/k!) ∫ g(s, t) (µ × η)θ^(k)(ds, dt). (4.11)

Proof. Recall that, by definition, we have

Dθ^v(µ) = Θ ∩ (θ − Rθ^v(µ), θ + Rθ^v(µ)), Dθ^u(η) = Θ ∩ (θ − Rθ^u(η), θ + Rθ^u(η)).

Hence, if we set Rθ := min{Rθ^v(µ), Rθ^u(η)} it follows that

Dθ^v(µ) ∩ Dθ^u(η) = Θ ∩ (θ − Rθ, θ + Rθ).


Next, we show that (4.11) holds true for any ξ such that |ξ| < Rθ and g ∈ [D(S) ⊗ D(T)]v⊗u. To this end, note that according to Theorem 4.1 all higher-order derivatives of (µ × η)θ exist. In addition, the right-hand side of (4.11) can be re-written as

lim_{k→∞} Σ_{0≤j+l≤k} (ξ^{j+l}/(j! l!)) ∫∫ g(s, t) µθ^(j)(ds) ηθ^(l)(dt). (4.12)

Let us consider the Taylor polynomials² Tn : [D(S)]v → R defined as

∀n ≥ 0 : Tn(f) := Σ_{j=0}^{n} (ξ^j/j!) ∫ f(s) µθ^(j)(ds),

for f ∈ [D]v. First, note that according to (1.30) it holds that

‖g(·, t)‖v ≤ ‖g‖v⊗u u(t).

Therefore, g(·, t) ∈ [D(S)]v, for each t ∈ T, and by hypothesis we conclude from (4.5) that

∀t ∈ T : ∫ g(s, t) µθ+ξ(ds) = lim_{n→∞} Tn(g(·, t)).

In addition, an application of the Cauchy-Schwarz inequality yields

∀t ∈ T : |Tn(g(·, t))| ≤ ‖Tn‖v ‖g(·, t)‖v ≤ ‖g‖v⊗u u(t) sup_{n≥0} ‖Tn‖v. (4.13)

Next, we show that the Dominated Convergence Theorem applies to the sequence of mappings {t ↦ Tn(g(·, t))}n≥1, when integrated with respect to ηθ, for each θ ∈ Θ. Indeed, we note that weak analyticity of µθ implies that {Tn(f) : n ∈ N} is bounded for each f ∈ [D(S)]v. Applying the Banach-Steinhaus Theorem (see Lemma 1.4), we conclude that sup_n ‖Tn‖v < ∞. Therefore, if for n ≥ 0 we set

∀t ∈ T : Hn(t) = Tn(g(·, t)),

it follows from (4.13) that Hn ∈ [D(T)]u and

‖Hn‖u ≤ ‖g‖v⊗u sup_{n≥0} ‖Tn‖v.

Since, by hypothesis, u ∈ L1({ηθ : θ ∈ Θ}), the Dominated Convergence Theorem applies to the sequence {Hn}n and yields

∫ g(s, t) (µ × η)θ+ξ(ds, dt) = lim_{n→∞} ∫ Tn(g(·, t)) ηθ+ξ(dt). (4.14)

² For ease of notation we replace Tn(µ, θ, ξ) by Tn. Recall that the Taylor polynomials Tn, for n ≥ 0, are linear functionals on [D]v (see Remark 4.1) and by Theorem 4.3 weak analyticity of µθ implies that for each ξ satisfying |ξ| < Rθ^v(µ), µθ+ξ is the [D]v-limit of the sequence {Tn}; see (4.5).


Moreover, from the [D(T)]u-analyticity of ηθ we conclude that the right-hand side in (4.14) equals

lim_{n→∞} lim_{m→∞} Σ_{l=0}^{m} (ξ^l/l!) ∫ Tn(g(·, t)) ηθ^(l)(dt).

Therefore, we conclude that the left-hand side of (4.11) equals

lim_{n→∞} lim_{m→∞} Σ_{l=0}^{m} Σ_{j=0}^{n} (ξ^{j+l}/(j! l!)) ∫∫ g(s, t) µθ^(j)(ds) ηθ^(l)(dt). (4.15)

The power series in (4.15) is convergent for |ξ| < Rθ. Hence it is absolutely convergent, so its limit is not affected by re-shuffling terms, and from the Rearrangements Theorem (see Theorem A.1 in the Appendix) it follows that the limits in (4.15) and (4.12) coincide, i.e., (4.11) holds true for |ξ| < Rθ. Therefore, it follows that (µ × η)θ is [D(S) ⊗ D(T)]v⊗u-analytic and the inclusion in (4.10) holds true.

Just like in conventional analysis, Theorem 4.5 can be extended to finite products of measures.

Corollary 4.3. For 1 ≤ i ≤ k, let (D(Si), vi) be a Banach base on Si such that µi,θ is weakly [D(Si)]vi-analytic with domain of convergence Dθ^{vi}(µi), respectively. Then, Πθ is [D(S1) ⊗ . . . ⊗ D(Sk)]~v-analytic and for each ξ such that θ + ξ ∈ Dθ^{vi}(µi), for each 1 ≤ i ≤ k, it holds that

∀g ∈ [D(S1) ⊗ . . . ⊗ D(Sk)]~v : ∫ g(s) Πθ+ξ(ds) = Σ_{n=0}^{∞} (ξ^n/n!) ∫ g(s) Πθ^(n)(ds).

Proof. This follows by finite induction from Theorem 4.5.

4.3 Application: Stochastic Activity Networks (SAN)

Stochastic Activity Networks (SAN) such as those arising in the Project Evaluation Review Technique (PERT) form an important class of models for systems and control engineering. Roughly, a SAN is a collection of activities, each with some (deterministic or random) duration, along with a set of precedence constraints, which specify that activities begin only when certain others have finished. Such a network can be modeled as a directed acyclic weighted graph (V, E ⊂ V × V) with one source node, one sink node and an additive³ weight-function τ : E → R.

A simple example is provided in Figure 4.1 below. The network has 5 nodes, labeled from 1 (source) to 5 (sink), and the edges denote the activities under consideration. The weights Xi, 1 ≤ i ≤ 7, denote the durations of the corresponding activities. For instance, activity 6 can only begin when both activities 2 and 3 have finished. For a more detailed overview of stochastic activity networks we refer to [49].

Let P denote the set of all paths from the source to the sink node. Should (some) durations be random variables, we assume them mutually independent. However, note

3 The weight of any path is given by the sum of the weights of the subsequent edges.


Fig. 4.1: A Stochastic Activity Network with source node 1 and sink node 5.

that in general the path weights are not independent. The completion time, denoted by T, is defined as the weight of the maximal path, i.e.,

T = max{τ(π) : π ∈ P}.

For instance, in the above example, the set of paths from source node 1 to sink node 5 is

P = {(1, 2, 5); (1, 2, 4, 5); (1, 2, 3, 4, 5); (1, 3, 4, 5)}.

Thus, the completion time in this case can be expressed as

T = max{X1 + X5; X1 + X4 + X7; X1 + X3 + X6 + X7; X2 + X6 + X7}.

One of the most challenging problems in this area is to compute the expected completion time E[T]. Distribution-free bounds for E[T] are provided in [18]. In the following we aim to establish a functional dependence between a particular parameter, e.g., the expected duration of some particular task(s), and the expected completion time of the system. Here, we propose a Taylor series approximation for a SAN with exponentially distributed activity times, where the computation of higher-order derivatives relies on the weak differential calculus presented in this chapter.

We start by considering S = [0, ∞) with the usual metric and v : S → R defined as v(x) = 1 + x. Next, we define gT : S⁷ → R,

gT(x1, . . . , x7) := max{x1 + x5; x1 + x4 + x7; x1 + x3 + x6 + x7; x2 + x6 + x7},

i.e., T = gT(X1, . . . , X7) and

E[T] = ∫ . . . ∫ gT(x1, . . . , x7) µ1(dx1) . . . µ7(dx7),


where we denote by µi the distribution of Xi, for 1 ≤ i ≤ 7. In accordance with Theorem 4.2, if µi is weakly differentiable with respect to some parameter θ, for all 1 ≤ i ≤ 7, then the distribution of T is weakly differentiable with respect to θ, as well. Roughly speaking, that means that “the distribution of T is differentiable with respect to each µi.”⁴
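In code, gT is just a maximum over the four path sums. The following Python sketch is our illustration (the exponential rates are an assumption; the text keeps the input distributions general at this point):

```python
import random

def completion_time(x1, x2, x3, x4, x5, x6, x7):
    # g_T for the SAN of Fig. 4.1: weight of the maximal source-to-sink path
    return max(x1 + x5,
               x1 + x4 + x7,
               x1 + x3 + x6 + x7,
               x2 + x6 + x7)

def expected_completion_time(rates, n, rng):
    # crude Monte Carlo estimate of E[T] with independent Exp(rate_i) durations
    total = 0.0
    for _ in range(n):
        total += completion_time(*[rng.expovariate(r) for r in rates])
    return total / n

rng = random.Random(1)
et = expected_completion_time([1.0] * 7, 50_000, rng)
```

Since the four path weights share activity durations, they are dependent, which is exactly why E[T] has no simple closed form and simulation (or the Taylor approach below) is used.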

Assume for instance that the random variables Xi, for 1 ≤ i ≤ 7, are independent and exponentially distributed with rates λi, respectively. We let λ1 = λ3 = θ be variable and let the other rates be fixed, i.e., deterministic and not a function of θ. By Example 4.3, the exponential distribution is weakly [F]v-analytic, for v(x) = 1 + x, and the domain of convergence is given by |ξ| < θ. Since the distributions which are independent of θ are trivially weakly analytic, we conclude from Theorem 4.5 that the joint distribution of the vector (X1, . . . , X7) is weakly [F(S⁷)]v⊗...⊗v-analytic. Moreover, the domain of convergence of the corresponding Taylor series includes the set {ξ : |ξ| < θ}. Finally, we note that

|gT(x1, . . . , x7)| ≤ Π_{i=1}^{7} (1 + xi) = (v ⊗ . . . ⊗ v)(x1, . . . , x7),

i.e., gT belongs to [F(S⁷)]v⊗...⊗v, the 7-fold product of the Banach base (F, v).

Next we proceed to the computation of derivatives in accordance with Corollary 4.2.

i.e., gT belongs to [F(S7)]v⊗...⊗v, the 7-fold product of the Banach base (F , v).Next we proceed to the computation of derivatives in accordance with Corollary 4.2.

Since only the derivatives of µ_{1,θ} and µ_{3,θ} are significant, we consider, for j, k ≥ 0, a modified network where X1 is replaced by the sum of j independent samples from an exponentially distributed random variable with rate θ and X3 is replaced by the sum of k independent samples from the same distribution, whereas all other durations remain unchanged, i.e., we replace the exponential distributions of X1 and X3 by the Erlang ε_{j,θ} and ε_{k,θ} distributions, respectively; see Example 2.5.

More specifically, let {X_{1,l} : l ≥ 1} and {X_{3,l} : l ≥ 1} be two sequences of i.i.d. random variables having exponential distribution with rate θ and let T_{j,k} denote the completion time of the modified SAN, i.e.,

T_{j,k} = gT(X̄1, . . . , X̄7),

where we define, for 1 ≤ i ≤ 7,

X̄i := ∑_{l=1}^{j} X_{1,l}, if i = 1;   X̄i := ∑_{l=1}^{k} X_{3,l}, if i = 3;   X̄i := Xi, if i ∉ {1, 3}.

We have T_{1,1} = T and we agree that T_{j,k} = 0 if either j = 0 or k = 0. With this notation, Corollary 4.2 yields

∀ n ≥ 0 : dⁿ/dθⁿ E_θ[T] = (−1)ⁿ (n!/θⁿ) ∑_{j+k=n} E_θ[T_{j+1,k+1} − T_{j+1,k} − T_{j,k+1} + T_{j,k}]   (4.16)

and for each n ≥ 1 we obtain by

T_n(θ, ξ) := ∑_{m=0}^{n} (−1)^m (ξ/θ)^m ∑_{j+k=m} E_θ[T_{j+1,k+1} − T_{j+1,k} − T_{j,k+1} + T_{j,k}]   (4.17)

4 Note that for a deterministic system, i.e., when all the weights are deterministic, the completion time is, in general, not everywhere differentiable w.r.t. the weights. This is because the Dirac distribution δθ is not weakly differentiable w.r.t. θ.


the nth order Taylor polynomial for E_{θ+ξ}[T] at θ, where, for θ ∈ Θ, E_θ denotes an expectation operator consistent with Xi ∼ µi, for i ∉ {1, 3}, X_{1,l} ∼ µ_{1,θ} and X_{3,l} ∼ µ_{3,θ}, for all l ≥ 1. Therefore, the coefficients of the Taylor polynomials are completely determined by the values E_θ[T_{j,k}], for j, k ≥ 0.

Moreover, using a monotonicity argument, one can easily check that

∀ j, k ≥ 0 : |E_θ[T_{j+1,k+1} − T_{j+1,k} − T_{j,k+1} + T_{j,k}]| ≤ 2 E_θ[X_{3,k+1}] = 2/θ.   (4.18)

Hence, a bound for the error of the nth order Taylor polynomial is given by

∀ |ξ| < θ : |E_{θ+ξ}[T] − T_n(θ, ξ)| ≤ (2/θ) ∑_{k=n+1}^{∞} (k + 1)(|ξ|/θ)^k
= (2/θ) · [(n + 2) − (n + 1)(|ξ|/θ)] / (1 − |ξ|/θ)² · (|ξ|/θ)^{n+1}
= (2/θ) · [1 + (n + 1)(1 − ρ)] / (1 − ρ)² · ρ^{n+1},   (4.19)

where, for simplicity, we set ρ := |ξ|/θ.

Example 4.4. In order to perform a numerical experiment, we consider the following values:

λ1 = λ3 = θ, λ6 = 1, λ2 = λ4 = 1/2, λ5 = 1/5, λ7 = 1/3.

Computing the coefficients of the Taylor polynomial is quite demanding, and it is worth noting that the coefficients can alternatively be estimated by simulation. Figure 4.2 shows the Taylor polynomial T3(1, ξ) of order 3 compared to the interpolation polynomial, with seven equidistant nodes, corresponding to E_{1+ξ}[T], in the range |ξ| ≤ 0.6. As Figure 4.2 shows, the difference between the two estimates is insignificant in the range |ξ| ≤ 0.4. On the other hand, the relative error for the Taylor polynomial, according to (4.19), is below 3.4% in this range.
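To illustrate how the coefficients E_θ[T_{j,k}] can be estimated by simulation, the following minimal Python sketch (our own code; the function names and sample sizes are illustrative, while the rates are those of Example 4.4) simulates the modified SANs and assembles the Taylor polynomial T_n(θ, ξ) according to (4.17):

```python
import random

def completion_time(x):
    # g_T: the longest of the four paths through the SAN
    x1, x2, x3, x4, x5, x6, x7 = x
    return max(x1 + x5, x1 + x4 + x7, x1 + x3 + x6 + x7, x2 + x6 + x7)

def sample_T(j, k, theta, rng):
    # one sample of T_{j,k}: X1 replaced by an Erlang(j, theta) sum,
    # X3 by an Erlang(k, theta) sum; T_{j,k} = 0 if j = 0 or k = 0
    if j == 0 or k == 0:
        return 0.0
    rates = [theta, 1 / 2, theta, 1 / 2, 1 / 5, 1.0, 1 / 3]  # Example 4.4
    x = [rng.expovariate(r) for r in rates]
    x[0] = sum(rng.expovariate(theta) for _ in range(j))
    x[2] = sum(rng.expovariate(theta) for _ in range(k))
    return completion_time(x)

def taylor_poly(n, theta, xi, runs, seed=0):
    # Monte Carlo estimate of T_n(theta, xi) via (4.17)
    rng = random.Random(seed)
    est = {(j, k): sum(sample_T(j, k, theta, rng) for _ in range(runs)) / runs
           for j in range(n + 2) for k in range(n + 2)}
    return sum((-xi / theta) ** m
               * sum(est[j + 1, m - j + 1] - est[j + 1, m - j]
                     - est[j, m - j + 1] + est[j, m - j]
                     for j in range(m + 1))
               for m in range(n + 1))
```

For instance, taylor_poly(3, 1.0, xi, 10**4) gives a simulation-based counterpart of the polynomial T3(1, ξ) plotted in Figure 4.2; the Monte Carlo noise decreases at the usual rate as the number of runs grows.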

4.4 Concluding Remarks

In this chapter we have extended the theory of weak differentiation to higher-order derivatives in order to construct a measure-valued differential calculus. This allows for studying analyticity-related issues which, in turn, lead to Taylor series approximations for performance measures of parameter-dependent stochastic systems. Similar issues have been addressed in [7], [12], [24], [55].

The main result of this chapter, Theorem 4.5, which shows that products of weakly analytic measures are again weakly analytic, is the main theoretical tool for performing Taylor series approximations based on weak differentiation. As illustrated by Example 4.4, in practice the exact calculation of the Taylor series coefficients is quite demanding, and this seems to be the main pitfall of the method. Therefore, simulation of the Taylor series coefficients plays a key role in applying this method.


Fig. 4.2: The Taylor polynomial T3(1, ξ) (thick line) compared to the interpolation polynomial, with seven equidistant nodes, corresponding to E_{1+ξ}[T] (thin line), in Example 4.4.

The gain of the method put forward in this chapter, however, comes from the fact that it is suitable for evaluating the functional dependence between a performance measure of a certain stochastic system and some intrinsic parameter (it provides asymptotically unbiased estimators), rather than approximating the value of the performance measure under consideration for some particular parameter θ, which can easily be achieved by classical simulation.

As the numerical experiments have revealed, in many situations the Taylor series obtained by using the weak differential calculus provides a quite good approximation of the true value (seen as a function of θ). In addition, we strongly believe that weak Taylor series are more efficient than interpolation-based methods, in which one simulates the performance of the system under consideration for some particular values of the parameter θ in a given interval and then uses interpolation and continuity properties of the corresponding interpolation operator to estimate the true functional dependence. The advantage of the method presented here comes from the fact that, while the complexity (the number of simulations) of the two methods is comparable, the estimates resulting from Taylor series approximations are prone to lower variance, i.e., faster convergence. A very likely reason is that, unlike interpolation polynomials, Taylor polynomials do not, in general, involve divisions by small numbers.

While the above facts rely on intuition sustained by some unsystematic experiments, establishing accurate error bounds for the estimates, which would lead to determining the convergence rates of the simulation process, and minimizing the errors by choosing convenient representations for the weak derivatives are topics for future research.


5. A CLASS OF NON-CONVENTIONAL ALGEBRAS WITH APPLICATIONS IN OR

In this chapter we apply the measure-valued differential calculus presented in Chapter 4 to distributions of random matrices in a special class of non-conventional algebras, in order to construct Taylor series approximations for performance measures of stochastic dynamic systems whose dynamics can be modeled by a general matrix-vector multiplication.

5.1 Introduction

Throughout this chapter we consider stochastic systems whose time-evolution can be modeled as follows:

∀ k ≥ 0 : V(k + 1) = X(k + 1) ⊙ V(k),   (5.1)

where ⊙ denotes a general matrix-vector multiplication, V(k), for k ≥ 0, is a finite-dimensional vector denoting the kth state of the system and X(k), for k ≥ 1, is a matrix of appropriate size describing the transition from the kth to the (k + 1)st state of the system. It follows that the kth state of such a system can be expressed as

∀ k ≥ 1 : V(k) = X(k) ⊙ · · · ⊙ X(1) ⊙ V(0),   (5.2)

provided that the matrix-vector multiplication ⊙ is associative. That is, the evolution of the system is completely determined by the initial state V(0) and the sequence of transitions {X(k) : k ≥ 1}. This general model arises when dealing with Discrete Event Systems (DES), e.g., queueing networks, stochastic activity networks and stochastic Petri nets, where the state dynamics can be modeled through a matrix-vector multiplication in either the conventional, max-plus or min-plus algebra. For instance, the optimal cost problem in transportation networks leads to min-plus models, whereas synchronization models lead to max-plus algebra. More examples with concrete interpretations can be found in [4], [13], [15], [29] and [45]. For time-homogeneous, deterministic max-plus-linear systems, i.e., X(k) = X, for all k ≥ 0, powerful tools exist for evaluating the system; see, e.g., [4], [29].

Assuming that the distributions of the input variables depend on some design parameter θ, this chapter deals with the problem of computing the expected value of the state vector E_θ[V(k)] or, more generally, E_θ[g(V(k))], for some cost-function g and a fixed horizon k ≥ 1, as a function of θ. This problem is known to be notoriously difficult, as exact formulae exist only for some special cases.

In the steady-state case, remarkable results have been obtained in [7] for the stationary waiting time in max-plus-linear queueing networks with a Poisson arrival stream, using


light-traffic approximation. These results have been extended to polynomially bounded performance measures in [1], [5], and explicit expressions for the moments, Laplace transforms and tail probabilities of waiting times are given in [2], [3]. Taylor series approximations have been successfully applied to the control of max-plus-linear DES. Applications based on the concept of variability expansion can be found in [20], [59].

Here, we propose Taylor series approximations based on the measure-valued differential calculus developed in Chapter 4 and, for ease of implementation, we introduce an analogous differential calculus for random matrices which in practice is easier to work with.

The chapter is organized as follows. In Section 5.2 we define the concept of a topological algebra of matrices, for which we introduce the concept of weak differentiability in Section 5.3 and construct a formal weak differential calculus in Section 5.4. Finally, we illustrate the results by two examples in Section 5.5.

5.2 Topological Algebras of Matrices

In this section we consider a separable, locally compact metric space (S, d) endowed with two binary associative operators, denoted by ⋄ and ∗, such that (S, ⋄) and (S, ∗) are monoids with unit elements 1⋄ and 1∗, respectively. Assume further that ⋄ is commutative and 1⋄ is absorbing for ∗, i.e.,

∀ s ∈ S : 1⋄ ∗ s = s ∗ 1⋄ = 1⋄.

For integers m, n ≥ 1 denote by M_{m,n}(S) the set of m × n matrices with elements from S. The generalized product of matrices X ∈ M_{m,k}(S) and Y ∈ M_{k,n}(S), denoted by X ⊙_{(⋄,∗)} Y, or simply X ⊙ Y when no confusion occurs, is defined as follows:

[X ⊙_{(⋄,∗)} Y]_{ij} := (X_{i1} ∗ Y_{1j}) ⋄ (X_{i2} ∗ Y_{2j}) ⋄ · · · ⋄ (X_{ik} ∗ Y_{kj}),   (5.3)

for each pair (i, j) with 1 ≤ i ≤ m, 1 ≤ j ≤ n. Note that a “zero” element 0_{(⋄,∗)} can be introduced on M_{m,n} by considering the matrix with all entries equal to 1⋄. Moreover, if m = n = k then ⊙ defines an internal operation on M_{n,n} and admits a neutral element, denoted by I_{(⋄,∗)}, which can be constructed just like in conventional algebra by setting all the entries of the matrix to 1⋄, except for those on the main diagonal, which are set to 1∗. We omit the subscript (⋄, ∗) if no confusion occurs.
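For concreteness, the generalized product (5.3) can be prototyped generically in a few lines of Python (a sketch with our own naming; the operators ⋄ and ∗ are passed as functions):

```python
from functools import reduce

def gen_matmul(X, Y, diamond, star):
    # generalized product (5.3): [X (.) Y]_ij = (X_i1 * Y_1j) <> ... <> (X_ik * Y_kj)
    k = len(Y)  # inner dimension
    return [[reduce(diamond, (star(X[i][l], Y[l][j]) for l in range(k)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# conventional algebra: diamond = +, star = x
conv = gen_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]],
                  lambda a, b: a + b, lambda a, b: a * b)
# max-plus algebra: diamond = max, star = +
mp = gen_matmul([[0, 2], [3, 1]], [[1], [0]], max, lambda a, b: a + b)
assert conv == [[19, 22], [43, 50]] and mp == [[2], [4]]
```

Passing + and × recovers the conventional matrix product, while max and + give the max-plus product discussed in Example 5.1 below.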

For each m, n ≥ 1 the set M_{m,n}(S) becomes a separable metric space when endowed with the metric ℘_{m,n} given by

∀ X, Y : ℘_{m,n}(X, Y) := max{d(X_{ij}, Y_{ij}) : 1 ≤ i ≤ m, 1 ≤ j ≤ n}.   (5.4)

In the sequel, we use the notations M_n(S) for M_{n,n}(S), ℘_n for ℘_{n,n}, and omit specifying the underlying space S, when no confusion occurs, by writing M_{m,n} instead of M_{m,n}(S).
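The metric (5.4) is equally simple to sketch (our own naming; for illustration we use the bounded metric on R that appears in (5.5) below):

```python
def matrix_metric(d, X, Y):
    # wp_{m,n}(X, Y) := max{ d(X_ij, Y_ij) over all entries }, cf. (5.4)
    return max(d(x, y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# the bounded metric |x - y| / (1 + |x - y|) on R
d = lambda x, y: abs(x - y) / (1 + abs(x - y))
assert matrix_metric(d, [[0, 1], [2, 3]], [[0, 3], [2, 3]]) == 2 / 3
```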

Assume now that the mappings ⋄ and ∗ are bi-continuous with respect to d. It follows that for all m, n, k ≥ 1 the mapping

⊙ : M_{m,n} × M_{n,k} → M_{m,k}

is continuous if one endows M_{m,n} with the metric ℘_{m,n}, for all m, n ≥ 1. In particular, if m = n = k, then ⊙ denotes an internal associative binary operation, i.e., (M_n, ⊙) is a


monoid, which acts as a bi-continuous mapping with respect to the corresponding metric ℘_n on M_n. In addition, we have (M_1, ⊙) = (S, ∗) and, in general, (M_{m,n}, ℘_{m,n}), with ℘_{m,n} defined by (5.4), for m, n ≥ 1, is a metric space which inherits most of the topological properties of (S, d), such as separability and local compactness.

We synthesize the above construction into the following definition.

Definition 5.1. We call the pair A := ({(M_{m,n}, ℘_{m,n}) : m, n ≥ 1}, ⊙) a topological algebra of matrices over the space (S, d, ⋄, ∗) if

(i) (S, d) is a separable, locally compact metric space,

(ii) (S, ⋄) and (S, ∗) are monoids with unit elements 1⋄ and 1∗, respectively,

(iii) ⋄ and ∗ are bi-continuous mappings with respect to d,

(iv) 1⋄ is absorbing for ∗, i.e., 1⋄ acts as a “zero” element,

(v) ⋄ is commutative and ∗ distributes over ⋄,

(vi) for m, n ≥ 1, M_{m,n} denotes the set of m × n matrices with entries in S,

(vii) for m, n ≥ 1, ℘_{m,n} is defined as in (5.4),

(viii) for m, n, k ≥ 1, ⊙ : M_{m,n} × M_{n,k} → M_{m,k} is defined as in (5.3).

By an upper-bound on a metric space we mean a real-valued, continuous, non-negative mapping. Let A = ({(M_{m,n}, ℘_{m,n}) : m, n ≥ 1}, ⊙) be a topological algebra of matrices over the space (S, d, ⋄, ∗).

We call the family ‖ · ‖ := {‖ · ‖_{m,n} : m, n ≥ 1} a pseudo-norm on A if

(i) for all m, n ≥ 1, ‖ · ‖_{m,n} is an upper-bound on M_{m,n},

(ii) the family ‖ · ‖ satisfies either: for each m, n, k ≥ 1,

∀ X ∈ M_{m,n}, Y ∈ M_{n,k} : ‖X ⊙ Y‖_{m,k} ≤ ‖X‖_{m,n} + ‖Y‖_{n,k},

or: for each m, n, k ≥ 1,

∃ γ > 0 : ∀ X ∈ M_{m,n}, Y ∈ M_{n,k} : ‖X ⊙ Y‖_{m,k} ≤ γ · ‖X‖_{m,n} · ‖Y‖_{n,k}.

The pseudo-norm will be called additive (resp. multiplicative) according to the operation on the right-hand side, and we say that (A, ‖ · ‖) is a pseudo-normed topological algebra of matrices. For simplicity we use the notation ‖ · ‖ for ‖ · ‖_{m,n}. Note that, by definition, the mapping ‖ · ‖_{m,n} is continuous with respect to ℘_{m,n}, for all m, n ≥ 1, and if m = n then ‖ · ‖_n satisfies

∀ X, Y ∈ M_n : ‖X ⊙ Y‖_n ≤ ‖X‖_n + ‖Y‖_n,

if ‖ · ‖_n is additive, or

∀ X, Y ∈ M_n : ‖X ⊙ Y‖_n ≤ γ · ‖X‖_n · ‖Y‖_n,


if ‖ · ‖ is multiplicative. We call the pair (M_n, ℘_n, ⊙, ‖ · ‖) a pseudo-normed topological monoid. In addition, note that ‖ · ‖_n is by no means a norm on M_n since, in general, M_n cannot be organized as a linear space. Moreover, ‖X‖_n = 0 does not in general imply X = 0.

Example 5.1. We enumerate here some classical examples of such structures arising in modeling theory.

(i) ⋄ = +, ∗ = × with 1⋄ = 0 and 1∗ = 1. This is, notably, the conventional algebra setting and we choose S = R endowed with the usual metric. The mapping ‖ · ‖ : M_{m,n} → R defined as

‖X‖ = max_{(i,j)} |X_{ij}|

is an upper-bound on M_{m,n}. In addition, for X ∈ M_{m,n} and Y ∈ M_{n,k} it holds that

∀ (i, j) : |[X ⊙ Y]_{ij}| ≤ ∑_{l=1}^{n} |X_{il}| · |Y_{lj}| ≤ n · ‖X‖ · ‖Y‖.

Taking the maximum with respect to (i, j) on the left-hand side, we obtain

∀ X, Y : ‖X ⊙ Y‖ ≤ n · ‖X‖ · ‖Y‖.

Therefore, ‖ · ‖ is a multiplicative pseudo-norm for the conventional algebra of matrices.

(ii) ⋄ = max, ∗ = + with 1⋄ = −∞ and 1∗ = 0, i.e., we are dealing with the so-called max-plus algebra. We take S = R ∪ {−∞}. An appropriate metric on S is given by

d(x, y) = |x − y|/(1 + |x − y|), if x, y ∈ R;  d(x, y) = 1, if exactly one of x, y equals −∞;  d(x, y) = 0, if x = y = −∞.   (5.5)

Obviously, d(x, y) = d(y, x) ≥ 0 and d(x, x) = 0, for all x, y ∈ S. To see that d is a metric on S, one has to show that d satisfies the triangle inequality, i.e.,

∀ x, y, z ∈ S : d(x, y) ≤ d(x, z) + d(z, y).

This is not straightforward and can be proved by considering several cases. Here we sketch the proof by considering the two non-trivial cases.

(a) If x, y, z ∈ R then, taking into account that (x, y) ↦ |x − y| defines a metric on R, by the triangle inequality we have

|x − y| ≤ |x − z| + |z − y|.

In addition, the mapping f(t) = t/(1 + t), for t ≥ 0, is nondecreasing and simple algebra shows that it satisfies

∀ t1, t2 ≥ 0 : f(t1 + t2) ≤ f(t1) + f(t2).


Then it follows that for each x, y, z ∈ R it holds that

d(x, y) = f(|x − y|) ≤ f(|x − z| + |z − y|) ≤ f(|x − z|) + f(|z − y|) = d(x, z) + d(z, y).

(b) If x, y ∈ R and z = −∞ then we have

d(x, y) = |x − y|/(1 + |x − y|) < 1 < 2 = d(x, z) + d(z, y).

Therefore, d in (5.5) defines a metric on S. For X ∈ M_{m,n} set

‖X‖ = max_{(i,j)} |X_{ij}|, if there exists (i, j) such that X_{ij} ≠ −∞, and ‖X‖ = 0, otherwise.

Since the trace of d on R is equivalent¹ to the usual metric on R, it follows that ‖ · ‖ is an upper-bound on M_{m,n}. Moreover, for X ∈ M_{m,n} and Y ∈ M_{n,k} it satisfies

∀ (i, j) : |[X ⊙ Y]_{ij}| = |max_{1≤l≤n} (X_{il} + Y_{lj})| ≤ ‖X‖ + ‖Y‖.   (5.6)

Taking the maximum over all (i, j) on the left-hand side of (5.6) yields

∀ X, Y : ‖X ⊙ Y‖ ≤ ‖X‖ + ‖Y‖,

which shows that ‖ · ‖ is an additive pseudo-norm for the max-plus algebra.

(iii) ⋄ = min, ∗ = + with 1⋄ = ∞ and 1∗ = 0, i.e., we obtain the min-plus algebra of matrices. Set S = R ∪ {∞} and define the metric d on S as follows:

d(x, y) = |x − y|/(1 + |x − y|), if x, y ∈ R;  d(x, y) = 1, if exactly one of x, y equals ∞;  d(x, y) = 0, if x = y = ∞.

In addition, for X ∈ M_{m,n} let us define

‖X‖ = max_{(i,j)} |X_{ij}|, if there exist i, j such that X_{ij} ≠ ∞, and ‖X‖ = 0, otherwise.

Following the same line of argument as in the above example, we conclude that d is a metric on S and ‖ · ‖ is an additive pseudo-norm for the topological algebra of matrices induced by (S, d, min, +).

(iv) ⋄ = max, ∗ = × with 1⋄ = −∞ and 1∗ = 1. Set S = R ∪ {−∞}, where we agree that

∀ x ∈ R : (−∞) × x = x × (−∞) = −∞,

1 i.e., both metrics generate the same topology.


i.e., −∞ is absorbing for ×. We choose d and ‖ · ‖ just like in (ii) above. In order to show that ‖ · ‖ is a pseudo-norm for this algebra of matrices, we note that for X ∈ M_{m,n} and Y ∈ M_{n,k} it holds that

∀ (i, j) : |[X ⊙ Y]_{ij}| = |max_{1≤l≤n} (X_{il} · Y_{lj})| ≤ ‖X‖ · ‖Y‖.

This leads to ‖X ⊙ Y‖ ≤ ‖X‖ · ‖Y‖, for each X, Y for which the matrix multiplication makes sense, i.e., ‖ · ‖ is a multiplicative pseudo-norm.

(v) ⋄ = min, ∗ = × with 1⋄ = ∞ and 1∗ = 1. We choose S = R ∪ {∞} and agree that

∀ x ∈ R : ∞ × x = x × ∞ = ∞.

We also choose d and ‖ · ‖ exactly as in (iii). Following the same arguments as in the above example, we obtain that ‖ · ‖ is a multiplicative pseudo-norm.
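As a sanity check, the additive pseudo-norm inequality of the max-plus case (ii) can be verified numerically on randomly generated finite matrices (a Python sketch with our own names; all entries are kept finite, so ‖X‖ is simply the maximal absolute entry):

```python
import random

def maxplus_mul(X, Y):
    # max-plus product: [X (.) Y]_ij = max_l (X_il + Y_lj)
    return [[max(X[i][l] + Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def norm(X):
    # ||X|| = max |X_ij| for matrices with finite entries
    return max(abs(e) for row in X for e in row)

rng = random.Random(42)
for _ in range(100):
    X = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(2)]
    Y = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(3)]
    # additive pseudo-norm property: ||X (.) Y|| <= ||X|| + ||Y||
    assert norm(maxplus_mul(X, Y)) <= norm(X) + norm(Y) + 1e-12
```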

5.3 Dp-Differentiability

In many mathematical models which can be described by one of the settings enumerated in Example 5.1, one is interested in assessing the behavior of the integrals (moments) of ‖X‖^p, for p ≥ 1. An efficient way to do this is to consider a particular set of test functions D_p, namely the class of polynomially bounded mappings, to be introduced in Section 5.3.1. In Section 5.3.2 we discuss the concept of D_p-differentiability of random matrices.

5.3.1 Dp-spaces

Let (X, d) be a metric space with upper-bound ‖ · ‖ and for p ≥ 1 let us denote by v_p the mapping defined as

∀ x ∈ X : v_p(x) = max{1, ‖x‖^p}.   (5.7)

Note that v_p ∈ C⁺(X). In addition, if (D(X), v_p) is a Banach base on X, we define D_p(X), or D_p when no confusion occurs, as follows:

D_p(X) := [D(X)]_{v_p} = { g ∈ D(X) : sup_{x∈X} |g(x)|/v_p(x) < ∞ }.   (5.8)

Note that the spaces D_p, for p ≥ 0, are Banach spaces and enjoy the property that q < p implies D_q ⊂ D_p. Indeed, note first that for q < p it holds that

∀ x ∈ X : v_q(x) = max{1, ‖x‖^q} ≤ max{1, ‖x‖^p} = v_p(x)   (5.9)

and consequently, for g ∈ D_q, it follows that

∀ x ∈ X : |g(x)| ≤ ‖g‖_{v_q} v_q(x) ≤ ‖g‖_{v_q} v_p(x),

i.e., for q < p we have ‖g‖_{v_p} ≤ ‖g‖_{v_q}, for all g ∈ D(X). The next result provides the main technical tool for dealing with D_p-spaces.


Lemma 5.1. Let (X, d_X), (Y, d_Y) and (Z, d_Z) be metric spaces equipped with upper-bounds ‖ · ‖_X, ‖ · ‖_Y and ‖ · ‖_Z, respectively. Let h : X × Y → Z be continuous and define w : X × Y → R as

∀ x ∈ X, y ∈ Y : w(x, y) = v_p(x) v_p(y);

see (5.7) for a definition of v_p. If any of the following conditions holds:

(α) there exist constants C_X, C_Y > 0 such that

∀ x ∈ X, y ∈ Y : ‖h(x, y)‖_Z ≤ C_X ‖x‖_X + C_Y ‖y‖_Y,

(β) there exists C > 0 such that

∀ x ∈ X, y ∈ Y : ‖h(x, y)‖_Z ≤ C ‖x‖_X ‖y‖_Y,

then ‖g ∘ h‖_w < ∞ for any g ∈ D_p(Z).

Proof. First, note that the conclusion reduces to

sup_{(x,y)} ‖h(x, y)‖_Z^p / (v_p(x) v_p(y)) < ∞,

since, by hypothesis, for z ∈ Z we have |g(z)| ≤ ‖g‖_{v_p} v_p(z).

If (α) holds true, then for each x ∈ X and y ∈ Y it holds that

‖h(x, y)‖_Z^p ≤ ∑_{i=0}^{p} \binom{p}{i} C_X^i ‖x‖_X^i C_Y^{p−i} ‖y‖_Y^{p−i} ≤ ∑_{i=0}^{p} \binom{p}{i} C_X^i C_Y^{p−i} v_p(x) v_p(y),   (5.10)

where the second inequality in (5.10) follows from (5.9). Therefore, from (5.10) we conclude that

∀ x ∈ X, y ∈ Y : ‖h(x, y)‖_Z^p / (v_p(x) v_p(y)) ≤ (C_X + C_Y)^p.

Assume now that (β) holds true. Then, for x ∈ X, y ∈ Y, it holds that

‖h(x, y)‖_Z^p ≤ C^p ‖x‖_X^p ‖y‖_Y^p ≤ C^p v_p(x) v_p(y).

Consequently, we have

∀ x ∈ X, y ∈ Y : ‖h(x, y)‖_Z^p / (v_p(x) v_p(y)) ≤ C^p.

Therefore, if (α) holds true then ‖g ∘ h‖_w ≤ (C_X + C_Y)^p ‖g‖_{v_p}, whereas if (β) holds true we have ‖g ∘ h‖_w ≤ C^p ‖g‖_{v_p}, which concludes the proof.

In the following we let X = M_{m,n}, i.e., X consists of m × n matrices in some pseudo-normed topological algebra. Recall that M_{m,n} becomes a separable, locally compact metric space when endowed with the metric ℘_{m,n}, so that the theory of weak differentiation can easily be adapted to this setting. The next result, which is an immediate consequence of Lemma 5.1, puts forward a remarkable property of D_p-spaces which will be crucial for introducing a weak differential calculus for random matrices on pseudo-normed topological algebras.


Corollary 5.1. Let (A, ‖ · ‖) = ({(M_{m,n}, ℘_{m,n}) : m, n ≥ 1}, ⊙, ‖ · ‖) be a pseudo-normed topological algebra of matrices and for each m, n ≥ 1 and p ≥ 0 let v_p and D_p(M_{m,n}) be defined as in (5.7) and (5.8), respectively. Then, for each m, n, k ≥ 1 and g ∈ D_p(M_{m,k}) the mapping

(X, Y) ↦ g(X ⊙ Y), X ∈ M_{m,n}, Y ∈ M_{n,k},

belongs to [D(M_{m,n} × M_{n,k})]_w, where w : M_{m,n} × M_{n,k} → R is defined as

∀ X ∈ M_{m,n}, Y ∈ M_{n,k} : w(X, Y) := v_p(X) · v_p(Y).

Proof. The proof follows from Lemma 5.1. Indeed, let X = M_{m,n}, Y = M_{n,k} and Z = M_{m,k}, with the usual metrics ℘ and upper-bounds, and let h = ⊙; noting that h is continuous, Lemma 5.1 concludes the proof.

5.3.2 Dp-Differentiability for Random Matrices

Let X ∈ M_{m,n} be a random matrix defined on some probability space (Ω, K) having distribution µ_θ, for θ ∈ Θ, i.e., X : Ω → M_{m,n} is measurable and for each θ ∈ Θ there exists some probability measure P_θ on K such that

∀ θ ∈ Θ : P_θ(X ∈ N) = µ_θ(N),

for each Borel subset N of M_{m,n}. Recall that if µ_θ is weakly D_p-differentiable, with derivative (c_θ, µ_θ⁺, µ_θ⁻), it follows that

∀ g ∈ D_p : d/dθ E_θ[g(X)] = c_θ E_θ[g(X⁺) − g(X⁻)],   (5.11)

where E_θ is an expectation operator consistent with X ∼ µ_θ and X± ∼ µ_θ±.

Assume now that X and Y are stochastically independent random matrices such that their distributions are D_p-differentiable. In order to study differentiability properties of the distribution of their product X ⊙ Y, one can apply Theorem 2.3 to the distributions of X and Y and obtain the following result.

Theorem 5.1. Let X ∈ M_{m,n} and Y ∈ M_{n,k} be stochastically independent random matrices with distributions µ_θ and η_θ, respectively. Assume further that the distributions µ_θ and η_θ are D_p-differentiable, having weak derivatives (c_θ^µ, µ_θ⁺, µ_θ⁻) and (c_θ^η, η_θ⁺, η_θ⁻), respectively. Then the distribution of the product X ⊙ Y is again D_p-differentiable and for each g ∈ D_p(M_{m,k}) it holds that

d/dθ E_θ[g(X ⊙ Y)] = c_θ^µ E_θ[g(X⁺ ⊙ Y) − g(X⁻ ⊙ Y)] + c_θ^η E_θ[g(X ⊙ Y⁺) − g(X ⊙ Y⁻)].

Proof. It follows from Theorem 2.3 that µ_θ × η_θ is [D(M_{m,n} × M_{n,k})]_w-differentiable, where

∀ X ∈ M_{m,n}, Y ∈ M_{n,k} : w(X, Y) := v_p(X) · v_p(Y),

and that the differentiation formula in the statement holds true for all g chosen such that the mapping

(X, Y) ↦ g(X ⊙ Y), X ∈ M_{m,n}, Y ∈ M_{n,k},

belongs to [D(M_{m,n} × M_{n,k})]_w. Therefore, from Corollary 5.1 we conclude that the distribution of X ⊙ Y is D_p(M_{m,k})-differentiable and the formula holds true for all g ∈ D_p(M_{m,k}).
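The product rule of Theorem 5.1 can be verified numerically in a simple special case: take the max-plus algebra and let X_θ, Y_θ be independent two-point (Bernoulli-type, cf. Example 5.2 below) random matrices, so that both normalizing constants equal 1 and the weak derivatives are (1, B1, A1) and (1, B2, A2). The following Python sketch (our own code and names) compares a finite difference of E[g(X_θ ⊙ Y_θ)] with the right-hand side of the differentiation formula:

```python
def maxplus(X, Y):
    # max-plus matrix product: [X (.) Y]_ij = max_l (X_il + Y_lj)
    return [[max(X[i][l] + Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# two-point matrices: value B w.p. theta, value A w.p. 1 - theta
A1, B1 = [[0.0, 2.0], [3.0, 1.0]], [[1.0, 0.0], [2.0, 4.0]]
A2, B2 = [[1.0], [0.0]], [[2.0], [3.0]]
g = lambda M: sum(row[0] for row in M)   # a test function on 2x1 matrices
theta = 0.4

def E_prod(t):
    # E[g(X_t (.) Y_t)] for independent X_t, Y_t
    return sum(px * py * g(maxplus(Xv, Yv))
               for px, Xv in ((1 - t, A1), (t, B1))
               for py, Yv in ((1 - t, A2), (t, B2)))

# right-hand side of the product rule with c^mu = c^eta = 1
rhs = (sum(py * (g(maxplus(B1, Yv)) - g(maxplus(A1, Yv)))
           for py, Yv in ((1 - theta, A2), (theta, B2)))
       + sum(px * (g(maxplus(Xv, B2)) - g(maxplus(Xv, A2)))
             for px, Xv in ((1 - theta, A1), (theta, B1))))

h = 1e-6
fd = (E_prod(theta + h) - E_prod(theta - h)) / (2 * h)
assert abs(fd - rhs) < 1e-5   # finite difference matches the weak product rule
```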


Remark 5.1. Since, throughout this chapter, our focus is on random matrices rather than on their distributions, we will use in the remainder of this chapter the notation E[g(X_θ)] instead of E_θ[g(X)], which would be consistent with the theory in Chapter 2, in order to emphasize the dependence on θ. In this notation (5.11) is re-written as

∀ g ∈ D_p : d/dθ E[g(X_θ)] = c_θ E[g(X_θ⁺) − g(X_θ⁻)],   (5.12)

and a weak derivative of X_θ will be formally denoted by X_θ′ = (c_θ, X_θ⁺, X_θ⁻).

We say that the random matrix X_θ is D_p-differentiable with respect to θ if its distribution is D_p-differentiable. Consequently, we call the triple (c_θ, X_θ⁺, X_θ⁻) a weak D_p-derivative of X_θ. In the same vein we define higher-order differentiation for random matrices. More specifically, we say that X_θ is n times D_p-differentiable, for n ≥ 1, if its distribution is n times D_p-differentiable. It follows that there exist c_θ⁽ⁿ⁾ > 0 and random matrices X_θ⁽ⁿ⁺⁾, X_θ⁽ⁿ⁻⁾ such that

∀ g ∈ D_p : dⁿ/dθⁿ E[g(X_θ)] = c_θ⁽ⁿ⁾ E[g(X_θ⁽ⁿ⁺⁾) − g(X_θ⁽ⁿ⁻⁾)].   (5.13)

Therefore, we call the triple

X_θ⁽ⁿ⁾ := (c_θ⁽ⁿ⁾, X_θ⁽ⁿ⁺⁾, X_θ⁽ⁿ⁻⁾)   (5.14)

an nth-order weak derivative of X_θ and we set

X_θ⁽ⁿ⁾ := (1, X_θ, X_θ),

if the nth-order derivative is not significant.

Example 5.2. We revisit Example 2.4. Assume that X_θ is a Bernoulli distributed random matrix, with point masses A, B ∈ M_{m,n} and parameter θ ∈ [0, 1], such that X_0 = A. If we interpret A as a random variable having Dirac distribution δ_A, then an instance of a weak derivative of X_θ is given by X_θ′ = (1, B, A). Therefore, it holds that

∀ g : d/dθ E[g(X_θ)] = E[g(B) − g(A)] = g(B) − g(A).

Furthermore, by Example 2.4 it follows that the higher-order derivatives X_θ⁽ⁿ⁾, for n ≥ 2, are not significant.
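Since for this two-point distribution E[g(X_θ)] = (1 − θ) g(A) + θ g(B) is linear in θ, the weak-derivative formula can be checked directly against a finite difference (a small Python sketch with our own names):

```python
def expectation(theta, g, A, B):
    # X_theta = B with probability theta, A with probability 1 - theta
    return (1 - theta) * g(A) + theta * g(B)

g = lambda M: max(max(row) for row in M)   # a sample test function
A = [[0.0, 1.0], [2.0, 3.0]]
B = [[5.0, 1.0], [0.0, 2.0]]

h, theta = 1e-7, 0.3
fd = (expectation(theta + h, g, A, B) - expectation(theta - h, g, A, B)) / (2 * h)
assert abs(fd - (g(B) - g(A))) < 1e-6   # derivative equals g(B) - g(A)
```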

Note that the representation of the nth-order derivative X_θ⁽ⁿ⁾ in (5.14) is not unique. However, by definition, any triple representation of X_θ⁽ⁿ⁾ should satisfy (5.13). Moreover, Theorem 5.1 can be re-phrased as: “If X_θ and Y_θ are stochastically independent, D_p-differentiable random matrices, then the product X_θ ⊙ Y_θ is D_p-differentiable as well.” In addition, the derivative d/dθ E[g(X_θ ⊙ Y_θ)] can be evaluated according to (5.12).

On the other hand, provided that the product X_θ ⊙ Y_θ is D_p-differentiable, it would be desirable to have a formula such as

(X_θ ⊙ Y_θ)′ = X_θ′ ⊙ Y_θ + X_θ ⊙ Y_θ′.   (5.15)


Unfortunately, an equation like (5.15) does not make sense as it stands, since the weak derivative of a random matrix X_θ is not a matrix anymore; it has an algebraic meaning only when identified with a triple (c_θ, X_θ⁺, X_θ⁻), where c_θ > 0 and X_θ± are again random matrices. Consequently, the expression on the right-hand side of (5.15) has no meaning. Therefore, in order to develop a differential calculus similar to classical analysis, i.e., to establish a connection between (5.12) and (5.15), we need to embed the algebra of matrices into a richer one, where ⊙ multiplication between random matrices and their derivatives makes sense. Moreover, the extended algebra should be consistent with the original one, i.e., the extended ⊙ multiplication should coincide with the original ⊙ multiplication when restricted to simple matrices.

Motivated by the above remarks, we introduce in Section 5.4 an extended algebra where the definition of the derivatives, as given by (5.14), is correct and where equalities such as (5.15) have a precise interpretation.

5.4 A Formal Differential Calculus for Random Matrices

Throughout this section we consider a pseudo-normed topological algebra of matrices

(A, ‖ · ‖) = ({(M_{m,n}, ℘_{m,n}) : m, n ≥ 1}, ⊙, ‖ · ‖)

and for m, n ≥ 1 and p ≥ 0 we consider the mapping v_p and the space D_p = [D]_{v_p} as defined by (5.7) and (5.8), respectively. Since in applications working with random variables is often more natural than working with their distributions (measures), we develop in the following a weak D_p-differential calculus for random matrices in the algebra (A, ‖ · ‖). The starting point of our analysis is Theorem 5.1, which asserts that D_p-differentiability of random elements X_θ and Y_θ is inherited by the product X_θ ⊙ Y_θ in a pseudo-normed topological algebra of matrices. In Section 5.4.1 we construct an extension A* of the algebra A and in Section 5.4.2 we show that a weak differential calculus, similar to the classical one, holds true on the extended algebra A*.

5.4.1 The Extended Algebra of Matrices

Let m, n ≥ 1 and denote by M̄_{m,n} the set of all finite sequences of triples (c, A, B), with c ∈ R₊ and A, B ∈ M_{m,n}. A generic element of M̄_{m,n} is thus given by

τ = ((c_1, A_1, B_1), (c_2, A_2, B_2), . . . , (c_n, A_n, B_n)),

where n = n_τ < ∞ is called the length of τ. If τ is of length one, i.e., n_τ = 1, we call it elementary. Note that the weak derivative of a random matrix is an elementary element of M̄_{m,n}.

On M̄_{m,n} we introduce the addition, denoted by +, as the concatenation of sequences. For example, let τ ∈ M̄_{m,n} be given by τ = (τ_i : 1 ≤ i ≤ n), with τ_i elementary for each 1 ≤ i ≤ n. Then we write τ = ∑_{i=1}^{n} τ_i. More generally, for σ, τ ∈ M̄_{m,n}, the application of the + operator yields

σ + τ = (σ_1, . . . , σ_{n_σ}, τ_1, . . . , τ_{n_τ}) = ∑_{i=1}^{n_σ} σ_i + ∑_{j=1}^{n_τ} τ_j.


For an elementary τ = (c, A, B) ∈ M̄_{m,n} we define the conjugate

τ̄ := (c, B, A)

and extend it to general τ = (τ_1, · · · , τ_n) as follows: τ̄ := (τ̄_1, · · · , τ̄_n).

On M̄_{m,n} we introduce a scalar multiplication as follows: for elementary τ = (c, A, B) and a real number r we set r · τ = (r · c, A, B), and extend it to general τ such that it distributes over +, i.e., r · τ = ∑_{i=1}^{n_τ} r · τ_i. Next, we introduce a multiplication, also denoted by ⊙, as follows²:

σ ⊙ τ := c_σ c_τ · ((1, A_σ ⊙ A_τ, 0), (1, B_σ ⊙ B_τ, 0), (1, 0, A_σ ⊙ B_τ), (1, 0, B_σ ⊙ A_τ)),

for elementary σ = (c_σ, A_σ, B_σ) ∈ M̄_{m,n} and τ = (c_τ, A_τ, B_τ) ∈ M̄_{n,k}, and we extend this operation to general elements via additivity. Specifically, if σ = (σ_1, · · · , σ_{n_σ}) ∈ M̄_{m,n} and τ = (τ_1, · · · , τ_{n_τ}) ∈ M̄_{n,k} then we set

σ ⊙ τ = ∑_{i=1}^{n_σ} ∑_{j=1}^{n_τ} σ_i ⊙ τ_j.

Finally, we embed M_{m,n} into M̄_{m,n} via a monomorphism ι given by

X^ι = ι(X) = (1, X, 0)

and we define the ι-extension g^ι of a function g : M_{m,n} → R in the following way: for

σ = ((c_1, A_1, B_1), (c_2, A_2, B_2), . . . , (c_{n_σ}, A_{n_σ}, B_{n_σ})) ∈ M̄_{m,n}

we set

g^ι(σ) = ∑_{i=1}^{n_σ} c_i (g(A_i) − g(B_i)).

Simple manipulations of the above introduced operations show that g^ι is linear with respect to addition and homogeneous with respect to scalar multiplication, i.e., for any g : M_{m,n} → R, σ, τ ∈ M̄_{m,n}, and c_σ, c_τ ∈ R it holds that

g^ι(c_σ · σ + c_τ · τ) = c_σ · g^ι(σ) + c_τ · g^ι(τ).

In addition, using the properties of the morphism ι, we deduce that, for elementary σ and τ,

σ ⊙ τ = c_σ c_τ [(A_σ ⊙ A_τ)^ι + (B_σ ⊙ B_τ)^ι + \overline{(A_σ ⊙ B_τ)^ι} + \overline{(B_σ ⊙ A_τ)^ι}],

where the overbar denotes the conjugate introduced above.
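The operations of this subsection are straightforward to prototype; the following Python sketch (our own code, instantiated in the max-plus algebra) implements the embedding ι, the extended product and the ι-extension g^ι, and checks that X^ι ⊙ Y^ι and (X ⊙ Y)^ι cannot be distinguished by g^ι, anticipating the weak equality discussed below:

```python
from itertools import product as cart

def maxplus(X, Y):
    # base product (.) in the max-plus instance
    return [[max(X[i][l] + Y[l][j] for l in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def iota(X):
    # embedding: X^iota = (1, X, 0); 0 is the all -inf "zero" matrix here
    zero = [[float('-inf')] * len(X[0]) for _ in X]
    return [(1.0, X, zero)]

def ext_mul(sigma, tau):
    # extended (.): each elementary pair expands into four triples,
    # and general elements multiply via additivity (concatenation)
    out = []
    for (cs, As, Bs), (ct, At, Bt) in cart(sigma, tau):
        zero = [[float('-inf')] * len(At[0]) for _ in As]
        c = cs * ct
        out += [(c, maxplus(As, At), zero), (c, maxplus(Bs, Bt), zero),
                (c, zero, maxplus(As, Bt)), (c, zero, maxplus(Bs, At))]
    return out

def g_iota(g, sigma):
    # iota-extension: g^iota(sigma) = sum_i c_i (g(A_i) - g(B_i))
    return sum(c * (g(A) - g(B)) for c, A, B in sigma)

X = [[0.0, 2.0], [1.0, 3.0]]
Y = [[1.0], [0.0]]
g = lambda M: max(0.0, *[e for row in M for e in row])  # finite at the zero matrix
assert g_iota(g, ext_mul(iota(X), iota(Y))) == g_iota(g, iota(maxplus(X, Y)))
```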

The set M̄_{m,n} can be embedded into the product space

Σ_{m,n} := (R × M_{m,n} × M_{m,n})^N,

i.e., the space of all (infinite) sequences of triples, via the morphism

∀ τ := (τ_1, . . . , τ_{n_τ}) ∈ M̄_{m,n} : τ ↦ (τ_1, . . . , τ_{n_τ}, 0^ι, 0^ι, . . .) = τ + 0^ι + 0^ι + · · · .

2 Recall that 0 denotes the “zero” element on M_{m,n}.


Then M̄_{m,n} is isomorphic to the subset of Σ_{m,n} consisting of all finite sequences and, for convenience, we identify any element τ with its image.

On the space Σ_{m,n} one can introduce a metric Λ_{m,n} in the following way: for elementary σ = (c_σ, A_σ, B_σ) and τ = (c_τ, A_τ, B_τ) we set

Λ_{m,n}(σ, τ) := max{|c_σ − c_τ|, ℘_{m,n}(A_σ, A_τ), ℘_{m,n}(B_σ, B_τ)},

and extend Λ_{m,n} to general elements by letting

Λ_{m,n}(σ, τ) := ∑_{i=1}^{∞} 2^{−i} · Λ_{m,n}(σ_i, τ_i) / (1 + Λ_{m,n}(σ_i, τ_i)),

for σ = (σ_i : i ≥ 1) and τ = (τ_i : i ≥ 1). It turns out that Λ_{m,n} is indeed a metric on Σ_{m,n}, so that (Σ_{m,n}, Λ_{m,n}) is a metric space. Therefore, if we identify M̄_{m,n} with a subset of Σ_{m,n}, then (M̄_{m,n}, Λ_{m,n}) becomes a metric space and, whenever g : M_{m,n} → R is continuous (resp. Borel measurable), the ι-extension g^ι of g is continuous (resp. Borel measurable) on M̄_{m,n}.

The structure
\[
\mathcal{A}^{*} := \bigl( (\widehat{M}_{m,n}, \Lambda_{m,n}) : m, n \geq 1, \odot \bigr)
\]
will be called the ∗-extension of A = ((Mm,n, ℘m,n) : m, n ≥ 1, ⊙), or the extended algebra, for short. Unfortunately, M̂m,n has rather poor algebraic properties. For example, the addition fails to be commutative and, moreover, does not admit a neutral (zero) element. Consequently, the ⊙ multiplication on the extended algebra A∗ is not a proper extension of the original ⊙ multiplication on A. To see this, note that for X ∈ Mm,n and Y ∈ Mn,k we have

\[
\begin{aligned}
X^{\iota} \odot Y^{\iota} &= (1, X, 0) \odot (1, Y, 0) \\
&= (1, X \odot Y, 0) + (1, 0 \odot 0, 0) + (1, 0, X \odot 0) + (1, 0, 0 \odot Y) \\
&= (1, X \odot Y, 0) + (1, 0, 0) + (1, 0, 0) + (1, 0, 0) \\
&= (X \odot Y)^{\iota} + 0^{\iota} + 0^{\iota} + 0^{\iota} \neq (X \odot Y)^{\iota}. \qquad (5.16)
\end{aligned}
\]

However, one can avoid these inconveniences by dealing with weak equalities on M̂m,n, as we now explain. On M̂m,n the equality σ = τ means that σ equals τ componentwise. Since our aim is to study expressions such as E[g(X)] for random matrices X ∈ Mm,n and a certain class of functions g, we introduce the concept of weak equality on M̂m,n. Precisely, let D be a set of mappings from Mm,n to R. We say that random elements σ, τ ∈ M̂m,n are weakly equal with respect to D, and we write σ ≡_D τ, or σ ≡ τ when no confusion can occur, if

\[
\forall g \in D : \mathbb{E}[g^{\iota}(\sigma)] = \mathbb{E}[g^{\iota}(\tau)].
\]

Obviously, if σ and τ are non-random then the above condition becomes

\[
\forall g \in D : g^{\iota}(\sigma) = g^{\iota}(\tau).
\]

Remark 5.2. By using weak equalities on the extended algebra A∗ one can derive someinteresting facts. In the following we enumerate a few of them.


(i) The addition on M̂m,n is commutative, in a weak sense, with respect to any class of functions. Moreover, 0^ι = (1, 0, 0) is a zero element. Note, however, that it is not unique. Indeed, any finite sum of elements of type (c, X, X), with c > 0 and X ∈ Mm,n, acts as a neutral element for the addition.

(ii) The extended ⊙ multiplication is a proper extension of the original ⊙ multiplication of matrices, i.e.,
\[
\forall X \in M_{m,n},\, Y \in M_{n,k} : (X \odot Y)^{\iota} \equiv_{D} X^{\iota} \odot Y^{\iota},
\]
where the above weak equality holds true with respect to any class of functions D. Indeed, this follows from (5.16) and (i) above.

(iii) If Xθ ∈ Mm,n is Dp-differentiable, having derivative (c_θ, X⁺_θ, X⁻_θ), then it follows from (5.12) that
\[
\forall g \in D_p(M_{m,n}) : \frac{d}{d\theta}\, \mathbb{E}[g(X_\theta)] = \mathbb{E}\bigl[ g^{\iota}(c_\theta, X^{+}_{\theta}, X^{-}_{\theta}) \bigr].
\]
Hence, by setting X′_θ := (c_θ, X⁺_θ, X⁻_θ) ∈ M̂m,n it follows that
\[
\forall g \in D_p(M_{m,n}) : \frac{d}{d\theta}\, \mathbb{E}[g(X_\theta)] = \mathbb{E}\bigl[ g^{\iota}(X'_\theta) \bigr]. \qquad (5.17)
\]
In particular, note that if X̄′_θ = (c̄_θ, X̄⁺_θ, X̄⁻_θ) is another representation of the derivative of Xθ, then it holds that X̄′_θ ≡_{Dp} X′_θ. The same fact holds true for higher-order derivatives, which means that the definition of the derivatives of a random matrix, as given by (5.13), is correct.

5.4.2 Dp-Differential Calculus

In practice, checking Dp-differentiability of a random matrix is not straightforward. In many applications, however, the distribution of the random matrix Xθ depends on θ through the distribution of some of its entries [Xθ]_{ij}, for some pairs of indices (i, j). One would therefore expect Dp-differentiability of Xθ to be related to that of its entries. In the following we give a precise meaning to this statement. To this end, recall the notation in Section 4.2.1. Specifically, for k, n ≥ 1 set

\[
\mathcal{J}(k, n) := \{ \jmath = (\jmath_1, \ldots, \jmath_k) : 0 \leq \jmath_l \leq n, \; \jmath_1 + \ldots + \jmath_k = n \}.
\]
For ȷ = (ȷ₁, . . . , ȷ_k) ∈ J(k, n) we denote by ν(ȷ) the number of non-zero elements of the vector ȷ and by I(ȷ) the set of vectors ı ∈ {−1, 0, +1}^k such that ı_l ≠ 0 if and only if ȷ_l ≠ 0 and such that the product of all non-zero elements of ı equals one, i.e., ı contains an even number of −1's. For ı ∈ I(ȷ), we denote by ı̄ the vector obtained from ı by changing the sign of the non-zero element at the highest position.

Lemma 5.2. Let {U_{l,θ} : 1 ≤ l ≤ k} ⊂ M₁ be a collection of n-times Dp-differentiable, independent random variables with Dp-derivatives given by
\[
\forall 1 \leq l \leq k, \; 1 \leq m \leq n : \bigl( c^{(m)}_{l,\theta}, \, U^{(m,+)}_{l,\theta}, \, U^{(m,-)}_{l,\theta} \bigr).
\]


If for each θ ∈ Θ the entries of the matrix Xθ satisfy
\[
\forall (i, j) : [X_\theta]_{ij} = X_{ij}(U_{1,\theta}, \ldots, U_{k,\theta}),
\]
for some measurable mappings X_{ij}, then Xθ is n times Dp-differentiable, provided that some positive constants d₁, . . . , d_k exist such that
\[
\forall u_1, \ldots, u_k : \| X(u_1, \ldots, u_k) \| \leq d_1 \| u_1 \| + \ldots + d_k \| u_k \|,
\]
where X(u₁, . . . , u_k) denotes the matrix with entries {X_{ij}(u₁, . . . , u_k) : (i, j)}. In addition, the nth-order derivative X^{(n)}_θ can be represented in the extended algebra as follows:
\[
\sum_{\jmath \in \mathcal{J}(k,n)} C_\theta(\jmath) \sum_{\imath \in \mathcal{I}(\jmath)} \Bigl( 1, \; X\bigl( U^{(\jmath_1, \imath_1)}_{1,\theta}, \ldots, U^{(\jmath_k, \imath_k)}_{k,\theta} \bigr), \; X\bigl( U^{(\jmath_1, \bar{\imath}_1)}_{1,\theta}, \ldots, U^{(\jmath_k, \bar{\imath}_k)}_{k,\theta} \bigr) \Bigr),
\]
where, for ȷ = (ȷ₁, . . . , ȷ_k) ∈ J(k, n), we set
\[
C_\theta(\jmath) := \binom{n}{\jmath_1, \ldots, \jmath_k} \prod_{l=1}^{k} c^{(\jmath_l)}_{l,\theta}
\]
and, for convenience, we identify
\[
\forall 1 \leq l \leq k : \; U^{(\jmath_l, +1)}_{l,\theta} = U^{(\jmath_l, +)}_{l,\theta}, \quad U^{(\jmath_l, -1)}_{l,\theta} = U^{(\jmath_l, -)}_{l,\theta}, \quad U^{(0, 0)}_{l,\theta} = U_{l,\theta}.
\]

Proof. Let us define h : M₁ × . . . × M₁ → Mm,n as follows:
\[
\forall u_1, \ldots, u_k : h(u_1, \ldots, u_k) := X(u_1, \ldots, u_k).
\]
A successive application of Lemma 5.1 concludes the first part of the proof. The second part follows by applying Corollary 4.2 to the random elements {U_{l,θ} : 1 ≤ l ≤ k} and taking Remark 5.2 (iii) into account.

The basis of our Dp-differential calculus for random matrices is the following result, which follows directly from Theorem 5.1 by re-writing (5.15) as a weak equality in the extended algebra A∗.

Theorem 5.2. Let Xθ ∈ Mm,n and Yθ ∈ Mn,k be stochastically independent, Dp-differentiable random matrices with Dp-derivatives X′θ and Y′θ, respectively. Then the generalized product Xθ ⊙ Yθ ∈ Mm,k is Dp-differentiable and we have
\[
(X_\theta \odot Y_\theta)' \equiv_{D_p} X'_\theta \odot Y^{\iota}_\theta + X^{\iota}_\theta \odot Y'_\theta.
\]

Proof. From (5.12) in Theorem 5.1 we conclude that
\[
\forall g \in D_p : \frac{d}{d\theta}\, \mathbb{E}[g(X_\theta \odot Y_\theta)] = \mathbb{E}\bigl[ g^{\iota}(X'_\theta \odot Y^{\iota}_\theta + X^{\iota}_\theta \odot Y'_\theta) \bigr]. \qquad (5.18)
\]
On the other hand, since Xθ ⊙ Yθ is Dp-differentiable, it follows from (5.17) that
\[
\forall g \in D_p : \frac{d}{d\theta}\, \mathbb{E}[g(X_\theta \odot Y_\theta)] = \mathbb{E}\bigl[ g^{\iota}\bigl( (X_\theta \odot Y_\theta)' \bigr) \bigr],
\]
which together with (5.18) implies that
\[
\forall g \in D_p : \mathbb{E}\bigl[ g^{\iota}\bigl( (X_\theta \odot Y_\theta)' \bigr) \bigr] = \mathbb{E}\bigl[ g^{\iota}(X'_\theta \odot Y^{\iota}_\theta + X^{\iota}_\theta \odot Y'_\theta) \bigr].
\]

This concludes the proof.


The following result is the counterpart of the generalized Leibniz-Newton differentia-tion rule for random matrices.

Theorem 5.3. Let Xθ(i), for 1 ≤ i ≤ k, be a sequence of mutually independent, n-times Dp-differentiable random matrices such that the generalized product
\[
X_\theta := X_\theta(k) \odot \ldots \odot X_\theta(1)
\]
is well defined. Then Xθ is n times Dp-differentiable and, if we denote by [Xθ(i)]^{(m)} the mth-order derivative of Xθ(i), for all 1 ≤ i ≤ k and 1 ≤ m ≤ n, then it holds that
\[
X^{(n)}_\theta \equiv_{D_p} \sum_{\jmath \in \mathcal{J}(k,n)} \binom{n}{\jmath_1, \ldots, \jmath_k} \cdot [X_\theta(k)]^{(\jmath_k)} \odot \ldots \odot [X_\theta(1)]^{(\jmath_1)},
\]
where, for 1 ≤ i ≤ k, we agree that [Xθ(i)]^{(0)} = [Xθ(i)]^ι.

Proof. Note first that the function
\[
h(x_k, \ldots, x_1) = x_k \odot \ldots \odot x_1
\]
satisfies the conditions of Lemma 5.1 and then apply Corollary 4.2 to the random variables {Xθ(i) : 1 ≤ i ≤ k}.

We conclude this section by discussing the concept of Dp-analyticity of random matrices. We say that the random matrix Xθ is Dp-analytic if its distribution is Dp-analytic. Therefore, in accordance with Definition 4.1, the random matrix Xθ is Dp-analytic if the following two conditions are satisfied:

• all higher-order derivatives X^{(n)}_θ, for n ≥ 1, exist;

• there exists some neighborhood V of θ such that
\[
\forall \xi : \theta + \xi \in V : \; X_{\theta+\xi} \equiv_{D_p} \sum_{n=0}^{\infty} \frac{\xi^n}{n!} \cdot X^{(n)}_\theta.
\]

Example 5.3. Let us revisit Example 5.2. If Xθ is Bernoulli distributed with point masses A, B and parameter θ ∈ [0, 1], it follows that all higher-order derivatives of Xθ exist and one can easily check that for any p ≥ 1 it holds that
\[
\forall \theta \in [0, 1] : \; X_\theta \equiv_{D_p} X_0 + \theta \cdot X'_0 \equiv_{D_p} \sum_{n=0}^{\infty} \frac{\theta^n}{n!} \cdot X^{(n)}_0,
\]
since, for n ≥ 2, the derivatives X^{(n)}_0 are not significant. It follows that X₀ is weakly Dp-analytic, for any p ≥ 1; see Example 4.2.

Consequently, we extend concepts such as Taylor polynomials, radius and domain ofconvergence to analytic random matrices by means of their distribution.


Theorem 5.4. Let {U_{l,θ} : 1 ≤ l ≤ k} ⊂ M₁ be a collection of Dp-analytic, independent random variables with corresponding domains of convergence D^p_θ(U_l), for 1 ≤ l ≤ k, respectively. If for each θ ∈ Θ the entries of the matrix Xθ satisfy
\[
\forall (i, j) : [X_\theta]_{ij} = X_{ij}(U_{1,\theta}, \ldots, U_{k,\theta}),
\]
for some measurable mappings X_{ij}, then Xθ is Dp-analytic, provided that some positive constants d₁, . . . , d_k exist such that
\[
\forall u_1, \ldots, u_k : \| X(u_1, \ldots, u_k) \| \leq d_1 \| u_1 \| + \ldots + d_k \| u_k \|,
\]
where X(u₁, . . . , u_k) denotes the matrix with entries {X_{ij}(u₁, . . . , u_k) : (i, j)}. More specifically, for each ξ such that θ + ξ ∈ D^p_θ(U_l), for any 1 ≤ l ≤ k, it holds that
\[
X_{\theta+\xi} \equiv_{D_p} \sum_{n=0}^{\infty} \frac{\xi^n}{n!} \, X^{(n)}_\theta.
\]

Proof. The existence of the derivatives X^{(n)}_θ follows from Lemma 5.2. Now apply Corollary 4.3 to the distributions μ_{l,θ} of U_{l,θ} and use Lemma 5.1.

Theorem 5.4 relates weak analyticity of a random matrix to that of its entries. In applications this will be an important technical tool for establishing weak analyticity of a random matrix. Since in many models the state of a stochastic system is described by a finite product of random matrices, our next result shows that products of weakly analytic random matrices are again weakly analytic, in a Dp-sense.

Theorem 5.5. Let Xθ(i), for 1 ≤ i ≤ k, be a sequence of stochastically independent, Dp-analytic random matrices, having domains of convergence D^p_θ(X(i)), respectively, such that the generalized product
\[
X_\theta := X_\theta(k) \odot \ldots \odot X_\theta(1)
\]
is well defined. Then Xθ is Dp-analytic. Specifically, for any ξ such that θ + ξ ∈ D^p_θ(X(i)), for each 1 ≤ i ≤ k, it holds that
\[
X_{\theta+\xi} \equiv_{D_p} \sum_{n=0}^{\infty} \frac{\xi^n}{n!} \cdot X^{(n)}_\theta.
\]

Proof. The existence of the derivatives X^{(n)}_θ follows from Theorem 5.3. To conclude the proof, apply Corollary 4.3 to the distributions μ_{i,θ} of Xθ(i).

We have constructed a weak Dp-differential calculus for random matrices by "translating" the results of Chapter 4 in terms of random objects. Apart from the convenience of working with random objects rather than with probability distributions, this differential calculus has the advantage that it is based on a single class of cost-functions on each space, namely Dp. Therefore, Dp can be seen as a "universal" class of cost-functions. The trade-off, however, is that we restrict our analysis to pseudo-normed algebras of matrices, i.e., we impose some restrictions on the upper bounds under consideration.


5.5 Taylor Series Approximations for Stochastic Max-Plus Systems

In this section we illustrate our theory with two parameter-dependent max-plus dynamic systems. The first one, treated in Section 5.5.1, is inspired by F. Baccelli & D. Hong (see [6]) and describes a cyclic, multi-server station, whereas in Section 5.5.2 we show how one can model a stochastic activity network as a max-plus dynamic system. In both situations we perform Taylor series approximations.

5.5.1 A Multi-Server Network with Delays/Breakdowns

Let us consider a cyclic network with two stations, where the first station has one server and the second one has two servers. The network has three customers, two of which are initially beginning their service, whereas the third one is in the buffer of the multi-server station, just about to enter the server. Assume that the service time at the single-server station is σ time units, that the service time at the multi-server station is τ time units, and that each customer, after finishing service at one station, instantly moves to the other station, where he/she either waits in the buffer if the station is busy or enters the available server and begins service. This system is called the default system. In the following we consider two variations of the default system: the delayed system and the breakdown system. The delayed system differs from the default in that the service time at the multi-server station is increased by an amount δ. In the breakdown system one server is removed from the multi-server station, modeling a breakdown of that particular server. The three systems are illustrated in Figure 5.1.

The above three systems can be modeled as (max, +)-linear systems. Indeed, choose as state variable a 4-dimensional vector V(k) such that V¹(k) denotes the kth arrival epoch at the single-server station, V²(k) denotes the kth departure epoch from the single-server station, V³(k) denotes the kth arrival epoch at the multi-server station and V⁴(k) denotes the kth departure epoch from the multi-server station, where Vⁱ(k), for 1 ≤ i ≤ 4, denote the components of the vector V(k). Then the dynamics of each of the three systems is given by
\[
\forall k \geq 0 : V(k+1) = X \odot V(k),
\]

where ⊙ denotes the (max, +) matrix–vector multiplication, and if we set
\[
D := \begin{pmatrix} \sigma & \varepsilon & \tau & \varepsilon \\ \sigma & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & 0 & \varepsilon & 0 \\ \varepsilon & \varepsilon & \tau & \varepsilon \end{pmatrix}, \quad
P_d := \begin{pmatrix} \sigma & \varepsilon & \tau+\delta & \varepsilon \\ \sigma & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & 0 & \varepsilon & 0 \\ \varepsilon & \varepsilon & \tau+\delta & \varepsilon \end{pmatrix}, \quad
P_b := \begin{pmatrix} \sigma & \varepsilon & \tau & \varepsilon \\ \sigma & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & 0 & \tau & \varepsilon \\ \varepsilon & \varepsilon & \tau & \varepsilon \end{pmatrix},
\]
then we have X = D for the default system, X = P_d for the delayed system and X = P_b

for the system with breakdowns; see [6] for a proof.

One can construct two hybrid stochastic systems out of these three, as follows. First, we consider a system with delays, i.e., each transition takes place according to the default matrix D with a certain probability 1 − θ and according to the delayed matrix P_d with probability θ ∈ [0, 1). The dynamics of such a system is thus given by
\[
\forall k \geq 0 : V_\theta(k+1) = X_\theta(k+1) \odot V_\theta(k),
\]


Fig. 5.1: A multi-server, cyclic network with perturbations (delay/breakdown).

where Vθ(0) = V(0) = 0 and {Xθ(k) : k ≥ 1} is a sequence of i.i.d. random matrices having common distribution given by
\[
\forall \theta \in [0, 1] : \mu_\theta = (1 - \theta) \cdot \delta_D + \theta \cdot \delta_{P_d}.
\]
Secondly, we consider a system with random breakdowns, which is defined in the same way as the first one, but with the perturbation matrix P_d replaced by P_b. This yields for the common distribution of {Xθ(k) : k ≥ 1}
\[
\forall \theta \in [0, 1] : \eta_\theta = (1 - \theta) \cdot \delta_D + \theta \cdot \delta_{P_b}.
\]
Therefore, in both situations the kth state vector Vθ(k) is given by
\[
\forall k \geq 1 : V_\theta(k) = X_\theta(k) \odot \ldots \odot X_\theta(1) \odot V(0). \qquad (5.19)
\]
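A minimal numerical sketch of the dynamics above: the (max, +) matrix–vector product and a few iterations of the default system, using the parameter values σ = 14, τ = 24, δ = 7 from Example 5.4. The Monte Carlo step at the end, sampling Xθ(k) from the mixture μθ, is only an illustration of (5.19), not part of the method developed in the text.

```python
import math
import random

EPS = -math.inf  # the max-plus "epsilon"

def mp_matvec(M, v):
    """(max, +) matrix-vector product: (M (+) v)_i = max_j (M_ij + v_j)."""
    return [max(M[i][j] + v[j] for j in range(len(v))) for i in range(len(M))]

s, t, d = 14.0, 24.0, 7.0  # sigma, tau, delta as in Example 5.4

D = [[s,   EPS, t,   EPS],
     [s,   EPS, EPS, EPS],
     [EPS, 0.0, EPS, 0.0],
     [EPS, EPS, t,   EPS]]

Pd = [[s,   EPS, t + d, EPS],
      [s,   EPS, EPS,   EPS],
      [EPS, 0.0, EPS,   0.0],
      [EPS, EPS, t + d, EPS]]

# Default system: V(k+1) = D (+) V(k), started at V(0) = 0.
V = [0.0] * 4
for _ in range(3):
    V = mp_matvec(D, V)
assert V == [52.0, 52.0, 38.0, 48.0]

# One sample path of the system with delays: X_theta(k) = Pd w.p. theta, else D.
random.seed(1)
theta = 0.3
W = [0.0] * 4
for _ in range(10):
    W = mp_matvec(Pd if random.random() < theta else D, W)
```

The deterministic assertion can be checked by hand: each iteration adds the single-server service time σ = 14 to the first component or routes through the multi-server station with τ = 24.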

Since X₀(i) is Dp-analytic, for any 1 ≤ i ≤ k and p ≥ 1 (see Example 5.3), by Theorem 5.5 it follows that the product
\[
X_0(k) \odot \ldots \odot X_0(1)
\]
is Dp-analytic, for any p ≥ 1, and by Theorem 5.3 it holds that
\[
X_\theta(k) \odot \ldots \odot X_\theta(1) \equiv_{D_p} \sum_{n \geq 0} \sum_{\jmath \in \mathcal{J}(k,n)} \frac{\theta^n}{\jmath_1! \cdots \jmath_k!} \cdot [X_0(k)]^{(\jmath_k)} \odot \ldots \odot [X_0(1)]^{(\jmath_1)},
\]


for any θ ∈ [0, 1], where, for 1 ≤ i ≤ k and j ≥ 0, we have
\[
[X_\theta(i)]^{(j)} =
\begin{cases}
(1, X_\theta(i), 0), & j = 0, \\
(1, P_d, D), & j = 1, \\
(1, 0, 0), & j \geq 2;
\end{cases}
\]
see Example 5.2. It follows that for n > k the nth-order derivatives of the product X₀(k) ⊙ . . . ⊙ X₀(1) are not significant. Hence, the Taylor series is finite.

Fix now a finite horizon k ≥ 1. In the following we illustrate how weak analyticity of the product X₀(k) ⊙ . . . ⊙ X₀(1) can be used to compute the expected values of the components Vⁱθ(k), for 1 ≤ i ≤ 4, of the vector Vθ(k). To this end, define, for 1 ≤ i ≤ 4, the mappings g_i : M₄ → R as follows:
\[
\forall X \in M_4 : g_i(X) = (X \odot V(0))_i,
\]
and note that g_i ∈ D_p for each p ≥ 1 and 1 ≤ i ≤ 4. From (5.19) we conclude that
\[
\forall 1 \leq i \leq 4 : V^{i}_\theta(k) = g_i\bigl( X_\theta(k) \odot \ldots \odot X_\theta(1) \bigr).
\]

Therefore, it follows that
\[
\begin{aligned}
\mathbb{E}\bigl[ V^{i}_\theta(k) \bigr] &= \mathbb{E}\bigl[ g_i(X_\theta(k) \odot \ldots \odot X_\theta(1)) \bigr] \\
&= \sum_{n=0}^{k} \sum_{\jmath \in \mathcal{J}(k,n)} \frac{\theta^n}{\jmath_1! \cdots \jmath_k!} \, \mathbb{E}\Bigl[ g^{\iota}_i \bigl( [X_0(k)]^{(\jmath_k)} \odot \ldots \odot [X_0(1)]^{(\jmath_1)} \bigr) \Bigr] \\
&= \sum_{n=0}^{k} u_k(n) \, \theta^n, \qquad (5.20)
\end{aligned}
\]

where, for 0 ≤ n ≤ k, we set
\[
\begin{aligned}
u_k(n) &:= \sum_{\jmath \in \mathcal{J}(k,n)} \frac{1}{\jmath_1! \cdots \jmath_k!} \, \mathbb{E}\Bigl[ g^{\iota}_i \bigl( [X_0(k)]^{(\jmath_k)} \odot \ldots \odot [X_0(1)]^{(\jmath_1)} \bigr) \Bigr] \\
&= \sum_{\jmath \in \mathcal{J}(k,n)} \frac{1}{\jmath_1! \cdots \jmath_k!} \, g^{\iota}_i \bigl( [X_0(k)]^{(\jmath_k)} \odot \ldots \odot [X_0(1)]^{(\jmath_1)} \bigr), \qquad (5.21)
\end{aligned}
\]

since the product [X₀(k)]^{(ȷ_k)} ⊙ . . . ⊙ [X₀(1)]^{(ȷ₁)} is deterministic.

To evaluate the coefficients {u_k(n) : 0 ≤ n ≤ k}, let us now introduce the following notation: [k] := {1, 2, . . . , k} and, for I ⊂ [k], denote by |I| its cardinality and set
\[
\Pi_I := B_k \odot \ldots \odot B_1,
\]
where
\[
B_i = \begin{cases} P_d, & i \in I, \\ D, & i \notin I. \end{cases}
\]


For instance, we have Π_∅ = D^k,
\[
\forall 1 \leq i \leq k : \Pi_{\{i\}} = D^{i-1} \odot P_d \odot D^{k-i},
\]
\[
\forall 1 \leq i < j \leq k : \Pi_{\{i,j\}} = D^{i-1} \odot P_d \odot D^{j-i-1} \odot P_d \odot D^{k-j},
\]
and Π_{[k]} = P_d^k. Since [X₀(i)]^{(ȷ_i)} is non-significant for ȷ_i ≥ 2, from (5.21) it follows that
\[
\forall 0 \leq n \leq k : u_k(n) = \sum g^{\iota}_i \bigl( [X_0(k)]^{(\jmath_k)} \odot \ldots \odot [X_0(1)]^{(\jmath_1)} \bigr),
\]
where the above sum is taken over all ȷ = (ȷ₁, . . . , ȷ_k) ∈ J(k, n) whose components satisfy ȷ₁, . . . , ȷ_k ∈ {0, 1}. Moreover, if we set
\[
I_{\jmath} := \{ i \in [k] : \jmath_i = 1 \},
\]
it follows that
\[
\forall 0 \leq n \leq k : u_k(n) = \sum_{|I_{\jmath}| = n} g^{\iota}_i \bigl( [X_0(k)]^{(\jmath_k)} \odot \ldots \odot [X_0(1)]^{(\jmath_1)} \bigr).
\]

Introducing now the sums
\[
\forall 1 \leq i \leq 4, \; 0 \leq n \leq k : \sigma_i(n) = \sum_{|I| = n} g_i(\Pi_I),
\]

we conclude from (5.21) that the coefficients of the Taylor series satisfy
\[
u_k(n) = \sum_{l=0}^{n} (-1)^{n-l} \binom{k-l}{n-l} \sigma_i(l)
\]
and from (5.20) we conclude that for any 1 ≤ i ≤ 4 it holds that
\[
\forall \theta \in [0, 1] : \mathbb{E}\bigl[ V^{i}_\theta(k) \bigr] = \sum_{n=0}^{k} \left[ \sum_{l=0}^{n} (-1)^{n-l} \binom{k-l}{n-l} \sigma_i(l) \right] \theta^n. \qquad (5.22)
\]

Remark 5.3. One could also arrive at (5.22) by using the equality
\[
\mathbb{E}\bigl[ V^{i}_\theta(k) \bigr] = \sum_{I \subset [k]} g_i(\Pi_I) \, \theta^{|I|} (1 - \theta)^{k - |I|} = \sum_{n=0}^{k} \sigma_i(n) \, \theta^n (1 - \theta)^{k-n},
\]
and by calculating the coefficients of θ^m, for 1 ≤ m ≤ k, on the right-hand side above.
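The identity in Remark 5.3 gives a direct way to check (5.22) numerically. The sketch below does so for the system with delays, using the parameter values σ = 14, τ = 24, δ = 7 of Example 5.4 and the cost-function g₁; the small horizon k = 3 is an assumption made here for brevity (Example 5.4 uses k = 10).

```python
import math
from itertools import combinations

EPS = -math.inf
s, t, d = 14.0, 24.0, 7.0  # sigma, tau, delta as in Example 5.4

D = [[s, EPS, t, EPS], [s, EPS, EPS, EPS],
     [EPS, 0.0, EPS, 0.0], [EPS, EPS, t, EPS]]
Pd = [[s, EPS, t + d, EPS], [s, EPS, EPS, EPS],
      [EPS, 0.0, EPS, 0.0], [EPS, EPS, t + d, EPS]]

def mp_matmul(A, B):
    """(max, +) matrix product: (A (+) B)_ij = max_l (A_il + B_lj)."""
    return [[max(A[i][l] + B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def Pi(I, k):
    """Pi_I = B_k (+) ... (+) B_1 with B_i = Pd if i in I, else D."""
    M = [[0.0 if i == j else EPS for j in range(4)] for i in range(4)]
    for i in range(k, 0, -1):
        M = mp_matmul(M, Pd if i in I else D)
    return M

g1 = lambda M: max(M[0])  # g_1(M) = (M (+) V(0))_1 with V(0) = 0

k = 3
sig = [sum(g1(Pi(set(I), k)) for I in combinations(range(1, k + 1), n))
       for n in range(k + 1)]  # sigma_1(n) = sum over |I| = n of g_1(Pi_I)

def u(n):
    """Taylor coefficient u_k(n) from (5.22)."""
    return sum((-1) ** (n - l) * math.comb(k - l, n - l) * sig[l]
               for l in range(n + 1))

theta = 0.3
taylor = sum(u(n) * theta ** n for n in range(k + 1))
exact = sum(sig[n] * theta ** n * (1 - theta) ** (k - n) for n in range(k + 1))
assert sig[0] == 52.0             # g_1(D^3), the unperturbed completion value
assert abs(taylor - exact) < 1e-9  # (5.22) agrees with Remark 5.3
```

The agreement is exact (up to rounding) because the binomial inversion behind (5.22) is just the expansion of θⁿ(1 − θ)^{k−n} in powers of θ.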

Example 5.4. For a numerical example, set k = 10, σ = 14, τ = 24 and δ = 7. A graphical representation of the Taylor polynomials of degree 1, 2 and 3, respectively, along with the true expected value of the first component of the state vector Vθ(10) for the system with delays can be seen in Figure 5.2. For the system with breakdowns one can use a similar reasoning, replacing P_d by P_b. The corresponding Taylor polynomials of degree 1, 2 and 3, along with the true expected value of the first component of the state vector Vθ(10), are represented in Figure 5.3. In both figures the thick line represents the true value.


Fig. 5.2: Taylor approximations of orders 1, 2 and 3 along with the true value of E[V¹θ(10)] (thick line), for the system with delays.

Fig. 5.3: Taylor approximations of orders 1, 2 and 3 along with the true value of E[V¹θ(10)] (thick line), for the system with breakdowns.


5.5.2 SAN Modeled as Max-Plus-Linear Systems

Let us revisit the stochastic-activity-network models described in Section 4.3. Recall that a SAN can be described as a directed, acyclic graph (V, E ⊂ V × V), with one source node, one sink node and an additive weight mapping τ : E → R. For convenience, V = {1, 2, . . . , n} and, since we deal with a directed, acyclic graph, the nodes can be ordered such that whenever i is connected to j, i.e., (i, j) ∈ E, it holds that i < j.

Recall that the completion time of a SAN is given by the weight of a "critical" path (where critical can mean maximal or minimal, according to the situation). For i ∈ V let us denote by t_i the completion time of the ith node, i.e., the completion time of the SAN obtained by removing the nodes {k : k ≥ i + 1} and the adjacent edges from the original graph. In general, computing the completion times of a large SAN can be very demanding. Classical exhaustive "walk-through-graph" methods are hard to implement, because the set of paths from source to sink node may become very large (it may have up to 2ⁿ elements).

Alternatively, one can model a SAN as a dynamic max-plus system and compute the vector of completion times t := (t₁, . . . , t_n)ᵗ, where for a matrix A we denote by Aᵗ its transpose, using the following scheme.

Algorithm 1. The following algorithm yields the vector of completion times in a SAN:

1. Construct the n × n incidence matrix A of the given graph.

2. For i = 1 up to n, consider the matrix A(i), obtained from the identity matrix I by replacing the ith row with the ith row of A.

3. Denote by e₁ the first unit vector (0, ε, . . . , ε)ᵗ and set
\[
t := A(n) \odot A(n-1) \odot \cdots \odot A(1) \odot e_1.
\]

4. Recover the completion time of the ith node of the SAN from the ith component of the vector t.
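Algorithm 1 is straightforward to implement. The sketch below runs it on the five-node SAN of Section 4.3 (matrices A(2), . . . , A(5) below); the numeric values chosen for the activity durations X₁, . . . , X₇ are illustrative assumptions of this example, not values from the text.

```python
import math

EPS = -math.inf  # the max-plus "epsilon"

def mp_matvec(M, v):
    """(max, +) matrix-vector product: (M (+) v)_i = max_j (M_ij + v_j)."""
    return [max(M[i][j] + v[j] for j in range(len(v))) for i in range(len(M))]

def completion_times(A):
    """Algorithm 1: t = A(n) (+) ... (+) A(2) (+) e1, where A(i) is the
    identity with its i-th row replaced by the i-th row of A (A(1) = I
    is omitted, cf. Remark 5.4)."""
    n = len(A)
    t = [0.0] + [EPS] * (n - 1)          # e1 = (0, eps, ..., eps)^t
    for i in range(1, n):                # apply A(2), ..., A(n) in order
        Ai = [[0.0 if r == c else EPS for c in range(n)] for r in range(n)]
        Ai[i] = list(A[i])
        t = mp_matvec(Ai, t)
    return t

# Incidence matrix of the five-node SAN (rows as in A(2), ..., A(5) below),
# with illustrative durations X1..X7 = 2, 3, 1, 4, 10, 2, 5:
X1, X2, X3, X4, X5, X6, X7 = 2.0, 3.0, 1.0, 4.0, 10.0, 2.0, 5.0
A = [[0.0, EPS, EPS, EPS, EPS],
     [X1,  0.0, EPS, EPS, EPS],
     [X2,  X3,  0.0, EPS, EPS],
     [EPS, X4,  X6,  0.0, EPS],
     [EPS, X5,  EPS, X7,  0.0]]

t = completion_times(A)
assert t == [0.0, 2.0, 3.0, 6.0, 12.0]  # t5 = X1 + X5 is the critical path here
```

Because the nodes are topologically ordered, applying A(2), . . . , A(n) once, in increasing order, already propagates all completion times.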

Remark 5.4. The incidence matrix A is sub-diagonal and it follows that A(i) differs from the identity matrix in at most (i − 1) entries. Moreover, since A(1) = I, the identity matrix (i.e., t₁ = e), it can be omitted from the product. Finally, provided that the weights {τ(e) : e ∈ E} are mutually independent, the matrices A(i), for 1 ≤ i ≤ n, are stochastically independent.

For instance, let us consider the SAN example studied in Section 4.3, where X_i, for 1 ≤ i ≤ 7, denote the weights (durations) of the subsequent activities (see Figure 4.1), and recall that ε := −∞. Then the vector of completion times for this SAN can be obtained by considering the following matrices:
\[
A(2) = \begin{pmatrix} 0 & \varepsilon & \varepsilon & \varepsilon & \varepsilon \\ X_1 & 0 & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & 0 & \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & \varepsilon & 0 & \varepsilon \\ \varepsilon & \varepsilon & \varepsilon & \varepsilon & 0 \end{pmatrix}, \quad
A(3) = \begin{pmatrix} 0 & \varepsilon & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & 0 & \varepsilon & \varepsilon & \varepsilon \\ X_2 & X_3 & 0 & \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & \varepsilon & 0 & \varepsilon \\ \varepsilon & \varepsilon & \varepsilon & \varepsilon & 0 \end{pmatrix},
\]


\[
A(4) = \begin{pmatrix} 0 & \varepsilon & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & 0 & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & 0 & \varepsilon & \varepsilon \\ \varepsilon & X_4 & X_6 & 0 & \varepsilon \\ \varepsilon & \varepsilon & \varepsilon & \varepsilon & 0 \end{pmatrix}, \quad
A(5) = \begin{pmatrix} 0 & \varepsilon & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & 0 & \varepsilon & \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & 0 & \varepsilon & \varepsilon \\ \varepsilon & \varepsilon & \varepsilon & 0 & \varepsilon \\ \varepsilon & X_5 & \varepsilon & X_7 & 0 \end{pmatrix}.
\]

It is easy to check that the max-plus matrix–vector product
\[
t := A(5) \odot A(4) \odot A(3) \odot A(2) \odot e_1
\]
yields the vector of completion times for the SAN under consideration. More specifically, t = (t₁, t₂, t₃, t₄, t₅) ∈ [0, ∞)⁵ has the property that t_i equals the completion time at the ith node. In particular, the completion time T of the full network is given by T = t₅. It follows that the expected completion time of the SAN can be written as
\[
\mathbb{E}[T] = \mathbb{E}\bigl[ (A(5) \odot A(4) \odot A(3) \odot A(2) \odot e_1)_5 \bigr]. \qquad (5.23)
\]

Recall that in Section 4.3 the weights Xi, for 1 ≤ i ≤ 7, were independent exponentially distributed random variables with rates λi, respectively. We have assumed further that λ1 = λ3 = θ, while the other rates are independent of θ. In the following we formalize the reasoning put forward in Section 4.3, i.e., performing Taylor series approximations of the expected completion time Tθ of the SAN with respect to the parameter θ, in terms of the Dp-differential calculus for random matrices.
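Expression (5.23) immediately suggests a Monte Carlo estimator for E[T]: sample the exponential weights, evaluate the max-plus product, and average the fifth component of t. The sketch below is illustrative only; in particular, the unit rates assumed for the weights other than X1 and X3 are not taken from the thesis.

```python
import random

random.seed(2008)
eps = float("-inf")

def maxplus_matvec(A, x):
    # (A (.) x)_i = max_j (A[i][j] + x[j])
    return [max(a + b for a, b in zip(row, x)) for row in A]

def completion_time(X1, X2, X3, X4, X5, X6, X7):
    """t = A(5) (.) A(4) (.) A(3) (.) A(2) (.) e for the example SAN; returns T = t_5."""
    A2 = [[0, eps, eps, eps, eps], [X1, 0, eps, eps, eps], [eps, eps, 0, eps, eps],
          [eps, eps, eps, 0, eps], [eps, eps, eps, eps, 0]]
    A3 = [[0, eps, eps, eps, eps], [eps, 0, eps, eps, eps], [X2, X3, 0, eps, eps],
          [eps, eps, eps, 0, eps], [eps, eps, eps, eps, 0]]
    A4 = [[0, eps, eps, eps, eps], [eps, 0, eps, eps, eps], [eps, eps, 0, eps, eps],
          [eps, X4, X6, 0, eps], [eps, eps, eps, eps, 0]]
    A5 = [[0, eps, eps, eps, eps], [eps, 0, eps, eps, eps], [eps, eps, 0, eps, eps],
          [eps, eps, eps, 0, eps], [eps, X5, eps, X7, 0]]
    t = [0.0, eps, eps, eps, eps]          # e = (0, eps, ..., eps)^t
    for A in (A2, A3, A4, A5):
        t = maxplus_matvec(A, t)
    return t[4]

def sample_T(theta):
    # lambda_1 = lambda_3 = theta; the remaining rates are illustrative only
    lam = [theta, 1.0, theta, 1.0, 1.0, 1.0, 1.0]
    return completion_time(*[random.expovariate(l) for l in lam])

est = sum(sample_T(1.0) for _ in range(20000)) / 20000   # Monte Carlo for (5.23)
```

With deterministic weights the product reproduces the longest-path value, which provides a convenient sanity check of the matrices.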

To start with, note that for each 1 ≤ i ≤ 5 the mapping

A ∈ M5 : A ↦ [A ⊙ e]_i

belongs to any Dp-space, for p ≥ 0. Therefore, by Theorem 5.5, analyticity of E[Tθ] follows from Dp-analyticity of the product

Aθ := Aθ(5) ⊙ Aθ(4) ⊙ Aθ(3) ⊙ Aθ(2),   (5.24)

where we use the notation Aθ(i) instead of A(i) in order to indicate the dependence of their distributions on θ. We agree that Aθ(i) is constant if its distribution does not depend on θ. Note that, in this case, only Aθ(2) and Aθ(3) depend on θ.

Since the exponential distribution is weakly [D]v-analytic for any polynomial v (see Example 4.3), it follows by Theorem 5.4 that the matrices Aθ(i), for 2 ≤ i ≤ 5, are Dp-analytic, and by applying Theorem 5.5 we conclude that the product Aθ in (5.24) is Dp-analytic. In addition, for each ξ such that |ξ| < θ it holds that

A_{θ+ξ} ≡_{Dp} Σ_{n=0}^∞ (ξ^n/n!) · Aθ^(n).   (5.25)

To compute the derivatives Aθ^(n) one can use Lemma 5.2. To this end, let us consider two sequences {X_{1,l} : l ≥ 1} and {X_{3,l} : l ≥ 1} of i.i.d. random variables having exponential distribution with rate θ, and let T_{j,k} denote the completion time of the modified SAN,


i.e., of the SAN in which X1 is replaced by Σ_{l=1}^j X_{1,l} and X3 by Σ_{l=1}^k X_{3,l}; see Section 4.3. For instance, if we set

∀n ≥ 0 : S3^n := Σ_{l=1}^n X_{3,l},

the derivatives Aθ^(n)(3) of Aθ(3) can be expressed as triples (cθ^(n)(3), Aθ^(n,+)(3), Aθ^(n,−)(3)), where, for each n ≥ 1, cθ^(n)(3) = n!/θ^n and

Aθ^(n,+)(3) =
    0     ε        ε    ε    ε
    ε     0        ε    ε    ε
    X2    S3^n     0    ε    ε
    ε     ε        ε    0    ε
    ε     ε        ε    ε    0

Aθ^(n,−)(3) =
    0     ε        ε    ε    ε
    ε     0        ε    ε    ε
    X2    S3^(n+1) 0    ε    ε
    ε     ε        ε    0    ε
    ε     ε        ε    ε    0
,

for n odd and

Aθ^(n,+)(3) =
    0     ε        ε    ε    ε
    ε     0        ε    ε    ε
    X2    S3^(n+1) 0    ε    ε
    ε     ε        ε    0    ε
    ε     ε        ε    ε    0

Aθ^(n,−)(3) =
    0     ε        ε    ε    ε
    ε     0        ε    ε    ε
    X2    S3^n     0    ε    ε
    ε     ε        ε    0    ε
    ε     ε        ε    ε    0
,

for n even. One can proceed similarly for calculating the derivatives Aθ^(n)(2) of Aθ(2), for n ≥ 1.

Now Theorem 5.3 allows us to compute the higher-order derivatives of the product Aθ in (5.24). Since only Aθ(2) and Aθ(3) depend on θ, it follows that the higher-order derivatives of E[Tθ] can be written as follows:

∀n ≥ 1 :

d^n/dθ^n E[Tθ] = d^n/dθ^n E[g5(Aθ)]

    = d^n/dθ^n E[g5(Aθ(5) ⊙ Aθ(4) ⊙ Aθ(3) ⊙ Aθ(2))]

    = Σ_{j+k=n} E[g5(Aθ(5) ⊙ Aθ(4) ⊙ Aθ^(j)(3) ⊙ Aθ^(k)(2))]

    = (−1)^n (n!/θ^n) Σ_{j+k=n} E[T_{j+1,k+1} − T_{j+1,k} − T_{j,k+1} + T_{j,k}],

where g5 : M5 → R, g5(X) = [X ⊙ e]_5, i.e., we obtain the following sequence of Taylor polynomials

T_n(θ, ξ) := Σ_{m=0}^n (−1)^m (ξ/θ)^m Σ_{j+k=m} Eθ[T_{j+1,k+1} − T_{j+1,k} − T_{j,k+1} + T_{j,k}],

for n ≥ 0 and |ξ| < θ. The above Taylor series is identical to the one in (4.17). Hence, we can proceed just as in Section 4.3.
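The Taylor polynomials above can be estimated by simulation. The sketch below computes the first-order polynomial T_1(θ, ξ) from coupled samples of T_{1,1}, T_{2,1} and T_{1,2}; it adopts the convention, implicit in the telescoping sums of Section 4.3, that terms T_{j,k} with a zero index drop out, and it assumes illustrative unit rates for the θ-independent weights.

```python
import random

random.seed(13)

def network_T(X1, X2, X3, X4, X5, X6, X7):
    # longest-path (max-plus) recursion for the example SAN of Section 4.3
    t2 = X1
    t3 = max(X2, X3 + t2)
    t4 = max(X4 + t2, X6 + t3)
    t5 = max(X5 + t2, X7 + t4)
    return t5

def coupled_sample(theta):
    """One coupled sample of (T_{1,1}, T_{2,1}, T_{1,2}): T_{j,k} uses the first
    j (resp. k) terms of two i.i.d. Exp(theta) sequences for X1 (resp. X3)."""
    E = random.expovariate
    x1 = [E(theta), E(theta)]
    x3 = [E(theta), E(theta)]
    X2, X4, X5, X6, X7 = (E(1.0) for _ in range(5))   # illustrative rates
    T = lambda j, k: network_T(sum(x1[:j]), X2, sum(x3[:k]), X4, X5, X6, X7)
    return T(1, 1), T(2, 1), T(1, 2)

theta, xi, runs = 1.0, 0.2, 20000
s11 = s_diff = 0.0
for _ in range(runs):
    T11, T21, T12 = coupled_sample(theta)
    s11 += T11
    s_diff += T21 + T12 - 2.0 * T11      # first-order coefficient estimate

# first-order Taylor polynomial T_1(theta, xi), estimated by Monte Carlo
T1_approx = s11 / runs - (xi / theta) * (s_diff / runs)

# direct simulation at theta + xi, for comparison
direct = sum(coupled_sample(theta + xi)[0] for _ in range(runs)) / runs
```

Coupling the three completion times through common random weights keeps the variance of the coefficient estimate small, which is what makes the comparison with the direct estimate meaningful at this sample size.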


5.6 Concluding Remarks

In this chapter we have considered parameter-dependent stochastic systems whose physical evolution is modeled by a matrix-vector multiplication in some general algebra. To analyze these systems we have adapted the measure-valued differentiation theory to random matrices, and it turned out that, in the Dp-space setting, a weak differential calculus similar to the classical one holds true for random matrices. The key result of this chapter states that Dp-differentiability (resp. analyticity) of random matrices Xθ and Yθ is inherited by their generalized product Xθ ⊙ Yθ, for a certain class of matrix multiplication operators ⊙. Based on this differential calculus we have derived Taylor series approximations for DES.

As illustrated in this chapter, Taylor series approximations provide rather accurate estimates. A similar problem, for (max-plus)-linear stochastic systems with parameter-dependent Poisson input, has been addressed in [7], where the coefficients of the Taylor series appear as expectations of polynomials of some input variables of the system. In addition, the method was successfully applied to derive Taylor series expansions for Lyapunov exponents of ergodic (max-plus)-linear systems. In [6], Taylor series expansions are obtained for the max-plus Lyapunov exponent of an i.i.d. sequence of Bernoulli distributed random matrices (in particular, for the network with breakdowns presented in Section 5.5.1), where the derivatives are evaluated using specific max-plus techniques such as backwards coupling. A theory of Taylor series expansion of products in the max-plus algebra is provided in [21].

The analysis put forward in this chapter is meant to be a first step towards developing a general theory comprising a wider range of applications. In this sense, challenging topics for future research are, for instance: to adapt the theory of weak differentiation to the random horizon setting (in order to construct Taylor series approximations for Lyapunov exponents, whose existence in the case of generalized linear stochastic systems can be shown by using sub-additive ergodic theory; see, e.g., [37, 21]); to develop efficient algorithms for evaluating the derivatives based on the particularities of the model; and to obtain accurate estimates for the error of the Taylor polynomials.


APPENDIX

A. Convergence of Infinite Series of Real Numbers

Let us consider a sequence {a_n : n ≥ 0} of real numbers and consider the infinite series Σ_{n=0}^∞ a_n. The series is said to be convergent if the sequence {S_n} defined as

∀n ≥ 0 : S_n := Σ_{k=0}^n a_k

is convergent in R, and is said to be divergent otherwise. The limit of the sequence {S_n}_{n≥0} (provided that it exists) is called the sum of the series. In addition, the convergent series Σ_{n=0}^∞ a_n is said to be absolutely convergent if the series Σ_{n=0}^∞ |a_n| is convergent, and we call it conditionally convergent if Σ_{n=0}^∞ |a_n| is divergent.

Theorem A.1. The Rearrangement Theorem: Let Σ_{n=0}^∞ a_n be a convergent series. Then:

(i) If the series is absolutely convergent, then for any permutation σ of the set of non-negative integers the series Σ_{n=0}^∞ a_{σ(n)} is convergent and has the same sum as the original series.

(ii) If the series is conditionally convergent, then for any S ∈ R there exists a permutation σ such that the series Σ_{n=0}^∞ a_{σ(n)} converges to S.
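Part (ii) of the theorem is constructive and easy to visualize numerically: greedily take positive terms until the partial sum exceeds the target, then negative terms until it drops below, and so on; since the terms tend to zero, the partial sums converge to the target. A small sketch for the alternating harmonic series (an illustrative example, not from the thesis):

```python
import math

def rearranged_partial_sum(target, n_terms):
    """Greedily rearrange the conditionally convergent series sum (-1)^(n+1)/n
    (natural sum log 2), steering its partial sums toward `target`."""
    pos = iter(1.0 / n for n in range(1, 10**7, 2))    # +1, +1/3, +1/5, ...
    neg = iter(-1.0 / n for n in range(2, 10**7, 2))   # -1/2, -1/4, -1/6, ...
    s = 0.0
    for _ in range(n_terms):
        s += next(pos) if s <= target else next(neg)
    return s

natural = sum((-1.0) ** (n + 1) / n for n in range(1, 200001))  # close to log 2
steered = rearranged_partial_sum(1.0, 200000)                   # close to 1.0
```

The same greedy scheme works for any target S, which is exactly the content of part (ii).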

Theorem A.2. Cauchy–Hadamard Theorem: Let {a_n : n ≥ 0} be a sequence of real numbers and let

R := 1 / lim sup_{n→∞} |a_n|^{1/n},

where we agree that 1/∞ = 0 and 1/0 = ∞. Then the power series

Σ_{n=0}^∞ a_n ξ^n

is absolutely convergent for |ξ| < R, uniformly on every disc |ξ| ≤ r with r < R.

For a proof of these results see, e.g., [53].
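A minimal numerical illustration of the Cauchy–Hadamard formula, using the geometric coefficients a_n = 2^n (for which R = 1/2):

```python
def radius_estimate(a, N):
    """Estimate R = 1 / limsup |a_n|^(1/n) from the tail coefficients a(N/2..N-1)."""
    return 1.0 / max(abs(a(n)) ** (1.0 / n) for n in range(N // 2, N))

a = lambda n: 2.0 ** n                 # geometric coefficients: radius R = 1/2
R = radius_estimate(a, 400)

# inside the disc (|xi| = 0.4 < 1/2) the partial sums converge to 1/(1 - 2*xi) = 5
xi = 0.4
partial = sum(a(n) * xi ** n for n in range(200))
```

For |ξ| ≥ 1/2 the terms (2ξ)^n no longer tend to zero, so the same partial sums diverge, in line with the theorem.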


B. Interchanging Limits

In this section we state standard results from classical analysis which establish sufficient conditions for interchanging limits with other limit operations: continuity, differentiation and integration.

Theorem B.1. Interchanging Limits: Let {a_{m,n} : m, n ∈ N} be a double-indexed sequence of real numbers such that a_{m,n} converges to some limit c_n, for m → ∞, uniformly with respect to n ∈ N. If, in addition, the limit

b_m := lim_{n→∞} a_{m,n}

exists for each m ∈ N, then the sequence {b_m} converges and it holds that

lim_{n→∞} c_n = lim_{m→∞} b_m.

That is, interchanging of limits is justified and we have

lim_{n→∞} lim_{m→∞} a_{m,n} = lim_{m→∞} lim_{n→∞} a_{m,n}.

Theorem B.2. Interchanging Limit and Continuity: Assume that f_n : A ⊂ S → R, for n ∈ N, defines a sequence of functions which converges uniformly on A to some function f, and let x be an accumulation point of A. If, in addition, the limit

L_n := lim_{s→x} f_n(s)

exists for each n ∈ N, then the sequence {L_n} converges and it holds that

lim_{s→x} f(s) = lim_{n→∞} L_n.

That is, interchanging the limit with continuity is justified and we have

lim_{s→x} lim_{n→∞} f_n(s) = lim_{n→∞} lim_{s→x} f_n(s).

Theorem B.3. Interchanging Limit and Differentiation: Let A ⊂ R be a compact interval and consider a sequence of functions f_n : A → R, for n ≥ 1, satisfying:

(i) f_n is differentiable on A for each n ≥ 1;

(ii) there exists some x_0 ∈ A such that the sequence {f_n(x_0)}_{n≥1} converges.

If the sequence {f′_n}_{n≥1} converges uniformly on A, then the sequence {f_n}_{n≥1} converges uniformly on A to some function f, and we have

∀x ∈ A : f′(x) = lim_{n→∞} f′_n(x).

Theorem B.4. Interchanging Limit and Integration: If f_n : [a, b] → R, for n ≥ 1, is a sequence of Riemann integrable functions that converges uniformly on [a, b], then the limit f : [a, b] → R is Riemann integrable and it holds that

∫_a^b f(x)dx = lim_{n→∞} ∫_a^b f_n(x)dx.

For a proof of these results we refer to [53].
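As a small numerical illustration of Theorem B.4, the functions f_n(x) = x² + sin(nx)/n converge uniformly on [0, 1] to x², and their (midpoint-rule) integrals converge to the integral of the limit:

```python
import math

def midpoint_integral(f, a, b, m=2000):
    # composite midpoint rule standing in for the Riemann integral
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) for i in range(m)) * h

# f_n(x) = x^2 + sin(n x)/n converges uniformly on [0, 1] to f(x) = x^2,
# since |f_n(x) - x^2| <= 1/n independently of x
f_n = lambda n: (lambda x: x * x + math.sin(n * x) / n)
integrals = [midpoint_integral(f_n(n), 0.0, 1.0) for n in (1, 10, 100, 1000)]
limit_integral = midpoint_integral(lambda x: x * x, 0.0, 1.0)   # = 1/3
```

The uniform bound 1/n on the error is precisely what transfers to the integrals, since |∫f_n − ∫f| ≤ (b−a)·sup|f_n − f|.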


C. Measure Theory

In this section we list a few standard results on measure theory. In what follows we assume that S is a metric space endowed with the Borel σ-field S. If µ ∈ M we say that a property holds true almost everywhere with respect to |µ|, and we write |µ|-a.e., if the property holds true for each s ∈ S except on a set A ∈ S such that |µ|(A) = 0.

Theorem C.1. Dominated Convergence Theorem: Let µ ∈ M and assume that {f_n : n ∈ N} ⊂ L1(|µ|) is a sequence of functions such that f_n → f, |µ|-a.e., and there exists g ∈ L1(|µ|) such that |f_n| ≤ g, |µ|-a.e., for all n ∈ N. Then,

lim_{n→∞} ∫ f_n(x)µ(dx) = ∫ f(x)µ(dx).
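The role of the dominating function g can be seen numerically: for f_n(x) = x^n on [0, 1], dominated by g ≡ 1, the integrals converge to the integral of the (a.e. zero) limit, whereas for the escaping-mass sequence f_n = n·I_(0,1/n), which admits no integrable dominating function, the conclusion fails (an illustrative sketch, not from the thesis):

```python
def midpoint_integral(f, a, b, m=20000):
    # composite midpoint rule standing in for the integral on [a, b]
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) for i in range(m)) * h

# dominated case: |x^n| <= 1 on [0, 1], so the integrals converge to 0
dominated = [midpoint_integral(lambda x, n=n: x ** n, 0.0, 1.0)
             for n in (1, 5, 50, 500)]

# no integrable dominating function: f_n = n * I_(0, 1/n) -> 0 pointwise,
# yet every integral equals 1, so the limit and the integral do not commute
escaping = [midpoint_integral(lambda x, n=n: float(n) if x < 1.0 / n else 0.0,
                              0.0, 1.0)
            for n in (1, 5, 50, 500)]
```

Any dominating g for the second sequence would have to satisfy g(x) ≥ 1/x near 0, which is not integrable; this is exactly the hypothesis the theorem cannot do without.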

Theorem C.2. Monotone Convergence Theorem: Let µ ∈ M and assume that {f_n : n ∈ N} ⊂ L1(|µ|) is a sequence of non-negative functions such that f_n → f, |µ|-a.e., and satisfying f_n ≤ f_{n+1}, |µ|-a.e., for all n ∈ N. Then,

lim_{n→∞} ∫ f_n(x)µ(dx) = ∫ f(x)µ(dx).

Theorem C.3. Radon–Nikodym Theorem: Let µ be a positive measure on S and ν a finite signed measure on S. If |ν| is absolutely continuous with respect to µ, then there exists f ∈ L1(µ) such that

∀A ∈ S : ν(A) = ∫ f(x) I_A(x) µ(dx).

The function f is called the Radon–Nikodym derivative and is unique µ-a.e.

Theorem C.4. Lebesgue Decomposition Theorem: Let λ be a positive measure on S and ν a finite signed measure on S. Then there exist uniquely determined finite signed measures µ, κ such that

• |µ| is absolutely continuous with respect to λ,

• |κ| and λ are orthogonal,

• ν = µ + κ.

For the proof of the above results we refer to [14].


D. Conditional Expectations

This section is intended to list the definition and the basic properties of the conditionalexpectation as stated in any standard textbook on probability theory.

Theorem D.1. Existence of the Conditional Expectation: Let (Ω, K, P) be a probability space and B ⊂ K a σ-field. For any random variable X ∈ L1(K, P) there exists a P-a.s. unique random variable Z ∈ L1(B, P), denoted by E[X|B], such that

∀B ∈ B : E[Z I_B] = E[X I_B].

The random variable Z is called the conditional expectation of X with respect to B. If Y ∈ L1(K, P) then we define the conditional expectation of X with respect to Y as E[X|Y] := E[X|σ(Y)], where σ(Y) denotes the σ-field generated by Y.

The conditional expectation acts as a projection operator from L1(K, P) onto L1(B, P). As an operator, the conditional expectation enjoys the following basic properties³:

(a) It is the identity when restricted to L1(B, P), i.e., if X ∈ L1(B, P) then E[X|B] = X.

(b) It is idempotent, i.e., E[E[X|B] | B] = E[X|B].

(c) It preserves the total expectation, i.e., E[E[X|B]] = E[X].

(d) It is linear, i.e., ∀u, v ∈ R : E[uX + vY |B] = uE[X|B] + vE[Y |B].

(e) It is positive (monotone), i.e., if X ≥ 0 then E[X|B] ≥ 0 and, more generally,

X ≤ Y =⇒ E[X|B] ≤ E[Y |B].

(f) It is contractive (in particular, continuous), i.e., |E[X|B]| ≤ E[|X| | B], which implies

‖E[X|B]‖_{L1} ≤ ‖X‖_{L1}.

(g) It is consistent with σ-field embeddings, i.e., if A ⊂ B is a subfield then

E[E[X|B] | A] = E[X|A].

(h) If X ∈ L1(B, P) is bounded and Y ∈ L1(K, P) then E[XY |B] = X E[Y |B], and it follows that E[X E[Y |B]] = E[XY].

(i) If X is independent of B then E[X|B] = E[X].

(j) Z = E[X|B] if and only if Z ∈ L1(B, P) and for each bounded B-measurable random variable Y it holds that E[ZY] = E[XY]. In particular, Z = E[X|Y], for some Y ∈ L1(K, P), if and only if Z ∈ L1(σ(Y), P) and for each bounded Borel measurable function f

E[Z f(Y)] = E[X f(Y)].

3 Note that the above equalities hold P-a.s.
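On a finite probability space the properties above can be verified exactly: E[X|B] reduces to averaging X over the level sets of a generating random variable. A small sketch with two fair dice (an illustrative example, not from the thesis):

```python
from fractions import Fraction
from itertools import product

# two fair dice; all expectations are exact rationals
omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(f):
    return sum(f(w) * p for w in omega)

X = lambda w: w[0] + w[1]          # total of the two dice
Y = lambda w: w[0]                 # first die

def cond_E(f, g):
    """E[f | sigma(g)] as a function on omega: f averaged over the level sets of g."""
    levels = {}
    for w in omega:
        levels.setdefault(g(w), []).append(w)
    table = {v: sum(f(w) for w in ws) * Fraction(1, len(ws))
             for v, ws in levels.items()}
    return lambda w: table[g(w)]

Z = cond_E(X, Y)                   # here E[X|Y] = Y + 7/2
tower = E(Z)                       # property (c): E[E[X|B]] = E[X]
indep = cond_E(lambda w: w[1], Y)  # property (i): second die independent of Y
```

Using exact rational arithmetic avoids any floating-point slack, so the identities (c) and (i) hold as equalities rather than approximations.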


E. Fubini Theorem and Applications

The following theorem is the basis for calculating multiple integrals, i.e., integrals with respect to finite products of measures, in measure and probability theory (for a proof see, e.g., [14]).

Theorem E.1. Fubini Theorem: Let (S, S, µ) and (T, T, η) be σ-finite measure spaces. If g ∈ L1(µ × η) then g(s, ·) ∈ L1(η) µ-a.e. and g(·, t) ∈ L1(µ) η-a.e. Furthermore, the mappings s ↦ ∫ g(s, t)η(dt) and t ↦ ∫ g(s, t)µ(ds) belong to L1(µ) and L1(η), respectively, and it holds that

∫ g(s, t)(µ × η)(ds, dt) = ∫ [∫ g(s, t)η(dt)] µ(ds) = ∫ [∫ g(s, t)µ(ds)] η(dt).

The following result, which is useful for calculating conditional expectations, follows from the Fubini Theorem.

Lemma E.2. Let X and Z be independent random variables, defined on a common probability space, taking values in the measurable spaces (S, S) and (T, T), respectively. For any bounded Borel measurable mapping Φ defined on S × T the function

∀x ∈ S : φ(x) := E[Φ(x, Z)]

is measurable on S and it holds that

E[Φ(X, Z) | X] = φ(X), a.s.

Proof. Let us denote by µ and η the distributions of X and Z, respectively. It follows that φ satisfies

∀x ∈ S : φ(x) = ∫ Φ(x, z)η(dz),

and measurability of φ follows from the Fubini Theorem. Therefore, φ(X) is σ(X)-measurable.

Let us consider an arbitrary bounded σ(X)-measurable random variable Y, i.e., Y = f(X) for some bounded Borel function f. Then, using again the Fubini Theorem, one can show that

E[Φ(X, Z)Y] = E[Φ(X, Z)f(X)] = ∫ [∫ Φ(x, z)f(x)µ(dx)] η(dz)

    = ∫ f(x) [∫ Φ(x, z)η(dz)] µ(dx)

    = E[f(X)φ(X)] = E[φ(X)Y],

which, in accordance with property (j) of conditional expectations (see Section D of the Appendix), concludes the proof.
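Lemma E.2 can be checked exactly on a small discrete example, where both sides of the property (j) identity reduce to finite sums and the Fubini interchange is just a reordering of a double sum (the mapping Φ below is an illustrative choice, not from the thesis):

```python
from fractions import Fraction
from itertools import product

# X and Z independent, uniform on small finite sets; all integrals are exact
Xs, Zs = [0, 1, 2], [0, 1]
pX, pZ = Fraction(1, len(Xs)), Fraction(1, len(Zs))

Phi = lambda x, z: (x + 1) * (z + 2)          # a bounded measurable mapping

# phi(x) = E[Phi(x, Z)]: the inner integral of the Fubini Theorem
phi = {x: sum(Fraction(Phi(x, z)) * pZ for z in Zs) for x in Xs}

# property (j) route of the proof: E[Phi(X,Z) f(X)] = E[phi(X) f(X)], here f(x) = x^2
f = lambda x: x * x
lhs = sum(Fraction(Phi(x, z)) * f(x) * pX * pZ for x, z in product(Xs, Zs))
rhs = sum(phi[x] * f(x) * pX for x in Xs)
```

The equality of lhs and rhs for every test function f is precisely the characterization that identifies φ(X) as E[Φ(X, Z) | X].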


F. Weak Convergence of Measures

In the following we assume that µ and {µ_n : n ≥ 1} are probability measures on some metric space (S, d), and we denote by "⇒" the classical weak convergence of probability measures, i.e., µ_n ⇒ µ if

∀g ∈ CB(S) : lim_{n→∞} ∫ g(s)µ_n(ds) = ∫ g(s)µ(ds).

Theorem F.1. The Portmanteau Theorem: The following assertions are equivalent:

(i) µ_n ⇒ µ.

(ii) For any closed subset F of S it holds that lim sup_n µ_n(F) ≤ µ(F).

(iii) For any open subset G of S it holds that lim inf_n µ_n(G) ≥ µ(G).

(iv) If A is a continuity set of µ, i.e., µ(∂A) = 0, then lim_n µ_n(A) = µ(A).

Theorem F.2. The Extension Theorem: If µ_n ⇒ µ, then for any measurable mapping g satisfying:

(i) g is uniformly integrable with respect to {µ_n : n ≥ 1},

(ii) the set of discontinuities D_g of g satisfies µ(D_g) = 0,

it holds that

lim_{n→∞} ∫ g(s)µ_n(ds) = ∫ g(s)µ(ds).

Theorem F.3. Prokhorov Theorem: Assume that the metric space (S, d) is separable.

(i) Any tight family of probability measures is relatively compact4 with respect to thetopology induced by the weak convergence.

(ii) If, in addition, (S, d) is complete then any relatively compact family of probabilitymeasures is tight.

For a proof of these results see, e.g., [8].

4 i.e., every sequence has a weakly convergent subsequence.
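A concrete instance of the definitions above: the uniform distributions µ_n on {1/n, 2/n, ..., 1} converge weakly to the Lebesgue measure on [0, 1], so the integrals of any bounded continuous g converge, and Portmanteau (iv) holds for a continuity set such as [0, 1/2] (an illustrative sketch, not from the thesis):

```python
import math

def integral_mu_n(g, n):
    # mu_n = uniform distribution on the grid {1/n, 2/n, ..., n/n}
    return sum(g(k / n) for k in range(1, n + 1)) / n

g = math.exp                                   # bounded continuous on [0, 1]
vals = [integral_mu_n(g, n) for n in (10, 100, 10000)]
limit = math.e - 1.0                           # integral of exp w.r.t. Lebesgue on [0, 1]

# Portmanteau (iv): A = [0, 1/2] is a continuity set of the limiting measure,
# since its boundary {0, 1/2} has Lebesgue measure 0
mass = integral_mu_n(lambda x: 1.0 if x <= 0.5 else 0.0, 10000)
```

Note that the indicator of [0, 1/2] is not continuous, so its convergence is not guaranteed by the definition alone; it is the continuity-set condition of (iv) that makes it work.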


G. Functional Analysis

Here we list several standard results in functional analysis which are mentioned in this thesis. For the proofs of the results stated below we refer to [19].

Theorem G.1. Banach–Steinhaus Theorem: Let U be a normed space, V a Banach space, and let

{Φ_n : n ∈ N} ⊂ LB(V, U)

be a sequence of bounded operators such that

∀x ∈ V : sup_{n∈N} ‖Φ_n(x)‖_U < ∞.

Then it holds that sup_n ‖Φ_n‖ < ∞.

Theorem G.2. Banach–Alaoglu Theorem: Let V be a normed space and let us denote by V* its topological dual. Then, the set

{Φ ∈ L(V, R) : ‖Φ‖ ≤ 1} ⊂ V*

is compact in the weak-* topology. In particular, it follows that any strongly bounded subset of V* is relatively compact, i.e., its closure is compact, in the weak-* topology.

In the following, we assume that (S, d) is a locally compact metric space and we denote by CK(S) ⊂ C(S) the linear space of continuous functions f with compact support, i.e., such that there exists some compact K ⊂ S with f(s) = 0 for each s ∉ K. Note that, by Weierstrass's Theorem, any continuous function is bounded on compact sets, and it follows that CK(S) ⊂ CB(S). The following result shows that the topological dual of any linear space which includes CK(S) is a subspace of M(S), i.e., a space of measures.

Theorem G.3. Riesz Representation Theorem: If T : CK(S) → R is a linear functional on CK(S), there exists a unique Radon measure µ ∈ M(S) such that

∀f ∈ CK(S) : Tf = ∫ f(s)µ(ds).

In addition, the operator norm of T coincides with the total variation norm of µ, i.e.,

‖T‖ = ‖µ‖ = |µ|(S),

and it follows that the functional T is bounded (in particular, continuous) if and only if µ is a finite measure, i.e., ‖µ‖ < ∞.


H. Overview of Weakly Differentiable Distributions

Each entry below lists the name of the distribution with a suitable Banach base, the distribution µθ, an instance of its weak derivative µ′θ, and a representation µ′θ = cθ(µθ+ − µθ−):

• Bernoulli βθ, base (F, vp), p ≥ 0:
  µθ = (1−θ)·δ_{x1} + θ·δ_{x2};  µ′θ = δ_{x2} − δ_{x1};  cθ = 1, µθ+ = δ_{x2}, µθ− = δ_{x1}.

• Binomial B⁰_{n,θ}, base (F, vp), p ≥ 0:
  µθ = Σ_{j=0}^n C(n,j)(1−θ)^j θ^{n−j}·δ_j;  µ′θ = Σ_{j=0}^n C(n,j)(1−θ)^{j−1}θ^{n−j−1}[n(1−θ)−j]·δ_j;  cθ = n/θ, µθ+ = B⁰_{n,θ}, µθ− = B¹_{n,θ}.

• Poisson P⁰θ, base (F, vp), p ≥ 0:
  µθ = Σ_{n=0}^∞ (θ^n/n!)e^{−θ}·δ_n;  µ′θ = Σ_{n=0}^∞ ((nθ^{n−1}−θ^n)/n!)e^{−θ}·δ_n;  cθ = 1, µθ+ = P¹θ, µθ− = P⁰θ.

• mixed µθ, base (F, vp), p ≥ 0:
  µθ = (1−θ)·µ + θ·η;  µ′θ = η − µ;  cθ = 1, µθ+ = η, µθ− = µ.

• exponential ε_{1,θ}, base (F, vp), p ≥ 0:
  µθ(dx) = θe^{−θx}I_{(0,∞)}(x)dx;  µ′θ(dx) = (1−θx)e^{−θx}I_{(0,∞)}(x)dx;  cθ = 1/θ, µθ+ = ε_{1,θ}, µθ− = ε_{2,θ}.

• uniform ψθ, base (C, vp), p ≥ 0:
  µθ(dx) = (1/θ)I_{[0,θ)}(x)dx;  µ′θ = (1/θ)·δθ − (1/θ²)I_{[0,θ)}(x)dx;  cθ = 1/θ, µθ+ = δθ, µθ− = ψθ.

• Pareto πθ, base (C, vp), p < β:
  µθ(dx) = (βθ^β/x^{β+1})I_{(θ,∞)}(x)dx;  µ′θ = (β²θ^{β−1}/x^{β+1})I_{(θ,∞)}(x)dx − (β/θ)·δθ;  cθ = β/θ, µθ+ = πθ, µθ− = δθ.

• Gaussian γθ, base (F, vp), p ≥ 0:
  µθ(dx) = (1/(θ√(2π)))e^{−(x−a)²/(2θ²)}dx;  µ′θ(dx) = (((x−a)²−θ²)/(θ⁴√(2π)))e^{−(x−a)²/(2θ²)}dx;  cθ = 1/θ, µθ+ = mθ, µθ− = γθ.

Tab. 5.1: An overview of differentiability properties.

Table 5.1 presents weak derivatives of some distributions on R commonly used in practice. For each distribution, an instance of a weak derivative and a suitable Banach base are provided. Continuous distributions are given by means of their Lebesgue densities. The following notation has been used:

• B^k_{n,θ}, for 0 ≤ k ≤ n, denotes the distribution of the number of successes in a sequence of n independent Bernoulli experiments with probability of success θ, conditioned on the event that the first k experiments were successful, i.e., B^k_{n,θ} = Σ_{j=0}^{n−k} C(n−k, j) θ^{n−k−j}(1−θ)^j · δ_{k+j}.

• P^k_θ, for k ≥ 0, denotes the k-units shift of the Poisson distribution, i.e., the distribution of X + k, where X is a Poisson variable with rate θ. In formula: P^k_θ := Σ_{n=0}^∞ (θ^n/n!) e^{−θ} · δ_{k+n}.

• ε_{n,θ}, for n ≥ 1, denotes the Erlang distribution, i.e., the distribution of the sum of n independent exponential variables with rate θ. In formula: (θ^n x^{n−1}/(n−1)!) e^{−θx} I_{(0,∞)}(x)dx.

• mθ denotes the double-sided Maxwell distribution. Precisely, if (X, Y, Z) is a 3-dimensional vector whose components are independent standard Gaussian variables and V denotes its magnitude, i.e., V = √(X² + Y² + Z²), then mθ denotes the distribution of a + θSV, where S is a variable taking values ±1 with probability 1/2, independent of V. In formula: mθ(dx) := ((x−a)²/(θ³√(2π))) e^{−(x−a)²/(2θ²)} dx.
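The exponential row of Table 5.1 can be sanity-checked in closed form: for the cost-function g(x) = x², both d/dθ Eθ[g(X)] and cθ(E_{µθ+}[g] − E_{µθ−}[g]) equal −4/θ³. The check below uses the standard Erlang moment formula (the choice of g and of θ is illustrative):

```python
# Check of the exponential row of Table 5.1 for g(x) = x^2:
# d/dtheta E_theta[g(X)] = c_theta * (E_{mu+}[g] - E_{mu-}[g]),
# with c_theta = 1/theta, mu+ = Exp(theta) = Erlang(1, theta), mu- = Erlang(2, theta).
def erlang_second_moment(n, theta):
    # Erlang(n, theta): mean n/theta, variance n/theta^2, so E[X^2] = (n^2 + n)/theta^2
    return (n * n + n) / theta ** 2

theta = 0.7
lhs = -4.0 / theta ** 3                         # d/dtheta of E_theta[X^2] = 2/theta^2
rhs = (1.0 / theta) * (erlang_second_moment(1, theta)
                       - erlang_second_moment(2, theta))
```

The same pattern, differentiating the closed-form moment on one side and combining the µθ+ and µθ− moments on the other, works for any polynomial cost-function covered by the base (F, vp).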


SUMMARY

MEASURE–VALUED DIFFERENTIATION FOR FINITE PRODUCTS OFMEASURES:

THEORY AND APPLICATIONS

This thesis is devoted to the theory of weak differentiation of measures. The basic observation is that, formally, the weak derivative of a parameter-dependent probability distribution µθ is in general a finite signed measure which can be represented as the re-scaled difference between two probability distributions. This fact allows for useful representations of the derivative (d/dθ)Eθ[g(X)] of the expected value Eθ[g(X)], for some predefined class D of cost-functions g, where X is a random variable with distribution µθ.

Many mathematical models are described by a finite family of independent random variables, and this is the reason why differentiability properties as well as representations for weak derivatives of product measures are studied in this thesis. To develop the theory, concepts and results from measure theory and functional analysis are required, and the necessary prerequisites are presented in Chapter 1.

In Chapter 2 we develop the theory of first-order differentiation. Main results, such as the product rule of weak differentiation and a representation theorem for the weak derivatives of product measures, are established. A product rule for weak differentiation of probability measures was conjectured (without a proof) in [48]. At the end of the chapter, two gradient estimation examples are provided.

In Chapter 3 we illustrate how the theory of measure-valued differentiation can be applied in order to establish bounds on perturbations for general stochastic models. Special attention is paid to the sequence of waiting times in the G/G/1 queue, for which we show that the strong stability property holds true provided that the service-time distribution is weakly differentiable with respect to some class of sub-exponential cost-functions.

In Chapter 4 we extend our analysis to higher-order differentiation, which leads us to establish a measure-valued differential calculus. Analyticity issues are also treated, and Taylor series approximation examples are provided.

Eventually, in Chapter 5, we apply the results established in Chapter 4 to the class of discrete event systems whose state dynamics can be formalized as a matrix-vector multiplication in some general, non-conventional algebra (e.g., the max-plus or min-plus algebra). A key result shows that, for some class of polynomially bounded cost-functions, weak differentiability of two random matrices Xθ and Yθ is inherited by their generalized product Xθ ⊙ Yθ, which allows us to develop a weak differential calculus for random matrices.


SAMENVATTING (SUMMARY IN DUTCH)

MEASURE-VALUED DIFFERENTIATION FOR FINITE PRODUCT MEASURES: THEORY AND APPLICATIONS

This dissertation presents a theory of weak differentiation of probability distributions. The fundamental observation is that the weak derivative of a probability distribution µθ, depending on a parameter θ, can be rewritten as the re-scaled difference between two probability distributions. This fact leads to useful representations of the derivative (d/dθ)Eθ[g(X)] of the expected value Eθ[g(X)], for every g from a predefined class D of cost-functions, where X is a random variable with distribution µθ.

Many mathematical models are described by a finite collection of independent random variables, and this is the reason why weak derivatives of products of probability distributions are investigated in this dissertation. Building up the theory requires results from measure theory and functional analysis; these are therefore presented in Chapter 1.

Chapter 2 treats first-order weak differentiation. Main results, such as the product rule for weak differentiation and the representation theorem for weak derivatives of product measures, are established. A product rule for weak differentiation of probability distributions was conjectured (without proof) in [48]. Two examples of gradient estimators conclude this chapter.

In Chapter 3 we show how the theory of differentiation of probability distributions can be applied to compute bounds on perturbations of parameter-dependent stochastic models. Special attention is paid to the waiting times of the G/G/1 queue. The main result of this chapter shows that weak differentiability of the service times yields "strong stability" of the stationary distribution of the waiting times, with respect to a certain class of sub-exponential cost-functions.

In Chapter 4 we extend our analysis to higher-order differentiation, and a weak differential calculus for measure-valued functions is presented. An investigation of Taylor series expansions based on weak derivatives is also carried out.

Finally, in Chapter 5 we apply the results of Chapter 4 to discrete event systems that can be described by a matrix-vector multiplication in some general, non-conventional algebras (e.g., the max-plus or min-plus algebra). An important result is that, for certain classes of polynomially bounded cost-functions, weak differentiability of two random matrices Xθ and Yθ implies weak differentiability of the generalized product Xθ ⊙ Yθ. This fact allows us to develop a weak differential calculus for random matrices.


BIBLIOGRAPHY

[1] Ayhan, H. and Baccelli, F. Expressions for joint Laplace transforms for stationary waiting times in (max,+)-linear systems with Poisson input. Queueing Systems - Theory and Applications, 37(4), pp. 291–328, 2001.

[2] Ayhan, H. and Seo, D. Tail probability of transient and stationary waiting times in (max,+)-linear systems. IEEE Transactions on Automatic Control, 47, pp. 151–157, 2000.

[3] Ayhan, H. and Seo, D. Laplace transform and moments of waiting times in Poisson driven (max,+)-linear systems. Queueing Systems - Theory and Applications, 37, pp. 405–436, 2001.

[4] Baccelli, F., Cohen, G., Olsder, G.J., and Quadrat, J.-P. Synchronization and Linearity. John Wiley and Sons, New York, 1992.

[5] Baccelli, F., Hasenfuß, S., and Schmidt, V. Expansions for steady-state characteristics of (max,+)-linear systems. Stochastic Models, 14, pp. 1–24, 1998.

[6] Baccelli, F. and Hong, D. Analytic expansions of (max,+)-linear Lyapunov exponents. Annals of Applied Probability, 10, pp. 779–827, 2000.

[7] Baccelli, F. and Schmidt, V. Taylor series expansions for Poisson-driven (max,+)-linear systems. Annals of Applied Probability, 6, pp. 138–185, 1996.

[8] Billingsley, P. Convergence of Probability Measures. J. Wiley and Sons, New York, 1968.

[9] Bobrowski, A. Functional Analysis for Probability and Stochastic Processes. Cambridge University Press, Cambridge, 2005.

[10] Bourbaki, N. Integration I. Springer Verlag, New York, 2004.

[11] Buck, R.C. Bounded continuous functions on a locally compact space. Michigan Math. Journal, 5(2), pp. 95–104, 1958.

[12] Cao, X.R. The Maclaurin series for performance functions of Markov chains. Advances in Applied Probability, 30, pp. 676–692, 1998.

[13] Cohen, J.E. Subadditivity, generalized products of random matrices and operations research. SIAM Review, 30(1), pp. 69–86, 1988.

[14] Cohn, D. Measure Theory. Birkhäuser, Stuttgart, 1980.


[15] Cuninghame-Green, R.A. Minimax Algebra. Lecture Notes in Economics and Mathematical Systems, vol. 166. Springer-Verlag, Berlin, 1979.

[16] Dekker, R. and Hordijk, A. Average, sensitive and Blackwell optimal policies in denumerable Markov decision chains with unbounded rewards. Tech. report no. 83-36, Institute of Applied Mathematics and Computing Science, 1983.

[17] Dekker, R. and Hordijk, A. Average, sensitive and Blackwell optimal policies in denumerable Markov decision chains with unbounded rewards. Mathematics of Operations Research, 13, pp. 395–421, 1988.

[18] Devroye, L.P. Inequalities for completion times of stochastic PERT networks. Mathematics of Operations Research, 4, pp. 441–447, 1979.

[19] Dunford, N. and Schwartz, J.T. Linear Operators. Pure and Applied Mathematics. Interscience Publishers, New York, 1971.

[20] Heidergott, B. Variability expansion for performance characteristics of (max,+)-linear systems. Proceedings of the 6th Workshop on Discrete Event Systems (WODES), pp. 245–250, Zaragoza/Spain, 10/2002.

[21] Heidergott, B. Max-Plus Linear Stochastic Systems and Perturbation Analysis. The International Series of Discrete Event Systems, 15. Springer-Verlag, Berlin, 2006.

[22] Heidergott, B. and Leahu, H. Bounds on perturbations for discrete event systems. Proceedings of the 8th Workshop on Discrete Event Systems (WODES), pp. 378–383, Ann Arbor/Michigan, 07/2006.

[23] Heidergott, B. and Leahu, H. Series expansions of generalized matrix products. Proceedings of the 44th IEEE Conference on Decision and Control and European Control Conference, pp. 7793–7798, Sevilla/Spain, 12/2005.

[24] Heidergott, B. and Hordijk, A. Taylor series expansions for stationary Markov chains. Advances in Applied Probability, 23, pp. 1046–1070, 2003.

[25] Heidergott, B. and Hordijk, A. Single-run gradient estimation via measure-valued differentiation. IEEE Transactions on Automatic Control, 49, pp. 1843–1846, 2004.

[26] Heidergott, B., Hordijk, A., and Leahu, H. Strong bounds on perturbations. (to appear), 2007.

[27] Heidergott, B., Hordijk, A., and Weißhaupt, H. Measure-valued differentiation for stationary Markov chains. Mathematics of Operations Research, 31, pp. 154–172, 2006.

[28] Heidergott, B. and Leahu, H. Differentiability of product measures. Research Memorandum 2008-5, Vrije Universiteit Amsterdam, The Netherlands, 2008.

[29] Heidergott, B., Olsder, G.J., and van der Woude, J. Max Plus at Work: Modelling and Analysis of Synchronized Systems. Princeton University Press, 2006.


[30] Heidergott, B. and Vazquez-Abad, F. Gradient estimation for a class of systems withbulk services: a problem in public transportation. Tech. report no. 057/4, TinbergenInstitute, Amsterdam, 2003.

[31] Heidergott, B. and Vazquez-Abad. F. Measure-valued differentiation for randomhorizon problems. Markov Processes and Related Fields, 12, pp. 509–536, 2006.

[32] Heidergott, B. and Vazquez-Abad, F. Measure valued differentiation for Markovchains. Journal of Optimization and Applications, 136, pp. 187–209, 2008.

[33] Heidergott, B., Vazquez-Abad, F., and Volk-Makarewicz, W. Sensitivity estimationfor gaussian systems. European Journal of Operational Research, 187, pp. 193–207,2008.

[34] Hordijk, A. and Yushkevich, A.A. Blackwell optimality in the class of all policies inMarkov decision chains with a Borel state space and unbounded rewards. Mathemat-ics of Operations Research, 50, pp. 421–448, 1999.

[35] Kartashov, N. Strong Stable Markov Chains. VSP, Utrecht, 1996.

[36] Kelley, J.L. General Topology. Springer, New-York, 1975.

[37] Kingman, J.F.C. Subadditive ergodic theory. The Annals of Probability, 1(6), pp.883–909, 1973.

[38] Kumagai, S. An implicit function theorem: Comment. Journal of OptimizationTheory and Applications, 31(2), pp. 285–288, 1980.

[39] Kushner, H. and Vazquez-Abad, F. Estimation of the derivative of a stationarymeasure with respect to a control parameter. Journal of Applied Probability, 29, pp.343–352, 1992.

[40] Kushner, H. and Vazquez-Abad, F. Stochastic approximations for systems of interestover an infinite time interval. SIAM Journal on Control and Optimization, 29, pp.712–756, 1996.

[41] Lewis, D.R. Integration with respect to vector measures. Pacific Journal of Mathe-matics, 33(1), pp. 157–165, 1970.

[42] Lippman, S. On dynamic programming with unbounded rewards. Management Science, 21, pp. 1225–1233, 1974.

[43] Loynes, R.M. The stability of a queue with non-independent inter-arrival and service times. Proceedings of the Cambridge Philosophical Society, 58, pp. 497–520, 1962.

[44] Meyn, S.P. and Tweedie, R.L. Markov Chains and Stochastic Stability. Springer,London, 1993.

[45] Moisil, Gr.C. Sur une representation des graphes qui interviennent dans l'economie des transports. Communications de l'Academie de la R.P. Roumaine, 10, pp. 647–652, 1960.


[46] Nachbin, L. Weighted approximation for algebras and modules of continuous functions: real and self-adjoint complex cases. Annals of Mathematics, 81(2), pp. 289–302, 1965.

[47] Pflug, G. Derivatives of probability measures - concepts and applications to the optimization of stochastic systems. In Lecture Notes in Control and Information Science 103, pp. 252–274. Springer, Berlin, 1988.

[48] Pflug, G. Optimization of Stochastic Models. Kluwer Academic, Boston, 1996.

[49] Pich, M., Loch, C., and de Meyer, A. On uncertainty, ambiguity and complexity in project management. Management Science, 75(2), pp. 137–176, 1996.

[50] Prolla, J.B. Bishop's generalized Stone-Weierstrass theorem for weighted spaces. Mathematische Annalen, 191(4), pp. 283–289, 1971.

[51] Prolla, J.B. Weighted spaces of vector-valued continuous functions. Annali di Matematica Pura ed Applicata, 89(1), pp. 145–157, 1971.

[52] Rao, R.R. Relations between weak and uniform convergence of measures with applications. The Annals of Mathematical Statistics, 33(2), pp. 659–680, 1962.

[53] Rudin, W. Principles of Mathematical Analysis. McGraw-Hill, 1976.

[54] Rudin, W. Functional Analysis. McGraw-Hill, 1991.

[55] Seidel, W., Kocemba, K.v., and Mitreiter, K. On Taylor series expansions for waiting times in tandem queues: An algorithm for calculating the coefficients and an investigation of the approximation error. Performance Evaluation, 38(3), pp. 153–173, 1999.

[56] Semadeni, Z. Banach Spaces of Continuous Functions. Polish Scientific Publishers,Warszawa, 1971.

[57] Summers, W.H. Dual spaces of weighted spaces. Transactions of the American Mathematical Society, 151(1), pp. 323–333, 1970.

[58] Summers, W.H. A representation theorem for biequicontinuous completed tensor products of weighted spaces. Transactions of the American Mathematical Society, 146, pp. 121–131, 1970.

[59] van den Boom, T., De Schutter, B., and Heidergott, B. Complexity reduction in MPC for stochastic (max, +)-linear systems by variability expansion. Proceedings of the 41st IEEE Conference on Decision and Control, pp. 3567–3572, Las Vegas, Nevada, December 2002.

[60] Wells, J. Bounded continuous vector-valued functions on a locally compact space. Michigan Mathematical Journal, 12(1), pp. 119–126, 1965.


INDEX

algebra
  conventional, 102
  max-plus, 102
  min-plus, 103
  topological, 101

Banach
  base, 18
  space, 15

continuity
  Lipschitz, 61
  local Lipschitz, 61
  regular, 22
  strong, 21
  weak, 21

convergence
  domain, 89
  radius, 89
  regular, 13
  strong, 21
  weak, 10

differentiation
  regular, 31
  strong, 31
  weak, 30

distribution
  Bernoulli, 41
  Dirac, 13
  Erlang, 42
  exponential, 41
  Pareto, 44
  truncated, 43
  uniform, 40

dual
  algebraic, 16
  topological, 16
  topology, 16

field
  σ-, 5
  Borel, 5

integrable
  p-, 6
  Lebesgue, 6
  uniformly, 6

kernel
  taboo, 76
  transition, 68

linear
  functional, 16
  operator, 15
  space, 14

Lipschitz
  constant, 61
  continuity, 61
  local continuity, 61

Markov
  chain, 68
  operator, 68

measure
  -valued mapping, 8
  continuous, 7
  finite, 5
  orthogonal, 7
  positive, 5
  probability, 8
  Radon, 5
  regular, 5
  signed, 5
  singular, 7
  variation, 7

monoid, 100
  topological, 102


network
  multi-server, 115
  queueing, iii
  stochastic activity, 94

norm, 15
  v-, 17
  operator, 16
  pseudo-, 101
    additive, 101
    multiplicative, 101
  semi-, 14
  space, 15
  supremum, 15
  topology, 15
  total variation, 7
  weighted total variation, 20

operator
  bounded, 15
  expectation, consistent, 36
  isometric, 15
  linear, 15
  Markov, 68
  norm, 16

regular
  continuity, 22
  convergence, 13
  differentiation, 31

ruin, 52
  probability, 52
  problem, v

space
  Cv, 8
  D, 19
  Banach, 15
  compact, 4
  complete, 4
  linear, 14
  locally compact, 4
  metric, 3
  norm, 15
  separable, 4
  topological, 3
  weighted, 26

strong
  bounds, 62
  continuity, 21
  convergence, 21
  differentiation, 31

theorem
  Banach-Alaoglu, 22
  Banach-Steinhaus, 21
  Cauchy-Hadamard, 89
  Dominated Convergence, 20
  Fubini, 25
  Mean-Value, 32
  Monotone Convergence, 12
  Portmanteau, 48
  Prokhorov, 26
  Riesz Representation, 19

tight, 6

time
  completion, 95
  waiting, 56

topological
  algebra, 101
    pseudo-normed, 101
  dual, 16
  space, 3

topology, 2
  dual, 16
  locally convex, 15
  norm, 15
  strict, 26
  uniform, 15
  weak-*, 16

upper-bound, 101

waiting time, 56

weak
  analyticity, 88
  continuity, 21
  convergence, 10
  derivative, 30
  differentiation, 30
  equality, 110
  limit, 11
  topology, 16


LIST OF SYMBOLS AND NOTATIONS

Dvθ(·): [D]v-domain of convergence of the weak Taylor series, 89
Dg: set of discontinuities of g, 48
Pg(θ): performance measure, iii
Rvθ(·): [D]v-radius of convergence of the weak Taylor series, 89
T: the completion time of a SAN, 95
Xι: canonical embedding of X into the extended algebra, 109
X(n)θ: the nth-order derivative of the random matrix Xθ, 107
[·]±: Hahn-Jordan decomposition, 7
Π∗: product measure mapping, 46
Θ ⊂ R: set of parameters, iii
Θs ⊂ Θ: the stability set, 77
τ: the conjugate of τ in the extended algebra, 109
βθ: Bernoulli distribution, 41
χ0: initial distribution (Markov chain), 69
δθ: Dirac distribution, 13
`: the Lebesgue measure on a Euclidean space, 7
≡D: weak equality w.r.t. the space of test functions D, 111
∂θ: gradient estimator for Pg(θ), iv
N: the set of natural numbers, 5
R: the set of real numbers, iii
S: complete, separable metric space, iv
Sv ⊂ S: support of v, 9
L∗: Lipschitz constant corresponding to the product measure, 65
Lvµ: Lipschitz constant of µ∗ in v-norm, 63
M∗: Lipschitz constant corresponding to the product measure (non-negative cost-functions), 65
Mvµ: Lipschitz constant of µ∗ in v-norm for non-negative cost-functions, 63
Tn(µ, θ, ξ): nth Taylor polynomial, 88
A: algebra of matrices, 101
A∗: extended algebra of matrices, 108
C: space of continuous mappings, 4
C+ ⊂ C: non-negative mappings, 4
CB ⊂ C: bounded mappings, 4
Cv ⊂ C: v-bounded mappings, 9
D: space of test functions, with typically D = C, F, 19
Dp: the [D]v-space induced on D by the weight vp, 104
F: space of Borel measurable mappings, 5
FB ⊂ F: bounded mappings, 5
Lp: space of p-integrable mappings, 6
M: space of regular measures, 8
M+ ⊂ M: positive measures, 8
M1 ⊂ M: probability measures, 8
MB ⊂ M: finite measures, 8
Mv ⊂ M: v-finite measures, 20
M1v ⊂ M: Mv ∩ M1, 20
P: the set of paths through a SAN, 94
S: Borel field on S, 5
UA: uniform distribution on A, 40
V∗: topological dual of V, 16
Mm,n: class of m × n matrices, 100
µ∗: measure-valued mapping, 9
¯: generalized matrix product, 99
⊗: functional tensor product, 24
Mm,n: extended algebra of matrices, 108
=⇒D: weak convergence of measures w.r.t. the space of test-functions D, 11
πθ: Pareto distribution, 52
ψθ: uniform distribution, 42
εn,θ: Erlang distribution, 42
∅: the null measure, 43
~v: tensor product v1 ⊗ . . . ⊗ vn, 46
℘m,n: canonical metric on Mm,n, 100
gι: ι-extension of the mapping g, 109
vp: polynomial weight of degree p, 104


ACKNOWLEDGMENTS

This thesis is the result of research carried out at Vrije Universiteit Amsterdam and Technische Universiteit Delft. Both institutions offered me a very supportive and stimulating work environment. I would also like to acknowledge the contribution of the Dutch Technology Foundation (Technologiestichting STW), which financially supported my four-year contract with VU Amsterdam within the research project "Modeling and Analysis of Operations in Railway Networks: the Influence of Stochasticity", a joint project between VU Amsterdam and TU Delft. Beyond these institutions, many individuals deserve my gratitude.

I am especially grateful to my supervisor Bernd Heidergott for both the professional and the personal aspects of our collaboration, which led to the completion of this monograph. His remarks and suggestions on the material were decisive in helping me find (I hope) the best way to present the results of my research, and his optimism and faith in me helped me get through the critical moments. I would therefore like to take this opportunity to thank him for guiding me over the last four years and for his notable contribution to the development of my research profile.

I am indebted to Prof.Dr. F.M. Dekking, Prof.Dr. G.M. Koole, Prof.Dr. G.Ch. Pflug, Dr. A.N. de Ridder and Dr. F.M. Spieksma for taking the time to read the manuscript and for providing me with valuable feedback.

I am thankful to all my friends for their constant support. The list is quite long, but I would like to mention in particular my good friends Daniel and Vlad, for making my accommodation period in the Netherlands smoother and for making themselves available whenever I needed their help. Special thanks to Ana, who, apart from being a very good friend, helped me with virtually any computer-related problem I encountered in my work.

Finally, my gratitude goes to my beloved parents, Veve and Viorel, for their unconditional support. Thank you for being there for me and for backing and understanding my decisions, even when they went against your own wishes.


TINBERGEN INSTITUTE RESEARCH SERIES

The Tinbergen Institute is the Institute for Economic Research, which was founded in 1987 by the Faculties of Economics and Econometrics of the Erasmus Universiteit Rotterdam, Universiteit van Amsterdam and Vrije Universiteit Amsterdam. The Institute is named after the late Professor Jan Tinbergen, Dutch Nobel Prize laureate in economics in 1969. The Tinbergen Institute is located in Amsterdam and Rotterdam. The following books recently appeared in the Tinbergen Institute Research Series:

378 M.R.E. BRONS, Meta-analytical studies in transport economics: Methodology and applications.

379 L.F. HOOGERHEIDE, Essays on neural network sampling methods and instrumental variables.

380 M. DE GRAAF-ZIJL, Economic and social consequences of temporary employment.

381 O.A.C. VAN HEMERT, Dynamic investor decisions.

382 Z. AOVOV, Liking and disliking: The dynamic effects of social networks during a large-scale information system implementation.

383 P. RODENBURG, The construction of instruments for measuring unemployment.

384 M.J. VAN DER LEIJ, The economics of networks: Theory and empirics.

385 R. VAN DER NOLL, Essays on internet and information economics.

386 V. PANCHENKO, Nonparametric methods in economics and finance: dependence, causality and prediction.

387 C.A.S.P. S, Higher education choice in The Netherlands: The economics of where to go.

388 J. DELFGAAUW, Wonderful and woeful work: Incentives, selection, turnover, and workers' motivation.

389 G. DEBREZION, Railway impacts on real estate prices.

390 A.V. HARDIYANTO, Time series studies on Indonesian rupiah/USD rate 1995–2005.

391 M.I.S.H. MUNANDAR, Essays on economic integration.

392 K.G. BERDEN, On technology, uncertainty and economic growth.

393 G. VAN DE KUILEN, The economic measurement of psychological risk attitudes.

394 E.A. MOOI, Inter-organizational cooperation, conflict, and change.

395 A. LLENA NOZAL, On the dynamics of health, work and socioeconomic status.

396 P.D.E. DINDO, Bounded rationality and heterogeneity in economic dynamic models.

397 D.F. SCHRAGER, Essays on asset liability modeling.

398 R. HUANG, Three essays on the effects of banking regulations.

399 C.M. VAN MOURIK, Globalisation and the role of financial accounting information in Japan.

400 S.M.S.N. MAXIMIANO, Essays in organizational economics.


401 W. JANSSENS, Social capital and cooperation: An impact evaluation of a women's empowerment programme in rural India.

402 J. VAN DER SLUIS, Successful entrepreneurship and human capital.

403 S. DOMINGUEZ MARTINEZ, Decision making with asymmetric information.

404 H. SUNARTO, Understanding the role of bank relationships, relationship marketing, and organizational learning in the performance of people's credit banks.

405 M.A. DOS REIS PORTELA, Four essays on education, growth and labour economics.

406 S.S. FICCO, Essays on imperfect information-processing in economics.

407 P.J.P.M. VERSIJP, Advances in the use of stochastic dominance in asset pricing.

408 M.R. WILDENBEEST, Consumer search and oligopolistic pricing: A theoretical and empirical inquiry.

409 E. GUSTAFSSON-WRIGHT, Baring the threads: Social capital, vulnerability and the well-being of children in Guatemala.

410 S. YERGOU-WORKU, Marriage markets and fertility in South Africa with comparisons to Britain and Sweden.

411 J.F. SLIJKERMAN, Financial stability in the EU.

412 W.A. VAN DEN BERG, Private equity acquisitions.

413 Y. CHENG, Selected topics on nonparametric conditional quantiles and risk theory.

414 M. DE POOTER, Modeling and forecasting stock return volatility and the term structure of interest rates.

415 F. RAVAZZOLO, Forecasting financial time series using model averaging.

416 M.J.E. KABKI, Transnationalism, local development and social security: the functioning of support networks in rural Ghana.

417 M. POPLAWSKI RIBEIRO, Fiscal policy under rules and restrictions.

418 S.W. BISSESSUR, Earnings, quality and earnings management: the role of accounting accruals.

419 L. RATNOVSKI, A Random Walk Down the Lombard Street: Essays on Banking.

420 R.P. NICOLAI, Maintenance models for systems subject to measurable deterioration.

421 R.K. ANDADARI, Local clusters in global value chains, a case study of wood furniture clusters in Central Java (Indonesia).

422 V. KARTSEVA, Designing Controls for Network Organizations: A Value-Based Approach.

423 J. ARTS, Essays on New Product Adoption and Diffusion.

424 A. BABUS, Essays on Networks: Theory and Applications.

425 M. VAN DER VOORT, Modelling Credit Derivatives.

426 G. GARITA, Financial Market Liberalization and Economic Growth.

427 E. BEKKERS, Essays on Firm Heterogeneity and Quality in International Trade.