
  • A COURSE IN STOCHASTIC PROCESSES

  • THEORY AND DECISION LIBRARY

    General Editors: W. Leinfellner (Vienna) and G. Eberlein (Munich)

    Series A: Philosophy and Methodology of the Social Sciences

    Series B: Mathematical and Statistical Methods

    Series C: Game Theory, Mathematical Programming and Operations Research

    Series D: System Theory, Knowledge Engineering and Problem Solving

    SERIES B: MATHEMATICAL AND STATISTICAL METHODS VOLUME 34

    Editor: H. J. Skala (Paderborn); Assistant Editor: M. Kraft (Paderborn); Editorial Board: J. Aczel (Waterloo, Ont.), G. Bamberg (Augsburg), H. Drygas (Kassel), W. Eichhorn (Karlsruhe), P. Fishburn (Murray Hill, N.J.), D. Fraser (Toronto), W. Janko (Vienna), P. de Jong (Vancouver), T. Kariya (Tokyo), M. Machina (La Jolla, Calif.), A. Rapoport (Toronto), M. Richter (Kaiserslautern), B. K. Sinha (Catonsville, Md.), D. A. Sprott (Waterloo, Ont.), P. Suppes (Stanford, Calif.), H. Theil (St. Augustine, Fla.), E. Trillas (Madrid), L. A. Zadeh (Berkeley, Calif.).

    Scope: The series focuses on the application of methods and ideas of logic, mathematics and statistics to the social sciences. In particular, formal treatment of social phenomena, the analysis of decision making, information theory and problems of inference will be central themes of this part of the library. Besides theoretical results, empirical investigations and the testing of theoretical models of real world problems will be subjects of interest. In addition to emphasizing interdisciplinary communication, the series will seek to support the rapid dissemination of recent results.

    The titles published in this series are listed at the end of this volume.

  • A COURSE IN STOCHASTIC PROCESSES

    Stochastic Models and Statistical Inference

    by

    DENIS BOSQ Institut de Statistique,

    Universite Pierre et Marie Curie, Paris, France

    and HUNG T. NGUYEN

    Department of Mathematical Sciences, New Mexico State University,

    Las Cruces, New Mexico, U.S.A.

    Springer-Science+Business Media, B.V.

  • A C.I.P. Catalogue record for this book is available from the Library of Congress.

    ISBN 978-90-481-4713-7 ISBN 978-94-015-8769-3 (eBook) DOI 10.1007/978-94-015-8769-3

    Printed on acid-free paper

    All Rights Reserved. © 1996 Springer Science+Business Media Dordrecht

    Originally published by Kluwer Academic Publishers in 1996. Softcover reprint of the hardcover 1st edition 1996.

    No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical,

    including photocopying, recording or by any information storage and retrieval system, without written permission from the copyright owner.

  • Contents

    Preface

    1 Basic Probability Background
      1.1 Events and Probabilities
      1.2 Random Variables and Their Distributions
      1.3 Expectation
      1.4 Limit Theorems
      1.5 Exercises

    2 Modeling Random Phenomena
      2.1 Random Phenomena
      2.2 Stochastic Processes
      2.3 Distributions of Stochastic Processes
      2.4 Some Important Properties of Stochastic Processes
      2.5 Exercises

    3 Discrete-Time Markov Chains
      3.1 The Markov Model
      3.2 Distributions of Markov Chains
      3.3 Classification and Decomposition of States
      3.4 Stationary Distributions
      3.5 Exercises

    4 Poisson Processes
      4.1 Motivation and Modeling
      4.2 Axioms of Poisson Processes
      4.3 Interarrival Times
      4.4 Some Properties of Poisson Processes
      4.5 Processes Related to Poisson Processes
      4.6 Exercises

    5 Continuous-Time Markov Chains
      5.1 Some Typical Examples
      5.2 Computational Aspects
      5.3 Distributions of Birth and Death Chains
      5.4 Exercises

    6 Random Walks
      6.1 Motivation and Definitions
      6.2 Asymptotic Behavior of the Simple Random Walk
      6.3 Returns to the Origin
      6.4 First Passage Times
      6.5 A Classical Game
      6.6 Exercises

    7 Renewal Theory
      7.1 Motivation and Examples
      7.2 The Counting Process
      7.3 Renewal Equations
      7.4 Renewal Theorems
      7.5 Exercises

    8 Queueing Theory
      8.1 Modeling and Structure
      8.2 The Queue M/M/1
      8.3 The Queues M/M/s, 1 < s ≤ ∞
      8.4 The Queue M/G/1
      8.5 Exercises

    9 Stationary Processes
      9.1 Autocovariance, Spectral Density, and Partial Autocorrelation
      9.2 Linear Prediction and the Wold Decomposition
      9.3 Limit Theorems for Stationary Processes
      9.4 Stationary Processes in Continuous Time
      9.5 Exercises

    10 ARMA Models
      10.1 Linear Processes
      10.2 Autoregressive Processes
      10.3 Moving Average Processes
      10.4 ARMA Processes
      10.5 Nonstationary Models and Exogenous Variables
      10.6 Exercises

    11 Discrete-Time Martingales
      11.1 Generalities
      11.2 Examples and Applications
      11.3 Convergence of Martingales
      11.4 Exercises

    12 Brownian Motion and Diffusion Processes
      12.1 Gaussian Processes
      12.2 Brownian Motion
      12.3 Stochastic Integral
      12.4 Diffusion Processes
      12.5 Processes Defined by Stochastic Differential Equations
      12.6 Exercises

    13 Statistics for Poisson Processes
      13.1 The Statistical Model
      13.2 Estimation
      13.3 Tests
      13.4 Estimation for Poisson Processes
      13.5 Confidence Intervals and Tests for λ
      13.6 Inference for Point Processes
      13.7 Exercises

    14 Statistics of Discrete-Time Stationary Processes
      14.1 Stationarization
      14.2 Nonparametric Estimation in Stationary Processes
      14.3 Statistics of ARMA Processes
      14.4 Exercises

    15 Statistics of Diffusion Processes
      15.1 Nonparametric Estimation in Continuous Time Processes
      15.2 Statistics of Wiener Processes
      15.3 Estimation in Diffusion Processes
      15.4 Exercises

    A Measure and Integration
      A.1 Extension of Measures
      A.2 Product Measures
      A.3 Some Theorems on Integrals

    B Banach and Hilbert Spaces
      B.1 Definitions
      B.2 Lᵖ-spaces
      B.3 Hilbert Spaces
      B.4 Fourier Series
      B.5 Applications to Probability Theory

    List of Symbols

    Bibliography

    Partial Solutions to Selected Exercises

    Index

  • Preface

    This text is an elementary introduction to stochastic processes in discrete and continuous time, with an initiation to statistical inference. The material is standard and classical for a first course in stochastic processes at the senior/graduate level (Lessons 1-12). To provide students with a view of statistics of stochastic processes, three lessons (13-15) were added. These lessons can be either optional or serve as an introduction to statistical inference with dependent observations. Several points of this text need to be elaborated.

    (1) The pedagogy is somewhat obvious. Since this text is designed for a one-semester course, each lesson can be covered in one week or so. Having in mind a mixed audience of students from different departments (Mathematics, Statistics, Economics, Engineering, etc.), we have presented the material in each lesson in the simplest way, with emphasis on motivation of concepts, aspects of applications, and computational procedures. Basically, we try to explain to beginners questions such as "What is the topic in this lesson?", "Why this topic?", and "How to study this topic mathematically?". The exercises at the end of each lesson will deepen the students' understanding of the material and test their ability to carry out basic computations. Exercises with an asterisk are optional (difficult) and might not be suitable for homework, but should provide food for thought. The purpose of the book, viewed as a text for a course or as a reference book for self-study, is to provide students with a pleasant introduction to the theory of stochastic processes (without tears!). After completing the course, the students should be able to take more advanced and technical courses or to read more specialized books on the subject.

    (2) In writing the text we faced the following dilemma. In general, measure theory is not required for a first course in stochastic processes. On the other hand, it is true that measure theory is the language of probability theory. When presenting the material, even at the simplest level, some aspects of measure theory are necessary to make the treatment rigorous. After all, this is a text about theory. Our approach is this. We do not require measure theory for this text. However, whenever necessary, we will call upon some facts from measure theory. A short appendix at the end of the text contains these facts in some detail, as well as other topics which might not be familiar to the audience.

    (3) The standard prerequisite is a solid first course in probability theory and some calculus. However, Lesson 1 is devoted to a complete review of the probability background needed for this text. Lessons 1-12 form the core of a course in stochastic processes. As far as the statistical part of the book (Lessons 13-15) is concerned, when it is used, for example, in a seminar on initiation to statistics of random processes, students need a basic knowledge of a first course in mathematical statistics.

    A selected bibliography at the end of the book suggests some appropriate references for this purpose as well as for further reading on topics omitted in this text.

    The real level of the course depends upon the background of the audience. More specifically, depending on the interests and background of the mixture of students, some aspects of measure theory, advanced topics, generalities of results, complete proofs, etc. can be emphasized appropriately.

    We would like to thank Professor H. Skala, Editor of the Series "Mathematical and Statistical Methods", for giving us the opportunity to write a text in our own style. We extend our thanks also to Dr. Paul Roos and Ms. Angelique Hempel at Kluwer Academic for advising us during the preparation of the manuscript.

    We are grateful to Dr. Tonghui Wang of New Mexico State University for proofreading the text, and for his penetrating remarks and suggestions concerning the final version of the text. The camera-ready version as well as the design of the book is also due to him.

    The first named author would like to thank Emmanuel Guerre for providing some exercises.

    The second named author would like to thank his department head, Professor Douglas Kurtz, for his encouragement.

    Denis Bosq and Hung T. Nguyen Paris and Las Cruces, Winter, 1995

  • Lesson 1

    Basic Probability Background

    This Lesson is a review of basic concepts in probability theory needed for this text. The notation in this Lesson will be used throughout the text unless otherwise stated. We emphasize computational aspects. The Appendix at the end of the text contains additional topics.

    1.1 Events and Probabilities

    This section aims at providing the motivation for using probability spaces to model random phenomena.

    By an experiment, we mean the making of an observation. The result of an experiment is called an outcome. The collection of all possible outcomes of an experiment ℰ is called the sample space and is denoted by Ω. By a random experiment, we mean an experiment such that observations under identical conditions might not lead to the same outcome.

    Suppose we consider the random experiment consisting of rolling two dice. The sample space is

    Ω = {(i, j) : i, j = 1, 2, ..., 6}. Consider the event "the sum of the two numbers shown is equal to 7". This event A consists of the sample points (i, j) such that i + j = 7. Thus an event is a subset of the set Ω, and we write

    A ⊆ Ω (A is contained in Ω).



    If we perform the experiment and obtain the outcome (2, 5), then, since (2, 5) ∈ A (the point (2, 5) belongs to A, or is a member of A), we say that the event A is realized, or A occurs.

    Since we cannot predict exactly what the outcome will be in a random experiment such as this, we ask "What is the chance that A will occur?" The answer to this question will be a number P(A), called the probability of the event A. In an experiment whose sample space Ω is finite, it is possible to assign a number P(A) to all subsets A of Ω. The point is this. Since we are interested in probabilities of events, subsets of a general sample space Ω (such as Ω = ℝ = (−∞, ∞), the set of real numbers) are considered as events only if their probabilities can be assigned.

    In our actual example, the collection A of all events is P(Ω), the power set of Ω, that is, the collection of all possible subsets of Ω, including the empty set ∅ and Ω itself. Events are stated in natural language and hence compound events are formed by using logical connectives like "not", "and", and "or". In the context of random experiments, events are subsets of Ω. The modeling of the above connectives in the context of Set Theory is as follows.

    The negation (or complement) of A is Aᶜ = {ω ∈ Ω : ω ∉ A}, where ∉ stands for "is not a member of".

    For A, B ⊆ Ω, "A and B" is defined as

    A ∩ B = {ω ∈ Ω : ω ∈ A, ω ∈ B},

    where ∩ stands for "intersection"; "A or B" is

    A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B},

    where ∪ stands for "union". Note that the "or" here is not exclusive, i.e., we allow ω ∈ A ∪ B if ω belongs to both A and B.

    In our example, since A = P(Ω), A is closed under all the above set operations; that is, if A, B ∈ A then Aᶜ, A ∩ B, and A ∪ B all belong to A.

    We describe now the way to assign probability to events in our example. It is plausible that any outcome (i, j) will have the same chance to occur. Thus we assign to each ω = (i, j) a number f(ω), called the probability of the event A = {ω}. By its meaning, 0 ≤ f(ω) ≤ 1. Here, since f(ω) is the same for all ω ∈ Ω, we obtain

    f(ω) = 1/#(Ω) = 1/36,

    where #(Ω) denotes the cardinality (number of elements) of Ω. Observe that f : Ω → [0, 1] satisfies the condition Σ_{ω∈Ω} f(ω) = 1. Such a function is called a probability mass function.


    Now for an event such as A = {ω = (i, j) : i + j = 7}, how do we assign P(A) from f? Recall that A occurs if the outcome is any (i, j) such that i + j = 7. Thus it is plausible that

    P(A) = #(A)/#(Ω) = 6/36 = 1/6,

    which equals P(A) = Σ_{ω∈A} f(ω). The operator P is a mapping from A to [0, 1] satisfying the conditions (i) P(Ω) = 1 and (ii) for A, B ∈ A with A ∩ B = ∅ (A and B are disjoint or incompatible),

    P(A ∪ B) = P(A) + P(B).

    Such an operator P is called a probability measure. The condition (ii) above is referred to as the finite additivity property of P. Specifically,

    P(A₁ ∪ ... ∪ Aₖ) = Σ_{i=1}^k P(Aᵢ)

    when Aᵢ ∩ Aⱼ = ∅, 1 ≤ i ≠ j ≤ k. The triple (Ω, A, P) above is called a probability space. A probability space is a model for a random experiment.
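    As a quick numerical illustration, here is a minimal Python sketch (ours, using only the standard library) that enumerates the sample space of the two-dice experiment and recovers P(A) = 1/6 for A = "the sum is 7":

        # Enumerate the finite sample space of two dice and compute P(A)
        # by counting, since all 36 outcomes are equally likely.
        from fractions import Fraction
        from itertools import product

        omega = list(product(range(1, 7), repeat=2))   # all outcomes (i, j)
        A = [w for w in omega if sum(w) == 7]          # the event as a subset

        P_A = Fraction(len(A), len(omega))             # P(A) = #(A)/#(omega)
        print(P_A)                                     # 1/6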

    space is a model for a random experiment. Let us extend the above modeling of random experiments with finite

    Let us extend the above modeling of random experiments with finite sample spaces to the case of experiments with countably infinite sample spaces (Ω is countable if there is a one-to-one correspondence between Ω and the set ℕ = {0, 1, 2, ...} of non-negative integers). We say that Ω is discrete if Ω is finite or countably infinite.

    As an example of an experiment with infinitely many outcomes, consider the experiment of tossing a fair coin until we first obtain a Head. The outcome of this experiment is the number of tosses needed to obtain the first Head. Obviously, Ω = {1, 2, ...}. As in the finite case, we first assign f(n) to each n ∈ Ω, where n stands for the outcome "the first Head occurs on toss n". When tossing a coin n times, there are 2ⁿ possible combinations of Heads and Tails, only one of which corresponds to the above outcome, namely the first n − 1 tosses yield Tails, and the nth toss yields a Head. Thus

    f(n) = 1/2ⁿ, n ≥ 1.

    Since

    Σ_{ω∈Ω} f(ω) = Σ_{n=1}^∞ f(n) = Σ_{n=1}^∞ 1/2ⁿ = 1,


    f is a probability mass function, where the summation is now an infinite one. In the discrete case, we can assign probabilities to all possible subsets of Ω via the formula

    P(A) = Σ_{ω∈A} f(ω), A ⊆ Ω,

    where f(ω) = P({ω}). Thus the collection of events is A = P(Ω). The probability measure P satisfies the following σ-additivity property: for any sequence Aₙ ∈ A, n ≥ 1, where the Aₙ's are disjoint (that is, Aₙ ∩ Aₘ = ∅ for n ≠ m),

    P(⋃_{n=1}^∞ Aₙ) = Σ_{n=1}^∞ P(Aₙ).
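    A minimal Python sketch simulating the first-Head experiment above; the empirical frequencies approach the probability mass function f(n) = 1/2ⁿ:

        # Simulate tossing a fair coin until the first head and compare the
        # empirical frequency of each outcome n with f(n) = 1/2**n.
        import random

        def first_head():
            n = 1
            while random.random() >= 0.5:   # tail with probability 1/2
                n += 1
            return n

        random.seed(0)
        trials = 100_000
        counts = {}
        for _ in range(trials):
            n = first_head()
            counts[n] = counts.get(n, 0) + 1

        for n in range(1, 6):
            print(n, counts.get(n, 0) / trials, 1 / 2**n)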

    Consider now random experiments with uncountably many outcomes (the continuous case). The typical and important experiment in this category is "choosing a number at random from the unit interval [0, 1]". Here Ω = [0, 1]. Recall that for Ω discrete, we specify A and P as follows: (i) assign to each ω ∈ Ω its probability value f(ω), and (ii) assign to each A ⊆ Ω its probability P(A) = Σ_{ω∈A} f(ω).

    Now, Ω = [0, 1] is uncountably infinite, so we cannot proceed as in the discrete case. Indeed, if f(ω) denotes the probability of getting the point ω in [0, 1], then by the nature of the experiment, f(ω) should be constant, say a (every point in [0, 1] has the same chance to be selected). But since the probability of [0, 1] is 1, a must be zero! (Take n points ωᵢ, i = 1, 2, ..., n, in [0, 1] with n > 1/a; then Σ_{i=1}^n f(ωᵢ) = na > 1 if a ≠ 0.) As we will see, the probability value f(ω) = 0 for each ω ∈ [0, 1] does make sense, but this assignment f(·) cannot be used to define probabilities on subsets of [0, 1]. For example, what is the chance that the point chosen at random will be in the interval [0, 0.25]? It is clear that the answer should be 0.25. More generally, if I is a sub-interval of [0, 1], then P(I) should be |I|, the length of I.

    The above suggests that, for uncountable Ω, we have to assign probability directly to subsets (events); that is, we need to specify a probability measure P. The probability measure P should be such that P(I) = |I| for any sub-interval I of [0, 1]. When I reduces to a singleton {ω}, P({ω}) = 0.

    The next question is: what is the domain A of P? In other words, which subsets of [0, 1] are events? Recall that a subset of Ω is qualified as an event if we can assign to it a probability value. If we set A = P([0, 1]), as in the discrete case, then we must ask the following


    question: is there a probability measure P on P([0, 1]) such that P(I) = |I| for any sub-interval I of [0, 1]? Note that, to be consistent with the discrete case, P needs to be σ-additive.

    It turns out that the answer to this mathematical problem is NO. The reason is that P([0, 1]) is too big. Thus not all subsets of [0, 1] are events; that is, A is a proper subset of P([0, 1]). To determine A, we observe that A should contain intervals, and for any A ∈ A, P(A) should be derived from P(I) = |I| for intervals I. Furthermore, as in the discrete case, A should be a σ-field, that is, A is a collection of subsets of Ω satisfying

    (i) Ω ∈ A,
    (ii) A ∈ A implies that Aᶜ ∈ A, and
    (iii) for any sequence Aₙ ∈ A, n ≥ 1, ⋃_{n=1}^∞ Aₙ ∈ A.

    Remarks.

    (a) The above algebraic structure of A expresses the fact that A should be large enough to contain all events of interest.

    (b) (ii) and (iii) above imply that if Aₙ ∈ A, n ≥ 1, then ⋂_{n=1}^∞ Aₙ ∈ A (Exercise).

    (c) If (iii) above is replaced by

    A, B ∈ A ⟹ A ∪ B ∈ A,

    then A is called a field. Note that a σ-field is a field (Exercise).

    Thus we arrive at the general probabilistic model for an arbitrary random experiment (discrete or continuous):

    Definition 1.1 A probabilistic model for a random experiment is a probability space (Ω, A, P), where Ω is the sample space, A is a σ-field of subsets (events) of Ω, and P is a probability measure defined on A (for A ∈ A, P(A) is the probability of A). A probability measure is a map P : A → [0, 1] such that

    (i) P(Ω) = 1, and
    (ii) for any sequence Aₙ ∈ A, n ≥ 1, where the Aₙ's are pairwise disjoint,

    P(⋃_{n=1}^∞ Aₙ) = Σ_{n=1}^∞ P(Aₙ) (σ-additivity).

    The pair (Ω, A) is called a measurable space.


    Let us go back to the specification of A for Ω = [0, 1]. In view of the requirements imposed on A discussed earlier, A should be a σ-field containing intervals. Thus we take A to be the smallest σ-field containing intervals. If C denotes the collection of all sub-intervals of [0, 1], then we write A = σ(C), which is called the σ-field generated by C.

    Remarks.

    (a) The above σ-field is called the Borel σ-field of [0, 1] and is denoted by B([0, 1]). Elements of B([0, 1]) are called Borel subsets of [0, 1].

    (b) For Ω = ℝ = (−∞, ∞), ℝ̄ = [−∞, ∞], ℝ₊ = [0, ∞), and ℝ̄₊ = [0, ∞], we have similarly B(ℝ), B(ℝ̄), B(ℝ₊), and B(ℝ̄₊), respectively.

    The above choice of A as the Borel σ-field B([0, 1]) is justified by the existence of a unique probability measure P on it such that P(I) = |I| for any sub-interval I of [0, 1]. We omit the technical details. The P so defined is sometimes denoted as dL(x) or dx and is called the Lebesgue measure on [0, 1].

    We close this section by mentioning some useful properties of P. The proofs of these properties are left as exercises.

    Let (Ω, A, P) be a probability space.

    (i) P is monotone increasing, i.e., if A, B ∈ A with A ⊆ B, then P(A) ≤ P(B).

    (ii) For A, B ∈ A, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). More generally,

    P(⋃_{i=1}^n Aᵢ) = Σ_{i=1}^n P(Aᵢ) − Σ_{1≤i<j≤n} P(Aᵢ ∩ Aⱼ) + ... + (−1)ⁿ⁺¹ P(⋂_{i=1}^n Aᵢ)

    (Poincaré's formula).

    (iii) For A ∈ A, P(Aᶜ) = 1 − P(A).

    (iv) For Aₙ ∈ A, n ≥ 1,

    P(⋃_{n=1}^∞ Aₙ) ≤ Σ_{n=1}^∞ P(Aₙ) (sub-σ-additivity).

    (v) Limits of events. As in Real Analysis, we proceed as follows. A sequence of subsets Aₙ, n ≥ 1, of Ω is increasing if Aₙ ⊆ Aₙ₊₁, ∀n ≥ 1.


    For such a sequence, we define the limit as follows:

    lim_{n→∞} Aₙ = ⋃_{n=1}^∞ Aₙ.

    Similarly, the sequence Aₙ is decreasing if Aₙ₊₁ ⊆ Aₙ, ∀n ≥ 1, and

    lim_{n→∞} Aₙ = ⋂_{n=1}^∞ Aₙ.

    If the sequence is arbitrary, then Bₙ = ⋃_{i=n}^∞ Aᵢ is a decreasing sequence and Dₙ = ⋂_{i=n}^∞ Aᵢ is an increasing sequence. Thus we define

    lim sup_{n→∞} Aₙ = ⋂_{n=1}^∞ ⋃_{i=n}^∞ Aᵢ and lim inf_{n→∞} Aₙ = ⋃_{n=1}^∞ ⋂_{i=n}^∞ Aᵢ.

    Note that

    lim inf_{n→∞} Aₙ ⊆ lim sup_{n→∞} Aₙ.

    When

    lim inf_{n→∞} Aₙ = lim sup_{n→∞} Aₙ,

    we say that lim_{n→∞} Aₙ exists and is equal to this common value.

    Note that lim sup_{n→∞} Aₙ is also written as (Aₙ i.o.), where i.o. stands for "infinitely often", since ω ∈ lim sup_{n→∞} Aₙ if and only if ω ∈ Aₙ for infinitely many n. Also, ω ∈ lim inf_{n→∞} Aₙ if and only if ω ∈ Aₙ for all but a finite number of n.

    If Aₙ ∈ A, n ≥ 1, is either an increasing or a decreasing sequence of events, then

    lim_{n→∞} P(Aₙ) = P(lim_{n→∞} Aₙ) (monotone continuity).

    (vi) Borel-Cantelli Lemma. Let Aₙ ∈ A, n ≥ 1.

    (a) If Σ_{n=1}^∞ P(Aₙ) < ∞, then P(lim sup_{n→∞} Aₙ) = 0.

    (b) If the Aₙ's are independent (see definition below) and Σ_{n=1}^∞ P(Aₙ) = ∞, then P(lim sup_{n→∞} Aₙ) = 1.
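    A minimal Python sketch of this dichotomy, with independent events Aₙ of probability pₙ: when Σ pₙ < ∞, only finitely many Aₙ occur, while for a divergent series the number of occurrences keeps growing with N:

        # Independent events A_n with P(A_n) = p(n): count how many occur
        # among n <= N for a summable and for a divergent choice of p.
        import random

        def occurrences(p, N):
            return sum(random.random() < p(n) for n in range(1, N + 1))

        random.seed(0)
        for N in (1_000, 100_000):
            print(N,
                  occurrences(lambda n: 1 / n**2, N),  # summable: stabilizes
                  occurrences(lambda n: 1 / n, N))     # divergent: keeps growing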


    (vii) Conditional probability and independence. Let A ∈ A with P(A) ≠ 0. The conditional probability of B ∈ A given A is defined to be P(A ∩ B)/P(A) and is denoted by P(B|A). (For P(A) = 0, P(·|A) is undefined.) For fixed A, the set-function P(·|A), defined on A, is a probability measure.

    From the definition of conditional probability, we obtain the multiplication formula for probabilities:

    P(A ∩ B) = P(A)P(B|A).

    More generally, if A₁, A₂, ..., Aₙ are events such that P(⋂_{i=1}^{n−1} Aᵢ) ≠ 0, then

    P(⋂_{i=1}^n Aᵢ) = P(A₁)P(A₂|A₁) ... P(Aₙ | ⋂_{i=1}^{n−1} Aᵢ).

    The following law of total probability is useful in computing probabilities

    of complicated events. Let {A₁, A₂, ..., Aₙ} be a measurable partition of Ω: Aᵢ ∈ A, i = 1, 2, ..., n, the Aᵢ's are disjoint, and ⋃_{i=1}^n Aᵢ = Ω. Assuming that P(Aᵢ) > 0, i = 1, 2, ..., n, then for any B ∈ A,

    P(B) = Σ_{i=1}^n P(B|Aᵢ)P(Aᵢ).

    As a consequence, we obtain Bayes' formula: if P(B) > 0, then for any j = 1, 2, ..., n,

    P(Aⱼ|B) = P(Aⱼ)P(B|Aⱼ) / Σ_{i=1}^n P(Aᵢ)P(B|Aᵢ).

    The above formulas can be extended to an infinitely countable partition of Ω (Exercise).
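    A minimal Python sketch of the total probability and Bayes formulas for a three-set partition; the prior and likelihood numbers below are made up purely for the illustration:

        # Total probability: P(B) = sum of P(B|A_j) P(A_j);
        # Bayes: P(A_j|B) = P(A_j) P(B|A_j) / P(B).
        prior = [0.5, 0.3, 0.2]            # P(A_j) (illustrative values)
        likelihood = [0.01, 0.05, 0.10]    # P(B|A_j) (illustrative values)

        P_B = sum(p * l for p, l in zip(prior, likelihood))
        posterior = [p * l / P_B for p, l in zip(prior, likelihood)]
        print(P_B, posterior, sum(posterior))   # the posterior sums to 1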

    We turn now to the concept of independence of events. This is a stochastic independence concept, since it is defined in terms of P.

    For A, B ∈ A with P(A), P(B) ≠ 0, it is intuitive that "A is independent of B" (with respect to P) when

    P(A|B) = P(A),

    and similarly, "B is independent of A" when

    P(B|A) = P(B).

    In both cases,

    P(A ∩ B) = P(A)P(B), (1.1)


    which is taken to be the definition of the independence of two events A and B. Note that (1.1) is symmetric in A and B and makes sense for all events (even if P(A) or P(B) = 0).

    It should be noted that, in general, disjoint events are not independent! If A ∩ B = ∅ and P(A) ≠ 0, P(B) ≠ 0, then (1.1) cannot hold.

    Viewing {A, B} as a collection of events, the above concept of (stochastic) independence is extended to an arbitrary collection of events as follows. Let I be an arbitrary set. A collection {Aᵢ, i ∈ I} ⊆ A is said to be independent if for any finite J ⊆ I, we have

    P(⋂_{j∈J} Aⱼ) = ∏_{j∈J} P(Aⱼ),

    where the symbol ∏ stands for "product". In particular, when I = {1, 2, ..., n}, the events A₁, A₂, ..., Aₙ are (mutually) independent if for k = 1, 2, ..., n and 1 ≤ i₁ < i₂ < ... < i_k ≤ n,

    P(⋂_{j=1}^k A_{iⱼ}) = ∏_{j=1}^k P(A_{iⱼ}).

    The independence of the Aᵢ's implies that any two events Aᵢ and Aⱼ are independent (pairwise independence). However, the converse does not hold (Exercise).
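    A minimal Python sketch of this distinction: with two fair dice, the events A = "first die even", B = "second die even", and C = "sum even" are pairwise independent, yet P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A)P(B)P(C):

        # Check pairwise versus mutual independence by enumeration.
        from fractions import Fraction
        from itertools import product

        omega = list(product(range(1, 7), repeat=2))
        P = lambda E: Fraction(sum(E(w) for w in omega), len(omega))

        A = lambda w: w[0] % 2 == 0
        B = lambda w: w[1] % 2 == 0
        C = lambda w: (w[0] + w[1]) % 2 == 0

        print(P(lambda w: A(w) and B(w)) == P(A) * P(B))   # True
        print(P(lambda w: A(w) and C(w)) == P(A) * P(C))   # True
        print(P(lambda w: B(w) and C(w)) == P(B) * P(C))   # True
        print(P(lambda w: A(w) and B(w) and C(w))
              == P(A) * P(B) * P(C))                       # False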

    Viewing {A} and {B} as two collections of events, we define independent collections (families) of events as follows.

    Let I be a set and Cᵢ ⊆ A, i ∈ I. Then the collections Cᵢ are said to be independent if for any finite J ⊆ I and all Aᵢ ∈ Cᵢ, i ∈ J,

    P(⋂_{i∈J} Aᵢ) = ∏_{i∈J} P(Aᵢ).

    In particular, when I = {1, 2, ...} and Cₙ = {Aₙ}, n ∈ I, the infinite sequence of events Aₙ, n ≥ 1, is independent if any finite number of the Aₙ's are (mutually) independent.

    Finally, note that for A, B, C ∈ A, we say that A and B are independent given C if

    P(A ∩ B|C) = P(A|C)P(B|C).

    The general concept of conditional independence appears naturally in the context of Markov processes (e.g., Lessons 3 and 5), and will be formulated in the context of random variables in Section 1.3.


    1.2 Random variables and their distributions

    In performing a random experiment such as rolling two dice, we might be interested in various numerical quantities resulting from the outcomes of the experiment. For example, X = "the sum of the two numbers shown" and Y = "the product of the two numbers shown".

    Since X, Y, ..., are names of some quantities, they are called variables. A variable, like X, can take different values: 2, 3, ..., 12. Unlike deterministic variables, to which we can assign values directly, the values of X depend on the outcomes of the roll of the two dice. For example, the value X = 3 corresponds to the outcome ω = (1, 2) or ω = (2, 1). Variables of this type are called random variables. Since the values of a random variable depend on the outcomes of random experiments, these variables are in fact functions of outcomes.

    For Ω = {(i, j) : i, j = 1, 2, ..., 6}, X : Ω → ℝ and X(ω) = X(i, j) = i + j.

    Many quantities of interest in the real world can be viewed as random variables, such as the annual income of an individual randomly selected from a population, the number of car accidents at a given location and a given time of the day, the arrival times of customers between 9 am and 4 pm at a bank, and so on.

    Let (Ω, A, P) be the model of the random experiment underlying a random variable X. The range of X (i.e., the possible values that X can take) is a subset of the real line ℝ. (X is called a random vector if its range is some subset of a Euclidean space ℝᵈ, d > 1, and a random element when its range is of some more complicated nature, such as the infinite dimensional space of continuous functions.) In a problem involving X, we are interested in various events which can be "described" by X, such as "X belongs to some subset A of ℝ". This event (X ∈ A) occurs when the outcome ω is such that X(ω) ∈ A. Thus (X ∈ A) = {ω ∈ Ω : X(ω) ∈ A}. Since X : Ω → ℝ, we can write

    (X ∈ A) = X⁻¹(A),

    where X⁻¹ : P(ℝ) → P(Ω) is defined by

    X⁻¹(A) = {ω : X(ω) ∈ A} ⊆ Ω, A ⊆ ℝ.

    Since P is specified on (Ω, A), we can assign to (X ∈ A) the probability value P(X ∈ A) = P(X⁻¹(A)), provided X⁻¹(A) ∈ A. When Ω is discrete, we take A = P(Ω), so that X⁻¹(A) ∈ A for all A ⊆ ℝ. But for uncountably infinite Ω, this is not true, since not all subsets of Ω are events (elements of A). Recall that subsets of Ω are qualified as events only if we can assign


    probability values to them. Now, on ℝ, there is a natural σ-field, namely the Borel σ-field B(ℝ) generated by the intervals of ℝ (see Exercise 1.5 and the Appendix). Also, for technical reasons given in the next section, the events associated with random variables are (X ∈ A) for A ∈ B(ℝ). Thus we arrive at the following definition.

    Definition 1.2 Let (Ω, A) and (ℝ, B(ℝ)) be two measurable spaces. A random variable is a map X : Ω → ℝ such that for any A ∈ B(ℝ), X⁻¹(A) ∈ A.

    Remarks.

    (a) A map X satisfying the condition in the above definition is called a measurable function. More specifically, X is an A-B(ℝ) measurable function. Note that the probability P on (Ω, A) plays no role in the definition.

    (b) If the range of the random variable X is discrete (continuous), then X is called a discrete (continuous) random variable.

    (c) For technical reasons, we might need to consider extended random variables, that is, we allow ±∞ as values. In this case, X : Ω → ℝ̄ = [−∞, ∞] and, by definition, X is an (extended) random variable if {ω : X(ω) ≤ t} ∈ A for any t ∈ ℝ.

    (d) More generally, for d ≥ 1, a measurable mapping X : (Ω, A) → (ℝᵈ, B(ℝᵈ)) is called a random vector. Write X = (X₁, X₂, ..., X_d), where Xₖ : Ω → ℝ, k = 1, 2, ..., d; then it can be shown that X is a random vector if and only if each Xₖ is a random variable. Note that elements of B(ℝᵈ) are the Borel sets of ℝᵈ (see Appendix).


    Example 1.1 (a) The number of heads obtained in tossing a coin five times and the number of tosses needed to obtain the first head in repeated tosses of a coin are examples of discrete random variables.

    (b) The waiting time for service of a customer in a queue and the time at which some event of interest (such as a breakdown, an earthquake, ...) occurs are examples of continuous random variables.

    It can be shown that if X and Y are random variables defined on the same (Ω, A), then X ± Y, XY, max(X, Y), and min(X, Y) are also random variables. Also, if Xₙ, n ≥ 1, is a sequence of random variables, then supₙ Xₙ and infₙ Xₙ are extended random variables (Exercise). The same is true for the following quantities:

    lim sup_{n→∞} Xₙ = lim_{n→∞} (sup_{k≥n} Xₖ)

    and

    lim inf_{n→∞} Xₙ = lim_{n→∞} (inf_{k≥n} Xₖ).

    In particular, when lim_{n→∞} Xₙ exists (that is, when lim sup_{n→∞} Xₙ = lim inf_{n→∞} Xₙ), it is also a random variable.

    The simplest random variables are indicator functions of events (sets). Let A ⊆ Ω; then the function 1_A : Ω → {0, 1} defined by

    1_A(ω) = 1 if ω ∈ A, and 1_A(ω) = 0 elsewhere,

    is called the indicator (function) of the set A. Obviously, if A ∈ A, then 1_A is a random variable. The events associated with X = 1_A are {∅, Ω, A, Aᶜ}. This is a sub-σ-field of A and is called the σ-field generated by 1_A, denoted by σ(1_A). Since

    σ(1_A) = {(1_A)⁻¹(B) : B ∈ B(ℝ)},

    we define the σ-field generated by an arbitrary random variable X as

    σ(X) = {X⁻¹(B) : B ∈ B(ℝ)}.

    This is indeed a sub-σ-field of A (Exercise).

    Let P be a probability measure on (Ω, A). When dealing with random variables defined on (Ω, A, P), we are interested, not in P itself, but in the probability measures induced over their ranges. For example, let (Ω, A, P) be the finite probability space describing the random experiment of rolling two dice. Let X denote the sum of the two numbers shown. Since Ω = {(i, j) : i, j = 1, 2, ..., 6} is finite, the range of X is also finite: R(X) = {2, 3, ..., 12}. The probability measure P_X on (R(X), P(R(X))) is defined by

    P_X(A) = P(X⁻¹(A)), ∀A ⊆ R(X).

    This probability measure (induced by X) describes the probabilistic "behavior" of X on its range. Here, since R(X) is finite, it suffices to specify the values

    P_X(x) = P(X = x), ∀x ∈ R(X).

    In our example,

    P(X = 2) = P({(1, 1)}) = 1/36, P(X = 3) = P({(1, 2), (2, 1)}) = 2/36, ...,


    x        2     3     4     5     6     7     8     9     10    11    12
    P_X(x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

    The function P_X(·) : R(X) → [0, 1] is a probability mass function (Σ_{x∈R(X)} P_X(x) = 1). The knowledge of the probability mass function of X is equivalent to that of the probability measure P_X, since for A ⊆ R(X),

    P_X(A) = Σ_{x∈A} P_X(x).

    Also, X can be characterized by its cumulative distribution function (CDF, or distribution function for short):

    F_X : ℝ → [0, 1], F_X(x) = P(X ≤ x),

    since

    F_X(x) = Σ_{y≤x} P_X(y),

    and for x ∈ R(X) = {2, 3, ..., 12},

    P_X(x) = F_X(x) − F_X(x − 1).
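    A minimal Python sketch computing the table above and checking the relation P_X(x) = F_X(x) − F_X(x − 1):

        # Build the pmf of X = sum of two dice by counting outcomes, then
        # derive the distribution function F_X and verify the jump relation.
        from fractions import Fraction
        from itertools import product
        from collections import Counter

        counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
        pmf = {x: Fraction(c, 36) for x, c in counts.items()}

        def F(x):                          # F_X(x) = P(X <= x)
            return sum(p for v, p in pmf.items() if v <= x)

        assert all(pmf[x] == F(x) - F(x - 1) for x in range(2, 13))
        print(pmf[7], F(7))                # 1/6 and 7/12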

    In general, for real-valued random variables, it turns out that the distribution function

    F_X : ℝ → [0, 1], F_X(x) = P(X ≤ x),

    determines completely the induced probability measure P_X(A) = P(X⁻¹(A)) on (ℝ, B(ℝ)). See Appendix. Thus, in any case, distribution functions characterize the probabilistic structures of random variables.

    There are three types of distribution functions.

    (a) F is piecewise constant. There are at most a countable number of jump points x₁, x₂, ..., at which ΔF(xₙ) = F(xₙ) − F(xₙ−) > 0, where F(x−) denotes the left limit at x, i.e., lim_{y↑x} F(y). In this case, the associated probability mass function is f(xₙ) = ΔF(xₙ), with Σₙ f(xₙ) = 1. A random variable having such a distribution function is called a discrete random variable.

    (b) Absolutely continuous distribution functions. By this, we mean a distribution function F of the form

    F(x) = ∫_{−∞}^x f(y) dy,

    where f : ℝ → ℝ₊ and ∫_{−∞}^∞ f(y) dy = 1. f is called a probability density function. Random variables having this type of distribution function are referred to as continuous random variables. Note that, except on a countable set of points, F(x) is differentiable and F′(x) = f(x).

    (c) Singular distribution functions. There are distribution functions F which are continuous (there are no mass points, that is P(X = :c) = for all :c), but have all their points of increase (that is, points :c such that F(:c + c) - F(:c - c) > for all c > 0) on sets of zero "Lebesgue measure". As an example, let X = E:=l Xn/(3n) where the Xn's are independent with the same distribution

    P(Xn = 0) = 1 - P(Xn = 2) = 1/2. Then the distribution F of X is continuous, and yet F does not admit a density. This can be seen as follows.

    Each point :c E [0,1] can be represented in ternary notation as :c = E:=l an/(3n), where an E {O, 1, 2}. The range of X is the subset A of [0,1] consisting of:c such that an E {0,2}. Now A (the Cantor set) is obtained as A = n~l Bn , where the Bn's are constructed as follows. Starting with [0,1], we divide [0, 1] into 3 sub-intervals oflength 1/3 and delete the closed middle interval [1/3, 2/3], the remaining is Bl = [0,1/3) U (2/3, 1]. In the step two, divide [0,1/3) and (2/3,1]' each into 3 sub-intervals of length 1/32 , delete the closed middle interval, the remaining is

    B2 = [0'312)U(322,~)U(~,~)U(:2,1], and so on. Note that the Bn's decrease, and each Bn is the disjoint union of 2n sub-intervals, each of length 1/3n . Thus the "length" of A is:

    L(A) = lim L(Bn) = lim (-32) n = 0. n--+oo n--+oo

    But since A is the range of X, we have P(X E A) = 1. These facts show that X does not have an absolutely continuous distribution F. It can be shown, however, that F is continuous.

    Every distribution function F can be written in the form αF₁ + βF₂ + γF₃, where α + β + γ = 1 and F₁, F₂, F₃ are of types (a), (b), and (c) above, respectively.
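    A minimal Python sketch approximating X = Σ Xₙ/3ⁿ by truncating the series; the empirical distribution function is flat on the deleted middle interval (1/3, 2/3), which carries no mass:

        # Sample (a truncation of) the Cantor-distributed X and estimate
        # F(t) = P(X <= t) at a few points.
        import random

        def sample_X(depth=40):
            return sum(random.choice((0, 2)) / 3**n for n in range(1, depth + 1))

        random.seed(0)
        xs = sorted(sample_X() for _ in range(10_000))
        for t in (1/3, 1/2, 2/3):
            print(t, sum(x <= t for x in xs) / len(xs))   # all close to 1/2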

    Distribution functions of random vectors are defined as follows.


    Definition 1.3 Let X = (X₁, ..., Xₙ) be a random vector. Then

    F_X : ℝⁿ → [0, 1], F_X(x₁, ..., xₙ) = P(X₁ ≤ x₁, ..., Xₙ ≤ xₙ)

    is called the joint distribution function of the Xᵢ's. The joint density function is

    f(x₁, x₂, ..., xₙ) = ∂ⁿF_X / (∂x₁ ∂x₂ ... ∂xₙ)

    (when it exists). For 1 ≤ i₁ < i₂ < ... < i_k ≤ n, the joint distribution of the random vector (X_{i₁}, X_{i₂}, ..., X_{i_k}) is

    F_{(i₁,...,i_k)}(x_{i₁}, ..., x_{i_k}) = F_X(∞, ..., x_{i₁}, ∞, ..., x_{i₂}, ∞, ..., x_{i_k}, ∞, ...),

    and is called a k-dimensional marginal distribution.

    For example, the marginal distribution of Xᵢ is

    Fᵢ(xᵢ) = F(∞, ..., ∞, xᵢ, ∞, ..., ∞)

    (an expression like F(x, ∞) means lim_{y→∞} F(x, y)).

    We discuss now the concept of conditional distributions. Let (Ω, A, P) be a probability space. Recall that, for fixed A ∈ A with P(A) ≠ 0, the set-function

    P_A(·) : A → [0, 1], P_A(B) = P(B|A),

    is a probability measure on A and is called the conditional probability measure given A.

    In applications, when several random variables are involved, we are often interested in computing expressions like P(Y ∈ A|X = x) for an event A in the range of Y. As a function of A for fixed x, this set-function is called the conditional law of Y given that X = x. The associated distribution function F(y|x) = P(Y ≤ y|X = x) is the conditional distribution of Y given X = x. This function is well defined when P(X = x) ≠ 0. For example, suppose that X is discrete with support {x₁, x₂, ...} (that is, P(X = xₙ) > 0, n ≥ 1, and Σ_{n=1}^∞ P(X = xₙ) = 1); then F(·|xₙ) represents the distribution of Y after observing the value xₙ of X. Before observing X, P(Y ∈ A|X) is a random variable defined by

    P(Y ∈ A|X)(ω) = Σ_{n=1}^∞ P(Y ∈ A|Bₙ) 1_{Bₙ}(ω),

    where Bₙ = {ω : X(ω) = xₙ}. Note that {Bₙ, n ≥ 1} forms a partition of Ω.


    When X is a continuous random variable (so that P(X = x) = 0, ∀x), the situation is delicate! Note that, in the discrete case, we never have to consider conditional probabilities with respect to events whose probabilities are zero. The situation is different for a continuous X: all observed values of X are then non-mass points. For example, let X be the outcome of randomly selecting a point in the unit interval [0, 1]. For X = x, we build an unbalanced coin with probability of getting a head in a single toss equal to x. Let Y denote the number of heads obtained when tossing that coin 10 times. Obviously, the probability of getting k heads is

    P(Y = k|X = x) = C(10, k) xᵏ(1 − x)^{10−k}

    (where C(10, k) denotes the binomial coefficient), while P(X = x) = 0.

    The conditional distribution F(y|x) = P(Y ≤ y|X = x) in such cases can be defined rigorously by using some sophisticated mathematics (namely the Radon-Nikodym theorem; see Appendix). Some details will be given in the next section.
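    A minimal Python sketch of the coin example above: simulating the pair (X, Y) shows that, marginally, P(Y = k) is the same (namely 1/11) for every k = 0, 1, ..., 10, as integrating P(Y = k|X = x) over x confirms:

        # X uniform on [0, 1]; given X = x, Y ~ Binomial(10, x).
        import random
        from collections import Counter

        random.seed(0)
        trials = 100_000
        counts = Counter()
        for _ in range(trials):
            x = random.random()                              # X = x
            y = sum(random.random() < x for _ in range(10))  # Y given X = x
            counts[y] += 1

        print([round(counts[k] / trials, 3) for k in range(11)])
        # all entries are close to 1/11 = 0.0909...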

    For computational purposes, when the pair of random variables (X, Y) has a joint density function f(x, y), then

    F(y|x) = ∫_{−∞}^y f(z|x) dz,

    where the conditional density function of Y given X = x is

    f(y|x) = f(x, y)/f_X(x) for f_X(x) ≠ 0

    (and is defined as zero for f_X(x) = 0), and f_X(x) is the marginal density function of X, given by

    f_X(x) = ∫_{−∞}^∞ f(x, y) dy.

    In view of the independence of events, the independence of random variables is expressed as follows.

    The random variables X₁, ..., Xₙ are said to be (mutually) independent if

    P(X₁ ∈ A₁, ..., Xₙ ∈ Aₙ) = ∏_{i=1}^n P(Xᵢ ∈ Aᵢ)

    for all choices of Aᵢ ∈ B(ℝ), i = 1, 2, ..., n.


    The interpretation is this. The information related to each Xᵢ is the σ-field generated by Xᵢ:

    σ(Xᵢ) = {Xᵢ⁻¹(B) : B ∈ B(ℝ)}.

    Saying that the Xᵢ's are independent is the same as saying that the collections of events {σ(Xᵢ) : i = 1, ..., n} are independent (see Section 1.1).

    In this spirit, the independence of an arbitrary collection of random variables (such as an infinite sequence of random variables) is defined similarly.

    For discrete or continuous random variables X₁, ..., Xₙ, the independence of the Xᵢ's is expressed simply as

    f(x₁, ..., xₙ) = ∏_{k=1}^n fₖ(xₖ), ∀(x₁, ..., xₙ) ∈ ℝⁿ,

    where f is the joint probability mass (or density) function of the Xᵢ's and fₖ is the marginal probability mass (or density) function of Xₖ.

    Sums of independent random variables appear often in the study of stochastic processes. The following is the formula for obtaining their distributions.

    Suppose that X and Y are two independent discrete random variables with values in {0, 1, 2, ...}. The distribution of Z = X + Y is completely determined by the probability mass function

    f_Z(n) = P(Z = n) = P(X + Y = n), n ≥ 0.

    Now, for fixed n, (X + Y = n) = ⋃_{k=0}^n (X = k, Y = n − k). Since the events {ω : X(ω) = k, Y(ω) = n − k}, k = 0, 1, ..., n, are disjoint, we have

    P(X + Y = n) = Σ_{k=0}^n P(X = k, Y = n − k) = Σ_{k=0}^n P(X = k)P(Y = n − k),

    by independence. The counterpart of this formula in the continuous case is

    f_Z(z) = ∫_{−∞}^∞ f_X(x) f_Y(z − x) dx, z ∈ ℝ;

    in symbols, f_Z = f_X * f_Y. The operation * is called convolution. Note that f_X * f_Y = f_Y * f_X. More generally, the convolution of two distribution functions F and G is defined as

    F * G(z) = ∫_{−∞}^∞ F(z − x) dG(x).
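    A minimal Python sketch of the discrete convolution formula, applied to two independent dice (with values 1, ..., 6 rather than 0, 1, 2, ...):

        # f_Z(n) = sum over k of f_X(k) f_Y(n - k) for independent X and Y.
        from fractions import Fraction

        f = {k: Fraction(1, 6) for k in range(1, 7)}   # pmf of one fair die

        def convolve(f, g):
            h = {}
            for i, p in f.items():
                for j, q in g.items():
                    h[i + j] = h.get(i + j, 0) + p * q
            return h

        f_Z = convolve(f, f)
        print(f_Z[7])      # 1/6, matching the table for the sum of two dice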


    We conclude this section with a remark on the conditional independence of random variables.

    The Markov property (Lesson 2) states that "given the present, the future is independent of the past". Now, in the context of random phenomena, the present, past, and future states are described by random variables. Thus we need to formulate rigorously the concept of conditional independence of random variables. Since independence of random variables is essentially related to the σ-fields generated by them, the appropriate place for formulating this concept is in the next section.

    1.3 Expectation

    Consider a random experiment such as rolling two dice and let X be the sum of the two numbers shown. What is the average (mean or expected) value of X? We will answer this question using our modeling scheme. The experiment is modeled by the probability space (Ω, A, P), where Ω = {(i, j) : i, j = 1, 2, ..., 6}, A = P(Ω), and P({ω}) = 1/36, ∀ω ∈ Ω. The random quantity X is modeled as a random variable, that is, a map from Ω to {2, 3, ..., 12}. The probability mass function of X is

    f(k) = P({ω : X(ω) = k}), k ∈ {2, 3, ..., 12}.

    If we repeat the experiment n times, then each value k is expected to appear about nf(k) times. Thus the average of the results of X is

    (1/n) Σ_{k=2}^{12} (nf(k)) k = Σ_{k=2}^{12} k f(k).

    Thus, for random variables with finite ranges, the expected value (or mean, or expectation) of X is taken to be

    E(X) = Σ_x x P(X = x).

    The extension of this formula to random variables whose ranges are infinite (countable or not) is a little delicate. To avoid meaningless expressions such as ∞ − ∞, we first consider random variables of constant sign, say, non-negative (extended) random variables.

    A random variable X with finite range {x₁, x₂, ..., xₙ} can be written as

    X(ω) = Σ_{i=1}^n xᵢ 1_{Aᵢ}(ω),


    where Aᵢ = {ω : X(ω) = xᵢ}. Note that the Aᵢ's form a (measurable) partition of Ω. Such a variable is called a simple random variable. We have

    E(X) = Σ_{i=1}^n xᵢ P(Aᵢ).

    Now, let X be an extended non-negative random variable defined on (Ω, A, P). Then X is the limit (pointwise) of an increasing sequence of simple random variables. It suffices to consider

    Xₙ(ω) = Σ_{i=0}^{n2ⁿ−1} (i/2ⁿ) 1_{[i/2ⁿ ≤ X < (i+1)/2ⁿ]}(ω) + n 1_{[X ≥ n]}(ω),

    and to define E(X) = lim_{n→∞} E(Xₙ) (≤ ∞). For an arbitrary extended random variable X, write X = X⁺ − X⁻, where X⁺ = max(X, 0) and X⁻ = max(−X, 0) are non-negative.


    Definition 1.4 Let X be an extended random variable.

    (i) If both E(X⁺) and E(X⁻) are ∞, we say that the expectation of X does not exist.

    (ii) When not both E(X⁺) and E(X⁻) are ∞, we say that the expectation of X exists and is equal to E(X) = E(X⁺) − E(X⁻).

    (iii) If both E(X⁺) and E(X⁻) are finite, then we say that the expectation of X is finite and that X is integrable

    (E|X| < ∞ ⟺ E(X⁺) < ∞ and E(X⁻) < ∞).

    Note that E(X) can be used to define the integral on (Ω, A, P) as

    InX(w)dP(w). The following properties of expectation are easy to check. (a) X $ Y implies that E(X) $ E(Y). (b) For any real numbers a and (3, E(aX + (3Y) = aE(X) + (3E(Y). For computations, we have

    E(X) = I>f(x) (if X is discrete) :c

    and E(X) = 1: xf(x)dx (if X is continuous).

    More generally, if t/J : IR --+ IR (measurable), then

    E(t/J(X)) = 1: t/J(x)f(x)dx. If Xl, X 2, ... ,Xn are independent random variables, then

    E (g Xi) = g E(Xi) (Exercise) . Note that, for an infinite sequence of independent random variables X n , n ~ 1, it might happen that

    E (ii Xn) # ii E(Xn). (See Exercise 10 of Lesson 11).

    Let n ≥ 1 be an integer. If X ≥ 0 or Xⁿ is integrable, then E(Xⁿ) is called the moment of X of order n (or the nth order moment of X). Note that if E(Xⁿ) < ∞, then E(Xᵐ) < ∞ for m ≤ n. However, X might not have moments of order greater than n.

    For n = 2, the quantity E(X − E(X))² is called the variance of X and is denoted by Var(X) or simply V(X); its positive square root is called the standard deviation of X. For two random variables X and Y having second moments, the covariance of X and Y is the quantity

    cov(X, Y) = E[(X − E(X))(Y − E(Y))].

    If cov(X, Y) = 0, then X and Y are said to be uncorrelated. Of course, independent random variables are uncorrelated, but the converse is not true.
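    A minimal Python sketch of the last point: with X uniform on {−1, 0, 1} and Y = X², one has cov(X, Y) = E(X³) − E(X)E(X²) = 0, yet Y is a function of X, so X and Y are dependent:

        # Uncorrelated but dependent: compute the covariance exactly.
        from fractions import Fraction

        pmf = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}
        E = lambda g: sum(g(x) * p for x, p in pmf.items())

        cov = E(lambda x: x * x**2) - E(lambda x: x) * E(lambda x: x**2)
        print(cov)   # 0, yet P(X = 1, Y = 0) = 0 != P(X = 1) P(Y = 0) = 1/9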

    Now, we consider the important concept of conditional expectation.

    Consider two random variables X and Y defined on (Ω, A, P). We are going to formulate the notion of the expectation of Y when we observe X.

    First, suppose that X is discrete with range {xₙ, n ≥ 1}. The variable X induces a (measurable) partition (finite or countable) of Ω:

    Dₙ = {ω : X(ω) = xₙ}, n ≥ 1.

    When X = xₙ, we might be interested in P(A|X = xₙ) for A ∈ A, and in E(Y|X = xₙ). Of course P(A|X = xₙ) = P(A|Dₙ).

    Before observing X, the conditional probability of the event A given X is a random variable defined as

    P(A|X)(ω) = Σ_{n≥1} P(A|Dₙ) 1_{Dₙ}(ω).

    If Y is a random variable with finite range {y₁, y₂, ..., y_m}, then

    E(Y) = Σ_{i=1}^m yᵢ P(Bᵢ), Bᵢ = {ω : Y(ω) = yᵢ};

    thus, by analogy,

    E(Y|X = xₙ) = E(Y|Dₙ) = Σ_{i=1}^m yᵢ P(Bᵢ|Dₙ) = (1/P(Dₙ)) Σ_{i=1}^m yᵢ P(Bᵢ ∩ Dₙ) = E(Y 1_{Dₙ})/P(Dₙ).


    In general, if Y is an extended random variable whose expectation exists, then E(Y|D) exists for D ∈ A with P(D) > 0, where

    E(Y|D) = ∫_Ω Y(ω) dP_D(ω),

    and P_D(·) denotes the conditional probability measure on A defined by

    P_D(A) = P(A|D), A ∈ A.

    It can be shown that

    E(Y|D) = E(Y 1_D)/P(D).

    Now, consider the partition Dₙ, n ≥ 1, induced by the discrete random variable X. Before observing X, the conditional expectation of Y given X, denoted by E(Y|X), is a random variable. The above discussion leads to the following definition.

    Definition 1.5 Let Y be an extended random variable whose expectation exists and let X be a discrete random variable. Then the conditional expectation of Y given X is the random variable defined by

    E(Y|X)(ω) = Σ_{n≥1} E(Y|X = xₙ) 1_{(X=xₙ)}(ω).
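    A minimal Python sketch of this definition for the two-dice experiment, with X the first die and Y the sum: E(Y|X = x) = x + 7/2, and averaging E(Y|X) over the distribution of X recovers E(Y) = 7:

        # Conditional expectation of Y given a discrete X, by conditioning
        # on the partition events D_x = {X = x}.
        from fractions import Fraction
        from itertools import product

        omega = list(product(range(1, 7), repeat=2))
        X = lambda w: w[0]
        Y = lambda w: w[0] + w[1]

        def E_Y_given(x):
            D = [w for w in omega if X(w) == x]          # the event {X = x}
            return Fraction(sum(Y(w) for w in D), len(D))

        print([E_Y_given(x) for x in range(1, 7)])       # 9/2, 11/2, ..., 19/2
        print(sum(E_Y_given(x) * Fraction(1, 6) for x in range(1, 7)))  # 7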

    The dependence of the expectation of Y on X can also be expressed in terms of the σ-field σ(X) generated by X. Here σ(X) is the σ-field generated by the partition Dₙ = {ω : X(ω) = xₙ}, n ≥ 1. Note that σ(X) represents the information about X. Thus we can write

    E(Y|X) = E(Y|σ(X)).

    Note that P(A|X) = P(A|σ(X)). With this identification, we have the following:

    (i) The random variable E(Y|σ(X)) is σ(X)-measurable, and for any A ∈ σ(X),

    ∫_A Y(ω) dP(ω) = ∫_A E(Y|σ(X))(ω) dP(ω),

    where

    ∫_A Y(ω) dP(ω) = ∫_Ω 1_A(ω) Y(ω) dP(ω).

    (ii) By E(Y|X₁, ..., Xₖ), we mean E(Y|σ(X₁, ..., Xₖ)).


    (iii) We can define the conditional expectation of Y with respect to any sub-σ-field D of A as a function on Ω satisfying the conditions in (i). In particular, when X is continuous, E(Y|X) is still well defined in this framework. The existence of a function E(Y|X) satisfying the conditions in (i) is proved by using a theorem in Measure Theory known as the Radon-Nikodym theorem (see Appendix).

    We list below some useful properties of conditional expectations (Exercise). Let D be a sub-σ-field of A.

    (a) E(·|D) is increasing and linear:

    X ≤ Y ⟹ E(X|D) ≤ E(Y|D) (a.s.),

    where a.s. stands for almost surely, that is, the property is true on a subset Ω₀ ⊆ Ω with P(Ω₀) = 1. Also, for α, β ∈ ℝ,

    E(αX + βY|D) = αE(X|D) + βE(Y|D) (a.s.).

    (b) For D = {∅, Ω}, E(X|D) = E(X).

    (c) E(E(X|D)) = E(X).

    (d) If C is a sub-σ-field of D (C ⊆ D), then

    E(E(X|D)|C) = E(X|C) (a.s.).

    (e) If X is independent of D, that is, independent of {1_D : D ∈ D}, then

    E(X|D) = E(X) (a.s.).

    (f) If Y is D-measurable, then

    E(XY|D) = Y E(X|D) (a.s.).

    (g) Jensen's inequality: if φ : ℝ → ℝ is a convex function and φ(X) is integrable, then

    φ(E(X|D)) ≤ E(φ(X)|D) (a.s.).

    We close this section with the definition of the conditional independence of random variables.

    Definition 1.6 We say that X and Y are conditionally independent given Z if for any A ∈ σ(X) and B ∈ σ(Y), we have

    P(A ∩ B|σ(Z)) = P(A|σ(Z)) P(B|σ(Z)) (a.s.).


    1.4 Limit theorems

    When using stochastic processes to model random phenomena (Lesson 2), we are interested in their behavior for large values of the time parameter (as well as in other properties, such as their time-dependent structures). The concept of limits of sequences of random variables is suitable for investigating this property.

    Let Xₙ, n ≥ 1, be a sequence of random variables defined on Ω. There are different kinds of convergence for (Xₙ, n ≥ 1).

    Definition 1.7 The sequence (Xₙ, n ≥ 1) is said to converge in probability to a random variable X if for any ε > 0,

    lim_{n→∞} P(|Xₙ − X| > ε) = 0;

    in symbols, Xₙ →ᴾ X.

    The interpretation is this: with high probability, Xₙ is close to X for large values of n.

    A stronger concept of convergence is the following.

    Definition 1.8 The sequence (Xₙ, n ≥ 1) is said to converge almost surely (or with probability one) to X if

    P(ω : Xₙ(ω) → X(ω)) = 1;

    in symbols, Xₙ →ᵃ·ˢ· X.

    Remarks.

    (i) It can be shown that if Xₙ →ᵃ·ˢ· X, then Xₙ →ᴾ X. The converse does not hold. See Exercise 1.25.

    (ii) To prove a.s. convergence, the following equivalent criterion is useful:

    Xₙ →ᵃ·ˢ· X ⟺ lim_{n→∞} P(sup_{k≥n} |Xₖ − X| > ε) = 0 for any ε > 0.

    (iii) For random variables with finite second moments, Tchebychev's inequality is useful for checking convergence in probability:

    P(|X − E(X)| ≥ ε) ≤ V(X)/ε².

    Concerning the moments of random variables, we have


    Definition 1.9 Let each Xₙ, n ≥ 1, and X have finite moments of order k. Then the sequence (Xₙ, n ≥ 1) converges in k-mean to X if

    lim_{n→∞} E(|Xₙ − X|ᵏ) = 0;

    in symbols, Xₙ →ᴸᵏ X. In particular, when k = 2, the L²-convergence is also called convergence in mean square.

    Remarks.

    (i) Lᵏ-convergence implies convergence in probability.

    (ii) If Xₙ →ᴸ¹ X, then lim_{n→∞} E(Xₙ) = E(X).

    (iii) There are no simple relations between Lᵏ-convergence and a.s. convergence.

    Finally, we are interested in the limiting distribution of the Xₙ's.

    Definition 1.10 Let Xₙ, n ≥ 1, and X be random variables with distribution functions Fₙ, n ≥ 1, and F, respectively. Then Xₙ is said to converge in distribution to X, denoted by Xₙ →ᴰ X, if

    lim_{n→∞} Fₙ(x) = F(x) ∀x ∈ C(F),

    where C(F) denotes the subset of ℝ on which F is continuous.

    Remarks.

    (i) If Xₙ →ᴾ X, then Xₙ →ᴰ X.

    (ii) When Xₙ →ᴰ X, F is called the limiting distribution of the sequence (Xₙ, n ≥ 1).

    The two important results in Probability Theory related to the various modes of convergence of random variables are the following.

    A. Laws of large numbers.

    There are two types of laws of large numbers, "strong" (a.s.) and "weak" (in probability), according to the convergence concept involved.

    (a) Weak law of large numbers. If (Xₙ, n ≥ 1) is a sequence of independent random variables having the same distribution (identically distributed) with finite mean μ, then

    (X₁ + X₂ + ... + Xₙ)/n →ᴾ μ, as n → ∞.


    (b) Strong law of large numbers. If (Xₙ, n ≥ 1) is a sequence of independent, identically distributed random variables with E(|X₁|) < ∞, then

    (X₁ + X₂ + ... + Xₙ)/n →ᵃ·ˢ· E(X₁), as n → ∞.
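    A minimal Python sketch of the law of large numbers for i.i.d. uniform [0, 1] variables (mean 1/2): the sample means settle near 1/2 as n grows:

        # Sample means of i.i.d. uniform [0, 1] random variables.
        import random

        random.seed(0)
        for n in (10, 1_000, 100_000):
            mean = sum(random.random() for _ in range(n)) / n
            print(n, mean)   # approaches 0.5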

    B. Central limit theorem.

    This theorem concerns the limiting distribution of the partial sums Sₙ = X₁ + X₂ + ... + Xₙ, properly centered and normalized. Specifically, if (Xₙ, n ≥ 1) is a sequence of independent, identically distributed random variables with finite common second moment, then

    (Sₙ − nE(X₁))/(σ√n) →ᴰ N(0, 1), as n → ∞,

    where σ is the standard deviation of X₁ and N(0, 1) denotes the standard normal random variable, with probability density function given by

    f(x) = (1/√(2π)) e^{−x²/2}, x ∈ ℝ.

    Remarks.

    (a) Saying that the sequence Zₙ = (Sₙ − nE(X₁))/(σ√n) →ᴰ N(0, 1) is the same as saying that

    lim_{n→∞} P(Zₙ ≤ t) = ∫_{−∞}^t (1/√(2π)) e^{−x²/2} dx, ∀t ∈ ℝ.
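    A minimal Python sketch of this convergence for i.i.d. uniform [0, 1] summands (mean 1/2, standard deviation 1/√12), comparing the empirical P(Zₙ ≤ 1) with Φ(1) ≈ 0.8413:

        # Normalized partial sums of uniforms versus the standard normal CDF.
        import math
        import random

        random.seed(0)
        n, reps = 500, 5_000
        mu, sigma = 0.5, 1 / math.sqrt(12)

        def Z():
            s = sum(random.random() for _ in range(n))
            return (s - n * mu) / (sigma * math.sqrt(n))

        Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
        print(sum(Z() <= 1 for _ in range(reps)) / reps, Phi(1.0))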

    (b) The proof of the central limit theorem involves a transformation of distribution functions known as the "Fourier transform". Specifically, let f be the probability density function of the random variable X. Then the characteristic function of X is defined to be

    f̂(t) = E(e^{itX}) = ∫_{−∞}^∞ e^{itx} f(x) dx, ∀t ∈ ℝ,

    where i is the usual complex number √−1. The transformation f̂ is "characteristic" in the sense that it determines completely the distribution of X. This transformation is useful in finding distribution functions. Other transformations are:

    (i) Generating functions. If X is a non-negative, integer-valued random variable, then the generating function of X is defined by

    φ(t) = E(t^X) = Σ_{n=0}^∞ P(X = n) tⁿ

    for |t| < 1.

    (ii) Laplace transform. For X ≥ 0, the Laplace transform of the density f of X is

    ψ(t) = E(e^{−tX}) = ∫_0^∞ e^{−tx} f(x) dx

    for any complex t with Re(t) ≥ 0.
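    For example, for a Poisson variable with parameter λ the generating function is φ(t) = e^{λ(t−1)}; a minimal Python sketch comparing a truncated series with this closed form:

        # phi(t) = sum over n of P(X = n) t**n for X ~ Poisson(lam).
        import math

        lam, t = 2.0, 0.7
        series = sum(math.exp(-lam) * lam**n / math.factorial(n) * t**n
                     for n in range(60))
        print(series, math.exp(lam * (t - 1)))   # both about 0.5488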

    1.5 Exercises

    1.1. Specify (Ω, A, P) for the following random experiments.

    (i) Tossing a balanced coin five times.

    (ii) An urn contains 10 white and 4 black balls. Five balls will be drawn (without replacement) from the urn. An outcome is defined as the number of black balls obtained in the drawn sample.

    (iii) Consider an unbalanced coin with probability of getting a head in each toss equal to p. Toss that coin (independently) until the first head appears. An outcome is defined as the number of tosses needed.

    1.2. Let (Ω, A, P) be a probability space. Show that

    (i) A is a field.

    (ii) If A, B ∈ A, then A − B = {ω : ω ∈ A, ω ∉ B} ∈ A. (Hint: first prove De Morgan's laws: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ, (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.)

    (iii) If Aₙ ∈ A, n ≥ 1, then ⋂_{n=1}^∞ Aₙ ∈ A.

    (iv) If A, B ∈ A with A ⊆ B, then P(A) ≤ P(B).

    (v) If Aₙ ∈ A, n ≥ 1, then

    P(⋃_{n=1}^∞ Aₙ) ≤ Σ_{n=1}^∞ P(Aₙ).

    (vi) If A, B ∈ A, then

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

    1.3. Let Aₙ ⊆ Ω, n ≥ 1.

    (i) Show that

    lim inf_{n→∞} Aₙ ⊆ lim sup_{n→∞} Aₙ.

    (ii) Verify that

    lim inf_{n→∞} Aₙ = {ω : Σ_{n=1}^∞ 1_{Aₙᶜ}(ω) < ∞},

    whereas

    lim sup_{n→∞} Aₙ = {ω : Σ_{n=1}^∞ 1_{Aₙ}(ω) = ∞}.

    Give an interpretation of these events.

    1.4. Let Ω be an infinitely countable space and let f : Ω → [0, 1] be such that Σ_{ω∈Ω} f(ω) = 1. Define P : P(Ω) → [0, 1] by

    P(A) = Σ_{ω∈A} f(ω), A ∈ P(Ω).

    Verify that P is a probability measure on the measurable space (Ω, P(Ω)).

    1.5. Let Ω be a set and C ⊆ P(Ω).

    (i) Show that the collection of σ-fields containing C is not empty.

    (ii) If A₁ and A₂ are two σ-fields containing C, then A₁ ∩ A₂ is also a σ-field containing C, where A₁ ∩ A₂ = {A : A ∈ A₁, A ∈ A₂}.

    (iii) Show that the intersection of all σ-fields containing C is the smallest σ-field containing C (the σ-field A₁ is smaller than the σ-field A₂ if A₁ ⊆ A₂).

    1.6. Let (Ω, A, P) be a probability space. An infinite countable (measurable) partition of Ω is a sequence Aₙ ∈ A, n ≥ 1, such that Aₙ ∩ Aₘ = ∅ for n ≠ m, and ⋃_{n=1}^∞ Aₙ = Ω. Let {Aₙ, n ≥ 1} be an infinite countable partition of Ω with P(Aₙ) > 0 for all n, and let B ∈ A. Show that

$$P(B) = \sum_{n=1}^{\infty} P(A_n) P(B \mid A_n)$$

and for $P(B) > 0$,
$$P(A_m \mid B) = \frac{P(A_m) P(B \mid A_m)}{\sum_{n=1}^{\infty} P(A_n) P(B \mid A_n)}, \quad \forall m \ge 1.$$

1.7. Consider the following events in the experiment of tossing a fair coin twice: $A$ = "a head occurs on the first toss", $B$ = "a head occurs on


the second toss", and $C$ = "exactly one head occurs". Are $A, B, C$ pairwise independent? Are $A, B, C$ mutually independent?

1.8. Let $(\Omega, \mathcal{A}, P)$ be a probability space. Let $A, B, C \in \mathcal{A}$ such that $P(A \cap B) > 0$. Show that if $P(C \mid A \cap B) = P(C \mid A)$, then $B$ and $C$ are independent given $A$.

1.9. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $X : \Omega \to \mathbb{R}$.
(i) Show that for $A, A_n \subseteq \mathbb{R}$, $n \ge 1$,
$$X^{-1}(A^c) = \big(X^{-1}(A)\big)^c, \quad X^{-1}\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \bigcup_{n=1}^{\infty} X^{-1}(A_n),$$
and
$$X^{-1}\Big(\bigcap_{n=1}^{\infty} A_n\Big) = \bigcap_{n=1}^{\infty} X^{-1}(A_n).$$
(ii) Let $X^{-1}(\mathcal{B}(\mathbb{R})) = \{X^{-1}(A) : A \in \mathcal{B}(\mathbb{R})\}$. Verify that $X^{-1}(\mathcal{B}(\mathbb{R}))$ is a $\sigma$-field on $\Omega$. Let $X(\mathcal{A}) = \{X(A) : A \in \mathcal{A}\}$, where $X(A) = \{X(\omega) : \omega \in A\}$. Is $X(\mathcal{A})$ a $\sigma$-field on $\mathbb{R}$?
(iii) Let $P_X(\cdot) = P(X^{-1}(\cdot))$ on $\mathcal{B}(\mathbb{R})$. Verify that $P_X(\cdot)$ is a probability measure.

1.10. Let $X$ be a random variable taking values in $\bar{\mathbb{R}} = [-\infty, \infty]$. Recall that such an extended random variable is defined by the condition: $\{\omega : X(\omega) \le t\} \in \mathcal{A}$ for any $t \in \mathbb{R}$.

(i) Verify that
$$\{\omega : X(\omega) < \infty\} = \bigcup_{n=1}^{\infty} \{\omega : X(\omega) \le n\}.$$

(ii) Use (i) to show that $\{\omega : X(\omega) = \infty\}, \{\omega : X(\omega) = -\infty\} \in \mathcal{A}$.

(iii) Verify that if $(X_n, n \ge 1)$ is a sequence of extended random variables, then
$$\Big\{\omega : \sup_n X_n(\omega) \le t\Big\} = \bigcap_n \{\omega : X_n(\omega) \le t\}.$$


1.11. Let $F$ be the distribution function of a random variable $X$, that is, $F : \mathbb{R} \to [0,1]$, $F(x) = P(X \le x)$. Show that
(i) $F$ is monotone non-decreasing, i.e., $x < y$ implies $F(x) \le F(y)$.
(ii) $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$.
(iii) $F$ is right-continuous, i.e., $\lim_{y \downarrow x} F(y) = F(x)$ for any $x \in \mathbb{R}$.

1.12. A random variable $X$ taking values in an interval $[a, b] \subseteq \mathbb{R}$ is said to be uniformly distributed on $[a, b]$ if it is a continuous random variable with probability density function given by
$$f(x) = \frac{1}{b - a}\, 1_{[a,b]}(x), \quad x \in \mathbb{R}.$$
(i) Find the distribution function of $X$.
(ii) Compute $P(X > \alpha)$ for $a < \alpha < b$.

1.13. Let $f(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}$, $x \in \mathbb{R}$ (for some $\lambda > 0$).
(i) Verify that $f$ is a probability density function.
(ii) Find the associated distribution function.

    1.14". Let X : (0, A) ~ (JR, B(JR. Show that X is a random variable if and only if one of the following conditions holds. For all x E JR,

    (i){w: X(w)~X}EA. (ii) {w : X(w) > x} EA. (iii) {w : X(w) ~ x} E A. (iv) {w : X(w) < x} EA.

1.15. Compute the means and variances of the following random variables.
(i) Binomial: $f(k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, 2, \ldots, n$, with given $n$ and $p \in [0,1]$.
(ii) Geometric: $f(k) = p(1-p)^{k-1}$, $k = 1, 2, \ldots$, with $p \in [0,1]$.
(iii) Poisson: $f(n) = e^{-\lambda} \lambda^n / n!$, $n = 0, 1, 2, \ldots$, with $\lambda > 0$.
(iv) Exponential: $f(x) = \lambda e^{-\lambda x}\, 1_{(0,\infty)}(x)$ with $\lambda > 0$.
(v) Normal: $f(x) = e^{-(x-\mu)^2/(2\sigma^2)} / (\sqrt{2\pi}\,\sigma)$, $x \in \mathbb{R}$, with $\mu \in \mathbb{R}$ and $\sigma > 0$.


(vi) Gamma$(n, \lambda)$: $f(x) = \lambda e^{-\lambda x} (\lambda x)^{n-1} / (n-1)!\; 1_{[0,\infty)}(x)$ with $\lambda > 0$ and $n > 0$.

1.16*. Let $X$ be a random variable taking values in $\{0, 1, 2, \ldots\}$. Show that
$$E(X) = \sum_{n=0}^{\infty} P(X > n).$$

1.17*. Show that
(i) If $X \ge 0$ then $E(X) = \int_0^{\infty} P(X > t)\, dt$.
(ii) For any real-valued random variable $X$,
$$E(X) = \int_0^{\infty} P(X > t)\, dt - \int_{-\infty}^{0} P(X \le t)\, dt.$$
(iii) $E(|X|^k) = k \int_0^{\infty} t^{k-1} P(|X| > t)\, dt$.

1.18. Let $X : (\Omega, \mathcal{A}, P) \to \bar{\mathbb{R}}^+ = [0, \infty]$ be a non-negative random variable. For each integer $n$, define
$$X_n(\omega) = \sum_{k=0}^{n2^n - 1} \frac{k}{2^n}\, 1_{[k/2^n \le X < (k+1)/2^n]}(\omega) + n\, 1_{[X \ge n]}(\omega).$$
[...] Show that $E(X) = \infty$.

1.21. Let $X$ be a random variable with values in $\{x_1, x_2, \ldots, x_n\}$. Let $D_k = \{\omega : X(\omega) = x_k\}$, $k = 1, 2, \ldots, n$.

(i) Verify that the $D_k$'s form a (measurable) partition of $\Omega$.
(ii) For $A \in \mathcal{A}$, show that $E(P(A \mid X)) = P(A)$.
(iii) Let $Y$ be a discrete random variable, independent of $X$. Show that
$$P(X + Y = n \mid X = m) = P(Y = n - m).$$


    1.22. Prove the properties of conditional expectation listed at the end of Section 1.3.

1.23. Show that
(i) The characteristic function of $N(0,1)$ is $e^{-t^2/2}$. What is the characteristic function of $N(\mu, \sigma^2)$?
(ii) The generating function of the Poisson random variable with $f(n) = e^{-\lambda} \lambda^n / n!$, $n = 0, 1, 2, \ldots$, is $e^{-(1-t)\lambda}$.

1.24. Let $X_1, X_2, \ldots, X_n$ be independent random variables. Show that the characteristic (respectively, generating) function of the sum $X_1 + X_2 + \cdots + X_n$ is the product of the characteristic (respectively, generating) functions of the $X_j$'s, $j = 1, 2, \ldots, n$.

1.25. Let $X$, $X_n$, $n \ge 1$, be random variables defined on $(\Omega, \mathcal{A}, P)$.

(i) Show that $A = \{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} \in \mathcal{A}$.
(ii) Let $A_n(\varepsilon) = \{\omega : |X_n(\omega) - X(\omega)| > \varepsilon\}$. Show that $X_n \xrightarrow{a.s.} X$ if and only if
$$P\Big(\limsup_{n \to \infty} A_n(\varepsilon)\Big) = 0, \quad \text{for any } \varepsilon > 0.$$

(iii) Suppose that the $X_n$'s are independent with
$$P(X_n = 1) = \frac{1}{n} = 1 - P(X_n = 0).$$
Show that $X_n \xrightarrow{P} 0$. Does $X_n$ converge a.s. to 0? (Hint: use the Borel-Cantelli lemma.)

Lesson 2

    Modeling Random Phenomena

    In this Lesson, we motivate the use of the concept of Stochastic Processes as a means to model random phenomena. It is emphasized that the analysis of random phenomena in terms of stochastic processes relies heavily on the mathematical theory of probability.

2.1 Random Phenomena

As opposed to deterministic phenomena, random phenomena are those whose outcomes cannot be predicted with certainty under identical conditions. We are all familiar with gambling schemes such as "tossing a fair coin", "rolling a pair of dice", etc. Random phenomena which evolve in time are the subject of this text. The following are examples of such phenomena.

Example 2.1 A xerox machine in an office is either "out of order" or "in operating condition". Let $X_n$ denote the state of the machine, say, at 8:00am of the $n$th day. This is an example of a random phenomenon which evolves in discrete time and which has a finite number of "states".

Example 2.2 Let $X_t$, $t \ge 0$, denote the state of a patient (with a specific disease) at time $t$. Suppose there are four possible states: 1 = the patient is identified as having the disease; 2 = recovery; 3 = death due to the disease; 4 = death due to some other cause. This is a random phenomenon which evolves in continuous time and which has a finite "state space".



Example 2.3 In Example 2.1, starting with $n = 1$, let $Y_n$ be the number of days (among the first $n$ days) on which the machine is "out of order". The sequence $\{Y_n, n \ge 1\}$ constitutes a random phenomenon evolving in discrete time and having an infinitely countable state space $\{0, 1, 2, \ldots\}$.

Example 2.4 Consider an event such as "the arrival of a customer for service at a bank". Obviously, such events occur at random times. If we let $T_n$, $n \ge 1$, denote the arrival time of the $n$th customer, then the sequence $\{T_n, n \ge 1\}$ constitutes a random phenomenon evolving in discrete time and having a continuous state space $[0, \infty)$.

Example 2.5 In Example 2.4, if we let $N_t$, $t \ge 0$, be the number of events that have occurred in the time interval $[0, t]$, then the family $\{N_t, t \ge 0\}$ constitutes a random phenomenon evolving in continuous time and having a discrete state space.

Example 2.6 An example of a random phenomenon evolving in continuous time and having a continuous state space, say $(-\infty, \infty)$, is the famous Brownian motion. It was observed that small particles immersed in a liquid exhibit irregular motions. Thus the displacement $X_t$ of a particle at time $t$, along some axis from its starting position, is a random quantity. The motion of the particle is a random phenomenon evolving in continuous time and having a continuous state space.

2.2 Stochastic Processes

If we examine the above examples, we see that there is uncertainty in the "outcomes" of the phenomena. If we make the basic assumption that the uncertainty involved is due to chance (or randomness), then it makes sense to talk about the chances of their occurrences even though the outcomes cannot be predicted with certainty. For example, in tossing a fair coin, although we cannot predict with certainty the occurrence of H(ead) or T(ail), we can still assign a 50-50 chance to each of these two possible outcomes.

Thus random phenomena can be viewed as families of random variables indexed by a time set. The mathematical concept of random variables as well as related concepts were reviewed in Lesson 1.

From the above point of view, we are going to describe a random phenomenon as a stochastic process, that is, a family of random variables $X_t$, $t \in T$, where $T$ is some index set, usually $T \subseteq \mathbb{R} = (-\infty, \infty)$, interpreted as a time set. The common range of the random variables $X_t$ (the


    set of their possible values) is called the state space of the process and is denoted by S.

Stochastic processes are thus the mathematical models for random phenomena. They are classified according to the nature of the time set $T$ and the state space $S$ (discrete or continuous). For example, if $T$ is continuous, say $[0, \infty)$, and $S$ is discrete, say $S = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, then the process is called a continuous-time process with discrete state space. The classification of stochastic processes is exemplified by the examples of the previous section as follows.

Example 2.1: A discrete-time stochastic process with a finite state space.
Example 2.2: A continuous-time stochastic process with a finite state space.
Example 2.3: A discrete-time stochastic process with a discrete state space.
Example 2.4: A discrete-time stochastic process with a continuous state space.
Example 2.5: A continuous-time stochastic process with a discrete state space.
Example 2.6: A continuous-time stochastic process with a continuous state space.

2.3 Distributions of Stochastic Processes

We are going to specify rigorously the structure of stochastic processes. The standard probability background for this section has been reviewed in Lesson 1.

As stated in Section 2.2, a stochastic process $X$ is a collection of random variables $X_t$, $t \in T$. Since, in general, the time set $T$ is infinite (countable or not), we need to elaborate a little on the concept of probability laws (distributions) governing an infinite collection of random variables. Note that a stochastic process $(X_t, t \in T)$ can be viewed as a random function, that is, a random variable taking values in a space of functions. (See details below.)

To be concrete, consider the case where $T = [0, \infty)$ and $S = \mathbb{R} = (-\infty, \infty)$. Each random variable $X_t$ is defined on some probability space $(\Omega, \mathcal{A}, P)$ and takes values in the set $\mathbb{R}$ of real numbers. To specify the process $X = (X_t, t \ge 0)$ is to specify the space $(\Omega, \mathcal{A}, P)$ and the maps $X_t$, $t \ge 0$. As we will see in the following Lessons, it is possible, in practice, to specify the finite dimensional distributions of $X$, that is, joint cumulative distribution functions (CDF) of the form

$$F_{(t_1, \ldots, t_n)}(x_1, \ldots, x_n) = P(X_{t_1} \le x_1, \ldots, X_{t_n} \le x_n)$$


for $n \ge 1$, $t_1, \ldots, t_n \in T$, $x_1, \ldots, x_n \in \mathbb{R}$, or equivalently, the probability measures of the random vectors $(X_{t_1}, \ldots, X_{t_n})$, namely
$$P_{\mathbf{t}}(B) = P\{\omega : (X_{t_1}(\omega), \ldots, X_{t_n}(\omega)) \in B\} \quad (2.1)$$

where $\mathbf{t} = (t_1, \ldots, t_n)$ and $B \in \mathcal{B}(\mathbb{R}^n)$ (see Lesson 1 for notation).

The construction of $(\Omega, \mathcal{A}, P)$ and $X_t$ should take the set $\mathcal{F}$ of all finite dimensional distributions of $X$ into account. First, for each $\omega \in \Omega$, the sample path at $\omega$ is the real-valued function defined on $T$: $t \to X_t(\omega)$. Thus we can take $\Omega = \mathbb{R}^T$, the set of all real-valued functions defined on $T$, so that $X_t(\omega) = \omega(t)$ with $\omega \in \mathbb{R}^T$, that is, for each $t \in T$,
$$X_t : \mathbb{R}^T \to \mathbb{R}.$$

For $X_t$ to be a random variable, the $\sigma$-field $\mathcal{A}$ on $\mathbb{R}^T$ should be such that $X_t^{-1}(B) \in \mathcal{A}$ for any $B \in \mathcal{B}(\mathbb{R})$.

More generally, in view of (2.1), $\mathcal{A}$ should also contain all (finite dimensional) cylinder sets of $\mathbb{R}^T$, that is, subsets $A$ of $\mathbb{R}^T$ of the form
$$A = \{\omega \in \mathbb{R}^T : (\omega(t_1), \ldots, \omega(t_n)) \in B\}$$
for some $B \in \mathcal{B}(\mathbb{R}^n)$.

Let $\mathcal{C}$ denote the set of all such cylinder sets of $\mathbb{R}^T$. Then take $\mathcal{A}$ to be the $\sigma$-field generated by $\mathcal{C}$, denoted by $\sigma(\mathcal{C})$, i.e., the smallest $\sigma$-field containing $\mathcal{C}$.

It remains to construct a probability measure $P$ on $(\mathbb{R}^T, \sigma(\mathcal{C}))$ satisfying (2.1) with the collection $\mathcal{F} = \{P_{\mathbf{t}}\}$ given in advance.

Observe that if $(\Omega, \mathcal{A}, P)$ is given, then the induced collection $\mathcal{F}$ will satisfy the following consistency condition:

(i) If $\alpha$ is a permutation of the elements of $\{1, 2, \ldots, n\}$ and
$$f_\alpha : \mathbb{R}^n \to \mathbb{R}^n : (x_1, \ldots, x_n) \to (x_{\alpha(1)}, \ldots, x_{\alpha(n)}),$$
then, obviously,
$$P_{\mathbf{t}}(B) = P_{\alpha(\mathbf{t})}\big(f_\alpha^{-1}(B)\big),$$
for $B \in \mathcal{B}(\mathbb{R}^n)$, $\mathbf{t} = (t_1, \ldots, t_n)$, and $\alpha(\mathbf{t}) = (t_{\alpha(1)}, \ldots, t_{\alpha(n)})$.
(ii) For $\mathbf{t} = (t_1, \ldots, t_n)$, $\mathbf{s} = (t_1, \ldots, t_n, s_{n+1})$, and $B \in \mathcal{B}(\mathbb{R}^n)$, we have
$$P_{\mathbf{t}}(B) = P_{\mathbf{s}}(B \times \mathbb{R}).$$

Thus, it is possible to construct $P$ compatible with (2.1) when the given collection $\mathcal{F}$ satisfies the above consistency condition. Below we will sketch


    the proof that P is unique. The probability P so obtained is referred to as the distribution of the process X. It represents the complete probabilistic information concerning the process X, in the same way that a probability measure characterizes probabilistically a random variable. We also refer to P as the probability law governing the random evolution of the process X, or of the random phenomenon under study.

Note that the construction $(\mathbb{R}^T, \sigma(\mathcal{C}), P)$ and $X_t : \mathbb{R}^T \to \mathbb{R} : \omega \to \omega(t)$ is referred to as the canonical representation of the process $X$. From the probabilistic viewpoint, two processes are equivalent if they admit the same collection of finite dimensional distributions $\mathcal{F}$.

The construction of $P$ from a consistent family $\mathcal{F}$ goes as follows.

First, it can be verified that the collection $\mathcal{C}$ of all (finite dimensional) cylinder sets of $\mathbb{R}^T$ is a field. Define $P$ on $\mathcal{C}$ by
$$P(A) = P_{\mathbf{t}}(B), \quad (2.2)$$
where $A = \{\omega \in \mathbb{R}^T : (\omega(t_1), \ldots, \omega(t_n)) \in B\}$, $\mathbf{t} = (t_1, \ldots, t_n)$, and $B \in \mathcal{B}(\mathbb{R}^n)$.

Although the representation of cylinder sets is not unique, $P$ is well-defined on $\mathcal{C}$ through (2.2), that is, the value $P(A)$ is the same for all possible representations of $A$. This is guaranteed precisely by the consistency condition of $\mathcal{F}$.

It can be shown that $P$ is $\sigma$-additive on the field $\mathcal{C}$. Then, by a standard extension theorem in measure theory (see Appendix), $P$ extends uniquely to a probability measure on $\sigma(\mathcal{C})$. This result is called the Kolmogorov existence theorem.

From the above canonical representation of a stochastic process $X$, we see that, in applications, it suffices to specify the set $\mathcal{F}$ of all possible finite dimensional distributions of $X$. The knowledge of $\mathcal{F}$ is sufficient for computations of all quantities of interest related to $X$. Since the computations of various events of interest in stochastic processes are based on the rigorous calculus of probabilities, some technical problems should at least be mentioned. We have in mind the justification of various subsets as events, that is, as sets in the domain of a probability measure, so that the computations of their probabilities make sense.

(a) The $\sigma$-field $\sigma(\mathcal{C})$ might be too small as compared to the space $\mathbb{R}^T$. We might need to enlarge $\sigma(\mathcal{C})$ to include more subsets of $\mathbb{R}^T$.

For any given probability space $(\Omega, \mathcal{A}, P)$, it is always possible to enlarge $\mathcal{A}$ without changing $P$ on $\mathcal{A}$ (see Exercise 2.2). A probability space $(\Omega, \mathcal{A}, P)$ is said to be complete if subsets of elements $A \in \mathcal{A}$ with $P(A) = 0$ are themselves elements of $\mathcal{A}$. In other words, all subsets of zero probability events


are events. Unless stated otherwise, $(\Omega, \mathcal{A}, P)$ is always assumed to be complete, without loss of generality.

(b) When dealing with continuous-time stochastic processes, we might be interested in computing the probabilities of "events" such as
$$\{\omega \in \mathbb{R}^{[0,\infty)} : \omega(\cdot) \text{ is continuous}\},$$
$$\{\omega : \omega(t) = 0 \text{ for some } t \ge 0\} = \bigcup_{t \ge 0} \{\omega : \omega(t) = 0\},$$
$$\Big\{\omega : \sup_{t \ge 0} X_t(\omega) \le a\Big\} = \bigcap_{t \ge 0} \{\omega : X_t(\omega) \le a\}.$$

Now, observe that the above subsets of $\mathbb{R}^{[0,\infty)}$ are uncountable unions and intersections of elements of $\sigma(\mathcal{C})$. They need not be in $\sigma(\mathcal{C})$. This also implies that functions like $\sup_{t \ge 0} X_t(\cdot)$ and $\inf_{t \ge 0} X_t(\cdot)$ might not be $\sigma(\mathcal{C})$-measurable (i.e., they might not be random variables).

Fortunately, since the real line $\mathbb{R}$ is rich enough in structure, the above technical problem can be handled by calling upon the concept of separable versions of stochastic processes.

Specifically, let $T = [0, \infty)$ or, more generally, an interval of $\mathbb{R}$. A stochastic process is said to be separable if there exist a countable dense set $D \subseteq T$ and $A \in \mathcal{A}$ with $P(A) = 0$ such that
$$\{\omega : X_t(\omega) \in F,\ t \in I \cap D\} \setminus \{\omega : X_t(\omega) \in F,\ t \in I \cap T\} \subseteq A \quad (2.3)$$
for any closed set $F$ and any open interval $I$ of $\mathbb{R}$.

Let $B = \{\omega : X_t(\omega) \in F,\ t \in I \cap D\}$ and $C = \{\omega : X_t(\omega) \in F,\ t \in I \cap T\}$; we have $C \subseteq B$ since $D \subseteq T$. Note that $B \setminus C = \{\omega : \omega \in B, \omega \notin C\}$.

For a separable process, (2.3) implies that $A^c \cap B = A^c \cap C$ (see Exercise 2.3), where $A^c$ is the complement of $A$, that is, $\Omega \setminus A$. Since $A \in \mathcal{A}$ and $I \cap D$ is countable, $A^c \cap B \in \mathcal{A}$, and hence $A^c \cap C \in \mathcal{A}$.

Assuming that $(\Omega, \mathcal{A}, P)$ is complete, we have $A \cap C \subseteq A$ with $A \in \mathcal{A}$ and $P(A) = 0$, and hence $A \cap C \in \mathcal{A}$ (of course $P(A \cap C) = 0$). Now $C = (A \cap C) \cup (A^c \cap C) \in \mathcal{A}$. Thus for separable stochastic processes, functions such as $\sup_{t \ge 0} X_t(\cdot)$ and $\inf_{t \ge 0} X_t(\cdot)$ are legitimate random variables.

Fortunately, every stochastic process $X = (X_t, t \in T)$ with state space $S \subseteq \mathbb{R}$ and $T$ an interval of $\mathbb{R}$ has a separable version, that is, a stochastic process $\tilde{X} = (\tilde{X}_t, t \in T)$ which is equivalent to $X$. Thus in the following, without loss of generality, we always assume that $(\Omega, \mathcal{A}, P)$ is a complete probability space and that real-valued, continuous-time processes are separable.


    2.4 Some Important Properties of Stochastic Processes

First consider a very special stochastic process $X = (X_n, n = 1, 2, \ldots)$ with state space $S = \{0, 1\}$. We assume that the variables $X_n$ are independent and have the same distribution, say,

$$P(X_n = 1) = p = 1 - P(X_n = 0).$$
Such a process is called a Bernoulli process.


Definition 2.1 A stochastic process $X = (X_t, t \in T)$ is said to have independent increments if for any $t_1 < t_2 < \cdots < t_n$ in $T$, the random variables $X_{t_2} - X_{t_1}, X_{t_3} - X_{t_2}, \ldots, X_{t_n} - X_{t_{n-1}}$ are independent. If for any $t, s \in T$ with $t < s$, the distribution of $X_s - X_t$ depends only on $s - t$, then the process $X$ is said to have stationary increments.

    Various stochastic processes studied in this text have stationary and independent increments, such as Poisson processes (Lesson 4).

From a modeling point of view, the assumption of independent increments seems appropriate when the random phenomenon exhibits the obvious fact that outcomes in disjoint time intervals are independent. The stationary increments property can be postulated when it seems plausible that the distribution of outcomes in any time interval depends only on the length of that interval.

    For processes having stationary and independent increments, their finite dimensional distributions are obtained simply in terms of distributions of increments. Thus it suffices to specify the latter in applications.
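For instance, for the partial-sum process $Y_n = X_1 + \cdots + X_n$ of a Bernoulli process, the increment over a time gap of length $m$ is Binomial$(m, p)$, so any joint probability factors over successive gaps. A minimal sketch (the function names are ours):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(Binomial(n, p) = k): the law of an increment over n time steps."""
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def fdd(times, values, p):
    """P(Y_{t1} = v1, ..., Y_{tk} = vk) for Y_n = X_1 + ... + X_n,
    X_i i.i.d. Bernoulli(p): a product of increment probabilities."""
    prob, prev_t, prev_v = 1.0, 0, 0
    for t, v in zip(times, values):
        prob *= binom_pmf(v - prev_v, t - prev_t, p)
        prev_t, prev_v = t, v
    return prob

print(fdd([2, 5, 8], [1, 3, 4], p=0.5))   # P(Y_2 = 1, Y_5 = 3, Y_8 = 4)
```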

Next, the process $(Y_n, n \ge 1)$ associated with a Bernoulli process $(X_n, n \ge 1)$ has the following form of "conditional dependence":

For any $n \ge 1$, the conditional distribution of $Y_n$ given $Y_1, Y_2, \ldots, Y_{n-1}$ is the same as the conditional distribution of $Y_n$ given $Y_{n-1}$. Indeed, since $Y_n = Y_{n-1} + X_n$,
$$P(Y_n = k \mid Y_1 = k_1, \ldots, Y_{n-1} = k_{n-1}) = P(Y_n = k \mid Y_{n-1} = k_{n-1}) = P(X_n = k - k_{n-1}).$$

Roughly speaking, the "future" $Y_n$ depends only on the "present" $Y_{n-1}$, and not on the entire "past" $Y_1, Y_2, \ldots, Y_{n-2}$. In other words, given the present $Y_{n-1}$, the future $Y_n$ is independent of the past $Y_1, Y_2, \ldots, Y_{n-2}$. This property is formulated in the general case under the name of Markov property.

Definition 2.2 The stochastic process $X = (X_t, t \in T)$ is called a Markov process if it has the following Markov property:

For any $t, s \in T$ with $t < s$, the conditional distribution of $X_s$ given $X_t$ is the same as the conditional distribution of $X_s$ given $\{X_u, u \le t\}$. Specifically, for any choices of $t_1 < t_2 < \cdots < t_n$ in $T$ and $B \in \mathcal{B}(\mathbb{R})$, we have
$$P(X_{t_n} \in B \mid X_{t_1} = x_1, \ldots, X_{t_{n-1}} = x_{n-1}) = P(X_{t_n} \in B \mid X_{t_{n-1}} = x_{n-1}).$$
The Markov property is suitable for modeling situations in which future

behaviors of random phenomena are not altered by additional information about their past, once their present states are known. Markov processes will occupy a large portion of this text. Poisson processes (Lesson 4) are examples of continuous-time Markov chains (when the state space $S$ is


discrete, that is, finite or infinitely countable, the Markov process is called a Markov chain); Brownian motion (Lesson 12) is a continuous-time Markov process.

The specification of the finite dimensional distributions of a Markov process is obtained by using
(i) transition probability functions, that is, expressions of the form $P(X_s \in B \mid X_t = x)$, $t < s$, and
(ii) the initial distribution of $X_{t_0}$, where $t_0$ is the smallest element of $T$.

Note that the independent increments property is a stronger requirement than the Markov property.
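As a sketch of how (i) and (ii) determine finite dimensional distributions in the discrete-time, discrete-state case (assuming, for simplicity, transition probabilities that do not depend on the time step; the two-state numbers below are hypothetical):

```python
# P(X_0 = i_0, X_1 = i_1, ..., X_n = i_n)
#     = pi0[i_0] * P[i_0][i_1] * ... * P[i_{n-1}][i_n].
pi0 = [1.0, 0.0]          # initial distribution: start in state 0
P = [[0.9, 0.1],          # hypothetical one-step transition matrix
     [0.5, 0.5]]

def path_probability(path):
    prob = pi0[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]   # one transition probability per time step
    return prob

print(path_probability([0, 0, 1, 1]))   # 1.0 * 0.9 * 0.1 * 0.5 = 0.045
```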

Next, observe that the identical distributions of the $X_n$'s in a Bernoulli process can be expressed as follows. For any $n, m$, the distributions of $X_n$ and $X_{n+m}$ are the same. More generally, in view of the independence property of Bernoulli processes, for any $n_1, n_2, \ldots, n_m$ and $k$, the joint distributions of the random vectors $(X_{n_1}, X_{n_2}, \ldots, X_{n_m})$ and $(X_{n_1+k}, X_{n_2+k}, \ldots, X_{n_m+k})$ are the same. This property is formulated for other processes as follows.

Definition 2.3 A stochastic process $X = (X_t, t \in T)$ is strictly stationary if for any choices of $t_1, t_2, \ldots, t_n$ in $T$ and $h > 0$ such that $t_i + h \in T$ for each $i = 1, 2, \ldots, n$, the joint distributions of $(X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ and $(X_{t_1+h}, X_{t_2+h}, \ldots, X_{t_n+h})$ are the same.

As a consequence of the definition, we see that, for a strictly stationary process, the distributions of the $X_t$'s are identical. Indeed, taking $h = s - t$, $n = 1$, and $t_1 = t$ in the definition (with arbitrary $t, s$ such that $t < s$), the distribution of $X_t$ is the same as that of $X_{t+(s-t)} = X_s$.

When dealing with second order processes, i.e., processes $(X_t, t \in T)$ such that $E(X_t^2) < \infty$ for each $t$, the following concept of stationarity is useful (Lesson 9 and Lesson 10).

Definition 2.4 A second order process $(X_t, t \in T)$ is called a weakly stationary process if the mean function $m(t) = E(X_t)$ is independent of $t$, and the covariance function $\mathrm{Cov}(X_s, X_t)$ depends only on the difference $|t - s|$.

Remark.

A weakly stationary process is also referred to as stationary in the wide sense, or second order stationary. A second order process which is strictly


stationary is also weakly stationary (exercise). However, a strictly stationary process with infinite second moments cannot be weakly stationary.

From a modeling point of view, stationary processes are appropriate for modeling random phenomena whose random behaviors seem unchanged through shifts in time, such as those arising in economics, communication theory, etc.

Finally, if we consider a process, say $(X_n, n \ge 1)$, such that the $X_n$'s are independent and have mean zero, then the associated process $Y_n = \sum_{k=1}^{n} X_k$, $n \ge 1$, has the following property:
$$E(Y_{n+1} \mid Y_1 = y_1, \ldots, Y_n = y_n) = y_n,$$
since $Y_{n+1} = Y_n + X_{n+1}$ with $E(X_{n+1}) = 0$. This property is formulated as follows.

Definition 2.5 A stochastic process $(X_t, t \in T)$ with $E(|X_t|) < \infty$ for each $t \in T$ is called a martingale if for any choices of $t_1 < t_2 < \cdots < t_{n+1}$ in $T$, we have
$$E(X_{t_{n+1}} \mid X_{t_1}, X_{t_2}, \ldots, X_{t_n}) = X_{t_n} \quad \text{(a.s.)}.$$
Discrete-time martingales will be treated in Lesson 11. The concept of martingales is appropriate for modeling random phenomena such as fair games. Note that the martingale and Markov properties are distinct concepts (Exercise).
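For the symmetric random walk ($P(X_i = 1) = P(X_i = -1) = 1/2$, so $E(X_i) = 0$), the martingale property of $Y_n = \sum_{i=1}^n X_i$ can be checked empirically. The sketch below (the parameter choices are ours) estimates $E(Y_{n+1} \mid Y_n = 2)$ by averaging over simulated paths:

```python
import random

# Estimate E(Y_{11} | Y_{10} = 2) for the symmetric +/-1 random walk.
random.seed(0)
n, target_y, reps = 10, 2, 200_000
total, count = 0.0, 0
for _ in range(reps):
    y = sum(random.choice((-1, 1)) for _ in range(n))   # Y_n
    if y == target_y:
        total += y + random.choice((-1, 1))             # Y_{n+1}
        count += 1
print(f"estimated E(Y_11 | Y_10 = {target_y}) = {total / count:.3f}")  # near 2
```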

2.5 Exercises

2.1. Give several examples of random phenomena together with their mathematical modeling by stochastic processes.

2.2*. Let $(\Omega, \mathcal{A}, P)$ be a probability space.
(i) Define the collection $\tilde{\mathcal{A}}$ of subsets of $\Omega$ as follows. For $A \subseteq \Omega$, $A \in \tilde{\mathcal{A}}$ if and only if there are $B_1, B_2 \in \mathcal{A}$ such that $B_1 \subseteq A \subseteq B_2$ and $P(B_1) = P(B_2)$. Verify that $\mathcal{A} \subseteq \tilde{\mathcal{A}}$ and $\tilde{\mathcal{A}}$ is a $\sigma$-field.
(ii) Define $\tilde{P} : \tilde{\mathcal{A}} \to [0,1]$ by $\tilde{P}(A) = P(B_1) = P(B_2)$. Show that $\tilde{P}$ is well-defined and is a probability measure on $\tilde{\mathcal{A}}$.
(iii) Let $A \in \tilde{\mathcal{A}}$ with $\tilde{P}(A) = 0$. Show that if $B \subseteq A$, then $B \in \tilde{\mathcal{A}}$.
($(\Omega, \tilde{\mathcal{A}}, \tilde{P})$ is called the completion of $(\Omega, \mathcal{A}, P)$.)

2.3*. Let $X_t$, $t \in [0, \infty)$, be a real-valued stochastic process.

(i) Verify that
$$\Big\{\omega : \sup_{t \ge 0} X_t(\omega) \le x\Big\} = \bigcap_{t \ge 0} \{\omega : X_t(\omega) \le x\}.$$


(ii) For $A, B, C \in \mathcal{A}$, show that if $C \subseteq B$ and $B \setminus C \subseteq A$, then $A^c \cap B = A^c \cap C$.
(iii) Show that if the process $(X_t, t \ge 0)$ is separable, then the map
$$\omega \to \inf_{t \ge 0} X_t(\omega)$$
is a random variable.
(iv) Explain why the assumption of completeness of $(\Omega, \mathcal{A}, P)$ is necessary in addressing the concept of separability of stochastic processes.

2.4. Let $(X_n, n \ge 1)$ be a Bernoulli process with state space $S = \{0, 1\}$ and probability of "success" $p = P(X_n = 1)$.
(i) Compute $P(X_2 = 0, X_5 = 1, X_8 = 1)$.
(ii) Give an explicit formula for computing the finite-dimensional distributions of the process.
(iii) Let $Y_n = \sum_{i=1}^{n} X_i$, $n \ge 1$. Verify that the process $(Y_n, n \ge 1)$ has stationary and independent increments.
(iv) Is $(Y_n, n \ge 1)$ a Markov process?
(v) Is $(Y_n, n \ge 1)$ a martingale?

2.5*. Consider the experiment consisting of tossing a fair coin indefinitely. The space of all possible outcomes is
$$\Omega = \{0, 1\}^{\mathbb{N}} = \{\omega = (\omega_1, \omega_2, \ldots) : \omega_i = 0, 1\}.$$
(i) What is the cardinality of $\Omega$?
(ii) Specify the probability space $(\Omega, \mathcal{A}, P)$ for modeling the above experiment.

2.6. Let $(X_t, t \ge 0)$ be a (strictly) stationary process with $E(X_t^2) < \infty$, $\forall t \ge 0$. Show that this process is weakly stationary.

2.7. Let $X_t$, $Y_t$, $t \in T$, be two stochastic processes defined on $(\Omega, \mathcal{A}, P)$. Show that if $X_t = Y_t$ almost surely, that is, $P(\omega : X_t(\omega) = Y_t(\omega)) = 1$ for any $t$, then $(X_t, t \in T)$ and $(Y_t, t \in T)$ have the same collection of finite dimensional distributions.

2.8*. Let $(\Omega, \mathcal{A}, P) = ([0,1], \mathcal{B}([0,1]), dx)$ and $S = [0,1]$. Let $X_t(\omega) = 0$ on $S$ for each $\omega \in [0,1]$; let $Y_t(\omega) = 0$ on $S \setminus \{t_0\}$ for each $\omega \in [0,1]$, and let $Y_{t_0}(\omega) = 1$ for $\omega = t_0$ and $Y_{t_0}(\omega) = 0$ elsewhere. Verify that for any $t \in [0,1]$, $P(X_t = Y_t) = 1$.

2.9. Let $(X_n, n \ge 1)$ be a discrete-time stochastic process with state space $S = \{0, 1, 2, \ldots\}$. Suppose that the variables $X_n$ are independent and have the same probability density function $f$. Consider the process $Y_n = \sum_{i=1}^{n} X_i$, $n \ge 1$.


(i) Verify that $(Y_n, n \ge 1)$ is a Markov chain.
(ii) Show that $P(Y_{n+1} = y \mid Y_n = x)$ is independent of $n$.
(iii) Compute $P(Y_1 = n_1, Y_2 = n_2, Y_3 = n_3)$ in terms of $f$.


(iv) Give an explicit formula for computing the finite-dimensional distributions of the process $(Y_n, n \ge 1)$.

2.10. Let $(X_n, n \ge 0)$ be a process having independent increments. Show that such a process is necessarily a Markov process.

Lesson 3

Discrete-Time Markov Chains

    This Lesson is devoted to detailed studies of an important class of stochastic processes whose time-dependent structures are simple but general enough to model a variety of practical random phenomena.

3.1 The Markov Model

Consider the following typical random phenomena.

Example 3.1 Consider a system in which two identical pieces of equipment are installed in parallel. These pieces of equipment act independently of each other, and each has a reliability of $a \in [0,1]$ in a day (meaning that the probability that a piece of equipment fails during this time period is $1 - a$). Initially, they are in working condition. We are interested in the number of pieces of equipment which fail after $n$ days; the time between a good working condition (both pieces of equipment are working) and the breakdown of the system (both pieces of equipment fail); and so on.

If we let $X_n$, $n \ge 0$, be the number of pieces of equipment which are not in working condition at the beginning of the $n$th day, then obviously the $X_n$'s are random numbers with possible values 0, 1, 2. This random phenomenon can be modeled by a discrete-time stochastic process whose state space $S$ is finite.
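Under the additional assumption (ours, for concreteness) that failed equipment is not repaired and each working piece survives a day independently with probability $a$, the one-step transition probabilities between the states 0, 1, 2 can be written down directly:

```python
# Sketch of a one-step transition matrix for Example 3.1, assuming no repair.
def transition_matrix(a):
    """Rows/columns indexed by the number of failed pieces: 0, 1, 2."""
    q = 1 - a  # daily failure probability of one piece
    return [
        [a * a, 2 * a * q, q * q],  # both working: 0, 1 or 2 new failures
        [0.0,   a,         q    ],  # one failed: the survivor works or fails
        [0.0,   0.0,       1.0  ],  # both failed: the system stays down
    ]

for row in transition_matrix(a=0.9):
    print(row)
```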

Example 3.2 Suppose a commodity is stocked to satisfy a continuing demand. The stock is checked at times $t_n$, $n \ge 1$. At each checking time, if



the stock is below some prescribed level $a$, then the stock level is brought up to some prescribed level $b$ ($a < b$); otherwise, no replenishment is undertaken. Since the demand for the commodity during each time interval $[t_{n-1}, t_n)$ cannot be predicted with certainty, the stock level just before $t_n$ is a random number.

If we let $X_n$, $n \ge 0$, be the stock level just before time $t_n$, then $\{X_n, n \ge 0\}$ is a discrete-time stochastic process with finite state space $S = \{0, 1, \ldots, b\}$.

Example 3.3 This example is typical of situations in which a facility for common use is provided and waiting lines (queues) arise. Suppose that customers arrive for service at a taxi stand, and that a cab arrives every five minutes. Assume that a single customer is served during each time period. Since the number of customers who arrive at the stand at time $n$ is random, so is the number $X_n$ of customers waiting in line at the start of time period $n$. The discrete-time stochastic process $\{X_n, n \ge 0\}$ has an infinitely countable state space.
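A rough simulation clarifies the dynamics of such a waiting line: with at most one customer served per period, $X_{n+1} = \max(X_n - 1, 0) + A_{n+1}$, where $A_{n+1}$ counts new arrivals. The arrival distribution below is an illustrative assumption, not taken from the text:

```python
import random

# Sketch of the queue in Example 3.3 with an assumed arrival distribution
# P(A = 0, 1, 2) = (0.5, 0.3, 0.2); one customer is served per period.
random.seed(0)
x = 0  # X_0: nobody waiting initially
for n in range(1, 11):
    arrivals = random.choices([0, 1, 2], weights=[0.5, 0.3, 0.2])[0]
    x = max(x - 1, 0) + arrivals
    print(f"X_{n} = {x}")
```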

Example 3.4 Suppose that the lifetime of some piece of equipment is measured in units of time, say, minutes. When a piece of equipment fails, it is immediately replaced by an identical one, and so on.

If we let $X_n$ be the remaining lifetime of the piece of equipment in use at time $n$, then the discrete-time stochastic process $\{X_n, n \ge 0\}$ has $\{0, 1, 2, \ldots\}$ as state space.

Now, if we examine the above examples, we recognize that all the above stochastic processes $(X_n, n \ge 0)$ possess a common time-dependent structure. Indeed, in Example 3.1, if we observe $X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n$, then the prediction of the "future" $X_{n+1}$ depends only on the "present" state $X_n = i_n$ of the process. The knowledge of the "past", namely $X_0, X_1, \ldots, X_{n-1}$, will not contribute to any improvement of the prediction of $X_{n+1}$. In other words, the present $X_n$ contains all information concerning the prediction of $X_{n+1}$. This property is expressed mathematically as, for any $n \ge 0$,
$$P(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots$$