Steven Shreve: Stochastic Calculus and Finance
PRASAD CHALASANI, Carnegie Mellon University
SOMESH JHA, Carnegie Mellon University
THIS IS A DRAFT: PLEASE DO NOT DISTRIBUTE
© Copyright Steven E. Shreve, 1996
October 6, 1997
Contents
1 Introduction to Probability Theory 11
1.1 The Binomial Asset Pricing Model . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Finite Probability Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3 Lebesgue Measure and the Lebesgue Integral . . . . . . . . . . . . . . . . . . . . 22
1.4 General Probability Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.5.1 Independence of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.5.2 Independence of σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.5.3 Independence of random variables . . . . . . . . . . . . . . . . . . . . . . 42
1.5.4 Correlation and independence . . . . . . . . . . . . . . . . . . . . . . . . 44
1.5.5 Independence and conditional expectation. . . . . . . . . . . . . . . . . . 45
1.5.6 Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.5.7 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2 Conditional Expectation 49
2.1 A Binomial Model for Stock Price Dynamics . . . . . . . . . . . . . . . . . . . . 49
2.2 Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.3 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.1 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.2 Definition of Conditional Expectation . . . . . . . . . . . . . . . . . . . . 53
2.3.3 Further discussion of Partial Averaging . . . . . . . . . . . . . . . . . . . 54
2.3.4 Properties of Conditional Expectation . . . . . . . . . . . . . . . . . . . . 55
2.3.5 Examples from the Binomial Model . . . . . . . . . . . . . . . . . . . . . 57
2.4 Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3 Arbitrage Pricing 59
3.1 Binomial Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 General one-step APT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3 Risk-Neutral Probability Measure . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3.1 Portfolio Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3.2 Self-financing Value of a Portfolio Process . . . . . . . . . . . . . . . . . 62
3.4 Simple European Derivative Securities . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 The Binomial Model is Complete . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4 The Markov Property 67
4.1 Binomial Model Pricing and Hedging . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 Computational Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.3 Markov Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3.1 Different ways to write the Markov property . . . . . . . . . . . . . . . . 70
4.4 Showing that a process is Markov . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.5 Application to Exotic Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5 Stopping Times and American Options 77
5.1 American Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2 Value of Portfolio Hedging an American Option . . . . . . . . . . . . . . . . . . . 79
5.3 Information up to a Stopping Time . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6 Properties of American Derivative Securities 85
6.1 The properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.2 Proofs of the Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.3 Compound European Derivative Securities . . . . . . . . . . . . . . . . . . . . . . 88
6.4 Optimal Exercise of American Derivative Security . . . . . . . . . . . . . . . . . . 89
7 Jensen's Inequality 91
7.1 Jensen's Inequality for Conditional Expectations . . . . . . . . . . . . . . . . . . . 91
7.2 Optimal Exercise of an American Call . . . . . . . . . . . . . . . . . . . . . . . . 92
7.3 Stopped Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8 Random Walks 97
8.1 First Passage Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.2 τ is almost surely finite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.3 The moment generating function for τ . . . . . . . . . . . . . . . . . . . . . . . . 99
8.4 Expectation of τ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.5 The Strong Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.6 General First Passage Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.7 Example: Perpetual American Put . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.8 Difference Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.9 Distribution of First Passage Times . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.10 The Reflection Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9 Pricing in terms of Market Probabilities: The Radon-Nikodym Theorem. 111
9.1 Radon-Nikodym Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
9.2 Radon-Nikodym Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.3 The State Price Density Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
9.4 Stochastic Volatility Binomial Model . . . . . . . . . . . . . . . . . . . . . . . . . 116
9.5 Another Application of the Radon-Nikodym Theorem . . . . . . . . . . . . . . . . 118
10 Capital Asset Pricing 119
10.1 An Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
11 General Random Variables 123
11.1 Law of a Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
11.2 Density of a Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
11.3 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
11.4 Two random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
11.5 Marginal Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
11.6 Conditional Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
11.7 Conditional Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
11.8 Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
11.9 Bivariate normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
11.10 MGF of jointly normal random variables . . . . . . . . . . . . . . . . . . . . . . 130
12 Semi-Continuous Models 131
12.1 Discrete-time Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
12.2 The Stock Price Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
12.3 Remainder of the Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
12.4 Risk-Neutral Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
12.5 Risk-Neutral Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
12.6 Arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
12.7 Stalking the Risk-Neutral Measure . . . . . . . . . . . . . . . . . . . . . . . . . . 135
12.8 Pricing a European Call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
13 Brownian Motion 139
13.1 Symmetric Random Walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
13.2 The Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
13.3 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
13.4 Brownian Motion as a Limit of Random Walks . . . . . . . . . . . . . . . . . . . 141
13.5 Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
13.6 Covariance of Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
13.7 Finite-Dimensional Distributions of Brownian Motion . . . . . . . . . . . . . . . . 144
13.8 Filtration generated by a Brownian Motion . . . . . . . . . . . . . . . . . . . . . . 144
13.9 Martingale Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
13.10 The Limit of a Binomial Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
13.11 Starting at Points Other Than 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
13.12 Markov Property for Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . 147
13.13 Transition Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
13.14 First Passage Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
14 The Ito Integral 153
14.1 Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
14.2 First Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
14.3 Quadratic Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
14.4 Quadratic Variation as Absolute Volatility . . . . . . . . . . . . . . . . . . . . . . 157
14.5 Construction of the Ito Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
14.6 Ito integral of an elementary integrand . . . . . . . . . . . . . . . . . . . . . . . . 158
14.7 Properties of the Ito integral of an elementary process . . . . . . . . . . . . . . . . 159
14.8 Ito integral of a general integrand . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
14.9 Properties of the (general) Ito integral . . . . . . . . . . . . . . . . . . . . . . . . 163
14.10 Quadratic variation of an Ito integral . . . . . . . . . . . . . . . . . . . . . . . . 165
15 Ito's Formula 167
15.1 Ito's formula for one Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . 167
15.2 Derivation of Ito's formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
15.3 Geometric Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
15.4 Quadratic variation of geometric Brownian motion . . . . . . . . . . . . . . . . . 170
15.5 Volatility of Geometric Brownian motion . . . . . . . . . . . . . . . . . . . . . . 170
15.6 First derivation of the Black-Scholes formula . . . . . . . . . . . . . . . . . . . . 170
15.7 Mean and variance of the Cox-Ingersoll-Ross process . . . . . . . . . . . . . . . . 172
15.8 Multidimensional Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . 173
15.9 Cross-variations of Brownian motions . . . . . . . . . . . . . . . . . . . . . . . . 174
15.10 Multi-dimensional Ito formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
16 Markov processes and the Kolmogorov equations 177
16.1 Stochastic Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
16.2 Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
16.3 Transition density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
16.4 The Kolmogorov Backward Equation . . . . . . . . . . . . . . . . . . . . . . . . 180
16.5 Connection between stochastic calculus and KBE . . . . . . . . . . . . . . . . . . 181
16.6 Black-Scholes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
16.7 Black-Scholes with price-dependent volatility . . . . . . . . . . . . . . . . . . . . 186
17 Girsanov's theorem and the risk-neutral measure 189
17.1 Conditional expectations under P̃ . . . . . . . . . . . . . . . . . . . . . . . . . . 191
17.2 Risk-neutral measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
18 Martingale Representation Theorem 197
18.1 Martingale Representation Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 197
18.2 A hedging application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
18.3 d-dimensional Girsanov Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 199
18.4 d-dimensional Martingale Representation Theorem . . . . . . . . . . . . . . . . . 200
18.5 Multi-dimensional market model . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
19 A two-dimensional market model 203
19.1 Hedging when −1 < ρ < 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
19.2 Hedging when ρ = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
20 Pricing Exotic Options 209
20.1 Reflection principle for Brownian motion . . . . . . . . . . . . . . . . . . . . . . 209
20.2 Up and out European call. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
20.3 A practical issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
21 Asian Options 219
21.1 Feynman-Kac Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
21.2 Constructing the hedge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
21.3 Partial average payoff Asian option . . . . . . . . . . . . . . . . . . . . . . . . . . 221
22 Summary of Arbitrage Pricing Theory 223
22.1 Binomial model, Hedging Portfolio . . . . . . . . . . . . . . . . . . . . . . . . . 223
22.2 Setting up the continuous model . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
22.3 Risk-neutral pricing and hedging . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
22.4 Implementation of risk-neutral pricing and hedging . . . . . . . . . . . . . . . . . 229
23 Recognizing a Brownian Motion 233
23.1 Identifying volatility and correlation . . . . . . . . . . . . . . . . . . . . . . . . . 235
23.2 Reversing the process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
24 An outside barrier option 239
24.1 Computing the option value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
24.2 The PDE for the outside barrier option . . . . . . . . . . . . . . . . . . . . . . . . 243
24.3 The hedge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
25 American Options 247
25.1 Preview of perpetual American put . . . . . . . . . . . . . . . . . . . . . . . . . . 247
25.2 First passage times for Brownian motion: first method . . . . . . . . . . . . . . . . 247
25.3 Drift adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
25.4 Drift-adjusted Laplace transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
25.5 First passage times: Second method . . . . . . . . . . . . . . . . . . . . . . . . . 251
25.6 Perpetual American put . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
25.7 Value of the perpetual American put . . . . . . . . . . . . . . . . . . . . . . . . . 256
25.8 Hedging the put . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
25.9 Perpetual American contingent claim . . . . . . . . . . . . . . . . . . . . . . . . . 259
25.10 Perpetual American call . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
25.11 Put with expiration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
25.12 American contingent claim with expiration . . . . . . . . . . . . . . . . . . . . . 261
26 Options on dividend-paying stocks 263
26.1 American option with convex payoff function . . . . . . . . . . . . . . . . . . . . 263
26.2 Dividend paying stock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
26.3 Hedging at time t₁ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
27 Bonds, forward contracts and futures 267
27.1 Forward contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
27.2 Hedging a forward contract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
27.3 Futures contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
27.4 Cash flow from a futures contract . . . . . . . . . . . . . . . . . . . . . . . . . . 272
27.5 Forward-future spread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
27.6 Backwardation and contango . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
28 Term-structure models 275
28.1 Computing arbitrage-free bond prices: first method . . . . . . . . . . . . . . . . . 276
28.2 Some interest-rate dependent assets . . . . . . . . . . . . . . . . . . . . . . . . . 276
28.3 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
28.4 Forward rate agreement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
28.5 Recovering the interest r(t) from the forward rate . . . . . . . . . . . . . . . . . . 278
28.6 Computing arbitrage-free bond prices: Heath-Jarrow-Morton method . . . . . . . . 279
28.7 Checking for absence of arbitrage . . . . . . . . . . . . . . . . . . . . . . . . . . 280
28.8 Implementation of the Heath-Jarrow-Morton model . . . . . . . . . . . . . . . . . 281
29 Gaussian processes 285
29.1 An example: Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
30 Hull and White model 293
30.1 Fiddling with the formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
30.2 Dynamics of the bond price . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
30.3 Calibration of the Hull & White model . . . . . . . . . . . . . . . . . . . . . . . . 297
30.4 Option on a bond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
31 Cox-Ingersoll-Ross model 303
31.1 Equilibrium distribution of r(t) . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
31.2 Kolmogorov forward equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
31.3 Cox-Ingersoll-Ross equilibrium density . . . . . . . . . . . . . . . . . . . . . . . 309
31.4 Bond prices in the CIR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
31.5 Option on a bond . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
31.6 Deterministic time change of CIR model . . . . . . . . . . . . . . . . . . . . . . . 313
31.7 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
31.8 Tracking down φ₀(0) in the time change of the CIR model . . . . . . . . . . . . . 316
32 A two-factor model (Duffie & Kan) 319
32.1 Non-negativity of Y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
32.2 Zero-coupon bond prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
32.3 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
33 Change of numeraire 325
33.1 Bond price as numeraire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
33.2 Stock price as numeraire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
33.3 Merton option pricing formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
34 Brace-Gatarek-Musiela model 335
34.1 Review of HJM under risk-neutral P̃ . . . . . . . . . . . . . . . . . . . . . . . . 335
34.2 Brace-Gatarek-Musiela model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
34.3 LIBOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
34.4 Forward LIBOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
34.5 The dynamics of L(t, τ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
34.6 Implementation of BGM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
34.7 Bond prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
34.8 Forward LIBOR under more forward measure . . . . . . . . . . . . . . . . . . . . 343
34.9 Pricing an interest rate caplet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
34.10 Pricing an interest rate cap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
34.11 Calibration of BGM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
34.12 Long rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
34.13 Pricing a swap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
35 Notes and References 349
35.1 Probability theory and martingales. . . . . . . . . . . . . . . . . . . . . . . . . . . 349
35.2 Binomial asset pricing model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
35.3 Brownian motion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
35.4 Stochastic integrals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
35.5 Stochastic calculus and financial markets. . . . . . . . . . . . . . . . . . . . . . . 350
35.6 Markov processes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
35.7 Girsanov's theorem, the martingale representation theorem, and risk-neutral measures. 351
35.8 Exotic options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
35.9 American options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
35.10 Forward and futures contracts. . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
35.11 Term structure models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
35.12 Change of numeraire. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
35.13 Foreign exchange models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
35.14 REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
Chapter 1
Introduction to Probability Theory
1.1 The Binomial Asset Pricing Model
The binomial asset pricing model provides a powerful tool to understand arbitrage pricing theory and probability theory. In this course, we shall use it for both these purposes.

In the binomial asset pricing model, we model stock prices in discrete time, assuming that at each step, the stock price will change to one of two possible values. Let us begin with an initial positive stock price S₀. There are two positive numbers, d and u, with

0 < d < u,   (1.1)

such that at the next period, the stock price will be either dS₀ or uS₀. Typically, we take d and u to satisfy 0 < d < 1 < u, so change of the stock price from S₀ to dS₀ represents a downward movement, and change of the stock price from S₀ to uS₀ represents an upward movement. It is common to also have d = 1/u.
S₀ = 4
S₁(H) = 8    S₁(T) = 2
S₂(HH) = 16    S₂(HT) = 4    S₂(TH) = 4    S₂(TT) = 1

Figure 1.1: Binomial tree of stock prices with S₀ = 4, u = 1/d = 2.

The stock price moves according to the toss of a coin: at time 1 the price is S₁(H) = uS₀ if the first toss results in head (H), and S₁(T) = dS₀ if it results in tail (T). After the second toss, the price will be one of:

S₂(HH) = uS₁(H) = u²S₀,    S₂(HT) = dS₁(H) = duS₀,
S₂(TH) = uS₁(T) = udS₀,    S₂(TT) = dS₁(T) = d²S₀.

After three tosses, there are eight possible coin sequences, although not all of them result in different stock prices at time 3.

For the moment, let us assume that the third toss is the last one and denote by

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

the set of all possible outcomes of the three tosses. The set Ω of all possible outcomes of a random experiment is called the sample space for the experiment, and the elements ω of Ω are called sample points. In this case, each sample point ω is a sequence of length three. We denote the k-th component of ω by ωₖ. For example, when ω = HTH, we have ω₁ = H, ω₂ = T and ω₃ = H.

The stock price Sₖ at time k depends on the coin tosses. To emphasize this, we often write Sₖ(ω). Actually, this notation does not quite tell the whole story, for while S₃ depends on all of ω, S₂ depends on only the first two components of ω, S₁ depends on only the first component of ω, and S₀ does not depend on ω at all. Sometimes we will use notation such as S₂(ω₁, ω₂) just to record more explicitly how S₂ depends on ω = (ω₁, ω₂, ω₃).

Example 1.1 Set S₀ = 4, u = 2 and d = 1/2. We have then the binomial tree of possible stock prices shown in Fig. 1.1. Each sample point ω = (ω₁, ω₂, ω₃) represents a path through the tree. Thus, we can think of the sample space Ω as either the set of all possible outcomes from three coin tosses or as the set of all possible paths through the tree.
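The notes themselves contain no code, but the three-toss model is small enough to enumerate directly. The following Python sketch is purely illustrative (all names in it are my own): it builds the sample space Ω of Example 1.1 and recomputes the stock prices shown in Figure 1.1.

```python
from itertools import product

# Parameters of Example 1.1: S0 = 4, u = 2, d = 1/2.
S0, u, d = 4.0, 2.0, 0.5

def stock_price(omega):
    """S_k(omega): the price after following the coin tosses in omega."""
    price = S0
    for toss in omega:
        price *= u if toss == "H" else d
    return price

# The sample space Omega: all sequences of three tosses.
Omega = ["".join(w) for w in product("HT", repeat=3)]
assert len(Omega) == 8

# The time-2 prices of Figure 1.1; note S2(HT) = S2(TH) = 4.
assert stock_price("HH") == 16 and stock_price("TT") == 1
assert stock_price("HT") == stock_price("TH") == 4
```

The assertion on the middle nodes illustrates the remark above: distinct coin sequences need not produce distinct stock prices.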
To complete our binomial asset pricing model, we introduce a money market with interest rate r; $1 invested in the money market becomes $(1 + r) in the next period. We take r to be the interest rate for both borrowing and lending. (This is not as ridiculous as it first seems, because in many applications of the model, an agent is either borrowing or lending (not both) and knows in advance which she will be doing; in such an application, she should take r to be the rate of interest for her activity.) We assume that

d < 1 + r < u.   (1.2)

The model would not make sense if we did not have this condition. For example, if 1 + r ≥ u, then the rate of return on the money market is always at least as great as and sometimes greater than the return on the stock, and no one would invest in the stock.

Now consider a European call option, which confers on its owner the right to buy one share of the stock at time 1 for the strike price K, so that its payoff at time 1 is V₁ = (S₁ − K)⁺. Suppose you sell the option at time zero for V₀ dollars, buy Δ₀ shares of stock, and invest the remainder, V₀ − Δ₀S₀, in the money market. If the stock goes up, the value of your portfolio (excluding the short position in the option) is

Δ₀S₁(H) + (1 + r)(V₀ − Δ₀S₀),

and you need to have V₁(H). Thus, you want to choose V₀ and Δ₀ so that

V₁(H) = Δ₀S₁(H) + (1 + r)(V₀ − Δ₀S₀).   (1.3)

If the stock goes down, the value of your portfolio is

Δ₀S₁(T) + (1 + r)(V₀ − Δ₀S₀),

and you need to have V₁(T). Thus, you want to choose V₀ and Δ₀ to also have

V₁(T) = Δ₀S₁(T) + (1 + r)(V₀ − Δ₀S₀).   (1.4)
These are two equations in two unknowns, and we solve them below. Subtracting (1.4) from (1.3), we obtain

V₁(H) − V₁(T) = Δ₀(S₁(H) − S₁(T)),   (1.5)

so that

Δ₀ = (V₁(H) − V₁(T)) / (S₁(H) − S₁(T)).   (1.6)

This is a discrete-time version of the famous "delta-hedging" formula for derivative securities, according to which the number of shares of an underlying asset a hedge should hold is the derivative (in the sense of calculus) of the value of the derivative security with respect to the price of the underlying asset. This formula is so pervasive that when a practitioner says "delta", she means the derivative (in the sense of calculus) just described. Note, however, that my definition of Δ₀ is the number of shares of stock one holds at time zero, and (1.6) is a consequence of this definition, not the definition of Δ₀ itself. Depending on how uncertainty enters the model, there can be cases in which the number of shares of stock a hedge should hold is not the (calculus) derivative of the derivative security with respect to the price of the underlying asset.

To complete the solution of (1.3) and (1.4), we substitute (1.6) into either (1.3) or (1.4) and solve for V₀. After some simplification, this leads to the formula

V₀ = (1/(1 + r)) [ ((1 + r − d)/(u − d)) V₁(H) + ((u − (1 + r))/(u − d)) V₁(T) ].   (1.7)
This is the arbitrage price for the European call option with payoff V₁ at time 1. To simplify this formula, we define

p̃ = (1 + r − d)/(u − d),    q̃ = (u − (1 + r))/(u − d),   (1.8)

so that (1.7) becomes

V₀ = (1/(1 + r)) [ p̃ V₁(H) + q̃ V₁(T) ].   (1.9)

Because we have taken d < u, both p̃ and q̃ are defined, i.e., the denominator in (1.8) is not zero. Because of (1.2), both p̃ and q̃ are in the interval (0, 1), and because they sum to 1, we can regard them as probabilities of H and T, respectively. They are the risk-neutral probabilities. They appeared when we solved the two equations (1.3) and (1.4), and have nothing to do with the actual probabilities of getting H or T on the coin tosses. In fact, at this point, they are nothing more than a convenient tool for writing (1.7) as (1.9).
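As a numerical check of (1.6), (1.8) and (1.9), the sketch below prices a one-period call on the stock of Example 1.1. The interest rate r = 1/4 and strike K = 5 are illustrative assumptions of mine, not values fixed by the text.

```python
# One-period binomial hedge, with S0 = 4, u = 2, d = 1/2 from Example 1.1.
# r = 1/4 and strike K = 5 are assumed here purely for illustration.
S0, u, d, r, K = 4.0, 2.0, 0.5, 0.25, 5.0

S1 = {"H": u * S0, "T": d * S0}
V1 = {w: max(S1[w] - K, 0.0) for w in "HT"}         # call payoff at time 1

delta0 = (V1["H"] - V1["T"]) / (S1["H"] - S1["T"])  # delta-hedging formula (1.6)

p = (1 + r - d) / (u - d)                           # risk-neutral p~  (1.8)
q = (u - (1 + r)) / (u - d)                         # risk-neutral q~  (1.8)
V0 = (p * V1["H"] + q * V1["T"]) / (1 + r)          # arbitrage price  (1.9)

# Replication check: the hedge reproduces V1 in both states, as in (1.3)-(1.4).
for w in "HT":
    assert abs(delta0 * S1[w] + (1 + r) * (V0 - delta0 * S0) - V1[w]) < 1e-12
```

With these numbers p̃ = q̃ = 1/2, Δ₀ = 1/2 and V₀ = 1.20, and the final loop confirms that the hedge replicates the payoff in both states.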
We now consider a European call which pays off V₂ dollars at time 2. At expiration, the payoff of this option is V₂ = (S₂ − K)⁺, where V₂ and S₂ depend on ω₁ and ω₂, the first and second coin tosses. We want to determine the arbitrage price for this option at time zero. Suppose an agent sells the option at time zero for V₀ dollars, where V₀ is still to be determined. She then buys Δ₀ shares of stock, investing V₀ − Δ₀S₀ dollars in the money market to finance this. At time 1, the agent has a portfolio (excluding the short position in the option) valued at

X₁ = Δ₀S₁ + (1 + r)(V₀ − Δ₀S₀).   (1.10)

Although we do not indicate it in the notation, S₁ and therefore X₁ depend on ω₁, the outcome of the first coin toss. Thus, there are really two equations implicit in (1.10):

X₁(H) = Δ₀S₁(H) + (1 + r)(V₀ − Δ₀S₀),
X₁(T) = Δ₀S₁(T) + (1 + r)(V₀ − Δ₀S₀).

After the first coin toss, the agent has X₁ dollars and can readjust her hedge. Suppose she decides to now hold Δ₁ shares of stock, where Δ₁ is allowed to depend on ω₁ because the agent knows what value ω₁ has taken. She invests the remainder of her wealth, X₁ − Δ₁S₁, in the money market.
Equation (1.13) gives the value the hedging portfolio should have at time 1 if the stock goes down between times 0 and 1. We define this quantity to be the arbitrage value of the option at time 1 if ω₁ = T, and we denote it by V₁(T). We have just shown that

V₁(T) = (1/(1 + r)) [ p̃ V₂(TH) + q̃ V₂(TT) ].   (1.14)

The hedger should choose her portfolio so that her wealth X₁(T) if ω₁ = T agrees with V₁(T) defined by (1.14). This formula is analogous to formula (1.9), but postponed by one step. The first two equations implicit in (1.11) lead in a similar way to the formulas

Δ₁(H) = (V₂(HH) − V₂(HT)) / (S₂(HH) − S₂(HT))   (1.15)

and X₁(H) = V₁(H), where V₁(H) is the value of the option at time 1 if ω₁ = H, defined by

V₁(H) = (1/(1 + r)) [ p̃ V₂(HH) + q̃ V₂(HT) ].   (1.16)

This is again analogous to formula (1.9), postponed by one step. Finally, we plug the values X₁(H) = V₁(H) and X₁(T) = V₁(T) into the two equations implicit in (1.10). The solution of these equations for Δ₀ and V₀ is the same as the solution of (1.3) and (1.4), and results again in (1.6) and (1.9).
The pattern emerging here persists, regardless of the number of periods. If Vₖ denotes the value at time k of a derivative security, and this depends on the first k coin tosses ω₁, …, ωₖ, then at time k − 1 the value is given by the recursion

Vₖ₋₁(ω₁, …, ωₖ₋₁) = (1/(1 + r)) [ p̃ Vₖ(ω₁, …, ωₖ₋₁, H) + q̃ Vₖ(ω₁, …, ωₖ₋₁, T) ].
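The recursion is easy to run backward from the expiration date. The sketch below prices a two-period call on the tree of Figure 1.1; as before, r = 1/4 and the strike K = 5 are my own illustrative assumptions.

```python
from itertools import product

S0, u, d = 4.0, 2.0, 0.5          # Example 1.1
r, K, N = 0.25, 5.0, 2            # r, K and the horizon N are assumed values
p = (1 + r - d) / (u - d)         # p~ from (1.8)
q = (u - (1 + r)) / (u - d)       # q~ from (1.8)

def stock(omega):
    s = S0
    for toss in omega:
        s *= u if toss == "H" else d
    return s

# Terminal condition: V_N = (S_N - K)^+ on every N-toss path.
V = {"".join(w): max(stock(w) - K, 0.0) for w in product("HT", repeat=N)}

# Backward induction: V_{k-1}(w) = [p~ V_k(wH) + q~ V_k(wT)] / (1 + r).
for _ in range(N):
    V = {w[:-1]: (p * V[w[:-1] + "H"] + q * V[w[:-1] + "T"]) / (1 + r)
         for w in V if w.endswith("H")}

V0 = V[""]   # time-zero arbitrage price
```

For these inputs the only nonzero terminal value is V₂(HH) = 11, and backward induction gives V₁(H) = 4.40, V₁(T) = 0 and V₀ = 1.76.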
1.2 Finite Probability Spaces

Definition 1.1 A probability measure P is a function mapping F into [0, 1] with the following properties:

(i) P(Ω) = 1,

(ii) If A₁, A₂, … is a sequence of disjoint sets in F, then

P(⋃ₖ Aₖ) = Σₖ P(Aₖ).

Probability measures have the following interpretation. Let A be a subset of F. Imagine that Ω is the set of all possible outcomes of some random experiment. There is a certain probability, between 0 and 1, that when that experiment is performed, the outcome will lie in the set A. We think of P(A) as this probability.

Example 1.2 Suppose a coin has probability 1/3 for H and 2/3 for T. For the individual elements of Ω in (2.1), define

P{HHH} = (1/3)³,          P{HHT} = (1/3)²(2/3),
P{HTH} = (1/3)²(2/3),     P{HTT} = (1/3)(2/3)²,
P{THH} = (1/3)²(2/3),     P{THT} = (1/3)(2/3)²,
P{TTH} = (1/3)(2/3)²,     P{TTT} = (2/3)³.

For A ∈ F, we define

P(A) = Σ_{ω ∈ A} P{ω}.   (2.2)

For example,

P{HHH, HHT, HTH, HTT} = (1/3)³ + 2(1/3)²(2/3) + (1/3)(2/3)² = 1/3,

which is another way of saying that the probability of H on the first toss is 1/3.

As in the above example, it is generally the case that we specify a probability measure on only some of the subsets of Ω and then use property (ii) of Definition 1.1 to determine P(A) for the remaining sets A ∈ F. In the above example, we specified the probability measure only for the sets containing a single element, and then used Definition 1.1(ii) in the form (2.2) (see Problem 1.4(ii)) to determine P for all the other sets in F.

Definition 1.2 Let Ω be a nonempty set. A σ-algebra is a collection G of subsets of Ω with the following three properties:

(i) ∅ ∈ G,

(ii) If A ∈ G, then its complement Aᶜ ∈ G,

(iii) If A₁, A₂, A₃, … is a sequence of sets in G, then ⋃ₖ Aₖ is also in G.

Here are some important σ-algebras of subsets of the set Ω in Example 1.2:

F₀ = {∅, Ω},
F₁ = {∅, Ω, {HHH, HHT, HTH, HTT}, {THH, THT, TTH, TTT}},
F₂ = {∅, Ω, {HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT}, and all sets which can be built by taking unions of these},
F₃ = F = the set of all subsets of Ω.
To simplify notation a bit, let us definePxVU>[email protected][email protected]?)?W?MAP)?MAP?)?jAPAZ|U>? on the first toss Z:)PqxVUEA([email protected]?)wA(?MA()APAP?M)AAPAZaUIA on the first toss Z3)so that `VU>) T )*()P"Z3)and let us define xVU>[email protected][email protected]?M)*[email protected]?jAZVU>[email protected]? on the first two tosses Z3) xVU>?MA(?M)*?MAPAZ`VU>?MA on the first two tosses Z3) 5 xVUIA([email protected]?M)A(?MAZ`VUIA[? on the first two tosses Z3) 5 xVUEAPA(?M)APAPAZaVUIAPA on the first two tosses Z3)so that (G U>) T )*a[)*H)*5)P;N) )* )* a M ; )* M ; )* a 5 ) 5 ) ) )* 5 )* 5 Z3RWe interpret -algebras as a record of information. Suppose the coin is tossed three times, and youare not told the outcome, but you are told, for every set in whether or not the outcome is in thatset. For example, you would be told that the outcome is not in and is in T . Moreover, you mightbe told that the outcome is not in but is in . In effect, you have been told that the first tosswas a A , and nothing more. The -algebra is said to contain the information of the first toss,which is usually called the information up to time . Similarly, [G contains the information of
the first two tosses," which is the "information up to time $2$." The $\sigma$-algebra $\mathcal{F}_3 = \mathcal{F}$ contains "full information" about the outcome of all three tosses. The so-called "trivial" $\sigma$-algebra $\mathcal{F}_0$ contains no information. Knowing whether the outcome $\omega$ of the three tosses is in $\emptyset$ (it is not) and whether it is in $\Omega$ (it is) tells you nothing about $\omega$.

Definition 1.3 Let $\Omega$ be a nonempty finite set. A filtration is a sequence of $\sigma$-algebras $\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_n$ such that each $\sigma$-algebra in the sequence contains all the sets contained by the previous $\sigma$-algebra.

Definition 1.4 Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. A random variable is a function mapping $\Omega$ into $\mathbb{R}$.

Example 1.3 Let $\Omega$ be given by (2.1) and consider the binomial asset pricing Example 1.1, where $S_0 = 4$, $u = 2$ and $d = \frac{1}{2}$. Then $S_0$, $S_1$, $S_2$ and $S_3$ are all random variables. For example, $S_2(HHT) = u^2 S_0 = 16$. The random variable $S_0$ is really not random, since $S_0(\omega) = 4$ for all $\omega \in \Omega$. Nonetheless, it is a function mapping $\Omega$ into $\mathbb{R}$, and thus technically a random variable, albeit a degenerate one.

A random variable maps $\Omega$ into $\mathbb{R}$, and we can look at the preimage under the random variable of sets in $\mathbb{R}$. Consider, for example, the random variable $S_2$ of Example 1.1. We have
$$S_2(HHH) = S_2(HHT) = 16,$$
$$S_2(HTH) = S_2(HTT) = S_2(THH) = S_2(THT) = 4,$$
$$S_2(TTH) = S_2(TTT) = 1.$$
Let us consider the interval $[4, 27]$. The preimage under $S_2$ of this interval is defined to be
$$\{\omega \in \Omega : S_2(\omega) \in [4, 27]\} = \{\omega \in \Omega : 4 \le S_2(\omega) \le 27\} = A_{TT}^c.$$
The complete list of subsets of $\Omega$ we can get as preimages of sets in $\mathbb{R}$ is:
$$\emptyset, \quad \Omega, \quad A_{HH}, \quad A_{HT} \cup A_{TH}, \quad A_{TT},$$
and sets which can be built by taking unions of these. This collection of sets is a $\sigma$-algebra, called the $\sigma$-algebra generated by the random variable $S_2$, and is denoted by $\sigma(S_2)$. The information content of this $\sigma$-algebra is exactly the information learned by observing $S_2$. More specifically, suppose the coin is tossed three times and you do not know the outcome $\omega$, but someone is willing to tell you, for each set in $\sigma(S_2)$, whether $\omega$ is in the set. You might be told, for example, that $\omega$ is not in $A_{HH}$, is in $A_{HT} \cup A_{TH}$, and is not in $A_{TT}$. Then you know that in the first two tosses, there was a head and a tail, and you know nothing more. This information is the same you would have gotten by being told that the value of $S_2(\omega)$ is $4$.

Note that $\mathcal{F}_2$ defined earlier contains all the sets which are in $\sigma(S_2)$, and even more. This means that the information in the first two tosses is greater than the information in $S_2$. In particular, if you see the first two tosses, you can distinguish $A_{HT}$ from $A_{TH}$, but you cannot make this distinction from knowing the value of $S_2$ alone.
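The preimage bookkeeping above is easy to check by brute force. The following sketch (the variable names are mine, not the text's) builds the eight-outcome sample space, the measure with $P(H) = \frac{1}{3}$ from Example 1.2, and the partition of $\Omega$ into the atoms $\{S_2 = x\}$:

```python
from itertools import product

# Binomial model of Example 1.1 (S0 = 4, u = 2, d = 1/2),
# with probability 1/3 for H on each toss as in Example 1.2.
S0, u, d, p = 4, 2, 0.5, 1 / 3

omega = ["".join(w) for w in product("HT", repeat=3)]
P_elem = {w: p ** w.count("H") * (1 - p) ** w.count("T") for w in omega}

def S2(w):
    """Stock price after the first two tosses: multiply by u for H, d for T."""
    price = S0
    for toss in w[:2]:
        price *= u if toss == "H" else d
    return price

def P(event):
    """Probability of an event (a set of outcomes), by the additivity rule (2.2)."""
    return sum(P_elem[w] for w in event)

# Atoms of sigma(S2): the preimages {S2 = 16}, {S2 = 4}, {S2 = 1}.
atoms = {}
for w in omega:
    atoms.setdefault(S2(w), set()).add(w)

# Preimage of the interval [4, 27]: every outcome except those in A_TT.
preimage = {w for w in omega if 4 <= S2(w) <= 27}

print(sorted(atoms))                             # the three values taken by S2
print(P({w for w in omega if w[0] == "H"}))      # P(H on first toss) = 1/3
```

Listing the atoms reproduces the sets $A_{HH}$, $A_{HT} \cup A_{TH}$ and $A_{TT}$ above, and the preimage of $[4, 27]$ comes out as the six outcomes not beginning with $TT$.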
Definition 1.5 Let $\Omega$ be a nonempty finite set and let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$. Let $X$ be a random variable on $(\Omega, \mathcal{F})$. The $\sigma$-algebra $\sigma(X)$ generated by $X$ is defined to be the collection of all sets of the form $\{\omega \in \Omega : X(\omega) \in A\}$, where $A$ is a subset of $\mathbb{R}$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. We say that $X$ is $\mathcal{G}$-measurable if every set in $\sigma(X)$ is also in $\mathcal{G}$.

Note: We normally write simply $\{X \in A\}$ rather than $\{\omega \in \Omega : X(\omega) \in A\}$.

Definition 1.6 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $P$ be a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. Given any set $A \subset \mathbb{R}$, we define the induced measure of $A$ to be
$$\mathcal{L}_X(A) \triangleq P\{X \in A\}.$$
In other words, the induced measure of a set $A$ tells us the probability that $X$ takes a value in $A$. In the case of $S_2$ above with the probability measure of Example 1.2, some sets in $\mathbb{R}$ and their induced measures are:
$$\mathcal{L}_{S_2}(\emptyset) = P(\emptyset) = 0,$$
$$\mathcal{L}_{S_2}(\mathbb{R}) = P(\Omega) = 1,$$
$$\mathcal{L}_{S_2}[0, \infty) = P(\Omega) = 1,$$
$$\mathcal{L}_{S_2}[0, 3] = P\{S_2 = 1\} = P(A_{TT}) = \left(\tfrac{2}{3}\right)^2.$$
In fact, the induced measure of $S_2$ places a mass of size $\left(\frac{1}{3}\right)^2 = \frac{1}{9}$ at the number $16$, a mass of size $\frac{4}{9}$ at the number $4$, and a mass of size $\left(\frac{2}{3}\right)^2 = \frac{4}{9}$ at the number $1$. A common way to record this information is to give the cumulative distribution function $F_{S_2}(x)$ of $S_2$, defined by
$$F_{S_2}(x) \triangleq P(S_2 \le x) = \begin{cases} 0, & \text{if } x < 1, \\ \frac{4}{9}, & \text{if } 1 \le x < 4, \\ \frac{8}{9}, & \text{if } 4 \le x < 16, \\ 1, & \text{if } 16 \le x. \end{cases} \tag{2.3}$$
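The masses and the staircase function in (2.3) can be tabulated directly. This is a small illustrative sketch, not part of the text:

```python
# Induced measure of S2 under P(H) = 1/3: point masses at 1, 4 and 16.
masses = {1: (2 / 3) ** 2, 4: 2 * (1 / 3) * (2 / 3), 16: (1 / 3) ** 2}

def F(x):
    """Cumulative distribution function F_{S2}(x) = P(S2 <= x), as in (2.3):
    sum the masses at all points not exceeding x."""
    return sum(m for point, m in masses.items() if point <= x)

for x in (0.5, 2, 10, 20):
    print(x, F(x))   # the four branches of (2.3): 0, 4/9, 8/9, 1
```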
By the distribution of a random variable $X$, we mean any of the several ways of characterizing $\mathcal{L}_X$. If $X$ is discrete, as in the case of $S_2$ above, we can either tell where the masses are and how large they are, or tell what the cumulative distribution function is. (Later we will consider random variables $X$ which have densities, in which case the induced measure of a set $A \subset \mathbb{R}$ is the integral of the density over the set $A$.)

Important Note. In order to work through the concept of a risk-neutral measure, we set up the definitions to make a clear distinction between random variables and their distributions.

A random variable is a mapping from $\Omega$ to $\mathbb{R}$, nothing more. It has an existence quite apart from discussion of probabilities. For example, in the discussion above, $S_2(TTH) = S_2(TTT) = 1$, regardless of whether the probability for $H$ is $\frac{1}{3}$ or $\frac{1}{2}$.
The distribution of a random variable is a measure $\mathcal{L}_X$ on $\mathbb{R}$, i.e., a way of assigning probabilities to sets in $\mathbb{R}$. It depends on the random variable $X$ and the probability measure $P$ we use on $\Omega$. If we set the probability of $H$ to be $\frac{1}{3}$, then $\mathcal{L}_{S_2}$ assigns mass $\frac{4}{9}$ to the number $1$. If we set the probability of $H$ to be $\frac{1}{2}$, then $\mathcal{L}_{S_2}$ assigns mass $\frac{1}{4}$ to the number $1$. The distribution of $S_2$ has changed, but the random variable has not. It is still defined by
$$S_2(HHH) = S_2(HHT) = 16,$$
$$S_2(HTH) = S_2(HTT) = S_2(THH) = S_2(THT) = 4,$$
$$S_2(TTH) = S_2(TTT) = 1.$$
Thus, a random variable can have more than one distribution (a "market" or "objective" distribution, and a "risk-neutral" distribution).

In a similar vein, two different random variables can have the same distribution. Suppose in the binomial model of Example 1.1, the probability of $H$ and the probability of $T$ is $\frac{1}{2}$. Consider a European call with strike price $14$ expiring at time $2$. The payoff of the call at time $2$ is the random variable $(S_2 - 14)^+$, which takes the value $2$ if $\omega = HHH$ or $\omega = HHT$, and takes the value $0$ in every other case. The probability the payoff is $2$ is $\frac{1}{4}$, and the probability it is zero is $\frac{3}{4}$. Consider also a European put with strike price $3$ expiring at time $2$. The payoff of the put at time $2$ is $(3 - S_2)^+$, which takes the value $2$ if $\omega = TTH$ or $\omega = TTT$. Like the payoff of the call, the payoff of the put is $2$ with probability $\frac{1}{4}$ and $0$ with probability $\frac{3}{4}$. The payoffs of the call and the put are different random variables having the same distribution.
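The call and put of this paragraph can be compared mechanically. In the sketch below (my code, not the text's), the two payoffs differ outcome by outcome, yet induce the same measure:

```python
from itertools import product

S0, u, d = 4, 2, 0.5   # binomial model of Example 1.1, now with a fair coin

def S2(w):
    """Stock price after the first two tosses."""
    price = S0
    for toss in w[:2]:
        price *= u if toss == "H" else d
    return price

omega = ["".join(w) for w in product("HT", repeat=3)]
P = {w: 0.5 ** 3 for w in omega}        # each three-toss outcome has probability 1/8

call = {w: max(S2(w) - 14, 0) for w in omega}   # (S2 - 14)^+, pays 2 on HH..
put  = {w: max(3 - S2(w), 0) for w in omega}    # (3 - S2)^+,  pays 2 on TT..

def distribution(X):
    """Induced measure of X: total probability mass at each of its values."""
    dist = {}
    for w in omega:
        dist[X[w]] = dist.get(X[w], 0) + P[w]
    return dist

# Both payoffs put mass 1/4 at the value 2 and mass 3/4 at the value 0,
# even though call and put disagree on every outcome where either pays.
print(distribution(call))
print(distribution(put))
```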
Definition 1.7 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $P$ be a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. The expected value of $X$ is defined to be
$$EX \triangleq \sum_{\omega \in \Omega} X(\omega) P\{\omega\}. \tag{2.4}$$
Notice that the expected value in (2.4) is defined to be a sum over the sample space $\Omega$. Since $\Omega$ is a finite set, $X$ can take only finitely many values, which we label $x_1, \ldots, x_n$. We can partition $\Omega$ into the subsets $\{X = x_1\}, \ldots, \{X = x_n\}$, and then rewrite (2.4) as
$$EX = \sum_{\omega \in \Omega} X(\omega) P\{\omega\} = \sum_{k=1}^{n} \sum_{\omega \in \{X = x_k\}} X(\omega) P\{\omega\} = \sum_{k=1}^{n} x_k \sum_{\omega \in \{X = x_k\}} P\{\omega\} = \sum_{k=1}^{n} x_k P\{X = x_k\} = \sum_{k=1}^{n} x_k \mathcal{L}_X\{x_k\}.$$
Thus, although the expected value is defined as a sum over the sample space $\Omega$, we can also write it as a sum over $\mathbb{R}$.

To make the above set of equations absolutely clear, we consider $S_2$ with the distribution given by (2.3). The definition of $ES_2$ is
$$ES_2 = S_2(HHH) P\{HHH\} + S_2(HHT) P\{HHT\} + S_2(HTH) P\{HTH\} + S_2(HTT) P\{HTT\}$$
$$\qquad + S_2(THH) P\{THH\} + S_2(THT) P\{THT\} + S_2(TTH) P\{TTH\} + S_2(TTT) P\{TTT\}$$
$$= 16 \cdot P(A_{HH}) + 4 \cdot P(A_{HT} \cup A_{TH}) + 1 \cdot P(A_{TT})$$
$$= 16 \cdot P\{S_2 = 16\} + 4 \cdot P\{S_2 = 4\} + 1 \cdot P\{S_2 = 1\}$$
$$= 16 \cdot \mathcal{L}_{S_2}\{16\} + 4 \cdot \mathcal{L}_{S_2}\{4\} + 1 \cdot \mathcal{L}_{S_2}\{1\}$$
$$= 16 \cdot \tfrac{1}{9} + 4 \cdot \tfrac{4}{9} + 1 \cdot \tfrac{4}{9} = \tfrac{16}{9} + \tfrac{16}{9} + \tfrac{4}{9} = 4.$$

Definition 1.8 Let $\Omega$ be a nonempty, finite set, let $\mathcal{F}$ be the $\sigma$-algebra of all subsets of $\Omega$, let $P$ be a probability measure on $(\Omega, \mathcal{F})$, and let $X$ be a random variable on $\Omega$. The variance of $X$ is defined to be
$$\mathrm{Var}(X) \triangleq E(X - EX)^2.$$

1.3 Lebesgue Measure and the Lebesgue Integral
Definition 1.9 The Borel $\sigma$-algebra, denoted $\mathcal{B}(\mathbb{R})$, is the smallest $\sigma$-algebra containing all open intervals in $\mathbb{R}$. The sets in $\mathcal{B}(\mathbb{R})$ are called Borel sets.

Every set which can be written down, and just about every set imaginable, is in $\mathcal{B}(\mathbb{R})$. The following discussion of this fact uses the $\sigma$-algebra properties developed in Problem 1.3.

By definition, every open interval $(a, b)$ is in $\mathcal{B}(\mathbb{R})$, where $a$ and $b$ are real numbers. Since $\mathcal{B}(\mathbb{R})$ is a $\sigma$-algebra, every union of open intervals is also in $\mathcal{B}(\mathbb{R})$. For example, for every real number $a$, the open half-line
$$(a, \infty) = \bigcup_{n=1}^{\infty} (a, a + n)$$
is a Borel set, as is
$$(-\infty, a) = \bigcup_{n=1}^{\infty} (a - n, a).$$
For real numbers $a$ and $b$, the union $(-\infty, a) \cup (b, \infty)$ is Borel. Since $\mathcal{B}(\mathbb{R})$ is a $\sigma$-algebra, every complement of a Borel set is Borel, so $\mathcal{B}(\mathbb{R})$ contains
$$[a, b] = \big( (-\infty, a) \cup (b, \infty) \big)^c.$$
This shows that every closed interval is Borel. In addition, the closed half-lines
$$[a, \infty) = \bigcup_{n=1}^{\infty} [a, a + n]$$
and
$$(-\infty, a] = \bigcup_{n=1}^{\infty} [a - n, a]$$
are Borel. Half-open and half-closed intervals are also Borel, since they can be written as intersections of open half-lines and closed half-lines. For example,
$$(a, b] = (-\infty, b] \cap (a, \infty).$$
Every set which contains only one real number is Borel. Indeed, if $a$ is a real number, then
$$\{a\} = \bigcap_{n=1}^{\infty} \left( a - \tfrac{1}{n}, a + \tfrac{1}{n} \right).$$
This means that every set containing finitely many real numbers is Borel; if $A = \{a_1, a_2, \ldots, a_n\}$, then
$$A = \bigcup_{k=1}^{n} \{a_k\}.$$
In fact, every set containing countably infinitely many numbers is Borel; if $A = \{a_1, a_2, \ldots\}$, then
$$A = \bigcup_{k=1}^{\infty} \{a_k\}.$$
This means that the set of rational numbers is Borel, as is its complement, the set of irrational numbers.

There are, however, sets which are not Borel. We have just seen that any non-Borel set must have uncountably many points.
Example 1.4 (The Cantor set.) This example gives a hint of how complicated a Borel set can be.We use it later when we discuss the sample space for an infinite sequence of coin tosses.
Consider the unit interval $[0, 1]$, and remove the middle half, i.e., remove the open interval
$$A_1 \triangleq \left( \tfrac{1}{4}, \tfrac{3}{4} \right).$$
The remaining set
$$C_1 = \left[ 0, \tfrac{1}{4} \right] \cup \left[ \tfrac{3}{4}, 1 \right]$$
has two pieces. From each of these pieces, remove the middle half, i.e., remove the open set
$$A_2 \triangleq \left( \tfrac{1}{16}, \tfrac{3}{16} \right) \cup \left( \tfrac{13}{16}, \tfrac{15}{16} \right).$$
The remaining set
$$C_2 = \left[ 0, \tfrac{1}{16} \right] \cup \left[ \tfrac{3}{16}, \tfrac{1}{4} \right] \cup \left[ \tfrac{3}{4}, \tfrac{13}{16} \right] \cup \left[ \tfrac{15}{16}, 1 \right]$$
has four pieces. Continue this process, so at stage $k$, the set $C_k$ has $2^k$ pieces, and each piece has length $\frac{1}{4^k}$. The Cantor set
$$C \triangleq \bigcap_{k=1}^{\infty} C_k$$
is defined to be the set of points not removed at any stage of this nonterminating process.

Note that the length of $A_1$, the first set removed, is $\frac{1}{2}$. The length of $A_2$, the second set removed, is $\frac{1}{8} + \frac{1}{8} = \frac{1}{4}$. The length of the next set removed is $4 \cdot \frac{1}{32} = \frac{1}{8}$, and in general, the length of the $k$-th set removed is $2^{-k}$. Thus, the total length removed is
$$\sum_{k=1}^{\infty} \frac{1}{2^k} = 1,$$
and so the Cantor set, the set of points not removed, has zero "length."

Despite the fact that the Cantor set has no "length," there are lots of points in this set. In particular, none of the endpoints of the pieces of the sets $C_1, C_2, \ldots$ is ever removed. Thus, the points
$$0, \ \tfrac{1}{4}, \ \tfrac{3}{4}, \ 1, \ \tfrac{1}{16}, \ \tfrac{3}{16}, \ \tfrac{13}{16}, \ \tfrac{15}{16}, \ldots$$
are all in $C$. This is a countably infinite set of points. We shall see eventually that the Cantor set has uncountably many points.
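The length bookkeeping in this example is easy to verify numerically. A small sketch (my truncation of the infinite process at 50 stages):

```python
# At stage k there are 2^k closed pieces, each of length 4^(-k),
# so the set remaining after stage k has total length (1/2)^k.
def remaining_length(k):
    return 2 ** k * 4.0 ** (-k)

# The k-th removed set has length 2^(-k); the total removed length
# tends to sum_{k>=1} 2^(-k) = 1, so the Cantor set has zero length.
removed = sum(2.0 ** (-k) for k in range(1, 51))

print(remaining_length(1), remaining_length(2))   # 0.5 and 0.25, as in the text
print(removed)                                     # very close to 1
```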
Definition 1.10 Let $\mathcal{B}(\mathbb{R})$ be the $\sigma$-algebra of Borel subsets of $\mathbb{R}$. A measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is a function $\mu$ mapping $\mathcal{B}(\mathbb{R})$ into $[0, \infty]$ with the following properties:

(i) $\mu(\emptyset) = 0$,

(ii) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{B}(\mathbb{R})$, then
$$\mu\Big( \bigcup_{k=1}^{\infty} A_k \Big) = \sum_{k=1}^{\infty} \mu(A_k).$$

Lebesgue measure is defined to be the measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ which assigns the measure of each interval to be its length. Following Williams's book, we denote Lebesgue measure by $\lambda_0$.

A measure has all the properties of a probability measure given in Problem 1.4, except that the total measure of the space is not necessarily $1$ (in fact, $\lambda_0(\mathbb{R}) = \infty$), one no longer has the equation
$$\mu(A^c) = 1 - \mu(A)$$
in Problem 1.4(iii), and property (v) in Problem 1.4 needs to be modified to say:

(v) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{B}(\mathbb{R})$ with $A_1 \supset A_2 \supset \cdots$ and $\mu(A_1) < \infty$, then
$$\mu\Big( \bigcap_{k=1}^{\infty} A_k \Big) = \lim_{n \to \infty} \mu(A_n).$$

To see that the additional requirement $\mu(A_1) < \infty$ is needed in (v), consider
$$A_1 = [1, \infty), \quad A_2 = [2, \infty), \quad A_3 = [3, \infty), \ldots$$
Then $\bigcap_{k=1}^{\infty} A_k = \emptyset$, so $\lambda_0\big( \bigcap_{k=1}^{\infty} A_k \big) = 0$, but $\lim_{n \to \infty} \lambda_0(A_n) = \infty$.

We specify that the Lebesgue measure of each interval is its length, and that determines the Lebesgue measure of all other Borel sets. For example, the Lebesgue measure of the Cantor set in Example 1.4 must be zero, because of the "length" computation given at the end of that example.

The Lebesgue measure of a set containing only one point must be zero. In fact, since
$$\{a\} \subset \left( a - \tfrac{1}{n}, a + \tfrac{1}{n} \right)$$
for every positive integer $n$, we must have
$$0 \le \lambda_0\{a\} \le \lambda_0\left( a - \tfrac{1}{n}, a + \tfrac{1}{n} \right) = \tfrac{2}{n}.$$
Letting $n \to \infty$, we obtain $\lambda_0\{a\} = 0$.
The Lebesgue measure of a set containing countably many points must also be zero. Indeed, if $A = \{a_1, a_2, \ldots\}$, then
$$\lambda_0(A) = \sum_{k=1}^{\infty} \lambda_0\{a_k\} = \sum_{k=1}^{\infty} 0 = 0.$$
The Lebesgue measure of a set containing uncountably many points can be either zero, positive and finite, or infinite. We may not compute the Lebesgue measure of an uncountable set by adding up the Lebesgue measure of its individual members, because there is no way to add up uncountably many numbers. The integral was invented to get around this problem.
In order to think about Lebesgue integrals, we must first consider the functions to be integrated.
Definition 1.11 Let $f$ be a function from $\mathbb{R}$ to $\mathbb{R}$. We say that $f$ is Borel-measurable if the set $\{x : f(x) \in A\}$ is in $\mathcal{B}(\mathbb{R})$ whenever $A \in \mathcal{B}(\mathbb{R})$. In the language of Section 2, we want the $\sigma$-algebra generated by $f$ to be contained in $\mathcal{B}(\mathbb{R})$.

This definition is purely technical and has nothing to do with keeping track of information. It is difficult to conceive of a function which is not Borel-measurable, and we shall pretend such functions don't exist. Henceforth, "function mapping $\mathbb{R}$ to $\mathbb{R}$" will mean "Borel-measurable function mapping $\mathbb{R}$ to $\mathbb{R}$" and "subset of $\mathbb{R}$" will mean "Borel subset of $\mathbb{R}$."

Definition 1.12 An indicator function $g$ from $\mathbb{R}$ to $\mathbb{R}$ is a function which takes only the values $0$ and $1$. We call
$$A \triangleq \{x \in \mathbb{R} : g(x) = 1\}$$
the set indicated by $g$. We define the Lebesgue integral of $g$ to be
$$\int_{\mathbb{R}} g \, d\lambda_0 \triangleq \lambda_0(A).$$
A simple function $h$ from $\mathbb{R}$ to $\mathbb{R}$ is a linear combination of indicators, i.e., a function of the form
$$h(x) = \sum_{k=1}^{n} c_k g_k(x),$$
where each $g_k$ is of the form
$$g_k(x) = \begin{cases} 1, & \text{if } x \in A_k, \\ 0, & \text{if } x \notin A_k, \end{cases}$$
and each $c_k$ is a real number. We define the Lebesgue integral of $h$ to be
$$\int_{\mathbb{R}} h \, d\lambda_0 \triangleq \sum_{k=1}^{n} c_k \int_{\mathbb{R}} g_k \, d\lambda_0 = \sum_{k=1}^{n} c_k \lambda_0(A_k).$$
Let $f$ be a nonnegative function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points. We define the Lebesgue integral of $f$ to be
$$\int_{\mathbb{R}} f \, d\lambda_0 \triangleq \sup \left\{ \int_{\mathbb{R}} h \, d\lambda_0 : h \text{ is simple and } h(x) \le f(x) \text{ for every } x \in \mathbb{R} \right\}.$$
It is possible that this integral is infinite. If it is finite, we say that $f$ is integrable.

Finally, let $f$ be a function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points and the value $-\infty$ at other points. We define the positive and negative parts of $f$ to be
$$f^+(x) \triangleq \max\{f(x), 0\}, \qquad f^-(x) \triangleq \max\{-f(x), 0\},$$
respectively, and we define the Lebesgue integral of $f$ to be
$$\int_{\mathbb{R}} f \, d\lambda_0 \triangleq \int_{\mathbb{R}} f^+ \, d\lambda_0 - \int_{\mathbb{R}} f^- \, d\lambda_0,$$
provided the right-hand side is not of the form $\infty - \infty$. If both $\int_{\mathbb{R}} f^+ \, d\lambda_0$ and $\int_{\mathbb{R}} f^- \, d\lambda_0$ are finite (or equivalently, $\int_{\mathbb{R}} |f| \, d\lambda_0 < \infty$, since $|f| = f^+ + f^-$), we say that $f$ is integrable.

Let $f$ be a function defined on $\mathbb{R}$, possibly taking the value $\infty$ at some points and the value $-\infty$ at other points. Let $A$ be a subset of $\mathbb{R}$. We define
$$\int_A f \, d\lambda_0 \triangleq \int_{\mathbb{R}} \mathbb{1}_A f \, d\lambda_0,$$
where
$$\mathbb{1}_A(x) \triangleq \begin{cases} 1, & \text{if } x \in A, \\ 0, & \text{if } x \notin A, \end{cases}$$
is the indicator function of $A$.

The Lebesgue integral just defined is related to the Riemann integral in one very important way: if the Riemann integral $\int_a^b f(x) \, dx$ is defined, then the Lebesgue integral $\int_{[a,b]} f \, d\lambda_0$ agrees with the Riemann integral. The Lebesgue integral has two important advantages over the Riemann integral. The first is that the Lebesgue integral is defined for more functions, as we show in the following examples.

Example 1.5 Let $Q$ be the set of rational numbers in $[0, 1]$, and consider $f = \mathbb{1}_Q$. Being a countable set, $Q$ has Lebesgue measure zero, and so the Lebesgue integral of $f$ over $[0, 1]$ is
$$\int_{[0,1]} f \, d\lambda_0 = 0.$$
To compute the Riemann integral $\int_0^1 f(x) \, dx$, we choose partition points $0 = x_0 < x_1 < \cdots < x_n = 1$ and divide the interval $[0, 1]$ into subintervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$. In each subinterval $[x_{k-1}, x_k]$ there is a rational point $q_k$, where $f(q_k) = 1$, and there is also an irrational point $r_k$, where $f(r_k) = 0$. We approximate the Riemann integral from above by the upper sum
$$\sum_{k=1}^{n} f(q_k)(x_k - x_{k-1}) = \sum_{k=1}^{n} 1 \cdot (x_k - x_{k-1}) = 1,$$
and we also approximate it from below by the lower sum
$$\sum_{k=1}^{n} f(r_k)(x_k - x_{k-1}) = \sum_{k=1}^{n} 0 \cdot (x_k - x_{k-1}) = 0.$$
No matter how fine we take the partition of $[0, 1]$, the upper sum is always $1$ and the lower sum is always $0$. Since these two do not converge to a common value as the partition becomes finer, the Riemann integral is not defined.

Example 1.6 Consider the function
$$f(x) \triangleq \begin{cases} \infty, & \text{if } x = 0, \\ 0, & \text{if } x \ne 0. \end{cases}$$
This is not a simple function because a simple function cannot take the value $\infty$. Every simple function which lies between $0$ and $f$ is of the form
$$h(x) = \begin{cases} y, & \text{if } x = 0, \\ 0, & \text{if } x \ne 0, \end{cases}$$
for some $y \in [0, \infty)$, and has Lebesgue integral $\int_{\mathbb{R}} h \, d\lambda_0 = y \, \lambda_0\{0\} = 0$. It follows that
$$\int_{\mathbb{R}} f \, d\lambda_0 = \sup \left\{ \int_{\mathbb{R}} h \, d\lambda_0 : h \text{ is simple and } h(x) \le f(x) \text{ for every } x \in \mathbb{R} \right\} = 0.$$
Now consider the Riemann integral $\int_{-\infty}^{\infty} f(x) \, dx$, which for this function is the same as the Riemann integral $\int_{-1}^{1} f(x) \, dx$. When we partition $[-1, 1]$ into subintervals, one of these will contain the point $0$, and when we compute the upper approximating sum for $\int_{-1}^{1} f(x) \, dx$, this point will contribute $\infty$ times the length of the subinterval containing it. Thus the upper approximating sum is $\infty$. On the other hand, the lower approximating sum is $0$, and again the Riemann integral does not exist.

The Lebesgue integral has all linearity and comparison properties one would expect of an integral. In particular, for any two functions $f$ and $g$ and any real constant $c$,
$$\int_{\mathbb{R}} (f + g) \, d\lambda_0 = \int_{\mathbb{R}} f \, d\lambda_0 + \int_{\mathbb{R}} g \, d\lambda_0,$$
$$\int_{\mathbb{R}} c f \, d\lambda_0 = c \int_{\mathbb{R}} f \, d\lambda_0,$$
and whenever $f(x) \le g(x)$ for all $x \in \mathbb{R}$, we have
$$\int_{\mathbb{R}} f \, d\lambda_0 \le \int_{\mathbb{R}} g \, d\lambda_0.$$
Finally, if $A$ and $B$ are disjoint sets, then
$$\int_{A \cup B} f \, d\lambda_0 = \int_A f \, d\lambda_0 + \int_B f \, d\lambda_0.$$
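For a simple function whose sets $A_k$ are finite unions of intervals, the integral $\sum_k c_k \lambda_0(A_k)$ reduces to arithmetic. A minimal sketch (representing each $A_k$ as a list of intervals is my own device, not the text's):

```python
def length(intervals):
    """Lebesgue measure of a disjoint union of intervals (a, b): sum of lengths."""
    return sum(b - a for a, b in intervals)

def integral(simple):
    """Lebesgue integral of a simple function given as a list of (c_k, A_k) pairs:
    sum of c_k * lambda0(A_k), as in Definition 1.12."""
    return sum(c * length(A) for c, A in simple)

# h = 2 on (0, 1), plus 5 on (1, 3) union (4, 4.5):
h = [(2.0, [(0, 1)]), (5.0, [(1, 3), (4, 4.5)])]
print(integral(h))   # 2*1 + 5*2.5 = 14.5
```

Doubling every coefficient doubles the integral, matching the linearity property stated above.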
There are three convergence theorems satisfied by the Lebesgue integral. In each of these the situation is that there is a sequence of functions $f_n$, $n = 1, 2, \ldots$, converging pointwise to a limiting function $f$. Pointwise convergence just means that
$$\lim_{n \to \infty} f_n(x) = f(x) \text{ for every } x \in \mathbb{R}.$$
There are no such theorems for the Riemann integral, because the Riemann integral of the limiting function is too often not defined. Before we state the theorems, we give two examples of pointwise convergence which arise in probability theory.

Example 1.7 Consider a sequence of normal densities, each with variance $1$ and the $n$-th having mean $n$:
$$f_n(x) \triangleq \frac{1}{\sqrt{2\pi}} e^{-\frac{(x - n)^2}{2}}.$$
These converge pointwise to the function
$$f(x) = 0 \text{ for every } x \in \mathbb{R}.$$
We have $\int_{\mathbb{R}} f_n \, d\lambda_0 = 1$ for every $n$, so $\lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0 = 1$, but $\int_{\mathbb{R}} f \, d\lambda_0 = 0$.

Example 1.8 Consider a sequence of normal densities, each with mean $0$ and the $n$-th having variance $\frac{1}{n}$:
$$f_n(x) \triangleq \sqrt{\frac{n}{2\pi}} \, e^{-\frac{n x^2}{2}}.$$
These converge pointwise to the function
$$f(x) \triangleq \begin{cases} \infty, & \text{if } x = 0, \\ 0, & \text{if } x \ne 0. \end{cases}$$
We have again $\int_{\mathbb{R}} f_n \, d\lambda_0 = 1$ for every $n$, so $\lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0 = 1$, but $\int_{\mathbb{R}} f \, d\lambda_0 = 0$. The function $f$ is not the Dirac delta; the Lebesgue integral of this function was already seen in Example 1.6 to be zero.

Theorem 3.1 (Fatou's Lemma) Let $f_n$, $n = 1, 2, \ldots$, be a sequence of nonnegative functions converging pointwise to a function $f$. Then
$$\int_{\mathbb{R}} f \, d\lambda_0 \le \liminf_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0.$$
If $\lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0$ is defined, then Fatou's Lemma has the simpler conclusion
$$\int_{\mathbb{R}} f \, d\lambda_0 \le \lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0.$$
This is the case in Examples 1.7 and 1.8, where
$$\lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0 = 1,$$
while $\int_{\mathbb{R}} f \, d\lambda_0 = 0$. We could modify either Example 1.7 or 1.8 by setting $g_n = f_n$ if $n$ is even, but $g_n = 2 f_n$ if $n$ is odd. Now $\int_{\mathbb{R}} g_n \, d\lambda_0 = 1$ if $n$ is even, but $\int_{\mathbb{R}} g_n \, d\lambda_0 = 2$ if $n$ is odd. The sequence $\left\{ \int_{\mathbb{R}} g_n \, d\lambda_0 \right\}_{n=1}^{\infty}$ has two cluster points, $1$ and $2$. By definition, the smaller one, $1$, is $\liminf_{n \to \infty} \int_{\mathbb{R}} g_n \, d\lambda_0$ and the larger one, $2$, is $\limsup_{n \to \infty} \int_{\mathbb{R}} g_n \, d\lambda_0$. Fatou's Lemma guarantees that even the smaller cluster point will be greater than or equal to the integral of the limiting function. The key assumption in Fatou's Lemma is that all the functions take only nonnegative values. Fatou's Lemma does not assume much but it is not very satisfying because it does not conclude that
$$\int_{\mathbb{R}} f \, d\lambda_0 = \lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0.$$
There are two sets of assumptions which permit this stronger conclusion.

Theorem 3.2 (Monotone Convergence Theorem) Let $f_n$, $n = 1, 2, \ldots$, be a sequence of functions converging pointwise to a function $f$. Assume that
$$0 \le f_1(x) \le f_2(x) \le f_3(x) \le \cdots \text{ for every } x \in \mathbb{R}.$$
Then
$$\int_{\mathbb{R}} f \, d\lambda_0 = \lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0,$$
where both sides are allowed to be $\infty$.

Theorem 3.3 (Dominated Convergence Theorem) Let $f_n$, $n = 1, 2, \ldots$, be a sequence of functions, which may take either positive or negative values, converging pointwise to a function $f$. Assume that there is a nonnegative integrable function $g$ (i.e., $\int_{\mathbb{R}} g \, d\lambda_0 < \infty$) such that
$$|f_n(x)| \le g(x) \text{ for every } x \in \mathbb{R} \text{ for every } n.$$
Then
$$\int_{\mathbb{R}} f \, d\lambda_0 = \lim_{n \to \infty} \int_{\mathbb{R}} f_n \, d\lambda_0,$$
and both sides will be finite.

1.4 General Probability Spaces
Definition 1.13 A probability space $(\Omega, \mathcal{F}, P)$ consists of three objects:

(i) $\Omega$, a nonempty set, called the sample space, which contains all possible outcomes of some random experiment;

(ii) $\mathcal{F}$, a $\sigma$-algebra of subsets of $\Omega$;

(iii) $P$, a probability measure on $(\Omega, \mathcal{F})$, i.e., a function mapping $\mathcal{F}$ into $[0, 1]$ with the properties of Definition 1.1.
Remark 1.1 We recall from Homework Problem 1.4 that a probability measure has the followingproperties:
(a) $P(\emptyset) = 0$.

(b) (Countable additivity) If $A_1, A_2, \ldots$ is a sequence of disjoint sets in $\mathcal{F}$, then
$$P\Big( \bigcup_{k=1}^{\infty} A_k \Big) = \sum_{k=1}^{\infty} P(A_k).$$

(c) (Finite additivity) If $n$ is a positive integer and $A_1, \ldots, A_n$ are disjoint sets in $\mathcal{F}$, then
$$P(A_1 \cup \cdots \cup A_n) = P(A_1) + \cdots + P(A_n).$$

(d) If $A$ and $B$ are sets in $\mathcal{F}$ and $A \subset B$, then
$$P(B) = P(A) + P(B \setminus A).$$
In particular, $P(A) \le P(B)$.

(e) (Continuity from below.) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$ with $A_1 \subset A_2 \subset \cdots$, then
$$P\Big( \bigcup_{k=1}^{\infty} A_k \Big) = \lim_{n \to \infty} P(A_n).$$

(f) (Continuity from above.) If $A_1, A_2, \ldots$ is a sequence of sets in $\mathcal{F}$ with $A_1 \supset A_2 \supset \cdots$, then
$$P\Big( \bigcap_{k=1}^{\infty} A_k \Big) = \lim_{n \to \infty} P(A_n).$$

We have already seen some examples of finite probability spaces. We repeat these and give some examples of infinite probability spaces as well.
Example 1.9 Finite coin toss space.

Toss a coin $n$ times, so that $\Omega$ is the set of all sequences of $H$ and $T$ which have $n$ components. We will use this space quite a bit, and so give it a name: $\Omega_n$. Let $\mathcal{F}$ be the collection of all subsets of $\Omega_n$. Suppose the probability of $H$ on each toss is $p$, a number between zero and one. Then the probability of $T$ is $q \triangleq 1 - p$. For each $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ in $\Omega_n$, we define
$$P\{\omega\} \triangleq p^{\text{number of } H \text{ in } \omega} \cdot q^{\text{number of } T \text{ in } \omega}.$$
For each $A \in \mathcal{F}$, we define
$$P(A) \triangleq \sum_{\omega \in A} P\{\omega\}. \tag{4.1}$$
We can define $P(A)$ this way because $A$ has only finitely many elements, and so only finitely many terms appear in the sum on the right-hand side of (4.1).
Example 1.10 Infinite coin toss space.

Toss a coin repeatedly without stopping, so that $\Omega$ is the set of all nonterminating sequences of $H$ and $T$. We call this space $\Omega_\infty$. This is an uncountably infinite space, and we need to exercise some care in the construction of the $\sigma$-algebra we will use here.

For each positive integer $n$, we define $\mathcal{F}_n$ to be the $\sigma$-algebra determined by the first $n$ tosses. For example, $\mathcal{F}_2$ contains four basic sets,
$$A_{HH} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_1 = H, \omega_2 = H\} = \text{the set of all sequences which begin with } HH,$$
$$A_{HT} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_1 = H, \omega_2 = T\} = \text{the set of all sequences which begin with } HT,$$
$$A_{TH} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_1 = T, \omega_2 = H\} = \text{the set of all sequences which begin with } TH,$$
$$A_{TT} \triangleq \{\omega = (\omega_1, \omega_2, \omega_3, \ldots) : \omega_1 = T, \omega_2 = T\} = \text{the set of all sequences which begin with } TT.$$
Because $\mathcal{F}_2$ is a $\sigma$-algebra, we must also put into it the sets $\emptyset$, $\Omega$, and all unions of the four basic sets.

In the $\sigma$-algebra $\mathcal{F}$, we put every set in every $\sigma$-algebra $\mathcal{F}_n$, where $n$ ranges over the positive integers. We also put in every other set which is required to make $\mathcal{F}$ be a $\sigma$-algebra. For example, the set containing the single sequence
$$\{HHHHH \cdots\} = \{H \text{ on every toss}\}$$
is not in any of the $\mathcal{F}_n$ $\sigma$-algebras, because it depends on all the components of the sequence and not just the first $n$ components. However, for each positive integer $n$, the set $\{H \text{ on the first } n \text{ tosses}\}$ is in $\mathcal{F}_n$ and hence in $\mathcal{F}$. Therefore,
$$\{H \text{ on every toss}\} = \bigcap_{n=1}^{\infty} \{H \text{ on the first } n \text{ tosses}\}$$
is also in $\mathcal{F}$.

We next construct the probability measure $P$ on $(\Omega_\infty, \mathcal{F})$.
Let us now consider a set $A \in \mathcal{F}$ for which there is no positive integer $n$ such that $A \in \mathcal{F}_n$. Such is the case for the set $\{H \text{ on every toss}\}$. To determine the probability of these sets, we write them in terms of sets which are in $\mathcal{F}_n$ for positive integers $n$, and then use the properties of probability measures listed in Remark 1.1. For example,
$$\{H \text{ on the first toss}\} \supset \{H \text{ on the first two tosses}\} \supset \{H \text{ on the first three tosses}\} \supset \cdots,$$
and
$$\bigcap_{n=1}^{\infty} \{H \text{ on the first } n \text{ tosses}\} = \{H \text{ on every toss}\}.$$
According to Remark 1.1(f) (continuity from above),
$$P\{H \text{ on every toss}\} = \lim_{n \to \infty} P\{H \text{ on the first } n \text{ tosses}\} = \lim_{n \to \infty} p^n.$$
If $p = 1$, then $P\{H \text{ on every toss}\} = 1$; otherwise, $P\{H \text{ on every toss}\} = 0$.

A similar argument shows that if $0 < p < 1$ so that $0 < q < 1$, then every set in $\Omega_\infty$ which contains only one element (nonterminating sequence of $H$ and $T$) has probability zero, and hence every set which contains countably many elements also has probability zero. We are in a case very similar to Lebesgue measure: every point has measure zero, but sets can have positive measure. Of course, the only sets which can have positive probability in $\Omega_\infty$ are those which contain uncountably many elements.

In the infinite coin toss space, we define a sequence of random variables $Y_1, Y_2, \ldots$ by
$$Y_k(\omega) \triangleq \begin{cases} 1, & \text{if } \omega_k = H, \\ 0, & \text{if } \omega_k = T, \end{cases}$$
and we also define the random variable
$$X(\omega) = \sum_{k=1}^{\infty} \frac{Y_k(\omega)}{2^k}.$$
Since each $Y_k$ is either zero or one, $X$ takes values in the interval $[0, 1]$. Indeed, $X(TTTT \cdots) = 0$, $X(HHHH \cdots) = 1$, and the other values of $X$ lie in between. We define a dyadic rational number to be a number of the form $\frac{m}{2^k}$, where $k$ and $m$ are integers. For example, $\frac{3}{4}$ is a dyadic rational. Every dyadic rational in $(0, 1)$ corresponds to two sequences $\omega \in \Omega_\infty$. For example,
$$X(HHTTTTT \cdots) = X(HTHHHHH \cdots) = \tfrac{3}{4}.$$
The numbers in $(0, 1)$ which are not dyadic rationals correspond to a single $\omega \in \Omega_\infty$; these numbers have a unique binary expansion.
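The map $\omega \mapsto X(\omega)$ can be sketched for finite prefixes of a toss sequence (the truncation is my device; the text's $X$ uses the full infinite sequence):

```python
def X(tosses):
    """Truncated version of X(omega) = sum_k Y_k(omega) / 2^k, where
    Y_k = 1 for H and 0 for T; tosses beyond the given prefix count as T."""
    return sum(1 / 2 ** (k + 1) for k, t in enumerate(tosses) if t == "H")

# The dyadic rational 3/4 has two coin-sequence representations:
print(X("HH"))              # 0.75, the prefix of HHTTTT...
print(X("HT" + "H" * 60))   # approaches 0.75 = 1/2 + 1/8 + 1/16 + ..., from HTHHHH...
```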
Whenever we place a probability measure $P$ on $(\Omega_\infty, \mathcal{F})$, we have a corresponding induced measure $\mathcal{L}_X$ on $[0, 1]$. For example, if we set $p = q = \frac{1}{2}$ in the construction of this example, then we have
$$\mathcal{L}_X\left[0, \tfrac{1}{2}\right] = P\{\text{First toss is } T\} = \tfrac{1}{2},$$
$$\mathcal{L}_X\left[\tfrac{1}{2}, 1\right] = P\{\text{First toss is } H\} = \tfrac{1}{2},$$
$$\mathcal{L}_X\left[0, \tfrac{1}{4}\right] = P\{\text{First two tosses are } TT\} = \tfrac{1}{4},$$
$$\mathcal{L}_X\left[\tfrac{1}{4}, \tfrac{1}{2}\right] = P\{\text{First two tosses are } TH\} = \tfrac{1}{4},$$
$$\mathcal{L}_X\left[\tfrac{1}{2}, \tfrac{3}{4}\right] = P\{\text{First two tosses are } HT\} = \tfrac{1}{4},$$
$$\mathcal{L}_X\left[\tfrac{3}{4}, 1\right] = P\{\text{First two tosses are } HH\} = \tfrac{1}{4}.$$
Continuing this process, we can verify that for any positive integers $k$ and $m$ satisfying
$$0 \le \frac{m - 1}{2^k} < \frac{m}{2^k} \le 1,$$
we have
$$\mathcal{L}_X\left[ \frac{m - 1}{2^k}, \frac{m}{2^k} \right] = \frac{1}{2^k}.$$
In other words, the $\mathcal{L}_X$-measure of all intervals in $[0, 1]$ whose endpoints are dyadic rationals is the same as the Lebesgue measure of these intervals. The only way this can be is for $\mathcal{L}_X$ to be Lebesgue measure.
It is interesting to consider what $\mathcal{L}_X$ would look like if we take a value of $p$ other than $\frac{1}{2}$ when we construct the probability measure $P$ on $\Omega_\infty$.

We conclude this example with another look at the Cantor set of Example 1.4. Let $\Omega_{\text{pairs}}$ be the subset of $\Omega_\infty$ in which every even-numbered toss is the same as the odd-numbered toss immediately preceding it. For example, $HHTTTTHH$ is the beginning of a sequence in $\Omega_{\text{pairs}}$, but $HT$ is not. Consider now the set of real numbers
$$C' \triangleq \{X(\omega) : \omega \in \Omega_{\text{pairs}}\}.$$
The numbers between $\frac{1}{4}$ and $\frac{1}{2}$ can be written as $X(\omega)$, but the sequence $\omega$ must begin with either $TH$ or $HT$. Therefore, none of these numbers is in $C'$. Similarly, the numbers between $\frac{1}{16}$ and $\frac{3}{16}$ can be written as $X(\omega)$, but the sequence $\omega$ must begin with $TTTH$ or $TTHT$, so none of these numbers is in $C'$. Continuing this process, we see that $C'$ will not contain any of the numbers which were removed in the construction of the Cantor set $C$ in Example 1.4. In other words, $C' \subset C$. With a bit more work, one can convince oneself that in fact $C' = C$, i.e., by requiring consecutive coin tosses to be paired, we are removing exactly those points in $[0, 1]$ which were removed in the Cantor set construction of Example 1.4.
In addition to tossing a coin, another common random experiment is to pick a number, perhapsusing a random number generator. Here are some probability spaces which correspond to differentways of picking a number at random.
Example 1.11 Suppose we choose a number from $\mathbb{R}$ in such a way that we are sure to get either $1$, $4$ or $16$. Furthermore, we construct the experiment so that the probability of getting $1$ is $\frac{4}{9}$, the probability of getting $4$ is $\frac{4}{9}$, and the probability of getting $16$ is $\frac{1}{9}$. We describe this random experiment by taking $\Omega$ to be $\mathbb{R}$, $\mathcal{F}$ to be $\mathcal{B}(\mathbb{R})$, and setting up the probability measure so that
$$P\{1\} = \tfrac{4}{9}, \quad P\{4\} = \tfrac{4}{9}, \quad P\{16\} = \tfrac{1}{9}.$$
This determines $P(A)$ for every set $A \in \mathcal{B}(\mathbb{R})$. For example, the probability of the interval $(0, 5]$ is $\frac{8}{9}$, because this interval contains the numbers $1$ and $4$, but not the number $16$.

The probability measure described in this example is $\mathcal{L}_{S_2}$, the measure induced by the stock price $S_2$, when the initial stock price $S_0 = 4$ and the probability of $H$ is $\frac{1}{3}$. This distribution was discussed immediately following Definition 2.8.

Example 1.12 Uniform distribution on $[0, 1]$.

Let $\Omega = [0, 1]$ and let $\mathcal{F} = \mathcal{B}[0, 1]$, the collection of all Borel subsets contained in $[0, 1]$. For each Borel set $A \subset [0, 1]$, we define $P(A) = \lambda_0(A)$ to be the Lebesgue measure of the set. Because $\lambda_0[0, 1] = 1$, this gives us a probability measure.

This probability space corresponds to the random experiment of choosing a number from $[0, 1]$ so that every number is "equally likely" to be chosen. Since there are infinitely many numbers in $[0, 1]$, this requires that every number have probability zero of being chosen. Nonetheless, we can speak of the probability that the number chosen lies in a particular set, and if the set has uncountably many points, then this probability can be positive.

I know of no way to design a physical experiment which corresponds to choosing a number at random from $[0, 1]$ so that each number is equally likely to be chosen, just as I know of no way to toss a coin infinitely many times. Nonetheless, both Examples 1.10 and 1.12 provide probability spaces which are often useful approximations to reality.
Example 1.13 Standard normal distribution.

Define the standard normal density
$$\varphi(x) \triangleq \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}.$$
Let $\Omega = \mathbb{R}$, $\mathcal{F} = \mathcal{B}(\mathbb{R})$, and for every Borel set $A \subset \mathbb{R}$, define
$$P(A) \triangleq \int_A \varphi \, d\lambda_0. \tag{4.2}$$
If $A$ in (4.2) is an interval $[a, b]$, then we can write (4.2) as the less mysterious Riemann integral:
$$P[a, b] = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx.$$
This corresponds to choosing a point at random on the real line, and every single point has probability zero of being chosen, but if a set $A$ is given, then the probability the point is in that set is given by (4.2).

The construction of the integral in a general probability space follows the same steps as the construction of the Lebesgue integral. We repeat this construction below.
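The Riemann form of (4.2) can be evaluated numerically. A minimal sketch using a midpoint sum (the discretization is my choice, not the text's):

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def P_interval(a, b, n=100_000):
    """P[a, b] from (4.2), approximated by a midpoint Riemann sum with n cells."""
    h = (b - a) / n
    return h * sum(phi(a + (i + 0.5) * h) for i in range(n))

print(P_interval(-1, 1))   # about 0.6827, the familiar one-standard-deviation mass
```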
Definition 1.14 Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $X$ be a random variable on this space, i.e., a mapping from $\Omega$ to $\mathbb{R}$, possibly also taking the values $\pm\infty$.

• If $X$ is an indicator, i.e.,
$$X(\omega) = \mathbb{1}_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A, \\ 0, & \text{if } \omega \notin A, \end{cases}$$
for some set $A \in \mathcal{F}$, we define
$$\int_\Omega X \, dP \triangleq P(A).$$

• If $X$ is a simple function, i.e.,
$$X(\omega) = \sum_{k=1}^{n} c_k \mathbb{1}_{A_k}(\omega),$$
where each $c_k$ is a real number and each $A_k$ is a set in $\mathcal{F}$, we define
$$\int_\Omega X \, dP \triangleq \sum_{k=1}^{n} c_k \int_\Omega \mathbb{1}_{A_k} \, dP = \sum_{k=1}^{n} c_k P(A_k).$$

• If $X$ is nonnegative but otherwise general, we define
$$\int_\Omega X \, dP \triangleq \sup \left\{ \int_\Omega Y \, dP : Y \text{ is simple and } Y(\omega) \le X(\omega) \text{ for every } \omega \in \Omega \right\}.$$
In fact, we can always construct a sequence of simple functions $Y_n$, $n = 1, 2, \ldots$, such that
$$0 \le Y_1(\omega) \le Y_2(\omega) \le Y_3(\omega) \le \cdots \text{ for every } \omega \in \Omega,$$
and $X(\omega) = \lim_{n \to \infty} Y_n(\omega)$ for every $\omega \in \Omega$. With this sequence, we can define
$$\int_\Omega X \, dP \triangleq \lim_{n \to \infty} \int_\Omega Y_n \, dP.$$
CHAPTER 1. Introduction to Probability Theory 37D If is integrable, i.e, v 3 0b) : 0b)where v d\aPxyozK{U d\al)$:Z3) }\aPxyzK{;U3 d\al)$:Z3)then we define 3 x v 3 b1 : R
If $A$ is a set in $\mathcal{F}$ and $X$ is a random variable, we define
$$\int_A X\, dP = \int_\Omega 1_A X\, dP.$$
The expectation of a random variable $X$ is defined to be
$$EX = \int_\Omega X\, dP.$$
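On a finite probability space, the integral of Definition 1.14 reduces to a weighted sum, which can be sketched directly. The two-toss space and the biased probability $p = 1/3$ below are illustrative choices, not from the text.

```python
from fractions import Fraction

# A finite probability space: two coin tosses with P(H) = 1/3 on each toss.
p = Fraction(1, 3)
P = {"HH": p * p, "HT": p * (1 - p), "TH": (1 - p) * p, "TT": (1 - p) * (1 - p)}

def integral(X, P, A=None):
    """Integral of the random variable X over the set A (default: all of Omega),
    i.e. the sum of X(omega) * P(omega) -- Definition 1.14 on a finite space."""
    if A is None:
        A = P.keys()
    return sum(X(w) * P[w] for w in A)

X = lambda w: w.count("H")       # number of heads
print(integral(X, P))            # EX = 2p = 2/3
print(integral(X, P, A={"HH"}))  # integral over a single set: 2 * 1/9 = 2/9
```

Exact rational arithmetic via `Fraction` makes the linearity and additivity properties below easy to verify without rounding error.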
The above integral has all the linearity and comparison properties one would expect. In particular, if $X$ and $Y$ are random variables and $c$ is a real constant, then
$$\int_\Omega (X + Y)\, dP = \int_\Omega X\, dP + \int_\Omega Y\, dP,$$
$$\int_\Omega cX\, dP = c \int_\Omega X\, dP.$$
If $X(\omega) \le Y(\omega)$ for every $\omega \in \Omega$, then
$$\int_\Omega X\, dP \le \int_\Omega Y\, dP.$$
In fact, we don't need to have $X(\omega) \le Y(\omega)$ for every $\omega \in \Omega$ in order to reach this conclusion; it is enough if the set of $\omega$ for which $X(\omega) \le Y(\omega)$ has probability one. When a condition holds with probability one, we say it holds almost surely. Finally, if $A$ and $B$ are disjoint subsets of $\Omega$ and $X$ is a random variable, then
$$\int_{A \cup B} X\, dP = \int_A X\, dP + \int_B X\, dP.$$
We restate the Lebesgue integral convergence theorems in this more general context. We acknowledge in these statements that conditions don't need to hold for every $\omega$; almost surely is enough.

Theorem 4.4 (Fatou's Lemma) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of almost surely nonnegative random variables converging almost surely to a random variable $X$. Then
$$\int_\Omega X\, dP \le \liminf_{n\to\infty} \int_\Omega X_n\, dP,$$
or equivalently,
$$EX \le \liminf_{n\to\infty} EX_n.$$
Theorem 4.5 (Monotone Convergence Theorem) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of random variables converging almost surely to a random variable $X$. Assume that
$$0 \le X_1 \le X_2 \le X_3 \le \cdots \text{ almost surely.}$$
Then
$$\int_\Omega X\, dP = \lim_{n\to\infty} \int_\Omega X_n\, dP,$$
or equivalently,
$$EX = \lim_{n\to\infty} EX_n.$$

Theorem 4.6 (Dominated Convergence Theorem) Let $X_n$, $n = 1, 2, \ldots$ be a sequence of random variables, converging almost surely to a random variable $X$. Assume that there exists a random variable $Y$ such that
$$|X_n| \le Y \text{ almost surely for every } n.$$
Then
$$\int_\Omega X\, dP = \lim_{n\to\infty} \int_\Omega X_n\, dP,$$
or equivalently,
$$EX = \lim_{n\to\infty} EX_n.$$

In Example 1.13, we constructed a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ by integrating the standard normal density. In fact, whenever $\varphi$ is a nonnegative function defined on $\mathbb{R}$ satisfying $\int_{\mathbb{R}} \varphi\, d\lambda_0 = 1$, we call $\varphi$ a density and we can define an associated probability measure by
$$P(A) = \int_A \varphi\, d\lambda_0 \text{ for every } A \in \mathcal{B}(\mathbb{R}). \tag{4.3}$$
We shall often have a situation in which two measures are related by an equation like (4.3). In fact, the market measure and the risk-neutral measures in financial markets are related this way. We say that $\varphi$ in (4.3) is the Radon-Nikodym derivative of $P$ with respect to $\lambda_0$, and we write
$$\varphi = \frac{dP}{d\lambda_0}. \tag{4.4}$$
The probability measure $P$ weights different parts of the real line according to the density $\varphi$. Now suppose $g$ is a function on $(\mathbb{R}, \mathcal{B}(\mathbb{R}), P)$. Definition 1.14 gives us a value for the abstract integral
$$\int_{\mathbb{R}} g\, dP.$$
We can also evaluate
$$\int_{\mathbb{R}} g\varphi\, d\lambda_0,$$
which is an integral with respect to Lebesgue measure over the real line. We want to show that
$$\int_{\mathbb{R}} g\, dP = \int_{\mathbb{R}} g\varphi\, d\lambda_0, \tag{4.5}$$
an equation which is suggested by the notation introduced in (4.4) (substitute $\frac{dP}{d\lambda_0}$ for $\varphi$ in (4.5) and "cancel" the $d\lambda_0$'s). We include a proof of this because it allows us to illustrate the concept of the standard machine explained in Williams's book, Section 5.12, page 5.

The standard machine argument proceeds in four steps.
Step 1. Assume that $g$ is an indicator function, i.e., $g(x) = 1_A(x)$ for some Borel set $A \subseteq \mathbb{R}$. In that case, (4.5) becomes
$$P(A) = \int_A \varphi\, d\lambda_0.$$
This is true because it is the definition of $P(A)$.
Step 2. Now that we know that (4.5) holds when $g$ is an indicator function, assume that $g$ is a simple function, i.e., a linear combination of indicator functions. In other words,
$$g(x) = \sum_{k=1}^n c_k h_k(x),$$
where each $c_k$ is a real number and each $h_k$ is an indicator function. Then
$$\int_{\mathbb{R}} g\, dP = \int_{\mathbb{R}} \left[\sum_{k=1}^n c_k h_k\right] dP = \sum_{k=1}^n c_k \int_{\mathbb{R}} h_k\, dP = \sum_{k=1}^n c_k \int_{\mathbb{R}} h_k \varphi\, d\lambda_0 = \int_{\mathbb{R}} \left[\sum_{k=1}^n c_k h_k\right] \varphi\, d\lambda_0 = \int_{\mathbb{R}} g\varphi\, d\lambda_0.$$
Step 3. Now that we know that (4.5) holds when $g$ is a simple function, we consider a general nonnegative function $g$. We can always construct a sequence of nonnegative simple functions $g_n$, $n = 1, 2, \ldots$, such that
$$0 \le g_1(x) \le g_2(x) \le g_3(x) \le \cdots \text{ for every } x \in \mathbb{R},$$
and $g(x) = \lim_{n\to\infty} g_n(x)$ for every $x \in \mathbb{R}$. We have already proved that
$$\int_{\mathbb{R}} g_n\, dP = \int_{\mathbb{R}} g_n \varphi\, d\lambda_0 \text{ for every } n.$$
We let $n \to \infty$ and use the Monotone Convergence Theorem on both sides of this equality to get
$$\int_{\mathbb{R}} g\, dP = \int_{\mathbb{R}} g\varphi\, d\lambda_0.$$
Step 4. In the last step, we consider an integrable function $g$, which can take both positive and negative values. By integrable, we mean that
$$\int_{\mathbb{R}} g^+\, dP < \infty, \qquad \int_{\mathbb{R}} g^-\, dP < \infty.$$
From Step 3, we have
$$\int_{\mathbb{R}} g^+\, dP = \int_{\mathbb{R}} g^+ \varphi\, d\lambda_0, \qquad \int_{\mathbb{R}} g^-\, dP = \int_{\mathbb{R}} g^- \varphi\, d\lambda_0.$$
Subtracting these two equations, we obtain the desired result:
$$\int_{\mathbb{R}} g\, dP = \int_{\mathbb{R}} g^+\, dP - \int_{\mathbb{R}} g^-\, dP = \int_{\mathbb{R}} g^+ \varphi\, d\lambda_0 - \int_{\mathbb{R}} g^- \varphi\, d\lambda_0 = \int_{\mathbb{R}} g\varphi\, d\lambda_0.$$
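Identity (4.5) can be watched numerically. The sketch below (not from the text) takes $\varphi$ to be the standard normal density of Example 1.13 and the illustrative choice $g(x) = x^2$: the right side of (4.5) is approximated by a Riemann sum, the left side by Monte Carlo sampling from $P$, and both should come out near $E[X^2] = 1$.

```python
import math
import random

def phi(x):
    """Standard normal density, the Radon-Nikodym derivative dP/dlambda_0."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

g = lambda x: x * x

# Right side of (4.5): Riemann approximation of the Lebesgue integral of g*phi.
n, a, b = 200_000, -10.0, 10.0
h = (b - a) / n
lebesgue_side = sum(g(a + (i + 0.5) * h) * phi(a + (i + 0.5) * h)
                    for i in range(n)) * h

# Left side of (4.5): the abstract integral of g with respect to P, i.e.
# E[g(X)] for X ~ N(0,1), estimated by Monte Carlo.
random.seed(0)
m = 200_000
p_side = sum(g(random.gauss(0, 1)) for _ in range(m)) / m

print(lebesgue_side, p_side)  # both close to 1
```

The agreement is only statistical on the Monte Carlo side, of course; the exact statement is the four-step proof above.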
1.5 Independence
In this section, we define and discuss the notion of independence in a general probability space $(\Omega, \mathcal{F}, P)$, although most of the examples we give will be for coin toss space.

1.5.1 Independence of sets
Definition 1.15 We say that two sets $A \in \mathcal{F}$ and $B \in \mathcal{F}$ are independent if
$$P(A \cap B) = P(A)P(B).$$
Suppose a random experiment is conducted, and $\omega$ is the outcome. The probability that $\omega \in A$ is $P(A)$. Suppose you are not told $\omega$, but you are told that $\omega \in B$. Conditional on this information, the probability that $\omega \in A$ is
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
The sets $A$ and $B$ are independent if and only if this conditional probability is the unconditional probability $P(A)$, i.e., knowing that $\omega \in B$ does not change the probability you assign to $A$. This discussion is symmetric with respect to $A$ and $B$; if $A$ and $B$ are independent and you know that $\omega \in A$, the conditional probability you assign to $B$ is still the unconditional probability $P(B)$.

Whether two sets are independent depends on the probability measure $P$. For example, suppose we toss a coin twice, with probability $p$ for $H$ and probability $q = 1 - p$ for $T$ on each toss. To avoid trivialities, we assume that $0 < p < 1$. Then
$$P\{HH\} = p^2, \quad P\{HT\} = P\{TH\} = pq, \quad P\{TT\} = q^2. \tag{5.1}$$
Let $A = \{HH, HT\}$ and $B = \{HT, TH\}$. In words, $A$ is the set "$H$ on the first toss" and $B$ is the set "one $H$ and one $T$". Then $A \cap B = \{HT\}$. We compute
$$P(A) = p^2 + pq = p, \qquad P(B) = 2pq, \qquad P(A \cap B) = pq.$$
These sets are independent if and only if $pq = p \cdot 2pq$, which is the case if and only if $p = \frac{1}{2}$.

If $p = \frac{1}{2}$, then $P(B)$, the probability of one head and one tail, is $\frac{1}{2}$. If you are told that the coin tosses resulted in a head on the first toss, the probability of $B$, which is now the probability of a $T$ on the second toss, is still $\frac{1}{2}$.

Suppose however that $p = 0.01$. By far the most likely outcome of the two coin tosses is $TT$, and the probability of one head and one tail is quite small; in fact, $P(B) = 0.0198$. However, if you are told that the first toss resulted in $H$, it becomes very likely that the two tosses result in one head and one tail. In fact, conditioned on getting an $H$ on the first toss, the probability of one $H$ and one $T$ is the probability of a $T$ on the second toss, which is $0.99$.

1.5.2 Independence of $\sigma$-algebras

Definition 1.16 Let $\mathcal{G}$ and $\mathcal{H}$ be sub-$\sigma$-algebras of $\mathcal{F}$. We say that $\mathcal{G}$ and $\mathcal{H}$ are independent if every set in $\mathcal{G}$ is independent of every set in $\mathcal{H}$, i.e.,
$$P(A \cap B) = P(A)P(B) \text{ for every } A \in \mathcal{H},\ B \in \mathcal{G}.$$

Example 1.14 Toss a coin twice, and let $P$ be given by (5.1). Let $\mathcal{G}$ be the $\sigma$-algebra determined by the first toss: $\mathcal{G}$ contains the sets
$$\emptyset,\ \Omega,\ \{HH, HT\},\ \{TH, TT\}.$$
Let $\mathcal{H}$ be the $\sigma$-algebra determined by the second toss: $\mathcal{H}$ contains the sets
$$\emptyset,\ \Omega,\ \{HH, TH\},\ \{HT, TT\}.$$
These two $\sigma$-algebras are independent. For example, if we choose the set $\{HH, HT\}$ from $\mathcal{G}$ and the set $\{HH, TH\}$ from $\mathcal{H}$, then we have
$$P(\{HH, HT\} \cap \{HH, TH\}) = P\{HH\} = p^2,$$
$$P\{HH, HT\} \cdot P\{HH, TH\} = (p^2 + pq)(p^2 + pq) = p \cdot p = p^2.$$
No matter which set we choose in $\mathcal{G}$ and which set we choose in $\mathcal{H}$, we will find that the product of the probabilities is the probability of the intersection.
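Because the two-toss space is finite, the claim of Example 1.14 can be verified exhaustively. The sketch below (not from the text) uses the illustrative choice $p = 1/3$ and checks every pair of sets from the two $\sigma$-algebras with exact rational arithmetic.

```python
from fractions import Fraction

p = Fraction(1, 3)
q = 1 - p
P = {"HH": p * p, "HT": p * q, "TH": q * p, "TT": q * q}  # the measure (5.1)

def prob(A):
    return sum(P[w] for w in A)

Omega = {"HH", "HT", "TH", "TT"}
G = [set(), {"HH", "HT"}, {"TH", "TT"}, Omega]  # determined by the first toss
H = [set(), {"HH", "TH"}, {"HT", "TT"}, Omega]  # determined by the second toss

independent = all(prob(A & B) == prob(A) * prob(B) for A in G for B in H)
print(independent)  # True: every set in G is independent of every set in H
```

Replacing `P` by a non-product measure (such as the one in Example 1.15 below) makes the check fail, which is exactly the point of that example.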
Example 1.14 illustrates the general principle that when the probability for a sequence of tosses is defined to be the product of the probabilities for the individual tosses of the sequence, then every set depending on a particular toss will be independent of every set depending on a different toss. We say that the different tosses are independent when we construct probabilities this way. It is also possible to construct probabilities such that the different tosses are not independent, as shown by the following example.
Example 1.15 Define $P$ for the individual elements of $\Omega = \{HH, HT, TH, TT\}$ to be
$$P\{HH\} = \tfrac{1}{2}, \quad P\{HT\} = \tfrac{1}{4}, \quad P\{TH\} = \tfrac{1}{8}, \quad P\{TT\} = \tfrac{1}{8},$$
and for every set $A \subseteq \Omega$, define $P(A)$ to be the sum of the probabilities of the elements in $A$. Then $P(\Omega) = 1$, so $P$ is a probability measure. Note that the sets $\{H \text{ on first toss}\} = \{HH, HT\}$ and $\{H \text{ on second toss}\} = \{HH, TH\}$ have probabilities $P\{HH, HT\} = \tfrac{3}{4}$ and $P\{HH, TH\} = \tfrac{5}{8}$, so the product of the probabilities is $\tfrac{15}{32}$. On the other hand, the intersection of $\{HH, HT\}$ and $\{HH, TH\}$ contains the single element $HH$, which has probability $\tfrac{1}{2}$. These sets are not independent.
1.5.3 Independence of random variables
Definition 1.17 We say that two random variables $X$ and $Y$ are independent if the $\sigma$-algebras they generate, $\sigma(X)$ and $\sigma(Y)$, are independent.

In the probability space of three independent coin tosses, the price $S_2$ of the stock at time 2 is independent of $S_3/S_2$. This is because $S_2$ depends on only the first two coin tosses, whereas $S_3/S_2$ is either $u$ or $d$, depending on whether the third coin toss is $H$ or $T$.

Definition 1.17 says that for independent random variables $X$ and $Y$, every set defined in terms of $X$ is independent of every set defined in terms of $Y$. In the case of $S_2$ and $S_3/S_2$ just considered, for example, the sets $\{S_2 = udS_0\} = \{HTH, HTT, THH, THT\}$ and $\{S_3/S_2 = u\} = \{HHH, HTH, THH, TTH\}$ are independent sets.
Suppose $X$ and $Y$ are independent random variables. We defined earlier the measure induced by $X$ on $\mathbb{R}$ to be
$$\mathcal{L}_X(A) = P\{X \in A\}, \quad A \subseteq \mathbb{R}.$$
Similarly, the measure induced by $Y$ is
$$\mathcal{L}_Y(B) = P\{Y \in B\}, \quad B \subseteq \mathbb{R}.$$
Now the pair $(X, Y)$ takes values in the plane $\mathbb{R}^2$, and we can define the measure induced by the pair,
$$\mathcal{L}_{X,Y}(C) = P\{(X, Y) \in C\}, \quad C \subseteq \mathbb{R}^2.$$
The set $C$ in this last equation is a subset of the plane $\mathbb{R}^2$. In particular, $C$ could be a rectangle, i.e., a set of the form $A \times B$, where $A \subseteq \mathbb{R}$ and $B \subseteq \mathbb{R}$. In this case,
$$\{(X, Y) \in A \times B\} = \{X \in A\} \cap \{Y \in B\},$$
and $X$ and $Y$ are independent if and only if
$$\mathcal{L}_{X,Y}(A \times B) = P(\{X \in A\} \cap \{Y \in B\}) = P\{X \in A\} \cdot P\{Y \in B\} = \mathcal{L}_X(A)\,\mathcal{L}_Y(B). \tag{5.2}$$
In other words, for independent random variables $X$ and $Y$, the joint distribution represented by the measure $\mathcal{L}_{X,Y}$ factors into the product of the marginal distributions represented by the measures $\mathcal{L}_X$ and $\mathcal{L}_Y$.

A joint density for $(X, Y)$ is a nonnegative function $f_{X,Y}(x, y)$ such that
$$\mathcal{L}_{X,Y}(A \times B) = \int_A \int_B f_{X,Y}(x, y)\, dy\, dx.$$
Not every pair of random variables $(X, Y)$ has a joint density, but if a pair does, then the random variables $X$ and $Y$ have marginal densities defined by
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, \eta)\, d\eta, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(\xi, y)\, d\xi.$$
These have the properties
$$\mathcal{L}_X(A) = \int_A f_X(x)\, dx, \qquad \mathcal{L}_Y(B) = \int_B f_Y(y)\, dy.$$
Definition 1.18 Let $X_1, X_2, \ldots$ be a sequence of random variables. We say that these random variables are independent if for every sequence of sets $A_1 \in \sigma(X_1), A_2 \in \sigma(X_2), \ldots$ and for every positive integer $n$,
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2)\cdots P(A_n).$$

1.5.4 Correlation and independence
Theorem 5.8 If two random variables $X$ and $Y$ are independent, and if $g$ and $h$ are functions from $\mathbb{R}$ to $\mathbb{R}$, then
$$E[g(X)h(Y)] = E[g(X)] \cdot E[h(Y)],$$
provided all the expectations are defined.

PROOF: Let $g(x) = 1_A(x)$ and $h(y) = 1_B(y)$ be indicator functions. Then the equation we are trying to prove becomes
$$P(\{X \in A\} \cap \{Y \in B\}) = P\{X \in A\} \cdot P\{Y \in B\},$$
which is true because $X$ and $Y$ are independent. Now use the standard machine to get the result for general functions $g$ and $h$.

The variance of a random variable $X$ is defined to be
$$\mathrm{Var}(X) = E\left[(X - EX)^2\right].$$
The covariance of two random variables $X$ and $Y$ is defined to be
$$\mathrm{Cov}(X, Y) = E\left[(X - EX)(Y - EY)\right] = E[XY] - EX \cdot EY.$$
According to Theorem 5.8, for independent random variables, the covariance is zero. If $X$ and $Y$ both have positive variances, we define their correlation coefficient
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
For independent random variables, the correlation coefficient is zero.
Unfortunately, two random variables can have zero correlation and still not be independent. Consider the following example.
Example 1.16 Let $X$ be a standard normal random variable, let $Z$ be independent of $X$ and have the distribution $P\{Z = 1\} = P\{Z = -1\} = \frac{1}{2}$. Define $Y = XZ$. We show that $Y$ is also a standard normal random variable, $X$ and $Y$ are uncorrelated, but $X$ and $Y$ are not independent.

The last claim is easy to see. If $X$ and $Y$ were independent, so would be $X^2$ and $Y^2$, but in fact, $X^2 = Y^2$ almost surely.
We next check that $Y$ is standard normal. For $y \in \mathbb{R}$, we have
$$P\{Y \le y\} = P\{Y \le y \text{ and } Z = 1\} + P\{Y \le y \text{ and } Z = -1\}$$
$$= P\{X \le y \text{ and } Z = 1\} + P\{-X \le y \text{ and } Z = -1\}$$
$$= P\{X \le y\}\,P\{Z = 1\} + P\{-X \le y\}\,P\{Z = -1\}$$
$$= \tfrac{1}{2}\left[P\{X \le y\} + P\{-X \le y\}\right].$$
Since $X$ is standard normal, $P\{X \le y\} = P\{-X \le y\}$, and we have $P\{Y \le y\} = P\{X \le y\}$, which shows that $Y$ is also standard normal.

Being standard normal, both $X$ and $Y$ have expected value zero. Therefore,
$$\mathrm{Cov}(X, Y) = E[XY] = E[X^2 Z] = E[X^2] \cdot E[Z] = 1 \cdot 0 = 0.$$
Where in $\mathbb{R}^2$ does the measure $\mathcal{L}_{X,Y}$ put its mass, i.e., what is the distribution of $(X, Y)$?

We conclude this section with the observation that for independent random variables, the variance of their sum is the sum of their variances. Indeed, if $X$ and $Y$ are independent and $Z = X + Y$, then
$$\mathrm{Var}(Z) = E\left[(Z - EZ)^2\right] = E\left[(X + Y - EX - EY)^2\right]$$
$$= E\left[(X - EX)^2 + 2(X - EX)(Y - EY) + (Y - EY)^2\right]$$
$$= \mathrm{Var}(X) + 2E\left[(X - EX)(Y - EY)\right] + \mathrm{Var}(Y)$$
$$= \mathrm{Var}(X) + \mathrm{Var}(Y).$$
This argument extends to any finite number of random variables. If we are given independent random variables $X_1, X_2, \ldots, X_n$, then
$$\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n). \tag{5.3}$$
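Example 1.16 lends itself to a quick simulation. The sketch below (not from the text) draws samples of $X$ and $Z$, forms $Y = XZ$, and checks that the sample covariance is near zero even though $X^2 = Y^2$ holds exactly for every sample, ruling out independence.

```python
import random

random.seed(42)
n = 100_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    z = random.choice([1, -1])   # Z independent of X, P{Z=1} = P{Z=-1} = 1/2
    xs.append(x)
    ys.append(x * z)             # Y = XZ

# Sample covariance; since EX = EY = 0, this is just the mean of X*Y.
cov = sum(x * y for x, y in zip(xs, ys)) / n
print(cov)                                          # close to 0: uncorrelated
print(all(x * x == y * y for x, y in zip(xs, ys)))  # True: X^2 = Y^2 always
```

The second printout answers, in miniature, the question posed above: the mass of $\mathcal{L}_{X,Y}$ sits entirely on the two diagonal lines $y = x$ and $y = -x$.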
1.5.5 Independence and conditional expectation.
We now return to property (k) for conditional expectations, presented in the lecture dated October 19, 1995. The property as stated there is taken from Williams's book, page 88; we shall need only the second assertion of the property:
(k) If a random variable $X$ is independent of a $\sigma$-algebra $\mathcal{H}$, then
$$E[X|\mathcal{H}] = EX.$$
The point of this statement is that if $X$ is independent of $\mathcal{H}$, then the best estimate of $X$ based on the information in $\mathcal{H}$ is $EX$, the same as the best estimate of $X$ based on no information.
To show this equality, we observe first that $EX$ is $\mathcal{H}$-measurable, since it is not random. We must also check the partial averaging property
$$\int_A EX\, dP = \int_A X\, dP \text{ for every } A \in \mathcal{H}.$$
If $X$ is an indicator of some set $B$, which by assumption must be independent of $\mathcal{H}$, then the partial averaging equation we must check is
$$\int_A P(B)\, dP = \int_A 1_B\, dP.$$
The left-hand side of this equation is $P(A)P(B)$, and the right-hand side is
$$\int_\Omega 1_A 1_B\, dP = P(A \cap B).$$
The partial averaging equation holds because $A$ and $B$ are independent. The partial averaging equation for general $X$ independent of $\mathcal{H}$ follows by the standard machine.
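On a finite space the partial averaging property can be checked set by set. The sketch below (not from the text) uses three fair, independent tosses, takes $X$ to depend only on the third toss, and verifies partial averaging over the atoms of the $\sigma$-algebra determined by the first toss, of which $X$ is independent.

```python
from fractions import Fraction
from itertools import product

Omega = ["".join(t) for t in product("HT", repeat=3)]
P = {w: Fraction(1, 8) for w in Omega}        # fair, independent tosses

X = lambda w: 1 if w[2] == "H" else 0         # depends only on the third toss
EX = sum(X(w) * P[w] for w in Omega)

# Atoms of the sigma-algebra determined by the first toss.
atoms = [{w for w in Omega if w[0] == "H"},
         {w for w in Omega if w[0] == "T"}]

# Partial averaging: the integral of the constant EX over A equals the
# integral of X over A, for every atom A (hence for every set in the algebra).
ok = all(EX * sum(P[w] for w in A) == sum(X(w) * P[w] for w in A)
         for A in atoms)
print(EX, ok)   # 1/2 True
```

Checking the atoms suffices, since both sides of the partial averaging equation are additive over disjoint unions.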
1.5.6 Law of Large Numbers
There are two fundamental theorems about sequences of independent random variables. Here is the first one.
Theorem 5.9 (Law of Large Numbers) Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed random variables, each with expected value $\mu$ and variance $\sigma^2$. Define the sequence of averages
$$Y_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \quad n = 1, 2, \ldots.$$
Then $Y_n$ converges to $\mu$ almost surely as $n \to \infty$.

We are not going to give the proof of this theorem, but here is an argument which makes it plausible. We will use this argument later when developing stochastic calculus. The argument proceeds in two steps. We first check that $EY_n = \mu$ for every $n$. We next check that $\mathrm{Var}(Y_n) \to 0$ as $n \to \infty$. In other words, the random variables $Y_n$ are increasingly tightly distributed around $\mu$ as $n \to \infty$.

For the first step, we simply compute
$$EY_n = \frac{1}{n}\left[EX_1 + EX_2 + \cdots + EX_n\right] = \frac{1}{n} \cdot \underbrace{\left[\mu + \mu + \cdots + \mu\right]}_{n \text{ times}} = \mu.$$
For the second step, we first recall from (5.3) that the variance of the sum of independent random variables is the sum of their variances. Therefore,
$$\mathrm{Var}(Y_n) = \sum_{k=1}^n \mathrm{Var}\left(\frac{X_k}{n}\right) = \sum_{k=1}^n \frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
As $n \to \infty$, we have $\mathrm{Var}(Y_n) \to 0$.
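The tightening described by $\mathrm{Var}(Y_n) = \sigma^2/n$ is easy to watch numerically. A minimal Monte Carlo sketch (not from the text), using the illustrative choice of Bernoulli tosses with $\mu = 0.3$:

```python
import random

random.seed(7)
mu = 0.3   # Bernoulli(0.3) tosses: EX = 0.3, Var X = 0.21

def average(n):
    """Y_n = (X_1 + ... + X_n) / n for n simulated tosses."""
    return sum(1 if random.random() < mu else 0 for _ in range(n)) / n

# The average for large n hugs mu much more tightly than for small n,
# as Var(Y_n) = sigma^2 / n suggests.
print(average(100))
print(average(100_000))
```

The law itself is a statement about almost sure convergence of the whole sequence, which a finite simulation can only hint at.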
1.5.7 Central Limit Theorem
The Law of Large Numbers is a bit boring because the limit is nonrandom. This is because the denominator $n$ in the definition of $Y_n$ is so large that the variance of $Y_n$ converges to zero. If we want to prevent this, we should divide by $\sqrt{n}$ rather than $n$. In particular, if we again have a sequence of independent, identically distributed random variables, each with expected value $\mu$ and variance $\sigma^2$, but now we set
$$Z_n = \frac{(X_1 - \mu) + (X_2 - \mu) + \cdots + (X_n - \mu)}{\sqrt{n}},$$
then each $Z_n$ has expected value zero and
$$\mathrm{Var}(Z_n) = \sum_{k=1}^n \mathrm{Var}\left(\frac{X_k - \mu}{\sqrt{n}}\right) = \sum_{k=1}^n \frac{\sigma^2}{n} = \sigma^2.$$
As $n \to \infty$, the distributions of all the random variables $Z_n$ have the same degree of tightness, as measured by their variance, around their expected value $0$. The Central Limit Theorem asserts that as $n \to \infty$, the distribution of $Z_n$ approaches that of a normal random variable with mean (expected value) zero and variance $\sigma^2$. In other words, for every set $A \subseteq \mathbb{R}$,
$$\lim_{n\to\infty} P\{Z_n \in A\} = \frac{1}{\sigma\sqrt{2\pi}} \int_A e^{-\frac{x^2}{2\sigma^2}}\, dx.$$
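The stabilizing effect of the $\sqrt{n}$ scaling can be seen in simulation. The sketch below (not from the text) uses fair-coin tosses, so $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{4}$, and checks that the sample mean and variance of $Z_n$ sit near $0$ and $\sigma^2$ regardless of $n$.

```python
import math
import random

random.seed(1)
mu, sigma2 = 0.5, 0.25        # fair coin: EX = 1/2, Var X = 1/4
n, trials = 400, 20_000

def z_n():
    """Z_n = (X_1 - mu + ... + X_n - mu) / sqrt(n) for n simulated tosses."""
    s = sum(1 if random.random() < 0.5 else 0 for _ in range(n))
    return (s - n * mu) / math.sqrt(n)

samples = [z_n() for _ in range(trials)]
mean = sum(samples) / trials
var = sum(z * z for z in samples) / trials
print(mean, var)   # mean near 0, variance near sigma^2 = 1/4
```

What the variance computation cannot show, and the Central Limit Theorem does, is that the shape of the distribution of $Z_n$ becomes normal.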
Chapter 2
Conditional Expectation
Please see Hull's book (Section 9.6).
2.1 A Binomial Model for Stock Price Dynamics
Stock prices are assumed to follow this simple binomial model: The initial stock price during the period under study is denoted $S_0$. At each time step, the stock price either goes up by a factor of $u$ or down by a factor of $d$. It will be useful to visualize tossing a coin at each time step, and say that

- the stock price moves up by a factor of $u$ if the coin comes out heads ($H$), and
- down by a factor of $d$ if it comes out tails ($T$).

Note that we are not specifying the probability of heads here.
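This dynamic is straightforward to sketch in code. The numbers $S_0 = 4$, $u = 2$, $d = \frac{1}{2}$ below are illustrative choices, not fixed by the text.

```python
from itertools import product

S0, u, d = 4.0, 2.0, 0.5      # illustrative values

def S(k, omega):
    """Stock price after k tosses: multiply S0 by u for each H, d for each T."""
    price = S0
    for toss in omega[:k]:     # S_k depends only on the first k tosses
        price *= u if toss == "H" else d
    return price

for omega in ("".join(t) for t in product("HT", repeat=3)):
    print(omega, S(3, omega))
# e.g. S(3, "HHH") = u^3 * S0 = 32.0, while S(2, "HTH") = S(2, "HTT") = u*d*S0
```

Note that `S(2, omega)` gives the same value for every `omega` sharing the first two tosses, which is exactly the measurability point developed in Section 2.2.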
Consider a sequence of 3 tosses of the coin (see Fig. 2.1). The collection of all possible outcomes (i.e., sequences of tosses of length 3) is
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$
A typical sequence of $\Omega$ will be denoted $\omega$, and $\omega_k$ will denote the $k$th element in the sequence $\omega$. We write $S_k(\omega)$ to denote the stock price at time $k$ (i.e., after $k$ tosses) under the outcome $\omega$. Note that $S_k(\omega)$ depends only on $\omega_1, \omega_2, \ldots, \omega_k$. Thus in the 3-coin-toss example we write for instance,
$$S_1(\omega) = S_1(\omega_1, \omega_2, \omega_3) = S_1(\omega_1),$$
$$S_2(\omega) = S_2(\omega_1, \omega_2, \omega_3) = S_2(\omega_1, \omega_2).$$
Each $S_k$ is a random variable defined on the set $\Omega$. More precisely, let $\mathcal{F} = \mathcal{P}(\Omega)$, the set of all subsets of $\Omega$. Then $\mathcal{F}$ is a $\sigma$-algebra and $(\Omega, \mathcal{F})$ is a measurable space. Each $S_k$ is an $\mathcal{F}$-measurable function $\Omega \to \mathbb{R}$, that is, $S_k^{-1}$ is a function $\mathcal{B}(\mathbb{R}) \to \mathcal{F}$ where $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra on $\mathbb{R}$. We will see later that $S_k$ is in fact
[Figure 2.1 here: a three-period binomial tree with $S_1(H) = uS_0$, $S_1(T) = dS_0$; $S_2(HH) = u^2 S_0$, $S_2(HT) = S_2(TH) = udS_0$, $S_2(TT) = d^2 S_0$; $S_3(HHH) = u^3 S_0$, $S_3(HHT) = S_3(HTH) = S_3(THH) = u^2 d S_0$, $S_3(HTT) = S_3(THT) = S_3(TTH) = u d^2 S_0$, $S_3(TTT) = d^3 S_0$.]
Figure 2.1: A three coin period binomial model.
measurable under a sub-$\sigma$-algebra of $\mathcal{F}$. Recall that the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$ is the $\sigma$-algebra generated by the open intervals of $\mathbb{R}$. In this course we will always deal with subsets of $\mathbb{R}$ that belong to $\mathcal{B}(\mathbb{R})$.

For any random variable $X$ defined on a sample space $\Omega$ and any $y \in \mathbb{R}$, we will use the notation:
$$\{X \le y\} \triangleq \{\omega \in \Omega;\ X(\omega) \le y\}.$$
The sets $\{X < y\}, \{X \ge y\}, \{X = y\}$, etc., are defined similarly. Similarly for any subset $B$ of $\mathbb{R}$, we define
$$\{X \in B\} \triangleq \{\omega \in \Omega;\ X(\omega) \in B\}.$$
Assumption 2.1 $u > 1 > d > 0$.

2.2 Information
Definition 2.1 (Sets determined by the first $k$ tosses.) We say that a set $A \subseteq \Omega$ is determined by the first $k$ coin tosses if, knowing only the outcome of the first $k$ tosses, we can decide whether the outcome of all tosses is in $A$. In general we denote the collection of sets determined by the first $k$ tosses by $\mathcal{F}_k$. It is easy to check that $\mathcal{F}_k$ is a $\sigma$-algebra.

Note that the random variable $S_k$ is $\mathcal{F}_k$-measurable, for each $k = 1, 2, \ldots, n$.

Example 2.1 In the 3 coin-toss example, the collection $\mathcal{F}_1$ of sets determined by the first toss consists of:
1. $A_H = \{HHH, HHT, HTH, HTT\}$,
2. $A_T = \{THH, THT, TTH, TTT\}$,
3. $\emptyset$,
4. $\Omega$.
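The defining property of Definition 2.1 can be tested mechanically: a set is determined by the first $k$ tosses exactly when membership never depends on the later tosses. A small sketch (not from the text):

```python
from itertools import product

Omega = ["".join(t) for t in product("HT", repeat=3)]

def determined_by_first_k(A, k):
    """True if membership of outcomes in A depends only on their first k
    tosses (Definition 2.1): within each prefix class, membership is constant."""
    prefixes = {w[:k] for w in Omega}
    return all(
        len({w in A for w in Omega if w[:k] == pre}) == 1
        for pre in prefixes
    )

A_H = {w for w in Omega if w[0] == "H"}
print(determined_by_first_k(A_H, 1))             # True: this is a set in F_1
print(determined_by_first_k({"HHH", "HHT"}, 1))  # False: needs two tosses
print(determined_by_first_k({"HHH", "HHT"}, 2))  # True: a set in F_2
```

Running this over all $2^8$ subsets of $\Omega$ would enumerate $\mathcal{F}_1$ and $\mathcal{F}_2$ exhaustively and confirm they are $\sigma$-algebras.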
The collection $\mathcal{F}_2$ of sets determined by the first two tosses consists of:

1. $A_{HH} = \{HHH, HHT\}$,
2. $A_{HT} = \{HTH, HTT\}$,
3. $A_{TH} = \{THH, THT\}$,
4. $A_{TT} = \{TTH, TTT\}$,
5. The complements of the above sets,
6. Any union of the above sets (including the complements),
7. $\emptyset$ and $\Omega$.

Definition 2.2 (Information carried by a random variable.) Let $X$ be a random variable $\Omega \to \mathbb{R}$. We say that a set $A \subseteq \Omega$ is determined by the random variable $X$ if, knowing only the value $X(\omega)$ of the random variable, we can decide whether or not $\omega \in A$. Another way of saying this is that for every $y \in \mathbb{R}$, either $X^{-1}(y) \subseteq A$ or $X^{-1}(y) \cap A = \emptyset$. The collection of subsets of $\Omega$ determined by $X$ is a $\sigma$-algebra, which we call the $\sigma$-algebra generated by $X$, and denote by $\sigma(X)$.

If the random variable $X$ takes finitely many different values, then $\sigma(X)$ is generated by the collection of sets
$$\{X^{-1}(X(\omega));\ \omega \in \Omega\};$$
these sets are called the atoms of the $\sigma$-algebra $\sigma(X)$.

In general, if $X$ is a random variable $\Omega \to \mathbb{R}$, then $\sigma(X)$ is given by
$$\sigma(X) = \{X^{-1}(B);\ B \in \mathcal{B}(\mathbb{R})\}.$$

Example 2.2 (Sets determined by $S_2$) The $\sigma$-algebra generated by $S_2$ consists of the following sets:
1. $A_{HH} = \{HHH, HHT\} = \{\omega;\ S_2(\omega) = u^2 S_0\}$,
2. $A_{TT} = \{TTH, TTT\} = \{\omega;\ S_2(\omega) = d^2 S_0\}$,
3. $A_{HT} \cup A_{TH} = \{HTH, HTT, THH, THT\} = \{\omega;\ S_2(\omega) = ud S_0\}$,
4. Complements of the above sets,
5. Any union of the above sets,
6. $\emptyset$,
7. $\Omega$.
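The atoms of $\sigma(S_2)$ can be recovered by grouping outcomes by the value of $S_2$, as in the sketch below (not from the text); the values $S_0 = 4$, $u = 2$, $d = \frac{1}{2}$ are illustrative choices.

```python
from collections import defaultdict
from itertools import product

S0, u, d = 4.0, 2.0, 0.5
Omega = ["".join(t) for t in product("HT", repeat=3)]

def S2(w):
    """S2 depends only on the first two tosses of w."""
    return S0 * (u if w[0] == "H" else d) * (u if w[1] == "H" else d)

atoms = defaultdict(set)
for w in Omega:
    atoms[S2(w)].add(w)       # one atom of sigma(S2) per value of S2

for value, atom in sorted(atoms.items()):
    print(value, sorted(atom))
# 1.0 -> {TTH, TTT}; 4.0 -> {HTH, HTT, THH, THT}; 16.0 -> {HHH, HHT}
```

Note that $\sigma(S_2)$ is strictly smaller than $\mathcal{F}_2$: the middle atom lumps together the prefixes $HT$ and $TH$, since $S_2$ cannot tell them apart.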
2.3 Conditional Expectation
In order to talk about conditional expectation, we need to introduce a probability measure on ourcoin-toss sample