

Probability

This is a pure course.

Copier’s Message

These notes may contain errors. In fact, they almost certainly do, since they were just copied down by me during lectures and everyone makes mistakes when they do that. The fact that I had to type pretty fast to keep up with the lecturer didn’t help. So obviously don’t rely on these notes.

If you do spot mistakes, I’m only too happy to fix them if you email me at [email protected] with a message about them. Messages of gratitude, chocolates and job offers will also be gratefully received.

Whatever you do, don’t start using these notes instead of going to the lectures, because the lecturers don’t just write (and these notes are, or should be, a copy of what went on the blackboard) – they talk as well, and they will explain the concepts and processes much, much better than these notes will. Also beware of using these notes at the expense of copying the stuff down yourself during lectures – it really makes you concentrate and stops your mind wandering if you’re having to write the material down all the time. However, hopefully these notes should help in the following ways;

you can catch up on material from the odd lecture you’re too ill/drunk/lazy to go to;

you can find out in advance what’s coming up next time (if you’re that sort of person) and the general structure of the course;

you can compare them with your current notes if you’re worried you’ve copied something down wrong or if you write so badly you can’t read your own handwriting. Although if there is a difference, it might not be your notes that are wrong!

These notes were taken from the course lectured by Prof Grimmett in Lent 2010. If you get a different lecturer (increasingly likely as time goes on) the stuff may be rearranged or the concepts may be introduced in a different order, but hopefully the material should be pretty much the same. If they start to mess around with what goes in what course, you may have to start consulting the notes from other courses. And I won’t be updating these notes (beyond fixing mistakes) – I’ll be far too busy trying not to fail my second/third/nth year courses.

Good luck – Mark Jackson

Schedules

These are the schedules for the year 2009/10, i.e. everything in these notes that was examinable in that year. The numbers in brackets after each topic give the subsection of these notes where that topic may be found, to help you look stuff up quickly.

Basic concepts

Classical probability (1.1), equally likely outcomes. Combinatorial analysis (1.2), permutations and combinations (1.3). Stirling’s formula (asymptotics for log n! proved) (1.3.3).

Axiomatic approach

Axioms (countable case). Probability spaces (2.1). Addition theorem, inclusion-exclusion formula (2.1.4). Boole and Bonferroni inequalities (2.1.4). Independence (2.3). Binomial (2.3.4), Poisson and geometric (2.3.5) distributions. Relation between Poisson and binomial distributions (3.1.4). Conditional probability (2.2), Bayes’s formula (2.2.4). Examples, including Simpson’s paradox (2.2.5).


Discrete random variables

Expectation (3.2). Functions of a random variable (3.1.2, 3.1.3), indicator function (3.5), variance, standard deviation (3.2.4). Covariance (3.4.2), independence of random variables (3.4). Generating functions (3.3): sums of independent random variables, random sum formula (3.6.3), moments (3.2.5).

Conditional expectation (3.6.1). Random walks: gambler’s ruin, recurrence relations (2.3.6). Difference equations and their solution (3.3.4). Mean time to absorption (3.8). Branching processes (3.7): generating functions (3.7.2) and extinction probability (3.7.4). Combinatorial applications of generating functions.

Continuous random variables

Distributions and density functions (4.1). Expectations (4.3); expectation of a function of a random variable (4.3.2). Uniform, normal and exponential random variables. Memoryless property of exponential distribution (4.1.2).

Joint distributions (4.4): transformation of random variables, examples (4.2, 4.5). Simulation: generating continuous random variables, independent normal random variables (4.4.4). Geometrical probability: Bertrand’s paradox (6.1), Buffon’s needle (6.2). Correlation coefficient (3.4.2), bivariate normal random variables (4.6).

Inequalities and limits

Markov’s inequality, Chebyshev’s inequality (5.2). Weak law of large numbers (5.3). Convexity: Jensen’s inequality (5.1), AM/GM inequality (5.1.2).

Moment generating functions (7.2). Statement of central limit theorem (7.1) and sketch of proof (7.3). Examples, including sampling (7.3).

Contents

1. Basic concepts
  1.1 Sample space
  1.2 Combinatorial probability
  1.3 Permutations and combinations
2. Probability spaces
  2.1 Introduction
  2.2 Conditional probability
  2.3 Independence
3. Discrete random variables
  3.1 Random variables
  3.2 Expectations of discrete random variables
  3.3 Probability generating functions
  3.4 Independent discrete random variables
  3.5 Indicator functions
  3.6 Joint distributions and conditional expectations
  3.7 Branching process
  3.8 Random walk (again)
4. Continuous random variables
  4.1 Density functions
  4.2 Changes of variables
  4.3 Expectation
5. Three very useful results
  5.1 Jensen’s inequality
  5.2 Chebyshev’s inequality
  5.3 Law of large numbers
  4.4 Families of random variables
  4.5 Change of variable
  4.6 Bivariate (or multivariate) normal distribution
6. Geometrical probability
  6.1 Bertrand’s paradox
  6.2 Buffon’s needle
  6.3 Broken stick
7. Central limit theorem
  7.1 Central limit theorem
  7.2 Moment generating functions
  7.3 Proof of the central limit theorem
  7.4 Further applications of the central limit theorem and moment generating functions
8. Convergence of random variables (non-examinable?)
  8.2 Almost-sure convergence
  8.3 Strong law of large numbers
Notes from final lecture

1. Basic concepts

1.1 Sample space

Experiment – something with an uncertain outcome, e.g.

tossing a coin

throwing a die

spinning a roulette wheel

a lottery machine selecting a combination (a k-subset of {1, ..., n}, or something)

spinning a pointer

The sample space Ω is the set of all possible outcomes.

A subset A of Ω is called an event. E.g. in the above sample spaces, the following are all events;

, the event that a head is thrown

, the event that a prime is thrown on a die

, the event that an even number comes up on a roulette wheel

to to to the event you get a run on the lottery

the event that the time is o’clock.

A singleton {ω}, for ω ∈ Ω, is called an elementary event.

If ω occurs, and ω ∈ A, we say “A has occurred”.

Here is a dictionary between set theory and probability.

A ∪ B : either A or B occurred
A^c : A did not occur
A ∩ B : both A and B occurred
A ⊆ B : when A occurs, B occurs
A \ B : A occurred but not B
A = B : A and B are equivalent
A ∩ B = ∅ : A and B are mutually exclusive (both cannot occur).

1.2 Combinatorial probability

Let Ω be finite and let A ⊆ Ω. Define


If the outcomes ω ∈ Ω have equal probability, then P(A) = |A| / |Ω|.

Example. A hand of cards from a pack of is dealt at random. What is the probability that it contains (i) exactly one ace (ii) exactly one ace and two kings?

Example. From a table of random integers, pick the first . Then . Assume each element in is equiprobable. What is the probability that (i) no digit exceeds (ii) is the greatest digit?

since, if we call the greatest digit, (i) is asking for and (ii) is asking for .

1.3 Permutations and combinations

1.3.1 Definitions

Definition. A permutation is the number of ways of choosing an ordered sequence of size k from a set of size n (e.g. football teams with positions). We write this as nPk = n! / (n − k)!.

Definition. A combination is the same, but the sequence is unordered; we write this as nCk. Each unordered selection corresponds to k! ordered ones, thus nCk = nPk / k! = n! / (k! (n − k)!).
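As a quick sanity check of these counts, here is a minimal Python sketch (the particular values of n and k are purely illustrative, not from the lecture):

from math import comb, factorial, perm

def n_perm(n, k):
    # Ordered sequences of length k from n distinct items: n! / (n - k)!
    return factorial(n) // factorial(n - k)

def n_comb(n, k):
    # Unordered selections of size k: divide out the k! orderings of each selection
    return n_perm(n, k) // factorial(k)

# Agrees with the standard library versions
assert n_perm(11, 4) == perm(11, 4)
assert n_comb(52, 13) == comb(52, 13)
print(n_perm(11, 4), n_comb(52, 13))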

1.3.2 Examples

Example. An urn contains blue balls and red balls. What is the probability that the first red ball picked (without replacement) is the th overall?

The number of outcomes after all balls are chosen is

The number of outcomes of the form

is

and the answer is .

Example. keys are picked at random in attempts to open one lock. What is the probability that the th key opens the lock?

Sampling with replacement, let be the number of keys tried (including the successful one).

for small , large .

Sampling without replacement,


1.3.3 Stirling’s formula

Stirling’s formula states that

n! ∼ √(2πn) (n/e)^n,

where we say that a_n ∼ b_n if a_n / b_n → 1 as n → ∞.

Proof. We prove the logarithmic version;

by comparing columns. Hence

since the LHS and RHS both as .

Example. A fair coin is tossed repeatedly; what is the probability that, after tosses, the number of heads equals the number of tails?

The answer is

by using Stirling’s formula.
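The answer here is presumably the standard one, C(2n, n) / 2^(2n), which Stirling’s formula shows is asymptotic to 1 / √(πn). A quick numerical check of that asymptotic in Python (my own sketch, not from the lecture):

from math import comb, pi, sqrt

# P(#heads = #tails after 2n tosses) = C(2n, n) / 2^(2n) ~ 1 / sqrt(pi * n)
for n in (10, 100, 1000):
    exact = comb(2 * n, n) / 4**n
    print(n, exact, 1 / sqrt(pi * n), exact * sqrt(pi * n))   # last column tends to 1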

2. Probability spaces

2.1 Introduction

A probability space consists of three component objects; a sample space Ω, a collection of events, and a probability function.

2.1.1 Event spaces

An ‘event space’ is the power set of , denoted or , which is the set of all subsets of . If is finite, or countably infinite, we usually take for the event space. However if is uncountable, is too big.

Theorem (Banach, Kuratowski). Let be uncountable. There is no with countably additive, , and .

Proof. Uses the continuum hypothesis.

It is a reasonable statement that if are events then so are , and . Therefore;

Definition. An event space is a collection of subsets of such that

a)

b) If then ( is closed under countable unions)

c) If then ( is closed under complementation)

It is also called a ‘ -field’ or ‘ -algebra’.


Notes.

1) by (a) and (c)

2) Finite unions lie in , since

where .

3) so

so is closed under countable intersection.

4) if . Similarly with .

5) Property (a) is equivalent to saying that is non-empty, since

2.1.2 Probability measures

Definition. Let be a set and be an event space in . A probability measure on is a function such that

a)

b) and

c) if and are disjoint ( for ) then

i.e. has countable additivity.

Notes. If is a probability measure then

a) is finitely additive

b) follows from other axioms, since so

c) ‘is equal to’ the probability of

2.1.3 Definition of probability spaces

Definition. A probability space is a triple where is an event space of and is a probability measure on .

Examples. 1) The Bernoulli distribution, equivalent to a coin toss, where , and . We then have

2) , , where the satisfy and .

3) Same as 2), but with

4) and


for and . This is the Poisson distribution.

2.1.4 Some handy theorems and inequalities

Theorem 2.1. Some basic properties; if , etc... then

a)

b) If then

c)

Proof (c). is a disjoint union. Also, is a disjoint union. Then and and the result follows.

A Venn diagram is (often) useful.

Theorem 2.2 (Inclusion-Exclusion Principle). For ,

Proof. By induction on ,

Note. Often easier to calculate than .

Proposition 2.3 (Boole’s Inequality). If , then

Proof. Induction on . (Also valid if ; proof later.)

Proposition 2.4 (Bonferroni’s Inequality). If and is even,

If is odd, then the inequality changes to a .

Proof. By induction.

2.1.5 The example of the mad porter

Example (derangements). After dinner, porter hands hats back randomly to the guests, one each. What is the probability that no-one receives the correct hat?

Let and . Let . What is

?

Take distinct people . Then

By the Inclusion-Exclusion Principle,


Therefore the answer to the question converges to 1/e as n → ∞.

Let be . Then

We deduce that the number of correctly hatted guests converges, as n → ∞, to the Poisson distribution with parameter 1.
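Both limits (the chance 1/e that nobody is correctly hatted, and the Poisson(1) limit for the number who are) show up quickly in a simulation; this is just a sketch, with n = 30 guests chosen arbitrarily:

import math
import random

def correct_hats(n):
    # Hand the n hats back in a uniformly random order and count the fixed points.
    hats = list(range(n))
    random.shuffle(hats)
    return sum(guest == hat for guest, hat in enumerate(hats))

n, trials = 30, 100_000
counts = [correct_hats(n) for _ in range(trials)]
print("P(no one correct) ~", sum(c == 0 for c in counts) / trials, "; 1/e =", math.exp(-1))
print("mean number correct ~", sum(counts) / trials, "(Poisson(1) has mean 1)")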

2.2 Conditional probability

2.2.1 Introduction

The event has probability . We discover that has occurred. What is now? It must be , but what is ?

If , then , so , so , so . Thus;

Definition. The conditional probability of A given B is

P(A | B) = P(A ∩ B) / P(B),

whenever P(B) > 0.

Theorem 2.5. If and , then .

Proof. a disjoint union. So and use the definition.

Theorem 2.6 (more general). Let be a partition of with . Then

2.2.2 Application to ‘two-stage calculation’

Toss a fair coin. If heads, throw one die; if tails, throw two dice. What is the probability that the total shown is ?

Let , and . Then

2.2.3 Properties of conditional probability

a)

b)

(c)

(d)


2.2.4 Bayes’ formula

Theorem 2.7 (Bayes’ formula). Let B_1, B_2, ... partition Ω, with P(B_i) > 0 for each i. Then

P(B_i | A) = P(A | B_i) P(B_i) / Σ_j P(A | B_j) P(B_j).

Proof.

A typical application to ‘real life’ is in medical diagnosis, for example if are types of disease and is the symptoms. Then the doctor must compute from and .

Example (false positives). A rare disease affects in people. A test correctly identifies the disease of the time, and wrongly identifies the disease of the time. If we let and , then by Bayes’ formula, so the test is almost worthless!
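The actual figures did not survive the copying, so here is the calculation with made-up but typical numbers (prevalence 1 in 10,000, a 99% detection rate, a 1% false-positive rate):

# Hypothetical figures, purely for illustration
p_disease = 1 / 10_000
p_pos_given_disease = 0.99       # test correctly identifies a true case
p_pos_given_healthy = 0.01       # test wrongly flags a healthy person

# Bayes' formula: P(disease | positive test)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
print(p_pos_given_disease * p_disease / p_pos)   # about 0.0098, so a positive result means little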

2.2.5 Simpson’s paradox

There are two treatments for kidney stones; open surgery (OS) and percutaneous nephrolithotomy (PN).

Treatment Big stones Small stones Overall

OS

PN

So OS beats PN in both sub-categories, but PN beats OS overall!

Mathematically, if we let success, PN, OS, small and large then the following are not inconsistent;
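The numbers in the table above were lost in copying. The figures usually quoted for this kidney-stone example are reproduced below purely to illustrate how the reversal can happen; they are not necessarily the ones given in the lecture.

# success / total for each treatment and stone size (illustrative figures only)
data = {
    "OS": {"big": (192, 263), "small": (81, 87)},
    "PN": {"big": (55, 80), "small": (234, 270)},
}

for treatment, groups in data.items():
    by_group = {g: round(s / n, 2) for g, (s, n) in groups.items()}
    total_successes = sum(s for s, _ in groups.values())
    total_patients = sum(n for _, n in groups.values())
    print(treatment, by_group, "overall", round(total_successes / total_patients, 2))
# OS wins within each stone size, yet PN has the higher overall success rate.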

2.3 Independence

2.3.1 Definition

Definition. Events A and B are independent if P(A ∩ B) = P(A) P(B). More generally, a collection of events A_1, A_2, ... is independent if

P(A_{i_1} ∩ ... ∩ A_{i_k}) = P(A_{i_1}) ⋯ P(A_{i_k}) for every finite subcollection.

Definition. A collection of events is pairwise independent if P(A_i ∩ A_j) = P(A_i) P(A_j) for all i ≠ j.

Note that independence implies pairwise independence, but the converse is not true.

E.g. , , with , , is pairwise independent but not independent.

2.3.2 ‘Independence’ and ‘repeated trials’

E.g. Two dice thrown, and each of the possible outcomes has probability . Let be an attribute of the number on the first die, and of the number on the second die. Then


2.3.3 Product spaces

Let and be two probability spaces, with

Let , and be something appropriate, then such that

is a probability measure. This is because if and , then

2.3.4 The binomial distribution

By convention, coin tosses, die throws etc. have independent outcomes. E.g. flips of a coin which comes up heads with probability each time. Let be the number of heads, then

for . This is the binomial distribution.

Another way of reaching the answer is as follows; let be the outcome of the first toss. Then

This technique is called recursion.

2.3.5 The geometric distribution

Flip coins and let be the number of coins until the first head. Then

Alternatively, . Careful.

2.3.6 Random walk

A particle inhabits . At each stage it takes a step. The steps lie in with probabilities , . The steps are independent of one another, like coin tosses.

Gambler’s ruin problem; A gambler’s fortune at stage is the th position of the random walk. What is the probability that the gambler finishes bankrupt, i.e. the random walk hits before having started at , where ?

Let and . Then

which is a difference equation, with boundary conditions and .

Try , then , so , so .


General solution is

Feeding in the boundary conditions we get

is always a solution in these sorts of things. The recurrence relation is
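The lecture’s boundary values were lost, so the sketch below assumes absorption at 0 and N, a start at k, and up-step probability p; it checks the standard closed form ((q/p)^k − (q/p)^N) / (1 − (q/p)^N), with q = 1 − p, against simulation.

import random

def ruin_prob(k, N, p):
    # P(hit 0 before N | start at k); the symmetric case p = q = 1/2 gives (N - k) / N.
    q = 1 - p
    if abs(p - q) < 1e-12:
        return (N - k) / N
    r = q / p
    return (r**k - r**N) / (1 - r**N)

def simulated_ruin(k, N, p, trials=50_000):
    ruined = 0
    for _ in range(trials):
        x = k
        while 0 < x < N:
            x += 1 if random.random() < p else -1
        ruined += (x == 0)
    return ruined / trials

k, N, p = 3, 10, 0.45
print(simulated_ruin(k, N, p), "vs", ruin_prob(k, N, p))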

3. Discrete random variables

3.1 Random variables

3.1.1 Definition

Definition. Let be a probability space. A real-valued random variable on this space is a function .

Note. In this, for simplicity, assume is countable and

. For uncountable there is an extra condition (see later).

Examples. (i) Toss a coin twice; number of heads.

(ii) Throw a die thrice; largest number that comes up.

3.1.2 Distribution function

Definition. The distribution function of is the function given by

Note. may be written to emphasise the role of .

Example. Toss a coin once, so that , and . Let be the outcome and let . Then

3.1.3 Probability function

Definition. The probability (mass) function of is the function (or ) given by , .

Note. We use the mass function rather than the distribution function when is discrete, i.e. either finite or countably infinite such that . This is not always ‘jumpy’.

Examples. (i) The Bernoulli distribution has , .

(ii) The binomial distribution has

(iii) The Poisson distribution has


3.1.4 Binomial-Poisson limit theorem

Example (misprints). An edition of the Grauniad has characters and each character is mis-set with some probability . Let be the total number of misteaks .

Take, for example, and . What is the approximate distribution of ? The answer is because of the

Theorem 3.1 (Binomial-Poisson limit theorem). If , in such a way that remains constant, then

3.1.5 More examples

Examples. (iv) The geometric distribution has

(v) The negative binomial distribution has and and is defined to be the number of coin tosses until the appearance of the th head. Then

[Note that

where is defined to be

for .]

Why is it called the ‘negative binomial’?

where and

3.2 Expectations of discrete random variables

3.2.1 Definition

Let be a probability space and be a discrete random variable.

Definition. The expectation (or mean, expected value) of is

wherever the sum converges absolutely.

(All random variables in this section are assumed to be discrete.)


3.2.2 Composition of functions

Theorem 3.2 (Law of the unconscious statistician). Suppose and . Then

whenever this expectation exists.

Proof. Let . Then

By convention, capital letters denote random variables whereas small letters denote their possible values.

3.2.3 Properties of expectation

1) If , i.e. , then .

2) If and then .

Proof.

3) for

Proof.

4)

Proof.

3.2.4 Variance

is a measure of the ‘centre’ of a distribution, whereas the variance is a measure of the ‘dispersion’.

Definition. The variance of is and the standard deviation is

. We write for the variance, for the standard deviation and for the expectation.

Notes. (i) – non-linearity.

(ii)

, since


Take care with the parentheses!

3.2.5 Moments

Definition. The th moment of is .

Notes. (i) and

(ii)

(iii) iff .

3.2.6 Examples for various distributions

1) The Bernoulli distribution

, , so and , so where .

Take , ( ). Let be

the indicator function of . Then , and .

Indicator functions obey the rules and .

2) The binomial distribution has

because

3) The Poisson distribution has

4) The geometric distribution has

Now


3.3 Probability generating functions

3.3.1 Definition

Definition. Let be a random variable taking values in . Its probability generating function is the function given by

whenever the sum converges absolutely. (The probability generating function can be thought of as a transform; remember Fourier.)

Notes. (i) The sum converges on . We often restrict the domain to .

(ii) We write to emphasise the dependence on .

(iii) and .

Theorem 3.3. The distribution of is uniquely determined by its probability generating function .

‘Proof’. . . , and progressive

differentiation of at gives the probabilities .

3.3.2 Reasons for using probability generating functions

1) They are an elegant method for dealing with sums of independent random variables (later).

2) They are a good method for calculating moments;

. ( may be on edge of domain of convergence. Abel’s Lemma validates this statement.)

And similarly, and .

.

3.3.3 Examples of probability generating functions

Distribution probability generating function

3.3.4 Application of generating functions to difference equations

An bathroom wall is to be tiled with tiles. In how many ways can this be done?

Let be the number of ways. Then , and . Multiply and sum;


Let

Then , so

where and .
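The tile sizes dropped out of the copy; assuming the usual version of this example (a 1 x n strip tiled with 1 x 1 and 2 x 1 tiles, which gives the Fibonacci recurrence x_n = x_{n-1} + x_{n-2}), here is a sketch comparing the recurrence with the closed form that the partial-fraction expansion of the generating function produces:

from functools import lru_cache

@lru_cache(maxsize=None)
def tilings(n):
    # x_1 = 1, x_2 = 2, x_n = x_{n-1} + x_{n-2} (the last tile is either 1 x 1 or 2 x 1)
    if n <= 2:
        return n
    return tilings(n - 1) + tilings(n - 2)

# Closed form from partial fractions, with alpha, beta = (1 +- sqrt(5)) / 2
alpha = (1 + 5**0.5) / 2
beta = (1 - 5**0.5) / 2
closed = lambda n: round((alpha**(n + 1) - beta**(n + 1)) / 5**0.5)

print([tilings(n) for n in range(1, 11)])
print([closed(n) for n in range(1, 11)])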

3.4 Independent discrete random variables

3.4.1 Joint mass function

Definition. Let be discrete random variables. The joint mass function of the pair is given by .

We say that and are independent if . That is, the joint mass function factorises as the product of the ‘marginal’ mass functions.

Note. independent . This is because

3.4.2 Covariance and correlation

Definition. The covariance of and is and the

correlation (coefficient) is

if and .

and are uncorrelated if .

Note. .

Theorem 3.4. (a) If are independent,

(b) If are independent, .

(c) There exist random variables which are uncorrelated but not independent.

Proof. (a)

(b) , .


(c) Let be independent with distribution . Let , . Then , but . (Exercise.)

3.4.3 Correlation as a measure of dependence

(a) Correlation is a single number.

(b) .

Theorem 3.5 (Schwarz’s inequality).

.

Proof. Let , . Then . So this quadratic has one or no real roots, so the discriminant is , therefore

(c) iff for some , i.e. iff for some

, with [exercise] . Similarly,

(d) iff for some , . ( is undefined if .)

(e) if independent and .

(f)

if . I.e. the correlation is unchanged by scaling and moving the origin.

3.4.4 Three random theorems on independence

Theorem 3.6. (a) .

(b) If independent, .

Proof. (a)

.

(b) Since are assumed independent, their covariance is .

Examples. (a) The binomial distribution . The variance Bernoulli distribution .

(b) Negative binomial distribution with parameters . Variance

.

Theorem 3.7. , ,

Proof.

but this is a disjoint union.

Corollary. If and are independent,

This is called a convolution of and , written .

Theorem 3.8. If and are independent (and take values in ) then .

Proof. .


Example. Let be and be , with independent. What is the distribution of

? , therefore is .

Example. What is the probability generating function of the negative binomial distribution with parameters ? Answer sum of independent random variables, say where and the s are independent. So

3.5 Indicator functions

Let A ∈ F; the indicator function 1_A of A takes the value 1 if A occurs and 0 if it does not. Sometimes write I_A.

Note: .

.

.

.

Converting to probabilities, .

Example.

Multiplying out and taking expectations gives the inclusion-exclusion formula.

Example. ( ) married couples are seated round a table with the wives randomly in the odd seats and the husbands in the even seats. Let be the number of husbands sitting by their wives. Calculate and .

Let th couple are seated together . Then

3.6 Joint distributions and conditional expectations

3.6.1 Definitions

If are discrete random variables, then the joint mass function .

The marginal mass function , so .

The conditional mass function of given is


The conditional expectation of given that is

We normally say the conditional expectation of given is . This is a random variable.

3.6.2 Example and theorems

Example. are independent and identically have the distribution . Let . What is ?

Solution 1.

Solution 2. using the rule that .

Theorem 3.9. (a) If are independent, then .

(b) .

Proof. (b)

3.6.3 Random sum formula

if . What about when we have a random number of random variables?

Theorem 3.10 (Random sum formula). Let be independent, taking values in such that the are identically distributed with probability generating function . The

‘random sum’ has probability generating function .

Proof.

Theorem 3.11. In notation of Theorem 3.10, .

Proof.

.

Exercise. Compute in terms of moments of and .

3.7 Branching process

3.7.1 Definition

This is a basic model for population/bacterial/etc... growth. At generation , there is some number of individuals. Assume

a)


b) is a random variable, the number of offspring of the progenitor, i.e. it has probability mass function ;

c) each individual in the system has a family of offspring with same distribution as ;

d) crudely speaking, all family sizes are independent.

Then

where

is the number of offspring of the th member of the

th generation.

3.7.2 Relation to probability generating functions

Let probability generating function of .

Theorem 3.12. where , and similarly . Hence

.

Theorem 3.13. Let and . Then and

if and if .

Proof. . so

. Therefore

. Similarly for the variance (exercise).

Example where has a closed form. Let where and . Then if , and . By induction,

is the coefficient of in the Taylor expansion of .

3.7.3 Extinction (and some general theory)

When , this becomes if and if .

Let . Then . Therefore

How do you relate to the ?

Theorem 3.14 (probability measures are continuous set functions). If are events with , then

Proof. Let and . Then . So

3.7.4 Value of the probability of extinction

Let by continuity of

Theorem 3.15. is the smallest non-negative root of the equation .


Proof. Let so that

Then . Since and

since is continuous on .

Let be a non-negative root of . Then since is non-decreasing on .

. Therefore , so .

Theorem 3.16. (a) If then .

(b) If then .

(c) If and then .

Proof.

therefore is a convex function.

(a) . If then is the only root of in . So .

(b) If , there is a second root of in . So .

(c) If and , then so .
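A numerical sketch of Theorem 3.15, with an offspring distribution invented for the purpose (0, 1 or 2 children with probabilities 1/4, 1/4, 1/2, so G(s) = 1/4 + s/4 + s²/2 and the mean number of offspring is 5/4 > 1). Iterating s → G(s) from 0 walks up to the smallest non-negative root, and a crude simulation agrees:

import random

probs = [0.25, 0.25, 0.5]                      # P(0), P(1), P(2) offspring -- made-up numbers
G = lambda s: probs[0] + probs[1] * s + probs[2] * s * s

# eta = lim G_n(0): iterate s -> G(s) starting from 0
s = 0.0
for _ in range(200):
    s = G(s)
print("extinction probability ~", s)           # exact smallest root of G(s) = s here is 1/2

def goes_extinct(max_gen=100, cap=2_000):
    x = 1
    for _ in range(max_gen):
        if x == 0:
            return True
        if x > cap:                            # population has exploded; count it as surviving
            return False
        x = sum(random.choices([0, 1, 2], weights=probs, k=x))
    return x == 0

trials = 5_000
print("simulated:", sum(goes_extinct() for _ in range(trials)) / trials)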

3.8 Random walk (again)

Three types of barrier; ‘absorbing’ (you die when you hit the barriers), ‘reflecting’ (you bounce off), ‘retaining’ (you can’t move past the barriers but don’t bounce off).

Take a random walk on with absorbing barriers at and . Let be the number of steps taken until absorption and start at .

Thus

General solution



Particular solution

Values of , can be calculated from the boundary conditions. Hence

4. Continuous random variables

4.1 Density functions

4.1.1 Definition and notes

Probability space , random variable . Distribution function .

Definition. The random variable is called continuous (misnomer) if there exists such that

a)

b)

.

If this holds, is called the (probability) density function of , sometimes written .

(i) If is differentiable, then we take .

(ii) If has pdf , then .

(iii)

.

(iv) pdf’s satisfy ,

.

(v) element of probability is

In particular

.

4.1.2 Examples of density functions

Uniform distribution on , Unif .

Exponential distribution Exp

Distribution function


Exponential distribution is ‘memoryless’ (lack-of-memory property). For ,

Exercise; prove that the exponential distribution is the only continuous distribution with the memoryless property.

Normal distribution (or Gaussian distribution) has density function

More generally, has density function

is changed by location and scale.

4.2 Changes of variables

is a random variable, . What is the distribution of ? I.e.

If has probability density function , and is strictly increasing and differentiable,

Example. Let (i.e. has uniform distribution on ). Let . What is the distribution of ?

Therefore has distribution .

Example. Let , , i.e. where .

Therefore .

Example. Let U be Unif(0, 1), and let F be a continuous distribution function. Let X = F⁻¹(U).

Therefore X has distribution function F. A key fact in Monte Carlo methods.
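This is the inverse-transform method. A minimal sketch for the exponential case, where F(x) = 1 − exp(−λx) and so F⁻¹(u) = −log(1 − u) / λ (the value λ = 2 is arbitrary):

import math
import random

lam = 2.0
# If U ~ Unif(0,1), then -log(1 - U) / lam has the Exp(lam) distribution.
samples = [-math.log(1 - random.random()) / lam for _ in range(100_000)]

print("sample mean:", sum(samples) / len(samples), "; theory 1/lam =", 1 / lam)
print("P(X > 1) ~", sum(x > 1 for x in samples) / len(samples), "; theory exp(-lam) =", math.exp(-lam))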

What if has flat sections?

4.3 Expectation

Recall. discrete, then


4.3.1 Continuous expectation

Definition. The expectation of the continuous random variable is

whenever this integral is absolutely convergent (i.e. ).

Note. This expectation has same general properties as that of discrete random variables. E.g. linearity, mean, variance, moments, covariance, correlation, etc.

4.3.2 Expectation of functions

Theorem 4.1. If has probability density function , and ,

Proposition 4.2. If is a continuous random variable then

Note. can be taken as a definition of that does not depend on the type of (discrete, continuous, etc.) .

Proof (4.2).

Therefore (1)-(2)=

.

Proof (4.1).

5. Three very useful results

5.1 Jensen’s inequality

5.1.1 Convex and concave functions

Definition. A function g is convex if g(λx + (1 − λ)y) ≤ λ g(x) + (1 − λ) g(y) for all x, y in its domain and all λ ∈ [0, 1].

A function g is concave if −g is convex.

Examples. ,


Fact. If exists on and then is convex.

5.1.2 Jensen’s inequality

Theorem 5.1 (Jensen’s inequality). Let X be a random variable taking values in (a, b) and let g be convex on (a, b). Then E(g(X)) ≥ g(E(X)).

Example (AM-GM inequality). Let with and . By Jensen’s inequality;

Notes. 1) Equality holds in JI iff is constant (with probability ).

2) unless is a constant random variable.

Lemma 5.2 If is convex on then

with , , .

Proof. Induction on . OK for by definition of convexity. Assume OK for , then

5.1.3 Sketch proof of general Jensen’s inequality

Theorem 5.3 (Supporting hyperplane theorem). is convex on iff , .

Proof of Jensen’s inequality from supporting hyperplane theorem is as follows;

Set and choose accordingly. Then

.

Theorem 5.4 (Separating hyperplane theorem). If , there exists a line with beneath and the curve above (strictly).

Proof. distance from to . is a continuous function of , as strides? on curve. has a

minimum; with on curve. Take

perpendicular bisector of line from to .

Proof of supporting hyperplane theorem;


Find points with . Let , . Find separating hyperplane separating from the curve. Hence line which is easily shown to be supporting.

5.2 Chebyshev’s inequality

If is small, then is near . In what way?

Theorem 5.5 (Markov’s inequality). If E|X| exists then, for t > 0,

P(|X| ≥ t) ≤ E|X| / t.

Proof. Let A = {|X| ≥ t}. Then |X| ≥ t 1_A, so E|X| ≥ t E(1_A) = t P(|X| ≥ t).

Theorem 5.6 (Chebyshev’s inequality). For t > 0,

P(|X − E(X)| ≥ t) ≤ var(X) / t².

Proof. Using Markov’s inequality on the random variable (X − E(X))²,

P(|X − E(X)| ≥ t) = P((X − E(X))² ≥ t²) ≤ E((X − E(X))²) / t² = var(X) / t².

Often the following is very useful

Large deviation theory.

5.3 Law of large numbers

5.3.1 Law of large numbers

Theorem 5.7. Let be independent and identically distributed with finite variance and mean . Let

. Then

(a) as (mean square law of large numbers)

(b) as (weak law of large numbers)

Proof. (a) . Then

(b) By Chebyshev,

5.3.2 Principle of repeated experimentation

Repeat an experiment independently, and each time observe whether or not the event A occurs. Let A_i be the event that A occurs at the i-th experiment. The number of occurrences of A up to time n is N_n = 1_{A_1} + ... + 1_{A_n}, and the proportion N_n / n converges, in the senses above, to P(A).
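A sketch of the principle in Python: the running average of fair-die throws settles down to the mean 3.5, and the proportion of sixes settles down to P(six) = 1/6.

import random

random.seed(1)
throws = [random.randint(1, 6) for _ in range(100_000)]
for n in (10, 100, 1_000, 10_000, 100_000):
    average = sum(throws[:n]) / n                            # -> E(X) = 3.5
    proportion_sixes = sum(t == 6 for t in throws[:n]) / n   # -> P(six) = 1/6
    print(n, round(average, 3), round(proportion_sixes, 4))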


4.4 Families of random variables

4.4.1 Joint functions

a vector of random variables on

Joint distribution function.

with , and . If

then we call the joint pdf of . If we can, we take

Note. Let and suppose is jointly continuous with joint pdf

4.4.2 Marginal functions

The marginal distribution function of is

The marginal density function is

4.4.3 “Element of probability”

is

4.4.4 Independence

Generally, and are independent if for ,

and are independent .

If is continuous (has a joint pdf) then and are independent iff

If and are independent then and hence

if are continuous and independent.

4.4.5 Conditional pdf of given

Some people write the LHS as assuming undefined as but means take the limit as .


4.4.6 Conditional expectation of given

where

Theorem 4.3.

4.5 Change of variable

Example. Let be a random point in , with joint probability density function . Let

and . What is the joint probability density function of ?

4.5.1 General solution

General question; have joint pdf . , . where What is

Define by so .

Need some invertibility of .

Let . Assume is invertible on .

Let and . Then

where and is the modulus of the Jacobian determinant

Therefore for .

4.5.2 Examples

Example. Let be independent with distribution . Let

What is ?

Solution.

Inverse: , , , .

. is invertible. The Jacobian is

Therefore for and ,

which can be written in the form . Therefore are independent.


has pdf on .

has pdf on so has distribution .
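Run backwards, this change of variables is the usual recipe for simulating independent normal random variables (the Box–Muller method): take Θ uniform on [0, 2π), take R² exponential with parameter 1/2, and return (R cos Θ, R sin Θ). A minimal sketch:

import math
import random

def box_muller():
    # Theta ~ Unif[0, 2*pi), R^2 = -2 log(1 - U) ~ Exp(1/2); then R cos(Theta) and
    # R sin(Theta) are independent N(0, 1) random variables.
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2 * math.log(1 - u1))
    theta = 2 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

xs = [box_muller()[0] for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum(x * x for x in xs) / len(xs) - mean**2
print("mean ~", round(mean, 3), "; variance ~", round(var, 3))   # should be near 0 and 1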

Example. Let be independent with , , . Then

, ,

are independent

on . This distribution has spherical symmetry. A function has spherical symmetry if it is equal to for some .

4.6 Bivariate (or multivariate) normal distribution

Recall

Note the elementary fact that

Now iff is .

4.6.1 Definition and expectation

Bivariate normal distribution is of the form

where is a ‘quadratic form’ in .

Usual expectation is, for ,

where , , .

Usually write

where and and

4.6.2 The normalised random variables and

Example. Let have that joint pdf,


‘Completing the square’ gives

Thus .

has distribution

Note.

where

which is not a coincidence. We can prove that is the covariance;

4.6.3 Important properties

1) Two random variables with the bivariate normal distribution are independent iff their correlation is .

because the cross-product is in , and hence .

2) If is bivariate normal then is univariate normal. ‘Linearisation retains normality.’

4.6.4 Covariance matrix

Vector of random variables. The mean vector

. The covariance matrix is

4.6.5 Multivariate normal distribution (not examinable)

Definition. has the multivariate normal distribution if

for . It may be shown that and the covariance matrix of is .

Alternative definition. The vector is said to have the multivariate normal distribution whenever;

has a (univariate) normal distribution.


6. Geometrical probability

6.1 Bertrand’s paradox

A chord of the unit circle is picked at random. What is the probability that an equilateral triangle based on the chord fits within the circle?

We formalise the problem as follows; define D to be the distance from the centre to the chord. The triangle fits exactly when D ≥ 1/2 (the chord is then no longer than the side of the inscribed equilateral triangle), so we ask: what is the probability that D ≥ 1/2?

(a) Assume is . Then .

(b) Assume is . Then .

(c) Pick point at random (uniformly) in the unit ball, draw the chord with as midpoint. Then

(d) Pick two points uniformly at random on the circle, and join them by the chord. Answer turns out to be .
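The individual answers were lost in the copying, but the point of the paradox survives in a simulation: ‘at random’ means different things under the different recipes, and the resulting probabilities disagree. The sketch below implements recipes (a), (c) and (d) as I read them, with D the distance from the centre to the chord and the triangle fitting exactly when D ≥ 1/2.

import math
import random

def distance_uniform():                  # (a) take D itself uniform on [0, 1]
    return random.random()

def midpoint_uniform():                  # (c) midpoint of the chord uniform in the disc
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return math.hypot(x, y)

def endpoints_uniform():                 # (d) join two uniform points on the circle
    phi = random.uniform(0, 2 * math.pi)      # angle between the two endpoints
    return abs(math.cos(phi / 2))             # distance from the centre to that chord

trials = 100_000
for name, method in [("(a) random distance", distance_uniform),
                     ("(c) random midpoint", midpoint_uniform),
                     ("(d) random endpoints", endpoints_uniform)]:
    p = sum(method() >= 0.5 for _ in range(trials)) / trials
    print(name, round(p, 3))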

6.2 Buffon’s needle

6.2.1 Buffon’s needle

Rule the plane with parallel lines distance apart. Drop a needle of unit length at random on the table. What is the probability that the needle intersects some line?

The coordinates of the centre of the needle are the random variable . The inclination to the horizontal is random. Assume is and is and are independent.

for , .

An intersection occurs if either

or

.

where

Hence π may be estimated by repeatedly throwing the needle.
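A quick simulation of the throw just described, assuming (as the set-up suggests) lines a unit distance apart and a needle of unit length, so that P(intersection) = 2/π:

import math
import random

def needle_hits_a_line():
    # y = height of the needle's centre above the nearest line below, Unif(0, 1);
    # theta = inclination to the horizontal, Unif(0, pi), independent of y.
    y = random.random()
    theta = random.uniform(0, math.pi)
    half_height = 0.5 * math.sin(theta)      # vertical half-extent of the needle
    return y < half_height or y > 1 - half_height

trials = 500_000
p_hat = sum(needle_hits_a_line() for _ in range(trials)) / trials
print("P(hit) ~", p_hat, "; 2/pi =", 2 / math.pi)
print("estimate of pi:", 2 / p_hat)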

Note: ‘Buffon cross’ gives a much faster estimate.

Buffon’s needle with length on lines of distance apart;


6.2.2 Buffon’s noodle

Drop a noodle of length on a table with ruled lines distance apart. What is the mean number of intersections of the lines with the noodle?

Number of intersections

The variance of the number of intersections depends heavily on the shape of the noodle. E.g. the probabilities are distributed differently for a tightly coiled noodle.

6.2.3 Buffon’s needle ( ) versus cross

Let be the number of intersections in one throw of the needle. Then

Let be the number of intersections in one throw of the cross. Then and

So it is better to use the cross than the needle. This is an example of the ‘technique of variance reduction’ which is useful in computation via simulation.

Example.

Find a principle reduction technique!

6.3 Broken stick

A unit stick is broken in two places chosen uniformly at random on [0, 1], independently of each other. What is the probability that the three pieces can be used to make a triangle?

Note that , and .

Triangle can be made if , , .

, , , or

, , , .

.
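The answer, 1/4 for two uniform break points, is easy to check by simulation; a triangle exists exactly when no piece is longer than 1/2:

import random

def makes_triangle():
    u, v = sorted((random.random(), random.random()))
    pieces = (u, v - u, 1 - v)
    return max(pieces) < 0.5        # equivalent to the three triangle inequalities here

trials = 200_000
print(sum(makes_triangle() for _ in range(trials)) / trials)   # close to 1/4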

Generalisation to breaks in a stick;

7. Central limit theorem

7.1 Central limit theorem

Let X_1, X_2, ... be independent and identically distributed random variables with mean μ and variance σ². Let S_n = X_1 + ... + X_n.


Law of large numbers states that . The central limit theorem states that

, where .

Theorem 7.1. Let be independent and identically distributed random variables with and variance . Let

. Then

i.e. is ‘asymptotically normal ’ and we write

weak convergence. If we apply this looking at density functions, it has another condition.
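A sketch of what the theorem claims, using uniform summands (mean 1/2, variance 1/12) standardised as above; already with n = 30 the standardised sum behaves like N(0, 1):

import math
import random

def standardised_sum(n):
    s = sum(random.random() for _ in range(n))
    return (s - n * 0.5) / math.sqrt(n / 12)     # subtract the mean, divide by the sd

n, trials = 30, 50_000
zs = [standardised_sum(n) for _ in range(trials)]
phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))   # Phi(1) for N(0, 1), about 0.8413
print("P(Z_n <= 1) ~", sum(z <= 1 for z in zs) / trials, "; Phi(1) =", round(phi_1, 4))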

7.2 Moment generating functions

Definition. The moment generating function of the random variable X is

M_X(t) = E(e^{tX}),

wherever this is finite.

Note. If takes values in then the probability generating function is so .

Examples. (a) . Then

(b) . Then

by completing the square. Now

, so we end up with

( ).

(c) Cauchy distribution

infinite variance.

Vital properties of moment generating functions;

(a) Uniqueness

If on a neighbourhood of the origin, then there is a unique distribution with moment generating function .

(b)

is the (exponential) generating function of the moments , for .

(c)

(d) if are independent.


(e) Continuity theorem. If are random variables with for all in some

neighbourhood of , then for all at which is continuous.

7.3 Proof of the central limit theorem

WLOG, μ = 0 and σ² = 1, i.e. write

which is the moment generating function of the N(0, 1) distribution. The result follows by the continuity theorem.

Example. An unknown fraction p of voters have decided to vote Labour in the next election. It is desired to estimate p by asking a sample of people. We want the error of the estimate not to exceed some given amount. How large a sample should I approach?

Assume a sample of size , that their answers are independent, and that each sample is a Labour voter with probability .

Let be if the th person asked says Labour and otherwise. Recall that , and

estimate by . Then

Let us require that LHS . Then as ,

by the central limit theorem

If (from tables) then this will be as required. Thus for the population of Britain we should take roughly .

7.4 Further applications of the central limit theorem and moment generating functions

a) Let ,

. Then

E.g. for ;


(b) Let , so that by probability generating functions.

Taking ,

(c) Binomial-Poisson limit theorem

Therefore converges to the Poisson distribution (in the sense of weak convergence).

(d) Let be independent with . Define

Using the rule that

which is the moment generating function of .

8. Convergence of random variables (non-examinable?)

Sequence of random variables X_1, X_2, ..., and another one, X.

Definitions. (a) in mean square if as .

(b) in probability if ,

(c) in distribution if as , for all at which is continuous.


Theorem 8.1. Mean square ⇒ probability ⇒ distribution (where ‘⇒’ means that if X_n → X in the first style, it also converges in the second). The converse statements are false.

Proof (ms prob). The Chebyshev inequality gives

Proof (prob ms). Let

so with probability. However,

Proof (prob dist).

Let be continuous at . Let and . Then .

Proof (dist prob). Let be and let

, are both , so has distribution so in distribution. However in probability, because, if even, .

Example. with probability . Then .

If with probability , then .

Theorem 8.2. If in distribution for some constant , then in probability.

Proof. Assume in distribution which means

Proposition 8.3. in probability iff

Proof.


because is strictly increasing on .

by Markov’s inequality.

If RHS then LHS . I.e.

Assume in probability. Then

Let to obtain

8.2 Almost-sure convergence

X_1, X_2, ... a sequence of random variables, X a random variable, (Ω, F, P) the probability space,

an event. Let

Then almost surely if . Written “a.s.”, “a.c.”, “wp1” “with probability ”

Theorem 8.4. Almost-sure convergence implies probability convergence.

Proof. Let . Then .

Recall definition of convergence; if , . Now let

Note; is decreasing in . Define

,

(by continuity of

, since

in probability.

8.3 Strong law of large numbers

Let X_1, X_2, ... be independent and identically distributed with mean μ. Then

and

almost surely as .


Proof. Much harder.

Notes from final lecture

Probability space, random variable

, .

gives rise to a mass function if discrete with , or a pdf if continuous with .

gives a joint mass function with , or in the continuous case we get .

independent iff the joint distribution factorises.

Recall

Example – bivariate normal distribution.

For in the continuous case we have

the conditional probability density function.

Let . The conditional expectation is defined to be so it is a

random variable with the handy property that .