Probability

This is a pure course.
Copier’s Message These notes may contain errors. In fact, they almost certainly do since they were just copied
down by me during lectures and everyone makes mistakes when they do that. The fact that I had to type pretty fast to keep up with the lecturer didn’t help. So obviously don’t rely on these notes.
If you do spot mistakes, I’m only too happy to fix them if you email me at [email protected] with a message about them. Messages of gratitude, chocolates and job offers will also be gratefully received.
Whatever you do, don’t start using these notes instead of going to the lectures, because the lecturers don’t just write (and these notes are, or should be, a copy of what went on the blackboard) – they talk as well, and they will explain the concepts and processes much, much better than these notes will. Also beware of using these notes at the expense of copying the stuff down yourself during lectures – it really makes you concentrate and stops your mind wandering if you’re having to write the material down all the time. However, hopefully these notes should help in the following ways;
you can catch up on material from the odd lecture you’re too ill/drunk/lazy to go to;
you can find out in advance what’s coming up next time (if you’re that sort of person) and the general structure of the course;
you can compare them with your current notes if you’re worried you’ve copied something down wrong or if you write so badly you can’t read your own handwriting. Although if there is a difference, it might not be your notes that are wrong!
These notes were taken from the course lectured by Prof Grimmett in Lent 2010. If you get a different lecturer (increasingly likely as time goes on) the stuff may be rearranged or the concepts may be introduced in a different order, but hopefully the material should be pretty much the same. If they start to mess around with what goes in what course, you may have to start consulting the notes from other courses. And I won’t be updating these notes (beyond fixing mistakes) – I’ll be far too busy trying not to fail my second/third/nth year courses.
Good luck – Mark Jackson
Schedules
These are the schedules for the year 2009/10, i.e. everything in these notes that was
examinable in that year. The numbers in brackets after each topic give the subsection of these notes where that topic may be found, to help you look stuff up quickly.
Basic concepts
Classical probability (1.1), equally likely outcomes. Combinatorial analysis (1.2), permutations and combinations (1.3). Stirling’s formula (asymptotics for log n! proved) (1.3.3).
Axiomatic approach
Axioms (countable case). Probability spaces (2.1). Addition theorem, inclusion-exclusion formula (2.1.4). Boole and Bonferroni inequalities (2.1.4). Independence (2.3). Binomial (2.3.4), Poisson and geometric (2.3.5) distributions. Relation between Poisson and binomial distributions (3.1.4). Conditional probability (2.2), Bayes’s formula (2.2.4). Examples, including Simpson’s paradox (2.2.5).
Discrete random variables
Expectation (3.2). Functions of a random variable (3.1.2, 3.1.3), indicator function (3.5), variance, standard deviation (3.2.4). Covariance (3.4.2), independence of random variables (3.4). Generating functions (3.3): sums of independent random variables, random sum formula (3.6.3), moments (3.2.5).
Conditional expectation (3.6.1). Random walks: gambler’s ruin, recurrence relations (2.3.6). Difference equations and their solution (3.3.4). Mean time to absorption (3.8). Branching processes (3.7): generating functions (3.7.2) and extinction probability (3.7.4). Combinatorial applications of generating functions.
Continuous random variables
Distributions and density functions (4.1). Expectations (4.3); expectation of a function of a random variable (4.3.2). Uniform, normal and exponential random variables. Memoryless property of exponential distribution (4.1.2).
Joint distributions (4.4): transformation of random variables, examples (4.2, 4.5). Simulation: generating continuous random variables, independent normal random variables (4.4.4). Geometrical probability: Bertrand’s paradox (6.1), Buffon’s needle (6.2). Correlation coefficient (3.4.2), bivariate normal random variables (4.6).
Inequalities and limits
Markov’s inequality, Chebyshev’s inequality (5.2). Weak law of large numbers (5.3). Convexity: Jensen’s inequality (5.1), AM/GM inequality (5.1.2).
Moment generating functions (7.2). Statement of central limit theorem (7.1) and sketch of proof (7.3). Examples, including sampling (7.3).
Contents
1. Basic concepts
 1.1 Sample space
 1.2 Combinatorial probability
 1.3 Permutations and combinations
2. Probability spaces
 2.1 Introduction
 2.2 Conditional probability
 2.3 Independence
3. Discrete random variables
 3.1 Random variables
 3.2 Expectations of discrete random variables
 3.3 Probability generating functions
 3.4 Independent discrete random variables
 3.5 Indicator functions
 3.6 Joint distributions and conditional expectations
 3.7 Branching process
 3.8 Random walk (again)
4. Continuous random variables
 4.1 Density functions
 4.2 Changes of variables
 4.3 Expectation
5. Three very useful results
 5.1 Jensen’s inequality
 5.2 Chebyshev’s inequality
 5.3 Law of large numbers
 4.4 Families of random variables
 4.5 Change of variable
 4.6 Bivariate (or multivariate) normal distribution
6. Geometrical probability
 6.1 Bertrand’s paradox
 6.2 Buffon’s needle
 6.3 Broken stick
7. Central limit theorem
 7.1 Central limit theorem
 7.2 Moment generating functions
 7.3 Proof of the central limit theorem
 7.4 Further applications of the central limit theorem and moment generating functions
8. Convergence of random variables (non-examinable?)
 8.2 Almost-sure convergence
 8.3 Strong law of large numbers
Notes from final lecture
1. Basic concepts
1.1 Sample space
Experiment – something with an uncertain outcome, e.g.
tossing a coin
throwing a die
spinning a roulette wheel
a lottery machine selecting a combination (a k-subset of the possible numbers) or something
spinning a pointer
The sample space is the set of all possible outcomes.
A subset of is called an event. E.g. in the above sample spaces, the following are all events;
, the event that a head is thrown
, the event that a prime is thrown on a die
, the event that an even number comes up on a roulette wheel
the event that you get a run on the lottery
the event that the time is o’clock.
is called an elementary event.
If occurs, and , we say “ has occurred”.
Here is a dictionary between set theory and probability.
A ∪ B: either A or B occurred. A^c: A did not occur.
A ∩ B: both A and B occurred. A ⊆ B: when A occurs, B occurs.
A \ B: A occurred but not B. A = B: A and B are equivalent.
A ∩ B = ∅: A and B are mutually exclusive (both cannot occur).
1.2 Combinatorial probability
Let be finite and . Define
If the s have equal probability, then .
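The defining formula did not survive the copying; in the usual notation (Ω the finite sample space, A ⊆ Ω an event) it should be
P(A) = \frac{|A|}{|\Omega|}, \qquad \text{so that } P(\{\omega\}) = \frac{1}{|\Omega|} \text{ for each outcome } \omega.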
Example. A hand of cards from a pack of is dealt at random. What is the probability that it contains (i) exactly one ace (ii) exactly one ace and two kings?
Example. From a table of random integers, pick the first . Then . Assume each element in is equiprobable. What is the probability that (i) no digit exceeds (ii) is the greatest digit?
since, if we call the greatest digit, (i) is asking for and (ii) is asking for .
1.3 Permutations and combinations
1.3.1 Definitions
Definition. A permutation is the number of ways of choosing an ordered sequence of size from a set of size (e.g. football teams with positions). We write this as
Definition. A combination is the same, but the sequence is unordered; we write this as .
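The counting formulas themselves were lost; the standard ones (n the size of the set, r the size of the selection) are
{}^{n}P_{r} = n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}, \qquad {}^{n}C_{r} = \binom{n}{r} = \frac{{}^{n}P_{r}}{r!} = \frac{n!}{r!\,(n-r)!}.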
1.3.2 Examples
Example. An urn contains blue balls and red balls. What is the probability that the first red ball picked (without replacement) is the th overall?
The number of outcomes after all balls are chosen is
The number of outcomes of the form
is
and the answer is .
Example. keys are picked at random in attempts to open one lock. What is the probability that the th key opens the lock?
Sampling with replacement, let be the number of keys tried (including the successful one).
for small , large .
Sampling without replacement,
1.3.3 Stirling’s formula
Stirling’s formula states that
where we say that if as .
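The formula itself is missing above; Stirling’s formula in its usual forms is
n! \sim \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^{n}, \qquad \text{equivalently} \qquad \log n! \sim n\log n - n,
and it is only the logarithmic version that gets proved in the course.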
Proof. We prove the logarithmic version;
by comparing columns. Hence
since the LHS and RHS both as .
Example. A fair coin is tossed repeatedly; what is the probability that, after tosses, the number of heads equals the number of tails?
The answer is
by using Stirling’s formula.
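The calculation was lost; writing 2n for the number of tosses (the count must be even for equality to be possible), the standard answer is
P(\text{heads} = \text{tails}) = \binom{2n}{n}\, 2^{-2n} \sim \frac{1}{\sqrt{\pi n}} \quad \text{as } n \to \infty.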
2. Probability spaces
2.1 Introduction
A probability space consists of three component objects: a sample space , a collection of events, and a probability function.
2.1.1 Event spaces
An ‘event space’ is the power set of , denoted or , which is the set of all subsets of . If is finite, or countably infinite, we usually take for the event space. However if is uncountable, is too big.
Theorem (Banach, Kuratowski). Let be uncountable. There is no with countably additive, , and .
Proof. Uses the continuum hypothesis.
It is a reasonable statement that if are events then so are , and . Therefore;
Definition. An event space is a collection of subsets of such that
a)
b) If then ( is closed under countable unions)
c) If then ( is closed under complementation)
It is also called a ‘ -field’ or ‘ -algebra’.
Notes.
1) by (a) and (c)
2) Finite unions lie in , since
where .
3) so
so is closed under countable intersection.
4) if . Similarly with .
5) Property (a) is equivalent to saying that is non-empty, since
2.1.2 Probability measures
Definition. Let be a set and be an event space in . A probability measure on is a function such that
a)
b) and
c) if and are disjoint ( for ) then
i.e. has countable additivity.
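In symbols (the conditions above lost their formulas), a probability measure P: \mathcal{F} \to [0,1] should satisfy
P(\Omega) = 1, \qquad 0 \le P(A) \le 1 \text{ for all } A \in \mathcal{F}, \qquad P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \text{ whenever the } A_i \text{ are pairwise disjoint}.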
Notes. If is a probability measure then
a) is finitely additive
b) follows from other axioms, since so
c) ‘is equal to’ the probability of
2.1.3 Definition of probability spaces
Definition. A probability space is a triple where is an event space of and is a probability measure on .
Examples. 1) The Bernoulli distribution, equivalent to a coin toss, where , and . We then have
2) , , where the satisfy and .
3) Same as 2), but with
4) and
for and . This is the Poisson distribution.
2.1.4 Some handy theorems and inequalities
Theorem 2.1. Some basic properties; if , etc... then
a)
b) If then
c)
Proof (c). is a disjoint union. Also, is a disjoint union. Then and and the result follows.
A Venn diagram is (often) useful.
Theorem 2.2 (Inclusion-Exclusion Principle). For ,
Proof. By induction on ,
Note. Often easier to calculate than .
Proposition 2.3 (Boole’s Inequality). If , then
Proof. Induction on . (Also valid if ; proof later.)
Proposition 2.4 (Bonferroni’s Inequality). If and is even,
If is odd, then the inequality changes to a .
Proof. By induction.
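The three displayed statements above were lost in copying; their standard forms are
P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap \cdots \cap A_n) \quad \text{(inclusion-exclusion)},
P\Big(\bigcup_{i} A_i\Big) \le \sum_{i} P(A_i) \quad \text{(Boole)},
and Bonferroni’s inequality says that stopping the alternating inclusion-exclusion sum after an even number of batches of terms under-estimates P(\bigcup_i A_i), while stopping after an odd number over-estimates it.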
2.1.5 The example of the mad porter
Example (derangements). After dinner, the porter hands the hats back randomly to the guests, one each. What is the probability that no-one receives the correct hat?
Let and . Let . What is
?
Take distinct people . Then
By the Inclusion-Exclusion Principle,
Therefore the answer to the question is as .
Let be . Then
We deduce that the number of correctly hatted guests converges, as , to the Poisson distribution with parameter .
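The limits in this example were lost; with n guests the standard conclusion is
P(\text{no guest gets the right hat}) = \sum_{k=0}^{n} \frac{(-1)^{k}}{k!} \to e^{-1} \quad \text{as } n \to \infty,
and more generally P(\text{exactly } m \text{ correctly hatted guests}) \to e^{-1}/m!, which is the Poisson mass function with parameter 1 referred to above.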
2.2 Conditional probability
2.2.1 Introduction
The event has probability . We discover that has occurred. What is now? It must be , but what is ?
If , then , so , so , so . Thus;
Definition. The conditional probability of given is
whenever .
Theorem 2.5. If and , then .
Proof. a disjoint union. So and use the definition.
Theorem 2.6 (more general). Let be a partition of with . Then
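The definition and the two theorems of this subsection lost their formulas; in standard notation (B an event with P(B) > 0, and B_1, B_2, \dots a partition of \Omega with each P(B_i) > 0) they should read
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(A) = P(A \mid B)\,P(B) + P(A \mid B^{c})\,P(B^{c}), \qquad P(A) = \sum_{i} P(A \mid B_i)\,P(B_i).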
2.2.2 Application to ‘two-stage calculation’
Toss a fair coin. If heads, throw one die; if tails, throw two dice. What is the probability that the total shown is ?
Let , and . Then
2.2.3 Properties of conditional probability
a)
b)
(c)
(d)
2.2.4 Bayes’ formula
Theorem 2.7 (Bayes’ formula). Let partition with . Then
Proof.
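In the same notation, Bayes’ formula should read
P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i} P(A \mid B_i)\,P(B_i)}.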
A typical application to ‘real life’ is in medical diagnosis, for example if are types of disease and is the symptoms. Then the doctor must compute from and .
Example (false positives). A rare disease affects in people. A test correctly identifies the disease of the time, and wrongly identifies the disease of the time. If we let and , then by Bayes’ formula, so the test is almost worthless!
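The numerical values in this example did not survive the copying. As a rough illustration only, here is the Bayes calculation with made-up but typical figures (the prevalence and error rates below are assumptions, not the lecturer’s numbers), showing how the ‘almost worthless’ conclusion arises:

```python
# Illustrative figures only -- assumptions, not the numbers used in the lecture.
p_disease = 1e-4            # P(D): prevalence, say 1 in 10,000
p_pos_given_disease = 0.99  # P(+ | D): the test flags a genuine case
p_pos_given_healthy = 0.01  # P(+ | not D): false-positive rate

# Bayes' formula: P(D | +) = P(+ | D) P(D) / [ P(+ | D) P(D) + P(+ | not D) P(not D) ]
p_positive = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_positive = p_pos_given_disease * p_disease / p_positive
print(f"P(disease | positive test) = {p_disease_given_positive:.4f}")
# roughly 0.0098: even after a positive test the chance of disease is below 1%.
```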
2.2.5 Simpson’s paradox
There are two treatments for kidney stones; open surgery (OS) and percutaneous nephrolithotomy (PN).
Treatment Big stones Small stones Overall
OS
PN
So OS beats PN in both sub-categories, but PN beats OS overall!
Mathematically, if we let success, PN, OS, small and large then the following are not inconsistent;
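Spelling out the lost inequalities (writing S for ‘success’ and conditioning on treatment and stone size; the labels are mine), the point is that all three of the following can hold simultaneously:
P(S \mid \text{OS}, \text{big}) > P(S \mid \text{PN}, \text{big}), \qquad P(S \mid \text{OS}, \text{small}) > P(S \mid \text{PN}, \text{small}), \qquad \text{yet} \quad P(S \mid \text{OS}) < P(S \mid \text{PN}).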
2.3 Independence
2.3.1 Definition
Definition. Events are independent if . More generally, a collection of events is independent if
Definition. is pairwise independent if .
Note that independence implies pairwise independence, but the converse is not true.
E.g. , , with , , is pairwise independent but not independent.
2.3.2 ‘Independence’ and ‘repeated trials’
E.g. Two dice thrown, and each of the possible outcomes has probability . Let be an attribute of the number on the first die, and of the number on the second die. Then
2.3.3 Product spaces
Let and be two probability spaces, with
Let , and be something appropriate, then such that
is a probability measure. This is because if and , then
2.3.4 The binomial distribution
By convention, coin tosses, die throws etc. have independent outcomes. E.g. flips of a coin which comes up heads with probability each time. Let be the number of heads, then
for . This is the binomial distribution.
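The mass function was lost; with n flips and head probability p it is
P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n.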
Another way of reaching the answer is as follows; let be the outcome of the first toss. Then
This technique is called recursion.
2.3.5 The geometric distribution
Flip coins and let be the number of coins until the first head. Then
Alternatively, . Careful.
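The two lost formulas (which, presumably, the ‘Careful’ refers to, since two conventions are in common use) should be
P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, \dots \quad \text{(number of flips up to and including the first head)},
P(Y = k) = (1-p)^{k} p, \quad k = 0, 1, 2, \dots \quad \text{(number of tails before the first head)}.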
2.3.6 Random walk
A particle inhabits . At each stage it takes a step. The steps lie in with probabilities , . The steps are independent of one another, like coin tosses.
Gambler’s ruin problem; A gambler’s fortune at stage is the th position of the random walk. What is the probability that the gambler finishes bankrupt, i.e. the random walk hits before having started at , where ?
Let and . Then
which is a difference equation, with boundary conditions and .
Try , then , so , so .
General solution is
Feeding in the boundary conditions we get
is always a solution in these sorts of things. The recurrence relation is
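Restoring the lost displays in the usual notation (p_k the ruin probability starting from fortune k, up-step probability p, down-step q = 1 - p, absorbing barriers at 0 and N): the recurrence is
p_k = p\, p_{k+1} + q\, p_{k-1} \quad (0 < k < N), \qquad p_0 = 1, \quad p_N = 0,
with solution
p_k = \frac{(q/p)^{k} - (q/p)^{N}}{1 - (q/p)^{N}} \quad (p \neq q), \qquad p_k = 1 - \frac{k}{N} \quad (p = q = \tfrac{1}{2}).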
3. Discrete random variables
3.1 Random variables
3.1.1 Definition
Definition. Let be a probability space. A real-valued random variable on this space is a function .
Note. In this, for simplicity, assume is countable and
. For uncountable there is an extra condition (see later).
Examples. (i) Toss a coin twice; number of heads.
(ii) Throw a die thrice; largest number that comes up.
3.1.2 Distribution function
Definition. The distribution function of is the function given by
Note. may be written to emphasise the role of .
Example. Toss a coin once, so that , and . Let be the outcome and let . Then
3.1.3 Probability function
Definition. The probability (mass) function of is the function (or ) given by , .
Note. We use the mass function rather than the distribution function when is discrete, i.e. either finite or countably infinite such that . This is not always ‘jumpy’.
Examples. (i) The Bernoulli distribution has , .
(ii) The binomial distribution has
(iii) The Poisson distribution has
3.1.4 Binomial-Poisson limit theorem
Example (misprints). An edition of the Grauniad has characters and each character is mis-set with some probability . Let be the total number of misteaks .
Take, for example, and . What is the approximate distribution of ? The answer is because of the
Theorem 3.1 (Binomial-Poisson limit theorem). If , in such a way that remains constant, then
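The statement should be: if n \to \infty and p \to 0 in such a way that np = \lambda stays constant, then
\binom{n}{k} p^{k} (1-p)^{n-k} \to \frac{e^{-\lambda} \lambda^{k}}{k!} \qquad \text{for each fixed } k.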
3.1.5 More examples
Examples. (iv) The geometric distribution has
(v) The negative binomial distribution has and and is defined to be the number of coin tosses until the appearance of the th head. Then
[Note that
where is defined to be
for .]
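With X the number of tosses needed to obtain the r-th head (head probability p), the lost mass function is, in standard form,
P(X = k) = \binom{k-1}{r-1} p^{r} (1-p)^{k-r}, \qquad k = r, r+1, \dots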
Why is it called the ‘negative binomial’?
where and
3.2 Expectations of discrete random variables
3.2.1 Definition
Let be a probability space and be a discrete random variable.
Definition. The expectation (or mean, expected value) of is
wherever the sum converges absolutely.
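In symbols, with p_X the mass function of X:
E[X] = \sum_{x} x\, P(X = x) = \sum_{x} x\, p_X(x), \qquad \text{provided } \sum_{x} |x|\, p_X(x) < \infty.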
(All random variables in this section are assumed to be discrete.)
3.2.2 Composition of functions
Theorem 3.2 (Law of the unconscious statistician). Suppose and . Then
whenever this expectation exists.
Proof. Let . Then
By convention, capital letters denote random variables whereas small letters denote their possible values.
3.2.3 Properties of expectation
1) If , i.e. , then .
2) If and then .
Proof.
3) for
Proof.
4)
Proof.
3.2.4 Variance
is a measure of the ‘centre’ of a distribution, whereas the variance is a measure of the ‘dispersion’.
Definition. The variance of is and the standard deviation is
. We write for the variance, for the standard deviation and for the expectation.
Notes. (i) – non-linearity.
(ii)
, since
Take care with the parentheses!
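For reference, the lost formulas of this subsection are presumably the standard ones:
\operatorname{var}(X) = E\big[(X - E X)^{2}\big] = E[X^{2}] - (E X)^{2}, \qquad \sigma = \sqrt{\operatorname{var}(X)}, \qquad \operatorname{var}(aX + b) = a^{2}\operatorname{var}(X).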
3.2.5 Moments
Definition. The th moment of is .
Notes. (i) and
(ii)
(iii) iff .
3.2.6 Examples for various distributions
1) The Bernoulli distribution
, , so and , so where .
Take , ( ). Let be
the indicator function of . Then , and .
Indicator functions obey the rules and .
2) The binomial distribution has
because
3) The Poisson distribution has
4) The geometric distribution has
Now
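The worked answers of this subsection were lost; the standard means and variances (which the calculations should reproduce) are:
Bernoulli(p): E X = p, var X = p(1-p).
Binomial(n, p): E X = np, var X = np(1-p).
Poisson(\lambda): E X = var X = \lambda.
Geometric(p), counting flips up to the first head: E X = 1/p, var X = (1-p)/p^{2}.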
3.3 Probability generating functions
3.3.1 Definition
Definition. Let be a random variable taking values in . Its probability generating function is the function given by
whenever the sum converges absolutely. (The probability generating function can be thought of as a transform; remember Fourier.)
Notes. (i) The sum converges on . We often restrict the domain to .
(ii) We write to emphasise the dependence on .
(iii) and .
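In symbols, for X taking values in \{0, 1, 2, \dots\}:
G_X(s) = E[s^{X}] = \sum_{k=0}^{\infty} P(X = k)\, s^{k}, \qquad G_X(1) = 1, \qquad G_X(0) = P(X = 0).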
Theorem 3.3. The distribution of is uniquely determined by its probability generating function .
‘Proof’. . . , and progressive
differentiation of at gives the probabilities .
3.3.2 Reasons for using probability generating functions
1) They are an elegant method for dealing with sums of independent random variables (later).
2) They are a good method for calculating moments;
. ( may be on edge of domain of convergence. Abel’s Lemma validates this statement.)
And similarly, and .
.
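The moment formulas referred to are, in standard notation,
E[X] = G_X'(1), \qquad E[X(X-1)] = G_X''(1), \qquad \operatorname{var}(X) = G_X''(1) + G_X'(1) - \big(G_X'(1)\big)^{2}.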
3.3.3 Examples of probability generating functions
Distribution probability generating function
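The table itself was lost; the standard probability generating functions for the distributions met so far are (writing q = 1 - p):
Bernoulli(p): G(s) = q + ps
Binomial(n, p): G(s) = (q + ps)^{n}
Poisson(\lambda): G(s) = e^{\lambda(s-1)}
Geometric(p): G(s) = \frac{ps}{1 - qs}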
3.3.4 Application of generating functions to difference equations
An bathroom wall is to be tiled with tiles. In how many ways can this be done?
Let be the number of ways. Then , and . Multiply and sum;
Let
Then , so
where and .
3.4 Independent discrete random variables
3.4.1 Joint mass function
Definition. Let be discrete random variables. The joint mass function of the pair is given by .
We say that and are independent if . That is, the joint mass function factorises as the product of the ‘marginal’ mass functions.
Note. independent . This is because
3.4.2 Covariance and correlation
Definition. The covariance of and is and the
correlation (coefficient) is
if and .
and are uncorrelated if .
Note. .
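In standard notation (assuming these are the lost definitions):
\operatorname{cov}(X, Y) = E\big[(X - E X)(Y - E Y)\big] = E[XY] - E[X]\,E[Y], \qquad \rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}.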
Theorem 3.4. (a) If are independent,
(b) If are independent, .
(c) There exist random variables which are uncorrelated but not independent.
Proof. (a)
(b) , .
(c) Let be independent with distribution . Let , . Then , but . (Exercise.)
3.4.3 Correlation as a measure of dependence
(a) Correlation is a single number.
(b) .
Theorem 3.5 (Schwarz’s inequality).
.
Proof. Let , . Then . So this quadratic has one or no real roots, so the discriminant is , therefore
(c) iff for some , i.e. iff for some
, with [exercise] . Similarly,
(d) iff for some , . ( is undefined if .)
(e) if independent and .
(f)
if . I.e. the correlation is unchanged by scaling and moving the origin.
3.4.4 Three random theorems on independence
Theorem 3.6. (a) .
(b) If independent, .
Proof. (a)
.
(b) Since are assumed independent, their covariance is .
Examples. (a) The binomial distribution . The variance Bernoulli distribution .
(b) Negative binomial distribution with parameters . Variance
.
Theorem 3.7. , ,
Proof.
but this is a disjoint union.
Corollary. If and are independent,
This is called a convolution of and , written .
Theorem 3.8. If and are independent (and take values in ) then .
Proof. .
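The statements of Theorem 3.7, its corollary and Theorem 3.8 should be, in the usual notation,
P(X + Y = z) = \sum_{x} P(X = x,\; Y = z - x),
P(X + Y = z) = \sum_{x} P(X = x)\, P(Y = z - x) \quad \text{for independent } X, Y \quad \text{(the convolution)},
G_{X+Y}(s) = E[s^{X+Y}] = E[s^{X}]\, E[s^{Y}] = G_X(s)\, G_Y(s) \quad \text{for independent } X, Y.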
Example. Let be and be , with independent. What is the distribution of
? , therefore is .
Example. What is the probability generating function of the negative binomial distribution with parameters ? Answer sum of independent random variables, say where and the s are independent. So
3.5 Indicator functions
Let , then if and if . Sometimes write .
Note: .
.
.
.
Converting to probabilities, .
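The identities listed above are presumably the standard ones:
1_{A^{c}} = 1 - 1_A, \qquad 1_{A \cap B} = 1_A\, 1_B, \qquad 1_{A \cup B} = 1_A + 1_B - 1_A 1_B, \qquad E[1_A] = P(A),
and converting the union identity to probabilities gives P(A \cup B) = P(A) + P(B) - P(A \cap B).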
Example.
Multiplying out and taking expectations gives the inclusion-exclusion formula.
Example. ( ) married couples are seated round a table with the wives randomly in the odd seats and the husbands in the even seats. Let be the number of husbands sitting by their wives. Calculate and .
Let th couple are seated together . Then
3.6 Joint distributions and conditional expectations
3.6.1 Definitions
If are discrete random variables, then the joint mass function .
The marginal mass function , so .
The conditional mass function of given is
The conditional expectation of given that is
We normally say the conditional expectation of given is . This is a random variable.
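Restoring the lost formulas in standard notation:
p_{X,Y}(x, y) = P(X = x,\, Y = y), \qquad p_X(x) = \sum_{y} p_{X,Y}(x, y), \qquad p_{Y \mid X}(y \mid x) = \frac{p_{X,Y}(x, y)}{p_X(x)},
E[Y \mid X = x] = \sum_{y} y\, p_{Y \mid X}(y \mid x),
and E[Y \mid X] is the random variable taking this value on the event \{X = x\}.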
3.6.2 Example and theorems
Example. are independent and identically have the distribution . Let . What is ?
Solution 1.
Solution 2. using the rule that .
Theorem 3.9. (a) If are independent, then .
(b) .
Proof. (b)
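Theorem 3.9 is presumably the standard pair of facts:
(a) if X and Y are independent then E[X \mid Y] = E[X]; \qquad (b) E\big[E[X \mid Y]\big] = E[X].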
3.6.3 Random sum formula
if . What about when we have a random number of random variables?
Theorem 3.10 (Random sum formula). Let be independent, taking values in such that the are identically distributed with probability generating function . The
‘random sum’ has probability generating function .
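In symbols, with S = X_1 + \cdots + X_N:
G_S(s) = E[s^{S}] = \sum_{n} P(N = n)\,\big(G_X(s)\big)^{n} = G_N\big(G_X(s)\big),
and differentiating at s = 1 gives Theorem 3.11 below: E[S] = E[N]\, E[X_1].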
Proof.
Theorem 3.11. In notation of Theorem 3.10, .
Proof.
.
Exercise. Compute in terms of moments of and .
3.7 Branching process
3.7.1 Definition
This is a basic model for population/bacterial/etc... growth. At generation , there is some number of individuals. Assume
a)
b) is a random variable, the number of offspring of the progenitor, i.e. it has probability mass function ;
c) each individual in the system has a family of offspring with same distribution as ;
d) crudely speaking, all family sizes are independent.
Then
where
is the number of offspring of the th member of the
th generation.
3.7.2 Relation to probability generating functions
Let probability generating function of .
Theorem 3.12. where , and similarly . Hence
.
Theorem 3.13. Let and . Then and
if and if .
Proof. . so
. Therefore
. Similarly for the variance (exercise).
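In the usual notation (Z_n the size of generation n, F the probability generating function of the offspring distribution, \mu and \sigma^{2} its mean and variance), the lost statements should be
F_{n+1}(s) = F_n\big(F(s)\big) = F\big(F_n(s)\big), \qquad E[Z_n] = \mu^{n}, \qquad \operatorname{var}(Z_n) = \sigma^{2}\mu^{n-1}\,\frac{\mu^{n} - 1}{\mu - 1} \;\; (\mu \neq 1), \quad n\sigma^{2} \;\; (\mu = 1).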
Example where has a closed form. Let where and . Then if , and . By induction,
is the coefficient of in the Taylor expansion of .
3.7.3 Extinction (and some general theory)
When , this becomes if and if .
Let . Then . Therefore
How do you relate to the ?
Theorem 3.14 (probability measures are continuous set functions). If are events with , then
Proof. Let and . Then . So
3.7.4 Value of the probability of extinction
Let by continuity of
Theorem 3.15. is the smallest non-negative root of the equation .
Proof. Let so that
Then . Since and
since is continuous on .
Let be a non-negative root of . Then since is non-decreasing on .
. Therefore , so .
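In symbols: writing \eta = \lim_{n \to \infty} P(Z_n = 0) for the probability of ultimate extinction, Theorem 3.15 says that \eta is the smallest non-negative root of the equation
s = F(s).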
Theorem 3.16. (a) If then .
(b) If then .
(c) If and then .
Proof.
therefore is a convex function.
(a) . If then is the only root of in . So .
(b) If , there is a second root of in . So .
(c) If and , then so .
3.8 Random walk (again)
Three types of barrier; ‘absorbing’ (you die when you hit the barriers), ‘reflecting’ (you bounce off), ‘retaining’ (you can’t move past the barriers but don’t bounce off).
Take a random walk on with absorbing barriers at and . Let be the number of steps taken until absorption and start at .
Thus
General solution
Particular solution
Values of , can be calculated from the boundary conditions. Hence
4. Continuous random variables
4.1 Density functions
4.1.1 Definition and notes
Probability space , random variable . Distribution function .
Definition. The random variable is called continuous (misnomer) if there exists such that
a)
b)
.
If this holds, is called the (probability) density function of , sometimes written .
(i) If is differentiable, then we take .
(ii) If has pdf , then .
(iii)
.
(iv) pdf’s satisfy ,
.
(v) element of probability is
In particular
.
4.1.2 Examples of density functions
Uniform distribution on , Unif .
Exponential distribution Exp
Distribution function
Exponential distribution is ‘memoryless’ (lack-of-memory property). For ,
Exercise; prove that the exponential distribution is the only continuous distribution with the memoryless property.
Normal distribution (or Gaussian distribution) has density function
More generally, has density function
is changed by location and scale.
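The density formulas of this subsection were lost; the standard ones are:
Uniform on [a, b]: f(x) = \frac{1}{b - a} for a \le x \le b (and 0 otherwise).
Exponential, Exp(\lambda): f(x) = \lambda e^{-\lambda x} for x \ge 0, with distribution function F(x) = 1 - e^{-\lambda x}.
Lack-of-memory property: P(X > s + t \mid X > s) = P(X > t) for s, t \ge 0.
Normal N(\mu, \sigma^{2}): f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\Big(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\Big); the standard normal N(0, 1) has f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}.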
4.2 Changes of variables
is a random variable, . What is the distribution of ? I.e.
If has probability density function , and is strictly increasing and differentiable,
Example. Let (i.e. has uniform distribution on ). Let . What is the distribution of ?
Therefore has distribution .
Example. Let , , i.e. where .
Therefore .
Example. Let , and let be a continuous distribution function. Let .
Therefore has distribution function . A key fact in Monte Carlo methods.
What if has flat sections?
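Since the example above lost its formulas, here is a small simulation sketch of the ‘key fact in Monte Carlo methods’: if U is uniform on (0, 1) and F is a continuous, strictly increasing distribution function, then F^{-1}(U) has distribution function F. The exponential case below is only an illustration, and the function names are mine.

```python
import math
import random

# Inverse-transform sampling sketch.
# If U ~ Unif(0,1) and F(x) = 1 - exp(-lam * x) is the Exp(lam) distribution function,
# then X = F^{-1}(U) = -log(1 - U) / lam has the Exp(lam) distribution.
def sample_exponential(lam: float) -> float:
    u = random.random()
    return -math.log(1.0 - u) / lam

samples = [sample_exponential(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean; should be close to 1/lam = 0.5
```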
4.3 Expectation
Recall. discrete, then
4.3.1 Continuous expectation
Definition. The expectation of the continuous random variable is
whenever this integral is absolutely convergent (i.e. ).
Note. This expectation has same general properties as that of discrete random variables. E.g. linearity, mean, variance, moments, covariance, correlation, etc.
4.3.2 Expectation of functions
Theorem 4.1. If has probability density function , and ,
Proposition 4.2. If is a continuous random variable then
Note. can be taken as a definition of that does not depend on the type of (discrete, continuous, etc.) .
Proof (4.2).
Therefore (1)-(2)=
.
Proof (4.1).
5. Three very useful results
5.1 Jensen’s inequality
5.1.1 Convex and concave functions
Definition. A function is convex if .
A function is concave if – is convex.
Examples. ,
Fact. If exists on and then is convex.
5.1.2 Jensen’s inequality
Theorem 5.1 (Jensen’s inequality). Let be a random variable taking
values in and let be convex on . Then .
Example (AM-GM inequality). Let with and . By Jensen’s inequality;
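In symbols (standard statements): for a convex function g, i.e. one with g(tx + (1 - t)y) \le t\,g(x) + (1 - t)\,g(y) for t \in [0, 1], Jensen’s inequality says
E[g(X)] \ge g\big(E[X]\big),
and taking P(X = x_i) = 1/n with the convex function g(x) = -\log x gives the AM-GM inequality
\big(x_1 x_2 \cdots x_n\big)^{1/n} \le \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad (x_i > 0).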
Notes. 1) Equality holds in JI iff is constant (with probability ).
2) unless is a constant random variable.
Lemma 5.2 If is convex on then
with , , .
Proof. Induction on . OK for by definition of convexity. Assume OK for , then
5.1.3 Sketch proof of general Jensen’s inequality
Theorem 5.3 (Supporting hyperplane theorem). is convex on iff , .
Proof of Jensen’s inequality from supporting hyperplane theorem is as follows;
Set and choose accordingly. Then
.
Theorem 5.4 (Separating hyperplane theorem). If , there exists a line with beneath and the curve above (strictly).
Proof. distance from to . is a continuous function of , as strides? on curve. has a
minimum; with on curve. Take
perpendicular bisector of line from to .
Proof of supporting hyperplane theorem;
Find points with . Let , . Find separating hyperplane separating from the curve. Hence line which is easily shown to be supporting.
5.2 Chebyshev’s inequality
If is small, then is near . In what way?
Theorem 5.5 (Markov’s inequality). If exists then
Proof. Let . Then , so .
Theorem 5.6 (Chebyshev’s inequality).
Proof. Using Markov’s inequality
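The two inequalities, and the one-line proof of the second, should read:
P(X \ge t) \le \frac{E[X]}{t} \quad \text{for } X \ge 0,\ t > 0 \quad \text{(Markov)};
P\big(|X - E X| \ge t\big) = P\big((X - E X)^{2} \ge t^{2}\big) \le \frac{E[(X - E X)^{2}]}{t^{2}} = \frac{\operatorname{var}(X)}{t^{2}} \quad \text{(Chebyshev)}.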
Often the following is very useful
Large deviation theory.
5.3 Law of large numbers
5.3.1 Law of large numbers
Theorem 5.7. Let be independent and identically distributed with finite variance and mean . Let
. Then
(a) as (mean square law of large numbers)
(b) as (weak law of large numbers)
Proof. (a) . Then
(b) By Chebyshev,
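In symbols, with S_n = X_1 + \cdots + X_n:
(a) \; E\Big[\Big(\frac{S_n}{n} - \mu\Big)^{2}\Big] = \operatorname{var}\Big(\frac{S_n}{n}\Big) = \frac{\sigma^{2}}{n} \to 0; \qquad (b) \; P\Big(\Big|\frac{S_n}{n} - \mu\Big| \ge \varepsilon\Big) \le \frac{\sigma^{2}}{n\varepsilon^{2}} \to 0 \quad \text{for every } \varepsilon > 0.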
5.3.2 Principle of repeated experimentation
Repeat an experiment “independently” and each time observe whether or not occurs. occurs at the experiment. The number of occurrences of up to time is
. Proportion is
4.4 Families of random variables
4.4.1 Joint functions
a vector of random variables on
Joint distribution function.
with , and . If
then we call the joint pdf of . If we can, we take
Note. Let and suppose is jointly continuous with joint pdf
4.4.2 Marginal functions
The marginal distribution function of is
The marginal density function is
4.4.3 “Element of probability”
is
4.4.4 Independence
Generally, and are independent if for ,
and are independent .
If is continuous (has a joint pdf) then and are independent iff
If and are independent then and hence
if are continuous and independent.
4.4.5 Conditional pdf of given
Some people write the LHS as assuming undefined as but means take the limit as .
4.4.6 Conditional expectation of given
where
Theorem 4.3.
4.5 Change of variable
Example. Let be a random point in , with joint probability density function . Let
and . What is the joint probability density function of ?
4.5.1 General solution
General question; have joint pdf . , . where What is
Define by so .
Need some invertibility of .
Let . Assume is invertible on .
Let and . Then
where and is the modulus of the Jacobian determinant
Therefore for .
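The change-of-variable formula being used is, in standard notation (with (U, V) = T(X, Y) and T invertible on the region in question),
f_{U,V}(u, v) = f_{X,Y}\big(x(u, v),\, y(u, v)\big)\, |J(u, v)|, \qquad J(u, v) = \det\begin{pmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{pmatrix}.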
4.5.2 Examples
Example. Let be independent with distribution . Let
What is ?
Solution.
Inverse: , , , .
. is invertible. The Jacobian is
Therefore for and ,
which can be written in the form . Therefore are independent.
has pdf on .
has pdf on so has distribution .
Example. Let be independent with , , . Then
, ,
are independent
on . This distribution has spherical symmetry. A function has spherical symmetry if it is equal to for some .
4.6 Bivariate (or multivariate) normal distribution
Recall
Note the elementary fact that
Now iff is .
4.6.1 Definition and expectation
Bivariate normal distribution is of the form
where is a ‘quadratic form’ in .
Usual expectation is, for ,
where , , .
Usually write
where and and
4.6.2 The normalised random variables and
Example. Let have that joint pdf,
‘Completing the square’ gives
Thus .
has distribution
Note.
where
which is not a coincidence. We can prove that is the covariance;
4.6.3 Important properties
1) Two random variables with the bivariate normal distribution are independent iff their correlation is .
because the cross-product is in , and hence .
2) If is bivariate normal then is univariate normal. ‘Linearisation retains normality.’
4.6.4 Covariance matrix
Vector of random variables. The mean vector
. The covariance matrix is
4.6.5 Multivariate normal distribution (not examinable)
Definition. has the multivariate normal distribution if
for . It may be shown that and the covariance matrix of is .
Alternative definition. The vector is said to have the multivariate normal distribution whenever;
has a (univariate) normal distribution.
6. Geometrical probability
6.1 Bertrand’s paradox
A chord of the unit circle is picked at random. What is the probability that an equilateral triangle based on the chord fits within the circle?
We formalise the problem as follows; Define to be the distance from the centre to the chord. What is the probability that ?
(a) Assume is . Then .
(b) Assume is . Then .
(c) Pick point at random (uniformly) in the unit ball, draw the chord with as midpoint. Then
(d) Pick two points uniformly at random on the circle, and join them by the chord. Answer turns out to be .
6.2 Buffon’s needle
6.2.1 Buffon’s needle
Rule the plane with parallel lines distance apart. Drop a needle of unit length at random on the table. What is the probability that the needle intersects some line?
The coordinates of the centre of the needle are the random variable . The inclination to the horizontal is random. Assume is and is and are independent.
for , .
An intersection occurs if either
or
.
where
Hence may be estimated by repeatedly throwing the needle.
Note: ‘Buffon cross’ gives a much faster estimate.
Buffon’s needle with length on lines of distance apart;
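For the record (the values were lost above): for a needle of length \ell dropped on lines a distance d \ge \ell apart, the standard answer is
P(\text{intersection}) = \frac{2\ell}{\pi d},
which is 2/\pi for a unit needle on unit-spaced lines, whence the estimate of \pi. A quick simulation sketch (the set-up and names are mine):

```python
import math
import random

# Buffon's needle: unit-length needle dropped on horizontal lines one unit apart.
# Standard result: P(needle crosses a line) = 2/pi.
def needle_crosses_line() -> bool:
    y = random.random()                   # height of the needle's centre above the line below it
    theta = random.uniform(0.0, math.pi)  # inclination of the needle to the lines
    half_height = 0.5 * math.sin(theta)   # half of the needle's vertical extent
    return y <= half_height or y >= 1.0 - half_height

throws = 1_000_000
crossings = sum(needle_crosses_line() for _ in range(throws))
print("estimated P(cross) =", crossings / throws)   # should be near 2/pi = 0.6366...
print("estimated pi      =", 2 * throws / crossings)
```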
6.2.2 Buffon’s noodle
Drop a noodle of length on a table with ruled lines distance apart. What is the mean number of intersections of the lines with the noodle?
Number of intersections
The variance of the number of intersections depends heavily on the shape of the noodle. E.g. the probabilities are distributed differently for a tightly coiled noodle.
6.2.3 Buffon’s needle ( ) versus cross
Let be the number of intersections in one throw of the needle. Then
Let be the number of intersections in one throw of the cross. Then and
So it is better to use the cross than the needle. This is an example of the ‘technique of variance reduction’ which is useful in computation via simulation.
Example.
Find a principle reduction technique!
6.3 Broken stick
A unit stick is broken in two places chosen
uniformly at random on , independently of each other. What is the probability that the three pairs can be
used to make a triangle?
Note that , and .
Triangle can be made if , , .
, , , or
, , , .
.
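The answer, which the computation above should produce, is the standard one: each of the three events ‘a particular piece is longer than 1/2’ has probability 1/4 and they are disjoint, so
P(\text{the three pieces form a triangle}) = 1 - \tfrac{3}{4} = \tfrac{1}{4}.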
Generalisation to breaks in a stick;
7. Central limit theorem
7.1 Central limit theorem
Let be independent and identically distributed random variables with mean and
variance . Let .
Law of large numbers states that . The central limit theorem states that
, where .
Theorem 7.1. Let be independent and identically distributed random variables with and variance . Let
. Then
i.e. is ‘asymptotically normal ’ and we write
weak convergence. If we apply this looking at density functions, it has another condition.
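The displayed statement should be
P\Big(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\Big) \to \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^{2}/2}\, du \qquad \text{for every } x,
i.e. (S_n - n\mu)/(\sigma\sqrt{n}) converges in distribution to N(0, 1).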
7.2 Moment generating functions
Definition. The moment generating function of the random variable is
wherever this is finite.
Note. If takes values in then the probability generating function is so .
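In symbols:
M_X(t) = E\big[e^{tX}\big], \qquad \text{and if } X \text{ takes values in } \{0, 1, 2, \dots\} \text{ then } M_X(t) = G_X(e^{t}).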
Examples. (a) . Then
(b) . Then
by completing the square. Now
, so we end up with
( ).
(c) Cauchy distribution
infinite variance.
Vital properties of moment generating functions;
(a) Uniqueness
If on a neighbourhood of the origin, then there is a unique distribution with moment generating function .
(b)
is the (exponential) generating function of the moments , for .
(c)
(d) if are independent.
(e) Continuity theorem. If are random variables with for all in some
neighbourhood of , then for all at which is continuous.
7.3 Proof of the central limit theorem
WLOG, and , i.e. write
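(The display was lost; presumably it is the standard computation, taking \mu = 0 and \sigma^{2} = 1 without loss of generality:)
M_{S_n/\sqrt{n}}(t) = \Big(M_X\big(t/\sqrt{n}\big)\Big)^{n} = \Big(1 + \frac{t^{2}}{2n} + o(1/n)\Big)^{n} \to e^{t^{2}/2},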
which is the moment generating function of the distribution. The result follows by the continuity theorem.
Example. An unknown fraction of voters have decided to vote Labour in the next election. It is desired to estimate by asking a sample of people. We want an error not exceeding . How large a sample should I approach?
Assume a sample of size , that their answers are independent, and that each sample is a Labour voter with probability .
Let be if the th person asked says Labour and otherwise. Recall that , and
estimate by . Then
Let us require that LHS . Then as ,
by the central limit theorem
If (from tables) then this will be as required. Thus for the population of Britain we should take roughly .
7.4 Further applications of the central limit theorem and moment generating functions
a) Let ,
. Then
E.g. for ;
(b) Let , so that by probability generating functions.
Taking ,
(c) Binomial-Poisson limit theorem
Therefore converges to the Poisson distribution (in the sense of weak convergence).
(d) Let be independent with . Define
Using the rule that
which is the moment generating function of .
8. Convergence of random variables (non-examinable?)
Sequence of random variables, and another one .
Definitions. (a) in mean square if as .
(b) in probability if ,
(c) in distribution if as , for all at which is continuous.
Theorem 8.1. Mean-square convergence ⇒ convergence in probability ⇒ convergence in distribution (where ⇒ means that if a sequence converges in the first mode, it also converges in the second). The converse statements are false.
Proof (ms prob). The Chebyshev inequality gives
Proof (prob ms). Let
so with probability. However,
Proof (prob dist).
Let be continuous at . Let and . Then .
Proof (dist prob). Let be and let
, are both , so has distribution so in distribution. However in probability, because, if even, .
Example. with probability . Then .
If with probability , then .
Theorem 8.2. If in distribution for some constant , then in probability.
Proof. Assume in distribution which means
Proposition 8.3. in probability iff
Proof.
because is strictly increasing on .
by Markov’s inequality.
If RHS then LHS . I.e.
Assume in probability. Then
Let to obtain
8.2 Almost-sure convergence
a sequence of random variables, a random variable, the probability space,
an event. Let
Then almost surely if . Written “a.s.”, “a.c.”, “wp1”, or “with probability 1”.
Theorem 8.4. Almost-sure convergence implies probability convergence.
Proof. Let . Then .
Recall definition of convergence; if , . Now let
Note; is decreasing in . Define
,
(by continuity of
, since
in probability.
8.3 Strong law of large numbers
Let be independent and identically distributed with . Then
and
almost surely as .
Proof. Much harder.
Notes from final lecture
probability space, random variable
, .
gives rise to a mass function if discrete with , or a pdf if continuous with .
gives a joint mass function with , or in the continuous case we get .
independent iff the joint distribution factorises.
Recall
Example – bivariate normal distribution.
For in the continuous case we have
the conditional probability density function.
Let . The conditional expectation is defined to be so it is a
random variable with the handy property that .