
Continuous Random Variables
Chapter 5 – Lecture 18

Yiren Ding

Shanghai Qibao Dwight High School

May 21, 2016

Yiren Ding Continuous Random Variables 1 / 22

Outline

1 Central Limit Theorem
  Introduction
  Stronger Form of CLT

2 Graphical Illustration of CLT
  n-dimensional Convolution
  Optimized Algorithm
  CLT Visualized

3 Applications of CLT


Central Limit Theorem

Theorem 1 (Central Limit Theorem).

If X1, X2, ... are i.i.d. random variables with expected value µ and standard deviation σ, then

$$\lim_{n\to\infty} P\left(\frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x) \quad \text{for all } x. \tag{1}$$

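Another way to state the CLT is that, for large n, the sum is approximately normally distributed:

$$X_1 + \cdots + X_n \overset{\text{approx}}{\sim} N\left(n\mu, (\sigma\sqrt{n})^2\right). \tag{2}$$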

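Dividing the sum in (2) by n gives the useful fact that the sample mean is also approximately normal:

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} \overset{\text{approx}}{\sim} N\left(\mu, \left(\frac{\sigma}{\sqrt{n}}\right)^2\right). \tag{3}$$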
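Statement (1) can be checked numerically with a small simulation. This is a minimal sketch, assuming the X_k are uniform on (0, 1), so that µ = 1/2 and σ = √(1/12); the helper names `phi` and `standardized_sum_cdf`, the choice n = 12, and the number of trials are illustrative assumptions, not part of the lecture.

```python
import math
import random

def phi(x):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def standardized_sum_cdf(n, x, trials=20000, seed=1):
    """Monte Carlo estimate of P((X1 + ... + Xn - n*mu) / (sigma*sqrt(n)) <= x)
    for X_k uniform on (0, 1), where mu = 1/2 and sigma = sqrt(1/12)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        if (s - n * mu) / (sigma * math.sqrt(n)) <= x:
            hits += 1
    return hits / trials

# Compare the simulated left-hand side of (1) with Phi(x) at a few points
for x in (-1.0, 0.0, 1.0):
    print(x, standardized_sum_cdf(12, x), phi(x))
```

Even for n = 12 the simulated probabilities should sit within simulation noise of Φ(x).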


Central Limit Theorem Introduction

Introduction

The CLT is the queen of all theorems in probability.

The amazing thing about the CLT is that the Xk can have any distribution (with finite variance), as long as n is large and the Xk are independent and identically distributed.

The first version of the CLT was proved by Abraham de Moivre (1667-1754) for the special case of Bernoulli Xk, whose sum is binomially distributed.

Indeed, a binomial random variable can be written as the sum of n independent Bernoulli random variables! It is no coincidence that the shape of binomial distributions is approximately bell-like!

In practical scenarios, a rule of thumb is to use the normal distribution to approximate the binomial when np(1 − p) ≥ 25.

The Poisson distribution, being a limiting case of the binomial, can also be approximated by the normal distribution when λ ≥ 25.
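To see the rule of thumb in action, the sketch below compares an exact binomial probability with its normal approximation at the boundary case n = 100, p = 1/2, where np(1 − p) = 25. It assumes a continuity correction (adding 1/2 to the cutoff), which the lecture does not mention; the names `phi` and `binom_cdf` are illustrative.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_cdf(m, n, p):
    """Exact P(X <= m) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(m + 1))

n, p, m = 100, 0.5, 55            # np(1 - p) = 25, the rule-of-thumb boundary
mean, sd = n * p, math.sqrt(n * p * (1 - p))
exact = binom_cdf(m, n, p)
approx = phi((m + 0.5 - mean) / sd)   # normal approximation with continuity correction
print(exact, approx)
```

The two numbers agree to about three decimal places; without the continuity correction the approximation is noticeably worse.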


Central Limit Theorem Stronger Form of CLT

Central Limit Theorem (Stronger Form)

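We can relax the conditions even further by allowing the random variables Xk to have different distributions!

Suppose that X1, X2, ... are independent with E(Xk) = µk and σ(Xk) = σk. Then, under mild additional conditions and for large n, the sum

$$X_1 + \cdots + X_n \overset{\text{approx}}{\sim} N\left(\sum_{k=1}^{n} \mu_k,\ \sum_{k=1}^{n} \sigma_k^2\right). \tag{4}$$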

This explains why so many real-world quantities that are the result of many small, unrelated effects are approximately normally distributed.

For example: the height of men, test scores, the duration of a pregnancy, IQ, and, to a rougher approximation, the return on a stock.

In a large population, for instance, test scores can be affected by students' health, age, attitude, or even what they ate last night.
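The parameters in (4) are just sums of the individual means and variances. As an illustration, the sketch below assumes three fair dice with 4, 6 and 8 sides (a hypothetical choice) and checks the sums of means and variances against a brute-force enumeration of all outcomes, using exact fractions. It uses the standard fact that a fair m-sided die has mean (m + 1)/2 and variance (m² − 1)/12.

```python
from fractions import Fraction
from itertools import product

def die_moments(m):
    """Mean and variance of a fair m-sided die with faces 1..m."""
    return Fraction(m + 1, 2), Fraction(m * m - 1, 12)

faces = [4, 6, 8]   # hypothetical: independent dice with different distributions
mu_sum = sum(die_moments(m)[0] for m in faces)    # first parameter in (4)
var_sum = sum(die_moments(m)[1] for m in faces)   # second parameter in (4)

# Brute-force check: enumerate every outcome of the three dice
outcomes = [sum(r) for r in product(*(range(1, m + 1) for m in faces))]
N = len(outcomes)
mean_bf = Fraction(sum(outcomes), N)
var_bf = Fraction(sum(s * s for s in outcomes), N) - mean_bf**2
print(mu_sum, var_sum, mean_bf, var_bf)
```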



2005 Distribution of ACT Scores¹

¹Source: Wikipedia

Graphical Illustration of CLT

Graphical illustration of CLT

The mathematical proof of the central limit theorem is extremely complicated and technical (remember the fun Borel-Cantelli lemma), and it sheds no light on how large n must be for the sum X1 + · · · + Xn to approach a normal shape.

Instead, we show this convergence by graphing. We have two options.

First, we could run many simulation trials of the sum X1 + · · · + Xn and create a histogram of the simulated outcomes. By the law of large numbers, the histogram will approach the true distribution of the sum, which by the CLT is approximately bell-shaped.

However, the disadvantage of this method is that, for a fixed value of n, one needs many simulation trials before the simulated distribution becomes sufficiently close to the actual distribution. How large must the number of simulations be? We have no idea when to stop.
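The simulation approach can be sketched as follows for fair dice. The function name `simulate_dice_sum`, the seed, and the 100,000 trials are arbitrary assumptions; that arbitrariness is exactly the weakness described above, since nothing in the output says whether the number of trials was enough.

```python
import random
from collections import Counter

def simulate_dice_sum(n, trials, seed=0):
    """Estimate the distribution of X1 + ... + Xn for n fair dice by simulation.
    Returns a dict mapping each observed sum to its relative frequency."""
    rng = random.Random(seed)
    counts = Counter(sum(rng.randint(1, 6) for _ in range(n)) for _ in range(trials))
    return {s: c / trials for s, c in sorted(counts.items())}

freq = simulate_dice_sum(n=2, trials=100_000)
print(freq[7])   # the exact value is 6/36; the estimate fluctuates around it
```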


Graphical Illustration of CLT n-dimensional Convolution

Graphical illustration of CLT

We can circumvent this problem by taking a different approach. In the case where the Xi have a discrete distribution, it is fairly easy to calculate the exact probability distribution of X1 + · · · + Xn using the convolution formula (Theorem 4, Lecture 12), which says: if discrete random variables X1 and X2 are independent and take values in the set of nonnegative integers, then

$$P(X_1 + X_2 = k) = \sum_{i=0}^{k} P(X_1 = i)\,P(X_2 = k - i) \quad \text{for } k = 0, 1, \ldots \tag{5}$$

This can be generalized to arbitrary n:

$$P(X_1 + \cdots + X_n = k) = \sum_{i_1=0}^{k} \sum_{i_2=0}^{k-i_1} \cdots \sum_{i_{n-1}=0}^{k-i_1-\cdots-i_{n-2}} P(X_1 = i_1)\,P(X_2 = i_2) \cdots P(X_n = k - i_1 - \cdots - i_{n-1}). \tag{6}$$
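Formula (5) translates directly into code, and (6) is what you get by applying (5) repeatedly: the distribution of X1 + · · · + Xn is the convolution of the distribution of X1 + · · · + Xn−1 with that of Xn. The sketch below uses this repeated pairwise convolution (a different way of organizing the computation than the nested sums, not the lecture's own program); the names `convolve` and `n_fold` are hypothetical, and exact fractions are used so the results can be compared with hand counts.

```python
from fractions import Fraction

def convolve(p1, p2):
    """Formula (5): pmf of X1 + X2 for independent X1, X2 taking values in
    {0, 1, 2, ...}, where p1[i] = P(X1 = i) and p2[j] = P(X2 = j)."""
    out = [Fraction(0)] * (len(p1) + len(p2) - 1)
    for k in range(len(out)):
        # P(X1 + X2 = k) = sum over i of P(X1 = i) P(X2 = k - i)
        out[k] = sum((p1[i] * p2[k - i]
                      for i in range(k + 1)
                      if i < len(p1) and k - i < len(p2)), Fraction(0))
    return out

def n_fold(pmf, n):
    """pmf of X1 + ... + Xn for i.i.d. X_k, by applying (5) n - 1 times.
    Unrolling this loop is equivalent to the nested sums in (6)."""
    result = pmf
    for _ in range(n - 1):
        result = convolve(result, pmf)
    return result

die = [Fraction(0)] + [Fraction(1, 6)] * 6   # index = face value, P(X = 0) = 0
print(convolve(die, die)[7])   # 1/6
print(n_fold(die, 3)[10])      # 1/8
```

Because each step only convolves two short lists, the work grows polynomially in n rather than exponentially.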



Sum of Dice Rolls

Let’s consider rolling a fair die n times. Denote by the random variable Xi the result of the ith roll. Each Xi is uniformly distributed on {1, 2, ..., 6}; it is easy to verify that µ = E(Xi) = 7/2 and σ² = var(Xi) = 35/12.

By the CLT, the sum of the n rolls will be approximately normally distributed with expected value 7n/2 and variance 35n/12.
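The values µ = 7/2 and σ² = 35/12 come from E(Xi) = (1 + · · · + 6)/6 and var(Xi) = E(Xi²) − E(Xi)². A quick exact check with Python's `fractions` module, with n = 5 chosen only as an example:

```python
from fractions import Fraction

faces = range(1, 7)
mu = sum(Fraction(x, 6) for x in faces)                  # E(X_i)
var = sum(Fraction(x * x, 6) for x in faces) - mu ** 2   # E(X_i^2) - E(X_i)^2
print(mu, var)   # 7/2 35/12

n = 5
print(n * mu, n * var)   # mean 7n/2 and variance 35n/12 of the sum of n rolls
```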

Using the convolution rule, P(X1 + · · · + Xn = k) is the sum

$$\sum_{i_1=1}^{\min(6,k)} \sum_{i_2=1}^{\min(6,k-i_1)} \cdots \sum_{i_{n-1}=1}^{\min(6,\,k-i_1-\cdots-i_{n-2})} \left(\frac{1}{6}\right)^{n-1} p(k - i_1 - \cdots - i_{n-1}),$$

where p(i) = 1/6 if 1 ≤ i ≤ 6 and p(i) = 0 otherwise.

The above formula can easily be implemented by a computer program for small values of n. See the next example.



Python Code for n = 5

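```python
def Dice(k):
    """Probability that a fair die shows k (0 outside 1..6)."""
    if k in [1, 2, 3, 4, 5, 6]:
        return 1/6
    else:
        return 0

n, Prob = 5, []
for k in range(n, 6*n + 1):
    # Brute-force convolution: sum over the first four rolls a, b, c, d;
    # the fifth roll must then equal k - a - b - c - d
    Term = sum(Dice(a)*Dice(b)*Dice(c)*Dice(d)*Dice(k-a-b-c-d)
               for a in range(min(6, k)+1)
               for b in range(min(6, k-a)+1)
               for c in range(min(6, k-a-b)+1)
               for d in range(min(6, k-a-b-c)+1))
    Prob.append(Term)

print(Prob)
```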



n = 5, run time = 0.2 seconds, CPU: m3-6Y30, 2.20 GHz



n = 7, run time = 5 seconds



n = 9, run time = 500 seconds, painfully slow!
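The program above hard-codes five nested loops, so it only works for n = 5. The same nested sums can be written for any n with recursion: choose the first roll i, then require the remaining n − 1 dice to total k − i. This is a hypothetical generalization (the names `dice` and `prob_sum` are illustrative), and it is still brute force, so its run time grows exponentially in n just like the original.

```python
def dice(i):
    """p(i) from the text: 1/6 for 1 <= i <= 6, else 0."""
    return 1/6 if 1 <= i <= 6 else 0

def prob_sum(k, n):
    """P(X1 + ... + Xn = k) for n fair dice, by the nested sums of the
    convolution formula written recursively."""
    if n == 1:
        return dice(k)
    # An empty range (k <= 0) gives probability 0
    return sum(dice(i) * prob_sum(k - i, n - 1) for i in range(1, min(6, k) + 1))

n = 5
Prob = [prob_sum(k, n) for k in range(n, 6 * n + 1)]
print(Prob)
```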


Graphical Illustration of CLT Optimized Algorithm

Code Optimization

As you can see, the run time goes up exponentially as n increases!

Simply adding all the terms in the convolution formula by brute force may not be the best idea! The code requires optimization.

Optimization often relies on the so-called extreme principle. Consider finding the probability that the sum of 5 fair dice equals 5.

As soon as the outer loop's a-value exceeds 1, the probability will be 0; however, the algorithm will keep on running regardless, wasting our precious time!

Next consider the opposite scenario: finding the probability that the sum of 5 fair dice equals 30.

If the outer loop's a-value is less than 6, the probability will also be 0, but the algorithm does not know that either!



Slightly Optimized Code for n = 3

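```python
# Probability of each face of a fair die (index 0 = face 1)
Dice = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]

n, Prob = 3, []
for k in range(n, 6*n + 1):
    Term = 0
    for a in range(1, min(6, k) + 1):
        # Extreme principle: skip a if the remaining n-1 dice
        # (each showing 1 to 6) cannot bring the total to k
        if a + (n-1)*1 > k or a + (n-1)*6 < k:
            continue
        for b in range(1, min(6, k-a) + 1):
            # Same check for the remaining n-2 dice
            if a + b + (n-2)*1 > k or a + b + (n-2)*6 < k:
                continue
            # Here 1 <= k-a-b <= 6, so the third die's index is valid
            Term = Term + Dice[a-1]*Dice[b-1]*Dice[k-a-b-1]
    Prob.append(Term)

print(Prob)
```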
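The code above applies the extreme principle with loops hard-coded for n = 3. A hypothetical recursive version for any n (the name `prob_sum_pruned` is illustrative) checks the same bound at every level: n dice can only total between n and 6n, so any branch outside that range is cut immediately. Exact fractions are an assumption made here so results can be checked by hand. The pruning only skips impossible branches; the number of feasible branches still grows exponentially, so this does not make large n fast.

```python
from fractions import Fraction

SIXTH = Fraction(1, 6)

def prob_sum_pruned(k, n):
    """P(X1 + ... + Xn = k) for n fair dice, exact, with the extreme-principle
    check applied at every level of the recursion."""
    if k < n or k > 6 * n:    # n dice can only total between n and 6n
        return Fraction(0)
    if n == 1:
        return SIXTH
    return sum((SIXTH * prob_sum_pruned(k - i, n - 1) for i in range(1, 7)),
               Fraction(0))

n = 4
Prob = [prob_sum_pruned(k, n) for k in range(n, 6 * n + 1)]
print(Prob[14 - n])   # P(sum = 14) = 146/1296, printed in lowest terms as 73/648
```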



Code Optimization

Using this slightly optimized code, we have reduced the execution time for n = 7 from 5 seconds to merely 1.3 seconds, and the execution time for n = 9 from 500 seconds to merely 40 seconds!

I was brave enough to try n = 20, and after 4 hours of waiting I gave up in frustration. After all, Python is not known for its speed.

Even though at this moment I have not been able to produce a graph for n larger than 15 without leaving my PC on overnight, someone else kindly did so for n = 15 and n = 20.

It can also be shown that this convergence holds even for a biased die, although the rate of convergence is much slower. From this we can conjecture that the more symmetrical the distribution of Xi, the faster the sum X1 + · · · + Xn converges to a normal shape!
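One way to put a number on the fair versus biased comparison is to compute the exact distribution of the sum by repeated convolution and record its largest gap from the normal density with the same mean and variance. This is a sketch under assumptions: the biased face probabilities (1/2 on face 1, 1/10 elsewhere), n = 10, and the "largest gap" measure are all hypothetical choices, not taken from the lecture's figures.

```python
import math

def convolve(p1, p2):
    """pmf of the sum of two independent nonnegative integer variables."""
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b
    return out

def max_normal_error(face_probs, n):
    """Largest |P(S_n = k) - normal density at k| over k, where S_n is the sum
    of n i.i.d. rolls of a die with P(face j) = face_probs[j - 1]."""
    pmf = [0.0] + list(face_probs)            # index = face value
    mu = sum(j * p for j, p in enumerate(pmf))
    var = sum(j * j * p for j, p in enumerate(pmf)) - mu * mu
    dist = pmf
    for _ in range(n - 1):
        dist = convolve(dist, pmf)
    m, s = n * mu, math.sqrt(n * var)
    return max(abs(p - math.exp(-((k - m) / s) ** 2 / 2) / (s * math.sqrt(2 * math.pi)))
               for k, p in enumerate(dist))

fair = [1/6] * 6
biased = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]      # hypothetical biased die
print(max_normal_error(fair, 10), max_normal_error(biased, 10))
```

For these choices the skewed die should show the larger gap, consistent with the conjecture above.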


Graphical Illustration of CLT CLT Visualized

Central Limit Theorem for Unbiased Die



Central Limit Theorem for Biased Die


Applications of CLT

Example 1.

Do you believe a friend who claims to have tossed heads 5,227 times in 10,000 tosses of a fair coin?

Solution. Denote by Xi the result of the ith toss of the fair coin. We assign Xi = 1 if the toss lands heads and 0 if tails.

Therefore the sum Sn = X1 + X2 + · · · + Xn represents the number of times heads appear in n tosses of a fair coin.

Each Xi has µ = 1/2 and σ = 1/2. By the CLT, Sn is approximately normally distributed with expected value nµ = 5,000 and standard deviation σ√n = 50, so

$$P(S_n \ge 5227) = 1 - P(S_n < 5227) \approx 1 - \Phi\left(\frac{5227 - 5000}{50}\right) = 1 - \Phi(4.54) \approx 2.81 \times 10^{-6}.$$

The p-value of the claim is about 3 in a million! So it is most likely fake!
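The tail probability can be computed directly. This sketch uses the identity 1 − Φ(z) = erfc(z/√2)/2, which avoids subtracting two nearly equal numbers when z is large; the helper name `normal_tail` is an illustrative assumption.

```python
import math

def normal_tail(z):
    """1 - Phi(z), computed with erfc to avoid cancellation for large z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, mu, sigma = 10_000, 0.5, 0.5     # one fair-coin toss: mean 1/2, sd 1/2
mean, sd = n * mu, sigma * math.sqrt(n)
p_value = normal_tail((5227 - mean) / sd)
print(mean, sd, p_value)   # mean 5000.0, sd 50.0, p-value about 2.8e-06
```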



Example 2.

Your friend claims to have rolled an average of 3.25 points per roll in 1,000 rolls of a fair die. Do you believe this?

Solution. Denote by Xi the outcome of the ith roll of the die. We found earlier that E(Xi) = 3.5 and σ(Xi) = √(35/12) ≈ 1.7078.

By the CLT, the sample mean X̄ = (X1 + · · · + Xn)/n approximately follows a normal distribution with expected value µ = 3.5 and standard deviation σ/√n = 1.7078/√1,000 ≈ 0.0540.

By Theorem 1 from Lecture 18,

$$P(\bar{X} < 3.25) \approx \Phi\left(\frac{3.25 - 3.5}{0.0540}\right) = \Phi(-4.63) \approx 1.8 \times 10^{-6}.$$

The claim of your friend is highly implausible!
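The standardization in this example can be reproduced in a few lines. This sketch writes Φ(z) as erfc(−z/√2)/2, which stays accurate far in the left tail; the helper name `phi` is an illustrative assumption.

```python
import math

def phi(z):
    """Standard normal CDF, accurate in the far left tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

n = 1000
mu = 3.5
sigma = math.sqrt(35 / 12)      # sd of one roll
se = sigma / math.sqrt(n)       # sd of the sample mean, sigma / sqrt(n)
z = (3.25 - mu) / se            # standardized sample mean
print(se, z, phi(z))
```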



Example 3.

Suppose that X1, ..., Xn are independent random variables that are uniformly distributed on (0, 1). Approximate the probability that the rounded sum X1 + · · · + Xn equals the sum of the rounded Xi when the rounding is to the nearest integer.

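Solution. Denote by round(x) the value of x after rounding to the nearest integer. We want the probability of the event

$$\operatorname{round}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{round}(X_i). \tag{7}$$

Since the right-hand side of (7) is an integer, (7) holds if and only if

$$\sum_{i=1}^{n} \operatorname{round}(X_i) - \frac{1}{2} \le \sum_{i=1}^{n} X_i < \sum_{i=1}^{n} \operatorname{round}(X_i) + \frac{1}{2}. \tag{8}$$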



Example 3

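Denote Di = Xi − round(Xi). Subtracting the sum of the round(Xi) throughout (8), the desired probability is

$$P\left(-\frac{1}{2} \le S_n < \frac{1}{2}\right) \quad \text{where } S_n = \sum_{i=1}^{n} D_i. \tag{9}$$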

The random variables D1, ..., Dn are independent and uniformly distributed on (−1/2, 1/2), so that E(Di) = 0 and σ(Di) = √(1/12).

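By the central limit theorem, Sn is approximately normal with mean 0 and standard deviation √(n/12). Since (1/2)/√(n/12) = √(3/n), it follows that

$$P\left(-\frac{1}{2} \le S_n < \frac{1}{2}\right) \approx \Phi\left(\sqrt{\frac{3}{n}}\right) - \Phi\left(-\sqrt{\frac{3}{n}}\right). \tag{10}$$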

The values are 0.416, 0.271, 0.194, and 0.138 for n = 10, 25, 50, and 100. These are very close to 0.411, 0.269, 0.191, and 0.136, the results obtained via one million Monte Carlo simulations.
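Both columns of this comparison can be reproduced with a short script: the right-hand side of (10), and a direct simulation of the event in (7). This is a sketch: the names `clt_approx` and `simulate` are illustrative, and it uses 10,000 trials per n (far fewer than the million quoted above) to keep it fast, so the simulated values are only accurate to about ±0.01. Python's `round` breaks exact ties to the even integer, but ties occur with probability zero for continuous Xi.

```python
import math
import random

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clt_approx(n):
    """Right-hand side of (10): Phi(sqrt(3/n)) - Phi(-sqrt(3/n))."""
    c = math.sqrt(3.0 / n)
    return phi(c) - phi(-c)

def simulate(n, trials=10_000, seed=0):
    """Fraction of trials in which the rounded sum equals the sum of rounded values."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        if round(sum(xs)) == sum(round(x) for x in xs):
            hits += 1
    return hits / trials

for n in (10, 25, 50, 100):
    print(n, round(clt_approx(n), 3), simulate(n))
```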

