
GEM2900 - Understanding Uncertainty and Statistical Thinking

David Chew and David Nott

Department of Statistics and Applied Probability, National University of Singapore

The normal distribution

- The normal distribution, also called the Gaussian distribution, can be used to model continuous random variables.

- The normal distribution has two parameters µ and σ², called the (population) mean and (population) variance, respectively.

- If the random variable X follows a normal distribution with parameters µ and σ² (also denoted by X ∼ N(µ, σ²)), then

      E[X] = µ  and  Var[X] = σ².

Woolfson (2008, Chapter 10)

The normal distribution (cont.)

- The probability density function (pdf) of the normal distribution is symmetric, with a smooth bell shape. The formula for the pdf of the N(µ, σ²) distribution is

      f(x) = (1 / √(2πσ²)) exp( −(x − µ)² / (2σ²) )

  (a quick numerical check of this formula in R is sketched after this list).

- The pdf is largest at µ, the expected value, and how "spread out" the density curve is depends on σ, the standard deviation.

- Areas of regions underneath the pdf represent probabilities.
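A minimal R sketch of that check; the values µ = 10 and σ = 2 are illustrative choices, not values from the lecture:

    # Evaluate the N(mu, sigma^2) pdf formula directly and compare it with
    # R's built-in dnorm(); the two should agree.
    mu <- 10
    sigma <- 2
    x <- seq(5, 15, by = 0.5)
    f <- 1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
    all.equal(f, dnorm(x, mean = mu, sd = sigma))   # TRUE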

The normal distribution (cont.)

[Figure: normal pdfs with the same mean but different spread; x-axis x, y-axis f(x).]


The normal distribution (cont.)

- As I mentioned, the normal distribution is sometimes also called the Gaussian distribution, named for the German mathematician Carl Friedrich Gauss.

- The Gaussian distribution, however, was not first discovered by Gauss, an example of what mathematicians call Stigler's law (the law that no scientific discovery is named after its original discoverer). Stigler's law is of course itself an example of Stigler's law ...

- Until conversion to the euro, the German ten mark bill had a picture of Gauss, a picture of the Gaussian density curve, and even the formula for the Gaussian density on it.

The normal distribution (cont.)

Brownian motion

- One problem in which the Gaussian distribution arises is in the description of the physical phenomenon of Brownian motion. Albert Einstein's mathematical description of Brownian motion is widely regarded as having convinced most physicists of the existence of atoms.

- Brownian motion is named after Robert Brown, who observed the highly erratic motion of pollen grains in a drop of water.

- The motion of the grains can be thought of as arising from a large number of collisions with the water molecules. Brownian motion is an idealized model for the coordinates of the motion.

- For a particle starting at the origin and followed for a period of time t, the distribution of the x or y coordinate will be N(0, σ²t) for some σ² > 0.

Brownian motion

- It is possible to simulate paths of Brownian motion on a computer (a small R sketch follows this list).

- The Java applet at http://www.ms.uky.edu/~mai/java/stat/brmo.html simulates a Brownian motion in two dimensions.

- The paths traced out by the process tend to be highly erratic (in fact, although they are continuous they are not differentiable anywhere, for those of you who have done enough math to know what that means).
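A minimal simulation sketch; σ = 1 and a grid of 1000 steps over the time interval [0, 1] are illustrative choices:

    # Simulate one 2-D Brownian motion path by summing many small
    # independent normal increments; each increment over a step of length
    # dt has distribution N(0, sigma^2 * dt).
    set.seed(1)
    sigma <- 1
    n  <- 1000            # number of time steps
    dt <- 1 / n           # step length over [0, 1]
    x  <- cumsum(rnorm(n, mean = 0, sd = sigma * sqrt(dt)))
    y  <- cumsum(rnorm(n, mean = 0, sd = sigma * sqrt(dt)))
    plot(x, y, type = "l", main = "Simulated 2-D Brownian motion path")
    # At time t the x (or y) coordinate is approximately N(0, sigma^2 * t).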


Brownian motion

- Although Robert Brown wasn't the first to observe Brownian motion (Stigler's law again), he was certainly the most systematic experimenter to look at this phenomenon.

- He looked at not just pollen grains suspended in a water drop, but also many other things (including scrapings of particles from the sphinx, which he had access to in his work as a curator at the British Museum; there was some question about whether only living things were subject to Brownian motion, and apparently he regarded the sphinx as undeniably, certifiably dead).

Brownian motion

- A century and a half before Brown, a draper from Delft, Antony van Leeuwenhoek, had observed Brownian motion.

- Among other things, he had looked at scrapings of the unbrushed teeth of old men.

- Einstein was the first scientist to really take an interest in Brownian motion who also had the mathematical ability to describe it analytically.

- Other scientists had guessed the correct explanation but had not done any calculations that could be compared to experiments.

Calculating with the normal distribution

- A normal distribution is specified by its mean µ and its variance σ².

- The standard normal distribution has mean µ = 0 and variance σ² = 1.

- If a random variable is distributed as standard normal, it is typically denoted by Z rather than X, and standard normal observations are known as z-values.

- The standard normal distribution is symmetric about µ = 0, hence P(Z < −c) = P(Z > c) for all c ∈ R.

- Probabilities of (continuous) random variables are given by areas under pdf curves. For example, suppose Z is standard normal. Then the probability P(0 < Z < 1), i.e. the probability that Z is between zero and one, is equal to the area of the shaded region in the next slide (see the short R check below).
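A small sketch checking the symmetry and the area interpretation with R's pnorm(); the choice c = 1 is illustrative:

    # P(Z < -c) and P(Z > c) agree by symmetry; P(0 < Z < 1) is the
    # difference of two cumulative probabilities.
    c <- 1
    pnorm(-c)               # P(Z < -c)
    1 - pnorm(c)            # P(Z >  c), same value by symmetry
    pnorm(1) - pnorm(0)     # P(0 < Z < 1), roughly 0.34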

Calculating with the normal distribution (cont.)


Calculating with the normal distribution (cont.)

How to find normal probabilities?

- Use a computer. All software packages will calculate P(X < c) for X normally distributed with mean µ and standard deviation σ. (E.g. the free statistical package R has the command pnorm().)

- If we wish to calculate a probability P(a < X < b), then we can use a computer to calculate P(X < a) and P(X < b) and then use

      P(a < X < b) = P(X < b) − P(X < a)

  (an example call in R is sketched after this list).

- Alternatively, the probabilities can be calculated "by hand" using a technique known as standardisation.
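A minimal sketch of that calculation; the values µ = 100, σ = 15, a = 90 and b = 120 are illustrative:

    # P(a < X < b) for X ~ N(mu, sigma^2), computed as the difference of
    # two cumulative probabilities.
    mu <- 100
    sigma <- 15
    pnorm(120, mean = mu, sd = sigma) - pnorm(90, mean = mu, sd = sigma)
    # roughly 0.66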

Calculating with the normal distribution (cont.)

Standardisation

- The standardised version of a random variable X with (population) mean µ and (population) standard deviation σ is

      Z = (X − µ) / σ

- The mean of Z is zero; the standard deviation of Z is one.

- If X has a normal distribution, then Z has a standard normal distribution! This result does not necessarily hold for other distributions. (A small numerical illustration follows.)
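A sketch showing that standardisation leaves normal probabilities unchanged; µ = 100, σ = 15 and c = 120 are illustrative values:

    # The probability computed on the original scale equals the probability
    # computed on the standardised (z) scale.
    mu <- 100
    sigma <- 15
    c <- 120
    pnorm(c, mean = mu, sd = sigma)     # P(X < c)
    pnorm((c - mu) / sigma)             # P(Z < (c - mu)/sigma), same value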

Calculating with the normal distribution (cont.)

Example: Marilyn vos Savant’s IQ

- Earlier in the course I mentioned Marilyn vos Savant, who was listed for a time in the Guinness Book of Records as the person with the world's highest IQ.

- She popularized the Monty Hall problem with her column in Parade magazine. Her presentation of the problem and her (correct) solution generated a lot of heated discussion at the time.

- IQ scores are often assumed to follow a normal distribution, calibrated so that the mean is 100 and the standard deviation is 15.

- Marilyn vos Savant's IQ is 228. What is the probability of someone randomly chosen from the general population having an IQ score larger than this?

Calculating with the normal distribution (cont.)

Example: (cont.)
Let X have a normal distribution with mean 100 and standard deviation 15. The question asks for P(X > 228).

- Method One: Using a computer (the R calls are sketched below).

  - The free statistical package R (http://www.r-project.org) gives a value of 0 to machine precision ...
  - Suppose her IQ was a mere 150. Then R gives a value of 0.0004.
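The R calls behind these numbers would look roughly like this (pnorm() gives P(X < c), so the upper tail is one minus it):

    # P(X > 228) and P(X > 150) for X ~ N(100, 15^2).
    1 - pnorm(228, mean = 100, sd = 15)   # 0 to machine precision
    1 - pnorm(150, mean = 100, sd = 15)   # roughly 0.0004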


Calculating with the normal distribution (cont.)

Example: (cont.)

- Method Two: Using standardisation

- We have:

      P(X > 228) = P( (X − µ)/σ > (228 − µ)/σ )
                 = P( Z > (228 − 100)/15 )
                 = P(Z > 8.53)

So the probability we need is P(Z > 8.53), where Z has a standard normal distribution. But how do we work out what this is? We have to look this probability up in a table, like the one given in the textbook. Actually 8.53 is well beyond the upper limit of values in the table. You won't be required to read normal tables for anything in this course.

Calculating with the normal distribution (cont.)

- Consider a normally distributed random variable X with mean µ and variance σ², and the random variable Z = (X − µ)/σ, which is standard normal.

- Then for c > 0,

      P(µ − cσ < X < µ + cσ) = P(−c < Z < c)

- For example,

      P(µ − σ < X < µ + σ)   = P(−1 < Z < 1) ≈ 0.682
      P(µ − 2σ < X < µ + 2σ) = P(−2 < Z < 2) ≈ 0.954
      P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < 3) ≈ 0.997

- This is sometimes called the 68-95-99.7 rule (checked numerically in the R sketch below).

- If data are normally distributed, roughly 68% (≈ 2/3) of observations should be within one standard deviation (SD) of the mean, and roughly 95% of observations should be within two SDs of the mean.
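The rule can be recovered directly from pnorm(), as in this sketch:

    # Probabilities of falling within 1, 2 and 3 standard deviations of the
    # mean for a standard normal variable.
    pnorm(1) - pnorm(-1)   # roughly 0.682
    pnorm(2) - pnorm(-2)   # roughly 0.954
    pnorm(3) - pnorm(-3)   # roughly 0.997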

The 68-95-99.7 rule

Why is the normal distribution so important?

Motivating example: sums of dice

- The next five slides show the probabilities of the sums of n = 1, 2, 3, 4 and 5 dice.

- What happens to the shape of the probabilities as the number of dice n increases?

- Amazingly, this will happen with (almost) any random variable; as long as n is large enough, the probabilities for the sum (and mean) will start to follow a normal bell-shaped curve. (A simulation sketch in R follows this list.)
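A minimal simulation sketch; the number of dice (5) and the number of repetitions (10,000) are illustrative choices:

    # Simulate many sums of 5 fair dice and plot their distribution; the
    # histogram is already close to a bell shape.
    set.seed(1)
    sums <- replicate(10000, sum(sample(1:6, size = 5, replace = TRUE)))
    hist(sums, breaks = seq(4.5, 30.5, by = 1),
         main = "Sums of 5 dice", xlab = "sum")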


Why is the normal distribution so important?

[Figures: probability distributions of the sum of n = 1, 2, 3, 4 and 5 dice.]

The Central Limit Theorem (CLT)

- Consider a random variable, such as the number on a die roll.

- Suppose we observe n realisations of the random variable; i.e. we roll the die n times.

- If n is large enough, then the distribution of the sum (and mean) of the n values follows approximately a normal distribution. (A small simulation of die-roll means is sketched below.)

NOTE: We assume here that the value obtained on one die roll does not affect the value obtained on another die roll; i.e. we are assuming independence. If we do not have independence, then the CLT may not hold. There are many different sets of assumptions under which the CLT holds.
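As a small illustration of the CLT for means, this sketch uses 50 rolls per mean and 10,000 repetitions, both illustrative choices:

    # The distribution of the mean of 50 die rolls is approximately normal,
    # as the CLT predicts.
    set.seed(2)
    means <- replicate(10000, mean(sample(1:6, size = 50, replace = TRUE)))
    hist(means, main = "Means of 50 die rolls", xlab = "mean")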

What does the CLT tell us?

- It shows that a normally distributed random variable can be regarded as the sum (or mean) of a large number of small random contributions.

- Often it can be argued that variables observed in the real physical world are subject to a large number of different sources of variability.

- It is therefore not very surprising that many real-life variables are of the form signal + noise, where the noise has an approximate normal distribution.

Why is the CLT important?

- It explains why many real-life observed variables have a signal + noise form, with the noise following a normal distribution.

- In statistics, very commonly used quantities are sums or means of observations, so the CLT tells us that these quantities have approximate normal distributions.

- Hence many methods in statistics rely on the normal distribution.


Connection between binomial and normal

Assume X follows a binomial distribution with parameters n and p.

Then we can view X as the sum of n independent random variables, each having a Bernoulli distribution with parameter p.

Hence, if Y is a normally distributed random variable with mean np and variance np(1 − p), then

      P(X = x) ≈ P(x − 1/2 ≤ Y ≤ x + 1/2) = ∫_{x−1/2}^{x+1/2} fY(u) du ≈ fY(x),

where fY(y) = (1 / √(2πnp(1−p))) exp( −(y − np)² / (2np(1−p)) ) is the probability density function of Y.

NOTE: a typical rule of thumb is that this approximation is valid if np ≥ 5 and n(1 − p) ≥ 5. (A numerical comparison in R is sketched below.)
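A sketch comparing the exact binomial probability with both forms of the approximation; n = 100, p = 0.25 and x = 30 are illustrative values satisfying the rule of thumb:

    # Exact binomial probability versus the two normal approximations
    # (area with continuity correction, and the pdf evaluated at x).
    n <- 100
    p <- 0.25
    x <- 30
    mu <- n * p
    sigma <- sqrt(n * p * (1 - p))
    dbinom(x, size = n, prob = p)                           # exact P(X = x)
    pnorm(x + 0.5, mu, sigma) - pnorm(x - 0.5, mu, sigma)   # area form
    dnorm(x, mean = mu, sd = sigma)                         # pdf form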

Connection between binomial and normal

[Figures illustrating the normal approximation to the binomial distribution.]

Connection between binomial and normal

- You find that your GEM2900 lecturer is becoming increasingly difficult, unreasonable and paranoid as the semester progresses.

- He has just set a continuing assessment for you to do on the IVLE containing 100 multiple choice questions with 4 options each.

- As you don't have time for this kind of thing, you decide to simply guess an answer at random for each question.

- What is the probability that you pass the CA (that is, what is the probability that you obtain a score of 50 or more correct)?


Connection between binomial and normal

- Let X be your score. Then if you are guessing each question randomly, clearly X ∼ Binomial(100, 0.25). Using the result that for a binomial with parameters n and p the mean is np and the variance is np(1 − p), we have E(X) = 25 and SD(X) = √(300/16).

- Let Y be a normal random variable with the same mean and variance as X, i.e. Y ∼ N(25, 300/16).

- Then

      P(X ≥ 50) ≈ P(Y ≥ 49.5)
                = P( (Y − µ)/σ ≥ (49.5 − µ)/σ )
                = P( Z ≥ (49.5 − 25)/√(300/16) )
                = P(Z ≥ 5.66)
                ≈ 7.6 × 10⁻⁹

  (The R sketch below compares this with the exact binomial probability.)
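A quick check of the calculation in R:

    # Exact P(X >= 50) for X ~ Binomial(100, 0.25), and the normal
    # approximation with continuity correction used above.
    1 - pbinom(49, size = 100, prob = 0.25)         # exact
    1 - pnorm(49.5, mean = 25, sd = sqrt(300/16))   # roughly 7.6e-9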

Connection between Poisson and normal

Assume X follows a Poisson distribution with parameter λ.

Then we can view X as the sum of n independent random variables, each having a Poisson distribution with parameter λ/n.

Hence, if Y is a normally distributed random variable with mean λ and variance λ, then

      P(X = x) ≈ P(x − 1/2 ≤ Y ≤ x + 1/2) = ∫_{x−1/2}^{x+1/2} fY(u) du ≈ fY(x),

where fY(y) = (1 / √(2πλ)) exp( −(y − λ)² / (2λ) ) is the probability density function of Y. (A numerical comparison in R is sketched below.)
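A sketch of the analogous comparison for the Poisson case; λ = 20 and x = 25 are illustrative values:

    # Exact Poisson probability versus the normal approximations with mean
    # and variance both equal to lambda.
    lambda <- 20
    x <- 25
    dpois(x, lambda)                                           # exact P(X = x)
    pnorm(x + 0.5, lambda, sqrt(lambda)) -
      pnorm(x - 0.5, lambda, sqrt(lambda))                     # area form
    dnorm(x, mean = lambda, sd = sqrt(lambda))                 # pdf form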

Connection between Poisson and normal