Lectures prepared by: Elchanan Mossel Yelena Shvets Introduction to probability Stat 134 FAll 2005...

Preview:

Citation preview

Lectures prepared by:Elchanan MosselYelena Shvets

Introduction to probability

Stat 134 FAll 2005

Berkeley

Follows Jim Pitman’s book:

ProbabilitySection 2.2

Binomial Distribution:Mean

Recall: the Mean

of a binomial(n,p) distribution

is given by:

= #Trials £ P(success) = n p

• The Mean = Mode (most likely value).

• The Mean = “Center of gravity” of the distribution.

Standard Deviation

The Standard Deviation (SD) of a distribution, denoted measures the spread of the distribution.

Def: The Standard Deviation (SD) of the binomial(n,p) distribution is given by:

Binomial DistributionExample 1:

• Let’s compare Bin(100,1/4) to Bin(100,1/2).

• 1 = 25, 2 = 50

• 1 = 4.33, 2 = 5

• We expect Bin(100,1/4) to be less spread than Bin(100,1/2).

• Indeed, you are more likely to guess the value of Bin(100,1/4) distribution than of Bin(100,1/2) since:For p=1/2 : P(50) ¼ 0.079589237;

For p=1/4 : P(25) ¼ 0.092132;

Histograms for binomial(100, 1/4) & binomial(100, 1/2) ;

.00

.02

.04

.06

.08

.10

.12

0 10 20 30 40 50 60 70 80 90 100

Binomial DistributionExample 2:

• Let’s compare Bin(50,1/2) to Bin(100,1/2).

• 1 = 25, 2 = 50

• 1 = 3.54, 2 = 5

• We expect Bin(50,1/2) to be less spread than Bin(100,1/2).

• Indeed, you are more likely to guess the value of Bin(50,1/2) distribution than of Bin(100,1/2) since:

For n=100 : P(50) ¼ 0.079589237;

For n=50 : P(25) ¼ 0.112556;

Histograms for binomial(50, 1/2) & binomial(100,1/2)

.00

.02

.04

.06

.08

.10

.12

0 10 20 30 40 50 60 70 80 90 100

, and the normal curve

•We will see today that the •Mean () and the •SD () give a very good summary of the binom(n,p) distribution via •The Normal Curve with parameters and .

binomial(n,½); n=50,100,250,500

.00

.02

.04

.06

.08

.10

.12

0 50 100 150 200 250 300

25, 3.54

50, 5

125, 7.91

250, 11.2

.0

.1

.2

.3

.4

.5

Normal Curve

2

2

(x- )-

21y e

2

x

y

.0

.1

.2

.3

.4

.5

The Normal Distribution

a xb

2

2b

2

a

1(a,b)= dx

2

( - )x

P e

2

2

(x- )a2

-

1P(- ,a) e dx

2

.0

.1

.2

.3

.4

.5

.0

.1

.2

.3

.4

.5

.0

.1

.2

.3

.4

.5

The Normal Distribution

2

2

(x- )b -2

a

1P(a,b) = e dx

2

a xb

2

2

(x- )b -21

e dx2

2(x- )a -

221e dx

2

=

Standard Normal

2 x2

1(x)=

2e

Standard Normal Density Function:

Corresponds to = 0 and =1

Standard Normal

For the normal (,) distribution: P(a,b) = (b-) - (a-);

z

-

(z)= (x)dx

Standard Normal Cumulative Distribution Function:

Standard Normal

.0

.1

.2

.3

.4

.5

-5 -4 -3 -2 -1 0 1 2 3 4 5

.0

.1

.2

.3

.4

.5

.6

.7

.8

.9

1.0

-5 -4 -3 -2 -1 0 1 2 3 4 5

2 x2

1(x)=

2e

z

-

(z)= (x)dx

Standard Normal

.0

.1

.2

.3

.4

.5

-5 -4 -3 -2 -1 0 1 2 3 4 5

.0

.1

.2

.3

.4

.5

.6

.7

.8

.9

1.0

-5 -4 -3 -2 -1 0 1 2 3 4 5

2 x2

1(x)=

2e

1(0)=

2

(z)

1 (z)

(-z)

z-z

Standard Normal

For the normal (,) distribution: P(a,b) = (b-) - (a-);

In order to prove this, suffices to show:

P(-1,a) = (a-);

Standard Normal Cumulative Distribution Function:

2

2

(x- )a -2

-

1e dx

2

Change of variables

xs

dxds

ax a s

a

(s)ds

a( )

2a

s21

e ds 2

2

2

( )a -2

-

x-1e

2dx

Properties of :

(0) = 1/2;

(-z) = 1 - (z);

(-1) = 0;

(1)= 1.

does not have a closed form formula!

Normal Approximation of a binomial

For n independent trials with success probability p:

where:

binomial(n,½); n=50,100,250,500

.00

.02

.04

.06

.08

.10

.12

0 50 100 150 200 250 300

25, 3.54

50, 5

125, 7.91

250, 11.2

Normal(,);

25, 3.54

50, 5

125, 7.91

250, 11.2

.00

.02

.04

.06

.08

.10

.12

0 50 100 150 200 250 300

Normal Approximation of a binomial

For n independent trials with success probability p:

•The 0.5 correction is called the “continuity correction”

•It is important when is small or when a and b are close.

Normal Approximation

Question: Find P(H>40) in 100 tosses.

.00

.02

.04

.06

.08

.10

.12

Normal Approximation to bin(100,1/2).

2

2

(x-50)

2*51

2 5e

P(#H 40)

((40-50)/5) =1-(-2)

= (2) ¼ 0.9772

Exact = 0.971556

.00

.02

.04

.06

.08

.10

.12

What do we get With continuity correction?

When does the Normal Approximation fail?

.00

.10

.20

.30

.40

.50

.60

.70

.80

.90

1.00

0 1

bin(1,1/2)

=0.5,=0.5

N(0.5,0.5)

0.

0.2

0.4

0.6

0.8

1.

0 1 2 3 4 5 6 7 8 9 10

When does the NA fail?bin(100,1/100)

=1, ¼ 1

N(1,1)

Rule of Thumb

Normal works better:•The larger is.•The closer p is to ½.

Fluctuation in the number of successes.

From the normal approximation it follows that:

P(- to + successes in n trials) ¼ 68%

P(-2 to +2 successes in n trials) ¼ 95%

P(-3 to +3 successes in n trials) ¼ 99.7%

P(-4 to +4 successes in n trials) ¼ 99.99%

Fluctuation in the proportion of successes.Typical size of fluctuation in the number of successes

is:

np(1 p)

p(1 p)n n

Typical size of fluctuation in the proportion of successes is:

Square Root Law

Let n be a large number of independent trials with probability of success p on each.

The number of successes will, with high probability, lie in an interval, centered on the mean np, with a width a moderate multiple of .

The proportion of successes, will lie in a small interval centered on p, with the width a moderate multiple of 1/ .

n

n

Law of large numbers

Let n be a number of independent trials, with probability p of success on each.

For each > 0;

P(|#successes/n – p|< ) ! 1, as n ! 1

Confidence intervals

Suppose we observe the results of n trials with an probability of success p.

#successesˆThe observed f requency of successes

unk n

p=

now

.n

Conf Intervals

The Normal Curve Approximation says that f or any

fi xed p and n large enough, there is a 99.99% chance

ˆ that the observed f requency p will diff er f rom p by

p(1-p) less than 4 .

n

I t's easy to see that p(p(1-p)1 2

1-p) , so 4 .2 n n

Conf Intervals2 2ˆ ˆ (p ,p ) n n

is called a 99.99% confidence interval.

binomial(n,½); n=50,100,250,500

.00

.02

.04

.06

.08

.10

.12

0 50 100 150 200 250 300

25, 3.54

50, 5

125, 7.91

250, 11.2

Normal(,);

25, 3.54

50, 5

125, 7.91

250, 11.2

.00

.02

.04

.06

.08

.10

.12

0 50 100 150 200 250 300

.00

.02

.04

.06

.08

.10

.12

0 10 20 30 40 50 60 70 80 90 100

.00

.02

.04

.06

.08

.10

.12

0 10 20 30 40 50 60 70 80 90 100

bin(50,1/2)

= 25, = 3.535534

bin(50,1/2) + 25

50, =3.535534

Normal Approximation

2 (x- )221

e2

bin(100,1/2)

50, =5

(bin(50,1/2) + 25)* 5/3.535534

50, = 5

50, =5

Notre Dame de Rheims

Quote

Bientôt je pus montrer quelques esquisses. Personnen'y comprit rien. Même ceux qui furent favorables à ma perception des vérités que je voulais ensuite graverdans le temple, me félicitèrent de les avoirdécouvertes au « microscope », quand je m'étais aucontraire servi d'un télescope pour apercevoir deschoses, très petites en effet, mais parce qu'elles étaient situées à une grande distance […]. Là où jecherchais les grandes lois, on m'appelait fouilleur dedétails. (TR, p.346)

Recommended