ACMS 20340 Statistics for Life Sciences

Chapter 12: Discrete Probability Distributions

What about categorical variables?

We’ve studied various distributions of quantitative variables, most notably the Normal distributions.

But what is the appropriate probability model for the count of successful outcomes of a categorical variable?

We will focus on one distribution in particular, the binomial distribution.

Some Motivating Examples

- You toss a fair coin ten times.
  - How many times does it come up heads?
  - What is the probability of it coming up heads exactly three times?
- An obstetrician oversees 12 single-birth deliveries on a certain day.
  - How many of the deliveries are of girls?
  - What is the probability of there being exactly 7 girls in this “batch” of 12?

The Binomial Setting

1. There is a fixed number n of observations.

2. The n observations are independent, which means that knowing the result of one observation doesn’t change the probabilities we assign to other observations.

3. Each observation falls into one of two categories, one of which we will call “success” and the other “failure”.

4. The probability p of a success is the same for each observation.

The Binomial Distribution

The count X of successes in the binomial setting has the binomial distribution with parameters n and p.

The parameter n is the number of observations, and p is the probability of a success on any one observation.

The possible values of X are whole numbers from 0 to n.

An important caveat: Not all counts have a binomial distribution, so we must ensure that we’re in the binomial setting before we conclude that a count has a binomial distribution.

Binomial Distribution Examples

- You toss a fair coin ten times and count the number of Hs.
  - n = 10
  - p = 1/2
- An obstetrician oversees 12 single-birth deliveries on a certain day and counts the number of girls born.
  - n = 12
  - p = 1/2
- You roll a fair die 100 times and count the number of occurrences of ‘1’.
  - n = 100
  - p = 1/6

A Non-Example

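You select five balls from a barrel containing 50 red balls and 50 blue balls, without replacement.

What is the probability of selecting only red balls?

\[
\left(\frac{50}{100}\right)\left(\frac{49}{99}\right)\left(\frac{48}{98}\right)\left(\frac{47}{97}\right)\left(\frac{46}{96}\right) = \frac{1081}{38412} \approx 0.028
\]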

Why aren’t these counts binomially distributed?
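One way to explore this question is to compare the exact calculation with what a binomial model would give. The following is a minimal sketch, assuming Python; the variable names `p_without_replacement` and `p_binomial` are illustrative and do not come from the slides.

```python
from fractions import Fraction

red, total, draws = 50, 100, 5

# Drawing WITHOUT replacement: the chance of red changes after every draw,
# so the probability of success is not the same for each observation.
p_without_replacement = Fraction(1)
for i in range(draws):
    p_without_replacement *= Fraction(red - i, total - i)

# What a binomial model with n = 5, p = 1/2 would give for five reds
# (this corresponds to drawing WITH replacement).
p_binomial = Fraction(1, 2) ** draws

print(p_without_replacement, float(p_without_replacement))
print(p_binomial, float(p_binomial))
```

The two answers differ (about 0.028 versus about 0.031) because the draws are not independent and the probability of red shifts with each ball removed, so conditions 2 and 4 of the binomial setting fail.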

Binomial Probabilities 1

What we’d like is a formula for the probability that a binomial random variable takes any value.

Idea: We add probabilities for the different ways of getting exactly that many successes in n observations.

That is, if X is a binomial random variable, we want a formula for calculating

P(X = k)

for any k = 0, 1, 2, . . . , n.

Binomial Probabilities 2

Let’s first consider an example.

Each child born to a particular set of parents has probability 0.25 of having blood type O.

If these parents have 5 children, what is the probability of exactly two of them having blood type O?

The count of children with blood type O is binomially distributed:

- n = 5
- p = 0.25

Let’s use “S” to stand for success (blood type O) and “F” to stand for failure.

Binomial Probabilities 3

Step 1: What is the probability that just the first and third children are successes? That is,

P(SFSFF) = ?

The probability of a sequence of independent events is the product of the probabilities of the individual events:

\[
\begin{aligned}
P(\text{SFSFF}) &= P(S) \cdot P(F) \cdot P(S) \cdot P(F) \cdot P(F) \\
&= (0.25)(0.75)(0.25)(0.75)(0.75) \\
&= (0.25)^2 (0.75)^3
\end{aligned}
\]

Binomial Probabilities 4

Step 2: Observe that any arrangement of 2 S’s and 3 F’s has this same probability: we always just multiply 0.25 twice and 0.75 three times whenever we have 2 S’s and 3 F’s.

So the probability that X = 2 is the probability of getting 2 S’s and 3 F’s in any arrangement whatsoever:

SSFFF SFSFF SFFSF SFFFS FSSFF
FSFSF FSFFS FFSSF FFSFS FFFSS

There are ten such arrangements, each with the same probability, and hence

\[
P(X = 2) = 10\,(0.25)^2 (0.75)^3 \approx 0.2637.
\]
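The counting argument in Steps 1 and 2 can be checked by brute force. Here is a small sketch, assuming Python, that lists every sequence of S’s and F’s for five children and keeps those with exactly two S’s; the variable names are illustrative only.

```python
from itertools import product

p = 0.25       # probability of blood type O (a success)
n, k = 5, 2    # five children, exactly two successes

# Enumerate all 2**5 sequences of S/F and keep those with exactly two S's.
arrangements = ["".join(seq) for seq in product("SF", repeat=n)
                if seq.count("S") == k]

# Every such arrangement has the same probability (0.25)^2 (0.75)^3.
prob_each = p**k * (1 - p)**(n - k)
prob_total = len(arrangements) * prob_each

print(arrangements)
print(len(arrangements), prob_each, round(prob_total, 4))
```

Listing sequences like this only works for small n, which is why a formula for the number of arrangements is useful.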

The Binomial Coefficient

The number of ways of arranging k successes among n observations is given by the binomial coefficient

\[
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
\]

for any k = 0, 1, 2, . . . , n.

Recall that the factorial of n, n! is

n! = n · (n − 1) · (n − 2) · . . . · 3 · 2 · 1,

and 0!=1.

The Binomial Coefficient in Action

How many different ways are there to have exactly two successes in five trials?

\[
\binom{5}{2} = \frac{5!}{2!\,3!} = \frac{(5)(4)(3)(2)(1)}{(2)(1)\,(3)(2)(1)} = \frac{(5)(4)}{(2)(1)} = \frac{20}{2} = 10.
\]
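The same count can be reproduced in code. This is a brief sketch assuming Python 3.8 or later, where the standard library provides `math.comb`; the variable names are illustrative.

```python
from math import comb, factorial

n, k = 5, 2

# Binomial coefficient from the factorial definition n! / (k! (n - k)!)
by_formula = factorial(n) // (factorial(k) * factorial(n - k))

# The same value from the standard library
built_in = comb(n, k)

print(by_formula, built_in)
```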

The Official Formula for Binomial Probabilities

If X has the binomial distribution with n observations and probability p of success for each observation, then the possible values of X are 0, 1, 2, . . . , n.

If k is any one of these values, then

\[
P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}.
\]
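The formula translates almost directly into code. Below is a minimal sketch, assuming Python 3.8 or later; `binomial_pmf` is a hypothetical helper name introduced here for illustration, applied to the blood type example (n = 5, p = 0.25).

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X binomial with n observations and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Blood type example: probabilities for k = 0, 1, ..., 5
probs = [binomial_pmf(k, 5, 0.25) for k in range(6)]

for k, pk in enumerate(probs):
    print(k, round(pk, 4))

# The probabilities over all possible values of X should add up to 1.
print(sum(probs))
```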

Example

One in ten boxes of Cracker Jacks contains a decoder ring.

What is the probability that no more than one of ten randomly chosen boxes of Cracker Jacks contains a decoder ring?

- n = 10
- p = 0.1

\[
\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&= \binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)(0.9)^9 \\
&= \frac{10!}{0!\,10!}(1)(0.3487) + \frac{10!}{1!\,9!}(0.1)(0.3874) \\
&= (1)(1)(0.3487) + (10)(0.1)(0.3874) \\
&= 0.3487 + 0.3874 = 0.7361
\end{aligned}
\]
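The arithmetic above is easy to confirm numerically. The following sketch assumes Python 3.8 or later; the variable names are illustrative.

```python
from math import comb

n, p = 10, 0.1

# P(X = 0) and P(X = 1) from the binomial formula
p0 = comb(n, 0) * p**0 * (1 - p)**10
p1 = comb(n, 1) * p**1 * (1 - p)**9

# "No more than one" means X = 0 or X = 1
p_at_most_one = p0 + p1

print(round(p0, 4), round(p1, 4), round(p_at_most_one, 4))
```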

Binomial mean and standard deviation

Q: In many repetitions of the binomial setting, with n observations and probability of success p, what will be the average count of successes?

(In other words, what is the mean of the count variable X?)

A: If a count X has the binomial distribution with n observations and probability p of success, the mean and standard deviation of X are

\[
\mu = np, \qquad \sigma = \sqrt{np(1 - p)}.
\]

Coin Tossing

You toss a fair coin ten times and count the number of Hs.

- n = 10
- p = 1/2

If we repeat the ten tosses many times, how many heads should occur on average?

\[
\mu = np = (10)(1/2) = 5
\]

And the standard deviation?

\[
\sigma = \sqrt{np(1 - p)} = \sqrt{10(1/2)(1/2)} = \sqrt{5/2} \approx 1.58
\]
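As a sanity check, the mean and standard deviation can also be computed directly from the binomial probabilities and compared with np and √(np(1 − p)). This is a sketch assuming Python 3.8 or later, with illustrative variable names.

```python
from math import comb, sqrt

n, p = 10, 0.5

# Binomial probabilities P(X = k) for k = 0, 1, ..., n
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Mean and variance computed from the definition, using the whole distribution
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean)**2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                          # both give the mean
print(sqrt(var), sqrt(n * p * (1 - p)))     # both give the standard deviation
```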

The Normal Approximation to Binomial Distributions

Suppose that a count X has the binomial distribution with n observations and probability of success p.

When n is large, the distribution of X is approximately Normal:

\[
N\!\left(np,\ \sqrt{np(1 - p)}\right).
\]

As a rule of thumb, we use the Normal approximation when n is so large that np ≥ 10 and n(1 − p) ≥ 10.
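The rule of thumb is simple to encode. Here is a small sketch, assuming Python; `normal_approx_ok` is a hypothetical helper name, applied to the coin and die examples from earlier in the chapter.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: np >= 10 and n(1 - p) >= 10."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Ten coin tosses: np = 5, so the rule of thumb is not met.
print(normal_approx_ok(10, 0.5))

# One hundred die rolls counting '1': np is about 16.7 and n(1 - p) about 83.3.
print(normal_approx_ok(100, 1 / 6))
```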

One Last Example

About 60% of American adults are either overweight or obese. What is the probability that at least 1520 individuals from a random sample of 2500 adults are overweight or obese?

Given that our sample is random, we can take the 2500 members of our sample to be independent.

So we’re in the binomial setting:

- n = 2500
- p = 0.6

Using software, we find that

P(X ≥ 1520) = 0.2131.
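The slides do not say which software was used. One possible way to get the exact value, sketched here in Python 3.8 or later, is to write p = 0.6 as 3/5 and sum the binomial probabilities with whole-number arithmetic, which avoids the underflow that tiny powers such as 0.6^1520 would cause in floating point. The variable names are illustrative.

```python
from math import comb

n = 2500

# With p = 3/5, P(X = k) = C(n, k) * 3^k * 2^(n - k) / 5^n exactly.
numerator = sum(comb(n, k) * 3**k * 2**(n - k) for k in range(1520, n + 1))
denominator = 5**n

# Dividing two (very large) Python integers gives an ordinary float.
p_at_least_1520 = numerator / denominator

print(round(p_at_least_1520, 4))
```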

Let’s Use the Normal Approximation 1

\[
\mu = np = (2500)(0.6) = 1500
\]

\[
\sigma = \sqrt{np(1 - p)} = \sqrt{(2500)(0.6)(0.4)} \approx 24.49
\]

The distribution of this binomial random variable is approximated well by the Normal distribution N(1500, 24.49) (since np = 1500 ≥ 10 and n(1 − p) = 1000 ≥ 10).

FIGURE 12.3 Probability distribution for the binomial model n = 2500, p = 0.6, displayed graphically. The height of each bar represents the probability for X when it takes a value on the horizontal axis. Notice how the shape of this binomial probability distribution closely resembles a Normal curve.

Let’s Use the Normal Approximation 2

\[
\begin{aligned}
P(X \ge 1520) &= P\left(\frac{X - 1500}{24.49} \ge \frac{1520 - 1500}{24.49}\right) \\
&= P(Z \ge 0.82) \\
&= 1 - 0.7939 = 0.2061
\end{aligned}
\]

The Normal approximation 0.2061 differs from the software result 0.2131 by only 0.007.
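The table lookup can be mimicked in code. The sketch below assumes Python 3.8 or later, whose standard library includes `statistics.NormalDist`; rounding z to two decimals is done only to match how a printed Normal table is read, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 2500, 0.6
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Standardize 1520, then round z to two decimals as a printed table would.
z = (1520 - mu) / sigma
z_table = round(z, 2)

# Upper-tail area of the standard Normal distribution beyond z_table
approx = 1 - NormalDist().cdf(z_table)

print(round(sigma, 2), z_table, round(approx, 4))
```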
