Section 8.1 - Estimating a Proportion with Confidence

Objectives:

1. To find a confidence interval graphically

2. To understand a confidence interval as consisting of those population proportions for which the result from the sample is reasonably likely

3. To always check the three conditions before constructing a confidence interval

4. To construct a confidence interval using the formula

5. To interpret a confidence interval and the meaning of “confidence”

6. To compute the required sample size for a given margin of error


General idea:

Consider the population of the U.S. Suppose you are interested in the proportion of redheads in the population.

Since the proportion of redheads is probably unknown, you will have to estimate it. What should you do?

– Take a sample. (The size will depend on how much time and money you have.)

– Compute the sample proportion. (This estimator is unbiased, and the Central Limit Theorem tells you that its sampling distribution is approximately normal when the sample is large enough; it has other “desirable” properties as well.) This is your best guess.

– Are you “sure”? What do you mean by sure? How “sure” do you need to be?

Introduction

A Pew Research Center survey found that 55% of singles ages 18-29 say they aren’t in a committed relationship and are not actively looking. This percentage is based on interviews with 1068 singles. The survey reported a margin of error of 3%.

The researchers also say that they are 95% confident that the error in the percentage (55%) is less than 3% either way. That is, they are 95% confident that if they were to ask all young singles in the U.S., between 52% and 58% would report that they aren’t in a committed relationship and are not actively looking.

What do they mean by this?


Reasonably Likely Events


About 95% of all sample proportions $\hat{p}$ will fall within about two standard errors of the population proportion $p$, that is, within the interval

\[ p \pm 1.96\sqrt{\frac{p(1-p)}{n}} \]

The sample proportions in this interval are called reasonably likely.

This rule works well only under the condition that both $np \ge 10$ and $n(1-p) \ge 10$.

Reasonably Likely Events and Rare Events

Reasonably likely events are those in the middle 95% of the distribution of all possible outcomes. The outcomes in the upper 2.5% and lower 2.5% of the distribution are rare events - they happen, but rarely.


[Figure: standard normal density curve. The middle 95%, between z = -1.96 and z = 1.96, is labeled "Reasonably Likely". The lower 2.5% and the upper 2.5% are each labeled "Rare".]

Example: Reasonably Likely Results from Coin Flips


Suppose you flip a fair coin 100 times.

What are the reasonably likely values of the sample proportion $\hat{p}$?

What numbers of heads are reasonably likely?



Suppose you flip a fair coin 100 times. What are the reasonably likely values of the sample proportion $\hat{p}$?

Check conditions:

\[ np = (100)(0.50) = 50 \ge 10; \qquad n(1-p) = (100)(0.50) = 50 \ge 10 \]

95% of all sample proportions $\hat{p}$ should fall in the interval

\[ p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = 0.50 \pm 1.96\sqrt{\frac{(0.5)(0.5)}{100}} = 0.50 \pm 1.96(0.05) \approx 0.50 \pm 0.10 = [0.4, 0.6] \]



Suppose you flip a fair coin 100 times. What numbers of heads are reasonably likely?

In about 95% of the samples, the number of successes $x$ will be in the interval

\[ np \pm 1.96\sqrt{np(1-p)} = 50 \pm 1.96\sqrt{(100)(0.5)(0.5)} = 50 \pm 1.96(5) \approx 50 \pm 10 = [40, 60] \]
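The two coin-flip calculations can be sketched in Python. This is a minimal illustration, not part of the original slides; the function names `reasonably_likely_proportions` and `reasonably_likely_counts` are made up for this example. It uses 1.96 without rounding the margin, so the endpoints come out slightly inside the rounded intervals [0.4, 0.6] and [40, 60].

```python
from math import sqrt

def reasonably_likely_proportions(p, n, z=1.96):
    """Middle-95% range of the sample proportion when the population proportion is p."""
    se = sqrt(p * (1 - p) / n)          # standard error of the sample proportion
    return p - z * se, p + z * se

def reasonably_likely_counts(p, n, z=1.96):
    """Middle-95% range of the number of successes in n trials."""
    sd = sqrt(n * p * (1 - p))          # standard deviation of the count
    return n * p - z * sd, n * p + z * sd

# Fair coin, 100 flips
low_p, high_p = reasonably_likely_proportions(0.5, 100)
low_x, high_x = reasonably_likely_counts(0.5, 100)
print(f"proportions: [{low_p:.3f}, {high_p:.3f}]")
print(f"counts:      [{low_x:.1f}, {high_x:.1f}]")
```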

Introduction, continued.


The Pew Research Center doesn’t know the value of $p$ (the percentage of young singles not in a relationship).

For each possible value of $p$, Pew can compute how close to $p$ most sample proportions $\hat{p}$ will be.

By knowing the variability expected in random samples, Pew can estimate how close $\hat{p}$ should be to $p$.

The Meaning of a Confidence Interval

Suppose you take repeated random samples of size 40 from a population with 60% successes. What proportion of successes would be reasonably likely in your sample?


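Check conditions:

\[ np = (40)(0.60) = 24 \ge 10; \qquad n(1-p) = (40)(0.40) = 16 \ge 10 \]

\[ \mu_{\hat{p}} = p = 0.60 \]

\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.60)(0.40)}{40}} = 0.077 \]

Reasonably likely = Middle 95%:

\[ \mu_{\hat{p}} \pm 1.96\,\sigma_{\hat{p}} = 0.60 \pm 1.96(0.077) = 0.60 \pm 0.151 = [0.449, 0.751] \]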



Reasonably likely sample proportions for n = 40

p      1 - p    σ        ME = 1.96σ    Interval p ± ME
0.9    0.1      0.047    0.092         [0.808, 0.992]
0.8    0.2      0.063    0.123         [0.677, 0.923]
0.7    0.3      0.072    0.141         [0.559, 0.841]
0.6    0.4      0.077    0.151         [0.449, 0.751]
0.5    0.5      0.079    0.155         [0.345, 0.655]
0.4    0.6      0.077    0.151         [0.249, 0.551]
0.3    0.7      0.072    0.141         [0.159, 0.441]
0.2    0.8      0.063    0.123         [0.077, 0.323]
0.1    0.9      0.047    0.092         [0.008, 0.192]
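A table like this one can be generated with a short loop. The sketch below is an illustration only; `reasonably_likely_row` is a made-up helper name. It does not round σ before multiplying by 1.96, so some entries may differ from the table by about 0.001.

```python
from math import sqrt

def reasonably_likely_row(p, n=40, z=1.96):
    """Return (p, 1 - p, sigma, ME, lower, upper) for one population proportion."""
    sigma = sqrt(p * (1 - p) / n)   # standard error of the sample proportion
    me = z * sigma                  # half-width of the reasonably likely range
    return p, 1 - p, sigma, me, p - me, p + me

# Population proportions 0.9, 0.8, ..., 0.1
rows = [reasonably_likely_row(k / 10) for k in range(9, 0, -1)]

for p, q, sigma, me, low, high in rows:
    print(f"{p:.1f}  {q:.1f}  {sigma:.3f}  {me:.3f}  [{low:.3f}, {high:.3f}]")
```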


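[Figure: Reasonably likely sample proportions for samples of size n = 40. Horizontal axis: proportion of successes in the sample. Vertical axis: proportion of successes in the population. Each horizontal line segment marks the reasonably likely sample proportions for one population proportion.]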


Suppose that in an experiment, 75%, or 30 out of the 40 trials, resulted in success.

Is it plausible that the true proportion is 50%?

Is it plausible that the true proportion is 80%?

What values are plausible for the population proportion?



[Figure: the chart of reasonably likely sample proportions for n = 40, with a red vertical line at the sample proportion 0.75.]

Plausible population percentages are p = 0.6, p = 0.7, p = 0.8.

The sample proportion 0.75 (represented by the red vertical line) intersects the reasonably likely range of values for p = 0.80 (from 0.677 to 0.923, represented by the orange line segment).

If the population proportion is 0.80, you are reasonably likely to get 30 successes in 40 trials, or 75%.

The sample proportion 0.75 (represented by the red vertical line) does not intersect the reasonably likely range of values for p = 0.50 (from 0.345 to 0.655, represented by the orange line segment).

If the population proportion is 0.50, you are not likely to get 30 successes in 40 trials, or 75%.
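The plausibility check can be written out directly: a population proportion is plausible if the observed sample proportion lies inside its reasonably likely range. The sketch below is a hedged illustration; `is_reasonably_likely` is a name invented here, and the candidate values are the nine proportions 0.1 through 0.9.

```python
from math import sqrt

def is_reasonably_likely(p_hat, p, n, z=1.96):
    """True if p_hat falls in the middle-95% range for population proportion p."""
    me = z * sqrt(p * (1 - p) / n)
    return p - me <= p_hat <= p + me

p_hat, n = 30 / 40, 40                       # 30 successes in 40 trials
candidates = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
plausible = [p for p in candidates if is_reasonably_likely(p_hat, p, n)]
print(plausible)
```

Only 0.6, 0.7, and 0.8 pass the check, so 0.80 is plausible and 0.50 is not.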







Plausible population percentages are from about p = 0.6 to about p = 0.85.

These plausible percentages for the population proportion are called the 95% confidence interval for p.
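Reading the interval off the chart amounts to scanning many candidate values of p and keeping those for which the observed sample proportion is reasonably likely. The following sketch does that on a grid. The function name `plausible_range` and the grid of 0.001 steps are assumptions made for illustration.

```python
from math import sqrt

def plausible_range(p_hat, n, z=1.96, steps=1000):
    """Scan candidate population proportions and keep those for which
    p_hat lies in the reasonably likely range p +/- z*sqrt(p(1-p)/n)."""
    plausible = []
    for k in range(1, steps):
        p = k / steps
        me = z * sqrt(p * (1 - p) / n)
        if p - me <= p_hat <= p + me:
            plausible.append(p)
    return min(plausible), max(plausible)

low, high = plausible_range(30 / 40, 40)
print(f"plausible p from about {low:.2f} to about {high:.2f}")
```

For 30 successes in 40 trials the scan gives roughly 0.60 to 0.86, in line with the range read from the chart.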






A 95% confidence interval consists of those population proportions $p$ for which the sample proportion $\hat{p}$ is reasonably likely.

Note that the population proportion $p$ is the unknown parameter.

A Confidence Interval for a Population Proportion


Recall our formula for the "reasonably likely" interval, which represents the middle 95% of the sampling distribution:

\[ \mu_{\hat{p}} \pm 1.96\,\sigma_{\hat{p}} = p \pm 1.96\sqrt{\frac{p(1-p)}{n}} \]

Where did the "1.96" come from?

It is the z-score corresponding to a cumulative probability of 0.9750.



The population proportion $p$ is an unknown parameter.

In fact, estimating $p$ is the whole point of what we are doing.

The idea is to estimate $p$ by a range (interval) of values instead of by a single value (point) $\hat{p}$.

Let's see if we can modify the formula

\[ p \pm 1.96\sqrt{\frac{p(1-p)}{n}} \]

Since we don't know $p$, let's use the next best thing, $\hat{p}$:

\[ \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Instead of using 1.96, which is the z-score that corresponds to the middle 95%, let's just put in a variable, $z^*$, which will depend on how confident we want to be.

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]


The formula for a confidence interval for the proportion of successes $p$ in the population is based on three components: the sample proportion $\hat{p}$, the standard error $\sigma_{\hat{p}}$ (estimated using $\hat{p}$), and the critical value $z^*$, which is set by the confidence level.

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Here $n$ is the sample size and $\hat{p}$ is the proportion of successes in the sample.

The value of $z^*$ depends on how confident you want to be that the confidence interval will contain $p$.

90% CI ⇒ $z^* = 1.645$

95% CI ⇒ $z^* = 1.96$

99% CI ⇒ $z^* = 2.576$

Where do these values come from?
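Each critical value is the z-score that leaves half of the leftover probability in each tail of the standard normal curve. The sketch below looks them up with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `critical_value` is made up for this example.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that the middle `confidence` of the standard normal lies within +/- z*."""
    tail = (1 - confidence) / 2              # area in each tail
    return NormalDist().inv_cdf(1 - tail)    # z-score with cumulative area 1 - tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI -> z* = {critical_value(level):.3f}")
```

For a 95% interval the tails hold 2.5% each, so the lookup uses a cumulative probability of 0.975.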


Once again, what is it that we are trying to do?

We wish to find out the value of an unknown population parameter - the proportion of successes.

The best single estimate of the population proportion comes from taking a random sample and computing the sample proportion; the Central Limit Theorem describes how far that estimate is likely to be from the true value. (Bigger samples are better, etc.)

In some applications, it is useful to consider a range or interval of values, instead of just one. Depending on how “confident” we want or need to be, we can construct a confidence interval - a range of likely values for the population proportion.




A confidence interval for the proportion of successes $p$ in the population is given by the formula

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

This confidence interval is reasonably accurate under three conditions:

(1) The sample is a simple random sample from a binomial population.

(2) $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.

(3) The population is at least 10 times the size of the sample: $N \ge 10 \cdot n$.
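The two numerical conditions are easy to check before computing anything. The sketch below assumes the size condition is checked with the sample counts, since $p$ itself is unknown; the function name `check_conditions` and the population size of 5000 are hypothetical. Whether the sample is random depends on how the data were collected, and no code can verify that.

```python
def check_conditions(successes, n, population_size=None):
    """Check the numerical conditions for a confidence interval for a proportion.

    n * p_hat is the number of successes and n * (1 - p_hat) the number of
    failures. The random-sample condition is NOT checked here.
    """
    failures = n - successes
    checks = {
        "at least 10 successes": successes >= 10,
        "at least 10 failures": failures >= 10,
    }
    if population_size is not None:
        checks["population at least 10 times the sample"] = population_size >= 10 * n
    return checks

# 24 successes in a sample of 40, drawn from a hypothetical population of 5000
for name, ok in check_conditions(24, 40, population_size=5000).items():
    print(f"{name}: {'ok' if ok else 'NOT met'}")
```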



For the confidence interval

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

the expression

\[ ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

is called the Margin of Error.

The margin of error is one-half the width of the confidence interval.

The point estimate $\hat{p}$ is located in the center of the confidence interval.

Example: Safety Violations

Suppose you have a random sample of 40 buses from a large city and find that 24 buses have a safety violation. Find the 90% confidence interval for the proportion of all buses that have a safety violation.


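90% confidence interval:

\[ \hat{p} = \frac{24}{40} = 0.60, \qquad z^*_{90\%} = 1.645 \]

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.60 \pm 1.645\sqrt{\frac{(0.60)(0.40)}{40}} = 0.60 \pm 0.127 = [0.473, 0.727] \]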


Using the TI-83/84:

STAT TESTS 1-PropZInt ENTER

1-PropZInt

x: 24

n: 40

C-Level: .90

Calculate [ENTER]


1-PropZInt

(.47259, .72741)

p̂ = .6

n = 40
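The same interval can be reproduced without a calculator. The sketch below is a plain-Python version of the formula, not the calculator's own routine; `proportion_ci` is a name chosen for this example, and the critical value comes from `statistics.NormalDist` in the standard library.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)   # margin of error
    return p_hat - me, p_hat + me

# 24 of 40 buses with a safety violation, 90% confidence
low, high = proportion_ci(24, 40, confidence=0.90)
print(f"({low:.5f}, {high:.5f})")
```

The result should agree with the calculator's interval to the precision shown, and with the hand calculation after rounding to three decimals.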

The Capture Rate

Sometimes a confidence interval “captures” the true population proportion and sometimes it doesn’t.

The capture rate of a method of constructing confidence intervals is the proportion of confidence intervals that contain the population parameter (proportion) in repeated usage of the method.

If a polling company uses 95% confidence intervals in a large number of different surveys, the population proportion p should be in 95% of them.
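The capture rate can be estimated by simulation: draw many random samples from a population with a known proportion, build a 95% interval from each, and count how often the interval contains the true value. The sketch below is an illustration under assumed settings (p = 0.6, n = 400, 2000 samples, a fixed seed); none of these numbers come from the slides.

```python
import random
from math import sqrt

def capture_rate(p, n, trials=2000, z=1.96, seed=1):
    """Fraction of simulated confidence intervals that contain the true p."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    captured = 0
    for _ in range(trials):
        # One random sample of size n: count the successes
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        me = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - me <= p <= p_hat + me:
            captured += 1
    return captured / trials

rate = capture_rate(p=0.6, n=400)
print(f"capture rate: {rate:.3f}")
```

The printed rate should land near 0.95, though it will vary a little with the seed and the number of trials.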



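Correct statement:

"I am 95% confident that the interval $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$ contains the true value of the population proportion $p$."

Incorrect statement:

"I am 95% confident that the true value of the population proportion $p$ will fall in the interval $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$."

The second wording suggests that $p$ varies. In fact $p$ is fixed, and it is the interval that changes from sample to sample.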

Margin of Error and Sample Size


95% confidence intervals for large sample sizes are narrower than those for small sample sizes:

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

$n$ increases ⇒ SE decreases ⇒ ME decreases ⇒ width of the CI decreases


Example: The Effect of Sample Size on the Margin of Error


Suppose you take a random sample and get $\hat{p} = 0.7$.

(a) If n = 100, find the 95% confidence interval for $p$ and state the margin of error.

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.70 \pm 1.96\sqrt{\frac{(0.7)(0.3)}{100}} = 0.70 \pm 0.0898 = [0.6102, 0.7898] \]

The margin of error is 0.0898.


(b) With the same sample proportion $\hat{p} = 0.7$, what happens to the confidence interval and margin of error if you quadruple the sample size, to n = 400?

\[ \hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.70 \pm 1.96\sqrt{\frac{(0.7)(0.3)}{400}} = 0.70 \pm 0.0449 = [0.6551, 0.7449] \]

The margin of error is 0.0449. Quadrupling the sample size cuts the margin of error in half.
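The effect of sample size is easy to see by computing the margin of error for several values of n. In the sketch below, `margin_of_error` is a made-up helper name, and n = 1600 is added only to show the pattern continuing.

```python
from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    """Margin of error z * sqrt(p_hat(1 - p_hat)/n) for a sample proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

for n in (100, 400, 1600):
    print(f"n = {n:4d}: ME = {margin_of_error(0.7, n):.4f}")
```

Because n sits under a square root, each fourfold increase in the sample size halves the margin of error.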

What Sample Size Should You Use?


To find a formula for the sample size, take the formula for the margin of error and solve for the sample size $n$:

\[ ME = z^*\sqrt{\frac{p(1-p)}{n}} \]

\[ ME^2 = (z^*)^2\,\frac{p(1-p)}{n} \]

\[ n = (z^*)^2\,\frac{p(1-p)}{ME^2} \]



To use the formula for the sample size, you need to know

(1) what margin of error is acceptable

(2) the confidence level (use 95% unless otherwise specified)

(3) the value of $p$ (use 0.5 if no other information is available, since $p(1-p)$ is largest at $p = 0.5$ and so gives the most cautious sample size)

\[ n = (z^*)^2\,\frac{p(1-p)}{ME^2} \]


Example: What sample size should you use for a survey if you want the margin of error to be at most 3% with 95% confidence but you have no estimate of p?


\[ n = (z^*)^2\,\frac{p(1-p)}{ME^2} = 1.96^2 \cdot \frac{0.5 \cdot 0.5}{0.03^2} = 1067.111 \]

Round up to the next whole number: $n = 1068$.
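The sample-size formula, including the round-up step, can be sketched as a small function. The name `required_sample_size` and its defaults (p = 0.5, z* = 1.96) are choices made for this illustration.

```python
from math import ceil

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest whole n with z * sqrt(p(1 - p)/n) <= margin.

    p = 0.5 is the cautious default when no estimate of p is available.
    """
    n_exact = z ** 2 * p * (1 - p) / margin ** 2
    return ceil(n_exact)   # always round up, never down

print(required_sample_size(0.03))
```

Rounding up matters here: a sample of 1067 would leave the margin of error slightly above 3%.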
