Chapter 9 Confidence Intervals. Suppose we wanted to estimate the proportion of blue candies in a VERY large bowl. How might we go about estimating this

Chapter 9

Confidence Intervals

Suppose we wanted to estimate the proportion of blue candies in a VERY large bowl. How might we go about estimating this proportion?

We could take a sample of candies and compute the proportion of blue candies in our sample.

We would have a sample proportion, or a statistic: a single value for the estimate.

http://www.candywarehouse.com/mms.html

Point Estimate

• A single number (a statistic) based on sample data that is used to estimate a population characteristic

• But not always equal to the population characteristic, due to sampling variation

“Point” refers to the single value on a number line.

Different samples may produce different statistics.

Population characteristic

The paper “U.S. College Students’ Internet Use: Race, Gender and Digital Divides” (Journal of Computer-Mediated Communication, 2009) reports the results of 7421 students at 40 colleges and universities. (The sample was selected in such a way that it is representative of the population of college students.) The authors want to estimate the proportion (p) of college students who spend more than 3 hours a day on the Internet.

2998 out of 7421 students reported using the Internet more than 3 hours a day.

$\hat{p} = 2998/7421 = .404$

This is a point estimate for the population proportion of college students who spend more than 3 hours a day on the Internet.

The paper “The Impact of Internet and Television Use on the Reading Habits and Practices of College Students” (Journal of Adolescence and Adult Literacy, 2009) investigates the reading habits of college students. The following observations represent the number of hours spent on academic reading in 1 week by 20 college students.

1.7 3.8 4.7 9.6 11.7 12.3 12.3 12.4 12.6 13.4
14.1 14.2 15.8 15.9 18.7 19.4 21.2 21.9 23.3 28.2

The dotplot suggests that the data are approximately symmetric.

If a point estimate of $\mu$, the mean academic reading time per week for all college students, is desired, an obvious choice of statistic for estimating $\mu$ is the sample mean $\bar{x}$. However, there are other possibilities: a trimmed mean or the sample median.

College Reading Continued . . .

1.7 3.8 4.7 9.6 11.7 12.3 12.3 12.4 12.6 13.4
14.1 14.2 15.8 15.9 18.7 19.4 21.2 21.9 23.3 28.2

sample mean: $\bar{x} = \frac{287.2}{20} = 14.36$

sample median: $\frac{13.4 + 14.1}{2} = 13.75$

10% trimmed mean (the mean of the middle 16 observations): $\frac{230.2}{16} = 14.39$

So which of these point estimates should we use?
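The three point estimates above can be checked with a short Python sketch. The helper name `trimmed_mean` is made up for this illustration; it assumes that a 10% trimmed mean drops 10% of the observations from each end, which matches the slide's "middle 16 observations".

```python
from statistics import mean, median

# Hours of academic reading in one week for the 20 students (from the slide).
hours = [1.7, 3.8, 4.7, 9.6, 11.7, 12.3, 12.3, 12.4, 12.6, 13.4,
         14.1, 14.2, 15.8, 15.9, 18.7, 19.4, 21.2, 21.9, 23.3, 28.2]

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = round(len(ordered) * proportion)      # observations dropped per end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

sample_mean = mean(hours)
sample_median = median(hours)
trimmed = trimmed_mean(hours, 0.10)

print(f"mean={sample_mean:.4f} median={sample_median:.4f} trimmed={trimmed:.4f}")
```

Up to rounding, the three results agree with the slide values 14.36, 13.75, and 14.39.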

Choosing a Statistic for Computing an Estimate

• Choose a statistic that is unbiased (accurate)

A statistic whose mean value is equal to the value of the population characteristic being estimated is said to be an unbiased statistic.

Unbiased, since the distribution is centered at the true value

Unbiased, since the distribution is centered at the true value

Biased, since the distribution is NOT centered at the true value


• Choose a statistic with the smallest standard deviation

Unbiased, but has a larger standard deviation, so it is not as precise.

Unbiased, but has a smaller standard deviation, so it is more precise.

1. When the population distribution is normal, $\bar{x}$ has a smaller standard deviation than any other unbiased statistic for estimating $\mu$.

2. When the population distribution is symmetric with heavy tails compared to the normal curve, a trimmed mean is a better statistic than $\bar{x}$ for estimating $\mu$.

3. To estimate the population variance, $\sigma^2$, use the sample variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Suppose we wanted to estimate the proportion of blue candies in a VERY large bowl. We could take a sample of candies and compute the proportion of blue candies in our sample.

How much confidence do you have in the point estimate?

Would you have more confidence if your answer were an interval?

http://www.candywarehouse.com/mms.html

Confidence intervals

A confidence interval (CI) for a population characteristic is an interval of plausible values for the characteristic.

It is constructed so that, with a chosen degree of confidence, the actual value of the characteristic will be between the lower and upper endpoints of the interval.

The primary goal of a confidence interval is to estimate an unknown population characteristic.

Rate your confidence: 0 – 100%

How confident (%) are you that you can ...

Guess my age within 10 years?

. . . within 5 years?

. . . within 1 year?

What does it mean to be within 10 years?

What happened to your level of confidence as the interval became smaller?

Confidence level

The confidence level associated with a confidence interval estimate is the success rate of the method used to construct the interval.

If this method was used to generate an interval estimate over and over again from different samples, in the long run 95% of the resulting intervals would include the actual value of the characteristic being estimated.

Our confidence is in the method – NOT in any one particular interval!

The most common confidence levels are 90%, 95%, and 99% confidence.

Recall the General Properties for Sampling Distributions of $\hat{p}$

1. $\mu_{\hat{p}} = p$

2. $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$, as long as the sample size is less than 10% of the population

3. As long as n is large (np > 10 and n(1 - p) > 10), the sampling distribution of $\hat{p}$ is approximately normal.

These are the conditions that must be true in order to calculate a large-sample confidence interval for p.

Let’s develop the equation for the large-sample confidence interval.

To begin, we will use a 95% confidence level. Use the table of standard normal curve areas to determine the value of z* such that a central area of .95 falls between –z* and z*.

[Figure: standard normal curve centered at 0, with central area = .95 between -1.96 and 1.96, lower tail area = .025, and upper tail area = .025]

For the standard normal distribution, 95% of the values are within 1.96 of the mean.

We can generalize this to normal distributions other than the standard normal distribution: about 95% of the values are within 1.96 standard deviations of the mean.
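As an alternative to the table lookup, the z critical value can be computed directly. The following is a minimal sketch using Python's standard library; the function name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z* such that a central area of `confidence` lies between -z* and z*."""
    tail = (1 - confidence) / 2              # area in each tail
    return NormalDist().inv_cdf(1 - tail)    # standard normal quantile

critical_values = {level: z_critical(level) for level in (0.90, 0.95, 0.99)}
for level, z in critical_values.items():
    print(f"{level:.0%} confidence: z* = {z:.3f}")
```

For a central area of .95 each tail holds .025, so the function returns the .975 quantile, which is 1.96 to two decimal places.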

For large random samples, the sampling distribution of $\hat{p}$ is approximately normal. So about 95% of the possible $\hat{p}$ will fall within

$$1.96\sqrt{\frac{p(1-p)}{n}} \text{ of } p$$

Developing a Confidence Interval Continued . . .

If $\hat{p}$ is within $1.96\sqrt{\frac{p(1-p)}{n}}$ of p, this means the interval

$$\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}} \quad \text{to} \quad \hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}$$

will capture p.

And this will happen for 95% of all possible samples!

Developing a Confidence Interval Continued . . .

[Figure: approximate sampling distribution of $\hat{p}$. Its mean is p, and vertical lines mark 1.96 standard deviations, $1.96\sqrt{\frac{p(1-p)}{n}}$, below and above the mean.]

Suppose we get a particular $\hat{p}$ and create an interval around it. Notice that the length of each half of the interval equals $1.96\sqrt{\frac{p(1-p)}{n}}$.

This $\hat{p}$ fell within 1.96 standard deviations of the mean AND its confidence interval “captures” p.

Suppose we get another $\hat{p}$ and create an interval. This $\hat{p}$ also fell within 1.96 standard deviations of the mean AND its confidence interval “captures” p.

Suppose we get a third $\hat{p}$ and create an interval. This $\hat{p}$ doesn’t fall within 1.96 standard deviations of the mean AND its confidence interval does NOT “capture” p.

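Using this method of calculation, the confidence interval will not capture p 5% of the time.

When n is large, a 95% confidence interval for p is

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $\hat{p}$ stands in for the unknown p inside the square root.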

The diagram to the right is 100 confidence intervals for p computed from 100 different random samples.

Note that the ones with asterisks do not capture p.

If we were to compute 100 more confidence intervals for p from 100 different random samples, would we get the same results?
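The long-run behavior behind this question can be explored with a small simulation. This is only a sketch: the true proportion 0.4, the sample size 500, and the names `proportion_ci` and `coverage` are all assumptions chosen for illustration.

```python
import random
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(true_p, n, repetitions, seed=0):
    """Fraction of simulated random samples whose interval captures true_p."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(repetitions):
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = proportion_ci(successes, n)
        if low <= true_p <= high:
            captured += 1
    return captured / repetitions

rate = coverage(true_p=0.4, n=500, repetitions=2000)
print(f"Fraction of intervals that capture p: {rate:.3f}")
```

A different seed gives a different set of intervals, and a different handful of them miss p, but the capture rate should stay close to 95%. That is the sense in which the confidence is in the method rather than in any one interval.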

The Large-Sample Confidence Interval for p

Now let’s look at a more general formula.

The general formula for a confidence interval for a population proportion p when

• $\hat{p}$ is the sample proportion from a random sample,

• the sample size n is large ($n\hat{p}$ > 10 and $n(1-\hat{p})$ > 10), and

• if the sample is selected without replacement, the sample size is small relative to the population size (at most 10% of the population)

is

$$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Here $\hat{p}$ is the point estimate, and $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is the estimate of the standard deviation of $\hat{p}$, or standard error.

The standard error of a statistic is the estimated standard deviation of the statistic.


The general formula for a confidence interval for a population proportion p . . . is

$$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

The quantity added to and subtracted from $\hat{p}$, $(z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, is called the bound on error of estimation. This is also called the margin of error.

The 95% confidence interval is based on the fact that, for approximately 95% of all random samples, $\hat{p}$ is within the bound on error of estimation of p.

The article “How Well Are U.S. Colleges Run?” (USA Today, February 17, 2010) describes a survey of 1031 adult Americans. The survey was carried out by the National Center for Public Policy and the sample was selected in a way that makes it reasonable to regard the sample as representative of adult Americans. Of those surveyed, 567 indicated that they believe a college education is essential for success.

What is a 95% confidence interval for the population proportion of adult Americans who believe that a college education is essential for success?

Before computing the confidence interval, we need to verify the conditions.

The point estimate is

$$\hat{p} = \frac{567}{1031} = .55$$

College Education Continued . . .


Conditions:

1) $n\hat{p}$ = 1031(.55) = 567 and $n(1-\hat{p})$ = 1031(.45) = 464. Since both of these are greater than 10, the sample size is large enough to proceed.

2) The sample size of n = 1031 is much smaller than 10% of the population size (adult Americans).

3) The sample was selected in a way designed to produce a representative sample. So we can regard the sample as a random sample from the population.

All our conditions are verified, so it is safe to proceed with the calculation of the confidence interval.

College Education Continued . . .


Calculation:

$$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .55 \pm 1.96\sqrt{\frac{.55(.45)}{1031}} = (.521, .579)$$

What does this interval mean in the context of this problem?

Conclusion:

We are 95% confident that the population proportion of adult Americans who believe that a college education is essential for success is between 52.1% and 57.9%.

College Education Revisited . . .

A 95% confidence interval for the population proportion of adult Americans who believe that a college education is essential for success is:

$$.55 \pm 1.96\sqrt{\frac{.55(.45)}{1031}} = (.521, .579)$$

Compute a 90% confidence interval for this proportion.

$$.55 \pm 1.645\sqrt{\frac{.55(.45)}{1031}} = (.524, .575)$$

Compute a 99% confidence interval for this proportion.

$$.55 \pm 2.58\sqrt{\frac{.55(.45)}{1031}} = (.510, .590)$$

What do you notice about the relationship between the confidence level of an interval and the width of the interval?

Recall the “Rate your Confidence” Activity.
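The three intervals can be recomputed with a short sketch. The function name `large_sample_ci` is hypothetical, and the z critical values are the ones used on the slides. Because the code does not round $\hat{p}$ or the standard error along the way, its endpoints can differ from the slide values by about .001; the 95% slide endpoints appear to reflect a standard error rounded to .015 before multiplying.

```python
from math import sqrt

def large_sample_ci(successes, n, z):
    """p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), with no intermediate rounding."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# z critical values as used on the slides for 90%, 95%, and 99% confidence.
intervals = {}
for level, z in ((90, 1.645), (95, 1.96), (99, 2.58)):
    low, high = large_sample_ci(567, 1031, z)
    intervals[level] = (low, high)
    print(f"{level}%: ({low:.4f}, {high:.4f})  width = {high - low:.4f}")
```

The widths increase with the confidence level: being more confident of capturing p requires a wider interval.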

Choosing a Sample Size

Before collecting any data, an investigator may wish to determine a sample size needed to achieve a certain bound on error of estimation.

The bound on error of estimation for a 95% confidence interval is

$$B = 1.96\sqrt{\frac{p(1-p)}{n}}$$

If we solve this for n . . .

$$n = p(1-p)\left(\frac{1.96}{B}\right)^2$$

What value should be used for the unknown value p?

Sometimes, it is feasible to perform a preliminary study to estimate the value for p. In other cases, prior knowledge may suggest a reasonable estimate for p. If there is no prior knowledge and a preliminary study is not feasible, then the conservative estimate for p is 0.5.

Why is the conservative estimate for p = 0.5?

.1(.9) = .09

.2(.8) = .16

.3(.7) = .21

.4(.6) = .24

.5(.5) = .25

By using .5 for p, we are using the largest value for p(1 – p) in our calculations.

In spite of the potential safety hazards, some people would like to have an internet connection in their car. Determine the sample size required to estimate the proportion of adult Americans who would like an internet connection in their car to within 0.03 with 95% confidence.

$$n = p(1-p)\left(\frac{1.96}{B}\right)^2$$

What value should be used for p? With no prior estimate available, use the conservative value p = .5, so that p(1 - p) = .25.

The value for the bound on error of estimation is B = .03.

$$n = .25\left(\frac{1.96}{.03}\right)^2 = 1067.111$$

n = 1068 people

Always round the sample size up to the next whole number.
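The sample-size calculation can be wrapped in a small function. This is a minimal sketch: the name `sample_size_for_proportion` and its default arguments (conservative p = 0.5, z = 1.96 for 95% confidence) are assumptions made for illustration.

```python
from math import ceil

def sample_size_for_proportion(bound, p=0.5, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= bound, rounded up."""
    n = p * (1 - p) * (z / bound) ** 2
    return ceil(n)   # always round up to the next whole number

n_conservative = sample_size_for_proportion(0.03)
print(n_conservative)   # 1068, matching the slide

# With a prior estimate of p away from 0.5, a smaller sample is enough.
n_with_prior = sample_size_for_proportion(0.03, p=0.2)
print(n_with_prior)
```

The second call shows why p = 0.5 is called conservative: any other value of p makes p(1 - p) smaller and therefore requires no more than 1068 people.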

Now let’s look at confidence intervals to estimate the mean $\mu$ of a population.

Confidence intervals for $\mu$ when $\sigma$ is known

The general formula for a confidence interval for a population mean $\mu$ when . . .

1) $\bar{x}$ is the sample mean from a random sample,
2) the sample size n is large (n > 30), and
3) $\sigma$, the population standard deviation, is known

is

$$\bar{x} \pm (z \text{ critical value})\frac{\sigma}{\sqrt{n}}$$

Here $\bar{x}$ is the point estimate, $\frac{\sigma}{\sqrt{n}}$ is the standard deviation of the statistic, and $(z \text{ critical value})\frac{\sigma}{\sqrt{n}}$ is the bound on error of estimation.

These are the properties of the sampling distribution of $\bar{x}$.

Is $\sigma$ typically known?

This confidence interval is appropriate even when n is small, as long as it is reasonable to think that the population distribution is normal in shape.

Cosmic radiation levels rise with increasing altitude, prompting researchers to consider how pilots and flight crews might be affected by increased exposure to cosmic radiation. A study reported a mean annual cosmic radiation dose of 219 mrems for a sample of flight personnel of Xinjiang Airlines. Suppose this mean is based on a random sample of 100 flight crew members. Let $\sigma$ = 35 mrems.

Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members.

First, verify that the conditions are met.

1) Data is from a random sample of crew members

2) Sample size n is large (n > 30)

3) $\sigma$ is known

Cosmic Radiation Continued . . .

Let $\bar{x}$ = 219 mrems

n = 100 flight crew members

$\sigma$ = 35 mrems.


$$\bar{x} \pm (z \text{ critical value})\frac{\sigma}{\sqrt{n}} = 219 \pm 1.96\left(\frac{35}{\sqrt{100}}\right) = (212.14, 225.86)$$

What does this mean in context?

We are 95% confident that the actual mean annual cosmic radiation exposure for Xinjiang flight crew members is between 212.14 mrems and 225.86 mrems.

What would happen to the width of this interval if the confidence level was 90% instead of 95%?
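One way to explore this question is to recompute the interval at both confidence levels. The sketch below assumes a helper named `z_interval` (not part of the slides) and uses the z critical values 1.96 and 1.645.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z=1.96):
    """x_bar +/- z * sigma / sqrt(n), for a known population sigma."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

low95, high95 = z_interval(219, 35, 100)            # 95% confidence
low90, high90 = z_interval(219, 35, 100, z=1.645)   # 90% confidence

print(f"95%: ({low95:.2f}, {high95:.2f})")
print(f"90%: ({low90:.2f}, {high90:.2f})")
```

The 95% result reproduces (212.14, 225.86). The 90% interval uses a smaller critical value, so it is narrower.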

Confidence intervals for $\mu$ when $\sigma$ is unknown

When $\sigma$ is unknown, we use the sample standard deviation s to estimate $\sigma$. In place of z-scores, we must use the following to standardize the values:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

The use of the value of s introduces extra variability. Therefore the distribution of t values has more variability than a standard normal curve.

Important Properties of t Distributions

1) The t distribution corresponding to any particular number of degrees of freedom is bell shaped and centered at zero (just like the standard normal (z) distribution).

2) Each t distribution is more spread out than the standard normal distribution.

[Figure: z curve and t curve for 2 df, both centered at 0]

Why is the z curve taller than the t curve for 2 df?

t distributions are described by degrees of freedom (df).

Important Properties of t Distributions Continued . . .

3) As the number of degrees of freedom increases, the spread of the corresponding t distribution decreases.

[Figure: t curve for 8 df and t curve for 2 df, both centered at 0]

4) As the number of degrees of freedom increases, the corresponding sequence of t distributions approaches the standard normal distribution.

[Figure: z curve, t curve for 5 df, and t curve for 2 df, all centered at 0]

For what df would the t distribution be approximately the same as a standard normal distribution?
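Properties 2 through 4 can be illustrated by simulation. The sketch below draws samples from a standard normal population, computes $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ for each, and records how often |t| exceeds 1.96. The function name `fraction_beyond`, the seed, and the chosen sample sizes are assumptions for illustration; for a standard normal z, the corresponding fraction is about .05.

```python
import random
from math import sqrt

def fraction_beyond(n, cutoff=1.96, repetitions=20000, seed=1):
    """Share of simulated t statistics with |t| > cutoff for samples of size n."""
    rng = random.Random(seed)
    beyond = 0
    for _ in range(repetitions):
        sample = [rng.gauss(0, 1) for _ in range(n)]   # population mean is 0
        x_bar = sum(sample) / n
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t = (x_bar - 0) / (s / sqrt(n))
        if abs(t) > cutoff:
            beyond += 1
    return beyond / repetitions

# Sample sizes 3, 9, and 31 correspond to df = 2, 8, and 30.
results = {n - 1: fraction_beyond(n) for n in (3, 9, 31)}
for df, share in results.items():
    print(f"df = {df:2d}: share of |t| > 1.96 is {share:.3f}")
```

The share beyond 1.96 is largest for 2 df and shrinks toward the normal-curve value as df grows, which is why t critical values are larger than z critical values for small samples.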

Confidence intervals for $\mu$ when $\sigma$ is unknown

The general formula for a confidence interval for a population mean $\mu$ based on a sample of size n when . . .

1) $\bar{x}$ is the sample mean from a random sample,
2) the population distribution is normal, or the sample size n is large (n > 30), and
3) $\sigma$, the population standard deviation, is unknown

is

$$\bar{x} \pm (t \text{ critical value})\frac{s}{\sqrt{n}}$$

where the t critical value is based on df = n - 1.

This confidence interval is appropriate for small n ONLY when the population distribution is (at least approximately) normal.

t critical values are found in Table 3.

The article “Chimps Aren’t Charitable” (Newsday, November 2, 2005) summarized the results of a research study published in the journal Nature. In this study, chimpanzees learned to use an apparatus that dispensed food when either of two ropes was pulled. When one of the ropes was pulled, only the chimp controlling the apparatus received food. When the other rope was pulled, food was dispensed both to the chimp controlling the apparatus and also to a chimp in the adjoining cage. The accompanying data represent the number of times out of 36 trials that each of seven chimps chose the option that would provide food to both chimps (charitable response).

23 22 21 24 19 20 20

Compute a 99% confidence interval for the mean number of charitable responses for the population of all chimps.

First verify that the conditions for a t-interval are met.

Chimps Continued . . .

23 22 21 24 19 20 20

Let’s suppose it is reasonable to regard this sample of seven chimps as representative of the chimp population.

Since n is small, we need to verify whether it is plausible that this sample is from a population that is approximately normal. Let’s use a normal probability plot.

[Figure: normal probability plot of Normal Scores versus Number of Charitable Responses]

The plot is reasonably straight, so it seems plausible that the population distribution of number of charitable responses is approximately normal.

Chimps Continued . . .

23 22 21 24 19 20 20

$\bar{x}$ = 21.29, s = 1.80, and df = 7 - 1 = 6

$$\bar{x} \pm (t \text{ critical value})\frac{s}{\sqrt{n}} = 21.29 \pm 3.71\left(\frac{1.80}{\sqrt{7}}\right) = (18.77, 23.81)$$

We are 99% confident that the mean number of charitable responses for the population of all chimps is between 18.77 and 23.81.
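The chimp interval can be reproduced with a short sketch. The function name `t_interval` is hypothetical, and the t critical value 3.71 (99% confidence, 6 df) is taken from the slide rather than computed, since Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Number of charitable responses out of 36 trials for each of the seven chimps.
responses = [23, 22, 21, 24, 19, 20, 20]
T_CRIT_99_DF6 = 3.71   # table value used on the slide (assumed, not computed)

def t_interval(data, t_crit):
    """x_bar +/- t_crit * s / sqrt(n), where s is the sample standard deviation."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                      # divides by n - 1
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = t_interval(responses, T_CRIT_99_DF6)
print(f"99% CI: ({low:.2f}, {high:.2f})")
```

Because the code carries the sample mean and standard deviation at full precision, the endpoints may differ from (18.77, 23.81) in the second decimal place.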

Choosing a Sample Size

The bound on error of estimation associated with a 95% confidence interval is

$$B = 1.96\frac{\sigma}{\sqrt{n}}$$

Solve this for n:

$$n = \left(\frac{1.96\,\sigma}{B}\right)^2$$

We can use this to find the necessary sample size for a particular bound on error of estimation.

This requires $\sigma$ to be known, which is rarely the case!

When $\sigma$ is unknown, a preliminary study can be performed to estimate $\sigma$, OR we can make an educated guess of the value of $\sigma$. A rough estimate for $\sigma$ (used with distributions that are not too skewed) is the range divided by 4.

The financial aid office wishes to estimate the mean cost of textbooks per quarter for students at a particular university. For the estimate to be useful, it should be within $20 of the true population mean. How large a sample should be used to be 95% confident of achieving this level of accuracy?

The financial aid office believes that the amount spent on books varies, with most values between $150 and $550. To estimate $\sigma$:

$$\sigma \approx \frac{550 - 150}{4} = \$100$$


$$n = \left(\frac{1.96(100)}{20}\right)^2 = 96.04$$

n = 97

Always round the sample size up to the next whole number!
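The textbook-cost calculation can be sketched in a few lines. The function name `sample_size_for_mean` is an assumption made for illustration, and the range-divided-by-4 estimate of $\sigma$ is the rough rule described above.

```python
from math import ceil

def sample_size_for_mean(bound, sigma, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= bound, rounded up."""
    return ceil((z * sigma / bound) ** 2)

sigma_guess = (550 - 150) / 4      # rough estimate: range divided by 4
n_books = sample_size_for_mean(bound=20, sigma=sigma_guess)
print(n_books)   # 97, matching the slide
```

Halving the bound to $10 would roughly quadruple the required sample size, since n grows with the square of $\sigma / B$.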
