Sociology 5811: Lecture 8: CLT Applications: Confidence Intervals, Examples

Copyright © 2005 by Evan Schofer

Do not copy or distribute without permission

Announcements

• Problem Set 3 handed out
– On course website

Review: Sampling Distributions

• Q: What is the sampling distribution of the mean?

• Answer: Sampling Distribution: The distribution of estimates created by taking all possible unique samples (of a fixed size) from a population

• Q: What is the Standard Error?

• Answer: The standard deviation of the sampling distribution

• Q: What does the Standard Error tell you?

• Answer: How “dispersed” estimates will be around the true parameter value

Review: Central Limit Theorem

• Q: What does the CLT mean in plain language?

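1. As N grows large, the sampling distribution of the mean approaches normality

2. $\mu_{\bar{Y}} = \mu_Y$ (the sampling distribution is centered on the population mean)

3. $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$ (its standard deviation, the standard error, shrinks as N grows)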

Central Limit Theorem: Visually

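[Figure: the sampling distribution of Y-bar, centered at $\mu_{\bar{Y}}$ with standard deviation $\sigma_{\bar{Y}}$]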

Implications of the C.L.T

• Visually: Suppose we observe μ̂ = 16

[Figure: four panels, each showing the observed μ̂ = 16 inside a sampling distribution centered at a different possible value of μ]

• But μ̂ always falls within the sampling distribution

• There are many possible locations of μ

Implications of the C.L.T

• What is the relation between the Standard Error and the size of our sample (N)?

• Answer: It is an inverse relationship.
– The standard deviation of the sampling distribution shrinks as N gets larger

• Formula: $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$

• Conclusion: Estimates of the mean based on larger samples tend to cluster closer around the true population mean.

Implications of the CLT

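• The width of the sampling distribution is an inverse function of N (sample size); specifically, it is proportional to $1/\sqrt{N}$
– The distribution of mean estimates based on N = 10 will be more dispersed. Mean estimates based on N = 50 will cluster closer to μ.

[Figure: two sampling distributions of μ̂ around μ; left panel "Smaller sample size" is wide, right panel "Larger sample size" is narrow]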
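A quick simulation can make the $1/\sqrt{N}$ relationship concrete. The sketch below is a hypothetical illustration rather than part of the lecture: it assumes a normal population with σ = 9 (the value used in the later example), draws 2,000 samples at each sample size, and compares the standard deviation of the sample means with σ/√N. The helper name `simulated_standard_error` is made up for this sketch.

```python
import math
import random
import statistics

def simulated_standard_error(sigma, n, reps=2000, mu=0.0, seed=1):
    """Draw `reps` samples of size n from a normal population (an assumption
    for illustration) and return the standard deviation of their means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

for n in (10, 50):
    simulated = simulated_standard_error(sigma=9, n=n)
    predicted = 9 / math.sqrt(n)  # the C.L.T. formula sigma / sqrt(N)
    print(f"N = {n}: simulated SE = {simulated:.2f}, sigma/sqrt(N) = {predicted:.2f}")
```

The simulated values should land close to 9/√10 ≈ 2.85 and 9/√50 ≈ 1.27: the N = 50 means cluster much more tightly.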

Confidence Intervals

• Benefits of knowing the width of the sampling distribution:

• 1. You can figure out the general range of error by which a given point estimate might miss

• Based on the range around the true mean within which the estimates will fall

• 2. And, this defines the range around an estimate that is likely to hold the population mean

• A “confidence interval”

• Note: These only work if N is large!

Confidence Interval

• Confidence Interval: “A range of values around a point estimate that makes it possible to state the probability that an interval contains the population parameter between its lower and upper bounds.” (Bohrnstedt & Knoke p. 90)

• It involves a range and a probability

• Examples:
– We are 95% confident that the mean number of CDs owned by grad students is between 20 and 45
– We are 50% confident that the mean rainfall this year will be between 12 and 22 inches

Confidence Interval

• Visually: It is probable that μ falls near μ̂ (here, μ̂ = 16)

[Figure: observed μ̂ = 16 with the band of probable values of μ around it; far from μ̂ lies the range where μ is unlikely to be]

Q: Can μ be this far from μ̂?

Answer: Yes, but it is very improbable

Confidence Interval

• To figure out the range of “error” in our mean estimate, we need to know the width of the sampling distribution

• The Standard Error! (S.D. of the sampling dist of the mean)

• The Central Limit Theorem provides a formula:

$\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$

• Problem: We do not know the exact value of sigma-sub-Y, the population standard deviation!

Confidence Interval

• Question: How do we calculate the standard error if we don’t know the population S.D.?

• Answer: We estimate it using the information we have:

$\hat{\sigma}_{\bar{Y}} = s_Y / \sqrt{N}$

• Where N is the sample size and $s_Y$ (s-sub-Y) is the sample standard deviation.
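In code, the estimate needs only the sample itself. This is a minimal sketch assuming s-sub-Y is the usual sample standard deviation with an N - 1 denominator (which is what Python's `statistics.stdev` computes); the function name and the scores are hypothetical.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Estimate the standard error of the mean as s_Y / sqrt(N).

    statistics.stdev uses the N - 1 denominator (sample standard deviation)."""
    n = len(sample)
    return statistics.stdev(sample) / math.sqrt(n)

# Hypothetical scores, for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(estimated_standard_error(scores), 3))  # 0.756
```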

95% Confidence Interval Example

• Suppose we have a sample of 100 students with a mean SAT score of 1020 and a standard deviation of 200

• How do we find the 95% Confidence Interval?

• If N is large, we know that:
• 1. The sampling distribution is roughly normal

• 2. Therefore about 95% of samples will yield a mean estimate within 2 standard deviations (of the sampling distribution) of the population mean (μ); the more precise multiplier is 1.96

• Thus, 95% of the time, our estimates of μ (Y-bar) are within two “standard errors” of the actual value of μ.

95% Confidence Interval

• Formula for 95% confidence interval:

95% CI: $\bar{Y} \pm 2\,\sigma_{\bar{Y}}$

• Where Y-bar is the mean estimate and sigma-sub-Y-bar is the standard error

• Result: Two values – an upper and lower bound

• Adding our estimate of the standard error:

$\bar{Y} \pm 2\,\hat{\sigma}_{\bar{Y}} = \bar{Y} \pm 2\,\frac{s_Y}{\sqrt{N}}$

95% Confidence Interval

• Suppose we have a sample of 100 students with a mean SAT score of 1020 and a standard deviation of 200

• Calculate:

95% CI: $\bar{Y} \pm 2\,\frac{s_Y}{\sqrt{N}} = 1020 \pm 2\,\frac{200}{\sqrt{100}} = 1020 \pm 2\,\frac{200}{10} = 1020 \pm 2(20) = 1020 \pm 40$

• Thus, we are 95% confident that the population mean falls between 980 and 1060
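The same arithmetic as a small function. This is a sketch, not the lecture's own procedure: `confidence_interval` is a hypothetical helper, and its default multiplier of 2 is the lecture's rule-of-thumb value for a 95% interval.

```python
import math

def confidence_interval(mean, sd, n, multiplier=2.0):
    """Return (lower, upper) = mean -/+ multiplier * sd / sqrt(n).

    multiplier=2 is the rule-of-thumb number of standard errors for 95%."""
    se = sd / math.sqrt(n)  # estimated standard error
    return mean - multiplier * se, mean + multiplier * se

print(confidence_interval(1020, 200, 100))  # (980.0, 1060.0)
```

Passing a different `multiplier` widens or narrows the interval for other confidence levels.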


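• Question: Suppose we want to know the confidence interval for a confidence level other than 95%?

• How can we find the C.I. for any confidence level?

• Answer #1: We know that about 68% of cases fall within 1 standard deviation, 95% within 2, and roughly 99% (more precisely 99.7%) within 3

• Q: What is the 99% C.I.? (Y-bar = 1020, S.D. = 200, N = 100)

99% CI: $\bar{Y} \pm 3\,\frac{s_Y}{\sqrt{N}} = 1020 \pm 3\,\frac{200}{\sqrt{100}} = 1020 \pm 60$, i.e., 960 to 1080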


• Question: Which was a larger range: the 95% CI or the 99% CI?

• Answer: The 99% range was larger

• The larger the range, the more likely that the true mean will fall in it

• It is a safe bet if you specify a very wide range

• If you want to bet that the mean will fall in a very narrow range, you’ll lose more often.


• Question: Suppose we want to know the confidence interval for a confidence level other than 95%?

• Answer #2: Look at the “Z-table”

• Z-table = Normal curve probability distribution with mean 0, SD of 1

• Found on Knoke, p. 459

– It tells you the % of cases falling within a particular number of S.D.s of the mean

• Lists many values, not just 1, 2, and 3!
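Instead of a printed table, the central areas can also be computed directly. This sketch uses the identity P(-z < Z < z) = erf(z/√2) for a standard normal Z, via Python's `math.erf`; the helper name is hypothetical. It also shows that the rule of thumb for 3 S.D.s is really about 99.7%.

```python
import math

def central_probability(z):
    """P(-z < Z < z) for a standard normal Z, via the error function."""
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3):
    print(z, round(central_probability(z), 4))
```

This should print approximately .6827, .9545 and .9973 for 1, 2 and 3 S.D.s.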

Confidence Intervals: Z-table

Question: What Z-value should we use for a 20% confidence interval?

Answer: 10% of cases fall between 0 and Z ≈ .25 (the exact value is about .253)

So 20% of cases fall from -.25 to +.25
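The reverse lookup, from a confidence level to a Z-value, can be sketched with the standard library's `statistics.NormalDist` (Python 3.8+). The function below is a hypothetical helper; it returns the Z that leaves α/2 of the area in each tail. For a 20% interval the exact value is about .253, which table lookups round to .25 or .26.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return Z such that P(-Z < standard normal < Z) = level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point

for level in (0.20, 0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```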


• General formula for Confidence Interval:

C.I.: $\bar{Y} \pm Z_{\alpha/2}\,\sigma_{\bar{Y}}$

• Where:
– Y-bar is the sample mean
– Sigma-sub-Y-bar is the standard error of the mean
– $Z_{\alpha/2}$ is the Z-value for the level of confidence
– It can be looked up in a Z-table
– If you want 90%, look up p(0 to Z) of .45, which gives Z ≈ 1.645
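The general formula can be sketched the same way, computing $Z_{\alpha/2}$ from the standard normal instead of reading it from a table. `z_interval` is a hypothetical name, and applying a 90% level to the SAT example (mean 1020, s = 200, N = 100) is just for illustration.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, level=0.95):
    """C.I. = mean +/- Z_{alpha/2} * sd / sqrt(n), Z from the standard normal."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = z_interval(1020, 200, 100, level=0.90)
print(round(low, 1), round(high, 1))  # 987.1 1052.9
```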

Small N Confidence Intervals

• If N is large, the C.L.T. assures us that the sampling distribution is normal

• This allows us to construct confidence intervals

• Issue: What if N is not large?
– The sampling distribution may not be normal

• Z-distribution probabilities don’t apply…

• In short: If N is small our confidence interval formula based on Z-scores doesn’t work.

Small N Confidence Intervals

• Solution: Find another curve that accurately characterizes the sampling distribution for small N

• The “T-distribution”
– An alternative that accurately approximates the shape of the sampling distribution for small N

• The T-distribution is actually a set of distributions with known probabilities

• Again, we can look up values in a table to determine probabilities associated with a # of standard deviations from the mean.

Confidence Intervals for Small N

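• Small N C.I. formula: $\bar{Y} \pm t_{\alpha/2}\,\hat{\sigma}_{\bar{Y}}$
– Yields accurate results even if N is not large, provided the population itself is roughly normal

• Again, the standard error can be estimated from the sample standard deviation: $\bar{Y} \pm t_{\alpha/2}\,\frac{s_Y}{\sqrt{N}}$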

T-Distributions

• Issue: Which T-distribution do you use?

• The T-distribution is a “family” of distributions
– In a T-distribution table, you’ll find many T-distributions to choose from

• One t-distribution for each “degree of freedom”
– Also called “df” or “DofF”

• Which T-distribution should you use?

• For confidence intervals: Use T-distribution for df = N - 1

• Ex: If N = 15, then look at T-distribution for df = 14.

Looking Up T-Tables

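[Figure: excerpt of a t-table, annotated as follows]

• Choose the correct df (N - 1)

• Choose the desired probability for α/2

• Find the t-value in the correct row and column

• Interpretation is just like a Z-score: 2.145 (df = 14, α/2 = .025, i.e., a 95% C.I.) = number of standard errors for the C.I.!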
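A sketch of the small-N formula. Python's standard library has no t-distribution, so this version takes the t-value as an argument looked up from a table (2.145 for df = 14 and α/2 = .025, i.e., a 95% interval); if SciPy is available, `scipy.stats.t.ppf(0.975, 14)` should give about the same value. The function name and the sample numbers (N = 15, mean 1020, s = 200) are hypothetical.

```python
import math

def t_interval(mean, sd, n, t_value):
    """C.I. = mean +/- t_{alpha/2} * sd / sqrt(n).

    t_value must be looked up for df = n - 1 (e.g. 2.145 for df = 14 at 95%)."""
    half_width = t_value * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical sample: N = 15, mean 1020, s = 200
low, high = t_interval(1020, 200, 15, t_value=2.145)
print(round(low, 1), round(high, 1))  # 909.2 1130.8
```

With Z = 1.96 instead, the half-width would be about 101 rather than about 111: the t-interval is wider, reflecting the extra uncertainty of a small sample.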

Uses of Confidence Intervals

• What are some uses for confidence intervals?

• 1. Assessing the general quality of an estimate
– Ex: Mean level of happiness of graduate students

• Happiness scored on a measure from 1-10 (10 = most)

– Suppose the 95% CI is 6 +/- 4, i.e., range = 2 to 10

– Question: Is this a “good” estimate?
– Answer: No, it is not very useful.

• Something like 6 +/- 1 is a more useful estimate.

Uses of Confidence Intervals

• 2. Comparing a mean estimate to a specific value

• Ex: Comparing a school’s test scores to a national standard

• Suppose national standard on a math test is 47

• Suppose a sample of students has a mean score of 52. Did the school population meet the national standard?

• If the 99% CI is 50-54, then the answer is probably yes
– If the 99% CI is 42-62, it isn’t certain.

• Ex: A factory makes bolts that must hold 10 kilos
– Confidence intervals let you verify that the bolts are strong enough, without testing each one.
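The decision rule in the test-score example can be written out as a tiny function. This is a hypothetical sketch: it only reports whether a fixed standard lies below, above, or inside a given confidence interval.

```python
def compare_to_standard(ci, standard):
    """Report where a fixed standard lies relative to a C.I. (lower, upper)."""
    lower, upper = ci
    if standard < lower:
        return "above standard"  # whole interval lies above the standard
    if standard > upper:
        return "below standard"  # whole interval lies below the standard
    return "uncertain"           # standard falls inside the interval

print(compare_to_standard((50, 54), 47))  # above standard
print(compare_to_standard((42, 62), 47))  # uncertain
```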

Uses of the Sampling Distribution

• Extended example:

• Let’s figure out what the sampling distribution looks like for a specific population

• Since the sampling distribution is a probability distribution….

• We can then calculate the probability of observing any particular value of Y-bar (given a known μ)

• Note: Later we’ll use the converse logic to draw conclusions about the actual value of μ, given an observed Y-bar.

Probability of Y-bar, given μ

• Suppose we have a population with the following characteristics: μ = 23, σ = 9

• What is the probability of picking a sample (N=35) that has a mean of 27 or more?

• To determine this, we must first determine the shape of the sampling distribution

• Then we can determine the probability of falling a given distance from it…

Probability of Y-bar, given μ

• Q: According to the Central Limit Theorem, what is the mean of the sampling distribution?

• A: Same as the population: $\mu_{\bar{Y}} = \mu_Y = 23$

• Second, we must determine the “width” of the sampling distribution: its standard deviation (referred to as the Standard Error)

• The C.L.T. says we can calculate it as:

$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{9}{\sqrt{35}} = \frac{9}{5.92} \approx 1.52$
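Checking the arithmetic (a minimal sketch using the population values given above):

```python
import math

sigma, n = 9, 35
se = sigma / math.sqrt(n)  # standard error of the mean, sigma / sqrt(N)
print(round(se, 2))  # 1.52
```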

Probability of Y-bar, given μ

• If we know μ and the Standard Error, we can draw the sampling distribution of the mean for this population:

[Figure: normal curve with $\mu_{\bar{Y}} = 23$ and $\sigma_{\bar{Y}} = 1.5$; horizontal axis from 19 to 27]

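Probability of Y-bar, given μ

• We know that about 95% of possible Y-bars fall within two Standard Errors (i.e., +/- 3)
– between 20 and 26

[Figure: the same curve ($\mu_{\bar{Y}} = 23$, $\sigma_{\bar{Y}} = 1.5$) with the region from 20 to 26 marked; horizontal axis from 19 to 27]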

Probability of Y-bar, given μ

• To determine the probability associated with a particular value, convert to Z-scores
– p(-1<Z<1) is .68, p(-2<Z<2) is .95, etc.

• We use a slightly different Z-score formula than we learned before

• But it is analogous

Old: $Z_i = \frac{Y_i - \bar{Y}}{s_Y}$    New: $Z = \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$

Probability of Y-bar, given μ

• Why use a different formula for Z-scores?

• Old formula calculates # standard deviations a case falls from the sample mean

• From Y-sub-i to Y-bar

• New formula tells the number of standard errors a mean estimate falls from the population mean

• From Y-bar to mu


Probability of Y-bar, given μ

• Back to the problem: What is the Z-score associated with getting a sample mean of 27 or greater from this population?

• Sampling distribution mean = 23

• Standard error = 1.5

$Z = \frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} = \frac{27 - 23}{1.5} \approx 2.66$
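The Z computation, as a minimal sketch. Note that 4/1.5 is 2.667, which the slide truncates to 2.66; using the unrounded standard error of about 1.52 would give roughly 2.63 instead.

```python
import math

mu, sigma, n, y_bar = 23, 9, 35, 27

se_rounded = 1.5                 # standard error as rounded on the slide
z_rounded = (y_bar - mu) / se_rounded
se_exact = sigma / math.sqrt(n)  # about 1.52
z_exact = (y_bar - mu) / se_exact

print(round(z_rounded, 3), round(z_exact, 3))  # 2.667 2.629
```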

Probability of Y-bar, given μ

• Finally, what is the probability of observing a Z-score of 2.66 (or greater) in a standard normal distribution?

• To convert Z-scores to probabilities, look it up in a table, such as Knoke p. 463

• Area beyond Z = 2.66 is .0039
– How do we interpret that?

• Let’s look at it visually:

Probability of Y-bar, given μ

• The Z-distribution is a probability distribution
– Total area under curve = 1.0
– Area under half curve = .5
– Red area (“Area beyond Z”) = .0039

Probability of Y-bar, given μ

• Is the probability of Z > 2.66 very large?

[Figure: standard normal curve, horizontal axis from -3 to 3, with the area beyond Z = 2.66 shaded red]

• No! Red area = probability of Z > 2.66 = .004, which is .4%
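Rather than a printed table, the area beyond Z can be computed with `statistics.NormalDist` (Python 3.8+). A minimal sketch; `upper_tail` is a hypothetical helper name.

```python
from statistics import NormalDist

def upper_tail(z):
    """Area beyond z in a standard normal distribution: P(Z > z)."""
    return 1 - NormalDist().cdf(z)

print(round(upper_tail(2.66), 4))  # 0.0039
```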

Probability of Y-bar, given μ

• Conclusion: A Y-bar of 27 (or larger) should occur only about 4 out of 1000 times we sample from this population

• Possible interpretations:
• 1. We just experienced an improbable sample

• 2. Our sample was biased, not representative

• 3. Maybe we begin to suspect that the population mean (μ) isn’t really 23 after all…

• Idea: We could “cast doubt on” someone’s claim that μ = 23, given this observed Y-bar and S.D.

• Hypothesis testing is based on this!

Conclusions About Means

• The previous example started out with the assumption that μ = 23
– Typically, μ will be unknown; only Y-bar is known
– But the same logic can be applied to “test” whether μ is likely to equal 23

• If the observed Y-bar is highly unlikely, we cast doubt on the idea that μ is really 23

– Example: We can “test” whether a school’s math scores are above national standard of 47

• If school sample is far above national average, it is improbable that the school population is at or below 47

• Next Class: Hypothesis testing!