Sociology 5811: Lecture 8: CLT Applications: Confidence Intervals, Examples
Copyright © 2005 by Evan Schofer
Do not copy or distribute without permission
Announcements
• Problem Set 3 handed out
• On course website
Review: Sampling Distributions
• Q: What is the sampling distribution of the mean?
• Answer: Sampling Distribution: The distribution of estimates created by taking all possible unique samples (of a fixed size) from a population
• Q: What is the Standard Error?
• Answer: The standard deviation of the sampling distribution
• Q: What does the Standard Error tell you?
• Answer: How “dispersed” estimates will be around the true parameter value
Review: Central Limit Theorem
• Q: What does the CLT mean in plain language?
1. As N grows large, the sampling distribution of the mean approaches normality
2. $\mu_{\bar{Y}} = \mu_Y$
3. $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$
Central Limit Theorem: Visually
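[Figure: sampling distribution of Y-bar, centered at $\mu_{\bar{Y}}$ with standard deviation $\sigma_{\bar{Y}}$]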
Implications of the C.L.T.
• Visually: Suppose we observe mu-hat = 16
• There are many possible locations of μ
• But, mu-hat always falls within the sampling distribution
[Figure: the sampling distribution drawn at several possible locations of μ, each containing the observed mu-hat = 16]
Implications of the C.L.T
• What is the relation between the Standard Error and the size of our sample (N)?
• Answer: It is an inverse relationship.
• The standard deviation of the sampling distribution shrinks as N gets larger
• Formula: $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$
• Conclusion: Estimates of the mean based on larger samples tend to cluster closer around the true population mean.
Implications of the CLT
• The width of the sampling distribution is an inverse function of N (sample size)
– The distribution of mean estimates based on N = 10 will be more dispersed. Mean estimates based on N = 50 will cluster closer to μ.
[Figure: two sampling distributions of mu-hat centered on μ; wider for the smaller sample size, narrower for the larger sample size]
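The N = 10 versus N = 50 comparison can be checked with a small simulation. This is only a sketch: the population values (mean 100, S.D. 15), the number of repeated samples, and the function name are illustrative assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(population_mean, population_sd, n, replications, rng):
    """Draw `replications` samples of size n and return the mean of each."""
    means = []
    for _ in range(replications):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

rng = random.Random(5811)  # fixed seed so the run is repeatable

# Hypothetical normal population: mu = 100, sigma = 15 (assumed for illustration)
means_n10 = simulate_sample_means(100, 15, n=10, replications=2000, rng=rng)
means_n50 = simulate_sample_means(100, 15, n=50, replications=2000, rng=rng)

# The S.D. of the simulated means approximates the standard error
spread_n10 = statistics.stdev(means_n10)  # should be near 15 / sqrt(10)
spread_n50 = statistics.stdev(means_n50)  # should be near 15 / sqrt(50)
print(f"N = 10: spread of mean estimates = {spread_n10:.2f}")
print(f"N = 50: spread of mean estimates = {spread_n50:.2f}")
```

The spread for N = 50 should come out noticeably smaller than for N = 10, which is the narrowing of the sampling distribution described above.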
Confidence Intervals
• Benefits of knowing the width of the sampling distribution:
• 1. You can figure out the general range of error by which a given point estimate might miss
• Based on the range around the true mean within which the estimates will fall
• 2. And, this defines the range around an estimate that is likely to hold the population mean
• A “confidence interval”
• Note: These only work if N is large!
Confidence Interval
• Confidence Interval: “A range of values around a point estimate that makes it possible to state the probability that an interval contains the population parameter between its lower and upper bounds.” (Bohrnstedt & Knoke p. 90)
• It involves a range and a probability
• Examples:
• We are 95% confident that the mean number of CDs owned by grad students is between 20 and 45
• We are 50% confident the mean rainfall this year will be between 12 and 22 inches.
Confidence Interval
• Visually: It is probable that μ falls near mu-hat (here, mu-hat = 16)
[Figure: curve centered on mu-hat = 16, marking the probable values of μ and the range where μ is unlikely to be]
• Q: Can μ be this far from mu-hat?
• Answer: Yes, but it is very improbable
Confidence Interval
• To figure out the range of “error” in our mean estimate, we need to know the width of the sampling distribution
• The Standard Error! (S.D. of the sampling dist of the mean)
• The Central Limit Theorem provides a formula:
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}}$
• Problem: We do not know the exact value of sigma-sub-Y, the population standard deviation!
Confidence Interval
• Question: How do we calculate the standard error if we don’t know the population S.D.?
• Answer: We estimate it using the information we have:
$\hat{\sigma}_{\bar{Y}} = \frac{s_Y}{\sqrt{N}}$
• Where N is the sample size and s-sub-Y is the sample standard deviation.
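As a minimal sketch of the estimated standard error, the function below divides the sample standard deviation by the square root of N. The function name and the eight scores are hypothetical; the sample is kept tiny only so the arithmetic is easy to check by hand.

```python
import math
import statistics

def estimated_standard_error(values):
    """Estimate the standard error of the mean as s_Y / sqrt(N)."""
    n = len(values)
    s_y = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    return s_y / math.sqrt(n)

# Hypothetical sample of eight scores (illustrative data, not from the lecture)
scores = [2, 4, 4, 4, 5, 5, 7, 9]
se_hat = estimated_standard_error(scores)
print(f"Estimated standard error: {se_hat:.3f}")
```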
95% Confidence Interval Example
• Suppose a sample of 100 students with mean SAT score of 1020, standard deviation of 200
• How do we find the 95% Confidence Interval?
• If N is large, we know that:
• 1. The sampling distribution is roughly normal
• 2. Therefore 95% of samples will yield a mean estimate within 2 standard deviations (of the sampling distribution) of the population mean (μ)
• Thus, 95% of the time, our estimates of μ (Y-bar) are within two “standard errors” of the actual value of μ.
95% Confidence Interval
• Formula for 95% confidence interval:
$95\%\ \text{CI}: \bar{Y} \pm 2(\sigma_{\bar{Y}})$
• Where Y-bar is the mean estimate and sigma (Y-bar) is the standard error
• Result: Two values – an upper and lower bound
• Adding our estimate of the standard error:
$\bar{Y} \pm 2(\hat{\sigma}_{\bar{Y}}) = \bar{Y} \pm 2\left(\frac{s_Y}{\sqrt{N}}\right)$
95% Confidence Interval
• Suppose a sample of 100 students with mean SAT score of 1020, standard deviation of 200
• Calculate:
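$95\%\ \text{CI}: \bar{Y} \pm 2\left(\frac{s}{\sqrt{N}}\right) = 1020 \pm 2\left(\frac{200}{\sqrt{100}}\right) = 1020 \pm 2\left(\frac{200}{10}\right) = 1020 \pm 2(20) = 1020 \pm 40$
• Thus, we are 95% confident that the population mean falls between 980 and 1060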
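The SAT calculation can be reproduced in a few lines. This is a sketch of the "two standard errors" rule used on this slide; the function name and its `multiplier` parameter are assumptions introduced for illustration.

```python
import math

def confidence_interval(mean, sd, n, multiplier=2):
    """Return (lower, upper) bounds: mean +/- multiplier * (sd / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    margin = multiplier * standard_error
    return mean - margin, mean + margin

# SAT example: N = 100, Y-bar = 1020, S.D. = 200
lower, upper = confidence_interval(mean=1020, sd=200, n=100)
print(lower, upper)  # 980.0 1060.0
```

Passing a different `multiplier` changes the width of the interval without changing the rest of the calculation.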
Confidence Intervals
• Question: Suppose we want to know the confidence interval for a value other than 95%?
• How can we find the C.I. for any number?
• Answer #1: We know that 68% of cases fall within 1 standard deviation, and roughly 99% within 3 (more precisely, about 99.7%)
• Q: What is 99% C.I.? (Y-bar = 1020, S.D. = 200)
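$99\%\ \text{CI}: \bar{Y} \pm 3\left(\frac{s}{\sqrt{N}}\right) = 1020 \pm 3\left(\frac{200}{\sqrt{100}}\right) = 1020 \pm 60$, i.e., 960 to 1080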
Confidence Intervals
• Question: Which was a larger range: the 95% CI or the 99% CI?
• Answer: The 99% range was larger
• The larger the range, the more likely that the true mean will fall in it
• It is a safe bet if you specify a very wide range
• If you want to bet that the mean will fall in a very narrow range, you’ll lose more often.
Confidence Intervals
• Question: Suppose we want to know the confidence interval for a value other than 95%?
• Answer #2: Look at the “Z-table”
• Z-table = Normal curve probability distribution with mean 0, SD of 1
• Found on Knoke, p. 459
– It tells you the % of cases falling within a particular number of S.D.’s of the mean
• Lists all values, not just 1, 2, and 3!
Confidence Intervals: Z-table
• Question: What Z-value should we use for a 20% confidence interval?
• Answer: About 10% of cases fall from 0 to Z = .26, so about 20% of cases fall from -.26 to +.26
Confidence Intervals
• General formula for Confidence Interval:
$\text{C.I.}: \bar{Y} \pm Z_{\alpha/2}(\sigma_{\bar{Y}})$
• Where:
– Y-bar is the sample mean
– Sigma sub-Y-bar is the standard error of mean
– Z sub α/2 is the Z-value for level of confidence
– It can be looked up in a Z-table
– If you want 90%, look up p(0 to Z) of .45
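Instead of a printed Z-table, the Z-value for any confidence level can be computed. The sketch below uses `NormalDist` from Python's standard `statistics` module; the function name `z_for_confidence` is an assumption for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return Z_(alpha/2) for a two-sided interval, e.g. level=0.95."""
    alpha = 1 - level
    # inv_cdf gives the Z-value with the stated area to its left
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.20, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.3f}")
```

The computed values for 95% and 99% (about 1.96 and 2.58) are slightly smaller than the rounded multipliers of 2 and 3 used in the SAT examples, so those intervals are a little wider than strictly necessary.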
Small N Confidence Intervals
• If N is large, the C.L.T. assures us that the sampling distribution is approximately normal
• This allows us to construct confidence intervals
• Issue: What if N is not large?
• The sampling distribution may not be normal
• Z-distribution probabilities don’t apply…
• In short: If N is small our confidence interval formula based on Z-scores doesn’t work.
Small N Confidence Intervals
• Solution: Find another curve that accurately characterizes the sampling distribution for small N
• The “T-distribution”
• An alternative that accurately approximates the shape of the sampling distribution for small N
• The T-distribution is actually a set of distributions with known probabilities
• Again, we can look up values in a table to determine probabilities associated with a # of standard deviations from the mean.
Confidence Intervals for Small N
• Small N C.I. Formula:
$\text{C.I.}: \bar{Y} \pm t_{\alpha/2}(\hat{\sigma}_{\bar{Y}})$
• Yields accurate results, even if N is not large
• Again, the standard error can be estimated by the sample standard deviation:
$\text{C.I.}: \bar{Y} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{N}}\right)$
T-Distributions
• Issue: Which T-distribution do you use?
• The T-distribution is a “family” of distributions
• In a T-Distribution table, you’ll find many T-distributions to choose from
• One t-distribution for each “degree of freedom”
– Also called “df” or “DofF”
• Which T-distribution should you use?
• For confidence intervals: Use T-distribution for df = N - 1
• Ex: If N = 15, then look at T-distribution for df = 14.
Looking Up T-Tables
[Figure: excerpt of a T-distribution table]
• Choose the correct df (N-1)
• Choose the desired probability for α/2
• Find the t-value in the correct row and column
• Interpretation is just like a Z-score: 2.145 = number of standard errors for the C.I.!
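A small-N interval can be sketched by plugging the table value into the formula. The code assumes the 2.145 above is the df = 14 entry for α/2 = .025 (a 95% interval); the 15 data values and the function name are invented for illustration.

```python
import math
import statistics

T_CRIT_95_DF14 = 2.145  # t-table value assumed for df = 14, alpha/2 = .025

def small_n_interval(values, t_critical):
    """Return (lower, upper): Y-bar +/- t * (s / sqrt(N))."""
    n = len(values)
    y_bar = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = t_critical * standard_error
    return y_bar - margin, y_bar + margin

# Hypothetical sample with N = 15, so df = 14 (data invented for illustration)
sample = [12, 15, 9, 14, 11, 13, 16, 10, 12, 14, 13, 11, 15, 12, 13]
lower, upper = small_n_interval(sample, T_CRIT_95_DF14)
print(f"95% C.I.: {lower:.2f} to {upper:.2f}")
```

The t-value is passed in rather than computed because the standard library has no t-distribution; for a different N, look up the row for df = N - 1.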
Uses of Confidence Intervals
• What are some uses for confidence intervals?
• 1. Assessing the general quality of an estimate
– Ex: Mean level of happiness of graduate students
• Happiness scored on a measure from 1-10 (10=most)
– Suppose the 95% CI is: 6 +/- 4
• i.e., range = 2 to 10
– Question: Is this a “good” estimate?
– Answer: No, it is not very useful.
• Something like 6 +/- 1 is a more useful estimate.
Uses of Confidence Intervals
• 2. Comparing a mean estimate to a specific value
• Ex: Comparing a school’s test scores to a national standard
• Suppose national standard on a math test is 47
• Suppose a sample of students scores 52. Did the school population meet the national standard?
• If 99% CI is 50-54, then the answer is probably yes
– If 99% CI is 42-62, it isn’t certain.
• Ex: A factory makes bolts that must hold 10 kilos
• Confidence intervals let you verify that the bolts are strong enough, without testing each one.
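The comparison in the test-score example amounts to checking where the standard sits relative to the interval. This is a minimal sketch; the function name and the wording of its return values are assumptions.

```python
def compare_to_standard(ci_lower, ci_upper, standard):
    """Describe where a confidence interval sits relative to a standard."""
    if ci_lower > standard:
        return "entire interval above the standard"
    if ci_upper < standard:
        return "entire interval below the standard"
    return "interval contains the standard: not certain"

# National standard of 47, with the two 99% CIs from the example
print(compare_to_standard(50, 54, standard=47))
print(compare_to_standard(42, 62, standard=47))
```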
Uses of the Sampling Distribution
• Extended example:
• Let’s figure out what the sampling distribution looks like for a specific population
• Since the sampling distribution is a probability distribution….
• We can then calculate the probability of observing any particular value of Y-bar (given a known μ)
• Note: Later we’ll use the converse logic to draw conclusions about the actual value of μ, given an observed Y-bar.
Probability of Y-bar, given μ
• Suppose we have a population with the following characteristics: μ = 23, σ = 9
• What is the probability of picking a sample (N=35) that has a mean of 27 or more?
• To determine this, we must first determine the shape of the sampling distribution
• Then we can determine the probability of falling a given distance from it…
Probability of Y-bar, given μ
• Q: According to the Central Limit Theorem, what is the mean of the sampling distribution?
• A: Same as the population: $\mu_{\bar{Y}} = \mu = 23$
• Second, we must determine the “width” of the sampling distribution: the standard deviation (referred to as Standard Error)
• The C.L.T says we can calculate it as:
$\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{N}} = \frac{9}{\sqrt{35}} = \frac{9}{5.9} = 1.52$
Probability of Y-bar, given μ
• If we know μ and the Standard Error, we can draw the sampling distribution of the mean for this population:
[Figure: normal curve with $\mu_{\bar{Y}} = 23$, $\sigma_{\bar{Y}} = 1.5$; horizontal axis from 19 to 27]
Probability of Y-bar, given μ
• We know that 95% of possible Y-bars fall within two Standard Errors (i.e., +/- 3):
– between 20 and 26
[Figure: normal curve with $\mu_{\bar{Y}} = 23$, $\sigma_{\bar{Y}} = 1.5$; horizontal axis from 19 to 27]
Probability of Y-bar, given μ
• To determine the probability associated with a particular value, convert to Z-scores
• p(-1<Z<1) is .68, p(-2<Z<2) is .95, etc
• We use a slightly different Z-score formula than we learned before
• But it is analogous
Old formula: $Z_i = \frac{(Y_i - \bar{Y})}{s_Y}$
New formula: $Z = \frac{(\bar{Y} - \mu)}{\sigma_{\bar{Y}}}$
Probability of Y-bar, given μ
• Why use a different formula for Z-scores?
• Old formula calculates # standard deviations a case falls from the sample mean
• From Y-sub-i to Y-bar
• New formula tells the number of standard errors a mean estimate falls from the population mean
• From Y-bar to mu
Old formula: $Z_i = \frac{(Y_i - \bar{Y})}{s_Y}$
New formula: $Z = \frac{(\bar{Y} - \mu)}{\sigma_{\bar{Y}}}$
Probability of Y-bar, given μ
• Back to the problem: What is the Z-score associated with getting a sample mean of 27 or greater from this population?
• Sampling distribution mean = 23
• Standard error = 1.5
$Z = \frac{(\bar{Y} - \mu)}{\sigma_{\bar{Y}}} = \frac{27 - 23}{1.5} = 2.66$
Probability of Y-bar, given μ
• Finally, what is the probability of observing a Z-score of 2.66 (or greater) in a standard normal distribution?
• To convert Z-scores to probabilities, look it up in a table, such as Knoke p. 463
• Area beyond Z=2.66 is .0039
• How do we interpret that?
• Let’s look at it visually:
Probability of Y-bar, given μ
• The Z-distribution is a probability distribution
– Total area under curve = 1.0
– Area under half curve is .5
– Red area (“Area beyond Z”) = .0039
Probability of Y-bar, given μ
• Is the probability of Z > 2.66 very large?
• No! Red area = probability of Z > 2.66 = .004, which is .4%
[Figure: standard normal curve, horizontal axis from -3 to 3, with the area beyond Z = 2.66 shaded red]
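The whole calculation (standard error, Z-score, area beyond Z) can be sketched with Python's standard library in place of the printed table. Because the code does not round the standard error to 1.5, its Z-score comes out slightly below the 2.66 used on the slides, but the tail probability is still about .004.

```python
import math
from statistics import NormalDist

mu, sigma, n = 23, 9, 35   # population mean, population S.D., sample size
y_bar = 27                 # sample mean of interest

standard_error = sigma / math.sqrt(n)   # about 1.52
z = (y_bar - mu) / standard_error       # standard errors between Y-bar and mu
p_beyond = 1 - NormalDist().cdf(z)      # area beyond Z in the upper tail

print(f"SE = {standard_error:.2f}, Z = {z:.2f}, p = {p_beyond:.4f}")
```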
Probability of Y-bar, given μ
• Conclusion: Y-bar of 27 (or larger) should occur only about 4 out of 1000 times we sample from this population
• Possible interpretations:
• 1. We just experienced an improbable sample
• 2. Our sample was biased, not representative
• 3. Maybe we begin to suspect that the population mean (μ) isn’t really 23 after all…
• Idea: We could “cast doubt on” someone’s claim that μ = 23, given this observed Y-bar and S.D.
• Hypothesis testing is based on this!
Conclusions About Means
• The previous example started out with the assumption that μ = 23
– Typically, μ will be unknown; only Y-bar is known
– But, the same logic can be applied to “test” whether μ is likely to equal 23
• If observed Y-bar is highly unlikely, we cast doubt on the idea that μ is really 23
– Example: We can “test” whether a school’s math scores are above the national standard of 47
• If school sample is far above national average, it is improbable that the school population is at or below 47
• Next Class: Hypothesis testing!