8/7/2019 10.1 Power Point
Confidence Intervals: The Basics
Section 10.1
How long can you expect a AAA battery to last? What proportion of college
undergraduates have engaged in binge drinking? Is caffeine dependence real?
These are the types of questions that we would
like to be able to answer, but it just isn't practical to ask or experiment on every battery, college undergrad, or caffeine-addicted person.
Instead we select a sample and collect data from those individuals only. The goal is to infer from the sample data some conclusion
about the population.
We cannot ever be certain that our
conclusions are correct since a different
sample would generally lead to a different conclusion.
Statistical inference uses the language of
probability to express the strength of our conclusions. Probability allows us to take
chance variation into account and to correct
our judgment using calculations.
In the Next Two Chapters
We will study the two most common methods
of statistical inference, confidence intervals
and significance tests.
Both are based on the sampling distributions
of statistics.
That is, both report probabilities that state
what would happen if we used the inference
method many, many times.
Your Data is Only as Good as Your
Collection Methods
The methods of formal inference require the long-run, regular behavior that probability describes.
Inference is most reliable when the data are
produced by a properly randomized design. If this isn't done, your conclusions will be open to challenge.
Formal inference cannot remedy basic flaws in
producing data, such as voluntary response samples and uncontrolled experiments. Use the common sense you acquired over the first nine chapters and proceed to formal inference only when you are satisfied that the data deserve such
analysis.
Let's Pretend
To begin learning about the methods of inference, we will pretend we know the true population standard deviation $\sigma$, although we would never actually know that value without knowing $\mu$. Once we know the basic ideas, we will get rid of this unrealistic requirement.
There are libraries full of more elaborate statistical techniques than we will use, but informed use of any of these methods requires an understanding of the underlying reasoning.
A computer or calculator can do the arithmetic, but we must still exercise judgment based on
understanding.
Hey Baby, What's Your IQ?
Big City University would like to find the
average IQ of its 5000 freshmen. Having
each one take an IQ test would be difficult
and expensive, though, so the university
decides to administer the test to an SRS of
50 freshmen. The university finds that the mean IQ score for this sample is $\bar{x} = 112$.
Let's ponder a few questions.
Find Your Exact Match! Really?
Is the mean IQ score of all Big City University freshmen exactly 112?
Probably not. But the law of large numbers tells us that the sample mean from a large SRS will be close to the unknown population parameter $\mu$. Because $\bar{x} = 112$, we guess that $\mu$ is somewhere around 112.
How close to 112 is $\mu$ likely to be?
Well, to answer this we must ask another question.
We ask, "How would the sample mean $\bar{x}$ vary if we
took many, many samples of 50 freshmen from this
same population?"
The sampling distribution of $\bar{x}$ describes how the values
of $\bar{x}$ vary in repeated samples. Remember from last
chapter:
The mean of the sampling distribution of $\bar{x}$ is the same as the unknown mean $\mu$ of the entire population.
The standard deviation of $\bar{x}$ for an SRS of 50 freshmen is $\sigma/\sqrt{50}$,
where $\sigma$ is the standard deviation of the IQ scores from all Big
City University freshmen. Suppose we know that the IQ scores
have standard deviation $\sigma = 15$. Then the standard deviation of $\bar{x}$ is
$\sigma/\sqrt{50} = 15/\sqrt{50} \approx 2.1$.
The central limit theorem tells us that the mean of 50 scores has
a distribution that is close to Normal.
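These three facts can be checked with a small simulation. The sketch below is an illustration only, not part of the original slides. It assumes the IQ scores are Normally distributed and uses a hypothetical true mean of 110 (the slides never state $\mu$); the names MU, SIGMA, N, and sample_mean are made up for this example.

```python
import math
import random
import statistics

SIGMA = 15   # population standard deviation, given in the example
N = 50       # sample size, given in the example
MU = 110     # ASSUMPTION: hypothetical true mean; the example never reveals it


def sample_mean(rng, mu=MU, sigma=SIGMA, n=N):
    """Mean of one simulated SRS of n IQ scores (Normal population assumed)."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))


rng = random.Random(1)  # fixed seed so repeated runs give the same numbers
means = [sample_mean(rng) for _ in range(10_000)]

theoretical_sd = SIGMA / math.sqrt(N)  # sigma / sqrt(n), about 2.1
print(f"theoretical sd of x-bar: {theoretical_sd:.2f}")
print(f"simulated mean of x-bar: {statistics.fmean(means):.2f}")
print(f"simulated sd of x-bar:   {statistics.stdev(means):.2f}")
```

If the reasoning above holds, the simulated mean should land near the assumed MU and the simulated standard deviation near 2.1.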
Put It All Together and What Do
You Have?
Putting these facts together gives us the
reasoning of statistical estimation in a nutshell for
this example:
1. To estimate the unknown population mean $\mu$,
use the mean $\bar{x}$ of the SRS.
2. Although $\bar{x}$ is an unbiased estimator of $\mu$, it will
rarely be exactly equal to $\mu$, so our estimate has some error.
3. In repeated samples, the values of $\bar{x}$ follow an
approximately Normal distribution with mean $\mu$
and standard deviation 2.1.
4. The 68-95-99.7 rule says that in about 95% of all
samples, the mean IQ score $\bar{x}$ for the sample will be
within 4.2 (two standard deviations) of the population
mean $\mu$.
5. Whenever $\bar{x}$ is within 4.2 points of $\mu$, $\mu$ is within 4.2
points of $\bar{x}$. This happens in about 95% of all
samples. So the unknown population parameter $\mu$ lies between $\bar{x} - 4.2$ and $\bar{x} + 4.2$ in about 95% of all
samples.
6. So if we estimate that $\mu$ lies somewhere in the interval from $112 - 4.2 = 107.8$ to $112 + 4.2 = 116.2$, we
would be calculating this interval using a method that
captures the true $\mu$ in about 95% of all possible
samples.
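The arithmetic in steps 4 through 6 can be reproduced in a few lines. This is a minimal sketch using only the numbers given in the example; the variable names are chosen for illustration. It also computes the exact Normal probability of falling within two standard deviations, which the 68-95-99.7 rule rounds to 95%.

```python
import math

x_bar = 112   # sample mean from the example
sigma = 15    # population standard deviation, assumed known
n = 50        # sample size

sd_x_bar = sigma / math.sqrt(n)   # standard deviation of x-bar, about 2.1
margin = 2 * sd_x_bar             # two standard deviations, about 4.2
lower, upper = x_bar - margin, x_bar + margin

# Exact probability that a Normal variable lies within 2 sd of its mean.
coverage = math.erf(2 / math.sqrt(2))

print(f"margin: {margin:.1f}")                    # margin: 4.2
print(f"interval: ({lower:.1f}, {upper:.1f})")    # interval: (107.8, 116.2)
print(f"P(within 2 sd): {coverage:.4f}")          # P(within 2 sd): 0.9545
```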
The Big Picture - The big idea is that the sampling distribution of $\bar{x}$ tells us how big the error
is likely to be when we use $\bar{x}$ as an estimate for $\mu$.
What Was That Again?
We have just learned that in 95% of all
samples of 50 Big City University freshmen,
the interval $\bar{x} \pm 4.2$ will contain the true population mean $\mu$.
The language of statistical inference uses this fact about what would happen in many
samples to express our confidence in the
results of any one sample.
So To Finally Answer the
Question
Earlier we asked, "How close to 112 is $\mu$ likely
to be?" The resulting interval is $112 \pm 4.2$,
which can be written as (107.8, 116.2). We can now say, "We are 95% confident that
the unknown mean IQ score for all Big City
University freshmen is between 107.8 and
116.2."
Remember this phrasing because we will use
it every time we create a confidence interval.
Be Careful!
Be sure that you understand the basis for our
confidence. There are only two possibilities:
1. The interval (107.8, 116.2) contains the true $\mu$.
2. Our SRS was one of the few samples for which $\bar{x}$
is not within 4.2 points of the true $\mu$. (Only 5% of all
samples give such inaccurate results.)
We cannot know whether our sample is one of the
unlucky 5%. The phrase "we are 95% confident" is
shorthand for saying, "We got these numbers by a
method that gives correct results 95% of the time."
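The claim that the method "gives correct results 95% of the time" is a statement about many repeated samples, so it can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not the slides' own material: the population is taken to be Normal, and the true mean MU = 110 is a hypothetical value that the simulated analyst never gets to see. The helper name interval_captures_mu is made up for this example.

```python
import math
import random
import statistics

MU = 110      # ASSUMPTION: hypothetical true mean, unknown in practice
SIGMA = 15    # population standard deviation from the example
N = 50        # sample size from the example
MARGIN = 2 * SIGMA / math.sqrt(N)   # about 4.2


def interval_captures_mu(rng):
    """Draw one SRS, build x-bar +/- MARGIN, and check whether it contains MU."""
    x_bar = statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    return x_bar - MARGIN <= MU <= x_bar + MARGIN


rng = random.Random(2019)   # fixed seed so the run is repeatable
trials = 10_000
hits = sum(interval_captures_mu(rng) for _ in range(trials))
print(f"{hits / trials:.1%} of intervals captured the true mean")
```

Any single interval either contains MU or it does not; the roughly 95% figure describes the method across all the trials, which is exactly the distinction the slide is drawing.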
Twenty-five samples from the same population give these 95% confidence intervals. In the long run, 95% of all samples give an interval that contains the population mean $\mu$.
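The original figure is not recoverable from this text, but a rough stand-in can be generated. The following sketch draws 25 simulated samples and prints each interval, flagging any that miss the mean. It assumes a Normal population and a hypothetical true mean of 110; draw_intervals and its seed are illustrative choices, not anything taken from the slides.

```python
import math
import random
import statistics

MU, SIGMA, N = 110, 15, 50          # MU is a hypothetical value (ASSUMPTION)
MARGIN = 2 * SIGMA / math.sqrt(N)   # about 4.2


def draw_intervals(count, seed=25):
    """Return `count` intervals x-bar +/- MARGIN from independent simulated SRSs."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(count):
        x_bar = statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
        intervals.append((x_bar - MARGIN, x_bar + MARGIN))
    return intervals


intervals = draw_intervals(25)
for low, high in intervals:
    flag = "" if low <= MU <= high else "  <-- misses mu"
    print(f"({low:6.1f}, {high:6.1f}){flag}")
```

With only 25 samples the number of misses varies from run to run (change the seed to see this); the 95% figure only emerges in the long run.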