Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Business Statistics
Lecture 5:
Confidence Intervals
2
• Confidence intervals
• The “t” distribution
Goals for this Lecture
3
Welcome to Interval Estimation!
Moments
Mean 815.0340
Std Dev 0.8923
Std Error Mean 0.0892
Upper 95% Mean 815.2111
Lower 95% Mean 814.8569
N 100.0000
Sum Weights 100.0000
Sample mean
Sample SD
Sample size
THIS CLASS
LAST CLASS
4
Hmmm...
• In the motor shaft case, we built a model using and s
• We treated them like population parameters
• What if they were way off?
• How would we know?
• E.g., What if for the 100 shafts and we decided the process was not capable
• How sure are we that would give the same result?
X
820X
5
Inference: Making Educated Guesses
• We want to use a sample to make
guesses about a larger population
• Samples are variable (we’d get different
values if we took a different sample), so
our guesses are uncertain
• We want to guess in such a way that:
• There is a chance we guessed right
• We know what that chance is
6
General Strategy for Guessing
• Pick a statistic similar to the parameter
you want to guess
• Figure out what the sampling distribution of
your statistic looks like
• Use the sampling distribution to assess the
quality of your guess
• We’ll start by guessing averages,
because they’re easy
7
Assumption
• Data are collected as a simple random
sample:
• Every unit in population equally likely to be
chosen
• Don’t just look at the 800 most highly paid
CEO’s
• Choosing one unit does not change the
relative chances for another unit to be
chosen
• Other sampling schemes require
different techniques
8
Last Class: Distribution of Averages
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
812 813 814 815 816 817 818
ShaftDiam
Individual
Histogram of all future shafts. SD=s
Mean of 5
Histogram of all possible means of five future shafts. SE= s/sqrt(5)
9
• Best guess for is
• If you always guess for , you will
never be right!
• You can guess with a confidence
interval and be right some of the time
• Narrow intervals: higher chance of being
wrong
• Wide intervals: less useful
X
X
Guessing , the Population Mean
100.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
812 813 814 815 816 817 818
ShaftDiam
Individual
(Unobserved) dist. of population
Mean of 5
(Unobserved) dist. of sample mean
Unobserved pop mean
95% confidence interval for pop mean
• Alternatively, is within 2 SE’s of
95% of the time
X
• Because of the CLT, we know that
is within 2 SE’s of 95% of the time
Main Idea
X
Sample mean
11
How to Guess
• Choose a probability of being wrong: a
• Find z on the normal table so that a/2 of
the probability is above z and a/2 is less
than -z
• Example: if a = 5% then z = 1.96
• Calculate and
• Your interval is
Xn
s
X zn
s
12
Example: Shaft Process
• Sample mean:
• Sample SD:
• Number of shafts:
• SE:
• 95% confidence interval for :
• With numbers:
815.03 1.96(0.08923) = [814.859, 815.209]
/ 0.8923/ 100 = 0.08923X
s s n
100n
0.8923s
815.03x
1.96 /x s n
13
Mean
Std Dev
Std Error Mean
Upper 95% Mean
Lower 95% Mean
N
Sum Weights
815.0340
0.89230.0892
815.2111
814.8569
100.0000
100.0000
Best Guess at
Best Guess at s
95% Confidence Interval for
• There’s a 95% chance that this interval contains
• There is a 5% chance it does not
JMP: Shaft Diameters
14
So, What is a Confidence Interval?
• It’s an interval around our sample statistic that
shows how variable the sample statistic is
• Narrow: real (population) value unlikely to be far
away
• Wide: little information about the population value
• Two CIs for our example:
Confidence interval #1
Confidence interval #2
814 816815
Observed
Sample Mean
15
Again, What is a Confidence Interval?
• A confidence interval is a random
interval
• Random because it is a function of a
random variable ( )
• Confidence level is the long-run
percentage of intervals that will “cover”
the population parameter
• It is not the probability that the interval
contains the true parameter!
X
16
A Simulation
intervals not including population mean: 2
95% Confidence Intervals for mean = 50, sd = 10, n = 5sample
1 5 10 15 20 25 30 35 40 45 50
10
20
30
40
50
60
70
80
90
100
17
intervals not including population mean: 10
95% Confidence Intervals for mean = 50, sd = 10, n = 95sample
1 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200
20
30
40
50
60
70
80
Another Simulation
5
18
Deriving a Confidence Interval (1)
• Let X1, X2, …, Xn be a random sample
from a normal population with unknown
mean and known standard deviation s
• Create a CI for based on the sampling
distribution of the mean:
• To start, we know that (via
standardizing):
~ (0,1)/
XN
n
s
nNX /,~ 2s
Deriving a Confidence Interval (2)
• Now for Z ~ N(0,1) we know
• That is, there is a 95% probability that the random
variable Z lies in this fixed interval
• Thus
• And, after some algebra
• Now we say we are 95% confident that this
random interval covers the fixed (unknown) 19
Pr( 1.96 1.96) 0.95Z
-Pr -1.96 1.96 0.95
/
X
n
s
Pr 1.96 1.96 0.95X Xn n
s s
20
Deriving a Confidence Interval (3)
• So, If X1 = x1, X2 = x2, …, Xn = xn are
observed values of a random sample
from a :
• We can be 95% confident that the
interval covers the population mean
• Interpretation: In the long run, 19 times out
of 20 the interval will cover the true mean
and 1 time out of 20 it will not
1.96xn
s is a 95% confidence interval for
2,sN
21
But Something’s Fishy...
• Why make all the fuss about the mean
being random if we treat the SD as
known?
• Since s is a population quantity, we
have to estimate it with s
• Should that make the intervals wider or
narrower?
• When s unknown (almost always), use
“t” distribution rather than the normal
22
0.00
0.10
0.20
0.30
0.40
-4 -3 -2 -1 0 1 2 3 4
Z= number of SE’s from the mean
normal
T3
T10
T100
The t Distribution
23
Degrees of Freedom (df)
• The more degrees of freedom we have, the
better we can estimate s
• The better we estimate s, the closer we are to
s being known
• Thus, the more df we have, the closer t
values are to z values
• Calculating degrees of freedom:
• Each observation adds one degree of freedom
• One degree of freedom is used up when we
calculate
• There are n-1 degrees of freedom left
X
24
How to (Really) Guess
• Choose a probability of being wrong: a
• Calculate and
• For DF=1-n, find t from Table A3-5
(page 496) for p=1-a
• Then, the confidence interval is
X /s n
sX t
n
Table A3-5
• Example
• For a=0.05 and
df=100, we have
t=1.984
• Notation
25
, / 2
1, / 2
100,0.025
1.984
df
n
t
t
t
a
a
26
Mean
Std Dev
Std Error Mean
Upper 95% Mean
Lower 95% Mean
N
Sum Weights
815.0340
0.89230.0892
815.2111
814.8569
100.0000
100.0000
Best Guess at
Best Guess at s
95% Confidence Interval for
Shaft Diameters Example Redux
1.984 / 815.034 1.984 0.8923/ 100
[814.857,815.211]
x s n
27
How Confidence Intervals Behave
• Width of CI’s:
• Margin of error:
• Bigger SD Bigger SE wider intervals
• Bigger sample size• Smaller SE narrower intervals
• Smaller t values narrower intervals
• Higher confidence Bigger t values wider intervals
n
stw n 2/,12 a
n
stE n 2/,1 a
28
t vs. z
• Use t when you don’t know s
• The t distribution assumes the data are normally distributed
• Options if data are not normally distributed:
• Transform the data (logarithms)
• If transformations don’t work and sample size is big ( > 30) ignore the problem
• If transformations don’t work and sample size is small, read the book about nonparametric tests
29
• Manufacturer of consumer electronics:
• How many households will purchase a
computer in the next year?
• Use survey to collect responses from
100 households
• To justify sales projections, management
needs the proportion to be at least 25%
• Should management revise sales
projections?
Example (CompPur.jmp)
30
Example, continued
• Survey results:
• Another sample would likely have given a different result
• What we want to know is, based on this result, where could the true proportion lie?
No
Yes
Total
Level
86
14
100
Count
0.86000
0.14000
1.00000
Prob
2 Levels
Frequencies
31
Example, continued
• When data are 0’s and 1’s, they are REALLY not normal
.0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1.0 1.1
Mean
Std Dev
Std Error Mean
Upper 95% Mean
Lower 95% Mean
N
Sum Weights
0.1400
0.3487
0.0349
0.2092
0.0708
100.0000
100.0000
• Rule of thumb: at least 30 observations, 5
successes, and 5 failures lets CLT kick in
• No difference between means and proportions!
95% CI for true proportion
Other Confidence Intervals
• There are lots of other confidence
intervals – we’ve concentrated on CIs
for the mean
• See your textbook for
• CI for the variance (i.e., s2)
• CI for the difference of two means
• Not enough time to learn about these
• Just skim those sections in the book
• And know that CIs exist for other
parameters 32
33
• Descriptive Statistics
• Probability
• And, Or, Not
• Normal distribution
• Central limit theorem
• Computing SE( ) from SD(X)
• Inference
• Confidence intervals for population means and proportions
X
What We Have Learned So Far…