7/27/2019 2. Statistical Inference - Single Population
2. Statistical Inference: Single
Population Mean and Proportion
(Review)
ECON 251
Research Methods
Descriptive Statistics vs. Inferential Statistics
Descriptive statistics: calculating summary characteristics of data.
Inferential statistics: using sample summary measures to estimate population characteristics.
In descriptive statistics we summarize the data from a population or a sample of it.
In inferential statistics, data on the population is NOT available. We take a sample and use its summarizing measures to estimate the unknown population characteristics.
[Figure: Descriptive statistics: population or sample → summarize the data. Inferential statistics: population characteristics are unknown → sample: find summarizing measures → inference.]
Statistical Inference Review
There are two procedures for making inference: Hypothesis Testing (HT) and Estimation.
In estimation, we attempt to estimate the value of the parameter in either of two ways:
Point Estimator
A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or a point.
Interval Estimator
An interval estimator draws inference about a population by estimating the value of an unknown parameter using an interval.
We use intervals so we can be precise about our degree of certainty regarding the sample statistic's proximity to the population parameter.
HT involves testing a specific belief about the value of the parameter.
HT concepts are the foundation for estimation as well, so we begin there.
4 Steps For Hypothesis Testing
Step 1: Set up alternative & null hypotheses
Step 2: Calculate the test statistic
Step 3: Find critical values (rejection region method) or find the p-value (p-value method)
Step 4: Make a decision
Step One: Set up alternative & null hypotheses
The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief about a population parameter.
There are two hypotheses (about a population parameter(s)):
H0 - the null hypothesis [for example, H0: μ = 5]
H1 - the alternative hypothesis [for example, H1: μ > 5]
Step One: Set up alternative & null hypotheses
The alternative hypothesis is most important; it is what you are trying to prove. Always start by stating the alternative first.
The alternative can involve >, <, or ≠.
The alternative establishes whether the test is one-tailed or two-tailed.
The alternative establishes the location of the rejection region(s).
Once you have correctly defined the alternative, the null is easy to establish.
We always assume the null is true, therefore H0 MUST contain =, and may contain ≥ or ≤.
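Step Two: Calculating test statistics
Population Mean w/ Sigma known:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Population Mean w/ Sigma unknown:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
Population Proportion:
$$z = \frac{\hat{p} - p}{\sqrt{p(1 - p) / n}}$$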
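The three test statistics for this step can be sketched in Python as follows. This is a minimal illustration, not part of the course material: the function and argument names are chosen here, and `mu0` and `p0` stand for the values stated in the null hypothesis.

```python
import math

def z_stat_mean(x_bar, mu0, sigma, n):
    """z statistic for a mean when the population sigma is known."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def t_stat_mean(x_bar, mu0, s, n):
    """t statistic for a mean when only the sample standard deviation s is known."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def z_stat_proportion(p_hat, p0, n):
    """z statistic for a proportion; the standard error uses the hypothesized p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
```

The two mean statistics share the same arithmetic; they differ only in which standard deviation is supplied and in which distribution (z or t) the result is compared against.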
Step Two: Calculating test statistics
The standardization formulas provide the test statistic.
They convert our sample statistic from the sampling distribution to the standardized distribution (t or z in this case).
There are millions of sampling distributions. Rather than knowing everything about every one of those distributions, we standardize our statistic, thereby moving it from the sampling distribution and placing it on the standardized distribution.
We know everything there is to know about the standardized distribution. Because the test statistic is on the standardized distribution, we can compare the test statistic to a critical value, or the area associated with the test statistic (p-value) to alpha.
Step Three: Find critical value or p-value
You need to decide which method you are going to use to make your decision.
If you are doing the calculations by hand, you will frequently use the rejection region (critical value) method.
The critical value will either be given to you (exams, in-class examples) or you would find it in Excel (NORMSINV, TINV).
The p-value method will frequently be used when you are using software to do your calculations, as most programs provide these values. You can also find them in Excel (NORMSDIST, TDIST).
In the latter case, be sure you can identify the p-value graphically as well.
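For readers working outside Excel, the z lookups can be sketched with Python's standard library. The helper names below are chosen for this illustration. The standard library has no t distribution, so t critical values would still come from a table or a third-party package.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_critical(alpha):
    """Upper-tail critical value z_alpha (area alpha lies to its right)."""
    return std_normal.inv_cdf(1 - alpha)

def z_upper_tail_area(z):
    """Area to the right of z under the standard normal curve."""
    return 1 - std_normal.cdf(z)

print(round(z_critical(0.05), 3))   # about 1.645
print(round(z_critical(0.025), 3))  # about 1.96
```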
Decision Rule: rejection region (critical value) method
Reject H0 if the test statistic is more extreme than the critical value.
Given the significance level (probability of Type I error) = α:
Two sided alternative: critical values $-z_{\alpha/2}$ and $z_{\alpha/2}$; rejection regions in both tails.
One sided (upper tail) alternative: critical value $z_{\alpha}$; rejection region in the upper tail.
One sided (lower tail) alternative: critical value $-z_{\alpha}$; rejection region in the lower tail.
In case of the t distribution we will have $t_{\alpha/2}$ and $t_{\alpha}$ respectively.
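The rejection region rule can be written as a small function. This is a sketch under stated assumptions: the labels "greater", "less", and "two-sided" are naming choices made here, and the caller passes the positive table value for the chosen tail area.

```python
def reject_h0(test_stat, critical_value, alternative):
    """Return True when the test statistic falls in the rejection region.

    critical_value is the positive table value: z_alpha or t_alpha for a
    one-sided test, z_{alpha/2} or t_{alpha/2} for a two-sided test.
    """
    if alternative == "greater":      # upper-tail alternative
        return test_stat > critical_value
    if alternative == "less":         # lower-tail alternative
        return test_stat < -critical_value
    if alternative == "two-sided":    # rejection regions in both tails
        return abs(test_stat) > critical_value
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
```

Note that a large negative statistic never rejects under an upper-tail alternative, which is why the direction of H1 has to be fixed before looking at the data.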
Decision Rule: p-value method
The p-value is "the amount of evidence in favor of the alternative hypothesis." The smaller the p-value, the more evidence in favor of the alternative (and the more likely you will reject H0). The p-value is most commonly compared to an α of 5% for the Reject/DNR decision:
Reject H0 if the p-value is smaller than the significance level.
Two sided alternative: p-value = the area to the left of -|tm| plus the area to the right of |tm| (each tail holds p-value/2).
One sided (upper tail) alternative: p-value = the area to the right of tm.
One sided (lower tail) alternative: p-value = the area to the left of tm.
(tm = test statistic; the same holds true for |Zm| & Zm)
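The three p-value areas can be computed for a z statistic with the standard library, as a rough sketch. The function name and the alternative labels are assumptions made for this illustration; a t statistic would need a t distribution, which the standard library does not provide.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """p-value for a z test statistic under the stated alternative."""
    cdf = NormalDist().cdf
    if alternative == "greater":      # area to the right of z
        return 1 - cdf(z)
    if alternative == "less":         # area to the left of z
        return cdf(z)
    if alternative == "two-sided":    # both tails beyond |z|
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
```

The decision is then `z_p_value(z, alternative) < alpha`, which gives the same answer as the rejection region method for the same α.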
Three steps to finding the p-value from a graph:
1. Find the test statistic.
2. Draw an arrow from the test statistic to the extreme end of the nearest rejection region.
3. If a two-tailed test, do this on the opposite side of the distribution as well.
The area of the graph which has an arrow through it is the p-value.
Try showing the p-value graphically in these 4 examples. In each case, assume that the critical value is 3.2:
Example 1: H0: μ = 5; H1: μ > 5; Test stat = 7
Example 2: H0: μ = 5; H1: μ > 5; Test stat = 3
Example 3: H0: μ = 5; H1: μ < 5; Test stat = 3
Example 4: H0: μ = 5; H1: μ ≠ 5; Test stat = 7
Using the p-value is the most common method of making your decision, as most computer software provides this value. However, you must graph your distribution before making a final determination.
Step Four: Make your decision
Make one of the following two conclusions based on the test:
Reject the null hypothesis in favor of the alternative hypothesis.
There ___ enough evidence to infer that the alternative is true.
Do not reject the null hypothesis in favor of the alternative hypothesis.
There _______ enough evidence to infer that the alternative is true.
Errors
Two types of errors are possible when making a decision:
Type I error - reject H0 when H0 is true.
Type II error - do not reject H0 when H0 is false.

States of Nature:
              H0 is true                               H0 is false
DNR H0        1 - α                                    Type II error, β
Reject H0     Type I error, α (significance level)     1 - β (power of test)
Analogy: Hypothesis testing is similar to a jury trial
Assume innocent until proven guilty
Assume H0 is true until proven otherwise
Court either finds defendant guilty (Reject H0) or not guilty (DNR H0)
Courts do not prove a person innocent (Accept H0); rather, if not guilty, there is just not enough evidence to prove guilt. Similarly, if we DNR H0, we are not saying H0 is true, only that there is not enough evidence for us to believe otherwise.
Level of proof required to establish a guilty verdict? What if you convict an innocent person?
Identical to establishing the significance level of the test.
Type I error (α) is equivalent to convicting an innocent person. We focus on α, rather than worry about a Type II error (β), releasing a guilty person.
Beyond a reasonable doubt is court of law norm
Errors
It would be desirable to reduce both types of errors at the same time. But this is NOT possible.
There is a trade-off between α and β. As we try to decrease α, β will increase, and vice versa.
Because the consequences of a Type I error are in most circumstances considered to be of greater concern than a Type II error (sending an innocent person to jail is worse than letting a guilty person go), we focus on controlling the size of the Type I error.
Errors
Standard in statistics varies depending upon the issue at stake:
______________ evidence = 1% significance level
__________ evidence = 1.001-5% significance level
__________ evidence = 5.001-10% significance level
__________ evidence = 10.001% or higher significance level
Unless stated specifically to the contrary, assume we are using α = .05 in all problems.
Examples: Hypothesis Testing
#1 A Nielsen survey estimated in the year 2000 that the mean number of hours of television viewing per household was 7.25 hours per day. The survey involved 250 households. The sample data had a standard deviation of 2.5 hours per day. In 1990, it was determined that the population mean of viewing per household was 6.70 hours per day. Has TV viewing increased since 1990?
(t249,0.005=2.596, t249,0.01=2.34, t249,0.025=1.9695, t249,0.05=1.651);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
Hypothesis Testing 4 Step Solution
Identify the alternative and null hypotheses.
H0: μ ___    H1: μ ___
Calculate the test statistic.
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Find the critical value or p-value.
Z0.05 = ___
Make the decision.
_______ H0 in favor of the alternative. There is ___________ proof that TV viewing has increased since 1990.
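One way to check the arithmetic for example #1 is sketched below. The setup is an assumption on my part rather than the slide's stated answer: an upper-tail test of H1: μ > 6.70 at α = .05, with the sample standard deviation standing in for σ because the slide uses Z with n = 250.

```python
import math

x_bar, mu0, s, n = 7.25, 6.70, 2.5, 250  # figures from example #1
z_crit = 1.645                           # z0.05 from the table given with the example

# s is used in place of sigma (assumption: large sample, Z statistic as on the slide)
z = (x_bar - mu0) / (s / math.sqrt(n))
reject = z > z_crit                      # upper-tail test

print(f"z = {z:.2f}, reject H0: {reject}")
```

Under these assumptions the statistic is roughly 3.48, well beyond the critical value.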
#2 The owners of Subway claim that their stores average $875,000 in annual sales. You used this information in deciding to open a store in Delaware. Your store, however, has not come even close to these annual sales figures. You want to prove that you were misled, and that the average figure for all stores is actually less than 875,000. You collect annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. You also know from a friend who is in management of a similar franchise that you can count on the standard deviation of sales being $28,000. Can you prove your claim?
(t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);
(t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
#3 Your company is considering opening a retail store in Fairbanks, Alaska, but will only do so if average daily spending per capita is higher there than in the rest of the country. According to recent data, the average US household spends $90 per day. A sample was taken in Fairbanks. From a sample of 49, the average daily expenditure was $84.50, and the standard deviation was $14.50. Should you open a store in Fairbanks? You have a lot riding on this decision, so you need to be sure of your conclusion.
(t48,0.005=2.68, t48,0.01=2.41, t48,0.025=2.01, t48,0.05=1.68);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
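Examples #2 and #3 contrast the z and t cases, and a short sketch makes the difference concrete. The choices below are my reading of the problems, not the slide's stated solution: #2 treats the $28,000 figure as a known σ and tests the lower tail at α = .05; #3 has only a sample standard deviation, tests the upper tail, and uses α = .01 because the problem asks for a high degree of certainty.

```python
import math

def standardized(x_bar, mu0, sd, n):
    """(x_bar - mu0) / (sd / sqrt(n)); sd is sigma for z, s for t."""
    return (x_bar - mu0) / (sd / math.sqrt(n))

# Example #2: sigma = 28,000 treated as known, so a z test; H1: mu < 875,000
z2 = standardized(856_000, 875_000, 28_000, 70)
reject2 = z2 < -1.645   # lower tail, z0.05

# Example #3: only s = 14.50 is available, so a t test; H1: mu > 90
t3 = standardized(84.50, 90, 14.50, 49)
reject3 = t3 > 2.41     # upper tail, t48,0.01 (alpha = 1% is an assumption)

print(f"#2: z = {z2:.2f}, reject H0: {reject2}")
print(f"#3: t = {t3:.2f}, reject H0: {reject3}")
```

In #3 the statistic is negative, so it cannot fall in an upper-tail rejection region no matter which α is chosen.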
#4 Microsoft Outlook is believed to be the most widely used e-mail manager. A Microsoft executive claims that Microsoft Outlook is used by more than 75% of Internet users. A Merrill Lynch study involving 300 respondents reported that 72% use Microsoft Outlook. Is there enough evidence here to disprove the executive's claim?
(t299,0.005=2.592, t299,0.01=2.339, t299,0.025=1.968, t299,0.05=1.65);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
#5 A fast-food restaurant plans a special offer that will enable customers to purchase specially designed drink glasses featuring well-known cartoon characters. If more than 15% of the customers will purchase the glasses, the special offer will be implemented. A preliminary test has been set up at several locations, and 88 of 500 customers purchased the glasses. Should the special offer be introduced?
(t87,0.005=2.634, t87,0.01=2.37, t87,0.025=1.988, t87,0.05=1.663);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
#6 For a new newspaper to be financially viable, it has to capture more than 12% of the Toronto market. In a survey conducted among 400 randomly selected prospective readers, 58 participants indicated they would subscribe to the newspaper. Can the publisher conclude that the proposed newspaper will be financially viable at the 10% significance level?
(t57,0.005=2.665, t57,0.01=2.39, t57,0.025=2.00, t57,0.05=1.67);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645, z0.1=1.282)
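The three proportion problems (#4 to #6) all use the same z statistic, so one helper covers them. The directions of the alternatives are my reading of the problem statements, and α = .05 is assumed for #4 and #5 following the default stated earlier; treat this as a check on the arithmetic rather than an official solution.

```python
import math

def z_stat_proportion(p_hat, p0, n):
    """z statistic for a proportion; the standard error uses the hypothesized p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Example #4: H1: p < 0.75 (evidence against the executive's claim), alpha = .05
z4 = z_stat_proportion(0.72, 0.75, 300)
reject4 = z4 < -1.645

# Example #5: H1: p > 0.15, alpha = .05
z5 = z_stat_proportion(88 / 500, 0.15, 500)
reject5 = z5 > 1.645

# Example #6: H1: p > 0.12, alpha = .10 as stated in the problem
z6 = z_stat_proportion(58 / 400, 0.12, 400)
reject6 = z6 > 1.282

for label, z, reject in (("#4", z4, reject4), ("#5", z5, reject5), ("#6", z6, reject6)):
    print(f"{label}: z = {z:.3f}, reject H0: {reject}")
```

Example #5 lands just short of the 5% critical value, which shows how much the choice of significance level can matter.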
Confidence Interval Estimation 4 Steps
Confidence interval estimation relies on the same concepts and relationships as does hypothesis testing. A simple four step approach to these problems can also be helpful.
1. We begin by calculating the point estimate from our sample data.
2. To establish the appropriate interval width, find the upper and lower limits on the standardized distribution associated with your confidence level.
3. Use the confidence interval formulas to place them on the sampling distribution.
4. Place the sample statistic at the center of the interval and the confidence interval is complete.
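Confidence Interval Formulas
Population Mean w/ Sigma known:
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Population Mean w/ Sigma unknown:
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$$
Population Proportion:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$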
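The interval formulas translate into two short helpers. This is an illustrative sketch with names chosen here; the caller supplies the table value for α/2 (z or t as appropriate).

```python
import math

def ci_mean(x_bar, sd, n, critical_value):
    """Interval x_bar ± critical_value * sd / sqrt(n).

    Pass sigma with a z critical value, or s with a t critical value.
    """
    half_width = critical_value * sd / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def ci_proportion(p_hat, n, z_critical):
    """Interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z_critical * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width
```

Both return the pair (LCL, UCL), with the sample statistic at the center of the interval.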
1 - α is the confidence level associated with the interval.
The sample statistic is used as the center of the interval.
W: width of the interval; 2 x W: total length of the interval.
UCL (Upper Confidence Limit) and LCL (Lower Confidence Limit) are found using the critical value associated with α/2.
Confidence interval width for the mean is a function of:
use of the t distribution or z distribution
level of confidence chosen (positively related)
σ of the sampling distribution (positively related)
sample size (negatively related)
The population parameter can lie outside of the interval; in fact, we know it will 100α% of the time.
If interested in establishing a confidence interval of a specific width and level of confidence, calculate ahead of time the least number that is required to be in your sample to achieve your objective.
Example: Confidence Interval Estimation
#7 As a new Subway franchisee, you are estimating your expected annual sales. You have annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. The population standard deviation is 28,000. You want a 90% and 95% confidence interval around your estimate.
(t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);
(t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
Confidence Interval Estimation 4 Steps
1. Calculate the point estimate.
x̄ = ___
2. Find the upper and lower limits on the standardized distribution associated with your confidence level.
For 1 - α = 90%; Z0.05 = ___
3. Use the confidence interval formulas to place the upper and lower limits on the sampling distribution.
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
4. Place the point estimate at the center of the interval.
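The four steps can be traced for example #7 with a few lines of Python. The sketch assumes the population standard deviation of 28,000 is the one to use (so the z formula applies) and takes the critical values from the table printed with the example.

```python
import math

x_bar, sigma, n = 856_000, 28_000, 70    # step 1: point estimate and inputs from example #7
standard_error = sigma / math.sqrt(n)

intervals = {}
for level, z in (("90%", 1.645), ("95%", 1.96)):   # step 2: z0.05 and z0.025
    half_width = z * standard_error                # step 3: move onto the sampling distribution
    intervals[level] = (x_bar - half_width, x_bar + half_width)  # step 4: center on x_bar
    lcl, ucl = intervals[level]
    print(f"{level} CI: ({lcl:,.0f}, {ucl:,.0f})")
```

The 95% interval is wider than the 90% interval, consistent with confidence level and width being positively related.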
Using CI to decide hypothesis tests:
If you have calculated a confidence interval, and then decide you also want to test a hypothesis with this information, you can do so directly provided:
The hypothesis being tested is two-tailed.
The α from the hypothesis test and the 1 - α from the confidence interval total 1.0.
If these two conditions hold, then determine whether the hypothesized value in the null hypothesis for the parameter falls in the interval created. If it does, DNR H0. If it does not, Reject H0.
Example: Using CI to decide a Hypothesis Test
#8 The owners of Subway claim that their stores average $875,000 in annual sales. You used this information in deciding to open a store in Delaware. Your store, however, has not come even close to these annual sales figures. You want to prove that you were misled, and that the average figure for all stores is actually NOT 875,000. You collect annual sales figures from 70 randomly selected stores. The average in your sample turns out to be $856,000, with a standard deviation of $24,000. You also know from a friend who is in management of a similar franchise that you can count on the standard deviation of sales being $28,000. Can you prove your claim?
(t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);
(t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
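For example #8, the two-tailed test can be decided from a confidence interval. The sketch below assumes α = .05 (so a 95% interval with z0.025 = 1.96) and σ = 28,000 treated as known; the helper name is chosen for this illustration.

```python
import math

def two_tailed_decision_from_ci(lcl, ucl, hypothesized_value):
    """DNR H0 when the hypothesized value lies inside the interval, else reject."""
    return "DNR H0" if lcl <= hypothesized_value <= ucl else "Reject H0"

half_width = 1.96 * 28_000 / math.sqrt(70)          # z0.025 * sigma / sqrt(n)
lcl, ucl = 856_000 - half_width, 856_000 + half_width
decision = two_tailed_decision_from_ci(lcl, ucl, 875_000)

print(f"95% CI: ({lcl:,.0f}, {ucl:,.0f}) -> {decision}")
```

The shortcut is valid here only because the alternative is two-tailed and the interval's confidence level matches 1 - α.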
Estimating n for Confidence Intervals
Sample sizes required to construct intervals of a certain degree of confidence and width can be determined by using one of the following formulas:
Sample Size for Means:
$$n = \frac{z_{\alpha/2}^2 \sigma^2}{W^2}$$
Sample Size for Proportions, a priori idea of $\hat{p}$:
$$n = \frac{z_{\alpha/2}^2 \, \hat{p}(1 - \hat{p})}{W^2}$$
Sample Size for Proportions, no a priori idea of $\hat{p}$:
$$n = \frac{z_{\alpha/2}^2 (0.5)^2}{W^2}$$
Estimating n for Confidence Intervals
When involving mean:
Use sample standard deviation from previous study as σ
Use a pilot study to obtain a standard deviation (σ)
Use judgment, or best guess
When involving proportion:
Use best estimate if confident of a reasonable value for p̂: you have an a priori value for the sample proportion
Use 0.5 as p̂: you have no a priori value for the sample proportion
Example: Estimating n for Confidence Intervals
#9 The interval you have created for your Subway is a good start, but you would be more comfortable with a tighter range for your estimate of sales. You decide that the maximum you can tolerate is +/- 2,500. What sample size would you need to collect to obtain a 90% confidence interval for annual sales with a width of 2,500?
(t69,0.005=2.649, t69,0.01=2.382, t69,0.025=1.995, t69,0.05=1.667);
(t70,0.005=2.648, t70,0.01=2.381, t70,0.025=1.994, t70,0.05=1.667);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
$$n = \frac{z_{\alpha/2}^2 \sigma^2}{W^2}$$
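The sample size for example #9 can be checked with a short calculation. Assumptions in this sketch: σ = 28,000 carries over from the earlier Subway examples, W = 2,500 is the +/- half-width, and the result is rounded up because a fractional observation cannot be collected.

```python
import math

def sample_size_for_mean(z, sigma, w):
    """Smallest whole n satisfying n >= (z * sigma / w) ** 2."""
    return math.ceil((z * sigma / w) ** 2)

n_required = sample_size_for_mean(1.645, 28_000, 2_500)  # z0.05 for 90% confidence
print(n_required)
```

Rounding up rather than to the nearest integer keeps the interval from being wider than the tolerance.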
Examples: Confidence Intervals
#10 The Environmental Protection Agency (EPA) has agreed to give tax rebates to manufacturers of vehicles that get a combined city and highway gas mileage of at least 32 mpg. A 49 car sample of a new Ford vehicle reveals a mean of 32.6 mpg. It is believed that the highway gas mileage for Ford vehicles has a standard deviation of 0.78 mpg.
(t48,0.005=2.68, t48,0.01=2.41, t48,0.025=2.01, t48,0.05=1.68);
(z0.005=2.58, z0.01=2.33, z0.025=1.96, z0.05=1.645)
Construct a 95% confidence interval. Then a 99% confidence interval.
#11 Redo example #10, but this time, the standard deviation of the mpg for the 49 cars is 0.83, and there is no credible information regarding the population standard deviation of mpg for these vehicles.
Construct a 95% confidence interval. Then a 99% confidence interval.
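Examples #10 and #11 differ only in which standard deviation and which table are used, which a side-by-side sketch makes visible. The helper name is chosen here; critical values come from the tables printed with example #10 (z for the believed population value 0.78, t with 48 degrees of freedom for the sample value 0.83).

```python
import math

def ci_mean(x_bar, sd, n, critical_value):
    """Interval x_bar ± critical_value * sd / sqrt(n)."""
    half_width = critical_value * sd / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Example #10: sigma = 0.78 treated as known, so z critical values
ex10_95 = ci_mean(32.6, 0.78, 49, 1.96)   # z0.025
ex10_99 = ci_mean(32.6, 0.78, 49, 2.58)   # z0.005

# Example #11: only s = 0.83 is available, so t critical values (48 df)
ex11_95 = ci_mean(32.6, 0.83, 49, 2.01)   # t48,0.025
ex11_99 = ci_mean(32.6, 0.83, 49, 2.68)   # t48,0.005

for label, (lcl, ucl) in (("#10 95%", ex10_95), ("#10 99%", ex10_99),
                          ("#11 95%", ex11_95), ("#11 99%", ex11_99)):
    print(f"{label}: ({lcl:.3f}, {ucl:.3f})")
```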
#12 Suppose we have made an interval estimation for the mean of the population such as: [126.56, 192.41]. If we realize that the true population mean is 195.7, what should we conclude?
The procedure for interval estimation must have been done incorrectly.
We should first standardize the LCL and UCL and then see if they capture the mean.
The procedure can still be valid, since we allow for a certain amount of error.
We must use a t distribution instead of a z distribution.
We could never get this result.
#13 A major news source conducted a poll asking 814 adults to respond to a series of questions about their feelings toward the state of affairs within the United States. A total of 562 adults responded yes to the question: Do you feel things are going well in the United States these days?
A) What is the point estimate of the proportion of the adult population that feels things are going well in the United States?
B) What is the 90% confidence interval for the proportion of the adult population that feels things are going well in the United States?
C) If one wanted to be 95% certain, and have an interval no wider than 3%, what sample size would be required?
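The three parts of example #13 can be worked through in a few lines. Part C depends on an interpretation: this sketch assumes "no wider than 3%" means W = 0.03 in the sense used earlier (the +/- half-width), and it shows both the sample size using the poll's own proportion as the a priori value and the more conservative size using 0.5.

```python
import math

successes, n = 562, 814
p_hat = successes / n                                # A) point estimate

half_width_90 = 1.645 * math.sqrt(p_hat * (1 - p_hat) / n)
ci_90 = (p_hat - half_width_90, p_hat + half_width_90)  # B) 90% interval, z0.05

w = 0.03                                             # C) assumed to be the +/- half-width
n_with_prior = math.ceil(1.96 ** 2 * p_hat * (1 - p_hat) / w ** 2)
n_no_prior = math.ceil(1.96 ** 2 * 0.5 ** 2 / w ** 2)

print(f"p_hat = {p_hat:.4f}")
print(f"90% CI: ({ci_90[0]:.4f}, {ci_90[1]:.4f})")
print(f"n using p_hat: {n_with_prior}, n using 0.5: {n_no_prior}")
```

If the 3% were instead read as the total length of the interval, W would be 0.015 and both sample sizes would be roughly four times larger.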