renzo-marquez
Qmt12 Chapter 7 Sampling Distributions
Chapter 7
Sampling and Sampling Distributions
McGraw-Hill/Irwin Copyright 2011 by The McGraw-Hill Companies, Inc. All rights reserved.
St. Andrews
St. Andrews University receives 900 applications annually from prospective students. The application forms contain a variety of information, including the individual's scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.
St. Andrews
- Goal: to get numerical/statistical information about the population (for example, the mean score of all the applicants)
  - Census of all 900 applicants
  - Survey of a portion of the applicants (e.g., 30)
- Taking a census of the 900 applicants
  - SAT scores
    - Population mean: $\mu = \frac{\sum x_i}{900} = 990$
    - Population standard deviation: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$
  - Applicants wanting on-campus housing
    - Population proportion: $p = \frac{648}{900} = .72$
St. Andrews
- Taking a survey of 30 people

No.  Random Number  Applicant        SAT Score  On-Campus Housing
1    744            Connie Reyman    1025       Yes
2    436            William Fox      950        Yes
3    865            Fabian Avante    1090       No
4    790            Eric Paxton      1120       Yes
5    835            Winona Wheeler   1015       No
.    .              .                .          .
30   685            Kevin Cossack    965        No
St. Andrews
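- Population
  - $\mu = \frac{\sum x_i}{900} = 990$
  - $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$
  - $p = \frac{648}{900} = .72$
- Sample
  - $\bar{x} = \frac{\sum x_i}{30} = \frac{29{,}910}{30} = 997$
  - $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163{,}996}{29}} = 75.2$
  - $\hat{p} = \frac{20}{30} \approx .67$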
Sampling Error
- The absolute value of the difference between an unbiased point estimate and the population parameter it estimates is called the sampling error.
- For the case of a sample mean estimating a population mean:
  Sampling error = $|\bar{x} - \mu|$
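As a quick check of the definition, the sketch below computes the sampling error for the three St. Andrews point estimates quoted in these slides. The dictionary keys and the helper name `sampling_error` are illustrative choices, not part of the original deck.

```python
# Population parameters and sample estimates quoted in the St. Andrews slides.
population = {"mean SAT": 990.0, "SAT std dev": 80.0, "housing proportion": 0.72}
sample = {"mean SAT": 997.0, "SAT std dev": 75.2, "housing proportion": 20 / 30}


def sampling_error(estimate, parameter):
    """Absolute difference between a point estimate and the parameter it estimates."""
    return abs(estimate - parameter)


errors = {name: sampling_error(sample[name], population[name]) for name in population}

for name, err in errors.items():
    print(f"{name}: sampling error = {err:.3f}")
```

For the sample mean this gives |997 - 990| = 7 SAT points.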
Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean $\bar{x}$ is the probability distribution of the population of all sample means obtainable from all possible samples of size n drawn from a population of size N.
Example: Sampling Annual % Return of 6 Stocks
Stock     A    B    C    D    E    F
% Return  10%  20%  30%  40%  50%  60%

Assume that we have a population of 6 stocks (shown in the table). Computing the population parameters, we get:
N = 6, $\mu = 35\%$, $\sigma = 17.078\%$
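The population parameters can be reproduced with Python's standard library. This is only a sketch; the variable names are illustrative. Note that `statistics.pstdev` divides by N (population formula), which is what the slide's 17.078% corresponds to.

```python
import statistics

returns = [10, 20, 30, 40, 50, 60]  # % returns of stocks A through F

N = len(returns)
mu = statistics.fmean(returns)       # population mean
sigma = statistics.pstdev(returns)   # population standard deviation (divides by N)

print(N, mu, round(sigma, 3))  # 6 35.0 17.078
```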
Example: Sampling Annual % Return of 6 Stocks
- Let's try taking a random sample of size n = 1.
- We can take 6 samples (6C1) from the population, each with the same probability of being chosen.
- Thus, each would have a 1/6 chance of being chosen.
Example: Sampling Annual % Return of 6 Stocks
Stock    % Return  Frequency  Relative Frequency
Stock A  10        1          1/6
Stock B  20        1          1/6
Stock C  30        1          1/6
Stock D  40        1          1/6
Stock E  50        1          1/6
Stock F  60        1          1/6
Total              6          1
Example: Sampling Annual % Return of 6 Stocks
- Now, let's try taking samples of size n = 2.
- We can take a total of 15 samples (6C2) from the population of 6 stocks.
- Calculating the sample mean of each and every sample, we get:
Example: Sampling Annual % Return of 6 Stocks
Sample  % Returns  Sample Mean
1       10%, 20%   15%
2       10%, 30%   20%
3       10%, 40%   25%
4       10%, 50%   30%
5       10%, 60%   35%
6       20%, 30%   25%
7       20%, 40%   30%
8       20%, 50%   35%
9       20%, 60%   40%
10      30%, 40%   35%
11      30%, 50%   40%
12      30%, 60%   45%
13      40%, 50%   45%
14      40%, 60%   50%
15      50%, 60%   55%
Example: Sampling Annual % Return of 6 Stocks
Sample Mean (%)  Frequency  Relative Frequency
15               1          1/15
20               1          1/15
25               2          2/15
30               2          2/15
35               3          3/15
40               2          2/15
45               2          2/15
50               1          1/15
55               1          1/15
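The frequency table above can be rebuilt by enumerating every possible sample of size 2. The sketch below uses `itertools.combinations` for sampling without replacement; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations
import statistics

returns = [10, 20, 30, 40, 50, 60]  # % returns of stocks A through F
n = 2

# All 6C2 = 15 samples of size 2, drawn without replacement.
samples = list(combinations(returns, n))
sample_means = [sum(s) / n for s in samples]

# Frequency distribution of the sample means.
frequency = Counter(sample_means)
for mean in sorted(frequency):
    print(f"{mean:>4.0f}%  {frequency[mean]}  {frequency[mean]}/{len(samples)}")

mean_of_means = statistics.fmean(sample_means)  # equals the population mean
sd_of_means = statistics.pstdev(sample_means)   # spread of the sampling distribution
print(mean_of_means, round(sd_of_means, 2))
```

The mean of the 15 sample means comes out to 35%, the same as the population mean, while their standard deviation (about 10.80%) is smaller than the population's 17.078%.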
Example: Sampling Annual % Return of 6 Stocks
Observations
- Although the population of N = 6 stock returns has a uniform distribution, the histogram of the 15 sample mean returns:
  1. Seems to be centered over the population mean return of 35%, and
  2. Appears to be bell-shaped and less spread out than the histogram of individual returns
Example: NYSE Stocks
- Population of returns of all 1,815 stocks listed on the NYSE for 1987
  - The mean rate of return was -3.5% with a standard deviation of 26%
Example: NYSE Stocks
- Draw all possible random samples of size n = 5 and calculate the sample mean return of each
Example: Sampling All Stocks
Results from Sampling All Stocks
- Observations
  - Both histograms appear to be bell-shaped and centered over the same mean of -3.5%
  - The histogram of the sample mean returns looks less spread out than that of the individual returns
- Statistics
  - Mean of all sample means: $\mu_{\bar{x}} = \mu = -3.5\%$
  - Standard deviation of all possible sample means: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{26}{\sqrt{5}} = 11.63\%$
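The standard deviation of the sample means quoted above follows directly from dividing the population standard deviation by the square root of the sample size. A minimal sketch, with `standard_error` as an illustrative helper name:

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)


sigma = 26.0  # population standard deviation of NYSE returns, in %
n = 5         # sample size

se = standard_error(sigma, n)
print(round(se, 2))  # 11.63
```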
And the Empirical Rule
- The empirical rule holds for the sampling distribution of the sample mean
  - 68.26% of all possible sample means lie within 1 standard deviation of the mean
  - 95.44% lie within 2 standard deviations of the mean
  - 99.73% lie within 3 standard deviations of the mean
Properties of the Sampling Distribution of the Sample Mean
- If the population being sampled is normal, then so is the sampling distribution of the sample mean, $\bar{x}$
- The mean $\mu_{\bar{x}}$ of the sampling distribution of $\bar{x}$ is $\mu_{\bar{x}} = \mu$
  - That is, the mean of all possible sample means is the same as the population mean
Properties of the Sampling Distribution of the Sample Mean #2
- The variance $\sigma_{\bar{x}}^2$ of the sampling distribution of $\bar{x}$ is
  $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$
- That is, the variance of the sampling distribution of $\bar{x}$ is
  - Directly proportional to the variance of the population
  - Inversely proportional to the sample size
Properties of the Sampling Distribution of the Sample Mean #3
- The standard deviation $\sigma_{\bar{x}}$ of the sampling distribution of $\bar{x}$ is
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
- That is, the standard deviation of the sampling distribution of $\bar{x}$ is
  - Directly proportional to the standard deviation of the population
  - Inversely proportional to the square root of the sample size
Notes
- The formulas for $\sigma_{\bar{x}}^2$ and $\sigma_{\bar{x}}$ hold if the sampled population is infinite
- The formulas hold approximately if the sampled population is finite but N is much larger (at least 20 times larger) than n ($N/n \ge 20$)
- $\bar{x}$ is the point estimate of $\mu$, and the larger the sample size n, the more accurate the estimate
  - Because as n increases, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ decreases
  - Additionally, as n increases, the sample becomes more representative of the population
- So, to reduce $\sigma_{\bar{x}}$, take bigger samples!
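To see how quickly bigger samples pay off, the sketch below tabulates the standard error for a few sample sizes. It reuses the NYSE value of 26% as the population standard deviation purely for illustration; the sample sizes are arbitrary choices.

```python
import math

sigma = 26.0                      # assumed population standard deviation, in %
sample_sizes = [5, 20, 80, 320]   # each size is 4 times the previous one

standard_errors = {n: sigma / math.sqrt(n) for n in sample_sizes}

for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se:.2f}%")
```

Because the standard error depends on the square root of n, quadrupling the sample size only halves it.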
Finite Populations
- If a finite population of size N is sampled randomly and without replacement, we must use the finite population correction to calculate the correct standard deviation of the sampling distribution of the sample mean
  - If N is less than 20 times the sample size, that is, if N/n < 20, the correction is needed, because $\sigma_{\bar{x}}$ is then noticeably smaller than $\sigma/\sqrt{n}$
  - Otherwise, $\sigma_{\bar{x}} \approx \sigma/\sqrt{n}$
Finite Populations Continued
- The finite population correction is
  $\sqrt{\frac{N - n}{N - 1}}$
- The standard error is
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
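The corrected formula can be checked against the 6-stock example from earlier in the chapter, where N = 6 and n = 2, so N/n = 3 is well below 20. The sketch below compares the formula with a brute-force enumeration of all 15 samples; the function names are illustrative.

```python
import math
import statistics
from itertools import combinations


def finite_population_correction(N, n):
    """Correction factor for sampling without replacement from N items."""
    return math.sqrt((N - n) / (N - 1))


def standard_error_finite(sigma, N, n):
    """Standard error of the sample mean with the finite population correction."""
    return sigma / math.sqrt(n) * finite_population_correction(N, n)


returns = [10, 20, 30, 40, 50, 60]  # % returns of the 6 stocks
N, n = len(returns), 2
sigma = statistics.pstdev(returns)

by_formula = standard_error_finite(sigma, N, n)

# Brute force: standard deviation of the means of all possible samples.
all_means = [sum(s) / n for s in combinations(returns, n)]
by_enumeration = statistics.pstdev(all_means)

print(round(by_formula, 3), round(by_enumeration, 3))
```

Both approaches give about 10.80%, whereas the uncorrected $\sigma/\sqrt{n}$ would give about 12.08%.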
Effect of Sample Size
Central Limit Theorem
- If the population is normally distributed
  - The sampling distribution is normal regardless of the sample size n
- But if the population is non-normal, what is the shape of the sampling distribution of the sample mean?
  - The sampling distribution is approximately normal if the sample is large enough, even if the population is non-normal (Central Limit Theorem)
The Central Limit Theorem Continued
- No matter the probability distribution that describes the population, if the sample size n is large enough, the population of all possible sample means is approximately normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
- Further, the larger the sample size n, the closer the sampling distribution of the sample mean is to being normal
  - In other words, the larger n, the better the approximation
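Central Limit Theorem
[Figure: a random sample $(x_1, x_2, \ldots, x_n)$ is drawn from a right-skewed population distribution of $X$ with parameters $(\mu, \sigma)$; as n becomes large, the sampling distribution of the sample mean $\bar{x}$ is nearly normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$]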
How Large?
- How large is "large enough"?
- If the sample size is at least 30, then for most sampled populations, the sampling distribution of sample means is approximately normal
  - Here, if n is at least 30, it will be assumed that the sampling distribution of $\bar{x}$ is approximately normal
- If the population is normal, then the sampling distribution of $\bar{x}$ is normal regardless of the sample size
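A small simulation can make the Central Limit Theorem concrete. The sketch below is an illustration under stated assumptions, not part of the original deck: it draws repeated samples of size n = 30 from an exponential population with rate 1 (right-skewed, with $\mu = 1$ and $\sigma = 1$), and the seed and the number of repetitions are arbitrary.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

n = 30              # sample size (the "at least 30" rule of thumb)
num_samples = 5000  # number of repeated samples to simulate

# Each entry is the mean of one random sample from the skewed population.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (theory: 1.000)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {1 / math.sqrt(n):.3f})")
```

The simulated values should land close to $\mu = 1$ and $\sigma/\sqrt{n} \approx 0.183$. Plotting a histogram of `sample_means` would show the nearly normal shape.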
Car Mileage Statistical Inference
- Recall from the Chapter 3 example: $\bar{x} = 31.56$ mpg for a sample of size n = 50, and $\sigma = 0.8$
- The minimum standard for a tax credit is a population mean mileage of at least 31 mpg
- We want to know whether the automaker qualifies for the tax credit
- If the population mean is exactly 31, what is the probability of observing a sample mean that is greater than or equal to 31.56?
Car Mileage Statistical Inference #2
- Calculate the probability of observing a sample mean that is greater than or equal to 31.56 mpg if $\mu = 31$ mpg
  - Want $P(\bar{x} \ge 31.56 \text{ if } \mu = 31)$
- Compute the standard error:
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.8}{\sqrt{50}} = 0.1131$
Car Mileage Statistical Inference #3
- Then
  $P(\bar{x} \ge 31.56 \text{ if } \mu = 31) = P\left(z \ge \frac{31.56 - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right) = P\left(z \ge \frac{31.56 - 31}{0.1131}\right) = P(z \ge 4.95)$
- But z = 4.95 is off the standard normal table
  - The largest z value in the table is 3.99, which has a right-hand tail area of 0.00003
Car Mileage Statistical Inference #4
- z = 4.95 > 3.99, so $P(z \ge 4.95) < 0.00003$
  - If $\mu = 31$ mpg, fewer than 3 in 100,000 of all possible samples have a mean as large as the one observed
- It is difficult to believe that such a small chance would occur, so we conclude that there is strong evidence that $\mu$ does not equal 31 mpg
  - And that $\mu$ is, in fact, larger than 31 mpg
- There is enough evidence to give a tax credit to the automaker
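The calculation above does not have to stop at the edge of the printed table. The sketch below repeats it with `statistics.NormalDist` from the Python standard library (Python 3.8+); the variable names are illustrative, and the tail area is computed by symmetry as the area below -z.

```python
import math
from statistics import NormalDist

x_bar = 31.56  # observed sample mean, mpg
mu0 = 31.0     # hypothesized population mean, mpg
sigma = 0.8    # population standard deviation, mpg
n = 50         # sample size

standard_error = sigma / math.sqrt(n)
z = (x_bar - mu0) / standard_error

# Right-hand tail area P(Z >= z), computed by symmetry as P(Z <= -z).
tail_area = NormalDist().cdf(-z)

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.2f}")
print(f"P(z >= {z:.2f}) = {tail_area:.2e}")
```

The exact tail area is far below the table's bound of 0.00003, which supports the same conclusion.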
Example
- Suppose that we will randomly select a sample of 64 measurements from a population having a mean equal to 20 and a standard deviation equal to 4.
  - Describe the shape of the sampling distribution of the sample mean.
  - Find the mean and the standard deviation of the sampling distribution of the sample mean.
  - Calculate the probability that the sample mean is greater than 21.
  - Calculate the probability that the sample mean is less than 19.385.
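One way to work this example is sketched below. It assumes that n = 64 is large enough for the sampling distribution to be treated as approximately normal (by the at-least-30 rule of thumb) and that the population is effectively infinite, so no finite population correction is applied. The variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 20.0, 4.0, 64

# Mean and standard deviation of the sampling distribution of the sample mean.
mean_of_x_bar = mu
standard_error = sigma / math.sqrt(n)

# Approximately normal sampling distribution (assumed, since n >= 30).
sampling_dist = NormalDist(mean_of_x_bar, standard_error)

p_greater_than_21 = 1 - sampling_dist.cdf(21)
p_less_than_19_385 = sampling_dist.cdf(19.385)

print(f"mean = {mean_of_x_bar}, standard error = {standard_error}")
print(f"P(x_bar > 21)     = {p_greater_than_21:.4f}")
print(f"P(x_bar < 19.385) = {p_less_than_19_385:.4f}")
```

The two probabilities correspond to z = 2 and z = -1.23 on the standard normal scale.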
Exercise: Pizza Delivery
- When a pizza restaurant's delivery process is operating effectively, pizzas are delivered in an average of 45 minutes with a standard deviation of 6 minutes. To monitor its delivery process, the restaurant randomly selects five pizzas each night and records their delivery times. The population of all delivery times on a given evening is known to be normally distributed. Suppose that a sample gave a mean of 55 minutes. Would you say that the restaurant's delivery process is operating effectively?
Exercise: Bank Customer Waiting Time Case
- A bank manager wants to show that the new system reduces typical customer waiting times to less than six minutes. One way to do this is to demonstrate that the mean $\mu$ of the population of all customer waiting times is less than 6. We wish to investigate whether the sample of 100 waiting times provides evidence to support the claim that $\mu$ is less than 6. The mean of the sample of 100 waiting times is 5.46 minutes; assume that the standard deviation $\sigma$ of the population of all customer waiting times is known to be 2.47.
Exercise: Bank Customer Waiting Time Case
a) Consider the population of all sample means obtained from samples of 100 waiting times. What is the shape of this population of sample means?
b) Find the mean and standard deviation of the population of all possible sample means when we assume that $\mu$ is equal to 6.
c) The sample mean that we have actually observed is 5.46. Assuming that $\mu = 6$, find the probability of observing a sample mean that is less than or equal to 5.46.
d) Is it more reasonable to believe that $\mu = 6$ or that $\mu$ is less than 6? What do you conclude about whether the new system has reduced the typical customer waiting time to less than 6 minutes?
7.2 Sampling Distribution of the Sample Proportion
- The probability distribution of all possible sample proportions is the sampling distribution of the sample proportion
- If a random sample of size n is taken from a population, then the sampling distribution of the sample proportion $\hat{p}$
  - Is approximately normal, if n is large enough
    - n can be considered large if both $np$ and $n(1 - p)$ are at least 5
  - Has a mean that equals $\mu_{\hat{p}} = p$
  - Has standard deviation $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
- Here $p$ is the population proportion and $\hat{p}$ is the sample proportion
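The large-sample check and the standard deviation formula can be wrapped in two small helpers. The sketch below applies them to the St. Andrews housing example from the start of the chapter (population proportion .72, sample size 30) and ignores the finite population correction for simplicity. The function names are illustrative.

```python
import math


def is_large_enough(n, p):
    """Rule of thumb: both n*p and n*(1 - p) must be at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5


def proportion_standard_error(n, p):
    """Standard deviation of the sample proportion for samples of size n."""
    return math.sqrt(p * (1 - p) / n)


p = 0.72  # population proportion wanting on-campus housing
n = 30    # sample size

large_enough = is_large_enough(n, p)
se = proportion_standard_error(n, p)

print(large_enough, round(se, 4))
```

Here np = 21.6 and n(1 - p) = 8.4, so the normal approximation is considered acceptable under the rule of thumb.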
Sampling Distribution of the Sample Proportion
- If the population is finite and N/n < 20,
  - The finite population correction is
    $\sqrt{\frac{N - n}{N - 1}}$
  - The standard error of the proportion is
    $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$
Example: Sample Proportion
- Suppose that we will randomly select a sample of n = 100 units from a population and that we will compute the sample proportion $\hat{p}$ of these units that fall into a category of interest. If the true population proportion p equals 90%:
  - Describe the shape of the sampling distribution of $\hat{p}$
  - Find the mean and the standard deviation of the sampling distribution of $\hat{p}$
Example: Sample Proportion
- Calculate the following probabilities about the sample proportion $\hat{p}$. In each case sketch the sampling distribution and the probability.
  - $P(\hat{p} \ge 0.96)$
  - $P(0.855 \le \hat{p} \le 0.945)$
  - $P(\hat{p} \le 0.915)$
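A possible way to work this example is sketched below. The inequality directions are not legible in the source slides, so the code assumes the three probabilities are an upper tail at 0.96, the interval from 0.855 to 0.945, and a lower tail at 0.915. It also assumes a large population, so no finite population correction is used.

```python
import math
from statistics import NormalDist

n, p = 100, 0.90

# Large-sample check: both n*p and n*(1 - p) should be at least 5.
assert n * p >= 5 and n * (1 - p) >= 5

mean = p
standard_error = math.sqrt(p * (1 - p) / n)
sampling_dist = NormalDist(mean, standard_error)

# Assumed inequality directions (see the note above).
p_at_least_096 = 1 - sampling_dist.cdf(0.96)
p_between = sampling_dist.cdf(0.945) - sampling_dist.cdf(0.855)
p_at_most_0915 = sampling_dist.cdf(0.915)

print(f"mean = {mean}, standard error = {standard_error:.2f}")
print(f"upper tail at 0.96:       {p_at_least_096:.4f}")
print(f"between 0.855 and 0.945:  {p_between:.4f}")
print(f"lower tail at 0.915:      {p_at_most_0915:.4f}")
```

With a standard error of 0.03, these three cases correspond to z = 2, z between -1.5 and 1.5, and z = 0.5.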
Exercise: Bank of America
- Historically, the percentage of Bank of America customers expressing customer delight has been 48%. Suppose we want to justify the claim that more than 48% of all current Bank of America customers express customer delight. To do so, we surveyed 350 Bank of America customers, of whom 189 said they were delighted. Can we say that the percentage of customers expressing delight is, indeed, greater than 48%?