Faculdade de Ciências Económicas e Empresariais
UNIVERSIDADE CATÓLICA PORTUGUESA
MBACATÓLICA
Miguel Gouveia, Manuel Leite Monteiro
Quantitative Methods
MBACatólica 2006/07 Métodos Quantitativos 7-2
9. SAMPLING DISTRIBUTIONS
Problem
! A soft-drink vending machine is set so the amount of drink dispensed is a random variable with a mean of 200 milliliters and a standard deviation of 15 milliliters. What is the probability that the average amount dispensed in a random sample of 36 is at least 204 milliliters:
a) if the random variable is normally distributed?
b) if the distribution is unknown?
Distribution of the sample mean
! The sample mean (computed from n observations drawn from a population) is a random variable.
! Our objective is to study the distribution of that variable and to see how it is related to the distribution of the population from which the sample was drawn.
Distribution of the sample mean
! Example: samples (with replacement) of size n = 2 from a population with four values: 1, 2, 3, 4 (µ = 2.5 and σ² = 1.25).
! Possible samples: 16

Samples (first draw, second draw)    Sample means
1,1   1,2   1,3   1,4                1.0   1.5   2.0   2.5
2,1   2,2   2,3   2,4                1.5   2.0   2.5   3.0
3,1   3,2   3,3   3,4                2.0   2.5   3.0   3.5
4,1   4,2   4,3   4,4                2.5   3.0   3.5   4.0
Distribution of the sample mean
Sample mean    No. of samples    Probability
1.0            1                 1/16
1.5            2                 2/16
2.0            3                 3/16
2.5            4                 4/16
3.0            3                 3/16
3.5            2                 2/16
4.0            1                 1/16
Total          16                1
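The table of sample means can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming ordered samples drawn with replacement from the four-value population; the variable names are illustrative and not part of the original slides. It also computes the mean and variance of the sample mean so they can be compared with µ and σ²/n.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
n = 2  # sample size, drawn with replacement

# All 4**2 = 16 equally likely ordered samples and their means.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: probability of each value of the sample mean.
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Population parameters (population variance, divisor N).
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# Mean and variance of the sample mean.
e_xbar = sum(m * p for m, p in dist.items())
v_xbar = sum((m - e_xbar) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), p)
print("E[Xbar] =", float(e_xbar), "V[Xbar] =", float(v_xbar))
```

Exact fractions are used instead of floats so that the equalities E[X̄] = µ and V[X̄] = σ²/n can be checked without rounding error.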
Distribution of the sample mean
[Figure: two bar charts of f(x). Left: distribution of the population, x = 1, 2, 3, 4. Right: distribution of the sample mean, x̄ = 1.0, 1.5, ..., 4.0.]
Distribution of the sample mean
! The mean of the sample mean’s distribution is the mean of the population:
$$E[\bar{X}] = \sum \bar{x}\, f(\bar{x}) = 2.5 = \mu$$
! Concepts of mean being used:
– E[X̄]: expected value (parameter of the mean's distribution)
– X̄: random variable
– µ: parameter (parameter of the universe)
Distribution of the sample mean
! The standard deviation of the sample mean is:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
! In the example:
$$V[\bar{X}] = \sum (\bar{x} - \mu)^2 f(\bar{x}) = 0.625 = \frac{\sigma^2}{n} = \frac{1.25}{2}$$
! As the sample size (n) increases, the standard deviation of the mean decreases.
! As the population standard deviation (σ) decreases, the standard deviation of the mean also decreases.
Distribution of the sample mean
[Figure: two bar charts of f(x). Left: population (N = 4), with µ = 2.5 and σ² = 1.25. Right: sample mean (n = 2), with E[X̄] = 2.5 and V[X̄] = 0.625.]
Distribution of the sample mean
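$$E[\bar{X}] = E\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right] = \frac{\mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu$$
$$V[\bar{X}] = V\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right] = \frac{\sigma^2 + \sigma^2 + \dots + \sigma^2}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$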
Distribution of the sample mean for Normal Populations
! The linear combination of independent normal random variables is itself a normal random variable.
! Application: if X ~ N(µ, σ), then
$$\bar{X} = \sum_{i=1}^{n} X_i f_i \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
where f_i = 1/n is the weight of each observation and the second argument of N is the standard deviation.
! X×Y and X/Y do not have a normal distribution.
Solution
! X: quantity of soft drink dispensed, with µ = 200 and σ = 15. Sample size: n = 36.
a) Writing the variance as the second argument of N:
$$X \sim N(200, 15^2) \Rightarrow \bar{X} \sim N\left(200, \frac{15^2}{36}\right)$$
! Probability that the average amount is at least 204:
$$P[\bar{X} \ge 204] = P\left[\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \ge \frac{204-200}{15/\sqrt{36}}\right] = P[Z \ge 1.6] = 1 - 0.9452 = 5.48\%$$
And if the distribution was unknown?
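The tail probability for the vending machine can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library in place of the printed normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200.0, 15.0, 36   # population mean, sd, and sample size
threshold = 204.0                # we want P[Xbar >= 204]

se = sigma / sqrt(n)             # standard deviation of the sample mean
z = (threshold - mu) / se        # standardized threshold
p = 1.0 - NormalDist().cdf(z)    # upper-tail probability of N(0, 1)

print(f"z = {z:.2f}, P[Xbar >= {threshold}] = {p:.4f}")
```

The result agrees with the table lookup 1 - 0.9452 up to rounding.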
Central Limit Theorem
! The distribution of a random variable obtained as the sum (or mean) of n independent and identically distributed (i.i.d.) random variables approaches a normal distribution as n increases.
! This result holds regardless of the distribution of the population.
! If X1, X2, ..., Xn are n i.i.d. random variables with mean µ and variance σ², then, approximately for large n:
$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
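A quick simulation can make the theorem concrete. The sketch below is an illustration only: the choice of an exponential population (clearly not normal, with mean 1 and standard deviation 1), the seed, and the number of replications are all assumptions made for this example. It standardizes the mean of n = 36 draws many times and checks how often the result falls inside ±1.96.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)       # fixed seed so the run is reproducible
n = 36               # sample size
reps = 20_000        # number of simulated samples
mu = sigma = 1.0     # an exponential with rate 1 has mean 1 and sd 1

# Standardized sample means (Xbar - mu) / (sigma / sqrt(n)).
z_values = []
for _ in range(reps):
    xbar = fmean(random.expovariate(1.0) for _ in range(n))
    z_values.append((xbar - mu) / (sigma / sqrt(n)))

# If the CLT approximation is good, about 95% of values lie in (-1.96, 1.96).
share = sum(abs(z) < 1.96 for z in z_values) / reps
print(f"mean = {fmean(z_values):.3f}, sd = {pstdev(z_values):.3f}, "
      f"share inside +/-1.96 = {share:.3f}")
```

Rerunning with a smaller n shows a visibly worse approximation, which is the point of the "large enough" rule of thumb.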
Central Limit Theorem
As the sample size increases…
…the distribution of the sample mean becomes almost Normal, independently of the population’s distribution.
Central Limit Theorem
! What sample size (n) is “large enough”?
– For most population distributions, n>30
– For distributions that are fairly symmetric, n>15 may suffice
– For populations that are normally distributed, the sampling distribution of the mean will always be normally distributed, regardless of the sample size.
Solution
! X: quantity of soft drink dispensed, with µ = 200 and σ = 15. Sample size: n = 36.
b) Since n is "large", the Central Limit Theorem gives, approximately:
$$\bar{X} \sim N\left(200, \frac{15^2}{36}\right)$$
! Probability that the average amount is at least 204:
$$P[\bar{X} \ge 204] = P\left[\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \ge \frac{204-200}{15/\sqrt{36}}\right] \approx P[Z \ge 1.6] = 1 - 0.9452 = 5.48\%$$
10. INTRODUCTION TO STATISTICAL INFERENCE
Statistical Inference
11. Point Estimation
12. Confidence Intervals
13. Hypothesis Tests
Problem
! BankX plans to launch a new financial product different from all the existing ones. A sample of 25 potential investors provided the following information regarding the amount they wish to invest in the new product (normally distributed): Σxi = 1000 and Σ(xi − x̄)² = 9600.
a) Compute a point estimate for the average amount invested.
b) Compute a 90% confidence interval for the average amount invested.
Parameters and Statistics
! Parameter: is a numerical value that characterizes the distribution or the universe studied.
! Estimator: is a random variable that can take different values depending on the particular sample drawn.
! Estimate: is a number that is obtained from a specific sample.
11. Point Estimation
Estimators for the mean, variance and proportion
Population’s parameter       Estimator    Estimate
Mean                 µ       X̄            x̄
Variance             σ²      S²           s²
Proportion           p       fn           (fn)
Standard deviation   σ       S            s
Estimator’s properties
! Unbiasedness
An estimator is unbiased if the mean of its distribution equals the parameter.
! Efficiency
An unbiased estimator is the most efficient if its variance (around the parameter) is minimal.
! Consistency
An estimator is consistent if, as the sample size increases, its mean approaches the parameter and its variance decreases towards zero.
[Figure: Unbiasedness. Sampling distributions f(·) of an unbiased estimator, centred on µ, and of a biased estimator.]
[Figure: Efficiency. Sampling distribution of the mean and sampling distribution of the median, both centred on µ.]
[Figure: Consistency. Sampling distributions f(·) around µ for a small sample and for a large sample.]
Solution
a) Compute a point estimate for the average amount invested.
n = 25; x̄ = 1000/25 = 40; s² = 9600/24 = 400.
Point estimate: µ̂ = x̄ = 1000/25 = 40
b) Compute a 90% confidence interval for the average amount invested.
12. CONFIDENCE INTERVALS
Point Estimation vs. Confidence Intervals
[Figure: a random sample is drawn from a population whose mean, µ, is unknown. The sample has mean x̄ = 50, leading to the statement: "I’ve got 95% confidence that µ is located between 40 and 60."]
Confidence Intervals for the mean
! Example for a Normal population (or for “large” samples)
As
$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
we have
$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
Thus:
$$P\left[-1.96 < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < 1.96\right] = 0.95$$
Confidence Intervals for the mean
which can also be written as:
$$P\left[\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right] = 0.95$$
! So, we have a 95% confidence interval for the mean:
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
Interpretation of a (1-α)% confidence interval
! (1-α)% is the percentage of confidence intervals,
– from successive samples,
– all with size n,
– drawn from the same population
that include the true value of the parameter being estimated.
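This repeated-sampling interpretation can be illustrated by simulation. The sketch below is an assumption-laden example, not part of the original slides: it reuses the vending machine numbers (µ = 200, σ = 15, n = 36) as a hypothetical Normal population, draws many samples, builds a 95% interval from each with σ known, and counts how many intervals contain µ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)                     # fixed seed so the run is reproducible
mu, sigma, n = 200.0, 15.0, 36     # hypothetical population and sample size
conf = 0.95

# z_{alpha/2} and the half-width of the interval (sigma known).
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
half_width = z * sigma / sqrt(n)

reps = 10_000
covered = 0
for _ in range(reps):
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    # Does this sample's interval contain the true mean?
    if xbar - half_width < mu < xbar + half_width:
        covered += 1

coverage = covered / reps
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should be close to 0.95; any single interval either contains µ or it does not.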
Interpretation of a (1-α)% confidence interval
[Figure: confidence intervals for 10 different samples, drawn below the sampling distribution of x̄, which is centred on E[X̄] = µ and has area 1 − α between µ − z_{α/2} σ/√n and µ + z_{α/2} σ/√n, with α/2 in each tail.]
(1 − α)% of the intervals contain µ and α% don’t.
(1-α)% CI for the mean: Normal Pop. (or n large) and σ known
! For a Normal population (or large n) with σ known:
1. Define the level of confidence (1-α)%
2. Collect a sample with size n. Compute x̄
3. Obtain z_{α/2} from the statistical tables
4. The confidence interval is given by:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Solution
b) Compute a 90% CI for the average amount invested.
n = 25; x̄ = 1000/25 = 40; s² = 9600/24 = 400, so s = 20.
$$\alpha = 10\% \Rightarrow z_{0.05} = 1.645$$
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \Rightarrow 40 \pm 1.645 \times \frac{20}{\sqrt{25}}$$
Using s = 20 in place of σ, this expression gives 40 ± 6.58, i.e. (33.42, 46.58).
CI for µ: (33.156, 46.844). This reported interval corresponds to 40 ± 1.711 × 20/√25, where 1.711 is the Student's t value with 24 degrees of freedom. That is the appropriate critical value here, because σ is estimated by s from a Normal sample of only 25 observations.
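The BankX numbers can be reproduced from the two summary statistics alone. This is a sketch under stated assumptions: the normal critical value comes from `statistics.NormalDist`, while the Student's t value for 24 degrees of freedom (1.711) is hard-coded from a standard printed table rather than computed, because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist

n = 25
sum_x = 1000.0        # sum of the observations
sum_sq_dev = 9600.0   # sum of squared deviations from the sample mean

xbar = sum_x / n              # point estimate of the mean
s2 = sum_sq_dev / (n - 1)     # unbiased estimate of the variance
s = sqrt(s2)
se = s / sqrt(n)              # estimated standard error of the mean

# 90% confidence: alpha/2 = 0.05 in each tail.
z = NormalDist().inv_cdf(0.95)   # about 1.645
t_24 = 1.711                     # assumption: table value, t with 24 df

z_interval = (xbar - z * se, xbar + z * se)
t_interval = (xbar - t_24 * se, xbar + t_24 * se)

print("xbar =", xbar, "s2 =", s2)
print("z interval:", z_interval)
print("t interval:", t_interval)
```

The t interval is slightly wider than the z interval, reflecting the extra uncertainty from estimating σ.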
Conflict between credibility and precision
! Credibility – Confidence level of an interval
! Precision – Width of the confidence interval
! For a given sample size n:
– More precision means a narrower interval, which implies a lower level of confidence.
– A higher level of confidence implies a wider interval (less precision).
! The only way to increase the precision and the credibility of the inference simultaneously is to increase n.
Problem
! A vending machine is calibrated to pour a quantity of liquid that follows a normal distribution with variance equal to 16 ml². In a sample of 25 drinks, the average was x̄ = 250 ml.
We want:
a) To construct a 95% confidence interval for the true average quantity of liquid in the served drinks;
b) To determine how many drinks should be included in a new sample, if the precision is to be increased so that the interval width is 2 ml.
Solution
a)
$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$$
$$250 - 1.96\frac{4}{\sqrt{25}} < \mu < 250 + 1.96\frac{4}{\sqrt{25}}$$
$$248.432 < \mu < 251.568$$
The width of the interval is 3.136 ml.
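The same calculation can be wrapped in a small helper. The function name `z_interval` and its signature are hypothetical, introduced here only for illustration; the sketch assumes σ is known and uses `statistics.NormalDist` for the critical value.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Vending machine: variance 16 ml^2, so sigma = 4 ml; n = 25; xbar = 250 ml.
low, high = z_interval(250.0, sqrt(16.0), 25)
print(f"{low:.3f} < mu < {high:.3f}, width = {high - low:.3f} ml")
```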
Solution
b)
$$\text{Width} = 2\, z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$2 \times 1.96 \times \frac{4}{\sqrt{n}} = 2 \Rightarrow \sqrt{n} = 7.84 \Rightarrow n = 62$$
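Solving the width equation for n and rounding up can be sketched as follows. The helper name `n_for_width` is an assumption made for this example.

```python
from math import ceil
from statistics import NormalDist


def n_for_width(sigma, width, conf=0.95):
    """Smallest n whose z interval (sigma known) is no wider than `width`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    # width = 2 * z * sigma / sqrt(n)  =>  n = (2 * z * sigma / width) ** 2
    return ceil((2 * z * sigma / width) ** 2)


n_needed = n_for_width(sigma=4.0, width=2.0)
print("drinks needed:", n_needed)
```

Rounding up rather than to the nearest integer matters: 61 drinks would give an interval slightly wider than 2 ml.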
Problem
! Ten analysts have given the following year earnings forecasts for a stock, which are normally distributed:
Compute a 95% confidence interval for the population mean of the forecasts.
Forecast ( ) Number of analysts ( )1.40 11.43 11.44 31.45 21.47 11.48 11.50 1
i iX n
MBACatólica 2006/07 Métodos Quantitativos 7-47
Population’s Variance unknown
! Until now we have assumed that the variance of the population was known. However, it usually is unknown and has to be estimated.
! We know that
$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$
is an unbiased estimator for the population variance:
$$E[S^2] = \sigma^2$$
Distribution of the sample mean from a Normal population with unknown σ
! If the population is Normal, is the sample mean distribution still given by
$$\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim N(0,1)$$
?
For small samples the answer is NO!
Distribution of the sample mean from a Normal population with unknown σ
! With σ unknown, we have a “t” distribution:
$$\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)$$
where:
$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$
t distribution (Student’s distribution)
[Figure: densities of Normal (0,1), t (df = 13) and t (df = 5), plotted against z, t and centred on 0.]
! Also bell shaped
! Also symmetric
! But with wider tails
Student’s t distribution
Values of x for each cumulative probability F(x), by degrees of freedom (n):

n      0.90     0.95     0.975     0.99      0.995
1      3.078    6.314    12.706    31.821    63.656
2      1.886    2.920    4.303     6.965     9.925
3      1.638    2.353    3.182     4.541     5.841
4      1.533    2.132    2.776     3.747     4.604
5      1.476    2.015    2.571     3.365     4.032
6      1.440    1.943    2.447     3.143     3.707
7      1.415    1.895    2.365     2.998     3.499
8      1.397    1.860    2.306     2.896     3.355
9      1.383    1.833    2.262     2.821     3.250
10     1.372    1.812    2.228     2.764     3.169
11     1.363    1.796    2.201     2.718     3.106
12     1.356    1.782    2.179     2.681     3.055
13     1.350    1.771    2.160     2.650     3.012
14     1.345    1.761    2.145     2.624     2.977
15     1.341    1.753    2.131     2.602     2.947
...
26     1.315    1.706    2.056     2.479     2.779
27     1.314    1.703    2.052     2.473     2.771
28     1.313    1.701    2.048     2.467     2.763
29     1.311    1.699    2.045     2.462     2.756
inf    1.282    1.645    1.960     2.326     2.576

Example: for t (df = 3), F(3.182) = 0.975, which leaves 1 − 0.975 in the upper tail.
(1-α)% CI for the mean: Normal Pop. and σ unknown
! For a Normal population with σ unknown:
1. Define the level of confidence (1-α)%
2. Collect a sample with size n. Compute x̄ and s
3. Obtain $t_{\alpha/2}^{(n-1)}$ from the statistical tables
4. The confidence interval is given by:
$$\bar{x} - t_{\alpha/2}^{(n-1)}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}^{(n-1)}\frac{s}{\sqrt{n}}$$
Solution
$$\bar{x} = 1.45; \quad s = 0.02789; \quad n = 10; \quad df = 9$$
$$t_{0.025}^{(9)} = 2.262$$
$$1.45 - 2.262\frac{0.02789}{\sqrt{10}} \le \mu \le 1.45 + 2.262\frac{0.02789}{\sqrt{10}}$$
$$1.43 \le \mu \le 1.47$$
! For a 99% confidence level, the interval would be:
$$1.45 - 3.250\frac{0.02789}{\sqrt{10}} \le \mu \le 1.45 + 3.250\frac{0.02789}{\sqrt{10}}$$
$$1.421 \le \mu \le 1.479$$
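The analysts' example can be recomputed from the frequency table. In this sketch the critical values are not computed: they are copied from the df = 9 row of the Student's t table (columns 0.975 and 0.995), since the Python standard library has no t distribution. The dictionary and function names are illustrative.

```python
from math import sqrt

# Forecast value -> number of analysts who gave it.
forecasts = {1.40: 1, 1.43: 1, 1.44: 3, 1.45: 2, 1.47: 1, 1.48: 1, 1.50: 1}

n = sum(forecasts.values())
xbar = sum(x * k for x, k in forecasts.items()) / n
s2 = sum(k * (x - xbar) ** 2 for x, k in forecasts.items()) / (n - 1)
s = sqrt(s2)

# Table values for df = n - 1 = 9 (assumption: read off the printed table).
t_crit = {0.95: 2.262, 0.99: 3.250}


def t_interval(conf):
    """Confidence interval for the mean, sigma unknown, Normal population."""
    half_width = t_crit[conf] * s / sqrt(n)
    return xbar - half_width, xbar + half_width


print(f"n = {n}, xbar = {xbar:.2f}, s = {s:.5f}")
for conf in (0.95, 0.99):
    low, high = t_interval(conf)
    print(f"{conf:.0%}: {low:.3f} <= mu <= {high:.3f}")
```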
Distribution of the sample mean

Population    σ          n       Distribution
Normal        Known      n<30    (X̄ − µ)/(σ/√n) ~ N(0,1)
Normal        Known      n≥30    (X̄ − µ)/(σ/√n) ~ N(0,1)
Normal        Unknown    n<30    (X̄ − µ)/(S/√n) ~ t(n−1)
Normal        Unknown    n≥30    (X̄ − µ)/(S/√n) ~ N(0,1) (CLT)
Not Normal    Known      n<30    We don’t know the distribution
Not Normal    Known      n≥30    (X̄ − µ)/(σ/√n) ~ N(0,1) (CLT)
Not Normal    Unknown    n<30    We don’t know the distribution
Not Normal    Unknown    n≥30    (X̄ − µ)/(S/√n) ~ N(0,1) (CLT)
Confidence interval for a proportion
! The true proportion of a population is p.
The estimator of p is the proportion in the sample, i.e., f_n = X/n, where X is a binomial variable:
$$E[f_n] = \frac{1}{n}E[X] = \frac{np}{n} = p$$
$$V[f_n] = \frac{1}{n^2}V[X] = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$
Confidence interval for a proportion
! For a large n:
$$\frac{f_n - p}{\sqrt{p(1-p)/n}} \sim N(0,1)$$
! The confidence interval is given by:
$$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1-f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1-f_n)}{n}}$$
(1-α)% CI for a proportion: with large samples
1. Define the level of confidence (1-α)%
2. Collect a sample of size n. Compute f_n
3. Obtain z_{α/2} from the statistical tables
4. The confidence interval is given by:
$$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1-f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1-f_n)}{n}}$$
Problem
! We want to estimate the proportion of voters for a political party. 400 citizens were interviewed and 140 of them revealed the intention to vote for that party. Compute a 99% confidence interval for the proportion of votes for that party.
Solution
/ 2
400
140 / 400 0.35, 1 0.65
1 0.99, / 2 0.005, 2.57
0.35*0.65 0.35*0.650.35 2.57 0.35 2.57
400 4000.28871 0.41129
n n
n
f f
z
p
p
αα α
== = − =
− = = =
− ≤ ≤ +
≤ ≤
MBACatólica 2006/07 Métodos Quantitativos 7-61
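The voting example can be checked with a few lines. This sketch deliberately uses the rounded critical value 2.57 from the solution so that the output matches the slide; the more precise value is about 2.576, which would widen the interval slightly. Variable names are illustrative.

```python
from math import sqrt

n = 400     # citizens interviewed
x = 140     # of whom intend to vote for the party

f_n = x / n        # sample proportion
z = 2.57           # assumption: rounded table value used in the solution

# Large-sample interval: f_n +/- z * sqrt(f_n * (1 - f_n) / n)
half_width = z * sqrt(f_n * (1 - f_n) / n)
low, high = f_n - half_width, f_n + half_width

print(f"f_n = {f_n:.2f}")
print(f"{low:.5f} <= p <= {high:.5f}")
```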
Selection of the sample size
! The sample size is a decision variable reflecting a conflict between precision and the cost of sampling.
Very large:
• Too expensive
Very small:
• Imprecise results
Selection of the sample size
! Question: for a desirable minimum precision, what should be the minimum sample size to be drawn?
The choice of n is affected by 3 factors:
1. The level of precision or the level of margin of error (interval width)
2. Level of confidence
3. The dispersion of the population
Sample size: Estimation of a proportion
! Since the confidence interval is given by:
$$f_n - z_{\alpha/2}\sqrt{\frac{f_n(1-f_n)}{n}} < p < f_n + z_{\alpha/2}\sqrt{\frac{f_n(1-f_n)}{n}}$$
it can also be written as
$$f_n - e < p < f_n + e$$
with e being the margin of error.
Sample size: Estimation of a proportion
! Fixing e, it is possible to obtain n as:
$$n = (z_{\alpha/2})^2\,\frac{f_n(1-f_n)}{e^2}$$
! BUT: the value of f_n is unknown before the sample is drawn.
The value used for f_n should be the one that maximizes p(1-p), i.e., f_n = 0.5.
Problem
! Determine the minimum size of a sample in order to compute a 95% confidence interval for the proportion of consumers who are willing to buy a new product, with a margin of error of one percentage point.
! Recompute that sample size if you were sure that, given the high price of the product, no more than 25% of consumers would buy it.
Solution
$$e = 0.01, \quad \alpha = 5\%, \quad z_{\alpha/2} = 1.96$$
$$n = 1.96^2 \times \frac{0.5 \times 0.5}{0.01^2} = 9604$$
! If we knew “a priori” that p<0.25, then
$$n = 1.96^2 \times \frac{0.25 \times 0.75}{0.01^2} = 7203$$
Sample size: Estimation of the mean
! The confidence interval is given by:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
and it can be written as:
$$\bar{x} - e < \mu < \bar{x} + e$$
Thus:
$$n = (z_{\alpha/2})^2\,\frac{\sigma^2}{e^2}$$
Sample size: Estimation of the mean
! If σ is unknown:
1. Collect a pilot sample, with a smaller size, to estimate σ.
2. If the population is approximately normal:
P[µ − 2σ < X < µ + 2σ] ≈ 0.95 and P[µ − 3σ < X < µ + 3σ] ≈ 0.997
Therefore (and using past data or subjective evaluations of the population), we can “estimate”:
i. σ = (Percentile 97.5 − Percentile 2.5)/4
ii. σ = (MAX − MIN)/6
Problem
! Suppose you want to estimate the population mean of the analysts' forecasts for next year's stock earnings to within ± 0.01 with 95% confidence. On the basis of past studies, you believe the standard deviation of those forecasts to be 0.03. Find the minimum sample size needed.
Solution
$$e = 0.01, \quad \sigma = 0.03, \quad \alpha = 5\%, \quad z_{\alpha/2} = 1.96$$
$$n = 1.96^2 \times \frac{0.03^2}{0.01^2} = 34.6$$
We need at least 35 forecasts in our sample.
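The same calculation as a short sketch, with a hypothetical helper `n_for_mean` and the table value z = 1.96 assumed for 95% confidence:

```python
from math import ceil


def n_for_mean(e, sigma, z=1.96):
    """Minimum n so that z * sigma / sqrt(n) <= e (sigma treated as known)."""
    raw = z ** 2 * sigma ** 2 / e ** 2
    # Round away floating-point noise, then round up to a whole observation.
    return ceil(round(raw, 6))


n_needed = n_for_mean(e=0.01, sigma=0.03)
print("forecasts needed:", n_needed)
```

Halving the margin of error to 0.005 would roughly quadruple the required sample, since n grows with 1/e².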