Review #2
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 9
• A statistic is a random variable describing a characteristic of a random sample.
  – Sample mean
  – Sample variance
• We use statistic values in inferential statistics (making inferences about population characteristics from sample characteristics).
• Statistics have distributions of their own.
The Central Limit Theorem
– The distribution of the sample mean is normal if the parent distribution is normal.
– The distribution of the sample mean approaches the normal distribution for sufficiently large samples (n ≥ 30), even if the parent distribution is not normal.
– The parameters of the sampling distribution of the mean are:
  • Mean: $\mu_{\bar{x}} = \mu_x$
  • Standard deviation: $\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$
Problem 1
• Given a normal population whose mean is 50 and whose standard deviation is 5,
  – Find the probability that a random sample of 4 has a mean between 49 and 52.
  – Answer:
$P(49 < \bar{x} < 52) = P\left(\frac{49 - 50}{5/\sqrt{4}} < Z < \frac{52 - 50}{5/\sqrt{4}}\right) = P(-.4 < Z < .8) = .7881 - .3446 = .4435$
Problem 2
– Find the probability that a random sample of 16 has a mean between 49 and 52.
– Answer:
$P(49 < \bar{x} < 52) = P\left(\frac{49 - 50}{5/\sqrt{16}} < Z < \frac{52 - 50}{5/\sqrt{16}}\right) = P(-.8 < Z < 1.6) = .9452 - .2119 = .7333$
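The two answers above can be checked numerically. The following is a minimal sketch that assumes only Python's standard library; the helper name `prob_mean_between` is illustrative and not part of the slides.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi) when the sample mean is Normal(mu, sigma/sqrt(n))."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)


# Population: mean 50, standard deviation 5 (from the problem statement).
p_n4 = prob_mean_between(49, 52, mu=50, sigma=5, n=4)
p_n16 = prob_mean_between(49, 52, mu=50, sigma=5, n=16)
print(round(p_n4, 4), round(p_n16, 4))
```

The larger sample concentrates the sample mean around 50, so the probability of landing between 49 and 52 rises as n grows.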
Problem 2
• The amount of time per day spent by adults watching TV is normally distributed with μ = 6 and σ = 1.5 hours.
  – What is the probability that a randomly selected adult watches TV for more than 7 hours a day?
  – Answer:
$P(X > 7) = P\left(Z > \frac{7 - 6}{1.5}\right) = P(Z > .67) = 1 - .7486 = .2514$
  – What is the probability that 5 adults watch TV for 7 or more hours a day on average?
  – Answer:
$P(\bar{X} > 7) = P\left(Z > \frac{7 - 6}{1.5/\sqrt{5}}\right) = P(Z > 1.49) = 1 - .9319 = .0681$
Problem 2
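• Additional question
  – What is the probability that the total TV watching time of the five adults sampled will exceed 28 hours?
  – Answer: a total above 28 hours for 5 adults is the same event as a sample mean above 28/5 = 5.6 hours.
$P\left(\sum X_i > 28\right) = P(\bar{X} > 28/5) = P\left(Z > \frac{5.6 - 6}{1.5/\sqrt{5}}\right)$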
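The three TV-watching probabilities can be evaluated in a few lines. This is a sketch assuming only Python's standard library; it uses unrounded z values, so the results differ slightly from answers read off a Z-table with z rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.0, 1.5, 5          # values from the problem statement

single_adult = NormalDist(mu, sigma)             # distribution of one adult's time
mean_of_five = NormalDist(mu, sigma / sqrt(n))   # sampling distribution of the mean

p_single = 1 - single_adult.cdf(7)        # P(X > 7)
p_mean = 1 - mean_of_five.cdf(7)          # P(sample mean > 7)
p_total = 1 - mean_of_five.cdf(28 / n)    # P(total > 28) = P(sample mean > 5.6)

print(round(p_single, 4), round(p_mean, 4), round(p_total, 4))
```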
Sampling distribution of the sample proportion
• In a sample of size n, if np > 5 and n(1 − p) > 5, then the sample proportion $\hat{p} = x/n$ is approximately normally distributed with the following parameters:
$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$, therefore $Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
Problem 3
• A commercial of a household appliances manufacturer claims that less than 5% of all of its products require a service call in the first year.
• A survey of 400 households that recently purchased the manufacturer products was conducted to check the claim.
Problem 3
– Assuming the manufacturer is right, what is the probability that more than 10% of the surveyed households require a service call within the first year?
$P(\hat{p} > .10) = P\left(Z > \frac{.10 - .05}{\sqrt{.05(1 - .05)/400}}\right) = P(Z > 4.59) \approx 0$
– If indeed 10% of the sampled households reported a call for service within the first year, what does it tell you about the manufacturer's claim?
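As a rough numerical check of Problem 3, the sketch below assumes Python's standard library and takes the manufacturer's claim at its boundary value p = .05, as the slide does. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.05, 400                      # claimed proportion and sample size

# Conditions for the normal approximation: np > 5 and n(1 - p) > 5.
approximation_ok = n * p > 5 and n * (1 - p) > 5

std_error = sqrt(p * (1 - p) / n)     # standard deviation of the sample proportion
z = (0.10 - p) / std_error            # standardized value of p-hat = .10
p_tail = 1 - NormalDist().cdf(z)      # P(p-hat > .10) under the claim

print(approximation_ok, round(z, 2), p_tail)
```

A tail probability this small suggests that a 10% sample rate would be very hard to reconcile with the claim.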
Chapter 10
• A population’s parameter can be estimated by a point estimator and by an interval estimator.
• A confidence interval with a 1 − α confidence level is an interval estimator that covers the estimated parameter 100(1 − α)% of the time.
• Confidence intervals are constructed using sampling distributions.
Confidence interval of the mean
• We use the central limit theorem to build the following confidence interval
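$LCL = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \quad UCL = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
[Figure: standard normal curve with area 1 − α between −z_{α/2} and z_{α/2}, and area α/2 in each tail.]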
Problem 4
• How many classes do university students miss each semester? A survey of 100 students was conducted (see Missed Classes).
• Assuming the standard deviation of the number of classes missed is 2.2, estimate the mean number of classes missed per student.
• Use 99% confidence level.
Problem 4
– Solution
  1 − α = .99, so α = .01, α/2 = .005, and $z_{\alpha/2} = z_{.005} = 2.575$
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 10.21 \pm 2.575\,\frac{2.2}{\sqrt{100}} = 10.21 \pm .57$
  LCL = 9.64, UCL = 10.78
Missed classes
Mean                 10.21
Standard Error       0.21755993
Median               10
Mode                 10
Standard Deviation   2.1755993
Sample Variance      4.73323232
Kurtosis             0.91111511
Skewness             -0.107237
Range                14
Minimum              3
Maximum              17
Sum                  1021
Count                100
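The interval in Problem 4 can be reproduced as follows. This is a sketch assuming Python's standard library; `z_interval` is an illustrative helper, and it uses the exact normal quantile (about 2.576) rather than the table value 2.575.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Sample mean 10.21, sigma 2.2, n = 100, 99% confidence (from Problem 4).
lcl, ucl = z_interval(10.21, 2.2, 100, 0.99)
print(round(lcl, 2), round(ucl, 2))
```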
Selecting the sample size
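• The shorter the confidence interval, the more precise the estimate.
• We can, therefore, limit the half-width of the interval (the margin of error) to W, and get
$\bar{x} \pm W = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, or $W = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• From here we have
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$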
Problem 5
• An operation manager wants to estimate the average amount of time needed by a worker to assemble a new electronic component.
• Sigma is known to be 6 minutes.
• The required estimate accuracy is within 20 seconds.
• The confidence level is 90%; 95%.
• Find the sample size.
Problem 5
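– Solution: σ = 6 min; W = 20 sec = 1/3 min
  • 1 − α = .90, $z_{\alpha/2} = z_{.05} = 1.645$
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{1.645(6)}{1/3}\right)^2 = 876.75$; take n = 877
  • 1 − α = .95, $z_{\alpha/2} = z_{.025} = 1.96$
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{1.96(6)}{1/3}\right)^2 = 1244.68$; take n = 1245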
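The sample-size formula can be sketched in code. The example assumes Python's standard library and rounds up to the next whole observation; `sample_size` is an illustrative name, and W is the allowed margin of error in the same units as σ.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, w, confidence):
    """Smallest n for which z_{alpha/2} * sigma / sqrt(n) does not exceed w."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / w) ** 2)


# sigma = 6 minutes, W = 20 seconds = 1/3 minute (from Problem 5).
n_90 = sample_size(6, 1 / 3, 0.90)
n_95 = sample_size(6, 1 / 3, 0.95)
print(n_90, n_95)
```

Raising the confidence level from 90% to 95% with the same W requires a noticeably larger sample.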
Chapter 11
• Hypotheses tests
  – In hypothesis tests we hypothesize on a value of a population parameter, and test to see if there is sufficient evidence to support our belief.
  – The structure of a hypothesis test:
    • Formulate two hypotheses.
      – H0: The null hypothesis, the one we try to reject in favor of …
      – H1: The alternative hypothesis, the one we try to prove.
    • Define a significance level α.
Hypotheses tests
– The significance level is the probability of erroneously rejecting the null hypothesis:
  α = P(reject H0 when H0 is true)
– Sample from the population and calculate a statistic that provides an indication whether or not the parameter value defined under H1 is more probable.
– We shall test the population mean assuming the standard deviation is known.
Problem 6
• A machine is set so that the average diameter of ball bearings it produces is .50 inch. In a sample of 100 ball bearings the mean diameter was .51 inch. Assuming the standard deviation is .05 inch, can we conclude at the 5% significance level that the mean diameter is not .50 inch?
Problem 6
• The population studied is the ball-bearing diameters.
• We hypothesize on the population mean.
• A good point estimator for the population mean is the sample mean.
• We use the distribution of the sample mean to build a sample statistic to test whether μ = .50 inch.
Problem 6
• Solution
  – Define the hypotheses:
    • H0: μ = .50
    • H1: μ ≠ .50
  – Define a rejection region. Note that this is a two-tail test because of the inequality.
$P(\bar{X} > \bar{X}_{L1} \text{ or } \bar{X} < \bar{X}_{L2} \mid \mu = .50) = .05$ (the probability of a type one error)
$P(Z > Z_{L1} \text{ or } Z < Z_{L2} \mid \mu = .50) = .05$
  – Let us take a symmetrical rejection area: $Z_{L2} = -Z_{L1}$
Problem 6
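$P(Z > Z_{.025} \text{ or } Z < -Z_{.025} \mid \mu = .50) = .05$
– Calculate the value of the sample Z statistic and compare it to the critical value.
– Critical Z: Z.025 = 1.96 (obtained from the Z-table)
– Build a rejection region: $Z_{sample} > Z_{\alpha/2}$, or $Z_{sample} < -Z_{\alpha/2}$
$Z_{sample} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{.51 - .50}{.05/\sqrt{100}} = 2$
– Since 2 > 1.96, there is sufficient evidence to reject H0 in favor of H1 at the 5% significance level.
[Figure: standard normal curve with rejection regions below −1.96 and above 1.96.]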
Problem 6
• We can perform the test in terms of the mean value.
• Let us find the critical mean values for rejection
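$\bar{X}_{L1} = \mu_0 + Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 + 1.96\,\frac{.05}{\sqrt{100}} = .5098$
$\bar{X}_{L2} = \mu_0 - Z_{.025}\frac{\sigma}{\sqrt{n}} = .50 - 1.96\,\frac{.05}{\sqrt{100}} = .4902$
• Since .51 > .5098, there is sufficient evidence to reject the null hypothesis at the 5% significance level.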
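Both versions of the Problem 6 test (standardized statistic and critical mean values) can be sketched together. The code assumes Python's standard library; `two_tail_z_test` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Two-tail Z test of H0: mu = mu0 with known sigma."""
    std_error = sigma / sqrt(n)
    z_sample = (xbar - mu0) / std_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The same rejection region expressed in terms of the sample mean.
    lower = mu0 - z_crit * std_error
    upper = mu0 + z_crit * std_error
    reject = abs(z_sample) > z_crit
    return z_sample, z_crit, (lower, upper), reject


z_sample, z_crit, limits, reject = two_tail_z_test(0.51, 0.50, 0.05, 100, 0.05)
print(round(z_sample, 2), round(z_crit, 2), [round(v, 4) for v in limits], reject)
```

The two formulations always agree: the sample mean falls outside the critical mean values exactly when the standardized statistic falls outside ±Z.025.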
Problem 7
• The average annual return on investment for American banks was found to be 10.2% with standard deviation of 0.8%.
• It is believed that banks that exercise comprehensive planning do better.
• A sample of 26 banks that conducted comprehensive planning provided the following result: Mean return = 10.5%.
• Can we infer that the belief about bank performance is supported at 10% significance level by this sample result?
Problem 7
– The population tested is the “annual rate of return.”
  H0: μ = 10.2
  H1: μ > 10.2
– Let us perform the test with the p-value method:
$P(\bar{X} > 10.5 \mid \mu = 10.2) = P\left(Z > \frac{10.5 - 10.2}{.8/\sqrt{26}}\right) = P(Z > 1.91) = 1 - .9719 = .0281$
– Since .0281 < .10 we reject the null hypothesis at the 10% significance level.
Problem 7
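• Note the equivalence between the standardized (rejection region) method and the p-value method.
• P(Z > Z.10) = .10, so Z.10 = 1.28. The sample statistic 1.91 exceeds 1.28 exactly when the p-value .0281 is below .10.
• Run the test with Data Analysis Plus. See data in Return.
[Figure: standard normal curve marking the critical value 1.28, the sample statistic 1.91, and the p-value area .0281 to its right.]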
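The equivalence of the two methods in Problem 7 can be illustrated numerically. This sketch assumes Python's standard library and uses unrounded values, so the p-value comes out near .028 rather than exactly the table figure.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 10.2, 0.8, 26, 10.5, 0.10   # from Problem 7

z_sample = (xbar - mu0) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z_sample)     # right-tail test
z_crit = NormalDist().inv_cdf(1 - alpha)     # Z_.10

reject_by_p_value = p_value < alpha
reject_by_region = z_sample > z_crit
print(round(z_sample, 2), round(p_value, 4), round(z_crit, 2))
```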
Type II Error
• Type II error occurs when H0 is erroneously not rejected.
• The probability of a type II error is called β: β = P(Do not reject H0 when H1 is true)
• To calculate β:
  – H1 specifies an actual parameter value (not a range of values). Example: H0: μ = 100; H1: μ = 110
  – The critical value is expressed in original terms (not in standard terms).
Problem 7a
• What is the probability you’ll believe the mean return in problem 7 is 10.2% while actually it’s 10.6%, if the sample provided a mean return of 10.5%?
• Solution
  – The two hypotheses are:
    H0: μ = 10.2
    H1: μ = 10.6
  – H0 is not rejected (we believe μ = 10.2) if the sample mean is less than a critical value.
  – Therefore, the probability required is: $\beta = P(\bar{X} < \bar{X}_{cr} \mid \mu = 10.6)$.
Problem 7a
• The critical value is (recall, this problem was a case of a right hand tail test, with 10% significance level):
$\bar{X}_L = \mu + Z_{.10}\frac{\sigma}{\sqrt{n}} = 10.2 + 1.28\,\frac{.8}{\sqrt{26}} = 10.40$
$\beta = P(\bar{X} < 10.4 \mid \mu = 10.6) = P\left(Z < \frac{10.4 - 10.6}{.8/\sqrt{26}}\right) = P(Z < -1.27) = .102$
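The type II error calculation follows two steps: find the critical mean under H0, then evaluate its probability under H1. A sketch assuming Python's standard library, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 10.2, 10.6            # H0 and H1 values of the mean
sigma, n, alpha = 0.8, 26, 0.10
std_error = sigma / sqrt(n)

# Step 1: critical value in original units for a right-tail test.
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_error

# Step 2: probability of landing below it when the true mean is mu1.
beta = NormalDist(mu1, std_error).cdf(x_crit)
print(round(x_crit, 2), round(beta, 3))
```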
Chapter 12
• Generally, the standard deviation is unknown the same way the mean may be unknown.
• When the standard deviation is unknown, we need to change the test statistic from “Z” to “t”.
• We shall test three population parameters:– Mean
– Variance
– Proportion
Testing the mean (unknown variance)
• Replace the statistic Z with “t”:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
• The original distribution must be normal (or at least mound shaped).
Problem 8
• A federal agency inspects packages to determine whether the contents weigh at least as much as advertised.
• A random sample of (i) 5, (ii) 50 containers whose packaging states that the weight was 8.04 ounces was drawn. (See Content).
• From the sample results…
  – Can we conclude that the average weight does not meet the weight stated? (use α = .05).
  – Estimate the mean weight of all containers with 99% confidence.
  – What assumption must be met?
Problem 8
• Solution
  – We hypothesize on the mean weight.
    • H0: μ = 8.04
    • H1: μ < 8.04
• (i) n = 5. For small samples let us solve manually. Assume the sample was: 8.07, 8.03, 7.99, 7.95, 7.94
  – The rejection region: $t < -t_{\alpha, n-1} = -t_{.05, 5-1} = -2.132$
  – The t sample = ?
  – Mean = (8.07 + … + 7.94)/5 = 7.996
  – Std. Dev. = {[(8.07 − 7.996)² + … + (7.94 − 7.996)²]/4}^{1/2} = 0.054
Problem 8
• The t sample is calculated as follows:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{7.996 - 8.04}{.054/\sqrt{5}} = -1.82$
• Since −1.82 > −2.132 the sample statistic does not fall into the rejection region. There is insufficient evidence to conclude that the mean weight is smaller than 8.04, at the 5% significance level.
[Figure: t distribution with the rejection region to the left of −2.132; the sample statistic lies outside it.]
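The small-sample t statistic can be recomputed directly from the five listed weights. This sketch assumes Python's standard library; the critical value −2.132 is copied from the slide rather than computed. With the standard deviation left unrounded, the statistic comes out near −1.80, which is still well inside the non-rejection region.

```python
from math import sqrt
from statistics import mean, stdev

weights = [8.07, 8.03, 7.99, 7.95, 7.94]   # sample assumed on the slide
mu0 = 8.04                                  # weight stated on the package

xbar = mean(weights)
s = stdev(weights)                          # sample standard deviation (n - 1 divisor)
t_sample = (xbar - mu0) / (s / sqrt(len(weights)))

t_crit = -2.132                             # -t_{.05,4}, taken from the slide
reject = t_sample < t_crit
print(round(xbar, 3), round(s, 4), round(t_sample, 2), reject)
```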
Problem 8
• (ii) n = 50. To calculate the sample statistics we use Excel, “Descriptive statistics” from the Tools>Data analysis menu. From the sample we obtain: Mean = 8.02; Std. Dev. = .04
• The confidence interval is calculated by
  1 − α = .99, so α = .01 and α/2 = .005; $t_{.005, 50-1}$ = about 2.678 from the t-table
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 8.02 \pm 2.678\,\frac{.04}{\sqrt{50}} = 8.02 \pm .015$
  or LCL = 8.005, UCL = 8.035
• Comments
  – Check whether it appears that the distribution is normal.
Problem 8
[Figure: histogram of the sampled weights; vertical axis Frequency (0 to 20), bins 7.93, 7.97, 8.01, 8.05, 8.09, More.]
– To obtain an exact value for ‘t’ use the TINV function. Using Excel:
  =TINV(0.01,49)
  where .01 is the two tail probability and 49 is the degrees of freedom.
  The exact value: 2.6799535
Problem 8
– In our example recall:
  • H0: μ = 8.04
  • H1: μ < 8.04
  • The p-value = .000187 < .05
– There is sufficient evidence to reject H0 in favor of H1.
t-Test: Two-Sample Assuming Unequal Variances
                               Weights     V 2
Mean                           8.0182      8.04
Variance                       0.001627    0
Observations                   50          50
Hypothesized Mean Difference   0
df                             49
t Stat                         -3.82126
P(T<=t) one-tail               0.000187
t Critical one-tail            1.676551
P(T<=t) two-tail               0.000375
t Critical two-tail            2.009574
Note: t = (8.018 − 8.04)/[.0403/(50)^{1/2}] = −3.82 < −t.05,49 = −1.676
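The n = 50 results can be re-derived from the summary figures in the Excel output. This sketch assumes Python's standard library, which has no t distribution, so both critical values are copied from the slides (1.676551 and the TINV result 2.6799535) rather than computed.

```python
from math import sqrt

n, xbar, variance = 50, 8.0182, 0.001627    # from the Excel output
mu0 = 8.04

std_error = sqrt(variance) / sqrt(n)
t_sample = (xbar - mu0) / std_error

t_crit_one_tail = 1.676551                  # t_{.05,49}, from the Excel output
reject = t_sample < -t_crit_one_tail        # left-tail test

t_005_49 = 2.6799535                        # =TINV(0.01,49), from the slide
margin = t_005_49 * std_error               # half-width of the 99% interval
print(round(t_sample, 2), reject, round(margin, 3))
```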
Inference about the population Variance
• The following statistic is χ² (Chi squared) distributed with n − 1 degrees of freedom:
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
• We use this relationship to test and estimate the variance.
Inference about the population Variance
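• The hypotheses tested are:
  H0: σ² = σ0²
  H1: σ² > σ0², or σ² < σ0², or σ² ≠ σ0²
• The rejection region is:
$\frac{(n-1)s^2}{\sigma_0^2} > \chi^2_{\alpha,\,n-1}$ (right tail) or $\frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha,\,n-1}$ (left tail)
  For the two tail test replace α with α/2.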
Problem 9
• A random sample of 100 observations was taken from a normal population. The sample variance was 29.76.
• Can we infer at the 2.5% significance level that the population variance is less than 30?
• Estimate the population variance with 90% confidence.
Problem 9
• Solution:
  • H0: σ² = 30
  • H1: σ² < 30
– Rejection region: $\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.975,\,100-1}$ = about 73.36
$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = 97.42$
– Since 97.42 > 73.36, the statistic does not fall into the rejection region. There is insufficient evidence at the 2.5% significance level to conclude that the variance is smaller than 30.
– For the confidence interval look at page 370.
Using Excel
– We can get an exact value of the probability P(χ²_{d.f.} > χ²) = α for a given χ² and known d.f. This makes it possible to determine the p-value.
– Use the CHIDIST function: =CHIDIST(χ², d.f.)
  For example: =CHIDIST(97.42,99) = .526
  That is: P(χ²_99 > 97.42) = .526
– In our example we had a left hand tail rejection region. The p-value is calculated based on the χ² value (97.42):
  P(χ²_99 < 97.42) = 1 − .526 = .474
Using Excel
– We can get the exact χ² value for which P(χ²_{d.f.} > χ²) = α for any given probability α and known d.f.
– Use the CHIINV function: =CHIINV(α, d.f.)
  For example: =CHIINV(.025,99) = 128.4219
  That is: P(χ²_99 > ?) = .025, so χ² = 128.4219
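Without Excel, the chi-squared tail probabilities quoted above can be roughly cross-checked. The sketch below assumes Python's standard library and uses the Wilson-Hilferty normal approximation, which is a swapped-in technique and not how CHIDIST computes its result; it is reasonable for large degrees of freedom such as 99. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def chi2_upper_tail_approx(x, df):
    """Approximate P(chi-square with df degrees of freedom > x), Wilson-Hilferty."""
    c = 2 / (9 * df)
    z = ((x / df) ** (1 / 3) - (1 - c)) / sqrt(c)
    return 1 - NormalDist().cdf(z)


upper_tail = chi2_upper_tail_approx(97.42, 99)        # compare with CHIDIST(97.42,99)
lower_tail = 1 - upper_tail                           # left-tail p-value
check_crit = chi2_upper_tail_approx(128.4219, 99)     # should be close to .025
print(round(upper_tail, 3), round(lower_tail, 3), round(check_crit, 3))
```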
Inference about a population proportion
• The test and the confidence interval are based on the approximated normal distribution of the sample proportion, if np>5 and n(1-p)>5.
• For the confidence interval of p we have:
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p} = x/n$
• For the hypotheses test, we run a Z test.
Problem 10
• A consumer protection group ran a survey of 400 dentists to check a claim that 4 out of 5 dentists recommend ingredients included in a certain toothpaste.
• The survey results are as follows: 71 – No; 329 – Yes
• At 5% significance level, can the consumer group infer that the claim is true?
Problem 10
• Solution
  – The two hypotheses are:
    • H0: p = .8
    • H1: p > .8
  – The rejection region: $Z > Z_{\alpha} = Z_{.05} = 1.645$
$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{.8225 - .8}{\sqrt{.8225(1 - .8225)/400}} = 1.18$
  – Since 1.18 < 1.645 the consumer group cannot confirm the claim at the 5% significance level.
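The Problem 10 test can be sketched as follows, assuming Python's standard library. The first statistic mirrors the slide, which puts the sample proportion in the standard error; the second uses the hypothesized p = .8 instead, as in the Chapter 9 formula. Both fall below the critical value, so the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

n, yes = 400, 329                 # survey results: 329 Yes out of 400
p0, alpha = 0.8, 0.05

p_hat = yes / n
z_slide = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)   # standard error from p-hat
z_null = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)          # standard error from p0

z_crit = NormalDist().inv_cdf(1 - alpha)                 # Z_.05
reject = z_slide > z_crit
print(round(p_hat, 4), round(z_slide, 2), round(z_null, 3), round(z_crit, 3), reject)
```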