MATH408: Probability & Statistics
Summer 1999
WEEK 6
Dr. Srinivas R. Chakravarthy
Professor of Mathematics and Statistics
Kettering University (GMI Engineering & Management Institute)
Flint, MI 48504-4898
Phone: 810.762.7906
Email: [email protected]
www.kettering.edu/~schakrav
Confidence Interval
• Recall point estimate for the parameter under study.
• For example, suppose that µ = mean tensile strength of a piece of wire.
• Suppose a random sample of size 36 yielded a mean of 242.4 psi.
• Can we attach any confidence to this value?
• Answer: No! A single point estimate says nothing about how far it may be from µ. What do we do?
Confidence Interval (cont’d)
• Given a parameter, say, θ, let θ̂ denote its UMV estimator.
• Given α, a 100(1-α)% CI for θ is constructed using the sampling (probability) distribution of θ̂ as follows.
• Find L and U such that P(L < θ < U) = 1-α.
• Note that L and U are functions of θ̂.
Interpretation of CI
• With 100(1-α)% confidence, we can say that the true value of θ lies between L and U. Equivalently, if 100 samples of size n were taken and an interval (L, U) computed from each, then we would expect about 100(1-α) of the 100 intervals to contain θ.
• We will illustrate this in the laboratory.
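The repeated-sampling interpretation can be tried out with a small simulation. The sketch below is only an illustration, not the laboratory exercise itself: the population values (mean 240 psi, standard deviation 12 psi), the seed, and the function name coverage_count are all assumptions made for the example. It draws 100 samples of size 36, builds a 95% interval from each with the standard deviation treated as known, and counts how many intervals contain the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_count(mu, sigma, n, alpha=0.05, repetitions=100, seed=408):
    """Count how many of `repetitions` z-intervals contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / math.sqrt(n)     # sigma is treated as known here
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits

# Hypothetical population (assumed for illustration): mu = 240 psi, sigma = 12 psi.
hits = coverage_count(mu=240.0, sigma=12.0, n=36)
print(f"{hits} of 100 intervals contain the true mean")
```

The count varies from run to run (change the seed to see this); over many repetitions the proportion of intervals that cover the true mean should settle near 1-α.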
Horsepower Example (Revisited)
Confidence Intervals
The assumed sigma = 10.0

Variable    N    Mean   StDev  SE Mean         95.0 % CI
hp@4500    16  253.25   13.51     2.50  (248.35, 258.15)
hp@5500    16  241.06   23.16     2.50  (236.16, 245.96)
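The intervals in this output can be checked by hand: with sigma assumed known, the interval is the sample mean plus or minus z times sigma/sqrt(n). A minimal sketch follows, using only the summary values printed above; the function name z_interval is an assumption for illustration, not part of the Minitab output.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided CI for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for conf = 0.95
    half_width = z * sigma / math.sqrt(n)          # z times the standard error
    return xbar - half_width, xbar + half_width

# Sample means and n taken from the output above; sigma = 10 is the assumed value.
for name, xbar in [("hp@4500", 253.25), ("hp@5500", 241.06)]:
    lower, upper = z_interval(xbar, sigma=10.0, n=16)
    print(f"{name}: ({lower:.2f}, {upper:.2f})")
```

Both intervals have the same width because the same assumed sigma and the same n are used for each variable; the sample standard deviations (13.51 and 23.16) play no role here.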
Student’s t-distribution
• Referring to the HP example, we assumed that the population standard deviation was known (to be 10).
• However, in practice, it is usually unknown. Hence, we need to estimate it first. If the sample size is reasonably large (n ≥ 30), we can still use the normal distribution for the inferential part (as justified by the CLT).
Student’s t-distribution (cont’d)
• What happens if the sample is small (n < 30)?
• In this case we cannot use the normal distribution: using the sample standard deviation s to estimate σ brings more variability into the picture, and with a small sample this extra variability cannot be ignored. The appropriate distribution to use is Student's t-distribution.
• In 1908, William S. Gosset, a chemist working for a brewery, first derived this distribution and published it under the pseudonym "Student".
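Student’s t-distribution (cont’d)
• Suppose that X1, X2, …, Xn is a random sample of size n from a normal distribution with mean µ and standard deviation σ. Then the PDF of

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

• is given by

$$f(t) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{[(t^2/k) + 1]^{(k+1)/2}}, \qquad -\infty < t < \infty,$$

where k = n − 1 is the number of degrees of freedom and

$$\Gamma(k) = \int_0^\infty x^{k-1} e^{-x}\,dx \quad \text{for any positive number } k.$$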
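To get a feel for this density, it can be evaluated numerically. The sketch below is an illustration under stated assumptions: the function names, the choice k = 15, and the integration limits are all chosen for the example. It codes f(t) from the gamma-function formula, checks that the area under the curve is close to 1, and compares the t density with the standard normal density at a few points.

```python
import math

def t_pdf(t, k):
    """Density of Student's t with k degrees of freedom, evaluated at t."""
    # log of Gamma[(k+1)/2] / (sqrt(pi*k) * Gamma(k/2)); lgamma avoids overflow for large k
    log_const = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                 - 0.5 * math.log(math.pi * k))
    return math.exp(log_const - (k + 1) / 2 * math.log1p(t * t / k))

def normal_pdf(z):
    """Standard normal density, for comparison."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def area_under_t(k, limit=60.0, steps=12000):
    """Trapezoid-rule approximation of the integral of t_pdf over [-limit, limit]."""
    h = 2 * limit / steps
    total = 0.5 * (t_pdf(-limit, k) + t_pdf(limit, k))
    for i in range(1, steps):
        total += t_pdf(-limit + i * h, k)
    return total * h

k = 15  # degrees of freedom for a sample of size n = 16
print(f"area under f(t) for k = {k}: {area_under_t(k):.6f}")
for t in (0.0, 2.0, 3.0):
    print(f"t = {t}: t density = {t_pdf(t, k):.5f}, normal density = {normal_pdf(t):.5f}")
```

At the centre the t density is slightly lower than the normal density, and in the tails it is higher, which is the extra spread caused by estimating σ with s.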
Student’s t-distribution (cont’d)
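• Student’s t-distribution, like the normal,
– is bell-shaped. Its shape depends on the sample size through the degrees of freedom, n − 1.
– It is more spread out than the normal and approaches the normal as n approaches infinity.
• So when n is small, σ is unknown, and the population can be assumed approximately normal, a 100(1-α)% C.I. for µ is given by

$$\left(\bar{X} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\;\; \bar{X} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right)$$

where t_{α/2, n−1} is the value exceeded with probability α/2 by a t random variable with n − 1 degrees of freedom.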
HP Example (cont’d):
Confidence Intervals

Variable    N    Mean   StDev  SE Mean         95.0 % CI
hp@4500    16  253.25   13.51     3.38  (246.05, 260.45)
hp@5500    16  241.06   23.16     5.79  (228.72, 253.40)
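These t-based intervals can be reproduced from the summary statistics alone. The sketch below is one possible way to do it with the Python standard library only; the helper names and the numerical approach (Simpson's rule for the CDF, bisection for the percentage point) are assumptions for illustration. In practice the percentage point would come from a t-table or from statistical software.

```python
import math

def t_pdf(t, k):
    """Density of Student's t with k degrees of freedom."""
    log_const = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                 - 0.5 * math.log(math.pi * k))
    return math.exp(log_const - (k + 1) / 2 * math.log1p(t * t / k))

def t_cdf(t, k, steps=1000):
    """P(T <= t) for t >= 0: 0.5 plus a Simpson's-rule integral of the density over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, k) + t_pdf(t, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + total * h / 3

def t_upper_point(tail, k):
    """Find t with P(T > t) = tail by bisection (assumes the answer lies in [0, 100])."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, k) > tail:
            lo = mid      # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, conf=0.95):
    """Two-sided CI for the mean using s and the t distribution with n - 1 df."""
    t = t_upper_point((1 - conf) / 2, n - 1)
    half_width = t * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Summary statistics taken from the output above.
for name, xbar, s in [("hp@4500", 253.25, 13.51), ("hp@5500", 241.06, 23.16)]:
    lower, upper = t_interval(xbar, s, n=16)
    print(f"{name}: ({lower:.2f}, {upper:.2f})")
```

Compared with the earlier intervals based on the assumed sigma = 10, these are wider for two reasons: the sample standard deviations are larger than 10, and the t percentage point with 15 degrees of freedom (about 2.131) is larger than the normal value 1.96.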
VERIFYING THE NORMALITY ASSUMPTION
• Note that in constructing the above confidence interval, we assumed that the populations (for 4500 RPM and 5500 RPM) are normal.
• How do we verify that the assumptions are not grossly violated?
• Recall the normal probability plot?
• If there is reason to believe that this assumption is not valid in a given problem, then one has to transform the data or rely on nonparametric methods.
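As a reminder of how a normal probability plot is built, the sketch below computes its coordinates: the ordered observations paired with standard normal quantiles. Several details are assumptions for illustration: the plotting position (i − 0.5)/n is one common convention among several, the readings are made-up numbers rather than the course data, and the correlation between the two coordinates is used only as a rough numerical stand-in for judging by eye whether the points fall near a straight line.

```python
import math
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered observation with the N(0, 1) quantile at (i - 0.5)/n."""
    n = len(data)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(data), start=1)]

def correlation(points):
    """Pearson correlation between the two coordinates of the plot."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    s_zx = sum((z - mean_z) * (x - mean_x) for z, x in points)
    s_zz = sum((z - mean_z) ** 2 for z, _ in points)
    s_xx = sum((x - mean_x) ** 2 for _, x in points)
    return s_zx / math.sqrt(s_zz * s_xx)

# Hypothetical horsepower readings (made up for illustration, not the course data).
readings = [236.1, 241.8, 245.0, 248.3, 249.9, 251.2, 252.7, 254.0,
            255.4, 256.8, 258.1, 259.9, 262.3, 265.0, 268.7, 271.5]
points = normal_plot_points(readings)
print(f"correlation of plot coordinates: {correlation(points):.3f}")
```

Points that fall close to a straight line (correlation near 1) are consistent with normality; strong curvature suggests skewness or heavy tails, in which case a transformation or a nonparametric method may be preferable.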
INFERENCE ON THE VARIANCE
• Recall that the sample variance s² is a UMV estimator for the population variance σ².
• Here, we will see how to construct a CI and test hypotheses on σ².
• Assuming that the population is normal, the statistic

$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$$

follows a chi-square distribution with n-1 degrees of freedom.
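A quick simulation makes this sampling distribution concrete. The sketch below is an illustration only: the function name, the seed, the number of repetitions, and the choice n = 16 with σ = 10 are assumptions. It repeatedly draws normal samples, computes (n-1)s²/σ² for each, and compares the simulated mean and variance with the known values for a chi-square distribution with k degrees of freedom (mean k and variance 2k).

```python
import random
from statistics import fmean, variance

def chi_square_statistics(n, sigma, repetitions=5000, seed=408, mu=0.0):
    """Simulate (n - 1) * s^2 / sigma^2 for repeated normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = variance(sample)                  # sample variance, divisor n - 1
        values.append((n - 1) * s2 / sigma ** 2)
    return values

values = chi_square_statistics(n=16, sigma=10.0)
print(f"simulated mean     = {fmean(values):.2f}  (theory: n - 1 = 15)")
print(f"simulated variance = {variance(values):.2f}  (theory: 2(n - 1) = 30)")
```

The statistic is always positive and its distribution is skewed to the right, which is why intervals for σ² built from it are not symmetric about s².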