Chapter 7: Inference for Distributions (The t Distributions, One-Sample t Confidence Interval, One-Sample t Test, Matched Pairs t Procedures, Robustness of the t Procedures)

INTRODUCTION TO STATISTICS & PROBABILITY

Chapter 7: Inference for Distributions

Dr. Nahid Sultana


Chapter 7 Inference for Distributions

7.1 Inference for the Mean of a Population

7.2 Comparing Two Means


7.1 Inference for the Mean of a Population


The t Distributions

One-Sample t Confidence Interval

One-Sample t Test

Matched Pairs t Procedures

Robustness of the t Procedures


Cola manufacturers want to test how much the sweetness of a new cola drink is affected by storage. The sweetness difference due to storage was evaluated by 10 professional tasters (by comparing the sweetness before and after storage):

Taster                                      1    2    3    4    5     6    7     8    9    10
Sweetness difference (D = Before – After)   2.0  0.4  0.7  2.0  −0.4  2.2  −1.3  1.2  1.1  2.3

We want to test if storage results in a loss of sweetness:

H0: µ_diff = 0; Ha: µ_diff > 0

This looks familiar. However, here we do not know the population parameter σ.

This situation is very common with real data.

Example – Sweetening colas


When σ is unknown, the sample standard deviation s provides an estimate of the population standard deviation σ.

When the sample size is large, the sample is likely to contain elements representative of the whole population, so s is a good estimate of σ.

But when the sample size is small, the sample contains only a few individuals, so s is a poor estimate of σ.


Standard deviation s – standard error s/√n

When σ is not known, we estimate it with the sample standard deviation s. Then we estimate the standard deviation of x̄, σ/√n, by s/√n. This quantity is called the standard error of the sample mean, and we denote it by SEM or SE_x̄.

Example: A simple random sample of five female basketball players is selected. Their heights (in cm) are 170, 175, 169, 183, and 177.

What is the standard error of the mean of these height measurements?

Solution: Sample mean x̄ = 174.8 cm

Sample s.d. s = √32.2 ≈ 5.67 cm

SEM = s/√n = √(32.2/5) ≈ 2.538 cm.
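The same calculation can be checked with Python's standard `statistics` module. This is a minimal sketch; note that `statistics.stdev` divides by n − 1, which matches the sample standard deviation s used on this slide.

```python
import math
import statistics

heights_cm = [170, 175, 169, 183, 177]

n = len(heights_cm)
mean = statistics.mean(heights_cm)   # sample mean x-bar
s = statistics.stdev(heights_cm)     # sample s.d., computed with divisor n - 1
sem = s / math.sqrt(n)               # standard error of the mean, s / sqrt(n)

print(f"mean = {mean}, s^2 = {statistics.variance(heights_cm)}, SEM = {sem:.3f}")
# prints: mean = 174.8, s^2 = 32.2, SEM = 2.538
```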


The t Distributions

Suppose that an SRS of size n is drawn from an N(µ,σ) population.

When σ is known, the sampling distribution of x̄ is N(μ, σ/√n), so z = (x̄ − μ)/(σ/√n) has the standard Normal distribution N(0, 1).

When σ is estimated by the sample standard deviation s, the standardized statistic t = (x̄ − μ)/(s/√n) has the t distribution with n − 1 degrees of freedom, written t(n − 1).


When comparing the density curves of the standard Normal distribution and t distributions, several facts are apparent:

- The density curves of the t distributions are similar in shape to the standard Normal curve.
- The spread of the t distributions is a bit greater than that of the standard Normal distribution.
- The t distributions have more probability in the tails and less in the center than does the standard Normal.
- As the degrees of freedom increase, the t density curve approaches the standard Normal curve ever more closely.

We can use Table D to determine critical values t* for t distributions with different degrees of freedom.
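The last bullet above can be seen numerically. This sketch assumes SciPy is available: `scipy.stats.t.ppf` and `scipy.stats.norm.ppf` are SciPy's inverse-CDF functions and stand in here for Table D. As the degrees of freedom grow, the 95% critical value t* shrinks toward z* = 1.960. The particular df values are arbitrary choices for illustration.

```python
from scipy import stats

# 95% confidence leaves an upper-tail area of 0.025
z_star = stats.norm.ppf(0.975)
dfs = [2, 5, 11, 30, 100, 1000]   # arbitrary degrees of freedom for illustration
t_stars = {df: stats.t.ppf(0.975, df) for df in dfs}

for df in dfs:
    print(f"df = {df:>4}: t* = {t_stars[df]:.3f}   (z* = {z_star:.3f})")
```

Every t* is larger than z*, which reflects the heavier tails of the t distributions.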

The t Distributions


Standardizing the data before using Table D

As with the Normal distribution, the first step is to standardize:

t = (x̄ − μ)/(s/√n)

Then we can use Table D to obtain the area under the curve. Here, μ is the mean (center) of the sampling distribution of x̄, and the standard error of the mean s/√n is its estimated standard deviation (width).


Using Table D

Suppose you want to construct a 95% confidence interval for the mean µ of a Normal population based on an SRS of size n = 12.

What critical t* should you use?

In Table D, we consult the row corresponding to df = n – 1 = 11.

We move across that row to the entry that is directly above the 95% confidence level, which gives t* = 2.201.

Upper-tail probability p
df     .05     .025    .02     .01
10     1.812   2.228   2.359   2.764
11     1.796   2.201   2.328   2.718
12     1.782   2.179   2.303   2.681
z*     1.645   1.960   2.054   2.326
C      90%     95%     96%     98%    (confidence level C)
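The same lookup can be sketched in Python, assuming SciPy is available. For a two-sided level C interval, t* is the point with upper-tail area (1 − C)/2, so `scipy.stats.t.ppf` is evaluated at 1 − (1 − C)/2.

```python
from scipy import stats

n = 12
C = 0.95
df = n - 1                         # degrees of freedom, n - 1 = 11
upper_tail = (1 - C) / 2           # area to the right of t*
t_star = stats.t.ppf(1 - upper_tail, df)

print(f"df = {df}, t* = {t_star:.3f}")  # should match the Table D entry 2.201
```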


One-Sample t Confidence Interval

The One-Sample t Interval for a Population Mean

Choose an SRS of size n from a population having unknown mean µ. A level C confidence interval for µ is

x̄ ± t* s/√n

where t* is the critical value for the t(n − 1) distribution. The margin of error is

t* s/√n

This interval is exact when the population distribution is Normal and approximately correct for large n in other cases.

C is the area under the t(n − 1) density curve between −t* and t*. We find t* from Table D for df = n − 1 and confidence level C. A level C confidence interval comes from a method that captures the true population mean in a proportion C of all possible samples.


Example: Listening to music on cell phones. U.K. subscribers with 3G phones spent an average of 8.3 hours per month listening to full-track music on their cell phones. Suppose we want to determine a 95% confidence interval for the U.S. average and draw the following random sample of size 8 from the U.S. population of 3G subscribers: 5 6 0 4 11 9 2 3

Here, the sample mean is x̄ = 5, the sample s.d. is s = 3.63, and df = n − 1 = 7. The standard error is SE = s/√n = 3.63/√8 ≈ 1.28.

t∗ = 2.365

The 95% CI is x̄ ± t* s/√n = 5 ± 2.365 × 1.28 = 5 ± 3.03, that is, (1.97, 8.03) hours per month.
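A minimal sketch of the same arithmetic using only Python's standard library. The critical value t* = 2.365 is taken from Table D (df = 7), as on the slide, rather than computed.

```python
import math
import statistics

hours = [5, 6, 0, 4, 11, 9, 2, 3]   # U.S. sample from the slide (hours per month)
t_star = 2.365                      # Table D, df = 7, 95% confidence

n = len(hours)
x_bar = statistics.mean(hours)      # sample mean
s = statistics.stdev(hours)         # sample s.d. (divisor n - 1)
se = s / math.sqrt(n)               # standard error s / sqrt(n)
margin = t_star * se                # margin of error t* s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar}, s = {s:.2f}, SE = {se:.2f}, CI = ({lower:.2f}, {upper:.2f})")
# prints: x-bar = 5, s = 3.63, SE = 1.28, CI = (1.97, 8.03)
```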


The One-Sample t Test

As in the previous chapter, a test of hypotheses requires a few steps:

1. Stating the null and alternative hypotheses (H0 versus Ha)

2. Calculating t and its degrees of freedom

3. Finding the area under the curve with Table D

4. Stating the P-value and interpreting the result

The one-sample t statistic is t = (x̄ − µ0)/(s/√n), with df = n − 1.


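The One-Sample t Test (Cont…)

The P-value is calculated as the corresponding area under the t(n − 1) curve, one-tailed or two-tailed depending on Ha (T denotes a variable with the t(n − 1) distribution):

- Ha: µ > µ0: P-value = P(T ≥ t)
- Ha: µ < µ0: P-value = P(T ≤ t)
- Ha: µ ≠ µ0: P-value = 2P(T ≥ |t|)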
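The three cases above can be sketched as a small helper, assuming SciPy is available. The name `t_p_value` is hypothetical; `scipy.stats.t.sf` (upper tail) and `scipy.stats.t.cdf` (lower tail) give exact areas where Table D only gives a bracket.

```python
from scipy import stats


def t_p_value(t, df, alternative):
    """P-value for a t statistic with df degrees of freedom (hypothetical helper)."""
    if alternative == "greater":      # Ha: mu > mu0, right tail
        return stats.t.sf(t, df)
    if alternative == "less":         # Ha: mu < mu0, left tail
        return stats.t.cdf(t, df)
    if alternative == "two-sided":    # Ha: mu != mu0, both tails
        return 2 * stats.t.sf(abs(t), df)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
```

For example, `t_p_value(-0.94, 14, "less")` falls between 0.15 and 0.20, consistent with the Table D bracket in the dissolved oxygen example.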


Example

The level of dissolved oxygen (DO) in a stream or river is an important indicator of the water’s ability to support aquatic life. A dissolved oxygen level below 5 mg/l puts aquatic life at risk.

A researcher measures the DO level at 15 randomly chosen locations along a stream. Here are the results in milligrams per liter:

4.53 5.04 3.29 5.23 4.13 5.50 4.83 4.40

5.42 6.38 4.01 4.66 2.87 5.73 5.55

We want to perform a test at the α = 0.05 significance level of:

H0: µ = 5

Ha: µ < 5

where µ is the actual mean dissolved oxygen level in this stream.


Example (Cont…)

H0: µ = 5; Ha: µ < 5

The sample mean and standard deviation are x̄ = 4.771 and s = 0.9396 mg/l, so

t = (x̄ − µ0)/(s/√n) = (4.771 − 5)/(0.9396/√15) ≈ −0.94

P-value: The P-value is the area to the left of t = −0.94 under the t distribution curve with df = 15 − 1 = 14. By symmetry, P(t < −0.94) = P(t > 0.94).

Upper-tail probability p
df     .25     .20     .15
13     .694    .870    1.079
14     .692    .868    1.076
15     .691    .866    1.074
C      50%     60%     70%    (confidence level C)

Since 0.868 < 0.94 < 1.076 in the df = 14 row, the P-value is between 0.15 and 0.20, which is greater than our α = 0.05 significance level.

- We fail to reject H0.

- We don’t have enough evidence to conclude that the mean DO level in the stream is less than 5 mg/l.
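The whole test can be reproduced in a few lines. This sketch assumes SciPy 1.6 or later, where `scipy.stats.ttest_1samp` accepts the `alternative` argument; the hand-computed t is included to show it matches the formula t = (x̄ − µ0)/(s/√n).

```python
import math
import statistics

from scipy import stats

do_mg_per_l = [4.53, 5.04, 3.29, 5.23, 4.13, 5.50, 4.83, 4.40,
               5.42, 6.38, 4.01, 4.66, 2.87, 5.73, 5.55]
mu0 = 5  # H0: mu = 5 versus Ha: mu < 5

n = len(do_mg_per_l)
x_bar = statistics.mean(do_mg_per_l)
s = statistics.stdev(do_mg_per_l)
t_by_hand = (x_bar - mu0) / (s / math.sqrt(n))

# One-sided (left-tail) one-sample t test with df = n - 1 = 14
result = stats.ttest_1samp(do_mg_per_l, popmean=mu0, alternative="less")

print(f"x-bar = {x_bar:.3f}, s = {s:.4f}, t = {t_by_hand:.2f}")
print(f"scipy: t = {result.statistic:.2f}, P = {result.pvalue:.3f}")
```

The exact P-value lands inside the 0.15 to 0.20 bracket read from Table D, so the conclusion is unchanged.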

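Example – Sweetening colas (cont…): Is there evidence that storage results in sweetness loss for the new cola recipe at the α = 5% level of significance? H0: μ = 0 versus Ha: μ > 0 (one-sided test)

From the 10 differences, x̄ = 1.02 and s = 1.196, so t = (1.02 − 0)/(1.196/√10) ≈ 2.70 with df = 9. In Table D, 2.70 lies between the df = 9 entries 2.398 (p = .02) and 2.821 (p = .01), so 0.01 < P-value < 0.02, which is less than α = 0.05, and we reject H0.

There is a significant loss of sweetness, on average, following storage.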
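A standard-library sketch of the cola test. The critical value 1.833 is the df = 9, upper-tail p = .05 entry of Table D; for a one-sided test, rejecting H0 when t exceeds this value is equivalent to P-value < α = 0.05.

```python
import math
import statistics

# D = Before - After for the 10 tasters
diffs = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
t_crit = 1.833  # Table D: df = 9, upper-tail probability .05

n = len(diffs)
d_bar = statistics.mean(diffs)             # mean difference
s_d = statistics.stdev(diffs)              # s.d. of the differences
t = (d_bar - 0) / (s_d / math.sqrt(n))     # H0: mu = 0

print(f"d-bar = {d_bar:.2f}, s = {s_d:.3f}, t = {t:.2f}, reject H0: {t > t_crit}")
# prints: d-bar = 1.02, s = 1.196, t = 2.70, reject H0: True
```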




Matched Pairs t Procedures

To compare the responses to the two treatments in a matched-pairs design, find the difference between the responses within each pair. Then apply the one-sample t procedures to these differences.
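The recipe above can be sketched as a small standard-library function. The name `matched_pairs_t` is hypothetical; it forms the within-pair differences and then computes the ordinary one-sample t statistic for H0: µ_diff = 0 on them.

```python
import math
import statistics


def matched_pairs_t(first, second):
    """Hypothetical helper: one-sample t on within-pair differences (first - second)."""
    if len(first) != len(second):
        raise ValueError("matched pairs need samples of equal length")
    diffs = [a - b for a, b in zip(first, second)]   # one difference per pair
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)      # standard error of the mean difference
    t = statistics.mean(diffs) / se                  # H0: mu_diff = 0
    return t, n - 1                                  # t statistic and its degrees of freedom
```

Because the function only ever uses the differences, its result is identical to a one-sample t statistic computed on `first[i] - second[i]`, which is the point of the matched pairs approach.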

The members of one sample are identical to, or matched (paired) with, the members of the other sample.

Example: Pre-test and post-test studies look at data collected on the same sample elements before and after some experiment is performed.



Conceptually, this is not different from tests on one population.

Example – Sweetening colas (revisited): The sweetness loss due to storage was evaluated by 10 professional tasters (comparing the sweetness before and after storage).

Although the text didn’t mention it explicitly, this is a pre-/post-test design, and the variable is the difference in cola sweetness (before minus after storage). A matched pairs test of significance is indeed just like a one-sample test.

Example: Does lack of caffeine increase depression?

Individuals diagnosed as caffeine dependent are assigned to receive daily pills. Sometimes, the pills contain caffeine and other times they contain a placebo. Depression was assessed.

For each individual in the sample, we have calculated a difference in depression score (placebo minus caffeine).

Caffeine deprivation causes a significant increase in depression.
