The t-test Inferences about Population Means when population SD is unknown


Confidence intervals in z (Review)

Want to estimate height of students at USF. Sampled N = 100 students. Found mean = 68 in and SD = 6 in.

Best guess for population mean is 68 inches plus or minus some.

$95\%\,CI = \bar{X} \pm z_{.05}\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma_X / \sqrt{N}$

95% CI = 68 ± (1.96)[6/sqrt(100)] = 68 ± 1.96(.6) = 68 ± 1.18

Interval is 66.82 to 69.18. Intervals built this way will contain the population mean 95% of the time.
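As a rough check on the arithmetic in this review, the z interval can be recomputed in a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `z_confidence_interval` is made up for illustration, and it treats the SD of 6 as if it were the known population SD, as the review does.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval around a mean."""
    # Two-tailed critical value from the standard normal (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the mean: SD / sqrt(N).
    se = sd / sqrt(n)
    margin = z_crit * se
    return mean - margin, mean + margin


# Height example: N = 100, mean = 68 in, SD = 6 in.
lower, upper = z_confidence_interval(mean=68, sd=6, n=100)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Rounded to two decimals, the limits should agree with the hand calculation.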


Problem with z

Formulas so far use the population SD, and they have been correct, but the population SD is usually unknown, so we have to estimate it from the sample.

The estimate will be off a bit; it would be nice to account for this.

The statistic called ‘t’ adjusts for error in estimate of SD. Estimate of SD is better as sample size increases, so t changes with N. The values of t are basically the same as z, but t spreads out more and more as the sample size gets small.


The t Distribution

We use t when the population variance is unknown (the usual case) and sample size is small (N<100, the usual case). If you use a stat package for testing hypotheses about means, you will use t.

The t distribution is a short, fat relative of the normal. The shape of t depends on its df. As N becomes infinitely large, t becomes normal.


Example values from t and z [t changes with df (N)]

Area beyond value    z       t (df=100)    t (df=25)
.50                  0       0             0
.25                  .67     .68           .68
.025                 1.96    1.98          2.06
.005                 2.57    2.62          2.79
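The tabled t values can be checked by integrating the t density numerically. The sketch below is an illustration only, not the method behind the table: `t_pdf` and `area_beyond` are hypothetical helper names, the density is the standard Student's t formula, and the integral uses Simpson's rule from the Python standard library alone.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + t * t / df))


def area_beyond(t, df, steps=2000):
    """Upper-tail area P(T > t) for t >= 0, using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the area lies above 0; subtract the part between 0 and t.
    return 0.5 - total * h / 3


for df, t_value in [(100, 1.98), (25, 2.06), (25, 2.79)]:
    print(f"df={df}, t={t_value}: area beyond = {area_beyond(t_value, df):.4f}")
```

Each printed area should land close to the "Area beyond value" entry for the matching row of the table; small differences come from rounding in the tabled t values.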


Degrees of Freedom

For the t distribution, degrees of freedom are always a simple function of the sample size, e.g., (N-1).

One way of explaining df is that if we know the total or mean, and all but one of the scores (N-1 of them), the last score is not free to vary. It is fixed by the other scores. 4+3+2+X = 10. X=1.
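The same idea can be shown with the four scores from this example. The sketch below is illustrative only; it adds one step the slide does not spell out, namely that deviations from the mean must sum to zero, which is why the sample SD divides by N-1.

```python
from statistics import mean, stdev

scores = [4, 3, 2, 1]  # the example scores: 4 + 3 + 2 + X = 10, so X = 1
total = 10

# Knowing the total and the first N-1 scores pins down the last score.
last_score = total - sum(scores[:-1])

# Deviations from the mean sum to zero, so only N-1 of them are free to vary.
deviations = [x - mean(scores) for x in scores]

# The sample SD divides the sum of squared deviations by N-1 (the df).
n = len(scores)
sd_by_hand = (sum(d ** 2 for d in deviations) / (n - 1)) ** 0.5

print(last_score, sum(deviations), sd_by_hand, stdev(scores))
```

`statistics.stdev` uses the same N-1 denominator, so the last two printed values should match.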


t table


Confidence Intervals in t

$95\%\,CI = \bar{X} \pm t_{.05}\, s_{\bar{X}}$, where $s_{\bar{X}} = s_X / \sqrt{N}$ and $s_X = \sqrt{\sum (X - \bar{X})^2 / (N - 1)}$

Want to estimate height of students at USF. Sampled N = 100 students. Found mean = 68 in and SD = 6 in. Best guess for population mean is 68 inches plus or minus some.

$t_{.05} = t_{(.05,\ 2\ tails,\ df=99)} = 1.98$

95% CI = 68 ± (1.98)[6/sqrt(100)] = 68 ± 1.98(.6) = 68 ± 1.19

Interval is 66.81 to 69.19. Such an interval will contain the mean 95% of the time.

Note this is virtually the same as in z, where interval was 66.82 to 69.18. Matters more when N is small.


CI in t, Example 2

Suppose we want to estimate mean curiosity score for psychology students. Sample N = 25 people, Mean = 52, SD = 10.

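$\hat{\mu} = \bar{X} = 52;\quad \hat{\sigma}_X = s_X = 10;\quad \hat{\sigma}_{\bar{X}} = s_{\bar{X}} = s_X / \sqrt{N} = 10 / \sqrt{25} = 2$

$t_{.05} = t_{(.05,\ 2\ tails,\ df=24)} = 2.064$

$95\%\,CI = \bar{X} \pm t_{.05}\, s_{\bar{X}} = 52 \pm 2.064(2)$

95% CI = 47.872 to 56.128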

Note: this is same as CI in z, except we use t instead of z. The value of t comes from a table. Tabled value depends on df.
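The curiosity example can be reproduced with a short Python sketch. The function name `mean_interval` is hypothetical, and the critical value is passed in by hand because, as noted, it comes from a table; nothing here looks it up automatically.

```python
from math import sqrt


def mean_interval(mean, sd, n, crit):
    """CI around a sample mean; crit is a tabled critical value (t or z)."""
    se = sd / sqrt(n)  # estimated standard error of the mean
    margin = crit * se
    return mean - margin, mean + margin


# Curiosity example: N = 25, so df = 24 and the tabled two-tailed t is 2.064.
lower, upper = mean_interval(mean=52, sd=10, n=25, crit=2.064)
print(f"t-based 95% CI: {lower:.3f} to {upper:.3f}")

# For comparison only: plugging in z = 1.96 gives a narrower interval.
z_lower, z_upper = mean_interval(mean=52, sd=10, n=25, crit=1.96)
print(f"z-based 95% CI: {z_lower:.2f} to {z_upper:.2f}")
```

With N = 25 the t-based interval is noticeably wider than the z-based one, which is the point of using t at small sample sizes.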


One-sample t-test

We can use a confidence interval to “test” or decide whether a population mean has a given value. For example, suppose we want to test whether the mean height of women at USF is equal to 68 inches.

Suppose we randomly sample 50 women students at USF. We find that their mean height is 63.05 inches. The SD of height in the sample is 5.75 inches. Then we find the standard error of the mean by dividing SD by sqrt(N): 5.75/sqrt(50) = .81. The critical value of t with (50-1) df is 2.01 (find this in a t-table). Our confidence interval is, therefore, 63.05 plus/minus 1.63, or 61.42 to 64.68. See the graph.


One-sample t Example 1

[Figure: One-sample t test, confidence interval view. Histogram of sample height (x-axis: Height in Inches, 40 to 80; y-axis: Frequency; N = 50, M = 63.05, SD = 5.75), with the hypothesized population mean of 68, the standard error $s_{\bar{X}} = .81$, the critical value t = 2.01, and the interval $\bar{X} \pm 1.63$ marked.]

Take a sample, set a confidence interval around the sample mean. Does the interval contain the hypothesized value?
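That question can be answered directly in code. The following is a minimal sketch with a made-up function name, `interval_contains`, using the height numbers from this example and the tabled critical value of 2.01 for df = 49.

```python
from math import sqrt


def interval_contains(hypothesized_mean, sample_mean, sd, n, t_crit):
    """Build the CI around the sample mean and report whether it covers the hypothesized value."""
    se = sd / sqrt(n)  # standard error of the mean
    lower = sample_mean - t_crit * se
    upper = sample_mean + t_crit * se
    return lower <= hypothesized_mean <= upper, (lower, upper)


contains, (lower, upper) = interval_contains(
    68, sample_mean=63.05, sd=5.75, n=50, t_crit=2.01
)
print(f"Interval: {lower:.2f} to {upper:.2f}; contains 68? {contains}")
```

If the interval does not contain the hypothesized value, that value is rejected as the population mean at the chosen confidence level.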


Conventional Steps (Cookbook)

1. Choose alpha (.05)
2. State null and alternative hypotheses (H0: pop mean is 68; Ha: pop mean is not 68)
3. Calculate observed stat (t = ?)
4. Find critical value (tcrit = value in table)
5. State decision rule (if |obs| > tcrit, reject null)
6. State conclusion (pop mean is not 68)
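The steps above can be sketched as a small function. This is an illustration under stated assumptions rather than a standard routine: `one_sample_t_test` is a hypothetical name, the test is two-tailed, and the critical value is supplied from a t table instead of being computed. The numbers are those of the height example (N = 50, mean = 63.05, SD = 5.75).

```python
from math import sqrt


def one_sample_t_test(sample_mean, sd, n, null_mean, t_crit):
    """Two-tailed one-sample t test; t_crit is the tabled value for df = n - 1."""
    se = sd / sqrt(n)                       # standard error of the mean
    t_obs = (sample_mean - null_mean) / se  # step 3: observed statistic
    reject_null = abs(t_obs) > t_crit       # step 5: decision rule
    return t_obs, reject_null


# Steps 1-2: alpha = .05; H0: pop mean is 68, Ha: pop mean is not 68.
# Step 4: the tabled critical value for df = 49 is 2.01.
t_obs, reject_null = one_sample_t_test(63.05, 5.75, 50, null_mean=68, t_crit=2.01)

# Step 6: state the conclusion.
print(f"t = {t_obs:.2f}; reject null? {reject_null}")
```

The function keeps the unrounded standard error, so its t differs slightly from a hand calculation that rounds the standard error to .81 first.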


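One sample t test: t distribution view

[Figure: t distribution view. Sampling distribution centered on the hypothesized mean of 68 (x-axis: Height in Inches; y-axis: Frequency), with the sample mean $\bar{X} = 63.05$ and the standard error $s_{\bar{X}} = .81$ marked.]

$\bar{X} - \mu = 63.05 - 68 = -4.95$

$t = (\bar{X} - \mu) / s_{\bar{X}} = -4.95 / .81 = -6.11$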

The sample mean is roughly six standard deviations (St. Errors) from the hypothesized population mean. If the population mean is really 68 inches, it is very, very unlikely that we would find a sample with a mean as small as 63.05 inches.


One-sample t, Example 2

Over the years, smokers at M’s treatment center report smoking an average of 30 cigs per day. New treatment Smoke-B-Gon pills given to N=25 new clients. Did it help?

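$\bar{X} = 25,\quad s_X = 2.52$

$s_{\bar{X}} = s_X / \sqrt{N} = 2.52 / \sqrt{25} = .50$

$t_{obs} = (\bar{X} - \mu) / s_{\bar{X}} = (25 - 30) / .5 = -10$

$t_{crit} = t_{(.05,\ 2\ tails,\ df=24)} = 2.064$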

|tobs| > tcrit. Reject null. Result is significant.


Application

We prefer to use the t test instead of the z test when the _____ is small.
1 mind
2 sample size
3 standard error
4 type II error


Definition

The t test adjusts for error in estimating the population ____ during hypothesis testing.
1 mean
2 median
3 range
4 standard deviation


Application

We compute a one-sample t test and find an obtained value of t of 2.5. The critical (tabled) value of t given the null hypothesis turns out to be 2.01. What do we decide?
1 the result is significant
2 the result is not significant
3 we made a type I error
4 we made a type II error