Lecture 4 T-tests

7/29/2019 Lecture 4 T-tests


Introduction to hypothesis testing: t-tests


In Lecture 3, we discussed the use of the normal distribution in descriptive statistics (bell-shaped traits)

The second use of the normal distribution is in statistical inference

This is possible because other quantities also show a normal pattern:
o For example, the means of repeated samples!

Understanding this idea is key to statistical inference and testing

Confused? That's normal


If these are bullet holes on a target, where do you guess the bull's eye is?

Somewhere around here?

Yes, but why?


Because we assume that error was randomly distributed around the bull's eye
o if you're trying to hit the target, it is unlikely that you are going to miss more often in one direction

As we've seen, the normal distribution describes such random and symmetrical deviations around a central value

This common-sense observation captures the essence of the Central Limit Theorem, the analytical foundation of predictive statistics


Back to statistics: let's say we take a small sample from a population, and calculate the proportion of atheists, or the mean height, in the sample
o This sample may not represent the true proportion of atheists or the average height in the population: it carries an error

But let's say I take many samples, and calculate mean height in each one; what would their distribution look like?

Answer: the same as the bullet holes in the target
o errors are random
o best guess for position of true mean height is the centre of the distribution of sample means
o i.e. the best guess is the mean of the samples (the mean of sample means)


Example: lottery draw
o We have 100 balls numbered 1 to 100
o (in this case, mean=50.5, sd=29)
o Let's say we take samples of 5 balls, and calculate their mean (one sample at a time)


What happens as the number of samples increases?
o the distribution of the sample means looks more and more like a normal distribution
o mean of means approaches true mean of the population of 100 balls!

[Figure panels: N=10, N=30, N=100 and N=200 samples of 5]
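The lottery example can be tried out with a few lines of R. This is only a minimal sketch, not the code used for the lecture figures: the helper name sample_means and the seed are assumptions made for illustration.

```r
# Lottery example: 100 balls numbered 1 to 100
balls <- 1:100
set.seed(1)  # arbitrary seed (an assumption), only to make the run repeatable

# Hypothetical helper: draw n_samples samples of `size` balls
# (without replacement) and return the mean of each sample
sample_means <- function(n_samples, size = 5) {
  replicate(n_samples, mean(sample(balls, size)))
}

# Mean of the sample means for an increasing number of samples
for (n in c(10, 30, 100, 200)) {
  m <- sample_means(n)
  cat("N =", n, "samples of 5; mean of means =", round(mean(m), 2), "\n")
}

# hist(sample_means(200))  # uncomment to look at the shape of the distribution
```

With more samples, the mean of means tends to settle near the true mean of 50.5, although any single run will wobble a little.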


So when we are trying to identify the true mean of a variable in a population, the best guess comes from sample means

If we have many samples, their mean is the best guess
o But this is very rarely the case!

In most cases, we have only one sample; the sample mean is your best (and only!) estimator of the true mean


A sample has a mean and a standard deviation; so what is the probability that it identifies a certain value (the true mean we want to find)?

We can say that the true mean is the sample mean plus or minus x (the margin of error)
o That's why we calculate standard deviations and confidence intervals

Example: take lifespan variable

> mean(lifespan, na.rm=T)

[1] 69.71495

> sd(lifespan, na.rm=T)

[1] 9.644646

So what is the true mean?

Conventionally, we take the 95% interval around the sample mean as the confidence interval: we are 95% confident that the true mean is inside it

To calculate 95% confidence interval around sample mean, type

> t.test(lifespan)$conf.int

[1] 68.34922 71.08068

attr(,"conf.level")

[1] 0.95

Conclusion: if the sample mean is 69.71 and the standard deviation is 9.64, we are 95% confident that the true mean is between 68.35 and 71.08

If my estimator has the mean and standard deviation shown, is the true mean (the bull's eye) inside my 95% confidence interval? In this example, yes!


A difference: instead of the standard deviation, we calculate a standard error of the mean, or sem (because the deviation of a sample mean is an error relative to the true mean)

The sem is like the standard deviation, but divided by the square root of the sample size (the "minus one" technicality that avoids bias is already built into the sample standard deviation)

$\mathrm{sem} = \dfrac{s}{\sqrt{n}}$

Interpretation:

(i) sem is proportional to the standard deviation (s), our estimate of the variation in the population
o when the sampled population (balls in a bag, subjects in my study) shows more variation, samples are more variable and the error (deviation between sample mean and true mean) is larger

(ii) sem is inversely proportional to the square root of the sample size (n)
o A random sample of 20 gives a better estimate of the true mean than a sample of 5
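As a rough check, the sem of the lifespan example can be computed by hand from the values printed earlier. The sketch below assumes n = 194, inferred from df = 193 in the t.test output shown later in the lecture; the variable names are chosen here for illustration.

```r
# Summary values taken from the lifespan example in this lecture
s <- 9.644646   # sample standard deviation, as returned by sd()
n <- 194        # assumption: df = 193 in the t.test output means 194 non-missing values

sem <- s / sqrt(n)   # sd() already uses n - 1 internally, so we divide by sqrt(n)
sem                  # about 0.69 years

# Same spread, different sample sizes: the error shrinks as n grows
sizes <- c(5, 20, 194)
sem_by_size <- s / sqrt(sizes)
round(sem_by_size, 2)
```

Multiplying the sample size by four only halves the error, which is why very precise estimates need large samples.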


Last thing: to calculate confidence intervals (margin of error), we have to use the Student's t-distribution
o it is similar to normal, but used when the standard deviation in the population is unknown (remember: we only know the standard deviation of the sample)
o works better than normal with small sample sizes
o approaches normal when n is large

This is why tests comparing means are called t-tests
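To see how the pieces fit together, the 95% confidence interval of lifespan can be rebuilt by hand from the sample mean, the sem and the t-distribution. This is a sketch under the assumption that the sample has 194 non-missing values (df = 193); qt() and qnorm() are the standard R quantile functions for the t and normal distributions.

```r
xbar <- 69.71495   # sample mean of lifespan
s    <- 9.644646   # sample standard deviation
n    <- 194        # assumption: 194 non-missing values (df = 193)

sem    <- s / sqrt(n)
t_crit <- qt(0.975, df = n - 1)   # leaves 2.5% in each tail
ci     <- c(xbar - t_crit * sem, xbar + t_crit * sem)
round(ci, 2)   # compare with t.test(lifespan)$conf.int

# Critical values of t for small, medium and large samples, against the normal value
round(qt(0.975, df = c(4, 19, 193)), 3)
round(qnorm(0.975), 3)
```

The critical value is noticeably larger than the normal one for very small samples and moves towards it as the degrees of freedom grow.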


We are ready to test hypotheses about means
o is a sample mean representative of the true mean? (one-sample t-test)
o are European countries richer than sub-Saharan countries? (two-sample t-test)
o does a new drug increase survival of patients? (paired t-test)

t-tests provide such group comparisons; they are important to validate statements about social indicators, income, fairness, justice, historical processes etc.
o does European colonisation affect country income?
o does gender affect income?


A test of the significance of a difference has to take two things into account:

(i) the sample sizes
o If my sample sizes are very large, even a very small difference in means will be statistically significant
o example: difference in colon cancer incidence between people who eat more than 600 g of red meat per week and those who don't is 13%, and is only identifiable with large samples (~100,000 people)

(ii) the measured difference in means
o If the difference is large enough, it will be significant even if the sample size is small
o Example: if I am comparing average size in mice and elephants, a sample of 1 mouse and 1 elephant is good enough!


t-tests simply calculate whether the difference between two means/values is real = statistically significant = different from zero
o if difference is zero, they are not different!

In order to use probability distributions, we must standardise variables; so the difference is standardised

$t = \dfrac{\bar{x} - \mu}{\mathrm{sem}}$ (where $\bar{x}$ is the sample mean and $\mu$ is the test value)

So what we want to know is whether t (the standardised difference) is too different from zero (i.e. not similar)

What is too different? Conventionally, we calculate 95% confidence intervals; if a value is inside it, it is not different from the test value and there is no difference (we'll see how it works)

[Figure: t-distribution centred on 0, with the central 95% between t=-1.96 and t=1.96 and 2.5% in each tail]
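The standardisation can be sketched in R with the lifespan summary values and a test value of 70 years, the value used in the example that follows. The sample size n = 194 is an assumption inferred from df = 193. Note that qt() gives a cut-off slightly above the 1.96 of the normal distribution.

```r
xbar <- 69.71495   # sample mean of lifespan
s    <- 9.644646   # sample standard deviation
n    <- 194        # assumption: 194 non-missing values (df = 193)
mu   <- 70         # test value

t_stat <- (xbar - mu) / (s / sqrt(n))   # standardised difference
t_crit <- qt(0.975, df = n - 1)         # edge of the central 95% of the t-distribution

t_stat                  # about -0.41
abs(t_stat) > t_crit    # FALSE here: t falls inside the central 95%
```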


Basic rule: what we need to know is the P-value (probability value) of a t-test

In a t-test, the null hypothesis (=status quo, conservative hypothesis) is always that there is no difference between the two compared values
o i.e. if you want to show that two groups differ, you must reject the null hypothesis

The P-value of a test is the probability of observing a difference at least as large as the one in our sample if the null hypothesis is true (i.e. if the groups are not different)
o conventionally, we only reject the null hypothesis if the P-value is less than 5% (P < 0.05)


Example: is life expectancy in the world different from 70 years?
o one-sample t-test: we are comparing a group to a value (=the test value; a hypothetical true value of 70 years)

How to do it in R? Just specify the test value as mu=70

> t.test(lifespan, mu=70)

One Sample t-test

data: lifespan

t = -0.4117, df = 193, p-value = 0.681

alternative hypothesis: true mean is not equal to 70

95 percent confidence interval:

68.34922 71.08068

sample estimates:

mean of x

69.71495

Sample mean=69.71
o that doesn't seem to be very different from 70

t=-0.41
o t statistic, the standardised difference between sample mean and test value, is close to zero

95% CI: [68.35, 71.08]
o Confidence interval of lifespan: my sample suggests that life expectancy in the world is between 68.35 and 71.08 years; and this includes 70 years



P value=0.681=68%
o This is the probability of seeing a difference at least this large if the null hypothesis (=life expectancy is not different from 70 years) is true

P is high
o Therefore, you cannot reject the null hypothesis

Conclusion: based on our sample, life expectancy in the world is not significantly different from (or shorter than) 70 years
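The P-value in the output above can be recovered from the t statistic and the degrees of freedom with pt(), the cumulative distribution function of the t-distribution in R. A minimal sketch, using the rounded t printed in the output:

```r
t_stat <- -0.4117   # t from the t.test output for mu=70
df     <- 193       # degrees of freedom from the same output

# Two-sided P-value: area in both tails beyond |t|
p_value <- 2 * pt(-abs(t_stat), df = df)
round(p_value, 3)   # compare with p-value = 0.681 in the output
p_value < 0.05      # FALSE: not below the conventional 5% threshold
```

The factor of 2 is there because the alternative hypothesis is "not equal to 70", so differences in both directions count.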


But is life expectancy in the world different from 75 years?
o now we set mu=75

> t.test(lifespan, mu=75)

One Sample t-test

data: lifespan

t = -7.6324, df = 193, p-value = 1.033e-12

alternative hypothesis: true mean is not equal to 75

95 percent confidence interval:

68.34922 71.08068

sample estimates:

mean of x

69.71495

So how likely is a sample like ours if average lifespan across countries really is 75 years?
o P = 1.033*10^(-12) = 0.000000000001033 = 0.0000000001033%
o This is very low! We must reject the null hypothesis and accept the alternative hypothesis
o t=-7.63; that's significantly different from 0
o 75 years is outside the 95% CI
o Therefore, life expectancy is below 75 years


You may also want to test whether two samples are significantly different in some respect
o for example, are South and Southeast Asian countries richer than Latin American countries?
o i.e. do differences or similarities in economic models in recent decades cause differences in average income between the two areas?

The procedure is similar: but the t-statistic is now the standardised difference between the means of the two compared groups

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\mathrm{sedm}}$

sedm (standard error of the difference of means) is automatically calculated by R
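For the curious, the sedm can also be computed by hand. The sketch below uses simulated data (the vectors group1 and group2 are made up, not taken from any file in this course) and the unpooled form of the standard error, which is what t.test() uses by default.

```r
set.seed(2)  # arbitrary seed; the two groups below are simulated, not real data
group1 <- rnorm(12, mean = 10, sd = 3)   # hypothetical measurements, group 1
group2 <- rnorm(15, mean = 12, sd = 5)   # hypothetical measurements, group 2

# Standard error of the difference of means (variances not pooled)
sedm <- sqrt(var(group1) / length(group1) + var(group2) / length(group2))

t_manual <- (mean(group1) - mean(group2)) / sedm
t_from_r <- unname(t.test(group1, group2)$statistic)

c(manual = t_manual, t.test = t_from_r)   # the two values should agree
```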


In file HDR2011, variable continent is seasia for South and Southeast Asian countries and latin for Latin American countries; others are NA (not available)

> t.test(GNI ~ continent)

Welch Two Sample t-test

data: GNI by continent

t = -1.1455, df = 20.327, p-value = 0.2653

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-13340.319 3876.397

sample estimates:

mean in group latin mean in group seasia

9054.355 13786.316

Conclusion:

We may think that the difference of ~US$4,700 between the areas was large enough to prove a significant difference

But it isn't: there is too much variation in income in the two areas

Welch test is the name of this t-test

P-value=0.27=27%
o We cannot reject the null hypothesis: we have no evidence that the areas differ by income

Notice that the 95% CI of the difference in income between the areas includes zero; i.e. you cannot exclude zero difference in income from your confidence interval
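The fractional degrees of freedom in the output (df = 20.327) come from the Welch correction, which adjusts df when the two groups have different spreads. The sketch below reproduces that calculation on simulated incomes (not the HDR2011 data; the vectors a and b are hypothetical) and checks the link between the confidence interval and the P-value.

```r
set.seed(3)  # arbitrary seed; simulated incomes, not the HDR2011 data
a <- rnorm(10, mean = 9000,  sd = 4000)    # hypothetical group with small spread
b <- rnorm(14, mean = 14000, sd = 12000)   # hypothetical group with large spread

va <- var(a) / length(a)   # squared standard error of each mean
vb <- var(b) / length(b)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

res <- t.test(a, b)   # var.equal = FALSE by default, i.e. the Welch test
c(manual = df_welch, t.test = unname(res$parameter))

# The 95% CI contains zero exactly when the two-sided P-value is above 0.05
ci <- res$conf.int
(ci[1] < 0 & ci[2] > 0) == (res$p.value > 0.05)
```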


A paired test should be used when the two compared measurements are linked, i.e. the subjects/cases are not independent

For example, the two group means may be two measurements from the same individual
o In the case of a trial of a new drug for blood pressure, blood pressure before and after drug administration in the same patients

Run library ISwR

> library(ISwR)

> attach(intake) # this is a data set in the library ISwR

> intake # what does it look like? or try head(intake)

The file intake has data on pre- and post-menstrual calorie consumption in 11 women
o Question: is there a difference in caloric intake before and after menstruation?


So now let's try a paired t-test:

> t.test(pre, post, paired=T)

Paired t-test

data: pre and post

t = 11.9414, df = 10, p-value = 3.059e-07

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

1074.072 1566.838

sample estimates:

mean of the differences

1320.455

P value: very low!

We must reject the null hypothesis (no difference)

Confidence interval: we are 95% confident that the difference in calorie intake is between 1074 and 1567 kcal

Conclusion: there is a clear difference between calorie intake pre and post
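A paired t-test is the same thing as a one-sample t-test on the differences within each pair, tested against zero. The sketch below illustrates this with made-up before/after values (the vectors before and after are hypothetical, not the intake data).

```r
# Hypothetical before/after measurements on the same six subjects (made-up numbers)
before <- c(140, 152, 138, 160, 147, 155)
after  <- c(132, 150, 130, 151, 145, 146)

paired_res <- t.test(before, after, paired = TRUE)
diff_res   <- t.test(before - after, mu = 0)   # one-sample test on the differences

# Both approaches should give the same t statistic and the same P-value
c(paired = unname(paired_res$statistic), differences = unname(diff_res$statistic))
c(paired = paired_res$p.value, differences = diff_res$p.value)
```

This is why the output reports the "mean of the differences" rather than two group means.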



1) One-sample t-test

Is income per capita (GNI) in the world significantly less thanUS$20000?

2) Two-sample t-test

Let us compare schooling years in Southeast Asia and Latin America
o What is the average schooling of children in the two regions?
o Does schooling significantly differ between the two areas? What is the P-value of the test?

3) Paired t-test

Give two examples of studies that could require paired t-tests


Confidence intervals and all t-tests assume a normal distribution, even when the sample is small
o And they are based on a theory of means of various samples, which in practice we don't have
o That's why you do not prove differences; you compare groups and give an estimate of the probability that they are different or similar

Remember: null hypothesis is always that means are not different

Current trend is to provide confidence intervals rather than P-values when reporting results of tests in general (not just t-tests), so get used to calculating and interpreting them
