7/29/2019 Lecture 4 T-tests
Introduction to hypothesis testing: t-tests
In Lecture 3, we discussed the use of the normal distribution in descriptive statistics (bell-shaped traits)
The second use of the normal distribution is in statistical inference
This is possible because there are other quantities showing a normal pattern:
o For example, the mean of sample means!
Understanding this idea is key to statistical inference and testing
Confused? That's normal
If these are bullet holes on a target, where do you guess the bull's eye is?
Somewhere around here?
Yes, but why?
Because we assume that error was randomly distributed around the bull's eye
o if you're trying to hit the target, it is unlikely that you are going to miss more often in one direction
As we've seen, the normal distribution describes such random and symmetrical deviations around a central value
This common-sense observation captures the essence of the Central Limit Theorem, the analytical foundation of predictive statistics
Back to statistics: let's say we take a small sample from a population, and calculate the proportion of atheists, or the mean height in the sample
o This sample may not represent the true proportion of atheists or the average height in the population: it carries an error
But let's say I take many samples, and calculate mean height in each one; what would their distribution look like?
Answer: the same as the bullet holes in the target
o errors are random
o best guess for position of true mean height is the centre of distribution of samples
o i.e. the best guess is the mean of the samples (the mean of sample means)
Example: lottery draw
o We have 100 balls numbered 1 to 100
o (in this case, mean=50.5, sd=29)
o Let's say we take samples of 5 balls, and calculate their mean (one sample at a time)
What happens as the number of samples increases?
o the distribution of the sample means approaches a normal distribution
o the mean of means approaches the true mean of the population of 100 balls!
[Figure: four histograms of sample means, for N = 10, 30, 100 and 200 samples of 5]
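The lottery example can be tried directly in R. The following is a minimal sketch, not the code used to draw the slides: the seed, the helper name sample_mean and the final run of 5000 samples are assumptions chosen for illustration.

```r
set.seed(1)                      # fixed seed so the sketch is reproducible
balls <- 1:100                   # the population: 100 numbered balls

# Hypothetical helper: draw one sample of 5 balls and return its mean
sample_mean <- function() mean(sample(balls, size = 5))

# Repeat for an increasing number of samples, as in the four panels
for (k in c(10, 30, 100, 200)) {
  means <- replicate(k, sample_mean())
  cat(k, "samples of 5: mean of means =", round(mean(means), 2), "\n")
}

# With many samples the histogram looks bell-shaped and centres near 50.5
many_means <- replicate(5000, sample_mean())
hist(many_means, main = "Means of 5000 samples of 5 balls", xlab = "Sample mean")
```

Each run with a different seed gives slightly different numbers, but the mean of means should settle near the true mean of 50.5 as the number of samples grows.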
So when we are trying to identify the true mean of a variable in a population, the best guess comes from the sample means
If we have many samples, their mean is the best guess
o But this is very rarely the case!
In most cases, we have only one sample; the sample mean is your best (and only!) estimator of the true mean
A sample has a mean and a standard deviation; so how likely is it that it identifies a certain value (the true mean we want to find)?
We can say that the true mean is the sample mean plus or minus x (the margin of error)
o That's why we calculate standard deviations and confidence intervals
Example: take lifespan variable
> mean(lifespan, na.rm=T)
[1] 69.71495
> sd(lifespan, na.rm=T)
[1] 9.644646
So what is the true mean?
Conventionally, we take the 95% interval around the sample mean as the confidence interval: we are 95% confident that the true mean is inside it
To calculate the 95% confidence interval around the sample mean, type
> t.test(lifespan)$conf.int
[1] 68.34922 71.08068
attr(,"conf.level")
[1] 0.95
Conclusion: if the sample mean is 69.71 and the standard deviation is 9.64, we are 95% confident that the true mean is between 68.35 and 71.08
[Figure: target with the 95% confidence region marked]
If my estimator has the mean and standard deviation shown, is the true mean (the bull's eye) inside my 95% confidence interval? In this example, yes!
A difference: instead of standard deviation, we calculate a standard error of the mean, or sem (because the deviation of a sample mean is an error relative to the true mean)
The sem is like the standard deviation, but divided by the square root of the sample size (the standard deviation itself uses n minus one, a technicality to avoid bias!)
$$ \mathrm{sem} = \frac{s}{\sqrt{n}} $$
Interpretation:
(i) sem is proportional to the standard deviation in the population (σ), which we estimate by the sample standard deviation s
o when the sampled population (balls in a bag, subjects in my study) shows more variation, samples are more variable and error (deviation between sample mean and true mean) is larger
(ii) sem decreases as the size of the sample (n) increases (it is inversely proportional to the square root of n)
o A random sample of 20 gives a better estimate of the true mean than a sample of 5
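As a rough check, the standard error can be computed by hand from the lifespan summary values. This sketch assumes n = 194 non-missing countries (inferred from df = 193 in the t-test output shown in this lecture); the variable names are illustrative only.

```r
s <- 9.644646        # sample standard deviation of lifespan, from sd()
n <- 194             # assumed number of non-missing observations (df + 1)

sem <- s / sqrt(n)   # standard error of the mean
print(sem)

# Same standard deviation, different sample sizes: sem shrinks as n grows
sizes <- c(5, 20, 194)
sem_by_n <- s / sqrt(sizes)
print(round(sem_by_n, 3))
```

Quadrupling the sample size only halves the sem, which is why very precise estimates need large samples.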
Last thing: to calculate confidence intervals (margin of error), we have to use the Student's t-distribution
o it is similar to normal, but used when standard deviation in population is unknown (remember: we only know standard deviation of sample)
o works better than normal with small sample sizes
o approaches normal when n is large
This is why tests comparing means are
called t-tests
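To see how the t-distribution relates to the normal, and how it produces the margin of error, here is a small sketch using R's qt() and qnorm(). The list of degrees of freedom is arbitrary, and n = 194 is an assumption based on df = 193 in the lifespan t-test output.

```r
# Critical values leaving 2.5% in each tail, for increasing degrees of freedom
dfs <- c(4, 9, 29, 193, 10000)
t_crit <- qt(0.975, df = dfs)
print(round(t_crit, 3))
print(round(qnorm(0.975), 3))    # the normal value, 1.96

# 95% confidence interval for lifespan, by hand
m <- 69.71495                    # sample mean
s <- 9.644646                    # sample standard deviation
n <- 194                         # assumed non-missing sample size
half_width <- qt(0.975, df = n - 1) * s / sqrt(n)   # margin of error
ci <- c(lower = m - half_width, upper = m + half_width)
print(round(ci, 2))
```

The interval should agree with t.test(lifespan)$conf.int up to rounding of the summary values.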
We are ready to test hypotheses about means
o is a sample mean representative of the true mean? (one-sample t-test)
o are European countries richer than sub-Saharan countries? (two-sample t-test)
o does a new drug increase survival of patients? (paired t-test)
t-tests provide such group comparisons; they are important to validate statements about social indicators, income, fairness, justice, historical processes etc.
o does European colonisation affect country income?
o does gender affect income?
Test of significance of difference has to take two things into account:
(i) the sample sizes
o If my sample sizes are very large, even a very small difference in means will be statistically significant
o example: difference in colon cancer incidence between people who eat more than 600 g of red meat per week and those who don't is 13%, and is only identifiable with large samples (~100,000 people)
(ii) the measured difference in means
o If the difference is very large, it will be significant even if sample size is small
o Example: if I am comparing average size in mice and elephants, a sample of 1 mouse and 1 elephant is good enough!
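The two effects can be illustrated with a toy calculation. This is a sketch under simplifying assumptions, not part of the lecture: two groups of equal size n with the same standard deviation s, for which the standardised difference is d / (s * sqrt(2/n)). The helper t_for and all numbers are hypothetical. A group of 3 is used in the second case because a standard deviation cannot be computed from a single mouse.

```r
# Hypothetical helper: t statistic for two groups of equal size n,
# each with standard deviation s, whose means differ by d
t_for <- function(d, s, n) d / (s * sqrt(2 / n))

# (i) A small difference (0.1 standard deviations) moves away from zero as n grows
n_values <- c(10, 100, 1000, 100000)
t_small <- t_for(d = 0.1, s = 1, n = n_values)
print(round(t_small, 2))

# (ii) A huge difference (10 standard deviations) is far from zero with tiny samples
t_large <- t_for(d = 10, s = 1, n = 3)
print(round(t_large, 2))
```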
t-tests simply calculate whether the difference between two means/values is real = statistically significant = different from zero
o if difference is zero, they are not different!
In order to use probability distributions, we must standardise variables; so the difference is standardised by dividing it by its standard error
$$ t = \frac{\text{difference between means}}{\text{standard error of the difference}} $$
So what we want to know is whether t (the standardised difference) is too different from zero (i.e. not similar)
What is too different? Conventionally, we calculate 95% confidence intervals; if the test value is inside the interval, the sample is not significantly different from it (we'll see how it works)
[Figure: distribution of t centred on 0; the central 95% lies between t = -1.96 and t = 1.96, with 2.5% in each tail]
Basic rule: what we need to know is the P-value (probability value) of a t-test
In a t-test, the null hypothesis (= status quo, conservative hypothesis) is always that there is no difference between the two compared values
o i.e. if you want to show that two groups differ, you must reject the null hypothesis
The P-value of a test is the probability of observing a difference at least as large as ours if the null hypothesis were true (i.e. if groups are not different)
o conventionally, we only reject the null hypothesis if the P-value is less than 5% (P < 0.05)
Example: is life expectancy in the world different from 70 years?
o one-sample t-test: we are comparing a group to a value (= the test value; a hypothetical true value of 70 years)
How to do it in R? Just specify test value as mu=70
> t.test(lifespan, mu=70)
One Sample t-test
data: lifespan
t = -0.4117, df = 193, p-value = 0.681
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
68.34922 71.08068
sample estimates:
mean of x
69.71495
Sample mean = 69.71
o that doesn't seem to be very different from 70
t = -0.41
o the t statistic, the standardised difference between sample mean and test value, is close to zero
95% CI: [68.35, 71.08]
o Confidence interval of lifespan: my sample suggests that life expectancy in the world is between 68.35 and 71.08 years; and this includes 70 years
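The numbers in this output can be approximately reproduced by hand from the summary statistics. The sketch below assumes n = 194 (df + 1) and uses the rounded mean and standard deviation printed earlier; the function name one_sample_t is hypothetical.

```r
m <- 69.71495      # sample mean of lifespan
s <- 9.644646      # sample standard deviation of lifespan
n <- 194           # assumed non-missing sample size (df = 193)

# Hypothetical helper: one-sample t statistic and two-sided P-value
one_sample_t <- function(mu) {
  sem <- s / sqrt(n)                    # standard error of the mean
  t <- (m - mu) / sem                   # standardised difference
  p <- 2 * pt(-abs(t), df = n - 1)      # area in both tails beyond |t|
  c(t = t, df = n - 1, p = p)
}

res70 <- one_sample_t(70)
print(round(res70, 4))
```

Calling one_sample_t(75) in the same way gives a t statistic much further from zero and a far smaller P-value.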
P value = 0.681 = 68%
o This is the probability of a difference at least this large arising by chance if the null hypothesis (= life expectancy is not different from 70 years) is true
P is high
o Therefore, you cannot reject the null hypothesis
Conclusion: based on our sample, life expectancy in the world is not significantly different from 70 years
But is life expectancy in the world different from 75 years?
o now we set mu=75
> t.test(lifespan, mu=75)
One Sample t-test
data: lifespan
t = -7.6324, df = 193, p-value = 1.033e-12
alternative hypothesis: true mean is not equal to 75
95 percent confidence interval:
68.34922 71.08068
sample estimates:
mean of x
69.71495
So could average lifespan across countries be 75 years?
o P = 1.033*10^(-12) = 0.000000000001033 = 0.0000000001033%
o This is very low! We must reject the null hypothesis and accept the alternative hypothesis
o t = -7.63; that's significantly different from 0
o 75 years is outside the 95% CI
o Therefore, life expectancy is significantly below 75 years
You may also want to test whether two samples are significantly different in some respect
o for example, are South and Southeast Asian countries richer than Latin American countries?
o i.e. do differences or similarities in economic models in recent decades cause differences in average income between the two areas?
Procedure is similar: but the t-statistic is now the difference between the means of the two compared groups, divided by its standard error
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{sedm}} $$
sedm (standard error of the difference of means) is automatically calculated by R
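For readers who want to see what R does behind the scenes, the sketch below computes sedm by hand in its Welch form (group variances are not pooled), which is the default in t.test(). The two groups are simulated, hypothetical incomes, not the HDR2011 data, and the group sizes are arbitrary.

```r
set.seed(42)
# Hypothetical incomes for two groups (simulated, not the HDR2011 values)
group1 <- rnorm(15, mean = 9000,  sd = 5000)
group2 <- rnorm(19, mean = 14000, sd = 15000)

# Standard error of the difference of means (Welch form)
sedm <- sqrt(var(group1) / length(group1) + var(group2) / length(group2))
t_by_hand <- (mean(group1) - mean(group2)) / sedm

# The same statistic as computed by R's default two-sample t-test
t_from_r <- unname(t.test(group1, group2)$statistic)
print(c(by_hand = t_by_hand, from_r = t_from_r))
```

The two values should be identical, confirming that t is simply the difference in means divided by sedm.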
In file HDR2011, variable continent is seasia for South and Southeast Asian countries and latin for Latin American countries; others are NA (not available)
> t.test(GNI ~ continent)
Welch Two Sample t-test
data: GNI by continent
t = -1.1455, df = 20.327, p-value = 0.2653
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-13340.319 3876.397
sample estimates:
mean in group latin mean in group seasia
9054.355 13786.316
Conclusion:
We may think that the difference of ~US$4,700 between the areas is large enough to prove a significant difference
But it isn't: there is too much variation in income in the two areas
Welch test is the name of this t-test
P-value = 0.27 = 27%
We cannot reject the null hypothesis: we have no evidence that the areas differ by income
Notice that the 95% CI of the difference in income between the areas includes zero; i.e. you cannot exclude zero difference in income from your confidence interval
A paired test should be used when the two compared measurements are linked, i.e. the subjects/cases are not independent
For example, the two group means may be two measurements from the same individual
o In the case of a trial of a new drug for blood pressure, blood pressure before and after drug administration in the same patients
Load library ISwR
> library(ISwR)
> attach(intake) # this is a data set in the library ISwR
> intake # what does it look like? or try head(intake)
The data set intake has data on pre- and post-menstrual energy intake in 11 women
o Question: is there a difference in energy intake before and after menstruation?
So now let's try a paired t-test:
> t.test(pre, post, paired=T)
Paired t-test
data: pre and post
t = 11.9414, df = 10, p-value = 3.059e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1074.072 1566.838
sample estimates:
mean of the differences
1320.455
P value: very low!
We must reject the null hypothesis (no difference)
Confidence interval: we are 95% confident that the difference in energy intake is between 1074 and 1567 kJ (ISwR records these intakes in kJ)
Conclusion: there is a clear difference between intake pre and post
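A paired t-test is equivalent to a one-sample t-test on the within-subject differences, tested against zero. The sketch below shows this with made-up before/after values for six hypothetical subjects (not the intake data).

```r
# Hypothetical before/after measurements on the same 6 subjects
before <- c(142, 150, 138, 160, 155, 147)
after  <- c(135, 144, 136, 151, 149, 140)

# Paired t-test on the two linked measurements
paired <- t.test(before, after, paired = TRUE)

# One-sample t-test on the differences, with test value 0
on_diff <- t.test(before - after, mu = 0)

print(paired$statistic)
print(on_diff$statistic)
print(c(paired = paired$p.value, on_diff = on_diff$p.value))
```

Both calls give the same t, degrees of freedom and P-value, because pairing removes the variation between subjects and leaves only the variation of the differences.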
1) One-sample t-test
Is income per capita (GNI) in the world significantly less than US$20000?
2) Two-sample t-test
Let us compare schooling years in Southeast Asia and Latin America
o What is the average schooling of children in the two regions?
o Does schooling significantly differ between the two areas? What is the P-value of the test?
3) Paired t-test
Give two examples of studies that could require paired t-tests
Confidence intervals and all t-tests assume a normal distribution, even when the sample is small
o And they are based on a theory of means of various samples, which in practice we don't have
o That's why you do not prove differences; you compare groups and give an estimate of the probability that they are different or similar
Remember: null hypothesis is always that means are not different
Current trend is to provide confidence intervals rather than P-values when reporting results of tests in general (not just t-tests), so get used to calculating and interpreting them