T-Tests
“Stories change people while statistics give them something to argue about.”
Bernie Siegel (US Writer & retired pediatric surgeon)
The t-distribution (a.k.a. Student's t-distribution)
[Figure: t-distribution (sampling distribution) compared with the normal distribution]
William Sealy Gosset (1876-1937)
Properties:
• Bell-shaped, with mean = median = mode = 0
• Variance > 1
• Has fatter tails than the normal distribution
• Degrees of freedom: df = n − 1
• As sample size increases, it approaches the normal distribution
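These properties can be checked numerically. Below is a minimal sketch in R (the language of the t.test() calls in these slides) that uses base R's qt() and qnorm(); the sample sizes are arbitrary values chosen for illustration.

```r
# Compare quantiles of the t-distribution with the standard normal as n grows
n  <- c(3, 5, 10, 30, 100)          # arbitrary sample sizes for illustration
df <- n - 1                         # degrees of freedom, df = n - 1

# Fatter tails: the 97.5% quantile of t sits further out than the normal's (about 1.96)
crit_t <- qt(0.975, df)
crit_z <- qnorm(0.975)

# Variance of the t-distribution: df / (df - 2) for df > 2 (not finite for df <= 2), always > 1
var_t <- ifelse(df > 2, df / (df - 2), Inf)

print(data.frame(n, df, crit_t = round(crit_t, 3), crit_z = round(crit_z, 3),
                 var_t = round(var_t, 3)))
```

As n increases, crit_t shrinks toward crit_z and var_t shrinks toward 1, which is the "approaches the normal distribution" property in numbers.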
One-sample T-test
[Figure: sample t-distribution centred on x̄, with +s and −s marked]
a = cutoff
Shaded area = probability that the population mean falls below the cutoff value.
Q(a): What is the probability that the true population mean falls above/below a given cutoff (a)?
Practical examples:
• Crop yields with a new fertilizer, where we want crops to achieve a certain yield.
• Comparing actual 911 response times to an ideal response time of 10 min or less.
P-value (in R): pt(t, df)
$t_{actual} = \dfrac{a - \bar{x}}{s/\sqrt{n}}$
One-tailed example: $H_0: \mu \le a$, $H_a: \mu > a$
T-test in R:
t.test(sampleData, mu = a, alternative = "greater")
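To connect the formula with t.test(), here is a minimal sketch using made-up response times (hypothetical data, not from the slides). It follows the slide's sign convention, t_actual = (a − x̄)/(s/√n). R reports the statistic with the opposite sign, but the one-tailed p-value for Ha: μ > a should come out the same.

```r
# Hypothetical 911 response times in minutes (made-up values for illustration only)
times <- c(9.1, 11.4, 10.8, 12.0, 9.7, 10.9, 11.6, 10.2, 12.3, 10.5)
a  <- 10                          # cutoff: ideal response time of 10 min
n  <- length(times)
df <- n - 1
se <- sd(times) / sqrt(n)         # standard error, s / sqrt(n)

# Slide convention: t_actual = (a - xbar) / (s / sqrt(n)), so the lower-tail area
# pt(t_actual, df) is the one-tailed p-value for Ha: mu > a
t_actual <- (a - mean(times)) / se
p_manual <- pt(t_actual, df)

# t.test() reports t = (xbar - a) / se: same size, opposite sign, same p-value
res <- t.test(times, mu = a, alternative = "greater")
print(c(t_actual = t_actual, t_R = unname(res$statistic),
        p_manual = p_manual, p_R = res$p.value))
```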
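Signal-to-noise interpretation:
$t_{actual} = \dfrac{\text{signal}}{\text{noise}} = \dfrac{a - \bar{x}}{s/\sqrt{n}}$
• Signal: the difference between the cutoff (a) and the sample mean; noise: the standard error of the mean, $s/\sqrt{n}$
• If the signal is small relative to the noise, the ratio is small
• If the signal is large and the noise is small, the ratio is large
P-value = the probability “I would get a signal-to-noise ratio at least this large by random chance”
• A very large ratio rarely arises by chance alone, so when we observe one, the result is very unlikely to be due to random chance (i.e. something really happened)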
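The signal-to-noise idea can be sketched as a small helper. signal_to_noise() is a hypothetical function written for this illustration, the summary statistics are invented, and the p-value is two-tailed (a ratio this far from 0 in either direction). That two-tailed choice is made for the sketch and is not taken from the slide.

```r
# Signal-to-noise view of the one-sample t statistic.
# signal_to_noise() is a hypothetical helper written for this illustration.
signal_to_noise <- function(a, xbar, s, n) {
  signal <- a - xbar                  # distance between cutoff and sample mean
  noise  <- s / sqrt(n)               # standard error of the mean
  t      <- signal / noise
  # chance of a ratio at least this far from 0 (either direction) if H0 is true
  p      <- 2 * pt(-abs(t), df = n - 1)
  c(signal = signal, noise = noise, t = t, p = p)
}

# Same signal (1 unit), different noise: made-up summary statistics
quiet <- signal_to_noise(a = 10, xbar = 9, s = 1, n = 10)
noisy <- signal_to_noise(a = 10, xbar = 9, s = 4, n = 10)
print(rbind(quiet, noisy))
```

With the same signal, the noisier sample gives a smaller ratio and therefore a larger p-value.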
One-sample T-test
Q(b): What is the probability that the new value (b) belongs to the same population as the sample? We test the difference between b and the sample mean.
Two-tailed example: $H_0: \mu = b$, $H_a: \mu \ne b$
T-test in R:
t.test(sampleData, mu = b, alternative = "two.sided", conf.level = 0.95)
Practical example:
• A size measurement of a suspected new beetle species compared to an existing species.
b = new value
How to answer this question: suppose p = pt(t, df) = 0.02. Then b is higher than only 2% of the scores, which places it outside the 95% confidence interval (below the 2.5% cutoff in the lower tail). Equivalently, the two-sided p-value is 2 × 0.02 = 0.04 < 0.05. Therefore we can reject the null hypothesis that μ = b at the α = 0.05 level.
[Figure: sample t-distribution in original units; purple area = 95% C.I., with 2.5% (p = 0.025) in each tail]
P-value (in R): pt(t, df)
$t_{actual} = \dfrac{b - \bar{x}}{s/\sqrt{n}}$
Critical t-value (in R): $t_{critical}$ = qt(α/2, df = n-1)
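Here is a sketch of the two-tailed comparison, done by hand and with t.test(), using invented beetle measurements (hypothetical values). It follows the slide's formula, with s/√n in the denominator. The decision from the critical value and the decision from the two-sided p-value should agree.

```r
# Hypothetical body lengths (mm) of the existing beetle species; made-up values
lengths <- c(12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2, 11.7, 12.5)
b     <- 11.6                        # hypothetical measurement of the suspected new species
alpha <- 0.05
n  <- length(lengths)
df <- n - 1

t_actual <- (b - mean(lengths)) / (sd(lengths) / sqrt(n))  # slide's formula
p_tail   <- pt(t_actual, df)                 # area below t_actual
t_crit   <- qt(alpha / 2, df)                # lower critical value (negative)
reject   <- abs(t_actual) > abs(t_crit)      # outside the central 95% region?

# Two-sided p-value: double the smaller tail; t.test() should agree
p_two <- 2 * min(p_tail, 1 - p_tail)
res   <- t.test(lengths, mu = b, alternative = "two.sided", conf.level = 0.95)
print(c(t_actual = t_actual, t_crit = t_crit, p_two = p_two, p_R = res$p.value))
print(reject)
```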
One-sample T-test
Q(b): What is the probability that the new value (b) belongs to the same population as the sample? We test the difference between b and the sample mean.
One-tailed example: $H_0: \mu \le b$, $H_a: \mu > b$
T-test in R:
t.test(sampleData, mu = b, alternative = "greater", conf.level = 0.95)
Practical example:
• Is my 2 km outrigger canoe time trial too slow to be competitive at the national level?
b = new value
How to answer this question: suppose p = pt(t, df) = 0.02. Then b is higher than only 2% of the scores, which places it in the rejection region (the lowest 5% of the distribution, outside the one-sided 95% confidence interval). Therefore we can reject the null hypothesis that μ ≤ b at the α = 0.05 level.
[Figure: sample t-distribution in original units; orange area = 95% C.I., with 5% in the lower tail]
P-value (in R): pt(t, df)
$t_{actual} = \dfrac{b - \bar{x}}{s/\sqrt{n}}$
Critical t-value (in R): $t_{critical}$ = qt(α, df = n-1)
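The one-tailed version follows, again with invented time-trial data (hypothetical values). In R, alternative = "greater" means Ha: μ > b. That corresponds to the lower tail of t_actual = (b − x̄)/(s/√n), so pt(t_actual, df) and t.test() should return the same p-value.

```r
# Hypothetical national-level 2 km time-trial results (minutes); made-up values
times <- c(9.8, 10.1, 9.6, 10.4, 9.9, 10.2, 10.0, 9.7, 10.3, 10.5)
b     <- 9.5                        # hypothetical personal time trial
alpha <- 0.05
n  <- length(times)
df <- n - 1

t_actual <- (b - mean(times)) / (sd(times) / sqrt(n))
p_manual <- pt(t_actual, df)        # lower-tail area, as on the slide
t_crit   <- qt(alpha, df)           # all of alpha in the lower tail
reject   <- t_actual < t_crit       # reject H0: mu <= b?

# alternative = "greater" tests Ha: mu > b, the lower tail of (b - xbar) / se
res <- t.test(times, mu = b, alternative = "greater", conf.level = 0.95)
print(c(t_actual = t_actual, t_crit = t_crit, p_manual = p_manual, p_R = res$p.value))
print(reject)
```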
[Figure: Population A → Sample A; Population B → Sample B]
Q: Are the means of populations A and B the same?
Q: Do samples A and B come from the same population?
SAME QUESTION!
Two-sample T-test
$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \ne \mu_2$, or equivalently $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \ne 0$
Practical example:
• Is there a difference between fertilized and control plots?
[Figure: Sample A (x̄1) and Sample B (x̄2), comparing means]
Comparing means is meaningless unless we also compare variance.
Denominator = the standard error of the difference, i.e. the variation within both samples combined.
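Q(x1, x2): Do samples A (x̄1) and B (x̄2) come from the same population? We test the difference between the two sample means, x̄1 and x̄2.
$t_{actual} = \dfrac{\text{signal}}{\text{noise}} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Signal = difference in sample means; noise = standard error of the difference.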
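Here is a sketch of the two-sample signal-to-noise ratio using invented plot yields (hypothetical data). R's t.test() defaults to Welch's test (var.equal = FALSE), which uses this same t but an approximate df. Setting var.equal = TRUE gives Student's test with df = n1 + n2 − 2. With equal sample sizes, both tests produce the same t.

```r
# Hypothetical plot yields (t/ha): fertilized (A) vs control (B); made-up values
A <- c(5.2, 5.8, 6.1, 5.5, 5.9, 6.3, 5.7, 6.0, 5.4, 6.2)
B <- c(5.0, 5.3, 5.6, 4.9, 5.4, 5.8, 5.1, 5.5, 5.2, 5.7)
n1 <- length(A)
n2 <- length(B)

signal   <- mean(A) - mean(B)                  # difference in sample means
noise    <- sqrt(var(A) / n1 + var(B) / n2)    # standard error of the difference
t_actual <- signal / noise

# Student's test (var.equal = TRUE) uses df = n1 + n2 - 2; with n1 == n2 its t equals t_actual
res_student <- t.test(A, B, var.equal = TRUE)
# R's default is Welch's test: same t formula, but an approximate (fractional) df
res_welch <- t.test(A, B)
print(c(t_actual   = t_actual,
        t_student  = unname(res_student$statistic),
        df_student = unname(res_student$parameter),
        df_welch   = unname(res_welch$parameter)))
```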
Worked example: Sample A with n1 = 10, Sample B with n2 = 10
[Figure: sampling t-distribution of x̄1 − x̄2; red area = 95% C.I. (α = 0.05); observed t_actual = 2.35]
Critical t-value (in R): qt(α/2, df = n1+n2-2)
n1 + n2 = 20, so df = n1 + n2 − 2 = 18
P-value (in R): pt(t, df)
Critical t-values:
> qt(.975, 18)
= 2.1009
> qt(.025, 18)
= -2.1009
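T-test in R:
t.test(sampleDataA, sampleDataB, mu = 0, alternative = "two.sided", var.equal = TRUE, conf.level = 0.95)
If t_actual = 2.35, the actual p-value is:
> pt(2.35, 18)
= 0.985 (area below t_actual)
> pt(-2.35, 18)
= 0.0152 (area in the tail beyond t_actual, by symmetry), so p_actual ≈ 0.015
Because |t_actual| = 2.35 > t_critical = 2.1009 (equivalently, p_actual = 0.015 < α/2 = 0.025, or a two-sided p-value of about 0.03 < 0.05), reject H0: μ1 = μ2 and conclude that μ1 ≠ μ2.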
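The worked numbers can be reproduced in base R. This sketch uses only the values given in the example above (n1 = n2 = 10, t_actual = 2.35, α = 0.05).

```r
# Reproduce the worked example: n1 = n2 = 10 and an observed t_actual of 2.35
n1 <- 10
n2 <- 10
df <- n1 + n2 - 2                      # 18
alpha    <- 0.05
t_actual <- 2.35

t_crit <- qt(1 - alpha / 2, df)        # upper critical value
p_tail <- pt(-abs(t_actual), df)       # area beyond t_actual in one tail
p_two  <- 2 * p_tail                   # two-sided p-value
reject <- abs(t_actual) > t_crit       # TRUE means reject H0: mu1 = mu2

print(round(c(df = df, t_crit = t_crit, p_tail = p_tail, p_two = p_two), 4))
print(reject)
```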
Paired T-test
$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \ne \mu_2$, or equivalently $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \ne 0$
Practical example:
• How does soil nutrition across forest plots change from pre- to post-harvest?
Q(x1, x2): Do samples A (x1) and B (x2) come from the same population? BUT x1 and x2 are measurements of the same individuals before and after a treatment is applied. We test the difference between the x1 and x2 sample means, using the pair-by-pair differences.
Pre-treatment: n = 10
Post-treatment: n = 10
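$t_{actual} = \dfrac{\bar{X}_D - \mu_0}{s_D/\sqrt{n}}$
D: difference for each pair; $\bar{X}_D$: the average of the differences; $\mu_0$: the hypothesized mean difference under H0 (0 here, matching mu = 0 in the R call); $s_D$: standard deviation of the differences
$s_D = \sqrt{\dfrac{\sum_{i=1}^{n} (D_i - \bar{X}_D)^2}{n - 1}}$
n: number of observation pairs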
T-test in R:
t.test(dataBefore, dataAfter, mu = 0, alternative = "two.sided", paired = TRUE)
Denominator ($s_D/\sqrt{n}$) = standard error of the difference
This is a more powerful test: because each unit is measured both before and after the treatment, the variation between units cancels out of the differences, which makes it easier to isolate the signal from the treatment.
You only need to account for the error in the sample once, because both sets of measurements come from the same units.
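Here is a sketch of the paired calculation using invented before/after soil measurements (hypothetical values; the mg/kg units are an assumption). It computes D, s_D and t_actual by hand, with mu0 as the hypothesized mean difference (0), matching mu = 0 in the R call. It then checks these against t.test(..., paired = TRUE). For contrast, it also runs an unpaired test on the same numbers to show the power lost when the pairing is ignored.

```r
# Hypothetical soil nitrogen (mg/kg) on the same 10 plots pre- and post-harvest; made-up values
before <- c(42, 38, 45, 40, 37, 44, 41, 39, 43, 36)
after  <- c(39, 37, 41, 38, 36, 40, 40, 35, 41, 35)

D   <- before - after                        # difference for each pair
n   <- length(D)                             # number of observation pairs
mu0 <- 0                                     # hypothesized mean difference under H0
s_D <- sqrt(sum((D - mean(D))^2) / (n - 1))  # standard deviation of the differences
t_actual <- (mean(D) - mu0) / (s_D / sqrt(n))
p_two    <- 2 * pt(-abs(t_actual), df = n - 1)

res_paired   <- t.test(before, after, mu = 0, alternative = "two.sided", paired = TRUE)
# Same data analysed as if the samples were independent, for comparison
res_unpaired <- t.test(before, after, mu = 0, alternative = "two.sided", var.equal = TRUE)
print(c(t_actual = t_actual, t_R = unname(res_paired$statistic),
        p_paired = res_paired$p.value, p_unpaired = res_unpaired$p.value))
```

With these made-up numbers, the paired test detects the consistent drop in every plot. The unpaired test, which has to treat the plot-to-plot variation as noise, gives a much larger p-value.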
All these comparisons only work if we assume:
1. Each observation of the dependent variable is independent of other observations
2. The experimental errors of your data are normally distributed
3. Equal variances among groups
Remember…
For a paired t-test we only require that the pair differences (Ai − Bi) be independent of each other
Skew and kurtosis will limit your ability to make meaningful comparisons
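Assumptions 2 and 3 can be screened with base R, as sketched below on invented data (hypothetical values). Using shapiro.test() and var.test() is our choice of checks, not something the slides prescribe. Independence cannot be tested from the numbers alone; it has to come from the study design. The skew line is a simple moment-based estimate.

```r
# Hypothetical two-sample data (made-up values) used only to illustrate the checks
A <- c(5.2, 5.8, 6.1, 5.5, 5.9, 6.3, 5.7, 6.0, 5.4, 6.2)
B <- c(5.0, 5.3, 5.6, 4.9, 5.4, 5.8, 5.1, 5.5, 5.2, 5.7)

# Assumption 2: errors roughly normal. Residuals = deviations from each group's mean.
resid     <- c(A - mean(A), B - mean(B))
normality <- shapiro.test(resid)     # Shapiro-Wilk test; small p suggests non-normal errors

# Assumption 3: equal variances. F test of the ratio of the two sample variances.
equal_var <- var.test(A, B)          # small p suggests unequal variances

# Skew: simple moment-based estimate (about 0 for symmetric data)
skew <- mean(resid^3) / mean(resid^2)^1.5

print(c(shapiro_p = normality$p.value, var_test_p = equal_var$p.value, skew = skew))
```

For a paired design, the same normality check would be applied to the pair differences instead of the group residuals.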
Lentil Challenge!
Let’s say you are a geneticist in the agriculture field and you create a new GMO lentil (use Variety A from the class). Initially it shows increased yields, so you go to your boss with this finding (hoping for some praise and compensation). Your boss sees the promise in your variety, but she also knows it is more expensive to produce and there could be a public backlash. She says she can only take the risk and move forward with production if you can be 90% sure that there will be at least a 30% productivity gain (the current average yield is 500 kg/ha)… How do you prove this to your boss (a.k.a. show me the math)?