Biostatistics, statistical software V.
Statistical errors, one- and two-sided tests. One-way and multifactor analysis of variance.
Krisztina Boda PhD
Department of Medical Informatics, University of Szeged
INTERREG 2Krisztina Boda
One- and two-tailed (sided) tests
Two-tailed test
  H0: there is no change
  Ha: there is a change (in either direction)
One-tailed test
  H0: the change is negative or zero
  Ha: the change is positive
p-values: p(one-tailed) = p(two-tailed)/2, provided the observed change is in the direction stated by Ha
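The relation between the two p-values can be checked numerically. The sketch below is only an illustration: it assumes a z-test with a known standard deviation (the slides do not name a particular test), and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_values(sample_mean, sigma, n, mu0=0.0):
    """Return (two-tailed p, one-tailed p for Ha: mean > mu0) of a z-test.

    Assumption for this sketch: sigma is the known population SD.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    p_one_tailed = 1.0 - std_normal.cdf(z)                # P(Z >= z)
    p_two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))   # P(|Z| >= |z|)
    return p_two_tailed, p_one_tailed


# Hypothetical data: mean change 1.2, known SD 3.0, n = 25 (so z = 2.0)
p_two, p_one = z_test_p_values(1.2, 3.0, 25)
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

If the observed change has the opposite sign (for example a mean of -1.2), the one-tailed p-value becomes 1 - p(two-tailed)/2, so halving applies only when the data point in the direction of Ha.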
Significance
Significant difference: if we claim that there is a difference (effect), the probability of being mistaken is small (at most α, the probability of a Type I error).
Not significant difference: we say that there is not enough information to show a difference. Possible reasons:
  Perhaps there is no difference
  There is a difference, but the sample size is small
  The dispersion is large
  The method was wrong
Even in the case of a statistically significant difference, one has to think about its biological meaning.
Statistical errors
Truth | Decision: do not reject H0 | Decision: reject H0 (significance)
H0 is true | correct | Type I error, its probability: α
Ha is true | Type II error, its probability: β | correct
Error probabilities
The probability of a Type I error is known (α).
The probability of a Type II error is not known (β). It depends on:
  the significance level (α)
  the sample size
  the standard deviation(s)
  the true difference between the populations
  others (type of the test, assumptions, design, ...)
The power of a test: 1 − β, the ability to detect a real effect, i.e. the probability of obtaining a significant p-value when the effect is real.
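How the listed factors act on the power can be sketched with a simple calculation. The example assumes a two-sided, two-sample z-test with a known common standard deviation and equal group sizes; this test and all numeric values are assumptions chosen for illustration, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-sample z-test.

    delta: true difference between the population means
    sigma: common (known) standard deviation, an assumption of this sketch
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)        # critical value
    shift = delta / (sigma * sqrt(2.0 / n_per_group))     # standardized effect
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical setting: true difference 5, SD 10, growing sample size
for n in (10, 20, 50, 100):
    print(n, round(power_two_sample_z(delta=5.0, sigma=10.0, n_per_group=n), 3))
```

With delta = 0 the function returns α itself, which is the Type I error probability; larger samples, a larger true difference, a smaller standard deviation, or a larger α all increase the power.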
The power of a test in the case of fixed sample size and α, with two alternative hypotheses
ANOVA: Analysis of Variance
Comparison of the means of several (>2) normally distributed samples
Types:
  One-way: e.g. control, treatment I, treatment II
  Two-way: e.g. treatment + sex
Any "way" (factor) can be
  "independent" ("between-subjects"): sex, treatments
  "repeated measures" ("within-subjects"): data measured on the same patient
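Why not t-test (pairwise)?
We can get a significant result purely by chance in every 20th case.
Data: two groups (CSOP = group), variables R1 to R8; the last row gives the t-test p-value for each variable.
CSOP R1 R2 R3 R4 R5 R6 R7 R8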
1.00 -.84 1.73 2.36 -.30 -.31 -.31 -.56 1.58
1.00 .59 .44 .60 -.75 -.28 -1.51 -.81 -.12
1.00 .19 -.73 -1.04 1.27 .69 -.21 -.52 -1.34
1.00 -1.05 .88 1.27 1.05 -.87 .68 -.17 -.15
1.00 .12 -.75 -.05 -1.13 2.21 .74 -.90 -.45
1.00 1.10 -.20 -.78 1.02 .67 .18 -.52 -.34
1.00 -.19 -.57 -.41 2.25 -1.26 -.27 .44 -2.52
1.00 .45 1.20 2.77 -.17 -.68 .60 .54 -.37
1.00 -.58 -.01 .60 1.66 2.14 2.31 -.90 -1.75
1.00 -.39 .93 -.51 .31 -.60 -.21 .55 .57
1.00 -.23 -1.21 -1.08 .02 .31 -1.28 1.20 1.62
1.00 .87 .97 -1.04 .60 -.29 .86 1.09 -.68
2.00 .42 -1.18 -.64 -.08 1.10 .39 -.66 2.12
2.00 1.26 -2.13 -1.78 -.60 -1.25 -1.10 .19 -1.54
2.00 -.60 -.83 -.94 1.61 .95 1.37 .10 -.97
2.00 -1.75 .63 .16 .24 -.25 1.49 .42 -2.01
2.00 .07 -.33 -.56 .36 .12 -.48 .78 -1.29
2.00 .15 .85 .10 -2.07 .18 2.14 1.71 .62
2.00 .98 -1.20 -.46 -.92 .08 -1.37 .80 -.67
2.00 -.42 1.05 -.29 .73 .10 1.42 .79 1.67
2.00 2.00 .06 2.24 -.31 -.13 -.01 .04 -.45
2.00 -1.85 -1.83 3.35 1.83 -.12 -.30 -1.68 .57
2.00 1.06 -.55 -.36 -.80 -1.41 -1.49 .89 .82
2.00 -.57 -2.15 2.15 -.99 -1.63 .00 -.41 1.42
t-test p: 0.882846 0.053926 0.96894 0.205339 0.418212 0.928912 0.391001 0.508963
sign. → Type I error
The increase of type I error
It can be shown that when t-tests are used to test for differences between multiple groups, the chance of mistakenly declaring significance (Type I error) increases. For example, in the case of 5 groups, if no overall differences exist between any of the groups, using two-sample t-tests pairwise, we would have about a 30% chance of declaring at least one difference significant, instead of a 5% chance.
In general, the t-test can be used to test the hypothesis that two group means are not different. To test the hypothesis that three or more group means are not different, analysis of variance should be used.
Each statistical test produces a p-value.
If the significance level is set at 0.05 (false positive rate) and we do multiple significance testing on the data from a single clinical trial, then the overall false positive rate for the trial will increase with each significance test.
False positive rate for each test = 0.05
Probability of incorrectly rejecting ≥ 1 hypothesis out of N independent tests = 1 − (1 − 0.05)^N = 1 − (1 − α)^N
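The formula is easy to evaluate for a few values of N. A minimal sketch, assuming the N tests are independent (the function name is only illustrative):

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 10, 20, 50, 100):
    print(f"{n:3d} tests: {familywise_error(0.05, n):.3f}")
```

Already with 10 independent tests at α = 0.05 the chance of at least one false positive is about 40%, and it approaches 1 as the number of tests grows.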
The increase of experimentwise Type I error
[Figure: familywise Type I error probability by number of comparisons. x-axis: number of comparisons (0 to 110); y-axis: probability (0 to 1.2).]
Compound hypotheses
(H01 and H02 and ... H0n) null hypotheses; the significance levels are α1, α2, …, αn.
How should the αi-s be chosen so that the level of the compound hypothesis (H01 and H02 and ... H0n) is no greater than α, where α ∈ (0,1)?
Bonferroni correction
The α is divided by the number of comparisons: (H01 and H02 and ... H0n) is rejected if at least one pi < α/n.
In the case of many comparisons, this is too conservative (it may fail to show real differences).
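The rule can be written down in a few lines. This is a minimal sketch; the function name and the p-values are hypothetical and serve only as an illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Compare every p-value with alpha / n.

    Returns (compound hypothesis rejected?, per-test flags).
    """
    threshold = alpha / len(p_values)
    flags = [p < threshold for p in p_values]
    return any(flags), flags


# Hypothetical p-values from n = 5 comparisons; threshold is 0.05 / 5 = 0.01
any_rejected, flags = bonferroni_reject([0.004, 0.028, 0.011, 0.2, 0.6])
print(any_rejected, flags)
```

Here only the first comparison passes the threshold; 0.011 and 0.028 would be significant at the unadjusted 0.05 level but are not after the correction, which illustrates how conservative the method is.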
Holm modification (SAS: step-down Bonferroni)
The pi-s are sorted: p1 ≤ p2 ≤ ... ≤ pn
H0i is tested at level α/(n − i + 1)
If any of them is significant, then reject (H01 and H02 and ... H0n).
E.g. n = 5, α = 0.05:
  p1 ≤ α/5 = 0.01; if p1 is not smaller, then finish
  p2 ≤ α/4 = 0.0125; if p2 is not smaller, then finish
  p3 ≤ α/3 = 0.0167; if p3 is not smaller, then finish
  p4 ≤ α/2 = 0.025; ...
  p5 ≤ α/1 = 0.05
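The step-down procedure can be sketched as follows. The function name and the example p-values are hypothetical; the levels α/n, α/(n−1), ..., α/1 follow the n = 5 example above.

```python
def holm_step_down(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans (True = H0i rejected) in the input order.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # ascending p-values
    rejected = [False] * n
    for step, idx in enumerate(order):        # step = 0 for the smallest p
        level = alpha / (n - step)            # alpha/n, alpha/(n-1), ..., alpha/1
        if p_values[idx] <= level:
            rejected[idx] = True
        else:
            break                             # finish at the first non-significant p
    return rejected


# Hypothetical p-values from n = 5 comparisons
print(holm_step_down([0.028, 0.004, 0.011, 0.2, 0.6]))
```

In this example 0.004 ≤ 0.01 and 0.011 ≤ 0.0125 are rejected, and the procedure stops at 0.028 > 0.0167. Plain Bonferroni would have rejected only the first of these.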
FDR (false discovery rate)
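The pi-s are sorted: p1 ≤ p2 ≤ ... ≤ pn
Begin with the greatest p-value; it remains the same (it is tested at level α).
The next ones are tested at level α·i/n, where i is the rank of the p-value.
E.g. n = 5, α = 0.05:
  p5 ≤ α
  p4 ≤ α·4/5
  p3 ≤ α·3/5
  p2 ≤ α·2/5
  p1 ≤ α·1/5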
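A sketch of this step-up idea, assuming the procedure meant here is the standard Benjamini-Hochberg one (the i-th smallest p-value is compared with α·i/n, starting from the greatest). The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed variant of FDR control).

    Returns a list of booleans (True = H0i rejected) in the input order.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # ascending p-values
    cutoff_rank = 0
    # Start from the greatest p-value and move towards the smallest one
    for rank in range(n, 0, -1):
        if p_values[order[rank - 1]] <= alpha * rank / n:
            cutoff_rank = rank
            break
    rejected = [False] * n
    for idx in order[:cutoff_rank]:           # reject all up to the cutoff rank
        rejected[idx] = True
    return rejected


# Hypothetical p-values from n = 5 comparisons
print(benjamini_hochberg([0.028, 0.004, 0.011, 0.2, 0.6]))
```

With these values the third smallest p-value (0.028) is below its level 0.05·3/5 = 0.03, so the three smallest p-values are all declared significant.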
Correction of individual p-values
The SAS System
The Multtest Procedure
p-Values
Test | Raw | Stepdown Bonferroni | Hochberg | False Discovery Rate
1 | 0.9999 | 1.0000 | 0.9999 | 0.9999
2 | 0.2318 | 0.9272 | 0.9272 | 0.5795
3 | 0.3771 | 1.0000 | 0.9999 | 0.6285
4 | 0.8231 | 1.0000 | 0.9999 | 0.9999
5 | 0.0141 | 0.0705 | 0.0705 | 0.0705
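The adjusted p-values in this output can be recomputed from the raw ones. The sketch below uses the textbook definitions of the three adjustments; that they coincide with what the SAS procedure computes is an assumption, which the comparison with the printed values supports. Function names are illustrative.

```python
def holm_adjust(p):
    """Step-down Bonferroni (Holm) adjusted p-values, in the input order."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])          # ascending
    adjusted, running_max = [0.0] * n, 0.0
    for step, idx in enumerate(order):                    # multipliers n, n-1, ..., 1
        running_max = max(running_max, min(1.0, (n - step) * p[idx]))
        adjusted[idx] = running_max
    return adjusted


def hochberg_adjust(p):
    """Hochberg step-up adjusted p-values, in the input order."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i], reverse=True)   # descending
    adjusted, running_min = [0.0] * n, 1.0
    for step, idx in enumerate(order):                    # multipliers 1, 2, ..., n
        running_min = min(running_min, (step + 1) * p[idx])
        adjusted[idx] = running_min
    return adjusted


def fdr_adjust(p):
    """Benjamini-Hochberg (FDR) adjusted p-values, in the input order."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i], reverse=True)   # descending
    adjusted, running_min = [0.0] * n, 1.0
    for step, idx in enumerate(order):                    # rank = n - step
        running_min = min(running_min, p[idx] * n / (n - step))
        adjusted[idx] = running_min
    return adjusted


raw = [0.9999, 0.2318, 0.3771, 0.8231, 0.0141]            # raw p-values from the table
for name, func in (("Holm", holm_adjust), ("Hochberg", hochberg_adjust), ("FDR", fdr_adjust)):
    print(name, [round(x, 4) for x in func(raw)])
```

An adjusted p-value can be compared directly with α = 0.05; here even the smallest one (0.0705) is not significant after any of the three corrections.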
One-Way ANOVA
Let us suppose that we have t independent samples (t "treatment" groups) drawn from normal populations with equal variances, N(µi, σ), i = 1, ..., t.
Assumptions:
  independent samples
  normality
  equal variances
Null hypothesis: the population means are equal, µ1 = µ2 = ... = µt
http://lib.stat.cmu.edu/DASL/Stories/CancerSurvival.html
Cameron, E. and Pauling, L. (1978) Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival times in terminal human cancer. Proceedings of the National Academy of Sciences USA, 75, 4538-4542.
[Figure: box plots of survival by GROUP (Stomach N = 13, Bronchus N = 17, Colon N = 17, Ovary N = 6, Breast N = 11). Left panel: original data (SURVIVAL); right panel: square root transformed data (SQSURV).]
Method
If the null hypothesis is true, then the populations are the same: they are normal, and they have the same mean and the same variance. This common variance is estimated in two distinct ways:
  between-groups variance
  within-groups variance
If the null hypothesis is true, then these two distinct estimates of the variance should be approximately equal.
"New" (and equivalent) null hypothesis: σ²between = σ²within; their equality can be tested by an F ratio test.
The p-value of this test:
  if p > 0.05, then we do not reject H0. The analysis is complete.
  if p < 0.05, then we reject H0 at the 0.05 level. At least one group mean is different from one of the others.
[Figure: random samples drawn from normal distributions with equal (a) and unequal (b) means and equal dispersion.]
The ANOVA table
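Source of variation | Sum of squares | Degrees of freedom | Variance | F | p
Between groups | $Q_k = \sum_{i=1}^{t} n_i (\bar{x}_i - \bar{x})^2$ | $t-1$ | $s_k^2 = \frac{Q_k}{t-1}$ | $F = \frac{s_k^2}{s_b^2}$ | $p$
Within groups | $Q_b = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$ | $N-t$ | $s_b^2 = \frac{Q_b}{N-t}$ | |
Total | $Q = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$ | $N-1$ | | |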
ANOVA
SQSURV
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 3295.038 | 4 | 823.759 | 6.484 | .000
Within Groups | 7495.266 | 59 | 127.038 | |
Total | 10790.304 | 63 | | |
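The entries of the ANOVA table can be computed directly from the group data. The sketch below follows the sums of squares of the table; the function name and the small data set are hypothetical. The p-value is not computed, because it requires the F distribution, which is not in the Python standard library.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of samples."""
    all_values = [x for g in groups for x in g]
    n_total, t = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares
    q_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    q_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    s2_between = q_between / (t - 1)          # between-groups variance
    s2_within = q_within / (n_total - t)      # within-groups variance
    return {
        "q_between": q_between, "df_between": t - 1, "s2_between": s2_between,
        "q_within": q_within, "df_within": n_total - t, "s2_within": s2_within,
        "F": s2_between / s2_within,
    }


# Hypothetical data: three groups of three observations
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)

# The F value of the SQSURV output follows from its sums of squares and df
f_sqsurv = (3295.038 / 4) / (7495.266 / 59)
print(round(f_sqsurv, 3))
```

For the hypothetical data the between-groups sum of squares is 26 with 2 degrees of freedom and the within-groups sum of squares is 6 with 6 degrees of freedom, giving F = 13. The second calculation reproduces the F of the SQSURV table from its printed sums of squares.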
Pairwise comparisons
As the two-sample t-test is inappropriate for this, there are special tests for multiple comparisons that keep the probability of a Type I error at α. The most often used multiple comparisons are the modified t-tests.
Modified t-tests:
  LSD
  Bonferroni: α/(number of comparisons)
  Scheffé
  Tukey
  Dunnett: a test comparing a given group (control) with the others
Multiple Comparisons
Dependent Variable: SQSURV
Dunnett t (2-sided) (a)
(I) GROUP | (J) GROUP | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
Stomach | Breast | -18.8090* | 4.61748 | .001 | -30.3632 | -7.2547
Bronchus | Breast | -19.9927* | 4.36140 | .000 | -30.9062 | -9.0793
Colon | Breast | -13.5661* | 4.36140 | .010 | -24.4796 | -2.6526
Ovary | Breast | -7.6217 | 5.72032 | .474 | -21.9355 | 6.6922
*. The mean difference is significant at the .05 level.
a. Dunnett t-tests treat one group as a control, and compare all other groups against it.
Example
http://lib.stat.cmu.edu/DASL/Stories/ReadingComprehension.html
Researchers at Purdue University conducted an experiment to compare three methods of teaching reading.
Students were randomly assigned to one of the three teaching methods, and their reading comprehension was tested before and after they received the instruction. Several different measures of reading comprehension, from both the pre- and posttests are included in the dataset.
Reference: Moore, David S., and George P. McCabe (1989). Introduction to the Practice of Statistics. Original source: study conducted by Jim Baumann and Leah Jones of the Purdue University Education Department.
ANOVA
POST2 Posttest score on second reading comprehension measure
Source | Sum of Squares | df | Mean Square | F | Sig.
Between Groups | 95.121 | 2 | 47.561 | 8.407 | .001
Within Groups | 356.409 | 63 | 5.657 | |
Total | 451.530 | 65 | | |
Multiple Comparisons
Dependent Variable: POST2 Posttest score on second reading comprehension measure
(I) and (J) groupcode: type of instruction that the student received
Method | (I) groupcode | (J) groupcode | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound
LSD | 1 Basal | 2 DRTA | -.682 | .717 | .345 | -2.11 | .75
LSD | 1 Basal | 3 Strat | -2.818* | .717 | .000 | -4.25 | -1.39
LSD | 2 DRTA | 1 Basal | .682 | .717 | .345 | -.75 | 2.11
LSD | 2 DRTA | 3 Strat | -2.136* | .717 | .004 | -3.57 | -.70
LSD | 3 Strat | 1 Basal | 2.818* | .717 | .000 | 1.39 | 4.25
LSD | 3 Strat | 2 DRTA | 2.136* | .717 | .004 | .70 | 3.57
Bonferroni | 1 Basal | 2 DRTA | -.682 | .717 | 1.000 | -2.45 | 1.08
Bonferroni | 1 Basal | 3 Strat | -2.818* | .717 | .001 | -4.58 | -1.05
Bonferroni | 2 DRTA | 1 Basal | .682 | .717 | 1.000 | -1.08 | 2.45
Bonferroni | 2 DRTA | 3 Strat | -2.136* | .717 | .012 | -3.90 | -.37
Bonferroni | 3 Strat | 1 Basal | 2.818* | .717 | .001 | 1.05 | 4.58
Bonferroni | 3 Strat | 2 DRTA | 2.136* | .717 | .012 | .37 | 3.90
*. The mean difference is significant at the .05 level.
Nonparametric one-way ANOVA: Kruskal-Wallis test
As a result, it gives one p-value. If it is not significant, the null hypothesis is not rejected.
If the null hypothesis is rejected, further tests are required to make pairwise comparisons. These pairwise comparisons are generally not available in standard statistical packages. Pairwise comparisons can be performed by Mann-Whitney U tests, and the p-values can be corrected by Bonferroni correction.
Test Statistics (a, b)
 | SURVIVAL
Chi-Square | 14.954
df | 4
Asymp. Sig. | .005
a. Kruskal Wallis Test
b. Grouping Variable: GROUP
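The Chi-Square value in such an output is the Kruskal-Wallis H statistic, which is computed from the ranks of the pooled data. A minimal sketch with hypothetical data is given below; it uses mid-ranks for tied values but omits the tie correction that statistical packages usually apply, so with many ties the result can differ slightly from package output.

```python
def kruskal_wallis_h(groups):
    """Return (H statistic, degrees of freedom) for a list of samples.

    Assumption of this sketch: no tie correction is applied.
    """
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    # Assign to every distinct value the mean of the ranks it occupies
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0    # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)
    return h, len(groups) - 1


# Hypothetical data: three completely separated groups
h_stat, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 3), df)
```

H is compared with a chi-square distribution with (number of groups − 1) degrees of freedom, which is why the output above reports df = 4 for the five cancer groups.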
Two-way ANOVA, example
Does systolic blood pressure depend on
  diabetes (diabetic or not)
  sex (male or female)
Independent factors
Two-way repeated measurements ANOVA
Does QT widening in the Langendorff-perfused rat heart represent the effect of repolarization delay or conduction slowing? J Cardiovasc Pharmacol. 42 (2003) 612-21
[Figure: effect of regional ischemia and K+ content of the perfusion solution on the QT90 interval (A) and heart rate (B) in drug-free isolated rat hearts (n = 12 hearts per group), mean ± SEM. Panel A: QT90 (ms) by time (min); panel B: heart rate (beats/min) by time (min); curves for 3 mM K+ and 5 mM K+.]
Frequently, separate univariate analyses are performed at every time point, taking no account of the fact that the data are related in time. A second problem is the frequent occurrence of missing values in the data. A repeated measurement ANOVA model is more appropriate (Brown and Prescott).
In addition, with separate analyses repeated testing is taking place, and therefore a significant effect is more likely to occur at some time point by chance.
Repeated measurement ANOVA model
We can examine:
  the treatment effect (K+)
  the time effect
  their interaction
[Figure, panel B: heart rate (beats/min) by time (min) for 3 mM K+ and 5 mM K+; asterisks mark time points in the original figure.]
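At high potassium concentration the heart rate is significantly higher, independently of the time at which it was measured: the KALIUM effect is significant (p = 0.0063), while the KALIUM*time interaction is not (p = 0.8465).
Type 3 Tests of Fixed Effects
Effect | Num DF | Den DF | F Value | Pr > F
KALIUM | 1 | 22 | 9.14 | 0.0063
time | 9 | 198 | 21.70 | <.0001
KALIUM*time | 9 | 198 | 0.54 | 0.8465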
Review questions and exercises
Problems to be solved by hand-calculations ..\Handouts\Problems hand V.doc
Solutions ..\Handouts\Problems hand V solutions.doc
Problems to be solved using computer ..\Handouts\Problems comp V.doc, ..\Handouts\Problems comp V solutions.doc
Useful WEB pages
http://www-stat.stanford.edu/~naras/jsm
http://www.ruf.rice.edu/~lane/rvls.html
http://my.execpc.com/~helberg/statistics.html
http://www.math.csusb.edu/faculty/stanton/m262/index.html