E509A: Principle of Biostatistics
(Week 11(2): Introduction to non-parametric methods)
GY Zou
Sign test for two dependent samples
Ex 12.1
subj 1 2 3 4 5 6 7 8 9 10
baseline 166 135 189 180 156 142 176 156 164 142
post 138 120 176 180 160 150 152 140 160 130
diff 28 15 13 0 -4 -8 24 16 4 12
sign + + + 0 - - + + + +
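Are 7 ‘+’ signs significant?
Under H0, each subject gets a ‘+’ with probability p = 0.5, so we can use the binomial distribution to obtain the P-value. (Here the tied pair, subject 4, is kept in n = 10 and counted as a non-‘+’; many texts instead drop zero differences, which would give n = 9.) Let x denote the number of ‘+’ signs; then
$$\Pr(X = x) = \binom{n}{x} 0.5^x (1 - 0.5)^{n-x} = \binom{10}{x} 0.5^{10}$$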
x 0 1 2 3 4 5 6 7 8 9 10
Pr 0.0010 0.0098 0.0439 0.1172 0.2051 0.2461 0.2051 0.1172 0.0439 0.0098 0.0010
By definition, the one-sided P-value is Pr(X ≥ 7):
P = 0.1172 + 0.0439 + 0.0098 + 0.0010 = 0.1719.
The two-sided P-value is 2 × 0.1719 = 0.3438.
The mid-P counts only half of Pr(X = 7):
mid-P = 0.5 × 0.1172 + 0.0439 + 0.0098 + 0.0010 = 0.1133,
and the two-sided mid-P is 2 × 0.1133 = 0.2266.
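As a cross-check outside SAS, here is a minimal Python sketch (standard library only; the function name `sign_test_p` is hypothetical) that reproduces the exact one-sided P-value and mid-P from the Binomial(10, 0.5) probabilities, using n = 10 as in the calculation above.

```python
from math import comb


def sign_test_p(n_plus: int, n: int) -> tuple[float, float]:
    """One-sided exact P-value and mid-P for observing n_plus or more '+' signs
    out of n, under H0: each sign is '+' with probability 0.5."""
    pmf = [comb(n, x) * 0.5 ** n for x in range(n + 1)]   # Binomial(n, 0.5)
    p_exact = sum(pmf[n_plus:])                           # Pr(X >= n_plus)
    p_mid = 0.5 * pmf[n_plus] + sum(pmf[n_plus + 1:])     # half weight on observed x
    return p_exact, p_mid


p, mid = sign_test_p(7, 10)
print(f"one-sided P = {p:.4f}, two-sided P = {2 * p:.4f}")
print(f"one-sided mid-P = {mid:.4f}, two-sided mid-P = {2 * mid:.4f}")
```

The exact values are 176/1024 and 116/1024, which round to the 0.1719 and 0.1133 shown above.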
The sign test disregards a lot of information: it uses only the direction of each difference, not its size.
The Wilcoxon signed-rank test (for two dependent samples)
The null hypothesis is that the median of the differences is 0, i.e., H0 : Md = 0.
subj        1    2    3    4    5     6    7    8    9     10
diff        28   15   13   0    -4    -8   24   16   4     12
sign        +    +    +    ·    -     -    +    +    +     +
rank        9    6    5    ·    1.5   3    8    7    1.5   4
rank×sign   +9   +6   +5   ·    -1.5  -3   +8   +7   +1.5  +4
(· = zero difference, dropped before ranking)
The Wilcoxon signed-rank test statistic, denoted by T, is the sum of the positive ranks; here T = 9 + 6 + 5 + 8 + 7 + 1.5 + 4 = 40.5.
Anybody can propose a test; the difficulty lies in working out the properties of the test.
For T:
• if there are no positive ranks, T = 0;
• if all ranks are positive, T = 1 + 2 + · · · + n = n(n + 1)/2, where n is the number of observations with a nonzero difference;
• by symmetry under H0, the mean of T is halfway between these extremes, n(n + 1)/4, and its variance under H0 is
$$\mathrm{var}_0(T) = \frac{n(n+1)(2n+1)}{24};$$
• a large-sample test is
$$Z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \sim N(0, 1) \text{ under } H_0.$$
Ex 12.2
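$$Z = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} = \frac{40.5 - \frac{9(9+1)}{4}}{\sqrt{\frac{9(9+1)(2 \times 9+1)}{24}}} = 2.13,$$
which yields a one-sided P-value of 0.0166. The two-sided P-value is 2 × 0.0166 = 0.0332. (The exact distribution of T given the observed ranks, i.e. all 2^9 = 512 equally likely sign patterns, gives one-sided P = 8/512 = 0.0156 and two-sided P = 0.0313.)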
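A minimal Python sketch of Ex 12.2 (standard library only; helper names are hypothetical). It drops the zero difference, ranks the absolute differences with average ranks for ties, forms T and the large-sample Z (no tie correction to the variance, as in the formula above), and also enumerates all 2^9 sign patterns over the observed ranks to get the exact one-sided P-value.

```python
from itertools import product
from statistics import NormalDist


def average_ranks(values):
    """Ranks 1..n (1 = smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of positions i..j (1-based)
        i = j + 1
    return ranks


diff = [28, 15, 13, 0, -4, -8, 24, 16, 4, 12]
d = [x for x in diff if x != 0]                 # zero differences are dropped
r = average_ranks([abs(x) for x in d])          # rank the absolute differences
T = sum(rk for rk, x in zip(r, d) if x > 0)     # sum of positive ranks
n = len(d)

# Large-sample test, variance without tie correction
z = (T - n * (n + 1) / 4) / (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
p_normal = 1 - NormalDist().cdf(z)

# Exact one-sided P: under H0 each rank is '+' or '-' with probability 1/2
hits = sum(1 for signs in product((False, True), repeat=n)
           if sum(rk for rk, s in zip(r, signs) if s) >= T)
p_exact = hits / 2 ** n
print(T, round(z, 2), round(p_normal, 4), p_exact)
```

With the unrounded Z (2.1324) the normal P-value is about 0.0165, slightly below the 0.0166 obtained from Z rounded to 2.13.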
Wilcoxon-Mann-Whitney (WMW) test for two independent samples
• I use the name WMW because this test has been proposed at least 7 times (Kruskal 1957, J Am Stat Assoc 52: 356–360).
• Idea: suppose we have n1 observations from group 1, denoted x1, x2, · · · , xn1, and n2 observations from group 2, denoted y1, y2, · · · , yn2.
• The comparisons can be laid out in a grid:
         x1   x2   · · ·   xn1   Total
  y1
  y2
  ⋮
  yn2
  Total                          U
• In each cell, put 1 if xi < yj, 0 if xi > yj, and 0.5 if xi = yj. There are n1 × n2 comparisons. Summing all the cells gives the U statistic (commonly referred to as the Wilcoxon-Mann-Whitney U statistic).
Check whether U < n1n2 − U; if so, use U to proceed, otherwise use n1n2 − U as U (that is, take the smaller of the two counts).
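The counting rule can be sketched in Python as follows (the function name `u_statistic` is hypothetical), using the two samples of Ex 12.3 below as illustrative input: 0, 7, 11, 16 and 8, 10, 12, 15.

```python
def u_statistic(x, y):
    """Sum over all n1 * n2 pairs: 1 if x_i < y_j, 0.5 if x_i == y_j, 0 otherwise."""
    return sum(1.0 if xi < yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)


x = [0, 7, 11, 16]
y = [8, 10, 12, 15]
u = u_statistic(x, y)
u_small = min(u, len(x) * len(y) - u)   # the "use the smaller count" rule
print(u, u_small)
```

Counting xi < yj gives 10 here; the smaller count, 16 − 10 = 6, is the U used in Ex 12.3.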
The distribution of U under H0
• U ranges from 0 to n1 × n2, with mean n1 × n2/2;
• Under H0, the variance of U can be shown to be
$$\mathrm{var}_0(U) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12};$$
• $$Z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}},$$
which is asymptotically distributed as N(0, 1).
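• This looks different from your book, which uses the rank sum
$$S = U + \frac{n_1(n_1 + 1)}{2},$$
where S is the sum of the ranks (in the pooled sample) of the group of size n1, and U counts the pairs in which that group's observation is the larger.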
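The identity can be checked numerically on the Ex 12.3 data with a short Python sketch (helper names are hypothetical): for each group, the mid-rank sum in the pooled sample equals the number of pairs in which that group's value is the larger (ties counting 1/2) plus n1(n1 + 1)/2.

```python
def rank_sum(group, other):
    """Sum of the mid-ranks of `group` within the pooled sample group + other."""
    pooled = group + other
    return sum(sum(v < a for v in pooled) + (pooled.count(a) + 1) / 2 for a in group)


def u_larger(group, other):
    """Number of pairs in which the `group` member is the larger (ties count 1/2)."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in group for b in other)


x = [0, 7, 11, 16]
y = [8, 10, 12, 15]
for g, o in ((x, y), (y, x)):
    n1 = len(g)
    # Both numbers printed on each line should agree: S and U + n1(n1 + 1)/2
    print(rank_sum(g, o), u_larger(g, o) + n1 * (n1 + 1) / 2)
```

The rank sums are 16 and 20, the latter matching the "Sum of Scores" for group 1 in the NPAR1WAY output below.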
Ex 12.3
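Columns are the values 0, 7, 11, 16; rows are the values 8, 10, 12, 15. Each cell here is 1 when the column value exceeds the row value, so the grand total is directly the smaller count, U = 6 (counting column < row instead gives 16 − 6 = 10).
        0    7    11   16   Total
8       0    0    1    1    2
10      0    0    1    1    2
12      0    0    0    1    1
15      0    0    0    1    1
Total   0    0    2    4    U = 6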
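$$Z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{6 - \frac{4 \times 4}{2}}{\sqrt{\frac{4 \times 4 (4 + 4 + 1)}{12}}} = -0.58$$
The one-sided P-value is 0.28; the two-sided P-value is 2 × 0.28 = 0.56.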
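Some prefer a continuity correction of 0.5 (moving U half a unit toward its mean), so that
$$Z = \frac{U - \frac{n_1 n_2}{2} + 0.5}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} = \frac{6 - \frac{4 \times 4}{2} + 0.5}{\sqrt{\frac{4 \times 4 (4 + 4 + 1)}{12}}} = -0.433$$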
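A short Python sketch of the normal approximation for Ex 12.3, with and without the 0.5 continuity correction (standard library only). The corrected two-sided P-value should agree with the 0.6650 reported by NPAR1WAY below, which also applies a correction of 0.5.

```python
from statistics import NormalDist

n1, n2, u = 4, 4, 6.0                          # Ex 12.3
mean_u = n1 * n2 / 2
sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5

z = (u - mean_u) / sd_u                        # no continuity correction
z_cc = (u - mean_u + 0.5) / sd_u               # U is below its mean, so add 0.5
p_one = NormalDist().cdf(z)                    # lower tail
p_one_cc = NormalDist().cdf(z_cc)
print(round(z, 2), round(p_one, 2), round(z_cc, 3), round(2 * p_one_cc, 4))
```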
data a;
input group response @@;
cards;
1 8 1 10 1 12 1 15
0 0 0 7 0 11 0 16
;
proc npar1way wilcoxon data=a;
class group;
var response;
run;
The NPAR1WAY Procedure
Wilcoxon Scores (Rank Sums) for Variable response
Classified by Variable group
Sum of Expected Std Dev Mean
group N Scores Under H0 Under H0 Score
1 4 20.0 18.0 3.464102 5.0
0 4 16.0 18.0 3.464102 4.0
Wilcoxon Two-Sample Test
Statistic 20.0000
Normal Approximation
Z 0.4330
One-Sided Pr > Z 0.3325
Two-Sided Pr > |Z| 0.6650
t Approximation
One-Sided Pr > Z 0.3390
Two-Sided Pr > |Z| 0.6780
Z includes a continuity correction of 0.5.
Kruskal-Wallis Test
Chi-Square 0.3333
DF 1
Pr > Chi-Square 0.5637
npar1way for Wilcoxon-Mann-Whitney with grouped (frequency) data
data roc;
input disease out count;
cards;
1 1 35
1 2 46
1 3 27
1 4 4
1 5 3
0 1 32
0 2 67
0 3 60
0 4 11
0 5 2
;
proc npar1way wilcoxon data=roc;
class disease;
var out;
freq count;
run;
The NPAR1WAY Procedure
Wilcoxon Scores (Rank Sums) for Variable out
Classified by Variable disease
Sum of Expected Std Dev Mean
disease N Scores Under H0 Under H0 Score
1 115 14897.0 16560.0 653.005616 129.539130
0 172 26431.0 24768.0 653.005616 153.668605
Average scores were used for ties.
Wilcoxon Two-Sample Test
Statistic 14897.0000
Normal Approximation
Z -2.5459
One-Sided Pr < Z 0.0054
Two-Sided Pr > |Z| 0.0109
t Approximation
One-Sided Pr < Z 0.0057
Two-Sided Pr > |Z| 0.0114
Z includes a continuity correction of 0.5.
Kruskal-Wallis Test
Chi-Square 6.4856
DF 1
Pr > Chi-Square 0.0109
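The normal approximation in this output can be reproduced by hand. The sketch below (Python, standard library only) assumes the usual tie-corrected null variance of the rank sum, n1n0/12 × [(N + 1) − Σ(t³ − t)/(N(N − 1))], where t is the size of each group of tied values, together with the 0.5 continuity correction that SAS reports.

```python
from statistics import NormalDist

# Counts for outcome categories 1..5, copied from the data step above
d1 = [35, 46, 27, 4, 3]      # disease = 1
d0 = [32, 67, 60, 11, 2]     # disease = 0
n1, n0 = sum(d1), sum(d0)
N = n1 + n0
totals = [a + b for a, b in zip(d1, d0)]    # size of each group of ties

# Mid-rank of each category: tied observations share the average rank
midranks, start = [], 0
for t in totals:
    midranks.append(start + (t + 1) / 2)
    start += t

S = sum(c * r for c, r in zip(d1, midranks))          # rank sum for disease = 1
expected = n1 * (N + 1) / 2
# Null variance of S with the correction for ties
var = n1 * n0 / 12 * ((N + 1) - sum(t ** 3 - t for t in totals) / (N * (N - 1)))
# Continuity correction: move S half a unit toward its expected value
z = (S - expected + (0.5 if S < expected else -0.5)) / var ** 0.5
p_two = 2 * NormalDist().cdf(-abs(z))
print(S, expected, round(var ** 0.5, 4), round(z, 4), round(p_two, 4))
```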
A common misconception about the WMW test
Hart A. 2001. Mann-Whitney test is not just a test of medians: differences in
spread can be important. BMJ 323: 391–3.
If no distributional assumption is made, the null hypothesis is that the probability that a randomly drawn member of the first population exceeds a randomly drawn member of the second population is 50%.
Example (blind dates in the City of Toronto): the probability that the man is taller than the woman.
If we assume the two population distributions have the same shape, then the WMW test is testing the equality of medians. Otherwise, it is not a test of the equality of medians.
Hart also notes: ‘As Altman states, one form of the test statistic is an estimate of the probability that one variable is less than the other, although this statistic is not output by many statistical packages’.
Here I present a simple way to obtain this estimate using SAS proc freq with the measures option. This works because Pr(X < Y) can also be expressed in terms of Somers' d (Somers, 1962, American Sociological Review 27: 799–811) as
$$\Pr(X < Y) = \frac{d + 1}{2},$$
with ties counted as one half, i.e. (d + 1)/2 = Pr(X < Y) + 0.5 Pr(X = Y).
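SAS gives d and its standard error, from which we can obtain an estimate of Pr(X < Y) and its confidence interval, because
$$\mathrm{var}\left[\widehat{\Pr}(X < Y)\right] = \frac{\mathrm{var}(d)}{4} \Rightarrow \mathrm{s.e.}\left(\widehat{\Pr}(X < Y)\right) = \frac{\mathrm{s.e.}(d)}{2}.$$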
Example. A clinical trial (Aurlien et al. 1998, Bone Marrow Transplant 21: 873–878) involving 35 patients with malignant lymphoma was conducted to estimate the difference in response between Hodgkin's disease patients and non-Hodgkin's patients with respect to time (days) to neutrophil recovery.
Trt 1 (n1 = 25): 8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15
Trt 2 (n2 = 10): 10, 10, 11, 11, 11, 12, 13, 16, 17, 24
Denoting X and Y as the responses obtained from non-Hodgkin’s and
Hodgkin’s patients, respectively, we are interested in Pr(X < Y ).
data a;
input group response @@;
cards;
1 8 1 9 1 10 1 10 1 10 1 10 1 10 1 10 1 11 1 11 1 11 1 11 1 12 1 12 1 12 1 12
1 13 1 13 1 13 1 13 1 13 1 14 1 14 1 14 1 15
2 10 2 10 2 11 2 11 2 11 2 12 2 13 2 16 2 17 2 24
;
proc freq;
tables group* response/ measures cl;
test measures;
run;
The FREQ Procedure
Statistics for Table of group by response
Statistic Value ASE
Gamma 0.2074 0.2480
Kendall’s Tau-b 0.1250 0.1519
Stuart’s Tau-c 0.1469 0.1805
Somers’ D C|R 0.1800 0.2192
Somers’ D R|C 0.0869 0.1057
Pearson Correlation 0.2998 0.1607
Spearman Correlation 0.1429 0.1742
Lambda Asymmetric C|R 0.0370 0.0813
Lambda Asymmetric R|C 0.3000 0.1449
Lambda Symmetric 0.1081 0.0687
Uncertainty Coefficient C|R 0.0895 0.0290
Uncertainty Coefficient R|C 0.3083 0.1066
Uncertainty Coefficient Symmetric 0.1388 0.0451
Sample Size = 35
PROC FREQ output provides d = 0.18 (Somers' D C|R) with standard error 0.2192. This gives
$$\widehat{\Pr}(X < Y) = \frac{0.18 + 1}{2} = 0.59$$
with standard error
$$\mathrm{s.e.}\left(\widehat{\Pr}(X < Y)\right) = \frac{0.2192}{2} = 0.1096.$$
The 95% CI for Pr(X < Y) is then
0.59 ± 1.96 × 0.1096 = (0.38, 0.80),
which includes 0.50.
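The point estimate can be checked directly from the raw data by comparing every pair, as in the U statistic. In this Python sketch, Trt 1 is taken as X and Trt 2 as Y (an assumption: this is the orientation that reproduces d = 0.18); the standard error is not recomputed but copied from the ASE in the PROC FREQ output, and the function name is hypothetical.

```python
def prob_x_less_y(x, y):
    """Pairwise estimate of Pr(X < Y) + 0.5 Pr(X = Y), i.e. U / (n1 * n2)."""
    score = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)
    return score / (len(x) * len(y))


trt1 = [8, 9] + [10] * 6 + [11] * 4 + [12] * 4 + [13] * 5 + [14] * 3 + [15]
trt2 = [10, 10, 11, 11, 11, 12, 13, 16, 17, 24]

p_hat = prob_x_less_y(trt1, trt2)   # should equal (d + 1) / 2
se_d = 0.2192                       # ASE of Somers' D C|R, copied from the output
se_p = se_d / 2                     # s.e. of (d + 1) / 2
lower, upper = p_hat - 1.96 * se_p, p_hat + 1.96 * se_p
print(round(p_hat, 4), round(2 * p_hat - 1, 4), (round(lower, 2), round(upper, 2)))
```

The pairwise score is 147.5 out of 25 × 10 = 250 comparisons, giving 0.59 and d = 0.18.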
This simple confidence interval works well only if the point estimate is not far from 0.50; otherwise the limits can fall outside the range (0, 1). Newcombe (2006, Statistics in Medicine 25(4): 543–573) devoted two papers to the topic.
data a;
input group response @@;
cards;
1 8 1 10 1 12 1 15
0 0 0 7 0 11 0 16
;
proc freq;
tables group* response/ measures cl;
test measures;
run;
Somers’ D C|R
Somers’ D C|R 0.2500
ASE 0.4330
95% Lower Conf Limit -0.5987
95% Upper Conf Limit 1.0000
Test of H0: Somers’ D C|R = 0
ASE under H0 0.4330
Z 0.5774
One-sided Pr > Z 0.2819
Two-sided Pr > |Z| 0.5637
Somers’ D R|C
Somers’ D R|C 0.1429
ASE 0.2474
95% Lower Conf Limit -0.3421
95% Upper Conf Limit 0.6278
Test of H0: Somers’ D R|C = 0
ASE under H0 0.2474
Z 0.5774
One-sided Pr > Z 0.2819
Two-sided Pr > |Z| 0.5637
Somers' d C|R: column variable (response) given row variable (group).
Effect: $\frac{\bar{x} - \bar{y}}{\sqrt{2 s^2}}$ versus $\frac{\bar{x} - \bar{y}}{\sqrt{s^2}}$ (Cohen's effect size)
Let xi and yj be normally distributed observations from two independent groups with a common variance, estimated by s². Then Pr(X < Y) is estimated by
$$\widehat{\Pr}(X < Y) = \Phi\left(\frac{\bar{y} - \bar{x}}{\sqrt{2 s^2}}\right),$$
where Φ is the standard normal cumulative distribution function, e.g. Φ(0) = 0.5, Φ(0.3) = 0.62.
Pr(X < Y) denotes the probability that a randomly chosen observation from one group is less than a randomly chosen observation from the other group.
• Cohen (1977, Statistical Power Analysis for the Behavioral Sciences, San Diego, CA: Academic Press, Section 2.2.1) attempted to provide an intuitively compelling and meaningful interpretation of the effect size by using a percent nonoverlap index, which he denoted $U_3 = \Phi\left(\frac{\bar{x} - \bar{y}}{\sqrt{s^2}}\right)$.
• What U3 really represents is the proportion of individual scores in one
group that are less than the average of scores in the other group.
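To see the two quantities side by side, here is a Python sketch assuming X ~ N(0, 1) and Y ~ N(δ, 1) with an illustrative standardized difference δ = 0.5 (not from the source). Pr(X < Y) = Φ(δ/√2), while U3 = Φ(δ) is the share of X values below the mean of Y; a seeded Monte Carlo run checks the first formula.

```python
import random
from statistics import NormalDist

phi = NormalDist().cdf
delta = 0.5        # illustrative (mean_Y - mean_X) / sigma, an assumption

prob_x_less_y = phi(delta / 2 ** 0.5)   # Pr(X < Y) for two normals, common sigma
u3 = phi(delta)                          # Cohen's U3: share of X below the mean of Y

# Monte Carlo check of Pr(X < Y)
random.seed(1)
m = 100_000
hits = sum(random.gauss(0, 1) < random.gauss(delta, 1) for _ in range(m))
print(round(prob_x_less_y, 3), round(u3, 3), hits / m)
```

The same standardized difference gives a noticeably larger U3 (about 0.69) than Pr(X < Y) (about 0.64), because U3 compares individuals with a group mean rather than with other individuals.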
Area (A) under the receiver operating characteristic (ROC) curve
The parameter Pr(X < Y) discussed here is actually the area under the receiver operating characteristic (ROC) curve, as shown by Bamber (1975, J Math Psychol 12: 387–415).
ROC plots were developed in the 1950s for evaluating radar signal detection. Hanley and McNeil (1982, Radiology 143: 29–36) is a classic. Such a plot is obtained by calculating the sensitivity and specificity at every distinct observed data value (used as a cut-off) and plotting sensitivity against 1 − specificity.
The area under the ROC curve is usually regarded as a global measure of
diagnostic accuracy.
A one-page article by Altman and Bland (1994 BMJ 309: 188) may be a
good starting point in this field.
Test value   D+   D−   Decision rule                   Pr(T+|D+)   Pr(T−|D−)
                        ≥ 1 = T+ (all values T+)        50/50       0/50
1            2    28    1 = T−; > 1 = T+                48/50       28/50
2            4    14    1, 2 = T−; > 2 = T+             44/50       42/50
3            10   5     < 4 = T−; 4, 5 = T+             34/50       47/50
4            14   2     < 5 = T−; 5 = T+                20/50       49/50
5            20   1     > 5 = T+ (all values T−)        0/50        50/50
Total        50   50
Sensitivity = Pr(T+|D+)      1 − specificity = 1 − Pr(T−|D−)
(true positive rate)         (false positive rate)
1.00                         1.00
0.96                         0.44
0.88                         0.16
0.68                         0.06
0.40                         0.02
0                            0
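A Python sketch that builds the ROC points from the counts in the table above and computes the area by the trapezoidal rule (variable names are hypothetical). For grouped data like these, the trapezoidal area equals the WMW estimate of Pr(X < Y) with ties counted as one half, so it should match the 0.91 obtained from PROC FREQ.

```python
d_pos = [2, 4, 10, 14, 20]    # diseased, test results 1..5
d_neg = [28, 14, 5, 2, 1]     # non-diseased, test results 1..5
n_pos, n_neg = sum(d_pos), sum(d_neg)

# Cut-off c: results > c are called T+. c = 0 calls everyone positive, c = 5 nobody.
points = []
for c in range(0, 6):
    sens = sum(d_pos[c:]) / n_pos            # Pr(T+ | D+)
    fpr = sum(d_neg[c:]) / n_neg             # 1 - Pr(T- | D-)
    points.append((fpr, sens))

# Trapezoidal area, integrating over increasing 1 - specificity
pts = sorted(points)
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(points)
print(round(auc, 4))
```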
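SAS proc freq can be used to calculate AUC = (d + 1)/2 = (0.82 + 1)/2 = 0.91 and its standard error 0.0585/2 = 0.02925, where d = 0.82 (ASE 0.0585) is Somers' D C|R in the output below.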
options nocenter ls=64;
data roc;
input disease out count;
cards;
1 1 2
1 2 4
1 3 10
1 4 14
1 5 20
0 1 28
0 2 14
0 3 5
0 4 2
0 5 1
;
proc freq;
tables disease*out/norow nocol nopercent measures cl;
weight count;
run;
test results
1 2 3 4 5
D+ 2 4 10 14 20
D− 28 14 5 2 1
Diagnostic accuracy concerns the probability of a test result given disease status. With disease as the row variable and the test result as the column variable, the relevant measure is therefore Somers' D C|R.
Statistics for Table of disease by out
Statistic Value ASE
Gamma 0.8952 0.0450
Kendall’s Tau-b 0.6543 0.0477
Stuart’s Tau-c 0.8200 0.0585
Somers’ D C|R 0.8200 0.0585
Somers’ D R|C 0.5220 0.0393
Pearson Correlation 0.7322 0.0544
Spearman Correlation 0.7284 0.0525
Lambda Asymmetric C|R 0.2571 0.0578
Lambda Asymmetric R|C 0.7200 0.0763
Lambda Symmetric 0.4500 0.0598
Uncertainty Coefficient C|R 0.2084 0.0396
Uncertainty Coefficient R|C 0.4737 0.0881
Uncertainty Coefficient Symmetric 0.2895 0.0546
Sample Size = 100
In diagnostic research the area under the ROC curve is often close to 1, and the simple CI method may then produce an upper limit greater than 1.
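To avoid this, one may use the logit transformation $\mathrm{logit}(A) = \ln\frac{A}{1 - A}$.
The 95% CI for logit(A) is given by
$$(l, u) = \mathrm{logit}(A) \pm Z_{0.975}\,\frac{\mathrm{s.e.}(A)}{A(1 - A)}$$
The CI for A is then
$$\left(\frac{e^l}{1 + e^l}, \frac{e^u}{1 + e^u}\right)$$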
Ex: A = 0.91 and s.e.(A) = 0.02925.
$$(l, u) = \ln\frac{0.91}{1 - 0.91} \pm 1.96 \times \frac{0.02925}{0.91(1 - 0.91)} = (1.613635, 3.013635)$$
$$\left(\frac{e^{1.613635}}{1 + e^{1.613635}}, \frac{e^{3.013635}}{1 + e^{3.013635}}\right) = (0.83, 0.95)$$
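The logit-scale interval is easy to wrap in a small helper. A minimal Python sketch (function names hypothetical), using the delta-method standard error s.e.(A)/(A(1 − A)) for logit(A) as in the formula above:

```python
from math import exp, log


def expit(t):
    """Inverse of the logit: e^t / (1 + e^t)."""
    return exp(t) / (1 + exp(t))


def logit_ci(a, se, z=1.96):
    """CI for an AUC (or any proportion-type estimate) built on the logit scale."""
    centre = log(a / (1 - a))              # logit(A)
    half_width = z * se / (a * (1 - a))    # delta method: s.e.(logit A)
    return expit(centre - half_width), expit(centre + half_width)


lower, upper = logit_ci(0.91, 0.02925)
print(round(lower, 2), round(upper, 2))
```

Because the back-transformation maps the whole real line into (0, 1), the limits can never leave that range, unlike 0.91 ± 1.96 × 0.02925.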
As an example of using continuous data as a diagnostic tool, consider data presented by Altman and Bland (1994, BMJ 309: 188):
• Values of an index of mixed epidermal cell lymphocyte reactions in bone-marrow transplant recipients who did or did not develop graft-versus-host disease (GvHD).
• Without GvHD: .27 .31 .39 .48 .49 .50 .81 .82 .86 .92 1.10 1.52 1.88 2.01 2.40 2.45 2.60 2.64 3.78 4.72
• With GvHD: 1.1 1.16 1.45 1.50 1.85 2.30 2.34 2.44 3.7 3.73 4.13 4.52 4.52 4.71 5.07 9 10.11
data a;
do i = 1 to 20;
group=1;
input response @@;
output;
end;
do i =1 to 17;
group =2;
input response @@;
output;
end;
cards;
.27 .31 .39 .48 .49 .50 .81 .82 .86 .92 1.10 1.52 1.88 2.01 2.40
2.45 2.60 2.64 3.78 4.72
1.1 1.16 1.45 1.50 1.85 2.30 2.34 2.44 3.7 3.73 4.13 4.52 4.52 4.71 5.07 9 10.11
;
proc freq ;
tables group*response / measures CL;
run;
Statistics for Table of group by response
95%
Statistic Value ASE Confidence Limits
Gamma 0.5929 0.1424 0.3138 0.8720
Kendall’s Tau-b 0.4230 0.1019 0.2233 0.6228
Stuart’s Tau-c 0.5873 0.1421 0.3088 0.8657
Somers’ D C|R 0.5912 0.1421 0.3126 0.8698
Somers’ D R|C 0.3027 0.0733 0.1590 0.4464
Pearson Correlation 0.4974 0.0918 0.3174 0.6774
Spearman Correlation 0.5105 0.1229 0.2696 0.7515
Lambda Asymmetric C|R 0.0286 0.0630 0.0000 0.1520
Lambda Asymmetric R|C 0.9412 0.0571 0.8293 1.0000
Lambda Symmetric 0.3269 0.0621 0.2051 0.4487
Uncertainty Coefficient C|R 0.1845 0.0073 0.1701 0.1989
Uncertainty Coefficient R|C 0.9457 0.0373 0.8725 1.0000
Uncertainty Coefficient Symmetric 0.3088 0.0120 0.2853 0.3322
Sample Size = 37
Using Somers' D C|R and its confidence limits, the AUC is estimated as (1 + 0.5912)/2 = 0.7956, with 95% interval
((1 + 0.3126)/2, (1 + 0.8698)/2) = (0.6563, 0.9349).
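The same AUC can be obtained without SAS by comparing every (without GvHD, with GvHD) pair. A minimal Python sketch (variable names hypothetical):

```python
without = [.27, .31, .39, .48, .49, .50, .81, .82, .86, .92, 1.10, 1.52, 1.88,
           2.01, 2.40, 2.45, 2.60, 2.64, 3.78, 4.72]
with_gvhd = [1.1, 1.16, 1.45, 1.50, 1.85, 2.30, 2.34, 2.44, 3.7, 3.73, 4.13,
             4.52, 4.52, 4.71, 5.07, 9, 10.11]

# AUC = Pr(index without GvHD < index with GvHD), ties counted as 1/2
score = sum(1.0 if a < b else 0.5 if a == b else 0.0
            for a in without for b in with_gvhd)
auc = score / (len(without) * len(with_gvhd))
print(round(auc, 4), round(2 * auc - 1, 4))   # AUC and the implied Somers' d
```

The score is 270.5 out of 20 × 17 = 340 pairs (the single tie, 1.10 vs 1.1, counts one half).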
Criteria for interpreting the area under the ROC curve
AUC Interpretation
0.50 to 0.75 fair
0.75 to 0.92 good
0.92 to 0.97 very good
0.97 to 1.00 excellent
Non-parametric methods for k > 2 independent samples (p. 558)
• Non-parametric ANOVA: the Kruskal-Wallis test
• Assume there are k populations to be compared and that a sample of nj observations is available from population j, j = 1, 2, · · · , k;
• The null hypothesis is that all populations have the same probability distribution;
• All observations are ranked without regard to group membership, and then the sum of the ranks of the observations in each group is calculated. Denote these rank sums by R1, R2, · · · , Rk. The degree to which the Rj's differ is measured by
$$KW = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N + 1),$$
where N is the total sample size.
• Under H0, KW is approximately distributed as $\chi^2_{k-1}$.
Ex (Int J Cancer 1980)
Number of Glucocorticoid Receptor (GR) sites per Leukocyte Cell
• (N)ormal: 3500, 3500, 3500, 4000, 4000, 4000, 4300, 4500, 4500, 4900, 5200, 6000, 6750, 8000
• (H)airy-cell leukemia: 5710, 6110, 8060, 880, 11400
• (C)hronic Lymphatic: 2390, 3330, 3580, 3880, 4280, 5120
• Chronic (M)yelocytic: 6320, 6860, 11400, 14000
• (A)cute: 3230, 3880, 7640, 7890, 8280, 16200, 18250, 29900
data leukaemia;
input group$ ngrs;
cards;
N 3500
N 3500
N 3500
....
A 16200
A 18250
A 29900
;
proc boxplot;
plot ngrs*group;
run;
proc npar1way data=leukaemia wilcoxon;
class group;
var ngrs;
run;
The NPAR1WAY Procedure
Wilcoxon Scores (Rank Sums) for Variable ngrs
Classified by Variable group
Sum of Expected Std Dev Mean
group N Scores Under H0 Under H0 Score
N 14 202.00 266.0 31.911394 14.428571
H 5 133.50 95.0 22.494577 26.700000
C 6 50.50 114.0 24.253494 8.416667
M 4 114.50 76.0 20.431714 28.625000
A 8 202.50 152.0 27.087058 25.312500
Average scores were used for ties.
Kruskal-Wallis Test
Chi-Square 16.6682
DF 4
Pr > Chi-Square 0.0022
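A Python sketch that plugs the rank sums from this output into the KW formula. The formula above has no tie correction; the sketch assumes SAS divides by the usual factor 1 − Σ(t³ − t)/(N³ − N), with tie sizes t read off the listed data (3500 ×3, 3880 ×2, 4000 ×3, 4500 ×2, 11400 ×2). The rank sums are copied from the output rather than recomputed. For 4 df the chi-square upper tail has the closed form e^(−x/2)(1 + x/2).

```python
from math import exp

# Rank sums R_j and group sizes n_j copied from the NPAR1WAY output
rank_sums = {"N": 202.0, "H": 133.5, "C": 50.5, "M": 114.5, "A": 202.5}
sizes = {"N": 14, "H": 5, "C": 6, "M": 4, "A": 8}
N = sum(sizes.values())

kw = 12 / (N * (N + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in sizes) - 3 * (N + 1)

tie_sizes = [3, 2, 3, 2, 2]        # 3500 x3, 3880 x2, 4000 x3, 4500 x2, 11400 x2
kw_ties = kw / (1 - sum(t ** 3 - t for t in tie_sizes) / (N ** 3 - N))

# Upper tail of chi-square with k - 1 = 4 df: exp(-x/2) * (1 + x/2)
p_value = exp(-kw_ties / 2) * (1 + kw_ties / 2)
print(round(kw, 4), round(kw_ties, 4), round(p_value, 4))
```

The uncorrected statistic (about 16.646) is slightly smaller than the tie-corrected 16.6682 reported by SAS.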
Spearman (Rank) correlation (p. 560)
x (cigar)   y (exc)   ⇒   Rx   Ry
20 0 11.5 1.5
0 0 3 1.5
20 1 11.5 3
10 2 10 4
5 3 8.5 5.5
4 5 7 8.5
3 5 6 8.5
5 6 8.5 10
0 3 3 5.5
0 4 3 7
0 7 3 11
0 8 3 12
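$$r_S = \frac{\mathrm{cov}(R_x, R_y)}{\sqrt{\mathrm{var}(R_x)\,\mathrm{var}(R_y)}} = \frac{-5.64}{\sqrt{12.00 \times 12.86}} = -0.454$$
(var(Rx) = 132/11 = 12.00; unrounded, rS = −0.45366, as in the SAS output below.)
CI for ρS:
• CI for $0.5 \ln\frac{1 + \rho_S}{1 - \rho_S}$ (Fisher's z, standard error $1/\sqrt{n - 3}$):
$$(l, u) = 0.5 \ln\frac{1 + (-0.45366)}{1 - (-0.45366)} \pm \frac{1.96}{\sqrt{12 - 3}} = -0.48930 \pm 0.65333 = (-1.14263, 0.16403)$$
• CI for ρS:
$$\frac{e^{2l} - 1}{e^{2l} + 1} = \frac{e^{2 \times (-1.14263)} - 1}{e^{2 \times (-1.14263)} + 1} = -0.815$$
$$\frac{e^{2u} - 1}{e^{2u} + 1} = \frac{e^{2 \times 0.16403} - 1}{e^{2 \times 0.16403} + 1} = 0.163$$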
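A Python sketch of the same calculation (standard library only; helper names hypothetical): Spearman's rS is Pearson's correlation applied to the mid-ranks, atanh gives Fisher's z = 0.5 ln((1 + r)/(1 − r)), and tanh(t) = (e^(2t) − 1)/(e^(2t) + 1) back-transforms the limits. No bias adjustment is applied, so the limits should match the BIASADJ=NO run below.

```python
from math import atanh, sqrt, tanh

cigar = [20, 0, 20, 10, 5, 4, 3, 5, 0, 0, 0, 0]
exc = [0, 0, 1, 2, 3, 5, 5, 6, 3, 4, 7, 8]


def midranks(v):
    """Rank each value (1 = smallest); ties get the average of their ranks."""
    return [sum(u < a for u in v) + (v.count(a) + 1) / 2 for a in v]


def pearson(a, b):
    """Ordinary product-moment correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


n = len(cigar)
r_s = pearson(midranks(cigar), midranks(exc))   # Spearman = Pearson on ranks
z = atanh(r_s)                                  # 0.5 * ln((1 + r) / (1 - r))
half = 1.96 / sqrt(n - 3)
lower, upper = tanh(z - half), tanh(z + half)   # back-transform the limits
print(round(r_s, 5), round(lower, 4), round(upper, 4))
```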
data spearman;
input cigar exc @@;
cards;
20 0
0 0 20 1
10 2
5 3 4 5
3 5 5 6 0 3 0 4 0 7 0 8
;
proc corr SPEARMAN FISHER;
run;
The SAS System 12:19 Wednesday, November 22, 2006 46
The CORR Procedure
2 Variables: cigar exc
Simple Statistics
Variable N Mean Std Dev Median Minimum Maximum
cigar 12 5.58333 7.39113 3.50000 0 20.00000
exc 12 3.66667 2.64002 3.50000 0 8.00000
Spearman Correlation Coefficients, N = 12
Prob > |r| under H0: Rho=0
cigar exc
cigar 1.00000 -0.45366
0.1385
exc -0.45366 1.00000
0.1385
Spearman Correlation Statistics (Fisher’s z Transformation)
With Sample Bias Correlation
Variable Variable N Correlation Fisher’s z Adjustment Estimate
cigar exc 12 -0.45366 -0.48929 -0.02062 -0.43713
Spearman Correlation Statistics (Fisher’s z Transformation)
With p Value for
Variable Variable 95% Confidence Limits H0:Rho=0
cigar exc -0.808262 0.182578 0.1421
data spearman;
input cigar exc @@;
cards;
20 0
0 0 20 1
10 2
5 3 4 5
3 5 5 6 0 3 0 4 0 7 0 8
;
proc corr SPEARMAN FISHER (BIASADJ=no);
run;
The CORR Procedure
2 Variables: cigar exc
Simple Statistics
Variable N Mean Std Dev Median Minimum Maximum
cigar 12 5.58333 7.39113 3.50000 0 20.00000
exc 12 3.66667 2.64002 3.50000 0 8.00000
Spearman Correlation Coefficients, N = 12
Prob > |r| under H0: Rho=0
cigar exc
cigar 1.00000 -0.45366
0.1385
exc -0.45366 1.00000
0.1385
Spearman Correlation Statistics (Fisher’s z Transformation)
With Sample p Value for
Variable Variable N Correlation Fisher’s z 95% Confidence Limits H0:Rho=0
cigar exc 12 -0.45366 -0.48929 -0.815293 0.162572 0.1421
• Sample size (Noether GE. Sample size determination for some common nonparametric statistics. J Am Stat Assoc 1987; 82: 645–647). No reference list.
• Wilcoxon-Mann-Whitney test for 2 independent samples:
$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{6(p - 0.50)^2}$$
where n is the size of each group and p = Pr(X < Y).
• 1st paragraph of Statistical Analysis section: ‘Estimates of sample size
were based on the number of new enhancing lesions observed during the
first 12 weeks after the first infusion in a previous clinical trial of
natalizumab. Using methods based on the Wilcoxon-Mann-Whitney
statistic (Noether, 1987) appropriate for a two-group comparison at a
two-sided level of significance of 5 percent, we calculated that
approximately 73 patients were needed in each group for the study to
have 80 percent power.’ (NEJM 348 (1): 15-23 JAN 2 2003)
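• Solving for p, the value of Pr(X < Y) detectable with n = 73 per group is
$$p = 0.50 + \frac{Z_{1-\alpha/2} + Z_{1-\beta}}{\sqrt{6 \times n}} = 0.50 + \frac{1.96 + 0.84}{\sqrt{6 \times 73}} \approx 0.63.$$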
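A Python sketch of Noether's formula for two equal groups and its inverse (standard library only; function names hypothetical). The normal quantiles come from `NormalDist().inv_cdf`, so Z values are 1.95996 and 0.84162 rather than the rounded 1.96 and 0.84.

```python
from math import sqrt
from statistics import NormalDist


def z_sum(alpha=0.05, power=0.80):
    """Z_{1-alpha/2} + Z_{1-beta} for a two-sided test."""
    q = NormalDist().inv_cdf
    return q(1 - alpha / 2) + q(power)


def n_per_group(p, alpha=0.05, power=0.80):
    """Noether (1987) size per group for the WMW test; p = Pr(X < Y)."""
    return z_sum(alpha, power) ** 2 / (6 * (p - 0.5) ** 2)


def detectable_p(n, alpha=0.05, power=0.80):
    """Inverse of n_per_group: the Pr(X < Y) detectable with n per group."""
    return 0.5 + z_sum(alpha, power) / sqrt(6 * n)


p73 = detectable_p(73)
print(round(p73, 3), round(n_per_group(p73), 1))
```

Note that n is very sensitive to p near 0.5: rounding p to 0.63 before plugging it back in would give about 77.4 per group rather than 73.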