
E509A: Principle of Biostatistics

(Week 11(2): Introduction to non-parametric methods)

GY Zou


Sign test for two dependent samples

Ex 12.1

subj        1    2    3    4    5    6    7    8    9   10
baseline  166  135  189  180  156  142  176  156  164  142
post      138  120  176  180  160  150  152  140  160  130
diff       28   15   13    0   -4   -8   24   16    4   12
sign        +    +    +    0    -    -    +    +    +    +

Is 7 '+' significant?

Under H0, each subject gets a '+' with probability p = 0.5. We can thus use the binomial distribution to obtain the P-value. Let X denote the number of '+' signs; then

$$\Pr(X = x) = \binom{n}{x}\, 0.5^{x} (1 - 0.5)^{n-x} = \binom{10}{x}\, 0.5^{10}$$

x    0       1       2       3       4       5       6       7       8       9       10
Pr   0.0010  0.0098  0.0439  0.1172  0.2051  0.2461  0.2051  0.1172  0.0439  0.0098  0.0010

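By definition, the P-value is the probability of 7 or more '+' signs:

P = 0.1172 + 0.0439 + 0.0098 + 0.0010 = 0.1719

which is one-sided. Two-sided P = 2 × 0.1719 = 0.3438.

The mid-P counts only half of the probability of the observed value:

mid-P = 0.5 × 0.1172 + 0.0439 + 0.0098 + 0.0010 = 0.1133,

two-sided mid-P = 2 × 0.1133 = 0.2266.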
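The tail sums above can be checked numerically. The following is a minimal Python sketch (the function name `sign_test_pvalues` is made up for illustration); it assumes, as in the slide, that all n = 10 subjects are kept and that doubling the one-sided value gives the two-sided value.

```python
from math import comb

def sign_test_pvalues(n_plus: int, n: int) -> dict:
    """Exact and mid-P values for the sign test, assuming Pr('+') = 0.5 under H0."""
    # Binomial(n, 0.5) probabilities for x = 0, 1, ..., n
    pmf = [comb(n, x) * 0.5 ** n for x in range(n + 1)]
    upper_tail = sum(pmf[n_plus:])           # Pr(X >= n_plus)
    mid_p = upper_tail - 0.5 * pmf[n_plus]   # only half weight on the observed value
    return {
        "one_sided": upper_tail,
        "two_sided": min(1.0, 2 * upper_tail),
        "mid_p": mid_p,
        "two_sided_mid_p": min(1.0, 2 * mid_p),
    }

result = sign_test_pvalues(n_plus=7, n=10)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The printed values should agree with the hand calculation to four decimals.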

The sign test disregards a lot of information: it uses only the signs of the differences, not their magnitudes.

The Wilcoxon signed-rank test (for two dependent samples)

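The null hypothesis is that the median of the differences is 0, i.e., H0 : Md = 0.

subj        1    2    3    4     5    6    7    8     9   10
diff       28   15   13    0    -4   -8   24   16     4   12
sign        +    +    +          -    -    +    +     +    +
rank        9    6    5        1.5    3    8    7   1.5    4
rank×sign  +9   +6   +5       -1.5   -3   +8   +7  +1.5   +4

Subject 4 has a zero difference and is dropped. Ranks are assigned to the absolute differences; the two differences of size 4 are tied and share the average rank 1.5.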

The Wilcoxon signed-rank test statistic is the sum of the positive ranks, denoted by T.

Anybody can propose a test; the difficulty is to figure out the properties of the test.

For T,

• if there is no positive rank, T = 0;

• if all ranks are positive, T = 1 + 2 + · · · + n = n(n + 1)/2, where n is the number of observations with nonzero difference;

• under H0 the mean of T is n(n + 1)/4 and the variance is

$$\mathrm{var}_0(T) = \frac{n(n+1)(2n+1)}{24};$$

• a large-sample test:

$$Z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \sim N(0, 1) \quad \text{under } H_0.$$

Ex 12.2

$$Z = \frac{T - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{40.5 - 9(9+1)/4}{\sqrt{9(9+1)(2 \times 9+1)/24}} = 2.13$$

which yields a P-value of 0.0166 (1-sided).

The two-sided P-value is 2 × 0.0166 = 0.0332.
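The ranking and the large-sample Z can be reproduced with a short Python sketch. The helper `average_ranks` is a name introduced here for illustration; the variance is the untied formula from the slide, with no correction for the tied ranks.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    ordered = sorted(values)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)          # first position of each distinct value
    counts = Counter(ordered)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]

diffs = [28, 15, 13, 0, -4, -8, 24, 16, 4, 12]
nonzero = [d for d in diffs if d != 0]        # zero differences are dropped
ranks = average_ranks([abs(d) for d in nonzero])

T = sum(r for d, r in zip(nonzero, ranks) if d > 0)   # sum of positive ranks
n = len(nonzero)
mean_T = n * (n + 1) / 4
var_T = n * (n + 1) * (2 * n + 1) / 24
z = (T - mean_T) / sqrt(var_T)
p_one_sided = 1 - NormalDist().cdf(z)

print(f"T = {T}, Z = {z:.2f}, one-sided P = {p_one_sided:.4f}")
```

Small differences in the last decimal of the P-value can arise from rounding Z before looking it up in a table.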

Wilcoxon-Mann-Whitney (WMW) test for two independent samples

• I use the name WMW because this test has been proposed at least 7 times (Kruskal 1957 J Am Stat Assoc 52: 356–360).

• Idea: Suppose we have n1 observations from group 1, denoted x1, x2, · · · , xn1, and n2 observations from group 2, denoted y1, y2, · · · , yn2.

• Arrange all pairwise comparisons in a table:

         x1   x2   · · ·   xn1   Total
y1
y2
...
yn2
Total                              U

• In each cell, if xi < yj we put 1, if xi > yj we put 0, and if xi = yj we put 0.5. There should be n1 × n2 comparisons. Once we are done, we sum them up to get the U statistic (commonly referred to as the Wilcoxon-Mann-Whitney U statistic).

• If U < n1n2 − U, use U to proceed; otherwise use n1n2 − U as U.

The distribution of U under H0

• U ranges from 0 to n1 × n2, with mean n1 × n2/2;

• Under H0, the variance of U can be shown to be

$$\frac{n_1 n_2 (n_1 + n_2 + 1)}{12};$$

• The statistic

$$Z = \frac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}$$

is asymptotically distributed as N(0, 1).

• This looks different from your book, because the book works with the rank sum

$$S = U + \frac{n_1(n_1 + 1)}{2}$$

where n1 is the size of the sample whose U is used for the test.

Ex 12.3

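         0    7   11   16   Total
8        0    0    1    1     2
10       0    0    1    1     2
12       0    0    0    1     1
15       0    0    0    1     1
Total    0    0    2    4   U = 6

In this table a cell is 1 when the column value exceeds the row value. Scoring in the opposite direction gives n1n2 − U = 16 − 6 = 10, so the smaller value U = 6 is used either way.

$$Z = \frac{U - n_1 n_2/2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}} = \frac{6 - 4 \times 4/2}{\sqrt{4 \times 4 (4 + 4 + 1)/12}} = -0.58$$

The P-value is 0.28 (1-sided). The two-sided P-value is 2 × 0.28 = 0.56.

Some prefer a 0.5 continuity correction, so that

$$Z = \frac{U - n_1 n_2/2 + 0.5}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}} = \frac{6 - 4 \times 4/2 + 0.5}{\sqrt{4 \times 4 (4 + 4 + 1)/12}} = -0.433$$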
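The counting in Ex 12.3 can be sketched in Python as follows. The function name `u_statistic` is hypothetical, and the smaller of U and n1n2 − U is taken before standardizing, as the rule above prescribes.

```python
from math import sqrt
from statistics import NormalDist

def u_statistic(x, y):
    """Sum over all pairs: 1 if x_i < y_j, 0.5 if tied, 0 otherwise."""
    return sum(1.0 if xi < yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)

x = [0, 7, 11, 16]     # column values in the table
y = [8, 10, 12, 15]    # row values in the table
n1, n2 = len(x), len(y)

u_raw = u_statistic(x, y)
u = min(u_raw, n1 * n2 - u_raw)          # use the smaller of U and n1*n2 - U

mean_u = n1 * n2 / 2
sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mean_u) / sd_u                  # without continuity correction
z_cc = (u - mean_u + 0.5) / sd_u         # with 0.5 continuity correction
p_one_sided = NormalDist().cdf(z)        # lower tail, since u is below its mean

print(f"U = {u}, Z = {z:.2f}, Z (corrected) = {z_cc:.3f}, P = {p_one_sided:.2f}")
```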

data a;

input group response @@;

cards;

1 8 1 10 1 12 1 15

0 0 0 7 0 11 0 16

;

proc npar1way wilcoxon data=a;

class group;

var response;

run;

The NPAR1WAY Procedure

Wilcoxon Scores (Rank Sums) for Variable response

Classified by Variable group

               Sum of    Expected     Std Dev     Mean
group    N     Scores    Under H0    Under H0    Score
1        4       20.0        18.0    3.464102      5.0
0        4       16.0        18.0    3.464102      4.0

Wilcoxon Two-Sample Test

Statistic 20.0000

Normal Approximation

Z 0.4330

One-Sided Pr > Z 0.3325

Two-Sided Pr > |Z| 0.6650

t Approximation

One-Sided Pr > Z 0.3390


Z includes a continuity correction of 0.5.

Kruskal-Wallis Test

Chi-Square 0.3333

DF 1

Pr > Chi-Square 0.5637

npar1way for Wilcoxon-Mann-Whitney

data roc;

input disease out count;

cards;

1 1 35

1 2 46

1 3 27

1 4 4

1 5 3

0 1 32

0 2 67

0 3 60

0 4 11

0 5 2

;

proc npar1way wilcoxon data=roc;

class disease;

var out;

freq count;

run;


Wilcoxon Scores (Rank Sums) for Variable out

Classified by Variable disease


                  Sum of    Expected       Std Dev         Mean
disease     N     Scores    Under H0      Under H0        Score
1         115    14897.0     16560.0    653.005616   129.539130
0         172    26431.0     24768.0    653.005616   153.668605

Average scores were used for ties.

Wilcoxon Two-Sample Test

Statistic 14897.0000

Normal Approximation

Z -2.5459

One-Sided Pr < Z 0.0054


t Approximation

One-Sided Pr < Z 0.0057


Z includes a continuity correction of 0.5.

Kruskal-Wallis Test

Chi-Square 6.4856

DF 1


Misconception of WMW test

Hart A. 2001. Mann-Whitney test is not just a test of medians: differences in

spread can be important. BMJ 323: 391–3.

If no distributional assumption is made, then the null hypothesis is that the probability that a member of the first population drawn at random will exceed a member of the second population drawn at random is 50%.

Blind date in the City of Toronto: the probability that the man is taller than the woman.

If we assume the two population distributions have the same shape, then WMW is testing the equality of medians. Otherwise, it is not testing the equality of medians.

Hart also notes, 'As Altman states, one form of the test statistic is an estimate of the probability that one variable is less than the other, although this statistic is not output by many statistical packages'.

Here I present a simple way using SAS proc freq with the measures option.

This works because Pr(X < Y) can be expressed in terms of Somers' d (Somers, 1962, American Sociological Review 27: 799-811) as

$$\Pr(X < Y) = (d + 1)/2.$$

SAS gives d and its standard error, from which we can obtain an estimate of Pr(X < Y) and its confidence interval, because

$$\mathrm{var}\left[\widehat{\Pr}(X < Y)\right] = \frac{\mathrm{var}(d)}{4} \;\Rightarrow\; \mathrm{s.e.}\left(\widehat{\Pr}(X < Y)\right) = \frac{\mathrm{s.e.}(d)}{2}$$

Example. A clinical trial (Aurlien et al 1998 Bone Marrow Transplant 21: 873–878) involving 35 patients with malignant lymphoma was conducted to estimate the difference in response between Hodgkin's disease patients and non-Hodgkin's patients with respect to time (days) to neutrophil recovery.

Trt 1 (n1 = 25): 8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15

Trt 2 (n2 = 10): 10, 10, 11, 11, 11, 12, 13, 16, 17, 24

Denoting by X and Y the responses obtained from non-Hodgkin's and Hodgkin's patients, respectively, we are interested in Pr(X < Y).

data a;

input group response @@;

cards;

1 8 1 9 1 10 1 10 1 10 1 10 1 10 1 10 1 11 1 11 1 11 1 11 1 12 1 12 1 12 1 12

1 13 1 13 1 13 1 13 1 13 1 14 1 14 1 14 1 15

2 10 2 10 2 11 2 11 2 11 2 12 2 13 2 16 2 17 2 24

;

proc freq;

tables group* response/ measures cl;

test measures;

run;

The FREQ Procedure

Statistics for Table of group by response

Statistic Value ASE

Gamma 0.2074 0.2480

Kendall’s Tau-b 0.1250 0.1519

Stuart’s Tau-c 0.1469 0.1805

Somers’ D C|R 0.1800 0.2192

Somers’ D R|C 0.0869 0.1057

Pearson Correlation 0.2998 0.1607

Spearman Correlation 0.1429 0.1742

Lambda Asymmetric C|R 0.0370 0.0813

Lambda Asymmetric R|C 0.3000 0.1449

Lambda Symmetric 0.1081 0.0687

Uncertainty Coefficient C|R 0.0895 0.0290

Uncertainty Coefficient R|C 0.3083 0.1066

Uncertainty Coefficient Symmetric 0.1388 0.0451

Sample Size = 35

PROC FREQ output provides d = 0.18 with standard error 0.2192. This gives

$$\widehat{\Pr}(X < Y) = \frac{0.18 + 1}{2} = 0.59$$

with standard error

$$\mathrm{s.e.}\left(\widehat{\Pr}(X < Y)\right) = \frac{0.2192}{2} = 0.1096$$

The 95% CI for Pr(X < Y) is then

0.59 ± 1.96 × 0.1096 = (0.38, 0.80)

which includes 0.50.

This confidence interval works only if the point estimate is not far from 0.50. Otherwise the limits could fall outside the range (0, 1).
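The value d = 0.18 can be recovered directly from the raw data by counting pairs. This Python sketch treats Trt 1 as X and Trt 2 as Y (an assumption, chosen because it reproduces the SAS value), and it takes the standard error 0.2192 from the PROC FREQ listing rather than recomputing it. The function name `somers_d` is illustrative only.

```python
def somers_d(x, y):
    """(number of pairs with x < y) minus (pairs with x > y), over all n1*n2 pairs."""
    less = sum(1 for xi in x for yj in y if xi < yj)
    greater = sum(1 for xi in x for yj in y if xi > yj)
    return (less - greater) / (len(x) * len(y))

trt1 = [8, 9, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11,
        12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 15]
trt2 = [10, 10, 11, 11, 11, 12, 13, 16, 17, 24]

d = somers_d(trt1, trt2)
p_hat = (d + 1) / 2            # estimate of Pr(X < Y), ties counted as one half
se_d = 0.2192                  # ASE copied from the SAS output (not recomputed)
se_p = se_d / 2
ci = (p_hat - 1.96 * se_p, p_hat + 1.96 * se_p)

print(f"d = {d:.2f}, Pr(X<Y) = {p_hat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```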

Newcombe (2006, Statistics in Medicine, 25(4): 543–573) was able to write

two papers on the topic.

data a;

input group response @@;

cards;

1 8 1 10 1 12 1 15

0 0 0 7 0 11 0 16

;

proc freq;

tables group* response/ measures cl;

test measures;

run;

Somers' D C|R

Somers' D C|R             0.2500
ASE                       0.4330
95% Lower Conf Limit     -0.5987
95% Upper Conf Limit      1.0000

Test of H0: Somers' D C|R = 0

ASE under H0              0.4330
Z                         0.5774
One-sided Pr > Z          0.2819
Two-sided Pr > |Z|        0.5637

Somers' D R|C

Somers' D R|C             0.1429
ASE                       0.2474
95% Lower Conf Limit     -0.3421
95% Upper Conf Limit      0.6278

Test of H0: Somers' D R|C = 0

ASE under H0              0.2474
Z                         0.5774
One-sided Pr > Z          0.2819
Two-sided Pr > |Z|        0.5637

Somers' D C|R: response (column) given group (row).

Effect: $(\bar{x} - \bar{y})/\sqrt{2 s^2}$ versus $(\bar{x} - \bar{y})/\sqrt{s^2}$ (Cohen's effect size)

Let xi and yj be normal observations for two independent groups, respectively. Pr(X < Y) is given by

$$\Pr(X < Y) = \Phi\left(\frac{\bar{y} - \bar{x}}{\sqrt{2 \times s^2}}\right)$$

where Φ is the standard normal distribution function, e.g. Φ(0) = 0.5, Φ(0.3) = 0.62. The difference is taken as ȳ − x̄ here so that a larger mean for Y gives Pr(X < Y) > 0.5; the size of the effect is the same as above.

Pr(X < Y) denotes the probability that a randomly chosen observation from one group is less than a randomly chosen observation from the other group.

• Cohen (1977 Statistical Power Analysis for the Behavioral Sciences. San Diego, CA: Academic Press, Section 2.2.1) attempted to provide an intuitively compelling and meaningful interpretation for the effect size by using a percent nonoverlap index, which he denoted as $U_3 = \Phi\left((\bar{x} - \bar{y})/\sqrt{s^2}\right)$.

• What U3 really represents is the proportion of individual scores in one group that are less than the average of scores in the other group.
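To see how the two indices differ numerically, here is a small Python sketch. The means and variance are hypothetical values chosen so that the standardized difference is 0.3, and the difference is oriented as ȳ − x̄ so that a larger Y mean gives a probability above 0.5.

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal distribution function

# Hypothetical summary statistics (not from the lecture):
mean_x, mean_y, s2 = 100.0, 106.0, 200.0

# Pr(X < Y): the difference is scaled by sqrt(2 * s^2), the s.d. of Y - X
p_x_less_y = phi((mean_y - mean_x) / sqrt(2 * s2))

# Cohen's U3-type index: the difference is scaled by s, giving the share of
# X scores that fall below the mean of Y
u3 = phi((mean_y - mean_x) / sqrt(s2))

print(f"Phi(0) = {phi(0.0):.2f}, Phi(0.3) = {phi(0.3):.2f}")
print(f"Pr(X < Y) = {p_x_less_y:.2f}, U3 = {u3:.2f}")
```

Because U3 divides by s rather than by the standard deviation of the difference, it is always further from 0.5 than Pr(X < Y) for the same data.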

Area (A) under the receiver operating characteristic (ROC) curve

The parameter Pr(X < Y) we discussed here is actually the area under the receiver operating characteristic (ROC) curve, as shown by Bamber (1975, J Math Psychol 12: 387–415).

ROC plots were developed in the 1950s for evaluating radar signal detection. Hanley and McNeil (1982 Radiology 143: 29–36) is a classic. Such a plot is obtained by calculating the sensitivity and specificity for every distinct observed data value and plotting sensitivity against 1 − specificity.

The area under the ROC curve is usually regarded as a global measure of

diagnostic accuracy.

A one-page article by Altman and Bland (1994 BMJ 309: 188) may be a

good starting point in this field.

Test value   D+   D−   Cutoff                   Pr(T+|D+)   Pr(T−|D−)
                       ≥ 1=(T+)                   50/50        0/50
1             2   28   1=(T−), > 1=(T+)           48/50       28/50
2             4   14   1,2=(T−), > 2=(T+)         44/50       42/50
3            10    5   < 4=(T−), 4,5=(T+)         34/50       47/50
4            14    2   < 5=(T−), 5=(T+)           20/50       49/50
5            20    1   > 5=(T+)                    0/50       50/50
Total        50   50

True positive              False positive
Sensitivity=Pr(T+|D+)      1-specificity=1 − Pr(T−|D−)
1.00                       1.00
0.96                       0.44
0.88                       0.16
0.68                       0.06
0.40                       0.02
0                          0
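The ROC points in the table, and the area under the curve obtained by joining them with straight lines, can be computed with the Python sketch below. The function names are made up for illustration, and the trapezoidal rule is one common way to get the area; it is used here as an assumption about how the area is defined for ordinal test values.

```python
d_pos = [2, 4, 10, 14, 20]    # D+ counts for test values 1..5
d_neg = [28, 14, 5, 2, 1]     # D- counts for test values 1..5

def roc_points(pos, neg):
    """(1 - specificity, sensitivity) for each cutoff 'T+ if value > cut'."""
    n_pos, n_neg = sum(pos), sum(neg)
    points = []
    for cut in range(len(pos) + 1):        # cut = 0 calls every value positive
        sensitivity = sum(pos[cut:]) / n_pos
        false_pos_rate = sum(neg[cut:]) / n_neg
        points.append((false_pos_rate, sensitivity))
    return points

def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

points = roc_points(d_pos, d_neg)
auc = trapezoid_auc(points)

for fpr, sens in points:
    print(f"sensitivity = {sens:.2f}, 1 - specificity = {fpr:.2f}")
print(f"AUC = {auc:.2f}")
```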

SAS proc freq to calculate AUC = 0.91 and its standard error 0.02925.

options nocenter ls=64;

data roc;

input disease out count;

cards;

1 1 2

1 2 4

1 3 10

1 4 14

1 5 20

0 1 28

0 2 14

0 3 5

0 4 2

0 5 1

;

proc freq;

tables disease*out/norow nocol nopercent measures cl;

weight count;

run;

test results

       1    2    3    4    5
D+     2    4   10   14   20
D−    28   14    5    2    1

Diagnostic accuracy asks: given disease status, what is the probability of a test result? In this case, the relevant measure is Somers' D C|R.

Statistics for Table of disease by out

Statistic Value ASE

Gamma 0.8952 0.0450

Kendall’s Tau-b 0.6543 0.0477

Stuart’s Tau-c 0.8200 0.0585

Somers’ D C|R 0.8200 0.0585

Somers’ D R|C 0.5220 0.0393

Pearson Correlation 0.7322 0.0544

Spearman Correlation 0.7284 0.0525

Lambda Asymmetric C|R 0.2571 0.0578

Lambda Asymmetric R|C 0.7200 0.0763

Lambda Symmetric 0.4500 0.0598

Uncertainty Coefficient C|R 0.2084 0.0396

Uncertainty Coefficient R|C 0.4737 0.0881

Uncertainty Coefficient Symmetric 0.2895 0.0546

Sample Size = 100

In diagnostic research, where the area under the ROC curve is often close to 1, the simple CI method may produce an upper limit that is greater than 1.

To avoid this, one may take a logit transformation, $\mathrm{logit}(A) = \ln\frac{A}{1-A}$.

The 95% CI for logit(A) is given by

$$(l, u) = \mathrm{logit}(A) \pm Z_{0.975}\, \frac{\mathrm{s.e.}(A)}{A(1 - A)}$$

The CI for A is then

$$\left(\frac{e^{l}}{1 + e^{l}},\; \frac{e^{u}}{1 + e^{u}}\right)$$

Ex: A = 0.91 and s.e.(A) = 0.02925.

$$(l, u) = \ln\frac{0.91}{1 - 0.91} \pm 1.96 \times \frac{0.02925}{0.91(1 - 0.91)} = (1.613635,\; 3.013635)$$

$$\left(\frac{e^{1.613635}}{1 + e^{1.613635}},\; \frac{e^{3.013635}}{1 + e^{3.013635}}\right) = (0.83,\; 0.95)$$
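The logit-based interval is easy to script. The following Python sketch (with a hypothetical function name `logit_ci`) applies the formulas above with Z = 1.96.

```python
from math import exp, log

def logit_ci(a: float, se: float, z: float = 1.96):
    """CI for a proportion-like quantity A, computed on the logit scale."""
    centre = log(a / (1 - a))
    half_width = z * se / (a * (1 - a))    # s.e. of logit(A) by the delta method
    l, u = centre - half_width, centre + half_width
    back = lambda t: exp(t) / (1 + exp(t)) # inverse logit
    return (l, u), (back(l), back(u))

(l, u), (lower, upper) = logit_ci(0.91, 0.02925)
print(f"(l, u) = ({l:.6f}, {u:.6f})")
print(f"95% CI for A = ({lower:.2f}, {upper:.2f})")
```

Unlike the simple interval 0.91 ± 1.96 × 0.02925, the back-transformed limits always stay inside (0, 1).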

As an example for using continuous data as diagnostic tool, consider data

presented by Altman and Bland (1994 BMJ 309: 188)

• Values of an index of mixed epidermal cell lymphocyte reactions in

bone-marrow transplant recipients who did or did not develop

graft-versus-host disease.

• Without GVHD: .27 .31 .39 .48 .49 .50 .81 .82 .86 .92 1.10 1.52 1.88

2.01 2.40 2.45 2.60 2.64 3.78 4.72

• With GvHD: 1.1 1.16 1.45 1.50 1.85 2.30 2.34 2.44 3.7 3.73 4.13 4.52

4.52 4.71 5.07 9 10.11

data a;

do i = 1 to 20;

group=1;

input response @@;

output;

end;

do i =1 to 17;

group =2;

input response @@;

output;

end;

cards;

.27 .31 .39 .48 .49 .50 .81 .82 .86 .92 1.10 1.52 1.88 2.01 2.40

2.45 2.60 2.64 3.78 4.72

1.1 1.16 1.45 1.50 1.85 2.30 2.34 2.44 3.7 3.73 4.13 4.52 4.52 4.71 5.07 9 10.11

;

proc freq ;

tables group*response / measures CL;

run;

Statistics for Table of group by response

Statistic Value ASE 95% Confidence Limits

Gamma 0.5929 0.1424 0.3138 0.8720

Kendall’s Tau-b 0.4230 0.1019 0.2233 0.6228

Stuart’s Tau-c 0.5873 0.1421 0.3088 0.8657

Somers’ D C|R 0.5912 0.1421 0.3126 0.8698

Somers’ D R|C 0.3027 0.0733 0.1590 0.4464

Pearson Correlation 0.4974 0.0918 0.3174 0.6774

Spearman Correlation 0.5105 0.1229 0.2696 0.7515

Lambda Asymmetric C|R 0.0286 0.0630 0.0000 0.1520

Lambda Asymmetric R|C 0.9412 0.0571 0.8293 1.0000

Lambda Symmetric 0.3269 0.0621 0.2051 0.4487

Uncertainty Coefficient C|R 0.1845 0.0073 0.1701 0.1989

Uncertainty Coefficient R|C 0.9457 0.0373 0.8725 1.0000

Uncertainty Coefficient Symmetric 0.3088 0.0120 0.2853 0.3322

Sample Size = 37

The AUC is estimated as (1 + 0.5912)/2 = 1.5912/2, with 95% interval given by (1.3126/2, 1.8698/2), i.e.,

0.7956 (0.6563, 0.9349)
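For continuous data the same area can be obtained by comparing every pair of observations. The Python sketch below does this for the Altman and Bland data; the standard error 0.1421 is copied from the PROC FREQ listing and is not recomputed, and the variable names are illustrative.

```python
without_gvhd = [0.27, 0.31, 0.39, 0.48, 0.49, 0.50, 0.81, 0.82, 0.86, 0.92,
                1.10, 1.52, 1.88, 2.01, 2.40, 2.45, 2.60, 2.64, 3.78, 4.72]
with_gvhd = [1.1, 1.16, 1.45, 1.50, 1.85, 2.30, 2.34, 2.44, 3.7, 3.73,
             4.13, 4.52, 4.52, 4.71, 5.07, 9, 10.11]

n_pairs = len(without_gvhd) * len(with_gvhd)
less = sum(1 for x in without_gvhd for y in with_gvhd if x < y)
ties = sum(1 for x in without_gvhd for y in with_gvhd if x == y)

auc = (less + 0.5 * ties) / n_pairs     # Pr(X < Y), ties counted as one half
d = 2 * auc - 1                         # Somers' D C|R on the same scale as SAS

ase_d = 0.1421                          # copied from the SAS output
ci = ((1 + d - 1.96 * ase_d) / 2, (1 + d + 1.96 * ase_d) / 2)

print(f"d = {d:.4f}, AUC = {auc:.4f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```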

Criterion for interpretation of area under ROC curve

AUC Interpretation

0.50 to 0.75 fair

0.75 to 0.92 good

0.92 to 0.97 very good

0.97 to 1.00 excellent

Non-parametric for k > 2 independent samples (p. 558)

• Non-parametric ANOVA: the Kruskal-Wallis test

• Assume there are k populations to be compared and that a sample of nj observations is available from population j, j = 1, 2, · · · , k;

• The null hypothesis is that all populations have the same probability distribution;

• All observations are ranked without regard to group membership, and then the sums of the ranks of the observations in each group are calculated. Denote these rank sums as R1, R2, · · · , Rk. The degree to which the Rj's differ is given by

$$KW = \frac{12}{N(N + 1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N + 1)$$

where N is the total sample size.

• Under H0, KW is approximately distributed as $\chi^2_{k-1}$.

Ex (Int J Cancer 1980)

Number of Glucocorticoid Receptor (GR) sites per Leukocyte Cell

• (N)ormal: 3500, 3500, 3500, 4000, 4000, 4000, 4300, 4500, 4500, 4900, 5200, 6000, 6750, 8000

• (H)airy-cell leukemia: 5710, 6110, 8060, 8080, 11400

• (C)hronic Lymphatic: 2390, 3330, 3580, 3880, 4280, 5120

• Chronic (M)yelocytic: 6320, 6860, 11400, 14000

• (A)cute: 3230, 3880, 7640, 7890, 8280, 16200, 18250, 29900

data leukaemia;

input group$ ngrs;

cards;

N 3500

N 3500

N 3500

....

A 16200

A 18250

A 29900

;

proc boxplot;

plot ngrs*group;

run;

proc npar1way data=leukaemia wilcoxon;

class group;

var ngrs;

run;


Wilcoxon Scores (Rank Sums) for Variable ngrs

Classified by Variable group


                Sum of    Expected      Std Dev        Mean
group     N     Scores    Under H0     Under H0       Score
N        14     202.00       266.0    31.911394   14.428571
H         5     133.50        95.0    22.494577   26.700000
C         6      50.50       114.0    24.253494    8.416667
M         4     114.50        76.0    20.431714   28.625000
A         8     202.50       152.0    27.087058   25.312500

Average scores were used for ties.

Kruskal-Wallis Test

Chi-Square 16.6682

DF 4
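The rank sums and the chi-square value in this listing can be reproduced with the Python sketch below. Two points are assumptions rather than part of the lecture: the fourth hairy-cell value is entered as 8080 because that reproduces the listed rank sum 133.5, and the statistic is divided by the usual tie-correction factor, which appears to be what SAS reports when average scores are used for ties.

```python
from collections import Counter

groups = {
    "N": [3500, 3500, 3500, 4000, 4000, 4000, 4300, 4500, 4500, 4900,
          5200, 6000, 6750, 8000],
    "H": [5710, 6110, 8060, 8080, 11400],   # 8080 is an assumption (see above)
    "C": [2390, 3330, 3580, 3880, 4280, 5120],
    "M": [6320, 6860, 11400, 14000],
    "A": [3230, 3880, 7640, 7890, 8280, 16200, 18250, 29900],
}

pooled = sorted(v for obs in groups.values() for v in obs)
N = len(pooled)

# Average rank of each distinct value: ties share the mean of their positions.
first_pos = {}
for pos, v in enumerate(pooled, start=1):
    first_pos.setdefault(v, pos)
counts = Counter(pooled)
avg_rank = {v: first_pos[v] + (counts[v] - 1) / 2 for v in counts}

rank_sums = {g: sum(avg_rank[v] for v in obs) for g, obs in groups.items()}

# KW statistic as defined above, then the tie-corrected version.
kw = (12 / (N * (N + 1))
      * sum(rank_sums[g] ** 2 / len(obs) for g, obs in groups.items())
      - 3 * (N + 1))
tie_factor = 1 - sum(t ** 3 - t for t in counts.values()) / (N ** 3 - N)
kw_tie_corrected = kw / tie_factor

print(rank_sums)
print(f"KW = {kw:.4f}, tie-corrected KW = {kw_tie_corrected:.4f}, df = {len(groups) - 1}")
```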


Spearman (Rank) correlation (p. 560)

x (cigar)   y (exc)      Rx      Ry
20          0          11.5     1.5
0           0           3       1.5
20          1          11.5     3
10          2          10       4
5           3           8.5     5.5
4           5           7       8.5
3           5           6       8.5
5           6           8.5    10
0           3           3       5.5
0           4           3       7
0           7           3      11
0           8           3      12

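$$r_S = \frac{\mathrm{cov}(R_x, R_y)}{\sqrt{\mathrm{var}(R_x)\,\mathrm{var}(R_y)}} = \frac{-5.64}{\sqrt{12.05 \times 12.86}} = -0.453$$

CI for ρS:

• CI for $0.5 \ln\frac{1+\rho_S}{1-\rho_S}$:

$$(l, u) = 0.5 \ln\frac{1 + (-0.453)}{1 - (-0.453)} \pm \frac{1.96}{\sqrt{12 - 3}} = (-1.141802,\; 0.1648649)$$

• CI for ρS:

$$\frac{e^{2l} - 1}{e^{2l} + 1} = \frac{e^{2 \times (-1.141802)} - 1}{e^{2 \times (-1.141802)} + 1} = -0.815$$

$$\frac{e^{2u} - 1}{e^{2u} + 1} = \frac{e^{2 \times 0.1648649} - 1}{e^{2 \times 0.1648649} + 1} = 0.163$$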
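As a check on the hand calculation, the Python sketch below ranks both variables, computes the Pearson correlation of the ranks, and applies Fisher's z transformation without bias adjustment. The helper names are illustrative. Because it carries full precision rather than the rounded intermediate values, the correlation comes out as about −0.4537 rather than −0.453.

```python
from collections import Counter
from math import atanh, sqrt, tanh

def average_ranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    ordered = sorted(values)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)
    counts = Counter(ordered)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]

def pearson(a, b):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

cigar = [20, 0, 20, 10, 5, 4, 3, 5, 0, 0, 0, 0]
exc = [0, 0, 1, 2, 3, 5, 5, 6, 3, 4, 7, 8]

r_s = pearson(average_ranks(cigar), average_ranks(exc))

n = len(cigar)
z = atanh(r_s)                      # 0.5 * ln((1 + r) / (1 - r))
half_width = 1.96 / sqrt(n - 3)
ci = (tanh(z - half_width), tanh(z + half_width))

print(f"r_S = {r_s:.5f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```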

data spearman;

input cigar exc @@;

cards;

20 0

0 0 20 1

10 2

5 3 4 5

3 5 5 6 0 3 0 4 0 7 0 8

;

proc corr SPEARMAN FISHER;

run;

The SAS System 12:19 Wednesday, November 22, 2006 46

The CORR Procedure

2 Variables: cigar exc

Simple Statistics

Variable N Mean Std Dev Median Minimum Maximum

cigar 12 5.58333 7.39113 3.50000 0 20.00000

exc 12 3.66667 2.64002 3.50000 0 8.00000

Spearman Correlation Coefficients, N = 12

Prob > |r| under H0: Rho=0

cigar exc

cigar 1.00000 -0.45366

0.1385

exc -0.45366 1.00000

0.1385

Spearman Correlation Statistics (Fisher’s z Transformation)

            With                   Sample                      Bias   Correlation
Variable    Variable     N    Correlation   Fisher’s z   Adjustment      Estimate
cigar       exc         12       -0.45366     -0.48929     -0.02062      -0.43713

            With                                         p Value for
Variable    Variable     95% Confidence Limits              H0:Rho=0
cigar       exc          -0.808262    0.182578                0.1421

data spearman;

input cigar exc @@;

cards;

20 0

0 0 20 1

10 2

5 3 4 5

3 5 5 6 0 3 0 4 0 7 0 8

;

proc corr SPEARMAN FISHER (BIASADJ=no);

run;

The CORR Procedure

2 Variables: cigar exc

Simple Statistics

Variable N Mean Std Dev Median Minimum Maximum

cigar 12 5.58333 7.39113 3.50000 0 20.00000

exc 12 3.66667 2.64002 3.50000 0 8.00000

Spearman Correlation Coefficients, N = 12

Prob > |r| under H0: Rho=0

cigar exc

cigar 1.00000 -0.45366

0.1385

exc -0.45366 1.00000

0.1385


            With                   Sample                                           p Value for
Variable    Variable     N    Correlation   Fisher’s z    95% Confidence Limits       H0:Rho=0
cigar       exc         12       -0.45366     -0.48929    -0.815293    0.162572         0.1421

• Sample size (Noether GE. Sample size determination for some common nonparametric statistics. J Am Stat Assoc 1987;82:645–7). No reference list.

• Wilcoxon-Mann-Whitney test for 2 independent samples

$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{6(p - 0.50)^2}$$

where n is the size of each group and p = Pr(X < Y).

• 1st paragraph of Statistical Analysis section: ‘Estimates of sample size

were based on the number of new enhancing lesions observed during the

first 12 weeks after the first infusion in a previous clinical trial of

natalizumab. Using methods based on the Wilcoxon-Mann-Whitney

statistic (Noether, 1987) appropriate for a two-group comparison at a

two-sided level of significance of 5 percent, we calculated that

approximately 73 patients were needed in each group for the study to

have 80 percent power.’ (NEJM 348 (1): 15-23 JAN 2 2003)

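• Solving the sample size formula for p with n = 73 gives the effect the study was powered to detect:

$$p = 0.50 + \frac{Z_{1-\alpha/2} + Z_{1-\beta}}{\sqrt{6 \times n}} = 0.50 + \frac{1.96 + 0.84}{\sqrt{6 \times 73}} \approx 0.63.$$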
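Both directions of the calculation can be sketched in Python. The function names are hypothetical, equal group sizes are assumed as in the formula above, and normal quantiles are taken from the standard library instead of the rounded values 1.96 and 0.84.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-group sample size for the WMW test, given p = Pr(X < Y)."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z_sum ** 2 / (6 * (p - 0.5) ** 2)

def detectable_p(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Value of p = Pr(X < Y) detectable with n subjects per group."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return 0.5 + z_sum / sqrt(6 * n)

p_73 = detectable_p(73)
n_for_065 = ceil(n_per_group(0.65))     # round up to whole subjects

print(f"n = 73 per group detects p = {p_73:.2f}")
print(f"p = 0.65 needs about {n_for_065} per group")
```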
