
Lecture 12: Non-Parametric Tests

S. Massa, Department of Statistics, University of Oxford

27 January 2017


Motivation

- Comparing the means of two populations is very important.

- In the last lecture we saw what we can do if we assume that the samples are normally distributed.

- Also, for large sample sizes we can invoke the central limit theorem to claim that the sample means X̄ and Ȳ are approximately normal.

- However, in some cases the data are clearly NOT normal, and the sample size is too small to invoke the CLT.


Alternative approach

- Both the z- and t-tests depend on an underlying assumption: the data are normally distributed.

- Today we will see an alternative approach which is independent of any assumption about the distribution of the data.

- These non-parametric tests are very robust: the significance level is known regardless of the distribution of the data.

- This is extremely useful, since in practice such assumptions are hard to check.

- But of course nothing is perfect: what you gain in robustness you lose in power.


Example

Recall from Lecture 1 the experiment where infants exercised to maintain their walking reflex.

- The treatment values seem smaller in general, though not uniformly so, and the sample size is quite small.

- Consider H0: µ_T = µ_C vs H1: µ_T < µ_C.

- The summary statistics are

  x̄_T = 10.1, x̄_C = 11.7,

  s_T = 1.45, s_C = 1.52.


Example

- The number of degrees of freedom is 6 − 1 + 6 − 1 = 10; for a one-sided test at the 5% level (the 10% two-sided entry of the t-table) the critical value is 1.81, and thus the critical region is (−∞, −1.81].

- The pooled sample standard deviation is

  s_p = √[ ((6 − 1)·1.45² + (6 − 1)·1.52²) / (6 + 6 − 2) ] = 1.48,

- the standard error is

  SE = s_p · √(1/6 + 1/6) = 0.85,

- and finally

  t_obs = (x̄ − ȳ)/SE ≈ −1.6/0.85 ≈ −1.85.

Conclusion: we reject the null hypothesis; the difference between the two means is statistically significant.
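A minimal computational sketch of this pooled t-test is given below. The individual observations are not listed on this slide, so the arrays here are a reconstruction chosen to match the summary statistics and the ranks quoted later in the lecture; treat them as illustrative rather than as the original data.

```python
# Sketch of the pooled two-sample t-test above (reconstructed illustrative data).
import numpy as np
from scipy import stats

treatment = np.array([9.0, 9.5, 9.5, 9.75, 10.0, 13.0])    # age at walking (months)
control   = np.array([9.0, 11.5, 11.5, 12.0, 13.0, 13.25])

nx, ny = len(treatment), len(control)
sp = np.sqrt(((nx - 1) * treatment.var(ddof=1) +
              (ny - 1) * control.var(ddof=1)) / (nx + ny - 2))
se = sp * np.sqrt(1 / nx + 1 / ny)
t_obs = (treatment.mean() - control.mean()) / se            # about -1.85

# SciPy's equal-variance two-sample t-test gives the same statistic
# (the alternative= keyword needs a reasonably recent SciPy).
t_scipy, p_one_sided = stats.ttest_ind(treatment, control, alternative='less')
print(t_obs, t_scipy, p_one_sided)
```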


What could have gone wrong?

- First of all, the t-statistic was barely inside the critical region.

- The t-test depends on our observations coming from a normal distribution. Do our data look normal?


The Rank-Sum Test (Mann-Whitney)

We rank the observations according to their size relative to the whole sample.

  measurements    9.0  9.0  9.5  9.5  9.75  10.0  11.5  11.5  12.0  13.0  13.0  13.25
  ranks             1    2    3    4     5     6     7     8     9    10    11     12
  modified ranks  1.5  1.5  3.5  3.5     5     6   7.5   7.5     9  10.5  10.5     12

- When there are ties, we average the ranks to obtain 1.5, 3.5 and so on.

- We want to test

  H0: the distributions are the same, vs H1: the controls are generally larger.


The Rank-Sum Test (Mann-Whitney)

- Our test statistic R is then simply the sum of the ranks in the smaller sample.

- In our case n_x = n_y, so you can take either one; we take the treatment sample.

- If the null hypothesis is true, then these ranks should behave like a random sample from the numbers 1, . . . , n_x + n_y.


The Critical Region

The critical region depends of course on the alternative.

- H1: controls are generally larger; reject the null if the rank sum is too small.

- H1′: controls are generally smaller; reject the null if the rank sum is too large.

- H1′′: distributions are not the same; reject the null if the rank sum is too large or too small.

We then look up the critical value in the appropriate table.


The Critical Table

- Critical values are given for a two-tailed test.

- Rows and columns correspond to the sizes of the smaller and larger samples, respectively.

- For every combination of row and column there are two sub-rows: the top gives the 10% critical values and the bottom the 5% ones.

- For a one-sided test at 5%, use the relevant top entry.


Example

In our case the rank sum for the treatment group is R = 30. We are performing a one-sided test (the alternative hypothesis is that the treatment values are smaller), hence we would reject for R ≤ 28. Since R = 30, we retain the null. If we were doing the two-sided test, the critical values at the 5% level are 26 and 52, and again we would retain the null.
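The rank calculation can be checked with a few lines of code. This sketch reuses the reconstructed group values from the t-test sketch above (an assumption consistent with the quoted summary statistics, not the original colour-coded data).

```python
# Sketch of the rank-sum statistic for the walking-reflex example.
import numpy as np
from scipy import stats

treatment = np.array([9.0, 9.5, 9.5, 9.75, 10.0, 13.0])    # reconstructed values
control   = np.array([9.0, 11.5, 11.5, 12.0, 13.0, 13.25])

combined = np.concatenate([treatment, control])
ranks = stats.rankdata(combined)           # ties get averaged ranks (1.5, 3.5, ...)
R = ranks[:len(treatment)].sum()           # rank sum of the treatment sample: 30.0

# SciPy reports the equivalent Mann-Whitney U statistic, U = R - n_x(n_x + 1)/2.
U, p = stats.mannwhitneyu(treatment, control, alternative='less')
print(R, U, p)
```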


Large Samples

- The table only goes up to a larger-sample size of 10.

- For larger samples use the normal approximation

  z = (R − µ) / σ,

  µ = (1/2) n_x (n_x + n_y + 1),

  σ = √( n_x n_y (n_x + n_y + 1) / 12 ).

- Then compare with the normal table.

- E.g. for a two-tailed test at the 0.05 level, reject the null if |z| > 1.96 (see the sketch below).
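A small helper for this normal approximation might look like the following sketch; the R, nx and ny values in the example call are placeholders, not data from the lecture.

```python
# Sketch of the large-sample normal approximation for the rank sum R.
import math

def ranksum_z(R, nx, ny):
    """z-statistic for the rank sum R of the sample of size nx."""
    mu = nx * (nx + ny + 1) / 2
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12)
    return (R - mu) / sigma

# Placeholder example: a two-tailed test at the 5% level rejects if |z| > 1.96.
print(ranksum_z(R=280, nx=15, ny=20))   # (280 - 270) / 30 = 0.33, so retain the null
```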


Wilcoxon Signed-Rank Test (for Paired Data)

- The general procedure is as follows: we compute the paired differences and rank them by absolute value.

- We then compute the sum of the ranks of the positive differences, R_+, and the sum of the ranks of the negative differences, R_−.

- The Wilcoxon statistic is defined as T = min{R_+, R_−}.

- We then compare with the appropriate table.


Example

We consider a comparative study between different methods of preparing breasts for breastfeeding.

Each mother treated one breast, leaving the other untreated.

The following data give the difference in the level of discomfort (1 to 4) between the treated and untreated breast for a particular treatment. There are 19 measurements overall:

-0.525, 0.172, -0.577, 0.200, 0.040, -0.143, 0.043, 0.010, 0.000, -0.522, 0.007, -0.122, -0.040, 0.000, -0.100, 0.050, -0.575, 0.031, -0.060.


Example

We rank the observations by absolute value:

  Diff   0.007  0.010  0.031  0.040  -0.040  0.043  0.050  -0.060  -0.100
  Rank       1      2      3    4.5     4.5      6      7       8       9

  Diff  -0.122  -0.143  0.172  0.200  -0.522  -0.525  -0.575  -0.577
  Rank      10      11     12     13      14      15      16      17

- Notice that we dropped the two zero values.

- We then compute

  R_+ = 1 + 2 + 3 + 4.5 + 6 + 7 + 12 + 13 = 48.5,  R_− = 104.5.

- Therefore T = 48.5 (a short computational check follows below).
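The sums R_+, R_− and the statistic T can be reproduced directly from the 19 differences listed earlier; a minimal sketch:

```python
# Sketch of the signed-rank computation for the breastfeeding differences.
import numpy as np
from scipy import stats

d0 = np.array([-0.525, 0.172, -0.577, 0.200, 0.040, -0.143, 0.043, 0.010, 0.000,
               -0.522, 0.007, -0.122, -0.040, 0.000, -0.100, 0.050, -0.575, 0.031,
               -0.060])
d = d0[d0 != 0]                         # drop the two zero differences
ranks = stats.rankdata(np.abs(d))       # rank |d|, averaging ties
R_plus, R_minus = ranks[d > 0].sum(), ranks[d < 0].sum()   # 48.5 and 104.5
T = min(R_plus, R_minus)
print(T)

# scipy.stats.wilcoxon carries out the same test (zeros are dropped by default).
print(stats.wilcoxon(d0))
```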


Example

- We computed T = 48.5.

- Since we dropped two values, our sample size is 19 − 2 = 17.

- Looking at the corresponding row of the table, we find a critical value of 34 at the 5% level.

- To reject we would have to observe T ≤ 34.

- Therefore the effect of the treatment is not statistically significant.


How Would the t-test Do?

- The initial study performed the one-tailed t-test at the 5% level.

- The critical value is −1.73.

- Since we have x̄ = −0.11 and s_x = 0.25,

  t_obs = (x̄ − 0) / (s_x / √n) = −1.95,

  and the null is rejected.

- This is really a close call: if the two-tailed test were performed, the null would have been retained, as the critical value would have been 2.10. (A short check of this t-test follows below.)
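As a quick check, the one-sample t-test on the same 19 differences can be reproduced with SciPy; this is a sketch, and the alternative= keyword assumes a reasonably recent SciPy version.

```python
# Sketch: one-sample t-test of the mean difference against 0.
import numpy as np
from scipy import stats

d0 = np.array([-0.525, 0.172, -0.577, 0.200, 0.040, -0.143, 0.043, 0.010, 0.000,
               -0.522, 0.007, -0.122, -0.040, 0.000, -0.100, 0.050, -0.575, 0.031,
               -0.060])
t, p = stats.ttest_1samp(d0, popmean=0, alternative='less')
print(t, p)   # t is about -1.95
```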


Is the t-test Justified?

Do the data look like they come from a normal distribution? Let's look at the histogram.


Median Test

- Suppose you have observations x_1, . . . , x_{n_x} and y_1, . . . , y_{n_y} from two distinct populations with true medians m_x and m_y.

- We want to test H0: m_x = m_y vs H1: m_x ≠ m_y (or <, >).

- The test hinges on the following idea: if M is the median of the combined sample, then the x's and y's have equal chance of being above M.

- Write P_x and P_y for the proportions of x's and y's above M.

- We would like to treat these as proportions of successes in n_x and n_y trials, but this is not straightforward.


Median Test

Here is the combined ordered sample

9.0, 9.0, 9.5, 9.5, 9.75, 10.0, 11.5, 11.5, 12.0, 13.0, 13.0, 13.25,

where we have coloured the control group red and the treatment group green. The combined median is 10.75, so we have 5 out of 6 of the control group above the median.


p-value: Exact Calculation

How do we compute the p-value then?

- Suppose that we have n_x red balls and n_y green balls in a box.

- Pick half of them at random to place above the median; suppose you get k_x and k_y.

- Under the null you expect to get about n_x/2 red balls and n_y/2 green.

- In this particular case you pick 6 balls from 12 (6 red and 6 green). What is the probability that you pick at least 5 red under the null?

- The null states that all picks have equal chance, so

  P(at least 5 red)
    = [ # ways to pick 5 red, 1 green + # ways to pick 6 red, 0 green ] / [ # ways to pick 6 balls from 12 ]
    = [ C(6,5) × C(6,1) + C(6,6) ] / C(12,6),

  where C(n, k) denotes the binomial coefficient "n choose k".


p-value: Exact Calculation

The null states that all picks have equal chance, so

  # ways to pick 6 red, 0 green = C(6,6) = 1 (you have to pick all the reds),

  # ways to pick 5 red, 1 green = C(6,5) × C(6,1) (5 reds out of 6, and then 1 green out of 6),

  # ways to pick 6 balls from 12 = C(12,6).

Overall we get

  P(at least 5 red) = [ C(6,5) × C(6,1) + C(6,6) ] / C(12,6) = (36 + 1)/924 ≈ 0.04.

So we reject at the 5% level. The two-sided p-value is 2 × 0.04 = 0.08, so we retain the null at the 5% level.
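The exact calculation is a hypergeometric probability, so it can be checked in a couple of lines; a minimal sketch:

```python
# Sketch: exact p-value for "at least 5 of the 6 above-median slots are red".
from math import comb
from scipy import stats

p_exact = (comb(6, 5) * comb(6, 1) + comb(6, 6)) / comb(12, 6)   # about 0.04

# Equivalently, P(X >= 5) for X ~ Hypergeometric(total=12, red=6, drawn=6).
p_hyper = stats.hypergeom.sf(4, 12, 6, 6)
print(p_exact, p_hyper)
```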


p-value: the Approximate Method

- So this seems a bit complicated, even for fairly simple situations.

- It is slightly easier to use the normal approximation.

- We have observed n_+ samples above the median.

- The proportions from the x and y samples are p_x = k_x/n_x and p_y = k_y/n_y respectively.

- Let's apply a z-test to check whether these proportions are really the same.

- Really? But the two samples are not independent...


p-value: the Approximate Method

- The joint probability of "success" is estimated by

  p = n_+ / n.

- The standard error is given by

  SE = √( p (1 − p) ) × √( 1/n_x + 1/n_y ).

- Compute the z-statistic

  Z = (p_x − p_y) / SE.

- For the baby-walking example,

  p = 1/2,  SE = (1/2) √( 1/6 + 1/6 ) = 0.289.


Continuity Correction I

- Since we observed 5 reds, we have p_x = 5/6 and p_y = 1/6.

- This would then give us Z = (5/6 − 1/6)/0.289 ≈ 2.3,

- and a p-value of 0.01, which is quite far from our exact value of 0.04.

- Why is this?

- We have approximated a discrete distribution by a continuous one, so we must apply the continuity correction.


Continuity correction I

- We want the probability of at least 5 reds, i.e. of p_x ≥ 5/6, but we are approximating with a continuous distribution;

- so we use the probability of getting at least 4.5 reds, i.e. p_x ≥ 4.5/6 = 0.75 and p_y ≤ 1.5/6 = 0.25.

- Then the test statistic becomes

  Z = (4.5/6 − 1.5/6) / 0.289 = 1.73.

- The normal table gives this a p-value of 0.0418, not too far from the exact value (see the sketch below).
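Putting the approximate method together with the continuity correction, here is a minimal sketch for the 6-vs-6 walking-reflex example:

```python
# Sketch: continuity-corrected normal approximation for the median test.
import math
from scipy import stats

nx = ny = 6
n_plus = 6                               # observations above the combined median
p = n_plus / (nx + ny)                   # pooled "success" probability, 1/2
se = math.sqrt(p * (1 - p)) * math.sqrt(1 / nx + 1 / ny)   # 0.289

# Continuity correction: "at least 5 reds" becomes "at least 4.5 reds".
z = (4.5 / nx - 1.5 / ny) / se           # about 1.73
p_one_sided = stats.norm.sf(z)           # about 0.042, close to the exact 0.04
print(z, p_one_sided)
```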


Continuity correction

Here you can see why we need to compute the area from 4.5 upwards and from 1.5 downwards.


Discussion

There are several problems with the median test.

- There are only up to n/2 + 1 different possible outcomes.

- It considers how many of each group are above and below the median, but not by how much.

- It is therefore certainly less powerful than it could be.

- The rank-sum test, seen above, is always preferred in practice.

- But the median test allows one to see the essence of non-parametric tests.


Tests for Paired Data

- As with the t-test, when the data are naturally structured in pairs we can create more powerful tests by taking this into account.

- Suppose we are given (x_1, y_1), . . . , (x_n, y_n) and want to test H0: x and y come from the same distribution.

- We disregard the exact numbers, and only keep track of whether x_i > y_i or not.

- This brings us first to the sign test.


The Sign Test

We can recast the null in the following form:

H0: the proportion of pairs with x_i > y_i is p_0 = 0.5.

- If the data come from the same distribution, then it is equally likely that x_i > y_i and that x_i ≤ y_i.

- So count the proportion of times that x_i > y_i; under the null it should be 1/2.

- Suppose that the observed proportion is p. Then the single-sample z-test statistic is

  Z = (p − p_0) / SE,  where SE = √( p_0 (1 − p_0) / n ).


Example

  Unaffected  Schizophrenic  Sign
        1.94           1.27     +
        1.44           1.63     −
        1.56           1.47     +
        1.58           1.39     +
        2.06           1.93     +
        1.66           1.26     +
        1.75           1.71     +
        1.77           1.67     +
        1.78           1.28     +
        1.92           1.85     +
        1.25           1.02     +
        1.93           1.34     +
        2.04           2.02     +
        1.62           1.59     +
        2.08           1.97     +

- Recall the study where the volume of the hippocampus was measured in identical twins, one of whom suffers from schizophrenia.

- We do a single-sample Z-test: under the null,

  Z := (p − p_0) / SE ≈ N(0, 1),  where SE = √( p_0 (1 − p_0) / 15 ) = 0.129.


Example

- 14 of the 15 signs are +, so

  p = 14/15 = 0.933.

- Then

  Z = (p − p_0) / SE = (0.933 − 0.5) / 0.129 = 3.36.

- Comparing with the 1.96 critical value at the 0.05 level, we reject the null hypothesis. (A short sketch of this test follows below.)
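A minimal sketch of this sign test, together with the exact binomial version that the normal calculation approximates:

```python
# Sketch: sign test for 14 "+" signs out of 15 twin pairs.
import math
from scipy import stats

n, n_plus = 15, 14
p_hat, p0 = n_plus / n, 0.5
se = math.sqrt(p0 * (1 - p0) / n)          # 0.129
z = (p_hat - p0) / se                      # about 3.36
p_normal = 2 * stats.norm.sf(abs(z))       # two-sided normal approximation

# Exact two-sided sign test via the binomial distribution (SciPy >= 1.7).
p_exact = stats.binomtest(n_plus, n, p=0.5, alternative='two-sided').pvalue
print(z, p_normal, p_exact)
```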


Recap I

I When we don’t think that the assumptions of parametrictests, especially normality, are satisfied we are better offconducting non-parametric tests.

I For comparing the distributions of independent samples wehave the

Median test

Rank-Sum test (Mann-Whitney)

I For paired samples we have

Sign test

Wilcoxon signed rank test

I In practice the median and sign tests should be avoided, asthe rank-sum and Wilcoxon tests are more powerful.