Non Parametric test

Nonparametric tests I

Back to basics

Lecture Outline

• What is a nonparametric test?
• Rank tests, distribution-free tests and nonparametric tests
• Which type of test to use

MTB > dotplot 'Male' 'Female';
SUBC> same.

[Dotplot: 'Male' and 'Female' weights on a common axis, 0.32 to 1.12]

MTB > desc 'Male' 'Female'

Variable   N    Mean  Median  TrMean   StDev  SEMean
MALE      50  0.5908  0.5600  0.5770  0.1979  0.0280
FEMALE    50  0.5180  0.4950  0.5102  0.1315  0.0186

Variable     Min     Max      Q1      Q3
MALE      0.2900  1.1300  0.4275  0.7150
FEMALE    0.3200  0.8500  0.4100  0.6125

Lecture Outline

• What is a nonparametric test?
  – What is a parameter?
  – What are examples of nonparametric tests?
• Rank tests, distribution-free tests and nonparametric tests


Parameters

• are central to inference in GLM and ANOVA

• and represent assumptions about the underlying processes

LET K1=4.7    # Group 1 mean minus grand mean
LET K2=-2.5   # Group 2 mean minus grand mean
LET K3=10.4   # The grand mean
LET K4=1.9    # Standard deviation of the error
RANDOM 30 'Error'
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'



Fitted value = $\mu + \alpha_i$

Group   $\alpha_i$
1       $\alpha_1$
2       $\alpha_2$
3       $-\alpha_1 - \alpha_2$

Error has a Normal distribution with zero mean and standard deviation $\sigma$
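As a rough illustration of how the parameters generate the data, the Minitab macro can be sketched in Python: the grand mean (K3), the two group effects (K1, K2) and the error standard deviation (K4) fully determine the simulated responses. This is a minimal sketch under stated assumptions: the 30 observations are assumed to fall into three groups of 10 (the slides do not give the group sizes), group 3's effect is taken as minus the sum of the other two, as in the table of group effects, and the names `simulate_oneway` and `group_means` are hypothetical.

```python
import random

# Parameter values from the Minitab macro
MU = 10.4      # K3: the grand mean
ALPHA1 = 4.7   # K1: group 1 mean minus grand mean
ALPHA2 = -2.5  # K2: group 2 mean minus grand mean
SIGMA = 1.9    # K4: standard deviation of the error

# (DUM1, DUM2) for each group; group 3 gets effect -ALPHA1 - ALPHA2,
# so the three group effects sum to zero.
DUMMIES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def simulate_oneway(n_per_group=10, mu=MU, alpha1=ALPHA1, alpha2=ALPHA2,
                    sigma=SIGMA, seed=None):
    """Simulate Y = mu + alpha1*DUM1 + alpha2*DUM2 + sigma*Error.

    Error is drawn from a standard Normal distribution. Equal group sizes
    are a hypothetical choice (30 observations when n_per_group=10).
    Returns a list of (group, y) pairs.
    """
    rng = random.Random(seed)
    data = []
    for group, (dum1, dum2) in DUMMIES.items():
        for _ in range(n_per_group):
            error = rng.gauss(0.0, 1.0)
            data.append((group, mu + alpha1 * dum1 + alpha2 * dum2 + sigma * error))
    return data


def group_means(data):
    """Average response in each group."""
    totals, counts = {}, {}
    for group, y in data:
        totals[group] = totals.get(group, 0.0) + y
        counts[group] = counts.get(group, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


if __name__ == "__main__":
    sample = simulate_oneway(seed=1)
    print(len(sample), group_means(sample))
```

Setting `sigma=0` removes the error term, so the group means equal the fitted values exactly: 15.1, 7.9 and 8.2, whose average is the grand mean 10.4.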




Parameters

• but represent assumptions about the underlying processes
• can be done without in some simple situations – BUT HOW?

Rnk  Wt    Sex    Rnk  Wt    Sex    Rnk  Wt    Sex    Rnk  Wt    Sex
  1  0.29  1       26  0.41  2       51  0.52  1       76  0.65  1
  2  0.32  2       27  0.42  1       52  0.52  2       77  0.66  1
  3  0.34  1       28  0.43  1       53  0.52  2       78  0.67  1
  4  0.34  2       29  0.43  2       54  0.53  2       79  0.67  2
  5  0.34  2       30  0.43  2       55  0.53  2       80  0.67  2
  6  0.36  1       31  0.45  1       56  0.55  2       81  0.67  2
  7  0.36  1       32  0.45  2       57  0.56  1       82  0.68  1
  8  0.37  1       33  0.45  2       58  0.56  1       83  0.71  1
  9  0.37  1       34  0.45  2       59  0.56  1       84  0.72  2
 10  0.37  1       35  0.46  2       60  0.57  1       85  0.73  1
 11  0.37  2       36  0.47  1       61  0.58  2       86  0.75  1
 12  0.37  2       37  0.47  1       62  0.58  2       87  0.75  1
 13  0.38  1       38  0.48  1       63  0.59  1       88  0.77  1
 14  0.38  1       39  0.48  1       64  0.59  2       89  0.78  1
 15  0.38  2       40  0.48  2       65  0.59  2       90  0.78  2
 16  0.38  2       41  0.48  2       66  0.60  1       91  0.78  2
 17  0.39  2       42  0.49  2       67  0.61  1       92  0.82  2
 18  0.40  2       43  0.49  2       68  0.61  2       93  0.83  1
 19  0.40  2       44  0.50  1       69  0.62  1       94  0.85  1
 20  0.40  2       45  0.50  1       70  0.62  1       95  0.85  2
 21  0.41  1       46  0.50  1       71  0.62  2       96  0.88  1
 22  0.41  1       47  0.50  2       72  0.62  2       97  0.98  1
 23  0.41  2       48  0.50  2       73  0.62  2       98  0.98  1
 24  0.41  2       49  0.51  1       74  0.63  1       99  1.05  1
 25  0.41  2       50  0.51  2       75  0.63  2      100  1.13  1

(Sex 1 = Male, Sex 2 = Female)


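Remember ties: tied values share the average of the ranks they occupy (e.g. the three weights of 0.34, at ranks 3, 4 and 5, each get rank 4).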

[Figure: Mean Rank]

The ‘Male’ mean rank = 55.26
The ‘Female’ mean rank = 45.74

MTB > mann-whitney male female


Mann-Whitney Test and CI: MALE, FEMALE



MALE    N = 50   Median = 0.5600
FEMALE  N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0

W is the sum of the ranks of the 50 male weights: a rank sum of 2763 corresponds to a mean rank of 2763/50 = 55.26
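The ranking step and the rank sum W can be checked with a short Python sketch. The weights below are transcribed from the ranked listing, assuming Sex 1 = Male and Sex 2 = Female (consistent with the male maximum of 1.13 and the female minimum of 0.32 in the summary table). `midranks` and `rank_sums` are hypothetical helpers: tied values get the average of the positions they span, which should reproduce Minitab's W = 2763 and the mean ranks 55.26 and 45.74.

```python
# Weights transcribed from the ranked listing (assumed: Sex 1 = Male, Sex 2 = Female)
MALE = [0.29, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.38, 0.38, 0.41, 0.41,
        0.42, 0.43, 0.45, 0.47, 0.47, 0.48, 0.48, 0.50, 0.50, 0.50, 0.51,
        0.52, 0.56, 0.56, 0.56, 0.57, 0.59, 0.60, 0.61, 0.62, 0.62, 0.63,
        0.65, 0.66, 0.67, 0.68, 0.71, 0.73, 0.75, 0.75, 0.77, 0.78, 0.83,
        0.85, 0.88, 0.98, 0.98, 1.05, 1.13]
FEMALE = [0.32, 0.34, 0.34, 0.37, 0.37, 0.38, 0.38, 0.39, 0.40, 0.40, 0.40,
          0.41, 0.41, 0.41, 0.41, 0.43, 0.43, 0.45, 0.45, 0.45, 0.46, 0.48,
          0.48, 0.49, 0.49, 0.50, 0.50, 0.51, 0.52, 0.52, 0.53, 0.53, 0.55,
          0.58, 0.58, 0.59, 0.59, 0.61, 0.62, 0.62, 0.62, 0.63, 0.67, 0.67,
          0.67, 0.72, 0.78, 0.78, 0.82, 0.85]


def midranks(values):
    """Return the rank of each value (1 = smallest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the whole run of values tied with values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sums(x, y):
    """Rank the pooled sample and return the rank sums of x and of y."""
    ranks = midranks(x + y)
    return sum(ranks[:len(x)]), sum(ranks[len(x):])


if __name__ == "__main__":
    w_male, w_female = rank_sums(MALE, FEMALE)
    print("W =", w_male)
    print("Male mean rank =", w_male / len(MALE))
    print("Female mean rank =", w_female / len(FEMALE))
```

If the raw positions 1 to 100 from the listing were used without averaging over ties, the male rank sum would come out as 2736 rather than 2763, which is why ties need care.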

[Figure: Mean Rank]



Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1016
The test is significant at 0.1014 (adjusted for ties)




Cannot reject the null hypothesis at alpha = 0.05 (p ≈ 0.10)
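Both p-values can be reproduced from W with the Normal approximation to the rank-sum distribution. Under the null hypothesis W has mean n1(N + 1)/2 = 50 × 101 / 2 = 2525 and, ignoring ties, variance n1 n2 (N + 1)/12 = 2500 × 101 / 12 ≈ 21042, so z = (2763 − 2525 − 0.5)/√21042 ≈ 1.64. The sketch below assumes a continuity correction of 0.5 and the usual tie correction to the variance, with tie-group sizes read off the ranked listing (e.g. six weights of 0.41, five each of 0.37, 0.50 and 0.62). `mann_whitney_p` is a hypothetical name, not Minitab's documented algorithm.

```python
import math

# Sizes of the groups of tied weights (t >= 2), read off the ranked listing
TIE_SIZES = [3, 2, 5, 4, 3, 6, 3, 4, 2, 4, 2, 5, 2,
             3, 2, 3, 2, 3, 2, 5, 2, 4, 2, 3, 2, 2]


def mann_whitney_p(w, n1, n2, tie_sizes=(), continuity=0.5):
    """Two-sided p-value for the rank sum w of sample 1 (Normal approximation).

    Each tie group of size t reduces the variance through the term t**3 - t.
    """
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    excess = max(abs(w - mean_w) - continuity, 0.0)
    z = excess / math.sqrt(var_w)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


if __name__ == "__main__":
    print(round(mann_whitney_p(2763, 50, 50), 4))             # no tie adjustment
    print(round(mann_whitney_p(2763, 50, 50, TIE_SIZES), 4))  # adjusted for ties
```

The two results agree with Minitab's 0.1016 and 0.1014: the tie correction shrinks the variance slightly, so the adjusted p-value is a little smaller.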









The null hypothesis is better expressed as “the distributions of male and female weights are the same”.


Nonparametric vs Parametric


Nonparametric          Parametric
Sign Test              One-sample t-test
Mann-Whitney Test      Two-sample t-test
Spearman Rank Test     Correlation/Regression
Kruskal-Wallis Test    One-way ANOVA
Friedman Test          One-way blocked ANOVA




A rose by any other name...

• Nonparametric tests lack parameters
• Rank tests start by ranking the data
• Distribution-free tests don’t assume a Normal distribution (or any other)

These three sets of tests largely, but not completely, overlap (and some of the tests are also scale-invariant).




Fewer assumptions but...
• still some assumptions (including independence)
• limited range of situations
  – no more than 2 x-variables
  – can’t mix continuous and categorical x-variables
• provide p-values, but estimation is limited
• less efficient (less powerful) than the parametric test when its assumptions are upheld
• there is a grand scheme for parametric statistics (GLM) but a lot of separate strange names for nonparametrics

When is there a choice?

• when there is a nonparametric test
  – fewer than two or three variables altogether
• and prediction is not required

How to choose:

• If the assumptions of the parametric test are upheld, use it (on grounds of efficiency)

• If not upheld, consider fixing the assumptions (e.g. by transforming the data, as in the practical)

• If the assumptions cannot be fixed, use a nonparametric test

MTB > dotplot 'LogM' 'LogF';
SUBC> same.

[Dotplot: 'LogM' and 'LogF' on a common axis, -1.25 to 0.00]

MTB > desc 'LogM' 'LogF'

Variable   N     Mean   Median   TrMean   StDev  SEMean
LogM      50  -0.5786  -0.5798  -0.5850  0.3248  0.0459
LogF      50  -0.6878  -0.7032  -0.6928  0.2453  0.0347

Variable      Min      Max       Q1       Q3
LogM      -1.2379   0.1222  -0.8499  -0.3355
LogF      -1.1394  -0.1625  -0.8916  -0.4902
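The log transformation can be sketched in Python as well. The minimum and maximum of LogM match ln 0.29 = -1.2379 and ln 1.13 = 0.1222, so the natural log is assumed. The sketch recomputes some log-scale summaries and confirms that the Mann-Whitney rank sum is unchanged: ln is strictly increasing, so it preserves the order of the data and hence every rank. The weights are transcribed from the ranked listing (assumed Sex 1 = Male), and `midranks` is a hypothetical helper that averages ranks over ties.

```python
import math
import statistics

# Weights transcribed from the ranked listing (assumed: Sex 1 = Male, Sex 2 = Female)
MALE = [0.29, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.38, 0.38, 0.41, 0.41,
        0.42, 0.43, 0.45, 0.47, 0.47, 0.48, 0.48, 0.50, 0.50, 0.50, 0.51,
        0.52, 0.56, 0.56, 0.56, 0.57, 0.59, 0.60, 0.61, 0.62, 0.62, 0.63,
        0.65, 0.66, 0.67, 0.68, 0.71, 0.73, 0.75, 0.75, 0.77, 0.78, 0.83,
        0.85, 0.88, 0.98, 0.98, 1.05, 1.13]
FEMALE = [0.32, 0.34, 0.34, 0.37, 0.37, 0.38, 0.38, 0.39, 0.40, 0.40, 0.40,
          0.41, 0.41, 0.41, 0.41, 0.43, 0.43, 0.45, 0.45, 0.45, 0.46, 0.48,
          0.48, 0.49, 0.49, 0.50, 0.50, 0.51, 0.52, 0.52, 0.53, 0.53, 0.55,
          0.58, 0.58, 0.59, 0.59, 0.61, 0.62, 0.62, 0.62, 0.63, 0.67, 0.67,
          0.67, 0.72, 0.78, 0.78, 0.82, 0.85]


def midranks(values):
    """Return the rank of each value (1 = smallest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end + 2) / 2  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Natural log, assumed to match the LogM / LogF columns
log_m = [math.log(x) for x in MALE]
log_f = [math.log(x) for x in FEMALE]

for name, logged in (("LogM", log_m), ("LogF", log_f)):
    print(f"{name}: min {min(logged):.4f}  median {statistics.median(logged):.4f}  "
          f"max {max(logged):.4f}  StDev {statistics.stdev(logged):.4f}")

# A strictly increasing transform keeps the ordering, so every rank and W are unchanged
n_m = len(MALE)
w_raw = sum(midranks(MALE + FEMALE)[:n_m])
w_log = sum(midranks(log_m + log_f)[:n_m])
print("W on raw scale:", w_raw, "  W on log scale:", w_log)
```

A t-test, by contrast, will generally give a different answer after the transformation, because means and standard deviations do not carry over from one scale to the other in the way that ranks do.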




Last remarks

• Nonparametric tests are an opportunity to revise the basic ideas of statistical inference

• They are sometimes useful in biology
• They are often used in biology
• NEXT WEEK: more nonparametrics, including confidence intervals and randomisation tests. READ the handout
