Nonparametric tests I
Back to basics
Lecture Outline
• What is a nonparametric test?
• Rank tests, distribution-free tests and nonparametric tests
• Which type of test to use
MTB > dotplot 'Male' 'Female';
SUBC> same.

[Character dotplots of MALE and FEMALE on a common axis running from 0.32 to 1.12]

MTB > desc 'Male' 'Female'

Variable   N    Mean   Median  TrMean   StDev  SEMean
MALE      50  0.5908  0.5600  0.5770  0.1979  0.0280
FEMALE    50  0.5180  0.4950  0.5102  0.1315  0.0186

Variable     Min     Max      Q1      Q3
MALE      0.2900  1.1300  0.4275  0.7150
FEMALE    0.3200  0.8500  0.4100  0.6125
Lecture Outline
• What is a nonparametric test?
  – What is a parameter?
  – What are examples of non-parametric tests?
• Rank tests, distribution-free tests and nonparametric tests
• Which type of test to use
Parameters
• are central to inference in GLM and ANOVA
• and represent assumptions about the underlying processes
LET K1=4.7    # Group 1 mean minus grand mean
LET K2=-2.5   # Group 2 mean minus grand mean
LET K3=10.4   # The grand mean
LET K4=1.9    # Standard deviation of the error
RANDOM 30 'Error'
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'
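Fitted value = $\mu + \alpha_i$, where the group term $\alpha_i$ is:

Group   Term
1       $\alpha_1$
2       $\alpha_2$
3       $-\alpha_1 - \alpha_2$

Error has Normal Distribution with zero mean and standard deviation $\sigma$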
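To see what the parameters do, the Minitab commands above can be mirrored in a few lines of Python. This is only a sketch: the coding of the two dummy variables (1/0, 0/1 and -1/-1 for groups 1, 2 and 3) is inferred from the group table, and the split of the 30 observations into three groups of 10 is an assumption, since the slides do not show how DUM1 and DUM2 were built.

```python
import random

# Parameter values copied from the Minitab LET commands.
ALPHA_1 = 4.7   # K1: group 1 mean minus grand mean
ALPHA_2 = -2.5  # K2: group 2 mean minus grand mean
MU = 10.4       # K3: the grand mean
SIGMA = 1.9     # K4: standard deviation of the error

# Assumed coding of (DUM1, DUM2) for each group; group 3 takes -1 on both.
DUMMIES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def fitted_value(group):
    """Grand mean plus the group's deviation from it."""
    dum1, dum2 = DUMMIES[group]
    return MU + ALPHA_1 * dum1 + ALPHA_2 * dum2


def simulate(groups, rng):
    """One response per entry of groups: fitted value plus Normal error."""
    return [fitted_value(g) + SIGMA * rng.gauss(0.0, 1.0) for g in groups]


fitted = {g: round(fitted_value(g), 1) for g in (1, 2, 3)}
print(fitted)  # {1: 15.1, 2: 7.9, 3: 8.2}

# 30 observations, 10 per group (the group sizes are an assumption).
rng = random.Random(1)
groups = [1] * 10 + [2] * 10 + [3] * 10
y = simulate(groups, rng)
print(len(y))  # 30
```

Everything about the simulated data is fixed by four numbers plus the assumption of Normal errors. Those are the parameters a parametric test relies on.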
Parameters
• are central to inference in GLM and ANOVA
• but represent assumptions about the underlying processes
• can be done without in some simple situations – BUT HOW?
Rnk  Wt    Sex    Rnk  Wt    Sex    Rnk  Wt    Sex    Rnk  Wt    Sex
  1  0.29  1       26  0.41  2       51  0.52  1       76  0.65  1
  2  0.32  2       27  0.42  1       52  0.52  2       77  0.66  1
  3  0.34  1       28  0.43  1       53  0.52  2       78  0.67  1
  4  0.34  2       29  0.43  2       54  0.53  2       79  0.67  2
  5  0.34  2       30  0.43  2       55  0.53  2       80  0.67  2
  6  0.36  1       31  0.45  1       56  0.55  2       81  0.67  2
  7  0.36  1       32  0.45  2       57  0.56  1       82  0.68  1
  8  0.37  1       33  0.45  2       58  0.56  1       83  0.71  1
  9  0.37  1       34  0.45  2       59  0.56  1       84  0.72  2
 10  0.37  1       35  0.46  2       60  0.57  1       85  0.73  1
 11  0.37  2       36  0.47  1       61  0.58  2       86  0.75  1
 12  0.37  2       37  0.47  1       62  0.58  2       87  0.75  1
 13  0.38  1       38  0.48  1       63  0.59  1       88  0.77  1
 14  0.38  1       39  0.48  1       64  0.59  2       89  0.78  1
 15  0.38  2       40  0.48  2       65  0.59  2       90  0.78  2
 16  0.38  2       41  0.48  2       66  0.60  1       91  0.78  2
 17  0.39  2       42  0.49  2       67  0.61  1       92  0.82  2
 18  0.40  2       43  0.49  2       68  0.61  2       93  0.83  1
 19  0.40  2       44  0.50  1       69  0.62  1       94  0.85  1
 20  0.40  2       45  0.50  1       70  0.62  1       95  0.85  2
 21  0.41  1       46  0.50  1       71  0.62  2       96  0.88  1
 22  0.41  1       47  0.50  2       72  0.62  2       97  0.98  1
 23  0.41  2       48  0.50  2       73  0.62  2       98  0.98  1
 24  0.41  2       49  0.51  1       74  0.63  1       99  1.05  1
 25  0.41  2       50  0.51  2       75  0.63  2      100  1.13  1
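Remember ties: tied values share the average of the ranks they span (the three weights of 0.34 at positions 3 to 5 each get rank 4).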
[Figure: Mean Rank; axes marked 0 to 100 and 0 to 140]
The ‘Male’ mean rank = 55.26
The ‘Female’ mean rank = 45.74
MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE     N = 50   Median = 0.5600
FEMALE   N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0

Sum of ranks of 2763 corresponds to a mean rank of 2763/50 = 55.26
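The rank sum and the two mean ranks can be checked directly from the ranked table. The sketch below is Python rather than Minitab. It copies the 100 weights and sex codes from the table, assumes that Sex code 1 is Male (which agrees with the minima and maxima in the descriptive statistics), and gives tied weights the average of the ranks they span.

```python
# Weights in ascending order, copied from the ranked table (Rnk, Wt, Sex).
WEIGHTS = [
    0.29, 0.32, 0.34, 0.34, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37,
    0.37, 0.37, 0.38, 0.38, 0.38, 0.38, 0.39, 0.40, 0.40, 0.40,
    0.41, 0.41, 0.41, 0.41, 0.41, 0.41, 0.42, 0.43, 0.43, 0.43,
    0.45, 0.45, 0.45, 0.45, 0.46, 0.47, 0.47, 0.48, 0.48, 0.48,
    0.48, 0.49, 0.49, 0.50, 0.50, 0.50, 0.50, 0.50, 0.51, 0.51,
    0.52, 0.52, 0.52, 0.53, 0.53, 0.55, 0.56, 0.56, 0.56, 0.57,
    0.58, 0.58, 0.59, 0.59, 0.59, 0.60, 0.61, 0.61, 0.62, 0.62,
    0.62, 0.62, 0.62, 0.63, 0.63, 0.65, 0.66, 0.67, 0.67, 0.67,
    0.67, 0.68, 0.71, 0.72, 0.73, 0.75, 0.75, 0.77, 0.78, 0.78,
    0.78, 0.82, 0.83, 0.85, 0.85, 0.88, 0.98, 0.98, 1.05, 1.13,
]

# Sex codes in the same order, ten per chunk (1 is assumed to be Male).
SEX = (
    "1212211111" "2211222222" "1122221122" "1222211112" "2221112212"
    "1222221111" "2212211211" "2221211122" "2112111112" "2211211111"
)


def midranks(sorted_values):
    """Return 1-based ranks for an ascending list, averaging over ties."""
    ranks = []
    i = 0
    n = len(sorted_values)
    while i < n:
        j = i
        # Extend j to the last position holding the same value as position i.
        while j + 1 < n and sorted_values[j + 1] == sorted_values[i]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # average of ranks i+1 .. j+1
        ranks.extend([shared] * (j - i + 1))
        i = j + 1
    return ranks


ranks = midranks(WEIGHTS)
male_sum = sum(r for r, s in zip(ranks, SEX) if s == "1")
female_sum = sum(r for r, s in zip(ranks, SEX) if s == "2")
print(male_sum, male_sum / SEX.count("1"))      # 2763.0 55.26
print(female_sum, female_sum / SEX.count("2"))  # 2287.0 45.74
```

The Male rank sum reproduces W = 2763.0 only when ties are averaged, which is why the ties matter.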
MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE     N = 50   Median = 0.5600
FEMALE   N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1016
The test is significant at 0.1014 (adjusted for ties)

Cannot reject at alpha = 0.05

The null hypothesis is better expressed as “the distributions of male and female weights are the same”.
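One way to see where the two significance levels come from is the usual Normal approximation to the rank sum, with a continuity correction and an optional correction for ties. The sketch below is Python. It takes W and the sample sizes from the output above and the sizes of the tied groups from the ranked table. Whether Minitab computes its p-values in exactly this way is an assumption, although the figures agree to four decimals.

```python
import math

N_MALE = 50
N_FEMALE = 50
W = 2763.0  # sum of the Male ranks, from the Minitab output

# Sizes of the groups of tied weights, counted from the ranked table.
TIE_SIZES = [3, 2, 5, 4, 3, 6, 3, 4, 2, 4, 2, 5, 2,
             3, 2, 3, 2, 3, 2, 5, 2, 4, 2, 3, 2, 2]


def two_sided_p(w, n1, n2, tie_sizes=()):
    """Two-sided p-value for a rank sum via the Normal approximation."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2
    # Ties shrink the variance of the rank sum; no ties means no shrinkage.
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    # Continuity correction: move W half a unit towards its expected value.
    z = max(abs(w - expected) - 0.5, 0.0) / math.sqrt(variance)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


p_plain = two_sided_p(W, N_MALE, N_FEMALE)
p_ties = two_sided_p(W, N_MALE, N_FEMALE, TIE_SIZES)
print(f"{p_plain:.4f} {p_ties:.4f}")  # 0.1016 0.1014
```

No mean or standard deviation of the weights enters the calculation, only the ranks, the sample sizes and the pattern of ties.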
Nonparametric vs Parametric

Nonparametric            Parametric
• Sign Test              • One-sample t-test
• Mann-Whitney Test      • Two-sample t-test
• Spearman Rank Test     • Correlation/Regression
• Kruskal-Wallis Test    • One-way ANOVA
• Friedman Test          • One-way blocked ANOVA
A rose by any other name..
• Non-parametric tests lack parameters
• Rank tests start by ranking the data
• Distribution-free tests don’t assume a Normal distribution (or any other)
These are mainly but not completely overlapping sets of tests (and some are scale-invariant too).
Fewer assumptions but...
• still some assumptions (including independence)
• limited range of situations
  – no more than 2 x-variables
  – can’t mix continuous and categorical x-variables
• provide p-values but estimation is dodgy
• loss of efficiency if parametric assumptions are upheld
• there is a grand scheme for parametric statistics (GLM) but a lot of separate strange names for nonparametrics
When is there a choice?
• when there is a non-parametric test
  – fewer than two or three variables altogether
• and prediction is not required
How to choose:
• If the assumptions of the parametric test are upheld, use it – on grounds of efficiency
• If they are not upheld, consider fixing the assumptions (e.g. by transforming the data, as in the practical)
• If the assumptions are not fixable, use a nonparametric test
MTB > dotplot 'LogM' 'LogF';
SUBC> same.

[Character dotplots of LogM and LogF on a common axis running from -1.25 to 0.00]

MTB > desc 'LogM' 'LogF'

Variable   N     Mean   Median   TrMean   StDev  SEMean
LogM      50  -0.5786  -0.5798  -0.5850  0.3248  0.0459
LogF      50  -0.6878  -0.7032  -0.6928  0.2453  0.0347

Variable      Min      Max       Q1       Q3
LogM      -1.2379   0.1222  -0.8499  -0.3355
LogF      -1.1394  -0.1625  -0.8916  -0.4902
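The printed minima and maxima are consistent with a natural-log transformation of the original weights, although the slides do not show how LogM and LogF were created, so that is an inference. The short Python sketch below checks this against the extremes from the first set of descriptive statistics. It also illustrates why a rank test is unaffected by such a transformation: taking logs changes the spacing of the values but not their order.

```python
import math

# Smallest and largest weights for each sex, from the first 'desc' output.
RAW = {"Male": (0.29, 1.13), "Female": (0.32, 0.85)}

# Natural logs, rounded to the four decimals that Minitab prints.
logged = {sex: tuple(round(math.log(w), 4) for w in pair)
          for sex, pair in RAW.items()}
print(logged)  # {'Male': (-1.2379, 0.1222), 'Female': (-1.1394, -0.1625)}

# A monotone transformation keeps the ordering of the values, so the ranks,
# and with them any rank test, are the same on the log scale.
sample = [0.56, 0.29, 1.13, 0.32, 0.85]  # a few weights from the data
order_raw = sorted(range(len(sample)), key=lambda i: sample[i])
order_log = sorted(range(len(sample)), key=lambda i: math.log(sample[i]))
print(order_raw == order_log)  # True
```

A t-test on the logged weights can therefore differ from one on the raw weights, while the Mann-Whitney test gives the same W either way.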
Last remarks
• Nonparametric tests are an opportunity to revise the basic ideas of statistical inference
• They are sometimes useful in biology
• They are often used in biology
• NEXT WEEK: more nonparametrics, including confidence intervals and randomisation tests. READ the handout