Non-parametric statistics

Page 1: Non-parametric statistics
Page 2: Non-parametric statistics

Today's programme:

- Non-parametric tests (examples)
- Some repetition of key concepts (time permitting)
- Free experiment status
- Exercise: group tasks on non-parametric tests (worked examples will be provided!)
- Free experiment supervision/help

Page 3: Non-parametric statistics

Did you get the compendium?

Remember: For week 12 (regression and correlation) there are 100+ pages in the compendium. No need to read all of it – read the introductions to each chapter and get a feel for the first simple examples; multiple regression and multiple correlation are for future reference.

Page 4: Non-parametric statistics
Page 5: Non-parametric statistics

Two types of statistical test: Parametric tests:

Based on the assumption that the data have certain characteristics, or "parameters".

Results are only valid if:

(a) the data are normally distributed;
(b) the data show homogeneity of variance;
(c) the data are measurements on an interval or ratio scale.

[Figure: bar chart of the two group means]

Group 1: M = 8.19 (SD = 1.33); Group 2: M = 11.46 (SD = 9.18)

Page 6: Non-parametric statistics

Nonparametric tests make no assumptions about the data's characteristics.

Use one if any of the three properties below is true:

(a) the data are not normally distributed (e.g. skewed);
(b) the data show heterogeneity of variance;
(c) the data are measurements on an ordinal scale (ranks).

Non-parametric tests are used when we do not have ratio/interval data, or when the assumptions of parametric tests are broken.

Page 7: Non-parametric statistics

Just as with parametric tests, which non-parametric test to use depends on the experimental design (repeated measures or independent groups) and the number of IVs and their levels.

Non-parametric tests are minimally affected by outliers, because scores are converted to ranks

Page 8: Non-parametric statistics

Examples of parametric tests and their non-parametric equivalents:

Parametric test:                       Non-parametric counterpart:
Pearson correlation                    Spearman's correlation
(No equivalent test)                   Chi-Square test
Independent-means t-test               Mann-Whitney test
Dependent-means t-test                 Wilcoxon test
One-way independent-measures ANOVA     Kruskal-Wallis test
One-way repeated-measures ANOVA        Friedman's test

Page 9: Non-parametric statistics

Non-parametric tests make few assumptions about the distribution of the data being analyzed

They get around this by not using the raw scores, but by ranking them: the lowest score gets rank 1, the next lowest rank 2, etc. How the ranking is carried out differs from test to test, but the principle is the same.

The analysis is carried out on the ranks, not the raw data

Ranking data means we lose information – we do not know the distance between the ranks

This means that non-parametric tests are less powerful than parametric tests: they are less likely to discover an effect in our data (an increased chance of a Type II error).

Page 10: Non-parametric statistics
Page 11: Non-parametric statistics

The Mann-Whitney test is the non-parametric equivalent of the independent t-test.

Used when you have two conditions, each performed by a separate group of subjects.

Each subject produces one score. Tests whether there is a statistically significant difference between the two groups.

Page 12: Non-parametric statistics

Example: Difference between men and dogs

We count the number of ”doglike” behaviors in a group of 20 men and 20 dogs over 24 hours

The result is a table with 2 groups and their number of doglike behaviors

We run a Kolmogorov-Smirnov test ("vodka test") to see if the data are normally distributed. The test is significant (p < 0.009), so we need a non-parametric test to analyze the data.

Page 13: Non-parametric statistics

The Mann-Whitney test looks for differences in the ranked positions of scores in the two groups (samples).

Example ...

Page 14: Non-parametric statistics

Mann-Whitney test, step-by-step:

Does it make any difference to students' comprehension of statistics whether the lectures are in English or in Serbo-Croat?

Group 1: Statistics lectures in English. Group 2: Statistics lectures in Serbo-Croat

DV: Lecturer intelligibility ratings by students (0 = "unintelligible", 100 = "highly intelligible").

The DV gives ordinal (rating) data – so Mann-Whitney is appropriate.

Page 15: Non-parametric statistics

Step 1: Rank all the scores together, regardless of group.

English (raw)   English (rank)   Serbo-Croat (raw)   Serbo-Croat (rank)
18              17               17                  15
15              10.5             13                  8
17              15               12                  5.5
13              8                16                  12.5
11              3.5              10                  1.5
16              12.5             15                  10.5
10              1.5              11                  3.5
17              15               13                  8
                                 12                  5.5

English: Mean = 14.63, SD = 2.97, Median = 15.5
Serbo-Croat: Mean = 13.22, SD = 2.33, Median = 13

Page 16: Non-parametric statistics

How to rank scores:

(a) The lowest score gets rank "1"; the next lowest gets "2"; and so on.

(b) If two or more scores have the same value, they are "tied":
(i) Give each tied score the rank it would have had, had it been different from the other scores.
(ii) Add the ranks for the tied scores and divide by the number of tied scores; each of the ties gets this average rank.
(iii) The next score after the set of ties gets the rank it would have obtained had there been no ties.

Example:
raw score:        6     34    34    48
"original" rank:  1     2     3     4
"actual" rank:    1     2.5   2.5   4
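The tie-handling rule above can be sketched in Python (this is a cross-check, not part of the original slides; the function name is illustrative):

```python
def rank_with_ties(scores):
    """Rank scores; tied values share the mean of the ranks they span."""
    srt = sorted(scores)
    ranks = []
    for v in scores:
        i = srt.index(v)   # position of the first tied copy (0-based)
        c = srt.count(v)   # number of tied copies
        # mean of the ranks (i+1) .. (i+c) that the ties would occupy
        ranks.append((2 * i + 1 + c) / 2)
    return ranks

print(rank_with_ties([6, 34, 34, 48]))  # [1.0, 2.5, 2.5, 4.0] - matches the example
```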

Page 17: Non-parametric statistics

Formula for Mann-Whitney Test statistic: U

U = N1 × N2 + [Nx(Nx + 1) / 2] − Tx

T1 and T2 = sum of ranks for groups 1 and 2
N1 and N2 = number of subjects in groups 1 and 2
Tx = the larger of the two rank totals
Nx = number of subjects in the Tx group

Page 18: Non-parametric statistics

Step 2: Add up the ranks for group 1, to get T1. Here, T1 = 83. Add up the ranks for group 2, to get T2. Here, T2 = 70.

Step 3: N1 is the number of subjects in group 1; N2 is the

number of subjects in group 2. Here, N1 = 8 and N2 = 9.

Step 4: Call the larger of these two rank totals Tx. Here, Tx = 83. Nx is the number of subjects in this group; here, Nx = 8.

Page 19: Non-parametric statistics

Step 5: Find U:

U = N1 × N2 + [Nx(Nx + 1) / 2] − Tx

In our example:

U = 8 × 9 + [8 × (8 + 1) / 2] − 83
U = 72 + 36 − 83 = 25

Page 20: Non-parametric statistics

If there are unequal numbers of subjects - as in the present case - calculate U for both rank totals and then use the smaller U.

In the present example, for T1, U = 25, and for T2, U = 47. Therefore, use 25 as U.
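As a cross-check on Steps 1–5, the whole hand calculation can be reproduced in a few lines of Python (a sketch using the lecture's rating data, not part of the course materials):

```python
english = [18, 15, 17, 13, 11, 16, 10, 17]
serbo   = [17, 13, 12, 16, 10, 15, 11, 13, 12]

combined = sorted(english + serbo)

def rank(v):
    """Average rank of value v among all scores (ties share the mean rank)."""
    i, c = combined.index(v), combined.count(v)
    return (2 * i + 1 + c) / 2

T1 = sum(rank(v) for v in english)  # 83.0
T2 = sum(rank(v) for v in serbo)    # 70.0
N1, N2 = len(english), len(serbo)   # 8, 9

def U(tx, nx):
    return N1 * N2 + nx * (nx + 1) / 2 - tx

U1, U2 = U(T1, N1), U(T2, N2)       # 25.0 and 47.0
print(min(U1, U2))  # 25.0 -> compare against the critical U of 15
```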

Step 6: Look up the critical value of U in a table, taking into account N1 and N2. If our obtained U is smaller than the critical value of U, we reject the null hypothesis and conclude that our two groups do differ significantly.

Page 21: Non-parametric statistics

Here, the critical value of U for N1 = 8 and N2 = 9 is 15. Our obtained U of 25 is larger than this, and so we conclude that there is no significant difference between our two groups.

Conclusion: Ratings of lecturer intelligibility are unaffected by whether the lectures are given in English or in Serbo-Croat.

Critical values of U (rows = N2, columns = N1):

N2 \ N1:   5    6    7    8    9   10
 5         2    3    5    6    7    8
 6         3    5    6    8   10   11
 7         5    6    8   10   12   14
 8         6    8   10   13   15   17
 9         7   10   12   15   17   20
10         8   11   14   17   20   23

Page 22: Non-parametric statistics

Mann-Whitney using SPSS - procedure:

Page 23: Non-parametric statistics

Mann-Whitney using SPSS - procedure:

Page 24: Non-parametric statistics

Mann-Whitney using SPSS - output:

Ranks:

Language       N    Mean Rank   Sum of Ranks
English        8    10.38       83.00
Serbo-Croat    9     7.78       70.00
Total         17

Test Statistics(b):               Intelligibility
Mann-Whitney U                    25.000
Wilcoxon W                        70.000
Z                                 -1.067
Asymp. Sig. (2-tailed)            .286
Exact Sig. [2*(1-tailed Sig.)]    .321(a)

a. Not corrected for ties.
b. Grouping Variable: Language

SPSS gives us two boxes as the output: the first shows the sum of ranks for each group; the second shows the U statistic and the significance value of the test (which you can halve if you have a one-tailed hypothesis).

Page 25: Non-parametric statistics
Page 26: Non-parametric statistics

The Wilcoxon test:

Used when you have two conditions, both performed by the same subjects.

Each subject produces two scores, one for each condition.

Tests whether there is a statistically significant difference between the two conditions.

Page 27: Non-parametric statistics

Wilcoxon test, step-by-step:

Does background music affect the mood of factory workers?

Eight workers: Each tested twice.

Condition A: Background music. Condition B: Silence.

DV: Worker's mood rating (0 = "extremely miserable", 100 = "euphoric").

Ratings data, so use Wilcoxon test.

Page 28: Non-parametric statistics

Step 1: Find the difference between each pair of scores, keeping track of the sign (+ or −) of the difference – different from the Mann-Whitney test, where the data themselves are ranked!

Step 2: Rank the differences, ignoring their sign (lowest = 1). Tied scores are dealt with as before. Ignore zero difference-scores.

Worker   Silence   Music   Difference   Rank
1        15        10       5           4.5
2        12        14      −2           2.5
3        11        11       0           (ignore)
4        16        11       5           4.5
5        14         4      10           6
6        13         1      12           7
7        11        12      −1           1
8         8        10      −2           2.5

Silence: Mean = 12.5, SD = 2.56, Median = 12.5
Music: Mean = 9.13, SD = 4.36, Median = 10.5

Page 29: Non-parametric statistics

Step 3: Add together the positive-signed ranks. = 22. Add together the negative-signed ranks. = 6.

Step 4: "W" is the smaller sum of ranks; W = 6. N is the number of differences, omitting zero differences: N = 8 − 1 = 7.

Step 5: Use a table of critical W-values to find the critical value of W for your N. Your obtained W has to be smaller than this critical value for it to be statistically significant.

Page 30: Non-parametric statistics

The critical value of W (for an N of 7) is 2. Our obtained W of 6 is bigger than this. Our two conditions are not significantly different.

Conclusion: Workers' mood appears to be unaffected by presence or absence of background music.

One-tailed significance level:   0.025   0.01   0.005
Two-tailed significance level:   0.05    0.02   0.01

N = 6:                           0       -      -
N = 7:                           2       0      -
N = 8:                           4       2      0
N = 9:                           6       3      2
N = 10:                          8       5      3
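The Wilcoxon hand calculation above can also be sketched in Python as a cross-check (not part of the original slides):

```python
silence = [15, 12, 11, 16, 14, 13, 11, 8]
music   = [10, 14, 11, 11, 4, 1, 12, 10]

# Step 1: signed differences, dropping zero differences
diffs = [s - m for s, m in zip(silence, music) if s != m]

# Step 2: rank the absolute differences, ties sharing the mean rank
absd = sorted(abs(d) for d in diffs)

def rank(a):
    i, c = absd.index(a), absd.count(a)
    return (2 * i + 1 + c) / 2

pos = sum(rank(abs(d)) for d in diffs if d > 0)  # 22.0
neg = sum(rank(abs(d)) for d in diffs if d < 0)  # 6.0
W = min(pos, neg)
print(W)  # 6.0 -> larger than the critical W of 2 for N = 7, so not significant
```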

Page 31: Non-parametric statistics

Wilcoxon using SPSS - procedure:

Page 32: Non-parametric statistics

Wilcoxon using SPSS - procedure:

Page 33: Non-parametric statistics

Wilcoxon using SPSS - output:

Ranks (silence − music):

                  N      Mean Rank   Sum of Ranks
Negative Ranks    4(a)   5.50        22.00
Positive Ranks    3(b)   2.00         6.00
Ties              1(c)
Total             8

a. silence < music
b. silence > music
c. silence = music

Test Statistics(b):          silence − music
Z                            -1.357(a)
Asymp. Sig. (2-tailed)       .175

a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test

Notes on the output: "negative ranks" are cases where the silence score was lower than the music score, and "positive ranks" cases where it was higher; ties are workers whose score did not change with/without music. As with the Mann-Whitney test, the z-score (the number of SDs from the mean) becomes more accurate with larger sample sizes.

Page 34: Non-parametric statistics
Page 35: Non-parametric statistics

Non-parametric tests for comparing three or more groups or

conditions:

Kruskal-Wallis test: Similar to the Mann-Whitney test, except that it enables you to compare three or more groups rather than just two. Different subjects are used for each group.

Friedman's test (Friedman's ANOVA): Similar to the Wilcoxon test, except that you can use it with three or more conditions (for one group). Each subject does all of the experimental conditions.

Page 36: Non-parametric statistics

One IV, with multiple levels

Levels can differ:

(a) qualitatively/categorically – e.g.:
- effects of managerial style (laissez-faire, authoritarian, egalitarian) on worker satisfaction
- effects of mood (happy, sad, neutral) on memory
- effects of location (Scotland, England or Wales) on happiness ratings

(b) quantitatively – e.g.:
- effects of age (20 vs 40 vs 60 year olds) on optimism ratings
- effects of study time (1, 5 or 10 minutes) before being tested on recall of faces
- effects of class size on 10-year-olds' literacy
- effects of temperature (60, 100 and 120 deg.) on mood

Page 37: Non-parametric statistics

Why have experiments with more than two levels of the IV?

(1) Increases generality of the conclusions: e.g. comparing young (20) and old (70) subjects tells you nothing about the behaviour of intermediate age groups.

(2) Economy: Getting subjects is expensive – may as well get as much data as possible from them, i.e. use more levels of the IV (or more IVs).

(3) Can look for trends: What are the effects on performance of increasingly large doses of cannabis (e.g. 100 mg, 200 mg, 300 mg)?

Page 38: Non-parametric statistics
Page 39: Non-parametric statistics

Kruskal-Wallis test, step-by-step:

Does it make any difference to students' comprehension of statistics whether the lectures are given in English, Serbo-Croat – or Cantonese? (Similar to the Mann-Whitney example, with just one more language, i.e. one more group of people.)

Group A (4 people): Lectures in English. Group B (4 people): Lectures in Serbo-Croat. Group C (4 people): Lectures in Cantonese.

DV: student rating of the lecturer's intelligibility on a 100-point scale ("0" = "incomprehensible").

Ratings – so use a non-parametric test. Three groups – so use the Kruskal-Wallis test.

Page 40: Non-parametric statistics

Step 1: Rank the scores, ignoring which group they belong to. The lowest score gets the lowest rank. Tied scores get the average of the ranks they would otherwise have obtained (note the difference from the Wilcoxon test!).

English (raw)   English (rank)   Serbo-Croat (raw)   Serbo-Croat (rank)   Cantonese (raw)   Cantonese (rank)
20              3.5              25                  7.5                  19                1.5
27              9                33                  10                   20                3.5
19              1.5              35                  11                   25                7.5
23              6                36                  12                   22                5

Page 41: Non-parametric statistics

Formula:

H = [12 / (N(N + 1))] × Σ(Tc² / nc) − 3(N + 1)

N is the total number of subjects;
Tc is the rank total for each group;
nc is the number of subjects in each group;
H is the test statistic.

Page 42: Non-parametric statistics

Step 2: Find "Tc", the total of the ranks for each group.

Tc1 (the total for the English group) = 20.
Tc2 (for the Serbo-Croat group) = 40.5.
Tc3 (for the Cantonese group) = 17.5.

Page 43: Non-parametric statistics

Step 3: Find H:

H = [12 / (N(N + 1))] × Σ(Tc² / nc) − 3(N + 1)

(N is the total number of subjects; Tc is the rank total for each group; nc is the number of subjects in each group.)

Page 44: Non-parametric statistics

H = [12 / (12 × 13)] × (20²/4 + 40.5²/4 + 17.5²/4) − 3 × 13
  = (12 / 156) × (100 + 410.0625 + 76.5625) − 39
  = (12 / 156) × 586.625 − 39
  = 45.125 − 39
  = 6.125 ≈ 6.12

Page 45: Non-parametric statistics

Step 4: In the KW test we use degrees of freedom: the number of groups minus one. d.f. = 3 − 1 = 2.

Step 5: H is statistically significant if it is larger than the critical value of Chi-Square for this many d.f. (Chi-Square is a test statistic distribution.)

Here, H is 6.12. This is larger than 5.99, the critical value of Chi-Square for 2 d.f. (SPSS gives us this – no need to look it up in a table, but we could.)

So: The three groups differ significantly: The language in which statistics is taught does make a difference to the lecturer's intelligibility.

NB: the test merely tells you that the three groups differ; inspect group medians to decide how they differ.
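The hand calculation in Steps 1–5 can be reproduced in Python as a cross-check (a sketch using the lecture's data, not part of the original slides):

```python
english   = [20, 27, 19, 23]
serbo     = [25, 33, 35, 36]
cantonese = [19, 20, 25, 22]
groups = [english, serbo, cantonese]

combined = sorted(english + serbo + cantonese)

def rank(v):
    """Average rank of v across all groups (ties share the mean rank)."""
    i, c = combined.index(v), combined.count(v)
    return (2 * i + 1 + c) / 2

N = len(combined)  # 12
totals = [sum(rank(v) for v in g) for g in groups]  # [20.0, 40.5, 17.5]

H = 12 * sum(t ** 2 / len(g) for t, g in zip(totals, groups)) / (N * (N + 1)) - 3 * (N + 1)
print(H)  # 6.125, i.e. H = 6.12 as on the slide (SPSS reports 6.190 after correcting for ties)
```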

Page 46: Non-parametric statistics

Using SPSS for the Kruskal-Wallis test:

Code the grouping variable: "1" for "English", "2" for "Serbo-Croat", "3" for "Cantonese".

Independent-measures test setup: one column gives the scores, another column identifies which group each score belongs to.

Page 47: Non-parametric statistics

Using SPSS for the Kruskal-Wallis test:

Analyze > Nonparametric tests > k independent samples

Page 48: Non-parametric statistics

Using SPSS for the Kruskal-Wallis test :

Identify the groups; choose the variable.

Page 49: Non-parametric statistics

Ranks:

language       N    Mean Rank
English        4     5.00
Serbo-Croat    4    10.13
Cantonese      4     4.38
Total         12

Test Statistics(a,b):      intelligibility
Chi-Square                 6.190     (the test statistic, H)
df                         2
Asymp. Sig.                .045     (significance)

a. Kruskal Wallis Test
b. Grouping Variable: language

Page 50: Non-parametric statistics

How do we find out how the three groups differed?

One way is to construct a box-whisker plot – and look at median values

What we really need is some contrasts and post-hoc tests like for ANOVA

Page 51: Non-parametric statistics

One solution is to run a series of Mann-Whitney tests, controlling for the build-up of Type I error.

We would need several MW tests, each with a 5% chance of a Type I error – when running them in series this chance builds up (language 1 vs. language 2, language 1 vs. 3, etc.).

We therefore apply a Bonferroni correction: use p < 0.05 divided by the number of MW tests conducted.

We can get away with comparing only against the control condition – a MW test for each of the three languages compared to the control group. We then see if any differences are significant.
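The correction itself is just a division of the significance threshold; a minimal sketch (the pairwise p-values below are made up purely for illustration):

```python
alpha = 0.05

# Hypothetical p-values from three pairwise Mann-Whitney tests (illustration only)
p_values = {
    "lang 1 vs lang 2": 0.021,
    "lang 1 vs lang 3": 0.804,
    "lang 2 vs lang 3": 0.014,
}

# Bonferroni correction: divide alpha by the number of tests run
corrected_alpha = alpha / len(p_values)  # 0.05 / 3 ~ 0.0167

for pair, p in p_values.items():
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{pair}: p = {p} -> {verdict} at the corrected alpha")
```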

Page 52: Non-parametric statistics
Page 53: Non-parametric statistics

Friedman's Test (Friedman´s ANOVA):

Similar to the Wilcoxon test, except that you can use it with three or more conditions (for one group).

Each subject does all of the experimental conditions.

Page 54: Non-parametric statistics

Friedman’s test, step-by-step:

Effects on worker mood of different types of music:

Five workers. Each is tested three times, once under each of the following conditions:

Condition 1: Silence. Condition 2: “Easy-listening” music. Condition 3: Marching-band music.

DV: mood rating ("0" = unhappy, "100" = euphoric). Ratings - so use a non-parametric test.

NB: To avoid practice and fatigue effects, order of presentation of conditions is varied/randomized across subjects.

Page 55: Non-parametric statistics

Worker   Silence (raw)   Silence (rank)   Easy (raw)   Easy (rank)   Band (raw)   Band (rank)
1        4               1                5            2             6            3
2        2               1                7            2.5           7            2.5
3        6               1.5              6            1.5           8            3
4        3               1                7            3             5            2
5        3               1                8            2             9            3

Step 1: Rank each subject's scores individually. Worker 1's scores are 4, 5, 6: these get ranks 1, 2, 3. Worker 4's scores are 3, 7, 5: these get ranks 1, 3, 2.

Page 56: Non-parametric statistics

Step 2: Find the rank total for each condition, using the ranks from all subjects within that condition.

Rank total for "Silence" condition: 1 + 1 + 1.5 + 1 + 1 = 5.5.
Rank total for "Easy Listening" condition = 11.
Rank total for "Marching Band" condition = 13.5.

Page 57: Non-parametric statistics

Step 3: Work out χr² (the test statistic for Friedman's ANOVA):

χr² = [12 / (N × C × (C + 1))] × ΣTc² − 3N(C + 1)

C is the number of conditions (here 3 types of music).
N is the number of subjects (here 5 workers).
ΣTc² is the sum of the squared rank totals for the conditions (whose rank totals are 5.5, 11 and 13.5 respectively for the three types of music).

Page 58: Non-parametric statistics

To get ΣTc²:

(1) Square each rank total: 5.5² = 30.25; 11² = 121; 13.5² = 182.25.

(2) Add these squared totals together: 30.25 + 121 + 182.25 = 333.5.

Page 59: Non-parametric statistics

In our example:

χr² = [12 / (5 × 3 × 4)] × 333.5 − (3 × 5 × 4)
    = 0.2 × 333.5 − 60
    = 66.7 − 60
    = 6.7

Step 4: Degrees of freedom = number of conditions minus one: d.f. = 3 − 1 = 2.

Page 60: Non-parametric statistics

Step 5: Assessing the statistical significance of χr² depends on the number of subjects and the number of groups.

(a) Fewer than 9 subjects: use a special table of critical values for χr².

(b) 9 or more subjects: use a Chi-Square table for critical values. Compare your obtained χr² value to the critical value of Chi-Square for your number of d.f. If your obtained χr² is bigger than the critical Chi-Square value, your conditions are significantly different.

The test only tells you that some kind of difference exists; look at the median score for each condition to see where the difference comes from.

Page 61: Non-parametric statistics

We have 5 subjects and 3 conditions, so use Friedman table for small sample sizes:

Obtained χr² is 6.7. For N = 5, a χr² value of 6.4 would occur by chance with a probability of 0.039. Our obtained value is bigger than 6.4, so p < 0.039.

Conclusion: The conditions are significantly different. Music does affect worker mood.
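The full Friedman hand calculation can be reproduced in Python as a cross-check (a sketch using the lecture's data, not part of the original slides):

```python
# Mood ratings from the slides: 5 workers x 3 conditions
scores = {
    "silence": [4, 2, 6, 3, 3],
    "easy":    [5, 7, 6, 7, 8],
    "band":    [6, 7, 8, 5, 9],
}
N, C = 5, 3

# Step 1: rank each worker's three scores (ties share the mean rank)
rank_totals = dict.fromkeys(scores, 0.0)
for w in range(N):
    row = sorted(scores[cond][w] for cond in scores)
    for cond in scores:
        v = scores[cond][w]
        i, c = row.index(v), row.count(v)
        rank_totals[cond] += (2 * i + 1 + c) / 2

# Steps 2-3: rank totals (5.5, 11, 13.5) -> sum of squares -> chi_r^2
sum_T2 = sum(t ** 2 for t in rank_totals.values())          # 333.5
chi_r2 = 12 * sum_T2 / (N * C * (C + 1)) - 3 * N * (C + 1)
print(chi_r2)  # ~6.7, matching the hand calculation
```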

Page 62: Non-parametric statistics

Using SPSS to perform Friedman’ s ANOVA

Repeated measures - each row is one participant's data.

Just like for Wilcoxon and other repeated measures tests

Page 63: Non-parametric statistics

Using SPSS to perform Friedman’ s ANOVA

Analyze > Nonparametric Tests > k related samples

Page 64: Non-parametric statistics

Using SPSS to perform Friedman’ s ANOVA

Analyze > Nonparametric Tests > k related samples

Note: here you select a Kolmogorov-Smirnov test for checking if your sample data are normally distributed

Page 65: Non-parametric statistics

Using SPSS to perform Friedman’ s ANOVA

Drag over variables to be included in the test

Page 66: Non-parametric statistics

Output from Friedman's ANOVA:

Descriptive Statistics:

            N    Mean     Std. Deviation   Minimum   Maximum
silence     5    3.6000   1.51658          2.00      6.00
easy        5    6.6000   1.14018          5.00      8.00
marching    5    7.0000   1.58114          5.00      9.00

Ranks:

            Mean Rank
silence     1.10
easy        2.20
marching    2.70

Test Statistics(a):
N             5
Chi-Square    7.444    (the test statistic χr²; NB: slightly different from the 6.7 worked out by hand)
df            2
Asymp. Sig.   .024     (significance)

a. Friedman Test

Page 67: Non-parametric statistics

Mann-Whitney: Two conditions, two groups, each participant one score

Wilcoxon: Two conditions, one group, each participant two scores (one per condition)

Kruskal-Wallis: 3+ conditions, different people in all conditions, each participant one score

Friedman´s ANOVA: 3+ conditions, one group, each participant 3+ scores

Page 68: Non-parametric statistics

Which nonparametric test?

1. Differences in fear ratings for 3, 5 and 7-year-olds in response to sinister noises from under their bed

2. Effects of cheese, brussels sprouts, wine and curry on the vividness of a person's dreams

3. Number of people spearing their eardrums after enforced listening to Britney Spears, Beyonce, Robbie Williams and Boyzone

4. Pedestrians rate the aggressiveness of owners of different types of car. Group A rate Micra owners; group B rate 4x4 owners; group C rate Subaru owners; group D rate Mondeo owners.

Consider: How many groups? How many levels of the IV/conditions?

Page 69: Non-parametric statistics

1. Differences in fear ratings for 3, 5 and 7-year-olds in response to sinister noises from under their bed [3 groups, each with one score, 3 conditions – Kruskal-Wallis].

2. Effects of cheese, brussels sprouts, wine and curry on the vividness of a person's dreams [one group, each with 4 scores, 4 conditions – Friedman's ANOVA].

3. Number of people spearing their eardrums after enforced listening to Britney Spears, Beyonce, Robbie Williams and Boyzone [one group, each 4 scores, 4 conditions – Friedman´s ANOVA]

4. Pedestrians rate the aggressiveness of owners of different types of car. Group A rate Micra owners; group B rate 4x4 owners; group C rate Subaru owners; group D rate Mondeo owners. [4 groups, each one score – Kruskal-Wallis]

Page 70: Non-parametric statistics
Page 71: Non-parametric statistics

What is a "population"?
Types of measure
Normal distribution
Standard Error
Effect size

What, again!?!?

Page 72: Non-parametric statistics

The term does not necessarily refer to a set of individuals or items (e.g. cars). Rather, it refers to a state of individuals or items.

Example: After a major earthquake in a city (in which no one died) the actual set of individuals remains the same. But the anxiety level, for example, may change. The anxiety level of the individuals before and after the quake defines them as two populations.

“Population” is an abstract term we use in statistics

Page 73: Non-parametric statistics

My brain is the size of a walnut!

Page 74: Non-parametric statistics

Scientists are interested in how variables change, and what causes the change

Anything that we can measure and which changes, is called a variable

”Why do people like the color red?” Variable: Preference of the color red

Variables can take many forms, i.e. numbers, abstract values, etc.

Page 75: Non-parametric statistics

Values are measurable. Measuring the size of variables is important for comparing results between studies/projects.

Different measures provide different quality of data:

- Nominal (categorical) data – non-parametric
- Ordinal data – non-parametric
- Interval data – parametric
- Ratio data – parametric

Page 76: Non-parametric statistics

Nominal data (categorical, frequency data)

When numbers are used as names

No relationship between the size of the number and what is being measured

Two things with same number are equivalent

Two things with different numbers are different

Page 77: Non-parametric statistics

E.g. Numbers on the shirts of soccer players

Nominal data are only used for frequencies: how many times "3" occurs in a sample, or how often player 3 scores compared to player 1.

Page 78: Non-parametric statistics

Ordinal data

Provides information about the ordering of the data

Does not tell us about the relative differences between values

Page 79: Non-parametric statistics

For example: The order of people who complete a race – from the winner to the last to cross the finish line.

Typical scale for questionnaire data

Page 80: Non-parametric statistics

Interval data: When measurements are made on a scale with equal intervals between points on the scale, but the scale has no true zero point.

Page 81: Non-parametric statistics

Examples: the Celsius temperature scale: 100 is water's boiling point; 0 is an arbitrary zero point (when water freezes), not a true absence of temperature.

Equal intervals represent equal amounts, but ratio statements are meaningless – e.g. 60 deg C is not twice as hot as 30 deg!

(The scales −4 … +4 and 1 … 9 mark off the same equal intervals; only the arbitrary zero point differs.)

Page 82: Non-parametric statistics

Ratio data

When measurements are made on a scale with equal intervals between points on the scale, and the scale has a true zero point.

e.g. height, weight, time, distance. Measurements of relevance include: reaction times, number of correct answers, error scores in usability tests.

Page 83: Non-parametric statistics

His brain has a standard error ...

Page 84: Non-parametric statistics

If we take repeated samples, each sample has a mean height, a standard deviation (s), and a shape/distribution.

Due to random fluctuations, each sample is different - from other samples and from the parent population.

These differences are predictable - we can use samples to make inferences about their parent populations.

[Figure: repeated samples, each with its own mean (X̄1, X̄2, X̄3, …) and standard deviation (s1, s2, s3, …)]

Page 85: Non-parametric statistics

[Figure: sample means of 25, 33, 30 and 29, scattered around a population mean of 30]

Page 86: Non-parametric statistics

Often we have more than one sample of a population

This permits the calculation of different sample means, whose values vary from sample to sample, giving us a sampling distribution.

[Figure: nine sample means (M = 8, 10, 9, 11, 12, 11, 9, 10, 10) drawn from a population with μ = 10, plotted as a frequency histogram – the sampling distribution – with Mean = 10 and SD = 1.22]

Page 87: Non-parametric statistics

The sampling distribution informs about the behavior of samples from the population

We can calculate SD for the sampling distribution

This is called the Standard Error of the Mean (SE)

Page 88: Non-parametric statistics

SE shows how much variation there is within a set of sample means

Therefore also how likely a specific sample mean is to be erroneous, as an estimate of the true population mean

[Figure: means of different samples scattered around the actual population mean]

Page 89: Non-parametric statistics

SE = SD of the sample means distribution

We can estimate SE via one sample

Estimated SE = the SD of the sample divided by the square root of the sample size (n):

SE = s / √n

Page 90: Non-parametric statistics

If the SE is small, our obtained sample mean is more likely to be similar to the true population mean than if the SE is large.

Increasing n reduces the size of the SE: a sample mean based on 100 scores is probably closer to the population mean than a sample mean based on 10 scores. Variation between samples decreases as sample size increases, because extreme scores become less important to the mean.

Page 91: Non-parametric statistics

Example: a sample with SD = 2 and n = 100:

SE = 2 / √100 = 2 / 10 = 0.20

Suppose n = 16 instead of 100:

SE = 2 / √16 = 2 / 4 = 0.50
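The SE estimate is a one-line formula; a Python sketch reproducing the worked numbers (a cross-check, not part of the original slides):

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: SE = s / sqrt(n)."""
    return s / math.sqrt(n)

print(standard_error(2, 100))  # 0.2
print(standard_error(2, 16))   # 0.5
```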

Page 92: Non-parametric statistics

Almost finished ...

Page 93: Non-parametric statistics
Page 94: Non-parametric statistics

The Normal curve is a mathematical abstraction which conveniently describes ("models") many frequency distributions of scores in real-life.

Page 95: Non-parametric statistics

length of pickled gherkins:

length of time before someone looks away in a staring contest:

Page 96: Non-parametric statistics

Francis Galton (1876) 'On the height and weight of boys aged 14, in town and country public schools.' Journal of the Anthropological Institute, 5, 174-180:

Page 97: Non-parametric statistics

Francis Galton (1876) 'On the height and weight of boys aged 14, in town and country public schools.' Journal of the Anthropological Institute, 5, 174-180:

[Figure: "Height of 14 year-old children" – frequency (%) plotted against height (51–70 inches), with separate curves for country and town schools]

Page 98: Non-parametric statistics

Properties of the Normal Distribution:

1. It is bell-shaped and asymptotic at the extremes.

[Figure: bell-shaped curve; y-axis = frequency, x-axis = size of score]

Page 99: Non-parametric statistics

2. It's symmetrical around the mean.

Page 100: Non-parametric statistics

3. The mean, median and mode all have same value.

Page 101: Non-parametric statistics

4. It can be specified completely, once mean and SD are known.

Page 102: Non-parametric statistics

5. The area under the curve is directly proportional to the relative frequency of observations.

Page 103: Non-parametric statistics

e.g. here, 50% of scores fall below the mean, as does 50% of the area under the curve.

Page 104: Non-parametric statistics

e.g. here, 85% of scores fall below score X, corresponding to 85% of the area under the curve.

Page 105: Non-parametric statistics

Relationship between the normal curve and the standard deviation (SD):

All normal curves share this property: The SD cuts off a constant proportion of the distribution of scores:

[Figure: normal curve with the x-axis marked in standard deviations from the mean (−3 to +3): 68% of scores fall within ±1 SD, 95% within ±2 SD, and 99.7% within ±3 SD]

Page 106: Non-parametric statistics

About 68% of scores will fall in the range of the mean plus and minus 1 s.d.;

95% in the range of the mean +/- 2 s.d.'s; 99.7% in the range of the mean +/- 3 s.d.'s.

e.g.: I.Q. is normally distributed, with a mean of 100 and s.d. of 15. Therefore:

68% of people have I.Q.'s between 85 and 115 (100 ± 15);
95% have I.Q.'s between 70 and 130 (100 ± 2 × 15);
99.7% have I.Q.'s between 55 and 145 (100 ± 3 × 15).

Page 107: Non-parametric statistics

[Figure: 68% of the distribution lies between 85 (mean − 1 s.d.) and 115 (mean + 1 s.d.)]

Page 108: Non-parametric statistics

Just by knowing the mean, SD, and that scores are normally distributed, we can tell a lot about a population.

If we encounter someone with a particular score, we can assess how they stand in relation to the rest of their group.

e.g.: someone with an I.Q. of 145 is quite unusual: This is 3 SD's above the mean. I.Q.'s of 3 SD's or above occur in only 0.15% of the population [ (100-99.7) / 2 ]. Note: divide with 2 as there are 2 sides to the normal distribution!
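The arithmetic in the I.Q. example can be sketched directly (a cross-check, not part of the slides; the function name is illustrative):

```python
mean, sd = 100, 15

def z_score(iq):
    """How many SDs an I.Q. score lies from the mean."""
    return (iq - mean) / sd

# By the 68/95/99.7 rule, the proportion beyond +3 SDs is (100 - 99.7) / 2,
# dividing by 2 because the normal distribution has two tails
pct_above_3sd = (100 - 99.7) / 2
print(z_score(145), pct_above_3sd)  # 3.0 and ~0.15 (percent of the population)
```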

Page 109: Non-parametric statistics

Conclusions: Many psychological/biological properties are normally distributed. This is very important for statistical inference (extrapolating from samples to populations).

Page 110: Non-parametric statistics

My scaly butt is of large size!

Page 111: Non-parametric statistics

Just because the test statistic is significant does not mean that the effect measured is important – it may account for only a very small part of the variance in the dataset, even though it is bigger than the random variance.

So we calculate effect sizes – a measure of the magnitude of an observed effect

A common effect size measure is Pearson's correlation coefficient, normally used to measure the strength of the relationship between two variables. We call this "r".

Page 112: Non-parametric statistics

"r" reflects the proportion of the total variance in the dataset that can be explained by the experiment (strictly, it is r², the squared coefficient, that gives this proportion).

It falls between 0 (experiment explains no variance at all; zero effect size) and 1 (experiment explains all the variance; perfect effect size).

Three conventional levels of r:

r = 0.1 – small effect (1% of total variance explained)
r = 0.3 – medium effect (9% of total variance explained)
r = 0.5 – large effect (25% of total variance explained)

Page 113: Non-parametric statistics

Note: Not a linear scale – an r-value of 0.2 is not twice that of 0.1.

r is standardized – we can compare across studies

Effect sizes are objective measures of the importance of a measured effect

Page 114: Non-parametric statistics

The bigger the effect size of something, the easier it is to find experimentally, i.e. if the IV manipulation has a major effect on the DV, the effect size is large.

r can be calculated from a lot of test statistics, notably z-scores

r = Z / √N (Z = the z-score; N = the sample size)
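Applying this to the Mann-Whitney output from earlier in the lecture (Z = −1.067, N = 17 participants) gives a quick worked example (a sketch, not part of the original slides):

```python
import math

def effect_size_r(z, n):
    """Effect size r = Z / sqrt(N), where N is the total number of observations."""
    return abs(z) / math.sqrt(n)

# Mann-Whitney example: Z = -1.067, N = 8 + 9 = 17 participants
print(round(effect_size_r(-1.067, 17), 2))  # 0.26 -> between a small and a medium effect
```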