CHEN10011 Tutorial Problems Tutorial Problems N.J. Goddard

Statistics University of Manchester

Page 1: Statistics University of Manchester

CHEN10011 Tutorial Problems

Tutorial Problems

N.J. Goddard

Page 2: Statistics University of Manchester

Exercise 1

• You will be given a set of observations
  – The age (in months) of an MSc class
• Calculate:
  – The mean, mode and median age
  – The standard deviation of the age
• Determine if these ages are normally distributed at the 95% confidence level
• Determine if there are any outliers in this set of ages at the 95% confidence level using one of the three methods in the notes

Page 3: Statistics University of Manchester

Exercise 1 Dataset

264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274

Page 4: Statistics University of Manchester

Exercise 1

Statistic             Value
Mean                  306.0909
Rounded mean          306
Mode                  264
Median                276
Standard deviation    66.80188
Rounded st. dev.      67
MAD                   24
σ (MAD/0.6745)        35.58191
Rounded σ             36

Age    Sorted age    Absolute dev.    Sorted abs. dev.
264    258           30               12
276    263           25               12
300    264           24               12
258    264           24               14
263    274           14               24
348    276           12               24
276    276           12               25
474    300           12               30
264    348           60               60
370    370           82               82
274    474           186              186

Note: 276 also occurs twice, so 264 and 276 are both modes. The absolute deviations as tabulated are distances from 288, not from the median of 276.
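The summary statistics above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Ages in months, as listed in the Exercise 1 dataset
ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # middle value of the sorted data
modes = statistics.multimode(ages)    # every value tied for most frequent
sd_age = statistics.stdev(ages)       # sample standard deviation (n - 1 divisor)

print(f"mean = {mean_age:.4f}")
print(f"median = {median_age}")
print(f"modes = {modes}")
print(f"standard deviation = {sd_age:.5f}")
```

`multimode` is used rather than `mode` so that tied values are all reported.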

Page 5: Statistics University of Manchester

Exercise 1

• Are these ages normally distributed?
• Are there any reasons to suppose that your ages are normally distributed?

Page 6: Statistics University of Manchester

Exercise 1: Cumulative Frequency Plot

[Figure: observed and expected cumulative frequency (y-axis, 0.0 to 1.2) plotted against age in months (x-axis, 200 to 550).]

Page 7: Statistics University of Manchester

Exercise 1

• Null hypothesis is that your ages are normally distributed at the 95% confidence level

Page 8: Statistics University of Manchester

Exercise 1

Age    Frequency    Cumulative freq.    SNV             Frac. cum. freq.    Expected cum. freq.    |Expected − Actual|
258    1            1                   -0.719903549    0.083333333         0.235792191            0.152458858
263    1            2                   -0.645055354    0.166666667         0.259445657            0.092778991
264    2            4                   -0.630085715    0.333333333         0.264319253            0.069014081
274    1            5                   -0.480389325    0.416666667         0.315475292            0.101191375
276    2            7                   -0.450450047    0.583333333         0.326192983            0.257140351
300    1            8                   -0.091178711    0.666666667         0.463675296            0.202991371
348    1            9                   0.627363962     0.75                0.734789658            0.015210342
370    1            10                  0.956696021     0.833333333         0.830639646            0.002693687
474    1            11                  2.513538479     0.916666667         0.994023663            0.077356996

Max    0.257140351

Page 9: Statistics University of Manchester

Lilliefors Critical Values

n     α=0.20    α=0.15    α=0.10    α=0.05    α=0.01
4     0.3027    0.3216    0.3456    0.3754    0.4129
5     0.2893    0.3027    0.3188    0.3427    0.3959
6     0.2694    0.2816    0.2982    0.3245    0.3728
7     0.2521    0.2641    0.2802    0.3041    0.3504
8     0.2387    0.2502    0.2649    0.2875    0.3331
9     0.2273    0.2382    0.2522    0.2744    0.3162
10    0.2171    0.2273    0.2410    0.2616    0.3037
11    0.2080    0.2179    0.2306    0.2506    0.2905
12    0.2004    0.2101    0.2228    0.2426    0.2812
13    0.1932    0.2025    0.2147    0.2337    0.2714
14    0.1869    0.1959    0.2077    0.2257    0.2627
15    0.1811    0.1899    0.2016    0.2196    0.2545
16    0.1758    0.1843    0.1956    0.2128    0.2477
17    0.1711    0.1794    0.1902    0.2071    0.2408
18    0.1666    0.1747    0.1852    0.2018    0.2345
19    0.1624    0.1700    0.1803    0.1965    0.2285
20    0.1589    0.1666    0.1764    0.1920    0.2226

Page 10: Statistics University of Manchester

Exercise 1

• The maximum difference (0.2571) exceeds the Lilliefors’ critical value for 11 observations at the 95% confidence level (0.2506), so we can reject the null hypothesis

• We accept the alternative hypothesis that your ages are not normally distributed

• We would expect this result as the ages are actually drawn from three populations (3rd year UG, 4th year UG and PG Masters level)
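The normality check can be reproduced in a few lines. The sketch below follows the tabulated calculation, which takes the fractional cumulative frequency as the cumulative count divided by n + 1 and compares it with the normal cumulative probability at each distinct age. That divisor is taken from the table, not from a library definition of the Lilliefors test, so other implementations may give a slightly different statistic. The helper `normal_cdf` and the constant name are illustrative.

```python
import math
import statistics
from collections import Counter

ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
n = len(ages)
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)  # sample standard deviation


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


counts = Counter(ages)
cumulative = 0
d_max = 0.0
for age in sorted(counts):
    cumulative += counts[age]
    observed = cumulative / (n + 1)          # fractional cumulative frequency, as tabulated
    snv = (age - mean_age) / sd_age          # standard normal variate
    expected = normal_cdf(snv)               # expected cumulative frequency
    d_max = max(d_max, abs(expected - observed))

CRITICAL_95_N11 = 0.2506  # Lilliefors critical value for n = 11, alpha = 0.05 (from the table)
reject_normality = d_max > CRITICAL_95_N11

print(f"maximum difference = {d_max:.4f}, reject normality: {reject_normality}")
```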

Page 11: Statistics University of Manchester

Criteria for the Rejection of Data

• Grubbs’ test
• ISO recommended method
• The suspect value is that furthest away from the mean
• Null hypothesis is that all measurements are from the same population
• We calculate:

$$G = \frac{\left|\text{suspect value} - \bar{x}\right|}{S}$$

Page 12: Statistics University of Manchester

Criteria for the Rejection of Data

n     Gcrit    n     Gcrit    n     Gcrit    n      Gcrit
3     1.15     15    2.55     27    2.86     39     3.03
4     1.48     16    2.59     28    2.88     40     3.04
5     1.71     17    2.62     29    2.89     50     3.13
6     1.89     18    2.65     30    2.91     60     3.20
7     2.02     19    2.68     31    2.92     70     3.26
8     2.13     20    2.71     32    2.94     80     3.31
9     2.21     21    2.73     33    2.95     90     3.35
10    2.29     22    2.76     34    2.97     100    3.38
11    2.34     23    2.78     35    2.98     110    3.42
12    2.41     24    2.80     36    2.99     120    3.44
13    2.46     25    2.82     37    3.00     130    3.47
14    2.51     26    2.84     38    3.01     140    3.49

Critical values for Grubbs’ test

Page 13: Statistics University of Manchester

Criteria for the Rejection of Data

• Our null hypothesis is that all of our data are described by a single distribution
• The critical value for 11 observations at the 95% confidence level is 2.34
• We have one value that exceeds this critical value (age 474 months, standard normal value 2.514)
• We can reject the null hypothesis and say this value is from a different distribution
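As a check on the arithmetic, the test statistic for the value furthest from the mean can be computed directly. This is a minimal sketch; the critical value 2.34 is the one quoted for 11 observations, and the variable names are illustrative.

```python
import statistics

ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)  # sample standard deviation

# The suspect value is the observation furthest from the mean
suspect = max(ages, key=lambda x: abs(x - mean_age))

# G = |suspect value - mean| / S
g = abs(suspect - mean_age) / sd_age

G_CRIT_N11 = 2.34  # critical value for n = 11 at the 95% confidence level (from the table)
is_outlier = g > G_CRIT_N11

print(f"suspect = {suspect}, G = {g:.3f}, outlier: {is_outlier}")
```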

Page 14: Statistics University of Manchester

Criteria for the Rejection of Data

• Chauvenet’s criterion
• Null hypothesis is that all measurements are from the same population
• Remove the presumed outlier
• Recalculate mean and standard deviation
• Calculate the confidence limits at the required confidence level
• If the observation lies outside the new confidence limits, it can be rejected

Page 15: Statistics University of Manchester

Criteria for the Rejection of Data

• Chauvenet’s criterion

Ages with the presumed outlier (474) removed: 264, 276, 300, 258, 263, 348, 276, 264, 370, 274

Statistic              Value
Mean                   289.3
Standard deviation     38.8903
t (95%, 2 tailed)      2.262
E                      27.81851
Upper conf. limit      317.1185
Lower conf. limit      261.4815

$$\bar{x} \pm \frac{t_{P,\,n-1}\,S}{\sqrt{n}}$$

Page 16: Statistics University of Manchester

Criteria for the Rejection of Data

• Chauvenet’s criterion
• The presumed outlier (474 months) lies well outside the upper confidence limit (317.12 months)
• We can therefore reject the null hypothesis
• The observation is an outlier at the 95% confidence level
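The procedure described on these slides (remove the suspect value, recompute the mean and standard deviation, then build confidence limits) can be sketched as follows. The t value of 2.262 for 9 degrees of freedom is copied from the slide instead of being computed, and the variable names are illustrative.

```python
import math
import statistics

ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
suspect = 474

# Remove the presumed outlier and recalculate mean and standard deviation
remaining = list(ages)
remaining.remove(suspect)
n = len(remaining)
mean_r = statistics.mean(remaining)
sd_r = statistics.stdev(remaining)  # sample standard deviation

T_95_DF9 = 2.262  # two-tailed t at 95% confidence, n - 1 = 9 degrees of freedom (from the slide)

# Confidence limits: mean +/- t * S / sqrt(n)
half_width = T_95_DF9 * sd_r / math.sqrt(n)
lower = mean_r - half_width
upper = mean_r + half_width

is_outlier = not (lower <= suspect <= upper)

print(f"limits = ({lower:.2f}, {upper:.2f}), outlier: {is_outlier}")
```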

Page 17: Statistics University of Manchester

Criteria for the Rejection of Data

Rank Difference Ratio (Q statistic)

n     α = 0.10    α = 0.05    α = 0.01
3     0.886       0.941       0.988
4     0.679       0.765       0.889
5     0.557       0.642       0.780
6     0.482       0.560       0.698
7     0.434       0.507       0.637
8     0.650       0.710       0.829
9     0.594       0.657       0.776
10    0.551       0.612       0.726
11    0.517       0.576       0.679
12    0.490       0.546       0.642
13    0.467       0.521       0.615
14    0.448       0.501       0.593
15    0.472       0.525       0.616
16    0.454       0.507       0.595
17    0.438       0.490       0.577
18    0.424       0.475       0.561
19    0.412       0.462       0.547
20    0.401       0.450       0.535

$$Q = \frac{x_2 - x_1}{x_n - x_1} \quad\text{or}\quad \frac{x_n - x_{n-1}}{x_n - x_1}$$

$$Q = \frac{x_3 - x_1}{x_{n-1} - x_1} \quad\text{or}\quad \frac{x_n - x_{n-2}}{x_n - x_2}$$

$$Q = \frac{x_3 - x_1}{x_{n-2} - x_1} \quad\text{or}\quad \frac{x_n - x_{n-2}}{x_n - x_3}$$

Page 18: Statistics University of Manchester

Criteria for the Rejection of Data

• Null hypothesis is that all measurements are from the same population
• For 11 observations and a high outlier we use:

$$Q = \frac{x_n - x_{n-2}}{x_n - x_2} = \frac{474 - 348}{474 - 263} = \frac{126}{211} = 0.597$$

Page 19: Statistics University of Manchester

Criteria for the Rejection of Data

• The critical Q value for 11 observations and 95% confidence level is 0.576
• Our calculated Q value (0.597) is above the critical value
• We therefore reject the null hypothesis and conclude that 474 months is an outlier
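The Q calculation for a high outlier in this data set can be sketched as below. It uses the ratio the slides apply to 11 observations, (x_n − x_{n−2}) / (x_n − x_2) on the sorted data, with the critical value 0.576 taken from the table; the variable names are illustrative.

```python
ages = [264, 276, 300, 258, 263, 348, 276, 474, 264, 370, 274]
x = sorted(ages)  # x[0] is x_1, x[-1] is x_n

# High-outlier ratio used for 11 observations: (x_n - x_{n-2}) / (x_n - x_2)
numerator = x[-1] - x[-3]    # 474 - 348
denominator = x[-1] - x[1]   # 474 - 263
q = numerator / denominator

Q_CRIT_N11_95 = 0.576  # critical Q for n = 11 at the 95% confidence level (from the table)
is_outlier = q > Q_CRIT_N11_95

print(f"Q = {q:.3f}, outlier: {is_outlier}")
```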