Click here to load reader
Upload
john-ng
View
215
Download
0
Embed Size (px)
Citation preview
8/10/2019 S2_MEI
http://slidepdf.com/reader/full/s2mei 1/7
MEI Structured Mathematics
Module Summary Sheets
Statistics 2(Version B: reference to new book)
Topic 1: The Poisson Distribution
Topic 2: The Normal Distribution
Topic 3: Samples and Hypothesis Testing1. Test for population mean of a Normal distribution
2. Contingency Tables and the Chi-squared Test
Topic 4: Bivariate Data
Purchasers have the licence to make multiple copies for usewithin a single establishment
MEI November, 2005
MEI, Oak House, 9 Epsom Centre, White Horse Business Park, Trowbridge, Wiltshire. BA14 0XG.
Company No. 3265490 England and Wales Registered with the Charity Commission, number 1058911
Tel: 01225 776776. Fax: 01225 775755.
MEI Mathematics in Education and Industry
8/10/2019 S2_MEI
http://slidepdf.com/reader/full/s2mei 2/7
8/10/2019 S2_MEI
http://slidepdf.com/reader/full/s2mei 3/7
E.g. The distribution of masses of adult males
may be modelled by a Normal distribution with
mean 75 kg and standard deviation 8 kg. Findthe probability that a man chosen at random
will have mass between 70 kg and 90 kg.
E.g. Find the probability that when a die isthrown 30 times there are at least 10 sixes.
Using the binomial distribution requires
P(30 sixes) + P(29 sixes) + ..........+ P(10 sixes).
However, using N(!", !"#) where ! = 30 and
" =1/6, gives N(5, 4.167).
P( $ > 9.5) = P( & > ' 1)
where ' 1 = (9.5 5)/2.04 = 2.205
= 1 0.9863 = 0.0137
N.B. a continuity correction is applied because
the original distribution (binomial) is being
approximated by a continuous distribution
(Normal).
For N(0,1), P( & < ' 1) can be found from tables
(Students Handbook, page 44)
E.g. P( & < 0.7) = 0.7580
P( & > 0.7) = 1 0.7580 = 0.2420
For N(2,9) [ = 2, = 3]
P( $ < 5) = P( & < ' 1) where ' 1 = (5 2)/3 = 1
= 0.8413
Modelling Many distributions in the real world, such as adult
heights or intelligence quotients, can be modelled well bya Normal distribution with appropriate mean and
variance.
Given the mean and standard deviation the Normal
distribution N( ) may often be used.
When the underlying distribution is discrete then the
Normal distribution may often be used, but in this case acontinuity correction must be applied. This requires us to
take the mid-point between successive possible values
when working with continuous distribution tables.
E.g. P( $ ) > 30 means P( $ > 30.5) if $ can take only
integer values.
The Normal distribution, N( ), is a continuous,
symmetric distribution with mean and standard
deviation The standard Normal distribution N(0,1) has
mean 0 and standard deviation 1.
P(X< (1) is represented by the area under the curve
below (1.
(It is a special case of a continuous probability densityfunction which is a topic in Statistics 3.)
The area under the standard Normal distribution curve
can be found from tables.
To find the area under any other Normal distributioncurve, the values need to be standardised by the formula
The Normal approximation to the binomial
distribution.
This is a valid process provided (i) ! is large,
(ii) " is not too close to 0 or 1.
Mean = !", Variance = !"#.
The approximation will be N(!", !"#).
A continuity correction must be applied because we are
approximating a discrete distribution by a continuousdistribution.
Statistics 2
Version B: page 3
Competence statements N1, N2, N3, N5
© MEI
Exercise 2A
Q. 4
Exercise 2B
Q. 1
)(*+",- ./0
1*2- 34
Exercise 2B
Q. 4
References:
Chapter 2
Pages 32-44
References:
Chapter 2
Pages 49, 50
)(*+",- ./3
1*2- 56
78++*9: 7. ;<"=> . The Normal Distribution
( '
References:
Chapter 2
Pages 50-52
1 2
1
2
We require P(70< 90) P( )
70 75
where 0.6258
90 75and 1.875
8
P(70 90)
P( 1.875) P 0.625
P( 1.875) 1 P( 0.625)
0.9696 (1 0.7340) 0.7036
$ ' & '
'
'
$
& &
& &
The Normal approximation to the Poisson
distribution.
This is a valid process provided is sufficiently large for
the distribution to be reasonably symmetric. A goodguideline is if is at least 10.
For a Poisson distribution, mean = variance =
The approximation will be N( .
As with the Binomial Distribution, a continuity
correction must be applied because we are approximat-
ing a discrete distribution by a continuous distribution.
References:
Chapter 2
Pages 52-54
Exercise 2BQ. 11
Exercise 2C
Q. 1, 6, 7
E.g. A large firm has 50 telephone lines. On
average, 40 lines are in use at once and the dis-
tribution may be modelled by Poisson(40). Find
the probability of there not being enough lines.
1
1
The distribution is Poisson(40).
Approximate by N(40,40).
Then we require P( 50) P( )
50.5 40where 1.66
40
P( 50) 1 0.9515 0.0485
$ & '
'
$
8/10/2019 S2_MEI
http://slidepdf.com/reader/full/s2mei 4/7
!"# %&'()&*+(&,- ,. '/012# 0#/-' "# $ %&%'($)*&+ ,$- ./ ,&0/((/0 .- $ 1&2,$( 0*3)2*.'4
)*&+ $+0 3$,%(/3 &# 3*5/ n $2/ )$6/+ #2&, )7/
%&%'($)*&+8 )7/+ )7/ 0*3)2*.')*&+ &# ,/$+3 &# )7/3/
3$,%(/3 *3 $(3& 1&2,$(9
:)$)*3)*;3 <
=/23*&+ >? %$@/ A
B&,%/)/+;/ 3)$)/,/+)3 1C
D EF"
FG/2;*3/ HI
J9 KL*M8L***M
Summary S2 Topic 3 Samples and Hypothesis Testing: Estimating the population mean of a Normal distribution
N/#/2/+;/3?
B7$%)/2 H
O$@/3 CP4QK
<
<
"# )7/ %$2/+) %&%'($)*&+ *3 1 8 )7/+ )7/
3$,%(*+@ 0*3)2*.')*&+ &# ,/$+3 *3 1 8 9n
FG$,%(/ H9KO$@/ QR
341,("#'&' (#'( .,) ("# 0#/- +'&-5 ("# 6,)0/2
%&'()&*+(&,-
S/3)3 &+ )7/ ,/$+ '3*+@ $ 3*+@(/ 3$,%(/9
TR *3 U R V7/2/ R *3 3&,/ 3%/;*#*/0 W$('/9
TK ,$- ./ &+/ )$*(/0? X R &2 Y R
&2 )V& )$*(/0? R9
"+ &)7/2 V&2038 @*W/+ )7/ ,/$+ &# )7/ 3$,%(/ )$6/+ V/
$36 )7/ Z'/3)*&+8 [B&'(0 )7/ ,/$+ &# )7/ %$2/+) %&%'($)*&+ ./ V7$) V/ )7*+6 *) *3\]
I()/2+$)*W/(-8 *# )7/ W$('/ &# )7/ ,/$+ &# )7/ 3$,%(/(*/3 *+3*0/ )7/ $;;/%)$+;/ 2/@*&+ )7/+ V/ V&'(0 $;;/%)
TR8 .') *# *) ($- *+ )7/ ;2*)*;$( 2/@*&+ )7/+ V/ V&'(02/^/;) TR *+ #$W&'2 &# TK9
I()/2+$)*W/(-8 ;$(;'($)/ )7/ %2&.$.*(*)- )7$) )7/ W$('/ *3
@2/$)/2 )7$+ )7/ W$('/ #&'+0 $+0 3// *# *) (/33 )7$+ )7/3*@+*#*;$+;/ (/W/( ./ '3/09
N/#/2/+;/3?
B7$%)/2 H
O$@/3 QK4QH
7-,8- /-% #'(&0/(#% '(/-%/)% %#9&/(&,- S7/ 7-%&)7/3*3 )/3) 0/3;2*./0 $.&W/ 2/Z'*2/3 )7/ W$('/
&# )7/ 3)$+0$20 0/W*$)*&+ &# )7/ %$2/+) %&%'($)*&+9
"+ 2/$(*)- )7/ 3)$+0$20 0/W*$)*&+ &# )7/ %$2/+)
%&%'($)*&+ V*(( '3'$((- +&) ./ 6+&V+ $+0 V*(( 7$W/ )&
./ /3)*,$)/0 #2&, )7/ 3$,%(/ 0$)$9"# )7/ 3$,%(/ 3*5/ *3 3'##*;*/+)(- ($2@/8 )7/ 3909 &# )7/
3$,%(/ ,$- ./ '3/0 $3 )7/ 3909 &# )7/ %$2/+) %&%'($4
)*&+9
I @&&0 @'*0/(*+/ *3 )& 2/Z'*2/ n _R9
N/#/2/+;/3?
B7$%)/2 H
O$@/3 QH4QA
FG$,%(/ H9HO$@/ QA
FG/2;*3/ HI
J9 C
F9@9 I %&%'($)*&+ 7$3 W$2*$+;/ KC9 ") *3 2/Z'*2/0 )&
)/3) $) )7/ R9_` (/W/( &# 3*@+*#*;$+;/ V7/)7/2 )7/,/$+ &# )7/ %&%'($)*&+ ;&'(0 ./ KR &2 V7/)7/2 *)
*3 (/33 )7$+ )7*39 I 2$+0&, 3$,%(/ 3*5/ <_ 7$3 $,/$+ &# P9C9
I()/2+$)*W/(-8 *# )7/ ,/$+ *3 KR )7/+ )7/3$,%(*+@ 0*3)2*.')*&+ &# ,/$+3 *3 1LKR8 R9CAM
<
<
:'%%&3/ )7/ %$2/+) %&%'($)*&+ *3 1 8 8 )7/+ )7/
3$,%(*+@ 0*3)2*.')*&+ &# ,/$+3 *3 1 8 9S7/
;2*)*;$( W$('/3 $2/ )7/2/#&2/ V7/2/ )7/
W$('/ &# 0/%/+03 &+ )7/ (/W/( &# 3*@+*#*;$+;/
$+0 V7
n
k n
k
/)7/2 *) *3 $ &+/ &2 )V&4)$*(/0 )/3)9
a/ )7/2/#&2/ ;$(;'($)/ )7/ W$('/ $+0
;&,%$2/ *) )& )7/ W$('/ #&'+0 *+ )$.(/39
x
z n
:
R
K
R
T ? KR
T ? KR
<9_P L#&2 R9_ ̀(/W/(8 K4)$*(/0 )/3)M
AB2*)*;$( W$('/ *3 KR <9_P KR <9_P Q9bHC
_
:*+;/ P9C Y Q9bHC V/ $;;/%) T c)7/2/ *3 +& /W*0/+;/
$) )7/ R9_ ̀(/W/( &# 3*@+*#*;$+;/ )7$) )7/ ,/$+ *3
(
k
n
/33 )7$+ KR9
F9@9 ") *3 )7&'@7) )7$) )7/ %$2/+) %&%'($)*&+ *3 1&24
,$((- 0*3)2*.')/0 V*)7 ,/$+ <R9I 2$+0&, 3$,%(/ &# _R 0$)$ *)/,3 7$3 $ 3$,%(/,/$+ &# <A9< $+0 3909 P9H9
"3 )7/2/ $+- /W*0/+;/ $) )7/ R9K` 3*@+*#*;$+;/
(/W/( )7$) )7/ ,/$+ &# )7/ %&%'($)*&+ *3 +&) <R\
R
S7/+ OL P9CM K K
:*+;/ R9RARK R9RR_ V/ $;;/%) T
X
R
K
T ? <R
T ? <R
L1&)/ )7$) $()7&'@7 )7/ ,/$+ &# )7/ 3$,%(/
*3 @2/$)/2 )7$+ )7/ %2&%&3/0 ,/$+8 V/ 0& +&)
7$W/ Y<R ./;$'3/ &# )7/ V&20*+@ &# )7/
Z'/3)*&+9M
<A9< <R H9_QPP9H
_R
B2*)*;$( W$('/ #2&,
x z
n
:
R K
)$.(/3 #&2 )V&4)$*(/08
R9K` 3*@+*#*;$+;/ (/W/( *3 H9<Q
:*+;/ H9_QPYH9<Q V/ 2/^/;) T *+ #$W&'2 &# T 9
S7/2/ *3 /W*0/+;/ )7$) )7/ ,/$+ &# )7/
%&%'($)*&+ *3 +&) <R9
F9@9 "# )7/ %$2/+) %&%'($)*&+ *3 1LKR8 KCM $+0 $
3$,%(/ &# 3*5/ <_ 7$3 ,/$+ P9C8 )7/+ )7*3 W$('/
;&,/3 #2&, )7/ 3$,%(*+@ 0*3)2*.')*&+ &# ,/$+3V7*;7 *3 1LKR8 R9CAM9
8/10/2019 S2_MEI
http://slidepdf.com/reader/full/s2mei 5/7
F9@9 "# 02*W*+@ )& B&((/@/ $+0 )7/ 0*3)$+;/ (*W/0
$2/ +&) $33&;*$)/0 /W/+)3 )7/+ *# &+/ 3)'0/+) *3
;7&3/+ $) 2$+0&, )7/ /3)*,$)/0 %2&.$.*(*)*/3 $2/
"+ $ 3*,*($2 V$- )7/ /+)2*/3 *+ )7/ &)7/2 )72//
.&G/3 $2/ ;$(;'($)/0 )& @*W/ )7/ #&((&V*+@?
a/ )/3) )7/ 7-%&)7/3/3?
TR? )7/ )V& /W/+)3 $2/ +&) $33&;*$)/0TK? S7/ )V& /W/+)3 $2/ $33&;*$)/09
"# )7/ )/3) *3 $) )7/ _` (/W/( )7/+ )7/ )$.(/3 &+
%$@/ A_ &# )7/ EF" :)'0/+)3d T$+0.&&6 @*W/3 )7/;2*)*;$( W$('/ &# H9PAK Lv U KM9
:*+;/ A9K<KA Y H9PAK V/ 2/^/;) )7/ +'(( 7-%&)7/4
3*38 TR8 $+0 ;&+;('0/ )7$) )7/2/ *3 /W*0/+;/ )7$) )7/
)V& /W/+)3 $2/ $33&;*$)/09
"# )7/ )/3) V/2/ $) )7/ K` 3*@+*#*;$+;/ (/W/( )7/+V/ V&'(0 ;&+;('0/ )7$) )7/2/ V$3 +&) /+&'@7 /W*4
0/+;/ )& 2/^/;) )7/ +'(( 7-%&)7/3*39
F9@9 $ @2&'% &# _R 3)'0/+)3 V$3 3/(/;)/0 $) 2$+0&,#2&, )7/ V7&(/ %&%'($)*&+ &# 3)'0/+)3 $) $
B&((/@/9 F$;7 V$3 $36/0 V7/)7/2 )7/- 02&W/ )&
B&((/@/ &2 +&) $+0 V7/)7/2 )7/- (*W/0 ,&2/ )7$+
&2 (/33 )7$+ KR 6, #2&, )7/ B&((/@/9 S7/ 2/3'()3
$2/ 37&V+ *+ )7*3 )$.(/9
!"# ; <(/(&'(&= >?"&@'A+/)#% '(/(&'(&=B
S7*3 3)$)*3)*; ,/$3'2/3 7&V #$2 $%$2) $2/ )7/ 3/) &#
&.3/2W/0 $+0 /G%/;)/0 #2/Z'/+;*/39
?,-(&-5#-=4 !/*2#'
:'%%&3/ )7/ /(/,/+)3 &# $ %&%'($)*&+ 7$W/ < 3/)3 &# 0*34
)*+;) ;7$2$;)/2*3)*;3 e X, Y f8 /$;7 3/) ;&+)$*+*+@ $ #*+*)/
+',./2 &# 0*3;2/)/ ;7$2$;)/2*3)*;3 X U e xK8 x<8g98 xmf $+0
Y U e yK8 y<8gg98 ynf )7/+ /$;7 /(/,/+) &# )7/ %&%'($)*&+
V*(( 7$W/ $ %$*2 &# ;7$2$;)/2*3)*;3 L xi8 y jM9
S7/ #2/Z'/+;- &# )7/3/ m n %$*23 L xi8 y jM ;$+ ./ )$.'($)/0
*+)& $+ m n =,-(&-5#-=4 (/*2#9
S7/ 0/)5&-/2 (,(/2' $2/ )7/ 3', &# )7/ 2&V3 $+0 )7/ 3',
&# )7/ ;&(',+3 $+0 *) *3 '3'$( )& $00 $ 2&V $+0 $ ;&(',+
#&2 )7/3/9
S7/ 2/Z'*2/,/+) *3 )& 0/)/2,*+/ )7/ /G)/+) )& V7*;7 )7/W$2*$.(/3 $2/ 2/($)/09
"# )7/- $2/ +&) 2/($)/0 .') *+0/%/+0/+)8 )7/+ )7/&2/)*;$(
%2&.$.*(*)*/3 ;$+ ./ /3)*,$)/0 #2&, )7/ 3$,%(/ 0$)$9h&' +&V 7$W/ )V& )$.(/38 &+/ ;&+)$*+*+@ )7/ $;)'$(
L&.3/2W/0M #2/Z'/+;*/3 $+0 )7/ &)7/2 ;&+)$*+*+@ )7/ /3)*4
,$)/0 /G%/;)/0 #2/Z'/+;*/3 .$3/0 &+ )7/ $33',%)*&+ )7$)
)7/ W$2*$.(/3 $2/ *+0/%/+0/+)9
!"# "41,("#'&' (#'(
TR? S7/ W$2*$.(/3 $2/ +&) $33&;*$)/09
TK? S7/ W$2*$.(/3 $2/ $33&;*$)/09
:)$)*3)*;3 <
=/23*&+ >? %$@/ _
B&,%/)/+;/ 3)$)/,/+)3 TK8 T<
D EF"
FG/2;*3/ H>J9 A8 _
N/#/2/+;/3?
B7$%)/2 H
O$@/3 PK4P_
N/#/2/+;/3?
B7$%)/2
O$@/3 PQ4b<
Summary S2 Topic 3 Samples and Hypothesis Testing2: Contingency Tables and the Chi-squared Test
yK y< yn
xK f K8K f K8< f K8n
x< f <8K f <8< f 28n
xm f m8K f m8< f m,n
1/$2/2
)7$+ KR 6,
i'2)7/2 )7$+ KR
6,
j2*W/3 KK KQ ;C
j&/3 +&)
02*W/
K_ Q ;;
;D ;E FG
<P <AOL02*W/3M 8OL(*W/3 #'2)7/2 )7$+ KR 6,M
_R _R
<P <A$+0 OL02*W/3 $+0 (*W/3 #'2)7/2 )7$+ KR 6,M U_R _R
R9<CPP
:& &') &# _R %/&%(/ V/ V&'(0 /G%/;) _R R9<CPP
U KH9AA
1/$2/2
)7$+ KR 6,
i'2)7/2 )7$+ KR
6,
j2*W/3 KA9_C KH9AA ;C
j&/3 +&)02*W/ KK9AA KR9_C ;;
;D ;E FG
<
<
<
&.3/2W/0 #2/Z'/+;- 4 /G%/;)/0 #2/Z'/+;-
/G%/;)/0 #2/Z'/+;-
o e
e
X
f f
f
H#5)##' ,. .)##%,0
S7/ 0*3)2*.')*&+ 0/%/+03 &+ )7/ +',./2 &# #2// W$2*$.(/3
)7/2/ $2/8 ;$((/0 )7/ 0/@2//3 &# #2//0&,8 9
S7*3 *3 )7/ +',./2 &# ;/((3 (/33 )7/ +',./2 &# 2/3)2*;)*&+3
%($;/0 &+ )7/ 0$)$9
i&2 $ << )$.(/ 3';7 $3 )7/ /G$,%(/ @*W/+ )7/ +',./2 &#;/((3 )& ./ #*((/0 *3 A8 .') )7/ &W/2$(( )&)$( *3 _R V7*;7 *3 $
2/3)2*;)*&+ $+0 )7/ %2&%&2)*&+3 #&2 /$;7 W$2*$.(/ V/2/ $(3&
/3)*,$)/0 #2&, )7/ 0$)$8 @*W*+@ )V& #'2)7/2 2/3)2*;)*&+39
:& )7/ +',./2 &# 0/@2//3 &# #2//0&, *+ )7/ /G$,%(/ *3 K9
"+ @/+/2$( )7/ +',./2 &# 0/@2//3 &# #2//0&, #&2 $+ m n
)$.(/ *3 Lm KMLn KM9
<
<
< < < <KK KA9_C KQ KH9AA K_ KK9AA Q KR9_C
KA9_C KH9AA KK9AA KR9_C
R9PQRA R9bAHR K9KRQP K9<RR<A9K<KA
o e
e
f f X
f
N/#/2/+;/3?
B7$%)/2 H
O$@/ P_
FG/2;*3/ HB
J9 K8 P
8/10/2019 S2_MEI
http://slidepdf.com/reader/full/s2mei 6/7
For the data above:
For the data above:
r can be found directly with an appropriatecalculator.
Example 1
The length ( x cm) and width ( y cm) of leaves
of a tree were measured and recorded as
follows:
x 2.1 2.3 2.7 3.0 3.4 3.9
y 1.1 1.3 1.4 1.6 1.9 1.7
The scatter graph is drawn as shown.
Pearson’s Product Moment Correlation Coefficient provides a standardised measure of covariance.
The pmcc lies between -1 and +1.
Correlation and Regression If the x and y values are both regarded as values of randomvariables, then the analysis is correlation. Choose a sample
from a population and measure two attributes.
If the x value is non-random (e.g. time at fixed intervals)
then the analysis is regression. Choose the value of one vari-able and measure the corresponding value of another.
Bivariate Data are pairs of values ( x, y) associated with a
single item.
e.g. lengths and widths of leaves.The individual variables x and y may be discrete or
continuous.
A scatter diagram is obtained by plotting the points ( x1, y1),( x2, y2) etc.Correlation is a measure of the linear association between
the variables.
There is said to be perfect correlation if all the points lie on
a line.
Statistics 2
Version B: page 6
Competence statements b1, b2, b3
© MEI
Example 4.1
Page 112
Exercise 4A
Q. 2
References:
Chapter 4
Pages 111-114
References:
Chapter 4
Pages 104-109
References:
Chapter 4
Pages 110-111
Summary S2 Topic 4 Bivariate Data 1
_ _
_ _
A line of best fit is a line drawn to fit the set of data points
as closely as possible.
This line will pass through the mean point ( , ) where
is the mean of the values and is the mean of the
x y
x x y
values. y
x
2.0 3.0 4.01.0
1.5
y2.0
The mean point is ( ) which is (2.9, 1.5)
The line of best fit is drawn through the point (2.9,1.5)
_ _
x, y
17.4; 9.0; 26.9717.4 9.0
26.97 0.876
xy xy
x y xy
S S
2
2
52.76 2.3
13.92 0.42
0.870.885
2.3 0.42
xx
yy
xy
xx yy
x S
y S
S r
S S
E.g. 50 students are selected at random and
their heights and weights are measured. This
will require correlation analysis.A ball is bounced 5 times from each of a
number of different heights and the height is
recorded. This will require regressionanalysis.
,
The alternative form for is
= =
xy i i xx i i
yy i i
xy
i i
xy i i i i
S x x y y S x x x x
S y y y y
S
x yS x y x y n x y
n
_ _ _ _
_ _
_ _
Notation for pairs of observations ( , ).n x y
_ _
2 2 _ _
_ _
_ _ 2 2 2 2
.
i i xy
xx yy
i i
i i
i i
x x y yS
r S S
x x y y
x y n x y
x n x y n y
8/10/2019 S2_MEI
http://slidepdf.com/reader/full/s2mei 7/7
For the data of Example 1:
r = 0.885
We wish to test the hypothesis that there is
positive correlation between lengths and
widths of the leaves of the tree.
Ho: = 0 There is no correlation betweenthe two variables.
H1: > 0 There is positive correlation between the two variables
(1-tailed test).
From the Students’ Handbook, the critical
value for n = 8 at 5% level (one tailed test) is
0.6215.
Since 0.885>0.6215 there is evidence that H0
can be rejected and that there is positive
correlation between the two variables.
Example 2
2 judges ranked 5 competitors as follows:
Competitor A B C D E
Judge 1 1 3 4 2 5
Judge 2 2 3 1 4 5d 1 0 -3 2 0
d 2 1 0 9 4 0
The least squares regression line For each value of x the value of y given and the value on the
line may be different by an amount called the residual.
If the data pair is ( xi ,yi) where the line of best fit is
y = a + bx then yi - (a + bxi ) = ei giving the residual ei .
The least squares regression line is the line that minimises the
sum of the squares of the residuals.
Spearman’s coefficient of rank correlation
If the data do not look linear when plotted on a scatter graph
(but appear to be reasonably monotomic), or if the rank order
instead of the values is given, then the Pearson correlation
coefficient is not appropriate.
Instead, Spearman’s rank correlation coefficient should beused. It is usually calculated using the formula
where d is the difference in ranks for each data pair.
This coefficient is used:
(i) when only ranked data are available,
(ii) the data cannot be assumed to be linear.
In the latter case, the data should be ranked.
Where r s has been found the hypothesis test is set up in the
same way. The condition here is that the sample is random.Make sure that you use the right tables!
Tied Ranks If two ranks are tied in, say, the 3rd place then each should be
given the rank 31/2.
Testing a parent population correlation by means of a
sample where r has been found
The value of r found for a sample can be used to test
hypotheses about the value of , the correlation in the parent population.
Conditions:
(i) the values of x and y must be taken from a bivariate
Normal distribution,(ii) the data must be a random sample.
An indication that a bivariate Normal distribution is a validmodel is shown by a scatter plot which is roughly elliptical
with the points denser near the middle.
Statistics 2
Version B:
page 7
Competence
statements
b4, b5, b6, b7, b8© MEI
Exercise 4CQ. 3
Example 4.3
Page 134
References:
Chapter 4
Pages 132-134
References:
Chapter 4
Pages 142-144
Summary S2 Topic 4 Bivariate Data 2
2
2
61
( 1)
i
s
d r
n n
_
The equation of the line is ( ) xy
xx
S y y x x
S
_
52
1
6 1414 1 0.3
5 24 s
xd r
x
0
1
1
H : = 0 There is no correlation between
the two variables.
H : 0 There is correlation between
the two variables (2 - tailed test.)
Or :
H : > 0 Ther
1
e is positive correlation between
the two variables (1- tailed test.)
Or :
H : < 0 There is negative correlation between
the two variables (1- tailed test.)
References:
Chapter 4
Pages 118-124
For the data of example 2:
r s = 0.3
Ho: = 0 There is no correlation between
the two variables.
H1: > 0 There is positive correlation
between the two variables
(1-tailed test).
For n = 5 at the 5% level (1 tailed test), the
critical value is 0.9.
Since 0.3 < 0.9 we are unable to reject Ho
and conclude that there is no evidence tosuggest correlation.
Exercise 4C
Q. 5
Exercise 4D
Q. 4
E.g. For the data x 1 2 3 4 5
y 1.1 2.4 3.6 4.7 6.1
2
_ _
_ 2 2 2
_ _
15, 17.9, 55, 65
15 17.93, 3.58
5 5
55 6 3 10
65 5 3 3.58 11.3
11.33.58 3 1.13 0.19
10
xx
xy
x y x xy
x y
S x n x
S xy n x y
y x y x