martha-cook
Measure of Variability (Dispersion, Spread)
1. Range
2. Inter-Quartile Range
3. Variance, standard deviation
4. Pseudo-standard deviation
Measure of Central Location
1. Mean
2. Median
1. Range
R = Range = max - min
2. Inter-Quartile Range (IQR)
Inter-Quartile Range = IQR = Q3 - Q1
Example
The data Verbal IQ on n = 23 students arranged in increasing order is:
80 82 84 86 86 89 90 94 94 95 95 96 99 99 102 102 104 105 105 109 111 118 119
min = 80, Q1 = 89, Q2 = 96, Q3 = 105, max = 119
Range and IQR
Range = max – min = 119 – 80 = 39
Inter-Quartile Range
= IQR = Q3 - Q1 = 105 – 89 = 16
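The range and IQR above can be checked with a short script. This is a minimal sketch, assuming the quartiles are taken as the medians of the lower and upper halves of the sorted data (with the middle observation left out when n is odd); the function name `quartiles` is illustrative only.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule (an assumption)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # observations below the middle position
    upper = data[half + n % 2:]    # observations above the middle position
    return median(lower), median(data), median(upper)

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]

q1, q2, q3 = quartiles(verbal_iq)
data_range = max(verbal_iq) - min(verbal_iq)
iqr = q3 - q1
print("Q1, Q2, Q3 =", q1, q2, q3)
print("Range =", data_range, " IQR =", iqr)
```

Other quartile conventions exist and can give slightly different values for small samples.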
3. Sample Variance
Let x1, x2, x3, … xn denote a set of n numbers.
Recall the mean of the n numbers is defined as:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]
The numbers
\[ d_1 = x_1 - \bar{x}, \quad d_2 = x_2 - \bar{x}, \quad d_3 = x_3 - \bar{x}, \quad \dots, \quad d_n = x_n - \bar{x} \]
are called deviations from the mean.
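The sum
\[ \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
is called the sum of squares of deviations from the mean.
Writing it out in full:
\[ d_1^2 + d_2^2 + d_3^2 + \dots + d_n^2 \]
or
\[ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2 \]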
The Sample Variance
The sample variance is defined as the quantity:
\[ \frac{\sum_{i=1}^{n} d_i^2}{n-1} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
and is denoted by the symbol \( s^2 \).
The Sample Standard Deviation s
Definition: The Sample Standard Deviation is defined by:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
Hence the Sample Standard Deviation, s, is the square root of the sample variance.
Example
Let x1, x2, x3, x4, x5 denote the set of 5 numbers in the following table.
i 1 2 3 4 5
xi 10 15 21 7 13
Then
\[ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 10 + 15 + 21 + 7 + 13 = 66 \]
and
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{66}{5} = 13.2 \]
The deviations from the mean d1, d2, d3, d4, d5 are given in the following table.
i                      1      2      3      4      5
xi                     10     15     21     7      13
di = xi - x̄            -3.2   1.8    7.8    -6.2   -0.2
di^2 = (xi - x̄)^2      10.24  3.24   60.84  38.44  0.04
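The sum
\[ \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = (-3.2)^2 + (1.8)^2 + (7.8)^2 + (-6.2)^2 + (-0.2)^2 \]
\[ = 10.24 + 3.24 + 60.84 + 38.44 + 0.04 = 112.80 \]
and
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{112.8}{4} = 28.2 \]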
Also the standard deviation is:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{112.8}{4}} = \sqrt{28.2} = 5.31 \]
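The worked example can be reproduced step by step. The sketch below follows the definitions given above (deviations, sum of squares, division by n - 1); the variable names are illustrative.

```python
import math

x = [10, 15, 21, 7, 13]
n = len(x)

mean = sum(x) / n                          # x-bar
deviations = [xi - mean for xi in x]       # d_i = x_i - x-bar
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations
variance = sum_sq / (n - 1)                # sample variance s^2
std_dev = math.sqrt(variance)              # sample standard deviation s

print("mean =", round(mean, 2))
print("sum of squares =", round(sum_sq, 2))
print("s^2 =", round(variance, 2), " s =", round(std_dev, 2))
```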
Interpretations of s
• In Normal distributions
– Approximately 2/3 of the observations will lie within one standard deviation of the mean
– Approximately 95% of the observations lie within two standard deviations of the mean
– In a histogram of the Normal distribution, the standard deviation is approximately the distance from the mode to the inflection point
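[Figure: Normal distribution curve marking the mode, the inflection point at distance s from the mode, the interval within s of the mean (about 2/3 of the observations), and the interval within 2s of the mean]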
Example
A researcher collected data on 1500 males aged 60-65.
The variables measured were cholesterol and blood pressure.
– The mean blood pressure was 155 with a standard deviation of 12.
– The mean cholesterol level was 230 with a standard deviation of 15
– In both cases the data was normally distributed
Interpretation of these numbers
• Blood pressure levels vary about the value 155 in males aged 60-65.
• Cholesterol levels vary about the value 230 in males aged 60-65.
• Approximately 2/3 of males aged 60-65 have blood pressure within 12 of 155, i.e. between 155 - 12 = 143 and 155 + 12 = 167.
• Approximately 2/3 of males aged 60-65 have cholesterol within 15 of 230, i.e. between 230 - 15 = 215 and 230 + 15 = 245.
• Approximately 95% of males aged 60-65 have blood pressure within 2(12) = 24 of 155, i.e. between 155 - 24 = 131 and 155 + 24 = 179.
• Approximately 95% of males aged 60-65 have cholesterol within 2(15) = 30 of 230, i.e. between 230 - 30 = 200 and 230 + 30 = 260.
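The intervals in this example are just mean ± k standard deviations. A small sketch, with a helper `sd_interval` that is introduced here purely for illustration:

```python
def sd_interval(mean, sd, k):
    """Return the interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)

bp_mean, bp_sd = 155, 12        # blood pressure
chol_mean, chol_sd = 230, 15    # cholesterol

for label, m, s in [("blood pressure", bp_mean, bp_sd),
                    ("cholesterol", chol_mean, chol_sd)]:
    print(label, "about 2/3 within", sd_interval(m, s, 1))
    print(label, "about 95% within", sd_interval(m, s, 2))
```

The 2/3 and 95% figures apply only because the data are stated to be approximately normally distributed.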
A Computing formula for:
Sum of squares of deviations from the mean:
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
The difficulty with this formula is that \( \bar{x} \) will often have many decimals.
The result will be that each term in the above sum will also have many decimals.
The sum of squares of deviations from the mean can also be computed using the following identity:
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \]
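For readers who want to see why the identity holds, here is a short derivation. It expands the square and uses only the definition of the mean, \( \sum x_i = n\bar{x} \):

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\end{align*}
```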
To use this identity we need to compute:
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n \quad \text{and} \quad \sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \dots + x_n^2 \]
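Then:
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \]
and
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n-1} \]
and
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n}}{n-1}} \]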
Example
The data Verbal IQ on n = 23 students arranged in increasing order is:
80 82 84 86 86 89 90 94
94 95 95 96 99 99 102 102
104 105 105 109 111 118 119
\[ \sum_{i=1}^{n} x_i = 80 + 82 + 84 + 86 + 86 + 89 + 90 + 94 + 94 + 95 + 95 + 96 + 99 + 99 + 102 + 102 + 104 + 105 + 105 + 109 + 111 + 118 + 119 = 2244 \]
\[ \sum_{i=1}^{n} x_i^2 = 80^2 + 82^2 + 84^2 + 86^2 + 86^2 + 89^2 + 90^2 + 94^2 + 94^2 + 95^2 + 95^2 + 96^2 + 99^2 + 99^2 + 102^2 + 102^2 + 104^2 + 105^2 + 105^2 + 109^2 + 111^2 + 118^2 + 119^2 = 221494 \]
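Then:
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} = 221494 - \frac{(2244)^2}{23} = 2557.652 \]
You will obtain exactly the same answer if you use the left hand side of the equation.
The sample variance is
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{221494 - \frac{(2244)^2}{23}}{22} = \frac{2557.652}{22} = 116.26 \]
Also
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{2557.652}{22}} = \sqrt{116.26} = 10.782 \]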
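The computing formula and the direct definition can be compared numerically. This sketch computes the sum of squares both ways for the Verbal IQ data; the variable names are illustrative.

```python
import math

verbal_iq = [80, 82, 84, 86, 86, 89, 90, 94, 94, 95, 95, 96,
             99, 99, 102, 102, 104, 105, 105, 109, 111, 118, 119]
n = len(verbal_iq)

sum_x = sum(verbal_iq)                    # sum of x_i
sum_x2 = sum(x * x for x in verbal_iq)    # sum of x_i^2

# Computing (shortcut) formula: sum x^2 - (sum x)^2 / n
ss_shortcut = sum_x2 - sum_x ** 2 / n

# Direct definition: sum of squared deviations from the mean
mean = sum_x / n
ss_direct = sum((x - mean) ** 2 for x in verbal_iq)

variance = ss_shortcut / (n - 1)
std_dev = math.sqrt(variance)

print(sum_x, sum_x2)
print(round(ss_shortcut, 3), round(ss_direct, 3))
print(round(variance, 2), round(std_dev, 3))
```

With floating-point arithmetic the shortcut formula can lose precision when the mean is very large relative to the spread, so the direct form is often preferred in software.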
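A quick (rough) calculation of s
\[ s \approx \frac{\text{Range}}{4} \]
The reason for this is that approximately all (95%) of the observations are between \( \bar{x} - 2s \) and \( \bar{x} + 2s \).
Thus \( \max \approx \bar{x} + 2s \) and \( \min \approx \bar{x} - 2s \), and
\[ \text{Range} = \max - \min \approx (\bar{x} + 2s) - (\bar{x} - 2s) = 4s \]
Hence
\[ s \approx \frac{\text{Range}}{4} \]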
Example
Verbal IQ on n = 23 students: min = 80 and max = 119
\[ s \approx \frac{119 - 80}{4} = \frac{39}{4} = 9.75 \]
This compares with the exact value of s which is 10.782. The rough method is useful for checking your calculation of s.
The Pseudo Standard Deviation (PSD)
Definition: The Pseudo Standard Deviation (PSD) is defined by:
\[ \text{PSD} = \frac{\text{IQR}}{1.35} = \frac{\text{Inter-Quartile Range}}{1.35} \]
Properties
• For Normal distributions the magnitude of the pseudo standard deviation (PSD) and the standard deviation (s) will be approximately the same value
• For leptokurtic distributions the standard deviation (s) will be larger than the pseudo standard deviation (PSD)
• For platykurtic distributions the standard deviation (s) will be smaller than the pseudo standard deviation (PSD)
Example
Verbal IQ on n = 23 students
Inter-Quartile Range = IQR = Q3 - Q1 = 105 – 89 = 16
Pseudo standard deviation
\[ \text{PSD} = \frac{\text{IQR}}{1.35} = \frac{16}{1.35} = 11.85 \]
This compares with the standard deviation
\[ s = 10.782 \]
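Both quick estimates of spread for the Verbal IQ data (Range/4 and IQR/1.35) can be computed from the summary values quoted above. A minimal sketch with illustrative names:

```python
minimum, maximum = 80, 119
q1, q3 = 89, 105
s_exact = 10.782   # value obtained earlier from the computing formula

rough_s = (maximum - minimum) / 4   # rough estimate: Range / 4
psd = (q3 - q1) / 1.35              # pseudo standard deviation: IQR / 1.35

print("Range/4  =", rough_s)
print("IQR/1.35 =", round(psd, 2))
print("exact s  =", s_exact)
```

Here the PSD (about 11.85) is slightly larger than s and Range/4 (9.75) slightly smaller; both land near the exact value.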
Box-whisker Plots showing outliers
• An outlier is a “wild” observation in the data
• Outliers occur because
– of errors (typographical and computational)
– of extreme cases in the population
• We will now consider the drawing of box-plots where outliers are identified
To Draw a Box Plot we need to:
• Compute the Hinge (Median, Q2) and the Mid-hinges (first & third quartiles – Q1 and Q3 )
• To identify outliers we will compute the inner and outer fences
The fences are like the fences at a prison. We expect the entire population to be within both sets of fences.
If a member of the population is between the inner and outer fences it is a mild outlier.
If a member of the population is outside of the outer fences it is an extreme outlier.
Lower outer fence
F1 = Q1 - (3)IQR
Upper outer fence
F2 = Q3 + (3)IQR
Lower inner fence
f1 = Q1 - (1.5)IQR
Upper inner fence
f2 = Q3 + (1.5)IQR
• Observations that are between the lower and upper inner fences are considered to be non-outliers.
• Observations that are outside the inner fences but not outside the outer fences are considered to be mild outliers.
• Observations that are outside outer fences are considered to be extreme outliers.
• mild outliers are plotted individually in a box-plot using the symbol
• extreme outliers are plotted individually in a box-plot using the symbol
• non-outliers are represented with the box and whiskers with
– Max = largest observation within the fences
– Min = smallest observation within the fences
[Figure: box-whisker plot labelling the inner fences, the outer fence, mild outliers, an extreme outlier, and the box-whisker plot representing the data that are not outliers]
Example
Data collected on n = 109 countries in 1995.
Data collected on k = 25 variables.
The variables
1. Population Size (in 1000s)
2. Density = Number of people/Sq kilometer
3. Urban = percentage of population living in cities
4. Religion
5. lifeexpf = Average female life expectancy
6. lifeexpm = Average male life expectancy
7. literacy = % of population who read
8. pop_inc = % increase in popn size (1995)
9. babymort = Infant mortality (deaths per 1000)
10. gdp_cap = Gross domestic product/capita
11. Region = Region or economic group
12. calories = Daily calorie intake.
13. aids = Number of aids cases
14. birth_rt = Birth rate per 1000 people
15. death_rt = death rate per 1000 people
16. aids_rt = Number of aids cases/100000 people
17. log_gdp = log10(gdp_cap)
18. log_aidsr = log10(aids_rt)
19. b_to_d =birth to death ratio
20. fertility = average number of children in family
21. log_pop = log10(population)
22. cropgrow = ??
23. lit_male = % of males who can read
24. lit_fema = % of females who can read
25. Climate = predominant climate
The data file as it appears in SPSS
Consider the data on infant mortality
Stem-Leaf diagram (stem = 10s, leaf = unit digit)
 0 | 4455555666666666777778888899
 1 | 0122223467799
 2 | 0001123555577788
 3 | 45567999
 4 | 135679
 5 | 011222347
 6 | 03678
 7 | 4556679
 8 | 5
 9 | 4
10 | 1569
11 | 0022378
12 | 46
13 | 7
14 |
15 |
16 | 8
median = Q2 = 27
Quartiles
Lower quartile = Q1 = the median of lower half
Upper quartile = Q3 = the median of upper half
Summary Statistics
\[ Q_1 = \frac{12 + 12}{2} = 12, \qquad Q_3 = \frac{66 + 67}{2} = 66.5 \]
Interquartile range (IQR)
IQR = Q3 - Q1 = 66.5 – 12 = 54.5
The Outer Fences
lower = Q1 - 3(IQR) = 12 – 3(54.5) = -151.5
upper = Q3 + 3(IQR) = 66.5 + 3(54.5) = 230.0
No observations are outside of the outer fences
The Inner Fences
lower = Q1 – 1.5(IQR) = 12 – 1.5(54.5) = -69.75
upper = Q3 + 1.5(IQR) = 66.5 + 1.5(54.5) = 148.25
Only one observation (168 – Afghanistan) is outside of the inner fences (mild outlier)
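The fence rules can be written as two small functions. This is a sketch under the definitions given earlier (inner fences at 1.5 IQR, outer fences at 3 IQR); the function names `fences` and `classify` are illustrative only.

```python
def fences(q1, q3):
    """Return ((lower inner, upper inner), (lower outer, upper outer))."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

def classify(value, q1, q3):
    """Label an observation as non-outlier, mild outlier or extreme outlier."""
    inner, outer = fences(q1, q3)
    if value < outer[0] or value > outer[1]:
        return "extreme outlier"
    if value < inner[0] or value > inner[1]:
        return "mild outlier"
    return "non-outlier"

# Infant mortality example: Q1 = 12, Q3 = 66.5
inner, outer = fences(12, 66.5)
print("inner fences:", inner)
print("outer fences:", outer)
print("168 is a", classify(168, 12, 66.5))
```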
[Figure: Box-Whisker Plot of Infant Mortality; horizontal axis: Infant Mortality, 0 to 200]
Example 2
In this example we are looking at the weight gains (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork).
– Ten test animals for each diet
Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Level       High Protein               Low Protein
Source      Beef     Cereal   Pork     Beef     Cereal   Pork
Diet        1        2        3        4        5        6
            73       98       94       90       107      49
            102      74       79       76       95       82
            118      56       96       90       97       73
            104      111      98       64       80       86
            81       95       102      86       98       81
            107      88       102      51       74       97
            100      82       108      72       74       106
            87       77       91       90       67       70
            117      86       120      95       89       61
            111      92       105      78       58       82
Median      103.0    87.0     100.0    82.0     84.5     81.5
Mean        100.0    85.9     99.5     79.2     83.9     78.7
IQR         24.0     18.0     11.0     18.0     23.0     16.0
PSD         17.78    13.33    8.15     13.33    17.04    11.85
Variance    229.11   225.66   119.17   192.84   246.77   273.79
Std. Dev.   15.14    15.02    10.92    13.89    15.71    16.55
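The summary rows of the table can be reproduced for any one diet. The sketch below does this for Diet 1 (High Protein, Beef) with Python's standard `statistics` module, assuming quartiles are the medians of the lower and upper halves of the sorted data.

```python
import statistics

diet1 = [73, 102, 118, 104, 81, 107, 100, 87, 117, 111]

data = sorted(diet1)
half = len(data) // 2
q1 = statistics.median(data[:half])                    # median of lower half
q3 = statistics.median(data[half + len(data) % 2:])    # median of upper half

mean = statistics.mean(diet1)
median = statistics.median(diet1)
iqr = q3 - q1
psd = iqr / 1.35
variance = statistics.variance(diet1)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(diet1)       # sample standard deviation

print(median, mean, iqr, round(psd, 2), round(variance, 2), round(std_dev, 2))
```

Swapping in another column of the table gives the corresponding summary values for that diet.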
[Figure: Box Plots: Weight Gains for Six Diets. Horizontal axis: Diet (1 to 6, grouped by High Protein and Low Protein and by source Beef, Cereal, Pork); vertical axis: Weight Gain (40 to 130). Legend: Non-Outlier Max, Non-Outlier Min, Median, 75%, 25%]
Conclusions
• Weight gain is higher for the high protein meat diets
• Increasing the level of protein increases weight gain, but only if the source of protein is a meat source
Measures of Shape
• Skewness
• Kurtosis
[Figure: density curves illustrating shape. Skewness panels: positively skewed, negatively skewed, symmetric. Kurtosis panels: platykurtic, leptokurtic, normal (mesokurtic)]
• Measure of Skewness – based on the sum of cubes
\[ \sum_{i=1}^{n} (x_i - \bar{x})^3 \]
• Measure of Kurtosis – based on the sum of 4th powers
\[ \sum_{i=1}^{n} (x_i - \bar{x})^4 \]
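The Measure of Skewness
\[ g_1 = \frac{\sqrt{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{3/2}} \]
The Measure of Kurtosis
\[ g_2 = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^{2}} - 3 \]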
The 3 is subtracted so that g2 is zero for the normal distribution
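A short sketch of how the two shape measures might be computed. It assumes the usual moment-based forms, g1 = sqrt(n) Σ(x - x̄)^3 / [Σ(x - x̄)^2]^(3/2) and g2 = n Σ(x - x̄)^4 / [Σ(x - x̄)^2]^2 - 3; the function name `shape_measures` is illustrative.

```python
def shape_measures(values):
    """Return (g1, g2): moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values)   # sum of squares
    s3 = sum((x - mean) ** 3 for x in values)   # sum of cubes
    s4 = sum((x - mean) ** 4 for x in values)   # sum of 4th powers
    g1 = n ** 0.5 * s3 / s2 ** 1.5
    g2 = n * s4 / s2 ** 2 - 3
    return g1, g2

# A symmetric sample has g1 = 0; a sample with a long right tail has g1 > 0.
print(shape_measures([1, 2, 3, 4, 5]))
print(shape_measures([1, 1, 1, 2, 10]))
```

Statistical packages often apply small-sample corrections to these measures, so their output may differ slightly from this sketch.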
Interpretations of Measures of Shape
• Skewness: g1 > 0 (positively skewed), g1 = 0 (symmetric), g1 < 0 (negatively skewed)
• Kurtosis: g2 < 0 (platykurtic), g2 = 0 (normal, mesokurtic), g2 > 0 (leptokurtic)
Descriptive techniques for Multivariate data
In most research situations data is collected on more than one variable (usually many variables)
Graphical Techniques
• The scatter plot
• The two dimensional Histogram
The Scatter Plot
For two variables X and Y we will have a measurement for each variable on each case:
xi, yi
xi = the value of X for case i
and
yi = the value of Y for case i.
To Construct a scatter plot we plot the points:
(xi, yi)
for each case on the X-Y plane.
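[Figure: a single point (xi, yi) plotted on the X-Y plane, with xi marked on the horizontal axis and yi on the vertical axis]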
Data Set #3
The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score, and Final Reading Achievement Score for 23 students who have recently completed a reading improvement program

Student   Verbal IQ   Math IQ   Initial Reading Achievement   Final Reading Achievement
1         86          94        1.1                           1.7
2         104         103       1.5                           1.7
3         86          92        1.5                           1.9
4         105         100       2.0                           2.0
5         118         115       1.9                           3.5
6         96          102       1.4                           2.4
7         90          87        1.5                           1.8
8         95          100       1.4                           2.0
9         105         96        1.7                           1.7
10        84          80        1.6                           1.7
11        94          87        1.6                           1.7
12        119         116       1.7                           3.1
13        82          91        1.2                           1.8
14        80          93        1.0                           1.7
15        109         124       1.8                           2.5
16        111         119       1.4                           3.0
17        89          94        1.6                           1.8
18        99          117       1.6                           2.6
19        94          93        1.4                           1.4
20        99          110       1.4                           2.0
21        95          97        1.5                           1.3
22        102         104       1.7                           3.1
23        102         93        1.6                           1.9
[Figure: Scatter Plot of Math IQ (vertical axis) against Verbal IQ (horizontal axis), shown first on axes from 0 to 140, then with the point (84, 80) labelled, then rescaled to axes from 60 to 130]
Some Scatter Patterns
• Circular
• No relationship between X and Y
• Unable to predict Y from X
• Ellipsoidal
• Positive relationship between X and Y
• Increases in X correspond to increases in Y (but not always)
• Major axis of the ellipse has positive slope
Example
Verbal IQ, Math IQ
[Figure: Scatter Plot of Math IQ (vertical axis) against Verbal IQ (horizontal axis), both axes from 60 to 130]
Some More Patterns
• Ellipsoidal (thinner ellipse)
• Stronger positive relationship between X and Y
• Increases in X correspond to increases in Y (more frequently)
• Major axis of the ellipse has positive slope
• Minor axis of the ellipse much smaller
• Increased strength in the positive relationship between X and Y
• Increases in X correspond to increases in Y (almost always)
• Minor axis of the ellipse extremely small in relationship to the Major axis of the ellipse.
• Perfect positive relationship between X and Y
• Y perfectly predictable from X
• Data falls exactly along a straight line with positive slope
• Ellipsoidal
• Negative relationship between X and Y
• Increases in X correspond to decreases in Y (but not always)
• Major axis of the ellipse has negative slope
• The strength of the relationship can increase until changes in Y can be perfectly predicted from X
Some Non-Linear Patterns
• In a Linear pattern Y increases with respect to X at a constant rate
• In a Non-linear pattern the rate that Y increases with respect to X is variable
Growth Patterns
• Growth patterns frequently follow a sigmoid curve
• Growth at the start is slow
• It then speeds up
• It slows down again as it reaches its limiting size
Measures of strength of a relationship (Correlation)
• Pearson’s correlation coefficient (r)
• Spearman’s rank correlation coefficient (rho, ρ)
Assume that we have collected data on two variables X and Y. Let
(x1, y1) (x2, y2) (x3, y3) … (xn, yn)
denote the pairs of measurements on the two variables X and Y for n cases in a sample (or population)
From this data we can compute summary statistics for each variable.
The means
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]
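The standard deviations
\[ s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \quad \text{and} \quad s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \]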
These statistics (\( \bar{x} \), \( \bar{y} \), \( s_x \), \( s_y \)):
• give information for each variable separately
but
• give no information about the relationship between the two variables
Consider the statistics:
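\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
\[ S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]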
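The first two statistics:
\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{and} \quad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]
• are used to measure variability in each variable
• they are used to compute the sample standard deviations
\[ s_x = \sqrt{\frac{S_{xx}}{n-1}}, \qquad s_y = \sqrt{\frac{S_{yy}}{n-1}} \]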
The third statistic:
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
• is used to measure correlation
• If two variables are positively related the sign of \( x_i - \bar{x} \) will agree with the sign of \( y_i - \bar{y} \)
• When \( x_i - \bar{x} \) is positive, \( y_i - \bar{y} \) will be positive.
• When xi is above its mean, yi will be above its mean
• When \( x_i - \bar{x} \) is negative, \( y_i - \bar{y} \) will be negative.
• When xi is below its mean, yi will be below its mean
The product \( (x_i - \bar{x})(y_i - \bar{y}) \) will be positive for most cases.
This implies that the statistic \( S_{xy} \)
• will be positive
• Most of the terms in this sum will be positive
On the other hand
• If two variables are negatively related the sign of \( x_i - \bar{x} \) will be opposite to the sign of \( y_i - \bar{y} \)
• When \( x_i - \bar{x} \) is positive, \( y_i - \bar{y} \) will be negative.
• When xi is above its mean, yi will be below its mean
• When \( x_i - \bar{x} \) is negative, \( y_i - \bar{y} \) will be positive.
• When xi is below its mean, yi will be above its mean
The product \( (x_i - \bar{x})(y_i - \bar{y}) \) will be negative for most cases.
Again this implies that the statistic
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
• will be negative
• Most of the terms in this sum will be negative
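Pearson’s correlation coefficient is defined as:
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \]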
The denominator:
\[ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2} \]
is always positive
The numerator:
\[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
• is positive if there is a positive relationship between X and Y and
• negative if there is a negative relationship between X and Y.
• This property carries over to Pearson’s correlation coefficient r
Properties of Pearson’s correlation coefficient r
1. The value of r is always between –1 and +1.
2. If the relationship between X and Y is positive, then r will be positive.
3. If the relationship between X and Y is negative, then r will be negative.
4. If there is no linear relationship between X and Y, then r will be close to zero.
5. The value of r will be +1 if the points (xi, yi) lie on a straight line with positive slope.
6. The value of r will be -1 if the points (xi, yi) lie on a straight line with negative slope.
[Figure: ten scatter plots illustrating r = 1, r = 0.95, r = 0.7, r = 0.4, r = 0, r = -0.4, r = -0.7, r = -0.8, r = -0.95 and r = -1]
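Computing formulae for the statistics:
\[ S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \]
\[ S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n} \]
\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n} \]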
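To compute \( S_{xx} \), \( S_{yy} \) and \( S_{xy} \), first compute
\[ A = \sum_{i=1}^{n} x_i, \quad B = \sum_{i=1}^{n} y_i, \quad C = \sum_{i=1}^{n} x_i^2, \quad D = \sum_{i=1}^{n} y_i^2, \quad E = \sum_{i=1}^{n} x_i y_i \]
Then
\[ S_{xx} = C - \frac{A^2}{n}, \qquad S_{yy} = D - \frac{B^2}{n}, \qquad S_{xy} = E - \frac{A B}{n} \]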
Example
Verbal IQ, MathIQ
Data Set #3
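Now
\[ \sum_{i=1}^{n} x_i = 2244, \quad \sum_{i=1}^{n} y_i = 2307, \quad \sum_{i=1}^{n} x_i^2 = 221494, \quad \sum_{i=1}^{n} y_i^2 = 234363, \quad \sum_{i=1}^{n} x_i y_i = 227199 \]
Hence
\[ S_{xx} = 221494 - \frac{(2244)^2}{23} = 2557.652 \]
\[ S_{yy} = 234363 - \frac{(2307)^2}{23} = 2960.87 \]
\[ S_{xy} = 227199 - \frac{(2244)(2307)}{23} = 2116.043 \]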
Thus Pearson’s correlation coefficient is:
\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{2116.043}{\sqrt{(2557.652)(2960.87)}} = 0.769 \]
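The whole calculation can be reproduced from the raw Verbal IQ and Math IQ columns of Data Set #3. The sketch below uses the computing formulae with the sums A to E; the function name `pearson_r` and the variable names are illustrative.

```python
import math

verbal = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
          82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]
math_iq = [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
           91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93]

def pearson_r(x, y):
    """Pearson's r via the computing formulae for Sxx, Syy and Sxy."""
    n = len(x)
    a = sum(x)                                  # A = sum of x
    b = sum(y)                                  # B = sum of y
    c = sum(xi * xi for xi in x)                # C = sum of x^2
    d = sum(yi * yi for yi in y)                # D = sum of y^2
    e = sum(xi * yi for xi, yi in zip(x, y))    # E = sum of x*y
    sxx = c - a * a / n
    syy = d - b * b / n
    sxy = e - a * b / n
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(verbal, math_iq)
print("r =", round(r, 3))
```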