Stat 203 Additional (FOR INTEREST) material.
You are not responsible for knowing this.
We’ve been looking at the Pearson correlation r without
looking at how it’s calculated.
To correlate the response variable with multiple explanatory
variables, the easiest way is to use the sum of squares error
and the sum of squares total (SSE and SST).
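As a brief sketch of that route, assuming the usual definitions (SSE is the sum of squared residuals from the fitted model, SST is the sum of squared deviations of y from its mean), the squared correlation can be written as:

```latex
R^2 = 1 - \frac{SSE}{SST}
```

With a single explanatory variable, this R^2 is the square of the Pearson correlation r.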
For only one y variable and one x variable we have a more
direct way.
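Assuming the standard sample form of the formula (dividing by n - 1, with s_x and s_y standing for the sample standard deviations of x and y; these symbols are assumptions, chosen to match the descriptions that follow), it can be written as:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```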
r is the Pearson correlation coefficient.
n is the sample size.
The parts in the brackets are “How many standard deviations
above the x mean and above the y mean” respectively.
The following notation isn’t exactly right, but it will serve our
purposes. zx and zy are the standardized scores of the raw
scores x and y.
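In this shorthand, and again assuming the standard sample form with n - 1 in the denominator, the correlation is roughly the average of the products of the standardized scores:

```latex
r = \frac{1}{n-1} \sum z_x \, z_y
```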
For a set of 5 dragons, we might have a dataset like this:
Length in cm (x)   Weight in grams (y)
34.3               670
24.8               373
30.0               557
28.7               480
30.9               567
Which produces this scatterplot:
If y (weight) increases with x (length), then above-average x
values will occur for the same cases as above-average y values.
So zx > 0 usually when zy > 0.
That means, for most values, (zx)(zy) > 0.
In the correlation formula you’re adding mostly positive
numbers, and your correlation will end up positive.
If y decreases as x increases, below-average x occurs with
above-average y.
So zx < 0 usually when zy > 0.
That means, for most values, (zx)(zy) < 0.
In the correlation formula you’re adding mostly negative
numbers, and your correlation will end up negative.
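The sign argument can be checked with a small sketch. The function name product_signs and the "flipped" weights are hypothetical, introduced only for illustration; the flipped data simply make weight fall as length rises. Because standard deviations are positive, the sign of (zx)(zy) is the same as the sign of (x - x mean)(y - y mean), so the sketch skips the division.

```python
from statistics import mean

def product_signs(xs, ys):
    """Return +1, -1 or 0 for each case: the sign of (zx)(zy)."""
    x_bar, y_bar = mean(xs), mean(ys)
    signs = []
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        # True/False subtract as 1/0, giving +1, -1 or 0
        signs.append((product > 0) - (product < 0))
    return signs

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]   # dragon lengths in cm
weights = [670, 373, 557, 480, 567]        # dragon weights in grams
flipped = [1000 - w for w in weights]      # hypothetical: weight decreases with length

print(product_signs(lengths, weights))  # [1, 1, 1, 1, 1]
print(product_signs(lengths, flipped))  # [-1, -1, -1, -1, -1]
```

Every product is positive for the dragon data, so the sum (and the correlation) is positive; reversing the direction of the relationship makes every product negative.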
First, standardize the scores.
Length in cm (x)   zx      Weight in grams (y)   zy
34.3                1.32   670                    1.27
24.8               -1.43   373                   -1.41
30.0                0.08   557                    0.25
28.7               -0.30   480                   -0.45
30.9                0.34   567                    0.34
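The standardizing step can be sketched as follows. The function name standardize is hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1), which is what reproduces the z-scores shown here.

```python
from statistics import mean, stdev

def standardize(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]   # x, in cm
weights = [670, 373, 557, 480, 567]        # y, in grams

zx = standardize(lengths)
zy = standardize(weights)

# One row per dragon: raw score and z-score for x, then for y
for x, a, y, b in zip(lengths, zx, weights, zy):
    print(f"{x:5.1f}  z = {a:5.2f}   {y:4d}  z = {b:5.2f}")
```

The printed values agree with the table to within rounding in the last digit.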
Then multiply each pair of standardized scores together.
Length in cm (x)   zx      Weight in grams (y)   zy      (zx)(zy)
34.3                1.32   670                    1.27   1.68
24.8               -1.43   373                   -1.41   2.02
30.0                0.08   557                    0.25   0.02
28.7               -0.30   480                   -0.45   0.13
30.9                0.34   567                    0.34   0.11
Then add the multiplied values.
Length in cm (x)   zx      Weight in grams (y)   zy      (zx)(zy)
34.3                1.32   670                    1.27   1.68
24.8               -1.43   373                   -1.41   2.02
30.0                0.08   557                    0.25   0.02
28.7               -0.30   480                   -0.45   0.13
30.9                0.34   567                    0.34   0.11
TOTAL                                                    3.97
This does most of the formula for us; all that is left is to
divide the total by n - 1 = 4.
r = 3.97 / 4 ≈ 0.993, very strong positive.
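The whole calculation (standardize, multiply, add, then divide) can be sketched in a few lines. The function name pearson_r is hypothetical, and the sketch assumes the sample form of the formula, dividing by n - 1 and using the sample standard deviation.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 cases")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    # Sum of (zx)(zy) over all cases
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]   # x, in cm
weights = [670, 373, 557, 480, 567]        # y, in grams

r = pearson_r(lengths, weights)
print(f"r = {r:.3f}")
```

Swapping the two arguments gives the same value, since the formula treats x and y symmetrically.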
Final note: The correlation formula doesn’t show up in your
textbook in this form, but in an equivalent but longer form.
For the equivalence and more information I recommend
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient