Stat 203 Additional (FOR INTEREST) material. You are not …jackd/Stat203_2011/Wk08_Extra.pdf · 2012. 7. 5. · We’ve been looking at the Pearson correlation r without looking

Stat 203 Additional (FOR INTEREST) material.

You are not responsible for knowing this.

We’ve been looking at the Pearson correlation r without looking at how it’s calculated.

For correlating the response variable with multiple explanatory variables, the easiest way is to use the error and total sums of squares (SSE and SST).
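As a rough illustration of the SSE/SST route, the sketch below fits a least-squares line and returns 1 − SSE/SST, the squared correlation between the response and its fitted values. For brevity it uses a single explanatory variable (the same ratio applies with several). The function name `r_squared` and the data are hypothetical and not part of the course material.

```python
def r_squared(xs, ys):
    """Fit y = a + b*x by least squares, then return 1 - SSE/SST."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Least-squares slope and intercept for one explanatory variable
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # SSE: squared distances from the fitted line (unexplained variation)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared distances from the mean of y (total variation)
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(r_squared(xs, ys))
```

With one x variable, the square root of this value is the size of r; the sign has to be read from the slope.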

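For only one y variable and one x variable we have a more direct way:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

r is the Pearson correlation coefficient.

n is the sample size.

The parts in the brackets are “how many standard deviations above the x mean” and “how many standard deviations above the y mean”, respectively. Here \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations.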

The following notation isn’t exactly right, but it will serve our purposes:

$$ r = \frac{1}{n-1} \sum z_x z_y $$

z_x and z_y are the standardized scores of x and y (the raw scores).
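The z-score form of the calculation can be sketched in a few lines of Python. This is only an illustration: the function names `z_scores` and `pearson_r` are made up here, and the standard deviation is assumed to be the sample version (dividing by n − 1).

```python
from math import sqrt

def z_scores(values):
    """Standardize raw scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r(xs, ys):
    """Add up the products of paired z-scores, then divide by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 cases")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical toy data: y rises exactly in step with x
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

If every value of x (or of y) is the same, the standard deviation is zero and the correlation is undefined; the sketch does not handle that case.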

For a set of 5 dragons, we might have a dataset like this:

Length in cm (x)    Weight in grams (y)
34.3                670
24.8                373
30.0                557
28.7                480
30.9                567

Which produces this scatterplot:

[Scatterplot: weight in grams (y) against length in cm (x)]

If y (weight) increases with x (length), then above-average x values will occur for the same cases as above-average y values, and below-average x values for the same cases as below-average y values.

So z_x > 0 usually when z_y > 0, and z_x < 0 usually when z_y < 0.

That means, for most cases, (z_x)(z_y) > 0.

In the correlation formula you’re adding mostly positive numbers, and your correlation will end up positive.

If y decreases as x increases, below-average x occurs with above-average y, and above-average x with below-average y.

So z_x < 0 usually when z_y > 0, and z_x > 0 usually when z_y < 0.

That means, for most cases, (z_x)(z_y) < 0.

In the correlation formula you’re adding mostly negative numbers, and your correlation will end up negative.
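The sign argument can be checked by counting how many of the products come out positive or negative. The sketch below uses the dragon data for the increasing case. For the decreasing case it uses a made-up variant in which each weight is reflected about the mean weight, so `flipped` is hypothetical data, not part of the original example.

```python
from statistics import mean, stdev

def z_products(xs, ys):
    """Return (z_x)(z_y) for each case; stdev is the sample SD (n - 1)."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]   # dragon lengths in cm
weights = [670, 373, 557, 480, 567]        # dragon weights in grams

# Hypothetical: reflect each weight about the mean so y falls as x rises
flipped = [2 * mean(weights) - w for w in weights]

rising = z_products(lengths, weights)
falling = z_products(lengths, flipped)
print(sum(p > 0 for p in rising), "of", len(rising), "products are positive")
print(sum(p < 0 for p in falling), "of", len(falling), "products are negative")
```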

First, standardize the scores.

Length in cm (x)    z_x      Weight in grams (y)    z_y
34.3                 1.32    670                     1.27
24.8                -1.43    373                    -1.41
30.0                 0.08    557                     0.25
28.7                -0.30    480                    -0.45
30.9                 0.34    567                     0.34
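The standardizing step might look like this in Python. It is a sketch that assumes the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes; the helper name `standardize` is made up. Printed values may differ from the table in the last digit because of rounding.

```python
from statistics import mean, stdev

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]  # x, in cm
weights = [670, 373, 557, 480, 567]       # y, in grams

def standardize(values):
    """z = (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)  # stdev divides by n - 1
    return [(v - m) / s for v in values]

z_x = standardize(lengths)
z_y = standardize(weights)

# One row per dragon: raw score and z-score for x, then for y
for x, zx, y, zy in zip(lengths, z_x, weights, z_y):
    print(f"{x:5.1f}  z = {zx:+.2f}   {y:4d}  z = {zy:+.2f}")
```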

Then multiply each pair of z-scores together.

Length in cm (x)    z_x      Weight in grams (y)    z_y      (z_x)(z_y)
34.3                 1.32    670                     1.27    1.68
24.8                -1.43    373                    -1.41    2.02
30.0                 0.08    557                     0.25    0.02
28.7                -0.30    480                    -0.45    0.13
30.9                 0.34    567                     0.34    0.11

Then add the multiplied values.

Length in cm (x)    z_x      Weight in grams (y)    z_y      (z_x)(z_y)
34.3                 1.32    670                     1.27    1.68
24.8                -1.43    373                    -1.41    2.02
30.0                 0.08    557                     0.25    0.02
28.7                -0.30    480                    -0.45    0.13
30.9                 0.34    567                     0.34    0.11
TOTAL                                                        3.97
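The multiply-then-add steps can be reproduced as follows. This is a sketch, again assuming sample standard deviations; the products are computed from unrounded z-scores, so their total can differ in the last digit from a sum of the rounded table entries.

```python
from statistics import mean, stdev

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]  # x, in cm
weights = [670, 373, 557, 480, 567]       # y, in grams

mx, sx = mean(lengths), stdev(lengths)    # stdev divides by n - 1
my, sy = mean(weights), stdev(weights)

# Multiply each dragon's z_x by its z_y
products = [((x - mx) / sx) * ((y - my) / sy)
            for x, y in zip(lengths, weights)]

# Add the multiplied values
total = sum(products)

for p in products:
    print(f"{p:.2f}")
print(f"TOTAL {total:.2f}")
```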

This pretty much does the whole formula for us: all that is left is to divide the total by n − 1 = 4.

r = 3.97 / 4 ≈ 0.99, very strong positive.
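As a cross-check, a short sketch can finish the calculation by dividing the total by n − 1 and compare the result with a deviation-based form of the same formula. The variable names are made up for illustration, and the deviation-based form is one common way of writing the coefficient, not necessarily the one in the textbook.

```python
from math import sqrt
from statistics import mean, stdev

lengths = [34.3, 24.8, 30.0, 28.7, 30.9]  # x, in cm
weights = [670, 373, 557, 480, 567]       # y, in grams

n = len(lengths)
mx, my = mean(lengths), mean(weights)
sx, sy = stdev(lengths), stdev(weights)   # sample SDs (n - 1)

# z-score form: total of (z_x)(z_y), divided by n - 1
r_z = sum(((x - mx) / sx) * ((y - my) / sy)
          for x, y in zip(lengths, weights)) / (n - 1)

# Deviation form: cross-products over the root of the two sums of squares
sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, weights))
sxx = sum((x - mx) ** 2 for x in lengths)
syy = sum((y - my) ** 2 for y in weights)
r_dev = sxy / sqrt(sxx * syy)

print(round(r_z, 3), round(r_dev, 3))
```

The two values should agree up to floating-point rounding.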

Final note: The correlation formula doesn’t show up in your textbook in this form, but in an equivalent but longer form.

For the equivalence and more information I recommend

http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
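For a quick look at why the two forms agree, here is a short derivation sketch. It assumes the "longer form" is the common deviation-based formula (the textbook's exact version is not reproduced here) and that s_x and s_y are sample standard deviations, i.e. s_x = \sqrt{\sum_i (x_i-\bar{x})^2 / (n-1)} and likewise for s_y.

```latex
\begin{align*}
r &= \frac{1}{n-1}\sum_{i=1}^{n}
     \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
   = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y},\\[4pt]
(n-1)\,s_x s_y
  &= (n-1)\sqrt{\frac{\sum_i (x_i-\bar{x})^2}{n-1}}\,
           \sqrt{\frac{\sum_i (y_i-\bar{y})^2}{n-1}}
   = \sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2},\\[4pt]
r &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}
          {\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}}.
\end{align*}
```

The factors of n − 1 cancel, which is why the longer form has no n − 1 in it.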
