Anthony Greene 1 Correlation The Association Between Variables

Page 1: Anthony Greene1 Correlation The Association Between Variables

Anthony Greene 1

Correlation

The Association Between Variables

Page 2: Anthony Greene1 Correlation The Association Between Variables

When to Use

t-test or ANOVA

When the independent variable is Categorical

[Figure: two bar charts. The first compares Control and Treat A (y-axis 0 to 30); the second compares Control, Treat A, Treat B, and Treat C (y-axis 0 to 100). A Male/Female legend appears.]

Page 3: Anthony Greene1 Correlation The Association Between Variables

When to Use

Correlation & Regression

When the independent variable is Ratio or Interval

[Figure: plot titled "Tour De France Gatorade Consumption". x-axis: Average Daytime Temperature (60 to 120); y-axis: Liters of Gatorade (0 to 16).]

Page 4: Anthony Greene1 Correlation The Association Between Variables

ScatterPlots

Page 5: Anthony Greene1 Correlation The Association Between Variables

ScatterPlots

Page 6: Anthony Greene1 Correlation The Association Between Variables

Times and costs for five word-processing jobs

[Figure: plot with x-axis ticks from 0 to 25 and y-axis ticks from 0 to 500.]

Page 7: Anthony Greene1 Correlation The Association Between Variables

Four data points

[Figure: plot with x-axis ticks from 0 to 5 and y-axis ticks from 0 to 7.]

Page 8: Anthony Greene1 Correlation The Association Between Variables

Direct Relationship: Positive Slope

Page 9: Anthony Greene1 Correlation The Association Between Variables

Age and price data for a sample of 11 used cars

Page 10: Anthony Greene1 Correlation The Association Between Variables

Scatter diagram for the age and price data of used cars

Page 11: Anthony Greene1 Correlation The Association Between Variables

Inverse Relationship: Negative Slope

Page 12: Anthony Greene1 Correlation The Association Between Variables

Various degrees of linear correlation (Slide 1 of 3)

Page 13: Anthony Greene1 Correlation The Association Between Variables

Various degrees of linear correlation (Slide 3 of 3)

Page 14: Anthony Greene1 Correlation The Association Between Variables

Various degrees of linear correlation (Slide 2 of 3)

Page 15: Anthony Greene1 Correlation The Association Between Variables

Examples of positive and negative relationships

Page 16: Anthony Greene1 Correlation The Association Between Variables

The Simple Idea

• If the corresponding x and y z-scores are always in agreement, r will be high.

• If they are sometimes in agreement, r will be moderate.

• If they are generally different, r will be near zero.

• If they are in agreement but in opposite directions, r will be negative.

$$r = \frac{\sum z_x z_y}{n}$$
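The bullets above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the data sets and the helper names `z_scores` and `r_from_z` are made up, and the z-scores are assumed to use the population standard deviation (dividing by n), which is what makes the formula return values between -1 and +1.

```python
from math import sqrt

def z_scores(values):
    """Return z-scores using the population SD (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def r_from_z(xs, ys):
    """Average of the products of paired z-scores: r = sum(zx * zy) / n."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical data, chosen only to illustrate the three cases.
x = [1, 2, 3, 4, 5]
y_agree = [2, 4, 6, 8, 10]     # z-scores always agree with x
y_opposite = [10, 8, 6, 4, 2]  # same sizes, opposite signs
y_mixed = [3, 1, 2, 1, 3]      # agreement and disagreement cancel out

print(round(r_from_z(x, y_agree), 3))     # close to +1
print(round(r_from_z(x, y_opposite), 3))  # close to -1
print(round(r_from_z(x, y_mixed), 3))     # close to 0
```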

Page 17: Anthony Greene1 Correlation The Association Between Variables

The Basic Idea

$$r = \frac{\sum z_x z_y}{n}$$

 Zx     Zy   |   Zx     Zy   |   Zx     Zy
+1.2   +1.6  |  +1.2   -2.3  |  +1.2   -1.6
-1.1   -0.7  |  -1.1   -0.2  |  -1.1   +0.7
+0.8   +1.1  |  +0.8   +1.6  |  +0.8   -1.1
+3.2   +2.8  |  +3.2   -0.7  |  +3.2   -2.8
-2.7   -2.3  |  -2.7   +1.1  |  -2.7   +2.3
+0.1   -0.2  |  +0.1   +2.8  |  +0.1   +0.2

Page 18: Anthony Greene1 Correlation The Association Between Variables

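Notation Used in Regression and Correlation

We define $SS_x$, $S_p$, and $SS_y$ by

$$SS_x = \sum (x - M_x)^2 \qquad S_p = \sum (x - M_x)(y - M_y) \qquad SS_y = \sum (y - M_y)^2$$

Or the computational formula

$$SS_x = \sum x^2 - \frac{(\sum x)^2}{n} \qquad S_p = \sum xy - \frac{(\sum x)(\sum y)}{n} \qquad SS_y = \sum y^2 - \frac{(\sum y)^2}{n}$$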
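As a quick check that the definitional and computational forms give the same three quantities, here is a small Python sketch. The data and function names are hypothetical and are not taken from the slides (the used car data is not reproduced in this transcript).

```python
def sums_of_squares_definitional(xs, ys):
    """SS_x, S_p, SS_y from deviations about the means M_x and M_y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    s_p = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_y = sum((y - my) ** 2 for y in ys)
    return ss_x, s_p, ss_y

def sums_of_squares_computational(xs, ys):
    """Same three quantities from raw sums, with no deviations needed."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    s_p = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    return ss_x, s_p, ss_y

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(sums_of_squares_definitional(xs, ys))   # (10.0, 6.0, 6.0)
print(sums_of_squares_computational(xs, ys))  # (10.0, 6.0, 6.0)
```

The computational form is usually easier by hand because it needs only the column totals of x, y, x², y², and xy.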

Page 19: Anthony Greene1 Correlation The Association Between Variables

Obtaining the three sums of squares for the used car data using the computational formulas

Page 20: Anthony Greene1 Correlation The Association Between Variables

Linear Correlation Coefficient

The linear correlation coefficient, r, of n data points is defined by

$$r = \frac{\sum z_x z_y}{n}$$

or by the computational formula

$$r = \frac{S_p}{\sqrt{SS_x \, SS_y}}$$
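The two formulas can be compared numerically. The sketch below uses made-up data and hypothetical function names. It assumes the z-scores are computed with the population standard deviation (dividing by n), which is the condition under which the two expressions agree.

```python
from math import sqrt, isclose

def r_computational(xs, ys):
    """r = S_p / sqrt(SS_x * SS_y), using the raw-sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    s_p = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_p / sqrt(ss_x * ss_y)

def r_z_score(xs, ys):
    """r = sum(z_x * z_y) / n, with population SDs (divide by n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sd_x) * ((y - my) / sd_y)
               for x, y in zip(xs, ys)) / n

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r1 = r_computational(xs, ys)
r2 = r_z_score(xs, ys)
print(round(r1, 4), round(r2, 4), isclose(r1, r2))
```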

Page 21: Anthony Greene1 Correlation The Association Between Variables

Linear Correlation Coefficient

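$$r = \frac{\sum z_x z_y}{n}$$

$$r = \frac{\sum (x - M_x)(y - M_y)}{n \sqrt{\dfrac{\sum (x - M_x)^2}{n}} \sqrt{\dfrac{\sum (y - M_y)^2}{n}}}$$

$$r = \frac{\sum (x - M_x)(y - M_y)}{\sqrt{\sum (x - M_x)^2 \sum (y - M_y)^2}}$$

$$r = \frac{S_p}{\sqrt{SS_x \, SS_y}}$$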

Page 22: Anthony Greene1 Correlation The Association Between Variables

Coefficient of Determination

The coefficient of determination, r², is the proportion of variation in the observed values of the response variable that is explained by the regression:

$$r^2 = \frac{\text{explained variance}}{\text{total variance}} = \text{proportion of explained variance}$$

The coefficient of determination always lies between 0 and 1 and is a descriptive measure of the utility of the regression equation for making predictions. Values of r² near 0 indicate that the regression equation is not useful for making predictions, whereas values near 1 indicate that the regression equation is extremely useful for making predictions.
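To make the ratio concrete, the following sketch computes r² two ways on made-up data: as explained variation over total variation, and as the square of the correlation coefficient. It assumes the regression equation is the ordinary least-squares line with slope S_p / SS_x, which this part of the deck does not spell out. The function name is hypothetical.

```python
from math import isclose

def r_squared_two_ways(xs, ys):
    """Return (explained / total, S_p**2 / (SS_x * SS_y))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    s_p = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_y = sum((y - my) ** 2 for y in ys)

    # Assumed regression equation: least-squares line y_hat = a + b * x.
    b = s_p / ss_x
    a = my - b * mx
    predicted = [a + b * x for x in xs]

    # Sums of squares stand in for variances; the common divisor cancels.
    explained = sum((p - my) ** 2 for p in predicted)
    total = ss_y
    return explained / total, s_p ** 2 / (ss_x * ss_y)

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ratio, r_sq = r_squared_two_ways(xs, ys)
print(round(ratio, 4), round(r_sq, 4))  # both 0.6
```

With these numbers, 60% of the variation in y is accounted for by the fitted line.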

Page 23: Anthony Greene1 Correlation The Association Between Variables

t-Distribution for a Correlation Test

For samples of size n, the variable

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

has the t-distribution with df = n – 2 if the null hypothesis ρ = 0 is true. (ρ, or rho, is pronounced “row.”)

Page 24: Anthony Greene1 Correlation The Association Between Variables

The t-test for correlation (Slide 1 of 3)

With df = n – 2, use Table B.6.

Page 25: Anthony Greene1 Correlation The Association Between Variables

The t-test for correlation (Slide 2 of 3)

Page 26: Anthony Greene1 Correlation The Association Between Variables

The t-test for correlation (Slide 3 of 3)

Step 4 Compute the test statistic r. Table B.6 allows a direct lookup. Alternatively, the statistic

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

has a t distribution, and Table B.2 will yield an identical conclusion.

Step 5 If the value of the test statistic falls in the rejection region, reject the null hypothesis.

Step 6 State the conclusion in words.
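Steps 4 through 6 might look like this in code. The values of r and n are hypothetical, and the critical value 2.262 is the standard two-tailed t value for α = .05 with df = 9, typed in by hand here rather than read from the deck's Table B.2.

```python
from math import sqrt

def t_from_r(r, n):
    """Convert a sample correlation r (n pairs) to a t statistic, df = n - 2."""
    return r / sqrt((1 - r ** 2) / (n - 2))

# Hypothetical sample: r = 0.60 from n = 11 pairs, so df = 9.
r, n = 0.60, 11
t = t_from_r(r, n)

# Assumed critical value: two-tailed, alpha = .05, df = 9.
critical_t = 2.262
reject = abs(t) > critical_t

print(f"t({n - 2}) = {t:.2f}")
if reject:
    print("Reject H0: the data provide evidence that rho differs from 0.")
else:
    print("Fail to reject H0: no significant evidence that rho differs from 0.")
```

Here t is about 2.25, just short of the critical value, so the null hypothesis is not rejected even though r = 0.60 looks sizable. With only 11 pairs, a fairly large sample correlation can still fall short of significance.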

Page 27: Anthony Greene1 Correlation The Association Between Variables

Criterion for deciding whether or not to reject the null hypothesis

Page 28: Anthony Greene1 Correlation The Association Between Variables

Correlation Matrix

       a      b      c      d
a   1.00
b   0.84   1.00
c   0.68   0.58   1.00
d   0.12   0.19   0.08   1.00
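A matrix like the one above can be produced by computing r for every pair of variables. The sketch below uses made-up variables (not the a, b, c, d data behind the slide) and hypothetical function names, and prints only the lower triangle because the matrix is symmetric with 1.00 on the diagonal.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = S_p / sqrt(SS_x * SS_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    s_p = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return s_p / sqrt(ss_x * ss_y)

def correlation_matrix(data):
    """Map each pair of variable names to its correlation."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

def print_lower_triangle(matrix):
    """Print the matrix in the lower-triangular layout used on the slide."""
    names = list(matrix)
    print("   " + "".join(f"{name:>7}" for name in names))
    for i, row in enumerate(names):
        cells = "".join(f"{matrix[row][col]:7.2f}" for col in names[: i + 1])
        print(f"{row:<3}{cells}")

# Hypothetical variables for illustration only.
data = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 5, 4, 5],
    "c": [5, 3, 4, 1, 2],
}

matrix = correlation_matrix(data)
print_lower_triangle(matrix)
```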

Page 29: Anthony Greene1 Correlation The Association Between Variables

Computer printouts for correlations