Correlation

By Dr. Muthupandi


Correlation

Correlation is a statistical technique which can show whether and how strongly pairs of variables are related. For example, height and weight are related - taller people tend to be heavier than shorter people.

Correlation

Correlation is used to measure and describe a relationship between two variables. Usually these two variables are simply observed as they exist in the environment; there is no attempt to control or manipulate them.

Correlation

The correlation coefficient measures two characteristics of the relationship between X and Y: the direction of the relationship and the degree of the relationship.

The product-moment correlation coefficient (Pearson’s r) was developed by Karl Pearson.

Direction of Relationship

A scatter plot shows at a glance the direction of the relationship. A positive correlation appears as a cluster of data points that slopes from the lower left to the upper right.

Positive Correlation

If the higher scores on X are generally paired with the higher scores on Y, and the lower scores on X are generally paired with the lower scores on Y, then the direction of the correlation between the two variables is positive. In other words, when the value of one variable increases (decreases) and the value of the other variable also increases (decreases), the relationship is called a positive correlation.

[Figure: Positive Correlation. Scatter plot with axis labels Age and Intelligence.]

[Figure: Positive Correlation. Scatter plot, axis labels not recoverable.]

Direction of Relationship

A scatter plot shows at a glance the direction of the relationship. A negative correlation appears as a cluster of data points that slopes from the upper left to the lower right.

Negative Correlation

If the higher scores on X are generally paired with the lower scores on Y, and the lower scores on X are generally paired with the higher scores on Y, then the direction of the correlation between the two variables is negative. In other words, when the value of one variable decreases (increases) and the value of the other variable increases (decreases), the relationship is called a negative correlation.

[Figure: Negative Correlation. Scatter plot with axis labels Age and Innocence.]

[Figure: Negative Correlation. Scatter plot, axis labels not recoverable.]

No Correlation

In cases where there is no correlation between two variables (both high and low values of X are equally paired with both high and low values of Y), there is no direction in the pattern of the dots. They are scattered about the plot in an irregular pattern.
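The three cases described so far (positive, negative, none) can be told apart by the sign of the summed cross-products of deviations from the two means. The following is a minimal sketch of that idea; the function name `direction` and the sample scores are illustrative assumptions and do not come from the slides.

```python
def direction(xs, ys):
    """Classify the direction of a relationship from paired scores.

    Uses the sign of sum((X - mean_X) * (Y - mean_Y)): it is positive when
    high X tends to pair with high Y, negative when high X pairs with low Y.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if cross > 0:
        return "positive"
    if cross < 0:
        return "negative"
    return "none"


# Hypothetical scores, chosen only to illustrate each case.
x = [1, 2, 3, 4, 5]
print(direction(x, [2, 3, 5, 6, 9]))  # high X paired with high Y
print(direction(x, [9, 6, 5, 3, 2]))  # high X paired with low Y
print(direction(x, [4, 1, 5, 1, 4]))  # highs and lows of Y spread evenly over X
```

With real data the sum is rarely exactly zero, so in practice "no correlation" means a value close to zero rather than an exact match.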

Perfect Correlation

When there is a perfect linear relationship, every change in the X variable is accompanied by a corresponding, proportional change in the Y variable. All the data points fall exactly on a straight line, and r is exactly +1 or -1.
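One way to check for a perfect linear relationship is to test whether every point lies on the line through the first two points. This is a small sketch under stated assumptions: the helper `is_perfectly_linear` is hypothetical, it expects at least two points, and it uses integer data so that the comparison is exact.

```python
def is_perfectly_linear(xs, ys):
    """Return True if all (x, y) points lie on one sloped straight line."""
    x0, y0 = xs[0], ys[0]
    dx = xs[1] - x0
    dy = ys[1] - y0
    if dx == 0 or dy == 0:
        # A vertical or horizontal line means one variable does not vary,
        # so there is no change in one variable to pair with the other.
        return False
    # Cross-multiplied slope comparison avoids division and rounding.
    return all((x - x0) * dy == (y - y0) * dx for x, y in zip(xs, ys))


# Hypothetical data for illustration.
print(is_perfectly_linear([1, 2, 3, 4], [3, 5, 7, 9]))  # Y = 2X + 1
print(is_perfectly_linear([1, 2, 3, 4], [9, 7, 5, 3]))  # Y = -2X + 11
print(is_perfectly_linear([1, 2, 3, 4], [3, 5, 8, 9]))  # one point off the line
```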

Form of Relationship

Pearson’s r assumes an underlying linear relationship (a relationship that can be best represented by a straight line). Not all relationships are linear.
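To see why the linearity assumption matters, consider a case where Y is completely determined by X but the relationship is curved. The sketch below uses made-up data (Y = X squared over a symmetric range of X) and a hypothetical helper `pearson_r`; under these assumptions r comes out at zero even though the two variables are clearly related.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data: a perfect but non-linear (U-shaped) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # zero: the left and right halves of the curve cancel out
```

A value of r near zero therefore rules out a linear relationship only, which is one reason to look at the scatter plot before relying on the coefficient.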

Strength of Relationship

How can we describe the strength of the relationship in a scatter plot? With a number between -1 and +1 that indicates the relationship between two variables. The sign (- or +) indicates the direction of the relationship. The absolute value of the number indicates the strength of the relationship.

-1 -------------------- 0 -------------------- +1
Perfect Relationship    No Relationship        Perfect Relationship

The closer to –1 or +1, the stronger the relationship.
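Reading a coefficient therefore takes two steps: take the sign for direction and the absolute value for strength. The following is a brief sketch of that reading; the function names `describe` and `stronger` are assumptions made for illustration.

```python
def describe(r):
    """Split a correlation coefficient into (direction, strength)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)


def stronger(r1, r2):
    """Return whichever coefficient is closer to -1 or +1."""
    return r1 if abs(r1) >= abs(r2) else r2


print(describe(-0.8))       # ('negative', 0.8)
print(stronger(-0.8, 0.5))  # -0.8: the sign does not weaken the relationship
```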

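Correlation Coefficient

Pearson’s r

Definitional formula:

$$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}}$$

$$r = \frac{COV_{XY}}{(s_X)(s_Y)}, \qquad COV_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n}$$

Computational formula:

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$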
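The definitional and computational formulas are algebraically equivalent, which can be checked numerically. The sketch below implements both, assuming the covariance and both standard deviations use the same divisor n; the function names and the five data pairs are hypothetical.

```python
import math


def r_definitional(xs, ys):
    """r = COV_XY / (s_X * s_Y), all computed with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov_xy / (s_x * s_y)


def r_computational(xs, ys):
    """r from the raw sums: n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(round(r_definitional(xs, ys), 4))   # 0.7746
print(round(r_computational(xs, ys), 4))  # 0.7746
```

The computational form needs only running totals, which is why it suits hand calculation from a table of X, Y, XY, X² and Y² columns.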

An Example: Correlation

What is the relationship between level of education and lifetime earnings?

[Figure: Education Level and Lifetime Earnings. Scatter plot; x-axis: Education (Predictor Variable), y-axis: Lifetime Earnings (Criterion Variable).]

X (Education)   Y (Income)
8               3.4
7               4.4
6               2.5
5               2.1
4               1.6
3               1.5
2               1.2
1               1

An Example: Correlation

X (Education)   Y (Income)   XY      X²     Y²
8               3.4          27.2    64     11.56
7               4.4          30.8    49     19.36
6               2.5          15      36     6.25
5               2.1          10.5    25     4.41
4               1.6          6.4     16     2.56
3               1.5          4.5     9      2.25
2               1.2          2.4     4      1.44
1               1            1       1      1
Total: 36       17.7         97.8    204    48.83

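$$\sum X = 36, \quad \sum Y = 17.7, \quad \sum XY = 97.8, \quad \sum X^2 = 204, \quad \sum Y^2 = 48.83, \quad n = 8$$

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$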
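The column totals for the education and income example can be reproduced, and the computational formula applied to them, with a few lines of code. This is a sketch using the eight data pairs from the example; the variable names are assumptions. The slides do not state the resulting coefficient for this example, so the value printed here is this sketch's own calculation.

```python
import math

# Data pairs from the education and income example.
education = [8, 7, 6, 5, 4, 3, 2, 1]
income = [3.4, 4.4, 2.5, 2.1, 1.6, 1.5, 1.2, 1.0]

n = len(education)
sum_x = sum(education)
sum_y = sum(income)
sum_xy = sum(x * y for x, y in zip(education, income))
sum_x2 = sum(x * x for x in education)
sum_y2 = sum(y * y for y in income)

# Totals, rounded to hide floating-point noise in the decimal sums.
print(n, sum_x, round(sum_y, 2), round(sum_xy, 2), sum_x2, round(sum_y2, 2))
# 8 36 17.7 97.8 204 48.83

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator
print(round(r, 2))  # 0.9
```

Under these figures the relationship between education and income is positive and strong.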


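An Example: Correlation

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

X      X²       Y     Y²    XY
184    33856    10    100   1840
213    45369    6     36    1278
234    54756    2     4     468
197    38809    7     49    1379
189    35721    13    169   2457
221    48841    10    100   2210
237    56169    4     16    948
192    36864    9     81    1728
Total: 1667   350385   61   555   12308

$$r = \frac{8(12308) - (1667)(61)}{\sqrt{\left[8(350385) - (1667)^2\right]\left[8(555) - (61)^2\right]}} = -0.77$$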
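The substitution can be checked directly from the column totals without going back to the raw scores. A minimal sketch, assuming a hypothetical helper `r_from_totals` that takes the six totals used by the computational formula:

```python
import math


def r_from_totals(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r computed from column totals only."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Totals taken from the table in this example.
r = r_from_totals(
    n=8, sum_x=1667, sum_y=61, sum_xy=12308, sum_x2=350385, sum_y2=555
)
print(round(r, 2))  # -0.77
```

The numerator works out to 98464 - 101687 = -3223, which is where the negative sign of the coefficient comes from.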

Interpreting Pearson’s r

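Correlation does not equal causation. A correlation can tell you the strength and direction of a relationship between two variables, but not the nature of the relationship. Two problems stand in the way of a causal reading. The third variable problem: an unmeasured variable may be influencing both X and Y. The directionality problem: a correlation cannot show whether X influences Y or Y influences X.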
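The directionality problem shows up in the arithmetic itself: Pearson's r is symmetric, so swapping the roles of X and Y gives the same value and the coefficient carries no information about which variable might influence the other. The sketch below illustrates this with made-up scores; the helper `pearson_r` is an assumption, not part of the slides.

```python
import math


def pearson_r(a, b):
    """Pearson's r via the computational formula."""
    n = len(a)
    numerator = n * sum(x * y for x, y in zip(a, b)) - sum(a) * sum(b)
    denominator = math.sqrt(
        (n * sum(x * x for x in a) - sum(a) ** 2)
        * (n * sum(y * y for y in b) - sum(b) ** 2)
    )
    return numerator / denominator


# Hypothetical scores for illustration.
x = [2, 4, 5, 7, 9]
y = [1, 3, 2, 6, 5]

# Same coefficient whichever variable is treated as the predictor.
print(pearson_r(x, y) == pearson_r(y, x))  # True
```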