Correlation

Correlation. Definition The linear correlation coefficient r measures the strength of the linear relationship between paired x- and y- quantitative values



Definition

The linear correlation coefficient r measures the strength of the linear relationship between paired quantitative x- and y-values in a sample.

We can often see a relationship between two variables by constructing a scatterplot.


Scatterplots of Paired Data



Requirements

1. The sample of paired (x, y) data is a random sample.

2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern.

3. Any outliers must be removed if they are known to be errors.


Notation for the Linear Correlation Coefficient

n represents the number of pairs of data present.

Σ denotes the addition of the items indicated.

Σx denotes the sum of all x-values.

Σx² indicates that each x-value should be squared and then those squares added.

(Σx)² indicates that the x-values should be added and the total then squared.

Σxy indicates that each x-value should be first multiplied by its corresponding y-value. After obtaining all such products, find their sum.

r represents the linear correlation coefficient for a sample.

ρ represents the linear correlation coefficient for a population.

Formula

The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample.

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} $$

Example: Calculating r

Using the simple random sample of data below, find the value of r.

Data
x   3   1   3   5
y   5   8   6   4

Example: Calculating r - cont

Data
x   3   1   3   5
y   5   8   6   4

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}} $$

$$ r = \frac{4(61) - (12)(23)}{\sqrt{4(44) - (12)^2}\,\sqrt{4(141) - (23)^2}} $$

$$ r = \frac{-32}{33.466} = -0.956 $$
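The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the slides.

```python
import math

# Paired sample from the example: (x, y) = (3, 5), (1, 8), (3, 6), (5, 4).
x = [3, 1, 3, 5]
y = [5, 8, 6, 4]

n = len(x)
sum_x = sum(x)                               # sum of x   = 12
sum_y = sum(y)                               # sum of y   = 23
sum_x2 = sum(v * v for v in x)               # sum of x^2 = 44
sum_y2 = sum(v * v for v in y)               # sum of y^2 = 141
sum_xy = sum(a * b for a, b in zip(x, y))    # sum of xy  = 61

# Numerator and denominator of the computational formula for r.
numerator = n * sum_xy - sum_x * sum_y       # 4(61) - (12)(23) = -32
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
               * math.sqrt(n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(denominator, 3), round(r, 3))    # 33.466 -0.956
```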


Properties of the Linear Correlation Coefficient r

1. –1 ≤ r ≤ 1

2. The value of r does not change if all values of either variable are converted to a different scale.

3. The value of r is not affected by the choice of x and y. Interchange all x- and y-values and the value of r will not change.

4. r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
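Properties 1 to 3 can be checked numerically. The sketch below is illustrative only: `pearson_r` is a helper name introduced here, the data are the four pairs from the calculation example, and the change of scale (a Celsius-to-Fahrenheit style conversion with a positive factor) is a made-up choice. A negative scale factor would flip the sign of r.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [3, 1, 3, 5]
y = [5, 8, 6, 4]
r = pearson_r(x, y)

# Property 1: r lies between -1 and 1.
in_range = -1 <= r <= 1

# Property 2: converting x to a different scale leaves r unchanged.
x_rescaled = [1.8 * v + 32 for v in x]
r_rescaled = pearson_r(x_rescaled, y)

# Property 3: interchanging x and y leaves r unchanged.
r_swapped = pearson_r(y, x)

print(in_range, round(r, 3), round(r_rescaled, 3), round(r_swapped, 3))
```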


Interpreting r: Explained Variation

The value of r² is the proportion of the variation in y that is explained by the linear relationship between x and y.

For example, if r = 0.926, we get r² = 0.857.

We conclude that 0.857 (or about 86%) of the variation in y can be explained by the linear relationship between x and y. This implies that about 14% of the variation in y cannot be explained by that linear relationship.
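To make "explained variation" concrete, the sketch below fits a least-squares line to the four-pair sample from the calculation example and compares r² with the share of the variation in y that the line accounts for. The least-squares line is not introduced in these slides, so treat it as an assumption of this illustration; all names are hypothetical.

```python
x = [3, 1, 3, 5]
y = [5, 8, 6, 4]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)       # total variation in y

# Least-squares line y_hat = b0 + b1 * x (assumed here, not from the slides).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Variation in y left unexplained by the line.
unexplained = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
explained_share = 1 - unexplained / syy

r = sxy / (sxx * syy) ** 0.5
print(round(r ** 2, 3), round(explained_share, 3))   # 0.914 0.914
```

For this sample r = -0.956, so about 91% of the variation in y is explained by the linear relationship.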


Formal Hypothesis Test

We wish to determine whether there is a significant linear correlation between two variables.

H0: ρ = 0 (no significant linear correlation)

H1: ρ ≠ 0 (significant linear correlation)

Test Statistic is t

Test statistic:

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Critical values:

Use the t-distribution table with degrees of freedom = n – 2


P-value: Use the t-distribution table with degrees of freedom = n – 2

Conclusion: If |t| > critical value, reject H0 and conclude that there is a linear correlation. If |t| ≤ critical value, fail to reject H0; there is not sufficient evidence to conclude that there is a linear correlation.
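As an illustration, the test can be applied to the four-pair sample from the calculation example. The slides do not state a significance level, so the 0.05 level used here is an assumption; the critical value 4.303 is the standard two-tailed t value for 2 degrees of freedom at that level. The function name is hypothetical.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)) for H0: rho = 0."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

n = 4
# r for the sample (3, 5), (1, 8), (3, 6), (5, 4), kept unrounded:
# numerator -32, radicands 32 and 35 from the computational formula.
r = -32 / (math.sqrt(32) * math.sqrt(35))

t = t_statistic(r, n)

# Assumed significance level 0.05, two-tailed, df = n - 2 = 2.
critical_value = 4.303

reject_h0 = abs(t) > critical_value
print(round(t, 3), reject_h0)   # -4.619 True
```

With so few pairs the result is only barely significant, which is one reason the requirements on the sample matter.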

Test Statistic is t (follows format of earlier chapters)

Covariance

Measure of linear relationship between variables.

If the relationship between the random variables is nonlinear, the covariance might not be sensitive to the relationship.
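A small numerical sketch of this point: the same x-values paired with a linear y and with a quadratic y. The slide does not give a covariance formula, so the sample covariance (divisor n - 1) and the made-up data below are assumptions of this illustration.

```python
def sample_covariance(x, y):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]      # exact linear relationship
y_quadratic = [v ** 2 for v in x]      # exact, but nonlinear, relationship

print(sample_covariance(x, y_linear))      # 5.0
print(sample_covariance(x, y_quadratic))   # 0.0
```

The quadratic case has a perfect functional relationship between x and y, yet its covariance is zero.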

Pearson's Correlation Coeff.

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations:

$$ \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y} $$

The above formula defines the population correlation coefficient, commonly represented by the Greek letter ρ (rho). Substituting estimates of the covariances and variances based on a sample gives the sample correlation coefficient, commonly denoted r:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$
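The covariance-based definition and the sums-based computational formula give the same r. The sketch below checks this on the four-pair sample from the calculation example, using Python's standard `statistics` module for the means and sample standard deviations; the variable names are illustrative.

```python
import math
import statistics

x = [3, 1, 3, 5]
y = [5, 8, 6, 4]
n = len(x)
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Definition: sample covariance divided by the product of sample standard deviations.
cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
r_from_definition = cov_xy / (statistics.stdev(x) * statistics.stdev(y))

# Computational formula based on the sums of x, y, x^2, y^2 and xy.
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
r_from_sums = (n * sum_xy - sum(x) * sum(y)) / (
    math.sqrt(n * sum_x2 - sum(x) ** 2) * math.sqrt(n * sum_y2 - sum(y) ** 2)
)

print(round(r_from_definition, 3), round(r_from_sums, 3))   # -0.956 -0.956
```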

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         


Spearman correlation coefficient

Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.

The Spearman correlation coefficient is often thought of as being the Pearson correlation coefficient between the ranked variables. In practice, however, a simpler procedure is normally used to calculate the Spearman coefficient, here also denoted ρ. The n raw scores X_i, Y_i are converted to ranks x_i, y_i, and the differences d_i = x_i − y_i between the ranks of each observation on the two variables are calculated.
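The formula that uses the differences d_i did not survive on this slide. The standard textbook form, valid when there are no tied ranks, is ρ = 1 − 6Σd_i² / (n(n² − 1)). The sketch below applies it to made-up scores without ties and compares the result with the Pearson coefficient of the ranks. The helper names and data are hypothetical.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman coefficient from rank differences (no-ties formula)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Sample linear correlation coefficient via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical raw scores with no tied values.
X = [10, 20, 30, 40, 50]
Y = [12, 9, 25, 31, 28]

rho = spearman_rho(X, Y)
rho_via_ranks = pearson_r(ranks(X), ranks(Y))
print(round(rho, 3), round(rho_via_ranks, 3))   # 0.8 0.8
```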


A Spearman correlation of 1 results when the two variables being compared are monotonically increasing functions of each other, even if their relationship is not linear. In contrast, such a relationship does not give a perfect Pearson correlation unless it is also linear.
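A quick check with made-up data: y = x³ is monotonically increasing in x but not linear. The sketch computes the Spearman coefficient as the Pearson coefficient of the ranks, which is valid here because there are no ties. The helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]          # monotonic but not linear

spearman = pearson_r(ranks(x), ranks(y))
pearson = pearson_r(x, y)
print(round(spearman, 3), round(pearson, 3))   # 1.0 0.943
```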


When the data are roughly elliptically distributed and there are no prominent outliers, the Spearman correlation and Pearson correlation give similar values.


The Spearman correlation is less sensitive than the Pearson correlation to strong outliers that are in the tails of both samples.
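The following sketch illustrates the difference in sensitivity with made-up data: six moderately correlated pairs, then the same pairs plus one extreme point that lies in the upper tail of both variables. The Spearman coefficient is computed as the Pearson coefficient of the ranks (no ties in this data). The helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient via the computational formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman coefficient as the Pearson coefficient of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical base sample, then the same sample with one outlier (50, 50).
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 5, 2, 6, 4]
x_out = x + [50]
y_out = y + [50]

pearson_shift = pearson_r(x_out, y_out) - pearson_r(x, y)
spearman_shift = spearman_rho(x_out, y_out) - spearman_rho(x, y)

print(round(pearson_r(x, y), 3), round(pearson_r(x_out, y_out), 3))        # 0.486 0.995
print(round(spearman_rho(x, y), 3), round(spearman_rho(x_out, y_out), 3))  # 0.486 0.679
```

In this example the single outlier moves the Pearson coefficient by about 0.51 and the Spearman coefficient by about 0.19.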