INTRODUCTION TO STATISTICS & PROBABILITY Chapter 2: Looking at Data–Relationships (Part 2) Dr. Nahid Sultana

Chapter 2 part2-Correlation

Page 1: Chapter 2 part2-Correlation

INTRODUCTION TO STATISTICS & PROBABILITY Chapter 2: Looking at Data–Relationships (Part 2)

Dr. Nahid Sultana


Page 2: Chapter 2 part2-Correlation


Chapter 2: Looking at Data–Relationships

2.1: Scatterplots

2.2: Correlation

2.3: Least-Squares Regression

2.5: Data Analysis for Two-Way Tables

Page 3: Chapter 2 part2-Correlation

Objectives

The correlation coefficient “r”

r does not distinguish between x and y

r has no units of measurement

r ranges from -1 to +1

Influential points


Page 4: Chapter 2 part2-Correlation

The correlation coefficient "r"

The correlation coefficient is a measure of the direction and strength of a linear relationship.

It is calculated using the mean and the standard deviation of both the x and y variables.

Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.


Page 5: Chapter 2 part2-Correlation

The correlation coefficient “r” (Cont…)

Suppose that we have data on variables x and y for n individuals.

The means and standard deviations of the two variables are $\bar{x}$ and $s_x$ for the x-values, and $\bar{y}$ and $s_y$ for the y-values.

The correlation r between x and y is

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Example: Time to swim: $\bar{x} = 35$, $s_x = 0.7$. Pulse rate: $\bar{y} = 140$, $s_y = 9.5$.
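The formula for r can be sketched directly in Python. This is a minimal illustration, not the author's code: the function name `correlation` and the data values are assumptions made up for the example, and only the standard library is used.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Made-up illustrative data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))  # 0.775
```

Each term in the sum is the product of two standardized values, so points that lie above (or below) both means at once push r upward, and points above one mean and below the other push it downward.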

Page 6: Chapter 2 part2-Correlation

“r” does not distinguish x & y

The correlation coefficient, r, treats x and y symmetrically.

"Time to swim" is the explanatory variable here, and belongs on the x axis. However, in either plot r is the same (r=-0.75).

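[Figure: two scatterplots of the same data with the x and y axes exchanged; r = -0.75 in both.]

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$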
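The symmetry of r can be checked numerically by exchanging the roles of x and y. The sketch below assumes a helper named `correlation` and made-up data; neither comes from the slides.

```python
from math import isclose
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r computed from standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Made-up illustrative data with a negative trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [9, 7, 8, 4, 5, 2]

r_xy = correlation(xs, ys)  # x treated as explanatory
r_yx = correlation(ys, xs)  # roles exchanged
print(isclose(r_xy, r_yx))
```

The two results agree because each term of the sum is a product of two factors, and exchanging the factors does not change the product.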

Page 7: Chapter 2 part2-Correlation

"r" has no unit

Changing the units of the variables does not change the correlation coefficient "r".

[Figure: two scatterplots of the same data in different units; r = -0.75 in both.]

r is computed from the standardized values of x and y, which are unitless.
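A quick numerical check of the unit-free property: rescaling either variable by a positive factor (as a change of units does) should leave r unchanged. The helper name, the conversion factors, and the data below are assumptions chosen for illustration.

```python
from math import isclose
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r computed from standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5, 6]
ys = [9, 7, 8, 4, 5, 2]

# Hypothetical unit changes: x from minutes to seconds, y from inches to centimetres.
xs_rescaled = [60 * x for x in xs]
ys_rescaled = [2.54 * y for y in ys]

r_original = correlation(xs, ys)
r_rescaled = correlation(xs_rescaled, ys_rescaled)
print(isclose(r_original, r_rescaled))
```

Standardizing divides each deviation by a standard deviation in the same units, so the conversion factor cancels in every term.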

Page 8: Chapter 2 part2-Correlation

"r" ranges from -1 to +1

Properties of Correlation

r is always a number between –1 and 1.

r > 0 indicates a positive association.

r < 0 indicates a negative association.

Values of r near 0 indicate a very weak linear relationship.

The strength of the linear relationship increases as r moves away from 0 toward –1 or 1.

The extreme values r = –1 and r = 1 occur only in the case of a perfect linear relationship.
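The extreme values can be illustrated with points that fall exactly on a straight line. This is a sketch under assumed names and made-up lines (slopes 2 and -3), not an example from the slides.

```python
from math import isclose
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r computed from standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]
ys_rising = [2 * x + 1 for x in xs]    # perfect line with positive slope
ys_falling = [-3 * x + 5 for x in xs]  # perfect line with negative slope

r_rising = correlation(xs, ys_rising)
r_falling = correlation(xs, ys_falling)
print(isclose(r_rising, 1.0), isclose(r_falling, -1.0))
```

The sign of r follows the sign of the slope; the size of the slope itself has no effect on r.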

Page 9: Chapter 2 part2-Correlation


“r” increases as variation decreases

When the scatter of the points around the linear trend decreases (less variability in one or both variables about that trend), the correlation gets stronger (r moves closer to +1 or -1).
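To see this effect numerically, the sketch below adds the same alternating pattern of deviations to a straight line at two different sizes. The helper name, the line y = 2x, and the deviation sizes are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r computed from standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


xs = list(range(10))
# Fixed alternating deviations (+1, -1, +1, ...) so the example is deterministic.
deviations = [1 if i % 2 == 0 else -1 for i in range(10)]

ys_tight = [2 * x + 0.5 * d for x, d in zip(xs, deviations)]  # small scatter
ys_loose = [2 * x + 5.0 * d for x, d in zip(xs, deviations)]  # large scatter

r_tight = correlation(xs, ys_tight)
r_loose = correlation(xs, ys_loose)
print(round(r_tight, 2), round(r_loose, 2))
```

Both data sets follow the same underlying line; only the size of the deviations from it differs, and the set with smaller deviations gives the r closer to 1.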

Page 10: Chapter 2 part2-Correlation

Correlation only describes linear relationships


No matter how strong the association, r does not describe curved relationships.
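A small example makes the point concrete: y below is completely determined by x, yet the relationship is a symmetric curve and r comes out essentially zero. The helper name and the choice y = x² are assumptions for illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r computed from standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # exact curved (quadratic) relationship

r_curved = correlation(xs, ys)
print(abs(r_curved) < 1e-12)
```

The positive and negative products cancel across the two halves of the parabola, so a value of r near 0 here says nothing about how strongly y depends on x.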

Page 11: Chapter 2 part2-Correlation


Influential points

Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers.

Just moving one point away from the general trend here weakens the correlation from -0.91 to -0.75.
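The sensitivity to a single point can be reproduced on a small made-up data set. The values below are hypothetical and are not the data behind the slide's -0.91 and -0.75; the helper name is likewise an assumption.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r computed from standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data with a strong negative linear trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [16, 14, 13, 11, 9, 8, 6, 4]

# Move only the last point away from the trend (y from 4 to 14).
ys_moved = ys[:-1] + [14]

r_before = correlation(xs, ys)
r_after = correlation(xs, ys_moved)
print(round(r_before, 2), round(r_after, 2))
```

Seven of the eight points are untouched, yet r moves from nearly -1 to roughly -0.6, because the moved point shifts the mean and standard deviation of y and contributes a large product of the opposite sign.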

Page 12: Chapter 2 part2-Correlation


Influential points (Cont…)