INTRODUCTION TO STATISTICS & PROBABILITY Chapter 2: Looking at Data–Relationships (Part 2)
Dr. Nahid Sultana
Chapter 2: Looking at Data–Relationships
2.1: Scatterplots
2.2: Correlation
2.3: Least-Squares Regression
2.5: Data Analysis for Two-Way Tables
Objectives
The correlation coefficient “r”
r does not distinguish between x and y
r has no units of measurement
r ranges from -1 to +1
Influential points
2.2: Correlation
The correlation coefficient "r"
The correlation coefficient is a measure of the direction and strength of a linear relationship.
It is calculated using the mean and the standard deviation of both the x and y variables.
Correlation can only be used to describe quantitative variables. Categorical variables don’t have means and standard deviations.
The correlation coefficient "r" (Cont…)
Suppose that we have data on variables x and y for n individuals.
The means and standard deviations of the two variables are $\bar{x}$ and $s_x$ for the x-values, and $\bar{y}$ and $s_y$ for the y-values.
The correlation r between x and y is
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
Time to swim: $\bar{x} = 35$, $s_x = 0.7$. Pulse rate: $\bar{y} = 140$, $s_y = 9.5$.
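As a rough illustration of the formula for r, the following Python sketch computes it directly from standardized values. The function name `correlation` and the data are made up for this example; they are not the swim-time measurements from the slides.

```python
import math

def correlation(x, y):
    """Sample correlation r computed from standardized values (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Sum the products of the standardized values, then divide by n - 1.
    total = sum(((xi - mean_x) / s_x) * ((yi - mean_y) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)

# Made-up illustrative data (assumption), not taken from the slides.
times = [34.1, 35.0, 35.6, 34.8, 36.0]
pulses = [152, 140, 131, 143, 128]
r = correlation(times, pulses)
print(round(r, 3))
```

Because longer times go with lower pulse rates in this made-up sample, the computed r is negative.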
“r” does not distinguish x & y
The correlation coefficient, r, treats x and y symmetrically.
"Time to swim" is the explanatory variable here, and belongs on the x axis. However, in either plot r is the same (r=-0.75).
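[Figure: two scatterplots, each labeled r = -0.75.]
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$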
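The symmetry can be checked numerically. This is a minimal sketch with made-up data and a hypothetical helper `correlation`; swapping the two arguments only swaps the two factors inside each product, so the result does not change.

```python
import math
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r (hypothetical helper for this sketch)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (len(x) - 1)

# Made-up illustrative data (assumption).
times = [34.1, 35.0, 35.6, 34.8, 36.0]
pulses = [152, 140, 131, 143, 128]

r_xy = correlation(times, pulses)   # time on the x axis
r_yx = correlation(pulses, times)   # axes swapped
print(math.isclose(r_xy, r_yx))
```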
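"r" has no unit
Changing the units of the variables does not change the correlation coefficient r, because r is computed from standardized values, which have no units:
$(x_i - \bar{x}) / s_x$: standardized value of x (unitless)
$(y_i - \bar{y}) / s_y$: standardized value of y (unitless)
[Figure: two scatterplots, each labeled r = -0.75.]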
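A short sketch of the unit-free property, under the assumption that the times are recorded in seconds and then converted to minutes. The data and the helper `correlation` are made up for illustration.

```python
import math
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r (hypothetical helper for this sketch)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (len(x) - 1)

# Made-up illustrative data (assumption).
times_sec = [34.1, 35.0, 35.6, 34.8, 36.0]
pulses = [152, 140, 131, 143, 128]

# Same measurements expressed in minutes instead of seconds.
times_min = [t / 60 for t in times_sec]

r_sec = correlation(times_sec, pulses)
r_min = correlation(times_min, pulses)
print(math.isclose(r_sec, r_min))
```

Rescaling x by a positive constant rescales both $x_i - \bar{x}$ and $s_x$ by the same factor, so each standardized value is unchanged.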
"r" ranges from -1 to +1
Properties of Correlation
r is always a number between –1 and 1.
r > 0 indicates a positive association.
r < 0 indicates a negative association.
Values of r near 0 indicate a very weak linear relationship.
The strength of the linear relationship increases as r moves away from 0 toward –1 or 1.
The extreme values r = –1 and r = 1 occur only in the case of a perfect linear relationship.
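To illustrate the extreme values, this sketch places made-up points exactly on a rising line and on a falling line; the helper `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r (hypothetical helper for this sketch)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (len(x) - 1)

x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2 * v + 3 for v in x])       # points on a rising line
r_down = correlation(x, [10 - 0.5 * v for v in x])  # points on a falling line
print(round(r_up, 6), round(r_down, 6))
```

Up to floating-point rounding, the first value is 1 and the second is –1.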
“r” increases as variation decreases
When variability in one or both variables decreases, the correlation coefficient gets stronger (closer to +1 or –1).
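One way to see this effect, reading "variability" as the scatter of the points around the linear trend (an interpretation, not stated on the slide), is to shrink a fixed, made-up noise pattern and watch r move toward 1. The helper `correlation` and the data are assumptions for this sketch.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r (hypothetical helper for this sketch)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (len(x) - 1)

x = list(range(10))
# Fixed alternating "noise" pattern so the example is deterministic.
pattern = [1 if i % 2 == 0 else -1 for i in range(10)]

def r_with_scatter(amplitude):
    # y follows the line y = x, pushed off it by +/- amplitude.
    y = [xi + amplitude * e for xi, e in zip(x, pattern)]
    return correlation(x, y)

r_wide = r_with_scatter(4.0)     # large scatter around the line
r_mid = r_with_scatter(1.0)
r_narrow = r_with_scatter(0.25)  # small scatter around the line
print(r_wide < r_mid < r_narrow)
```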
Correlation only describes linear relationships
No matter how strong the association, r does not describe curved relationships.
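A minimal sketch of this limitation, using made-up points that lie exactly on the curve y = x². The relationship is perfectly predictable, yet r comes out as essentially zero. The helper `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r (hypothetical helper for this sketch)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (len(x) - 1)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # exact curved (quadratic) relationship

r_curve = correlation(x, y)
print(abs(r_curve) < 1e-9)
```

The positive and negative halves of the parabola cancel in the sum, which is why a scatterplot should always be inspected before relying on r.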
Influential points
Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers.
Just moving one point away from the general trend here decreases the correlation from -0.91 to -0.75.
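The same sensitivity can be reproduced with a small made-up data set (not the data behind the slide's -0.91 and -0.75 values): moving a single point off the trend noticeably weakens r. The helper `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(x, y):
    """Sample correlation r (hypothetical helper for this sketch)."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, y)) / (len(x) - 1)

# Made-up data with a clear decreasing trend (assumption).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [8, 7, 6.5, 5, 4.5, 3, 2.5, 1]

# Move only the last point away from the general trend.
y_moved = y[:-1] + [6]

r_original = correlation(x, y)
r_moved = correlation(x, y_moved)
print(round(r_original, 2), round(r_moved, 2))
```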
Influential points (Cont…)