Chapter 7 Scatterplots and Correlation
Scatterplots: graphical display of bivariate data
Correlation: a numerical summary of bivariate data
Slide 3
Objectives: Chapter 7 Scatterplots
- Scatterplots
- Explanatory and response variables
- Interpreting scatterplots
- Outliers
- Categorical variables in scatterplots
Slide 4
Chapter 7 Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit, e.g. the height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. the height and GPA of each student in a sample. (Caution: data from 2 separate univariate samples are not bivariate data.)
Slide 5
Basic Terminology (cont.)
Multivariate data: several variables are measured on each unit in a sample or population, e.g. for each student in a sample of NCSU students, measure height, GPA, and the distance between NCSU and their hometown.
Chapter 7 focuses on bivariate data.
Slide 6
Same goals with bivariate data that we had with univariate data:
- Graphical displays and numerical summaries
- Seek overall patterns and deviations from those patterns
- Descriptive measures of specific aspects of the data
Slide 7
Student  Beers  Blood Alcohol (BAC)
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05
Here, we have two quantitative variables for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol level (BAC). We are interested in the relationship between the two variables: how is one affected by changes in the other?
Slide 8
Scatterplots
A useful method for graphically describing the relationship between 2 quantitative variables.
Slide 9
[Scatterplot: Blood Alcohol Content vs Number of Beers, for the same 16 students]
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
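To make the axis convention concrete, here is a minimal text-mode sketch in Python, assuming the 16 (beers, BAC) pairs from the table above. The helper name `text_scatter` and its grid size are hypothetical choices for illustration; a real analysis would use a plotting library (for example matplotlib's `scatter`).

```python
# (beers, BAC) for the 16 students in the table, in student order
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Hypothetical helper: draw a crude character scatterplot.

    The explanatory variable runs left to right, the response bottom to top.
    """
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1   # avoid dividing by zero if all x are equal
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is printed first (top), so flip y
    rows = ["|" + "".join(cells) for cells in grid]
    return "\n".join(rows + ["+" + "-" * width])


print(text_scatter(beers, bac))
```

Each `*` is one student, with number of beers on the horizontal axis and BAC on the vertical axis; students whose points fall in the same character cell overlap.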
Slide 10
Focus on Three Features of a Scatterplot
Look for an overall pattern regarding:
1. Shape: approximately linear, curved, up-and-down?
2. Direction: positive, negative, none?
3. Strength: are the points tightly clustered in the particular shape, or are they spread out?
...and for deviations from the overall pattern: outliers.
Slide 11
Scatterplot: Fuel Consumption vs Car Weight
x = car weight, y = fuel consumption
$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)
Slide 12
Explanatory and response variables
Explanatory (independent) variable x: number of beers
Response (dependent) variable y: blood alcohol content
- Response variable: the variable of interest.
- Explanatory variable: explains changes in the response variable.
Typically, the explanatory (or independent) variable is plotted on the x axis, and the response (or dependent) variable is plotted on the y axis.
Slide 13
[Scatterplot: SAT Score vs Proportion of Seniors Taking SAT; labeled points include NC (74%, 1010), IW, and IL]
Slide 14
Correlation: a numerical summary of bivariate data when both variables are quantitative.
Correlation
- The correlation coefficient r
- r does not distinguish between x and y (exchanging them leaves r unchanged)
- r has no units
- r ranges from -1 to +1
- Influential points
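The "does not distinguish x and y" and "no units" properties can be checked numerically. The sketch below is a minimal Python example assuming the 16-student beer and BAC data; the function name `correlation` is illustrative, and the 12 ounces per beer used for the change of units is an assumption made only to have a second unit.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r = Sxy / sqrt(Sxx * Syy), from deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r = correlation(beers, bac)
r_swapped = correlation(bac, beers)      # roles of x and y exchanged
ounces = [12 * b for b in beers]         # assumed 12 oz per beer: a change of units
r_ounces = correlation(ounces, bac)      # same data, different units

print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, in ounces = {r_ounces:.4f}")
```

All three printed values agree: exchanging the variables or multiplying one of them by a positive constant does not change r.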
Slide 15
The correlation coefficient "r"
The correlation coefficient is a measure of the direction and strength of the linear relationship between 2 quantitative variables. It is calculated using the mean and the standard deviation of both the x and y variables.
Correlation can only be used to describe quantitative variables: categorical variables don't have means and standard deviations.
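The slide describes r in words only. The usual textbook formula, given here as a reference sketch (the course may write it in an equivalent form), averages the products of the standardized x and y values, dividing by n - 1:

$$ r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) $$

Here $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$, $s_y$ the sample standard deviations. Each standardized value is unit-free, which is why r has no units, and each product is symmetric in x and y, which is why r does not distinguish the two variables.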
Slide 16
Correlation: Fuel Consumption vs Car Weight, r = .9766
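As a check on the reported value, this Python sketch computes r from its definition (the mean and sample standard deviation of each variable, then the products of standardized values averaged over n - 1), using the ten (car weight, fuel consumption) pairs from the scatterplot slide. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson r as the average product of standardized values (divisor n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations (divisor n - 1)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)


weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")   # prints r = 0.9766
```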
Slide 17
Example: calculating correlation
$(x_1, y_1), (x_2, y_2), (x_3, y_3)$: (1, 3), (1.5, 6), (2.5, 8)
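A worked sketch of the arithmetic for these three points (the slide does not show the steps). Because $s_x = \sqrt{\sum (x_i-\bar{x})^2/(n-1)}$ and likewise for $s_y$, the factor $n-1$ cancels, so r can be computed from sums of squared and cross-multiplied deviations:

$$ \bar{x} = \frac{1 + 1.5 + 2.5}{3} = \frac{5}{3}, \qquad \bar{y} = \frac{3 + 6 + 8}{3} = \frac{17}{3} $$
$$ \sum (x_i-\bar{x})^2 = \frac{4}{9} + \frac{1}{36} + \frac{25}{36} = \frac{7}{6}, \qquad \sum (y_i-\bar{y})^2 = \frac{64}{9} + \frac{1}{9} + \frac{49}{9} = \frac{38}{3} $$
$$ \sum (x_i-\bar{x})(y_i-\bar{y}) = \frac{16}{9} - \frac{1}{18} + \frac{35}{18} = \frac{11}{3} $$
$$ r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}} = \frac{11/3}{\sqrt{(7/6)(38/3)}} = \frac{11}{\sqrt{133}} \approx 0.954 $$

The three points lie close to, but not exactly on, a line with positive slope, so r is positive and near 1.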
Slide 18
Properties of Correlation
r is a measure of the strength of the linear relationship between x and y.
r has no units [like demand elasticity in economics, whose range is (-infinity, 0)].
-1 ≤ r ≤ 1
Slide 19
Properties (cont.)
r ranges from -1 to +1.
"r" quantifies the strength and direction of a linear relationship between 2 quantitative variables.
Strength: how closely the points follow a straight line.
Direction: r is positive when individuals with higher x values tend to have higher y values, and negative when higher x values tend to go with lower y values.
Slide 20
Properties of Correlation (cont.)
r = -1 only if y = a + bx with slope b < 0, e.g. y = 11 - x
r = +1 only if y = a + bx with slope b > 0, e.g. y = 1 + 2x
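A quick numerical check of the two extreme cases, assuming hypothetical x values 0 through 5 placed exactly on the two example lines. The function name `correlation` is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r = Sxy / sqrt(Sxx * Syy), from deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [0, 1, 2, 3, 4, 5]                          # hypothetical x values
r_up = correlation(xs, [1 + 2 * x for x in xs])  # y = 1 + 2x, slope b = 2 > 0
r_down = correlation(xs, [11 - x for x in xs])   # y = 11 - x, slope b = -1 < 0
print(r_up, r_down)                              # prints 1.0 -1.0
```

Only the sign of the slope matters: any exact line with positive slope gives r = +1 and any with negative slope gives r = -1, whatever the steepness.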
Slide 21
Properties (cont.)
High correlation does not imply cause and effect.
CARROTS: Hidden terror in the produce department at your neighborhood grocery!
- Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!!!
- Everyone who ate carrots in 1865 is now dead!!!
- 45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!!!
Slide 22
Properties (cont.) Cause and Effect
There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).
Improper training? Will no firemen present result in the least amount of damage?
Slide 23
Properties (cont.) Cause and Effect
r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.
Correlation r = .935, where x = fouls committed by a player and y = points scored by the same player:
(1,2) (24,75) (1,0) (18,59) (9,9) (3,7) (5,35) (20,46) (1,0) (3,2) (22,57)
The correlation is due to a third, lurking variable: playing time.
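The reported r can be reproduced from the eleven (fouls, points) pairs with this Python sketch. It computes r only; it cannot reveal the lurking variable, because playing time is not part of the data. The function name `correlation` is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r = Sxy / sqrt(Sxx * Syy), from deviations about the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (fouls committed, points scored) for the same player, as listed on the slide
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [f for f, _ in pairs]
points = [p for _, p in pairs]

r = correlation(fouls, points)
print(f"r = {r:.3f}")   # prints r = 0.935
```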