Chapter 4 Describing the Relation Between Two Variables
4.1 Scatter Diagrams; Correlation

Math n Statistic


Bivariate data is data in which two variables are measured on an individual.

The response variable is the variable whose value can be explained or determined based upon the value of the predictor variable.

A lurking variable is one that is related to the response variable, the predictor variable, or both, but is excluded from the analysis.


A scatter diagram shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The predictor variable is plotted on the horizontal axis and the response variable is plotted on the vertical axis. Do not connect the points when drawing a scatter diagram.


EXAMPLE Drawing a Scatter Diagram

The following data are based on a study of drilling in rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So the depth at which drilling begins is the predictor variable, x, and the time (in minutes) to drill 5 feet is the response variable, y. Draw a scatter diagram of the data.

Source: Penner, R., and Watts, D.G. “Mining Information.” The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.


Two variables that are linearly related are said to be positively associated when above-average values of one variable are associated with above-average values of the other variable. That is, two variables are positively associated if, as the values of the predictor variable increase, the values of the response variable also tend to increase.


Two variables that are linearly related are said to be negatively associated when above-average values of one variable are associated with below-average values of the other variable. That is, two variables are negatively associated if, as the values of the predictor variable increase, the values of the response variable tend to decrease.
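The "above average with above average" idea can be checked directly. Each product of deviations from the means, (x − x̄)(y − ȳ), is positive when both values fall on the same side of their means. It is negative when they fall on opposite sides. The sign of the sum of these products therefore gives the direction of the association. The following is a minimal sketch. The helper name `association_direction` and the data are hypothetical and for illustration only.

```python
from statistics import mean


def association_direction(xs, ys):
    """Classify the direction of a linear association from paired data.

    Each product (x - x_bar) * (y - y_bar) is positive when both values lie on
    the same side of their means and negative when they lie on opposite sides,
    so the sign of the sum indicates the overall direction.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no linear association"


# Hypothetical data, made up for illustration
print(association_direction([1, 2, 3, 4, 5], [2, 4, 5, 7, 9]))  # positive
print(association_direction([1, 2, 3, 4, 5], [9, 7, 6, 3, 1]))  # negative
```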


The linear correlation coefficient, or Pearson product moment correlation coefficient, is a measure of the strength of the linear relation between two quantitative variables. We use the Greek letter ρ (rho) to represent the population correlation coefficient and r to represent the sample correlation coefficient. We shall only present the formula for the sample correlation coefficient.
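The sample formula in standard notation is given below. The symbols are assumed here because they are not defined elsewhere in this section. $\bar{x}$ and $s_x$ are the sample mean and sample standard deviation of the predictor variable. $\bar{y}$ and $s_y$ are the same quantities for the response variable. $n$ is the number of individuals. The second form follows from pulling the constants $s_x$ and $s_y$ out of the sum.

$$
r = \frac{\sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}
$$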

Properties of the Linear Correlation Coefficient

1. The linear correlation coefficient is always between -1 and 1, inclusive. That is, -1 ≤ r ≤ 1.

2. If r = +1, there is a perfect positive linear relation between the two variables.

3. If r = -1, there is a perfect negative linear relation between the two variables.

4. The closer r is to +1, the stronger the evidence of positive association between the two variables.

5. The closer r is to -1, the stronger the evidence of negative association between the two variables.

6. If r is close to 0, there is little or no evidence of a linear relation between the two variables. Because the linear correlation coefficient measures the strength of a linear relation, an r close to 0 does not imply no relation, just no linear relation.

7. It is a unitless measure of association, so the units of measure for x and y play no role in the interpretation of r.


EXAMPLE Drawing a Scatter Diagram and Computing the Correlation Coefficient

For the following data:

(a) Draw a scatter diagram and comment on the type of relation that appears to exist between x and y.

(b) By hand, compute the linear correlation coefficient.


EXAMPLE Determining the Linear Correlation Coefficient

Determine the linear correlation coefficient of the drilling data.


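Columns of the computation table: $x_i$, $\frac{x_i - \bar{x}}{s_x}$, $y_i$, $\frac{y_i - \bar{y}}{s_y}$, and the product $\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$.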
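The by-hand computation of r follows four steps:

1. Standardize each x_i and y_i by subtracting the mean and dividing by the sample standard deviation.
2. Multiply the two standardized values for each individual.
3. Add the products.
4. Divide the sum by n - 1.

The following sketch carries out these steps in Python. The data are hypothetical, not the drilling data, and `correlation_table` is an illustrative helper name.

```python
from statistics import mean, stdev


def correlation_table(xs, ys):
    """Return the rows of a computation table and the resulting sample r.

    Each row holds x_i, its standardized value, y_i, its standardized value,
    and the product of the two; r is the sum of the products divided by n - 1.
    """
    n = len(xs)
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    rows = []
    for x, y in zip(xs, ys):
        z_x = (x - x_bar) / s_x
        z_y = (y - y_bar) / s_y
        rows.append((x, z_x, y, z_y, z_x * z_y))
    r = sum(row[4] for row in rows) / (n - 1)
    return rows, r


# Hypothetical data, made up for illustration (not the drilling data)
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

rows, r = correlation_table(xs, ys)
print(f"{'x_i':>4} {'z_x':>8} {'y_i':>6} {'z_y':>8} {'product':>8}")
for x, z_x, y, z_y, prod in rows:
    print(f"{x:>4} {z_x:>8.4f} {y:>6.1f} {z_y:>8.4f} {prod:>8.4f}")
print(f"r = {r:.4f}")
```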


A linear correlation coefficient computed from observational data does not imply causation between the variables, even when it indicates a strong positive or negative association.


4.2 Least-squares Regression


EXAMPLE Finding an Equation that Describes a Linear Relation

(a) Find a linear equation that relates x (the predictor variable) and y (the response variable) by selecting two points and finding the equation of the line containing the points.

(b) Graph the equation on the scatter diagram.

(c) Use the equation to predict y if x = 5.

Using the following sample data:


The difference between the observed value of y and the predicted value of y is the error, or residual. That is,

residual = observed - predicted

Compute the residual for the prediction corresponding to x = 5.
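The two-point procedure and the residual can be sketched in a few lines. The line's slope is the rise over the run between the two chosen points. The intercept then follows from either point. The residual is observed minus predicted, so a positive residual means the observed y lies above the line. The points and the observed value at x = 5 below are hypothetical, not the example's sample data.

```python
def line_through(p, q):
    """Slope and intercept of the line through points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def residual(observed, predicted):
    """Residual = observed - predicted."""
    return observed - predicted


# Hypothetical points and observation, made up for illustration
slope, intercept = line_through((2, 5.0), (6, 11.0))
y_hat = slope * 5 + intercept      # predicted y at x = 5
print(slope, intercept, y_hat)     # 1.5 2.0 9.5
print(residual(10.0, y_hat))       # 0.5
```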


EXAMPLE Finding the Least-squares Regression Line

Using the sample data:

(a) Find the least-squares regression line.

(b) Interpret the slope and intercept.

(c) Predict y if x = 5.

(d) Compute the residual for x = 5.

(e) Draw the least-squares regression line on the scatter diagram of the data.


EXAMPLE Computing the Sum of Squared Residuals

Compute the sum of squared residuals for the line describing the relation between x and y that was obtained using two points. Compute the sum of squared residuals for the least-squares regression line. Which is smaller?
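The least-squares line is, by definition, the line that makes the sum of squared residuals as small as possible. Any other line, including one drawn through two chosen points, should therefore give a sum at least as large. The sketch below compares the two on hypothetical data. The two-point line passes through the first and last observations, which is an arbitrary choice for illustration.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed - predicted)^2 over all data points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_fit(xs, ys):
    """Least-squares slope and intercept from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


# Hypothetical data, made up for illustration
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

# Line through the first and last data points (a "two-point" line).
two_pt_slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
two_pt_intercept = ys[0] - two_pt_slope * xs[0]

ssr_two_point = sum_squared_residuals(xs, ys, two_pt_slope, two_pt_intercept)
ssr_least_squares = sum_squared_residuals(xs, ys, *least_squares_fit(xs, ys))
print(f"two-point line:     {ssr_two_point:.4f}")      # 0.1450
print(f"least-squares line: {ssr_least_squares:.4f}")  # 0.0910
```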


EXAMPLE Finding the Least-squares Regression Line

(a) Find the least-squares regression line for the drilling data.

(b) Use the line to predict the drilling time at x = 130 feet.

(c) Should the line be used to predict the drilling time at x = 400 feet? Why?

(d) Interpret the slope and y-intercept.
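Part (c) concerns extrapolation. A regression line describes the relation only over the range of x values that were actually observed. One common safeguard, assumed here rather than taken from the slides, is to refuse predictions outside that range. The following is a minimal sketch. The helper name, the fitted line, and the observed range are hypothetical and are not the drilling results.

```python
def predict_within_range(slope, intercept, x, x_min, x_max):
    """Predict y from a fitted line, rejecting x outside the observed range.

    Predicting outside [x_min, x_max] is extrapolation: the data say nothing
    about whether the linear relation still holds there.
    """
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} lies outside the observed range [{x_min}, {x_max}]")
    return slope * x + intercept


# Hypothetical fitted line and observed range, made up for illustration
slope, intercept, x_min, x_max = 1.97, 0.09, 1, 5

print(f"{predict_within_range(slope, intercept, 4, x_min, x_max):.2f}")  # 7.97
try:
    predict_within_range(slope, intercept, 12, x_min, x_max)
except ValueError as err:
    print(err)  # x = 12 lies outside the observed range [1, 5]
```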
