1 Chapter 10 Correlation and Regression

1 Chapter 10 Correlation and Regression. 2 Consider two variables of a population denoted x and y (e.g. weight and height) Goal: Determine if there is


Chapter 10

Correlation and Regression


Objective

Consider two variables of a population, denoted x and y (e.g. weight and height).

Goal:

Determine if there is a relation between x and y (correlation).

If there is a relation, find a method of predicting values (regression).


Examples

1. x: Height of the mother; y: Height of the daughter

2. x: Number of cigarettes per day; y: Lifespan

3. x: Daily calorie intake; y: Weight

4. x: Shoe size; y: Number of friends on Facebook


Example: This table includes a random sample of heights of mothers, fathers, and their daughters.

Question: Are the heights of daughters independent of the heights of their mothers?

Or is there a correlation between the heights of mothers and those of daughters?

If yes, how strong is it?


Example: This table includes a random sample of heights of mothers, fathers, and their daughters.

The heights of mothers and their daughters in this sample seem to be strongly correlated…

But heights of fathers and their daughters in this sample seem to be weakly correlated (if at all).

(we will soon see how we came to this conclusion)


Section 10.2 Correlation between two variables (x and y)

Objective

Investigate how two variables (x and y) are related (i.e. correlated). That is, how much they depend on each other.


Definitions

A correlation exists between two variables when the values of one appear to be associated with the values of the other in some way.

In this class, we are only interested in linear correlation.


Definitions

Linear correlation coefficient r: A numerical measure of the strength of the linear relationship between two variables, x and y, representing quantitative data.

r always lies in the closed interval [-1, 1]

(i.e. –1 ≤ r ≤ 1)

We use this value to conclude if there is (or is not) a linear correlation between the two variables.


Exploring the Data

We can often see a relationship between two variables by constructing a scatterplot.


Positive Correlation

We say the data has positive correlation if the data follows a line (with a positive slope).

The correlation coefficient (r) will be close to +1


Negative Correlation

We say the data has negative correlation if the data follows a line (with a negative slope).

The correlation coefficient (r) will be close to –1


No Correlation

We say the data has no correlation if the data does not seem to follow any line.

The correlation coefficient (r) will be close to 0


Interpreting r

r ≈ 1: Strong positive linear correlation

r ≈ 0: Weak linear correlation

r ≈ -1: Strong negative linear correlation


Nonlinear Correlation

The data may follow a curve, but if the pattern is not linear, the linear correlation coefficient (r) may be close to zero even though x and y are clearly related.
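A quick numerical check can make this concrete. The sketch below is an illustration and not part of the original slides: it uses hypothetical points on the parabola y = x² and a helper named `pearson_r` (a name chosen here) that applies the usual sum-based formula for r.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: y is completely determined by x, but along a curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(f"r for y = x^2 on a symmetric range: {r_curve:.3f}")
```

Here the relationship is perfect but not linear, and r comes out as zero because the rising and falling halves of the curve cancel.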


Requirements

1. The sample of paired (x, y) data is a random sample of quantitative data.

2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern.

3. Any outliers must be removed if they are known to be errors. (Note: We will not do this in this course)


Notation

$n$: Number of pairs of sample data

$\sum$: Denotes the addition of the items

$\sum x$: The sum of all x-values, $\sum x = x_1 + x_2 + \dots + x_n$

$\sum y$: The sum of all y-values, $\sum y = y_1 + y_2 + \dots + y_n$


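Notation

$\sum x^2$: The sum of the squares of all x-values, $\sum x^2 = x_1^2 + x_2^2 + \dots + x_n^2$

$(\sum x)^2$: The sum of the x-values, then squared, $(\sum x)^2 = (x_1 + x_2 + \dots + x_n)^2$

$\sum xy$: The sum of the products of paired x and y values, $\sum xy = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$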
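The difference between the sum of squares and the square of the sum is easy to mix up, so a small sketch may help. The data values and variable names below are hypothetical and chosen only for illustration.

```python
# Hypothetical paired sample data (not taken from the slides)
xs = [1, 2, 3]
ys = [4, 5, 6]

n = len(xs)                                   # number of pairs
sum_x = sum(xs)                               # sum of the x-values
sum_y = sum(ys)                               # sum of the y-values
sum_x_squared = sum(x ** 2 for x in xs)       # square each x first, then add
square_of_sum_x = sum(xs) ** 2                # add the x-values first, then square
sum_xy = sum(x * y for x, y in zip(xs, ys))   # multiply matching pairs, then add

print(n, sum_x, sum_y, sum_x_squared, square_of_sum_x, sum_xy)
```

For this data the sum of squares is 14 while the square of the sum is 36, and the sum of products uses only matching pairs (1·4 + 2·5 + 3·6).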


Correlation Coefficient

r: Sample linear correlation coefficient

ρ: Population linear correlation coefficient (i.e. the linear correlation between the two populations)

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\;\sqrt{n(\sum y^2) - (\sum y)^2}} $$

r measures the strength of a linear relationship between the paired values in a sample.

We use StatCrunch to compute r (Don’t panic!)
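For readers who want to see what the software is doing, the formula for r can be evaluated directly from the sums. This is a minimal sketch with hypothetical data; the function name `pearson_r` is an assumption made here and is not a StatCrunch feature.

```python
import math

def pearson_r(xs, ys):
    """Evaluate r from n and the sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired data, used only to exercise the formula
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

With these values the numerator is 5(66) – (15)(20) = 30 and the denominator is √50 · √30, which gives r of roughly 0.775.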


Example 1a

Make a scatterplot for the heights of mothers and daughters.

1. Enter data on StatCrunch (mother in 1st column, daughter in 2nd column)

2. Graphics – Scatter Plot

3. Select var1 for X variable (height of mother), select var2 for Y variable (height of daughter), click Create Graph!

4. Voila! (Does there appear to be correlation?)


Example 1b

Find the linear correlation coefficient of the heights.

1. Stat – Summary Stats – Correlation

2. Select var1 and var2 so they appear in the right box, click Calculate

3. The correlation coefficient is r = 0.802 (rounded)


Determining if Correlation Exists

We determine whether a population is correlated via a two-tailed test on a sample using a significance level (α)

H0 : ρ = 0 (i.e. not correlated)

H1 : ρ ≠ 0 (i.e. is correlated)

Again, two methods are available:

● Critical Regions (Use Table A-6)

● P-value (Use StatCrunch)

Note: In most cases we use significance level α = 0.05


Using Critical Regions

Use Table A-6 to find the critical values, which depend on the sample size n. Use both the positive and negative values (two-tailed).

● If r is in the critical region (below the negative critical value or above the positive one), we conclude that there is a linear correlation. (Since H0 is rejected)

● If r is not in the critical region, we say there is insufficient evidence of correlation. (Since we fail to reject H0)

[Figure: number line marked -1, 0, 1]
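The decision rule for critical regions reduces to a comparison of |r| with the table value. The sketch below assumes the critical value has already been looked up in Table A-6; the function name `in_critical_region` is hypothetical, and the (r, critical value) pairs are the ones quoted in this chapter's examples.

```python
def in_critical_region(r, critical_value):
    """Two-tailed rule: reject H0 (rho = 0) when |r| exceeds the critical value."""
    return abs(r) > critical_value

# Pairs of (r, critical value from Table A-6) quoted in the examples
heights = (0.802, 0.444)        # n = 20, alpha = 0.05
shoe_size = (0.409, 0.514)      # n = 15, alpha = 0.05

for label, (r, crit) in [("heights", heights), ("shoe size", shoe_size)]:
    if in_critical_region(r, crit):
        print(f"{label}: reject H0, conclude there is a linear correlation")
    else:
        print(f"{label}: fail to reject H0, insufficient evidence of correlation")
```

Using the absolute value covers both tails at once, so a strongly negative r is treated the same way as a strongly positive one.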


Example 1c: Using Critical Regions

Use a 0.05 significance level to determine if the heights are linearly correlated.

● From Example 1b, we found r = 0.802

● Since n = 20 and α = 0.05, using Table A-6, we find the critical values to be 0.444 and -0.444

Since r = 0.802 exceeds 0.444, r is in the critical region (reject H0), so we conclude the data is linearly correlated (at the 0.05 significance level).


Using P-value

Use StatCrunch to calculate the two-tailed P-value from a sample set (see Example 1c)

● If the P-value is less than α, we conclude that there is a linear correlation. (Since H0 is rejected)

● If the P-value is greater than α, we say there is insufficient evidence of correlation. (Since we fail to reject H0)


Example 1c: Using P-value

Use a 0.05 significance level to determine if the heights are linearly correlated.

● On StatCrunch: Stat – Summary Stats – Correlation

● Select var1, var2 so they appear in the right box, click Next

● Check “Display two-sided P-value from sig. test”, click Calculate

● Result: P-value < 0.0001

Since the P-value is less than α = 0.05 (reject H0), we conclude the data is linearly correlated


Caution! Know that the methods of this section apply only to a linear correlation.

If you conclude that there is no linear correlation, it is possible that there is some other association that is not linear.


Rounding the Linear Correlation Coefficient

Round r to three decimal places so that it can be compared to critical values in Table A-6


Properties of the Linear Correlation Coefficient r

1. –1 ≤ r ≤ 1

2. If all values of either variable are converted to a different scale, the value of r does not change.

3. The value of r is not affected by the choice of x and y. Interchange all x-values and y-values and the value of r will not change.

4. r measures strength of a linear relationship.

5. r is very sensitive to outliers; a single outlier can dramatically affect its value.
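Properties 2, 3 and 5 can be checked numerically. The sketch below uses hypothetical heights (illustrative values, not the sample from the slides) and a helper named `pearson_r`, which is an assumption made here.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient computed from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical heights in inches
mothers = [60, 62, 64, 66, 68]
daughters = [61, 63, 63, 67, 68]
r_base = pearson_r(mothers, daughters)

# Property 2: converting one variable to another scale (inches to cm) leaves r unchanged
mothers_cm = [2.54 * h for h in mothers]
r_rescaled = pearson_r(mothers_cm, daughters)

# Property 3: swapping the roles of x and y leaves r unchanged
r_swapped = pearson_r(daughters, mothers)

# Property 5: one unusual pair (75, 55) changes r dramatically
r_outlier = pearson_r(mothers + [75], daughters + [55])

print(f"base {r_base:.3f}, rescaled {r_rescaled:.3f}, "
      f"swapped {r_swapped:.3f}, with outlier {r_outlier:.3f}")
```

In this made-up sample the single added pair is enough to turn a strong positive r into a negative one, which is why outliers deserve a look before r is interpreted.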


Example 2

A new medication for high blood pressure was tested on a batch of 18 patients with different ages. The results were as follows:

Patient 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Age 56 34 76 12 56 33 67 69 22 11 65 43 23 66 19 84 27 39

Blood Pressure 194 133 250 71 201 133 227 230 124 68 219 157 123 222 182 298 113 146

(a) Plot the points

(b) Find the correlation coefficient

(c) Use a 0.05 significance level to test linear correlation


Example 2a: Plot the points

● Enter data in StatCrunch

● Go to: Graphics – Scatter Plot

● Select var1 and var2, hit Create Graph!


Example 2b: Find Correlation Coefficient

● Go to: Stat – Summary Stats – Correlation

● Select var1 and var2, hit Calculate

r = 0.964


Example 2c: Test for Correlation (α = 0.05)

Using Critical Values

Given n = 18 and α = 0.05, using Table A-6, the critical values are 0.468 and -0.468

r = 0.964

Since r is in the critical region (reject H0), we conclude there is linear correlation


Example 2c: Test for Correlation (α = 0.05)

Using P-value

● Go to: Stat – Summary Stats – Correlation

● Select var1 and var2, hit Next

● Check the P-value box, hit Calculate

r = 0.964

P-value < 0.0001

Since the P-value is less than α = 0.05 (reject H0), we conclude there is linear correlation


Example 3

A survey of 15 people was conducted to see how many friends people had on Facebook vs. their shoe size. The results were as follows:

Person 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Friends on FB 170 170 680 510 680 425 425 680 85 680 850 850 680 17 510

Shoe size 8.4 9.0 9.1 7.8 8.8 8.7 8.8 9.1 8.5 9.6 9.1 8.2 9.3 8.0 9.4

(a) Plot the points

(b) Find the correlation coefficient

(c) Use a 0.05 significance level to test linear correlation


Example 3a: Plot the points

● Enter data in StatCrunch

● Go to: Graphics – Scatter Plot

● Select var1 and var2, hit Create Graph!


Example 3b: Find Correlation Coefficient

● Go to: Stat – Summary Stats – Correlation

● Select var1 and var2, hit Calculate

r = 0.409


Example 3c: Test for Correlation (α = 0.05)

Using Critical Values

Given n = 15 and α = 0.05, using Table A-6, the critical values are 0.514 and -0.514

r = 0.409

Since r is not in the critical region (fail to reject H0), we conclude there is insufficient evidence of a linear correlation


Example 3c: Test for Correlation (α = 0.05)

Using P-value

● Go to: Stat – Summary Stats – Correlation

● Select var1 and var2, hit Next

● Check the P-value box, hit Calculate

r = 0.409

P-value = 0.1297

Since the P-value is greater than α = 0.05 (fail to reject H0), we conclude there is insufficient evidence of a linear correlation


Interpreting r: Explained Variation

The value of r² is the proportion of the variation in y that is explained by the linear relationship between x and y.

r = 0.623, r² = 0.388

r = 0.998, r² = 0.996

Low variance High variance


Example 4

Using the data in Example 2 (blood pressure vs. age), we found that the linear correlation coefficient is r = 0.964

What proportion of the variation in the patients’ blood pressure can be explained by the variation in the patients’ age?

With r = 0.964, we get r² = 0.929

We conclude that 0.929 (or about 93%) of the variation in blood pressure can be explained by the linear relationship between age and blood pressure.

This implies about 7% of the variation in blood pressure cannot be explained by age.
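The arithmetic in this example is a single squaring step, shown here as a short sketch. The value of r is the one quoted above; the variable names are chosen for illustration.

```python
r = 0.964                      # linear correlation coefficient quoted in Example 2
r_squared = r ** 2             # proportion of variation in y explained by the line
unexplained = 1 - r_squared    # proportion left unexplained

print(f"explained:   {r_squared:.3f}")
print(f"unexplained: {unexplained:.3f}")
```

Squaring always pulls the value toward zero, so a correlation that sounds moderate can explain a fairly small share of the variation.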


Common Errors Involving Correlation

1. Causation: It is wrong to conclude that correlation implies causality.

2. Linearity: There may be some relationship between x and y even when there is no linear correlation.


Caution!!! Know that correlation does not imply causality.

There may be correlation without causality.


Section 10.3 Regression

Objective

Given two linearly correlated variables (x and y), find the linear function (equation) that best describes the trend.


Equation of a line

Recall that the equation of a line is given by its slope and y-intercept

y = m x + b


Regression

For a set of data (with variables x and y) that is linearly correlated, we want to find the equation of the line that best describes the trend.

This process is called Regression


Definitions

x: The predictor variable (also called the explanatory variable or independent variable)

y: The response variable (also called the dependent variable)

Regression Equation: The equation that algebraically describes the relationship between the two variables

Regression Line: The graph of the regression equation (also called the line of best fit or least squares line)


Definitions

Regression Equation

ŷ = b0 + b1x

b0: y-intercept, b1: slope

Regression Line


Notation for Regression Equation

              Population         Sample
y-intercept   β0                 b0
Slope         β1                 b1
Equation      y = β0 + β1 x      ŷ = b0 + b1 x


Requirements

1. The sample of paired (x, y) data is a random sample of quantitative data.

2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern.

3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors.


Rounding b0 and b1

Round to three significant digits

If you use the formulas from the book, do not round intermediate values.


Example 1

Refer to the sample data given in Table 10-1 in the Chapter Problem.

Find the equation of the regression line in which the explanatory variable (x-variable) is the cost of a slice of pizza and the response variable (y-variable) is the corresponding cost of a subway fare.

(CPI = Consumer Price Index, not used)


Example 1 (continued)

x : 0.15 0.35 1.00 1.25 1.75 2.00

y : 0.15 0.35 1.00 1.35 1.50 2.00

1. Enter data in StatCrunch (columns)

2. Stat – Regression – Simple Linear

3. Select var1 and var2 (i.e. x and y values), click Calculate

b0 = 0.0346

b1 = 0.945

Regression Equation

ŷ = 0.0346 + 0.945x
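The coefficients reported by the software can be cross-checked by hand. The sketch below applies the standard least-squares formulas, b1 = (nΣxy – ΣxΣy) / (nΣx² – (Σx)²) and b0 = ȳ – b1·x̄, to the six data pairs listed in this example. These formulas are not spelled out in the slides, so treat them, and the function name `fit_line`, as assumptions of this illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    b0 = sum_y / n - b1 * sum_x / n                                 # y-intercept
    return b0, b1

# Data pairs from Example 1: cost of a pizza slice (x) and subway fare (y)
pizza = [0.15, 0.35, 1.00, 1.25, 1.75, 2.00]
subway = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]

b0, b1 = fit_line(pizza, subway)
print(f"b0 = {b0:.4f}, b1 = {b1:.3f}")
```

The results should land close to the coefficients quoted in the example; intermediate values are kept unrounded, in line with the rounding advice earlier in this section.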



Using the Regression Equation for Predictions

1. The predicted value of y is ŷ = b0 + b1x

2. Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well.

3. Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables.

4. Use the regression line for predictions only if the value of x does not go much beyond the scope of the available sample data. Predicting too far beyond the scope of the available sample data is called extrapolation, and it could result in bad predictions.

5. If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate, which is its sample mean (ȳ).


Using the Regression Equation for Predictions

Source: www.xkcd.com


Strategy for Predicting Values of Y


Using the Regression Equation for Predictions

If the regression equation is not a good model, the best predicted value of y is simply ȳ (the mean of the y values)

Remember, this strategy applies to linear patterns of points in a scatterplot.
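The prediction strategy can be sketched as a small function. The flag `model_is_useful` is a hypothetical stand-in for the judgment made from the scatterplot and the correlation test, and the helper names are assumptions; the data is the pizza and subway sample from Example 1.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def best_predicted_y(x, xs, ys, model_is_useful):
    """Use the regression line only when it is judged a good model."""
    if not model_is_useful:
        return mean(ys)              # fall back to the sample mean of y
    b0, b1 = fit_line(xs, ys)
    return b0 + b1 * x

pizza = [0.15, 0.35, 1.00, 1.25, 1.75, 2.00]
subway = [0.15, 0.35, 1.00, 1.35, 1.50, 2.00]

with_line = best_predicted_y(1.50, pizza, subway, model_is_useful=True)
without_line = best_predicted_y(1.50, pizza, subway, model_is_useful=False)
print(f"using the line: {with_line:.2f}, using the mean: {without_line:.2f}")
```

The x-value 1.50 was picked because it lies inside the range of the sample, so the call does not extrapolate.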


Definition

For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the y-value that is predicted by using the regression equation. That is,

Residual = (observed y) – (predicted y) = y – ŷ
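As a small worked illustration of this definition, the sketch below computes residuals for four points against a line. Both the points and the line ŷ = 5 + 4x are assumptions made for this example and are not data from the slides.

```python
# Illustrative (x, y) pairs and an assumed regression line y-hat = 5 + 4x
points = [(1, 4), (2, 24), (4, 8), (5, 32)]
b0, b1 = 5, 4

residuals = []
for x, y in points:
    predicted = b0 + b1 * x          # y-hat from the regression equation
    residuals.append(y - predicted)  # observed y minus predicted y

print(residuals)
```

A negative residual means the point lies below the line and a positive one means it lies above.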


Residuals


Definition

A straight line satisfies the least-squares property if the sum of the squares of the residuals is the smallest sum possible.

The best possible regression line satisfies this property (which is why it is also called the least squares line)


Least Squares Property

sum = (-5)² + 11² + (-13)² + 7² = 364 (any other line would yield a sum larger than 364)
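The claim that other lines give a larger sum can be spot-checked. The four points below are an assumption, chosen so that the line ŷ = 5 + 4x produces the residuals -5, 11, -13 and 7 quoted above; the function name `sum_squared_residuals` is also hypothetical.

```python
# Assumed points whose least-squares line is y-hat = 5 + 4x
points = [(1, 4), (2, 24), (4, 8), (5, 32)]

def sum_squared_residuals(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

best = sum_squared_residuals(5, 4)             # the least-squares line
shifted = sum_squared_residuals(6, 4)          # same slope, higher intercept
steeper = sum_squared_residuals(5, 4.5)        # same intercept, steeper slope

print(best, shifted, steeper)
```

Trying a couple of nearby lines is not a proof, but it shows the pattern: nudging either the intercept or the slope away from the least-squares values makes the sum grow.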