Chapter 14 Inference for Regression

Chapter 14 - StartLogicwellsmat.startlogic.com/.../apstat_ch14_cn.pdfExample –Page 794, #14.6 Exercise 14.1 (page 786) presents data on the lengths of two bones in five fossil specimens




Lesson 14-1, Part 1: Inference for Regression


Review Least-Squares Regression

A family doctor is interested in examining the relationship between a patient’s age and total cholesterol. He randomly selects 14 of his female patients and obtains the data presented in Table 1. The data are based upon results obtained from the National Center for Health Statistics.

Table 1

Age   Total Cholesterol   Age   Total Cholesterol
25    180                 42    183
25    195                 48    204
28    186                 51    221
32    180                 51    243
32    210                 58    208
32    197                 62    228
38    239                 65    269


Review Least-Squares Regression

1. What is the least-squares regression line for predicting total cholesterol from age for women?

The least-squares regression equation is ŷ = 151.3537 + 1.3991x, where ŷ represents the predicted total cholesterol for a female whose age is x.


Review Least-Squares Regression

2. What is the correlation coefficient between age and cholesterol? Interpret the correlation coefficient in the context of the problem.

The linear correlation coefficient is r = 0.718. There is a moderate, positive linear relationship between female age and total cholesterol.


Review Least-Squares Regression

3. What is the predicted cholesterol level of a 67-year-old female?

$$\hat{y} = 151.3537 + 1.3991x$$

$$\widehat{\text{cholesterol}} = 151.3537 + 1.3991(\text{age}) = 151.3537 + 1.3991(67) \approx 245$$


Review Least-Squares Regression

4. Interpret the slope of the regression line in the context of the problem.

For each increase in age of one year, the total cholesterol is predicted to increase by 1.3991.
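The four review answers can be checked numerically. The following is a minimal sketch in plain Python (the notes themselves use a TI calculator); the function name `least_squares` and the variable names are illustrative choices, not part of the original notes.

```python
import math

# Table 1: age (x) and total cholesterol (y) for the 14 female patients.
ages = [25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65]
chol = [180, 195, 186, 180, 210, 197, 239, 183, 204, 221, 243, 208, 228, 269]


def least_squares(x, y):
    """Return intercept a, slope b, and correlation r for y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = y_bar - b * x_bar            # intercept
    r = sxy / math.sqrt(sxx * syy)   # linear correlation coefficient
    return a, b, r


a, b, r = least_squares(ages, chol)
predicted_67 = a + b * 67            # question 3: prediction at age 67

print(f"y-hat = {a:.4f} + {b:.4f}x")
print(f"r = {r:.3f}")
print(f"predicted cholesterol at age 67: {predicted_67:.0f}")
```

The printed values should agree with the answers above to the number of decimals shown there.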


Statistics and Parameters

• When doing inference for regression, we use ŷ = a + bx to estimate the population regression line.
  ▫ a and b are estimators of the population parameters α and β, the intercept and slope of the population regression line.


Conditions

• The conditions necessary for doing inference for regression are:
  ▫ For each given value of x, the values of the response variable y are independent and normally distributed.
  ▫ For each value of x, the standard deviation, σ, of the y-values is the same.
  ▫ The mean responses of the y-values for the fixed values of x are linearly related by the equation μy = α + βx.


Standard Error of the Regression Line

• Gives the variability of the vertical distances of the y-values from the regression line

• Remember that a residual was the error involved when making a prediction from the regression equation

• The spread around the line is measured with the standard deviation of the residuals, s.

$$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}}$$


Standard Error of the Slope of the Regression Line

• Gives the variability of the estimates of the slope of the regression line

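$$SE_b = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}}{\sqrt{\sum (x_i - \bar{x})^2}}$$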


Summary

• Inference for regression depends upon estimating μy = α + βx with ŷ = a + bx.

• For each x, the response values of y are independent and follow a normal distribution, each distribution having the same standard deviation.

• Inference for regression depends on the following statistics:
  ▫ a, the estimate of the y-intercept, α, of μy
  ▫ b, the estimate of the slope, β, of μy
  ▫ s, the standard error of the residuals
  ▫ SEb, the standard error of the slope of the regression line


Computing the Standard Error of the Residuals

Age, x   Total cholesterol, y   ŷ = 151.3537 + 1.3991x   Residual (y – ŷ)   Residual² (y – ŷ)²
25       180                    186.33                    -6.33              40.0689
25       195                    186.33                    8.67               75.1689
28       186                    190.53                    -4.53              20.5209
32       180                    196.12                    -16.12             259.8544
32       210                    196.12                    13.88              192.6544
32       197                    196.12                    0.88               0.7744
38       239                    204.52                    34.48              1188.8704
42       183                    210.12                    -27.12             735.4944
48       204                    218.51                    -14.51             210.5401
51       221                    222.71                    -1.71              2.9241
51       243                    222.71                    20.29              411.6841
58       208                    232.50                    -24.50             600.25
62       228                    238.10                    -10.10             102.01
65       269                    242.30                    26.70              712.89

Σ residuals² = 4553.705


Computing Standard Error

$$s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\frac{4553.705}{14 - 2}} = 19.48$$
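The same computation can be sketched in Python. This is only an illustration: it uses the coefficients reported in the notes without rounding each fitted value to two decimals, so the sum of squared residuals differs slightly from the table's total, while s still rounds to 19.48. The value of SEb for these data is not given in the notes; it is computed here from the formula for the standard error of the slope.

```python
import math

# Table 1 data and the coefficients reported in the notes.
ages = [25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65]
chol = [180, 195, 186, 180, 210, 197, 239, 183, 204, 221, 243, 208, 228, 269]
a, b = 151.3537, 1.3991

n = len(ages)

# Residual = observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(ages, chol)]
sse = sum(e ** 2 for e in residuals)

# Standard error of the residuals: divide by n - 2, then take the square root.
s = math.sqrt(sse / (n - 2))

# Standard error of the slope: s divided by sqrt(sum of (x - x_bar)^2).
x_bar = sum(ages) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
se_b = s / math.sqrt(sxx)

print(f"sum of squared residuals = {sse:.1f}")
print(f"s = {s:.2f}")
print(f"SE_b = {se_b:.3f}")
```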


Example – Page 787, #14.2

Body weights and backpack weights were collected for eight students.

Weight (lbs):            120  187  109  103  131  165  158  116
Backpack weight (lbs):    26   30   26   24   29   35   31   28

These data were entered into a statistical package, and least-squares regression of backpack weight on body weight was requested. Here are the results.


Example – Page 787, #14.2

Predictor   Coef      Stdev     t-ratio   p
Constant    16.265    3.937     4.13      0.006
BodyWT      0.09080   0.02831   3.21      0.018

S = 2.270   R-sq = 63.2%   R-sq(adj) = 57.0%

A) What is the equation of the least-squares line?

Backpack weight = 16.265 + 0.09080(body weight)


Example – Page 787, #14.2

B) The model for regression inference has three parameters, which we call α, β and σ. Can you determine the estimates for α and β from the computer printout?

a = 16.265 estimates the true intercept α, and b = 0.09080 estimates the true slope β.


Example – Page 787, #14.2

C) The computer output reports that s = 2.270. This is an estimate of the parameter σ. Use the formula for s to verify the computer’s value of s.

Use your TI to verify this.
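As an alternative to the calculator, the printout can be reproduced from the raw backpack data. The sketch below is one possible way to do it in plain Python; the variable names are illustrative, and only the data and the printed values come from the exercise.

```python
import math

# Exercise 14.2 data: body weight (x) and backpack weight (y), both in lbs.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

n = len(body)
x_bar = sum(body) / n
y_bar = sum(pack) / n
sxx = sum((x - x_bar) ** 2 for x in body)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body, pack))

b = sxy / sxx              # estimate of the slope beta
a = y_bar - b * x_bar      # estimate of the intercept alpha

# s estimates sigma: sqrt(sum of squared residuals / (n - 2)).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(body, pack))
s = math.sqrt(sse / (n - 2))

se_b = s / math.sqrt(sxx)  # standard error of the slope
t_ratio = b / se_b         # t statistic for Ho: beta = 0

print(f"a = {a:.3f}, b = {b:.5f}")
print(f"s = {s:.3f}, SE_b = {se_b:.5f}, t = {t_ratio:.2f}")
```

The results should match the Coef, Stdev, t-ratio, and S entries of the printout up to rounding.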


Example – Page 788, #14.4

Exercise 3.71 on page 187 provided data on the speed of competitive runners and the number of steps they took per second. Good runners take more steps per second as they speed up. Here are the data again.

Speed (feet per second):  15.86  16.88  17.50  18.62  19.97  21.06  22.11
Steps per second:          3.05   3.12   3.17   3.25   3.36   3.46   3.55

A) Enter the data into your calculator, perform least-squares regression, and plot the scatterplot with the least-squares line. What is the strength of the association between speed and steps per second?


Example – Page 788, #14.4

Steps = 1.77 + 0.0803(speed). There is a very strong positive linear relationship between speed and steps; r = 0.999. Nearly all the variation in steps per second (r² = 0.998, or 99.8%) is explained by the linear relationship.


Example – Page 788, #14.4

[Scatterplot: steps per second versus speed (feet per second)]


Example – Page 788, #14.4

C) The model for regression inference has three parameters, α, β and σ. Estimate these parameters from the data.

a = 1.766 is the estimate of α

b = 0.0803 is the estimate of β

s = 0.0091 is the estimate of σ


Lesson 14-1, Part 2: Inference for Regression


Significance Test for the Slope of a Regression Line

• We want to test whether the slope of the regression line is zero or not.
  ▫ If the slope of the line is zero, then there is no linear relationship between the x and y variables.
  ▫ Remember from the formula b = r(sy/sx) that if r = 0, then b = 0.

• Hypotheses
  ▫ Two-tailed: Ho: β = 0 and Ha: β ≠ 0
  ▫ Left-tailed: Ho: β = 0 and Ha: β < 0
  ▫ Right-tailed: Ho: β = 0 and Ha: β > 0


Test Statistics and Confidence Interval

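• The test statistic has a t distribution with n – 2 degrees of freedom. Under Ho: β = 0,

$$t = \frac{b - \beta}{SE_b} = \frac{b}{SE_b}$$

• The confidence interval for the slope is

$$b \pm t^* SE_b$$

• SEb = standard error of the slope

$$SE_b = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$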


Reading Computer Printouts


Example – Page 794, #14.6

Exercise 14.1 (page 786) presents data on the lengths of two bones in five fossil specimens of the extinct beast Archaeopteryx. Here is part of the output from the S-PLUS statistical software when we regress the length y of the humerus on the length x of the femur.

Coefficients
              Value     Std Error   t value   Pr(>|t|)
(Intercept)   –3.6596   4.4590      –0.8207   0.4719
Femur          1.1969   0.0751


Example – Page 794, #14.6

A) What is the equation of the least-squares regression line?

$$\widehat{\text{humerus}} = -3.6596 + 1.1969(\text{femur})$$


Example – Page 794, #14.6

B) We left out the t statistic for testing Ho: β = 0 and its P-value. Use the output to find t.

$$t = \frac{b}{SE_b} = \frac{1.1969}{0.0751} = 15.94$$


Example – Page 794, #14.6

C) How many degrees of freedom does t have? Use Table C to approximate the P-value of t against the one-sided alternative Ha: β > 0.

df = n – 2 = 5 – 2 = 3; since t > 12.92, we know P-value < 0.0005.

Calculator check: tcdf(15.9374, 99, 3) = 2.685 × 10⁻⁴
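The calculator's tcdf value can also be checked without statistical software. The sketch below relies on a closed-form expression for the t distribution with exactly 3 degrees of freedom; that formula is a standard result but is not part of these notes, and the function name `t3_upper_tail` is an illustrative choice.

```python
import math


def t3_upper_tail(t):
    """P(T > t) for a t distribution with 3 degrees of freedom.

    Uses the closed form of the CDF for df = 3:
    F(t) = 1/2 + (1/pi) * [u / (1 + u^2) + arctan(u)], with u = t / sqrt(3).
    """
    u = t / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 1 - cdf


t_stat = 1.1969 / 0.0751          # t = b / SE_b from the S-PLUS output
p_value = t3_upper_tail(t_stat)   # one-sided P-value for Ha: beta > 0

print(f"t = {t_stat:.4f}")
print(f"P-value = {p_value:.3e}")
```

The result should be close to the tcdf value above and well below the Table C bound of 0.0005.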


Example – Page 794, #14.6

D) Write a sentence to describe your conclusion about the slope of the true regression line.

There is very strong evidence that β > 0, that is, that the line is useful for predicting the length of the humerus given the length of the femur.


Example – Page 794, #14.6

E) Determine a 99% confidence interval for the true slope of the regression line.

$$b \pm t^* SE_b = 1.1969 \pm 5.841(0.0751) = (0.758, 1.636)$$
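The interval arithmetic is easy to wrap in a small helper. This is a minimal sketch: the function name `slope_confidence_interval` is hypothetical, and the critical value t* = 5.841 is taken from Table C (99% confidence, df = 3) rather than computed.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


# Values from the S-PLUS output; t* for 99% confidence with df = 3.
low, high = slope_confidence_interval(b=1.1969, se_b=0.0751, t_star=5.841)
print(f"({low:.3f}, {high:.3f})")
```

The same helper applies to any slope interval in this chapter once b, SEb, and the appropriate t* are read from the output and Table C.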


Example – Page 794, #14.8

There is some evidence that drinking moderate amounts

of wine helps prevent heart attacks. Exercise 3.63 (Page 183)

gives data on yearly wine consumption (liters of alcohol from

drinking wine, per person) and yearly deaths from heart

disease (deaths per 100,000 people) in 19 developed

nations.

A) Is there statistically significant evidence of a negative

association between wine consumption and heart disease

deaths? Carry out the appropriate test of significance and

write a summary statement about your conclusions.


Example – Page 794, #14.8

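Ho: β = 0
Ha: β < 0

Here β is the slope of the true regression line of heart disease deaths on wine consumption. A negative slope means a negative association between wine consumption and heart disease deaths.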


Example – Page 794, #14.8

Linear Regression T-test

Conditions

1. The observations are independent

2. The true relationship is linear (check scatterplot to check

that the overall pattern is linear or plot of residuals

against the predicted values)

3. The standard deviation of the response about the true

line is the same everywhere (make sure the spread

around the line is nearly constant)

4. The response varies normally about the true regression

line (normal probability plot of residuals is quite straight)


Example – Page 794, #14.8

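$$t = \frac{b}{SE_b} = \frac{-22.97}{3.557} \approx -6.46, \qquad SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$

P-value = 2.96 × 10⁻⁶ (from Table C, P-value < 0.0005)

Reject Ho, since the P-value (< 0.0005) is less than α = 0.05, and conclude that there is a negative linear relationship between wine consumption and heart disease deaths.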


Example – Page 795, #14.10

Exercise 14.4 (page 788) presents data on the relationship between the speed of runners (x, in feet per second) and the number of steps y that they take in a second. Here is part of the Data Desk regression output for these data:

R squared = 99.8%
s = 0.0091 with 7 – 2 = 5 degrees of freedom

Variable   Coefficient   s.e. of Coeff   t-ratio   prob
Constant   1.76608       0.0307          57.6      <0.0001
speed      0.080284      0.0016          49.7      <0.0001


Example – Page 795, #14.10

A) How can you tell from this output, even without the scatterplot, that there is a very strong straight-line relationship between running speed and steps per second?


Example – Page 795, #14.10

r² is very close to 1, which means that nearly all the variation in steps per second is accounted for by foot speed. Also, the P-value for the test of β = 0 is small.


Example – Page 795, #14.10

B) What parameter in the regression model gives the rate at which steps per second increase as running speed increases? Give a 99% confidence interval for this rate.


Example – Page 795, #14.10

β (the slope) is this rate; the estimate is listed as the coefficient of “speed,” 0.080284.

$$b \pm t^* SE_b = 0.080284 \pm 4.032(0.0016) = (0.074, 0.087)$$


Lesson 14-2, Part 1: Predictions and Conditions


Confidence Intervals

• Write the given value of the explanatory variable x as x*.
  ▫ The distinction between predicting a single outcome and predicting the mean of all outcomes when x = x* determines what margin of error is correct.

• To estimate the mean response, µy = α + βx*, we use a confidence interval.

• To estimate an individual response y, we use a prediction interval.


Confidence Intervals for Regression Response

A level C confidence interval for the mean response µy when x takes the value x* is

$$\hat{y} \pm t^* SE_{\hat{\mu}}$$

The standard error is

$$SE_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x - \bar{x})^2}}$$


Prediction Intervals for Regression Response

A level C prediction interval for a single observation on y when x takes the value x* is

$$\hat{y} \pm t^* SE_{\hat{y}}$$

The standard error is

$$SE_{\hat{y}} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x - \bar{x})^2}}$$
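To see how the two standard errors differ in practice, the sketch below applies both formulas to the age and cholesterol data from Table 1. The choice of x* = 50 is purely illustrative (the notes do not work this case), the function name is hypothetical, and the critical value 2.179 is the Table C entry for 95% confidence with df = 12.

```python
import math

# Table 1 data: age (x) and total cholesterol (y).
ages = [25, 25, 28, 32, 32, 32, 38, 42, 48, 51, 51, 58, 62, 65]
chol = [180, 195, 186, 180, 210, 197, 239, 183, 204, 221, 243, 208, 228, 269]


def interval_standard_errors(x, y, x_star):
    """Return (y_hat, se_mean, se_pred) at x_star for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(core)      # standard error for the mean response
    se_pred = s * math.sqrt(1 + core)  # standard error for a single observation
    return a + b * x_star, se_mean, se_pred


T_STAR = 2.179  # Table C: 95% confidence, df = 14 - 2 = 12

y_hat, se_mean, se_pred = interval_standard_errors(ages, chol, x_star=50)
conf_int = (y_hat - T_STAR * se_mean, y_hat + T_STAR * se_mean)
pred_int = (y_hat - T_STAR * se_pred, y_hat + T_STAR * se_pred)

print(f"fit at x* = 50: {y_hat:.2f}")
print(f"95% CI for mean response: ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"95% PI for one patient:   ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

Because of the extra 1 under the square root, the prediction interval is always wider than the confidence interval at the same x* and confidence level.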


Conditions for Regression Inference

• The observations are independent

• The true relationship is linear

• The standard deviation of the response about the true line is the same everywhere.

• The response varies normally about the true regression line.

• Check conditions using the residuals.


Examine the residual plot to check that the relationship is roughly linear and that the scatter about the line is the same from end to end.


Violations of the regression conditions: The variation of the residuals is not constant.


Violations of the regression conditions: There is a curved relationship between the response variable and the explanatory variable.


Example – Page 802, #14.12

A) The residuals for the crying and IQ data appear in Example 14.3 (page 785). Make a stemplot to display the distribution of the residuals. Are there outliers or signs of strong departures from normality?

-19.20  -31.13  -22.65  -15.18  -12.18  -15.15  -16.63   -6.18
 -1.70  -22.60   -6.68   -6.17   -9.15  -23.58   -9.14    2.80
 -9.14   -1.66   -6.14  -12.60    0.34   -8.62    2.85   14.30
  9.82   10.82    0.37    8.85   10.87   19.34   10.89   -2.55
 20.85   24.35   18.94   32.89   18.47   51.32


Example – Page 802, #14.12

-3 | 1
-2 | 4 3 3
-1 | 0 8 5 5 3 2
-0 | 9 9 9 9 7 6 6 6 3 2 2 0
 0 | 0 3 3 9
 1 | 0 1 1 1 4 8 9 9
 2 | 1 4
 3 | 3
 4 |
 5 | 1

One residual (51.32) may be a high outlier, but the stemplot does not show any other deviations from normality.


Example – Page 802, #14.12

B) What other assumptions or conditions are required for

using inference for regression on these data? Check that

those conditions are satisfied and then describe your

findings.


Example – Page 802, #14.12

The scatter of the data points about the regression line varies to some extent as we move along the line, but the variation is not serious, as a residual plot shows. The other conditions can be assumed to be satisfied.


Example – Page 802, #14.12

C) Would a 95% prediction interval for x = 25 be narrower, the same size, or wider than a 95% confidence interval? Explain your reasoning.

A prediction interval would be wider. For a fixed

confidence level, the margin of error is always larger

when we are predicting a single observation than when

we are estimating the mean response.


Example – Page 802, #14.12

D) A computer package reports that the 95% prediction interval for x = 25 is (91.85, 165.33). Explain what this interval means in simple language.

We are 95% confident that when x (crying intensity) = 25, the corresponding value of y (IQ) will be between 91.85 and 165.33.


Example – Page 802, #14.14

In Exercise 14.11 (page 795) we regressed the lean of the

Leaning Tower of Pisa on year to estimate the rate at which

the tower is tilting. Here are the residuals from that

regression, in order by years across the rows:

4.220 3.099 0.418 1.264 2.055 3.626 2.308

5.011 0.670 4.648 5.967 1.714 7.396

Use the residuals to check the regression conditions, and

describe your findings. Is the regression in exercise 14.11

trustworthy?


Example – Page 802, #14.14

[Figures: residual plot (residuals versus year) and normal probability plot of the residuals]

The scatterplot of the residuals versus year does not suggest any problems. The regression in Exercise 14.11 should be fairly reliable.


Example – Page 809, #14.24

Here are data on the time (in minutes) Professor Moore takes

to swim 2000 yards and his pulse rate (beats per minute)

after swimming:

Time: 34.12 35.72 34.72 34.05 34.13 35.72 36.17 35.57

Pulse: 152 124 140 152 146 128 136 144

Time: 35.37 35.57 35.43 36.05 34.85 34.70 34.75 33.93

Pulse: 148 144 136 124 148 144 140 156

Time: 34.60 34.00 34.35 35.62 35.68 35.28 35.97

Pulse: 136 148 148 132 124 132 139


Example – Page 809, #14.24

A scatterplot shows a negative linear relationship: a faster time (fewer minutes) is associated with a higher heart rate. Here is part of the output from the regression function in the Excel spreadsheet.

             Coefficients    Standard Error   t Stat         P-value
Intercept    479.9341457     66.22779275      7.246718119    3.87075E–07
X variable   –9.694903394    1.888664503      –5.1332057     4.37908E–05

Give a 90% confidence interval for the slope of the true

regression line. Explain what your result tells us about the

relationship between the professor’s swimming time and

heart rate.


Example – Page 809, #14.24

$$b \pm t^* SE_b = -9.6949 \pm 1.721(1.8887), \qquad df = 21$$

–12.9454 to –6.4444 bpm per minute

With 90% confidence, we can say that for each 1-minute increase in swimming time, pulse rate drops by 6 to 13 bpm.


Example – Page 809, #14.24

Using the TI


Example – Page 809, #14.25

Exercise 14.24 gives data on a swimmer’s time and heart rate. One day the swimmer completes his laps in 34.3 minutes but forgets to take his pulse. Minitab gives this prediction for heart rate when x* = 34.3:

Fit      StDev Fit   90.0% CI           90.0% PI
147.40   1.97        (144.02, 150.78)   (135.79, 159.01)

A) Verify that “Fit” is the predicted heart rate from the least-squares line found in Exercise 14.24. Then choose one of the intervals from the output to estimate the swimmer’s heart rate that day and explain why you chose this interval.


Example – Page 809, #14.25

$$\hat{y}\,(\text{pulse}) = 479.9 - 9.6949x\,(\text{time})$$

When x = 34.3 minutes,

$$\hat{y}\,(\text{pulse}) = 479.9 - 9.6949(34.3) = 147.37$$

This agrees with the “Fit” value of 147.40 in the output; the small difference comes from rounding the intercept to 479.9.


Example – Page 809, #14.25

The prediction interval is appropriate for estimating one value (as opposed to the mean of many values): 135.79 to 159.01 bpm.


Example – Page 809, #14.25

B) Minitab gives only one of the two standard errors used in prediction. It is $SE_{\hat{\mu}}$, the standard error for estimating the mean response. Use this fact and a critical value from Table C to verify Minitab’s 90% confidence interval for the mean heart rate on days when the swimming time is 34.3 minutes.


Example – Page 809, #14.25

$$\hat{y} \pm t^* SE_{\hat{\mu}} = 147.40 \pm 1.721(1.97), \qquad df = 21$$

144.01 to 150.79, which agrees with the computer output up to rounding.