Ch 15 – Inference for Regression

Example #1: The following data are pulse rates and heights for a group of 10 female statistics students.

Height 55 59 60 63 64 64 66 70 70 72

Pulse 53 53 58 62 63 65 68 70 73 76

a. Sketch a scatterplot of the data. What is the least-squares regression line for predicting pulse rate from height?

$\hat{y} = -27.5784 + 1.42579x$ where $\hat{y}$ = predicted pulse rate

$x$ = height

b. What is the correlation coefficient between height and pulse rate? Interpret this number.

r = 0.9746

Strong, positive linear relationship
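The answers to parts (a) and (b) can be checked by hand-coding the least-squares formulas. The following is a minimal sketch using only the Python standard library; the helper name `least_squares` is an illustrative choice, not something from the course materials.

```python
import math

# Heights and pulse rates from Example #1
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

def least_squares(x, y):
    """Return (intercept a, slope b, correlation r) for y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

a, b, r = least_squares(height, pulse)
print(f"y-hat = {a:.4f} + {b:.5f} x, r = {r:.4f}")
```

The printed intercept, slope, and correlation should agree with the values given in parts (a) and (b).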

c. What is the predicted pulse rate of a 59” tall student?

$\hat{y} = -27.5784 + 1.42579x$

$\hat{y} = -27.5784 + 1.42579(59)$

$\hat{y} = 56.54$

d. What is the residual for the 59” student?

$y - \hat{y} = 53 - 56.54 = -3.54$

e. Construct a residual plot and describe its meaning.

The residual plot shows no pattern, so a linear model is a good fit.
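Parts (c) through (e) can be reproduced numerically. This sketch assumes the rounded coefficients from part (a) and computes the prediction at 59 inches, its residual, and the full list of residuals that a residual plot would display. The function name `predict` is hypothetical.

```python
# Heights and pulse rates from Example #1
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

# Rounded LSRL coefficients from part (a)
a, b = -27.5784, 1.42579

def predict(x):
    """Predicted pulse rate for a student of height x."""
    return a + b * x

predicted_59 = predict(59)
residual_59 = 53 - predicted_59   # residual = observed - predicted

# Residuals for every student, in the same order as the data
residuals = [y - predict(x) for x, y in zip(height, pulse)]

print(f"predicted = {predicted_59:.2f}, residual = {residual_59:.2f}")
for x, e in zip(height, residuals):
    print(f"height {x}: residual {e:+.2f}")
```

Plotting `residuals` against `height` gives the residual plot asked for in part (e); the residuals of a least-squares line sum to approximately zero.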

Ok, so what is the new stuff for chapter 15?

$\hat{y} = a + bx$ This is not the true line for the population!

$\mu_y = \alpha + \beta x$ where $\alpha$ = true y-intercept and $\beta$ = true slope of the population

Remember: Residuals tell you information about the line and if it is a good model

Chapter 15 only focuses on slope.

We are going to determine if there is a linear relationship between two variables (that is, whether or not $\beta = 0$).

Conditions for Inference:

• The observations are independent
  Can’t do repeated observations on the same individual!

• The relationship is linear
  Look for patterns in the residual plot

• The standard deviation of the response about the true line is the same everywhere
  Look for spread in the residual plot

• The response varies Normally about the true regression line
  Histogram for residuals, look to see if approx normal

Standard Error about the LSRL:

$s$ = unbiased estimator of $\sigma$

Standard deviation of residuals

$s = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$
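As an illustration of the formula for $s$, the sketch below applies it to the pulse and height data from the first example. The notes do not report $s$ for that data set, so the printed value is only a by-product of this sketch; the function name `residual_std_error` is an assumption.

```python
import math

# Heights and pulse rates from the first example
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

def residual_std_error(x, y):
    """Standard error about the LSRL: sqrt(sum of squared residuals / (n - 2))."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))   # divide by n - 2, not n - 1

s = residual_std_error(height, pulse)
print(f"s = {s:.4f}")
```

The divisor is $n - 2$ because two parameters (intercept and slope) are estimated from the data before the residuals are computed.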

Calculator Tip! Standard Error

Stat – Tests – LinRegTTest

L1: x
L2: y
Leave RegEQ blank
Calculate: s = standard error

Confidence Intervals for Regression Slope:

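$b \pm t^*_{n-2}\, SE_b$

where $SE_b$ = standard error of the slope

$SE_b = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}} = \dfrac{\sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}}{\sqrt{\sum (x - \bar{x})^2}}$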

$SE_b$ estimates the variability in the sampling distribution of the estimated slope (how much slopes vary from experiment to experiment).

Minitab Printout:

The regression equation is
Predicted y = y-intercept + slope (x-variable)    ($\hat{y} = a + bx$)

Predictor    Coef              StDev     T                P
Constant     y-intercept (a)   ignore    ignore           ignore
X-variable   slope (b)         $SE_b$    test statistic   p-value (2-sided)

s = standard deviation of residuals    R-sq = $r^2$    R-sq(adj) = ignore

Example #1: Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and their later IQ test scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children’s IQ at age three years using the Stanford-Binet IQ test. The data are below.

Crying IQ Crying IQ Crying IQ Crying IQ

10 87 20 90 17 94 12 94

12 97 16 100 19 103 12 103

9 103 23 103 13 104 14 106

16 106 27 108 18 109 10 109

18 109 15 112 18 112 23 113

15 114 21 114 16 118 9 119

12 119 12 120 19 120 16 124

20 132 15 133 22 135 31 135

16 136 17 141 30 155 22 157

33 159 13 162

a. Label all important parts of the Minitab printout.

The regression equation is
IQ = 91.3 + 1.49 Crycount

Predictor    Coef    StDev     T        P
Constant     91.3    8.934     10.22    0.000
Crycount     1.49    0.4870    3.07     0.004

s = 17.50    R-sq = 20.7%    R-sq(adj) = 18.5%

Labels:
IQ = 91.3 + 1.49 Crycount: LSRL
91.3: y-intercept (a)
1.49: slope (b)
0.4870: $SE_b$
s = 17.50: standard deviation of the residuals
R-sq = 20.7%: coefficient of determination ($r^2$)

b. Sketch a scatterplot of the data.

c. Calculate the standard deviation of the residuals using your calculator.

$s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}} = 17.4987$

d. Construct a 95% confidence interval for the slope.

P: True slope of the line for crying vs. IQ

A:

• The observations are independent
  Each infant should be separate from another, not influencing the next test

• The relationship is linear
  No apparent patterns in the residuals

• The standard deviation of the response about the true line is the same everywhere
  Residuals spread out evenly

• The response varies Normally about the true regression line
  Histogram of residuals is slightly skewed right

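N: Linear regression t-interval

I: $b \pm t^*_{n-2}\, SE_b$

$1.49 \pm t^*_{38-2}(0.4870)$

$1.49 \pm 2.042(0.4870)$

(0.49844, 2.48735)

Note: $t^* = 2.042$ is the table value for df = 30, the nearest table row below df = 36. The endpoints come from the unrounded slope, $b \approx 1.4929$.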

C: I am 95% confident the true slope of the line for crying vs. IQ is between 0.49844 and 2.48735.

Note: 0 is not in the interval! This is evidence that crying and IQ have a linear relationship.

OR

I am 95% confident the mean IQ increases by between 0.49844 and 2.48735 points for each additional peak in crying.
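The interval in part (d) can be rebuilt from the raw data. The sketch below computes the slope, $s$, and $SE_b$ with the formulas from this chapter, then forms $b \pm t^* SE_b$. It assumes the critical value 2.042 used in the worked solution rather than looking one up, and the variable names are illustrative.

```python
import math

# (crying peaks, IQ) pairs from the table in this example, n = 38
data = [
    (10, 87), (20, 90), (17, 94), (12, 94),
    (12, 97), (16, 100), (19, 103), (12, 103),
    (9, 103), (23, 103), (13, 104), (14, 106),
    (16, 106), (27, 108), (18, 109), (10, 109),
    (18, 109), (15, 112), (18, 112), (23, 113),
    (15, 114), (21, 114), (16, 118), (9, 119),
    (12, 119), (12, 120), (19, 120), (16, 124),
    (20, 132), (15, 133), (22, 135), (31, 135),
    (16, 136), (17, 141), (30, 155), (22, 157),
    (33, 159), (13, 162),
]
crying = [c for c, _ in data]
iq = [q for _, q in data]
n = len(data)

x_bar = sum(crying) / n
y_bar = sum(iq) / n
sxx = sum((x - x_bar) ** 2 for x in crying)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)

b = sxy / sxx                       # slope
a = y_bar - b * x_bar               # intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in data)
s = math.sqrt(sse / (n - 2))        # standard deviation of the residuals
se_b = s / math.sqrt(sxx)           # standard error of the slope

t_star = 2.042                      # critical value taken from the worked solution
lower = b - t_star * se_b
upper = b + t_star * se_b

print(f"b = {b:.4f}, s = {s:.4f}, SE_b = {se_b:.4f}")
print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
```

Working from the unrounded slope is what makes the endpoints agree with the interval reported above; substituting the rounded value 1.49 shifts both endpoints slightly.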

Ch 15B – Hypothesis Testing for Slope

Remember:

$b = r\,\dfrac{s_y}{s_x}$

so, if r = 0, then b = 0
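The identity relating the slope to the correlation can be checked on the pulse and height data from the first example. This is a small sketch using the standard library's `statistics` module; it computes $r$ from standardized values and then recovers the slope.

```python
import statistics

# Heights and pulse rates from the first example
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

n = len(height)
x_bar = statistics.mean(height)
y_bar = statistics.mean(pulse)
s_x = statistics.stdev(height)   # sample standard deviation of x
s_y = statistics.stdev(pulse)    # sample standard deviation of y

# Correlation as the average product of z-scores (dividing by n - 1)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(height, pulse)) / (n - 1)

b = r * s_y / s_x                # slope recovered from r
print(f"r = {r:.4f}, b = {b:.5f}")
```

Because $s_y$ and $s_x$ are positive, the slope is zero exactly when $r$ is zero, which is why a test of the slope is also a test for a linear relationship.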

$H_0: \beta = 0$

Or there is no true linear relationship between x and y.

Test Statistic:

$t = \dfrac{b}{SE_b}$

Calculator Tip! Linear Regression t-Test

L1: x
L2: y
Leave RegEQ blank

Example #1: How well does the number of beers a student drinks predict his or her blood alcohol content (BAC)? Sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their BAC. The data are below.

Student   1     2     3     4     5     6      7     8     9     10    11    12    13     14    15    16
Beers     5     2     9     8     3     7      3     5     3     5     4     6     5      7     1     4
BAC       0.10  0.03  0.19  0.12  0.04  0.095  0.07  0.06  0.02  0.05  0.07  0.10  0.085  0.09  0.01  0.05

a. What is the least-squares regression line?

$\hat{y} = -0.0127 + 0.0180x$ where $\hat{y}$ = predicted BAC

$x$ = # of beers

b. Make a scatterplot of the data and describe its shape.

Positive, strong, linear relationship

c. What is the correlation coefficient? What does it mean?

r = 0.894

Strong, positive relationship

d. Label all important parts of the Minitab printout.

The regression equation is
BAC = – 0.0127 + 0.0180 Beers

Predictor    Coef        StDev       T        P
Constant     – 0.0127    0.01264     –1.00    0.332
Beers        0.017964    0.002402    7.48     0.000

s = 0.02044    R-sq = 80%    R-sq(adj) = 78.6%

Labels:
BAC = – 0.0127 + 0.0180 Beers: LSRL
– 0.0127: y-intercept (a)
0.017964: slope (b)
0.002402: $SE_b$
7.48: test statistic
0.000: probability (2-tailed)
s = 0.02044: standard deviation of the residuals
R-sq = 80%: coefficient of determination ($r^2$)

e. Verify the results by using your calculator.


L1: x
L2: y
Leave RegEQ blank

$SE_b = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}} = \dfrac{0.02044}{\sqrt{72.4375}} = 0.002402$
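The verification in part (e) can also be done without the calculator. The following sketch recomputes $\sum (x - \bar{x})^2$, $s$, and $SE_b$ from the beer data using only the standard library; the variable names are illustrative.

```python
import math

# Number of beers and measured BAC for the 16 students
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n
sxx = sum((x - x_bar) ** 2 for x in beers)   # sum of squared deviations of x
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac))

b = sxy / sxx                                 # slope
a = y_bar - b * x_bar                         # intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(beers, bac))
s = math.sqrt(sse / (n - 2))                  # standard deviation of the residuals
se_b = s / math.sqrt(sxx)                     # standard error of the slope

print(f"Sxx = {sxx:.4f}, s = {s:.5f}, SE_b = {se_b:.6f}")
print(f"y-hat = {a:.4f} + {b:.6f} x")
```

The results should match the Coef and StDev entries for Beers and the value of s on the Minitab printout.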

f. Conduct the hypothesis test to see if there is a positive relationship between # beers and BAC.

P: Determine if there is a positive linear relationship between # beers and BAC

H:

$H_0: \beta = 0$ The number of beers has no effect on BAC

$H_a: \beta > 0$ The number of beers has a positive linear effect on BAC.

A:

• The relationship is linear
  No apparent patterns in the residuals

• The standard deviation of the response about the true line is the same everywhere
  Residuals spread out evenly

• The response varies Normally about the true regression line

N: Linear regression t-test

T:

$t = \dfrac{b}{SE_b} = \dfrac{0.017964}{0.002402} = 7.48$

O: P(t > 7.48) = less than 0.0005

df = n – 2 = 16 – 2 = 14

Or: on calc

P(t > 7.48) = 0.000001

M: $p = 0.000001 < 0.05$

Reject the Null

S: There is enough evidence to claim that an increased number of beers does increase BAC.
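The test statistic and one-sided P-value can be approximated without a table. The sketch below takes $b$ and $SE_b$ from the Minitab printout and integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `upper_tail`, the truncation point, and the step count are all assumptions of this sketch; a statistics library such as SciPy (`scipy.stats.t.sf`) would normally be used instead where available.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t_obs, df, upper=400.0, steps=200_000):
    """Approximate P(T > t_obs) by Simpson's rule on [t_obs, upper].

    The area beyond `upper` is ignored; for df = 14 it is negligible.
    `steps` must be even.
    """
    h = (upper - t_obs) / steps
    total = t_pdf(t_obs, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(t_obs + i * h, df)
    return total * h / 3

b = 0.017964        # slope from the printout
se_b = 0.002402     # standard error of the slope from the printout
n = 16
df = n - 2

t_stat = b / se_b
p_value = upper_tail(t_stat, df)    # one-sided, matching Ha: beta > 0

print(f"t = {t_stat:.2f}, df = {df}, one-sided p = {p_value:.7f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

The one-sided P-value is far below 0.05, which agrees with the decision to reject the null hypothesis.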
