preston-manning
Ch 15 – Inference for Regression
Example #1: The following data are pulse rates and heights for a group of 10 female statistics students.
Height 55 59 60 63 64 64 66 70 70 72
Pulse 53 53 58 62 63 65 68 70 73 76
a. Sketch a scatterplot of the data. What is the least-squares regression line for predicting pulse rate from height?
$\hat{y} = -27.5784 + 1.42579x$, where $\hat{y}$ = predicted pulse rate and $x$ = height
b. What is the correlation coefficient between height and pulse rate? Interpret this number.
r = 0.9746
Strong, Positive relationship
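The slope, intercept, and correlation in parts (a) and (b) can be checked by hand from the sums of squares. The following is a minimal Python sketch using only the data from this example; the variable names are illustrative and do not come from the slides.

```python
import math

# Data from Example #1 (height in inches, pulse rate in beats per minute)
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

n = len(height)
x_bar = sum(height) / n
y_bar = sum(pulse) / n

# Sums of squares and cross-products about the means
sxx = sum((x - x_bar) ** 2 for x in height)
syy = sum((y - y_bar) ** 2 for y in pulse)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(height, pulse))

b = sxy / sxx                    # least-squares slope
a = y_bar - b * x_bar            # least-squares intercept
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

print(f"y-hat = {a:.4f} + {b:.5f} x, r = {r:.4f}")
```

The intercept comes out negative, which agrees with the prediction of 56.54 for a 59" student in part (c).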
c. What is the predicted pulse rate of a 59” tall student?
$\hat{y} = -27.5784 + 1.42579x$
$\hat{y} = -27.5784 + 1.42579(59)$
$\hat{y} = 56.54$
d. What is the residual for the 59” student?
$y - \hat{y} = 53 - 56.54 = -3.54$
e. Construct a residual plot and describe its meaning.
No pattern, so good linear model
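Parts (c) through (e) all rest on the same calculation: residual = observed y minus predicted y. A short sketch, assuming the line from part (a) with a negative intercept (which is what reproduces the 56.54 prediction); `predict` is a helper name chosen here for illustration:

```python
# Data from Example #1 (height in inches, pulse rate in beats per minute)
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

# Coefficients of the least-squares line as reported in part (a)
a, b = -27.5784, 1.42579


def predict(x):
    """Predicted pulse rate for a student of height x (inches)."""
    return a + b * x


# Residual = observed y minus predicted y, one per student
residuals = [y - predict(x) for x, y in zip(height, pulse)]

for x, y, e in zip(height, pulse, residuals):
    print(f"height {x}: observed {y}, predicted {predict(x):.2f}, residual {e:+.2f}")
```

Plotting `residuals` against `height` gives the residual plot asked for in part (e); the residuals should scatter around zero with no curve or fan shape.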
Ok, so what is the new stuff for chapter 15?
$\hat{y} = a + bx$: This is not the true line for the population!
$\mu_y = \alpha + \beta x$, where $\alpha$ = true y-intercept and $\beta$ = true slope of the population
Remember: Residuals tell you information about the line and if it is a good model
Chapter 15 only focuses on slope.
We are going to determine whether there is a linear relationship between two variables ($\beta \neq 0$ or $\beta = 0$).
Conditions for Inference:
• The observations are independent
  Can't do repeated observations on the same individual!
• The relationship is linear
  Look for patterns in the residual plot
• The standard deviation of the response about the true line is the same everywhere
  Look for spread in the residual plot
• The response varies Normally about the true regression line
  Make a histogram of the residuals and check that it is approximately Normal
Standard Error about the LSRL:
s = unbiased estimator of $\sigma$ (the standard deviation of the residuals)
$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}}$
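As a rough illustration of the formula for s, the sketch below fits the least-squares line and divides the sum of squared residuals by n - 2. The function name `regression_s` is made up for this example, and the value it gives for the pulse-rate data is computed here, not quoted from the slides.

```python
import math


def regression_s(xs, ys):
    """Standard error about the least-squares line: sqrt(SSE / (n - 2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope of the LSRL
    a = y_bar - b * x_bar      # intercept of the LSRL
    # Sum of squared residuals (observed minus predicted)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


# Pulse-rate data from the first example
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

s = regression_s(height, pulse)
print(f"s = {s:.4f}")
```

The divisor is n - 2 rather than n - 1 because two parameters (slope and intercept) are estimated from the data.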
Calculator Tip! Standard Error
Stat – Tests - LinRegTTest
Use L1: x and L2: y
Leave RegEq blank
Calculates s = standard error
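Confidence Intervals for Regression Slope:
$b \pm t^*_{n-2} \, SE_b$
where $SE_b$ = standard error of the slope:
$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{\sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}}{\sqrt{\sum (x - \bar{x})^2}}$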
$SE_b$ estimates the variability in the sampling distribution of the estimated slope (how much slopes vary from experiment to experiment).
Minitab Printout:
The regression equation is
Predicted y = y-intercept + slope × x-variable   ($\hat{y} = a + bx$)

Predictor    Coef              StDev    T                P
Constant     y-intercept (a)   ignore   ignore           ignore
X-variable   Slope (b)         SEb      test-statistic   p-value (2-sided)

s = standard deviation of residuals   R-sq = $r^2$   R-sq(adj) = ignore
Example #1: Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and their later IQ test scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children’s IQ at age three years using the Stanford-Binet IQ test. The data are below.
Crying IQ Crying IQ Crying IQ Crying IQ
10 87 20 90 17 94 12 94
12 97 16 100 19 103 12 103
9 103 23 103 13 104 14 106
16 106 27 108 18 109 10 109
18 109 15 112 18 112 23 113
15 114 21 114 16 118 9 119
12 119 12 120 19 120 16 124
20 132 15 133 22 135 31 135
16 136 17 141 30 155 22 157
33 159 13 162
a. Label all important parts of the Minitab printout.
The regression equation is
IQ = 91.3 + 1.49 Crycount   (LSRL)

Predictor   Coef    StDev    T       P
Constant    91.3    8.934    10.22   0.000
Crycount    1.49    0.4870   3.07    0.004

s = 17.50   R-sq = 20.7%   R-sq(adj) = 18.5%

Labels:
Constant Coef = 91.3 (y-int)
Crycount Coef = 1.49 (slope)
Crycount StDev = 0.4870 (SEb)
s = 17.50 (standard deviation of the residuals)
R-sq = 20.7% (coefficient of determination)
b. Sketch a scatterplot of the data.
c. Calculate the standard deviation of the residuals using your calculator.
$s = \sqrt{\frac{\sum \text{residuals}^2}{n - 2}} = 17.4987$
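For readers without a graphing calculator, the same quantity can be reproduced from the data table. This is a sketch only: the 38 (crying, IQ) pairs are typed in from the table above, and the variable names are chosen for illustration.

```python
import math

# (crying peaks, IQ) pairs from the data table, read row by row
data = [
    (10, 87), (20, 90), (17, 94), (12, 94),
    (12, 97), (16, 100), (19, 103), (12, 103),
    (9, 103), (23, 103), (13, 104), (14, 106),
    (16, 106), (27, 108), (18, 109), (10, 109),
    (18, 109), (15, 112), (18, 112), (23, 113),
    (15, 114), (21, 114), (16, 118), (9, 119),
    (12, 119), (12, 120), (19, 120), (16, 124),
    (20, 132), (15, 133), (22, 135), (31, 135),
    (16, 136), (17, 141), (30, 155), (22, 157),
    (33, 159), (13, 162),
]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n
sxx = sum((x - x_bar) ** 2 for x, _ in data)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)

b = sxy / sxx             # slope of the LSRL
a = y_bar - b * x_bar     # intercept of the LSRL

# Standard deviation of the residuals, with n - 2 degrees of freedom
sse = sum((y - (a + b * x)) ** 2 for x, y in data)
s = math.sqrt(sse / (n - 2))
se_b = s / math.sqrt(sxx)  # standard error of the slope

print(f"n = {n}, b = {b:.4f}, a = {a:.2f}, s = {s:.4f}, SE_b = {se_b:.4f}")
```

The unrounded slope is slightly larger than the 1.49 shown in the printout, which matters for the third decimal place of the confidence interval in part (d).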
d. Construct a 95% confidence interval for the slope.
P: True slope of the line for crying vs. IQ
A:
• The observations are independent
  Each infant is tested separately, so one infant's result does not influence another's
• The relationship is linear
  No apparent patterns in the residuals
• The standard deviation of the response about the true line is the same everywhere
  Residuals spread out evenly
• The response varies Normally about the true regression line
  Residuals are slightly skewed right.
N: Linear regression t-interval for the slope
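I: $b \pm t^*_{n-2} \, SE_b$
$1.49 \pm t^*_{38-2} (0.4870)$
$1.49 \pm 2.042 (0.4870)$
(0.49844 , 2.48735)
Note: 2.042 is the $t^*$ critical value for df = 30, a conservative stand-in for df = 36, and the endpoints use the unrounded slope $b \approx 1.4929$ rather than 1.49.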
C: I am 95% confident the true slope of the line for crying vs. IQ is between 0.49844 and 2.48735.
Note: 0 is not in the interval! This is evidence of a linear relationship between crying and IQ.
OR
I am 95% confident the mean IQ increases by between 0.49844 and 2.48735 points for each additional peak in crying.
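The interval arithmetic is simple enough to script. The sketch below plugs in the rounded values from the Minitab printout and the critical value 2.042 used in the slides; `slope_confidence_interval` is a hypothetical helper name. Because it starts from the rounded slope 1.49, its endpoints differ from the slide's in the third decimal place.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


# Values read from the Minitab printout in part (a)
b = 1.49         # slope (rounded in the printout)
se_b = 0.4870    # standard error of the slope
t_star = 2.042   # critical value used in the slides

lower, upper = slope_confidence_interval(b, se_b, t_star)
contains_zero = lower <= 0 <= upper

print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
print("0 inside the interval?", contains_zero)
```

Checking whether 0 falls inside the interval is the same decision as a two-sided test of a zero slope at the matching significance level.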
Ch 15B – Hypothesis Testing for Slope
Remember:
$b = r \frac{s_y}{s_x}$
so, if r = 0, then b = 0
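The link between slope and correlation can be confirmed numerically. A small sketch, assuming the pulse-rate data from Ch 15 Example #1, that computes the slope two ways (directly, and as r times the ratio of sample standard deviations):

```python
import statistics

# Pulse-rate data from Ch 15, Example #1
height = [55, 59, 60, 63, 64, 64, 66, 70, 70, 72]
pulse = [53, 53, 58, 62, 63, 65, 68, 70, 73, 76]

n = len(height)
x_bar = sum(height) / n
y_bar = sum(pulse) / n

s_x = statistics.stdev(height)   # sample standard deviation of x
s_y = statistics.stdev(pulse)    # sample standard deviation of y

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(height, pulse))
sxx = sum((x - x_bar) ** 2 for x in height)

r = sxy / ((n - 1) * s_x * s_y)  # correlation coefficient
b_direct = sxy / sxx             # least-squares slope
b_from_r = r * s_y / s_x         # slope via b = r * s_y / s_x

print(f"r = {r:.4f}, direct slope = {b_direct:.5f}, r*sy/sx = {b_from_r:.5f}")
```

Since s_y and s_x are both positive, the slope and the correlation always share the same sign, and one is zero exactly when the other is.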
$H_0: \beta = 0$
Or there is no true linear relationship between x and y.
Test Statistic: $t = \frac{b}{SE_b}$
Calculator Tip! Linear Regression t-Test
Stat – Tests - LinRegTTest
L1: x
L2: y
Leave RegEq blank
Example #1: How well does the number of beers a student drinks predict his or her blood alcohol content (BAC)? Sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their BAC. The data are below.
Stu #  1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16
Beer   5     2     9     8     3     7     3     5     3     5     4     6     5     7     1     4
BAC    0.10  0.03  0.19  0.12  0.04  0.095 0.07  0.06  0.02  0.05  0.07  0.10  0.085 0.09  0.01  0.05
a. What is the least-squares regression line?
$\hat{y} = -0.0127 + 0.0180x$, where $\hat{y}$ = predicted BAC and $x$ = # of beers
b. Make a scatterplot of the data and describe its shape.
Positive, strong, linear relationship
c. What is the correlation coefficient? What does it mean?
r = 0.894
Strong, positive relationship
d. Label all important parts of the Minitab printout.
The regression equation is
BAC = -0.0127 + 0.0180 Beers   (LSRL)

Predictor   Coef       StDev      T       P
Constant    -0.0127    0.01264    -1.00   0.332
Beers       0.017964   0.002402   7.48    0.000

s = 0.02044   R-sq = 80%   R-sq(adj) = 78.6%

Labels:
Constant Coef = -0.0127 (y-int)
Beers Coef = 0.017964 (slope)
Beers StDev = 0.002402 (SEb)
Beers T = 7.48 (test statistic)
Beers P = 0.000 (Prob, 2-tailed)
s = 0.02044 (standard deviation of the residuals)
R-sq = 80% (coefficient of determination)
e. Verify the results by using your calculator.
Stat – Tests - LinRegTTest
L1: x
L2: y
Leave RegEq blank
$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{0.02044}{\sqrt{72.4375}} = 0.002402$
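The same verification can be done without a calculator. The sketch below recomputes the sum of squared deviations of x, the standard deviation of the residuals, and the standard error of the slope from the beer data; the variable names are illustrative.

```python
import math

# Beer / BAC data for the 16 student volunteers
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n

sxx = sum((x - x_bar) ** 2 for x in beers)   # sum of (x - x_bar)^2
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac))

b = sxy / sxx              # slope of the LSRL
a = y_bar - b * x_bar      # intercept of the LSRL

# Standard deviation of the residuals (n - 2 degrees of freedom)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(beers, bac))
s = math.sqrt(sse / (n - 2))

se_b = s / math.sqrt(sxx)  # standard error of the slope

print(f"sum (x - x_bar)^2 = {sxx:.4f}")
print(f"b = {b:.6f}, a = {a:.4f}, s = {s:.5f}, SE_b = {se_b:.6f}")
```

These values should line up with the Coef, StDev, and s entries of the Minitab printout in part (d), up to rounding.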
f. Conduct the hypothesis test to see if there is a positive relationship between # beers and BAC.
P: determine if there is a positive linear relationship between # beers and BAC
H:
$H_0: \beta = 0$ The number of beers has no effect on BAC
$H_a: \beta > 0$ The number of beers has a positive linear effect on BAC.
A:
• The relationship is linear
  No apparent patterns in the residuals
• The standard deviation of the response about the true line is the same everywhere
  Residuals spread out evenly
• The response varies Normally about the true regression line
N: Linear regression t-test for the slope
T: $t = \frac{b}{SE_b} = \frac{0.017964}{0.002402} = 7.48$
O: $P(t > 7.48)$ with df = n – 2 = 16 – 2 = 14
From the table: less than 0.0005
Or, on the calculator: $P(t > 7.48) = 0.000001$
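To see where such a small P-value comes from, the upper-tail area of the t distribution can be approximated numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and subtracts the result from 0.5. This is a swapped-in numerical method for illustration, not what the calculator or Minitab does internally, and the function names are invented here.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=20000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even. By symmetry, P(T > t) = 0.5 - area from 0 to t.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area


# Slope and standard error from the Minitab printout; n = 16 students
b, se_b, n = 0.017964, 0.002402, 16
t_stat = b / se_b
p_value = upper_tail_p(t_stat, n - 2)

print(f"t = {t_stat:.2f}, df = {n - 2}, one-sided P-value = {p_value:.2e}")
```

The one-sided P-value is on the order of one in a million, consistent with the table bound of "less than 0.0005".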
M: $p = 0.000001 < 0.05 = \alpha$
Reject the Null
S: There is enough evidence to claim that an increased number of beers does increase BAC.