
Chapter 12 Simple Linear Regression

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance
• Using the Estimated Regression Equation for Estimation and Prediction
• Computer Solution
• Residual Analysis: Validating Model Assumptions

Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model. The simple linear regression model is:

$$y = \beta_0 + \beta_1 x + \epsilon$$

– $\beta_0$ and $\beta_1$ are called the parameters of the model.

– $\epsilon$ is a random variable called the error term.

Simple Linear Regression Equation

The simple linear regression equation is:

$$E(y) = \beta_0 + \beta_1 x$$

• The graph of the regression equation is a straight line.

• $\beta_0$ is the y intercept of the regression line.

• $\beta_1$ is the slope of the regression line.

• $E(y)$ is the expected value of y for a given x value.

Simple Linear Regression Equation

• Positive Linear Relationship

[Figure: regression line of E(y) plotted against x, with intercept $\beta_0$; the slope $\beta_1$ is positive.]

• Negative Linear Relationship

[Figure: regression line of E(y) plotted against x, with intercept $\beta_0$; the slope $\beta_1$ is negative.]

• No Relationship

[Figure: horizontal regression line of E(y) plotted against x, with intercept $\beta_0$; the slope $\beta_1$ is 0.]

Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

$$\hat{y} = b_0 + b_1 x$$

• The graph is called the estimated regression line.

• $b_0$ is the y intercept of the line.

• $b_1$ is the slope of the line.

• $\hat{y}$ is the estimated value of y for a given x value.

Estimation Process

[Diagram: the estimation process, shown as four linked boxes.]

1. Regression model $y = \beta_0 + \beta_1 x + \epsilon$, regression equation $E(y) = \beta_0 + \beta_1 x$, unknown parameters $\beta_0$, $\beta_1$.

2. Sample data: the pairs $(x_1, y_1), \ldots, (x_n, y_n)$.

3. Estimated regression equation $\hat{y} = b_0 + b_1 x$, sample statistics $b_0$, $b_1$.

4. $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.

Least Squares Method

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

The Least Squares Method

Slope for the Estimated Regression Equation

$$b_1 = \frac{\sum x_i y_i - \left(\sum x_i \sum y_i\right)/n}{\sum x_i^2 - \left(\sum x_i\right)^2/n}$$

The Least Squares Method

y-Intercept for the Estimated Regression Equation

$$b_0 = \bar{y} - b_1 \bar{x}$$

where:
$x_i$ = value of the independent variable for the ith observation
$y_i$ = value of the dependent variable for the ith observation
$\bar{x}$ = mean value for the independent variable
$\bar{y}$ = mean value for the dependent variable
n = total number of observations
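The slope and intercept formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the chapter: the function name `least_squares` and the small demonstration data set are assumptions chosen so the answer is easy to check by hand.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    # Slope: computational form of the least squares formula.
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar.
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Illustrative data (not from the chapter): points lying exactly on y = 1 + 2x.
x_demo = [0, 1, 2, 3]
y_demo = [1, 3, 5, 7]
b0_demo, b1_demo = least_squares(x_demo, y_demo)
print(b0_demo, b1_demo)  # 1.0 2.0
```

Because the demonstration points lie exactly on a line, the fitted intercept and slope reproduce that line, which makes the function easy to sanity-check before using it on real data.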

Example: Reed Auto Sales

Simple Linear Regression

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27

Example: Reed Auto Sales

Slope for the Estimated Regression Equation

$b_1 = \dfrac{220 - (10)(100)/5}{24 - (10)^2/5} =$ _____

y-Intercept for the Estimated Regression Equation

$b_0 = 20 - 5(2) =$ _____

Estimated Regression Equation

$$\hat{y} = 10 + 5x$$
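As a check on the arithmetic, the sums that feed the slope and intercept formulas can be recomputed directly from the five Reed Auto observations. The sketch below assumes only the data from the example; the variable names are illustrative.

```python
# Reed Auto sample: TV ads run (x) and cars sold (y) for 5 previous sales.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
sum_x = sum(tv_ads)                                      # 10
sum_y = sum(cars_sold)                                   # 100
sum_xy = sum(x * y for x, y in zip(tv_ads, cars_sold))   # 220
sum_x2 = sum(x ** 2 for x in tv_ads)                     # 24

# Slope and intercept from the least squares formulas.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

print(f"y_hat = {b0:g} + {b1:g}x")  # y_hat = 10 + 5x
```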

Example: Reed Auto Sales

Scatter Diagram

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the estimated regression line $\hat{y} = 10 + 5x$.]

The Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

The coefficient of determination is:

$$r^2 = \text{SSR}/\text{SST}$$

Example: Reed Auto Sales

Coefficient of Determination

$r^2$ = SSR/SST = 100/114 = .8772

The regression relationship is very strong because 88% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
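The three sums of squares behind this result can be recomputed from the Reed Auto data and the estimated equation $\hat{y} = 10 + 5x$. The following is a small sketch under those assumptions; the variable names are illustrative.

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the example

n = len(cars_sold)
y_bar = sum(cars_sold) / n
y_hat = [b0 + b1 * x for x in tv_ads]

sst = sum((y - y_bar) ** 2 for y in cars_sold)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                 # due to regression
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))  # due to error

r2 = ssr / sst
print(sst, ssr, sse)   # 114.0 100.0 14
print(round(r2, 4))    # 0.8772
```

The printed values also confirm the identity SST = SSR + SSE for this sample.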

The Correlation Coefficient

Sample Correlation Coefficient

$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$$

$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$

where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

Example: Reed Auto Sales

Sample Correlation Coefficient

$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$

The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".

$$r_{xy} = +\sqrt{.8772}$$

$r_{xy}$ = +.9366
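A short sketch of this calculation follows. It takes the sign from the slope and the magnitude from the coefficient of determination, then cross-checks the result against the usual product-moment formula for the sample correlation. That cross-check is an addition for illustration and does not appear in the slides.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b1 = 5           # slope of the estimated regression equation
r2 = 100 / 114   # coefficient of determination, SSR/SST

# Correlation: magnitude from r2, sign taken from the slope b1.
r_xy = math.copysign(math.sqrt(r2), b1)

# Cross-check (not in the slides): product-moment correlation from deviations.
n = len(tv_ads)
x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
syy = sum((y - y_bar) ** 2 for y in cars_sold)
r_direct = sxy / math.sqrt(sxx * syy)

print(round(r_xy, 4), round(r_direct, 4))  # 0.9366 0.9366
```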

Model Assumptions

Assumptions About the Error Term $\epsilon$

1. The error $\epsilon$ is a random variable with a mean of zero.

2. The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.

3. The values of $\epsilon$ are independent.

4. The error $\epsilon$ is a normally distributed random variable.

Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.

Two tests are commonly used:
– t Test
– F Test

Both tests require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.

Testing for Significance

An Estimate of $\sigma^2$

The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.

$$s^2 = \text{MSE} = \text{SSE}/(n - 2)$$

where:

$$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$$

Testing for Significance

An Estimate of $\sigma$

– To estimate $\sigma$ we take the square root of $s^2$, the estimate of $\sigma^2$.

– The resulting s is called the standard error of the estimate.

$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$

Testing for Significance: t Test

Hypotheses

$H_0$: $\beta_1 = 0$

$H_a$: $\beta_1 \neq 0$

Test Statistic

$$t = \frac{b_1}{s_{b_1}}$$

where

$$s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$

Testing for Significance: t Test

Rejection Rule

Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

where: $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom

Example: Reed Auto Sales

t Test

• Hypotheses

$H_0$: $\beta_1 = 0$

$H_a$: $\beta_1 \neq 0$

• Rejection Rule

For $\alpha$ = .05 and d.f. = 3, $t_{.025}$ = _____

Reject $H_0$ if $t < -t_{.025}$ or $t > t_{.025}$ = _____

• Test Statistic

t = _____/_____ = 4.63

• Conclusion

t = 4.63 > 3.182, so reject $H_0$
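The t statistic for the Reed Auto data can be reproduced from the definitions of s and $s_{b_1}$. In this sketch the critical value 3.182 is taken from the example rather than computed, and the variable names are illustrative.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
t_crit = 3.182  # t_.025 with n - 2 = 3 degrees of freedom, as quoted in the example

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
t_stat = b1 / s_b1

print(round(s_b1, 2), round(t_stat, 2))  # 1.08 4.63
print(abs(t_stat) > t_crit)              # True, so H0 is rejected
```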

Confidence Interval for $\beta_1$

We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.

$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

The form of a confidence interval for $\beta_1$ is:

$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$

where:
$b_1$ is the point estimate
$t_{\alpha/2}\, s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom

Example: Reed Auto Sales

Rejection Rule

Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.

95% Confidence Interval for $\beta_1$

$b_1 \pm t_{\alpha/2}\, s_{b_1}$ = 5 +/- 3.182(1.08) = 5 +/- 3.44

or ____ to ____

Conclusion

0 is not included in the confidence interval. Reject $H_0$.
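The interval arithmetic can be sketched as follows. The inputs (point estimate 5, critical value 3.182, SSE of 14, and the five ad counts) come from the Reed Auto example; the variable names are illustrative.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
b1 = 5          # point estimate of beta_1
sse = 14        # sum of squares due to error for the Reed Auto fit
t_crit = 3.182  # t_.025 with 3 degrees of freedom, as quoted in the example

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

print(round(margin, 2))                # 3.44
print(round(lower, 2), round(upper, 2))  # 1.56 8.44
print(lower <= 0 <= upper)             # False: 0 lies outside the interval
```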

Testing for Significance: F Test

• Hypotheses

$H_0$: $\beta_1 = 0$

$H_a$: $\beta_1 \neq 0$

• Test Statistic

F = MSR/MSE

• Rejection Rule

Reject $H_0$ if $F > F_\alpha$

where: $F_\alpha$ is based on an F distribution with 1 d.f. in the numerator and n - 2 d.f. in the denominator

Example: Reed Auto Sales

F Test

• Hypotheses

$H_0$: $\beta_1 = 0$

$H_a$: $\beta_1 \neq 0$

• Rejection Rule

For $\alpha$ = .05 and d.f. = 1, 3: $F_{.05}$ = ______

Reject $H_0$ if $F > F_{.05}$ = ______.

• Test Statistic

F = MSR/MSE = ____ / ______ = 21.43

• Conclusion

F = 21.43 > 10.13, so we reject $H_0$.
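A minimal sketch of the F statistic for this example is shown below. It assumes SSR = 100 and SSE = 14 from the Reed Auto fit and uses the critical value 10.13 quoted above. The final line notes that, with a single independent variable, F equals the square of the t statistic; that remark is an addition and is not stated in the slides.

```python
import math

ssr, sse = 100, 14   # sums of squares for the Reed Auto fit
n = 5                # number of observations
f_crit = 10.13       # F_.05 with 1 and 3 degrees of freedom, as quoted in the example

msr = ssr / 1        # mean square due to regression (1 numerator d.f.)
mse = sse / (n - 2)  # mean square error (n - 2 denominator d.f.)
f_stat = msr / mse

print(round(f_stat, 2))   # 21.43
print(f_stat > f_crit)    # True, so H0 is rejected

# In simple linear regression the square root of F matches |t| for the slope.
print(round(math.sqrt(f_stat), 2))  # 4.63
```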

Some Cautions about the Interpretation of Significance Tests

• Rejecting $H_0$: $\beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

• Just because we are able to reject $H_0$: $\beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Using the Estimated Regression Equation for Estimation and Prediction

• Confidence Interval Estimate of $E(y_p)$

$$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$$

• Prediction Interval Estimate of $y_p$

$$\hat{y}_p \pm t_{\alpha/2}\, s_{ind}$$

where: the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom

Example: Reed Auto Sales

• Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

$\hat{y}$ = 10 + 5(3) = ______ cars

• Confidence Interval for $E(y_p)$

The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

25 +/- 4.61 = ______ to _______ cars

• Prediction Interval for $y_p$

The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

25 +/- 8.28 = _____ to ______ cars
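The two margins of error, 4.61 and 8.28, can be reproduced with the sketch below. The slides do not spell out how $s_{\hat{y}_p}$ and $s_{ind}$ are computed, so the code assumes the standard textbook expressions $s_{\hat{y}_p} = s\sqrt{1/n + (x_p - \bar{x})^2/\sum (x_i - \bar{x})^2}$ and $s_{ind} = s\sqrt{1 + 1/n + (x_p - \bar{x})^2/\sum (x_i - \bar{x})^2}$. The critical value 3.182 is the one quoted for the t test.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5
t_crit = 3.182  # t_.025 with 3 degrees of freedom
x_p = 3         # number of TV ads for which we estimate and predict

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
s = math.sqrt(sse / (n - 2))

y_p = b0 + b1 * x_p                      # point estimate
h_p = 1 / n + (x_p - x_bar) ** 2 / sxx   # assumed textbook term, not shown in the slides
s_mean = s * math.sqrt(h_p)              # std. error for the mean value E(y_p)
s_ind = s * math.sqrt(1 + h_p)           # std. error for an individual value y_p

ci_margin = t_crit * s_mean
pi_margin = t_crit * s_ind
print(y_p, round(ci_margin, 2), round(pi_margin, 2))  # 25 4.61 8.28
```

The prediction interval is wider than the confidence interval because it must also cover the variability of a single week around the mean.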

Residual Analysis

Residual for Observation i

$$y_i - \hat{y}_i$$

Standardized Residual for Observation i

$$\frac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$$

where:

$$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$$

and

$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}$$

Example: Reed Auto Sales

Residuals

Observation    Predicted Cars Sold    Residuals
1              15                     -1
2              25                     -1
3              20                     -2
4              15                      2
5              25                      2
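The residuals in the table, along with the standardized residuals defined earlier, can be computed as in the sketch below. It assumes the Reed Auto data and the estimated equation $\hat{y} = 10 + 5x$; the standardized values are not given in the slides and are shown only to illustrate the formulas.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

predicted = [b0 + b1 * x for x in tv_ads]
residuals = [y - yh for y, yh in zip(cars_sold, predicted)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# h_i for each observation, then the standard deviation of each residual.
h = [1 / n + (x - x_bar) ** 2 / sxx for x in tv_ads]
std_residuals = [e / (s * math.sqrt(1 - hi)) for e, hi in zip(residuals, h)]

for i, (yh, e, z) in enumerate(zip(predicted, residuals, std_residuals), start=1):
    print(i, yh, e, round(z, 4))
```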

Example: Reed Auto Sales

Residual Plot

[Figure: TV Ads Residual Plot, with Residuals (vertical axis, -3 to 3) against TV Ads (horizontal axis, 0 to 4).]

Residual Analysis

Residual Plot

[Figure: residual $y - \hat{y}$ plotted against x around the horizontal line at 0. Panel title: Good Pattern.]

[Figure: residual $y - \hat{y}$ plotted against x around the horizontal line at 0. Panel title: Nonconstant Variance.]

[Figure: residual $y - \hat{y}$ plotted against x around the horizontal line at 0. Panel title: Model Form Not Adequate.]

End of Chapter 12
