
1 1 Slide

Slide

Simple Linear Regression, Part A

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance

2 2 Slide

Slide

Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.

The simple linear regression model is:

$y = \beta_0 + \beta_1 x + \varepsilon$

where: $\beta_0$ and $\beta_1$ are called the parameters of the model,

$\varepsilon$ is a random variable called the error term,

y is the dependent variable and x is the independent variable.

3 3 Slide

Slide

Simple Linear Regression Equation

The simple linear regression equation is:

$E(y) = \beta_0 + \beta_1 x$

• E(y) is the expected value of y for a given x value.

• $\beta_1$ is the slope of the regression line.

• $\beta_0$ is the y intercept of the regression line.

• The graph of the regression equation is a straight line.

4 4 Slide

Slide

Simple Linear Regression Equation

Positive Linear Relationship

[Figure: regression line of E(y) against x, with intercept $\beta_0$ and a positive slope $\beta_1$.]

5 5 Slide

Slide

Simple Linear Regression Equation

Negative Linear Relationship

[Figure: regression line of E(y) against x, with intercept $\beta_0$ and a negative slope $\beta_1$.]

6 6 Slide

Slide

Simple Linear Regression Equation

No Relationship

[Figure: regression line of E(y) against x, with intercept $\beta_0$ and a slope $\beta_1$ equal to 0.]

7 7 Slide

Slide

Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

$\hat{y} = b_0 + b_1 x$

• $\hat{y}$ is the estimated value of y for a given x value.

• $b_1$ is the slope of the line.

• $b_0$ is the y intercept of the line.

• The graph is called the estimated regression line.

8 8 Slide

Slide

Estimation Process

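1. Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
   Regression equation: $E(y) = \beta_0 + \beta_1 x$
   Unknown parameters: $\beta_0$, $\beta_1$

2. Sample data: $(x_1, y_1), \ldots, (x_n, y_n)$

3. Estimated regression equation: $\hat{y} = b_0 + b_1 x$
   Sample statistics: $b_0$, $b_1$

4. $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.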

9 9 Slide

Slide

Least Squares Method

Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

where:

$y_i$ = observed value of the dependent variable for the ith observation

$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

10 10 Slide

Slide

Least Squares Method

Slope for the Estimated Regression Equation

$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$

11 11 Slide

Slide

Least Squares Method

y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x}$

where:

$x_i$ = value of the independent variable for the ith observation

$y_i$ = value of the dependent variable for the ith observation

$\bar{x}$ = mean value for the independent variable

$\bar{y}$ = mean value for the dependent variable

n = total number of observations
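For readers who want to see where the slope and intercept formulas come from, here is a brief sketch of the standard calculus derivation. It is not part of the original slides, and $S$ is a label introduced here for the sum of squared deviations being minimized.

```latex
S(b_0, b_1) = \sum (y_i - b_0 - b_1 x_i)^2

\frac{\partial S}{\partial b_0} = -2 \sum (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x}

\frac{\partial S}{\partial b_1} = -2 \sum x_i (y_i - b_0 - b_1 x_i) = 0

% Substituting b_0 = \bar{y} - b_1 \bar{x} into the second condition:
\sum x_i (y_i - \bar{y}) = b_1 \sum x_i (x_i - \bar{x})
  \;\Rightarrow\;
b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
```

The last step uses $\sum (y_i - \bar{y}) = 0$ and $\sum (x_i - \bar{x}) = 0$, so replacing $x_i$ by $x_i - \bar{x}$ inside each sum leaves both sums unchanged.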

12 12 Slide

Slide

Simple Linear Regression

Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.

13 13 Slide

Slide

Simple Linear Regression

Example: Reed Auto Sales

Number of TV Ads    Number of Cars Sold
       1                    14
       3                    24
       2                    18
       1                    17
       3                    27

14 14 Slide

Slide

Estimated Regression Equation

Slope for the Estimated Regression Equation

$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{20}{4} = 5$

y-Intercept for the Estimated Regression Equation

$b_0 = \bar{y} - b_1 \bar{x} = 20 - 5(2) = 10$

Estimated Regression Equation

$\hat{y} = 10 + 5x$
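The hand calculation above can be checked with a few lines of Python. This is a minimal sketch rather than part of the original deck; the function name `least_squares` and the variable names are chosen here for illustration, and the data are the five Reed Auto observations.

```python
# Reed Auto sample data from the slides: TV ads run and cars sold in 5 sales.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]


def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(tv_ads, cars_sold)
print(f"y_hat = {b0:g} + {b1:g}x")  # y_hat = 10 + 5x
```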

15 15 Slide

Slide

Excel Worksheet (showing data)

Estimated Regression Equation

      A       B        C           D
1    Week    TV Ads   Cars Sold
2    1       1        14
3    2       3        24
4    3       2        18
5    4       1        17
6    5       3        27
7

Download Ch12-CarSales.xlsx

16 16 Slide

Slide

Producing a Scatter Diagram and Trend Line

Estimated Regression Equation

Step 1 Select cells B2:C6

Step 2 Click the Insert tab on the Ribbon

Step 3 In the Charts group, click Scatter

Step 4 When the list of scatter diagram subtypes appears, click Scatter with only Markers

Step 5 In the Chart Layouts group, click Layout 1

Step 6 Right-click on the Chart Title to display options; choose Delete

Step 7 Select the Horizontal (Value) Axis Title and replace it with TV Ads

17 17 Slide

Slide

Producing a Scatter Diagram and Trend Line

Estimated Regression Equation

Step 8 Select the Vertical (Value) Axis Title and replace it with Cars Sold

Step 9 Right-click on the Series 1 Legend Entry to display a list of options; choose Delete

Step 10 Position the mouse pointer over any Vertical (Value) Axis Major Gridline in the diagram and right-click to display a list of options; choose Delete

18 18 Slide

Slide

Producing a Scatter Diagram and Trend Line

Estimated Regression Equation

Step 11 Position the mouse pointer over any data point in the diagram and right-click to display a list of options; choose Add Trendline

Step 12 When the Format Trendline dialog box appears:
• Select Trendline Options, then choose Linear from the Trend/Regression Type list
• Choose Display Equation on chart
• Click Close

19 19 Slide

Slide

Scatter Diagram and Trend Line

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the linear trend line y = 5x + 10.]

20 20 Slide

Slide

Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

21 21 Slide

Slide

Coefficient of Determination

The coefficient of determination is:

$r^2 = \text{SSR}/\text{SST}$

where:
SSR = sum of squares due to regression
SST = total sum of squares

22 22 Slide

Slide

Coefficient of Determination

$r^2 = \text{SSR}/\text{SST} = 100/114 = .8772$

The regression relationship is very strong; 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
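The three sums of squares behind this value can be reproduced directly from the Reed Auto data. The snippet below is an illustrative sketch (variable names are chosen here, not taken from the slides) that also confirms SST = SSR + SSE for this sample.

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

sst = sum((y - y_bar) ** 2 for y in cars_sold)                # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                  # due to regression
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))   # due to error
r_squared = ssr / sst

print(sst, ssr, sse, round(r_squared, 4))  # 114.0 100.0 14 0.8772
```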

23 23 Slide

Slide

Coefficient of Determination

Using Excel to Produce $r^2$

Step 1 Position the mouse pointer over any data point in the scatter diagram and right-click to display a list of options

Step 2 Choose Add Trendline

Step 3 When the Format Trendline dialog box appears:
• Select Trendline Options, then choose Display R-squared value on chart
• Click Close

24 24 Slide

Slide

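Coefficient of Determination

Excel Value Worksheet (showing $r^2$)

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the trend line y = 5x + 10 and R² = 0.8772 displayed on the chart.]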

25 25 Slide

Slide

Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$

where: $b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

Excel approach: $r_{xy}$ = CORREL(x range, y range)

26 26 Slide

Slide

Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is "+".

$r_{xy} = +\sqrt{.8772} = +.9366$
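As a cross-check, the value obtained from the sign of $b_1$ and $r^2$ should agree with the correlation computed from its definition. The sketch below assumes the usual Pearson formula for the second calculation; the variable names are illustrative only.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b1, r_squared = 5, 100 / 114  # slope and r^2 from the slides

# Slide formula: attach the sign of the slope to the square root of r^2.
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

# Cross-check against the definitional (Pearson) formula.
n = len(tv_ads)
x_bar, y_bar = sum(tv_ads) / n, sum(cars_sold) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
syy = sum((y - y_bar) ** 2 for y in cars_sold)
r_pearson = sxy / math.sqrt(sxx * syy)

print(round(r_from_r2, 4), round(r_pearson, 4))  # 0.9366 0.9366
```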

27 27 Slide

Slide

Using Excel’s Regression Tool

Up to this point, you have seen how Excel can be used for various parts of a regression analysis.

Excel also has a comprehensive tool in its Data Analysis package called Regression.

The Regression tool can be used to perform a complete regression analysis.

28 28 Slide

Slide

Using Excel’s Regression Tool

Excel Worksheet (showing data)

      A       B        C           D
1    Week    TV Ads   Cars Sold
2    1       1        14
3    2       3        24
4    3       2        18
5    4       1        17
6    5       3        27
7

29 29 Slide

Slide

Using Excel’s Regression Tool

Performing the Regression Analysis

Step 1 Click the Data tab on the Ribbon

Step 2 In the Analysis group, click Data Analysis

Step 3 Choose Regression from the list of Analysis Tools

30 30 Slide

Slide

Using Excel’s Regression Tool

Excel Regression Dialog Box

31 31 Slide

Slide

Using Excel’s Regression Tool

Excel Value Worksheet

Data (rows 1-6)

      A       B        C
1    Week    TV Ads   Cars Sold
2    1       1        14
3    2       3        24
4    3       2        18
5    4       1        17
6    5       3        27

SUMMARY OUTPUT (row 8)

Regression Statistics Output (rows 10-15)

Multiple R           0.936585812
R Square             0.877192982
Adjusted R Square    0.83625731
Standard Error       2.160246899
Observations         5

ANOVA Output (rows 17-21)

             df   SS    MS         F          Significance F
Regression   1    100   100        21.42857   0.018986231
Residual     3    14    4.666667
Total        4    114

Estimated Regression Equation Output (rows 23-25)

            Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept   10             2.366431913      4.225771   0.024236   2.468950436   17.53104956   2.468950436   17.53104956
TV Ads      5              1.08012345       4.6291     0.018986   1.562561893   8.437438107   1.562561893   8.437438107

32 32 Slide

Slide

Using Excel’s Regression Tool

Excel Value Worksheet (bottom-left portion)

      A           B          C           D        E
23                Coeffic.   Std. Err.   t Stat   P-value
24    Intercept   10         2.36643     4.2258   0.02424
25    TV Ads      5          1.08012     4.6291   0.01899

Note: Columns F-I are not shown.

Estimated Regression Equation

$\hat{y} = 10 + 5x$

33 33 Slide

Slide

Using Excel’s Regression Tool

Excel Value Worksheet (bottom-right portion)

      A           B          F          G          H            I
23                Coeffic.   Low. 95%   Up. 95%    Low. 95.0%   Up. 95.0%
24    Intercept   10         2.46895    17.53105   2.46895044   17.5310496
25    TV Ads      5          1.562562   8.437438   1.56256189   8.43743811

Note: Columns C-E are hidden.

34 34 Slide

Slide

Using Excel’s Regression Tool

Excel Value Worksheet (middle portion)

      A            B    C     D         E         F
17    ANOVA
18                 df   SS    MS        F         Significance F
19    Regression   1    100   100       21.4286   0.018986231
20    Residual     3    14    4.66667
21    Total        4    114

35 35 Slide

Slide

Using Excel’s Regression Tool

Excel Value Worksheet (top portion)

      A                    B              C
10    Regression Statistics
11    Multiple R           0.936585812
12    R Square             0.877192982
13    Adjusted R Square    0.83625731
14    Standard Error       2.160246899
15    Observations         5

36 36 Slide

Slide

Assumptions About the Error Term $\varepsilon$

1. The error $\varepsilon$ is a random variable with mean of zero.

2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.

3. The values of $\varepsilon$ are independent.

4. The error $\varepsilon$ is a normally distributed random variable.

37 37 Slide

Slide

Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.

Two tests are commonly used:

• t Test (individual significance test)
• F Test (overall significance test)

Both the t test and F test require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.

38 38 Slide

Slide

Testing for Significance

An Estimate of $\sigma^2$

The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.

$s^2 = \text{MSE} = \text{SSE}/(n - 2)$

where:

$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$

39 39 Slide

Slide

Testing for Significance

An Estimate of $\sigma$

• To estimate $\sigma$ we take the square root of $s^2$.

• The resulting s is called the standard error of the estimate.

$s = \sqrt{\text{MSE}} = \sqrt{\dfrac{\text{SSE}}{n - 2}}$
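For the Reed Auto data, the two estimates work out as in the short sketch below. The inputs SSE = 14 and n = 5 are taken from the example; the variable names are illustrative.

```python
import math

sse = 14  # sum of squares due to error for the Reed Auto data
n = 5     # number of observations

mse = sse / (n - 2)   # s^2, the estimate of sigma^2
s = math.sqrt(mse)    # standard error of the estimate

print(round(mse, 3), round(s, 3))  # 4.667 2.16
```

The value of s agrees with the Standard Error entry (2.160246899) in the Excel Regression Statistics output.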

40 40 Slide

Slide

Testing for Significance: F Test

Hypotheses

$H_0: \beta_1 = 0$

$H_a: \beta_1 \neq 0$

Test Statistic

F = MSR/MSE

41 41 Slide

Slide

Testing for Significance: F Test

Rejection Rule

p-value approach: Reject $H_0$ if p-value < $\alpha$

Critical value approach: Reject $H_0$ if $F > F_\alpha$

where: $F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator

42 42 Slide

Slide

Testing for Significance: F Test

p-value and critical value approach

1. Determine the hypotheses.

$H_0: \beta_1 = 0$

$H_a: \beta_1 \neq 0$

2. Specify the level of significance.

$\alpha$ = .05

3. Compute the value of the test statistic F.

F = MSR/MSE = 100/4.667 = 21.43

Use Ch12-CarSales.xlsx

43 43 Slide

Slide

Testing for Significance: F Test

p-value approach

4. Compute the p-value.

The p-value is the area to the right of F = 21.43 under the F distribution with 1 numerator degree of freedom and n - 2 = 5 - 2 = 3 denominator degrees of freedom.

0.01 < p-value < 0.025

Excel approach: p-value = FDIST(21.43,1,3) = 0.019

44 44 Slide

Slide

Testing for Significance: F Test

p-value approach

5. Determine whether to reject $H_0$.

The p-value corresponding to F = 21.43 is less than .05. Hence, we reject $H_0$.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.

45 45 Slide

Slide

Testing for Significance: F Test

Critical value approach

4. Compute the critical value.

The critical value $F_\alpha$ is the value for which the area to its right is $\alpha$, under the F distribution with 1 numerator degree of freedom and n - 2 = 5 - 2 = 3 denominator degrees of freedom.

Excel approach: Critical value = FINV(0.05,1,3) = 10.13

46 46 Slide

Slide

Testing for Significance: F Test

Critical value approach

5. Determine whether to reject $H_0$.

The critical value corresponding to the level of significance 0.05 is 10.13. The test statistic F = 21.43 is greater than the critical value. Hence, we reject $H_0$.

The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
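The critical value decision can be written out as a short calculation. This sketch takes the critical value 10.13 as given from the FINV result quoted above rather than computing it, and it assumes the usual definition MSR = SSR divided by its 1 degree of freedom, as in the ANOVA output.

```python
ssr, sse, n = 100, 14, 5   # sums of squares and sample size from the slides
f_critical = 10.13         # FINV(0.05,1,3), as quoted on the slides

msr = ssr / 1              # one independent variable -> 1 numerator df
mse = sse / (n - 2)        # n - 2 denominator df
f_stat = msr / mse

reject_h0 = f_stat > f_critical
print(round(f_stat, 2), reject_h0)  # 21.43 True
```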

47 47 Slide

Slide

Testing for Significance: t Test

Hypotheses

$H_0: \beta_1 = 0$

$H_a: \beta_1 \neq 0$

Test Statistic

$t = \dfrac{b_1}{s_{b_1}}$

where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$

48 48 Slide

Slide

Testing for Significance: t Test

Rejection Rule

p-value approach: Reject $H_0$ if p-value < $\alpha$

Critical value approach: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

where: $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom

49 49 Slide

Slide

Testing for Significance: t Test

p-value and critical value approach

1. Determine the hypotheses.

$H_0: \beta_1 = 0$

$H_a: \beta_1 \neq 0$

2. Specify the level of significance.

$\alpha$ = .05

3. Compute the test statistic.

$t = \dfrac{b_1}{s_{b_1}} = \dfrac{5}{1.08} = 4.63$

Use Ch12-CarSales.xlsx

50 50 Slide

Slide

Testing for Significance: t Test

p-value approach

4. Compute the p-value.

The p-value is the area in both tails beyond the test statistic t = 4.63. It is based on n - 2 = 3 degrees of freedom.

The upper tail area is between 0.005 and 0.01. Because the p-value comes from both tails, double the range:

0.01 < p-value < 0.02

Excel approach: p-value = TDIST(4.63,3,2) = 0.019

51 51 Slide

Slide

Testing for Significance: t Test

p-value approach

5. Determine whether to reject $H_0$.

The p-value is less than the alpha level. We can reject $H_0$.

The TV Ads independent variable is a significant factor at the 0.05 level.
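The t statistic and its two-tailed p-value can be reproduced without a statistics library, because the t distribution with exactly 3 degrees of freedom has a closed-form CDF. The sketch below relies on that special case; the helper `two_tailed_p_df3` is a name introduced here and is not valid for other degrees of freedom, where a statistics package would be the more natural choice.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
b1, sse, n = 5, 14, 5  # slope, SSE and sample size from the slides

s = math.sqrt(sse / (n - 2))                                   # standard error of the estimate
x_bar = sum(tv_ads) / n
s_b1 = s / math.sqrt(sum((x - x_bar) ** 2 for x in tv_ads))    # standard error of b1
t_stat = b1 / s_b1


def two_tailed_p_df3(t):
    """Two-tailed p-value for a t distribution with exactly 3 degrees of freedom.

    Uses the closed-form CDF that exists for 3 df; not valid for other df.
    """
    u = abs(t) / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 2 * (1 - cdf)


p_value = two_tailed_p_df3(t_stat)
print(round(s_b1, 4), round(t_stat, 2), round(p_value, 3))  # 1.0801 4.63 0.019
```

The result matches the P-value of 0.018986 that Excel reports for TV Ads, which is also the Significance F of the F test, as expected with a single independent variable.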

52 52 Slide

Slide

Testing for Significance: t Test

Critical value approach

4. Compute the critical value and identify the rejection rule.

Critical values for the two-tailed test are $-t_{\alpha/2}$ and $t_{\alpha/2}$. At the .05 level, they are $-t_{.025}$ and $t_{.025}$ (with 3 degrees of freedom).

Excel approach:
$t_{.025}$ = TINV(0.025*2,3) = 3.182
$-t_{.025}$ = -TINV(0.025*2,3) = -3.182

Rejection Rule: Reject $H_0$ if $t > t_{.025}$ or $t < -t_{.025}$

53 53 Slide

Slide

Testing for Significance: t Test

5. Determine whether to reject H0.

t = 4.63 > 3.182. We can reject H0.

The TV Ads independent variable is a significant factor at the 0.05 level.

Critical value approach

54 54 Slide

Slide

Confidence Interval for $\beta_1$

We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.

$H_0$ is rejected if 0 is not included in the confidence interval for $\beta_1$.

55 55 Slide

Slide

Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:

$b_1 \pm t_{\alpha/2} s_{b_1}$

where:
• $b_1$ is the point estimator
• $t_{\alpha/2} s_{b_1}$ is the margin of error
• $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom

56 56 Slide

Slide

Confidence Interval for $\beta_1$

Rejection Rule

Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.

95% Confidence Interval for $\beta_1$

$b_1 \pm t_{\alpha/2} s_{b_1}$ = 5 +/- 3.182(1.08) = 5 +/- 3.44

or 1.56 to 8.44

where $t_{.025}$ = TINV(0.025*2,3) = 3.182

Conclusion

0 is not included in the confidence interval. Reject $H_0$.
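The interval arithmetic is easy to verify. In this sketch the t value 3.182 is taken from the TINV result above, and the standard error of $b_1$ is entered to four decimals; the variable names are illustrative.

```python
b1 = 5          # point estimator of the slope
s_b1 = 1.0801   # standard error of b1 (Excel reports 1.08012345)
t_025 = 3.182   # TINV(0.025*2,3), as quoted on the slides

margin = t_025 * s_b1
lower, upper = b1 - margin, b1 + margin

# H0: beta_1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)
print(round(lower, 2), round(upper, 2), reject_h0)  # 1.56 8.44 True
```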

57 57 Slide

Slide

Confidence Interval for $\beta_1$ at a new level

[Figure: screenshot with the callout "A new level can be specified here".]

58 58 Slide

Slide

Some Cautions about the Interpretation of Significance Tests

Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

59 59 Slide

Slide

Simple Linear Regression, Part B

• Using the Estimated Regression Equation for Estimation
• Residual Analysis: Validating Model Assumptions
• Outliers and Influential Observations

60 60 Slide

Slide

Estimation of y

1. If 3 TV ads are run prior to a sale, what is the estimated number of cars sold?

$\hat{y} = 10 + 5x$

$\hat{y} = 10 + 5(3) = 25$ cars

61 61 Slide

Slide

Residual Analysis

If the assumptions about the error term $\varepsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid.

The residuals provide the best information about $\varepsilon$.

Residual for Observation i

$y_i - \hat{y}_i$

Much of the residual analysis is based on an examination of graphical plots.

62 62 Slide

Slide

Residual Plot Against x

If the assumption that the variance of $\varepsilon$ is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.

63 63 Slide

Slide

Residual Plot Against x

[Figure: Good Pattern. Residuals $y - \hat{y}$ (vertical axis, centered on 0) plotted against x.]

64 64 Slide

Slide

Residual Plot Against x

[Figure: Nonconstant Variance. Residuals $y - \hat{y}$ (vertical axis, centered on 0) plotted against x.]

65 65 Slide

Slide

Residual Plot Against x

[Figure: Model Form Not Adequate. Residuals $y - \hat{y}$ (vertical axis, centered on 0) plotted against x.]

66 66 Slide

Slide

Residuals

Residual Plot Against x

Observation Predicted Cars Sold Residuals

1 15 -1

2 25 -1

3 20 -2

4 15 2

5 25 2
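The table above follows directly from the estimated regression equation. The sketch below recomputes it from the Reed Auto data; the list names `predicted` and `residuals` are chosen here for illustration.

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation y_hat = 10 + 5x

predicted = [b0 + b1 * x for x in tv_ads]
# Residual = observed value minus estimated value.
residuals = [y - y_hat for y, y_hat in zip(cars_sold, predicted)]

print("Observation  Predicted  Residual")
for i, (y_hat, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>11}  {y_hat:>9}  {e:>8}")
```

The residuals sum to zero, which is a general property of a least squares fit that includes an intercept.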

67 67 Slide

Slide

Using Excel to Produce a Residual Plot

• The steps outlined earlier to obtain the regression output are performed with one change.

• When the Regression dialog box appears, we must also select the Residual Plot option.

• The output will include two new items:
  • A plot of the residuals against the independent variable, and
  • A list of predicted values of y and the corresponding residual values.

Residual Plot Against x

68 68 Slide

Slide

Excel Value Worksheet (bottom portion)

Residual Plot Against x

      A             B                     C
29    RESIDUAL OUTPUT
30
31    Observation   Predicted Cars Sold   Residuals
32    1             15                    -1
33    2             25                    -1
34    3             20                    -2
35    4             15                    2
36    5             25                    2

69 69 Slide

Slide

Residual Plot Against x

[Figure: TV Ads Residual Plot. Residuals (vertical axis, -3 to 3) plotted against TV Ads (horizontal axis, 0 to 4).]

70 70 Slide

Slide

Standardized Residuals

Standardized Residual for Observation i

$\dfrac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$

where:

$s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$

$h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}$

71 71 Slide

Slide

Standardized Residual Plot

The standardized residual plot can provide insight about the assumption that the error term $\varepsilon$ has a normal distribution.

If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.

72 72 Slide

Slide

Standardized Residual Plot

Excel’s Regression tool can be used to obtain the standardized residuals.

The steps described earlier to conduct a regression analysis are performed with one change:
• When the Regression dialog box appears, we must select the Standardized Residuals option.

The Standardized Residuals option does not automatically produce a standardized residual plot.

73 73 Slide

Slide

Excel Value Worksheet

Standardized Residual Plot

      A             B             C           D
29    RESIDUAL OUTPUT
30
31    Observation   Predicted Y   Residuals   Standard Residuals
32    1             15            -1          -0.534522484
33    2             25            -1          -0.534522484
34    3             20            -2          -1.069044968
35    4             15            2           1.069044968
36    5             25            2           1.069044968
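Applying the leverage-based formula for standardized residuals to the Reed Auto data gives values that differ somewhat from the Standard Residuals column above. The sketch below computes both. The second calculation rests on an assumption made here, namely that Excel divides each residual by $\sqrt{\text{SSE}/(n - 1)}$; that assumption reproduces the tabulated figures for this data set, but it is not stated in the slides.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
residuals = [-1, -1, -2, 2, 2]  # from the RESIDUAL OUTPUT table
n = len(tv_ads)
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))    # standard error of the estimate

x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

# Leverage-based formula: residual / (s * sqrt(1 - h_i)).
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in tv_ads]
by_formula = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Assumption (not from the slides): Excel divides by sqrt(SSE / (n - 1)).
excel_style = [e / math.sqrt(sse / (n - 1)) for e in residuals]

print([round(v, 4) for v in by_formula])
print([round(v, 4) for v in excel_style])
```

Under either calculation, all five values lie well inside the -2 to +2 range.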

74 74 Slide

Slide

Standardized Residual Plot

Excel’s Chart Wizard can be used to construct the standardized residual plot.

A scatter diagram is developed in which:
• The values of the independent variable are placed on the horizontal axis
• The values of the standardized residuals are placed on the vertical axis

75 75 Slide

Slide

Standardized Residual Plot

Excel Standardized Residual Plot

      A             B             C           D
29    RESIDUAL OUTPUT
30
31    Observation   Predicted Y   Residuals   Standard Residuals
32    1             15            -1          -0.534522
33    2             25            -1          -0.534522
34    3             20            -2          -1.069045
35    4             15            2           1.069045
36    5             25            2           1.069045

[Figure: Standard Residuals (vertical axis, -1.5 to 1.5) plotted against a horizontal axis labeled Cars Sold (0 to 30).]

76 76 Slide

Slide

Standardized Residual Plot

All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that $\varepsilon$ has a normal distribution.

77 77 Slide

Slide

Outliers and Influential Observations

Detecting Outliers

• An outlier is an observation that is unusual in comparison with the other data.

• Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2.

• This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier.

• This rule’s shortcoming can be circumvented by using studentized deleted residuals.

• The |i th studentized deleted residual| will be larger than the |i th standardized residual|.
