98
Chapter 7 Simple Linear Regression Business Statistic

Ch 6. Simple Regression

Embed Size (px)

Citation preview

Page 1: Ch 6. Simple Regression

Chapter 7Simple Linear

Regression

Business Statistic

Page 2: Ch 6. Simple Regression

Models

Page 3: Ch 6. Simple Regression

Models Representation of some phenomenon Mathematical model is a

mathematical expression of some phenomenon

Often describe relationships between variables

Types Deterministic models Probabilistic models

Page 4: Ch 6. Simple Regression

Deterministic Models Hypothesize exact

relationships Suitable when prediction error

is negligible Example: force is exactly

mass times acceleration F = m·a

Page 5: Ch 6. Simple Regression

Probabilistic Models Hypothesize two components

DeterministicRandom error

Example: sales volume (y) is 10 times advertising spending (x) + random error

y = 10x + Random error may be due to factors

other than advertising

Page 6: Ch 6. Simple Regression

Types of Probabilistic Models

ProbabilisticModels

Regression Models

Correlation Models

Page 7: Ch 6. Simple Regression

Regression Models

Page 8: Ch 6. Simple Regression

Types of Probabilistic Models

ProbabilisticModels

Regression Models

Correlation Models

Page 9: Ch 6. Simple Regression

Regression Models• Answers ‘What is the relationship between the

variables?’• Equation used

– One numerical dependent (response) variable What is to be predicted

– One or more numerical or categorical independent (explanatory) variables

• Used mainly for prediction and estimation

Page 10: Ch 6. Simple Regression

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Page 11: Ch 6. Simple Regression

Model Specification

Page 12: Ch 6. Simple Regression

Regression Modeling Steps

1. Hypothesize deterministic component

2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Page 13: Ch 6. Simple Regression

Specifying the Model

1. Define variables• Conceptual (e.g., Advertising, price)• Empirical (e.g., List price, regular price) • Measurement (e.g., $, Units)

2. Hypothesize nature of relationship• Expected effects (i.e., Coefficients’ signs)• Functional form (linear or non-linear)• Interactions

Page 14: Ch 6. Simple Regression

Model Specification Is Based on Theory

Theory of field (e.g., Sociology)

Mathematical theory Previous research ‘Common sense’

Page 15: Ch 6. Simple Regression

Advertising

Sales

Advertising

Sales

Advertising

Sales

Advertising

Sales

Thinking Challenge: Which Is More Logical?

Page 16: Ch 6. Simple Regression

Simple

1 ExplanatoryVariable

Types of Regression Models

RegressionModels

2+ ExplanatoryVariables

Multiple

Linear Non-Linear Linear Non-

Linear

Page 17: Ch 6. Simple Regression

Linear Regression Model

Page 18: Ch 6. Simple Regression

Types of Regression Models

Simple

1 ExplanatoryVariable

RegressionModels

2+ ExplanatoryVariables

Multiple

Linear Non-Linear Linear Non-

Linear

Page 19: Ch 6. Simple Regression

y x 0 1

Linear Regression Model

Relationship between variables is a linear function

Dependent (Response) Variable

Independent (Explanatory) Variable

Population Slope

Population y-intercept

Random Error

Page 20: Ch 6. Simple Regression

y

E(y) = β0 + β1x (line of means)

β0 = y-interceptx

Change in y

Change in xβ1 = Slope

Line of Means

Page 21: Ch 6. Simple Regression

Population & Sample Regression Models

Population

$ $

$

$ $

Unknown Relationship

0 1y x

Random Sample

$$ $$

$$ $$

0 1ˆ ˆ ˆy x

Page 22: Ch 6. Simple Regression

y

x

Population Linear Regression Model

0 1i i iy x

0 1E y x

Observedvalue

Observed value

i = Random error

Page 23: Ch 6. Simple Regression

y

x

0 1ˆ ˆ ˆi i iy x

Sample Linear Regression Model

0 1ˆ ˆˆi iy x

Unsampled observation

i = Random error

Observed value

^

Page 24: Ch 6. Simple Regression

Estimating Parameters:Least Squares Method

Page 25: Ch 6. Simple Regression

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of random

error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Page 26: Ch 6. Simple Regression

Scattergram1. Plot of all (xi, yi) pairs2. Suggests how well model will fit

0204060

0 20 40 60

x

y

Page 27: Ch 6. Simple Regression

0204060

0 20 40 60x

y

Thinking Challenge

• How would you draw a line through the points?• How do you determine which line ‘fits best’?

Page 28: Ch 6. Simple Regression

Least Squares ‘Best fit’ means difference

between actual y values and predicted y values are a minimum

But positive differences off-set negative 2 2

1 1

ˆ ˆn n

ii ii i

y y

• Least Squares minimizes the Sum of the Squared Differences (SSE)

Page 29: Ch 6. Simple Regression

Least Squares Graphically

2

y

x

1 3

4

^^

^^

2 0 1 2 2ˆ ˆ ˆy x

0 1ˆ ˆˆi iy x

2 2 2 2 21 2 3 4

1

ˆ ˆ ˆ ˆ ˆLS minimizes n

ii

Page 30: Ch 6. Simple Regression

Coefficient Equations

1 1

11 2

12

1

ˆ

n n

i ini i

i ixy i

nxx

ini

ii

x yx ySS n

SSx

xn

Slope

y-intercept

0 1ˆ ˆy x Prediction Equation

0 1ˆ ˆy x

Page 31: Ch 6. Simple Regression

Computation Table

xi2

x12

x22

: :xn

2

Σxi2

yi

y1

y2

:yn

Σyi

xi

x1

x2

:xn

Σxi

2

2

2

2

2

yi

y1

y2

:yn

Σyi

xiyi

Σxiyi

x1y1

x2y2

xnyn

Page 32: Ch 6. Simple Regression

Interpretation of Coefficients

^

^^1. Slope (1)

• Estimated y changes by 1 for each 1unit increase in x— If 1 = 2, then Sales (y) is expected to increase by 2

for each 1 unit increase in Advertising (x)

2. Y-Intercept (0)• Average value of y when x = 0

— If 0 = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0

^

^

Page 33: Ch 6. Simple Regression

Least Squares ExampleYou’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find the least squares line relatingsales and advertising.

Page 34: Ch 6. Simple Regression

01234

0 1 2 3 4 5

Scattergram Sales vs. Advertising

Sales

Advertising

Page 35: Ch 6. Simple Regression

Parameter Estimation Solution Table

2 2

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20

15 10 55 26 37

xi yi xi yi xiyi

Page 36: Ch 6. Simple Regression

Parameter Estimation Solution

1 1

11 2 2

12

1

15 1037

5ˆ .7015

555

n n

i ini i

i ii

n

ini

ii

x yx y

n

xx

n

ˆ .1 .7y x

0 1ˆ ˆ 2 .70 3 .10y x

Page 37: Ch 6. Simple Regression

Parameter Estimation Computer Output

Parameter Estimates

Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

0^

1^

ˆ .1 .7y x

Page 38: Ch 6. Simple Regression

Coefficient Interpretation Solution1. Slope (1)

• Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)

^

2. Y-Intercept (0)• Average value of Sales Volume (y) is -.10 units

when Advertising (x) is 0— Difficult to explain to marketing manager— Expect some sales without advertising

^

Page 39: Ch 6. Simple Regression

01234

0 1 2 3 4 5

Regression Line Fitted to the Data

Sales

Advertising

ˆ .1 .7y x

Page 40: Ch 6. Simple Regression

Least Squares Thinking Challenge

You’re an economist for the county cooperative. You gather the following data:Fertilizer (lb.)Yield (lb.)

4 3.0 6 5.510 6.512 9.0

Find the least squares line relatingcrop yield and fertilizer.

Page 41: Ch 6. Simple Regression

02468

10

0 5 10 15

Scattergram Crop Yield vs. Fertilizer*

Yield (lb.)

Fertilizer (lb.)

Page 42: Ch 6. Simple Regression

Parameter Estimation Solution Table*

4 3.0 16 9.00 12 6 5.5 36 30.25 3310 6.5 100 42.25 6512 9.0 144 81.00 10832 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Page 43: Ch 6. Simple Regression

Parameter Estimation Solution*

1 1

11 2 2

12

1

32 24218

4ˆ .6532

2964

n n

i ini i

i ii

n

ini

ii

x yx y

n

xx

n

0 1ˆ ˆ 6 .65 8 .80y x

ˆ .8 .65y x

Page 44: Ch 6. Simple Regression

Coefficient Interpretation Solution*

2. Y-Intercept (0)• Average Crop Yield (y) is expected to be 0.8 lb.

when no Fertilizer (x) is used

^

^1. Slope (1)• Crop Yield (y) is expected to increase by .65 lb.

for each 1 lb. increase in Fertilizer (x)

Page 45: Ch 6. Simple Regression

Regression Line Fitted to the Data*

02468

10

0 5 10 15

Yield (lb.)

Fertilizer (lb.)

ˆ .8 .65y x

Page 46: Ch 6. Simple Regression

Probability Distribution of Random Error

Page 47: Ch 6. Simple Regression

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Page 48: Ch 6. Simple Regression

Linear Regression Assumptions

1. Mean of probability distribution of error, ε, is 0

2. Probability distribution of error has constant variance

3. Probability distribution of error, ε, is normal

4. Errors are independent

Page 49: Ch 6. Simple Regression

Error Probability Distribution

x1 x2 x3

yE(y) = β0 + β1x

x

Page 50: Ch 6. Simple Regression

Random Error Variation

• Measured by standard error of regression model– Sample standard deviation of : s^

• Affects several factors– Parameter significance– Prediction accuracy

^• Variation of actual y from predicted y, y

Page 51: Ch 6. Simple Regression

y

xxi

Variation Measures

0 1ˆ ˆˆi iy x

yi 2ˆ( )i iy y

Unexplained sum of squares

2( )iy yTotal sum of squares

2ˆ( )iy yExplained sum of squares

y

Page 52: Ch 6. Simple Regression

Estimation of σ2

22 ˆ2 i i

SSEs where SSE y yn

2

2SSEs sn

Page 53: Ch 6. Simple Regression

Calculating SSE, s2, s ExampleYou’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find SSE, s2, and s.

Page 54: Ch 6. Simple Regression

Calculating SSE Solution

1 1

2 1

3 2

4 2

5 4

xi yi

.6

1.3

2

2.7

3.4

ˆy y

.4

-.3

0

-.7

.6

2ˆ( )y yˆ .1 .7y x

.16

.09

0

.49

.36

SSE=1.1

Page 55: Ch 6. Simple Regression

Calculating s2 and s Solution

2 1.1 .366672 5 2

SSEsn

.36667 .6055s

Page 56: Ch 6. Simple Regression

Testing for Significance

Evaluating the Model

Page 57: Ch 6. Simple Regression

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Page 58: Ch 6. Simple Regression

Test of Slope Coefficient Shows if there is a linear relationship

between x and y Involves population slope 1

Hypotheses H0: 1 = 0 (No Linear Relationship) Ha: 1 0 (Linear Relationship)

Theoretical basis is sampling distribution of slope

Page 59: Ch 6. Simple Regression

y

Population Linex

Sample 1 Line

Sample 2 Line

Sampling Distribution of Sample Slopes

1

Sampling Distribution

11

11SS

^

^

All Possible Sample Slopes

Sample 1: 2.5Sample 2: 1.6 Sample 3: 1.8Sample 4: 2.1 : :

Very large number of sample slopes

Page 60: Ch 6. Simple Regression

Slope Coefficient Test Statistic

1

1 1

ˆ

2

12

1

ˆ ˆ2

wherexx

n

ini

xx ii

t df nsSSS

xSS x

n

Page 61: Ch 6. Simple Regression

Test of Slope Coefficient Example

You’re a marketing analyst for Hasbro Toys. You find β0 = –.1, β1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Is the relationship significant at the .05 level of significance?

^^

Page 62: Ch 6. Simple Regression

Test of Slope Coefficient Solution

H0:Ha: df Critical

Value(s):

Test Statistic:

Decision:

Conclusion:

t0 3.182-3.182

.025

Reject H0 Reject H0

.025

1 = 01 0.055 - 2 = 3

Page 63: Ch 6. Simple Regression

Solution Table

xi yi xi2 yi

2 xiyi

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20

15 10 55 26 37

Page 64: Ch 6. Simple Regression

Test StatisticSolution

1

1

ˆ 2

1

ˆ

.6055 .191415

555

ˆ .70 3.657.1914

xx

SSSS

tS

Page 65: Ch 6. Simple Regression

Test of Slope Coefficient Solution

H0: 1 = 0Ha: 1 0 .05df 5 - 2 = 3Critical

Value(s):

Test Statistic:

Decision:

Conclusion:

1

1

ˆ

ˆ .70 3.657.1914

tS

Reject at = .05

There is evidence of a relationshipt0 3.182-3.182

.025

Reject H0 Reject H0

.025

Page 66: Ch 6. Simple Regression

Test of Slope CoefficientComputer Output

Parameter Estimates Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

t = 1 / S

P-Value

S1 1 1

^^^^

Page 67: Ch 6. Simple Regression

Correlation Models

Page 68: Ch 6. Simple Regression

Types of Probabilistic Models

ProbabilisticModels

Regression Models

Correlation Models

Page 69: Ch 6. Simple Regression

Correlation Models Answers ‘How strong is the linear

relationship between two variables?’ Coefficient of correlation

Sample correlation coefficient denoted r

Values range from –1 to +1 Measures degree of association Does not indicate cause–effect

relationship

Page 70: Ch 6. Simple Regression

Coefficient of Correlation

xy

xx yy

SSr

SS SS

2

2

2

2

xx

yy

xy

xSS x

n

ySS y

nx y

SS xyn

where

Page 71: Ch 6. Simple Regression

Coefficient of Correlation Values

–1.0 +1.00–.5 +.5

No Linear Correlation

Perfect Negative

Correlation

Perfect Positive

Correlation

Increasing degree of positive correlation

Increasing degree of negative correlation

Page 72: Ch 6. Simple Regression

Coefficient of Correlation Example

You’re a marketing analyst for Hasbro Toys. Ad $ Sales (Units)

1 12 13 24 25 4

Calculate the coefficient ofcorrelation.

Page 73: Ch 6. Simple Regression

Solution Table

xi yi xi2 yi

2 xiyi

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20

15 10 55 26 37

Page 74: Ch 6. Simple Regression

Coefficient of Correlation Solution

22

2

22

2

(15)55 105

(10)26 65

(15)(10)37 75

xx

yy

xy

xSS x

n

ySS y

nx y

SS xyn

7 .90410 6

xy

xx yy

SSr

SS SS

Page 75: Ch 6. Simple Regression

Coefficient of Correlation Thinking Challenge

You’re an economist for the county cooperative. You gather the following data:Fertilizer (lb.)Yield (lb.)

4 3.0 6 5.510 6.512 9.0

Find the coefficient of correlation.

Page 76: Ch 6. Simple Regression

Solution Table*

4 3.0 16 9.00 12 6 5.5 36 30.25 3310 6.5 100 42.25 6512 9.0 144 81.00 10832 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Page 77: Ch 6. Simple Regression

Coefficient of Correlation Solution*

22

2

22

2

(32)296 404

(24)162.5 18.54

(32)(24)218 264

xx

yy

xy

xSS x

n

ySS y

nx y

SS xyn

26 .95640 18.5

xy

xx yy

SSr

SS SS

Page 78: Ch 6. Simple Regression

Coefficient of DeterminationProportion of variation ‘explained’ by relationship between x and y

2 Explained VariationTotal Variation

yy

yy

SS SSEr

SS

0 r2 1r2 = (coefficient of correlation)2

Page 79: Ch 6. Simple Regression

Coefficient of Determination Example

You’re a marketing analyst for Hasbro Toys. You know r = .904.

Ad $ Sales (Units)1 12 13 24 25 4

Calculate and interpret thecoefficient of determination.

Page 80: Ch 6. Simple Regression

Coefficient of Determination Solution

r2 = (coefficient of correlation)2

r2 = (.904)2

r2 = .817

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.

Page 81: Ch 6. Simple Regression

r2 Computer Output

Root MSE 0.60553 R-square 0.8167 Dep Mean 2.00000 Adj R-sq 0.7556 C.V. 30.27650

r2 adjusted for number of explanatory variables & sample size

r2

Page 82: Ch 6. Simple Regression

Using the Model for Prediction & Estimation

Page 83: Ch 6. Simple Regression

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of random

error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and

estimation

Page 84: Ch 6. Simple Regression

Prediction With Regression Models

Types of predictions Point estimates Interval estimates

What is predicted Population mean response E(y)

for given x Point on population regression line

Individual response (yi) for given x

Page 85: Ch 6. Simple Regression

What Is Predicted

Mean y, E(y)

y

^yIndividual

Prediction, y

E(y) = x

^x

xP

^ ^y i =

x

Page 86: Ch 6. Simple Regression

Confidence Interval Estimate for Mean Value of y at x = xp

xx

p

SSxx

nSty

2

2/1ˆ

df = n – 2

Page 87: Ch 6. Simple Regression

Factors Affecting Interval Width

1. Level of confidence (1 – )• Width increases as confidence increases

2. Data dispersion (s)• Width increases as variation increases

3. Sample size• Width decreases as sample size

increases4. Distance of xp from meanx

• Width increases as distance increases

Page 88: Ch 6. Simple Regression

Why Distance from Mean?

Sample 2 Line

y

xx1 x2

ySample 1 Line

Greater dispersion than x1

x

Page 89: Ch 6. Simple Regression

Confidence Interval Estimate Example

You’re a marketing analyst for Hasbro Toys.You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Find a 95% confidence interval forthe mean sales when advertising is $4.

^^

Page 90: Ch 6. Simple Regression

Solution Table

x i y i x i2 y i

2 x iy i

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 2015 10 55 26 37

Page 91: Ch 6. Simple Regression

Confidence Interval Estimate Solution

2

/ 2

2

ˆ .1 .7 4 2.7

4 312.7 3.182 .60555 10

1.645 ( ) 3.755

p

xx

x xy t s

n SS

y

E Y

x to be predicted

Page 92: Ch 6. Simple Regression

Prediction Interval of Individual Value of y at x = xp

2

/ 21ˆ 1 p

xx

x xy t S

n SS

Note!

df = n – 2

Page 93: Ch 6. Simple Regression

Why the Extra ‘S’?

Expected(Mean) y

yy we're trying topredict

Prediction, y

xxp

E(y) = x

^^ ^

y i = x i

Page 94: Ch 6. Simple Regression

Prediction Interval ExampleYou’re a marketing analyst for Hasbro Toys.You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Predict the sales when advertising is $4. Use a 95% prediction interval.

^^

Page 95: Ch 6. Simple Regression

Solution Table

x i y i x i2 y i

2 x iy i

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 2015 10 55 26 37

Page 96: Ch 6. Simple Regression

Prediction Interval Solution

2

/ 2

2

4

1ˆ 1

ˆ .1 .7 4 2.7

4 312.7 3.182 .6055 15 10

.503 4.897

p

xx

x xy t s

n SS

y

y

x to be predicted

Page 97: Ch 6. Simple Regression

Interval Estimate Computer Output

Dep Var Pred Std Err Low95% Upp95% Low95% Upp95%Obs SALES Value Predict Mean Mean Predict Predict 1 1.000 0.600 0.469 -0.892 2.092 -1.837 3.037 2 1.000 1.300 0.332 0.244 2.355 -0.897 3.497 3 2.000 2.000 0.271 1.138 2.861 -0.111 4.111 4 2.000 2.700 0.332 1.644 3.755 0.502 4.897 5 4.000 3.400 0.469 1.907 4.892 0.962 5.837

Predicted y when x = 4

Confidence Interval

SYPrediction

Interval

Page 98: Ch 6. Simple Regression

Confidence Intervals versus Prediction Intervals

x

y

x

^ ^ ^y i =

x i