Ch 6. Simple Regression

Preview:

Citation preview

Chapter 7Simple Linear

Regression

Business Statistic

Models

Models Representation of some phenomenon Mathematical model is a

mathematical expression of some phenomenon

Often describe relationships between variables

Types Deterministic models Probabilistic models

Deterministic Models Hypothesize exact

relationships Suitable when prediction error

is negligible Example: force is exactly

mass times acceleration F = m·a

Probabilistic Models Hypothesize two components

DeterministicRandom error

Example: sales volume (y) is 10 times advertising spending (x) + random error

y = 10x + Random error may be due to factors

other than advertising

Types of Probabilistic Models

ProbabilisticModels

Regression Models

Correlation Models

Regression Models

Types of Probabilistic Models

ProbabilisticModels

Regression Models

Correlation Models

Regression Models• Answers ‘What is the relationship between the

variables?’• Equation used

– One numerical dependent (response) variable What is to be predicted

– One or more numerical or categorical independent (explanatory) variables

• Used mainly for prediction and estimation

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Model Specification

Regression Modeling Steps

1. Hypothesize deterministic component

2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Specifying the Model

1. Define variables• Conceptual (e.g., Advertising, price)• Empirical (e.g., List price, regular price) • Measurement (e.g., $, Units)

2. Hypothesize nature of relationship• Expected effects (i.e., Coefficients’ signs)• Functional form (linear or non-linear)• Interactions

Model Specification Is Based on Theory

Theory of field (e.g., Sociology)

Mathematical theory Previous research ‘Common sense’

Advertising

Sales

Advertising

Sales

Advertising

Sales

Advertising

Sales

Thinking Challenge: Which Is More Logical?

Simple

1 ExplanatoryVariable

Types of Regression Models

RegressionModels

2+ ExplanatoryVariables

Multiple

Linear Non-Linear Linear Non-

Linear

Linear Regression Model

Types of Regression Models

Simple

1 ExplanatoryVariable

RegressionModels

2+ ExplanatoryVariables

Multiple

Linear Non-Linear Linear Non-

Linear

y x 0 1

Linear Regression Model

Relationship between variables is a linear function

Dependent (Response) Variable

Independent (Explanatory) Variable

Population Slope

Population y-intercept

Random Error

y

E(y) = β0 + β1x (line of means)

β0 = y-interceptx

Change in y

Change in xβ1 = Slope

Line of Means

Population & Sample Regression Models

Population

$ $

$

$ $

Unknown Relationship

0 1y x

Random Sample

$$ $$

$$ $$

0 1ˆ ˆ ˆy x

y

x

Population Linear Regression Model

0 1i i iy x

0 1E y x

Observedvalue

Observed value

i = Random error

y

x

0 1ˆ ˆ ˆi i iy x

Sample Linear Regression Model

0 1ˆ ˆˆi iy x

Unsampled observation

i = Random error

Observed value

^

Estimating Parameters:Least Squares Method

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of random

error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Scattergram1. Plot of all (xi, yi) pairs2. Suggests how well model will fit

0204060

0 20 40 60

x

y

0204060

0 20 40 60x

y

Thinking Challenge

• How would you draw a line through the points?• How do you determine which line ‘fits best’?

Least Squares ‘Best fit’ means difference

between actual y values and predicted y values are a minimum

But positive differences off-set negative 2 2

1 1

ˆ ˆn n

ii ii i

y y

• Least Squares minimizes the Sum of the Squared Differences (SSE)

Least Squares Graphically

2

y

x

1 3

4

^^

^^

2 0 1 2 2ˆ ˆ ˆy x

0 1ˆ ˆˆi iy x

2 2 2 2 21 2 3 4

1

ˆ ˆ ˆ ˆ ˆLS minimizes n

ii

Coefficient Equations

1 1

11 2

12

1

ˆ

n n

i ini i

i ixy i

nxx

ini

ii

x yx ySS n

SSx

xn

Slope

y-intercept

0 1ˆ ˆy x Prediction Equation

0 1ˆ ˆy x

Computation Table

xi2

x12

x22

: :xn

2

Σxi2

yi

y1

y2

:yn

Σyi

xi

x1

x2

:xn

Σxi

2

2

2

2

2

yi

y1

y2

:yn

Σyi

xiyi

Σxiyi

x1y1

x2y2

xnyn

Interpretation of Coefficients

^

^^1. Slope (1)

• Estimated y changes by 1 for each 1unit increase in x— If 1 = 2, then Sales (y) is expected to increase by 2

for each 1 unit increase in Advertising (x)

2. Y-Intercept (0)• Average value of y when x = 0

— If 0 = 4, then Average Sales (y) is expected to be 4 when Advertising (x) is 0

^

^

Least Squares ExampleYou’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find the least squares line relatingsales and advertising.

01234

0 1 2 3 4 5

Scattergram Sales vs. Advertising

Sales

Advertising

Parameter Estimation Solution Table

2 2

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20

15 10 55 26 37

xi yi xi yi xiyi

Parameter Estimation Solution

1 1

11 2 2

12

1

15 1037

5ˆ .7015

555

n n

i ini i

i ii

n

ini

ii

x yx y

n

xx

n

ˆ .1 .7y x

0 1ˆ ˆ 2 .70 3 .10y x

Parameter Estimation Computer Output

Parameter Estimates

Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

0^

1^

ˆ .1 .7y x

Coefficient Interpretation Solution1. Slope (1)

• Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)

^

2. Y-Intercept (0)• Average value of Sales Volume (y) is -.10 units

when Advertising (x) is 0— Difficult to explain to marketing manager— Expect some sales without advertising

^

01234

0 1 2 3 4 5

Regression Line Fitted to the Data

Sales

Advertising

ˆ .1 .7y x

Least Squares Thinking Challenge

You’re an economist for the county cooperative. You gather the following data:Fertilizer (lb.)Yield (lb.)

4 3.0 6 5.510 6.512 9.0

Find the least squares line relatingcrop yield and fertilizer.

02468

10

0 5 10 15

Scattergram Crop Yield vs. Fertilizer*

Yield (lb.)

Fertilizer (lb.)

Parameter Estimation Solution Table*

4 3.0 16 9.00 12 6 5.5 36 30.25 3310 6.5 100 42.25 6512 9.0 144 81.00 10832 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Parameter Estimation Solution*

1 1

11 2 2

12

1

32 24218

4ˆ .6532

2964

n n

i ini i

i ii

n

ini

ii

x yx y

n

xx

n

0 1ˆ ˆ 6 .65 8 .80y x

ˆ .8 .65y x

Coefficient Interpretation Solution*

2. Y-Intercept (0)• Average Crop Yield (y) is expected to be 0.8 lb.

when no Fertilizer (x) is used

^

^1. Slope (1)• Crop Yield (y) is expected to increase by .65 lb.

for each 1 lb. increase in Fertilizer (x)

Regression Line Fitted to the Data*

02468

10

0 5 10 15

Yield (lb.)

Fertilizer (lb.)

ˆ .8 .65y x

Probability Distribution of Random Error

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Linear Regression Assumptions

1. Mean of probability distribution of error, ε, is 0

2. Probability distribution of error has constant variance

3. Probability distribution of error, ε, is normal

4. Errors are independent

Error Probability Distribution

x1 x2 x3

yE(y) = β0 + β1x

x

Random Error Variation

• Measured by standard error of regression model– Sample standard deviation of : s^

• Affects several factors– Parameter significance– Prediction accuracy

^• Variation of actual y from predicted y, y

y

xxi

Variation Measures

0 1ˆ ˆˆi iy x

yi 2ˆ( )i iy y

Unexplained sum of squares

2( )iy yTotal sum of squares

2ˆ( )iy yExplained sum of squares

y

Estimation of σ2

22 ˆ2 i i

SSEs where SSE y yn

2

2SSEs sn

Calculating SSE, s2, s ExampleYou’re a marketing analyst for Hasbro Toys. You gather the following data:

Ad $ Sales (Units)1 12 13 24 25 4

Find SSE, s2, and s.

Calculating SSE Solution

1 1

2 1

3 2

4 2

5 4

xi yi

.6

1.3

2

2.7

3.4

ˆy y

.4

-.3

0

-.7

.6

2ˆ( )y yˆ .1 .7y x

.16

.09

0

.49

.36

SSE=1.1

Calculating s2 and s Solution

2 1.1 .366672 5 2

SSEsn

.36667 .6055s

Testing for Significance

Evaluating the Model

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of

random error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and estimation

Test of Slope Coefficient Shows if there is a linear relationship

between x and y Involves population slope 1

Hypotheses H0: 1 = 0 (No Linear Relationship) Ha: 1 0 (Linear Relationship)

Theoretical basis is sampling distribution of slope

y

Population Linex

Sample 1 Line

Sample 2 Line

Sampling Distribution of Sample Slopes

1

Sampling Distribution

11

11SS

^

^

All Possible Sample Slopes

Sample 1: 2.5Sample 2: 1.6 Sample 3: 1.8Sample 4: 2.1 : :

Very large number of sample slopes

Slope Coefficient Test Statistic

1

1 1

ˆ

2

12

1

ˆ ˆ2

wherexx

n

ini

xx ii

t df nsSSS

xSS x

n

Test of Slope Coefficient Example

You’re a marketing analyst for Hasbro Toys. You find β0 = –.1, β1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Is the relationship significant at the .05 level of significance?

^^

Test of Slope Coefficient Solution

H0:Ha: df Critical

Value(s):

Test Statistic:

Decision:

Conclusion:

t0 3.182-3.182

.025

Reject H0 Reject H0

.025

1 = 01 0.055 - 2 = 3

Solution Table

xi yi xi2 yi

2 xiyi

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20

15 10 55 26 37

Test StatisticSolution

1

1

ˆ 2

1

ˆ

.6055 .191415

555

ˆ .70 3.657.1914

xx

SSSS

tS

Test of Slope Coefficient Solution

H0: 1 = 0Ha: 1 0 .05df 5 - 2 = 3Critical

Value(s):

Test Statistic:

Decision:

Conclusion:

1

1

ˆ

ˆ .70 3.657.1914

tS

Reject at = .05

There is evidence of a relationshipt0 3.182-3.182

.025

Reject H0 Reject H0

.025

Test of Slope CoefficientComputer Output

Parameter Estimates Parameter Standard T for H0:Variable DF Estimate Error Param=0 Prob>|T|INTERCEP 1 -0.1000 0.6350 -0.157 0.8849ADVERT 1 0.7000 0.1914 3.656 0.0354

t = 1 / S

P-Value

S1 1 1

^^^^

Correlation Models

Types of Probabilistic Models

ProbabilisticModels

Regression Models

Correlation Models

Correlation Models Answers ‘How strong is the linear

relationship between two variables?’ Coefficient of correlation

Sample correlation coefficient denoted r

Values range from –1 to +1 Measures degree of association Does not indicate cause–effect

relationship

Coefficient of Correlation

xy

xx yy

SSr

SS SS

2

2

2

2

xx

yy

xy

xSS x

n

ySS y

nx y

SS xyn

where

Coefficient of Correlation Values

–1.0 +1.00–.5 +.5

No Linear Correlation

Perfect Negative

Correlation

Perfect Positive

Correlation

Increasing degree of positive correlation

Increasing degree of negative correlation

Coefficient of Correlation Example

You’re a marketing analyst for Hasbro Toys. Ad $ Sales (Units)

1 12 13 24 25 4

Calculate the coefficient ofcorrelation.

Solution Table

xi yi xi2 yi

2 xiyi

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 20

15 10 55 26 37

Coefficient of Correlation Solution

22

2

22

2

(15)55 105

(10)26 65

(15)(10)37 75

xx

yy

xy

xSS x

n

ySS y

nx y

SS xyn

7 .90410 6

xy

xx yy

SSr

SS SS

Coefficient of Correlation Thinking Challenge

You’re an economist for the county cooperative. You gather the following data:Fertilizer (lb.)Yield (lb.)

4 3.0 6 5.510 6.512 9.0

Find the coefficient of correlation.

Solution Table*

4 3.0 16 9.00 12 6 5.5 36 30.25 3310 6.5 100 42.25 6512 9.0 144 81.00 10832 24.0 296 162.50 218

xi2yi xi yi xiyi

2

Coefficient of Correlation Solution*

22

2

22

2

(32)296 404

(24)162.5 18.54

(32)(24)218 264

xx

yy

xy

xSS x

n

ySS y

nx y

SS xyn

26 .95640 18.5

xy

xx yy

SSr

SS SS

Coefficient of DeterminationProportion of variation ‘explained’ by relationship between x and y

2 Explained VariationTotal Variation

yy

yy

SS SSEr

SS

0 r2 1r2 = (coefficient of correlation)2

Coefficient of Determination Example

You’re a marketing analyst for Hasbro Toys. You know r = .904.

Ad $ Sales (Units)1 12 13 24 25 4

Calculate and interpret thecoefficient of determination.

Coefficient of Determination Solution

r2 = (coefficient of correlation)2

r2 = (.904)2

r2 = .817

Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.

r2 Computer Output

Root MSE 0.60553 R-square 0.8167 Dep Mean 2.00000 Adj R-sq 0.7556 C.V. 30.27650

r2 adjusted for number of explanatory variables & sample size

r2

Using the Model for Prediction & Estimation

Regression Modeling Steps

1. Hypothesize deterministic component2. Estimate unknown model parameters3. Specify probability distribution of random

error term• Estimate standard deviation of error

4. Evaluate model5. Use model for prediction and

estimation

Prediction With Regression Models

Types of predictions Point estimates Interval estimates

What is predicted Population mean response E(y)

for given x Point on population regression line

Individual response (yi) for given x

What Is Predicted

Mean y, E(y)

y

^yIndividual

Prediction, y

E(y) = x

^x

xP

^ ^y i =

x

Confidence Interval Estimate for Mean Value of y at x = xp

xx

p

SSxx

nSty

2

2/1ˆ

df = n – 2

Factors Affecting Interval Width

1. Level of confidence (1 – )• Width increases as confidence increases

2. Data dispersion (s)• Width increases as variation increases

3. Sample size• Width decreases as sample size

increases4. Distance of xp from meanx

• Width increases as distance increases

Why Distance from Mean?

Sample 2 Line

y

xx1 x2

ySample 1 Line

Greater dispersion than x1

x

Confidence Interval Estimate Example

You’re a marketing analyst for Hasbro Toys.You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Find a 95% confidence interval forthe mean sales when advertising is $4.

^^

Solution Table

x i y i x i2 y i

2 x iy i

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 2015 10 55 26 37

Confidence Interval Estimate Solution

2

/ 2

2

ˆ .1 .7 4 2.7

4 312.7 3.182 .60555 10

1.645 ( ) 3.755

p

xx

x xy t s

n SS

y

E Y

x to be predicted

Prediction Interval of Individual Value of y at x = xp

2

/ 21ˆ 1 p

xx

x xy t S

n SS

Note!

df = n – 2

Why the Extra ‘S’?

Expected(Mean) y

yy we're trying topredict

Prediction, y

xxp

E(y) = x

^^ ^

y i = x i

Prediction Interval ExampleYou’re a marketing analyst for Hasbro Toys.You find β0 = -.1, β 1 = .7 and s = .6055.

Ad $ Sales (Units)1 12 13 24 25 4

Predict the sales when advertising is $4. Use a 95% prediction interval.

^^

Solution Table

x i y i x i2 y i

2 x iy i

1 1 1 1 12 1 4 1 23 2 9 4 64 2 16 4 85 4 25 16 2015 10 55 26 37

Prediction Interval Solution

2

/ 2

2

4

1ˆ 1

ˆ .1 .7 4 2.7

4 312.7 3.182 .6055 15 10

.503 4.897

p

xx

x xy t s

n SS

y

y

x to be predicted

Interval Estimate Computer Output

Dep Var Pred Std Err Low95% Upp95% Low95% Upp95%Obs SALES Value Predict Mean Mean Predict Predict 1 1.000 0.600 0.469 -0.892 2.092 -1.837 3.037 2 1.000 1.300 0.332 0.244 2.355 -0.897 3.497 3 2.000 2.000 0.271 1.138 2.861 -0.111 4.111 4 2.000 2.700 0.332 1.644 3.755 0.502 4.897 5 4.000 3.400 0.469 1.907 4.892 0.962 5.837

Predicted y when x = 4

Confidence Interval

SYPrediction

Interval

Confidence Intervals versus Prediction Intervals

x

y

x

^ ^ ^y i =

x i

Recommended