17
Simple Regression Model INSR 260, Spring 2009 Bob Stine 1

Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Simple Regression Model

INSR 260, Spring 2009Bob Stine

1

Page 2: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Overview Simple Regression Model (SRM)

Estimators, terminology

Assumptions

Inference

Prediction

Examples! ! ! ! (both from Stat 102)Promotion responseDiamond values

2

Page 3: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Example: Pricing DiamondsQuestions

How much should I expect to pay for a diamond?How accurate is such an estimate?How do prices depend on the weight of the diamond?

Datan = 180 diamonds, relatively small (less than 0.4 carats)

3

300

400

500

600

700

800

900

1000

1100

Price

0.3 0.4

Carat

Bivariate Fit of Price By Carat

Page 4: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Direct ApproachAverage the price of diamonds of each weight,then connect the dots with lines.

Produces a “smooth” curve curve and an estimate of accuracy, but does not answer the question of how these are related?

4

300

400

500

600

700

800

900

1000

1100

Price

0.3 0.4

Carat

Page 5: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Simple Regression ModelAssociation versus causation

Equation relates observed x and y (n cases)Conditional means:! E Y|X = β0 + β1 X = μy|x

Observations:! ! ! yi = β0 + β1 xi + εi

AssumptionsIndependent observationsEqual variance σ2

Normal distribution around line! yi ~ N(μy|x,σ2)! ! ! ! εi ~ N(0, σ2)

Three parameters: β0 , β1 , σ2

5

Page 6: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Least SquaresCriterion

Find estimates that minimize sum of squared deviations! ! ! mina0,a1 Σ(yi - a0 - a1 xi)2

Solution ! ! b1 = cov(x,y)/var(x)!! b0 = y - b1 x

Fitted values, residualsFitted values (on the line)! ŷ = b0 + b1 xResidual deviations!! ! ! e = y - ŷ

Standard error of regression (estimates σ2)s2 = Σ ei2/(n-2)degrees of freedomaka: root mean squared error (RMSE), SD of residuals

cov(x, y) =P

i(xi−x)(yi−y)n−1

var(x) =P

i(xi−x)2

n−1

6

Page 7: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Goodness of FitR-squared statistic

Square of correlation between Y and XPercentage of “explained” variation

Adjusted R-squaredTakes account of number of observations

R2 = Explained SS

Total SS=

∑i(yi − y)2∑i(yi − y)2

= 1−∑

i e2i∑

i(yi − y)2

R2 = 1 - s2

var(y)

7

Page 8: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Checking AssumptionsScatterplots of Y on X, e on X

Normal quantile plot of residuals

300

400

500

600

700

800

900

1000

1100

Price

0.3 0.4

Carat

-300

-200

-100

0

100

200

300

400

Residual

0.3 0.4

Carat

-300

-200

-100

0

100

200

300

400

5 15 25

Count

.001 .01 .05.10 .25 .50 .75 .90.95 .99 .999

-4 -3 -2 -1 0 1 2 3 4

Normal Quantile Plot

8

Page 9: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

InferenceDerived from the standard error of the intercept and slope

Three equivalent methodsConfidence intervalt-statisticp-value

Interpretation of estimates, uncertainty?

Intercept

Carat

Term

-162.1909

2312.3973

Estimate

87.42039

284.5074

Std Error

-1.86

8.13

t Ratio

0.0652

<.0001*

Prob>|t|

Parameter Estimates

9

Var(b1) =σ2

∑i(xi − x)2

Page 10: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

PredictionInterpretation

Where is the population regression line? (μY|X)What is the average price of all diamonds that weigh 0.3 carats?

What can be anticipated about a future observation?How much for a specific 0.3 carat diamond?

0

100

200

300

400

500

600

700

800

900

1000

1100

1200

Price

0.3 0.4

Carat

Conf Int for μY|X

Pred Int for Y

E(µy|x − yx)2 =

σ2

(1n

+(x− x)2∑i(xi − x)2

)

σ2

(1 +

1n

+(x− x)2∑i(xi − x)2

)

Effect of extrapolation

10

Page 11: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Prediction/Conf. IntervalEstimates are same for both mean and specific diamond

ŷ != -162.2 + 2312.4 Carats! = -162.2 + 2312.4 (0.3)! = $531.52

Intervals differ in anticipated uncertaintyFor the average, JMP (use Fit Model to save interval)! ! ! $509.92!! to ! $553.14For a single diamond, prediction interval is! ! ! $243.18!! to!! $819.8895% Prediction interval is approximately! ! ! ! ! ŷ ± 2 RMSE

11

0

100

200

300

400

500

600

700

800

900

1000

1100

Price

0.293 0.298 0.303

Carat

Page 12: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Diamond Example

-300

-200

-100

0

100

200

300

400

Residual

0.3 0.4

Carat

0

100

200

300

400

500

600

700

800

900

1000

1100

1200

Pri

ce

0.3 0.4

Carat

Linear Fit

Price = -162.1909 + 2312.3973*Carat

RSquare

RSquare Adj

Root Mean Square Error

Mean of Response

Observations (or Sum Wgts)

0.270671

0.266573

145.7105

542.8333

180

Summary of Fit

Intercept

Carat

Term

-162.1909

2312.3973

Estimate

87.42039

284.5074

Std Error

-1.86

8.13

t Ratio

0.0652

<.0001*

Prob>|t|

Parameter Estimates

Linear Fit

Bivariate Fit of Price By Carat

Interpretation?Comments?Problems?

12

Page 13: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

More Diamonds

0

100000

200000

300000

400000

500000

600000

Pri

ce

0 1 2 3 4 5 6 7 8 9 10 11

Carat

Linear Fit

Price = -22655.47 + 32538.514*Carat

RSquare

RSquare Adj

Root Mean Square Error

Mean of Response

Observations (or Sum Wgts)

0.652987

0.652639

34109.01

21957.11

1000

Summary of Fit

Intercept

Carat

Term

-22655.47

32538.514

Estimate

1491.049

750.8501

Std Error

-15.19

43.34

t Ratio

<.0001*

<.0001*

Prob>|t|

Parameter Estimates

Linear Fit

Bivariate Fit of Price By Carat

-100000

0

100000

200000

300000

400000

Residual

0 1 2 3 4 5 6 7 8 9 10 11

Carat

13

Estimate of cost/carat?Negative intercept?

Possible to fix?

Page 14: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Log-Log Scale

14

Same expanded sample, but on a log-log scale

Clearly a better fit, but what does it mean?

1000600

400

100006000

4000

2000

10000060000

40000

20000

200000

300000

500000

Price

10.70.50.3 1075432

Carat

Log(Price) = 8.5485138 + 1.9571815*Log(Carat)

RSquare

RSquare Adj

Root Mean Square Error

Mean of Response

Observations (or Sum Wgts)

0.950228

0.950178

0.377488

8.415271

1000

Summary of Fit

Intercept

Log(Carat)

Term

8.5485138

1.9571815

Estimate

0.011976

0.014179

Std Error

713.79

138.03

t Ratio

0.0000*

0.0000*

Prob>|t|

Parameter Estimates

Transformed Fit Log to Log

Page 15: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Promotion Response

0

50

100

150

200

250

300

350

400

450

Sale

s

0 1 2 3 4 5 6 7 8

Display Feet

Linear Fit

Transformed Fit to Log

Sales = 93.032311 + 39.75648*Display Feet

RSquare

RSquare Adj

Root Mean Square Error

Mean of Response

Observations (or Sum Wgts)

0.711975

0.705574

51.59123

268.13

47

Summary of Fit

Intercept

Display Feet

Term

93.032311

39.75648

Estimate

18.22782

3.769509

Std Error

5.10

10.55

t Ratio

<.0001*

<.0001*

Prob>|t|

Parameter Estimates

Linear Fit

Sales = 83.560256 + 138.62089*Log(Display Feet)

Intercept

Log(Display Feet)

Term

83.560256

138.62089

Estimate

14.41344

9.833914

Std Error

5.80

14.10

t Ratio

<.0001*

<.0001*

Prob>|t|

Parameter Estimates

Transformed Fit to Log

Bivariate Fit of Sales By Display Feet

Same model, different scales on the axes

0

50

100

150

200

250

300

350

400

450

Sale

s

0 0.5 1 1.5 2

Log Feet

Linear Fit

Sales = 83.560256 + 138.62089*Log Feet

Intercept

Log Feet

Term

83.560256

138.62089

Estimate

14.41344

9.833914

Std Error

5.80

14.10

t Ratio

<.0001*

<.0001*

Prob>|t|

Parameter Estimates

Linear Fit

Bivariate Fit of Sales By Log Feet

15

What is the interpretation of the coefficient of the log in this model?

Page 16: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Extrapolation

0

100

200

300

400

500

600

700

Sale

s

0 5 10 15 20

Display Feet

Sales = 376.69522 - 329.70421*Recip(Display Feet)

RSquare

RSquare Adj

Root Mean Square Error

Mean of Response

Observations (or Sum Wgts)

0.826487

0.822631

40.04298

268.13

47

S um m ary of F it

Intercept

Recip(Display Feet)

T erm

376.69522

-329.7042

E stim ate

9.439455

22.51988

S td E rror

39.91

-14.64

t R atio

<.0001*

<.0001*

Prob > |t|

Param eter E stim ates

T ran sform ed F it to R eciprocal

Sales = 83.560256 + 138.62089*Log(Display Feet)

RSquare

RSquare Adj

Root Mean Square Error

Mean of Response

Observations (or Sum Wgts)

0.815349

0.811246

41.3082

268.13

47

S um m ary of F it

Intercept

Log(Display Feet)

T erm

83.560256

138.62089

E stim ate

14.41344

9.833914

S td E rror

5.80

14.10

t R atio

<.0001*

<.0001*

Prob > |t|

Param eter E stim ates

T ran sform ed F it to L og

Comparable R2, RMSE but very different extrapolations and

implications for sales.

16

Page 17: Simple Regression Model - Statistics Departmentstine/insr260_2009/lectures/... · 2009. 3. 16. · Carat Linear Fit Price = -162.1909 + 2312.3973*Carat RSquare RSquare Adj Root Mean

Summary Simple Regression Model (SRM)

Least squares estimators

Role of assumptions

Methods of inference: tests and intervals

Importance of transformations, particularly logs

Prediction, role of extrapolation

17