Week 2, 2007 Lecture 2a Slide #1 Bivariate Regression Analysis • Theoretical Models • Basic Linear Models: Deterministic Version • Basic Linear Models: Stochastic Version • Statistical Assumptions • Estimating Linear Models • Residuals (and the Pursuit of Truth…) • An Example


Week 2, 2007 Lecture 2a Slide #1

Bivariate Regression Analysis

• Theoretical Models

• Basic Linear Models: Deterministic Version

• Basic Linear Models: Stochastic Version

• Statistical Assumptions

• Estimating Linear Models

• Residuals (and the Pursuit of Truth…)

• An Example


Week 2, 2007 Lecture 2a Slide #2

Theoretical Linear Models

• The basis of “causality” in models
– Time ordering
– Co-variation
– Non-spuriousness

• Examples
– Fire Deaths = f(# of fire trucks at the scene)
– Job Retention = f(current job satisfaction)
– Income = f(education)


Week 2, 2007 Lecture 2a Slide #3

Deterministic Linear Models

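• Theoretical Model: $Y_i = \beta_0 + \beta_1 X_i$
– $\beta_0$ and $\beta_1$ are constant terms
• $\beta_0$ is the intercept
• $\beta_1$ is the slope
– $X_i$ is a predictor of $Y_i$

[Figure: the line $Y_i = \beta_0 + \beta_1 X_i$ plotted with $X_i$ on the horizontal axis and $Y_i$ on the vertical axis; the slope is marked as $\beta_1 = a/b$.]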


Week 2, 2007 Lecture 2a Slide #4

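Stochastic Linear Models

• $E[Y_i] = \beta_0 + \beta_1 X_i$

– Variation in Y is caused by more than X: error ($\epsilon_i$)

• So:

$\epsilon_i = Y_i - (\beta_0 + \beta_1 X_i) = Y_i - E[Y_i]$

$Y_i = E[Y_i] + \epsilon_i = \beta_0 + \beta_1 X_i + \epsilon_i$

$\beta_0$ = expected value of Y when X = 0

Each 1 unit increase in X increases the expected value of Y by $\beta_1$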
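A small simulation can make the stochastic model concrete. The sketch below is illustrative only: the parameter values (`beta0=2.0`, `beta1=0.5`), the error standard deviation, and the helper name `simulate` are assumptions, not values from the lecture. It draws normal errors, builds each Y as its expected value plus an error, and then recovers each error by subtracting the expected value from the observed Y.

```python
import random

def simulate(xs, beta0, beta1, sigma, seed=0):
    """Return (expected, errors, ys) for Y_i = beta0 + beta1*X_i + eps_i."""
    rng = random.Random(seed)
    expected = [beta0 + beta1 * x for x in xs]      # E[Y_i], the deterministic part
    errors = [rng.gauss(0.0, sigma) for _ in xs]    # eps_i drawn from Normal(0, sigma^2)
    ys = [e_y + eps for e_y, eps in zip(expected, errors)]
    return expected, errors, ys

# Hypothetical parameter values, chosen only for illustration
xs = [float(x) for x in range(10)]
expected, errors, ys = simulate(xs, beta0=2.0, beta1=0.5, sigma=1.0)

# eps_i = Y_i - E[Y_i]: recover each error from observed and expected values
recovered = [y - e_y for y, e_y in zip(ys, expected)]
```

In real data the errors cannot be recovered this way, because the true intercept and slope are unknown; that is why they have to be estimated.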


Week 2, 2007 Lecture 2a Slide #5

Assumptions Necessary for Estimating Linear Models

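1. Errors have identical distributions: zero mean and the same variance across the range of X

2. Errors are independent of X and of other errors $\epsilon_j$: $E[\epsilon_i] \neq f(X)$ and $E[\epsilon_i] \neq f(\epsilon_j),\ j \neq i$

3. Errors are normally distributed

[Figure: errors plotted against X, with a reference line at $\epsilon_i = 0$.]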


Week 2, 2007 Lecture 2a Slide #6

Normal, Independent & Identical $\epsilon_i$ Distributions (“Normal iid”)

[Figure: Y plotted against X.]

Problem: We don’t know:

a) if error assumptions are true; b) values for $\beta_0$ and $\beta_1$

Solution: Estimate ‘em!


Week 2, 2007 Lecture 2a Slide #7

Estimating Linear Models

$Y_i = \beta_0 + \beta_1 X_i$ is modeled as:

$\hat{Y}_i = b_0 + b_1 X_i$

$\hat{Y}_i$ is the predicted value for $Y_i$

So: $Y_i - \hat{Y}_i = e_i$, or: $e_i = Y_i - b_0 - b_1 X_i$

This is the formula for RESIDUALS -- which you will come to know and cherish.
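The slide does not say how $b_0$ and $b_1$ are chosen. The sketch below assumes ordinary least squares, the estimator behind Stata's `regress` command, and uses made-up data. The function names `ols_fit` and `residuals` are hypothetical helpers introduced for this illustration.

```python
def ols_fit(xs, ys):
    """Least-squares intercept b0 and slope b1 for a bivariate regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx               # slope: co-variation of X and Y over variation in X
    b0 = y_bar - b1 * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

def residuals(xs, ys, b0, b1):
    """e_i = Y_i - b0 - b1*X_i for every observation."""
    return [y - b0 - b1 * x for x, y in zip(xs, ys)]

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(xs, ys)
e = residuals(xs, ys, b0, b1)
```

With an intercept in the model, least-squares residuals sum to zero, so some observations sit above the fitted line and others below it.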


Week 2, 2007 Lecture 2a Slide #8

Residuals: Statistical Forensics

• Residuals measure prediction error:

$e_i > 0$ if $Y_i > \hat{Y}_i$

$e_i < 0$ if $Y_i < \hat{Y}_i$

[Figure: Y plotted against X with the regression line $Y_i = \beta_0 + \beta_1 X_i$.]


Week 2, 2007 Lecture 2a Slide #9

Stata and Regression: Predicting Incarceration with Average Income

• Stata dataset: Guns.dta
– From “Data for empirical exercises”
– What are your expectations? Why?

• Stata command:
– Regression: “regress incarc_rate avginc”

• Output:

          Source |       SS       df       MS              Number of obs =    1173
    -------------+------------------------------           F(  1,  1171) =  316.82
           Model |  7986385.28     1  7986385.28           Prob > F      =  0.0000
        Residual |  29518728.5  1171  25208.1371           R-squared     =  0.2129
    -------------+------------------------------           Adj R-squared =  0.2123
           Total |  37505113.8  1172  32000.9503           Root MSE      =  158.77

    ------------------------------------------------------------------------------
     incarc_rate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          avginc |   32.31456   1.815488    17.80   0.000     28.75258    35.87653
           _cons |   -216.931   25.34477    -8.56   0.000    -266.6572   -167.2047
    ------------------------------------------------------------------------------
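Several summary statistics in this output can be reproduced from the sums of squares and the coefficient row. The sketch below copies its inputs from the output; the variable names are chosen for this illustration. It recomputes R-squared, the F statistic, Root MSE, and the t statistic for `avginc`.

```python
import math

# Values copied from the regression output above
ss_model, df_model = 7986385.28, 1
ss_resid, df_resid = 29518728.5, 1171
coef, std_err = 32.31456, 1.815488      # avginc row

ss_total = ss_model + ss_resid          # Total SS = Model SS + Residual SS
ms_model = ss_model / df_model          # mean squares = SS / df
ms_resid = ss_resid / df_resid

r_squared = ss_model / ss_total         # share of variation in Y accounted for by X
f_stat = ms_model / ms_resid            # F(1, 1171)
root_mse = math.sqrt(ms_resid)          # typical size of a residual
t_stat = coef / std_err                 # t for the avginc coefficient
```

In a bivariate regression the F statistic equals the square of the slope's t statistic, which these numbers bear out up to rounding.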


Week 2, 2007 Lecture 2a Slide #10

Residual Analysis

$e_i = Y_i - b_0 - b_1 X_i$, or $e_i = Y_i + 216.93 - 32.31 X_i$

In our data, some observed values are larger than would be predicted by average income alone.
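As a worked instance of the residual formula, the sketch below plugs the estimated coefficients into $e_i = Y_i - b_0 - b_1 X_i$ for a single observation. The observation is hypothetical, not a row from Guns.dta, and the function names `predict` and `residual` are illustrative.

```python
B0 = -216.931    # _cons from the regression output
B1 = 32.31456    # avginc coefficient from the regression output

def predict(avginc):
    """Predicted incarceration rate for a given average income."""
    return B0 + B1 * avginc

def residual(incarc_rate, avginc):
    """Observed minus predicted incarceration rate."""
    return incarc_rate - predict(avginc)

# Hypothetical observation, chosen only for illustration
y_hat = predict(14.0)
e = residual(incarc_rate=300.0, avginc=14.0)
```

Here the predicted value is about 235.47, so the residual is about 64.53: the observed rate is larger than average income alone would predict.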


Week 2, 2007 Lecture 2a Slide #11

Normality of Residuals


Week 2, 2007 Lecture 2a Slide #12

More on Normality: Q-Normal
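The plot for this slide did not survive extraction. As a rough sketch of what a normal quantile plot compares, the code below pairs each sorted residual with the quantile of a normal distribution at the same rank. The residuals are made up, the helper name `qnorm_pairs` is hypothetical, and the plotting position i/(n+1) is one common convention assumed here rather than taken from the lecture.

```python
from statistics import NormalDist, mean, stdev

def qnorm_pairs(resids):
    """Pair each sorted residual with the normal quantile at the same rank."""
    n = len(resids)
    # Normal distribution with the same mean and standard deviation as the residuals
    dist = NormalDist(mean(resids), stdev(resids))
    ordered = sorted(resids)
    # Plotting position i/(n+1) is an assumed convention
    return [(dist.inv_cdf(i / (n + 1)), r) for i, r in enumerate(ordered, start=1)]

resids = [-1.2, -0.4, 0.1, 0.3, 1.2]    # made-up residuals
pairs = qnorm_pairs(resids)
```

If the residuals are close to normal, the pairs fall near a 45-degree line; systematic departures in the tails suggest the normality assumption is strained.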


Week 2, 2007 Lecture 2a Slide #13

Distribution of Residuals by X


Week 2, 2007 Lecture 2a Slide #14

BREAK TIME