Week 5: Simple Linear Regression

Brandon Stewart¹

Princeton

September 28-October 2, 2020

¹These slides are heavily influenced by Matt Blackwell, Adam Glynn, Erin Hartman, and Jens Hainmueller. Illustrations by Shay O'Brien.


Where We’ve Been and Where We’re Going...

Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference

Macrostructure—This Semester

The next few weeks,

Linear Regression with Two Regressors

Break Week and Multiple Linear Regression

Rethinking Regression

Regression in the Social Sciences

Causality with Measured Confounding

Unmeasured Confounding and Instrumental Variables

Repeated Observations and Panel Data

Review and Final Discussion


1. Mechanics of OLS
2. Classical Perspective (Part 1, Unbiasedness)
   - Sampling Distributions
   - Classical Assumptions 1–4
3. Classical Perspective: Variance
   - Sampling Variance
   - Gauss-Markov
   - Large Samples
   - Small Samples
   - Agnostic Perspective
4. Inference
   - Hypothesis Tests
   - Confidence Intervals
   - Goodness of Fit
   - Interpretation
5. Non-linearities
   - Log Transformations
   - Fun With Logs
   - LOESS

Narrow Goal: Understand lm() Output

Call:

lm(formula = sr ~ pop15, data = LifeCycleSavings)

Residuals:

Min 1Q Median 3Q Max

-8.637 -2.374 0.349 2.022 11.155

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 17.49660 2.27972 7.675 6.85e-10 ***

pop15 -0.22302 0.06291 -3.545 0.000887 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.03 on 48 degrees of freedom

Multiple R-squared: 0.2075, Adjusted R-squared: 0.191

F-statistic: 12.57 on 1 and 48 DF, p-value: 0.0008866
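Since LifeCycleSavings ships with base R, a minimal sketch reproduces this output directly:

fit <- lm(sr ~ pop15, data = LifeCycleSavings)  # savings rate on share of population under 15
summary(fit)  # prints the coefficients, SEs, t values, R-squared, and F-statistic shown above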

Reminder

How do we fit the regression line \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X to the data?

Answer: We will minimize the sum of squared residuals. The residual \hat{u}_i = Y_i - \hat{Y}_i is the "part" of Y_i not predicted:

\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2

The Population Quantity

Broadly speaking we are interested in the conditional expectation function (CEF) in part because it minimizes the mean squared error.

The CEF has a potentially arbitrary shape, but there is always a best linear predictor (BLP), or linear projection, which is the line given by:

g(X) = \beta_0 + \beta_1 X, \quad \beta_0 = E[Y] - \frac{\mathrm{Cov}[X,Y]}{V[X]} E[X], \quad \beta_1 = \frac{\mathrm{Cov}[X,Y]}{V[X]}

This may not be a good approximation depending on how non-linear the true CEF is. However, it provides us with a reasonable target that always exists.

Define deviations from the BLP as u = Y - g(X); then the following properties hold:

(1) E[u] = 0, (2) E[Xu] = 0, (3) \mathrm{Cov}[X, u] = 0

What is OLS?

The best linear predictor is the line that minimizes

(\beta_0, \beta_1) = \arg\min_{b_0, b_1} E[(Y - b_0 - b_1 X)^2]

Ordinary Least Squares (OLS) is a method for minimizing the sample analog of this quantity. It solves the optimization problem:

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2

In words, the OLS estimates are the intercept and slope that minimize the sum of the squared residuals.

There are many loss functions, but OLS uses the squared error loss, which is connected to the conditional expectation function. If we chose a different loss, we would target a different feature of the conditional distribution.
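To make the sample-analog idea concrete, here is a small R sketch, not from the original slides, that minimizes the sum of squared residuals numerically and checks the answer against lm():

# Numerically minimize the least squares objective over (b0, b1)
ssr <- function(b, y, x) sum((y - b[1] - b[2] * x)^2)
opt <- optim(c(0, 0), ssr,
             y = LifeCycleSavings$sr, x = LifeCycleSavings$pop15)
opt$par                                        # numerical minimizer (b0, b1)
coef(lm(sr ~ pop15, data = LifeCycleSavings))  # closed-form OLS solution, same values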

Deriving the OLS estimator

Let's think about n pairs of sample observations: (Y_1, X_1), (Y_2, X_2), \ldots, (Y_n, X_n)

Let \{b_0, b_1\} be possible values for \{\beta_0, \beta_1\}. Define the least squares objective function:

S(b_0, b_1) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2

How do we derive the LS estimators for \beta_0 and \beta_1? We want to minimize this function, which is actually a very well-defined calculus problem:

1. Take partial derivatives of S with respect to b_0 and b_1.
2. Set each of the partial derivatives to 0.
3. Solve for \{b_0, b_1\} and replace them with the solutions.

We are going to step through this process together.

Step 1: Take Partial Derivatives

S(b_0, b_1) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2
            = \sum_{i=1}^{n} (Y_i^2 - 2 Y_i b_0 - 2 Y_i b_1 X_i + b_0^2 + 2 b_0 b_1 X_i + b_1^2 X_i^2)

\frac{\partial S(b_0, b_1)}{\partial b_0} = \sum_{i=1}^{n} (-2 Y_i + 2 b_0 + 2 b_1 X_i) = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)

\frac{\partial S(b_0, b_1)}{\partial b_1} = \sum_{i=1}^{n} (-2 Y_i X_i + 2 b_0 X_i + 2 b_1 X_i^2) = -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i)

Solving for the Intercept

\frac{\partial S(b_0, b_1)}{\partial b_0} = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)

0 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)

0 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)

0 = \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} b_0 - \sum_{i=1}^{n} b_1 X_i

b_0 n = \left( \sum_{i=1}^{n} Y_i \right) - b_1 \left( \sum_{i=1}^{n} X_i \right)

b_0 = \bar{Y} - b_1 \bar{X}

A Helpful Lemma on Deviations from Means

Lemmas are like helper results that are often invoked repeatedly.

Lemma (Deviations from the Mean Sum to 0)

\sum_{i=1}^{n} (X_i - \bar{X}) = \left( \sum_{i=1}^{n} X_i \right) - n \bar{X}
= \left( \sum_{i=1}^{n} X_i \right) - n \sum_{i=1}^{n} X_i / n
= \left( \sum_{i=1}^{n} X_i \right) - \sum_{i=1}^{n} X_i
= 0

Solving for the Slope

0 = -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i)

0 = \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i)

0 = \sum_{i=1}^{n} X_i (Y_i - (\bar{Y} - b_1 \bar{X}) - b_1 X_i)   (substitute in b_0)

0 = \sum_{i=1}^{n} X_i (Y_i - \bar{Y} - b_1 (X_i - \bar{X}))

0 = \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - b_1 \sum_{i=1}^{n} X_i (X_i - \bar{X})

b_1 \sum_{i=1}^{n} X_i (X_i - \bar{X}) = \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - \bar{X} \sum_{i=1}^{n} (Y_i - \bar{Y})   (add 0, by the lemma)

Solving for the Slope

b1

n∑i=1

Xi (Xi − X ) =n∑

i=1

Xi (Yi − Y )− X∑i=1

(Yi − Y )

b1

n∑i=1

Xi (Xi − X ) =n∑

i=1

Xi (Yi − Y )−∑i=1

X (Yi − Y )

b1

(n∑

i=1

Xi (Xi − X )−∑i=1

X (Xi − X )

)=

n∑i=1

(Xi − X )(Yi − Y ) add 0

b1

n∑i=1

(Xi − X )(Xi − X ) =n∑

i=1

(Xi − X )(Yi − Y )

b1 =

∑ni=1(Xi − X )(Yi − Y )∑n

i=1(Xi − X )2

Stewart (Princeton) Week 5: Simple Linear Regression September 28-October 2, 2020 14 / 127

Page 15: Week 5: Simple Linear Regression · Week 5: Simple Linear Regression Brandon Stewart1 Princeton September 28-October 2, 2020 1These slides are heavily in uenced by Matt Blackwell,

The OLS estimator

Now we're done! Here are the OLS estimators:

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
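A quick R sketch, added for concreteness, that computes both formulas by hand and checks them against lm():

x <- LifeCycleSavings$pop15
y <- LifeCycleSavings$sr
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope formula
b0 <- mean(y) - b1 * mean(x)                                     # intercept formula
c(b0, b1)        # matches:
coef(lm(y ~ x))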

Intuition of the OLS estimator

The intercept equation tells us that the regression line goes through the point (\bar{X}, \bar{Y}):

\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}

The slope for the regression line can be written as:

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\text{sample covariance between } X \text{ and } Y}{\text{sample variance of } X}

The higher the covariance between X and Y, the higher the slope will be.

Negative covariances → negative slopes; positive covariances → positive slopes.

If X_i doesn't vary, the denominator is zero and the slope is undefined.

If Y_i doesn't vary, you get a flat line.

Mechanical properties of OLS

Later we'll see that under certain assumptions, OLS will have nice statistical properties.

But some properties are mechanical, since they can be derived from the first order conditions of OLS.

1. The sample mean of the residuals will be zero:

\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0

2. The residuals will be uncorrelated with the predictor (\widehat{\mathrm{Cov}} is the sample covariance):

\sum_{i=1}^{n} X_i \hat{u}_i = 0 \implies \widehat{\mathrm{Cov}}(X_i, \hat{u}_i) = 0

3. The residuals will be uncorrelated with the fitted values:

\sum_{i=1}^{n} \hat{Y}_i \hat{u}_i = 0 \implies \widehat{\mathrm{Cov}}(\hat{Y}_i, \hat{u}_i) = 0
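These properties are easy to verify numerically on any fitted model; a sketch using the LifeCycleSavings regression from earlier:

fit <- lm(sr ~ pop15, data = LifeCycleSavings)
u <- resid(fit)
mean(u)                          # (1) zero up to floating-point error
cov(LifeCycleSavings$pop15, u)   # (2) residuals uncorrelated with the predictor
cov(fitted(fit), u)              # (3) residuals uncorrelated with the fitted values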

OLS slope as a weighted sum of the outcomes

One useful derivation is to write the OLS estimator for the slope as a weighted sum of the outcomes:

\hat{\beta}_1 = \sum_{i=1}^{n} W_i Y_i

where the weights W_i are:

W_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

This is important for two reasons. First, it'll make derivations later much easier. And second, it shows that \hat{\beta}_1 is just a weighted sum of random variables; therefore it is also a random variable.

Lemma 2: OLS as a Weighted Sum of Outcomes

Lemma (OLS as Weighted Sum of Outcomes)

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

= \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} - \frac{\sum_{i=1}^{n} (X_i - \bar{X}) \bar{Y}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

= \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}   (the second term is 0 by Lemma 1)

= \sum_{i=1}^{n} W_i Y_i

where the weights W_i are:

W_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

We Covered

A brief review of regression

Derivation of the OLS estimator

OLS as a weighted sum of outcomes

Next Time: The Classical Perspective


Where We’ve Been and Where We’re Going...

Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference

1. Mechanics of OLS
2. Classical Perspective (Part 1, Unbiasedness)
   - Sampling Distributions
   - Classical Assumptions 1–4
3. Classical Perspective: Variance
   - Sampling Variance
   - Gauss-Markov
   - Large Samples
   - Small Samples
   - Agnostic Perspective
4. Inference
   - Hypothesis Tests
   - Confidence Intervals
   - Goodness of Fit
   - Interpretation
5. Non-linearities
   - Log Transformations
   - Fun With Logs
   - LOESS

Sampling distribution of the OLS estimator

Remember: OLS is an estimator: it's a machine that we plug samples into and we get out estimates.

Sample 1: {(Y_1, X_1), \ldots, (Y_n, X_n)} → OLS → (\hat{\beta}_0, \hat{\beta}_1)_1
Sample 2: {(Y_1, X_1), \ldots, (Y_n, X_n)} → OLS → (\hat{\beta}_0, \hat{\beta}_1)_2
...
Sample k - 1: {(Y_1, X_1), \ldots, (Y_n, X_n)} → OLS → (\hat{\beta}_0, \hat{\beta}_1)_{k-1}
Sample k: {(Y_1, X_1), \ldots, (Y_n, X_n)} → OLS → (\hat{\beta}_0, \hat{\beta}_1)_k

Just like the sample mean, sample difference in means, or the sample variance, it has a sampling distribution, with a sampling variance/standard error, etc.

Let's take a simulation approach to demonstrate:
- We'll use some data from Acemoglu, Daron, Simon Johnson, and James A. Robinson. "The colonial origins of comparative development: An empirical investigation." 2000.
- See how the line varies from sample to sample.

Simulation procedure

1. Draw a random sample of size n = 30 with replacement using sample().
2. Use lm() to calculate the OLS estimates of the slope and intercept.
3. Plot the estimated regression line.
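A sketch of this loop in R. The data frame `ajr` and its columns `logmort` and `loggdp` are hypothetical names standing in for the AJR data used in lecture:

set.seed(1234)
plot(ajr$logmort, ajr$loggdp,                                  # ajr is assumed to hold the AJR data
     xlab = "Log Settler Mortality", ylab = "Log GDP per capita")
for (s in 1:100) {
  draw <- ajr[sample(nrow(ajr), size = 30, replace = TRUE), ]  # resample 30 rows with replacement
  abline(lm(loggdp ~ logmort, data = draw), col = "grey")      # one estimated regression line
}
abline(lm(loggdp ~ logmort, data = ajr), lwd = 2)              # full-sample line for reference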

Population Regression

[Figure: scatterplot of log GDP per capita growth against log settler mortality, with the population regression line.]

Randomly sample from AJR

[Figure: the same scatterplot, with regression lines estimated on repeated random samples of n = 30.]

Sampling distribution of OLS

You can see that the estimated slopes and intercepts vary from sample to sample, but that the "average" of the lines looks about right.

[Figure: histograms of the sampling distribution of the intercepts \hat{\beta}_0 and of the slopes \hat{\beta}_1.]

The Sampling Distribution is a Joint Distribution!

While both the intercept and the slope vary, they vary together.

[Figure: scatterplot of slope estimates \hat{\beta}_1 against intercept estimates \hat{\beta}_0 across simulated samples; the two vary together.]

Sample Mean Properties Review

In the last few weeks we derived the properties of the sampling distribution for the sample mean, \bar{X}_n.

Under essentially only the iid assumption (plus finite mean and variance) we derived the large sample distribution as

\bar{X}_n \sim N\left( \mu, \frac{\sigma^2}{n} \right)

This means the estimator:
- is unbiased for the population mean: E[\bar{X}_n] = \mu
- has sampling variance \sigma^2 / n
- and standard error \sigma / \sqrt{n}

This in turn gave us confidence intervals and hypothesis tests.

We will use the same strategy here!

Our goal

What is the sampling distribution of the OLS slope?

\hat{\beta}_1 \sim ?(?, ?)

We need to fill in those ?s.

We'll start with the mean of the sampling distribution. Is the estimator centered at the true value, \beta_1?

Classical Model: OLS Assumptions Preview

1. Linearity in Parameters: The population model is linear in its parameters and correctly specified.
2. Random Sampling: The observed data represent a random sample from the population described by the model.
3. Variation in X: There is variation in the explanatory variable.
4. Zero Conditional Mean: The expected value of the error term is zero conditional on all values of the explanatory variable.
5. Homoskedasticity: The error term has the same variance conditional on all values of the explanatory variable.
6. Normality: The error term is independent of the explanatory variables and normally distributed.

Hierarchy of OLS Assumptions

[Figure: hierarchy of OLS assumptions. Variation in X alone allows data description. Adding random sampling, linearity in parameters, and zero conditional mean buys unbiasedness and consistency. Adding homoskedasticity buys Gauss-Markov (BLUE) and asymptotic inference. Adding normality of errors buys the classical LM (BUE) and small-sample inference (t and F).]

OLS Assumption I

Assumption (I. Linearity in Parameters)

The population regression model is linear in its parameters and correctly specified as:

Y_i = \beta_0 + \beta_1 X_i + u_i

Note that it can be nonlinear in variables:
- OK: Y_i = \beta_0 + \beta_1 X_i + u_i, or Y_i = \beta_0 + \beta_1 X_i^2 + u_i, or Y_i = \beta_0 + \beta_1 \log(X_i) + u_i
- Not OK: Y_i = \beta_0 + \beta_1^2 X_i + u_i, or Y_i = \beta_0 + \exp(\beta_1) X_i + u_i

\beta_0, \beta_1: population parameters, fixed and unknown.

u_i: unobserved random variable with E[u_i] = 0; captures all factors influencing Y_i other than X_i.

We assume this to be the structural model, i.e., the model describing the true process generating Y_i.

OLS Assumption II

Assumption (II. Random Sampling)

The observed data:

(y_i, x_i) for i = 1, \ldots, n

represent an i.i.d. random sample of size n following the population model.

Data examples consistent with this assumption:
- A cross-sectional survey where the units are sampled randomly

Potential violations:
- Time series data (regressor values may exhibit persistence)
- Sample selection problems (sample not representative of the population)

OLS Assumption III

Assumption (III. Variation in X; a.k.a. No Perfect Collinearity)

The observed data:

x_i for i = 1, \ldots, n

are not all the same value.

Satisfied as long as there is some variation in the regressor X in the sample.

Why do we need this?

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

This assumption is needed just to calculate \hat{\beta}: without variation in x, the denominator is zero.

It is the only assumption needed for using OLS as a pure data summary.

Stuck in a moment

Why does this matter? How would you draw the line of best fit through this scatterplot, which is a violation of this assumption?

[Figure: scatterplot in which all observations share the same value of X, so no unique best-fit line exists.]

OLS Assumption IV

Assumption (IV. Zero Conditional Mean)

The expected value of the error term is zero conditional on any value of the explanatory variable:

E[u_i | X_i = x] = 0

E[u | X] = 0 implies the slightly weaker condition \mathrm{Cov}(X, u) = 0.

Given random sampling, E[u | X] = 0 also implies E[u_i | x_i] = 0 for all i.

Violating the zero conditional mean assumption

[Figure: two scatterplots with fitted lines, one where Assumption 4 is violated and one where it is not.]

Violating the zero conditional mean assumption

[Figure: a second pair of scatterplots, again contrasting a violation of Assumption 4 with a case where it holds.]

Unbiasedness

With Assumptions 1-4, we can show that the OLS estimator for the slope is unbiased, that is, E[\hat{\beta}_1] = \beta_1.

Let's prove it!

Lemma 3: Weighted Combinations of X_i

Lemma (\sum_i W_i X_i = 1)

\sum_{i=1}^{n} W_i X_i = \sum_{i=1}^{n} \frac{X_i (X_i - \bar{X})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}

= \frac{1}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \sum_{i=1}^{n} X_i (X_i - \bar{X})

= \frac{1}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \left[ \sum_{i=1}^{n} X_i (X_i - \bar{X}) - \sum_{i=1}^{n} \bar{X} (X_i - \bar{X}) \right]

= \frac{1}{\sum_{j=1}^{n} (X_j - \bar{X})^2} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})

= 1

Lemma 4: Estimation Error

Lemma

\hat{\beta}_1 = \sum_{i=1}^{n} W_i Y_i

= \sum_{i=1}^{n} W_i (\beta_0 + \beta_1 X_i + u_i)

= \beta_0 \left( \sum_{i=1}^{n} W_i \right) + \beta_1 \left( \sum_{i=1}^{n} W_i X_i \right) + \sum_{i=1}^{n} W_i u_i

= \beta_1 + \sum_{i=1}^{n} W_i u_i   (since \sum_i W_i = 0 and \sum_i W_i X_i = 1, by the lemmas)

\hat{\beta}_1 - \beta_1 = \sum_{i=1}^{n} W_i u_i

Unbiasedness Proof

E[\hat{\beta}_1 - \beta_1 | X] = E\left[ \sum_{i=1}^{n} W_i u_i \,\Big|\, X \right]

= \sum_{i=1}^{n} E[W_i u_i | X]

= \sum_{i=1}^{n} W_i E[u_i | X]

= \sum_{i=1}^{n} W_i \cdot 0

= 0

Using iterated expectations, we can show that it is also unconditionally unbiased: E[\hat{\beta}_1] = E[E[\hat{\beta}_1 | X]] = \beta_1.

Consistency

Recall the estimation error,

\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} W_i u_i

Under iid sampling we have

\sum_{i=1}^{n} W_i u_i = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \overset{p}{\to} \frac{\mathrm{Cov}[X_i, u_i]}{V[X_i]}

Under A4 (zero conditional mean error) we have the slightly weaker property \mathrm{Cov}[X_i, u_i] = 0, so as long as V[X] > 0, we have

\hat{\beta}_1 \overset{p}{\to} \beta_1
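A Monte Carlo sketch, added for illustration: fix X, redraw the errors many times, and the average slope estimate sits on the truth (here beta1 = 2):

set.seed(7)
x <- rnorm(50)                   # fixed design across replications
beta1_hats <- replicate(5000, {
  y <- 1 + 2 * x + rnorm(50)     # new errors each replication
  coef(lm(y ~ x))[2]             # OLS slope for this sample
})
mean(beta1_hats)                 # close to 2, as unbiasedness implies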

We Covered

The first four assumptions of the classical model

We showed that these four were sufficient to establish unbiasedness and consistency.

We even proved it to ourselves!

Next Time: The Classical Perspective Part 2: Variance.

Where We’ve Been and Where We’re Going...

Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference

1. Mechanics of OLS
2. Classical Perspective (Part 1, Unbiasedness)
   - Sampling Distributions
   - Classical Assumptions 1–4
3. Classical Perspective: Variance
   - Sampling Variance
   - Gauss-Markov
   - Large Samples
   - Small Samples
   - Agnostic Perspective
4. Inference
   - Hypothesis Tests
   - Confidence Intervals
   - Goodness of Fit
   - Interpretation
5. Non-linearities
   - Log Transformations
   - Fun With Logs
   - LOESS

Where are we?

Under Assumptions 1-4, we know that

\hat{\beta}_1 \sim ?(\beta_1, ?)

That is, we know that the sampling distribution is centered on the true population slope, but we don't know its sampling variance.

Sampling variance of estimated slope

In order to derive the sampling variance of the OLS estimator, we will need the first four assumptions plus one more:

1. Linearity
2. Random (iid) sample
3. Variation in X_i
4. Zero conditional mean of the errors
5. Homoskedasticity

Variance of OLS Estimators

How can we derive Var[\hat{\beta}_0] and Var[\hat{\beta}_1]? Let's make the following additional assumption:

Assumption (V. Homoskedasticity)

The conditional variance of the error term is constant and does not vary as a function of the explanatory variable:

Var[u | X] = \sigma_u^2

This implies Var[u] = \sigma_u^2: all errors have an identical error variance (\sigma_{u_i}^2 = \sigma_u^2 for all i).

Taken together, Assumptions I-V imply:

E[Y | X] = \beta_0 + \beta_1 X

Var[Y | X] = \sigma_u^2

Violation, Var[u | X = x_1] \neq Var[u | X = x_2], is called heteroskedasticity.

Assumptions I-V are collectively known as the Gauss-Markov assumptions.

Heteroskedasticity

[Figure: scatterplot in which the spread of y around the fitted line grows with x, a classic heteroskedasticity pattern.]

Deriving the sampling variance

V[\hat{\beta}_1 | X] = V\left[ \sum_{i=1}^{n} W_i u_i \,\Big|\, X \right]

= \sum_{i=1}^{n} W_i^2 V[u_i | X]   (A2: iid)

= \sum_{i=1}^{n} W_i^2 \sigma_u^2   (A5: homoskedasticity)

= \sigma_u^2 \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sum_{i'=1}^{n} (X_{i'} - \bar{X})^2} \right)^2

= \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

Variance of OLS Estimators

Theorem (Variance of OLS Estimators)

Given OLS Assumptions I-V (Gauss-Markov assumptions):

V[\hat{\beta}_1 | X] = \frac{\sigma_u^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

V[\hat{\beta}_0 | X] = \sigma_u^2 \left\{ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right\}

where V[u | X] = \sigma_u^2 (the error variance).

Understanding the sampling variance

V[\hat{\beta}_1 | X_1, \ldots, X_n] = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

What drives the sampling variability of the OLS estimator?
- The higher the variance of Y_i | X_i, the higher the sampling variance.
- The lower the variance of X_i, the higher the sampling variance.
- As we increase n, the denominator gets large while the numerator is fixed, and so the sampling variance shrinks to 0.
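A small simulation sketch, added for illustration, of the second point: identical errors, different spread in x:

set.seed(42)
n <- 100
u <- rnorm(n)                     # same errors for both designs
x_narrow <- runif(n, -0.2, 0.2)   # little variation in x
x_wide   <- runif(n, -2, 2)       # ten times the spread
y_narrow <- 1 + 2 * x_narrow + u
y_wide   <- 1 + 2 * x_wide + u
coef(summary(lm(y_narrow ~ x_narrow)))[2, "Std. Error"]
coef(summary(lm(y_wide ~ x_wide)))[2, "Std. Error"]   # roughly 10 times smaller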

Variance in X Reduces Standard Errors

[Figure: two scatterplots with the same error variance; the sample with the wider spread in x pins down the slope far more precisely.]

Estimating the Variance of OLS Estimators

How can we estimate the unobserved error variance Var[u] = \sigma_u^2?

We can derive an estimator based on the residuals:

\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i

Recall: the errors u_i are NOT the same as the residuals \hat{u}_i.

Intuitively, the scatter of the residuals around the fitted regression line should reflect the unseen scatter about the true population regression line.

We can measure scatter with the mean squared deviation:

MSD(\hat{u}) \equiv \frac{1}{n} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2

Estimating the Variance of OLS Estimators

By construction, the regression line is closer since it is drawn to fit the sample we observe.

Specifically, the regression line is drawn so as to minimize the sum of the squares of the distances between it and the observations.

So the spread of the residuals, MSD(\hat{u}), will slightly underestimate the error variance Var[u] = \sigma_u^2 on average.

In fact, we can show that with a single regressor X we have:

E[MSD(\hat{u})] = \frac{n-2}{n} \sigma_u^2   (hence the degrees of freedom adjustment)

Thus, an unbiased estimator for the error variance is:

\hat{\sigma}_u^2 = \frac{n}{n-2} MSD(\hat{u}) = \frac{n}{n-2} \cdot \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2

We plug this estimate into the variance estimators for \hat{\beta}_0 and \hat{\beta}_1.
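A sketch of this estimator in R, checked against the residual standard error (4.03) and slope SE (0.06291) that summary() reported at the start of the lecture:

fit <- lm(sr ~ pop15, data = LifeCycleSavings)
n <- nobs(fit)
sigma2_hat <- sum(resid(fit)^2) / (n - 2)        # unbiased estimate of sigma_u^2
x <- LifeCycleSavings$pop15
se_b1 <- sqrt(sigma2_hat / sum((x - mean(x))^2)) # plug into the slope variance formula
c(sqrt(sigma2_hat), se_b1)                       # residual standard error and SE of the slope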

Where are we?

Under Assumptions 1-5, we know that

\hat{\beta}_1 \sim ?\left( \beta_1, \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)

Now we know the mean and sampling variance of the sampling distribution.

How does this compare to other estimators for the population slope?

Where are we?

[Figure: the hierarchy of OLS assumptions (data description → unbiasedness/consistency → Gauss-Markov (BLUE) asymptotic inference → classical LM (BUE) small-sample inference), with the Gauss-Markov level now in focus.]

OLS is BLUE :(

Theorem (Gauss-Markov)

Given OLS Assumptions I-V, the OLS estimator is BLUE, i.e. the

1. Best: lowest variance in class
2. Linear: among linear estimators
3. Unbiased: among linear unbiased estimators
4. Estimator.

A linear estimator is one that can be written as \hat{\beta} = Wy.

Assumptions 1-5 are called the "Gauss-Markov assumptions".

The result fails to hold when the assumptions are violated!

Gauss-Markov Theorem

[Figure: nested sets of all estimators, linear estimators, and unbiased estimators (source: Econometrics Laboratory, University of California at Berkeley, 22-26 March 1999).]

OLS is efficient in the class of unbiased, linear estimators.

OLS is BLUE: best linear unbiased estimator.

Where are we?

Under Assumptions 1-5, we know that

\hat{\beta}_1 \sim ?\left( \beta_1, \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)

And we know that \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} is the lowest variance of any linear unbiased estimator of \beta_1.

What about the last question mark? What's the form of the distribution?

Large-sample distribution of OLS estimators

Remember that the OLS estimator is the sum of independent r.v.'s:

\hat{\beta}_1 = \sum_{i=1}^{n} W_i Y_i

Mantra of the Central Limit Theorem: "the sums and means of random variables tend to be Normally distributed in large samples."

True here as well, so we know that in large samples:

\frac{\hat{\beta}_1 - \beta_1}{SE[\hat{\beta}_1]} \sim N(0, 1)

We can also replace SE with an estimate:

\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim N(0, 1)

Where are we?

Under Assumptions 1-5 and in large samples, we know that

\hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)

Sampling distribution in small samples

What if we have a small sample? What can we do then?

We can't get something for nothing, but we can make progress if we make another assumption:

1. Linearity
2. Random (iid) sample
3. Variation in X_i
4. Zero conditional mean of the errors
5. Homoskedasticity
6. Errors are conditionally Normal

OLS Assumption VI

Assumption (VI. Normality)

The population error term is independent of the explanatory variable, u \perp\!\!\!\perp X, and is normally distributed with mean zero and variance \sigma_u^2:

u \sim N(0, \sigma_u^2), which implies Y | X \sim N(\beta_0 + \beta_1 X, \sigma_u^2)

Note: this also implies homoskedasticity and zero conditional mean.

Together, Assumptions I-VI are the classical linear model (CLM) assumptions.

The CLM assumptions imply that OLS is BUE (i.e. minimum variance among all linear or non-linear unbiased estimators).

Non-normality of the errors is a serious concern in small samples. We can partially check this assumption by looking at the residuals (more in coming weeks).

Variable transformations can help to come closer to normality.

Reminder: we don't need the normality assumption in large samples.

Sampling distribution of OLS slope

If Y_i given X_i is distributed N(\beta_0 + \beta_1 X_i, \sigma_u^2), then we have the following at any sample size:

\frac{\hat{\beta}_1 - \beta_1}{SE[\hat{\beta}_1]} \sim N(0, 1)

Furthermore, if we replace the true standard error with the estimated standard error, then we get the following:

\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim t_{n-2}

The standardized coefficient follows a t distribution with n - 2 degrees of freedom. We take off an extra degree of freedom because we had to estimate one more parameter than just the sample mean.

All of this depends on Normal errors!

Where are we?

Under Assumptions 1-5 and in large samples, we know that

\hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)

Under Assumptions 1-6 and in any sample, we know that

\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim t_{n-2}

Hierarchy of OLS Assumptions

[Figure: the hierarchy of OLS assumptions again, now complete through normality of errors: classical LM (BUE) and small-sample inference (t and F).]

Regression as parametric modeling

Let's summarize the parametric view we have taken thus far.

Gauss-Markov assumptions:
- (A1) linearity, (A2) i.i.d. sample, (A3) variation in X, (A4) zero conditional mean error, (A5) homoskedasticity
- basically, assume the model is right

OLS is BLUE; add (A6) normality of the errors and we get small-sample SEs and BUE.

What is the basic approach here?
- A1 defines a linear model for the outcome: Y_i = \beta_0 + \beta_1 X_i + u_i
- A2 and A4 let us write the CEF as a function of X_i alone: E[Y_i | X_i] = \mu_i = \beta_0 + \beta_1 X_i
- A5-A6 define a probabilistic model for the conditional distribution: Y_i | X_i \sim N(\mu_i, \sigma^2)
- A3 covers the edge case in which the \betas are indistinguishable.

Agnostic views on regression

These assumptions assume we know a lot about how Y_i is 'generated'.

Justifications for using OLS (like BLUE/BUE) often invoke these assumptions, which are unlikely to hold exactly.

Alternative: take an agnostic view on regression.
- Use OLS without believing these assumptions.
- Lean on two things: the A2 i.i.d. sample and asymptotics (large-sample properties).

Lose the distributional assumptions and focus on approximating the best linear predictor.

If the true CEF happens to be linear, the best linear predictor is it.

Unbiasedness Result

One of the results most people know is that OLS is unbiased, but unbiased for what?

It is unbiased for the CEF under the assumption that the model is correctly specified.

However, this could be a quite poor approximation to the true CEF if there is a great deal of non-linearity.

We will often use OLS as a means to approximate the CEF, but don't forget that it is just an approximation!

We will return in a few weeks to how you diagnose this approximation.

Pedagogical Note

For now we are going to move forward with the classical worldview, and we will return to some alternative approaches later in the semester once we are comfortable with the matrix representation of regression.

This will lead to techniques like robust standard errors, which don't rely on the assumption of homoskedasticity (but have other tradeoffs!).

For now, just remember that regression is a linear approximation to the CEF!

We Covered

Sampling Variance

Gauss-Markov

Large Sample and Small Sample Properties

Next Time: Inference


Where We’ve Been and Where We’re Going...

Last Week
- hypothesis testing
- what is regression

This Week
- mechanics and properties of simple linear regression
- inference and measures of model fit
- confidence intervals for regression
- goodness of fit

Next Week
- mechanics with two regressors
- omitted variables, multicollinearity

Long Run
- probability → inference → regression → causal inference

1. Mechanics of OLS
2. Classical Perspective (Part 1, Unbiasedness)
   - Sampling Distributions
   - Classical Assumptions 1–4
3. Classical Perspective: Variance
   - Sampling Variance
   - Gauss-Markov
   - Large Samples
   - Small Samples
   - Agnostic Perspective
4. Inference
   - Hypothesis Tests
   - Confidence Intervals
   - Goodness of Fit
   - Interpretation
5. Non-linearities
   - Log Transformations
   - Fun With Logs
   - LOESS

Where are we?

Under Assumptions 1-5 and in large samples, we know that

\hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right)

Under Assumptions 1-6 and in any sample, we know that

\frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \sim t_{n-2}

Null and alternative hypotheses review

Null: H_0: \beta_1 = 0
- The null is the straw man we want to knock down.
- With regression, this is almost always a null of no relationship.

Alternative: H_a: \beta_1 \neq 0
- The claim we want to test.
- We could do a one-sided test, but you shouldn't.

Notice these are statements about the population parameters, not the OLS estimates.

Test statistic

Under the null of H_0: \beta_1 = c, we can use the following familiar test statistic:

T = \frac{\hat{\beta}_1 - c}{\widehat{SE}[\hat{\beta}_1]}

Under the null hypothesis:
- large samples: T \sim N(0, 1)
- any size sample with normal errors: T \sim t_{n-2}
- it is conservative to use t_{n-2} anyway, since t_{n-2} is approximately normal in large samples.

Thus, under the null, we know the distribution of T and can use that to formulate a rejection region and calculate p-values.

By default, R shows you the test statistic for \beta_1 = 0 and uses the t distribution.
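A sketch of the test by hand in R, matching the t value and p-value that lm() printed at the start of the lecture:

fit <- lm(sr ~ pop15, data = LifeCycleSavings)
est <- coef(summary(fit))["pop15", "Estimate"]
se  <- coef(summary(fit))["pop15", "Std. Error"]
tstat <- (est - 0) / se                              # test statistic for H0: beta1 = 0
pval  <- 2 * pt(-abs(tstat), df = df.residual(fit))  # two-sided p-value with n - 2 df
c(tstat, pval)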

Rejection region

Choose a level of the test, \alpha, and find rejection regions that correspond to that value under the null distribution:

P(-t_{\alpha/2, n-2} < T < t_{\alpha/2, n-2}) = 1 - \alpha

This is exactly the same as with sample means and sample differences in means, except that the degrees of freedom on the t distribution have changed.

[Figure: standard normal density with two-sided rejection regions beyond t = -1.96 and t = 1.96, each tail with probability 0.025, and the central "retain" region in between.]

p-value

The interpretation of the p-value is the same: the probability of seeing a test statistic at least this extreme if the null hypothesis were true.

Mathematically:

P\left( \left| \frac{\hat{\beta}_1 - c}{\widehat{SE}[\hat{\beta}_1]} \right| \geq |T_{obs}| \right)

If the p-value is less than \alpha, we would reject the null at the \alpha level.

Confidence intervals

Very similar to the approach with sample means. By the sampling distribution of the OLS estimator, we know that we can find t-values such that:

P\left( -t_{\alpha/2, n-2} \leq \frac{\hat{\beta}_1 - \beta_1}{\widehat{SE}[\hat{\beta}_1]} \leq t_{\alpha/2, n-2} \right) = 1 - \alpha

If we rearrange this as before, we can get an expression for confidence intervals:

P\left( \hat{\beta}_1 - t_{\alpha/2, n-2} \widehat{SE}[\hat{\beta}_1] \leq \beta_1 \leq \hat{\beta}_1 + t_{\alpha/2, n-2} \widehat{SE}[\hat{\beta}_1] \right) = 1 - \alpha

Thus, we can write the confidence intervals as:

\hat{\beta}_1 \pm t_{\alpha/2, n-2} \widehat{SE}[\hat{\beta}_1]

We can derive these for the intercept as well:

\hat{\beta}_0 \pm t_{\alpha/2, n-2} \widehat{SE}[\hat{\beta}_0]
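A sketch of the 95% interval by hand in R, alongside the built-in confint():

fit <- lm(sr ~ pop15, data = LifeCycleSavings)
est   <- coef(fit)["pop15"]
se    <- coef(summary(fit))["pop15", "Std. Error"]
tcrit <- qt(0.975, df = df.residual(fit))  # t_{alpha/2, n-2} for alpha = 0.05
c(est - tcrit * se, est + tcrit * se)      # interval by hand
confint(fit, "pop15", level = 0.95)        # same interval from R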

CIs Simulation Example

Returning to our simulation example, we can simulate the sampling distributions of the 95% confidence interval estimates for \hat{\beta}_1 and \hat{\beta}_0.

[Figure: a simulated scatterplot with its fitted line, together with the resulting interval estimates for \beta_0 and \beta_1.]

CIs Simulation Example

When we repeat the process over and over, we expect 95% of the confidence intervals to contain the true parameters. Note that, in a given sample, one CI may cover its true value and the other may not.

[Figure: stacked 95% confidence intervals for \beta_0 and \beta_1 across repeated samples; roughly 95% of them cover the true values.]

Prediction error

How do we judge how well a line fits the data?

One way is to find out how much better we do at predicting Y once we include X in the regression model.

Prediction errors without X: the best prediction is the mean, so our squared errors, or the total sum of squares (SS_{tot}), would be:

SS_{tot} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2

Once we have estimated our model, we have new prediction errors, which are just the sum of the squared residuals, or SS_{res}:

SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

Sum of Squares

[Figure: the AJR scatterplot shown twice: once marking the total prediction errors (deviations from \bar{Y}) and once marking the residuals (deviations from the fitted line).]

R-square

By definition, the residuals have to be smaller than the deviations from the mean, so we might ask the following: how much lower is the SSres compared to the SStot?

We quantify this question with the coefficient of determination, or R². This is the following:

R² = (SStot − SSres) / SStot = 1 − SSres / SStot

This is the fraction of the total prediction error eliminated by providing information on X.

Alternatively, this is the fraction of the variation in Y that is “explained by” X.

R² = 0 means no linear relationship; R² = 1 implies a perfect linear fit.
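Continuing the sketch above, R² follows directly from the two sums of squares and matches what summary() reports:

r2 <- 1 - ss_res / ss_tot
c(by_hand = r2, from_summary = summary(fit)$r.squared)   # both ~ 0.2075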


Is R-squared useful?

[Figure: scatterplot with fitted line, R-squared = 0.66]


[Figure: a second scatterplot with fitted line, R-squared = 0.96]


[Figure: four X-Y scatterplots with very different patterns but the same fitted line (an Anscombe-style quartet)]


Interpreting a Regression

Let’s have a quick chat about interpretation.


State Legislators and African American Population

Interpretations of increasing quality:

> summary(lm(beo ~ bpop, data = D))

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -1.31489 0.32775 -4.012 0.000264 ***

bpop 0.35848 0.02519 14.232 < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.317 on 39 degrees of freedom

Multiple R-squared: 0.8385, Adjusted R-squared: 0.8344

F-statistic: 202.6 on 1 and 39 DF, p-value: < 2.2e-16

“In states where an additional .1 proportion of the population is African American, we observe on average .035 proportion more African American state legislators (between .03 and .04 with 95% confidence).”

(Still not perfect; the best interpretation will be subject-matter specific. But it is fairly clear that it is non-causal, and it gives uncertainty.)


Ground Rules: Interpretation of the Slope

I almost didn’t include the last example in the slides. It is hard to give ground rules that cover all cases. Regressions are part of marshaling evidence in an argument, which makes them naturally specific to context.

1. Give a short but precise interpretation of the association using interpretable language and units.

2. If the association has a causal interpretation, explain why; otherwise do not imply a causal interpretation.

3. Provide a meaningful sense of uncertainty.

4. Indicate the practical significance of the finding for your argument.


Goal Check: Understand lm() Output

Call:

lm(formula = sr ~ pop15, data = LifeCycleSavings)

Residuals:

Min 1Q Median 3Q Max

-8.637 -2.374 0.349 2.022 11.155

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 17.49660 2.27972 7.675 6.85e-10 ***

pop15 -0.22302 0.06291 -3.545 0.000887 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.03 on 48 degrees of freedom

Multiple R-squared: 0.2075, Adjusted R-squared: 0.191

F-statistic: 12.57 on 1 and 48 DF, p-value: 0.0008866


We Covered

Hypothesis tests

Confidence intervals

Goodness of fit measures

Next Time: Non-linearities


Where We’ve Been and Where We’re Going...

Last Week
  - hypothesis testing
  - what is regression

This Week
  - mechanics and properties of simple linear regression
  - inference and measures of model fit
  - confidence intervals for regression
  - goodness of fit

Next Week
  - mechanics with two regressors
  - omitted variables, multicollinearity

Long Run
  - probability → inference → regression → causal inference


1 Mechanics of OLS

2 Classical Perspective (Part 1, Unbiasedness)
    Sampling Distributions; Classical Assumptions 1–4

3 Classical Perspective: Variance
    Sampling Variance; Gauss-Markov; Large Samples; Small Samples; Agnostic Perspective

4 Inference
    Hypothesis Tests; Confidence Intervals; Goodness of fit; Interpretation

5 Non-linearities
    Log Transformations; Fun With Logs; LOESS


Non-linear CEFs

When we say that CEFs are linear with regression, we mean linear in parameters, but by including transformations of our variables we can make non-linear shapes of pre-specified functional forms.

Many of these non-linear transformations are made by creating multiple variables out of a single X and so will have to wait for future weeks.

The function log(·) is one common transformation that has only one parameter.

This is particularly useful for positive and right-skewed variables.


Why does everyone keep logging stuff??

Logs linearize exponential growth.

Exponential growth grows by a fixed percent; linear growth grows by a fixed amount.

How? Let’s look.

First, here’s a graph showing exponential growth. We’re going to use y = 2^x, but any other exponent will work.

[Figure: y = 2^x for x = 0, 1, ..., 10 rises from 1 to 1024]

What happens when we take the log of y? Each value z = log(y) satisfies e^z = y, so log 1 = 0, log 2 = 0.69, log 4 = 1.39, log 8 = 2.08, and so on:

x:           0    1     2     3     4     5     6     7     8     9     10
y = 2^x:     1    2     4     8     16    32    64    128   256   512   1024
z = log(y):  0    0.69  1.39  2.08  2.77  3.47  4.16  4.85  5.55  6.24  6.93

[Figure: z = log(y) plotted against x is a straight line]
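A one-line check of this in R (a minimal sketch, not from the slides):

x <- 0:10
z <- log(2^x)        # natural log of exponential growth
coef(lm(z ~ x))      # intercept ~ 0, slope ~ log(2) = 0.693: exactly linear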


Interpretation

The log transformation changes the interpretation of β1:

Regress log(Y) on X → β1 approximates the proportional increase in our prediction of Y associated with a one-unit increase in X (so 100·β1 is approximately the percent increase).

Regress Y on log(X) → β1/100 approximates the increase in Y associated with a one percent increase in X.

Note that these approximations work only for small increments.

In particular, they do not work when X is a discrete random variable.
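A small numeric illustration of the log-level case (simulated data, a hedged sketch rather than the lecture example):

set.seed(1)
x <- runif(200, 0, 10)
y <- exp(0.05 * x) * exp(rnorm(200, sd = 0.1))   # y grows ~5% per unit of x
b1 <- coef(lm(log(y) ~ x))[2]
100 * b1                 # approximate percent change per unit of x (~5)
100 * (exp(b1) - 1)      # exact percent change implied by the fit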


Example from the American War Library

[Figure: numbers of American soldiers wounded in action (Y) against numbers killed in action (X), one labeled point per conflict, from small engagements up to World War II]

β1 = 1.23 → one additional soldier killed predicts 1.23 additional soldiers wounded


Wounded (Scale in Levels)

[Figure: dot chart of the number of wounded by conflict on a levels scale, 0 to 600,000]


Wounded (Logarithmic Scale)

[Figure: the same dot chart on a logarithmic scale; log(number of wounded) runs from 2 to 12, roughly 10 to 1,000,000 wounded]


Regression: Log-Level

[Figure: log(numbers of American soldiers wounded in action) against numbers killed in action, with fitted line]

β1 = 0.0000237 → one additional soldier killed predicts a 0.0023 percent increase in the number of soldiers wounded


Regression: Log-Log

[Figure: log(numbers of American soldiers wounded in action) against log(numbers killed in action), with fitted line]

β1 = 0.797 → a one percent increase in deaths predicts a 0.797 percent increase in the wounded


Four Most Commonly Used Models

Model        Equation                    β1 Interpretation
Level-Level  Y = β0 + β1·X               ΔY = β1·ΔX
Log-Level    log(Y) = β0 + β1·X          %ΔY = 100·β1·ΔX
Level-Log    Y = β0 + β1·log(X)          ΔY = (β1/100)·%ΔX
Log-Log      log(Y) = β0 + β1·log(X)     %ΔY = β1·%ΔX
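A hedged sketch of all four specifications in R (hypothetical simulated data; in practice you would substitute your own variables):

set.seed(2)
x <- rlnorm(100, meanlog = 2)
y <- exp(1 + 0.5 * log(x) + rnorm(100, sd = 0.2))

lm(y ~ x)                # Level-Level: dY = b1 * dX
lm(log(y) ~ x)           # Log-Level:  %dY ~ 100 * b1 * dX
lm(y ~ log(x))           # Level-Log:  dY ~ (b1/100) * %dX
lm(log(y) ~ log(x))      # Log-Log:    %dY ~ b1 * %dX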


Why Does This Approximation Work?

A useful thing to know is that for small x,

log(1 + x) ≈ x
exp(x) ≈ 1 + x

This can be derived from a series expansion of the log function. Numerically, when |x| ≤ .1, the approximation is within about 0.005.
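You can verify the quality of the approximation directly in R:

x <- c(-0.10, -0.05, 0.01, 0.05, 0.10)
cbind(x, log1p = log(1 + x), error = abs(log(1 + x) - x))
# the largest error on this range is about 0.005, at x = -0.1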


Why Does This Approximation Work?

Take two numbers a > b > 0. The percentage difference between a and b is

p = 100 · (a − b)/b

We can rewrite this as

a/b = 1 + p/100

Taking natural logs,

log(a) − log(b) = log(1 + p/100)

Applying our approximation and multiplying by 100, we find

p ≈ 100 (log(a) − log(b))
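For example, with a = 105 and b = 100:

a <- 105; b <- 100
100 * (a - b) / b           # exact percentage difference: 5
100 * (log(a) - log(b))     # log approximation: 4.879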


Be Careful: Log-Level with binary X

Assume we have log(Y) = β0 + β1·X, where X is binary with values 1 or 0, and assume β1 > .2. What is the problem with saying that a one-unit increase in X is associated with a β1 · 100 percent change in Y?

The log approximation is inaccurate for large changes like going from X = 0 to X = 1. Instead, the percent change in Y when X goes from 0 to 1 needs to be computed using:

100·(Y_{X=1} − Y_{X=0})/Y_{X=0} = 100·((Y_{X=1}/Y_{X=0}) − 1) = 100·(exp(β1) − 1)

Recall: log(Y_{X=1}) − log(Y_{X=0}) = log(Y_{X=1}/Y_{X=0}) = β1.

A one-unit change in X (i.e., going from 0 to 1) is associated with a 100·(exp(β1) − 1) percent increase in Y.
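A quick numeric check in R (β1 = 0.3 is an illustrative made-up value):

b1 <- 0.3
100 * b1                # naive reading: 30 percent
100 * (exp(b1) - 1)     # exact change implied by the model: ~35 percent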


Interpreting a Logged Outcome

On the last few slides, there was a bit that was a little dodgy.

When we log the outcome, we are no longer approximating E[Y|X]; we are approximating E[log(Y)|X].

Jensen’s inequality gives us information on this relation: f(E[X]) ≤ E[f(X)] for any convex function f(·).

In practice, this means we are no longer characterizing the expectation of Y, and it is technically inaccurate to talk about Y ‘on average’ changing in a certain way.

What are we characterizing? The geometric mean.


Geometric Mean

exp(E[log(Y)]) = exp( (1/N) ∑ᵢ₌₁ᴺ log(Yᵢ) )
             = exp( (1/N) log( ∏ᵢ₌₁ᴺ Yᵢ ) )
             = exp( log( (∏ᵢ₌₁ᴺ Yᵢ)^(1/N) ) )
             = (∏ᵢ₌₁ᴺ Yᵢ)^(1/N)
             = Geometric Mean(Y)

The geometric mean is a robust measure of central tendency.
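A numeric illustration in R with simulated right-skewed data (a sketch, not the income data used later):

set.seed(3)
y <- rlnorm(10000, meanlog = 10, sdlog = 1)
exp(mean(log(y)))    # geometric mean, via the identity above
mean(y)              # arithmetic mean: larger, by Jensen's inequality
median(y)            # the geometric mean sits much closer to the median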


Application


Core Idea

Classic approach:  E(log(Y) | X) = β0 + β1 · log(X)
  (the mean of log offspring income Y given parent income X; β1 is the intergenerational elasticity, IGE)

MG proposal:  log(E(Y | X)) = α0 + α1 · log(X)
  (the log of mean offspring income Y given parent income X; α1 is the intergenerational elasticity of the expectation, IGEE)
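One common way to estimate the two quantities in R; a hedged sketch with hypothetical parent and offspring income vectors, fitting the IGEE with a log-link mean model via quasi-Poisson (an assumption on my part, not necessarily the MG estimator):

set.seed(4)
parent <- rlnorm(1000, meanlog = 10.5, sdlog = 0.7)
offspring <- exp(6.3 + 0.4 * log(parent) + rnorm(1000, sd = 0.5))

ige  <- coef(lm(log(offspring) ~ log(parent)))[2]    # classic IGE
igee <- coef(glm(offspring ~ log(parent),
                 family = quasipoisson(link = "log")))[2]
c(IGE = ige, IGEE = igee)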


Geometric Mean is Closer to the Median Than the Mean

[Figure: densities of offspring family income (thousands of 2016 dollars) within each quartile of childhood income; in each panel the exponentiated mean of log (the geometric mean) lies close to the median and below the mean, with a handful of families beyond the axis]


Our Response

Images from this section are from this paper or earlier drafts of it.


Two Implicit Choices

(1) Summary Statistics for the Conditional Distribution (gets you down to one number per value of x)

and

(2) Assume or Learn a Functional Form (potentially simplifies the set of summary statistics to a single number)


Visualizing the MG Proposal

[Figure: offspring income against parent income ($0k to $300k), with the parent income density along the x-axis; the geometric-mean curve corresponds to the classic intergenerational elasticity and the arithmetic-mean curve to the Mitnik and Grusky proposal]


Single Summary Statistics Necessarily Mask Information

[Figure: three income densities (1. near equality, 2. distributed, 3. a few high earners), income in thousands of dollars]


The Mean is a Normative Choice

[Figure: four utility-of-income curves. A. Log: very sensitive to low incomes; goes to negative infinity when income = 0. B. Linear: more income equally valuable at top and bottom. C. Inverse hyperbolic sine: like log, but well-behaved when income = 0. D. Log(Income + $5,000): concavity between linear and log.]


A New Proposal

[Figure: offspring income against parent income, with fitted curves for the 10th, 25th, 50th, 75th, and 90th percentiles and the parent income density along the x-axis]


Single Number Summaries

A key selling point of the conventional IGE, the MG proposal, and regression more broadly is the single-number summary.

Any such summary necessitates a loss of information.

Even with more complex functional forms, we can always calculate such a summary. For instance here, the median is (on average) $4k higher when parent income is $10k higher.

We obtain this by simply plugging each parent income into the fitted 50th percentile curve, doing the same after adding $10k to each parent income, and taking the average difference (see the sketch below).

If you are willing to commit to a quantity of interest, you can usually estimate it directly.

At their best, single-number summaries are a way that the reader can calculate an approximation to a variety of quantities they are interested in. At their worst, they are a way for authors to abdicate responsibility for choosing a clear quantity of interest.
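A hedged sketch of that plug-in computation using quantile regression (requires the quantreg package; simulated stand-in data, since the paper's actual data and functional form differ):

library(quantreg)
set.seed(5)
d <- data.frame(parent = rlnorm(500, meanlog = 10.5, sdlog = 0.6))
d$offspring <- exp(6.3 + 0.4 * log(d$parent) + rnorm(500, sd = 0.5))

fit50 <- rq(offspring ~ log(parent), tau = 0.5, data = d)  # median curve
m0 <- predict(fit50, newdata = d)
m1 <- predict(fit50, newdata = transform(d, parent = parent + 10000))
mean(m1 - m0)   # average rise in the median when parent income is $10k higher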


Broader Implications (Lee, Lundberg and Stewart)

[Figure: Covid-19 deaths per 100,000 residents against the proportion of non-Hispanic Blacks in the county. Left: a traditional approach to visualizing the data. Right: fitted curves for the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles. Covid data from the NYTimes github as of 2020/09/07; demographic data from the American Community Survey 2014-2018 5-year estimate.]


1 Mechanics of OLS

2 Classical Perspective (Part 1, Unbiasedness)
    Sampling Distributions; Classical Assumptions 1–4

3 Classical Perspective: Variance
    Sampling Variance; Gauss-Markov; Large Samples; Small Samples; Agnostic Perspective

4 Inference
    Hypothesis Tests; Confidence Intervals; Goodness of fit; Interpretation

5 Non-linearities
    Log Transformations; Fun With Logs; LOESS


So what is ggplot2 doing?

[Figure: ggplot2 scatterplot of log GDP per capita growth against log settler mortality with the default smoothed fit]


LOESS

We can combine the nonparametric kernel-method idea of using only local data with a parametric model.

Idea: fit a linear regression within each band.

Locally weighted scatterplot smoothing (LOWESS or LOESS):

1. Pick the subset of the data that falls in the interval [x − h, x + h]

2. Fit a line to this subset of the data (= local linear regression), weighting the points by their distance to x using a kernel function

3. Use the fitted regression line to predict the expected value E[Y|X = x] (a sketch in R follows below)
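A minimal LOESS sketch in base R (simulated data, not the settler mortality example):

set.seed(6)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

fit <- loess(y ~ x, span = 0.5, degree = 1)   # local linear regression
grid <- seq(0, 10, length.out = 100)
plot(x, y, pch = 16, col = "grey")
lines(grid, predict(fit, newdata = data.frame(x = grid)), lwd = 2)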


LOESS Example

[Figure: LOESS fit overlaid on example data]


We Covered

Interpretation with logged independent and dependent variables

The geometric mean!


This Week in Review

OLS!

Classical regression assumptions!

Inference!

Logs!

Going Deeper:

Aronow and Miller (2019) Foundations of Agnostic Statistics. Cambridge University Press. Chapter 4.

Next week: Linear Regression with Two Variables!
