1_SLR (1)

7/27/2019 1_SLR (1)

http://slidepdf.com/reader/full/1slr-1 1/35

SLR Model Gauss-Markov OLS Estimation More OLS Terminology Unbiasedness Variance Example

EconometricsThe Simple Regression Model

Joao Valle e Azevedo

Nova School of Business and Economics

Spring Semester

Joao Valle e Azevedo (NOVA SBE) Econometrics Lisbon 1 / 35

http://find/

http://goback/

7/27/2019 1_SLR (1)



Objectives

Objectives

Given the model

y = β 0 + β 1x + u

Where y is earnings and x is years of education

Or y is sales and x is spending in advertising

Or y is the market value of an apartment and x is its area

Our objectives for the moment will be:

Estimate the unknown parameters β 0 and β 1

Assess how much x explains y


http://find/

http://goback/

7/27/2019 1_SLR (1)



Assumptions

The Gauss-Markov Assumptions

Assumption SLR.1 (Linearity in Parameters)

y = β 0 + β 1x + u

y is the dependent variable (or explained variable, or response variable,or the regressand)

x is the independent variable (or explanatory variable, or controlvariable, or the regressor)

u is the error term or disturbance (y is allowed to vary for the same

level of x )

This is a population equation. It is linear in the (unknown)parameters.


http://find/

7/27/2019 1_SLR (1)



Assumptions


Assumption SLR.2 (Random Sampling) Random sample of size n,{(x i ,y i ): i=1,2,...,n} such that:

y i = β 0 + β 1x i + u i , i = 1, 2, ..., n

Assumption SLR.3 (Sample Variation in the ExplanatoryVariable)The sample outcomes on x , namely {x i : i=1,2,...,n}, are not all thesame value


http://find/

7/27/2019 1_SLR (1)



Assumptions


Assumption SLR.4 (Zero Conditional Mean) The error u has anexpected value of zero given any value of the explanatory variable

E (u |x ) = 0

Crucial Assumption!Remember the previous examples:

Earnings = β 0 + β 1Education + u

CrimeRate = β 0 + β 1Policeman + u

In these cases SLR.4 most likely fail


S G O S O S

http://find/

http://goback/

7/27/2019 1_SLR (1)



Assumptions


Assumption SLR.4’ (Zero Mean) The error u has an expected valueof zero given any value of the explanatory variable

E (u ) = 0

Not really necessary given that SLR.4 implies SLR.4’

This is not a restrictive assumption since we can always use β 0 so thatSLR.4’ holds


SLR M d l G M k OLS E i i M OLS T i l U bi d V i E l

http://find/

http://goback/

7/27/2019 1_SLR (1)



Assumptions

Zero Conditional Mean

E (u |x ) = 0 ⇒ E (y |x ) = β 0 + β 1x

Figure: For any x , the distribution of y is centered about E(y |x )Joao Valle e Azevedo (NOVA SBE) Econometrics Lisbon 7 / 35

SLR M d l G M k OLS E ti ti M OLS T i l U bi d V i E l

http://find/

7/27/2019 1_SLR (1)



Assumptions

Graph of y i =β 0+β 1x i +u i

Figure: Population regression line, data points and the associated error terms


SLR Model Gauss Markov OLS Estimation More OLS Terminology Unbiasedness Variance Example

http://find/

7/27/2019 1_SLR (1)



Estimators

OLS Estimators

Want to estimate the population parameters from a sampleLet {(x i ,y i ): i=1,2,...,n} denote a random sample of size n from thepopulation

For each observation in this sample:

y i = β 0 + β 1x i + u i , i = 1, 2, ..., n

To derive the OLS estimates we need to realize that our mainassumption of E(u|x)=E(u)=0 also implies:

Cov (x , u ) = E (xu ) = 0

Remember from basic probability that Cov(x,y) = E(xy)-E(x)E(y)



http://find/

7/27/2019 1_SLR (1)



Estimators

OLS Estimators

Two Restrictions: E (u |x ) = E (u ) = 0

Cov (x,u ) = E (xu ) = 0

Since u = y

−β 0

−β 1x :

Two Moment Conditions:

E (y − β 0 − β 1x ) = 0

E [x (y − β 0 − β 1x )] = 0 Sample versions:

n−1n

i =1

(y i − β 0 − β 1x i ) = 0

n−1n

i =1

x i (y i

−β 0

−β 1x i ) = 0



http://find/

7/27/2019 1_SLR (1)



Estimators

OLS Estimators

We have two equations and two unknowns β 0, β 1

Pick β 0 and β 1 so that the sample moments match the populationmoments

From the first sample moment condition and given the definition

of a sample mean and properties of summation:

n−1n

i =1

(y i − β 0 − β 1x i ) = 0

⇔ y = β 0 + β 1x

⇔ β 0 = y − β 1x



http://find/

7/27/2019 1_SLR (1)



Estimators

OLS EstimatorsFrom the second sample moment condition:

n−1n

i =1

x i (y i − β 0 − β 1x i ) = 0

Plug-in β 0 ⇒n

i =1x i (y i − (y − β 1x ) − β 1x i ) = 0

This simplifies to ⇒n

i =1

x i (y i − y ) = β 1

ni =1

x i (x i − x )

Rearranging ⇒n

i =1

(x i − x )(y i − y ) = β 1n

i =1

(x i − x )2

Thus, β 1 =

n

i =1(x i − x )(y i − y )

n

i

=1

(x i −

x )2



http://find/

http://goback/

7/27/2019 1_SLR (1)


gy p

Estimators

OLS Estimators

β 1 =n

i =1(x i − x )(y i − y )n

i =1(x i − x )2

Need x to vary in our sample so that the slope parameter iswell-defined!

ni =1

(x i − x )2 > 0

Sample covariance between x and y divided by the sample variance of x

If x and y are positively correlated, the slope will be positive

If x and y are negatively correlated, the slope will be negative

It is an estimator if we treat (x i , y i ) as random variables It is an estimate once we substitute (x i , y i ) by the actual observations



http://find/

7/27/2019 1_SLR (1)


Estimators

Example

ˆwagei = 245.073 + 52.513educi

n = 11064, Data from the Employment Survey (INE), 2003

wage is net monthly wage and educ measures the number of years of schooling completed

So, we estimate a positive relation between these two variables:

An additional year of schooling leads to an estimated expected increaseof 52.513 euros in the wage on average, ceteris paribus

Caution about the interpretation of these estimates!



http://find/

http://goback/

7/27/2019 1_SLR (1)


Definitions

Some Definitions

Intuitively, OLS is fitting a line through the sample points such that theSSR is as small as possible, hence the term least squares

The residual, u , is an estimate of the error term, u , and is the differencebetween the fitted line (sample regression function) and the sample point

Fitted Valuey i = β 0 + β 1x i

Residualu i = y i

−y i = y i

−β 0

−β 1x i

Sum of Squared Residuals (SSR)

ni =1

u i 2 =

ni =1

(y i − β 0 − β 1x i )2



http://find/

http://goback/

7/27/2019 1_SLR (1)


Definitions

Graph of ˆ y i =β 0+β 1x i

Figure: Sample regression line, sample data points and the associated estimatederror terms - the residuals



http://find/

http://goback/

7/27/2019 1_SLR (1)


Definitions

Alternative approach to derive OLS

We want to choose our parameters such that we minimize the SSR:

SSR =n

i =1

u i 2 =

ni =1

(y i − β 0 − β 1x i )2

δ RSS δ β 0

= −2n

i =1

(y i − β 0 − β 1x i ) = 0

δ RSS

δ β 1

= −2n

i =1

x i (y i − β 0 − β 1x i ) = 0

These equations are equivalent to the same equations we had before:thus, same solution!



http://find/

http://goback/

7/27/2019 1_SLR (1)


More Terminology

More TerminologyThink of each observation as being made up of an explained part and anunexplained part:

y i = y i + u i Total Sum of Squares (SST)

ni =1

(y i − y )2

Explained Sum of Squares (SSE)

ni =1

(y i − y )2

Residual Sum of Squares (SSR)

n

i =1

u i 2



http://find/

7/27/2019 1_SLR (1)


More Terminology

SST = SSE + SSR

SST =n

i =1

(y i − y )2 =n

i =1

[(y i − y i ) + (y i − y )]2

=

ni =1

[u i + (y i − y )]2

=n

i =1

u i 2 + 2

ni =1

u i (y i − y ) +n

i =1

(y i − y )2

=n

i =1

u i 2 +

ni =1

(y i − y )2

= SSR + SSE



http://find/

7/27/2019 1_SLR (1)


More Terminology

Goodness of Fit

How well does our sample regression line fit our sample data? If SSR is very small relative to SST (or SSE very large relative to SST),

then a lot of the variation in the dependent variable y is explained bythe model

Can compute the fraction of the total sum of squares (SST) that isexplained by the model: the R-squared of the regression

R 2 =SSE

SST = 1 − SSR

SST

0 ≤ R 2 ≤ 1

The lower the SSR, the higher will be the R 2

OLS maximises the R 2 (minimises SSR)



http://find/

7/27/2019 1_SLR (1)


More Terminology

Goodness of Fit - Example (Cont.)

ˆwagei = 245.073 + 52.513educi


R 2 = 0.262

A lot of variation in the dependent variable (wage ) is left unexplainedby the model

This is not related to the fact that educ is most probably correlatedwith u

Can have low R 2 even if all the assumptions hold true



http://find/

7/27/2019 1_SLR (1)


Properties of OLS

Properties of OLS EstimatorsThought experiment:

Get a sample of size n from the population many times (infinite times) Compute the OLS estimates Is the average of these estimates equal to the true parameter values? Under SLR.1 to SLR.4 it turns out that the answer is positive

β 1 =n

i =1(x i − x )(y i − y )n

i =1(x i − x )2

β 0 = y − β 1x

These are estimators if we treat (x i , y i ) as random variables

Theorem

Under these assumptions:

E( β 0)= β 0 and E( β 1)= β 1



http://find/

7/27/2019 1_SLR (1)


Properties of OLS

Proof of Unbiasedness of OLS

β 1 =

n

i =1(x i − x )(y i − y )n

i =1(x i − x )2

= n

i =1(x i − x )(y i − y )

SST x

where:

SST =n

i =1

(x i − x )2

Note that:n

i =1

(x i − x ) = 0 andn

i =1

(x i − x )x i =n

i =1

(x i − x )2



http://find/

http://goback/

7/27/2019 1_SLR (1)


Properties of OLS

Proof of Unbiasedness of OLS (Cont.)Focusing on the numerator:

ni =1

(x i − x )(y i − y ) =n

i =1

(x i − x )y i

=n

i =1

(x i

−x )(β 0 + β 1x i + u i )

= β 1SST x +n

i =1

(x i − x )u i

Putting together:

β 1 =β 1SST x +

ni =1(x i − x )u i

SST x

= β 1 +

n

i =1(x i − x )u i SST x



http://find/

7/27/2019 1_SLR (1)


Properties of OLS

Proof of Unbiasedness of OLS (Cont.)

β 1 = β 1 +

ni =1(x i − x )u i

SST x

Now take expectations conditional on the values of the x’s, that is,treat the x ’s as constants. To avoid heavy notation that isn’t done

explicitly

E (β 1) = β 1 +

n

i =1(x i − x )E (u i )

SST x

= β 1 +n

i =1(x i − x )0SST x

= β 1

And if the conditional expectation is a constant, the unconditinalexpectation is also that constant...



http://find/

7/27/2019 1_SLR (1)


Properties of OLS

Proof of Unbiasedness of OLS (Cont.)

β 0 = y − β 1x

Now, it’s really easy to prove unbiasedness of β 0. Try it yourself!

Unbiasedness Summary

The OLS estimators of β 0 and β 1 are unbiased

However, in a given sample we may be close or far from the trueparameter, unbiasedness is a minimal requirement

Unbiasedness depends on the 4 assumptions: if any assumption fails,then OLS is not necessarily unbiased



http://find/

http://goback/

7/27/2019 1_SLR (1)


Properties of OLS

Variance of the OLS Estimators

We know already that the sampling distribution of our estimators iscentered around the true parameter

But, is the distribution concentrated around the true parameter?

To answer this question, we add an additional assumption:

Assumption SLR.5 (Homoskedasticity) The error u has the samevariance given any value of the explanatory variable.

Var (u |x ) = σ2



P i f OLS

http://find/

http://goback/

7/27/2019 1_SLR (1)


Properties of OLS

Variance of the OLS Estimators

Remember that:

σ2 = Var (u |x ) = E (u 2|x ) − [E (u |x )]2

= E (u 2|x ),given that E (u |x )=0

= E (u 2) = Var (u )

So, σ2 is also the unconditional variance, called the variance of the error σ, the square root of the error variance, is called the standard deviation

of the error

Can write:

E (y |x ) = β 0 + β 1x and Var (y |x ) = σ2



P ti f OLS

http://find/

http://goback/

7/27/2019 1_SLR (1)


Properties of OLS

Homoskedastic Case

Figure: How spread out is the distribution of the estimator



Properties of OLS

http://find/

http://goback/

7/27/2019 1_SLR (1)


Properties of OLS

Heteroskedastic Case

Figure: How spread out is the distribution of the estimator



Properties of OLS

http://find/

7/27/2019 1_SLR (1)


Properties of OLS

Variance of the OLS Estimators (Cont.)

β 1 = β 1 +

n

i =1(x i − x )u i SST x

,where SST x =n

i =1

(x i − x )2

Var (β 1

) = Var β 1

+1

SST x

n

i =1

(x i −x )u i

=

1

SST x

2

Var

ni =1

(x i − x )u i

=

1SST x

2 ni =1

(x i − x )2Var (u i )

=

1

SST x

2 ni =1

(x i − x )2σ2 = σ2

1

SST x

2

SST x



Properties of OLS

http://find/

7/27/2019 1_SLR (1)


Properties of OLS

Variance of the OLS Estimators (Cont.)

Var (β 1) =σ2

SST x ,where SST x =

ni =1

(x i − x )2

The larger the error variance, σ2, the larger the variance of the slope

estimatorThe larger the variability in the x i , the smaller the variance of theslope estimate

SST x tends to increase with the sample size n, so variance tends to

decrease with the sample sizeCan also show:

Var (β 0) =n−1σ2

n

i =1 x 2i

n

i =1(x i

−x )2



Properties of OLS

http://find/

http://goback/

7/27/2019 1_SLR (1)


Properties of OLS

Variance of the OLS

TheoremUnder Assumptions SLR.1 through SLR.5

Var (β 1) =σ2

n

i =1(x i −

x )2

Var (β 0) =n−1σ2

n

i =1 x 2i

n

i =1(x i − x )2

conditional on the sample values {x 1, x 2, ...}



Properties of OLS

http://find/

http://goback/

7/27/2019 1_SLR (1)


p

Estimating the Error Variance

In practice, the error variance σ

2

is unknown since we don’t observethe errors, we must estimate it...

σ2 =

n

i =1 u i 2

n − 2=

SSR

n − 2

Can be show that this is an unbiased estimator of the error varianceThe so-called standard error of the regression (SER) is given by:

σ =√ σ2

Recall that:

se (β 1) =σ

[

n

i =1(x i − x )2]1/2,and similarly for β 0



Example

http://find/

7/27/2019 1_SLR (1)


p

Example (Cont.)

wagei = 245.073 + 52.513educi


R 2 = 0.262

σ2 = 155366 and σ = 394.164

se (β 0) = 7.575 and se (β 1) = 0.837


http://find/

Documents

1_SLR (1)