Class 5: Multiple Regression
CERAM, February-March-April 2008
Lionel Nesta, Observatoire Français des Conjonctures Economiques



Introduction to Regression

Typically, the social scientist deals with multiple and complex webs of interactions between variables. An immediate and appealing extension of simple linear regression is to enlarge the set of explanatory variables.

Multiple regressions include several explanatory variables in the empirical model:

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + u_i $$


$$ y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + u_i $$


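To minimize the sum of squared errors

$$ \min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} \hat u_i^2 = \min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \min_{\hat\beta_0, \dots, \hat\beta_K} \sum_{i=1}^{n} \Big( y_i - \hat\beta_0 - \sum_{k=1}^{K} \hat\beta_k x_{ik} \Big)^2 $$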


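Multivariate Least Squares Estimator

Usually, the multivariate model is written in matrix notation:

$$ y_i = \mathbf{x}_i'\boldsymbol\beta + u_i \quad\Longleftrightarrow\quad \mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{u} $$

with the following least squares solution:

$$ \hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}, \qquad \widehat{\operatorname{cov}}(\hat{\boldsymbol\beta}) = \hat\sigma_u^2 (\mathbf{X}'\mathbf{X})^{-1} $$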
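The least squares solution can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's own code: the helper names (`transpose`, `matmul`, `inverse`, `ols`) and the toy data are hypothetical, and in practice a numerical library would replace the hand-written Gauss-Jordan inverse. Because the toy data are generated without noise from y = 1 + 2x1 - 0.5x2, the estimator should recover those coefficients up to rounding.

```python
# Minimal OLS sketch via the normal equations: beta_hat = (X'X)^{-1} X'y.
# Pure Python for illustration only; helper names are hypothetical.


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            # A (near-)zero pivot signals that X'X is singular (OLS3 fails).
            raise ValueError("X'X is singular: perfect collinearity")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def ols(X, y):
    """beta_hat = (X'X)^{-1} X'y. X must contain a column of ones for beta_0."""
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    Xty = matmul(Xt, [[v] for v in y])
    return [row[0] for row in matmul(XtX_inv, Xty)]


# Hypothetical toy data generated without noise from y = 1 + 2*x1 - 0.5*x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 0.5 * b for a, b in zip(x1, x2)]
X = [[1, a, b] for a, b in zip(x1, x2)]
print(ols(X, y))  # approximately [1.0, 2.0, -0.5]
```

The tolerance-based pivot check is a design choice: pivots below a fixed threshold are treated as zero, which is how perfect collinearity shows up numerically.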


Assumption OLS 1

Linearity: the model is linear in its parameters.

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u $$

It is possible to apply non-linear transformations to the variables (e.g. the log of x), but not to the parameters, as in the following model:

$$ y = \beta_0 + \beta_1^{2} x_1 + u $$

OLS cannot estimate this.


Assumption OLS 2

Random sampling: the n observations are a random sample of the whole population.

There is no selection bias in the sample, so the results pertain to the whole population.

All observations are independent of one another (no serial or cross-sectional correlation).


Assumption OLS 3

No perfect collinearity: there is no perfect collinearity between the independent variables.

No independent variable is constant: each has nonzero variance, which is used together with the variance of the dependent variable to compute the parameters.

There are no exact linear relationships amongst the independent variables.
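To see why OLS3 matters, consider two regressors. In deviation-from-mean form, the two slope estimates solve a 2×2 system whose determinant is s11·s22 - s12², where s_ab is the sum of cross-products of deviations of a and b. The sketch below (hypothetical helper names and data) shows that this determinant is zero, so the slopes cannot be computed, both when one regressor is an exact multiple of the other and when a regressor is constant.

```python
def centered_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


def slope_gram_det(x1, x2):
    """Determinant of [[s11, s12], [s12, s22]]; OLS needs it to be nonzero."""
    s11 = centered_cross(x1, x1)
    s22 = centered_cross(x2, x2)
    s12 = centered_cross(x1, x2)
    return s11 * s22 - s12 ** 2


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
print(slope_gram_det(x1, [2 * a for a in x1]))        # exact linear relation: 0.0
print(slope_gram_det(x1, [2.0, 1.0, 4.0, 3.0, 6.0]))  # no exact relation: nonzero
print(slope_gram_det(x1, [7.0] * 5))                  # constant regressor: 0.0
```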


Assumption OLS 4

Zero conditional mean: the error term u has an expected value of zero, given any values of the independent variables (IVs):

$$ E(u \mid x_1, x_2, \dots, x_k) = 0 $$

In this case, all independent variables are exogenous. Otherwise, at least one IV suffers from an endogeneity problem.


Sources of endogeneity

Wrong specification of the model

Omitted variable correlated with one RHS variable

Measurement errors in RHS variables

Mutual causation between LHS and RHS

Simultaneity
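The omitted-variable case can be illustrated with a small simulation. Under the hypothetical data-generating process y = 1 + 2x1 - 0.5x2 + e, with x2 = 0.5x1 + v, leaving x2 out of the regression pushes -0.5x2 into the error term, which is then correlated with x1. The estimated slope on x1 is then centred on 2 + (-0.5)(0.5) = 1.75 instead of the true value 2. All names, parameter values and the seed are illustrative assumptions.

```python
import random


def simple_slope(y, x):
    """OLS slope from regressing y on x alone (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def omitted_variable_demo(reps=2000, n=50, seed=1):
    """True model: y = 1 + 2*x1 - 0.5*x2 + e, with x2 = 0.5*x1 + v.
    Regressing y on x1 only leaves -0.5*x2 in the error, correlated with x1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
        y = [1 + 2 * a - 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        total += simple_slope(y, x1)  # x2 is omitted from this regression
    return total / reps


print(omitted_variable_demo())  # close to 1.75, not to the true value 2
```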


Assumption OLS 5

Homoskedasticity: the variance of the error term u, conditional on the RHS variables, is the same for all values of the RHS variables:

$$ \operatorname{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma_u^2 $$

Otherwise we speak of heteroskedasticity.


Assumption OLS 6

Normality of the error term: the error term is independent of all RHS variables and follows a normal distribution with zero mean and variance σ_u²:

$$ u \sim \text{Normal}(0, \sigma_u^2) $$


Assumptions OLS

OLS1 Linearity

OLS2 Random Sampling

OLS3 No perfect Collinearity

OLS4 Zero Conditional Mean

OLS5 Homoskedasticity

OLS6 Normality of error term


Theorem 1

OLS1–OLS4: Unbiasedness of OLS. On average over repeated samples, each estimated parameter equals the true unknown value βj:

$$ E(\hat\beta_j) = \beta_j, \quad j = 0, 1, 2, \dots, k $$
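Theorem 1 is a statement about the average of the estimator over repeated samples. A Monte Carlo sketch can make this concrete: draw many samples from a hypothetical model that satisfies OLS1–OLS4 (errors drawn independently of the regressors) and average the OLS slopes. The averages should sit close to the true values 2 and -0.5 even though individual estimates vary. The model, sample size, number of replications and seed are illustrative choices.

```python
import random


def ols_two(y, x1, x2):
    """Closed-form OLS slopes for y = b0 + b1*x1 + b2*x2 + u (deviation form)."""
    def mean(v):
        return sum(v) / len(v)

    def s(a, b):
        ma, mb = mean(a), mean(b)
        return sum((p - ma) * (q - mb) for p, q in zip(a, b))

    s11, s22, s12 = s(x1, x1), s(x2, x2), s(x1, x2)
    s1y, s2y = s(x1, y), s(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def simulate(reps=2000, n=50, beta=(1.0, 2.0, -0.5), seed=0):
    """Average OLS slope estimates over repeated samples drawn under OLS1-OLS4."""
    rng = random.Random(seed)
    sums = [0.0, 0.0]
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]  # correlated, not collinear
        y = [beta[0] + beta[1] * a + beta[2] * b + rng.gauss(0, 1)
             for a, b in zip(x1, x2)]
        b1, b2 = ols_two(y, x1, x2)
        sums[0] += b1
        sums[1] += b2
    return sums[0] / reps, sums[1] / reps


means = simulate()
print(means)  # both close to the true slopes (2.0, -0.5)
```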


Theorem 2

OLS1–OLS5: Variance of the OLS estimator. The variance of the OLS estimator is

$$ \operatorname{Var}(\hat\beta_j) = \frac{\sigma_u^2}{\sum_{i=1}^{n} (x_{ij} - \bar x_j)^2 \, (1 - R_j^2)} $$

where R_j² is the R-squared from regressing x_j on all other independent variables. But how can we measure σ_u²?


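Theorem 3

OLS1–OLS5: The error variance σ_u² is estimated without bias by

$$ \hat\sigma_u^2 = \frac{\sum_{i} (y_i - \hat y_i)^2}{n - k - 1} = \frac{\sum_{i} \hat u_i^2}{n - k - 1}, \qquad E(\hat\sigma_u^2) = \sigma_u^2 $$

Its square root, σ̂_u, is the standard error of the regression, also called the standard error of the estimate or the root mean squared error (RMSE).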


Standard Error of Each Parameter

Combining Theorems 2 and 3 yields:

$$ se(\hat\beta_j) = \frac{\hat\sigma_u}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar x_j)^2 \, (1 - R_j^2)}} $$
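The two theorems can be combined in code for the two-regressor case. The sketch below (hypothetical function names and toy data) estimates σ_u² by SSR/(n - k - 1) as in Theorem 3, obtains R_j² from the auxiliary regression of one regressor on the other (with an intercept and a single regressor, this is the squared sample correlation), and plugs both into the standard error formula. It also reports 1/(1 - R_j²), usually called the variance inflation factor, which shows how correlation between regressors inflates the standard errors.

```python
import math


def centered_cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))


def fit_two_regressors(y, x1, x2):
    """OLS for y = b0 + b1*x1 + b2*x2 + u, with the RMSE (Theorem 3) and the
    slope standard errors (Theorem 2 with sigma_u replaced by its estimate)."""
    n, k = len(y), 2
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n
    resid = [yi - b0 - b1 * p - b2 * q for yi, p, q in zip(y, x1, x2)]
    sigma2_hat = sum(r * r for r in resid) / (n - k - 1)  # Theorem 3
    r2_aux = s12 ** 2 / (s11 * s22)  # R_j^2; identical for x1 and x2 when k = 2
    se1 = math.sqrt(sigma2_hat / (s11 * (1 - r2_aux)))
    se2 = math.sqrt(sigma2_hat / (s22 * (1 - r2_aux)))
    return {"beta": (b0, b1, b2), "rmse": math.sqrt(sigma2_hat),
            "se": (se1, se2), "vif": 1 / (1 - r2_aux)}


# Hypothetical toy data, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2.1, 4.4, 5.2, 7.4, 8.1, 10.4]
fit = fit_two_regressors(y, x1, x2)
print(fit["rmse"], fit["se"], fit["vif"])
```

For each slope, this formula agrees with σ̂_u² times the corresponding diagonal element of (X'X)^{-1}, the matrix form of the covariance given earlier.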


Theorem 4

Under assumptions OLS1–OLS5, the estimators β̂0, β̂1, …, β̂k are the best linear unbiased estimators (BLUE) of β0, β1, …, βk.

Assumptions OLS1–OLS5 are known as the Gauss-Markov assumptions. The Gauss-Markov theorem states that, under OLS1–OLS5, OLS is the best linear unbiased estimation method:

The estimates are unbiased (OLS1–OLS4).

The estimates have the smallest variance among linear unbiased estimators (OLS5).


Theorem 5

Under assumptions OLS1–OLS6, the standardized OLS estimates follow a t distribution:

$$ \frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k-1} $$


Extension of Theorem 5: Inference

We can define the 95% confidence interval of βj:

$$ \hat\beta_j \pm t_{.025} \cdot \frac{\hat\sigma_u}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar x_j)^2 \, (1 - R_j^2)}} $$

If the 95% CI does not include 0, then βj is significantly different from 0 at the 5% level.
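As a check on the interval formula, the lnassets row of the Application 4 table in this class (B = .556, standard error .042) can be plugged in. The sketch assumes that n - k - 1 is large enough for the Student t critical value t.025 to be approximated by the standard normal quantile (about 1.96); the function name is hypothetical. With the rounded table inputs, it reproduces the reported bounds to three decimals.

```python
from statistics import NormalDist


def confidence_interval(b, se, level=0.95):
    """CI for beta_j: b +/- critical value * se.

    Assumption: n - k - 1 is large, so the Student t critical value is
    approximated by the standard normal quantile (about 1.96 at 95%).
    """
    crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return b - crit * se, b + crit * se


# lnassets coefficient and standard error (rounded) from the Application 4 table.
low, high = confidence_interval(0.556, 0.042)
print(round(low, 3), round(high, 3))  # 0.474 0.638; the interval excludes 0
```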


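Student t Test for H0: βj = 0

We are also in a position to make inferences on βj:

H0: βj = 0

H1: βj ≠ 0

$$ t = \frac{\hat\beta_j - 0}{se(\hat\beta_j)} = \frac{\hat\beta_j}{se(\hat\beta_j)} $$

Decision rule:

Accept (do not reject) H0 if | t | < tα/2

Reject H0 if | t | ≥ tα/2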
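The decision rule can be sketched in the same spirit. The example uses the sizebio interaction from the Application 5 table in this class (B = -.144, standard error .081) and approximates tα/2 by the standard normal quantile, an assumption that is reasonable only for large n - k - 1; the function name is hypothetical. Here |t| ≈ 1.78 is below 1.96, so H0 is not rejected at 5%, in line with the reported significance of .078; at α = 10% it would be rejected.

```python
from statistics import NormalDist


def t_test(b, se, alpha=0.05):
    """Two-sided test of H0: beta_j = 0. Returns (t, reject_H0).

    Assumption: large n - k - 1, so t_{alpha/2} is approximated by the
    standard normal quantile.
    """
    t = b / se
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return t, abs(t) >= crit


# sizebio interaction from the Application 5 table: B = -0.144, se = 0.081.
t, reject = t_test(-0.144, 0.081)
print(round(t, 3), reject)  # -1.778 False: H0 is not rejected at 5%
```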


Summary

OLS1 Linearity

OLS2 Random Sampling

OLS3 No perfect Collinearity

OLS4 Zero Conditional Mean

OLS5 Homoskedasticity

OLS6 Normality of error term

T1: Unbiasedness

T2–T4: BLUE

T5: β̂ ~ t


The knowledge production function

Application 1: Seminal model

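PAT = f(RD, SIZE)

$$ PAT = A \cdot RD^{\beta_1} \, SIZE^{\beta_2} \exp(u) $$

$$ pat = \alpha + \beta_1 \, rd + \beta_2 \, size + u $$

where lowercase letters denote natural logarithms and α = ln A.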
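Taking logs is what turns the multiplicative model into one that is linear in its parameters (OLS1). The sketch below, with hypothetical parameter values and function names, checks that the log of the levels equation equals the log-linear form, and that β1 acts as an elasticity: raising RD by 1% multiplies PAT by 1.01^β1.

```python
import math


def kpf_levels(A, rd, size, beta1, beta2, u=0.0):
    """Knowledge production function in levels: PAT = A * RD^beta1 * SIZE^beta2 * exp(u)."""
    return A * rd ** beta1 * size ** beta2 * math.exp(u)


def kpf_logs(A, rd, size, beta1, beta2, u=0.0):
    """Same model in logs: pat = alpha + beta1*ln(RD) + beta2*ln(SIZE) + u, alpha = ln(A)."""
    return math.log(A) + beta1 * math.log(rd) + beta2 * math.log(size) + u


# Hypothetical parameter and data values, for illustration only.
params = dict(A=2.0, beta1=0.5, beta2=0.3)
lhs = math.log(kpf_levels(rd=100.0, size=50.0, u=0.1, **params))
rhs = kpf_logs(rd=100.0, size=50.0, u=0.1, **params)
print(lhs, rhs)  # equal up to rounding: the log form is linear in alpha, beta1, beta2

# beta1 is an elasticity: raising RD by 1% multiplies PAT by 1.01**beta1.
ratio = kpf_levels(rd=101.0, size=50.0, **params) / kpf_levels(rd=100.0, size=50.0, **params)
print(ratio)
```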


The knowledge production function

Application 2: Changing specification

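PAT = f(RD, SIZE)

$$ PAT = A \left( \frac{RD}{SIZE} \right)^{\beta_1} SIZE^{\beta_2} \exp(u) $$

$$ pat = \alpha + \beta_1 \ln\!\left( \frac{RD}{SIZE} \right) + \beta_2 \, size + u $$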
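Since ln(RD/SIZE) = rd - size, where lowercase letters denote natural logarithms, this specification can be expanded to see how it relates to the seminal model:

$$ pat = \alpha + \beta_1 (rd - size) + \beta_2 \, size + u = \alpha + \beta_1 \, rd + (\beta_2 - \beta_1) \, size + u $$

The two specifications are therefore reparametrizations of each other and give the same OLS fitted values; only the meaning of the size coefficient changes. Here β2 is the size elasticity holding R&D intensity constant, whereas the size coefficient of Application 1, which holds R&D constant, corresponds to β2 - β1.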


The knowledge production function

Application 3: Adding variables

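PAT = f(RD, SIZE, SPE)

$$ PAT = A \left( \frac{RD}{SIZE} \right)^{\beta_1} SIZE^{\beta_2} \exp(\beta_3 SPE + u) $$

$$ pat = \alpha + \beta_1 \ln\!\left( \frac{RD}{SIZE} \right) + \beta_2 \, size + \beta_3 SPE + u $$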


The knowledge production function

Application 4: Dummy variables

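PAT = f(RD, SIZE, SPE, BIO)

$$ PAT = A \left( \frac{RD}{SIZE} \right)^{\beta_1} SIZE^{\beta_2} \exp(\beta_3 SPE + \beta_4 BIO + u) $$

$$ pat = \alpha + \beta_1 \ln\!\left( \frac{RD}{SIZE} \right) + \beta_2 \, size + \beta_3 SPE + \beta_4 BIO + u $$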


Application 4: Dummy variables

Coefficients (a), Model 1

| Variable | B | Std. error | Beta | t | Sig. | 95% CI for B: lower | 95% CI for B: upper |
|---|---|---|---|---|---|---|---|
| (constant) | -5.465 | .616 | | -8.864 | .000 | -6.676 | -4.253 |
| lnassets | .556 | .042 | .909 | 13.326 | .000 | .474 | .638 |
| lnrd_assets | .492 | .080 | .313 | 6.123 | .000 | .334 | .650 |
| spe | .421 | .145 | .118 | 2.912 | .004 | .137 | .706 |
| bio | 1.657 | .168 | .665 | 9.835 | .000 | 1.326 | 1.988 |

B and Std. error are the unstandardized coefficients; Beta is the standardized coefficient.

a. Dependent variable: lnpatent.


Application 4: Dummy variables

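Figure: Patent (lnpatent) on the vertical axis against Size (lnasset) on the horizontal axis. DBF line: α + β4 + β2·size; LDF line: α + β2·size; both lines have slope β2.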
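Using the B column of the Application 4 table, the parallel-lines picture can be reproduced numerically. The firm profile and function name are hypothetical, and the sketch assumes that bio = 1 identifies the DBF group, as suggested by the β4 term in the DBF line. Switching the dummy from 0 to 1 shifts fitted lnpatent by the bio coefficient, 1.657, while the slope on lnassets is .556 in both groups.

```python
# B coefficients from the Application 4 table (dependent variable: lnpatent).
B = {"const": -5.465, "lnassets": 0.556, "lnrd_assets": 0.492, "spe": 0.421, "bio": 1.657}


def predict_lnpatent(lnassets, lnrd_assets, spe, bio):
    """Fitted lnpatent from the Application 4 model; bio is the 0/1 dummy."""
    return (B["const"] + B["lnassets"] * lnassets + B["lnrd_assets"] * lnrd_assets
            + B["spe"] * spe + B["bio"] * bio)


# Hypothetical firm profile, held fixed while the dummy switches from 0 to 1.
profile = {"lnassets": 6.0, "lnrd_assets": -2.0, "spe": 0.5}
gap = predict_lnpatent(**profile, bio=1) - predict_lnpatent(**profile, bio=0)
slope_bio1 = predict_lnpatent(7.0, -2.0, 0.5, 1) - predict_lnpatent(6.0, -2.0, 0.5, 1)
slope_bio0 = predict_lnpatent(7.0, -2.0, 0.5, 0) - predict_lnpatent(6.0, -2.0, 0.5, 0)
print(gap, slope_bio1, slope_bio0)  # intercept shift = B["bio"]; both slopes = B["lnassets"]
```

Because the dependent variable is in logs, the shift corresponds to multiplying the fitted patent count by exp(1.657), roughly 5.2, for otherwise identical firms.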


The knowledge production function

Application 5: Interacting Variables

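PAT = f(RD, SIZE, SPE, BIO)

$$ PAT = A \left( \frac{RD}{SIZE} \right)^{\beta_1} SIZE^{\beta_2} \exp(\beta_3 SPE + \beta_4 BIO + \beta_5 BIO \cdot size + u) $$

$$ pat = \alpha + \beta_1 \ln\!\left( \frac{RD}{SIZE} \right) + \beta_2 \, size + \beta_3 SPE + \beta_4 BIO + \beta_5 BIO \cdot size + u $$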


Application 5: Interacting Variables

Coefficients (a), Model 1

| Variable | B | Std. error | Beta | t | Sig. | 95% CI for B: lower | 95% CI for B: upper |
|---|---|---|---|---|---|---|---|
| (constant) | -6.483 | .843 | | -7.693 | .000 | -8.139 | -4.827 |
| lnassets | .620 | .055 | 1.013 | 11.241 | .000 | .511 | .728 |
| lnrd_assets | .474 | .081 | .301 | 5.863 | .000 | .315 | .633 |
| spe | .413 | .144 | .115 | 2.862 | .004 | .129 | .697 |
| bio | 3.592 | 1.108 | 1.441 | 3.242 | .001 | 1.415 | 5.770 |
| sizebio | -.144 | .081 | -.693 | -1.767 | .078 | -.303 | .016 |

B and Std. error are the unstandardized coefficients; Beta is the standardized coefficient.

a. Dependent variable: lnpatent.


Application 5: Interacting variables

Figure: Patent (lnpatent) on the vertical axis against Size (lnasset) on the horizontal axis. DBF line: α + β4 + (β2 + β5)·size, slope β2 + β5; LDF line: α + β2·size, slope β2.
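With the interaction term, the two groups no longer share a slope. The sketch below uses the B column of the Application 5 table and assumes that bio = 1 identifies the DBF group; the function name is hypothetical. The DBF slope is β2 + β5 = .620 - .144 = .476 against .620 for LDF, and the vertical gap β4 + β5·size shrinks as size grows, closing at lnassets = 3.592/.144 ≈ 24.9. Whether that point lies inside the observed size range cannot be told from the table, and the sizebio coefficient is significant only at the 10% level (Sig. = .078).

```python
# B coefficients from the Application 5 table (dependent variable: lnpatent).
B = {"const": -6.483, "lnassets": 0.620, "lnrd_assets": 0.474, "spe": 0.413,
     "bio": 3.592, "sizebio": -0.144}

slope_ldf = B["lnassets"]                 # bio = 0: slope beta_2
slope_dbf = B["lnassets"] + B["sizebio"]  # bio = 1: slope beta_2 + beta_5


def dbf_minus_ldf(lnassets):
    """Vertical gap between the two fitted lines: beta_4 + beta_5 * size."""
    return B["bio"] + B["sizebio"] * lnassets


crossing = -B["bio"] / B["sizebio"]  # size at which the two fitted lines meet
print(slope_ldf, slope_dbf, crossing)
```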