
Econometrics II

Andrea Beccarini

Winter 2011/2012


Introduction

• Econometrics: application of statistical methods to empirical research in economics

• Compare theory with facts (data)

• Statistics: foundation of econometrics

Module Statistics and Module Empirical Methods

• Descriptive statistics (Statistik I): How to process data? How to display data?

• Probability theory and statistical inference (Statistik II): Estimation of unknown parameters from random samples; hypothesis tests

• Empirical research in economics (Empirische Wirtschaftsforschung): Applications of the linear model; statistical software

Module Statistics/Econometrics/Empirical Economics I

• Advanced Statistics: Probability theory; multidimensional random variables; estimation and hypothesis testing

• Econometrics I: Simple and multiple linear regression model

• Econometrics II: Extensions of the multivariate linear regression model; simultaneous equation systems; dynamic models

Module Statistics/Econometrics/Empirical Economics II

• Time series analysis: Stochastic processes; stationarity; ergodicity; linear processes; unit root processes; cointegration; vector-autoregressive models

• One further special course or seminar, e.g.

  • Financial econometrics
  • Panel data econometrics
  • Introduction to R
  • Poverty and inequality
  • Statistical inference, bootstrap
  • Wage and earnings dynamics

Literature: Statistical basics

• Karl Mosler and Friedrich Schmid, Wahrscheinlichkeitsrechnung und schließende Statistik, 2. Aufl., Springer, 2006.

• Aris Spanos, Statistical Foundations of Econometric Modelling, Cambridge University Press, 1986.

• Mood, A.M., Graybill, F.A. and Boes, D.C., Introduction to the Theory of Statistics, 3rd ed., McGraw-Hill, Tokyo, 1974.

Literature: Econometrics

• Main book for this course: Ludwig von Auer, Ökonometrie: Eine Einführung, 4. Aufl., Springer, 2005.

• Alternatively: William E. Griffiths, R. Carter Hill and George G. Judge, Learning and Practicing Econometrics, John Wiley & Sons, 1993.

• James Stock and Mark Watson, Introduction to Econometrics, Addison Wesley, 2003.

• Russell Davidson and James MacKinnon, Econometric Theory and Methods, Oxford University Press, 2004.

Class

• Class teacher: Rainer Schüssler

• Time and location: Tue, 14.00-16.00, CAWM1

• A detailed schedule is available on the home page of this course: http://www.wiwi.uni-muenster.de/statistik → Studium → Aktuelle Veranstaltungen → Econometrics II

Outline

• Very brief revision of Econometrics I (chap. 8 to 14)

• Violations of model assumptions (chap. 15 to 19, 21)

• Stochastic exogenous variables (chap. 20)

• Dynamic models (chap. 22)

• Interdependent equation systems (chap. 23)

Multiple linear regression model (revision)

Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables in the model are relevant

Assumption A2: The true functional dependence between X and y is linear

Assumption A3: The parameters $\beta$ are constant for all $T$ observations $(x_t, y_t)$

Assumptions B1 to B4:

$u \sim N(0, \sigma^2 I_T)$

Assumption C1: The exogenous variables $x_{1t}, \ldots, x_{Kt}$ are not stochastic, but can be controlled as in an experimental situation

Assumption C2: No perfect multicollinearity:

$\operatorname{rank}(X) = K + 1$

• Econometric model:

$y = X\beta + u$

• Point estimator (OLS):

$\hat{\beta} = (X'X)^{-1} X'y$
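As a quick illustration (our own sketch, not one of the course scripts; the simulated data and all names are invented), the point estimator can be computed directly from the matrix formula and compared with R's lm():

# Hypothetical sketch: OLS by matrix algebra vs. lm(), on simulated data
set.seed(1)
T <- 50
x1 <- runif(T); x2 <- runif(T)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(T)
X  <- cbind(1, x1, x2)                         # T x (K+1) regressor matrix
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^{-1} X'y
drop(beta_hat)
coef(lm(y ~ x1 + x2))                          # same estimates via lm()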

• Estimated model:

$\hat{y} = X\hat{\beta}$

• Residuals:

$\hat{u} = y - \hat{y}$

• Coefficient of determination:

$R^2 = \frac{S_{yy} - S_{\hat{u}\hat{u}}}{S_{yy}} = \frac{S_{\hat{y}\hat{y}}}{S_{yy}} = \frac{\sum_{k=1}^{K} \hat{\beta}_k S_{ky}}{S_{yy}}$

• Unbiasedness:

$E(\hat{\beta}) = \beta$

• Covariance matrix of $\hat{\beta}$:

$V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$

• Gauss-Markov theorem: $\hat{\beta}$ is BLUE

• Distribution of $y$:

$y \sim N(X\beta, \sigma^2 I_T)$

• Distribution of $\hat{\beta}$:

$\hat{\beta} \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$

• Estimator of the error term variance:

$\hat{\sigma}^2 = \frac{S_{\hat{u}\hat{u}}}{T - K - 1}$

• Unbiasedness: $E(\hat{\sigma}^2) = \sigma^2$

• Interval estimator for the component $\beta_k$ of $\beta$:

$\left[ \hat{\beta}_k - t_{a/2} \cdot \widehat{se}(\hat{\beta}_k), \; \hat{\beta}_k + t_{a/2} \cdot \widehat{se}(\hat{\beta}_k) \right]$

• t-test:

$H_0: r'\beta = q$
$H_1: r'\beta \neq q$

where

$r = [r_0, r_1, \ldots, r_K]'$

• Test statistic:

$t = \frac{r'\hat{\beta} - q}{\widehat{se}(r'\hat{\beta})}$

• F-test:

$H_0: R\beta = q$
$H_1: R\beta \neq q$

• Test statistic:

$F = \frac{\left( S^0_{\hat{u}\hat{u}} - S_{\hat{u}\hat{u}} \right) / L}{S_{\hat{u}\hat{u}} / (T - K - 1)}$

or

$F = \frac{\left( R\hat{\beta} - q \right)' \left[ R (X'X)^{-1} R' \right]^{-1} \left( R\hat{\beta} - q \right) / L}{\hat{u}'\hat{u} / (T - K - 1)}$

where $L$ is the number of restrictions in $H_0$

• Forecasting: Let $x_0 = [1, x_{10}, x_{20}, \ldots, x_{K0}]'$ be the vector of exogenous variables

• Point forecast: $\hat{y}_0 = x_0' \hat{\beta}$

• Variance of the forecast error:

$Var(\hat{y}_0 - y_0) = \sigma^2 \left( 1 + x_0' (X'X)^{-1} x_0 \right)$

• Violation of A1: Omitted or redundant variables

• Violation of A2: Nonlinear functional forms

Qualitative exogenous variables

• A3: The parameters $\beta$ are constant for all $T$ observations $(x_t, y_t)$

• Example: Wage $y_t$ depends on both education $x_{1t}$ and age $x_{2t}$:

$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$

• Suppose the parameters differ between men and women:

$y_t = \alpha_M + \beta_{M1} x_{1t} + \beta_{M2} x_{2t} + u_t$
$y_t = \alpha_F + \beta_{F1} x_{1t} + \beta_{F2} x_{2t} + u_t$

• What happens if the gender difference is ignored? [dummy.R]

• Introduce a dummy variable

$D_t = \begin{cases} 0 & \text{if male} \\ 1 & \text{if female} \end{cases}$

• Extended model:

$y_t = \alpha + \gamma D_t + \beta_1 x_{1t} + \delta_1 D_t x_{1t} + \beta_2 x_{2t} + \delta_2 D_t x_{2t} + u_t$

• Submodels for men ($D_t = 0$) and women ($D_t = 1$):

$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$
$y_t = (\alpha + \gamma) + (\beta_1 + \delta_1) x_{1t} + (\beta_2 + \delta_2) x_{2t} + u_t$

• Interpretation of the coefficients $\gamma, \delta_1, \delta_2$

• Estimation of the model by OLS?

• What does the matrix of exogenous variables $X$ look like?

• Apply t- or F-tests to check parameter constancy, e.g.

$H_0: \gamma = \delta_1 = \delta_2 = 0$

(a sketch follows below)

• Often, the models just include a level effect, i.e.

$y_t = \alpha + \gamma D_t + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$

(use a t-test for $\gamma$)
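A minimal sketch of the dummy-variable regression and the parameter-constancy F-test (this is not the course's dummy.R; the data and coefficient values are invented for illustration):

# Hypothetical sketch: level and slope dummies, F-test of gamma = delta1 = delta2 = 0
set.seed(2)
T  <- 200
D  <- rbinom(T, 1, 0.5)                    # 0 = male, 1 = female
x1 <- rnorm(T, 12, 2)                      # education
x2 <- rnorm(T, 40, 10)                     # age
y  <- 10 + 2 * D + 1.5 * x1 + 0.8 * D * x1 + 0.2 * x2 - 0.1 * D * x2 + rnorm(T)
full <- lm(y ~ D * (x1 + x2))              # expands to D, x1, x2, D:x1, D:x2
restricted <- lm(y ~ x1 + x2)              # model without gender differences
anova(restricted, full)                    # F-test of parameter constancy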

• If the qualitative exogenous variable has more than two values, we need more than one dummy variable

• Example: Religion (protestant, catholic, other)

$D_t^{prot} = \begin{cases} 0 & \text{if other} \\ 1 & \text{if protestant} \\ 0 & \text{if catholic} \end{cases} \qquad D_t^{cath} = \begin{cases} 0 & \text{if other} \\ 0 & \text{if protestant} \\ 1 & \text{if catholic} \end{cases}$

• Interpretation of the coefficients?

• If there are two or more qualitative exogenous variables, interaction terms can be added

• Example: Gender and citizenship

$D_{1t} = \begin{cases} 0 & \text{if male} \\ 1 & \text{if female} \end{cases} \qquad D_{2t} = \begin{cases} 0 & \text{if German citizenship} \\ 1 & \text{else} \end{cases}$

• Interpretation of the coefficients $\gamma_1, \gamma_2, \delta$ in the two models

$y_t = \alpha + \gamma_1 D_{1t} + \gamma_2 D_{2t} + \beta x_t + u_t$

$y_t = \alpha + \gamma_1 D_{1t} + \gamma_2 D_{2t} + \delta D_{1t} D_{2t} + \beta x_t + u_t$

• What happens if there are two dummy variables

$D_t^{female} = \begin{cases} 0 & \text{if male} \\ 1 & \text{if female} \end{cases} \qquad D_t^{male} = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases}$

• What happens if the dummy variable is coded as

$D_t = \begin{cases} 1 & \text{if male} \\ 2 & \text{if female} \end{cases}$

• Compare the joint dummy variable model

$y_t = \alpha + \gamma D_t + \beta_1 x_{1t} + \delta_1 D_t x_{1t} + \beta_2 x_{2t} + \delta_2 D_t x_{2t} + u_t$

with the two separate models

$y_t = \alpha_M + \beta_{M1} x_{1t} + \beta_{M2} x_{2t} + u_t \quad \text{for men}$
$y_t = \alpha_F + \beta_{F1} x_{1t} + \beta_{F2} x_{2t} + u_t \quad \text{for women}$

[dummycomparison.R]

• Questions:

1. Why are the point estimates identical? [1]
2. Why is the sum of squared residuals identical? [2]
3. Why are the standard errors different? [3]

Heteroskedasticity

• Assumption B2: $Var(u_t) = \sigma^2$ for $t = 1, \ldots, T$

• Rent example: The rent $y_t$ depends on the distance $x_t$ from the city center

 t   x_t    y_t      t   x_t    y_t
 1   0.50   16.80    7   3.10   12.80
 2   1.40   16.20    8   4.40   12.20
 3   1.10   15.90    9   3.70   15.00
 4   2.20   15.40   10   3.00   13.60
 5   1.30   16.40   11   3.50   14.10
 6   3.20   13.20   12   4.10   13.30

• The scatterplot of rent against distance suggests that there might be heteroskedasticity [figure not reproduced]

• What are the properties of $\hat{\beta}$ if there is heteroskedasticity? [4]

Transformation of the model

• (Restrictive and arbitrary) assumption:

$\sigma_t^2 = \sigma^2 x_t$

• Transformation of the model:

$\frac{y_t}{\sqrt{x_t}} = \alpha \frac{1}{\sqrt{x_t}} + \beta \frac{x_t}{\sqrt{x_t}} + \underbrace{\frac{u_t}{\sqrt{x_t}}}_{\text{error term}}$

$y_t^* = \alpha z_t^* + \beta x_t^* + u_t^*$

• Properties of the new error term $u_t^*$ [5]

• The transformed model satisfies all A-, B- and C-assumptions!

• OLS estimation of the transformed model:

$\hat{\alpha}^* = \frac{S_{z^*y^*}}{S_{z^*z^*}}$

$\hat{\beta}^* = \frac{S_{x^*y^*}}{S_{x^*x^*}} = \frac{\sum (x_t^* - \bar{x}^*)(y_t^* - \bar{y}^*)}{\sum (x_t^* - \bar{x}^*)^2} = \frac{\sum \frac{1}{x_t} (x_t - \bar{x})(y_t - \bar{y})}{\sum \frac{1}{x_t} (x_t - \bar{x})^2}$

• The usual estimators

$\hat{\beta} = \frac{\sum (x_t - \bar{x})(y_t - \bar{y})}{\sum (x_t - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}$

are inefficient

• An unbiased estimator of

$Var(u_t^*) = \sigma^2$

is

$\hat{\sigma}^2 = \frac{S_{\hat{u}^*\hat{u}^*}}{T - 2}$

• From $\sigma_t^2 = \sigma^2 x_t$ we conclude that

$\hat{\sigma}_t^2 = \hat{\sigma}^2 \cdot x_t$

is an unbiased estimator of $Var(u_t)$

• It can be shown that [6]

$Var(\hat{\beta}) = \frac{\sum (x_t - \bar{x})^2 \sigma_t^2}{S_{xx}^2}$

• The usual equations

$Var(\hat{\beta}) = \frac{\sigma^2}{S_{xx}} \quad \text{and} \quad \hat{\sigma}^2 = \frac{S_{\hat{u}\hat{u}}}{T - 2}$

are wrong under heteroskedasticity

Goldfeld-Quandt test

• Step 1: Re-order the observations according to their $x_t$-values (or some other "source of heteroskedasticity")

• Step 2: Define two groups:

  • $T_1$ observations with low $x_t$-values;
  • $T_2$ observations with high $x_t$-values

Often, $T_1 + T_2 = T$

• Step 3: We assume $\sigma_2^2 > \sigma_1^2$; hence

$H_0: \sigma_2^2 = \sigma_1^2$
$H_1: \sigma_2^2 > \sigma_1^2$

• Step 4: Separate OLS estimation for both groups; compute $S^1_{\hat{u}\hat{u}}$ and $S^2_{\hat{u}\hat{u}}$

• Step 5: Goldfeld and Quandt (1972) show that under $H_0$

$F = \frac{S^2_{\hat{u}\hat{u}} / (T_2 - K - 1)}{S^1_{\hat{u}\hat{u}} / (T_1 - K - 1)}$

follows an $F_{(T_2 - K - 1, \, T_1 - K - 1)}$-distribution

• Step 6: Compare $F$ to the critical value $F_a$. If $F > F_a$, reject $H_0$

Numeric illustration: rentexample.R

1. Order the observations according to their $x_t$-values

2. Group Z: City center ($T_Z = 5$); Group P: Periphery ($T_P = 7$)

3. Null hypothesis: $H_0: \sigma_P^2 \leq \sigma_Z^2$

4. Sums of squared residuals:

$S^Z_{\hat{u}\hat{u}} = 0.246 \quad \text{and} \quad S^P_{\hat{u}\hat{u}} = 4.666$

5. Hence,

$F = \frac{4.666/5}{0.246/3} = 11.4$

6. At level $a = 5\%$ the critical value is 9.01. Reject the null hypothesis; the data indicate heteroskedasticity.

The null hypothesis that the error term variance is the same in the center and the periphery is rejected at the 5% level.

Heteroskedasticity should be taken into account. (A sketch reproducing these numbers follows below.)
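The numbers above can be reproduced directly from the rent data on slide 25; a sketch (variable names are ours, not from rentexample.R):

# Goldfeld-Quandt test for the rent example, K = 1
x <- c(0.50, 1.40, 1.10, 2.20, 1.30, 3.20, 3.10, 4.40, 3.70, 3.00, 3.50, 4.10)
y <- c(16.80, 16.20, 15.90, 15.40, 16.40, 13.20, 12.80, 12.20, 15.00, 13.60, 14.10, 13.30)
o <- order(x)
Z <- o[1:5]; P <- o[6:12]                  # center (T_Z = 5) and periphery (T_P = 7)
S_Z <- sum(resid(lm(y[Z] ~ x[Z]))^2)       # 0.246
S_P <- sum(resid(lm(y[P] ~ x[P]))^2)       # 4.666
F_stat <- (S_P / (7 - 2)) / (S_Z / (5 - 2))
c(F = F_stat, crit = qf(0.95, 5, 3))       # 11.4 > 9.01: reject H0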

White test

• Consider the linear regression model with two exogenous variables

$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$

• Step 1: $H_0$: the error terms are homoskedastic

• Step 2: Calculate the OLS residuals $\hat{u}_t$

• Step 3: Estimate the auxiliary regression

$\hat{u}_t^2 = \gamma_0 + \gamma_1 x_{1t} + \gamma_2 x_{2t} + \gamma_3 x_{1t}^2 + \gamma_4 x_{2t}^2 + \gamma_5 x_{1t} x_{2t} + v_t$

• Step 4: It can be shown that under $H_0$

$T \cdot R^2 \sim \chi^2_r$

where $r$ is the number of slope parameters in the auxiliary regression

• If $T \cdot R^2$ is larger than the critical value of the $\chi^2_r$-distribution, reject $H_0$: the squared residuals can be explained (at least partially) by the exogenous variables

• Illustration [rentexample.R]; a sketch follows below
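A sketch of the White test on the rent example from above (one regressor, so the auxiliary regression has no cross term and r = 2; again our own code, not rentexample.R):

# White test: T*R^2 from the auxiliary regression of squared residuals
x <- c(0.50, 1.40, 1.10, 2.20, 1.30, 3.20, 3.10, 4.40, 3.70, 3.00, 3.50, 4.10)
y <- c(16.80, 16.20, 15.90, 15.40, 16.40, 13.20, 12.80, 12.20, 15.00, 13.60, 14.10, 13.30)
u2  <- resid(lm(y ~ x))^2
aux <- lm(u2 ~ x + I(x^2))                 # auxiliary regression, r = 2 slopes
TR2 <- length(x) * summary(aux)$r.squared
c(TR2 = TR2, crit = qchisq(0.95, df = 2))  # reject H0 if TR2 exceeds the critical value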

• Question: Given that heteroskedasticity has been detected, how shall we proceed?

• Answer 1: Adjust the estimation procedure → GLS or feasible GLS

• Answer 2: Still use OLS but compute the correct standard errors → White's heteroskedasticity-consistent covariance matrix estimator

Generalized least squares method (GLS)

• Verallgemeinerte Kleinste-Quadrate-Methode (VKQ)

• Regression model $y = X\beta + u$

• Covariance matrix of the error terms $V(u) \neq \sigma^2 I$, but $V(u) = \sigma^2 \Omega$

• Example: $\sigma_t^2 = \sigma^2 x_{kt}$; then

$\Omega = \begin{bmatrix} x_{k1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & x_{kT} \end{bmatrix}$

• Transformation of the model: Since $\Omega$ is positive definite, there is a $(T \times T)$-matrix $P$ with

$P'P = \Omega^{-1}$

• Example: If

$\Omega = \begin{bmatrix} x_{k1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & x_{kT} \end{bmatrix},$

then

$P = \begin{bmatrix} 1/\sqrt{x_{k1}} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sqrt{x_{kT}} \end{bmatrix}$

• From $P'P = \Omega^{-1}$ it follows that

$P \Omega P' = I_T$

• Pre-multiplication of $y = X\beta + u$ by $P$ yields

$Py = PX\beta + Pu$

$y^* = X^*\beta + u^*$

• Properties of $u^*$ [7]

• The transformed model satisfies all A-, B- and C-assumptions

• Derivation of the GLS estimator $\hat{\beta}_{VKQ}$ [8]

• Covariance matrices of $\hat{\beta}_{VKQ}$ and $\hat{\beta}$ [9]

• Estimation of $\sigma^2$ by

$\hat{\sigma}^2 = \frac{\hat{u}^{*\prime}\hat{u}^*}{T - K - 1} = \frac{\hat{u}'\Omega^{-1}\hat{u}}{T - K - 1}$

• Ignoring heteroskedasticity, one would use

$\hat{V}(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}, \qquad \hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{T - K - 1}$

• Interval estimators and hypothesis tests would not work correctly

• What happens if $\Omega$ is unknown?

• Example (two variance regimes):

$W = \sigma^2 \Omega = \begin{bmatrix} \sigma_I^2 & 0 & \cdots & \cdots & \cdots & 0 \\ 0 & \ddots & & & & \vdots \\ \vdots & & \sigma_I^2 & & & \vdots \\ \vdots & & & \sigma_{II}^2 & & \vdots \\ \vdots & & & & \ddots & 0 \\ 0 & \cdots & \cdots & \cdots & 0 & \sigma_{II}^2 \end{bmatrix}$

• Feasible Generalized Least Squares (FGLS), Geschätzte verallgemeinerte Kleinste-Quadrate (GVKQ)

• First, estimate the unknown quantities in $W = \sigma^2 \Omega$

• The FGLS estimator is

$\hat{\beta}_{FGLS} = (X'\hat{W}^{-1}X)^{-1} X'\hat{W}^{-1}y$

• Estimated covariance matrix:

$\hat{V}(\hat{\beta}_{FGLS}) = (X'\hat{W}^{-1}X)^{-1}$

• What to do if there is no information at all about the form of heteroskedasticity?

White's heteroskedasticity-consistent covariance matrix estimator

• Davidson and MacKinnon, chap. 5.5

• Econometric model $y = X\beta + u$

• Covariance matrix $V(u) = W$ with $W = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_T^2)$

• OLS estimator

$\hat{\beta} = (X'X)^{-1}X'y$

• Covariance matrix

$V(\hat{\beta}) = (X'X)^{-1} X'WX (X'X)^{-1}$

• Consistent estimation of $W$ is impossible

• White (1980): Consistent estimation of

$\Sigma = \frac{1}{T} X'WX = \frac{1}{T} \sum_{t=1}^{T} \sigma_t^2 x_t x_t'$

is possible!

• Consistent estimator of $\Sigma$:

$\hat{\Sigma} = \frac{1}{T} \sum_{t=1}^{T} \hat{u}_t^2 x_t x_t'$

• Estimated covariance matrix

$\hat{V}(\hat{\beta}) = (X'X)^{-1} X'\hat{W}X (X'X)^{-1}$

with

$\hat{W} = \begin{bmatrix} \hat{u}_1^2 & & \\ & \ddots & \\ & & \hat{u}_T^2 \end{bmatrix}$

• "Sandwich estimator"

• Illustration [rentexample.R]; a sketch follows below
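A sketch of the sandwich formula computed by hand on simulated heteroskedastic data (all names and the data-generating process are invented for illustration):

# White's heteroskedasticity-consistent covariance matrix, step by step
set.seed(3)
T <- 100
x <- runif(T, 0.1, 5)
y <- 15 - 0.8 * x + rnorm(T, sd = sqrt(x))      # error variance proportional to x
X <- cbind(1, x)
b <- solve(t(X) %*% X, t(X) %*% y)              # OLS
u2 <- as.vector(y - X %*% b)^2
XtXi <- solve(t(X) %*% X)
V_HC0 <- XtXi %*% (t(X) %*% (X * u2)) %*% XtXi  # (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
sqrt(diag(V_HC0))                               # White standard errors
sqrt(diag(XtXi * sum(u2) / (T - 2)))            # usual (invalid) OLS standard errors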

Autocorrelation

• Assumption B3: The error terms are uncorrelated,

$Cov(u_t, u_s) = 0 \quad \text{for all } t \neq s$

• Example [waterfilter.R]: Demand function

$y_t = \alpha + \beta x_t + u_t$

for water filters; quantity sold $y_t$ and prices $x_t$ for the months January 2001 to December 2002

• Assumption about the form of autocorrelation:

$u_t = \rho u_{t-1} + e_t \quad \text{with } -1 < \rho < 1$

• Assumption about $e_t$:

$e_t \sim NID(0, \sigma_e^2)$

• Properties of $u_t$ [10]

• Moment functions of $u_t$:

$E(u_t) = 0$

$Var(u_t) = \frac{\sigma_e^2}{1 - \rho^2}$

$Cov(u_t, u_{t-1}) = \rho \left( \frac{\sigma_e^2}{1 - \rho^2} \right)$

$Cov(u_t, u_{t-j}) = \rho^j \left( \frac{\sigma_e^2}{1 - \rho^2} \right)$

• B1, B2 and B4 are still satisfied

• But B3 is violated

• Transformation of the model [11]:

$y_t - \rho y_{t-1} = (1 - \rho)\alpha + \beta (x_t - \rho x_{t-1}) + e_t$

• Define

$y_t^* = y_t - \rho y_{t-1}, \quad \alpha^* = (1 - \rho)\alpha, \quad x_t^* = x_t - \rho x_{t-1}$

• Then

$y_t^* = \alpha^* + \beta x_t^* + e_t$

satisfies all A-, B- and C-assumptions (if $\rho$ were known)

• Hence, OLS estimation is inefficient

• Consequences for interval estimation and hypothesis tests?

• The usual OLS formulas

$Var(\hat{\beta}) = \frac{\sigma^2}{S_{xx}} \quad \text{and} \quad \hat{\sigma}^2 = \frac{S_{\hat{u}\hat{u}}}{T - 2}$

are invalid

• Consequences are the same as in the case of heteroskedasticity

Diagnosis

• Plot the residuals $\hat{u}_t$ over time, or plot the pairs $(\hat{u}_{t-1}, \hat{u}_t)$

• Example (demand function)

• Estimator for $\rho$: Because of $u_t = \rho u_{t-1} + e_t$ we can estimate $\rho$ by the regression

$\hat{u}_t = \rho \hat{u}_{t-1} + e_t^*$

• Least squares estimator:

$\hat{\rho} = \frac{\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=2}^{T} \hat{u}_{t-1}^2}$

• Numeric illustration: From the residuals we calculate

$\hat{\rho} = \frac{1481594}{2557515} = 0.58$

• Due to the two-step approach the ordinary t-test is no longer exact

Durbin-Watson test

• Step 1: Set up the hypotheses

$H_0: \rho \leq 0$
$H_1: \rho > 0$

• Step 2: Compute the Durbin-Watson test statistic

$d = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{T} \hat{u}_t^2}$

• Numeric illustration:

$d = \frac{2101281}{2761231} = 0.76$

• Relation between $d$ and $\hat{\rho}$ [12]:

$d \approx 2(1 - \hat{\rho})$

• Step 3: Find the critical value $d_a$ (using econometric software). If $d < d_a$, reject $H_0$ (a sketch of $\hat{\rho}$ and $d$ follows below)
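A sketch putting $\hat{\rho}$ and the Durbin-Watson statistic together on simulated AR(1) errors (illustrative only, not the water-filter data):

# rho-hat and d from OLS residuals
set.seed(4)
T <- 24
x <- rnorm(T)
u <- as.vector(arima.sim(list(ar = 0.6), n = T))     # AR(1) error term
y <- 2 + 0.5 * x + u
uhat <- resid(lm(y ~ x))
rho_hat <- sum(uhat[-1] * uhat[-T]) / sum(uhat[-T]^2)
d <- sum(diff(uhat)^2) / sum(uhat^2)
c(rho_hat = rho_hat, d = d, approx = 2 * (1 - rho_hat))  # d is roughly 2(1 - rho_hat)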

• Problem: The critical value $d_a$ depends on $X$

• If the software cannot compute $d_a$, there are tables providing an upper boundary $d_a^H$ and a lower boundary $d_a^L$ for $d_a$

• Step 4: Compare the test statistic $d$ to $d_a^L$ and $d_a^H$

• Decision rule:

  • if $d < d_{0.05}^L$, reject $H_0: \rho \leq 0$;
  • if $d > d_{0.05}^H$, do not reject $H_0: \rho \leq 0$;
  • if $d_{0.05}^L \leq d \leq d_{0.05}^H$, leave the decision open

• Numeric illustration: For $K = 1$ and $T = 24$, Table T5 gives

$d_{0.05}^L = 1.27 \quad \text{and} \quad d_{0.05}^H = 1.45$

Since $d = 0.76 < d_{0.05}^L$, reject the null hypothesis $H_0: \rho \leq 0$; the residuals appear to be positively correlated

• Disadvantages of the Durbin-Watson test:

  • no decision in some cases
  • lagged endogenous variables are not allowed
  • only applicable for AR(1)-processes

• Alternative tests for autocorrelation are available in many software packages

GLS and autocorrelation

• Regression model

$y = X\beta + u$

• Covariance matrix $V(u) = \sigma^2 \Omega$ with

$\Omega = \begin{bmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{bmatrix}$

• Transformation of the model using the matrix $P$ satisfying $P'P = \Omega^{-1}$

• One can verify that

$P = \begin{bmatrix} \sqrt{1 - \rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -\rho & 1 \end{bmatrix}$

• The GLS estimator is the same as in the case of heteroskedasticity

• GLS estimator

$\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y$

with covariance matrix

$V(\hat{\beta}_{GLS}) = \sigma^2 (X'\Omega^{-1}X)^{-1}$

• Estimator of the error term variance:

$\hat{\sigma}^2 = \frac{\hat{u}'\Omega^{-1}\hat{u}}{T - K - 1}$

• GLS is not feasible as $\rho$ (and hence $P$) is unknown

• Hildreth-Lu approach: Lay a fine grid over $[-1, 1]$ for $\rho$; choose the $\rho$ with the smallest value of $\hat{\sigma}^2$

• Cochrane-Orcutt procedure: Estimate $\rho$ from the OLS residuals, then apply FGLS with $\hat{\rho}$; afterwards iterate (a sketch follows below)
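A sketch of the Cochrane-Orcutt iteration on simulated data (our own illustration; a fixed number of rounds is used instead of a convergence check for brevity):

# Cochrane-Orcutt: estimate rho from residuals, quasi-difference, iterate
set.seed(5)
T <- 100
x <- rnorm(T)
y <- 2 + 0.5 * x + as.vector(arima.sim(list(ar = 0.6), n = T))
b <- coef(lm(y ~ x))
for (i in 1:10) {
  uhat  <- y - b[1] - b[2] * x
  rho   <- sum(uhat[-1] * uhat[-T]) / sum(uhat[-T]^2)
  ystar <- y[-1] - rho * y[-T]                       # quasi-differenced data
  xstar <- x[-1] - rho * x[-T]
  fit   <- lm(ystar ~ xstar)
  b     <- c(coef(fit)[1] / (1 - rho), coef(fit)[2]) # alpha* = (1 - rho) alpha
}
c(alpha = unname(b[1]), beta = unname(b[2]), rho = rho)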

Heteroskedasticity and autocorrelation consistent covariance matrix estimation

• Newey, W.K. and West, K.D. (1987), A Simple Positive Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix, Econometrica, 55: 703-708.

• Econometric model $y = X\beta + u$

• Covariance matrix

$V(u) = W$

with arbitrary covariance matrix $W$

• OLS estimator

$\hat{\beta} = (X'X)^{-1}X'y$

• Covariance matrix of $\hat{\beta}$ (as before):

$V(\hat{\beta}) = (X'X)^{-1} X'WX (X'X)^{-1}$

• The matrix $W$ cannot be estimated consistently

• But $\frac{1}{T} X'WX$ can be estimated consistently

• Consistent estimation of $V(\hat{\beta})$ is based on

$\sum_{t=1}^{T} \hat{u}_t^2 x_t x_t' + \sum_{b=1}^{q} \left( 1 - \frac{b}{q+1} \right) \hat{A}_b$

where $q$ is the number of autocorrelations to be taken into account

• The matrices

$\hat{A}_b = \sum_{t=b+1}^{T} \left( x_t \hat{u}_t \hat{u}_{t-b} x_{t-b}' + x_{t-b} \hat{u}_{t-b} \hat{u}_t x_t' \right)$

are estimators of autocorrelation matrices

• The White estimator is a special case of $\hat{V}(\hat{\beta})$ (if $\hat{A}_b = 0$ for all $b$); a sketch follows below
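A sketch of the Newey-West weighting as a small function (our own loop implementation of the formula above; in practice one would use a package such as sandwich):

# Newey-West "meat": White term plus Bartlett-weighted autocorrelation matrices
nw_meat <- function(X, uhat, q) {
  S <- t(X) %*% (X * uhat^2)                        # sum of u_t^2 x_t x_t'
  for (b in 1:q) {
    Ab <- matrix(0, ncol(X), ncol(X))
    for (t in (b + 1):nrow(X)) {
      xt <- X[t, ]; xtb <- X[t - b, ]
      Ab <- Ab + uhat[t] * uhat[t - b] * (xt %*% t(xtb) + xtb %*% t(xt))
    }
    S <- S + (1 - b / (q + 1)) * Ab                 # Bartlett weight 1 - b/(q+1)
  }
  S
}
# usage: V_NW <- solve(t(X) %*% X) %*% nw_meat(X, uhat, q = 4) %*% solve(t(X) %*% X)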

Nonnormal error terms

• Assumption B4: The error terms are normally distributed

• This assumption is necessary

  • to derive the normality of $\hat{\beta}$
  • to derive the t-distribution of the t-statistic
  • to derive the F-distribution of the F-statistic

• Remember that $\hat{\beta}$ is a linear estimator:

$\hat{\beta} = (X'X)^{-1}X'y = Cy$

• For a single component of $\hat{\beta}$ we find

$\hat{\beta}_k = \sum_{t=1}^{T} c_{kt} y_t$

• The random variables $y_1, \ldots, y_T$ are stochastically independent

• Hence, $\hat{\beta}_k$ is the sum of independent (but not identically distributed) random variables

• Question: How is the sum of random variables distributed?

• Central limit theorem: The sum of many i.i.d. random variables is approximately normally distributed

• The central limit theorem also holds for nonidentical distributions: $\hat{\beta}_k$ is approximately normally distributed, even if the error terms are nonnormal

• Further: $\hat{\beta}$ is approximately multivariate normal,

$\hat{\beta} \overset{appr}{\sim} N\left( \beta, \sigma^2 (X'X)^{-1} \right)$

• Careful: There are some (weak) regularity conditions that must be satisfied; normality can break down (but usually does not)

Simulation [b4.R]:

• Gratuity example (from last semester):

$y_t = 0.5 + 0.1 \cdot x_t + u_t$

satisfying all A-, B-, C-assumptions apart from B4

• Distribution of the error terms: $f_{u_t}(u) = \exp(-(u + 1))$ for $u \geq -1$
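A sketch in the spirit of b4.R (our own reconstruction, not the original script): simulate the model many times with shifted-exponential errors and look at the sampling distribution of $\hat{\beta}$:

# Nonnormal errors: beta-hat is still approximately normal
set.seed(6)
T <- 100; R <- 5000
x <- runif(T, 0, 10)                     # regressors held fixed across replications
bhat <- replicate(R, {
  u <- rexp(T) - 1                       # density exp(-(u+1)) on [-1, Inf), E(u) = 0
  y <- 0.5 + 0.1 * x + u
  coef(lm(y ~ x))[2]
})
hist(bhat, breaks = 50, freq = FALSE)    # close to a normal density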

• Since

$\hat{\beta} \overset{appr}{\sim} N\left( \beta, \sigma^2 (X'X)^{-1} \right),$

we find for single components $\hat{\beta}_k$ of $\hat{\beta}$

$\frac{\hat{\beta}_k - \beta_k}{SE(\hat{\beta}_k)} \overset{d}{\longrightarrow} U \sim N(0, 1)$

for $k = 1, \ldots, K$

• Confidence intervals and t-tests are asymptotically valid (use quantiles of $N(0, 1)$ instead of the t-distribution)

• F-tests are asymptotically valid (convergence to the $\chi^2$-distribution)

Stochastic convergence and limit theorems

• Convergence of real sequences: Let $a_1, a_2, \ldots$ be a sequence of real numbers

• Definition: The sequence $\{a_n\}_{n \in \mathbb{N}}$ converges to its limit $a$ if for any (arbitrarily small) $\varepsilon > 0$ there is a number $N(\varepsilon)$ such that $|a_n - a| < \varepsilon$ for all $n \geq N(\varepsilon)$

• Notation: $\lim_{n \to \infty} a_n = a$ or $a_n \to a$

• Examples:

$\lim_{n \to \infty} 1/n = 0$

$\lim_{n \to \infty} \left[ (n^2 + n + 6) / (3n^2 - 2n + 2) \right] = 1/3$

• Graph of the convergent sequence $(n^2 + n + 6)/(3n^2 - 2n + 2)$ [figure not reproduced]

Questions:

• How can the idea of convergence be transferred to sequences of random variables?

• What is a sequence of random variables?

• What does convergence of sequences of random variables mean?

• Which sequences of random variables do we typically encounter in econometrics?

• Definition: Let $X_1, X_2, \ldots$ be random variables

$X_i : \Omega \to \mathbb{R}$

We call $X_1, X_2, \ldots$ a sequence of random variables

• Taken together, $X_1, X_2, \ldots$ form a (countably infinite-dimensional) multivariate random variable

• Formally, this is a sequence of functions (not of real numbers)

• Definition: The sequence $X_1, X_2, \ldots$ converges almost surely (fast sicher) to a random variable $X$ if

$P\left( \left\{ \omega : \lim_{n \to \infty} X_n(\omega) = X(\omega) \right\} \right) = 1$

• Notation:

$X_n \overset{f.s.}{\longrightarrow} X \quad \text{or} \quad X_n \overset{a.s.}{\longrightarrow} X$

• This kind of convergence is only of minor importance in econometrics

• Definition: The sequence $X_1, X_2, \ldots$ converges in probability (nach Wahrscheinlichkeit) to a random variable $X$ if

$\lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1$

• Notation:

$X_n \overset{p}{\longrightarrow} X \quad \text{or} \quad \operatorname{plim} X_n = X$

• This kind of convergence is very important in econometrics

• Special case: Convergence in probability to a constant

• The sequence $X_1, X_2, \ldots$ converges in probability to a constant $a$ if

$\lim_{n \to \infty} P(|X_n - a| < \varepsilon) = 1$

• Notation:

$X_n \overset{p}{\longrightarrow} a \quad \text{or} \quad \operatorname{plim} X_n = a$

• In econometrics we usually need this kind of convergence in probability (a small simulation follows below)
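A small simulation of this definition (our own toy example with exponential draws whose mean is 0.5): the probability that the sample mean lies within $\varepsilon$ of the limit approaches 1 as $n$ grows:

# P(|Xbar_n - mu| < eps) -> 1
set.seed(7)
eps <- 0.1
sapply(c(10, 100, 1000, 10000), function(n) {
  xbar <- replicate(2000, mean(rexp(n, rate = 2)))  # mu = 0.5
  mean(abs(xbar - 0.5) < eps)
})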

• Definition: The sequence $X_1, X_2, \ldots$ (with distribution functions $F_1, F_2, \ldots$) converges in distribution, or in law (nach Verteilung), to a random variable $X$ (with distribution function $F$) if

$\lim_{n \to \infty} F_n(x) = F(x)$

for all $x \in \mathbb{R}$ where $F(x)$ is continuous

• Notation:

$X_n \overset{d}{\longrightarrow} X$

• Relation between the types of convergence:

$X_n \overset{f.s.}{\longrightarrow} X \;\Rightarrow\; X_n \overset{p}{\longrightarrow} X \;\Rightarrow\; X_n \overset{d}{\longrightarrow} X$

• Limit theorems: laws of large numbers (LLN, Gesetze der großen Zahl); central limit theorems (CLT, zentrale Grenzwertsätze)

• Let $X_1, X_2, \ldots$ be a sequence of random variables

• Define a new sequence $\bar{X}_1, \bar{X}_2, \ldots$ where

$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$

• Define another new sequence $Z_1, Z_2, \ldots$ where

$Z_n = \frac{S_n - E(S_n)}{\sqrt{Var(S_n)}} \quad \text{with} \quad S_n = \sum_{i=1}^{n} X_i$

Strong law of large numbers (SLLN)

• Let $X_1, X_2, \ldots$ be a sequence of independent random variables with $\mu_i = E(X_i) < \infty$ and $Var(X_i) < \infty$ for $i = 1, 2, \ldots$

• If $\sum_{k=1}^{\infty} Var(X_k)/k^2 < \infty$, then

$P\left( \lim_{n \to \infty} \left( \bar{X}_n - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right) = 0 \right) = 1$

• Special case: for iid sequences, $\bar{X}_n \overset{f.s.}{\longrightarrow} \mu$

Weak law of large numbers (Chebyshev, WLLN)

• Let $X_1, X_2, \ldots$ be a sequence of independent random variables with $\mu_i = E(X_i) < \infty$ and $Var(X_i) < c < \infty$

• Then

$\lim_{n \to \infty} P\left( \left| \bar{X}_n - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right| < \varepsilon \right) = 1$

• Special case: for iid sequences, $\operatorname{plim} \bar{X}_n = \mu$

Weak law of large numbers (Khinchin)

• Let $X_1, X_2, \ldots$ be a sequence of iid random variables with $E(X_i) = \mu$

• Then

$\lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| < \varepsilon \right) = 1$

• There are also laws of large numbers for stochastic processes, e.g. for martingale difference sequences

• The weak laws of large numbers can easily be generalized to the multivariate case, e.g. Khinchin:

• Let $X_1, X_2, \ldots$ be a sequence of iid random vectors with $E(X_i) = \mu$

• For each component $k = 1, \ldots, K$:

$\lim_{n \to \infty} P\left( \left| \bar{X}_{nk} - \mu_k \right| < \varepsilon \right) = 1$

• Notation:

$\operatorname{plim} \bar{X}_n = \mu$

Central limit theorem

• Let $X_1, X_2, \ldots$ be a sequence of random variables

• Consider the sequence of standardized cumulative sums

$Z_n = \frac{S_n - E(S_n)}{\sqrt{Var(S_n)}} \quad \text{with} \quad S_n = \sum_{i=1}^{n} X_i$

• How is $Z_n$ distributed for $n \to \infty$?

• Impose only a few assumptions about the distribution of the $X_i$

Central limit theorem (Lindeberg-Levy)

• Let $X_1, X_2, \ldots$ be a sequence of iid random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$

• Let $F_n(z) = P(Z_n \leq z)$ denote the distribution function of $Z_n$

• Then

$\lim_{n \to \infty} F_n(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} u^2 \right) du$

• Convergence in distribution: $Z_n \overset{d}{\longrightarrow} Z \sim N(0, 1)$

Central limit theorem (Liapunov)

• Let $X_1, X_2, \ldots$ be a sequence of independent random variables with $E(X_i) = \mu_i$, $Var(X_i) = \sigma_i^2 < \infty$, and $E(|X_i|^{2+\delta}) < \infty$ for (arbitrarily small) $\delta > 0$

• Define $c_n = \sqrt{\sum_{i=1}^{n} \sigma_i^2}$

• If

$\lim_{n \to \infty} \left( \frac{1}{c_n^{2+\delta}} \sum_{i=1}^{n} E\left( |X_i - \mu_i|^{2+\delta} \right) \right) = 0,$

then $Z_n \overset{d}{\longrightarrow} Z \sim N(0, 1)$

• The heart of the central limit theorem: no single random variable may dominate the sum

• Each $(X_i - \mu_i)/c_n$ is only a negligibly small contribution to the sum $(S_n - E(S_n))/c_n$

• Frequent notation (in the iid case):

$S_n \overset{appr}{\sim} N(n\mu, n\sigma^2)$

$\bar{X}_n \overset{appr}{\sim} N(\mu, \sigma^2/n)$

• We can deal with the sum as if it were normally distributed (if $n$ is large enough)

• The central limit theorem also applies to empirical moments!

• Let $\mu_k = E(X^k)$ denote the k-th (theoretical) moment of $X$

• The k-th empirical moment

$m_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$

is an estimator for $\mu_k$

• According to the CLT, $m_k$ is asymptotically normal if the variance of $X^k$ exists (i.e., the 2k-th moment $\mu_{2k}$)

• The central limit theorem can easily be generalized to the multivariate case, e.g. Lindeberg-Levy:

• Let $X_1, X_2, \ldots$ be a sequence of iid random vectors with $E(X_i) = \mu$ and $Cov(X_i) = \Sigma$

• Then

$\sqrt{n} \left( \bar{X}_n - \mu \right) \overset{d}{\longrightarrow} Z \sim N(0, \Sigma)$

• Remark: In the univariate case we can also write $\sqrt{n}(\bar{X}_n - \mu) \overset{d}{\longrightarrow} Z \sim N(0, \sigma^2)$

Further central limit theorems

• The assumptions about the sequence $X_1, X_2, \ldots$ can be weakened

• Central limit theorems for stochastic processes

• Central limit theorems for products of random variables

• Central limit theorems for maxima (extreme value theory)

Useful rules of calculus

• If $\operatorname{plim} X_n = a$ and $\operatorname{plim} Y_n = b$, then

$\operatorname{plim}(X_n \pm Y_n) = a \pm b$

$\operatorname{plim}(X_n Y_n) = ab$

$\operatorname{plim}\left( \frac{X_n}{Y_n} \right) = \frac{a}{b} \quad \text{if } b \neq 0$

• If a function $g$ is continuous at $a$, then

$\operatorname{plim}\, g(X_n) = g(a)$

• If $Y_n \overset{d}{\longrightarrow} Z$ and $h$ is a continuous function, then

$h(Y_n) \overset{d}{\longrightarrow} h(Z)$

• Cramér's theorem: If $X_n \overset{p}{\longrightarrow} a$ and $Y_n \overset{d}{\longrightarrow} Z$, then

$X_n + Y_n \overset{d}{\longrightarrow} a + Z$

$X_n Y_n \overset{d}{\longrightarrow} aZ$

• Cramér's theorem is very useful if there are unknown parameters in the asymptotic distribution that can be estimated consistently (more on consistency later)

Example for Cramér's theorem:

• Let $X_1, \ldots, X_n$ be a random sample from $X$; we know that

$S_n^{*2} = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 \overset{p}{\longrightarrow} \sigma^2$

$S_n^2 = \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 \overset{p}{\longrightarrow} \sigma^2$

• Hence

$\frac{\sigma}{S_n^*} \overset{p}{\longrightarrow} 1 \quad \text{and} \quad \frac{\sigma}{S_n} \overset{p}{\longrightarrow} 1$

• According to the central limit theorem

$\sqrt{n}\, \frac{\bar{X}_n - \mu}{\sigma} \overset{d}{\longrightarrow} Z \sim N(0, 1)$

• Due to

$\sqrt{n}\, \frac{\bar{X}_n - \mu}{S_n} = \sqrt{n}\, \frac{\bar{X}_n - \mu}{\sigma} \cdot \frac{\sigma}{S_n}$

and $\sigma/S_n \overset{p}{\longrightarrow} 1$ we have

$\sqrt{n}\, \frac{\bar{X}_n - \mu}{S_n} \overset{d}{\longrightarrow} Z \cdot 1 = Z \sim N(0, 1)$

• Similarly for $\sqrt{n}(\bar{X}_n - \mu)/S_n^*$

• Multivariate version: According to the central limit theorem

$\sqrt{n} \left( \bar{X}_n - \mu \right) \overset{d}{\longrightarrow} Z \sim N(0, \Sigma)$

• Due to

$\hat{\Sigma}_n = \frac{1}{n} \sum \left( X_i - \bar{X}_n \right) \left( X_i - \bar{X}_n \right)' \overset{p}{\longrightarrow} \Sigma$

we can use the following approximation for large $n$:

$\bar{X}_n \overset{appr}{\sim} N\left( \mu, \hat{\Sigma}_n / n \right)$

(Careful: the notation is loose, but it helps the intuition)

Stochastic exogenous variables

• Assumption C1: The matrix $X$ is non-stochastic

• What happens if $X$ is (at least partially) stochastic?

• We distinguish three cases:

1. $X$ and $u$ are stochastically independent

2. Contemporaneous uncorrelatedness: $Cov(x_{kt}, u_t) = 0$ for all $t, k$

3. $X$ and $u$ are contemporaneously correlated

Conditional expectation

• Let $(X, Y)$ be jointly continuous with density function $f_{X,Y}(x, y)$

• Marginal distributions (marginal densities):

$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$

$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$

• Conditional density of $X$ given $Y = y$:

$f_{X|Y=y}(x) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$

• Conditional expectation (bedingter Erwartungswert) of $X$ given $Y = y$:

$E(X|Y = y) = \int_{-\infty}^{\infty} x f_{X|Y=y}(x)\, dx$

• Conditional expectation (bedingte Erwartung) of $X$ given $Y$:

$E(X|Y)$

is a random variable realizing as $E(X|Y = y)$ if $Y = y$

• The conditional expectation $E(X|Y = y)$ is a real number (for given $y$)

• The conditional expectation $E(X|Y)$ is a random variable

Useful rules for conditional expectations

1. Law of iterated expectations: $E(E(X|Y)) = E(X)$ (a small simulation follows below)

2. Independence: If $X$ and $Y$ are independent, then $E(X|Y) = E(X)$

3. Linearity: For $a_1, a_2 \in \mathbb{R}$,

$E(a_1 X_1 + a_2 X_2 | Y) = a_1 E(X_1|Y) + a_2 E(X_2|Y)$

4. The conditioning random variable can be treated like a constant:

$E(f(X) g(Y) | Y) = g(Y) E(f(X)|Y)$
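Rule 1 is easy to check by simulation (a toy joint-normal design of our own choosing, where $E(X|Y) = 2Y$):

# Law of iterated expectations: E(E(X|Y)) = E(X)
set.seed(12)
n <- 1e6
y <- rnorm(n)
x <- 2 * y + rnorm(n)                 # here E(X|Y) = 2Y
c(EX = mean(x), EEXY = mean(2 * y))   # both close to 0, up to simulation noise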

Stochastic exogenous variables, case 1

• Model $y = X\beta + u$ with $X$ and $u$ stochastically independent

• The estimators $\hat{\beta}$ and $\hat{\sigma}^2$ are unbiased and consistent

• (Estimated) covariance matrix of $\hat{\beta}$

• Asymptotic normality:

$\sqrt{T}(\hat{\beta} - \beta) \overset{d}{\longrightarrow} N(0, \sigma_u^2 Q_{XX}^{-1})$

• Conclusion: If $X$ and $u$ are independent, there are no problems

Stochastic exogenous variables, case 2

• The error term and the exogenous variables are contemporaneously uncorrelated (but may be correlated over time)

• Typical case: lagged endogenous variables on the right-hand side

• Unbiasedness is lost

• Consistency and asymptotic normality still hold

• Conclusion: If there is contemporaneous uncorrelatedness, there are hardly any problems if the sample is large enough

Stochastic exogenous variables, case 3

• Contemporaneous correlation between error terms and exogenous variables

• Example: [illustration not reproduced]

Why might there be contemporaneous correlation?

• Errors-in-variables:

Model: $y_t = \alpha + \beta x_t^* + e_t$
Measurement: $x_t = x_t^* + v_t$

• Simultaneous equation systems:

$c_t = \alpha + \beta y_t + u_t$
$y_t = c_t + i_t$

Instrumental variables (IV estimation)

• Model

$y = X\beta + u$

with contemporaneous correlation between $X$ and $u$

• Instrumental variables: contemporaneously uncorrelated with $u$, but correlated with $X$

• Let $Z$ denote the $(T \times (L+1))$-matrix of instruments, and

$P = Z(Z'Z)^{-1}Z'$

• The matrix $P$ is symmetric and idempotent, $P'P = P$

• Number of columns: $L \geq K$ (often $L = K$)

• Transformed model:

$Py = PX\beta + Pu$

• The least squares estimator of the transformed model is called the IV estimator:

$\hat{\beta}_{IV} = (X'P'PX)^{-1} X'P'Py = (X'PX)^{-1} X'Py$

• If $L = K$, then

$\hat{\beta}_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y = (Z'X)^{-1}(Z'Z)(X'Z)^{-1} X'Z(Z'Z)^{-1}Z'y = (Z'X)^{-1}Z'y$

• Simple linear regression ($L = K = 1$):

$\hat{\beta}_{IV} = \frac{\sum (z_t - \bar{z})(y_t - \bar{y})}{\sum (z_t - \bar{z})(x_t - \bar{x})}$

(a sketch follows below)
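A sketch of the $L = K = 1$ case for the errors-in-variables example from slide 102 (simulated data, invented parameter values; it illustrates that IV removes the inconsistency OLS suffers from):

# Simple IV estimator vs OLS under measurement error
set.seed(8)
T  <- 500
xs <- rnorm(T)                        # true regressor x*
z  <- xs + rnorm(T)                   # instrument: correlated with x*, not with the error
x  <- xs + rnorm(T)                   # observed regressor, measured with error
y  <- 1 + 0.5 * xs + rnorm(T)
c(OLS = cov(x, y) / var(x),           # biased toward zero
  IV  = cov(z, y) / cov(z, x))        # consistent for beta = 0.5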

Assumptions about Z

• Existing limit:

$\operatorname{plim} \frac{Z'Z}{T} = \lim_{T \to \infty} E\left( \frac{Z'Z}{T} \right) = Q_{ZZ}$

with $Q_{ZZ}$ positive definite

• Asymptotic correlation with the exogenous variables:

$\operatorname{plim} \frac{Z'X}{T} = Q_{ZX}, \qquad \operatorname{rank}(Q_{ZX}) = K + 1$

• Asymptotic uncorrelatedness with the error terms:

$\operatorname{plim} \frac{Z'u}{T} = \lim_{T \to \infty} E\left( \frac{Z'u}{T} \right) = 0$

• IV estimators are consistent but not unbiased

• Hausman test (Hausman-Wu test): Hypotheses

$H_0: \operatorname{plim} \frac{X'u}{T} = 0$
$H_1: \operatorname{plim} \frac{X'u}{T} \neq 0$

• Test idea: Under $H_0$ both OLS and IV are consistent; under $H_1$ only IV is consistent

• If $\hat{\beta}_{IV}$ deviates "too much" from $\hat{\beta}$, reject $H_0$

• Test statistic:

$\left( \hat{\beta}_{IV} - \hat{\beta} \right)' \left[ \hat{V}(\hat{\beta}_{IV}) - \hat{V}(\hat{\beta}) \right]^{-1} \left( \hat{\beta}_{IV} - \hat{\beta} \right)$

• Asymptotic distribution under $H_0$ is $\chi^2_{K^*}$, where $K^*$ is the number of columns in $Z$ that are not included in $X$

Multicollinearity

• Perfect vs imperfect multicollinearity

• Graphical illustration [figure not reproduced]

Dynamic models

• Stochastic process: $x_1, \ldots, x_T$

• Moment functions: $E(x_t)$, $Var(x_t)$, $Cov(x_t, x_{t+\tau})$

• (Weak) stationarity:

$E(x_t) = \mu$
$Var(x_t) = \sigma_x^2$
$Cov(x_t, x_{t+\tau}) = \gamma_\tau$

• Order of integration of a process, I(d)

• Simplest dynamic model: lagged exogenous variables

$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \ldots + \beta_K x_{t-K} + v_t$

• Interpretation of the parameters (short-term and long-term multiplier)

• Problems:

  • many parameters
  • multicollinearity
  • no precise estimation of individual components $\beta_k$

• Note: The variance of the long-term multiplier may be small even if all components $\hat{\beta}_k$ have a large variance

• Functional forms for $\beta_0, \beta_1, \ldots, \beta_K$:

  • polynomial lags (Almon lags)
  • geometric lags (Koyck lags)

Polynomial lags

• The $\beta_k$ are a polynomial function of $k$

• Example: Quadratic function:

$\beta_k = \varphi_0 + \varphi_1 k + \varphi_2 k^2 \quad \text{for } k = 0, \ldots, K$

• There are fewer than $K$ parameters, since

$y_t = \alpha + \sum_{k=0}^{K} \beta_k x_{t-k} + v_t = \alpha + \sum_{k=0}^{K} \left( \varphi_0 + \varphi_1 k + \varphi_2 k^2 \right) x_{t-k} + v_t$

$= \alpha + \varphi_0 \sum_{k=0}^{K} x_{t-k} + \varphi_1 \sum_{k=0}^{K} k x_{t-k} + \varphi_2 \sum_{k=0}^{K} k^2 x_{t-k} + v_t$

$= \alpha + \varphi_0 x_{1t}^* + \varphi_1 x_{2t}^* + \varphi_2 x_{3t}^* + v_t$

• The validity of the linear restrictions can be tested (a sketch of the construction follows below)
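A sketch of how the three constructed regressors can be built and the restricted model estimated (simulated data with K = 4 and true lag weights linear in k, so that the quadratic term should come out near zero; all names invented):

# Almon lags: build x*_1, x*_2, x*_3 and estimate phi_0, phi_1, phi_2
set.seed(11)
Tn <- 120; K <- 4
x <- rnorm(Tn)
lagm <- sapply(0:K, function(k) c(rep(NA, k), head(x, Tn - k)))  # columns x_t, ..., x_{t-K}
y <- 1 + drop(lagm %*% (0.6 - 0.1 * (0:K))) + rnorm(Tn)          # beta_k = 0.6 - 0.1 k
x1s <- rowSums(lagm)                    # sum_k x_{t-k}
x2s <- drop(lagm %*% (0:K))             # sum_k k x_{t-k}
x3s <- drop(lagm %*% (0:K)^2)           # sum_k k^2 x_{t-k}
coef(lm(y ~ x1s + x2s + x3s))           # approx. (1, 0.6, -0.1, 0)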

Geometric lags

• The $\beta_k$ depend on $k$ as follows:

$\beta_k = \beta_0 \lambda^k \quad \text{where } 0 < \lambda < 1$

• It is possible to set $K = \infty$:

$y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \ldots + v_t = \alpha + \beta_0 x_t + \beta_0 \lambda x_{t-1} + \beta_0 \lambda^2 x_{t-2} + \ldots + v_t$

• Short-term multiplier: $\beta_0$

• Long-term multiplier:

$\sum_{k=0}^{\infty} \beta_k = \beta_0 \sum_{k=0}^{\infty} \lambda^k = \beta_0 \frac{1}{1 - \lambda}$

• Koyck transformation:

$y_t = \alpha + \beta_0 x_t + \beta_0 \lambda x_{t-1} + \beta_0 \lambda^2 x_{t-2} + \ldots + v_t$

minus

$\lambda y_{t-1} = \lambda \alpha + \beta_0 \lambda x_{t-1} + \beta_0 \lambda^2 x_{t-2} + \ldots + \lambda v_{t-1}$

yields

$y_t - \lambda y_{t-1} = (\alpha - \lambda\alpha) + \beta_0 x_t + (v_t - \lambda v_{t-1})$

$y_t = \alpha_0 + \beta_0 x_t + \lambda y_{t-1} + u_t$

• Estimation is problematic since B3 and C1 are violated

• Models with rational lag distribution:

$y_t = \alpha_0 + \beta_0 x_t + \beta_1 x_{t-1} + \ldots + \beta_K x_{t-K} + \lambda_1 y_{t-1} + \ldots + \lambda_M y_{t-M} + u_t$

• Special case $K = M = 1$:

$y_t = \alpha_0 + \beta_0 x_t + \beta_1 x_{t-1} + \lambda y_{t-1} + u_t$

• From

$y_t = \alpha_0 + \beta_0 x_t + \beta_1 x_{t-1} + \lambda y_{t-1} + u_t$

we find

$y_t - \lambda y_{t-1} = \alpha_0 + \beta_0 x_t + \beta_1 x_{t-1} + u_t$

• Long-term (undisturbed) equilibrium:

$y^* = \frac{\alpha_0}{1 - \lambda} + \frac{\beta_0 + \beta_1}{1 - \lambda} x^*$

• Error correction formulation:

$\Delta y_t = \beta_0 \Delta x_t - (1 - \lambda) e_{t-1} + u_t$

with error (disequilibrium) term

$e_{t-1} = y_{t-1} - \left( \frac{\alpha_0}{1 - \lambda} + \frac{\beta_0 + \beta_1}{1 - \lambda} x_{t-1} \right)$

• If $x_t$ and $y_t$ are both I(1), and if $e_{t-1}$ is I(0), then $x_t$ and $y_t$ are called cointegrated

Estimation of error correction models (ECM)

1. Determine the order of integration of $x_t$ and $y_t$

2. Estimate by OLS

$y_{t-1} = \frac{\alpha_0}{1 - \lambda} + \frac{\beta_0 + \beta_1}{1 - \lambda} x_{t-1} + e_{t-1}$

and calculate the residuals $\hat{e}_{t-1}$

3. Determine the order of integration of $\hat{e}_{t-1}$

4. If there is cointegration, estimate

$\Delta y_t = \beta_0 \Delta x_t - (1 - \lambda) \hat{e}_{t-1} + u_t$

(a sketch of the procedure follows below)
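A sketch of these four steps on simulated cointegrated series (our own example; the unit-root pretests in steps 1 and 3 are only indicated, since they need extra machinery such as an ADF test):

# Two-step estimation of the error correction model
set.seed(9)
T <- 200
x <- cumsum(rnorm(T))                  # x_t is I(1)
y <- 2 + 0.8 * x + rnorm(T)            # cointegrated with x_t
longrun <- lm(y ~ x)                   # step 2: cointegrating regression
ehat <- resid(longrun)                 # step 3: check that ehat is I(0)
ecm <- lm(diff(y) ~ diff(x) + ehat[-T])    # step 4: error correction regression
coef(ecm)                              # slope on ehat[-T] estimates -(1 - lambda)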

Interdependent equation systems

• Illustration by a simple example

• Pharmaceutical company: advertising expenditures $w_t$, quantity sold $a_t$, price $p_t$, advertising price (per page) $q_t$

• Model equations:

$a_t = \alpha + \beta_1 w_t + \beta_2 p_t + u_t$
$w_t = \gamma + \delta_1 a_t + \delta_2 q_t + v_t$

• The error terms satisfy all B-assumptions; further we assume $Cov(u_t, v_t) = \sigma_{uv}$ and $Cov(u_s, v_t) = 0$ for $s \neq t$

• In the first equation, $u_t$ and $w_t$ are correlated!

• Hence the OLS estimators are inconsistent

• Structural form vs reduced form

• From the structural form

$a_t = \alpha + \beta_1 w_t + \beta_2 p_t + u_t$
$w_t = \gamma + \delta_1 a_t + \delta_2 q_t + v_t$

we derive the reduced form

$a_t = \pi_1 + \pi_2 p_t + \pi_3 q_t + u_t^*$
$w_t = \pi_4 + \pi_5 p_t + \pi_6 q_t + v_t^*$

• Reduced form: all endogenous variables are on the left-hand side, all exogenous variables are on the right-hand side

• The equations of the reduced form can be estimated by the OLS method

• From the estimated values $\hat{\pi}_1, \ldots, \hat{\pi}_6$ one obtains the estimates $\hat{\alpha}, \hat{\beta}_1, \hat{\beta}_2, \hat{\gamma}, \hat{\delta}_1, \hat{\delta}_2$

• The estimators $\hat{\alpha}, \hat{\beta}_1, \hat{\beta}_2, \hat{\gamma}, \hat{\delta}_1, \hat{\delta}_2$ are consistent

• It is not always possible to derive the structural parameters from the reduced form parameters (identification problem)

• From the structural form

$a_t = \alpha + \beta_1 w_t + \beta_2 p_t + u_t$
$w_t = \gamma + \delta_1 a_t + v_t$

one obtains the reduced form

$a_t = \pi_1 + \pi_2 p_t + u_t^*$
$w_t = \pi_3 + \pi_4 p_t + v_t^*$

• Five structural parameters but only four reduced form parameters

• Sometimes there are more reduced form parameters than structural parameters

• Counting condition (order condition for identification):

$\dot{K}$ = number of exogenous variables in the general model

$K^*$ = number of exogenous variables in the considered equation

$M^*$ = number of endogenous variables in the considered equation

• An equation is

  underidentified if $M^* - 1 > \dot{K} - K^*$
  exactly identified if $M^* - 1 = \dot{K} - K^*$
  overidentified if $M^* - 1 < \dot{K} - K^*$

• $M^* - 1$ is the number of explanatory endogenous variables (on the right-hand side); $\dot{K} - K^*$ is the number of exogenous variables in the other equations

• Estimation of an exactly identified or overidentified equation

• Two-stage least squares method (2SLS)

• Idea: obtain instrumental variables from the reduced form

• Example of the 2SLS method: In the system

$a_t = \alpha + \beta_1 w_t + \beta_2 p_t + u_t$
$w_t = \gamma + \delta_1 a_t + \delta_2 q_t + v_t$

the second equation has to be estimated

• First step: estimate by OLS

$a_t = \pi_1 + \pi_2 p_t + \pi_3 q_t + u_t^*$

and obtain $\hat{a}_t = \hat{\pi}_1 + \hat{\pi}_2 p_t + \hat{\pi}_3 q_t$

• Second step: estimate by OLS

$w_t = \gamma + \delta_1 \hat{a}_t + \delta_2 q_t + v_t$

• The 2SLS estimators are consistent (IV estimators)

• The standard errors have to be adjusted

• The properties of the estimators in finite samples are complicated (a sketch follows below)
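A sketch of the two steps on a simulated version of this system (the structural parameter values are invented, and the standard errors reported by the second-stage lm() would still need the adjustment mentioned above):

# 2SLS by hand for the advertising equation
set.seed(10)
T <- 300
p <- rnorm(T); q <- rnorm(T)                       # exogenous variables
u <- rnorm(T); v <- rnorm(T)
# structural system solved for the endogenous a_t, w_t, with
# (alpha, beta1, beta2) = (2, 0.5, -0.4) and (gamma, delta1, delta2) = (1, 0.2, 0.3)
a <- (2 + 0.5 * (1 + 0.3 * q + v) - 0.4 * p + u) / (1 - 0.5 * 0.2)
w <- 1 + 0.2 * a + 0.3 * q + v
ahat <- fitted(lm(a ~ p + q))                      # first step: reduced form for a_t
coef(lm(w ~ ahat + q))                             # second step: approx. (1, 0.2, 0.3)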

Interdependent equation systems in matrix notation

• General representation

• Let $M$ be the number of equations in the system

• The endogenous variables are collected in a $(T \times M)$-matrix

$Y = [y_1 \; y_2 \; \ldots \; y_M]$

• The exogenous variables (and the intercept) are collected in a $(T \times \dot{K})$-matrix

$X = [x_0 \; x_1 \; \ldots \; x_K]$

• The m-th equation is

$y_m = \beta_m x_0 + \beta_{1m} x_1 + \beta_{2m} x_2 + \ldots + \beta_{Km} x_K + \gamma_{1m} y_1 + \ldots + \gamma_{m-1,m} y_{m-1} + \gamma_{m+1,m} y_{m+1} + \ldots + \gamma_{Mm} y_M + u_m$

• Setting $\gamma_{mm} = -1$ yields

$\gamma_{1m} y_1 + \ldots + \gamma_{Mm} y_M + \beta_m x_0 + \beta_{1m} x_1 + \ldots + \beta_{Km} x_K + u_m = 0$

• Stack the coefficients in vectors:

$\gamma_m = (\gamma_{1m}, \gamma_{2m}, \ldots, \gamma_{Mm})'$

$\beta_m = (\beta_m, \beta_{1m}, \beta_{2m}, \ldots, \beta_{Km})'$

• Compact notation of the complete system:

$Y\gamma_1 + X\beta_1 + u_1 = 0$
$Y\gamma_2 + X\beta_2 + u_2 = 0$
$\vdots$
$Y\gamma_M + X\beta_M + u_M = 0$

and accordingly

$Y\Gamma + XB + U = 0$

with the matrices

$\Gamma = [\gamma_1 \; \ldots \; \gamma_M], \quad B = [\beta_1 \; \ldots \; \beta_M], \quad U = [u_1 \; \ldots \; u_M]$

• The noise terms $u_m$, $m = 1, \ldots, M$, satisfy all B-assumptions

• Dependencies between noise terms of different equations are permitted

• Assumption:

$E(u_m u_m') = \sigma_m^2 I_T \quad \text{for } m = 1, \ldots, M$

$E(u_m u_n') = \sigma_{mn} I_T \quad \text{for } m \neq n$

• How could one write these assumptions in a compact notation for the matrix $U$?

• Reduced form (all endogenous variables on the left side and all exogenous ones on the right side)

• From

$Y\Gamma + XB + U = 0$

it follows that

$Y\Gamma\Gamma^{-1} + XB\Gamma^{-1} + U\Gamma^{-1} = 0$

and accordingly

$Y = X\Pi + V$

with $\Pi = -B\Gamma^{-1}$ and $V = -U\Gamma^{-1}$

• The structural coefficients in $\Gamma$ and $B$ are identifiable only when their values can be uniquely deduced from $\Pi$

• Number of coefficients:

$\Pi$: $\dot{K}M$
$\Gamma$: $M^2 - M$
$B$: $\dot{K}M$

• So one needs (at least) $M^2 - M$ appropriate restrictions on $\Gamma$ and/or $B$

• In what follows, zero restrictions are assumed

Estimation of interdependent equation systems

• Reduced form:

$y_1 = X\pi_1 + v_1$
$\vdots$
$y_M = X\pi_M + v_M$

• OLS estimation of one equation:

$\hat{\pi}_m = (X'X)^{-1} X' y_m$

• OLS estimation of all equations:

$\hat{\Pi} = (X'X)^{-1} X' Y$

• ILS method: if equation $m$ is exactly identified, one can derive the estimators of the structural coefficients from the matrix $\hat{\Pi}$

• If equation $m$ is exactly identified or overidentified, one uses the 2SLS method

• Re-sort and partition the matrices:

$\left[ y_m \; \bar{Y}_m \; \bar{\bar{Y}}_m \right] \begin{bmatrix} -1 \\ \bar{\gamma}_m \\ 0 \end{bmatrix} + X\beta_m + u_m = 0$

• $\bar{Y}_m$: endogenous variables included in equation $m$; $\bar{\bar{Y}}_m$: excluded endogenous variables

• Equation $m$ can be rewritten as

$y_m = \bar{Y}_m \bar{\gamma}_m + X\beta_m + u_m = \left[ \bar{Y}_m \; X \right] \begin{bmatrix} \bar{\gamma}_m \\ \beta_m \end{bmatrix} + u_m$

• First step: estimate

$\hat{\Pi} = (X'X)^{-1} X' Y$

and partition accordingly:

$\left[ \hat{\pi}_m \; \hat{\bar{\Pi}}_m \; \hat{\bar{\bar{\Pi}}}_m \right] = (X'X)^{-1} X' \left[ y_m \; \bar{Y}_m \; \bar{\bar{Y}}_m \right]$

• The systematic part of the included endogenous variables is estimated by

$\hat{\bar{Y}}_m = X \hat{\bar{\Pi}}_m$

• Second step: substitute $\bar{Y}_m$ with $\hat{\bar{Y}}_m$ in

$y_m = \left[ \bar{Y}_m \; X \right] \begin{bmatrix} \bar{\gamma}_m \\ \beta_m \end{bmatrix} + u_m$

• 2SLS estimator:

$\begin{bmatrix} \hat{\bar{\gamma}}_m^{2SLS} \\ \hat{\beta}_m^{2SLS} \end{bmatrix} = \left[ \left( \hat{\bar{Y}}_m \; X \right)' \left( \hat{\bar{Y}}_m \; X \right) \right]^{-1} \left( \hat{\bar{Y}}_m \; X \right)' y_m$

• The covariance matrix of the estimated vector

$\begin{bmatrix} \hat{\bar{\gamma}}_m^{2SLS} \\ \hat{\beta}_m^{2SLS} \end{bmatrix}$

is

$\hat{\sigma}^2 \left[ \left( \hat{\bar{Y}}_m \; X \right)' \left( \hat{\bar{Y}}_m \; X \right) \right]^{-1}$

with

$\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \left( y_m - \left[ \bar{Y}_m \; X \right] \begin{bmatrix} \hat{\bar{\gamma}}_m^{2SLS} \\ \hat{\beta}_m^{2SLS} \end{bmatrix} \right)_t^2$

and NOT

$\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \left( y_m - \left[ \hat{\bar{Y}}_m \; X \right] \begin{bmatrix} \hat{\bar{\gamma}}_m^{2SLS} \\ \hat{\beta}_m^{2SLS} \end{bmatrix} \right)_t^2$