1 Outline. - University of Washingtonfaculty.washington.edu/bajari/metricsf06/slides16.pdf · 2009. 7. 17. · large ﬁrms (such as Bechtel) who in engage in extremely large complicated

1 Outline.

1. Motivation.

2. Variance of ols

3. Estimation and testing.

4. Asymptotic theory of ols with conditional heteroskedasticity.

2 Motivation.

• In this section, we want to drop the assumption that the error term is iid.

• This assumption may not make sense in many contexts.

• For example, suppose that we were estimating a production function by pooling data on outputs and inputs for i = 1, ..., N firms in an industry:

$$q_i = \beta_0 + \beta_1 k_i + \beta_2 l_i + \omega_i$$

• Here $q_i$ is output (often measured in value added), $k_i$ is capital, $l_i$ is labor, and $\omega_i$ is a productivity shock.

• In many industries, the size distribution of firms is highly skewed.

• Ex. construction: there are a large number of small firms with no payroll and a handful of extremely large firms (such as Bechtel) that engage in extremely large, complicated projects.

• It would be silly to assume that the variance of Bechtel's productivity shock is the same as that of a contractor who does small remodeling jobs.

• Remark: you might ask what framework for measurement justifies pooling these very distinct firms into the same regression!

• Thus, it is important to consider the case that $\mathrm{Var}(y|X) = \Omega_0 \neq \sigma_0^2 I$.

• The following properties of our ols estimator remain unchanged: unbiasedness, consistency, normality, asymptotic normality.

• None of these properties relied on the assumption of a scalar variance covariance matrix.

• The following properties are changed: the variance matrix, estimation of the variance matrix, the distribution of pivotal statistics, and the relative efficiency of ols.

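• In this chapter, we shall assume that

$$\mathrm{Var}(y|X) = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & 0 & \cdots & 0 \\ 0 & 0 & \sigma_3^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_N^2 \end{pmatrix}$$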

• This is called conditional heteroskedasticity.

• In Chapter 19, we consider serial correlation of elements.

• As a practical matter, it is common practice to adjust standard errors in regression results (e.g. calculate robust standard errors).

3 Variance of Least Squares.

• Consider the variance of our ols estimator conditional on X:

$$\mathrm{Var}(\hat\beta|X) = \mathrm{Var}\left((X'X)^{-1}X'y \,|\, X\right) = (X'X)^{-1}X'\,\mathrm{Var}(y|X)\,X(X'X)^{-1} = (X'X)^{-1}X'\Omega_0 X(X'X)^{-1}$$

• Obviously, we should expect $s^2(X'X)^{-1}$ to be a biased estimator of this variance.

• Next note that:

$$\mathrm{Var}[y - \hat\mu|X] = \mathrm{Var}[(I - P_X)y|X] = (I - P_X)\,\mathrm{Var}[y|X]\,(I - P_X) = (I - P_X)\Omega_0(I - P_X)$$

• The above formula suggests that our distribution theory for $s^2$ falls apart.

• It is no longer going to be possible to show that $(y-\hat\mu)'(y-\hat\mu)/\sigma_0^2$ converges to a $\chi^2$ distribution.

• The test statistics previously derived will no longer hold because our variance matrix is no longer a scalar multiple of the identity.

• It is easy to show that ols is no longer going to be an efficient estimator.

• For simplicity, consider the case that the only regressor is a constant, $x_n = 1$ for all n.

• Also, let the errors be normally distributed and heteroskedastic as follows:

$$\mathrm{var}(y_n) = \begin{cases} \sigma_{01}^2 & \text{if } 1 \le n \le N_1 \\ \sigma_{02}^2 & \text{if } N_1 < n \le N \end{cases} \qquad \sigma_{01}^2 < \sigma_{02}^2$$

• Recall that the ols estimator is just the sample mean, so that:

$$\mathrm{Var}(\hat\beta|X) = \frac{\sigma_{01}^2 N_1 + \sigma_{02}^2 (N - N_1)}{N^2}$$

• Consider an alternative estimator where we just use the first $N_1$ observations, so that

$$\tilde\beta = \frac{1}{N_1}\sum_{n=1}^{N_1} y_n, \qquad \mathrm{var}(\tilde\beta|X) = \frac{\sigma_{01}^2}{N_1}$$

• Obviously, if $\sigma_{01}^2$ is small enough relative to $\sigma_{02}^2$, then $\mathrm{var}(\tilde\beta|X) < \mathrm{Var}(\hat\beta|X)$.

• This example demonstrates that with heteroskedasticity, ols is no longer efficient.

• Moreover, this example suggests that we may wish to overweight observations with a lower variance in order to construct more efficient estimators.
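As a rough numerical check of the example above, the sketch below evaluates the two variance formulas and compares them with a small simulation. The sample sizes and variances ($N = 100$, $N_1 = 50$, $\sigma_{01}^2 = 1$, $\sigma_{02}^2 = 9$) and the function names are hypothetical choices for illustration, not values from the lecture.

```python
import numpy as np

def var_full_mean(s1_sq, s2_sq, n1, n):
    """Variance of the sample mean over all n observations (the ols estimator)."""
    return (s1_sq * n1 + s2_sq * (n - n1)) / n**2

def var_subsample_mean(s1_sq, n1):
    """Variance of the mean of the first n1 (low-variance) observations only."""
    return s1_sq / n1

# Hypothetical values chosen for illustration.
N, N1 = 100, 50
s1_sq, s2_sq = 1.0, 9.0

v_ols = var_full_mean(s1_sq, s2_sq, N1, N)   # (50 + 450) / 10000 = 0.05
v_alt = var_subsample_mean(s1_sq, N1)        # 1 / 50 = 0.02

# Monte Carlo check: simulate many samples and compare empirical variances.
rng = np.random.default_rng(0)
sd = np.sqrt(np.r_[np.full(N1, s1_sq), np.full(N - N1, s2_sq)])
draws = rng.normal(loc=0.0, scale=sd, size=(20000, N))
sim_ols = draws.mean(axis=1).var()
sim_alt = draws[:, :N1].mean(axis=1).var()

print("formula:", v_ols, v_alt)
print("simulated:", sim_ols, sim_alt)
```

With these numbers, discarding the noisy half of the sample already beats the full-sample mean, which is the point of the example.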

3.1 Testing for Conditional Heteroskedasticity.

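• Following the previous example, you could partition the sample into two groups, $1 \le n \le N_1$ and $N_1 < n \le N$.

• We want to test the null hypothesis that $\sigma_{01}^2 = \sigma_{02}^2$.

• Define $s_1$ and $s_2$ as:

$$s_1 = \frac{(y_1 - \hat\mu_1)'(y_1 - \hat\mu_1)}{\sigma_{01}^2}, \qquad s_2 = \frac{(y_2 - \hat\mu_2)'(y_2 - \hat\mu_2)}{\sigma_{02}^2}$$

• Under the null hypothesis of homoskedasticity, and assuming normality, the unknown variances cancel in the ratio and, once each term is divided by its degrees of freedom,

$$\frac{s_1/(N_1 - 1)}{s_2/(N - N_1 - 1)} \sim F_{N_1-1,\,N-N_1-1}$$

• Note that many of the tests described in the chapter require the assumption of normality, which is not particularly attractive for applied work.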

• This idea can be generalized into the Goldfeld-Quandt F-Test.

• Suppose we are running a multiple regression, instead of just using an intercept as above.

• Suppose that one wishes to test the null hypothesis of homoskedasticity against the alternative that $\sigma_n^2$ increases with $z_n$.

• Reorder the observations from the highest $z_n$ to the lowest $z_n$.

• Choose a point $N_1$ and compute $s_1$ and $s_2$ as above, using the residuals from a separate regression on each group.

• We need to adjust the degrees of freedom for the K estimated coefficients, but it can be shown that

$$\frac{s_1/(N_1 - K)}{s_2/(N - N_1 - K)} \sim F_{N_1-K,\,N-N_1-K}$$
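The Goldfeld-Quandt steps above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the helper names `rss` and `goldfeld_quandt`, the simulated data, and the split point are hypothetical, and the statistic is formed as a ratio of residual sums of squares each divided by its degrees of freedom. The p-value would come from an F table or an F distribution routine, which is omitted here.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an ols fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def goldfeld_quandt(y, X, z, n1):
    """Return (F statistic, df1, df2) after sorting from highest z to lowest z."""
    order = np.argsort(-z)               # highest z first
    y, X = y[order], X[order]
    n, k = X.shape
    s1 = rss(y[:n1], X[:n1])             # high-z group
    s2 = rss(y[n1:], X[n1:])             # low-z group
    df1, df2 = n1 - k, n - n1 - k
    return (s1 / df1) / (s2 / df2), df1, df2

# Hypothetical data: the error standard deviation is proportional to z.
rng = np.random.default_rng(0)
N = 200
x = rng.normal(size=N)
z = rng.uniform(1.0, 10.0, size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + z * rng.normal(size=N)

stat, df1, df2 = goldfeld_quandt(y, X, z, n1=N // 2)
print(stat, df1, df2)
```

A statistic well above 1 points toward a larger variance in the high-$z_n$ group; under the null and normality it would be compared against the upper tail of $F_{N_1-K,\,N-N_1-K}$.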

• The text also discusses the Breusch-Pagan Score Test.

• Assume

$$y_n|(x_n, z_n) \sim N\left(x_n'\beta_0,\ \gamma_{01} + z_{2n}'\gamma_{02}\right)$$

• Test the null hypothesis that $\gamma_{02} = 0$.

• The formal test statistic is stated in the text, but let's talk about an "intuitive" derivation.

• Let $w_n(\hat\beta) = (y_n - \hat\mu_{OLS,n})^2$.

• Loosely speaking, the squared fitted residuals are estimators of $\sigma_{0n}^2$, n = 1, ..., N.

• Suppose that we were at the "limit" and that $\beta_0$ were effectively observable.

• Then $(y_n - x_n'\beta_0)^2$ would be distributed as $\sigma_{0n}^2\chi_1^2$, which has a mean of $\sigma_{0n}^2$.

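• Then, under our modeling assumptions:

$$E(w(\beta_0)|X,Z) = \left[\sigma_{0n}^2\right]_{n=1}^{N} = Z\gamma_0$$

$$\mathrm{Var}(w(\beta_0)|X,Z) = 2\,\mathrm{diag}\left[\left(\sigma_{0n}^2\right)^2\right] = 2\,\mathrm{diag}\left[(z_n'\gamma_0)^2\right]$$

• Recall that an ols regression gives us the conditional mean.

• Thus, we could do an ols regression on the above equation and test the null hypothesis that $\gamma_{02} = 0$.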

• Let our estimates of the regression parameters be:

$$\hat\gamma = (Z'Z)^{-1}Z'w(\beta_0)$$

$$\hat w^2 = \left((I - P_Z)w(\beta_0)\right)'\left((I - P_Z)w(\beta_0)\right) = w(\beta_0)'(I - P_Z)w(\beta_0)$$

• Remark: recall that projection matrices are idempotent and symmetric.

• Under the null hypothesis, the Wald Statistic would be:

$$W = \frac{\hat\gamma_2' Z_{2\perp 1}' Z_{2\perp 1}\hat\gamma_2}{\hat w^2}$$

• The above uses our partitioned regression formulas:

$$\hat\gamma_2 = \left(Z_{2\perp 1}'Z_{2\perp 1}\right)^{-1}Z_{2\perp 1}'w(\beta_0)$$

• This generates the variance formula:

$$\mathrm{Var}(\hat\gamma_2) = \mathrm{Var}[w_{0n}|Z]\left(Z_{2\perp 1}'Z_{2\perp 1}\right)^{-1}$$

• This gives us the weighting matrix in the numerator of W.

• Replacing $\beta_0$ with $\hat\beta$, this is almost the Breusch-Pagan Score Test.

• Remark: the usefulness is also limited by the normality assumption.
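The "intuitive" derivation can be mimicked numerically by regressing squared ols residuals on $z_n$ and forming the Wald statistic for $\gamma_{02} = 0$. The sketch below is not the exact Breusch-Pagan statistic from the text: the simulated data are hypothetical, $\beta_0$ is replaced by $\hat\beta$, and the residual variance of the auxiliary regression is scaled by its degrees of freedom, which is an assumption made here so that W is roughly comparable to a $\chi^2$ with one degree of freedom.

```python
import numpy as np

# Hypothetical data: variance linear in z2, i.e. gamma_01 = 1, gamma_02 = 2.
rng = np.random.default_rng(1)
N = 500
x = rng.normal(size=N)
z2 = rng.uniform(1.0, 10.0, size=N)
X = np.column_stack([np.ones(N), x])
sigma2 = 1.0 + 2.0 * z2
y = X @ np.array([1.0, 2.0]) + np.sqrt(sigma2) * rng.normal(size=N)

# Step 1: squared ols residuals w_n(beta_hat).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
w = (y - X @ beta_hat) ** 2

# Step 2: auxiliary regression of w on Z = [1, z2].
Z = np.column_stack([np.ones(N), z2])
gamma_hat, *_ = np.linalg.lstsq(Z, w, rcond=None)
resid_w = w - Z @ gamma_hat
w2_hat = resid_w @ resid_w / (N - Z.shape[1])   # assumption: df-scaled variance

# Step 3: partitioned regression. Z_{2 perp 1} is z2 with the constant projected out.
Z2p = (z2 - z2.mean())[:, None]
gamma2_hat = np.linalg.solve(Z2p.T @ Z2p, Z2p.T @ w)

# Wald statistic for gamma_02 = 0.
W = float(gamma2_hat @ (Z2p.T @ Z2p) @ gamma2_hat / w2_hat)
print(gamma2_hat, W)
```

The partitioned-regression slope agrees with the slope from the full auxiliary regression, and a value of W far above 3.84 (the 5% critical value of $\chi_1^2$) would lead one to reject homoskedasticity in this simulated sample.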

4 Adjustments to ols

• The assumption of homoskedasticity is strong, and it is common to adjust standard errors to allow for conditional heteroskedasticity in applied studies.

• The above example encouraged us to think of $w_n(\beta_0)$ as an "estimate" of $\sigma_n^2$.

• Of course, since we do not have repeated samples per n, we will not be able to learn about each individual $\sigma_n^2$.

• However, many of the objects that we wish to learn about, such as $\mathrm{Var}(\hat\beta|X)$, will be a fixed k dimensional object and will not change with the sample size.

• We can sometimes show that the errors in thinking about $w_n(\beta_0)$ as an "estimate" of $\sigma_n^2$ will average out.

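• White demonstrated that:

$$\frac{1}{N}X'\,\mathrm{diag}\left[(y_n - \hat\mu_n)^2\right]X \to_p \frac{1}{N}X'\Omega_0 X$$

• Hence, since the true variance is

$$\mathrm{Var}(\hat\beta|X) = (X'X)^{-1}X'\Omega_0 X(X'X)^{-1}$$

it is possible to estimate it as:

$$\widehat{\mathrm{Var}}(\hat\beta|X) = (X'X)^{-1}X'\,\mathrm{diag}\left[(y_n - \hat\mu_n)^2\right]X(X'X)^{-1}$$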
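The sandwich formula above translates almost line by line into NumPy. The following is a minimal sketch on simulated data; the function names (`ols_fit`, `white_cov`, `classical_cov`) and the data-generating process are hypothetical, and no finite-sample correction is applied.

```python
import numpy as np

def ols_fit(y, X):
    """Return ols coefficients and residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta, y - X @ beta

def white_cov(X, resid):
    """(X'X)^{-1} X' diag(resid^2) X (X'X)^{-1}, without forming the N x N matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    return XtX_inv @ meat @ XtX_inv

def classical_cov(X, resid):
    """s^2 (X'X)^{-1}, valid only under homoskedasticity."""
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    return s2 * np.linalg.inv(X.T @ X)

# Hypothetical data: the error standard deviation equals |x|.
rng = np.random.default_rng(2)
N = 1000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=N)

beta_hat, resid = ols_fit(y, X)
V_robust = white_cov(X, resid)
V_classical = classical_cov(X, resid)
se_robust = np.sqrt(np.diag(V_robust))
se_classical = np.sqrt(np.diag(V_classical))
print(se_robust, se_classical)
```

In this design the error variance is largest where x is far from its mean, so the classical formula understates the uncertainty in the slope and the robust standard error comes out larger.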

4.1 WLS/GLS

• Next, we describe how to generate more efficient estimates of our linear model.

• The previous example demonstrated that ols can be inefficient.

• The reason that our proof for the efficiency of ols breaks down is that the error distribution is no longer spherical.

• The previous example also demonstrated that you may want to overweight low variance observations and underweight high variance observations.

• For example, suppose that the variance matrix is diagonal.

• Let's reweight our observations by the inverse of each observation's standard deviation, i.e.

$$y_n^* = \frac{y_n}{\sigma_{0n}}, \qquad x_n^* = \frac{x_n}{\sigma_{0n}}$$

• Note that $\mathrm{var}[y_n^*|x_n^*] = 1$, so that we are back to a spherical distribution.

• Consider the following regression:

$$y_n^* = x_n^{*\prime}\beta + \varepsilon_n^*$$

• Since we are back to a spherical distribution, we now satisfy the assumptions of the Gauss-Markov theorem.

• Suppose that $\Omega_0$ was known (it is not required to be diagonal for this next theorem!)

Theorem Let X be full column rank and y be a random vector such that $E[y|X] = X\beta_0$ and $\mathrm{Var}[y|X] = \Omega_0$, a pd matrix. The GLS estimator $\hat\beta_{GLS} = \left(X'\Omega_0^{-1}X\right)^{-1}X'\Omega_0^{-1}y$ is efficient.

• The idea behind the proof is to overweight the observations with low variance, like our simple example.

• This will return us to a spherical distribution and enhance efficiency.

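• Let $\Omega_0 = C_0C_0'$ be the Cholesky factorization of $\Omega_0$.

• Next note that:

$$E\left[C_0^{-1}y|X\right] = C_0^{-1}X\beta_0$$

$$\mathrm{Var}\left[C_0^{-1}y|X\right] = C_0^{-1}\Omega_0\left(C_0^{-1}\right)' = I_N$$

• Next we apply the Gauss-Markov Theorem to the transformed model to estimate $\beta_0$:

$$\hat\beta = \left[\left(C_0^{-1}X\right)'\left(C_0^{-1}X\right)\right]^{-1}\left(C_0^{-1}X\right)'C_0^{-1}y = \left(X'\Omega_0^{-1}X\right)^{-1}X'\Omega_0^{-1}y = \hat\beta_{GLS}$$

• This proves the theorem.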
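The two steps of the proof can be verified numerically: transforming the data by the inverse Cholesky factor gives an identity variance matrix, and ols on the transformed data reproduces the GLS formula. This is a small sketch with a hypothetical known diagonal $\Omega_0$ and simulated data; variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N)])
sig2 = rng.uniform(0.5, 4.0, size=N)        # hypothetical known variances
Omega0 = np.diag(sig2)
y = X @ np.array([1.0, 2.0]) + np.sqrt(sig2) * rng.normal(size=N)

# Cholesky factor: np.linalg.cholesky returns lower-triangular C0 with Omega0 = C0 C0'.
C0 = np.linalg.cholesky(Omega0)

# Transformed model: C0^{-1} y and C0^{-1} X (solve rather than invert explicitly).
X_star = np.linalg.solve(C0, X)
y_star = np.linalg.solve(C0, y)
beta_whitened, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Direct GLS formula (X' Omega0^{-1} X)^{-1} X' Omega0^{-1} y.
Omega0_inv = np.linalg.inv(Omega0)
beta_gls = np.linalg.solve(X.T @ Omega0_inv @ X, X.T @ Omega0_inv @ y)

# Variance of the transformed error: C0^{-1} Omega0 (C0^{-1})' should be the identity.
C0_inv = np.linalg.inv(C0)
whitened_var = C0_inv @ Omega0 @ C0_inv.T

print(beta_whitened, beta_gls)
```

Since $\Omega_0$ is diagonal here, the transformation is the same as dividing each observation by $\sigma_{0n}$, which is the weighted least squares reweighting described earlier.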

• In general the GLS estimator is not feasible since $\Omega_0$ is not known.

• However, we can do a Feasible GLS by replacing $\Omega_0$ with an estimate in some circumstances.

• In the case of models with "linear heteroskedasticity" (i.e. the variance is a linear function of the z's, as in the Breusch-Pagan setup above), we perform a two step estimator:

1. Fit a linear regression of $w_n(\hat\beta_{OLS})$ on $z_n$ and denote the fitted coefficients as $\hat\gamma$.

2. Plug in $\hat\gamma$ for $\gamma_0$ and compute the FGLS estimator $\hat\beta_{FGLS} = \left(X'\hat\Omega^{-1}X\right)^{-1}X'\hat\Omega^{-1}y$.

• In practice, most papers do not use the feasible GLS estimator discussed in the text (although it is sometimes seen if heteroskedasticity is particularly bad and efficiency is a concern).

• Most commonly, papers use ols with "robust" standard errors to allow for heteroskedasticity of an unknown form.

• Making parametric assumptions about the form of the heteroskedasticity, as the estimators in the text do, is not particularly attractive.
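The two step estimator can be sketched as follows. This is an illustration under stated assumptions, not the text's exact procedure: the function name `fgls_linear_variance` is hypothetical, the variance is taken to be linear in $z_n$, and fitted variances are floored at a small positive number (an ad hoc safeguard added here, since a linear fit can produce negative values).

```python
import numpy as np

def fgls_linear_variance(y, X, Z, floor=1e-6):
    """Two-step FGLS assuming Var(y_n | x_n, z_n) = z_n' gamma_0."""
    # Step 1: ols, then regress squared residuals on Z.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = (y - X @ beta_ols) ** 2
    gamma_hat, *_ = np.linalg.lstsq(Z, w, rcond=None)
    # Fitted variances; the floor is an assumption to keep the weights positive.
    sigma2_hat = np.maximum(Z @ gamma_hat, floor)
    # Step 2: (X' Omega_hat^{-1} X)^{-1} X' Omega_hat^{-1} y with diagonal Omega_hat.
    Xw = X / sigma2_hat[:, None]
    beta_fgls = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta_fgls, gamma_hat, sigma2_hat

# Hypothetical data with variance 1 + 2 * z2.
rng = np.random.default_rng(4)
N = 2000
x = rng.normal(size=N)
z2 = rng.uniform(1.0, 10.0, size=N)
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z2])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + np.sqrt(1.0 + 2.0 * z2) * rng.normal(size=N)

beta_fgls, gamma_hat, sigma2_hat = fgls_linear_variance(y, X, Z)
print(beta_fgls, gamma_hat)
```

If the linear variance model is misspecified, the weights are wrong and the efficiency gain is not guaranteed, which is one reason applied work often prefers ols with robust standard errors.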

5 Asymptotic Theory for Heteroskedasticity.

• Next, we would like to work out the distribution theory for ols under heteroskedasticity.

• We begin by stating an LLN and a CLT for inid (independently, not identically distributed) sequences of random variables.

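Chebychev's LLN Let $U_n$ be a sequence of independent rv's such that $E[U_n] = \mu_n$ and $\mathrm{Var}[U_n] = \sigma_n^2$ exist for all n. Denote

$$E_N[\mu] = \frac{1}{N}\sum_n \mu_n, \qquad E_N\left[\sigma^2\right] = \frac{1}{N}\sum_n \sigma_n^2$$

If $\lim_{N\to\infty}\frac{1}{N}E_N\left[\sigma^2\right] = 0$ then $E_N[U] - E_N[\mu] \to_p 0$.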

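Liapounov CLT Let $U_n$ be a sequence of inid rv's such that $E[U_n] = \mu_n$ and $\mathrm{Var}[U_n] = \sigma_n^2 > \varepsilon > 0$ and $E\left[|U_n - \mu_n|^3\right] = \gamma_n$ exist for all n. If

$$\lim_{N\to\infty}\frac{\left(\sum_n \gamma_n\right)^{1/3}}{\left(\sum_n \sigma_n^2\right)^{1/2}} = 0$$

then

$$\frac{N^{1/2}E_N[U - \mu]}{E_N\left[\sigma^2\right]^{1/2}} \to_d N(0, 1)$$

• The assumption of the CLT essentially rules out distributions that have tails that are "too fat".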
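To get a feel for the CLT with inid data, one can simulate the standardized average for a sequence whose means and variances change with n. The sketch below uses independent uniform draws with hypothetical, bounded scales (so third absolute moments exist); it only illustrates the statement and is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(5)
N, reps = 400, 5000

# Hypothetical inid sequence: U_n = mu_n + a_n * Uniform(-1, 1).
n_idx = np.arange(1, N + 1)
mu = n_idx / N                       # means differ across n
half_width = 1.0 + (n_idx % 3)       # scales cycle through 2, 3, 1
sigma2 = half_width**2 / 3.0         # variance of a * Uniform(-1, 1) is a^2 / 3

U = mu + half_width * rng.uniform(-1.0, 1.0, size=(reps, N))

# Standardized statistic N^{1/2} E_N[U - mu] / E_N[sigma^2]^{1/2}, one per replication.
stat = np.sqrt(N) * (U - mu).mean(axis=1) / np.sqrt(sigma2.mean())

print("mean:", stat.mean())
print("variance:", stat.var())
print("share within +/-1.96:", np.mean(np.abs(stat) < 1.96))
```

Across replications the statistic should have mean near 0, variance near 1, and roughly 95% of its mass inside $\pm 1.96$, as the normal limit suggests.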

• Next, we sketch the proof for the asymptotic distribution theory for ols under heteroskedasticity.

• The first thing we wish to do is demonstrate that $\hat\beta_{OLS}$ is consistent.

• Recall that:

$$\hat\beta_{OLS} - \beta_0 = \left(\frac{1}{N}X'X\right)^{-1}\frac{1}{N}X'(y - X\beta_0)$$

• Let $x_n(y_n - x_n'\beta_0)$ play the role of $U_n$ in the LLN.

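• Assume that the $x_n$ are not random variables, but that $\lim_{N\to\infty}\frac{1}{N}\sum_n h(z_n\gamma_0)x_nx_n' = D(\gamma_0)$ and $\left(\frac{1}{N}X'X\right)^{-1} \to D_1$, finite and pd.

• We have satisfied the LLN, so that $E_N\left[x_n(y_n - x_n'\beta_0)\right] = \frac{1}{N}X'(y - X\beta_0) \to_p 0$. Then by continuity of plims:

$$\left(\frac{1}{N}X'X\right)^{-1}\frac{1}{N}X'(y - X\beta_0) \to_p 0$$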

• Assuming that the error term is normal, it is slightly tedious, but we can verify the sufficient condition for the CLT by applying the Cramer-Wold device.

• Hence, it follows analogously to previous proofs that:

$$\left(X'\Omega_0 X\right)^{-1/2}X'(y - X\beta_0) \to_d N(0, I)$$

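• Suppose that we have an estimator of the variance matrix such that:

$$\frac{1}{N}X'\hat\Omega X \to_p D_2(\gamma_0)$$

where $D_2(\gamma_0)$ is nonsingular.

• Then by limit continuity it will follow that:

$$\left(X'\hat\Omega X\right)^{-1/2}X'X(\hat\beta_{OLS} - \beta_0) - \left(X'\Omega_0 X\right)^{-1/2}X'(y - X\beta_0) \to_p 0$$

• Thus we will treat $\hat\beta_{OLS}$ as approximately $N\left(\beta_0,\ (X'X)^{-1}\left(X'\hat\Omega X\right)(X'X)^{-1}\right)$.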
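A quick Monte Carlo can show how well this normal approximation works in a finite sample. The sketch below holds a hypothetical design matrix fixed across replications (matching the non-random $x_n$ assumption), draws heteroskedastic normal errors, and records how often a nominal 95% interval built from the robust variance covers the true slope. The design, sample size, and error scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, reps = 400, 2000
beta0 = np.array([1.0, 2.0])

# Fixed (non-random) regressors, drawn once and then held constant.
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
XtX_inv = np.linalg.inv(X.T @ X)
sd = 0.5 + np.abs(x)                 # hypothetical heteroskedastic error scale

covered = 0
for _ in range(reps):
    y = X @ beta0 + sd * rng.normal(size=N)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    # Robust variance: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}.
    V = XtX_inv @ (X.T @ (X * e[:, None] ** 2)) @ XtX_inv
    t = (b[1] - beta0[1]) / np.sqrt(V[1, 1])
    covered += int(abs(t) < 1.96)

coverage = covered / reps
print("coverage of nominal 95% interval:", coverage)
```

Coverage close to, though typically a little below, 0.95 is what the asymptotic argument would lead one to expect at this sample size.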
