Bias correction for the use of educational and psychological
assessments as covariates in linear regression
Benjamin Williams∗
June 2, 2015
Abstract
The use of aggregate scores on item response assessments as a proxy for an underlying trait
in an econometric model generally produces biased estimators. I show that if the number of
items on the assessment is proportional to the square root of the sample size then standard
inferential procedures for the linear regression model are incorrect. I propose a bias-corrected
estimator based on nonparametric estimation of the bias term which produces valid inference
under relatively weak conditions. I also demonstrate the finite sample performance in a Monte
Carlo study and implement the procedure for a wage regression using data from the NLSY 1979.
JEL codes: C14, C38, C39, C55
Keywords: bias correction, test scores, cognitive and noncognitive ability, measurement error,
item response
∗Department of Economics, George Washington University, 2115 G St. NW, Washington DC 20052, phone: (302)994-6685, email: [email protected].
1 Introduction
Item response data consist of the discrete responses of a sample of individuals to several items
on a test or questionnaire. Cognitive ability tests (such as an IQ test), standardized academic achievement tests, personality assessments, and political opinion polls can be of this form. The
standard psychological testing theory underlying the construction of these tests treats each item as a
noisy signal of a latent variable, or variables (Lord and Novick, 1968). Typically the latent variable
is of primary interest, not the individual items or the test score. It is a common practice among
economists and other social scientists to use the aggregate score in place of the latent variable in
estimating an econometric model (see Junker, Schofield, and Taylor (2012) for a discussion). In this
paper I study the properties of this procedure and propose a strategy for correcting the bias in the
asymptotic distribution. The results use a theoretical framework where the number of items grows
asymptotically. To make the discussion more concrete, I focus on a particular linear regression
model where the scores are used as a covariate in place of the latent variable. See Williams (2013,
2014) for results regarding a more general class of models. See also Williams (2015) for related
results.
In a linear regression with classical measurement error the resulting bias is proportional to
the variance of the measurement error. There are two common methodological responses to this.
The first aims to place bounds on the variance of the measurement error (Frisch, 1934; Klepper
and Leamer, 1984). The second uses more information in the form of additional measurements,
instrumental variables, or higher order moments to eliminate the bias using two stage least squares,
method of moments, or structural equation methods (see Aigner, Hsiao, Kapteyn, and Wansbeek
(1984) for a summary). Alternatively, Chesher (1991) provides a “small variance” approximation
to the bias that can be used to perform a sensitivity analysis.
The assumptions of the standard item response model considered in this paper lead to a model of
classical measurement error in the linear regression outcome model. Both of the above approaches
to classical measurement error have been employed in empirical studies using item response data.
Bollinger (2003) uses the Klepper and Leamer (1984) bounds to study the black-white wage gap,
controlling for ability. Goldberger (1972), Chamberlain and Griliches (1975), and others have employed structural equation modeling in this setting and, more recently, more sophisticated methods
for using additional measurements in a factor analytic approach in linear and nonlinear models have
been developed (Cunha, Heckman, and Schennach, 2010; Heckman, Pinto, and Savelyev, 2013). In
addition there are many empirical studies that employ an aggregate measurement of item response
data without addressing the measurement error in the statistical analysis at all, often because data
on each individual item is not available, but also appealing to the fact that the variance of the
measurement error should be small if the number of items is large.
In the spirit of Chesher (1991), I study the bias under an asymptotic framework where J ,
the number of items, grows along with the sample size, n. Within this framework, each of the
common approaches described can be justified. The estimator will be consistent regardless of how
quickly J grows with n. If J grows sufficiently quickly relative to n then the bias can be ignored
asymptotically. However, if √n/J → γ > 0 then the OLS estimator is biased asymptotically and standard inference is invalid.1 If J grows too slowly with n then either a bounds analysis or an instrumental variable approach is necessary.2 The bias-correction procedure I propose eliminates
the bias under intermediate rates where J neither grows sufficiently quickly nor too slowly. I
consider such asymptotic sequences in order to provide a better approximation to the finite sample
behavior of the estimator. Alternative asymptotic sequences have been used similarly in other
contexts (Bekker, 1994; Hahn and Kuersteiner, 2002). Monte Carlo results demonstrate that the
asymptotic analysis presented in this paper provides a better approximation to the finite sample
behavior than the standard fixed J asymptotics.
Another approach, advocated by Schofield, Junker, Taylor, and Black (2014), is to model the individual item responses simultaneously. Their approach does not require asymptotics where J
grows because the latent factor is treated as a random effect with a known distribution. If the
distribution of the latent factor is misspecified then their results will still exhibit bias. In the
context of a related panel data model, Arellano and Bonhomme (2009) show that random effects
1 The asymptotic distribution is normal but is not centered at the true parameter value.
2 These approaches are also able to address problems such as failure of the conditional independence assumption. I am not arguing that they are only necessary if J is too small.
models can be interpreted as a bias reduction procedure relative to a fixed effect model but only if
the specified distribution happens to be from a class of bias-reducing priors. Moreover, Schofield,
Junker, Taylor, and Black (2014) require a fully parametric model which I am able to avoid. This
paper offers an alternative empirical approach which relies on a theoretical framework similar to the
nonlinear panel data framework used in Arellano and Bonhomme (2009) and elsewhere. The bias
correction derived in this paper reduces the bias from O(J^{-1}) to O(J^{-p}), where p ∈ (1, 2] depends on the quality of the bias correction, meaning that valid inference is possible if √n/J^p → 0. On
the other hand, the present approach requires nonparametric estimation and focuses on a reduced
form parameter that may be hard to motivate.3
The rest of the paper is structured as follows. Section 2 outlines the main results of the paper. In Section 3, I derive the bias of the OLS estimator, propose a bias-corrected estimator, and prove its root-n consistency and asymptotic normality. In Section 4, an extension to multiple latent factors is considered. In Section 5, I study the improved performance of the bias-corrected estimator via Monte Carlo simulations. In Section 6, the proposed methods are used to estimate the return to education controlling for various dimensions of ability. Section 7 concludes.
2 The biased OLS estimator and a bias-corrected estimator
In this section I outline the main results of the paper. In the next section I provide the details
of these results. Consider a vector of J binary response variables, or items, Mi = (Mi1, . . . ,MiJ)′
which are mutually independent conditional on θi, an unobserved variable, or factor. While the
main results of the paper assume that θi is a scalar, Section 4 briefly discusses the case where θi is
a vector.
Let M̄iJ := J⁻¹ ∑_{j=1}^J Mij denote the average response of individual i across the J items. This average response is often available when the full vector of responses, Mi, is not. Since Mi1, . . . , MiJ are mutually independent conditional on θi, the average response, M̄iJ, should not be considered a proxy variable (in the sense of Wooldridge (2009), for example). Instead, if p̄J(θi) := J⁻¹ ∑_{j=1}^J pj(θi), where pj(θi) := E(Mij | θi), then M̄iJ = p̄J(θi) + ηiJ where E(ηiJ | θi) = 0.
3 See the discussion in the next section for the latter issue. Also see the companion paper, Williams (2015), where I address this issue.
Consider the regression equation

Yi = βJ,1 + β′J,2 Xi + βJ,3 p̄J(θi) + εiJ    (1)

for a K × 1 vector of observed explanatory variables, Xi, and an outcome of interest, Yi, where E(Xi εiJ) = E(p̄J(θi) εiJ) = 0. This paper shows that the least squares estimator that uses M̄iJ as a covariate is consistent for βJ = (βJ,1, β′J,2, βJ,3)′ as n, J → ∞. Thus this paper provides a rigorous analysis of what this common approach actually estimates.
Generally the latent variable will not enter the model as specified in equation (1). If the
econometric model is
Yi = β1 + β′2 Xi + β3 g(θi) + εi,  E(εi | Xi, θi) = 0    (2)

then in general βJ − β ≠ 0 represents a misspecification bias. In this paper, by studying estimation of βJ, I separate the issue of measurement error bias from the problem of correctly specifying the regression model.
There are several reasons for addressing the measurement error bias but not the misspecification bias. When β2 is the primary object of interest, if p̄J(θi) = aJ + bJ g(θi) then βJ,2 = β2. In Monte Carlo simulations I show that when g(θi) = θi and the items are generated by a probit model, it is often the case that the misspecification bias is dominated by the measurement error bias because p̄J(·) is approximately linear in θi. Moreover, even when the misspecification error is substantial
it is in the same direction as the measurement error bias so that the bias correction proposed
in this paper provides an improvement in the mean squared error when β2 is the parameter of
interest. Addressing the misspecification bias either requires a parametric model, as in Schofield,
Junker, Taylor, and Black (2014), or calls for a more involved theoretical argument and a more
complex estimation procedure (Williams, 2015). Williams (2015) also finds that correcting the
misspecification bias introduces an additional term in the asymptotic variance.
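The near-linearity of p̄J(·) under a probit item model is easy to see numerically. The sketch below uses illustrative, assumed parameter values (intercepts spread over [−3, 3], slope 1, σ = 2 — not the paper's exact designs) and measures how well a line fits p̄J over a grid:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# pbar_J(theta) = J^{-1} sum_j Phi((gamma0_j + theta) / sigma): probit items with
# assumed intercepts spread over [-3, 3] and sigma = 2
theta_grid = np.linspace(-2.5, 2.5, 201)
gamma0 = np.linspace(-3.0, 3.0, 25)
pbar = np.array([np.mean([Phi((g + t) / 2.0) for g in gamma0]) for t in theta_grid])

# R^2 of the best linear approximation to pbar_J over the grid
A = np.column_stack([np.ones_like(theta_grid), theta_grid])
coef, *_ = np.linalg.lstsq(A, pbar, rcond=None)
r2 = 1.0 - np.var(pbar - A @ coef) / np.var(pbar)
print(r2)  # very close to 1: pbar_J is nearly linear on this range
```

When p̄J is this close to linear in θi, the misspecification bias in βJ,2 is correspondingly small, which is the mechanism behind the Monte Carlo findings described above.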
Let β̂J denote the OLS estimator obtained by regressing Yi on Xi and M̄iJ,

β̂J = (∑_{i=1}^n WiJ W′iJ)⁻¹ ∑_{i=1}^n WiJ Yi

where WiJ = (1, X′i, M̄iJ)′. Taking the limit where n, J → ∞, under mild regularity conditions, β̂J is consistent for βJ. That is, β̂J − βJ →p 0. However, if J increases too slowly relative to n in the asymptotics then a bias shows up in the limiting distribution. I show in Section 3 that if the asymptotic sequence satisfies 0 < γ := lim √n/J < ∞ then √n(β̂J − βJ) →d N(γB, V). If the asymptotic bias, γB, is ignored, the usual t-statistics and confidence intervals will be invalid, despite the fact that β̂J is consistent.
The bias term can be written as B = − limJ→∞ βJ,3 ω̄J MJ⁻¹ eK+2, where eK+2 denotes the unit vector (0, . . . , 0, 1)′, MJ = E(WiJ W′iJ), and ω̄J = J⁻¹ ∑_{j=1}^J ωj with ωj = E(pj(θi)(1 − pj(θi))). This last object, ω̄J, is the average of the conditional variances of the individual items Mij; the variance of the measurement error is Var(ηiJ) = J⁻¹ ω̄J. The asymptotic bias can be eliminated if consistent estimation of ω̄J is possible.
If the individual items, Mi1, . . . , MiJ, are observed then pj(·) can be estimated using the nonparametric method studied in Williams (2013), which was previously proposed by Ramsay (1991) and Douglas (1997). An estimate of the function pj(·) can in turn be used to construct an estimate of ωj, for each j.4 Given estimates ω̂j of ωj for each j, I can construct ω̂J = J⁻¹ ∑_{j=1}^J ω̂j. If |ω̂J − ω̄J| →p 0 then the bias-corrected estimator,

β̂BCJ := β̂J + J⁻¹ β̂J,3 ω̂J M̂J⁻¹ eK+2,

satisfies √n(β̂BCJ − βJ) →d N(0, V). Hence standard statistical inference based on β̂BCJ will be valid. In addition, this bias correction does not lead to a loss of efficiency asymptotically.
4 In fact, because ω̄J is an average, it is sufficient to estimate a rescaled version of pj(·), which simplifies this first-step estimator.
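To fix ideas, here is a self-contained numerical sketch of the feasible OLS estimator and an oracle version of the bias correction in which the true ω̄J is plugged in. The logistic item curves, the covariate design, and all parameter values are my own illustrative assumptions, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 5000, 10

# assumed DGP: binary covariate, latent factor correlated with it, logistic items
X = rng.binomial(1, 0.5, n).astype(float)
theta = rng.normal(X - 0.5, 1.0)
Y = 0.25 * X + 0.5 * theta + rng.normal(size=n)

gamma0 = np.linspace(-1.0, 1.0, J)  # hypothetical item intercepts
p = 1.0 / (1.0 + np.exp(-(gamma0[None, :] + theta[:, None])))  # p_j(theta_i)
M = (rng.uniform(size=(n, J)) < p).astype(float)  # binary item responses
Mbar = M.mean(axis=1)                             # aggregate score

def ols(W, y):
    return np.linalg.solve(W.T @ W, W.T @ y)

# infeasible regression on pbar_J(theta): its estimand is beta_J
W_star = np.column_stack([np.ones(n), X, p.mean(axis=1)])
beta_star = ols(W_star, Y)

# feasible OLS with the aggregate score: biased for beta_J when J is modest
W = np.column_stack([np.ones(n), X, Mbar])
beta_hat = ols(W, Y)

# oracle bias correction: beta_BC = beta_hat + J^{-1} beta_hat_3 omega_bar M^{-1} e_{K+2}
omega_bar = (p * (1.0 - p)).mean()  # true omega_bar_J, known here by construction
M_hat = (W.T @ W) / n
e = np.array([0.0, 0.0, 1.0])
beta_bc = beta_hat + beta_hat[2] * omega_bar / J * np.linalg.solve(M_hat, e)
```

In this design the corrected slope on the score moves back toward the infeasible estimand; Section 3.2.1 replaces the oracle ω̄J with a kernel-based estimate.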
3 Asymptotic results
In this section I discuss the assumptions of the model and state formally the results laid out
in the previous section. I start by discussing the asymptotic distribution of the feasible OLS
estimator. Then I derive conditions under which a bias-corrected estimator is root-n consistent and asymptotically correctly centered. Finally, I propose an estimator for ω̄J and show that this can be used to construct a bias-corrected estimator satisfying these conditions.
3.1 Feasible OLS estimator
Let βJ be the vector b that minimizes MSE(b) = E(Yi − b′W*iJ)², where W*iJ = (1, X′i, p̄J(θi))′, and let εiJ = Yi − W*′iJ βJ so that equation (1) holds with E(W*iJ εiJ) = 0. Then Yi = W′iJ βJ + uiJ where uiJ = εiJ − (WiJ − W*iJ)′βJ and the vector WiJ − W*iJ is nonzero only in its last entry, where it equals ηiJ. I make the following assumptions on the model.
Assumption 3.1. There exists θi ∈ R such that
(i) For each J , Yi ⊥⊥ Mi | Xi, θi.
(ii) For each J , Mi1, . . . ,MiJ are mutually independent conditional on Xi, θi.
(iii) For any j, E(Mij | Xi, θi) = E(Mij | θi).
Under these conditions, E(WiJ uiJ) = O(J⁻¹), and hence βJ can be consistently estimated as n, J → ∞ since E(WiJ W′iJ)⁻¹ E(WiJ Yi) = βJ + O(J⁻¹). Condition (i) requires that the individual's performance on the test or questionnaire, Mi, does not influence the outcome except through the latent factor θi: while individual items may be related to the outcome, any effect of the responses on Yi is explained by variation in Xi and θi. This could be weakened to conditional mean independence; I use the full conditional independence to more easily analyze second (and
higher) order terms in the asymptotic expansion. Condition (ii) is standard in the item response
literature. Limited conditional dependence among the responses can be allowed but I do not pursue
this here to simplify the exposition. For example, Stout (1990) replaces this condition with a higher
level “essential conditional independence”. Condition (iii) assumes that the control variables Xi
do not influence the response variables. If condition (iii) does not hold then all of the same results carry through with W*iJ replaced by (1, X′i, p̄J(Xi, θi))′.
To derive the main result, first note that the estimation error can be written as

β̂J − βJ = (n⁻¹ ∑_{i=1}^n WiJ W′iJ)⁻¹ n⁻¹ ∑_{i=1}^n (WiJ uiJ − E(WiJ uiJ)) + (n⁻¹ ∑_{i=1}^n WiJ W′iJ)⁻¹ n⁻¹ ∑_{i=1}^n E(WiJ uiJ).

Suppose (Yi, X′i, Mi1, . . . , MiJ, θi) is i.i.d. so that I can define MJ = E(WiJ W′iJ), RJ = E(WiJ uiJ), and ΨJ = E[(WiJ uiJ − E(WiJ uiJ))(WiJ uiJ − E(WiJ uiJ))′]. Then

β̂J − βJ = n⁻¹ ∑_{i=1}^n ψiJ + BJ + op(n^{-1/2})

where E(ψiJ) = 0, E(ψiJ ψ′iJ) = n⁻¹ MJ⁻¹ ΨJ MJ⁻¹, and BJ := MJ⁻¹ RJ.
By Assumption 3.1, E(WiJ εiJ) = E(E(WiJ εiJ | Xi, θi)) = E(W*iJ εiJ) = 0. Therefore,

RJ = E(WiJ εiJ) − E(WiJ (WiJ − W*iJ)′ βJ) = −E((WiJ − W*iJ)(WiJ − W*iJ)′) βJ.

Only the last component of WiJ − W*iJ is nonzero, so RJ = −βJ,3 E(ηiJ²) eK+2. Finally, by Assumption 3.1(ii),

E(ηiJ²) = E[(J⁻¹ ∑_{j=1}^J (Mij − pj(θi)))²] = J⁻² ∑_{j=1}^J E[(Mij − pj(θi))²] = J⁻¹ ω̄J.

Therefore, if MJ⁻¹, βJ, and ω̄J are bounded as J → ∞ then BJ = O(J⁻¹), and if √n/J → γ then √n BJ → γB.
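The identity E(ηiJ²) = J⁻¹ ω̄J is easy to verify numerically. The logistic item response functions below are an assumed example, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 100000, 8
theta = rng.normal(size=n)

# assumed logistic item response functions p_j(theta)
g0 = np.linspace(-0.5, 0.5, J)
p = 1.0 / (1.0 + np.exp(-(g0[None, :] + theta[:, None])))
M = (rng.uniform(size=(n, J)) < p).astype(float)

eta = M.mean(axis=1) - p.mean(axis=1)    # eta_iJ = Mbar_iJ - pbar_J(theta_i)
omega_bar = (p * (1.0 - p)).mean()       # omega_bar_J = J^{-1} sum_j E p_j(1-p_j)
print((eta ** 2).mean(), omega_bar / J)  # the two should nearly coincide
```

The agreement follows because, conditional on θi, the J response errors are independent Bernoulli deviations, so their average has variance J⁻² ∑_j pj(θi)(1 − pj(θi)).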
Next I will introduce notation that is used to state the regularity assumptions of the model.
Define M*J = E(W*iJ W*′iJ) and Ψ*J := E(W*iJ W*′iJ νi²) where νi := Yi − E(Yi | Xi, θi). For a matrix A, let λmin(A) denote the minimum eigenvalue of A and let ||A|| denote the operator norm induced by the Euclidean norm. I assume the following regularity conditions.
Assumption 3.2. For all J:
(i) λmin(M*J) ≥ λ1 > 0.
(ii) λmin(Ψ*J) ≥ λ2 > 0.
(iii) E(||W*iJ W*′iJ||^{2+δ}) ≤ D1 < ∞.
(iv) E(|W*iJ Yi|^{2+δ}) ≤ D2 < ∞.
The conditions are stated in terms of the components of the infeasible regression rather than
the feasible regression, that is, in terms of the distribution of (Yi, W*′iJ) rather than the distribution of (W′iJ, εi)′. These are standard regularity conditions under which asymptotic normality of
the infeasible least squares estimator follows. The first theorem states formally the asymptotic
distribution of the feasible least squares estimator under these conditions.
Theorem 3.1. Under Assumptions 3.1 and 3.2, if (Yi, X′i, Mi1, . . . , MiJ, θi) is i.i.d. and n, J → ∞ such that √n/J → γ, 0 < γ < ∞, then √n(β̂J − βJ) = Op(1). Moreover,

ΩnJ^{-1/2}(β̂J − βJ − BJ) →d N(0, I)

so that √n(β̂J − βJ) →d N(γB, V), where BJ = MJ⁻¹ RJ, B = limJ→∞ J BJ, and V = limn,J→∞ n ΩnJ.
Proof. See Appendix A.
Assumption 3.2 is a somewhat high level set of conditions. Since p̄J(θi) is constrained to the interval [0, 1], conditions (iii) and (iv) can be replaced by an assumption that the moments E(||X̃iX̃′i||^{2+δ}) and E(|X̃iYi|^{2+δ}), which do not vary with the sample size, are finite, where X̃i = (1, X′i)′. Conditions (i) and (ii) cannot so easily be replaced by lower level conditions. Before
discussing a bias-corrected estimator, I provide one sufficient set of conditions for (i) and (ii) of
Assumption 3.2.
Assumption 3.3. For all J:
(i) E(X̃iX̃′i) is positive definite.
(ii) Var(Yi | Xi, θi) = Var(Yi).
(iii) Var(p̄J(θi) | Xi) is bounded away from 0 as J → ∞.
(iv) E(||X̃iX̃′i||^{2+δ}) < ∞ and E(|X̃iYi|^{2+δ}) < ∞.
Condition (iii) can be satisfied in several ways. If the support of θi is finite and the functions pj(·) are each differentiable and monotonically increasing with derivatives uniformly bounded away from 0, then the condition holds as long as Var(θi | Xi) is nonzero. Alternatively, if there is a subset of the support, Θ0 ⊂ Supp(θi), such that the functions pj(·) are each differentiable and monotonically increasing with derivatives uniformly bounded away from 0 on Θ0, then condition (iii) holds as long as Var(θi | Xi, θi ∈ Θ0) is nonzero.
Finally, condition (iii) could instead be stated in terms of the parameters of familiar parametric
measurement models.
3.2 Bias correction
In this section I show how the OLS estimator can be asymptotically recentered. First I state an
immediate corollary to Theorem 1 which provides conditions under which a consistent estimator
for BJ can be used to correct the OLS estimator.
Theorem 3.2. Under the conditions of Theorem 3.1, if B̂J is an estimator of the bias BJ such that |J B̂J − J BJ| = op(1), then ΩnJ^{-1/2}(β̂J − B̂J − βJ) →d N(0, I). In addition, under Assumptions 3.1 and 3.2, if (Yi, X′i, Mi1, . . . , MiJ, θi) is i.i.d., n, J → ∞ such that √n/J → ∞, and if B̂J is an estimator of the bias BJ such that |J B̂J − J BJ| = Op(n^{-δ}) + Op(J^{-α}) for 0 < α < 1 and α/(1 + α) ≤ 2δ, then √n(β̂J − B̂J − βJ) →d N(0, V) if √n/J^{1+α} → 0.
Proof.

ΩnJ^{-1/2}(β̂J − B̂J − βJ) = ΩnJ^{-1/2}(β̂J − BJ − βJ) + ΩnJ^{-1/2}(BJ − B̂J)

By Theorem 3.1, the first term converges in distribution to a standard normal. It is also shown in the proof of Theorem 3.1 that ΩnJ^{-1/2} = O(√n). Therefore,

ΩnJ^{-1/2}(BJ − B̂J) = O(1) (√n/J)(J BJ − J B̂J)

If √n/J → γ < ∞ then the right hand side converges in probability to 0 provided that (J B̂J − J BJ) = op(1). This proves the first part of the theorem. Under the second set of conditions in the theorem,

ΩnJ^{-1/2}(BJ − B̂J) = O(1) (√n/J)(J BJ − J B̂J) = Op(n^{1/2−δ}/J) + Op(n^{1/2}/J^{1+α}) = op(1)

which proves the second result.
Recall now that BJ = −J⁻¹ βJ,3 ω̄J MJ⁻¹ eK+2. Therefore the estimator of the bias takes the form B̂J = −J⁻¹ β̂J,3 ω̂J M̂J⁻¹ eK+2 and hence

|J B̂J − J BJ| = Op(n^{-1/2}) + Op(|ω̂J − ω̄J|)

Theorem 3.2 demonstrates that the more quickly J grows with n, the less precise the estimation of ω̄J needs to be. For example, if √n/J → c then J B̂J merely needs to be consistent for J BJ. On the other hand, if J is small such an asymptotic sequence may understate the bias, and an asymptotic sequence such that √n/J^{1+α0} → c for some 0 < α0 < 1 may be more appropriate. In this case ω̄J must be estimated more precisely. If ω̄J can be estimated very precisely, in the sense that the condition is satisfied with α = 1 and δ = 1/2, then correct asymptotic inference is possible even under an asymptotic sequence where √n/J² → c < ∞. The bias correction can be interpreted as reducing the bias from the order of 1/J to the order of 1/J^{1+α}.
I will now provide an estimator of ω̄J, prove that it is consistent, and provide a bound on its convergence rate as required for the second part of Theorem 3.2.
3.2.1 Consistent estimation of ω̄J
For any j let M̄i,−j denote the average response excluding item j and let p̄J,−j(·) denote the associated conditional expectation function, E(M̄i,−j | θi = ·). Then for a given j consider the conditional expectation function E(Mij | M̄i,−j = m). If p̄J,−j(·) is one-to-one and therefore has a unique inverse, denoted p̄⁻¹J,−j, it can be shown that, under sufficient regularity conditions, E(Mij | M̄i,−j = m) ≈ pj(p̄⁻¹J,−j(m)), where the approximation holds as J → ∞. Define p*j,J(m) := pj(p̄⁻¹J,−j(m)) and ω*j,J(·) = p*j,J(·)(1 − p*j,J(·)). Since M̄i,−j ≈ p̄J,−j(θi), it can also be shown that, under sufficient regularity conditions, ω*j,J(M̄i,−j) ≈ ωj(θi) := pj(θi)(1 − pj(θi)).
This argument suggests the following estimation strategy. First, for each j, estimate the conditional expectation function p*j,J(·), denoting the estimator by p̂*j,J(·); then estimate ω̄J by

ω̂J = J⁻¹ n⁻¹ ∑_{i=1}^n ∑_{j=1}^J ω̂*j,J(M̄i,−j)

where ω̂*j,J(·) = p̂*j,J(·)(1 − p̂*j,J(·)).
The only missing step is how to estimate p*j,J(·). It can be estimated nonparametrically by a kernel regression of Mij on M̄i,−j. Kernel regression estimators for nonparametric IRT models have been proposed by Stout (1990) and Douglas (1997). The proposed estimator can be defined by p̂*j,J(m; h), where h is a (possibly data-dependent) bandwidth parameter and

p̂*j,J(m; h) = [∑_{i=1}^n Kh(M̄i,−j − m) Mij] / [∑_{i=1}^n Kh(M̄i,−j − m)],   Kh(u) := h⁻¹ K(u/h),

where K(·) is a compactly supported kernel function.
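A minimal implementation sketch of this first-step estimator and of ω̂J, using a uniform kernel; the function names and design choices here are mine, for illustration only:

```python
import numpy as np

def pstar_hat(M, j, eval_pts, h):
    """Nadaraya-Watson estimate of E(M_ij | Mbar_{i,-j} = m), uniform kernel."""
    n, J = M.shape
    Mbar_mj = (M.sum(axis=1) - M[:, j]) / (J - 1)  # leave-item-j-out average
    est = np.empty(len(eval_pts))
    for k, m in enumerate(eval_pts):
        w = np.abs(Mbar_mj - m) <= h               # K(u) = (1/2) 1{|u| <= 1}
        est[k] = M[w, j].mean() if w.any() else 0.5
    return est

def omega_hat(M, h):
    """Estimate omega_bar_J = J^{-1} sum_j E[p_j(theta)(1 - p_j(theta))]."""
    n, J = M.shape
    total = 0.0
    for j in range(J):
        Mbar_mj = (M.sum(axis=1) - M[:, j]) / (J - 1)
        ph = pstar_hat(M, j, Mbar_mj, h)           # evaluate at observed points
        total += np.mean(ph * (1.0 - ph))
    return total / J
```

Plugging ω̂J from this routine into the bias-corrected estimator of Section 3.2 completes the procedure; in practice the bandwidth h plays the role described in Section 5.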
Assumption 3.4. For ε, ε′ > 0 the following conditions hold.
(i) maxj sup_{m∈[0,1]} |∂ω*j,J(m)/∂m| is bounded.
(ii) maxi,j |M̄i,−j − p̄J,−j(θi)| = Op(J^{-1/2+ε}).
(iii) maxj sup_{m∈[0,1]} |p̂*j,J(m) − p*j,J(m)| = Op(J^{-1/2+ε}) + Op(n^{-1/3+ε′}).
Condition (i) can alternatively be stated in terms of the individual functions pj . It will generally
follow under the types of conditions considered after Theorem 3.1. Conditions (ii) and (iii) can be
proved under quite general conditions by slight modifications of arguments in Williams (2013). The
desired result follows immediately from these high level assumptions.
Theorem 3.3. Under Assumptions 3.1 and 3.4, |ω̂J − ω̄J| = Op(J^{-1/2+ε}) + Op(n^{-1/3+ε′}).
Proof. The result follows from the decomposition

|ω̂J − ω̄J| ≤ (nJ)⁻¹ |∑_{i,j} [ω̂*j,J(M̄i,−j) − ω*j,J(M̄i,−j)]| + (nJ)⁻¹ |∑_{i,j} [ω*j,J(M̄i,−j) − ω*j,J(p̄J,−j(θi))]| + n⁻¹ |∑_i [ω̄J(θi) − E(ω̄J(θi))]|

where ω̄J(θi) := J⁻¹ ∑_{j=1}^J ωj(θi) and I have used the fact that ω*j,J(p̄J,−j(θi)) = ωj(θi). The third term is Op(n^{-1/2}) since the θi are i.i.d. across i. Further, |ω̂*j,J(M̄i,−j) − ω*j,J(M̄i,−j)| ≲ |p̂*j,J(M̄i,−j) − p*j,J(M̄i,−j)| since p̂*j,J and p*j,J are bounded, so the first term is Op(J^{-1/2+ε}) + Op(n^{-1/3+ε′}) by Assumption 3.4(iii). Lastly, |ω*j,J(m′) − ω*j,J(m)| ≤ sup_m |∂ω*j,J(m)/∂m| · |m′ − m|, so the second term is Op(J^{-1/2+ε}) by Assumptions 3.4(i) and (ii).
Therefore, the bias-corrected estimator described in this section will have an asymptotic distribution that is correctly centered, as stated in Theorem 3.2, not only if √n/J → γ < ∞ but also if J grows only at the rate n^{1/3+ε}.
4 Multidimensional latent trait
In this section I extend the results above to allow for multiple latent factors. I first provide a very
general result about what a feasible regression estimates as J →∞. Then, under more restrictive
conditions, I show how a bias correction can be implemented. Lastly I discuss the case of “dedicated
measurements” where each item is a signal of a single dimension of the latent variable.
4.1 Feasible Least Squares Regression
Suppose the J binary test items are used to construct LM averages, M̄i1, . . . , M̄iLM. The lth average is constructed as M̄il = J⁻¹ ∑_{j=1}^J wlj Mij, where {wlj}_{j=1}^J is a sequence of nonrandom weights for each l. First I consider the properties of the least squares estimator, β̂J, from the regression of Yi on WiJ = (1, X′i, M̄i1, . . . , M̄iLM)′ when Assumption 3.1 is replaced by the following.
Assumption 4.1. There exists θi ∈ RLθ such that
(i) Yi ⊥⊥ M̄il | Xi, θi for each l = 1, . . . , LM.
(ii) Mi1, . . . ,MiJ are mutually independent conditional on Xi, θi.
(iii) For any j, E(Mij | Xi, θi) = E(Mij | θi).
Define the conditional mean functions p̄J,l(h) = E(M̄il | θi = h). Under regularity conditions stated below, β̂J will converge to βJ, the estimand of the regression of Yi on W*iJ = (1, X′i, p̄J,1(θi), . . . , p̄J,LM(θi))′. If √n/J → γ for some 0 < γ < ∞, there will be an asymptotic bias that can be written as

BJ = −E(WiJ W′iJ)⁻¹ VηJ βJ    (3)

where VηJ is a matrix with entries VηJ[s, t] such that (i) VηJ[s, t] = 0 if s or t is less than K + 2 and (ii) for l, l′ ∈ {1, . . . , LM}, VηJ[K + 1 + l, K + 1 + l′] = J⁻¹ ω̄J,l,l′ where

ω̄J,l,l′ = J⁻¹ ∑_{j=1}^J wlj wl′j ωj.
To state the regularity conditions and describe the asymptotic variance, I first define the residuals εiJ and uiJ and the moment matrices MJ, ΨJ, ΩnJ, M*J, and Ψ*J as before, except now in terms of WiJ = (1, X′i, M̄i1, . . . , M̄iLM)′ and W*iJ = (1, X′i, p̄J,1(θi), . . . , p̄J,LM(θi))′.
Assumption 4.2. For all J:
(i) λmin(M*J) ≥ λ1 > 0.
(ii) λmin(Ψ*J) ≥ λ2 > 0.
(iii) E(||W*iJ W*′iJ||^{2+δ}) ≤ D1 < ∞.
(iv) E(|W*iJ Yi|^{2+δ}) ≤ D2 < ∞.
This assumption is identical to Assumption 3.2 except that WiJ and W*iJ are now defined to include LM different average response variables and LM different transformations of the Lθ latent factors, respectively. This distinction is important, for example, in interpreting condition (i): if Lθ < LM, condition (i) can easily fail depending on the relationship among the functions p̄J,l(·). The asymptotic distribution of β̂J can then be derived under Assumptions 4.1 and 4.2 as an extension of the arguments in the proof of Theorem 3.1.
Theorem 4.1. Under Assumptions 4.1 and 4.2, if (Yi, Mi1, . . . , MiJ, X′i, θ′i) is i.i.d. and n, J → ∞ such that √n/J → γ where 0 < γ < ∞, then √n(β̂J − βJ) = Op(1). Moreover,

√n(β̂J − βJ) →d N(B, V)

where B = γ limJ→∞ J BJ, with BJ defined in equation (3), and V = limn,J→∞ n ΩnJ.
4.2 Bias correction
If Lθ = LM then the bias correction method proposed in Section 3.2 can be generalized to the multidimensional case.5 I assume that the model satisfies the following injectivity assumption.
5 If Lθ < LM then the bias correction procedure proposed may be valid after regrouping the J items into only Lθ groups.
Assumption 4.3. Define the function p̄J : R^Lθ → R^LM by p̄J(t) = (p̄J,1(t), . . . , p̄J,LM(t))′. Suppose Lθ = LM and that p̄J is one-to-one.
The proposed estimation procedure can be separated into three stages. First, estimate the kernel regression of Mij on Qi,j := (M̄i1,−j, . . . , M̄iLM,−j)′ and denote the estimate by p̂*j,J(·). Second, form ω̂j = n⁻¹ ∑_{i=1}^n p̂*j,J(Qi,j)(1 − p̂*j,J(Qi,j)). Third, after completing the previous steps for each j = 1, . . . , J, form ω̂J,l,l′ = J⁻¹ ∑_{j=1}^J wlj wl′j ω̂j. To state the conditions needed for this to lead to asymptotic bias correction, I define p*j,l(q) = pj(p̄⁻¹J,l,−j(q)) and ω*j,l(q) = p*j,l(q)(1 − p*j,l(q)).6
Assumption 4.4. For ε, ε′ > 0 the following conditions hold.
(i) maxj,l sup_{q∈[0,1]^LM} |∂ω*j,l(q)/∂ql| is bounded.
(ii) supi,j,l |M̄i,l,−j − p̄J,l,−j(θi)| = Op(J^{-1/2+ε}).
(iii) supi,l |M̄i,l − p̄J,l(θi)| = Op(J^{-1/2+ε}).
(iv) For some p > 0, maxj,l sup_{q∈[0,1]^LM} |p̂*j,l(q) − p*j,l(q)| = Op(J^{-1/2+ε}) + Op(n^{-p+ε′}).
As stated in the following theorem, under these regularity conditions the bias-corrected estimator exhibits negligible bias asymptotically.
Theorem 4.2. Under Assumptions 4.1–4.4, |ω̂J,l,l′ − ω̄J,l,l′| = Op(J^{-1/2+ε}) + Op(n^{-p+ε′}) for each l, l′ and, consequently, √n(β̂J − B̂J − βJ) →d N(0, V).
4.3 Interpretation and Dedicated Items
Additional structure can be used to simplify the bias correction procedure. The approach described
here is used in the application in the following section.
Suppose that, for each j, there is some dimension of the latent vector, denoted lj, such that pj(·) varies only with θi,lj, so that pj(θi) = pj(θi,lj). Then Lθ response averages can be computed with wlj = 0 if l ≠ lj and wlj = 1 if l = lj. With this structure, VηJ[K + 1 + l, K + 1 + l′] = 0 if l ≠ l′.
6 The function p̄⁻¹J,l,−j is the inverse of the function p̄J,l,−j, which is the vector of functions (p̄J,1, . . . , p̄J,LM)′ with p̄J,l replaced by p̄J,l,−j.
So the bias correction only requires estimating the Lθ terms ω̄J,l,l. Moreover, the first stage now requires only a one-dimensional nonparametric regression and as a result avoids the curse of dimensionality.
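The bookkeeping for the dedicated-items case can be sketched as follows; the item-to-dimension assignment and the ωj values below are hypothetical:

```python
import numpy as np

def dedicated_weights(item_dim, L):
    """w[l, j] = 1 if item j is dedicated to dimension l, else 0."""
    J = len(item_dim)
    w = np.zeros((L, J))
    w[item_dim, np.arange(J)] = 1.0
    return w

def omega_block(w, omega_j, J):
    """Lower block of V^eta_J: entries J^{-1} * omega_bar_{J,l,l'}, where
    omega_bar_{J,l,l'} = J^{-1} sum_j w_lj w_l'j omega_j."""
    return (w * omega_j[None, :]) @ w.T / J**2

# toy example: J = 6 items, L = 2 dimensions, hypothetical item variances omega_j
item_dim = np.array([0, 0, 0, 1, 1, 1])
w = dedicated_weights(item_dim, 2)
block = omega_block(w, np.full(6, 0.25), 6)
```

Because dedicated items give non-overlapping weight vectors, the off-diagonal entries of this block are exactly zero, which is why only the Lθ diagonal terms ω̄J,l,l need to be estimated.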
In addition, with this additional structure the infeasible regression model,
Yi = βJ,1 + β′J,2Xi +
Lθ∑l=1
βJ,3,lpJ,l(θi,l) + εiJ ,
differs from the causal model,
Yi = βJ,1 + β′J,2Xi +
Lθ∑l=1
β3,lθi,l + εiJ ,
only to the extent that the scalar functions pJ,l are not well approximated by a linear function.
5 Monte Carlo Simulations
To study the finite sample properties of the OLS estimator and the bias-corrected estimator, a series of Monte Carlo exercises is performed. I report the bias, standard deviation, and root
mean squared error of the coefficient on a scalar covariate Xi. I also assess the relative size of
the measurement error bias and the misspecification bias. I report results from a wide range of
examples of the item response model.
The results demonstrate three main findings. First, the feasible OLS estimator can exhibit a substantial bias, even when J is as large as 50 or 100. Second, the bias-corrected estimator dominates the uncorrected estimator, though it does not eliminate the bias entirely unless J is large. Third, the misspecification bias is dominated by the measurement error bias in a wide range of models, though I also find that the misspecification bias can be substantial depending on the shape of p̄J, as described below.
In the simulations the data are generated by the model

Yi = 0.25 Xi + 0.5 θi + εi
Mij = 1(γ0j + γ1j θi ≥ σ ηij)

where Xi ∼ Bernoulli(0.5), θi | Xi = x ∼ N(µx, σθ²), εi ∼ N(0, 1), and ηij ∼ N(0, 1). The parameters µx and σθ² are chosen so that E(θi) = 0, Var(θi) = 1, and E(θi | Xi = 1) − E(θi | Xi = 0) = 1. I simulate various models that allow γ0j, γ1j, and σ to vary. In all models I set n equal to 50, 100, 500, 1000, or 5000 and J equal to 5, 10, 25, 50, or 100. For the bias correction I use a uniform kernel and a bandwidth of h = 1/(ln(n) n^{1/4}). The results of the Monte Carlo exercises are presented in Tables 1-4.
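The design above can be sketched as follows; this is my own reconstruction of one simulation draw, not the paper's code:

```python
import numpy as np

def simulate(n, J, gamma0, gamma1, sigma, rng):
    """One draw from the Monte Carlo design of Section 5 (probit items)."""
    X = rng.binomial(1, 0.5, n).astype(float)
    # mu_x and sigma_theta chosen so that E(theta) = 0, Var(theta) = 1, and the
    # mean difference across X groups is 1: mu_1 = 0.5, mu_0 = -0.5, Var = 0.75
    theta = rng.normal(X - 0.5, np.sqrt(0.75), n)
    eps = rng.normal(0.0, 1.0, n)
    Y = 0.25 * X + 0.5 * theta + eps
    eta = rng.normal(0.0, 1.0, (n, J))
    M = (gamma0[None, :] + gamma1[None, :] * theta[:, None] >= sigma * eta).astype(float)
    return Y, X, M
```

For example, Model 1 below corresponds to gamma0 = np.zeros(J), gamma1 = np.ones(J), and sigma = 2.0.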
The results in Table 1 were generated from a model with γ0j = 0, γ1j = 1 and σ = 2. These
results show that the finite sample properties of both the uncorrected estimator and the bias-
corrected estimator are well approximated by the asymptotic theory. The feasible OLS estimator
exhibits a finite sample bias that represents a substantial contribution to the RMSE. Generally the
bias is of the same order of magnitude as the standard deviation, except when J is large relative to
n. The bias correction decreases the bias with only a slight increase in the standard deviation, thus
reducing the RMSE. In some cases where J is small relative to n, the bias is on the same order of
magnitude as the standard deviation after the bias correction.
Table 2 presents results from two additional models. Model 2 is the same as Model 1 except
that γ0j = 1. In Model 3, γ1j = 1 and γ0j = 0 but σ = 1. I find that the bias-corrected estimator
dominates the uncorrected estimator in these models, as in Model 1. In Model 3, ω̄J is smaller, so the bias of the uncorrected estimator is smaller, as expected. I find that the bias of the corrected estimator is substantially smaller in Model 3 as well, except when J is 50 or 100.
In Models 1-3, the item response functions are identical across j. Table 3 presents results from
models where γ0j or γ1j varies across j. Model 4 is the same as Model 1 except that the γ0j are
spaced equidistantly between −3 and 3 for each J . Model 5 is the same as Model 1 except that
the γ1j = 2δj where δj are spaced equidistantly between −2 and 2. Models 6 and 7 are the same as
Table 1. Monte Carlo simulation results for Model 1

                    feasible OLS            feasible OLS with bias correction
  J      n       Bias    SD    RMSE          Bias    SD    RMSE       RMSE improvement
  5      50      1.17   1.29   1.74          0.79   1.41   1.61            0.13
         100     1.15   0.92   1.47          0.75   0.99   1.24            0.23
         500     1.15   0.40   1.22          0.76   0.43   0.87            0.35
         1000    1.16   0.29   1.19          0.76   0.30   0.82            0.37
         5000    1.15   0.13   1.16          0.76   0.14   0.77            0.39
  10     50      0.78   1.30   1.51          0.38   1.39   1.44            0.07
         100     0.82   0.92   1.23          0.41   0.98   1.06            0.17
         500     0.82   0.40   0.91          0.38   0.42   0.57            0.34
         1000    0.81   0.28   0.86          0.37   0.30   0.48            0.38
         5000    0.81   0.13   0.82          0.37   0.14   0.40            0.42
  25     50      0.43   1.30   1.37          0.10   1.37   1.38            0.00
         100     0.42   0.90   1.00          0.10   0.95   0.95            0.04
         500     0.43   0.40   0.59          0.11   0.42   0.43            0.15
         1000    0.43   0.28   0.51          0.10   0.30   0.31            0.20
         5000    0.42   0.13   0.44          0.10   0.13   0.17            0.28
  50     50      0.22   1.30   1.31          0.01   1.33   1.33           -0.02
         100     0.22   0.94   0.97          0.01   0.96   0.97            0.00
         500     0.24   0.41   0.47          0.03   0.42   0.42            0.05
         1000    0.23   0.29   0.37          0.02   0.29   0.30            0.07
         5000    0.23   0.13   0.27          0.03   0.13   0.13            0.13
  100    50      0.09   1.31   1.31         -0.03   1.33   1.33           -0.02
         100     0.11   0.93   0.93         -0.01   0.94   0.94           -0.01
         500     0.11   0.41   0.42          0.00   0.41   0.41            0.01
         1000    0.11   0.28   0.30          0.00   0.28   0.28            0.02
         5000    0.12   0.12   0.17          0.00   0.13   0.13            0.05

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
Table 2. Monte Carlo simulation results for β2, comparison of Models 1-3

           bias of uncorrected      bias of corrected       RMSE improvement
  J       n=100   500   1000      n=100   500   1000      n=100   500   1000

Model 1
  5        1.15  1.15   1.16       0.75  0.76   0.76       0.23  0.35   0.37
  10       0.82  0.82   0.81       0.41  0.38   0.37       0.17  0.34   0.38
  25       0.42  0.43   0.43       0.10  0.11   0.10       0.04  0.15   0.20
  50       0.22  0.24   0.23       0.01  0.03   0.02       0.00  0.05   0.07
  100      0.11  0.11   0.11      -0.01  0.00   0.00      -0.01  0.01   0.02

Model 2
  5        1.20  1.20   1.20       0.82  0.83   0.83       0.23  0.34   0.36
  10       0.88  0.87   0.87       0.48  0.45   0.44       0.18  0.34   0.38
  25       0.48  0.49   0.49       0.15  0.17   0.17       0.06  0.18   0.23
  50       0.28  0.30   0.29       0.07  0.09   0.08       0.01  0.08   0.10
  100      0.17  0.17   0.17       0.05  0.05   0.05       0.00  0.03   0.04

Model 3
  5        0.63  0.62   0.63       0.32  0.30   0.31       0.10  0.22   0.26
  10       0.37  0.38   0.39       0.15  0.14   0.15       0.03  0.12   0.16
  25       0.21  0.20   0.20       0.08  0.07   0.07       0.00  0.03   0.05
  50       0.13  0.13   0.13       0.06  0.06   0.06       0.00  0.01   0.02
  100      0.11  0.09   0.09       0.08  0.05   0.05       0.00  0.00   0.01

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
Models 4 and 5, respectively, except that σ = 1. I find the same overall patterns across all models.
The bias correction practically eliminates the bias when either σ is small or J is large. A
substantial bias remains in a few cases where J is 5 or 10 and σ is greater than 1. Even in these
cases the bias is much smaller than the bias of the OLS estimator. On the other hand, the variance
of the bias-corrected estimator is higher across the board, in some cases close to twice the variance
of the OLS estimator. Overall the mean squared error is lower for the bias-corrected estimator
in every simulation but one, and in that one simulation, which had the smallest OLS bias, the
difference was negligible. The bias correction reduces the MSE by as much as 80%.
These results are what one would expect given the asymptotic results in the paper. While the
bias correction should not increase the variance asymptotically, a moderately higher finite
sample variance is to be expected. Meanwhile, the asymptotics suggest that the bias should be
reduced but not eliminated by the bias correction and that the remaining bias should be decreasing
in the number of items.
As I have noted, in some cases a non-negligible bias remains even when n = 5000 and J = 100
because the model is misspecified. The first column of Table 4 shows results from three of the models
already discussed, plus Model 4', which differs from Model 4 in that the γ0j are spaced equidistantly on
a narrower range, from −1 to 1. The second and third columns show results from these same four
models except that σ, the variance of the error in the latent index for each binary item, is decreased.
A smaller σ represents what psychologists call a more discriminating item. The smaller σ is, the
more the response is driven by the trait, θi, and the smaller ωj is. Table 4 only presents results for
n = 1000 and J = 10.
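The relationship between discrimination and ωj can be checked numerically. Assuming a probit-type item response function pj(θ) = Φ((γ0j + γ1jθ)/σ) with θ ∼ N(0, 1) (the paper leaves pj nonparametric, so this functional form is purely illustrative), ωj = E[pj(θ)(1 − pj(θ))] shrinks as σ shrinks:

```python
import math
import numpy as np

# omega_j = E[p_j(theta)(1 - p_j(theta))] under an assumed probit link
# p_j(theta) = Phi((gamma0 + gamma1*theta)/sigma); this link is an
# illustrative assumption, not the paper's (nonparametric) p_j.

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def omega(gamma0, gamma1, sigma, n_draws=200_000, seed=0):
    theta = np.random.default_rng(seed).standard_normal(n_draws)
    p = Phi((gamma0 + gamma1 * theta) / sigma)
    return float(np.mean(p * (1.0 - p)))

# A more discriminating item (smaller sigma) has a smaller omega_j
for s in (2.0, 1.0, 0.5):
    print(f"sigma = {s}: omega = {omega(0.0, 1.0, s):.3f}")
```

As σ → 0 the response is determined by θ alone, so p(1 − p) collapses toward 0; as σ grows, p stays near 1/2 and p(1 − p) approaches its maximum of 1/4.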
Table 3. Monte Carlo simulation results for β2, comparison of Models 1, 4-7

           bias of uncorrected      bias of corrected       RMSE improvement
  J       n=100   500   1000      n=100   500   1000      n=100   500   1000

Model 1
  5        1.15  1.15   1.16       0.75  0.76   0.76       0.23  0.35   0.37
  10       0.82  0.82   0.81       0.41  0.38   0.37       0.17  0.34   0.38
  25       0.42  0.43   0.43       0.10  0.11   0.10       0.04  0.15   0.20
  50       0.22  0.24   0.23       0.01  0.03   0.02       0.00  0.05   0.07
  100      0.11  0.11   0.11      -0.01  0.00   0.00      -0.01  0.01   0.02

Model 4
  5        1.32  1.31   1.31       1.01  1.00   1.00       0.21  0.28   0.29
  10       0.95  0.95   0.94       0.54  0.52   0.51       0.20  0.36   0.39
  25       0.50  0.52   0.52       0.14  0.16   0.15       0.07  0.20   0.25
  50       0.28  0.29   0.29       0.03  0.04   0.04       0.01  0.08   0.11
  100      0.14  0.15   0.15       0.00  0.00   0.00      -0.01  0.02   0.03

Model 5
  5        1.13  1.11   1.12       0.73  0.71   0.72       0.23  0.35   0.38
  10       0.76  0.78   0.78       0.35  0.35   0.35       0.15  0.33   0.37
  25       0.41  0.40   0.40       0.10  0.10   0.09       0.04  0.14   0.18
  50       0.22  0.22   0.22       0.02  0.03   0.02       0.00  0.05   0.07
  100      0.13  0.11   0.11       0.02  0.00   0.00       0.00  0.01   0.02

Model 6
  5        0.92  0.92   0.92       0.75  0.75   0.75       0.10  0.15   0.16
  10       0.54  0.57   0.56       0.25  0.27   0.26       0.08  0.20   0.24
  25       0.26  0.25   0.25       0.05  0.05   0.05       0.01  0.06   0.09
  50       0.11  0.12   0.13      -0.01  0.00   0.00      -0.01  0.01   0.02
  100      0.05  0.06   0.06      -0.02  0.00  -0.01      -0.01  0.00   0.00

Model 7
  5        0.65  0.66   0.65       0.33  0.34   0.33       0.11  0.23   0.27
  10       0.41  0.41   0.41       0.18  0.16   0.16       0.04  0.13   0.16
  25       0.25  0.23   0.23       0.12  0.10   0.10       0.01  0.04   0.06
  50       0.15  0.16   0.15       0.08  0.09   0.08       0.00  0.02   0.02
  100      0.11  0.12   0.11       0.07  0.08   0.08       0.00  0.01   0.01

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
Table 4. Monte Carlo simulation results for β2 -- comparing biases

                              σ = 2    σ = 1    σ = 0.5
  Model 1   feasible           0.12     0.09      0.27
            bias corrected     0.00     0.05      0.26
            infeasible        -0.01     0.05      0.26
  Model 4   feasible           0.15     0.06      0.03
            bias corrected     0.00    -0.01     -0.01
            infeasible        -0.01    -0.01     -0.01
  Model 4'  feasible           0.12     0.07      0.13
            bias corrected     0.00     0.03      0.11
            infeasible        -0.01     0.03      0.11
  Model 5   feasible           0.12     0.11      0.27
            bias corrected     0.01     0.07      0.25
            infeasible         0.00     0.07      0.25

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
The results in Table 4 first demonstrate that the bias correction works, in the sense that the bias of
the bias-corrected estimator is equal to the bias of the infeasible estimator that uses pJ(θi) as a
covariate. The results also show that the misspecification bias, that is, the bias of the infeasible
estimator, can be larger in some cases. Specifically, a substantial misspecification bias typically
arises as σ decreases. However, Model 4 shows that, even with discriminating items, if there
is a wide enough range in the difficulty of the items then the misspecification bias remains minimal.
6 Application: The effect of education and ability on labor market
outcomes
In this section I conduct an empirical exercise to demonstrate the use of the methods proposed in
this paper. I make use of data from the National Longitudinal Survey of Youth 1979 to investigate
the effect of education on earnings.
It is well known in the extensive literature on the returns to education that failing to account for
an individual’s ability in a wage regression will lead to a positive ability bias in the estimate of the
return to education. Becker (1967), for example, showed how this bias would arise using a model
of investment in human capital where the marginal benefit of education is increasing in ability.7
Rosen (1977) and, more recently, Carneiro, Heckman, and Vytlacil (2011) have demonstrated
more generally that a selection bias can arise in a hedonic model of the return to human capital
investment.
Several different approaches to overcoming the endogeneity of schooling have been considered.
One approach is to use earnings data on identical twins to control for genetic and environmen-
tal components of ability that are common between twins who make different education choices
(Behrman, Taubman, and Wales, 1977; Ashenfelter and Krueger, 1994). Typical estimates are
30% smaller than OLS estimates (Ashenfelter and Rouse, 1998). Measurement error in education,
combined with remaining within-family variation in ability for twins, makes it difficult to determine
to what extent these estimates reduce the ability bias in estimates of the effect of education on
earnings (Ashenfelter and Krueger, 1994; Card, 1999; Neumark, 1999).8 Another approach uses
instrumental variables. Card (1999) provides an extensive review of IV estimates of the effect of education
on earnings. These estimates are typically higher, sometimes much higher, than OLS estimates.
Moreover, the choice of the instrument affects the interpretation of the estimate of the effect of
education (Heckman, Lochner, and Todd, 2006; Heckman, Urzua, and Vytlacil, 2006).
A third approach aims to control directly for the unobserved ability that plagues OLS estimates.
While some economists have used IQ scores and performance on achievement tests as proxies
for unobserved ability, it has long been recognized that doing so introduces a measurement error
bias that tends to bias the estimated effect of education upward. Thus structural
equation models – a combination of errors-in-variables models, factor models, and simultaneous
equations models – have been employed to account for the fact that these tests are noisy measures of ability.
This approach is typified by studies by Griliches and Mason (1972), Chamberlain (1977), and
7Heckman, Lochner, and Todd (2006) point out that the problem of ability bias in this context had been recognized by economists as far back as Noyes (1945).
8The Minnesota Twin Family Study provides evidence that twins who made different education choices differ significantly in verbal IQ, personality, and GPA (Stanek, Iacono, and McGue, 2011).
Blackburn and Neumark (1993). Some more recent work (Carneiro, Hansen, and Heckman, 2003;
Heckman, Stixrud, and Urzua, 2006) incorporates measures of ability while also using exclusion
restrictions to aid in identification. One common criticism of this approach is that these models
require normalizations for identification which are sometimes perceived as arbitrary, in the sense
that they are not motivated by an economic model. My analysis in this section avoids these
normalizations using the methods developed in this paper.
6.1 Data
The data used in this section come from the National Longitudinal Survey of Youth 1979 (NLSY79). The NLSY79
has a broad set of information on demographics, educational outcomes, labor market outcomes,
health, and criminal behavior for a panel of individuals over thirty years. The respondents were
first interviewed in 1979 when they were between 14 and 22 years old. The respondents were
reinterviewed each subsequent year until 1994, after which point they were interviewed on a biennial
basis. The analysis is restricted to 30-year-old white males who reported working at least 35 hours
per week on average. Table 5 reports summary statistics for this subsample.
As part of the survey, 11,914 respondents (94% of the sample) were administered the
Armed Services Vocational Aptitude Battery (ASVAB), which comprises ten subtests. Both raw
scores and scale scores are reported for each subtest. The Armed Forces Qualifying Test (AFQT),
a composite of four of these subtests, is also reported. Recently the individual item responses for
these four subtests – mathematical knowledge, arithmetic reasoning, paragraph comprehension,
and word knowledge – have been made available in a new data release.9 In addition, individuals were
administered two separate personality assessments, the Rosenberg Self Esteem scale and the Rotter Locus
of Control scale, which consist of 10 and 4 binary10 items, respectively. These data are summarized in
Table 5 and Table 7 below.
9See Schofield (2014) for an initial analysis of this newly released data.
10Respondents are asked whether they weakly agree, strongly agree, weakly disagree, or strongly disagree with different statements. I convert responses to a binary agree or disagree classification.
Table 5. Descriptive Statistics

                                      mean    std. dev.    min.      max.
Highest grade completed              13.83      2.45          7        20
Average weekly wage                 967.14    523.66       2.98   3764.71
Average hours worked per week        47.42      9.64         35     97.85
Urban residence                       0.75      0.44          0         1
Married                               0.63      0.48          0         1
Number of children in household       0.87      1.07          0         5
Urban residence, at age 14            0.75      0.44          0         1
Father's highest grade completed     12.61      3.26          0        20
Mother's highest grade completed     12.20      2.26          3        20
ASVAB - math. knowledge               0.64      0.25       0.04         1
ASVAB - paragraph comp.               0.78      0.18       0.07         1
ASVAB - word knowledge                0.82      0.16       0.11         1
ASVAB - arith. reasoning              0.71      0.23       0.10         1
Rosenberg Self Esteem                 0.95      0.09       0.30         1
Rotter Locus of Control               0.69      0.26          0         1

Notes: The sample consists of n=1,157 white males from the NLSY79. Highest grade completed corresponds to the highest level reported by the individual. The average weekly wage is in 2010 dollars. The parent's education is reported at the initial interview in 1979. The scores reported for the ASVAB subtests and the Rosenberg and Rotter scales are the average responses as described in the text.
6.2 Results
In this section I estimate an earnings regression that controls for various dimensions of cognitive
and noncognitive ability. I start by estimating the following feasible regression model
Yi = β0 + β1Si + Σ_{k=1}^{6} β3,kMki + γ′Xi + ui

where Yi is the log of individual i's average weekly wages, Si represents education level, and Xi
is a vector of additional controls, including year dummies. For k = 1, . . . , 4, Mki is the score
(% answered correctly) on subtest k. For k = 5, 6, Mki represents the average response on the
Rosenberg and Rotter surveys after items are recoded so that a 1 is associated with higher self
esteem and a more internal locus of control, respectively. Theorem 4.1 implies that the OLS
estimator from this regression is consistent for the estimand of the infeasible regression

Yi = β0 + β1Si + Σ_{k=1}^{6} β3,kpk(θi) + γ′Xi + ui
The results from two different specifications of the feasible regression are reported in columns
(2) and (5) of Table 6. Column (2) estimates the regression with Si equal to the highest grade
completed. The estimated effect of an additional year of education is 4.2%. This is substantially
smaller than the 6.6% obtained in column (1) from a regression that excludes all six scores, Mki.
This provides evidence of a substantial ability bias. While both personality measures are statis-
tically significant, among the cognitive measures, only mathematical knowledge has a statistically
significant effect.
Table 6. The effect of education on earnings, white males.

                                    (1)       (2)       (3)       (4)       (5)       (6)
Highest Grade Completed           0.066     0.042     0.036
                                 (0.009)   (0.011)   (0.011)
College                                                         0.310     0.217     0.200
                                                               (0.042)   (0.047)   (0.047)
ASVAB - Math. Knowledge                     0.353     0.501               0.345     0.470
                                           (0.137)   (0.137)             (0.131)   (0.131)
ASVAB - Paragraph Comprehension             0.175     0.321               0.170     0.312
                                           (0.145)   (0.145)             (0.147)   (0.147)
ASVAB - Word Knowledge                     -0.233    -0.406              -0.264    -0.449
                                           (0.183)   (0.183)             (0.183)   (0.183)
ASVAB - Arithmetic Reasoning               -0.006    -0.139               0.044    -0.073
                                           (0.157)   (0.157)             (0.157)   (0.157)
Rosenberg Self Esteem                       0.613     0.878               0.648     0.924
                                           (0.229)   (0.229)             (0.225)   (0.225)
Rotter Locus of Control                     0.167     0.295               0.173     0.304
                                           (0.079)   (0.079)             (0.077)   (0.077)

Notes: The regressions are based on a sample of n=1,157 white males from the NLSY79. The regressions controlled for urban residence, regional dummies, mother's and father's educational level, urban residence at age 14, and year dummies. Standard errors are in parentheses. Columns (3) and (6) implement the bias correction as discussed in the text.
The specification reported in column (5) has Si equal to a dummy for whether the individual
has attended college at some level. The estimated effect of college attendance is 21.7%. Again,
this is substantially smaller than the 31% obtained in column (4) from a regression that excludes
all six scores, Mki, due to ability bias. The coefficients on the six ability measures are similar to
the coefficients in regression specification (2), with virtually no change in the precision with which
they are estimated.
The number of items on each test ranges from 4 for the Rotter locus of control assessment to
35 for the word knowledge component of the ASVAB. I implement the procedure from Section 4 to
compute a bias correction. Table 7 reports the estimates ω̂k for each k. These first-stage estimates are
then used to construct a bias correction factor. The bias-corrected estimates, reported in columns
(3) and (6) of Table 6, suggest a non-negligible bias in the estimates of the coefficients on the ability
measures. In fact, two of the three test scores that were insignificant in the feasible regression are
statistically significant after the bias correction. More importantly, the estimates of the education
effect are smaller (by ∼10%) after the bias correction.
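Given a first-stage estimate ω̂, the correction itself is a simple plug-in computation. Below is a hedged sketch for the single-score case (the application above uses six scores with different item counts, which this simplifies away); the placement of the nonzero bias entry, -(ω/J) times the score coefficient in the score position, is my reading of the asymptotic derivation in the appendix, not a formula quoted from the paper.

```python
import numpy as np

# Hedged sketch of a plug-in bias correction for a single aggregate score.
# Interpretation (mine, not verbatim from the paper): E(xi_iJ) has one
# nonzero entry, -(omega_J/J)*beta_score, in the score position, so the
# leading bias of feasible OLS is -Mhat^{-1} e (omega_hat/J) beta_score,
# which we subtract off using the feasible estimate as a plug-in.

def bias_corrected_ols(Z, Y, score_col, omega_hat, J):
    """Z: n x p design matrix (intercept included); score_col indexes the score."""
    n = Z.shape[0]
    Mhat = Z.T @ Z / n
    beta_hat = np.linalg.solve(Mhat, Z.T @ Y / n)   # feasible OLS
    e = np.zeros(Z.shape[1])
    e[score_col] = 1.0
    # estimated bias vector: -Mhat^{-1} e (omega_hat/J) beta_hat[score_col]
    bias_hat = -np.linalg.solve(Mhat, e) * (omega_hat / J) * beta_hat[score_col]
    return beta_hat - bias_hat
```

For a positive score coefficient the correction pushes the coefficient on the score away from zero, offsetting the attenuation that the feasible regression exhibits.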
Table 7. Descriptive Statistics - Assessments

                             items    mean    std. dev.     ω̂
ASVAB - Math. knowledge        25     0.64      0.25       0.15
ASVAB - Paragraph comp.        15     0.78      0.18       0.14
ASVAB - Word knowledge         35     0.82      0.16       0.10
ASVAB - Arith. reasoning       30     0.71      0.23       0.14
Rosenberg Self Esteem          10     0.95      0.09       0.04
Rotter Locus of Control         4     0.69      0.26       0.19

Notes: The sample consists of n=1,157 white males from the NLSY79. The last column reports estimates of ω, calculated as discussed in the text.
7 Conclusion
This paper shows how the measurement error bias resulting from the use of a test score as a
regressor can be eliminated, allowing for valid statistical inference. The asymptotic results for both
the uncorrected least squares estimator and the bias-corrected estimator are new to the literature,
though similar bias correction procedures have been suggested for nonlinear panel data models
with fixed effects. The importance of these results is demonstrated by the Monte Carlo studies in
Section 5 and in the empirical exercise in Section 6. These methods are easy to implement and
provide a useful tool for taking advantage of the full item-level data that are increasingly available to
social science researchers. Extending these bias correction procedures to other econometric models
is an important avenue for future research.
Acknowledgements: This paper has benefited from discussion with Robert Phillips and from
correspondence with Dan Black and Lynne Schofield.
References
Aigner, D. J., C. Hsiao, A. Kapteyn, and T. Wansbeek (1984): “Latent variable models in
econometrics,” in Handbook of Econometrics, ed. by Z. Griliches, and M. D. Intriligator, vol. 2
of Handbook of Econometrics, chap. 23, pp. 1321–1393. Elsevier.
Arellano, M., and S. Bonhomme (2009): “Robust priors in nonlinear panel data models,”
Econometrica, 77(2), 489–536.
Ashenfelter, O., and A. Krueger (1994): “Estimates of the economic return to schooling from
a new sample of twins,” American Economic Review, 84, 1157–1183.
Ashenfelter, O., and C. Rouse (1998): “Income, schooling, and ability: Evidence from a new
sample of identical twins,” Quarterly Journal of Economics, 113, 253–284.
Becker, G. (1967): Human capital and the personal distribution of income: An analytical ap-
proach. University of Michigan Press, Ann Arbor, MI.
Behrman, J., P. Taubman, and T. J. Wales (1977): “Controlling for and measuring the effects
of genetics and family environment in equations for schooling and labor market success,” in
Kinometrics: The determinants of socioeconomic success within and between families, ed. by
P. Taubman, pp. 35–96. North-Holland, Amsterdam.
Bekker, P. A. (1994): “Alternative approximations to the distributions of instrumental variable
estimators,” Econometrica: Journal of the Econometric Society, pp. 657–681.
Blackburn, M. L., and D. Neumark (1993): “Omitted-ability bias and the increase in the
return to schooling,” Journal of Labor Economics, 11(3), 521–544.
Bollinger, C. R. (2003): “Measurement error in human capital and the black-white wage gap,”
Review of Economics and Statistics, 85(3), 578–585.
Card, D. (1999): “The causal effect of education on earnings,” Handbook of labor economics, 3,
1801–1863.
Carneiro, P., K. T. Hansen, and J. J. Heckman (2003): “2001 Lawrence R. Klein Lecture:
Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling
and Measurement of the Effects of Uncertainty on College Choice,” International Economic
Review, 44(2), 361–422.
Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011): “Estimating Marginal Returns to
Education,” The American Economic Review, 101(6), 2754–2781.
Chamberlain, G. (1977): “Education, income, and ability revisited,” Journal of Econometrics,
5(2), 241 – 257.
Chamberlain, G., and Z. Griliches (1975): “Unobservables with a Variance-Components Struc-
ture: Ability, Schooling, and the Economic Success of Brothers,” International Economic Review,
16(2), 422–449.
Chesher, A. (1991): “The effect of measurement error,” Biometrika, 78(3), 451–462.
Cunha, F., J. J. Heckman, and S. M. Schennach (2010): “Estimating the technology of
cognitive and noncognitive skill formation,” Econometrica, 78(3), 883–931.
Douglas, J. (1997): “Joint consistency of nonparametric item characteristic curve and ability
estimation,” Psychometrika, 62, 7–28.
Frisch, R. (1934): Statistical confluence analysis by means of complete regression systems. Oslo:
Universitetets Økonomiske Institutt.
Goldberger, A. S. (1972): “Structural Equation Methods in the Social Sciences,” Econometrica,
40(6), 979–1001.
Griliches, Z., and W. M. Mason (1972): “Education, income, and ability,” Journal of Political
Economy, 80, S74–S103.
Hahn, J., and G. Kuersteiner (2002): “Asymptotically unbiased inference for a dynamic panel
model with fixed effects when both n and T are large,” Econometrica, 70(4), 1639–1657.
Heckman, J., R. Pinto, and P. Savelyev (2013): “Understanding the Mechanisms Through
Which an Influential Early Childhood Program Boosted Adult Outcomes,” The American eco-
nomic review, 103(6), 2052–2086.
Heckman, J. J., L. J. Lochner, and P. E. Todd (2006): “Earnings functions, rates of re-
turn and treatment effects: The Mincer equation and beyond,” Handbook of the Economics of
Education, 1, 307–458.
Heckman, J. J., J. Stixrud, and S. Urzua (2006): “The Effects of Cognitive and Noncognitive
Abilities on Labor Market Outcomes and Social Behavior,” Journal of Labor Economics, 24(3),
411–482.
Heckman, J. J., S. Urzua, and E. Vytlacil (2006): “Understanding instrumental variables in
models with essential heterogeneity,” The Review of Economics and Statistics, 88(3), 389–432.
Junker, B., L. S. Schofield, and L. J. Taylor (2012): “The use of cognitive ability measures
as explanatory variables in regression analysis,” IZA Journal of Labor Economics, 1(1), 1–19.
Klepper, S., and E. E. Leamer (1984): “Consistent sets of estimates for regressions with errors
in all variables,” Econometrica: Journal of the Econometric Society, pp. 163–183.
Lord, F., and M. Novick (1968): Statistical theories of mental test scores. Addison-Wesley.
Neumark, D. (1999): “Biases in twin estimates of the return to schooling,” Economics of Educa-
tion Review, 18(2), 143–148.
Noyes, C. (1945): “Director’s Comment,” in Income from Independent Professional Practice, ed.
by M. Friedman, and S. S. Kuznets, pp. 405–410. National Bureau of Economic Research, New
York.
Ramsay, J. (1991): “Kernel smoothing approaches to nonparametric item characteristic curve
estimation,” Psychometrika, 56, 611–630.
Rosen, S. (1977): “Human capital: A survey of empirical research,” in Research in Labor Eco-
nomics, ed. by R. Ehrenberg, vol. 1, pp. 3–40. JAI Press, Greenwich, CT.
Schofield, L. S. (2014): “Measurement error in the AFQT in the NLSY79,” Economics Letters,
123(3), 262 – 265.
Schofield, L. S., B. Junker, L. J. Taylor, and D. A. Black (2014): “Predictive inference
using latent variables with covariates,” Psychometrika, pp. 1–21.
Stanek, K. C., W. G. Iacono, and M. McGue (2011): “Returns to education: what do twin
studies control?,” Twin Research and Human Genetics, 14(06), 509–515.
Stout, W. F. (1990): “A new item response theory modeling approach with applications to
unidimensionality assessment and ability estimation,” Psychometrika, 55(2), 293–325.
Williams, B. (2013): “A Measurement Model with Discrete Measurements and Continuous Latent
Variables,” unpublished manuscript, George Washington University.
(2014): “Identification of a Nonseparable Model with Item Response Data,” Working
paper, George Washington University.
(2015): “A Model with Regressors Generated from Many Measurements with an Appli-
cation to Item Response Data,” Working paper, George Washington University.
Wooldridge, J. M. (2009): “On estimating firm-level production functions using proxy variables
to control for unobservables,” Economics Letters, 104(3), 112–114.
A Proofs
Throughout this appendix I will use the notation a_{n,J} ≲ b_{n,J} for sequences of random variables
{a_{n,J}}_{n,J→∞} and {b_{n,J}}_{n,J→∞} to mean that a_{n,J} ≤ κ b_{n,J} for all n, J, where κ is nonrandom and
does not vary with n or J. I use the matrix norm ||A|| := ||A||_2 = max_{v≠0} |Av|_2/|v|_2 for a matrix
with l columns, where |·|_2 is the usual Euclidean distance (l_2 norm) for a finite-dimensional vector
space. This norm is also equal to the largest singular value of the matrix A. Because the relevant
matrices operate on finite-dimensional vector spaces, all matrix norms are equivalent. I will use this
fact along with the Frobenius norm, ||A||_Frob. := (Σ_{i,j} A_{i,j}²)^{1/2}.
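The norm equivalence invoked here can be illustrated numerically. For any matrix with l columns, ||A||_2 ≤ ||A||_Frob. ≤ √l ||A||_2; the constants depend only on the (fixed) dimension, which is what the appendix exploits. A small check:

```python
import numpy as np

# Equivalence of the operator (spectral) and Frobenius norms in fixed
# dimension: ||A||_2 <= ||A||_Frob <= sqrt(l) * ||A||_2 for an l x l matrix.
rng = np.random.default_rng(1)
l = 4
A = rng.standard_normal((l, l))

spectral = np.linalg.norm(A, 2)       # largest singular value of A
frobenius = np.linalg.norm(A, "fro")  # sqrt of the sum of squared entries

assert spectral <= frobenius <= np.sqrt(l) * spectral
print(spectral, frobenius)
```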
First I need to show that Assumption 3.2 implies the following set of higher level regularity
conditions.

Assumption A.1. There exists J0 such that for all J ≥ J0,

(i) |βJ| ≤ b < ∞.

(ii) ||MJ|| ≤ λ̄ < ∞ and λmin(MJ) ≥ λ > 0.

(iii) λmin(ΨJ) ≥ λ > 0.

(iv) E||WiJ W′iJ − E(WiJ W′iJ)||² ≤ C < ∞.

(v) E(|WiJ uiJ|^{2+δ}) ≤ D < ∞.
Lemma A.1. Under Assumption 3.1, Assumption 3.2 implies Assumption A.1.

Proof. First, |βJ| ≤ ||M*J^{-1}|| |E(W*i Yi)| ≤ (1/λmin(M*J)) √(E(|WiYi|²)) ≤ √(D2)/λ by Assumptions 3.2(i)
and (iv).

Next, WiJ uiJ = (W*iJ + (WiJ − W*iJ))(εiJ − (WiJ − W*iJ)′βJ) so that

|WiJ uiJ|^{2+δ} ≲ 1 + |W*iJ εiJ|^{2+δ} + |W*iJ|^{2+δ}

since |WiJ − W*iJ| = |MiJ − pJ(θi)| ≤ 1. Then since |W*iJ εiJ|^{2+δ} ≲ |W*iJ Yi|^{2+δ} + |W*iJ W*′iJ|^{2+δ} |βJ|^{2+δ},
condition (v) in Assumption A.1 is implied by Assumptions 3.2(iii) and (iv) and the previous result
that |βJ| ≤ b < ∞.

Next, under Assumption 3.1, MJ = M*J + EJ where EJ is a matrix with zeros everywhere except
in the (K+2, K+2) position, where it is equal to E(MiJ²) − E(pJ(θi)²). By the triangle inequality,
||MJ|| ≤ ||M*J|| + ||EJ||, and by Weyl's theorem λmin(MJ) ≥ λmin(M*J) + λmin(EJ). Then note that ||M*J|| ≲
E(||W*iJ W*′iJ||^{2+δ}). Also, E(MiJ²) − E(pJ(θi)²) = J^{-1}ωJ > 0 implies that ||EJ|| = J^{-1}ωJ < J^{-1}
and λmin(EJ) = 0. Therefore, Assumption A.1(ii) follows from 3.2(i) and (iii).
Next, by matrix norm equivalence,

||WiJ W′iJ − E(WiJ W′iJ)||² ≲ ||W*iJ W*′iJ − E(W*iJ W*′iJ)||²_Frob. + (MiJ − E(MiJ))²
    + Σk (MiJ Xik − E(MiJ Xik))² + (MiJ² − E(MiJ²))²

Since 0 < MiJ < 1, the second and fourth terms are bounded and the third term is bounded
by the first. Since ||W*iJ W*′iJ − E(W*iJ W*′iJ)||_Frob. ≲ ||W*iJ W*′iJ − E(W*iJ W*′iJ)|| and ||W*iJ W*′iJ −
E(W*iJ W*′iJ)||² ≲ ||W*iJ W*′iJ||² + ||M*J||², it follows that condition (iv) in Assumption A.1 is implied
by Assumption 3.2(iii).
Finally, it can be shown that Assumptions 3.1 and 3.2 imply that ΨJ = Ψ̄J + F1J where
Ψ̄J = E(W*iJ W*′iJ εiJ²) and F1J is a matrix with entries f^{l,k}_J such that max_{l,k} |f^{l,k}_J| = O(J^{-1}). As
a result, due to e.g. Gershgorin's theorem, the minimum eigenvalue of F1J is also O(J^{-1}), which
implies that for any ε > 0 there is a Jε such that for J ≥ Jε, λmin(F1J) > −ε. Therefore, by Weyl's
theorem, λmin(ΨJ) is bounded away from 0 since λmin(Ψ̄J) is. Furthermore, Ψ̄J = Ψ*J + F2J where F2J =
E(W*iJ W*′iJ (E(Yi | W*iJ) − β′J W*iJ)) is positive definite and hence has all positive eigenvalues. Weyl's
theorem can be applied again to conclude that the minimum eigenvalue of Ψ̄J is bounded from
below by the minimum eigenvalue of Ψ*J, which is bounded away from 0 by Assumption 3.2(ii).
Lemma A.2. Under Assumption A.1, ||M̂J^{-1} − MJ^{-1}|| →p 0.

Proof. First note that ||M̂J^{-1} − MJ^{-1}|| ≤ ||M̂J^{-1}|| ||MJ^{-1}|| ||M̂J − MJ||. Next, I can use the identity
||(I − P)^{-1}|| ≤ 1/(1 − ||P||), for a matrix P such that ||P|| < 1, to show that

||M̂J^{-1}|| ≤ ||MJ^{-1}|| / (1 − ||MJ^{-1}|| ||M̂J − MJ||).

Therefore, for ε > 0,

Pr(||M̂J^{-1} − MJ^{-1}|| > ε) ≤ Pr(||M̂J − MJ|| > 1/||MJ^{-1}||)
    + Pr(||M̂J − MJ|| > ε/(||MJ^{-1}||ε + ||MJ^{-1}||²))

By Assumption A.1(ii), ||MJ^{-1}|| ≤ 1/λmin(MJ) < 1/λ < ∞. The result then follows from Assumption A.1(iv) and Chebyshev's inequality.
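The two inequalities that drive this proof can be verified numerically on an illustrative example (the matrix and perturbation below are my own, chosen only so that the small-perturbation condition holds):

```python
import numpy as np

# Numerical check of the inequalities used in the proof of Lemma A.2:
#   ||Mhat^{-1} - M^{-1}|| <= ||Mhat^{-1}|| * ||M^{-1}|| * ||Mhat - M||
#   ||Mhat^{-1}|| <= ||M^{-1}|| / (1 - ||M^{-1}|| * ||Mhat - M||)
# for a positive definite M and a small perturbation E.
rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
M = B @ B.T + 3.0 * np.eye(3)          # well-conditioned positive definite
E = 0.05 * rng.standard_normal((3, 3))
Mhat = M + E

op = lambda A: np.linalg.norm(A, 2)    # operator norm, as in the appendix
Minv, Mhatinv = np.linalg.inv(M), np.linalg.inv(Mhat)

assert op(Mhatinv - Minv) <= op(Mhatinv) * op(Minv) * op(E) + 1e-12
assert op(Minv) * op(E) < 1.0          # perturbation small enough for the bound
assert op(Mhatinv) <= op(Minv) / (1.0 - op(Minv) * op(E)) + 1e-12
```

The first inequality follows from the identity M̂^{-1} − M^{-1} = M̂^{-1}(M − M̂)M^{-1}; the second from writing M̂ = M(I + M^{-1}(M̂ − M)) and applying the Neumann series bound.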
Finally, I show that the results of Theorem 3.1 hold under Assumptions 3.1 and A.1. Combined
with Lemma A.1 this proves Theorem 3.1.

Proof. Let M̂J = n^{-1} Σ_{i=1}^n WiJ W′iJ and MJ = E(WiJ W′iJ), and define ξiJ := WiJ uiJ. Recall from
the text that β̂J − βJ = V*nJ + B*nJ where

V*nJ := M̂J^{-1} n^{-1} Σ_{i=1}^n (ξiJ − E(ξiJ))

B*nJ := M̂J^{-1} n^{-1} Σ_{i=1}^n E(ξiJ) = M̂J^{-1} E(ξiJ)

where the second equality holds because of the i.i.d. assumption. Also define VnJ := MJ^{-1} n^{-1} Σ_{i=1}^n (ξiJ −
E(ξiJ)) and BJ := MJ^{-1} E(ξiJ).
Now to show the first result,

|√n V*nJ| ≤ ||M̂J^{-1}|| |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    ≤ (||MJ^{-1}|| + ||M̂J^{-1} − MJ^{-1}||) |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|

Then √n V*nJ = Op(1) as desired since ||MJ^{-1}|| = O(1) (by Assumption A.1(ii)), ||M̂J^{-1} − MJ^{-1}|| =
op(1) (by Lemma A.2), and |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))| = Op(1). The latter holds by Chebyshev's
inequality since ξiJ − E(ξiJ) is i.i.d. by assumption and E|ξiJ − E(ξiJ)|² ≲ E|ξiJ|^{2+δ} ≲ 1 (by
Assumption A.1(v)).

Similarly,

|√n B*nJ| ≤ ||M̂J^{-1}|| |√n E(ξiJ)|
    ≤ (||MJ^{-1}|| + ||M̂J^{-1} − MJ^{-1}||) |√n E(ξiJ)|
    = (||MJ^{-1}|| + ||M̂J^{-1} − MJ^{-1}||) (√n/J) βJ,K+2 ωJ

where the equality follows from the derivation in the text. Since ωJ is a simple average of the quantities
E(pj(θi)(1 − pj(θi))), which are bounded by 1, ωJ = O(1); since √n/J and βJ,K+2 are each O(1) by
assumption, I can conclude that √n B*nJ = Op(1) as desired.
I have shown that √n(β̂J − βJ) = Op(1). To prove that Ω_{nJ}^{-1/2}(β̂J − βJ − BJ) →d N(0, I) I will
first show that (i) Ω_{nJ}^{-1/2}(V*nJ − VnJ) = op(1) and (ii) Ω_{nJ}^{-1/2}(B*nJ − BJ) = op(1). The desired result
then follows from (iii) Ω_{nJ}^{-1/2} VnJ →d N(0, I).

I first prove (i). Since

|Ω_{nJ}^{-1/2}(V*nJ − VnJ)| = |Ω_{nJ}^{-1/2}(M̂J^{-1} − MJ^{-1}) n^{-1} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    = |ΨJ^{-1/2} MJ (M̂J^{-1} − MJ^{-1}) n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    ≤ ||ΨJ^{-1/2}|| ||MJ|| ||M̂J^{-1} − MJ^{-1}|| |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    ≤ (1/√(λmin(ΨJ))) ||MJ|| ||M̂J^{-1} − MJ^{-1}|| |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|

the result follows from Assumptions A.1(ii) and A.1(iii) and Lemma A.2, as I have already shown
that |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))| = Op(1).
Next I prove (ii). As in the previous argument, note that

|Ω_{nJ}^{-1/2}(B*nJ − BJ)| ≤ (1/√(λmin(ΨJ))) ||MJ|| ||M̂J^{-1} − MJ^{-1}|| |√n E(ξiJ)|

Again the result follows since ||M̂J^{-1} − MJ^{-1}|| = op(1) by Lemma A.2 and the other terms are
bounded in probability.
Now to prove (iii), I will first apply the Lindeberg CLT (Billingsley, 1995, p. 369) to Σ_{i=1}^n νiJ
where νiJ = t′Ω_{nJ}^{-1/2} MJ^{-1}(ξiJ − E(ξiJ))/n for any fixed vector t ∈ R^{K+2}. Then I will appeal to the
Cramér-Wold device.

First note that E(νiJ) = 0 by construction. Second, by definition ΨJ = E((ξiJ − E(ξiJ))(ξiJ −
E(ξiJ))′), so E(νiJ²) = n^{-2} t′Ω_{nJ}^{-1/2} MJ^{-1} ΨJ MJ^{-1} Ω_{nJ}^{-1/2} t = n^{-1} t′t < ∞.

Now define s²n = Σ_{i=1}^n E(νiJ²) = t′t. The next step is to verify the Lyapunov condition:
(1/s_n^{2+δ}) Σ_{i=1}^n E(|νiJ|^{2+δ}) = n E(|νiJ|^{2+δ})/(t′t)^{(2+δ)/2} → 0. To verify this, note that

n E(|νiJ|^{2+δ}) = n^{-δ/2} E(|t′(ΨJ^{-1/2} MJ) MJ^{-1}(ξiJ − E(ξiJ))|^{2+δ})
    = n^{-δ/2} E(|t′ΨJ^{-1/2}(ξiJ − E(ξiJ))|^{2+δ})

So the goal is to show that E(|t′ΨJ^{-1/2}(ξiJ − E(ξiJ))|^{2+δ}) ≤ M < ∞. Noting that |t′ΨJ^{-1/2}(ξiJ −
E(ξiJ))| ≤ (|t|/√(λmin(ΨJ))) |ξiJ − E(ξiJ)|, the result follows by Assumptions A.1(iii) and A.1(v) since E|ξiJ −
E(ξiJ)|^{2+δ} ≲ E|ξiJ|^{2+δ}.