Bias correction for the use of educational and psychological
assessments as covariates in linear regression
Benjamin Williams∗
June 2, 2015
Abstract
The use of aggregate scores on item response assessments as a proxy for an underlying trait
in an econometric model generally produces biased estimators. I show that if the number of
items on the assessment is proportional to the square root of the sample size then standard
inferential procedures for the linear regression model are incorrect. I propose a bias-corrected
estimator based on nonparametric estimation of the bias term which produces valid inference
under relatively weak conditions. I also demonstrate the finite sample performance in a Monte
Carlo study and implement the procedure for a wage regression using data from the NLSY 1979.
JEL codes: C14, C38, C39, C55
Keywords: bias correction, test scores, cognitive and noncognitive ability, measurement error,
item response
∗Department of Economics, George Washington University, 2115 G St. NW, Washington DC 20052, phone: (302)994-6685, email: [email protected].
1 Introduction
Item response data consist of the discrete responses of a sample of individuals to several items
on a test or questionnaire. Cognitive ability tests (such as an IQ test), standardized academic achievement tests, personality assessments, and political opinion polls can be of this form. The
standard psychological testing theory underlying the construction of these tests treats each item as a
noisy signal of a latent variable, or variables (Lord and Novick, 1968). Typically the latent variable
is of primary interest, not the individual items or the test score. It is a common practice among
economists and other social scientists to use the aggregate score in place of the latent variable in
estimating an econometric model (see Junker, Schofield, and Taylor (2012) for a discussion). In this
paper I study the properties of this procedure and propose a strategy for correcting the bias in the
asymptotic distribution. The results use a theoretical framework where the number of items grows
asymptotically. To make the discussion more concrete, I focus on a particular linear regression
model where the scores are used as a covariate in place of the latent variable. See Williams (2013,
2014) for results regarding a more general class of models. See also Williams (2015) for related
results.
In a linear regression with classical measurement error the resulting bias is proportional to
the variance of the measurement error. There are two common methodological responses to this.
The first aims to place bounds on the variance of the measurement error (Frisch, 1934; Klepper
and Leamer, 1984). The second uses more information in the form of additional measurements,
instrumental variables, or higher order moments to eliminate the bias using two stage least squares,
method of moments, or structural equation methods (see Aigner, Hsiao, Kapteyn, and Wansbeek
(1984) for a summary). Alternatively, Chesher (1991) provides a “small variance” approximation
to the bias that can be used to perform a sensitivity analysis.
The assumptions of the standard item response model considered in this paper lead to a model of
classical measurement error in the linear regression outcome model. Both of the above approaches
to classical measurement error have been employed in empirical studies using item response data.
Bollinger (2003) uses the Klepper and Leamer (1984) bounds to study the black-white wage gap,
controlling for ability. Goldberger (1972), Chamberlain and Griliches (1975), and others have employed structural equation modeling in this setting and, more recently, more sophisticated methods
for using additional measurements in a factor analytic approach in linear and nonlinear models have
been developed (Cunha, Heckman, and Schennach, 2010; Heckman, Pinto, and Savelyev, 2013). In
addition there are many empirical studies that employ an aggregate measurement of item response
data without addressing the measurement error in the statistical analysis at all, often because data
on each individual item is not available, but also appealing to the fact that the variance of the
measurement error should be small if the number of items is large.
In the spirit of Chesher (1991), I study the bias under an asymptotic framework where J ,
the number of items, grows along with the sample size, n. Within this framework, each of the
common approaches described can be justified. The estimator will be consistent regardless of how
quickly J grows with n. If J grows sufficiently quickly relative to n then the bias can be ignored
asymptotically. However, if √n/J → γ > 0 then the OLS estimator is biased asymptotically and standard inference is invalid.1 If J grows too slowly with n then either a bounds analysis or an instrumental variable approach is necessary.2 The bias-correction procedure I propose eliminates
the bias under intermediate rates where J neither grows sufficiently quickly nor too slowly. I
consider such asymptotic sequences in order to provide a better approximation to the finite sample
behavior of the estimator. Alternative asymptotic sequences have been used similarly in other
contexts (Bekker, 1994; Hahn and Kuersteiner, 2002). Monte Carlo results demonstrate that the
asymptotic analysis presented in this paper provides a better approximation to the finite sample
behavior than the standard fixed J asymptotics.
Another approach, advocated by Schofield, Junker, Taylor, and Black (2014), is to model the individual item responses simultaneously. Their approach does not require asymptotics where J
grows because the latent factor is treated as a random effect with a known distribution. If the
distribution of the latent factor is misspecified then their results will still exhibit bias. In the
context of a related panel data model, Arellano and Bonhomme (2009) show that random effects
1 The asymptotic distribution is normal but is not centered at the true parameter value.
2 These approaches are also able to address problems such as failure of the conditional independence assumption. I am not arguing that they are only necessary if J is too small.
models can be interpreted as a bias reduction procedure relative to a fixed effect model but only if
the specified distribution happens to be from a class of bias-reducing priors. Moreover, Schofield,
Junker, Taylor, and Black (2014) require a fully parametric model which I am able to avoid. This
paper offers an alternative empirical approach which relies on a theoretical framework similar to the
nonlinear panel data framework used in Arellano and Bonhomme (2009) and elsewhere. The bias
correction derived in this paper reduces the bias from O(J^{-1}) to O(J^{-p}), where p ∈ (1, 2] depends on the quality of the bias correction, meaning that valid inference is possible if √n/J^p → 0. On
the other hand, the present approach requires nonparametric estimation and focuses on a reduced
form parameter that may be hard to motivate.3
The rest of the paper is structured as follows. Section 2 outlines the main results of the paper. In Section 3, I derive the bias of the OLS estimator, propose a bias-corrected estimator, and prove its root-n consistency and asymptotic normality. In Section 4, an extension to multiple latent factors is considered. In Section 5, I study the improved performance of the bias-corrected estimator via Monte Carlo simulations. In Section 6, the proposed methods are used to estimate the return to education controlling for various dimensions of ability. Section 7 concludes.
2 The biased OLS estimator and a bias-corrected estimator
In this section I outline the main results of the paper. In the next section I provide the details
of these results. Consider a vector of J binary response variables, or items, Mi = (Mi1, . . . ,MiJ)′
which are mutually independent conditional on θi, an unobserved variable, or factor. While the
main results of the paper assume that θi is a scalar, Section 4 briefly discusses the case where θi is
a vector.
Let M̄iJ := J⁻¹ ∑_{j=1}^J Mij denote the average response of individual i across the J items. This average response is often available when the full vector of responses, Mi, is not. Since Mi1, . . . , MiJ are mutually independent conditional on θi, the average response, M̄iJ, should not be considered a proxy variable (in the sense of Wooldridge (2009), for example). Instead, if p̄J(θi) := J⁻¹ ∑_{j=1}^J pj(θi), where pj(θi) := E(Mij | θi), then M̄iJ = p̄J(θi) + ηiJ where E(ηiJ | θi) = 0.
3 See the discussion in the next section for the latter issue. Also see the companion paper, Williams (2015), where I address this issue.
Consider the regression equation

Yi = βJ,1 + β′J,2 Xi + βJ,3 p̄J(θi) + εiJ    (1)

for a K × 1 vector of observed explanatory variables, Xi, and an outcome of interest, Yi, where E(Xi εiJ) = E(p̄J(θi) εiJ) = 0. This paper shows that the least squares estimator that uses M̄iJ as a covariate is consistent for βJ = (βJ,1, β′J,2, βJ,3)′ as n, J → ∞. Thus this paper provides a rigorous analysis of what this common approach actually estimates.
Generally the latent variable will not enter the model as specified in equation (1). If the
econometric model is
Yi = β1 + β′2 Xi + β3 g(θi) + εi,  E(εi | Xi, θi) = 0    (2)

then in general βJ − β ≠ 0 represents a misspecification bias. In this paper, by studying estimation of βJ, I separate the issue of measurement error bias from the problem of correctly specifying the regression model.
There are several reasons for addressing the measurement error bias but not the misspecification bias. When β2 is the primary object of interest, if p̄J(θi) = aJ + bJ g(θi) then βJ,2 = β2. In Monte Carlo simulations I show that when g(θi) = θi and the items are generated by a probit model, it is often the case that the misspecification bias is dominated by the measurement error bias because p̄J(·) is approximately linear in θi. Moreover, even when the misspecification error is substantial
it is in the same direction as the measurement error bias so that the bias correction proposed
in this paper provides an improvement in the mean squared error when β2 is the parameter of
interest. Addressing the misspecification bias either requires a parametric model, as in Schofield,
Junker, Taylor, and Black (2014), or calls for a more involved theoretical argument and a more
complex estimation procedure (Williams, 2015). Williams (2015) also finds that correcting the
misspecification bias introduces an additional term in the asymptotic variance.
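The near-linearity of p̄J(·) under a probit item model is easy to see numerically. The sketch below uses illustrative, assumed parameter values (intercepts spread over [−3, 3], slope 1, σ = 2 — not the paper's exact designs) and measures how well a line fits p̄J over a grid:

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# pbar_J(theta) = J^{-1} sum_j Phi((gamma0_j + theta) / sigma): probit items with
# assumed intercepts spread over [-3, 3] and sigma = 2
theta_grid = np.linspace(-2.5, 2.5, 201)
gamma0 = np.linspace(-3.0, 3.0, 25)
pbar = np.array([np.mean([Phi((g + t) / 2.0) for g in gamma0]) for t in theta_grid])

# R^2 of the best linear approximation to pbar_J over the grid
A = np.column_stack([np.ones_like(theta_grid), theta_grid])
coef, *_ = np.linalg.lstsq(A, pbar, rcond=None)
r2 = 1.0 - np.var(pbar - A @ coef) / np.var(pbar)
print(r2)  # very close to 1: pbar_J is nearly linear on this range
```

When p̄J is this close to linear in θi, the misspecification bias in βJ,2 is correspondingly small, which is the mechanism behind the Monte Carlo findings described above.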
Let β̂J denote the OLS estimator obtained by regressing Yi on Xi and M̄iJ,

β̂J = (∑_{i=1}^n WiJ W′iJ)⁻¹ ∑_{i=1}^n WiJ Yi

where WiJ = (1, X′i, M̄iJ)′. Taking the limit where n, J → ∞, under mild regularity conditions, β̂J is consistent for βJ. That is, β̂J − βJ →p 0. However, if J increases too slowly relative to n in the asymptotics then a bias shows up in the limiting distribution. I show in Section 3 that if the asymptotic sequence satisfies 0 < γ := lim √n/J < ∞ then √n(β̂J − βJ) →d N(γB, V). If the asymptotic bias, γB, is ignored, the usual t-statistics and confidence intervals will be invalid, despite the fact that β̂J is consistent.
The bias term can be written as B = − limJ→∞ βJ,3 ω̄J MJ⁻¹ eK+2, where eK+2 denotes the unit vector (0, . . . , 0, 1)′, MJ = E(WiJ W′iJ), and ω̄J = J⁻¹ ∑_{j=1}^J ωj with ωj = E(pj(θi)(1 − pj(θi))). This last object, ω̄J, is the average of the conditional variances of the individual items Mij; the variance of the measurement error is Var(ηiJ) = J⁻¹ ω̄J. The asymptotic bias can be eliminated if consistent estimation of ω̄J is possible.
If the individual items, Mi1, . . . , MiJ, are observed then pj(·) can be estimated using the nonparametric method studied in Williams (2013), which was previously proposed by Ramsay (1991) and Douglas (1997). An estimate of the function pj(·) can in turn be used to construct an estimate of ωj, for each j.4 Given estimates ω̂j of ωj for each j, I can construct ω̂J = J⁻¹ ∑_{j=1}^J ω̂j. If |ω̂J − ω̄J| →p 0 then the bias-corrected estimator,

β̂BCJ := β̂J + J⁻¹ β̂J,3 ω̂J M̂J⁻¹ eK+2,

satisfies √n(β̂BCJ − βJ) →d N(0, V). Hence standard statistical inference based on β̂BCJ will be valid. In addition, this bias correction does not lead to a loss of efficiency asymptotically.
4 In fact, because ω̄J is an average, it is sufficient to estimate a rescaled version of pj(·), which simplifies this first-step estimator.
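To fix ideas, here is a self-contained numerical sketch of the feasible OLS estimator and an oracle version of the bias correction in which the true ω̄J is plugged in. The logistic item curves, the covariate design, and all parameter values are my own illustrative assumptions, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 5000, 10

# assumed DGP: binary covariate, latent factor correlated with it, logistic items
X = rng.binomial(1, 0.5, n).astype(float)
theta = rng.normal(X - 0.5, 1.0)
Y = 0.25 * X + 0.5 * theta + rng.normal(size=n)

gamma0 = np.linspace(-1.0, 1.0, J)  # hypothetical item intercepts
p = 1.0 / (1.0 + np.exp(-(gamma0[None, :] + theta[:, None])))  # p_j(theta_i)
M = (rng.uniform(size=(n, J)) < p).astype(float)  # binary item responses
Mbar = M.mean(axis=1)                             # aggregate score

def ols(W, y):
    return np.linalg.solve(W.T @ W, W.T @ y)

# infeasible regression on pbar_J(theta): its estimand is beta_J
W_star = np.column_stack([np.ones(n), X, p.mean(axis=1)])
beta_star = ols(W_star, Y)

# feasible OLS with the aggregate score: biased for beta_J when J is modest
W = np.column_stack([np.ones(n), X, Mbar])
beta_hat = ols(W, Y)

# oracle bias correction: beta_BC = beta_hat + J^{-1} beta_hat_3 omega_bar M^{-1} e_{K+2}
omega_bar = (p * (1.0 - p)).mean()  # true omega_bar_J, known here by construction
M_hat = (W.T @ W) / n
e = np.array([0.0, 0.0, 1.0])
beta_bc = beta_hat + beta_hat[2] * omega_bar / J * np.linalg.solve(M_hat, e)
```

In this design the corrected slope on the score moves back toward the infeasible estimand; Section 3.2.1 replaces the oracle ω̄J with a kernel-based estimate.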
3 Asymptotic results
In this section I discuss the assumptions of the model and state formally the results laid out
in the previous section. I start by discussing the asymptotic distribution of the feasible OLS
estimator. Then I derive conditions under which a bias-corrected estimator is root-n consistent and asymptotically correctly centered. Finally, I propose an estimator for ω̄J and show that this can be used to construct a bias-corrected estimator satisfying these conditions.
3.1 Feasible OLS estimator
Let βJ be the vector b that minimizes MSE(b) = E(Yi − b′W*iJ)², where W*iJ = (1, X′i, p̄J(θi))′, and let εiJ = Yi − W*′iJ βJ so that equation (1) holds with E(W*iJ εiJ) = 0. Then Yi = W′iJ βJ + uiJ where uiJ = εiJ − (WiJ − W*iJ)′βJ and the vector WiJ − W*iJ is nonzero only in its last entry, where it equals ηiJ. I make the following assumptions on the model.
Assumption 3.1. There exists θi ∈ R such that
(i) For each J , Yi ⊥⊥ Mi | Xi, θi.
(ii) For each J , Mi1, . . . ,MiJ are mutually independent conditional on Xi, θi.
(iii) For any j, E(Mij | Xi, θi) = E(Mij | θi).
Under these conditions, E(WiJ uiJ) = O(J⁻¹), and hence βJ can be consistently estimated as n, J → ∞ since E(WiJ W′iJ)⁻¹ E(WiJ Yi) = βJ + O(J⁻¹). Condition (i) requires that the individual's performance on the test or questionnaire, Mi, does not influence the outcome except through the latent factor θi: while individual items may be related to the outcome, any effect of the responses on Yi is explained by variation in Xi and θi. This could be weakened to conditional mean independence; I use the full conditional independence to more easily analyze second (and
higher) order terms in the asymptotic expansion. Condition (ii) is standard in the item response
literature. Limited conditional dependence among the responses can be allowed but I do not pursue
this here to simplify the exposition. For example, Stout (1990) replaces this condition with a higher
level “essential conditional independence”. Condition (iii) assumes that the control variables Xi
do not influence the response variables. If condition (iii) does not hold then all of the same results carry through with W*iJ replaced by (1, X′i, p̄J(Xi, θi))′.
To derive the main result, first note that the estimation error can be written as

β̂J − βJ = (n⁻¹ ∑_{i=1}^n WiJ W′iJ)⁻¹ n⁻¹ ∑_{i=1}^n (WiJ uiJ − E(WiJ uiJ)) + (n⁻¹ ∑_{i=1}^n WiJ W′iJ)⁻¹ n⁻¹ ∑_{i=1}^n E(WiJ uiJ).

Suppose (Yi, X′i, Mi1, . . . , MiJ, θi) is i.i.d. so that I can define MJ = E(WiJ W′iJ), RJ = E(WiJ uiJ), and ΨJ = E[(WiJ uiJ − E(WiJ uiJ))(WiJ uiJ − E(WiJ uiJ))′]. Then

β̂J − βJ = n⁻¹ ∑_{i=1}^n ψiJ + BJ + op(n^{-1/2})

where E(ψiJ) = 0, E(ψiJ ψ′iJ) = n⁻¹ MJ⁻¹ ΨJ MJ⁻¹, and BJ := MJ⁻¹ RJ.
By Assumption 3.1, E(WiJ εiJ) = E(E(WiJ εiJ | Xi, θi)) = E(W*iJ εiJ) = 0. Therefore,

RJ = E(WiJ εiJ) − E(WiJ (WiJ − W*iJ)′ βJ) = −E((WiJ − W*iJ)(WiJ − W*iJ)′) βJ.

Only the last component of WiJ − W*iJ is nonzero, so RJ = −βJ,3 E(ηiJ²) eK+2. Finally, by Assumption 3.1(ii),

E(ηiJ²) = E[(J⁻¹ ∑_{j=1}^J (Mij − pj(θi)))²] = J⁻² ∑_{j=1}^J E[(Mij − pj(θi))²] = J⁻¹ ω̄J.

Therefore, if MJ⁻¹, βJ, and ω̄J are bounded as J → ∞ then BJ = O(J⁻¹), and if √n/J → γ then √n BJ → γB.
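The identity E(ηiJ²) = J⁻¹ ω̄J is easy to verify numerically. The logistic item response functions below are an assumed example, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 100000, 8
theta = rng.normal(size=n)

# assumed logistic item response functions p_j(theta)
g0 = np.linspace(-0.5, 0.5, J)
p = 1.0 / (1.0 + np.exp(-(g0[None, :] + theta[:, None])))
M = (rng.uniform(size=(n, J)) < p).astype(float)

eta = M.mean(axis=1) - p.mean(axis=1)    # eta_iJ = Mbar_iJ - pbar_J(theta_i)
omega_bar = (p * (1.0 - p)).mean()       # omega_bar_J = J^{-1} sum_j E p_j(1-p_j)
print((eta ** 2).mean(), omega_bar / J)  # the two should nearly coincide
```

The agreement follows because, conditional on θi, the J response errors are independent Bernoulli deviations, so their average has variance J⁻² ∑_j pj(θi)(1 − pj(θi)).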
Next I will introduce notation that is used to state the regularity assumptions of the model.
Define M*J = E(W*iJ W*′iJ) and Ψ*J := E(W*iJ W*′iJ νi²) where νi := Yi − E(Yi | Xi, θi). For a matrix A, let λmin(A) denote the minimum eigenvalue of A and let ||A|| denote the operator norm induced by the Euclidean norm. I assume the following regularity conditions.
Assumption 3.2. For all J:
(i) λmin(M*J) ≥ λ1 > 0.
(ii) λmin(Ψ*J) ≥ λ2 > 0.
(iii) E(||W*iJ W*′iJ||^{2+δ}) ≤ D1 < ∞.
(iv) E(|W*iJ Yi|^{2+δ}) ≤ D2 < ∞.
The conditions are stated in terms of the components of the infeasible regression rather than
the feasible regression, that is, in terms of the distribution of (Yi, W*′iJ) rather than the distribution of (W′iJ, εi)′. These are standard regularity conditions under which asymptotic normality of
the infeasible least squares estimator follows. The first theorem states formally the asymptotic
distribution of the feasible least squares estimator under these conditions.
Theorem 3.1. Under Assumptions 3.1 and 3.2, if (Yi, X′i, Mi1, . . . , MiJ, θi) is i.i.d. and n, J → ∞ such that √n/J → γ, 0 < γ < ∞, then √n(β̂J − βJ) = Op(1). Moreover,

ΩnJ^{-1/2}(β̂J − βJ − BJ) →d N(0, I)

so that √n(β̂J − βJ) →d N(γB, V), where BJ = MJ⁻¹ RJ, B = limJ→∞ J BJ, and V = limn,J→∞ n ΩnJ.
Proof. See Appendix A.
Assumption 3.2 is a somewhat high level set of conditions. Since p̄J(θi) is constrained to the interval [0, 1], conditions (iii) and (iv) can be replaced by an assumption that the moments E(||X̃iX̃′i||^{2+δ}) and E(|X̃iYi|^{2+δ}), which do not vary with the sample size, are finite, where X̃i = (1, X′i)′. Conditions (i) and (ii) cannot so easily be replaced by lower level conditions. Before
discussing a bias-corrected estimator, I provide one sufficient set of conditions for (i) and (ii) of
Assumption 3.2.
Assumption 3.3. For all J:
(i) E(X̃iX̃′i) is positive definite.
(ii) Var(Yi | Xi, θi) = Var(Yi).
(iii) Var(p̄J(θi) | Xi) is bounded away from 0 as J → ∞.
(iv) E(||X̃iX̃′i||^{2+δ}) < ∞ and E(|X̃iYi|^{2+δ}) < ∞.
Condition (iii) can be satisfied in several ways. If the support of θi is finite and the functions pj(·) are each differentiable and monotonically increasing with derivatives uniformly bounded away from 0, then the condition holds as long as Var(θi | Xi) is nonzero. Alternatively, if there is a subset of the support, Θ0 ⊂ Supp(θi), such that the functions pj(·) are each differentiable and monotonically increasing with derivatives uniformly bounded away from 0 on Θ0, then condition (iii) holds as long as Var(θi | Xi, θi ∈ Θ0) is nonzero.
Finally, condition (iii) could instead be stated in terms of the parameters of familiar parametric
measurement models.
3.2 Bias correction
In this section I show how the OLS estimator can be asymptotically recentered. First I state an
immediate corollary to Theorem 1 which provides conditions under which a consistent estimator
for BJ can be used to correct the OLS estimator.
Theorem 3.2. Under the conditions of Theorem 3.1, if B̂J is an estimator of the bias BJ such that |J B̂J − J BJ| = op(1), then ΩnJ^{-1/2}(β̂J − B̂J − βJ) →d N(0, I). In addition, under Assumptions 3.1 and 3.2, if (Yi, X′i, Mi1, . . . , MiJ, θi) is i.i.d., n, J → ∞ such that √n/J → ∞, and if B̂J is an estimator of the bias BJ such that |J B̂J − J BJ| = Op(n^{-δ}) + Op(J^{-α}) for 0 < α < 1 and α/(1 + α) ≤ 2δ, then √n(β̂J − B̂J − βJ) →d N(0, V) if √n/J^{1+α} → 0.
Proof.

ΩnJ^{-1/2}(β̂J − B̂J − βJ) = ΩnJ^{-1/2}(β̂J − BJ − βJ) + ΩnJ^{-1/2}(BJ − B̂J)

By Theorem 3.1, the first term converges in distribution to a standard normal. It is also shown in the proof of Theorem 3.1 that ΩnJ^{-1/2} = O(√n). Therefore,

ΩnJ^{-1/2}(BJ − B̂J) = O(1) (√n/J)(J BJ − J B̂J)

If √n/J → γ < ∞ then the right hand side converges in probability to 0 provided that (J B̂J − J BJ) = op(1). This proves the first part of the theorem. Under the second set of conditions in the theorem,

ΩnJ^{-1/2}(BJ − B̂J) = O(1) (√n/J)(J BJ − J B̂J) = Op(n^{1/2−δ}/J) + Op(n^{1/2}/J^{1+α}) = op(1)

which proves the second result.
Recall now that BJ = −J⁻¹ βJ,3 ω̄J MJ⁻¹ eK+2. Therefore the estimator of the bias takes the form B̂J = −J⁻¹ β̂J,3 ω̂J M̂J⁻¹ eK+2 and hence

|J B̂J − J BJ| = Op(n^{-1/2}) + Op(|ω̂J − ω̄J|)

Theorem 3.2 demonstrates that the more quickly J grows with n, the less precise the estimation of ω̄J needs to be. For example, if √n/J → c then J B̂J merely needs to be consistent for J BJ. On the other hand, if J is small such an asymptotic sequence may understate the bias, and an asymptotic sequence such that √n/J^{1+α0} → c for some 0 < α0 < 1 may be more appropriate. In this case ω̄J must be estimated more precisely. If ω̄J can be estimated very precisely, in the sense that the condition is satisfied with α = 1 and δ = 1/2, then correct asymptotic inference is possible even under an asymptotic sequence where √n/J² → c < ∞. The bias correction can be interpreted as reducing the bias from the order of 1/J to the order of 1/J^{1+α}.
I will now provide an estimator of ω̄J, prove that it is consistent, and provide a bound on its convergence rate as required for the second part of Theorem 3.2.
3.2.1 Consistent estimation of ω̄J
For any j let M̄i,−j denote the average response excluding item j and let p̄J,−j(·) denote the associated conditional expectation function, E(M̄i,−j | θi = ·). Then for a given j consider the conditional expectation function E(Mij | M̄i,−j = m). If p̄J,−j(·) is one-to-one and therefore has a unique inverse, denoted p̄⁻¹J,−j, it can be shown that, under sufficient regularity conditions, E(Mij | M̄i,−j = m) ≈ pj(p̄⁻¹J,−j(m)), where the approximation holds as J → ∞. Define p*j,J(m) := pj(p̄⁻¹J,−j(m)) and ω*j,J(·) = p*j,J(·)(1 − p*j,J(·)). Since M̄i,−j ≈ p̄J,−j(θi), it can also be shown that, under sufficient regularity conditions, ω*j,J(M̄i,−j) ≈ ωj(θi) := pj(θi)(1 − pj(θi)).
This argument suggests the following estimation strategy. First, for each j, estimate the conditional expectation function p*j,J(·), denoting the estimator by p̂*j,J(·); then estimate ω̄J by

ω̂J = J⁻¹ n⁻¹ ∑_{i=1}^n ∑_{j=1}^J ω̂*j,J(M̄i,−j)

where ω̂*j,J(·) = p̂*j,J(·)(1 − p̂*j,J(·)).
The only missing step is how to estimate p*j,J(·). It can be estimated nonparametrically by a kernel regression of Mij on M̄i,−j. Kernel regression estimators for nonparametric IRT models have been proposed by Stout (1990) and Douglas (1997). The proposed estimator can be defined by p̂*j,J(m; h), where h is a (possibly data-dependent) bandwidth parameter and

p̂*j,J(m; h) = [∑_{i=1}^n Kh(M̄i,−j − m) Mij] / [∑_{i=1}^n Kh(M̄i,−j − m)],   Kh(u) := h⁻¹ K(u/h),

where K(·) is a compactly supported kernel function.
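A minimal implementation sketch of this first-step estimator and of ω̂J, using a uniform kernel; the function names and design choices here are mine, for illustration only:

```python
import numpy as np

def pstar_hat(M, j, eval_pts, h):
    """Nadaraya-Watson estimate of E(M_ij | Mbar_{i,-j} = m), uniform kernel."""
    n, J = M.shape
    Mbar_mj = (M.sum(axis=1) - M[:, j]) / (J - 1)  # leave-item-j-out average
    est = np.empty(len(eval_pts))
    for k, m in enumerate(eval_pts):
        w = np.abs(Mbar_mj - m) <= h               # K(u) = (1/2) 1{|u| <= 1}
        est[k] = M[w, j].mean() if w.any() else 0.5
    return est

def omega_hat(M, h):
    """Estimate omega_bar_J = J^{-1} sum_j E[p_j(theta)(1 - p_j(theta))]."""
    n, J = M.shape
    total = 0.0
    for j in range(J):
        Mbar_mj = (M.sum(axis=1) - M[:, j]) / (J - 1)
        ph = pstar_hat(M, j, Mbar_mj, h)           # evaluate at observed points
        total += np.mean(ph * (1.0 - ph))
    return total / J
```

Plugging ω̂J from this routine into the bias-corrected estimator of Section 3.2 completes the procedure; in practice the bandwidth h plays the role described in Section 5.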
Assumption 3.4. For ε, ε′ > 0 the following conditions hold.
(i) maxj sup_{m∈[0,1]} |∂ω*j,J(m)/∂m| is bounded.
(ii) maxi,j |M̄i,−j − p̄J,−j(θi)| = Op(J^{-1/2+ε}).
(iii) maxj sup_{m∈[0,1]} |p̂*j,J(m) − p*j,J(m)| = Op(J^{-1/2+ε}) + Op(n^{-1/3+ε′}).
Condition (i) can alternatively be stated in terms of the individual functions pj . It will generally
follow under the types of conditions considered after Theorem 3.1. Conditions (ii) and (iii) can be
proved under quite general conditions by slight modifications of arguments in Williams (2013). The
desired result follows immediately from these high level assumptions.
Theorem 3.3. Under Assumptions 3.1 and 3.4, |ω̂J − ω̄J| = Op(J^{-1/2+ε}) + Op(n^{-1/3+ε′}).
Proof. The result follows from the decomposition

|ω̂J − ω̄J| ≤ (nJ)⁻¹ |∑_{i,j} [ω̂*j,J(M̄i,−j) − ω*j,J(M̄i,−j)]| + (nJ)⁻¹ |∑_{i,j} [ω*j,J(M̄i,−j) − ω*j,J(p̄J,−j(θi))]| + n⁻¹ |∑_i [ω̄J(θi) − E(ω̄J(θi))]|

where ω̄J(θi) := J⁻¹ ∑_{j=1}^J ωj(θi) and I have used the fact that ω*j,J(p̄J,−j(θi)) = ωj(θi). The third term is Op(n^{-1/2}) since the θi are i.i.d. across i. Further, |ω̂*j,J(M̄i,−j) − ω*j,J(M̄i,−j)| ≲ |p̂*j,J(M̄i,−j) − p*j,J(M̄i,−j)| since p̂*j,J and p*j,J are bounded, so the first term is Op(J^{-1/2+ε}) + Op(n^{-1/3+ε′}) by Assumption 3.4(iii). Lastly, |ω*j,J(m′) − ω*j,J(m)| ≤ sup_m |∂ω*j,J(m)/∂m| · |m′ − m|, so the second term is Op(J^{-1/2+ε}) by Assumptions 3.4(i) and (ii).
Therefore, the bias-corrected estimator described in this section will have an asymptotic distribution that is correctly centered, as stated in Theorem 3.2, not only if √n/J → γ < ∞ but also if J grows only at the rate n^{1/3+ε}.
4 Multidimensional latent trait
In this section I extend the results above to allow for multiple latent factors. I first provide a very
general result about what a feasible regression estimates as J →∞. Then, under more restrictive
conditions, I show how a bias correction can be implemented. Lastly I discuss the case of “dedicated
measurements” where each item is a signal of a single dimension of the latent variable.
4.1 Feasible Least Squares Regression
Suppose the J binary test items are used to construct LM averages, M̄i1, . . . , M̄iLM. The lth average is constructed as M̄il = J⁻¹ ∑_{j=1}^J wlj Mij, where {wlj}_{j=1}^J is a sequence of nonrandom weights for each l. First I consider the properties of the least squares estimator, β̂J, from the regression of Yi on WiJ = (1, X′i, M̄i1, . . . , M̄iLM)′ when Assumption 3.1 is replaced by the following.
Assumption 4.1. There exists θi ∈ RLθ such that
(i) Yi ⊥⊥ M̄il | Xi, θi for each l = 1, . . . , LM.
(ii) Mi1, . . . ,MiJ are mutually independent conditional on Xi, θi.
(iii) For any j, E(Mij | Xi, θi) = E(Mij | θi).
Define the conditional mean functions p̄J,l(h) = E(M̄il | θi = h). Under regularity conditions stated below, β̂J will converge to βJ, the estimand of the regression of Yi on W*iJ = (1, X′i, p̄J,1(θi), . . . , p̄J,LM(θi))′. If √n/J → γ for some 0 < γ < ∞, there will be an asymptotic bias that can be written as

BJ = −E(WiJ W′iJ)⁻¹ VηJ βJ    (3)

where VηJ is a matrix with entries VηJ[s, t] such that (i) VηJ[s, t] = 0 if s or t is less than K + 2 and (ii) for l, l′ ∈ {1, . . . , LM}, VηJ[K + 1 + l, K + 1 + l′] = J⁻¹ ω̄J,l,l′ where

ω̄J,l,l′ = J⁻¹ ∑_{j=1}^J wlj wl′j ωj.
To state the regularity conditions and describe the asymptotic variance, I first define the residuals εiJ and uiJ and the moment matrices MJ, ΨJ, ΩnJ, M*J, and Ψ*J as before, except now in terms of WiJ = (1, X′i, M̄i1, . . . , M̄iLM)′ and W*iJ = (1, X′i, p̄J,1(θi), . . . , p̄J,LM(θi))′.
Assumption 4.2. For all J:
(i) λmin(M*J) ≥ λ1 > 0.
(ii) λmin(Ψ*J) ≥ λ2 > 0.
(iii) E(||W*iJ W*′iJ||^{2+δ}) ≤ D1 < ∞.
(iv) E(|W*iJ Yi|^{2+δ}) ≤ D2 < ∞.
This assumption is identical to Assumption 3.2 except that WiJ and W*iJ are now defined to include LM different average response variables and LM different transformations of the Lθ latent factors, respectively. This distinction is important, for example, in interpreting condition (i): if Lθ < LM, condition (i) can easily fail depending on the relationship among the functions p̄J,l(·). The asymptotic distribution of β̂J can then be derived under Assumptions 4.1 and 4.2 as an extension of the arguments in the proof of Theorem 3.1.
Theorem 4.1. Under Assumptions 4.1 and 4.2, if (Yi, Mi1, . . . , MiJ, X′i, θ′i) is i.i.d. and n, J → ∞ such that √n/J → γ where 0 < γ < ∞, then √n(β̂J − βJ) = Op(1). Moreover,

√n(β̂J − βJ) →d N(B, V)

where B = γ limJ→∞ J BJ, with BJ defined in equation (3), and V = limn,J→∞ n ΩnJ.
4.2 Bias correction
If Lθ = LM then the bias correction method proposed in Section 3.2 can be generalized to the multidimensional case.5 I assume that the model satisfies the following injectivity assumption.
5 If Lθ < LM then the bias correction procedure proposed may be valid after regrouping the J items into only Lθ groups.
Assumption 4.3. Define the function p̄J : R^Lθ → R^LM by p̄J(t) = (p̄J,1(t), . . . , p̄J,LM(t))′. Suppose Lθ = LM and that p̄J is one-to-one.
The proposed estimation procedure can be separated into three stages. First, estimate the kernel regression of Mij on Qi,j := (M̄i1,−j, . . . , M̄iLM,−j)′ and denote the estimate by p̂*j,J(·). Second, form ω̂j = n⁻¹ ∑_{i=1}^n p̂*j,J(Qi,j)(1 − p̂*j,J(Qi,j)). Third, after completing the previous steps for each j = 1, . . . , J, form ω̂J,l,l′ = J⁻¹ ∑_{j=1}^J wlj wl′j ω̂j. To state the conditions needed for this to lead to asymptotic bias correction, I define p*j,l(q) = pj(p̄⁻¹J,l,−j(q)) and ω*j,l(q) = p*j,l(q)(1 − p*j,l(q)).6
Assumption 4.4. For ε, ε′ > 0 the following conditions hold.
(i) maxj,l sup_{q∈[0,1]^LM} |∂ω*j,l(q)/∂ql| is bounded.
(ii) supi,j,l |M̄i,l,−j − p̄J,l,−j(θi)| = Op(J^{-1/2+ε}).
(iii) supi,l |M̄i,l − p̄J,l(θi)| = Op(J^{-1/2+ε}).
(iv) For some p > 0, maxj,l sup_{q∈[0,1]^LM} |p̂*j,l(q) − p*j,l(q)| = Op(J^{-1/2+ε}) + Op(n^{-p+ε′}).
As stated in the following theorem, under these regularity conditions the bias-corrected estimator exhibits negligible bias asymptotically.
Theorem 4.2. Under Assumptions 4.1–4.4, |ω̂J,l,l′ − ω̄J,l,l′| = Op(J^{-1/2+ε}) + Op(n^{-p+ε′}) for each l, l′ and, consequently, √n(β̂J − B̂J − βJ) →d N(0, V).
4.3 Interpretation and Dedicated Items
Additional structure can be used to simplify the bias correction procedure. The approach described
here is used in the application in the following section.
Suppose that, for each j, there is some dimension of the latent vector, denoted lj, such that pj(·) varies only with θi,lj, so that pj(θi) = pj(θi,lj). Then Lθ response averages can be computed with wlj = 0 if l ≠ lj and wlj = 1 if l = lj. With this structure, VηJ[K + 1 + l, K + 1 + l′] = 0 if l ≠ l′.
6 The function p̄⁻¹J,l,−j is the inverse of the function p̄J,l,−j, which is the vector of functions (p̄J,1, . . . , p̄J,LM)′ with p̄J,l replaced by p̄J,l,−j.
So the bias correction only requires estimating the Lθ terms ω̄J,l,l. Moreover, the first stage now requires only a one-dimensional nonparametric regression and as a result avoids the curse of dimensionality.
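The bookkeeping for the dedicated-items case can be sketched as follows; the item-to-dimension assignment and the ωj values below are hypothetical:

```python
import numpy as np

def dedicated_weights(item_dim, L):
    """w[l, j] = 1 if item j is dedicated to dimension l, else 0."""
    J = len(item_dim)
    w = np.zeros((L, J))
    w[item_dim, np.arange(J)] = 1.0
    return w

def omega_block(w, omega_j, J):
    """Lower block of V^eta_J: entries J^{-1} * omega_bar_{J,l,l'}, where
    omega_bar_{J,l,l'} = J^{-1} sum_j w_lj w_l'j omega_j."""
    return (w * omega_j[None, :]) @ w.T / J**2

# toy example: J = 6 items, L = 2 dimensions, hypothetical item variances omega_j
item_dim = np.array([0, 0, 0, 1, 1, 1])
w = dedicated_weights(item_dim, 2)
block = omega_block(w, np.full(6, 0.25), 6)
```

Because dedicated items give non-overlapping weight vectors, the off-diagonal entries of this block are exactly zero, which is why only the Lθ diagonal terms ω̄J,l,l need to be estimated.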
In addition, with this additional structure the infeasible regression model,
Yi = βJ,1 + β′J,2Xi +
Lθ∑l=1
βJ,3,lpJ,l(θi,l) + εiJ ,
differs from the causal model,
Yi = βJ,1 + β′J,2Xi +
Lθ∑l=1
β3,lθi,l + εiJ ,
only to the extent that the scalar functions pJ,l are not well approximated by a linear function.
5 Monte Carlo Simulations
To study the finite sample properties of the OLS estimator and the bias-corrected estimator, a series of Monte Carlo exercises is performed. I report the bias, standard deviation, and root
mean squared error of the coefficient on a scalar covariate Xi. I also assess the relative size of
the measurement error bias and the misspecification bias. I report results from a wide range of
examples of the item response model.
The results demonstrate three main findings. First, the feasible OLS estimator can exhibit a substantial bias, even when J is as large as 50 or 100. Second, the bias-corrected estimator dominates the uncorrected estimator, though it does not eliminate the bias entirely unless J is large. Third, the misspecification bias is dominated by the measurement error bias in a wide range of models, though I also find that the misspecification bias can be substantial depending on the shape of p̄J, as described below.
In the simulations the data are generated by the model

Yi = 0.25 Xi + 0.5 θi + εi
Mij = 1(γ0j + γ1j θi ≥ σ ηij)

where Xi ∼ Bernoulli(0.5), θi | Xi = x ∼ N(µx, σθ²), εi ∼ N(0, 1), and ηij ∼ N(0, 1). The parameters µx and σθ² are chosen so that E(θi) = 0, Var(θi) = 1, and E(θi | Xi = 1) − E(θi | Xi = 0) = 1. I simulate various models that allow γ0j, γ1j, and σ to vary. In all models I set n equal to 50, 100, 500, 1000, or 5000 and J equal to 5, 10, 25, 50, or 100. For the bias correction I use a uniform kernel and a bandwidth of h = 1/(ln(n) n^{1/4}). The results of the Monte Carlo exercises are presented in Tables 1-4.
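The design above can be sketched as follows; this is my own reconstruction of one simulation draw, not the paper's code:

```python
import numpy as np

def simulate(n, J, gamma0, gamma1, sigma, rng):
    """One draw from the Monte Carlo design of Section 5 (probit items)."""
    X = rng.binomial(1, 0.5, n).astype(float)
    # mu_x and sigma_theta chosen so that E(theta) = 0, Var(theta) = 1, and the
    # mean difference across X groups is 1: mu_1 = 0.5, mu_0 = -0.5, Var = 0.75
    theta = rng.normal(X - 0.5, np.sqrt(0.75), n)
    eps = rng.normal(0.0, 1.0, n)
    Y = 0.25 * X + 0.5 * theta + eps
    eta = rng.normal(0.0, 1.0, (n, J))
    M = (gamma0[None, :] + gamma1[None, :] * theta[:, None] >= sigma * eta).astype(float)
    return Y, X, M
```

For example, Model 1 below corresponds to gamma0 = np.zeros(J), gamma1 = np.ones(J), and sigma = 2.0.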
The results in Table 1 were generated from a model with γ0j = 0, γ1j = 1 and σ = 2. These
results show that the finite sample properties of both the uncorrected estimator and the bias-
corrected estimator are well approximated by the asymptotic theory. The feasible OLS estimator
exhibits a finite sample bias that represents a substantial contribution to the RMSE. Generally the
bias is of the same order of magnitude as the standard deviation, except when J is large relative to
n. The bias correction decreases the bias with only a slight increase in the standard deviation, thus
reducing the RMSE. In some cases where J is small relative to n, the bias is on the same order of
magnitude as the standard deviation after the bias correction.
Table 2 presents results from two additional models. Model 2 is the same as Model 1 except
that γ0j = 1. In Model 3, γ1j = 1 and γ0j = 0 but σ = 1. I find that the bias-corrected estimator
dominates the uncorrected estimator in these models, as in Model 1. In Model 3, ω̄J is smaller, so the bias of the uncorrected estimator is smaller, as expected. I find that the bias of the corrected estimator is substantially smaller in Model 3 as well, except when J is 50 or 100.
In Models 1-3, the item response functions are identical across j. Table 3 presents results from
models where γ0j or γ1j varies across j. Model 4 is the same as Model 1 except that the γ0j are
spaced equidistantly between −3 and 3 for each J . Model 5 is the same as Model 1 except that
the γ1j = 2δj where δj are spaced equidistantly between −2 and 2. Models 6 and 7 are the same as
Table 1. Monte Carlo simulation results for Model 1

                    feasible OLS            feasible OLS with bias correction
  J      n       Bias    SD    RMSE          Bias    SD    RMSE       RMSE improvement
  5      50      1.17   1.29   1.74          0.79   1.41   1.61            0.13
         100     1.15   0.92   1.47          0.75   0.99   1.24            0.23
         500     1.15   0.40   1.22          0.76   0.43   0.87            0.35
         1000    1.16   0.29   1.19          0.76   0.30   0.82            0.37
         5000    1.15   0.13   1.16          0.76   0.14   0.77            0.39
  10     50      0.78   1.30   1.51          0.38   1.39   1.44            0.07
         100     0.82   0.92   1.23          0.41   0.98   1.06            0.17
         500     0.82   0.40   0.91          0.38   0.42   0.57            0.34
         1000    0.81   0.28   0.86          0.37   0.30   0.48            0.38
         5000    0.81   0.13   0.82          0.37   0.14   0.40            0.42
  25     50      0.43   1.30   1.37          0.10   1.37   1.38            0.00
         100     0.42   0.90   1.00          0.10   0.95   0.95            0.04
         500     0.43   0.40   0.59          0.11   0.42   0.43            0.15
         1000    0.43   0.28   0.51          0.10   0.30   0.31            0.20
         5000    0.42   0.13   0.44          0.10   0.13   0.17            0.28
  50     50      0.22   1.30   1.31          0.01   1.33   1.33           -0.02
         100     0.22   0.94   0.97          0.01   0.96   0.97            0.00
         500     0.24   0.41   0.47          0.03   0.42   0.42            0.05
         1000    0.23   0.29   0.37          0.02   0.29   0.30            0.07
         5000    0.23   0.13   0.27          0.03   0.13   0.13            0.13
  100    50      0.09   1.31   1.31         -0.03   1.33   1.33           -0.02
         100     0.11   0.93   0.93         -0.01   0.94   0.94           -0.01
         500     0.11   0.41   0.42          0.00   0.41   0.41            0.01
         1000    0.11   0.28   0.30          0.00   0.28   0.28            0.02
         5000    0.12   0.12   0.17          0.00   0.13   0.13            0.05

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
Table 2. Monte Carlo simulation results for β2, comparison of Models 1-3

           bias of uncorrected      bias of corrected       RMSE improvement
  J       n=100   500   1000      n=100   500   1000      n=100   500   1000

Model 1
  5        1.15  1.15   1.16       0.75  0.76   0.76       0.23  0.35   0.37
  10       0.82  0.82   0.81       0.41  0.38   0.37       0.17  0.34   0.38
  25       0.42  0.43   0.43       0.10  0.11   0.10       0.04  0.15   0.20
  50       0.22  0.24   0.23       0.01  0.03   0.02       0.00  0.05   0.07
  100      0.11  0.11   0.11      -0.01  0.00   0.00      -0.01  0.01   0.02

Model 2
  5        1.20  1.20   1.20       0.82  0.83   0.83       0.23  0.34   0.36
  10       0.88  0.87   0.87       0.48  0.45   0.44       0.18  0.34   0.38
  25       0.48  0.49   0.49       0.15  0.17   0.17       0.06  0.18   0.23
  50       0.28  0.30   0.29       0.07  0.09   0.08       0.01  0.08   0.10
  100      0.17  0.17   0.17       0.05  0.05   0.05       0.00  0.03   0.04

Model 3
  5        0.63  0.62   0.63       0.32  0.30   0.31       0.10  0.22   0.26
  10       0.37  0.38   0.39       0.15  0.14   0.15       0.03  0.12   0.16
  25       0.21  0.20   0.20       0.08  0.07   0.07       0.00  0.03   0.05
  50       0.13  0.13   0.13       0.06  0.06   0.06       0.00  0.01   0.02
  100      0.11  0.09   0.09       0.08  0.05   0.05       0.00  0.00   0.01

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
Models 4 and 5, respectively, except that σ = 1. I find the same overall patterns across all models.
The bias correction practically eliminates the bias when either σ is small or J is large. A
substantial bias remains in a few cases where J is 5 or 10 and σ is greater than 1. Even in these
cases the bias is much smaller than the bias of the OLS estimator. On the other hand, the variance
of the bias-corrected estimator is higher across the board, in some cases close to twice the variance
of the OLS estimator. Overall the mean squared error is lower for the bias-corrected estimator
in every simulation but one, and in that one simulation, which had the smallest OLS bias, the
difference was negligible. The bias correction reduces the MSE by as much as 80%.
These results are what one would expect given the asymptotic results in the paper. While the
bias correction should not increase the variance asymptotically, a moderately higher finite
sample variance is to be expected. Meanwhile, the asymptotics suggest that the bias should be
reduced but not eliminated by the bias correction and that the remaining bias should be decreasing
in the number of items.
As I have noted, in some cases a non-negligible bias remains even when n = 5000 and J = 100
because the model is misspecified. The first column of Table 4 shows results from three of the models
already discussed, plus Model 4', which differs from Model 4 in that the γ0j are spaced equidistantly on
a narrower range, from −1 to 1. The second and third columns show results from these same four
models except that σ, the variance of the error in the latent index for each binary item, is decreased.
A smaller σ represents what psychologists call a more discriminating item. The smaller σ is, the
more the response is driven by the trait, θi, and the smaller ωj is. Table 4 only presents results for
n = 1000 and J = 10.
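The relationship between discrimination and ωj can be checked numerically. Assuming a probit-type item response function pj(θ) = Φ((γ0j + γ1jθ)/σ) with θ ∼ N(0, 1) (the paper leaves pj nonparametric, so this functional form is purely illustrative), ωj = E[pj(θ)(1 − pj(θ))] shrinks as σ shrinks:

```python
import math
import numpy as np

# omega_j = E[p_j(theta)(1 - p_j(theta))] under an assumed probit link
# p_j(theta) = Phi((gamma0 + gamma1*theta)/sigma); this link is an
# illustrative assumption, not the paper's (nonparametric) p_j.

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def omega(gamma0, gamma1, sigma, n_draws=200_000, seed=0):
    theta = np.random.default_rng(seed).standard_normal(n_draws)
    p = Phi((gamma0 + gamma1 * theta) / sigma)
    return float(np.mean(p * (1.0 - p)))

# A more discriminating item (smaller sigma) has a smaller omega_j
for s in (2.0, 1.0, 0.5):
    print(f"sigma = {s}: omega = {omega(0.0, 1.0, s):.3f}")
```

As σ → 0 the response is determined by θ alone, so p(1 − p) collapses toward 0; as σ grows, p stays near 1/2 and p(1 − p) approaches its maximum of 1/4.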
Table 3. Monte Carlo simulation results for β2, comparison of Models 1, 4-7

           bias of uncorrected      bias of corrected       RMSE improvement
  J       n=100   500   1000      n=100   500   1000      n=100   500   1000

Model 1
  5        1.15  1.15   1.16       0.75  0.76   0.76       0.23  0.35   0.37
  10       0.82  0.82   0.81       0.41  0.38   0.37       0.17  0.34   0.38
  25       0.42  0.43   0.43       0.10  0.11   0.10       0.04  0.15   0.20
  50       0.22  0.24   0.23       0.01  0.03   0.02       0.00  0.05   0.07
  100      0.11  0.11   0.11      -0.01  0.00   0.00      -0.01  0.01   0.02

Model 4
  5        1.32  1.31   1.31       1.01  1.00   1.00       0.21  0.28   0.29
  10       0.95  0.95   0.94       0.54  0.52   0.51       0.20  0.36   0.39
  25       0.50  0.52   0.52       0.14  0.16   0.15       0.07  0.20   0.25
  50       0.28  0.29   0.29       0.03  0.04   0.04       0.01  0.08   0.11
  100      0.14  0.15   0.15       0.00  0.00   0.00      -0.01  0.02   0.03

Model 5
  5        1.13  1.11   1.12       0.73  0.71   0.72       0.23  0.35   0.38
  10       0.76  0.78   0.78       0.35  0.35   0.35       0.15  0.33   0.37
  25       0.41  0.40   0.40       0.10  0.10   0.09       0.04  0.14   0.18
  50       0.22  0.22   0.22       0.02  0.03   0.02       0.00  0.05   0.07
  100      0.13  0.11   0.11       0.02  0.00   0.00       0.00  0.01   0.02

Model 6
  5        0.92  0.92   0.92       0.75  0.75   0.75       0.10  0.15   0.16
  10       0.54  0.57   0.56       0.25  0.27   0.26       0.08  0.20   0.24
  25       0.26  0.25   0.25       0.05  0.05   0.05       0.01  0.06   0.09
  50       0.11  0.12   0.13      -0.01  0.00   0.00      -0.01  0.01   0.02
  100      0.05  0.06   0.06      -0.02  0.00  -0.01      -0.01  0.00   0.00

Model 7
  5        0.65  0.66   0.65       0.33  0.34   0.33       0.11  0.23   0.27
  10       0.41  0.41   0.41       0.18  0.16   0.16       0.04  0.13   0.16
  25       0.25  0.23   0.23       0.12  0.10   0.10       0.01  0.04   0.06
  50       0.15  0.16   0.15       0.08  0.09   0.08       0.00  0.02   0.02
  100      0.11  0.12   0.11       0.07  0.08   0.08       0.00  0.01   0.01

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
Table 4. Monte Carlo simulation results for β2 -- comparing biases

                              σ = 2    σ = 1    σ = 0.5
  Model 1   feasible           0.12     0.09      0.27
            bias corrected     0.00     0.05      0.26
            infeasible        -0.01     0.05      0.26
  Model 4   feasible           0.15     0.06      0.03
            bias corrected     0.00    -0.01     -0.01
            infeasible        -0.01    -0.01     -0.01
  Model 4'  feasible           0.12     0.07      0.13
            bias corrected     0.00     0.03      0.11
            infeasible        -0.01     0.03      0.11
  Model 5   feasible           0.12     0.11      0.27
            bias corrected     0.01     0.07      0.25
            infeasible         0.00     0.07      0.25

Notes: All entries are as a percent of the true parameter value. This table reports the results of the simulation exercise described in Section 5. Each simulation used 5,000 Monte Carlo draws. This table reports results for the coefficient on the observed regressor.
The results in Table 4 first demonstrate that the bias correction works, in the sense that the bias of
the bias-corrected estimator is equal to the bias of the infeasible estimator that uses pJ(θi) as a
covariate. The results also show that the misspecification bias, that is, the bias of the infeasible
estimator, can be larger in some cases. Specifically, a substantial misspecification bias typically
arises as σ decreases. However, Model 4 shows that, even with discriminating items, if there
is a wide enough range in the difficulty of the items then the misspecification bias remains minimal.
6 Application: The effect of education and ability on labor market
outcomes
In this section I conduct an empirical exercise to demonstrate the use of the methods proposed in
this paper. I make use of data from the National Longitudinal Survey of Youth 1979 to investigate
the effect of education on earnings.
It is well known in the extensive literature on the returns to education that failing to account for
an individual’s ability in a wage regression will lead to a positive ability bias in the estimate of the
return to education. Becker (1967), for example, showed how this bias would arise using a model
of investment in human capital where the marginal benefit of education is increasing in ability.7
Rosen (1977) and, more recently, Carneiro, Heckman, and Vytlacil (2011) have demonstrated
more generally that a selection bias can arise in a hedonic model of the return to human capital
investment.
Several different approaches to overcoming the endogeneity of schooling have been considered.
One approach is to use earnings data on identical twins to control for genetic and environmen-
tal components of ability that are common between twins who make different education choices
(Behrman, Taubman, and Wales, 1977; Ashenfelter and Krueger, 1994). Typical estimates are
30% smaller than OLS estimates (Ashenfelter and Rouse, 1998). Measurement error in education,
combined with remaining within-family variation in ability for twins, makes it difficult to determine
to what extent these estimates reduce the ability bias in estimates of the effect of education on
earnings (Ashenfelter and Krueger, 1994; Card, 1999; Neumark, 1999).8 Another approach uses
instrumental variables. Card (1999) provides an extensive review of IV estimates of the effect of education
on earnings. These estimates are typically higher, sometimes much higher, than OLS estimates.
Moreover, the choice of the instrument affects the interpretation of the estimate of the effect of
education (Heckman, Lochner, and Todd, 2006; Heckman, Urzua, and Vytlacil, 2006).
A third approach aims to control directly for the unobserved ability that plagues OLS estimates.
While some economists have used IQ scores and performance on achievement tests as proxies
for unobserved ability, it has long been recognized that doing so introduces a measurement error
bias that tends to bias the estimated effect of education upward. Thus structural
equation models – a combination of errors-in-variables models, factor models, and simultaneous
equations models – have been employed to account for the fact that these tests are noisy measures of ability.
This approach is typified by studies by Griliches and Mason (1972), Chamberlain (1977), and
7Heckman, Lochner, and Todd (2006) point out that the problem of ability bias in this context had been recognized by economists as far back as Noyes (1945).
8The Minnesota Twin Family Study provides evidence that twins who made different education choices differ significantly in verbal IQ, personality, and GPA (Stanek, Iacono, and McGue, 2011).
Blackburn and Neumark (1993). Some more recent work (Carneiro, Hansen, and Heckman, 2003;
Heckman, Stixrud, and Urzua, 2006) incorporates measures of ability while also using exclusion
restrictions to aid in identification. One common criticism of this approach is that these models
require normalizations for identification which are sometimes perceived as arbitrary, in the sense
that they are not motivated by an economic model. My analysis in this section avoids these
normalizations using the methods developed in this paper.
6.1 Data
The data used in this section come from the National Longitudinal Survey of Youth 1979 (NLSY79). The NLSY79
has a broad set of information on demographics, educational outcomes, labor market outcomes,
health, and criminal behavior for a panel of individuals over thirty years. The respondents were
first interviewed in 1979 when they were between 14 and 22 years old. The respondents were
reinterviewed each subsequent year until 1994, after which point they were interviewed on a biennial
basis. The analysis is restricted to 30-year-old white males who reported working at least 35 hours
per week on average. Table 5 reports summary statistics for this subsample.
As part of the survey, 11,914 respondents (94% of the sample) were administered the
Armed Services Vocational Aptitude Battery (ASVAB), which comprises ten subtests. Both raw
scores and scale scores are reported for each subtest. The Armed Forces Qualifying Test (AFQT),
a composite of four of these subtests, is also reported. Recently the individual item responses for
these four subtests – mathematical knowledge, arithmetic reasoning, paragraph comprehension,
and word knowledge – have been made available in a new data release.9 In addition, individuals were
administered two separate personality assessments, the Rosenberg Self Esteem scale and the Rotter Locus
of Control scale, which consist of 10 and 4 binary10 items, respectively. These data are summarized in
Table 5 and Table 7 below.
9See Schofield (2014) for an initial analysis of this newly released data.
10Respondents are asked whether they weakly agree, strongly agree, weakly disagree, or strongly disagree with different statements. I convert responses to a binary agree or disagree classification.
Table 5. Descriptive Statistics

                                      mean    std. dev.    min.      max.
Highest grade completed              13.83      2.45          7        20
Average weekly wage                 967.14    523.66       2.98   3764.71
Average hours worked per week        47.42      9.64         35     97.85
Urban residence                       0.75      0.44          0         1
Married                               0.63      0.48          0         1
Number of children in household       0.87      1.07          0         5
Urban residence, at age 14            0.75      0.44          0         1
Father's highest grade completed     12.61      3.26          0        20
Mother's highest grade completed     12.20      2.26          3        20
ASVAB - math. knowledge               0.64      0.25       0.04         1
ASVAB - paragraph comp.               0.78      0.18       0.07         1
ASVAB - word knowledge                0.82      0.16       0.11         1
ASVAB - arith. reasoning              0.71      0.23       0.10         1
Rosenberg Self Esteem                 0.95      0.09       0.30         1
Rotter Locus of Control               0.69      0.26          0         1

Notes: The sample consists of n=1,157 white males from the NLSY79. Highest grade completed corresponds to the highest level reported by the individual. The average weekly wage is in 2010 dollars. The parent's education is reported at the initial interview in 1979. The scores reported for the ASVAB subtests and the Rosenberg and Rotter scales are the average responses as described in the text.
6.2 Results
In this section I estimate an earnings regression that controls for various dimensions of cognitive
and noncognitive ability. I start by estimating the following feasible regression model
Yi = β0 + β1Si + Σ_{k=1}^{6} β3,kMki + γ′Xi + ui

where Yi is the log of individual i's average weekly wages, Si represents education level, and Xi
is a vector of additional controls, including year dummies. For k = 1, . . . , 4, Mki is the score
(% answered correctly) on subtest k. For k = 5, 6, Mki represents the average response on the
Rosenberg and Rotter surveys after items are recoded so that a 1 is associated with higher self
esteem and a more internal locus of control, respectively. Theorem 4.1 implies that the OLS
estimator from this regression is consistent for the estimand of the infeasible regression

Yi = β0 + β1Si + Σ_{k=1}^{6} β3,kpk(θi) + γ′Xi + ui
The results from two different specifications of the feasible regression are reported in columns
(2) and (5) of Table 6. Column (2) estimates the regression with Si equal to the highest grade
completed. The estimated effect of an additional year of education is 4.2%. This is substantially
smaller than the 6.6% obtained in column (1) from a regression that excludes all six scores, Mki.
This provides evidence of a substantial ability bias. While both personality measures are statis-
tically significant, among the cognitive measures, only mathematical knowledge has a statistically
significant effect.
Table 6. The effect of education on earnings, white males.

                                    (1)       (2)       (3)       (4)       (5)       (6)
Highest Grade Completed           0.066     0.042     0.036
                                 (0.009)   (0.011)   (0.011)
College                                                         0.310     0.217     0.200
                                                               (0.042)   (0.047)   (0.047)
ASVAB - Math. Knowledge                     0.353     0.501               0.345     0.470
                                           (0.137)   (0.137)             (0.131)   (0.131)
ASVAB - Paragraph Comprehension             0.175     0.321               0.170     0.312
                                           (0.145)   (0.145)             (0.147)   (0.147)
ASVAB - Word Knowledge                     -0.233    -0.406              -0.264    -0.449
                                           (0.183)   (0.183)             (0.183)   (0.183)
ASVAB - Arithmetic Reasoning               -0.006    -0.139               0.044    -0.073
                                           (0.157)   (0.157)             (0.157)   (0.157)
Rosenberg Self Esteem                       0.613     0.878               0.648     0.924
                                           (0.229)   (0.229)             (0.225)   (0.225)
Rotter Locus of Control                     0.167     0.295               0.173     0.304
                                           (0.079)   (0.079)             (0.077)   (0.077)

Notes: The regressions are based on a sample of n=1,157 white males from the NLSY79. The regressions controlled for urban residence, regional dummies, mother's and father's educational level, urban residence at age 14, and year dummies. Standard errors are in parentheses. Columns (3) and (6) implement the bias correction as discussed in the text.
The specification reported in column (5) has Si equal to a dummy for whether the individual
has attended college at some level. The estimated effect of college attendance is 21.7%. Again,
this is substantially smaller than the 31% obtained in column (4) from a regression that excludes
all six scores, Mki, due to ability bias. The coefficients on the six ability measures are similar to
the coefficients in regression specification (2), with virtually no change in the precision with which
they are estimated.
The number of items on each test ranges from 4 for the Rotter locus of control assessment to
35 for the word knowledge component of the ASVAB. I implement the procedure from Section 4 to
compute a bias correction. Table 7 reports the estimates ω̂k for each k. These first-stage estimates are
then used to construct a bias correction factor. The bias-corrected estimates, reported in columns
(3) and (6) of Table 6, suggest a non-negligible bias in the estimates of the coefficients on the ability
measures. In fact, two of the three test scores that were insignificant in the feasible regression are
statistically significant after the bias correction. More importantly, the estimates of the education
effect are smaller (by ∼10%) after the bias correction.
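Given a first-stage estimate ω̂, the correction itself is a simple plug-in computation. Below is a hedged sketch for the single-score case (the application above uses six scores with different item counts, which this simplifies away); the placement of the nonzero bias entry, -(ω/J) times the score coefficient in the score position, is my reading of the asymptotic derivation in the appendix, not a formula quoted from the paper.

```python
import numpy as np

# Hedged sketch of a plug-in bias correction for a single aggregate score.
# Interpretation (mine, not verbatim from the paper): E(xi_iJ) has one
# nonzero entry, -(omega_J/J)*beta_score, in the score position, so the
# leading bias of feasible OLS is -Mhat^{-1} e (omega_hat/J) beta_score,
# which we subtract off using the feasible estimate as a plug-in.

def bias_corrected_ols(Z, Y, score_col, omega_hat, J):
    """Z: n x p design matrix (intercept included); score_col indexes the score."""
    n = Z.shape[0]
    Mhat = Z.T @ Z / n
    beta_hat = np.linalg.solve(Mhat, Z.T @ Y / n)   # feasible OLS
    e = np.zeros(Z.shape[1])
    e[score_col] = 1.0
    # estimated bias vector: -Mhat^{-1} e (omega_hat/J) beta_hat[score_col]
    bias_hat = -np.linalg.solve(Mhat, e) * (omega_hat / J) * beta_hat[score_col]
    return beta_hat - bias_hat
```

For a positive score coefficient the correction pushes the coefficient on the score away from zero, offsetting the attenuation that the feasible regression exhibits.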
Table 7. Descriptive Statistics - Assessments

                             items    mean    std. dev.     ω̂
ASVAB - Math. knowledge        25     0.64      0.25       0.15
ASVAB - Paragraph comp.        15     0.78      0.18       0.14
ASVAB - Word knowledge         35     0.82      0.16       0.10
ASVAB - Arith. reasoning       30     0.71      0.23       0.14
Rosenberg Self Esteem          10     0.95      0.09       0.04
Rotter Locus of Control         4     0.69      0.26       0.19

Notes: The sample consists of n=1,157 white males from the NLSY79. The last column reports estimates of ω, calculated as discussed in the text.
7 Conclusion
This paper shows how the measurement error bias resulting from the use of a test score as a
regressor can be eliminated, allowing for valid statistical inference. The asymptotic results for both
the uncorrected least squares estimator and the bias-corrected estimator are new to the literature,
though similar bias correction procedures have been suggested for nonlinear panel data models
with fixed effects. The importance of these results is demonstrated by the Monte Carlo studies in
Section 5 and in the empirical exercise in Section 6. These methods are easy to implement and
provide a useful tool for taking advantage of the full item-level data that are increasingly available to
social science researchers. Extending these bias correction procedures to other econometric models
is an important avenue for future research.
Acknowledgements: This paper has benefited from discussion with Robert Phillips and from
correspondence with Dan Black and Lynne Schofield.
References
Aigner, D. J., C. Hsiao, A. Kapteyn, and T. Wansbeek (1984): “Latent variable models in
econometrics,” in Handbook of Econometrics, ed. by Z. Griliches, and M. D. Intriligator, vol. 2
of Handbook of Econometrics, chap. 23, pp. 1321–1393. Elsevier.
Arellano, M., and S. Bonhomme (2009): “Robust priors in nonlinear panel data models,”
Econometrica, 77(2), 489–536.
Ashenfelter, O., and A. Krueger (1994): “Estimates of the economic return to schooling from
a new sample of twins,” American Economic Review, 84, 1157–1183.
Ashenfelter, O., and C. Rouse (1998): “Income, schooling, and ability: Evidence from a new
sample of identical twins,” Quarterly Journal of Economics, 113, 253–284.
Becker, G. (1967): Human capital and the personal distribution of income: An analytical ap-
proach. University of Michigan Press, Ann Arbor, MI.
Behrman, J., P. Taubman, and T. J. Wales (1977): “Controlling for and measuring the effects
of genetics and family environment in equations for schooling and labor market success,” in
Kinometrics: The determinants of socioeconomic success within and between families, ed. by
P. Taubman, pp. 35–96. North-Holland, Amsterdam.
Bekker, P. A. (1994): “Alternative approximations to the distributions of instrumental variable
estimators,” Econometrica: Journal of the Econometric Society, pp. 657–681.
Blackburn, M. L., and D. Neumark (1993): “Omitted-ability bias and the increase in the
return to schooling,” Journal of Labor Economics, 11(3), 521–544.
Bollinger, C. R. (2003): “Measurement error in human capital and the black-white wage gap,”
Review of Economics and Statistics, 85(3), 578–585.
Card, D. (1999): “The causal effect of education on earnings,” Handbook of labor economics, 3,
1801–1863.
Carneiro, P., K. T. Hansen, and J. J. Heckman (2003): “2001 Lawrence R. Klein Lecture:
Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling
and Measurement of the Effects of Uncertainty on College Choice,” International Economic
Review, 44(2), 361–422.
Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011): “Estimating Marginal Returns to
Education,” The American Economic Review, 101(6), 2754–2781.
Chamberlain, G. (1977): “Education, income, and ability revisited,” Journal of Econometrics,
5(2), 241 – 257.
Chamberlain, G., and Z. Griliches (1975): “Unobservables with a Variance-Components Struc-
ture: Ability, Schooling, and the Economic Success of Brothers,” International Economic Review,
16(2), 422–449.
Chesher, A. (1991): “The effect of measurement error,” Biometrika, 78(3), 451–462.
Cunha, F., J. J. Heckman, and S. M. Schennach (2010): “Estimating the technology of
cognitive and noncognitive skill formation,” Econometrica, 78(3), 883–931.
Douglas, J. (1997): “Joint consistency of nonparametric item characteristic curve and ability
estimation,” Psychometrika, 62, 7–28.
Frisch, R. (1934): Statistical confluence analysis by means of complete regression systems. Oslo:
Universitetets Økonomiske Institutt.
Goldberger, A. S. (1972): “Structural Equation Methods in the Social Sciences,” Econometrica,
40(6), 979–1001.
Griliches, Z., and W. M. Mason (1972): “Education, income, and ability,” Journal of Political
Economy, 80, S74–S103.
Hahn, J., and G. Kuersteiner (2002): “Asymptotically unbiased inference for a dynamic panel
model with fixed effects when both n and T are large,” Econometrica, 70(4), 1639–1657.
Heckman, J., R. Pinto, and P. Savelyev (2013): “Understanding the Mechanisms Through
Which an Influential Early Childhood Program Boosted Adult Outcomes,” The American eco-
nomic review, 103(6), 2052–2086.
Heckman, J. J., L. J. Lochner, and P. E. Todd (2006): “Earnings functions, rates of re-
turn and treatment effects: The Mincer equation and beyond,” Handbook of the Economics of
Education, 1, 307–458.
Heckman, J. J., J. Stixrud, and S. Urzua (2006): “The Effects of Cognitive and Noncognitive
Abilities on Labor Market Outcomes and Social Behavior,” Journal of Labor Economics, 24(3),
411–482.
Heckman, J. J., S. Urzua, and E. Vytlacil (2006): “Understanding instrumental variables in
models with essential heterogeneity,” The Review of Economics and Statistics, 88(3), 389–432.
Junker, B., L. S. Schofield, and L. J. Taylor (2012): “The use of cognitive ability measures
as explanatory variables in regression analysis,” IZA Journal of Labor Economics, 1(1), 1–19.
Klepper, S., and E. E. Leamer (1984): “Consistent sets of estimates for regressions with errors
in all variables,” Econometrica: Journal of the Econometric Society, pp. 163–183.
Lord, F., and M. Novick (1968): Statistical theories of mental test scores. Addison-Wesley.
Neumark, D. (1999): “Biases in twin estimates of the return to schooling,” Economics of Educa-
tion Review, 18(2), 143–148.
Noyes, C. (1945): “Director’s Comment,” in Income from Independent Professional Practice, ed.
by M. Friedman, and S. S. Kuznets, pp. 405–410. National Bureau of Economic Research, New
York.
Ramsay, J. (1991): “Kernel smoothing approaches to nonparametric item characteristic curve
estimation,” Psychometrika, 56, 611–630.
Rosen, S. (1977): “Human capital: A survey of empirical research,” in Research in Labor Eco-
nomics, ed. by R. Ehrenberg, vol. 1, pp. 3–40. JAI Press, Greenwich, CT.
Schofield, L. S. (2014): “Measurement error in the AFQT in the NLSY79,” Economics Letters,
123(3), 262 – 265.
Schofield, L. S., B. Junker, L. J. Taylor, and D. A. Black (2014): “Predictive inference
using latent variables with covariates,” Psychometrika, pp. 1–21.
Stanek, K. C., W. G. Iacono, and M. McGue (2011): “Returns to education: what do twin
studies control?,” Twin Research and Human Genetics, 14(06), 509–515.
Stout, W. F. (1990): “A new item response theory modeling approach with applications to
unidimensionality assessment and ability estimation,” Psychometrika, 55(2), 293–325.
Williams, B. (2013): “A Measurement Model with Discrete Measurements and Continuous Latent
Variables,” unpublished manuscript, George Washington University.
(2014): “Identification of a Nonseparable Model with Item Response Data,” Working
paper, George Washington University.
(2015): “A Model with Regressors Generated from Many Measurements with an Appli-
cation to Item Response Data,” Working paper, George Washington University.
Wooldridge, J. M. (2009): “On estimating firm-level production functions using proxy variables
to control for unobservables,” Economics Letters, 104(3), 112–114.
A Proofs
Throughout this appendix I will use the notation a_{n,J} ≲ b_{n,J} for sequences of random variables
{a_{n,J}}_{n,J→∞} and {b_{n,J}}_{n,J→∞} to mean that a_{n,J} ≤ κ b_{n,J} for all n, J, where κ is nonrandom and
does not vary with n or J. I use the matrix norm ||A|| := ||A||_2 = max_{v≠0} |Av|_2/|v|_2 for a matrix
with l columns, where |·|_2 is the usual Euclidean distance (l_2 norm) for a finite-dimensional vector
space. This norm is also equal to the largest singular value of the matrix A. Because the relevant
matrices operate on finite-dimensional vector spaces, all matrix norms are equivalent. I will use this
fact along with the Frobenius norm, ||A||_Frob. := (Σ_{i,j} A_{i,j}²)^{1/2}.
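The norm equivalence invoked here can be illustrated numerically. For any matrix with l columns, ||A||_2 ≤ ||A||_Frob. ≤ √l ||A||_2; the constants depend only on the (fixed) dimension, which is what the appendix exploits. A small check:

```python
import numpy as np

# Equivalence of the operator (spectral) and Frobenius norms in fixed
# dimension: ||A||_2 <= ||A||_Frob <= sqrt(l) * ||A||_2 for an l x l matrix.
rng = np.random.default_rng(1)
l = 4
A = rng.standard_normal((l, l))

spectral = np.linalg.norm(A, 2)       # largest singular value of A
frobenius = np.linalg.norm(A, "fro")  # sqrt of the sum of squared entries

assert spectral <= frobenius <= np.sqrt(l) * spectral
print(spectral, frobenius)
```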
First I need to show that Assumption 3.2 implies the following set of higher level regularity
conditions.

Assumption A.1. There exists J0 such that for all J ≥ J0,

(i) |βJ| ≤ b < ∞.

(ii) ||MJ|| ≤ λ̄ < ∞ and λmin(MJ) ≥ λ > 0.

(iii) λmin(ΨJ) ≥ λ > 0.

(iv) E||WiJ W′iJ − E(WiJ W′iJ)||² ≤ C < ∞.

(v) E(|WiJ uiJ|^{2+δ}) ≤ D < ∞.
Lemma A.1. Under Assumption 3.1, Assumption 3.2 implies Assumption A.1.

Proof. First, |βJ| ≤ ||M*J^{-1}|| |E(W*i Yi)| ≤ (1/λmin(M*J)) √(E(|WiYi|²)) ≤ √(D2)/λ by Assumptions 3.2(i)
and (iv).

Next, WiJ uiJ = (W*iJ + (WiJ − W*iJ))(εiJ − (WiJ − W*iJ)′βJ) so that

|WiJ uiJ|^{2+δ} ≲ 1 + |W*iJ εiJ|^{2+δ} + |W*iJ|^{2+δ}

since |WiJ − W*iJ| = |MiJ − pJ(θi)| ≤ 1. Then since |W*iJ εiJ|^{2+δ} ≲ |W*iJ Yi|^{2+δ} + |W*iJ W*′iJ|^{2+δ} |βJ|^{2+δ},
condition (v) in Assumption A.1 is implied by Assumptions 3.2(iii) and (iv) and the previous result
that |βJ| ≤ b < ∞.

Next, under Assumption 3.1, MJ = M*J + EJ where EJ is a matrix with zeros everywhere except
in the (K+2, K+2) position, where it is equal to E(MiJ²) − E(pJ(θi)²). By the triangle inequality,
||MJ|| ≤ ||M*J|| + ||EJ||, and by Weyl's theorem λmin(MJ) ≥ λmin(M*J) + λmin(EJ). Then note that ||M*J|| ≲
E(||W*iJ W*′iJ||^{2+δ}). Also, E(MiJ²) − E(pJ(θi)²) = J^{-1}ωJ > 0 implies that ||EJ|| = J^{-1}ωJ < J^{-1}
and λmin(EJ) = 0. Therefore, Assumption A.1(ii) follows from 3.2(i) and (iii).
Next, by matrix norm equivalence,

||WiJ W′iJ − E(WiJ W′iJ)||² ≲ ||W*iJ W*′iJ − E(W*iJ W*′iJ)||²_Frob. + (MiJ − E(MiJ))²
    + Σk (MiJ Xik − E(MiJ Xik))² + (MiJ² − E(MiJ²))²

Since 0 < MiJ < 1, the second and fourth terms are bounded and the third term is bounded
by the first. Since ||W*iJ W*′iJ − E(W*iJ W*′iJ)||_Frob. ≲ ||W*iJ W*′iJ − E(W*iJ W*′iJ)|| and ||W*iJ W*′iJ −
E(W*iJ W*′iJ)||² ≲ ||W*iJ W*′iJ||² + ||M*J||², it follows that condition (iv) in Assumption A.1 is implied
by Assumption 3.2(iii).
Finally, it can be shown that Assumptions 3.1 and 3.2 imply that ΨJ = Ψ̄J + F1J where
Ψ̄J = E(W*iJ W*′iJ εiJ²) and F1J is a matrix with entries f^{l,k}_J such that max_{l,k} |f^{l,k}_J| = O(J^{-1}). As
a result, due to e.g. Gershgorin's theorem, the minimum eigenvalue of F1J is also O(J^{-1}), which
implies that for any ε > 0 there is a Jε such that for J ≥ Jε, λmin(F1J) > −ε. Therefore, by Weyl's
theorem, λmin(ΨJ) is bounded away from 0 since λmin(Ψ̄J) is. Furthermore, Ψ̄J = Ψ*J + F2J where F2J =
E(W*iJ W*′iJ (E(Yi | W*iJ) − β′J W*iJ)) is positive definite and hence has all positive eigenvalues. Weyl's
theorem can be applied again to conclude that the minimum eigenvalue of Ψ̄J is bounded from
below by the minimum eigenvalue of Ψ*J, which is bounded away from 0 by Assumption 3.2(ii).
Lemma A.2. Under Assumption A.1, ||M̂J^{-1} − MJ^{-1}|| →p 0.

Proof. First note that ||M̂J^{-1} − MJ^{-1}|| ≤ ||M̂J^{-1}|| ||MJ^{-1}|| ||M̂J − MJ||. Next, I can use the identity
||(I − P)^{-1}|| ≤ 1/(1 − ||P||), for a matrix P such that ||P|| < 1, to show that

||M̂J^{-1}|| ≤ ||MJ^{-1}|| / (1 − ||MJ^{-1}|| ||M̂J − MJ||).

Therefore, for ε > 0,

Pr(||M̂J^{-1} − MJ^{-1}|| > ε) ≤ Pr(||M̂J − MJ|| > 1/||MJ^{-1}||)
    + Pr(||M̂J − MJ|| > ε/(||MJ^{-1}||ε + ||MJ^{-1}||²))

By Assumption A.1(ii), ||MJ^{-1}|| ≤ 1/λmin(MJ) < 1/λ < ∞. The result then follows from Assumption A.1(iv) and Chebyshev's inequality.
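The two inequalities that drive this proof can be verified numerically on an illustrative example (the matrix and perturbation below are my own, chosen only so that the small-perturbation condition holds):

```python
import numpy as np

# Numerical check of the inequalities used in the proof of Lemma A.2:
#   ||Mhat^{-1} - M^{-1}|| <= ||Mhat^{-1}|| * ||M^{-1}|| * ||Mhat - M||
#   ||Mhat^{-1}|| <= ||M^{-1}|| / (1 - ||M^{-1}|| * ||Mhat - M||)
# for a positive definite M and a small perturbation E.
rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
M = B @ B.T + 3.0 * np.eye(3)          # well-conditioned positive definite
E = 0.05 * rng.standard_normal((3, 3))
Mhat = M + E

op = lambda A: np.linalg.norm(A, 2)    # operator norm, as in the appendix
Minv, Mhatinv = np.linalg.inv(M), np.linalg.inv(Mhat)

assert op(Mhatinv - Minv) <= op(Mhatinv) * op(Minv) * op(E) + 1e-12
assert op(Minv) * op(E) < 1.0          # perturbation small enough for the bound
assert op(Mhatinv) <= op(Minv) / (1.0 - op(Minv) * op(E)) + 1e-12
```

The first inequality follows from the identity M̂^{-1} − M^{-1} = M̂^{-1}(M − M̂)M^{-1}; the second from writing M̂ = M(I + M^{-1}(M̂ − M)) and applying the Neumann series bound.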
Finally, I show that the results of Theorem 3.1 hold under Assumptions 3.1 and A.1. Combined
with Lemma A.1 this proves Theorem 3.1.

Proof. Let M̂J = n^{-1} Σ_{i=1}^n WiJ W′iJ and MJ = E(WiJ W′iJ), and define ξiJ := WiJ uiJ. Recall from
the text that β̂J − βJ = V*nJ + B*nJ where

V*nJ := M̂J^{-1} n^{-1} Σ_{i=1}^n (ξiJ − E(ξiJ))

B*nJ := M̂J^{-1} n^{-1} Σ_{i=1}^n E(ξiJ) = M̂J^{-1} E(ξiJ)

where the second equality holds because of the i.i.d. assumption. Also define VnJ := MJ^{-1} n^{-1} Σ_{i=1}^n (ξiJ −
E(ξiJ)) and BJ := MJ^{-1} E(ξiJ).
Now to show the first result,

|√n V*nJ| ≤ ||M̂J^{-1}|| |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    ≤ (||MJ^{-1}|| + ||M̂J^{-1} − MJ^{-1}||) |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|

Then √n V*nJ = Op(1) as desired since ||MJ^{-1}|| = O(1) (by Assumption A.1(ii)), ||M̂J^{-1} − MJ^{-1}|| =
op(1) (by Lemma A.2), and |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))| = Op(1). The latter holds by Chebyshev's
inequality since ξiJ − E(ξiJ) is i.i.d. by assumption and E|ξiJ − E(ξiJ)|² ≲ E|ξiJ|^{2+δ} ≲ 1 (by
Assumption A.1(v)).

Similarly,

|√n B*nJ| ≤ ||M̂J^{-1}|| |√n E(ξiJ)|
    ≤ (||MJ^{-1}|| + ||M̂J^{-1} − MJ^{-1}||) |√n E(ξiJ)|
    = (||MJ^{-1}|| + ||M̂J^{-1} − MJ^{-1}||) (√n/J) βJ,K+2 ωJ

where the equality follows from the derivation in the text. Since ωJ is a simple average of the quantities
E(pj(θi)(1 − pj(θi))), which are bounded by 1, ωJ = O(1); since √n/J and βJ,K+2 are each O(1) by
assumption, I can conclude that √n B*nJ = Op(1) as desired.
I have shown that √n(β̂J − βJ) = Op(1). To prove that Ω_{nJ}^{-1/2}(β̂J − βJ − BJ) →d N(0, I) I will
first show that (i) Ω_{nJ}^{-1/2}(V*nJ − VnJ) = op(1) and (ii) Ω_{nJ}^{-1/2}(B*nJ − BJ) = op(1). The desired result
then follows from (iii) Ω_{nJ}^{-1/2} VnJ →d N(0, I).

I first prove (i). Since

|Ω_{nJ}^{-1/2}(V*nJ − VnJ)| = |Ω_{nJ}^{-1/2}(M̂J^{-1} − MJ^{-1}) n^{-1} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    = |ΨJ^{-1/2} MJ (M̂J^{-1} − MJ^{-1}) n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    ≤ ||ΨJ^{-1/2}|| ||MJ|| ||M̂J^{-1} − MJ^{-1}|| |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|
    ≤ (1/√(λmin(ΨJ))) ||MJ|| ||M̂J^{-1} − MJ^{-1}|| |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))|

the result follows from Assumptions A.1(ii) and A.1(iii) and Lemma A.2, as I have already shown
that |n^{-1/2} Σ_{i=1}^n (ξiJ − E(ξiJ))| = Op(1).
Next I prove (ii). As in the previous argument, note that

|Ω_{nJ}^{-1/2}(B*nJ − BJ)| ≤ (1/√(λmin(ΨJ))) ||MJ|| ||M̂J^{-1} − MJ^{-1}|| |√n E(ξiJ)|

Again the result follows since ||M̂J^{-1} − MJ^{-1}|| = op(1) by Lemma A.2 and the other terms are
bounded in probability.
Now to prove (iii), I will first apply the Lindeberg CLT (Billingsley, 1995, p. 369) to Σ_{i=1}^n νiJ
where νiJ = t′Ω_{nJ}^{-1/2} MJ^{-1}(ξiJ − E(ξiJ))/n for any fixed vector t ∈ R^{K+2}. Then I will appeal to the
Cramér-Wold device.

First note that E(νiJ) = 0 by construction. Second, by definition ΨJ = E((ξiJ − E(ξiJ))(ξiJ −
E(ξiJ))′), so E(νiJ²) = n^{-2} t′Ω_{nJ}^{-1/2} MJ^{-1} ΨJ MJ^{-1} Ω_{nJ}^{-1/2} t = n^{-1} t′t < ∞.

Now define s²n = Σ_{i=1}^n E(νiJ²) = t′t. The next step is to verify the Lyapunov condition:
(1/s_n^{2+δ}) Σ_{i=1}^n E(|νiJ|^{2+δ}) = n E(|νiJ|^{2+δ})/(t′t)^{(2+δ)/2} → 0. To verify this, note that

n E(|νiJ|^{2+δ}) = n^{-δ/2} E(|t′(ΨJ^{-1/2} MJ) MJ^{-1}(ξiJ − E(ξiJ))|^{2+δ})
    = n^{-δ/2} E(|t′ΨJ^{-1/2}(ξiJ − E(ξiJ))|^{2+δ})

So the goal is to show that E(|t′ΨJ^{-1/2}(ξiJ − E(ξiJ))|^{2+δ}) ≤ M < ∞. Noting that |t′ΨJ^{-1/2}(ξiJ −
E(ξiJ))| ≤ (|t|/√(λmin(ΨJ))) |ξiJ − E(ξiJ)|, the result follows by Assumptions A.1(iii) and A.1(v) since E|ξiJ −
E(ξiJ)|^{2+δ} ≲ E|ξiJ|^{2+δ}.