Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
CORRECTION OF SELECTIVITY BIAS IN MODELS OFDOUBLE SELECTION
Berk Yavuzoglua, Insan Tunalib, Elvan Ceyhanc
aDepartment of Economics, University of Wisconsin-Madison, Madison, Wisconsin
53706-1393, USA
bDepartment of Economics, Koc University, Sariyer, ·Istanbul 34450 Turkey
cDepartment of Mathematics, Koc University, Sariyer, ·Istanbul 34450, Turkey
August15, 2008
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �
Abstract
Edgeworth expansions are known to be useful for approximating probability distributions
and moments. In our case, we exploit the expansion to specify models of double selection
by relaxing the trivariate normality assumption. We assume bivariate normality among the
error terms in the two selection equations and allow the distribution of the error term in the
regression equation to be free. This sets the stage for a simple test for the more common
trivariate normality speci�cation. Other recently proposed methods for handling multiple
categories are Multinomial Logit based selection correction models. An empirical example
about the wage determinants of regular and casual female workers in Turkey is presented to
document the di¤erences among the results obtained from our selectivity correction approach,
trivariate normality speci�cation and Multinomial Logit based selection correction models.
Keywords: double selection models; Edgeworth expansion; female labor supply; Multino-
mial Logit based selection correction models; selectivity bias
JEL codes: C34, C35, C63.
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �
1 Motivation
Availability of a random sample provides the conventional setting for estimation of population
parameters. In this process, if the outcome variable is observed just for a subsample of the
original sample and this subsample itself fails to be a random sample, we encounter what
is called selectivity. In case of selectivity, reduced form estimates obtained by conventional
linear regression will be inconsistent for the population parameters. Selectivity problem,
which refers to the non-randomness of a sample caused by selection according to speci�ed
rules, has been a topic of interest for many researchers in econometrics and statistics in the
last 30 years.
To solve the selectivity problem, most of the literature deals with one selection equation
according to which the non-random sample is formed. Heckman (1976, 1979) and Lee (1976,
1979) solved this problem by assuming bivariate normality. Lee (1982) extended this literature
by dropping the bivariate normality assumption and relying on an Edgeworth expansion.
Tunali (1986) built on Heckman�s and Lee�s original work by bringing a second selection
equation into the scene and was able to handle complete and incomplete classi�cations using
a double selection model. He assumed trivariate normality among the random disturbances
of these three equations. Our contribution on the subject is relaxing the trivariate normality
assumption by extending Lee (1982). In obtaining our correction, we do not impose any
condition on the form of the distribution of the random disturbance in the regression equation,
but conveniently assume bivariate normality between the random disturbances of the two
selection equations. This apparently strong distributional assumption can be defended in
light of Ruud (1983) and Stoker (1986), or quasi-maximum likelihood methodology.
As documented in Miller (2005), Edgeworth expansion was introduced by F. Y. Edgeworth
in 1905 and dubbed "Edgeworth�s series" by E. P. Elderton in 1906. H. Cramer established the
theory of "Edgeworth�s series" in the 1920s. The presently popular term Edgeworth expansion
was used for the �rst time by David et al. (1951). The topic has received considerable
attention in statistics and econometrics. In order to sample the contributions, one may look
at Bhattacharya and Rao (1976), Bhattacharya and Ghosh (1978), Rothenberg (1983) and
Kolassa and McCullagh (1990).
Although double selection problem can be solved by employing other methodologies, we
2
opt for Edgeworth expansion because it is easier to implement and yields a richer functional
form for selectivity correction. This form nests the version obtained under trivariate normality
and provides a test of that restriction. We illustrate our robust selectivity bias correction in
the empirical context of Tunali and Baslevent (2006), where the goal is to estimate a wage
equation subject to participation and mode of employment choices.
Multinomial Logit based selection correction models have also been developed for the
purpose of handling two or more rounds of selection. These models have a very di¤erent
structure. They de�ne a selection equation for each possible state, and the random distur-
bances of these selection equations are assumed to be independent and identically Gumbel
distributed (Bourguignon et al., 2007). Our model has the advantage of allowing correlation
between the random disturbances of the two selection equations. Another advantage of our
model is the accommodation of states which are not disjoint with another state (as in Tunali
and Baslevent, 2006). In such cases, using a Multinomial Logit based selection correction
model is equivalent to ignoring the structure of the problem. Nonetheless, we present the re-
sults obtained from some well-known Multinomial Logit based selection models: Lee (1983),
Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al. (2007),
and Dahl (2002).
In Section 2, we introduce the theory related to our problem. We start with the de�n-
ition of the double selection problem; then, discuss various solution strategies and reasons
for employing Edgeworth expansion. After explaining Edgeworth expansion in detail and
developing a theoretical solution for the double selection problem, we conclude with an ex-
planation of the estimation procedure. In Section 3, we provide an empirical example which
illustrates our selectivity correction. In the following section, we revisit the example using
Multinomial Logit based selection correction models. We compare these results with those
based on our selectivity correction approach. In the �nal section of the paper, we present the
conclusion.
3
2 Theory
2.1 Double Selection Problem
We take as our point of departure the following generic representation of the double selection
problem:
y�1i = �01X1i + �1u1i (�rst selection rule), (1)
y�2i = �02X2i + �2u2i (second selection rule), (2)
y3i = �03X3i + �3u3i (regression equation). (3)
For k = 1; 2; 3 and individual i, Xki�s are vectors of explanatory variables, �k�s are the
corresponding vectors of unknown coe¢ cients, uki�s are the random disturbances, and �k�s
are unknown scale parameters. The variables y�1i and y�2i are unobserved continuous random
variables, but without loss of generality, functions of them classify individuals to di¤erent
categories according to a sample selection regime denoted by �. From now on, we drop
subscript i to avoid notational clutter. Let X = X1 [X2 [X3. The feature of interest is
E(y3 j X;�) = �03X3 + �3E(u3 j X;�): (4)
Selectivity is manifested via E(u3 jX;�) 6= 0, which renders conventional linear regression
inconsistent. Parametric approaches exploit additional assumptions which yield a functional
form for E(u3 j X;�), so that adjustment for selectivity can be implemented. It is customary
to assume that (u1; u2; u3) is distributed independently across individuals and of Xk for
k = 1; 2; 3. Once the selectivity adjustment component is consistently estimated, linear
regression becomes a viable choice. The goal, then, is to consistently estimate �3 conditional
on the sample selection regime determined by �.
In the cases covered in Tunali (1986), the selection rule can be expressed as � = fD1; D2g
where D1 and D2 are two dichotomous variables indicating the outcomes of the two selection
rules as follows:
D1 =
8><>: 1 if y�1 > 0
0 if y�1 6 0
9>=>; and D2 =
8><>: 1 if y�2 > 0
0 if y�2 6 0
9>=>; : (5)
4
Then, we can characterize the outcomes resulting from the two selection equations in the
following 2� 2 table, where Sj is the set of individuals in the jth subsample for j = 1; 2; 3; 4.
D2
D1
0 1
0 S1 S2
1 S3 S4
Figure 1. Possible outcomes resulting from selection equations.
The case in which we observe all the cells of 2 � 2 table yields a complete classi�cation
of the original sample. However, incomplete classi�cation cases, in which only three or two
distinct cells of 2� 2 table are observed, are also possible. Various sample selection regimes
which �t this set-up are discussed in Tunali (1986). Without loss of generality, he assumes
that y3 is observed for the subsample S4, so that the conditional expectation function of
interest given in Equation (4) specializes to:
E(y3 j X;D1 = 1; D2 = 1) = �03X3 + �3E(u3j X;D1 = 1; D2 = 1)
= �3X3 + �3E(u3 j �1u1 > ��01X1; �2u2 > ��
02X2): (6)
The conditioning on X is implicit throughout, but we drop it for notational convenience.
Tunali (1986) assumes that u1, u2 and u3 have a trivariate normal distribution and sets
�1 = �2 = 1. The latter is an innocuous assumption for his set-up. However, for other
choices of �, this is no longer the case. In obtaining our solution, we relax both restrictions.
In particular, we let the form of the distribution of the random disturbance in the regression
equation be free. To prevent any confusion, under our distributional assumption, we denote
this term by ~u3 instead of u3, and assume that it has expectation E(~u3) = 0 and variance
V ar(~u3) = 1. However, we maintain that u1 and u2 have a standard bivariate normal
distribution.
In Section 2:3, after setting �1 = �2 = 1 for convenience and giving the functional form
of E(~u3 j u1 > ��01X1; u2 > ��2X2) obtained via an Edgeworth expansion, we provide a
strategy for consistent estimation of �3. The methodology can easily be extended to other
cases covered in Tunali (1986). Presently, we situate our approach within a broader distri-
5
butional context and justify our choice. We relax the variance restriction when we confront
the empirical problem in Tunali and Baslevent (2006).
2.2 Solution Strategies
Given our distributional assumptions, numerous approaches can be pursued for the purposes
of estimation. Kolassa (1994) o¤ers a broad list of series approximations and discusses the
computational details. One commonly used approximation is the Normal Inverse Gaussian,
examined in detail in Eriksson et al. (2004). Unlike Maximum Likelihood Estimation, this
method involves an incomplete speci�cation: the conditional expectation is written without
specifying the joint distribution. Even though this method works well for the univariate case,
its behavior is not known in the multivariate case. Edgeworth expansion also involves an
incomplete speci�cation, but is easier to apply compared to the alternatives.
For our purposes, the most appealing characteristic of Edgeworth expansion is the fact
that it nests the conventional trivariate normality assumption. Under the trivariate normality
assumption, conditional expectation function E(u3 j u1; u2) becomes a linear function of u1
and u2. In other words, trivariate normality allows us to express E(u3 j �) using only the
�rst and second central moments. Edgeworth expansion provides an improvement because
higher order moments are introduced. These allow us to relax the conventional restrictions
imposed on the skewness and kurtosis of the distribution.
Skewness is a measure of the degree of asymmetry of a distribution. If the left tail is
more pronounced than the right tail, the function is said to have negative skewness or said
to be left skewed. If the reverse is true, it has positive skewness or is right skewed. If two tails
are symmetric as in a normal distribution, then the density function has zero skewness. The
skewness of a multivariate distribution is de�ned by a function of 3rd order central moments
(Mardia, 1970b). For the univariate case, it is given by = �3=�3=22 where �i = E(x�E(x))i.
Kurtosis is the degree of peakedness of a distribution compared to the normal distribution,
and is de�ned as a function of 4th order central moments (Mardia, 1970b). For the univariate
case, the kurtosis is de�ned as & = �4=�22 .
Since only the �rst and second central moments are used under the trivariate normality
assumption, the conventional method will not work well if our data are skewed and/or kurtic.
In the derivation of our Edgeworth expansion, we also use the third and fourth central
6
moments. This allows us to test the restrictions on skewness and kurtosis. Most real life data
reveal some amount of skewness and/or kurtosis. Data are said to be skewed or kurtic if a
non-normal distribution is involved in their generation. Lahiri and Song (1999) provide an
example in the context of smoking related diseases.
2.3 Trivariate Edgeworth expansion
Let f(u1; u2; ~u3) be the joint density of u1, u2 and ~u3. The triplet (u1i; u2i; ~u3i) is assumed
to be independently and identically distributed across individulas with zero mean vector and
covariance matrix
� =
2666641 �12 �13
�12 1 �23
�13 �23 1
377775 ;and is independent of Xki for k = 1; 2; 3. It is helpful to keep in mind that u1, u2 and ~u3 are
the random disturbances of Equations (1) - (3). For brevity, we suppress the conditioning on
covariate matrix X and the individual subscript i throughout the section.
We denote the standard trivariate normal density STVN (�12; �13; �23) by g(u1; u2; u3)
and the standard bivariate normal density SBVN (�12) by g(u1; u2). Under some general
conditions given by Chambers (1967) and conditional on existence of all the moments of
u1, u2, and ~u3, the trivariate density f(u1; u2; ~u3) can be expanded in terms of a series of
derivatives of g(u1; u2; u3):
f(u1; u2; ~u3) = g(u1; u2; u3) +X
r+s+p�3
(�1)r+s+pr!s!p!
ArspDru1D
su2D
pu3g(u1; u2; u3); (7)
where Arsp�s are functions of the moments of f(u1; u2; ~u3), and
Dru1D
su2D
pu3g(u1; u2; u3) =
@r+s+pg(u1; u2; u3)
@ur1@us2@u
p3
: (8)
The iterative relation between the derivatives in Equation (8) can be captured with the
help of Hermite polynomials. In our case, a bivariate version is su¢ cient. The bivariate
7
Hermite polynomial of (r; s)th order, denoted by Hrs(u1; u2), satis�es
Dru1D
su2g(u1; u2) = (�1)
r+sHrs(u1; u2)g(u1; u2): (9)
In other words, Hrs(u1; u2) is a function of the ratio of (r; s)th to (0; 0)th order derivatives
of g(u1; u2). With the help of Equation (9), we can deduce from Equation (7) that the joint
density of u1 and u2 can be written as:
h(u1; u2) =
241 + Xr+s�3
1
r!s!Ars0Hrs(u1; u2)
35 g(u1; u2): (10)
Lee (1984) exploits this expression for testing bivariate normality in single selection models.
Employing the usual practice in the literature (see Lee, 1982; Lee, 1984; Lahiri and Song,
1999), we consider terms up to 4th order (r + s + p = 4 in Equation (7)). This amounts
to truncating the series approximation. While this practice is known as a Gram Charlier
Expansion for the univariate case, for the bivariate case, it is called a Type AaAa surface in
Pretorious (1930) and a Type AA surface in Mardia (1970a). For the trivariate case, we may
adopt Mardia�s (1970a) designation and identify the expansion as a Type AAA surface. We
further know that for r + s + p � 4, Arsp = Krsp where Krsp�s denote trivariate cumulants
(Lahiri and Song, 1999).
Returning to the generic representation of the double selection problem, we assume
(u1; u2) � SBVN (�12) in Equations (1) and (2), but allow the distribution of ~u3 to be free
with E(~u3) = 0 and V ar(~u3) = 1 in Equation (3) and set �1 = �2 = 1. Extending the steps
given in Mardia (1970a), we get the following operationally useful formula for the Type AAA
surface:
E(~u3 j u1; u2) = K101H10(u1; u2) +K011H01(u1; u2) +1
2K201H20(u1; u2)
+1
2K021H02(u1; u2) +K111H11(u1; u2) +
1
6K301H30(u1; u2)
+1
6K031H03(u1; u2) +
1
2K211H21(u1; u2) +
1
2K121H12(u1; u2): (11)
8
Equivalently,
E(~u3 j u1; u2) =1
1� �212[(�13 � �12�23)u1 + (�23 � �12�13)u2] +
1
2K201H20(u1; u2)
+1
2K021H02(u1; u2) +K111H11(u1; u2) +
1
6K301H30(u1; u2)
+1
6K031H03(u1; u2) +
1
2K211H21(u1; u2) +
1
2K121H12(u1; u2): (12)
The derivation is provided in Appendix 6:1. Appendix 6:2 gives explicit expressions for the
bivariate Hermite polynomials in Equation (11), and Appendix 6:3 presents derivations of
moments of the truncated SBVN distribution up to the third order.1
Incorporating the explicit formulas for the Hermite polynomials into Equation (12) and
using the moment formulas of the truncated SBVN distribution, we obtain:
E(~u3 j u1 > ��01X1; u2 > ��2X2) = �13�1 + �23�2 +
1
2K201�3 +K111�4 +
1
2K021�5
+1
6K301�6 +
1
6K031�7 +
1
2K211�8 +
1
2K121�9: (13)
Explicit formulas of ��s are given in Appendix 6:4. Note that �1 and �2 are the terms given
in Tunali (1986) and omission of the remaining terms in Equation (12) corresponds to the
standard trivariate normality speci�cation.
In theory, we can exploit Equation (10) and use approximation to the same order (r+s =
4) for the bivariate component, h(u1; u2), instead of assuming standard bivariate normality.
However, this will add 9 terms to the denominator. Since each of these terms includes
a di¤erent cumulant, the problem becomes intractable. We therefore rely on the SBVN
distribution. Two strands in the econometrics literature help us justify our choice. First, as
Ruud (1983) and Stoker (1986) have shown, slope coe¢ cients of index-function models can be
consistently estimated up to a factor of proportionality using any commonly used technique
like probit or logit. Second is the work on quasi maximum likelihood methodology: If the
correct speci�cation of the distribution is in the linear exponential family and we incorrectly
specify the model by using another distribution from that family, we still get consistent
1Our moment derivations of the truncated SBVN distribution up to the second order corroboratewith Henning and Henningsen�s (2007) formulas, but di¤er from Rosenbaum (1961). In our view,Rosenbaum (1961) made a sign error while calculating integrals. Besides, Henning and Henningsen(2007) cross-check their formulas using numerical integration and Monte Carlo simulation.
9
estimates.
As seen from the formulas of ��s in Appendix 6:4, the model is highly non-linear. There-
fore, adopting the practice in the literature, we rely on Heckman�s (1979) two-step estimation
procedure to estimate Equation (12). In the �rst step, we maximize the relevant likelihood
function to get consistent estimates of ��s and �12.2 Then, we �t the following linear regression
for the individuals in S4:
y3 = �03X3+ 1�1+ 2�2+ 3�3+ 4�4+ 5�5+ 6�6+ 7�7+ 8�8+ 9�9+ �3�; (14)
where E(� j u1 > ��01X1; u2 > ��2X2) = 0. We may refer to ��s as selectivity correction
terms. Then, we can test for the presence of selectivity bias by testing joint signi�cance of all
selectivity correction terms �1, �2, :::, �9. Furthermore, we can set 3 = 4 = ::: = 9 = 0 to
test for the conventional trivariate normality speci�cation under the maintained assumption
that the random disturbances in the two selection equations are bivariate normally distrib-
uted.3 The latter test can also be thought as a test for linearity of the conditional expectation
of ~u3 given u1 and u2.
3 Application
3.1 Problem
We illustrate our methodology in the context of the empirical work undertaken in Tunali
and Baslevent (2006). The data come from October 1988 Household Labor Force Survey
conducted by the State Institute of Statistics. The working sample consists of 8962 married
women between 20-54 years old who reside in urban areas (population > 20000), are not
in school and identi�ed as the only wife of the household head. There are four labor force
participation categories in the data set: 7770 non-participants, 745 wage workers (or regu-
lar/casual wage and salary earners), 181 self-employed which include employers, own-account
work and unpaid family workers, and 266 unemployed. The goal of the paper is to estimate
labor supply elasticities for wage workers. This calls for estimation of a wage equation for
2The formation of the likelihood function according to di¤erent sample selection regimes is explainedin detail by Tunali (1986).
3Lahiri and Song (1999) provide a Lagrange Multiplier procedure for testing trivariate normalityassumption in a sequential selection context.
10
the subsample of wage workers, taking selection into consideration.
Tunali and Baslevent (2006) assume that home-work (or non-participation), self-employment
and wage work utilities can be expressed as follows.
Home-work utility : U�0 = �00z + �0; (15)
Self-employment utility : U�1 = �01z + �1; (16)
Wage work utility : U�2 = �02z + �2; (17)
where z is a vector of observed variables, �j�s are the corresponding vectors of unknown
coe¢ cients and �j�s are the random disturbances. Assuming that individuals choose the
state with highest utility, their decision can be captured using the utility di¤erences:
y�1 = U�1 � U�0 = (�01 � �
00)z + (�1 � �0) = �
01z + �1u1; (18)
y�2 = U�2 � U�1 = (�02 � �
01)z + (�2 � �1) = �
02z + �2u2: (19)
This pair of selection equations has the same form as the ones in Equations (1) and (2)
with X1 = X2 = z. Note that y�1 can be expressed as the propensity to be self-employed or
the excess of the utility gained from self-employment compared to home-work and y�2 as the
incremental propensity to engage in wage work rather than self-employment. Then, y�1 + y�2
gives the excess utility under wage work over home-work. Even though we do not know
the preferences of unemployed women, Tunali and Baslevent (2006) follow Magnac (1991)
and de�ne them as people obtaining higher utility either from self-employment or wage work
relative to home-work. The four way classi�cation observed in the sample can be obtained
via the categorical variable LFP (Labor Force Participation)
LFP =
8>>>>>>><>>>>>>>:
0 = home-work, if y�1 < 0 and y�1 + y
�2 < 0;
1 = self-employment, if y�1 > 0 and y�2 < 0;
2 = wage labor, if y�2 > 0 and y�1 + y
�2 > 0;
3 = unemployed, if y�1 > 0 or y�1 + y
�2 > 0:
9>>>>>>>=>>>>>>>;(20)
Hence, the sample selection regime is given by � = fLFPg for this problem. In Appendix
6:5, the methodology given in Section 2:3 is adapted to handle the current case.
11
Since probability of being unemployed is de�ned as a union of self-employed and wage
work probabilities, we conclude that our support is broken down into three mutually exclusive
regions, which correspond to LFP = 0; 1; 2. Considering Equation (20), we see that the
classi�cation in our sample is obtained via y�1, y�2 and y
�1 + y
�2. Suppose we were to normalize
the variances of y�1 and y�1 + y�2 to 1, but this has an implication for the variance of y�2
(�22 = �2�12). Hence, we can apply the normalization to one of �11 = �21 and �22 = �22,
and must let the other variance be free. In our analysis, we apply the normalization �11 = 1
and let �22 be free. In the �rst step, we rely on maximum likelihood estimation and obtain
consistent estimates of �1, �2, �12 and �2 subject to �1 = 1. The relevant likelihood function
is given by
L =Y
LFP=0
P0Y
LFP=1
P1Y
LFP=2
P2Y
LFP=3
P3; (21)
where Pj = Pr(LFP = j) for j = 0; 1; 2; 3 and their explicit de�nitions are provided in
Appendix 6:5.
The regression equation for this problem is a Mincer-type wage equation obtained by
setting y3 = log(wage) in Equation (3) where X3 includes human capital variables and labor
market characteristics:
log(wage) = �03X3 + �3u3: (22)
The aim is to estimate �3 for wage workers, who have LFP = 2. As in Equation (14),
after forming the estimates of selectivity correction terms via �rst step estimates, we run a
linear regression equation with 9 selectivity correction terms in the second step.
3.2 Results
Table 1 provides the summary statistics for the variables used in the �rst stage of the analysis.
We �rst run a normalized (�11 = 1) maximum likelihood estimation as Tunali and Baslevent
(2006) did. This step is estimated using derivative free Maximum Likelihood option (lf) of
Stata by restricting the correlations and variances according to their theoretical ranges using
non-linear transformations. Besides, the probability weights coming from the data are used
12
Table 1. Sample Means (Standard Deviations) of Variables by Labor Force ParticipationStatus
Variable Fullsample
Nonparticipant
Selfemployed
Wageworker
Unemployed
Own wage 1.39(3.17)
Age 33.29 33.41 34.28 32.82 30.65Age squared/100 11.75 11.86 12.37 11.18 9.84Experience 20.29 20.82 21.31 15.76 16.93Experience squared/100 4.87 5.08 5.26 3.11 3.39Illiterate (reference) 0.25 0.27 0.21 0.055 0.13Literate without a diploma 0.090 0.095 0.12 0.034 0.071Elementary school 0.49 0.52 0.47 0.22 0.48Middle school 0.058 0.055 0.10 0.060 0.094High school 0.087 0.059 0.072 0.35 0.20University 0.030 0.0069 0.022 0.28 0.023Husband selfemployed 0.33 0.35 0.39 0.16 0.18Children aged 02 0.26 0.26 0.18 0.22 0.22Children aged 35 0.34 0.35 0.28 0.27 0.34Female children aged 614 0. 41 0.42 0.49 0.32 0.36Male children aged 614 0.44 0.45 0.49 0.34 0.43Extended household 0.12 0.13 0.14 0.11 0.079Ext. hh × Children aged 02 0.029 0.029 0.028 0.031 0.019Ext. hh × Children aged 35 0.036 0.036 0.033 0.039 0.019Ext. hh × Female ch. aged 614 0.045 0.046 0.044 0.032 0.030Ext. hh × Male ch. aged 614 0.047 0.049 0.050 0.039 0.030Share of textiles 0.31 0.31 0.39 0.30 0.31Share of agriculture 0.17 0.18 0.20 0.15 0.19Share of finance 0.060 0.059 0.053 0.064 0.056Migration rate 0.024 0.023 0.021 0.035 0.018Marmara (reference) 0.35 0.35 0.32 0.37 0.28Aegean 0.11 0.11 0.10 0.14 0.21South 0.12 0.12 0.26 0.099 0.13Central 0.19 0.18 0.10 0.23 0.17North West 0.040 0.037 0.028 0.055 0.075East 0.078 0.081 0.088 0.044 0.086South East 0.081 0.087 0.088 0.039 0.015North East 0.033 0.034 0.0055 0.027 0.041Population 200,000 or more 0.67 0.66 0.69 0.74 0.60Population 1 million or more 0.51 0.50 0.43 0.61 0.46Share of Welfare Party 0.073 0.074 0.059 0.062 0.058Share of Left of Center 0.35 0.35 0.33 0.36 0.36Sample size 8962 7770 181 745 266
13
since sampling frame relies on strati�cation. Robust standard errors are reported throughout
the section. Table 2 provides the results of the �rst step.
Based on the likelihood ratio statistics, 4-way model �ts the data well (p�value < 0:0001).
Since �12 is very close to �1, computational fragility is an issue. A cure for computational
fragility in multinomial probit models may be adding alternative speci�c variables for each
election equation as shown by Keane (1993). Since alternative speci�c variables are not
available, this venue is not a viable option. To explore the issue, Tunali and Baslevent (2006)
estimate restricted versions of the model in which they set �22 = B �11 for B = 0:35, 0:5; 1,
and 2 to see the sensitivity of the results. They state that all these restricted versions lead
to very similar inferences. However, they only report the estimates of the restricted version
with B = 0:35 since �22 is equal to 0:358 in the normalized version.4 This version improves
the precision of several slope estimates.5
Detailed discussion of the substantive results can be found in Tunali and Baslevent (2006).
We review some of these results in Section 4:3 in order to compare them with the �rst stage
results of Multinomial Logit based selection correction models. The negative sign of �12 can
be interpreted as follows: if the unobserved characteristics of a woman make her more likely
to choose self-employment rather than home-work, these unobserved characteristics also make
her less likely to choose wage work over self-employment. (Tunali and Baslevent, 2006)
We provide three least squares estimates of the wage equation in Table 3: without se-
lectivity correction terms, with conventional correction (corresponds to trivariate normality
speci�cation), and with robust correction (corresponds to the speci�cation of this paper).
We exclude 10 observations in these regressions due to missing wage information. Note that
experience corresponds to potential labor market experience, which equals age � 7 � years
of schooling for individuals having elementary school education or above and age � 12 for
individuals who have not �nished elementary school.
The regression equation is estimated after excluding 15 variables that appear in the se-
lection equations and putting experience and experience squared/100 instead of age and
4We do not provide the �rst step estimates of the restricted version with B = 0:35 here since thecoe¢ cient estimates di¤er at most by 4% from the normalized version. Furthermore, we only needpoint estimates of the normalized version in order to correct selectivity bias in the regression equation.
5Note that age, age squared/100, high school, university, Aegean, North West, North East andconstant in the �rst selection equation, and literate without a diploma, middle school, share of �nanceand population 1 million or more in the second selection equation become signi�cant in the restrictedversion with B = 0:35 in addition to the signi�cant variables in Table 2.
14
Table 2. Maximum Likelihood Estimates of Reduced Form Participation Equations(Normalized Version)
First Selection Second SelectionVariableCoefficient Standard
ErrorCoefficient Standard
ErrorAge 0.077 0.066 0.010 0.029Age squared/100 0.124 0.096 0.024 0.041Literate without a diploma 0.256*** 0.099 0.185 0.122Elementary school 0.124 0.085 0.048 0.075Middle school 0.471*** 0.138 0.194 0.187High school 0.534 0.580 0.090 0.116University 0.931 1.050 0.149 0.154Husband selfemployed 0.081 0.242 0.152* 0.092Children aged 02 0.178** 0.075 0.077 0.089Children aged 35 0.078 0.077 0.012 0.053Female children aged 614 0.055 0.103 0.075 0.094Male children aged 614 0.031 0.089 0.072 0.065Extended household 0.189 0.133 0.167 0.130Ext. hh. × Children aged 02 0.096 0.181 0.047 0.182Ext. hh. × Children aged 35 0.058 0.178 0.091 0.149Ext. hh. × Female ch. aged 614 0.252 0.165 0.221 0.161Ext. hh. × Male ch. aged 614 0.083 0.161 0.103 0.137Share of textiles 0.058 0.242 0.172 0.153Share of agriculture 0.759* 0.432 0.952*** 0.356Share of finance 3.686* 2.121 2.918 2.637Migration rate 3.642** 1.613 4.214*** 1.515Aegean 0.229 0.228 0.326* 0.180South 0.038 0.101 0.015 0.093Central 0.508* 0.263 0.547* 0.292North West 0.392 0.306 0.526** 0.262East 0.381 0.233 0.424* 0.233South East 0.291* 0.151 0.149 0.201North East 0.850 0.546 0.970* 0.528Share of Welfare Party 5.969*** 2.063 5.628** 2.857Share of Left of Center 0.697 0.695 1.172** 0.508Population 200,000 or more 0.114 0.088 0.029 0.123Population 1 million or more 0.160* 0.092 0.139 0.099Constant 1.688 1.340 0.242 0.643σ11 1 [normalized]σ22 0.358 (0.514)ρ12 0.971*** (0.046)No. of observations 8962Loglikelihood without covariates 3247.25Loglikelihood with covariates 2443.05Robust standard errors are reported in parentheses.* significant at 10%; ** significant at 5%; *** significant at 1%.
15
Table 3. Least Squares Estimates of the Wage EquationWith robustcorrection
With conventionalcorrection
Without selectivitycorrection terms
Variable
Coef. Std.Error
Coef. Std.Error
Coef. Std.Error
Experience 0.032*** 0.012 0.034*** 0.013 0.031** 0.013Experience sq./100 0.064** 0.033 0.072** 0.037 0.064* 0.037Literate w/o a diploma 0.119 0.174 0.080 0.181 0.112 0.186Elementary school 0.055 0.116 0.028 0.120 0.011 0.121Middle school 0.215 0.149 0.256* 0.144 0.231* 0.125High school 0.594*** 0.191 0.437** 0.184 0.465*** 0.119University 1.254*** 0.272 1.083*** 0.250 1.038*** 0.128Share of Textiles 0.308** 0.150 0.282** 0.143 0.407*** 0.122Share of Agriculture 0.098 0.237 0.085 0.238 0.150 0.238Share of Finance 3.437*** 1.319 2.557** 1.225 4.241*** 1.158Aegean 0.135* 0.072 0.169** 0.073 0.131* 0.069South 0.162** 0.081 0.172** 0.080 0.159* 0.082Central 0.035 0.072 0.007 0.073 0.029 0.073North West 0.151 0.100 0.170* 0.099 0.166* 0.096East 0.109 0.114 0.096 0.115 0.142 0.113South East 0.130 0.118 0.086 0.114 0.191* 0.100North East 0.137 0.117 0.210* 0.113 0.129 0.115
1λ 0.044 0.687 0.171 0.146 ―
2λ 2.441 3.132 0.458** 0.182 ―
3λ 0.135 0.412 ― ―
4λ 2.033 2.784 ― ―
5λ 2.076 2.897 ― ―
6λ 0.060 0.215 ― ―
7λ 0.625 1.006 ― ―
8λ 0.595 1.291 ― ―
9λ 1.149 2.080 ― ―
Constant 1.087** 0.519 0.927*** 0.345 0.923*** 0.194No. of obs. 735 735 735R2 0.387 0.378 0.367Robust standard errors are in parentheses.* significant at 10%; ** significant at 5%; *** significant at 1%.
age squared/100. In order to be con�dent that exclusion of these variables do not cast
any doubt in our �ndings, we conduct Sargan�s (1984) test for overidentifying restrictions
on our robust correction.6 The regression of the residuals from the wage equation on the
6We do not include age and age squared/100 in this test since experience is a highly collinearfunction of age.
16
15 excluded variables generates an R2 = 0:0035. Let n denote the sample size. Then,
nR2 = 2:5725 � �2(15), which is well within the acceptance region. Hence, the exclusions are
warranted.
Using robust correction results, a test for the joint signi�cance of all the selectivity cor-
rection terms yields strong evidence in favor of non-random selection (p � value = 0:0006).
Moreover, the test 3 = 4 = ::: = 9 = 0 provides evidence in favor of our robust correction
against conventional correction (p� value = 0:0424).
Only a fraction of women in urban areas participate and participation probability in-
creases dramatically with education. In light of the �rst stage results, one would think that
participants do not constitute a random subsample and this has implications for the estimate
of the wage equation. Indeed, comparison of the substantive results from the three models un-
derscores the importance of proper adjustment for selection. According to our speci�cation,
while the coe¢ cient of middle school becomes insigni�cant (implying people having middle
school education or less earn the same wage), the coe¢ cients of high school and university
dramatically increase in magnitude. Evidently, the minority which opts for wage work is
highly selective of unobservables and failure to account for this yields underestimation of the
returns to higher levels of education. With illiterates as the reference category,high school
results in 81 percent higher returns on average according to our robust correction whereas it
is 55 percent on average under conventional correction. Compared to illiterates, while univer-
sity results in 250 percent higher returns on average under our robust correction, this statistic
is 195 percent on average in the trivariate normality speci�cation. According to our robust
correction, average incremental returns are 21.9 percent per additional year of high school
and 28 percent per additional year of university education. The other signi�cant variables
under our robust correction have coe¢ cient estimates between that of random selection and
conventional correction.
Our robust correction implies that wage is maximized at 25 years of potential labor market
experience. This statistics is given by 24 in the conventional correction. Furthermore, regions
are jointly statistically signi�cant with a p� value of 0:0158. Besides, a one percent increase
in share of textiles causes a 0:31 percent decrease in wages while a one percent increase in
share of �nance increases wages by 3:4 percent on average. These e¤ects are underestimated
in the trivariate normality speci�cation.
17
3.3 Multicollinearity
One issue that arises in the context of our robust correction approach is the large standard
errors of individual selectivity correction terms (see Table 3). Letting R2j for j = 1; 2; ::9
denote the coe¢ cient of determination obtained by regressing �j on the remaining selectivity
correction terms, we obtain values in the range 0:995 < R2j < 1. This implies extremely high
multicollinearity. At a �rst glance, one may think that this is caused by the fragility of our
selection model since �12 is very close to �1. To get rid of this fragility, we reestimated the
model by restricting �22 = 1. This version yields �12 = �0:746 and 0:964 < R2j < 0:9997 for
j = 1; 2; ::9.
To assure that multicollinearity issue is not speci�c to this problem, we also generated
��s for Tunali�s (1986) migration-remigration problem, which is another example of a double
selection problem. The generic statement of this problem was given in Section 2:1. The
empirical context of the problem is as follows. There are three earnings pro�les: y�s under
the stay option, y�o under the one-time move option, y�f under the frequent move option. Let
��1 = y�o� y�s denote anticipated earnings gain from the one-time move to the best potential
destination, and ��1 be the cost of that move. Let ��2 = y�f� y�o denote anticipated incremental
earnings gain from the frequent move involving the best potential combination of intermediate
and �nal destinations, and ��2 be the cost of that move. Tunali (1986) assumes that individuals
are rational, in that sense anticipations are unbiased. Then, the two selection equations in
Equations (1) and (2) are obtained as y�1 = ��1 � ��1 and y�2 = ��2 � ��2. Upon introducing the
variable r = supf0; y�1; y�1 + y�2g, the decision rule becomes
Stay if r = 0; (23)
Move once if r = y�1; (24)
Move more than once if r = y�1 + y�2: (25)
However, Tunali�s (1986) modi�ed decision rule treats r = y�1 + y�2 > 0 > y�1 (move more
than once will be optimal in this case) as a stay decision on the grounds that the �rst move
will not be feasible when 0 > y�1.
Using the two dichotomous variables given in Equation (5), we can de�ne the decision
18
rule as:
Stay if D1 = 0; (26)
Move once if D1 = 1 and D2 = 0; (27)
Move more than once if D1 = 1 and D2 = 1: (28)
Thus, D2 is observed () D1 = 1. The sample selection regime for this problem is a 3-way
classi�cation with � = fD1; D2g, where cells in the �rst row of Figure 1 are combined. A more
detailed description can be found in Tunali (1986). For this problem, we �nd �12 = 0:389.
Nevertheless, high multicollinearity is present: 0:910 < R2j < 0:9999 for j = 1; 2; ::9.
Note that STATA had no trouble in computing standard errors and test statistics despite
extremely high multicollinearity. We are able to explain this issue by exploiting linear alge-
bra. Partition the full covariate matrix ~X3 as (X31 j X32), where X31 includes all explanatory
variables of the model without selectivity correction terms (X31 = X3 in terms of the nota-
tion given in the generic statement of the double selection model), and X32 includes all the
selectivity correction terms. We can write:
( ~X03~X3)
�1 =
0B@ X031X31 X
031X32
X032X31 X
032X32
1CA�1
: (29)
We use the lower right block of the matrix ( ~X03~X3)
�1 in calculating standard errors of
��s. This block may be computed as [(X032X32)�X
032X31(X
031X31)
�1X031X32]
�1 via submatrix
inverse theorem (see Goldberger, 1991). Extremely large R2j�s imply that the matrix (X032X32)
is nearly singular. However, this matrix is not inverted. Instead, the di¤erence between
(X032X32) and X
032X31(X
031X31)
�1X031X32 is inverted. Therefore, it is possible to compute
standard errors for ��s.
19
4 Multinomial Logit Based Selection Correction Models
4.1 Selection Step
Multinomial Logit based selection correction models are used as alternatives to Multivariate
Normal based models in the literature. These models de�ne a selection equation for each
possible state.
In the context of Tunali and Baslevent (2006), let7
y�j = �0jz + �j for j = 0; 1; 2; 3; (30)
where y�j�s denote the utility obtained from the choice of labor force participation status j and
are unobserved, z is the vector of explanatory variables8, �j�s are the corresponding vectors
of unknown coe¢ cients and �j�s are the random disturbances. Let r = max (y�0; y�1; y
�2; y
�3).
The four way classi�cation can obtained via the following categorical variable:
LFP 0 =
8>>>>>>><>>>>>>>:
0 = home-work, if r = y�0;
1 = self-employment, if r = y�1;
2 = wage labor, if r = y�2;
3 = unemployed, if r = y�3:
9>>>>>>>=>>>>>>>;(31)
Hence, the sample selection regime is given by � = fLFP 0g and Equation (30) corresponds
to the selection equations.
We assume that �j�s satisfy Independence of Irrelevant Alternatives (IIA) hypothesis, so
that they are independent and identically Gumbel distributed. McFadden (1973) proves that
this speci�cation corresponds to the Multinomial Logit model (Bourguignon et al., 2007).
The choice probabilities are
�j = Pr(LFP0 = j j z) =
exp(�0jz)
3X�=0
exp(�0�z)
; j = 0; 1; 2; 3: (32)
7 Individual subscripts are dropped from Equation (30) in order to avoid notational clutter.8We deliberately use the same set of explanatory variables speci�ed in the Equations (18) and (19)
for logit based selection equations. The aim is to compare the empirical results of these models andour selectivity correction approach in Section 4:3.
20
Since3Xk=0
�k = 1, we choose non-participants as the reference group and set �0 = 0. The
estimation of these models relies on two step estimation as well. In the �rst step, we obtain
consistent estimates of �j�s by maximizing the likelihood function in Equation (33):
L =Y
LFP=0
�0Y
LFP=1
�1Y
LFP=2
�2Y
LFP=3
�3: (33)
Note that our selectivity correction approach allows correlation between the random dis-
turbances of the two selection equations, but computational di¢ culties limit its generaliza-
tion9 (unless simulation based estimation is used at the �rst stage). On the other hand,
Multinomial Logit based selection correction models do not allow correlation between the
random disturbances of the selection equations, but can handle a large number of selection
equations. Because of the IIA assumption, these models treat unemployment as a distinct
state. However, unemployment probability is de�ned as a union of self-employment and wage
labor options in our problem [see Equation (20)]. Therefore, another potential limitation of
these models is the failure to capture the choice structure appropriately. Despite this failure,
we still discuss second (regression) step estimation for some well-known Multinomial Logit
based selection correction models (Lee (1983), Dubin and McFadden (1984) and its two vari-
ants proposed by Bourguignon et al. (2007), and Dahl (2002)) in the following subsection.
4.2 Regression Step
Regression equation given in Equation (22) remains intact. Our aim is to estimate the
parameter vector of the regression equation under the speci�ed Multinomial Logit based
selection correction models for the individuals with LFP0= 2. Returning to the generic
statement of the selectivity problem in Equation (4), once again we are confronted with
E(u3 j X;�) 6= 0. Multinomial Logit based models di¤er in the assumptions needed for
obtaining a parametric form for E(u3 j X;�). The regression step for each model is discussed
in detail below. Denote the correlations between �j and u3 by �j for j = 0; 1; 2; 3 and let
9This is the case for any multinomial probit model. For example, the aML software package (Lillardand Panis, 2003) may allow at most 4 alternatives.
21
�l = �l � �2 for l = 0; 1; 3. In addition, considering Equation (30), we de�ne
�2 = max (y�l � y�2) for l 2 f0; 1; 3g: (34)
Then,
LFP 0 = 2() �2 < 0: (35)
4.2.1 Lee�s (1983) Model
Lee (1983) generalizes the two-step selection correction methodology provided by Heckman
(1976, 1979) and Lee (1976, 1979) by extending it to polychotomous choice. Thus, this
method can be applied to Multinomial Logit based selectivity problems.
Assume that the cumulative distribution function for �2 is given by F�2(�2 j �0z) where
� is a matrix including the vectors of unknown coe¢ cients in each selection equation, i.e.
� = (�0 j �1 j �2 j �3). Using the translation method, we can de�ne
J�2(�2 j �0z) = ��1(F (�2 j �
0z)); (36)
where � is the standard normal distribution function. Lee (1983) makes the following as-
sumption:
The joint distribution of (u3; J�2(�2 j �0z)) does not depend on the index �
0z: (37)
This assumption is contested in the literature since it has strong implications. First, the
transformation J�2(�2 j �0z) implies that the marginal distribution is independent of �
0z, but
it does not tell us anything about the joint distribution. Besides, Schmertmann (1994) shows
that under Lee�s (1983) index independence assumption, all �l�s must have the same sign;
moreover, if we assume �l� �2�s are identically distributed, �l�s will be identical for l = 0; 1;
3 (Bourguignon et al., 2007).
Lee (1983) also assumes that
E(u3 j �2; �0z) = c2J�2(�2 j �
0z); (38)
22
where c2 is the correlation between u3 and J�2(�2 j �0z). In the literature, some ways of avoid-
ing the additional assumption in Equation (38) are available. If u3 is normally distributed,
this equation follows directly. When u3 is not normally distributed, this equation may be
viewed as a linear approximation to the conditional expectation function. Another way to
justify Equation (38) is to translate the distribution of u3 to standard normal. Following
Equation (36), we denote this translation as Ju3(u3). Then, assuming that the joint distrib-
ution of (Ju3(u3), J�2(�2 j �0z)) does not depend on �
0z (i.e. the correlation between Ju3(u3)
and J�2(�2 j �0z) is zero), we obtain Equation (38).
To recapitulate, under Lee (1983)�s assumptions, we get
E(u3 j LFP0= 2; �
0z) = �2�
j2; (39)
where �2 = �c2 and �j2 =�(J�2 (0 j �
0z))
F�2 (0 j �0z). Note that �j2 is the familiar term in conventional
selection correction. As before, after obtaining the consistent estimate of � from the �rst step,
so for �j2, the second step estimates of Lee�s (1983) model (LEE) are acquired by estimating
a linear regression:
log(wage) = �03X3 + �3�2�
j2: (40)
4.2.2 Dubin and McFadden�s (1984) Model and Its Two Variants
The conditional expectation function of the regression equation error is assumed to be a linear
function of �j � E(nj)�s, yielding Equation (41):10
E(u3 j �1; �2; �3; �4) =p6
�
3Xj=0
�j(�j � E(nj)): (41)
In this expression, consistent with the set-up of the selection step, �j�s are assumed to have
independent extreme value distributions. It can be shown that
E(�2 � E(n2) j LFP0= 2; �
0z) = � log(�2); (42)
10Linear conditional expectation function assumption is a convenient starting point going back toOlsen (1980).
23
E(�l � E(nl) j LFP0= 2; �
0z) =
�l log(�l)
1� �lfor l = 0; 1; 3: (43)
where log denotes the natural logarithm. To assure that E(u3 j X3) = 0, they also assume
3Xj=0
�j = 0: (44)
Under the assumptions given by Equations (41) and (44), the regression equation becomes
log(wage) = �03X3 +
�3p6
�
Xl=0;1;3
�lsl; (45)
where sl =�l log(�l)1��l + log(�2) for l = 0, 1, 3. After forming the three selectivity correction
terms via selection step estimates, we run a least squares on Equation (45) to obtain the
second step estimates of DMF model. Even though the estimation will not provide us a
consistent estimate for �2, we may calculate it from Equation (44) as �2 = �(�0 + �1 + �3).
Since only restriction on the correlation structure is via Equation (44), DMF can be regarded
as superior to LEE.
Bourguignon et al. (2007) show via Monte Carlo experiments that the restriction in
Equation (44) results in biased results when incorrectly imposed; in contrast, e¢ ciency loss
is small by not imposing it when the restriction holds. Hence, they drop the restriction in
Equation (44) and refer to it as the �rst variant of Dubin and McFadden�s (1984) model
(DMF 1). For this model, the regression equation is given by
log(wage) = �03X3 +
�3p6
�
24 Xl=0;1;3
�lsl + �2s2
35 ; (46)
where sl =�l log(�l)1��l for l = 0, 1, 3 and s2 = � log(�2). Observe that if we set �2 =
�(�0 + �1 + �3) in this equation, we get Equation (45). Hence, DMF 1 is more general
compared to DMF. After forming the four selectivity correction terms via selection step
estimates, we run a least squares on Equation (46) to obtain the second step estimates of the
DMF 1 model.
Since the assumption in Equation (41) restricts the distribution of u3, Bourguignon et al.
(2007) proposes a second variant of Dubin and McFadden�s (1984) model (DMF 2). Assuming
24
that the cumulative distribution function for �j is given by G(�j) and using the trick given
in Lee (1983), they make the transformation
��j = J(�j) = ��1(G(�j)): (47)
Then, they assume that conditional expectation function of the regression equation error
is a linear function ��j�s:
E(u3 j �1; �2; �3; �4) =4Xj=1
��j��j ; (48)
where ��j is the correlation between u3 and ��j . Observe that Equation (41) can be obtained
by setting ��j = �j�E(�j) in this equation; therefore, DMF 2 is a generalized version of DMF
1. Note that the assumption in Equation (48) holds if u3 and ��j�s are multivariate normally
distributed. Assuming that �(:) denotes the density function of �j , let
M(�j) =
ZJ(v � log(�j))�(v)dv for j = 0; 1; 2; 3: (49)
Then, Bourguignon et al. (2007) derive the following:
E(��2 j LFP0= 2; �
0z) =M(�2): (50)
E(��l j LFP0= 2; �
0z) =M(�l)
�l�l � 1
for l = 0; 1; 3: (51)
As a result, wage equation can be written as
log(wage) = �03X3 + �3
24 Xl=0;1;3
��l sl + ��2s2
35 : (52)
where sl =M(�l)�l�l�1 for l = 0, 1, 3 and s2 =M(�2). Even though M�s do not have closed
form solutions, they are computed using Gauss-Laguerre quadrature. In this calculation,
methods given by Davis and Polonsky (1964) are used. Second step estimates of DMF 2 are
obtained by running a linear regression on Equation (52) after obtaining consistent �rst step
estimates and then forming selectivity correction terms.
25
4.2.3 Dahl�s (2002) Model
Dahl (2002) proposes a semi-parametric model. Estimation of the model becomes more di¢ -
cult as the number of categories increases. To simplify matters, he makes an index su¢ ciency
assumption by restricting the set of known probabilities to a subset Q0of the full choice set
Q = f�0; �1; �2g in order to deal with large number of categories. We do not include �3 in
Q since �3 = 1� (�0+ �1+ �2) This assumption can be written as follows:
'(u3; �2 j �0z) = '(u3; �2 j Q) = '(u3; �2 j Q
0) (53)
where ' is the conditional density function of u3 and �2.
In other words, Dahl (2002) makes the following restriction in order to reduce the dimen-
sionality of the problem:
E(u3 j LFP0= 2; �
0z) = E(u3 j LFP
0= 2; Q
0) = (Q
0): (54)
is a unique function, which Dahl (2002) obtains by assuming that the probabilities are
invertible. Then, the regression function becomes
log(wage) = �03X3 + (�2): (55)
This approach has two shortcomings. First, as stated by Dahl (2002), we generally can
not test the index su¢ ciency assumption. Second, reducing dimensionality is equivalent to
making a correlation assumption, i.e. �j , the correlation between �j and u3, is a function of
the elements of Q0for j = 0; 1; 2; 3 (Bourguignon et al., 2007). However, compared to other
Multinomial Logit based selection correction models, Dahl (2002) does not assume a linear
conditional expectation function.
A special case of this model proposed by Dahl (2002) is to assume that the set Q0is a
singleton containing only the chosen alternative, in our case 2. Then, the regression equation
becomes
log(wage) = �03X + (�2) (56)
26
We assume that is a fourth order polynomial as in Bourguignon et al. (2007), and call
this model DAHL 1. Another approach, following Bourguignon et al. (2007), is to express
as a function of all probabilities up to fourth order plus the interactions between them.
This latter approach is identi�ed as DAHL 2 in our analysis. In both approaches, �s are
approximated by a series expansion after obtaining �rst step estimates. Unlike the other logit
based methods, DAHL�s approach does not yield consistent estimates of �33 and ��s.
4.3 Application Revisited
We revisit the application given in Section 3 and estimate it using the Multinomial Logit
based selection correction models. Conveniently, the results for both of the steps can be
obtained by using the Stata module selmlog developed by Bourguignon et al. (2007).11 The
results obtained from the selection step are presented in Table 4. Observe that while Tunali
and Baslevent (2006) and we set P3 = P1 + P2, Multinomial Logit based selection correction
models have �3 = 1� (�0+�1+�2). The restriction for unemployment probability in Tunali
and Baslevent (2006) and our model is not satis�ed in Table 4 since coe¢ cients in columns 1
and 2 do not sum to those in column 3. Since our model captures the logic of participation
better, it may be safe to assume that it gives more accurate results. Then, do �rst step
results of Multinomial Logit based models yield us astray? Even though most qualitative
results are similar except those regarding unemployment (for obvious reasons) with those of
the multinomial probit based models12, there are some di¤erences. These di¤erences can be
seen as costs of simplicity.
As seen from Table 4, with illiterates as the reference category, being a middle school or
university graduate increase the log odds for self-employment relative to non-participation.
High school can also be regarded as a signi�cant variable for the selection equation regarding
self-employment since it has a p� value of 0:101. As people get more educated, they become
more able, so they may set up their own business or work for their own account. In that
sense, these �ndings are consistent with expectations. Besides, education levels starting from
elementary school are signi�cant for the selection equation regarding wage workers, and the
coe¢ cient estimates increase with education. This implies the probability of being a wage
11The STATA ado �le is available at http://www.pse.ens.fr/gurgand/selmlog13.html12Throughout the section, in order to make a comparison, we use the results of the restricted version
with �22= 0:35 since it gives more precise estimates.
27
Table 4. Selection Step for Multinomial Logit Based Selection Correction ModelsSelfemployed Wage Worker UnemployedVariableCoef. Std.
ErrorCoef. Std.
ErrorCoef. Std.
ErrorAge 0.085 0.090 0.575*** 0.067 0.128 0.087Age squared/100 0.113 0.126 0.839*** 0.097 0.264** 0.129Literate without a diploma 0.402 0.282 0.371 0.262 0.310 0.295Elementary school 0.194 0.221 0.500*** 0.185 0.287 0.207Middle school 1.083*** 0.311 1.493*** 0.233 0.863*** 0.282High school 0.572 0.348 3.226*** 0.189 1.642*** 0.244University 1.630*** 0.559 5.089*** 0.235 1.735*** 0.476Husband selfemployed 0.082 0.160 1.318*** 0.125 0.863*** 0.166Children aged 02 0.362 0.232 0.390*** 0.134 0.505*** 0.174Children aged 35 0.163 0.193 0.429*** 0.118 0.176 0.147Female children aged 614 0.381** 0.178 0.260** 0.114 0.086 0.151Male children aged 614 0.160 0.180 0.405*** 0.113 0.020 0.152Extended household 0.634* 0.343 0.094 0.256 0.156 0.400Ext. hh. × Children aged 02 0.220 0.562 0.260 0.338 0.099 0.566Ext. hh. × Children aged 35 0.067 0.523 0.446 0.308 0.450 0.543Ext. hh. × Female ch. aged 614 0.702 0.487 0.007 0.312 0.183 0.505Ext. hh. × Male ch. aged 614 0.301 0.476 0.293 0.306 0.090 0.511Share of textiles 0.163 0.539 1.212*** 0.433 0.823* 0.498Share of agriculture 1.115 1.248 1.939** 0.841 1.397 1.039Share of finance 6.208 4.435 2.910 3.615 4.184 4.793Migration rate 10.18*** 3.356 7.469** 3.070 3.402 3.406Aegean 0.729** 0.310 0.551** 0.227 0.507* 0.260South 0.105 0.291 0.283 0.240 0.033 0.315Central 1.739*** 0.361 0.670** 0.292 0.090 0.343North West 1.559*** 0.552 1.012** 0.398 0.350 0.412East 1.386** 0.543 0.495 0.415 0.118 0.481South East 0.451 0.468 0.223 0.361 1.750*** 0.609North East 2.790*** 1.062 1.037** 0.463 0.347 0.485Share of Welfare Party 16.509*** 2.978 2.749 2.309 4.006 2.941Share of Left of Center 4.545*** 1.240 2.400* 1.346 3.757*** 1.465Population 200,000 or more 0.766*** 0.263 0.303* 0.174 0.088 0.230Population 1 million or more 0.372 0.259 0.126 0.185 0.523** 0.249Constant 1.993 1.693 14.708*** 1.343 5.258*** 1.605No. of observations 8962Loglikelihood w/o covariates 4603.94Loglikelihood with covariates 3528.64Robust standard errors are in parentheses.* significant at 10%; ** significant at 5%; *** significant at 1%.The reference group is nonparticipation.
worker compared to non participation increases with education level where illiterates consti-
tute the reference group. These results are consistent with those of the multinomial probit
based models.
Self employment is more likely for women who live in extended households or have female
children aged 6-14 compared to non-participation. However, probability of wage labor over
28
non-participation is not a¤ected by extended households and decreases with the presence
of children. Typically, women are responsible for care of dependents (child and/or elderly
people) due to their comparative advantage and/or the social role assigned to them in the
Turkish culture. As the number of dependents in the household increases, not only time
required for care responsibilities but also living expenses increase. That probably induces
women to work, but they do not have enough time to work in a regular job due to the time
constraint arising from increased care responsibilities. As a result, they may end up being
a self-employed rather than a wage worker. The results of the multinomial probit based
models imply that extended household and presence of children except those aged 0-2 do
not a¤ect participation probabilities. Children aged 0-2 decreases the probability of being a
self-employed over non-participation. Hence, the results di¤er.
While probability of being a wage worker compared to non-participation has a concave
pro�le with respect to age, log odds for self-employment relative to non-participation is not
a¤ected by age. The part regarding wage workers are consistent with the theory of labor
supply over the life cycle. However, self-employment part does not comply with this theory,
and it is contrary to the results of the multinomial probit based models.
Although shares of textiles and agriculture increase the probability of being a wage worker,
they do not a¤ect the probability of self-employment compared to non-participation. The
�rst pattern is inconsistent for agriculture sector, and observe that share of textiles is an
insigni�cant variable for both of the selection equations in the analysis of the multinomial
probit based models. Moreover, both the probabilities of being a wage worker and self-
employed relative to non-participation are not a¤ected by share of �nance, which again
disagrees with the results in Table 2. Notice that the interpretation of migration rate remains
the same.
If the husband is self employed, log odds of being a wage worker relative to non-participation
decreases. Furthermore, nearly all of the region dummies have interpretations consistent with
the results of the multinomial probit based models. Likewise, the interpretations regarding
share of votes received by welfare and left of center parties agree with the multinomial probit
based models except that the probability of self-employment compared to non-participation
decreases with an increase in share of votes received by left of center parties. The vari-
able population 200; 000 or more is signi�cant for self-employment and wage worker selection
29
equations, and this does not comply with the results of of the multinomial probit based
models.
Unlike our model, logit based models use an independent equation for the unemployed.
The results for the unemployed paint a reasonable picture of the labor market. In the selection
equation regarding unemployment, we see that as age increases, the probability of being
unemployed relative to non-participation become less likely. This is reasonable since more
and more people give up looking for a job as they get older. With illiterates as the reference
group, the probability of being unemployed compared to non-participation increases with
middle school education or more. This is as expected since price of leisure is higher for
more educated people, and thus they look for a job. Moreover, having children aged 0-
2 decreases unemployment probability compared to non-participation since women refrain
from looking for a job in order to nurture young children. Besides, women with self employed
husband have a lower unemployment probability compared to non-participation probability.
The reason may be that if these women would like to work, they probably work with their
husbands.
Share of textiles increases the unemployment probability relative to non-participation.
As new textile jobs become available, some low-educated and inexperienced non-participants
may decide to look for a job. Furthermore, as votes received by left of center parties increase,
unemployment probability increases compared to non-participation. This is in line with the
Tunali and Baslevent�s (2006) assertion that people would have a more liberal view about
the female labor in the provinces where the share of votes received by center of left parties is
higher.
In the second stage, selectivity correction terms are formed according to the assumptions
of the speci�ed models and the regression estimates for wage workers are obtained using the
selmlog module. Standard errors are estimated using the bootstrap method by drawing with
replacement 100 and 400 samples where the sample size is determined by the size of the
sample in use, which is 735. Since additional bootstrapping does not lead to big di¤erences
in the standard errors of the variables, we only report the estimates of the version with 100
replications.
Table 5 gives the results based on LEE where weighted least squares is used to account
for heteroskedasticity. The weights are given in the appendix of Bourguignon et al. (2007).
30
The selmlog module provides consistent estimates of �33 and c2. It does not restrict c2 to
the [�1; 1] interval.
Table 5. LEE Estimates of the Wage EquationVariable Coefficient Std. ErrorExperience 0.033** 0.014Experience sq./100 0.068* 0.041Literate w/o a diploma 0.110 0.190Elementary school 0.003 0.124Middle school 0.250* 0.143High school 0.486*** 0.184University 1.067*** 0.256Share of Textiles 0.391*** 0.144Share of Agriculture 0.156 0.254Share of Finance 4.211*** 1.194Aegean 0.132** 0.067South 0.153 0.095Central 0.029 0.077North West 0.171 0.110East 0.137 0.130South East 0.184* 0.112North East 0.129 0.125
j2λ 0.009 0.114
Constant 0.967*** 0.321σ33 0.279*** 0.062c2 0.016 0.178No. of obs. 735Standard errors are estimated using 100 bootstrap replications.* significant at 10%; ** significant at 5%; *** significant at 1%.
The �nding from LEE supports random selection since selectivity correction term has
a p � value = 0:939. This is inconsistent with the results obtained both from the trivariate
normality and our robust speci�cation. In fact, results based on LEE are almost exactly the
same with the results from the model without selectivity correction reported in Table 3.
Table 6 reports the second step estimates from DMF, DMF 1 and DMF 2. This step, for all
these models, is estimated by employing weighed least squares to deal with heteroskedasticity.
The relevant weights can be found in the appendix of Bourguignon et al. (2007). Note
that, for j = 0; 1; 2; 3, sj�s denote di¤erent variables for each model and their de�nitions
were already given in Section 4:2:2. The selmlog module provides consistent estimates of �33,
�j and ��j for j = 0; 1; 2; 3 by not restricting the correlation estimates to their theoretical
ranges.
31
Table 6. DMF, DMF1 and DMF2 Estimates of the Wage EquationDMF 2 DMF 1 DMFVariable
Coef. Std.Error
Coef. Std.Error
Coef. Std.Error
Experience 0.031** 0.012 0.028** 0.014 0.029** 0.013Experience sq./100 0.064* 0.034 0.057 0.036 0.060* 0.036Literate w/o a diploma 0.127 0.195 0.138 0.190 0.128 0.182Elementary school 0.030 0.137 0.044 0.137 0.032 0.124Middle school 0.191 0.176 0.167 0.187 0.208 0.143High school 0.453** 0.197 0.409** 0.186 0.436*** 0.162University 1.044*** 0.299 1.041*** 0.285 1.052*** 0.225Share of Textiles 0.419** 0.166 0.377* 0.197 0.329** 0.160Share of Agriculture 0.176 0.278 0.146 0.271 0.105 0.234Share of Finance 4.593*** 1.728 4.400** 2.166 3.920*** 1.247Aegean 0.115 0.085 0.114 0.121 0.113 0.095South 0.155* 0.105 0.155 0.134 0.159 0.102Central 0.045 0.088 0.044 0.107 0.035 0.081North West 0.150 0.125 0.138 0.135 0.131 0.111East 0.156 0.143 0.149 0.191 0.145 0.129South East 0.199 0.140 0.178 0.192 0.140 0.111North East 0.101 0.173 0.077 0.176 0.090 0.125
0s 0.142 0.256 0.361 0.410 0.352 0.346
1s 0.667 1.415 0.290 2.475 0.136 0.544
2s 0.025 0.974 0.064 0.060 ― ―
3s 0.090 0.546 0.084 0.413 0.273 0.392Constant 1.002*** 0.325 1.052** 0.439 1.001*** 0.352σ33 0.309*** 0.132 0.417 0.420 0.432 0.347
0ρ ― ― 0.718 0.529 0.686 0.476
1ρ ― ― 0.576 2.588 0.266 0.784
2ρ ― ― 0.127 0.096 ― ―
3ρ ― ― 0.168 0.610 0.532 0.574*0ρ 0.255 0.432 ― ― ― ―*1ρ 1.200 1.932 ― ― ― ―*2ρ 0.046 0.160 ― ― ― ―*3ρ 0.163 0.831 ― ― ― ―
No. of obs. 735 735 735Standard errors are estimated using 100 bootstrap replications.* significant at 10%; ** significant at 5%; *** significant at 1%.
Since s0, s1 and s3 are jointly statistically insigni�cant for DMF (p� value = 0:667), and
s0, s1, s2 and s3 are jointly statistically insigni�cant for DMF 1 (p � value = 0:8021) and
DMF 2 (p�value = 0:9564), these models provide evidence in favor of random selection. This
32
conclusion contradicts the results based on trivariate normality and our robust speci�cations.
Like LEE, results obtained from these models are very similar to that of random selection,
reported in Table 3.
Table 7 gives the second step estimates obtained from DAHL 1 and DAHL 2. Note that
we cannot test for non-random selection since the selmlog module neither provides robust
standard errors for correction terms nor computes R2�s.
Table 7. DAHL 1 and DAHL 2 Estimates of the Wage EquationDAHL 2 DAHL 1Variable
Coef. Std.Error
Coef. Std.Error
Experience 0.027* 0.016 0.033** 0.014Experience sq./100 0.050 0.039 0.067* 0.039Literate w/o a diploma 0.204 0.187 0.140 0.182Elementary school 0.090 0.120 0.041 0.120Middle school 0.158 0.162 0.182 0.146High school 0.592*** 0.201 0.533*** 0.170University 1.066*** 0.291 1.147*** 0.240Share of Textiles 0.274 0.195 0.370*** 0.121Share of Agriculture 0.114 0.276 0.140 0.233Share of Finance 4.474*** 1.636 4.138*** 1.168Aegean 0.122 0.109 0.142* 0.077South 0.146 0.112 0.130 0.086Central 0.098 0.097 0.029 0.073North West 0.103 0.125 0.181** 0.091East 0.206 0.152 0.113 0.113South East 0.003 0.165 0.172 0.109North East 0.074 0.161 0.122 0.113Constant 267.265 418.920 1.030*** 0.223No. of obs. 735 735Standard errors are estimated using 100 bootstrap replications.significant at 10%; ** significant at 5%; *** significant at 1%.
Comparing the results in Table 7 with those obtained from our robust correction in
Table 3, observe that South in both models, and experience sq./100, share of textiles and
Aegean in DAHL 2 become insigni�cant. Moreover, North West becomes signi�cant at 5
percent in DAHL 1. The point estimates of the signi�cant variables in both DAHL 1 and our
speci�cation di¤er up to 20 percent whereas the discrepancy rises to 30 percent with DAHL
2.
To recapitulate, Multinomial Logit based selection correction models fail to detect selec-
tivity in wage equation estimates and yield magnitudes di¤ering much from ours. The reason
33
might be that these models ignore the logic of participation. Among these models, closest
estimates to ours are given by DAHL 1. This is surprising since by comparing root mean
squared errors via Monte Carlo experiments, Bourguignon et al. (2007) �nd that DAHL 1
performs better than other models in only 1 out of 100 cases. Furthermore, their results
support using variants of DMF or DAHL 2, which perform badly in our case.
5 Conclusion
This paper discusses correction of selectivity bias in models of double selection. The approach
proposed here relaxes the conventional trivariate normality assumption. To assess the merits
of the new method, we compare our methodology with multinomial based selection correction
models, which have been developed for handling two or more rounds of selection in the
literature.
The conventional selectivity correction mechanism used in double selection models relies
on trivariate normality among the random disturbances of the regression and the two selection
equations. However, when tests are done, this assumption is typically rejected. We relax this
assumption by relying on an Edgeworth expansion, extending what Lee (1982) did for simple
selection models with one selection equation. We assume bivariate normality among the
random disturbances in the two selection equations, but do not impose any restriction on
the form of the distribution of the random disturbance in the regression equation. Since it
relaxes the skewness and/or kurtosis assumptions, our methodology is superior to trivariate
normality based approaches. Besides, we provide simple tests for the presence of selectivity
bias and the trivariate normality speci�cation under our distributional assumption.
The drawback of our model is that extensions beyond 2 selection equations are cumber-
some and computationally intensive. To handle multiple selection, Multinomial Logit based
correction models are used frequently. These models de�ne a selection equation for each
possible category and impose the IIA structure. Our model has the advantage of allowing
correlation between the random disturbances of the two selection equations. In addition,
problems in which the states are not pairwise disjoint are tractable via our speci�cation.
Cross-equation restrictions are impossible for Multinomial Logit based models due to the
maintained IIA property.
34
To illustrate the di¤erences among various selection correction models, an empirical ex-
ample involving the wage determinants of regular and casual female workers in Turkey is
presented. Only a fraction of women living in urban areas of Turkey participate and par-
ticipation probability increases dramatically in response to an increase in education level.
Thus, there is reason to believe that wage workers are not a random subsample. Indeed, both
the conventional correction method and the robust version support this conjecture. Notably,
the example provides evidence in favor of our robust speci�cation, and against the trivariate
normality assumption. We conclude that it is important to allow kurtosis and skewness.
As examples of Multinomial Logit models, we analyzed the methods provided by Lee
(1983), Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al.
(2007), and Dahl (2002). These models fail to capture selectivity. Furthermore, the substan-
tive results di¤er from those obtained under our robust speci�cation. Our model captures
the logic of participation better, so we believe that our robust results are more reliable and
conclude that the speci�ed Multinomial Logit based selection correction models are far from
providing reasonable estimates in this empirical example.
35
6 Appendix
6.1 Derivation of Equations (11) and (12)
Let f(u1; u2; ~u3) be the joint density of u1, u2 and ~u3. The triplet (u1i; u2i; ~u3i) is assumed
to be independently and identically distributed across individulas with zero mean vector and
covariance matrix
� =
2666641 �12 �13
�12 1 �23
�13 �23 1
377775 ;and is independent of Xki for k = 1; 2; 3. For brevity, we suppress the conditioning on
covariate matrix X and the individual subscript i throughout the section. Our aim is to
obtain an expression for E(~u3 j u1; u2).
We denote the standard trivariate normal density STVN (�12; �13; �23) by g(u1; u2; u3)
and the standard bivariate normal density SBVN (�12) by g(u1; u2). Let k(~u3 j u1; u2) be
the conditional density of ~u3 given u1 and u2. We assume that k(~u3 j u1; u2)g(u1; u2) =
f(u1; u2; ~u3). That is, we specify the joint distribution of (u1; u2) as standard bivariate
normal, but allow the conditional distribution of ~u3 given u1 and u2 to be non-normal. Let
A =q1� (�212 + �213 + �223 � 2�12�13�23); (A.1.1)
a =1p
1� �212; (A.1.2)
b1 = �13 � �12�23; (A.1.3)
b2 = �23 � �12�13; (A.1.4)
c =a
A(b1u1 + b2u2) ; (A.1.5)
u�1(u1; u2) = u�1 = a(u1 � �12u2); (A.1.6)
u�2(u1; u2) = u�2 = a(u2 � �12u1); (A.1.7)
u�3 =u3Aa
� c: (A.1.8)
36
We can write:
E
�~u3Aa
j u1; u2�g(u1; u2) =
Z 1
�1
~u3Aa
k(~u3 j u1; u2)g(u1; u2) d~u3: (A.1.9)
Equivalently,
E(~u3 j u1; u2)g(u1; u2)
Aa=
Z 1
�1
~u3Aa
f(u1; u2; ~u3) d~u3: (A.1.10)
For the Type AAA surface de�ned in Section 2:3, we may expand f(u1; u2; ~u3) in terms of a
series of derivatives of g(u1; u2; u3) via Equation (7). We truncate the expansion by keeping
terms with r + s + p � 4. This allows us to set Arsp = Krsp where K�s denote cumulants
(Lahiri and Song, 1999). Thus, the truncated version of Equation (A:1:10) is:
E(~u3 j u1; u2)g(u1; u2)
Aa=
Z 1
�1
u3Aa
g(u1; u2; u3) du3
+4X
r+s+p=3
(�1)r+s+pr!s!p!
Krsp
Z 1
�1
u3Aa
Dru1D
su2D
pu3g(u1; u2; u3)du3: (A.1.11)
Let
I1 =
Z 1
�1
u3Aa
g(u1; u2; u3) du3; (A.1.12)
and
Irsp =
Z 1
�1
u3Aa
Dru1D
su2D
pu3g(u1; u2; u3)du3: (A.1.13)
Then, Equation (A:1:11) may be expressed as
E(~u3 j u1; u2)g(u1; u2)
Aa= I1 +
4Xr+s+p=3
(�1)r+s+pr!s!p!
KrspIrsp: (A.1.14)
37
Claim: g(u1; u2; u3) =�(u1)�(u�2)�(u�3)
A , where � is the univariate standard normal density.
Proof: Let g(u2 j u1) be the conditional density function of u2 given u1 and g(u3 j u1; u2)
be the conditional density of u3 given u1 and u2. Then, g(u1; u2; u3) can be written as a
product of marginal and conditional distributions:
g(u1; u2; u3) = g(u1)g(u2 j u1)g(u3 j u1; u2): (A.1.15)
Clearly, g(u1) = �(u1). We know from Goldberger (1991) that
g(u2 j u1) = a� (u�2) : (A.1.16)
Now, let us �nd g(u3 j u1; u2). Using the formula in Goldberger (1991), we obtain
E(u3 j u1; u2) = a2(b1u1 + b2u2); (A.1.17)
V ar(u3 j u1; u2) = A2a2: (A.1.18)
Then,
g(u3 j u1; u2) =� (u�3)
Aa: (A.1.19)
Substituting Equation (A:1:16) and Equation (A:1:19) into Equation (A:1:15), we obtain
g(u1; u2; u3) =�(u1)� (u
�2)� (u
�3)
A: Q:E:D: (A.1.20)
Substituting Equation (A:1:20) into Equation (A:1:13) and making the transformation z3 =
u�3, so Aadz3 = du3, we get
Irsp =
Z 1
�1(z3 + c)D
ru1D
su2D
pu3�(u1)� (u
�2)�(z3)adz3: (A.1.21)
Let �(p) denote the pth derivative of the univariate standard normal density. Observe that
Dpu3�(z3) = �(p)(z3)
�1
Aa
�p; (A.1.22)
38
and
Dpz3�(z3) = �(p)(z3): (A.1.23)
Hence,
�1
Aa
�pDpz3�(z3) = Dp
u3�(z3): (A.1.24)
Inserting Equation (A:1:24) into Equation (A:1:21), we obtain
Irsp =1
Apap�1
Z 1
�1(z3 + c)D
ru1D
su2D
pz3�(u1)� (u
�2)�(z3)dz3: (A.1.25)
Making another transformation z3 = u3 (abusing the notation), so Dpz3�(z3) = Dp
u3�(u3) and
dz3 = du3, we have
Irsp =1
Apap�1
Z 1
�1(u3 + c)D
ru1D
su2D
pu3�(u1)� (u
�2)�(u3)du3: (A.1.26)
The de�nition of rth order univariate Hermite polynomial can be obtained by setting s = 0
in Equation (9). In other words, Dru1�(u1) = (�1)
rHr(u1)�(u1). Using this, we can deduce
H0(a) = 1 and H1(a) = a. Then, we can write
u3 + c = H1(u3 + c) =
1Xi=0
0B@ 1
i
1CA ciH1�i(u3); (A.1.27)
where
0B@ 1
i
1CA denotes the binomial coe¢ cient. Incorporating Equation (A:1:27) into Equa-
tion (A:1:26), we get
Irsp =(�1)pApap�1
Dru1D
su2�(u1)� (u
�2)
1Xi=0
0B@ 1
i
1CA ciZ 1
�1Hp(u3)H1�i(u3)�(u3)du3: (A.1.28)
39
For any x, the orthogonality property of Hermite polynomials allows us to write
Z 1
�1Hn(x)Hm(x) exp
��x2=2
�dx = n!
p2��n;m, (A.1.29)
where �n;m is the Kronecker product, �n;m =
8><>: 1 if n = m
0 if n 6= m
9>=>;. Since �p;1�i = 1 () p =
1� i() i = 1� p() p � 1,
Z 1
�1Hp(u3)H1�i(u3)�(u3)du3 =
Z 1
�1Hp(u3)H1�i(u3)
exp��u23=2
�p2�
du3
= p!p2��p;1�i
1p2�
= p! for p � 1: (A.1.30)
Also observe thatR1�1Hp(u3)H1�i(u3)�(u3)du3 = 0 for p > 1. Returning to Equation
(A:1:28), we conclude that
Irsp = 0 for p > 1: (A.1.31)
Moreover, substituting Equation (A:1:30) into Equation (A:1:28), we obtain:
Irsp =(�1)pApap�1
Dru1D
su2�(u1)� (u
�2)
0B@ 1
1� p
1CA c1�pp! for p � 1: (A.1.32)
Equivalently,
Irsp =(�1)pApap�1
Dru1D
su2�(u1)� (u
�2)
c1�p
(1� p)! for p � 1: (A.1.33)
Substituting aA(b1u1 + b2u2) back instead of c, we obtain
Irsp =(�1)pAa2p�2
1
(1� p)!Dru1D
su2�(u1)� (u
�2) (b1u1 + b2u2)
1�p for p � 1: (A.1.34)
We can conclude from Equations (A:1:12) and (A:1:13) that I1 = I000. Then, we can write
I1 as
I1 =a2
A�(u1)� (u
�2) (b1u1 + b2u2) : (A.1.35)
40
Setting p = 0 and 1 in Equation (A:1:34); we respectively get:
Irs0 =a2
ADru1D
su2�(u1)� (u
�2) (b1u1 + b2u2) ; (A.1.36)
and
Irs1 = �1
ADru1D
su2�(u1)� (u
�2) : (A.1.37)
After substituting Equations (A:1:31), (A:1:35), (A:1:36), and (A:1:37) into Equation (A:1:16)
and doing some manipulations, we get
E(~u3 j u1; u2)g(u1; u2) = a3�(u1)� (u�2) (b1u1 + b2u2)
+4X
r+s=3
(�1)r+sr!s!
Krs0a3Dr
u1Dsu2�(u1)� (u
�2) (b1u1 + b2u2)
+
3Xr+s=2
(�1)r+sr!s!
Krs1aDru1D
su2�(u1)� (u
�2) : (A.1.38)
Since a�(u1)�(u�2) = g(u1; u2), which may be obtained by multiplying Equation (A:1:16) by
�(u1), Equation (A:1:38) can be written as:
E(~u3 j u1; u2)g(u1; u2) = a2g(u1; u2) (b1u1 + b2u2)
+
4Xr+s=3
(�1)r+s
r!s!Krs0a
2Dru1D
su2g(u1; u2) (b1u1 + b2u2)
+3X
r+s=2
(�1)r+sr!s!
Krs1Dru1D
su2g(u1; u2): (A.1.39)
Now, let us compute Dru1D
su2g(u1; u2) (b1u1 + b2u2) via the chain rule:
Dru1D
su2g(u1; u2) (b1u1 + b2u2) = (b1u1 + b2u2)D
ru1D
su2g(u1; u2)
+ g(u1; u2)Dru1D
su2 (b1u1 + b2u2) : (A.1.40)
Observe that Dru1D
su2(b1u1+ b2u2) = 0 for r+ s � 2. Using the formula for bivariate Hermite
41
polynomials given in Equation (9), we obtain
Dru1D
su2g(u1; u2) (b1u1 + b2u2) = (b1u1 + b2u2)D
ru1D
su2g(u1; u2)
= (b1u1 + b2u2) (�1)r+sHrs(u1; u2)g(u1; u2) for r + s � 2: (A.1.41)
Incorporating Equation (A:1:41) into Equation (A:1:39) and doing some manipulations, we
get
E(~u3 j u1; u2) = a2 (b1u1 + b2u2)
"1 +
4Xr+s=3
1
r!s!Krs0Hrs(u1; u2)
#
+3X
r+s=2
1
r!s!Krs1Hrs(u1; u2): (A.1.42)
Since (u1; u2) � SBVN (�12) Equation (10) allows us to write
1 +4X
r+s=3
1
r!s!Krs0Hrs(u1; u2) = 1: (A.1.43)
This expression may be veri�ed by exploiting the moment formulas in Stuart and Ord (1994),
after showing that Kijk = �ijk for i + j + k = 3 using their technique where ��s denote
trivariate central moments. Thus, when h(u1; u2) = g(u1; u2), Equation (A:1:42) becomes
E(~u3 j u1; u2) = a2(b1u1 + b2u2) +K201
2H20(u1; u2) +K111H11(u1; u2)
+K021
2H02(u1; u2) +
K301
6H30(u1; u2) +
K211
2H21(u1; u2)
+K121
2H12(u1; u2) +
K031
6H21(u1; u2): (A.1.44)
When we express a, b1 and b2 as in Equations (A:1:2), (A:1:3) and (A:1:4) respectively, we
get Equation (12). Using Equations (A:2:1) and (A:2:2) given in the next section, and the
cumulant formulas K101 = �13 and K011 = �23 (Stuart and Ord, 1994), it can be shown that
a2 (b1u1 + b2u2) = K101H10(u1; u2) +K011H01(u1; u2); (A.1.45)
Combining Equation (A:1:45) with Equation (A:1:44), we obtain Equation (11).
42
6.2 Expressions for Bivariate Hermite Polynomials
The expressions given below were obtained by solving Equation (9) for various values of r
and s using Maple. The code may be obtained from the author upon request.
H10(u1; u2) = au�1; (A.2.1)
H01(u1; u2) = au�2; (A.2.2)
H20(u1; u2) = �a+ u�21 ; (A.2.3)
H02(u1; u2) = �a+ u�22 ; (A.2.4)
H11(u1; u2) = ��12 + u�1u�2; (A.2.5)
H30(u1; u2) = au�1�H20 � 2a2
�; (A.2.6)
H03(u1; u2) = au�2�H02 � 2a2
�; (A.2.7)
H21(u1; u2) = aH11u�1 � a3(u�2 � �12u�1); (A.2.8)
H12(u1; u2) = aH11u�2 � a3(u�1 � �12u�2): (A.2.9)
6.3 Moments of the Truncated SBVN Distribution
The moment formulas of the truncated SBVN distribution up to the second order may be
found in Rosenbaum (1961). However, in our independent derivation, we discovered that
Rosenbaum made a sign error while calculating integrals. Our expressions up to the second
order agree with those in Henning and Henningsen (2007), who cross-check their formulas
using numerical integration and Monte Carlo simulation. We also provide some third order
moments of the truncated SBVN distribution below. The derivations may be obtained from
the author upon request. We denote the univariate standard normal density and distribution
function respectively by �(:) and �(:). Let Cti = �tiXti for t = 1; 2 for individual i. We
suppress the individual subscript throughout the section to avoid notational clutter. We
denote the moments of the truncated SBVN distribution for the subsample S4 (de�ned in
43
Section 2:1) as mj;k = E(uj1uk2 j u1 > �C1;u2 > �C2). Recall a = 1p
1��212and let
q =
p1� �212p2�
=1
ap2�; (A.3.1)
D(C1; C2) � D2 = C21 � 2�12C1C2 + C22 ; (A.3.2)
C�1 (C1; C2) � C�1 = a(C1 � �12C2); (A.3.3)
C�2 (C1; C2) � C�2 = a(C2 � �12C1); (A.3.4)
C��1 (C1;�C2) � C��1 = C1 + �12C2; (A.3.5)
C��2 (�C1; C2) � C��2 = C2 + �12C1; (A.3.6)
�(C1; C2) = �(�C1)�(C�2 ); (A.3.7)
�(C1; C2) = �(�C2)�(C�1 ); (A.3.8)
�(C1; C2) = � (aD) ; (A.3.9)
and
L(C1; C2; �12) =
Z 1
�C2
Z 1
�C1g(u1; u2)du1du2: (A.3.10)
Then,
m0;1 =�12�(C1; C2) + �(C1; C2)
L(C1; C2; �12); (A.3.11)
m1;0 =�(C1; C2) + �12�(C1; C2)
L(C1; C2; �12); (A.3.12)
m0;2 =L(C1; C2; �12)� �212C1�(C1; C2)� C2�(C1; C2) + q�12�(C1; C2)
L(C1; C2; �12); (A.3.13)
m2;0 =L(C1; C2; �12)� C1�(C1; C2)� �212C2�(C1; C2) + q�12�(C1; C2)
L(C1; C2; �12); (A.3.14)
m1;1 =�12L(C1; C2; �12)� �12C1�(C1; C2)� �12C2�(C1; C2) + q�(C1; C2)
L(C1; C2; �12); (A.3.15)
44
m0;3 =1
L(C1; C2; �12)[2L(C1; C2; �12)m0;1 +
��12�1� �212
�+ �312C
21
��(C1; C2)
+ C22�(C1; C2)� q�12C��2 �(C1; C2)]; (A.3.16)
m3;0 =1
L(C1; C2; �12)[2L(C1; C2; �12)m1;0 +
��12�1� �212
�+ �312C
22
��(C1; C2)
+ C21�(C1; C2)� q�12C��1 �(C1; C2)]; (A.3.17)
m1;2 =1
L(C1; C2; �12)[2�12L(C1; C2; �12)m0;1 + �12C
22�(C1; C2)
+ (1� �212 + �212C21 )�(C1; C2)� qC��2 �(C1; C2)]; (A.3.18)
m2;1 =1
L(C1; C2; �12)[2�12L(C1; C2; �12)m1;0 + �12C
21�(C1; C2)
+ (1� �212 + �212C22 )�(C1; C2)� qC��1 �(C1; C2)]:
(A.3.19)
6.4 Expressions of ��s
We obtain below formulas using Equation (12) and the formulas provided in Appendices 6:1
and 6:2 via a Maple code.
�1 =�(C1; C2)
L(C1; C2; �12); (A.4.1)
�2 =�(C1; C2)
L(C1; C2; �12); (A.4.2)
�3 =�C1�(C1; C2)� q�12�(C1; C2)
L(C1; C2; �12); (A.4.3)
�4 =�C2�(C1; C2)� q�12�(C1; C2)
L(C1; C2; �12); (A.4.4)
�5 = �q�(C1; C2)
L(C1; C2; �12); (A.4.5)
45
�6 =��(C1; C2) + C21�(C1; C2) + q�12 (aC�1 + C1) �(C1; C2)
L(C1; C2; �12); (A.4.6)
�7 =��(C1; C2) + C22�(C1; C2) + q�12 (aC�2 + C2) �(C1; C2)
L(C1; C2; �12); (A.4.7)
�8 = �aqC�1�(C1; C2)
L(C1; C2; �12); (A.4.8)
�9 = �aqC�2�(C1; C2)
L(C1; C2; �12); (A.4.9)
6.5 Expressions of ��s for Tunali and Baslevent�s (2006) Problem
Let �12 = �1�2�12, �11 = �21 and �22 = �22. When (u1; u2) � SBVN (�12), (�1u1; �1u1 +
�2u2) � BVN (0; 0; �11; a2; e) where a =p�11 + �22 + 2�12 and e = �11 + �12. Let b =
�22 + �12. Denote the probability of LFP = j by Pj and the standard bivariate normal
cumulative distribution function with upper thresholds s1, s2 and correlation coe¢ cient r by
�(s1; s2; r). Assume that state probabilities are given as:
P0 = �(C01; C02;C03); (A.5.1)
P1 = �(C11; C12;C13); (A.5.2)
P2 = �(C21; C22;C23); (A.5.3)
P3 = 1� P0: (A.5.4)
Unemployment probability follows from the adopted de�nition. Considering the categor-
ical variable LFP in Equation (20), observe that
P0 = P (y�1 < 0; y�1 + y
�2 < 0) = P
��1u1 < ��
01z; �1u1 + �2u2 < �
��01 + �
02
�z�
= P
0@u1 < ��01z�1
;�1u1 + �2u2
a<���01 + �
02
�z
a
1A : (A.5.5)
46
P1 = P (y�1 > 0; y�2 < 0) = P
��1u1 > ��
01z; �2u2 < ��
02z�
= P
u1 >
��01z�1
; u2 <��02z�2
!= P
u1 <
�01z
�1; u2 <
��02z�2
!: (A.5.6)
P2 = P (y�2 > 0; y�1 + y
�2 > 0) = P
��2u2 > ��
02z; �1u1 + �2u2 > �
��01 + �
02
�z�
= P
0@u2 > ��02z�2
;�1u1 + �2u2
a>���01 + �
02
�z
a
1A= P
0@u2 < �02z
�2;�1u1 + �2u2
a<
��01 + �
02
�z
a
1A : (A.5.7)
In Equations (A:5:5) through (A:5:7), we rescaled the variables to have unit variances
by dividing them to their standard errors. Observe that for LFP = 0, correlation will be
Cov(�1u1;�1u1+�2u2)pV ar(�1u1)
pV ar(�1u1+�2u2)
= ea�1; for LFP = 1, it will be � Cov(�1u1;�2u2)p
V ar(�1u1)pV ar(�2u2)
= � �12�1�2
;
and for LFP = 2, it will be Cov(�2u2;�1u1+�2u2)pV ar(�2u2)
pV ar(�1u1+�2u2)
= ba�2. Hence, we can write
C01 =��01z�1
; C02 =���01 + �
02
�z
a; C03 =
e
a�1; (A.5.8)
C11 =�01z
�1; C12 =
��02z�2
; C13 = ��12�1�2
; (A.5.9)
C21 =�02z
�2; C22 =
��01 + �
02
�z
a; C23 =
b
a�2: (A.5.10)
Then, in Appendix 6:4, by changing all C1�s to C21,all C2�s to C22, and all �12�s to C23,
we obtain the expressions for ��s in context of Tunali and Baslevent�s (2006) problem.
47
7 References
Bhattacharya, R. N., Ghosh, J. K. (1978) "On the Validity of Formal Edgeworth expansion."
The Annals of Statistics 6 (2), 434-451.
Bhattacharya, R. N., Rao, R. (1976) Normal Approximation and Asymptotic Expansions.
John Wiley & Sons, New York, NY.
Bourguignon, F., Fournier, M., Gurgand, M. (2007) "Selection Bias Corrections Based on the
Multinomial Logit Model: Monte-Carlo Comparisons." Journal of Economic Surveys
21 (1), 174-205.
Chambers, J. M. (1967) "On Methods of Asymptotic Approximations for Multivariate
Distributions." Biometrika 54 (3/4), 367-383.
Cramer, H. (1937) Random Variables and Probability Distributions. Cambridge Tracts,
Cambridge University Press, Cambridge, UK.
Dahl, G. B. (2002) "Mobility and the Returns to Education: Testing a Roy Model with
Multiple Markets." Econometrica 70 (6), 2367-2420.
David, S. T., Kendall, M. G., Stuart, A. (1951) "Some Questions of Distribution in the
Theory of Rank Correlation." Biometrika 38, 131-140.
Davis, P. J., Polonsky, I. (1964) "Numerical Interpolation, Di¤erentiation and Integration,"
in M. Abramowitz and I. A. Stegun (eds.). Ch. 25 in Handbook of Mathematical
Functions with Formulas, Graphs and Mathematical Tables, National Bureau of
Standards, Applied Mathematics Series 55, U. S. Government Printing O¢ ce,
Washington, DC.
Dubin, J. A., McFadden, D. L. (1984) "An Econometric Analysis of Residential Electric
Appliance Holdings and Consumption." Econometrica 52 (2), 345-362.
Edgeworth, F. Y. (1905) "The law of error" Transactions of the Cambridge Philosophical
Society 20, 36-65 & 113-141.
48
Elderton, E. P. (1906) "Notes and Bibliography." Biometrika 5, 206-210.
Eriksson, A., Forsberg, L., Ghysels, E. (2004) "Approximating the Probability Distribution
of Functions of Random Variables: A New Approach." Scienti�c Series, Cirano, Montreal.
Goldberger, A. S. (1991) A course in econometrics. Harvard University Press, Cambridge,
UK.
Heckman, J. J. (1976) "The Common Structure of Statistical Models of Truncation, Sample
Selection and Limited Dependent Variables and a Simpler Estimator for Such Models."
Annals of Economic and Social Measurement 5 (4), 475-492.
Heckman, J. J. (1979) "Sample Selection Bias as a Speci�cation Error." Econometrika
47 (1), 153-161.
Henning, C. H. C. A., Henningsen, A. (2007) "Modeling Farm Households�Price Responses
in the Presence of Transaction Costs and Heterogeneity in Labor Markets." American
Journal of Agricultural Economics 89 (3), 665-681.
Keane, M. P. (1992) "A Note on Identi�cation in the Multinomial Probit Model." Journal
of Business and Economic Statistics 10 (2), 193-200.
Kolassa, J. E. (1994) Series Approximation Methods in Statistics. Springer �Verlag, New
York, NY.
Kolassa, J. E, McCullagh, P. (1990) "Edgeworth series for lattice distributions." The Annals
of Statistics 18 (2), 981-985.
Lahiri, K., Song, J. G. (1999) "Testing for Normality in a Probit Model with Double
Selection." Economics Letters 65 (1), 33-39.
Lee, L. F. (1976) "Estimation of Limited Dependent Variable Models by Two Stage Methods."
Ph.D. Thesis, Department of Economics, University Of Rochester.
Lee, L. F. (1979) "Identi�cation and Estimation in Binary Choice Models with Limited
(Censored) Dependent Variables." Econometrica 47 (4), 977-996.
49
Lee, L. F. (1982) "Some Approaches to the Correction Of Selectivity Bias." Review of
Economic Studies 49 (3), 355-372.
Lee, L. F. (1983) "Generalized Econometric Models with Selectivity." Econometrica 51 (2),
507-512.
Lee, L. F. (1984) "Tests for the Bivariate Normal Distribution in Econometric Models with
Selectivity." Econometrica 52 (4), 843-863.
Lillard, L. A., Panis, C. W. A. (2003) aML multilevel multiprocess statistical software,
Version 2.0. EconWare, Los Angeles, California.
McFadden, D. L. (1973) Conditional Logit Analysis of Qualitative Choice Behavior in
Frontiers in Econometrics in P. Zarembka (eds.). Academic Press, New York, NY.
Magnac, T. (1991) "Segmented or Competitive Labor Markets." Econometrica 59 (1),
165-187.
Mardia, K. V. (1970a) Families of Bivariate Distributions in A.Stuart (eds.). Harner
Publishing Company, Darien, CONN.
Mardia, K. V. (1970b) "Measures of Multivariate Skewness and Kurtosis with Applications."
Biometrica 57 (3), 519-530.
Miller, J. (2005). History of mathematics Pages, http://members.aol.com/je¤570/e.html
Olsen, R. (1980). "A least squares correction for selectivity bias." Econometrica 48 (7),
1815�1820
Pretorious, S. J. (1930) "Skew Bivariate Frequency Surfaces, Examined in the Light of
Numerical Illustrations" Biometrika 22 (1/2), 109-223.
Rosenbaum, S. (1961) "Moments of the Truncated Bivariate Normal Distribution." Journal
of the Royal Statistical Society B2, 405-408.
Rothenberg, T. J. (1983) "Asymptotic properties of some estimators in structural models
in Studies in Econometrics," in S. Karlin, T. Amemiya and L.A. Goodman (eds.). Time
50
Series,and Multivariate Statistics. In Honor of Theodore W. Anderson. Academic Press,
New York, NY.
Ruud, P. A. (1983) "Su¢ cient Conditions for the Consistency of Maximum Likelihood
Estimation Despite Misspeci�cation of Distribution in Multinomial Discrete Choice
Models." Econometrica 51 (1), 225-228.
Sargan, J. D. (1984)Wages and Prices in the UK, Econometrics and Quantitative Economics
in D. Hendry and K. Wallis (eds.). Blackwell, Oxford, UK.
Schmertmann, C. P. (1994) "Selectivity Bias Correction Methods in Polychotomous Sample
Selection Models." Journal of Econometrics 60 (1/2), 101-132.
Stoker, T. M. (1986) "Consistent Estimation of Scaled Coe¢ cients." Econometrica 54 (6),
1461-1481.
Stuart, A., Ord, J. K. (1994) Kendall�s Advanced Theory of Statistics, Volume 1: Distribution
Theory. 6th edition, Edward Arnold, London, UK.
Tunali, I. (1986) "A General Structure for Models of Double-Selection and an Application of
a Joint Migration/Earnings Process with Re-migration." Research in Labor Economics
8B, 235-282.
Tunali, I. , Baslevent, C. (2006) "Female Labor Supply in Turkey," in S. Altug and A.
Filiztekin (eds.). Ch.4 in The Turkish Economy: The Real Economy, Corporate
Governance and Reform and Stabilization Policy, Routledge, London, UK.
51