CORRECTION OF SELECTIVITY BIAS IN MODELS OF DOUBLE SELECTION

Berk Yavuzoglu (a), Insan Tunali (b), Elvan Ceyhan (c)

(a) Department of Economics, University of Wisconsin-Madison, Madison, Wisconsin 53706-1393, USA

(b) Department of Economics, Koc University, Sariyer, İstanbul 34450, Turkey

(c) Department of Mathematics, Koc University, Sariyer, İstanbul 34450, Turkey

August 15, 2008

Abstract

Edgeworth expansions are known to be useful for approximating probability distributions and moments. In our case, we exploit the expansion to specify models of double selection by relaxing the trivariate normality assumption. We assume bivariate normality among the error terms in the two selection equations and allow the distribution of the error term in the regression equation to be free. This sets the stage for a simple test for the more common trivariate normality specification. Other recently proposed methods for handling multiple categories are Multinomial Logit based selection correction models. An empirical example about the wage determinants of regular and casual female workers in Turkey is presented to document the differences among the results obtained from our selectivity correction approach, trivariate normality specification and Multinomial Logit based selection correction models.

Keywords: double selection models; Edgeworth expansion; female labor supply; Multinomial Logit based selection correction models; selectivity bias

JEL codes: C34, C35, C63.

1 Motivation

Availability of a random sample provides the conventional setting for estimation of population parameters. If the outcome variable is observed only for a subsample of the original sample, and this subsample is not itself a random sample, we encounter what is called selectivity. Under selectivity, reduced form estimates obtained by conventional linear regression are inconsistent for the population parameters. The selectivity problem, which refers to the non-randomness of a sample caused by selection according to specified rules, has been a topic of interest for many researchers in econometrics and statistics over the last 30 years.

To solve the selectivity problem, most of the literature deals with one selection equation according to which the non-random sample is formed. Heckman (1976, 1979) and Lee (1976, 1979) solved this problem by assuming bivariate normality. Lee (1982) extended this literature by dropping the bivariate normality assumption and relying on an Edgeworth expansion. Tunali (1986) built on Heckman's and Lee's original work by bringing a second selection equation into the picture and was able to handle complete and incomplete classifications using a double selection model. He assumed trivariate normality among the random disturbances of these three equations. Our contribution is to relax the trivariate normality assumption by extending Lee (1982). In obtaining our correction, we do not impose any condition on the form of the distribution of the random disturbance in the regression equation, but conveniently assume bivariate normality between the random disturbances of the two selection equations. This apparently strong distributional assumption can be defended in light of Ruud (1983) and Stoker (1986), or quasi-maximum likelihood methodology.

As documented in Miller (2005), the Edgeworth expansion was introduced by F. Y. Edgeworth in 1905 and dubbed "Edgeworth's series" by E. P. Elderton in 1906. H. Cramér established the theory of "Edgeworth's series" in the 1920s. The presently popular term Edgeworth expansion was used for the first time by David et al. (1951). The topic has received considerable attention in statistics and econometrics. For a sample of the contributions, one may look at Bhattacharya and Rao (1976), Bhattacharya and Ghosh (1978), Rothenberg (1983) and Kolassa and McCullagh (1990).

Although the double selection problem can be solved by employing other methodologies, we opt for the Edgeworth expansion because it is easier to implement and yields a richer functional form for selectivity correction. This form nests the version obtained under trivariate normality and provides a test of that restriction. We illustrate our robust selectivity bias correction in the empirical context of Tunali and Baslevent (2006), where the goal is to estimate a wage equation subject to participation and mode of employment choices.

Multinomial Logit based selection correction models have also been developed for the purpose of handling two or more rounds of selection. These models have a very different structure. They define a selection equation for each possible state, and the random disturbances of these selection equations are assumed to be independent and identically Gumbel distributed (Bourguignon et al., 2007). Our model has the advantage of allowing correlation between the random disturbances of the two selection equations. Another advantage of our model is the accommodation of states which are not disjoint with another state (as in Tunali and Baslevent, 2006). In such cases, using a Multinomial Logit based selection correction model is equivalent to ignoring the structure of the problem. Nonetheless, we present the results obtained from some well-known Multinomial Logit based selection models: Lee (1983), Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al. (2007), and Dahl (2002).

In Section 2, we introduce the theory related to our problem. We start with the definition of the double selection problem, then discuss various solution strategies and our reasons for employing the Edgeworth expansion. After explaining the Edgeworth expansion in detail and developing a theoretical solution for the double selection problem, we conclude with an explanation of the estimation procedure. In Section 3, we provide an empirical example which illustrates our selectivity correction. In the following section, we revisit the example using Multinomial Logit based selection correction models and compare these results with those based on our selectivity correction approach. The final section of the paper concludes.

2 Theory

2.1 Double Selection Problem

We take as our point of departure the following generic representation of the double selection problem:

$$y^*_{1i} = \beta_1' X_{1i} + \sigma_1 u_{1i} \quad \text{(first selection rule)}, \tag{1}$$

$$y^*_{2i} = \beta_2' X_{2i} + \sigma_2 u_{2i} \quad \text{(second selection rule)}, \tag{2}$$

$$y_{3i} = \beta_3' X_{3i} + \sigma_3 u_{3i} \quad \text{(regression equation)}. \tag{3}$$

For $k = 1, 2, 3$ and individual $i$, the $X_{ki}$'s are vectors of explanatory variables, the $\beta_k$'s are the corresponding vectors of unknown coefficients, the $u_{ki}$'s are the random disturbances, and the $\sigma_k$'s are unknown scale parameters. The variables $y^*_{1i}$ and $y^*_{2i}$ are unobserved continuous random variables, but without loss of generality, functions of them classify individuals into different categories according to a sample selection regime denoted by $\Psi$. From now on, we drop subscript $i$ to avoid notational clutter. Let $X = X_1 \cup X_2 \cup X_3$. The feature of interest is

$$E(y_3 \mid X, \Psi) = \beta_3' X_3 + \sigma_3 E(u_3 \mid X, \Psi). \tag{4}$$

Selectivity is manifested via $E(u_3 \mid X, \Psi) \neq 0$, which renders conventional linear regression inconsistent. Parametric approaches exploit additional assumptions which yield a functional form for $E(u_3 \mid X, \Psi)$, so that adjustment for selectivity can be implemented. It is customary to assume that $(u_1, u_2, u_3)$ is distributed independently across individuals and of $X_k$ for $k = 1, 2, 3$. Once the selectivity adjustment component is consistently estimated, linear regression becomes a viable choice. The goal, then, is to consistently estimate $\beta_3$ conditional on the sample selection regime determined by $\Psi$.
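A small Monte Carlo sketch may help make the source of the bias concrete. It assumes, purely for illustration, standard trivariate normal disturbances, hypothetical correlations, and selection thresholds at zero; none of these numbers come from the paper. The function averages $u_3$ over the draws that pass both selection rules, which is the quantity that must be zero for uncorrected least squares to be consistent.

```python
import math
import random


def simulate_selected_mean(rho13, rho23, rho12=0.5, c1=0.0, c2=0.0,
                           n=200_000, seed=0):
    """Average of u3 over draws with u1 > -c1 and u2 > -c2.

    All correlations and thresholds are hypothetical illustration values.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 3x3 correlation matrix, written out by hand.
    l21 = rho12
    l22 = math.sqrt(1.0 - rho12 ** 2)
    l31 = rho13
    l32 = (rho23 - rho12 * rho13) / l22
    l33 = math.sqrt(1.0 - l31 ** 2 - l32 ** 2)
    total, count = 0.0, 0
    for _ in range(n):
        z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u1 = z1
        u2 = l21 * z1 + l22 * z2
        u3 = l31 * z1 + l32 * z2 + l33 * z3
        if u1 > -c1 and u2 > -c2:  # both selection rules satisfied
            total += u3
            count += 1
    return total / count


# u3 correlated with the selection disturbances: selected mean is far from 0.
biased = simulate_selected_mean(rho13=0.6, rho23=0.4)
# u3 uncorrelated with them: selected mean stays close to 0.
unbiased = simulate_selected_mean(rho13=0.0, rho23=0.0)
print(biased, unbiased)
```

When the regression disturbance is correlated with either selection disturbance, the selected-sample mean of $u_3$ moves away from zero, which is exactly the term $E(u_3 \mid X, \Psi)$ in Equation (4).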

In the cases covered in Tunali (1986), the selection rule can be expressed as $\Psi = \{D_1, D_2\}$, where $D_1$ and $D_2$ are two dichotomous variables indicating the outcomes of the two selection rules as follows:

$$D_1 = \begin{cases} 1 & \text{if } y^*_1 > 0 \\ 0 & \text{if } y^*_1 \le 0 \end{cases} \quad \text{and} \quad D_2 = \begin{cases} 1 & \text{if } y^*_2 > 0 \\ 0 & \text{if } y^*_2 \le 0 \end{cases}. \tag{5}$$

Then, we can characterize the outcomes resulting from the two selection equations in the following 2 × 2 table, where $S_j$ is the set of individuals in the $j$th subsample for $j = 1, 2, 3, 4$.

|           | $D_2 = 0$ | $D_2 = 1$ |
|-----------|-----------|-----------|
| $D_1 = 0$ | $S_1$     | $S_2$     |
| $D_1 = 1$ | $S_3$     | $S_4$     |

Figure 1. Possible outcomes resulting from selection equations.

The case in which we observe all the cells of the 2 × 2 table yields a complete classification of the original sample. However, incomplete classification cases, in which only three or two distinct cells of the 2 × 2 table are observed, are also possible. Various sample selection regimes which fit this set-up are discussed in Tunali (1986). Without loss of generality, he assumes that $y_3$ is observed for the subsample $S_4$, so that the conditional expectation function of interest given in Equation (4) specializes to:

$$\begin{aligned} E(y_3 \mid X, D_1 = 1, D_2 = 1) &= \beta_3' X_3 + \sigma_3 E(u_3 \mid X, D_1 = 1, D_2 = 1) \\ &= \beta_3' X_3 + \sigma_3 E(u_3 \mid \sigma_1 u_1 > -\beta_1' X_1,\ \sigma_2 u_2 > -\beta_2' X_2). \end{aligned} \tag{6}$$

The conditioning on $X$ is implicit throughout, but we drop it for notational convenience. Tunali (1986) assumes that $u_1$, $u_2$ and $u_3$ have a trivariate normal distribution and sets $\sigma_1 = \sigma_2 = 1$. The latter is an innocuous assumption for his set-up. However, for other choices of $\Psi$, this is no longer the case. In obtaining our solution, we relax both restrictions. In particular, we let the form of the distribution of the random disturbance in the regression equation be free. To prevent any confusion, under our distributional assumption, we denote this term by $\tilde{u}_3$ instead of $u_3$, and assume that it has expectation $E(\tilde{u}_3) = 0$ and variance $Var(\tilde{u}_3) = 1$. However, we maintain that $u_1$ and $u_2$ have a standard bivariate normal distribution.

In Section 2.3, after setting $\sigma_1 = \sigma_2 = 1$ for convenience and giving the functional form of $E(\tilde{u}_3 \mid u_1 > -\beta_1' X_1,\ u_2 > -\beta_2' X_2)$ obtained via an Edgeworth expansion, we provide a strategy for consistent estimation of $\beta_3$. The methodology can easily be extended to other cases covered in Tunali (1986). Presently, we situate our approach within a broader distributional context and justify our choice. We relax the variance restriction when we confront the empirical problem in Tunali and Baslevent (2006).

2.2 Solution Strategies

Given our distributional assumptions, numerous approaches can be pursued for the purposes of estimation. Kolassa (1994) offers a broad list of series approximations and discusses the computational details. One commonly used approximation is the Normal Inverse Gaussian, examined in detail in Eriksson et al. (2004). Unlike Maximum Likelihood Estimation, this method involves an incomplete specification: the conditional expectation is written without specifying the joint distribution. Even though this method works well for the univariate case, its behavior is not known in the multivariate case. Edgeworth expansion also involves an incomplete specification, but is easier to apply compared to the alternatives.

For our purposes, the most appealing characteristic of the Edgeworth expansion is the fact that it nests the conventional trivariate normality assumption. Under the trivariate normality assumption, the conditional expectation function $E(u_3 \mid u_1, u_2)$ becomes a linear function of $u_1$ and $u_2$. In other words, trivariate normality allows us to express $E(u_3 \mid \Psi)$ using only the first and second central moments. Edgeworth expansion provides an improvement because higher order moments are introduced. These allow us to relax the conventional restrictions imposed on the skewness and kurtosis of the distribution.

Skewness is a measure of the degree of asymmetry of a distribution. If the left tail is more pronounced than the right tail, the function is said to have negative skewness or to be left skewed. If the reverse is true, it has positive skewness or is right skewed. If the two tails are symmetric as in a normal distribution, then the density function has zero skewness. The skewness of a multivariate distribution is defined by a function of 3rd order central moments (Mardia, 1970b). For the univariate case, it is given by $\gamma = \mu_3 / \mu_2^{3/2}$ where $\mu_i = E(x - E(x))^i$. Kurtosis is the degree of peakedness of a distribution compared to the normal distribution, and is defined as a function of 4th order central moments (Mardia, 1970b). For the univariate case, the kurtosis is defined as $\varsigma = \mu_4 / \mu_2^2$.
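The univariate formulas above translate directly into code. The following is a minimal sketch using sample central moments in place of the population moments; the two small data sets are made-up illustrations, not data from the paper.

```python
def central_moment(xs, order):
    """Sample analogue of mu_i = E(x - E(x))^i."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)


def skewness(xs):
    """mu_3 / mu_2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """mu_4 / mu_2^2 (equals 3 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical symmetric sample
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]   # hypothetical sample with a long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

The symmetric sample has zero skewness, while the sample with one large value has positive skewness, matching the verbal definitions in the text.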

Since only the first and second central moments are used under the trivariate normality assumption, the conventional method will not work well if our data are skewed and/or kurtic. In the derivation of our Edgeworth expansion, we also use the third and fourth central moments. This allows us to test the restrictions on skewness and kurtosis. Most real life data reveal some amount of skewness and/or kurtosis. Data are said to be skewed or kurtic if a non-normal distribution is involved in their generation. Lahiri and Song (1999) provide an example in the context of smoking related diseases.

2.3 Trivariate Edgeworth expansion

Let $f(u_1, u_2, \tilde{u}_3)$ be the joint density of $u_1$, $u_2$ and $\tilde{u}_3$. The triplet $(u_{1i}, u_{2i}, \tilde{u}_{3i})$ is assumed to be independently and identically distributed across individuals with zero mean vector and covariance matrix

$$\Sigma = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{bmatrix},$$

and is independent of $X_{ki}$ for $k = 1, 2, 3$. It is helpful to keep in mind that $u_1$, $u_2$ and $\tilde{u}_3$ are the random disturbances of Equations (1) - (3). For brevity, we suppress the conditioning on covariate matrix $X$ and the individual subscript $i$ throughout the section.

We denote the standard trivariate normal density $STVN(\rho_{12}, \rho_{13}, \rho_{23})$ by $g(u_1, u_2, u_3)$ and the standard bivariate normal density $SBVN(\rho_{12})$ by $g(u_1, u_2)$. Under some general conditions given by Chambers (1967) and conditional on existence of all the moments of $u_1$, $u_2$, and $\tilde{u}_3$, the trivariate density $f(u_1, u_2, \tilde{u}_3)$ can be expanded in terms of a series of derivatives of $g(u_1, u_2, u_3)$:

$$f(u_1, u_2, \tilde{u}_3) = g(u_1, u_2, u_3) + \sum_{r+s+p \ge 3} \frac{(-1)^{r+s+p}}{r!\,s!\,p!} A_{rsp}\, D^r_{u_1} D^s_{u_2} D^p_{u_3}\, g(u_1, u_2, u_3), \tag{7}$$

where the $A_{rsp}$'s are functions of the moments of $f(u_1, u_2, \tilde{u}_3)$, and

$$D^r_{u_1} D^s_{u_2} D^p_{u_3}\, g(u_1, u_2, u_3) = \frac{\partial^{\,r+s+p} g(u_1, u_2, u_3)}{\partial u_1^r\, \partial u_2^s\, \partial u_3^p}. \tag{8}$$

The iterative relation between the derivatives in Equation (8) can be captured with the help of Hermite polynomials. In our case, a bivariate version is sufficient. The bivariate Hermite polynomial of $(r, s)$th order, denoted by $H_{rs}(u_1, u_2)$, satisfies

$$D^r_{u_1} D^s_{u_2}\, g(u_1, u_2) = (-1)^{r+s} H_{rs}(u_1, u_2)\, g(u_1, u_2). \tag{9}$$

In other words, $H_{rs}(u_1, u_2)$ is a function of the ratio of $(r, s)$th to $(0, 0)$th order derivatives of $g(u_1, u_2)$. With the help of Equation (9), we can deduce from Equation (7) that the joint density of $u_1$ and $u_2$ can be written as:

$$h(u_1, u_2) = \left[ 1 + \sum_{r+s \ge 3} \frac{1}{r!\,s!} A_{rs0}\, H_{rs}(u_1, u_2) \right] g(u_1, u_2). \tag{10}$$

Lee (1984) exploits this expression for testing bivariate normality in single selection models. Employing the usual practice in the literature (see Lee, 1982; Lee, 1984; Lahiri and Song, 1999), we consider terms up to 4th order ($r + s + p = 4$ in Equation (7)). This amounts to truncating the series approximation. While this practice is known as a Gram-Charlier expansion for the univariate case, for the bivariate case it is called a Type AaAa surface in Pretorius (1930) and a Type AA surface in Mardia (1970a). For the trivariate case, we may adopt Mardia's (1970a) designation and identify the expansion as a Type AAA surface. We further know that for $r + s + p \le 4$, $A_{rsp} = K_{rsp}$, where the $K_{rsp}$'s denote trivariate cumulants (Lahiri and Song, 1999).

Returning to the generic representation of the double selection problem, we assume $(u_1, u_2) \sim SBVN(\rho_{12})$ in Equations (1) and (2), but allow the distribution of $\tilde{u}_3$ to be free with $E(\tilde{u}_3) = 0$ and $Var(\tilde{u}_3) = 1$ in Equation (3) and set $\sigma_1 = \sigma_2 = 1$. Extending the steps given in Mardia (1970a), we get the following operationally useful formula for the Type AAA surface:

$$\begin{aligned} E(\tilde{u}_3 \mid u_1, u_2) ={}& K_{101} H_{10}(u_1, u_2) + K_{011} H_{01}(u_1, u_2) + \tfrac{1}{2} K_{201} H_{20}(u_1, u_2) \\ &+ \tfrac{1}{2} K_{021} H_{02}(u_1, u_2) + K_{111} H_{11}(u_1, u_2) + \tfrac{1}{6} K_{301} H_{30}(u_1, u_2) \\ &+ \tfrac{1}{6} K_{031} H_{03}(u_1, u_2) + \tfrac{1}{2} K_{211} H_{21}(u_1, u_2) + \tfrac{1}{2} K_{121} H_{12}(u_1, u_2). \end{aligned} \tag{11}$$

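Equivalently,

$$\begin{aligned} E(\tilde{u}_3 \mid u_1, u_2) ={}& \frac{1}{1 - \rho_{12}^2} \left[ (\rho_{13} - \rho_{12}\rho_{23})\, u_1 + (\rho_{23} - \rho_{12}\rho_{13})\, u_2 \right] + \tfrac{1}{2} K_{201} H_{20}(u_1, u_2) \\ &+ \tfrac{1}{2} K_{021} H_{02}(u_1, u_2) + K_{111} H_{11}(u_1, u_2) + \tfrac{1}{6} K_{301} H_{30}(u_1, u_2) \\ &+ \tfrac{1}{6} K_{031} H_{03}(u_1, u_2) + \tfrac{1}{2} K_{211} H_{21}(u_1, u_2) + \tfrac{1}{2} K_{121} H_{12}(u_1, u_2). \end{aligned} \tag{12}$$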

The derivation is provided in Appendix 6.1. Appendix 6.2 gives explicit expressions for the bivariate Hermite polynomials in Equation (11), and Appendix 6.3 presents derivations of moments of the truncated SBVN distribution up to the third order.¹
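The step from Equation (11) to Equation (12) can be checked numerically for the two linear terms. The sketch below is not taken from the appendices: it uses the first-order polynomials $H_{10} = (u_1 - \rho_{12} u_2)/(1 - \rho_{12}^2)$ and $H_{01} = (u_2 - \rho_{12} u_1)/(1 - \rho_{12}^2)$, which follow from differentiating the SBVN density in Equation (9), together with the fact that the second-order cumulants $K_{101}$ and $K_{011}$ equal $\rho_{13}$ and $\rho_{23}$ under unit variances. The evaluation point and correlations are arbitrary illustration values.

```python
import math


def sbvn_density(u1, u2, rho):
    """Standard bivariate normal density g(u1, u2) with correlation rho."""
    q = (u1 * u1 - 2.0 * rho * u1 * u2 + u2 * u2) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))


def h10(u1, u2, rho):
    return (u1 - rho * u2) / (1.0 - rho * rho)


def h01(u1, u2, rho):
    return (u2 - rho * u1) / (1.0 - rho * rho)


def h10_numeric(u1, u2, rho, step=1e-5):
    """Recover H10 from Equation (9): D_{u1} g = (-1) * H10 * g."""
    slope = (sbvn_density(u1 + step, u2, rho)
             - sbvn_density(u1 - step, u2, rho)) / (2.0 * step)
    return -slope / sbvn_density(u1, u2, rho)


def linear_part_eq11(u1, u2, rho12, rho13, rho23):
    """K101 * H10 + K011 * H01 with K101 = rho13 and K011 = rho23."""
    return rho13 * h10(u1, u2, rho12) + rho23 * h01(u1, u2, rho12)


def linear_part_eq12(u1, u2, rho12, rho13, rho23):
    """Bracketed linear term of Equation (12)."""
    return ((rho13 - rho12 * rho23) * u1
            + (rho23 - rho12 * rho13) * u2) / (1.0 - rho12 ** 2)


point = (0.3, -1.2)                      # arbitrary evaluation point
print(h10_numeric(*point, 0.5), h10(*point, 0.5))
print(linear_part_eq11(*point, 0.5, 0.6, 0.4),
      linear_part_eq12(*point, 0.5, 0.6, 0.4))
```

Both printed pairs should agree up to numerical error, which is the sense in which the first two terms of Equation (11) reproduce the linear conditional mean of the trivariate normal case.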

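Incorporating the explicit formulas for the Hermite polynomials into Equation (12) and using the moment formulas of the truncated SBVN distribution, we obtain:

$$\begin{aligned} E(\tilde{u}_3 \mid u_1 > -\beta_1' X_1,\ u_2 > -\beta_2' X_2) ={}& \rho_{13} \lambda_1 + \rho_{23} \lambda_2 + \tfrac{1}{2} K_{201} \lambda_3 + K_{111} \lambda_4 + \tfrac{1}{2} K_{021} \lambda_5 \\ &+ \tfrac{1}{6} K_{301} \lambda_6 + \tfrac{1}{6} K_{031} \lambda_7 + \tfrac{1}{2} K_{211} \lambda_8 + \tfrac{1}{2} K_{121} \lambda_9. \end{aligned} \tag{13}$$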

Explicit formulas of the $\lambda$'s are given in Appendix 6.4. Note that $\lambda_1$ and $\lambda_2$ are the terms given in Tunali (1986), and omission of the remaining terms in Equation (12) corresponds to the standard trivariate normality specification.

In theory, we can exploit Equation (10) and use approximation to the same order ($r + s = 4$) for the bivariate component, $h(u_1, u_2)$, instead of assuming standard bivariate normality. However, this will add 9 terms to the denominator. Since each of these terms includes a different cumulant, the problem becomes intractable. We therefore rely on the SBVN distribution. Two strands in the econometrics literature help us justify our choice. First, as Ruud (1983) and Stoker (1986) have shown, slope coefficients of index-function models can be consistently estimated up to a factor of proportionality using any commonly used technique like probit or logit. Second is the work on quasi maximum likelihood methodology: if the correct specification of the distribution is in the linear exponential family and we incorrectly specify the model by using another distribution from that family, we still get consistent estimates.

¹ Our moment derivations of the truncated SBVN distribution up to the second order corroborate with Henning and Henningsen's (2007) formulas, but differ from Rosenbaum (1961). In our view, Rosenbaum (1961) made a sign error while calculating integrals. Besides, Henning and Henningsen (2007) cross-check their formulas using numerical integration and Monte Carlo simulation.

As seen from the formulas of the $\lambda$'s in Appendix 6.4, the model is highly non-linear. Therefore, adopting the practice in the literature, we rely on Heckman's (1979) two-step estimation procedure to estimate Equation (12). In the first step, we maximize the relevant likelihood function to get consistent estimates of the $\beta$'s and $\rho_{12}$.² Then, we fit the following linear regression for the individuals in $S_4$:

$$y_3 = \beta_3' X_3 + \gamma_1 \lambda_1 + \gamma_2 \lambda_2 + \gamma_3 \lambda_3 + \gamma_4 \lambda_4 + \gamma_5 \lambda_5 + \gamma_6 \lambda_6 + \gamma_7 \lambda_7 + \gamma_8 \lambda_8 + \gamma_9 \lambda_9 + \sigma_3 \eta, \tag{14}$$

where $E(\eta \mid u_1 > -\beta_1' X_1,\ u_2 > -\beta_2' X_2) = 0$. We may refer to the $\lambda$'s as selectivity correction terms. Then, we can test for the presence of selectivity bias by testing joint significance of all selectivity correction terms $\lambda_1, \lambda_2, \ldots, \lambda_9$. Furthermore, we can set $\gamma_3 = \gamma_4 = \ldots = \gamma_9 = 0$ to test for the conventional trivariate normality specification under the maintained assumption that the random disturbances in the two selection equations are bivariate normally distributed.³ The latter test can also be thought of as a test for linearity of the conditional expectation of $\tilde{u}_3$ given $u_1$ and $u_2$.
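The second step and the associated exclusion test can be sketched in a few lines. Everything below is a simplified illustration under stated assumptions: the data are synthetic, three hypothetical regressors `lam1`, `lam2`, `lam3` stand in for the nine estimated correction terms, the test is a classical F test rather than the robust version used in the application, and the sampling error from the first step is ignored.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_ssr(rows, y):
    """Return (coefficients, sum of squared residuals) of y regressed on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    ssr = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, ssr


def f_statistic(ssr_restricted, ssr_unrestricted, n_restrictions, n_obs, n_params):
    """Classical F statistic for linear exclusion restrictions."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    return numerator / (ssr_unrestricted / (n_obs - n_params))


# Synthetic selected sample (hypothetical coefficients, not estimates from the paper).
rng = random.Random(1)
n_obs = 500
data, outcome = [], []
for _ in range(n_obs):
    x, lam1, lam2, lam3 = (rng.gauss(0, 1) for _ in range(4))
    data.append([1.0, x, lam1, lam2, lam3])
    outcome.append(1.0 + 0.5 * x + 0.3 * lam1 + 0.2 * lam2 + 0.4 * lam3
                   + rng.gauss(0, 1))

# Unrestricted: all correction terms. Restricted: only the two "conventional" ones.
beta_u, ssr_u = ols_ssr(data, outcome)
beta_r, ssr_r = ols_ssr([row[:4] for row in data], outcome)
f_stat = f_statistic(ssr_r, ssr_u, n_restrictions=1, n_obs=n_obs, n_params=5)
print(beta_u, f_stat)
```

Dropping `lam3` plays the role of imposing $\gamma_3 = \ldots = \gamma_9 = 0$: a large F statistic is evidence against the restricted, trivariate normal form of the correction.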

3 Application

3.1 Problem

We illustrate our methodology in the context of the empirical work undertaken in Tunali and Baslevent (2006). The data come from the October 1988 Household Labor Force Survey conducted by the State Institute of Statistics. The working sample consists of 8962 married women between 20 and 54 years old who reside in urban areas (population > 20,000), are not in school and are identified as the only wife of the household head. There are four labor force participation categories in the data set: 7770 non-participants, 745 wage workers (or regular/casual wage and salary earners), 181 self-employed (which include employers, own-account workers and unpaid family workers), and 266 unemployed. The goal of the paper is to estimate labor supply elasticities for wage workers. This calls for estimation of a wage equation for the subsample of wage workers, taking selection into consideration.

² The formation of the likelihood function according to different sample selection regimes is explained in detail by Tunali (1986).

³ Lahiri and Song (1999) provide a Lagrange Multiplier procedure for testing the trivariate normality assumption in a sequential selection context.

Tunali and Baslevent (2006) assume that home-work (or non-participation), self-employment and wage work utilities can be expressed as follows.

$$\text{Home-work utility:} \quad U^*_0 = \theta_0' z + \nu_0, \tag{15}$$

$$\text{Self-employment utility:} \quad U^*_1 = \theta_1' z + \nu_1, \tag{16}$$

$$\text{Wage work utility:} \quad U^*_2 = \theta_2' z + \nu_2, \tag{17}$$

where $z$ is a vector of observed variables, the $\theta_j$'s are the corresponding vectors of unknown coefficients and the $\nu_j$'s are the random disturbances. Assuming that individuals choose the state with highest utility, their decision can be captured using the utility differences:

$$y^*_1 = U^*_1 - U^*_0 = (\theta_1' - \theta_0') z + (\nu_1 - \nu_0) = \beta_1' z + \sigma_1 u_1, \tag{18}$$

$$y^*_2 = U^*_2 - U^*_1 = (\theta_2' - \theta_1') z + (\nu_2 - \nu_1) = \beta_2' z + \sigma_2 u_2. \tag{19}$$

This pair of selection equations has the same form as the ones in Equations (1) and (2) with $X_1 = X_2 = z$. Note that $y^*_1$ can be expressed as the propensity to be self-employed or the excess of the utility gained from self-employment compared to home-work and $y^*_2$ as the incremental propensity to engage in wage work rather than self-employment. Then, $y^*_1 + y^*_2$ gives the excess utility under wage work over home-work. Even though we do not know the preferences of unemployed women, Tunali and Baslevent (2006) follow Magnac (1991) and define them as people obtaining higher utility either from self-employment or wage work relative to home-work. The four way classification observed in the sample can be obtained via the categorical variable LFP (Labor Force Participation)

$$LFP = \begin{cases} 0 = \text{home-work}, & \text{if } y^*_1 < 0 \text{ and } y^*_1 + y^*_2 < 0; \\ 1 = \text{self-employment}, & \text{if } y^*_1 > 0 \text{ and } y^*_2 < 0; \\ 2 = \text{wage labor}, & \text{if } y^*_2 > 0 \text{ and } y^*_1 + y^*_2 > 0; \\ 3 = \text{unemployed}, & \text{if } y^*_1 > 0 \text{ or } y^*_1 + y^*_2 > 0. \end{cases} \tag{20}$$

Hence, the sample selection regime is given by $\Psi = \{LFP\}$ for this problem. In Appendix 6.5, the methodology given in Section 2.3 is adapted to handle the current case.
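The first three regions of Equation (20) are simply the highest-utility rule rewritten in terms of utility differences. The following sketch checks this equivalence on random draws; the function names are hypothetical, the utilities are simulated standard normals rather than fitted values, and the unemployed category is left out because it overlaps with the self-employment and wage work regions rather than forming a separate region.

```python
import random


def classify_by_rules(y1, y2):
    """Regions of Equation (20) for LFP = 0, 1, 2 (boundary ties ignored)."""
    if y1 < 0 and y1 + y2 < 0:
        return 0  # home-work
    if y1 > 0 and y2 < 0:
        return 1  # self-employment
    if y2 > 0 and y1 + y2 > 0:
        return 2  # wage work
    return None  # only reachable on a boundary


def classify_by_utility(u_home, u_self, u_wage):
    """Index of the state with the highest utility."""
    utilities = [u_home, u_self, u_wage]
    return utilities.index(max(utilities))


rng = random.Random(0)
draws = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]
# y1 = U1 - U0 and y2 = U2 - U1, as in Equations (18) and (19).
agree = all(classify_by_rules(u1 - u0, u2 - u1) == classify_by_utility(u0, u1, u2)
            for u0, u1, u2 in draws)
print(agree)
```

Agreement on every draw reflects the fact that the three regions partition the $(y^*_1, y^*_2)$ plane, apart from boundaries of probability zero.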

Since the probability of being unemployed is defined as a union of self-employed and wage work probabilities, we conclude that our support is broken down into three mutually exclusive regions, which correspond to $LFP = 0, 1, 2$. Considering Equation (20), we see that the classification in our sample is obtained via $y^*_1$, $y^*_2$ and $y^*_1 + y^*_2$. Suppose we were to normalize the variances of $y^*_1$ and $y^*_1 + y^*_2$ to 1; this would have an implication for the variance of $y^*_2$ ($\sigma_{22} = -2\sigma_{12}$). Hence, we can apply the normalization to only one of $\sigma_{11} = \sigma_1^2$ and $\sigma_{22} = \sigma_2^2$, and must let the other variance be free. In our analysis, we apply the normalization $\sigma_{11} = 1$ and let $\sigma_{22}$ be free. In the first step, we rely on maximum likelihood estimation and obtain consistent estimates of $\beta_1$, $\beta_2$, $\rho_{12}$ and $\sigma_2$ subject to $\sigma_1 = 1$. The relevant likelihood function is given by

$$L = \prod_{LFP=0} P_0 \prod_{LFP=1} P_1 \prod_{LFP=2} P_2 \prod_{LFP=3} P_3, \tag{21}$$

where $P_j = \Pr(LFP = j)$ for $j = 0, 1, 2, 3$ and their explicit definitions are provided in Appendix 6.5.
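The variance implication noted in this paragraph follows from one line of algebra. The worked step below reads $\sigma_{12}$ as the covariance between $y^*_1$ and $y^*_2$, which is an assumption about notation the text uses without defining.

```latex
\begin{align*}
\operatorname{Var}(y^*_1 + y^*_2) &= \sigma_{11} + \sigma_{22} + 2\sigma_{12}, \\
\sigma_{11} = 1 \ \text{and} \ \operatorname{Var}(y^*_1 + y^*_2) = 1
  &\;\Longrightarrow\; 1 = 1 + \sigma_{22} + 2\sigma_{12}
  \;\Longrightarrow\; \sigma_{22} = -2\sigma_{12}.
\end{align*}
```

Normalizing both variances would therefore tie $\sigma_{22}$ to the covariance, which is why only one variance is normalized and the other is left free.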

The regression equation for this problem is a Mincer-type wage equation obtained by setting $y_3 = \log(wage)$ in Equation (3), where $X_3$ includes human capital variables and labor market characteristics:

$$\log(wage) = \beta_3' X_3 + \sigma_3 u_3. \tag{22}$$

The aim is to estimate $\beta_3$ for wage workers, who have $LFP = 2$. As in Equation (14), after forming the estimates of selectivity correction terms via first step estimates, we run a linear regression equation with 9 selectivity correction terms in the second step.

3.2 Results

Table 1 provides the summary statistics for the variables used in the first stage of the analysis. We first run a normalized ($\sigma_{11} = 1$) maximum likelihood estimation as Tunali and Baslevent (2006) did. This step is estimated using the derivative free Maximum Likelihood option (lf) of Stata by restricting the correlations and variances according to their theoretical ranges using non-linear transformations. Besides, the probability weights coming from the data are used since the sampling frame relies on stratification. Robust standard errors are reported throughout the section. Table 2 provides the results of the first step.

Table 1. Sample Means (Standard Deviations) of Variables by Labor Force Participation Status

| Variable | Full sample | Non-participant | Self-employed | Wage worker | Unemployed |
|---|---|---|---|---|---|
| Own wage | | | | 1.39 (3.17) | |
| Age | 33.29 | 33.41 | 34.28 | 32.82 | 30.65 |
| Age squared/100 | 11.75 | 11.86 | 12.37 | 11.18 | 9.84 |
| Experience | 20.29 | 20.82 | 21.31 | 15.76 | 16.93 |
| Experience squared/100 | 4.87 | 5.08 | 5.26 | 3.11 | 3.39 |
| Illiterate (reference) | 0.25 | 0.27 | 0.21 | 0.055 | 0.13 |
| Literate without a diploma | 0.090 | 0.095 | 0.12 | 0.034 | 0.071 |
| Elementary school | 0.49 | 0.52 | 0.47 | 0.22 | 0.48 |
| Middle school | 0.058 | 0.055 | 0.10 | 0.060 | 0.094 |
| High school | 0.087 | 0.059 | 0.072 | 0.35 | 0.20 |
| University | 0.030 | 0.0069 | 0.022 | 0.28 | 0.023 |
| Husband self-employed | 0.33 | 0.35 | 0.39 | 0.16 | 0.18 |
| Children aged 0-2 | 0.26 | 0.26 | 0.18 | 0.22 | 0.22 |
| Children aged 3-5 | 0.34 | 0.35 | 0.28 | 0.27 | 0.34 |
| Female children aged 6-14 | 0.41 | 0.42 | 0.49 | 0.32 | 0.36 |
| Male children aged 6-14 | 0.44 | 0.45 | 0.49 | 0.34 | 0.43 |
| Extended household | 0.12 | 0.13 | 0.14 | 0.11 | 0.079 |
| Ext. hh × Children aged 0-2 | 0.029 | 0.029 | 0.028 | 0.031 | 0.019 |
| Ext. hh × Children aged 3-5 | 0.036 | 0.036 | 0.033 | 0.039 | 0.019 |
| Ext. hh × Female ch. aged 6-14 | 0.045 | 0.046 | 0.044 | 0.032 | 0.030 |
| Ext. hh × Male ch. aged 6-14 | 0.047 | 0.049 | 0.050 | 0.039 | 0.030 |
| Share of textiles | 0.31 | 0.31 | 0.39 | 0.30 | 0.31 |
| Share of agriculture | 0.17 | 0.18 | 0.20 | 0.15 | 0.19 |
| Share of finance | 0.060 | 0.059 | 0.053 | 0.064 | 0.056 |
| Migration rate | 0.024 | 0.023 | 0.021 | 0.035 | 0.018 |
| Marmara (reference) | 0.35 | 0.35 | 0.32 | 0.37 | 0.28 |
| Aegean | 0.11 | 0.11 | 0.10 | 0.14 | 0.21 |
| South | 0.12 | 0.12 | 0.26 | 0.099 | 0.13 |
| Central | 0.19 | 0.18 | 0.10 | 0.23 | 0.17 |
| North West | 0.040 | 0.037 | 0.028 | 0.055 | 0.075 |
| East | 0.078 | 0.081 | 0.088 | 0.044 | 0.086 |
| South East | 0.081 | 0.087 | 0.088 | 0.039 | 0.015 |
| North East | 0.033 | 0.034 | 0.0055 | 0.027 | 0.041 |
| Population 200,000 or more | 0.67 | 0.66 | 0.69 | 0.74 | 0.60 |
| Population 1 million or more | 0.51 | 0.50 | 0.43 | 0.61 | 0.46 |
| Share of Welfare Party | 0.073 | 0.074 | 0.059 | 0.062 | 0.058 |
| Share of Left of Center | 0.35 | 0.35 | 0.33 | 0.36 | 0.36 |
| Sample size | 8962 | 7770 | 181 | 745 | 266 |

Based on the likelihood ratio statistic, the 4-way model fits the data well (p-value < 0.0001). Since $\rho_{12}$ is very close to $-1$, computational fragility is an issue. A cure for computational fragility in multinomial probit models may be adding alternative specific variables for each selection equation, as shown by Keane (1993). Since alternative specific variables are not available, this avenue is not a viable option. To explore the issue, Tunali and Baslevent (2006) estimate restricted versions of the model in which they set $\sigma_{22} = B\,\sigma_{11}$ for $B = 0.35$, $0.5$, $1$, and $2$ to see the sensitivity of the results. They state that all these restricted versions lead to very similar inferences. However, they only report the estimates of the restricted version with $B = 0.35$ since $\sigma_{22}$ is equal to 0.358 in the normalized version.⁴ This version improves the precision of several slope estimates.⁵
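The likelihood ratio statistic behind the goodness-of-fit statement can be recomputed from the two log-likelihood values reported in Table 2. The sketch below rests on two assumptions: that both reported values are negative (the minus signs do not survive in the extracted table, but a log-likelihood built from probabilities cannot be positive), and that the model without covariates drops the 32 slope coefficients in each selection equation while keeping the constants and covariance parameters, giving 64 restrictions.

```python
import math

# Values from Table 2, with the sign restored as described above (assumption).
loglik_without_covariates = -3247.25
loglik_with_covariates = -2443.05

# Assumed number of restrictions: 32 slopes in each of the two selection equations.
n_slopes_per_equation = 32
degrees_of_freedom = 2 * n_slopes_per_equation

lr_statistic = 2.0 * (loglik_with_covariates - loglik_without_covariates)

# A chi-square(df) variable has mean df and variance 2*df, so this gives a rough
# sense of how far the statistic lies in the right tail.
standardized_distance = (lr_statistic - degrees_of_freedom) / math.sqrt(2.0 * degrees_of_freedom)

print(lr_statistic, degrees_of_freedom, standardized_distance)
```

A statistic this far above its degrees of freedom is consistent with the very small p-value quoted in the text.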

Detailed discussion of the substantive results can be found in Tunali and Baslevent (2006). We review some of these results in Section 4.3 in order to compare them with the first stage results of Multinomial Logit based selection correction models. The negative sign of $\rho_{12}$ can be interpreted as follows: if the unobserved characteristics of a woman make her more likely to choose self-employment rather than home-work, these unobserved characteristics also make her less likely to choose wage work over self-employment (Tunali and Baslevent, 2006).

We provide three least squares estimates of the wage equation in Table 3: without selectivity correction terms, with conventional correction (corresponds to trivariate normality specification), and with robust correction (corresponds to the specification of this paper). We exclude 10 observations in these regressions due to missing wage information. Note that experience corresponds to potential labor market experience, which equals age − 7 − years of schooling for individuals having elementary school education or above and age − 12 for individuals who have not finished elementary school.
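The experience variable defined here is a simple piecewise rule. A minimal sketch follows; the function name and argument names are hypothetical, and the mapping from education categories to years of schooling is left to the caller because the text does not state it.

```python
def potential_experience(age, years_of_schooling, finished_elementary):
    """Potential labor market experience as described in the text.

    age - 7 - years of schooling for those with at least elementary school,
    age - 12 for those who have not finished elementary school.
    """
    if finished_elementary:
        return age - 7 - years_of_schooling
    return age - 12


# Hypothetical individuals for illustration.
print(potential_experience(30, 11, True))
print(potential_experience(30, 0, False))
```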

The regression equation is estimated after excluding 15 variables that appear in the se-

lection equations and putting experience and experience squared/100 instead of age and

4We do not provide the �rst step estimates of the restricted version with B = 0:35 here since thecoe¢ cient estimates di¤er at most by 4% from the normalized version. Furthermore, we only needpoint estimates of the normalized version in order to correct selectivity bias in the regression equation.

5Note that age, age squared/100, high school, university, Aegean, North West, North East andconstant in the �rst selection equation, and literate without a diploma, middle school, share of �nanceand population 1 million or more in the second selection equation become signi�cant in the restrictedversion with B = 0:35 in addition to the signi�cant variables in Table 2.

14

Table 2. Maximum Likelihood Estimates of Reduced Form Participation Equations(Normalized Version)

First Selection Second SelectionVariableCoefficient Standard

ErrorCoefficient Standard

ErrorAge 0.077 0.066 0.010 0.029Age squared/100 0.124 0.096 0.024 0.041Literate without a diploma 0.256*** 0.099 0.185 0.122Elementary school 0.124 0.085 0.048 0.075Middle school 0.471*** 0.138 0.194 0.187High school 0.534 0.580 0.090 0.116University 0.931 1.050 0.149 0.154Husband selfemployed 0.081 0.242 0.152* 0.092Children aged 02 0.178** 0.075 0.077 0.089Children aged 35 0.078 0.077 0.012 0.053Female children aged 614 0.055 0.103 0.075 0.094Male children aged 614 0.031 0.089 0.072 0.065Extended household 0.189 0.133 0.167 0.130Ext. hh. × Children aged 02 0.096 0.181 0.047 0.182Ext. hh. × Children aged 35 0.058 0.178 0.091 0.149Ext. hh. × Female ch. aged 614 0.252 0.165 0.221 0.161Ext. hh. × Male ch. aged 614 0.083 0.161 0.103 0.137Share of textiles 0.058 0.242 0.172 0.153Share of agriculture 0.759* 0.432 0.952*** 0.356Share of finance 3.686* 2.121 2.918 2.637Migration rate 3.642** 1.613 4.214*** 1.515Aegean 0.229 0.228 0.326* 0.180South 0.038 0.101 0.015 0.093Central 0.508* 0.263 0.547* 0.292North West 0.392 0.306 0.526** 0.262East 0.381 0.233 0.424* 0.233South East 0.291* 0.151 0.149 0.201North East 0.850 0.546 0.970* 0.528Share of Welfare Party 5.969*** 2.063 5.628** 2.857Share of Left of Center 0.697 0.695 1.172** 0.508Population 200,000 or more 0.114 0.088 0.029 0.123Population 1 million or more 0.160* 0.092 0.139 0.099Constant 1.688 1.340 0.242 0.643σ11 1 [normalized]σ22 0.358 (0.514)ρ12 0.971*** (0.046)No. of observations 8962Loglikelihood without covariates 3247.25Loglikelihood with covariates 2443.05Robust standard errors are reported in parentheses.* significant at 10%; ** significant at 5%; *** significant at 1%.

15

Table 3. Least Squares Estimates of the Wage Equation

| Variable | With robust correction: Coef. | Std. Error | With conventional correction: Coef. | Std. Error | Without selectivity correction terms: Coef. | Std. Error |
|---|---|---|---|---|---|---|
| Experience | 0.032*** | 0.012 | 0.034*** | 0.013 | 0.031** | 0.013 |
| Experience sq./100 | 0.064** | 0.033 | 0.072** | 0.037 | 0.064* | 0.037 |
| Literate w/o a diploma | 0.119 | 0.174 | 0.080 | 0.181 | 0.112 | 0.186 |
| Elementary school | 0.055 | 0.116 | 0.028 | 0.120 | 0.011 | 0.121 |
| Middle school | 0.215 | 0.149 | 0.256* | 0.144 | 0.231* | 0.125 |
| High school | 0.594*** | 0.191 | 0.437** | 0.184 | 0.465*** | 0.119 |
| University | 1.254*** | 0.272 | 1.083*** | 0.250 | 1.038*** | 0.128 |
| Share of Textiles | 0.308** | 0.150 | 0.282** | 0.143 | 0.407*** | 0.122 |
| Share of Agriculture | 0.098 | 0.237 | 0.085 | 0.238 | 0.150 | 0.238 |
| Share of Finance | 3.437*** | 1.319 | 2.557** | 1.225 | 4.241*** | 1.158 |
| Aegean | 0.135* | 0.072 | 0.169** | 0.073 | 0.131* | 0.069 |
| South | 0.162** | 0.081 | 0.172** | 0.080 | 0.159* | 0.082 |
| Central | 0.035 | 0.072 | 0.007 | 0.073 | 0.029 | 0.073 |
| North West | 0.151 | 0.100 | 0.170* | 0.099 | 0.166* | 0.096 |
| East | 0.109 | 0.114 | 0.096 | 0.115 | 0.142 | 0.113 |
| South East | 0.130 | 0.118 | 0.086 | 0.114 | 0.191* | 0.100 |
| North East | 0.137 | 0.117 | 0.210* | 0.113 | 0.129 | 0.115 |
| λ1 | 0.044 | 0.687 | 0.171 | 0.146 | - | - |
| λ2 | 2.441 | 3.132 | 0.458** | 0.182 | - | - |
| λ3 | 0.135 | 0.412 | - | - | - | - |
| λ4 | 2.033 | 2.784 | - | - | - | - |
| λ5 | 2.076 | 2.897 | - | - | - | - |
| λ6 | 0.060 | 0.215 | - | - | - | - |
| λ7 | 0.625 | 1.006 | - | - | - | - |
| λ8 | 0.595 | 1.291 | - | - | - | - |
| λ9 | 1.149 | 2.080 | - | - | - | - |
| Constant | 1.087** | 0.519 | 0.927*** | 0.345 | 0.923*** | 0.194 |
| No. of obs. | 735 | | 735 | | 735 | |
| R2 | 0.387 | | 0.378 | | 0.367 | |

Robust standard errors are in parentheses.
* significant at 10%; ** significant at 5%; *** significant at 1%.

age squared/100. To be confident that the exclusion of these variables does not cast any doubt on our findings, we conduct Sargan's (1984) test for overidentifying restrictions on our robust correction.6 The regression of the residuals from the wage equation on the 15 excluded variables generates $R^2 = 0.0035$. Let $n$ denote the sample size. Then, $nR^2 = 2.5725 \sim \chi^2(15)$, which is well within the acceptance region. Hence, the exclusions are warranted.

6 We do not include age and age squared/100 in this test since experience is a highly collinear function of age.
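The arithmetic behind the overidentification test can be reproduced in a few lines. The sketch below uses the figures reported in the text ($n = 735$, $R^2 = 0.0035$, 15 excluded variables); the 5 percent significance level and the tabulated critical value 24.996 for $\chi^2(15)$ are our assumptions, since the text does not state the level used.

```python
# Figures reported in the text: n = 735 wage workers and R^2 = 0.0035 from the
# auxiliary regression of wage-equation residuals on the 15 excluded variables.
n = 735
r_squared = 0.0035
degrees_of_freedom = 15

# Test statistic n * R^2, compared against a chi-squared(15) distribution.
statistic = n * r_squared

# Assumption: 5% level; 24.996 is the tabulated 0.95 quantile of chi-squared(15).
critical_value_5pct = 24.996

reject_null = statistic > critical_value_5pct
print(f"n*R^2 = {statistic:.4f}, df = {degrees_of_freedom}, "
      f"critical value = {critical_value_5pct}, reject = {reject_null}")
```

Because the statistic is far below the critical value, the null hypothesis that the exclusions are valid is not rejected under this assumed level.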

Using the robust correction results, a test for the joint significance of all the selectivity correction terms yields strong evidence in favor of non-random selection (p-value = 0.0006). Moreover, the test that the coefficients on $\lambda_3, \lambda_4, \dots, \lambda_9$ are jointly zero provides evidence in favor of our robust correction against conventional correction (p-value = 0.0424).

Only a fraction of women in urban areas participate, and the participation probability increases dramatically with education. In light of the first stage results, one would think that participants do not constitute a random subsample and that this has implications for the estimates of the wage equation. Indeed, comparison of the substantive results from the three models underscores the importance of proper adjustment for selection. According to our specification, while the coefficient of middle school becomes insignificant (implying that people with middle school education or less earn the same wage), the coefficients of high school and university increase dramatically in magnitude. Evidently, the minority which opts for wage work is highly selective on unobservables, and failure to account for this yields underestimation of the returns to higher levels of education. With illiterates as the reference category, high school results in 81 percent higher returns on average according to our robust correction, whereas the figure is 55 percent under conventional correction. Compared to illiterates, university results in 250 percent higher returns on average under our robust correction, while this statistic is 195 percent in the trivariate normality specification. According to our robust correction, average incremental returns are 21.9 percent per additional year of high school and 28 percent per additional year of university education. The other significant variables under our robust correction have coefficient estimates between those of random selection and conventional correction.
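The percentage returns quoted above follow from the usual transformation of a dummy-variable coefficient in a log-wage equation, $100\,(e^{b} - 1)$. A minimal sketch, using the high school and university coefficients from Table 3 (the function name `percent_return` is ours):

```python
import math

def percent_return(coef):
    """Percent wage difference implied by a dummy coefficient in a log-wage equation."""
    return 100.0 * (math.exp(coef) - 1.0)

# Coefficients from Table 3 (reference category: illiterates).
high_school = {"robust": 0.594, "conventional": 0.437}
university = {"robust": 1.254, "conventional": 1.083}

for label, coefs in (("High school", high_school), ("University", university)):
    for model, coef in coefs.items():
        print(f"{label:12s} {model:13s} {percent_return(coef):6.1f}%")
```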

Our robust correction implies that the wage is maximized at 25 years of potential labor market experience. The corresponding figure is 24 years under the conventional correction. Furthermore, the region dummies are jointly statistically significant with a p-value of 0.0158. Besides, a one percentage point increase in the share of textiles causes a 0.31 percent decrease in wages, while a one percentage point increase in the share of finance increases wages by 3.4 percent on average. These effects are underestimated in the trivariate normality specification.
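The experience level at which the wage peaks can be recovered from the two experience coefficients. The sketch below assumes a concave profile, that is, the coefficient on experience squared/100 enters with a negative sign and the magnitudes are those listed in Table 3; the function name `peak_experience` is ours.

```python
def peak_experience(b_exp, b_exp_sq_over_100):
    """Experience (in years) that maximizes log wage.

    Assumes log(wage) = ... + b_exp * x - b_exp_sq_over_100 * (x**2 / 100),
    so the first-order condition is b_exp - 2 * b_exp_sq_over_100 * x / 100 = 0.
    """
    return 100.0 * b_exp / (2.0 * b_exp_sq_over_100)

robust_peak = peak_experience(0.032, 0.064)        # robust correction
conventional_peak = peak_experience(0.034, 0.072)  # conventional correction

print(f"robust: {robust_peak:.1f} years, conventional: {conventional_peak:.1f} years")
```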


3.3 Multicollinearity

One issue that arises in the context of our robust correction approach is the large standard errors of the individual selectivity correction terms (see Table 3). Letting $R_j^2$ for $j = 1, 2, \dots, 9$ denote the coefficient of determination obtained by regressing $\lambda_j$ on the remaining selectivity correction terms, we obtain values in the range $0.995 < R_j^2 < 1$. This implies extremely high multicollinearity. At first glance, one may think that this is caused by the fragility of our selection model, since $\rho_{12}$ is very close to $-1$. To get rid of this fragility, we reestimated the model by restricting $\sigma_{22} = 1$. This version yields $\rho_{12} = -0.746$ and $0.964 < R_j^2 < 0.9997$ for $j = 1, 2, \dots, 9$.

To check that the multicollinearity issue is not specific to this problem, we also generated the $\lambda$'s for Tunali's (1986) migration-remigration problem, which is another example of a double selection problem. The generic statement of this problem was given in Section 2.1. The empirical context of the problem is as follows. There are three earnings profiles: $y_s^*$ under the stay option, $y_o^*$ under the one-time move option, and $y_f^*$ under the frequent move option. Let $\delta_1^* = y_o^* - y_s^*$ denote the anticipated earnings gain from the one-time move to the best potential destination, and let $\kappa_1^*$ be the cost of that move. Let $\delta_2^* = y_f^* - y_o^*$ denote the anticipated incremental earnings gain from the frequent move involving the best potential combination of intermediate and final destinations, and let $\kappa_2^*$ be the cost of that move. Tunali (1986) assumes that individuals are rational, in the sense that anticipations are unbiased. Then, the two selection equations in Equations (1) and (2) are obtained as $y_1^* = \delta_1^* - \kappa_1^*$ and $y_2^* = \delta_2^* - \kappa_2^*$. Upon introducing the variable $r = \sup\{0, y_1^*, y_1^* + y_2^*\}$, the decision rule becomes

Stay if $r = 0$, (23)

Move once if $r = y_1^*$, (24)

Move more than once if $r = y_1^* + y_2^*$. (25)

However, Tunali's (1986) modified decision rule treats $r = y_1^* + y_2^* > 0 > y_1^*$ (moving more than once would be optimal in this case) as a stay decision on the grounds that the first move will not be feasible when $0 > y_1^*$.

Using the two dichotomous variables given in Equation (5), we can define the decision rule as:

Stay if $D_1 = 0$, (26)

Move once if $D_1 = 1$ and $D_2 = 0$, (27)

Move more than once if $D_1 = 1$ and $D_2 = 1$. (28)

Thus, $D_2$ is observed $\iff D_1 = 1$. The sample selection regime for this problem is a 3-way classification with $\Omega = \{D_1, D_2\}$, where the cells in the first row of Figure 1 are combined. A more detailed description can be found in Tunali (1986). For this problem, we find $\rho_{12} = 0.389$. Nevertheless, high multicollinearity is present: $0.910 < R_j^2 < 0.9999$ for $j = 1, 2, \dots, 9$.

Note that STATA had no trouble in computing standard errors and test statistics despite extremely high multicollinearity. We are able to explain this by exploiting linear algebra. Partition the full covariate matrix $\tilde{X}_3$ as $(X_{31} \mid X_{32})$, where $X_{31}$ includes all explanatory variables of the model without selectivity correction terms ($X_{31} = X_3$ in terms of the notation given in the generic statement of the double selection model), and $X_{32}$ includes all the selectivity correction terms. We can write:

$$(\tilde{X}_3'\tilde{X}_3)^{-1} = \begin{pmatrix} X_{31}'X_{31} & X_{31}'X_{32} \\ X_{32}'X_{31} & X_{32}'X_{32} \end{pmatrix}^{-1}. \tag{29}$$

We use the lower right block of the matrix $(\tilde{X}_3'\tilde{X}_3)^{-1}$ in calculating the standard errors of the $\lambda$'s. This block may be computed as $[X_{32}'X_{32} - X_{32}'X_{31}(X_{31}'X_{31})^{-1}X_{31}'X_{32}]^{-1}$ via the submatrix inverse theorem (see Goldberger, 1991). Extremely large $R_j^2$'s imply that the matrix $X_{32}'X_{32}$ is nearly singular. However, this matrix is not inverted. Instead, the difference between $X_{32}'X_{32}$ and $X_{32}'X_{31}(X_{31}'X_{31})^{-1}X_{31}'X_{32}$ is inverted. Therefore, it is possible to compute standard errors for the $\lambda$'s.
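The submatrix inverse identity used above can be checked numerically. The following is a minimal sketch in plain Python with hypothetical toy matrices (five observations, a constant and one regressor in `X31`, two nearly collinear correction terms in `X32`). The helper names `transpose`, `matmul`, `subtract`, and `inverse` are ours, and the numbers are not taken from the application.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical data: X31 holds a constant and one regressor, X32 two
# highly (but not perfectly) collinear selectivity correction terms.
X31 = [[1, 2], [1, 3], [1, 5], [1, 7], [1, 8]]
X32 = [[0.5, 0.6], [0.9, 1.0], [1.4, 1.6], [2.1, 2.2], [2.4, 2.7]]

# Lower right block of the inverse of the full cross-product matrix.
X = [r1 + r2 for r1, r2 in zip(X31, X32)]
full_inv = inverse(matmul(transpose(X), X))
k1 = len(X31[0])
lower_right = [row[k1:] for row in full_inv[k1:]]

# The same block via X32'X32 - X32'X31 (X31'X31)^{-1} X31'X32, then inverted.
A = matmul(transpose(X31), X31)
B = matmul(transpose(X31), X32)
D = matmul(transpose(X32), X32)
schur = subtract(D, matmul(matmul(transpose(B), inverse(A)), B))
schur_inv = inverse(schur)

max_gap = max(abs(a - b) for ra, rb in zip(lower_right, schur_inv)
              for a, b in zip(ra, rb))
print(f"largest absolute difference between the two computations: {max_gap:.2e}")
```

The two routes agree up to floating-point error in this toy case, which illustrates the identity only; it says nothing about how STATA organizes the computation internally.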


4 Multinomial Logit Based Selection Correction Models

4.1 Selection Step

Multinomial Logit based selection correction models are used as alternatives to Multivariate Normal based models in the literature. These models define a selection equation for each possible state.

In the context of Tunali and Baslevent (2006), let7

$$y_j^* = \alpha_j' z + \eta_j \quad \text{for } j = 0, 1, 2, 3, \tag{30}$$

where the $y_j^*$'s denote the utility obtained from the choice of labor force participation status $j$ and are unobserved, $z$ is the vector of explanatory variables,8 the $\alpha_j$'s are the corresponding vectors of unknown coefficients, and the $\eta_j$'s are the random disturbances. Let $r = \max(y_0^*, y_1^*, y_2^*, y_3^*)$.

The four-way classification can be obtained via the following categorical variable:

$$LFP' = \begin{cases} 0 = \text{home-work}, & \text{if } r = y_0^*, \\ 1 = \text{self-employment}, & \text{if } r = y_1^*, \\ 2 = \text{wage labor}, & \text{if } r = y_2^*, \\ 3 = \text{unemployed}, & \text{if } r = y_3^*. \end{cases} \tag{31}$$

Hence, the sample selection regime is given by $\Omega = \{LFP'\}$ and Equation (30) corresponds to the selection equations.

We assume that the $\eta_j$'s satisfy the Independence of Irrelevant Alternatives (IIA) hypothesis, so that they are independent and identically Gumbel distributed. McFadden (1973) proves that this specification corresponds to the Multinomial Logit model (Bourguignon et al., 2007). The choice probabilities are

$$\pi_j = \Pr(LFP' = j \mid z) = \frac{\exp(\alpha_j' z)}{\sum_{k=0}^{3} \exp(\alpha_k' z)}, \quad j = 0, 1, 2, 3. \tag{32}$$
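The choice probabilities in Equation (32) are straightforward to evaluate once coefficient vectors are available. The sketch below uses hypothetical coefficients and covariates (a constant and a single regressor) and normalizes the coefficients of state 0 to zero, the customary identification restriction for a reference group; the function name `choice_probabilities` is ours.

```python
import math

def choice_probabilities(coefs, z):
    """Multinomial logit probabilities for states 0..3 given covariate vector z."""
    indices = [sum(a * x for a, x in zip(alpha, z)) for alpha in coefs]
    shift = max(indices)  # subtracting the maximum guards against overflow in exp
    weights = [math.exp(v - shift) for v in indices]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficient vectors (constant, one covariate) for states 0..3;
# state 0 is the reference group, so its vector is set to zero.
coefs = [[0.0, 0.0], [-2.0, 0.3], [-1.0, 0.8], [-1.5, 0.1]]
z = [1.0, 0.5]

probs = choice_probabilities(coefs, z)
print([round(p, 4) for p in probs])
```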

7 Individual subscripts are dropped from Equation (30) in order to avoid notational clutter.

8 We deliberately use the same set of explanatory variables specified in Equations (18) and (19) for the logit based selection equations. The aim is to compare the empirical results of these models and our selectivity correction approach in Section 4.3.


Since $\sum_{k=0}^{3} \pi_k = 1$, we choose non-participants as the reference group and set $\alpha_0 = 0$. The estimation of these models relies on two-step estimation as well. In the first step, we obtain consistent estimates of the $\alpha_j$'s by maximizing the likelihood function in Equation (33):

$$L = \prod_{LFP'=0} \pi_0 \prod_{LFP'=1} \pi_1 \prod_{LFP'=2} \pi_2 \prod_{LFP'=3} \pi_3. \tag{33}$$

Note that our selectivity correction approach allows correlation between the random disturbances of the two selection equations, but computational difficulties limit its generalization9 (unless simulation based estimation is used at the first stage). On the other hand, Multinomial Logit based selection correction models do not allow correlation between the random disturbances of the selection equations, but can handle a large number of selection equations. Because of the IIA assumption, these models treat unemployment as a distinct state. However, the unemployment probability is defined as a union of the self-employment and wage labor options in our problem [see Equation (20)]. Therefore, another potential limitation of these models is the failure to capture the choice structure appropriately. Despite this failure, we still discuss second (regression) step estimation for some well-known Multinomial Logit based selection correction models (Lee (1983), Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al. (2007), and Dahl (2002)) in the following subsection.

4.2 Regression Step

The regression equation given in Equation (22) remains intact. Our aim is to estimate the parameter vector of the regression equation under the specified Multinomial Logit based selection correction models for the individuals with $LFP' = 2$. Returning to the generic statement of the selectivity problem in Equation (4), once again we are confronted with $E(u_3 \mid X, \Omega) \neq 0$. Multinomial Logit based models differ in the assumptions needed for obtaining a parametric form for $E(u_3 \mid X, \Omega)$. The regression step for each model is discussed in detail below. Denote the correlations between $\eta_j$ and $u_3$ by $\rho_j$ for $j = 0, 1, 2, 3$ and let $\tau_l = \rho_l - \rho_2$ for $l = 0, 1, 3$. In addition, considering Equation (30), we define

$$\varepsilon_2 = \max_{l} (y_l^* - y_2^*) \quad \text{for } l \in \{0, 1, 3\}. \tag{34}$$

Then,

$$LFP' = 2 \iff \varepsilon_2 < 0. \tag{35}$$

9 This is the case for any multinomial probit model. For example, the aML software package (Lillard and Panis, 2003) may allow at most 4 alternatives.

4.2.1 Lee's (1983) Model

Lee (1983) generalizes the two-step selection correction methodology provided by Heckman

(1976, 1979) and Lee (1976, 1979) by extending it to polychotomous choice. Thus, this

method can be applied to Multinomial Logit based selectivity problems.

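Assume that the cumulative distribution function of $\varepsilon_2$ is given by $F_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z)$, where $\alpha$ is a matrix including the vectors of unknown coefficients in each selection equation, i.e. $\alpha = (\alpha_0 \mid \alpha_1 \mid \alpha_2 \mid \alpha_3)$. Using the translation method, we can define

$$J_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z) = \Phi^{-1}(F_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z)), \tag{36}$$

where $\Phi$ is the standard normal distribution function. Lee (1983) makes the following assumption:

$$\text{The joint distribution of } (u_3, J_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z)) \text{ does not depend on the index } \alpha'z. \tag{37}$$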

This assumption is contested in the literature since it has strong implications. First, the transformation $J_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z)$ implies that the marginal distribution is independent of $\alpha'z$, but it does not tell us anything about the joint distribution. Besides, Schmertmann (1994) shows that under Lee's (1983) index independence assumption, all $\tau_l$'s must have the same sign; moreover, if we assume the $\eta_l - \eta_2$'s are identically distributed, the $\tau_l$'s will be identical for $l = 0, 1, 3$ (Bourguignon et al., 2007).

Lee (1983) also assumes that

$$E(u_3 \mid \varepsilon_2, \alpha'z) = c_2 J_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z), \tag{38}$$

where $c_2$ is the correlation between $u_3$ and $J_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z)$. In the literature, some ways of avoiding the additional assumption in Equation (38) are available. If $u_3$ is normally distributed, this equation follows directly. When $u_3$ is not normally distributed, this equation may be viewed as a linear approximation to the conditional expectation function. Another way to justify Equation (38) is to translate the distribution of $u_3$ to standard normal. Following Equation (36), we denote this translation as $J_{u_3}(u_3)$. Then, assuming that the joint distribution of $(J_{u_3}(u_3), J_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z))$ does not depend on $\alpha'z$ (i.e. the correlation between $J_{u_3}(u_3)$ and $J_{\varepsilon_2}(\varepsilon_2 \mid \alpha'z)$ is zero), we obtain Equation (38).

To recapitulate, under Lee's (1983) assumptions, we get

$$E(u_3 \mid LFP' = 2, \alpha'z) = \gamma_2 \lambda_{j2}, \tag{39}$$

where $\gamma_2 = -c_2$ and $\lambda_{j2} = \dfrac{\phi(J_{\varepsilon_2}(0 \mid \alpha'z))}{F_{\varepsilon_2}(0 \mid \alpha'z)}$. Note that $\lambda_{j2}$ is the familiar term in conventional selection correction. As before, after obtaining a consistent estimate of $\alpha$ from the first step, and hence of $\lambda_{j2}$, the second step estimates of Lee's (1983) model (LEE) are acquired by estimating a linear regression:

$$\log(wage) = \beta_3' X_3 + \sigma_3 \gamma_2 \lambda_{j2}. \tag{40}$$
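Lee's correction term is easy to compute from a first-step probability. Since the chosen state is selected exactly when the maximum utility difference is negative, the distribution function evaluated at zero equals the probability of the chosen state, so the term reduces to the standard normal density at the normal quantile of that probability, divided by the probability. The sketch below relies on this reading; the function name `lee_correction_term` and the example probabilities are ours.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def lee_correction_term(p_chosen):
    """Return phi(Phi^{-1}(P)) / P, where P is the probability of the chosen state."""
    if not 0.0 < p_chosen < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    quantile = _STD_NORMAL.inv_cdf(p_chosen)  # translation of F(0 | .) to the normal scale
    return _STD_NORMAL.pdf(quantile) / p_chosen

# Hypothetical first-step probabilities of wage work for three individuals.
for p in (0.1, 0.5, 0.9):
    print(f"P = {p:.1f} -> correction term = {lee_correction_term(p):.4f}")
```

The term is large for individuals with a low predicted probability of wage work and shrinks toward zero as that probability approaches one, as in the conventional two-step correction.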

4.2.2 Dubin and McFadden's (1984) Model and Its Two Variants

The conditional expectation function of the regression equation error is assumed to be a linear function of the $\eta_j - E(\eta_j)$'s, yielding Equation (41):10

$$E(u_3 \mid \eta_0, \eta_1, \eta_2, \eta_3) = \frac{\sqrt{6}}{\pi} \sum_{j=0}^{3} \rho_j (\eta_j - E(\eta_j)). \tag{41}$$

In this expression, consistent with the set-up of the selection step, the $\eta_j$'s are assumed to have independent extreme value distributions. It can be shown that

$$E(\eta_2 - E(\eta_2) \mid LFP' = 2, \alpha'z) = -\log(\pi_2), \tag{42}$$

10 The linear conditional expectation function assumption is a convenient starting point going back to Olsen (1980).


$$E(\eta_l - E(\eta_l) \mid LFP' = 2, \alpha'z) = \frac{\pi_l \log(\pi_l)}{1 - \pi_l} \quad \text{for } l = 0, 1, 3, \tag{43}$$

where $\log$ denotes the natural logarithm. To assure that $E(u_3 \mid X_3) = 0$, they also assume

$$\sum_{j=0}^{3} \rho_j = 0. \tag{44}$$

Under the assumptions given by Equations (41) and (44), the regression equation becomes

$$\log(wage) = \beta_3' X_3 + \sigma_3 \frac{\sqrt{6}}{\pi} \sum_{l=0,1,3} \rho_l s_l, \tag{45}$$

where $s_l = \dfrac{\pi_l \log(\pi_l)}{1 - \pi_l} + \log(\pi_2)$ for $l = 0, 1, 3$. After forming the three selectivity correction terms via the selection step estimates, we run least squares on Equation (45) to obtain the second step estimates of the DMF model. Even though the estimation does not directly provide an estimate of $\rho_2$, we may calculate it from Equation (44) as $\rho_2 = -(\rho_0 + \rho_1 + \rho_3)$. Since the only restriction on the correlation structure is via Equation (44), DMF can be regarded as superior to LEE.
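The three DMF correction terms can be formed directly from first-step probabilities. The sketch below implements $s_l = \pi_l \log(\pi_l)/(1 - \pi_l) + \log(\pi_2)$ for the non-chosen states; the probabilities are hypothetical and the function name `dmf_terms` is ours.

```python
import math

def dmf_terms(probs, chosen=2):
    """Correction terms s_l of Equation (45) for every non-chosen state l."""
    log_p_chosen = math.log(probs[chosen])
    return {
        l: p * math.log(p) / (1.0 - p) + log_p_chosen
        for l, p in enumerate(probs)
        if l != chosen
    }

# Hypothetical first-step probabilities for states 0..3 (they sum to one).
probs = [0.70, 0.05, 0.15, 0.10]
terms = dmf_terms(probs)

for state, value in terms.items():
    print(f"s_{state} = {value:.4f}")
```

In a second step these terms would enter the wage regression as additional regressors alongside the explanatory variables.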

Bourguignon et al. (2007) show via Monte Carlo experiments that the restriction in Equation (44) results in biased estimates when incorrectly imposed; in contrast, the efficiency loss from not imposing it when the restriction holds is small. Hence, they drop the restriction in Equation (44) and refer to the resulting model as the first variant of Dubin and McFadden's (1984) model (DMF 1). For this model, the regression equation is given by

$$\log(wage) = \beta_3' X_3 + \sigma_3 \frac{\sqrt{6}}{\pi} \left[ \sum_{l=0,1,3} \rho_l s_l + \rho_2 s_2 \right], \tag{46}$$

where $s_l = \dfrac{\pi_l \log(\pi_l)}{1 - \pi_l}$ for $l = 0, 1, 3$ and $s_2 = -\log(\pi_2)$. Observe that if we set $\rho_2 = -(\rho_0 + \rho_1 + \rho_3)$ in this equation, we get Equation (45). Hence, DMF 1 is more general than DMF. After forming the four selectivity correction terms via the selection step estimates, we run least squares on Equation (46) to obtain the second step estimates of the DMF 1 model.

Since the assumption in Equation (41) restricts the distribution of $u_3$, Bourguignon et al. (2007) propose a second variant of Dubin and McFadden's (1984) model (DMF 2). Assuming that the cumulative distribution function of $\eta_j$ is given by $G(\eta_j)$ and using the trick given in Lee (1983), they make the transformation

$$\eta_j^* = J(\eta_j) = \Phi^{-1}(G(\eta_j)). \tag{47}$$

Then, they assume that the conditional expectation function of the regression equation error is a linear function of the $\eta_j^*$'s:

$$E(u_3 \mid \eta_0, \eta_1, \eta_2, \eta_3) = \sum_{j=0}^{3} \rho_j^* \eta_j^*, \tag{48}$$

where $\rho_j^*$ is the correlation between $u_3$ and $\eta_j^*$. Observe that Equation (41) can be obtained by setting $\eta_j^* = \eta_j - E(\eta_j)$ in this equation; therefore, DMF 2 is a generalized version of DMF 1. Note that the assumption in Equation (48) holds if $u_3$ and the $\eta_j^*$'s are multivariate normally distributed. Letting $g(\cdot)$ denote the density function of $\eta_j$, define

$$M(\pi_j) = \int J(v - \log(\pi_j))\, g(v)\, dv \quad \text{for } j = 0, 1, 2, 3. \tag{49}$$

Then, Bourguignon et al. (2007) derive the following:

$$E(\eta_2^* \mid LFP' = 2, \alpha'z) = M(\pi_2), \tag{50}$$

$$E(\eta_l^* \mid LFP' = 2, \alpha'z) = M(\pi_l) \frac{\pi_l}{\pi_l - 1} \quad \text{for } l = 0, 1, 3. \tag{51}$$

As a result, the wage equation can be written as

$$\log(wage) = \beta_3' X_3 + \sigma_3 \left[ \sum_{l=0,1,3} \rho_l^* s_l + \rho_2^* s_2 \right], \tag{52}$$

where $s_l = M(\pi_l) \dfrac{\pi_l}{\pi_l - 1}$ for $l = 0, 1, 3$ and $s_2 = M(\pi_2)$. Even though the $M$'s do not have closed form solutions, they can be computed using Gauss-Laguerre quadrature. In this calculation, the methods given by Davis and Polonsky (1964) are used. Second step estimates of DMF 2 are obtained by running a linear regression on Equation (52) after obtaining consistent first step estimates and then forming the selectivity correction terms.


4.2.3 Dahl's (2002) Model

Dahl (2002) proposes a semi-parametric model. Estimation of the model becomes more difficult as the number of categories increases. To simplify matters, he makes an index sufficiency assumption by restricting the set of known probabilities to a subset $Q'$ of the full choice set $Q = \{\pi_0, \pi_1, \pi_2\}$ in order to deal with a large number of categories. We do not include $\pi_3$ in $Q$ since $\pi_3 = 1 - (\pi_0 + \pi_1 + \pi_2)$. This assumption can be written as follows:

$$\varphi(u_3, \varepsilon_2 \mid \alpha'z) = \varphi(u_3, \varepsilon_2 \mid Q) = \varphi(u_3, \varepsilon_2 \mid Q'), \tag{53}$$

where $\varphi$ is the conditional density function of $u_3$ and $\varepsilon_2$.

In other words, Dahl (2002) makes the following restriction in order to reduce the dimensionality of the problem:

$$E(u_3 \mid LFP' = 2, \alpha'z) = E(u_3 \mid LFP' = 2, Q') = \mu(Q'). \tag{54}$$

Here $\mu$ is a unique function, which Dahl (2002) obtains by assuming that the probabilities are invertible. Then, the regression function becomes

$$\log(wage) = \beta_3' X_3 + \mu(\pi_2). \tag{55}$$

This approach has two shortcomings. First, as stated by Dahl (2002), we generally cannot test the index sufficiency assumption. Second, reducing dimensionality is equivalent to making a correlation assumption, i.e. $\rho_j$, the correlation between $\eta_j$ and $u_3$, is a function of the elements of $Q'$ for $j = 0, 1, 2, 3$ (Bourguignon et al., 2007). However, compared to other Multinomial Logit based selection correction models, Dahl (2002) does not assume a linear conditional expectation function.

A special case of this model proposed by Dahl (2002) is to assume that the set $Q'$ is a singleton containing only the chosen alternative, in our case 2. Then, the regression equation becomes

$$\log(wage) = \beta_3' X_3 + \mu(\pi_2). \tag{56}$$


We assume that $\mu$ is a fourth order polynomial as in Bourguignon et al. (2007), and call this model DAHL 1. Another approach, following Bourguignon et al. (2007), is to express $\mu$ as a function of all probabilities up to fourth order plus the interactions between them. This latter approach is identified as DAHL 2 in our analysis. In both approaches, the $\mu$'s are approximated by a series expansion after obtaining the first step estimates. Unlike the other logit based methods, Dahl's approach does not yield consistent estimates of $\sigma_{33}$ and the $\rho$'s.

4.3 Application Revisited

We revisit the application given in Section 3 and estimate it using the Multinomial Logit based selection correction models. Conveniently, the results for both of the steps can be obtained by using the Stata module selmlog developed by Bourguignon et al. (2007).11 The results obtained from the selection step are presented in Table 4. Observe that while Tunali and Baslevent (2006) and we set $P_3 = P_1 + P_2$, Multinomial Logit based selection correction models have $\pi_3 = 1 - (\pi_0 + \pi_1 + \pi_2)$. The restriction on the unemployment probability in Tunali and Baslevent (2006) and our model is not satisfied in Table 4, since the coefficients in columns 1 and 2 do not sum to those in column 3. Since our model captures the logic of participation better, it may be safe to assume that it gives more accurate results. Then, do the first step results of Multinomial Logit based models lead us astray? Even though most qualitative results, except those regarding unemployment (for obvious reasons), are similar to those of the multinomial probit based models,12 there are some differences. These differences can be seen as costs of simplicity.

As seen from Table 4, with illiterates as the reference category, being a middle school or university graduate increases the log odds of self-employment relative to non-participation. High school can also be regarded as a significant variable for the selection equation regarding self-employment since it has a p-value of 0.101. As people get more educated, they become more able, so they may set up their own business or work for their own account. In that sense, these findings are consistent with expectations. Besides, education levels starting from elementary school are significant for the selection equation regarding wage workers, and the coefficient estimates increase with education. This implies the probability of being a wage

11 The STATA ado file is available at http://www.pse.ens.fr/gurgand/selmlog13.html

12 Throughout the section, in order to make a comparison, we use the results of the restricted version with $\sigma_{22} = 0.35$ since it gives more precise estimates.


Table 4. Selection Step for Multinomial Logit Based Selection Correction Models

| Variable | Self-employed: Coef. | Std. Error | Wage Worker: Coef. | Std. Error | Unemployed: Coef. | Std. Error |
|---|---|---|---|---|---|---|
| Age | 0.085 | 0.090 | 0.575*** | 0.067 | 0.128 | 0.087 |
| Age squared/100 | 0.113 | 0.126 | 0.839*** | 0.097 | 0.264** | 0.129 |
| Literate without a diploma | 0.402 | 0.282 | 0.371 | 0.262 | 0.310 | 0.295 |
| Elementary school | 0.194 | 0.221 | 0.500*** | 0.185 | 0.287 | 0.207 |
| Middle school | 1.083*** | 0.311 | 1.493*** | 0.233 | 0.863*** | 0.282 |
| High school | 0.572 | 0.348 | 3.226*** | 0.189 | 1.642*** | 0.244 |
| University | 1.630*** | 0.559 | 5.089*** | 0.235 | 1.735*** | 0.476 |
| Husband self-employed | 0.082 | 0.160 | 1.318*** | 0.125 | 0.863*** | 0.166 |
| Children aged 0-2 | 0.362 | 0.232 | 0.390*** | 0.134 | 0.505*** | 0.174 |
| Children aged 3-5 | 0.163 | 0.193 | 0.429*** | 0.118 | 0.176 | 0.147 |
| Female children aged 6-14 | 0.381** | 0.178 | 0.260** | 0.114 | 0.086 | 0.151 |
| Male children aged 6-14 | 0.160 | 0.180 | 0.405*** | 0.113 | 0.020 | 0.152 |
| Extended household | 0.634* | 0.343 | 0.094 | 0.256 | 0.156 | 0.400 |
| Ext. hh. × Children aged 0-2 | 0.220 | 0.562 | 0.260 | 0.338 | 0.099 | 0.566 |
| Ext. hh. × Children aged 3-5 | 0.067 | 0.523 | 0.446 | 0.308 | 0.450 | 0.543 |
| Ext. hh. × Female ch. aged 6-14 | 0.702 | 0.487 | 0.007 | 0.312 | 0.183 | 0.505 |
| Ext. hh. × Male ch. aged 6-14 | 0.301 | 0.476 | 0.293 | 0.306 | 0.090 | 0.511 |
| Share of textiles | 0.163 | 0.539 | 1.212*** | 0.433 | 0.823* | 0.498 |
| Share of agriculture | 1.115 | 1.248 | 1.939** | 0.841 | 1.397 | 1.039 |
| Share of finance | 6.208 | 4.435 | 2.910 | 3.615 | 4.184 | 4.793 |
| Migration rate | 10.18*** | 3.356 | 7.469** | 3.070 | 3.402 | 3.406 |
| Aegean | 0.729** | 0.310 | 0.551** | 0.227 | 0.507* | 0.260 |
| South | 0.105 | 0.291 | 0.283 | 0.240 | 0.033 | 0.315 |
| Central | 1.739*** | 0.361 | 0.670** | 0.292 | 0.090 | 0.343 |
| North West | 1.559*** | 0.552 | 1.012** | 0.398 | 0.350 | 0.412 |
| East | 1.386** | 0.543 | 0.495 | 0.415 | 0.118 | 0.481 |
| South East | 0.451 | 0.468 | 0.223 | 0.361 | 1.750*** | 0.609 |
| North East | 2.790*** | 1.062 | 1.037** | 0.463 | 0.347 | 0.485 |
| Share of Welfare Party | 16.509*** | 2.978 | 2.749 | 2.309 | 4.006 | 2.941 |
| Share of Left of Center | 4.545*** | 1.240 | 2.400* | 1.346 | 3.757*** | 1.465 |
| Population 200,000 or more | 0.766*** | 0.263 | 0.303* | 0.174 | 0.088 | 0.230 |
| Population 1 million or more | 0.372 | 0.259 | 0.126 | 0.185 | 0.523** | 0.249 |
| Constant | 1.993 | 1.693 | 14.708*** | 1.343 | 5.258*** | 1.605 |

No. of observations: 8962
Log-likelihood w/o covariates: 4603.94
Log-likelihood with covariates: 3528.64
Robust standard errors are in parentheses.
* significant at 10%; ** significant at 5%; *** significant at 1%.
The reference group is non-participation.

worker compared to non-participation increases with education level, where illiterates constitute the reference group. These results are consistent with those of the multinomial probit based models.

Self-employment is more likely than non-participation for women who live in extended households or have female children aged 6-14. However, the probability of wage labor over non-participation is not affected by extended households and decreases with the presence of children. Typically, women are responsible for the care of dependents (children and/or elderly people) due to their comparative advantage and/or the social role assigned to them in Turkish culture. As the number of dependents in the household increases, not only the time required for care responsibilities but also living expenses increase. That probably induces women to work, but they do not have enough time to work in a regular job due to the time constraint arising from increased care responsibilities. As a result, they may end up being self-employed rather than wage workers. The results of the multinomial probit based models imply that extended household and the presence of children, except those aged 0-2, do not affect participation probabilities. Children aged 0-2 decrease the probability of being self-employed over non-participation. Hence, the results differ.

While the probability of being a wage worker compared to non-participation has a concave profile with respect to age, the log odds of self-employment relative to non-participation are not affected by age. The part regarding wage workers is consistent with the theory of labor supply over the life cycle. However, the self-employment part does not comply with this theory, and it is contrary to the results of the multinomial probit based models.

Although the shares of textiles and agriculture increase the probability of being a wage worker, they do not affect the probability of self-employment compared to non-participation. The first pattern is inconsistent for the agriculture sector; note also that the share of textiles is an insignificant variable for both of the selection equations in the analysis of the multinomial probit based models. Moreover, the probabilities of being a wage worker and of being self-employed relative to non-participation are both unaffected by the share of finance, which again disagrees with the results in Table 2. Notice that the interpretation of the migration rate remains the same.

If the husband is self-employed, the log odds of being a wage worker relative to non-participation decrease. Furthermore, nearly all of the region dummies have interpretations consistent with the results of the multinomial probit based models. Likewise, the interpretations regarding the share of votes received by the Welfare and left of center parties agree with the multinomial probit based models, except that the probability of self-employment compared to non-participation decreases with an increase in the share of votes received by left of center parties. The variable population 200,000 or more is significant for the self-employment and wage worker selection equations, and this does not comply with the results of the multinomial probit based models.

Unlike our model, logit based models use an independent equation for the unemployed. The results for the unemployed paint a reasonable picture of the labor market. In the selection equation regarding unemployment, we see that as age increases, being unemployed relative to non-participation becomes less likely. This is reasonable since more and more people give up looking for a job as they get older. With illiterates as the reference group, the probability of being unemployed compared to non-participation increases with middle school education or more. This is as expected since the price of leisure is higher for more educated people, and thus they look for a job. Moreover, having children aged 0-2 decreases the unemployment probability compared to non-participation since women refrain from looking for a job in order to nurture young children. Besides, women with self-employed husbands have a lower unemployment probability compared to the non-participation probability. The reason may be that if these women would like to work, they probably work with their husbands.

The share of textiles increases the unemployment probability relative to non-participation. As new textile jobs become available, some low-educated and inexperienced non-participants may decide to look for a job. Furthermore, as votes received by left of center parties increase, the unemployment probability increases compared to non-participation. This is in line with Tunali and Baslevent's (2006) assertion that people would have a more liberal view of female labor in the provinces where the share of votes received by left of center parties is higher.

In the second stage, selectivity correction terms are formed according to the assumptions of the specified models, and the regression estimates for wage workers are obtained using the selmlog module. Standard errors are estimated using the bootstrap method by drawing, with replacement, 100 and 400 samples whose size equals that of the sample in use, which is 735. Since additional bootstrapping does not lead to big differences in the standard errors of the variables, we only report the estimates of the version with 100 replications.
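The resampling scheme described above can be illustrated generically. The sketch below is not the selmlog implementation: it bootstraps the standard error of a simple sample mean on synthetic data of size 735, drawing resamples of the same size with replacement and comparing 100 with 400 replications. The function name `bootstrap_se`, the seeds, and the data are our assumptions.

```python
import random
import statistics

def bootstrap_se(data, estimator, replications, seed=0):
    """Standard deviation of an estimator across bootstrap resamples of size len(data)."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(replications):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    return statistics.stdev(draws)

# Synthetic data standing in for the 735 wage observations (not the paper's data).
data_rng = random.Random(42)
sample = [data_rng.gauss(1.0, 0.5) for _ in range(735)]

se_100 = bootstrap_se(sample, statistics.fmean, replications=100)
se_400 = bootstrap_se(sample, statistics.fmean, replications=400)
print(f"bootstrap SE with 100 replications: {se_100:.4f}")
print(f"bootstrap SE with 400 replications: {se_400:.4f}")
```

In the application the estimator would be the full two-step procedure, re-run on each resample, rather than a sample mean.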

Table 5 gives the results based on LEE where weighted least squares is used to account for heteroskedasticity. The weights are given in the appendix of Bourguignon et al. (2007). The selmlog module provides consistent estimates of $\sigma_{33}$ and $c_2$. It does not restrict $c_2$ to the $[-1, 1]$ interval.

Table 5. LEE Estimates of the Wage Equation

| Variable | Coefficient | Std. Error |
|---|---|---|
| Experience | 0.033** | 0.014 |
| Experience sq./100 | 0.068* | 0.041 |
| Literate w/o a diploma | 0.110 | 0.190 |
| Elementary school | 0.003 | 0.124 |
| Middle school | 0.250* | 0.143 |
| High school | 0.486*** | 0.184 |
| University | 1.067*** | 0.256 |
| Share of Textiles | 0.391*** | 0.144 |
| Share of Agriculture | 0.156 | 0.254 |
| Share of Finance | 4.211*** | 1.194 |
| Aegean | 0.132** | 0.067 |
| South | 0.153 | 0.095 |
| Central | 0.029 | 0.077 |
| North West | 0.171 | 0.110 |
| East | 0.137 | 0.130 |
| South East | 0.184* | 0.112 |
| North East | 0.129 | 0.125 |
| λj2 | 0.009 | 0.114 |
| Constant | 0.967*** | 0.321 |
| σ33 | 0.279*** | 0.062 |
| c2 | 0.016 | 0.178 |
| No. of obs. | 735 | |

Standard errors are estimated using 100 bootstrap replications.
* significant at 10%; ** significant at 5%; *** significant at 1%.

The finding from LEE supports random selection since the selectivity correction term has a p-value of 0.939. This is inconsistent with the results obtained from both the trivariate normality and our robust specification. In fact, the results based on LEE are almost exactly the same as the results from the model without selectivity correction reported in Table 3.

Table 6 reports the second step estimates from DMF, DMF 1 and DMF 2. This step, for all these models, is estimated by employing weighted least squares to deal with heteroskedasticity. The relevant weights can be found in the appendix of Bourguignon et al. (2007). Note that, for $j = 0, 1, 2, 3$, the $s_j$'s denote different variables for each model and their definitions were already given in Section 4.2.2. The selmlog module provides consistent estimates of $\sigma_{33}$, $\rho_j$ and $\rho_j^*$ for $j = 0, 1, 2, 3$ without restricting the correlation estimates to their theoretical ranges.


Table 6. DMF, DMF 1 and DMF 2 Estimates of the Wage Equation

| Variable | DMF 2: Coef. | Std. Error | DMF 1: Coef. | Std. Error | DMF: Coef. | Std. Error |
|---|---|---|---|---|---|---|
| Experience | 0.031** | 0.012 | 0.028** | 0.014 | 0.029** | 0.013 |
| Experience sq./100 | 0.064* | 0.034 | 0.057 | 0.036 | 0.060* | 0.036 |
| Literate w/o a diploma | 0.127 | 0.195 | 0.138 | 0.190 | 0.128 | 0.182 |
| Elementary school | 0.030 | 0.137 | 0.044 | 0.137 | 0.032 | 0.124 |
| Middle school | 0.191 | 0.176 | 0.167 | 0.187 | 0.208 | 0.143 |
| High school | 0.453** | 0.197 | 0.409** | 0.186 | 0.436*** | 0.162 |
| University | 1.044*** | 0.299 | 1.041*** | 0.285 | 1.052*** | 0.225 |
| Share of Textiles | 0.419** | 0.166 | 0.377* | 0.197 | 0.329** | 0.160 |
| Share of Agriculture | 0.176 | 0.278 | 0.146 | 0.271 | 0.105 | 0.234 |
| Share of Finance | 4.593*** | 1.728 | 4.400** | 2.166 | 3.920*** | 1.247 |
| Aegean | 0.115 | 0.085 | 0.114 | 0.121 | 0.113 | 0.095 |
| South | 0.155* | 0.105 | 0.155 | 0.134 | 0.159 | 0.102 |
| Central | 0.045 | 0.088 | 0.044 | 0.107 | 0.035 | 0.081 |
| North West | 0.150 | 0.125 | 0.138 | 0.135 | 0.131 | 0.111 |
| East | 0.156 | 0.143 | 0.149 | 0.191 | 0.145 | 0.129 |
| South East | 0.199 | 0.140 | 0.178 | 0.192 | 0.140 | 0.111 |
| North East | 0.101 | 0.173 | 0.077 | 0.176 | 0.090 | 0.125 |
| s0 | 0.142 | 0.256 | 0.361 | 0.410 | 0.352 | 0.346 |
| s1 | 0.667 | 1.415 | 0.290 | 2.475 | 0.136 | 0.544 |
| s2 | 0.025 | 0.974 | 0.064 | 0.060 | - | - |
| s3 | 0.090 | 0.546 | 0.084 | 0.413 | 0.273 | 0.392 |
| Constant | 1.002*** | 0.325 | 1.052** | 0.439 | 1.001*** | 0.352 |
| σ33 | 0.309*** | 0.132 | 0.417 | 0.420 | 0.432 | 0.347 |
| ρ0 | - | - | 0.718 | 0.529 | 0.686 | 0.476 |
| ρ1 | - | - | 0.576 | 2.588 | 0.266 | 0.784 |
| ρ2 | - | - | 0.127 | 0.096 | - | - |
| ρ3 | - | - | 0.168 | 0.610 | 0.532 | 0.574 |
| ρ*0 | 0.255 | 0.432 | - | - | - | - |
| ρ*1 | 1.200 | 1.932 | - | - | - | - |
| ρ*2 | 0.046 | 0.160 | - | - | - | - |
| ρ*3 | 0.163 | 0.831 | - | - | - | - |
| No. of obs. | 735 | | 735 | | 735 | |

Standard errors are estimated using 100 bootstrap replications.
* significant at 10%; ** significant at 5%; *** significant at 1%.

Since s0, s1 and s3 are jointly statistically insignificant for DMF (p-value = 0.667), and
s0, s1, s2 and s3 are jointly statistically insignificant for DMF 1 (p-value = 0.8021) and
DMF 2 (p-value = 0.9564), these models provide evidence in favor of random selection. This
conclusion contradicts the results based on the trivariate normality and our robust specifications.
Like LEE, these models yield results very similar to those under random selection,
reported in Table 3.
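A joint significance test of the correction terms can be sketched with a Wald statistic built on a bootstrap covariance matrix. The text does not state which joint test was used, so this is only one plausible reading. The helper names `wald_joint_test` and `ols`, and the synthetic data, are assumptions made for illustration; none of the numbers below relate to Table 6.

```python
import numpy as np
from scipy import stats

def wald_joint_test(coef, cov, idx):
    """Wald test of H0: coef[idx] = 0, given an estimated covariance matrix."""
    b = np.asarray(coef, dtype=float)[idx]
    V = np.asarray(cov, dtype=float)[np.ix_(idx, idx)]
    stat = float(b @ np.linalg.solve(V, b))
    pval = float(stats.chi2.sf(stat, df=len(idx)))
    return stat, pval

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Synthetic data: the last three regressors play the role of correction terms
# whose true coefficients are zero (hypothetical example).
rng = np.random.default_rng(0)
n = 735
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

beta_hat = ols(X, y)

# Nonparametric bootstrap with 100 replications, as in the table notes.
B = 100
draws = np.empty((B, X.shape[1]))
for rep in range(B):
    rows = rng.integers(0, n, size=n)
    draws[rep] = ols(X[rows], y[rows])
boot_cov = np.cov(draws, rowvar=False)

stat, pval = wald_joint_test(beta_hat, boot_cov, [2, 3, 4])
```

A large p-value from such a test would be read, as in the text, as a failure to reject random selection.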

Table 7 gives the second step estimates obtained from DAHL 1 and DAHL 2. Note that
we cannot test for non-random selection since the selmlog module neither provides robust
standard errors for the correction terms nor computes R^2 values.

Table 7. DAHL 1 and DAHL 2 Estimates of the Wage Equation

                               DAHL 2                DAHL 1
Variable                  Coef.     Std.Error   Coef.     Std.Error
Experience                0.027*    0.016       0.033**   0.014
Experience sq./100        0.050     0.039       0.067*    0.039
Literate w/o a diploma    0.204     0.187       0.140     0.182
Elementary school         0.090     0.120       0.041     0.120
Middle school             0.158     0.162       0.182     0.146
High school               0.592***  0.201       0.533***  0.170
University                1.066***  0.291       1.147***  0.240
Share of Textiles         0.274     0.195       0.370***  0.121
Share of Agriculture      0.114     0.276       0.140     0.233
Share of Finance          4.474***  1.636       4.138***  1.168
Aegean                    0.122     0.109       0.142*    0.077
South                     0.146     0.112       0.130     0.086
Central                   0.098     0.097       0.029     0.073
North West                0.103     0.125       0.181**   0.091
East                      0.206     0.152       0.113     0.113
South East                0.003     0.165       0.172     0.109
North East                0.074     0.161       0.122     0.113
Constant                  267.265   418.920     1.030***  0.223
No. of obs.               735                   735

Standard errors are estimated using 100 bootstrap replications.
* significant at 10%; ** significant at 5%; *** significant at 1%.

Comparing the results in Table 7 with those obtained from our robust correction in
Table 3, observe that South in both models, and experience sq./100, share of textiles and
Aegean in DAHL 2 become insignificant. Moreover, North West becomes significant at 5
percent in DAHL 1. The point estimates of the significant variables in DAHL 1 differ from
those under our specification by up to 20 percent, whereas the discrepancy rises to 30 percent
with DAHL 2.

To recapitulate, Multinomial Logit based selection correction models fail to detect
selectivity in the wage equation estimates and yield magnitudes that differ considerably from ours.
The reason might be that these models ignore the logic of participation. Among these models,
DAHL 1 gives the estimates closest to ours. This is surprising since, comparing root mean
squared errors via Monte Carlo experiments, Bourguignon et al. (2007) find that DAHL 1
performs better than the other models in only 1 out of 100 cases. Furthermore, their results
support using variants of DMF or DAHL 2, which perform badly in our case.

5 Conclusion

This paper discusses correction of selectivity bias in models of double selection. The approach

proposed here relaxes the conventional trivariate normality assumption. To assess the merits

of the new method, we compare it with Multinomial Logit based selection correction
models, which have been developed in the literature for handling two or more rounds of
selection.

The conventional selectivity correction mechanism used in double selection models relies

on trivariate normality among the random disturbances of the regression and the two selection

equations. However, when tests are done, this assumption is typically rejected. We relax this

assumption by relying on an Edgeworth expansion, extending what Lee (1982) did for simple

selection models with one selection equation. We assume bivariate normality among the

random disturbances in the two selection equations, but do not impose any restriction on

the form of the distribution of the random disturbance in the regression equation. Because it
relaxes the skewness and kurtosis restrictions implied by normality, our methodology is more
general than trivariate normality based approaches. In addition, we provide simple tests for the
presence of selectivity bias and for the trivariate normality specification under our
distributional assumption.

The drawback of our model is that extensions beyond two selection equations are cumbersome
and computationally intensive. To handle multiple selection, Multinomial Logit based
correction models are used frequently. These models define a selection equation for each
possible category and impose the IIA structure. Our model has the advantage of allowing
correlation between the random disturbances of the two selection equations. In addition,
problems in which the states are not pairwise disjoint are tractable via our specification.
Cross-equation restrictions are impossible for Multinomial Logit based models due to the
maintained IIA property.


To illustrate the differences among various selection correction models, an empirical example
involving the wage determinants of regular and casual female workers in Turkey is
presented. Only a fraction of women living in urban areas of Turkey participate, and the
participation probability increases dramatically with education level.
Thus, there is reason to believe that wage workers are not a random subsample. Indeed, both
the conventional correction method and the robust version support this conjecture. Notably,
the example provides evidence in favor of our robust specification and against the trivariate
normality assumption. We conclude that it is important to allow for non-normal skewness and
kurtosis.

As examples of Multinomial Logit based models, we analyzed the methods provided by Lee
(1983), Dubin and McFadden (1984) and its two variants proposed by Bourguignon et al.
(2007), and Dahl (2002). These models fail to capture selectivity. Furthermore, the substantive
results differ from those obtained under our robust specification. Our model captures
the logic of participation better, so we believe that our robust results are more reliable and
conclude that the Multinomial Logit based selection correction models considered here are far
from providing reasonable estimates in this empirical example.


6 Appendix

6.1 Derivation of Equations (11) and (12)

Let $f(u_1, u_2, \tilde{u}_3)$ be the joint density of $u_1$, $u_2$ and $\tilde{u}_3$. The triplet
$(u_{1i}, u_{2i}, \tilde{u}_{3i})$ is assumed to be independently and identically distributed across
individuals with zero mean vector and covariance matrix

\[
\Sigma = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{bmatrix},
\]

and is independent of $X_{ki}$ for $k = 1, 2, 3$. For brevity, we suppress the conditioning on the
covariate matrix $X$ and the individual subscript $i$ throughout the section. Our aim is to
obtain an expression for $E(\tilde{u}_3 \mid u_1, u_2)$.

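We denote the standard trivariate normal density $STVN(\rho_{12}, \rho_{13}, \rho_{23})$ by $g(u_1, u_2, u_3)$
and the standard bivariate normal density $SBVN(\rho_{12})$ by $g(u_1, u_2)$. Let $k(\tilde{u}_3 \mid u_1, u_2)$ be
the conditional density of $\tilde{u}_3$ given $u_1$ and $u_2$. We assume that
$k(\tilde{u}_3 \mid u_1, u_2)\, g(u_1, u_2) = f(u_1, u_2, \tilde{u}_3)$. That is, we specify the joint distribution
of $(u_1, u_2)$ as standard bivariate normal, but allow the conditional distribution of $\tilde{u}_3$ given
$u_1$ and $u_2$ to be non-normal. Let

\[ A = \sqrt{1 - (\rho_{12}^2 + \rho_{13}^2 + \rho_{23}^2 - 2\rho_{12}\rho_{13}\rho_{23})}, \tag{A.1.1} \]

\[ a = \frac{1}{\sqrt{1 - \rho_{12}^2}}, \tag{A.1.2} \]

\[ b_1 = \rho_{13} - \rho_{12}\rho_{23}, \tag{A.1.3} \]

\[ b_2 = \rho_{23} - \rho_{12}\rho_{13}, \tag{A.1.4} \]

\[ c = \frac{a}{A}(b_1 u_1 + b_2 u_2), \tag{A.1.5} \]

\[ u_1^*(u_1, u_2) = u_1^* = a(u_1 - \rho_{12} u_2), \tag{A.1.6} \]

\[ u_2^*(u_1, u_2) = u_2^* = a(u_2 - \rho_{12} u_1), \tag{A.1.7} \]

\[ u_3^* = \frac{u_3}{Aa} - c. \tag{A.1.8} \]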


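We can write:

\[ E\left(\frac{\tilde{u}_3}{Aa} \,\Big|\, u_1, u_2\right) g(u_1, u_2) = \int_{-\infty}^{\infty} \frac{\tilde{u}_3}{Aa}\, k(\tilde{u}_3 \mid u_1, u_2)\, g(u_1, u_2)\, d\tilde{u}_3. \tag{A.1.9} \]

Equivalently,

\[ \frac{E(\tilde{u}_3 \mid u_1, u_2)\, g(u_1, u_2)}{Aa} = \int_{-\infty}^{\infty} \frac{\tilde{u}_3}{Aa}\, f(u_1, u_2, \tilde{u}_3)\, d\tilde{u}_3. \tag{A.1.10} \]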

For the Type AAA surface defined in Section 2.3, we may expand $f(u_1, u_2, \tilde{u}_3)$ in terms of a
series of derivatives of $g(u_1, u_2, u_3)$ via Equation (7). We truncate the expansion by keeping
terms with $r + s + p \le 4$. This allows us to set $A_{rsp} = K_{rsp}$, where the $K$'s denote cumulants
(Lahiri and Song, 1999). Thus, the truncated version of Equation (A.1.10) is:

\[
\frac{E(\tilde{u}_3 \mid u_1, u_2)\, g(u_1, u_2)}{Aa} = \int_{-\infty}^{\infty} \frac{u_3}{Aa}\, g(u_1, u_2, u_3)\, du_3
+ \sum_{r+s+p=3}^{4} \frac{(-1)^{r+s+p}}{r!\,s!\,p!} K_{rsp} \int_{-\infty}^{\infty} \frac{u_3}{Aa}\, D^r_{u_1} D^s_{u_2} D^p_{u_3} g(u_1, u_2, u_3)\, du_3. \tag{A.1.11}
\]

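Let

\[ I_1 = \int_{-\infty}^{\infty} \frac{u_3}{Aa}\, g(u_1, u_2, u_3)\, du_3, \tag{A.1.12} \]

and

\[ I_{rsp} = \int_{-\infty}^{\infty} \frac{u_3}{Aa}\, D^r_{u_1} D^s_{u_2} D^p_{u_3} g(u_1, u_2, u_3)\, du_3. \tag{A.1.13} \]

Then, Equation (A.1.11) may be expressed as

\[ \frac{E(\tilde{u}_3 \mid u_1, u_2)\, g(u_1, u_2)}{Aa} = I_1 + \sum_{r+s+p=3}^{4} \frac{(-1)^{r+s+p}}{r!\,s!\,p!} K_{rsp} I_{rsp}. \tag{A.1.14} \]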


Claim: $g(u_1, u_2, u_3) = \dfrac{\phi(u_1)\phi(u_2^*)\phi(u_3^*)}{A}$, where $\phi$ is the univariate standard normal density.

Proof: Let $g(u_2 \mid u_1)$ be the conditional density function of $u_2$ given $u_1$ and $g(u_3 \mid u_1, u_2)$
be the conditional density of $u_3$ given $u_1$ and $u_2$. Then, $g(u_1, u_2, u_3)$ can be written as a
product of marginal and conditional distributions:

\[ g(u_1, u_2, u_3) = g(u_1)\, g(u_2 \mid u_1)\, g(u_3 \mid u_1, u_2). \tag{A.1.15} \]

Clearly, $g(u_1) = \phi(u_1)$. We know from Goldberger (1991) that

\[ g(u_2 \mid u_1) = a\,\phi(u_2^*). \tag{A.1.16} \]

Now, let us find $g(u_3 \mid u_1, u_2)$. Using the formula in Goldberger (1991), we obtain

\[ E(u_3 \mid u_1, u_2) = a^2(b_1 u_1 + b_2 u_2), \tag{A.1.17} \]

\[ Var(u_3 \mid u_1, u_2) = A^2 a^2. \tag{A.1.18} \]

Then,

\[ g(u_3 \mid u_1, u_2) = \frac{\phi(u_3^*)}{Aa}. \tag{A.1.19} \]

Substituting Equation (A.1.16) and Equation (A.1.19) into Equation (A.1.15), we obtain

\[ g(u_1, u_2, u_3) = \frac{\phi(u_1)\phi(u_2^*)\phi(u_3^*)}{A}. \quad \text{Q.E.D.} \tag{A.1.20} \]
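The factorization in the claim can be checked numerically at an arbitrary point. The sketch below assumes the garbled correlation symbols are $\rho_{12}$, $\rho_{13}$, $\rho_{23}$; the function name `factorized_density`, the correlation values and the evaluation point are chosen only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def factorized_density(u1, u2, u3, r12, r13, r23):
    """Right-hand side of (A.1.20): phi(u1) * phi(u2*) * phi(u3*) / A."""
    A = np.sqrt(1.0 - (r12**2 + r13**2 + r23**2 - 2.0 * r12 * r13 * r23))  # (A.1.1)
    a = 1.0 / np.sqrt(1.0 - r12**2)                                        # (A.1.2)
    b1 = r13 - r12 * r23                                                   # (A.1.3)
    b2 = r23 - r12 * r13                                                   # (A.1.4)
    c = (a / A) * (b1 * u1 + b2 * u2)                                      # (A.1.5)
    u2_star = a * (u2 - r12 * u1)                                          # (A.1.7)
    u3_star = u3 / (A * a) - c                                             # (A.1.8)
    return norm.pdf(u1) * norm.pdf(u2_star) * norm.pdf(u3_star) / A

# Illustrative correlations (must give a positive definite matrix).
r12, r13, r23 = 0.3, -0.2, 0.5
cov = np.array([[1.0, r12, r13],
                [r12, 1.0, r23],
                [r13, r23, 1.0]])
point = (0.4, -1.1, 0.7)

direct = multivariate_normal(mean=np.zeros(3), cov=cov).pdf(point)
factored = factorized_density(*point, r12, r13, r23)
```

The two values should agree up to floating point error, since $A^2$ is the determinant of the correlation matrix and the three squared standardized terms add up to the trivariate quadratic form.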

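Substituting Equation (A.1.20) into Equation (A.1.13) and making the transformation $z_3 = u_3^*$,
so that $Aa\,dz_3 = du_3$, we get

\[ I_{rsp} = \int_{-\infty}^{\infty} (z_3 + c)\, D^r_{u_1} D^s_{u_2} D^p_{u_3}\, \phi(u_1)\phi(u_2^*)\phi(z_3)\, a\, dz_3. \tag{A.1.21} \]

Let $\phi^{(p)}$ denote the $p$th derivative of the univariate standard normal density. Observe that

\[ D^p_{u_3}\phi(z_3) = \phi^{(p)}(z_3)\left(\frac{1}{Aa}\right)^p, \tag{A.1.22} \]

and

\[ D^p_{z_3}\phi(z_3) = \phi^{(p)}(z_3). \tag{A.1.23} \]

Hence,

\[ \left(\frac{1}{Aa}\right)^p D^p_{z_3}\phi(z_3) = D^p_{u_3}\phi(z_3). \tag{A.1.24} \]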

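Inserting Equation (A.1.24) into Equation (A.1.21), we obtain

\[ I_{rsp} = \frac{1}{A^p a^{p-1}} \int_{-\infty}^{\infty} (z_3 + c)\, D^r_{u_1} D^s_{u_2} D^p_{z_3}\, \phi(u_1)\phi(u_2^*)\phi(z_3)\, dz_3. \tag{A.1.25} \]

Making another transformation $z_3 = u_3$ (abusing the notation), so that
$D^p_{z_3}\phi(z_3) = D^p_{u_3}\phi(u_3)$ and $dz_3 = du_3$, we have

\[ I_{rsp} = \frac{1}{A^p a^{p-1}} \int_{-\infty}^{\infty} (u_3 + c)\, D^r_{u_1} D^s_{u_2} D^p_{u_3}\, \phi(u_1)\phi(u_2^*)\phi(u_3)\, du_3. \tag{A.1.26} \]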

The definition of the $r$th order univariate Hermite polynomial can be obtained by setting $s = 0$
in Equation (9). In other words, $D^r_{u_1}\phi(u_1) = (-1)^r H_r(u_1)\phi(u_1)$. Using this, we can deduce
$H_0(a) = 1$ and $H_1(a) = a$. Then, we can write

\[ u_3 + c = H_1(u_3 + c) = \sum_{i=0}^{1} \binom{1}{i} c^i H_{1-i}(u_3), \tag{A.1.27} \]

where $\binom{1}{i}$ denotes the binomial coefficient. Incorporating Equation (A.1.27) into
Equation (A.1.26), we get

\[ I_{rsp} = \frac{(-1)^p}{A^p a^{p-1}}\, D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*) \sum_{i=0}^{1} \binom{1}{i} c^i \int_{-\infty}^{\infty} H_p(u_3) H_{1-i}(u_3)\phi(u_3)\, du_3. \tag{A.1.28} \]


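For non-negative integers $n$ and $m$, the orthogonality property of Hermite polynomials allows us to write

\[ \int_{-\infty}^{\infty} H_n(x) H_m(x) \exp\left(-x^2/2\right) dx = n!\,\sqrt{2\pi}\,\delta_{n,m}, \tag{A.1.29} \]

where $\delta_{n,m}$ is the Kronecker delta, $\delta_{n,m} = 1$ if $n = m$ and $\delta_{n,m} = 0$ if $n \neq m$.
Since $\delta_{p,1-i} = 1 \iff p = 1 - i \iff i = 1 - p$, which, because $i \in \{0, 1\}$, requires $p \le 1$,

\[
\int_{-\infty}^{\infty} H_p(u_3) H_{1-i}(u_3)\phi(u_3)\, du_3
= \int_{-\infty}^{\infty} H_p(u_3) H_{1-i}(u_3) \frac{\exp\left(-u_3^2/2\right)}{\sqrt{2\pi}}\, du_3
= p!\,\sqrt{2\pi}\,\delta_{p,1-i}\,\frac{1}{\sqrt{2\pi}} = p! \quad \text{for } p \le 1. \tag{A.1.30}
\]

Also observe that $\int_{-\infty}^{\infty} H_p(u_3) H_{1-i}(u_3)\phi(u_3)\, du_3 = 0$ for $p > 1$. Returning to Equation
(A.1.28), we conclude that

\[ I_{rsp} = 0 \quad \text{for } p > 1. \tag{A.1.31} \]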
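The orthogonality relation (A.1.29) and the integrals entering (A.1.30) and (A.1.31) can be checked numerically with Gauss quadrature for the probabilists' Hermite polynomials. This is a minimal sketch; the helper names `hermite_inner` and `normal_weighted_integral` are introduced here for illustration only.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as He

# Gauss-HermiteE nodes and weights for the weight function exp(-x^2 / 2).
# With 20 nodes the rule is exact for polynomials of degree up to 39.
nodes, weights = He.hermegauss(20)

def hermite_inner(n, m):
    """Integral of H_n(x) H_m(x) exp(-x^2/2) dx, the left side of (A.1.29)."""
    cn = [0] * n + [1]          # coefficient vector selecting H_n
    cm = [0] * m + [1]          # coefficient vector selecting H_m
    values = He.hermeval(nodes, cn) * He.hermeval(nodes, cm)
    return float(np.sum(weights * values))

def normal_weighted_integral(p, i):
    """Integral of H_p(u3) H_{1-i}(u3) phi(u3) du3, as in (A.1.30)."""
    return hermite_inner(p, 1 - i) / math.sqrt(2.0 * math.pi)

# Cases with p <= 1 and i = 1 - p should give p!; cases with p > 1 should give 0.
check_p0 = normal_weighted_integral(0, 1)
check_p1 = normal_weighted_integral(1, 0)
check_p2 = normal_weighted_integral(2, 0)
```

Only the terms with $p \in \{0, 1\}$ survive, which is the reason the expansion contributes through $K_{rs0}$ and $K_{rs1}$ alone in the equations that follow.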

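Moreover, substituting Equation (A.1.30) into Equation (A.1.28), we obtain:

\[ I_{rsp} = \frac{(-1)^p}{A^p a^{p-1}}\, D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*) \binom{1}{1-p} c^{1-p}\, p! \quad \text{for } p \le 1. \tag{A.1.32} \]

Equivalently,

\[ I_{rsp} = \frac{(-1)^p}{A^p a^{p-1}}\, D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*) \frac{c^{1-p}}{(1-p)!} \quad \text{for } p \le 1. \tag{A.1.33} \]

Substituting $\frac{a}{A}(b_1 u_1 + b_2 u_2)$ back for $c$, we obtain

\[ I_{rsp} = \frac{(-1)^p}{A a^{2p-2}} \frac{1}{(1-p)!}\, D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*)\,(b_1 u_1 + b_2 u_2)^{1-p} \quad \text{for } p \le 1. \tag{A.1.34} \]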

We can conclude from Equations (A.1.12) and (A.1.13) that $I_1 = I_{000}$. Then, we can write
$I_1$ as

\[ I_1 = \frac{a^2}{A}\, \phi(u_1)\phi(u_2^*)\,(b_1 u_1 + b_2 u_2). \tag{A.1.35} \]


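Setting $p = 0$ and $p = 1$ in Equation (A.1.34), we respectively get:

\[ I_{rs0} = \frac{a^2}{A}\, D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*)\,(b_1 u_1 + b_2 u_2), \tag{A.1.36} \]

and

\[ I_{rs1} = -\frac{1}{A}\, D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*). \tag{A.1.37} \]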

After substituting Equations (A.1.31), (A.1.35), (A.1.36), and (A.1.37) into Equation (A.1.14)
and doing some manipulations, we get

\[
E(\tilde{u}_3 \mid u_1, u_2)\, g(u_1, u_2) = a^3 \phi(u_1)\phi(u_2^*)\,(b_1 u_1 + b_2 u_2)
+ \sum_{r+s=3}^{4} \frac{(-1)^{r+s}}{r!\,s!} K_{rs0}\, a^3 D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*)\,(b_1 u_1 + b_2 u_2)
+ \sum_{r+s=2}^{3} \frac{(-1)^{r+s}}{r!\,s!} K_{rs1}\, a\, D^r_{u_1} D^s_{u_2}\, \phi(u_1)\phi(u_2^*). \tag{A.1.38}
\]

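Since $a\,\phi(u_1)\phi(u_2^*) = g(u_1, u_2)$, which may be obtained by multiplying Equation (A.1.16) by
$\phi(u_1)$, Equation (A.1.38) can be written as:

\[
E(\tilde{u}_3 \mid u_1, u_2)\, g(u_1, u_2) = a^2 g(u_1, u_2)\,(b_1 u_1 + b_2 u_2)
+ \sum_{r+s=3}^{4} \frac{(-1)^{r+s}}{r!\,s!} K_{rs0}\, a^2 D^r_{u_1} D^s_{u_2}\, g(u_1, u_2)\,(b_1 u_1 + b_2 u_2)
+ \sum_{r+s=2}^{3} \frac{(-1)^{r+s}}{r!\,s!} K_{rs1}\, D^r_{u_1} D^s_{u_2}\, g(u_1, u_2). \tag{A.1.39}
\]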

Now, let us compute $D^r_{u_1} D^s_{u_2}\, g(u_1, u_2)\,(b_1 u_1 + b_2 u_2)$ via the chain rule:

\[
D^r_{u_1} D^s_{u_2}\, g(u_1, u_2)\,(b_1 u_1 + b_2 u_2) = (b_1 u_1 + b_2 u_2)\, D^r_{u_1} D^s_{u_2}\, g(u_1, u_2)
+ g(u_1, u_2)\, D^r_{u_1} D^s_{u_2}\,(b_1 u_1 + b_2 u_2). \tag{A.1.40}
\]

Observe that $D^r_{u_1} D^s_{u_2}(b_1 u_1 + b_2 u_2) = 0$ for $r + s \ge 2$. Using the formula for bivariate Hermite
polynomials given in Equation (9), we obtain

\[
D^r_{u_1} D^s_{u_2}\, g(u_1, u_2)\,(b_1 u_1 + b_2 u_2) = (b_1 u_1 + b_2 u_2)\, D^r_{u_1} D^s_{u_2}\, g(u_1, u_2)
= (b_1 u_1 + b_2 u_2)\,(-1)^{r+s} H_{rs}(u_1, u_2)\, g(u_1, u_2) \quad \text{for } r + s \ge 2. \tag{A.1.41}
\]

Incorporating Equation (A.1.41) into Equation (A.1.39) and doing some manipulations, we
get

\[
E(\tilde{u}_3 \mid u_1, u_2) = a^2 (b_1 u_1 + b_2 u_2) \left[ 1 + \sum_{r+s=3}^{4} \frac{1}{r!\,s!} K_{rs0} H_{rs}(u_1, u_2) \right]
+ \sum_{r+s=2}^{3} \frac{1}{r!\,s!} K_{rs1} H_{rs}(u_1, u_2). \tag{A.1.42}
\]

Since $(u_1, u_2) \sim SBVN(\rho_{12})$, Equation (10) allows us to write

\[ 1 + \sum_{r+s=3}^{4} \frac{1}{r!\,s!} K_{rs0} H_{rs}(u_1, u_2) = 1. \tag{A.1.43} \]

This expression may be verified by exploiting the moment formulas in Stuart and Ord (1994),
after showing that $K_{ijk} = \mu_{ijk}$ for $i + j + k = 3$ using their technique, where the $\mu$'s denote
trivariate central moments. Thus, when $h(u_1, u_2) = g(u_1, u_2)$, Equation (A.1.42) becomes

\[
E(\tilde{u}_3 \mid u_1, u_2) = a^2(b_1 u_1 + b_2 u_2) + \frac{K_{201}}{2} H_{20}(u_1, u_2) + K_{111} H_{11}(u_1, u_2)
+ \frac{K_{021}}{2} H_{02}(u_1, u_2) + \frac{K_{301}}{6} H_{30}(u_1, u_2) + \frac{K_{211}}{2} H_{21}(u_1, u_2)
+ \frac{K_{121}}{2} H_{12}(u_1, u_2) + \frac{K_{031}}{6} H_{03}(u_1, u_2). \tag{A.1.44}
\]

When we express $a$, $b_1$ and $b_2$ as in Equations (A.1.2), (A.1.3) and (A.1.4) respectively, we
get Equation (12). Using Equations (A.2.1) and (A.2.2) given in the next section, and the
cumulant formulas $K_{101} = \rho_{13}$ and $K_{011} = \rho_{23}$ (Stuart and Ord, 1994), it can be shown that

\[ a^2(b_1 u_1 + b_2 u_2) = K_{101} H_{10}(u_1, u_2) + K_{011} H_{01}(u_1, u_2). \tag{A.1.45} \]

Combining Equation (A.1.45) with Equation (A.1.44), we obtain Equation (11).


6.2 Expressions for Bivariate Hermite Polynomials

The expressions given below were obtained by solving Equation (9) for various values of r

and s using Maple. The code may be obtained from the author upon request.

\[ H_{10}(u_1, u_2) = a u_1^*, \tag{A.2.1} \]

\[ H_{01}(u_1, u_2) = a u_2^*, \tag{A.2.2} \]

H20(u1; u2) = �a+ u�21 ; (A.2.3)

H02(u1; u2) = �a+ u�22 ; (A.2.4)

H11(u1; u2) = ��12 + u�1u�2; (A.2.5)

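\[ H_{30}(u_1, u_2) = a u_1^* \left( H_{20} - 2a^2 \right), \tag{A.2.6} \]

\[ H_{03}(u_1, u_2) = a u_2^* \left( H_{02} - 2a^2 \right), \tag{A.2.7} \]

\[ H_{21}(u_1, u_2) = a H_{11} u_1^* - a^3 (u_2^* - \rho_{12} u_1^*), \tag{A.2.8} \]

\[ H_{12}(u_1, u_2) = a H_{11} u_2^* - a^3 (u_1^* - \rho_{12} u_2^*). \tag{A.2.9} \]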
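The Maple code is not reproduced in the paper, but a reader can regenerate bivariate Hermite polynomials with a short symbolic sketch. It assumes that Equation (9) is the relation used in (A.1.41), namely $D^r_{u_1} D^s_{u_2} g(u_1, u_2) = (-1)^{r+s} H_{rs}(u_1, u_2)\, g(u_1, u_2)$ with $g$ the standard bivariate normal density, and that the correlation symbol is $\rho_{12}$ (written `rho` below). The function name `H` and the numeric check point are chosen for illustration.

```python
import sympy as sp

u1, u2, rho = sp.symbols("u1 u2 rho", real=True)   # rho stands for rho_12
a = 1 / sp.sqrt(1 - rho**2)                        # (A.1.2)

# Standard bivariate normal density with correlation rho.
g = a / (2 * sp.pi) * sp.exp(-a**2 * (u1**2 - 2 * rho * u1 * u2 + u2**2) / 2)

def H(r, s):
    """Bivariate Hermite polynomial from H_rs = (-1)^(r+s) D^r D^s g / g."""
    expr = g
    if r > 0:
        expr = sp.diff(expr, u1, r)
    if s > 0:
        expr = sp.diff(expr, u2, s)
    return sp.simplify((-1) ** (r + s) * expr / g)

u1_star = a * (u1 - rho * u2)                      # (A.1.6)
u2_star = a * (u2 - rho * u1)                      # (A.1.7)

# Differences that should vanish if (A.2.1), (A.2.6) and (A.2.8) hold.
diff_10 = H(1, 0) - a * u1_star
diff_30 = H(3, 0) - a * u1_star * (H(2, 0) - 2 * a**2)
diff_21 = H(2, 1) - (a * H(1, 1) * u1_star - a**3 * (u2_star - rho * u1_star))
```

Evaluating the three differences at any admissible point (for instance `rho = 0.3`, `u1 = 0.7`, `u2 = -0.4`) gives a quick consistency check of the recursions in (A.2.6) and (A.2.8).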

6.3 Moments of the Truncated SBVN Distribution

The moment formulas of the truncated SBVN distribution up to the second order may be

found in Rosenbaum (1961). However, in our independent derivation, we discovered that

Rosenbaum made a sign error while calculating integrals. Our expressions up to the second

order agree with those in Henning and Henningsen (2007), who cross-check their formulas

using numerical integration and Monte Carlo simulation. We also provide some third order

moments of the truncated SBVN distribution below. The derivations may be obtained from

the author upon request. We denote the univariate standard normal density and distribution

function respectively by �(:) and �(:). Let Cti = �tiXti for t = 1; 2 for individual i. We

suppress the individual subscript throughout the section to avoid notational clutter. We

denote the moments of the truncated SBVN distribution for the subsample $S_4$ (defined in
Section 2.1) as $m_{j,k} = E(u_1^j u_2^k \mid u_1 > -C_1, u_2 > -C_2)$. Recall $a = 1/\sqrt{1 - \rho_{12}^2}$ and let

\[ q = \frac{\sqrt{1 - \rho_{12}^2}}{\sqrt{2\pi}} = \frac{1}{a\sqrt{2\pi}}, \tag{A.3.1} \]

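\[ D^2(C_1, C_2) \equiv D^2 = C_1^2 - 2\rho_{12} C_1 C_2 + C_2^2, \tag{A.3.2} \]

\[ C_1^*(C_1, C_2) \equiv C_1^* = a(C_1 - \rho_{12} C_2), \tag{A.3.3} \]

\[ C_2^*(C_1, C_2) \equiv C_2^* = a(C_2 - \rho_{12} C_1), \tag{A.3.4} \]

\[ C_1^{**}(C_1, -C_2) \equiv C_1^{**} = C_1 + \rho_{12} C_2, \tag{A.3.5} \]

\[ C_2^{**}(-C_1, C_2) \equiv C_2^{**} = C_2 + \rho_{12} C_1, \tag{A.3.6} \]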

�(C1; C2) = �(�C1)�(C�2 ); (A.3.7)

�(C1; C2) = �(�C2)�(C�1 ); (A.3.8)

�(C1; C2) = � (aD) ; (A.3.9)

and

L(C1; C2; �12) =

Z 1

�C2

Z 1

�C1g(u1; u2)du1du2: (A.3.10)

Then,

m0;1 =�12�(C1; C2) + �(C1; C2)

L(C1; C2; �12); (A.3.11)

m1;0 =�(C1; C2) + �12�(C1; C2)

L(C1; C2; �12); (A.3.12)

m0;2 =L(C1; C2; �12)� �212C1�(C1; C2)� C2�(C1; C2) + q�12�(C1; C2)

L(C1; C2; �12); (A.3.13)

m2;0 =L(C1; C2; �12)� C1�(C1; C2)� �212C2�(C1; C2) + q�12�(C1; C2)

L(C1; C2; �12); (A.3.14)

m1;1 =�12L(C1; C2; �12)� �12C1�(C1; C2)� �12C2�(C1; C2) + q�(C1; C2)

L(C1; C2; �12); (A.3.15)


m0;3 =1

L(C1; C2; �12)[2L(C1; C2; �12)m0;1 +

��12�1� �212

�+ �312C

21

��(C1; C2)

+ C22�(C1; C2)� q�12C��2 �(C1; C2)]; (A.3.16)

m3;0 =1

L(C1; C2; �12)[2L(C1; C2; �12)m1;0 +

��12�1� �212

�+ �312C

22

��(C1; C2)

+ C21�(C1; C2)� q�12C��1 �(C1; C2)]; (A.3.17)

m1;2 =1

L(C1; C2; �12)[2�12L(C1; C2; �12)m0;1 + �12C

22�(C1; C2)

+ (1� �212 + �212C21 )�(C1; C2)� qC��2 �(C1; C2)]; (A.3.18)

m2;1 =1

L(C1; C2; �12)[2�12L(C1; C2; �12)m1;0 + �12C

21�(C1; C2)

+ (1� �212 + �212C22 )�(C1; C2)� qC��1 �(C1; C2)]:

(A.3.19)
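The first order moments (A.3.11) and (A.3.12) can be cross-checked by simulation, in the spirit of the Monte Carlo check attributed above to Henning and Henningsen (2007). The sketch below reads the two functions in (A.3.7) and (A.3.8) as $\phi(-C_1)\Phi(C_2^*)$ and $\phi(-C_2)\Phi(C_1^*)$; because their symbols did not survive extraction, they are given the placeholder names `F1` and `F2`. The function name `truncated_first_moments`, the parameter values and the seed are likewise assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def truncated_first_moments(C1, C2, rho):
    """m_{1,0}, m_{0,1} and L for (u1, u2) ~ SBVN(rho) given u1 > -C1, u2 > -C2."""
    a = 1.0 / np.sqrt(1.0 - rho**2)
    C1_star = a * (C1 - rho * C2)                 # (A.3.3)
    C2_star = a * (C2 - rho * C1)                 # (A.3.4)
    F1 = norm.pdf(-C1) * norm.cdf(C2_star)        # function in (A.3.7)
    F2 = norm.pdf(-C2) * norm.cdf(C1_star)        # function in (A.3.8)
    # By symmetry, P(u1 > -C1, u2 > -C2) = P(u1 < C1, u2 < C2), cf. (A.3.10).
    L = multivariate_normal(mean=[0.0, 0.0],
                            cov=[[1.0, rho], [rho, 1.0]]).cdf([C1, C2])
    m10 = (F1 + rho * F2) / L                     # (A.3.12)
    m01 = (rho * F1 + F2) / L                     # (A.3.11)
    return m10, m01, L

C1, C2, rho = 0.5, -0.3, 0.4
m10, m01, L = truncated_first_moments(C1, C2, rho)

# Monte Carlo counterpart on the truncated region.
rng = np.random.default_rng(12345)
draws = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1_000_000)
keep = (draws[:, 0] > -C1) & (draws[:, 1] > -C2)
mc_m10, mc_m01 = draws[keep].mean(axis=0)
mc_L = keep.mean()
```

The same pattern extends to the second and third order moments by averaging the corresponding powers of the retained draws.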

6.4 Expressions of ��s

We obtain the formulas below using Equation (12) and the formulas provided in Appendices 6.1
and 6.2, via Maple code.

�1 =�(C1; C2)

L(C1; C2; �12); (A.4.1)

�2 =�(C1; C2)

L(C1; C2; �12); (A.4.2)

�3 =�C1�(C1; C2)� q�12�(C1; C2)

L(C1; C2; �12); (A.4.3)

�4 =�C2�(C1; C2)� q�12�(C1; C2)

L(C1; C2; �12); (A.4.4)

�5 = �q�(C1; C2)

L(C1; C2; �12); (A.4.5)


�6 =��(C1; C2) + C21�(C1; C2) + q�12 (aC�1 + C1) �(C1; C2)

L(C1; C2; �12); (A.4.6)

�7 =��(C1; C2) + C22�(C1; C2) + q�12 (aC�2 + C2) �(C1; C2)

L(C1; C2; �12); (A.4.7)

�8 = �aqC�1�(C1; C2)

L(C1; C2; �12); (A.4.8)

�9 = �aqC�2�(C1; C2)

L(C1; C2; �12); (A.4.9)

6.5 Expressions of ��s for Tunali and Baslevent�s (2006) Problem

Let �12 = �1�2�12, �11 = �21 and �22 = �22. When (u1; u2) � SBVN (�12), (�1u1; �1u1 +

�2u2) � BVN (0; 0; �11; a2; e) where a =p�11 + �22 + 2�12 and e = �11 + �12. Let b =

�22 + �12. Denote the probability of LFP = j by Pj and the standard bivariate normal

cumulative distribution function with upper thresholds s1, s2 and correlation coe¢ cient r by

�(s1; s2; r). Assume that state probabilities are given as:

P0 = �(C01; C02;C03); (A.5.1)

P1 = �(C11; C12;C13); (A.5.2)

P2 = �(C21; C22;C23); (A.5.3)

P3 = 1� P0: (A.5.4)

Unemployment probability follows from the adopted de�nition. Considering the categor-

ical variable LFP in Equation (20), observe that

P0 = P (y�1 < 0; y�1 + y

�2 < 0) = P

��1u1 < ��

01z; �1u1 + �2u2 < �

��01 + �

02

�z�

= P

0@u1 < ��01z�1

;�1u1 + �2u2

a<��01 + �

02

�z

a

1A : (A.5.5)


P1 = P (y�1 > 0; y�2 < 0) = P

��1u1 > ��

01z; �2u2 < ��

02z�

= P

u1 >

��01z�1

; u2 <��02z�2

!= P

u1 <

�01z

�1; u2 <

��02z�2

!: (A.5.6)

P2 = P (y�2 > 0; y�1 + y

�2 > 0) = P

��2u2 > ��

02z; �1u1 + �2u2 > �

��01 + �

02

�z�

= P

0@u2 > ��02z�2

;�1u1 + �2u2

a>��01 + �

02

�z

a

1A= P

0@u2 < �02z

�2;�1u1 + �2u2

a<

��01 + �

02

�z

a

1A : (A.5.7)

In Equations (A.5.5) through (A.5.7), we rescaled the variables to have unit variances
by dividing them by their standard deviations. Observe that for LFP = 0, the correlation will be

Cov(�1u1;�1u1+�2u2)pV ar(�1u1)

pV ar(�1u1+�2u2)

= ea�1; for LFP = 1, it will be � Cov(�1u1;�2u2)p

V ar(�1u1)pV ar(�2u2)

= � �12�1�2

;

and for LFP = 2, it will be Cov(�2u2;�1u1+�2u2)pV ar(�2u2)

pV ar(�1u1+�2u2)

= ba�2. Hence, we can write

C01 =��01z�1

; C02 =��01 + �

02

�z

a; C03 =

e

a�1; (A.5.8)

C11 =�01z

�1; C12 =

��02z�2

; C13 = ��12�1�2

; (A.5.9)

C21 =�02z

�2; C22 =

��01 + �

02

�z

a; C23 =

b

a�2: (A.5.10)

Then, in Appendix 6:4, by changing all C1�s to C21,all C2�s to C22, and all �12�s to C23,

we obtain the expressions for ��s in context of Tunali and Baslevent�s (2006) problem.


7 References

Bhattacharya, R. N., Ghosh, J. K. (1978) "On the Validity of Formal Edgeworth expansion."

The Annals of Statistics 6 (2), 434-451.

Bhattacharya, R. N., Rao, R. (1976) Normal Approximation and Asymptotic Expansions.

John Wiley & Sons, New York, NY.

Bourguignon, F., Fournier, M., Gurgand, M. (2007) "Selection Bias Corrections Based on the

Multinomial Logit Model: Monte-Carlo Comparisons." Journal of Economic Surveys

21 (1), 174-205.

Chambers, J. M. (1967) "On Methods of Asymptotic Approximations for Multivariate

Distributions." Biometrika 54 (3/4), 367-383.

Cramer, H. (1937) Random Variables and Probability Distributions. Cambridge Tracts,

Cambridge University Press, Cambridge, UK.

Dahl, G. B. (2002) "Mobility and the Returns to Education: Testing a Roy Model with

Multiple Markets." Econometrica 70 (6), 2367-2420.

David, S. T., Kendall, M. G., Stuart, A. (1951) "Some Questions of Distribution in the

Theory of Rank Correlation." Biometrika 38, 131-140.

Davis, P. J., Polonsky, I. (1964) "Numerical Interpolation, Differentiation and Integration,"
in M. Abramowitz and I. A. Stegun (eds.). Ch. 25 in Handbook of Mathematical
Functions with Formulas, Graphs and Mathematical Tables, National Bureau of
Standards, Applied Mathematics Series 55, U. S. Government Printing Office,
Washington, DC.

Dubin, J. A., McFadden, D. L. (1984) "An Econometric Analysis of Residential Electric

Appliance Holdings and Consumption." Econometrica 52 (2), 345-362.

Edgeworth, F. Y. (1905) "The law of error" Transactions of the Cambridge Philosophical

Society 20, 36-65 & 113-141.


Elderton, E. P. (1906) "Notes and Bibliography." Biometrika 5, 206-210.

Eriksson, A., Forsberg, L., Ghysels, E. (2004) "Approximating the Probability Distribution
of Functions of Random Variables: A New Approach." Scientific Series, Cirano, Montreal.

Goldberger, A. S. (1991) A Course in Econometrics. Harvard University Press, Cambridge,
MA.

Heckman, J. J. (1976) "The Common Structure of Statistical Models of Truncation, Sample
Selection and Limited Dependent Variables and a Simpler Estimator for Such Models."
Annals of Economic and Social Measurement 5 (4), 475-492.

Heckman, J. J. (1979) "Sample Selection Bias as a Specification Error." Econometrica
47 (1), 153-161.

Henning, C. H. C. A., Henningsen, A. (2007) "Modeling Farm Households' Price Responses
in the Presence of Transaction Costs and Heterogeneity in Labor Markets." American
Journal of Agricultural Economics 89 (3), 665-681.

Keane, M. P. (1992) "A Note on Identification in the Multinomial Probit Model." Journal
of Business and Economic Statistics 10 (2), 193-200.

Kolassa, J. E. (1994) Series Approximation Methods in Statistics. Springer-Verlag, New
York, NY.

Kolassa, J. E, McCullagh, P. (1990) "Edgeworth series for lattice distributions." The Annals

of Statistics 18 (2), 981-985.

Lahiri, K., Song, J. G. (1999) "Testing for Normality in a Probit Model with Double

Selection." Economics Letters 65 (1), 33-39.

Lee, L. F. (1976) "Estimation of Limited Dependent Variable Models by Two Stage Methods."

Ph.D. Thesis, Department of Economics, University Of Rochester.

Lee, L. F. (1979) "Identification and Estimation in Binary Choice Models with Limited
(Censored) Dependent Variables." Econometrica 47 (4), 977-996.


Lee, L. F. (1982) "Some Approaches to the Correction Of Selectivity Bias." Review of

Economic Studies 49 (3), 355-372.

Lee, L. F. (1983) "Generalized Econometric Models with Selectivity." Econometrica 51 (2),

507-512.

Lee, L. F. (1984) "Tests for the Bivariate Normal Distribution in Econometric Models with

Selectivity." Econometrica 52 (4), 843-863.

Lillard, L. A., Panis, C. W. A. (2003) aML multilevel multiprocess statistical software,

Version 2.0. EconWare, Los Angeles, California.

McFadden, D. L. (1973) Conditional Logit Analysis of Qualitative Choice Behavior in

Frontiers in Econometrics in P. Zarembka (eds.). Academic Press, New York, NY.

Magnac, T. (1991) "Segmented or Competitive Labor Markets." Econometrica 59 (1),

165-187.

Mardia, K. V. (1970a) Families of Bivariate Distributions in A. Stuart (eds.). Hafner
Publishing Company, Darien, CONN.

Mardia, K. V. (1970b) "Measures of Multivariate Skewness and Kurtosis with Applications."
Biometrika 57 (3), 519-530.

Miller, J. (2005) History of Mathematics Pages, http://members.aol.com/jeff570/e.html

Olsen, R. (1980) "A Least Squares Correction for Selectivity Bias." Econometrica 48 (7),
1815-1820.

Pretorius, S. J. (1930) "Skew Bivariate Frequency Surfaces, Examined in the Light of
Numerical Illustrations." Biometrika 22 (1/2), 109-223.

Rosenbaum, S. (1961) "Moments of the Truncated Bivariate Normal Distribution." Journal

of the Royal Statistical Society B2, 405-408.

Rothenberg, T. J. (1983) "Asymptotic Properties of Some Estimators in Structural Models,"
in S. Karlin, T. Amemiya and L. A. Goodman (eds.). Studies in Econometrics, Time
Series, and Multivariate Statistics. In Honor of Theodore W. Anderson. Academic Press,
New York, NY.

Ruud, P. A. (1983) "Sufficient Conditions for the Consistency of Maximum Likelihood
Estimation Despite Misspecification of Distribution in Multinomial Discrete Choice
Models." Econometrica 51 (1), 225-228.

Sargan, J. D. (1984) Wages and Prices in the UK, Econometrics and Quantitative Economics
in D. Hendry and K. Wallis (eds.). Blackwell, Oxford, UK.

Schmertmann, C. P. (1994) "Selectivity Bias Correction Methods in Polychotomous Sample
Selection Models." Journal of Econometrics 60 (1/2), 101-132.

Stoker, T. M. (1986) "Consistent Estimation of Scaled Coefficients." Econometrica 54 (6),
1461-1481.

Stuart, A., Ord, J. K. (1994) Kendall's Advanced Theory of Statistics, Volume 1: Distribution
Theory. 6th edition, Edward Arnold, London, UK.

Tunali, I. (1986) "A General Structure for Models of Double-Selection and an Application of

a Joint Migration/Earnings Process with Re-migration." Research in Labor Economics

8B, 235-282.

Tunali, I. , Baslevent, C. (2006) "Female Labor Supply in Turkey," in S. Altug and A.

Filiztekin (eds.). Ch.4 in The Turkish Economy: The Real Economy, Corporate

Governance and Reform and Stabilization Policy, Routledge, London, UK.

