Self-selection: The Roy model Heidi L. Williams MIT 14.662 Spring 2015 Outline (1) Preliminaries: Overview of 14.662, Part II (2) A model of self-selection: The Roy model (Roy, 1951) (3) Application: Immigration (Chiswick, 1978; Borjas, 1985, 1987; Abramitzky, Boustan and Eriksson, 2012, 2014) (4) Application: Health care (Chandra and Staiger, 2007) (5) Application: Redistribution (Abramitzky, 2009) 1 Preliminaries: Overview of 14.662, Part II In their 2000 Handbook of Income Distribution review, Neal and Rosen (2000) discuss empirical regularities that have motivated theories of the distribution of labor earnings: for example, earnings distributions tend to be skewed to the right, and mean earnings tend to differ greatly across groups defined by occupation, education, experience and other observed traits (such as gender and race). A variety of models have been proposed as frameworks to explain these types of facts. For example, several theories of earnings distributions were covered in 14.661 and Part I of 14.662, such as human capital models and Rosen-style superstar models. We are going to start off Part II of 14.662 by covering four additional classes of theoretical models with implications for earnings distributions: the Roy model, the compensating differen- tials model, discrimination models, and models of rent-sharing. For each class of models, we will discuss recent empirical papers applying these models to questions in labor economics as well as applied microeconomics more generally. From there we will cover three closely related topics - management practices, intergenerational mobility and early life determinants of long-run outcomes - which speak to other empirically important determinants of the distribution of labor earnings. 1 1 Almond and Currie’s 2011 Handbook of Labor Economics chapter (Almond and Currie, 2011) motivates the increase in economics research on the latter two of these topics over the past few decades by saying: “Child and family characteristics measured at school entry do as much to explain future outcomes as factors that labor economists have more traditionally focused on, such as years of education.” 1
where i0 and i1 are jointly normally distributed, and σ0,1 denotes
cov( i0, i1): ( ( )) ( i i 0 1 ) ∼ N ( 0 ) , σ0
2 σ0,1 (2)0 σ2σ0,1 1
From here forward we drop the i subscripts; they were useful above
just to clarify that these distributions describe agent i’s wages
if she were in each of country 0 and country 1, but in practice we
of course only observe one of those two wages. Borjas assumes that
μ1 also gives the earnings of the average native worker in the US.
Borjas notes that we can think of these expressions as decomposing
earnings into the part explained by observable characteristics such
as age and completed education (μ0 and μ1) and a part due to
unobserved characteristics ( 0
and 1). Let ρ0,1 denote the correlation coefficient of 0 and 1,
which represents the correlation of
productive ability in the home country with productive ability in
the US:
cov( 0, 1) ρ0,1 = (3)
σ0σ1 σ0,1
= (4) σ0σ1
Let C denote the cost of moving. Borjas defines C = πw0 so that
moving costs are expressed relative to the home country wage;
expressing moving costs like this allows them to be neatly included
in the ln(wo + C) expression below. An individual’s decision of
whether to migrate to the US is then determined by the sign of the
following index function I:4
I = ln
w1 (5)
= ln(w1) − ln(w0 + πw0) (7)
= ln(w1) − ln(w0(1 + π)) (8)
≈ (μ1 − μ0 − π) + ( 1 − 0) (10)
Define v ≡ 1 − 0. Since migration occurs if I > 0, we can write
the migration rate P as:
P = Pr[ 1 − 0 > −(μ1 − μ0 − π)] (11)
= Pr[v > −(μ1 − μ0 − π)] (12)
= Pr[v > (μ0 − μ1 + π)] (13)
μ0−μ1+πDefine z = , and let Φ denote the CDF of the standard normal
distribution. Noteσv
that because v = 1 − 0, and because 1 and 0 are normally
distributed with mean zero, we vknow that v is normally distributed
with mean zero, and that σv
follows a standard normal
4The approximation here is to approximate ln(1 + π) as ≈ π, based
on the first-order Taylor approximation of 1 1f(x) = ln(1 + x)
around 0: f(a) + f !(a) (x − a)|a=0 = ln(1 + 0) + (x − 0) =
x.
1! 1+0 1!
P = Pr v σv
μ0 − μ1 + π = 1 − Φ (16)
σv
= 1 − Φ(z) (17)
For higher values of z, P is lower - implying migration is less
likely. The migration rate P is increasing in mean US wages ( ∂P
> 0), decreasing in mean home country wages ( ∂P < 0),∂μ1
∂μ0
and decreasing in moving costs (∂P < 0). Borjas assumes that P
< 1, so that at least part of ∂π the population in the country
of origin is better off not migrating. He also assumes μ1 ≈
μ0.
3.3.2 Useful facts
Deriving Borjas’s expressions for self-selection requires applying
some properties of the normal distribution and a version of the law
of iterated expectations; in case anyone is rusty:
Property 1. If a vector of random variables X ∼ N(μ,Σ), then AX + b
∼ N(Aμ+ b, AΣA ).5
( ) (( ) ( σ2 )) ( )
X μx x σX,Y σy 6Property 2. If ∼ N μy , σ2 , then (Y |X = x) ∼ N μy
+ ρX,Y ( )(x− μx), σy
2(1 − ρ2 ) .Y σX,Y y σx X,Y
Property 3. For any non-stochastic function f(·) and X = f(W ), E(Y
|X) = E(E(Y |W )|X).7
Property 4. Let φ(z) and and Φ(z) denote the PDF and CDF of the
standard normal distribution, ( ) v v φ(z)respectively. If ∼ N(0,
1), then E | v > z = ; we refer to this expression as σv σv σv
1−Φ(z)
the Inverse Mills Ratio. Because φ(z) = φ(−z) and 1 − Φ(z) = Φ(−z),
we can also write φ(−z) 8the Inverse Mills Ratio as λ(z) = Φ(−z)
.
3.3.3 Analyzing self-selection
In order to analyze self-selection, we want to derive expressions
that let us compare E(ln w0|I > 0) and E(ln w1|I > 0) – that
is, for individuals who immigrate we’d like to compare average log
earnings in country 0 and average log earnings in country 1. Let’s
start with E(ln wo|I > 0), which can be re-written as
follows:
5See, e.g., page 198 of Goldberger (1991). You can prove this using
the moment generating function. 6See, e.g., page 175-177 of Casella
and Berger (2001). 7This is a version of the law of iterated
expectations. See, e.g., page 31 of Wooldridge (2010). 8See, e.g.,
page 672 of Wooldridge (2010).
11
v E(ln w0|I > 0) = E μ0 + 0| > z (18)
σv
σv
v0 = μ0 + σ0E | > z (20)
σ σ0 v ( ) The following steps are useful in deriving a simplified
version of the E 0 | v > z term:σ0 σv
1. Because 0 and 1 are jointly normally distributed, applying
Property 1 you can show that ( ( )) σ2 −σ2
0 0 σ0,1 0 9 0 and v ≡ 1− 0 are jointly normally distributed: ( ) ∼
N ( 0 ) , .
1− 0 0 σ2σ0,1−σ2 0+σ2−2σ0,10 1
2. Given that 0 and v ≡ 1 − 0 are jointly normally distributed,
applying Property 2 you can σ0,v σ0,vshow that E ( 0|v) = ρ0,v( σ0
)v, where ρ0,v = . Simplifying implies E ( 0|v) =
σ2 v.σv σ0σv v
0 03. Applying Property 3, you can show that E( | v > z) = E(E(
| v )| v > z).10 σ0 σv σ0 σv σv
4. Finally, it will be useful to have a simplified expression for
E( 0 | v ). Let s = v ∼ N(0, 1).σ0 σv σv σ0,s σ0,vApplying Property
2, you can show that E ( 0|s) = σ2 s. Substituting ρ0,v = gives:
σ0σvs
v 10 E( | ) = E( 0|s) (21)
σ0 σv σ0 1 σ0,s
= 2 s (22)
σ σ0 s
σ 1 σ0 v σ v
= 0,v (24)
1 0 n
You can show this by letting A = −1 1 . 10You can show this by
letting X = 1( v > z) be a function of W = v .
σv σv
ε ε ε
Returning to our derivation of E(lnwo|I > 0), we now have:
ε v E(lnw0| 0
I > 0) = µ0 + σ0E
( E
1− Φ(z)
The last equality follows from Property 4. We can derive a similar
expression for E(lnw1|I > 0):
v E(lnw1|I > 0) = E
( µ1 + ε1| > z
) (32)
It will be useful to re-write these expressions for the expected
wages of migrants in each country
(E(lnw0|I > 0), E(lnw1|I > 0)). Using that σ0,v = cov(ε0, v)
= E[ε0 · (ε 2 1 − ε0)] = σ0,1 − σ0:
φ(z) E(lnw0|I > 0) = µ0 + σ0ρ0,v (33)
1− Φ(z)
σ
( − Φ(z)
σv( 1− Φ(z)
σ0σ1 σ0,1 σ
)( ( 1− Φ(z)
v σ1 1− Φ(z)
Analogously for E(lnw1|I > 0), substituting σ1,v = σ2 1 − σ0,1,
we have:
φ(z) E(lnw1|I > 0) = µ1 + σ1ρ1,v
( 1− Φ(z)
σv σ0
)( 1− Φ(z)
) Define µj + Qj as the expected wage of migrants in country j. In
order to understand the
position of migrants in the distribution of workers in each country
(that is, whether migrants
are positively or negatively selected), we want to know the signs
of Q0 and Q1:
13
σ0σ1 σ0 φ(z)
Q0 ≡ E( 0| I > 0) = ρ0,1 − (41) σv σ1 1 − Φ(z) σ0σ1 σ1
φ(z)
Q1 ≡ E( 1| I > 0) = − ρ0,1 (42) σv σ0 1 − Φ(z)
3.3.4 Four cases of immigrant selection
1. Positive selection: Q0 > 0 and Q1 > 0. Migrants are
positively selected relative to σ0 11either country’s income
distribution ⇐⇒ ρ0,1 > σ1 . This requires a high
correlation
between the value of skills in countries 0 and 1, and that income
is more dispersed in the US than in country 0. Borjas’s example is
high-skilled migration from Western Europe.
2. Negative selection: Q0 < 0 and Q1 < 0. Migrants are
negatively selected relative to σ1 12either country’s income
distribution ⇐⇒ ρ0,1 > σ0 . This requires a high
correlation
between the value of skills in countries 0 and 1, and that income
is less dispersed in the US than in country 0. Borjas’s example is
the US social safety net drawing low-skilled immigrants from
countries with less of a social safety net.
3. Refugee selection: Q0 < 0 and Q1 > 0. Migrants are
negatively selected relative to the home country income
distribution, but fall in the top of the US income
distribution
σ1⇐⇒ ρ0,1 < min(σ0 , ). This requires a low correlation between
the value of skills in σ1 σ0
country 0 and in country 1. Borjas argues this may be the case for
countries that have recently experienced a Communist
takeover.
4. No fourth case: Q0 > 0 and Q1 < 0. Mathematically, this
case is ruled out because it would require ρ0,1 > 1.
3.3.5 A note on the joint normality assumption
As an econometrician, what you observe is individuals’ migration
decisions (whether they moved to the US or stayed in their home
country), data on the US wages of migrants E(ln w1| I > 0), and
data on the home country wages of non-migrants E(ln w0| I > 0).
Given this data, we would like to know the joint distribution of ln
w0 and ln w1 so that we can make statements about where migrants
fall in the home and US country income distributions. Heckman and
Honore (1990) show that the joint normality assumption in the
original Roy model allows you to identify the joint distribution of
ln w0 and ln w1 in a single cross section of data, but that if you
relax this joint normality assumption the Roy model is no longer
identified. French and Taber (2011) give some intuition for these
results.
11Note that this implies σ σ 1
0 > 1, since ρ0,1 ≤ 1.
12Note that this implies σ σ 0
1 > 1, since ρ0,1 ≤ 1.
14
ε
( )( ) ε
3.4 Testing the Roy model: Abramitzky, Boustan and Eriksson
(2012)
Research investigating patterns of immigration in recent decades
has generally provided mixed evidence for the Roy model. For
example, Chiquiar and Hanson (2005) document evidence against
negative selection of Mexican migrants (as would be predicted by
the Roy model). I’m going to focus on discussing a (second) recent
paper by Ran Abramitsky, Leah Boustan, and Katherine Erikkson that
investigated the predictions of the Roy model during the Age of
Mass Migration (1850-1913). Because the US maintained essentially
open borders during this period, this setting is a natural one in
which to test for patterns of self-selection free of the legal
factors that have governed migration in more recent years.
Abramitzky, Boustan and Eriksson (2012) investigate whether
migrants were positively or negatively selected from the European
population, focusing on Norwegian migrants to the US. During the
age of mass migration, Norway had a more unequal income
distribution than did the US (as illustrated in Figure 1, and
supported in the paper by other past work):
Courtesy of Ran Abramitzky, Leah Platt Boustan, Katherine Eriksson,
and the American Economic Association. Used with permission.
Given this pattern that (at the time) income in Norway was more
dispersed than was income in the US, the Roy model predicts that
Norwegian migrants will be negatively selected. A key contribution
of this paper is collecting the data necessary to test this
prediction. They combine two fully digitized versions of the
Norwegian censuses (1865 and 1900) with a dataset of all
Norwegian-born men in the US in 1900 using the
now-publicly-available census records from 1900 (the latter is
drawn from the same data as they used in Abramitzky, Boustan and
Eriksson (2014)). They match migrants and stayers in the US and
Norwegian censuses of 1900 to birth families in 1865 based on name
and age. The earnings-related outcome they observe is occupation,
again as in Abramitzky, Boustan and Eriksson (2014).
Empirically, they document mixed evidence on the selection of
rural-born Norwegian men, but strong evidence of negative selection
among urban-born Norwegian men. That is, in the
15
urban sample men with poorer economic prospects in Norway were more
likely to migrate to the US. They provide two pieces of direct
evidence on this negative selection. First, they compare the
occupational distributions of migrants and stayers. Figure 3 (Panel
B, focusing on the urban sample) arrays occupations from
lowest-paid to highest-paid according to the average US earnings in
that occupation. Migrants are more likely to hold low-paying jobs,
while men remaining in Norway exhibit an occupational distribution
that is skewed towards higher-paying jobs. Figure 3 Panel A (not
shown, focusing on the rural sample) suggests that men born in
rural areas are employed in similar jobs in both countries.
Courtesy of Ran Abramitzky, Leah Platt Boustan, Katherine Eriksson,
and the American Economic Association. Used with permission.
16
Second, they compare fathers of migrants and non-migrants. Table 4
compares the occupa tions, asset holdings, and property tax values
of the heads of migrant and nonmigrant households in 1865. For both
the urban and rural samples, this analysis provides evidence of
negative selec tion. For example, in the urban sample heads of
migrant households are 5.8 percentage points less likely to hold
assets than are heads of non migrant households.
Courtesy of Ran Abramitzky, Leah Platt Boustan, Katherine Eriksson,
and the American Economic Association. Used with permission.
17
In the paper, they also infer the direction of migrant selection
indirectly by comparing an OLS estimate of the return to migration
(comparing the earnings of migrants with the earnings of stayers)
with a family fixed effects estimate of the return to migration
(comparing the earnings of migrants with earnings of their stayer
brothers). If the OLS estimate measures the return to migration
plus a selection term, and if migrants are negatively selected,
then the OLS estimate will be smaller than the family fixed effect
estimate. While the family fixed effect estimate is clearly not
free from selection concerns, this analysis can inform the
direction of across-household selection into migration. As shown in
Table 3, this indirect method also provides evidence in favor of
negative selection.
Courtesy of Ran Abramitzky, Leah Platt Boustan, Katherine Eriksson,
and the American Economic Association. Used with permission.
Of course, even within households brothers can differ in unmeasured
personal attributes, so it is difficult to interpret even the
family fixed effect estimates as estimates of the returns to
migration. In Appendix A of their paper, Abramitzky, Boustan and
Eriksson (2012) document an analysis estimating the returns to
migration using the gender composition of a man’s siblings and his
place in the household birth order as instruments for migration.
The idea is that both of these factors influence a man’s
expectation of inheriting farmland in Norway and therefore his
probability of migrating to the US; while the exclusion
restrictions for these variables need not obviously be satisfied,
the authors provide some evidence that the necessary assumptions
appear reasonable in their context.
In terms of take-aways from this paper, there are three things that
I would stress. First, this is a very recent paper that was
providing new, interesting evidence testing the predictions of the
Roy model. This is a classic question, but that doesn’t mean that
there isn’t room for more good papers on it! Second, this paper as
well as Abramitzky, Boustan and Eriksson (2014) highlight the value
of looking for the ‘right’ empirical setting (testing for
self-selection in migration during
18
a time period with open borders) and of constructing the ‘right’
data (the key tests of this paper are just tabulations of the data,
but that’s because the authors did an enormous amount of work in
order to construct data that would allow for transparent empirical
tests). Finally, this paper as well as Abramitzky, Boustan and
Eriksson (2014) are great examples of how economic history can
overlap nicely with core questions in labor economics. There is a
lot of exciting work being done in economic history that overlaps
with labor and other applied micro fields, and I would encourage
you to keep historical settings in mind as you are thinking about
settings in which to investigate research questions of contemporary
interest (in addition to what I think of as the more ‘traditional’
focus of economic history, which is shedding light on the long-run
impacts of economic phenomena).
4 Application: Health care
Despite its origin in labor economics, the Roy model has been
applied across a wide range of fields in economics. As an example,
I’m going to talk in detail about an application of the Roy model
from the field of health economics - Chandra and Staiger (2007).
This is one of the most important papers in health economics in
recent years, and one that has really changed how people think
about a variety of issues.
4.1 Brief background on geographic variation in medical
expenditures
The earliest work I’m aware of that documented geographic variation
in medicine is Glover (1938). Using a variety of data sources from
England and Wales, Glover documented significant variation in
small-area tonsillectomy rates. Despite looking for correlations
with “any factor which might have some ætiological bearing on
chronic tonsillitis and adenoidal growths - such factors for
example as overcrowding and unemployment...not the slightest
suggestion of correla tion has been obtained.” Maybe not the
regressions we would estimate today, but the start of a
puzzle!
In his recent Handbook of Health Economics review, Skinner (2012)
provides an overview of the economics and medical literatures that
have subsequently documented variation in the use of medical care
across “observably similar” patients. Perhaps the key reference in
this area is the Dartmouth Atlas, which has used the Medicare
claims data to document tremendous variation in expenditures across
“hospital referral regions” (markets defined to include at least
one hospital offering some key services such as cardiovascular
treatments). While other market definitions and other datasets have
been used (such as private claims data documenting the use of
medical services in the under-65 non-Medicare population), many of
the key ‘facts’ that have shaped peoples’ thinking in this area
have come from the Dartmouth Atlas. The position taken by the
Skinner (2012) and Chandra and Staiger (2007) is that explanations
such as differences in income, patient preferences, and underlying
health status don’t explain these variations. Other work has shown
that adjusting for prices doesn’t hugely change these factors
either - although price adjustments do matter in some places, like
the Bronx and Manhattan (Gottlieb et al.,
19
2010). Given that most research has examined the importance of one
factor in isolation, we don’t currently have a good sense of what
share of the variation could be explained by these factors when
taken together.
The consensus view from the Dartmouth Atlas is that this geographic
variation in health spending is not associated with improved
satisfaction, outcomes, or survival. This conclusion should be
digested together with recent research that has suggested that - at
least in some contexts - more spending is associated with better
health outcomes (see, e.g., Cutler (2005) and Joe Doyle’s line of
research). That said, for now let’s take as a first fact that
geographic variation in medical expenditures is not associated with
improved health outcomes. This first fact is surprising in light of
the second fact that many (not all) new medical technologies are
shown in randomized clinical trials to be associated with
improvements in health outcomes (if you are interested, ask me more
about how this interacts with product entry regulation that differs
across pharmaceuticals vs. medical procedures and medical
devices).
Chandra and Staiger (2007) discuss how these two facts are often
interpreted as evidence of diminishing returns. Randomized trials
tend to be conducted on groups of patients most likely to benefit
from treatment, whereas the lack of a cross-sectional relationship
between spending and outcomes could be explained by a “flat of the
curve” argument where physicians perform the intervention until the
marginal return is zero. Chandra and Staiger outline three problems
with this argument:
1. No explanation of why we observe geographic variation in the
first place.
2. Still predicts a positive relationship between medical spending
and patient outcomes un less all areas are in the range of zero or
negative marginal benefits; this has never been documented in the
literature.
3. Predicts that the marginal benefit from more intensive patient
treatment should be lower in areas that treat more aggressively.
However, the available evidence from US-Canada comparisons suggest
the opposite: even though the US treats heart attacks more
intensively than does Canada, the marginal benefit from intensive
heart attack treatments appears to be larger in the US.
Chandra and Staiger present a Roy model with productivity
spillovers that can reconcile these facts. Their paper is an
excellent illustration of how a set of facts can motivate a
(relatively simple) theoretical framework producing testable
implications that can then be taken back to the data.
4.2 Preliminaries: What motivates a model with productivity
spillovers?
In the Chandra-Staiger model, patients can receive either of two
alternative treatments: nonin tensive management (medical
management, denoted by subscript 1), and intensive intervention
(surgery, denoted by subscript 2). Physicians choose the treatment
option for each patient that maximizes utility on the basis of the
expected survival rate (Survival1, Survival2) and
20