

Optimal Migration: A World Perspective

by Jess Benhabib and Boyan Jovanovic

August 25, 2010

ABSTRACT

We ask what level of migration would maximize world welfare. Welfare is assumed to be a weighted average of the utilities of the world’s various citizens, but the weights are also country specific. Using a calibrated one-sector model we find that unless the weights are heavily biased towards the natives of rich countries, the extent of migration that would be optimal far exceeds the levels observed today. The claim remains true in a two-sector extension of the model. All versions of the model assume that migration is the only redistributive tool.

Keywords: World welfare optimum, inequality, migration.

JEL Classification number: O15

Author information:

Jess Benhabib and Boyan Jovanovic

Department of Economics
New York University
19 W. 4th St.
New York, N.Y. 10012
USA

Acknowledgement: We thank Jonathan Morduch for comments, the NSF and the Kauffman Foundation for support, and Matthias Kredler for doing all the computations and producing the associated plots.


1 Introduction

What is the optimal distribution of personal incomes in the world, and how best can it be attained? Economists have studied the question of domestic redistribution using taxes and subsidies as the instruments.

When dealing with the world as a whole, however, political constraints arise that limit the usefulness of taxes and transfers. The limits and difficulties of foreign aid or international redistribution have been studied by Easterly (2006): foreign aid funds are often wasted or misdirected.

If public foreign aid has been ineffective in reducing international inequality, so have private capital flows failed to equalize returns and factor prices: Lucas (1990) argued that inequality originates in human-capital differences and that physical capital flows can do little to eliminate inequality.

A neglected mechanism for reducing inequality, the flip side of capital flows, is international migration. Using a one-sector model we find that the optimal use of migration alone would seemingly raise welfare to levels far above those achieved at present. The mechanism that achieves this is the spillover of knowledge flowing to immigrants when they work alongside highly skilled natives. Starting from the differences in human capital currently in place, optimal migration policy would involve moving far more people from poor to rich countries than the latter admit at present. We then confirm this conclusion, with and without spillovers, in a two-sector extension of the model.

In all versions of the model there is just one consumption good, and therefore only factor mobility (as opposed to trade in goods) can achieve equalization of prices per unit of skill. Moreover, and this is specific to our model, spillovers by assumption occur among people in the same geographical area, so people must locate together in order to share them.

In deriving the optimal migration policy we ignore political constraints, except implicitly those constraints that prevent foreign aid from being the main redistributive tool (for an analysis of political constraints see Benhabib (1996)). Redistribution involves winners and losers, but the losers, typically in rich countries, may block policies based on egalitarian social weights worldwide, and restrict access to their labor markets by foreigners. Our aim here is to derive the egalitarian or near-egalitarian ideal, as the reasons why the world falls short of the ideal are political. It is unlikely that the ideal can be collectively achieved in the near future, even as the poor of the world increasingly press for more open access to the labor markets of the rich countries.

In related work, Klein and Ventura (2007) study the steady state of a dynamic model with two locations in which labor can move from one location to another at a cost and in which capital is costlessly mobile. The authors study the optimal allocation of labor over the two regions. In spite of this similarity, our model differs in several ways from theirs. On the one hand, our model is static; on the other it endogenizes TFP along the lines suggested by Lucas (1988).

The model and the planner’s problem are presented in Section 2. Calibration and simulation are in Section 3 and several extensions are in Section 4. Section 5 concludes, and some proofs are in the Appendix.

2 The Model

We first introduce the production technology and its implications for segregated production. We then introduce the rest of the model and the welfare-maximizing planner’s problem.

2.1 Production technology

Immigrants in our model affect the well-being of the residents of the host country through an external effect. The effect works through human capital per person, $\bar h$, as in Lucas (1990). The output of a country is

$$Y = G(\bar h)\, H, \tag{1}$$

where $H$ is the total human capital in the country. The private marginal product of human capital is $G(\bar h)$.¹

Constant returns and decentralizability.–The production function (1) obeys constant returns to scale in the sense that doubling the number of residents while leaving the distribution of individual human capital $h$ unchanged leaves $\bar h$ unaffected, but doubles $H$ and, hence, $Y$. This allows for a competitive situation in which zero-profit firms (of indeterminate size) hire labor and pay a wage of $G(\bar h)$ per efficiency unit.

Efficiency vs. distribution.–The model has a tension between considerations of efficiency and distribution. Efficiency requires that production be segregated geographically. This is the content of Proposition 1. Let $M(h)$ be the world’s distribution of human capital, and assume that

$$G(\bar h) = \bar h^{\alpha}, \tag{2}$$

where $\alpha > 0$.

Proposition 1 World output is maximized when there is complete segregation by $h$, i.e.,

$$Y \le \int h^{1+\alpha}\, dM(h).$$

¹ Evidence that $G' > 0$ is in Clark (1987), who invokes national culture somewhat in line with the ‘social capital’ interpretation of Coleman (1988), and in Rauch (1993), who attributes it to human-capital spillovers at the regional level.

Proof. Suppose that there is a location in the world where people are heterogeneous in $h$. Let the distribution at that location have measure $\mu(h)$, with mean $\bar h$. Let the total output at that location be

$$y = G(\bar h) \left( \int h\, d\mu \right).$$

Then

$$\frac{1}{\int d\mu}\, y = \bar h\, G(\bar h) = \bar h^{1+\alpha} = \left( \frac{\int h\, d\mu}{\int d\mu} \right)^{1+\alpha} \le \frac{1}{\int d\mu} \int h^{1+\alpha}\, d\mu,$$

where the inequality follows because $d\mu / \int d\mu$ is a measure adding up to unity, and $h^{1+\alpha}$ is a convex function. Cancelling the multiplicative constant leaves us with

$$y \le \int h^{1+\alpha}\, d\mu,$$

and the inequality is strict if the support of $\mu$ has more than one point. Therefore no location can have heterogeneity of $h$.

The foregoing proposition and many of the other results of the paper extend to a world in which there is a perfectly mobile physical capital $K$ and in which, instead of (1), the production function is $Y = G(\bar h) K^{b} H^{1-b}$. The reduced-form production function would take the form of (1) and an efficiency-equity tradeoff would remain: Migration from poor to rich countries would reduce both income inequality and world output.²
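Proposition 1 can be checked numerically. The following is a minimal sketch, assuming a hypothetical log-normal skill sample and the functional form (2); the function names are illustrative and not taken from the paper. It compares output when everyone shares one location with output under complete segregation by skill.

```python
import random

ALPHA = 0.35  # spillover elasticity; the value the calibration section adopts

def output_mixed(skills, alpha=ALPHA):
    """Output when all workers share one location: G(mean h) * H, as in (1)-(2)."""
    total = sum(skills)
    mean = total / len(skills)
    return mean ** alpha * total

def output_segregated(skills, alpha=ALPHA):
    """Output when each worker produces only with identical types: sum of h^(1+alpha)."""
    return sum(h ** (1.0 + alpha) for h in skills)

random.seed(0)
# Hypothetical skill sample (an assumption); any positive numbers would do.
skills = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
loss = 1.0 - output_mixed(skills) / output_segregated(skills)
print(f"share of output lost by mixing: {loss:.3f}")
```

With identical skills the two functions coincide, which is the equality case of the convexity argument in the proof.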

2.2 The planner’s problem

If taxes and subsidies had no disincentive effects and if tax proceeds could be distributed without waste or diversion, the optimal redistribution mechanism would be foreign aid. A world Planner would segregate people by skill, tax the rich and distribute the proceeds to the poor. But if foreign aid is not feasible, the Planner can use migration. This is the problem we shall now analyze.

Analysis.–Let $\mu_A$ be the pre-migration mean skills in country $A$ and let the human capital of $A$’s residents be distributed $h \sim F_A(h)$. Let $\mu_B$ be the mean skills in country $B$ and let the human capital of $B$’s residents be distributed $h \sim F_B(h)$, with density function $f_B(h)$. Let

$$x = \phi(h)$$

be the probability that a type-$h$ resident of $B$ will be allowed to emigrate to $A$. That is, $\phi : \mathbb{R} \to [0, 1]$.

² The tradeoff arises because spillovers are local in nature, since $\bar h$ is region specific. Firms in the same location share the same production function determined by the TFP term $G(\bar h)$. Kremer and Maskin (2006) and Eeckhout and Jovanovic (2009) relax the assumption that workers must be in the same location to participate in the same production process; their models do not have external effects in production.

A skill-neutral policy is one in which $\phi$ is a constant, independent of $h$. Policies that are not skill neutral are skill biased. Denote the average post-migration $h$ levels in $A$ and $B$ by $\bar h_A$ and $\bar h_B$ respectively. Let $n$ be the population of $B$ relative to that of $A$. The latter is normalized to unity. Human capital per head in $A$ is

$$\bar h_A = \frac{\mu_A + n \int h\, \phi(h)\, dF_B(h)}{1 + n \int \phi(h)\, dF_B(h)}, \tag{3}$$

and in $B$ it is

$$\bar h_B = \frac{\int h\, [1 - \phi(h)]\, dF_B(h)}{\int [1 - \phi(h)]\, dF_B(h)}. \tag{4}$$
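Equations (3) and (4) are straightforward to evaluate on a sample of skills. The sketch below is illustrative only: the function name, the representation of $\phi$ as a Python callable, and the three-type example are assumptions made here, with sample averages standing in for the integrals against $dF_B$.

```python
def post_migration_means(mu_A, skills_B, phi, n):
    """Sample analogue of equations (3) and (4).

    mu_A     : pre-migration mean skill in A (A's population is normalized to 1)
    skills_B : list of skills drawn from F_B
    phi      : callable giving the migration probability of a type-h resident of B
    n        : population of B relative to A
    """
    N = len(skills_B)
    probs = [phi(h) for h in skills_B]
    share_moving = sum(probs) / N                                   # integral of phi dF_B
    skill_moving = sum(h * p for h, p in zip(skills_B, probs)) / N  # integral of h*phi dF_B
    skill_staying = sum(skills_B) / N - skill_moving                # integral of h*(1-phi) dF_B
    hbar_A = (mu_A + n * skill_moving) / (1.0 + n * share_moving)
    hbar_B = skill_staying / (1.0 - share_moving)  # requires that someone stays in B
    return hbar_A, hbar_B

# Hypothetical numbers: three B-types and a skill-neutral policy admitting half of each.
hbar_A, hbar_B = post_migration_means(10.0, [1.0, 2.0, 3.0], lambda h: 0.5, n=2.0)
print(hbar_A, hbar_B)  # 6.0 2.0
```

In this example the skill-neutral policy leaves average skill in B at its pre-migration mean of 2 while lowering average skill in A from 10 to 6.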

Migration costs and incentive compatibility.–We assume that each individual migrant loses a fraction $t$ of his or her income in the host country. These are costs of assimilating, finding a job and so on, and one would expect them to be proportional to potential income. Migration must be voluntary, which now means that net of migration costs the migrant must earn more in the country of his or her destination than in the country of origin. This requires that for an immigrant with skill level $h$, $G(\bar h_A)(1-t)h \ge G(\bar h_B)h$, or simply that³

$$G(\bar h_A)(1 - t) \ge G(\bar h_B). \tag{5}$$

Practically, this constraint rules out policies that would send so many unskilled people from $B$ to $A$ (see “skimming from the bottom” defined in (8)) that $\bar h_B$ becomes so much higher than $\bar h_A$ as to imply the negation of (5). Clearly, the higher is $t$, the larger the range of policies that (5) rules out.

Social welfare function and the Planner’s problem.–The Planner is a Stackelberg leader. He announces a policy at the outset, and agents then choose their migration decisions and production takes place. Let $\theta$ and $(1 - \theta)$ denote the welfare weights that the Planner assigns to utilities of the residents of $A$ and $B$, respectively. Let $U(c)$ be an agent’s utility function of consumption $c$. Agents simply consume their wages. The Planner chooses a function $\phi(h)$ to maximize

$$\theta \int U\big[G(\bar h_A)\, h\big]\, dF_A(h) + (1-\theta)\, n \int \Big\{ \phi(h)\, U\big(G[\bar h_A]\, h\, [1-t]\big) + [1 - \phi(h)]\, U\big(G[\bar h_B]\, h\big) \Big\}\, dF_B(h) \tag{6}$$

subject to (3), (4), and (5).

³ The implicit assumption is that the immigrant spends $t$ units of time on migrating and adapting to the new environment, leaving him with $1 - t$ units of time for work. We nevertheless assume, for simplicity, that immigrants contribute fully to $\bar h_A$. One could alternatively assume that their contribution to $\bar h_A$ was proportional to $(1-t)$, in which case the RHS of (3) would become

$$\frac{\mu_A + (1-t)\, n \int h\, \phi(h)\, dF_B(h)}{1 + (1-t)\, n \int \phi(h)\, dF_B(h)}.$$

This would slightly lower the costs of an influx of low-$h$ immigrants, slightly lower the benefits of a high-$h$ influx, but otherwise leave the results largely unchanged. Our formulation gets support from Caponi (2006), who finds that immigrants face a significant loss of capacity to translate their abilities into earnings but no loss of capacity to transfer their human capital to their children; we loosely associate the latter with how the human capital of immigrants enters $G(\cdot)$.

The planner therefore chooses the migration policy that maximizes the weighted sum of the utilities of agents, with the weights differing between natives of the host country and others. An alternative would be to have the planner assign weights as a function not of countries but of agents’ characteristics alone. Such a welfare function would then give the same weight to agents with the same skill level, independently of their country of origin. This is a welfare function that one would regard as fair, and with our parametrization it obtains only when $\theta = 1/2$.

Until Section 4.2, we shall take the distribution of skills $F_A$ and $F_B$ as given, and throughout the paper we shall treat as exogenous the parameter $n$ (the number of residents in the poor world relative to that of the rich). This means that a rise in $\alpha$ will have two opposing effects. It will raise the efficiency losses resulting from mixing the skill levels, but it also will widen the welfare losses resulting from differences in $\bar h_A$ and $\bar h_B$. In other words, as $\alpha$ rises, the tension between distribution and efficiency gets stronger.

2.2.1 The optimal policy

The rest of the paper will assume that $\mu_A > \mu_B$, that $h$ has no upper bound in the supports of $F_A$ and $F_B$, and that

$$U(c) = \ln c. \tag{7}$$

In this case, we shall show that the optimal policy is skill dependent and of the “bang-bang” type: Among people of type $h$, either everyone should migrate or no one should do so. Moreover, the set of types is connected in that if type $h'$ is allowed to migrate, then either everyone with $h$ below $h'$ is also allowed to migrate, or everyone above $h'$ is allowed to migrate. Let us first describe these policies.

Skimming from the bottom of $F_B$.–Under this policy there exists a cutoff, $\hat h$, such that everyone with $h < \hat h$ migrates:

$$\phi(h) = \begin{cases} 1 & \text{for } h < \hat h \\ 0 & \text{for } h > \hat h \end{cases} \tag{8}$$

Under this policy (8), there exists a unique $\hat h$ that will equate the average skill in the two countries. Formally, there exists a unique $\hat h < \infty$ such that the RHSs of (3) and (4) are equated:⁴

$$\frac{\mu_A + n \int_0^{\hat h} h\, dF_B(h)}{1 + n F_B(\hat h)} = \frac{\int_{\hat h}^{\infty} h\, dF_B(h)}{1 - F_B(\hat h)}. \tag{9}$$

This policy always helps the non-migrating natives of $B$, and for $\hat h$ below the solution of (9) it helps the migrants too. The planner will use this policy when $\theta$ (the social weight on the rich country) is low.

Skimming from the top of $F_B$.–This we shall refer to simply as a “brain-drain” policy:

$$\phi(h) = \begin{cases} 0 & \text{for } h < \hat h \\ 1 & \text{for } h > \hat h \end{cases} \tag{10}$$

For $\hat h$ large enough, this policy raises $\bar h_A$ and helps the natives of $A$, while it hurts the non-migrating natives of $B$. The planner will use this policy when $\theta$ is high.⁵

The rest of this section will prove that the policy will indeed be bang-bang, taking the form (8) if $\theta$ is low and (10) if $\theta$ is high. The full characterization will emerge in Section 3, where we shall simulate two versions of the model. To ease notation, let $g_i = \ln G(\bar h_i)$ for $i = A, B$.
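The cutoff that equates average skills under the skim-from-the-bottom policy, as in (9), can be located on a finite sample by scanning cutoffs from the bottom. This is a minimal sketch under stated assumptions: the function name and the toy skill list are hypothetical, sample averages replace the integrals, and the scan returns the first sample cutoff at which the gap between the two averages closes rather than an exact root.

```python
from itertools import accumulate

def equalizing_cutoff(mu_A, skills_B, n):
    """First sample cutoff at which policy (8) closes the gap hbar_A - hbar_B.

    Returns (cutoff, hbar_A, hbar_B). With k migrants, the k lowest-skill
    B residents move, so the cutoff is the (k+1)-th smallest skill.
    """
    h = sorted(skills_B)
    N = len(h)
    total = sum(h)
    moved = [0.0] + list(accumulate(h))[:-1]   # moved[k] = sum of the k lowest skills
    for k in range(N):                         # k < N keeps B non-empty
        hbar_A = (mu_A + n * moved[k] / N) / (1.0 + n * k / N)
        hbar_B = (total - moved[k]) / (N - k)
        if hbar_A <= hbar_B:
            return h[k], hbar_A, hbar_B
    raise ValueError("no crossing in this sample; mu_A may be too large")

# Hypothetical example with mu_A = 5 above the B mean of 4.
cutoff, hbar_A, hbar_B = equalizing_cutoff(5.0, [1, 2, 3, 10], n=1.0)
print(cutoff, hbar_A, hbar_B)
```

In the toy example, moving only the lowest type already raises average skill in B above that in A, so the scan stops at the second-lowest skill.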

Lemma 1 When $U(c) = \ln c$, the maximand in (6) reduces to the equivalent

$$W \equiv \theta_A^* g_A + \theta_B^* \left( g_B - \ln[1-t] \right) \tag{11}$$

subject to (3) and (4), where

$$\theta_A^* = \theta + (1-\theta)\, \omega n, \quad \theta_B^* = (1-\theta)\, n\, (1-\omega), \quad \text{and} \quad \omega \equiv \int \phi(h)\, dF_B.$$

Proof. Substituting for $U$ and leaving out terms that do not depend on $\phi$, (6) reads

$$\theta g_A + (1-\theta)\, n \int \big\{ \phi(h) \left( g_A + \ln h + \ln[1-t] \right) + (1 - \phi[h]) \left( g_B + \ln h \right) \big\}\, dF_B$$

$$= \theta g_A + (1-\theta)\, n \int \big\{ \phi(h)\, g_A + (1 - \phi[h]) \left( g_B - \ln[1-t] \right) \big\}\, dF_B + (1-\theta)\, n \int \left( \ln h + \ln[1-t] \right) dF_B.$$

But the last term does not depend on $\phi$ and we are left with (11).

Assume that the density $f_B$ exists for all $h$, and define

$$z(h) = n f_B(h)\, \phi(h)$$

to be the new control variable that satisfies $z(h) \in [0, n f_B(h)]$ for all $h$. In terms of this control variable in (11) we have

$$\bar h_A = \frac{\mu_A + \int h\, z(h)\, dh}{1 + \int z(h)\, dh}, \quad \bar h_B = \frac{n \mu_B - \int h\, z(h)\, dh}{n - \int z(h)\, dh}, \quad \text{and} \quad n\omega \equiv Z = \int z(h)\, dh.$$

⁴ Proof: At $\hat h = 0$, the LHS is larger, whereas as $\hat h \to \infty$, the LHS $\to \frac{\mu_A + n\mu_B}{1+n}$ while the RHS $\to \infty$. Uniqueness is shown by showing that when evaluated at the solution of (9), the derivative of the RHS exceeds that of the LHS, ruling out multiple crossings.

⁵ For both policies at point $h = \hat h$ we know only that $0 \le \phi(h) \le 1$, the Planner being indifferent about whether $\hat h$ should migrate or not.
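Lemma 1 says that, with log utility, the maximand (6) and the reduced objective (11) differ only by terms that do not depend on $\phi$. A quick numerical check of that claim is sketched below; all function names and input values are hypothetical, and sample averages again stand in for the integrals.

```python
import math

def means(mu_A, skills_B, phi, n):
    """Sample analogue of (3) and (4); phi is a list of probabilities."""
    N = len(skills_B)
    omega = sum(phi) / N
    moving = sum(h * p for h, p in zip(skills_B, phi)) / N
    hbar_A = (mu_A + n * moving) / (1.0 + n * omega)
    hbar_B = (sum(skills_B) / N - moving) / (1.0 - omega)
    return hbar_A, hbar_B, omega

def objective_6(theta, n, t, alpha, skills_A, skills_B, phi):
    """Maximand (6) with U = ln and G(h) = h**alpha."""
    mu_A = sum(skills_A) / len(skills_A)
    hbar_A, hbar_B, _ = means(mu_A, skills_B, phi, n)
    gA, gB = alpha * math.log(hbar_A), alpha * math.log(hbar_B)
    natives = sum(gA + math.log(h) for h in skills_A) / len(skills_A)
    others = sum(p * (gA + math.log(h) + math.log(1 - t)) + (1 - p) * (gB + math.log(h))
                 for h, p in zip(skills_B, phi)) / len(skills_B)
    return theta * natives + (1 - theta) * n * others

def reduced_W(theta, n, t, alpha, skills_A, skills_B, phi):
    """Reduced objective (11)."""
    mu_A = sum(skills_A) / len(skills_A)
    hbar_A, hbar_B, omega = means(mu_A, skills_B, phi, n)
    gA, gB = alpha * math.log(hbar_A), alpha * math.log(hbar_B)
    theta_A = theta + (1 - theta) * omega * n
    theta_B = (1 - theta) * n * (1 - omega)
    return theta_A * gA + theta_B * (gB - math.log(1 - t))

# Hypothetical inputs; the gap between (6) and (11) should not depend on phi.
args = (0.5, 10.0, 0.075, 0.35, [8.0, 12.0], [1.0, 2.0, 4.0])
gaps = [objective_6(*args, phi) - reduced_W(*args, phi)
        for phi in ([0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [0.0, 0.2, 0.9])]
print(gaps)
```

The three printed gaps should agree up to floating-point error, so any policy that maximizes (11) also maximizes (6).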

The constraint set for $z$ is convex. We attach the multiplier $\lambda_0$ to the non-negativity constraint, and the multiplier $\lambda_1$ to the upper-bound constraint. The Planner faces the Lagrangian

$$L = W + \int \lambda_0(h)\, z(h)\, dh - \int \lambda_1(h)\, z(h)\, dh.$$

We do not include the incentive-compatibility constraint (5) in the Lagrangian. Rather, we shall verify (5) ex post, and in Section 3 characterize the policies that (5) rules out. The FOC is

$$\frac{\partial W}{\partial z(h)} = \lambda_1(h) - \lambda_0(h), \tag{12}$$

where $\partial W / \partial z(h)$ is evaluated at the optimal policy, the latter consisting of an entire function $z(\cdot)$. Note that at most one multiplier can be non-zero and that

$$\frac{\partial W}{\partial z(h)} \begin{cases} < 0 \implies \lambda_0(h) > 0 \text{ and } \phi(h) = 0 \\ > 0 \implies \lambda_1(h) > 0 \text{ and } \phi(h) = 1 \end{cases} \tag{13}$$

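Now let $n_A = 1 + n\omega$ be the post-immigration population of $A$ and $n_B = n(1 - \omega)$ the post-immigration population of $B$. Then

$$\begin{aligned} \frac{\partial W}{\partial z(h)} &= \theta_A^*\, g'(\bar h_A)\, \frac{h - \bar h_A}{n_A} - \theta_B^*\, g'(\bar h_B)\, \frac{h - \bar h_B}{n_B} + (1-\theta) \left[ g(\bar h_A) - g(\bar h_B) + \ln(1-t) \right] \\ &= \alpha (1-\theta) \left[ \ln \frac{\bar h_A}{\bar h_B} + \frac{\ln(1-t)}{\alpha} + 1 - m + \left( \frac{m}{\bar h_A} - \frac{1}{\bar h_B} \right) h \right], \end{aligned} \tag{14}$$

where the second equality follows because $g(h) = \alpha \ln h$ and $g'(h) = \alpha / h$, and where

$$m \equiv \frac{\theta + (1-\theta) Z}{(1-\theta)(1+Z)} \quad \text{and} \quad Z \equiv \int z(h)\, dh. \tag{15}$$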
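Because the right-hand side of (14) is linear in $h$, it is fully described by an intercept and a slope. The sketch below computes both from (14) and (15); the function name and the example inputs are hypothetical, and $\theta < 1$ is assumed so that $m$ is well defined.

```python
import math

def foc_line(theta, hbar_A, hbar_B, Z=0.0, t=0.0, alpha=0.35):
    """Return (intercept, slope) of dW/dz(h) = intercept + slope * h, as in (14)."""
    m = (theta + (1.0 - theta) * Z) / ((1.0 - theta) * (1.0 + Z))  # equation (15)
    scale = alpha * (1.0 - theta)
    intercept = scale * (math.log(hbar_A / hbar_B) + math.log(1.0 - t) / alpha + 1.0 - m)
    slope = scale * (m / hbar_A - 1.0 / hbar_B)
    return intercept, slope

# Hypothetical means with hbar_A twice hbar_B, evaluated at the status quo Z = 0.
low_theta = foc_line(0.5, 2.0, 1.0)    # egalitarian weight
high_theta = foc_line(0.9, 2.0, 1.0)   # weight tilted towards A's natives
print(low_theta, high_theta)
```

A positive intercept with a negative slope means the marginal benefit is positive only for low $h$, the pattern behind policy (8); a negative intercept with a positive slope means it is positive only for high $h$, the pattern behind policy (10).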

Proposition 2 The optimal policy has the following properties:

1. Whenever immigration is positive, it is always skill biased,

2. For θ sufficiently close to unity, the policy is of the form (10), and

3. For $\theta < 1/2$, the policy is of the form (8).


Proof. The RHS of the FOC (14) at the status-quo point at which $Z = 0$, i.e., the point at which there is no migration, is illustrated in Figure 3, in which the vertical axis measures the marginal benefit of moving a person of type $h$ from $B$ to $A$. That benefit depends linearly on the migrant’s level of $h$: in (14), $\partial W / \partial z(h)$ is linear in $h$, which immediately shows that either (8) or (10) must hold, though possibly with $\hat h = 0$ or $\hat h = +\infty$. The larger is $m$ (which, in turn, is more likely when $\theta$ is large), the more likely it is that the slope is positive and that a brain drain is optimal. Conversely, the smaller is $\theta$, the more likely it is that low-$h$ migration is optimal, i.e., (8).

We also note that for some $\theta$’s satisfying $1/2 < \theta < 1$, the optimal policy may involve no migration; see Figure 3 and the top panel of Figure 4.

A number of alternative assumptions would overturn the optimality of the bang-bang policy. First, conditional on $h$, people may differ in their moving costs. Second, there may be $h$-specific congestion costs such as were imposed by the American Medical Association some decades ago. Third, the production function could entail diminishing returns to each skill if, for instance, $Y = G(\bar h) \left( \int n(h)^{\phi}\, dh \right)^{1/\phi}$. Fourth, physical-capital limits in the rich world would, coupled with diminishing returns to $H$, limit the optimal immigration flow.⁶ We do, however, present a two-skill extension that features diminishing returns to skill of each type and show that our conclusions survive this extension.

3 Calibration and simulation

We now wish to illustrate the optimal policy for all $\theta$, and for realistic $F_A$ and $F_B$. We choose “country A” to be the OECD, which we shall think of as the developed world. “Country B” will then be the rest of the world. Sala-i-Martin (2006) reports the world distribution in the year 2000, and how it comprises the distributions of income in individual countries. We reproduce these distributions in Figure 1, which shows them to be roughly log-normal in form.

⁶ Nevertheless, we do not believe that migration is deterred largely by moving costs or by diminishing returns to labor; the latter would provide additional reasons why immigrants’ wages in the host countries would be lower. Rather, it would appear that migrants are eager to move but that they are shut out by the restrictive immigration policies of the rich countries. In fact, there seems to be a vast excess demand for migration. Such excess demand prompted the U.S. to build a fence on the Mexican border and to ration migration by lottery. People are willing to risk imprisonment and deportation, and estimates are that there are at least 10 million illegal immigrants in the U.S. Similar conditions exist in Europe: Italian authorities intercept boats from Albania, the French legal system imposes fines and even jail on those who help illegal immigrants, and North Africans risk their lives trying to cross the Mediterranean to the south of Europe. A further indication that there is excess demand to migrate is the proposal that visas be auctioned to the highest bidders.

Figure 1: The world income distribution in 2000

We observe the distribution of income $y$ for each citizen, which we approximate as follows by a log-normal distribution:

$$\mu_{OECD}(\log y) = \ln 20{,}000, \quad \sigma_{OECD}(\log y) = \ln 2, \tag{16}$$

$$\mu_{Rest}(\log y) = \ln 2{,}000, \quad \sigma_{Rest}(\log y) = \ln 2.5.$$

These are portrayed in Figure 2.

Figure 2: Calibrated distributions of A and B

We set

$$n = 10, \quad \alpha = 0.35,$$

the latter following Lucas (1990). The following equation identifies $\bar h$:

$$E(y) = \exp(\mu + \sigma^2/2) = G(\bar h)\, E(h) = \bar h^{\alpha+1} \implies \bar h_A = \exp\left( \frac{\mu + \sigma^2/2}{1 + \alpha} \right).$$

To infer the human capital, $h$, of a citizen with income $y$, we invert the equation $y = G(\bar h)\, h = \bar h^{\alpha} h$ to get $h = y\, \bar h^{-\alpha}$, i.e., $\ln h = \ln y - \alpha \ln \bar h$.
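The identification step can be written out directly. The sketch below applies the two formulas above to the log-normal parameters in (16); the function names are illustrative, and the printed numbers depend on this reading of the identification equation, so they are not guaranteed to match the values plotted in the figures.

```python
import math

ALPHA = 0.35  # spillover elasticity from the calibration

def mean_human_capital(mu_log_y, sigma_log_y, alpha=ALPHA):
    """hbar solving E(y) = hbar**(1 + alpha) when log income is normal."""
    return math.exp((mu_log_y + sigma_log_y ** 2 / 2.0) / (1.0 + alpha))

def human_capital(y, hbar, alpha=ALPHA):
    """Invert y = hbar**alpha * h for an individual's human capital h."""
    return y * hbar ** (-alpha)

# Parameters from (16): OECD and rest of the world.
hbar_A = mean_human_capital(math.log(20_000), math.log(2))
hbar_B = mean_human_capital(math.log(2_000), math.log(2.5))
print(hbar_A, hbar_B, hbar_A / hbar_B)
```

Since the pre-migration mean of $h$ in each country equals $\bar h$, the same numbers serve as $\mu_A$ and $\mu_B$ when evaluating (3) and (4).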

Rendon and Cuecuecha (2007) estimate that the out-of-pocket moving cost of a Mexican to the U.S. is about US$550 in 1992 dollars, and Amuedo-Dorantes and Bansak (2007) report the slightly higher estimate of $655-$831. As a fraction of a Mexican’s lifetime U.S. earnings this is negligible. In our first calibration, then, we assume that moving costs are zero.

Figure 3 plots the RHS of the FOC (14) at the status-quo point at which $Z = 0$, i.e., the point at which there is no migration. The vertical axis measures the marginal benefit of allowing a migrant in; the benefit depends on the migrant’s level of $h$. The figure shows that for some values of $\theta$, say around $\theta = 0.8$, the marginal benefit of migration is negative at all levels of migration. Because the first-order condition is linear in $h$, the gain to migrating a worker of type $h$ is either decreasing or increasing in $h$ depending on the sign of $\frac{\theta}{\bar h_A} - \frac{1-\theta}{\bar h_B}$.

The brain-drain region $\theta \in [\theta_{BD}, 1]$.–The slope of the FOC changes sign at

$$\theta_{BD} = \frac{1}{1 + \bar h_B / \bar h_A} \approx 0.88 \text{ in the calibrated example},$$

where BD is for brain-drain: If (as is the case in this calibration) $F_B$ has unbounded support, then for any $\theta > \theta_{BD}$, some very smart B-people should go to A, and there will be a brain drain. This will remain so even when we add migration costs.

The skim-the-bottom region $\theta \in [0, \theta_{SB}]$.–There is another threshold, call it $\theta_{SB}$, below which country A will only receive low-$h$ types. Suppose that the lowest level of $h$ in the support of $F_B$ is zero (as, again, is the case in the calibration). Then as shown in Figure 4,

$$\theta_{SB} = \frac{1 + C}{2 + C} \approx 0.71,$$

where $C = \ln(\bar h_A / \bar h_B)$. This is also apparent in Figure 3, in which the FOC for $\theta = 0.7$ barely crosses the zero axis in the neighborhood of zero.

Figure 3: The first-order condition at the status quo
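Both thresholds depend on the status quo only through the ratio $\bar h_A / \bar h_B$. The sketch below simply codes the two formulas, taking that ratio as an input; the function names and the illustrative ratio of 5 are assumptions made here, not values from the calibration, so the output is not meant to reproduce the thresholds reported in the text.

```python
import math

def theta_bd(ratio):
    """Weight at which the slope of the FOC changes sign; ratio = hbar_A / hbar_B."""
    return 1.0 / (1.0 + 1.0 / ratio)

def theta_sb(ratio):
    """Weight at which the FOC intercept is zero when t = 0: (1 + C) / (2 + C)."""
    C = math.log(ratio)
    return (1.0 + C) / (2.0 + C)

# Illustrative ratio only (hypothetical).
print(theta_sb(5.0), theta_bd(5.0))
```

Because $r - 1 \ge \ln r$ for every $r > 0$, these formulas imply $\theta_{SB} \le \theta_{BD}$ whenever the ratio is at least one, with equality at $1/2$ when the two countries have the same average skill; this is why an interval of weights can remain between the two regions.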

The inaction region $\theta \in (\theta_{SB}, \theta_{BD})$.–In this region, efficiency losses stemming from the mixing overwhelm the redistributive gains. It is not worth moving the high-skilled B-natives to A because, while this would raise $G(\bar h_A)$, it would reduce $G(\bar h_B)$ by too much. At these intermediate $\theta$s, it is not that the Planner does not value the A-natives; he simply values the B-natives too much to allow a brain drain from B to occur.

The optimal policy.–The optimal policy is described in Figure 4. The horizontal axis in each panel measures $\theta$, the weight that the planner assigns to the rich country’s natives. The top panel describes the skill of the movers. The purple area is the set of people who can move under the optimal policy. Rather than measure the level of a migrant’s $h$, however, the vertical axis in the top panel measures $G(\mu_A)\, h$, the wage that a migrant of type $h$ would earn in country A assuming that no one else was allowed to move so that average skills in A were at their pre-migration level of $\mu_A$. For $\theta \le 0.72$, the unskilled B-natives migrate to A, and for $\theta \ge 0.89$, the skilled B-natives migrate. In between, migration is zero.

The middle panel of Figure 4 plots the numbers optimally moving at various levels of $\theta$. At the egalitarian weight $\theta = 1/2$ the optimal number migrating is 2.3 billion, and then as $\theta$ rises the number declines at an increasing rate, and reaches zero when $\theta = 0.72$. When $\theta$ reaches 0.88, skimming from the top starts, but the numbers are seen to be small, simply because the developing countries have few highly-skilled people. The bottom panel of Figure 4 shows that when evaluated at $t = 0$, the incentive-compatibility constraint (5) is binding for $\theta \le 1/2$. That is because, for $\theta < 1/2$, the planner would like the residents of B to enjoy higher utility, and this requires $\bar h_B > \bar h_A$, but in that case migrants would not wish to move from B to A.

Figure 4: The optimal policy. Top panel, “Optimal migration: Type of migrants”: migrant’s potential wage in OECD against θ, with reference lines labeled h OECD, h REST and q0.99 REST. Middle panel, “Optimal migration: Quantities”: migration in billion persons against θ, for t = 0.00. Bottom panel: average human capital against θ for the OECD and for the rest of the world. The thresholds θSTOP, θSB and θBD are marked.

3.1 Broader migration costs

One can argue that after crossing the border, an immigrant needs to look for a job and a place to live, and so on, and that these additional costs should be included in $t$. We now pursue this possibility. One can estimate such costs from wage differentials net of cost-of-living differentials within, say, the U.S., within which mobility is unrestricted. Gemici (2007) estimates an individual’s cost of moving from one U.S. census region to another⁷ to be about seven percent of lifetime income. If we add the Rendon-Cuecuecha or the Amuedo-Dorantes and Bansak estimates to Gemici’s estimate, we end up with at most 7.5 percent of U.S. lifetime income.

The counterpart of Figure 4 when $t = 0.075$ is Figure 5. We find that at all levels of $\theta$ the effect of $t > 0$ is to reduce the amount of optimal migration. At the egalitarian weight of $\theta = 0.5$, the number of people optimally moving drops from 2.5 billion to 1.8 billion, which is still roughly an order of magnitude higher than the estimates of current migration levels of 100-200 million.⁸ In the bottom panel we see that (5) starts to bind when $\bar h_A / \bar h_B = (1-t)^{-1/\alpha} = 1.25$, and it binds for $\theta \le 0.46$. In neither case, therefore, does the incentive constraint impede the Planner from implementing the egalitarian allocation.
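The ratio at which (5) starts to bind follows from setting $G(\bar h_A)(1-t) = G(\bar h_B)$ with $G(\bar h) = \bar h^{\alpha}$. A one-line check of the figure quoted above, with a hypothetical function name:

```python
def ic_ratio_threshold(t, alpha=0.35):
    """Smallest hbar_A / hbar_B at which (5) holds when G(h) = h**alpha."""
    return (1.0 - t) ** (-1.0 / alpha)

print(round(ic_ratio_threshold(0.075), 2))  # 1.25
```

With $t = 0$ the threshold is exactly one, so the constraint then binds as soon as average skill in B reaches that in A.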

Now the region of θ’s for which zero migration is optimal grows only slightly,from [0.72, 0.88] to [0.70, 0.88], and it would now take a value of θ of almost 0.7 tojustify the currently observed migration levels in terms of numbers alone. But suchan inference would not be correct because of the composition of the migrants whichstill come mainly from the right tail of the h distribution in the source countries —what the advanced countries have in place at the moment may not be a brain drainpolicy, but it certainly is not a ‘skim from the bottom policy’ that is optimal whenθ < 0.70. A significant number of rich countries are evidently behaving as if the

7The U.S. is divided into 9 such regions so that the average region contains six states. Gemicicontrols for unobserved attributes of regional locations so that the effect of cost-of-living differentialson mobility costs would be reflected in the influence of these attributes. She estimates the residualpecuniary costs of moving between a pair of census divisions to be about $19,000 (in 1982 dollars),which is seven percent of the lifetime income ($240,000) of an average white male with 9 years ofregional tenure.

8Freedberg and Hunt (1995) report that all but 100 million of the world’s 6 billion people, i.e.,all but 1.7 percent, live in the country of their birth. The International Organization for Migrationestimates that there are 191 million transnational migrants worldwide comprising 3% of the globalpopulation. See http://www.iom.int/jahia/page254.html.

14

Page 15: Optimal Migration: A World Perspective C.pdf · Constant returns and decentralizability .–The production function (1) obeys con-stant returns to scale in the sense that doub ling

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90$

20,000$

40,000$

60,000$

Optimal migration: Type of migrants

θh:

Mig

rant

’spo

tent

ialwag

ein

OE

CD

hOECD

hREST

q0.99REST

0 0.2 0.4 0.6 0.8 10

1

2

3

Optimal migration: Quantities

θ

Mig

rati

onin

billi

onpe

rson

s

t=0.075

0 0.2 0.4 0.6 0.8 10

500

1000

1500

2000

2500

θ

Ave

rage

hum

anca

pita

l

h: OECD

h∗ : REST

θSTOP θSB θBD

Figure 5: Optimal migration when migration costs are 7.5% of lifetimeincome in the host country

15

Page 16: Optimal Migration: A World Perspective C.pdf · Constant returns and decentralizability .–The production function (1) obeys con-stant returns to scale in the sense that doub ling

Planner weighted their citizens with a value of θ in excess of 0.88.⁹

4 Three extensions

We now ask if our conclusions remain valid when we extend the model to incorporate three realistic features that have so far been left out. Each feature is likely to change the quantitative predictions, but they leave intact the conclusion that a welfare function that considers individuals alone (and not their country of origin) would imply that much larger flows of migrants are optimal.

4.1 Cross-country differences in TFP

Cross-country TFP differentials raise the optimal level of migration for any welfare function except at θ = 1. To analyze this extension, we redefine the production function as

$$Y_i = S_i \, G(\bar h_i) \, H_i \tag{17}$$

where $S_i$ is TFP in country i = A, B. Then (11) in the Lemma above becomes

$$W \equiv \theta^*_A \, (g_A + s_A) + \theta^*_B \, (g_B + s_B - \ln[1 - t]) \tag{18}$$

where $s_i = \ln S_i$. Instead of (14), the first-order condition is now

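$$
\begin{aligned}
\frac{\partial W}{\partial z(h)} &= \theta^*_A \, g'(\bar h_A) \, \frac{h - \bar h_A}{n_A} - \theta^*_B \, g'(\bar h_B) \, \frac{h - \bar h_B}{n_B} + (1-\theta) \left[ g(\bar h_A) + s_A + \ln(1-t) - g(\bar h_B) - s_B \right] \\
&= \alpha (1-\theta) \left[ \ln \frac{\bar h_A}{\bar h_B} + \frac{\ln(1-t)}{\alpha} + \frac{s_A - s_B}{\alpha} + 1 - m + \left( \frac{m}{\bar h_A} - \frac{1}{\bar h_B} \right) h \right].
\end{aligned} \tag{19}
$$

If we assume that productivity is higher in the host country A, then $s_A > s_B$ and the derivative $\partial W / \partial z(h)$ is now larger than it was in (14), where the TFP differences were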

9 The U.S. today follows a mixture of skill-biased policies and skill-neutral policies based on four principles: the reunification of families, the admission of immigrants with needed skills, the protection of refugees, and the diversity of admissions by country of origin. While special legislation now allows for special consideration for medical professionals, for example, the majority of legal immigrants enter the US through the family-reunification program. While Canadian policy also allows immigration based on family reunification, preferences stress skills and youth: During 1990–2002, 65 per cent of permanent immigrants to the United States were admitted under family preferences. In Canada, the equivalent proportion was 34 per cent (International Migration and Development: Regional Factsheet, The Americas, http://www.un.org/migration/presskit/factsheet_america.pdf). Similarly, Australia heavily emphasizes skills and youth in its preference system for immigrants. See for example http://www.workpermit.com/Australia/australia.htm. Recently France has also moved towards a skill-biased immigration policy: see http://www.migrationpolicy.org/pubs/Backgrounder2_ Caponi (2006a), however, finds that an empirical analog of φ(h), the relation between skill and the probability of migration of Mexicans to the U.S., is U-shaped: the highest and lowest educated tend to migrate more than the middle educated.


absent. Therefore, the larger the TFP difference $s_A - s_B$, the more likely it is that the slope will be positive at all levels of h.

The displacements are the same at all levels of h, but they are larger when θ is small. In terms of Figure 3, which plots the RHS of the FOC (14), the RHS of (19) entails an upward displacement of the lines by the amount $(1 - \theta)(s_A - s_B)$. Therefore the addition of TFP differences raises the optimal migration for all welfare functions except the one in which θ = 1. Since TFP is exogenous, the costs of migration that the planner faces are the same as they were in the absence of TFP differences: the foregone-earnings cost and the efficiency loss entailed by the reduction in $G(\bar h_A)$. But the benefits are now larger, and so more migration would be optimal.
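The size of the displacement can be checked numerically. The sketch below is only an illustration: it codes the second line of (19) as a function and compares it with and without a TFP gap. All parameter values (including `m`, which the paper defines earlier, and the averages `hbar_a`, `hbar_b`) are assumed for the example and are not the paper's calibration.

```python
import math

def foc_rhs(h, theta, alpha, t, m, hbar_a, hbar_b, s_a=0.0, s_b=0.0):
    """Second line of (19): marginal welfare from admitting a type-h migrant."""
    bracket = (
        math.log(hbar_a / hbar_b)
        + math.log(1.0 - t) / alpha
        + (s_a - s_b) / alpha          # TFP term, absent in (14)
        + 1.0 - m
        + (m / hbar_a - 1.0 / hbar_b) * h
    )
    return alpha * (1.0 - theta) * bracket

# Illustrative (assumed) parameter values.
params = dict(alpha=0.3, t=0.075, m=0.5, hbar_a=10.0, hbar_b=2.0)
s_a, s_b = 0.4, 0.0

for theta in (0.0, 0.5, 0.9, 1.0):
    for h in (1.0, 5.0, 20.0):
        shift = foc_rhs(h, theta, s_a=s_a, s_b=s_b, **params) - foc_rhs(h, theta, **params)
        # The shift does not vary with h and equals (1 - theta) * (s_a - s_b).
        print(f"theta={theta:.1f} h={h:5.1f} shift={shift:.4f} "
              f"predicted={(1.0 - theta) * (s_a - s_b):.4f}")
```

The printed shift is the same at every h for a given θ and vanishes at θ = 1, which is the pattern the text describes.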

4.2 Dynamics and investment in h

To keep things simple we assume that within countries A and B agents are homogeneous, with skills $\mu_A$ and $\mu_B$ respectively. Once again we assume that $\mu_A > \mu_B$.

Accumulation technology.–The technology for accumulating human capital is the same in both countries. The fraction of the period-0 time spent working is $u_A$ and $u_B$, respectively. The remaining time is spent training, with the resulting human capital levels being, as in Lucas (1988),

$$h_A = \delta (1 - u_A) \mu_A \quad \text{and} \quad h_B = \delta (1 - u_B) \mu_B$$

in period one.

Equal migration probabilities.–Let x be the probability that any B-native will be allowed to move to A. Within-country homogeneity implies that (3) and (4) become

$$\bar h_A = \frac{h_A + x n h_B}{1 + x n}, \quad \text{and} \quad \bar h_B = h_B \tag{20}$$

respectively.

Preferences.–Lifetime utility is $U(c_0) + \rho U(c_1)$. There is no borrowing and lending, so that each worker simply consumes his wages. We also assume that immigrants cannot send money back to their own country.

The investment decision in A.–Each resident of A starts life with $\mu_A$. In the first period there are no immigrants and therefore the first-period efficiency wage is $G(\mu_A)$. The only variable that the immigration policy x affects is the second-period $\bar h_A$, and through it, each A-native's second-period wage, $G(\bar h_A)$. It is convenient to index the decision problems by $\bar h_A$. Therefore a native of A maximizes his lifetime utility

$$v(\bar h_A) \equiv \max_{c_0, c_1, u} \left\{ U(c_0) + \rho U(c_1) \right\}$$

subject to

$$c_0 = u \, G(\mu_A) \mu_A, \quad \text{and} \quad c_1 = \delta (1 - u) \, G(\bar h_A) \mu_A. \tag{21}$$


He takes the wages in both periods as given, because these depend only on the skill composition of the population at large. Then the A-native's problem becomes

$$v(\bar h_A) = \max_u \left\{ U\big( u \, G(\mu_A) \mu_A \big) + \rho \, U\big( \delta (1-u) \, G(\bar h_A) \mu_A \big) \right\}.$$

We once again assume (7). Then the problem boils down to maximizing $\ln u + \rho \ln(1-u)$. The first-order condition is $\frac{1}{u} - \frac{\rho}{1-u} = 0$, and it yields optimal investment

$$u_A = \frac{1}{1 + \rho}, \tag{22}$$

which does not depend on the immigration policy x.

The investment decision in B.–The policy is assumed to be skill neutral, which means x is not influenced by the B-native's h. Thus, a B-native must choose his first-period investment before he knows whether he will emigrate or not. Should he have to remain in B, his efficiency wage will be $G(\bar h_B)$, and if he is allowed to emigrate, his wage will be $G(\bar h_A)$. His lifetime utility is

$$v(\bar h_A, \bar h_B, x) = \max_u \left\{ U(c_0) + \rho \left[ x U(c_{1A}) + (1-x) U(c_{1B}) \right] \right\}$$

subject to

$$
\begin{aligned}
c_0 &= u \, G(\mu_B) \mu_B, \\
c_{1A} &= \delta (1-u) \, G(\bar h_A) \mu_B, \\
c_{1B} &= \delta (1-u) \, G(\bar h_B) \mu_B,
\end{aligned} \tag{23}
$$

where $c_{1A}$ is B's consumption if he wins the lottery and moves to A and $c_{1B}$ is his consumption if he loses and stays in B. The worker takes $\bar h_A$ and $\bar h_B$ as given. Once again, under (7), the investment rate is the same as in A, i.e.,

$$u_B = \frac{1}{1 + \rho} = u_A. \tag{24}$$
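A brute-force check of (22) and (24) may help. The sketch below assumes, as the text suggests, that (7) amounts to logarithmic utility; it maximizes the B-native's lottery objective from (23) over a grid of u for several values of x. The wage levels and the values of ρ and δ are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import math

def lifetime_utility_b(u, x, rho, delta, mu_b, wage_0, wage_a, wage_b):
    """B-native's objective under log utility, with consumption as in (23)."""
    c0 = u * wage_0 * mu_b
    c1a = delta * (1.0 - u) * wage_a * mu_b   # wins the lottery, works in A
    c1b = delta * (1.0 - u) * wage_b * mu_b   # loses the lottery, stays in B
    return math.log(c0) + rho * (x * math.log(c1a) + (1.0 - x) * math.log(c1b))

def argmax_on_grid(objective, n=9999):
    """Maximize over the interior grid 1/(n+1), ..., n/(n+1)."""
    grid = [(i + 1) / (n + 1) for i in range(n)]
    return max(grid, key=objective)

rho, delta, mu_b = 0.9, 1.5, 1.0           # assumed values
closed_form = 1.0 / (1.0 + rho)            # equation (22)

for x in (0.0, 0.3, 1.0):
    u_star = argmax_on_grid(
        lambda u: lifetime_utility_b(u, x, rho, delta, mu_b, 1.0, 3.0, 1.2)
    )
    print(f"x={x:.1f} grid argmax={u_star:.4f} closed form={closed_form:.4f}")
```

Because wages and x enter the log objective additively, the maximizer is the same for every x, which is why the investment rate in B coincides with the one in A.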

Planner's choice of x.–The planner is a Stackelberg leader. He announces a migration policy at the outset, and carries it out at the end of the first period. Given the policy, agents invest in h at t = 0. The planner then chooses x to solve the problem

$$\max_x \left\{ \theta \, v(\bar h_A) + (1-\theta) \, n \, v(\bar h_A, \bar h_B, x) \right\}$$

subject to (20) and (24). The first-period utilities do not depend on x, and the discount factor drops out. The problem reduces to

$$\max_x \left\{ \theta \, U(c_1) + (1-\theta) \, n \left[ x U(c_{1A}) + (1-x) U(c_{1B}) \right] \right\}.$$

Substituting from (7), (21), and (23), and letting $g = \ln G$, the problem boils down to $\max_x J$, where (writing $g_i \equiv g(\bar h_i)$ for $i \in \{A, B\}$)

$$J \equiv \theta g_A + (1-\theta) \, n \left( x g_A + (1-x) g_B \right) \tag{25}$$

subject to (20). The following Lemma (proved in Appendix 2) shows that under some reasonable conditions the solution for x is bang-bang, i.e., either x = 0 or x = 1:


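Figure 6: Optimal policy when h is endogenous. [Panel title: "The bang-bang policy". The plot is drawn in the (z, θ) plane with both axes running from 0 to 1; indifference loci for n = 1 and n = 10 separate the region where x = 0 from the region where x = 1.]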

Lemma 2 For $G(h) = h^\alpha$, and for any α > 0, n > 0 and θ ∈ (0, 1), x is either zero or one.

This result implies that the planner's problem cannot have an interior maximum. Rather, the planner's maximum is at a corner: either x = 0 or x = 1.

Characterizing the solution for x.–Define the initial, date-zero productivity of a B-native relative to that of an A-native by

$$\text{Relative backwardness} \equiv z = \frac{\mu_B}{\mu_A} < 1. \tag{26}$$
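The corner property in Lemma 2 can be illustrated by evaluating the objective (25) on a grid. The sketch below is not the proof: it normalizes the A-native's skill to 1 so that the B-native's skill equals z as in (26), drops the additive constant that does not depend on x, and uses assumed values of θ, n and z.

```python
import math

def planner_objective(x, theta, n, z, alpha=1.0):
    """J(x) from (25), up to an additive constant, with mu_A = 1 and mu_B = z."""
    # Average human capital in A after admitting a share x of B-natives, as in (20).
    g_a = alpha * (math.log(1.0 + n * x * z) - math.log(1.0 + n * x))
    g_b = alpha * math.log(z)
    return theta * g_a + (1.0 - theta) * n * (x * g_a + (1.0 - x) * g_b)

def best_x_on_grid(theta, n, z, points=1001):
    """Grid maximizer of J over x in [0, 1]."""
    grid = [i / (points - 1) for i in range(points)]
    return max(grid, key=lambda x: planner_objective(x, theta, n, z))

for theta in (0.1, 0.5, 0.74, 0.9):
    for n in (1, 10):
        for z in (0.1, 0.5):
            print(f"theta={theta} n={n} z={z}: best x = {best_x_on_grid(theta, n, z)}")
```

In every case the grid maximizer is an endpoint, either 0.0 or 1.0, consistent with the bang-bang result.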

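The Appendix also shows that the optimal policy is

$$
x = \begin{cases}
0 & \text{if } \theta > \dfrac{n \ln \frac{z^{-1} + n}{1 + n}}{\ln z^{-1} + (n-1) \ln \frac{z^{-1} + n}{1 + n}}, \\[2ex]
1 & \text{otherwise.}
\end{cases} \tag{27}
$$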

We plot the indifference locus for n = 1 and n = 10 in Figure 6.

The planner is more inclined to a policy of immigration if B is poor, and if B is large (large n), though the latter is not a quantitatively important consideration. In the plot, the action x = 0 is preferred in the north-east quadrant (NE) and x = 1 is preferred in the south-west quadrant (SW). So, the planner chooses maximal immigration if he cares enough for B (low value of θ), if B is poor enough (low value of z), and if B is large (high values of n). Empirically, z = 0.1 is a good approximation to average non-U.S. income relative to U.S. income, which means, for n = 1 (since in fact x is close to zero), we may infer that θ ≥ 3/4. For example, at z = 0.1, the critical θ needed to switch to a zero-migration regime is 0.78, and at z = 0.5 the critical θ then drops to 0.59.
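The indifference locus in (27) is easy to tabulate. The following sketch simply codes the threshold as it is derived in the Appendix; the function names are hypothetical and the (z, n) pairs are the ones discussed in the text.

```python
import math

def critical_theta(z, n):
    """Threshold weight in (27): x = 0 is optimal when theta exceeds this value."""
    log_ratio = math.log((1.0 / z + n) / (1.0 + n))
    return n * log_ratio / (math.log(1.0 / z) + (n - 1.0) * log_ratio)

def optimal_x(theta, z, n):
    """Bang-bang policy implied by (27)."""
    return 0 if theta > critical_theta(z, n) else 1

for n in (1, 10):
    for z in (0.1, 0.5):
        print(f"n={n:2d} z={z}: critical theta = {critical_theta(z, n):.3f}")
```

Under this coding of (27), the threshold at z = 0.1 is roughly 0.74 for n = 1 and roughly 0.78 for n = 10, and at z = 0.5 both loci give about 0.59, so the effect of n is small, as the text notes.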

Friedberg and Hunt's (1995) evidence tells us that we are, effectively, in the x = 0 region. But, since z must be rather small (say 1/10), the action x = 0 is optimal only if θ is at least 0.8. Thus, from the homogeneous-residents model the following conclusions emerge:

1. Investment in human capital leaves intact the main conclusion that the policy outcome we now have is incompatible with even approximately equal weights in the social welfare function.

2. The model of this subsection assumes that within each country agents are homogeneous before and after training. If the planner could and did redistribute incomes within economies, he would still want to use migration to redistribute across economies.

4.3 Two skills, and no external effects

In this section we show that our main conclusions are robust to the introduction of a second skill in the production function. They are also robust to the removal of external effects and their replacement with exogenous productivity differences that do not depend on the within-country composition of skills. In particular, the following two conclusions (see Figure 4),

1. When θ = 1/2, the Planner equalizes h in the two countries, and

2. When θ = 1, the Planner pursues a 'brain-drain' policy,

remain valid when restated in their two-skill version as in Propositions 2 and 3.

Let production now depend on two homogeneous groups of workers: the 'skilled,' the number of which is s, and the 'unskilled,' the number of which is u. Output is

$$Y = x \, G\!\left( \frac{s}{u} \right) s^{\beta} u^{1-\beta}, \tag{28}$$

where $G(s/u)$ is an external effect operating through the ratio of skilled to unskilled workers, and where G is an increasing function. The parameter x is introduced to proxy for all other country-specific variables that may affect the productivity of the two factors. When external effects are absent G ≡ 1.


Let r = s/u. The wage of a skilled worker in that country then is

$$w_s = \beta r^{\beta - 1} x \, G(r), \tag{29}$$

and the wage of an unskilled worker is

$$w_u = (1 - \beta) r^{\beta} x \, G(r). \tag{30}$$

In general, x and s depend on the location A or B. We assume that $x_A > x_B$ so that, other things the same, the Planner wants to move people from B to A.

Let $s'$ and $u'$ denote the number of skilled and unskilled that move from B to A. Post-migration wages for each country are then given by (29) and (30), but with the appropriate r substituted in, namely

$$r_A = \frac{s_A + s'}{u_A + u'} \quad \text{and} \quad r_B = \frac{s_B - s'}{u_B - u'}.$$
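As an illustration of (28) to (30), the sketch below computes wages and post-migration factor ratios and confirms that factor payments exhaust output, as they must when firms take the external effect as given. It uses the functional form G(r) = r^α that the paper adopts in (33); the endowments, flows and parameter values are assumed for the example, and the function names are hypothetical.

```python
def wages(s, u, x, alpha, beta):
    """Skilled and unskilled wages (29)-(30) with G(r) = r**alpha."""
    r = s / u
    g = r ** alpha
    w_s = beta * r ** (beta - 1.0) * x * g
    w_u = (1.0 - beta) * r ** beta * x * g
    return w_s, w_u

def output(s, u, x, alpha, beta):
    """Production function (28) with G(r) = r**alpha."""
    return x * (s / u) ** alpha * s ** beta * u ** (1.0 - beta)

def post_migration_ratios(s_a, u_a, s_b, u_b, s_mig, u_mig):
    """Factor ratios in A and B after s_mig skilled and u_mig unskilled move from B to A."""
    return (s_a + s_mig) / (u_a + u_mig), (s_b - s_mig) / (u_b - u_mig)

alpha, beta = 0.1, 0.6                      # assumed, with alpha + beta < 1
s_a, u_a, s_b, u_b = 6.0, 4.0, 2.0, 8.0     # assumed endowments
r_a, r_b = post_migration_ratios(s_a, u_a, s_b, u_b, s_mig=0.5, u_mig=1.0)
w_s, w_u = wages(s_a + 0.5, u_a + 1.0, x=1.5, alpha=alpha, beta=beta)
total_pay = (s_a + 0.5) * w_s + (u_a + 1.0) * w_u
print(r_a, r_b, w_s, w_u)
print(total_pay, output(s_a + 0.5, u_a + 1.0, 1.5, alpha, beta))
```

The two numbers on the last line agree, reflecting the fact that (28) is homogeneous of degree 1 in (s, u) for a given value of the external effect.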

The Welfare criterion.–As before, let θ be the weight on country A natives. Assume moving costs are zero. The Planner's criterion then is

$$
\begin{aligned}
W = {} & \theta \left( s_A U(w^A_s) + u_A U(w^A_u) \right) \\
& + (1 - \theta) \left\{ s' \, U(w^A_s) + u' \, U(w^A_u) + (s_B - s') \, U(w^B_s) + (u_B - u') \, U(w^B_u) \right\}.
\end{aligned} \tag{31}
$$

The incentive-compatibility constraint.–Each factor can flow from B to A only if its wages in A exceed those in B. Instead of (5), then, we now have two IC constraints:

$$\left( w^A_s - w^B_s \right) s' \ge 0, \quad \text{and} \quad \left( w^A_u - w^B_u \right) u' \ge 0. \tag{32}$$

The Planner's tradeoff.–As before, the Planner faces a tradeoff between efficiency and utility. But because of the complementarity between s and u (in contrast to Proposition 1), output is now no longer largest under complete segregation. If $x_A$ were to equal $x_B$, then because (28) is homogeneous of degree 1 in (s, u), output would be at a maximum as long as $r_A = r_B$, and the distribution of activity between A and B would not matter. But if $x_A > x_B$, the world would produce the most output if everyone were to be moved to location A. The Planner's question then is how far he can shift domestic factor ratios without reducing too much the consumption of one or the other group of A-natives.

As before, we shall assume that

$$G(r) = r^{\alpha} \quad \text{and} \quad \alpha + \beta < 1, \tag{33}$$


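so that $w_s = \beta r^{\alpha + \beta - 1} x$ and $w_u = (1 - \beta) r^{\alpha + \beta} x$. Then, skilled immigration from B to A is incentive compatible as long as

$$r_B > \left( \frac{x_B}{x_A} \right)^{1/[1 - (\alpha + \beta)]} r_A, \tag{34}$$

and unskilled immigration from B to A is feasible as long as

$$r_B < \left( \frac{x_A}{x_B} \right)^{1/(\alpha + \beta)} r_A. \tag{35}$$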
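Conditions (34) and (35) are just the wage comparisons in (32) rewritten in terms of factor ratios. The sketch below checks that equivalence on a few illustrative points; the parameter values and the grid of ratios are assumed, and the function names are hypothetical.

```python
def skilled_gain_direct(r_a, r_b, x_a, x_b, alpha, beta):
    """True when the skilled wage in A exceeds the skilled wage in B."""
    w_a = beta * r_a ** (alpha + beta - 1.0) * x_a
    w_b = beta * r_b ** (alpha + beta - 1.0) * x_b
    return w_a > w_b

def skilled_gain_threshold(r_a, r_b, x_a, x_b, alpha, beta):
    """Condition (34) stated on factor ratios."""
    return r_b > (x_b / x_a) ** (1.0 / (1.0 - (alpha + beta))) * r_a

def unskilled_gain_direct(r_a, r_b, x_a, x_b, alpha, beta):
    """True when the unskilled wage in A exceeds the unskilled wage in B."""
    w_a = (1.0 - beta) * r_a ** (alpha + beta) * x_a
    w_b = (1.0 - beta) * r_b ** (alpha + beta) * x_b
    return w_a > w_b

def unskilled_gain_threshold(r_a, r_b, x_a, x_b, alpha, beta):
    """Condition (35) stated on factor ratios."""
    return r_b < (x_a / x_b) ** (1.0 / (alpha + beta)) * r_a

args = dict(x_a=1.5, x_b=1.0, alpha=0.1, beta=0.6)   # assumed values
for r_a in (0.5, 1.0, 2.0):
    for r_b in (0.1, 0.3, 0.8, 2.0, 5.0):
        same_s = skilled_gain_direct(r_a, r_b, **args) == skilled_gain_threshold(r_a, r_b, **args)
        same_u = unskilled_gain_direct(r_a, r_b, **args) == unskilled_gain_threshold(r_a, r_b, **args)
        print(r_a, r_b, same_s, same_u)
```

With equal productivities the two thresholds collapse to the 45-degree line, so one of the two flows is always ruled out; a gap between the productivity terms opens a cone of factor ratios in which both flows are incentive compatible.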

Proposition 2 If θ = 1/2, and if $x_A = x_B$, the Planner's welfare criterion is maximized by immigration flows that equate factor ratios:

$$\frac{s_B - s'}{u_B - u'} = \frac{s_A + s'}{u_A + u'}.$$

This result, proved in the Appendix, parallels the one-skill result (see Figure 4) that when t = 0 and θ = 1/2, the Planner equalizes h in the two countries. It also assumes that (7) holds. Thus if the Planner's preferences were egalitarian, immigration flows would be much larger than they are in practice.

In the Appendix we also prove that when θ = 1, a 'brain-drain' policy is optimal, just as it was in the one-skill case.

Proposition 3 If θ = 1 and if

$$r_A < \frac{\alpha + \beta}{1 - (\alpha + \beta)}, \tag{36}$$

the optimal policy is to allow only the skilled to emigrate from B to A until $w^A_s = w^B_s$.

The converse is also true: If the inequality in (36) is reversed, then it is optimal for only the unskilled to emigrate from B to A until $w^A_u = w^B_u$.

Since the nature of the policy depends on whether (36) holds or not, we need evidence on the parameters. Condition (36) pertains to the pre-migration relation between factor endowments. Taking the U.S. evidence as an indication, (36) is in fact likely to hold. If we consider the skilled as college-educated labor, then in the US roughly half the working population now holds a college degree, so that $r_A \approx 1$. On the other hand, the wage bill of the skilled is roughly twice that of the unskilled. Since the ratio of the two wage bills is β/(1 − β), in our context this means that β ≈ 2/3. Therefore even if α = 0 the RHS of (36) is at least 2. Therefore (36) is the relevant case empirically.
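The calibration argument can be reproduced in a few lines. The sketch below evaluates the bound in (36) at the values quoted in the text and, assuming logarithmic utility and G(r) = r^α as in (33), takes finite differences of the θ = 1 criterion to confirm the direction in which A-natives gain. The function names and the step size are assumptions of the example.

```python
import math

def brain_drain_bound(alpha, beta):
    """Right-hand side of (36)."""
    return (alpha + beta) / (1.0 - (alpha + beta))

def native_welfare(s_mig, u_mig, s_a, u_a, alpha, beta):
    """Criterion at theta = 1: s_A ln w_s^A + u_A ln w_u^A (the ln x_A term is dropped)."""
    r = (s_a + s_mig) / (u_a + u_mig)
    ln_w_s = math.log(beta) + (alpha + beta - 1.0) * math.log(r)
    ln_w_u = math.log(1.0 - beta) + (alpha + beta) * math.log(r)
    return s_a * ln_w_s + u_a * ln_w_u

alpha, beta = 0.0, 2.0 / 3.0      # values discussed in the text
s_a, u_a = 1.0, 1.0               # so that r_A = 1
eps = 1e-6                        # finite-difference step (assumed)

base = native_welfare(0.0, 0.0, s_a, u_a, alpha, beta)
d_skilled = (native_welfare(eps, 0.0, s_a, u_a, alpha, beta) - base) / eps
d_unskilled = (native_welfare(0.0, eps, s_a, u_a, alpha, beta) - base) / eps

print("bound in (36):", brain_drain_bound(alpha, beta))
print("marginal gain from a skilled migrant:", d_skilled)
print("marginal gain from an unskilled migrant:", d_unskilled)
```

With these values the bound is 2, the pre-migration ratio of 1 lies below it, and the criterion rises with skilled inflows and falls with unskilled inflows, which is the brain-drain case.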

Proposition 3 is illustrated in Figure 7. The planner's FOCs both hold along the dashed line with its intercept at $r_A = \frac{\alpha + \beta}{1 - (\alpha + \beta)}$. (See (37) and (38) of the Appendix.) If he did not have to worry about (32), the Planner would end up on the dashed line.


Figure 7: Illustration of Proposition 3

The location of that line does not depend on $x_A$ or $x_B$ since the Planner cares only about the A-natives.

Incentive compatibility fails in the shaded areas in Figure 7. Now (34) is violated in the bottom-right shaded area, and (35) is violated in the top-right shaded area. If $x_A = x_B$, both lines would coincide with the 45° line. The figure assumes that $x_A > x_B$.

Equation (36) holds to the left of the dashed line, and we have shown that this is the empirically relevant starting point. But there is a second consideration: Because country A is skill abundant, we have $r_A > r_B$, which means that we start somewhere to the right of the 45° line, and as the skilled move from B to A, we move southeast. Now there are two subcases, depending on how high the initial $r_B$ is.

1. If country B is fairly well endowed with skilled labor and if $r_B$ is fairly high so that the starting point is Q, say, then the Planner's optimum is to move to the unconstrained interior maximum point R.

2. If $r_B$ is low, however, far below the 45° line, then the starting point is a point such as C. At this point $w^B_s$ is closer to $w^A_s$ than in the first case, and that means that less immigration is feasible before (32) is violated (which it would be if the planner were to continue to an interior point such as E). Therefore less immigration of the skilled can occur in this case.

Whether the move is a large one from Q to R, or a smaller one from C to D, the empirically relevant optimal policy involves an influx of the skilled factor from the poor world, i.e., a brain drain. Thus when we use empirically relevant parameter values, this result parallels our previous result (see Figure 4) that in the one-skill case, when θ = 1, the Planner should pursue a brain-drain policy.

Therefore the main implications of our one-skill analysis survive the model's extension to two skills. In particular, current immigration policies still resemble far more the θ = 1 case than they do the egalitarian case of θ = 1/2.

5 Conclusion

We have studied the role that migration can play in redistributing income to the world's poor. We argued that there is an equity-efficiency tradeoff. In spite of that tradeoff, we found that for reasonable social weights on the rich and the poor, the extent of migration that would be optimal would appear to be much larger than the levels observed today.

Any policy conclusion drawn from this exercise must, however, be quite tentative, for several reasons. First, except in section 4.2, the model is static, yet the fraction of the world's population that is migrating appears to be rising: the stock of migrants has increased from 155 million in 1990 to 213 million today. The model is silent on why this shift may have taken place. Second, all versions of the model assume that migration is the only redistributive tool, excluding foreign aid and FDI. Third, aside from foregone-earnings costs, the diminishing returns to migration operate entirely by reducing the human-capital externality.

More broadly, if it is productivity differentials that stimulate migration, different sources of these differentials will affect the incentive to migrate differently. In particular, the rate at which the incentive diminishes with a rise in migration depends on the source of the differential. Except for the two-skill extension, where migration is driven by factor-ratio differentials, incentives diminish exclusively through the dilution of the externality term. Overall, several reasonable modifications of the model may reduce the optimal migration flows, but the thrust of our conclusion appears to survive: world welfare would be higher if migration flows from poor to rich countries were considerably larger than they are today.


6 Appendix

We prove Propositions 2 and 3 after some preliminary derivations. Since $\ln w^A_s - \ln w^B_s = \ln (w^A_s / w^B_s)$, and since $x_A = x_B$,

$$\ln w^A_s - \ln w^B_s = \ln \frac{G_A}{G_B} + (1 - \beta) \ln \frac{(u_A + u')(s_B - s')}{(u_B - u')(s_A + s')},$$

and

$$\ln w^A_u - \ln w^B_u = \ln \frac{G_A}{G_B} - \beta \ln \frac{(u_A + u')(s_B - s')}{(u_B - u')(s_A + s')}.$$

The terms involving the $\ln x_i$ appear additively and drop out of the calculations. We can write the remaining part of the RHS of (31) as

$$
\begin{aligned}
W = {} & (\theta s_A + (1-\theta) s') \left( \ln\beta + (1-\beta) \ln(u_A + u') + (\beta - 1) \ln(s_A + s') + \ln G_A \right) \\
& + (\theta u_A + (1-\theta) u') \left( \ln(1-\beta) - \beta \ln(u_A + u') + \beta \ln(s_A + s') + \ln G_A \right) \\
& + (1-\theta)(s_B - s') \left( \ln\beta + (1-\beta) \ln(u_B - u') + (\beta - 1) \ln(s_B - s') + \ln G_B \right) \\
& + (1-\theta)(u_B - u') \left( \ln(1-\beta) - \beta \ln(u_B - u') + \beta \ln(s_B - s') + \ln G_B \right).
\end{aligned}
$$

We maximize it with respect to $s'$ and $u'$ subject to (32). We do so in turn for θ = 1 and θ = 1/2.

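Proof of Proposition 2 (the case θ = 1/2).–In this case θ cancels from the FOCs which, after rearrangement, become

$$
\begin{aligned}
\frac{\partial W}{\partial s'} = {} & (1-\beta) \ln \frac{(u_A + u')(s_B - s')}{(u_B - u')(s_A + s')} + \ln \frac{G_A}{G_B} + \frac{G'_A}{G_A} \frac{s_A + s'}{u_A + u'} - \frac{G'_B}{G_B} \frac{s_B - s'}{u_B - u'} \\
& + \beta \left( \frac{u_A + u'}{s_A + s'} - \frac{u_B - u'}{s_B - s'} \right) + \frac{G'_A}{G_A} - \frac{G'_B}{G_B},
\end{aligned}
$$

and

$$
\begin{aligned}
\frac{\partial W}{\partial u'} = {} & -\beta \ln \frac{(u_A + u')(s_B - s')}{(u_B - u')(s_A + s')} + \ln \frac{G_A}{G_B} + (1-\beta) \left( \frac{s_A + s'}{u_A + u'} - \frac{s_B - s'}{u_B - u'} \right) \\
& + \frac{G'_B}{G_B} \frac{(s_B - s')^2}{(u_B - u')^2} - \frac{G'_A}{G_A} \frac{(s_A + s')^2}{(u_A + u')^2} + \frac{G'_B}{G_B} \frac{s_B - s'}{u_B - u'} - \frac{G'_A}{G_A} \frac{s_A + s'}{u_A + u'}.
\end{aligned}
$$

Now if

$$\frac{s_B - s'}{u_B - u'} = \frac{s_A + s'}{u_A + u'} = \frac{s_A + s_B}{u_A + u_B},$$

then $G_A = G_B$ and $G'_A / G_A = G'_B / G_B$, so that $\partial W / \partial s' = \partial W / \partial u' = 0$. So the first-order conditions hold. Using (33) we can show that the Hessian of W is negative semi-definite along a ray. Thus the optimal immigration flows equalize factor ratios.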
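A numerical illustration of this result follows. The sketch assumes logarithmic utility, G(r) = r^α with α + β < 1, and equal productivity terms in the two countries (normalized to 1); the endowments, the share of the world population placed in A, and the alternative flows are all made-up numbers. It compares welfare at a pair of flows that equalizes factor ratios with welfare at a grid of other flows, ignoring the IC constraints (32).

```python
import math

ALPHA, BETA = 0.1, 0.6   # illustrative values with ALPHA + BETA < 1

def country_welfare(s, u):
    """Sum of log wages of a country's residents (productivity term normalized to 1)."""
    r = s / u
    ln_w_s = math.log(BETA) + (ALPHA + BETA - 1.0) * math.log(r)
    ln_w_u = math.log(1.0 - BETA) + (ALPHA + BETA) * math.log(r)
    return s * ln_w_s + u * ln_w_u

def world_welfare(s_mig, u_mig, s_a, u_a, s_b, u_b):
    """W in (31) at theta = 1/2, with the common factor 1/2 dropped."""
    return (country_welfare(s_a + s_mig, u_a + u_mig)
            + country_welfare(s_b - s_mig, u_b - u_mig))

s_a, u_a, s_b, u_b = 6.0, 4.0, 2.0, 8.0   # assumed endowments
share_a = 0.8                             # share of each factor located in A afterwards
s_eq = share_a * (s_a + s_b) - s_a        # skilled flow that equalizes ratios
u_eq = share_a * (u_a + u_b) - u_a        # unskilled flow that equalizes ratios

w_eq = world_welfare(s_eq, u_eq, s_a, u_a, s_b, u_b)
others = [world_welfare(ds, du, s_a, u_a, s_b, u_b)
          for ds in (0.0, 0.5, 1.0, 1.5)
          for du in (0.0, 2.0, 4.0, 6.0)]
print("welfare with equalized ratios:", w_eq)
print("best of the alternative flows:", max(others))
```

Any other value of `share_a` that keeps both flows nonnegative gives the same welfare, which is the sense in which the optimum is a ray rather than a point.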


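Proof of Proposition 3 (case θ = 1).–In this case the welfare criterion reduces to $W = s_A \ln w^A_s + u_A \ln w^A_u$, or (dropping the A subscript from G),

$$
\begin{aligned}
W = {} & s_A \left( \ln\beta + (1-\beta) \ln(u_A + u') + (\beta - 1) \ln(s_A + s') + \ln G \right) \\
& + u_A \left( \ln(1-\beta) - \beta \ln(u_A + u') + \beta \ln(s_A + s') + \ln G \right).
\end{aligned}
$$

The derivative with respect to $s'$ is

$$
\begin{aligned}
\frac{\partial W}{\partial s'} &= \frac{-s_A}{s_A + s'} \left( 1 - \beta - \frac{G'}{G} \frac{s_A + s'}{u_A + u'} \right) + \frac{u_A}{s_A + s'} \left( \beta + \frac{G'}{G} \frac{s_A + s'}{u_A + u'} \right) \\
&= \left( \frac{s_A}{s_A + s'} \right) \left[ \beta \left( \frac{u_A}{s_A} - \frac{1-\beta}{\beta} \right) + \alpha \left( \frac{s_A + u_A}{s_A} \right) \right],
\end{aligned} \tag{37}
$$

and with respect to $u'$ it is

$$
\begin{aligned}
\frac{\partial W}{\partial u'} &= \frac{1}{u_A + u'} \left[ \left( (1-\beta) s_A - s_A \frac{G'}{G} \left( \frac{s_A + s'}{u_A + u'} \right) \right) + \left( -\beta u_A - u_A \frac{G'}{G} \left( \frac{s_A + s'}{u_A + u'} \right) \right) \right] \\
&= \frac{-s_A}{u_A + u'} \left[ \beta \left( \frac{u_A}{s_A} - \frac{1-\beta}{\beta} \right) + \alpha \left( \frac{s_A + u_A}{s_A} \right) \right].
\end{aligned} \tag{38}
$$

Then the claim holds if $\beta \left( \frac{u_A}{s_A} - \frac{1-\beta}{\beta} \right) + \alpha \left( \frac{s_A + u_A}{s_A} \right) > 0$, which reads $\frac{\beta}{r_A} - 1 + \beta + \alpha + \frac{\alpha}{r_A} > 0$, i.e., (36).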

APPENDIX 2: Proof of Lemma 2.

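We start by noting that since neither $\bar h_B$ nor $u_A = u_B = 1/(1 + \rho)$ depends on x,

$$\frac{dJ}{dx} = (\theta + [1-\theta] n x) \frac{dg_A}{dx} + (1-\theta) n (g_A - g_B), \tag{39}$$

and

$$\frac{d^2 J}{dx^2} = (\theta + [1-\theta] n x) \frac{d^2 g_A}{dx^2} + 2 (1-\theta) n \frac{dg_A}{dx}. \tag{40}$$

If the first-order condition obtained by setting (39) to zero does not hold for any x ∈ (0, 1), the solution must be at a corner and the claim is proved. Conversely, if it does hold at an interior point, we now show that (40) is positive there, so that the point cannot be a maximum. Note that

$$\frac{d^2 J}{dx^2} > 0 \iff \frac{d^2 g_A / dx^2}{dg_A / dx} < - \frac{2 (1-\theta) n}{\theta + (1-\theta) n x}. \tag{41}$$

We shall now show that if (39) holds, then (41) also does. Since $G(h) = h^{\alpha}$,

$$
\begin{aligned}
g_A &= \alpha \ln \frac{\delta (1 - u_A) \mu_A + n x \delta (1 - u_B) \mu_B}{1 + n x} = \alpha \ln \delta (1 - u_A) + \alpha \ln \frac{\mu_A + n x \mu_B}{1 + n x} \\
&= \text{constant} + \alpha \left( \ln (\mu_A + n x \mu_B) - \ln (1 + n x) \right).
\end{aligned}
$$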


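Therefore, in light of (26), $\frac{1}{\alpha} \frac{dg_A}{dx} = \frac{n}{z^{-1} + n x} - \frac{n}{1 + n x} < 0$, and

$$\frac{1}{\alpha} \frac{d^2 g_A}{dx^2} = \frac{d}{dx} \left( \frac{n}{z^{-1} + n x} - \frac{n}{1 + n x} \right) = - \left( \frac{n}{z^{-1} + n x} \right)^2 + \left( \frac{n}{1 + n x} \right)^2 > 0.$$

Therefore

$$
\begin{aligned}
\frac{d^2 g_A / dx^2}{dg_A / dx} &= \frac{- \left( \frac{n}{z^{-1} + n x} \right)^2 + \left( \frac{n}{1 + n x} \right)^2}{\frac{n}{z^{-1} + n x} - \frac{n}{1 + n x}} = \frac{\left[ \frac{n}{1 + n x} - \frac{n}{z^{-1} + n x} \right] \left[ \frac{n}{1 + n x} + \frac{n}{z^{-1} + n x} \right]}{\frac{n}{z^{-1} + n x} - \frac{n}{1 + n x}} \\
&= - \left[ \frac{n}{1 + n x} + \frac{n}{z^{-1} + n x} \right].
\end{aligned}
$$

Note that $\bar h_B = \delta (1 - u_B) \mu_B$, and that $\bar h_A = \frac{\delta (1 - u_A) \mu_A + n x \delta (1 - u_B) \mu_B}{1 + n x}$. Then,

$$
\begin{aligned}
g_A - g_B &= \alpha \ln \delta (1 - u_A) + \alpha \ln \frac{\mu_A + n x \mu_B}{1 + n x} - \alpha \ln \left( \delta (1 - u_B) \mu_B \right) \\
&= \alpha \ln \delta (1 - u_A) + \alpha \ln \frac{\mu_A + n x \mu_B}{(1 + n x) \, \delta (1 - u_B) \mu_B},
\end{aligned} \tag{42}
$$

so that, since $u_A = u_B$ and $\mu_A / \mu_B = z^{-1}$,

$$
\frac{g_A - g_B}{dg_A / dx} = \frac{\alpha \ln \frac{\mu_A / \mu_B + n x}{1 + n x}}{\alpha \left( \frac{n}{z^{-1} + n x} - \frac{n}{1 + n x} \right)} = \frac{\ln \frac{z^{-1} + n x}{1 + n x}}{\frac{n}{z^{-1} + n x} - \frac{n}{1 + n x}}.
$$

Now from the definitions of $\bar h_A$, $\bar h_B$, and g, we have, as computed above,

$$\frac{d^2 g_A / dx^2}{dg_A / dx} = - \left[ \frac{n}{1 + n x} + \frac{n}{z^{-1} + n x} \right].$$

So, for (41) to hold, one needs that

$$- \left[ \frac{n}{1 + n x} + \frac{n}{z^{-1} + n x} \right] < - \frac{2 (1-\theta) n}{\theta + (1-\theta) n x},$$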


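or that

$$\frac{1}{2} \left[ \frac{1}{1 + n x} + \frac{1}{z^{-1} + n x} \right] > \frac{1 - \theta}{\theta + (1-\theta) n x} = \frac{1}{\frac{\theta}{1-\theta} + n x}.$$

Now the first-order condition from (39) implies $\frac{\theta}{1-\theta} = - \left( n \frac{g_A - g_B}{g'_A} + n x \right)$, where $g'_A \equiv dg_A / dx$, so that the required inequality is equivalent to

$$\frac{1}{2} \left[ \frac{1}{1 + n x} + \frac{1}{z^{-1} + n x} \right] > \frac{1}{- \left( n \frac{g_A - g_B}{g'_A} + n x \right) + n x}$$

or, from (42), to

$$\frac{1}{2} \left[ \frac{1}{1 + n x} + \frac{1}{z^{-1} + n x} \right] > \frac{1}{- n \frac{g_A - g_B}{g'_A}} = \frac{-1}{\dfrac{\ln \frac{z^{-1} + n x}{1 + n x}}{\frac{1}{z^{-1} + n x} - \frac{1}{1 + n x}}} = \frac{- \left( \frac{1}{z^{-1} + n x} - \frac{1}{1 + n x} \right)}{\ln \frac{z^{-1} + n x}{1 + n x}}.$$

Now, this condition can be re-written as

$$\ln \frac{z^{-1} + n x}{1 + n x} > \frac{\frac{1}{1 + n x} - \frac{1}{z^{-1} + n x}}{\frac{1}{2} \left[ \frac{1}{1 + n x} + \frac{1}{z^{-1} + n x} \right]} = 2 \, \frac{\left( \frac{1}{1 + n x} \right)^2 - \left( \frac{1}{z^{-1} + n x} \right)^2}{\left( \frac{1}{1 + n x} + \frac{1}{z^{-1} + n x} \right)^2},$$

i.e.,

$$\ln \frac{1}{1 + n x} - \ln \frac{1}{z^{-1} + n x} > \frac{\frac{1}{1 + n x} - \frac{1}{z^{-1} + n x}}{\frac{1}{2} \left[ \frac{1}{1 + n x} + \frac{1}{z^{-1} + n x} \right]},$$

or, if we write $A = \frac{1}{1 + n x}$ and $B = \frac{1}{z^{-1} + n x}$, to

$$\ln A - \ln B = \ln \frac{A}{B} > 2 \, \frac{A - B}{A + B} = 2 \, \frac{\frac{A}{B} - 1}{\frac{A}{B} + 1}.$$

That is, we need, for all $y \equiv A / B > 1$, that

$$\ln y > 2 \, \frac{y - 1}{y + 1}. \tag{43}$$

The LHS and RHS of (43) are both zero at y = 1. Therefore it is enough to show that the first derivative of the LHS is more positive than the first derivative of the RHS for all y > 1. That is, it suffices to show that

$$\frac{1}{y} > 2 \left( \frac{1}{y + 1} - \frac{y - 1}{(y + 1)^2} \right) = \frac{2}{y + 1} \left( 1 - \frac{y - 1}{y + 1} \right) = \frac{2}{y + 1} \left( \frac{y + 1 - y + 1}{y + 1} \right) = \frac{4}{(y + 1)^2}.$$

So, we need to show that $(y + 1)^2 > 4y$, or that $y^2 + 2y + 1 > 4y$. But $y^2 + 1 - 2y = (y - 1)^2 > 0$. Therefore (41) holds. ∎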
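Inequality (43) and the derivative comparison used to prove it can be spot-checked numerically. The sketch below is only a sanity check on a handful of points, not a substitute for the argument; the function names and the grid of y values are assumptions of the example.

```python
import math

def gap(y):
    """LHS minus RHS of (43): ln y - 2 (y - 1) / (y + 1)."""
    return math.log(y) - 2.0 * (y - 1.0) / (y + 1.0)

def gap_derivative(y):
    """Derivative of the gap: 1 / y - 4 / (y + 1)^2."""
    return 1.0 / y - 4.0 / (y + 1.0) ** 2

# Points strictly above 1, from close to 1 up to large values.
ys = [1.0 + 10.0 ** k for k in range(-2, 4)]

for y in ys:
    closed_form = (y - 1.0) ** 2 / (y * (y + 1.0) ** 2)   # the derivative, factored
    print(f"y={y:10.3f} gap={gap(y):.3e} derivative={gap_derivative(y):.3e} "
          f"factored={closed_form:.3e}")
```

The gap is zero at y = 1 and its derivative equals $(y-1)^2 / [y (y+1)^2]$, which is positive for every y > 1, so the gap is positive on the whole range.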


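Derivation of (27).–Consider the set of (θ, z) pairs at which J(1) = J(0). We shall show that the boundary of the two regions in (27) is the indifference curve

$$\theta = \frac{n \ln \frac{z^{-1} + n}{1 + n}}{\ln z^{-1} + (n - 1) \ln \frac{z^{-1} + n}{1 + n}}.$$

First note from (20) that $\bar h_A$ is proportional to $1 - u_A$ $(= 1 - u_B)$, and so is $\bar h_B$. Neither $u_A$ nor $u_B$ depends on x, and after being substituted into $g_A$ and $g_B$ they become additive constants. Therefore J(1) = J(0) is equivalent to

$$g\!\left( \frac{\mu_A + n \mu_B}{1 + n} \right) (\theta + (1-\theta) n) = \theta \, g(\mu_A) + (1-\theta) \, n \, g(\mu_B).$$

Since $g(s) = \alpha \ln s$, the parameter α cancels and upon exponentiating both sides we have

$$\left( \frac{\mu_A + n \mu_B}{1 + n} \right)^{\theta + n(1-\theta)} = \mu_A^{\theta} \, \mu_B^{n(1-\theta)}.$$

Now $\mu_A = z^{-1} \mu_B$, so that this reduces to

$$\left( \frac{z^{-1} + n}{1 + n} \right)^{\theta + (1-\theta) n} = z^{-\theta},$$

i.e.,

$$[\theta + (1-\theta) n] \ln \left( \frac{z^{-1} + n}{1 + n} \right) = -\theta \ln z,$$

i.e.,

$$\theta \left[ (1 - n) \ln \left( \frac{z^{-1} + n}{1 + n} \right) + \ln z \right] + n \ln \left( \frac{z^{-1} + n}{1 + n} \right) = 0,$$

i.e.,

$$\theta = \frac{n \ln \left( \frac{z^{-1} + n}{1 + n} \right)}{\ln z^{-1} + (n - 1) \ln \left( \frac{z^{-1} + n}{1 + n} \right)}.$$

References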

[1] Amuedo-Dorantes, Catalina and Cynthia Bansak. “Illegal Border Crossings Attempts Following Apprehension.” San Diego State University, January 2007.

[2] Benhabib, Jess. “On the political economy of immigration.” European Economic Review 40, no. 9 (December 1996): 1737-1743.

[3] Butcher, Kristin and Anne Morrison Piehl. “Recent Immigrants: Unexpected Implications for Crime and Incarceration.” Industrial and Labor Relations Review 51, no. 4 (1998): 654-79.

[4] Caponi, Vincenzo. “Intergenerational Transmission of Abilities and Self Selection of Mexican Immigrants.” International Economic Review (Accepted). Older version: DP 2431, IZA 2006.

[5] Caponi, Vincenzo. “Heterogeneous Human Capital and Migration: Who Migrates from Mexico to the US?” Annales d’Economie et de Statistique, Forthcoming. Older version: DP 2446, IZA 2006(a).

[6] Clark, Gregory. “Why Isn’t the Whole World Developed? Lessons from the Cotton Mills.” Journal of Economic History 47, no. 1 (March 1987): 141-73.

[7] Coleman, James. “Social Capital in the Creation of Human Capital.” American Journal of Sociology 94, Supplement: Organizations and Institutions: Sociological and Economic Approaches to the Analysis of Social Structure (1988): S95-S120.

[8] Easterly, William. The White Man’s Burden: Why The West’s Efforts To Aid The Rest Have Done So Much Ill And Little Good. Penguin Press, New York, 2006.

[9] Eeckhout, Jan and Boyan Jovanovic. “Occupational Choice and Development.” New York University, September 2009.

[10] Friedberg, Rachel, and Jennifer Hunt. “The Impact of Immigrants on Host Country Wages, Employment and Growth.” Journal of Economic Perspectives 9, no. 2 (Spring 1995): 23-44.

[11] Gemici, Ahu. “Family Migration and Labor Market Outcomes.” NYU, June 2008.

[12] Klein, Paul and Gustavo J. Ventura. “TFP Differences and the Aggregate Effects of Labor Mobility in the Long Run.” The B.E. Journal of Macroeconomics, Berkeley Electronic Press 7, no. 1 (2007).

[13] Kremer, Michael and Eric Maskin. “Globalization and Inequality.” Harvard wp, 2006.

[14] Lucas, Robert E., Jr. “On the Mechanics of Economic Development.” Journal of Monetary Economics 22, no. 1 (July 1988): 3-42.

[15] Lucas, Robert E., Jr. “Why Doesn’t Capital Flow from Rich to Poor Countries?” A.E.A. Papers and Proceedings 80, no. 2 (May 1990): 82-86.

[16] Rauch, James E. “Productivity Gains from Geographic Concentration of Human Capital: Evidence from the Cities.” Journal of Urban Economics 34, Issue 3 (November 1993): 380-400.

[17] Rendon, Silvio and Alfredo Cuecuecha. “International Job Search: Mexicans In and Out of the US.” IZA DP #3219, December 2007.

[18] Sala-i-Martin, Xavier. “The World Distribution of Income.” Quarterly Journal of Economics 121, no. 2 (May 2006): 351-97.