Redistribution and Social Insurance

Mikhail Golosov
Princeton

Maxim Troshkin
Cornell

Aleh Tsyvinski∗
Yale

October 2013

Abstract

We study optimal redistribution and insurance with dynamic idiosyncratic shocks. We show the forces that determine the optimal labor distortions and derive closed-form expressions for their limiting behavior. The labor distortions for high shocks are driven by labor elasticity and higher moments of the shock process; the labor distortions for low shocks are driven by the persistence of the shock process, redistributive objectives, and past distortions. We calibrate our model using newly available estimates of idiosyncratic shocks and find that optimal labor distortions are approximately U-shaped, while savings distortions are flat except for low realizations of shocks.

∗ The previous draft of this paper was circulated under the title "Optimal Dynamic Taxes". We thank Stefania Albanesi, Fernando Alvarez, V.V. Chari, Dirk Krueger, Larry Jones, Igor Livshits, Stephen Morris, James Poterba, Emmanuel Saez, Ali Shourideh, Nancy Qian, Hongda Xiao, Pierre Yared, and numerous seminar and conference audiences. Marianne Bruins, James Duffy and Nicolas Werquin provided outstanding research assistance. Golosov and Tsyvinski thank EIEF for hospitality and NSF for support. Troshkin thanks the Minneapolis Fed for hospitality and support. Tsyvinski thanks IMES of the Bank of Japan and the John Simon Guggenheim Foundation.

A sizeable New Dynamic Public Finance (NDPF) literature studies redistribution and social insurance in lifecycle models.¹ This literature extends the classic Mirrlees equity-efficiency trade-off to dynamic settings, where agents are heterogeneous in their abilities and receive stochastic shocks over their lifetime. Despite significant progress in recent years, relatively little is known about the general forces determining optimal allocations and their qualitative and quantitative properties. This paper takes a step toward filling that gap.

We study a lifecycle economy with individuals who are ex ante heterogeneous in their abilities and experience idiosyncratic shocks to their abilities over time. We make two contributions. Our theoretical contribution characterizes the key forces determining the size of optimal distortions and derives transparent, easily interpretable expressions for the labor distortions, for high and low realizations of idiosyncratic shocks, in terms of empirically measurable parameters and the moments of the stochastic shock process. Our quantitative contribution computes optimal labor and savings distortions in a lifecycle model calibrated to newly available data on the labor earnings process for a random sample of 10 percent of the U.S. working-age male population. The data set allows highly accurate estimates of the moments of the stochastic process (Guvenen et al. (2013)), which emerge from our analysis as key parameters determining the properties of the optimum.

Most of our theoretical analysis focuses on characterizing the properties of optimal labor distortions, or wedges, between marginal utility of consumption and leisure. We show that the labor distortion in a given period has two components: an intratemporal component that provides insurance against new shocks in that period, and an intertemporal component that relaxes incentive constraints in the previous periods and helps reduce the costs of insurance provision against idiosyncratic shocks. The intratemporal component has the same general structure as optimal labor distortions in static models, such as Mirrlees (1971) and Saez (2001). The intertemporal component is new to dynamic settings. It is driven by a specific form of a likelihood ratio of the stochastic idiosyncratic shock process, which is equal to the persistence of the shock when preferences are separable between consumption and labor, and when idiosyncratic shocks are drawn from a log-normal distribution (or a mixture of log-normals).

¹ See, for example, Golosov, Kocherlakota and Tsyvinski (2003) or reviews in Golosov, Tsyvinski and Werning (2006) and Kocherlakota (2010).

We show that intra- and intertemporal components affect the high and the low realizations of shocks differently. In a benchmark model with separable consumption and leisure, the size of the labor distortions for high realizations of shocks is driven primarily by the intratemporal component, while the size of the labor distortions for the low realizations is driven primarily by the intertemporal component. We use this insight to derive explicit expressions for the labor distortions conditional on high and low realizations of idiosyncratic shocks. The labor distortions for high shocks depend on the Frisch elasticity of labor supply and on a measure of the dispersion of shocks: the variance when shocks are drawn from a log-normal distribution, the kurtosis when shocks are drawn from a mixture of normals, or the Pareto tail parameter when shocks are drawn from an asymptotically Pareto distribution. The labor distortions for low realizations of shocks are proportional to shock persistence, past labor distortions, and the size of the fall in consumption due to the shock. In addition to providing transparent, easily interpretable expressions, the decomposition into low and high realizations facilitates an understanding of quantitative results by shedding light on how different parameters affect the size and shape of the distortions. The expressions imply, for instance, that the planner's redistribution objective primarily affects the labor distortions for low, but not for high, shocks.

We also show that an additional force emerges in dynamic economies when consumption and labor are complements. Complementarity exacerbates the distortion and, as a result, motivates the social planner to shift from intratemporal provision of incentives to intertemporal provision, especially for the high skilled. We show that for a range of preferences with complementarities and no income effects, such as those in Greenwood, Hercowitz and Huffman (1988), this implies that the optimal labor distortions for highly skilled individuals are very low (and asymptotically zero) and savings distortions are quantitatively high, independently of the nature of the underlying shock process.

Our analysis emphasizes the importance of the properties of the distribution from which individuals draw idiosyncratic shocks and, in particular, its higher moments. In the quantitative analysis of a life cycle, we use newly available estimates of the idiosyncratic labor income process from high-quality U.S. administrative data by Guvenen et al. (2013) and Guvenen, Ozkan and Song (2013). This idiosyncratic shock process implies that optimal labor distortions are approximately U-shaped when preferences are separable and isoelastic. The dip in the distortion is around the expected realization of the shock, conditional on past information. Savings distortions are approximately flat, except for the low skilled. As explained by our theoretical analysis, both the shape and the size of optimal labor distortions depend significantly on the kurtosis of the idiosyncratic shock, which, in Guvenen et al. (2013), is estimated to be over 20 for prime-age males. In contrast, the commonly used assumption of log-normal shocks implies approximately flat labor distortions with much higher optimal savings distortions.

A number of papers are related to our work. Our theoretical and quantitative analyses are built on the recursive approach developed in Kapicka (2013) and Pavan, Segal and Toikka (2010). An important contribution of Farhi and Werning (2013) derives a formula describing the dynamics of labor income taxes in continuous and discrete cases, provides a simulation of a lifecycle economy, and develops additional insights using a continuous time approach. Most of their analysis studies the time-series properties, focusing on the persistence and trend in the expected labor wedge. In contrast, our work focuses on characterizing the cross-sectional properties of distortions for various realizations of shocks. Our main analytical findings (the decomposition of the labor distortion into the two components and the characterization of the properties of those components in terms of the primitives of the shock process and elasticities) are all new. Golosov, Kocherlakota and Tsyvinski (2003), Golosov and Tsyvinski (2006), Grochulski and Kocherlakota (2010), Kocherlakota (2005), and Werning (2009) are examples of earlier theoretical work examining different properties of the optimal distortions and their relationships to taxes.

Our quantitative analysis is also related to a number of earlier studies. Albanesi and Sleet (2006) provide a comprehensive numerical and theoretical study of optimal capital and labor taxes in a dynamic economy with i.i.d. shocks. Golosov, Tsyvinski and Werning (2006) is a two-period numerical study of the determinants of dynamic optimal taxation in the spirit of Tuomala (1990). Ales and Maziero (2007) numerically solve a version of a life cycle economy with i.i.d. shocks drawn from a discrete, two-type distribution, and find that the labor distortions are lower earlier in life. Weinzierl (2011), Fukushima (2010) and Farhi and Werning (2013) numerically solve for the optimal labor and savings distortions in dynamic economies. Most of this work assumes a log-normal distribution of idiosyncratic shocks and calibrates it using the PSID. Ours is the first study that uses U.S. administrative data and the higher moments, such as skewness and kurtosis, estimated from the data.

The rest of the paper is organized as follows. Section 1 describes the environment. Section 2 provides theoretical analysis and its quantitative illustrations. Section 3 analyzes a calibrated quantitative life cycle model. Section 4 concludes.

1 Environment

We consider an economy that lasts $T + 1$ periods, denoted by $t = 0, \ldots, T$.² Each agent's preferences are described by a time-separable utility function over consumption $c_t \ge 0$ and labor $l_t \ge 0$,

$$E_0 \sum_{t=0}^{T} \beta^t U(c_t, l_t), \tag{1}$$

where $\beta \in (0, 1)$ is a discount factor, $E_0$ is a period 0 expectation operator, and $U : \mathbb{R}^2_+ \to \mathbb{R}$.

In period $t = 0$, agents draw their initial type (skill), $\theta_0$, from a distribution $F_0(\theta)$. For $t \ge 1$, skills follow a Markov process $F_t(\theta|\theta_{t-1})$, where $\theta_{t-1}$ is the agent's skill realization in period $t - 1$. We denote the probability density function by $f_t(\theta|\theta_{t-1})$. For some parts of the analysis it will be convenient to assume that people retire at some period $\hat{T}$, in which case $F_t(0|\theta) = 1$ for all $\theta$ and all $t \ge \hat{T}$. We assume that $f_t$ is differentiable in both arguments for $t < \hat{T}$. Skills are non-negative: $\theta_t \in \Theta = \mathbb{R}_+$ for all $t$. The set of possible histories up to period $t$ is denoted by $\Theta^t$.

² The recursive formulation of the problem that follows makes it easy to extend the analysis to the case of infinitely lived agents. In fact, the calibration and numerical analysis are greatly simplified in the case of infinitely lived agents.

An agent of type $\theta_t$ who supplies $l_t$ units of labor produces $y_t = \theta_t l_t$ units of output. The skill shocks and the history of shocks are privately observed by the agent. Output $y_t = \theta_t l_t$ and consumption $c_t$ are publicly observed. In period $t$, the agent knows his skill realization only for the first $t$ periods, $\theta^t = (\theta_1, \ldots, \theta_t)$. Denote by $c_t(\theta^t) : \Theta^t \to \mathbb{R}_+$ the agent's allocation of consumption and by $y_t(\theta^t) : \Theta^t \to \mathbb{R}_+$ the agent's allocation of output in period $t$. Denote by $\sigma^t(\theta^t) : \Theta^t \to \Theta^t$ the agent's report in period $t$. Let $\Sigma^t$ be the set of all such reporting strategies in period $t$. Resources can be transferred between periods at rate $\delta > 0$. The observability of consumption implies that all savings are publicly observable. The social planner evaluates welfare using Pareto weights $\alpha : \Theta \to \mathbb{R}_+$, where $\alpha(\theta)$ is the weight assigned to an agent born in period 0 with type $\theta$. We normalize $\int_0^\infty \alpha(\theta)\, dF_0(\theta) = 1$. Social welfare is given by

$$\int_0^\infty \alpha(\theta) \left( E_0 \sum_{t=0}^{T} \beta^t U(c_t, l_t) \right) dF_0(\theta).$$

We denote partial derivatives of $U$ with respect to $c$ and $l$ as $U_c$ and $U_l$ and define all second derivatives and cross-partials accordingly. We make the following assumptions about $U$.

Assumption 1. $U$ is twice continuously differentiable in both arguments, satisfies $U_c > 0$, $U_l < 0$, $U_{cc} < 0$, $U_{ll} < 0$, $U_{cl} \ge 0$, and

$$\frac{\partial}{\partial \theta} \frac{U_y(c, y; \theta)}{U_c(c, y; \theta)} \ge 0.$$

These assumptions are standard. The last restriction is the single crossing property. The assumption that $U_{cl} \ge 0$ ensures that consumption and leisure are substitutes, which is generally considered to be the empirically relevant case (e.g. Browning, Hansen and Heckman (1999)).

We define two objects that are important for our characterization of labor distortions: the Frisch elasticity of labor supply $\varepsilon$ and the coefficient of consumption-labor complementarity $\gamma$:

$$\varepsilon = \frac{U_l}{U_{ll}\, l}, \qquad \gamma = \frac{U_{cl}\, l}{U_c}. \tag{2}$$
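As a minimal numerical sketch of definition (2), the snippet below approximates $\varepsilon$ and $\gamma$ by finite differences. It assumes the separable isoelastic utility that the paper introduces later in (21); the function names, parameter values, and step size are illustrative choices, not part of the original analysis.

```python
def utility(c, l, sigma=2.0, eps=0.5):
    """Separable isoelastic utility, as in equation (21); parameter values are illustrative."""
    return (c ** (1 - sigma) - 1) / (1 - sigma) - l ** (1 + 1 / eps) / (1 + 1 / eps)


def frisch_and_gamma(U, c, l, h=1e-4):
    """Approximate the Frisch elasticity and the complementarity coefficient of (2)
    with central finite differences (a hypothetical helper)."""
    U_c = (U(c + h, l) - U(c - h, l)) / (2 * h)
    U_l = (U(c, l + h) - U(c, l - h)) / (2 * h)
    U_ll = (U(c, l + h) - 2 * U(c, l) + U(c, l - h)) / h ** 2
    U_cl = (U(c + h, l + h) - U(c + h, l - h)
            - U(c - h, l + h) + U(c - h, l - h)) / (4 * h ** 2)
    frisch = U_l / (U_ll * l)   # epsilon = U_l / (U_ll * l)
    gamma = U_cl * l / U_c      # gamma = U_cl * l / U_c
    return frisch, gamma


def satisfies_assumption_2(frisch, gamma):
    """Assumption 2 of the text: 1 + 1/epsilon > gamma."""
    return 1 + 1 / frisch > gamma
```

For this utility the computed elasticity should be close to the parameter `eps` and the complementarity coefficient close to zero, since consumption and labor enter separably.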

Although these coefficients are endogenous, many commonly used specifications of the utility function imply that they are either constant or take a simple form. Throughout the paper we make the following assumption about preferences.

Assumption 2. $U$ is such that $1 + \frac{1}{\varepsilon} > \gamma$.

This mild assumption is satisfied for many functional forms, for example, in static economies when leisure is not an inferior good.

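The optimal allocations solve the dynamic mechanism design problem (see, e.g., Golosov, Kocherlakota and Tsyvinski (2003)):

$$\max_{\{c_t(\theta^t),\, y_t(\theta^t)\}_{\theta^t \in \Theta^t;\ t = 0, \ldots, T}} \int_0^\infty \alpha(\theta) \left( E_0 \left\{ \sum_{t=0}^{T} \beta^t U(c_t(\theta^t), y_t(\theta^t)/\theta_t) \right\} \right) dF_0(\theta) \tag{3}$$

subject to the incentive compatibility constraint:

$$E_0 \left\{ \sum_{t=0}^{T} \beta^t U(c_t(\theta^t), y_t(\theta^t)/\theta_t) \right\} \ge E_0 \left\{ \sum_{t=0}^{T} \beta^t U(c_t(\sigma^t(\theta^t)), y_t(\sigma^t(\theta^t))/\theta_t) \right\}, \quad \forall \sigma^T \in \Sigma^T, \tag{4}$$

and the feasibility constraint:

$$\int_0^\infty E_0 \left\{ \sum_{t=0}^{T} \delta^t c_t(\theta^t) \right\} dF_0(\theta) \le \int_0^\infty E_0 \left\{ \sum_{t=0}^{T} \delta^t y_t(\theta^t) \right\} dF_0(\theta). \tag{5}$$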

We follow Fernandes and Phelan (2000) and Kapicka (2013) to write the problem recursively. Here we briefly sketch the main steps and refer to the two papers for technical details. Constraint (4) can be written recursively as

$$U(c_t(\theta^t), y_t(\theta^t)/\theta_t) + \beta \omega_{t+1}(\theta^t|\theta_t) \ge U(c_t(\theta^{t-1}, \hat{\theta}), y_t(\theta^{t-1}, \hat{\theta})/\theta_t) + \beta \omega_{t+1}(\theta^{t-1}, \hat{\theta}|\theta_t), \quad \forall \hat{\theta}, \theta \in \Theta,\ \forall t \tag{6}$$

and

$$\omega_{t+1}(\theta^{t-1}, \hat{\theta}|\theta_t) = E_t \left\{ \sum_{s=t+1}^{T} \beta^{s-t-1} U(c_s(\hat{\theta}^s), y_s(\hat{\theta}^s)/\theta_s) \,\middle|\, \theta_t \right\},$$

where $\hat{\theta}^s = (\theta_0, \ldots, \theta_{t-1}, \hat{\theta}, \theta_{t+1}, \ldots, \theta_s)$, that is, all the histories in which the agent misreports his type once in the history $\theta^s$. It is possible to write the optimization problem recursively using $\omega(\hat{\theta}|\theta)$ as a state variable, following the methods developed by Fernandes and Phelan (2000). This problem, however, is intractable, since $\omega(\hat{\theta}|\theta)$ is a function of $(\hat{\theta}, \theta)$ and thus the state space becomes infinite dimensional. Kapicka (2013) and Pavan, Segal and Toikka (2010) further simplify this problem by replacing the global incentive constraints (6) with their local analogue, the first-order conditions, to obtain a more manageable recursive formulation. When "non-local" constraints do not bind, one needs to keep track of only the "on the path" promised utility $w(\theta) = \omega(\theta|\theta)$ and the utility from a local deviation $w_2(\theta) = \omega_2(\theta|\theta)$, where $\omega_2(\theta|\theta)$ is the derivative of $\omega$ with respect to its second argument evaluated at $(\theta|\theta)$. The maximization problem (3) can be re-written for $t \ge 1$ as

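$$V_t(w, w_2, \theta_-) = \min_{c, y, u, w, w_2} \int_0^\infty \left( c(\theta) - y(\theta) + \delta V_{t+1}(w(\theta), w_2(\theta), \theta) \right) f_t(\theta|\theta_-)\, d\theta \tag{7}$$

subject to

$$u'(\theta) = U_\theta(c(\theta), y(\theta)/\theta) + \beta w_2(\theta), \tag{8}$$

$$w = \int_0^\infty u(\theta) f_t(\theta|\theta_-)\, d\theta, \tag{9}$$

$$w_2 = \int_0^\infty u(\theta) f_{2,t}(\theta|\theta_-)\, d\theta, \tag{10}$$

$$u(\theta) = U(c(\theta), y(\theta)/\theta) + \beta w(\theta). \tag{11}$$

The value function $V_{T+1}$ as well as $w$ and $w_2$ disappear from this formulation in the last period.³ The value function $V_0$ in period $t = 0$ takes the form

$$V_0(w_0) = \min_{c, y, u, w, w_2} \int_0^\infty \left( c(\theta) - y(\theta) + \delta V_1(w(\theta), w_2(\theta), \theta) \right) f_0(\theta)\, d\theta \tag{12}$$

subject to (8), (11) and

$$w_0 = \int_0^\infty \alpha(\theta) u(\theta) f_0(\theta)\, d\theta. \tag{13}$$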

There are four state variables in this recursive formulation: $w$ is the promised utility associated with the promise-keeping constraint (9); $w_2$ is the state variable associated with the threat-keeping constraint (10); $\theta_-$ is the reported type in period $t - 1$; and age $t$. The initial value $w_0$ is the largest solution to the equation $V_0(w_0) = 0$.⁴

The first-order approach is valid only if at the optimum the local constraints (8) are sufficient to guarantee that the global incentive constraints (6) are satisfied. It is well known that there are no general conditions either in the static mechanism design problem with multiple goods (see, e.g., Mirrlees (1976)) or in dynamic models (see, e.g., Kapicka (2013)) which guarantee that only local incentive constraints bind. In the next lemma we give sufficient conditions that the optimal allocations must satisfy to guarantee that the local constraints (8) imply (6).

³ This discussion is given for the case when there is no retirement. If there is a retirement period, the value function $V_{\hat{T}}(w)$ is equal to the present value of resources needed to provide $w$ utils to a retired agent between periods $\hat{T}$ and $T$. In this case the choice variable $w_2$ disappears from the recursive formulation in period $\hat{T} - 1$. The rest of the formulation is unchanged.

⁴ If we add exogenous government expenditures to our model, then $w_0$ should satisfy $V_0(w_0) = -G$, where $G$ is the present value of such expenditures.

Assumption 3. In the optimum, $c(\cdot)$ and $\omega(\cdot|\theta)$ are piecewise $C^1$ and increasing for all $\theta$, and the derivative of $\omega(\hat{\theta}|\theta)$ with respect to $\hat{\theta}$ (when it exists), $\omega_1(\hat{\theta}|\theta)$, is increasing in $\theta$ for all $\hat{\theta}$.

Lemma 1. If Assumptions 1 and 3 are satisfied, then (8) implies (6).

This Lemma provides a simple set of sufficient conditions to check in numerical analysis to verify that global incentive constraints are satisfied. To see this, note that equation (6) implies (8), and therefore maximization problem (12) is a relaxed version of the original maximization problem (3). Thus, the value of (12) must be weakly higher than that of (3). If a solution to (12) satisfies Assumption 3, it must also satisfy all the additional incentive constraints by Lemma 1 and hence be a solution to (3).
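The numerical check suggested by Lemma 1 can be sketched as follows. This is only an illustration under stated assumptions: the solution of the relaxed problem is taken to be tabulated on an increasing grid of types, `omega[j][i]` is assumed to hold $\omega$ evaluated at report grid point `i` for true type `j`, `omega1[j][i]` its derivative with respect to the report, and "increasing" is interpreted weakly. All names are hypothetical.

```python
def is_increasing(values, tol=0.0):
    """True if the sequence is weakly increasing (up to a tolerance)."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


def satisfies_assumption_3(c, omega, omega1, tol=0.0):
    """Grid-based check of the monotonicity requirements in Assumption 3.

    c[i]         : consumption at type grid point i
    omega[j][i]  : continuation utility from reporting grid point i when the true type is j
    omega1[j][i] : derivative of omega with respect to the report, same indexing
    """
    n = len(c)
    c_ok = is_increasing(c, tol)
    # omega(. | theta) increasing in the report, for every true type
    omega_ok = all(is_increasing(row, tol) for row in omega)
    # omega_1(report | theta) increasing in the true type, for every report
    omega1_ok = all(is_increasing([row[i] for row in omega1], tol) for i in range(n))
    return c_ok and omega_ok and omega1_ok
```

The check is necessarily approximate, since it inspects the candidate solution only at grid points; a finer grid gives a more reliable verification.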

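The focus of our analysis is the qualitative and quantitative characterization of the optimal labor and savings distortions, or wedges. For an agent with the history of shocks $\theta^t$ at time $t$, we define a labor distortion, $\tau^y_t(\theta^t)$, as

$$1 - \tau^y_t(\theta^t) \equiv \frac{-U_l(c_t(\theta^t), y_t(\theta^t)/\theta_t)}{\theta_t U_c(c_t(\theta^t), y_t(\theta^t)/\theta_t)} \tag{14}$$

and a savings distortion, $\tau^s_t(\theta^t)$, as

$$1 - \tau^s_t(\theta^t) = \left( \frac{\delta}{\beta} \right) \frac{U_c(c_t(\theta^t), y_t(\theta^t)/\theta_t)}{E_t \left\{ U_c(c_{t+1}(\theta^{t+1}), y_{t+1}(\theta^{t+1})/\theta_{t+1}) \right\}}. \tag{15}$$

For some results it will also be useful to define a life-time savings distortion, $\bar{\tau}^s_t$, as

$$1 - \bar{\tau}^s_t(\theta^t) = \left( \frac{\delta}{\beta} \right)^{T-t} \frac{U_c(c_t(\theta^t), y_t(\theta^t)/\theta_t)}{E_t \left\{ U_c(c_T(\theta^T), y_T(\theta^T)/\theta_T) \right\}}.$$

In what follows we sometimes use the notation $\tau_t(\theta_t|\theta^{t-1})$ instead of $\tau_t(\theta^t)$ to emphasize that the distortion in period $t$ for type $\theta_t$ depends on the past history $\theta^{t-1}$.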
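The wedge definitions can be illustrated with a short sketch that evaluates them at a given allocation. It assumes the separable isoelastic utility of (21), so that $U_c = c^{-\sigma}$ and $-U_l = l^{1/\varepsilon}$, a discrete distribution for next-period consumption, and a savings wedge of the form $1 - \tau^s_t = (\delta/\beta)\, U_c / E_t\{U_c'\}$. The function names and numerical values are hypothetical.

```python
def labor_wedge(c, y, theta, sigma, eps):
    """tau^y from (14): 1 - tau^y = -U_l / (theta * U_c), with l = y / theta."""
    l = y / theta
    marginal_disutility = l ** (1.0 / eps)   # -U_l under isoelastic labor disutility
    marginal_utility = c ** (-sigma)         # U_c under CRRA consumption utility
    return 1.0 - marginal_disutility / (theta * marginal_utility)


def savings_wedge(c_now, c_next, probs, beta, delta, sigma):
    """tau^s from (15), assuming 1 - tau^s = (delta / beta) * U_c(now) / E[U_c(next)]."""
    expected_mu = sum(p * cn ** (-sigma) for p, cn in zip(probs, c_next))
    return 1.0 - (delta / beta) * c_now ** (-sigma) / expected_mu
```

With $\beta = \delta$ and no consumption risk, the savings wedge computed this way is zero; introducing dispersion in next-period consumption around the same mean makes it positive, in line with the discussion of savings distortions in the text.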

2 Characterization of distortions

In this section, we consider a general formulation of our problem. We show that the optimal labor distortion in period $t$ is driven by a combination of two forces: an intratemporal component that provides insurance against shocks in period $t$, and an intertemporal component that relaxes incentive constraints in previous periods. In the sections that follow, we provide a tight characterization of these forces for specific utility functions and shock processes.

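Maximization problem (12) implies that the optimal labor distortion in period 0 is given by

$$\frac{\tau^y_0(\theta)}{1 - \tau^y_0(\theta)} = \left( 1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta \right) \frac{1 - F_0(\theta)}{\theta f_0(\theta)} \times \int_\theta^\infty \frac{U_c(\theta)}{U_c(x)} \exp\left( -\int_\theta^x \gamma_{\tilde{x}} \frac{d\tilde{x}}{\tilde{x}} \right) \left( 1 - \lambda_{1,0}\, \alpha(x) U_c(x) \right) \frac{f_0(x)\, dx}{1 - F_0(\theta)}, \tag{16}$$

where $\varepsilon_\theta$ and $\gamma_\theta$ refer to the Frisch elasticity and the coefficient of complementarity defined in (2) and evaluated at the optimal allocations for type $\theta$, and $\lambda_{1,0}$ is the Lagrange multiplier on constraint (13). The expression for $\lambda_{1,0}$ as well as all the other derivations are provided in the online appendix.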

The expression for the optimal labor distortions in period 0 is identical to the optimality condition in the static model, e.g. equations (16) and (17) in Saez (2001).⁵ The same general forces determine the shape of the optimal labor distortions in the two models: the elasticity of labor supply and the coefficient of complementarity, the tail ratio $\frac{1 - F_0(\theta)}{\theta f_0(\theta)}$, and the redistributive objectives of the government.

⁵ Saez (2001) derived conditions for the optimal labor distortions in terms of compensated and uncompensated wage elasticities of labor supply, $\zeta^c_\theta$ and $\zeta^u_\theta$. In the static model the two expressions are the same once we observe that 1 − ζcθζuθ = γθ. In dynamic models the expression for the optimal distortions cannot be expressed solely in terms of elasticities of labor supply due to the response of savings (see, e.g., Golosov, Tsyvinski and Werquin (2013)). Deriving the expression for labor distortions in terms of the Frisch elasticity and the coefficient of complementarity allows us to better capture the underlying economic forces in dynamic settings.

Despite the qualitative similarity, there are some important differences between the two models. First, the shape of labor distortions in period 0 in the dynamic model is determined by the initial distribution of types $F_0(\theta)$, while a cross-sectional distribution is typically used in the static model. The cross-sectional distribution is more dispersed than $F_0$ because it consists of the initial distribution of types plus additional shocks that an individual experiences over his lifetime. As we discuss below, many commonly used preferences imply that distortions are smaller if shocks are drawn from a distribution with less dispersion, and hence distortions are generally lower if the initial, rather than the cross-sectional, distribution is used. Second, several objects on the right hand side of (16) are endogenous. This may lead to qualitative differences between the static and dynamic models. In particular, we show that if consumption and labor are complements, it may be optimal to reduce some of the labor distortions in period 0 and provide incentives in the future, especially for high realizations of $\theta_0$.

The optimal labor distortion in period $t \ge 1$ is given by:

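$$
\begin{aligned}
\frac{\tau^y_t(\theta|\theta^{t-1})}{1 - \tau^y_t(\theta|\theta^{t-1})} ={}& \left( 1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta \right) \frac{1 - F_t(\theta|\theta_{t-1})}{\theta f_t(\theta|\theta_{t-1})} \\
& \times \int_\theta^\infty \frac{U_c(\theta)}{U_c(x)} \exp\left( -\int_\theta^x \gamma_{\tilde{x}} \frac{d\tilde{x}}{\tilde{x}} \right) \left( 1 - \lambda_{1,t} U_c(x) \right) \frac{f_t(x|\theta_{t-1})\, dx}{1 - F_t(\theta|\theta_{t-1})} \\
& + \frac{\beta}{\delta}\, \frac{\tau^y_{t-1}(\theta^{t-1})}{1 - \tau^y_{t-1}(\theta^{t-1})}\, \frac{1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta}{1 + \frac{1}{\varepsilon_{\theta_{t-1}}} - \gamma_{\theta_{t-1}}}\, \frac{U_c(\theta)}{U_c(\theta^{t-1})}\, \frac{\theta_{t-1}}{\theta}\, \frac{1}{f_t(\theta|\theta_{t-1})} \int_\theta^\infty \exp\left( -\int_\theta^x \gamma_{\tilde{x}} \frac{d\tilde{x}}{\tilde{x}} \right) f_{2,t}(x|\theta_{t-1})\, dx.
\end{aligned}
\tag{17}
$$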

The optimal labor distortion in period $t$ is the sum of two terms. The first term on the right hand side of (17) has the same general form as (16) when the social planner is Utilitarian, $\alpha(\theta) = 1$ for all $\theta$. As in the case of the Utilitarian planner who provides the optimal insurance against ex-ante, period 0 shocks, this term stems from the need to provide insurance against idiosyncratic shocks realized in period $t$. We refer to this term as the intratemporal, or insurance, component of the labor wedge.

The second term on the right hand side of (17) is new and it depends on distortions in previous periods. We refer to it as the intertemporal, or dynamic, component of the labor wedge. It depends on the distortions in the previous period, $\frac{\tau_{t-1}(\theta^{t-1})}{1 - \tau_{t-1}(\theta^{t-1})}$, and on the relative elasticities of labor supply in the current and previous periods, $\frac{1 + 1/\varepsilon_\theta - \gamma_\theta}{1 + 1/\varepsilon_{\theta_{t-1}} - \gamma_{\theta_{t-1}}}$, since these elasticities measure the relative deadweight costs of distortions between the periods. It also depends on the persistence and higher moments of idiosyncratic shocks, captured by $\frac{\theta_{t-1}}{\theta} \frac{1}{f_t(\theta|\cdot)} \int_\theta^\infty \exp\left( -\int_\theta^x \gamma_{\tilde{x}}\, d\ln\tilde{x} \right) f_{2,t}(x|\cdot)\, dx$. When the idiosyncratic shocks are persistent, the planner can reduce labor distortions today by postponing them into the future. We show that this expression simplifies significantly for many commonly used stochastic processes.

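One way to understand the intuition for the intertemporal component is to write a Hamiltonian for (7) with $-\lambda_{2,t}$ being a Lagrange multiplier on (10). Simple algebra shows that $u$ in the Hamiltonian is multiplied by a term $\alpha(\theta) = \left( 1 - \frac{\lambda_{2,t}}{\lambda_{1,t}} \frac{f_{2,t}(\theta|\theta_{t-1})}{f_t(\theta|\theta_{t-1})} \right)$ and $\tau^y_t$ can be written as

$$\frac{\tau^y_t(\theta|\theta^{t-1})}{1 - \tau^y_t(\theta|\theta^{t-1})} = \left( 1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta \right) \frac{1 - F_t(\theta|\theta_{t-1})}{\theta f_t(\theta|\theta_{t-1})} \times \int_\theta^\infty \frac{U_c(\theta)}{U_c(x)} \exp\left( -\int_\theta^x \gamma_{\tilde{x}} \frac{d\tilde{x}}{\tilde{x}} \right) \left( 1 - \lambda_{1,t}\, \alpha(x) U_c(x) \right) \frac{f_t(x|\theta_{t-1})\, dx}{1 - F_t(\theta|\theta_{t-1})}. \tag{18}$$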

In the natural case where provision of promised utility is costly and the incentive constraints in history $\theta^{t-1}$ bind downward, $\lambda_{2,t}/\lambda_{1,t} > 0$. Therefore, $\alpha(\theta)$ assigns a weight greater than 1 to the realizations of $\theta_t$ that are more likely to occur for type $\theta_{t-1}$ than for type $\theta_{t-1} + \Delta$ for small $\Delta > 0$ (those with $f_{2,t} < 0$), and a weight less than 1 to the realizations of $\theta_t$ that are more likely to occur for $\theta_{t-1} + \Delta$ (those with $f_{2,t} > 0$). When the planner weighs realizations of $\theta_t$ with such $\alpha(\theta_t)$, it negatively affects the truth-telling agent $\theta_{t-1}$ less than the agent $\theta_{t-1} + \Delta$ who misreports his type. This relaxes the incentive constraint of type $\theta_{t-1} + \Delta$ and allows better insurance in period $t - 1$. The larger the difference in the likelihood of realization of $\theta_t$ between a deviating agent and a truth-telling one, $f_{2,t}/f_t$, the lower the weight $\alpha(\theta_t)$ the planner assigns to such realizations of $\theta_t$.
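To make the weighting concrete, the sketch below evaluates $\alpha(\theta) = 1 - (\lambda_{2,t}/\lambda_{1,t})\, f_{2,t}/f_t$ for the special case of the log-normal AR(1) process used later in the paper, for which $f_{2,t}/f_t = \rho z / (\sigma \theta_{t-1})$ with $z = (\ln\theta - \rho \ln\theta_{t-1})/\sigma$. The multiplier ratio is an endogenous object of the planner's problem; here it is simply a hypothetical positive input, and all names and numbers are illustrative.

```python
import math


def log_density(theta, theta_prev, rho, sd):
    """ln f_t(theta | theta_prev) for ln(theta_t) = rho * ln(theta_prev) + N(0, sd^2)."""
    z = (math.log(theta) - rho * math.log(theta_prev)) / sd
    return -0.5 * z * z - math.log(sd * theta * math.sqrt(2.0 * math.pi))


def likelihood_ratio(theta, theta_prev, rho, sd):
    """f_{2,t} / f_t in closed form: derivative of ln f_t with respect to theta_prev."""
    z = (math.log(theta) - rho * math.log(theta_prev)) / sd
    return rho * z / (sd * theta_prev)


def planner_weight(theta, theta_prev, rho, sd, multiplier_ratio):
    """alpha(theta) = 1 - (lambda_2 / lambda_1) * f_2 / f; multiplier_ratio is assumed positive."""
    return 1.0 - multiplier_ratio * likelihood_ratio(theta, theta_prev, rho, sd)
```

With a positive multiplier ratio, realizations below the conditional median $\theta_{t-1}^{\rho}$ receive a weight above 1 and realizations above it receive a weight below 1, and the weight falls monotonically in $\theta$.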

We conclude this section with a general result about optimality of savings distortions.

Proposition 1. Suppose that Assumptions 1 and 2 hold. Suppose that $F_T(0|\theta) = 1$ for all $\theta$. Then $\tau^y_t(\theta^t) \ge 0$ implies that the life-time savings distortion satisfies $\bar{\tau}^s_t(\theta^t) \ge 0$, with strict inequality if the variance of consumption in period $T$ conditional on information in $\theta^t$ is positive, $\mathrm{var}_t(c_T) > 0$.

When preferences are separable, it is well known (see, e.g., Golosov, Kocherlakota and Tsyvinski (2003)) that it is optimal to have a positive savings distortion in all periods. This proposition provides a version of that result for all preferences with weak complementarity between consumption and labor. Note that $\bar{\tau}^s_t(\theta^t) > 0$ implies that some savings distortions following history $\theta^t$ must be strictly positive. By the law of iterated expectations

$$\frac{1}{1 - \bar{\tau}^s_t} = E_t \left[ \frac{1}{1 - \tau^s_t} \times \cdots \times \frac{1}{1 - \tau^s_{T-1}} \right],$$

therefore, $\bar{\tau}^s_t > 0$ if there is a positive savings distortion in at least some states in the future.

2.1 Log-normal shocks with separable preferences

This section makes two simplifying assumptions: that idiosyncratic shocks are log-normal, and that preferences are separable. We relax each of these assumptions in the sections that follow. The analysis there shows that the economic forces that can be transparently illustrated for log-normal shocks continue to hold for richer and more realistic stochastic processes. Additionally, complementarity between consumption and labor introduces an additional element that may substantially change the qualitative properties of labor distortions.

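In this section, we assume that $\theta_0$ is drawn from an arbitrary distribution $F_0$ and in all subsequent periods the law of motion for $\theta_t$ is given by

$$\ln\theta_t = \rho \ln\theta_{t-1} + \varepsilon_t, \tag{19}$$

where $\varepsilon_t \sim N(0, \sigma^2)$.⁶ The density function $f_t(\theta|\theta_{t-1})$ implied by this distribution satisfies

$$\frac{\theta_{t-1}}{\theta} \frac{1}{f_t(\theta|\theta_{t-1})} \int_\theta^\infty f_{2,t}(x|\theta_{t-1})\, dx = \rho. \tag{20}$$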
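Property (20) is easy to confirm numerically. The sketch below uses the fact that $\int_\theta^\infty f_{2,t}(x|\theta_{t-1})\, dx$ is the derivative of the survival function $1 - F_t(\theta|\theta_{t-1})$ with respect to $\theta_{t-1}$, and approximates that derivative by a central finite difference. The function names, the parameter values, and the step size are illustrative assumptions.

```python
import math


def survival(theta, theta_prev, rho, sd):
    """1 - F_t(theta | theta_prev) when ln(theta_t) = rho * ln(theta_prev) + N(0, sd^2)."""
    z = (math.log(theta) - rho * math.log(theta_prev)) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def density(theta, theta_prev, rho, sd):
    """Log-normal conditional density f_t(theta | theta_prev)."""
    z = (math.log(theta) - rho * math.log(theta_prev)) / sd
    return math.exp(-0.5 * z * z) / (sd * theta * math.sqrt(2.0 * math.pi))


def persistence_term(theta, theta_prev, rho, sd, rel_step=1e-5):
    """Left-hand side of (20), with the integral of f_2 computed as the
    finite-difference derivative of the survival function in theta_prev."""
    h = rel_step * theta_prev
    d_survival = (survival(theta, theta_prev + h, rho, sd)
                  - survival(theta, theta_prev - h, rho, sd)) / (2.0 * h)
    return (theta_prev / theta) * d_survival / density(theta, theta_prev, rho, sd)
```

For any positive `theta` and `theta_prev`, `persistence_term` should return a value numerically indistinguishable from `rho`, independently of `sd`.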

First, we consider separable preferences of the form

$$U(c, l) = \frac{c^{1-\sigma} - 1}{1 - \sigma} - \left( 1 + \frac{1}{\varepsilon} \right)^{-1} l^{1 + 1/\varepsilon}. \tag{21}$$

⁶ The analysis extends directly when $\rho$ and $\sigma$ depend on age $t$.

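This functional form fixes two important parameters, the Frisch elasticity and the coefficient of complementarity, at the constant levels $\varepsilon$ and 0, respectively. Substituting these values together with (20) into the expression for the labor wedge (17), we obtain

$$
\begin{aligned}
\frac{\tau^y_t(\theta|\theta^{t-1})}{1 - \tau^y_t(\theta|\theta^{t-1})} ={}& \left( 1 + \frac{1}{\varepsilon} \right) \frac{1 - F(\theta|\theta_{t-1})}{\theta f(\theta|\theta_{t-1})} \int_\theta^\infty \frac{U_c(\theta)}{U_c(x)} \left( 1 - \lambda_{1,t} U_c(x) \right) \frac{f(x|\theta_{t-1})\, dx}{1 - F(\theta|\theta_{t-1})} \\
& + \rho\, \frac{\beta}{\delta}\, \frac{\tau^y_{t-1}(\theta^{t-1})}{1 - \tau^y_{t-1}(\theta^{t-1})}\, \frac{U_c(\theta)}{U_c(\theta^{t-1})},
\end{aligned}
\tag{22}
$$

where $\lambda_{1,t} = \int_0^\infty [U_c(x)]^{-1} f(x|\cdot)\, dx$.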

The intertemporal component of the labor wedge simplifies significantly with log-normal shocks. The likelihood ratio that captures the relative probabilities of realizations of a given shock between a truth-teller and a deviator is simply $\rho$. In addition, the intertemporal term depends on the ratio of marginal utilities, $U_{c,t}/U_{c,t-1}$, because it is easier to provide incentives to the agent if he is relatively poor, so that his marginal utility of consumption is high.

Expression (22) allows us to obtain a tight characterization of both the intertemporal and intratemporal components of labor distortions and to shed light on their qualitative and quantitative properties. Before doing that, we discuss a particularly simple and frequently used case in which preferences are quasi-linear ($\sigma = 0$) and $\beta = \delta$. In this case, (22) yields a closed-form solution for the optimal labor distortion for all histories of shocks:

$$\frac{\tau^y_0(\theta)}{1 - \tau^y_0(\theta)} = \left( 1 + \frac{1}{\varepsilon} \right) \frac{1 - F_0(\theta)}{\theta f_0(\theta)} \int_\theta^\infty (1 - \alpha(x)) \frac{f_0(x)\, dx}{1 - F_0(\theta)},$$

$$\frac{\tau^y_t(\theta|\theta^{t-1})}{1 - \tau^y_t(\theta|\theta^{t-1})} = \rho\, \frac{\tau^y_{t-1}(\theta^{t-1})}{1 - \tau^y_{t-1}(\theta^{t-1})} \quad \text{for all } t > 0.$$
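The recursion in the quasi-linear case can be traced out with a few lines of code. The sketch below takes a period 0 labor distortion as given (it is determined by the first formula and is treated here as a hypothetical input) and iterates $\tau_t/(1-\tau_t) = \rho\, \tau_{t-1}/(1-\tau_{t-1})$, converting the ratio back to $\tau_t$ in each period. The function name and the numbers are illustrative.

```python
def wedge_path(tau0, rho, periods):
    """Labor distortions tau_0, ..., tau_periods in the quasi-linear case with beta = delta.

    The ratio tau / (1 - tau) is scaled by rho each period, so in period t it equals
    rho**t times its period 0 value; tau itself is recovered as ratio / (1 + ratio).
    """
    ratio0 = tau0 / (1.0 - tau0)
    path = []
    for t in range(periods + 1):
        ratio_t = (rho ** t) * ratio0
        path.append(ratio_t / (1.0 + ratio_t))
    return path
```

For example, `wedge_path(0.4, 0.9, 3)` starts at 0.4 and declines toward zero, which illustrates that with $\rho < 1$ the initial distortion decays geometrically in terms of the ratio $\tau/(1-\tau)$.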

The expression for the optimal labor distortion in period 0 is identical to the one obtained by Diamond (1998) for the static model. All his arguments about qualitative properties of the optimal labor distortions carry over unchanged into our setting, with the only caveat that they apply to the distribution of the initial shocks $F_0$ rather than the cross-sectional distribution of earnings. An agent's labor distortion in period $t \ge 1$, measured by $\tau^y_t/(1 - \tau^y_t)$, is simply his period 0 distortion multiplied by $\rho^t$. This simple form is driven by two forces. First, since agents are risk-neutral, they require no insurance against idiosyncratic shocks, and the intratemporal component of the labor wedge is zero. Second, since their marginal utility is always constant, the marginal cost of providing incentives for all agents is independent of their promised utility, which simplifies the intertemporal component.

We now turn to the characterization of labor distortions when $\sigma > 0$. Although closed-form expressions cannot be obtained for the whole schedule of distortions, we provide a tight characterization of distortions for low and high realizations of $\theta$. This characterization shows the distinct roles played by the intertemporal and intratemporal forces in shaping the optimal labor distortion.

2.1.1 Labor distortions for low realizations of shocks

We start by characterizing labor distortions for low realizations of shocks.

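Proposition 2. Suppose that the idiosyncratic shocks are log-normal and U satisfies (21). Then the intratemporal component converges to zero as θ → 0 and

$$
\lim_{\theta\to 0}\frac{\tau^y_t(\theta|\theta^{t-1})}{1-\tau^y_t(\theta|\theta^{t-1})} = \frac{\beta}{\delta}\,\rho\,\frac{\tau^y_{t-1}(\theta^{t-1})}{1-\tau^y_{t-1}(\theta^{t-1})}\left(\frac{c_t(0|\theta^{t-1})}{c_{t-1}(\theta^{t-1})}\right)^{\sigma}.
$$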

This proposition shows that the labor distortion for low realizations of

shocks is primarily driven by the intertemporal component. The intertem-

poral component depends mainly on the persistence of the shock process and

past distortions, which are determined by the redistributive objectives of the


social planner and the history of idiosyncratic shocks accumulated by the agent

before period t. These observations also help explain the comparative statics

later in this section, for instance, with respect to the redistributive objective.

Before that, we show next that a different force shapes the labor distortion for

the high realizations of θt.

2.1.2 Labor distortions for high realizations of shocks

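To characterize the labor distortion for high θ, we use the fact that $\tau^y_t = 1 + U_{l,t}/(\theta_t U_{c,t})$ to write (18) as

$$
\frac{\tau^y_t(\theta|\theta^{t-1})}{l^{1/\varepsilon}(\theta|\theta^{t-1})} = \left(1+\frac{1}{\varepsilon}\right)\frac{1}{\theta^2 f(\theta|\theta_{t-1})}\int_\theta^\infty\left(\frac{1}{U_c(x)} - \lambda_{1,t} + \lambda_{2,t}\frac{f_2(x|\theta_{t-1})}{f(x|\theta_{t-1})}\right) f(x|\theta_{t-1})\,dx. \tag{23}
$$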

To study the asymptotic behavior of this equation, it is useful to re-normalize it so that the left hand side converges to a constant. A natural conjecture is that as θ → ∞, $\tau^y_t$ converges to a constant less than 1 and current period consumption becomes a constant fraction of the after tax labor income, $(1-\tau^y_t)\theta_t l_t$.⁷ When this is true, $c_t$ is proportional to $(\theta_t l_t^{-1/\varepsilon})^{1/\sigma}$ (because $\tau^y_t$ converges to a constant) and to $(1-\tau^y_t)\theta_t l_t$ for high $\theta_t$. These two conditions imply that $l^{1/\varepsilon}$ must be proportional to $\theta^\kappa$ where $\kappa = (1-\sigma)/(1+\sigma\varepsilon) < 1$. Let $s = l^{1/\varepsilon}/\theta^\kappa$ and $v = \tau^y/s$. Multiply both sides of (23) by $\theta^\kappa$ and differentiate to obtain

$$
\theta\frac{dv}{d\theta} = \underbrace{-\left(1+\frac{1}{\varepsilon}\right)\left(\frac{1}{s} - v - \frac{1}{\theta^{1-\kappa}}\lambda_{1,t}\right) - v\left(2-\kappa+\frac{\theta f'}{f}\right)}_{\text{intratemporal component}}\;\underbrace{-\left(1+\frac{1}{\varepsilon}\right)\lambda_{2,t}\frac{1}{\theta^{1-\kappa}}\frac{f_2}{f}}_{\text{intertemporal component}}, \tag{24}
$$

where f′ is the derivative of $f(\theta|\theta_{t-1})$ with respect to θ.

⁷ Although we do not have a general characterization of conditions under which this is true, it holds in our numerical simulations and this result provides an excellent insight into the shape of the labor distortions that emerge in our quantitative analysis. It can also be proven formally for some special cases of our economy along the lines of Mirrlees (1971).

This equation allows one to study the effects of intra- and intertemporal components on labor distortions for high realizations of shocks. When shocks are log-normal, the intertemporal component converges to zero at the rate of $1/\theta_t^{1-\kappa}$ (see Lemma 2 in the online appendix for details). Thus, this term quickly drops out from the law of motion for v.
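The step from (23) to (24) can be spelled out as follows. This is a sketch of the algebra only, and it assumes that the isoelastic form (21) implies $U_l = -l^{1/\varepsilon}$, so that the definition of the wedge gives $1-\tau^y = l^{1/\varepsilon}/(\theta U_c)$; the shorthand $g(x)$ is introduced here for convenience and does not appear in the paper.

$$
\begin{aligned}
g(x) &\equiv \frac{1}{U_c(x)} - \lambda_{1,t} + \lambda_{2,t}\frac{f_2(x|\theta_{t-1})}{f(x|\theta_{t-1})}, \qquad
v(\theta) = \left(1+\frac{1}{\varepsilon}\right)\frac{\theta^{\kappa-2}}{f(\theta)}\int_\theta^\infty g(x) f(x)\,dx,\\
\theta\frac{dv}{d\theta} &= -v\left(2-\kappa+\frac{\theta f'}{f}\right) - \left(1+\frac{1}{\varepsilon}\right)\theta^{\kappa-1} g(\theta),\\
\frac{\theta^{\kappa-1}}{U_c(\theta)} &= \frac{(1-\tau^y)\,\theta^{\kappa}}{l^{1/\varepsilon}} = \frac{1-\tau^y}{s} = \frac{1}{s} - v .
\end{aligned}
$$

Substituting the last line into $\theta^{\kappa-1} g(\theta)$ and collecting the terms in $\lambda_{1,t}$ and $\lambda_{2,t}$ reproduces the two groups of terms in (24).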

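The intratemporal component is much more slowly moving. Log-normality implies that $\theta f'/f = -(\ln\theta - \rho\ln\theta_{t-1})/\sigma^2 - 1$. When θ is high and v is close to its asymptotic value, so that $dv/d\ln\theta \approx 0$, equation (24) implies that

$$
\left(1+\frac{1}{\varepsilon}\right)\left(\frac{1}{s} - v\right) \approx -v\left(1-\kappa-\frac{\ln\theta-\rho\ln\theta_{t-1}}{\sigma^2}\right).
$$

Since $\tau^y = vs$, it follows that

$$
\tau^y(\theta) \approx \frac{\varepsilon+1}{1+\varepsilon\kappa+\left(\frac{\ln\theta-\rho\ln\theta_{t-1}}{\sigma^2}\right)\varepsilon} \tag{25}
$$

and

$$
\tau^y(\theta)\ln\theta \to \left(1+\frac{1}{\varepsilon}\right)\sigma^2. \tag{26}
$$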

This provides several insights. First, $\tau^y$ converges to zero, but only at the rate of $1/\ln\theta$. Since this convergence is very slow, in quantitative analyses $\tau^y$ will appear virtually flat when plotted against high values of θ. Second, since the intertemporal component converges to zero at a much faster rate, the asymptotic properties of labor distortions in period t depend only on the Frisch elasticity and variance of idiosyncratic shocks in period t and do not depend on redistributive objectives or on past distortions. This explains, for example, why Pareto weights affect the size of the distortion only for low realizations of $\theta_t$, but not for high realizations, in the quantitative comparative statics exercises in Section 2.3. Third, equation (25) demonstrates that the types who had higher realizations of $\theta_{t-1}$ face higher labor distortions than the types who had lower realizations of $\theta_{t-1}$ for the same (high) shock $\theta_t$.

[Figure 1: Labor distortions with log-normal shocks. Panel A: labor distortions in period 1, τ^y_1(θ1|θL) and τ^y_1(θ1|θH), plotted against θ1. Panel B: effects of the variance of θ on τ^y_1, for σθ = 0.55, 0.70 and 0.85, plotted against θ1. Panel C: labor distortions in period 0, static τ^y and baseline τ^y_0, plotted against θ0.]
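The three observations can be checked numerically from (25) and (26). The sketch below simply evaluates the approximation; it is not the paper's solution method, and it is only meaningful for high θ, where (25) applies. The function names are hypothetical, the shock is passed in logs to avoid overflow, and `sigma_theta` denotes the standard deviation of the shock (written σ in (25) and (26)), as distinct from the preference parameter `sigma`.

```python
def kappa(sigma, eps):
    """kappa = (1 - sigma) / (1 + sigma * eps), as defined before (24)."""
    return (1.0 - sigma) / (1.0 + sigma * eps)


def tau_high(log_theta, log_theta_prev, eps, sigma, rho, sigma_theta):
    """Approximation (25) to the labor distortion at a high shock."""
    z = (log_theta - rho * log_theta_prev) / sigma_theta ** 2
    return (eps + 1.0) / (1.0 + eps * kappa(sigma, eps) + eps * z)


def tau_limit_constant(eps, sigma_theta):
    """Right-hand side of (26): the limit of tau * ln(theta)."""
    return (1.0 + 1.0 / eps) * sigma_theta ** 2


# Parameter values from the two-period illustration in Section 2.1.3.
params = dict(eps=0.5, sigma=1.0, rho=1.0, sigma_theta=0.55)

# Same current shock, two different (hypothetical) previous shocks.
low_history = tau_high(6.0, 2.0, **params)
high_history = tau_high(6.0, 4.0, **params)
```

Evaluating `tau_high` at a very large `log_theta` and multiplying by `log_theta` approaches `tau_limit_constant`, which illustrates how slowly the distortion itself falls toward zero.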

2.1.3 Numerical illustration

We illustrate our results for log-normal shocks in Figure 1.⁸ For this example we chose a two period economy and set σ = 1 and ε = 0.5. We set the persistence parameter of the stochastic process to ρ = 1 and the standard deviation to σθ = 0.55.

For the initial distribution of shocks we use a distribution calibrated to match

the one used by Saez (2001) to have an easy comparison with the static model.

We set α (θ) = 1 for all θ.

⁸ The computational strategy for the numerical illustrations in this section is a special case of the approach to the full life cycle model we discuss in Section 3.

Panel A plots labor distortions in period 1 for two different realizations of θ in period 0, $\theta_H > \theta_L$. As our discussion of equation (25) explains, $\tau^y_1(\cdot|\theta_H)$ is generally greater than $\tau^y_1(\cdot|\theta_L)$, and the difference between the two lines shrinks for high θ. Panel B shows comparative statics of $\tau^y_1(\cdot|\theta_H)$ for different values of $\sigma_\theta$. Following equations (25) and (26), higher variance of shocks leads to higher distortions. Finally, Panel C shows the optimal labor distortions in period 0 and compares them to the distortions that are obtained in the static model. The two distortions are very similar. As we discuss in more detail in Section 2.3 and in Proposition 3, this result should generally be expected for separable, isoelastic preferences.

An important feature of labor distortions with log-normal shocks is that

they appear essentially flat. We showed above that the intertemporal compon-

ent of the distortion starts at a positive value and decreases to zero relatively

rapidly, while the intratemporal component starts at zero, first increases and

then decreases very slowly. With log-normal shocks the decrease of the inter-

temporal component is being largely offset by the increase of the intratemporal

one, which leads to distortions that are essentially flat. This is one of the reas-

ons why the optimal linear distortions can approximate the optimum very well,

for example, in Farhi and Werning (2013).

2.2 Non-separable preferences

The previous section focused on separable, isoelastic preferences. That analysis easily extends to other separable utility functions although some of the parameters will be endogenous. For example, when preferences take the form $U(c,l) = \ln c + \ln(1-l)$, expression (26) still holds, but the Frisch elasticity of labor supply depends on l and is equal to $\varepsilon_\theta = 1/(1-l(\theta))$.

The analysis may change significantly when consumption and labor are

complements. In this section, we focus on preferences with no income effect

popularized by Greenwood, Hercowitz and Huffman (1988), to which we refer

as GHH preferences:

$$
U\left(c-\left(1+\frac{1}{\varepsilon}\right)^{-1} l^{1+1/\varepsilon}\right). \tag{27}
$$

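When U satisfies (27), $1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta = 1 + \frac{1}{\varepsilon}$ and the expression (17) with log-normal shocks becomes

$$
\begin{aligned}
\frac{\tau^y_t(\theta|\theta^{t-1})}{1-\tau^y_t(\theta|\theta^{t-1})} ={}& \left(1+\frac{1}{\varepsilon}\right)\frac{1-F(\theta|\theta_{t-1})}{\theta f(\theta|\theta_{t-1})} \\
&\times \int_\theta^\infty \exp\left(\int_\theta^x \beta\frac{U''}{U'}\,\omega_1(\tilde{x}|\tilde{x})\,d\tilde{x}\right)\left(1-\lambda_{1,t}U'(x)\right)\frac{f(x|\theta_{t-1})\,dx}{1-F(\theta|\theta_{t-1})} \\
&+ \rho\,\frac{\beta}{\delta}\,\frac{\tau^y_{t-1}(\theta^{t-1})}{1-\tau^y_{t-1}(\theta^{t-1})}\,\frac{U_c(\theta)}{U_c(\theta^{t-1})}\,\frac{\int_\theta^\infty \exp\left(-\int_\theta^x \gamma_{\tilde{x}}\frac{d\tilde{x}}{\tilde{x}}\right) f_{2,t}(x|\theta_{t-1})\,dx}{\int_\theta^\infty f_{2,t}(x|\theta_{t-1})\,dx},
\end{aligned} \tag{28}
$$

where $\omega_1(\theta|\theta) = w(\theta) - w_2(\theta)$.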

Both intra- and intertemporal components of the labor distortion have additional terms not present in the separable case. The intertemporal term is multiplied by

$$
\frac{\int_\theta^\infty \exp\left(-\int_\theta^x \gamma_{\tilde{x}}\frac{d\tilde{x}}{\tilde{x}}\right) f_{2,t}(x)\,dx}{\int_\theta^\infty f_{2,t}(x)\,dx}.
$$

If $\ln\theta_t > \rho\ln\theta_{t-1}$, then $f_2 > 0$ and this term is less than one. The intratemporal component has an additional term $\exp\left(\int_\theta^x \beta\frac{U''}{U'}\,\omega_1(\tilde{x}|\tilde{x})\,d\tilde{x}\right)$. If Assumption 3 is satisfied, $\omega_1 \geq 0$ and this term is also less than one. Thus, non-separability introduces an additional force that calls for lower distortions in dynamic economies, especially for high types. The intuition for it is as follows. In a static economy the only way to extract resources from high types is by taking a fraction of their labor income. This distortion reduces agents' labor supply and, because of complementarity, further increases their marginal utility of consumption, offsetting some gains. In dynamic settings this can be avoided by extracting resources in future periods.

We illustrate this point in a simple two period economy when agents retire

in the second period. The advantage of this example is a simple closed form

expression for value function in period 1, V1 (w) , and in particular we know

that it is increasing in w. The arguments go through more generally for all

Vt (w,w2, θ) that have that property and we illustrate that in a stochastic

example that follows. We contrast the optimal distortions in GHH case with

those obtained with separable preferences.

Proposition 3. Suppose that T = 1 and $F_T(0|\theta) = 1$ for all θ.

1. Suppose U is of isoelastic form (21). Then $\tau^y_0(\theta) = \tau^y_{static}(\theta)$, where $\tau^y_{static}(\theta)$ is the optimal labor distortion in a static economy with the same $F_0$ and α.

2. Suppose U is of GHH form (27). If $-U''/U'$ is bounded from below and $\tau^y_0$ is bounded away from 1, then $\lim_{\theta\to\infty}\tau^y_0(\theta) = 0$.

The proof of this proposition is in the online appendix. In the proof we also give sufficient conditions on the primitives that ensure that $\tau^y_0$ is bounded away from 1 in part 2 of this Proposition.

This proposition highlights important differences in the intertemporal provision of incentives with and without complementarities. With separable, isoelastic preferences the optimal labor distortion in period 0 is not affected by the retirement period and coincides with static optimal distortions. With non-separable GHH preferences (at least when $-U''/U'$ is bounded) it is optimal not to distort labor supply of high types, and, unlike in a static economy, this result does not depend on the distribution $F_0$. This stark result follows from our discussion of the role of complementarities in equation (28). With GHH preferences $\gamma = -\frac{U''}{U'}\,l^{1+1/\varepsilon}$ and this term goes to infinity for high types when $-U''/U'$ is bounded. This case provides a sharp illustration of how complementarity may lead to lower labor distortions for the most productive agents in dynamic models.

[Figure 2: Labor and savings distortions with and without consumption-labor complementarity. Panel A: labor distortions in period 0, τ^y_0 plotted against θ0. Panel B: savings distortions, τ^s_0 plotted against θ0. Panel C: labor distortions in period 1, τ^y_1 plotted against θ1. Each panel compares CES, Log GHH and Exp GHH preferences.]

The insight of Proposition 3 continues to hold if there is uncertainty in period 1. Figure 2 takes the same two period economy with uncertainty and compares three types of preferences: separable constant elasticity preferences used in Figure 1, exponential GHH preferences $-\exp\left(-c+\frac{\varepsilon}{\varepsilon+1}l^{(\varepsilon+1)/\varepsilon}\right)$, and logarithmic GHH preferences $\ln\left(c-\frac{\varepsilon}{\varepsilon+1}l^{(\varepsilon+1)/\varepsilon}\right)$. A shift from labor to savings distortion for high types is clearly seen from these figures. The labor distortions reach a peak and then decrease (eventually to zero) in period 0 for both exponential and logarithmic GHH preferences, as seen on Panel A. The peak for labor distortions coincides with a point at which savings distortions increase sharply, as shown on Panel B. This shift in the behavior of distortions does not occur in period 1, when it is impossible to use dynamic incentives, nor does it occur in the separable case, when complementarity does not exist.
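The defining property of the two GHH specifications, the absence of an income effect on labor supply, can be verified numerically. The sketch below is illustrative only: it codes the exponential and logarithmic GHH utilities given above, computes the marginal rate of substitution $-U_l/U_c$ by finite differences, and checks that it equals $l^{1/\varepsilon}$ regardless of consumption. The function names, the step size `h` and the test points are assumptions made for this example.

```python
import math

EPS = 0.5  # Frisch elasticity used in the paper's numerical examples


def ghh_index(c, l, eps=EPS):
    """Consumption net of the disutility of labor: c - eps/(eps+1) * l^((eps+1)/eps)."""
    return c - eps / (eps + 1.0) * l ** ((eps + 1.0) / eps)


def u_log(c, l):
    """Logarithmic GHH utility."""
    return math.log(ghh_index(c, l))


def u_exp(c, l):
    """Exponential GHH utility."""
    return -math.exp(-ghh_index(c, l))


def mrs(u, c, l, h=1e-6):
    """Marginal rate of substitution -U_l / U_c by central finite differences."""
    u_c = (u(c + h, l) - u(c - h, l)) / (2.0 * h)
    u_l = (u(c, l + h) - u(c, l - h)) / (2.0 * h)
    return -u_l / u_c


def labor_supply(net_wage, eps=EPS):
    """Labor choice implied by l^(1/eps) = net wage; it does not depend on c."""
    return net_wage ** eps


# The MRS at l = 0.8 should equal 0.8^(1/eps) at every consumption level.
mrs_values = [mrs(u, c, 0.8) for u in (u_log, u_exp) for c in (2.0, 5.0)]
```

Because the marginal rate of substitution depends on labor alone, a labor distortion changes labor supply only through the net wage, which is the channel through which complementarity operates in (28).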


2.3 Other stochastic processes for θ

The assumption that idiosyncratic shocks are log-normally distributed is com-

mon in the applied work (see, e.g. Storesletten, Telmer and Yaron (2004)).

Persistence and volatility of such process are relatively easy to estimate with

readily available panel data sets such as PSID. More recent work, that uses

newly available high quality data from the U.S. Social Security Administration

(SSA), shows that idiosyncratic shocks have much larger kurtosis and lower

skewness than allowed by log-normal distributions and identifies stochastic

processes that better capture idiosyncratic shocks in the data. In this section,

we discuss the implications of such processes. Throughout this section we

assume separable isoelastic preferences (21).

Guvenen, Ozkan and Song (2013) and Guvenen et al. (2013) use SSA administrative data to show that earnings in the data have negative skewness and a much higher kurtosis than the kurtosis of the normal distribution. These papers show that the shock process in the data is better approximated by a mixture of log-normal distributions, in which with a small probability a shock is drawn from distributions with high variance or negative mean. Next, we analyze a general stochastic process for θ given by (19) where ε is drawn with probability $p_i$ from a normal distribution $N(\mu_i, \sigma_i^2)$ with some mean $\mu_i$ and variance $\sigma_i^2$ for i = 1, ..., I. In our quantitative analysis in Section 3, we discuss how to choose $\{p_i, \mu_i, \sigma_i^2\}_{i=1}^{I}$ to be consistent with empirical evidence.

Most of the analysis of Section 2.1 carries through with only small modi-

fications. In particular, the likelihood ratio in the intertemporal component of

labor distortion still takes a simple form (20) and therefore the optimal labor

distortion satisfies equation (22). Proposition 2, which shows that the labor

distortions for low shocks are determined by the intertemporal component,

also extends to this case. The proof of this Proposition used the fact that for a log-normal density function $\lim_{\theta\to 0}\theta f'/f = \infty$. The same property continues to hold for a mixture of log-normals⁹ and hence Proposition 2 and the discussion in Section 2.1.1 are valid for the more general shock process.

The differential equation (24) also holds with higher moment shocks. In parallel with the discussion in Section 2.1.2, it implies that the intertemporal component vanishes quickly for high realizations of θ, at the rate $1/\theta^{1-\kappa}$. The limiting behavior of the intratemporal component is determined by $\theta f'/f$.

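One of the main differences that higher moment shocks introduce is in the determination of the size of the labor distortions for high θ. When f is drawn from a mixture of log-normals,

$$
\frac{\theta f'(\theta|\theta_{t-1})}{f(\theta|\theta_{t-1})} = -\frac{\sum_{i=1}^{I}\frac{p_i}{\sigma_i}\exp\left(-\frac{(\ln\theta_t-\rho\ln\theta_{t-1}-\mu_i)^2}{2\sigma_i^2}\right)\left[\frac{\ln\theta_t-\rho\ln\theta_{t-1}-\mu_i}{\sigma_i^2}+1\right]}{\sum_{i=1}^{I}\frac{p_i}{\sigma_i}\exp\left(-\frac{(\ln\theta_t-\rho\ln\theta_{t-1}-\mu_i)^2}{2\sigma_i^2}\right)}.
$$

Suppose without loss of generality that $\sigma_1 \geq \sigma_i$ for all i. Divide the numerator and the denominator of this expression by $\exp\left(-\frac{(\ln\theta_t-\rho\ln\theta_{t-1}-\mu_1)^2}{2\sigma_1^2}\right)$ and re-group to show that

$$
\frac{\theta f'(\theta|\theta_{t-1})}{f(\theta|\theta_{t-1})} = -\left(\frac{\ln\theta-\rho\ln\theta_{t-1}-\mu_1}{\sigma_1^2}+1\right)X(\theta),
$$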

where the term X(θ) is expressed in terms of the primitives of the stochastic process and converges to 1 as θ → ∞ (see the online appendix). Therefore expressions (25) and (26) hold, but the variance $\sigma^2$ is replaced with $\sigma_i^2$ where i is the component of f with the highest variance. In the Guvenen et al. (2013) approximation, small probability draws from a high variance distribution are used to match the high kurtosis of idiosyncratic shocks. Therefore the size of the kurtosis of the shock, together with the elasticity of labor supply, determines the size of the labor distortion for high θ.

⁹ See Lemma 2 in the online appendix for formal proofs of the limiting properties of our higher moment distribution.

[Figure 3: Labor and savings distortions with higher moments shocks. Panel A: higher moments and τ^y_1, plotted against θ1 for kurtosis = 20, kurtosis = 12 and log-normal shocks. Panel B: higher moments and τ^s_0, plotted against θ0 for the same three shock processes. Panel C: redistribution and τ^y_1, plotted against θ1 for a = 0, a = 5 and a = 10.]

Figure 3 shows the effect of richer shock processes on labor and savings distortions. For these simulations we took the same two period economy from Section 2.1.3 and introduced higher moment shocks. The solid lines on panels A and B correspond to a distribution with kurtosis of 20 and Kelly's skewness of −0.14. The dashed line corresponds to the same skewness but kurtosis of 12.¹⁰ The dotted line is for the log-normal distribution, which, as all normal distributions do, has kurtosis of 3 and skewness of 0. All three distributions have mean 0 and standard deviation 0.55.

¹⁰ These are the approximate values for idiosyncratic shocks for prime age and working age males in the data; see Guvenen et al. (2013) and our quantitative Section 3 for more details.

Panel A plots labor distortions in period 1 for the same realization of $\theta_0$ in all three cases. Relative to labor distortions in the log-normal case, which are approximately flat as discussed in Figure 1, there is a pronounced U-shaped "dip" for shocks with higher kurtosis. The dip is centered around the expected value of $\theta_1$ conditional on realized $\theta_0$. To understand the reason for the dip, note from formula (22) that the labor distortion for type θ depends on the tail ratio term $(1-F(\theta|\theta_0))/(\theta f(\theta|\theta_0))$. High kurtosis distributions have higher mass around $E[\theta|\theta_0]$, which lowers this tail ratio. At the same time the labor distortions for high realizations of $\theta_1$ are higher than in the log-normal case, because their size is governed by the magnitude of the kurtosis as discussed above.
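The behavior of the tail ratio can be illustrated directly. The sketch below compares $(1-F)/(\theta f)$ for a mixture of normals in logs with a single normal of the same standard deviation. As an assumption made only for this illustration, it borrows the mixture parameters reported later in Table 1, reads each $\sigma_i$ as a standard deviation, and measures the shock as $x = \ln\theta - \ln\theta_0$, so that $\theta f(\theta|\theta_0)$ equals the density of x; all function names are hypothetical.

```python
import math

# (p_i, mu_i, sigma_i) borrowed from Table 1; sigma_i is read as a standard
# deviation, which is an assumption of this sketch.
MIXTURE = [(0.8, 0.05, 0.19), (0.1, 0.0, 1.6), (0.1, -0.4, 0.16)]


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    """Survival function 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(x, components):
    """(1 - F(theta)) / (theta f(theta)) expressed in x = ln(theta) - ln(theta_0)."""
    survival = sum(p * norm_sf((x - mu) / sd) for p, mu, sd in components)
    density = sum(p * norm_pdf((x - mu) / sd) / sd for p, mu, sd in components)
    return survival / density


# Single normal with the same mean and standard deviation as the mixture.
mean = sum(p * mu for p, mu, _ in MIXTURE)
std = math.sqrt(sum(p * (sd ** 2 + (mu - mean) ** 2) for p, mu, sd in MIXTURE))
NORMAL = [(1.0, mean, std)]

center_mix, center_norm = tail_ratio(0.0, MIXTURE), tail_ratio(0.0, NORMAL)
tail_mix, tail_norm = tail_ratio(2.0, MIXTURE), tail_ratio(2.0, NORMAL)
```

Near the center the mixture has the lower tail ratio, while far in the right tail it has the higher one, which is the pattern behind the dip and the higher right tail of the distortions in Panel A.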

Panel B compares savings distortions $\tau^s_0(\theta_0)$. Even though the first and the second moments are the same in all three distributions, large kurtosis leads to smaller distortions. A more dispersed distribution of shocks makes provision of insurance more important, which lowers the savings wedge implied by Jensen's inequality and the Inverse Euler Equation.

Panel C plots labor wedges in period 1 for different redistributive objectives. We assume that Pareto weights are $\alpha(\theta) \propto \theta^{-a}$ and plot $\tau^y_1(\cdot|\theta_0)$ given the same $\theta_0$ but different values of a. The more redistributive (higher a) the social planner is, the higher the labor distortions are for low realizations of θ, but not for the high ones. This result can be understood from our discussion of the relative importance of intra- and intertemporal components for different shocks. Labor distortions for low shocks are driven by the intertemporal component, which carries over the distortions inherited from previous periods. A more redistributive planner sets higher labor distortions in period 0, which leads to higher distortions for low realizations of $\theta_1$. On the other hand, the labor distortions for high realizations of $\theta_1$ are driven by the intratemporal component, which depends only on kurtosis and the elasticity of labor supply in the right tail. One of the implications of this observation is that the size and shape of labor distortions for high θ are independent of the redistributive objectives of the government; they are driven purely by the insurance needs of the agent.

While a mixture of log-normal distributions captures well idiosyncratic shocks experienced by individuals over their lifetime, it fails to produce the fat-tailed cross-sectional distribution observed in the data. To capture such tails, one can assume that the initial types are drawn from a distribution $F_0$ which has a Pareto tail with some coefficient a. This assumption introduces only minimal changes to our discussion of labor distortions for high shocks in Section 2.1.2. Suppose that α(x) is non-increasing. The differential equation (24) still holds but Pareto tails imply that $\theta_0 f_0'/f_0 \to -a-1$. Taking the limits, the optimal labor distortion satisfies

$$
\lim_{\theta\to\infty}\tau^y(\theta) = \frac{1+\varepsilon}{1+\varepsilon(\kappa+a)} = \frac{1}{\frac{1}{1+\varepsilon\sigma}+a\frac{\varepsilon}{1+\varepsilon}}. \tag{29}
$$

When shocks have Pareto tails, labor distortions converge to a positive number which depends only on Frisch elasticity ε and σ as well as the thickness of the tail a.

This formula is closely related to the closed form expression for optimal taxes obtained by Saez (2001) in static models. Saez (2001, equation (9)) showed that if the empirical cross-sectional distribution of earnings θl is Pareto with tail $\bar{a}$, and $\zeta^u$ and $\zeta^c$ are uncompensated and compensated elasticities of labor supply, then $\tau^y \to 1/(1+\zeta^u+\zeta^c(\bar{a}-1))$ as θ → ∞. Since the relationship between the tail of θ, a, and the tail of θl, $\bar{a}$, is given by $a = (1+\zeta^u)\bar{a}$,¹¹ this can be re-written as

$$
\tau \to \frac{1}{1+\zeta^u-\zeta^c+a\frac{\zeta^c}{1+\zeta^u}}. \tag{30}
$$

When preferences take the form (21), $\zeta^u \to \varepsilon\frac{1-\sigma}{1+\varepsilon\sigma}$ and $\zeta^c \to \frac{\varepsilon}{1+\varepsilon\sigma}$ as θ → ∞. Substituting that into (30) and re-arranging we obtain (29).

¹¹ This follows from Lemma 1 in Saez (2001), which shows that $\frac{dy}{y} = (1+\zeta^u)\frac{d\theta}{\theta}$.
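The equivalence of (29) and (30) is a matter of algebra and can be checked numerically. In the sketch below the two functions transcribe the two formulas, with the limiting elasticities substituted into (30); the function names and the grid of parameter values are hypothetical choices made for the check.

```python
def tau_top_29(eps, sigma, a):
    """Limit of the labor distortion from (29)."""
    kappa = (1.0 - sigma) / (1.0 + sigma * eps)
    return (1.0 + eps) / (1.0 + eps * (kappa + a))


def tau_top_30(eps, sigma, a):
    """Formula (30) with the limiting elasticities implied by preferences (21)."""
    zeta_u = eps * (1.0 - sigma) / (1.0 + eps * sigma)  # uncompensated
    zeta_c = eps / (1.0 + eps * sigma)                  # compensated
    return 1.0 / (1.0 + zeta_u - zeta_c + a * zeta_c / (1.0 + zeta_u))


# Compare the two expressions on a small grid of illustrative parameters.
grid = [(eps, sigma, a)
        for eps in (0.25, 0.5, 1.0)
        for sigma in (0.5, 1.0, 2.0)
        for a in (1.5, 2.0, 3.0)]
max_gap = max(abs(tau_top_29(*g) - tau_top_30(*g)) for g in grid)
```

The gap is zero up to floating point error because $1+\zeta^u-\zeta^c = 1/(1+\varepsilon\sigma)$ and $\zeta^c/(1+\zeta^u) = \varepsilon/(1+\varepsilon)$, which are exactly the two terms in the denominator of (29).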


3 Quantitative analysis

We now turn to the quantitative study of a calibrated life cycle model. We

chose a 50 period economy in which agents work for the first 40 years and then

retire for the last 10 years. Agents' utility function is

$$
\ln c-\left(1+\frac{1}{\varepsilon}\right)^{-1} l^{1+1/\varepsilon}
$$

with ε = 0.5. We set β = δ = 0.95 and chose utilitarian Pareto weights.

Stochastic process for skills. Our theoretical analysis unambiguously points to the stochastic process for idiosyncratic shocks, and especially its higher moments, as a crucial input for a quantitative analysis. We rely on the findings of the recent work by Guvenen, Ozkan and Song (2013) and Guvenen et al. (2013), who use newly available high quality administrative data from the U.S. Social Security Administration based on a random sample of 10% of U.S. taxpayers. For our purposes their approach has significant advantages over using easily accessible panel data sets such as the U.S. Panel Study of Income Dynamics (PSID). The small sample size and top coding in those data sets do not allow higher moments to be estimated well and, consequently, researchers often assume a log-normal shock process (e.g. Storesletten, Telmer and Yaron (2004)).¹²

¹² In the previous version of the paper (Golosov, Troshkin and Tsyvinski (2011)) we used PSID together with TAXSIM's calculations of individuals' effective marginal tax rates to estimate non-parametrically the stochastic process for $\theta_t$. Our estimated stochastic process was close to log-normal but a small number of observations in the tails and top coding made a good estimation of higher moments impossible.

Guvenen et al. (2013) find that the persistence of annual log earnings for working age males is close to one and its standard deviation is about 0.55. Higher moments are significantly different from those implied by a log-normal distribution. In particular, the kurtosis of the shocks to log earnings for prime age males (35 to 55 years old) is about 20, while kurtosis for all working age males (25 to 60 years old) is about 12. Kelly's skewness, defined as $\frac{(P90-P50)-(P50-P10)}{P90-P10}$, where Pz is the zth percentile of the growth rate, is about −0.14 for both prime age and working age males. Guvenen, Ozkan and Song (2013) and Guvenen et al. (2013) also show that the empirical shock process can be approximated well by a mixture of three log-normal distributions, shocks from two of which are drawn with low probabilities. The high probability distribution controls the variance of the shocks while the two low probability distributions control its kurtosis and skewness.

Guvenen et al. (2013) report these statistics for the stochastic process for

earnings, not skills θ or wages. In principle, one can structurally estimate θ

by using observations for earnings and taxes. This would require access to the

restricted SSA data and would be far beyond the scope of this paper. Instead

we chose a simpler route. Our preferences imply that log earning of individuals

who have a small amount of assets and transfers should follow approximately

the same process as log θ. Thus we chose stochastic process for θ to match

the moments reported by Guvenen et al. (2013). We believe the benefits of

transparency of this approach outweigh possible costs.

We assume that the stochastic process for $\theta_t$ follows

$$
\ln\theta_t = \ln\theta_{t-1} + \varepsilon_t,
$$

where

$$
\varepsilon_t = \begin{cases}
\varepsilon_{1,t} \sim N(\mu_1,\sigma_1) & \text{w.p. } p_1 \\
\varepsilon_{2,t} \sim N(\mu_2,\sigma_2) & \text{w.p. } p_2 \\
\varepsilon_{3,t} \sim N(\mu_3,\sigma_3) & \text{w.p. } p_3
\end{cases}
$$

and $\{\mu_i,\sigma_i,p_i\}_{i=1}^{3}$ are chosen to match the annual values of standard deviation of 0.55, kurtosis of 20 and Kelly's skewness of −0.14 for $\varepsilon_t$. The exact parameters are given in Table 1.

Table 1: Parametrization of the stochastic process

µ1     µ2    µ3     σ1     σ2    σ3     p1    p2    p3
0.05   0     -0.4   0.19   1.6   0.16   0.8   0.1   0.1
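The moments implied by Table 1 can be recomputed as a consistency check. The sketch below is not the authors' calibration procedure: it takes the Table 1 values as given, assumes that each $\sigma_i$ is a standard deviation, computes the mean, standard deviation and kurtosis of the mixture in closed form, and obtains Kelly's skewness from quantiles found by bisection. Function names and the bisection bounds are hypothetical.

```python
import math

# Table 1 parameters; sigma_i is read as a standard deviation (an assumption).
P = [0.8, 0.1, 0.1]
MU = [0.05, 0.0, -0.4]
SD = [0.19, 1.6, 0.16]


def mixture_moments(p, mu, sd):
    """Mean, standard deviation and kurtosis of a mixture of normals."""
    mean = sum(pi * mi for pi, mi in zip(p, mu))
    var = sum(pi * (si ** 2 + (mi - mean) ** 2) for pi, mi, si in zip(p, mu, sd))
    m4 = sum(pi * ((mi - mean) ** 4 + 6.0 * (mi - mean) ** 2 * si ** 2 + 3.0 * si ** 4)
             for pi, mi, si in zip(p, mu, sd))
    return mean, math.sqrt(var), m4 / var ** 2


def mixture_cdf(x, p, mu, sd):
    return sum(pi * 0.5 * math.erfc(-(x - mi) / (si * math.sqrt(2.0)))
               for pi, mi, si in zip(p, mu, sd))


def quantile(q, p, mu, sd, lo=-20.0, hi=20.0):
    """Invert the mixture CDF by bisection on a wide bracket."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, p, mu, sd) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kelly_skewness(p, mu, sd):
    p10, p50, p90 = (quantile(q, p, mu, sd) for q in (0.1, 0.5, 0.9))
    return ((p90 - p50) - (p50 - p10)) / (p90 - p10)


mean, std, kurt = mixture_moments(P, MU, SD)
kelly = kelly_skewness(P, MU, SD)
```

Under the stated reading of $\sigma_i$, the resulting mean is zero and the other three statistics come out close to the targets quoted above.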

Finally, we need to choose distribution F0 (θ) from which agents draw their

shocks in period 0. Ideally, one would infer it from observations of wages or

earnings early in life, e.g., for the 25 to 30 year old. Such data are available

but access to them is restricted. Instead, we use $F_0(\theta)$ to match the cross-sectional

distribution reported by Saez (2001).¹³ This makes our distortions in period

0 directly comparable to his. As we discussed in the theoretical part of the

paper, this approach overpredicts the true size of initial heterogeneity and

overestimates the size of the optimal period 0 labor distortions, as well as

labor distortions for low realizations of θt in subsequent periods. Initial dis-

tribution, however, does not affect the shape or the size of the distortions for

high realizations of θt.

¹³ Analogous to the calibration procedure for the stochastic process for the shocks above, we mix a log-normal distribution with a Pareto tail to match Figure 4 in Saez (2001). We set the Pareto tail parameter a = 2 and start the tail at θ corresponding to income of $150,000 per year following Figure 2 and its discussion in Saez (2001).

Computational strategy. We rely on the recursive structure of the dual formulation of the planner's problem, which we explored in the theoretical analysis, to solve numerically a problem of this size and complexity (i.e., with a multitude of periods and correlated shocks). We provide a summary of our computational approach here while the online appendix contains additional details.

Our main problem is a finite-horizon discrete-time dynamic programming

problem with a three-dimensional continuous state space. First, we implement

a value function iteration algorithm. We start from period T−1 and proceed by

backward induction. The last working period, T−1, incorporates present value

of resources needed to provide promised utility over the remaining T − T + 1

retirement periods. Before proceeding to a previous period, we approximate

value functions with tensor products of orthogonal polynomials evaluated at

their root nodes. To solve each node's minimization sub-problem efficiently, we

use an implementation of interior-point algorithm, with a trust-region method

to solve barrier problems and an l1 barrier penalty function. We verify that

increasing properties in Assumption 3 are satisfied numerically. Assumptions

1 and 2 are satisfied trivially for the preferences and parameter values we chose

above. Next, we compute w0 such that V0 (w0) = 0. Given continuously differ-

entiable approximation of V0, we solve for w0 by binary jumps and bisection.

We compute optimal allocations reported below by forward induction starting

with the policies given by V0 (w0) = 0. Optimal labor and savings distortions

are then computed from the policy functions using definitions (14) and (15).
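The search for $w_0$ can be sketched as a bracketing step followed by bisection. This is an interpretation rather than the authors' code: "binary jumps" is read here as doubling the width of the search interval until the function changes sign, and the stand-in function `stand_in_v0` is purely hypothetical and replaces the approximated value function $V_0$.

```python
def solve_by_jumps_and_bisection(func, start=0.0, step=1.0, tol=1e-10):
    """Find w with func(w) = 0 for a continuous, monotone func.

    Stage 1 doubles the half-width of an interval around `start` until the
    function changes sign; stage 2 bisects the resulting bracket.
    """
    lo, hi = start - step, start + step
    for _ in range(60):
        if func(lo) * func(hi) <= 0.0:
            break
        step *= 2.0
        lo, hi = start - step, start + step
    else:
        raise ValueError("no sign change found")

    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid               # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)


def stand_in_v0(w):
    """Hypothetical monotone stand-in for V_0; its root is w = 2."""
    return w ** 3 - 8.0


w0 = solve_by_jumps_and_bisection(stand_in_v0)
```

Bisection needs only function values and a sign change, so it remains reliable even when the approximated value function is nearly flat around the root.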

Results. Figure 4 shows the main findings of this section. Each solid or dashed line in the top row depicts τ^y_t(θ_t|θ, ..., θ) for a given t and θ. The three columns ($30K, $60K and $120K) correspond to the optimal distortions in period t after repeatedly drawing the same value of θ for t − 1 periods. The lines on the bottom row show τ^s_t(θ|θ, ..., θ) for the same values of θ. We plot both savings and labor distortions against y_t = θ_t l_t.

[Figure 4: Labor and savings distortions for different realizations of histories. Top row: labor distortions τ^y_t together with the density f(y_t|y_{t−1}). Bottom row: savings distortions τ^s_t. Columns: $30K, $60K and $120K histories. Horizontal axis: y_t ($1,000s). Lines: t = 0, 1, 10, 20, 30 and 40.]

Quantitative and qualitative insights of the theoretical analysis can be easily seen in these figures. Labor distortions are approximately U-shaped

and centered around expected realizations of shocks. The left tail of these

distortions depends on the past distortions and age while the right tail does

not. Agents, who had higher realizations of θt−1, face higher labor distortions

in period t. All of these insights follow from our previous discussion of the

roles of intertemporal and intratemporal components of wedges for different

realizations of shocks and the roles that higher moments play. Period 0 labor

distortions are similar to those in the two-period simulations in Figure 1 as

well as those in Saez (2001), which is not surprising given our choice of F0 and

the analysis in preceding sections.

An additional insight that emerges from these figures is that the left tail of labor distortions is generally increasing in age while saving distortions decrease.¹⁴ The observation is driven by the fact that variance of consumption decreases as retirement approaches, the force discovered by Farhi and Werning (2013) when the shocks are log-normal. Our figures show that the increase in the optimal labor distortions is asymmetric. Only the left tail of the distortions increases, an effect that can be seen from Proposition 2, since lower variance of consumption also means that the consumption growth term $c_t(0|\theta^{t-1})/c_{t-1}(\theta^{t-1})$ is bigger. The right tail of the distortions does not change with age since it is pinned down by kurtosis and the elasticity of labor supply and does not depend on other parameters, such as age.

4 Conclusion

In this paper we take a step toward characterization of optimal labor and

savings distortions in a life cycle model. These distortions are driven by an

interplay of redistributive objectives and the need to provide insurance against

idiosyncratic shocks. We show how the size of the distortions depends on the

parameters that can be measured directly in the data. Our analysis unambigu-

ously points to the importance of higher moments of the idiosyncratic shock

process that the individuals face, and in particular its kurtosis.

For our life cycle calibration we chose what we viewed to be the simplest and most transparent strategy. The estimation of the underlying stochastic processes can be further refined to produce better estimates of the distortions. We also did not discuss the role of heterogeneity in shock process among rich and poor agents. Such heterogeneity is present in the data (see Guvenen, Ozkan and Song (2013)) and we leave the investigation of its role to future work.

¹⁴ The fact that savings distortions decrease with age is sensitive to the assumption that there is no complementarity between consumption and labor. When labor and consumption are complements, the savings distortion may increase with age, as with, for example, GHH preferences (see Golosov, Troshkin and Tsyvinski (2011)).

Our analysis focuses on the distortions in fully optimal allocations which

are restricted only by informational constraint. Implementations of these al-

locations may require complex history dependence in a tax code. We view

our approach as complementary to, e.g., the one taken by Conesa, Kitao and

Krueger (2009). They study tax reforms and the optimal taxes within a set of

the parametrically restricted tax functions. One advantage of that approach

over solving for the full informationally constrained optimum is that it focuses

on simpler taxes. Our paper points out the elements that may be important

in choosing the parameters of such tax functions.

References

Abraham, Arpad, and Nicola Pavoni. 2008. “Efficient Allocations with Moral Hazard and Hidden Borrowing and Lending: A Recursive Formulation.” Review of Economic Dynamics, 11(4): 781–803.

Albanesi, Stefania, and Christopher Sleet. 2006. “Dynamic Optimal Taxation with Private Information.” Review of Economic Studies, 73(1): 1–30.

Ales, Laurence, and Pricila Maziero. 2007. “Accounting for Private Information.” Working paper.

Browning, Martin, Lars Peter Hansen, and James J. Heckman. 1999. “Micro Data and General Equilibrium Models.” In Handbook of Macroeconomics. Vol. 1 of Handbook of Macroeconomics, ed. J. B. Taylor and M. Woodford, Chapter 8, 543–633. Elsevier.

Conesa, Juan Carlos, Sagiri Kitao, and Dirk Krueger. 2009. “Taxing Capital? Not a Bad Idea After All!” American Economic Review, 99(1): 25–48.

Diamond, Peter. 1998. “Optimal Income Taxation: An Example with a U-Shaped Pattern of Optimal Marginal Tax Rates.” American Economic Review, 88(1): 83–95.

Farhi, Emmanuel, and Iván Werning. 2013. “Insurance and Taxation over the Life Cycle.” Review of Economic Studies, 80(2): 596–635.

Fernandes, Ana, and Christopher Phelan. 2000. “A Recursive Formulation for Repeated Agency with History Dependence.” Journal of Economic Theory, 91(2): 223–247.

Fukushima, Kenichi. 2010. “Quantifying the Welfare Gains from Flexible Dynamic Income Tax Systems.” Mimeo.

Golosov, Mikhail, Aleh Tsyvinski, and Iván Werning. 2006. “New Dynamic Public Finance: A User’s Guide.” NBER Macroeconomics Annual, 21: 317–363.

Golosov, Mikhail, Aleh Tsyvinski, and Nicolas Werquin. 2013. “Dynamic Tax Reforms.” Working paper.

Golosov, Mikhail, and Aleh Tsyvinski. 2006. “Designing Optimal Disability Insurance: A Case for Asset Testing.” Journal of Political Economy, 114(2): 257–279.

Golosov, Mikhail, Maxim Troshkin, and Aleh Tsyvinski. 2011. “Optimal Dynamic Taxes.” NBER Working Paper 17642.

Golosov, Mikhail, Narayana Kocherlakota, and Aleh Tsyvinski. 2003. “Optimal Indirect and Capital Taxation.” Review of Economic Studies, 70(3): 569–587.

Greenwood, Jeremy, Zvi Hercowitz, and Gregory W. Huffman. 1988. “Investment, Capacity Utilization, and the Real Business Cycle.” American Economic Review, 78(3): 402–17.

Grochulski, Borys, and Narayana Kocherlakota. 2010. “Nonseparable Preferences and Optimal Social Security Systems.” Journal of Economic Theory, 145: 2055–77.

Guvenen, Fatih, Fatih Karahan, Serdar Ozkan, and Jae Song. 2013. “What Do Data on Millions of U.S. Workers Say About Labor Income Risk?” Working paper.

Guvenen, Fatih, Serdar Ozkan, and Jae Song. 2013. “The Nature of Countercyclical Income Risk.” Working paper.

Judd, Kenneth L. 1998. Numerical Methods in Economics. The MIT Press.

Judd, Kenneth L., and Che-Lin Su. 2006. “Optimal Income Taxation with Multidimensional Taxpayer Types.” Working paper.

Kapicka, Marek. 2013. “Efficient Allocations in Dynamic Private Information Economies with Persistent Shocks: A First-Order Approach.” Review of Economic Studies, 80(3): 1027–1054.

Kocherlakota, Narayana. 2005. “Zero Expected Wealth Taxes: A Mirrlees Approach to Dynamic Optimal Taxation.” Econometrica, 73(5): 1587–1621.

Kocherlakota, Narayana. 2010. The New Dynamic Public Finance. Princeton University Press, USA.

Mirrlees, James. 1971. “An Exploration in the Theory of Optimum Income Taxation.” Review of Economic Studies, 38(2): 175–208.

Mirrlees, James. 1976. “Optimal Tax Theory: A Synthesis.” Journal of Public Economics, 6(4): 327–358.

Pavan, Alessandro, Ilya Segal, and Juuso Toikka. 2010. “Dynamic Mechanism Design: Incentive Compatibility, Profit Maximization and Information Disclosure.” Working paper.

Saez, Emmanuel. 2001. “Using Elasticities to Derive Optimal Income Tax Rates.” Review of Economic Studies, 68(1): 205–229.

Storesletten, Kjetil, Christopher I. Telmer, and Amir Yaron. 2004. “Cyclical Dynamics in Idiosyncratic Labor Market Risk.” Journal of Political Economy, 112(3): 695–717.

Su, Che-Lin, and Kenneth L. Judd. 2007. “Computation of Moral-Hazard Problems.” Working paper.

Tuomala, Matti. 1990. Optimal Income Tax and Redistribution. Oxford University Press, USA.

Weinzierl, Matthew. 2011. “The Surprising Power of Age-Dependent Taxes.” Review of Economic Studies, 78(4): 1490–1518.

Werning, Iván. 2009. “Nonlinear Capital Taxation.” MIT working paper.

Wilson, R. 1996. “Nonlinear Pricing and Mechanism Design.” Handbook of Computational Economics, 1: 253–293.


A Online Appendix

A.1 Proof of Lemma 1

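Note that given any solution $u^*(\theta)$ following a sequence of reports $(\theta^{t-1}, \hat\theta)$, we can construct

$$\omega(\hat\theta|\theta) = \int_0^\infty u^*(\theta^{t-1}, \hat\theta, s)\, f_{t+1}(s|\theta)\, ds.$$

We can re-write (6) as

$$\max_{\hat\theta} V(\hat\theta;\theta) \equiv \max_{\hat\theta}\; U(c(\hat\theta), y(\hat\theta); \theta) + \beta\omega(\hat\theta|\theta).$$

Since $c(\cdot)$ and $\omega(\cdot|\theta)$ are piecewise $C^1$, they are differentiable except at a finite number of points. Then for all $\theta$ where it is differentiable,

$$U_c(c(\theta), y(\theta); \theta)\, c'(\theta) + U_y(c(\theta), y(\theta); \theta)\, y'(\theta) + \beta\omega_1(\theta|\theta) = 0, \tag{31}$$

where $c'$ and $y'$ are derivatives of $c$ and $y$. Optimality requires that $y(\cdot)$ and $V(\cdot;\theta)$ are piecewise $C^1$ if $c(\cdot)$ and $\omega(\cdot|\theta)$ are.

Suppose that the global incentive constraint is violated, i.e. $V(\hat\theta;\theta) - V(\theta;\theta) > 0$ for some $\hat\theta$. Suppose $\hat\theta > \theta$ is a point of differentiability. Then

$$0 < \int_\theta^{\hat\theta} \frac{\partial V(x;\theta)}{\partial x}\, dx = \int_\theta^{\hat\theta} \left[U_c(x;\theta)\frac{dc}{dx} + U_y(x;\theta)\frac{dy}{dx} + \beta\frac{d\omega(x|\theta)}{dx}\right] dx.$$

Since all the objects in the integral are piecewise differentiable, it can be represented as a finite sum of the terms

$$\int_{\theta_j}^{\theta_{j+1}} U_c(x;\theta)\left[c'(x) + y'(x)\frac{U_y(x;\theta)}{U_c(x;\theta)} + \beta\frac{\omega_1(x|\theta)}{U_c(x;\theta)}\right] dx$$

for some finite number of intervals $(\theta_j, \theta_{j+1})$.

If $x > \theta$, $\frac{U_y(x;\theta)}{U_c(x;\theta)} \le \frac{U_y(x;x)}{U_c(x;x)}$ and $U_c(x;\theta) \ge U_c(x;x)$ (from the single crossing property and complementarity in Assumption 1 respectively) and $\omega_1(x|x) \ge \omega_1(x|\theta)$ from Assumption 3. Therefore

$$\int_{\theta_j}^{\theta_{j+1}} U_c(x;\theta)\left[c'(x) + y'(x)\frac{U_y(x;\theta)}{U_c(x;\theta)} + \beta\frac{\omega_1(x|\theta)}{U_c(x;\theta)}\right] dx \le \int_{\theta_j}^{\theta_{j+1}} U_c(x;\theta)\left[c'(x) + y'(x)\frac{U_y(x;x)}{U_c(x;x)} + \beta\frac{\omega_1(x|x)}{U_c(x;x)}\right] dx = 0,$$

where the last equality follows from (31). Therefore, $\int_\theta^{\hat\theta} \frac{\partial V(x;\theta)}{\partial x}\, dx \le 0$, a contradiction. If $\hat\theta < \theta$ the arguments are analogous. Finally, since $V(\hat\theta;\theta)$ is continuous in $\hat\theta$, taking limits establishes that $V(\hat\theta;\theta) \le V(\theta;\theta)$ at the points of non-differentiability.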

A.2 Details of Section 2

We drop explicit time subscripts t whenever it does not lead to confusion. The

Hamiltonian to problem (7) is

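$$H = \left(c - \theta l + \delta V_{t+1}(w, w_2, \theta)\right) f_t + \mu\left[-U_l(c,l)\frac{l}{\theta} + \beta w_2\right] - \lambda_1 u(\theta) f_t + \lambda_2 u(\theta) f_{2,t} + \varphi\left[u - U(c,l) - \beta w\right]$$

and the envelope conditions are

$$\frac{\partial V_t}{\partial w} = \lambda_1, \qquad \frac{\partial V_t}{\partial w_2} = -\lambda_2. \tag{32}$$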

The first order conditions are

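$$[u]:\quad \varphi - \lambda_1 f + \lambda_2 f_2 = -\dot\mu$$

$$[l]:\quad -U_l\varphi - \theta f = -\frac{1}{\theta}\mu\left[\frac{U_{ll}l + U_l}{U_l}\right](-U_l)$$

$$[c]:\quad f - \mu U_{cl}\frac{l}{\theta} = \varphi U_c$$

$$[w]:\quad \delta\frac{\partial V_{t+1}}{\partial w} f = \varphi\beta$$

$$[w_2]:\quad \delta\frac{\partial V_{t+1}}{\partial w_2} f = -\mu\beta$$

Use the first order condition for $c$ to substitute away for $\varphi$:

$$[u]:\quad \frac{1}{U_c} f - \lambda_1 f + \lambda_2 f_2 - \frac{\mu}{\theta}\frac{U_{cl}l}{U_c} = -\dot\mu$$

$$[l]:\quad -\frac{U_l}{U_c} f - \mu\frac{U_{cl}l}{U_c}\frac{(-U_l)}{\theta} - \theta f = -\frac{1}{\theta}\mu\left[\frac{U_{ll}l + U_l}{U_l}\right](-U_l)$$

$$[w]:\quad \frac{\delta}{\beta}\frac{\partial V_{t+1}}{\partial w} = \frac{1}{U_c} - \frac{\mu}{\theta f}\frac{U_{cl}l}{U_c}$$

$$[w_2]:\quad \frac{\delta}{\beta}\frac{\partial V_{t+1}}{\partial w_2} = -\frac{\mu}{f}$$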

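Use definitions of $\varepsilon_\theta$, $\gamma_\theta$ to write the first order condition for $l$ as

$$\left(\frac{U_l}{\theta U_c} + 1\right)\theta f = \frac{1}{\theta}\mu\left(1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta\right)(-U_l).$$

Since $\tau^y = 1 + \frac{U_l}{\theta U_c}$ this can be equivalently written as

$$\frac{\tau^y}{1-\tau^y} = \frac{\mu U_c}{\theta f}\left(1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta\right). \tag{33}$$

This expression together with $[w_2]$ implies

$$\lambda_{2,t+1} = -\frac{\partial V_{t+1}}{\partial w_2} = \frac{\beta}{\delta}\frac{\tau^y_{\theta_t}}{1-\tau^y_{\theta_t}}\frac{\theta_t}{U_c(\theta_t)}\left(1 + \frac{1}{\varepsilon_{\theta_t}} - \gamma_{\theta_t}\right)^{-1}. \tag{34}$$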

To find µ we integrate [u]

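$$\mu(\theta) = \int_\theta^\infty \exp\left(-\int_\theta^x \gamma_x\frac{dx}{x}\right)\left(\frac{1}{U_c(x)} f(x) - \lambda_1 f(x) + \lambda_2 f_2(x)\right) dx.$$

From boundary condition $\mu(0) = 0$ we get

$$\lambda_{1,t} = \frac{\int_0^\infty \exp\left(-\int_0^x \gamma_x\frac{dx}{x}\right)\left(\frac{1}{U_c} f_t + \lambda_{2,t} f_{2,t}\right) dx}{\int_0^\infty \exp\left(-\int_0^x \gamma_x\frac{dx}{x}\right) f_t\, dx} \tag{35}$$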

and λ2,t is given by (34).

Use the expression for µ (θ) and (34) for t− 1 to substitute into (33)

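$$\begin{aligned}
\frac{\tau^y(\theta)}{1-\tau^y(\theta)} ={}& \left(1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta\right)\frac{1}{\theta_t f_t(\theta)}\int_\theta^\infty \frac{U_c(\theta_t)}{U_c(x)}\exp\left(-\int_\theta^x \gamma_x\frac{dx}{x}\right)\left(1 - \lambda_{1,t}U_c(x)\right) f_t(x)\, dx \\
&+ \frac{\beta}{\delta}\frac{\tau^y_{\theta_{t-1}}}{1-\tau^y_{\theta_{t-1}}}\,\frac{1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta}{1 + \frac{1}{\varepsilon_{\theta_{t-1}}} - \gamma_{\theta_{t-1}}}\,\frac{U_c(\theta)}{U_c(\theta^{t-1})}\,\frac{\theta_{t-1}}{\theta f_t(\theta)}\int_\theta^\infty \exp\left(-\int_\theta^x \gamma_x\frac{dx}{x}\right) f_{2,t}(x)\, dx
\end{aligned}$$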

which is the same expression as (17). This proof with slight modifications also

yields (18).

Period 0 expression and static expression can be derived along the same lines. In both cases $\lambda_{2,0} = 0$ and the condition for $[u]$ is replaced by the expression $\varphi - \lambda_1\alpha f = -\dot\mu$. The rest of the analysis is unchanged.

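Finally, we prove Proposition 1. Note that we can express $\varphi$ using the first order conditions for $w$ rather than $c$, in which case the differential equation for $\mu$ becomes $\frac{\delta}{\beta}\frac{\partial V_{t+1}}{\partial w} f - \lambda_1 f + \lambda_2 f_2 = -\dot\mu$. Integrate this expression from 0 to infinity, use the boundary conditions $\mu(0) = \mu(\infty) = 0$ and $\int_0^\infty f_2\, dx = 0$ to obtain15

$$\lambda_{1,t} = \frac{\delta}{\beta}\int_0^\infty \frac{\partial V_{t+1}}{\partial w}(x)\, f(x)\, dx.$$

Combine this expression with (32) to get $\frac{\partial V_t}{\partial w} = \frac{\delta}{\beta}E_t\frac{\partial V_{t+1}}{\partial w}$ and by the law of iterated expectations

$$\frac{\partial V_t}{\partial w} = \left(\frac{\delta}{\beta}\right)^{T-t-1} E_t\frac{\partial V_T}{\partial w}.$$

When $F_T(0|\theta) = 1$ for all $\theta$, $\frac{\partial V_T}{\partial w}(\theta^T) = \frac{1}{U_c(\theta^T)}$, which, in combination with the first order condition on $[w]$ implies

$$\frac{1}{U_c(\theta^t)} - \frac{\mu}{\theta f}\gamma_\theta = \left(\frac{\delta}{\beta}\right)^{T-t} E_t\frac{1}{U_c(\theta^T)}.$$

From Assumption 2, $1 + \frac{1}{\varepsilon_\theta} - \gamma_\theta > 0$, and then from equation (33) the sign of $\mu$ is equal to the sign of $\tau^y$. Thus if $\tau^y \ge 0$,

$$\frac{1}{U_c(\theta^t)} \ge \left(\frac{\delta}{\beta}\right)^{T-t} E_t\frac{1}{U_c(\theta^T)} \ge \left(\frac{\delta}{\beta}\right)^{T-t}\frac{1}{E_t U_c(\theta^T)},$$

where the last expression follows from Jensen’s inequality. This expression implies that $\tau^s_t(\theta^t) \ge 0$. This inequality is strict if $\mathrm{var}_{\theta^t}(c_T) > 0$.

15 To see that $\int_0^\infty f_2(x|\theta_-)\, dx = 0$ for all $\theta_-$, differentiate both sides of $\int_0^\infty f(x|\theta_-)\, dx = 1$ with respect to $\theta_-$.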

A.3 Details for Section 2.1 and Section 2.3

In this part we provide details of the derivations used in Section 2.1 and Section

2.3. First we state some preliminary facts about distributions used in these

sections.

A.3.1 Preliminaries: properties of distributions

In the paper we considered stochastic processes of the form

$$\ln\theta_t = \rho\ln\theta_{t-1} + \varepsilon_t$$

where $\varepsilon_t$ can take one of the three forms:

• log-normal distribution: $\varepsilon_t$ is normally distributed with mean 0 and variance $\sigma^2$.

• mixture of log-normals: $\varepsilon_t$ is drawn with probability $p_i$ from a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$ for $i = 1, ..., I$.

• fat-tailed distribution: $\varepsilon_t$ has a tail with a pdf $\text{const}\cdot\frac{a}{x^{a+1}}$ for some $a > 1$.

Let $f(\theta_t|\theta_{t-1})$ be the pdf of $\theta_t$. We will say that $f$ is log-normal, mixture of log-normals or Pareto if $\varepsilon_t$ are drawn from one of the three respective distributions mentioned above. Since time subscripts are irrelevant, we will use notation $f(\theta|\theta_-)$ for $f(\theta_t|\theta_{t-1})$. Let $f'(\theta|\theta_-)$ be the derivative of $f$ with respect to $\theta$ and $f_2(\theta|\theta_-)$ be the derivative of $f$ with respect to $\theta_-$.

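Lemma 2. 1. Suppose that $f(\theta|\theta_-)$ is log-normal. Then

$$\frac{\theta_-\int_\theta^\infty f_2(x|\theta_-)\, dx}{\theta f(\theta|\theta_-)} = \rho, \tag{36}$$

$$\frac{\theta f'(\theta|\theta_-)}{f(\theta|\theta_-)} = -\left(\frac{\ln\theta - \rho\ln\theta_-}{\sigma^2} + 1\right) \tag{37}$$

and for any $\kappa < 1$

$$\lim_{\theta\to\infty}\frac{1}{\theta^{1-\kappa}}\frac{f_2(\theta|\theta_-)}{f(\theta|\theta_-)} = 0. \tag{38}$$

2. Suppose that $f(\theta|\theta_-)$ is a mixture of log-normals. Then equations (36) and (38) hold and

$$\lim_{\theta\to 0}\frac{\theta f'(\theta|\theta_-)}{f(\theta|\theta_-)} = \infty. \tag{39}$$

Suppose $\sigma_1^2 \ge \sigma_i^2$ for all $i$. Then

$$\lim_{\theta\to\infty}\frac{\theta f'(\theta|\theta_-)/f(\theta|\theta_-)}{-\left(\frac{\ln\theta - \rho\ln\theta_- - \mu_1}{\sigma_1^2} + 1\right)} = 1. \tag{40}$$

3. Suppose that $f(\theta|\theta_-)$ is Pareto. Then

$$\lim_{\theta\to\infty}\frac{\theta f'(\theta|\theta_-)}{f(\theta|\theta_-)} = -(a+1).$$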

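Proof. Proof of part 1.

When shocks are log-normal,

$$f(\theta|\theta_-) = \frac{1}{\theta\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right).$$

In this case we have

$$f_2(\theta|\theta_-) = \frac{1}{\theta\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right)\frac{(\ln\theta - \rho\ln\theta_-)}{\sigma^2}\frac{\rho}{\theta_-}.$$

We need to find $\frac{\int_\theta^\infty f_2(x|\theta_-)\, dx}{\theta f(\theta|\theta_-)}$. Let us start with $\int_\theta^\infty f_2(x|\theta_-)\, dx$.

Case 1: Suppose that $\theta \ge \exp(\rho\ln\theta_-)$. Let $z = \frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}$ so that $dz = \frac{(\ln\theta - \rho\ln\theta_-)}{\sigma^2}\frac{1}{\theta}\, d\theta$. We have

$$\begin{aligned}
\int_\theta^\infty f_2(x|\theta_-)\, dx &= \int_\theta^\infty \frac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x - \rho\ln\theta_-)^2}{2\sigma^2}\right)\frac{(\ln x - \rho\ln\theta_-)}{\sigma^2}\frac{\rho}{\theta_-}\, dx \\
&= \frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}\int_{\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}}^\infty \exp(-z)\, dz \\
&= \frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right).
\end{aligned}$$

Case 2: Suppose that $\theta < \exp(\rho\ln\theta_-)$. In this case we can write

$$\int_\theta^\infty f_2(x|\theta_-)\, dx = \int_\theta^{\exp(\rho\ln\theta_-)} f_2(x|\theta_-)\, dx + \int_{\exp(\rho\ln\theta_-)}^\infty f_2(x|\theta_-)\, dx.$$

The second integral can be inferred from Case 1; it is equal to $\frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}$. For the first integral we again define $z = \frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}$ and note that

$$\begin{aligned}
\int_\theta^{\exp(\rho\ln\theta_-)} f_2(x|\theta_-)\, dx &= -\frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}\int_0^{\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}} \exp(-z)\, dz \\
&= \frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right) - \frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}.
\end{aligned}$$

Therefore in this case

$$\int_\theta^\infty f_2(x|\theta_-)\, dx = \int_\theta^{\exp(\rho\ln\theta_-)} f_2(x|\theta_-)\, dx + \int_{\exp(\rho\ln\theta_-)}^\infty f_2(x|\theta_-)\, dx = \frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right).$$

Combining both cases, we see that

$$\int_\theta^\infty f_2(x|\theta_-)\, dx = \frac{\rho}{\theta_-}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right) \quad\text{for all } \theta$$

and

$$\frac{\theta_-\int_\theta^\infty f_2(x|\theta_-)\, dx}{\theta f(\theta|\theta_-)} = \rho,$$

which proves (36).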
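As an illustrative numerical check of (36), the following minimal sketch integrates $f_2$ over the upper tail for a log-normal density and compares the ratio with $\rho$. The function names, the trapezoid rule, the truncation point and the parameter values are assumptions made here for illustration; they are not part of the original analysis.

```python
import math

def lognormal_pdf(theta, theta_prev, rho, sigma):
    """f(theta | theta_prev) when ln(theta) = rho*ln(theta_prev) + eps, eps ~ N(0, sigma^2)."""
    m = math.log(theta) - rho * math.log(theta_prev)
    return math.exp(-m * m / (2.0 * sigma ** 2)) / (theta * sigma * math.sqrt(2.0 * math.pi))

def f2(theta, theta_prev, rho, sigma):
    """Derivative of f with respect to theta_prev, using the closed form in the proof."""
    m = math.log(theta) - rho * math.log(theta_prev)
    return lognormal_pdf(theta, theta_prev, rho, sigma) * (m / sigma ** 2) * (rho / theta_prev)

def tail_integral_f2(theta, theta_prev, rho, sigma, n=20000, width=12.0):
    """Trapezoid rule for the integral of f2 over [theta, inf), after substituting x = exp(s).

    The upper limit is truncated `width` standard deviations above the conditional mean
    (an assumption of this sketch); the integrand is negligible beyond that point.
    """
    lo = math.log(theta)
    hi = max(lo, rho * math.log(theta_prev)) + width * sigma
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = math.exp(lo + k * h)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f2(x, theta_prev, rho, sigma) * x  # dx = x ds
    return total * h

def ratio_36(theta, theta_prev, rho, sigma):
    """Left-hand side of (36); the lemma says it equals rho for every theta."""
    integral = tail_integral_f2(theta, theta_prev, rho, sigma)
    return theta_prev * integral / (theta * lognormal_pdf(theta, theta_prev, rho, sigma))

# Illustrative parameter values (hypothetical), one point on each side of the median.
low_ratio = ratio_36(0.5, 1.3, 0.8, 0.4)
high_ratio = ratio_36(3.0, 1.3, 0.8, 0.4)
```

Both ratios should be close to the chosen $\rho$, which covers Case 1 and Case 2 of the proof.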

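To see (37), differentiate directly

$$\frac{\theta f'}{f} = \frac{\theta\left[\frac{1}{\theta\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right)\frac{-(\ln\theta - \rho\ln\theta_-)}{\sigma^2}\frac{1}{\theta} - \frac{1}{\theta^2\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right)\right]}{\frac{1}{\theta\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_-)^2}{2\sigma^2}\right)} = -\frac{(\ln\theta - \rho\ln\theta_-)}{\sigma^2} - 1.$$

To see (38) use L’Hospital’s rule

$$\begin{aligned}
\lim_{\theta\to\infty}\frac{1}{\theta^{1-\kappa}}\frac{f_2(\theta|\theta_-)}{f(\theta|\theta_-)} &= \frac{\rho}{\sigma^2\theta_-}\lim_{\theta\to\infty}\frac{(\ln\theta - \rho\ln\theta_-)}{\theta^{1-\kappa}} \\
&= \frac{\rho}{\sigma^2\theta_-}\lim_{\theta\to\infty}\frac{\ln\theta}{\theta^{1-\kappa}} \\
&= \frac{\rho}{\sigma^2\theta_-}\lim_{\theta\to\infty}\frac{\theta^{-1}}{(1-\kappa)\theta^{-\kappa}} \\
&= \frac{\rho}{\sigma^2\theta_-(1-\kappa)}\lim_{\theta\to\infty}\frac{1}{\theta^{1-\kappa}} \\
&= 0.
\end{aligned}$$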

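Proof of part 2.

When shocks are a mixture of log-normals,

$$f(\theta|\theta_-) = \sum_{i=1}^I p_i\frac{1}{\theta\sigma_i\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_- - \mu_i)^2}{2\sigma_i^2}\right)$$

and

$$f_2(\theta|\theta_-) = \sum_{i=1}^I p_i\frac{1}{\theta\sigma_i\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_- - \mu_i)^2}{2\sigma_i^2}\right)\frac{(\ln\theta - \rho\ln\theta_- - \mu_i)}{\sigma_i^2}\frac{\rho}{\theta_-}.$$

Using the same steps as in the proof of the previous part, we obtain

$$\int_\theta^\infty f_2(x|\theta_-)\, dx = \frac{\rho}{\theta_-}\sum_{i=1}^I p_i\frac{1}{\sigma_i\sqrt{2\pi}}\exp\left(-\frac{(\ln\theta - \rho\ln\theta_- - \mu_i)^2}{2\sigma_i^2}\right) = \frac{\rho}{\theta_-}\theta f(\theta|\theta_-),$$

which implies that $f$ satisfies (36).

To see (40) and (39), write $m_i \equiv \ln\theta - \rho\ln\theta_- - \mu_i$ to shorten the expressions and observe that

$$\frac{\theta f'}{f} = \frac{\sum_{i=1}^I \frac{p_i}{\sigma_i}\exp\left(-\frac{m_i^2}{2\sigma_i^2}\right)\left(-\frac{m_i}{\sigma_i^2} - 1\right)}{\sum_{i=1}^I \frac{p_i}{\sigma_i}\exp\left(-\frac{m_i^2}{2\sigma_i^2}\right)} = \left(-\frac{m_1}{\sigma_1^2} - 1\right)\times\frac{\frac{p_1}{\sigma_1} + \sum_{i=2}^I \frac{p_i}{\sigma_i}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right)\frac{-m_i - \sigma_i^2}{-m_1 - \sigma_1^2}\frac{\sigma_1^2}{\sigma_i^2}}{\frac{p_1}{\sigma_1} + \sum_{i=2}^I \frac{p_i}{\sigma_i}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right)}.$$

Since $\sigma_1^2 \ge \sigma_i^2$, $\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2} \to -\infty$ as $\ln\theta \to \pm\infty$ and therefore

$$\lim_{\ln\theta\to\pm\infty}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right) = 0,$$

and

$$\lim_{\ln\theta\to\pm\infty}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right)\frac{-m_i - \sigma_i^2}{-m_1 - \sigma_1^2} = 0.$$

This implies that

$$\lim_{\ln\theta\to\pm\infty}\frac{\frac{p_1}{\sigma_1} + \sum_{i=2}^I \frac{p_i}{\sigma_i}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right)\frac{-m_i - \sigma_i^2}{-m_1 - \sigma_1^2}\frac{\sigma_1^2}{\sigma_i^2}}{\frac{p_1}{\sigma_1} + \sum_{i=2}^I \frac{p_i}{\sigma_i}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right)} = 1,$$

which proves (40) and (39).

To see that $f$ also satisfies (38), note that

$$\frac{f_2}{f} = \frac{\rho}{\theta_-}\frac{\sum_{i=1}^I \frac{p_i}{\sigma_i}\exp\left(-\frac{m_i^2}{2\sigma_i^2}\right)\frac{m_i}{\sigma_i^2}}{\sum_{i=1}^I \frac{p_i}{\sigma_i}\exp\left(-\frac{m_i^2}{2\sigma_i^2}\right)} = \frac{\rho}{\theta_-}\frac{m_1}{\sigma_1^2}\times\frac{\frac{p_1}{\sigma_1} + \sum_{i=2}^I \frac{p_i}{\sigma_i}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right)\frac{-m_i}{-m_1}\frac{\sigma_1^2}{\sigma_i^2}}{\frac{p_1}{\sigma_1} + \sum_{i=2}^I \frac{p_i}{\sigma_i}\exp\left(\frac{m_1^2}{2\sigma_1^2} - \frac{m_i^2}{2\sigma_i^2}\right)}.$$

The last term converges to 1, and L’Hospital’s rule implies that $\frac{1}{\theta^{1-\kappa}}(\ln\theta - \rho\ln\theta_- - \mu_1) \to 0$ as $\theta \to \infty$.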

Proof of part 3.

If $\varepsilon$ has a Pareto tail, then for high $\theta$ the distribution $f$ must take a form $f(\theta|\theta_-) = \text{const}\cdot\frac{a}{\theta^{a+1}}$. Direct calculation shows that it satisfies $\frac{\theta f'}{f} = -(a+1)$.
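For completeness, the direct calculation can be spelled out as follows, writing $C$ for the constant (a label introduced here only for this sketch) and treating it as independent of $\theta$ in the tail:

```latex
f(\theta|\theta_-) = C\,a\,\theta^{-(a+1)}
\;\Longrightarrow\;
f'(\theta|\theta_-) = -(a+1)\,C\,a\,\theta^{-(a+2)}
\;\Longrightarrow\;
\frac{\theta f'(\theta|\theta_-)}{f(\theta|\theta_-)}
  = \frac{-(a+1)\,C\,a\,\theta^{-(a+1)}}{C\,a\,\theta^{-(a+1)}}
  = -(a+1).
```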

A.3.2 Labor distortions with CES preferences

Equation (22) follows from (17) by applying Lemma 2. The fact that $\lambda_{1,t} = \int_0^\infty \frac{f(x)\, dx}{U_c(x)}$ follows from (35) together with the fact that separability in preferences implies $\gamma = 0$ and $\int_0^\infty f_2(x|\theta_-)\, dx = 0$.

Next we present the proof of Proposition 2.

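Proof. Let $\Psi(\theta) = \left[\int_\theta^\infty \left(\frac{1}{U_c(x)} - \lambda_1\right) f(x|\theta_-)\, dx\right]$. To prove the equality in Proposition 2 first we show that $\lim_{\theta\to 0}\frac{\Psi(\theta)}{\theta f} = 0$. Applying L’Hospital’s rule

$$\lim_{\theta\to 0}\frac{\Psi(\theta)}{\theta f} = \lim_{\theta\to 0}\frac{\Psi'(\theta)}{f + \theta f'} = \lim_{\theta\to 0}\frac{\left(\lambda_1 - 1/U_c(\theta)\right)}{1 + \frac{\theta f'}{f}}.$$

By Lemma 2, $\frac{\theta f'}{f} = -\left(\frac{\ln\theta - \rho\ln\theta_-}{\sigma^2} + 1\right)$ and hence $\lim_{\theta\to 0}\frac{\theta f'}{f} = \infty$. Since $U_c(0)$ is finite, $\lim_{\theta\to 0}\frac{\Psi(\theta)U_c(\theta)}{\theta f} = 0$. Therefore

$$\left(1 + \frac{1}{\varepsilon}\right)\frac{1 - F(\theta|\theta_-)}{\theta f(\theta|\theta_-)}\int_\theta^\infty \frac{U_c(\theta)}{U_c(x)}\left(1 - \lambda_1 U_c(x)\right)\frac{f(x|\theta_-)\, dx}{1 - F(\theta|\theta_-)} = \left(1 + \frac{1}{\varepsilon}\right)U_c(\theta)\frac{\Psi(\theta)}{\theta f} \to 0.$$

This shows the equality part in Proposition 2.

The inequality follows if we show that $1 \le \frac{\beta}{\delta}\frac{U_c(0|\theta^{t-1})}{U_c(\theta^{t-1})}$. When preferences are separable, the first order conditions for $c$ and $w$ together with (35) imply that

$$\frac{1}{U_c(\theta^{t-1})} = \frac{\delta}{\beta}\frac{\partial V_t}{\partial w} = \frac{\delta}{\beta}\lambda_{1,t} = \frac{\delta}{\beta}\int_0^\infty \frac{f(x)\, dx}{U_c(x|\theta^{t-1})}.$$

If $c$ increases in $\theta_t$, we can further show that

$$\lim_{\theta\to 0}\frac{\tau^y_t(\theta|\theta^{t-1})}{1 - \tau^y_t(\theta|\theta^{t-1})} \ge \frac{\beta}{\delta}\rho\frac{\tau^y_{t-1}(\theta^{t-1})}{1 - \tau^y_{t-1}(\theta^{t-1})}.$$

Indeed, in this case $1/U_c(0|\theta^{t-1}) \le 1/U_c(\theta_t|\theta^{t-1})$ and therefore the previous equation implies

$$\frac{1}{U_c(\theta^{t-1})} \ge \frac{\delta}{\beta}\int_0^\infty \frac{f(x)\, dx}{U_c(0|\theta^{t-1})} = \frac{\delta}{\beta}\frac{1}{U_c(0|\theta^{t-1})},$$

from which it follows that $1 \le \frac{\beta}{\delta}\frac{U_c(0|\theta^{t-1})}{U_c(\theta^{t-1})}$.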

We now characterize properties of labor distortions for high realizations of

θ.

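Multiply both sides of (23) by $\theta^\kappa$ and use definition of $v$

$$v = \left(1 + \frac{1}{\varepsilon}\right)\frac{1}{\theta^{2-\kappa} f}\int_\theta^\infty \left(\frac{1}{U_c(x)} - \lambda_1 + \lambda_2\frac{f_2}{f}\right) f\, dx.$$

Differentiate and multiply by $\theta$

$$\theta\frac{dv}{d\theta} = -\left(1 + \frac{1}{\varepsilon}\right)\left(\frac{1}{\theta^{1-\kappa}}\frac{1}{U_c(\theta)} - \frac{1}{\theta^{1-\kappa}}\lambda_1 + \frac{1}{\theta^{1-\kappa}}\lambda_2\frac{f_2}{f}\right) - v\left(2 - \kappa + \frac{\theta f'}{f}\right).$$

Note that using definitions of $(1 - \tau^y)$, $s$ and $v$ we have

$$\frac{1}{U_c(\theta)} = \frac{\theta(1 - \tau^y)}{l^{1/\varepsilon}} = \frac{\theta}{\theta^\kappa}\frac{\theta^\kappa}{l^{1/\varepsilon}}(1 - vs) = \theta^{1-\kappa}\left(\frac{1}{s} - v\right).$$

Substitute this for $1/U_c(\theta)$ to get (24):

$$\theta\frac{dv}{d\theta} = -\left(1 + \frac{1}{\varepsilon}\right)\left(\frac{1}{s} - v - \frac{1}{\theta^{1-\kappa}}\lambda_1 + \lambda_2\frac{1}{\theta^{1-\kappa}}\frac{f_2}{f}\right) - v\left(2 - \kappa + \frac{\theta f'}{f}\right).$$

Finally, we show the relationship of our formulas in the case of thick-tailed distribution with the results obtained by Saez (2001). Saez’s formula (30)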

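expresses asymptotic taxes in a static model with respect to compensated and uncompensated elasticities of labor supply, $\zeta^c$ and $\zeta^u$. To obtain the expressions for $\zeta^c$ and $\zeta^u$ for our preferences, consider a consumer who maximizes her utility subject to a budget constraint $c = wl + R$ where $R$ is non-labor income and $w$ is the wage rate. The optimality condition is $l^{1/\varepsilon} = wc^{-\sigma}$. Substitute the expression for $c$ from the budget constraint to get

$$l^{1/\varepsilon} - w(wl + R)^{-\sigma} = 0. \tag{41}$$

Fully differentiate this expression

$$\left[\frac{1}{\varepsilon}l^{1/\varepsilon - 1} + \sigma w^2(wl + R)^{-\sigma-1}\right] dl - \left[(wl + R)^{-\sigma} - \sigma wl(wl + R)^{-\sigma-1}\right] dw + \sigma w(wl + R)^{-\sigma-1}\, dR = 0.$$

The uncompensated elasticity of labor supply $\zeta^u = \frac{\partial l}{\partial w}\frac{w}{l}$ is given by

$$\zeta^u = \frac{w}{l}\frac{(wl + R)^{-\sigma} - \sigma wl(wl + R)^{-\sigma-1}}{\frac{1}{\varepsilon}l^{1/\varepsilon - 1} + \sigma w^2(wl + R)^{-\sigma-1}} = \varepsilon\frac{w(wl + R)^{-\sigma} - \sigma w^2 l(wl + R)^{-\sigma-1}}{l^{1/\varepsilon} + \varepsilon\sigma w^2 l(wl + R)^{-\sigma-1}}.$$

Substitute (41) for $l^{1/\varepsilon}$ to get

$$\zeta^u = \varepsilon\frac{w(wl + R)^{-\sigma} - \sigma w^2 l(wl + R)^{-\sigma-1}}{w(wl + R)^{-\sigma} + \varepsilon\sigma w^2 l(wl + R)^{-\sigma-1}} = \varepsilon\frac{1 - \sigma\frac{wl}{wl + R}}{1 + \varepsilon\sigma\frac{wl}{wl + R}}.$$

As $w \to \infty$, $wl \to \infty$ and therefore $\zeta^u \to \frac{\varepsilon - \varepsilon\sigma}{1 + \varepsilon\sigma}$.

Income effect, $\eta_{income} = \frac{\partial l}{\partial R}w$, is equal to

$$\begin{aligned}
\eta_{income} &= -w\frac{\sigma w(wl + R)^{-\sigma-1}}{\frac{1}{\varepsilon}l^{1/\varepsilon - 1} + \sigma w^2(wl + R)^{-\sigma-1}} \\
&= -\varepsilon w\frac{\sigma wl(wl + R)^{-\sigma-1}}{l^{1/\varepsilon} + \varepsilon\sigma w^2 l(wl + R)^{-\sigma-1}} \\
&= -\varepsilon w\frac{\sigma wl(wl + R)^{-\sigma-1}}{w(wl + R)^{-\sigma} + \varepsilon\sigma w^2 l(wl + R)^{-\sigma-1}} \\
&= -\frac{\varepsilon\sigma\frac{wl}{wl + R}}{1 + \varepsilon\sigma\frac{wl}{wl + R}},
\end{aligned}$$

which implies that $\eta_{income} \to -\frac{\varepsilon\sigma}{1 + \varepsilon\sigma}$ as $w \to \infty$. By Slutsky equality $\zeta^c = \zeta^u - \eta_{income} \to \frac{\varepsilon}{1 + \varepsilon\sigma}$. Substituting it into (30) gives (29).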

A.4 Additional details for Section 2.2

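Observe that for preferences of the form (27), $\gamma = -\frac{U''}{U'}l^{1+1/\varepsilon} > 0$ and expression (17) becomes

$$\begin{aligned}
\frac{\tau_t(\theta|\theta^{t-1})}{1 - \tau_t(\theta|\theta^{t-1})} ={}& \left(1 + \frac{1}{\varepsilon}\right)\frac{1 - F_t(\theta|\theta_{t-1})}{\theta f_t(\theta|\theta_{t-1})} \\
&\times\int_\theta^\infty \frac{U'(\theta)}{U'(x)}\exp\left(-\int_\theta^x -\frac{U''}{U'}l^{1+1/\varepsilon}\frac{dx}{x}\right)\left(1 - \lambda_{1,t}U'(x)\right)\frac{f_t(x|\theta_{t-1})\, dx}{1 - F_t(\theta|\theta_{t-1})} \\
&+ \frac{\beta}{\delta}\frac{\tau_{t-1}(\theta^{t-1})}{1 - \tau_{t-1}(\theta^{t-1})}\frac{U'(\theta)}{U'(\theta^{t-1})}\frac{\theta_{t-1}}{\theta}\frac{1}{f_t(\theta|\theta_{t-1})}\int_\theta^\infty \exp\left(-\int_\theta^x \gamma_x\frac{dx}{x}\right) f_{2,t}(x|\theta_{t-1})\, dx.
\end{aligned}$$

Now, observe that

$$\begin{aligned}
\frac{U'(\theta)}{U'(x)}\exp\left(-\int_\theta^x -\frac{U''}{U'}l^{1+1/\varepsilon}\frac{dx}{x}\right) &= \exp\left(\ln\frac{U'(\theta)}{U'(x)} - \int_\theta^x -\frac{U''}{U'}l^{1+1/\varepsilon}\frac{dx}{x}\right) \\
&= \exp\left(-\int_\theta^x \frac{dU'(x)}{U'(x)} - \int_\theta^x -\frac{U''}{U'}l^{1+1/\varepsilon}\frac{dx}{x}\right).
\end{aligned}$$

We have $dU'(x) = U''\left(\dot c - l^{1/\varepsilon}\dot l\right)$. Also from (8) and (11) we have

$$U'\dot c - U'l^{1/\varepsilon}\dot l + \beta\dot w = U'l^{1+1/\varepsilon}\frac{1}{\theta} + \beta w_2,$$

$$\dot c - l^{1/\varepsilon}\dot l = l^{1+1/\varepsilon}\frac{1}{\theta} - \beta\frac{\dot w - w_2}{U'}.$$

Substitute that into the expression above

$$\begin{aligned}
\frac{U'(\theta)}{U'(x)}\exp\left(-\int_\theta^x -\frac{U''}{U'}l^{1+1/\varepsilon}\frac{dx}{x}\right) &= \exp\left(\int_\theta^x -\frac{U''}{U'}\left(l^{1+1/\varepsilon}\frac{1}{x} - \beta\frac{\dot w - w_2}{U'}\right) dx - \int_\theta^x -\frac{U''}{U'}l^{1+1/\varepsilon}\frac{dx}{x}\right) \\
&= \exp\left(\int_\theta^x \beta\frac{U''}{U'}\left(\dot w - w_2\right) dx\right).
\end{aligned}$$

Proof of Proposition 3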

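Proof of part 1. The optimization problem for the model with retirement can be written as

Problem 1

$$\min_{c_0, c_1, u, l_0}\int_0^\infty (c_0 + \delta c_1 - \theta l_0)\, dF_0$$

subject to

$$u'(\theta) = \frac{1}{\theta}l^{1+\frac{1}{\varepsilon}}$$

$$u = \frac{c_0^{1-\sigma}}{1-\sigma} + \beta\frac{c_1^{1-\sigma}}{1-\sigma} - \left(1 + \frac{1}{\varepsilon}\right)^{-1} l^{1+1/\varepsilon}$$

$$w_0 = \int_0^\infty u\, dF_0$$

It is straightforward to show that if $\delta = \beta$ then $c_0 = c_1$ in the optimum. Next, we show that by redefining the variables, this problem becomes isomorphic to a static model. Let $c_0 = c_1 = \varpi_2\tilde c$ and $l_0 = \varpi_1\tilde l$ where $\varpi_1 = \left(\frac{1}{1+\beta}\right)^{-\varepsilon\sigma/(1+\sigma\varepsilon)}$ and $\varpi_2 = \left(\frac{1}{1+\beta}\right)^{1/(1+\varepsilon\sigma)}$. Note that with this redefinition

$$\begin{aligned}
u &= (1+\beta)\frac{c_0^{1-\sigma}}{1-\sigma} - \left(1 + \frac{1}{\varepsilon}\right)^{-1} l^{1+1/\varepsilon} \\
&= \left(\frac{1}{1+\beta}\right)^{\frac{1-\sigma}{1+\varepsilon\sigma} - 1}\frac{\tilde c^{1-\sigma}}{1-\sigma} - \left(\frac{1}{1+\beta}\right)^{-\frac{\varepsilon\sigma}{1+\sigma\varepsilon}\frac{1+\varepsilon}{\varepsilon}}\left(1 + \frac{1}{\varepsilon}\right)^{-1}\tilde l^{1+1/\varepsilon} \\
&= \left(\frac{1}{1+\beta}\right)^{-\frac{\sigma+\sigma\varepsilon}{1+\varepsilon\sigma}}\left[\frac{\tilde c^{1-\sigma}}{1-\sigma} - \left(1 + \frac{1}{\varepsilon}\right)^{-1}\tilde l^{1+1/\varepsilon}\right].
\end{aligned}$$

Similarly,

$$\begin{aligned}
\int_0^\infty (c_0 + \delta c_1 - \theta l_0)\, dF_0 &= \int_0^\infty \left((1+\beta)c_0 - \theta l_0\right) dF_0 \\
&= \int_0^\infty \left(\left(\frac{1}{1+\beta}\right)^{\frac{1}{1+\varepsilon\sigma} - 1}\tilde c - \theta\left(\frac{1}{1+\beta}\right)^{\frac{-\varepsilon\sigma}{1+\sigma\varepsilon}}\tilde l\right) dF_0 \\
&= \left(\frac{1}{1+\beta}\right)^{\frac{-\varepsilon\sigma}{1+\sigma\varepsilon}}\int_0^\infty \left(\tilde c - \theta\tilde l\right) dF_0.
\end{aligned}$$

Therefore, the minimization problem can be re-written as

$$\min_{\tilde c, u, \tilde l}\int_0^\infty \left(\tilde c - \theta\tilde l\right) dF_0$$

subject to

$$u'(\theta) = \left(\frac{1}{1+\beta}\right)^{-\frac{\sigma+\varepsilon\sigma}{1+\sigma\varepsilon}}\frac{1}{\theta}\tilde l^{1+\frac{1}{\varepsilon}}$$

$$u = \left(\frac{1}{1+\beta}\right)^{-\frac{\sigma+\sigma\varepsilon}{1+\varepsilon\sigma}}\left[\frac{\tilde c^{1-\sigma}}{1-\sigma} - \left(1 + \frac{1}{\varepsilon}\right)^{-1}\tilde l^{1+1/\varepsilon}\right]$$

$$w_0 = \int_0^\infty u\, dF_0$$

Now, defining $\tilde u$ by $u = \left(\frac{1}{1+\beta}\right)^{-\frac{\sigma+\sigma\varepsilon}{1+\varepsilon\sigma}}\tilde u$, $w_0 = \left(\frac{1}{1+\beta}\right)^{-\frac{\sigma+\sigma\varepsilon}{1+\varepsilon\sigma}}\tilde w_0$ this problem becomes

Problem 0

$$\min_{\tilde c, \tilde u, \tilde l}\int_0^\infty \left(\tilde c - \theta\tilde l\right) dF_0$$

subject to

$$\tilde u'(\theta) = \frac{1}{\theta}\tilde l^{1+\frac{1}{\varepsilon}}$$

$$\tilde u = \left[\frac{\tilde c^{1-\sigma}}{1-\sigma} - \left(1 + \frac{1}{\varepsilon}\right)^{-1}\tilde l^{1+1/\varepsilon}\right]$$

$$\tilde w_0 = \int_0^\infty \tilde u\, dF_0$$

which is a static optimization problem. Therefore, if $(\tilde c^*, \tilde l^*)$ is a solution to Problem 0 for $\tilde w_0$ such that $\int_0^\infty (\tilde c^* - \theta\tilde l^*)\, dF_0 = 0$, then $(c_0^*, c_1^*, l_0^*) = (\varpi_2\tilde c^*, \varpi_2\tilde c^*, \varpi_1\tilde l^*)$ is a solution to Problem 1 for $w_0$ such that $\int_0^\infty (c_0^* + \delta c_1^* - \theta l_0^*)\, dF_0 = 0$. By definition of labor distortion

$$1 - \tau^y_{static}(\theta) = \frac{(\tilde l^*)^{1/\varepsilon}}{\theta(\tilde c^*)^{-\sigma}} = \frac{\varpi_1^{1/\varepsilon}(l_0^*)^{1/\varepsilon}}{\varpi_2^{-\sigma}\theta(c_0^*)^{-\sigma}} = \frac{\left(\frac{1}{1+\beta}\right)^{-\sigma/(1+\sigma\varepsilon)}}{\left(\frac{1}{1+\beta}\right)^{-\sigma/(1+\varepsilon\sigma)}}\frac{(l_0^*)^{1/\varepsilon}}{\theta(c_0^*)^{-\sigma}} = 1 - \tau^y_0(\theta).$$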
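The rescaling argument can be sanity-checked numerically. The following sketch picks arbitrary values for the static allocation, applies the two scale factors, and confirms that utility, net resources and the labor wedge transform as claimed when $\delta = \beta$. The function names and the numbers are hypothetical and chosen only for illustration.

```python
def scale_factors(beta, eps, sigma):
    """Return (varpi_1, varpi_2) as defined in the change of variables."""
    base = 1.0 / (1.0 + beta)
    varpi1 = base ** (-eps * sigma / (1.0 + sigma * eps))
    varpi2 = base ** (1.0 / (1.0 + eps * sigma))
    return varpi1, varpi2

def static_utility(c, l, eps, sigma):
    """Per-period utility of the static problem (Problem 0)."""
    return c ** (1.0 - sigma) / (1.0 - sigma) - l ** (1.0 + 1.0 / eps) / (1.0 + 1.0 / eps)

def two_period_utility(c0, c1, l0, beta, eps, sigma):
    """Lifetime utility in the model with retirement (Problem 1)."""
    return (c0 ** (1.0 - sigma) / (1.0 - sigma)
            + beta * c1 ** (1.0 - sigma) / (1.0 - sigma)
            - l0 ** (1.0 + 1.0 / eps) / (1.0 + 1.0 / eps))

def labor_wedge_complement(c, l, theta, eps, sigma):
    """1 - tau^y = l^(1/eps) / (theta * c^(-sigma))."""
    return l ** (1.0 / eps) / (theta * c ** (-sigma))

# Arbitrary illustrative values (not calibrated).
beta, eps, sigma = 0.95, 0.5, 2.0
c_tilde, l_tilde, theta = 1.3, 0.8, 1.7

varpi1, varpi2 = scale_factors(beta, eps, sigma)
c0 = c1 = varpi2 * c_tilde
l0 = varpi1 * l_tilde

utility_scale = (1.0 / (1.0 + beta)) ** (-(sigma + sigma * eps) / (1.0 + eps * sigma))
u_two_period = two_period_utility(c0, c1, l0, beta, eps, sigma)
u_rescaled = utility_scale * static_utility(c_tilde, l_tilde, eps, sigma)

resources_two_period = c0 + beta * c1 - theta * l0        # uses delta = beta
resources_rescaled = varpi1 * (c_tilde - theta * l_tilde)

wedge_static = labor_wedge_complement(c_tilde, l_tilde, theta, eps, sigma)
wedge_two_period = labor_wedge_complement(c0, l0, theta, eps, sigma)
```

The two wedges coincide because $\varpi_1^{1/\varepsilon}\varpi_2^{\sigma} = 1$, which is the final step of the proof.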

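Proof of part 2. The first order conditions for $c$ and $w$ can be written as

$$\frac{1}{U_c} - \frac{\mu}{\theta f}\frac{U_{cl}l}{U_c} = \frac{\delta}{\beta}\frac{\partial V_{t+1}}{\partial w}.$$

From (33) we have

$$\frac{\mu}{\theta f} = \frac{\tau^y}{1 - \tau^y}\left(1 + \frac{1}{\varepsilon}\right)^{-1}\frac{1}{U_c}$$

which implies

$$\frac{\delta}{\beta}\frac{\partial V_1(w)}{\partial w}U_c = 1 - \frac{\tau^y}{1 - \tau^y}\left(1 + \frac{1}{\varepsilon}\right)^{-1}\frac{U_{cl}l}{U_c}.$$

With GHH preferences, it becomes

$$\frac{\delta}{\beta}\frac{\partial V_1(w)}{\partial w}U_c = 1 + \frac{\tau^y}{1 - \tau^y}\left(1 + \frac{1}{\varepsilon}\right)^{-1}\frac{U''}{U'}l^{1+1/\varepsilon}. \tag{42}$$

Since with retirement $V_1(w) = U^{-1}(w)$, $\frac{\partial V_1(w)}{\partial w} > 0$.

Suppose that $\tau^y$ does not converge to 1. Take any sequence $\tau^y(\theta_n)$ and since $\tau^y(\theta_n) \in [0, 1]$ it must have a convergent subsequence. We will show that any such subsequence that does not converge to 1 must converge to 0. Suppose $\tau^y(\theta_n) \to \bar\tau^y < 1$. Then the FOC $l^{1/\varepsilon} = \theta(1 - \tau^y)$ implies that $l \to \infty$, and, since $-\frac{U''}{U'}$ is bounded away from 0, the right hand side of (42) converges to $-\infty$. The left hand side is positive, leading to a contradiction.

A sufficient condition for $\tau^y(\theta)$ not to converge to 1 is that

$$\frac{1 - F_0(\theta)}{\theta f_0(\theta)}\int_\theta^\infty \exp\left(\int_\theta^x \beta\frac{U''}{U'}\dot c_1\, dx\right)\left(1 - \lambda_{1,0}\alpha(x)U'(x)\right)\frac{f_0(x)\, dx}{1 - F_0(\theta)}$$

is bounded, where $c_1$ is consumption in period 1. Suppose $\frac{1 - F_0(\theta)}{\theta f_0(\theta)}$ is bounded from above, $\alpha(\theta) = 0$ for $\theta$ sufficiently high. Suppose that $U(x) = -\exp(-x)$. In this case $\exp\left[\int_0^\theta \beta\frac{U''}{U'}\dot c_1\, dx\right] = \exp\beta\left[c_1(0) - c_1(\theta)\right]$ and this term must be bounded since if $c_1(\theta) \to -\infty$ and $U$ is bounded from above, the incentive compatibility is violated.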

A.5 Computational details for Section 3

This subsection describes additional details of how we compute the constrained efficient distortions for the quantitative analysis in Section 3. Our computational strategy for the numerical illustrations of theoretical analysis in Section 2 is a special case of the approach described here.

Incentive problems of dynamic mechanism design pose significant challenges for numerical analysis.16 As is common in the optimal taxation literature, we keep the model restricted to agents who are heterogeneous along a single dimension, skills.17 Nevertheless, it is still in general difficult to find global numerical solutions to these types of problems with more than a small number of agent types alive for a small number of periods. To make numerical solution of the model computationally feasible, we exploit the recursive structure of the dual formulation of the planner’s problem that we explored in the theoretical analysis.

16 Standard expositions of the computational approaches and challenges are, e.g., Wilson (1996) and Judd (1998). For a recent contribution on computational techniques see, e.g., Su and Judd (2007).

17 A recent exception is, e.g., Judd and Su (2006). They show that when types are heterogeneous along more than one dimension, the planner’s problem is a compounded nonlinear optimization problem since the linear independence constraint qualification does not need to hold at the solution. They find that the results with multidimensional types may differ substantially.

Our theoretical analysis established that our main problem is given by (7)

and (12) together with V0 (w0) = 0. This is a finite-horizon discrete-time dy-

namic programming problem with a three-dimensional continuous state vector:

w is the promised utility associated with the promise-keeping constraint (9);

w2 is the state variable associated with the threat-keeping constraint (10); θ−

is the reported type in the preceding period. In the initial period the state w0

is given by the solution to V0(w0) = 0.

To compute the optimal distortions, we proceed in three stages. First,

we implement a value function iteration algorithm and apply it to problems

(7) and (12). We start from the last working period, T − 1, and proceed by

backward induction. Since Ft (0|θ) = 1 for all θ for t ≥ T , the planner will

set w2 (θ) = 0 for all θ in period T − 1 and we can replace the value function

VT (w (θ) , 0, θ) in problem (7) for period T − 1 with the discounted present

value of resources required to provide promised utility w over the remaining

T − T + 1 periods.

We implement the backward induction procedure numerically by approximating value functions with tensor products of orthogonal polynomials evaluated over the state space. We chose Chebyshev orthogonal polynomials of degrees 1 through 15 and checked, on numerical illustrations of Section 2, that quantitative differences did not exceed 1 percent of original values after doubling the number of degrees to 30. We allocate evaluation nodes over the state space using roots of the polynomials given by $r_n = -\cos(\pi(2n-1)/2N)$, where $n = 1, .., N$ indexes the nodes. Note that the trigonometric formula gives the roots on the interval $[-1, 1]$ and hence we perform a change of variables to adjust the root nodes to our state space. We let $N = 16$ for both the promise, $w$, and for the threat, $w_2$. For the skill, we set 30 equally spaced nodes to capture equally accurately the whole shape of the distribution of distortions within each period. Chebyshev coefficients are then computed by minimizing the sum of squared distances from the computed values at 7,680 evaluation nodes.
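A one-dimensional sketch of this node placement and coefficient fit is given below. It is a simplification under stated assumptions: the paper uses tensor products over three state variables, whereas this example fits a single variable, uses degrees 0 through N-1, and approximates a stand-in function (`math.exp`) on a hypothetical interval. With N roots and N coefficients the least-squares problem has the closed-form solution implied by discrete orthogonality, which is what `fit` uses.

```python
import math

def chebyshev_roots(N):
    """Roots r_n = -cos(pi*(2n-1)/(2N)), n = 1..N, on [-1, 1], in ascending order."""
    return [-math.cos(math.pi * (2 * n - 1) / (2 * N)) for n in range(1, N + 1)]

def to_interval(r, lo, hi):
    """Change of variables from [-1, 1] to the state-space interval [lo, hi]."""
    return lo + 0.5 * (r + 1.0) * (hi - lo)

def to_unit(x, lo, hi):
    """Inverse change of variables from [lo, hi] back to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def cheb(d, r):
    """Chebyshev polynomial of degree d in trigonometric form."""
    r = max(-1.0, min(1.0, r))      # guard against rounding just outside [-1, 1]
    return math.cos(d * math.acos(r))

def fit(values, roots):
    """Least-squares coefficients for degrees 0..N-1 given values at the N roots.

    By discrete orthogonality of Chebyshev polynomials at these roots, the
    least-squares solution is a weighted sum rather than a matrix solve.
    """
    N = len(roots)
    coeffs = []
    for d in range(N):
        s = sum(v * cheb(d, r) for v, r in zip(values, roots))
        coeffs.append(s / N if d == 0 else 2.0 * s / N)
    return coeffs

def evaluate(coeffs, x, lo, hi):
    """Evaluate the fitted approximation at a point x in [lo, hi]."""
    r = to_unit(x, lo, hi)
    return sum(a * cheb(d, r) for d, a in enumerate(coeffs))

N = 16
roots = chebyshev_roots(N)
lo, hi = 0.0, 2.0                                   # hypothetical state-space bounds
nodes = [to_interval(r, lo, hi) for r in roots]
coeffs = fit([math.exp(x) for x in nodes], roots)   # exp is a stand-in for a value function
total_nodes = N * N * 30                            # promise x threat x skill grid size
```

The last line reproduces the node count quoted in the text: 16 nodes for the promise, 16 for the threat and 30 for the skill give 7,680 evaluation nodes.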

It is a familiar property of a state space in these problems that no constrained optimal allocations may exist for some nodes. Before proceeding, one must either implement a state space restriction algorithm to eliminate such nodes or assign sufficiently large penalties to such nodes and ensure that a large enough number of nodes are actually computed.18 We implement a heuristic algorithm combining the features of both methods. First, we choose an arbitrarily large state space and attempt to find the optimum at each node. The nodes where we fail receive minus infinity for the value of the planner (we use two optimization algorithms on each node with multiple starting points). Once all nodes are computed we verify that infinite values are along the borders and replace them with values equal to linearly extrapolated borders, making these nodes less attractive to the planner than any other computed node, but preserving the usability of a continuous approximation. If a given period has more than 77 penalized nodes (1 percent of total for a period) we re-start by reducing the size of the last period’s state space and continue backwards. We stopped with the maximum of 0.99 percent and the average of 0.47 percent of penalized nodes per period. To keep the overall computational burden manageable, we used this procedure rather than a full fixed point search.

The approximation procedure provides each period $t$ problem (7) with a continuous function approximating $V_{t+1}$, which we write as a tensor product of the polynomials with their computed coefficients. Even though recursive evaluation of Chebyshev polynomials is faster (i.e., $P_d(r) = 2rP_{d-1}(r) - P_{d-2}(r)$, where $P_d(r)$ is the Chebyshev polynomial of degree $d > 1$ evaluated at $r$), we use their trigonometric form in the evaluation of the tensor products (i.e., $P_d(r) = \cos(d\arccos(r))$). This allows us to apply an implementation of algorithmic (chain rule) differentiation. In contrast, we found that numerical differentiation methods, e.g., complex step differentiation, resulted in sufficiently large numerical errors to often prevent our optimization algorithms from converging.

18 For a discussion of both and an implementation of the first method in a related problem see, e.g., Abraham and Pavoni (2008) and references therein.
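To illustrate the two evaluation forms mentioned above, the sketch below implements the recursion and the trigonometric form and also writes out the chain-rule derivative of the trigonometric form. The function names are hypothetical, and the hand-written derivative only illustrates what chain-rule differentiation produces; it is not the implementation used in the paper.

```python
import math

def cheb_recursive(d, r):
    """P_d(r) from the recursion P_d = 2 r P_{d-1} - P_{d-2}, with P_0 = 1 and P_1 = r."""
    p_prev, p = 1.0, r
    if d == 0:
        return p_prev
    for _ in range(d - 1):
        p_prev, p = p, 2.0 * r * p - p_prev
    return p

def cheb_trig(d, r):
    """P_d(r) = cos(d * arccos(r)), valid for r in [-1, 1]."""
    return math.cos(d * math.acos(r))

def cheb_trig_derivative(d, r):
    """Chain rule applied to cos(d*arccos(r)): d*sin(d*t)/sin(t) with t = arccos(r), |r| < 1."""
    t = math.acos(r)
    return d * math.sin(d * t) / math.sin(t)

# Largest gap between the two forms over a small grid of degrees and points.
points = [-0.9, -0.3, 0.0, 0.25, 0.7]
max_gap = max(abs(cheb_recursive(d, r) - cheb_trig(d, r))
              for d in range(16) for r in points)

# Compare the analytic derivative with a central finite difference.
h = 1e-6
fd = (cheb_trig(7, 0.3 + h) - cheb_trig(7, 0.3 - h)) / (2.0 * h)
exact = cheb_trig_derivative(7, 0.3)
```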

At this stage we also verify the assumptions we made in the theoretical analysis. Assumption 1 is satisfied analytically given our choices of preferences. Assumption 2 is satisfied trivially for the parameter values we choose in the main text. We verify the increasing properties of Assumption 3 numerically during this computational stage. The assumption requires three increasing properties of policy functions, $c(\cdot)$, $\omega(\cdot|\theta)$, and $\omega_1(\theta|\cdot)$. We compute numerical differences at each node and interpret differences greater than −1e−5 as non-negative, satisfying the increasing properties. In addition, we check that the computed value functions $V_t$ are convex in the promise, $w$, and in the threat, $w_2$.

The second stage computes $w_0$ such that $V_0(w_0) = 0$. Given $V_0$ computed in the first stage, we first search for an interval containing zero using binary jumps. We start with a high initial promise (e.g., −1e5 for the benchmark CES case) to ensure that the starting $V_0 > 0$, and scale the initial promise by $2^i$ on the $i$th iteration until $V_0 < 0$. Then, we converge to $w_0$ with bisection. We set the numerical error tolerance for the bisection procedure to 1e−5 once again so that a smaller $V_0$ is interpreted to be zero. The rate of convergence for bisection is linear; however, it is relatively more important at this stage to guarantee convergence.
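The bracketing and bisection steps can be sketched as follows. This is a minimal illustration under explicit assumptions: `toy_V0` is a made-up increasing function standing in for the computed $V_0$, the starting promise is negative so that scaling by $2^i$ lowers it, and the starting value and tolerance are illustrative rather than the paper's settings.

```python
import math

def find_w0(V0, w_start, tol=1e-5, max_doublings=60, max_bisections=200):
    """Find w0 with |V0(w0)| < tol by binary jumps followed by bisection.

    Assumes V0 is increasing in the promise and w_start < 0 with V0(w_start) > 0,
    so that scaling the start by 2**i moves toward lower promises and lower V0.
    """
    if V0(w_start) <= 0.0:
        raise ValueError("the starting promise must give V0 > 0")
    upper = w_start                     # last promise known to give V0 >= 0
    i = 1
    lower = w_start * 2 ** i
    while V0(lower) >= 0.0:             # binary jumps until the sign changes
        upper = lower
        i += 1
        if i > max_doublings:
            raise RuntimeError("no sign change found")
        lower = w_start * 2 ** i
    for _ in range(max_bisections):     # bisection on the bracket [lower, upper]
        mid = 0.5 * (lower + upper)
        v = V0(mid)
        if abs(v) < tol:
            return mid
        if v > 0.0:
            upper = mid
        else:
            lower = mid
    return 0.5 * (lower + upper)

def toy_V0(w):
    """Hypothetical stand-in for the planner's value: increasing in w < 0, zero at w = -9."""
    return 3.0 - math.sqrt(-w)

w0 = find_w0(toy_V0, -0.01)
```

Bisection halves the bracket at every step, so convergence is slow but guaranteed once a sign change has been found, which matches the trade-off described in the text.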

In the final stage, we compute optimal labor and savings distortions. Given

Vt’s computed and approximated as continuously differentiable functions in the

first stage and given w0 solved for in the second stage, we generate optimal

allocations by forward induction, starting from policy functions produced by

V0 (w0) from problem (12). Optimal labor and savings distortions are com-

puted from the policy functions using definitions (14) and (15) respectively.

Given a history up to period t, labor distortions can be readily computed by

this procedure with polynomial algorithm complexity. However, computing

savings distortions for a given history up to period t requires computing all

possible t + 1 conditional history realizations. As a consequence, this stage

quickly adds up to exponential algorithm complexity. To keep the problem

computationally feasible and at the same time develop novel insights, we com-

pute several examples of complete histories that we argue are of interest in

Section 3.

Note that for the first and the last stages of this approach to be computationally feasible, it is essential to use an efficient and robust optimization algorithm to solve each of the separate period $t$ minimization problems for each evaluation node. We solve these problems using an implementation of the interior-point algorithm.19 We choose conjugate gradient iteration to compute the optimization step. The implementation uses a trust-region method to solve barrier problems. The acceptance criterion is an $l_1$ barrier penalty function. Interior-point algorithms are some of the most efficient and stable methods that are currently available for solving large non-linear optimization problems. To further improve the accuracy of the solution estimates, including Lagrange multipliers, we proceed, at the value function iteration stage, to active-set iterations that use the output of the interior-point algorithm as its input. The implementation of the active-set algorithm is based on the sequential linear quadratic programming. Once the problem is correctly scaled, we observe quadratic convergence.

19 The implementation we use is KNITRO’s algorithm 1. We implement the model in AMPL to streamline the application of the algorithm and to take advantage of algorithmic differentiation.

Finally, we note that these optimization methods are local. Indeed, we find

that optimizations are sensitive to the initial guess. To reduce the possibility

of picking a non-global local optimum within each minimization problem, we

apply a multi-start globalization heuristic.
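A multi-start heuristic of this kind can be sketched in a few lines. The example below is only an illustration: it uses a simple derivative-free local search in one dimension instead of the interior-point and active-set solvers described above, and the objective `toy_objective`, the search interval and the number of starts are all hypothetical.

```python
import random

def local_minimize(f, x0, step=0.1, tol=1e-10, max_iter=10000):
    """Derivative-free local search: move to a better neighbor, else halve the step."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        improved = False
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                improved = True
                break
        if not improved:
            step *= 0.5
            if step < tol:
                break
    return x, fx

def multi_start(f, lo, hi, n_starts=20, seed=0):
    """Run the local search from random starting points and keep the best result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        x, fx = local_minimize(f, rng.uniform(lo, hi))
        if best is None or fx < best[1]:
            best = (x, fx)
    return best

def toy_objective(x):
    """Hypothetical objective with a local minimum near x = 1 and a global one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

best_x, best_value = multi_start(toy_objective, -2.0, 2.0)
```

A single local search started to the right of the hump would stop at the inferior minimum near $x = 1$; restarting from several points makes it much more likely that the lower minimum is found, although it does not guarantee it.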
