THE QUARTERLY JOURNAL OF ECONOMICS...Laszlo Sandor, Gary Solon, Danny Yagan, numerous seminar participants, and four anonymous referees for helpful comments. Sarah Abraham, Alex Bell,

THE

QUARTERLY JOURNALOF ECONOMICS

Vol. 129 November 2014 Issue 4

WHERE IS THE LAND OF OPPORTUNITY? THEGEOGRAPHY OF INTERGENERATIONAL MOBILITY IN

THE UNITED STATES*

Raj ChettyNathaniel Hendren

Patrick KlineEmmanuel Saez

We use administrative records on the incomes of more than 40 millionchildren and their parents to describe three features of intergenerational mo-bility in the United States. First, we characterize the joint distribution ofparent and child income at the national level. The conditional expectation ofchild income given parent income is linear in percentile ranks. On average, a 10percentile increase in parent income is associated with a 3.4 percentile increase

*The opinions expressed in this article are those of the authors alone and donot necessarily reflect the views of the Internal Revenue Service or the U.S.Treasury Department. This work is a component of a larger project examiningthe effects of tax expenditures on the budget deficit and economic activity. Allresults based on tax data in this article are constructed using statistics originallyreported in the SOI Working Paper ‘‘The Economic Impacts of Tax Expenditures:Evidence from Spatial Variation across the U.S.,’’ approved under IRS contractTIRNO-12-P-00374 and presented at the National Tax Association meeting onNovember 22, 2013. We thank David Autor, Gary Becker, David Card, DavidDorn, John Friedman, James Heckman, Nathaniel Hilger, Richard Hornbeck,Lawrence Katz, Sara Lalumia, Adam Looney, Pablo Mitnik, Jonathan Parker,Laszlo Sandor, Gary Solon, Danny Yagan, numerous seminar participants, andfour anonymous referees for helpful comments. Sarah Abraham, Alex Bell, ShelbyLin, Alex Olssen, Evan Storms, Michael Stepner, and Wentao Xiong providedoutstanding research assistance. This research was funded by the NationalScience Foundation, the Lab for Economic Applications and Policy at Harvard,the Center for Equitable Growth at UC-Berkeley, and Laura and John ArnoldFoundation. Publicly available portions of the data and code, including interge-nerational mobility statistics by commuting zone and county, are available athttp://www.equality-of-opportunity.org.

! The Author(s) 2014. Published by Oxford University Press, on behalf of Presidentand Fellows of Harvard College. All rights reserved. For Permissions, please email:[email protected] Quarterly Journal of Economics (2014), 1553–1623. doi:10.1093/qje/qju022.Advance Access publication on September 14, 2014.

1553

at University of O

ttawa on January 9, 2015

http://qje.oxfordjournals.org/D

ownloaded from

http://www.equality-of-opportunity.orghttp://qje.oxfordjournals.org/

in a child’s income. Second, intergenerational mobility varies substantiallyacross areas within the United States. For example, the probability that achild reaches the top quintile of the national income distribution startingfrom a family in the bottom quintile is 4.4% in Charlotte but 12.9% in SanJose. Third, we explore the factors correlated with upward mobility. High mo-bility areas have (i) less residential segregation, (ii) less income inequality, (iii)better primary schools, (iv) greater social capital, and (v) greater family stabil-ity. Although our descriptive analysis does not identify the causal mechanismsthat determine upward mobility, the publicly available statistics on interge-nerational mobility developed here can facilitate research on such mechanisms.JEL Codes: H0, J0, R0.

I. Introduction

The United States is often hailed as the ‘‘land of opportu-nity,’’ a society in which a person’s chances of success dependlittle on his or her family background. Is this reputation war-ranted? We show that this question does not have a clearanswer because there is substantial variation in intergenera-tional mobility across areas within the United States. TheUnited States is better described as a collection of societies,some of which are ‘‘lands of opportunity’’ with high rates of mo-bility across generations, and others in which few children escapepoverty.

We characterize intergenerational mobility using informa-tion from deidentified federal income tax records, which providedata on the incomes of more than 40 million children and theirparents between 1996 and 2012. We organize our analysis intothree parts.

In the first part, we present new statistics on intergenera-tional mobility in the United States as a whole. In our baselineanalysis, we focus on U.S. citizens in the 1980–1982 birth co-horts—the oldest children in our data for whom we can reliablyidentify parents based on information on dependent claiming. Wemeasure these children’s income as mean total family incomein 2011 and 2012, when they are approximately 30 years old.We measure their parents’ income as mean family income be-tween 1996 and 2000, when the children are between the agesof 15 and 20.1

1. We show that our baseline measures do not suffer from significant life cycleor attenuation bias (Solon 1992; Zimmerman 1992; Mazumder 2005) by establish-ing that estimates of mobility stabilize by the time children reach age 30 and are notvery sensitive to the number of years used to measure parent income.

QUARTERLY JOURNAL OF ECONOMICS1554

at University of O



ownloaded from

http://qje.oxfordjournals.org/

Following the prior literature (e.g., Solon 1999), we beginby estimating the intergenerational elasticity of income (IGE)by regressing log child income on log parent income.Unfortunately, we find that this canonical log-log specificationyields very unstable estimates of mobility because the relation-ship between log child income and log parent income is nonlinearand the estimates are sensitive to the treatment of children withzero or very small incomes. When restricting the sample betweenthe 10th and 90th percentiles of the parent income distributionand excluding children with zero income, we obtain an IGE esti-mate of 0.45. However, alternative specifications yield IGEs rang-ing from 0.26 to 0.70, spanning most of the estimates in the priorliterature.2

To obtain a more stable summary of intergenerational mobil-ity, we use a rank-rank specification similar to that used by Dahland DeLeire (2008). We rank children based on their incomesrelative to other children in the same birth cohort. We rank par-ents of these children based on their incomes relative to otherparents with children in these birth cohorts. We characterize mo-bility based on the slope of this rank-rank relationship, whichidentifies the correlation between children’s and parents’ posi-tions in the income distribution.3

We find that the relationship between mean child ranks andparent ranks is almost perfectly linear and highly robust to al-ternative specifications. A 10 percentile point increase in parentrank is associated with a 3.41 percentile increase in a child’sincome rank on average. Children’s college attendance and teen-age birth rates are also linearly related to parent income ranks. A10 percentile point increase in parent income is associated with a6.7 percentage point (pp) increase in college attendance rates anda 3 pp reduction in teenage birth rates for women.

In the second part of the article, we characterize variation inintergenerational mobility across commuting zones (CZs). Com-muting zones are geographical aggregations of counties that aresimilar to metro areas but cover the entire United States,

2. In an important recent study, Mitnik et al. (2014) propose a new dollar-weighted measure of the IGE and show that it yields more stable estimates. Wediscuss the differences between the new measure of mobility proposed by Mitnik etal. and the canonical definition of the IGE in Section IV.A.

3. The rank-rank slope and IGE both measure the degree to which differencesin children’s incomes are determined by their parents’ incomes. We discuss theconceptual differences between the two measures in Section II.

WHERE IS THE LAND OF OPPORTUNITY? 1555

at University of O



ownloaded from


including rural areas (Tolbert and Sizer 1996). We assign chil-dren to CZs based on where they lived at age 16—that is, wherethey grew up—irrespective of whether they left that CZ after-ward. When analyzing CZs, we continue to rank both childrenand parents based on their positions in the national income dis-tribution, which allows us to measure children’s absolute out-comes as we discuss later.

The relationship between mean child ranks and parent ranksis almost perfectly linear within CZs, allowing us to summarizethe conditional expectation of a child’s rank given his parents’rank with just two parameters: a slope and intercept. The slopemeasures relative mobility: the difference in outcomes betweenchildren from top versus bottom income families within a CZ. Theintercept measures the expected rank for children from familiesat the bottom of the income distribution. Combining the interceptand slope for a CZ, we can calculate the expected rank of childrenfrom families at any given percentile p of the national parentincome distribution. We call this measure absolute mobility atpercentile p. Measuring absolute mobility is valuable because in-creases in relative mobility have ambiguous normative implica-tions, as they may be driven by worse outcomes for the rich ratherthan better outcomes for the poor.

We find substantial variation in both relative and absolutemobility across CZs. Relative mobility is lowest for children whogrew up in the Southeast and highest in the Mountain West andthe rural Midwest. Some CZs in the United States have relativemobility comparable to the highest mobility countries in theworld, such as Canada and Denmark, while others have lowerlevels of mobility than any developed country for which dataare available.

We find similar geographical variation in absolute mobility.We focus much of our analysis on absolute mobility at p¼ 25,which we call ‘‘absolute upward mobility.’’ This statistic measuresthe mean income rank of children with parents in the bottom halfof the income distribution given linearity of the rank-rank rela-tionship. Absolute upward mobility ranges from 35.8 in Charlotteto 46.2 in Salt Lake City among the 50 largest CZs. A 1 standarddeviation increase in CZ-level upward mobility is associated witha 0.2 standard deviation improvement in a child’s expected rankgiven parents at p¼ 25, 60% as large as the effect of a 1 standarddeviation increase in his own parents’ income. Other measures ofupward mobility exhibit similar spatial variation. For instance,


at University of O



ownloaded from


the probability that a child reaches the top fifth of the incomedistribution conditional on having parents in the bottom fifth is4.4% in Charlotte, compared with 10.8% in Salt Lake City and12.9% in San Jose. The CZ-level mobility statistics are robust toadjusting for differences in the local cost of living, shocks to localgrowth, and using alternative measures of income.

Absolute upward mobility is highly correlated with relativemobility: areas with high levels of relative mobility (low rank-rank slopes) tend to have better outcomes for children from low-income families. On average, children from families belowpercentile p¼ 85 have better outcomes when relative mobility isgreater; those above p¼85 have worse outcomes. Location mat-ters more for children growing up in low-income families: the ex-pected rank of children from low-income families varies moreacross CZs than the expected rank of children from high incomefamilies.

The spatial patterns of the gradients of college attendanceand teenage birthrates with respect to parent income across CZsare very similar to the variation in intergenerational income mo-bility. This suggests that the spatial differences in mobility aredriven by factors that affect children while they are growing uprather than after they enter labor market.

In the final part of the article, we explore such factors bycorrelating the spatial variation in mobility with observable char-acteristics. To begin, we show that upward income mobility issignificantly lower in areas with larger African American popu-lations. However, white individuals in areas with large AfricanAmerican populations also have lower rates of upward mobility,implying that racial shares matter at the community level.

We then identify five factors that are strongly correlated withthe variation in upward mobility across areas. The first is segre-gation: areas that are more residentially segregated by race andincome have lower levels of mobility. Second, areas with moreinequality as measured by Gini coefficients have less mobility,consistent with the ‘‘Great Gatsby curve’’ documented acrosscountries (Krueger 2012; Corak 2013). Top 1% income sharesare not highly correlated with intergenerational mobility bothacross CZs within the United States and across countries, sug-gesting that the factors that erode the middle class may hamperintergenerational mobility more than the factors that lead toincome growth in the upper tail. Third, proxies for the qualityof the K-12 school system are positively correlated with mobility.


at University of O



ownloaded from


Fourth, social capital indexes (Putnam 1995)—which are proxiesfor the strength of social networks and community involvement inan area—are also positively correlated with mobility. Finally,mobility is significantly lower in areas with weaker family struc-tures, as measured, for example, by the fraction of single parents.As with race, parents’ marital status does not matter purelythrough its effects at the individual level. Children of marriedparents also have higher rates of upward mobility in communitieswith fewer single parents. Interestingly, we find no correlationbetween racial shares and upward mobility once we control forthe fraction of single parents in an area.

We find modest correlations between upward mobility andlocal tax policies and no systematic correlation between mobilityand local labor market conditions, rates of migration, or access tohigher education. In a multivariable regression, the five key fac-tors described above generally remain statistically significantpredictors of both relative and absolute upward mobility, evenin specifications with state fixed effects. However, we emphasizethat these factors should not be interpreted as causal determi-nants of mobility because all of these variables are endogenouslydetermined and our analysis does not control for numerous otherunobserved differences across areas.

Our results build on an extensive literature on intergenera-tional mobility, reviewed by Solon (1999) and Black and Devereux(2011). Our estimates of the level of mobility in the United Statesas a whole are broadly consistent with prior results, with theexception of Mazumder’s (2005) and Clark’s (2014) IGE esti-mates, which imply much lower levels of intergenerational mo-bility. We discuss why our findings may differ from their resultsin Online Appendices D and E. Our focus on within-country com-parisons offers two advantages over the cross-country compari-sons that have been the focus of prior comparative work (e.g.,Bjorklund and Jäntti 1997; Jäntti et al. 2006; Corak 2013).First, differences in measurement and methods make it difficultto reach definitive conclusions from cross-country comparisons(Solon 2002). The variables we analyze are measured using thesame data sources across all CZs. Second, and more important,we characterize both relative and absolute mobility across CZs.The cross-country literature has focused exclusively on differ-ences in relative mobility; much less is known about how theprospects of children from low-income families vary across coun-tries when measured on a common absolute scale (Ray 2010).


at University of O



ownloaded from

http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/

Our analysis also relates to the literature on neighborhoodeffects, reviewed by Jencks and Mayer (1990) and Sampson et al.(2002). Unlike recent experimental work on neighborhood effects(e.g., Katz, Kling, and Liebman 2001; Oreopoulos 2003), our de-scriptive analysis does not shed light on whether the differencesin outcomes across areas are due to the causal effect of neighbor-hoods or differences in the characteristics of people living in thoseneighborhoods. However, in a followup paper, Chetty andHendren (2014) show that a substantial portion of the spatialvariation documented here is driven by causal effects of placeby studying families that move across areas with children of dif-ferent ages.

The article is organized as follows. We begin in Section II bydefining the measures of intergenerational mobility that westudy and discussing their conceptual properties. Section IIIdescribes the data. Section IV reports estimates of intergenera-tional mobility at the national level. In Section V, we presentestimates of absolute and relative mobility by commuting zone.Section VI reports correlations of our mobility measures withobservable characteristics of commuting zones. Section VII con-cludes. Statistics on intergenerational mobility and related covar-iates are publicly available by commuting zone, metropolitanstatistical area, and county on the project website (www.equali-ty-of-opportunity.org).

II. Measures of Intergenerational Mobility

At the most general level, studies of intergenerational mobil-ity seek to measure the degree to which a child’s social and eco-nomic opportunities depend on his parents’ income or socialstatus. Because opportunities are difficult to measure, virtuallyall empirical studies of mobility measure the extent to which achild’s income (or occupation) depends on his parents’ income (oroccupation).4 Following this approach, we aim to characterize the

4. This simplification is not innocuous, as a child’s realized income may differfrom his opportunities. For instance, children of wealthy parents may choose not towork or may choose lower-paying jobs, which would reduce the persistence ofincome across generations relative to the persistence of underlying opportunities.


at University of O



ownloaded from

http://www.equality-of-opportunity.orghttp://www.equality-of-opportunity.orghttp://qje.oxfordjournals.org/

joint distribution of a child’s lifetime pretax family income (Yi),and his parents’ lifetime pretax family income (Xi).

5

In large samples, one can characterize the joint distributionof (Yi, Xi) nonparametrically, and we provide such a characteri-zation in the form of a 100� 100 centile transition matrix below.However, to provide a parsimonious summary of the degree ofmobility and compare rates of mobility across areas, it is usefulto characterize the joint distribution using a small set of statis-tics. We divide measures of mobility into two classes that capturedifferent normative concepts: relative mobility and absolute mo-bility. In this section, we define a set of statistics that we use tomeasure these two concepts empirically and compare their con-ceptual properties.

II.A. Relative Mobility

One way to study intergenerational mobility is to ask, ‘‘Whatare the outcomes of children from low-income families relative tothose of children from high-income families?’’ This question,which focuses on the relative outcomes of children from differentparental backgrounds, has been the subject of most prior researchon intergenerational mobility (Solon 1999; Black et al. 2011).

The canonical measure of relative mobility is the elasticity of

child income with respect to parent income dE½log YijXi¼x�dlog x

� �, com-

monly called the intergenerational income elasticity (IGE). Themost common method of estimating the IGE is to regress log childincome (logYi) on log parent income (logXi), which yields a coeffi-cient of

IGE ¼ �XYSDðlogYiÞSDðlogXiÞ

; ð1Þ

where �XY¼Corr(logXi, logYi) is the correlation between log childincome and parent income and SD() denotes the standard devia-tion. The IGE is a relative mobility measure because it measuresthe difference in (log) outcomes between children of high versuslow income parents.

5. If taxes and transfers do not generate rank reversals (as is typically the casein practice), using post-tax income instead of pretax income would have no effect onour preferred rank-based measures of mobility. See Mitnik et al. (2014) for a com-parison of pretax and post-tax measures of the IGE of income.


at University of O



ownloaded from


An alternative measure of relative mobility is the correlationbetween child and parent ranks (Dahl and DeLeire 2008). Let Ridenote child i’s percentile rank in the income distribution of chil-dren and Pi denote parent i’s percentile rank in the income dis-tribution of parents. Regressing the child’s rank Ri on his parents’rank Pi yields a regression coefficient �PR¼Corr(Pi, Ri), which wecall the rank-rank slope.6 The rank-rank slope �PR measures theassociation between a child’s position in the income distributionand his parents’ position in the distribution.

To understand the connection between the IGE and the rank-rank slope, note that the correlation of log incomes �XY and thecorrelation of ranks �PR are closely related scale-invariant mea-sures of the degree to which child income depends on parentincome.7 Hence, equation (1) implies that the IGE combines thedependence features captured by the rank-rank slope with theratio of standard deviations of income across generations.8

The IGE differs from the rank-rank slope to the extent that in-equality changes across generations. Intuitively, a given increasein parents’ incomes has a greater effect on the level of children’sincomes when inequality is greater among children than amongparents.

We estimate both the IGE and the rank-rank slope to distin-guish differences in mobility from differences in inequality and toprovide a comparison to the prior literature. However, we focusprimarily on rank-rank slopes because they prove to be muchmore robust across specifications and are thus more suitable forcomparisons across areas from a statistical perspective.

II.B. Absolute Mobility

A different way to measure intergenerational mobility is toask, ‘‘What are the outcomes of children from families of a givenincome level in absolute terms?’’ For example, one may be

6. The regression coefficient equals the correlation coefficient because bothchild and parent ranks follow a uniform distribution by construction.

7. For example, if parent and child income follow a bivariate log normal dis-

tribution, �PR ¼6ArcSinð�XY2 Þ

p &3�XYp ¼ 0:95�XY when �XY is small (Trivedi and Zimmer

2007).8. More generally, the joint distribution of parent and child incomes can be

decomposed into two components: the joint distribution of parent and child percen-tile ranks (the copula) and the marginal distributions of parent and child income.The rank-rank slope depends purely on the copula, whereas the IGE combines bothcomponents.


at University of O



ownloaded from


interested in measuring the mean outcomes of children whosegrow up in low-income families. Absolute mobility may be ofgreater normative interest than relative mobility. Increases inrelative mobility (i.e., a lower IGE or rank-rank slope) could beundesirable if they are caused by worse outcomes for the rich. Incontrast, increases in absolute mobility at a given income level,holding fixed absolute mobility at other income levels, unambig-uously increase welfare if one respects the Pareto principle (and ifwelfare depends purely on income).

We consider three statistical measures of absolute mobility.Our primary measure, which we call absolute upward mobility, isthe mean rank (in the national child income distribution) of chil-dren whose parents are at the 25th percentile of the nationalparent income distribution.9 At the national level, this statisticis mechanically related to the rank-rank slope and does not pro-vide any additional information about mobility.10 However, whenwe study small areas within the United States, a child’s rank inthe national income distribution is effectively an absolute out-come because incomes in a given area have little effect on thenational distribution.

The second measure we analyze is the probability of risingfrom the bottom quintile to the top quintile of the income distri-bution (Corak and Heisz 1999; Hertz 2006), which can be inter-preted as a measure of the fraction of children who achieve the‘‘American Dream.’’ Again, when the quintiles are defined in thenational income distribution, these transition probabilities canbe interpreted as measures of absolute outcomes in small areas.Our third measure is the probability that a child has familyincome above the poverty line conditional on having parents atthe 25th percentile. Because the poverty line is defined in abso-lute dollar terms in the United States, this statistic measures the

9. This measure is the analog of the rank-rank slope in terms of absolute mo-bility. The corresponding analogof the IGE is the mean log income of children whoseparents are at the 25th percentile. We do not study this statistic because it is verysensitive to the treatment of zeros and small incomes.

10. We show below that the rank-rank relationship is approximately linear.Because child and parent ranks each have a mean of 0.5 by construction in thenational distribution, the mean rank of children with parents at percentile p issimply 0.5þ�PR(p – 0.5). Conceptually, the slope is the only free parameter in thelinear national rank-rank relationship. Intuitively, if one child moves up in theincome distribution in terms of ranks, another must come down.


at University of O



ownloaded from


fraction of children who achieve a given absolute livingstandard.11

It is useful to analyze multiple measures of mobility becausethe appropriate measure of intergenerational mobility dependson one’s normative objective (Fields and Ok 1999). Fortunately,we find that the patterns of spatial variation in absolute and rel-ative mobility are very similar using alternative measures. Inaddition, we provide nonparametric transition matrices and mar-ginal distributions that allow readers to construct measures ofmobility beyond those we consider here.

III. Data

We use data from federal income tax records spanning 1996–2012. The data include both income tax returns (1040 forms) andthird-party information returns (such as W-2 forms), which giveus information on the earnings of those who do not file tax re-turns. We provide a detailed description of how we construct ouranalysis sample starting from the raw population data in OnlineAppendix A. Here, we briefly summarize the key variable andsample definitions. Note that in what follows, the year alwaysrefers to the tax year (i.e., the calendar year in which theincome is earned).

III.A. Sample Definitions

Our base data set of children consists of all individuals who(i) have a valid Social Security number or individual taxpayeridentification number, (ii) were born between 1980 and 1991,and (iii) are U.S. citizens as of 2013. We impose the citizenshiprequirement to exclude individuals who are likely to have immi-grated to the United States as adults, for whom we cannot mea-sure parent income. We cannot directly restrict the sample toindividuals born in the United States because the database onlyrecords current citizenship status.

We identify the parents of a child as the first tax filers (be-tween 1996 and 2012) who claim the child as a child dependentand were between the ages of 15 and 40 when the child was born.

11. Another intuitive measure of upward mobility is the fraction of childrenwhose income exceeds that of their parents. This statistic turns out to be problem-atic for our application because we measure parent and child income at differentages and because it is very sensitive to differences in local income distributions.


at University of O



ownloaded from


If the child is first claimed by a single filer, the child is defined ashaving a single parent. For simplicity, we assign each child aparent (or parents) permanently using this algorithm, regardlessof any subsequent changes in parents’ marital status or depen-dent claiming.12

If parents never file a tax return, we cannot link them to theirchild. Although some low-income individuals do not file tax re-turns in a given year, almost all parents file a tax return atsome point between 1996 and 2012 to obtain a tax refund ontheir withheld taxes and the Earned Income Tax Credit (Cilke1998). We are therefore able to identify parents for approximately95 percent of the children in the 1980–1991 birth cohorts. Thefraction of children linked to parents drops sharply prior to the1980 birth cohort because our data begin in 1996 and many chil-dren begin to the leave the household starting at age 17 (OnlineAppendix Table I). This is why we limit our analysis to childrenborn during or after 1980.

Our primary analysis sample, which we refer to as the coresample, includes all children in the base data set who (i) are bornin the 1980–1982 birth cohorts, (ii) for whom we are able to iden-tify parents, and (iii) whose mean parent income between 1996and 2000 is strictly positive (which excludes 1.2% of children).13

For some robustness checks, we use the extended sample, whichimposes the same restrictions as the core sample, but includes allbirth cohorts from 1980 to 1991. There are approximately 10 mil-lion children in the core sample and 44 million children in theextended sample.

1. Statistics of Income Sample. Because we can only reliablylink children to parents starting with the 1980 birth cohort in thepopulation tax data, we can only measure earnings of children upto age 32 (in 2012) in the full sample. To evaluate whether

12. Twelve percent of children in our core sample are claimed as dependents bydifferent individuals in subsequent years. To ensure that this potential measure-ment error in linking children to parents does not affect our findings, we show thatwe obtain similar estimates of mobility for the subset of children who are neverclaimed by other individuals (row 9 of Online Appendix Table VII).

13. We limit the sample to parents with positive income because parents whofile a tax return (as required to link them to a child) yet have zero income are un-likely to be representative of individuals with zero income and those with negativeincome typically have large capital losses, which are a proxy for having significantwealth.


at University of O



ownloaded from

http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/

estimates of intergenerational mobility would change signifi-cantly if earnings were measured at later ages, we supplementour analysis using annual cross-sections of tax returns main-tained by the Statistics of Income (SOI) division of the InternalRevenue Service (IRS) prior to 1996. The SOI cross-sections pro-vide identifiers for dependents claimed on tax forms starting in1987, allowing us to link parents to children back to the 1971birth cohort using an algorithm analogous to that describedabove (see Online Appendix A for further details). The SOIcross-sections are stratified random samples of tax returns witha sampling probability that rises with income; using samplingweights, we can calculate statistics representative of the nationaldistribution. After linking parents to children in the SOI sample,we use population tax data to obtain data on income for childrenand parents, using the same definitions as in the core sample.There are approximately 63,000 children in the 1971–1979birth cohorts in the SOI sample (Online Appendix Table II).

III.B. Variable Definitions and Summary Statistics

In this section, we define the key variables we use to measureintergenerational mobility. We measure all monetary variables in2012 dollars, adjusting for inflation using the consumer priceindex (CPI-U).

1. Parent Income. Following Lee and Solon (2009), ourprimary measure of parent income is total pretax income atthe household level, which we label parent family income.More precisely, in years where a parent files a tax return, we de-fine family income as adjusted gross income (as reported on the1040 tax return) plus tax-exempt interest income and the non-taxable portion of Social Security and Disability (SSDI) benefits.In years where a parent does not file a tax return, we definefamily income as the sum of wage earnings (reported on formW-2), unemployment benefits (reported on form 1099-G), andgross social security and disability benefits (reported on formSA-1099) for both parents.14 In years where parents have no

14. The database does not record W-2s and other information returns prior to1999, so nonfiler’s income is coded as 0 prior to 1999. Assigning nonfiling parents 0income has little effect on our estimates because only 2.9% of parents in our coresample do not file in each year prior to 1999 and most nonfilers have very low W-2income. For instance, in 2000, median W-2 income among nonfilers was $29.


at University of O



ownloaded from


tax return and no information returns, family income is coded aszero.15

Our baseline income measure includes labor earnings andcapital income as well as unemployment insurance, SocialSecurity, and disability benefits. It excludes nontaxable cashtransfers such as Temporary Assistance to Needy Families andSupplemental Security Income, in-kind benefits such as foodstamps, all refundable tax credits such as the EITC, nontaxablepension contributions (such as to 401(k)s), and any earned incomenot reported to the IRS. Income is always measured prior to thededuction of individual income taxes and employee-level payrolltaxes.

In our baseline analysis, we average parents’ family incomeover the five years from 1996 to 2000 to obtain a proxy for parentlifetime income that is less affected by transitory fluctuations(Solon 1992). We use the earliest years in our sample to best re-flect the economic resources of parents while the children in oursample are growing up.16 We evaluate the robustness of our find-ings using data from other years and using a measure of individ-ual parent income instead of family income. We define individualincome as the sum of individual W-2 wage earnings, unemploy-ment insurance benefits, SSDI payments, and half of householdself-employment income (see Online Appendix A for details).

Furthermore, we show below that defining parent income based on data from 1999to 2003 (when W-2 data are available) yields virtually identical estimates (Table I,row 5). Note that we never observe self-employment income for nonfilers and there-fore code it as 0; given the strong incentives for individuals with children to filecreated by the EITC, most nonfilers likely have very low levels of self-employmentincome as well.

15. Importantly, these observations are true zeros rather than missing data.Because the database covers all tax records, we know that these individuals have 0taxable income.

16. Formally, we define mean family income as the mother’s family income plusthe father’s family income in each year from 1996 to 2000 divided by 10 (by 5 if weonly identify a single parent). For parents who do not change marital status, this issimply mean family income over the five-year period. For parents who are marriedinitially and then divorce, this measure tracks the mean family incomes of the twoparents over time. For parents who are single initially and then get married, thismeasure tracks individual income prior to marriage and total family income (in-cluding the new spouse’s income) after marriage. These household measures ofincome increase with marriage and naturally do not account for cohabitation; toensure that these features do not generate bias, we assess the robustness of ourresults to using individual measures of income.


at University of O



ownloaded from

http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/

2. Child Income. We define child family income in the sameway as parent family income. In our baseline analysis, we aver-age child family income over the last two years in our data (2011and 2012), when children are in their early thirties. We reportresults using alternative years to assess the sensitivity of ourfindings. For children, we define household income based on cur-rent marital status rather than marital status at a fixed point intime. Because family income varies with marital status, we alsoreport results using individual income measures for children,constructed in the same way as for parents.

3. College Attendance. We define college attendance as an in-dicator for having one or more 1098-T forms filed on one’s behalfwhen the individual is aged 18–21. Title IV institutions—all col-leges and universities as well as vocational schools and otherpostsecondary institutions eligible for federal student aid—arerequired to file 1098-T forms that report tuition payments orscholarships received for every student. Because the forms arefiled directly by colleges, independent of whether an individualfiles a tax return, we have complete records on college attendancefor all children. The 1098-T data are available from 1999 to 2012.Comparisons to other data sources indicate that 1098-T formscapture college enrollment quite accurately overall (Chetty,Friedman, and Rockoff 2014, Appendix B).17

4. College Quality. Using data from 1098-T forms, Chetty,Friedman, and Rockoff (2014) construct an earnings-basedindex of ‘‘college quality’’ using the mean individual wage earn-ings at age 31 of children born in 1979–1980 based on the collegethey attended at age 20. Children who do not attend college areincluded in a separate ‘‘no college’’ category in this index. Weassign each child in our sample a value of this college qualityindex based on the college in which they were enrolled at age

17. Colleges are not required to file 1098-T forms for students whose qualifiedtuition and related expenses are waived or paid entirely with scholarships orgrants. However, the forms are frequently available even for such cases, presum-ably because of automated reporting to the IRS by universities. Approximately 6%of 1098-T forms are missing from 2000 to 2003 because the database contains no1098-T forms for some small colleges in these years. To verify that this does notaffect our results, we confirm that our estimates of college attendance by parentincome gradients are very similar for later birth cohorts (not reported).


at University of O



ownloaded from


20. We then convert this dollar index to percentile ranks withineach birth cohort. The children in the no-college group, who con-stitute roughly 54 percent of our core sample, all have the samevalue of the college quality index. Breaking ties at the mean, weassign all of these children a college quality rank of approxi-mately 542 ¼ 27.

18

5. Teenage Birth. We define a woman as having a teenagebirth if she ever claims a dependent who was born while shewas between the ages of 13 and 19. This measure is an imperfectproxy for having a teenage birth because it only covers childrenwho are claimed as dependents by their mothers. Nevertheless,the aggregate level and spatial pattern of teenage births in ourdata are closely aligned with estimates based on the AmericanCommunity Survey.19

6. Summary Statistics. Online Appendix Table III reportssummary statistics for the core sample. Median parent familyincome is $60,129 (in 2012 dollars). Among the 30.6% of childrenmatched to single parents, 72.0% are matched to a female parent.Children in our core sample have a median family income of$34,975 when they are approximately 30 years old; 6.1% of chil-dren have zero income in both 2011 and 2012; 58.9% are enrolledin a college at some point between the ages of 18 and 21; and15.8% of women have a teenage birth.

In Online Appendix B and Appendix Table IV, we showthat the total cohort size, labor force participation rate, distri-bution of child income, and other demographic characteristics ofour core sample line up closely with corresponding estimates inthe Current Population Survey and American CommunitySurvey. This confirms that our sample covers roughly the samenationally representative population as previous survey-basedresearch.

18. The exact value varies across cohorts. For example, in the 1980 birth cohort,55.1% of children do not attend college. We assign these children a rank of55:1

2 þ0:02 ¼ 27:7% because 0.2% of children in the 1980 birth cohort attend collegeswhose mean earnings are below the mean earnings of those not in college.

19. Of women in our core sample, 15.8% have teenage births; the correspondingnumber is 14.6% in the 2003 ACS. The unweighted correlation between state-levelteenage birth rates in the tax data and the ACS is 0.80.


at University of O



ownloaded from


IV. National Statistics

We begin our empirical analysis by characterizing the rela-tionship between parent and child income at the national level.We first present a set of baseline estimates of relative mobilityand then evaluate the robustness of our estimates to alternativesample and income definitions.20

IV.A. Baseline Estimates

In our baseline analysis, we use the core sample (1980–1982birth cohorts) and measure parent income as mean family incomefrom 1996 to 2000 and child income as mean family income in2011–2012, when children are approximately 30 years old.Figure I Panel A presents a binned scatter plot of the meanfamily income of children versus the mean family income oftheir parents. To construct this figure, we divide the horizontalaxis into 100 equal-sized (percentile) bins and plot mean childincome versus mean parent income in each bin.21 This binnedscatter plot provides a nonparametric representation of the con-ditional expectation of child income given parent income,E[YijXi¼ x]. The regression coefficients and standard errors re-ported in this and all subsequent binned scatter plots are esti-mated on the underlying microdata using OLS regressions.

The conditional expectation of children’s income given par-ents’ income is strongly concave. Below the 90th percentile ofparent income, a $1 increase in parent family income is associ-ated with a 33.5 cent increase in average child family income. Incontrast, between the 90th and 99th percentile, a $1 increase inparent income is associated with only a 7.6 cent increase in childincome.

20. We do not present estimates of absolute mobility at the national level be-cause absolute mobility in terms of percentile ranks is mechanically related to rel-ative mobility at the national level (see Section II). Although one can computemeasures of absolute mobility at the national level based on mean incomes (e.g.,the mean income of children whose parents are at the 25th percentile), there is nonatural benchmark for such a statistic as it has not been computed in other coun-tries or time periods.

21. For scaling purposes, we exclude the top bin (parents in the top 1%) in thisfigure only; mean parent income in this bin is $1,408,760 and mean child income is$113,846.


at University of O



ownloaded from


AL

evel

of

Ch

ild F

amily

Inco

me

vs. P

aren

t F

amily

Inco

me

Lo

g C

hild

Fam

ily In

com

e vs

. Lo

g P

aren

t F

amily

Inco

me

B

FIG

UR

EI

Ass

ocia

tion

bet

wee

nC

hil

dre

n’s

an

dP

are

nts

’In

com

es

Th

ese

figu

res

pre

sen

tn

onp

ara

met

ric

bin

ned

scatt

erp

lots

ofth

ere

lati

onsh

ipbet

wee

nch

ild

inco

me

an

dp

are

nt

inco

me.

Bot

hp

an

els

are

base

don

the

core

sam

ple

(1980–1982

bir

thco

hor

ts)

an

dbase

lin

efa

mil

yin

com

ed

efin

itio

ns

for

pare

nts

an

dch

ild

ren

.C

hil

din

com

eis

the

mea

nof

2011–2012

fam

ily

inco

me

(wh

enth

ech

ild

isap

pro

xim

ate

ly30

yea

rsol

d),

wh

erea

sp

are

nt

inco

me

ism

ean

fam

ily

inco

me

from

1996

to2000.

Inco

mes

are

in2012

dol

lars

.T

oco

nst

ruct

Pan

elA

,w

ebin

pare

nt

fam

ily

inco

me

into

100

equ

al-

size

d(c

enti

le)

bin

san

dp

lot

the

mea

nle

vel

ofch

ild

inco

me

ver

sus

mea

nle

vel

ofp

are

nt

inco

me

wit

hin

each

bin

.F

orsc

ali

ng

pu

rpos

es,

we

do

not

show

the

poi

nt

for

the

top

1%

inP

an

elA

.In

the

top

1%

bin

,m

ean

pare

nt

inco

me

is$1.4

mil

lion

an

dm

ean

chil

din

com

eis

$114,0

00.

InP

an

elB

,w

eagain

bin

pare

nt

fam

ily

inco

me

into

100

bin

san

dp

lot

mea

nlo

gin

com

efo

rch

ild

ren

(lef

ty-

axis

)an

dth

efr

act

ion

ofch

ild

ren

wit

hze

rofa

mil

yin

com

e(r

igh

ty-

axis

)ver

sus

mea

np

are

nts

’lo

gin

com

e.C

hil

dre

nw

ith

zero

fam

ily

inco

me

are

excl

ud

edfr

omth

elo

gin

com

ese

ries

.In

bot

hp

an

els,

the

10th

an

d90th

per

cen

tile

ofp

are

nts

’in

com

eare

dep

icte

din

dash

edver

tica

lli

nes

.T

he

coef

fici

ent

esti

mate

san

dst

an

dard

erro

rs(i

np

are

nth

eses

)re

por

ted

onth

efi

gu

res

are

obta

ined

from

OL

Sre

gre

ssio

ns

onth

em

icro

data

.In

Pan

elA

,w

ere

por

tse

para

tesl

opes

for

pare

nts

bel

owth

e90th

per

cen

tile

an

dp

are

nts

bet

wee

nth

e90th

an

d99th

per

cen

tile

.In

Pan

elB

,w

ere

por

tsl

opes

ofth

elo

g-l

ogre

gre

ssio

n(i

.e.,

the

inte

rgen

erati

onal

elast

icit

yof

inco

me

orIG

E)

inth

efu

llsa

mp

lean

dfo

rp

are

nts

bet

wee

nth

e10th

an

d90th

per

cen

tile

s.


at University of O



ownloaded from


TA

BL

EI

INT

ER

GE

NE

RA

TIO

NA

LM

OB

ILIT

YE

ST

IMA

TE

SA

TT

HE

NA

TIO

NA

LL

EV

EL

(1)

(2)

(3)

(4)

(5)

(6)

(7)

Sam

ple

Ch

ild

’sou

tcom

eP

are

nt’

sin

com

ed

ef.

Cor

esa

mp

leM

ale

chil

dre

nF

emale

chil

dre

nM

arr

ied

pare

nts

Sin

gle

pare

nts

1980–1985

coh

orts

Fix

edage

at

chil

dbir

th

1.

Log

fam

ily

inco

me

Log

fam

ily

inco

me

0.3

44

0.3

49

0.3

42

0.3

03

0.2

64

0.3

16

0.3

61

(excl

ud

ing

zero

s)(0

.0004)

(0.0

006)

(0.0

005)

(0.0

005)

(0.0

008)

(0.0

003)

(0.0

008)

2.

Log

fam

ily

inco

me

Log

fam

ily

inco

me

0.6

18

0.6

97

0.5

40

0.5

09

0.5

28

0.5

80

0.6

42

(rec

odin

gze

ros

to$1)

(0.0

009)

(0.0

013)

(0.0

011)

(0.0

011)

(0.0

020)

(0.0

006)

(0.0

018)

3.

Log

fam

ily

inco

me

Log

fam

ily

inco

me

0.4

13

0.4

35

0.3

92

0.3

58

0.3

22

0.3

80

0.4

34

(rec

odin

gze

ros

to$1,0

00)

(0.0

004)

(0.0

007)

(0.0

006)

(0.0

006)

(0.0

009)

(0.0

003)

(0.0

009)

4.

Fam

ily

inco

me

ran

kF

am

ily

inco

me

ran

k0.3

41

0.3

36

0.3

46

0.2

89

0.3

11

0.3

23

0.3

59

(0.0

003)

(0.0

004)

(0.0

004)

(0.0

004)

(0.0

007)

(0.0

002)

(0.0

006)

5.

Fam

ily

inco

me

ran

kF

am

ily

inco

me

ran

k0.3

39

0.3

33

0.3

44

0.2

87

0.2

94

0.3

23

0.3

57

(1999–2003)

(0.0

003)

(0.0

004)

(0.0

004)

(0.0

004)

(0.0

007)

(0.0

002)

(0.0

006)

6.

Fam

ily

inco

me

ran

kT

opp

ar.

inco

me

ran

k0.3

12

0.3

07

0.3

17

0.2

56

0.2

53

0.2

96

0.3

27

(0.0

003)

(0.0

004)

(0.0

004)

(0.0

004)

(0.0

006)

(0.0

002)

(0.0

006)

7.

Ind

ivid

ual

inco

me

ran

kF

am

ily

inco

me

ran

k0.2

87

0.3

17

0.2

57

0.2

65

0.2

79

0.2

86

0.2

92

(0.0

003)

(0.0

004)

(0.0

004)

(0.0

004)

(0.0

007)

(0.0

002)

(0.0

006)

8.

Ind

ivid

ual

earn

ings

ran

kF

am

ily

inco

me

ran

k0.2

82

0.3

13

0.2

49

0.2

59

0.2

72

0.2

83

0.2

87

(0.0

003)

(0.0

004)

(0.0

004)

(0.0

004)

(0.0

007)

(0.0

002)

(0.0

006)

9.

Col

lege

att

end

an

ceF

am

ily

inco

me

ran

k0.6

75

0.7

08

0.6

44

0.6

41

0.6

63

0.6

78

0.6

61

(0.0

005)

(0.0

007)

(0.0

007)

(0.0

006)

(0.0

013)

(0.0

003)

(0.0

010)


at University of O



ownloaded from


TA

BL

EI

(CO

NT

INU

ED)

(1)

(2)

(3)

(4)

(5)

(6)

(7)

Sam

ple

Ch

ild

’sou

tcom

eP

are

nt’

sin

com

ed

ef.

Cor

esa

mp

leM

ale

chil

dre

nF

emale

chil

dre

nM

arr

ied

pare

nts

Sin

gle

pare

nts

1980–1985

coh

orts

Fix

edage

at

chil

dbir

th

10.

Col

lege

qu

ali

tyra

nk

Fam

ily

inco

me

ran

k0.1

91

0.1

88

0.1

95

0.1

74

0.1

72

0.1

98

0.1

89

(P75–P

25

gra

die

nt)

(0.0

010)

(0.0

014)

(0.0

015)

(0.0

014)

(0.0

020)

(0.0

007)

(0.0

022)

11.

Tee

nage

bir

thF

am

ily

inco

me

ran

k�

0.2

98

�0.2

31

�0.3

22

�0.2

85

�0.2

90

(fem

ale

son

ly)

(0.0

006)

(0.0

007)

(0.0

016)

(0.0

004)

(0.0

011)

Nu

mber

ofob

serv

ati

ons

9,8

67,7

36

4,9

35,8

04

4,9

31,0

66

6,8

54,5

88

3,0

13,1

48

20,5

20,5

88

2,2

50,3

80

Not

es.

Each

cell

inth

ista

ble

rep

orts

the

coef

fici

ent

from

au

niv

ari

ate

OL

Sre

gre

ssio

nof

an

outc

ome

for

chil

dre

non

am

easu

reof

thei

rp

are

nts

’in

com

esw

ith

stan

dard

erro

rsin

pare

nth

eses

.A

llro

ws

rep

ort

esti

mate

sof

slop

eco

effi

cien

tsfr

omli

nea

rre

gre

ssio

ns

ofth

ech

ild

outc

ome

onth

ep

are

nt

inco

me

mea

sure

exce

pt

row

10,

inw

hic

hw

ere

gre

ssco

lleg

equ

ali

tyra

nk

ona

qu

ad

rati

cin

pare

nt

inco

me

ran

k(a

sin

Fig

ure

IVP

an

elA

).In

this

row

,w

ere

por

tth

ed

iffe

ren

cebet

wee

nth

efi

tted

valu

esfo

rch

ild

ren

wit

hp

are

nts

at

the

75th

per

cen

tile

an

dp

are

nts

at

the

25th

per

cen

tile

usi

ng

the

qu

ad

rati

csp

ecifi

cati

on.

Col

um

n(1

)u

ses

the

core

sam

ple

ofch

ild

ren

,w

hic

hin

clu

des

all

curr

ent

U.S

.ci

tize

ns

wit

ha

vali

dS

SN

orIT

INw

ho

are

(i)

bor

nin

bir

thco

hor

ts1980–1982,

(ii)

for

wh

omw

eare

able

toid

enti

fyp

are

nts

base

don

dep

end

ent

claim

ing,

an

d(i

ii)

wh

ose

mea

np

are

nt

inco

me

over

the

yea

rs1996–2000

isst

rict

lyp

osit

ive.

Col

um

ns

(2)

an

d(3

)li

mit

the

sam

ple

use

din

colu

mn

(1)

tom

ale

sor

fem

ale

s.C

olu

mn

s(4

)an

d(5

)li

mit

the

sam

ple

toch

ild

ren

wh

ose

pare

nts

wer

em

arr

ied

oru

nm

arr

ied

inth

eyea

rth

ech

ild

was

lin

ked

toth

ep

are

nt.

Col

um

n(6

)u

ses

all

chil

dre

nin

the

1980–1985

bir

thco

hor

ts.

Col

um

n(7

)re

stri

cts

the

core

sam

ple

toch

ild

ren

wh

ose

pare

nts

bot

hfa

llw

ith

ina

5-y

ear

win

dow

ofm

edia

np

are

nt

age

at

tim

eof

chil

dbir

th(a

ge

26–30

for

fath

ers;

24–28

for

mot

her

s);

we

imp

ose

only

one

ofth

ese

rest

rict

ion

sfo

rsi

ngle

pare

nts

.C

hil

dfa

mil

yin

com

eis

the

mea

nof

2011–2012

fam

ily

inco

me,

wh

ile

pare

nt

fam

ily

inco

me

isth

em

ean

from

1996

to2000.

Pare

nt

top

earn

erin

com

eis

the

mea

nin

com

eof

the

hig

her

-earn

ing

spou

sebet

wee

n1999–2003

(wh

enW

-2d

ata

are

avail

able

).C

hil

d’s

ind

ivid

ual

inco

me

isth

esu

mof

W-2

wage

earn

ings,

UI

ben

efits

,an

dS

SD

Iben

efits

,an

dh

alf

ofan

yre

main

ing

inco

me

rep

orte

don

the

1040

form

.In

div

idu

al

earn

ings

incl

ud

esW

-2w

age

earn

ings,

UI

ben

efits

,S

SD

Iin

com

e,an

dse

lf-e

mp

loym

ent

inco

me.

Col

lege

att

end

an

ceis

defi

ned

as

ever

att

end

ing

coll

ege

from

age

18

to21,

wh

ere

att

end

ing

coll

ege

isd

efin

edas

pre

sen

ceof

a1098-T

form

.C

olle

ge

qu

ali

tyra

nk

isd

efin

edas

the

per

cen

tile

ran

kof

the

coll

ege

that

the

chil

datt

end

sat

age

20

base

don

the

mea

nea

rnin

gs

at

age

31

ofch

ild

ren

wh

oatt

end

edth

esa

me

coll

ege

(ch

ild

ren

wh

od

on

otatt

end

coll

ege

are

incl

ud

edin

ase

para

te‘‘n

oco

lleg

e’’gro

up

);se

eS

ecti

onII

I.B

for

furt

her

det

ail

s.T

een

age

bir

this

defi

ned

as

havin

ga

chil

dw

hil

ebet

wee

nage

13

an

d19.

Inco

lum

ns

(1)–

(5)

an

d(7

),in

com

ep

erce

nti

lera

nk

sare

con

stru

cted

by

ran

kin

gall

chil

dre

nre

lati

ve

toot

her

sin

thei

rbir

thco

hor

tbase

don

the

rele

van

tin

com

ed

efin

itio

nan

dra

nk

ing

all

pare

nts

rela

tive

toot

her

pare

nts

inth

eco

resa

mp

le.

Ran

ks

are

alw

ays

defi

ned

onth

efu

llsa

mp

leof

all

chil

dre

n;

that

is,

they

are

not

red

efin

edw

ith

inth

esu

bsa

mp

les

inco

lum

ns

(2)–

(5)

or(7

).In

colu

mn

(6),

pare

nts

are

ran

ked

rela

tive

toot

her

pare

nts

wit

hch

ild

ren

inth

e1980–1985

bir

thco

hor

ts.

Th

en

um

ber

ofob

serv

ati

ons

corr

esp

ond

sto

the

spec

ifica

tion

inro

w4.

Th

en

um

ber

ofob

serv

ati

ons

isap

pro

xim

ate

ly7%

low

erin

row

1bec

au

sew

eex

clu

de

chil

dre

nw

ith

zero

inco

me.

Th

en

um

ber

ofob

serv

ati

ons

isap

pro

xim

ate

ly50%

low

erin

row

11

bec

au

sew

ere

stri

ctto

the

sam

ple

offe

male

chil

dre

n.

Th

ere

are

866

chil

dre

nin

the

core

sam

ple

wit

hu

nk

now

nse

x,

wh

ich

isw

hy

the

nu

mber

ofob

serv

ati

ons

inth

eco

resa

mp

leis

not

equ

al

toth

esu

mof

the

obse

rvati

ons

inth

em

ale

an

dfe

male

sam

ple

s.


at University of O



ownloaded from


1. Log-Log Intergenerational Elasticity Estimates. Partly mo-tivated by the nonlinearity of the relationship in Figure I Panel A,the canonical approach to characterizing the joint distribution ofchild and parent income is to regress the log of child income on thelog of parent income (as discussed in Section II), excluding chil-dren with zero income. This regression yields an estimated IGE of0.344, as shown in the first column of row 1 of Table I.

Unfortunately, this estimate turns out to be quite sensitive tochanges in the regression specifications for two reasons, illus-trated in Figure I Panel B. First, the relationship between logchild income and log parent income is highly nonlinear, consis-tent with the findings of Corak and Heisz (1999) in Canadian taxdata. This is illustrated in the series in circles in Figure I Panel B,which plots mean log child income versus mean log family incomeby percentile bin, constructed using the same method as Figure IPanel A. Because of this nonlinearity, the IGE is sensitive to thepoint of measurement in the income distribution. For example,restricting the sample to observations between the 10th and90th percentile of parent income (denoted by the verticaldashed lines in the graph) yields a considerably higher IGEestimate of 0.452.

Second, the log-log specification discards observations withzero income. The series in triangles in Figure I Panel B plots thefraction of children with zero income by parental income bin. Thisfraction varies from 17% among the poorest families to 3% amongthe richest families. Dropping children with zero income there-fore overstates the degree of intergenerational mobility. The waythese zeros are treated can change the IGE dramatically. Forinstance, including the zeros by assigning those with zeroincome an income of $1 (so that the log of their income is zero)raises the estimated IGE to 0.618, as shown in row 2 of Table I. Ifinstead we treat those with 0 income as having an income of$1,000, the estimated IGE becomes 0.413. These exercises showthat small differences in the way children’s income is measured atthe bottom of the distribution can produce substantial variationin IGE estimates.

Columns (2)–(7) in Table I replicate the baseline specificationin column (1) for alternative subsamples analyzed in the priorliterature. Columns (2)–(5) split the sample by the child’sgender and the parents’ marital status in the year they firstclaim the child. Column (6) replicates column (1) for the extendedsample of 1980–1985 birth cohorts. Column (7) restricts the


at University of O



ownloaded from


sample to children whose mothers are between the ages of 24 and28 and fathers are between 26 and 30 (a 5-year window aroundthe median age of birth). This column eliminates variation inparent income correlated with differences in parent age at childbirth and restricts the sample to parents who are younger than 50years when we measure their incomes (for children born in 1980).Across these subsamples, the IGE estimates range from 0.264 (forchildren of single parents, excluding children with zero income) to0.697 (for male children, recoding zeroes to $1).

The IGE is unstable because the income distribution isnot well approximated by a bivariate log-normal distribution, aresult that was not apparent in smaller samples used in priorwork. This makes it difficult to obtain reliable comparisonsof mobility across samples or geographical areas using theIGE. For example, income measures in survey data aretypically top-coded and sometimes include transfers and othersources of income that increase incomes at the bottom of thedistribution, which may lead to larger IGE estimates than those ob-tained in administrative data sets such as the one used here.

In a recent paper, Mitnik et al. (2014) propose a new measureof the IGE, the elasticity of expected child income with respect to

parent income dlog E½YijXi¼x�dlog x

� �, which they show is more robust to

the treatment of small incomes. In large samples, one can estimatethis parameter by regressing the log of mean child income in eachpercentile bin (plotted in Figure I Panel A) on the log of meanparent income in each bin. In Online Appendix C, we show thatMitnik et al.’s statistic can be interpreted as a dollar-weightedaverage of elasticities (placing greater weight on high-income chil-dren), whereas the traditional IGE weights all individuals withpositive income equally. These two parameters need not coincidein general and the ‘‘correct’’ parameter depends on the policy ques-tion one seeks to answer. However, it turns out that in our data,the Mitnik et al. dollar-weighted IGE estimate is 0.335, very sim-ilar to our baseline IGE estimate of 0.344 when excluding childrenwith zero income (Online Appendix Figure I Panel A).22

22. Mitnik et al. (2014) find larger estimates of the dollar-weighted IGE in theirsample of tax returns. A useful direction for further work would be to understandwhy the two samples yield different IGE estimates.


at University of O



ownloaded from


In another recent study, Clark (2014) argues that traditionalestimates of the IGE understate the persistence of status acrossgenerations because they are attenuated by fluctuations in real-ized individual incomes across generations. To resolve this prob-lem, Clark estimates the IGE based on surname-level means ofincome in each generation and obtains a central IGE estimate of0.8, much larger than that in prior studies. In our data, estimatesof mobility based on surname means are similar to our baselineestimates based on individual income data (Online AppendixTable V). One reason that Clark (2014) may obtain larger esti-mates of intergenerational persistence is that his focus on distinc-tive surnames partly identifies the degree of convergence inincome between racial or ethnic groups (Borjas 1992) ratherthan across individuals (see Online Appendix D for furtherdetails).23

2. Rank-Rank Estimates. Next we present estimates of therank-rank slope, the second measure of relative mobility dis-cussed in Section II. We measure the percentile rank of parentsPi based on their positions in the distribution of parent incomes inthe core sample. Similarly, we define children’s percentile ranksRi based on their positions in the distribution of child incomeswithin their birth cohorts. Importantly, this definition allows usto include zeros in child income.24 Unless otherwise noted, wehold the definition of these ranks fixed based on positions in theaggregate distribution, even when analyzing subgroups.

Figure II Panel A presents a binned scatter plot of the meanpercentile rank of children E[RijPi¼p] versus their parents’ per-centile rank p. The conditional expectation of a child’s rank givenhis parents’ rank is almost perfectly linear. Using an OLS regres-sion, we estimate that a 1 percentage point (pp) increase in parentrank is associated with a 0.341 pp increase in the child’s mean

23. For example, Clark (2014, p. 60, Figure 3.10) compares the outcomes ofindividuals with the surname Katz (a predominantly Jewish name) versusWashington (a predominantly black name). This comparison generates an impliedIGE close to 1, which partly reflects the fact that the black-white income gap haschanged very little over the past few decades. Estimates of the IGE based on indi-vidual-level data (or pooling all surnames) are much lower because there is muchmore social mobility within racial groups.

24. In the case of ties, we define the rank as the mean rank for the individuals inthat group. For example, if 10% of a birth cohort has zero income, all children withzero income would receive a percentile rank of 5.


at University of O



ownloaded from


203040506070

Mean Child Income Rank

010

2030

4050

6070

8090

100

Ran

k-R

ank

Slo

pe =

0.3

41(0

.000

3)

Par

ent I

ncom

e R

ank

203040506070

010

2030

4050

6070

8090

100

Mean Child Income Rank

Can

ada

Den

mar

k

Ran

k-R

ank

Slo

pe (

Can

ada)

= 0

.174

(0.0

05)

Ran

k-R

ank

Slo

pe (

Den

mar

k) =

0.1

80(0

.006

)

Uni

ted

Sta

tes

Par

ent I

ncom

e R

ank

AM

ean

Ch

ild In

com

e R

ank

vs. P

aren

t In

com

e R

ank

in t

he

U.S

.C

ross

-Co

un

try

Co

mp

aris

on

sB

FIG

UR

EII

Ass

ocia

tion

bet

wee

nC

hil

dre

n’s

an

dP

are

nts

’P

erce

nti

leR

an

ks

Th

ese

figu

res

pre

sen

tn

onp

ara

met

ric

bin

ned

scatt

erp

lots

ofth

ere

lati

onsh

ipbet

wee

nch

ild

ren

’san

dp

are

nt’

sp

erce

nti

lein

com

era

nk

s.B

oth

figu

res

are

base

don

the

core

sam

ple

(1980–1982

bir

thco

hor

ts)

an

dbase

lin

efa

mil

yin

com

ed

efin

itio

ns

for

pare

nts

an

dch

ild

ren

.C

hil

din

com

eis

the

mea

nof

2011–2012

fam

ily

inco

me

(wh

enth

ech

ild

isap

pro

xim

ate

ly30

yea

rsol

d),

an

dp

are

nt

inco

me

ism

ean

fam

ily

inco

me

from

1996

to2000.

Ch

ild

ren

are

ran

ked

rela

tive

toot

her

chil

dre

nin

thei

rbir

thco

hor

t,an

dp

are

nts

are

ran

ked

rela

tive

toall

oth

erp

are

nts

inth

eco

resa

mp

le.

Pan

elA

plo

tsth

em

ean

chil

dp

erce

nti

lera

nk

wit

hin

each

pare

nt

per

cen

tile

ran

kbin

.T

he

seri

esin

tria

ngle

sin

Pan

elB

plo

tsth

ean

alo

gou

sse

ries

for

Den

mark

,co

mp

ute

dby

Bos

eru

p,

Kop

czu

k,

an

dK

rein

er(2

013)

usi

ng

asi

mil

ar

sam

ple

an

din

com

ed

efin

itio

ns.

Th

ese

ries

insq

uare

sp

lots

esti

mate

sof

the

ran

k-r

an

kse

ries

usi

ng

the

dec

ile-

dec

ile

tran

siti

onm

atr

ixfr

omC

orak

an

dH

eisz

(1999).

Th

ese

ries

inci

rcle

sin

Pan

elB

rep

rod

uce

sth

era

nk

-ran

kre

lati

onsh

ipin

the

Un

ited

Sta

tes

from

Pan

elA

as

are

fere

nce

.T

he

slop

esan

dbes

t-fi

tli

nes

are

esti

mate

du

sin

gan

OL

Sre

gre

ssio

non

the

mic

rod

ata

for

the

Un

ited

Sta

tes

an

don

the

bin

ned

seri

es(a

sw

ed

on

oth

ave

acc

ess

toth

em

icro

data

)fo

rD

enm

ark

an

dC

an

ad

a.

Sta

nd

ard

erro

rsare

rep

orte

din

pare

nth

eses

.


at University of O



ownloaded from


rank, as reported in row 4 of Table I. The rank-rank slope esti-mates are generally quite similar across subsamples, as shown incolumns (2)–(7) of Table I.

Figure II Panel B compares the rank-rank relationship in theUnited States with analogous estimates for Denmark constructedusing data from Boserup, Kopczuk, and Kreiner (2013) and esti-mates for Canada constructed from the decile transition matrixreported by Corak and Heisz (1999).25 The relationship betweenchild and parent ranks is nearly linear in Denmark and Canadaas well, suggesting that the rank-rank specification provides agood summary of mobility across diverse environments. Therank-rank slope is 0.180 in Denmark and 0.174 in Canada,nearly half that in the United States.

Importantly, the smaller rank-rank slopes in Denmark andCanada do not necessarily mean that children from low-incomefamilies in these countries do better than those in the UnitedStates in absolute terms. It could be that children of high-income parents in Denmark and Canada have worse outcomesthan children of high-income parents in the United States. One

TABLE II

NATIONAL QUINTILE TRANSITION MATRIX

Parent quintileChild quintile 1 2 3 4 5

1 33.7% 24.2% 17.8% 13.4% 10.9%2 28.0% 24.2% 19.8% 16.0% 11.9%3 18.4% 21.7% 22.1% 20.9% 17.0%4 12.3% 17.6% 22.0% 24.4% 23.6%5 7.5% 12.3% 18.3% 25.4% 36.5%

Notes. Each cell reports the percentage of children with family income in the quintile given by the rowconditional on having parents with family income in the quintile given by the column for the 9,867,736children in the core sample (1980–1982 birth cohorts). See notes to Table I for income and sample definitions.See Online Appendix Table VI for an analogous transition matrix constructed using the 1980–1985 cohorts.

25. Both the Danish and Canadian studies use administrative earnings infor-mation for large samples as we do here. The Danish sample, which was constructedto match the analysis sample in this article as closely as possible, consists of chil-dren in the 1980–1981 birth cohorts and measures child income based on meanincome between 2009 and 2011. Child income in the Danish sample is measured atthe individual level, and parents’ income is the mean of the two biological parents’income from 1997 to 1999, irrespective of their marital status. The Canadiansample is less comparable to our sample, as it consists of male children in the1963–1966 birth cohorts and studies the link between their mean earnings from1993 to 1995 and their fathers’ mean earnings from 1978 to 1982.


at University of O



ownloaded from


cannot distinguish between these possibilities because the ranksare defined within each country. One advantage of the within–United States CZ-level analysis implemented below is that it nat-urally allows us to study both relative and absolute outcomes byanalyzing children’s performance on a fixed national scale.

3. Transition Matrixes. Table II presents a quintile transitionmatrix: the probability that a child is in quintile m of the childincome distribution conditional on his parent being in quintile nof the parent income distribution. One statistic of particular in-terest in this matrix is the probability of moving from the bottomquintile to the top quintile, a simple measure of success that wereturn to later. This probability is 7.5% in the United States,compared with 11.7% in Denmark (Boserup, Kopczuk, andKreiner 2013) and 13.4% in Canada (Corak and Heisz 1999). Inthis sense, the chances of achieving the American dream are con-siderably higher for children in Denmark and Canada than thosein the United States.

In Online Data Table I, we report a 100�100 percentile-leveltransition matrix for the United States. Using this matrix and themarginal distributions for child and parent income in OnlineData Table II, one can construct any mobility statistic of interestfor the U.S. population.26

IV.B. Robustness of Baseline Estimates

We now evaluate the robustness of our estimates of interge-nerational mobility to alternative specifications. We begin byevaluating two potential sources of bias emphasized in priorwork: life cycle bias and attenuation bias.

1. Life Cycle Bias. Prior research has shown that measuringchildren’s income at early ages can understate intergenerationalpersistence in lifetime income because children with high lifetimeincomes have steeper earnings profiles when they are young(Solon 1999; Grawe 2006; Haider and Solon 2006). To evaluatewhether our baseline estimates suffer from such life cycle bias,Figure III Panel A plots estimates of the rank-rank slope by theage at which the child’s income is measured. We construct the

26. All of the online data tables are available at http://www.equality-of-opportunity.org/index.php/data.


at University of O



ownloaded from

http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://qje.oxfordjournals.org/lookup/suppl/doi:10.1093/qje/qju022/-/DC1http://www.equality-of-opportunity.org/index.php/datahttp://www.equality-of-opportunity.org/index.php/datahttp://qje.oxfordjournals.org/

AL

ifec

ycle

Bia

s: R

ank-

Ran

k S

lop

es b

y A

ge

of

Ch

ildA

tten

uat

ion

Bia

s: R

ank-

Ran

k S

lop

es b

y N

um

ber

of

Yea

rs U

sed

to

Mea

sure

Par

ent

Inco

me

B

FIG

UR

EII

I

Rob

ust

nes

sof

Inte

rgen

erati

onal

Mob

ilit

yE

stim

ate

s

Th

isfi

gu

reev

alu

ate

sth

ero

bu

stn

ess

ofth

era

nk

-ran

ksl

ope

esti

mate

din

Fig

ure

IIP

an

elA

toch

an

ges

inth

eage

at

wh

ich

chil

din

com

eis

mea

sure

d(P

an

elA

)an

dth

en

um

ber

ofyea

rsu

sed

tom

easu

rep

are

nts

’in

com

e(P

an

elB

).In

bot

hp

an

els,

chil

din

com

eis

defi

ned

as

mea

nfa

mil

yin

com

ein

2011

to2012.

InP

an

elA

,p

are

nt

inco

me

isd

efin

edas

mea

nfa

mil

yin

com

efr

om1996

to2000.

Each

poi

nt

inP

an

elA

show

sth

esl

ope

coef

fici

ent

from

ase

para

teO

LS

regre

ssio

nof

chil

din

com

era

nk

onp

are

nt

inco

me

ran

k,

vary

ing

the

chil

d’s

bir

thco

hor

tan

dh

ence

the

chil

d’s

age

in2011–2012

wh

enth

ech

ild

’sin

com

eis

mea

sure

d.

Th

eci

rcle

su

seth

eex

ten

ded

sam

ple

inth

ep

opu

lati

ond

ata

,an

dth

etr

Documents

THE QUARTERLY JOURNAL OF ECONOMICS...Laszlo Sandor, Gary Solon, Danny Yagan, numerous seminar participants, and four anonymous referees for helpful comments. Sarah Abraham, Alex Bell,