ESTIMATING TRENDS IN U.S. INCOME INEQUALITY ...We analyze trends in the inequality of size-adjusted pre-tax post-transfer household income in the USA between 1975 and 2004. Estimates

The research program of the Center for Economic Studies (CES)produces a wide range of theoretical and empirical economicanalyses that serve to improve the statistical programs of the U.S.Bureau of the Census. Many of these analyses take the form of CESresearch papers. The papers are intended to make the results ofCES research available to economists and other interested partiesin order to encourage discussion and obtain suggestions forrevision before publication. The papers are unofficial and havenot undergone the review accorded official Census Bureaupublications. The opinions and conclusions expressed in the papersare those of the authors and do not necessarily represent those ofthe U.S. Bureau of the Census. Republication in whole or part mustbe cleared with the authors.

ESTIMATING TRENDS IN U.S. INCOME INEQUALITY USING THECURRENT POPULATION SURVEY:

THE IMPORTANCE OF CONTROLLING FOR CENSORING

by

Richard V. Burkhauser *Cornell University

Shuaizhang Feng *Shanghai University of Finance and Economics

Stephen P. Jenkins *University of Essex

and

Jeff Larrimore *Cornell University

CES 08-25 August, 2008

All papers are screened to ensure that they do not discloseconfidential information. Persons who wish to obtain a copy of thepaper, submit comments about the paper, or obtain generalinformation about the series should contact Sang V. Nguyen, Editor,Discussion Papers, Center for Economic Studies, Bureau of theCensus, 4600 Silver Hill Road, 2K132F, Washington, DC 20233, (301-763-1882) or INTERNET address [email protected].

Abstract

Using internal and public use March Current Population Survey (CPS) data, we analyzetrends in US income inequality (1975–2004). We find that the upward trend in income inequalityprior to 1993 significantly slowed thereafter once we control for top coding in the public use dataand censoring in the internal data. Because both series do not capture trends at the very top of theincome distribution, we use a multiple imputation approach in which values for censoredobservations are imputed using draws from a Generalized Beta distribution of the Second Kind(GB2) fitted to internal data. Doing so, we find income inequality trends similar to those derivedfrom unadjusted internal data. Our trend results are generally robust to the choice of inequalityindex, whether Gini coefficient or other commonly-used indices. When we compare our bestestimates of the income shares held by the richest tenth with those reported by Piketty and Saez(2003), our trends fairly closely match their trends, except for the top 1 percent of thedistribution. Thus, we argue that if United States income inequality has been substantiallyincreasing since 1993, such increases are confined to this very high income group.

Key Words: U.S. Income Inequality, Time Trend, CPS, Censoring, Top Coding JEL Classifications: D31, C81

* The research in this paper was conducted while Burkhauser, Feng and Larrimore wereSpecial Sworn Status researchers of the U.S. Census Bureau at the New York Census ResearchData Center at Cornell University. Conclusions expressed are those of the authors and do notnecessarily reflect the views of the U.S. Census Bureau. This paper has been screened to ensurethat no confidential data are disclosed. Supports for this research from the National ScienceFoundation (award nos. SES-0427889 SES-0322902, and SES-0339191) and the NationalInstitute for Disability and Rehabilitation Research (H133B040013 and H133B031111) arecordially acknowledged. Jenkins’s research was supported by core funding from the Universityof Essex and the UK Economic and Social Research Council for the Research Centre on Micro-Social Change and the United Kingdom Longitudinal Studies Centre. We thank Lisa MarieDragoset, Arnie Reznek, Laura Zayatz, the Cornell Census RDC Administrators, and all theirU.S. Census Bureau colleagues who have helped with this project. This paper has also benefitedfrom comments made by Peter Gottschalk, Bruce Meyer, and other participants at the EmpiricalEconomics Workshop of Shanghai University of Finance and Economics, Shanghai, May 2008.

Introduction

The public use version of the March Current Population Survey (CPS) is the primary data

source used by public policy researchers and administrators to investigate trends in average

income and its distribution in the United States as well as in cross-national comparisons of

income inequality with other countries. (For systematic reviews of the cross-national income

inequality literature, see Atkinson, Rainwater and Smeeding, 1995, Gottschalk and Smeeding,

1997, Atkinson and Brandolini, 2001. For more recent examples of the use of the public use CPS

in measuring income inequality trends in the United States see: Burkhauser, Couch, Houtenville

and Rovba 2003-2004, Gottschalk and Danziger, 2005 and Kruger and Perri, 2006.) The

consensus of this research, based on public use CPS data is that income inequality in the United

States increased substantially in the 1970s and 1980s and also increased relative to other OECD

countries.

Despite the widely held view that income inequality has increased substantially since the

1980s, most of the evidence of a large increase in income inequality since 1993 has come from

Internal Revenue Service (IRS) administrative record data based on personal adjusted gross

income that the IRS has made available to the research community in tabular form (Piketty and

Saez, 2003). In contrast, we argue that yearly income inequality increases are small, and

significantly smaller than the increases over the nearly two decades of CPS data available to us

prior to 1993.

We compare our CPS-based estimates of inequality trends with those of Piketty and Saez

(2003) based on IRS data. When, like Piketty and Saez, we focus on the share of reported income

held by subgroups within the richest tenth of the distribution, our trends fairly closely match their

trends, except for the top 1 percent of the distribution over the period 1975–2004. On the basis of

1

these findings, we argue that if income inequality has been substantially increasing since 1993 in

the USA, such increases have been confined to the very upper tail of the income distribution.

Our analysis derives from unprecedented access to internal CPS data. To protect the

confidentiality of its respondents, the Census Bureau censors each source of income of

individuals above specified topcoded levels in the public use data made available to the research

community. However, for its official work, researchers at the Census Bureau have access to the

internal March CPS that is less severely censored. For instance, the yearly income and income

inequality values reported in U.S. Census Bureau (various years) are based on the internal March

CPS data. We use these data too.

We analyze trends in the inequality of size-adjusted pre-tax post-transfer household

income in the USA between 1975 and 2004. Estimates of inequality levels and trends derived

from CPS internal data are compared with estimates from several series derived from public use

data. In particular, using the internal data, we derive a cell mean series in which topcoded

incomes in the public use data, for each year back to 1975, are replaced by the mean of all

income values above the topcode for any income source of any individual in the public use that

has been topcoded.1 This cell mean series can be used in conjunction with cell means provided

by the Census Bureau for later years to create a complete set of cell means for topcoded

observations. See Larrimore, Burkhauser, Feng, and Zayatz (forthcoming) for a detailed

discussion of the creation of this cell mean series and all cell mean values.

When we use the public use data augmented with imputations for topcoded observations

from the extended cell mean series, we closely match yearly mean income and income inequality

levels and trends in the United States population derived from internal data.2 Internal data are

themselves censored albeit to a substantially smaller extent than the public use data, and there are

2

time-inconsistencies in censoring practices in the internal data. However, we show using

consistent censoring methods that the trends in inequality derived from our public use extended

cell mean series are not significantly affected by time-inconsistent censoring levels. The

exception is between 1992 and 1993, when the public use extended cell mean series overstates

the rise in inequality because it does not take full account of the significant increases in internal

censoring levels that took effect in 1993, and other CPS redesign aspects.

Because consistently topcoded public use and internal data do not contain information

about the very highest incomes, they understate the level of income inequality. However, once

the break between 1992 and 1993 is controlled for, income inequality based on all these data

series yield the same trends, namely a rise in inequality between 1975 and 1992 followed by a

significantly smaller increase in inequality after 1993. Our key findings are generally robust to

whether income inequality is summarized using the Gini coefficient or three other widely-used

indices.

Because of the censoring in the internal CPS data, even our extended cell mean series

does not incorporate information about the very highest incomes. To address this issue, we use a

multiple imputation approach in which, for each year, values for top-coded observations in the

internal data are imputed using draws from a Generalized Beta distribution of the Second Kind

(GB2) fitted to internal data. Using the augmented internal data, we investigate the extent to

which the systematic exclusion of the top part of the distribution affects estimates of the level

and trends in inequality. Unsurprisingly, we find that compared to estimates derived from the

multiple imputation approach, the unadjusted internal data as well as the consistently censored

public use and internal data all understate the level of inequality in all years. However, all the

3

series reveal the same trends: an increase in inequality over the entire period 1975–2004, but

with a rate of increase noticeably lower after 1993 compared to before 1993.

Finally, we compare our CPS-based estimates of trends in top income shares based on

distributions of size-adjusted pre-tax post-transfer household income distributions to the

estimates of top income shares derived by Piketty and Saez (2003) from IRS administrative data

on adjusted personal gross tax unit income. Both our and their estimates of the income share of

the 90th–95th percentile group have a relatively flat trend during the period 1975–2004, although

the Piketty and Saez (2003) values are slightly higher in level. Similarly, for the shares of the

95th–99th percentile group, despite slight differences in levels, the two series exhibit remarkably

similar trends over the 30 year period. In contrast, while the share of income held by the richest 1

percent increased substantially over this period according to both our CPS-based and the IRS-

based Piketty and Saez (2003) estimates, their estimates of the size of the income shares and of

the magnitude of the increase over time are much larger than ours.

In the next section, we explain the nature of censoring in the CPS public use and internal

data, and explain the several methods by which we address this issue. In particular, we explain

how we derive a public use data series augmented by cell mean imputations for topcoded

observations based on the censored internal data. We also explain how we derive other data

series in which public use and internal data are augmented by imputations for censored

observations using draws from a parametric income distribution model fitted to the data. And we

explain our method of ‘consistent’ top coding. At the heart of the paper are comparisons of

estimates derived from the various CPS data series of income inequality levels and trends for the

period 1975–2004. The fourth section provides a comparison of CPS-based estimate of trends in

4

top income shares with the trends reported by Piketty and Saez (2003) using IRS administrative

record data. The final section of the paper presents a summary and conclusions.

Throughout the paper, income is defined as pre-tax post-transfer household income

excluding capital gains, adjusted for differences in household size using the square root of

household size.3 Each individual is attributed with the size-adjusted income of the household to

which he or she belongs. Income refers to income for the calendar year preceding the March

interview. We convert the small number of negative and zero household income values each year

to one dollar prior to our calculations because a number of inequality indices are defined only for

positive income values. Our samples for each year are all individuals in CPS respondent

households, excluding individuals in group quarters or in households containing a member of the

military. All statistics are calculated using the relevant CPS sampling weights.

Topcoding in the March CPS

Topcoding in CPS public use data

In the March CPS, a respondent in each household is asked a series of questions on the

sources of income for the household. Starting in 1975, respondents reported income from 11

sources and since 1987 they have done so for income from 24 sources. See Appendix Table 1 for

a list of these income sources. Rather than simply topcoding high total household income values

in the public use data, the US Census Bureau topcodes high values for each source of household

income. See Appendix Tables 2 and 3 for a full list of topcode values over time in the public use

data. Prior to 1995, the Census Bureau assigned the topcode value from that source of income to

all topcoded income values in the public use data. Since then, they have substituted a cell mean

value derived from the internal data to each topcoded value in the public use data.

5

Because the Census Bureau cell means series starts in 1995, using the public use data

without correcting for this major change in the values included for the highest incomes, source

by source, results in a significant increase in measured income and income inequality in 1995

and subsequently simply because of the closer correspondence between the true reported income

and the value included in the public use file. Hence, whereas the use of cell mean imputations

after 1995 makes the public use data better conform to the internal data, not taking this

improvement in measurement into account overestimates how much actual income increased in

1995 among those at the highest income levels and the overall levels of income inequality. An

additional complication is that household income is the aggregation of multiple income sources

(income types and household members), each of which may be topcoded. As a result, the

prevalence of topcoding in household income is significantly greater than for any particular

income source, and topcoded household income values are not necessarily the highest incomes –

they may occur throughout the income distribution. Thus using the ratio of the 90th percentile to

the 10th percentile (the ‘P90/P10 ratio’) to minimize the impact of topcoding on inequality

estimates will not be entirely successful: see Burkhauser, Feng, and Jenkins (2007).

The major change in the public use data in 1995 is a specific example of the more general

problem that income topcoding presents researchers interested in measuring levels and trends in

the income and income distribution of the US population. Inconsistently-defined topcode values

lead to artificial increases or decreases in mean income and income inequality, since different

fractions of the population are subject to topcoding each year.4 This is a legitimate concern since

topcodes in the public use data have, in general, increased in a non-systematic manner over its

history, and part of the apparent trend in average income and income dispersion over time in the

uncorrected data is caused by topcoding capturing a larger portion of the income distribution.

6

Using cell mean imputations for topcoded observations substantially alleviates this problem after

1995, since cell means provide the information necessary to capture the income distribution

represented by the internal data. As a result, even if public topcodes are inconsistently adjusted

in the public use data, mean income for the entire population in the internal data can be matched

with the public use data using cell mean imputations.

Despite the Census Bureau’s attempt to alleviate the problem of topcoding, their cell

means have generally been ignored by researchers studying long-term income trends in the USA,

since to do otherwise exacerbates time-inconsistencies that arise from using unadjusted public

use data for the pre-1995 period and CPS data with cell mean imputations thereafter. Instead,

researchers analyzing CPS data that includes years prior to 1995 use other methods of controlling

for inconsistent topcoding in the public use data. These methods include measuring inequality

with P90/P10 ratios (e.g. Danziger and Gottschalk, 1993; Gottschalk and Danziger, 2005; Daly

and Valletta, 2006); artificially truncating the data by removing the highest and lowest two

percent of observations (e.g. Welch, 1999); or artificially lowering the topcodes to create a series

in which a constant percentage of people is topcoded in the data for each year (e.g. Karoly and

Burtless, 1995; Burkhauser, Butler, Feng, and Houtenville, 2004). This is referred to as the

consistent topcoding method. While each of these methods is preferable to using unadjusted data,

each has drawbacks.

Burkhauser, Feng, and Jenkins (2007) show that using P90/P10 ratios does not

completely alleviate topcoding problems, since incomes are topcoded by income source (not by

total income) and so there are topcoded incomes below the 90th percentile. Additionally, the

P90/P10 ratio captures different aspects of the income distribution than widely-used measures of

dispersion such as the Gini coefficient or members of the Generalized Entropy family.

7

Consistent top coding of the public use data can also cause problems. The percentage of

individuals with topcoded incomes, whether based on their own income or the income of others

in their household each year in the public use data, has grown sporadically over time from just

under 1 percent in 1975 to nearly 6 percent in 2006: see Figure 1. Unless public use data

topcodes are systematically increased over time as income increases, these topcodes cut deeper

into the income distribution. In that case, consistent topcoding methods do an increasingly poorer

job of capturing trends at the top of the income distribution as they remove larger fractions of the

population to maintain their consistency.

Censoring in CPS internal data

Using public use data with cell mean imputations cannot solve the problem of topcoding

entirely, because the internal data from which the cell means are derived are themselves

censored, albeit to a lesser degree than the public use data: see Figure 1.

Censoring in the internal data was initially implemented due to data-storage limitations in

the computing systems of the 1970s which necessitated truncating written records to 5 digits.

This was most binding on wage income, self-employment income and farm income. While these

data storage limitations are no longer a constraint, the Census Bureau continues its internal

censoring practice, partly due to concerns about data reliability of individuals who report an

extremely high income value and partly due to concerns about confidentiality.

In 1985 censoring limits were raised to $250,000 on each source of labor earnings and

internal censoring fell close to zero. Subsequent increases in censoring values have kept the

percentage of censored individuals in the internal data well below 1 percent although increases in

censoring from non-labor income sources in the past few years has increased the share of

censored individuals somewhat since 1994. Appendix Tables 4 and 5 provide the censoring

8

levels over time for each income source in the internal data. See Ryscavage (1995), Jones and

Weinberg (2000), Welniak (2003), and Burkhauser et al. (2004), for earlier discussions of the

problems of censoring in the public use and internal data.

Because we had access to the internal CPS data, we are able for the first time to correct

for censoring in these data back to 1975, and hence provide a more consistent measure of trends

in the income distribution and compare these trends with those found with the public use data.

A Consistent Censoring Approach

Consistent top coding (e.g. Karoly and Burtless, 1995; Burkhauser, Butler, Feng, and

Houtenville, 2004) creates censoring points that capture the same percentile of each source of

income in every year. Because total household income is derived as the sum of all of the

individual sources of income in a household, it is necessary to create a consistent topcode for

each of these sources of income over all years. We do so by finding the year at which the

censoring point for a given source is at the lowest point in the distribution of income within that

source and then using that percentile as our cutoff point for all years. Since the sources of income

reported by the Census Bureau changed in 1987, we recode income and censoring points for

years after this time to the pre-1987 sources in order to create a time consistent series across the

entire 30 year period between 1975 and 2004. Appendix Table 6 reports the percentage of the

distribution affected by our censoring points and the year in which the most restrictive censoring

points occurred for each source of income in both the public use and internal data sets.

The consistent censoring approach makes no parametric assumptions about the

underlying income distribution, and maximizes the use of the common proportion of the

available income information. Nevertheless, by construction, the approach ignores the very

highest incomes in the distribution each year, and so it will underestimate levels of inequality.

9

However, the fractions omitted do not vary over time, and so this approach provides time-

consistent trends of inequality for the part of the distribution below a certain fixed percentile that

it does capture.

A Multiple Imputation Approach

To correct for censoring and also capture the whole distribution, we develop the

imputation idea in a different direction. The key differences from the cell mean approach are,

first, we attempt to account for censoring in the internal data (as well as in public use data) and,

second, we derive imputations for topcoded values using a parametric model of the income

distribution (since, by definition, the ‘true’ uncensored underlying income in the internal data are

not available to us). Third, while the cell mean approach leads to a single imputation for each

topcoded value, we derive multiple imputed values using a suitable randomization procedure

which leads to multiple distributions for each year, and account for potential stochastic

imputation error by averaging estimates (hence the label ‘multiple imputation’).

The parametric model of the income distribution used for the imputation model is the

Generalized Beta of the Second Kind (GB2). The GB2 is a flexible form widely used in the

income distribution literature, and a number of studies have shown that it fits income

distributions extremely well across different times and countries: see e.g. Bordley, McDonald

and Mantrala (1996), Brachmann, Stich and Trede (1996), and Bandourian, McDonald, and

Turley (2002). Feng, Burkhauser, and Butler (2006) fitted GB2 distributions to labor earnings

using public use CPS data and argue that their Gini coefficients calculated from the fitted GB2

distributions provide more plausible estimates of inequality trends than do the Gini coefficients

derived from unadjusted internal data reported by the Census Bureau.

10

Our multiple imputation approach differs from earlier studies that used parametric

models to derive imputed values for topcoded incomes: see Fichtenbaum and Shahidi (1988) and

Bishop, Chiou, and Formby (1994). First, those authors fitted a one-parameter Pareto distribution

to the upper ranges of the income distribution, rather than the more flexible four-parameter GB2

model. Second, our approach takes account of the fact that, rather than being common to all

individuals, censoring levels vary across individuals (because topcoding is done at the level of

each income source rather than at the level of aggregate household income). Third, by using

multiple imputation methods, rather than a single imputation, we account for the variability

intrinsic in the imputation process.

Our multiple imputation approach involves five steps. First, for each year’s data, we fit a

GB2 distribution by maximum likelihood, accounting for individual level right-censoring. To

ensure that model fit is maximized at the top of the distribution, the GB2 is fitted using

observations in the richest 70 percent of the distribution only (using appropriate corrections for

left truncation in the ML procedure). Second, for each observation with a censored income, we

draw a value from the income distribution that is implied by the fitted GB2 distribution, using an

appropriate stochastic procedure. Third, using the distribution comprising imputations for

censored observations and observed incomes for non-censored observations, we estimate our

various inequality indices. Fourth, we repeated steps 2 and 3 one hundred times, and finally, we

combine the one hundred sets of estimates from each of the one hundred data sets for each year

using the ‘averaging’ rules proposed by Rubin (1987) and modified by Reiter (2003) for the case

of partially synthetic data. See Jenkins, Burkhauser, Feng and Larrimore (2008) for further

details of our multiple imputation approach.

11

In what follows we estimate inequality levels and trends using different series defined

according to the method by which we take account of censoring, and whether the basis source is

the public use or internal data. The series, and the acronyms by which we refer to them

subsequently, are summarized in Table 1.

Estimates of inequality levels and trends, 1975–2006

The revised cell mean series (Public-CM)

Figure 2 reports our estimates of income inequality levels and trends derived from four

series: Public-NoCM, Public-Unadjusted, Public-CM, and Internal-Unadjusted. Inequality is

summarized using the Gini coefficient. Because public use data are topcoded, mean household

income and Gini coefficients derived from them will be understated relative to those estimated

from the internal data unless cell mean imputations are used, since cell means provide more

information about individuals at the top of the income distribution.

In all years, the Gini estimates based on the Public-NoCM series are below those from

the Internal-Unadjusted series. To correct for this difference (and for several other reasons), the

Census Bureau revised its topcoding procedures in income year 1995. The result is reflected in

the difference between the Gini estimates derived from the Public-NoCM and Public-Unadjusted

series. Prior to 1995 the series are identical. After 1995 the Public-Unadjusted Gini values equal

the Internal-Unadjusted Gini values, and are dramatically greater than those derived from the

Public-NoCM series. (Note that this is not the case in 1999. We believe this is the result of an

uncorrected error in the Census Bureau cell mean series.) While the Gini values from the Public-

Unadjusted series now better represents the Gini values derived from the Internal-Unadjusted

data, anyone uncritically using Gini values from the Public-Unadjusted data to describe changes

in income inequality in the United States would greatly exaggerate its increase in 1995.

12

Our Public-CM series corrects this problem – the Gini estimates derived from these data

match those derived from the Internal-Unadjusted data in all years including 1999: see Figure 2.

Thus the addition of our cell mean imputations to the CPS public use data allows researchers

without access to the Internal-Unadjusted data to produce the same inequality levels and trends

found in the Internal-Unadjusted data. But anyone uncritically using Gini values based on the

Internal-Unadjusted data to describe changes in income inequality would greatly exaggerate its

increase in 1993 when major changes occurred in the way the CPS was collected including

significant changes in the censoring points in the internal CPS data.

Long term trends in US income inequality

We now extend our analysis of long term trends using additional data series. There are

six in total: Public-Unadjusted; Public-CTC; Public-MI, Internal-Unadjusted; Internal-CTC, and

Internal-MI. Figure 3 depicts the trends over the period 1975–2004 in the Gini coefficients

derived from each of the series. We stop at 2004 because that is the last year of internal CPS data

to which we had access.

As discussed above, the use of cell means since 1995 in the Public-Unadjusted data

allows researchers to estimate Gini coefficients that closely match those estimated using the

Internal-Unadjusted data since then. But using the Public-Unadjusted Gini estimates to describe

trends in income inequality before and after 1995 distorts these trends. When we consistently

topcode public use data (Public-CTC series), the Gini estimate for each year is below the

corresponding estimate from the Public-Unadjusted data: see Figure 3. But the artificial jump in

1995 in the Public-Unadjusted Gini values is gone and the trend in Public-CTC Gini values more

closely follows the trends estimated using the Internal-Unadjusted data, including a noticeable

rise in inequality between 1992 and 1993.

13

When we repeat this exercise by consistently top coding internal data (Internal-CTC

series), once again the Gini estimates are smaller than those based on the Internal-Unadjusted

data but they are much closer to those estimated from the Internal-Unadjusted data than those

based on the Public-CTC data. This is unsurprising, since the Internal-CTC data now reach a

higher percentile of the Internal Unadjusted data. There remains a pronounced rise in inequality

based on the Internal-CTC data between 1992 and 1993, but that rise is much smaller that the

rise shown by the Internal-Unadjusted series between those years. This suggests that part of the

1993 jump is due to changes in censoring levels in the Internal-Unadjusted data in 1993. But

even the smaller but pronounced increase in the Gini between 1992 and 1993 that remains in the

Internal-CTC series is unlikely to come from a genuine change in income inequality and suggests

that other changes in CPS data collection methods are responsible. (See Ryscavage, 1995 and

Jones and Weinberg, 2000.)

Because both the Public-MI and Internal-MI series capture the very highest incomes, it is

not surprising that both lead to higher measured inequality levels than their respective unadjusted

counterparts. Unlike the Public-Unadjusted series, the Public-MI series does not jump in 1995,

suggesting that the jump in the Public-Unadjusted series at this point is purely due to introducing

cell mean imputations into the public use data. On the other hand, the Gini estimates derived

from the Internal MI series rose sharply in year 1993, showing the same pattern as Internal-

Unadjusted series. This is additional evidence that the 1993 jump in Gini is primarily due to

other survey design changes rather than data problems related to censoring.

Gini estimates derived from the Public-MI and Internal-MI series are relatively close to

each other, suggesting that our assumptions about the parametric form of the income distribution

work consistently despite the differences in topcoding prevalence in the public use and internal

14

CPS data. Before 1993, the two series are nearly identical. They diverged somewhat between the

1993–2004 period, but are still relatively close overall. The average of the Public-MI Gini

coefficients since 1993 is 0.431, only slightly lower than the 0.439 average for the Internal-MI

series.

We now use regression analysis to more formally investigate whether the differences in

levels and trends in inequality differ between the data series. Table 2 reports the results of our

analysis of differences between the Gini estimates derived from Public-CTC, Internal-CTC,

Public-MI, Internal-MI and those derived from Internal-Unadjusted CPS data. We do not

consider the Public-Unadjusted series as it contains less information than the internal CPS data

and its artificial 1995 jump makes it flawed compared to the other alternatives.

The dependent variable (y) in the regressions is a Gini estimate for a given year and data

series. The explanatory variables are: a constant, which is the Gini derived from the Internal-

Unadjusted data; a time trend (t = 1, 2, …, 30), which summarizes the trend in this Gini; four

dummy variables to represent other Gini data sources and thereby control for the difference

between the Gini estimates from each series and the corresponding estimate from the Internal-

Unadjusted series (pc = 1 if the series is Public-CTC and 0 otherwise; ic = 1 if the series is

Internal-CTC and 0 otherwise; pm = 1 if the series is Public-MI and 0 otherwise; im = 1 if the

series is Internal-MI and 0 otherwise); interactions between each of the series dummy variables

(pc, ic, pm, and im) and time (t), which controls for the difference between the trends across

series; a dummy variable for post-1992 years (u) which controls for a change in the levels after

1992 using the internal data; interactions between each of the Gini source variables (pc, ic, pm,

and im) separately interacted with u, which controls for divergence in levels after 1992; an

interaction between t and u, which controls for changes in the trend after 1992 using internal

15

data; and the Gini source variables (pc, ic, pm, and im) separately interacted with t and u, which

controls for divergence between trends after 1992.

First we discuss our results with respect to Gini estimates from the Internal-Unadjusted

series. The significant positive coefficient for time trend (t) shown in Table 2, column 1, shows

that inequality rose between 1975 and 1992. After 1992, although inequality continued to rise,

the rate of increase slowed significantly compared to the pre-1993 period (note the coefficients

on t and especially t*u). There was also a significant jump in inequality in the post-1992 years

compared to pre-1993 years: observe the statistically significant positive coefficient for u.

However, much of this increase in inequality results from changes in survey methods and

internal censoring points, as we demonstrate by comparisons with the other Gini series.

The coefficients for pc and for interactions with it allow us to compare Gini estimates

from the Public-CTC series to the Internal-Unadjusted series. Because consistent topcoding in

the public use data imposes topcodes below those found in unadjusted public use data, it is

unsurprising that the level of inequality is significantly lower using the Public-CTC series than

using the Internal-Unadjusted one. This can be seen for years prior to 1993 from the statistically

significant negative coefficient for pc and, for years after 1992, from the statistically significant

negative coefficient for interaction pc*u. Additionally, in 1993, when censoring points increased

dramatically, we see that while inequality increased according to the Public-CTC series (look at

the coefficients on u + pc*u), the magnitude of the increase was substantially smaller than that in

the Internal-Unadjusted series. This is also not surprising since consistent top coding dampens

Gini levels in every year, and so is more likely to mitigate changes.

Despite the statistically significant difference in levels of inequality and the artificial

jump in inequality in 1992, in years other than 1992–1993 the trends in inequality shown by the

16

Internal-Unadjusted series are not significantly different from those shown by the Public-CTC

series: the coefficient on pc*t is not statistically significant. An F-test of the null hypothesis that

pc*t + pc*t*u= 0 is not rejected, indicating that after 1992, the inequality trends in the two series

are not significantly different.

We showed previously that the Gini estimates derived from Internal-Unadjusted data can

be replicated using the Public-CM series. So the comparison just undertaken could be repeated

with a comparison between inequality levels derived from the Public-CTC and Public-CM series.

But the Public-CTC series is clearly dominated by the Internal-CTC series because internal data

allow us to observe the reported incomes of a greater share of the population, while still

controlling for problems of time inconsistency in topcoding.

Therefore, to check whether inequality trends of inequality according to Internal-

Unadjusted data differ significantly from those found for lower parts of the income distribution

that are consistently censored, we also examined the differences between the inequality series

based on the Internal-CTC and Internal-Unadjusted series. While the Internal-CTC data imposes

topcodes for observations at values below the incomes for the same observation in the Internal-

Unadjusted data, the difference in levels of inequality is no longer significant for years prior to

1993: observe the negative but statistically insignificant coefficient for ic. But the difference for

the years after 1993 is significantly lower as can be seen by the statistically significant negative

coefficient for ic + ic*u. However, once again, the difference in inequality trends between the

Internal-CTC and Internal-Unadjusted series are not significantly different either before 1993

(observe the coefficient on ic*t) or after 1992 (ic*t + ic*t*u).

We now examine the levels and time trends shown by the two Gini series using our

multiple imputation method. Because the multiple imputation method imputes income values for

17

topcoded observations, both Public-MI and Internal-MI lead to Gini estimates that are larger than

their Internal-Unadjusted counterparts, although the differences are only significant at the 5

percent level for the Public-MI series (note the coefficients for pm and im). The Public-MI series

also jumps in 1993 (u + pm*u), but the increase is much less than the jump shown by the

Internal-Unadjusted series (shown by a statistically significant negative coefficient for pm*u). In

contrast, as shown by a small negative and insignificant coefficient for im*u, the Internal-MI

series jumps in 1993 almost as much as the Internal-Unadjusted series. This again shows that

changes in survey methods are important factors underlying the measured change in inequality

levels after 1993. Our results are consistent with Census Bureau warnings that pre-1993 and

post-1992 inequality measures based on data from CPS internal data are not directly comparable

(Jones and Weinberg 2000).

Nevertheless, both Internal-MI and Public-MI series show the same linear time trend as

the Internal-Unadjusted series, in both the pre-1993 and post-1992 periods. This is shown by the

non-significance of the four time trend coefficients (pm*t, pm*t*u, im*t, and im*t*u). Thus,

subject to the break in 1993, all five Gini series demonstrate the same time trends for the whole

period 1975–2004. Also, the post-1992 rate of increase is much lower than the pre-1993 period.

Do results differ if different measures of inequality are used?

The Gini coefficient is just one of many commonly-used indices of inequality. It

incorporates particular assumptions about how income differences are aggregated at different

parts of the income distribution: it is relatively sensitive to income differences in the middle of

the distribution (‘middle sensitive’). This raises the question of whether alternative inequality

measures, ones incorporating different aggregation assumptions, might lead to different

conclusions about inequality trends, depending, for example, on whether changes in dispersion

18

occur mainly at the top or the bottom of the distribution. We explore the sensitivity of our results

to the choice of inequality measure by repeating our analysis using three indices from the

Generalized Entropy parametric family. (See e.g. Cowell, 2000, or Jenkins and Van Kerm,

2009.) We employ the three most commonly used General Entropy (GE) measures: I(0), the

mean logarithmic deviation (MLD), which is a relatively bottom-sensitive index; I(1), the Theil

index, which is relatively middle-sensitive; and I(2), half the squared coefficient of variation,

which is an index that is top-sensitive.

We calculated each of these GE indices using each of the six data series for which we

derived Gini coefficients, and graph the estimates in Figures 4, 5 and 6 for I(0), I(1) and I(2)

respectively. In all cases, measured inequality in a given year is greater in an internal data series

than in its public use data counterpart.

Once again, the total period can be divided into three sub-periods, 1975–1992, 1992–

1993, and 1993–2004. For each sub-period, we calculated the average annual percentage change

in each GE index and the Gini according to the different series: see Table 3. Once again, the

estimates based on Public-Unadjusted data are not shown because of the cell mean problem.

Table 3 shows that the change in measured income inequality between 1992 and 1993 is much

greater than the average change for each of the years before or after, regardless of the inequality

index or data series used. This further suggests the changes between 1992 and 1993 represent

changes in survey practice rather than a genuine change in the distribution of income.

Table 3 shows that average yearly increase in measured inequality is greater in the years

prior to 1993 than in the years after 1993 for all measures across all data sets, but the difference

is much less for the I(0) values than for any of the other measures. Because the I(0) values put

more weight on income differences in the lower ranges of the income distribution, this suggests

19

that changes in the upper ranges of the income distribution might be driving the slowdown of the

increase in inequality in the period 1993–2004.

As we did for our Gini estimates, we now use regression analysis to examine more

precisely the differences in the levels and trends in inequality according to the different data

series, where inequality is now measured by the three GE indices: see columns 2 through 4 of

Table 2. The explanatory variables in the regressions are defined in an analogous fashion to those

employed when comparing Gini levels and trends.

The results for I(0) clearly differ from those based on the Gini measure: see Table 2,

column 2. While the basic finding of a rise in inequality is still present for the Internal-

Unadjusted series – the coefficient on t is positive and statistically significant – there is no

change in the index’s trend after 1992, as the coefficient on t*u is not statistically significantly

different from zero. All other series demonstrate the same time trend: the F-test that t + pc*t, t +

ic*t, t + pm*t, and t + im*t each equal zero is soundly rejected at the 1 percent level in each case.

Similarly, none of the other series show a slowdown in the rising trend for the period 1993–2004.

Even at the 10 percent level, we cannot reject the null hypothesis that the coefficients on

t*u+pc*t*u, t*u+ic*t*u, t*u+pm*t*u, and t*u+im*t*u equal zero.

With the exception that the level of inequality described by the Public-CTC series is

significantly lower than the corresponding values from Internal-Unadjusted series – the

coefficient on pc is negative and significant – all other variables in this regression are statistically

insignificant. In addition, the trends found in all five series are not significantly different over the

entire period. Since I(0) is relatively bottom-sensitive, and the differences between these

different series arise at the top of the distribution, it is not surprising that the choice of topcoding

method makes little difference for this index.

20

In contrast, the results for the I(1) values are very close to those found for the Gini

coefficient: see Table 2, column 3. According to the Internal-Unadjusted series, the basic time

trend (coefficient on t) is positive and statistically significant, as is the coefficient on u, while the

coefficient on their interaction, t*u, is negative and statistically significant. Thus, as for the Gini

index, inequality after 1992 according to I(1) is greater, but the annual increase in inequality

since 1993 is significantly lower than in the years prior to 1993.

When the Public-CTC and Internal-CTC series are used rather than the Internal-

Unadjusted one, the pre-1993 levels in I(1) are both significantly lower, as shown by the

coefficients on pc and ic. For the post-1992 period, both series show significantly smaller jumps

in levels: the coefficients on pc*u and ic*u are negative and statistically significant. But the post-

1992 inequality levels are still significantly higher than those for the pre-1993 period, for both

series. The null hypotheses that u+pc*u and u+ic*u are each equal to zero are both soundly

rejected. With respect to trends, the coefficient on pc*t is negative and statistically significant,

whereas the coefficient on ic*t is negative but not statistically significant.

The Public-MI and Internal-MI series both show higher levels of inequality than the

Internal-Unadjusted series for corresponding years. The Public-MI series show a smaller jump in

post-1993 levels, as shown by the negative and statistically significant coefficient on pm*u. Both

series, though, show higher levels of inequality in the post-1993 period than in the pre-1993

period. Both series show a clearly rising trend for the whole period, but the rates of increase in

the post-1993 period are significantly smaller, as shown by F-tests on t*u+pm*t*u and

t*u+im*t*u.

Finally, we turn to examine the levels and trends in I(2): see Table 2, column 4.

According to the Internal-Unadjusted series, the basic time trend is positive and significant, as is

21

the coefficient on t, while t*u is negative and significant. Therefore, as with the Gini and the I(1)

indices, inequality after 1992 is greater but the annual increase in inequality since 1993 is

significantly lower than the increase over the years prior to 1993.

When the Public-CTC and Internal-CTC series are compared to the Internal-Unadjusted

one, the pre-1993 levels of I(2) are significantly lower for both series, as shown by the

coefficients on pc and ic. For the post-1993 period, both series also show significantly smaller

degrees of increase in levels because the coefficients of pc*u and ic*u are negative and

statistically significant. But for both series, the post-1993 inequality levels are still significantly

higher than the pre-1993 period. The null hypotheses that u+pc*u and u+ic*u each equal zero are

both rejected. Both the CTC series show smaller upward trend for the pre-1993 period than does

the Internal-Unadjusted one: the coefficients on pc*t and ic*t are negative and statistically

significant. Although the Public-CTC series shows a lower trend after 1993 compared to before

1992, we cannot reject the null that the time trends before and after 1993 are the same in the

Internal-CTC series.

According to both the Public-MI and Internal-MI series, I(2) values are larger than the

corresponding value in the Internal-Unadjusted series. In contrast to the time trend results for the

Gini and I(1), the rates of increase in the post-1993 period are the same as the pre-1993 period.

Using F-tests, we find that the coefficients on t*u+pm*t*u and t*u+im*t*u are not significantly

different from zero. These results, as well as the contrast with those for the Gini and I(1), may

arise because I(2) estimates tend to have larger sampling variability relative to these other

indices, other things being equal.5 This can also be seen from large year-on-year changes in I(2)

values, and the large differences between their Public-MI and Internal-MI estimates, especially

22

in the post-1993 period (see Figure 6). Thus, more caution is warranted when interpreting I(2)

values.

Trends in Top Income Shares

Our results may surprise some readers – occasional users of the CPS may not be aware of

the problems that inconsistent top coding can cause. For example, they may not be aware of

comparability issues introduced by the Census Bureau’s use of cell mean imputations beginning

in 1995, and the changes in the CPS (public use and internal data) before and after 1993.6 Other

readers may be surprised by our findings because they appear to contradict the seminal work of

Piketty and Saez (2003), who report substantial growth in income inequality as measured by the

share of “adjusted gross income” (the amount of income that is taxable under federal income tax

laws in place each year) held by tax units (unadjusted for household size) at the top of the

distribution.

Up to now, we have focused on how best to use CPS data to make statements about levels

and trends in income inequality as it has conventionally been measured in the income

distribution literature. (See Gottschalk and Smeeding, 1997, and Atkinson and Brandolini, 2001,

for reviews.) In this section, we compare our CPS-based estimates of levels and trends in the

income distribution with those of Piketty and Saez (2003).7 In doing so, it is critical to point out

that there are fundamental differences between the measurement conventions used when

analyzing size-adjusted pre-tax post-transfer household income of individuals using cross-

sectional surveys like the CPS and using IRS data on the adjusted gross income of tax units as

done by Piketty and Saez (2003). Piketty and Saez (2003) were primarily interested in measuring

very long run trends (since 1913) in US income inequality and, because of data limitations, focus

23

only on a small part of that distribution – the share of adjusted gross income held by the top 10

percent of tax filing units and how that share is distributed within this part of the distribution.

In Figure 7 we report our estimates of the share of total income (as discussed in the first

part of this paper) held by the richest 10 percent of the population. We do so by cumulatively

reporting the share held by the 90th-95th percentile group and then by this group plus the next

richest 4 percent (95th-99th percentile group) and finally by adding on the share held by the top 1

percentile group for each year 1975–2004, based on the Internal-MI series. We also report the

corresponding cumulative share estimates from Piketty and Saez (2003) for their tax unit

samples and use of adjusted gross income (excluding capital gains).

Our Internal-MI top share estimates are in general somewhat lower than the

corresponding ones reported by Piketty and Saez (2003). This is not surprising since two quite

different concepts of income are being used. For instance, adjusted gross income does not

include government transfers since this income is not taxed. Because a much greater share of

non-taxable government in-cash transfers – AFDC/TANF, Social Security benefits, etc. – are

held by those in the poorest 90 percent of the pre-tax post transfer distribution, and this income is

included in our income measure, we would expect those in the top 10 percent of the pre-tax post-

transfer income distribution to have a smaller share of total income than of adjusted gross

income in all years, as is the case in Figure 7.

Nevertheless, for both the income share held by the 90th–95th percentile groups and the

95th–99th percentile groups, the CPS-based and tax data-based series resemble each other closely,

in terms of both levels and trends. This is remarkable given the very different underlying

concepts of income and approaches taken. Both the Piketty and Saez (2003) IRS series and our

Internal-MI CPS series suggest that the share of income held by the 90th–95th percentile groups

24

remained relatively stable during the whole period 1975–2004, accounting for around 10–12

percent of total income. In terms of the share of income held by the 95th–99th percentile groups,

the Piketty and Saez (2003) IRS-based estimate grows slowly from 13 percent in 1975 to 15.1

percent in 2004. Over the same period, the growth in our corresponding Internal-MI CPS

estimate is remarkably similar: an increase from 10.5 percent to 12.5 percent.

In Figure 8, we disaggregate the share of income held by the richest 10 percent of the

income distribution further by reporting the share of income held by percentile groups within the

95th–99th percentile group according to our Internal-MI CPS series. Although income shares

increased slightly at the higher percentiles, through to the 99th percentile group, these increases

were relatively modest. Even for the 98th–99th percentile group, the growth in income shares

from 1975–2004 was less than 1 percent.

In contrast to the similar trends found in the share of income held by those in the 90th–

99th percentile group using either the Piketty and Saez (2003) series or our Internal-MI series,

there are substantial differences between the two series with respect to the level and trends in the

share of income held by the richest 1 percent. While the share of income help by the richest 1

percent grew substantially over the 1975–2004 period according to both series, growth in the

Piketty and Saez (2003) series is much greater, from 8.0 percent to 16.1 percent compared to an

increase from 5.4 percent to 9.8 percent in the Internal-MI CPS series. But this difference is even

greater when it is observed that one third of the increase in the income share of the top 1 percent

in the Internal-MI series from 1975–2004 occurred in 1993, and which is primarily attributable

to the CPS redesign. Hence a significant minority of this increase in the share of income held by

the top 1 percent in the Internal-MI series is likely to be due to better measurement of their

income by the CPS rather than by an actual increase in the share of income they held.

25

This same problem of changes in measurement versus changes in real income held by the

richest 1 percent of the distribution is also likely to explain at least some part of the rise in

income shares at the top of the income distribution in the Piketty and Saez (2003) series. As can

be seen in Figure 7, there is a dramatic 4 percentage point jump in the share of gross taxable

income held by the highest 1 percent of tax units reported by Piketty and Saez (2003) between

the years 1986 and 1988. This finding must, to some degree, be the result of changes in the way

the very richest tax units chose to report their income as a result of the Tax Reform Act of 1986

rather than the result of genuine changes in income inequality. This legislation provided

substantial incentives for the very richest tax units to switch income from Subchapter-S

corporations to wage income. (See Slemrod, 1996, and Reynolds, 2006 for a fuller discussion of

this type of tax elasticity issue in the Piketty and Saez, 2003 data and Feenberg and Poterba,

1993 for a more general discussion of the problems of measuring income inequality based on

income tax records.) Subtracting the 1986–1988 jump in the Piketty and Saez series would cut in

half the increase in the share of income held by the richest 1 percent over the whole period. Their

data for other years may also be subject to this same type of tax elasticity behavior, albeit to a

lesser extent. For example, for a group of high income corporate executives, Goolsbee (2000)

showed that almost all of the estimated responsiveness of taxable wage and salary income to

marginal tax rates from 1991 to 1995 was the result of shifts in the timing of compensation rather

than permanent shifts in the form of compensation.

Keeping this in mind, it is difficult to determine what is going on at the very top of the

income distribution using either the CPS data or the IRS data. Piketty and Saez (2003) find

greater increases in inequality after 1993 than we do primarily because of greater growth in the

share of income held by the richest 1 percent in their IRS data. It is uncertain to what degree this

26

difference is the result of our decreasing ability to capture income at the very highest income

levels, even using internal CPS data, or of behavioral changes in the way that individual tax units

report their adjusted gross income on their tax returns.

The growth in income in the years just prior to the 2000 stock market crash was greater

among the highest income groups, and the estimates of the income share of the richest 1 percent

from both series capture this increase as well as the decline in their share in the recession that

followed the stock market crash. The CPS may have been less able to capture this income going

to the top of the income distribution. But it also may be the case, as Reynolds (2006) argues, that

a greater increase in the use of tax-deferred savings accounts (401k plans, Keogh plans and IRA

tax shelters) by those in richer percentile groups but not in the very richest 1 percent of the

adjusted gross income distribution of tax units may also explain part of the rise in the top income

share reported by Piketty and Saez (2003).

Additional work is necessary to determine what precisely is happening at the very highest

income levels. But one more piece of evidence with respect to the poorest 99 percent of the

distribution – which can be measured with confidence in the CPS data – is that our trends for

income inequality are consistent with the trends in wage earnings over this period found by

Autor, Katz and Kearney (2008) using the public use CPS data.

Conclusions

We analyze trends in income inequality in the USA over three decades (1975–2004),

drawing on internal CPS data that are not generally available to researchers outside the Census

Bureau. Although the CPS is the most authoritative survey data source for studying US income

inequality, topcoding of each of its many sources of income artificially lowers the level of

27

inequality and potentially affects estimates of its trends over time, whether one uses public or

internal CPS data. We consider several methods for addressing topcoding, and as a byproduct of

this work, derive a revised cell mean imputation series that can be applied to public use data.

This series allows researchers to more accurately capture levels and trends in the internal data. In

addition, we take account of topcoding in the internal data using a multiple imputation approach

that uses parametric methods to derive imputations.

Using the Gini coefficient to measure inequality, we find the level of income inequality

based on our Internal-MI series is slightly higher in corresponding years inequality based on the

Internal-Unadjusted series before 1993 and significantly higher after 1993. But despite this

difference in levels, the trends in inequality we find using the two series are not significantly

different. Inequality rose over the whole period 1975–2004, but the rate of increase slowed after

1993. We find this same result using our Internal-CTC and Public-CTC estimates.

The story about trends differs when we use the three GE inequality indices, but the

differences are readily explicable in terms of the properties of the indices, specifically their

differences in sensitivity to income differences in different ranges of the income distribution. I(0)

rose steadily during 1975–2004, showing no sign of slowing down after 1993. But our I(1) and

I(2) series are similar to our Gini series in their trends, subject to the caveat that I(2) indices are

less precisely estimated, showing large year-on-year variation. Because I(0) is more bottom-

sensitive than the other inequality measures, our results suggest that what is happening in the

upper part of the income distribution is responsible for the 1990s slowdown in rate of inequality

increase.

Our results for household income are largely consistent with other research that has

examined levels and trends in wage inequality using public use CPS data (see e.g. Autor, Katz

28

and Kearney, 2008) but differ from the results found by Piketty and Saez (2003) after 1986

almost solely due to different trends in the share of income held by the richest 1 percent. With

IRS data about the distribution of adjusted gross income, not only do Piketty and Saez find

higher levels of inequality over all years than we do, but also a major increase in inequality

between 1986 and 1988 and a greater increase in income inequality trends in the post-1993

period. Although the rapid increase in income inequality in their tax data is caused in part by

changes in the way income is reported rather than a real change in underlying income inequality,

it is less certain why they find such larger increased in income inequality after 1993.

Our results suggest that, for at least the poorest 99 percent of the income distribution, the

increase in inequality since 1993 has been significantly slower in the USA than in the previous

two decades. Based on our Internal-MI series we find the level of income inequality rises when

we include an estimate that includes the very top part of the income distribution censored in the

unadjusted internal CPS data. But even in these estimates, the rise in income inequality slowed

after 1993. Our findings are consistent with those found by Piketty and Saez (2003) for the 90th–

99th percentile groups. It is only with respect to the richest 1 percent that we differ. And it is here

that we are at the limits of current knowledge, both with respect to the CPS data because of its

difficulty in obtaining information on the highest income households, and with respect to the IRS

data because of behavioral effects caused by changes in the tax laws. It is difficult to fully

understand how much of the yearly changes in inequality are the result of real changes in the

incomes of the very richest income tax units and how much is due simply to changes in the way

they report that income.

29

References

Atkinson, Anthony B., and Andrea Brandolini. 2001. “Promises and Pitfalls in the Use of Secondary Data Sets: Income Inequality in OECD Countries as a Case Study.” Journal of Economic Literature, 39 (3): 771–799.

Atkinson, Anthony B., Lee Rainwater, and Timothy Smeeding. 1995. Income Distribution in

OECD Countries. Evidence from the Luxembourg Income Study. Social Policy Studies No. 18. Paris: Organization for Economic Cooperation and Development.

Autor, David, Lawrence Katz, and Melissa Kearney. 2008. “Trends in U.S. Wage Inequality:

Revising the Revisionists.” Review of Economics and Statistics 90(2): 300–323. Bandourian, Ripsy, James B. McDonald, and Robert S. Turley. 2002. “A Comparison of

Parametric Models of Income Distribution across Countries and Over Time.” Working Paper No. 305. Luxembourg: Luxembourg Income Study. http://www.lisproject.org/publications/LISwps/305.pdf

Bishop, John A., Jong-Rong Chiou, and John P. Formby. 1994.“Truncation Bias and the Ordinal

Evaluation of Income Inequality.” Journal of Business and Economic Statistics, 12: 123–127.

Bordley, Robert F., James B. McDonald, and Anand Mantrala. 1996. “Something New,

Something Old: Parametric Models for the Size Distribution of Income.” Journal of Income Distribution 6: 91–103.

Brachmann, Klaus, Andreas Stich, and Mark Trede. 1996. “Evaluating Parametric Income

Distribution Models.” Allgemeines Statistisches Archiv 80: 285–298. Burkhauser, Richard V., J.S. Butler, Shuaizhang Feng, and Andrew Houtenville. 2004. “Long-

Term Trends in Earnings Inequality: What the CPS Can Tell Us.” Economics Letters, 82: 295–299.

Burkhauser, Richard V., Kenneth A. Couch, Andrew J. Houtenville, and Ludmila Rovba. 2003-

2004“Income Inequality in the 1990s: Re-Forging a Lost Relationship?” Journal of Income Distribution, 12 (3–4): 8–35.

Burkhauser, Richard V., Shuaizhang Feng, and Stephen P. Jenkins. 2007. “Using the P90/P10

Ratio to Measure US Inequality Trends with Current Population Survey Data: A View from Inside the Census Bureau Vaults.” ISER Working Paper 2007-14. Colchester, UK: Institute for Social and Economic Research, University of Essex. http://www.iser.essex.ac.uk/pubs/workpaps/pdf/2007-14.pdf

Burkhauser, Richard V. and Jeff Larrimore. 2008. “Trends in the Relative Economic Well Being

of Working-Age Men with Disabilities: Correcting the Record using Internal Current Population Survey Data.” Center for Economic Studies Working Paper Series CES-WP-

30

http://www.lisproject.org/publications/LISwps/305.pdf

http://www.iser.essex.ac.uk/pubs/workpaps/pdf/2007-14.pdf

08-05, March 2008. http://www.ces.census.gov/index.php/ces/cespapers?down_key=101811

Burkhauser, Richard V., Takashi Oshio, and Ludmila Rovba. 2008. “How the Distribution of

After-Tax Income Changed Over the 1990s Business Cycle: A Comparison of the United States, Great Britain, Germany, and Japan.” Journal of Income Distribution, 17 (1): 87–109.

Cowell, Frank A. 2000. “Measurement of Inequality.” In: Anthony B. Atkinson and François

Bourguignon (eds), Handbook of Income Distribution, 87–166. Elsevier North Holland: Amsterdam.

Cowell, Frank A., and Maria-Pia Victoria-Feser. 1996. “Robustness Properties of Inequality

measures.” Econometrica, 64: 77–101. Daly, Mary C. and Robert G. Valletta. 2006. “Inequality and Poverty in United States: The

Effects of Rising Dispersion of Men’s Earnings and Changing Family Behaviour.” Economica, 73 (289): 75–98.

Danziger, Sheldon and Peter Gottschalk. (Eds.) 1993. Uneven Tides: Rising Inequality in

America. New York: Russell Sage Foundation. Feenberg, Daniel and James M. Poterba. 1993. “Income Inequality and the Incomes of Very

High-Income Taxpayers: Evidence from Tax Returns,” in Tax Policy and the Economy, ed. James M. Poterba, 7, Cambridge, MA: NBER/MIT Press, pp.

Feng, Shuaizhang, Richard V. Burkhauser, and J.S. Butler. 2006. “Levels and Long-Term Trends

in Earnings Inequality: Overcoming Current Population Survey Censoring Problems Using the GB2 Distribution.” Journal of Business and Economic Statistics 24 (1): 57–62.

Fichtenbaum, R. and Shahidi, H. (1988) “Truncation Bias and the Measurement of Income

Inequality.” Journal of Business and Economic Statistics, 6: 335–337. Goolsbee, Austan. 2000. “What Happens When You Tax the Rich? Evidence from Executive

Compensation.” Journal of Political Economy, 108 (2): 352–378. Gottschalk, Peter, and Timothy M. Smeeding. 1997. “Cross-National Comparisons of Earnings

and Income Inequality.” Journal of Economic Literature, 35 (2): 633–687. Gottschalk, Peter, and Sheldon Danziger. 2005. “Inequality of Wage Rates, Earnings and Family

Income in the United States, 1975–2002.” Review of Income and Wealth 51 (2): 231–254. Jenkins, Stephen P., and Philippe Van Kerm. 2009 forthcoming. “The Measurement of Economic

Inequality.” In: W. Salverda, B. Nolan and Timothy M. Smeeding (eds), The Oxford Handbook on Economic Inequality. Oxford: Oxford University Press.

31

http://www.ces.census.gov/index.php/ces/cespapers?down_key=101811

Jenkins, Stephen P., Richard V. Burkhauser, Shuaizhang Feng, and Jeff Larrimore. 2008. “Measuring Inequality with Censored Data: A Multiple Imputation Approach.” ISER Working Paper. Colchester, UK: Institute for Social and Economic Research, University of Essex. (in preparation).

Jones, A.F., Jr., and Weinberg, D.H. 2000. “The Changing Shape of the Nation’s Income

Distribution.” Current Population Reports, U.S. Census Bureau, June 2000. http://www.census.gov/prod/2000pubs/p60-204.pdf

Karoly, Lynn A. and Gary Burtless. 1995. “Demographic Changes, Rising Earnings Inequality,

and the Distribution of Personal Well-Being, 1959–1989.” Demography, 32 (3): 379–405.

Kruger, Dirk, and Fabrizio Perri. 2006. “Does Income Inequality Lead to Consumption

Inequality?” Review of Economic Studies, 73 (1): 163–193. Larrimore, Jeff, Richard V. Burkhauser, Shuaizhang Feng and Laura Zayatz. 2009 forthcoming.

“Consistent Cell Means for Topcoded Incomes in the Public Use March CPS (1976–2007).” Journal of Economic and Social Measurement.

Levy, Frank, and Richard J. Murnane. 1992. “U.S. Earnings Levels and Earnings Inequality: A Review of Recent Trends and Proposed Explanations.” Journal of Economic Literature, 30 (3): 1333–1381.

Piketty, Thomas, and Emmanuel Saez. 2003. “Income Inequality in the United States, 1913–

1998.” Quarterly Journal of Economics, 118 (1): 1–39. Reiter, J.P. 2003. “Inference for Partially Synthetic, Public Use Microdata Sets.” Survey

Methodology, 29: 181–188. Reynolds, Alan. 2006. Income and Wealth. Westport, Connecticut: Greenwood Press. Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley. Ryscavage P. 1995. “A surge in growing income inequality?” Monthly Labor Review, August,

51–61. http://www.bls.gov/opub/mlr/1995/08/art5full.pdf Slemrod, Joel. 1996. “High-income Families and the Tax Changes of the 1980s: The Anatomy of

Behavioral Response”, In Empirical Foundations of Household Taxation, edited by Martin Feldstein and James M. Poterba. Chicago: University of Chicago Press (for NBER).

U.S. Census Bureau. Various years. Income, Poverty, and Health Insurance Coverage in the United States: various years. (August), Current Population Reports, Consumer Income, Washington DC: GPO.

32

http://www.census.gov/prod/2000pubs/p60-204.pdf

http://www.bls.gov/opub/mlr/1995/08/art5full.pdf

U.S. Census Bureau. 2007. Current Population Survey Annual Social and Economic Supplement Technical Documentation. Washington DC: GPO.

Welch, Finis. 1999. “In Defense of Inequality” American Economic Review 89: 1–17. Welniak, Edward J. 2003. “Measuring Household Income Inequality Using the CPS.” in James

Dalton and Beth Kilss (Eds.), Special Studies in Federal Tax Statistics 2003, Statistics of Income Directorate, Inland Revenue Service, Washington DC.

33

Figure 1: Percentage of Individuals with Censored Household Income in March CPS, by year

Internal data were not available for years after 2005. Source: Authors’ calculations from internal and public use data files of March CPS.

34

Figure 2: Gini Coefficient Estimates for Four Topcode Adjustment Methods, 1975–2006

Internal data were not available for years after 2005. Source: Authors’ calculations from internal and public use data files of March CPS.

35

Figure 3: Gini Coefficient Estimates for Six Topcode Adjustment Methods, 1975-2004

Source: Authors’ calculations from internal and public use data files of March CPS.

36

Figure 4: I(0) Inequality Index Estimates for Six Topcode Adjustment Methods, 1975-2004


37



38



39

Figure 7: Estimates of Top Income Shares: Piketty & Saez and Internal-MI Series, 1975–2004

Source: http://elsa.berkeley.edu/~saez/ and authors’ calculations from internal data files of March CPS.

40

http://elsa.berkeley.edu/%7Esaez/

Figure 8: Estimates of Top Income Shares: Internal-MI Series, 1975–2004

Source: Author’s calculations using internal March CPS data

41

Table 1: Income Distribution Series Derived from the March CPS, by Source and Method for Addressing Censoring

Acronym Source Method for Addressing Censoring Issues Internal-CTC Internal Consistently top-coded Internal-Unadjusted Internal Uses internal data as provided in Census Bureau files, without any

adjustments Internal-MI Internal Topcoded observations replaced by imputations derived from GB2

imputation model fitted to internal data; inequality estimates derived using multiple imputation combination methods.

Public-CTC Public Use Consistently top-coded Public-Unadjusted Public Use Uses public use data as provided in Census Bureau files; includes

Census Bureau cell mean imputations for topcoded observations from 1995 onwards

Public-CM Public Use Uses public use data as provided in Census Bureau files; includes cell mean imputations for topcoded observations for all years

Public-NoCM Public Use Uses public use data as provided in Census Bureau files, except that no cell mean imputations used for any year (topcoded values used ‘as is’).

Public-MI Public Use Topcoded observations replaced by imputations derived from GB2 imputation model fitted to public use data; inequality estimates derived using multiple imputation combination methods.

See main text for further details of the methods used to address topcoding issues.

42

Table 2: Differences in Inequality Levels and Trends Based on Regression Analyses Gini I(0) I(1) I(2) constant 0.3528** 0.2850** 0.2121** 0.2489** (0.0018) (0.0048) (0.0029) (0.0082) t (time trend) 0.0030** 0.0044** 0.0042** 0.0066** (0.0002) (0.0004) (0.0003) (0.0008) u (=1 if post-1992) 0.0592** 0.0190 0.1117** 0.3849** (0.0047) (0.0261) (0.0099) (0.0454) t*u -0.0023** 0.0002 -0.0032** -0.0071** (0.0002) (0.0011) (0.0005) (0.0021) pc (=1 if public CTC) -0.0158** -0.0173* -0.0250** -0.0535** (0.0025) (0.0073) (0.0035) (0.0085) pc*t -0.0003 -0.0005 -0.0009* -0.0026** (0.0002) (0.0006) (0.0003) (0.0008) pc*u -0.0282** -0.0410 -0.0801** -0.3586** (0.0067) (0.0365) (0.0119) (0.0464) pc*t*u 0.0006 0.0009 0.0015* 0.0059** (0.0003) (0.0016) (0.0006) (0.0022) ic (=1 if internal CTC) -0.0041 -0.0048 -0.0080* -0.0204* (0.0026) (0.0074) (0.0037) (0.0088) ic*t -0.0002 -0.0003 -0.0005 -0.0017* (0.0002) (0.0006) (0.0003) (0.0008) ic*u -0.0176* -0.0274 -0.0632** -0.3338** (0.0067) (0.0376) (0.0126) (0.0497) ic*t*u 0.0004 0.0006 0.0012* 0.0061** (0.0003) (0.0016) (0.0006) (0.0023) pm (=1 if public MI) 0.0046* 0.0059 0.0115** 0.0382** (0.0023) (0.0069) (0.0038) (0.0143) pm*t 0.0001 0.0002 0.0006 0.0043** (0.0002) (0.0006) (0.0004) (0.0015) pm*u -0.0188* -0.0265 -0.0498* -0.2143 (0.0087) (0.0361) (0.0238) (0.1565) pm*t*u 0.0005 0.0007 0.0010 0.0055 (0.0004) (0.0015) (0.0010) (0.0065) im (=1 if internal MI) 0.0043 0.0057 0.0122** 0.0508** (0.0026) (0.0072) (0.0040) (0.0135) im*t 0.0001 0.0001 0.0003 0.0029 (0.0003) (0.0006) (0.0005) (0.0018) im*u -0.0026 -0.0032 0.0002 0.1110 (0.0073) (0.0359) (0.0256) (0.2586) im*t*u 0.0003 0.0005 0.0010 0.0114 (0.0004) (0.0015) (0.0012) (0.0115) The baseline data source is Internal-Unadjusted. Numbers reported in parentheses are standard errors. ** Significant at the 1% level. * Significant at the 5% level. Source: Authors’ calculations from internal and public use data files of March CPS.

43

Table 3: Average Annual Percentage Change in inequality by period, using alternative March CPS datasets and inequality indices Average annual percentage change 1975-1992 1992-1993 1993-2004 Gini Coefficients Unadjusted Internal 0.74 5.85 0.14 Consistent Topcoding Public 0.75 1.93 0.23 Consistent Topcoding Internal 0.74 3.24 0.21 MI-GB2 public 0.82 2.34 0.30 MI-GB2 Internal 0.77 6.63 0.20 I(0) Unadjusted Internal 1.48 12.48 1.20 Consistent Topcoding Public 1.49 5.86 1.45 Consistent Topcoding Internal 1.48 7.80 1.39 MI-GB2 public 1.63 6.41 1.47 MI-GB2 Internal 1.54 14.04 1.28 I(1) Unadjusted Internal 1.58 22.08 0.26 Consistent Topcoding Public 1.63 3.67 0.55 Consistent Topcoding Internal 1.61 7.71 0.59 MI-GB2 public 1.97 5.39 0.80 MI-GB2 Internal 1.72 29.32 0.43 I(2) Unadjusted Internal 1.76 71.82 0.03 Consistent Topcoding Public 1.88 4.09 0.88 Consistent Topcoding Internal 1.89 12.67 1.10 MI-GB2 public 3.31 11.22 1.69 MI-GB2 Internal 2.21 147.17 0.45


44

Appendix Table 1. Income Items Reported in the Current Population Survey

Name Name in

Public Files

Name in Internal

Files Definition 1975–1986

Labor Earnings Wages I51A WSAL_VAL Wages and Salaries Self Employment I51B SEMP_VAL Self employment income Farm I51C FRSE_VAL Farm income Other Sources

Social Security I52A I52A_VAL Income from Social Security and/or Railroad Retirement

Supplemental Security I52B SSI_VAL Supplemental Security Income Public Assistance I53A PAW_VAL Public Assistance Interest I53B INT_VAL Interest Dividends Rentals I53C I53C_VAL Dividends, Rentals, Trust Income Veterans I53D I53D_VAL Veteran's, unemployment, worker's compensation Retirement I53E I53E_VAL Pension Income Other I53F I53F_VAL Alimony, Child Support, Other income

1987–2006 Labor Earnings Primary earnings ERN_VAL ERN_VAL Primary Earnings Wages WS_VAL WS_VAL Wages and Salaries-Second Source Self Employment SE_VAL SE_VAL Self employment income -Second Source Farm FRM_VAL FRM_VAL Farm income -Second Source Other Sources Social Security SS_VAL SS_VAL Social Security Income Supplemental Security SSI_VAL SSI_VAL Supplemental Security Income Public Assistance PAW_VAL PAW_VAL Public Assistance & Welfare Income Interest INT_VAL INT_VAL Interest Dividends DIV_VAL DIV_VAL Dividends Rental RNT_VAL RNT_VAL Rental income Alimony ALM_VAL ALM_VAL Alimony income Child Support CSP_VAL CSP_VAL Child Support Income Unemployment UC_VAL UC_VAL Unemployment income Workers Comp WC_VAL WC_VAL Worker's compensation income Veterans VET_VAL VET_VAL Veteran's Benefits Retirement - Source 1 RET_VAL1 RET_VAL1 Retirement income - source 1 Retirement - Source 2 RET_VAL2 RET_VAL2 Retirement income - source 2 Survivors - Source 1 SUR_VAL1 SUR_VAL1 Survivor's income - source 1 Survivors - Source 2 SUR_VAL2 SUR_VAL2 Survivor's income - source 2 Disability - Source 1 DIS_VAL1 DIS_VAL1 Disability income - source 1 Disability - Source 2 DIS_VAL2 DIS_VAL2 Disability income - source 2 Education assistance ED_VAL ED_VAL Education assistance Financial assistance FIN_VAL FIN_VAL Financial Assistance Other OI_VAL OI_VAL Other income

Sources: Current Population Survey Annual Demographic File Technical Documentation, 1976-2002. Current Population Survey Annual Social and Economic Supplement Technical Documentation, 2003-2007

45

Appendix Table 2. Public Use CPS Censoring Points for each Income Source in Dollars (1975–1986)

Year Wages (I51A)

Self Employment

(I51B) Farm

(I51C)

SocialSecurity(I52A)

SupplementalSecurity (I52B)

Public Assistance

(I53A) Interest(I53B)

Dividends Rentals (I53C)

Veterans and

Workers Comp (I53D)

Retirement(I53E)

Other(I53F)

1975 50,000 50,000 50,000 9,999 5,999 19,999 50,000 50,000 29,999 50,000 50,000 1976 50,000 50,000 50,000 9,999 5,999 19,999 50,000 50,000 29,999 50,000 50,000 1977 50,000 50,000 50,000 9,999 5,999 19,999 50,000 50,000 29,999 50,000 50,000 1978 50,000 50,000 50,000 9,999 5,999 19,999 50,000 50,000 29,999 50,000 50,000 1979 50,000 50,000 50,000 9,999 5,999 19,999 50,000 50,000 29,999 50,000 50,000 1980 50,000 50,000 50,000 9,999 5,999 19,999 50,000 50,000 29,999 50,000 50,000 1981 75,000 75,000 75,000 19,999 5,999 19,999 75,000 75,000 29,999 75,000 75,000 1982 75,000 75,000 75,000 19,999 5,999 19,999 75,000 75,000 29,999 75,000 75,000 1983 75,000 75,000 75,000 19,999 5,999 19,999 75,000 75,000 29,999 75,000 75,000 1984 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 29,999 99,999 99,999 1985 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 29,999 99,999 99,999 1986 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 29,999 99,999 99,999

Source: Current Population Survey Annual Demographic File Technical Documentation Note: In the 1985 March CPS (income year 1984), six values for INCOMP exceeded $29,999 but were not topcoded. In the calculations we did for this paper we corrected this error and topcoded these values at $29,999.

46

Appendix Table 3. Public Use CPS Censoring Points for each Income Source in Dollars (1987–2006)

Year

Primary Earnings

(ERN_VAL) Wages

(WS_VAL)

Self Employment(SE_VAL)

Farm (FRM_VAL)

Social Security

(SS_VAL)

SupplementalSecurity

(SSI_VAL)

Public Assistance (PAW_VAL)

Interest (INT_VAL)

Dividends(DIV_VAL)

Rental (RNT_VAL)

Alimony (ALM_VAL)

Child Support

(CSP_VAL)

1987 99,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1988 99,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1989 99,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1990 99,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1991 99,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1992 99,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1993 99,999 99,999 99,999 99,999 49,999 9,999 24,999 99,999 99,999 99,999 99,999 99,999 1994 99,999 99,999 99,999 99,999 49,999 9,999 24,999 99,999 99,999 99,999 99,999 99,999 1995 150,000 25,000 40,000 25,000 49,999 25,000 24,999 99,999 99,999 99,999 99,999 99,999 1996 150,000 25,000 40,000 25,000 49,999 25,000 24,999 99,999 99,999 99,999 99,999 99,999 1997 150,000 25,000 40,000 25,000 49,999 25,000 24,999 99,999 99,999 99,999 99,999 99,999 1998 150,000 25,000 40,000 25,000 49,999 25,000 24,999 35,000 15,000 25,000 50,000 15,000 1999 150,000 25,000 40,000 25,000 49,999 25,000 24,999 35,000 15,000 25,000 40,000 15,000 2000 150,000 25,000 40,000 25,000 49,999 25,000 24,999 35,000 15,000 25,000 40,000 15,000 2001 150,000 25,000 40,000 25,000 49,999 25,000 24,999 35,000 15,000 25,000 40,000 15,000 2002 200,000 35,000 50,000 25,000 49,999 25,000 24,999 25,000 15,000 40,000 45,000 15,000 2003 200,000 35,000 50,000 25,000 49,999 25,000 24,999 25,000 15,000 40,000 45,000 15,000 2004 200,000 35,000 50,000 25,000 49,999 25,000 24,999 25,000 15,000 40,000 45,000 15,000 2005 200,000 35,000 50,000 25,000 49,999 25,000 24,999 25,000 15,000 40,000 45,000 15,000 2006 200,000 35,000 50,000 25,000 49,999 25,000 24,999 25,000 15,000 40,000 45,000 15,000

Source: Current Population Survey Annual Demographic File Technical Documentation (1988-2002), Current Population Survey Annual Social and Economic Supplement Technical Documentation (2003-2007)

47

Appendix Table 3. (Continued)

Year Unemployment

(UC_VAL)

Workers Comp

(WC_VAL) Veterans

(VET_VAL)

Retirement1st source

(RET_VAL1)

Retirement2nd Source

(RET_VAL2)

Survivors 1st Source

(SUR_VAL1)

Survivors 2nd Source

(SUR_VAL2)

Disability 1st Source(DIS_VAL1)

Disability 2nd Source(DIS_VAL2)

EducationAssistance(ED_VAL)

FinancialAssistance(FIN_VAL)

Other (OI_VAL)

1987 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1988 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1989 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1990 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1991 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1992 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1993 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1994 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1995 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1996 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1997 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1998 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 1999 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 2000 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 2001 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 2002 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 2003 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 2004 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 2005 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000 2006 99,999 99,999 99,999 45,000 45,000 50,000 50,000 35,000 35,000 20,000 30,000 25,000

Source: Current Population Survey Annual Demographic File Technical Documentation (1988-2002), Current Population Survey Annual Social and Economic Supplement Technical Documentation (2003-2007)

48

Appendix Table 4. Internal CPS Censoring Points for each Income Source in Dollars (1975–1986)

Year Wages (I51A)

Self Employment

(I51B) Farm

(I51C)

SocialSecurity(I52A)

SupplementalSecurity (I52B)

Public Assistance

(I53A) Interest(I53B)

Dividends Rentals (I53C)

Veterans and

Workers Comp (I53D)

Retirement(I53E)

Other(I53F)

1975 99,999 99,999 99,999 9,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1976 99,999 99,999 99,999 9,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1977 99,999 99,999 99,999 9,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1978 99,999 99,999 99,999 9,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1979 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1980 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1981 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1982 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1983 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1984 99,999 99,999 99,999 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1985 250,000 250,000 250,000 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1986 250,000 250,000 250,000 19,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999


49

Appendix Table 5. Internal CPS Censoring Points for each Income Source in Dollars (1987–2005)

Year

Primary Earnings

(ERN_VAL) Wages

(WS_VAL)

Self Employment(SE_VAL)

Farm (FRM_VAL)

Social Security

(SS_VAL)

SupplementalSecurity

(SSI_VAL)

Public Assistance (PAW_VAL)

Interest (INT_VAL)

Dividends(DIV_VAL)

Rental (RNT_VAL)

Alimony (ALM_VAL)

Child Support

(CSP_VAL)

1987 299,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1988 299,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1989 299,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1990 299,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1991 299,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1992 299,999 99,999 99,999 99,999 29,999 9,999 19,999 99,999 99,999 99,999 99,999 99,999 1993 999,999 999,999 999,999 999,999 49,999 25,000 24,999 99,999 99,999 99,999 99,999 99,999 1994 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 1995 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 1996 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 1997 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 1998 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 1999 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 2000 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 2001 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 2002 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 2003 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999 2004 1,099,999 1,099,999 999,999 999,999 50,000 25,000 25,000 99,999 99,999 99,999 99,999 99,999


50

Appendix Table 5. (Continued)

Year Unemployment

(UC_VAL)

Workers Comp

(WC_VAL) Veterans

(VET_VAL)

Retirement1st source

(RET_VAL1)

Retirement2nd Source

(RET_VAL2)

Survivors 1st Source

(SUR_VAL1)

Survivors 2nd Source

(SUR_VAL2)

Disability 1st Source(DIS_VAL1)

Disability 2nd Source(DIS_VAL2)

EducationAssistance(ED_VAL)

FinancialAssistance(FIN_VAL)

Other (OI_VAL)

1987 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1988 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1989 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1990 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1991 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1992 99,999 99,999 29,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1993 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1994 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1995 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1996 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1997 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1998 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 1999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 2000 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 2001 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 2002 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 2003 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 2004 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999 99,999


51

Appendix Table 6: Rate of topcoding when using consistent topcoding procedures Public Consistent Topcodes Internal Consistent Topcodes

Percent

Topcoded Binding

Year Percent

Topcoded Binding

Year Wage 1.31% 2001 0.36% 1984 Self Employment 3.93% 1978 1.38% 1984 Farm 2.42% 2001 0.28% 1983 Public Assistance 0.48% 1995 0.48% 1995 Supplemental Security 1.94% 1994 0.30% 1992 Social Security 0.40% 1997 0.40% 1997 Interest 1.15% 2004 0.21% 2004 Dividends 1.58% 2001 0.15% 1984 Veterans / Workers Comp 0.23% 1985 0.00% 1986 Retirement 0.17% 1979 0.10% 1984 Other Income 0.17% 1979 0.07% 1984

Source: Author’s calculations using public use and internal March CPS data

52

ENDNOTES 1 Each CPS survey measures income from the previous calendar year. In this paper, all

references are to the income year, so when we discuss the year 1975, this refers to income from

various sources that members of the household received in 1975 reported at the March 1976

Current Population Survey interview.

2 In 2006, we were granted permission to use the internal March CPS to test the sensitivity of

measured income inequality in the public use CPS. We were given access to internal March CPS

records from 1975–2004. These data include information on income above the public use

topcode thresholds for respondents in the March CPS survey up to the processing limit in the

March CPS data. The internal March CPS data the Census Bureau uses to produce the statistics

in its official Census publications are subject to these same processing limits. However these

processing limits are lower than the data collection limits for income in the March CPS, which

are the limits on the actual values collected in the March CPS. (See Welniak, 2003 for a more

complete discussion of the processing limits and data collection limits.) Although we had access

to the same income information the Census Bureau uses to produce its official Census

publications, since our research project was limited to how topcoding affects measurement of

inequality in the USA, the files do not contain the responses to all of the non-income related

questions in the March CPS dataset. These omissions occur because the U.S. Census Bureau

limits internal data availability to data within the scope of the project.

3 We follow the same procedure for generating size-adjusted household income as discussed in

Burkhauser and Larrimore, 2008. These procedures, particularly the use of the square root of

household members to adjust for household size are standard in the international comparative

income distribution literature: see for instance Atkinson, Rainwater and Smeeding (1995). Its use

53

is increasing in studies of US income distribution trends. See Burkhauser, Couch, Houtenville

and Rovba (2004), Gottschalk and Danziger (2005), and Burkhauser, Osaki, and Rovba (2008).

4 See Levy and Murnane (1992) for an early review of the income distribution literature and a

more formal statement of this problem.

5 I(2) is also relatively prone to non-robustness to the effects of outliers in the sense discussed by

Cowell and Victoria-Feser (1996). However, we would stress the sampling variability aspect,

since the ‘averaging’ process used to combine the estimates from our 100 multiply imputed data

sets are likely to smooth out the effects of any outliers being added by the imputation process.

6 For instance, the Luxembourg Income Study website (http://www.lisproject.org) reports Gini

estimates derived from Public-Unadjusted data across the 1980s, 1990s and 2000s without

documentation of the topcoding issues discussed here.

7 Piketty and Saez (2003) only constructed top income shares up to year 1998, but they

subsequently updated this series to year 2006, see http://elsa.berkeley.edu/~saez/. It is these

latter values that we report here.

54

http://www.lisproject.org/

http://elsa.berkeley.edu/%7Esaez/

Documents

ESTIMATING TRENDS IN U.S. INCOME INEQUALITY ...We analyze trends in the inequality of size-adjusted pre-tax post-transfer household income in the USA between 1975 and 2004. Estimates