ACS Public Use Microdata Samples of 2005 and 2006 – How to Use the Replicate Weights B. Dale Garrett and Michael Starsinic U.S. Census Bureau AAPOR


ACS Public Use Microdata Samples of 2005 and 2006 –

How to Use the Replicate Weights

B. Dale Garrett and Michael Starsinic
U.S. Census Bureau

AAPOR Conference, New Orleans
May 16, 2008


Public Data

• The American Community Survey (ACS) produces an annual Public Use Microdata Sample (PUMS) file.

• You can download these files for free.

• Write your own program to tally and analyze data.


Key Points

• PUMS data users want to know the reliability of an estimate.

• This paper explains how to use PUMS replicate weights to estimate standard errors.


Outline

• the American Community Survey (ACS)

• the Public Use Microdata Sample (PUMS)
– sample design
– confidentiality
– weights
– standard errors
– issues with standard errors


The American Community Survey

• The 2005 ACS
– Sample of 250,000 housing units per month.
– Every county represented in the fifty states, District of Columbia and Puerto Rico.
– Collects population and housing characteristics.

• The 2006 ACS was similar but added
– A sample of both institutional and noninstitutional Group Quarters (GQ) population.
– GQ sample size was 16,000 persons per month.


PUMS Sample Design

• PUMS is a subsample of ACS

– Sort the ACS interviews on geography, mode of interview, types of housing units, demographics

– Sample size:
• one percent of the total housing units (HUs) and household (HH) persons in 2005 and 2006.
• one percent of total GQ persons in 2006.

– Systematic sampling at the state and PUMA level.
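
The sort-then-sample step above can be illustrated with a generic systematic sampler. This is only a sketch of the textbook technique, not the Census Bureau's production procedure: the function name `systematic_sample`, the toy frame, and the interval of 100 are all assumptions made for illustration.

```python
import random

def systematic_sample(sorted_records, interval, rng):
    """Take every `interval`-th record after a random start (generic technique)."""
    start = rng.randrange(interval)       # random start in [0, interval)
    return sorted_records[start::interval]

# Hypothetical frame: 1,000 records already sorted on the stratifying variables.
frame = list(range(1000))
subsample = systematic_sample(frame, interval=100, rng=random.Random(2008))
print(len(subsample), subsample[:3])
```

Because the frame is sorted before sampling, the selected records are spread evenly across the sort variables.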


PUMA Definition

• PUMA - Public Use Microdata Area

– Designed for public release of information by local state officials.

– Large enough to achieve disclosure avoidance.
• An area of 100,000 population or more as of the 2000 Census.


PUMS Protects Confidentiality

• PUMS does not reveal:
– Names of persons.
– Address.
– Detailed type of group quarters.
– Geographic data below the PUMA level.

• The respondent’s identity is protected.
– Top-coding of age, income and other variables.
– Data swapping.
– Synthetic data.
– Perturbation of data.

[Figure: Rural PUMAs in KY]

[Figure: PUMAs in Baltimore Co., MD]


PUMS Weighting

• The PUMS initial weight was equal to the ACS final weight times the sampling interval.

• The 2006 PUMS file was ratio-estimated to ACS
– persons in households by sex by PUMA
– housing units by vacant/occupied by PUMA
– persons in group quarters by institutional/noninstitutional by state
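
A generic ratio adjustment of this kind can be sketched as follows. The field names (`cell`, `initial_weight`), the control totals, and the toy weights are hypothetical; the slides do not give the production algorithm, so this shows only the general idea of scaling initial weights within each cell so they sum to a control total.

```python
from collections import defaultdict

def ratio_adjust(records, control_totals):
    """Scale initial weights so each cell sums to its control total."""
    cell_sums = defaultdict(float)
    for rec in records:
        cell_sums[rec["cell"]] += rec["initial_weight"]
    return [
        rec["initial_weight"] * control_totals[rec["cell"]] / cell_sums[rec["cell"]]
        for rec in records
    ]

# Made-up records; a cell might be, for example, sex within a PUMA.
records = [
    {"cell": "male", "initial_weight": 100},
    {"cell": "male", "initial_weight": 300},
    {"cell": "female", "initial_weight": 200},
]
control_totals = {"male": 500, "female": 180}  # hypothetical ACS estimates
final_weights = ratio_adjust(records, control_totals)
print(final_weights)
```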


How to Program an Estimate – Counts, Aggregates, Ratios, Medians

• Totals (counts)
– Sum the PUMS weights (for the characteristic).

• Aggregates
– Sum the product of the PUMS weight times the value.

• Ratios
– Form the total or aggregate for the numerator.
– Sum the PUMS weights for the characteristic in the denominator.
– Divide.

• Medians
– Use weighted distributions.
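
The total, aggregate, and ratio rules above might be programmed along these lines. The record layout, the field names (`weight`, `income`, `employed`), and the values are invented for illustration; a real PUMS file uses its own variable names.

```python
# Toy person records; weights and values are made up.
records = [
    {"weight": 100, "income": 20000, "employed": True},
    {"weight": 150, "income": 0, "employed": False},
    {"weight": 50, "income": 40000, "employed": True},
]

def total(records, keep):
    """Count: sum the weights of records with the characteristic."""
    return sum(r["weight"] for r in records if keep(r))

def aggregate(records, value_key, keep):
    """Aggregate: sum weight times value over records with the characteristic."""
    return sum(r["weight"] * r[value_key] for r in records if keep(r))

employed = total(records, lambda r: r["employed"])
everyone = total(records, lambda r: True)
employed_income = aggregate(records, "income", lambda r: r["employed"])

employment_rate = employed / everyone              # ratio of two totals
mean_employed_income = employed_income / employed  # ratio of aggregate to total
print(employed, everyone, employed_income, employment_rate)
```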


ACS Standard Errors

• The ACS uses the successive difference model of replicate weights to estimate standard errors.

• The successive difference model of Kirk Wolter was developed for ACS by Robert Fay and George Train.

http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf


Two Methods for PUMS Standard Errors

• Design factor method
– Design factors are factors to multiply times the standard error of a simple random sample.
– Easier to use than the replicate weights.

• Replicate weight method
– Generally, you get a more accurate standard error estimate by using the replicate weights.
– Somewhat more work than design factors.

http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.pdf
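
As a rough illustration of the design factor method, the sketch below multiplies a hypothetical design factor by the textbook standard error of a total under simple random sampling without replacement, assuming a one-percent sample. The function names and the design factor of 1.5 are assumptions; the official formula and the published design factors are in the PUMS Accuracy document linked above.

```python
import math

def srs_standard_error_total(estimate, universe_total, sampling_fraction=0.01):
    """Textbook SRS standard error of an estimated total (approximation)."""
    n = sampling_fraction * universe_total   # approximate sample size
    p = estimate / universe_total            # estimated proportion
    variance = universe_total ** 2 * (1 - sampling_fraction) * p * (1 - p) / n
    return math.sqrt(variance)

def design_factor_standard_error(estimate, universe_total, design_factor):
    """Design factor times the simple-random-sample standard error."""
    return design_factor * srs_standard_error_total(estimate, universe_total)

# Made-up numbers: 100,000 persons with the characteristic out of 1,000,000.
srs_se = srs_standard_error_total(100_000, 1_000_000)
df_se = design_factor_standard_error(100_000, 1_000_000, design_factor=1.5)
print(round(srs_se, 1), round(df_se, 1))
```

With a one-percent sampling fraction the variance simplifies to 99 × estimate × (1 − estimate / universe total), which is why only the estimate and the universe total are needed.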


Three Steps to Standard Errors Using Replicate Weights

• Write a program to derive an estimate using the PUMS weight.

• Run the program 80 more times using each of the 80 replicate weights.

• Use the PUMS estimate and the 80 replicate estimates in the Standard Error formula.


ACS PUMS Replicate Weight Formula for a Standard Error

$$\mathrm{SE}(X) = \sqrt{\frac{4}{80} \sum_{r=1}^{80} \left(X_r - X\right)^2}$$

• where:
– X is the estimate formed from the PUMS weight.
– Xr is the estimate formed from the rth replicate weight.
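
The three steps and the formula can be put together in a short program. This is a minimal sketch on made-up data: the column names `PWGTP` and `PWGTP1` to `PWGTP80` are assumed to match the PUMS person weight and replicate weights, and the `has_characteristic` flag and helper names are hypothetical.

```python
import math

N_REPLICATES = 80  # number of replicate weights named in the slides

def weighted_count(records, weight_key):
    """Sum one weight column over records that have the characteristic."""
    return sum(rec[weight_key] for rec in records if rec["has_characteristic"])

def replicate_standard_error(full_estimate, replicate_estimates):
    """SE = sqrt( (4/80) * sum over r of (X_r - X)^2 )."""
    if len(replicate_estimates) != N_REPLICATES:
        raise ValueError("expected 80 replicate estimates")
    squared_diffs = sum((x_r - full_estimate) ** 2 for x_r in replicate_estimates)
    return math.sqrt(4.0 * squared_diffs / N_REPLICATES)

def make_record(full_weight):
    """Toy record: each replicate weight shifts the full weight by +/- 10."""
    rec = {"has_characteristic": True, "PWGTP": full_weight}
    for r in range(1, N_REPLICATES + 1):
        rec[f"PWGTP{r}"] = full_weight + (10 if r % 2 == 0 else -10)
    return rec

records = [make_record(100), make_record(200)]

estimate = weighted_count(records, "PWGTP")                    # step 1
replicate_estimates = [weighted_count(records, f"PWGTP{r}")    # step 2
                       for r in range(1, N_REPLICATES + 1)]
standard_error = replicate_standard_error(estimate, replicate_estimates)  # step 3
print(estimate, standard_error)
```

The same `replicate_standard_error` function works for any estimate (total, aggregate, ratio, or median) as long as the estimate is recomputed once per replicate weight.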


Standard Errors of Differences

• There are two estimates, A and B.

• You want to use a Z-test to see if the difference (A – B) is significant.

• The Z-test requires the standard error of the difference.


For Independent Estimates

• $SE_{A-B}$ – the standard error of (A – B)
• $SE_A$ – the standard error of estimate A
• $SE_B$ – the standard error of estimate B

$$SE_{A-B} = \sqrt{SE_A^2 + SE_B^2}$$

Use the standard errors of the two estimates to estimate the standard error of the difference.
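
A Z-test for two independent estimates could be coded as below. The estimates and standard errors are invented, and the critical value of 1.645 (a two-sided test at the 90 percent level) is an assumption about the test the analyst wants to run, not something stated in the slides.

```python
import math

def se_difference_independent(se_a, se_b):
    """Standard error of (A - B) when A and B are independent."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

def z_statistic(a, b, se_a, se_b):
    """Z = (A - B) / SE(A - B)."""
    return (a - b) / se_difference_independent(se_a, se_b)

# Made-up estimates and standard errors.
z = z_statistic(a=1000, b=900, se_a=30, se_b=40)
CRITICAL_VALUE = 1.645  # assumed: two-sided test at the 90 percent level
is_significant = abs(z) > CRITICAL_VALUE
print(z, is_significant)
```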


For Correlated Estimates

• Directly use the replicate weights to calculate the standard error of the difference.

– Let X = (A – B) = the difference

– Let Xr = (Ar – Br)
• for the 80 replicate differences X1 … X80

• Use the replicate weight formula (seen earlier).
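
For correlated estimates, the difference is treated as the estimate and run through the replicate weight formula. The sketch below uses made-up replicate estimates that move together, so the direct standard error of the difference comes out smaller than the independent-estimates formula would suggest; all names and numbers are hypothetical.

```python
import math

def replicate_standard_error(full_estimate, replicate_estimates):
    """SE = sqrt( (4/80) * sum over r of (X_r - X)^2 ) for 80 replicates."""
    if len(replicate_estimates) != 80:
        raise ValueError("expected 80 replicate estimates")
    squared_diffs = sum((x_r - full_estimate) ** 2 for x_r in replicate_estimates)
    return math.sqrt(4.0 * squared_diffs / 80)

# Made-up estimates whose replicates rise and fall together.
a, b = 500, 300
a_reps = [a + (10 if r % 2 == 0 else -10) for r in range(1, 81)]
b_reps = [b + (5 if r % 2 == 0 else -5) for r in range(1, 81)]

x = a - b                                               # X = A - B
x_reps = [a_r - b_r for a_r, b_r in zip(a_reps, b_reps)]  # X_r = A_r - B_r
se_difference = replicate_standard_error(x, x_reps)

# For comparison only: what the independent-estimates formula would give.
se_if_independent = math.sqrt(
    replicate_standard_error(a, a_reps) ** 2
    + replicate_standard_error(b, b_reps) ** 2
)
print(se_difference, round(se_if_independent, 2))
```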


Replicate Weight Issues

• Estimate is zero, standard error is not zero.
– Cannot use replicate weights to estimate the standard error.
– See the PUMS Accuracy document for a formula.

• The replicate standard error is zero, estimate is not zero.
– Zero means that if you reselected the sample the answer would be the same.
– Acceptable if estimate controlled in the weighting.
– Not acceptable if the estimate is a median. Often a direct median gives a zero standard error.


Standard Error Options for Medians

• Direct median with replicate weights may give a zero standard error. This is not good.

• Categorical median with replicate weights will give a more stable standard error, but still some zero standard errors.

• Design factor method – Start with either the direct or categorical median, use design factors for the standard error.


Conclusion

• Replicate weights for ACS PUMS are:
– Available for 2005 PUMS and later.
– Easy to use for most estimates.
– Few issues.

• For medians
– Replicate weight standard errors may be zeros.
– To avoid the zeros use the design factor method.


References

• US Census Bureau: Accuracy of the Data (2006) for ACS
– http://www.census.gov/acs/www/Downloads/ACS/accuracy2006.pdf

• US Census Bureau: PUMS Accuracy of the Data (2006)
– http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.pdf

• US Census Bureau: Design and Methodology: American Community Survey, Technical Paper 67, May 2006
– http://www.census.gov/acs/www/Downloads/tp67.pdf

• Fay & Train, Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties, 1995
– http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf



Contact Information

• For questions about this presentation, or for an example program to generate standard errors, contact me at [email protected]

Views expressed in this paper are those of the authors and not necessarily those of the U.S. Census Bureau.



How to Derive an Estimate – Direct Medians

• The direct median is the weighted sample median or the distributional median.

• Sum the weights for the characteristic total.
• Sort the file on the value of interest.
• Sum the weights until the 50% point.
• The direct median is the value of the record which crosses the 50% point.
• Or a point between the values of two records that divide the file into two exact halves.
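
The steps above can be sketched as a small function. The incomes and weights mirror the five-record example in this deck (percent of total used as the weight); the function name is hypothetical, and taking the midpoint when two records split the file into exact halves is an assumption, since the slide only says "a point between" the two values.

```python
def direct_median(values, weights):
    """Weighted sample median: the value whose record crosses the 50% point."""
    pairs = sorted(zip(values, weights))   # sort the file on the value
    half = sum(weights) / 2                # the 50% point
    running = 0
    for i, (value, weight) in enumerate(pairs):
        running += weight
        if running > half:
            return value
        if running == half and i + 1 < len(pairs):
            # Two exact halves: assumed midpoint of the neighboring values.
            return (value + pairs[i + 1][0]) / 2
    return pairs[-1][0]

incomes = [18000, 33000, 41000, 49000, 62000]
weights = [18, 22, 20, 15, 25]
median_income = direct_median(incomes, weights)
print(median_income)
```

The running sums are 18, 40, and 60, so the third record is the first to cross the 50% point.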


How to Derive an Estimate – Categorical Medians

• Categorical or interpolated medians.
– Used for published ACS statistics in Factfinder.

• Categorical medians are interpolations:
– A weighted distribution of the characteristic.
– Each bin or row is assigned a range of values.
– Uses linear interpolation for most variables.


Direct Median Example Based on 5 Records

Record #   Percent of Total   Income from record   Direct median
1          18%                18,000
2          22%                33,000
3          20%                41,000               41,000
4          15%                49,000
5          25%                62,000


Direct and Categorical Medians Example Based on 5 Records

Income Range        Record #   Percent of Total   Income from record   Direct median   Categorical median
-59,000 to 20,000   1          18%                18,000
20,000 to 40,000    2          22%                33,000
40,000 to 60,000    3          20%                41,000               41,000          45,700
                    4          15%                49,000
60,000 +            5          25%                62,000
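
The categorical median in this example can be reproduced with linear interpolation inside the bin that contains the 50% point. This is a sketch under stated assumptions: the function name and the (lower, upper, weight) bin layout are hypothetical, and the published ACS procedure may treat some variables differently.

```python
def categorical_median(bins):
    """Interpolated median from (lower, upper, weight) bins sorted by range."""
    half = sum(weight for _, _, weight in bins) / 2
    cumulative = 0
    for lower, upper, weight in bins:
        if weight > 0 and cumulative + weight >= half:
            if upper is None:
                raise ValueError("median falls in the open-ended top bin")
            # Linear interpolation within the median bin.
            return lower + (half - cumulative) / weight * (upper - lower)
        cumulative += weight
    raise ValueError("no weight in any bin")

# Bins from the five-record example; records 3 and 4 share one bin (20% + 15%).
bins = [
    (-59000, 20000, 18),
    (20000, 40000, 22),
    (40000, 60000, 35),
    (60000, None, 25),
]
median_income = categorical_median(bins)
print(round(median_income, -2))
```

Here 40% of the weight lies below 40,000, so the median sits (50 − 40) / 35 of the way through the 40,000 to 60,000 bin, which rounds to the 45,700 shown in the table.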
