Upload
jewel-leonard
View
214
Download
0
Embed Size (px)
Citation preview
Sally Obenski and Jim FarberU.S. Census Bureau
CNSTAT Panel January 26, 2007
Expanding the Use of Administrative Records: Methods, Applications, and
Challenges
2
Overview
Overview and History Key technical breakthroughs Decennial and Survey Applications Medicaid Undercount Study Value of Integrated Data Sets Operational and Technical Constraints Policy Challenges Conclusions
3
Mandate for Administrative Records Use
Title 13, Section 6: Use administrative records information as
extensively as possible in lieu of conducting direct inquires
Census Bureau Strategic Plan:Reduce reporting burden and minimize cost to
taxpayer by acquiring and developing high-quality data from sources maintained by other government and commercial entities
4
Legal Guidance and Protections
Title 13, U.S.C., Section 6, 9, and 214
Title 26, U.S.C., Section 6103(j)
Privacy Act of 1974
Paperwork Reduction Act
Government Information Security Reform Act
(GISRA)
E-Government Act of 2002, including Federal Information Security Management Act (FISMA)
5
Safeguarding Administrative Records
Consistent Application of Policies To ensure that projects have the appropriate legal
authorization, comply with existing data agreements, and provide adequate controls to protect confidentiality and privacy
Administrative Controls Numerous levels of approval Need-to-know access Removal of identifiable information Administrative Records Tracking System Security and confidentiality training
6
AR Program Evolution
Today
Program begins
mid 1990s
July 1993 1996
AR Research staff created
Survey launched to gather info on potential AR files
Early 1990s
Statistical uses of AR
conference held
July 1993
1999/2000
Projects included AREX 2000 and the
1999 StARS prototype
1999
Centralized program emerges
Data Stewardship
program begins
2001
AR Test for 2000
Census
Race Model Addresses
Quality Concerns
PVS Increases Linking
Capacity
Infrastructure investments
allow new interagency
collaborations
2004
2002 2005
7
StARS Provides Technical Infrastructure
Person & Address Databases consist of 7 national files:
IRS 1040 HUD TRACS IHS SSS
IRS 1099 HUD PIC Medicare
CY2004 records Persons Addresses
Raw input 894 million 767 million
StARS 308 million 152 million
8
Administrative Records Experiment Validated StARS
Local test of AR census models conducted in 5 counties
Coverage issues similar to Census 2000
Validated conformance of StARS to Census 2000 addresses & persons
Improvements to StARS continue, including move to more real-time redesign (E-StARS)
9
NUMIDENT Provides National Reference File
Social Security Administration (SSA) Numerical Identification (Numident) Transaction file with 803 million records
Collapse to 431 million unique SSN records
Usages:Look-up file that provides demographic dataSocial Security Numbers (SSNs) verification/
validation
10
Race and Hispanic Origin Model Rectified Quality Concerns
Initial weakness was dependence on race data from SSA’s SSN transaction file
Census 2000 records matched to SSN transaction file
Model completed missing linkages
11
Person Validation System (PVS) Increases Linking Capacity
Use master file of SSN/name/DOB as reference file
Link addresses with SSN reference file Match incoming census or survey record using
name, address, DOB Search within address first (high quality match) Search by name/DOB nationally if address
search not successful Replaces SSN with unique identifier (PIK)
12
Person Validation System (PVS) Increases Linking Capacity
Prepare NumidentPrepare Incoming DataRun Verification PhaseRun Geokey SearchRun Name SearchApply PIKs
Incoming Incoming DataData
Incoming Incoming DataDataNumidentNumidentNumidentNumident
VerificationVerificationVerificationVerification
Geokey Geokey SearchSearch
Geokey Geokey SearchSearch
NameNameSearchSearch
NameNameSearchSearch
FinalFinalFinalFinal
13
Implementing the ACS Provides Current Long Form Data
Designed to ameliorate constraints of decennial long form data collection
Provides means for timely analyses and estimates at small geographic areas
Provides means to push models based on less granular surveys to smaller geographies
14
AR Integral to Census Bureau Programs
Internal Revenue Service (IRS) 1040 Intercensal EstimatesSmall Area Income and Poverty Estimates
CMS Medicare and MedicaidNational Longitudinal Mortality StudySmall Area Income and Poverty EstimatesSmall Area Health Insurance Estimates
State Unemployment Insurance FilesLongitudinal Employer-Household Dynamics
15
Current Decennial Census Research
Using AR to “assign” age, race, sex, Hispanic Origin, when a record can be matched
Use AR to identify households with coverage problems
Determine if commercially available & other lists can improve & help build GQ frame
16
Emerging Survey Improvement (1)
Reducing ACS small area varianceUse AR as controls to adjust survey weights right
after nonresponse adjustmentPreliminary research highly promising
Obtaining characteristics on nonrespondentsCompared StARS persons to CPS responders to
ensure consistencyUsed StARS to obtain characteristics of
nonresponding households
17
Emerging Survey Improvement (2)
Reacting to disaster and other near-real time requirementsKatrina’s effect on the federal statistical system
and our lack of current response data highlighted need
Acquired the USPS National Change of Address File and FEMA’s emergency management and flood insurance files
Developing next generation StARS – near real-time measurements
18
Medicaid Undercount Study (1)
Survey estimates are important to policy research
Examining the large discrepancy between survey estimates and Medicaid enrollment figures
Multi-phased, interagency study, including academia
19
Medicaid Undercount Study (2)
Phase I: Examines quality and characteristics of Medicaid and Medicare files (2000-2002)
Phase II: Conducts national match of Medicaid files to Current Population Survey ASEC (2001-2002)
Phase III: Conducts selected state matches of Medicaid files to states in CPS ASEC
Phase IV: Conducts national match of Medicaid files to the National Health Interview Survey and compares results
20
Value of Integrated Data Sets
Provides more robust and accurate picture
Builds on strengths of both views while controlling for their weaknesses
Provides better statistics for input into simulations for predictions and funds distribution
As the demand for data increases and budgets decrease data re-use many be the only cost-effective option
21
Operational Constraints
File Acquisition ComplexitiesComplex Memoranda of UnderstandingState by state negotiationDifferences in content definition, quality, and
program rules over time
File lag timeMSIS (Federal Medicaid) lags by about 4 yearsMost lag for about a yearMany applications require more near real-time
response
22
Illustrative Examples
Federal Files—yearly acquisitions IRS 1040, IRS 1099, CMS Medicare, Medicaid
Federal Files—quarterly or monthly updatesSSA Numident, USPS NCOA
State Files—on as needed basisMD Food Stamps, MD TANF, MD Child Care
Subsidy State Files—on a quarterly basis
UI Wage and ES 202 (currently from 40 states) Commercial Files—INFOUSA
23
Technical Constraints
Obtaining the right data in the right format
Varying rates of validation (e.g., Medicare 99%, Medicaid 91%)
Coarseness of administrative data compared to nuances of surveys
Measuring error
24
Policy Challenges
Communicating the benefits vs. privacy concerns
Need for interagency teams to ensure accurate results
Interagency agreements and mission
“Ownership” of the integrated data sets
Growth of possible disclosure risks
Need for longitudinal data bases in order to find an anonomyzed person at an address at a point in time
25
Overcoming the Constraints
Resolving file acquisition issues may require OMB or Congressional assistance
Lag time for general demographics addressed by National Change of Address file—planning move to Enhanced StARS for more near real-time response
Standardized and centralized file acquisition
New files in address search phase and SAS-based matcher increased validation rates
Data Quality Standards team addressing measuring error in integrated data sets
Increasing inter-agency efforts for generating integrated data sets
26
Conclusions
New files and innovations leading to expansion of AR uses
New challenges continue to arise Regular updating of billions of records to have a near real-
time response system Effectively acquiring state-based records Understanding integrated data sets
At incipience of a new generation of products, services, and inter-agency opportunities
27
Contact Information
Sally M. ObenskiAssistant Division Chief for Administrative
Records ApplicationsData Integration DivisionU.S. Census BureauWashington D.C. 20233-8100
Email: [email protected]: (301) 763-4044Cell: (301) 873-7615