“Mortgages, Privacy, and Deidentified Data”
Professor Peter Swire
Ohio State University
Center for American Progress
Consumer Financial Protection Bureau Conference on “New Research on Sustainable Mortgages & Access to Credit”
October 6, 2011
Overview
• Federal experience to date with deidentification (“DeID”)
• Why DeID is technically harder over time
• Technical & administrative measures to protect identity
• Court records: public records and privacy
• Conclusion: Technology alone often cannot succeed, so the choice becomes make public, keep private, or create effective data use agreements
Federal DeID to Date
• 2000 HIPAA rule
  • Recognized reidentification (“ReID”) is possible
  • Can scrub 18 data fields; or an expert determines that the risk of ReID is “very small”
  • Current HHS study in progress on DeID: similar issues to financial data
• Data.gov
  • Administration push for transparency
  • Privacy & DeID more challenging than many had hoped
• Census data
  • History of census data sensitivity, required data collection
  • Suppress small cell size; technical limits on researchers’ access
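The "suppress small cell size" idea can be illustrated with a minimal sketch. This is not the Census Bureau's actual procedure; the threshold of 10, the field names `tract` and `loan_type`, and the function `suppress_small_cells` are all hypothetical choices made for illustration.

```python
from collections import Counter


def suppress_small_cells(records, keys, min_cell_size=10):
    """Tabulate records by the given keys and blank out small cells.

    min_cell_size is an assumed threshold, not an official value.
    Cells with fewer records than the threshold are reported as None.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= min_cell_size else None)
            for cell, n in counts.items()}


# Hypothetical loan records, invented for this example.
loans = (
    [{"tract": "A", "loan_type": "fixed"}] * 12
    + [{"tract": "A", "loan_type": "ARM"}] * 3
    + [{"tract": "B", "loan_type": "fixed"}] * 25
)

table = suppress_small_cells(loans, keys=("tract", "loan_type"))
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

The three adjustable-rate loans in tract A fall below the threshold, so that cell is withheld while the larger cells are published. The cost is the one the slides go on to name: the published table carries less signal for researchers.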
Why DeID is Harder over Time
• Two tech trends
  • Search vastly improved: Google incorporated in 1998
  • Increase in (almost) unique publicly available facts
• Mortgages
  • Street View of each house -- pictures
  • Public records and likely market values & date of sale of each house
  • Social networks, blogs, marketing information available for purchase:
    • “We got our new house today, and Bank X did a great/lousy job”
• How hard for forensic, automated efforts to reID?
  • Sweeney “K-anonymity”: linkage can shrink a “deID mortgage” to one or a few properties
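To make the k-anonymity point concrete, here is a small sketch that measures how many records share each combination of quasi-identifiers. A data set is k-anonymous if every combination is shared by at least k records; k = 1 means some record is unique and can be linked to outside data such as public sale records. The field names (`zip`, `sale_month`, `price_band`), the sample rows, and the helper functions are hypothetical.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_of(records, quasi_identifiers):
    """Return the k for which the data set is k-anonymous (smallest class size)."""
    return min(equivalence_classes(records, quasi_identifiers).values())


# Hypothetical "deidentified" mortgage rows: no names or street addresses.
loans = [
    {"zip": "43210", "sale_month": "2011-03", "price_band": "200-250k"},
    {"zip": "43210", "sale_month": "2011-03", "price_band": "200-250k"},
    {"zip": "43210", "sale_month": "2011-04", "price_band": "400-450k"},
    {"zip": "43215", "sale_month": "2011-03", "price_band": "200-250k"},
    {"zip": "43215", "sale_month": "2011-03", "price_band": "200-250k"},
]

qi = ("zip", "sale_month", "price_band")
k = k_of(loans, qi)
singletons = [cell for cell, n in equivalence_classes(loans, qi).items() if n == 1]
coarse_k = k_of(loans, ("zip",))

print("k with all three fields:", k)
print("unique combinations:", singletons)
print("k with zip only:", coarse_k)
```

With all three fields, one loan is the only one in its zip code, month, and price band, so anyone holding public sale records could match it to a single house. Releasing only the zip code raises k to 2 but discards most of the information a researcher would want.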
Technical Measures
• Technical measures to DeID may:
  • Be subject to ReID (previous slide);
  • Introduce noise to data; or
  • Both
• Add noise (or subtract signal)
  • Census approach
    • Public data set, suppress small cell size, lots of noise; or
    • Researchers can run regressions using somewhat better data
  • Cynthia Dwork’s “differential privacy” (Microsoft Research)
    • Limits queries into database based on tolerance for ReID
  • Agrawal and other IBM research
    • “Hippocratic Database” adds noise with goal of allowing analysis but minimizing risk of linkage
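One way to picture "limits queries based on tolerance for ReID" is a count query answered with Laplace noise under a fixed privacy budget. The sketch below uses the Laplace mechanism, a standard construction from the differential privacy literature, and is not taken from the slides. The class `BudgetedCounter`, the budget values, and the sample records are hypothetical.

```python
import random


class BudgetedCounter:
    """Answer count queries with Laplace noise until a privacy budget runs out.

    Each query spends `epsilon` from the total budget. A count changes by at
    most 1 when one record is added or removed (sensitivity 1), so the noise
    scale is 1 / epsilon.
    """

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self.remaining = total_epsilon
        self._rng = random.Random(seed)

    def noisy_count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        scale = 1.0 / epsilon
        # The difference of two exponentials with mean `scale`
        # is a Laplace sample centered at 0 with that scale.
        noise = (self._rng.expovariate(1.0 / scale)
                 - self._rng.expovariate(1.0 / scale))
        return true_count + noise


# Hypothetical records: 40 loans, 8 of them delinquent.
loans = [{"delinquent": i < 8} for i in range(40)]
db = BudgetedCounter(loans, total_epsilon=1.0, seed=0)

first = db.noisy_count(lambda r: r["delinquent"], epsilon=0.5)
second = db.noisy_count(lambda r: not r["delinquent"], epsilon=0.5)
print("noisy delinquent count:", round(first, 2))
print("noisy current count:", round(second, 2))
print("budget left:", db.remaining)
```

A smaller `epsilon` per query means more noise and stronger protection. Once the budget is spent, further queries are refused, which is the tradeoff the slide describes: a tighter tolerance for ReID leaves less room for analysis.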
Administrative Measures
• HIPAA data use agreements
  • Agreements apply to a “limited data set”, with obvious identifiers (name, address) stripped out
  • Data use agreement
    • Contractual guarantees to use data only for limited purposes, such as research
    • Promise to use appropriate safeguards on data
    • Promise not to reID the data
• 2009 CDT conference report on DeID and health data emphasized importance of administrative safeguards
Public Records & Privacy
• Court records have been the subject of intense study on tradeoffs of public records and privacy
  • Strong reasons for public access
  • Privacy: juvenile court, financial account info, etc.
• Annual Williamsburg conference, each November
• Many state task forces on subject
Conclusion
• Some records are or should be public
• Some records are or should be private
• Ability to ReID is large and growing
• Technical measures to mask exist but are limited in applicability
• Administrative measures often essential for researchers to get meaningful results
• Technology alone often cannot succeed, so the choice becomes make public, keep private, or create effective data use agreements