“Mortgages, Privacy, and Deidentified Data” Professor Peter Swire Ohio State University Center for American Progress Consumer Financial Protection Bureau Conference on “New Research on Sustainable Mortgages & Access to Credit” October 6, 2011

Page 1: “Mortgages, Privacy, and  Deidentified  Data”

“Mortgages, Privacy, and Deidentified Data”

Professor Peter Swire

Ohio State University

Center for American Progress

Consumer Financial Protection Bureau Conference on “New Research on Sustainable Mortgages & Access to Credit”

October 6, 2011

Page 2: “Mortgages, Privacy, and  Deidentified  Data”

Overview

• Federal experience to date with deidentification (“DeID”)
• Why DeID is technically harder over time
• Technical & administrative measures to protect identity
• Court records: public records and privacy
• Conclusion: Technology alone often cannot succeed, so the choice becomes make public, keep private, or create effective data use agreements

Page 3: “Mortgages, Privacy, and  Deidentified  Data”

Federal DeID to Date

• 2000 HIPAA rule
    • Recognized that reidentification (“ReID”) is possible
    • Can scrub 18 data fields; or an expert testifies that there is a “very small” risk of ReID
    • Current HHS study in progress on DeID, with issues similar to those for financial data
• Data.gov
    • Administration push for transparency
    • Privacy & DeID more challenging than many had hoped
• Census data
    • History of census data sensitivity, required data collection
    • Suppress small cell size; technical limits on researchers’ access
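The "suppress small cell size" idea can be illustrated with a minimal sketch. The records, the field choices, and the threshold of 3 are all invented for illustration; actual Census disclosure rules are more involved and are not described in these slides.

```python
from collections import Counter

# Hypothetical records as (county, loan_purpose) pairs. All values are invented.
records = [
    ("Franklin", "purchase"), ("Franklin", "purchase"), ("Franklin", "purchase"),
    ("Franklin", "refinance"), ("Franklin", "refinance"), ("Franklin", "refinance"),
    ("Athens", "purchase"),
    ("Athens", "refinance"), ("Athens", "refinance"), ("Athens", "refinance"),
]


def suppressed_counts(rows, min_cell_size):
    """Count rows per cell and replace any count below min_cell_size with None."""
    counts = Counter(rows)
    return {cell: (n if n >= min_cell_size else None) for cell, n in counts.items()}


# The threshold is an assumed value chosen only for this example.
table = suppressed_counts(records, min_cell_size=3)
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

This sketch does primary suppression only. If row or column totals were also published, a suppressed cell could be recovered by subtraction, which is one reason suppression alone is a limited safeguard.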

Page 4: “Mortgages, Privacy, and  Deidentified  Data”

Why DeID is Harder over Time

• Two tech trends
    • Search vastly improved: Google incorporated in 1998
    • Increase in (almost) unique publicly available facts
• Mortgages
    • Street View pictures of each house
    • Public records and likely market values & date of sale of each house
    • Social networks, blogs, marketing information available for purchase:
        • “We got our new house today, and Bank X did a great/lousy job”
• How hard for forensic, automated efforts to reID?
    • Sweeney “k-anonymity”: linkage can shrink a “deID mortgage” to one or a few properties
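A minimal sketch of the linkage risk described on this slide follows. All records, field names, and the matching rule are hypothetical; they are not drawn from any real mortgage data set. The first function computes k in the k-anonymity sense (the size of the smallest group of records that share the same quasi-identifier values), and the second joins a "deidentified" loan against invented public sale records.

```python
from collections import Counter

# Hypothetical "deidentified" loans: names removed, quasi-identifiers kept.
deid_loans = [
    {"zip": "43210", "sale_month": "2011-03", "loan_amount": 180000},
    {"zip": "43210", "sale_month": "2011-03", "loan_amount": 180000},
    {"zip": "43210", "sale_month": "2011-04", "loan_amount": 252000},
]

# Hypothetical public property records (recorder's office, listing sites).
public_sales = [
    {"address": "12 Elm St", "zip": "43210", "sale_month": "2011-03", "price": 200000},
    {"address": "40 Oak Ave", "zip": "43210", "sale_month": "2011-03", "price": 205000},
    {"address": "7 Pine Rd", "zip": "43210", "sale_month": "2011-04", "price": 280000},
]

QUASI_IDENTIFIERS = ("zip", "sale_month", "loan_amount")


def k_anonymity(rows, keys):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[key] for key in keys) for row in rows)
    return min(groups.values())


def candidate_properties(loan, sales):
    """Addresses of public sales in the loan's zip and month, priced at or above the loan."""
    return [
        sale["address"]
        for sale in sales
        if sale["zip"] == loan["zip"]
        and sale["sale_month"] == loan["sale_month"]
        and sale["price"] >= loan["loan_amount"]
    ]


k = k_anonymity(deid_loans, QUASI_IDENTIFIERS)
matches = candidate_properties(deid_loans[2], public_sales)
print("k =", k)
print("candidate properties for the third loan:", matches)
```

In this toy data the third loan is unique on its quasi-identifiers, and the join against public records leaves a single candidate property. Real linkage attacks would use more fields and fuzzier matching.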

Page 5: “Mortgages, Privacy, and  Deidentified  Data”

Technical Measures

• Technical measures to DeID may:
    • Be subject to ReID (previous slide);
    • Introduce noise to data; or
    • Both
• Add noise (or subtract signal)
    • Census approach
        • Public data set, suppress small cell size, lots of noise; or
        • Researchers can run regressions using somewhat better data
    • Cynthia Dwork’s “differential privacy” (Microsoft Research)
        • Limits queries into database based on tolerance for ReID
    • Agrawal and other IBM research
        • “Hippocratic Database” adds noise with goal of allowing analysis but minimizing risk of linkage
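The idea of limiting queries according to a tolerance for ReID can be sketched with the Laplace mechanism, a standard differential privacy technique that the slide does not name. The class name, the budget values, and the loan records below are assumptions made for illustration only; this is not a description of any Microsoft or IBM system.

```python
import random


class NoisyCounter:
    """Answers count queries with Laplace noise until a privacy budget is spent."""

    def __init__(self, rows, total_epsilon, seed=None):
        self.rows = rows
        self.remaining = total_epsilon  # assumed overall tolerance for disclosure
        self.rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponential draws with mean `scale`
        # follows a Laplace distribution centered at 0 with that scale.
        return self.rng.expovariate(1.0 / scale) - self.rng.expovariate(1.0 / scale)

    def count(self, predicate, epsilon):
        """Return a noisy count of matching rows, spending `epsilon` of the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for row in self.rows if predicate(row))
        # Adding or removing one row changes a count by at most 1,
        # so noise with scale 1/epsilon is used.
        return true_count + self._laplace(1.0 / epsilon)


# Hypothetical loan records; every value is invented.
loans = [{"delinquent": i % 4 == 0} for i in range(100)]

db = NoisyCounter(loans, total_epsilon=1.0, seed=0)
first = db.count(lambda row: row["delinquent"], epsilon=0.5)
second = db.count(lambda row: not row["delinquent"], epsilon=0.5)
print("noisy delinquent count:", round(first, 1))
print("noisy non-delinquent count:", round(second, 1))
print("budget left:", db.remaining)
```

A smaller epsilon per query means more noise and more allowed queries; once the budget reaches zero, further queries are refused. That is the tradeoff between analytic signal and ReID risk that this slide describes.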

Page 6: “Mortgages, Privacy, and  Deidentified  Data”

Administrative Measures

• HIPAA data use agreements
    • Agreements apply to a “limited data set”, with obvious identifiers (name, address) stripped out
• Data use agreement
    • Contractual guarantees to use data only for limited purposes, such as research
    • Promise to use appropriate safeguards on data
    • Promise not to reID the data
• 2009 CDT conference report on DeID and health data emphasized importance of administrative safeguards

Page 7: “Mortgages, Privacy, and  Deidentified  Data”

Public Records & Privacy

• Court records have been the subject of intense study on tradeoffs of public records and privacy
    • Strong reasons for public access
    • Privacy: juvenile court, financial account info, etc.
• Annual Williamsburg conference, each November
• Many state task forces on subject

Page 8: “Mortgages, Privacy, and  Deidentified  Data”

Conclusion

• Some records are or should be public
• Some records are or should be private
• Ability to ReID is large and growing
• Technical measures to mask exist but are limited in applicability
• Administrative measures often essential for researchers to get meaningful results
• Technology alone often cannot succeed, so the choice becomes make public, keep private, or create effective data use agreements