Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
1
Data Privacy in Biomedicine
Lecture 1: Introduction and Overview
Bradley Malin, PhD ([email protected])
Professor of Biomedical Informatics, Biostatistics, & Computer Science
Vanderbilt University
January 6, 2020
© 2020 Bradley Malin 2Data Privacy in Biomedicine: Lecture 1
https://www.nytimes.com/2019/11/11/business/google-ascension-health-data.html
https://healthitsecurity.com/news/google-ascension-partnership-fuels-overdue-hipaa-privacy-debate
© 2020 Bradley Malin 3Data Privacy in Biomedicine: Lecture 1
“Privacy”
© 2020 Bradley Malin 4Data Privacy in Biomedicine: Lecture 1
“Patient Privacy”
© 2020 Bradley Malin 5Data Privacy in Biomedicine: Lecture 1
“Medical Data Privacy”
© 2020 Bradley Malin 6Data Privacy in Biomedicine: Lecture 1
2
© 2020 Bradley Malin 7Data Privacy in Biomedicine: Lecture 1
April 2011“Big Data”
© 2020 Bradley Malin 8Data Privacy in Biomedicine: Lecture 1
http://www.nytimes.com/2014/05/16/us/us-mines-personal-health-data-to-aid-emergency-response
May 2014“Risk Prediction”
© 2020 Bradley Malin 9Data Privacy in Biomedicine: Lecture 1
Dec 2014“Access”
http://www.thestar.com/news/gta/2014/12/26/star_investigation_3_gta_hospitals_dont_proactively_audit_accesstopatient_files.html
© 2020 Bradley Malin 10Data Privacy in Biomedicine: Lecture 1
http://www.nytimes.com/2015/12/24/nyregion/a-patient-is-sued-and-his-mental-health-diagnosis-becomes-public.html
Dec 2015“Revelation”
© 2020 Bradley Malin 11Data Privacy in Biomedicine: Lecture 1
January 2018“DNA”
© 2020 Bradley Malin 12Data Privacy in Biomedicine: Lecture 1
https://www.news-medical.net/news/20190104/Advances-in-artificial-intelligence-threaten-privacy-of-peoples-health-data.aspx
January 2019
3
© 2020 Bradley Malin 13Data Privacy in Biomedicine: Lecture 1
https://www.mobihealthnews.com/news/opinion-ai-privacy-and-apis-will-mould-digital-health-2020
© 2020 Bradley Malin 14Data Privacy in Biomedicine: Lecture 1
© 2020 Bradley Malin 15Data Privacy in Biomedicine: Lecture 1
Welcome toData Privacy in Biomedicine
For CS : You’re sitting in 8396-02
For Informatics: You’re sitting in 7380
When: Mondays and Wednesdays, 3:30 – 4:45pmWhere: Featheringill Hall, Room 211
Office Hours: Upon Request
Contact: [email protected]
© 2020 Bradley Malin 16Data Privacy in Biomedicine: Lecture 1
Your Professor is Brad Malin BS, Molecular Biology
MS, Computer Science (Machine Learning)
MPhil, Public Policy & Management
PhD, Computer Science (Software Engineering)
Faculty Member: DBMI (1st), Biostats (2nd), EECS (2nd)
Directs: Health Data Science Center (https://www.vumc.org/heads/)
Directs: Health Information Privacy Laboratory (http://www.hiplab.org)
Sample Research Areas Medical Record Access Control, Mining, and Modeling
Anonymization of Medical & Genomic Data
Big Data Record Linkage
Synthetic Data Generation
© 2020 Bradley Malin 17Data Privacy in Biomedicine: Lecture 1
http://www.hiplab.org/courses/BMIF380
More to come (projects, homeworks, etc.) – links will be available from front page of website
© 2020 Bradley Malin 18Data Privacy in Biomedicine: Lecture 1
Course Objectives
After this course, you should be able to analyze data privacy from three non-exclusive perspectives: Data Detectives: Understand how seemingly private
information, can be discovered (or exploited) using automated strategies.
Data Protectors: Construct privacy protection technologies that provide formal computational guarantees of privacy in disclosed databases.
Technology Policy Designers: Develop privacy protection technologies that complement policy regulations.
4
© 2020 Bradley Malin 19Data Privacy in Biomedicine: Lecture 1
Expectations
You are expected to be competent in an object oriented programming language (Java, C++, Python, …)
You are expected to have a working knowledge of the Internet, word processing, and basic databases (Access, Oracle, MySQL, PostGres) and analysis tools (R, Python, Matlab, Scala, Julia, Excel, …)
© 2020 Bradley Malin 20Data Privacy in Biomedicine: Lecture 1
Beyond Expectations
You have experience in information security
data structures, algorithms, and statistics
public policy and legal frameworks
© 2020 Bradley Malin 21Data Privacy in Biomedicine: Lecture 1
Grading
This is a research-oriented course. There are no exams.
A substantial portion of your grade will be based on your “final” project.
Criteria % of Total Grade
Final Project 50%
Homework Assignments 30%
Reading Summaries 10%
Class Participation 10%
© 2020 Bradley Malin 22Data Privacy in Biomedicine: Lecture 1
Homework Policy
Unless the assignment calls for a group project, please do your own homework.
You can discuss the homework with other students, including the ways in which you approach the solutions to the questions, but the final submission must be your own.
Do not plagiarize without proper attribution – not even in your reading summaries, which leads me to…
© 2020 Bradley Malin 23Data Privacy in Biomedicine: Lecture 1
Reading Summaries There is no textbook for this course.
Assigned readings will be available the lecture before it is due (at the latest).
Your summaries should be no more than 1 page in length
Summaries will be graded on a {-, , +} scale - : You skimmed the reading and barely understood its meaning
: You read the reading and provided a reasonable account of its contents
+: You demonstrated critical reasoning and insight regarding the topic
Submit summaries to [email protected] before class
© 2020 Bradley Malin 24Data Privacy in Biomedicine: Lecture 1
Final Projects
Your project should be an independent study on a data privacy issue, with relationship to the area of biology, medicine, or health more generally
You may design your own project or choose from a predefined set of topics (will be available on the course website later in the semester)
Do not be afraid to discuss your project ideas with the instructor!
5
© 2020 Bradley Malin 25Data Privacy in Biomedicine: Lecture 1
Sample Topics Access Control Frameworks for Distributed Medical Record Systems
Surveillance of Electronic Medical Record Accesses for Suspicious Behavior
Evaluation and Design of Privacy Technologies for Personal Health Records (See Microsoft HealthVault Initiative)
Finding & Relating Publicly Available Repositories of Person Specific Biomedical Information
Building and Evaluating Clinical Text De-identification Tools
Anonymization of clinical profiles / sets of diagnoses
Applications of big data frameworks to sanitizing clinical data
Applications of security frameworks (e.g., blockchain)
© 2020 Bradley Malin 26Data Privacy in Biomedicine: Lecture 1
Final ProjectsCriteria Due Date
% of Grade
Project Proposal: A one-pager that describes the project area and how you intend to address the research within the confines of this semester. This will be broken down into a several phases.
March 11 5%
Status Report Presentation: Briefing for the class on project area and first phase of research. (No more than 5 minutes)
March 25 5%
Written Project Status Report: A summary of the progress you have made (No more than 4 pages).
March 29 10%
Final Project Presentation: Showcase of researchmethods and results. (No more than 15 minutes)
April 20 (last day of class)
5%
Final Project Report: This will be in the form of a conference-style paper. It will summarize the research area, your methodology, experience, and contributions of your work.
April 26 (inlieu of final)
25%
© 2020 Bradley Malin 27Data Privacy in Biomedicine: Lecture 1
Why Do We Need A Course on Privacy?
© 2020 Bradley Malin 28Data Privacy in Biomedicine: Lecture 1
Authentication: login with password, tokens, keys
Authorization: permission and role-based models to read/write data
Encryption: to avoid eavesdropping during transmission and storage
Security for Privacy?
© 2020 Bradley Malin 29Data Privacy in Biomedicine: Lecture 1
Authentication
Authorization
Encryption
But Data Can Re-identify!
Can I see some anonymous
data?
Security for Privacy?
© 2020 Bradley Malin 30Data Privacy in Biomedicine: Lecture 1
Security for Privacy?
Ah! I know who this is!
Authentication
Authorization
Encryption
But Data Can Re-identify!
6
© 2020 Bradley Malin 31Data Privacy in Biomedicine: Lecture 1
Data Privacy Definitions(paraphrase Sweeney)
• Privacy Protection (“data protectors”):• release information such that entity-specific
properties (e.g. identity) are controlled
• restrict what can be learned
• The study of computational solutions for releasing data such that a) the data is practically useful (utility) while b) the aspects of the subjects of the data are not revealed (privacy).
• Data Linkage (“data detectives”)
• combining disparate pieces of entity-specific information to learn more about an entity
© 2020 Bradley Malin 32Data Privacy in Biomedicine: Lecture 1
A Visual Perspective
Utility
Privacy
© 2020 Bradley Malin 33Data Privacy in Biomedicine: Lecture 1
A Visual Perspective
Utility
Privacy
To ensure utility, you must reveal all the
data
© 2020 Bradley Malin 34Data Privacy in Biomedicine: Lecture 1
A Visual Perspective
Utility
Privacy
To ensure privacy, you must not reveal any
data
© 2020 Bradley Malin 35Data Privacy in Biomedicine: Lecture 1
A Visual Perspective
Utility
Privacy
HereLives
Data Privacy
© 2020 Bradley Malin 36Data Privacy in Biomedicine: Lecture 1
A Visual Perspective
Utility
Privacy
7
© 2020 Bradley Malin 37Data Privacy in Biomedicine: Lecture 1
Privacy, Policy, & Preference
Individuals want control over who can – AND CAN NOT – view their health-related records
Physicians
Insurance
Researchers
HOSPITAL RECORDS
YodaData
YodaData
YodaData
Yoda’s PreferencesPhysicians = YesInsurance = Yes
Researchers = No
© 2020 Bradley Malin 38Data Privacy in Biomedicine: Lecture 1
Data Collection, Policy, and Privacy
Can design technology to:Standardize policy specification
Inform about data collection
Address specific privacy concerns in data sharing Anonymity
Confidentiality
Solitude
© 2020 Bradley Malin 39Data Privacy in Biomedicine: Lecture 1
Beyond Policy andInformative Technology
We can not always control who gets, and has access to, our information
Legally, however, data collectors may be required to maintain your privacy
© 2020 Bradley Malin 40Data Privacy in Biomedicine: Lecture 1
Commons Center
Data Collection occurs everywhere, everyday, in all different forms(http://webcams.vanderbilt.edu/)
Ingram Commons
© 2020 Bradley Malin 41Data Privacy in Biomedicine: Lecture 1
Featheringill!
https://engineering.vanderbilt.edu/atrium-cam/
© 2020 Bradley Malin 42Data Privacy in Biomedicine: Lecture 1
More Cameras
http://peabody.vanderbilt.edu/about/webcams/peabody_library_terrace_webcam.php
Peabody Library Terrace
Even More Cameras!http://peabody.vanderbilt.edu/about/webcams/index.phphttp://webcams.vanderbilt.edu/kissam/
8
© 2020 Bradley Malin 43Data Privacy in Biomedicine: Lecture 1
Biomedical Information
Not quite in the public
But… information is shared for various purposes in various contexts
How do you protect privacy of corresponding individuals?
© 2020 Bradley Malin 44Data Privacy in Biomedicine: Lecture 1
Schedule
Let’s look at the syllabus.
© 2020 Bradley Malin 45Data Privacy in Biomedicine: Lecture 1
Privacy Policy & the Law(Week 1)
Privacy ideologies & frameworksWho gets to collect
information?
When is health information shared?
How is health information reused and why?
© 2020 Bradley Malin 46Data Privacy in Biomedicine: Lecture 1
Auditing (Week 2)
Medical Records & Audits Who looked at my medical record?
Should they be looking?
How do we construct machine learning strategies that make sense?
© 2020 Bradley Malin 47Data Privacy in Biomedicine: Lecture 1
Audits, Access Control, & Roles(Weeks 2 and 3)
Access Control Who gets to see information when?
Roles, job functions, & permissions
Formal representations
What Constitutes a “Good” Role Representation of organizational
behavior
Grouping users based on legacy knowledge provided by system administrators
© 2020 Bradley Malin 48Data Privacy in Biomedicine: Lecture 1
Martin Luther King Day
No class on January 20 https://www.huffingtonpost.com/2014/01/20/martin-luther-king-fbi_n_4631112.html
9
© 2020 Bradley Malin 49Data Privacy in Biomedicine: Lecture 1
De-identification & Scrubbing Narratives(Week 4)
How can we detect and suppress “identifiers” from unstructured data (e.g., clinical narratives)?
Welcome to the wonderful world of natural language processing
© 2020 Bradley Malin 50Data Privacy in Biomedicine: Lecture 1
Blockchain(Week 5)
What is a blockchain?
What is it useful for?
Under what conditions can it facilitate healthcare?
© 2020 Bradley Malin 51Data Privacy in Biomedicine: Lecture 1
Identifiability & A Whole Lot of Data(Week 5)
When can we find what was suppressed? Using sample & population
statistics to identify
Techniques to compute distinguishability
It’s hard to protect health information… simply because there’s so much data And there’s alot more data than
what’s in the medical record!
We’ll use Social Security Numbers as an example
© 2020 Bradley Malin 52Data Privacy in Biomedicine: Lecture 1
Record Linkage (Weeks 6 and 7)
Given all the data, how can we link it?
Look at “deterministic” methods, such as rules What are the idiosynchrasies in the health domain that allow them to
work? Enable them to fail?
Look at probabilistic methods based on frequentist and Bayesian statistics
© 2020 Bradley Malin 53Data Privacy in Biomedicine: Lecture 1
Anonymization (Weeks 8 and 9)
If de-identification fails, can we provably protect identity? Yes we can! Welcome to formal models of anonymization
We’ll start with
k-based models Guarantee every shared
record corresponds to at least k people
Efficient algorithms to achieve this goal
Various “types” of data
© 2020 Bradley Malin 54Data Privacy in Biomedicine: Lecture 1
Spring Break!!!
No class on March 2 or 4
10
© 2020 Bradley Malin 55Data Privacy in Biomedicine: Lecture 1
Moving Beyond Anonymity(Weeks 9 - 11)
Hiding in a crowd doesn’t always protect sensitive knowledge How to design algorithms to protect
against homogeneity and inference attacks.
We’ll look at the identifiability concerns associated high-dimensional data with a focus on: genetic information
Statistical methods for identification
Strategies for anonymization of DNA data
© 2020 Bradley Malin 56Data Privacy in Biomedicine: Lecture 1
Ethical Reasoning, the Law, and Privacy(Weeks 9 and 10)
When should you publish on privacy ad vulnerabilities?
Should you disseminate re-identification software or findings?
What is law enforcement’s responsibility?
© 2020 Bradley Malin 57Data Privacy in Biomedicine: Lecture 1
Image and Video Privacy (Week 12)
Images are everywhere in healthcare
Video is becoming more prevalent
How can we remove identifiers from JPEG, MPEG and other multimedia?
© 2020 Bradley Malin 58Data Privacy in Biomedicine: Lecture 1
Epidemiology and Geospatial Privacy (Week 13)
Location data is shared for various purposes, but too much granularity can lead to identification
How does identification occur?
What anonymization strategies work for geocoded and spatial data? When?
© 2020 Bradley Malin 59Data Privacy in Biomedicine: Lecture 1
Privacy Preserving Data Mining(Week 14)
You have data. I have data. We all have data. How can be combine data to reveal results, but no individual records? We’ll look at cryptographic
methods for secure multiparty computation.
Consider “horizontal” (different people different place) vs. “vertical” (same person, different place) partitioned data systems
© 2020 Bradley Malin 60Data Privacy in Biomedicine: Lecture 1
Final Project Presentations!(Week 15)
The students are in control
You’ll be graded by a committee of special reviewers
11
© 2020 Bradley Malin 61Data Privacy in Biomedicine: Lecture 1
Readings for Next Lecture
S. Warren and L. Brandeis. The right to privacy. Harvard Law Review. 1890; V. IV, No. 5. http://faculty.uml.edu/sgallagher/Brandeisprivacy.htm
Department of Health and Human Services Summary of the Privacy Rule of the Health Information Portability and Accountability Act (HIPAA) http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html
Optional D. McGraw, J. Dempsey, L. Haris, and J. Goldman. Privacy as an enabler, not an
impediment: building trust into health information exchange. Health Affairs. 2009; 28(2): 416-427. http://content.healthaffairs.org/content/28/2/416.full.pdf+html
O. Tene and J. Polonetsky. Privacy in the age of big data: a time for big decisions. Stanford Law Review (Online). 2012; 64: 63.