The View from Computation and Algorithms
Andrew Olney
University of Memphis
This Session
• Una-May O’Reilly
– MOOCs: Research collaboration, data privacy, and the role of technology
• Shuangbao Wang
– The illusion of privacy in an age of cyberinsecurity
• Solon Barocas
– Big data and unexpected threats to privacy
My Background
• Research
– Language, Education, AI
• Data
– Video, Speech, Motion, Posture, Text, EEG, Eyetracking, Learning, Decisions/Judgments
• Admin
MOOCdb (Una-May)
• Open-ended standard data description
• Enable cross-course analysis
Video (Shuangbao)
• Automated video content analysis (inVideo)
– Audio: keywords/language patterns
– Video: reference pictures/knowledge
• inVideo could be applied to provide rich data on videos, turn them into more effective learning tools, and improve MOOCs
Privacy Threats (Solon)
• Benefits
– Scientific knowledge
– Decision making
– Self knowledge
• Privacy protections must be sufficient to enable benefits
• Problems
– Anonymity is an oxymoron
  • An identifier is an identifier
  • De-anonymization
  • Inference
– Informed consent cannot be guaranteed
– Tyranny of the minority: the Target case
• Risk assessment
Focus Questions
• Threats/harms
– De-anonymization
– Public perception/discouragement
• Potential value
– Scientific knowledge
– Decision making
– Self knowledge
• What IRB should do
Deanonymization
• Encryption?
• Self-identification
– AOL’s 4417749
• Cross-comparison
– Netflix (external)
– Target (internal)
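The cross-comparison threat can be illustrated with a minimal sketch. This is not a reconstruction of the Netflix or Target cases; every record, name, and column below is made up, and the exact-match join on `zip`, `birth_year`, and `gender` is only one simple assumption about how two data sets might be lined up.

```python
# Hypothetical "anonymized" release: names removed, other attributes kept.
released = [
    {"record_id": "r1", "zip": "38152", "birth_year": 1990, "gender": "F", "grade": 91},
    {"record_id": "r2", "zip": "38152", "birth_year": 1985, "gender": "M", "grade": 78},
    {"record_id": "r3", "zip": "38111", "birth_year": 1990, "gender": "F", "grade": 64},
]

# Hypothetical external source that still carries names.
external = [
    {"name": "Alice Example", "zip": "38152", "birth_year": 1990, "gender": "F"},
    {"name": "Bob Example", "zip": "38152", "birth_year": 1985, "gender": "M"},
    {"name": "Carol Example", "zip": "38104", "birth_year": 1972, "gender": "F"},
]

# Attributes assumed to appear in both sources (illustrative choice).
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(row):
    """Project a row onto the shared attributes."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def link(released_rows, external_rows):
    """Map record_id -> name for released rows that match exactly one external row."""
    names_by_key = {}
    for person in external_rows:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for row in released_rows:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # an unambiguous match re-identifies the record
            matches[row["record_id"]] = candidates[0]
    return matches


reidentified = link(released, external)
print(reidentified)
```

In this toy data two of the three released records pick up a name even though the release itself contains none, which is the sense in which the second data set, not the first, does the identifying.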
Identifiability
• How much “encryption” is enough?
– Time vs. set size
• Is it possible to guarantee?
– Relative to data type
– Relative to cross-comparison
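One possible reading of "set size" is the size of the anonymity set: how many records share the same combination of attributes, as in k-anonymity. That reading is an assumption, not something the slides spell out. Under it, a minimal sketch with made-up records shows the smallest set shrinking as more attributes are considered together.

```python
from collections import Counter

# Hypothetical records: (zip, birth_year, gender). All values are made up.
records = [
    ("38152", 1990, "F"),
    ("38152", 1990, "F"),
    ("38152", 1985, "F"),
    ("38152", 1985, "M"),
    ("38111", 1990, "F"),
    ("38111", 1990, "F"),
    ("38111", 1990, "M"),
    ("38111", 1990, "M"),
]


def anonymity_set_sizes(rows, columns):
    """Count how many rows share each combination of the chosen columns."""
    return Counter(tuple(row[c] for c in columns) for row in rows)


def smallest_set(rows, columns):
    """Size of the smallest anonymity set (the k in k-anonymity)."""
    return min(anonymity_set_sizes(rows, columns).values())


k_zip = smallest_set(records, [0])            # zip only
k_zip_year = smallest_set(records, [0, 1])    # zip + birth year
k_all = smallest_set(records, [0, 1, 2])      # zip + birth year + gender

print(k_zip, k_zip_year, k_all)
```

Here the smallest set goes from 4 to 2 to 1 as attributes are added; a set of size 1 means some record is unique on those attributes alone. Any guarantee is therefore relative to which attributes an observer can compare against, which matches the "relative to cross-comparison" point.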
Identifiable data types
• Important characteristics
– Stationary
– Distinctive
• Face
• Vocal tracts
• Movement
• Word choice
Cross-comparison
Threats
• Deanonymization very real
– Low dimensionality data set with “vanilla” indicators
• “Real World” data makes it worse
– More chance of cross-comparison
– But this is where the interesting questions are
What should IRB do?
• Risk analysis – centered
– Worst case scenario considered for privacy/confidentiality breach
• How will data be shared
– Is public ‘anonymized’ warranted?
– Restricted-use
Questions?
http://andrewmolney.name