Genomic data privacy
Genomic data are increasingly collected, stored, and shared in research and clinical environments
Genomic data are person-specific, yet there exists no public registry that maps genomes to the names of individuals
Genomic data are not specified as an identifying patient attribute under the HIPAA Privacy Rule and may be released for public research purposes
How can person-specific DNA be shared so that it cannot be linked to the explicit identity of the person it came from?
Data sharing scenario
John Smith is admitted to a local hospital, which stores his clinical and DNA information
John visits other hospitals
The hospital forwards certain DNA data to a research group, along with the institution and the pseudonyms of the patients
The hospital sends an identified discharge record to a state-controlled database
Data at a specific location
Identified table of patient demographics
De-identified DNA sequences
Can we uniquely link identified data to DNA data?
Data at multiple locations
Each site has an identified table and de-identified DNA sequences
Can we uniquely link identified data to DNA data?
Trails
The set of locations each patient visited is called a trail
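Trails can be tracked across sites and matched to link DNA data to identified data, since a patient's DNA and demographic records leave the same trail when each site publishes both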
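A minimal sketch of how trails could be built from per-site releases. It assumes each hospital publishes a list of patient names (identified table) and a list of DNA records, and that the same person's DNA can be recognized across sites, modelled here as a shared hypothetical sample ID. All site names, patient names, and IDs are made up for illustration.

```python
from collections import defaultdict

# Hypothetical per-site releases (illustrative data only).
identified_releases = {
    "H1": ["alice", "bob", "carol"],
    "H2": ["alice", "carol"],
    "H3": ["bob"],
}
dna_releases = {
    "H1": ["dna_x", "dna_y", "dna_z"],
    "H2": ["dna_x", "dna_z"],
    "H3": ["dna_y"],
}

def build_trails(releases):
    """Map each entity (name or DNA ID) to the frozenset of sites that released it."""
    trails = defaultdict(set)
    for site, entities in releases.items():
        for entity in entities:
            trails[entity].add(site)
    # frozenset makes trails hashable, so they can later be used as dict keys
    return {entity: frozenset(sites) for entity, sites in trails.items()}

name_trails = build_trails(identified_releases)
dna_trails = build_trails(dna_releases)
print(sorted(name_trails["alice"]))  # ['H1', 'H2']
print(sorted(dna_trails["dna_y"]))   # ['H1', 'H3']
```

Representing a trail as an unordered set reflects the definition above: only which locations were visited matters, not the order of visits.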
REIDIT-Complete
Re-identification of data in trails (REIDIT) for complete publishing
If a trail in the DNA data matches exactly one trail in the identified data, then a re-identification has occurred
Reserved publishing: data releasers can reserve (withhold) certain information, e.g., N is reserved to P vs. P is reserved to N
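For the complete-publishing case, the unique-match rule can be sketched as follows. This is an illustrative reading of the rule, not the original REIDIT implementation: it groups records by their exact trail and links a DNA record to a name only when exactly one DNA record and exactly one name share that trail. The function name and the data are hypothetical.

```python
from collections import defaultdict

def reidit_complete(dna_trails, name_trails):
    """Link DNA records to names when an identical trail is unique on both sides."""
    dna_by_trail = defaultdict(list)
    name_by_trail = defaultdict(list)
    for dna_id, trail in dna_trails.items():
        dna_by_trail[trail].append(dna_id)
    for name, trail in name_trails.items():
        name_by_trail[trail].append(name)
    links = {}
    for trail, dna_ids in dna_by_trail.items():
        names = name_by_trail.get(trail, [])
        # A unique trail match on both sides counts as a re-identification
        if len(dna_ids) == 1 and len(names) == 1:
            links[dna_ids[0]] = names[0]
    return links

# Hypothetical trails: alice and carol share a trail, so they stay ambiguous.
dna_trails = {
    "dna_x": frozenset({"H1", "H2"}),
    "dna_y": frozenset({"H1", "H3"}),
    "dna_z": frozenset({"H1", "H2"}),
}
name_trails = {
    "alice": frozenset({"H1", "H2"}),
    "bob": frozenset({"H1", "H3"}),
    "carol": frozenset({"H1", "H2"}),
}
print(reidit_complete(dna_trails, name_trails))  # {'dna_y': 'bob'}
```

Patients who share a trail with others are protected only by that crowd; a patient whose trail is unique is linked immediately.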
REIDIT-Incomplete
REIDIT for reserved publishing
For each trail in the track with incomplete trails, if there is only one supertrail (a trail containing all of its locations) in the other track, then a re-identification has occurred
Remove the re-identified supertrail; this is important because a trail can be a supertrail to many trails
Repeat the process until no further re-identifications occur
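The iterative supertrail procedure above can be sketched in Python. This is a hedged illustration, assuming the identified track is complete and the DNA track has incomplete trails (some locations reserved); the function name and data are hypothetical. Each pass links any incomplete trail with exactly one remaining supertrail, removes that supertrail, and the scan repeats until a pass makes no new link.

```python
def reidit_incomplete(incomplete_trails, complete_trails):
    """Iteratively link incomplete trails that have exactly one remaining supertrail."""
    remaining = dict(complete_trails)   # candidate supertrails not yet claimed
    unlinked = dict(incomplete_trails)
    links = {}
    changed = True
    while changed:
        changed = False
        for rec_id, trail in list(unlinked.items()):
            # trail <= ctrail is the subset test: ctrail is a supertrail of trail
            supers = [cid for cid, ctrail in remaining.items() if trail <= ctrail]
            if len(supers) == 1:
                links[rec_id] = supers[0]
                del remaining[supers[0]]   # a supertrail can cover many trails, so remove it
                del unlinked[rec_id]
                changed = True
    return links

name_trails = {  # complete track (hypothetical)
    "alice": frozenset({"H1", "H2", "H3"}),
    "bob": frozenset({"H1", "H2"}),
    "carol": frozenset({"H2", "H4"}),
}
dna_trails = {   # incomplete track: some locations reserved (hypothetical)
    "d3": frozenset({"H2"}),
    "d2": frozenset({"H1"}),
    "d1": frozenset({"H3"}),
}
links = reidit_incomplete(dna_trails, name_trails)
print(links)  # {'d1': 'alice', 'd2': 'bob', 'd3': 'carol'}
```

In this example only d1 is linkable on the first pass; removing alice leaves bob as the sole supertrail of d2, and removing bob then exposes d3. This cascade is why removing re-identified supertrails and repeating matters.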
REIDIT-Incomplete
Figure: re-identification results for probabilities of reserving information of 0.0, 0.1, 0.5, and 0.9, with hospitals ranked by number of patients
Comments and open issues
Can k-anonymity solve the problem?
Pseudonyms are subject to dictionary attacks; how can the data be linked without pseudonyms?
Genomic data protection methods that take into account the utility of the genomic data
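The dictionary-attack concern about pseudonyms can be illustrated with a small sketch. It assumes a hypothetical pseudonymization scheme (an unsalted SHA-256 hash of the normalized patient name) and an attacker holding a list of candidate names, such as a public roster; neither the scheme nor the candidate list comes from the slides. Hashing every candidate and looking up the released pseudonyms recovers any name that appears in the list.

```python
import hashlib

def pseudonym(name: str) -> str:
    """Hypothetical scheme: unsalted SHA-256 of the lowercased, stripped name."""
    return hashlib.sha256(name.strip().lower().encode("utf-8")).hexdigest()

def dictionary_attack(pseudonyms, candidate_names):
    """Recover names by hashing each candidate and matching released pseudonyms."""
    lookup = {pseudonym(n): n for n in candidate_names}
    return {p: lookup[p] for p in pseudonyms if p in lookup}

released = [pseudonym("John Smith"), pseudonym("Jane Doe")]
candidates = ["Mary Major", "John Smith", "Richard Roe"]  # hypothetical public list
recovered = dictionary_attack(released, candidates)
print(list(recovered.values()))  # ['John Smith']
```

Because the hash is deterministic and unkeyed, anyone who can guess the input can reproduce the pseudonym; this is the weakness behind the open question of linking data without pseudonyms.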