2015.9.28 Differential Privacy


A preliminary story

- A hospital has a database of patient records, each record containing a binary value indicating whether or not the patient has some form of cancer.

- We want to know the total number of patients with cancer? Easy! Just sum these binary values.

Patient   Has cancer
Amy       0
Tom       1
Jack      1

- But what if we know that a particular person is on the list, or is the last entry of the list? Does Jack have cancer? Compute S(3) - S(2), where S(i) denotes the sum over the first i records.
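The differencing step can be sketched in a few lines of Python. This is a minimal illustration using the three records from the table; the function name `S` follows the slide, and its reading as "sum over the first i records" is an assumption.

```python
# Toy records from the example table: (name, has_cancer).
records = [("Amy", 0), ("Tom", 1), ("Jack", 1)]

def S(i):
    """Exact counting query: number of cancer cases among the first i records.

    Assumption: S(i) on the slide means the sum over the first i rows.
    """
    return sum(bit for _, bit in records[:i])

# The attacker knows Jack is the 3rd (last) record and asks two
# innocent-looking aggregate queries.
jack_bit = S(3) - S(2)
print(jack_bit)  # 1, i.e. Jack's private value
```

Neither query mentions Jack, yet their difference is exactly his record.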


A preliminary story

- If f is a randomized query function, for example:

f(i) = count(i) + noise

f(5): {2, 2, 5, 3} and f(4): {2, 2, 5, 3}, with the same probability

then f(5) - f(4) is useless!
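A rough sketch of such a noisy query follows. The noise distribution is an assumption: Laplace noise is used here because the slides name the Laplace distribution later. The `records` list, the `scale` parameter, and the helper `laplace_noise` are hypothetical.

```python
import random

# Hypothetical binary column (1 = has cancer); not data from the slides.
records = [0, 1, 1, 0, 1]

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def f(i, scale=2.0, rng=random):
    """Noisy count over the first i records: count(i) + noise."""
    return sum(records[:i]) + laplace_noise(scale, rng)

rng = random.Random(0)  # seeded only to make the demo repeatable
diffs = [f(5, rng=rng) - f(4, rng=rng) for _ in range(5)]
print(diffs)  # values scatter widely around the true difference of 1
```

Under these assumptions a single difference f(5) - f(4) is dominated by the noise, so it says little about the fifth record, while each individual answer is still centered on the true count.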


GIC Incident [Sweeney 2002]

• Group Insurance Commission (GIC, Massachusetts)
– Collected patient data for ~135,000 state employees.
– Gave the data to researchers and sold it to industry.
– The medical record of the former state governor was identified.

[Figure: patients 1 to n, the GIC (MA), and its database (DB)]

Name      Age   Sex   Zip code   Disease
Bob       69    M     47906      Cancer
Carl      65    M     47907      Cancer
Daisy     52    F     47902      Flu
Emily     43    F     46204      Gastritis
Flora     42    F     46208      Hepatitis
Gabriel   47    F     46203      Bronchitis

Re-identification occurs!

Topic 21: Data Privacy
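Re-identification of this kind can be sketched as a join on the remaining attributes (age, sex, zip code). The snippet below is only an illustration: the `public` list is hypothetical, as is the pairing of names with rows, and `link` is an illustrative helper, not a method described in the slides.

```python
# Released table: names removed, other attributes kept (rows from the slide).
released = [
    (69, "M", "47906", "Cancer"),
    (65, "M", "47907", "Cancer"),
    (52, "F", "47902", "Flu"),
    (43, "F", "46204", "Gastritis"),
    (42, "F", "46208", "Hepatitis"),
    (47, "F", "46203", "Bronchitis"),
]

# HYPOTHETICAL public list carrying names plus the same attributes.
public = [
    ("Bob", 69, "M", "47906"),
    ("Daisy", 52, "F", "47902"),
]

def link(released, public):
    """Match public identities to released rows on (age, sex, zip code)."""
    index = {}
    for age, sex, zip_code, disease in released:
        index.setdefault((age, sex, zip_code), []).append(disease)
    matches = {}
    for name, age, sex, zip_code in public:
        diseases = index.get((age, sex, zip_code), [])
        if len(diseases) == 1:  # a unique match re-identifies the person
            matches[name] = diseases[0]
    return matches

print(link(released, public))  # {'Bob': 'Cancer', 'Daisy': 'Flu'}
```

Removing the name column alone does not help when the combination of the remaining attributes is unique.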


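Definitions

Let $\mathcal{A}$ be a randomized algorithm. Let $D_1, D_2$ be two datasets that differ in at most one entry (we call these databases neighbors).

[Figure: neighboring databases $D_1$ and $D_2$, differing in one entry ($x_i$ versus $x_i'$)]

Definition 1. Let $\varepsilon > 0$. Define $\mathcal{A}$ to be $\varepsilon$-differentially private if for all neighboring databases $D_1, D_2$, and for all (measurable) subsets $S$ of the output range of $\mathcal{A}$, we have

$$\Pr[\mathcal{A}(D_1) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{A}(D_2) \in S],$$

where the probability is taken over the coin tosses of $\mathcal{A}$.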
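To make the definition concrete, the check below enumerates every subset of outputs for a tiny mechanism and tests the inequality in both directions. The mechanism (randomized response on a single bit) is an illustrative choice that does not appear in the slides; here the "database" is one bit, so its only neighbor is the flipped bit. The function names are hypothetical.

```python
import math
from itertools import chain, combinations

def randomized_response(bit, epsilon):
    """Output distribution {output: probability} of a one-bit mechanism that
    reports the true bit with probability e^eps / (1 + e^eps)."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def is_eps_private(dist1, dist2, epsilon, tol=1e-12):
    """Check Pr[A(D1) in S] <= e^eps * Pr[A(D2) in S], and the same with the
    databases swapped, for every subset S of a finite output set."""
    outputs = sorted(set(dist1) | set(dist2))
    subsets = chain.from_iterable(
        combinations(outputs, r) for r in range(len(outputs) + 1)
    )
    bound = math.exp(epsilon)
    for subset in subsets:
        p1 = sum(dist1.get(o, 0.0) for o in subset)
        p2 = sum(dist2.get(o, 0.0) for o in subset)
        if p1 > bound * p2 + tol or p2 > bound * p1 + tol:
            return False
    return True

eps = 0.5
d1 = randomized_response(0, eps)  # neighboring "databases": bit 0 and bit 1
d2 = randomized_response(1, eps)
print(is_eps_private(d1, d2, eps))      # True
print(is_eps_private(d1, d2, eps / 2))  # False: the bound is tight at eps
```

Enumerating subsets only works for small finite output sets; for continuous outputs the inequality is usually verified on densities instead.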


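Observation 2. Because we can switch $D_1$ and $D_2$ interchangeably, Definition 1 implies that

$$e^{-\varepsilon} \, \Pr[\mathcal{A}(D_2) \in S] \le \Pr[\mathcal{A}(D_1) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{A}(D_2) \in S].$$

Since $e^{\varepsilon} \approx 1 + \varepsilon$ for small $\varepsilon$, we have roughly

$$(1 - \varepsilon) \, \Pr[\mathcal{A}(D_2) \in S] \lesssim \Pr[\mathcal{A}(D_1) \in S] \lesssim (1 + \varepsilon) \, \Pr[\mathcal{A}(D_2) \in S].$$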

satisfies


Laplace distribution