Marco Gaboardi

Room: 338-B

gaboardi@buffalo.edu

http://www.buffalo.edu/~gaboardi

CSE711 Topics in Differential Privacy

Data

Statistics over Data

Private Queries?

Differential Privacy: motivation

A. Haeberlen

Motivation: Protecting privacy

USENIX Security (August 12, 2011)

19144 02-15-1964 flu
19146 05-22-1955 brain tumor
34505 11-01-1988 depression
25012 03-12-1972 diabetes
16544 06-14-1956 anemia
...

Fair insurance plan?

Resend policy?

Data

I know Bob was born 05-22-1955 in Philadelphia…

A first approach: anonymization.

• Using different correlated anonymized data sets, one can learn private data.
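The weakness of anonymization can be sketched as a small linkage attack on the rows shown in the slide. This is only an illustration under stated assumptions: reading the first column as a ZIP code, the `191` prefix standing for Philadelphia, and the helper name `link` are not taken from the slides.

```python
# Released "anonymized" rows from the slide.
# Assumption: the columns are (ZIP code, date of birth, condition).
anonymized = [
    ("19144", "02-15-1964", "flu"),
    ("19146", "05-22-1955", "brain tumor"),
    ("34505", "11-01-1988", "depression"),
    ("25012", "03-12-1972", "diabetes"),
    ("16544", "06-14-1956", "anemia"),
]

# Side information the attacker already holds about Bob.
# Assumption: "in Philadelphia" is modeled as a ZIP code starting with "191".
aux = {"name": "Bob", "born": "05-22-1955", "zip_prefix": "191"}


def link(rows, born, zip_prefix):
    """Return the conditions of all rows consistent with the side information."""
    return [
        condition
        for zip_code, dob, condition in rows
        if dob == born and zip_code.startswith(zip_prefix)
    ]


matches = link(anonymized, aux["born"], aux["zip_prefix"])
print(matches)  # ['brain tumor']
```

Only one row is consistent with what the attacker knows, so removing names did not protect Bob's record.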

medical correlation?

query

answer

answer

query

Does Joe have cancer?

Anonymization?

medical correlation

query

answer

answer

query

?!?

Anonymization

Anonymous Data

Additional Data

A Possible Solution: Differential Privacy

Differential Privacy: the idea

A. Haeberlen

Promising approach: Differential privacy

USENIX Security (August 12, 2011)

Private data

N(flu, >1955)? 826 ± 10
N(brain tumor, 05-22-1955)? 3 ± 700

Noise

?!?

Differential Privacy: Ensuring that the presence/absence of an individual has a negligible statistical effect on the query’s result.

Trade-off between utility and privacy.

DMNS’06
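One common way to obtain "answer + noise" for a counting query is the Laplace mechanism introduced in DMNS'06. The sketch below assumes that mechanism, which the slides do not name, and a privacy parameter `epsilon`, which also does not appear in the slides: a mechanism M is epsilon-differentially private if Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S] for all data sets D, D' differing in one individual. The function names, the reading of `N(flu, >1955)` as "flu patients born after 1955", and the column layout are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon, rng):
    """Counting query plus Laplace noise of scale 1/epsilon.

    Adding or removing one individual changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1/epsilon.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Rows from the motivation slide; the column meaning is an assumption.
rows = [
    ("19144", "02-15-1964", "flu"),
    ("19146", "05-22-1955", "brain tumor"),
    ("34505", "11-01-1988", "depression"),
    ("25012", "03-12-1972", "diabetes"),
    ("16544", "06-14-1956", "anemia"),
]


def flu_after_1955(row):
    """Assumed meaning of N(flu, >1955): flu patients born after 1955."""
    _, dob, condition = row
    return condition == "flu" and int(dob[-4:]) > 1955


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
for epsilon in (0.1, 1.0):
    print(epsilon, round(noisy_count(rows, flu_after_1955, epsilon, rng), 2))
```

Smaller values of `epsilon` give more noise and therefore more privacy. Floating-point sampling like this is fine for building intuition but is not a hardened implementation.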

query

answer + noise

medical correlation?

answer + noise

query

?!?

Fundamental Law of Information Reconstruction

The release of too many overly accurate statistics leads to privacy violations.

[DinurNissim02]
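A very small instance of this effect is a differencing attack: two exact counts are enough to answer "Does Joe have cancer?". This is a simplified illustration and not the reconstruction attack of [DinurNissim02], which recovers most of a database from many answers that are each only slightly noisy. The records and the helper `exact_count` are hypothetical.

```python
# Hypothetical private table: (name, has_cancer).
private_rows = [
    ("Alice", False),
    ("Bob", True),
    ("Carol", False),
    ("Joe", True),
]


def exact_count(rows, predicate):
    """Answer a counting query with no noise at all."""
    return sum(1 for row in rows if predicate(row))


# Two aggregate queries, neither of which mentions only Joe.
with_joe = exact_count(private_rows, lambda r: r[1])
without_joe = exact_count(private_rows, lambda r: r[1] and r[0] != "Joe")

# The difference of the two exact answers is Joe's private bit.
joe_has_cancer = (with_joe - without_joe) == 1
print(joe_has_cancer)  # True
```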

Privacy vs. Utility

Utility

Privacy

This class: understanding the mathematical and computational meaning of this trade-off.
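To put rough numbers on the trade-off, the sketch below assumes a counting query answered with Laplace noise of scale 1/epsilon, where `epsilon` is a privacy parameter not named in the slides. For that noise distribution the mean absolute error equals the scale, so it can be computed directly. The counts 826 and 3 are the ones shown in the earlier slide; the function names are illustrative.

```python
def expected_abs_noise(epsilon):
    """Mean absolute value of Laplace noise with scale 1/epsilon, which is 1/epsilon."""
    return 1.0 / epsilon


def relative_error(true_count, epsilon):
    """Typical noise magnitude as a fraction of the true count."""
    return expected_abs_noise(epsilon) / true_count


for epsilon in (0.01, 0.1, 1.0):
    print(
        f"epsilon={epsilon}: typical noise {expected_abs_noise(epsilon):.0f}, "
        f"relative error on a count of 826: {relative_error(826, epsilon):.1%}, "
        f"on a count of 3: {relative_error(3, epsilon):.1%}"
    )
```

Stronger privacy (smaller `epsilon`) costs accuracy, and the same noise hurts a query matching 3 people far more than one matching 826.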

Some Official Users

• US Census Bureau - OnTheMap, new releases
• Google - RAPPOR tool for Chrome
• Apple - typing statistics reports
• LeapYear - startup
• …

Syllabus for the course

Location: Davis 113A
Time: Thursday 16:30 - 19:00?
Office Hours: Friday 11:00 - 12:00 or by appointment
Discussion forums: NB and Piazza
Class website: http://www.buffalo.edu/~gaboardi/teaching/CSE711-spring17.html

Course load:
- presenting part of the material,
- commenting on the material presented every week, beforehand on NB and during class,
- working on a project and presenting the results (optional)

Grading

50% - material presentation
50% - engagement and participation in class and on NB and Piazza

Reference Material

Cynthia Dwork and Aaron Roth, “The Algorithmic Foundations of Differential Privacy,” 2014. Linked from the class website.

Salil Vadhan, “The Complexity of Differential Privacy,” 2016. Linked from the class website.

Questions?
