Verifying Differentially Private Bayesian Inference
Marco Gaboardi, University of Dundee
Joint work with G. Barthe, G.P. Farina, E.J. Gallego Arias, A. Gordon,…
Differentially Private vs Probabilistic Inference
Differential Privacy
Probabilistic Inference
The goal in machine learning is very often similar to the goal in private data analysis. The learner typically wishes to learn some simple rule that explains a data set. However, she wishes this rule to generalize […] Generally, this means that she wants to learn a rule that captures distributional information about the data set on hand, in a way that does not depend too specifically on any single data point.
C. Dwork & A. Roth, The Algorithmic Foundations of Differential Privacy
Outline
• Bayesian learning
• A language for Bayesian learning
• Differential Privacy
• Combining differential privacy and Bayesian learning
Bayesian Inference
Probabilistic Inference
Probabilistic Program
Prior Distribution → Posterior Distribution
Fun/Tabular Data
Bayes' theorem

P(y|x) = P(x|y) P(y) / P(x)

We have just (perhaps without realising it) derived and used Bayes' theorem to solve a problem.
It inverts a conditional. It follows directly from the product rule:

P(x,y) = P(x|y) P(y) = P(y|x) P(x)

AC50001 - Introduction to machine learning and data mining - Marco Gaboardi - 2014
Posterior = Model (Likelihood) × Prior / Normalization Factor
A simple example: bias of a coin

ID | disease
 1 | 0
 2 | 1
 3 | 0
 4 | 1
 5 | 1
 6 | 1
 7 | 0
 8 | 1
 9 | …
Model/Likelihood: Bernoulli

Observed values: y = (0,1,0,1,1,1,…,0)

The probability mass function f(s,Θ), where Θ is the "bias of the coin", is

f(s,Θ) = Θ if s = 1, 1−Θ if s = 0

This can also be expressed as:

f(s,Θ) = Θ^s (1−Θ)^(1−s)
Prior: Beta distribution

Probability density function:

f(Θ; a,b) = Θ^(a−1) (1−Θ)^(b−1) / B(a,b)

Conjugate to the Bernoulli distribution.
Bias of a coin: Beta(2500,500) vs Beta(1500,1500) (density plots)
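Conjugacy makes the posterior available in closed form: a Beta(a, b) prior combined with Bernoulli observations yields a Beta(a + #ones, b + #zeros) posterior. A minimal sketch (the function name is illustrative, not from the talk):

```python
def beta_bernoulli_update(a, b, observations):
    """Conjugate update: a Beta(a, b) prior and Bernoulli observations
    give a Beta(a + #ones, b + #zeros) posterior."""
    heads = sum(observations)            # number of 1s (e.g. "disease")
    tails = len(observations) - heads    # number of 0s
    return a + heads, b + tails

# Coin-bias example: uniform Beta(1, 1) prior, data y = (0,1,0,1,1,1,0,1)
a_post, b_post = beta_bernoulli_update(1, 1, [0, 1, 0, 1, 1, 1, 0, 1])
# Posterior is Beta(6, 4), with posterior mean 6/10 = 0.6
```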
Another example

ID | Cholesterol
 1 | 4.66
 2 | 5.01
 3 | 3.45
 4 | 4.12
 5 | 6.35
 6 | 4.45
 7 | 6.06
 8 | 4.98
 9 | …

Known variance
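With a known observation variance, the normal prior on the mean is conjugate as well: precisions add, and the posterior mean is a precision-weighted average of the prior mean and the data. A sketch under assumed names and an assumed unit observation variance:

```python
def normal_known_var_update(mu0, var0, obs_var, data):
    """Posterior over a normal mean with known observation variance
    `obs_var`, under a Normal(mu0, var0) prior: the posterior precision
    is the sum of precisions, and the posterior mean is the
    precision-weighted average of prior mean and data."""
    post_var = 1.0 / (1.0 / var0 + len(data) / obs_var)
    post_mu = post_var * (mu0 / var0 + sum(data) / obs_var)
    return post_mu, post_var

# Cholesterol-style data, assuming unit observation variance
mu, var = normal_known_var_update(5.0, 1.0, 1.0, [4.66, 5.01, 3.45, 4.12])
```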
Another example

ID |  Y  |  X
 1 | 4.2 | 1.5
 2 | 2.5 | 0.2
 3 | 2.2 | 2
 4 | 1.1 | 6.5
 5 | 1.5 | 8.2
 6 | 0.5 | 5.6
 7 | 2.0 | 2.3
 8 | 0.8 | 4.3
 9 |  …  |  …
Graphical Models

ID | disease
 1 | 0
 2 | 1
 3 | 0
 4 | 1
 5 | 1
 6 | 1
 7 | 0
 8 | 1
 9 | …
Probabilistic PCF
Probabilistic Semantics
Observe for Conditional Distributions
observe z ⇒ X(z) in Y
Semantics of Observe

Filter and rescaling
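For a finite distribution, "filter and rescaling" can be written directly: drop the worlds inconsistent with the observation, then renormalize. A small sketch (illustrative names, not the talk's semantics formalized):

```python
def observe(dist, pred):
    """Semantics of `observe`: filter out the worlds that violate the
    predicate, then rescale the survivors so the masses sum to 1 again."""
    kept = {x: p for x, p in dist.items() if pred(x)}
    z = sum(kept.values())        # normalization factor (Bayes' denominator)
    if z == 0:
        raise ValueError("conditioning on a zero-probability event")
    return {x: p / z for x, p in kept.items()}

# Two fair coin flips; observe that at least one came up heads (1)
prior = {(c1, c2): 0.25 for c1 in (0, 1) for c2 in (0, 1)}
post = observe(prior, lambda w: w[0] == 1 or w[1] == 1)
# post[(1, 1)] is now 1/3
```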
Differential Privacy
Private Queries?
Differential Privacy: motivation
A. Haeberlen
Motivation: Protecting privacy
2USENIX Security (August 12, 2011)
19144 02-15-1964 flue
19146 05-22-1955 brain tumor
34505 11-01-1988 depression
25012 03-12-1972 diabets
16544 06-14-1956 anemia
...
Fair insurance
plan?
Resend
policy?
DataI know Bob is
born 05-22-1955
in Philadelphia…
A first approach: anonymization.I Using different correlated anonymized data sets one
can learn private data.
query → answer
"Does Critias have cancer?"
"I know he visited Atlantis."
(From Plato's Timaeus dialogue)
Differential Privacy
Differential Privacy: the idea

A. Haeberlen, Promising approach: Differential privacy, USENIX Security (August 12, 2011)

Private data:
N(flu, >1955)?  →  826 ± 10
N(brain tumor, 05-22-1955)?  →  3 ± 700  (Noise: ?!?)

Differential Privacy: ensuring that the presence/absence of an individual has a negligible statistical effect on the query's result.

Trade-off between utility and privacy.
query → answer + noise
medical correlation?
(ε,δ)-Differential Privacy

Definition. Given ε, δ ≥ 0, a probabilistic query Q : db → R is (ε,δ)-differentially private iff for all b1, b2 : db differing in one row and for every S ⊆ R:

Pr[Q(b1) ∈ S] ≤ exp(ε) · Pr[Q(b2) ∈ S] + δ

• Q: a query returning a probability distribution
• ε, δ: privacy parameters
• b1, b2: a quantification over all the databases
• "differing in one row": a notion of adjacency or distance
• S: a quantification over all the possible outcomes
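The definition can be checked exhaustively for a tiny mechanism. Randomized response (report the true bit with probability 3/4, the flipped bit otherwise) on a one-row database is (ln 3, 0)-differentially private; the sketch below (an illustration, not from the talk) verifies the inequality for every outcome set:

```python
import math

def randomized_response(bit, p_truth=0.75):
    """Output distribution of randomized response on one private bit:
    report the true bit with probability p_truth, the flip otherwise."""
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

# Adjacent "databases": a single row holding 0, or holding 1
d0, d1 = randomized_response(0), randomized_response(1)
eps = math.log(3)                     # = log(0.75 / 0.25)

# Check Pr[Q(b1) in S] <= e^eps * Pr[Q(b2) in S] for every outcome set S
for S in (set(), {0}, {1}, {0, 1}):
    p0 = sum(d0[o] for o in S)
    p1 = sum(d1[o] for o in S)
    assert p0 <= math.exp(eps) * p1 + 1e-12
    assert p1 <= math.exp(eps) * p0 + 1e-12
```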
ε-Differential Privacy

Definition. Given ε ≥ 0, a probabilistic query Q : db → R is ε-differentially private iff for all b1, b2 : db differing in one row and for every S ⊆ R:

Pr[Q(b1) ∈ S] ≤ exp(ε) · Pr[Q(b2) ∈ S]

Let's substitute a concrete instance:

Pr[Q(b∪{x}) ∈ S] ≤ exp(ε) · Pr[Q(b) ∈ S]

Let's use the two quantifiers:

exp(−ε) · Pr[Q(b) ∈ S] ≤ Pr[Q(b∪{x}) ∈ S] ≤ exp(ε) · Pr[Q(b) ∈ S]
ε-Differential Privacy

For a probabilistic Q : db → R, compare Q(b) and Q(b∪{x}). Equivalently:

log( Pr[Q(b∪{x}) ∈ S] / Pr[Q(b) ∈ S] ) ≤ ε

and so, writing the probabilities as integrals over a bad event B:

log( ∫_B Q(b∪{x}) / ∫_B Q(b) ) ≤ ε

Probability of a bad event:

Dataset         | bad event
without my data | 16%
with my data    | 16.3%
(ε,δ)-Differential Privacy: Pr[Q(b′) ∈ S] ≤ e^ε · Pr[Q(b) ∈ S] + δ, for b ~₁ b′
Differentially Private Program = Program + Noise

Sensitivity: for adjacent inputs, P(b) → Result and P(b′) → Result′, with |Result − Result′| ≤ K.

Calibrating the Noise on the Sensitivity
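The standard way to calibrate noise to sensitivity is the Laplace mechanism: add Laplace noise with scale sensitivity/ε. A sketch (illustrative names; the inverse-CDF sampler stands in for a library Laplace sampler):

```python
import math
import random

def laplace_mechanism(query_result, sensitivity, eps):
    """Add Laplace noise with scale sensitivity/eps; for a query whose
    result changes by at most `sensitivity` between adjacent databases,
    this release is eps-differentially private."""
    scale = sensitivity / eps
    u = random.random() - 0.5                     # u in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return query_result + noise

# A counting query ("how many rows have the disease?") has sensitivity 1
noisy_count = laplace_mechanism(826, sensitivity=1.0, eps=0.1)
```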
Noise(k/ε) added to a k-sensitive program gives an ε-Differentially Private Program.
HOARe2: Higher Order Approximate Relational Refinement Types for DP
(Barthe et al., POPL'15)

Components:
• Relational Refinement Types
• Higher Order Refinements
• Partiality Monad
• Semantic Subtyping
• Approximate Equivalence for Distributions
reasoning about two runs of a program
Program Logic

Precondition X → Program P → Postcondition Y

Example: x ≥ 0 → exp → y ≥ 1

Refinement Type

P : {x | Pre(x)} → {y | Post(y)}

Logical predicates.

Example: exp : {x | x ≥ 0} → {y | y ≥ 1}
Differential privacy as a Relational Property

Run Program P twice: X1 → Y1 and X2 → Y2.

X1 and X2 differ in the presence or absence of an individual.

Pr[Y1 ∈ S] ≤ exp(ε) · Pr[Y2 ∈ S]
HOARe2: Relational Refinement Types

P : {x | Pre(x1,x2)} → {y | Post(y1,y2)}

Logical relations.

Example (monotonicity of the exponential): exp : {x | x1 ≤ x2} → {y | y1 ≤ y2}
reasoning about DP
Lifting of a relation P

Given dist. μ1 over A and μ2 over B: μ1 P* μ2 iff there exists a dist. μ over A×B s.t.

• μ(x) > 0 implies x ∈ P
• π1μ ≤ μ1 and π2μ ≤ μ2

(ε,δ)-Lifting of P: μ1 P*ε,δ μ2 iff there exists a dist. μ over A×B s.t.

• μ(x) > 0 implies x ∈ P
• π1μ ≤ μ1 and π2μ ≤ μ2
• max_E( πiμ(E) − e^ε μi(E), μi(E) − e^ε πiμ(E) ) ≤ δ
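The δ in the last condition is an approximate max divergence between marginals. For finite distributions, the smallest δ for a given ε can be computed outcome by outcome, since the worst event collects exactly the outcomes where one mass exceeds e^ε times the other. A sketch with illustrative names:

```python
import math

def delta_for_eps(mu1, mu2, eps):
    """Smallest delta with mu1(E) <= e^eps * mu2(E) + delta for every
    event E over a finite outcome space (one-sided divergence)."""
    outcomes = set(mu1) | set(mu2)
    return sum(max(0.0, mu1.get(x, 0.0) - math.exp(eps) * mu2.get(x, 0.0))
               for x in outcomes)

# Randomized-response outputs on adjacent bits are (ln 3)-close:
mu1 = {0: 0.75, 1: 0.25}
mu2 = {0: 0.25, 1: 0.75}
# delta_for_eps(mu1, mu2, math.log(3)) ≈ 0; with eps = 0 it is 0.5
```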
Verifying Differential Privacy

If we can conclude

C : {x : db | d(x1,x2) ≤ 1} → {y : O | y1 =*ε,δ y2}

then C is (ε,δ)-differentially private.
Programming Languages for Differentially Private Probabilistic Inference

Differential Privacy
Probabilistic Inference
Programming Language Tools
Adding noise

Probabilistic Program (Prior Distribution → Posterior Distribution) + Noise = Differentially Private Probabilistic Program
Adding noise on the data

Probabilistic Program: Prior Distribution → Posterior Distribution
An example

function privBerInput (l: B list) (p1: R) (p2: R) : M[(0,1)] {
  let function vExp (l: B list) : M[B list] {
    match l with
    | nil   -> mreturn nil
    | x::xs -> coercion (exp eps ((0,0)->1, (0,1)->0, (1,1)->1, (1,0)->0) x)
               :: (vExp xs)
  } in
  mlet nl = (vExp l) in
  let prior = mreturn(beta(p1,p2)) in
  let function Ber (l: B list) (p: M[(0,1)]) : M[(0,1)] {
    match l with
    | nil   -> ran(p)
    | x::xs -> observe y => y = x in (Ber xs p)
  } in
  mreturn(infer (Ber nl prior))
}
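The same input-perturbation pipeline can be mirrored in ordinary Python: randomize each input bit, then run the usual Beta-Bernoulli inference on the noisy list. Here a symmetric randomized-response step stands in for the slide's exp eps coercion, and all names are illustrative:

```python
import math
import random

def randomize_bit(x, eps):
    """Randomized response: keep the true bit with probability
    e^eps / (e^eps + 1), flip it otherwise; eps-DP for each bit."""
    p_keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return x if random.random() < p_keep else 1 - x

def priv_ber_input(bits, a, b, eps):
    """Input perturbation: noise the data first, then run the usual
    (exact) Beta-Bernoulli conjugate inference on the noisy bits."""
    noisy = [randomize_bit(x, eps) for x in bits]
    heads = sum(noisy)
    return a + heads, b + len(noisy) - heads   # posterior Beta parameters

random.seed(1)
a_post, b_post = priv_ber_input([0, 1, 0, 1, 1, 1, 0, 1], a=1, b=1, eps=1.0)
```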
Adding noise on the output

Probabilistic Program: Prior Distribution → Posterior Distribution + Noise
Probabilistic Inference
DP for Probabilistic Programs
Posterior Distribution
Releasing the Parameters
Sampling from the Distribution
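One way to release the posterior's parameters privately is the exponential mechanism, which samples an output with probability proportional to exp(ε · score / (2Δ)), where Δ bounds the score's sensitivity. A generic sketch over a finite candidate set (names are illustrative; a real posterior release would score candidates by a distance over distributions such as Hellinger):

```python
import math
import random

def exponential_mechanism(candidates, score, epsilon, sensitivity=1.0):
    """Sample a candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    weights = [math.exp(epsilon * score(c) / (2.0 * sensitivity)) for c in candidates]
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point rounding
```

Higher-scoring candidates are exponentially more likely to be released, so utility degrades gracefully as ε shrinks.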
An example: a distance over distributions

function privBerInput (l: B list) (p1: R) (p2: R) : M[(0,1)] {
  let function hellingerDistance (a0: R) (b0: R) (a1: R) (b1: R) : R {
    let gamma (r: R) = (r-1)! in
    let betaf (a: R) (b: R) = (gamma(a)*gamma(b))/gamma(a+b) in
    let num   = betaf ((a0+a1)/2.0) ((b0+b1)/2.0) in
    let denom = Math.Sqrt((betaf a0 b0)*(betaf a1 b1)) in
    Math.Sqrt(1.0 - (num/denom))
  } in
  let function score (input: M[(0,1)]) (output: M[(0,1)]) : R {
    let beta(a0,b0) = input in
    let beta(a1,b1) = output in
    (-1.0) * (hellingerDistance a0 b0 a1 b1)
  } in
  let prior = mreturn (beta(p1, p2)) in
  let function Ber (l: B list) (p: M[(0,1)]) : M[(0,1)] {
    match l with
    | nil   -> ran(p)
    | x::xs -> observe y => y = x in (Ber xs p)
  } in
  exp eps score (infer (Ber l prior))
}

Noise
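The closed form used by hellingerDistance above, the Hellinger distance between two Beta distributions expressed through the Beta function, can be sketched in Python; math.lgamma replaces the factorial-based gamma for numerical stability:

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) computed via log-gamma to avoid overflow."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def hellinger_beta(a0, b0, a1, b1):
    """Hellinger distance between Beta(a0, b0) and Beta(a1, b1)."""
    num = beta_fn((a0 + a1) / 2.0, (b0 + b1) / 2.0)
    denom = math.sqrt(beta_fn(a0, b0) * beta_fn(a1, b1))
    # clamp guards against tiny negative values from floating-point error
    return math.sqrt(max(0.0, 1.0 - num / denom))
```

Identical parameters give distance 0, and posteriors that concentrate far from each other approach distance 1, which is why the negated distance is a sensible score for the exponential mechanism.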
More general: lifting of P
Relation P(x1, x2) → Relation over distributions P*(x1, x2)
Accuracy

Different ways of adding noise can have different accuracy.
- We have theoretical accuracy (how to integrate it in a framework for reasoning about DP?)
- We have experimental accuracy (we need a framework for testing our programs)
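The experimental side can be as simple as a Monte Carlo estimate of a mechanism's error; for Laplace noise the mean absolute error should match the theoretical noise scale 1/ε, so simulation and theory can be checked against each other. An illustrative sketch (not from the talk):

```python
import math
import random

def laplace_release(x, epsilon, sensitivity=1.0):
    """Release x with Laplace noise of scale sensitivity/epsilon."""
    b = sensitivity / epsilon
    u = random.uniform(-0.5, 0.5)
    return x - b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def empirical_mae(x, epsilon, trials=20000):
    """Monte Carlo estimate of mean absolute error; theory predicts 1/epsilon."""
    return sum(abs(laplace_release(x, epsilon) - x) for _ in range(trials)) / trials
```

Running empirical_mae with ε = 1 should yield a value close to 1.0, the theoretical MAE of Laplace noise with unit scale.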
Probabilistic Program
Prior Distribution
Posterior Distribution
A Nice Playground!
Thank you very much!