Privacy-preserving Release of Statistics: Differential Privacy
Piotr Mardziel or Anupam Datta, CMU
Fall 2018
18734: Foundations of Privacy
Privacy-Preserving Statistics: Non-Interactive Setting

Goals:
• Accurate statistics (low noise)
• Preserve individual privacy (what does that mean?)

[Diagram: a trusted curator maintains database D = x1 … xn (census data, health data, network data, …), sanitizes it (add noise, sample, generalize, suppress), and releases a sanitized database D′ to the analyst.]
Privacy-Preserving Statistics: Interactive Setting

Goals:
• Accurate statistics (low noise)
• Preserve individual privacy (what does that mean?)

[Diagram: a trusted curator maintains database D = x1 … xn (census data, health data, network data, …); the analyst submits a query f, e.g. "# individuals with salary > $30K", and receives f(D) + noise.]
Some possible defenses
• Anonymize data – re-identification, information amplification
• Queries over large data sets – differencing attack
• Query auditing – refusals can leak information; auditing can be computationally intractable
• Summary statistics – frequency lists
Classical Intuition for Privacy
• "If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place." [Dalenius 1977]
– Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database
• Similar to semantic security of encryption
Impossibility Result [Dwork, Naor 2006]
• Result: For any reasonable notion of "breach", if the sanitized database contains information about the database, then some adversary breaks this definition
• Example
– Terry Gross is two inches shorter than the average Lithuanian woman
– DB allows computing the average height of a Lithuanian woman
– This DB breaks Terry Gross's privacy according to this definition … even if her record is not in the database!
Very Informal Proof Sketch
• Suppose DB is uniformly random
• "Breach" is predicting a predicate g(DB)
– Example: g(DB) = "Terry Gross's height = 6 feet"
• Adversary's background knowledge:
– r, [H(r; San(DB)) ⊕ g(DB)], where H is a suitable hash function, r = H(DB)
– Example: "Terry Gross is two inches shorter than the average Lithuanian woman"
• By itself, does not leak anything about DB
• Together with San(DB), reveals g(DB)
– Example: San(DB) = "average height of a Lithuanian woman"
Differential Privacy: Idea

The released statistic is about the same if any individual's record is removed from the database

[Dwork, McSherry, Nissim, Smith 2006]
An Information Flow Idea

Changing the input database in a specific way changes the output statistic by only a small amount
Not Absolute Confidentiality

Does not guarantee that Terry Gross's height won't be learned by the adversary
Differential Privacy: Definition

A randomized sanitization function κ has ε-differential privacy if for all data sets D1 and D2 differing in at most one element, and all subsets S of the range of κ,

Pr[κ(D1) ∈ S] ≤ e^ε Pr[κ(D2) ∈ S]

The answer to the query "# individuals with salary > $30K" is in the range [100, 110] with approximately the same probability under D1 and D2.
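To make the definition concrete, here is a minimal simulation sketch (not from the slides): it answers the salary-count query on two neighboring databases using Laplace noise of scale 1/ε, the mechanism introduced later in the deck, and estimates the probability of the answer landing in [100, 110]. The database contents and ε are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1

def count_over_30k(salaries):
    return sum(s > 30_000 for s in salaries)

def noisy_count(salaries):
    # f(D) + Laplace noise; scale 1/eps suffices for a counting query
    return count_over_30k(salaries) + rng.laplace(scale=1.0 / eps)

# Neighboring databases: D2 is D1 with one individual removed
d1 = [40_000] * 105 + [20_000] * 50
d2 = d1[1:]

trials = 100_000
p1 = sum(100 <= noisy_count(d1) <= 110 for _ in range(trials)) / trials
p2 = sum(100 <= noisy_count(d2) <= 110 for _ in range(trials)) / trials

# The definition requires p1 <= exp(eps) * p2, and symmetrically p2 <= exp(eps) * p1
print(p1, p2, np.exp(eps))
```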
Achieving Differential Privacy: Interactive Setting

How much and what type of noise should be added?

[Diagram: the user asks the curator of database D = x1 … xn "Tell me f(D)" and receives f(D) + noise.]
Example: Noise Addition
Slide: Adam Smith
Global Sensitivity
Slide: Adam Smith
Exercise
• Function f: # individuals with salary > $30K
• Global sensitivity of f = ?
• Answer: 1
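A brute-force check of this answer, assuming the standard definition GS_f = max |f(D) − f(D′)| over databases differing in one record. Databases are encoded as 0/1 tuples marking whether each salary exceeds $30K; the helper names are made up for illustration.

```python
from itertools import product

# f counts individuals with salary > $30K; encode each record by that bit.
def f(db):
    return sum(db)

# Assumed definition: GS_f = max over neighboring databases D, D' of |f(D) - f(D')|,
# where neighbors differ in a single record.
def brute_force_sensitivity(n):
    worst = 0
    for db in product([0, 1], repeat=n):
        for i in range(n):  # flip record i to get a neighboring database
            neighbor = db[:i] + (1 - db[i],) + db[i + 1:]
            worst = max(worst, abs(f(db) - f(neighbor)))
    return worst

print(brute_force_sensitivity(5))  # -> 1, matching the slide's answer
```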
Background on Probability Theory (see Oct 11, 2013 recitation)
Continuous Probability Distributions
• Probability density function (PDF), f_X
• Example distributions
– Normal (Gaussian), exponential, Laplace
Laplace Distribution
Mean = μ
Variance = 2b²
PDF: f(x | μ, b) = (1 / 2b) exp(−|x − μ| / b)
Source: Wikipedia
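A small sketch (parameter values are illustrative) that evaluates this density and confirms the stated mean and variance by sampling:

```python
import numpy as np

def laplace_pdf(x, mu=0.0, b=1.0):
    # f(x | mu, b) = (1 / 2b) * exp(-|x - mu| / b)
    return np.exp(-np.abs(x - mu) / b) / (2 * b)

rng = np.random.default_rng(0)
mu, b = 0.0, 2.0
samples = rng.laplace(loc=mu, scale=b, size=1_000_000)

print(samples.mean())           # ~ mu
print(samples.var())            # ~ 2 * b**2 = 8
print(laplace_pdf(0.0, mu, b))  # density at the mean = 1 / (2b) = 0.25
```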
Laplace Distribution
Change of notation from the previous slide: x → y, μ → 0, b → λ
Achieving Differential Privacy
Laplace Mechanism
Slide: Adam Smith
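Adam Smith's slide is not reproduced here; as a sketch, the mechanism it refers to releases f(D) plus Laplace noise of scale λ = GS_f / ε, as derived on the "More details" slide below. Function names and the example query are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(db, f, global_sensitivity, eps):
    """Answer f(db) with Laplace noise of scale GS_f / eps (eps-differentially private)."""
    lam = global_sensitivity / eps
    return f(db) + rng.laplace(scale=lam)

# Example: the counting query from the exercise, which has GS_f = 1
salaries = [40_000] * 105 + [20_000] * 50
count_over_30k = lambda db: sum(s > 30_000 for s in db)

print(laplace_mechanism(salaries, count_over_30k, global_sensitivity=1, eps=0.1))
```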
Laplace Mechanism: Proof Idea

Bound the ratio Pr[A(x) = t] / Pr[A(x′) = t]
Laplace Mechanism: More details
• Pr[A(x) ∈ S] = ∫_{t ∈ S} p(A(x) = t) dt
• p(A(x) = t) = p(L = t − f(x)) = h(t − f(x)), where h is the Laplace density with scale λ
• p(A(x) = t) / p(A(x′) = t) = exp(−|t − f(x)| / λ) / exp(−|t − f(x′)| / λ) ≤ exp(|f(x) − f(x′)| / λ) ≤ exp(GS_f / λ)
• Pr[A(x) ∈ S] / Pr[A(x′) ∈ S] = ∫_{t ∈ S} p(A(x) = t) dt / ∫_{t ∈ S} p(A(x′) = t) dt = ∫_{t ∈ S} h(t − f(x)) dt / ∫_{t ∈ S} h(t − f(x′)) dt ≤ exp(GS_f / λ)
• For λ = GS_f / ε, we have ε-differential privacy
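A quick numerical check of the density-ratio bound above (the values of f(x), f(x′), GS_f, and ε are illustrative):

```python
import numpy as np

def laplace_density(t, loc, lam):
    return np.exp(-np.abs(t - loc) / lam) / (2 * lam)

gs_f, eps = 1.0, 0.5
lam = gs_f / eps               # noise scale from the last bullet

fx, fx_prime = 105.0, 104.0    # f on neighboring databases, |fx - fx_prime| <= GS_f
t = np.linspace(80, 130, 10_001)

ratio = laplace_density(t, fx, lam) / laplace_density(t, fx_prime, lam)
print(ratio.max(), np.exp(eps))  # the ratio never exceeds e^eps
```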
Example: Noise Addition
Slide: Adam Smith
Using Global Sensitivity
• Many natural functions have low global sensitivity
– Histogram, covariance matrix, strongly convex optimization problems
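For example, a histogram query: removing one record changes exactly one bin count by 1, so under the remove-one-record notion of neighboring databases its global sensitivity is 1, and every bin can be perturbed with Laplace noise of scale 1/ε. A minimal sketch (bin count and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def private_histogram(values, bins, eps):
    # Removing one record changes exactly one bin by 1, so GS = 1;
    # add Lap(1/eps) noise independently to every bin.
    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / eps, size=counts.shape)
    return noisy, edges

salaries = rng.normal(50_000, 15_000, size=10_000)
noisy_counts, edges = private_histogram(salaries, bins=10, eps=0.5)
print(np.round(noisy_counts, 1))
```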
Composition Theorem
• If A1 is ε1-differentially private and A2 is ε2-differentially private and they use independent random coins, then ⟨A1, A2⟩ is (ε1 + ε2)-differentially private
• Repeated querying degrades privacy; the degradation is quantifiable
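In practice the theorem is used as a privacy "budget": each query spends its own εᵢ, and the total spent is their sum. A minimal accounting sketch (queries and budgets are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_query(db, f, gs, eps):
    return f(db) + rng.laplace(scale=gs / eps)

salaries = [40_000] * 105 + [20_000] * 50

# Two independent eps_i-DP releases; by composition the pair is (eps1 + eps2)-DP.
eps1, eps2 = 0.1, 0.3
a1 = noisy_query(salaries, lambda d: sum(s > 30_000 for s in d), gs=1, eps=eps1)
a2 = noisy_query(salaries, lambda d: sum(s > 50_000 for s in d), gs=1, eps=eps2)

print(a1, a2, "total privacy budget spent:", eps1 + eps2)
```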
Applications
• Netflix data set [McSherry, Mironov 2009; MSR]
– Accuracy of differentially private recommendations (w.r.t. one movie rating) comparable to the baseline set by Netflix
• Network trace data sets [McSherry, Mahajan 2010; MSR]
Challenge: High Sensitivity
• Approach: add noise proportional to sensitivity to preserve ε-differential privacy
• Improvements:
– Smooth sensitivity [Nissim, Raskhodnikova, Smith 2007; BGU-PSU]
– Restricted sensitivity [Blocki, Blum, Datta, Sheffet 2013; CMU]
Challenge: Identifying an Individual's Information
• Information about an individual may not be confined to their own record
– Example: in a social network, information about node A also resides in a node B influenced by A, for example because A may have caused a link between B and C
Differential Privacy: Summary
• An approach to releasing privacy-preserving statistics
• A rigorous privacy guarantee
– Significant activity in the theoretical CS community
• Several applications to real data sets
– Recommendation systems, network trace data, …
• Some challenges
– High sensitivity, identifying an individual's information, repeated querying