Lower Bounds on Noise
Sergey Yekhanin
Institute for Advanced Study
Setting
Database of information about individuals
  E.g. medical history, census data, customer info.
  Need to guarantee confidentiality of individual entries.
Want to make deductions about the database; learn large-scale trends.
  E.g. learn that a drug V increases the likelihood of heart disease.
  Do not leak info about individual patients.
Message
Two approaches to database privacy:
  Interactive: the analyst asks questions; the curator returns approximate answers.
  Non-interactive: publish a “summary” of the database; the analyst can use the summary to get answers.
[Figure: Curator, Analyst, Summary.]
Thesis: The interactive approach is the right way to give good accuracy for a given level of privacy.
  Any non-interactive solution permitting “too accurate” answers to “too many” questions leaks private information.
Plan
Mathematical model of database and queries
Attacks
  Somewhat accurate answers to all queries lead to privacy leakage (Fourier analysis) [Y] (extends [DiNi]).
  Somewhat accurate answers to a fraction of queries lead to privacy leakage (linear programming / polynomial interpolation) [DMT, DY].
Study of privacy leads to a variety of mathematical challenges!
Model
[Dinur-Nissim] Simple model (easily justifiable)
  Database: n-bit binary vector x
  Query: vector a
  True answer: dot product a·x
  Response: a·x + e = true answer + noise
Privacy leakage: attacker learns a certain bit of x.
Blatant non-privacy: attacker learns n − o(n) bits of x.
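To make the model concrete, here is a minimal sketch of a curator answering one query. The function name `noisy_answer`, the +1/−1 query coefficients, and the uniform integer noise are illustrative assumptions; the slides only require that the response equals the true answer plus some noise e.

```python
import random

def noisy_answer(x, a, noise_bound, rng):
    """Return a.x + e, with e an integer drawn uniformly from
    [-noise_bound, noise_bound] (the noise law is an assumption)."""
    true_answer = sum(ai * xi for ai, xi in zip(a, x))
    e = rng.randint(-noise_bound, noise_bound)
    return true_answer + e

rng = random.Random(0)
n = 16
x = [rng.randint(0, 1) for _ in range(n)]      # database: n-bit binary vector
a = [rng.choice((-1, 1)) for _ in range(n)]    # one query vector (assumed +1/-1)
true_answer = sum(ai * xi for ai, xi in zip(a, x))
response = noisy_answer(x, a, noise_bound=2, rng=rng)
```

The attacker only ever sees `response`; the attacks in the following slides ask how large `noise_bound` must be before many such responses stop revealing x.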
Fourier attack
Theorem: If a curator adds o(√n) noise to every response, then an attacker can ask n questions, perform O(n log n) computation, and recover n − o(n) bits of the database.
  Put database records in one-to-one correspondence with elements of the group Z_2^k.
  Think of the database as a function D from Z_2^k to {0,1}.
  Choose queries to ask for the Fourier coefficients of D.
  Noisy Fourier coefficients approximately determine the Boolean function D! (Parseval identity)
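The attack can be sketched in a few lines, assuming the group is Z_2^k with n = 2^k records, so that the Fourier coefficients are the Walsh-Hadamard coefficients of D and each one is the answer to a single +1/−1 query. The helper names, the uniform noise, and the rounding threshold of 1/2 are assumptions of this sketch, not details taken from the slides.

```python
import random

def fwht(v):
    """Unnormalised fast Walsh-Hadamard transform, O(n log n) for n = 2^k."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def fourier_attack(noisy_coeffs):
    """Invert the transform (applying it twice multiplies by n) and round to bits."""
    n = len(noisy_coeffs)
    estimate = [c / n for c in fwht(noisy_coeffs)]
    return [1 if value > 0.5 else 0 for value in estimate]

rng = random.Random(1)
k = 10
n = 2 ** k
database = [rng.randint(0, 1) for _ in range(n)]
noise_bound = 4                       # small compared with sqrt(n) = 32
exact = fwht(database)                # true answers to the n character queries
noisy = [c + rng.uniform(-noise_bound, noise_bound) for c in exact]
recovered = fourier_attack(noisy)
errors = sum(r != d for r, d in zip(recovered, database))
```

By the Parseval identity the squared distance between the estimate and D is at most `noise_bound ** 2`, and every wrongly rounded bit accounts for at least 1/4 of it, so `errors` cannot exceed `4 * noise_bound ** 2`. With noise o(√n) that bound is o(n).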
Linear programming attack
Theorem: If a curator adds o(√n) noise to a 0.761 fraction of responses, then an attacker can ask O(n) questions, perform O(n^3) computation, and recover n − o(n) bits of the database.
  Arbitrarily large error is allowed on an arbitrary and unknown 0.239 fraction of answers.
Linear programming attack
Ask O(n) random +1/−1 questions.
Obtain y = Ax + e, where e is the error vector.
A natural approach to recover x from y:
  Solve: min ||e'||_0 such that y = Ax' + e', x' in R^n (hard!)
Solve a linear program [D, CT, MT]:
  min ||e'||_1 such that y = Ax' + e', x' in R^n
[Figure: y and Ax'.]
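For readers wondering why the ℓ1 problem is a linear program: one standard rewriting, sketched here with auxiliary variables t_i and a question count m that are notation introduced for this note, bounds each residual from both sides.

```latex
\begin{aligned}
\min_{x' \in \mathbb{R}^n,\; t \in \mathbb{R}^m} \quad & \sum_{i=1}^{m} t_i \\
\text{subject to} \quad & -t_i \;\le\; y_i - (Ax')_i \;\le\; t_i, \qquad i = 1, \dots, m .
\end{aligned}
```

At an optimum each t_i equals |y_i − (Ax')_i| = |e'_i|, so the objective equals ||e'||_1. The objective and all constraints are linear in (x', t), so any general-purpose LP solver applies.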
Polynomial interpolation attack
Model: Questions have coefficients as large as O(c).
Theorem: If a curator adds o(c) noise to a 0.501 fraction of responses, then an attacker can ask c questions, perform O(c^4) computation, and reliably recover any particular bit of the database.
  Arbitrarily large error is allowed on an arbitrary and unknown 0.499 fraction of answers.
Polynomial interpolation attack
Assume c is prime.
Think of the space of queries as a linear space F_c^n.
To obtain a reliable answer to the query x = (1, 0, … , 0), draw a degree-two curve through x.
Ask all queries that correspond to points on the curve.
Use polynomial interpolation to carefully combine the answers.
[Figure: a degree-two curve through x, with query points q1, ..., q6 on it.]
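The interpolation step alone can be illustrated with a noise-free toy. The sketch below assumes exact answers reduced mod c, a random curve q(t) = x + t·u + t²·v over F_c, and only three query points; all names are hypothetical. Handling noisy and arbitrarily wrong answers is the hard part of the actual attack and is not shown here.

```python
import random

def lagrange_at_zero(points, p):
    """Value at t = 0 of the unique polynomial of degree < len(points)
    through the given (t, value) pairs, computed over F_p (p prime)."""
    total = 0
    for i, (ti, vi) in enumerate(points):
        num, den = 1, 1
        for j, (tj, _) in enumerate(points):
            if j != i:
                num = num * (-tj) % p
                den = den * (ti - tj) % p
        # Fermat inverse of den; den is nonzero because the t values are distinct
        total = (total + vi * num * pow(den, p - 2, p)) % p
    return total

c = 13                                    # prime, as the slide assumes
n = 8
rng = random.Random(2)
database = [rng.randint(0, 1) for _ in range(n)]
target = [1] + [0] * (n - 1)              # the query x = (1, 0, ..., 0)
u = [rng.randrange(c) for _ in range(n)]  # random curve direction (assumption)
v = [rng.randrange(c) for _ in range(n)]

def curve_point(t):
    """Point q(t) = target + t*u + t^2*v on a degree-two curve through target."""
    return [(target[i] + t * u[i] + t * t * v[i]) % c for i in range(n)]

def exact_answer(query):
    return sum(q * d for q, d in zip(query, database))

points = [(t, exact_answer(curve_point(t)) % c) for t in (1, 2, 3)]
recovered_bit = lagrange_at_zero(points, c)
```

Because the answer is linear in the query, the answers along the curve form a polynomial of degree at most two in t whose value at t = 0 is the first database bit; here `recovered_bit` equals `database[0]` even though the query x itself was never asked.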
Implications
Privacy has a price
  There is no safe way to avoid increasing the noise as the number of queries increases.
Applies to the non-interactive setting
  Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to attack.
  Cannot just output a noisy table.
Non-interactive approach has inherent limitations.
Interactive approach works
  Can also publish a summary, as long as it is clear which stats are accurate and which ones are not.
Future directions:
  Fewer queries
  Understand what can and what cannot be done privately