Post-processing outputs for better utility
CompSci 590.03, Instructor: Ashwin Machanavajjhala
Lecture 10 : 590.03 Fall 12


Page 1: Post-processing outputs for better utility


Post-processing outputs for better utility

CompSci 590.03Instructor: Ashwin Machanavajjhala

Page 2: Post-processing outputs for better utility


Announcement

• Project proposal submission deadline is Fri, Oct 12 noon.

Page 3: Post-processing outputs for better utility


Recap: Differential Privacy

For every pair of inputs D1, D2 that differ in one value, and for every output O:

log( Pr[A(D1) = O] / Pr[A(D2) = O] ) < ε    (ε > 0)

The adversary should not be able to distinguish between any D1 and D2 based on any output O.

Page 4: Post-processing outputs for better utility


Recap: Laplace Distribution

[Figure: density of the Laplace distribution Lap(λ), centered at 0]

The researcher sends query q to the database; instead of the true answer q(d), the noisy answer q(d) + η is returned, with noise η drawn from Lap(λ):

h(η) ∝ exp(−|η| / λ)

Privacy depends on the λ parameter.

Mean: 0, Variance: 2λ²

Page 5: Post-processing outputs for better utility


Recap: Laplace Mechanism

Thm: If the sensitivity of the query is S, then the Laplace mechanism with λ = S/ε guarantees ε-differential privacy.

Sensitivity: smallest number S(q) s.t. for any d, d’ differing in one entry,

|| q(d) – q(d’) || ≤ S(q)

Histogram query: Sensitivity = 2
• Variance / error on each entry = 2×4/ε² = O(1/ε²)
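As a concrete sketch, the mechanism fits in a few lines (the function name `laplace_mechanism` is illustrative; `numpy`'s Laplace sampler takes the scale λ directly):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Add Lap(S/eps) noise to a (possibly vector-valued) query answer."""
    rng = np.random.default_rng() if rng is None else rng
    lam = sensitivity / epsilon  # λ = S/ε, per the Laplace mechanism
    return true_answer + rng.laplace(0.0, lam, size=np.shape(true_answer))

# Histogram query has sensitivity 2: one changed record moves
# one count up by 1 and another count down by 1.
hist = np.array([20.0, 10.0, 8.0, 8.0, 8.0, 5.0, 3.0, 2.0])
noisy_hist = laplace_mechanism(hist, sensitivity=2.0, epsilon=0.5)
```

Each entry then carries noise with variance 2λ² = 2×(2/ε)² = O(1/ε²), matching the bound above.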

Page 6: Post-processing outputs for better utility


This class
• What is the optimal method to answer a batch of queries?

Page 7: Post-processing outputs for better utility


How to answer a batch of queries?
• Database of values {x1, x2, …, xk}
• Query set:
  – Value of x1:      η1 = x1 + δ1
  – Value of x2:      η2 = x2 + δ2
  – Value of x1 + x2: η3 = x1 + x2 + δ3
• But we know that η1 and η2 should sum up to η3!

Page 8: Post-processing outputs for better utility


Two Approaches
• Constrained inference
  – Ensure that the returned answers are consistent with each other.
• Query Strategy
  – Answer a different set of strategy queries A
  – Answer original queries using A
  – Examples: Universal Histograms, Wavelet Mechanism, Matrix Mechanism


Page 10: Post-processing outputs for better utility


Constrained Inference

Page 11: Post-processing outputs for better utility


Constrained Inference
• Let x1 and x2 be the original values. We observe noisy values η1, η2 and η3.
• We would like to reconstruct the best estimates y1 (for x1), y2 (for x2), and y3 (for x1 + x2) from the noisy values.
• That is, we want to find the values of y1, y2, y3 such that:

  min (y1 − η1)² + (y2 − η2)² + (y3 − η3)²
  s.t. y1 + y2 = y3
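For this three-query example the constrained least-squares problem has a closed form: each noisy answer is shifted by a third of the constraint violation η1 + η2 − η3 (a small sketch; the function name is mine):

```python
import numpy as np

def consistent_estimates(eta1, eta2, eta3):
    """Minimize (y1-η1)² + (y2-η2)² + (y3-η3)² subject to y1 + y2 = y3.

    Setting the Lagrangian's gradient to zero gives
    y1 - η1 = y2 - η2 = η3 - y3 = -gap, with gap = (η1 + η2 - η3) / 3."""
    gap = (eta1 + eta2 - eta3) / 3.0
    return eta1 - gap, eta2 - gap, eta3 + gap

y1, y2, y3 = consistent_estimates(4.2, 2.9, 8.1)
assert abs(y1 + y2 - y3) < 1e-12  # answers are now consistent
```

The projection also reduces error: y1 = (2η1 − η2 + η3)/3, so with i.i.d. noise each estimate has variance (2/3) of the original noise variance.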

Page 12: Post-processing outputs for better utility


Constrained Inference [Hay et al VLDB 2010]

Page 13: Post-processing outputs for better utility


Sorted Unattributed Histograms
• Counts of diseases
  – (without associating a particular count to the corresponding disease)
• Degree sequence: list of node degrees
  – (without associating a degree to a particular node)
• Constraint: the values are sorted

Page 14: Post-processing outputs for better utility


Sorted Unattributed Histograms

True values:  20, 10, 8, 8, 8, 5, 3, 2
Noisy values: 25, 9, 13, 7, 10, 6, 3, 1 (noise from Lap(1/ε))

The noisy values are no longer sorted; the constrained estimate must restore the ordering.

Page 15: Post-processing outputs for better utility


Sorted Unattributed Histograms

Page 16: Post-processing outputs for better utility


Sorted Unattributed Histograms
• n: number of values in the histogram
• d: number of distinct values in the histogram
• ni: number of times the ith distinct value appears in the histogram
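The constrained estimate under a sortedness constraint is an isotonic regression. Here is a minimal pool-adjacent-violators sketch (my own illustration, not the paper's exact formulation), run on the noisy values from the earlier slide:

```python
def isotonic_decreasing(values):
    """Least-squares fit subject to a non-increasing constraint,
    via pool-adjacent-violators: merge adjacent blocks (replacing
    them by their mean) while the ordering is violated."""
    blocks = []  # each block is [total, count]; its value is total/count
    for v in values:
        blocks.append([float(v), 1])
        # Violation: previous block mean < current block mean.
        while len(blocks) > 1 and \
                blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    out = []
    for total, count in blocks:
        out.extend([total / count] * count)
    return out

noisy = [25, 9, 13, 7, 10, 6, 3, 1]
fitted = isotonic_decreasing(noisy)
# → [25.0, 11.0, 11.0, 8.5, 8.5, 6.0, 3.0, 1.0]
```

Note that the fit restores sortedness (9, 13 become 11, 11; 7, 10 become 8.5, 8.5) while preserving the overall sum.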

Page 17: Post-processing outputs for better utility


Two Approaches
• Constrained inference
  – Ensure that the returned answers are consistent with each other.
• Query Strategy
  – Answer a different set of strategy queries A
  – Answer original queries using A
  – Examples: Universal Histograms, Wavelet Mechanism, Matrix Mechanism

Page 18: Post-processing outputs for better utility


Query Strategy

[Diagram] The private data I is queried under differential privacy with a strategy query workload A, yielding noisy strategy answers Ã(I). The noisy answers W̃(I) to the original query workload W are then derived from Ã(I).

Page 19: Post-processing outputs for better utility


Range Queries
• Given a set of values {x1, x2, …, xn}
• Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries?


Page 20: Post-processing outputs for better utility


Range Queries
• Given a set of values {x1, x2, …, xn}
• Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries?

Strategy 1: Answer all range queries using the Laplace mechanism
• Sensitivity = O(n²)
• O(n⁴/ε²) total error across all range queries
• May reduce using constrained optimization …

Page 21: Post-processing outputs for better utility


Range Queries
• Given a set of values {x1, x2, …, xn}
• Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries?

Strategy 2: Answer all xi queries using the Laplace mechanism; answer range queries by summing the noisy xi values.
• O(1/ε²) error for each xi
• Error(q(1,n)) = O(n/ε²)
• Total error on all range queries: O(n³/ε²)

Page 22: Post-processing outputs for better utility


Universal Histograms for Range Queries

Strategy 3: Answer sufficient statistics using the Laplace mechanism; answer range queries using the noisy sufficient statistics.

x1 x2 x3 x4 x5 x6 x7 x8

x12 x34 x56 x78

x1234 x5678

x1-8

[Hay et al VLDB 2010]

Page 23: Post-processing outputs for better utility


Universal Histograms for Range Queries
• Sensitivity: log n (each xi contributes to one count per level of the tree)
• q(2,6) = x2 + x3 + x4 + x5 + x6    Error = 2 × 5 log²n/ε²
         = x2 + x34 + x56            Error = 2 × 3 log²n/ε²

x1 x2 x3 x4 x5 x6 x7 x8

x12 x34 x56 x78

x1234 x5678

x1-8

Page 24: Post-processing outputs for better utility


Universal Histograms for Range Queries
• Every range query can be answered by summing at most log n different noisy answers
• Maximum error on any range query = O(log³n / ε²)
• Total error on all range queries = O(n² log³n / ε²)

x1 x2 x3 x4 x5 x6 x7 x8

x12 x34 x56 x78

x1234 x5678

x1-8
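The tree above can be sketched in code: build one noisy count per node (each level is queried once, so the noise scale grows with the number of levels), then answer a range query by greedily decomposing it into a few nodes per level. The names (`build_tree`, `noisy_tree`, `range_query`) and the power-of-two length assumption are mine:

```python
import numpy as np

def build_tree(x):
    """Exact interval sums: levels[0] = leaves, levels[-1] = root.
    Assumes len(x) is a power of two."""
    levels = [np.asarray(x, dtype=float)]
    while len(levels[-1]) > 1:
        levels.append(levels[-1].reshape(-1, 2).sum(axis=1))
    return levels

def noisy_tree(x, epsilon, rng):
    """One changed value touches one node per level, so the tree's
    sensitivity is the number of levels; add Lap(levels/eps) noise."""
    levels = build_tree(x)
    lam = len(levels) / epsilon
    return [lvl + rng.laplace(0.0, lam, size=lvl.shape) for lvl in levels]

def range_query(tree, j, k):
    """q(j, k) = x_j + ... + x_k (0-indexed, inclusive), summing a
    minimal set of tree nodes (standard segment-tree decomposition)."""
    total, level = 0.0, 0
    while j <= k:
        if j % 2 == 1:            # j is a right child: take it alone
            total += tree[level][j]
            j += 1
        if k % 2 == 0:            # k is a left child: take it alone
            total += tree[level][k]
            k -= 1
        j, k, level = j // 2, k // 2, level + 1
    return total
```

On the 8-leaf tree above, q(2,6) decomposes into x2 + x34 + x56: three noisy counts instead of five.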

Page 25: Post-processing outputs for better utility


Universal Histograms & Constrained Inference

• Can further reduce the error by enforcing constraints, e.g. x1234 = x12 + x34 = x1 + x2 + x3 + x4

• 2-pass algorithm to compute a consistent version of the counts

[Hay et al VLDB 2010]

Page 26: Post-processing outputs for better utility


Universal Histograms & Constrained Inference

• Pass 1: (Bottom Up)

• Pass 2: (Top down)

[Hay et al VLDB 2010]
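A sketch of the two passes on a complete binary tree, written from the paper's description: the level layout and function name are my own, and the height-dependent weights are the ones I believe Hay et al. derive for branching factor 2, so treat them as an assumption rather than the paper's verbatim algorithm.

```python
def two_pass_consistency(noisy_levels):
    """Two-pass consistent estimate for a complete binary tree of counts.
    noisy_levels[0] is the leaf level; noisy_levels[-1] is the root."""
    L = len(noisy_levels)
    # Pass 1 (bottom up): weighted average of a node's own noisy count
    # and the sum of its children's adjusted counts. Weights assume
    # leaves at height 1 and branching factor 2 (my reading of the paper).
    z = [list(noisy_levels[0])]
    for h in range(2, L + 1):                  # h = height of this level
        lvl = noisy_levels[h - 1]
        a = (2**h - 2**(h - 1)) / (2**h - 1)
        b = (2**(h - 1) - 1) / (2**h - 1)
        z.append([a * lvl[i] + b * (z[-1][2 * i] + z[-1][2 * i + 1])
                  for i in range(len(lvl))])
    # Pass 2 (top down): split the parent's residual equally between
    # its two children, which makes every parent equal its children's sum.
    out = [None] * L
    out[L - 1] = list(z[L - 1])
    for lev in range(L - 2, -1, -1):
        out[lev] = [z[lev][i] + 0.5 * (out[lev + 1][i // 2]
                    - (z[lev][2 * (i // 2)] + z[lev][2 * (i // 2) + 1]))
                    for i in range(len(z[lev]))]
    return out
```

By construction the output is exactly consistent (each parent equals the sum of its children), and already-consistent inputs pass through unchanged.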

Page 27: Post-processing outputs for better utility


Universal Histograms & Constrained Inference

• Resulting consistent counts:
  – have lower error than the noisy counts (up to 10 times smaller in some cases)
  – are unbiased estimators
  – have the least error amongst all unbiased estimators

Page 28: Post-processing outputs for better utility


Next Class
• Constrained inference
  – Ensure that the returned answers are consistent with each other.
• Query Strategy
  – Answer a different set of strategy queries A
  – Answer original queries using A
  – Examples: Universal Histograms, Wavelet Mechanism, Matrix Mechanism