Sample Complexity Bounds on Differentially Private Learning via
Communication Complexity
Vitaly Feldman (IBM Research – Almaden)   David Xiao (CNRS, Université Paris 7)
ITA, 2015
Learning model
PAC model [V 84]: learner gets i.i.d. examples (x, f(x)), where x is drawn from an unknown distribution D over X and f is an unknown function in the class C
For every ε, δ > 0: given enough examples, with prob. ≥ 1 − δ output h such that Pr_{x∼D}[h(x) ≠ f(x)] ≤ ε
Privacy
Each example is created from the personal data of an individual, e.g. x_i = (GTTCACG…TC, “YES”)
Differential Privacy [DMNS 06]
A (randomized) algorithm M is α-differentially private if for any two data sets S, S′ such that |S Δ S′| = 1 (they differ in a single element), and any set of outcomes Z: Pr[M(S) ∈ Z] ≤ e^α · Pr[M(S′) ∈ Z]
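To make the e^α condition concrete, here is a minimal sketch (my illustration, not from the talk) of the standard Laplace mechanism on a sensitivity-1 count; α-DP follows from the pointwise density-ratio bound checked at the end:

```python
import math
import random

# Laplace mechanism (illustrative): release a count plus Laplace(1/alpha)
# noise. For a sensitivity-1 query (neighboring datasets with |S Δ S'| = 1
# change the count by at most 1) this is alpha-differentially private.

def laplace_mech(count, alpha):
    # Laplace noise = random sign times an Exp(alpha) sample
    return count + random.choice((-1, 1)) * random.expovariate(alpha)

def laplace_pdf(x, mu, alpha):
    return (alpha / 2.0) * math.exp(-alpha * abs(x - mu))

# alpha-DP reduces to a pointwise ratio bound for neighboring counts
# (here 10 vs. 11): the output densities differ by a factor of at most e^alpha.
alpha = 0.5
for x in (-3.0, 0.0, 10.5, 25.0):
    ratio = laplace_pdf(x, 10, alpha) / laplace_pdf(x, 11, alpha)
    assert ratio <= math.exp(alpha) + 1e-12
```

The ratio bound holds at every point because ||x − 10| − |x − 11|| ≤ 1, which is exactly the sensitivity-1 property.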
What is the cost of privacy?
SCDP(C) = sample complexity of PAC learning C with (constant) ε and α-differential privacy
[KLNRS 08]: SCDP(C) = O(log |C|) for every class C
Thr_b: thresholds over {0, …, 2^b − 1}; f_z(x) = 1 iff x ≤ z
Point_b: point functions; f_z(x) = 1 iff x = z [F 09, BKN 10]
Our results: lower bounds
Thm: SCDP(C) = Ω(LDIM(C))
LDIM(C): Littlestone's dimension [L 87] — the optimal number of mistakes in online learning of C
Compare with the upper bound SCDP(C) = O(log |C|) [KLNRS 08], e.g. for Point_b and Line_p
Corollaries: SCDP(Thr_b) = Ω(b); for C = linear separators over {0,1}^d, a lower bound follows from the Littlestone dimension of halfspaces [MT 94]
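The claim that LDIM counts online mistakes can be made concrete for thresholds. A small sketch (assuming the convention f_z(x) = 1 iff x ≤ z over {0, …, 2^b − 1}): a binary-search adversary forces any online learner to err b times, witnessing LDIM(Thr_b) = b:

```python
# Mistake-bound adversary for thresholds (assumption: f_z(x) = 1 iff x <= z,
# domain {0, ..., 2^b - 1}). Binary search forces ANY online learner to make
# b mistakes while keeping some threshold consistent with all answers.

def force_mistakes(learner, b):
    lo, hi = 0, 2 ** b              # consistent thresholds: z in [lo, hi)
    mistakes = 0
    for _ in range(b):
        x = (lo + hi) // 2          # present the midpoint
        guess = learner(x)          # learner predicts f_z(x) in {0, 1}
        if guess == 1:
            hi = x                  # adversary answers 0, i.e. z < x
        else:
            lo = x                  # adversary answers 1, i.e. z >= x
        mistakes += 1
    assert hi > lo                  # some threshold remains consistent
    return mistakes
```

Each round halves the consistent interval, so after b forced mistakes a consistent target still exists; this is exactly a depth-b mistake tree.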
Our results: characterization
Thm: SCDP(C) = Θ(CC(Eval_C)), the randomized one-way communication complexity (error ≤ 1/3) of the evaluation problem for C
[Figure: one-way protocol for Eval_C — Alicia holds f ∈ C, Roberto holds x ∈ X; with shared randomness σ, Alicia sends a single message z and Roberto must output f(x)]
Private coins vs. public coins: the two differ by at most a logarithmic additive term [N 91]
Related results
• Distributional assumptions / label privacy only / counting only labeled examples: [CH 11, BNS 15]
• Characterization in terms of distribution-independent covers: [BNS 13a]
Distribution-independent covers
H γ-covers f over distribution D if there is h ∈ H with Pr_{x∼D}[h(x) ≠ f(x)] ≤ γ
H is a distribution-independent (DI) γ-cover for C if for every f ∈ C and every distribution D, H γ-covers f over D
Thm [KLNRS 08, BKN 10]: if H is a DI γ-cover for C, then C is privately learnable using O(log |H|) examples
Proof: exponential mechanism [MT 07]
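A minimal sketch of the exponential-mechanism learner over a finite hypothesis set (the class, target, and parameters below are toy assumptions of mine). The score of h is its agreement with the sample; changing one example changes any score by at most 1, so selecting with weight exp(α·score/2) is α-DP:

```python
import math
import random

def exp_mech_learn(H, sample, alpha):
    """Pick h in H with prob. proportional to exp(alpha * agreement(h) / 2).
    Agreement has sensitivity 1 w.r.t. one example => the choice is alpha-DP."""
    scores = [sum(1 for x, y in sample if h(x) == y) for h in H]
    m = max(scores)  # shift for numerical stability (distribution unchanged)
    weights = [math.exp(alpha * (s - m) / 2.0) for s in scores]
    r = random.random() * sum(weights)
    for h, w in zip(H, weights):
        r -= w
        if r <= 0:
            return h
    return H[-1]

# Toy usage: thresholds over {0,...,15}, target z = 5 (f_z(x) = 1 iff x <= z).
H = [lambda x, z=z: int(x <= z) for z in range(16)]
sample = [(x, int(x <= 5)) for x in range(16)]
h = exp_mech_learn(H, sample, alpha=1000.0)  # huge alpha: near-greedy pick
assert all(h(x) == int(x <= 5) for x in range(16))
```

With O(log |H|) samples the selected hypothesis has low error with high probability, while privacy holds regardless of the data; this is the standard route from a cover to a private learner.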
Randomized DI covers [BNS 13a]
Let P be a distribution over sets of hypotheses; size(P) = the size of the largest set in the support of P
P is a randomized DI γ-cover for C if for every f ∈ C and every distribution D, with probability ≥ 2/3 over H ∼ P there is h ∈ H that γ-covers f over D
C is then privately learnable using O(log size(P)) examples
From covers to CC
Protocol ⇒ cover: fix the shared randomness σ; each message z induces the hypothesis h_z(x) = Roberto's output on (σ, z, x), so a protocol with c bits of communication yields hypothesis sets of size 2^c
Cover ⇒ protocol: via von Neumann's minimax theorem
Public coins can be simulated by private coins with a small additive overhead [N 91]
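A toy sketch of the protocol-to-cover direction (my illustration: deterministic, for point functions only, with the general randomized case handled by averaging over σ). Each possible one-way message induces a hypothesis, namely Roberto's output as a function of x:

```python
# Toy one-way protocol (assumption) for Eval over Point_b: Alicia sends her
# point z as the message; Roberto outputs 1 iff x equals the message.
# Every possible message m induces the hypothesis h_m(x) = Roberto's output,
# so c bits of communication yield a hypothesis set of size 2^c.

def alice_message(z):
    return z                      # a b-bit message

def bob_output(m, x):
    return int(x == m)

def hypothesis_from_message(m):
    return lambda x: bob_output(m, x)

b = 4
H = [hypothesis_from_message(m) for m in range(2 ** b)]  # size 2^b = 2^comm
# The hypothesis induced by Alicia's actual message computes f_z exactly:
for z in range(2 ** b):
    h = hypothesis_from_message(alice_message(z))
    assert all(h(x) == int(x == z) for x in range(2 ** b))
```

For a randomized protocol the same construction, applied per choice of σ, gives a randomized DI cover whose log-size equals the communication cost.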
Lower bound tools
Information theory [BJKS 02]:
1. Find a hard distribution over inputs to Eval_C
2. Low communication ⇒ low (mutual) information
3. Low information ⇒ large error
[Figure: a mistake tree — internal nodes labeled by points x_∅, x_0, x_1, x_00, …; leaves labeled by functions f_000, …, f_111; a complete mistake tree of depth d witnesses LDIM(C) ≥ d]
Augmented Index [BJKK 04, BIPW 10]
[Figure: Alicia holds a string y ∈ {0,1}^n, e.g. 0100010101; Roberto holds an index i and the prefix y_1 … y_{i−1}, e.g. 01000, and must output y_i]
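The threshold lower bound goes through Augmented Index. A small sketch of the reduction (assuming the convention f_z(x) = 1 iff x ≤ z): reading Alicia's string y as the number z, Roberto recovers bit y_i from a single threshold evaluation:

```python
import random

# Reduction sketch (convention: f_z(x) = 1 iff x <= z). Augmented Index:
# Alicia holds y in {0,1}^b; Roberto holds index i and prefix y_1..y_{i-1}
# and must output y_i. Reading y as the number z, Roberto forms the query
# x = y_1..y_{i-1} 1 0...0; then y_i = 1 iff x <= z, i.e. iff f_z(x) = 1.
# Hence a one-way protocol for Eval_{Thr_b} solves Augmented Index.

def bits_to_int(bits):
    n = 0
    for bit in bits:
        n = 2 * n + bit
    return n

def query_point(prefix, i, b):
    """Roberto's threshold query: the prefix, then a 1, padded with zeros."""
    return bits_to_int(prefix + [1] + [0] * (b - i))

random.seed(0)
b = 10
for _ in range(100):
    y = [random.randint(0, 1) for _ in range(b)]
    z, i = bits_to_int(y), random.randint(1, b)
    assert (y[i - 1] == 1) == (query_point(y[:i - 1], i, b) <= z)
```

Correctness: x and z share the prefix; at position i, x has a 1, so x ≤ z exactly when y_i = 1 (the remaining bits cannot flip the comparison).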
Our results: upper bounds
Relaxed (α, β)-differential privacy [BNS 13b]
M is (α, β)-differentially private if for any two data sets S, S′ such that |S Δ S′| = 1, and any set of outcomes Z: Pr[M(S) ∈ Z] ≤ e^α · Pr[M(S′) ∈ Z] + β
An efficient (α, β)-DP algorithm that learns using fewer examples than pure α-DP allows
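To see what the additive β buys, a tiny sketch (my toy example, not from the talk): a mechanism that publishes the whole dataset with probability β and otherwise outputs nothing is (0, β)-DP, yet satisfies α-DP for no finite α:

```python
from itertools import combinations

# Toy mechanism: output the whole dataset with probability beta, else None.
# For neighboring S, S' it satisfies Pr[M(S) in Z] <= e^0 * Pr[M(S') in Z] + beta
# for every outcome set Z, i.e. (0, beta)-DP; but the event {output = S} has
# probability beta under S and 0 under S', so no pure alpha-DP bound holds.
beta = 0.1

def out_dist(S):
    return {None: 1.0 - beta, S: beta}

S, Sp = ("a", "b"), ("a", "c")          # neighboring datasets
outcomes = [None, S, Sp]
for r in range(len(outcomes) + 1):       # enumerate every outcome set Z
    for Z in combinations(outcomes, r):
        p = sum(out_dist(S).get(o, 0.0) for o in Z)
        q = sum(out_dist(Sp).get(o, 0.0) for o in Z)
        assert p <= q + beta + 1e-12 and q <= p + beta + 1e-12
```

This is why β must be kept very small in practice: the relaxation tolerates rare catastrophic outputs that pure DP forbids.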
Conclusions and open problems
1. Characterization in terms of communication
   1. Tools from information theory
   2. Additional applications
Q: Is the sample complexity of (α, β)-diff. private learning different from non-private learning?
Q: What is the sample complexity of efficient DP learning of …?