Distributed Private Machine Learning Abhradeep Guha Thakurta University of California Santa Cruz

Source: helper.ipam.ucla.edu/publications/pbd2018/pbd2018_14607.pdf

Distributed Private Machine Learning

Abhradeep Guha Thakurta

University of California Santa Cruz

Distributed learning from private data

Distributed machine learning - Setup

[Figure: traditional machine learning vs. distributed machine learning. 𝜃: model / statistic computed from the data]

Learning from private data

Learn new (and frequent) words typed

[Figure: example words typed on devices: Phablet, Derp, Photobomb, Woot, OMG, Troll, Prepone, awwww, dp]

Learning from private data

Predict if a person has Parkinson’s disease

Get measurements from gyroscope, display screen, etc.

[Figure: measurements feed a model / classifier 𝜃, which outputs YES / NO. Image courtesy: ResearchKit (Apple)]

Learning from private data

Collaborative filtering

Observed entries (? = missing):

1 ? 2 ? ? 3
2 ? 4 ? ? ?
? 3 6 ? 3 9

Hidden matrix:

1 1 2 1 1 3
2 2 4 2 2 6
3 3 6 3 3 9

Assumption: Hidden matrix has some structure (e.g., low-rank)
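
The low-rank assumption can be checked on the slide's own example. The sketch below assumes the extracted digits are read as 3 x 6 matrices; the factor vectors `u` and `v` are an interpretation introduced here, not something the talk states.

```python
import numpy as np

# Hypothetical row and column factors that reproduce the slide's full matrix.
u = np.array([1, 2, 3])
v = np.array([1, 1, 2, 1, 1, 3])
hidden = np.outer(u, v)  # 3 x 6 matrix built as an outer product

nan = np.nan
# Observed entries from the slide, with NaN standing in for "?".
observed = np.array([
    [1,   nan, 2, nan, nan, 3],
    [2,   nan, 4, nan, nan, nan],
    [nan, 3,   6, nan, 3,   9],
])

mask = ~np.isnan(observed)                       # True where an entry was observed
rank = np.linalg.matrix_rank(hidden)             # rank of the hidden matrix
consistent = np.allclose(hidden[mask], observed[mask])

print("rank of hidden matrix:", rank)
print("observed entries match hidden matrix:", consistent)
```

Under this reading the fourth column has no observed entry at all, so low rank alone would not determine it; the example only illustrates why structure makes the other missing entries predictable.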

Need for privacy

[Figure: model / statistic; trust boundary]

Local differential privacy [Warner65,EGS03,DMNS06]

Data sample: 𝑑 → 𝒜(𝑑)

Data sample: 𝑑′ → 𝒜(𝑑′)

Requirement: 𝒜(𝑑) and 𝒜(𝑑′) should be close in distribution
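
One standard way to make "close in distribution" precise is the usual definition of 𝜖-local differential privacy, written out below. The set 𝑆 of possible outputs is notation introduced here; 𝜖 is the privacy parameter.

```latex
\[
  \Pr[\mathcal{A}(d) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(d') \in S]
  \qquad \text{for all data samples } d, d' \text{ and all output sets } S .
\]
```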

Local differential privacy [Warner65,EGS03,DMNS06]

𝜖: Privacy parameter (smaller value implies stronger privacy)

Provably protects against membership attacks

Resilient against arbitrary side information
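
As a concrete instance of a locally private randomizer, here is a minimal sketch of randomized response in the spirit of [Warner65], assuming each device holds a single bit. The function names and the debiasing step are illustrative choices, not taken from the talk.

```python
import numpy as np

def randomize(bits, epsilon, rng):
    """Each device keeps its bit with probability e^eps / (1 + e^eps), else flips it."""
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(bits.shape) < p_truth
    return np.where(keep, bits, 1 - bits)

def estimate_fraction(reports, epsilon):
    """Server-side debiasing: invert E[report] = (2p - 1) * mu + (1 - p)."""
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (reports.mean() - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

rng = np.random.default_rng(0)
bits = np.array([1] * 6000 + [0] * 14000)   # true fraction of ones is 0.3
reports = randomize(bits, 1.0, rng)
print("estimated fraction:", estimate_fraction(reports, 1.0))
```

Because any report is at most e^𝜖 times more likely under one bit than under the other, each device's message satisfies 𝜖-local differential privacy; a smaller 𝜖 pushes the keep probability toward 1/2 and makes the server's estimate noisier.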

Challenge: Balancing trade-offs

Balancing the trade-off is hard:

• AOL fiasco: listed among CNNMoney's 101 Dumbest Moments in Business

• Netflix attack [NS08], Facebook attack [Korolova11], …

Conflicting goals

Utility vs. Privacy

This talk

Conflicting goals

Utility, Privacy, Interaction

Distributed Private Machine Learning

1. Learning from private data

2. Private distributed model selection

3. Private on-device learning

Private distributed model selection

Learning from private data

Predict if a person has Parkinson’s disease

[Figure: model / classifier 𝜃 outputs YES / NO]

Towards engineering distributed learning systems

Ideal scenario: Complete parallelism

Each device interacts with the server independently and only once

[Figure: devices and server; model 𝜃]

Towards engineering distributed learning systems

State of the art [DJW’13]: Completely adaptive interaction

Server must:

• Talk to devices in sequence

• Receive the message from each device in order to compute the message to the next device

[Figure: devices and server; model 𝜃]
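
To make the cost of full adaptivity concrete, the following is a minimal sketch of a sequential, locally private gradient method for a one-dimensional squared loss. It is an illustration under stated assumptions (data clipped to [0, 1], Laplace noise, step size 1/t), not the algorithm of [DJW’13]; all names are hypothetical.

```python
import numpy as np

def local_report(theta, d, epsilon, rng):
    """Device side: noisy gradient of 0.5 * (theta - d)^2 at the current theta."""
    d = float(np.clip(d, 0.0, 1.0))   # changing d moves the gradient by at most 1
    grad = theta - d
    return grad + rng.laplace(0.0, 1.0 / epsilon)

def sequential_ldp_sgd(data, epsilon, rng):
    """Server side: the message to device t depends on the reply of device t - 1."""
    theta = 0.5
    rounds = 0
    for t, d in enumerate(data, start=1):
        noisy_grad = local_report(theta, d, epsilon, rng)   # one contact per device
        theta = float(np.clip(theta - noisy_grad / t, 0.0, 1.0))
        rounds += 1
    return theta, rounds

rng = np.random.default_rng(0)
data = rng.random(200)
theta, rounds = sequential_ldp_sgd(data, 1.0, rng)
print("rounds of adaptive interaction:", rounds)
```

The number of adaptive rounds equals the number of devices, because the server cannot form the next query until the previous device has answered. Each device is still contacted only once, so its report consumes the privacy budget 𝜖 a single time.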

Distributed private learning with local differential privacy

This talk [Smith, T., Upadhyay ’17]

New algorithms that use little or no adaptive interaction

Lower bound: Cannot get accurate, general algorithms that use no adaptive interaction

Distributed private learning with local differential privacy

Previous work

Kasiviswanathan et al. 2008: Introduced the problem of local private learning

Duchi et al. 2013: Tight upper and lower bounds on accuracy

• # of adaptive interactions = # of devices

• Talks to each device only once

Key New Results

Single parameter learning: Minimal error with full parallelism

Data samples on devices: 𝑑₁, 𝑑₂, …, 𝑑ₙ

Diff. private model 𝜃_priv ∈ ℝ

Translates to: diff. private estimate of Median{𝑑₁, 𝑑₂, …, 𝑑ₙ}
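
A fully parallel median estimate can be sketched as follows. This is an illustrative construction and not the algorithm from the talk: it assumes data in [0, 1], a threshold grid fixed before any device is contacted, and randomized response on the bit "is my value at most my assigned threshold". All names are hypothetical.

```python
import numpy as np

def ldp_median(data, epsilon, thresholds, rng):
    """Non-interactive sketch: every device answers one pre-assigned threshold query."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    k = len(thresholds)
    sums = np.zeros(k)
    counts = np.zeros(k)
    for i, d in enumerate(data):
        j = i % k                                   # assignment fixed in advance
        bit = 1 if d <= thresholds[j] else 0
        report = bit if rng.random() < p else 1 - bit   # randomized response
        sums[j] += report
        counts[j] += 1
    # Debias each threshold's average to estimate the CDF at that threshold.
    cdf = (sums / counts - (1.0 - p)) / (2.0 * p - 1.0)
    # Return the threshold whose estimated CDF is closest to 1/2.
    return float(thresholds[int(np.argmin(np.abs(cdf - 0.5)))])

rng = np.random.default_rng(0)
data = rng.random(40000)
thresholds = np.linspace(0.1, 0.9, 9)
print("private median estimate:", ldp_median(data, 1.0, thresholds, rng))
```

No message depends on any other device's reply, so all devices can be contacted at once. The sketch requires at least as many devices as thresholds, and its accuracy is limited by the grid resolution as well as by the privacy noise.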

Key New Results

Multi parameter learning: Minimal error with few rounds of adaptivity

Diff. private model 𝜃_priv ∈ ℝ^𝑝

Exponential improvement in the rounds of adaptivity

Still interact with each device only once

Key New Results

Lower bound: Minimal error needs a few rounds of adaptivity

𝜃∗: Best model

Next Steps

Implement the algorithms and evaluate empirically

Deploy the project in practice

Current lower bounds are only for gradient-based methods

• Obtaining non-adaptive algorithms will require analyzing non-gradient-based methods

Distributed Private Machine Learning

1. Learning from private data

2. Private distributed model selection

3. Private on-device learning

Private on-device learning

On-device learning with sensitive data

Privacy preserving personalization

Applications:

• Health analytics

• Language models

• Collaborative filtering

Pipeline:

• Local learning

• Locally learned models

• Model estimates via differentially private channel

• Receive model updates

On-device learning with sensitive data

Privacy preserving personalization

• Local computation: diff. private with regard to everyone else’s data

• Global computation: diff. private with regard to everyone’s data

Differentially private on-device learning

New results and future direction

Collaborative filtering [Jain, T., Thakkar]: First algorithm with formal error guarantee

1 1 2 1 1 3
2 2 4 2 2 6
3 3 6 3 3 9

Error covariance: (0, 0, -1, 0, 0, 4)ᵀ (0, 0, -1, 0, 0, 4)

Add noise and send

Average error covariance across all devices

• Global component: Error covariance
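
The global component can be pictured with the rough sketch below. It is an assumption-laden illustration rather than the algorithm of [Jain, T., Thakkar]: the clipping bound `clip`, the Gaussian noise scale `sigma`, and both function names are hypothetical, and calibrating `sigma` to a specific privacy budget is left out.

```python
import numpy as np

def device_message(pred_row, rating_row, clip, sigma, rng):
    """Device side: noisy outer product of the error on this device's observed entries."""
    observed = ~np.isnan(rating_row)
    err = np.where(observed, pred_row - np.nan_to_num(rating_row), 0.0)
    norm = np.linalg.norm(err)
    if norm > clip:
        err = err * (clip / norm)          # bound each device's contribution
    noise = rng.normal(0.0, sigma, size=(err.size, err.size))
    noise = (noise + noise.T) / 2.0        # keep the message symmetric
    return np.outer(err, err) + noise

def global_component(messages):
    """Server side: average the noisy error covariances across all devices."""
    return np.mean(messages, axis=0)

rng = np.random.default_rng(0)
nan = np.nan
ratings = np.array([
    [1,   nan, 2, nan, nan, 3],
    [2,   nan, 4, nan, nan, nan],
    [nan, 3,   6, nan, 3,   9],
])
preds = np.ones_like(ratings)              # placeholder predictions for illustration
messages = [device_message(p, r, clip=5.0, sigma=0.1, rng=rng)
            for p, r in zip(preds, ratings)]
avg_cov = global_component(messages)
print(avg_cov.shape)
```

Only the averaged covariance leaves the devices in this picture; each device would then use it together with its own ratings to compute predictions locally.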

Differentially private on-device learning

New results and future direction

Collaborative filtering [Jain, T., Thakkar]: First algorithm with formal error guarantee

• Global component: Error covariance

• Local component: Compute the prediction

[HR12] Hints at trivial error if predictions are public

Next step: Improve on-device machine learning by harnessing global computation

Distributed Private Machine Learning

1. Learning from private data

2. Private distributed model selection

3. Private on-device learning

Conflicting goals

Utility, Privacy, Interaction