Concurrency Control for Scalable Bayesian Inference. Joseph E. Gonzalez, Postdoc, UC Berkeley AMPLab; Co-founder, GraphLab Inc. jegonzal@eecs.berkeley.edu. ISBA 2014.


Page 1: Concurrency Control for  Scalable  Bayesian  Inference

Concurrency Control

for Scalable Bayesian Inference

Joseph E. Gonzalez
Postdoc, UC Berkeley AMPLab
Co-founder, GraphLab Inc.
jegonzal@eecs.berkeley.edu

ISBA 2014

Page 2: Concurrency Control for  Scalable  Bayesian  Inference

A Systems Approach

to Scalable Bayesian Inference

Joseph E. Gonzalez
Postdoc, UC Berkeley AMPLab
Co-founder, GraphLab Inc.
jegonzal@eecs.berkeley.edu

ISBA 2014

Page 3: Concurrency Control for  Scalable  Bayesian  Inference

http://www.domo.com/learn/data-never-sleeps-2


Page 7: Concurrency Control for  Scalable  Bayesian  Inference

Data Velocity is an opportunity for Bayesian Nonparametrics.

How do we scale Bayesian inference?

Page 8: Concurrency Control for  Scalable  Bayesian  Inference

Opposing Forces

Accuracy Scalability

Ability to estimate the posterior distribution

Ability to effectively use parallel resources

Serial Inference

Coordination-Free Samplers

Parameter Server

Page 9: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Serial Inference

Page 10: Concurrency Control for  Scalable  Bayesian  Inference

ModelState

Coordination Free Parallel Inference

Processor 1

Processor 2

Data

Page 11: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Coordination Free Parallel Inference

Processor 1

Processor 2

Keep Calm and Carry On.

Page 12: Concurrency Control for  Scalable  Bayesian  Inference

Parameter Servers

System for Coordination Free Inference

D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In NIPS, 2007.

A. Smola and S. Narayanamurthy. An architecture for parallel topic models. VLDB’10

A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. WSDM '12

Ho et al. “More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server.” NIPS’13

Page 13: Concurrency Control for  Scalable  Bayesian  Inference

Hierarchical Clustering

Global Variables

Local Variables


Page 14: Concurrency Control for  Scalable  Bayesian  Inference

Example: Topic Modeling with LDA

Word Dist. by Topic

Local Variables Documents

Maintained by the Parameter Server

Maintained by the Worker Nodes

Tokens


Page 15: Concurrency Control for  Scalable  Bayesian  Inference

Parameter Server

Parameter Server

Parameter Server

Ex: Collapsed Gibbs Sampler for LDA

Partitioning the model and data

[Diagram: documents and their tokens (w1 … w9) are partitioned across workers, while the word–topic parameters are sharded across parameter servers by vocabulary range: W1:10K, W10K:20K, W20K:30K]

Page 16: Concurrency Control for  Scalable  Bayesian  Inference

Parameter Server

Parameter Server

Parameter Server

Ex: Collapsed Gibbs Sampler for LDA

Partitioning the model and data

[Diagram: vocabulary ranges W1:10K, W10K:20K, W20K:30K live on the parameter servers; each worker keeps a local Parameter Cache of the words it is currently sampling, e.g. {Car, Dog, Pig, Bat}, {Cat, Gas, Zoo, VW}, {Car, Rim, BMW, $$}, {Mac, iOS, iPod, Cat}]

Page 17: Concurrency Control for  Scalable  Bayesian  Inference

Parameter Server

Parameter Server

Parameter Server

Ex: Collapsed Gibbs Sampler for LDA

Inconsistent model replicas

[Diagram: as on the previous slide, but the same word (e.g. Cat) is cached by multiple workers, so the replicas hold inconsistent values]

Page 18: Concurrency Control for  Scalable  Bayesian  Inference

Parallel Gibbs Sampling: Incorrect Posterior

Dependent variables cannot, in general, be sampled simultaneously.

[Diagram: two variables with strong positive correlation at t=0; sequential execution preserves the strong positive correlation, while synchronous parallel execution (t=1, 2, 3) drives the pair toward strong negative correlation]

Page 19: Concurrency Control for  Scalable  Bayesian  Inference

Issues with Nonparametrics

Difficult to introduce new clusters asynchronously: leads to too many clusters!

[Diagram: two workers concurrently each run "Create Cluster", both producing a cluster 7, so the parameter servers end up with duplicate clusters]

Page 20: Concurrency Control for  Scalable  Bayesian  Inference

Opposing Forces

Accuracy Scalability

Serial Inference

Asynchronous Samplers

Parameter Server

Page 21: Concurrency Control for  Scalable  Bayesian  Inference

Opposing Forces

[Diagram: the trade-off redrawn with Accuracy and Scalability as axes]

Page 22: Concurrency Control for  Scalable  Bayesian  Inference

Opposing Forces

[Diagram: Serial Inference, Asynchronous Samplers, and the Parameter Server placed on Accuracy vs. Scalability axes]

Page 23: Concurrency Control for  Scalable  Bayesian  Inference

Opposing Forces

[Diagram: Serial Inference, Asynchronous Samplers, and the Parameter Server on Accuracy vs. Scalability axes, with the gap between them labeled "Concurrency Control?"]

Page 24: Concurrency Control for  Scalable  Bayesian  Inference

Concurrency Control


Coordination Free (Parameter Server):

Provably fast and correct under key assumptions.

Concurrency Control:

Provably correct and fast under key assumptions.

Systems Ideas to Improve Efficiency

Page 25: Concurrency Control for  Scalable  Bayesian  Inference

Concurrency Control

Opposing Forces

[Diagram: spectrum from Serial Inference (safe) through Mutual Exclusion and Optimistic Concurrency Control to Asynchronous Samplers and the Parameter Server (unsafe), on Accuracy vs. Scalability axes]

Page 26: Concurrency Control for  Scalable  Bayesian  Inference

Mutual Exclusion and Conditional Independence

J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees. AISTATS'11

Exploit the Markov random field for parallel Gibbs sampling:

Graph Coloring

R/W Lock Mutual Exclusion

Page 27: Concurrency Control for  Scalable  Bayesian  Inference

Mutual Exclusion through Scheduling

Chromatic Gibbs Sampler:

Compute a k-coloring of the graphical model.

Sample all variables with the same color in parallel.

Serial Equivalence: [Diagram: along the time axis, the color blocks execute one after another, each block's variables in parallel]
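The sweep above can be sketched in a few lines. This is an illustrative serial sketch: the greedy coloring, the 4-cycle Ising model, and the coupling strength 0.8 are assumptions for the example, and the inner loop over a color block is the part that would run in parallel.

```python
import math
import random
from collections import defaultdict

def greedy_coloring(neighbors):
    """Assign each variable a color distinct from its already-colored neighbors."""
    color = {}
    for v in sorted(neighbors):
        used = {color[u] for u in neighbors[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def chromatic_sweep(state, neighbors, color, sample_conditional):
    """One Gibbs sweep over color blocks. Variables sharing a color are
    non-adjacent, hence conditionally independent given the rest, so each
    inner loop could be executed in parallel without changing the result."""
    by_color = defaultdict(list)
    for v, c in color.items():
        by_color[c].append(v)
    for c in sorted(by_color):
        for v in by_color[c]:  # parallelizable block
            state[v] = sample_conditional(v, state)
    return state

# Toy model: a 4-cycle Ising model (2-colorable) with coupling 0.8.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

def ising_conditional(v, state):
    field = 0.8 * sum(state[u] for u in neighbors[v])
    p = 1.0 / (1.0 + math.exp(-2.0 * field))  # P(x_v = +1 | Markov blanket)
    return 1 if random.random() < p else -1

coloring = greedy_coloring(neighbors)
state = {v: random.choice([-1, 1]) for v in neighbors}
state = chromatic_sweep(state, neighbors, coloring, ising_conditional)
```

Because no two adjacent variables share a color, a parallel run of each block is equivalent to some serial execution, which is exactly the serial-equivalence claim on the slide.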

Page 28: Concurrency Control for  Scalable  Bayesian  Inference

Theorem: Chromatic Sampler

Ergodic: converges to the correct distribution, based on a graph coloring of the Markov random field.

Quantifiable acceleration in mixing. Expected time to update all n variables once, with p processors and k colors: O(n/p + k).

Page 29: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Mutual Exclusion Through Locking

Processor 1

Processor 2

Introducing locking (scheduling) protocols to identify potential conflicts.

Page 30: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Processor 1

Processor 2

Enforce serialization of computation that could conflict.

Mutual Exclusion Through Locking

Page 31: Concurrency Control for  Scalable  Bayesian  Inference

Markov Blanket Locks

Read/Write Locks: [Diagram: the variable being sampled holds a write lock (W) while each variable in its Markov blanket holds a read lock (R)]

Page 32: Concurrency Control for  Scalable  Bayesian  Inference

Markov Blanket Locks

Eliminate fixed schedule and global coordination.

Supports more advanced block sampling.

Expected Parallelism: roughly min(# Processors, # Variables / Max Degree).
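A minimal sketch of the locking discipline, under stated simplifications: the class below is a hypothetical helper, plain mutexes stand in for the slide's read/write locks, and acquiring locks in a fixed global order avoids deadlock.

```python
import threading

class MarkovBlanketLocks:
    """Serialize only updates whose scopes (a variable plus its Markov
    blanket) overlap; non-conflicting updates proceed concurrently.
    Plain mutexes stand in for read/write locks in this sketch."""

    def __init__(self, neighbors):
        self.neighbors = neighbors
        self.locks = {v: threading.Lock() for v in neighbors}

    def sample(self, v, state, sample_conditional):
        # Lock v and its Markov blanket in a fixed global order (no deadlock).
        scope = sorted(set([v] + list(self.neighbors[v])))
        for u in scope:
            self.locks[u].acquire()
        try:
            state[v] = sample_conditional(v, state)
        finally:
            for u in reversed(scope):
                self.locks[u].release()
```

Threads sampling variables with disjoint scopes never block each other, which is where the expected-parallelism figure on the slide comes from.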

Page 33: Concurrency Control for  Scalable  Bayesian  Inference

A System for Mutual Exclusion on

Markov Random Fields

GraphLab/PowerGraph [UAI’10, OSDI’12]:

• Chromatic Sampling

• Markov Blanket Locks + Block Sampling

Page 34: Concurrency Control for  Scalable  Bayesian  Inference

Limitation: Densely Connected MRFs

V-structures: observations couple many variables.

Collapsed models: clique-like MRFs.

Mutual exclusion pessimistically serializes computation that could interfere.

Can we be optimistic and only serialize computation that does interfere?

Page 35: Concurrency Control for  Scalable  Bayesian  Inference

Opposing Forces

[Diagram: spectrum from Serial Inference (safe) through Mutual Exclusion and Optimistic Concurrency Control to Asynchronous Samplers and the Parameter Server (unsafe), on Accuracy vs. Scalability axes]

Page 36: Concurrency Control for  Scalable  Bayesian  Inference

Optimistic Concurrency Control: assume the best and correct

X. Pan, J. Gonzalez, S. Jegelka, T. Broderick, M. Jordan. Optimistic Concurrency Control for Distributed Unsupervised Learning. NIPS'13

Xinghao Pan, Tamara Broderick, Stefanie Jegelka, Michael Jordan

Page 37: Concurrency Control for  Scalable  Bayesian  Inference

Optimistic Concurrency Control

Classic idea from database systems:

Kung & Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 1981.

Assume most operations won't conflict:

• Execute operations without blocking: the frequent case is fast.

• Identify and resolve conflicts after they occur: the infrequent case has potentially costly resolution.
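The read–compute–validate–commit cycle can be sketched as follows. This is a hypothetical single-cell, version-check skeleton, not the system described here; in a real database the validate-and-commit step runs inside a short atomic critical section.

```python
class VersionedCell:
    """A shared value tagged with a version number for conflict detection."""
    def __init__(self, value):
        self.value = value
        self.version = 0

def occ_update(cell, fn, max_retries=10):
    """Optimistically compute without blocking, then validate and commit.
    On conflict, roll back (discard the computation) and redo."""
    for _ in range(max_retries):
        seen_version = cell.version
        seen_value = cell.value            # optimistic, non-blocking read
        new_value = fn(seen_value)         # frequent case: fast, lock-free
        if cell.version == seen_version:   # validation (atomic in a real DB)
            cell.value = new_value         # commit
            cell.version += 1
            return new_value
        # conflict: another writer committed first; redo with fresh state
    raise RuntimeError("validation failed too many times")
```

The pattern pays nothing in the common no-conflict case and concentrates all cost in the (hopefully rare) resolution path, which is exactly the bet the following slides make for Bayesian inference.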

Page 38: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Optimistic Concurrency Control

Processor 1

Processor 2

Allow computation to proceed without blocking.


Page 39: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Optimistic Concurrency Control

Processor 1

Processor 2

?✔

Validate potential conflicts.

Valid outcome


Page 40: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Optimistic Concurrency Control

Processor 1

Processor 2

? ?✗ ✗

Validate potential conflicts.

Invalid Outcome


Page 41: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Optimistic Concurrency Control

Processor 1

Processor 2

Take a compensating action.

✗ ✗Amend the Value


Page 42: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Optimistic Concurrency Control

Processor 1

Processor 2

✗ ✗

Validate potential conflicts.

Invalid Outcome


Page 43: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Optimistic Concurrency Control

Processor 1

Processor 2

✗ ✗Rollback and Redo

Take a compensating action.


Page 44: Concurrency Control for  Scalable  Bayesian  Inference

Data

ModelState

Optimistic Concurrency Control

Processor 1

Processor 2

Rollback and Redo

Requirements:

• Non-Blocking Computation: enables concurrency.

• Validation (identify errors): preserves accuracy; must be fast.

• Resolution (correct errors): must be infrequent.

Page 45: Concurrency Control for  Scalable  Bayesian  Inference

Optimistic Concurrency Control for Bayesian Inference

Non-parametric Models [Pan et al., NIPS'13]:

• OCC DP-Means: Dirichlet Process Clustering

• OCC BP-Means: Beta Process Feature Learning

Conditional Sampling (in progress):

• Collapsed Gibbs LDA

• Retrospective Sampling for HDP

Page 46: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Algorithm

Start with DP Gaussian mixture model:

small variance limit

[Kulis and Jordan, ICML’12]

Page 47: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Algorithm

Start with DP Gaussian mixture model:

small variance limit; redefine λ:

[Kulis and Jordan, ICML’12]

Decreases Rapidly

Page 48: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Algorithm

Corresponding Gibbs sampler conditionals:

Taking the small variance limit

[Kulis and Jordan, ICML’12]

Page 49: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Algorithm

Gibbs updates become deterministic:

Taking the small variance limit

[Kulis and Jordan, ICML’12]

Page 50: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Algorithm

Gibbs updates become deterministic:

[Kulis and Jordan, ICML’12]

Page 51: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Algorithm

Computing cluster membership

[Kulis and Jordan, ICML’12]


Page 52: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Algorithm

Updating cluster centers:

[Kulis and Jordan, ICML’12]
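The two alternating steps above (membership with the λ threshold, then mean updates) can be sketched as serial DP-means. This is a minimal sketch of the Kulis–Jordan algorithm; the toy points and λ = 1.0 are illustrative assumptions.

```python
def dp_means(points, lam, iters=10):
    """Serial DP-means: assign each point to its nearest center, but spawn
    a new cluster whenever every center is farther than lam (in squared
    distance); then recompute each center as its cluster's mean."""
    centers = [points[0]]
    assign = []
    for _ in range(iters):
        assign = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            k = min(range(len(centers)), key=d2.__getitem__)
            if d2[k] > lam:                 # no center nearby: new cluster at x
                centers.append(x)
                k = len(centers) - 1
            assign.append(k)
        for k in range(len(centers)):       # update centers as cluster means
            members = [x for x, a in zip(points, assign) if a == k]
            if members:
                centers[k] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assign

# Two well-separated pairs of points; lam = 1.0 yields two clusters.
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
centers, assign = dp_means(points, lam=1.0)
```

The λ threshold is the only place the Dirichlet process survives the small-variance limit: it prices the creation of a new cluster.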

Page 53: Concurrency Control for  Scalable  Bayesian  Inference

DP-Means Parallel Execution

Computing cluster membership in parallel:

[Diagram: CPU 1 and CPU 2 process points concurrently]

Cannot introduce overlapping clusters in parallel.

Page 54: Concurrency Control for  Scalable  Bayesian  Inference

Optimistic Concurrency Control for Parallel DP-Means

Optimistic Assumption: no new cluster created nearby.

Validation: verify that new clusters don't overlap.

Resolution: assign new cluster center to existing cluster.
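A sketch of the serial validation step, under stated assumptions: the helper below is hypothetical, `proposals` are the new-cluster centers produced optimistically by the parallel workers, and λ is the same squared-distance threshold as in serial DP-means.

```python
def validate_new_clusters(accepted, proposals, lam):
    """Serially validate optimistically proposed cluster centers. A
    proposal within lam (squared distance) of an already-accepted center
    conflicts: resolve it by assigning the point to that cluster.
    Otherwise accept the new cluster, preserving serializability."""
    accepted = list(accepted)
    assignments = []
    for x in proposals:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in accepted]
        if d2 and min(d2) <= lam:
            # Conflict: a nearby cluster was accepted first; amend assignment.
            assignments.append((x, min(range(len(accepted)),
                                       key=d2.__getitem__)))
        else:
            accepted.append(x)                    # validation passed
            assignments.append((x, len(accepted) - 1))
    return accepted, assignments

accepted, assignments = validate_new_clusters(
    [], [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)], lam=1.0)
```

Only the proposals pass through this serial gate, not the data, which is why the overhead can stay independent of the dataset size (the claim on the next slide).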

Page 55: Concurrency Control for  Scalable  Bayesian  Inference

OCC DP-means

Theorem: OCC DP-means is serializable and therefore preserves theoretical properties of DP-means.

Theorem: Assuming well-spaced clusters, the expected overhead of OCC DP-means does not depend on data size.

Correctness

Concurrency

Page 56: Concurrency Control for  Scalable  Bayesian  Inference

Empirical Validation: Failure Rate

[Plot: OCC overhead, measured as the number of points failing validation, vs. dataset size for λ-separable clusters; curves for 2, 4, 8, 16, and 32 processors (increasing parallelism)]

Page 57: Concurrency Control for  Scalable  Bayesian  Inference

Empirical Validation: Failure Rate

[Plot: OCC overhead, measured as the number of points failing validation, vs. dataset size for overlapping clusters; curves for 2, 4, 8, 16, and 32 processors (increasing parallelism)]

Page 58: Concurrency Control for  Scalable  Bayesian  Inference

Distributed Evaluation on Amazon EC2

[Plot: runtime in seconds per complete pass over the data vs. number of machines, comparing OCC DP-means runtime to projected linear scaling]

2x #machines ≈ ½x runtime

~140 million data points; 1, 2, 4, 8 machines

Page 59: Concurrency Control for  Scalable  Bayesian  Inference

An Optimistic Future for Optimistic Concurrency Control

[Diagram: spectrum from Serial Inference (safe) through Optimistic Concurrency Control to the Parameter Server (unsafe), on Accuracy vs. Scalability axes]

Page 60: Concurrency Control for  Scalable  Bayesian  Inference

Thank You

Questions?

Joseph E. Gonzalez
jegonzal@eecs.berkeley.edu

Page 61: Concurrency Control for  Scalable  Bayesian  Inference

OCC Collapsed Gibbs for LDA

Maintain ε-intervals on the conditional: the system ensures conditionals are ε-accurate.

Validate: accept draws that land outside the ε-intervals.

Resolution: serially resample rejected tokens.

[Diagram: a uniform draw u on [0, 1], with ε-bands around each boundary of the conditional CDF]
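One way the ε-interval test might look: this is a hypothetical sketch of the in-progress idea above, not an implementation from the talk. `stale_cdf` is assumed to be the cumulative conditional computed from possibly stale counts, accurate to within ε at each boundary.

```python
def validate_topic_draw(u, stale_cdf, eps):
    """Return the sampled topic if the uniform draw u lands outside every
    eps-band around the stale CDF boundaries (the outcome is then the same
    under the true conditional), or None to flag serial resampling."""
    for topic, boundary in enumerate(stale_cdf):
        if u <= boundary - eps:
            return topic        # safe: staleness cannot change the outcome
        if u <= boundary + eps:
            return None         # ambiguous: reject and resample serially
    return len(stale_cdf) - 1   # numerical fallback for u near 1
```

Draws accepted here are exactly serializable; only the rare draws landing inside an ε-band pay the serial-resampling cost.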

Page 62: Concurrency Control for  Scalable  Bayesian  Inference

OCC Collapsed Gibbs for LDA


Page 63: Concurrency Control for  Scalable  Bayesian  Inference

Optimism for Optimistic Concurrency Control

[Diagram: coordination-free execution (fast + easy) handles the frequent case; validation routes the rare conflicts to slow + complex resolution]

Enable decades of work in serial Bayesian inference to be extended to the parallel setting.