Concurrency Control
for Scalable Bayesian Inference
Joseph E. Gonzalez
Postdoc, UC Berkeley AMPLab; Co-founder, GraphLab Inc.
jegonzal@eecs.berkeley.edu
ISBA 2014
A Systems Approach
to Scalable Bayesian Inference
http://www.domo.com/learn/data-never-sleeps-2
Data Velocity is an opportunity for Bayesian Nonparametrics.
How do we scale Bayesian inference?
Opposing Forces
Accuracy vs. Scalability
Accuracy: the ability to estimate the posterior distribution.
Scalability: the ability to use parallel resources effectively.
The spectrum runs from Serial Inference (accurate) to Coordination-Free Samplers on a Parameter Server (scalable).
Serial Inference
[Figure: a single processor sweeps over the data, updating one shared model state.]
Coordination-Free Parallel Inference
[Figure: Processors 1 and 2 each sweep over a partition of the data and update the shared model state without any synchronization.]
Keep Calm and Carry On.
Parameter Servers
System for Coordination Free Inference
D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. NIPS'07.
A. Smola and S. Narayanamurthy. An architecture for parallel topic models. VLDB'10.
A. Ahmed, M. Aly, J. Gonzalez, S. Narayanamurthy, and A. J. Smola. Scalable inference in latent variable models. WSDM'12.
Q. Ho et al. More effective distributed ML via a stale synchronous parallel parameter server. NIPS'13.
Hierarchical Clustering
Global Variables
Local Variables
Example: Topic Modeling with LDA
Global variables: word distributions by topic, maintained by the parameter server.
Local variables: documents and their tokens, maintained by the worker nodes.
Ex: Collapsed Gibbs Sampler for LDA: partitioning the model and data.
[Figure: the word-topic parameters are sharded across three parameter servers (w1:10K, w10K:20K, w20K:30K); the documents (w1 ... w9) are partitioned across the workers.]
[Figure: each worker additionally keeps a local parameter cache holding the word-topic counts for just the words it touches (e.g. "car", "dog", "cat", "iOS"); the caches synchronize with the sharded parameter servers.]
Ex: Collapsed Gibbs Sampler for LDA: inconsistent model replicas.
[Figure: two workers' parameter caches hold different stale values for the same word (e.g. "cat"); under coordination-free execution the cached replicas drift apart.]
Parallel Gibbs Sampling: Incorrect Posterior
Dependent variables cannot in general be sampled simultaneously.
[Figure: two variables with strong positive correlation; sequential execution (t = 0, 1, ...) preserves the correlation, while naive parallel execution, which samples both variables at once from the previous state, yields samples with strong negative correlation.]
Issues with Nonparametrics
It is difficult to introduce new clusters asynchronously; doing so leads to too many clusters.
[Figure: two workers concurrently send "create cluster 7" to the parameter servers, which end up with duplicate copies of the cluster.]
Opposing Forces
Accuracy vs. Scalability: Serial Inference sits at the accurate end of the spectrum; Asynchronous Samplers on a Parameter Server sit at the scalable end.
Where does Concurrency Control fit?
Concurrency Control
Coordination free (parameter server): provably fast, and correct under key assumptions.
Concurrency control: provably correct, and fast under key assumptions.
Systems Ideas to Improve Efficiency: Concurrency Control
Opposing Forces
[Figure: the accuracy-scalability spectrum; between Serial Inference and the Parameter Server sit Mutual Exclusion and Optimistic Concurrency Control, with a safe/unsafe boundary separating them from the asynchronous samplers.]
Mutual Exclusion and Conditional Independence
J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees. AISTATS'11.
Exploit the Markov random field structure for parallel Gibbs sampling, via two mechanisms: graph coloring and read/write (mutual exclusion) locks.
Mutual Exclusion through Scheduling
Chromatic Gibbs Sampler
Compute a k-coloring of the graphical model
Sample all variables with the same color in parallel.
Serial equivalence: sweeping the color classes in order is equivalent to a valid serial execution.
Theorem: Chromatic Sampler
Ergodic: converges to the correct distribution (based on a graph coloring of the Markov random field).
Quantifiable acceleration in mixing: with n variables, k colors, and p processors, the time to update all variables once is O(n/p + k).
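As a concrete sketch of the chromatic sweep, the following illustrative Python (not the GraphLab implementation; the dictionary-based graph and the `sample_conditional` hook are hypothetical) colors the model greedily, then processes each color class in parallel:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def greedy_coloring(neighbors):
    """Assign each variable the smallest color unused by its neighbors."""
    colors = {}
    for v in sorted(neighbors):
        taken = {colors[u] for u in neighbors[v] if u in colors}
        colors[v] = next(c for c in range(len(neighbors)) if c not in taken)
    return colors

def chromatic_gibbs_sweep(state, neighbors, sample_conditional, pool):
    """One full sweep: variables of the same color are conditionally
    independent given the rest, so each color class is sampled in
    parallel; color classes are processed serially."""
    by_color = defaultdict(list)
    for v, c in greedy_coloring(neighbors).items():
        by_color[c].append(v)
    for c in sorted(by_color):
        def update(v):
            state[v] = sample_conditional(v, state)
        list(pool.map(update, by_color[c]))  # wait for the whole class
    return state
```

Within one color class no two variables are adjacent, which is exactly why the parallel updates commute.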
Mutual Exclusion Through Locking
Introduce locking (scheduling) protocols to identify potential conflicts, and enforce serialization of computation that could conflict.
[Figure: two processors sweep the data; when their updates would touch overlapping regions of the model state, the locking protocol blocks one of them.]
Markov Blanket Locks
Read/write locks: take a write lock on the variable being sampled and read locks on its Markov blanket.
[Figure: a write lock (W) on the sampled variable and read locks (R) on its neighbors.]
Markov blanket locks eliminate the fixed schedule and global coordination, and support more advanced block sampling.
Expected parallelism: grows with the number of processors, up to roughly (# variables) / (max degree).
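A minimal sketch of the locking discipline, with plain mutexes standing in for the reader/writer locks used in practice (all names here are illustrative, not GraphLab's API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def sample_with_blanket_locks(v, neighbors, state, locks, sample_conditional):
    """Resample variable v while holding locks on v and its Markov blanket.
    Acquiring locks in a single global (sorted) order rules out deadlock."""
    scope = sorted([v] + list(neighbors[v]))   # v plus its Markov blanket
    for u in scope:
        locks[u].acquire()
    try:
        # With the blanket held fixed, this update cannot interleave with
        # a concurrent update of any variable in the locked scope.
        state[v] = sample_conditional(v, state)
    finally:
        for u in reversed(scope):
            locks[u].release()
```

A real system would use reader/writer locks so that two samplers reading the same blanket variable can proceed concurrently; plain mutexes keep the sketch short.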
A System for Mutual Exclusion on
Markov Random Fields
GraphLab/PowerGraph [UAI’10, OSDI’12]:
• Chromatic Sampling
• Markov Blanket Locks + Block Sampling
Limitation: Densely Connected MRFs
V-Structures: observations couple many variables
Collapsed models: clique-like MRFs
Mutual exclusion pessimistically serializes computation that could interfere.
Can we be optimistic, and only serialize computation that does interfere?
Optimistic Concurrency Control: assume the best, and correct.
X. Pan, J. Gonzalez, S. Jegelka, T. Broderick, and M. Jordan. Optimistic Concurrency Control for Distributed Unsupervised Learning. NIPS'13.
Joint work with Xinghao Pan, Tamara Broderick, Stefanie Jegelka, and Michael Jordan.
Classic idea from Database Systems:
Kung & Robinson. On optimistic methods for concurrency control.
ACM Transactions on Database Systems. 1981
Assume most operations won't conflict:
• Execute operations without blocking: the frequent case is fast.
• Identify and resolve conflicts after they occur: the infrequent case carries a potentially costly resolution.
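The two bullets above can be sketched as a small skeleton (illustrative only; `propose`, `validate`, `resolve`, and `commit` are hypothetical hooks, not an API from the cited papers):

```python
def occ_round(items, propose, validate, resolve, commit):
    """One optimistic round: propose updates without blocking, then run a
    short serial phase that validates each proposal and repairs conflicts."""
    # Proposal phase: independent per item, so it parallelizes trivially.
    proposals = [propose(x) for x in items]
    # Validation phase: serial, but cheap when conflicts are rare.
    for p in proposals:
        if validate(p):
            commit(p)            # frequent case: fast path
        else:
            commit(resolve(p))   # infrequent case: costly correction
```

The key performance assumption is that `validate` almost always succeeds, so the serial phase does almost no work.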
Optimistic Concurrency Control
Allow computation to proceed without blocking; then validate potential conflicts.
• Valid outcome: keep the result.
• Invalid outcome: take a compensating action, either amending the value or rolling back and redoing the computation.
[Figure sequence: two processors update the model state concurrently; a validation step marks each outcome valid (✔) or invalid (✗), and invalid outcomes are amended or rolled back and redone.]
Requirements:
• Non-blocking computation, which provides concurrency.
• Validation, to identify errors (preserving accuracy); must be fast.
• Resolution, to correct errors; must be infrequent.
Optimistic Concurrency Control for Bayesian Inference
Non-parametric models [Pan et al., NIPS'13]:
• OCC DP-Means: Dirichlet Process Clustering
• OCC BP-Means: Beta Process Feature Learning
Conditional Sampling: (In Progress)
• Collapsed Gibbs LDA
• Retrospective Sampling for HDP
DP-Means Algorithm
Start with a DP Gaussian mixture model and take the small-variance limit, redefining the concentration parameter so that the prior probability of creating a new cluster decreases rapidly with the variance [Kulis and Jordan, ICML'12].
Write down the corresponding Gibbs sampler conditionals; taking the small-variance limit, the Gibbs updates become deterministic [Kulis and Jordan, ICML'12].
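The equations on these slides appear as images in the original deck; the following is a hedged LaTeX reconstruction of the limiting updates along the lines of Kulis and Jordan (ICML'12), with the concentration parameter reparameterized so the new-cluster term reduces to a λ penalty:

```latex
% Reconstruction (following Kulis & Jordan, ICML'12): Gibbs conditionals
% for the assignment z_i in a DP Gaussian mixture with likelihood
% variance \sigma^2, after reparameterizing the concentration so the
% new-cluster term carries an exp(-\lambda / (2\sigma^2)) factor:
P(z_i = k \mid \cdot) \propto n_{-i,k}
    \exp\!\left(-\frac{\lVert x_i - \mu_k \rVert^2}{2\sigma^2}\right),
\qquad
P(z_i = \text{new} \mid \cdot) \propto
    \exp\!\left(-\frac{\lambda}{2\sigma^2}\right).
% As \sigma^2 \to 0 the conditional concentrates on the smallest
% exponent, so the Gibbs update becomes deterministic:
z_i = \begin{cases}
    \arg\min_{k} \lVert x_i - \mu_k \rVert^2
        & \text{if } \min_{k} \lVert x_i - \mu_k \rVert^2 \le \lambda,\\[2pt]
    \text{new cluster with } \mu_{\text{new}} = x_i & \text{otherwise.}
\end{cases}
```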
Computing cluster membership: assign each point to its nearest cluster center, unless the squared distance to every center exceeds λ, in which case create a new cluster at the point. Updating cluster centers: set each center to the mean of its assigned points [Kulis and Jordan, ICML'12].
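Putting the two deterministic updates together, a serial DP-means sketch in Python (illustrative; `lam` is the λ penalty on the squared distance):

```python
import numpy as np

def dp_means(X, lam, n_iter=10):
    """Serial DP-means sketch: k-means with a lam penalty for new clusters."""
    centers = [X[0].copy()]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Membership: nearest center, or a brand-new cluster if every
        # center is farther than lam (in squared distance).
        assign = []
        for x in X:
            d = [np.sum((x - c) ** 2) for c in centers]
            k = int(np.argmin(d))
            if d[k] > lam:
                centers.append(x.copy())
                k = len(centers) - 1
            assign.append(k)
        labels = np.array(assign)
        # Centers: mean of the assigned points (kept unchanged if a
        # cluster happens to be empty, so indices stay stable).
        centers = [X[labels == k].mean(axis=0) if np.any(labels == k)
                   else centers[k] for k in range(len(centers))]
    return np.array(centers), labels
```

Note that new clusters created mid-pass immediately compete for later points, which is what makes the serial order matter and parallelization non-trivial.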
DP-Means Parallel Execution
Computing cluster membership in parallel: two CPUs can each encounter a far-away point and both create a new cluster, but we cannot introduce overlapping clusters (centers within λ of one another) in parallel.
Optimistic Concurrency Control for Parallel DP-Means
• Optimistic assumption: no new cluster has been created nearby.
• Validation: verify that newly proposed clusters do not overlap.
• Resolution: assign the proposed new cluster center to the existing cluster.
OCC DP-means
Correctness. Theorem: OCC DP-means is serializable and therefore preserves the theoretical properties of DP-means.
Concurrency. Theorem: Assuming well-spaced clusters, the expected overhead of OCC DP-means does not depend on the data size.
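The proposal/validation split for one epoch might look like the following sketch (illustrative, not Pan et al.'s implementation; validation accepts a proposed center only if no already-accepted center is within λ in squared distance):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def occ_dp_means_epoch(X, centers, lam, n_workers=4):
    """One OCC epoch: workers optimistically propose new cluster centers
    for far-away points; a serial pass validates the proposals."""
    def propose(chunk):
        out = []
        for x in chunk:
            d = [np.sum((x - c) ** 2) for c in centers]
            if not d or min(d) > lam:
                out.append(x)          # optimistic: claim a new cluster
        return out
    chunks = np.array_split(X, n_workers)
    with ThreadPoolExecutor(n_workers) as pool:
        proposals = [p for out in pool.map(propose, chunks) for p in out]
    accepted = []
    for x in proposals:                # serial validation: cheap, rare work
        if all(np.sum((x - c) ** 2) > lam for c in accepted):
            accepted.append(x)         # validated: becomes a real center
        # else: resolution assigns x to the conflicting accepted center
    return centers + accepted
```

With well-spaced clusters, almost all points fall near an existing center and never reach the serial phase, which is the intuition behind the overhead theorem.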
Empirical Validation: Failure Rate
[Figure: fraction of points failing validation (the OCC overhead) vs. dataset size, shown for 2, 4, 8, 16, and 32 processors, with λ-separable clusters.]
[Figure: fraction of points failing validation vs. dataset size for 2, 4, 8, 16, and 32 processors, with overlapping clusters.]
Distributed Evaluation: Amazon EC2
[Figure: runtime in seconds per complete pass over the data vs. number of machines (1 to 8), comparing OCC DP-means runtime against projected linear scaling.]
2x #machines ≈ ½x runtime
~140 million data points; 1, 2, 4, 8 machines
An Optimistic Future for Optimistic Concurrency Control
[Figure: the accuracy-scalability spectrum; Optimistic Concurrency Control sits on the safe side, between Serial Inference and the Parameter Server.]
OCC Collapsed Gibbs for LDA
Maintain ε-intervals on the conditional: the system ensures the cached conditionals are ε-accurate.
Validation: accept draws that land outside the ε-intervals.
Resolution: serially resample the rejected tokens.
[Figure: a uniform draw u on [0, 1]; ε-wide intervals around each cumulative boundary of the conditional mark where a draw cannot be trusted.]
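The validation test can be sketched as follows (illustrative; the slides describe this as in-progress work, so the interface here is hypothetical). Given a worker's cached, ε-accurate conditional CDF, a uniform draw is trusted only if it lands at least ε away from every cumulative boundary:

```python
import bisect

def validate_draw(u, cdf, eps):
    """The cached CDF is only eps-accurate, so a uniform draw u that falls
    within eps of a cumulative boundary could select the wrong outcome
    under the true conditional. Return the chosen index if the draw is
    safely inside a cell, or None to mark the token for serial resampling."""
    k = bisect.bisect_left(cdf, u)     # outcome chosen by the cached CDF
    for b in cdf[:-1]:                 # interior cumulative boundaries
        if abs(u - b) < eps:
            return None                # too close to call: reject
    return k
```

Rejections are resolved by resampling those tokens serially against the exact conditional, which keeps the chain correct while the common case stays parallel.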
Optimism for Optimistic Concurrency Control
Coordination-free execution is fast and easy, and it is the frequent case; validation and resolution are slow and complex, but rare.
Enables decades of work in serial Bayesian inference to be extended to the parallel setting.