Carnegie Mellon

A New Parallel Framework for Machine Learning

Joseph Gonzalez, joint work with Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, and Alex Smola

Page 1:

Carnegie Mellon

A New Parallel Framework for Machine Learning

Joseph Gonzalez

Joint work with Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, and Alex Smola

Page 2:

[Diagram: a graphical model over variables A, B, C, D with dependencies such as "originates from" and "lives", posing the query "Is the driver hostile?"]

Page 3:

[Diagram: "Patient presents abdominal pain. Diagnosis?" A dependency chain: the patient ate food which contains an ingredient purchased from a store, which also sold to other patients, supporting a diagnosis of E. coli infection.]

Page 4:

[Diagram: Shopper 1 and Shopper 2 connected to the interests "Cameras" and "Cooking".]

Page 5:

The Hollywood Fiction…

Mr. Finch develops software which:
• Runs in a “consolidated” data center with access to all government data
• Processes multi-modal data: video surveillance, federal and local databases, social networks, …
• Uses advanced machine learning to identify connected patterns and predict catastrophic events

Page 6:

…how far is this from reality?

Page 7:

Big Data is a reality

• 48 hours of video uploaded to YouTube every minute
• 24 million Wikipedia pages
• 750 million Facebook users
• 6 billion Flickr photos

Page 8:

Machine learning is a reality

[Diagram: Raw Data → Machine Learning → Understanding, illustrated by a linear regression line fit through a scatter of points.]

Page 9:

We have mastered: Big Data + Large-Scale Compute Clusters + Simple Machine Learning

• Limited to simplistic models that fail to fully utilize the data
• Substantial system-building effort: systems evolve slowly and are costly

Page 10:

Advanced Machine Learning

Raw Data → Machine Learning → Understanding

[Diagram: a Markov random field over political figures (Mubarak, Obama, Netanyahu, Abbas) with edges labeled "needs", "supports", "cooperate", "distrusts"; deep belief / neural networks; shoppers linked to "Cameras" and "Cooking".]

Data dependencies substantially complicate parallelization.

Page 11:

Challenges of Learning at Scale

Wide array of different parallel architectures: GPUs, multicore, clusters, mini clouds, clouds

New challenges for designing machine learning algorithms:
• Race conditions and deadlocks
• Managing distributed model state
• Data locality and efficient inter-process coordination

New challenges for implementing machine learning algorithms:
• Parallel debugging and profiling
• Fault tolerance

Page 12:

The goal of the GraphLab project…

Big Data + Large-Scale Compute Clusters + Advanced Machine Learning

Rich, structured machine learning techniques capable of fully modeling the data dependencies

Goal: rapid system development
• Quickly adapt to new data, priors, and objectives
• Scale with new hardware and system advances

Page 13: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

OutlineImportance of Large-Scale Machine Learning

Need to model data-dependencies

Existing Large-Scale Machine Learning AbstractionsNeed for a efficient graph structured abstraction

GraphLab Abstraction:Addresses data-dependences Enables the expression of efficient algorithms

Experimental ResultsGraphLab dramatically outperforms existing abstractions

Open Research Challenges

Page 14:

How will we design and implement parallel learning systems?

Page 15:

We could use…

Threads, Locks, & Messages

“low level parallel primitives”

Page 16:

Threads, Locks, and Messages

ML experts (graduate students) repeatedly solve the same parallel design challenges:
• Implement and debug a complex parallel system
• Tune for a specific parallel platform
• Six months later the conference paper contains: “We implemented ______ in parallel.”

The resulting code:
• is difficult to maintain
• is difficult to extend
• couples the learning model to the parallel implementation

Page 17:

… a better answer: Map-Reduce / Hadoop

Build learning algorithms on top of high-level parallel abstractions

Page 18:

MapReduce – Map Phase

[Diagram: CPUs 1–4 each compute a value (12.9, 42.3, 21.3, 25.8) from independent data.]

Embarrassingly parallel independent computation; no communication needed.

Page 19:

MapReduce – Map Phase

[Diagram: the same four CPUs process the next batch of records, extracting image features.]

Page 20:

MapReduce – Map Phase

[Diagram: a third batch completes; every record is processed independently.]

Embarrassingly parallel independent computation.

Page 21:

MapReduce – Reduce Phase

[Diagram: two CPUs fold the per-image feature values into aggregate statistics for attractive faces and ugly faces, given the image labels U/A.]
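The two phases above can be sketched as ordinary functions. A minimal, hedged illustration: the `Pair` record and the phase functions are hypothetical names for this sketch, not a Hadoop or GraphLab API.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One (label, feature) record emitted per image by the map phase.
struct Pair { std::string key; double value; };

// Map phase: each image is processed independently -- embarrassingly
// parallel, with no communication between records.
std::vector<Pair> map_phase(
        const std::vector<std::pair<std::string, double>>& images) {
    std::vector<Pair> out;
    for (const auto& img : images)
        out.push_back({img.first, img.second});
    return out;
}

// Reduce phase: fold all records with the same key into an average,
// e.g. the mean feature value per face class.
std::map<std::string, double> reduce_phase(const std::vector<Pair>& pairs) {
    std::map<std::string, double> sum;
    std::map<std::string, int> cnt;
    for (const auto& p : pairs) { sum[p.key] += p.value; ++cnt[p.key]; }
    for (auto& kv : sum) kv.second /= cnt[kv.first];
    return sum;
}
```

Because the map phase never communicates, each record (or batch) can run on any CPU, which is exactly why this pattern parallelizes so well.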

Page 22: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

BeliefPropagation

Label Propagation

KernelMethods

Deep BeliefNetworks

NeuralNetworks

Tensor Factorization

PageRank

Lasso

Map-Reduce for Data-Parallel MLExcellent for large data-parallel tasks!

22

Data-Parallel Graph-Parallel

Algorithm Tuning

Feature Extraction

Map Reduce

Basic Data Processing

Is there more toMachine Learning

?

Page 23: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Concrete Example

Label Propagation

Page 24:

Label Propagation Algorithm

Social Arithmetic: I Like = 50% what I list on my profile + 40% what Sue Ann likes + 10% what Carlos likes
• Sue Ann likes: 80% cameras, 20% biking (weight 40%)
• Carlos likes: 30% cameras, 70% biking (weight 10%)
• My profile: 50% cameras, 50% biking (weight 50%)
• Result: 60% cameras, 40% biking

Recurrence Algorithm: Likes[i] = Wii × profile(i) + Σj Wij × Likes[j]; iterate until convergence

Parallelism: compute all Likes[i] in parallel
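The social arithmetic above amounts to one step of the recurrence. A minimal sketch, assuming a two-entry interest vector {cameras, biking}; the struct and function names are illustrative, not a GraphLab API.

```cpp
#include <utility>
#include <vector>

// A vertex holds its listed profile and its current interest estimate.
struct Vertex {
    double self_weight;           // W_ii: weight on the user's own profile
    std::vector<double> profile;  // what the user lists on their profile
    std::vector<double> likes;    // current Likes[i] estimate
};

// One label-propagation update:
//   Likes[i] = W_ii * profile(i) + sum_j W_ij * Likes[j]
void update_likes(Vertex& v,
                  const std::vector<std::pair<double, const Vertex*>>& nbrs) {
    std::vector<double> next(v.profile.size());
    for (std::size_t k = 0; k < next.size(); ++k)
        next[k] = v.self_weight * v.profile[k];
    for (const auto& [w, u] : nbrs)            // mix in neighbor estimates
        for (std::size_t k = 0; k < next.size(); ++k)
            next[k] += w * u->likes[k];
    v.likes = next;                            // iterate until convergence
}
```

With the slide's numbers (50% own profile at 50/50, 40% Sue Ann at 80/20, 10% Carlos at 30/70), one update yields the slide's 60% cameras, 40% biking.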

Page 25:

Properties of Graph-Parallel Algorithms
• Dependency graph
• Factored computation: what I like depends on what my friends like
• Iterative computation

Page 26:

Map-Reduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): feature extraction, algorithm tuning, basic data processing

Graph-Parallel (Map Reduce?): belief propagation, label propagation, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, Lasso

Page 27:

Why not use Map-Reduce for Graph-Parallel Algorithms?

Page 28:

Data Dependencies

Map-Reduce does not efficiently express data dependencies:
• The user must code substantial data transformations
• Costly data replication

[Diagram: independent data rows, the natural unit of Map-Reduce.]

Page 29:

Iterative Algorithms

Map-Reduce does not efficiently express iterative algorithms.

[Diagram: three iterations over partitioned data on CPUs 1–3, separated by barriers; a single slow processor stalls every barrier.]

Page 30:

MapAbuse: Iterative MapReduce

Only a subset of the data needs computation.

[Diagram: the same barrier-separated iterations, recomputing every partition even when few records change.]

Page 31:

MapAbuse: Iterative MapReduce

The system is not optimized for iteration.

[Diagram: every iteration pays a startup penalty and a disk penalty on top of the computation.]

Page 32:

Map-Reduce for Data-Parallel ML

Excellent for large data-parallel tasks!

Data-Parallel (Map Reduce): feature extraction, cross-validation, computing sufficient statistics

Graph-Parallel (Map Reduce? Bulk Synchronous?): belief propagation, SVM, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, Lasso

Page 33:

Bulk Synchronous Parallel (BSP)

Implementations: Pregel, Giraph, …

[Diagram: alternating compute and communicate phases separated by a barrier.]

Page 34:

Problem: bulk synchronous computation can be highly inefficient.

Page 35:

Problem with Bulk Synchronous

Example algorithm: if a neighbor is red, turn red.

Bulk synchronous computation: evaluate the condition on all vertices in every phase — 4 phases × 9 computations = 36 computations.

Asynchronous computation (wave-front): evaluate the condition only when a neighbor changes — 4 phases × 2 computations = 8 computations.

[Diagram: the red wave-front advancing from Time 0 through Time 4.]
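To make the work gap concrete, here is a toy simulation on a path graph. This is a 1-D stand-in for the slide's layout, so the absolute counts differ from 36 vs. 8, but the comparison has the same shape: bulk synchronous touches every vertex in every phase, while the wave-front touches only the frontier.

```cpp
#include <vector>

struct Counts { int bulk; int wavefront; };

// Simulate "turn red if a neighbor is red" spreading from vertex 0 along a
// path of n vertices, counting the condition evaluations each strategy
// performs per phase.
Counts propagate(int n) {
    std::vector<bool> red(n, false);
    red[0] = true;
    Counts c{0, 0};
    for (int phase = 0; phase < n - 1; ++phase) {
        c.bulk += n;            // bulk synchronous: evaluate all n vertices
        c.wavefront += 1;       // asynchronous: only the frontier vertex
        red[phase + 1] = true;  // the red frontier advances one step
    }
    return c;
}
```

On a 9-vertex path the bulk synchronous strategy performs 72 evaluations while the wave-front performs 8: the gap grows with the graph's diameter.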

Page 36:

Real-World Example: Loopy Belief Propagation

Page 37:

Loopy Belief Propagation (Loopy BP)

Iteratively estimate the “beliefs” about vertices:
• Read in messages
• Update the marginal estimate (belief)
• Send updated messages out
• Repeat for all variables until convergence

Page 38:

Bulk Synchronous Loopy BP

Often considered embarrassingly parallel:
• Associate a processor with each vertex
• Receive all messages
• Update all beliefs
• Send all messages

Proposed by: Brunton et al. CRV’06; Mendiburu et al. GECC’07; Kang et al. LDMTA’10; …

Page 39:

Sequential Computational Structure

Page 40:

Hidden Sequential Structure

Page 41:

Hidden Sequential Structure

Running time = (time for a single parallel iteration) × (number of iterations)

[Diagram: evidence at both ends of a chain; information must propagate across the whole chain.]

Page 42:

Optimal Sequential Algorithm

[Chart: running time on a chain of n vertices]
• Forward-Backward (sequential, p = 1): running time 2n
• Bulk Synchronous (p ≤ 2n): running time 2n²/p, leaving a large gap
• Optimal Parallel (p = 2): running time n
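The chart's numbers can be reconstructed as follows (a sketch, assuming the chain has n vertices and each message computation costs one unit):

```latex
% Sequential forward-backward: one sweep down the chain and one back,
% one message computation per edge per direction.
T_{\text{fwd-bwd}} = 2n \qquad (p = 1)

% Bulk synchronous: evidence needs about n iterations to cross the chain,
% and every iteration recomputes all 2n messages, split over p processors.
T_{\text{BSP}} = n \cdot \frac{2n}{p} = \frac{2n^2}{p} \qquad (p \le 2n)

% Optimal parallel: run the two sweeps from opposite ends simultaneously.
T_{\text{opt}} = n \qquad (p = 2)
```

So even with p = 2n processors, bulk synchronous BP only matches the optimal two-processor schedule: the extra processors buy no speedup, they only redo work.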

Page 43:

The Splash Operation

Generalize the optimal chain algorithm to arbitrary cyclic graphs:
1) Grow a BFS spanning tree of fixed size
2) Forward pass: compute all messages at each vertex
3) Backward pass: compute all messages at each vertex
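Step 1 can be sketched with an ordinary bounded BFS. This is a hedged illustration over a plain adjacency list, not the GraphLab data-graph API; the forward pass would sweep the returned order and the backward pass its reverse.

```cpp
#include <queue>
#include <vector>

// Grow a bounded-size BFS spanning tree rooted at `root`. The returned
// visitation order is the forward-pass order; the backward pass reverses it.
std::vector<int> grow_splash(const std::vector<std::vector<int>>& adj,
                             int root, int max_size) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    q.push(root);
    seen[root] = true;
    while (!q.empty() && (int)order.size() < max_size) {
        int v = q.front();
        q.pop();
        order.push_back(v);
        for (int u : adj[v])
            if (!seen[u]) { seen[u] = true; q.push(u); }
    }
    return order;
}
```

On the chain 0–1–2–3–4 rooted at vertex 2 with a size bound of 3, the tree is {2, 1, 3}: the Splash spreads outward from its root until it hits the size limit.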

Page 44:

Data-Parallel Algorithms Can Be Inefficient

[Plot: runtime in seconds vs. number of CPUs (1–8) for optimized in-memory bulk synchronous BP and asynchronous Splash BP.]

Page 45:

Summary of Work Efficiency

The bulk synchronous model is not work efficient! It computes “messages” before they are ready, so increasing processors increases the overall work, costing CPU time and energy.

How do we recover work efficiency? Respect the sequential structure of the computation and compute “messages” as needed: asynchronously.

Page 46:

The Need for a New Abstraction

Map-Reduce is not well suited for graph-parallelism.

Data-Parallel (Map Reduce): feature extraction, cross-validation, computing sufficient statistics

Graph-Parallel (Bulk Synchronous): belief propagation, SVM, kernel methods, deep belief networks, neural networks, tensor factorization, PageRank, Lasso

Page 47: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

OutlineImportance of Large-Scale Machine Learning

Need to model data-dependencies

Existing Large-Scale Machine Learning AbstractionsNeed for a efficient graph structured abstraction

GraphLab Abstraction:Addresses data-dependences Enables the expression of efficient algorithms

Experimental ResultsGraphLab dramatically outperforms existing abstractions

Open Research Challenges

Page 48:

What is GraphLab?

Page 49:

The GraphLab Abstraction: graph-based data representation; update functions (user computation); scheduler; consistency model

Page 50:

Data Graph

A graph with arbitrary data (C++ objects) associated with each vertex and edge.

Example (social network):
• Graph: social network
• Vertex data: user profile text, current interest estimates
• Edge data: similarity weights

Page 51:

Implementing the Data Graph

Multicore setting (in memory): relatively straightforward.
• vertex_data(vid) → data
• edge_data(vid, vid) → data
• neighbors(vid) → vid_list
• Challenge: fast lookup, low overhead
• Solution: dense data structures, fixed Vdata & Edata types, immutable graph structure

Cluster setting (in memory): partition the graph (ParMETIS or random cuts) and use cached ghosting.

[Diagram: vertices A, B, C, D partitioned across Node 1 and Node 2, each node caching ghost copies of boundary vertices.]

Page 52:

The GraphLab Abstraction: graph-based data representation; update functions (user computation); scheduler; consistency model

Page 53:

Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of that vertex.

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], Wij, Likes[j]) ← scope;

  // Update the vertex data
  Likes[i] ← Wii × profile(i) + Σj Wij × Likes[j];

  // Reschedule neighbors if needed
  if Likes[i] changes then
    reschedule_neighbors_of(i);
}

Page 54:

The GraphLab Abstraction: graph-based data representation; update functions (user computation); scheduler; consistency model

Page 55:

The Scheduler

The scheduler determines the order in which vertices are updated. CPUs pull vertices from the scheduler, apply the update function, and may push neighbors back onto the schedule; the process repeats until the scheduler is empty.

[Diagram: CPU 1 and CPU 2 drawing vertices a–k from a shared scheduler queue.]

Page 56: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Choosing a Schedule

GraphLab provides several different schedulersRound Robin: vertices are updated in a fixed orderFIFO: Vertices are updated in the order they are addedPriority: Vertices are updated in priority order

56

The choice of schedule affects the correctness and parallel performance of the algorithm

Obtain different algorithms by simply changing a flag! --scheduler=roundrobin --scheduler=fifo --scheduler=priority Optimal Splash BP

Algorithm
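A minimal sketch of the engine's dispatch loop under a FIFO schedule; the names here are illustrative, not the GraphLab C++ API. Swapping `std::queue` for a `std::priority_queue` keyed on, say, residual change would give the priority schedule.

```cpp
#include <functional>
#include <queue>
#include <vector>

// An update function receives a vertex id and returns the neighbors it
// wants rescheduled (mirroring reschedule_neighbors_of(i)).
using UpdateFn = std::function<std::vector<int>(int)>;

// Run until the scheduler is empty, dynamically adding work as updates
// request it. Returns the number of updates applied.
int run_fifo(std::queue<int> sched, const UpdateFn& update) {
    int applied = 0;
    while (!sched.empty()) {
        int v = sched.front();
        sched.pop();
        ++applied;
        for (int u : update(v)) sched.push(u);
    }
    return applied;
}
```

The key property this sketch shows is dynamic scheduling: unlike a bulk synchronous phase, only vertices that some update explicitly requested ever get recomputed.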

Page 57:

The GraphLab Abstraction: graph-based data representation; update functions (user computation); scheduler; consistency model

Page 58:

Ensuring Race-Free Code: how much can computation overlap?

Page 59:

Importance of Consistency

Many algorithms require strict consistency, or perform significantly better under strict consistency (e.g. alternating least squares).

Page 60:

Importance of Consistency

Machine learning algorithms require “model debugging”: build → test → debug → tweak model.

Page 61:

GraphLab Ensures Sequential Consistency

For each parallel execution, there exists a sequential execution of update functions which produces the same result.

[Diagram: a two-CPU parallel schedule and an equivalent single-CPU sequential schedule over time.]

Page 62: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

CPU 1 CPU 2

Common Problem: Write-Write Race

63

Processors running adjacent update functions simultaneously modify shared data:

CPU1 writes: CPU2 writes:

Final Value

Page 63: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Consistency Rules

64

Guaranteed sequential consistency for all update functions

Data

Page 64: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Full Consistency

65

Page 65: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Obtaining More Parallelism

66

Page 66: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Edge Consistency

67

CPU 1 CPU 2

Safe

Read

Page 67: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Consistency Through R/W LocksRead/Write locks:

Full Consistency

Edge Consistency

Write Write WriteCanonical Lock Ordering

Read Write ReadRead Write
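A hedged sketch of how edge consistency might be enforced with standard reader/writer locks: write-lock the center vertex, read-lock its neighbors, and always acquire in ascending vertex-id order so two adjacent updates cannot deadlock. The function names are illustrative and GraphLab's real locking is more involved.

```cpp
#include <algorithm>
#include <shared_mutex>
#include <vector>

// Acquire an edge-consistency scope: an exclusive lock on the center vertex
// and shared locks on its neighbors, all taken in canonical ascending-id
// order. Returns the order in which the locks were acquired.
std::vector<int> lock_scope(std::vector<std::shared_mutex>& locks,
                            int center, std::vector<int> nbrs) {
    nbrs.push_back(center);
    std::sort(nbrs.begin(), nbrs.end());      // canonical ordering
    for (int v : nbrs) {
        if (v == center) locks[v].lock();         // exclusive: we mutate it
        else             locks[v].lock_shared();  // shared: we only read
    }
    return nbrs;
}

void unlock_scope(std::vector<std::shared_mutex>& locks,
                  int center, const std::vector<int>& nbrs) {
    for (int v : nbrs) locks[v].unlock_shared();
    locks[center].unlock();
}
```

Because every update sorts its scope the same way, two overlapping scopes always contend on their lowest shared vertex first, which rules out the circular wait a deadlock would need.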

Page 68: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

The GraphLab Abstraction

Scheduler Consistency Model

Graph BasedData Representation

Update FunctionsUser Computation

71

Page 69:

The Code — http://graphlab.org

API implemented in C++: Pthreads, GCC atomics, TCP/IP, MPI, in-house RPC

Multicore API: Matlab/Java/Python support; available under the Apache 2.0 license

Cloud API: built and tested on EC2; no fault tolerance

Page 70:

Anatomy of a GraphLab Program:
1) Define a C++ update function
2) Build the data graph using the C++ graph object
3) Set engine parameters: scheduler type and consistency model
4) Add initial vertices to the scheduler
5) Run the engine on the graph [blocking C++ call]
6) The final answer is stored in the graph

Page 71:

Carnegie Mellon

Algorithms implemented on GraphLab: Bayesian tensor factorization, Gibbs sampling, dynamic block Gibbs sampling, matrix factorization, Lasso, SVM, belief propagation, PageRank, CoEM, K-Means, SVD, LDA, …many others…

Page 72:

Startups using GraphLab; companies experimenting with GraphLab; academic projects exploring GraphLab

1600+ unique downloads tracked (possibly many more from direct repository checkouts)

Page 73:

GraphLab Matrix Factorization Toolkit

Used in ACM KDD Cup 2011 (track 1): 5th place out of more than 1000 participants; two orders of magnitude faster than Mahout.

Testimonials:

“The Graphlab implementation is significantly faster than the Hadoop implementation … [GraphLab] is extremely efficient for networks with millions of nodes and billions of edges …” — Akshay Bhat, Cornell

“The guys at GraphLab are crazy helpful and supportive … 78% of our value comes from motivation and brilliance of these guys.” — Timmy Wilson, smarttypes.org

“I have been very impressed by Graphlab and your support/work on it.” — Clive Cox, rumblelabs.com

Page 74: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

OutlineImportance of Large-Scale Machine Learning

Need to model data-dependencies

Existing Large-Scale Machine Learning AbstractionsNeed for a efficient graph structured abstraction

GraphLab Abstraction:Addresses data-dependences Enables the expression of efficient algorithms

Experimental ResultsGraphLab dramatically outperforms existing abstractions

Open Research Challenges

Page 75: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Shared MemoryExperiments

Shared Memory Setting16 Core Workstation

78

Page 76:

Loopy Belief Propagation: 3D retinal image denoising

• Data graph: 1 million vertices, 3 million edges
• Update function: loopy BP update equation
• Scheduler: approximate priority
• Consistency model: edge consistency

Page 77:

Loopy Belief Propagation

[Plot: speedup vs. number of CPUs (0–16); Splash BP achieves a 15.5x speedup on 16 cores, close to optimal.]

Page 78: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

CoEM (Rosie Jones, 2005)Named Entity Recognition Task

the dog

Australia

Catalina Island

<X> ran quickly

travelled to <X>

<X> is pleasant

Hadoop 95 Cores 7.5 hrs

Is “Dog” an animal?Is “Catalina” a place?

Vertices: 2 MillionEdges: 200 Million

Page 79: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

0 2 4 6 8 10 12 14 160

2

4

6

8

10

12

14

16

Number of CPUs

Spee

dup

Bett

er

Optimal

GraphLab CoEM

CoEM (Rosie Jones, 2005)

82

GraphLab 16 Cores 30 min

15x Faster!6x fewer CPUs!

Hadoop 95 Cores 7.5 hrs

Page 80: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

ExperimentsAmazon EC2

High-Performance Nodes

83

Page 81: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Video Cosegmentation

Segments mean the same

Model: 10.5 million nodes, 31 million edges

Gaussian EM clustering + BP on 3D grid

Page 82: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Video Coseg. Speedups

Page 83: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Prefetching Data & Locks

Page 84: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Matrix FactorizationNetflix Collaborative Filtering

Alternating Least Squares Matrix Factorization

Model: 0.5 million nodes, 99 million edges

Netflix

Users

Movies

d

Page 85: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

NetflixSpeedup Increasing size of the matrix factorization

Page 86: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Distributed GraphLab

Page 87: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

The Cost of Hadoop

Page 88: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

OutlineImportance of Large-Scale Machine Learning

Need to model data-dependencies

Existing Large-Scale Machine Learning AbstractionsNeed for a efficient graph structured abstraction

GraphLab Abstraction:Addresses data-dependences Enables the expression of efficient algorithms

Experimental ResultsGraphLab dramatically outperforms existing abstractions

Open Research Challenges

Page 89: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Storage of Large Data-GraphsFault tolerance to machine/network failure

Can I remove (re-task) a node or network resources without restarting dependent computation?

Relaxed transactional consistencyCan I eliminate locking and approximately recover when data corruption occurs?

Support rapid vertex and edge additionHow can I allow graphs to continuously grow while computation proceeds?

Graph partitioning for “natural graphs” How can I balance the computation while minimizing communication on a power-law graph?

Page 90: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

Event driven graph computationTrigger computation on data and structural modifications

Exploit small neighborhood effects

Page 91: Carnegie Mellon Joseph Gonzalez Joint work with Yucheng Low Aapo Kyrola Danny Bickson Carlos Guestrin Guy Blelloch Joe Hellerstein David O’Hallaron A New

SummaryImportance of Large-Scale Machine Learning

Need to model data-dependencies

Existing Large-Scale Machine Learning AbstractionsNeed for a efficient graph structured abstraction

GraphLab Abstraction:Addresses data-dependences Enables the expression of efficient algorithms

Experimental ResultsGraphLab dramatically outperforms existing abstractions

Open Research Challenges

Page 92:

Carnegie Mellon

Check out GraphLab: http://graphlab.org

Documentation… Code… Tutorials…

Questions & Comments

[email protected]