Distributed Machine Learning with the Samsara DSL
Sebastian Schelter, Flink Forward 2015


Page 1

Distributed Machine Learning with the Samsara DSL

Sebastian Schelter, Flink Forward 2015

Page 2

About me

• about to finish my PhD on "Scaling Data Mining in Massively Parallel Dataflow Systems"

• currently:

– Machine Learning Scientist / Post-Doctoral Researcher at Amazon's Berlin-based ML group

– senior researcher at the Database Group of TU Berlin

• member of the Apache Software Foundation (Mahout/Giraph/Flink)

Page 3

Samsara

• Samsara is an easy-to-use domain specific language (DSL) for distributed large-scale machine learning on systems like Apache Spark and Apache Flink

• part of the Apache Mahout project

• uses Scala as programming/scripting environment

• system-agnostic, R-like DSL:

val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)

• algebraic expression optimizer for distributed linear algebra
  – provides a translation layer to distributed engines

$G = B B^\top - C - C^\top + (\xi^\top \xi) \, s_q s_q^\top$

Page 4

Data Types

• Scalar real values

• In-memory vectors
  – dense
  – 2 types of sparse

• In-memory matrices
  – sparse and dense
  – a number of specialized matrices

• Distributed Row Matrices (DRM)
  – huge matrix, partitioned by rows
  – lives in the main memory of the cluster
  – provides a small set of parallelized operations
  – lazily evaluated operation execution

val x = 2.367                                  // scalar

val v = dvec(1, 0, 5)                          // dense vector

val w = svec((0 -> 1) :: (2 -> 5) :: Nil)      // sparse vector

val A = dense((1, 0, 5), (2, 1, 4), (4, 3, 1)) // dense in-memory matrix

val drmA = drmFromHDFS(...)                    // distributed row matrix

Page 5

Features (1)

• matrix, vector, scalar operators: in-memory, distributed
  drmA %*% drmB; A %*% x; A.t %*% drmB; A * B

• slicing operators
  A(5 until 20, 3 until 40); A(5, ::); A(5, 5); x(a to b)

• assignments (in-memory only)
  A(5, ::) := x; A *= B; A -=: B; 1 /:= x

• vector-specific
  x dot y; x cross y

• summaries
  A.nrow; x.length; A.colSums; B.rowMeans; x.sum; A.norm

Page 6

Features (2)

• solving linear systems
  val x = solve(A, b)

• in-memory decompositions
  val (inMemQ, inMemR) = qr(inMemM)
  val ch = chol(inMemM)
  val (inMemV, d) = eigen(inMemM)
  val (inMemU, inMemV, s) = svd(inMemM)

• distributed decompositions
  val (drmQ, inMemR) = thinQR(drmA)
  val (drmU, drmV, s) = dssvd(drmA, k = 50, q = 1)

• caching of DRMs
  val drmA_cached = drmA.checkpoint()
  drmA_cached.uncache()

Page 7

Example

Page 8

Cereals

Name                     protein  fat  carbo  sugars  rating
Apple Cinnamon Cheerios  2        2    10.5   10      29.509541
Cap'n'Crunch             1        2    12     12      18.042851
Cocoa Puffs              1        1    12     13      22.736446
Froot Loops              2        1    11     13      32.207582
Honey Graham Ohs         1        2    12     11      21.871292
Wheaties Honey Gold      2        1    16     8       36.187559
Cheerios                 6        2    17     1       50.764999
Clusters                 3        2    13     7       40.400208
Great Grains Pecan       3        3    13     4       45.811716

http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html

Page 9

Linear Regression

• Assumption: target variable y is generated by a linear combination of the feature matrix X with parameter vector β, plus noise ε:

  $y = X\beta + \varepsilon$

• Goal: find an estimate of the parameter vector β that explains the data well

• Cereals example:
  X = weights of ingredients
  y = customer rating

Page 10

Data Ingestion

• Usually: load dataset as DRM from a distributed filesystem:

val drmData = drmFromHDFS(...)

• 'Mimic' a large dataset for our example:

val drmData = drmParallelize(dense(
  (2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios
  (1, 2, 12, 12, 18.042851),   // Cap'n'Crunch
  (1, 1, 12, 13, 22.736446),   // Cocoa Puffs
  (2, 1, 11, 13, 32.207582),   // Froot Loops
  (1, 2, 12, 11, 21.871292),   // Honey Graham Ohs
  (2, 1, 16, 8, 36.187559),    // Wheaties Honey Gold
  (6, 2, 17, 1, 50.764999),    // Cheerios
  (3, 2, 13, 7, 40.400208),    // Clusters
  (3, 3, 13, 4, 45.811716)),   // Great Grains Pecan
  numPartitions = 2)

Page 11

Data Preparation

• Cereals example: target variable y is the customer rating, weights of ingredients are the features X

• extract X as a DRM by slicing, fetch y as an in-core vector

val drmX = drmData(::, 0 until 4)

val y = drmData.collect(::, 4)

[Figure: the sliced feature matrix drmX (protein, fat, carbo, sugars) next to the target vector y (rating)]

Page 12

Estimating β

• Ordinary Least Squares: minimizes the sum of squared residuals between the true target variable and the prediction of the target variable

• Closed-form expression for the estimate of β (formula below)

• Computing X^T X and X^T y is as simple as typing the formulas:

val drmXtX = drmX.t %*% drmX

val drmXty = drmX.t %*% y

$\hat{\beta} = (X^\top X)^{-1} X^\top y$
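
Since drmX feeds both products, it can pay to checkpoint it first. A minimal sketch using only the caching call from the features slide (the variable names are mine):

val drmX_cached = drmX.checkpoint()  // cache drmX, it is read twice below

val drmXtX = drmX_cached.t %*% drmX_cached
val drmXty = drmX_cached.t %*% y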


Page 14

Estimating β

• Solve the following linear system to get the least-squares estimate of β:

  $X^\top X \hat{\beta} = X^\top y$

• Fetch X^T X and X^T y onto the driver and use an in-memory solver
  – assumes X^T X fits into memory
  – uses an analog of R's solve() function

val XtX = drmXtX.collect
val Xty = drmXty.collect(::, 0)

val betaHat = solve(XtX, Xty)

→ We have implemented distributed linear regression!

Page 15

Goodness of fit

• Prediction of the target variable is a simple matrix-vector multiplication:

  $\hat{y} = X \hat{\beta}$

• Check the L2 norm of the difference between the true target variable and our prediction:

val yHat = (drmX %*% betaHat).collect(::, 0)
(y - yHat).norm(2)

Page 16

Adding a bias term

• Bias term left out so far
  – a constant factor added to the model, "shifts the line vertically"

• Common trick is to add a column of ones to the feature matrix
  – the bias term will then be learned automatically

[Figure: the feature matrix before and after appending a column of ones]

Page 17

Adding a bias term

• How do we add a new column to a DRM?
  → mapBlock() allows for custom modifications of the matrix

val drmXwithBiasColumn = drmX.mapBlock(ncol = drmX.ncol + 1) {
  case (keys, block) =>
    // create a new block with an additional column
    val blockWithBiasColumn = block.like(block.nrow, block.ncol + 1)
    // copy data from the current block into the new block
    blockWithBiasColumn(::, 0 until block.ncol) := block
    // the last column consists of ones
    blockWithBiasColumn(::, block.ncol) := 1
    keys -> blockWithBiasColumn
}
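
With the bias column in place, the normal-equations recipe from the previous slides can simply be rerun. A sketch reusing only calls shown earlier (the variable names are mine):

val drmXb = drmXwithBiasColumn
val XtXb  = (drmXb.t %*% drmXb).collect
val Xtyb  = (drmXb.t %*% y).collect(::, 0)

val betaHatWithBias = solve(XtXb, Xtyb)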

Page 18

Under the covers

Page 19

Underlying systems

• prototype on Apache Spark

• prototype on H2O

• coming up: support for Apache Flink

Page 20

Runtime & Optimization

• Execution is deferred, the user composes logical operators

• Computational actions implicitly trigger optimization (= selection of a physical plan) and execution

• Optimization factors: size of operands, orientation of operands, partitioning, sharing of computational paths

• e.g., matrix multiplication:
  – 5 physical operators for drmA %*% drmB
  – 2 operators for drmA %*% inMemA
  – 1 operator for drmA %*% x
  – 1 operator for x %*% drmA

val C = A.t %*% A               // builds a logical plan only

I.writeDrm(path)                // action: triggers optimization and execution

val inMemV = (U %*% M).collect  // action: fetches the result to the driver

Page 24

Optimization Example

• Computation of A^T A in the example

• Naïve execution:
  1st pass: transpose A (requires repartitioning of A)
  2nd pass: multiply the result with A (expensive, potentially requires repartitioning again)

• Logical optimization: the optimizer rewrites the plan to use a specialized logical operator for Transpose-Times-Self matrix multiplication

val C = A.t %*% A

[Plan rewrite: MatrixMult(Transpose(A), A) becomes Transpose-Times-Self(A), both producing C]

Page 31

Transpose-Times-Self

• Samsara computes A^T A via a row-outer-product formulation:

  $A^\top A = \sum_{i=0}^{m} a_{i\cdot} \, a_{i\cdot}^\top$

  – executes in a single pass over the row-partitioned A

[Figure: A^T A assembled as a1• a1•^T + a2• a2•^T + a3• a3•^T + a4• a4•^T]
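
An in-core illustration of this formulation, written as a sketch with the dense and cross operators from the earlier slides (not the distributed operator itself):

// A^T A as the sum of the outer products of A's rows (in-core sketch)
val A = dense((1, 1, 0, 0), (0, 1, 0, 1), (0, 1, 1, 1))

val AtA = (0 until A.nrow)
  .map(i => A(i, ::) cross A(i, ::))  // outer product of row i with itself
  .reduce(_ + _)                      // sum over all rows

// AtA now equals A.t %*% A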

Page 32

Physical operators for the distributed computation of A^T A

Page 33

Physical operators for Transpose-Times-Self

• Two physical operators (concrete implementations) available for the Transpose-Times-Self operation
  – standard operator AtA
  – operator AtA_slim, a specialized implementation for tall & skinny matrices

• Optimizer must choose
  – currently: depends on a user-defined threshold for the number of columns
  – ideally: a cost-based decision, dependent on estimates of intermediate result sizes

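
A sketch of that choice; the threshold name and value here are invented for illustration:

// hypothetical sketch of the operator selection (names/values illustrative)
val maxColsForSlim = 200
val physicalOperator =
  if (drmA.ncol <= maxColsForSlim) "AtA_slim"  // tall & skinny: local Gram matrices
  else "AtA"                                   // general shuffle-based operator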

Page 43

Physical operator AtA

[Figure: A with rows (1 1 0 0), (0 1 0 1), (0 1 1 1) is row-partitioned into A1 on worker 1 and A2 on worker 2. For each target partition, every worker forms the partial outer products of its rows and ships them; workers 3 and 4 sum the incoming partial results into the rows of A^T A = (1 1 0 0; 1 3 1 2; 0 1 1 1; 0 2 1 2).]
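
The shuffle behind this operator can be simulated on plain Scala collections; a sketch only (the real operator works on matrix partitions, not arrays):

// every row a_i contributes the partial row a_i(j) * a_i to row j of A^T A
val rows = Seq(
  Array(1.0, 1.0, 0.0, 0.0),
  Array(0.0, 1.0, 0.0, 1.0),
  Array(0.0, 1.0, 1.0, 1.0))

val partials = for {
  row <- rows
  j   <- row.indices if row(j) != 0.0   // emit one partial row per non-zero
} yield j -> row.map(_ * row(j))

val ata = partials
  .groupBy { case (j, _) => j }         // the "shuffle": group by target row
  .map { case (j, ps) =>
    j -> ps.map(_._2).reduce((x, y) => x.zip(y).map { case (a, b) => a + b })
  }
// ata(1) is Array(1.0, 3.0, 1.0, 2.0), the second row of A^T A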

Page 47

Physical operator AtA_slim

[Figure: A is row-partitioned into A1 on worker 1 and A2 on worker 2. Each worker computes its local Gram matrix A1^T A1 resp. A2^T A2; the driver sums them up to obtain C = A^T A = A1^T A1 + A2^T A2.]
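
A plain-Scala simulation of this scheme; a sketch only (the real operator exploits symmetry and computes just the upper triangle of each local Gram matrix):

// each worker computes its local Gram matrix A_i^T A_i over its rows
def gram(part: Seq[Array[Double]], n: Int): Array[Array[Double]] = {
  val g = Array.fill(n, n)(0.0)
  for (row <- part; i <- 0 until n; j <- 0 until n)
    g(i)(j) += row(i) * row(j)
  g
}

val a1 = Seq(Array(1.0, 1.0, 0.0, 0.0))                             // worker 1
val a2 = Seq(Array(0.0, 1.0, 0.0, 1.0), Array(0.0, 1.0, 1.0, 1.0))  // worker 2

// driver: element-wise sum of the local Gram matrices gives C = A^T A
val c = gram(a1, 4).zip(gram(a2, 4)).map { case (r1, r2) =>
  r1.zip(r2).map { case (x, y) => x + y }
}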

Page 49

Thank you. Questions?