Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

Presented by Joe, Jiefeng and Xiaolu


Outline:
  • Matrix Factorization
  • Stochastic Gradient Descent
  • Distributed SGD with MapReduce
  • Experiments
  • Summary


Matrix Factorization

[Figure panels: Original Image, Actual Image, Reconstructed Image]

Cite: Large-Scale Matrix Factorization, Rainer Gemulla, November 23, 2012


Collaborative Filtering -- Recommendation Systems

Problem
  • Set of users
  • Set of items (movies, books, jokes, products, stories, ...)
  • Feedback (ratings, purchases, click-through, tags, ...)

Predict additional items a user may like
  • Assumption: similar feedback implies similar taste


Traditional Matrix Factorization

Given an m × n matrix V and a rank r, find an m × r matrix W and an r × n matrix H such that V = W H.

An exact factorization exists only when rank(V) ≤ r, so in practice the goal is a low-rank approximation V ≈ W H.
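To make the shapes concrete, here is a minimal sketch in plain Python. The matrices, the helper names `matmul` and `entry_counts`, and the sizes used at the end are hypothetical and chosen only for illustration. It builds a small V that factors exactly at rank 2 and compares how many numbers the full matrix needs against how many the two factors need.

```python
def matmul(W, H):
    """Multiply an m x r matrix W by an r x n matrix H (lists of lists)."""
    r, n = len(H), len(H[0])
    return [[sum(row[k] * H[k][j] for k in range(r)) for j in range(n)]
            for row in W]

def entry_counts(m, n, r):
    """Return (entries in V, entries in W plus entries in H)."""
    return m * n, m * r + r * n

# Hypothetical factors: W is 3 x 2, H is 2 x 4, so V = W H is 3 x 4 with rank <= 2.
W = [[1, 0],
     [0, 1],
     [1, 1]]
H = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
V = matmul(W, H)

# For a large matrix the factors are far smaller than V itself.
full, factored = entry_counts(m=1000, n=500, r=10)
```

The third row of V is the sum of the first two, which is why two columns in W are enough here. Real rating matrices are not exactly low rank, which is where the approximation V ≈ W H comes in.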


Generalized Matrix Factorization

A general machine learning problem
  • Recommender systems, text indexing, face recognition, ...

Training data
  • V: m × n input matrix (e.g., rating matrix)
  • Z: training set of indexes in V (e.g., subset of known ratings)

Parameter space
  • W: row factors (e.g., m × r latent customer factors)
  • H: column factors (e.g., r × n latent movie factors)

Model
  • L_ij(W_i*, H_*j): loss at element (i, j)
  • Includes prediction error, regularization, auxiliary information, ...
  • Constraints (e.g., non-negativity)

Find best model
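Using the symbols defined on this slide, "find best model" can be read as minimizing the total loss over the training set Z. The first line below states that objective. The second line is only one possible choice of local loss, squared error with L2 regularization, and the weight \lambda is an assumption introduced for illustration.

```latex
(W^{*}, H^{*}) = \operatorname*{argmin}_{W,\,H} \; \sum_{(i,j) \in Z} L_{ij}\left(W_{i*}, H_{*j}\right)

% Example local loss (an assumption, not the only option):
L_{ij}\left(W_{i*}, H_{*j}\right)
  = \left(V_{ij} - W_{i*} H_{*j}\right)^{2}
  + \lambda \left( \lVert W_{i*} \rVert^{2} + \lVert H_{*j} \rVert^{2} \right)
```

Because each term depends only on row i of W and column j of H, the objective decomposes over individual training points.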


Successful Applications

Movie recommendation (Netflix)
  • >20M users, >20k movies, 4B ratings (projected)
  • 60GB data, 15GB model (projected)
  • Collaborative filtering

Website recommendation (Microsoft, WWW10)
  • 51M users, 15M URLs, 1.2B clicks
  • 17.8GB data, 161GB metadata, 49GB model
  • Gaussian non-negative matrix factorization

News personalization (Google, WWW07)
  • Millions of users, millions of stories, ? clicks
  • Probabilistic latent semantic indexing

How to handle such massive scale?
  • Big data
  • Large models
  • Expensive, iterative computations


Stochastic Gradient Descent


Stochastic Gradient Descent for Matrix Factorization
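A minimal sketch of SGD applied to matrix factorization, assuming a squared-error loss with optional L2 regularization. Each step picks one training point (i, j) from Z and updates only row W_i* and column H_*j. The function names, the learning rate, the initialization range, and the toy data are all hypothetical choices for illustration.

```python
import random

def sgd_matrix_factorization(Z, m, n, r, epochs=1000, lr=0.05, reg=0.0, seed=0):
    """Fit V ~ W H by SGD over the observed entries.

    Z is a list of (i, j, v) triples: training indexes with their values V_ij.
    Returns W as m rows of length r, and H stored column-wise as n lists of length r.
    """
    rng = random.Random(seed)
    # Small positive random start (an arbitrary choice for this sketch).
    W = [[rng.uniform(0.5, 1.0) for _ in range(r)] for _ in range(m)]
    H = [[rng.uniform(0.5, 1.0) for _ in range(r)] for _ in range(n)]
    Z = list(Z)
    for _ in range(epochs):
        rng.shuffle(Z)  # visit the training points in random order
        for i, j, v in Z:
            w, h = W[i], H[j]
            err = v - sum(wk * hk for wk, hk in zip(w, h))
            for k in range(r):
                wk, hk = w[k], h[k]
                # Gradient step on (v - w.h)^2 + reg * (|w|^2 + |h|^2),
                # with the constant factor 2 folded into lr.
                w[k] = wk + lr * (err * hk - reg * wk)
                h[k] = hk + lr * (err * wk - reg * hk)
    return W, H

def predict(W, H, i, j):
    """Predicted value for entry (i, j)."""
    return sum(wk * hk for wk, hk in zip(W[i], H[j]))

# Hypothetical toy data: the rank-1 matrix [[1, 2], [2, 4], [3, 6]]
# with entry (2, 1) held out of the training set.
Z = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
W, H = sgd_matrix_factorization(Z, m=3, n=2, r=1)
held_out = predict(W, H, 2, 1)  # compare against the true value 6
```

The held-out prediction is the recommender use case in miniature: the model never sees entry (2, 1) but can estimate it from the learned factors.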


A Small Problem for Distributed Computation


Distributed SGD with MapReduce
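A minimal sketch of the stratification idea behind distributed SGD, as described in the Gemulla, Haas, Nijkamp and Sismanis paper listed in the references. Rows and columns are each split into d blocks, and the d × d blocks of V are grouped into strata whose blocks share no rows or columns. Updates in different blocks of one stratum then touch disjoint parts of W and H, so they could run in parallel. The cyclic-shift schedule, the function names (`strata`, `block_of`, `sgd_step`, `dsgd`), the squared-error loss, the toy data, and the sequential loop standing in for parallel MapReduce tasks are all assumptions made for illustration.

```python
import random

def strata(d):
    """Return d strata; stratum s pairs row block b with column block (b + s) % d."""
    return [[(b, (b + s) % d) for b in range(d)] for s in range(d)]

def block_of(index, size, d):
    """Map a row or column index to one of d contiguous blocks."""
    return index * d // size

def sgd_step(w, h, v, lr):
    """One squared-error SGD update of a row factor w and a column factor h."""
    err = v - sum(wk * hk for wk, hk in zip(w, h))
    for k in range(len(w)):
        wk, hk = w[k], h[k]
        w[k] = wk + lr * err * hk
        h[k] = hk + lr * err * wk

def dsgd(Z, m, n, r, d, epochs=300, lr=0.01, seed=0):
    """Stratified SGD; blocks of a stratum are processed one after another here."""
    rng = random.Random(seed)
    W = [[rng.uniform(0.5, 1.0) for _ in range(r)] for _ in range(m)]
    H = [[rng.uniform(0.5, 1.0) for _ in range(r)] for _ in range(n)]
    # Group training points by the (row block, column block) they fall into.
    blocks = {}
    for i, j, v in Z:
        key = (block_of(i, m, d), block_of(j, n, d))
        blocks.setdefault(key, []).append((i, j, v))
    for _ in range(epochs):
        order = strata(d)
        rng.shuffle(order)
        for stratum in order:  # one sub-epoch per stratum
            # Blocks in a stratum use disjoint rows of W and columns of H,
            # so a real deployment could hand each block to its own worker.
            for key in stratum:
                points = blocks.get(key, [])
                rng.shuffle(points)
                for i, j, v in points:
                    sgd_step(W[i], H[j], v, lr)
    return W, H

# Hypothetical toy data: fully observed 4 x 4 rank-1 matrix V_ij = (i + 1) * (j + 1).
Z = [(i, j, float((i + 1) * (j + 1))) for i in range(4) for j in range(4)]
W, H = dsgd(Z, m=4, n=4, r=1, d=2)
max_err = max(abs(v - sum(a * b for a, b in zip(W[i], H[j]))) for i, j, v in Z)
```

With d = 2 the strata are {(0, 0), (1, 1)} and {(0, 1), (1, 0)}: within each, the two blocks never read or write the same row of W or column of H.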


Experiments


Example: Netflix data


Example: Synthetic data


Scalability


Speed up


References:
  • Rainer Gemulla, Peter J. Haas, Erik Nijkamp, Yannis Sismanis. Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent.
  • Rainer Gemulla. Large-Scale Matrix Factorization. November 23, 2012.