Dhaval Shah, R&D Software Engineer, Bloomberg L.P.

Recommender System at Scale Using HBase and Hadoop

DESCRIPTION

Recommender systems play a crucial role in a variety of businesses in today's world. From e-commerce web sites to news portals, companies are leveraging data about their users to create a personalized user experience, gain competitive advantage and eventually drive revenue. Dealing with the sheer quantity of data readily available can be a daunting task by itself; applying machine learning algorithms on top of it makes the problem considerably more complex. Fortunately, tools like Hadoop and HBase make this task a little more manageable by taking out some of the complexities of dealing with a large amount of data. In this talk, we will share our success story of building a recommender system for Bloomberg.com leveraging the Hadoop ecosystem. We will describe the high-level architecture of the system and discuss the pros and cons of our design choices. Bloomberg.com operates at a scale of hundreds of millions of users. Building a recommendation engine for Bloomberg.com entails applying machine learning algorithms to terabytes of data while still serving sub-second responses. We will discuss techniques for efficiently and reliably collecting data in near real time, the notion of offline vs. online processing and, most importantly, how HBase fits the bill by serving both as a real-time database and as input/output for running MapReduce.

Page 1: Recommender System at Scale Using HBase and Hadoop

Dhaval Shah, R&D Software Engineer, Bloomberg L.P.

Recommender Systems at scale using HBase and Hadoop

Bloomberg

Page 2: Recommender System at Scale Using HBase and Hadoop

Agenda
- Introduction to Recommender Systems
- Types of Recommender Systems
- Building a Recommender System
- Summary
- (Hopefully) Lots of Q&A

Page 3: Recommender System at Scale Using HBase and Hadoop

Introduction to Recommender Systems

What is a Recommender System?
- Wikipedia[1]: Recommender systems are a subclass of information filtering system that seek to predict the ‘rating’ or ‘preference’ that a user would give to an item or social element they had not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user’s social environment (collaborative filtering approaches).

Page 4: Recommender System at Scale Using HBase and Hadoop

Introduction to Recommender Systems

Where are Recommender Systems used?
- Everywhere! (Well almost!)
  o E-Commerce
  o Web Portals
  o Online Radio
  o Streaming Movies
  o Media/News

Page 5: Recommender System at Scale Using HBase and Hadoop
Page 6: Recommender System at Scale Using HBase and Hadoop
Page 7: Recommender System at Scale Using HBase and Hadoop
Page 8: Recommender System at Scale Using HBase and Hadoop

Introduction to Recommender Systems

Page 9: Recommender System at Scale Using HBase and Hadoop
Page 10: Recommender System at Scale Using HBase and Hadoop
Page 11: Recommender System at Scale Using HBase and Hadoop

Introduction to Recommender Systems

Why do you need a Recommender System?
- Too much useful information
- Bloomberg.com statistics
  o 500-1000 stories, 100-200 videos published per day
  o Average user consumption << Articles published
  o Satisfied user = Content Quality + User preference
  o Double digit increases in CTR

Page 12: Recommender System at Scale Using HBase and Hadoop

Types of Recommender Systems
- Content-Based
- Collaborative filter based
  o User-based
  o Item-based
- Hybrid
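To make the item-based collaborative flavor concrete, here is a minimal sketch in Python. It is not taken from the talk: the toy click data and the names `item_similarity` and `recommend` are illustrative assumptions, and cosine similarity over reader sets is just one common choice of similarity measure.

```python
from collections import defaultdict
from math import sqrt

def item_similarity(clicks):
    """Cosine similarity between items based on which users read them.

    clicks: dict mapping user id -> set of item ids (hypothetical toy data).
    Returns dict mapping (item_a, item_b) -> similarity in (0, 1].
    """
    readers = defaultdict(set)
    for user, items in clicks.items():
        for item in items:
            readers[item].add(user)
    sims = {}
    for a in readers:
        for b in readers:
            if a != b:
                shared = len(readers[a] & readers[b])
                if shared:
                    sims[(a, b)] = shared / sqrt(len(readers[a]) * len(readers[b]))
    return sims

def recommend(user, clicks, sims, top_n=3):
    """Rank items the user has not seen by summed similarity to seen items."""
    seen = clicks[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical click histories: user id -> stories read.
clicks = {
    "u1": {"fed-rates", "oil-prices"},
    "u2": {"fed-rates", "oil-prices", "tech-ipo"},
    "u3": {"tech-ipo", "startup-funding"},
}
sims = item_similarity(clicks)
print(recommend("u1", clicks, sims))
```

A content-based system would instead compare a story's own features against a per-user profile, so it needs no data from other users.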

Page 13: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System
- Collect/Generate metadata about stories/videos
- Identify and track users
- Track user activity
- Store user activity
- Generate user models
- Serve recommendations

Page 14: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

- Collect metadata about stories/videos
  o URLs, Headlines, etc.
  o Sqoop, Custom Scripts
- Generate features for stories
  o LDA from Mahout
  o Custom extensions

Page 15: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Identify and track users
- Registered
- Anonymous
  o Cookie based tracking
  o IP based tracking
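One way to combine the three identification methods is a simple fallback chain. The sketch below is an assumption, not the talk's implementation: the preference order (registered account, then cookie, then IP), the key prefixes, and the function name `resolve_user_key` are all hypothetical.

```python
import hashlib

def resolve_user_key(account_id=None, cookie_id=None, ip_address=None):
    """Pick the most reliable identifier available for a request.

    Preference order (an assumption for this sketch): registered account,
    then tracking cookie, then a hash of the IP address as a last resort.
    """
    if account_id:
        return "reg:" + account_id
    if cookie_id:
        return "cookie:" + cookie_id
    if ip_address:
        # Hash the address so the raw IP is not used directly as a key.
        digest = hashlib.sha256(ip_address.encode("utf-8")).hexdigest()[:16]
        return "ip:" + digest
    raise ValueError("request carries no usable identifier")

print(resolve_user_key(account_id="alice", cookie_id="c-123"))
print(resolve_user_key(cookie_id="c-123", ip_address="203.0.113.7"))
```

IP-based keys are the weakest of the three, since many users can share one address, which is why the sketch uses them only when nothing better is available.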

Page 16: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Types of user activity
- Explicit interactions
- Implicit interactions

Page 17: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Tracking user activity

[Diagram: Browser (JavaScript) → HTTP Server → Flume → HBase]

Page 18: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Tracking: Key Features
- 1000s of ppm
- Asynchronous - Instantaneous responses to client
- Reliability
- Multiple HTTP Servers → Multiple Clusters
- Client to HBase in milliseconds
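The asynchronous property can be illustrated with a small sketch: the request handler only enqueues the event and acknowledges, while a background worker persists it. This is an assumption about the general pattern, not the talk's code; `TrackingCollector` is a hypothetical name and a plain Python list stands in for the Flume/HBase hop.

```python
import queue
import threading

class TrackingCollector:
    """Accepts events without blocking the caller; a worker thread persists them."""

    def __init__(self, sink):
        self._events = queue.Queue()
        self._sink = sink  # stand-in for the Flume/HBase hop (assumption)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def handle_request(self, user_key, item_id, action):
        # Enqueue and acknowledge immediately; no storage I/O on this path.
        self._events.put((user_key, item_id, action))
        return 204  # HTTP "No Content" style acknowledgement

    def _drain(self):
        # Background loop: move events from the queue to the sink one by one.
        while True:
            event = self._events.get()
            try:
                self._sink.append(event)
            finally:
                self._events.task_done()

    def flush(self):
        # Block until every queued event has reached the sink.
        self._events.join()

stored = []
collector = TrackingCollector(stored)
status = collector.handle_request("cookie:c-123", "story-42", "click")
collector.flush()
print(status, stored)
```

Decoupling the acknowledgement from the write is what keeps client latency independent of storage latency; reliability then depends on how durable the queue between the two is.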

Page 19: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Why HBase?
- Scalable
- Fault-tolerant
- Auto-sharding
- Schema-less and sparse
- Real-time queries
- MR integration

Page 20: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Store user activity
- 100s of millions of users
- Millions of stories/videos
- TBs of data
- Wide Tables – 1 row per user
- High load
- Sub-second response times
- Multiple MR jobs every few mins
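A rough picture of the "wide table, one row per user" layout, using an in-memory dictionary as a stand-in for HBase. The column naming (`activity:<item id>`), the stored value, and the class name `WideActivityTable` are assumptions made for illustration; the talk does not specify the schema.

```python
from collections import defaultdict

class WideActivityTable:
    """In-memory stand-in for a wide table: one row per user, one column per event."""

    def __init__(self):
        # row key -> {column qualifier -> value}; absent columns take no space (sparse).
        self._rows = defaultdict(dict)

    def record(self, user_key, item_id, action, timestamp):
        # Hypothetical qualifier layout: "activity:<item id>"; value keeps action and time.
        self._rows[user_key]["activity:" + item_id] = (action, timestamp)

    def get_row(self, user_key):
        # A single-row read returns the user's whole history.
        return dict(self._rows.get(user_key, {}))

    def scan(self):
        # Full scan in row-key order, as a batch job would consume it.
        for user_key in sorted(self._rows):
            yield user_key, dict(self._rows[user_key])

table = WideActivityTable()
table.record("u2", "story-7", "click", 1000)
table.record("u1", "story-7", "click", 1001)
table.record("u1", "video-3", "play", 1002)
print(table.get_row("u1"))
print([user for user, _ in table.scan()])
```

The same layout serves both access patterns: online serving reads one row by key, while offline jobs scan all rows.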

Page 21: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Generate user models using ML
- 100s of millions of users
- High IO/Processing power
- Train multiple times an hour

Page 22: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Content-based Recommender Models
- User model independent of other users
- Train only when user has new interaction
- Easily parallelizable
- No Reducer
- Incremental training
- Train 1000 user models a minute
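Because each user model depends only on that user's own history, training can be expressed as a map-only step. The sketch below assumes the model is a running mean of story feature vectors (for example topic weights); that choice, and the names `update_user_model` and `map_user`, are illustrative assumptions rather than the talk's actual model.

```python
def update_user_model(model, count, story_features):
    """Fold one story's feature vector into a running-mean user profile.

    model: dict feature -> weight (current profile)
    count: number of stories already folded into the profile
    story_features: dict feature -> weight for the newly read story
    """
    new_count = count + 1
    updated = {}
    for feature in set(model) | set(story_features):
        old = model.get(feature, 0.0)
        new = story_features.get(feature, 0.0)
        # Incremental mean: no need to revisit earlier stories.
        updated[feature] = old + (new - old) / new_count
    return updated, new_count

def map_user(user_key, state, new_stories):
    """Map-only step: each user's record is processed in isolation."""
    if not new_stories:
        return None  # nothing new, so skip retraining this user
    model, count = state
    for features in new_stories:
        model, count = update_user_model(model, count, features)
    return user_key, (model, count)

result = map_user(
    "u1",
    ({}, 0),
    [{"markets": 1.0}, {"markets": 0.5, "tech": 0.5}],
)
print(result)
```

Since the output for one user never needs to be combined with another user's, there is nothing for a reducer to do, and users with no new interactions are skipped entirely.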

Page 23: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Collaborative filter based Recommender Models
- User model dependent on other users
- Train all models frequently
- Map side self join
- No Reducer
- Batch training
- Train 10s of millions of user models on each batch
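One possible reading of "map side self join" is that each mapper holds a compact copy of the activity data in memory and joins every user row against it, so no shuffle or reducer is required. The sketch below follows that interpretation; it is an assumption, and the names `build_side_index` and `map_row` as well as the overlap-count scoring are hypothetical.

```python
def build_side_index(rows):
    """Invert user -> items into item -> users; this is the in-memory copy
    each mapper is assumed to load before processing its share of rows."""
    index = {}
    for user, items in rows.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index

def map_row(user, items, side_index, rows, top_n=2):
    """Join one user row against the side index and emit candidate items
    read by overlapping users; output is final, so no reducer is needed."""
    overlap = {}
    for item in items:
        for other in side_index.get(item, ()):
            if other != user:
                overlap[other] = overlap.get(other, 0) + 1
    scores = {}
    for other, weight in overlap.items():
        for candidate in rows[other]:
            if candidate not in items:
                scores[candidate] = scores.get(candidate, 0) + weight
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return user, ranked[:top_n]

# Hypothetical activity data: user id -> items consumed.
rows = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "d"},
}
side_index = build_side_index(rows)
results = dict(map_row(u, items, side_index, rows) for u, items in rows.items())
print(results)
```

This only works when the side copy fits in each mapper's memory, which is the usual constraint on map-side joins.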

Page 24: Recommender System at Scale Using HBase and Hadoop

Building a Recommender System

Serve recommendations
- Query HBase
- Evaluate articles against user models
- In-memory cache
- 1000s of requests per minute
- 50ms responses
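The serving path can be sketched as: fetch the user model (cached in memory), score candidate articles against it, and return the top results. Everything concrete here is an assumption for illustration: the dictionaries stand in for HBase reads, the dot-product scoring is one simple way to "evaluate articles against user models", and `load_user_model`, `score`, and `recommend` are hypothetical names.

```python
from functools import lru_cache

# Stand-ins for HBase lookups; names and data are hypothetical.
USER_MODELS = {"u1": {"markets": 0.75, "tech": 0.25}}
ARTICLES = {
    "story-1": {"markets": 0.9, "tech": 0.1},
    "story-2": {"tech": 1.0},
    "story-3": {"politics": 1.0},
}

@lru_cache(maxsize=10_000)
def load_user_model(user_key):
    # In a real system this would be a single-row read; cached to avoid repeats.
    # Returned as a tuple of pairs so the cached value is immutable.
    return tuple(sorted(USER_MODELS.get(user_key, {}).items()))

def score(model, features):
    # Dot product between the user profile and the article's features.
    return sum(weight * features.get(name, 0.0) for name, weight in model)

def recommend(user_key, top_n=2):
    model = load_user_model(user_key)
    ranked = sorted(ARTICLES, key=lambda a: (-score(model, ARTICLES[a]), a))
    return ranked[:top_n]

print(recommend("u1"))
```

A production cache would also need an invalidation or expiry policy so that freshly retrained models are picked up; `lru_cache` alone does not provide that.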

Page 25: Recommender System at Scale Using HBase and Hadoop

Summary
- Recommender Systems are important
- Content based and Collaborative filter based
- Cross domain expertise – Big Data, Machine Learning
- Hadoop/MapReduce for offline components
- HBase as a hybrid data store

Page 26: Recommender System at Scale Using HBase and Hadoop

Hiring

Email: [email protected]

Page 27: Recommender System at Scale Using HBase and Hadoop

Questions?