hadoopsummit
DESCRIPTION
Recommender systems play a crucial role in a variety of businesses in today's world. From e-commerce websites to news portals, companies are leveraging data about their users to create a personalized user experience, gain competitive advantage and eventually drive revenue. Dealing with the sheer quantity of data readily available can be a daunting task by itself. Applying machine learning algorithms on top of it makes the problem considerably harder. Fortunately, tools like Hadoop and HBase make this task a little more manageable by taking out some of the complexities of dealing with a large amount of data. In this talk, we will share our success story of building a recommender system for Bloomberg.com leveraging the Hadoop ecosystem. We will describe the high-level architecture of the system and discuss the pros and cons of our design choices. Bloomberg.com operates at a scale of hundreds of millions of users. Building a recommendation engine for Bloomberg.com entails applying machine learning algorithms to terabytes of data while still serving sub-second responses. We will discuss techniques for efficiently and reliably collecting data in near real time, the notion of offline vs. online processing and, most importantly, how HBase fits the bill by serving as a real-time database as well as input/output for running MapReduce.
Citation preview
1
Dhaval Shah
R&D Software Engineer, Bloomberg L.P.
Recommender Systems at scale using HBase and Hadoop
Bloomberg
2
Agenda
  Introduction to Recommender Systems
  Types of Recommender Systems
  Building a Recommender System
  Summary
  (Hopefully) Lots of Q&A
3
What is a Recommender System?
  Wikipedia [1]: Recommender systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item or social element they had not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches).
Introduction to Recommender Systems
Where are Recommender Systems used?
  Everywhere! (Well, almost!)
    o E-Commerce
    o Web Portals
    o Online Radio
    o Streaming Movies
    o Media/News
11
Why do you need a Recommender System?
  Too much useful information
  Bloomberg.com statistics
    o 500-1000 stories, 100-200 videos published per day
    o Average user consumption << Articles published
    o Satisfied user = Content Quality + User preference
    o Double digit increases in CTR
12
Types of Recommender Systems
  Content-based
  Collaborative filter based
    o User-based
    o Item-based
  Hybrid
13
Building a Recommender System
  Collect/Generate metadata about stories/videos
  Identify and track users
  Track user activity
  Store user activity
  Generate user models
  Serve recommendations
14
Collect metadata about stories/videos
    o URLs, Headlines, etc.
    o Sqoop, Custom Scripts
Generate features for stories
    o LDA from Mahout
    o Custom extensions
15
Identify and track users
  Registered
  Anonymous
    o Cookie based tracking
    o IP based tracking
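The identification options above can be sketched as a single key-resolution step. This is only an illustration: the function name `resolve_user_key`, the key prefixes, and the precedence order (registered account, then cookie, then IP) are assumptions, not details given in the talk.

```python
def resolve_user_key(registered_id=None, cookie_id=None, ip_address=None):
    """Pick one tracking key per request (hypothetical precedence order)."""
    if registered_id:
        # Registered users carry a stable account identifier.
        return f"reg:{registered_id}"
    if cookie_id:
        # Anonymous users are followed through a browser cookie.
        return f"cookie:{cookie_id}"
    if ip_address:
        # Last resort: the IP address, which may be shared by many users.
        return f"ip:{ip_address}"
    raise ValueError("request carries no usable identifier")
```

Prefixing the key with its source keeps the three identifier spaces from colliding when they share one table.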
16
Types of user activity
  Explicit interactions
  Implicit interactions
17
Tracking user activity
[Diagram: Browser (JavaScript) → HTTP Server → Flume → HBase]
18
Tracking: Key Features
  1000s of ppm
  Asynchronous - Instantaneous responses to client
  Reliability
  Multiple HTTP Servers → Multiple Clusters
  Client to HBase in milliseconds
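The asynchronous behaviour listed above (answer the client at once, persist in the background) can be sketched with a queue and a writer thread. This is a minimal stand-in, not the pipeline from the talk: the in-process queue plays the role of the transport layer, the plain dict plays the role of the HBase table, and `make_tracker`, `track` and `flush` are hypothetical names.

```python
import queue
import threading


def make_tracker(store):
    """Return (track, flush) functions that write events into `store`."""
    events = queue.Queue()

    def track(user_id, item_id, action):
        # Called from the request handler: enqueue and return immediately.
        events.put((user_id, item_id, action))
        return "accepted"

    def drain():
        # Background writer: moves queued events into the store one by one.
        while True:
            user_id, item_id, action = events.get()
            store.setdefault(user_id, []).append((item_id, action))
            events.task_done()

    threading.Thread(target=drain, daemon=True).start()

    def flush():
        # Block until every queued event has been written.
        events.join()

    return track, flush
```

Usage: `track, flush = make_tracker({})`, then call `track("u1", "story-42", "click")` from the handler. `flush()` is only needed in tests or at shutdown.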
19
Why HBase?
  Scalable
  Fault-tolerant
  Auto-sharding
  Schema-less and sparse
  Real-time queries
  MR integration
20
Store user activity
  100s of millions of users
  Millions of stories/videos
  TBs of data
  Wide Tables – 1 row per user
  High load
  Sub-second response times
  Multiple MR jobs every few mins
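The "wide table, 1 row per user" layout can be pictured with a toy in-memory model. The sketch assumes a nested dict in place of HBase, and the `view:<item>` column naming is invented for illustration. The talk does not describe the actual schema.

```python
from collections import defaultdict


class WideTable:
    """Toy model of a sparse wide table: row key -> {column -> value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Each row stores only the columns it has actually been given.
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # A single row read returns the user's whole activity history.
        return dict(self.rows.get(row_key, {}))


def record_view(table, user_id, item_id, timestamp):
    # Hypothetical column naming: one column per item the user viewed.
    table.put(user_id, f"view:{item_id}", timestamp)
```

With one row per user, serving and per-user training both need a single row lookup rather than a scan across many rows.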
21
Generate user models using ML
  100s of millions of users
  High IO/Processing power
  Train multiple times an hour
22
Content-based Recommender Models
  User model independent of other users
  Train only when user has new interaction
  Easily parallelizable
  No Reducer
  Incremental training
  Train 1000 user models a minute
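Because each content-based model depends only on its own user's interactions, training can run as a map-only step. The sketch below assumes a user model is a dict of topic weights (for example, topics produced by LDA) and uses an exponential moving average as the incremental update. The talk does not name the learning algorithm, so the update rule, `update_user_model`, `train_users` and `rate` are all illustrative assumptions.

```python
def update_user_model(model, story_topics, rate=0.2):
    """Blend one story's topic weights into a user model (moving average)."""
    topics = set(model) | set(story_topics)
    return {
        t: (1 - rate) * model.get(t, 0.0) + rate * story_topics.get(t, 0.0)
        for t in topics
    }


def train_users(models, new_interactions, rate=0.2):
    """Map-only step: only users with new interactions are retrained."""
    updated = dict(models)
    for user_id, stories in new_interactions.items():
        model = models.get(user_id, {})
        for story_topics in stories:
            model = update_user_model(model, story_topics, rate)
        updated[user_id] = model
    return updated
```

No user's result feeds into another's, so the loop body could run on separate mappers with nothing to aggregate afterwards.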
23
Collaborative filter based Recommender Models
  User model dependent on other users
  Train all models frequently
  Map side self join
  No Reducer
  Batch training
  Train 10s of millions of user models on each batch
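One way to read "map side self join, no reducer" is that every mapper holds a copy of the interaction data and joins each user against all the others locally. The sketch below follows that reading for a user-based model. The Jaccard similarity, the function names and the in-memory `history` dict are assumptions chosen for illustration, not the method described in the talk.

```python
def jaccard(a, b):
    """Overlap between two item sets, from 0.0 (disjoint) to 1.0 (equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for(user_id, history):
    """Score unseen items by the similarity of the users who consumed them.

    `history` maps every user to a set of item ids and is assumed to be
    available in full on each mapper (the "self join" side).
    """
    mine = history[user_id]
    scores = {}
    for other, items in history.items():
        if other == user_id:
            continue
        sim = jaccard(mine, items)
        if sim == 0.0:
            continue
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))
```

Unlike the content-based case, one new interaction can change the scores of many users, which fits the slide's choice of frequent batch training over per-user incremental updates.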
24
Serve recommendations
  Query HBase
  Evaluate articles against user models
  In-memory cache
  1000s of requests per minute
  50ms responses
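The serving path above (look up the user model, evaluate articles against it, cache the result) can be sketched as follows. `USER_MODELS` and `ARTICLES` are hypothetical stand-ins for the HBase lookups, the dot-product score is an assumed scoring rule, and `functools.lru_cache` stands in for whatever in-memory cache the real system uses.

```python
from functools import lru_cache

# Hypothetical stand-ins for data that would be read from HBase.
USER_MODELS = {"u1": {"markets": 0.8, "tech": 0.2}}
ARTICLES = {
    "a1": {"markets": 1.0},
    "a2": {"tech": 1.0},
    "a3": {"sports": 1.0},
}


def score(model, topics):
    """Dot product between a user model and an article's topic weights."""
    return sum(weight * topics.get(topic, 0.0) for topic, weight in model.items())


@lru_cache(maxsize=10000)
def recommend(user_id, k=2):
    """Return the top-k article ids for a user; repeat calls hit the cache."""
    model = USER_MODELS.get(user_id, {})
    ranked = sorted(ARTICLES, key=lambda a: (-score(model, ARTICLES[a]), a))
    return tuple(ranked[:k])
```

`lru_cache` never expires entries on its own. Since the slides describe models retrained several times an hour, a real cache would need a time-based expiry or an explicit `recommend.cache_clear()` after each training run.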
25
Summary
  Recommender Systems are important
  Content based and Collaborative filter based
  Cross domain expertise: Big Data, Machine Learning
  Hadoop/MapReduce for offline components
  HBase as a hybrid data store
27
Questions?