Mendeley Suggest: Engineering a Personalised Article Recommender System

DESCRIPTION

I gave this presentation at the RecSysChallenge workshop (http://2012.recsyschallenge.com/) at Recommender Systems 2012 (http://recsys.acm.org/2012/) in Dublin on 13 September 2012. It describes how we use Mahout to power Mendeley Suggest. It first presents results from tuning Mahout's recommender on AWS, including the tradeoff between cost and precision. It then explains how we used other big data technologies and AWS to put a serving layer in place. Thanks to Daniel Jones for making the slides for the serving layer part of the presentation.

Mendeley Suggest: Engineering a Personalised Article Recommender System

Kris Jack, PhD

Chief Data Scientist

https://twitter.com/_krisjack

➔ What's Mendeley?

➔ What's Mendeley Suggest?

➔ Computation Layer

➔ Serving Layer
    ➔ Architecture
    ➔ Technologies
    ➔ Deployment

➔ Conclusions

Overview

What's Mendeley?

➔ Mendeley is a platform that connects researchers, research data and apps

➔ Startup company with ~20 R&D engineers

Mendeley Open API

What's Mendeley Suggest?

Use Case

➔ Good researchers are on top of their game
    ➔ Difficult, given the amount of research being produced

➔ There must be a technology that can help

➔ Help researchers by recommending relevant research

Mendeley Suggest

Computation Layer

Running on Amazon's Elastic MapReduce (EMR)

On-demand use and easy to cost

Mahout's Performance

1.5M Users, 50M Articles

[Chart. Axes: Normalised Amazon Hours (0 to 7K); No. Good Recommendations/10 (0 to 3). Quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good. Points are labelled as (Normalised Amazon Hours, No. Good Recommendations/10).]

➔ Orig. item-based: 6.5K, 1.5
➔ Cust. item-based: 2.4K, 1.5
    ➔ -4.1K (63%): partitioners, MR allocation
➔ Orig. user-based: 1K, 2.5
    ➔ -1.4K (58%), +1 (67%) relative to cust. item-based
➔ Cust. user-based: 0.3K, 2.5
    ➔ -0.7K (70%) relative to orig. user-based
➔ Orig. item-based to cust. user-based overall: -6.2K (95%), +1 (67%)
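The percentages annotated on the chart can be re-derived from the four plotted points. The following is a minimal sketch, assuming the coordinates as read off the slide (thousands of normalised Amazon hours, good recommendations out of 10). The class and method names are illustrative and do not come from the talk.

```java
/** Re-derives the percentage annotations on the chart from the four plotted points. */
class MahoutTuningDeltas {

    // Coordinates as read off the slide:
    // {thousands of normalised Amazon hours, good recommendations out of 10}.
    static final double[] ORIG_ITEM = {6.5, 1.5};
    static final double[] CUST_ITEM = {2.4, 1.5};
    static final double[] ORIG_USER = {1.0, 2.5};
    static final double[] CUST_USER = {0.3, 2.5};

    /** Signed percentage change from one value to another, rounded to a whole percent. */
    static long percentChange(double from, double to) {
        return Math.round((to - from) / from * 100.0);
    }

    /** Describes the move between two configurations in cost and in quality. */
    static String describe(String label, double[] from, double[] to) {
        return String.format("%s: hours %+d%%, good recommendations %+d%%",
                label,
                percentChange(from[0], to[0]),
                percentChange(from[1], to[1]));
    }

    public static void main(String[] args) {
        System.out.println(describe("orig. item -> cust. item", ORIG_ITEM, CUST_ITEM));
        System.out.println(describe("cust. item -> orig. user", CUST_ITEM, ORIG_USER));
        System.out.println(describe("orig. user -> cust. user", ORIG_USER, CUST_USER));
        System.out.println(describe("orig. item -> cust. user", ORIG_ITEM, CUST_USER));
    }
}
```

Read this way, each annotation is a relative change between two neighbouring configurations, and the final one compares the starting point with the end result.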

Mahout as the Computation Layer

➔ Out of the box, didn't work so well for us
➔ Needed to understand Hadoop better
➔ Contributed patch back to community (user-user)

➔ Next step, the serving layer...
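For readers unfamiliar with the user-based (user-user) approach named in the bullets above, here is a minimal single-machine sketch of the idea. It is not Mahout's code and not Mendeley's customised job. The Jaccard similarity, the scoring rule, the sample data and all names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy user-based collaborative filter over binary "article is in library" data. */
class UserBasedSketch {

    /** Jaccard similarity between two libraries: shared articles divided by all distinct articles. */
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        int union = a.size() + b.size() - shared.size();
        return (double) shared.size() / union;
    }

    /** Scores each unseen article by the summed similarity of the users who hold it. */
    static List<String> recommend(Map<String, Set<String>> libraries, String user, int howMany) {
        Set<String> own = libraries.getOrDefault(user, Collections.<String>emptySet());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> other : libraries.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double sim = similarity(own, other.getValue());
            if (sim == 0.0) {
                continue; // users with nothing in common contribute nothing
            }
            for (String article : other.getValue()) {
                if (!own.contains(article)) {
                    scores.merge(article, sim, Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);             // ties: alphabetical
        });
        int limit = Math.max(0, Math.min(howMany, ranked.size()));
        return new ArrayList<>(ranked.subList(0, limit));
    }

    /** Hypothetical sample data: four users and their libraries. */
    static Map<String, Set<String>> exampleLibraries() {
        Map<String, Set<String>> libs = new LinkedHashMap<>();
        libs.put("alice", new HashSet<>(Arrays.asList("a1", "a2")));
        libs.put("bob", new HashSet<>(Arrays.asList("a1", "a2", "a3")));
        libs.put("carol", new HashSet<>(Arrays.asList("a2", "a4")));
        libs.put("dave", new HashSet<>(Arrays.asList("a5")));
        return libs;
    }

    public static void main(String[] args) {
        System.out.println(recommend(exampleLibraries(), "alice", 2));
    }
}
```

At the scale quoted in the deck (1.5M users, 50M articles) the pairwise comparison in this sketch would be far too slow on one machine, which is why the talk runs the computation as MapReduce jobs.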

Serving Layer

Architecture

[Diagram labels. Computation Layer: User Libraries, Cascading, Map Reduce, Mendeley Hadoop Cluster. Serving Layer: AWS, DynamoDB, Elastic Beanstalk (shown three times).]

➔ Spring dependency injection framework
    ➔ Context-wide integration testing is easy, including pre-loading of test data
    ➔ Allows other Spring features (cache, security, messaging)
➔ Spring MVC 3.2.M1
    ➔ Annotated controllers, type conversion 'for free'
    ➔ Asynchronous Servlet 3.0 supports thread 'parking'
➔ AlternatorDB
    ➔ In-memory DynamoDB implementation for testing

Technologies
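The two testing ideas on this slide, dependency injection and an in-memory stand-in for DynamoDB, can be illustrated without any framework. The sketch below uses plain constructor injection and a `HashMap`-backed store. It does not use Spring, AlternatorDB or the AWS SDK, and every type and method name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical storage abstraction: precomputed recommendations keyed by user ID. */
interface RecommendationStore {
    List<String> findByUser(String userId);
}

/** In-memory implementation for tests, playing the role the slide gives to AlternatorDB. */
class InMemoryRecommendationStore implements RecommendationStore {
    private final Map<String, List<String>> rows = new HashMap<>();

    /** Pre-loads test data for one user. */
    void put(String userId, List<String> articleIds) {
        rows.put(userId, new ArrayList<>(articleIds));
    }

    @Override
    public List<String> findByUser(String userId) {
        return rows.getOrDefault(userId, Collections.<String>emptyList());
    }
}

/** Serving-side logic. The store arrives through the constructor, as a DI container would wire it. */
class SuggestService {
    private final RecommendationStore store;

    SuggestService(RecommendationStore store) {
        this.store = store;
    }

    /** Returns at most n precomputed recommendations for the user, in stored order. */
    List<String> topN(String userId, int n) {
        List<String> all = store.findByUser(userId);
        int limit = Math.max(0, Math.min(n, all.size()));
        return new ArrayList<>(all.subList(0, limit));
    }
}
```

Because `SuggestService` only sees the interface, a test can pass in the in-memory store while production wiring supplies a store backed by the real database.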

Recommendation<K>

LongRecommendation UuidRecommendation

DocumentRecommendation GroupRecommendation PersonRecommendation

➔ Build once, employ in several use cases

Technologies
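One way the class names on this slide could fit together is sketched below. The slide gives only the names, so the fields, the constructors and above all the choice of which domain class extends which key type are assumptions made for illustration. The helper class `Recommendations` is also hypothetical.

```java
import java.util.List;
import java.util.UUID;

/** Generic recommendation: an item key of type K plus a score. Field names are assumptions. */
abstract class Recommendation<K> {
    private final K itemId;
    private final double score;

    protected Recommendation(K itemId, double score) {
        this.itemId = itemId;
        this.score = score;
    }

    K getItemId() { return itemId; }

    double getScore() { return score; }
}

class LongRecommendation extends Recommendation<Long> {
    LongRecommendation(Long itemId, double score) { super(itemId, score); }
}

class UuidRecommendation extends Recommendation<UUID> {
    UuidRecommendation(UUID itemId, double score) { super(itemId, score); }
}

// ASSUMPTION: the slide does not say which key type each domain class uses.
class DocumentRecommendation extends UuidRecommendation {
    DocumentRecommendation(UUID itemId, double score) { super(itemId, score); }
}

class GroupRecommendation extends LongRecommendation {
    GroupRecommendation(Long itemId, double score) { super(itemId, score); }
}

class PersonRecommendation extends LongRecommendation {
    PersonRecommendation(Long itemId, double score) { super(itemId, score); }
}

/** Hypothetical helper written once against the base type and reused for every use case. */
final class Recommendations {
    private Recommendations() {}

    /** Returns the highest-scoring candidate, or null when the list is empty. */
    static <R extends Recommendation<?>> R best(List<R> candidates) {
        R top = null;
        for (R candidate : candidates) {
            if (top == null || candidate.getScore() > top.getScore()) {
                top = candidate;
            }
        }
        return top;
    }
}
```

The same `best` method works for documents, groups and people, which is the "build once, employ in several use cases" point in a small form.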

➔ AWS Elastic Beanstalk
    ➔ Managed, auto-scaling, health-checking .war container
➔ Jenkins continuous integration (CI) server
➔ Maven build tool (useful dependency management)
➔ beanstalk-maven-plugin (push a button to deploy)
    ➔ Deploys to Elastic Beanstalk
    ➔ Replaces existing application version if required
    ➔ 'Zero downtime' updates (tested at ~300ms)
    ➔ Triggered by Jenkins

Deployment

Putting it all together... $$$

➔ Real-time article recommendations for 2 million users
➔ 20 requests per second
➔ $65.84/month
    ➔ $34.24 Elastic Beanstalk
    ➔ $28.17 DynamoDB
    ➔ $2.76 bandwidth
➔ $30 to update the computation layer periodically
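The serving figures on this slide can be put into per-request terms with some simple arithmetic. The sketch below assumes the 20 requests per second are sustained over a 30-day month, which the slide does not state (it may be an average or a peak). It also shows that the three itemised lines sum to $65.17, so about $0.67 of the quoted $65.84 is not itemised on the slide. Class and method names are illustrative.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Back-of-the-envelope arithmetic on the monthly serving cost quoted on the slide. */
class ServingCostSketch {

    static final BigDecimal MONTHLY_TOTAL = new BigDecimal("65.84");
    static final BigDecimal[] ITEMISED = {
        new BigDecimal("34.24"), // Elastic Beanstalk
        new BigDecimal("28.17"), // DynamoDB
        new BigDecimal("2.76")   // bandwidth
    };

    /** Sum of the three itemised lines. */
    static BigDecimal itemisedSum() {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal line : ITEMISED) {
            sum = sum.add(line);
        }
        return sum;
    }

    /** Part of the quoted total that the slide does not break down. */
    static BigDecimal unitemised() {
        return MONTHLY_TOTAL.subtract(itemisedSum());
    }

    /** Requests in a month, ASSUMING the given rate is sustained for the whole period. */
    static long requestsPerMonth(int perSecond, int days) {
        return (long) perSecond * 86_400L * days;
    }

    /** Dollars per million requests under the same sustained-rate assumption. */
    static BigDecimal dollarsPerMillionRequests(int perSecond, int days) {
        BigDecimal millions = BigDecimal.valueOf(requestsPerMonth(perSecond, days)).movePointLeft(6);
        return MONTHLY_TOTAL.divide(millions, 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println("Itemised sum: $" + itemisedSum());
        System.out.println("Not itemised: $" + unitemised());
        System.out.println("Requests per 30-day month: " + requestsPerMonth(20, 30));
        System.out.println("Dollars per million requests: $" + dollarsPerMillionRequests(20, 30));
    }
}
```

Under that assumption the serving layer handles roughly 52 million requests a month at a little over a dollar per million.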

Conclusions

➔ Mendeley Suggest is a personalised article recommender
➔ Built by small team for big data
➔ Uses Mahout as computation layer
    ➔ Needs some love out of the box
➔ Serves from AWS
    ➔ Reduces maintenance costs and is reliable
➔ Intend to release Mendeley Suggest to all users this year

We're Hiring!

➔ Data Scientist
    ➔ apply recommender technologies to Mendeley's data
    ➔ work on improving the quality of Mendeley's research catalogue
    ➔ starting in first quarter of 2013
    ➔ 6-month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)
➔ http://www.mendeley.com/careers/

www.mendeley.com