Mendeley Suggest: Engineering a Personalised Article Recommender System

Kris Jack, PhD, Chief Data Scientist
https://twitter.com/_krisjack


DESCRIPTION

I gave this presentation at the RecSysChallenge workshop (http://2012.recsyschallenge.com/) at Recommender Systems 2012 (http://recsys.acm.org/2012/) in Dublin on 13 September, 2012. This presentation describes how we have been making use of Mahout to power Mendeley Suggest. First, it includes some results from tuning Mahout's recommender on AWS and the cost vs. precision tradeoff. Then it concludes with details on how to make use of other big data technologies and AWS in order to put a serving layer in place. Acknowledgement to Daniel Jones for making the slides for the serving layer part of the presentation.


Overview

➔ What's Mendeley?
➔ What's Mendeley Suggest?
➔ Computation Layer
➔ Serving Layer
    ➔ Architecture
    ➔ Technologies
    ➔ Deployment
➔ Conclusions


What's Mendeley?


➔ Mendeley is a platform that connects researchers, research data and apps

➔ Startup company with ~20 R&D engineers

Mendeley Open API


What's Mendeley Suggest?


Use Case

➔ Good researchers are on top of their game
    ➔ Difficult with the amount being produced

➔ There must be a technology that can help

➔ Help researchers by recommending relevant research


Mendeley Suggest


Computation Layer


Mendeley Suggest


Running on Amazon's Elastic MapReduce

On-demand use and easy to cost

Mahout's Performance

Computation Layer: 1.5M Users, 50M Articles

[Figure: scatter plot of cost against quality. Vertical axis: Normalised Amazon Hours (0 to 7K). Horizontal axis: No. Good Recommendations/10 (0 to 3). Quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good.]

➔ Orig. item-based: 6.5K, 1.5
➔ Cust. item-based: 2.4K, 1.5
    ➔ -4.1K (63%) compared with orig. item-based
    ➔ Partitioners, MR allocation
➔ Orig. user-based: 1K, 2.5
    ➔ -1.4K (58%) and +1 (67%) compared with cust. item-based
➔ Cust. user-based: 0.3K, 2.5
    ➔ -0.7K (70%) compared with orig. user-based
➔ Orig. item-based to cust. user-based: -6.2K (95%) and +1 (67%)
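The differences quoted on the performance slides follow directly from the four plotted points. The short sketch below recomputes them; the dictionary keys and function names are made up for illustration, and the numbers are simply the (normalised Amazon hours, good recommendations out of 10) pairs read off the slides.

```python
# (normalised Amazon hours, good recommendations out of 10), as read off the slides.
RESULTS = {
    "orig_item_based": (6500, 1.5),
    "cust_item_based": (2400, 1.5),
    "orig_user_based": (1000, 2.5),
    "cust_user_based": (300, 2.5),
}


def change(before, after):
    """Return (absolute change, percentage change relative to `before`)."""
    delta = after - before
    return delta, round(100 * delta / before)


def compare(a, b):
    """Compare configuration `a` with configuration `b` on cost and on quality."""
    hours_a, good_a = RESULTS[a]
    hours_b, good_b = RESULTS[b]
    return {"hours": change(hours_a, hours_b), "good": change(good_a, good_b)}


if __name__ == "__main__":
    pairs = [
        ("orig_item_based", "cust_item_based"),
        ("cust_item_based", "orig_user_based"),
        ("orig_user_based", "cust_user_based"),
        ("orig_item_based", "cust_user_based"),
    ]
    for a, b in pairs:
        print(a, "->", b, compare(a, b))
```

Each result is a pair of (absolute change, percentage change), so a negative value under "hours" is a cost saving and a positive value under "good" is a quality gain.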


Mahout as the Computation Layer

➔ Out of the box, didn't work so well for us
    ➔ Needed to understand Hadoop better
    ➔ Contributed patch back to community (user-user)

➔ Next step, the serving layer...
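For readers unfamiliar with the user-based (user-user) approach mentioned above, here is a rough, single-machine illustration. It is not Mahout's implementation, nor the customised version described in the talk: the toy libraries, the function names and the choice of Jaccard (Tanimoto) similarity are all assumptions made for the sketch.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard (Tanimoto) similarity between two sets of article ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(user, libraries, k=2, n=3):
    """Suggest up to n articles held by the k users most similar to `user`."""
    own = libraries[user]
    # Rank every other user by how much their library overlaps with this one.
    neighbours = sorted(
        ((jaccard(own, lib), other) for other, lib in libraries.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for similarity, other in neighbours:
        if similarity <= 0:
            continue  # ignore users with no overlap at all
        for article in libraries[other] - own:
            scores[article] += similarity
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda article: (-scores[article], article))[:n]


# Hypothetical user libraries: user id -> set of article ids.
LIBRARIES = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a2", "a3", "a5"},
    "dave": {"a9"},
}

if __name__ == "__main__":
    print(recommend("alice", LIBRARIES))
```

An item-based recommender turns the same idea around: it compares articles by the users who hold them instead of comparing users by the articles they hold.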


Serving Layer

Architecture

[Architecture diagram. Computation Layer: User Libraries, Mendeley Hadoop Cluster (Cascading, Map Reduce). Serving Layer: AWS (DynamoDB, ElasticBeanstalk ×3).]

Technologies

➔ Spring dependency injection framework
    ➔ Context-wide integration testing is easy, including pre-loading of test data
    ➔ Allows other Spring features (cache, security, messaging)
➔ Spring MVC 3.2.M1
    ➔ Annotated controllers, type conversion 'for free'
    ➔ Asynchronous Servlet 3.0 supports thread 'parking'
➔ AlternatorDB
    ➔ In-memory DynamoDB implementation for testing
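The value of an in-memory stand-in for the datastore can be sketched in a few lines. The example is written in Python for brevity, not in the Java/Spring stack the slides describe, and it does not use the AlternatorDB or DynamoDB APIs. `Table`, `InMemoryTable`, `RecommendationService` and the key layout (one precomputed list of document ids per user id) are hypothetical names and assumptions.

```python
from typing import Dict, List, Optional, Protocol


class Table(Protocol):
    """Minimal key-value interface the serving code depends on (hypothetical)."""

    def put(self, key: str, value: List[str]) -> None: ...

    def get(self, key: str) -> Optional[List[str]]: ...


class InMemoryTable:
    """Test double that keeps everything in a dict instead of a remote store."""

    def __init__(self) -> None:
        self._items: Dict[str, List[str]] = {}

    def put(self, key: str, value: List[str]) -> None:
        self._items[key] = list(value)

    def get(self, key: str) -> Optional[List[str]]:
        value = self._items.get(key)
        return list(value) if value is not None else None


class RecommendationService:
    """Serves precomputed recommendations looked up by user id."""

    def __init__(self, table: Table) -> None:
        self._table = table

    def recommendations_for(self, user_id: str, limit: int = 10) -> List[str]:
        stored = self._table.get(user_id)
        return [] if stored is None else stored[:limit]


if __name__ == "__main__":
    table = InMemoryTable()
    table.put("user-1", ["doc-a", "doc-b", "doc-c"])  # pre-load test data
    service = RecommendationService(table)
    print(service.recommendations_for("user-1", limit=2))
```

Because the service only depends on the small `Table` interface, a test can pre-load data into the in-memory version and exercise the same code path that would otherwise talk to the real store.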

Recommendation<K>

LongRecommendation, UuidRecommendation

DocumentRecommendation, GroupRecommendation, PersonRecommendation

➔ Build once, employ in several use cases
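A generic recommendation type parameterised by its key, as the class names above suggest, might look roughly like the following. This is a Python sketch and not the Java code from the talk. The slide does not show which key type each concrete class uses or what fields the classes carry, so the fields `item_id` and `score`, the parent chosen for `DocumentRecommendation`, and the helper `top_n` are assumptions.

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar
from uuid import UUID

K = TypeVar("K")


@dataclass(frozen=True)
class Recommendation(Generic[K]):
    """A recommended item identified by a key of type K (fields are assumed)."""

    item_id: K
    score: float


class LongRecommendation(Recommendation[int]):
    """Recommendation keyed by an integer id."""


class UuidRecommendation(Recommendation[UUID]):
    """Recommendation keyed by a UUID."""


class DocumentRecommendation(UuidRecommendation):
    """Assumed here to be UUID-keyed; the slide does not say."""


def top_n(recs: Sequence[Recommendation[K]], n: int) -> List[Recommendation[K]]:
    """Written once against Recommendation[K]; usable for any key type."""
    return sorted(recs, key=lambda rec: rec.score, reverse=True)[:n]


if __name__ == "__main__":
    docs = [
        DocumentRecommendation(UUID(int=1), 0.2),
        DocumentRecommendation(UUID(int=2), 0.9),
    ]
    print(top_n(docs, 1))
```

The point of the shared base type is that ranking, filtering or serialisation code is written once against `Recommendation[K]` and reused for documents, groups and people.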

Deployment

➔ AWS ElasticBeanstalk
    ➔ Managed, auto-scaling, health-checking .war container
➔ Jenkins continuous integration (CI) server
➔ Maven build tool (useful dependency management)
➔ beanstalk-maven-plugin (push a button to deploy)
    ➔ Deploys to ElasticBeanstalk
    ➔ Replaces existing application version if required
    ➔ 'Zero downtime' updates (tested at ~300ms)
    ➔ Triggered by Jenkins


Putting it all together... $$$

➔ Real-time article recommendations for 2 million users
➔ 20 requests per second
➔ $65.84/month
    ➔ $34.24 ElasticBeanstalk
    ➔ $28.17 DynamoDB
    ➔ $2.76 bandwidth

➔ $30 to update the computation layer periodically
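To put the serving figures in perspective, the sketch below converts the monthly bill into a cost per user and per million requests. It assumes a 30-day month and a sustained 20 requests per second, neither of which the slide states, and the function names are made up for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

MONTHLY_TOTAL_USD = 65.84
LINE_ITEMS_USD = {"ElasticBeanstalk": 34.24, "DynamoDB": 28.17, "bandwidth": 2.76}
USERS = 2_000_000
REQUESTS_PER_SECOND = 20  # assumption: sustained for the whole month


def cost_per_user(total_usd, users):
    """Monthly serving cost per user."""
    return total_usd / users


def cost_per_million_requests(total_usd, requests_per_second, seconds=SECONDS_PER_MONTH):
    """Monthly serving cost per million requests at a constant request rate."""
    requests = requests_per_second * seconds
    return total_usd / requests * 1_000_000


# Sum of the three amounts itemised on the slide.
ITEMISED_USD = round(sum(LINE_ITEMS_USD.values()), 2)

if __name__ == "__main__":
    print("per user:", cost_per_user(MONTHLY_TOTAL_USD, USERS))
    print("per million requests:", cost_per_million_requests(MONTHLY_TOTAL_USD, REQUESTS_PER_SECOND))
    print("itemised:", ITEMISED_USD)
```

Under those assumptions the serving layer costs roughly $1.27 per million requests. The three itemised amounts add up to $65.17; the slide does not say what accounts for the difference from the $65.84 total.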

Conclusions

➔ Mendeley Suggest is a personalised article recommender
    ➔ Built by small team for big data
➔ Uses Mahout as computation layer
    ➔ Needs some love out of the box
➔ Serves from AWS
    ➔ Reduces maintenance costs and is reliable
➔ Intend to release Mendeley Suggest to all users this year

We're Hiring!

➔ Data Scientist
    ➔ apply recommender technologies to Mendeley's data
    ➔ work on improving the quality of Mendeley's research catalogue
    ➔ starting in first quarter of 2013
    ➔ 6-month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)
➔ http://www.mendeley.com/careers/


www.mendeley.com