Transcript
Page 1: Mendeley Suggest: Engineering a Personalised Article Recommender System

Mendeley Suggest: Engineering a Personalised Article Recommender System

Kris Jack, PhD, Chief Data Scientist

https://twitter.com/_krisjack

Page 2: Mendeley Suggest: Engineering a Personalised Article Recommender System

Overview

➔ What's Mendeley?

➔ What's Mendeley Suggest?

➔ Computation Layer

➔ Serving Layer
    ➔ Architecture
    ➔ Technologies
    ➔ Deployment

➔ Conclusions

Page 3: Mendeley Suggest: Engineering a Personalised Article Recommender System

What's Mendeley?

Page 4: Mendeley Suggest: Engineering a Personalised Article Recommender System

➔ Mendeley is a platform that connects researchers, research data and apps

Mendeley Open API

Page 5: Mendeley Suggest: Engineering a Personalised Article Recommender System

➔ Mendeley is a platform that connects researchers, research data and apps

➔ Startup company with ~20 R&D engineers

Mendeley Open API

Page 6: Mendeley Suggest: Engineering a Personalised Article Recommender System

What's Mendeley Suggest?

Page 7: Mendeley Suggest: Engineering a Personalised Article Recommender System

Use Case

➔ Good researchers are on top of their game
    ➔ Difficult with the amount of research being produced

➔ There must be a technology that can help

➔ Help researchers by recommending relevant research

Page 8: Mendeley Suggest: Engineering a Personalised Article Recommender System

Mendeley Suggest

Page 9: Mendeley Suggest: Engineering a Personalised Article Recommender System

Computation Layer

Page 10: Mendeley Suggest: Engineering a Personalised Article Recommender System

Mendeley Suggest

Page 11: Mendeley Suggest: Engineering a Personalised Article Recommender System

Mendeley Suggest

Page 12: Mendeley Suggest: Engineering a Personalised Article Recommender System

Mendeley Suggest

Page 13: Mendeley Suggest: Engineering a Personalised Article Recommender System

Running on Amazon's Elastic MapReduce

On-demand use and easy to cost

Pages 14-27: Mendeley Suggest: Engineering a Personalised Article Recommender System

Mahout's Performance

Computation Layer: 1.5M Users, 50M Articles

[Chart, built up point by point across these slides. x-axis: No. Good Recommendations/10 (0 to 3). y-axis: Normalised Amazon Hours (0 to 7K). Quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good.]

➔ Orig. item-based: 6.5K, 1.5

➔ Cust. item-based: 2.4K, 1.5
    ➔ -4.1K (63%) against Orig. item-based
    ➔ Partitioners, MR allocation

➔ Orig. user-based: 1K, 2.5
    ➔ -1.4K (58%), +1 (67%) against Cust. item-based

➔ Cust. user-based: 0.3K, 2.5
    ➔ -0.7K (70%) against Orig. user-based

➔ Overall, Orig. item-based to Cust. user-based: -6.2K (95%), +1 (67%)
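
The savings annotated on the chart follow from the four plotted points. A minimal sketch that recomputes them, assuming the first number in each label is normalised Amazon hours and the second is the number of good recommendations out of 10 (the dictionary keys and the `change` helper are illustrative names, not from the talk):

```python
# Points read off the chart: (normalised Amazon hours, good recommendations per 10).
points = {
    "orig_item": (6500, 1.5),
    "cust_item": (2400, 1.5),
    "orig_user": (1000, 2.5),
    "cust_user": (300, 2.5),
}


def change(before, after):
    """Return (hours saved, % hours saved, % change in good recommendations)."""
    hours_before, quality_before = points[before]
    hours_after, quality_after = points[after]
    hours_saved = hours_before - hours_after
    pct_hours = round(100 * hours_saved / hours_before)
    pct_quality = round(100 * (quality_after - quality_before) / quality_before)
    return hours_saved, pct_hours, pct_quality


steps = [
    ("orig_item", "cust_item"),
    ("cust_item", "orig_user"),
    ("orig_user", "cust_user"),
    ("orig_item", "cust_user"),
]
for before, after in steps:
    print(before, "->", after, change(before, after))
```

The last step, from the original item-based run to the customised user-based run, reproduces the overall figures on the final build of the chart: 6.2K fewer hours (95%) and one more good recommendation in ten (67%).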

Page 28: Mendeley Suggest: Engineering a Personalised Article Recommender System

Mahout as the Computation Layer

➔ Out of the box, didn't work so well for us
➔ Needed to understand Hadoop better
➔ Contributed patch back to community (user-user)

➔ Next step, the serving layer...
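
The computation layer compares item-based and user-based collaborative filtering. As a toy illustration of the user-based idea only, the sketch below assumes each user's library is a set of article ids and uses Jaccard similarity between libraries. It runs in memory and is not Mahout's implementation; the function names, the similarity measure, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two libraries: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, libraries, k=2, n=3):
    """Suggest up to n articles held by the k users most similar to `user`."""
    mine = libraries[user]
    # Rank every other user by similarity to this user's library.
    neighbours = sorted(
        ((jaccard(mine, lib), other) for other, lib in libraries.items() if other != user),
        reverse=True,
    )[:k]
    # Score unseen articles by the summed similarity of the neighbours who hold them.
    scores = defaultdict(float)
    for similarity, other in neighbours:
        if similarity > 0:
            for article in libraries[other] - mine:
                scores[article] += similarity
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a2", "a3", "a5"},
    "dave": {"a9"},
}
print(recommend("alice", libraries))  # ['a4', 'a5']
```

An item-based variant would compute similarities between articles rather than between users.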

Page 29: Mendeley Suggest: Engineering a Personalised Article Recommender System

Serving Layer

Page 30: Mendeley Suggest: Engineering a Personalised Article Recommender System

Architecture

[Diagram labels: Computation Layer; Mendeley Hadoop Cluster; User Libraries; Cascading]

Page 31: Mendeley Suggest: Engineering a Personalised Article Recommender System

Architecture

[Diagram labels: Computation Layer; Serving Layer; Mendeley Hadoop Cluster; User Libraries; Map Reduce; AWS; DynamoDB; ElasticBeanstalk (three boxes)]
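
The architecture separates an offline computation layer from an online serving layer. One way to picture that split, assuming the batch job writes each user's precomputed list to a key-value table and the web tier only reads by user id, is sketched below. The slide does not spell out the data flow, so this is an assumption; a plain dictionary stands in for the DynamoDB table, and `RecommendationStore` and its methods are hypothetical names.

```python
class RecommendationStore:
    """In-memory stand-in for a key-value table keyed by user id (hypothetical)."""

    def __init__(self):
        self._table = {}

    def put_batch(self, batch):
        """Overwrite stored lists with the output of a batch run."""
        for user_id, article_ids in batch.items():
            self._table[user_id] = list(article_ids)

    def get(self, user_id, limit=10):
        """Return up to `limit` precomputed article ids; empty list for unknown users."""
        return self._table.get(user_id, [])[:limit]


store = RecommendationStore()

# Computation layer: a periodic batch job publishes its results.
store.put_batch({"u1": ["a4", "a5", "a7"], "u2": ["a1"]})

# Serving layer: each request is a single key lookup, with no computation at request time.
print(store.get("u1", limit=2))  # ['a4', 'a5']
```

Keeping the request path to a single lookup is what lets the expensive work run on demand in batch while the serving tier stays small.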

Page 32: Mendeley Suggest: Engineering a Personalised Article Recommender System

Technologies

➔ Spring dependency injection framework
    ➔ Context-wide integration testing is easy, including pre-loading of test data
    ➔ Allows other Spring features (cache, security, messaging)

➔ Spring MVC 3.2.M1
    ➔ Annotated controllers, type conversion 'for free'
    ➔ Asynchronous Servlet 3.0 supports thread 'parking'

➔ AlternatorDB
    ➔ In-memory DynamoDB implementation for testing

Page 33: Mendeley Suggest: Engineering a Personalised Article Recommender System

Technologies

➔ Build once, employ in several use cases

[Class diagram labels: Recommendation<K>; LongRecommendation, UuidRecommendation; DocumentRecommendation, GroupRecommendation, PersonRecommendation]
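
The class names on this slide suggest one generic recommendation type, parameterised by key type, that is reused for documents, groups, and people. The production code is presumably Java; the Python sketch below only illustrates the "build once, employ in several use cases" idea. The `score` field, the `top_n` helper, and the choice of which key type each concrete class extends are assumptions, since the slide gives class names only.

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar
from uuid import UUID

K = TypeVar("K")


@dataclass(frozen=True)
class Recommendation(Generic[K]):
    """A recommended item identified by a key of type K."""
    item_id: K
    score: float  # assumed field; not listed on the slide


class LongRecommendation(Recommendation[int]):
    """Items keyed by an integer id."""


class UuidRecommendation(Recommendation[UUID]):
    """Items keyed by a UUID."""


# Which key type each concrete class uses is an assumption for illustration.
class DocumentRecommendation(UuidRecommendation):
    pass


class GroupRecommendation(LongRecommendation):
    pass


class PersonRecommendation(LongRecommendation):
    pass


R = TypeVar("R", bound=Recommendation)


def top_n(recs: List[R], n: int) -> List[R]:
    """Shared ranking logic that works for any recommendation type."""
    return sorted(recs, key=lambda r: r.score, reverse=True)[:n]


docs = [
    DocumentRecommendation(UUID(int=1), 0.2),
    DocumentRecommendation(UUID(int=2), 0.9),
]
people = [PersonRecommendation(7, 0.5)]
print(top_n(docs, 1))
print(top_n(people, 5))
```

Code written against `Recommendation`, such as `top_n` here, is shared by every use case; only the thin subclasses differ.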

Page 34: Mendeley Suggest: Engineering a Personalised Article Recommender System

Deployment

➔ AWS ElasticBeanstalk
    ➔ Managed, auto-scaling, health-checking .war container

➔ Jenkins continuous integration (CI) server
➔ Maven build tool (useful dependency management)
➔ beanstalk-maven-plugin (push a button to deploy)
    ➔ Deploys to ElasticBeanstalk
    ➔ Replaces existing application version if required
    ➔ 'Zero downtime' updates (tested at ~300ms)
    ➔ Triggered by Jenkins

Page 35: Mendeley Suggest: Engineering a Personalised Article Recommender System

Putting it all together... $$$

➔ Real-time article recommendations for 2 million users
➔ 20 requests per second
➔ $65.84/month
    ➔ $34.24 ElasticBeanstalk
    ➔ $28.17 DynamoDB
    ➔ $2.76 bandwidth

➔ $30 to update the computation layer periodically
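
To put the serving costs in per-user and per-request terms, the sketch below works through the arithmetic. It assumes a 30-day month and treats 20 requests per second as a sustained rate, neither of which the slide states; the variable names are illustrative. The three itemised costs sum to $65.17, slightly below the stated $65.84 total, and the slide does not account for the difference.

```python
from decimal import Decimal

# Monthly serving costs as listed on the slide (USD).
line_items = {
    "ElasticBeanstalk": Decimal("34.24"),
    "DynamoDB": Decimal("28.17"),
    "bandwidth": Decimal("2.76"),
}
stated_total = Decimal("65.84")  # monthly total as given on the slide
users = 2_000_000
requests_per_second = 20

itemised_total = sum(line_items.values(), Decimal("0"))

# Assumption: a 30-day month at a sustained 20 requests per second.
requests_per_month = requests_per_second * 60 * 60 * 24 * 30

cost_per_user = stated_total / users
cost_per_million_requests = stated_total / requests_per_month * 1_000_000

print("itemised total:", itemised_total)
print("requests per month:", requests_per_month)
print("cost per user per month:", cost_per_user)
print("cost per million requests:", round(cost_per_million_requests, 2))
```

Under these assumptions the serving layer costs about $0.00003 per user per month, or roughly $1.27 per million requests, excluding the periodic $30 computation-layer update.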

Page 36: Mendeley Suggest: Engineering a Personalised Article Recommender System

Conclusions

Page 37: Mendeley Suggest: Engineering a Personalised Article Recommender System

Conclusions

➔ Mendeley Suggest is a personalised article recommender
➔ Built by small team for big data
➔ Uses Mahout as computation layer
    ➔ Needs some love out of the box
➔ Serves from AWS
    ➔ Reduces maintenance costs and is reliable
➔ Intend to release Mendeley Suggest to all users this year

Page 38: Mendeley Suggest: Engineering a Personalised Article Recommender System

We're Hiring!

➔ Data Scientist
    ➔ apply recommender technologies to Mendeley's data
    ➔ work on improving the quality of Mendeley's research catalogue
    ➔ starting in first quarter of 2013
    ➔ 6-month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)

➔ http://www.mendeley.com/careers/

Page 39: Mendeley Suggest: Engineering a Personalised Article Recommender System

www.mendeley.com

