DESCRIPTION
I gave this talk to an MSc class about Semantic Technologies at the Technical University of Graz (TUG) on 2012/01/12. It presents what recommendation systems are and how they are often used before delving into how they are used at Mendeley. Real-world results from Mendeley’s article recommendation system are also presented. The work presented here has been partially funded by the European Commission as part of the TEAM IAPP project (grant no. 251514) within the FP7 People Programme (Marie Curie).
Mendeley: Recommendation Systems for Academic Literature
Kris Jack, PhD
Data Mining Team Lead
“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...]. But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”
Overview
➔ what's a recommender and what does it look like?
➔ what's Mendeley?
➔ the secrets behind recommenders
➔ recommenders @ Mendeley
What's a recommender and what does it look like?
What's a recommender?
Definition: A recommendation system (recommender) is a subclass of information filtering system that aims to predict a user's interest in items.
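To make the definition concrete, here is a minimal sketch (not from the talk) of that abstraction in Python: score every item the user has not yet seen and return the top few. The `score` function is deliberately left abstract; it is where the content-based or collaborative approaches discussed later would plug in.

```python
# Minimal, illustrative sketch of the recommender abstraction above:
# score every unseen item for a user, then return the k best-scoring ones.
from typing import Callable, Iterable, List, Set, Tuple

def recommend(user: str,
              candidate_items: Iterable[str],
              seen_items: Set[str],
              score: Callable[[str, str], float],
              k: int = 10) -> List[Tuple[str, float]]:
    """Return the k unseen items with the highest predicted interest."""
    scored = [(item, score(user, item))
              for item in candidate_items if item not in seen_items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```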
Recommendation Systems in the Wild
Recommendation Vs. Search
➔ search is a pull strategy vs.
➔ recommendation is a push strategy
Recommendation Vs. Search
search is like following a path...
Recommendation Vs. Search
recommendation is like being on a roller coaster...
A different sense of control
What's Mendeley?
...a large data technology startup company
...and it's on a mission to change the way that research is done!
What is Mendeley?
Last.fm works like this:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music that you could also like... and it's the world's biggest open music database
Last.fm → Mendeley
music libraries → research libraries
artists → researchers
songs → papers
genres → disciplines
Mendeley provides tools to help users...
...organise their research
...collaborate with one another
...discover new research
Tools of scientific discovery
US National Academy of Engineering “Grand Challenges”:
Clean energy
Clean water
Sustainable food supplies
Pandemic diseases
Terrorist violence
Climate change
Artificial Intelligence
1.4 million+ users; the 20 largest userbases:
University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina
Real-time data on 28m unique papers:
Thomson Reuters' Web of Knowledge (dating from 1934)
Mendeley after 16 months: 50m
The secrets behind recommenders
Q1/2: How can a tool generate recommendations?
Q2/2: How can you measure the tool's performance?
Q1/2: How can a tool generate recommendations?
Content-based Filtering
➔ Find items with similar characteristics (e.g. title, discipline) to what the user previously liked
➔ Techniques: TF-IDF, BM25, Bayesian classifiers, decision trees, artificial neural networks
➔ Quickly absorbs new items (overcomes the cold start problem)
➔ Can make good recommendations from very few examples

Collaborative Filtering
➔ Find items that users who are similar to you also liked (wisdom of the crowds)
➔ Techniques: user-based and item-based variations, matrix factorisation
➔ No need to understand item characteristics
➔ Tends to give more novel recommendations
Hybrid tools too...
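As a hedged illustration of a hybrid, assume we already have a content-based score and a collaborative-filtering score for each candidate item and simply blend them with a weight. The `alpha` value and the function names here are illustrative, not anything Mendeley has described.

```python
# Illustrative weighted hybrid: blend a content-based score with a
# collaborative-filtering score. alpha = 1.0 is pure content-based,
# alpha = 0.0 is pure collaborative filtering.
def hybrid_rank(candidates, content_score, cf_score, alpha=0.5, k=10):
    """candidates: iterable of item ids; *_score: functions item id -> float."""
    blended = {item: alpha * content_score(item) + (1 - alpha) * cf_score(item)
               for item in candidates}
    return sorted(blended, key=blended.get, reverse=True)[:k]
```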
Q2/2: How can you measure the tool's performance?
➔ Cross validation with hold outs (see the sketch below)
➔ get yourself a good ground truth
➔ hide a fraction of your data from the system
➔ try to predict the hidden fraction from the remaining data
➔ calculate precision and recall
➔ Let users decide
➔ set up evaluations with real users (experimental)
➔ track tool usage by users
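A sketch of the hold-out evaluation above, assuming each user's library is simply a set of article ids and that `recommend_fn(user, visible_articles, k)` stands for whatever recommender is under test; both names are placeholders, not Mendeley code.

```python
# Cross validation with hold outs: hide a fraction of each user's articles,
# recommend from the rest, and score precision/recall against the hidden set.
import random

def holdout_evaluate(libraries, recommend_fn, hold_out_fraction=0.2, k=10, seed=42):
    """libraries: dict user -> set of article ids (the ground truth)."""
    rng = random.Random(seed)
    precisions, recalls = [], []
    for user, articles in libraries.items():
        articles = list(articles)
        if len(articles) < 2:
            continue  # nothing sensible to hold out
        rng.shuffle(articles)
        n_hidden = max(1, int(len(articles) * hold_out_fraction))
        hidden, visible = set(articles[:n_hidden]), set(articles[n_hidden:])
        recommended = set(recommend_fn(user, visible, k))
        hits = len(recommended & hidden)
        precisions.append(hits / max(len(recommended), 1))
        recalls.append(hits / len(hidden))
    if not precisions:
        return 0.0, 0.0
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```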
1) Related Research
● given 1 research article
● find other related articles
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them
Recommenders @ Mendeley
Use Case 1: Related Research
Strategy
content-based approach (tf-idf with a Lucene implementation)
search for articles with the same metadata (e.g. title, tags); a sketch follows below
Evaluation
cross-validation with hold outs on a ground truth data set
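A toy version of this strategy, using scikit-learn's TfidfVectorizer as a stand-in for the Lucene tf-idf implementation the slides describe; the choice of title plus tags as the indexed text is just one example of the metadata fields involved.

```python
# Content-based "related research" sketch: represent each article by the
# tf-idf vector of some metadata fields, then rank the other articles by
# cosine similarity to the query article. (Stand-in for Lucene.)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def related_articles(articles, query_index, top_n=5):
    """articles: list of dicts with a 'title' and an optional 'tags' list."""
    texts = [a["title"] + " " + " ".join(a.get("tags", [])) for a in articles]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(tfidf[query_index], tfidf).ravel()
    ranked = sims.argsort()[::-1]
    return [i for i in ranked if i != query_index][:top_n]
```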
Use Case 1: Related Research
Q2/2: What are our results?
[Chart: tf-idf Precision per Field when Field is Available; x-axis: metadata field (tag, abstract, mesh-term, title, general-keyword, author keyword); y-axis: precision @ 5, range 0 to 0.5]
Results 1) tags are the most informative field for finding related research
Use Case 1: Related Research
[Chart: tf-idf Precision for Field Combos when Field is Available; x-axis: metadata field(s) (tag, bestCombo = abstract+author+general-keyword+tag+title, abstract, mesh-term, title, general-keyword, author keyword); y-axis: precision @ 5, range 0 to 0.5]
Results 2) tags outperform combinations of fields
How does Mendeley use recommendation technologies?
2/2: Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them
Use Case 2: Personalised Recommendations
Strategy
collaborative filtering (item-based with Apache Mahout)
recommend articles to researchers that would interest them (a sketch follows below)
Evaluation
cross-validation with hold outs on a ground truth data set
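A rough sketch of item-based collaborative filtering on binary "article is in the user's library" data, standing in for the Apache Mahout implementation mentioned above. Item-item similarity here is cosine over the sets of users holding each article; Mahout offers other similarity measures (e.g. Tanimoto, log-likelihood), and the actual choices used at Mendeley are not stated in the slides.

```python
# Item-based CF sketch: score each unseen article by how similar it is to
# the articles already in the user's library, then return the top k.
from collections import defaultdict
from math import sqrt

def item_based_recommend(libraries, user, k=10):
    """libraries: dict user -> set of article ids."""
    holders = defaultdict(set)  # article id -> set of users holding it
    for u, items in libraries.items():
        for item in items:
            holders[item].add(u)

    def similarity(a, b):
        overlap = len(holders[a] & holders[b])
        return overlap / (sqrt(len(holders[a])) * sqrt(len(holders[b])))

    owned = libraries[user]
    scores = defaultdict(float)
    for candidate in holders:
        if candidate in owned:
            continue
        for seed in owned:
            scores[candidate] += similarity(candidate, seed)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```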
Input: user libraries
Output: recommend 10 articles to each user

16 months ago
Test: 10-fold cross validation, 50,000 user libraries
Results: <0.025 precision at 10

10 months ago (i.e. + 6 months)
Test: 10-fold cross validation, 50,000 user libraries
Results: ~0.1 precision at 10

10 months ago (i.e. + 6 months)
Test: release to a subset of users
Results: ~0.4 precision at 10
[Chart: Article Recommendation Acceptance Rates; y-axis: acceptance rate (i.e. accept/reject clicks), x-axis: number of months live]
[Chart: Precision by Library Size; y-axis: precision at 10 articles, x-axis: number of articles in user library]
Test: 10-fold cross validation, 50,000 user libraries
So, results are comparable to the non-distributed recommender
Completely distributed, so it can easily run on EC2 within 24 hours...
Summary & Conclusions
➔ Recommendations can be complementary to search
➔ They can help users to discover interesting items
➔ They can exploit item metadata (content-based)
➔ They can exploit the 'wisdom of the crowds' (CF)
➔ Crowd-sourced metadata can have a powerful informative value (e.g. article tags)
➔ Sometimes you need to let data grow
➔ Evaluations under lab conditions don't always predict real-world results well
➔ Recommenders don't just have to be about making money … remember where we started...?
“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...]. But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”
www.mendeley.com