DESCRIPTION
I gave this talk to an MSc class about Semantic Technologies at the Technical University of Graz (TUG) on 2012/01/12. It presents what recommendation systems are and how they are often used before delving into how they are used at Mendeley. Real-world results from Mendeley’s article recommendation system are also presented. The work presented here has been partially funded by the European Commission as part of the TEAM IAPP project (grant no. 251514) within the FP7 People Programme (Marie Curie).
Mendeley: Recommendation Systems for Academic Literature
Kris Jack, PhD
Data Mining Team Lead
“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...]. But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”
Overview
➔ what's a recommender and what does it look like?
➔ what's Mendeley?
➔ the secrets behind recommenders
➔ recommenders @ Mendeley
What's a recommender and what does it look like?
What's a recommender?
Definition: A recommendation system (recommender) is a subclass of information filtering system that aims to predict a user's interest in items.
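To make the definition concrete, here is a minimal sketch (not from the talk) of that abstraction in Python: score every item the user has not yet seen and return the top few. The `score` function is deliberately left abstract; it is where the content-based or collaborative approaches discussed later would plug in.

```python
# Minimal, illustrative sketch of the recommender abstraction above:
# score every unseen item for a user, then return the k best-scoring ones.
from typing import Callable, Iterable, List, Set, Tuple

def recommend(user: str,
              candidate_items: Iterable[str],
              seen_items: Set[str],
              score: Callable[[str, str], float],
              k: int = 10) -> List[Tuple[str, float]]:
    """Return the k unseen items with the highest predicted interest."""
    scored = [(item, score(user, item))
              for item in candidate_items if item not in seen_items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```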
Recommendation Systems in the Wild
Recommendation Vs. Search
➔ search is a pull strategy vs.
➔ recommendation is a push strategy
Recommendation Vs. Search
search is like following a path...
Recommendation Vs. Search
recommendation is like being on a roller coaster...
A different sense of control
What's Mendeley?
...a large data technology startup company
...and it's on a mission to change the way that research is done!
What is Mendeley?
Last.fm works like this:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music that you could also like... and it's the world's biggest open music database
Last.fm → Mendeley
music libraries → research libraries
artists → researchers
songs → papers
genres → disciplines
Mendeley provides tools to help users...
...organise their research
...collaborate with one another
...discover new research
Tools of scientific discovery
US National Academy of Engineering “Grand Challenges”:
Clean energy
Clean water
Sustainable food supplies
Pandemic diseases
Terrorist violence
Climate change
Artificial Intelligence
1.4 million+ users; the 20 largest userbases:
University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina
Real-time data on 28m unique papers:
Thomson Reuters' Web of Knowledge (dating from 1934)
Mendeley after 16 months: 50m
The secrets behind recommenders
Q1/2: How can a tool generate recommendations?
Q2/2: How can you measure the tool's performance?
Q1/2: How can a tool generate recommendations?
Content-based Filtering
➔ Find items with similar characteristics (e.g. title, discipline) to what the user previously liked
➔ Techniques: TF-IDF, BM25, Bayesian classifiers, decision trees, artificial neural networks
➔ Quickly absorbs new items (overcomes the cold start problem)
➔ Can make good recommendations from very few examples

Collaborative Filtering
➔ Find items that users who are similar to you also liked (wisdom of the crowds)
➔ Techniques: user-based and item-based variations, matrix factorisation
➔ No need to understand item characteristics
➔ Tends to give more novel recommendations
Hybrid tools too...
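As a hedged illustration of a hybrid, assume we already have a content-based score and a collaborative-filtering score for each candidate item and simply blend them with a weight. The `alpha` value and the function names here are illustrative, not anything Mendeley has described.

```python
# Illustrative weighted hybrid: blend a content-based score with a
# collaborative-filtering score. alpha = 1.0 is pure content-based,
# alpha = 0.0 is pure collaborative filtering.
def hybrid_rank(candidates, content_score, cf_score, alpha=0.5, k=10):
    """candidates: iterable of item ids; *_score: functions item id -> float."""
    blended = {item: alpha * content_score(item) + (1 - alpha) * cf_score(item)
               for item in candidates}
    return sorted(blended, key=blended.get, reverse=True)[:k]
```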
Q2/2: How can you measure the tool's performance?
➔ Cross validation with hold outs (see the sketch below)
➔ get yourself a good ground truth
➔ hide a fraction of your data from the system
➔ try to predict the hidden fraction from the remaining data
➔ calculate precision and recall
➔ Let users decide
➔ set up evaluations with real users (experimental)
➔ track tool usage by users
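A sketch of the hold-out evaluation above, assuming each user's library is simply a set of article ids and that `recommend_fn(user, visible_articles, k)` stands for whatever recommender is under test; both names are placeholders, not Mendeley code.

```python
# Cross validation with hold outs: hide a fraction of each user's articles,
# recommend from the rest, and score precision/recall against the hidden set.
import random

def holdout_evaluate(libraries, recommend_fn, hold_out_fraction=0.2, k=10, seed=42):
    """libraries: dict user -> set of article ids (the ground truth)."""
    rng = random.Random(seed)
    precisions, recalls = [], []
    for user, articles in libraries.items():
        articles = list(articles)
        if len(articles) < 2:
            continue  # nothing sensible to hold out
        rng.shuffle(articles)
        n_hidden = max(1, int(len(articles) * hold_out_fraction))
        hidden, visible = set(articles[:n_hidden]), set(articles[n_hidden:])
        recommended = set(recommend_fn(user, visible, k))
        hits = len(recommended & hidden)
        precisions.append(hits / max(len(recommended), 1))
        recalls.append(hits / len(hidden))
    if not precisions:
        return 0.0, 0.0
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```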
1) Related Research
● given 1 research article
● find other related articles
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them
Recommenders @ Mendeley
Use Case 1: Related Research
Strategy
content-based approach (tf-idf with a Lucene implementation)
search for articles with the same metadata (e.g. title, tags); a sketch follows below
Evaluation
cross-validation with hold outs on a ground truth data set
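A toy version of this strategy, using scikit-learn's TfidfVectorizer as a stand-in for the Lucene tf-idf implementation the slides describe; the choice of title plus tags as the indexed text is just one example of the metadata fields involved.

```python
# Content-based "related research" sketch: represent each article by the
# tf-idf vector of some metadata fields, then rank the other articles by
# cosine similarity to the query article. (Stand-in for Lucene.)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def related_articles(articles, query_index, top_n=5):
    """articles: list of dicts with a 'title' and an optional 'tags' list."""
    texts = [a["title"] + " " + " ".join(a.get("tags", [])) for a in articles]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(tfidf[query_index], tfidf).ravel()
    ranked = sims.argsort()[::-1]
    return [i for i in ranked if i != query_index][:top_n]
```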
Use Case 1: Related Research
Q2/2: What are our results?
[Chart: tf-idf Precision per Field when Field is Available; x-axis: metadata field (tag, abstract, mesh-term, title, general-keyword, author keyword); y-axis: precision @ 5, range 0 to 0.5]
Results 1) tags are the most informative field for finding related research
Use Case 1: Related Research
[Chart: tf-idf Precision for Field Combos when Field is Available; x-axis: metadata field(s) (tag, bestCombo = abstract+author+general-keyword+tag+title, abstract, mesh-term, title, general-keyword, author keyword); y-axis: precision @ 5, range 0 to 0.5]
Results 2) tags outperform combinations of fields
How does Mendeley use recommendation technologies?
2/2: Personalised Recommendations
● given a user's profile (e.g. interests)
● find new articles of interest to them
Use Case 2: Personalised Recommendations
Strategy
collaborative filtering (item-based with Apache Mahout)
recommend articles to researchers that would interest them (a sketch follows below)
Evaluation
cross-validation with hold outs on a ground truth data set
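A rough sketch of item-based collaborative filtering on binary "article is in the user's library" data, standing in for the Apache Mahout implementation mentioned above. Item-item similarity here is cosine over the sets of users holding each article; Mahout offers other similarity measures (e.g. Tanimoto, log-likelihood), and the actual choices used at Mendeley are not stated in the slides.

```python
# Item-based CF sketch: score each unseen article by how similar it is to
# the articles already in the user's library, then return the top k.
from collections import defaultdict
from math import sqrt

def item_based_recommend(libraries, user, k=10):
    """libraries: dict user -> set of article ids."""
    holders = defaultdict(set)  # article id -> set of users holding it
    for u, items in libraries.items():
        for item in items:
            holders[item].add(u)

    def similarity(a, b):
        overlap = len(holders[a] & holders[b])
        return overlap / (sqrt(len(holders[a])) * sqrt(len(holders[b])))

    owned = libraries[user]
    scores = defaultdict(float)
    for candidate in holders:
        if candidate in owned:
            continue
        for seed in owned:
            scores[candidate] += similarity(candidate, seed)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```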
Input: user libraries
Output: recommend 10 articles to each user

16 months ago
Test: 10-fold cross validation, 50,000 user libraries
Results: <0.025 precision at 10

10 months ago (i.e. + 6 months)
Test: 10-fold cross validation, 50,000 user libraries
Results: ~0.1 precision at 10

10 months ago (i.e. + 6 months)
Test: release to a subset of users
Results: ~0.4 precision at 10
[Chart: Article Recommendation Acceptance Rates; y-axis: acceptance rate (i.e. accept/reject clicks), x-axis: number of months live]
[Chart: Precision by Library Size; y-axis: precision at 10 articles, x-axis: number of articles in user library]
Test: 10-fold cross validation, 50,000 user libraries
So, results are comparable to the non-distributed recommender
Completely distributed, so it can easily run on EC2 within 24 hours...
Summary & Conclusions
➔ Recommendations can be complementary to search
➔ They can help users to discover interesting items
➔ They can exploit item metadata (content-based)
➔ They can exploit the 'wisdom of the crowds' (CF)
➔ Crowd-sourced metadata can have a powerful informative value (e.g. article tags)
➔ Sometimes you need to let data grow
➔ Evaluations under lab conditions don't always predict real-world results well
➔ Recommenders don't just have to be about making money … remember where we started...?
“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...]. But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”
www.mendeley.com